* [PATCH v4 01/12] vfio/pci: Extract mdev check into an helper
2024-07-12 11:46 [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
@ 2024-07-12 11:46 ` Joao Martins
2024-07-16 9:21 ` Cédric Le Goater
2024-07-12 11:46 ` [PATCH v4 02/12] vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev Joao Martins
` (11 subsequent siblings)
12 siblings, 1 reply; 82+ messages in thread
From: Joao Martins @ 2024-07-12 11:46 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
In preparation to skip initialization of the HostIOMMUDevice for mdev,
extract the checks that validate if a device is an mdev into helpers.
A vfio_set_mdev() is created, and subsystems consult VFIODevice::mdev
to check if it's mdev or not.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/hw/vfio/vfio-common.h | 2 ++
hw/vfio/helpers.c | 18 ++++++++++++++++++
hw/vfio/pci.c | 9 ++-------
3 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e8ddf92bb185..7419466bca92 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -116,6 +116,7 @@ typedef struct VFIODevice {
DeviceState *dev;
int fd;
int type;
+ bool mdev;
bool reset_works;
bool needs_reset;
bool no_mmap;
@@ -231,6 +232,7 @@ void vfio_region_exit(VFIORegion *region);
void vfio_region_finalize(VFIORegion *region);
void vfio_reset_handler(void *opaque);
struct vfio_device_info *vfio_get_device_info(int fd);
+void vfio_set_mdev(VFIODevice *vbasedev);
bool vfio_attach_device(char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp);
void vfio_detach_device(VFIODevice *vbasedev);
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index b14edd46edc9..bace0e788a09 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -675,3 +675,21 @@ int vfio_device_get_aw_bits(VFIODevice *vdev)
return HOST_IOMMU_DEVICE_CAP_AW_BITS_MAX;
}
+
+void vfio_set_mdev(VFIODevice *vbasedev)
+{
+ g_autofree char *tmp = NULL;
+ char *subsys;
+ bool is_mdev;
+
+ if (!vbasedev->sysfsdev) {
+ return;
+ }
+
+ tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
+ subsys = realpath(tmp, NULL);
+ is_mdev = subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
+ free(subsys);
+
+ vbasedev->mdev = is_mdev;
+}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e03d9f3ba546..585f23a18406 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2963,12 +2963,10 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
ERRP_GUARD();
VFIOPCIDevice *vdev = VFIO_PCI(pdev);
VFIODevice *vbasedev = &vdev->vbasedev;
- char *subsys;
int i, ret;
bool is_mdev;
char uuid[UUID_STR_LEN];
g_autofree char *name = NULL;
- g_autofree char *tmp = NULL;
if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
if (!(~vdev->host.domain || ~vdev->host.bus ||
@@ -2997,11 +2995,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
* stays in sync with the active working set of the guest driver. Prevent
* the x-balloon-allowed option unless this is minimally an mdev device.
*/
- tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
- subsys = realpath(tmp, NULL);
- is_mdev = subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
- free(subsys);
-
+ vfio_set_mdev(vbasedev);
+ is_mdev = vbasedev->mdev;
trace_vfio_mdev(vbasedev->name, is_mdev);
if (vbasedev->ram_block_discard_allowed && !is_mdev) {
--
2.17.2
^ permalink raw reply related [flat|nested] 82+ messages in thread
* Re: [PATCH v4 01/12] vfio/pci: Extract mdev check into an helper
2024-07-12 11:46 ` [PATCH v4 01/12] vfio/pci: Extract mdev check into an helper Joao Martins
@ 2024-07-16 9:21 ` Cédric Le Goater
2024-07-16 9:33 ` Joao Martins
0 siblings, 1 reply; 82+ messages in thread
From: Cédric Le Goater @ 2024-07-16 9:21 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
Hello Joao
On 7/12/24 13:46, Joao Martins wrote:
> In preparation to skip initialization of the HostIOMMUDevice for mdev,
> extract the checks that validate if a device is an mdev into helpers.
>
> A vfio_set_mdev() is created, and subsystems consult VFIODevice::mdev
> to check if it's mdev or not.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/hw/vfio/vfio-common.h | 2 ++
> hw/vfio/helpers.c | 18 ++++++++++++++++++
> hw/vfio/pci.c | 9 ++-------
> 3 files changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index e8ddf92bb185..7419466bca92 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -116,6 +116,7 @@ typedef struct VFIODevice {
> DeviceState *dev;
> int fd;
> int type;
> + bool mdev;
> bool reset_works;
> bool needs_reset;
> bool no_mmap;
> @@ -231,6 +232,7 @@ void vfio_region_exit(VFIORegion *region);
> void vfio_region_finalize(VFIORegion *region);
> void vfio_reset_handler(void *opaque);
> struct vfio_device_info *vfio_get_device_info(int fd);
> +void vfio_set_mdev(VFIODevice *vbasedev);
> bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp);
> void vfio_detach_device(VFIODevice *vbasedev);
> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> index b14edd46edc9..bace0e788a09 100644
> --- a/hw/vfio/helpers.c
> +++ b/hw/vfio/helpers.c
> @@ -675,3 +675,21 @@ int vfio_device_get_aw_bits(VFIODevice *vdev)
>
> return HOST_IOMMU_DEVICE_CAP_AW_BITS_MAX;
> }
> +
> +void vfio_set_mdev(VFIODevice *vbasedev)
Could you please change this routine to :
bool vfio_device_is_mdev(VFIODevice *vbasedev)
> +{
> + g_autofree char *tmp = NULL;
> + char *subsys;
a g_autofree variable is preferable here.
> + bool is_mdev;
> +
> + if (!vbasedev->sysfsdev) {
> + return;
> + }
> +
> + tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
> + subsys = realpath(tmp, NULL);
> + is_mdev = subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
simply return the result here and ....
> + free(subsys);
> +
> + vbasedev->mdev = is_mdev;
> +}
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index e03d9f3ba546..585f23a18406 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2963,12 +2963,10 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> ERRP_GUARD();
> VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> VFIODevice *vbasedev = &vdev->vbasedev;
> - char *subsys;
> int i, ret;
> bool is_mdev;
> char uuid[UUID_STR_LEN];
> g_autofree char *name = NULL;
> - g_autofree char *tmp = NULL;
>
> if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
> if (!(~vdev->host.domain || ~vdev->host.bus ||
> @@ -2997,11 +2995,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> * stays in sync with the active working set of the guest driver. Prevent
> * the x-balloon-allowed option unless this is minimally an mdev device.
> */
> - tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
> - subsys = realpath(tmp, NULL);
> - is_mdev = subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
> - free(subsys);
> -
> + vfio_set_mdev(vbasedev);
> + is_mdev = vbasedev->mdev;
replace with :
vbasedev->mdev = vfio_device_is_mdev(vbasedev);
and use vbasedev->mdev instead of is_mdev where needed.
Thanks,
C.
> trace_vfio_mdev(vbasedev->name, is_mdev);
>
> if (vbasedev->ram_block_discard_allowed && !is_mdev) {
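The review comments above (return the result, free the resolved path on every exit, let the caller store the flag) can be sketched as a standalone function. This is an illustrative sketch only, not the code from a later revision of the series: `sysfs_path_is_mdev()`, `struct demo_device` and `demo_device_probe()` are hypothetical names, the function takes the sysfs path directly rather than a `VFIODevice`, and plain libc is used instead of GLib so that it builds on its own.

```c
#define _XOPEN_SOURCE 700   /* for realpath() under strict -std= modes */
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical standalone variant of the helper discussed above: it returns
 * the result instead of storing it in the device structure.
 */
static bool sysfs_path_is_mdev(const char *sysfsdev)
{
    char link[PATH_MAX];
    char *subsys;
    bool is_mdev;
    int n;

    if (!sysfsdev) {
        return false;               /* no sysfs path, so not an mdev */
    }

    n = snprintf(link, sizeof(link), "%s/subsystem", sysfsdev);
    if (n < 0 || (size_t)n >= sizeof(link)) {
        return false;               /* path too long to resolve */
    }

    /* realpath() resolves the "subsystem" symlink and returns NULL on failure. */
    subsys = realpath(link, NULL);
    is_mdev = subsys && strcmp(subsys, "/sys/bus/mdev") == 0;
    free(subsys);                   /* free(NULL) is a no-op */

    return is_mdev;
}

/* Toy device: the caller stores the result, as the review asks. */
struct demo_device {
    const char *sysfsdev;
    bool mdev;
};

static void demo_device_probe(struct demo_device *dev)
{
    assert(dev != NULL);
    dev->mdev = sysfs_path_is_mdev(dev->sysfsdev);
}
```

With this shape, a device without a sysfs path gets `mdev == false` explicitly instead of relying on the field having been zero-initialized.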
* Re: [PATCH v4 01/12] vfio/pci: Extract mdev check into an helper
2024-07-16 9:21 ` Cédric Le Goater
@ 2024-07-16 9:33 ` Joao Martins
0 siblings, 0 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-16 9:33 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 16/07/2024 10:21, Cédric Le Goater wrote:
> Hello Joao
>
> On 7/12/24 13:46, Joao Martins wrote:
>> In preparation to skip initialization of the HostIOMMUDevice for mdev,
>> extract the checks that validate if a device is an mdev into helpers.
>>
>> A vfio_set_mdev() is created, and subsystems consult VFIODevice::mdev
>> to check if it's mdev or not.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/hw/vfio/vfio-common.h | 2 ++
>> hw/vfio/helpers.c | 18 ++++++++++++++++++
>> hw/vfio/pci.c | 9 ++-------
>> 3 files changed, 22 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index e8ddf92bb185..7419466bca92 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -116,6 +116,7 @@ typedef struct VFIODevice {
>> DeviceState *dev;
>> int fd;
>> int type;
>> + bool mdev;
>> bool reset_works;
>> bool needs_reset;
>> bool no_mmap;
>> @@ -231,6 +232,7 @@ void vfio_region_exit(VFIORegion *region);
>> void vfio_region_finalize(VFIORegion *region);
>> void vfio_reset_handler(void *opaque);
>> struct vfio_device_info *vfio_get_device_info(int fd);
>> +void vfio_set_mdev(VFIODevice *vbasedev);
>> bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>> AddressSpace *as, Error **errp);
>> void vfio_detach_device(VFIODevice *vbasedev);
>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>> index b14edd46edc9..bace0e788a09 100644
>> --- a/hw/vfio/helpers.c
>> +++ b/hw/vfio/helpers.c
>> @@ -675,3 +675,21 @@ int vfio_device_get_aw_bits(VFIODevice *vdev)
>> return HOST_IOMMU_DEVICE_CAP_AW_BITS_MAX;
>> }
>> +
>> +void vfio_set_mdev(VFIODevice *vbasedev)
>
> Could you please change this routine to :
>
> bool vfio_device_is_mdev(VFIODevice *vbasedev)
>
OK
>> +{
>> + g_autofree char *tmp = NULL;
>> + char *subsys;
>
> a g_autofree variable is preferable here.
>
It was a code move, which is why I kept the exact same structure.
I will change it as you suggest.
>> + bool is_mdev;
>> +
>> + if (!vbasedev->sysfsdev) {
>> + return;
>> + }
>> +
>> + tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
>> + subsys = realpath(tmp, NULL);
>> + is_mdev = subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>
> simply return the result here and ....
>
OK
>> + free(subsys);
>> +
>> + vbasedev->mdev = is_mdev;
>> +}
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index e03d9f3ba546..585f23a18406 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2963,12 +2963,10 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>> ERRP_GUARD();
>> VFIOPCIDevice *vdev = VFIO_PCI(pdev);
>> VFIODevice *vbasedev = &vdev->vbasedev;
>> - char *subsys;
>> int i, ret;
>> bool is_mdev;
>> char uuid[UUID_STR_LEN];
>> g_autofree char *name = NULL;
>> - g_autofree char *tmp = NULL;
>> if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
>> if (!(~vdev->host.domain || ~vdev->host.bus ||
>> @@ -2997,11 +2995,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>> * stays in sync with the active working set of the guest driver. Prevent
>> * the x-balloon-allowed option unless this is minimally an mdev device.
>> */
>> - tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
>> - subsys = realpath(tmp, NULL);
>> - is_mdev = subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>> - free(subsys);
>> -
>> + vfio_set_mdev(vbasedev);
>> + is_mdev = vbasedev->mdev;
>
> replace with :
>
> vbasedev->mdev = vfio_device_is_mdev(vbasedev);
>
> and use vbasedev->mdev instead of is_mdev where needed.
>
OK
>
> Thanks,
>
> C.
>
>
>
>
>> trace_vfio_mdev(vbasedev->name, is_mdev);
>> if (vbasedev->ram_block_discard_allowed && !is_mdev) {
>
* [PATCH v4 02/12] vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev
2024-07-12 11:46 [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
2024-07-12 11:46 ` [PATCH v4 01/12] vfio/pci: Extract mdev check into an helper Joao Martins
@ 2024-07-12 11:46 ` Joao Martins
2024-07-16 9:21 ` Cédric Le Goater
` (2 more replies)
2024-07-12 11:46 ` [PATCH v4 03/12] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities Joao Martins
` (10 subsequent siblings)
12 siblings, 3 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-12 11:46 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
mdevs aren't "physical" devices and when asking for backing IOMMU info, it
fails the entire provisioning of the guest. Fix that by skipping
HostIOMMUDevice initialization in the presence of mdevs, and skip setting
an iommu device when it is known to be an mdev.
Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
hw/vfio/common.c | 4 ++++
hw/vfio/pci.c | 10 +++++++---
2 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7cdb969fd396..b0beed44116e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1556,6 +1556,10 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
return false;
}
+ if (vbasedev->mdev) {
+ return true;
+ }
+
hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
object_unref(hiod);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 585f23a18406..3fc72e898a25 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3116,7 +3116,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
vfio_bars_register(vdev);
- if (!pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
+ if (!is_mdev && !pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
error_prepend(errp, "Failed to set iommu_device: ");
goto out_teardown;
}
@@ -3239,7 +3239,9 @@ out_deregister:
timer_free(vdev->intx.mmap_timer);
}
out_unset_idev:
- pci_device_unset_iommu_device(pdev);
+ if (!is_mdev) {
+ pci_device_unset_iommu_device(pdev);
+ }
out_teardown:
vfio_teardown_msi(vdev);
vfio_bars_exit(vdev);
@@ -3284,7 +3286,9 @@ static void vfio_exitfn(PCIDevice *pdev)
vfio_pci_disable_rp_atomics(vdev);
vfio_bars_exit(vdev);
vfio_migration_exit(vbasedev);
- pci_device_unset_iommu_device(pdev);
+ if (!vbasedev->mdev) {
+ pci_device_unset_iommu_device(pdev);
+ }
}
static void vfio_pci_reset(DeviceState *dev)
--
2.17.2
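The invariant this patch relies on is that every place which sets or unsets the IOMMU device is guarded by the same mdev flag, so the two calls stay balanced. A toy model of that control flow is sketched below; all names (`struct toy_device`, `toy_attach()`, `toy_realize()`, `toy_exit()`) are hypothetical stand-ins and none of them are QEMU APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for VFIODevice plus the PCI IOMMU hooks (hypothetical). */
struct toy_device {
    bool mdev;            /* result of the mdev check from patch 01 */
    bool has_hiod;        /* a host IOMMU device object was created */
    int  iommu_dev_refs;  /* balance of "set" and "unset" calls */
};

/* Models the attach path: for an mdev, succeed without creating a hiod. */
static bool toy_attach(struct toy_device *dev)
{
    if (dev->mdev) {
        return true;
    }
    dev->has_hiod = true;
    return true;
}

/* Models realize: only set the IOMMU device when one exists. */
static bool toy_realize(struct toy_device *dev)
{
    if (!toy_attach(dev)) {
        return false;
    }
    if (!dev->mdev) {
        assert(dev->has_hiod);
        dev->iommu_dev_refs++;
    }
    return true;
}

/* Models the exit path: the unset is guarded by the same flag as the set. */
static void toy_exit(struct toy_device *dev)
{
    if (!dev->mdev) {
        dev->iommu_dev_refs--;
    }
}
```

Because both guards read the same per-device flag, the counter returns to zero for mdev and non-mdev devices alike, which is the motivation for keying all three sites on one field rather than on a local variable.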
* Re: [PATCH v4 02/12] vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev
2024-07-12 11:46 ` [PATCH v4 02/12] vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev Joao Martins
@ 2024-07-16 9:21 ` Cédric Le Goater
2024-07-16 13:26 ` Eric Auger
2024-07-17 1:34 ` Duan, Zhenzhong
2 siblings, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2024-07-16 9:21 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:46, Joao Martins wrote:
> mdevs aren't "physical" devices and when asking for backing IOMMU info, it
> fails the entire provisioning of the guest. Fix that by skipping
> HostIOMMUDevice initialization in the presence of mdevs, and skip setting
> an iommu device when it is known to be an mdev.
>
> Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler")
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> hw/vfio/common.c | 4 ++++
> hw/vfio/pci.c | 10 +++++++---
> 2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 7cdb969fd396..b0beed44116e 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1556,6 +1556,10 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> return false;
> }
>
> + if (vbasedev->mdev) {
> + return true;
> + }
> +
> hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
> if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
> object_unref(hiod);
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 585f23a18406..3fc72e898a25 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3116,7 +3116,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>
> vfio_bars_register(vdev);
>
> - if (!pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
> + if (!is_mdev && !pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
let's use vbasedev->mdev instead.
Thanks,
C.
> error_prepend(errp, "Failed to set iommu_device: ");
> goto out_teardown;
> }
> @@ -3239,7 +3239,9 @@ out_deregister:
> timer_free(vdev->intx.mmap_timer);
> }
> out_unset_idev:
> - pci_device_unset_iommu_device(pdev);
> + if (!is_mdev) {
> + pci_device_unset_iommu_device(pdev);
> + }
> out_teardown:
> vfio_teardown_msi(vdev);
> vfio_bars_exit(vdev);
> @@ -3284,7 +3286,9 @@ static void vfio_exitfn(PCIDevice *pdev)
> vfio_pci_disable_rp_atomics(vdev);
> vfio_bars_exit(vdev);
> vfio_migration_exit(vbasedev);
> - pci_device_unset_iommu_device(pdev);
> + if (!vbasedev->mdev) {
> + pci_device_unset_iommu_device(pdev);
> + }
> }
>
> static void vfio_pci_reset(DeviceState *dev)
* Re: [PATCH v4 02/12] vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev
2024-07-12 11:46 ` [PATCH v4 02/12] vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev Joao Martins
2024-07-16 9:21 ` Cédric Le Goater
@ 2024-07-16 13:26 ` Eric Auger
2024-07-17 1:34 ` Duan, Zhenzhong
2 siblings, 0 replies; 82+ messages in thread
From: Eric Auger @ 2024-07-16 13:26 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:46, Joao Martins wrote:
> mdevs aren't "physical" devices and when asking for backing IOMMU info, it
> fails the entire provisioning of the guest. Fix that by skipping
> HostIOMMUDevice initialization in the presence of mdevs, and skip setting
> an iommu device when it is known to be an mdev.
>
> Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler")
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
With or without Cédric's suggestion
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
> ---
> hw/vfio/common.c | 4 ++++
> hw/vfio/pci.c | 10 +++++++---
> 2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 7cdb969fd396..b0beed44116e 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1556,6 +1556,10 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> return false;
> }
>
> + if (vbasedev->mdev) {
> + return true;
> + }
> +
> hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
> if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
> object_unref(hiod);
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 585f23a18406..3fc72e898a25 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3116,7 +3116,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>
> vfio_bars_register(vdev);
>
> - if (!pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
> + if (!is_mdev && !pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
> error_prepend(errp, "Failed to set iommu_device: ");
> goto out_teardown;
> }
> @@ -3239,7 +3239,9 @@ out_deregister:
> timer_free(vdev->intx.mmap_timer);
> }
> out_unset_idev:
> - pci_device_unset_iommu_device(pdev);
> + if (!is_mdev) {
> + pci_device_unset_iommu_device(pdev);
> + }
> out_teardown:
> vfio_teardown_msi(vdev);
> vfio_bars_exit(vdev);
> @@ -3284,7 +3286,9 @@ static void vfio_exitfn(PCIDevice *pdev)
> vfio_pci_disable_rp_atomics(vdev);
> vfio_bars_exit(vdev);
> vfio_migration_exit(vbasedev);
> - pci_device_unset_iommu_device(pdev);
> + if (!vbasedev->mdev) {
> + pci_device_unset_iommu_device(pdev);
> + }
> }
>
> static void vfio_pci_reset(DeviceState *dev)
* RE: [PATCH v4 02/12] vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev
2024-07-12 11:46 ` [PATCH v4 02/12] vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev Joao Martins
2024-07-16 9:21 ` Cédric Le Goater
2024-07-16 13:26 ` Eric Auger
@ 2024-07-17 1:34 ` Duan, Zhenzhong
2 siblings, 0 replies; 82+ messages in thread
From: Duan, Zhenzhong @ 2024-07-17 1:34 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
Hello Joao,
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v4 02/12] vfio/iommufd: Don't initialize nor set a
>HOST_IOMMU_DEVICE with mdev
>
>mdevs aren't "physical" devices and when asking for backing IOMMU info, it
>fails the entire provisioning of the guest. Fix that by skipping
>HostIOMMUDevice initialization in the presence of mdevs, and skip setting
>an iommu device when it is known to be an mdev.
>
>Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
>Fixes: 930589520128 ("vfio/iommufd: Implement
>HostIOMMUDeviceClass::realize() handler")
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Thanks for fixing.
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
BRs.
Zhenzhong
>---
> hw/vfio/common.c | 4 ++++
> hw/vfio/pci.c | 10 +++++++---
> 2 files changed, 11 insertions(+), 3 deletions(-)
>
>diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>index 7cdb969fd396..b0beed44116e 100644
>--- a/hw/vfio/common.c
>+++ b/hw/vfio/common.c
>@@ -1556,6 +1556,10 @@ bool vfio_attach_device(char *name,
>VFIODevice *vbasedev,
> return false;
> }
>
>+ if (vbasedev->mdev) {
>+ return true;
>+ }
>+
> hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
> if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>errp)) {
> object_unref(hiod);
>diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>index 585f23a18406..3fc72e898a25 100644
>--- a/hw/vfio/pci.c
>+++ b/hw/vfio/pci.c
>@@ -3116,7 +3116,7 @@ static void vfio_realize(PCIDevice *pdev, Error
>**errp)
>
> vfio_bars_register(vdev);
>
>- if (!pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
>+ if (!is_mdev && !pci_device_set_iommu_device(pdev, vbasedev->hiod,
>errp)) {
> error_prepend(errp, "Failed to set iommu_device: ");
> goto out_teardown;
> }
>@@ -3239,7 +3239,9 @@ out_deregister:
> timer_free(vdev->intx.mmap_timer);
> }
> out_unset_idev:
>- pci_device_unset_iommu_device(pdev);
>+ if (!is_mdev) {
>+ pci_device_unset_iommu_device(pdev);
>+ }
> out_teardown:
> vfio_teardown_msi(vdev);
> vfio_bars_exit(vdev);
>@@ -3284,7 +3286,9 @@ static void vfio_exitfn(PCIDevice *pdev)
> vfio_pci_disable_rp_atomics(vdev);
> vfio_bars_exit(vdev);
> vfio_migration_exit(vbasedev);
>- pci_device_unset_iommu_device(pdev);
>+ if (!vbasedev->mdev) {
>+ pci_device_unset_iommu_device(pdev);
>+ }
> }
>
> static void vfio_pci_reset(DeviceState *dev)
>--
>2.17.2
* [PATCH v4 03/12] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities
2024-07-12 11:46 [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
2024-07-12 11:46 ` [PATCH v4 01/12] vfio/pci: Extract mdev check into an helper Joao Martins
2024-07-12 11:46 ` [PATCH v4 02/12] vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev Joao Martins
@ 2024-07-12 11:46 ` Joao Martins
2024-07-16 9:22 ` Cédric Le Goater
2024-07-16 13:34 ` Eric Auger
2024-07-12 11:46 ` [PATCH v4 04/12] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() Joao Martins
` (9 subsequent siblings)
12 siblings, 2 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-12 11:46 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
The helper will be able to fetch vendor agnostic IOMMU capabilities
supported both by hardware and software. Right now it is only iommu dirty
tracking.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/sysemu/iommufd.h | 2 +-
backends/iommufd.c | 4 +++-
hw/vfio/iommufd.c | 4 +++-
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 9edfec604595..57d502a1c79a 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -49,7 +49,7 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
hwaddr iova, ram_addr_t size);
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
- Error **errp);
+ uint64_t *caps, Error **errp);
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 84fefbc9ee7a..2b3d51af26d2 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -210,7 +210,7 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
- Error **errp)
+ uint64_t *caps, Error **errp)
{
struct iommu_hw_info info = {
.size = sizeof(info),
@@ -226,6 +226,8 @@ bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
g_assert(type);
*type = info.out_data_type;
+ g_assert(caps);
+ *caps = info.out_capabilities;
return true;
}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index c2f158e60386..604eaa4d9a5d 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -628,11 +628,13 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
union {
struct iommu_hw_info_vtd vtd;
} data;
+ uint64_t hw_caps;
hiod->agent = opaque;
if (!iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
- &type, &data, sizeof(data), errp)) {
+ &type, &data, sizeof(data),
+ &hw_caps, errp)) {
return false;
}
--
2.17.2
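A caller of the extended helper would typically test individual bits of the returned capability mask. The sketch below shows that pattern in isolation. It makes several assumptions: `DEMO_HW_CAP_DIRTY_TRACKING` is a local constant assumed to mirror `IOMMU_HW_CAP_DIRTY_TRACKING` from the Linux iommufd uAPI header (defined here so the example builds without kernel headers), and `struct demo_hw_info`, `demo_get_device_info()` and `demo_supports_dirty_tracking()` are hypothetical names, not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: same bit as IOMMU_HW_CAP_DIRTY_TRACKING in <linux/iommufd.h>. */
#define DEMO_HW_CAP_DIRTY_TRACKING (UINT64_C(1) << 0)

/* Hypothetical stand-in for the fields read back from the kernel. */
struct demo_hw_info {
    uint32_t out_data_type;
    uint64_t out_capabilities;
};

/*
 * Same out-parameter shape as the extended helper: both "type" and "caps"
 * are mandatory, so a NULL pointer is treated as a programming error.
 */
static bool demo_get_device_info(const struct demo_hw_info *info,
                                 uint32_t *type, uint64_t *caps)
{
    assert(type);
    assert(caps);
    *type = info->out_data_type;
    *caps = info->out_capabilities;
    return true;
}

/* Test a single capability bit in the returned mask. */
static bool demo_supports_dirty_tracking(uint64_t caps)
{
    return (caps & DEMO_HW_CAP_DIRTY_TRACKING) != 0;
}
```

Returning the raw 64-bit mask keeps the helper vendor agnostic: new capability bits can be consumed by callers without changing its signature again.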
* Re: [PATCH v4 03/12] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities
2024-07-12 11:46 ` [PATCH v4 03/12] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities Joao Martins
@ 2024-07-16 9:22 ` Cédric Le Goater
2024-07-16 13:34 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2024-07-16 9:22 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:46, Joao Martins wrote:
> The helper will be able to fetch vendor agnostic IOMMU capabilities
> supported both by hardware and software. Right now it is only iommu dirty
> tracking.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> include/sysemu/iommufd.h | 2 +-
> backends/iommufd.c | 4 +++-
> hw/vfio/iommufd.c | 4 +++-
> 3 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 9edfec604595..57d502a1c79a 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -49,7 +49,7 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> hwaddr iova, ram_addr_t size);
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> - Error **errp);
> + uint64_t *caps, Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
> #endif
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 84fefbc9ee7a..2b3d51af26d2 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -210,7 +210,7 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> - Error **errp)
> + uint64_t *caps, Error **errp)
> {
> struct iommu_hw_info info = {
> .size = sizeof(info),
> @@ -226,6 +226,8 @@ bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>
> g_assert(type);
> *type = info.out_data_type;
> + g_assert(caps);
> + *caps = info.out_capabilities;
>
> return true;
> }
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index c2f158e60386..604eaa4d9a5d 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -628,11 +628,13 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> union {
> struct iommu_hw_info_vtd vtd;
> } data;
> + uint64_t hw_caps;
>
> hiod->agent = opaque;
>
> if (!iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
> - &type, &data, sizeof(data), errp)) {
> + &type, &data, sizeof(data),
> + &hw_caps, errp)) {
> return false;
> }
>
* Re: [PATCH v4 03/12] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities
2024-07-12 11:46 ` [PATCH v4 03/12] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities Joao Martins
2024-07-16 9:22 ` Cédric Le Goater
@ 2024-07-16 13:34 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Eric Auger @ 2024-07-16 13:34 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
Hi Joao,
On 7/12/24 13:46, Joao Martins wrote:
> The helper will be able to fetch vendor agnostic IOMMU capabilities
> supported both by hardware and software. Right now it is only iommu dirty
> tracking.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
> ---
> include/sysemu/iommufd.h | 2 +-
> backends/iommufd.c | 4 +++-
> hw/vfio/iommufd.c | 4 +++-
> 3 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 9edfec604595..57d502a1c79a 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -49,7 +49,7 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> hwaddr iova, ram_addr_t size);
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> - Error **errp);
> + uint64_t *caps, Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
> #endif
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 84fefbc9ee7a..2b3d51af26d2 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -210,7 +210,7 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> - Error **errp)
> + uint64_t *caps, Error **errp)
> {
> struct iommu_hw_info info = {
> .size = sizeof(info),
> @@ -226,6 +226,8 @@ bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>
> g_assert(type);
> *type = info.out_data_type;
> + g_assert(caps);
> + *caps = info.out_capabilities;
>
> return true;
> }
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index c2f158e60386..604eaa4d9a5d 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -628,11 +628,13 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> union {
> struct iommu_hw_info_vtd vtd;
> } data;
> + uint64_t hw_caps;
>
> hiod->agent = opaque;
>
> if (!iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
> - &type, &data, sizeof(data), errp)) {
> + &type, &data, sizeof(data),
> + &hw_caps, errp)) {
> return false;
> }
>
* [PATCH v4 04/12] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
2024-07-12 11:46 [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (2 preceding siblings ...)
2024-07-12 11:46 ` [PATCH v4 03/12] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities Joao Martins
@ 2024-07-12 11:46 ` Joao Martins
2024-07-16 9:27 ` Cédric Le Goater
` (2 more replies)
2024-07-12 11:46 ` [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation Joao Martins
` (8 subsequent siblings)
12 siblings, 3 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-12 11:46 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
In preparation for implementing auto domains, have the attach function
return the errno it got during domain attach instead of a bool.
-EINVAL is tracked to detect domain incompatibilities and decide whether
to create a new IOMMU domain.
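As a rough illustration of the convention described above (not the QEMU
code itself), the sketch below shows an attach helper that returns 0 or a
negative errno, plus a bool-returning caller that negates the result.
fake_attach_ioctl(), attach_hwpt() and attach_container() are hypothetical
names, and the ids 1 and 2 are arbitrary values chosen to trigger the two
failure cases. The sketch copies errno into a local before doing anything
else, a defensive habit since intervening calls may overwrite errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the attach ioctl: -1 with errno set on failure. */
int fake_attach_ioctl(int id)
{
    if (id == 1) {
        errno = EINVAL;     /* pretend domain 1 is incompatible */
        return -1;
    }
    if (id == 2) {
        errno = ENODEV;     /* pretend domain 2 hits an unrelated failure */
        return -1;
    }
    return 0;
}

/* Returns 0 on success, or a negative errno value on failure. */
int attach_hwpt(int id)
{
    if (fake_attach_ioctl(id)) {
        int err = errno;    /* save before any other call can overwrite it */

        assert(err > 0);    /* a failed call is expected to have set errno */
        return -err;
    }
    return 0;
}

/* bool-returning caller: success is a zero return, hence the negation. */
bool attach_container(int id)
{
    return !attach_hwpt(id);
}
```

A caller that wants to retry with another domain can then compare the
return value against -EINVAL, while callers that only care about
success or failure keep their bool signature.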
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
hw/vfio/iommufd.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 604eaa4d9a5d..077dea8f1b64 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -172,7 +172,7 @@ out:
return ret;
}
-static bool iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
+static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
Error **errp)
{
int iommufd = vbasedev->iommufd->fd;
@@ -187,12 +187,12 @@ static bool iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
error_setg_errno(errp, errno,
"[iommufd=%d] error attach %s (%d) to id=%d",
iommufd, vbasedev->name, vbasedev->fd, id);
- return false;
+ return -errno;
}
trace_iommufd_cdev_attach_ioas_hwpt(iommufd, vbasedev->name,
vbasedev->fd, id);
- return true;
+ return 0;
}
static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
@@ -216,7 +216,7 @@ static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
VFIOIOMMUFDContainer *container,
Error **errp)
{
- return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
+ return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
}
static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
--
2.17.2
^ permalink raw reply related [flat|nested] 82+ messages in thread
* Re: [PATCH v4 04/12] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
2024-07-12 11:46 ` [PATCH v4 04/12] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() Joao Martins
@ 2024-07-16 9:27 ` Cédric Le Goater
2024-07-16 13:36 ` Eric Auger
2024-07-17 1:37 ` Duan, Zhenzhong
2 siblings, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2024-07-16 9:27 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:46, Joao Martins wrote:
> In preparation to implement auto domains have the attach function
> return the errno it got during domain attach instead of a bool.
>
> -EINVAL is tracked to track domain incompatibilities, and decide whether
> to create a new IOMMU domain.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> hw/vfio/iommufd.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 604eaa4d9a5d..077dea8f1b64 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -172,7 +172,7 @@ out:
> return ret;
> }
>
> -static bool iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
> +static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
> Error **errp)
> {
> int iommufd = vbasedev->iommufd->fd;
> @@ -187,12 +187,12 @@ static bool iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
> error_setg_errno(errp, errno,
> "[iommufd=%d] error attach %s (%d) to id=%d",
> iommufd, vbasedev->name, vbasedev->fd, id);
> - return false;
> + return -errno;
> }
>
> trace_iommufd_cdev_attach_ioas_hwpt(iommufd, vbasedev->name,
> vbasedev->fd, id);
> - return true;
> + return 0;
> }
>
> static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
> @@ -216,7 +216,7 @@ static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
> VFIOIOMMUFDContainer *container,
> Error **errp)
> {
> - return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
> + return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
> }
>
> static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 04/12] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
2024-07-12 11:46 ` [PATCH v4 04/12] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() Joao Martins
2024-07-16 9:27 ` Cédric Le Goater
@ 2024-07-16 13:36 ` Eric Auger
2024-07-17 1:37 ` Duan, Zhenzhong
2 siblings, 0 replies; 82+ messages in thread
From: Eric Auger @ 2024-07-16 13:36 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:46, Joao Martins wrote:
> In preparation to implement auto domains have the attach function
> return the errno it got during domain attach instead of a bool.
>
> -EINVAL is tracked to track domain incompatibilities, and decide whether
> to create a new IOMMU domain.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
> ---
> hw/vfio/iommufd.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 604eaa4d9a5d..077dea8f1b64 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -172,7 +172,7 @@ out:
> return ret;
> }
>
> -static bool iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
> +static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
> Error **errp)
> {
> int iommufd = vbasedev->iommufd->fd;
> @@ -187,12 +187,12 @@ static bool iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
> error_setg_errno(errp, errno,
> "[iommufd=%d] error attach %s (%d) to id=%d",
> iommufd, vbasedev->name, vbasedev->fd, id);
> - return false;
> + return -errno;
> }
>
> trace_iommufd_cdev_attach_ioas_hwpt(iommufd, vbasedev->name,
> vbasedev->fd, id);
> - return true;
> + return 0;
> }
>
> static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
> @@ -216,7 +216,7 @@ static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
> VFIOIOMMUFDContainer *container,
> Error **errp)
> {
> - return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
> + return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
> }
>
> static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
^ permalink raw reply [flat|nested] 82+ messages in thread
* RE: [PATCH v4 04/12] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
2024-07-12 11:46 ` [PATCH v4 04/12] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() Joao Martins
2024-07-16 9:27 ` Cédric Le Goater
2024-07-16 13:36 ` Eric Auger
@ 2024-07-17 1:37 ` Duan, Zhenzhong
2 siblings, 0 replies; 82+ messages in thread
From: Duan, Zhenzhong @ 2024-07-17 1:37 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v4 04/12] vfio/iommufd: Return errno in
>iommufd_cdev_attach_ioas_hwpt()
>
>In preparation to implement auto domains have the attach function
>return the errno it got during domain attach instead of a bool.
>
>-EINVAL is tracked to track domain incompatibilities, and decide whether
>to create a new IOMMU domain.
>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Thanks
Zhenzhong
>---
> hw/vfio/iommufd.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>index 604eaa4d9a5d..077dea8f1b64 100644
>--- a/hw/vfio/iommufd.c
>+++ b/hw/vfio/iommufd.c
>@@ -172,7 +172,7 @@ out:
> return ret;
> }
>
>-static bool iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev,
>uint32_t id,
>+static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev,
>uint32_t id,
> Error **errp)
> {
> int iommufd = vbasedev->iommufd->fd;
>@@ -187,12 +187,12 @@ static bool
>iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
> error_setg_errno(errp, errno,
> "[iommufd=%d] error attach %s (%d) to id=%d",
> iommufd, vbasedev->name, vbasedev->fd, id);
>- return false;
>+ return -errno;
> }
>
> trace_iommufd_cdev_attach_ioas_hwpt(iommufd, vbasedev->name,
> vbasedev->fd, id);
>- return true;
>+ return 0;
> }
>
> static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error
>**errp)
>@@ -216,7 +216,7 @@ static bool
>iommufd_cdev_attach_container(VFIODevice *vbasedev,
> VFIOIOMMUFDContainer *container,
> Error **errp)
> {
>- return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id,
>errp);
>+ return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id,
>errp);
> }
>
> static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
>--
>2.17.2
^ permalink raw reply [flat|nested] 82+ messages in thread
* [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-12 11:46 [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (3 preceding siblings ...)
2024-07-12 11:46 ` [PATCH v4 04/12] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() Joao Martins
@ 2024-07-12 11:46 ` Joao Martins
2024-07-16 9:39 ` Cédric Le Goater
` (3 more replies)
2024-07-12 11:46 ` [PATCH v4 06/12] vfio/{iommufd,container}: Remove caps::aw_bits Joao Martins
` (7 subsequent siblings)
12 siblings, 4 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-12 11:46 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
There are generally two modes of operation for IOMMUFD:
* The simple user API which intends to perform relatively simple things
with IOMMUs e.g. DPDK. It generally creates an IOAS, attaches it to VFIO
and mainly performs IOAS_MAP and UNMAP.
* The native IOMMUFD API where you have fine grained control of the
IOMMU domain and model it accordingly. This is where most new features
are being steered to.
For dirty tracking 2) is required, as it needs to ensure that
the stage-2/parent IOMMU domain will only attach devices
that support dirty tracking (so far it is all homogeneous in x86, likely
not the case for smmuv3). Such an invariant on dirty tracking provides a
useful guarantee to VMMs that will refuse incompatible device
attachments for IOMMU domains.
The dirty tracking guarantee is enforced via HWPT_ALLOC, which is
responsible for creating an IOMMU domain. This is in contrast to the
'simple API', where the IOMMU domain is created by IOMMUFD automatically
when it attaches to VFIO (usually referred to as autodomains), but it has
the needed handling for mdevs.
To support dirty tracking with the advanced IOMMUFD API, it needs
similar logic, where IOMMU domains are created and devices attached to
compatible domains. Essentially mimmicing kernel
iommufd_device_auto_get_domain(). For mdevs, given there's no IOMMU domain,
it falls back to IOAS attach.
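The find-or-create logic described above can be sketched in isolation as
follows. This is a simplified model under stated assumptions, not the code
in this patch: it uses a plain singly linked list instead of QLIST, a user
count instead of a per-domain device list, and a hypothetical
dirty_tracking flag to stand in for whatever makes the kernel reject an
attach with -EINVAL. All names (struct domain, try_attach(),
autodomain_get(), autodomain_put()) are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct domain {
    uint32_t id;
    bool dirty_tracking;        /* hypothetical compatibility property */
    unsigned int users;         /* number of devices attached */
    struct domain *next;
};

struct container {
    struct domain *domains;     /* domains created so far */
    uint32_t next_id;
};

struct device {
    bool wants_dirty_tracking;
    struct domain *domain;      /* NULL while detached */
};

/* Stand-in for the attach call: -EINVAL means "incompatible domain". */
int try_attach(const struct device *dev, const struct domain *d)
{
    return dev->wants_dirty_tracking == d->dirty_tracking ? 0 : -EINVAL;
}

bool autodomain_get(struct container *c, struct device *dev)
{
    struct domain *d;
    int ret;

    /* First try to reuse an existing compatible domain. */
    for (d = c->domains; d; d = d->next) {
        ret = try_attach(dev, d);
        if (ret == -EINVAL) {
            continue;           /* expected failure: try the next domain */
        }
        if (ret) {
            return false;       /* any other error is fatal */
        }
        d->users++;
        dev->domain = d;
        return true;
    }

    /* No compatible domain found: create one and attach to it. */
    d = calloc(1, sizeof(*d));
    if (!d) {
        return false;
    }
    d->id = c->next_id++;
    d->dirty_tracking = dev->wants_dirty_tracking;
    if (try_attach(dev, d)) {
        free(d);
        return false;
    }
    d->users = 1;
    d->next = c->domains;
    c->domains = d;
    dev->domain = d;
    return true;
}

void autodomain_put(struct container *c, struct device *dev)
{
    struct domain *d = dev->domain, **pp;

    if (!d) {
        return;
    }
    assert(d->users > 0);
    dev->domain = NULL;
    if (--d->users) {
        return;                 /* other devices still use the domain */
    }
    /* Last user gone: unlink the domain from the container and free it. */
    for (pp = &c->domains; *pp; pp = &(*pp)->next) {
        if (*pp == d) {
            *pp = d->next;
            break;
        }
    }
    free(d);
}
```

In this model, two devices with the same requirement end up sharing one
domain, a device with a different requirement gets a fresh one, and a
domain is freed once its last device detaches.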
The auto domain logic allows different IOMMU domains to be created when
DMA dirty tracking is not desired (and VF can provide it), and others where
it is. It is not used in this way here, given how VFIODevice migration
state is initialized after the device attachment. But such a mixed mode of
IOMMU dirty tracking + device dirty tracking is an improvement that can
be added later. Keep the 'all or nothing' type1 approach that we have
been using so far between container vs device dirty tracking.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/hw/vfio/vfio-common.h | 9 ++++
include/sysemu/iommufd.h | 5 +++
backends/iommufd.c | 30 +++++++++++++
hw/vfio/iommufd.c | 82 +++++++++++++++++++++++++++++++++++
backends/trace-events | 1 +
5 files changed, 127 insertions(+)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7419466bca92..2dd468ce3c02 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
typedef struct IOMMUFDBackend IOMMUFDBackend;
+typedef struct VFIOIOASHwpt {
+ uint32_t hwpt_id;
+ QLIST_HEAD(, VFIODevice) device_list;
+ QLIST_ENTRY(VFIOIOASHwpt) next;
+} VFIOIOASHwpt;
+
typedef struct VFIOIOMMUFDContainer {
VFIOContainerBase bcontainer;
IOMMUFDBackend *be;
uint32_t ioas_id;
+ QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
} VFIOIOMMUFDContainer;
OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, VFIO_IOMMU_IOMMUFD);
@@ -135,6 +142,8 @@ typedef struct VFIODevice {
HostIOMMUDevice *hiod;
int devid;
IOMMUFDBackend *iommufd;
+ VFIOIOASHwpt *hwpt;
+ QLIST_ENTRY(VFIODevice) hwpt_next;
} VFIODevice;
struct VFIODeviceOps {
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 57d502a1c79a..e917e7591d05 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -50,6 +50,11 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp);
+bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
+ uint32_t pt_id, uint32_t flags,
+ uint32_t data_type, uint32_t data_len,
+ void *data_ptr, uint32_t *out_hwpt,
+ Error **errp);
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 2b3d51af26d2..5d3dfa917415 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -208,6 +208,36 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
return ret;
}
+bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
+ uint32_t pt_id, uint32_t flags,
+ uint32_t data_type, uint32_t data_len,
+ void *data_ptr, uint32_t *out_hwpt,
+ Error **errp)
+{
+ int ret, fd = be->fd;
+ struct iommu_hwpt_alloc alloc_hwpt = {
+ .size = sizeof(struct iommu_hwpt_alloc),
+ .flags = flags,
+ .dev_id = dev_id,
+ .pt_id = pt_id,
+ .data_type = data_type,
+ .data_len = data_len,
+ .data_uptr = (uint64_t)data_ptr,
+ };
+
+ ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
+ trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
+ data_len, (uint64_t)data_ptr,
+ alloc_hwpt.out_hwpt_id, ret);
+ if (ret) {
+ error_setg_errno(errp, errno, "Failed to allocate hwpt");
+ return false;
+ }
+
+ *out_hwpt = alloc_hwpt.out_hwpt_id;
+ return true;
+}
+
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 077dea8f1b64..325c7598d5a1 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -212,10 +212,86 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
return true;
}
+static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
+ VFIOIOMMUFDContainer *container,
+ Error **errp)
+{
+ IOMMUFDBackend *iommufd = vbasedev->iommufd;
+ uint32_t flags = 0;
+ VFIOIOASHwpt *hwpt;
+ uint32_t hwpt_id;
+ int ret;
+
+ /* Try to find a domain */
+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
+ if (ret) {
+ /* -EINVAL means the domain is incompatible with the device. */
+ if (ret == -EINVAL) {
+ /*
+ * It is an expected failure and it just means we will try
+ * another domain, or create one if no existing compatible
+ * domain is found. Hence why the error is discarded below.
+ */
+ error_free(*errp);
+ *errp = NULL;
+ continue;
+ }
+
+ return false;
+ } else {
+ vbasedev->hwpt = hwpt;
+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
+ return true;
+ }
+ }
+
+ if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
+ container->ioas_id, flags,
+ IOMMU_HWPT_DATA_NONE, 0, NULL,
+ &hwpt_id, errp)) {
+ return false;
+ }
+
+ hwpt = g_malloc0(sizeof(*hwpt));
+ hwpt->hwpt_id = hwpt_id;
+ QLIST_INIT(&hwpt->device_list);
+
+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
+ if (ret) {
+ iommufd_backend_free_id(container->be, hwpt->hwpt_id);
+ g_free(hwpt);
+ return false;
+ }
+
+ vbasedev->hwpt = hwpt;
+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
+ QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
+ return true;
+}
+
+static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
+ VFIOIOMMUFDContainer *container)
+{
+ VFIOIOASHwpt *hwpt = vbasedev->hwpt;
+
+ QLIST_REMOVE(vbasedev, hwpt_next);
+ if (QLIST_EMPTY(&hwpt->device_list)) {
+ QLIST_REMOVE(hwpt, next);
+ iommufd_backend_free_id(container->be, hwpt->hwpt_id);
+ g_free(hwpt);
+ }
+}
+
static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
VFIOIOMMUFDContainer *container,
Error **errp)
{
+ /* mdevs aren't physical devices and will fail with auto domains */
+ if (!vbasedev->mdev) {
+ return iommufd_cdev_autodomains_get(vbasedev, container, errp);
+ }
+
return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
}
@@ -224,6 +300,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
{
Error *err = NULL;
+ if (vbasedev->hwpt) {
+ iommufd_cdev_autodomains_put(vbasedev, container);
+ return;
+ }
+
if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
error_report_err(err);
}
@@ -354,6 +435,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
container->be = vbasedev->iommufd;
container->ioas_id = ioas_id;
+ QLIST_INIT(&container->hwpt_list);
bcontainer = &container->bcontainer;
vfio_address_space_insert(space, bcontainer);
diff --git a/backends/trace-events b/backends/trace-events
index 211e6f374adc..4d8ac02fe7d6 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size
iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
+iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
--
2.17.2
^ permalink raw reply related [flat|nested] 82+ messages in thread
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-12 11:46 ` [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation Joao Martins
@ 2024-07-16 9:39 ` Cédric Le Goater
2024-07-16 9:47 ` Joao Martins
2024-07-16 12:54 ` Cédric Le Goater
` (2 subsequent siblings)
3 siblings, 1 reply; 82+ messages in thread
From: Cédric Le Goater @ 2024-07-16 9:39 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:46, Joao Martins wrote:
> There's generally two modes of operation for IOMMUFD:
>
> * The simple user API which intends to perform relatively simple things
> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
> and mainly performs IOAS_MAP and UNMAP.
>
> * The native IOMMUFD API where you have fine grained control of the
> IOMMU domain and model it accordingly. This is where most new feature
> are being steered to.
>
> For dirty tracking 2)
I suppose 1) and 2) are the bullets above ?
> is required, as it needs to ensure that
> the stage-2/parent IOMMU domain will only attach devices
> that support dirty tracking (so far it is all homogeneous in x86, likely
> not the case for smmuv3). Such invariant on dirty tracking provides a
> useful guarantee to VMMs that will refuse incompatible device
> attachments for IOMMU domains.
>
> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
> responsible for creating an IOMMU domain. This is contrast to the
> 'simple API' where the IOMMU domain is created by IOMMUFD automatically
> when it attaches to VFIO (usually referred as autodomains) but it has
> the needed handling for mdevs.
>
> To support dirty tracking with the advanced IOMMUFD API, it needs
> similar logic, where IOMMU domains are created and devices attached to
> compatible domains. Essentially mimmicing kernel
mimmicing -> mimicking, I think.
> iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain
> it falls back to IOAS attach.
>
> The auto domain logic allows different IOMMU domains to be created when
> DMA dirty tracking is not desired (and VF can provide it), and others where
> it is. Here is not used in this way here given how VFIODevice migration
> state is initialized after the device attachment. But such mixed mode of
> IOMMU dirty tracking + device dirty tracking is an improvement that can
> be added on. Keep the 'all of nothing' of type1 approach that we have
> been using so far between container vs device dirty tracking.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
This needs feedback from IOMMUFD experts also.
Thanks,
C.
> ---
> include/hw/vfio/vfio-common.h | 9 ++++
> include/sysemu/iommufd.h | 5 +++
> backends/iommufd.c | 30 +++++++++++++
> hw/vfio/iommufd.c | 82 +++++++++++++++++++++++++++++++++++
> backends/trace-events | 1 +
> 5 files changed, 127 insertions(+)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 7419466bca92..2dd468ce3c02 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>
> typedef struct IOMMUFDBackend IOMMUFDBackend;
>
> +typedef struct VFIOIOASHwpt {
> + uint32_t hwpt_id;
> + QLIST_HEAD(, VFIODevice) device_list;
> + QLIST_ENTRY(VFIOIOASHwpt) next;
> +} VFIOIOASHwpt;
> +
> typedef struct VFIOIOMMUFDContainer {
> VFIOContainerBase bcontainer;
> IOMMUFDBackend *be;
> uint32_t ioas_id;
> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
> } VFIOIOMMUFDContainer;
>
> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, VFIO_IOMMU_IOMMUFD);
> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
> HostIOMMUDevice *hiod;
> int devid;
> IOMMUFDBackend *iommufd;
> + VFIOIOASHwpt *hwpt;
> + QLIST_ENTRY(VFIODevice) hwpt_next;
> } VFIODevice;
>
> struct VFIODeviceOps {
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 57d502a1c79a..e917e7591d05 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -50,6 +50,11 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp);
> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> + uint32_t pt_id, uint32_t flags,
> + uint32_t data_type, uint32_t data_len,
> + void *data_ptr, uint32_t *out_hwpt,
> + Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
> #endif
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 2b3d51af26d2..5d3dfa917415 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -208,6 +208,36 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> return ret;
> }
>
> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> + uint32_t pt_id, uint32_t flags,
> + uint32_t data_type, uint32_t data_len,
> + void *data_ptr, uint32_t *out_hwpt,
> + Error **errp)
> +{
> + int ret, fd = be->fd;
> + struct iommu_hwpt_alloc alloc_hwpt = {
> + .size = sizeof(struct iommu_hwpt_alloc),
> + .flags = flags,
> + .dev_id = dev_id,
> + .pt_id = pt_id,
> + .data_type = data_type,
> + .data_len = data_len,
> + .data_uptr = (uint64_t)data_ptr,
> + };
> +
> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
> + data_len, (uint64_t)data_ptr,
> + alloc_hwpt.out_hwpt_id, ret);
> + if (ret) {
> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
> + return false;
> + }
> +
> + *out_hwpt = alloc_hwpt.out_hwpt_id;
> + return true;
> +}
> +
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 077dea8f1b64..325c7598d5a1 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -212,10 +212,86 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
> return true;
> }
>
> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> + VFIOIOMMUFDContainer *container,
> + Error **errp)
> +{
> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
> + uint32_t flags = 0;
> + VFIOIOASHwpt *hwpt;
> + uint32_t hwpt_id;
> + int ret;
> +
> + /* Try to find a domain */
> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
> + if (ret) {
> + /* -EINVAL means the domain is incompatible with the device. */
> + if (ret == -EINVAL) {
> + /*
> + * It is an expected failure and it just means we will try
> + * another domain, or create one if no existing compatible
> + * domain is found. Hence why the error is discarded below.
> + */
> + error_free(*errp);
> + *errp = NULL;
> + continue;
> + }
> +
> + return false;
> + } else {
> + vbasedev->hwpt = hwpt;
> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> + return true;
> + }
> + }
> +
> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
> + container->ioas_id, flags,
> + IOMMU_HWPT_DATA_NONE, 0, NULL,
> + &hwpt_id, errp)) {
> + return false;
> + }
> +
> + hwpt = g_malloc0(sizeof(*hwpt));
> + hwpt->hwpt_id = hwpt_id;
> + QLIST_INIT(&hwpt->device_list);
> +
> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
> + if (ret) {
> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
> + g_free(hwpt);
> + return false;
> + }
> +
> + vbasedev->hwpt = hwpt;
> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> + return true;
> +}
> +
> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
> + VFIOIOMMUFDContainer *container)
> +{
> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
> +
> + QLIST_REMOVE(vbasedev, hwpt_next);
> + if (QLIST_EMPTY(&hwpt->device_list)) {
> + QLIST_REMOVE(hwpt, next);
> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
> + g_free(hwpt);
> + }
> +}
> +
> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
> VFIOIOMMUFDContainer *container,
> Error **errp)
> {
> + /* mdevs aren't physical devices and will fail with auto domains */
> + if (!vbasedev->mdev) {
> + return iommufd_cdev_autodomains_get(vbasedev, container, errp);
> + }
> +
> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
> }
>
> @@ -224,6 +300,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
> {
> Error *err = NULL;
>
> + if (vbasedev->hwpt) {
> + iommufd_cdev_autodomains_put(vbasedev, container);
> + return;
> + }
> +
> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
> error_report_err(err);
> }
> @@ -354,6 +435,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
> container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
> container->be = vbasedev->iommufd;
> container->ioas_id = ioas_id;
> + QLIST_INIT(&container->hwpt_list);
>
> bcontainer = &container->bcontainer;
> vfio_address_space_insert(space, bcontainer);
> diff --git a/backends/trace-events b/backends/trace-events
> index 211e6f374adc..4d8ac02fe7d6 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size
> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-16 9:39 ` Cédric Le Goater
@ 2024-07-16 9:47 ` Joao Martins
0 siblings, 0 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-16 9:47 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 16/07/2024 10:39, Cédric Le Goater wrote:
> On 7/12/24 13:46, Joao Martins wrote:
>> There's generally two modes of operation for IOMMUFD:
>>
>> * The simple user API which intends to perform relatively simple things
>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>> and mainly performs IOAS_MAP and UNMAP.
>>
>> * The native IOMMUFD API where you have fine grained control of the
>> IOMMU domain and model it accordingly. This is where most new feature
>> are being steered to.
>>
>> For dirty tracking 2)
>
> I suppose 1) and 2) are the bullets above ?
>
yeah
>> is required, as it needs to ensure that
>> the stage-2/parent IOMMU domain will only attach devices
>> that support dirty tracking (so far it is all homogeneous in x86, likely
>> not the case for smmuv3). Such invariant on dirty tracking provides a
>> useful guarantee to VMMs that will refuse incompatible device
>> attachments for IOMMU domains.
>>
>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>> responsible for creating an IOMMU domain. This is contrast to the
>> 'simple API' where the IOMMU domain is created by IOMMUFD automatically
>> when it attaches to VFIO (usually referred as autodomains) but it has
>> the needed handling for mdevs.
>>
>> To support dirty tracking with the advanced IOMMUFD API, it needs
>> similar logic, where IOMMU domains are created and devices attached to
>> compatible domains. Essentially mimmicing kernel
>
> mimmicing -> mimicking, I think.
>
Ack
>> iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain
>> it falls back to IOAS attach.
>>
>> The auto domain logic allows different IOMMU domains to be created when
>> DMA dirty tracking is not desired (and VF can provide it), and others where
>> it is. Here is not used in this way here given how VFIODevice migration
>> state is initialized after the device attachment. But such mixed mode of
>> IOMMU dirty tracking + device dirty tracking is an improvement that can
>> be added on. Keep the 'all of nothing' of type1 approach that we have
>> been using so far between container vs device dirty tracking.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>
> This needs feedback from IOMMUFD experts also.
I take it that by IOMMUFD experts you mean the ones more familiar with the
QEMU side (Zhenzhong and/or Yi)
Joao
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-12 11:46 ` [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation Joao Martins
2024-07-16 9:39 ` Cédric Le Goater
@ 2024-07-16 12:54 ` Cédric Le Goater
2024-07-16 16:04 ` Eric Auger
2024-07-17 2:18 ` Duan, Zhenzhong
3 siblings, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2024-07-16 12:54 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:46, Joao Martins wrote:
> There's generally two modes of operation for IOMMUFD:
>
> * The simple user API which intends to perform relatively simple things
> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
> and mainly performs IOAS_MAP and UNMAP.
>
> * The native IOMMUFD API where you have fine grained control of the
> IOMMU domain and model it accordingly. This is where most new feature
> are being steered to.
>
> For dirty tracking 2) is required, as it needs to ensure that
> the stage-2/parent IOMMU domain will only attach devices
> that support dirty tracking (so far it is all homogeneous in x86, likely
> not the case for smmuv3). Such invariant on dirty tracking provides a
> useful guarantee to VMMs that will refuse incompatible device
> attachments for IOMMU domains.
>
> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
> responsible for creating an IOMMU domain. This is contrast to the
> 'simple API' where the IOMMU domain is created by IOMMUFD automatically
> when it attaches to VFIO (usually referred as autodomains) but it has
> the needed handling for mdevs.
>
> To support dirty tracking with the advanced IOMMUFD API, it needs
> similar logic, where IOMMU domains are created and devices attached to
> compatible domains. Essentially mimmicing kernel
> iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain
> it falls back to IOAS attach.
>
> The auto domain logic allows different IOMMU domains to be created when
> DMA dirty tracking is not desired (and VF can provide it), and others where
> it is. Here is not used in this way here given how VFIODevice migration
> state is initialized after the device attachment. But such mixed mode of
> IOMMU dirty tracking + device dirty tracking is an improvement that can
> be added on. Keep the 'all of nothing' of type1 approach that we have
> been using so far between container vs device dirty tracking.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/hw/vfio/vfio-common.h | 9 ++++
> include/sysemu/iommufd.h | 5 +++
> backends/iommufd.c | 30 +++++++++++++
> hw/vfio/iommufd.c | 82 +++++++++++++++++++++++++++++++++++
> backends/trace-events | 1 +
> 5 files changed, 127 insertions(+)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 7419466bca92..2dd468ce3c02 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>
> typedef struct IOMMUFDBackend IOMMUFDBackend;
>
> +typedef struct VFIOIOASHwpt {
> + uint32_t hwpt_id;
> + QLIST_HEAD(, VFIODevice) device_list;
> + QLIST_ENTRY(VFIOIOASHwpt) next;
> +} VFIOIOASHwpt;
> +
> typedef struct VFIOIOMMUFDContainer {
> VFIOContainerBase bcontainer;
> IOMMUFDBackend *be;
> uint32_t ioas_id;
> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
> } VFIOIOMMUFDContainer;
>
> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, VFIO_IOMMU_IOMMUFD);
> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
> HostIOMMUDevice *hiod;
> int devid;
> IOMMUFDBackend *iommufd;
> + VFIOIOASHwpt *hwpt;
> + QLIST_ENTRY(VFIODevice) hwpt_next;
> } VFIODevice;
>
> struct VFIODeviceOps {
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 57d502a1c79a..e917e7591d05 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -50,6 +50,11 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp);
> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> + uint32_t pt_id, uint32_t flags,
> + uint32_t data_type, uint32_t data_len,
> + void *data_ptr, uint32_t *out_hwpt,
> + Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
> #endif
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 2b3d51af26d2..5d3dfa917415 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -208,6 +208,36 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> return ret;
> }
>
> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> + uint32_t pt_id, uint32_t flags,
> + uint32_t data_type, uint32_t data_len,
> + void *data_ptr, uint32_t *out_hwpt,
> + Error **errp)
> +{
> + int ret, fd = be->fd;
> + struct iommu_hwpt_alloc alloc_hwpt = {
> + .size = sizeof(struct iommu_hwpt_alloc),
> + .flags = flags,
> + .dev_id = dev_id,
> + .pt_id = pt_id,
> + .data_type = data_type,
> + .data_len = data_len,
> + .data_uptr = (uint64_t)data_ptr,
The type cast should be (uintptr_t)
> + };
> +
> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
> + data_len, (uint64_t)data_ptr,
same here.
Thanks,
C.
> + alloc_hwpt.out_hwpt_id, ret);
> + if (ret) {
> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
> + return false;
> + }
> +
> + *out_hwpt = alloc_hwpt.out_hwpt_id;
> + return true;
> +}
> +
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 077dea8f1b64..325c7598d5a1 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -212,10 +212,86 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
> return true;
> }
>
> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> + VFIOIOMMUFDContainer *container,
> + Error **errp)
> +{
> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
> + uint32_t flags = 0;
> + VFIOIOASHwpt *hwpt;
> + uint32_t hwpt_id;
> + int ret;
> +
> + /* Try to find a domain */
> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
> + if (ret) {
> + /* -EINVAL means the domain is incompatible with the device. */
> + if (ret == -EINVAL) {
> + /*
> + * It is an expected failure and it just means we will try
> + * another domain, or create one if no existing compatible
> + * domain is found. Hence why the error is discarded below.
> + */
> + error_free(*errp);
> + *errp = NULL;
> + continue;
> + }
> +
> + return false;
> + } else {
> + vbasedev->hwpt = hwpt;
> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> + return true;
> + }
> + }
> +
> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
> + container->ioas_id, flags,
> + IOMMU_HWPT_DATA_NONE, 0, NULL,
> + &hwpt_id, errp)) {
> + return false;
> + }
> +
> + hwpt = g_malloc0(sizeof(*hwpt));
> + hwpt->hwpt_id = hwpt_id;
> + QLIST_INIT(&hwpt->device_list);
> +
> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
> + if (ret) {
> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
> + g_free(hwpt);
> + return false;
> + }
> +
> + vbasedev->hwpt = hwpt;
> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> + return true;
> +}
> +
> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
> + VFIOIOMMUFDContainer *container)
> +{
> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
> +
> + QLIST_REMOVE(vbasedev, hwpt_next);
> + if (QLIST_EMPTY(&hwpt->device_list)) {
> + QLIST_REMOVE(hwpt, next);
> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
> + g_free(hwpt);
> + }
> +}
> +
> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
> VFIOIOMMUFDContainer *container,
> Error **errp)
> {
> + /* mdevs aren't physical devices and will fail with auto domains */
> + if (!vbasedev->mdev) {
> + return iommufd_cdev_autodomains_get(vbasedev, container, errp);
> + }
> +
> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
> }
>
> @@ -224,6 +300,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
> {
> Error *err = NULL;
>
> + if (vbasedev->hwpt) {
> + iommufd_cdev_autodomains_put(vbasedev, container);
> + return;
> + }
> +
> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
> error_report_err(err);
> }
> @@ -354,6 +435,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
> container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
> container->be = vbasedev->iommufd;
> container->ioas_id = ioas_id;
> + QLIST_INIT(&container->hwpt_list);
>
> bcontainer = &container->bcontainer;
> vfio_address_space_insert(space, bcontainer);
> diff --git a/backends/trace-events b/backends/trace-events
> index 211e6f374adc..4d8ac02fe7d6 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size
> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
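For reference, the cast suggested above can be sketched in isolation. This is
only an illustration, not code from the series: ptr_to_u64() is a hypothetical
helper name. The point is that going through uintptr_t converts the pointer at
its native width first, and only then widens it to the 64-bit field used by the
uAPI struct.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption for this sketch: pointers are never wider than 64 bits,
 * which holds on the hosts QEMU targets.
 */
static_assert(sizeof(uintptr_t) <= sizeof(uint64_t),
              "pointer must fit in a 64-bit uAPI field");

/*
 * Hypothetical helper (not part of QEMU): convert a host pointer into
 * the 64-bit integer form that a field such as data_uptr expects.
 * A direct (uint64_t) cast of a pointer can warn on 32-bit hosts
 * because the integer and pointer sizes differ; casting to uintptr_t
 * first avoids that.
 */
static inline uint64_t ptr_to_u64(const void *ptr)
{
    return (uint64_t)(uintptr_t)ptr;
}
```

The same double cast would apply to the trace call argument.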
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-12 11:46 ` [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation Joao Martins
2024-07-16 9:39 ` Cédric Le Goater
2024-07-16 12:54 ` Cédric Le Goater
@ 2024-07-16 16:04 ` Eric Auger
2024-07-16 16:44 ` Joao Martins
2024-07-17 2:18 ` Duan, Zhenzhong
3 siblings, 1 reply; 82+ messages in thread
From: Eric Auger @ 2024-07-16 16:04 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
Hi Joao,
On 7/12/24 13:46, Joao Martins wrote:
> There's generally two modes of operation for IOMMUFD:
>
> * The simple user API which intends to perform relatively simple things
> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
It generally creates? Can you make explicit what "it" is?
I am confused by this automatic terminology again (not your fault). The doc says:
"
* Automatic domain - refers to an iommu domain created automatically
  when attaching a device to an IOAS object. This is compatible to the
  semantics of VFIO type1.
* Manual domain - refers to an iommu domain designated by the user as
  the target pagetable to be attached to by a device. Though currently
  there are no uAPIs to directly create such domain, the datastructure
  and algorithms are ready for handling that use case.
"
In 1) the device is attached to the IOAS id (using the auto domain, if I am not wrong).
Here you attach to an hwpt id. Isn't it a manual domain?
> and mainly performs IOAS_MAP and UNMAP.
>
> * The native IOMMUFD API where you have fine grained control of the
> IOMMU domain and model it accordingly. This is where most new feature
> are being steered to.
>
> For dirty tracking 2) is required, as it needs to ensure that
> the stage-2/parent IOMMU domain will only attach devices
> that support dirty tracking (so far it is all homogeneous in x86, likely
> not the case for smmuv3). Such invariant on dirty tracking provides a
> useful guarantee to VMMs that will refuse incompatible device
> attachments for IOMMU domains.
>
> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
> responsible for creating an IOMMU domain. This is contrast to the
> 'simple API' where the IOMMU domain is created by IOMMUFD automatically
> when it attaches to VFIO (usually referred as autodomains) but it has
> the needed handling for mdevs.
>
> To support dirty tracking with the advanced IOMMUFD API, it needs
> similar logic, where IOMMU domains are created and devices attached to
> compatible domains. Essentially mimmicing kernel
> iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain
> it falls back to IOAS attach.
>
> The auto domain logic allows different IOMMU domains to be created when
> DMA dirty tracking is not desired (and VF can provide it), and others where
> it is. Here is not used in this way here given how VFIODevice migration
Here is not used in this way here ?
> state is initialized after the device attachment. But such mixed mode of
> IOMMU dirty tracking + device dirty tracking is an improvement that can
> be added on. Keep the 'all of nothing' of type1 approach that we have
> been using so far between container vs device dirty tracking.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/hw/vfio/vfio-common.h | 9 ++++
> include/sysemu/iommufd.h | 5 +++
> backends/iommufd.c | 30 +++++++++++++
> hw/vfio/iommufd.c | 82 +++++++++++++++++++++++++++++++++++
> backends/trace-events | 1 +
> 5 files changed, 127 insertions(+)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 7419466bca92..2dd468ce3c02 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>
> typedef struct IOMMUFDBackend IOMMUFDBackend;
>
> +typedef struct VFIOIOASHwpt {
> + uint32_t hwpt_id;
> + QLIST_HEAD(, VFIODevice) device_list;
> + QLIST_ENTRY(VFIOIOASHwpt) next;
> +} VFIOIOASHwpt;
> +
> typedef struct VFIOIOMMUFDContainer {
> VFIOContainerBase bcontainer;
> IOMMUFDBackend *be;
> uint32_t ioas_id;
> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
> } VFIOIOMMUFDContainer;
>
> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, VFIO_IOMMU_IOMMUFD);
> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
> HostIOMMUDevice *hiod;
> int devid;
> IOMMUFDBackend *iommufd;
> + VFIOIOASHwpt *hwpt;
> + QLIST_ENTRY(VFIODevice) hwpt_next;
> } VFIODevice;
>
> struct VFIODeviceOps {
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 57d502a1c79a..e917e7591d05 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -50,6 +50,11 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp);
> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> + uint32_t pt_id, uint32_t flags,
> + uint32_t data_type, uint32_t data_len,
> + void *data_ptr, uint32_t *out_hwpt,
> + Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
> #endif
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 2b3d51af26d2..5d3dfa917415 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -208,6 +208,36 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> return ret;
> }
>
> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> + uint32_t pt_id, uint32_t flags,
> + uint32_t data_type, uint32_t data_len,
> + void *data_ptr, uint32_t *out_hwpt,
> + Error **errp)
> +{
> + int ret, fd = be->fd;
> + struct iommu_hwpt_alloc alloc_hwpt = {
> + .size = sizeof(struct iommu_hwpt_alloc),
> + .flags = flags,
> + .dev_id = dev_id,
> + .pt_id = pt_id,
> + .data_type = data_type,
> + .data_len = data_len,
> + .data_uptr = (uint64_t)data_ptr,
> + };
> +
> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
> + data_len, (uint64_t)data_ptr,
> + alloc_hwpt.out_hwpt_id, ret);
> + if (ret) {
> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
> + return false;
> + }
> +
> + *out_hwpt = alloc_hwpt.out_hwpt_id;
> + return true;
> +}
> +
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 077dea8f1b64..325c7598d5a1 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -212,10 +212,86 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
> return true;
> }
>
> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> + VFIOIOMMUFDContainer *container,
> + Error **errp)
> +{
> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
> + uint32_t flags = 0;
> + VFIOIOASHwpt *hwpt;
> + uint32_t hwpt_id;
> + int ret;
> +
> + /* Try to find a domain */
> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
> + if (ret) {
> + /* -EINVAL means the domain is incompatible with the device. */
> + if (ret == -EINVAL) {
> + /*
> + * It is an expected failure and it just means we will try
> + * another domain, or create one if no existing compatible
> + * domain is found. Hence why the error is discarded below.
> + */
> + error_free(*errp);
> + *errp = NULL;
> + continue;
> + }
> +
> + return false;
> + } else {
> + vbasedev->hwpt = hwpt;
> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> + return true;
> + }
> + }
> +
> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
> + container->ioas_id, flags,
> + IOMMU_HWPT_DATA_NONE, 0, NULL,
> + &hwpt_id, errp)) {
> + return false;
> + }
> +
> + hwpt = g_malloc0(sizeof(*hwpt));
> + hwpt->hwpt_id = hwpt_id;
> + QLIST_INIT(&hwpt->device_list);
> +
> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
> + if (ret) {
> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
> + g_free(hwpt);
> + return false;
> + }
> +
> + vbasedev->hwpt = hwpt;
> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> + return true;
> +}
> +
> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
> + VFIOIOMMUFDContainer *container)
> +{
> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
> +
> + QLIST_REMOVE(vbasedev, hwpt_next);
don't you want to reset vbasedev->hwpt = NULL too?
> + if (QLIST_EMPTY(&hwpt->device_list)) {
> + QLIST_REMOVE(hwpt, next);
> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
> + g_free(hwpt);
> + }
> +}
> +
> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
> VFIOIOMMUFDContainer *container,
> Error **errp)
> {
> + /* mdevs aren't physical devices and will fail with auto domains */
> + if (!vbasedev->mdev) {
> + return iommufd_cdev_autodomains_get(vbasedev, container, errp);
> + }
> +
> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
> }
>
> @@ -224,6 +300,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
> {
> Error *err = NULL;
>
> + if (vbasedev->hwpt) {
> + iommufd_cdev_autodomains_put(vbasedev, container);
> + return;
Where do we detach the device from the hwpt?
Thanks
Eric
> + }
> +
> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
> error_report_err(err);
> }
> @@ -354,6 +435,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
> container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
> container->be = vbasedev->iommufd;
> container->ioas_id = ioas_id;
> + QLIST_INIT(&container->hwpt_list);
>
> bcontainer = &container->bcontainer;
> vfio_address_space_insert(space, bcontainer);
> diff --git a/backends/trace-events b/backends/trace-events
> index 211e6f374adc..4d8ac02fe7d6 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size
> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
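As a reading aid, the lookup order in iommufd_cdev_autodomains_get() quoted
above can be sketched outside QEMU. Everything below is hypothetical (struct
domain_table, find_or_create_domain() and the mock callbacks are made up for
the example). It only assumes what the patch comment states: -EINVAL from
attach means the domain is incompatible with the device, and any other error
is fatal.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_DOMAINS 8

/* Hypothetical stand-in for the container's hwpt_list. */
struct domain_table {
    unsigned int ids[MAX_DOMAINS];
    size_t count;
};

typedef int (*attach_fn)(unsigned int domain_id);
typedef int (*alloc_fn)(unsigned int *out_id);

/*
 * Try every known domain first; -EINVAL means "incompatible, try the
 * next one". If none fits, allocate a new domain, attach to it and
 * remember it. Returns 0 on success or a negative errno.
 */
static int find_or_create_domain(struct domain_table *tbl, attach_fn attach,
                                 alloc_fn alloc, unsigned int *out_id)
{
    unsigned int id;
    int ret;

    for (size_t i = 0; i < tbl->count; i++) {
        ret = attach(tbl->ids[i]);
        if (ret == 0) {
            *out_id = tbl->ids[i];
            return 0;
        }
        if (ret != -EINVAL) {
            return ret;         /* unexpected failure: give up */
        }
    }

    if (tbl->count == MAX_DOMAINS) {
        return -ENOSPC;
    }
    ret = alloc(&id);
    if (ret) {
        return ret;
    }
    ret = attach(id);
    if (ret) {
        return ret;             /* real code would also release 'id' here */
    }
    tbl->ids[tbl->count++] = id;
    *out_id = id;
    return 0;
}

/* Mock callbacks used only to exercise the sketch. */
static unsigned int mock_next_id = 2;

static int mock_attach(unsigned int domain_id)
{
    /* Pretend only even-numbered domains are compatible. */
    return (domain_id % 2 == 0) ? 0 : -EINVAL;
}

static int mock_alloc(unsigned int *out_id)
{
    *out_id = mock_next_id;
    mock_next_id += 2;
    return 0;
}
```

With a table that starts out holding only incompatible domains, the first call
allocates a new one and the second call reuses it instead of allocating again.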
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-16 16:04 ` Eric Auger
@ 2024-07-16 16:44 ` Joao Martins
2024-07-16 16:46 ` Joao Martins
2024-07-16 17:32 ` Eric Auger
0 siblings, 2 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-16 16:44 UTC (permalink / raw)
To: eric.auger, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 16/07/2024 17:04, Eric Auger wrote:
> Hi Joao,
>
> On 7/12/24 13:46, Joao Martins wrote:
>> There's generally two modes of operation for IOMMUFD:
>>
>> * The simple user API which intends to perform relatively simple things
>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>
> It generally creates? can you explicit what is "it"
>
'It' here refers to the process/API-user
> I am confused by this automatic terminology again (not your fault). the doc says:
> "
>
> *
>
> Automatic domain - refers to an iommu domain created automatically
> when attaching a device to an IOAS object. This is compatible to the
> semantics of VFIO type1.
>
> *
>
> Manual domain - refers to an iommu domain designated by the user as
> the target pagetable to be attached to by a device. Though currently
> there are no uAPIs to directly create such domain, the datastructure
> and algorithms are ready for handling that use case.
>
> "
>
>
> in 1) the device is attached to the ioas id (using the auto domain if I am not wrong)
> Here you attach to an hwpt id. Isn't it a manual domain?
>
Correct.
'Auto domains' generally refers to the kernel's own automatic attachment of a
device to a new pagetable.
I use the name 'auto domains' for the userspace version too, because we are
doing exactly the same thing, but from userspace, using the manual API in IOMMUFD.
>> and mainly performs IOAS_MAP and UNMAP.
>>
>> * The native IOMMUFD API where you have fine grained control of the
>> IOMMU domain and model it accordingly. This is where most new feature
>> are being steered to.
>>
>> For dirty tracking 2) is required, as it needs to ensure that
>> the stage-2/parent IOMMU domain will only attach devices
>> that support dirty tracking (so far it is all homogeneous in x86, likely
>> not the case for smmuv3). Such invariant on dirty tracking provides a
>> useful guarantee to VMMs that will refuse incompatible device
>> attachments for IOMMU domains.
>>
>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>> responsible for creating an IOMMU domain. This is contrast to the
>> 'simple API' where the IOMMU domain is created by IOMMUFD automatically
>> when it attaches to VFIO (usually referred as autodomains) but it has
>> the needed handling for mdevs.
>>
>> To support dirty tracking with the advanced IOMMUFD API, it needs
>> similar logic, where IOMMU domains are created and devices attached to
>> compatible domains. Essentially mimmicing kernel
>> iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain
>> it falls back to IOAS attach.
>>
>> The auto domain logic allows different IOMMU domains to be created when
>> DMA dirty tracking is not desired (and VF can provide it), and others where
>> it is. Here is not used in this way here given how VFIODevice migration
>
> Here is not used in this way here ?
>
I meant, 'Here it is not used in this way given (...)'
>> state is initialized after the device attachment. But such mixed mode of
>> IOMMU dirty tracking + device dirty tracking is an improvement that can
>> be added on. Keep the 'all of nothing' of type1 approach that we have
>> been using so far between container vs device dirty tracking.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/hw/vfio/vfio-common.h | 9 ++++
>> include/sysemu/iommufd.h | 5 +++
>> backends/iommufd.c | 30 +++++++++++++
>> hw/vfio/iommufd.c | 82 +++++++++++++++++++++++++++++++++++
>> backends/trace-events | 1 +
>> 5 files changed, 127 insertions(+)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 7419466bca92..2dd468ce3c02 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>
>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>
>> +typedef struct VFIOIOASHwpt {
>> + uint32_t hwpt_id;
>> + QLIST_HEAD(, VFIODevice) device_list;
>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>> +} VFIOIOASHwpt;
>> +
>> typedef struct VFIOIOMMUFDContainer {
>> VFIOContainerBase bcontainer;
>> IOMMUFDBackend *be;
>> uint32_t ioas_id;
>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>> } VFIOIOMMUFDContainer;
>>
>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, VFIO_IOMMU_IOMMUFD);
>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>> HostIOMMUDevice *hiod;
>> int devid;
>> IOMMUFDBackend *iommufd;
>> + VFIOIOASHwpt *hwpt;
>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>> } VFIODevice;
>>
>> struct VFIODeviceOps {
>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>> index 57d502a1c79a..e917e7591d05 100644
>> --- a/include/sysemu/iommufd.h
>> +++ b/include/sysemu/iommufd.h
>> @@ -50,6 +50,11 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>> uint32_t *type, void *data, uint32_t len,
>> uint64_t *caps, Error **errp);
>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>> + uint32_t pt_id, uint32_t flags,
>> + uint32_t data_type, uint32_t data_len,
>> + void *data_ptr, uint32_t *out_hwpt,
>> + Error **errp);
>>
>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
>> #endif
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index 2b3d51af26d2..5d3dfa917415 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -208,6 +208,36 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> return ret;
>> }
>>
>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>> + uint32_t pt_id, uint32_t flags,
>> + uint32_t data_type, uint32_t data_len,
>> + void *data_ptr, uint32_t *out_hwpt,
>> + Error **errp)
>> +{
>> + int ret, fd = be->fd;
>> + struct iommu_hwpt_alloc alloc_hwpt = {
>> + .size = sizeof(struct iommu_hwpt_alloc),
>> + .flags = flags,
>> + .dev_id = dev_id,
>> + .pt_id = pt_id,
>> + .data_type = data_type,
>> + .data_len = data_len,
>> + .data_uptr = (uint64_t)data_ptr,
>> + };
>> +
>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
>> + data_len, (uint64_t)data_ptr,
>> + alloc_hwpt.out_hwpt_id, ret);
>> + if (ret) {
>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>> + return false;
>> + }
>> +
>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>> + return true;
>> +}
>> +
>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>> uint32_t *type, void *data, uint32_t len,
>> uint64_t *caps, Error **errp)
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 077dea8f1b64..325c7598d5a1 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -212,10 +212,86 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>> return true;
>> }
>>
>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> + VFIOIOMMUFDContainer *container,
>> + Error **errp)
>> +{
>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>> + uint32_t flags = 0;
>> + VFIOIOASHwpt *hwpt;
>> + uint32_t hwpt_id;
>> + int ret;
>> +
>> + /* Try to find a domain */
>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>> + if (ret) {
>> + /* -EINVAL means the domain is incompatible with the device. */
>> + if (ret == -EINVAL) {
>> + /*
>> + * It is an expected failure and it just means we will try
>> + * another domain, or create one if no existing compatible
>> + * domain is found. Hence why the error is discarded below.
>> + */
>> + error_free(*errp);
>> + *errp = NULL;
>> + continue;
>> + }
>> +
>> + return false;
>> + } else {
>> + vbasedev->hwpt = hwpt;
>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>> + return true;
>> + }
>> + }
>> +
>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>> + container->ioas_id, flags,
>> + IOMMU_HWPT_DATA_NONE, 0, NULL,
>> + &hwpt_id, errp)) {
>> + return false;
>> + }
>> +
>> + hwpt = g_malloc0(sizeof(*hwpt));
>> + hwpt->hwpt_id = hwpt_id;
>> + QLIST_INIT(&hwpt->device_list);
>> +
>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>> + if (ret) {
>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>> + g_free(hwpt);
>> + return false;
>> + }
>> +
>> + vbasedev->hwpt = hwpt;
>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>> + return true;
>> +}
>> +
>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>> + VFIOIOMMUFDContainer *container)
>> +{
>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>> +
>> + QLIST_REMOVE(vbasedev, hwpt_next);
> don't you want to reset vbasedev->hwpt = NULL too?
>
Yep, thanks for catching that.
>
>> + if (QLIST_EMPTY(&hwpt->device_list)) {
>> + QLIST_REMOVE(hwpt, next);
>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>> + g_free(hwpt);
>> + }
>> +}
>> +
>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
>> VFIOIOMMUFDContainer *container,
>> Error **errp)
>> {
>> + /* mdevs aren't physical devices and will fail with auto domains */
>> + if (!vbasedev->mdev) {
>> + return iommufd_cdev_autodomains_get(vbasedev, container, errp);
>> + }
>> +
>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
>> }
>>
>> @@ -224,6 +300,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
>> {
>> Error *err = NULL;
>>
>> + if (vbasedev->hwpt) {
>> + iommufd_cdev_autodomains_put(vbasedev, container);
>> + return;
> Where do we detach the device from the hwpt?
>
In iommufd_backend_free_id() for auto domains
> Thanks
>
> Eric
>> + }
>> +
>> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
>> error_report_err(err);
>> }
>> @@ -354,6 +435,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>> container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
>> container->be = vbasedev->iommufd;
>> container->ioas_id = ioas_id;
>> + QLIST_INIT(&container->hwpt_list);
>>
>> bcontainer = &container->bcontainer;
>> vfio_address_space_insert(space, bcontainer);
>> diff --git a/backends/trace-events b/backends/trace-events
>> index 211e6f374adc..4d8ac02fe7d6 100644
>> --- a/backends/trace-events
>> +++ b/backends/trace-events
>> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size
>> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
>
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-16 16:44 ` Joao Martins
@ 2024-07-16 16:46 ` Joao Martins
2024-07-17 2:52 ` Duan, Zhenzhong
2024-07-16 17:32 ` Eric Auger
1 sibling, 1 reply; 82+ messages in thread
From: Joao Martins @ 2024-07-16 16:46 UTC (permalink / raw)
To: eric.auger, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 16/07/2024 17:44, Joao Martins wrote:
> On 16/07/2024 17:04, Eric Auger wrote:
>> Hi Joao,
>>
>> On 7/12/24 13:46, Joao Martins wrote:
>>> There are generally two modes of operation for IOMMUFD:
>>>
>>> * The simple user API which intends to perform relatively simple things
>>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attaches to VFIO
>>
>> "It generally creates"? Can you make explicit what "it" is?
>>
> 'It' here refers to the process/API-user
>
>> I am confused by this automatic terminology again (not your fault). The doc says:
>> "
>> * Automatic domain - refers to an iommu domain created automatically
>>   when attaching a device to an IOAS object. This is compatible to the
>>   semantics of VFIO type1.
>>
>> * Manual domain - refers to an iommu domain designated by the user as
>>   the target pagetable to be attached to by a device. Though currently
>>   there are no uAPIs to directly create such domain, the datastructure
>>   and algorithms are ready for handling that use case.
>> "
>>
>>
>> in 1) the device is attached to the ioas id (using the auto domain if I am not wrong)
>> Here you attach to an hwpt id. Isn't it a manual domain?
>>
>
> Correct.
>
> 'Auto domains' generally refers to the kernel's own automatic attaching of a
> device to a new pagetable.
>
> Here I use the term 'auto domains' for the userspace version too, because we
> are doing exactly the same thing but from userspace, using the manual API in
> IOMMUFD.
>
>>> and mainly performs IOAS_MAP and UNMAP.
>>>
>>> * The native IOMMUFD API where you have fine grained control of the
>>> IOMMU domain and model it accordingly. This is where most new features
>>> are being steered.
>>>
>>> For dirty tracking 2) is required, as it needs to ensure that
>>> the stage-2/parent IOMMU domain will only attach devices
>>> that support dirty tracking (so far it is all homogeneous in x86, likely
>>> not the case for smmuv3). Such invariant on dirty tracking provides a
>>> useful guarantee to VMMs that will refuse incompatible device
>>> attachments for IOMMU domains.
>>>
>>> The dirty tracking guarantee is enforced via HWPT_ALLOC, which is
>>> responsible for creating an IOMMU domain. This is in contrast to the
>>> 'simple API', where the IOMMU domain is created by IOMMUFD automatically
>>> when it attaches to VFIO (usually referred to as autodomains), but it has
>>> the needed handling for mdevs.
>>>
>>> To support dirty tracking with the advanced IOMMUFD API, it needs
>>> similar logic, where IOMMU domains are created and devices attached to
>>> compatible domains, essentially mimicking the kernel's
>>> iommufd_device_auto_get_domain(). With mdevs, given there's no IOMMU
>>> domain, it falls back to IOAS attach.
>>>
>>> The auto domain logic allows different IOMMU domains to be created when
>>> DMA dirty tracking is not desired (and VF can provide it), and others where
>>> it is. Here is not used in this way here given how VFIODevice migration
>>
>> Here is not used in this way here ?
>>
>
> I meant, 'Here it is not used in this way given (...)'
>
>>> state is initialized after the device attachment. But such a mixed mode of
>>> IOMMU dirty tracking + device dirty tracking is an improvement that can
>>> be added on. Keep the 'all or nothing' type1 approach that we have
>>> been using so far between container vs device dirty tracking.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> include/hw/vfio/vfio-common.h | 9 ++++
>>> include/sysemu/iommufd.h | 5 +++
>>> backends/iommufd.c | 30 +++++++++++++
>>> hw/vfio/iommufd.c | 82 +++++++++++++++++++++++++++++++++++
>>> backends/trace-events | 1 +
>>> 5 files changed, 127 insertions(+)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index 7419466bca92..2dd468ce3c02 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>>
>>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>
>>> +typedef struct VFIOIOASHwpt {
>>> + uint32_t hwpt_id;
>>> + QLIST_HEAD(, VFIODevice) device_list;
>>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>>> +} VFIOIOASHwpt;
>>> +
>>> typedef struct VFIOIOMMUFDContainer {
>>> VFIOContainerBase bcontainer;
>>> IOMMUFDBackend *be;
>>> uint32_t ioas_id;
>>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>>> } VFIOIOMMUFDContainer;
>>>
>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, VFIO_IOMMU_IOMMUFD);
>>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>>> HostIOMMUDevice *hiod;
>>> int devid;
>>> IOMMUFDBackend *iommufd;
>>> + VFIOIOASHwpt *hwpt;
>>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>>> } VFIODevice;
>>>
>>> struct VFIODeviceOps {
>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>> index 57d502a1c79a..e917e7591d05 100644
>>> --- a/include/sysemu/iommufd.h
>>> +++ b/include/sysemu/iommufd.h
>>> @@ -50,6 +50,11 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>>> uint32_t *type, void *data, uint32_t len,
>>> uint64_t *caps, Error **errp);
>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>>> + uint32_t pt_id, uint32_t flags,
>>> + uint32_t data_type, uint32_t data_len,
>>> + void *data_ptr, uint32_t *out_hwpt,
>>> + Error **errp);
>>>
>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>> #endif
>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>> index 2b3d51af26d2..5d3dfa917415 100644
>>> --- a/backends/iommufd.c
>>> +++ b/backends/iommufd.c
>>> @@ -208,6 +208,36 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>>> return ret;
>>> }
>>>
>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>>> + uint32_t pt_id, uint32_t flags,
>>> + uint32_t data_type, uint32_t data_len,
>>> + void *data_ptr, uint32_t *out_hwpt,
>>> + Error **errp)
>>> +{
>>> + int ret, fd = be->fd;
>>> + struct iommu_hwpt_alloc alloc_hwpt = {
>>> + .size = sizeof(struct iommu_hwpt_alloc),
>>> + .flags = flags,
>>> + .dev_id = dev_id,
>>> + .pt_id = pt_id,
>>> + .data_type = data_type,
>>> + .data_len = data_len,
>>> + .data_uptr = (uint64_t)data_ptr,
>>> + };
>>> +
>>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
>>> + data_len, (uint64_t)data_ptr,
>>> + alloc_hwpt.out_hwpt_id, ret);
>>> + if (ret) {
>>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>>> + return false;
>>> + }
>>> +
>>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>>> + return true;
>>> +}
>>> +
>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>>> uint32_t *type, void *data, uint32_t len,
>>> uint64_t *caps, Error **errp)
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 077dea8f1b64..325c7598d5a1 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -212,10 +212,86 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>>> return true;
>>> }
>>>
>>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>> + VFIOIOMMUFDContainer *container,
>>> + Error **errp)
>>> +{
>>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>>> + uint32_t flags = 0;
>>> + VFIOIOASHwpt *hwpt;
>>> + uint32_t hwpt_id;
>>> + int ret;
>>> +
>>> + /* Try to find a domain */
>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>>> + if (ret) {
>>> + /* -EINVAL means the domain is incompatible with the device. */
>>> + if (ret == -EINVAL) {
>>> + /*
>>> + * It is an expected failure and it just means we will try
>>> + * another domain, or create one if no existing compatible
>>> + * domain is found. Hence why the error is discarded below.
>>> + */
>>> + error_free(*errp);
>>> + *errp = NULL;
>>> + continue;
>>> + }
>>> +
>>> + return false;
>>> + } else {
>>> + vbasedev->hwpt = hwpt;
>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>> + return true;
>>> + }
>>> + }
>>> +
>>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>> + container->ioas_id, flags,
>>> + IOMMU_HWPT_DATA_NONE, 0, NULL,
>>> + &hwpt_id, errp)) {
>>> + return false;
>>> + }
>>> +
>>> + hwpt = g_malloc0(sizeof(*hwpt));
>>> + hwpt->hwpt_id = hwpt_id;
>>> + QLIST_INIT(&hwpt->device_list);
>>> +
>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>>> + if (ret) {
>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>> + g_free(hwpt);
>>> + return false;
>>> + }
>>> +
>>> + vbasedev->hwpt = hwpt;
>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>> + return true;
>>> +}
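The lookup above can be summarised with a small self-contained model: reuse
the first compatible domain, treat -EINVAL from attach as "incompatible, keep
looking", and only allocate when nothing fits. Everything here is an
assumption made for illustration: SketchDomain, SketchContainer, SketchDev,
sketch_attach() and sketch_autodomains_get() are hypothetical names, a fixed
array replaces the QLIST, and compatibility is reduced to one boolean (the
patch itself passes flags = 0 to the allocation).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DOMAINS 8

typedef struct SketchDomain {
    uint32_t id;
    bool dirty_tracking;        /* illustrative compatibility property */
} SketchDomain;

typedef struct SketchContainer {
    SketchDomain domains[SKETCH_MAX_DOMAINS];
    size_t nr_domains;
    uint32_t next_id;
} SketchContainer;

typedef struct SketchDev {
    bool wants_dirty_tracking;
    const SketchDomain *domain; /* NULL until attached */
} SketchDev;

/* Hypothetical attach: -EINVAL marks a domain the device cannot use. */
static int sketch_attach(SketchDev *dev, const SketchDomain *d)
{
    if (dev->wants_dirty_tracking != d->dirty_tracking) {
        return -EINVAL;
    }
    dev->domain = d;
    return 0;
}

static bool sketch_autodomains_get(SketchContainer *c, SketchDev *dev)
{
    /* Step 1: try to reuse an existing domain. */
    for (size_t i = 0; i < c->nr_domains; i++) {
        int ret = sketch_attach(dev, &c->domains[i]);

        if (ret == 0) {
            return true;
        }
        if (ret != -EINVAL) {
            return false;       /* unexpected failure: give up */
        }
        /* -EINVAL is expected: move on to the next candidate. */
    }

    /* Step 2: nothing compatible, allocate a new domain for this device. */
    if (c->nr_domains == SKETCH_MAX_DOMAINS) {
        return false;
    }
    SketchDomain *d = &c->domains[c->nr_domains];
    d->id = c->next_id++;
    d->dirty_tracking = dev->wants_dirty_tracking;
    if (sketch_attach(dev, d) != 0) {
        return false;           /* slot is not committed on failure */
    }
    c->nr_domains++;
    return true;
}
```

In this model two devices with the same requirement end up sharing one domain,
while a device with a different requirement gets a domain of its own.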
>>> +
>>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>>> + VFIOIOMMUFDContainer *container)
>>> +{
>>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>>> +
>>> + QLIST_REMOVE(vbasedev, hwpt_next);
>> don't you want to reset vbasedev->hwpt = NULL too?
>>
> Yeap, Thanks for catching that
>
>>
>>> + if (QLIST_EMPTY(&hwpt->device_list)) {
>>> + QLIST_REMOVE(hwpt, next);
>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>> + g_free(hwpt);
>>> + }
>>> +}
>>> +
>>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
>>> VFIOIOMMUFDContainer *container,
>>> Error **errp)
>>> {
>>> + /* mdevs aren't physical devices and will fail with auto domains */
>>> + if (!vbasedev->mdev) {
>>> + return iommufd_cdev_autodomains_get(vbasedev, container, errp);
>>> + }
>>> +
>>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
>>> }
>>>
>>> @@ -224,6 +300,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
>>> {
>>> Error *err = NULL;
>>>
>>> + if (vbasedev->hwpt) {
>>> + iommufd_cdev_autodomains_put(vbasedev, container);
>>> + return;
>> Where do we detach the device from the hwpt?
>>
> In iommufd_backend_free_id() for auto domains
>
To clarify, here I meant *userspace* auto domains.

*Kernel* auto domains (mdev) go via DETACH_IOMMUFD_PT.
>> Thanks
>>
>> Eric
>>> + }
>>> +
>>> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
>>> error_report_err(err);
>>> }
>>> @@ -354,6 +435,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>>> container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
>>> container->be = vbasedev->iommufd;
>>> container->ioas_id = ioas_id;
>>> + QLIST_INIT(&container->hwpt_list);
>>>
>>> bcontainer = &container->bcontainer;
>>> vfio_address_space_insert(space, bcontainer);
>>> diff --git a/backends/trace-events b/backends/trace-events
>>> index 211e6f374adc..4d8ac02fe7d6 100644
>>> --- a/backends/trace-events
>>> +++ b/backends/trace-events
>>> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size
>>> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>>> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
>>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
>>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
>>
>
* RE: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-16 16:46 ` Joao Martins
@ 2024-07-17 2:52 ` Duan, Zhenzhong
2024-07-17 9:09 ` Joao Martins
0 siblings, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2024-07-17 2:52 UTC (permalink / raw)
To: Joao Martins, eric.auger@redhat.com, qemu-devel@nongnu.org
Cc: Liu, Yi L, Alex Williamson, Cedric Le Goater, Jason Gunthorpe,
Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>creation
>
>On 16/07/2024 17:44, Joao Martins wrote:
>> On 16/07/2024 17:04, Eric Auger wrote:
>>> Hi Joao,
>>>
>>> On 7/12/24 13:46, Joao Martins wrote:
>>>> There are generally two modes of operation for IOMMUFD:
>>>>
>>>> * The simple user API which intends to perform relatively simple things
>>>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attaches to VFIO
>>>
>>> "It generally creates"? Can you make explicit what "it" is?
>>>
>> 'It' here refers to the process/API-user
>>
>>> I am confused by this automatic terminology again (not your fault). The doc says:
>>> "
>>> * Automatic domain - refers to an iommu domain created automatically
>>>   when attaching a device to an IOAS object. This is compatible to the
>>>   semantics of VFIO type1.
>>>
>>> * Manual domain - refers to an iommu domain designated by the user as
>>>   the target pagetable to be attached to by a device. Though currently
>>>   there are no uAPIs to directly create such domain, the datastructure
>>>   and algorithms are ready for handling that use case.
>>> "
>>>
>>>
>>> in 1) the device is attached to the ioas id (using the auto domain if I am
>not wrong)
>>> Here you attach to an hwpt id. Isn't it a manual domain?
>>>
>>
>> Correct.
>>
>> 'Auto domains' generally refers to the kernel's own automatic attaching of a
>> device to a new pagetable.
>>
>> Here I use the term 'auto domains' for the userspace version too, because we
>> are doing exactly the same thing but from userspace, using the manual API in
>> IOMMUFD.
>>
>>>> and mainly performs IOAS_MAP and UNMAP.
>>>>
>>>> * The native IOMMUFD API where you have fine grained control of the
>>>> IOMMU domain and model it accordingly. This is where most new features
>>>> are being steered.
>>>>
>>>> For dirty tracking 2) is required, as it needs to ensure that
>>>> the stage-2/parent IOMMU domain will only attach devices
>>>> that support dirty tracking (so far it is all homogeneous in x86, likely
>>>> not the case for smmuv3). Such invariant on dirty tracking provides a
>>>> useful guarantee to VMMs that will refuse incompatible device
>>>> attachments for IOMMU domains.
>>>>
>>>> The dirty tracking guarantee is enforced via HWPT_ALLOC, which is
>>>> responsible for creating an IOMMU domain. This is in contrast to the
>>>> 'simple API', where the IOMMU domain is created by IOMMUFD automatically
>>>> when it attaches to VFIO (usually referred to as autodomains), but it has
>>>> the needed handling for mdevs.
>>>>
>>>> To support dirty tracking with the advanced IOMMUFD API, it needs
>>>> similar logic, where IOMMU domains are created and devices attached to
>>>> compatible domains, essentially mimicking the kernel's
>>>> iommufd_device_auto_get_domain(). With mdevs, given there's no IOMMU
>>>> domain, it falls back to IOAS attach.
>>>>
>>>> The auto domain logic allows different IOMMU domains to be created
>when
>>>> DMA dirty tracking is not desired (and VF can provide it), and others
>where
>>>> it is. Here is not used in this way here given how VFIODevice migration
>>>
>>> Here is not used in this way here ?
>>>
>>
>> I meant, 'Here it is not used in this way given (...)'
>>
>>>> state is initialized after the device attachment. But such a mixed mode of
>>>> IOMMU dirty tracking + device dirty tracking is an improvement that can
>>>> be added on. Keep the 'all or nothing' type1 approach that we have
>>>> been using so far between container vs device dirty tracking.
>>>>
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> ---
>>>> include/hw/vfio/vfio-common.h | 9 ++++
>>>> include/sysemu/iommufd.h | 5 +++
>>>> backends/iommufd.c | 30 +++++++++++++
>>>> hw/vfio/iommufd.c | 82
>+++++++++++++++++++++++++++++++++++
>>>> backends/trace-events | 1 +
>>>> 5 files changed, 127 insertions(+)
>>>>
>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>common.h
>>>> index 7419466bca92..2dd468ce3c02 100644
>>>> --- a/include/hw/vfio/vfio-common.h
>>>> +++ b/include/hw/vfio/vfio-common.h
>>>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>>>
>>>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>>
>>>> +typedef struct VFIOIOASHwpt {
>>>> + uint32_t hwpt_id;
>>>> + QLIST_HEAD(, VFIODevice) device_list;
>>>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>>>> +} VFIOIOASHwpt;
>>>> +
>>>> typedef struct VFIOIOMMUFDContainer {
>>>> VFIOContainerBase bcontainer;
>>>> IOMMUFDBackend *be;
>>>> uint32_t ioas_id;
>>>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>>>> } VFIOIOMMUFDContainer;
>>>>
>>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>VFIO_IOMMU_IOMMUFD);
>>>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>>>> HostIOMMUDevice *hiod;
>>>> int devid;
>>>> IOMMUFDBackend *iommufd;
>>>> + VFIOIOASHwpt *hwpt;
>>>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>>>> } VFIODevice;
>>>>
>>>> struct VFIODeviceOps {
>>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>>> index 57d502a1c79a..e917e7591d05 100644
>>>> --- a/include/sysemu/iommufd.h
>>>> +++ b/include/sysemu/iommufd.h
>>>> @@ -50,6 +50,11 @@ int
>iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>uint32_t devid,
>>>> uint32_t *type, void *data, uint32_t len,
>>>> uint64_t *caps, Error **errp);
>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>dev_id,
>>>> + uint32_t pt_id, uint32_t flags,
>>>> + uint32_t data_type, uint32_t data_len,
>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>> + Error **errp);
>>>>
>>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>>> #endif
>>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>>> index 2b3d51af26d2..5d3dfa917415 100644
>>>> --- a/backends/iommufd.c
>>>> +++ b/backends/iommufd.c
>>>> @@ -208,6 +208,36 @@ int
>iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>>>> return ret;
>>>> }
>>>>
>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>dev_id,
>>>> + uint32_t pt_id, uint32_t flags,
>>>> + uint32_t data_type, uint32_t data_len,
>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>> + Error **errp)
>>>> +{
>>>> + int ret, fd = be->fd;
>>>> + struct iommu_hwpt_alloc alloc_hwpt = {
>>>> + .size = sizeof(struct iommu_hwpt_alloc),
>>>> + .flags = flags,
>>>> + .dev_id = dev_id,
>>>> + .pt_id = pt_id,
>>>> + .data_type = data_type,
>>>> + .data_len = data_len,
>>>> + .data_uptr = (uint64_t)data_ptr,
>>>> + };
>>>> +
>>>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>>>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags,
>data_type,
>>>> + data_len, (uint64_t)data_ptr,
>>>> + alloc_hwpt.out_hwpt_id, ret);
>>>> + if (ret) {
>>>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>>>> + return false;
>>>> + }
>>>> +
>>>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>>>> + return true;
>>>> +}
>>>> +
>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>uint32_t devid,
>>>> uint32_t *type, void *data, uint32_t len,
>>>> uint64_t *caps, Error **errp)
>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>> index 077dea8f1b64..325c7598d5a1 100644
>>>> --- a/hw/vfio/iommufd.c
>>>> +++ b/hw/vfio/iommufd.c
>>>> @@ -212,10 +212,86 @@ static bool
>iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>>>> return true;
>>>> }
>>>>
>>>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>> + VFIOIOMMUFDContainer *container,
>>>> + Error **errp)
>>>> +{
>>>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>>>> + uint32_t flags = 0;
>>>> + VFIOIOASHwpt *hwpt;
>>>> + uint32_t hwpt_id;
>>>> + int ret;
>>>> +
>>>> + /* Try to find a domain */
>>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>errp);
>>>> + if (ret) {
>>>> + /* -EINVAL means the domain is incompatible with the device.
>*/
>>>> + if (ret == -EINVAL) {
>>>> + /*
>>>> + * It is an expected failure and it just means we will try
>>>> + * another domain, or create one if no existing compatible
>>>> + * domain is found. Hence why the error is discarded below.
>>>> + */
>>>> + error_free(*errp);
>>>> + *errp = NULL;
>>>> + continue;
>>>> + }
>>>> +
>>>> + return false;
>>>> + } else {
>>>> + vbasedev->hwpt = hwpt;
>>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>>> + return true;
>>>> + }
>>>> + }
>>>> +
>>>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>>> + container->ioas_id, flags,
>>>> + IOMMU_HWPT_DATA_NONE, 0, NULL,
>>>> + &hwpt_id, errp)) {
>>>> + return false;
>>>> + }
>>>> +
>>>> + hwpt = g_malloc0(sizeof(*hwpt));
>>>> + hwpt->hwpt_id = hwpt_id;
>>>> + QLIST_INIT(&hwpt->device_list);
>>>> +
>>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>errp);
>>>> + if (ret) {
>>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>>> + g_free(hwpt);
>>>> + return false;
>>>> + }
>>>> +
>>>> + vbasedev->hwpt = hwpt;
>>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>>> + return true;
>>>> +}
>>>> +
>>>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>>>> + VFIOIOMMUFDContainer *container)
>>>> +{
>>>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>>>> +
>>>> + QLIST_REMOVE(vbasedev, hwpt_next);
>>> don't you want to reset vbasedev->hwpt = NULL too?
>>>
>> Yeap, Thanks for catching that
>>
>>>
>>>> + if (QLIST_EMPTY(&hwpt->device_list)) {
>>>> + QLIST_REMOVE(hwpt, next);
>>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>>> + g_free(hwpt);
>>>> + }
>>>> +}
>>>> +
>>>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
>>>> VFIOIOMMUFDContainer *container,
>>>> Error **errp)
>>>> {
>>>> + /* mdevs aren't physical devices and will fail with auto domains */
>>>> + if (!vbasedev->mdev) {
>>>> + return iommufd_cdev_autodomains_get(vbasedev, container,
>errp);
>>>> + }
>>>> +
>>>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container-
>>ioas_id, errp);
>>>> }
>>>>
>>>> @@ -224,6 +300,11 @@ static void
>iommufd_cdev_detach_container(VFIODevice *vbasedev,
>>>> {
>>>> Error *err = NULL;
>>>>
>>>> + if (vbasedev->hwpt) {
>>>> + iommufd_cdev_autodomains_put(vbasedev, container);
>>>> + return;
>>> Where do we detach the device from the hwpt?
>>>
>> In iommufd_backend_free_id() for auto domains
>>
>
>To clarify, here I meant *userspace* auto domains.
>
>*Kernel* auto domains (mdev) go via DETACH_IOMMUFD_PT.
If the device is still attached to the hwpt, will iommufd_backend_free_id() succeed?
Have you tried hot unplug?
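
To make the concern concrete, here is a toy model of the teardown ordering. It
assumes that an object which still has users cannot be destroyed, modelled as
-EBUSY; whether the kernel behaves exactly this way for an attached hwpt
should be checked against the iommufd code rather than taken from this sketch.
ToyObject and the toy_obj_*() helpers are hypothetical names, not QEMU or
kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a hwpt. */
typedef struct ToyObject {
    unsigned int users;     /* devices currently attached */
    bool destroyed;
} ToyObject;

static void toy_obj_attach(ToyObject *obj)
{
    obj->users++;
}

static int toy_obj_detach(ToyObject *obj)
{
    if (obj->users == 0) {
        return -EINVAL;     /* nothing attached */
    }
    obj->users--;
    return 0;
}

/* Assumed rule: destroying an object that is still in use fails. */
static int toy_obj_destroy(ToyObject *obj)
{
    if (obj->users > 0) {
        return -EBUSY;
    }
    obj->destroyed = true;
    return 0;
}

/* Teardown order that cannot hit the busy case: detach first, then destroy. */
static int toy_obj_unplug(ToyObject *obj)
{
    int ret = toy_obj_detach(obj);

    if (ret) {
        return ret;
    }
    /* Only the last user releases the object. */
    return obj->users ? 0 : toy_obj_destroy(obj);
}
```

Under that assumption, freeing the id while the device is still attached would
fail, and an explicit detach before the free avoids it.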
Thanks
Zhenzhong
>
>>> Thanks
>>>
>>> Eric
>>>> + }
>>>> +
>>>> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
>>>> error_report_err(err);
>>>> }
>>>> @@ -354,6 +435,7 @@ static bool iommufd_cdev_attach(const char
>*name, VFIODevice *vbasedev,
>>>> container =
>VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
>>>> container->be = vbasedev->iommufd;
>>>> container->ioas_id = ioas_id;
>>>> + QLIST_INIT(&container->hwpt_list);
>>>>
>>>> bcontainer = &container->bcontainer;
>>>> vfio_address_space_insert(space, bcontainer);
>>>> diff --git a/backends/trace-events b/backends/trace-events
>>>> index 211e6f374adc..4d8ac02fe7d6 100644
>>>> --- a/backends/trace-events
>>>> +++ b/backends/trace-events
>>>> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd,
>uint32_t ioas, uint64_t iova, uint64_t size
>>>> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas,
>uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping:
>iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>>>> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t
>iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64"
>size=0x%"PRIx64" (%d)"
>>>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) "
>iommufd=%d ioas=%d"
>>>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t
>data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u
>pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64"
>out_hwpt=%u (%d)"
>>>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) "
>iommufd=%d id=%d (%d)"
>>>
>>
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-17 2:52 ` Duan, Zhenzhong
@ 2024-07-17 9:09 ` Joao Martins
2024-07-17 9:28 ` Cédric Le Goater
2024-07-17 9:48 ` Duan, Zhenzhong
0 siblings, 2 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-17 9:09 UTC (permalink / raw)
To: Duan, Zhenzhong, eric.auger@redhat.com
Cc: Liu, Yi L, Alex Williamson, Cedric Le Goater, Jason Gunthorpe,
Avihai Horon, qemu-devel@nongnu.org
On 17/07/2024 03:52, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>> creation
>>
>> On 16/07/2024 17:44, Joao Martins wrote:
>>> On 16/07/2024 17:04, Eric Auger wrote:
>>>> Hi Joao,
>>>>
>>>> On 7/12/24 13:46, Joao Martins wrote:
>>>>> There are generally two modes of operation for IOMMUFD:
>>>>>
>>>>> * The simple user API which intends to perform relatively simple things
>>>>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attaches to VFIO
>>>>
>>>> "It generally creates"? Can you make explicit what "it" is?
>>>>
>>> 'It' here refers to the process/API-user
>>>
>>>> I am confused by this automatic terminology again (not your fault). The doc says:
>>>> "
>>>> * Automatic domain - refers to an iommu domain created automatically
>>>>   when attaching a device to an IOAS object. This is compatible to the
>>>>   semantics of VFIO type1.
>>>>
>>>> * Manual domain - refers to an iommu domain designated by the user as
>>>>   the target pagetable to be attached to by a device. Though currently
>>>>   there are no uAPIs to directly create such domain, the datastructure
>>>>   and algorithms are ready for handling that use case.
>>>> "
>>>>
>>>>
>>>> in 1) the device is attached to the ioas id (using the auto domain if I am
>> not wrong)
>>>> Here you attach to an hwpt id. Isn't it a manual domain?
>>>>
>>>
>>> Correct.
>>>
>>> 'Auto domains' generally refers to the kernel's own automatic attaching of a
>>> device to a new pagetable.
>>>
>>> Here I use the term 'auto domains' for the userspace version too, because we
>>> are doing exactly the same thing but from userspace, using the manual API in
>>> IOMMUFD.
>>>
>>>>> and mainly performs IOAS_MAP and UNMAP.
>>>>>
>>>>> * The native IOMMUFD API where you have fine grained control of the
>>>>> IOMMU domain and model it accordingly. This is where most new features
>>>>> are being steered.
>>>>>
>>>>> For dirty tracking 2) is required, as it needs to ensure that
>>>>> the stage-2/parent IOMMU domain will only attach devices
>>>>> that support dirty tracking (so far it is all homogeneous in x86, likely
>>>>> not the case for smmuv3). Such invariant on dirty tracking provides a
>>>>> useful guarantee to VMMs that will refuse incompatible device
>>>>> attachments for IOMMU domains.
>>>>>
>>>>> The dirty tracking guarantee is enforced via HWPT_ALLOC, which is
>>>>> responsible for creating an IOMMU domain. This is in contrast to the
>>>>> 'simple API', where the IOMMU domain is created by IOMMUFD automatically
>>>>> when it attaches to VFIO (usually referred to as autodomains), but it has
>>>>> the needed handling for mdevs.
>>>>>
>>>>> To support dirty tracking with the advanced IOMMUFD API, it needs
>>>>> similar logic, where IOMMU domains are created and devices attached to
>>>>> compatible domains, essentially mimicking the kernel's
>>>>> iommufd_device_auto_get_domain(). With mdevs, given there's no IOMMU
>>>>> domain, it falls back to IOAS attach.
>>>>>
>>>>> The auto domain logic allows different IOMMU domains to be created
>> when
>>>>> DMA dirty tracking is not desired (and VF can provide it), and others
>> where
>>>>> it is. Here is not used in this way here given how VFIODevice migration
>>>>
>>>> Here is not used in this way here ?
>>>>
>>>
>>> I meant, 'Here it is not used in this way given (...)'
>>>
>>>>> state is initialized after the device attachment. But such a mixed mode of
>>>>> IOMMU dirty tracking + device dirty tracking is an improvement that can
>>>>> be added on. Keep the 'all or nothing' type1 approach that we have
>>>>> been using so far between container vs device dirty tracking.
>>>>>
>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>> ---
>>>>> include/hw/vfio/vfio-common.h | 9 ++++
>>>>> include/sysemu/iommufd.h | 5 +++
>>>>> backends/iommufd.c | 30 +++++++++++++
>>>>> hw/vfio/iommufd.c | 82
>> +++++++++++++++++++++++++++++++++++
>>>>> backends/trace-events | 1 +
>>>>> 5 files changed, 127 insertions(+)
>>>>>
>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>> common.h
>>>>> index 7419466bca92..2dd468ce3c02 100644
>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>>>>
>>>>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>>>
>>>>> +typedef struct VFIOIOASHwpt {
>>>>> + uint32_t hwpt_id;
>>>>> + QLIST_HEAD(, VFIODevice) device_list;
>>>>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>>>>> +} VFIOIOASHwpt;
>>>>> +
>>>>> typedef struct VFIOIOMMUFDContainer {
>>>>> VFIOContainerBase bcontainer;
>>>>> IOMMUFDBackend *be;
>>>>> uint32_t ioas_id;
>>>>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>>>>> } VFIOIOMMUFDContainer;
>>>>>
>>>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>> VFIO_IOMMU_IOMMUFD);
>>>>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>>>>> HostIOMMUDevice *hiod;
>>>>> int devid;
>>>>> IOMMUFDBackend *iommufd;
>>>>> + VFIOIOASHwpt *hwpt;
>>>>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>>>>> } VFIODevice;
>>>>>
>>>>> struct VFIODeviceOps {
>>>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>>>> index 57d502a1c79a..e917e7591d05 100644
>>>>> --- a/include/sysemu/iommufd.h
>>>>> +++ b/include/sysemu/iommufd.h
>>>>> @@ -50,6 +50,11 @@ int
>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>> uint32_t devid,
>>>>> uint32_t *type, void *data, uint32_t len,
>>>>> uint64_t *caps, Error **errp);
>>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>> dev_id,
>>>>> + uint32_t pt_id, uint32_t flags,
>>>>> + uint32_t data_type, uint32_t data_len,
>>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>>> + Error **errp);
>>>>>
>>>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>> TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>>>> #endif
>>>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>>>> index 2b3d51af26d2..5d3dfa917415 100644
>>>>> --- a/backends/iommufd.c
>>>>> +++ b/backends/iommufd.c
>>>>> @@ -208,6 +208,36 @@ int
>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>>>>> return ret;
>>>>> }
>>>>>
>>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>> dev_id,
>>>>> + uint32_t pt_id, uint32_t flags,
>>>>> + uint32_t data_type, uint32_t data_len,
>>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>>> + Error **errp)
>>>>> +{
>>>>> + int ret, fd = be->fd;
>>>>> + struct iommu_hwpt_alloc alloc_hwpt = {
>>>>> + .size = sizeof(struct iommu_hwpt_alloc),
>>>>> + .flags = flags,
>>>>> + .dev_id = dev_id,
>>>>> + .pt_id = pt_id,
>>>>> + .data_type = data_type,
>>>>> + .data_len = data_len,
>>>>> + .data_uptr = (uint64_t)data_ptr,
>>>>> + };
>>>>> +
>>>>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>>>>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags,
>> data_type,
>>>>> + data_len, (uint64_t)data_ptr,
>>>>> + alloc_hwpt.out_hwpt_id, ret);
>>>>> + if (ret) {
>>>>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>>>>> + return false;
>>>>> + }
>>>>> +
>>>>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>>>>> + return true;
>>>>> +}
>>>>> +
>>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>> uint32_t devid,
>>>>> uint32_t *type, void *data, uint32_t len,
>>>>> uint64_t *caps, Error **errp)
>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>> index 077dea8f1b64..325c7598d5a1 100644
>>>>> --- a/hw/vfio/iommufd.c
>>>>> +++ b/hw/vfio/iommufd.c
>>>>> @@ -212,10 +212,86 @@ static bool
>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>>>>> return true;
>>>>> }
>>>>>
>>>>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>>> + VFIOIOMMUFDContainer *container,
>>>>> + Error **errp)
>>>>> +{
>>>>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>>>>> + uint32_t flags = 0;
>>>>> + VFIOIOASHwpt *hwpt;
>>>>> + uint32_t hwpt_id;
>>>>> + int ret;
>>>>> +
>>>>> + /* Try to find a domain */
>>>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>> errp);
>>>>> + if (ret) {
>>>>> + /* -EINVAL means the domain is incompatible with the device.
>> */
>>>>> + if (ret == -EINVAL) {
>>>>> + /*
>>>>> + * It is an expected failure and it just means we will try
>>>>> + * another domain, or create one if no existing compatible
>>>>> + * domain is found. Hence why the error is discarded below.
>>>>> + */
>>>>> + error_free(*errp);
>>>>> + *errp = NULL;
>>>>> + continue;
>>>>> + }
>>>>> +
>>>>> + return false;
>>>>> + } else {
>>>>> + vbasedev->hwpt = hwpt;
>>>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>>>> + return true;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>>>> + container->ioas_id, flags,
>>>>> + IOMMU_HWPT_DATA_NONE, 0, NULL,
>>>>> + &hwpt_id, errp)) {
>>>>> + return false;
>>>>> + }
>>>>> +
>>>>> + hwpt = g_malloc0(sizeof(*hwpt));
>>>>> + hwpt->hwpt_id = hwpt_id;
>>>>> + QLIST_INIT(&hwpt->device_list);
>>>>> +
>>>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>> errp);
>>>>> + if (ret) {
>>>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>>>> + g_free(hwpt);
>>>>> + return false;
>>>>> + }
>>>>> +
>>>>> + vbasedev->hwpt = hwpt;
>>>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>>>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>>>> + return true;
>>>>> +}
>>>>> +
>>>>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>>>>> + VFIOIOMMUFDContainer *container)
>>>>> +{
>>>>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>>>>> +
>>>>> + QLIST_REMOVE(vbasedev, hwpt_next);
>>>> don't you want to reset vbasedev->hwpt = NULL too?
>>>>
>>> Yeap, Thanks for catching that
>>>
>>>>
>>>>> + if (QLIST_EMPTY(&hwpt->device_list)) {
>>>>> + QLIST_REMOVE(hwpt, next);
>>>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>>>> + g_free(hwpt);
>>>>> + }
>>>>> +}
>>>>> +
>>>>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
>>>>> VFIOIOMMUFDContainer *container,
>>>>> Error **errp)
>>>>> {
>>>>> + /* mdevs aren't physical devices and will fail with auto domains */
>>>>> + if (!vbasedev->mdev) {
>>>>> + return iommufd_cdev_autodomains_get(vbasedev, container,
>> errp);
>>>>> + }
>>>>> +
>>>>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container-
>>> ioas_id, errp);
>>>>> }
>>>>>
>>>>> @@ -224,6 +300,11 @@ static void
>> iommufd_cdev_detach_container(VFIODevice *vbasedev,
>>>>> {
>>>>> Error *err = NULL;
>>>>>
>>>>> + if (vbasedev->hwpt) {
>>>>> + iommufd_cdev_autodomains_put(vbasedev, container);
>>>>> + return;
>>>> Where do we detach the device from the hwpt?
>>>>
>>> In iommufd_backend_free_id() for auto domains
>>>
>>
>> to clarify here I meant *userspace* auto domains
>>
>> *kernel* auto domains (mdev) goes via DETACH_IOMMUFD_PT
>
> If the device is still attached to the hwpt, will iommufd_backend_free_id() succeed?
> Have you tried the hot unplug?
>
I have, but I didn't see any errors. I will check again for v5, as it could
also be my oversight.
I was thinking about Eric's remark overnight and I think what I am doing is not
correct regardless of the above.
I should be calling DETACH_IOMMUFD_PT to pair with ATTACH_IOMMUFD_PT, while
iommufd_backend_free_id() drops the final reference, pairing with
alloc_hwpt(), once the device list is empty, i.e. when there are no more
devices in that vdev::hwpt.
DETACH_IOMMUFD_PT decrements the hwpt refcount and it doesn't differentiate
between auto domains and manual domains.
The code is already there anyhow; it just has the order of the
iommufd_cdev_autodomains_put and detach invocations reversed. I'll fix that
for the next version.
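To make the intended pairing concrete, here is a toy model of the reference
accounting described above. It is only a sketch: the Toy* types and toy_*
functions are hypothetical stand-ins, not QEMU or kernel code, and it assumes
allocation holds one reference, each attach adds one, each detach drops one,
and the free on an empty device list drops the allocation reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hwpt; not the QEMU VFIOIOASHwpt type. */
typedef struct ToyHwpt {
    int refs;      /* 1 from allocation, plus 1 per attached device */
    int ndevices;  /* length of the userspace-side device list */
} ToyHwpt;

/* Hypothetical stand-in for a device; not the QEMU VFIODevice type. */
typedef struct ToyDevice {
    ToyHwpt *hwpt;  /* userspace bookkeeping pointer */
    bool attached;  /* models the kernel-side attachment */
} ToyDevice;

/* Models alloc_hwpt(): the allocation itself holds one reference. */
static void toy_alloc_hwpt(ToyHwpt *h)
{
    h->refs = 1;
    h->ndevices = 0;
}

/* Models ATTACH_IOMMUFD_PT plus insertion into the device list. */
static void toy_attach(ToyDevice *d, ToyHwpt *h)
{
    h->refs++;
    h->ndevices++;
    d->hwpt = h;
    d->attached = true;
}

/* Models DETACH_IOMMUFD_PT: drops the reference taken at attach time. */
static void toy_detach(ToyDevice *d)
{
    if (d->attached) {
        d->hwpt->refs--;
        d->attached = false;
    }
}

/* Models the put: unlink the device, free the hwpt id on the last user. */
static void toy_autodomains_put(ToyDevice *d)
{
    ToyHwpt *h = d->hwpt;

    d->hwpt = NULL;
    if (--h->ndevices == 0) {
        h->refs--;  /* stands in for iommufd_backend_free_id() */
    }
}

/* Intended order: detach first, then put. */
static void toy_detach_container(ToyDevice *d)
{
    toy_detach(d);
    if (d->hwpt) {
        toy_autodomains_put(d);
    }
}
```

With detach followed by put, the count reaches zero when the last device
leaves. Calling only the put, as v4 does, leaves the attach reference in place
in this model, which is the imbalance to fix.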
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-17 9:09 ` Joao Martins
@ 2024-07-17 9:28 ` Cédric Le Goater
2024-07-17 9:31 ` Joao Martins
2024-07-17 9:48 ` Duan, Zhenzhong
1 sibling, 1 reply; 82+ messages in thread
From: Cédric Le Goater @ 2024-07-17 9:28 UTC (permalink / raw)
To: Joao Martins, Duan, Zhenzhong, eric.auger@redhat.com
Cc: Liu, Yi L, Alex Williamson, Jason Gunthorpe, Avihai Horon,
qemu-devel@nongnu.org
On 7/17/24 11:09, Joao Martins wrote:
> On 17/07/2024 03:52, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>>> creation
>>>
>>> On 16/07/2024 17:44, Joao Martins wrote:
>>>> On 16/07/2024 17:04, Eric Auger wrote:
>>>>> Hi Joao,
>>>>>
>>>>> On 7/12/24 13:46, Joao Martins wrote:
>>>>>> There's generally two modes of operation for IOMMUFD:
>>>>>>
>>>>>> * The simple user API which intends to perform relatively simple things
>>>>>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>>>>>
>>>>> It generally creates? can you explicit what is "it"
>>>>>
>>>> 'It' here refers to the process/API-user
>>>>
>>>>> I am confused by this automatic terminology again (not your fault). the
>>> doc says:
>>>>> "
>>>>>
>>>>> *
>>>>>
>>>>> Automatic domain - refers to an iommu domain created automatically
>>>>> when attaching a device to an IOAS object. This is compatible to the
>>>>> semantics of VFIO type1.
>>>>>
>>>>> *
>>>>>
>>>>> Manual domain - refers to an iommu domain designated by the user as
>>>>> the target pagetable to be attached to by a device. Though currently
>>>>> there are no uAPIs to directly create such domain, the datastructure
>>>>> and algorithms are ready for handling that use case.
>>>>>
>>>>> "
>>>>>
>>>>>
>>>>> in 1) the device is attached to the ioas id (using the auto domain if I am
>>> not wrong)
>>>>> Here you attach to an hwpt id. Isn't it a manual domain?
>>>>>
>>>>
>>>> Correct.
>>>>
>>>> The 'auto domains' generally refers to the kernel-equivalent own
>>> automatic
>>>> attaching to a new pagetable.
>>>>
>>>> Here I call 'auto domains' in the userspace version too because we are
>>> doing the
>>>> exact same but from userspace, using the manual API in IOMMUFD.
>>>>
>>>>>> and mainly performs IOAS_MAP and UNMAP.
>>>>>>
>>>>>> * The native IOMMUFD API where you have fine grained control of the
>>>>>> IOMMU domain and model it accordingly. This is where most new
>>> feature
>>>>>> are being steered to.
>>>>>>
>>>>>> For dirty tracking 2) is required, as it needs to ensure that
>>>>>> the stage-2/parent IOMMU domain will only attach devices
>>>>>> that support dirty tracking (so far it is all homogeneous in x86, likely
>>>>>> not the case for smmuv3). Such invariant on dirty tracking provides a
>>>>>> useful guarantee to VMMs that will refuse incompatible device
>>>>>> attachments for IOMMU domains.
>>>>>>
>>>>>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>>>>>> responsible for creating an IOMMU domain. This is contrast to the
>>>>>> 'simple API' where the IOMMU domain is created by IOMMUFD
>>> automatically
>>>>>> when it attaches to VFIO (usually referred as autodomains) but it has
>>>>>> the needed handling for mdevs.
>>>>>>
>>>>>> To support dirty tracking with the advanced IOMMUFD API, it needs
>>>>>> similar logic, where IOMMU domains are created and devices attached
>>> to
>>>>>> compatible domains. Essentially mimmicing kernel
>>>>>> iommufd_device_auto_get_domain(). With mdevs given there's no
>>> IOMMU domain
>>>>>> it falls back to IOAS attach.
>>>>>>
>>>>>> The auto domain logic allows different IOMMU domains to be created
>>> when
>>>>>> DMA dirty tracking is not desired (and VF can provide it), and others
>>> where
>>>>>> it is. Here is not used in this way here given how VFIODevice migration
>>>>>
>>>>> Here is not used in this way here ?
>>>>>
>>>>
>>>> I meant, 'Here it is not used in this way given (...)'
>>>>
>>>>>> state is initialized after the device attachment. But such mixed mode of
>>>>>> IOMMU dirty tracking + device dirty tracking is an improvement that
>>> can
>>>>>> be added on. Keep the 'all of nothing' of type1 approach that we have
>>>>>> been using so far between container vs device dirty tracking.
>>>>>>
>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>> ---
>>>>>> include/hw/vfio/vfio-common.h | 9 ++++
>>>>>> include/sysemu/iommufd.h | 5 +++
>>>>>> backends/iommufd.c | 30 +++++++++++++
>>>>>> hw/vfio/iommufd.c | 82
>>> +++++++++++++++++++++++++++++++++++
>>>>>> backends/trace-events | 1 +
>>>>>> 5 files changed, 127 insertions(+)
>>>>>>
>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>>> common.h
>>>>>> index 7419466bca92..2dd468ce3c02 100644
>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>>>>>
>>>>>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>>>>
>>>>>> +typedef struct VFIOIOASHwpt {
>>>>>> + uint32_t hwpt_id;
>>>>>> + QLIST_HEAD(, VFIODevice) device_list;
>>>>>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>>>>>> +} VFIOIOASHwpt;
>>>>>> +
>>>>>> typedef struct VFIOIOMMUFDContainer {
>>>>>> VFIOContainerBase bcontainer;
>>>>>> IOMMUFDBackend *be;
>>>>>> uint32_t ioas_id;
>>>>>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>>>>>> } VFIOIOMMUFDContainer;
>>>>>>
>>>>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>>> VFIO_IOMMU_IOMMUFD);
>>>>>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>>>>>> HostIOMMUDevice *hiod;
>>>>>> int devid;
>>>>>> IOMMUFDBackend *iommufd;
>>>>>> + VFIOIOASHwpt *hwpt;
>>>>>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>>>>>> } VFIODevice;
>>>>>>
>>>>>> struct VFIODeviceOps {
>>>>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>>>>> index 57d502a1c79a..e917e7591d05 100644
>>>>>> --- a/include/sysemu/iommufd.h
>>>>>> +++ b/include/sysemu/iommufd.h
>>>>>> @@ -50,6 +50,11 @@ int
>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>>>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>>> uint32_t devid,
>>>>>> uint32_t *type, void *data, uint32_t len,
>>>>>> uint64_t *caps, Error **errp);
>>>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>>> dev_id,
>>>>>> + uint32_t pt_id, uint32_t flags,
>>>>>> + uint32_t data_type, uint32_t data_len,
>>>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>>>> + Error **errp);
>>>>>>
>>>>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>>> TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>>>>> #endif
>>>>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>>>>> index 2b3d51af26d2..5d3dfa917415 100644
>>>>>> --- a/backends/iommufd.c
>>>>>> +++ b/backends/iommufd.c
>>>>>> @@ -208,6 +208,36 @@ int
>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>>>>>> return ret;
>>>>>> }
>>>>>>
>>>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>>> dev_id,
>>>>>> + uint32_t pt_id, uint32_t flags,
>>>>>> + uint32_t data_type, uint32_t data_len,
>>>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>>>> + Error **errp)
>>>>>> +{
>>>>>> + int ret, fd = be->fd;
>>>>>> + struct iommu_hwpt_alloc alloc_hwpt = {
>>>>>> + .size = sizeof(struct iommu_hwpt_alloc),
>>>>>> + .flags = flags,
>>>>>> + .dev_id = dev_id,
>>>>>> + .pt_id = pt_id,
>>>>>> + .data_type = data_type,
>>>>>> + .data_len = data_len,
>>>>>> + .data_uptr = (uint64_t)data_ptr,
>>>>>> + };
>>>>>> +
>>>>>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>>>>>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags,
>>> data_type,
>>>>>> + data_len, (uint64_t)data_ptr,
>>>>>> + alloc_hwpt.out_hwpt_id, ret);
>>>>>> + if (ret) {
>>>>>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>>>>>> + return false;
>>>>>> + }
>>>>>> +
>>>>>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>>>>>> + return true;
>>>>>> +}
>>>>>> +
>>>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>>> uint32_t devid,
>>>>>> uint32_t *type, void *data, uint32_t len,
>>>>>> uint64_t *caps, Error **errp)
>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>> index 077dea8f1b64..325c7598d5a1 100644
>>>>>> --- a/hw/vfio/iommufd.c
>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>> @@ -212,10 +212,86 @@ static bool
>>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>>>>>> return true;
>>>>>> }
>>>>>>
>>>>>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>>>> + VFIOIOMMUFDContainer *container,
>>>>>> + Error **errp)
>>>>>> +{
>>>>>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>>>>>> + uint32_t flags = 0;
>>>>>> + VFIOIOASHwpt *hwpt;
>>>>>> + uint32_t hwpt_id;
>>>>>> + int ret;
>>>>>> +
>>>>>> + /* Try to find a domain */
>>>>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>>>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>>> errp);
>>>>>> + if (ret) {
>>>>>> + /* -EINVAL means the domain is incompatible with the device.
>>> */
>>>>>> + if (ret == -EINVAL) {
>>>>>> + /*
>>>>>> + * It is an expected failure and it just means we will try
>>>>>> + * another domain, or create one if no existing compatible
>>>>>> + * domain is found. Hence why the error is discarded below.
>>>>>> + */
>>>>>> + error_free(*errp);
>>>>>> + *errp = NULL;
>>>>>> + continue;
>>>>>> + }
>>>>>> +
>>>>>> + return false;
>>>>>> + } else {
>>>>>> + vbasedev->hwpt = hwpt;
>>>>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>>>>> + return true;
>>>>>> + }
>>>>>> + }
>>>>>> +
>>>>>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>>>>> + container->ioas_id, flags,
>>>>>> + IOMMU_HWPT_DATA_NONE, 0, NULL,
>>>>>> + &hwpt_id, errp)) {
>>>>>> + return false;
>>>>>> + }
>>>>>> +
>>>>>> + hwpt = g_malloc0(sizeof(*hwpt));
>>>>>> + hwpt->hwpt_id = hwpt_id;
>>>>>> + QLIST_INIT(&hwpt->device_list);
>>>>>> +
>>>>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>>> errp);
>>>>>> + if (ret) {
>>>>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>>>>> + g_free(hwpt);
>>>>>> + return false;
>>>>>> + }
>>>>>> +
>>>>>> + vbasedev->hwpt = hwpt;
>>>>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>>>>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>>>>> + return true;
>>>>>> +}
>>>>>> +
>>>>>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>>>>>> + VFIOIOMMUFDContainer *container)
>>>>>> +{
>>>>>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>>>>>> +
>>>>>> + QLIST_REMOVE(vbasedev, hwpt_next);
>>>>> don't you want to reset vbasedev->hwpt = NULL too?
>>>>>
>>>> Yeap, Thanks for catching that
>>>>
>>>>>
>>>>>> + if (QLIST_EMPTY(&hwpt->device_list)) {
>>>>>> + QLIST_REMOVE(hwpt, next);
>>>>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>>>>> + g_free(hwpt);
>>>>>> + }
>>>>>> +}
>>>>>> +
>>>>>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
>>>>>> VFIOIOMMUFDContainer *container,
>>>>>> Error **errp)
>>>>>> {
>>>>>> + /* mdevs aren't physical devices and will fail with auto domains */
>>>>>> + if (!vbasedev->mdev) {
>>>>>> + return iommufd_cdev_autodomains_get(vbasedev, container,
>>> errp);
>>>>>> + }
>>>>>> +
>>>>>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container-
>>>> ioas_id, errp);
>>>>>> }
>>>>>>
>>>>>> @@ -224,6 +300,11 @@ static void
>>> iommufd_cdev_detach_container(VFIODevice *vbasedev,
>>>>>> {
>>>>>> Error *err = NULL;
>>>>>>
>>>>>> + if (vbasedev->hwpt) {
>>>>>> + iommufd_cdev_autodomains_put(vbasedev, container);
>>>>>> + return;
>>>>> Where do we detach the device from the hwpt?
>>>>>
>>>> In iommufd_backend_free_id() for auto domains
>>>>
>>>
>>> to clarify here I meant *userspace* auto domains
>>>
>>> *kernel* auto domains (mdev) goes via DETACH_IOMMUFD_PT
>>
>> If the device is still attached to the hwpt, will iommufd_backend_free_id() succeed?
>> Have you tried the hot unplug?
>>
>
> I have but I didn't see any errors. But I will check again for v5 as it could
> also be my oversight.
>
> I was thinking about Eric's remark overnight and I think what I am doing is not
> correct regardless of the above.
>
> I should be calling DETACH_IOMMUFD_PT pairing with ATTACH_IOMMUFD_PT, and the
> iommufd_backend_free_id() is to drop the final reference pairing with
> alloc_hwpt() when the device list is empty i.e. when there's no more devices in
> that vdev::hwpt.
>
> DETACH_IOMMUFD_PT decrement the hwpt refcount and it doesn't differentiate
> between auto domains vs manual domains.
>
> The code is already there anyhow it just has the order of
> iommufd_cdev_autodomains_put vs detach invocation reversed; I'll fix that for
> next version.
While at it, could you please move these routines :
iommufd_cdev_detach_ioas_hwpt
iommufd_cdev_attach_ioas_hwpt
under backends/iommufd.c ? I think that's where they belong.
Thanks,
C.
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-17 9:28 ` Cédric Le Goater
@ 2024-07-17 9:31 ` Joao Martins
2024-07-18 13:47 ` Joao Martins
0 siblings, 1 reply; 82+ messages in thread
From: Joao Martins @ 2024-07-17 9:31 UTC (permalink / raw)
To: Cédric Le Goater, Duan, Zhenzhong, eric.auger@redhat.com
Cc: Liu, Yi L, Alex Williamson, Jason Gunthorpe, Avihai Horon,
qemu-devel@nongnu.org
On 17/07/2024 10:28, Cédric Le Goater wrote:
>>>>>>> @@ -224,6 +300,11 @@ static void
>>>> iommufd_cdev_detach_container(VFIODevice *vbasedev,
>>>>>>> {
>>>>>>> Error *err = NULL;
>>>>>>>
>>>>>>> + if (vbasedev->hwpt) {
>>>>>>> + iommufd_cdev_autodomains_put(vbasedev, container);
>>>>>>> + return;
>>>>>> Where do we detach the device from the hwpt?
>>>>>>
>>>>> In iommufd_backend_free_id() for auto domains
>>>>>
>>>>
>>>> to clarify here I meant *userspace* auto domains
>>>>
>>>> *kernel* auto domains (mdev) goes via DETACH_IOMMUFD_PT
>>>
>>> If the device is still attached to the hwpt, will iommufd_backend_free_id()
>>> succeed?
>>> Have you tried the hot unplug?
>>>
>>
>> I have but I didn't see any errors. But I will check again for v5 as it could
>> also be my oversight.
>>
>> I was thinking about Eric's remark overnight and I think what I am doing is not
>> correct regardless of the above.
>>
>> I should be calling DETACH_IOMMUFD_PT pairing with ATTACH_IOMMUFD_PT, and the
>> iommufd_backend_free_id() is to drop the final reference pairing with
>> alloc_hwpt() when the device list is empty i.e. when there's no more devices in
>> that vdev::hwpt.
>>
>> DETACH_IOMMUFD_PT decrement the hwpt refcount and it doesn't differentiate
>> between auto domains vs manual domains.
>>
>> The code is already there anyhow it just has the order of
>> iommufd_cdev_autodomains_put vs detach invocation reversed; I'll fix that for
>> next version.
>
> While at it, could you please move these routines :
>
> iommufd_cdev_detach_ioas_hwpt
> iommufd_cdev_attach_ioas_hwpt
>
> under backends/iommufd.c ? I think that's where they belong.
OK
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-17 9:31 ` Joao Martins
@ 2024-07-18 13:47 ` Joao Martins
2024-07-19 6:06 ` Cédric Le Goater
0 siblings, 1 reply; 82+ messages in thread
From: Joao Martins @ 2024-07-18 13:47 UTC (permalink / raw)
To: Cédric Le Goater, Duan, Zhenzhong, eric.auger@redhat.com
Cc: Liu, Yi L, Alex Williamson, Jason Gunthorpe, Avihai Horon,
qemu-devel@nongnu.org
On 17/07/2024 10:31, Joao Martins wrote:
> On 17/07/2024 10:28, Cédric Le Goater wrote:
>>>>>>>> @@ -224,6 +300,11 @@ static void
>>>>> iommufd_cdev_detach_container(VFIODevice *vbasedev,
>>>>>>>> {
>>>>>>>> Error *err = NULL;
>>>>>>>>
>>>>>>>> + if (vbasedev->hwpt) {
>>>>>>>> + iommufd_cdev_autodomains_put(vbasedev, container);
>>>>>>>> + return;
>>>>>>> Where do we detach the device from the hwpt?
>>>>>>>
>>>>>> In iommufd_backend_free_id() for auto domains
>>>>>>
>>>>>
>>>>> to clarify here I meant *userspace* auto domains
>>>>>
>>>>> *kernel* auto domains (mdev) goes via DETACH_IOMMUFD_PT
>>>>
>>>> If the device is still attached to the hwpt, will iommufd_backend_free_id()
>>>> succeed?
>>>> Have you tried the hot unplug?
>>>>
>>>
>>> I have but I didn't see any errors. But I will check again for v5 as it could
>>> also be my oversight.
>>>
>>> I was thinking about Eric's remark overnight and I think what I am doing is not
>>> correct regardless of the above.
>>>
>>> I should be calling DETACH_IOMMUFD_PT pairing with ATTACH_IOMMUFD_PT, and the
>>> iommufd_backend_free_id() is to drop the final reference pairing with
>>> alloc_hwpt() when the device list is empty i.e. when there's no more devices in
>>> that vdev::hwpt.
>>>
>>> DETACH_IOMMUFD_PT decrement the hwpt refcount and it doesn't differentiate
>>> between auto domains vs manual domains.
>>>
>>> The code is already there anyhow it just has the order of
>>> iommufd_cdev_autodomains_put vs detach invocation reversed; I'll fix that for
>>> next version.
>>
>> While at it, could you please move these routines :
>>
>> iommufd_cdev_detach_ioas_hwpt
>> iommufd_cdev_attach_ioas_hwpt
>>
>> under backends/iommufd.c ? I think that's where they belong.
>
> OK
At first glance I thought this was a good idea. But while these functions
attach to an IOMMUFD, they do not really talk to an IOMMUFD backend but to a
VFIO device file descriptor. Now I think they are in the right place here, and
we would leave IOMMUFD uAPI things to backends/iommufd and VFIO APIs in hw/vfio/.
Moving them also drags in a lot of VFIODevice* usage, which requires some funny
includes in sysemu/iommufd.h.
Do you still want me to go ahead with it? Here's a snippet of the change
involved:
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 2b3d51af26d2..19d1e430ef48 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -20,6 +20,7 @@
#include "trace.h"
#include <sys/ioctl.h>
#include <linux/iommufd.h>
+#include <linux/vfio.h>
static void iommufd_backend_init(Object *obj)
{
@@ -232,6 +233,46 @@ bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
return true;
}
+int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
+ Error **errp)
+{
+ int iommufd = vbasedev->iommufd->fd;
+ struct vfio_device_attach_iommufd_pt attach_data = {
+ .argsz = sizeof(attach_data),
+ .flags = 0,
+ .pt_id = id,
+ };
+
+ /* Attach device to an IOAS or hwpt within iommufd */
+ if (ioctl(vbasedev->fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data)) {
+ error_setg_errno(errp, errno,
+ "[iommufd=%d] error attach %s (%d) to id=%d",
+ iommufd, vbasedev->name, vbasedev->fd, id);
+ return -errno;
+ }
+
+ trace_iommufd_cdev_attach_ioas_hwpt(iommufd, vbasedev->name,
+ vbasedev->fd, id);
+ return 0;
+}
+
+bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
+{
+ int iommufd = vbasedev->iommufd->fd;
+ struct vfio_device_detach_iommufd_pt detach_data = {
+ .argsz = sizeof(detach_data),
+ .flags = 0,
+ };
+
+ if (ioctl(vbasedev->fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_data)) {
+ error_setg_errno(errp, errno, "detach %s failed", vbasedev->name);
+ return false;
+ }
+
+ trace_iommufd_cdev_detach_ioas_hwpt(iommufd, vbasedev->name);
+ return true;
+}
+
static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
{
HostIOMMUDeviceCaps *caps = &hiod->caps;
diff --git a/backends/trace-events b/backends/trace-events
index 211e6f374adc..2fee8e0af20e 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -15,3 +15,5 @@ iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, u
iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
+iommufd_cdev_attach_ioas_hwpt(int iommufd, const char *name, int devfd, int id) " [iommufd=%d] Successfully attached device %s (%d) to id=%d"
+iommufd_cdev_detach_ioas_hwpt(int iommufd, const char *name) " [iommufd=%d] Successfully detached %s"
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 077dea8f1b64..5a6d56c915e2 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -172,46 +172,6 @@ out:
return ret;
}
-static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
- Error **errp)
-{
- int iommufd = vbasedev->iommufd->fd;
- struct vfio_device_attach_iommufd_pt attach_data = {
- .argsz = sizeof(attach_data),
- .flags = 0,
- .pt_id = id,
- };
-
- /* Attach device to an IOAS or hwpt within iommufd */
- if (ioctl(vbasedev->fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data)) {
- error_setg_errno(errp, errno,
- "[iommufd=%d] error attach %s (%d) to id=%d",
- iommufd, vbasedev->name, vbasedev->fd, id);
- return -errno;
- }
-
- trace_iommufd_cdev_attach_ioas_hwpt(iommufd, vbasedev->name,
- vbasedev->fd, id);
- return 0;
-}
-
-static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
-{
- int iommufd = vbasedev->iommufd->fd;
- struct vfio_device_detach_iommufd_pt detach_data = {
- .argsz = sizeof(detach_data),
- .flags = 0,
- };
-
- if (ioctl(vbasedev->fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_data)) {
- error_setg_errno(errp, errno, "detach %s failed", vbasedev->name);
- return false;
- }
-
- trace_iommufd_cdev_detach_ioas_hwpt(iommufd, vbasedev->name);
- return true;
-}
-
static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
VFIOIOMMUFDContainer *container,
Error **errp)
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index e16179b507ed..24fde6270112 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -170,8 +170,6 @@ vfio_vmstate_change_prepare(const char *name, int running, const char *reason, c
iommufd_cdev_connect_and_bind(int iommufd, const char *name, int devfd, int devid) " [iommufd=%d] Successfully bound device %s (fd=%d): output devid=%d"
iommufd_cdev_getfd(const char *dev, int devfd) " %s (fd=%d)"
-iommufd_cdev_attach_ioas_hwpt(int iommufd, const char *name, int devfd, int id) " [iommufd=%d] Successfully attached device %s (%d) to id=%d"
-iommufd_cdev_detach_ioas_hwpt(int iommufd, const char *name) " [iommufd=%d] Successfully detached %s"
iommufd_cdev_fail_attach_existing_container(const char *msg) " %s"
iommufd_cdev_alloc_ioas(int iommufd, int ioas_id) " [iommufd=%d] new IOMMUFD container with ioasid=%d"
iommufd_cdev_device_info(char *name, int devfd, int num_irqs, int num_regions, int flags) " %s (%d) num_irqs=%d num_regions=%d flags=%d"
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 57d502a1c79a..89780669118f 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -18,6 +18,8 @@
#include "exec/hwaddr.h"
#include "exec/cpu-common.h"
#include "sysemu/host_iommu_device.h"
+#include "hw/vfio/vfio-common.h"
+#include "hw/vfio/vfio-container-base.h"
#define TYPE_IOMMUFD_BACKEND "iommufd"
OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass, IOMMUFD_BACKEND)
@@ -51,5 +53,9 @@ bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp);
+bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp);
+int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
+ Error **errp);
+
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
#endif
^ permalink raw reply related [flat|nested] 82+ messages in thread
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-18 13:47 ` Joao Martins
@ 2024-07-19 6:06 ` Cédric Le Goater
0 siblings, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2024-07-19 6:06 UTC (permalink / raw)
To: Joao Martins, Duan, Zhenzhong, eric.auger@redhat.com
Cc: Liu, Yi L, Alex Williamson, Jason Gunthorpe, Avihai Horon,
qemu-devel@nongnu.org
On 7/18/24 15:47, Joao Martins wrote:
> On 17/07/2024 10:31, Joao Martins wrote:
>> On 17/07/2024 10:28, Cédric Le Goater wrote:
>>>>>>>>> @@ -224,6 +300,11 @@ static void
>>>>>> iommufd_cdev_detach_container(VFIODevice *vbasedev,
>>>>>>>>> {
>>>>>>>>> Error *err = NULL;
>>>>>>>>>
>>>>>>>>> + if (vbasedev->hwpt) {
>>>>>>>>> + iommufd_cdev_autodomains_put(vbasedev, container);
>>>>>>>>> + return;
>>>>>>>> Where do we detach the device from the hwpt?
>>>>>>>>
>>>>>>> In iommufd_backend_free_id() for auto domains
>>>>>>>
>>>>>>
>>>>>> to clarify here I meant *userspace* auto domains
>>>>>>
>>>>>> *kernel* auto domains (mdev) goes via DETACH_IOMMUFD_PT
>>>>>
>>>>> If the device is still attached to the hwpt, will iommufd_backend_free_id()
>>>>> succeed?
>>>>> Have you tried the hot unplug?
>>>>>
>>>>
>>>> I have but I didn't see any errors. But I will check again for v5 as it could
>>>> also be my oversight.
>>>>
>>>> I was thinking about Eric's remark overnight and I think what I am doing is not
>>>> correct regardless of the above.
>>>>
>>>> I should be calling DETACH_IOMMUFD_PT pairing with ATTACH_IOMMUFD_PT, and the
>>>> iommufd_backend_free_id() is to drop the final reference pairing with
>>>> alloc_hwpt() when the device list is empty i.e. when there's no more devices in
>>>> that vdev::hwpt.
>>>>
>>>> DETACH_IOMMUFD_PT decrement the hwpt refcount and it doesn't differentiate
>>>> between auto domains vs manual domains.
>>>>
>>>> The code is already there anyhow it just has the order of
>>>> iommufd_cdev_autodomains_put vs detach invocation reversed; I'll fix that for
>>>> next version.
>>>
>>> While at it, could you please move these routines :
>>>
>>> iommufd_cdev_detach_ioas_hwpt
>>> iommufd_cdev_attach_ioas_hwpt
>>>
>>> under backends/iommufd.c ? I think that's where they belong.
>>
>> OK
>
> At the first glance I thought this was a good idea. But these functions while
> they attach an IOMMUFD they do not really talk to an IOMMUFD backend, but to a
> VFIO device file descriptor. Now I think they are in the right place here and we
> would leave IOMMUFD uAPI things to backends/iommufd and VFIO APIs in hw/vfio/.
Yep. I was misled by vbasedev->iommufd->fd, which is only used in the trace event.
Let's keep things as they are. Thanks for looking,
C.
> It also uses a lot of VFIODevice* which requires some funny includes in
> sysemu/iommufd.h.
>
> Do you still want me to go ahead with it? Here's a snip below of the change
> involved:
>
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 2b3d51af26d2..19d1e430ef48 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -20,6 +20,7 @@
> #include "trace.h"
> #include <sys/ioctl.h>
> #include <linux/iommufd.h>
> +#include <linux/vfio.h>
>
> static void iommufd_backend_init(Object *obj)
> {
> @@ -232,6 +233,46 @@ bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> return true;
> }
>
> +int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
> + Error **errp)
> +{
> + int iommufd = vbasedev->iommufd->fd;
> + struct vfio_device_attach_iommufd_pt attach_data = {
> + .argsz = sizeof(attach_data),
> + .flags = 0,
> + .pt_id = id,
> + };
> +
> + /* Attach device to an IOAS or hwpt within iommufd */
> + if (ioctl(vbasedev->fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data)) {
> + error_setg_errno(errp, errno,
> + "[iommufd=%d] error attach %s (%d) to id=%d",
> + iommufd, vbasedev->name, vbasedev->fd, id);
> + return -errno;
> + }
> +
> + trace_iommufd_cdev_attach_ioas_hwpt(iommufd, vbasedev->name,
> + vbasedev->fd, id);
> + return 0;
> +}
> +
> +bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
> +{
> + int iommufd = vbasedev->iommufd->fd;
> + struct vfio_device_detach_iommufd_pt detach_data = {
> + .argsz = sizeof(detach_data),
> + .flags = 0,
> + };
> +
> + if (ioctl(vbasedev->fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_data)) {
> + error_setg_errno(errp, errno, "detach %s failed", vbasedev->name);
> + return false;
> + }
> +
> + trace_iommufd_cdev_detach_ioas_hwpt(iommufd, vbasedev->name);
> + return true;
> +}
> +
> static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
> {
> HostIOMMUDeviceCaps *caps = &hiod->caps;
> diff --git a/backends/trace-events b/backends/trace-events
> index 211e6f374adc..2fee8e0af20e 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -15,3 +15,5 @@ iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t
> ioas, uint64_t iova, u
> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
> +iommufd_cdev_attach_ioas_hwpt(int iommufd, const char *name, int devfd, int id) " [iommufd=%d] Successfully attached device %s (%d) to id=%d"
> +iommufd_cdev_detach_ioas_hwpt(int iommufd, const char *name) " [iommufd=%d] Successfully detached %s"
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 077dea8f1b64..5a6d56c915e2 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -172,46 +172,6 @@ out:
> return ret;
> }
>
> -static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
> - Error **errp)
> -{
> - int iommufd = vbasedev->iommufd->fd;
> - struct vfio_device_attach_iommufd_pt attach_data = {
> - .argsz = sizeof(attach_data),
> - .flags = 0,
> - .pt_id = id,
> - };
> -
> - /* Attach device to an IOAS or hwpt within iommufd */
> - if (ioctl(vbasedev->fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data)) {
> - error_setg_errno(errp, errno,
> - "[iommufd=%d] error attach %s (%d) to id=%d",
> - iommufd, vbasedev->name, vbasedev->fd, id);
> - return -errno;
> - }
> -
> - trace_iommufd_cdev_attach_ioas_hwpt(iommufd, vbasedev->name,
> - vbasedev->fd, id);
> - return 0;
> -}
> -
> -static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
> -{
> - int iommufd = vbasedev->iommufd->fd;
> - struct vfio_device_detach_iommufd_pt detach_data = {
> - .argsz = sizeof(detach_data),
> - .flags = 0,
> - };
> -
> - if (ioctl(vbasedev->fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_data)) {
> - error_setg_errno(errp, errno, "detach %s failed", vbasedev->name);
> - return false;
> - }
> -
> - trace_iommufd_cdev_detach_ioas_hwpt(iommufd, vbasedev->name);
> - return true;
> -}
> -
> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
> VFIOIOMMUFDContainer *container,
> Error **errp)
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index e16179b507ed..24fde6270112 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -170,8 +170,6 @@ vfio_vmstate_change_prepare(const char *name, int running,
> const char *reason, c
>
> iommufd_cdev_connect_and_bind(int iommufd, const char *name, int devfd, int devid) " [iommufd=%d] Successfully bound device %s (fd=%d): output devid=%d"
> iommufd_cdev_getfd(const char *dev, int devfd) " %s (fd=%d)"
> -iommufd_cdev_attach_ioas_hwpt(int iommufd, const char *name, int devfd, int id) " [iommufd=%d] Successfully attached device %s (%d) to id=%d"
> -iommufd_cdev_detach_ioas_hwpt(int iommufd, const char *name) " [iommufd=%d] Successfully detached %s"
> iommufd_cdev_fail_attach_existing_container(const char *msg) " %s"
> iommufd_cdev_alloc_ioas(int iommufd, int ioas_id) " [iommufd=%d] new IOMMUFD container with ioasid=%d"
> iommufd_cdev_device_info(char *name, int devfd, int num_irqs, int num_regions, int flags) " %s (%d) num_irqs=%d num_regions=%d flags=%d"
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 57d502a1c79a..89780669118f 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -18,6 +18,8 @@
> #include "exec/hwaddr.h"
> #include "exec/cpu-common.h"
> #include "sysemu/host_iommu_device.h"
> +#include "hw/vfio/vfio-common.h"
> +#include "hw/vfio/vfio-container-base.h"
>
> #define TYPE_IOMMUFD_BACKEND "iommufd"
> OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass, IOMMUFD_BACKEND)
> @@ -51,5 +53,9 @@ bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp);
>
> +bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp);
> +int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
> + Error **errp);
> +
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
> #endif
>
>
^ permalink raw reply [flat|nested] 82+ messages in thread
* RE: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-17 9:09 ` Joao Martins
2024-07-17 9:28 ` Cédric Le Goater
@ 2024-07-17 9:48 ` Duan, Zhenzhong
2024-07-17 9:53 ` Joao Martins
1 sibling, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2024-07-17 9:48 UTC (permalink / raw)
To: Joao Martins, eric.auger@redhat.com
Cc: Liu, Yi L, Alex Williamson, Cedric Le Goater, Jason Gunthorpe,
Avihai Horon, qemu-devel@nongnu.org
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>creation
>
>On 17/07/2024 03:52, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>>> creation
>>>
>>> On 16/07/2024 17:44, Joao Martins wrote:
>>>> On 16/07/2024 17:04, Eric Auger wrote:
>>>>> Hi Joao,
>>>>>
>>>>> On 7/12/24 13:46, Joao Martins wrote:
>>>>>> There's generally two modes of operation for IOMMUFD:
>>>>>>
>>>>>> * The simple user API which intends to perform relatively simple
>things
>>>>>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to
>VFIO
>>>>>
>>>>> It generally creates? can you explicit what is "it"
>>>>>
>>>> 'It' here refers to the process/API-user
>>>>
>>>>> I am confused by this automatic terminology again (not your fault). the
>>> doc says:
>>>>> "
>>>>>
>>>>> *
>>>>>
>>>>> Automatic domain - refers to an iommu domain created
>automatically
>>>>> when attaching a device to an IOAS object. This is compatible to the
>>>>> semantics of VFIO type1.
>>>>>
>>>>> *
>>>>>
>>>>> Manual domain - refers to an iommu domain designated by the user
>as
>>>>> the target pagetable to be attached to by a device. Though currently
>>>>> there are no uAPIs to directly create such domain, the datastructure
>>>>> and algorithms are ready for handling that use case.
>>>>>
>>>>> "
>>>>>
>>>>>
>>>>> in 1) the device is attached to the ioas id (using the auto domain if I am
>>> not wrong)
>>>>> Here you attach to an hwpt id. Isn't it a manual domain?
>>>>>
>>>>
>>>> Correct.
>>>>
>>>> The 'auto domains' generally refers to the kernel-equivalent own
>>> automatic
>>>> attaching to a new pagetable.
>>>>
>>>> Here I call 'auto domains' in the userspace version too because we are
>>> doing the
>>>> exact same but from userspace, using the manual API in IOMMUFD.
>>>>
>>>>>> and mainly performs IOAS_MAP and UNMAP.
>>>>>>
>>>>>> * The native IOMMUFD API where you have fine grained control of
>the
>>>>>> IOMMU domain and model it accordingly. This is where most new
>>> feature
>>>>>> are being steered to.
>>>>>>
>>>>>> For dirty tracking 2) is required, as it needs to ensure that
>>>>>> the stage-2/parent IOMMU domain will only attach devices
>>>>>> that support dirty tracking (so far it is all homogeneous in x86, likely
>>>>>> not the case for smmuv3). Such invariant on dirty tracking provides a
>>>>>> useful guarantee to VMMs that will refuse incompatible device
>>>>>> attachments for IOMMU domains.
>>>>>>
>>>>>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>>>>>> responsible for creating an IOMMU domain. This is contrast to the
>>>>>> 'simple API' where the IOMMU domain is created by IOMMUFD
>>> automatically
>>>>>> when it attaches to VFIO (usually referred as autodomains) but it has
>>>>>> the needed handling for mdevs.
>>>>>>
>>>>>> To support dirty tracking with the advanced IOMMUFD API, it needs
>>>>>> similar logic, where IOMMU domains are created and devices
>attached
>>> to
>>>>>> compatible domains. Essentially mimmicing kernel
>>>>>> iommufd_device_auto_get_domain(). With mdevs given there's no
>>> IOMMU domain
>>>>>> it falls back to IOAS attach.
>>>>>>
>>>>>> The auto domain logic allows different IOMMU domains to be created
>>> when
>>>>>> DMA dirty tracking is not desired (and VF can provide it), and others
>>> where
>>>>>> it is. Here is not used in this way here given how VFIODevice
>migration
>>>>>
>>>>> Here is not used in this way here ?
>>>>>
>>>>
>>>> I meant, 'Here it is not used in this way given (...)'
>>>>
>>>>>> state is initialized after the device attachment. But such mixed mode
>of
>>>>>> IOMMU dirty tracking + device dirty tracking is an improvement that
>>> can
>>>>>> be added on. Keep the 'all of nothing' of type1 approach that we have
>>>>>> been using so far between container vs device dirty tracking.
>>>>>>
>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>> ---
>>>>>> include/hw/vfio/vfio-common.h | 9 ++++
>>>>>> include/sysemu/iommufd.h | 5 +++
>>>>>> backends/iommufd.c | 30 +++++++++++++
>>>>>> hw/vfio/iommufd.c | 82
>>> +++++++++++++++++++++++++++++++++++
>>>>>> backends/trace-events | 1 +
>>>>>> 5 files changed, 127 insertions(+)
>>>>>>
>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>>> common.h
>>>>>> index 7419466bca92..2dd468ce3c02 100644
>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>>>>>
>>>>>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>>>>
>>>>>> +typedef struct VFIOIOASHwpt {
>>>>>> + uint32_t hwpt_id;
>>>>>> + QLIST_HEAD(, VFIODevice) device_list;
>>>>>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>>>>>> +} VFIOIOASHwpt;
>>>>>> +
>>>>>> typedef struct VFIOIOMMUFDContainer {
>>>>>> VFIOContainerBase bcontainer;
>>>>>> IOMMUFDBackend *be;
>>>>>> uint32_t ioas_id;
>>>>>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>>>>>> } VFIOIOMMUFDContainer;
>>>>>>
>>>>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>>> VFIO_IOMMU_IOMMUFD);
>>>>>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>>>>>> HostIOMMUDevice *hiod;
>>>>>> int devid;
>>>>>> IOMMUFDBackend *iommufd;
>>>>>> + VFIOIOASHwpt *hwpt;
>>>>>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>>>>>> } VFIODevice;
>>>>>>
>>>>>> struct VFIODeviceOps {
>>>>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>>>>> index 57d502a1c79a..e917e7591d05 100644
>>>>>> --- a/include/sysemu/iommufd.h
>>>>>> +++ b/include/sysemu/iommufd.h
>>>>>> @@ -50,6 +50,11 @@ int
>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>ioas_id,
>>>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>>> uint32_t devid,
>>>>>> uint32_t *type, void *data, uint32_t len,
>>>>>> uint64_t *caps, Error **errp);
>>>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be,
>uint32_t
>>> dev_id,
>>>>>> + uint32_t pt_id, uint32_t flags,
>>>>>> + uint32_t data_type, uint32_t data_len,
>>>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>>>> + Error **errp);
>>>>>>
>>>>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>>> TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>>>>> #endif
>>>>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>>>>> index 2b3d51af26d2..5d3dfa917415 100644
>>>>>> --- a/backends/iommufd.c
>>>>>> +++ b/backends/iommufd.c
>>>>>> @@ -208,6 +208,36 @@ int
>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>ioas_id,
>>>>>> return ret;
>>>>>> }
>>>>>>
>>>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be,
>uint32_t
>>> dev_id,
>>>>>> + uint32_t pt_id, uint32_t flags,
>>>>>> + uint32_t data_type, uint32_t data_len,
>>>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>>>> + Error **errp)
>>>>>> +{
>>>>>> + int ret, fd = be->fd;
>>>>>> + struct iommu_hwpt_alloc alloc_hwpt = {
>>>>>> + .size = sizeof(struct iommu_hwpt_alloc),
>>>>>> + .flags = flags,
>>>>>> + .dev_id = dev_id,
>>>>>> + .pt_id = pt_id,
>>>>>> + .data_type = data_type,
>>>>>> + .data_len = data_len,
>>>>>> + .data_uptr = (uint64_t)data_ptr,
>>>>>> + };
>>>>>> +
>>>>>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>>>>>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags,
>>> data_type,
>>>>>> + data_len, (uint64_t)data_ptr,
>>>>>> + alloc_hwpt.out_hwpt_id, ret);
>>>>>> + if (ret) {
>>>>>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>>>>>> + return false;
>>>>>> + }
>>>>>> +
>>>>>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>>>>>> + return true;
>>>>>> +}
>>>>>> +
>>>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>>> uint32_t devid,
>>>>>> uint32_t *type, void *data, uint32_t len,
>>>>>> uint64_t *caps, Error **errp)
>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>> index 077dea8f1b64..325c7598d5a1 100644
>>>>>> --- a/hw/vfio/iommufd.c
>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>> @@ -212,10 +212,86 @@ static bool
>>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>>>>>> return true;
>>>>>> }
>>>>>>
>>>>>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>>>> + VFIOIOMMUFDContainer *container,
>>>>>> + Error **errp)
>>>>>> +{
>>>>>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>>>>>> + uint32_t flags = 0;
>>>>>> + VFIOIOASHwpt *hwpt;
>>>>>> + uint32_t hwpt_id;
>>>>>> + int ret;
>>>>>> +
>>>>>> + /* Try to find a domain */
>>>>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>>>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt-
>>hwpt_id,
>>> errp);
>>>>>> + if (ret) {
>>>>>> + /* -EINVAL means the domain is incompatible with the device.
>>> */
>>>>>> + if (ret == -EINVAL) {
>>>>>> + /*
>>>>>> + * It is an expected failure and it just means we will try
>>>>>> + * another domain, or create one if no existing compatible
>>>>>> + * domain is found. Hence why the error is discarded below.
>>>>>> + */
>>>>>> + error_free(*errp);
>>>>>> + *errp = NULL;
>>>>>> + continue;
>>>>>> + }
>>>>>> +
>>>>>> + return false;
>>>>>> + } else {
>>>>>> + vbasedev->hwpt = hwpt;
>>>>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev,
>hwpt_next);
>>>>>> + return true;
>>>>>> + }
>>>>>> + }
>>>>>> +
>>>>>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>>>>> + container->ioas_id, flags,
>>>>>> + IOMMU_HWPT_DATA_NONE, 0, NULL,
>>>>>> + &hwpt_id, errp)) {
>>>>>> + return false;
>>>>>> + }
>>>>>> +
>>>>>> + hwpt = g_malloc0(sizeof(*hwpt));
>>>>>> + hwpt->hwpt_id = hwpt_id;
>>>>>> + QLIST_INIT(&hwpt->device_list);
>>>>>> +
>>>>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt-
>>hwpt_id,
>>> errp);
>>>>>> + if (ret) {
>>>>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>>>>> + g_free(hwpt);
>>>>>> + return false;
>>>>>> + }
>>>>>> +
>>>>>> + vbasedev->hwpt = hwpt;
>>>>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>>>>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>>>>> + return true;
>>>>>> +}
>>>>>> +
>>>>>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>>>>>> + VFIOIOMMUFDContainer *container)
>>>>>> +{
>>>>>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>>>>>> +
>>>>>> + QLIST_REMOVE(vbasedev, hwpt_next);
>>>>> don't you want to reset vbasedev->hwpt = NULL too?
>>>>>
>>>> Yeap, Thanks for catching that
>>>>
>>>>>
>>>>>> + if (QLIST_EMPTY(&hwpt->device_list)) {
>>>>>> + QLIST_REMOVE(hwpt, next);
>>>>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>>>>> + g_free(hwpt);
>>>>>> + }
>>>>>> +}
>>>>>> +
>>>>>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
>>>>>> VFIOIOMMUFDContainer *container,
>>>>>> Error **errp)
>>>>>> {
>>>>>> + /* mdevs aren't physical devices and will fail with auto domains
>*/
>>>>>> + if (!vbasedev->mdev) {
>>>>>> + return iommufd_cdev_autodomains_get(vbasedev, container,
>>> errp);
>>>>>> + }
>>>>>> +
>>>>>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container-
>>>> ioas_id, errp);
>>>>>> }
>>>>>>
>>>>>> @@ -224,6 +300,11 @@ static void
>>> iommufd_cdev_detach_container(VFIODevice *vbasedev,
>>>>>> {
>>>>>> Error *err = NULL;
>>>>>>
>>>>>> + if (vbasedev->hwpt) {
>>>>>> + iommufd_cdev_autodomains_put(vbasedev, container);
>>>>>> + return;
>>>>> Where do we detach the device from the hwpt?
>>>>>
>>>> In iommufd_backend_free_id() for auto domains
>>>>
>>>
>>> to clarify here I meant *userspace* auto domains
>>>
>>> *kernel* auto domains (mdev) goes via DETACH_IOMMUFD_PT
>>
>> If the device is still attached to the hwpt, will iommufd_backend_free_id()
>succeed?
>> Have you tried the hot unplug?
>>
>
>I have but I didn't see any errors. But I will check again for v5 as it could
>also be my oversight.
>
>I was thinking about Eric's remark overnight and I think what I am doing is
>not
>correct regardless of the above.
>
>I should be calling DETACH_IOMMUFD_PT pairing with
>ATTACH_IOMMUFD_PT, and the
>iommufd_backend_free_id() is to drop the final reference pairing with
>alloc_hwpt() when the device list is empty i.e. when there's no more devices
>in
>that vdev::hwpt.
>
>DETACH_IOMMUFD_PT decrement the hwpt refcount and it doesn't
>differentiate
>between auto domains vs manual domains.
Yes, DETACH_IOMMUFD_PT is missing, so the ref count isn't decreased.
My understanding is that freeing the hwpt will fail because the device is still attached, e.g. by returning -EBUSY.
But maybe I understand it wrong, as you didn't see that failure.
Thanks
Zhenzhong
>
>The code is already there anyhow it just has the order of
>iommufd_cdev_autodomains_put vs detach invocation reversed; I'll fix that
>for
>next version.
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-17 9:48 ` Duan, Zhenzhong
@ 2024-07-17 9:53 ` Joao Martins
0 siblings, 0 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-17 9:53 UTC (permalink / raw)
To: Duan, Zhenzhong, eric.auger@redhat.com
Cc: Liu, Yi L, Alex Williamson, Cedric Le Goater, Jason Gunthorpe,
Avihai Horon, qemu-devel@nongnu.org
On 17/07/2024 10:48, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>> creation
>>
>> On 17/07/2024 03:52, Duan, Zhenzhong wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>> Subject: Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>>>> creation
>>>>
>>>> On 16/07/2024 17:44, Joao Martins wrote:
>>>>> On 16/07/2024 17:04, Eric Auger wrote:
>>>>>> Hi Joao,
>>>>>>
>>>>>> On 7/12/24 13:46, Joao Martins wrote:
>>>>>>> There's generally two modes of operation for IOMMUFD:
>>>>>>>
>>>>>>> * The simple user API which intends to perform relatively simple
>> things
>>>>>>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to
>> VFIO
>>>>>>
>>>>>> It generally creates? can you explicit what is "it"
>>>>>>
>>>>> 'It' here refers to the process/API-user
>>>>>
>>>>>> I am confused by this automatic terminology again (not your fault). the
>>>> doc says:
>>>>>> "
>>>>>>
>>>>>> *
>>>>>>
>>>>>> Automatic domain - refers to an iommu domain created
>> automatically
>>>>>> when attaching a device to an IOAS object. This is compatible to the
>>>>>> semantics of VFIO type1.
>>>>>>
>>>>>> *
>>>>>>
>>>>>> Manual domain - refers to an iommu domain designated by the user
>> as
>>>>>> the target pagetable to be attached to by a device. Though currently
>>>>>> there are no uAPIs to directly create such domain, the datastructure
>>>>>> and algorithms are ready for handling that use case.
>>>>>>
>>>>>> "
>>>>>>
>>>>>>
>>>>>> in 1) the device is attached to the ioas id (using the auto domain if I am
>>>> not wrong)
>>>>>> Here you attach to an hwpt id. Isn't it a manual domain?
>>>>>>
>>>>>
>>>>> Correct.
>>>>>
>>>>> The 'auto domains' generally refers to the kernel-equivalent own
>>>> automatic
>>>>> attaching to a new pagetable.
>>>>>
>>>>> Here I call 'auto domains' in the userspace version too because we are
>>>> doing the
>>>>> exact same but from userspace, using the manual API in IOMMUFD.
>>>>>
>>>>>>> and mainly performs IOAS_MAP and UNMAP.
>>>>>>>
>>>>>>> * The native IOMMUFD API where you have fine grained control of
>> the
>>>>>>> IOMMU domain and model it accordingly. This is where most new
>>>> feature
>>>>>>> are being steered to.
>>>>>>>
>>>>>>> For dirty tracking 2) is required, as it needs to ensure that
>>>>>>> the stage-2/parent IOMMU domain will only attach devices
>>>>>>> that support dirty tracking (so far it is all homogeneous in x86, likely
>>>>>>> not the case for smmuv3). Such invariant on dirty tracking provides a
>>>>>>> useful guarantee to VMMs that will refuse incompatible device
>>>>>>> attachments for IOMMU domains.
>>>>>>>
>>>>>>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>>>>>>> responsible for creating an IOMMU domain. This is contrast to the
>>>>>>> 'simple API' where the IOMMU domain is created by IOMMUFD
>>>> automatically
>>>>>>> when it attaches to VFIO (usually referred as autodomains) but it has
>>>>>>> the needed handling for mdevs.
>>>>>>>
>>>>>>> To support dirty tracking with the advanced IOMMUFD API, it needs
>>>>>>> similar logic, where IOMMU domains are created and devices
>> attached
>>>> to
>>>>>>> compatible domains. Essentially mimmicing kernel
>>>>>>> iommufd_device_auto_get_domain(). With mdevs given there's no
>>>> IOMMU domain
>>>>>>> it falls back to IOAS attach.
>>>>>>>
>>>>>>> The auto domain logic allows different IOMMU domains to be created
>>>> when
>>>>>>> DMA dirty tracking is not desired (and VF can provide it), and others
>>>> where
>>>>>>> it is. Here is not used in this way here given how VFIODevice
>> migration
>>>>>>
>>>>>> Here is not used in this way here ?
>>>>>>
>>>>>
>>>>> I meant, 'Here it is not used in this way given (...)'
>>>>>
>>>>>>> state is initialized after the device attachment. But such mixed mode
>> of
>>>>>>> IOMMU dirty tracking + device dirty tracking is an improvement that
>>>> can
>>>>>>> be added on. Keep the 'all of nothing' of type1 approach that we have
>>>>>>> been using so far between container vs device dirty tracking.
>>>>>>>
>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>>> ---
>>>>>>> include/hw/vfio/vfio-common.h | 9 ++++
>>>>>>> include/sysemu/iommufd.h | 5 +++
>>>>>>> backends/iommufd.c | 30 +++++++++++++
>>>>>>> hw/vfio/iommufd.c | 82
>>>> +++++++++++++++++++++++++++++++++++
>>>>>>> backends/trace-events | 1 +
>>>>>>> 5 files changed, 127 insertions(+)
>>>>>>>
>>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>>>> common.h
>>>>>>> index 7419466bca92..2dd468ce3c02 100644
>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>>>>>>
>>>>>>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>>>>>
>>>>>>> +typedef struct VFIOIOASHwpt {
>>>>>>> + uint32_t hwpt_id;
>>>>>>> + QLIST_HEAD(, VFIODevice) device_list;
>>>>>>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>>>>>>> +} VFIOIOASHwpt;
>>>>>>> +
>>>>>>> typedef struct VFIOIOMMUFDContainer {
>>>>>>> VFIOContainerBase bcontainer;
>>>>>>> IOMMUFDBackend *be;
>>>>>>> uint32_t ioas_id;
>>>>>>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>>>>>>> } VFIOIOMMUFDContainer;
>>>>>>>
>>>>>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>>>> VFIO_IOMMU_IOMMUFD);
>>>>>>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>>>>>>> HostIOMMUDevice *hiod;
>>>>>>> int devid;
>>>>>>> IOMMUFDBackend *iommufd;
>>>>>>> + VFIOIOASHwpt *hwpt;
>>>>>>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>>>>>>> } VFIODevice;
>>>>>>>
>>>>>>> struct VFIODeviceOps {
>>>>>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>>>>>> index 57d502a1c79a..e917e7591d05 100644
>>>>>>> --- a/include/sysemu/iommufd.h
>>>>>>> +++ b/include/sysemu/iommufd.h
>>>>>>> @@ -50,6 +50,11 @@ int
>>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>> ioas_id,
>>>>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>>>> uint32_t devid,
>>>>>>> uint32_t *type, void *data, uint32_t len,
>>>>>>> uint64_t *caps, Error **errp);
>>>>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be,
>> uint32_t
>>>> dev_id,
>>>>>>> + uint32_t pt_id, uint32_t flags,
>>>>>>> + uint32_t data_type, uint32_t data_len,
>>>>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>>>>> + Error **errp);
>>>>>>>
>>>>>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>>>> TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>>>>>> #endif
>>>>>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>>>>>> index 2b3d51af26d2..5d3dfa917415 100644
>>>>>>> --- a/backends/iommufd.c
>>>>>>> +++ b/backends/iommufd.c
>>>>>>> @@ -208,6 +208,36 @@ int
>>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>> ioas_id,
>>>>>>> return ret;
>>>>>>> }
>>>>>>>
>>>>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be,
>> uint32_t
>>>> dev_id,
>>>>>>> + uint32_t pt_id, uint32_t flags,
>>>>>>> + uint32_t data_type, uint32_t data_len,
>>>>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>>>>> + Error **errp)
>>>>>>> +{
>>>>>>> + int ret, fd = be->fd;
>>>>>>> + struct iommu_hwpt_alloc alloc_hwpt = {
>>>>>>> + .size = sizeof(struct iommu_hwpt_alloc),
>>>>>>> + .flags = flags,
>>>>>>> + .dev_id = dev_id,
>>>>>>> + .pt_id = pt_id,
>>>>>>> + .data_type = data_type,
>>>>>>> + .data_len = data_len,
>>>>>>> + .data_uptr = (uint64_t)data_ptr,
>>>>>>> + };
>>>>>>> +
>>>>>>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>>>>>>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags,
>>>> data_type,
>>>>>>> + data_len, (uint64_t)data_ptr,
>>>>>>> + alloc_hwpt.out_hwpt_id, ret);
>>>>>>> + if (ret) {
>>>>>>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>>>>>>> + return false;
>>>>>>> + }
>>>>>>> +
>>>>>>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>>>>>>> + return true;
>>>>>>> +}
>>>>>>> +
>>>>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>>>> uint32_t devid,
>>>>>>> uint32_t *type, void *data, uint32_t len,
>>>>>>> uint64_t *caps, Error **errp)
>>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>>> index 077dea8f1b64..325c7598d5a1 100644
>>>>>>> --- a/hw/vfio/iommufd.c
>>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>>> @@ -212,10 +212,86 @@ static bool
>>>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>>>>>>> return true;
>>>>>>> }
>>>>>>>
>>>>>>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>>>>> + VFIOIOMMUFDContainer *container,
>>>>>>> + Error **errp)
>>>>>>> +{
>>>>>>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>>>>>>> + uint32_t flags = 0;
>>>>>>> + VFIOIOASHwpt *hwpt;
>>>>>>> + uint32_t hwpt_id;
>>>>>>> + int ret;
>>>>>>> +
>>>>>>> + /* Try to find a domain */
>>>>>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>>>>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt-
>>> hwpt_id,
>>>> errp);
>>>>>>> + if (ret) {
>>>>>>> + /* -EINVAL means the domain is incompatible with the device.
>>>> */
>>>>>>> + if (ret == -EINVAL) {
>>>>>>> + /*
>>>>>>> + * It is an expected failure and it just means we will try
>>>>>>> + * another domain, or create one if no existing compatible
>>>>>>> + * domain is found. Hence why the error is discarded below.
>>>>>>> + */
>>>>>>> + error_free(*errp);
>>>>>>> + *errp = NULL;
>>>>>>> + continue;
>>>>>>> + }
>>>>>>> +
>>>>>>> + return false;
>>>>>>> + } else {
>>>>>>> + vbasedev->hwpt = hwpt;
>>>>>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev,
>> hwpt_next);
>>>>>>> + return true;
>>>>>>> + }
>>>>>>> + }
>>>>>>> +
>>>>>>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>>>>>> + container->ioas_id, flags,
>>>>>>> + IOMMU_HWPT_DATA_NONE, 0, NULL,
>>>>>>> + &hwpt_id, errp)) {
>>>>>>> + return false;
>>>>>>> + }
>>>>>>> +
>>>>>>> + hwpt = g_malloc0(sizeof(*hwpt));
>>>>>>> + hwpt->hwpt_id = hwpt_id;
>>>>>>> + QLIST_INIT(&hwpt->device_list);
>>>>>>> +
>>>>>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt-
>>> hwpt_id,
>>>> errp);
>>>>>>> + if (ret) {
>>>>>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>>>>>> + g_free(hwpt);
>>>>>>> + return false;
>>>>>>> + }
>>>>>>> +
>>>>>>> + vbasedev->hwpt = hwpt;
>>>>>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>>>>>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>>>>>> + return true;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>>>>>>> + VFIOIOMMUFDContainer *container)
>>>>>>> +{
>>>>>>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>>>>>>> +
>>>>>>> + QLIST_REMOVE(vbasedev, hwpt_next);
>>>>>> don't you want to reset vbasedev->hwpt = NULL too?
>>>>>>
>>>>> Yeap, Thanks for catching that
>>>>>
>>>>>>
>>>>>>> + if (QLIST_EMPTY(&hwpt->device_list)) {
>>>>>>> + QLIST_REMOVE(hwpt, next);
>>>>>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>>>>>> + g_free(hwpt);
>>>>>>> + }
>>>>>>> +}
>>>>>>> +
>>>>>>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
>>>>>>> VFIOIOMMUFDContainer *container,
>>>>>>> Error **errp)
>>>>>>> {
>>>>>>> + /* mdevs aren't physical devices and will fail with auto domains
>> */
>>>>>>> + if (!vbasedev->mdev) {
>>>>>>> + return iommufd_cdev_autodomains_get(vbasedev, container,
>>>> errp);
>>>>>>> + }
>>>>>>> +
>>>>>>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container-
>>>>> ioas_id, errp);
>>>>>>> }
>>>>>>>
>>>>>>> @@ -224,6 +300,11 @@ static void
>>>> iommufd_cdev_detach_container(VFIODevice *vbasedev,
>>>>>>> {
>>>>>>> Error *err = NULL;
>>>>>>>
>>>>>>> + if (vbasedev->hwpt) {
>>>>>>> + iommufd_cdev_autodomains_put(vbasedev, container);
>>>>>>> + return;
>>>>>> Where do we detach the device from the hwpt?
>>>>>>
>>>>> In iommufd_backend_free_id() for auto domains
>>>>>
>>>>
>>>> to clarify here I meant *userspace* auto domains
>>>>
>>>> *kernel* auto domains (mdev) goes via DETACH_IOMMUFD_PT
>>>
>>> If the device is still attached to the hwpt, will iommufd_backend_free_id() succeed?
>>> Have you tried the hot unplug?
>>>
>>
>> I have but I didn't see any errors. But I will check again for v5 as it could
>> also be my oversight.
>>
>> I was thinking about Eric's remark overnight and I think what I am doing is not
>> correct regardless of the above.
>>
>> I should be calling DETACH_IOMMUFD_PT to pair with ATTACH_IOMMUFD_PT, and the
>> iommufd_backend_free_id() is to drop the final reference, pairing with
>> alloc_hwpt(), when the device list is empty, i.e. when there are no more devices
>> in that vdev::hwpt.
>>
>> DETACH_IOMMUFD_PT decrements the hwpt refcount and it doesn't differentiate
>> between auto domains and manual domains.
>
> Yes, DETACH_IOMMUFD_PT is missing, so the ref count isn't decreased.
> My understanding is that freeing the hwpt will fail because the device is still attached, e.g. returning -EBUSY.
> But maybe I understand it wrong, as you didn't see that failure.
>
I recall exercising *hotunplug/hotplug* and it working, but the error
could likely go silent as it doesn't fail at higher levels. So although it
seemed to work, it might be my oversight in not seeing that an errno was
returned from the free id operation.
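
To make the pairing discussed above concrete, here is a toy C model of the put path: detach pairs with attach, and free pairs with allocation once the last device is gone. It is not QEMU or kernel code. All names (model_hwpt, model_dev, model_attach(), model_detach(), model_free(), model_autodomains_put()) are hypothetical, and returning -EBUSY when freeing a still-attached hwpt is the assumption raised in this thread, not confirmed behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a kernel hwpt object; NOT the real iommufd behaviour. */
struct model_hwpt {
    int attached;   /* number of devices currently attached */
    int allocated;  /* 1 while the object exists, 0 once freed */
};

struct model_dev {
    struct model_hwpt *hwpt;  /* domain the device is attached to, or NULL */
};

/* Models ATTACH_IOMMUFD_PT: takes a per-device reference on the hwpt. */
static void model_attach(struct model_dev *dev, struct model_hwpt *hwpt)
{
    hwpt->attached++;
    dev->hwpt = hwpt;
}

/* Models DETACH_IOMMUFD_PT: drops the per-device reference. */
static void model_detach(struct model_dev *dev)
{
    dev->hwpt->attached--;
}

/* Models freeing the id; assumed to refuse while devices are attached. */
static int model_free(struct model_hwpt *hwpt)
{
    if (hwpt->attached > 0) {
        return -EBUSY;
    }
    hwpt->allocated = 0;
    return 0;
}

/* Put path: detach first, clear the back-pointer, free when unused. */
static int model_autodomains_put(struct model_dev *dev)
{
    struct model_hwpt *hwpt = dev->hwpt;
    int ret = 0;

    model_detach(dev);
    dev->hwpt = NULL;  /* reset the pointer, as suggested in review */
    if (hwpt->attached == 0) {
        ret = model_free(hwpt);
    }
    return ret;  /* propagate the result so a failed free is not silent */
}
```

Returning the result of the free step is a deliberate choice in this sketch: it avoids the situation described above where an errno from the free operation goes unnoticed at higher levels.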
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-16 16:44 ` Joao Martins
2024-07-16 16:46 ` Joao Martins
@ 2024-07-16 17:32 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Eric Auger @ 2024-07-16 17:32 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 7/16/24 18:44, Joao Martins wrote:
> On 16/07/2024 17:04, Eric Auger wrote:
>> Hi Joao,
>>
>> On 7/12/24 13:46, Joao Martins wrote:
>>> There's generally two modes of operation for IOMMUFD:
>>>
>>> * The simple user API which intends to perform relatively simple things
>>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>> It generally creates? can you explicit what is "it"
>>
> 'It' here refers to the process/API-user
OK
>
>> I am confused by this automatic terminology again (not your fault). the doc says:
>> "
>>
>> *
>>
>> Automatic domain - refers to an iommu domain created automatically
>> when attaching a device to an IOAS object. This is compatible to the
>> semantics of VFIO type1.
>>
>> *
>>
>> Manual domain - refers to an iommu domain designated by the user as
>> the target pagetable to be attached to by a device. Though currently
>> there are no uAPIs to directly create such domain, the datastructure
>> and algorithms are ready for handling that use case.
>>
>> "
>>
>>
>> in 1) the device is attached to the ioas id (using the auto domain if I am not wrong)
>> Here you attach to an hwpt id. Isn't it a manual domain?
>>
> Correct.
>
> The 'auto domains' generally refers to the kernel-equivalent own automatic
> attaching to a new pagetable.
>
> Here I call 'auto domains' in the userspace version too because we are doing the
> exact same but from userspace, using the manual API in IOMMUFD.
OK
>
>>> and mainly performs IOAS_MAP and UNMAP.
>>>
>>> * The native IOMMUFD API where you have fine grained control of the
>>> IOMMU domain and model it accordingly. This is where most new feature
>>> are being steered to.
>>>
>>> For dirty tracking 2) is required, as it needs to ensure that
>>> the stage-2/parent IOMMU domain will only attach devices
>>> that support dirty tracking (so far it is all homogeneous in x86, likely
>>> not the case for smmuv3). Such invariant on dirty tracking provides a
>>> useful guarantee to VMMs that will refuse incompatible device
>>> attachments for IOMMU domains.
>>>
>>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>>> responsible for creating an IOMMU domain. This is contrast to the
>>> 'simple API' where the IOMMU domain is created by IOMMUFD automatically
>>> when it attaches to VFIO (usually referred as autodomains) but it has
>>> the needed handling for mdevs.
>>>
>>> To support dirty tracking with the advanced IOMMUFD API, it needs
>>> similar logic, where IOMMU domains are created and devices attached to
>>> compatible domains. Essentially mimmicing kernel
>>> iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain
>>> it falls back to IOAS attach.
>>>
>>> The auto domain logic allows different IOMMU domains to be created when
>>> DMA dirty tracking is not desired (and VF can provide it), and others where
>>> it is. Here is not used in this way here given how VFIODevice migration
>> Here is not used in this way here ?
>>
> I meant, 'Here it is not used in this way given (...)'
OK
>
>>> state is initialized after the device attachment. But such mixed mode of
>>> IOMMU dirty tracking + device dirty tracking is an improvement that can
>>> be added on. Keep the 'all of nothing' of type1 approach that we have
>>> been using so far between container vs device dirty tracking.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> include/hw/vfio/vfio-common.h | 9 ++++
>>> include/sysemu/iommufd.h | 5 +++
>>> backends/iommufd.c | 30 +++++++++++++
>>> hw/vfio/iommufd.c | 82 +++++++++++++++++++++++++++++++++++
>>> backends/trace-events | 1 +
>>> 5 files changed, 127 insertions(+)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index 7419466bca92..2dd468ce3c02 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>>
>>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>
>>> +typedef struct VFIOIOASHwpt {
>>> + uint32_t hwpt_id;
>>> + QLIST_HEAD(, VFIODevice) device_list;
>>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>>> +} VFIOIOASHwpt;
>>> +
>>> typedef struct VFIOIOMMUFDContainer {
>>> VFIOContainerBase bcontainer;
>>> IOMMUFDBackend *be;
>>> uint32_t ioas_id;
>>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>>> } VFIOIOMMUFDContainer;
>>>
>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, VFIO_IOMMU_IOMMUFD);
>>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>>> HostIOMMUDevice *hiod;
>>> int devid;
>>> IOMMUFDBackend *iommufd;
>>> + VFIOIOASHwpt *hwpt;
>>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>>> } VFIODevice;
>>>
>>> struct VFIODeviceOps {
>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>> index 57d502a1c79a..e917e7591d05 100644
>>> --- a/include/sysemu/iommufd.h
>>> +++ b/include/sysemu/iommufd.h
>>> @@ -50,6 +50,11 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>>> uint32_t *type, void *data, uint32_t len,
>>> uint64_t *caps, Error **errp);
>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>>> + uint32_t pt_id, uint32_t flags,
>>> + uint32_t data_type, uint32_t data_len,
>>> + void *data_ptr, uint32_t *out_hwpt,
>>> + Error **errp);
>>>
>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>> #endif
>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>> index 2b3d51af26d2..5d3dfa917415 100644
>>> --- a/backends/iommufd.c
>>> +++ b/backends/iommufd.c
>>> @@ -208,6 +208,36 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>>> return ret;
>>> }
>>>
>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>>> + uint32_t pt_id, uint32_t flags,
>>> + uint32_t data_type, uint32_t data_len,
>>> + void *data_ptr, uint32_t *out_hwpt,
>>> + Error **errp)
>>> +{
>>> + int ret, fd = be->fd;
>>> + struct iommu_hwpt_alloc alloc_hwpt = {
>>> + .size = sizeof(struct iommu_hwpt_alloc),
>>> + .flags = flags,
>>> + .dev_id = dev_id,
>>> + .pt_id = pt_id,
>>> + .data_type = data_type,
>>> + .data_len = data_len,
>>> + .data_uptr = (uint64_t)data_ptr,
>>> + };
>>> +
>>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
>>> + data_len, (uint64_t)data_ptr,
>>> + alloc_hwpt.out_hwpt_id, ret);
>>> + if (ret) {
>>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>>> + return false;
>>> + }
>>> +
>>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>>> + return true;
>>> +}
>>> +
>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>>> uint32_t *type, void *data, uint32_t len,
>>> uint64_t *caps, Error **errp)
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 077dea8f1b64..325c7598d5a1 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -212,10 +212,86 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>>> return true;
>>> }
>>>
>>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>> + VFIOIOMMUFDContainer *container,
>>> + Error **errp)
>>> +{
>>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>>> + uint32_t flags = 0;
>>> + VFIOIOASHwpt *hwpt;
>>> + uint32_t hwpt_id;
>>> + int ret;
>>> +
>>> + /* Try to find a domain */
>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>>> + if (ret) {
>>> + /* -EINVAL means the domain is incompatible with the device. */
>>> + if (ret == -EINVAL) {
>>> + /*
>>> + * It is an expected failure and it just means we will try
>>> + * another domain, or create one if no existing compatible
>>> + * domain is found. Hence why the error is discarded below.
>>> + */
>>> + error_free(*errp);
>>> + *errp = NULL;
>>> + continue;
>>> + }
>>> +
>>> + return false;
>>> + } else {
>>> + vbasedev->hwpt = hwpt;
>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>> + return true;
>>> + }
>>> + }
>>> +
>>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>> + container->ioas_id, flags,
>>> + IOMMU_HWPT_DATA_NONE, 0, NULL,
>>> + &hwpt_id, errp)) {
>>> + return false;
>>> + }
>>> +
>>> + hwpt = g_malloc0(sizeof(*hwpt));
>>> + hwpt->hwpt_id = hwpt_id;
>>> + QLIST_INIT(&hwpt->device_list);
>>> +
>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>>> + if (ret) {
>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>> + g_free(hwpt);
>>> + return false;
>>> + }
>>> +
>>> + vbasedev->hwpt = hwpt;
>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>> + return true;
>>> +}
>>> +
>>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>>> + VFIOIOMMUFDContainer *container)
>>> +{
>>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>>> +
>>> + QLIST_REMOVE(vbasedev, hwpt_next);
>> don't you want to reset vbasedev->hwpt = NULL too?
>>
> Yeap, Thanks for catching that
>
>>> + if (QLIST_EMPTY(&hwpt->device_list)) {
>>> + QLIST_REMOVE(hwpt, next);
>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>> + g_free(hwpt);
>>> + }
>>> +}
>>> +
>>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
>>> VFIOIOMMUFDContainer *container,
>>> Error **errp)
>>> {
>>> + /* mdevs aren't physical devices and will fail with auto domains */
>>> + if (!vbasedev->mdev) {
>>> + return iommufd_cdev_autodomains_get(vbasedev, container, errp);
>>> + }
>>> +
>>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
>>> }
>>>
>>> @@ -224,6 +300,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
>>> {
>>> Error *err = NULL;
>>>
>>> + if (vbasedev->hwpt) {
>>> + iommufd_cdev_autodomains_put(vbasedev, container);
>>> + return;
>> Where do we detach the device from the hwpt?
>>
> In iommufd_backend_free_id() for auto domains
Hum I see iommufd_backend_free_id frees the object. I guess the detach
then is done on kernel side...
Eric
>
>> Thanks
>>
>> Eric
>>> + }
>>> +
>>> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
>>> error_report_err(err);
>>> }
>>> @@ -354,6 +435,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>>> container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
>>> container->be = vbasedev->iommufd;
>>> container->ioas_id = ioas_id;
>>> + QLIST_INIT(&container->hwpt_list);
>>>
>>> bcontainer = &container->bcontainer;
>>> vfio_address_space_insert(space, bcontainer);
>>> diff --git a/backends/trace-events b/backends/trace-events
>>> index 211e6f374adc..4d8ac02fe7d6 100644
>>> --- a/backends/trace-events
>>> +++ b/backends/trace-events
>>> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size
>>> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>>> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
>>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
>>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
* RE: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-12 11:46 ` [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation Joao Martins
` (2 preceding siblings ...)
2024-07-16 16:04 ` Eric Auger
@ 2024-07-17 2:18 ` Duan, Zhenzhong
2024-07-17 9:04 ` Joao Martins
3 siblings, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2024-07-17 2:18 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
>
>There's generally two modes of operation for IOMMUFD:
>
>* The simple user API which intends to perform relatively simple things
>with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>and mainly performs IOAS_MAP and UNMAP.
>
>* The native IOMMUFD API where you have fine grained control of the
>IOMMU domain and model it accordingly. This is where most new feature
>are being steered to.
>
>For dirty tracking 2) is required, as it needs to ensure that
>the stage-2/parent IOMMU domain will only attach devices
>that support dirty tracking (so far it is all homogeneous in x86, likely
>not the case for smmuv3). Such invariant on dirty tracking provides a
>useful guarantee to VMMs that will refuse incompatible device
>attachments for IOMMU domains.
>
>Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>responsible for creating an IOMMU domain. This is contrast to the
>'simple API' where the IOMMU domain is created by IOMMUFD
>automatically
>when it attaches to VFIO (usually referred as autodomains) but it has
>the needed handling for mdevs.
>
>To support dirty tracking with the advanced IOMMUFD API, it needs
>similar logic, where IOMMU domains are created and devices attached to
>compatible domains. Essentially mimmicing kernel
>iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU
>domain
>it falls back to IOAS attach.
>
>The auto domain logic allows different IOMMU domains to be created when
>DMA dirty tracking is not desired (and VF can provide it), and others where
>it is. Here is not used in this way here given how VFIODevice migration
>state is initialized after the device attachment. But such mixed mode of
>IOMMU dirty tracking + device dirty tracking is an improvement that can
>be added on. Keep the 'all of nothing' of type1 approach that we have
>been using so far between container vs device dirty tracking.
>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>---
> include/hw/vfio/vfio-common.h | 9 ++++
> include/sysemu/iommufd.h | 5 +++
> backends/iommufd.c | 30 +++++++++++++
> hw/vfio/iommufd.c | 82
>+++++++++++++++++++++++++++++++++++
> backends/trace-events | 1 +
> 5 files changed, 127 insertions(+)
>
>diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>common.h
>index 7419466bca92..2dd468ce3c02 100644
>--- a/include/hw/vfio/vfio-common.h
>+++ b/include/hw/vfio/vfio-common.h
>@@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>
> typedef struct IOMMUFDBackend IOMMUFDBackend;
>
>+typedef struct VFIOIOASHwpt {
>+ uint32_t hwpt_id;
>+ QLIST_HEAD(, VFIODevice) device_list;
>+ QLIST_ENTRY(VFIOIOASHwpt) next;
>+} VFIOIOASHwpt;
>+
> typedef struct VFIOIOMMUFDContainer {
> VFIOContainerBase bcontainer;
> IOMMUFDBackend *be;
> uint32_t ioas_id;
>+ QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
> } VFIOIOMMUFDContainer;
>
> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>VFIO_IOMMU_IOMMUFD);
>@@ -135,6 +142,8 @@ typedef struct VFIODevice {
> HostIOMMUDevice *hiod;
> int devid;
> IOMMUFDBackend *iommufd;
>+ VFIOIOASHwpt *hwpt;
>+ QLIST_ENTRY(VFIODevice) hwpt_next;
> } VFIODevice;
>
> struct VFIODeviceOps {
>diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>index 57d502a1c79a..e917e7591d05 100644
>--- a/include/sysemu/iommufd.h
>+++ b/include/sysemu/iommufd.h
>@@ -50,6 +50,11 @@ int
>iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp);
>+bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>dev_id,
>+ uint32_t pt_id, uint32_t flags,
>+ uint32_t data_type, uint32_t data_len,
>+ void *data_ptr, uint32_t *out_hwpt,
>+ Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>TYPE_HOST_IOMMU_DEVICE "-iommufd"
> #endif
>diff --git a/backends/iommufd.c b/backends/iommufd.c
>index 2b3d51af26d2..5d3dfa917415 100644
>--- a/backends/iommufd.c
>+++ b/backends/iommufd.c
>@@ -208,6 +208,36 @@ int
>iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> return ret;
> }
>
>+bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>dev_id,
>+ uint32_t pt_id, uint32_t flags,
>+ uint32_t data_type, uint32_t data_len,
>+ void *data_ptr, uint32_t *out_hwpt,
>+ Error **errp)
>+{
>+ int ret, fd = be->fd;
>+ struct iommu_hwpt_alloc alloc_hwpt = {
>+ .size = sizeof(struct iommu_hwpt_alloc),
>+ .flags = flags,
>+ .dev_id = dev_id,
>+ .pt_id = pt_id,
>+ .data_type = data_type,
>+ .data_len = data_len,
>+ .data_uptr = (uint64_t)data_ptr,
>+ };
>+
>+ ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>+ trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
>+ data_len, (uint64_t)data_ptr,
>+ alloc_hwpt.out_hwpt_id, ret);
>+ if (ret) {
>+ error_setg_errno(errp, errno, "Failed to allocate hwpt");
>+ return false;
>+ }
>+
>+ *out_hwpt = alloc_hwpt.out_hwpt_id;
>+ return true;
>+}
>+
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
>diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>index 077dea8f1b64..325c7598d5a1 100644
>--- a/hw/vfio/iommufd.c
>+++ b/hw/vfio/iommufd.c
>@@ -212,10 +212,86 @@ static bool
>iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
> return true;
> }
>
>+static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>+ VFIOIOMMUFDContainer *container,
>+ Error **errp)
>+{
>+ IOMMUFDBackend *iommufd = vbasedev->iommufd;
>+ uint32_t flags = 0;
>+ VFIOIOASHwpt *hwpt;
>+ uint32_t hwpt_id;
>+ int ret;
>+
>+ /* Try to find a domain */
>+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>errp);
If there is already an hwpt that supports dirty tracking and another device
that doesn't support dirty tracking attaches to this hwpt, will it succeed?
If the existing hwpt doesn't support dirty tracking and another device
supporting dirty tracking attaches to that hwpt, what will happen?
Thanks
Zhenzhong
>+ if (ret) {
>+ /* -EINVAL means the domain is incompatible with the device. */
>+ if (ret == -EINVAL) {
>+ /*
>+ * It is an expected failure and it just means we will try
>+ * another domain, or create one if no existing compatible
>+ * domain is found. Hence why the error is discarded below.
>+ */
>+ error_free(*errp);
>+ *errp = NULL;
>+ continue;
>+ }
>+
>+ return false;
>+ } else {
>+ vbasedev->hwpt = hwpt;
>+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>+ return true;
>+ }
>+ }
>+
>+ if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>+ container->ioas_id, flags,
>+ IOMMU_HWPT_DATA_NONE, 0, NULL,
>+ &hwpt_id, errp)) {
>+ return false;
>+ }
>+
>+ hwpt = g_malloc0(sizeof(*hwpt));
>+ hwpt->hwpt_id = hwpt_id;
>+ QLIST_INIT(&hwpt->device_list);
>+
>+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>+ if (ret) {
>+ iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>+ g_free(hwpt);
>+ return false;
>+ }
>+
>+ vbasedev->hwpt = hwpt;
>+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>+ QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>+ return true;
>+}
>+
>+static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>+ VFIOIOMMUFDContainer *container)
>+{
>+ VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>+
>+ QLIST_REMOVE(vbasedev, hwpt_next);
>+ if (QLIST_EMPTY(&hwpt->device_list)) {
>+ QLIST_REMOVE(hwpt, next);
>+ iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>+ g_free(hwpt);
>+ }
>+}
>+
> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
> VFIOIOMMUFDContainer *container,
> Error **errp)
> {
>+ /* mdevs aren't physical devices and will fail with auto domains */
>+ if (!vbasedev->mdev) {
>+ return iommufd_cdev_autodomains_get(vbasedev, container, errp);
>+ }
>+
> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id,
>errp);
> }
>
>@@ -224,6 +300,11 @@ static void
>iommufd_cdev_detach_container(VFIODevice *vbasedev,
> {
> Error *err = NULL;
>
>+ if (vbasedev->hwpt) {
>+ iommufd_cdev_autodomains_put(vbasedev, container);
>+ return;
>+ }
>+
> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
> error_report_err(err);
> }
>@@ -354,6 +435,7 @@ static bool iommufd_cdev_attach(const char *name,
>VFIODevice *vbasedev,
> container =
>VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
> container->be = vbasedev->iommufd;
> container->ioas_id = ioas_id;
>+ QLIST_INIT(&container->hwpt_list);
>
> bcontainer = &container->bcontainer;
> vfio_address_space_insert(space, bcontainer);
>diff --git a/backends/trace-events b/backends/trace-events
>index 211e6f374adc..4d8ac02fe7d6 100644
>--- a/backends/trace-events
>+++ b/backends/trace-events
>@@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t
>ioas, uint64_t iova, uint64_t size
> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas,
>uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping:
>iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova,
>uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64"
>size=0x%"PRIx64" (%d)"
> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d
>ioas=%d"
>+iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t
>pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr,
>uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u
>flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u
>(%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>id=%d (%d)"
>--
>2.17.2
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-17 2:18 ` Duan, Zhenzhong
@ 2024-07-17 9:04 ` Joao Martins
2024-07-17 10:05 ` Duan, Zhenzhong
0 siblings, 1 reply; 82+ messages in thread
From: Joao Martins @ 2024-07-17 9:04 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 17/07/2024 03:18, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
>>
>> There's generally two modes of operation for IOMMUFD:
>>
>> * The simple user API which intends to perform relatively simple things
>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>> and mainly performs IOAS_MAP and UNMAP.
>>
>> * The native IOMMUFD API where you have fine grained control of the
>> IOMMU domain and model it accordingly. This is where most new feature
>> are being steered to.
>>
>> For dirty tracking 2) is required, as it needs to ensure that
>> the stage-2/parent IOMMU domain will only attach devices
>> that support dirty tracking (so far it is all homogeneous in x86, likely
>> not the case for smmuv3). Such invariant on dirty tracking provides a
>> useful guarantee to VMMs that will refuse incompatible device
>> attachments for IOMMU domains.
>>
>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>> responsible for creating an IOMMU domain. This is contrast to the
>> 'simple API' where the IOMMU domain is created by IOMMUFD
>> automatically
>> when it attaches to VFIO (usually referred as autodomains) but it has
>> the needed handling for mdevs.
>>
>> To support dirty tracking with the advanced IOMMUFD API, it needs
>> similar logic, where IOMMU domains are created and devices attached to
>> compatible domains. Essentially mimmicing kernel
>> iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU
>> domain
>> it falls back to IOAS attach.
>>
>> The auto domain logic allows different IOMMU domains to be created when
>> DMA dirty tracking is not desired (and VF can provide it), and others where
>> it is. Here is not used in this way here given how VFIODevice migration
>> state is initialized after the device attachment. But such mixed mode of
>> IOMMU dirty tracking + device dirty tracking is an improvement that can
>> be added on. Keep the 'all of nothing' of type1 approach that we have
>> been using so far between container vs device dirty tracking.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/hw/vfio/vfio-common.h | 9 ++++
>> include/sysemu/iommufd.h | 5 +++
>> backends/iommufd.c | 30 +++++++++++++
>> hw/vfio/iommufd.c | 82
>> +++++++++++++++++++++++++++++++++++
>> backends/trace-events | 1 +
>> 5 files changed, 127 insertions(+)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>> common.h
>> index 7419466bca92..2dd468ce3c02 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>
>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>
>> +typedef struct VFIOIOASHwpt {
>> + uint32_t hwpt_id;
>> + QLIST_HEAD(, VFIODevice) device_list;
>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>> +} VFIOIOASHwpt;
>> +
>> typedef struct VFIOIOMMUFDContainer {
>> VFIOContainerBase bcontainer;
>> IOMMUFDBackend *be;
>> uint32_t ioas_id;
>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>> } VFIOIOMMUFDContainer;
>>
>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>> VFIO_IOMMU_IOMMUFD);
>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>> HostIOMMUDevice *hiod;
>> int devid;
>> IOMMUFDBackend *iommufd;
>> + VFIOIOASHwpt *hwpt;
>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>> } VFIODevice;
>>
>> struct VFIODeviceOps {
>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>> index 57d502a1c79a..e917e7591d05 100644
>> --- a/include/sysemu/iommufd.h
>> +++ b/include/sysemu/iommufd.h
>> @@ -50,6 +50,11 @@ int
>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>> devid,
>> uint32_t *type, void *data, uint32_t len,
>> uint64_t *caps, Error **errp);
>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>> dev_id,
>> + uint32_t pt_id, uint32_t flags,
>> + uint32_t data_type, uint32_t data_len,
>> + void *data_ptr, uint32_t *out_hwpt,
>> + Error **errp);
>>
>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>> TYPE_HOST_IOMMU_DEVICE "-iommufd"
>> #endif
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index 2b3d51af26d2..5d3dfa917415 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -208,6 +208,36 @@ int
>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> return ret;
>> }
>>
>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>> dev_id,
>> + uint32_t pt_id, uint32_t flags,
>> + uint32_t data_type, uint32_t data_len,
>> + void *data_ptr, uint32_t *out_hwpt,
>> + Error **errp)
>> +{
>> + int ret, fd = be->fd;
>> + struct iommu_hwpt_alloc alloc_hwpt = {
>> + .size = sizeof(struct iommu_hwpt_alloc),
>> + .flags = flags,
>> + .dev_id = dev_id,
>> + .pt_id = pt_id,
>> + .data_type = data_type,
>> + .data_len = data_len,
>> + .data_uptr = (uint64_t)data_ptr,
>> + };
>> +
>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
>> + data_len, (uint64_t)data_ptr,
>> + alloc_hwpt.out_hwpt_id, ret);
>> + if (ret) {
>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>> + return false;
>> + }
>> +
>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>> + return true;
>> +}
>> +
>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>> devid,
>> uint32_t *type, void *data, uint32_t len,
>> uint64_t *caps, Error **errp)
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 077dea8f1b64..325c7598d5a1 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -212,10 +212,86 @@ static bool
>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>> return true;
>> }
>>
>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> + VFIOIOMMUFDContainer *container,
>> + Error **errp)
>> +{
>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>> + uint32_t flags = 0;
>> + VFIOIOASHwpt *hwpt;
>> + uint32_t hwpt_id;
>> + int ret;
>> +
>> + /* Try to find a domain */
>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>> errp);
>
> If there is already an hwpt that supports dirty tracking.
> Another device that doesn't support dirty tracking attaches to this hwpt, will it succeed?
>
It returns -EINVAL, and we handle that right after this statement, which means
another HWPT is created.
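
As a rough illustration of that find-or-create behaviour, the loop can be modelled in plain C as below. This is a sketch, not the QEMU code: sketch_domain, sketch_container, sketch_attach() and sketch_autodomains_get() are hypothetical names, the fixed-size array replaces the QLIST, and the rule that a new domain enables dirty tracking only when the device supports it is an assumption (in this patch the allocation flags are still 0).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DOMAINS 4

struct sketch_domain {
    bool dirty_tracking;  /* domain was created with dirty tracking enforced */
};

struct sketch_container {
    struct sketch_domain domains[MAX_DOMAINS];
    size_t nr_domains;
};

/* Assumed rule: a dirty-tracking domain rejects devices lacking support. */
static int sketch_attach(const struct sketch_domain *d, bool dev_dirty_capable)
{
    return (d->dirty_tracking && !dev_dirty_capable) ? -EINVAL : 0;
}

/*
 * Try every existing domain first; -EINVAL only means "incompatible, try
 * the next one". If none fits, create a new domain for this device.
 * Returns the index of the chosen domain or a negative errno.
 */
static int sketch_autodomains_get(struct sketch_container *c,
                                  bool dev_dirty_capable)
{
    for (size_t i = 0; i < c->nr_domains; i++) {
        int ret = sketch_attach(&c->domains[i], dev_dirty_capable);

        if (ret == 0) {
            return (int)i;
        }
        if (ret != -EINVAL) {
            return ret;  /* a real failure, not an expected mismatch */
        }
    }

    if (c->nr_domains == MAX_DOMAINS) {
        return -ENOSPC;
    }
    c->domains[c->nr_domains].dirty_tracking = dev_dirty_capable;
    return (int)(c->nr_domains++);
}
```

Under these assumptions, a device without dirty tracking is pushed into a separate domain, while a capable device silently joins an existing non-tracking domain, which is the second case asked about in this message.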
> If existing hwpt doesn't support dirty tracking.
> Another device supporting dirty tracking attaches to that hwpt, what will happen?
>
Hmm, it succeeds as there's no incompatibility. At the very least I plan on
blocking migration if the device has neither VF dirty tracking nor IOMMU dirty
tracking (and patch 11 needs to be adjusted to check hwpt_flags instead of
container).
QEMU right now doesn't handle heterogeneous environments; it's an all-or-nothing
approach even before this patchset. Additionally, I am not sure server
environments are applicable here. So essentially I kept the status quo -- more
follow-up is needed to support a mix and match of IOMMU + VF dirty tracking too.
The challenge is having the migration state of the VFIO device initialized early
enough that we can make all sorts of decisions on whether IOMMU dirty tracking is
desired on a per-device basis.
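
The migration check mentioned above (block when a device has neither VF dirty tracking nor IOMMU dirty tracking, keyed on the hwpt flags rather than the container) might look roughly like the sketch below. Everything here is illustrative: sketch_hwpt, sketch_device and both helper functions are hypothetical names, and SKETCH_HWPT_DIRTY_TRACKING is a local stand-in for the kernel's dirty-tracking allocation flag with an assumed value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's dirty-tracking flag; value is assumed. */
#define SKETCH_HWPT_DIRTY_TRACKING (1u << 1)

struct sketch_hwpt {
    uint32_t hwpt_flags;  /* flags the domain was allocated with */
};

struct sketch_device {
    bool vf_dirty_tracking;          /* device can track its own DMA */
    const struct sketch_hwpt *hwpt;  /* NULL when attached to the IOAS only */
};

/* IOMMU dirty tracking is decided per hwpt, not per container. */
static bool sketch_iommu_dirty_tracking(const struct sketch_device *dev)
{
    return dev->hwpt != NULL &&
           (dev->hwpt->hwpt_flags & SKETCH_HWPT_DIRTY_TRACKING) != 0;
}

/* Block migration only when neither tracking mechanism is available. */
static bool sketch_must_block_migration(const struct sketch_device *dev)
{
    return !dev->vf_dirty_tracking && !sketch_iommu_dirty_tracking(dev);
}
```

Checking the per-device hwpt keeps the decision correct when one container ends up holding several domains with different capabilities.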
* RE: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-17 9:04 ` Joao Martins
@ 2024-07-17 10:05 ` Duan, Zhenzhong
2024-07-17 11:04 ` Joao Martins
0 siblings, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2024-07-17 10:05 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>creation
>
>On 17/07/2024 03:18, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>creation
>>>
>>> There's generally two modes of operation for IOMMUFD:
>>>
>>> * The simple user API which intends to perform relatively simple things
>>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>>> and mainly performs IOAS_MAP and UNMAP.
>>>
>>> * The native IOMMUFD API where you have fine grained control of the
>>> IOMMU domain and model it accordingly. This is where most new feature
>>> are being steered to.
>>>
>>> For dirty tracking 2) is required, as it needs to ensure that
>>> the stage-2/parent IOMMU domain will only attach devices
>>> that support dirty tracking (so far it is all homogeneous in x86, likely
>>> not the case for smmuv3). Such invariant on dirty tracking provides a
>>> useful guarantee to VMMs that will refuse incompatible device
>>> attachments for IOMMU domains.
>>>
>>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>>> responsible for creating an IOMMU domain. This is contrast to the
>>> 'simple API' where the IOMMU domain is created by IOMMUFD
>>> automatically
>>> when it attaches to VFIO (usually referred as autodomains) but it has
>>> the needed handling for mdevs.
>>>
>>> To support dirty tracking with the advanced IOMMUFD API, it needs
>>> similar logic, where IOMMU domains are created and devices attached to
>>> compatible domains. Essentially mimmicing kernel
>>> iommufd_device_auto_get_domain(). With mdevs given there's no
>IOMMU
>>> domain
>>> it falls back to IOAS attach.
>>>
>>> The auto domain logic allows different IOMMU domains to be created
>when
>>> DMA dirty tracking is not desired (and VF can provide it), and others
>where
>>> it is. Here is not used in this way here given how VFIODevice migration
>>> state is initialized after the device attachment. But such mixed mode of
>>> IOMMU dirty tracking + device dirty tracking is an improvement that can
>>> be added on. Keep the 'all of nothing' of type1 approach that we have
>>> been using so far between container vs device dirty tracking.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> include/hw/vfio/vfio-common.h | 9 ++++
>>> include/sysemu/iommufd.h | 5 +++
>>> backends/iommufd.c | 30 +++++++++++++
>>> hw/vfio/iommufd.c | 82
>>> +++++++++++++++++++++++++++++++++++
>>> backends/trace-events | 1 +
>>> 5 files changed, 127 insertions(+)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>>> common.h
>>> index 7419466bca92..2dd468ce3c02 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>>
>>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>
>>> +typedef struct VFIOIOASHwpt {
>>> + uint32_t hwpt_id;
>>> + QLIST_HEAD(, VFIODevice) device_list;
>>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>>> +} VFIOIOASHwpt;
>>> +
>>> typedef struct VFIOIOMMUFDContainer {
>>> VFIOContainerBase bcontainer;
>>> IOMMUFDBackend *be;
>>> uint32_t ioas_id;
>>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>>> } VFIOIOMMUFDContainer;
>>>
>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>>> VFIO_IOMMU_IOMMUFD);
>>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>>> HostIOMMUDevice *hiod;
>>> int devid;
>>> IOMMUFDBackend *iommufd;
>>> + VFIOIOASHwpt *hwpt;
>>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>>> } VFIODevice;
>>>
>>> struct VFIODeviceOps {
>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>> index 57d502a1c79a..e917e7591d05 100644
>>> --- a/include/sysemu/iommufd.h
>>> +++ b/include/sysemu/iommufd.h
>>> @@ -50,6 +50,11 @@ int
>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>ioas_id,
>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>uint32_t
>>> devid,
>>> uint32_t *type, void *data, uint32_t len,
>>> uint64_t *caps, Error **errp);
>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>>> dev_id,
>>> + uint32_t pt_id, uint32_t flags,
>>> + uint32_t data_type, uint32_t data_len,
>>> + void *data_ptr, uint32_t *out_hwpt,
>>> + Error **errp);
>>>
>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>>> TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>> #endif
>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>> index 2b3d51af26d2..5d3dfa917415 100644
>>> --- a/backends/iommufd.c
>>> +++ b/backends/iommufd.c
>>> @@ -208,6 +208,36 @@ int
>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>ioas_id,
>>> return ret;
>>> }
>>>
>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>>> dev_id,
>>> + uint32_t pt_id, uint32_t flags,
>>> + uint32_t data_type, uint32_t data_len,
>>> + void *data_ptr, uint32_t *out_hwpt,
>>> + Error **errp)
>>> +{
>>> + int ret, fd = be->fd;
>>> + struct iommu_hwpt_alloc alloc_hwpt = {
>>> + .size = sizeof(struct iommu_hwpt_alloc),
>>> + .flags = flags,
>>> + .dev_id = dev_id,
>>> + .pt_id = pt_id,
>>> + .data_type = data_type,
>>> + .data_len = data_len,
>>> + .data_uptr = (uint64_t)data_ptr,
>>> + };
>>> +
>>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags,
>data_type,
>>> + data_len, (uint64_t)data_ptr,
>>> + alloc_hwpt.out_hwpt_id, ret);
>>> + if (ret) {
>>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>>> + return false;
>>> + }
>>> +
>>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>>> + return true;
>>> +}
>>> +
>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>uint32_t
>>> devid,
>>> uint32_t *type, void *data, uint32_t len,
>>> uint64_t *caps, Error **errp)
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 077dea8f1b64..325c7598d5a1 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -212,10 +212,86 @@ static bool
>>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>>> return true;
>>> }
>>>
>>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>> + VFIOIOMMUFDContainer *container,
>>> + Error **errp)
>>> +{
>>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>>> + uint32_t flags = 0;
>>> + VFIOIOASHwpt *hwpt;
>>> + uint32_t hwpt_id;
>>> + int ret;
>>> +
>>> + /* Try to find a domain */
>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>>> errp);
>>
>> If there is already an hwpt that supports dirty tracking.
>> Another device that doesn't support dirty tracking attaches to this hwpt,
>will it succeed?
>>
>
>It returns -EINVAL, and we handle that right after this statement. Which
>means
>another HWPT is created.
I looked into the kernel code and didn't see the dirty tracking check between device and hwpt. Do you know which function does that?
>
>> If existing hwpt doesn't support dirty tracking.
>> Another device supporting dirty tracking attaches to that hwpt, what will
>happen?
>>
>
>Hmm, It succeeds as there's no incompatbility. At the very least I plan on
>blocking migration if the device neither has VF dirty tracking, nor IOMMU
>dirty
>tracking (and patch 11 needs to be adjusted to check hwpt_flags instead of
>container).
When bcontainer->dirty_pages_supported is true, I think that container should only contain hwpts that support dirty tracking. All hwpts not supporting dirty tracking should be in another container.
If a device supports dirty tracking, it should skip attaching to a container that doesn't support dirty tracking, and vice versa.
This way we can support a mixed environment.
Thanks
Zhenzhong
>
>Qemu right now doesn't handle heteregenous environment, it's all of
>nothing
>approach even before this patchset. Additionally, I am not sure server
>environments are applicable here. So essentially I kept the status quo --
>more
>follow-up is needed to support a mix and match of IOMMU + VF dirty
>tracking too.
>The challenge is having the migration state of VFIO device initialized early
>enough that we can make all sort of decisions whether IOMMU dirty tracking
>is
>desired on a per-device basis.
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-17 10:05 ` Duan, Zhenzhong
@ 2024-07-17 11:04 ` Joao Martins
2024-07-18 7:44 ` Duan, Zhenzhong
0 siblings, 1 reply; 82+ messages in thread
From: Joao Martins @ 2024-07-17 11:04 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 17/07/2024 11:05, Duan, Zhenzhong wrote:
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>> creation
>>
>> On 17/07/2024 03:18, Duan, Zhenzhong wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>> Subject: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>> creation
>>>>
>>>> There's generally two modes of operation for IOMMUFD:
>>>>
>>>> * The simple user API which intends to perform relatively simple things
>>>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>>>> and mainly performs IOAS_MAP and UNMAP.
>>>>
>>>> * The native IOMMUFD API where you have fine grained control of the
>>>> IOMMU domain and model it accordingly. This is where most new feature
>>>> are being steered to.
>>>>
>>>> For dirty tracking 2) is required, as it needs to ensure that
>>>> the stage-2/parent IOMMU domain will only attach devices
>>>> that support dirty tracking (so far it is all homogeneous in x86, likely
>>>> not the case for smmuv3). Such invariant on dirty tracking provides a
>>>> useful guarantee to VMMs that will refuse incompatible device
>>>> attachments for IOMMU domains.
>>>>
>>>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>>>> responsible for creating an IOMMU domain. This is contrast to the
>>>> 'simple API' where the IOMMU domain is created by IOMMUFD
>>>> automatically
>>>> when it attaches to VFIO (usually referred as autodomains) but it has
>>>> the needed handling for mdevs.
>>>>
>>>> To support dirty tracking with the advanced IOMMUFD API, it needs
>>>> similar logic, where IOMMU domains are created and devices attached to
>>>> compatible domains. Essentially mimmicing kernel
>>>> iommufd_device_auto_get_domain(). With mdevs given there's no
>> IOMMU
>>>> domain
>>>> it falls back to IOAS attach.
>>>>
>>>> The auto domain logic allows different IOMMU domains to be created
>> when
>>>> DMA dirty tracking is not desired (and VF can provide it), and others
>> where
>>>> it is. Here is not used in this way here given how VFIODevice migration
>>>> state is initialized after the device attachment. But such mixed mode of
>>>> IOMMU dirty tracking + device dirty tracking is an improvement that can
>>>> be added on. Keep the 'all of nothing' of type1 approach that we have
>>>> been using so far between container vs device dirty tracking.
>>>>
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> ---
>>>> include/hw/vfio/vfio-common.h | 9 ++++
>>>> include/sysemu/iommufd.h | 5 +++
>>>> backends/iommufd.c | 30 +++++++++++++
>>>> hw/vfio/iommufd.c | 82
>>>> +++++++++++++++++++++++++++++++++++
>>>> backends/trace-events | 1 +
>>>> 5 files changed, 127 insertions(+)
>>>>
>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>>>> common.h
>>>> index 7419466bca92..2dd468ce3c02 100644
>>>> --- a/include/hw/vfio/vfio-common.h
>>>> +++ b/include/hw/vfio/vfio-common.h
>>>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>>>
>>>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>>
>>>> +typedef struct VFIOIOASHwpt {
>>>> + uint32_t hwpt_id;
>>>> + QLIST_HEAD(, VFIODevice) device_list;
>>>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>>>> +} VFIOIOASHwpt;
>>>> +
>>>> typedef struct VFIOIOMMUFDContainer {
>>>> VFIOContainerBase bcontainer;
>>>> IOMMUFDBackend *be;
>>>> uint32_t ioas_id;
>>>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>>>> } VFIOIOMMUFDContainer;
>>>>
>>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>>>> VFIO_IOMMU_IOMMUFD);
>>>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>>>> HostIOMMUDevice *hiod;
>>>> int devid;
>>>> IOMMUFDBackend *iommufd;
>>>> + VFIOIOASHwpt *hwpt;
>>>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>>>> } VFIODevice;
>>>>
>>>> struct VFIODeviceOps {
>>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>>> index 57d502a1c79a..e917e7591d05 100644
>>>> --- a/include/sysemu/iommufd.h
>>>> +++ b/include/sysemu/iommufd.h
>>>> @@ -50,6 +50,11 @@ int
>>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>> ioas_id,
>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>> uint32_t
>>>> devid,
>>>> uint32_t *type, void *data, uint32_t len,
>>>> uint64_t *caps, Error **errp);
>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>>>> dev_id,
>>>> + uint32_t pt_id, uint32_t flags,
>>>> + uint32_t data_type, uint32_t data_len,
>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>> + Error **errp);
>>>>
>>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>>>> TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>>> #endif
>>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>>> index 2b3d51af26d2..5d3dfa917415 100644
>>>> --- a/backends/iommufd.c
>>>> +++ b/backends/iommufd.c
>>>> @@ -208,6 +208,36 @@ int
>>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>> ioas_id,
>>>> return ret;
>>>> }
>>>>
>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>>>> dev_id,
>>>> + uint32_t pt_id, uint32_t flags,
>>>> + uint32_t data_type, uint32_t data_len,
>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>> + Error **errp)
>>>> +{
>>>> + int ret, fd = be->fd;
>>>> + struct iommu_hwpt_alloc alloc_hwpt = {
>>>> + .size = sizeof(struct iommu_hwpt_alloc),
>>>> + .flags = flags,
>>>> + .dev_id = dev_id,
>>>> + .pt_id = pt_id,
>>>> + .data_type = data_type,
>>>> + .data_len = data_len,
>>>> + .data_uptr = (uint64_t)data_ptr,
>>>> + };
>>>> +
>>>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>>>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags,
>> data_type,
>>>> + data_len, (uint64_t)data_ptr,
>>>> + alloc_hwpt.out_hwpt_id, ret);
>>>> + if (ret) {
>>>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>>>> + return false;
>>>> + }
>>>> +
>>>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>>>> + return true;
>>>> +}
>>>> +
>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>> uint32_t
>>>> devid,
>>>> uint32_t *type, void *data, uint32_t len,
>>>> uint64_t *caps, Error **errp)
>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>> index 077dea8f1b64..325c7598d5a1 100644
>>>> --- a/hw/vfio/iommufd.c
>>>> +++ b/hw/vfio/iommufd.c
>>>> @@ -212,10 +212,86 @@ static bool
>>>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>>>> return true;
>>>> }
>>>>
>>>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>> + VFIOIOMMUFDContainer *container,
>>>> + Error **errp)
>>>> +{
>>>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>>>> + uint32_t flags = 0;
>>>> + VFIOIOASHwpt *hwpt;
>>>> + uint32_t hwpt_id;
>>>> + int ret;
>>>> +
>>>> + /* Try to find a domain */
>>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>>>> errp);
>>>
>>> If there is already an hwpt that supports dirty tracking.
>>> Another device that doesn't support dirty tracking attaches to this hwpt,
>> will it succeed?
>>>
>>
>> It returns -EINVAL, and we handle that right after this statement. Which
>> means
>> another HWPT is created.
>
> Looked into kernel code, I didn't see the check about dirty tracking between device and hwpt, do you know which func does that?
>
A device is associated with a group (aka IOMMU instance) and those checks
happen when a device in a group is attached for the first time, or when it
belongs to some *other* group and gets attached to this domain with dirty
tracking enforced. If the device belongs to the same group that already had a
device attached, there's just a bump in the refcount and the device is added to
the /same group/ device list. Otherwise the device belongs to a different group
and it's being attached to a domain, so the various checks get triggered (dirty
tracking being one of them). These attachment validation checks are part of the
iommu driver, not core (the core just sees a .attach_dev() failure).
It usually follows this codepath when the group attachment checks are first
being done:
vfio_iommufd_physical_attach_ioas()
  iommufd_device_attach()
    iommufd_device_do_attach()
      iommufd_hw_pagetable_attach()
        iommu_attach_group()
          ...
          __iommu_attach_device()
Then each iommu driver defines the compatibility checks: if the domain has
dirty_ops set (which comes from the ALLOC_DIRTY_TRACKING flag) and the IOMMU
backing the device doesn't have dirty tracking, the driver returns -EINVAL,
e.g. on Intel IOMMU:
intel_iommu_attach_device()
  prepare_domain_attach_device():
    if (domain->dirty_ops && !ssads_supported(iommu))
        return -EINVAL;
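
A rough user-space model of the behaviour described above, for illustration
only. The types and functions (model_domain, model_group,
model_driver_attach_check, model_group_attach) are hypothetical and are not
kernel code. In particular, rejecting a group that is already attached to a
different domain is a simplification of the model. The point is just that the
driver-level check runs when a group is first attached to a domain, and that a
second device of the same group only bumps a refcount.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's data structures. */
struct model_domain {
    bool has_dirty_ops;                 /* allocated with dirty tracking */
};

struct model_group {
    bool iommu_dirty_capable;           /* capability of the backing IOMMU */
    const struct model_domain *domain;  /* NULL until the first attach */
    unsigned int refcount;              /* devices of this group attached */
};

/* Driver-side compatibility check, modelled on the snippet above. */
static int model_driver_attach_check(const struct model_group *g,
                                     const struct model_domain *d)
{
    if (d->has_dirty_ops && !g->iommu_dirty_capable) {
        return -EINVAL;
    }
    return 0;
}

static int model_group_attach(struct model_group *g,
                              const struct model_domain *d)
{
    int ret;

    assert(g != NULL && d != NULL);

    /* Same group, same domain: no new checks, just a refcount bump. */
    if (g->domain == d) {
        g->refcount++;
        return 0;
    }

    /* Simplification: the model rejects switching to another domain. */
    if (g->domain != NULL) {
        return -EINVAL;
    }

    /* First attach of this group: the driver checks get triggered. */
    ret = model_driver_attach_check(g, d);
    if (ret) {
        return ret;
    }

    g->domain = d;
    g->refcount = 1;
    return 0;
}
```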
>>
>>> If existing hwpt doesn't support dirty tracking.
>>> Another device supporting dirty tracking attaches to that hwpt, what will
>> happen?
>>>
>>
>> Hmm, It succeeds as there's no incompatbility. At the very least I plan on
>> blocking migration if the device neither has VF dirty tracking, nor IOMMU
>> dirty
>> tracking (and patch 11 needs to be adjusted to check hwpt_flags instead of
>> container).
>
> When bcontainer->dirty_pages_supported is true, I think that container should only contains hwpt list that support dirty tracking. All hwpt not supporting dirty tracking should be in other container.
>
Well, but we are adopting this auto domains scheme and it works for any device,
dirty tracking or not. We already track hwpt flags, so we know which ones support
dirty tracking. This differentiation would (IMHO) complicate things more, and I
am not sure of the gain.
> If device supports dirty tracking, it should bypass attaching container that doesn't support dirty tracking. Vise versa.
> This way we can support the mixing environment.
>
It's not that easy, as the whole flow doesn't handle this mixed mode (even
excluding this series). We would have to have device-dirty-tracking start all
non-disabled device trackers first [and stop them as well], and then we would
always iterate those first (if device dirty trackers are active), and then defer
to the IOMMU tracker for the devices that don't have one.
But given that this mixed mode might be prone to regressions, plus me being
dangerously close to soft freeze, I was deeming it a follow-up. Hence I am
hoping to improve detection, so that when the IOMMU doesn't provide the lowest
common denominator for the 'all or nothing' mode, migration is blocked. I can
turn that if statement in {start,query}_dirty_tracking into an assert if that
improves things.
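
To make the 'all or nothing' detection concrete, here is a small sketch of the
rule stated earlier in the thread: migration gets blocked if any device has
neither VF (device) dirty tracking nor IOMMU dirty tracking. The struct and
function names are hypothetical and this is not the code from the series; it
only models the decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device view; field names are illustrative only. */
struct model_vdev {
    bool device_dirty_tracking;  /* VF can report its own dirty pages */
    bool hwpt_dirty_tracking;    /* attached domain has dirty tracking */
};

/*
 * Returns true when migration should be blocked: at least one device
 * has no dirty tracker of either kind.
 */
static bool model_migration_blocked(const struct model_vdev *devs, size_t n)
{
    size_t i;

    assert(devs != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (!devs[i].device_dirty_tracking && !devs[i].hwpt_dirty_tracking) {
            return true;
        }
    }
    return false;
}
```

Note that this per-device rule is weaker than a full mixed mode: it only
decides whether migration may proceed, not which tracker is queried first.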
>
>>
>> Qemu right now doesn't handle heteregenous environment, it's all of
>> nothing
>> approach even before this patchset. Additionally, I am not sure server
>> environments are applicable here. So essentially I kept the status quo --
>> more
>> follow-up is needed to support a mix and match of IOMMU + VF dirty
>> tracking too.
>> The challenge is having the migration state of VFIO device initialized early
>> enough that we can make all sort of decisions whether IOMMU dirty tracking
>> is
>> desired on a per-device basis.
^ permalink raw reply [flat|nested] 82+ messages in thread
* RE: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-17 11:04 ` Joao Martins
@ 2024-07-18 7:44 ` Duan, Zhenzhong
2024-07-18 9:16 ` Joao Martins
0 siblings, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2024-07-18 7:44 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>creation
>
>On 17/07/2024 11:05, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>>> creation
>>>
>>> On 17/07/2024 03:18, Duan, Zhenzhong wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>> Subject: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>>> creation
>>>>>
>>>>> There's generally two modes of operation for IOMMUFD:
>>>>>
>>>>> * The simple user API which intends to perform relatively simple things
>>>>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to
>VFIO
>>>>> and mainly performs IOAS_MAP and UNMAP.
>>>>>
>>>>> * The native IOMMUFD API where you have fine grained control of the
>>>>> IOMMU domain and model it accordingly. This is where most new
>feature
>>>>> are being steered to.
>>>>>
>>>>> For dirty tracking 2) is required, as it needs to ensure that
>>>>> the stage-2/parent IOMMU domain will only attach devices
>>>>> that support dirty tracking (so far it is all homogeneous in x86, likely
>>>>> not the case for smmuv3). Such invariant on dirty tracking provides a
>>>>> useful guarantee to VMMs that will refuse incompatible device
>>>>> attachments for IOMMU domains.
>>>>>
>>>>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>>>>> responsible for creating an IOMMU domain. This is contrast to the
>>>>> 'simple API' where the IOMMU domain is created by IOMMUFD
>>>>> automatically
>>>>> when it attaches to VFIO (usually referred as autodomains) but it has
>>>>> the needed handling for mdevs.
>>>>>
>>>>> To support dirty tracking with the advanced IOMMUFD API, it needs
>>>>> similar logic, where IOMMU domains are created and devices attached
>to
>>>>> compatible domains. Essentially mimmicing kernel
>>>>> iommufd_device_auto_get_domain(). With mdevs given there's no
>>> IOMMU
>>>>> domain
>>>>> it falls back to IOAS attach.
>>>>>
>>>>> The auto domain logic allows different IOMMU domains to be created
>>> when
>>>>> DMA dirty tracking is not desired (and VF can provide it), and others
>>> where
>>>>> it is. Here is not used in this way here given how VFIODevice migration
>>>>> state is initialized after the device attachment. But such mixed mode of
>>>>> IOMMU dirty tracking + device dirty tracking is an improvement that
>can
>>>>> be added on. Keep the 'all of nothing' of type1 approach that we have
>>>>> been using so far between container vs device dirty tracking.
>>>>>
>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>> ---
>>>>> include/hw/vfio/vfio-common.h | 9 ++++
>>>>> include/sysemu/iommufd.h | 5 +++
>>>>> backends/iommufd.c | 30 +++++++++++++
>>>>> hw/vfio/iommufd.c | 82
>>>>> +++++++++++++++++++++++++++++++++++
>>>>> backends/trace-events | 1 +
>>>>> 5 files changed, 127 insertions(+)
>>>>>
>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>>>>> common.h
>>>>> index 7419466bca92..2dd468ce3c02 100644
>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>>>>
>>>>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>>>
>>>>> +typedef struct VFIOIOASHwpt {
>>>>> + uint32_t hwpt_id;
>>>>> + QLIST_HEAD(, VFIODevice) device_list;
>>>>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>>>>> +} VFIOIOASHwpt;
>>>>> +
>>>>> typedef struct VFIOIOMMUFDContainer {
>>>>> VFIOContainerBase bcontainer;
>>>>> IOMMUFDBackend *be;
>>>>> uint32_t ioas_id;
>>>>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>>>>> } VFIOIOMMUFDContainer;
>>>>>
>>>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>>>>> VFIO_IOMMU_IOMMUFD);
>>>>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>>>>> HostIOMMUDevice *hiod;
>>>>> int devid;
>>>>> IOMMUFDBackend *iommufd;
>>>>> + VFIOIOASHwpt *hwpt;
>>>>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>>>>> } VFIODevice;
>>>>>
>>>>> struct VFIODeviceOps {
>>>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>>>> index 57d502a1c79a..e917e7591d05 100644
>>>>> --- a/include/sysemu/iommufd.h
>>>>> +++ b/include/sysemu/iommufd.h
>>>>> @@ -50,6 +50,11 @@ int
>>>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>>> ioas_id,
>>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>>> uint32_t
>>>>> devid,
>>>>> uint32_t *type, void *data, uint32_t len,
>>>>> uint64_t *caps, Error **errp);
>>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be,
>uint32_t
>>>>> dev_id,
>>>>> + uint32_t pt_id, uint32_t flags,
>>>>> + uint32_t data_type, uint32_t data_len,
>>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>>> + Error **errp);
>>>>>
>>>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>>>>> TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>>>> #endif
>>>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>>>> index 2b3d51af26d2..5d3dfa917415 100644
>>>>> --- a/backends/iommufd.c
>>>>> +++ b/backends/iommufd.c
>>>>> @@ -208,6 +208,36 @@ int
>>>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>>> ioas_id,
>>>>> return ret;
>>>>> }
>>>>>
>>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be,
>uint32_t
>>>>> dev_id,
>>>>> + uint32_t pt_id, uint32_t flags,
>>>>> + uint32_t data_type, uint32_t data_len,
>>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>>> + Error **errp)
>>>>> +{
>>>>> + int ret, fd = be->fd;
>>>>> + struct iommu_hwpt_alloc alloc_hwpt = {
>>>>> + .size = sizeof(struct iommu_hwpt_alloc),
>>>>> + .flags = flags,
>>>>> + .dev_id = dev_id,
>>>>> + .pt_id = pt_id,
>>>>> + .data_type = data_type,
>>>>> + .data_len = data_len,
>>>>> + .data_uptr = (uint64_t)data_ptr,
>>>>> + };
>>>>> +
>>>>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>>>>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags,
>>> data_type,
>>>>> + data_len, (uint64_t)data_ptr,
>>>>> + alloc_hwpt.out_hwpt_id, ret);
>>>>> + if (ret) {
>>>>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>>>>> + return false;
>>>>> + }
>>>>> +
>>>>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>>>>> + return true;
>>>>> +}
>>>>> +
>>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>>> uint32_t
>>>>> devid,
>>>>> uint32_t *type, void *data, uint32_t len,
>>>>> uint64_t *caps, Error **errp)
>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>> index 077dea8f1b64..325c7598d5a1 100644
>>>>> --- a/hw/vfio/iommufd.c
>>>>> +++ b/hw/vfio/iommufd.c
>>>>> @@ -212,10 +212,86 @@ static bool
>>>>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error
>**errp)
>>>>> return true;
>>>>> }
>>>>>
>>>>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>>> + VFIOIOMMUFDContainer *container,
>>>>> + Error **errp)
>>>>> +{
>>>>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>>>>> + uint32_t flags = 0;
>>>>> + VFIOIOASHwpt *hwpt;
>>>>> + uint32_t hwpt_id;
>>>>> + int ret;
>>>>> +
>>>>> + /* Try to find a domain */
>>>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt-
>>hwpt_id,
>>>>> errp);
>>>>
>>>> If there is already an hwpt that supports dirty tracking.
>>>> Another device that doesn't support dirty tracking attaches to this hwpt,
>>> will it succeed?
>>>>
>>>
>>> It returns -EINVAL, and we handle that right after this statement. Which
>>> means
>>> another HWPT is created.
>>
>> Looked into kernel code, I didn't see the check about dirty tracking
>between device and hwpt, do you know which func does that?
>>
>
>A device is associated with a group (aka IOMMU instance) and those checks
>happens when the device in a group is firstly being attached the first time or
>belongs to some *other* group and gets attach to this domain with dirty
>tracking
>enforced. If the device belongs to the same group that had a device attached
>already there's just a bump in the refcount and device is added to the /same
>group/ device list. Otherwise the device belongs to a different group and it's
>being attached to a domain and the various checks get triggered (dirty
>tracking
>being one of them). These attachment validation checks are part of the
>iommu
>driver, not core (the core just sees a .attach_dev() failure).
>
>Usually follows this codepath when the group attachment checks are firstly
>being
>done:
>
>vfio_iommufd_physical_attach_ioas()
> iommufd_device_attach()
> iommufd_device_do_attach()
> iommufd_hw_pagetable_attach()
> iommu_attach_group()
> ...
> __iommu_attach_device()
>
>Then each iommu driver defines the compatibility checks and if the domain
>has
>dirty_ops set (that comes from this ALLOC_DIRTY_TRACKING flag) and the
>IOMMU
>backing the device doesn't have dirty tracking the driver returns -EINVAL
>e.g. on Intel IOMMU:
>
>intel_iommu_attach_device()
> prepare_domain_attach_device():
> domain->dirty_ops && !ssads_supported(iommu)
> return -EINVAL;
Understood, thanks.
>
>
>>>
>>>> If existing hwpt doesn't support dirty tracking.
>>>> Another device supporting dirty tracking attaches to that hwpt, what
>will
>>> happen?
>>>>
>>>
>>> Hmm, It succeeds as there's no incompatbility. At the very least I plan on
>>> blocking migration if the device neither has VF dirty tracking, nor IOMMU
>>> dirty
>>> tracking (and patch 11 needs to be adjusted to check hwpt_flags instead
>of
>>> container).
>>
>> When bcontainer->dirty_pages_supported is true, I think that container
>should only contains hwpt list that support dirty tracking. All hwpt not
>supporting dirty tracking should be in other container.
>>
>Well but we are adopting this auto domains scheme and works for any
>device,
>dirty tracking or not. We already track hwpt flags so we know which ones
>support
>dirty tracking. This differentiation would (IMHO) complicate more and I am
>not
>sure the gain
OK, I was trying to make bcontainer->dirty_pages_supported accurate because it is used in many functions, such as vfio_get_dirty_bitmap(), which require an accurate value. If there is a mix of hwpts in that container, that's impossible.
But since you say you want to address the mix issue in a follow-up and presume all hardware is homogeneous for now, then OK, there is no conflict.
>
>> If device supports dirty tracking, it should bypass attaching container that
>doesn't support dirty tracking. Vise versa.
>> This way we can support the mixing environment.
>>
>
>It's not that easy as the whole flow doesn't handle this mixed mode (even
>excluding this series). We would to have device-dirty-tracking start all
>non-disabled device trackers first [and stop them as well], and then we
>would
>always iterate those first (if device dirty trackers are active), and then defer
>to IOMMU tracker for those who don't.
Why is device-dirty-tracking preferred over IOMMU dirty tracking?
Imagine if many devices are attached to the same domain.
>
>But given this mixed mode might be prone to regressions plus with me being
>dangerously close to softfreeze too, I was deeming it follow-up. And hence
>hoping I improve detection when the IOMMU doesn't provide the lowest
>common
>denominator for the 'all or nothing' mode then it would block migration. I
>can
>turn that if statement in {start,query}_dirty_tracking into an assert if that
>improves things.
OK
>
>>
>>>
>>> Qemu right now doesn't handle heteregenous environment, it's all of
>>> nothing
>>> approach even before this patchset. Additionally, I am not sure server
>>> environments are applicable here. So essentially I kept the status quo --
>>> more
>>> follow-up is needed to support a mix and match of IOMMU + VF dirty
>>> tracking too.
>>> The challenge is having the migration state of VFIO device initialized early
>>> enough that we can make all sort of decisions whether IOMMU dirty
>tracking
>>> is
>>> desired on a per-device basis.
OK.
Thanks
Zhenzhong
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-18 7:44 ` Duan, Zhenzhong
@ 2024-07-18 9:16 ` Joao Martins
2024-07-19 2:36 ` Duan, Zhenzhong
0 siblings, 1 reply; 82+ messages in thread
From: Joao Martins @ 2024-07-18 9:16 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 18/07/2024 08:44, Duan, Zhenzhong wrote:
>>>>> If existing hwpt doesn't support dirty tracking.
>>>>> Another device supporting dirty tracking attaches to that hwpt, what
>> will
>>>> happen?
>>>>>
>>>>
>>>> Hmm, It succeeds as there's no incompatbility. At the very least I plan on
>>>> blocking migration if the device neither has VF dirty tracking, nor IOMMU
>>>> dirty
>>>> tracking (and patch 11 needs to be adjusted to check hwpt_flags instead
>> of
>>>> container).
>>>
>>> When bcontainer->dirty_pages_supported is true, I think that container
>> should only contains hwpt list that support dirty tracking. All hwpt not
>> supporting dirty tracking should be in other container.
>>>
>> Well but we are adopting this auto domains scheme and works for any
>> device,
>> dirty tracking or not. We already track hwpt flags so we know which ones
>> support
>> dirty tracking. This differentiation would (IMHO) complicate more and I am
>> not
>> sure the gain
>
> OK, I was trying to make bcontainer->dirty_pages_supported accurate because it is used in many functions such as vfio_get_dirty_bitmap() which require an accurate value. If there is mix of hwpt in that container, that's impossible.
>
> But as you say you want to address the mix issue in a follow-up and presume all are homogeneous hw for now, then OK, there is no conflict.
>
Right
>>
>>> If device supports dirty tracking, it should bypass attaching container that
>> doesn't support dirty tracking. Vise versa.
>>> This way we can support the mixing environment.
>>>
>>
>> It's not that easy as the whole flow doesn't handle this mixed mode (even
>> excluding this series). We would to have device-dirty-tracking start all
>> non-disabled device trackers first [and stop them as well], and then we
>> would
>> always iterate those first (if device dirty trackers are active), and then defer
>> to IOMMU tracker for those who don't.
>
> Why is device-dirty-tracking preferred over IOMMU dirty tracking?
> Imagine if many devices attached to same domain.
>
The heuristic or expectation is that device dirty tracking doesn't involve a
compromise for SW because it can a) report dirty IOVA ranges at the lowest
granularity with b) no DMA penalty. With the IOMMU though, SW needs to worry
about managing page tables to dictate the granularity, and those take longer to
walk the deeper the level we descend into. I used to think that with the IOMMU
we have a DMA penalty (because of the IOTLB flushes to clear the dirty bit, and
IOTLB cache misses) but I haven't seen that materialize in the field yet (at
least for 100Gbit/s rates).
TL;DR At the end of the day with device dirty tracking you have less to worry
about, and it's the VF doing most of the heavy lifting. In theory with device
dirty tracking you could even perform sub-base-page tracking if the device
allows it.
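To make the preference order concrete, here is a minimal sketch of the per-device choice discussed in this thread: use the device tracker when there is one, defer to the IOMMU tracker otherwise, and block migration when a device has neither. Everything here (SketchDevice, pick_tracker(), migration_allowed()) is hypothetical and only illustrates the idea; it is not QEMU code and the series does not implement this mixed mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tracker kinds; not QEMU's real types. */
typedef enum {
    TRACKER_NONE,   /* no dirty tracking source: migration must be blocked */
    TRACKER_DEVICE, /* the VF reports the IOVA ranges it dirtied */
    TRACKER_IOMMU,  /* dirty bits are read back from the IOMMU page tables */
} TrackerKind;

/* Hypothetical per-device view of what is available. */
typedef struct {
    const char *name;
    bool device_dirty_tracking; /* device (VF) dirty tracking available */
    bool iommu_dirty_tracking;  /* attached domain supports dirty tracking */
} SketchDevice;

/* Prefer the device tracker and defer to the IOMMU only when it is missing. */
static TrackerKind pick_tracker(const SketchDevice *dev)
{
    assert(dev);
    if (dev->device_dirty_tracking) {
        return TRACKER_DEVICE;
    }
    if (dev->iommu_dirty_tracking) {
        return TRACKER_IOMMU;
    }
    return TRACKER_NONE;
}

/* Migration is allowed only if every device has some dirty tracker. */
static bool migration_allowed(const SketchDevice *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pick_tracker(&devs[i]) == TRACKER_NONE) {
            return false;
        }
    }
    return true;
}
```

A caller would run pick_tracker() once per device when tracking starts, and would have to start and stop each tracker kind separately, which is the part the existing flow does not handle.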
^ permalink raw reply [flat|nested] 82+ messages in thread
* RE: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
2024-07-18 9:16 ` Joao Martins
@ 2024-07-19 2:36 ` Duan, Zhenzhong
0 siblings, 0 replies; 82+ messages in thread
From: Duan, Zhenzhong @ 2024-07-19 2:36 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain
>creation
>
>On 18/07/2024 08:44, Duan, Zhenzhong wrote:
>>>>>> If existing hwpt doesn't support dirty tracking.
>>>>>> Another device supporting dirty tracking attaches to that hwpt, what
>>> will
>>>>> happen?
>>>>>>
>>>>>
>>>>> Hmm, It succeeds as there's no incompatbility. At the very least I plan
>on
>>>>> blocking migration if the device neither has VF dirty tracking, nor
>IOMMU
>>>>> dirty
>>>>> tracking (and patch 11 needs to be adjusted to check hwpt_flags
>instead
>>> of
>>>>> container).
>>>>
>>>> When bcontainer->dirty_pages_supported is true, I think that container
>>> should only contains hwpt list that support dirty tracking. All hwpt not
>>> supporting dirty tracking should be in other container.
>>>>
>>> Well but we are adopting this auto domains scheme and works for any
>>> device,
>>> dirty tracking or not. We already track hwpt flags so we know which ones
>>> support
>>> dirty tracking. This differentiation would (IMHO) complicate more and I
>am
>>> not
>>> sure the gain
>>
>> OK, I was trying to make bcontainer->dirty_pages_supported accurate
>because it is used in many functions such as vfio_get_dirty_bitmap() which
>require an accurate value. If there is mix of hwpt in that container, that's
>impossible.
>>
>> But as you say you want to address the mix issue in a follow-up and
>presume all are homogeneous hw for now, then OK, there is no conflict.
>>
>
>Right
>
>>>
>>>> If device supports dirty tracking, it should bypass attaching container
>that
>>> doesn't support dirty tracking. Vise versa.
>>>> This way we can support the mixing environment.
>>>>
>>>
>>> It's not that easy as the whole flow doesn't handle this mixed mode (even
>>> excluding this series). We would to have device-dirty-tracking start all
>>> non-disabled device trackers first [and stop them as well], and then we
>>> would
>>> always iterate those first (if device dirty trackers are active), and then
>defer
>>> to IOMMU tracker for those who don't.
>>
>> Why is device-dirty-tracking preferred over IOMMU dirty tracking?
>> Imagine if many devices attached to same domain.
>>
>
>The heuristic or expectation is that device dirty tracking doesn't involve a
>compromise for SW because it can a) perform lowest granularity of IOVA
>range
>being dirty with b) no DMA penalty. With IOMMU though, SW needs to
>worry about
>managing page tables to dictate the granularity and those take time to walk
>the
>deeper the level we descend into. I used to think that IOMMU we have DMA
>penalty
>(because of the IOTLB flushes to clear dirty bit, and IOTLB cache misses) but I
>haven't yet that materialized in the field yet (at least for 100Gbit/s rates).
>
>TL;DR At the end of the day with device dirty tracking you have less to worry
>about, and it's the VF doing most of the heavy lifting. In theory with device
>dirty tracking you could even perform sub basepage tracking if the device
>allows
>it to do so.
Clear, thanks Joao.
BRs.
Zhenzhong
^ permalink raw reply [flat|nested] 82+ messages in thread
* [PATCH v4 06/12] vfio/{iommufd,container}: Remove caps::aw_bits
2024-07-12 11:46 [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (4 preceding siblings ...)
2024-07-12 11:46 ` [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation Joao Martins
@ 2024-07-12 11:46 ` Joao Martins
2024-07-16 10:19 ` Cédric Le Goater
2024-07-16 17:40 ` Eric Auger
2024-07-12 11:46 ` [PATCH v4 07/12] vfio/{iommufd, container}: Initialize HostIOMMUDeviceCaps during attach_device() Joao Martins via
` (6 subsequent siblings)
12 siblings, 2 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-12 11:46 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
In preparation to moving HostIOMMUDevice realize() being able to called
early during attach_device(), remove properties that rely on container
being initialized.
This means removing caps::aw_bits which requires the
bcontainer::iova_ranges to be inititalized after device is actually
attached. Instead defer that to .get_cap() and call
vfio_device_get_aw_bits() directly.
Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/sysemu/host_iommu_device.h | 1 -
backends/iommufd.c | 3 ++-
hw/vfio/container.c | 5 +----
hw/vfio/iommufd.c | 1 -
4 files changed, 3 insertions(+), 7 deletions(-)
diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
index ee6c813c8b22..20e77cf54568 100644
--- a/include/sysemu/host_iommu_device.h
+++ b/include/sysemu/host_iommu_device.h
@@ -24,7 +24,6 @@
*/
typedef struct HostIOMMUDeviceCaps {
uint32_t type;
- uint8_t aw_bits;
} HostIOMMUDeviceCaps;
#define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 5d3dfa917415..41a9dec3b2c5 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -18,6 +18,7 @@
#include "qemu/error-report.h"
#include "monitor/monitor.h"
#include "trace.h"
+#include "hw/vfio/vfio-common.h"
#include <sys/ioctl.h>
#include <linux/iommufd.h>
@@ -270,7 +271,7 @@ static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
case HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE:
return caps->type;
case HOST_IOMMU_DEVICE_CAP_AW_BITS:
- return caps->aw_bits;
+ return vfio_device_get_aw_bits(hiod->agent);
default:
error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
return -EINVAL;
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 88ede913d6f7..c27f448ba26e 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -1144,7 +1144,6 @@ static bool hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
VFIODevice *vdev = opaque;
hiod->name = g_strdup(vdev->name);
- hiod->caps.aw_bits = vfio_device_get_aw_bits(vdev);
hiod->agent = opaque;
return true;
@@ -1153,11 +1152,9 @@ static bool hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
static int hiod_legacy_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
Error **errp)
{
- HostIOMMUDeviceCaps *caps = &hiod->caps;
-
switch (cap) {
case HOST_IOMMU_DEVICE_CAP_AW_BITS:
- return caps->aw_bits;
+ return vfio_device_get_aw_bits(hiod->agent);
default:
error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
return -EINVAL;
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 325c7598d5a1..873c919e319c 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -722,7 +722,6 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
hiod->name = g_strdup(vdev->name);
caps->type = type;
- caps->aw_bits = vfio_device_get_aw_bits(vdev);
return true;
}
--
2.17.2
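The effect of deferring the address-width lookup to .get_cap() can be shown with a small standalone sketch. It assumes a simplified model in which the width is derived from the container's highest usable IOVA, and it falls back to a maximum when no container is attached yet. All names (SketchContainer, SketchDevice, SketchHiod, SKETCH_AW_BITS_MAX and both functions) are hypothetical stand-ins, not the real vfio_device_get_aw_bits() or HostIOMMUDevice code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_AW_BITS_MAX 64 /* hypothetical fallback when nothing is known */

/* Hypothetical container: only knows its highest usable IOVA once attached. */
typedef struct {
    uint64_t max_iova; /* 0 means "IOVA ranges not initialized yet" */
} SketchContainer;

typedef struct {
    SketchContainer *container; /* NULL until the device is attached */
} SketchDevice;

typedef struct {
    SketchDevice *agent; /* back-pointer to the device, set at realize time */
} SketchHiod;

/* Compute the address width from whatever the container knows right now. */
static int sketch_device_get_aw_bits(const SketchDevice *dev)
{
    uint64_t end;
    int bits = 0;

    if (!dev->container || dev->container->max_iova == 0) {
        return SKETCH_AW_BITS_MAX;
    }
    /* Count the bits needed to represent the highest IOVA. */
    for (end = dev->container->max_iova; end; end >>= 1) {
        bits++;
    }
    return bits;
}

/*
 * Query-time lookup: nothing is cached at realize time, so the answer
 * reflects the container state at the moment the capability is requested.
 */
static int sketch_hiod_get_aw_bits(const SketchHiod *hiod)
{
    assert(hiod && hiod->agent);
    return sketch_device_get_aw_bits(hiod->agent);
}
```

With a cached field, a realize() that runs before the container is set up would have stored the fallback value for good; computing on demand avoids that ordering dependency.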
^ permalink raw reply related [flat|nested] 82+ messages in thread
* Re: [PATCH v4 06/12] vfio/{iommufd,container}: Remove caps::aw_bits
2024-07-12 11:46 ` [PATCH v4 06/12] vfio/{iommufd,container}: Remove caps::aw_bits Joao Martins
@ 2024-07-16 10:19 ` Cédric Le Goater
2024-07-16 17:40 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2024-07-16 10:19 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:46, Joao Martins wrote:
> In preparation to moving HostIOMMUDevice realize() being able to called
> early during attach_device(), remove properties that rely on container
> being initialized.
>
> This means removing caps::aw_bits which requires the
> bcontainer::iova_ranges to be inititalized after device is actually
> attached. Instead defer that to .get_cap() and call
> vfio_device_get_aw_bits() directly.
>
> Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> include/sysemu/host_iommu_device.h | 1 -
> backends/iommufd.c | 3 ++-
> hw/vfio/container.c | 5 +----
> hw/vfio/iommufd.c | 1 -
> 4 files changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
> index ee6c813c8b22..20e77cf54568 100644
> --- a/include/sysemu/host_iommu_device.h
> +++ b/include/sysemu/host_iommu_device.h
> @@ -24,7 +24,6 @@
> */
> typedef struct HostIOMMUDeviceCaps {
> uint32_t type;
> - uint8_t aw_bits;
> } HostIOMMUDeviceCaps;
>
> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 5d3dfa917415..41a9dec3b2c5 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -18,6 +18,7 @@
> #include "qemu/error-report.h"
> #include "monitor/monitor.h"
> #include "trace.h"
> +#include "hw/vfio/vfio-common.h"
> #include <sys/ioctl.h>
> #include <linux/iommufd.h>
>
> @@ -270,7 +271,7 @@ static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
> case HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE:
> return caps->type;
> case HOST_IOMMU_DEVICE_CAP_AW_BITS:
> - return caps->aw_bits;
> + return vfio_device_get_aw_bits(hiod->agent);
> default:
> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
> return -EINVAL;
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 88ede913d6f7..c27f448ba26e 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -1144,7 +1144,6 @@ static bool hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> VFIODevice *vdev = opaque;
>
> hiod->name = g_strdup(vdev->name);
> - hiod->caps.aw_bits = vfio_device_get_aw_bits(vdev);
> hiod->agent = opaque;
>
> return true;
> @@ -1153,11 +1152,9 @@ static bool hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> static int hiod_legacy_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
> Error **errp)
> {
> - HostIOMMUDeviceCaps *caps = &hiod->caps;
> -
> switch (cap) {
> case HOST_IOMMU_DEVICE_CAP_AW_BITS:
> - return caps->aw_bits;
> + return vfio_device_get_aw_bits(hiod->agent);
> default:
> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
> return -EINVAL;
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 325c7598d5a1..873c919e319c 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -722,7 +722,6 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>
> hiod->name = g_strdup(vdev->name);
> caps->type = type;
> - caps->aw_bits = vfio_device_get_aw_bits(vdev);
>
> return true;
> }
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 06/12] vfio/{iommufd,container}: Remove caps::aw_bits
2024-07-12 11:46 ` [PATCH v4 06/12] vfio/{iommufd,container}: Remove caps::aw_bits Joao Martins
2024-07-16 10:19 ` Cédric Le Goater
@ 2024-07-16 17:40 ` Eric Auger
2024-07-16 18:22 ` Joao Martins
1 sibling, 1 reply; 82+ messages in thread
From: Eric Auger @ 2024-07-16 17:40 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
Hi Joao,
On 7/12/24 13:46, Joao Martins wrote:
> In preparation to moving HostIOMMUDevice realize() being able to called
> early during attach_device(), remove properties that rely on container
> being initialized.
It is difficult to parse the above sentence. Would deserve some rephrasing.
Also properties have a different meaning in qemu.
>
> This means removing caps::aw_bits which requires the
> bcontainer::iova_ranges to be inititalized after device is actually
initialized
> attached. Instead defer that to .get_cap() and call
> vfio_device_get_aw_bits() directly.
>
> Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/sysemu/host_iommu_device.h | 1 -
> backends/iommufd.c | 3 ++-
> hw/vfio/container.c | 5 +----
> hw/vfio/iommufd.c | 1 -
> 4 files changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
> index ee6c813c8b22..20e77cf54568 100644
> --- a/include/sysemu/host_iommu_device.h
> +++ b/include/sysemu/host_iommu_device.h
> @@ -24,7 +24,6 @@
> */
> typedef struct HostIOMMUDeviceCaps {
> uint32_t type;
> - uint8_t aw_bits;
the doc comment needs to be updated accordingly.
> } HostIOMMUDeviceCaps;
>
> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 5d3dfa917415..41a9dec3b2c5 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -18,6 +18,7 @@
> #include "qemu/error-report.h"
> #include "monitor/monitor.h"
> #include "trace.h"
> +#include "hw/vfio/vfio-common.h"
> #include <sys/ioctl.h>
> #include <linux/iommufd.h>
>
> @@ -270,7 +271,7 @@ static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
> case HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE:
> return caps->type;
> case HOST_IOMMU_DEVICE_CAP_AW_BITS:
> - return caps->aw_bits;
> + return vfio_device_get_aw_bits(hiod->agent);
> default:
> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
> return -EINVAL;
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 88ede913d6f7..c27f448ba26e 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -1144,7 +1144,6 @@ static bool hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> VFIODevice *vdev = opaque;
>
> hiod->name = g_strdup(vdev->name);
> - hiod->caps.aw_bits = vfio_device_get_aw_bits(vdev);
> hiod->agent = opaque;
>
> return true;
> @@ -1153,11 +1152,9 @@ static bool hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> static int hiod_legacy_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
> Error **errp)
> {
> - HostIOMMUDeviceCaps *caps = &hiod->caps;
> -
> switch (cap) {
> case HOST_IOMMU_DEVICE_CAP_AW_BITS:
> - return caps->aw_bits;
> + return vfio_device_get_aw_bits(hiod->agent);
> default:
> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
> return -EINVAL;
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 325c7598d5a1..873c919e319c 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -722,7 +722,6 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>
> hiod->name = g_strdup(vdev->name);
> caps->type = type;
> - caps->aw_bits = vfio_device_get_aw_bits(vdev);
>
> return true;
> }
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 06/12] vfio/{iommufd,container}: Remove caps::aw_bits
2024-07-16 17:40 ` Eric Auger
@ 2024-07-16 18:22 ` Joao Martins
2024-07-17 11:48 ` Eric Auger
0 siblings, 1 reply; 82+ messages in thread
From: Joao Martins @ 2024-07-16 18:22 UTC (permalink / raw)
To: eric.auger, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 16/07/2024 18:40, Eric Auger wrote:
> Hi Joao,
>
> On 7/12/24 13:46, Joao Martins wrote:
>> In preparation to moving HostIOMMUDevice realize() being able to called
>> early during attach_device(), remove properties that rely on container
>> being initialized.
> It is difficult to parse the above sentence. Would deserve some rephrasing.
>
> Also properties have a different meaning in qemu.
I think I will remove the above paragraph and instead adopt below with some
rephrasing:
Remove caps::aw_bits which requires the
bcontainer::iova_ranges to be inititalized after device is actually
initialized attached. Instead defer that to .get_cap() and call
vfio_device_get_aw_bits() directly.
This is in preparation for HostIOMMUDevice::realize() being called early during
attach_device().
Better?
>>
>> This means removing caps::aw_bits which requires the
>> bcontainer::iova_ranges to be inititalized after device is actually
> initialized
Yes
>> attached. Instead defer that to .get_cap() and call
>> vfio_device_get_aw_bits() directly.
>>
>> Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/sysemu/host_iommu_device.h | 1 -
>> backends/iommufd.c | 3 ++-
>> hw/vfio/container.c | 5 +----
>> hw/vfio/iommufd.c | 1 -
>> 4 files changed, 3 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
>> index ee6c813c8b22..20e77cf54568 100644
>> --- a/include/sysemu/host_iommu_device.h
>> +++ b/include/sysemu/host_iommu_device.h
>> @@ -24,7 +24,6 @@
>> */
>> typedef struct HostIOMMUDeviceCaps {
>> uint32_t type;
>> - uint8_t aw_bits;
> the doc comment needs to be updated accordingly.
>> } HostIOMMUDeviceCaps;
>>
>> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index 5d3dfa917415..41a9dec3b2c5 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -18,6 +18,7 @@
>> #include "qemu/error-report.h"
>> #include "monitor/monitor.h"
>> #include "trace.h"
>> +#include "hw/vfio/vfio-common.h"
>> #include <sys/ioctl.h>
>> #include <linux/iommufd.h>
>>
>> @@ -270,7 +271,7 @@ static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
>> case HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE:
>> return caps->type;
>> case HOST_IOMMU_DEVICE_CAP_AW_BITS:
>> - return caps->aw_bits;
>> + return vfio_device_get_aw_bits(hiod->agent);
>> default:
>> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
>> return -EINVAL;
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index 88ede913d6f7..c27f448ba26e 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -1144,7 +1144,6 @@ static bool hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>> VFIODevice *vdev = opaque;
>>
>> hiod->name = g_strdup(vdev->name);
>> - hiod->caps.aw_bits = vfio_device_get_aw_bits(vdev);
>> hiod->agent = opaque;
>>
>> return true;
>> @@ -1153,11 +1152,9 @@ static bool hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>> static int hiod_legacy_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
>> Error **errp)
>> {
>> - HostIOMMUDeviceCaps *caps = &hiod->caps;
>> -
>> switch (cap) {
>> case HOST_IOMMU_DEVICE_CAP_AW_BITS:
>> - return caps->aw_bits;
>> + return vfio_device_get_aw_bits(hiod->agent);
>> default:
>> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
>> return -EINVAL;
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 325c7598d5a1..873c919e319c 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -722,7 +722,6 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>
>> hiod->name = g_strdup(vdev->name);
>> caps->type = type;
>> - caps->aw_bits = vfio_device_get_aw_bits(vdev);
>>
>> return true;
>> }
>
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 06/12] vfio/{iommufd,container}: Remove caps::aw_bits
2024-07-16 18:22 ` Joao Martins
@ 2024-07-17 11:48 ` Eric Auger
0 siblings, 0 replies; 82+ messages in thread
From: Eric Auger @ 2024-07-17 11:48 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
Hi Joao,
On 7/16/24 20:22, Joao Martins wrote:
> On 16/07/2024 18:40, Eric Auger wrote:
>> Hi Joao,
>>
>> On 7/12/24 13:46, Joao Martins wrote:
>>> In preparation to moving HostIOMMUDevice realize() being able to called
>>> early during attach_device(), remove properties that rely on container
>>> being initialized.
>> It is difficult to parse the above sentence. Would deserve some rephrasing.
>>
>> Also properties have a different meaning in qemu.
> I think I will remove the above paragraph and instead adopt below with some
> rephrasing:
>
> Remove caps::aw_bits which requires the
> bcontainer::iova_ranges to be inititalized after device is actually
> initialized attached. Instead defer that to .get_cap() and call
s/initialized//g
> vfio_device_get_aw_bits() directly.
>
> This is in preparation for HostIOMMUDevice::realize() being called early during
> attach_device().
>
> Better?
Yes sounds better
Eric
>
>>> This means removing caps::aw_bits which requires the
>>> bcontainer::iova_ranges to be inititalized after device is actually
>> initialized
> Yes
>
>>> attached. Instead defer that to .get_cap() and call
>>> vfio_device_get_aw_bits() directly.
>>>
>>> Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> include/sysemu/host_iommu_device.h | 1 -
>>> backends/iommufd.c | 3 ++-
>>> hw/vfio/container.c | 5 +----
>>> hw/vfio/iommufd.c | 1 -
>>> 4 files changed, 3 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
>>> index ee6c813c8b22..20e77cf54568 100644
>>> --- a/include/sysemu/host_iommu_device.h
>>> +++ b/include/sysemu/host_iommu_device.h
>>> @@ -24,7 +24,6 @@
>>> */
>>> typedef struct HostIOMMUDeviceCaps {
>>> uint32_t type;
>>> - uint8_t aw_bits;
>> the doc comment needs to be updated accordingly.
>>> } HostIOMMUDeviceCaps;
>>>
>>> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>> index 5d3dfa917415..41a9dec3b2c5 100644
>>> --- a/backends/iommufd.c
>>> +++ b/backends/iommufd.c
>>> @@ -18,6 +18,7 @@
>>> #include "qemu/error-report.h"
>>> #include "monitor/monitor.h"
>>> #include "trace.h"
>>> +#include "hw/vfio/vfio-common.h"
>>> #include <sys/ioctl.h>
>>> #include <linux/iommufd.h>
>>>
>>> @@ -270,7 +271,7 @@ static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
>>> case HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE:
>>> return caps->type;
>>> case HOST_IOMMU_DEVICE_CAP_AW_BITS:
>>> - return caps->aw_bits;
>>> + return vfio_device_get_aw_bits(hiod->agent);
>>> default:
>>> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
>>> return -EINVAL;
>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>> index 88ede913d6f7..c27f448ba26e 100644
>>> --- a/hw/vfio/container.c
>>> +++ b/hw/vfio/container.c
>>> @@ -1144,7 +1144,6 @@ static bool hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>> VFIODevice *vdev = opaque;
>>>
>>> hiod->name = g_strdup(vdev->name);
>>> - hiod->caps.aw_bits = vfio_device_get_aw_bits(vdev);
>>> hiod->agent = opaque;
>>>
>>> return true;
>>> @@ -1153,11 +1152,9 @@ static bool hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>> static int hiod_legacy_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
>>> Error **errp)
>>> {
>>> - HostIOMMUDeviceCaps *caps = &hiod->caps;
>>> -
>>> switch (cap) {
>>> case HOST_IOMMU_DEVICE_CAP_AW_BITS:
>>> - return caps->aw_bits;
>>> + return vfio_device_get_aw_bits(hiod->agent);
>>> default:
>>> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
>>> return -EINVAL;
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 325c7598d5a1..873c919e319c 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -722,7 +722,6 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>>
>>> hiod->name = g_strdup(vdev->name);
>>> caps->type = type;
>>> - caps->aw_bits = vfio_device_get_aw_bits(vdev);
>>>
>>> return true;
>>> }
^ permalink raw reply [flat|nested] 82+ messages in thread
* [PATCH v4 07/12] vfio/{iommufd, container}: Initialize HostIOMMUDeviceCaps during attach_device()
2024-07-12 11:46 [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (5 preceding siblings ...)
2024-07-12 11:46 ` [PATCH v4 06/12] vfio/{iommufd,container}: Remove caps::aw_bits Joao Martins
@ 2024-07-12 11:46 ` Joao Martins via
2024-07-16 10:20 ` [PATCH v4 07/12] vfio/{iommufd,container}: " Cédric Le Goater
` (2 more replies)
2024-07-12 11:47 ` [PATCH v4 08/12] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
` (5 subsequent siblings)
12 siblings, 3 replies; 82+ messages in thread
From: Joao Martins via @ 2024-07-12 11:46 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
Fetch the raw IOMMU hw caps behind the device and thus move
HostIOMMUDevice::realize() to be done during the attach of the device. This
allows caching the information obtained from IOMMU_GET_HW_INFO from
iommufd early on. However, while the legacy HostIOMMUDevice caps
always return true and don't depend on anything else, the IOMMUFD
backend requires the iommufd FD to be connected and a devid to be
available in order to query capabilities. Hence exactly when HostIOMMUDevice
is initialized inside the backend ::attach_device() implementation is backend
specific.
This is in preparation to fetch and parse hw capabilities and understand if
dirty tracking is supported by the IOMMU backing the device without
necessarily duplicating the calls we do to IOMMU_GET_HW_INFO.
Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/sysemu/host_iommu_device.h | 1 +
hw/vfio/common.c | 16 ++++++----------
hw/vfio/container.c | 6 ++++++
hw/vfio/iommufd.c | 7 +++++++
4 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
index 20e77cf54568..b1e5f4b8ac3e 100644
--- a/include/sysemu/host_iommu_device.h
+++ b/include/sysemu/host_iommu_device.h
@@ -24,6 +24,7 @@
*/
typedef struct HostIOMMUDeviceCaps {
uint32_t type;
+ uint64_t hw_caps;
} HostIOMMUDeviceCaps;
#define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b0beed44116e..cc14f0e3fe24 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1544,7 +1544,7 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
{
const VFIOIOMMUClass *ops =
VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
- HostIOMMUDevice *hiod;
+ HostIOMMUDevice *hiod = NULL;
if (vbasedev->iommufd) {
ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
@@ -1552,21 +1552,17 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
assert(ops);
- if (!ops->attach_device(name, vbasedev, as, errp)) {
- return false;
- }
- if (vbasedev->mdev) {
- return true;
+ if (!vbasedev->mdev) {
+ hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
+ vbasedev->hiod = hiod;
}
- hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
- if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
+ if (!ops->attach_device(name, vbasedev, as, errp)) {
object_unref(hiod);
- ops->detach_device(vbasedev);
+ vbasedev->hiod = NULL;
return false;
}
- vbasedev->hiod = hiod;
return true;
}
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index c27f448ba26e..29da261bbf3e 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -907,6 +907,7 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp)
{
int groupid = vfio_device_groupid(vbasedev, errp);
+ HostIOMMUDevice *hiod = vbasedev->hiod;
VFIODevice *vbasedev_iter;
VFIOGroup *group;
VFIOContainerBase *bcontainer;
@@ -917,6 +918,11 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
trace_vfio_attach_device(vbasedev->name, groupid);
+ if (hiod &&
+ !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
+ return false;
+ }
+
group = vfio_get_group(groupid, as, errp);
if (!group) {
return false;
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 873c919e319c..d34dc88231ec 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -384,6 +384,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
Error *err = NULL;
const VFIOIOMMUClass *iommufd_vioc =
VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
+ HostIOMMUDevice *hiod = vbasedev->hiod;
if (vbasedev->fd < 0) {
devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
@@ -401,6 +402,11 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
space = vfio_get_address_space(as);
+ if (hiod &&
+ !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
+ return false;
+ }
+
/* try to attach to an existing container in this space */
QLIST_FOREACH(bcontainer, &space->containers, next) {
container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
@@ -722,6 +728,7 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
hiod->name = g_strdup(vdev->name);
caps->type = type;
+ caps->hw_caps = hw_caps;
return true;
}
--
2.17.2
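The reordering in vfio_attach_device() can be summarized with a reduced sketch: the host IOMMU object is created before the backend attach, the backend realizes it when its own prerequisites are ready, and a failed attach drops the object again. The types and functions below (SketchHiod, SketchDev, sketch_backend_attach(), sketch_attach()) are hypothetical and use plain calloc()/free() in place of QOM object_new()/object_unref(); this is an illustration of the control flow only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    bool realized;
} SketchHiod;

typedef struct {
    bool mdev;       /* mdev devices get no host IOMMU object */
    bool backend_ok; /* stands in for the backend attach succeeding */
    SketchHiod *hiod;
} SketchDev;

/* Backend attach: realizes the hiod (if any) at a backend-specific point. */
static bool sketch_backend_attach(SketchDev *dev)
{
    if (!dev->backend_ok) {
        return false;
    }
    if (dev->hiod) {
        dev->hiod->realized = true;
    }
    return true;
}

/* Generic attach: create the hiod first, then let the backend realize it. */
static bool sketch_attach(SketchDev *dev)
{
    assert(dev);
    if (!dev->mdev) {
        dev->hiod = calloc(1, sizeof(*dev->hiod));
        if (!dev->hiod) {
            return false;
        }
    }
    if (!sketch_backend_attach(dev)) {
        free(dev->hiod); /* free(NULL) is a no-op, which covers mdev */
        dev->hiod = NULL;
        return false;
    }
    return true;
}
```

On success the caller owns dev->hiod and is expected to release it on detach; on failure the pointer is already cleared, so no separate detach call is needed.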
^ permalink raw reply related [flat|nested] 82+ messages in thread
* Re: [PATCH v4 07/12] vfio/{iommufd,container}: Initialize HostIOMMUDeviceCaps during attach_device()
2024-07-12 11:46 ` [PATCH v4 07/12] vfio/{iommufd, container}: Initialize HostIOMMUDeviceCaps during attach_device() Joao Martins via
@ 2024-07-16 10:20 ` Cédric Le Goater
2024-07-16 10:40 ` Joao Martins
2024-07-17 2:05 ` Duan, Zhenzhong
2024-07-17 12:19 ` Eric Auger
2 siblings, 1 reply; 82+ messages in thread
From: Cédric Le Goater @ 2024-07-16 10:20 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:46, Joao Martins wrote:
> Fetch IOMMU hw raw caps behind the device and thus move the
> HostIOMMUDevice::realize() to be done during the attach of the device. It
> allows it to cache the information obtained from IOMMU_GET_HW_INFO from
> iommufd early on. However, while legacy HostIOMMUDevice caps
> always return true and doesn't have dependency on other things, the IOMMUFD
> backend requires the iommufd FD to be connected and having a devid to be
> able to query capabilities. Hence when exactly is HostIOMMUDevice
> initialized inside backend ::attach_device() implementation is backend
> specific.
>
> This is in preparation to fetch parse hw capabilities and understand if
> dirty tracking is supported by device backing IOMMU without necessarily
> duplicating the amount of calls we do to IOMMU_GET_HW_INFO.
>
> Suggested-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/sysemu/host_iommu_device.h | 1 +
> hw/vfio/common.c | 16 ++++++----------
> hw/vfio/container.c | 6 ++++++
> hw/vfio/iommufd.c | 7 +++++++
> 4 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
> index 20e77cf54568..b1e5f4b8ac3e 100644
> --- a/include/sysemu/host_iommu_device.h
> +++ b/include/sysemu/host_iommu_device.h
> @@ -24,6 +24,7 @@
> */
> typedef struct HostIOMMUDeviceCaps {
> uint32_t type;
> + uint64_t hw_caps;
> } HostIOMMUDeviceCaps;
>
> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index b0beed44116e..cc14f0e3fe24 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1544,7 +1544,7 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> {
> const VFIOIOMMUClass *ops =
> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
> - HostIOMMUDevice *hiod;
> + HostIOMMUDevice *hiod = NULL;
>
> if (vbasedev->iommufd) {
> ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
> @@ -1552,21 +1552,17 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>
> assert(ops);
>
> - if (!ops->attach_device(name, vbasedev, as, errp)) {
> - return false;
> - }
>
> - if (vbasedev->mdev) {
> - return true;
> + if (!vbasedev->mdev) {
> + hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
> + vbasedev->hiod = hiod;
> }
>
> - hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
> - if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
> + if (!ops->attach_device(name, vbasedev, as, errp)) {
> object_unref(hiod);
> - ops->detach_device(vbasedev);
> + vbasedev->hiod = NULL;
> return false;
> }
> - vbasedev->hiod = hiod;
>
> return true;
> }
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index c27f448ba26e..29da261bbf3e 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -907,6 +907,7 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp)
> {
> int groupid = vfio_device_groupid(vbasedev, errp);
> + HostIOMMUDevice *hiod = vbasedev->hiod;
> VFIODevice *vbasedev_iter;
> VFIOGroup *group;
> VFIOContainerBase *bcontainer;
> @@ -917,6 +918,11 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>
> trace_vfio_attach_device(vbasedev->name, groupid);
>
> + if (hiod &&
> + !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
> + return false;
> + }
> +
Could you please introduce a helper:
bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
Thanks,
C.
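For illustration only, here is a minimal standalone sketch of what such a helper might look like. It is not the code from a later revision of the series. The struct layouts, the `klass` pointer, the simplified `HOST_IOMMU_DEVICE_GET_CLASS()` macro, the stub realize callbacks and `example_attach()` are all assumptions made so the snippet compiles outside QEMU. Only the helper's prototype comes from the request above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for QEMU types (assumptions, not the real definitions). */
typedef struct Error Error;
typedef struct HostIOMMUDevice HostIOMMUDevice;

typedef struct HostIOMMUDeviceClass {
    bool (*realize)(HostIOMMUDevice *hiod, void *opaque, Error **errp);
} HostIOMMUDeviceClass;

struct HostIOMMUDevice {
    HostIOMMUDeviceClass *klass;    /* simplified replacement for QOM lookup */
};

typedef struct VFIODevice {
    bool mdev;
    HostIOMMUDevice *hiod;          /* left NULL for mdev devices */
} VFIODevice;

/* Simplified stand-in for the QOM macro of the same name. */
#define HOST_IOMMU_DEVICE_GET_CLASS(h) ((h)->klass)

/* Folds the NULL check and the class dispatch into one place. */
bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
{
    HostIOMMUDevice *hiod = vbasedev->hiod;

    if (!hiod) {
        return true;                /* nothing to realize, e.g. mdev */
    }
    return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
}

/* Hypothetical realize callbacks, used only to exercise the helper. */
static bool stub_realize_ok(HostIOMMUDevice *hiod, void *opaque, Error **errp)
{
    (void)hiod; (void)opaque; (void)errp;
    return true;
}

static bool stub_realize_fail(HostIOMMUDevice *hiod, void *opaque, Error **errp)
{
    (void)hiod; (void)opaque; (void)errp;
    return false;
}

/* Hypothetical caller: what each backend's attach path would do. */
bool example_attach(bool is_mdev, bool realize_ok)
{
    HostIOMMUDeviceClass klass = {
        .realize = realize_ok ? stub_realize_ok : stub_realize_fail,
    };
    HostIOMMUDevice hiod = { .klass = &klass };
    VFIODevice vbasedev = { .mdev = is_mdev, .hiod = is_mdev ? NULL : &hiod };

    return vfio_device_hiod_realize(&vbasedev, NULL);
}
```

With a helper of this shape, both vfio_legacy_attach_device() and iommufd_cdev_attach() could drop the local hiod variable and the open-coded class lookup.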
> group = vfio_get_group(groupid, as, errp);
> if (!group) {
> return false;
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 873c919e319c..d34dc88231ec 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -384,6 +384,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
> Error *err = NULL;
> const VFIOIOMMUClass *iommufd_vioc =
> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
> + HostIOMMUDevice *hiod = vbasedev->hiod;
>
> if (vbasedev->fd < 0) {
> devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
> @@ -401,6 +402,11 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>
> space = vfio_get_address_space(as);
>
> + if (hiod &&
> + !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
> + return false;
> + }
> +
> /* try to attach to an existing container in this space */
> QLIST_FOREACH(bcontainer, &space->containers, next) {
> container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> @@ -722,6 +728,7 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>
> hiod->name = g_strdup(vdev->name);
> caps->type = type;
> + caps->hw_caps = hw_caps;
>
> return true;
> }
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 07/12] vfio/{iommufd,container}: Initialize HostIOMMUDeviceCaps during attach_device()
2024-07-16 10:20 ` [PATCH v4 07/12] vfio/{iommufd,container}: " Cédric Le Goater
@ 2024-07-16 10:40 ` Joao Martins
0 siblings, 0 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-16 10:40 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 16/07/2024 11:20, Cédric Le Goater wrote:
> On 7/12/24 13:46, Joao Martins wrote:
>> Fetch IOMMU hw raw caps behind the device and thus move the
>> HostIOMMUDevice::realize() to be done during the attach of the device. It
>> allows it to cache the information obtained from IOMMU_GET_HW_INFO from
>> iommufd early on. However, while legacy HostIOMMUDevice caps
>> always return true and doesn't have dependency on other things, the IOMMUFD
>> backend requires the iommufd FD to be connected and having a devid to be
>> able to query capabilities. Hence when exactly is HostIOMMUDevice
>> initialized inside backend ::attach_device() implementation is backend
>> specific.
>>
>> This is in preparation to fetch parse hw capabilities and understand if
>> dirty tracking is supported by device backing IOMMU without necessarily
>> duplicating the amount of calls we do to IOMMU_GET_HW_INFO.
>>
>> Suggested-by: Cédric Le Goater <clg@redhat.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/sysemu/host_iommu_device.h | 1 +
>> hw/vfio/common.c | 16 ++++++----------
>> hw/vfio/container.c | 6 ++++++
>> hw/vfio/iommufd.c | 7 +++++++
>> 4 files changed, 20 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/sysemu/host_iommu_device.h
>> b/include/sysemu/host_iommu_device.h
>> index 20e77cf54568..b1e5f4b8ac3e 100644
>> --- a/include/sysemu/host_iommu_device.h
>> +++ b/include/sysemu/host_iommu_device.h
>> @@ -24,6 +24,7 @@
>> */
>> typedef struct HostIOMMUDeviceCaps {
>> uint32_t type;
>> + uint64_t hw_caps;
>> } HostIOMMUDeviceCaps;
>> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index b0beed44116e..cc14f0e3fe24 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1544,7 +1544,7 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>> {
>> const VFIOIOMMUClass *ops =
>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>> - HostIOMMUDevice *hiod;
>> + HostIOMMUDevice *hiod = NULL;
>> if (vbasedev->iommufd) {
>> ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>> @@ -1552,21 +1552,17 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>> assert(ops);
>> - if (!ops->attach_device(name, vbasedev, as, errp)) {
>> - return false;
>> - }
>> - if (vbasedev->mdev) {
>> - return true;
>> + if (!vbasedev->mdev) {
>> + hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>> + vbasedev->hiod = hiod;
>> }
>> - hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>> - if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
>> + if (!ops->attach_device(name, vbasedev, as, errp)) {
>> object_unref(hiod);
>> - ops->detach_device(vbasedev);
>> + vbasedev->hiod = NULL;
>> return false;
>> }
>> - vbasedev->hiod = hiod;
>> return true;
>> }
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index c27f448ba26e..29da261bbf3e 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -907,6 +907,7 @@ static bool vfio_legacy_attach_device(const char *name,
>> VFIODevice *vbasedev,
>> AddressSpace *as, Error **errp)
>> {
>> int groupid = vfio_device_groupid(vbasedev, errp);
>> + HostIOMMUDevice *hiod = vbasedev->hiod;
>> VFIODevice *vbasedev_iter;
>> VFIOGroup *group;
>> VFIOContainerBase *bcontainer;
>> @@ -917,6 +918,11 @@ static bool vfio_legacy_attach_device(const char *name,
>> VFIODevice *vbasedev,
>> trace_vfio_attach_device(vbasedev->name, groupid);
>> + if (hiod &&
>> + !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
>> + return false;
>> + }
>> +
>
>
> Could you please introduce a helper:
>
> bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
>
Yeah, let me do that
>
> Thanks,
>
> C.
>
>
>
>> group = vfio_get_group(groupid, as, errp);
>> if (!group) {
>> return false;
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 873c919e319c..d34dc88231ec 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -384,6 +384,7 @@ static bool iommufd_cdev_attach(const char *name,
>> VFIODevice *vbasedev,
>> Error *err = NULL;
>> const VFIOIOMMUClass *iommufd_vioc =
>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>> + HostIOMMUDevice *hiod = vbasedev->hiod;
>> if (vbasedev->fd < 0) {
>> devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
>> @@ -401,6 +402,11 @@ static bool iommufd_cdev_attach(const char *name,
>> VFIODevice *vbasedev,
>> space = vfio_get_address_space(as);
>> + if (hiod &&
>> + !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
>> + return false;
>> + }
>> +
>> /* try to attach to an existing container in this space */
>> QLIST_FOREACH(bcontainer, &space->containers, next) {
>> container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>> @@ -722,6 +728,7 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice
>> *hiod, void *opaque,
>> hiod->name = g_strdup(vdev->name);
>> caps->type = type;
>> + caps->hw_caps = hw_caps;
>> return true;
>> }
>
^ permalink raw reply [flat|nested] 82+ messages in thread
* RE: [PATCH v4 07/12] vfio/{iommufd,container}: Initialize HostIOMMUDeviceCaps during attach_device()
2024-07-12 11:46 ` [PATCH v4 07/12] vfio/{iommufd, container}: Initialize HostIOMMUDeviceCaps during attach_device() Joao Martins via
2024-07-16 10:20 ` [PATCH v4 07/12] vfio/{iommufd,container}: " Cédric Le Goater
@ 2024-07-17 2:05 ` Duan, Zhenzhong
2024-07-17 8:55 ` Joao Martins
2024-07-17 12:19 ` Eric Auger
2 siblings, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2024-07-17 2:05 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v4 07/12] vfio/{iommufd,container}: Initialize
>HostIOMMUDeviceCaps during attach_device()
>
>Fetch IOMMU hw raw caps behind the device and thus move the
>HostIOMMUDevice::realize() to be done during the attach of the device. It
>allows it to cache the information obtained from IOMMU_GET_HW_INFO
>from
>iommufd early on. However, while legacy HostIOMMUDevice caps
>always return true and doesn't have dependency on other things, the
>IOMMUFD
>backend requires the iommufd FD to be connected and having a devid to be
>able to query capabilities. Hence when exactly is HostIOMMUDevice
>initialized inside backend ::attach_device() implementation is backend
>specific.
>
>This is in preparation to fetch parse hw capabilities and understand if
>dirty tracking is supported by device backing IOMMU without necessarily
>duplicating the amount of calls we do to IOMMU_GET_HW_INFO.
>
>Suggested-by: Cédric Le Goater <clg@redhat.com>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>---
> include/sysemu/host_iommu_device.h | 1 +
> hw/vfio/common.c | 16 ++++++----------
> hw/vfio/container.c | 6 ++++++
> hw/vfio/iommufd.c | 7 +++++++
> 4 files changed, 20 insertions(+), 10 deletions(-)
>
>diff --git a/include/sysemu/host_iommu_device.h
>b/include/sysemu/host_iommu_device.h
>index 20e77cf54568..b1e5f4b8ac3e 100644
>--- a/include/sysemu/host_iommu_device.h
>+++ b/include/sysemu/host_iommu_device.h
>@@ -24,6 +24,7 @@
> */
> typedef struct HostIOMMUDeviceCaps {
> uint32_t type;
>+ uint64_t hw_caps;
> } HostIOMMUDeviceCaps;
>
> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
>diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>index b0beed44116e..cc14f0e3fe24 100644
>--- a/hw/vfio/common.c
>+++ b/hw/vfio/common.c
>@@ -1544,7 +1544,7 @@ bool vfio_attach_device(char *name, VFIODevice
>*vbasedev,
> {
> const VFIOIOMMUClass *ops =
>
>VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>- HostIOMMUDevice *hiod;
>+ HostIOMMUDevice *hiod = NULL;
No need to NULL it?
>
> if (vbasedev->iommufd) {
> ops =
>VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUF
>D));
>@@ -1552,21 +1552,17 @@ bool vfio_attach_device(char *name,
>VFIODevice *vbasedev,
>
> assert(ops);
>
>- if (!ops->attach_device(name, vbasedev, as, errp)) {
>- return false;
>- }
>
>- if (vbasedev->mdev) {
>- return true;
>+ if (!vbasedev->mdev) {
>+ hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>+ vbasedev->hiod = hiod;
> }
>
>- hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>- if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>errp)) {
>+ if (!ops->attach_device(name, vbasedev, as, errp)) {
> object_unref(hiod);
>- ops->detach_device(vbasedev);
>+ vbasedev->hiod = NULL;
> return false;
> }
>- vbasedev->hiod = hiod;
>
> return true;
> }
>diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>index c27f448ba26e..29da261bbf3e 100644
>--- a/hw/vfio/container.c
>+++ b/hw/vfio/container.c
>@@ -907,6 +907,7 @@ static bool vfio_legacy_attach_device(const char
>*name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp)
> {
> int groupid = vfio_device_groupid(vbasedev, errp);
>+ HostIOMMUDevice *hiod = vbasedev->hiod;
hiod is used only once in this function; maybe use vbasedev->hiod directly?
> VFIODevice *vbasedev_iter;
> VFIOGroup *group;
> VFIOContainerBase *bcontainer;
>@@ -917,6 +918,11 @@ static bool vfio_legacy_attach_device(const char
>*name, VFIODevice *vbasedev,
>
> trace_vfio_attach_device(vbasedev->name, groupid);
>
>+ if (hiod &&
>+ !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>errp)) {
>+ return false;
>+ }
>+
> group = vfio_get_group(groupid, as, errp);
> if (!group) {
> return false;
>diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>index 873c919e319c..d34dc88231ec 100644
>--- a/hw/vfio/iommufd.c
>+++ b/hw/vfio/iommufd.c
>@@ -384,6 +384,7 @@ static bool iommufd_cdev_attach(const char *name,
>VFIODevice *vbasedev,
> Error *err = NULL;
> const VFIOIOMMUClass *iommufd_vioc =
>
>VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUF
>D));
>+ HostIOMMUDevice *hiod = vbasedev->hiod;
Same here.
>
> if (vbasedev->fd < 0) {
> devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
>@@ -401,6 +402,11 @@ static bool iommufd_cdev_attach(const char
>*name, VFIODevice *vbasedev,
>
> space = vfio_get_address_space(as);
>
>+ if (hiod &&
>+ !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>errp)) {
>+ return false;
>+ }
>+
> /* try to attach to an existing container in this space */
> QLIST_FOREACH(bcontainer, &space->containers, next) {
> container = container_of(bcontainer, VFIOIOMMUFDContainer,
>bcontainer);
>@@ -722,6 +728,7 @@ static bool
>hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>
> hiod->name = g_strdup(vdev->name);
> caps->type = type;
>+ caps->hw_caps = hw_caps;
>
> return true;
> }
>--
>2.17.2
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 07/12] vfio/{iommufd,container}: Initialize HostIOMMUDeviceCaps during attach_device()
2024-07-17 2:05 ` Duan, Zhenzhong
@ 2024-07-17 8:55 ` Joao Martins
0 siblings, 0 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-17 8:55 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 17/07/2024 03:05, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: [PATCH v4 07/12] vfio/{iommufd,container}: Initialize
>> HostIOMMUDeviceCaps during attach_device()
>>
>> Fetch IOMMU hw raw caps behind the device and thus move the
>> HostIOMMUDevice::realize() to be done during the attach of the device. It
>> allows it to cache the information obtained from IOMMU_GET_HW_INFO
>> from
>> iommufd early on. However, while legacy HostIOMMUDevice caps
>> always return true and doesn't have dependency on other things, the
>> IOMMUFD
>> backend requires the iommufd FD to be connected and having a devid to be
>> able to query capabilities. Hence when exactly is HostIOMMUDevice
>> initialized inside backend ::attach_device() implementation is backend
>> specific.
>>
>> This is in preparation to fetch parse hw capabilities and understand if
>> dirty tracking is supported by device backing IOMMU without necessarily
>> duplicating the amount of calls we do to IOMMU_GET_HW_INFO.
>>
>> Suggested-by: Cédric Le Goater <clg@redhat.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/sysemu/host_iommu_device.h | 1 +
>> hw/vfio/common.c | 16 ++++++----------
>> hw/vfio/container.c | 6 ++++++
>> hw/vfio/iommufd.c | 7 +++++++
>> 4 files changed, 20 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/sysemu/host_iommu_device.h
>> b/include/sysemu/host_iommu_device.h
>> index 20e77cf54568..b1e5f4b8ac3e 100644
>> --- a/include/sysemu/host_iommu_device.h
>> +++ b/include/sysemu/host_iommu_device.h
>> @@ -24,6 +24,7 @@
>> */
>> typedef struct HostIOMMUDeviceCaps {
>> uint32_t type;
>> + uint64_t hw_caps;
>> } HostIOMMUDeviceCaps;
>>
>> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index b0beed44116e..cc14f0e3fe24 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1544,7 +1544,7 @@ bool vfio_attach_device(char *name, VFIODevice
>> *vbasedev,
>> {
>> const VFIOIOMMUClass *ops =
>>
>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>> - HostIOMMUDevice *hiod;
>> + HostIOMMUDevice *hiod = NULL;
>
> No need to NULL it?
>
/me nods
>>
>> if (vbasedev->iommufd) {
>> ops =
>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUF
>> D));
>> @@ -1552,21 +1552,17 @@ bool vfio_attach_device(char *name,
>> VFIODevice *vbasedev,
>>
>> assert(ops);
>>
>> - if (!ops->attach_device(name, vbasedev, as, errp)) {
>> - return false;
>> - }
>>
>> - if (vbasedev->mdev) {
>> - return true;
>> + if (!vbasedev->mdev) {
>> + hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>> + vbasedev->hiod = hiod;
>> }
>>
>> - hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>> - if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>> errp)) {
>> + if (!ops->attach_device(name, vbasedev, as, errp)) {
>> object_unref(hiod);
>> - ops->detach_device(vbasedev);
>> + vbasedev->hiod = NULL;
>> return false;
>> }
>> - vbasedev->hiod = hiod;
>>
>> return true;
>> }
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index c27f448ba26e..29da261bbf3e 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -907,6 +907,7 @@ static bool vfio_legacy_attach_device(const char
>> *name, VFIODevice *vbasedev,
>> AddressSpace *as, Error **errp)
>> {
>> int groupid = vfio_device_groupid(vbasedev, errp);
>> + HostIOMMUDevice *hiod = vbasedev->hiod;
>
> hiod is used only once in this function; maybe use vbasedev->hiod directly?
>
The problem is more about how the line below (...)
>
>> VFIODevice *vbasedev_iter;
>> VFIOGroup *group;
>> VFIOContainerBase *bcontainer;
>> @@ -917,6 +918,11 @@ static bool vfio_legacy_attach_device(const char
>> *name, VFIODevice *vbasedev,
>>
>> trace_vfio_attach_device(vbasedev->name, groupid);
>>
>> + if (hiod &&
>> + !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>> errp)) {
>> + return false;
>> + }
>> +
(...) would look really long, and I would end up dereferencing it three times.
But with the helper function that Cédric suggested, it might be easy to
accommodate your comment.
>> group = vfio_get_group(groupid, as, errp);
>> if (!group) {
>> return false;
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 873c919e319c..d34dc88231ec 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -384,6 +384,7 @@ static bool iommufd_cdev_attach(const char *name,
>> VFIODevice *vbasedev,
>> Error *err = NULL;
>> const VFIOIOMMUClass *iommufd_vioc =
>>
>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUF
>> D));
>> + HostIOMMUDevice *hiod = vbasedev->hiod;
>
> Same here.
>
>>
>> if (vbasedev->fd < 0) {
>> devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
>> @@ -401,6 +402,11 @@ static bool iommufd_cdev_attach(const char
>> *name, VFIODevice *vbasedev,
>>
>> space = vfio_get_address_space(as);
>>
>> + if (hiod &&
>> + !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>> errp)) {
>> + return false;
>> + }
>> +
>> /* try to attach to an existing container in this space */
>> QLIST_FOREACH(bcontainer, &space->containers, next) {
>> container = container_of(bcontainer, VFIOIOMMUFDContainer,
>> bcontainer);
>> @@ -722,6 +728,7 @@ static bool
>> hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>
>> hiod->name = g_strdup(vdev->name);
>> caps->type = type;
>> + caps->hw_caps = hw_caps;
>>
>> return true;
>> }
>> --
>> 2.17.2
>
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 07/12] vfio/{iommufd,container}: Initialize HostIOMMUDeviceCaps during attach_device()
2024-07-12 11:46 ` [PATCH v4 07/12] vfio/{iommufd, container}: Initialize HostIOMMUDeviceCaps during attach_device() Joao Martins via
2024-07-16 10:20 ` [PATCH v4 07/12] vfio/{iommufd,container}: " Cédric Le Goater
2024-07-17 2:05 ` Duan, Zhenzhong
@ 2024-07-17 12:19 ` Eric Auger
2024-07-17 12:33 ` Joao Martins
2 siblings, 1 reply; 82+ messages in thread
From: Eric Auger @ 2024-07-17 12:19 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
Hi Joao,
On 7/12/24 13:46, Joao Martins wrote:
> Fetch IOMMU hw raw caps behind the device and thus move the
What does "Fetch IOMMU hw raw caps behind the device" mean?
> HostIOMMUDevice::realize() to be done during the attach of the device. It
> allows it to cache the information obtained from IOMMU_GET_HW_INFO from
What do you mean by "It allows it to cache the information obtained
from IOMMU_GET_HW_INFO from iommufd early on"?
> iommufd early on. However, while legacy HostIOMMUDevice caps
What does "legacy HostIOMMUDevice caps always return true" mean?
> always return true and doesn't have dependency on other things, the IOMMUFD
> backend requires the iommufd FD to be connected and having a devid to be
> able to query capabilities. Hence when exactly is HostIOMMUDevice
> initialized inside backend ::attach_device() implementation is backend
> specific.
>
> This is in preparation to fetch parse hw capabilities and understand if
fetch parse?
> dirty tracking is supported by device backing IOMMU without necessarily
> duplicating the amount of calls we do to IOMMU_GET_HW_INFO.
But we are moving code from a generic place to a backend-specific place?
Sorry, I find the commit message really hard to understand in general.
Eric
>
> Suggested-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/sysemu/host_iommu_device.h | 1 +
> hw/vfio/common.c | 16 ++++++----------
> hw/vfio/container.c | 6 ++++++
> hw/vfio/iommufd.c | 7 +++++++
> 4 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
> index 20e77cf54568..b1e5f4b8ac3e 100644
> --- a/include/sysemu/host_iommu_device.h
> +++ b/include/sysemu/host_iommu_device.h
> @@ -24,6 +24,7 @@
> */
> typedef struct HostIOMMUDeviceCaps {
> uint32_t type;
> + uint64_t hw_caps;
please also update the doc comment
> } HostIOMMUDeviceCaps;
>
> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index b0beed44116e..cc14f0e3fe24 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1544,7 +1544,7 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> {
> const VFIOIOMMUClass *ops =
> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
> - HostIOMMUDevice *hiod;
> + HostIOMMUDevice *hiod = NULL;
>
> if (vbasedev->iommufd) {
> ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
> @@ -1552,21 +1552,17 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>
> assert(ops);
>
> - if (!ops->attach_device(name, vbasedev, as, errp)) {
> - return false;
> - }
>
> - if (vbasedev->mdev) {
> - return true;
> + if (!vbasedev->mdev) {
> + hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
> + vbasedev->hiod = hiod;
> }
>
> - hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
> - if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
> + if (!ops->attach_device(name, vbasedev, as, errp)) {
> object_unref(hiod);
> - ops->detach_device(vbasedev);
> + vbasedev->hiod = NULL;
> return false;
> }
> - vbasedev->hiod = hiod;
>
> return true;
> }
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index c27f448ba26e..29da261bbf3e 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -907,6 +907,7 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp)
> {
> int groupid = vfio_device_groupid(vbasedev, errp);
> + HostIOMMUDevice *hiod = vbasedev->hiod;
> VFIODevice *vbasedev_iter;
> VFIOGroup *group;
> VFIOContainerBase *bcontainer;
> @@ -917,6 +918,11 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>
> trace_vfio_attach_device(vbasedev->name, groupid);
>
> + if (hiod &&
> + !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
> + return false;
> + }
> +
> group = vfio_get_group(groupid, as, errp);
> if (!group) {
> return false;
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 873c919e319c..d34dc88231ec 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -384,6 +384,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
> Error *err = NULL;
> const VFIOIOMMUClass *iommufd_vioc =
> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
> + HostIOMMUDevice *hiod = vbasedev->hiod;
>
> if (vbasedev->fd < 0) {
> devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
> @@ -401,6 +402,11 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>
> space = vfio_get_address_space(as);
>
> + if (hiod &&
> + !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
> + return false;
> + }
> +
> /* try to attach to an existing container in this space */
> QLIST_FOREACH(bcontainer, &space->containers, next) {
> container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> @@ -722,6 +728,7 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>
> hiod->name = g_strdup(vdev->name);
> caps->type = type;
> + caps->hw_caps = hw_caps;
>
> return true;
> }
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 07/12] vfio/{iommufd,container}: Initialize HostIOMMUDeviceCaps during attach_device()
2024-07-17 12:19 ` Eric Auger
@ 2024-07-17 12:33 ` Joao Martins
2024-07-17 13:41 ` Eric Auger
0 siblings, 1 reply; 82+ messages in thread
From: Joao Martins @ 2024-07-17 12:33 UTC (permalink / raw)
To: eric.auger, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 17/07/2024 13:19, Eric Auger wrote:
> Hi Joao,
>
> On 7/12/24 13:46, Joao Martins wrote:
>> Fetch IOMMU hw raw caps behind the device and thus move the
> What does "Fetch IOMMU hw raw caps behind the device" mean?
It means fetching the out_capabilities field from GET_HW_INFO, which essentially
tells us whether the IOMMU behind the device supports dirty tracking.
>> HostIOMMUDevice::realize() to be done during the attach of the device. It
>> allows it to cache the information obtained from IOMMU_GET_HW_INFO from
> What do you mean by "It allows it to cache the information obtained
> from IOMMU_GET_HW_INFO from iommufd early on"?
/me nods
>> iommufd early on. However, while legacy HostIOMMUDevice caps
> What does "legacy HostIOMMUDevice caps always return true" mean?
That means that it can't fail, and the data in there is synthetic:
VFIODevice *vdev = opaque;
hiod->name = g_strdup(vdev->name);
hiod->agent = opaque;
return true;
The IOMMUFD one might fail if GET_HW_INFO fails.
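To make the contrast concrete, here is a rough standalone sketch of the two realize paths. It is a simplification under stated assumptions, not the QEMU code: the struct layouts are cut down, `dup_name()` stands in for g_strdup(), `fake_get_hw_info()` is a made-up replacement for the real IOMMU_GET_HW_INFO query, and the `devid < 0` convention for "not bound to iommufd yet" is invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Cut-down stand-ins for the QEMU structures (assumptions). */
typedef struct HostIOMMUDeviceCaps {
    uint32_t type;
    uint64_t hw_caps;
} HostIOMMUDeviceCaps;

typedef struct HostIOMMUDevice {
    char *name;
    void *agent;
    HostIOMMUDeviceCaps caps;
} HostIOMMUDevice;

typedef struct VFIODevice {
    const char *name;
    int devid;              /* < 0: not bound to iommufd yet (assumption) */
} VFIODevice;

/* Plain C replacement for g_strdup(); returns NULL on allocation failure. */
static char *dup_name(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Hypothetical stand-in for the IOMMU_GET_HW_INFO query. */
static bool fake_get_hw_info(int devid, uint32_t *type, uint64_t *hw_caps)
{
    if (devid < 0) {
        return false;       /* nothing to query without a devid */
    }
    *type = 1;              /* made-up values for the sketch */
    *hw_caps = 0x1;
    return true;
}

/* Legacy path: purely synthetic data, so it cannot fail. */
bool legacy_realize(HostIOMMUDevice *hiod, void *opaque)
{
    VFIODevice *vdev = opaque;

    hiod->name = dup_name(vdev->name);
    hiod->agent = opaque;
    return true;
}

/* IOMMUFD path: depends on a successful hardware info query. */
bool iommufd_realize(HostIOMMUDevice *hiod, void *opaque)
{
    VFIODevice *vdev = opaque;
    uint32_t type;
    uint64_t hw_caps;

    if (!fake_get_hw_info(vdev->devid, &type, &hw_caps)) {
        return false;
    }
    hiod->name = dup_name(vdev->name);
    hiod->agent = opaque;
    hiod->caps.type = type;
    hiod->caps.hw_caps = hw_caps;   /* cached so later code need not re-query */
    return true;
}
```

The point of the sketch is only the dependency: the legacy variant can run at any time, while the IOMMUFD variant is meaningful only once the device has a devid.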
>> always return true and doesn't have dependency on other things, the IOMMUFD
>> backend requires the iommufd FD to be connected and having a devid to be
>> able to query capabilities. Hence when exactly is HostIOMMUDevice
>> initialized inside backend ::attach_device() implementation is backend
>> specific.
>>
>> This is in preparation to fetch parse hw capabilities and understand if
> fetch parse?
>> dirty tracking is supported by device backing IOMMU without necessarily
>> duplicating the amount of calls we do to IOMMU_GET_HW_INFO.
> But we are moving code from a generic place to a backend-specific place?
>
No, because IOMMUFD needs the backend connected, while the legacy backend
doesn't. Otherwise this patch wouldn't need to be backend specific.
> Sorry, I find the commit message really hard to understand in general.
>
How about this:
Fetch IOMMU hw raw caps behind the device and, to that end, move the
HostIOMMUDevice::realize() call to be done during the attach of the device.
This is in preparation to fetch and parse hw capabilities and understand
whether dirty tracking is supported by the device's backing IOMMU, without
duplicating the calls we make to IOMMU_GET_HW_INFO.
Note that the HostIOMMUDevice data with the legacy backend is synthetic
and doesn't need any information from the (type1-iommu) backend, whereas
the IOMMUFD backend requires the iommufd FD to be connected and a devid to
be available in order to query the device capabilities seeded in
HostIOMMUDevice. This means that the point at which HostIOMMUDevice is
initialized (i.e. ::realize() is invoked) is container backend specific.
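As a rough illustration of that ordering, the generic attach path after this patch can be sketched as below. This is a standalone simplification, not the QEMU code: calloc()/free() stand in for object_new()/object_unref(), the backend is reduced to a plain function pointer, and `backend_attach_ok()`, `backend_attach_fail()` and the `realized` flag are hypothetical names used only to exercise the flow.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Cut-down stand-ins for the QEMU structures (assumptions). */
typedef struct HostIOMMUDevice {
    bool realized;
} HostIOMMUDevice;

typedef struct VFIODevice {
    bool mdev;
    HostIOMMUDevice *hiod;
} VFIODevice;

typedef bool (*AttachFn)(VFIODevice *vbasedev);

/* Hypothetical backend: realizes the hiod at a point of its own choosing. */
bool backend_attach_ok(VFIODevice *vbasedev)
{
    if (vbasedev->hiod) {
        vbasedev->hiod->realized = true;
    }
    return true;
}

/* Hypothetical backend whose attach (or realize) step fails. */
bool backend_attach_fail(VFIODevice *vbasedev)
{
    (void)vbasedev;
    return false;
}

/* Generic path: create the hiod first, let the backend realize it. */
bool attach_device(VFIODevice *vbasedev, AttachFn backend_attach)
{
    HostIOMMUDevice *hiod = NULL;

    if (!vbasedev->mdev) {
        hiod = calloc(1, sizeof(*hiod));    /* stand-in for object_new() */
        if (!hiod) {
            return false;
        }
        vbasedev->hiod = hiod;
    }

    if (!backend_attach(vbasedev)) {
        free(hiod);                         /* stand-in for object_unref() */
        vbasedev->hiod = NULL;
        return false;
    }

    return true;
}
```

In this sketch the generic code only owns creation and cleanup of the object. When realize happens is left entirely to the backend callback, which is the backend-specific part described in the message above.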
> Eric
>
>
>>
>> Suggested-by: Cédric Le Goater <clg@redhat.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/sysemu/host_iommu_device.h | 1 +
>> hw/vfio/common.c | 16 ++++++----------
>> hw/vfio/container.c | 6 ++++++
>> hw/vfio/iommufd.c | 7 +++++++
>> 4 files changed, 20 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
>> index 20e77cf54568..b1e5f4b8ac3e 100644
>> --- a/include/sysemu/host_iommu_device.h
>> +++ b/include/sysemu/host_iommu_device.h
>> @@ -24,6 +24,7 @@
>> */
>> typedef struct HostIOMMUDeviceCaps {
>> uint32_t type;
>> + uint64_t hw_caps;
> please also update the doc comment
OK
>> } HostIOMMUDeviceCaps;
>>
>> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index b0beed44116e..cc14f0e3fe24 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1544,7 +1544,7 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>> {
>> const VFIOIOMMUClass *ops =
>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>> - HostIOMMUDevice *hiod;
>> + HostIOMMUDevice *hiod = NULL;
>>
>> if (vbasedev->iommufd) {
>> ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>> @@ -1552,21 +1552,17 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>
>> assert(ops);
>>
>> - if (!ops->attach_device(name, vbasedev, as, errp)) {
>> - return false;
>> - }
>>
>> - if (vbasedev->mdev) {
>> - return true;
>> + if (!vbasedev->mdev) {
>> + hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>> + vbasedev->hiod = hiod;
>> }
>>
>> - hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>> - if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
>> + if (!ops->attach_device(name, vbasedev, as, errp)) {
>> object_unref(hiod);
>> - ops->detach_device(vbasedev);
>> + vbasedev->hiod = NULL;
>> return false;
>> }
>> - vbasedev->hiod = hiod;
>>
>> return true;
>> }
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index c27f448ba26e..29da261bbf3e 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -907,6 +907,7 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>> AddressSpace *as, Error **errp)
>> {
>> int groupid = vfio_device_groupid(vbasedev, errp);
>> + HostIOMMUDevice *hiod = vbasedev->hiod;
>> VFIODevice *vbasedev_iter;
>> VFIOGroup *group;
>> VFIOContainerBase *bcontainer;
>> @@ -917,6 +918,11 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>>
>> trace_vfio_attach_device(vbasedev->name, groupid);
>>
>> + if (hiod &&
>> + !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
>> + return false;
>> + }
>> +
>> group = vfio_get_group(groupid, as, errp);
>> if (!group) {
>> return false;
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 873c919e319c..d34dc88231ec 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -384,6 +384,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>> Error *err = NULL;
>> const VFIOIOMMUClass *iommufd_vioc =
>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>> + HostIOMMUDevice *hiod = vbasedev->hiod;
>>
>> if (vbasedev->fd < 0) {
>> devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
>> @@ -401,6 +402,11 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>>
>> space = vfio_get_address_space(as);
>>
>> + if (hiod &&
>> + !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
>> + return false;
>> + }
>> +
>> /* try to attach to an existing container in this space */
>> QLIST_FOREACH(bcontainer, &space->containers, next) {
>> container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>> @@ -722,6 +728,7 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>
>> hiod->name = g_strdup(vdev->name);
>> caps->type = type;
>> + caps->hw_caps = hw_caps;
>>
>> return true;
>> }
>
* Re: [PATCH v4 07/12] vfio/{iommufd,container}: Initialize HostIOMMUDeviceCaps during attach_device()
2024-07-17 12:33 ` Joao Martins
@ 2024-07-17 13:41 ` Eric Auger
2024-07-17 15:34 ` Joao Martins
0 siblings, 1 reply; 82+ messages in thread
From: Eric Auger @ 2024-07-17 13:41 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 7/17/24 14:33, Joao Martins wrote:
> On 17/07/2024 13:19, Eric Auger wrote:
>> Hi Joao,
>>
>> On 7/12/24 13:46, Joao Martins wrote:
>>> Fetch IOMMU hw raw caps behind the device and thus move the
>> what does mean "Fetch IOMMU hw raw caps behind the device'"
> Fetching the out_capabilities field from GET_HW_INFO which essentially tell us
> if the IOMMU behind the device supports dirty tracking.
that's much clearer than the 1st sentence
>
>>> HostIOMMUDevice::realize() to be done during the attach of the device. It
>>> allows it to cache the information obtained from IOMMU_GET_HW_INFO from
>> what do you mean by " It allows it to cache the information obtained
>> from IOMMU_GET_HW_INFO from iommufd early on"
> /me nods
?
>
>>> iommufd early on. However, while legacy HostIOMMUDevice caps
>> what does mean "legacy HostIOMMUDevice caps always return true"?
> That means that it can't fail, and the data in there is synthetic:
>
> VFIODevice *vdev = opaque;
>
> hiod->name = g_strdup(vdev->name);
> hiod->agent = opaque;
>
> return true;
>
> The IOMMUFD one might fail if GET_HW_INFO fails.
so you talk about hiod_legacy_vfio_realize() and not "legacy HostIOMMUDevice caps"!
>
>>> always return true and doesn't have dependency on other things, the IOMMUFD
>>> backend requires the iommufd FD to be connected and having a devid to be
>>> able to query capabilities. Hence when exactly is HostIOMMUDevice
>>> initialized inside backend ::attach_device() implementation is backend
>>> specific.
>>>
>>> This is in preparation to fetch parse hw capabilities and understand if
>> fetch parse?
>>> dirty tracking is supported by device backing IOMMU without necessarily
>>> duplicating the amount of calls we do to IOMMU_GET_HW_INFO.
>> But we move code from generic place to BE specific place?
>>
> No because in IOMMUFD needs the backend connected, while the legacy backend
> doesn't. Otherwise this patch wouldn't be needed to be backend specific.
>
>> Sorry I feel really hard to understand the commit msg in general
>>
> How about this:
>
> Fetch IOMMU hw raw caps behind the device and thus move the
You need to tell what the patch does and why.
"Fetch IOMMU hw raw caps behind the device" sentence does not clearly fit in any.
> HostIOMMUDevice::realize() to be done during the attach of the device.
>
> This is in preparation to fetch parse hw capabilities and understand if
> dirty tracking is supported by device backing IOMMU without necessarily
> duplicating the amount of calls we do to IOMMU_GET_HW_INFO.
>
> Note that the HostIOMMUDevice data with legacy backend is synthetic
> and doesn't need any information from the (type1-iommu) backend. While the
> IOMMUFD backend requires the iommufd FD to be connected and having a devid
> to be able to query device capabilities seeded in HostIOMMUDevice. This
> means that HostIOMMUDevice initialization (i.e. ::realized() is invoked) is
> container backend specific.
>
>
>
>
>> Eric
>>
>>
>>> Suggested-by: Cédric Le Goater <clg@redhat.com>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> include/sysemu/host_iommu_device.h | 1 +
>>> hw/vfio/common.c | 16 ++++++----------
>>> hw/vfio/container.c | 6 ++++++
>>> hw/vfio/iommufd.c | 7 +++++++
>>> 4 files changed, 20 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
>>> index 20e77cf54568..b1e5f4b8ac3e 100644
>>> --- a/include/sysemu/host_iommu_device.h
>>> +++ b/include/sysemu/host_iommu_device.h
>>> @@ -24,6 +24,7 @@
>>> */
>>> typedef struct HostIOMMUDeviceCaps {
>>> uint32_t type;
>>> + uint64_t hw_caps;
>> please also update the doc comment
> OK
>
>>> } HostIOMMUDeviceCaps;
>>>
>>> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index b0beed44116e..cc14f0e3fe24 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -1544,7 +1544,7 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>> {
>>> const VFIOIOMMUClass *ops =
>>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>>> - HostIOMMUDevice *hiod;
>>> + HostIOMMUDevice *hiod = NULL;
>>>
>>> if (vbasedev->iommufd) {
>>> ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>>> @@ -1552,21 +1552,17 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>>
>>> assert(ops);
>>>
>>> - if (!ops->attach_device(name, vbasedev, as, errp)) {
>>> - return false;
>>> - }
>>>
>>> - if (vbasedev->mdev) {
>>> - return true;
>>> + if (!vbasedev->mdev) {
>>> + hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>> + vbasedev->hiod = hiod;
>>> }
>>>
>>> - hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>> - if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
>>> + if (!ops->attach_device(name, vbasedev, as, errp)) {
>>> object_unref(hiod);
>>> - ops->detach_device(vbasedev);
>>> + vbasedev->hiod = NULL;
>>> return false;
>>> }
>>> - vbasedev->hiod = hiod;
>>>
>>> return true;
>>> }
>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>> index c27f448ba26e..29da261bbf3e 100644
>>> --- a/hw/vfio/container.c
>>> +++ b/hw/vfio/container.c
>>> @@ -907,6 +907,7 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>>> AddressSpace *as, Error **errp)
>>> {
>>> int groupid = vfio_device_groupid(vbasedev, errp);
>>> + HostIOMMUDevice *hiod = vbasedev->hiod;
>>> VFIODevice *vbasedev_iter;
>>> VFIOGroup *group;
>>> VFIOContainerBase *bcontainer;
>>> @@ -917,6 +918,11 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>>>
>>> trace_vfio_attach_device(vbasedev->name, groupid);
>>>
>>> + if (hiod &&
>>> + !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
>>> + return false;
>>> + }
>>> +
>>> group = vfio_get_group(groupid, as, errp);
>>> if (!group) {
>>> return false;
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 873c919e319c..d34dc88231ec 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -384,6 +384,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>>> Error *err = NULL;
>>> const VFIOIOMMUClass *iommufd_vioc =
>>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>>> + HostIOMMUDevice *hiod = vbasedev->hiod;
>>>
>>> if (vbasedev->fd < 0) {
>>> devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
>>> @@ -401,6 +402,11 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>>>
>>> space = vfio_get_address_space(as);
>>>
>>> + if (hiod &&
>>> + !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
>>> + return false;
>>> + }
>>> +
>>> /* try to attach to an existing container in this space */
>>> QLIST_FOREACH(bcontainer, &space->containers, next) {
>>> container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>>> @@ -722,6 +728,7 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>>
>>> hiod->name = g_strdup(vdev->name);
>>> caps->type = type;
>>> + caps->hw_caps = hw_caps;
>>>
>>> return true;
>>> }
* Re: [PATCH v4 07/12] vfio/{iommufd,container}: Initialize HostIOMMUDeviceCaps during attach_device()
2024-07-17 13:41 ` Eric Auger
@ 2024-07-17 15:34 ` Joao Martins
0 siblings, 0 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-17 15:34 UTC (permalink / raw)
To: eric.auger, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 17/07/2024 14:41, Eric Auger wrote:
> On 7/17/24 14:33, Joao Martins wrote:
>> On 17/07/2024 13:19, Eric Auger wrote:
>>> Hi Joao,
>>>
>>> On 7/12/24 13:46, Joao Martins wrote:
>>>> Fetch IOMMU hw raw caps behind the device and thus move the
>>> what does mean "Fetch IOMMU hw raw caps behind the device'"
>> Fetching the out_capabilities field from GET_HW_INFO which essentially tell us
>> if the IOMMU behind the device supports dirty tracking.
> that's much clearer than the 1st sentence
>>
>>>> HostIOMMUDevice::realize() to be done during the attach of the device. It
>>>> allows it to cache the information obtained from IOMMU_GET_HW_INFO from
>>> what do you mean by " It allows it to cache the information obtained
>>> from IOMMU_GET_HW_INFO from iommufd early on"
>> /me nods
> ?
By caching I mean that invoking realize() earlier allows us to store the value of
@out_capabilities in HostIOMMUDevice::caps for later use and avoid having to
call GET_HW_INFO again. 'Early on' refers to me doing this at the beginning of
attach_device().
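For illustration only, and not code from the series: a minimal sketch of the
caching idea described above. The Sketch* names, the mock query function and
the capability bit value are assumptions made for the example. A real
GET_HW_INFO query is an ioctl() against iommufd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit for the example; stands in for the dirty tracking cap. */
#define SKETCH_HW_CAP_DIRTY_TRACKING (1ULL << 0)

/* Hypothetical stand-in for HostIOMMUDeviceCaps. */
typedef struct SketchCaps {
    uint32_t type;
    uint64_t hw_caps;   /* cached copy of out_capabilities */
} SketchCaps;

/* Hypothetical stand-in for HostIOMMUDevice. */
typedef struct SketchHiod {
    SketchCaps caps;
    bool realized;
} SketchHiod;

/* Counts how often the mock query is issued. */
static int sketch_hw_info_calls;

/* Mock of the hw info query; always succeeds in this sketch. */
static bool sketch_get_hw_info(uint32_t *type, uint64_t *caps)
{
    sketch_hw_info_calls++;
    *type = 1;
    *caps = SKETCH_HW_CAP_DIRTY_TRACKING;
    return true;
}

/* realize(): query once, at the beginning of attach, and cache the result. */
static bool sketch_hiod_realize(SketchHiod *hiod)
{
    uint32_t type;
    uint64_t hw_caps;

    if (!sketch_get_hw_info(&type, &hw_caps)) {
        return false;
    }
    hiod->caps.type = type;
    hiod->caps.hw_caps = hw_caps;
    hiod->realized = true;
    return true;
}

/* Later users read the cached value instead of querying again. */
static bool sketch_dirty_tracking_supported(const SketchHiod *hiod)
{
    assert(hiod->realized);
    return (hiod->caps.hw_caps & SKETCH_HW_CAP_DIRTY_TRACKING) != 0;
}
```

However many times the capability is checked afterwards, the query counter in
the sketch stays at one.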
>>
>>>> iommufd early on. However, while legacy HostIOMMUDevice caps
>>> what does mean "legacy HostIOMMUDevice caps always return true"?
>> That means that it can't fail, and the data in there is synthetic:
>>
>> VFIODevice *vdev = opaque;
>>
>> hiod->name = g_strdup(vdev->name);
>> hiod->agent = opaque;
>>
>> return true;
>>
>> The IOMMUFD one might fail if GET_HW_INFO fails.
> so you talk about hiod_legacy_vfio_realize() and not "
>
> legacy HostIOMMUDevice caps"!
>
It's both. Legacy doesn't need to initialize @caps, whereas in IOMMUFD we do,
with actual info (the capabilities), and in order to do that we need the backend
initialized. And *that* ioctl() may fail.
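For illustration only: a minimal sketch contrasting the two realize() flavours
discussed here. The Sketch* types and both functions are hypothetical and only
mimic the shape of the real ones. The failure of the hw info query is
simulated with a flag rather than a real ioctl().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VFIODevice. */
typedef struct SketchDevice {
    const char *name;
    bool backend_connected;  /* iommufd FD connected and devid known */
    bool hw_info_fails;      /* simulates the capability query failing */
} SketchDevice;

/* Hypothetical stand-in for HostIOMMUDevice. */
typedef struct SketchHiod {
    const char *name;
    void *agent;
    uint64_t hw_caps;
} SketchHiod;

/* Legacy flavour: synthetic data only, nothing to query, cannot fail. */
static bool sketch_legacy_realize(SketchHiod *hiod, SketchDevice *dev)
{
    assert(hiod != NULL && dev != NULL);
    hiod->name = dev->name;
    hiod->agent = dev;
    return true;
}

/* IOMMUFD flavour: needs a connected backend, and the query may fail. */
static bool sketch_iommufd_realize(SketchHiod *hiod, SketchDevice *dev)
{
    assert(hiod != NULL && dev != NULL);
    if (!dev->backend_connected || dev->hw_info_fails) {
        return false;
    }
    hiod->name = dev->name;
    hiod->agent = dev;
    hiod->hw_caps = 1ULL << 0;  /* pretend the query reported one capability */
    return true;
}
```

The legacy variant can run at any point of the attach, while the IOMMUFD
variant only makes sense after the backend is connected, which is why the
call site ends up being backend specific.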
>>
>>>> always return true and doesn't have dependency on other things, the IOMMUFD
>>>> backend requires the iommufd FD to be connected and having a devid to be
>>>> able to query capabilities. Hence when exactly is HostIOMMUDevice
>>>> initialized inside backend ::attach_device() implementation is backend
>>>> specific.
>>>>
>>>> This is in preparation to fetch parse hw capabilities and understand if
>>> fetch parse?
>>>> dirty tracking is supported by device backing IOMMU without necessarily
>>>> duplicating the amount of calls we do to IOMMU_GET_HW_INFO.
>>> But we move code from generic place to BE specific place?
>>>
>> No because in IOMMUFD needs the backend connected, while the legacy backend
>> doesn't. Otherwise this patch wouldn't be needed to be backend specific.
>>
>>> Sorry I feel really hard to understand the commit msg in general
>>>
>> How about this:
>>
>> Fetch IOMMU hw raw caps behind the device and thus move the
> You need to tell what the patch does and why.
>
IMHO, I already do that -- what we are having here is a parsing issue with my
English (likely because it's a bit convoluted).
Me asking you how it sounds is for me to calibrate against how you understand it
or the literality of the text (or lack thereof).
> "Fetch IOMMU hw raw caps behind the device" sentence does not clearly fit in any.
>
>> HostIOMMUDevice::realize() to be done during the attach of the device.
>>
>> This is in preparation to fetch parse hw capabilities and understand if
>> dirty tracking is supported by device backing IOMMU without necessarily
>> duplicating the amount of calls we do to IOMMU_GET_HW_INFO.
>>
>> Note that the HostIOMMUDevice data with legacy backend is synthetic
>> and doesn't need any information from the (type1-iommu) backend. While the
>> IOMMUFD backend requires the iommufd FD to be connected and having a devid
>> to be able to query device capabilities seeded in HostIOMMUDevice. This
>> means that HostIOMMUDevice initialization (i.e. ::realized() is invoked) is
>> container backend specific.
>>
>>
>>
* [PATCH v4 08/12] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-12 11:46 [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (6 preceding siblings ...)
2024-07-12 11:46 ` [PATCH v4 07/12] vfio/{iommufd, container}: Initialize HostIOMMUDeviceCaps during attach_device() Joao Martins via
@ 2024-07-12 11:47 ` Joao Martins
2024-07-16 12:21 ` Cédric Le Goater
2024-07-17 12:27 ` Eric Auger
2024-07-12 11:47 ` [PATCH v4 09/12] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
` (4 subsequent siblings)
12 siblings, 2 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-12 11:47 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
Probe hardware dirty tracking support by querying device hw capabilities via
IOMMUFD_GET_HW_INFO.
In preparation to using the dirty tracking UAPI, request dirty tracking in the
HWPT flags when the IOMMU supports dirty tracking.
The auto domain logic allows different IOMMU domains to be created when DMA
dirty tracking is not desired (and VF can provide it) while others doesn't have
it and want the IOMMU capability. This is not used in this way here given how
VFIODevice migration capability checking takes place *after* the device
attachment.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/hw/vfio/vfio-common.h | 1 +
hw/vfio/iommufd.c | 12 ++++++++++++
2 files changed, 13 insertions(+)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 2dd468ce3c02..760f31d84ac8 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -97,6 +97,7 @@ typedef struct IOMMUFDBackend IOMMUFDBackend;
typedef struct VFIOIOASHwpt {
uint32_t hwpt_id;
+ uint32_t hwpt_flags;
QLIST_HEAD(, VFIODevice) device_list;
QLIST_ENTRY(VFIOIOASHwpt) next;
} VFIOIOASHwpt;
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index d34dc88231ec..edc8f97d8f3d 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -246,6 +246,15 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
}
}
+ /*
+ * This is quite early and VFIODevice isn't yet fully initialized,
+ * thus rely on IOMMU hardware capabilities as to whether IOMMU dirty
+ * tracking is going to be needed.
+ */
+ if (vbasedev->hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING) {
+ flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
+ }
+
if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
container->ioas_id, flags,
IOMMU_HWPT_DATA_NONE, 0, NULL,
@@ -255,6 +264,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
hwpt = g_malloc0(sizeof(*hwpt));
hwpt->hwpt_id = hwpt_id;
+ hwpt->hwpt_flags = flags;
QLIST_INIT(&hwpt->device_list);
ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
@@ -267,6 +277,8 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
vbasedev->hwpt = hwpt;
QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
+ container->bcontainer.dirty_pages_supported |=
+ (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
return true;
}
--
2.17.2
* Re: [PATCH v4 08/12] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-12 11:47 ` [PATCH v4 08/12] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
@ 2024-07-16 12:21 ` Cédric Le Goater
2024-07-17 12:27 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2024-07-16 12:21 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:47, Joao Martins wrote:
> Probe hardware dirty tracking support by querying device hw capabilities via
> IOMMUFD_GET_HW_INFO.
>
> In preparation to using the dirty tracking UAPI, request dirty tracking in the
> HWPT flags when the IOMMU supports dirty tracking.
>
> The auto domain logic allows different IOMMU domains to be created when DMA
> dirty tracking is not desired (and VF can provide it) while others doesn't have
> it and want the IOMMU capability. This is not used in this way here given how
> VFIODevice migration capability checking takes place *after* the device
> attachment.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/hw/vfio/vfio-common.h | 1 +
> hw/vfio/iommufd.c | 12 ++++++++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 2dd468ce3c02..760f31d84ac8 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -97,6 +97,7 @@ typedef struct IOMMUFDBackend IOMMUFDBackend;
>
> typedef struct VFIOIOASHwpt {
> uint32_t hwpt_id;
> + uint32_t hwpt_flags;
> QLIST_HEAD(, VFIODevice) device_list;
> QLIST_ENTRY(VFIOIOASHwpt) next;
> } VFIOIOASHwpt;
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index d34dc88231ec..edc8f97d8f3d 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -246,6 +246,15 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> }
> }
>
> + /*
> + * This is quite early and VFIODevice isn't yet fully initialized,
> + * thus rely on IOMMU hardware capabilities as to whether IOMMU dirty
> + * tracking is going to be needed.
> + */
> + if (vbasedev->hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING) {
> + flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
> + }
> +
> if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
> container->ioas_id, flags,
> IOMMU_HWPT_DATA_NONE, 0, NULL,
> @@ -255,6 +264,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>
> hwpt = g_malloc0(sizeof(*hwpt));
> hwpt->hwpt_id = hwpt_id;
> + hwpt->hwpt_flags = flags;
> QLIST_INIT(&hwpt->device_list);
>
> ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
> @@ -267,6 +277,8 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> vbasedev->hwpt = hwpt;
> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> + container->bcontainer.dirty_pages_supported |=
> + (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
> return true;
> }
>
Could you please introduce this helper in this patch:
static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
{
return hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
}
Thanks,
C.
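For illustration only: a minimal sketch of how a helper of the shape suggested
above might be used by a caller that walks a set of hwpts. The Sketch* names,
the flag value and the array-based walk are assumptions for the example. The
real code keeps hwpts on a QLIST.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag value, for the example only. */
#define SKETCH_HWPT_ALLOC_DIRTY_TRACKING (1u << 1)

/* Hypothetical stand-in for VFIOIOASHwpt, without the list linkage. */
typedef struct SketchHwpt {
    uint32_t hwpt_id;
    uint32_t hwpt_flags;    /* flags the hwpt was allocated with */
} SketchHwpt;

/* Same shape as the suggested helper: test the stored allocation flags. */
static bool sketch_hwpt_dirty_tracking(const SketchHwpt *hwpt)
{
    assert(hwpt != NULL);
    return (hwpt->hwpt_flags & SKETCH_HWPT_ALLOC_DIRTY_TRACKING) != 0;
}

/* Count the hwpts on which dirty tracking could be toggled. */
static size_t sketch_count_dirty_capable(const SketchHwpt *hwpts, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (sketch_hwpt_dirty_tracking(&hwpts[i])) {
            count++;
        }
    }
    return count;
}
```

Keeping the flag test in one small helper means callers never open-code the
mask against hwpt_flags.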
* Re: [PATCH v4 08/12] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-12 11:47 ` [PATCH v4 08/12] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
2024-07-16 12:21 ` Cédric Le Goater
@ 2024-07-17 12:27 ` Eric Auger
2024-07-17 12:38 ` Joao Martins
1 sibling, 1 reply; 82+ messages in thread
From: Eric Auger @ 2024-07-17 12:27 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
Hi Joao,
On 7/12/24 13:47, Joao Martins wrote:
> Probe hardware dirty tracking support by querying device hw capabilities via
> IOMMUFD_GET_HW_INFO.
this is not what the patch brings. GET_HW_INFO is always in place.
>
> In preparation to using the dirty tracking UAPI, request dirty tracking in the
> HWPT flags when the IOMMU supports dirty tracking.
this is what the patch brings.
>
> The auto domain logic allows different IOMMU domains to be created when DMA
> dirty tracking is not desired (and VF can provide it) while others doesn't have
don't
> it and want the IOMMU capability. This is not used in this way here given how
> VFIODevice migration capability checking takes place *after* the device
> attachment.
I don't understand the above sentence
Eric
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/hw/vfio/vfio-common.h | 1 +
> hw/vfio/iommufd.c | 12 ++++++++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 2dd468ce3c02..760f31d84ac8 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -97,6 +97,7 @@ typedef struct IOMMUFDBackend IOMMUFDBackend;
>
> typedef struct VFIOIOASHwpt {
> uint32_t hwpt_id;
> + uint32_t hwpt_flags;
> QLIST_HEAD(, VFIODevice) device_list;
> QLIST_ENTRY(VFIOIOASHwpt) next;
> } VFIOIOASHwpt;
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index d34dc88231ec..edc8f97d8f3d 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -246,6 +246,15 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> }
> }
>
> + /*
> + * This is quite early and VFIODevice isn't yet fully initialized,
so what's the problem exactly with the above?
> + * thus rely on IOMMU hardware capabilities as to whether IOMMU dirty
> + * tracking is going to be needed.
> + */
> + if (vbasedev->hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING) {
> + flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
> + }
> +
> if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
> container->ioas_id, flags,
> IOMMU_HWPT_DATA_NONE, 0, NULL,
> @@ -255,6 +264,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>
> hwpt = g_malloc0(sizeof(*hwpt));
> hwpt->hwpt_id = hwpt_id;
> + hwpt->hwpt_flags = flags;
> QLIST_INIT(&hwpt->device_list);
>
> ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
> @@ -267,6 +277,8 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> vbasedev->hwpt = hwpt;
> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> + container->bcontainer.dirty_pages_supported |=
> + (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
> return true;
> }
>
* Re: [PATCH v4 08/12] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-17 12:27 ` Eric Auger
@ 2024-07-17 12:38 ` Joao Martins
2024-07-17 13:43 ` Eric Auger
0 siblings, 1 reply; 82+ messages in thread
From: Joao Martins @ 2024-07-17 12:38 UTC (permalink / raw)
To: eric.auger, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 17/07/2024 13:27, Eric Auger wrote:
> Hi Joao,
>
> On 7/12/24 13:47, Joao Martins wrote:
>> Probe hardware dirty tracking support by querying device hw capabilities via
>> IOMMUFD_GET_HW_INFO.
> this is not what the patch brings. GET_HW_INFO is always in place.
Yes. This is my mistake in squashing things, as there was some shuffling going
around on how we do GET_HW_INFO, and I didn't adjust the second half of this sentence.
I'll rephrase it.
>>
>> In preparation to using the dirty tracking UAPI, request dirty tracking in the
>> HWPT flags when the IOMMU supports dirty tracking.
> this is what the patch brings.
Right.
>>
>> The auto domain logic allows different IOMMU domains to be created when DMA
>> dirty tracking is not desired (and VF can provide it) while others doesn't have
> don't
Right
>> it and want the IOMMU capability. This is not used in this way here given how
>> VFIODevice migration capability checking takes place *after* the device
>> attachment.
> I don't understand the above sentence
>
The whole paragraph is meant to emphasize that we don't know if VF dirty
tracking is supported because VFIODevice migration state hasn't been probed
*yet*. And so we can't pick VF dirty tracking vs IOMMU dirty tracking at this
stage when using IOMMU_HWPT_ALLOC_DIRTY_TRACKING flag and hence we always use it
if IOMMU hw supports it even if later on VFIOMigration decides to use VF dirty
tracking always instead.
> Eric
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/hw/vfio/vfio-common.h | 1 +
>> hw/vfio/iommufd.c | 12 ++++++++++++
>> 2 files changed, 13 insertions(+)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 2dd468ce3c02..760f31d84ac8 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -97,6 +97,7 @@ typedef struct IOMMUFDBackend IOMMUFDBackend;
>>
>> typedef struct VFIOIOASHwpt {
>> uint32_t hwpt_id;
>> + uint32_t hwpt_flags;
>> QLIST_HEAD(, VFIODevice) device_list;
>> QLIST_ENTRY(VFIOIOASHwpt) next;
>> } VFIOIOASHwpt;
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index d34dc88231ec..edc8f97d8f3d 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -246,6 +246,15 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> }
>> }
>>
>> + /*
>> + * This is quite early and VFIODevice isn't yet fully initialized,
> so what's the problem exactly with the above?
I should really say 'VFIO Migration state' here (see previous comment)
>> + * thus rely on IOMMU hardware capabilities as to whether IOMMU dirty
>> + * tracking is going to be needed.
>> + */
>> + if (vbasedev->hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING) {
>> + flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>> + }
>> +
>> if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>> container->ioas_id, flags,
>> IOMMU_HWPT_DATA_NONE, 0, NULL,
>> @@ -255,6 +264,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>
>> hwpt = g_malloc0(sizeof(*hwpt));
>> hwpt->hwpt_id = hwpt_id;
>> + hwpt->hwpt_flags = flags;
>> QLIST_INIT(&hwpt->device_list);
>>
>> ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>> @@ -267,6 +277,8 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> vbasedev->hwpt = hwpt;
>> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>> + container->bcontainer.dirty_pages_supported |=
>> + (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
>> return true;
>> }
>>
>
* Re: [PATCH v4 08/12] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-17 12:38 ` Joao Martins
@ 2024-07-17 13:43 ` Eric Auger
0 siblings, 0 replies; 82+ messages in thread
From: Eric Auger @ 2024-07-17 13:43 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 7/17/24 14:38, Joao Martins wrote:
> On 17/07/2024 13:27, Eric Auger wrote:
>> Hi Joao,
>>
>> On 7/12/24 13:47, Joao Martins wrote:
>>> Probe hardware dirty tracking support by querying device hw capabilities via
>>> IOMMUFD_GET_HW_INFO.
>> this is not what the patch brings. GET_HW_INFO is always in place.
> Yes. This is my mistake in squashing things as there was some shuffling going
> around on how we do GET_HW_INFO. and didn't adjust the right hand of this sentence.
>
> I'll rephrase it.
>
>>> In preparation to using the dirty tracking UAPI, request dirty tracking in the
>>> HWPT flags when the IOMMU supports dirty tracking.
>> this is what the patch brings.
> Right.
>
>>> The auto domain logic allows different IOMMU domains to be created when DMA
>>> dirty tracking is not desired (and VF can provide it) while others doesn't have
>> don't
> Right
>
>>> it and want the IOMMU capability. This is not used in this way here given how
>>> VFIODevice migration capability checking takes place *after* the device
>>> attachment.
>> I don't understand the above sentence
>>
> The whole paragraph is meant to emphasize that we don't know if VF dirty
> tracking is supported because VFIODevice migration state hasn't been probed
> *yet*. And so we can't pick VF dirty tracking vs IOMMU dirty tracking at this
> stage when using IOMMU_HWPT_ALLOC_DIRTY_TRACKING flag and hence we always use it
> if IOMMU hw supports it even if later on VFIOMigration decides to use VF dirty
> tracking always instead.
that sounds like a clearer explanation to me
Eric
>
>> Eric
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> include/hw/vfio/vfio-common.h | 1 +
>>> hw/vfio/iommufd.c | 12 ++++++++++++
>>> 2 files changed, 13 insertions(+)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index 2dd468ce3c02..760f31d84ac8 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -97,6 +97,7 @@ typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>
>>> typedef struct VFIOIOASHwpt {
>>> uint32_t hwpt_id;
>>> + uint32_t hwpt_flags;
>>> QLIST_HEAD(, VFIODevice) device_list;
>>> QLIST_ENTRY(VFIOIOASHwpt) next;
>>> } VFIOIOASHwpt;
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index d34dc88231ec..edc8f97d8f3d 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -246,6 +246,15 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>> }
>>> }
>>>
>>> + /*
>>> + * This is quite early and VFIODevice isn't yet fully initialized,
>> so what's the problem exactly with the above?
> I should really say 'VFIO Migration state' here (see previous comment)
>
>>> + * thus rely on IOMMU hardware capabilities as to whether IOMMU dirty
>>> + * tracking is going to be needed.
>>> + */
>>> + if (vbasedev->hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING) {
>>> + flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>> + }
>>> +
>>> if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>> container->ioas_id, flags,
>>> IOMMU_HWPT_DATA_NONE, 0, NULL,
>>> @@ -255,6 +264,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>
>>> hwpt = g_malloc0(sizeof(*hwpt));
>>> hwpt->hwpt_id = hwpt_id;
>>> + hwpt->hwpt_flags = flags;
>>> QLIST_INIT(&hwpt->device_list);
>>>
>>> ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>>> @@ -267,6 +277,8 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>> vbasedev->hwpt = hwpt;
>>> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>> + container->bcontainer.dirty_pages_supported |=
>>> + (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
>>> return true;
>>> }
>>>
* [PATCH v4 09/12] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
2024-07-12 11:46 [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (7 preceding siblings ...)
2024-07-12 11:47 ` [PATCH v4 08/12] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
@ 2024-07-12 11:47 ` Joao Martins
2024-07-16 12:24 ` Cédric Le Goater
` (2 more replies)
2024-07-12 11:47 ` [PATCH v4 10/12] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
` (3 subsequent siblings)
12 siblings, 3 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-12 11:47 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
enables or disables dirty page tracking. It is used if the hwpt
has been created with dirty tracking supported domain (stored in
hwpt::flags) and it is called on the whole list of iommu domains
it is are tracking. On failure it rolls it back.
The checking of hwpt::flags is introduced here as a second user
and thus consolidate such check into a helper function
iommufd_hwpt_dirty_tracking().
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/sysemu/iommufd.h | 3 +++
backends/iommufd.c | 23 +++++++++++++++++++++++
hw/vfio/iommufd.c | 39 ++++++++++++++++++++++++++++++++++++++-
backends/trace-events | 1 +
4 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index e917e7591d05..7416d9219703 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -55,6 +55,9 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
uint32_t data_type, uint32_t data_len,
void *data_ptr, uint32_t *out_hwpt,
Error **errp);
+bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
+ bool start, Error **errp);
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
+
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 41a9dec3b2c5..239f0976e0ad 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -239,6 +239,29 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
return true;
}
+bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
+ uint32_t hwpt_id, bool start,
+ Error **errp)
+{
+ int ret;
+ struct iommu_hwpt_set_dirty_tracking set_dirty = {
+ .size = sizeof(set_dirty),
+ .hwpt_id = hwpt_id,
+ .flags = !start ? 0 : IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
+ };
+
+ ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
+ trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret ? errno : 0);
+ if (ret) {
+ error_setg_errno(errp, errno,
+ "IOMMU_HWPT_SET_DIRTY_TRACKING(hwpt_id %u) failed",
+ hwpt_id);
+ return false;
+ }
+
+ return true;
+}
+
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index edc8f97d8f3d..da678315faeb 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -110,6 +110,42 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
iommufd_backend_disconnect(vbasedev->iommufd);
}
+static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
+{
+ return hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
+}
+
+static int iommufd_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
+ bool start, Error **errp)
+{
+ const VFIOIOMMUFDContainer *container =
+ container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+ VFIOIOASHwpt *hwpt;
+
+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ if (!iommufd_hwpt_dirty_tracking(hwpt)) {
+ continue;
+ }
+
+ if (!iommufd_backend_set_dirty_tracking(container->be,
+ hwpt->hwpt_id, start, errp)) {
+ goto err;
+ }
+ }
+
+ return 0;
+
+err:
+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ if (!iommufd_hwpt_dirty_tracking(hwpt)) {
+ continue;
+ }
+ iommufd_backend_set_dirty_tracking(container->be,
+ hwpt->hwpt_id, !start, NULL);
+ }
+ return -EINVAL;
+}
+
static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
{
ERRP_GUARD();
@@ -278,7 +314,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
container->bcontainer.dirty_pages_supported |=
- (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
+ iommufd_hwpt_dirty_tracking(hwpt);
return true;
}
@@ -717,6 +753,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
vioc->attach_device = iommufd_cdev_attach;
vioc->detach_device = iommufd_cdev_detach;
vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
+ vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
};
static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
diff --git a/backends/trace-events b/backends/trace-events
index 4d8ac02fe7d6..28aca3b859d4 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -16,3 +16,4 @@ iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t si
iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
+iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
--
2.17.2
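As a reading aid for the rollback behaviour described in the commit message, here is a minimal standalone sketch of the same loop shape. It assumes a mock backend in place of the ioctl and a plain array in place of the QLIST. struct sketch_hwpt, sketch_backend_set() and sketch_set_dirty_tracking() are hypothetical names used only for illustration and do not exist in QEMU.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one hardware page table (hwpt) entry. */
struct sketch_hwpt {
    uint32_t id;
    bool dirty_capable; /* allocated with dirty tracking support */
    bool tracking;      /* current state, updated by the mock backend */
    bool fail_on_set;   /* force the mock backend to fail for this entry */
};

/* Mock of the backend call; the real helper issues an ioctl instead. */
static bool sketch_backend_set(struct sketch_hwpt *h, bool start)
{
    if (h->fail_on_set) {
        return false;
    }
    h->tracking = start;
    return true;
}

/*
 * Toggle tracking on every dirty-capable entry. On the first failure,
 * apply the opposite state to all dirty-capable entries (errors ignored)
 * and report failure to the caller.
 */
static int sketch_set_dirty_tracking(struct sketch_hwpt *v, size_t n,
                                     bool start)
{
    size_t i;

    assert(v != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (!v[i].dirty_capable) {
            continue; /* entries without dirty tracking are skipped */
        }
        if (!sketch_backend_set(&v[i], start)) {
            goto err;
        }
    }
    return 0;

err:
    for (i = 0; i < n; i++) {
        if (!v[i].dirty_capable) {
            continue;
        }
        (void)sketch_backend_set(&v[i], !start);
    }
    return -EINVAL;
}
```

Like the patch, the error path in this sketch applies !start to every dirty-capable entry and ignores errors while doing so.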
* Re: [PATCH v4 09/12] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
2024-07-12 11:47 ` [PATCH v4 09/12] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
@ 2024-07-16 12:24 ` Cédric Le Goater
2024-07-17 2:24 ` Duan, Zhenzhong
2024-07-17 12:36 ` Eric Auger
2 siblings, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2024-07-16 12:24 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:47, Joao Martins wrote:
> ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
> enables or disables dirty page tracking. It is used if the hwpt
> has been created with dirty tracking supported domain (stored in
> hwpt::flags) and it is called on the whole list of iommu domains
> it is are tracking. On failure it rolls it back.
>
> The checking of hwpt::flags is introduced here as a second user
> and thus consolidate such check into a helper function
> iommufd_hwpt_dirty_tracking().
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/sysemu/iommufd.h | 3 +++
> backends/iommufd.c | 23 +++++++++++++++++++++++
> hw/vfio/iommufd.c | 39 ++++++++++++++++++++++++++++++++++++++-
> backends/trace-events | 1 +
> 4 files changed, 65 insertions(+), 1 deletion(-)
>
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index e917e7591d05..7416d9219703 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -55,6 +55,9 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> uint32_t data_type, uint32_t data_len,
> void *data_ptr, uint32_t *out_hwpt,
> Error **errp);
> +bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
> + bool start, Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
> +
> #endif
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 41a9dec3b2c5..239f0976e0ad 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -239,6 +239,29 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> return true;
> }
>
> +bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
> + uint32_t hwpt_id, bool start,
> + Error **errp)
> +{
> + int ret;
> + struct iommu_hwpt_set_dirty_tracking set_dirty = {
> + .size = sizeof(set_dirty),
> + .hwpt_id = hwpt_id,
> + .flags = !start ? 0 : IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
How about :
.flags = start ? IOMMU_HWPT_DIRTY_TRACKING_ENABLE : 0,
?
Thanks,
C.
> + };
> +
> + ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
> + trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret ? errno : 0);
> + if (ret) {
> + error_setg_errno(errp, errno,
> + "IOMMU_HWPT_SET_DIRTY_TRACKING(hwpt_id %u) failed",
> + hwpt_id);
> + return false;
> + }
> +
> + return true;
> +}
> +
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index edc8f97d8f3d..da678315faeb 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -110,6 +110,42 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
> iommufd_backend_disconnect(vbasedev->iommufd);
> }
>
> +static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
> +{
> + return hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
> +}
> +
> +static int iommufd_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
> + bool start, Error **errp)
> +{
> + const VFIOIOMMUFDContainer *container =
> + container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> + VFIOIOASHwpt *hwpt;
> +
> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
> + continue;
> + }
> +
> + if (!iommufd_backend_set_dirty_tracking(container->be,
> + hwpt->hwpt_id, start, errp)) {
> + goto err;
> + }
> + }
> +
> + return 0;
> +
> +err:
> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
> + continue;
> + }
> + iommufd_backend_set_dirty_tracking(container->be,
> + hwpt->hwpt_id, !start, NULL);
> + }
> + return -EINVAL;
> +}
> +
> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
> {
> ERRP_GUARD();
> @@ -278,7 +314,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> container->bcontainer.dirty_pages_supported |=
> - (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
> + iommufd_hwpt_dirty_tracking(hwpt);
> return true;
> }
>
> @@ -717,6 +753,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
> vioc->attach_device = iommufd_cdev_attach;
> vioc->detach_device = iommufd_cdev_detach;
> vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
> + vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
> };
>
> static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> diff --git a/backends/trace-events b/backends/trace-events
> index 4d8ac02fe7d6..28aca3b859d4 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -16,3 +16,4 @@ iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t si
> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
> iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
> +iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
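The two spellings of the .flags initializer discussed in this reply should select the same value for either input; the suggested one only reads more directly. A small standalone check, assuming a stand-in constant: SKETCH_DIRTY_TRACKING_ENABLE and the helper names are hypothetical, and the real flag comes from the kernel's iommufd UAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the UAPI enable flag; the value here is an assumption. */
#define SKETCH_DIRTY_TRACKING_ENABLE (1u << 0)

/* Form used in the patch: negated condition first. */
static uint32_t flags_negated(bool start)
{
    return !start ? 0 : SKETCH_DIRTY_TRACKING_ENABLE;
}

/* Form suggested in review: positive condition first. */
static uint32_t flags_positive(bool start)
{
    return start ? SKETCH_DIRTY_TRACKING_ENABLE : 0;
}

/* Both forms must agree for both possible inputs. */
static void check_forms_agree(void)
{
    assert(flags_negated(true) == flags_positive(true));
    assert(flags_negated(false) == flags_positive(false));
}
```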
* RE: [PATCH v4 09/12] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
2024-07-12 11:47 ` [PATCH v4 09/12] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
2024-07-16 12:24 ` Cédric Le Goater
@ 2024-07-17 2:24 ` Duan, Zhenzhong
2024-07-17 9:14 ` Joao Martins
2024-07-17 12:36 ` Eric Auger
2 siblings, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2024-07-17 2:24 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v4 09/12] vfio/iommufd: Implement
>VFIOIOMMUClass::set_dirty_tracking support
>
>ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
>enables or disables dirty page tracking. It is used if the hwpt
>has been created with dirty tracking supported domain (stored in
>hwpt::flags) and it is called on the whole list of iommu domains
>it is are tracking. On failure it rolls it back.
>
>The checking of hwpt::flags is introduced here as a second user
>and thus consolidate such check into a helper function
>iommufd_hwpt_dirty_tracking().
>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>---
> include/sysemu/iommufd.h | 3 +++
> backends/iommufd.c | 23 +++++++++++++++++++++++
> hw/vfio/iommufd.c | 39
>++++++++++++++++++++++++++++++++++++++-
> backends/trace-events | 1 +
> 4 files changed, 65 insertions(+), 1 deletion(-)
>
>diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>index e917e7591d05..7416d9219703 100644
>--- a/include/sysemu/iommufd.h
>+++ b/include/sysemu/iommufd.h
>@@ -55,6 +55,9 @@ bool
>iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> uint32_t data_type, uint32_t data_len,
> void *data_ptr, uint32_t *out_hwpt,
> Error **errp);
>+bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
>uint32_t hwpt_id,
>+ bool start, Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>TYPE_HOST_IOMMU_DEVICE "-iommufd"
>+
> #endif
>diff --git a/backends/iommufd.c b/backends/iommufd.c
>index 41a9dec3b2c5..239f0976e0ad 100644
>--- a/backends/iommufd.c
>+++ b/backends/iommufd.c
>@@ -239,6 +239,29 @@ bool
>iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> return true;
> }
>
>+bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
>+ uint32_t hwpt_id, bool start,
>+ Error **errp)
>+{
>+ int ret;
>+ struct iommu_hwpt_set_dirty_tracking set_dirty = {
>+ .size = sizeof(set_dirty),
>+ .hwpt_id = hwpt_id,
>+ .flags = !start ? 0 : IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
>+ };
>+
>+ ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
>+ trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret ? errno :
>0);
>+ if (ret) {
>+ error_setg_errno(errp, errno,
>+ "IOMMU_HWPT_SET_DIRTY_TRACKING(hwpt_id %u) failed",
>+ hwpt_id);
>+ return false;
>+ }
>+
>+ return true;
>+}
>+
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
>diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>index edc8f97d8f3d..da678315faeb 100644
>--- a/hw/vfio/iommufd.c
>+++ b/hw/vfio/iommufd.c
>@@ -110,6 +110,42 @@ static void
>iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
> iommufd_backend_disconnect(vbasedev->iommufd);
> }
>
>+static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>+{
>+ return hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>+}
>+
>+static int iommufd_set_dirty_page_tracking(const VFIOContainerBase
>*bcontainer,
>+ bool start, Error **errp)
>+{
>+ const VFIOIOMMUFDContainer *container =
>+ container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>+ VFIOIOASHwpt *hwpt;
>+
>+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>+ if (!iommufd_hwpt_dirty_tracking(hwpt)) {
>+ continue;
>+ }
So the devices under an hwpt that doesn't support dirty tracking are bypassed.
Then how to track dirty pages coming from those devices?
Thanks
Zhenzhong
>+
>+ if (!iommufd_backend_set_dirty_tracking(container->be,
>+ hwpt->hwpt_id, start, errp)) {
>+ goto err;
>+ }
>+ }
>+
>+ return 0;
>+
>+err:
>+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>+ if (!iommufd_hwpt_dirty_tracking(hwpt)) {
>+ continue;
>+ }
>+ iommufd_backend_set_dirty_tracking(container->be,
>+ hwpt->hwpt_id, !start, NULL);
>+ }
>+ return -EINVAL;
>+}
>+
> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
> {
> ERRP_GUARD();
>@@ -278,7 +314,7 @@ static bool
>iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> container->bcontainer.dirty_pages_supported |=
>- (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
>+ iommufd_hwpt_dirty_tracking(hwpt);
> return true;
> }
>
>@@ -717,6 +753,7 @@ static void
>vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
> vioc->attach_device = iommufd_cdev_attach;
> vioc->detach_device = iommufd_cdev_detach;
> vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
>+ vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
> };
>
> static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void
>*opaque,
>diff --git a/backends/trace-events b/backends/trace-events
>index 4d8ac02fe7d6..28aca3b859d4 100644
>--- a/backends/trace-events
>+++ b/backends/trace-events
>@@ -16,3 +16,4 @@ iommufd_backend_unmap_dma(int iommufd,
>uint32_t ioas, uint64_t iova, uint64_t si
> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d
>ioas=%d"
> iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t
>pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr,
>uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u
>flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u
>(%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>id=%d (%d)"
>+iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start,
>int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
>--
>2.17.2
* Re: [PATCH v4 09/12] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
2024-07-17 2:24 ` Duan, Zhenzhong
@ 2024-07-17 9:14 ` Joao Martins
0 siblings, 0 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-17 9:14 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 17/07/2024 03:24, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: [PATCH v4 09/12] vfio/iommufd: Implement
>> VFIOIOMMUClass::set_dirty_tracking support
>>
>> ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
>> enables or disables dirty page tracking. It is used if the hwpt
>> has been created with dirty tracking supported domain (stored in
>> hwpt::flags) and it is called on the whole list of iommu domains
>> it is are tracking. On failure it rolls it back.
>>
>> The checking of hwpt::flags is introduced here as a second user
>> and thus consolidate such check into a helper function
>> iommufd_hwpt_dirty_tracking().
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/sysemu/iommufd.h | 3 +++
>> backends/iommufd.c | 23 +++++++++++++++++++++++
>> hw/vfio/iommufd.c | 39
>> ++++++++++++++++++++++++++++++++++++++-
>> backends/trace-events | 1 +
>> 4 files changed, 65 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>> index e917e7591d05..7416d9219703 100644
>> --- a/include/sysemu/iommufd.h
>> +++ b/include/sysemu/iommufd.h
>> @@ -55,6 +55,9 @@ bool
>> iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>> uint32_t data_type, uint32_t data_len,
>> void *data_ptr, uint32_t *out_hwpt,
>> Error **errp);
>> +bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
>> uint32_t hwpt_id,
>> + bool start, Error **errp);
>>
>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>> TYPE_HOST_IOMMU_DEVICE "-iommufd"
>> +
>> #endif
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index 41a9dec3b2c5..239f0976e0ad 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -239,6 +239,29 @@ bool
>> iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>> return true;
>> }
>>
>> +bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
>> + uint32_t hwpt_id, bool start,
>> + Error **errp)
>> +{
>> + int ret;
>> + struct iommu_hwpt_set_dirty_tracking set_dirty = {
>> + .size = sizeof(set_dirty),
>> + .hwpt_id = hwpt_id,
>> + .flags = !start ? 0 : IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
>> + };
>> +
>> + ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
>> + trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret ? errno :
>> 0);
>> + if (ret) {
>> + error_setg_errno(errp, errno,
>> + "IOMMU_HWPT_SET_DIRTY_TRACKING(hwpt_id %u) failed",
>> + hwpt_id);
>> + return false;
>> + }
>> +
>> + return true;
>> +}
>> +
>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>> devid,
>> uint32_t *type, void *data, uint32_t len,
>> uint64_t *caps, Error **errp)
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index edc8f97d8f3d..da678315faeb 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -110,6 +110,42 @@ static void
>> iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
>> iommufd_backend_disconnect(vbasedev->iommufd);
>> }
>>
>> +static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>> +{
>> + return hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>> +}
>> +
>> +static int iommufd_set_dirty_page_tracking(const VFIOContainerBase
>> *bcontainer,
>> + bool start, Error **errp)
>> +{
>> + const VFIOIOMMUFDContainer *container =
>> + container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>> + VFIOIOASHwpt *hwpt;
>> +
>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
>> + continue;
>> + }
>
> So the devices under an hwpt that doesn't support dirty tracking are bypassed.
> Then how to track dirty pages coming from those devices?
>
We don't support 'mixed mode' dirty tracking right now, and didn't before this
series either. I plan on lifting that restriction as a follow-up. So far my
thinking is to make sure migration is blocked if neither VF nor IOMMU dirty
tracking is supported.
The reason is that the migration initialization of the VFIODevice needs to be
adjusted to understand the constraint that IOMMU dirty tracking is not
requested when VF dirty tracking is in use, and vice versa. That would make
this check a lot more representative of the features in use.
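To illustrate the follow-up described above, here is a rough sketch of the selection it implies, under the assumption that VF dirty tracking is preferred when both are available (as hinted elsewhere in this thread). All names below are hypothetical and none of this is existing QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: which mechanism ends up tracking DMA dirty pages. */
enum sketch_tracker {
    SKETCH_TRACK_NONE,
    SKETCH_TRACK_VF,
    SKETCH_TRACK_IOMMU,
};

/* Pick exactly one tracker; VF tracking wins when both are available. */
static enum sketch_tracker sketch_pick_tracker(bool vf_dirty, bool iommu_dirty)
{
    if (vf_dirty) {
        return SKETCH_TRACK_VF;
    }
    if (iommu_dirty) {
        return SKETCH_TRACK_IOMMU;
    }
    return SKETCH_TRACK_NONE;
}

/* Migration is blocked only when neither mechanism is available. */
static bool sketch_migration_blocked(bool vf_dirty, bool iommu_dirty)
{
    enum sketch_tracker t = sketch_pick_tracker(vf_dirty, iommu_dirty);

    /* The picker returns a single mechanism, never a mix of both. */
    assert(t == SKETCH_TRACK_NONE || t == SKETCH_TRACK_VF ||
           t == SKETCH_TRACK_IOMMU);
    return t == SKETCH_TRACK_NONE;
}
```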
* Re: [PATCH v4 09/12] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
2024-07-12 11:47 ` [PATCH v4 09/12] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
2024-07-16 12:24 ` Cédric Le Goater
2024-07-17 2:24 ` Duan, Zhenzhong
@ 2024-07-17 12:36 ` Eric Auger
2024-07-17 12:41 ` Joao Martins
2 siblings, 1 reply; 82+ messages in thread
From: Eric Auger @ 2024-07-17 12:36 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:47, Joao Martins wrote:
> ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
> enables or disables dirty page tracking. It is used if the hwpt
> has been created with dirty tracking supported domain (stored in
> hwpt::flags) and it is called on the whole list of iommu domains
> it is are tracking. On failure it rolls it back.
it is are tracking ?
Also, please clearly state what "it" is.
>
> The checking of hwpt::flags is introduced here as a second user
?? -> introduce iommufd_hwpt_dirty_tracking() helper to avoid code dup?
> and thus consolidate such check into a helper function
> iommufd_hwpt_dirty_tracking().
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/sysemu/iommufd.h | 3 +++
> backends/iommufd.c | 23 +++++++++++++++++++++++
> hw/vfio/iommufd.c | 39 ++++++++++++++++++++++++++++++++++++++-
> backends/trace-events | 1 +
> 4 files changed, 65 insertions(+), 1 deletion(-)
>
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index e917e7591d05..7416d9219703 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -55,6 +55,9 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> uint32_t data_type, uint32_t data_len,
> void *data_ptr, uint32_t *out_hwpt,
> Error **errp);
> +bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
> + bool start, Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
> +
spurious line change
> #endif
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 41a9dec3b2c5..239f0976e0ad 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -239,6 +239,29 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> return true;
> }
>
> +bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
> + uint32_t hwpt_id, bool start,
> + Error **errp)
> +{
> + int ret;
> + struct iommu_hwpt_set_dirty_tracking set_dirty = {
> + .size = sizeof(set_dirty),
> + .hwpt_id = hwpt_id,
> + .flags = !start ? 0 : IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
> + };
> +
> + ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
> + trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret ? errno : 0);
> + if (ret) {
> + error_setg_errno(errp, errno,
> + "IOMMU_HWPT_SET_DIRTY_TRACKING(hwpt_id %u) failed",
> + hwpt_id);
> + return false;
> + }
> +
> + return true;
> +}
> +
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index edc8f97d8f3d..da678315faeb 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -110,6 +110,42 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
> iommufd_backend_disconnect(vbasedev->iommufd);
> }
>
> +static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
> +{
> + return hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
> +}
> +
> +static int iommufd_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
> + bool start, Error **errp)
> +{
> + const VFIOIOMMUFDContainer *container =
> + container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> + VFIOIOASHwpt *hwpt;
> +
> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
> + continue;
> + }
> +
> + if (!iommufd_backend_set_dirty_tracking(container->be,
> + hwpt->hwpt_id, start, errp)) {
> + goto err;
> + }
> + }
> +
> + return 0;
> +
> +err:
> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
> + continue;
> + }
> + iommufd_backend_set_dirty_tracking(container->be,
> + hwpt->hwpt_id, !start, NULL);
> + }
> + return -EINVAL;
> +}
> +
> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
> {
> ERRP_GUARD();
> @@ -278,7 +314,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> container->bcontainer.dirty_pages_supported |=
> - (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
> + iommufd_hwpt_dirty_tracking(hwpt);
> return true;
> }
>
> @@ -717,6 +753,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
> vioc->attach_device = iommufd_cdev_attach;
> vioc->detach_device = iommufd_cdev_detach;
> vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
> + vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
> };
>
> static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> diff --git a/backends/trace-events b/backends/trace-events
> index 4d8ac02fe7d6..28aca3b859d4 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -16,3 +16,4 @@ iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t si
> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
> iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
> +iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
* Re: [PATCH v4 09/12] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
2024-07-17 12:36 ` Eric Auger
@ 2024-07-17 12:41 ` Joao Martins
2024-07-17 13:34 ` Eric Auger
0 siblings, 1 reply; 82+ messages in thread
From: Joao Martins @ 2024-07-17 12:41 UTC (permalink / raw)
To: eric.auger, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 17/07/2024 13:36, Eric Auger wrote:
>
>
> On 7/12/24 13:47, Joao Martins wrote:
>> ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
>> enables or disables dirty page tracking. It is used if the hwpt
>> has been created with dirty tracking supported domain (stored in
>> hwpt::flags) and it is called on the whole list of iommu domains
>> it is are tracking. On failure it rolls it back.
>
> it is are tracking ?
*it is*
> also please clearly state what is "it"
>
sure
>>
>> The checking of hwpt::flags is introduced here as a second user
> ?? -> introduce iommufd_hwpt_dirty_tracking() helper to avoid code dup?
Right, I am doing that already. I'm not sure what the problem is with this
sentence. Or did you mean I should use yours, as it's simpler/easier to understand?
>> and thus consolidate such check into a helper function
>> iommufd_hwpt_dirty_tracking().
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/sysemu/iommufd.h | 3 +++
>> backends/iommufd.c | 23 +++++++++++++++++++++++
>> hw/vfio/iommufd.c | 39 ++++++++++++++++++++++++++++++++++++++-
>> backends/trace-events | 1 +
>> 4 files changed, 65 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>> index e917e7591d05..7416d9219703 100644
>> --- a/include/sysemu/iommufd.h
>> +++ b/include/sysemu/iommufd.h
>> @@ -55,6 +55,9 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>> uint32_t data_type, uint32_t data_len,
>> void *data_ptr, uint32_t *out_hwpt,
>> Error **errp);
>> +bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
>> + bool start, Error **errp);
>>
>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
>> +
> spurious line change
ok
>> #endif
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index 41a9dec3b2c5..239f0976e0ad 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -239,6 +239,29 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>> return true;
>> }
>>
>> +bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
>> + uint32_t hwpt_id, bool start,
>> + Error **errp)
>> +{
>> + int ret;
>> + struct iommu_hwpt_set_dirty_tracking set_dirty = {
>> + .size = sizeof(set_dirty),
>> + .hwpt_id = hwpt_id,
>> + .flags = !start ? 0 : IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
>> + };
>> +
>> + ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
>> + trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret ? errno : 0);
>> + if (ret) {
>> + error_setg_errno(errp, errno,
>> + "IOMMU_HWPT_SET_DIRTY_TRACKING(hwpt_id %u) failed",
>> + hwpt_id);
>> + return false;
>> + }
>> +
>> + return true;
>> +}
>> +
>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>> uint32_t *type, void *data, uint32_t len,
>> uint64_t *caps, Error **errp)
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index edc8f97d8f3d..da678315faeb 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -110,6 +110,42 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
>> iommufd_backend_disconnect(vbasedev->iommufd);
>> }
>>
>> +static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>> +{
>> + return hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>> +}
>> +
>> +static int iommufd_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
>> + bool start, Error **errp)
>> +{
>> + const VFIOIOMMUFDContainer *container =
>> + container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>> + VFIOIOASHwpt *hwpt;
>> +
>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
>> + continue;
>> + }
>> +
>> + if (!iommufd_backend_set_dirty_tracking(container->be,
>> + hwpt->hwpt_id, start, errp)) {
>> + goto err;
>> + }
>> + }
>> +
>> + return 0;
>> +
>> +err:
>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
>> + continue;
>> + }
>> + iommufd_backend_set_dirty_tracking(container->be,
>> + hwpt->hwpt_id, !start, NULL);
>> + }
>> + return -EINVAL;
>> +}
>> +
>> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
>> {
>> ERRP_GUARD();
>> @@ -278,7 +314,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>> container->bcontainer.dirty_pages_supported |=
>> - (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
>> + iommufd_hwpt_dirty_tracking(hwpt);
>> return true;
>> }
>>
>> @@ -717,6 +753,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
>> vioc->attach_device = iommufd_cdev_attach;
>> vioc->detach_device = iommufd_cdev_detach;
>> vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
>> + vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
>> };
>>
>> static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>> diff --git a/backends/trace-events b/backends/trace-events
>> index 4d8ac02fe7d6..28aca3b859d4 100644
>> --- a/backends/trace-events
>> +++ b/backends/trace-events
>> @@ -16,3 +16,4 @@ iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t si
>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
>> iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
>> +iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
>
* Re: [PATCH v4 09/12] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
2024-07-17 12:41 ` Joao Martins
@ 2024-07-17 13:34 ` Eric Auger
2024-07-17 15:18 ` Joao Martins
0 siblings, 1 reply; 82+ messages in thread
From: Eric Auger @ 2024-07-17 13:34 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 7/17/24 14:41, Joao Martins wrote:
> On 17/07/2024 13:36, Eric Auger wrote:
>>
>> On 7/12/24 13:47, Joao Martins wrote:
>>> ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
>>> enables or disables dirty page tracking. It is used if the hwpt
>>> has been created with dirty tracking supported domain (stored in
>>> hwpt::flags) and it is called on the whole list of iommu domains
>>> it is are tracking. On failure it rolls it back.
>> it is are tracking ?
> *it is*
>
>> also please clearly state what is "it"
>>
> sure
>
>>> The checking of hwpt::flags is introduced here as a second user
>> ?? -> introduce iommufd_hwpt_dirty_tracking() helper to avoid code dup?
> Right I am doing that already. Not sure what the problem is with this sentence?
"The checking of hwpt::flags is introduced here as a second user" phrasing sounds weird to me.
I guess you just meant that you need to check hwpt::flags in another
place, so you are better off introducing a helper.
>
> Or you meant to use yours as it's simpler/easier to understand.
>
>>> and thus consolidate such check into a helper function
>>> iommufd_hwpt_dirty_tracking().
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> include/sysemu/iommufd.h | 3 +++
>>> backends/iommufd.c | 23 +++++++++++++++++++++++
>>> hw/vfio/iommufd.c | 39 ++++++++++++++++++++++++++++++++++++++-
>>> backends/trace-events | 1 +
>>> 4 files changed, 65 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>> index e917e7591d05..7416d9219703 100644
>>> --- a/include/sysemu/iommufd.h
>>> +++ b/include/sysemu/iommufd.h
>>> @@ -55,6 +55,9 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>>> uint32_t data_type, uint32_t data_len,
>>> void *data_ptr, uint32_t *out_hwpt,
>>> Error **errp);
>>> +bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
>>> + bool start, Error **errp);
>>>
>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>> +
>> spurious line change
> ok
>
>>> #endif
>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>> index 41a9dec3b2c5..239f0976e0ad 100644
>>> --- a/backends/iommufd.c
>>> +++ b/backends/iommufd.c
>>> @@ -239,6 +239,29 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>>> return true;
>>> }
>>>
>>> +bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
>>> + uint32_t hwpt_id, bool start,
>>> + Error **errp)
>>> +{
>>> + int ret;
>>> + struct iommu_hwpt_set_dirty_tracking set_dirty = {
>>> + .size = sizeof(set_dirty),
>>> + .hwpt_id = hwpt_id,
>>> + .flags = !start ? 0 : IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
>>> + };
>>> +
>>> + ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
>>> + trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret ? errno : 0);
>>> + if (ret) {
>>> + error_setg_errno(errp, errno,
>>> + "IOMMU_HWPT_SET_DIRTY_TRACKING(hwpt_id %u) failed",
>>> + hwpt_id);
>>> + return false;
>>> + }
>>> +
>>> + return true;
>>> +}
>>> +
>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>>> uint32_t *type, void *data, uint32_t len,
>>> uint64_t *caps, Error **errp)
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index edc8f97d8f3d..da678315faeb 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -110,6 +110,42 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>> }
>>>
>>> +static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>> +{
>>> + return hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>> +}
>>> +
>>> +static int iommufd_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
>>> + bool start, Error **errp)
>>> +{
>>> + const VFIOIOMMUFDContainer *container =
>>> + container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>>> + VFIOIOASHwpt *hwpt;
>>> +
>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
>>> + continue;
>>> + }
>>> +
>>> + if (!iommufd_backend_set_dirty_tracking(container->be,
>>> + hwpt->hwpt_id, start, errp)) {
>>> + goto err;
>>> + }
>>> + }
>>> +
>>> + return 0;
>>> +
>>> +err:
>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
>>> + continue;
>>> + }
>>> + iommufd_backend_set_dirty_tracking(container->be,
>>> + hwpt->hwpt_id, !start, NULL);
>>> + }
>>> + return -EINVAL;
>>> +}
>>> +
>>> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
>>> {
>>> ERRP_GUARD();
>>> @@ -278,7 +314,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>> container->bcontainer.dirty_pages_supported |=
>>> - (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
>>> + iommufd_hwpt_dirty_tracking(hwpt);
>>> return true;
>>> }
>>>
>>> @@ -717,6 +753,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
>>> vioc->attach_device = iommufd_cdev_attach;
>>> vioc->detach_device = iommufd_cdev_detach;
>>> vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
>>> + vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
>>> };
>>>
>>> static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>> diff --git a/backends/trace-events b/backends/trace-events
>>> index 4d8ac02fe7d6..28aca3b859d4 100644
>>> --- a/backends/trace-events
>>> +++ b/backends/trace-events
>>> @@ -16,3 +16,4 @@ iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t si
>>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
>>> iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
>>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
>>> +iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
* Re: [PATCH v4 09/12] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
2024-07-17 13:34 ` Eric Auger
@ 2024-07-17 15:18 ` Joao Martins
0 siblings, 0 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-17 15:18 UTC (permalink / raw)
To: eric.auger, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 17/07/2024 14:34, Eric Auger wrote:
> On 7/17/24 14:41, Joao Martins wrote:
>> On 17/07/2024 13:36, Eric Auger wrote:
>>> On 7/12/24 13:47, Joao Martins wrote:
>>>> The checking of hwpt::flags is introduced here as a second user
>>> ?? -> introduce iommufd_hwpt_dirty_tracking() helper to avoid code dup?
>> Right I am doing that already. Not sure what the problem is with this sentence?
>
> "The checking of hwpt::flags is introduced here as a second user" phrasing sounds weird to me.
> I guess you just meant that you need to check hwpt::flags in another
> place, so you are better off introducing a helper.
>
Exactly.
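
For readers following the thread, the "apply to the whole list, roll back on
failure" behaviour described in the commit message can be sketched in
isolation. This is only an illustration under stated assumptions: the
demo_hwpt type, its fail_on_set knob and both demo_* functions are
hypothetical stand-ins, not QEMU or kernel APIs. Unlike the patch, which
walks the whole list again on error, this sketch reverts only the entries
that were already switched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hwpt entry; not the QEMU VFIOIOASHwpt type. */
struct demo_hwpt {
    bool supports_dirty;   /* created with dirty tracking support */
    bool tracking;         /* current tracking state */
    bool fail_on_set;      /* illustration knob: simulate an ioctl failure */
};

/* Hypothetical stand-in for the backend call that wraps the ioctl. */
static bool demo_set_tracking(struct demo_hwpt *h, bool start)
{
    if (h->fail_on_set) {
        return false;
    }
    h->tracking = start;
    return true;
}

/*
 * Apply 'start' to every capable entry. On the first failure, revert
 * the entries that were already switched and return -1.
 */
static int demo_set_tracking_all(struct demo_hwpt *list, size_t n, bool start)
{
    size_t i, done;

    assert(list != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (!list[i].supports_dirty) {
            continue;   /* same filter as the helper discussed above */
        }
        if (!demo_set_tracking(&list[i], start)) {
            goto err;
        }
    }
    return 0;

err:
    /* Only entries before the failing one were changed. */
    for (done = 0; done < i; done++) {
        if (list[done].supports_dirty) {
            demo_set_tracking(&list[done], !start);
        }
    }
    return -1;
}
```

The capability check appears in both loops, which is the duplication that a
single helper removes.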
* [PATCH v4 10/12] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
2024-07-12 11:46 [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (8 preceding siblings ...)
2024-07-12 11:47 ` [PATCH v4 09/12] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
@ 2024-07-12 11:47 ` Joao Martins
2024-07-16 12:31 ` Cédric Le Goater
` (2 more replies)
2024-07-12 11:47 ` [PATCH v4 11/12] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
` (2 subsequent siblings)
12 siblings, 3 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-12 11:47 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
that fetches the bitmap that tells what was dirty in an IOVA
range.
A single bitmap is allocated and shared across all the hwpts
sharing an IOAS; log_sync() then uses it to set the QEMU
global bitmaps.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/sysemu/iommufd.h | 4 ++++
backends/iommufd.c | 29 +++++++++++++++++++++++++++++
hw/vfio/iommufd.c | 27 +++++++++++++++++++++++++++
backends/trace-events | 1 +
4 files changed, 61 insertions(+)
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 7416d9219703..869ca8b7ef59 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -57,6 +57,10 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
Error **errp);
bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
bool start, Error **errp);
+bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
+ uint64_t iova, ram_addr_t size,
+ uint64_t page_size, uint64_t *data,
+ Error **errp);
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 239f0976e0ad..46be719cae71 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -262,6 +262,35 @@ bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
return true;
}
+bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be,
+ uint32_t hwpt_id,
+ uint64_t iova, ram_addr_t size,
+ uint64_t page_size, uint64_t *data,
+ Error **errp)
+{
+ int ret;
+ struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
+ .size = sizeof(get_dirty_bitmap),
+ .hwpt_id = hwpt_id,
+ .iova = iova,
+ .length = size,
+ .page_size = page_size,
+ .data = (uintptr_t)data,
+ };
+
+ ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
+ trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
+ page_size, ret ? errno : 0);
+ if (ret) {
+ error_setg_errno(errp, errno,
+ "IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"HWADDR_PRIx
+ " size: 0x%"HWADDR_PRIx") failed", iova, size);
+ return false;
+ }
+
+ return true;
+}
+
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index da678315faeb..1fd1558fa0c0 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -25,6 +25,7 @@
#include "qemu/cutils.h"
#include "qemu/chardev_open.h"
#include "pci.h"
+#include "exec/ram_addr.h"
static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
ram_addr_t size, void *vaddr, bool readonly)
@@ -146,6 +147,31 @@ err:
return -EINVAL;
}
+static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
+ VFIOBitmap *vbmap, hwaddr iova,
+ hwaddr size, Error **errp)
+{
+ VFIOIOMMUFDContainer *container = container_of(bcontainer,
+ VFIOIOMMUFDContainer,
+ bcontainer);
+ unsigned long page_size = qemu_real_host_page_size();
+ VFIOIOASHwpt *hwpt;
+
+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ if (!iommufd_hwpt_dirty_tracking(hwpt)) {
+ continue;
+ }
+
+ if (!iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
+ iova, size, page_size,
+ vbmap->bitmap, errp)) {
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
{
ERRP_GUARD();
@@ -754,6 +780,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
vioc->detach_device = iommufd_cdev_detach;
vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
+ vioc->query_dirty_bitmap = iommufd_query_dirty_bitmap;
};
static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
diff --git a/backends/trace-events b/backends/trace-events
index 28aca3b859d4..40811a316215 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -17,3 +17,4 @@ iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
+iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
--
2.17.2
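
As a side note on the arithmetic behind the bitmap passed to the ioctl: a
minimal sketch of how a caller might size it, assuming one bit per page of
page_size bytes and storage in whole 64-bit words. The demo_* helpers are
hypothetical and are not part of this series; overflow of size + page_size
is not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Pages covered by a range of 'size' bytes, rounding a partial page up. */
static uint64_t demo_bitmap_pages(uint64_t size, uint64_t page_size)
{
    /* page_size is assumed to be a non-zero power of two */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) / page_size;
}

/* Bytes for a one-bit-per-page bitmap, rounded up to 64-bit words. */
static uint64_t demo_bitmap_bytes(uint64_t size, uint64_t page_size)
{
    uint64_t pages = demo_bitmap_pages(size, page_size);
    uint64_t words = (pages + 63) / 64;

    return words * sizeof(uint64_t);
}
```

For example, a 1 GiB range with 4 KiB pages covers 262144 pages, which needs
4096 words, i.e. 32768 bytes of bitmap.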
* Re: [PATCH v4 10/12] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
2024-07-12 11:47 ` [PATCH v4 10/12] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
@ 2024-07-16 12:31 ` Cédric Le Goater
2024-07-16 12:53 ` Cédric Le Goater
2024-07-17 12:50 ` Eric Auger
2 siblings, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2024-07-16 12:31 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:47, Joao Martins wrote:
> ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
> that fetches the bitmap that tells what was dirty in an IOVA
> range.
>
> A single bitmap is allocated and used across all the hwpts
> sharing an IOAS which is then used in log_sync() to set Qemu
> global bitmaps.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> include/sysemu/iommufd.h | 4 ++++
> backends/iommufd.c | 29 +++++++++++++++++++++++++++++
> hw/vfio/iommufd.c | 27 +++++++++++++++++++++++++++
> backends/trace-events | 1 +
> 4 files changed, 61 insertions(+)
>
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 7416d9219703..869ca8b7ef59 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -57,6 +57,10 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> Error **errp);
> bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
> bool start, Error **errp);
> +bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
> + uint64_t iova, ram_addr_t size,
> + uint64_t page_size, uint64_t *data,
> + Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
>
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 239f0976e0ad..46be719cae71 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -262,6 +262,35 @@ bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
> return true;
> }
>
> +bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be,
> + uint32_t hwpt_id,
> + uint64_t iova, ram_addr_t size,
> + uint64_t page_size, uint64_t *data,
> + Error **errp)
> +{
> + int ret;
> + struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
> + .size = sizeof(get_dirty_bitmap),
> + .hwpt_id = hwpt_id,
> + .iova = iova,
> + .length = size,
> + .page_size = page_size,
> + .data = (uintptr_t)data,
> + };
> +
> + ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
> + trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
> + page_size, ret ? errno : 0);
> + if (ret) {
> + error_setg_errno(errp, errno,
> + "IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"HWADDR_PRIx
> + " size: 0x%"HWADDR_PRIx") failed", iova, size);
> + return false;
> + }
> +
> + return true;
> +}
> +
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index da678315faeb..1fd1558fa0c0 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -25,6 +25,7 @@
> #include "qemu/cutils.h"
> #include "qemu/chardev_open.h"
> #include "pci.h"
> +#include "exec/ram_addr.h"
>
> static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> ram_addr_t size, void *vaddr, bool readonly)
> @@ -146,6 +147,31 @@ err:
> return -EINVAL;
> }
>
> +static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> + VFIOBitmap *vbmap, hwaddr iova,
> + hwaddr size, Error **errp)
> +{
> + VFIOIOMMUFDContainer *container = container_of(bcontainer,
> + VFIOIOMMUFDContainer,
> + bcontainer);
> + unsigned long page_size = qemu_real_host_page_size();
> + VFIOIOASHwpt *hwpt;
> +
> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
> + continue;
> + }
> +
> + if (!iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
> + iova, size, page_size,
> + vbmap->bitmap, errp)) {
> + return -EINVAL;
> + }
> + }
> +
> + return 0;
> +}
> +
> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
> {
> ERRP_GUARD();
> @@ -754,6 +780,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
> vioc->detach_device = iommufd_cdev_detach;
> vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
> vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
> + vioc->query_dirty_bitmap = iommufd_query_dirty_bitmap;
> };
>
> static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> diff --git a/backends/trace-events b/backends/trace-events
> index 28aca3b859d4..40811a316215 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -17,3 +17,4 @@ iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
> iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
> iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
> +iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
* Re: [PATCH v4 10/12] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
2024-07-12 11:47 ` [PATCH v4 10/12] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
2024-07-16 12:31 ` Cédric Le Goater
@ 2024-07-16 12:53 ` Cédric Le Goater
2024-07-17 12:50 ` Eric Auger
2 siblings, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2024-07-16 12:53 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:47, Joao Martins wrote:
> ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
> that fetches the bitmap that tells what was dirty in an IOVA
> range.
>
> A single bitmap is allocated and used across all the hwpts
> sharing an IOAS which is then used in log_sync() to set Qemu
> global bitmaps.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/sysemu/iommufd.h | 4 ++++
> backends/iommufd.c | 29 +++++++++++++++++++++++++++++
> hw/vfio/iommufd.c | 27 +++++++++++++++++++++++++++
> backends/trace-events | 1 +
> 4 files changed, 61 insertions(+)
>
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 7416d9219703..869ca8b7ef59 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -57,6 +57,10 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> Error **errp);
> bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
> bool start, Error **errp);
> +bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
> + uint64_t iova, ram_addr_t size,
> + uint64_t page_size, uint64_t *data,
> + Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
>
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 239f0976e0ad..46be719cae71 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -262,6 +262,35 @@ bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
> return true;
> }
>
> +bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be,
> + uint32_t hwpt_id,
> + uint64_t iova, ram_addr_t size,
> + uint64_t page_size, uint64_t *data,
> + Error **errp)
> +{
> + int ret;
> + struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
> + .size = sizeof(get_dirty_bitmap),
> + .hwpt_id = hwpt_id,
> + .iova = iova,
> + .length = size,
> + .page_size = page_size,
> + .data = (uintptr_t)data,
> + };
> +
> + ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
> + trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
> + page_size, ret ? errno : 0);
> + if (ret) {
> + error_setg_errno(errp, errno,
> + "IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"HWADDR_PRIx
> + " size: 0x%"HWADDR_PRIx") failed", iova, size);
format should be:
" size: 0x"RAM_ADDR_FMT") failed", iova, size);
> + return false;
> + }
> +
> + return true;
> +}
> +
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index da678315faeb..1fd1558fa0c0 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -25,6 +25,7 @@
> #include "qemu/cutils.h"
> #include "qemu/chardev_open.h"
> #include "pci.h"
> +#include "exec/ram_addr.h"
>
> static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> ram_addr_t size, void *vaddr, bool readonly)
> @@ -146,6 +147,31 @@ err:
> return -EINVAL;
> }
>
> +static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> + VFIOBitmap *vbmap, hwaddr iova,
> + hwaddr size, Error **errp)
> +{
> + VFIOIOMMUFDContainer *container = container_of(bcontainer,
> + VFIOIOMMUFDContainer,
> + bcontainer);
> + unsigned long page_size = qemu_real_host_page_size();
> + VFIOIOASHwpt *hwpt;
> +
> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
> + continue;
> + }
> +
> + if (!iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
> + iova, size, page_size,
> + vbmap->bitmap, errp)) {
vbmap->bitmap needs a cast.
Thanks,
C.
> + return -EINVAL;
> + }
> + }
> +
> + return 0;
> +}
> +
> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
> {
> ERRP_GUARD();
> @@ -754,6 +780,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
> vioc->detach_device = iommufd_cdev_detach;
> vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
> vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
> + vioc->query_dirty_bitmap = iommufd_query_dirty_bitmap;
> };
>
> static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> diff --git a/backends/trace-events b/backends/trace-events
> index 28aca3b859d4..40811a316215 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -17,3 +17,4 @@ iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
> iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
> iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
> +iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
* Re: [PATCH v4 10/12] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
2024-07-12 11:47 ` [PATCH v4 10/12] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
2024-07-16 12:31 ` Cédric Le Goater
2024-07-16 12:53 ` Cédric Le Goater
@ 2024-07-17 12:50 ` Eric Auger
2 siblings, 0 replies; 82+ messages in thread
From: Eric Auger @ 2024-07-17 12:50 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:47, Joao Martins wrote:
> ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
> that fetches the bitmap that tells what was dirty in an IOVA
> range.
>
> A single bitmap is allocated and used across all the hwpts
> sharing an IOAS which is then used in log_sync() to set Qemu
> global bitmaps.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/sysemu/iommufd.h | 4 ++++
> backends/iommufd.c | 29 +++++++++++++++++++++++++++++
> hw/vfio/iommufd.c | 27 +++++++++++++++++++++++++++
> backends/trace-events | 1 +
> 4 files changed, 61 insertions(+)
>
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 7416d9219703..869ca8b7ef59 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -57,6 +57,10 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> Error **errp);
> bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
> bool start, Error **errp);
> +bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
> + uint64_t iova, ram_addr_t size,
> + uint64_t page_size, uint64_t *data,
> + Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
>
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 239f0976e0ad..46be719cae71 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -262,6 +262,35 @@ bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
> return true;
> }
>
> +bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be,
> + uint32_t hwpt_id,
> + uint64_t iova, ram_addr_t size,
> + uint64_t page_size, uint64_t *data,
> + Error **errp)
> +{
> + int ret;
> + struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
> + .size = sizeof(get_dirty_bitmap),
> + .hwpt_id = hwpt_id,
> + .iova = iova,
> + .length = size,
> + .page_size = page_size,
> + .data = (uintptr_t)data,
> + };
> +
> + ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
> + trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
> + page_size, ret ? errno : 0);
> + if (ret) {
> + error_setg_errno(errp, errno,
> + "IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"HWADDR_PRIx
> + " size: 0x%"HWADDR_PRIx") failed", iova, size);
> + return false;
> + }
> +
> + return true;
> +}
> +
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index da678315faeb..1fd1558fa0c0 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -25,6 +25,7 @@
> #include "qemu/cutils.h"
> #include "qemu/chardev_open.h"
> #include "pci.h"
> +#include "exec/ram_addr.h"
>
> static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> ram_addr_t size, void *vaddr, bool readonly)
> @@ -146,6 +147,31 @@ err:
> return -EINVAL;
> }
>
> +static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> + VFIOBitmap *vbmap, hwaddr iova,
> + hwaddr size, Error **errp)
> +{
> + VFIOIOMMUFDContainer *container = container_of(bcontainer,
> + VFIOIOMMUFDContainer,
> + bcontainer);
> + unsigned long page_size = qemu_real_host_page_size();
> + VFIOIOASHwpt *hwpt;
> +
> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
> + continue;
> + }
> +
> + if (!iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
> + iova, size, page_size,
> + vbmap->bitmap, errp)) {
> + return -EINVAL;
> + }
> + }
> +
> + return 0;
> +}
> +
> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
> {
> ERRP_GUARD();
> @@ -754,6 +780,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
> vioc->detach_device = iommufd_cdev_detach;
> vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
> vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
> + vioc->query_dirty_bitmap = iommufd_query_dirty_bitmap;
> };
>
> static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> diff --git a/backends/trace-events b/backends/trace-events
> index 28aca3b859d4..40811a316215 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -17,3 +17,4 @@ iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
> iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
> iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
> +iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
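
A side note on sizing the buffer passed as 'data' above: the following is a minimal sketch, not QEMU or kernel code. It assumes the usual layout of one bit per page of page_size bytes, stored in an array of 64-bit words (as the uint64_t *data parameter suggests), and that page_size is non-zero. The helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Number of bitmap bits needed to cover `size` bytes, assuming one bit
 * per page of `page_size` bytes. `page_size` must be non-zero.
 */
uint64_t sketch_dirty_bitmap_bits(uint64_t size, uint64_t page_size)
{
    return (size + page_size - 1) / page_size;
}

/* Bytes needed for that bitmap, rounded up to whole 64-bit words. */
uint64_t sketch_dirty_bitmap_bytes(uint64_t size, uint64_t page_size)
{
    uint64_t bits = sketch_dirty_bitmap_bits(size, page_size);

    return ((bits + 63) / 64) * sizeof(uint64_t);
}
```

Under these assumptions, a 2 MiB range with 4 KiB pages needs 512 bits, i.e. 64 bytes.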
^ permalink raw reply [flat|nested] 82+ messages in thread
* [PATCH v4 11/12] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-12 11:46 [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (9 preceding siblings ...)
2024-07-12 11:47 ` [PATCH v4 10/12] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
@ 2024-07-12 11:47 ` Joao Martins
2024-07-17 2:38 ` Duan, Zhenzhong
2024-07-17 12:57 ` Eric Auger
2024-07-12 11:47 ` [PATCH v4 12/12] vfio/common: Allow disabling device dirty page tracking Joao Martins
2024-07-16 8:20 ` [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Duan, Zhenzhong
12 siblings, 2 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-12 11:47 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
By default VFIO migration is set to auto, which will support live
migration if the migration capability is set *and* also dirty page
tracking is supported.
One can force-enable migration without dirty page tracking via
enable-migration=on, but that option is generally left for testing
purposes.
With IOMMU dirty tracking available, it can be used to accommodate the lack
of VF dirty page tracking. This minimizes the VF requirements for migration
and thus enables migration by default for those devices too.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
hw/vfio/migration.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 34d4be2ce1b1..ce3d1b6e9a25 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -1036,7 +1036,8 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
return !vfio_block_migration(vbasedev, err, errp);
}
- if (!vbasedev->dirty_pages_supported) {
+ if (!vbasedev->dirty_pages_supported &&
+ !vbasedev->bcontainer->dirty_pages_supported) {
if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
error_setg(&err,
"%s: VFIO device doesn't support device dirty tracking",
--
2.17.2
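
For illustration only, the condition after this patch can be modelled as the small truth function below. This is a sketch, not QEMU code: the function name and its boolean parameters are hypothetical, and it covers only the check touched by the hunk above (in "auto" mode a missing tracker blocks migration, while with enable-migration=on it only produces a warning).

```c
#include <stdbool.h>

/*
 * Hypothetical model of the check in vfio_migration_realize() after this
 * patch. Returns true when migration would be blocked because neither
 * the device nor the container (IOMMU) can track dirty pages.
 *
 *   auto_mode    - enable-migration is left at "auto"
 *   device_dirty - stands in for vbasedev->dirty_pages_supported
 *   iommu_dirty  - stands in for bcontainer->dirty_pages_supported
 */
bool sketch_blocks_migration(bool auto_mode, bool device_dirty,
                             bool iommu_dirty)
{
    if (device_dirty || iommu_dirty) {
        return false;   /* at least one tracker: this check passes */
    }

    /* No tracker at all: "auto" blocks, forced "on" only warns. */
    return auto_mode;
}
```

Before the patch, the iommu_dirty input did not exist, so a device without its own tracker was always blocked in "auto" mode.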
^ permalink raw reply related [flat|nested] 82+ messages in thread
* RE: [PATCH v4 11/12] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-12 11:47 ` [PATCH v4 11/12] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
@ 2024-07-17 2:38 ` Duan, Zhenzhong
2024-07-17 9:20 ` Joao Martins
2024-07-17 12:57 ` Eric Auger
1 sibling, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2024-07-17 2:38 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v4 11/12] vfio/migration: Don't block migration device dirty
>tracking is unsupported
>
>By default VFIO migration is set to auto, which will support live
>migration if the migration capability is set *and* also dirty page
>tracking is supported.
>
>For testing purposes one can force enable without dirty page tracking
>via enable-migration=on, but that option is generally left for testing
>purposes.
>
>So starting with IOMMU dirty tracking it can use to acomodate the lack of
>VF dirty page tracking allowing us to minimize the VF requirements for
>migration and thus enabling migration by default for those too.
>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>---
> hw/vfio/migration.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>index 34d4be2ce1b1..ce3d1b6e9a25 100644
>--- a/hw/vfio/migration.c
>+++ b/hw/vfio/migration.c
>@@ -1036,7 +1036,8 @@ bool vfio_migration_realize(VFIODevice
>*vbasedev, Error **errp)
> return !vfio_block_migration(vbasedev, err, errp);
> }
>
>- if (!vbasedev->dirty_pages_supported) {
>+ if (!vbasedev->dirty_pages_supported &&
>+ !vbasedev->bcontainer->dirty_pages_supported) {
> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
> error_setg(&err,
> "%s: VFIO device doesn't support device dirty tracking",
I'm not sure whether this message needs to be updated, e.g. "VFIO device doesn't support device and IOMMU dirty tracking".
Same for the below:
warn_report("%s: VFIO device doesn't support device dirty tracking"
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 11/12] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-17 2:38 ` Duan, Zhenzhong
@ 2024-07-17 9:20 ` Joao Martins
2024-07-17 15:35 ` Joao Martins
2024-07-18 7:20 ` Duan, Zhenzhong
0 siblings, 2 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-17 9:20 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 17/07/2024 03:38, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: [PATCH v4 11/12] vfio/migration: Don't block migration device dirty
>> tracking is unsupported
>>
>> By default VFIO migration is set to auto, which will support live
>> migration if the migration capability is set *and* also dirty page
>> tracking is supported.
>>
>> For testing purposes one can force enable without dirty page tracking
>> via enable-migration=on, but that option is generally left for testing
>> purposes.
>>
>> So starting with IOMMU dirty tracking it can use to acomodate the lack of
>> VF dirty page tracking allowing us to minimize the VF requirements for
>> migration and thus enabling migration by default for those too.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> hw/vfio/migration.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 34d4be2ce1b1..ce3d1b6e9a25 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -1036,7 +1036,8 @@ bool vfio_migration_realize(VFIODevice
>> *vbasedev, Error **errp)
>> return !vfio_block_migration(vbasedev, err, errp);
>> }
>>
>> - if (!vbasedev->dirty_pages_supported) {
>> + if (!vbasedev->dirty_pages_supported &&
>> + !vbasedev->bcontainer->dirty_pages_supported) {
>> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
>> error_setg(&err,
>> "%s: VFIO device doesn't support device dirty tracking",
>
> I'm not sure if this message needs to be updated, " VFIO device doesn't support device and IOMMU dirty tracking"
>
> Same for the below:
>
> warn_report("%s: VFIO device doesn't support device dirty tracking"
Ah yes, good catch. Additionally I think I should check device hwpt rather than
container::dirty_pages_supported i.e.
if (!vbasedev->dirty_pages_supported &&
(vbasedev->hwpt && !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)))
This makes sure that migration is blocked more accurately.
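
To make the behaviour of the per-device condition above easier to see, here is a minimal self-contained sketch. The structures and names are simplified, hypothetical stand-ins rather than QEMU's real types; the dirty_tracking flag is assumed to reflect whether the HWPT was allocated with dirty tracking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not QEMU's real structures. */
typedef struct SketchHwpt {
    bool dirty_tracking;        /* assumed: HWPT allocated with dirty tracking */
} SketchHwpt;

typedef struct SketchDevice {
    bool dirty_pages_supported; /* device (VF) has its own dirty tracker */
    const SketchHwpt *hwpt;     /* NULL when the device has no HWPT */
} SketchDevice;

/*
 * Mirrors the shape of the proposed condition:
 *   !dev->dirty_pages_supported && (dev->hwpt && !hwpt_dirty(dev->hwpt))
 * A true result means "no usable dirty tracker was found".
 */
bool sketch_lacks_dirty_tracking(const SketchDevice *dev)
{
    return !dev->dirty_pages_supported &&
           (dev->hwpt != NULL && !dev->hwpt->dirty_tracking);
}
```

Note that, as the expression is written, a device with no HWPT at all never satisfies the condition, so whether that case should block migration has to be decided separately.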
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 11/12] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-17 9:20 ` Joao Martins
@ 2024-07-17 15:35 ` Joao Martins
2024-07-17 16:02 ` Joao Martins
2024-07-18 7:20 ` Duan, Zhenzhong
1 sibling, 1 reply; 82+ messages in thread
From: Joao Martins @ 2024-07-17 15:35 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 17/07/2024 10:20, Joao Martins wrote:
> On 17/07/2024 03:38, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: [PATCH v4 11/12] vfio/migration: Don't block migration device dirty
>>> tracking is unsupported
>>>
>>> By default VFIO migration is set to auto, which will support live
>>> migration if the migration capability is set *and* also dirty page
>>> tracking is supported.
>>>
>>> For testing purposes one can force enable without dirty page tracking
>>> via enable-migration=on, but that option is generally left for testing
>>> purposes.
>>>
>>> So starting with IOMMU dirty tracking it can use to acomodate the lack of
>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>> migration and thus enabling migration by default for those too.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> hw/vfio/migration.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index 34d4be2ce1b1..ce3d1b6e9a25 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -1036,7 +1036,8 @@ bool vfio_migration_realize(VFIODevice
>>> *vbasedev, Error **errp)
>>> return !vfio_block_migration(vbasedev, err, errp);
>>> }
>>>
>>> - if (!vbasedev->dirty_pages_supported) {
>>> + if (!vbasedev->dirty_pages_supported &&
>>> + !vbasedev->bcontainer->dirty_pages_supported) {
>>> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
>>> error_setg(&err,
>>> "%s: VFIO device doesn't support device dirty tracking",
>>
>> I'm not sure if this message needs to be updated, " VFIO device doesn't support device and IOMMU dirty tracking"
>>
>> Same for the below:
>>
>> warn_report("%s: VFIO device doesn't support device dirty tracking"
>
>
> Ah yes, good catch. Additionally I think I should check device hwpt rather than
> container::dirty_pages_supported i.e.
>
> if (!vbasedev->dirty_pages_supported &&
> (vbasedev->hwpt && !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)))
>
> This makes sure that migration is blocked with more accuracy
I retract this comment as I think it can all be easily detected by not OR-ing
the setting of vbasedev->bcontainer->dirty_pages_supported. I should put a
warn_report_once() there.
Joao
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 11/12] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-17 15:35 ` Joao Martins
@ 2024-07-17 16:02 ` Joao Martins
2024-07-17 16:54 ` Joao Martins
0 siblings, 1 reply; 82+ messages in thread
From: Joao Martins @ 2024-07-17 16:02 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 17/07/2024 16:35, Joao Martins wrote:
> On 17/07/2024 10:20, Joao Martins wrote:
>> On 17/07/2024 03:38, Duan, Zhenzhong wrote:
>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>> index 34d4be2ce1b1..ce3d1b6e9a25 100644
>>>> --- a/hw/vfio/migration.c
>>>> +++ b/hw/vfio/migration.c
>>>> @@ -1036,7 +1036,8 @@ bool vfio_migration_realize(VFIODevice
>>>> *vbasedev, Error **errp)
>>>> return !vfio_block_migration(vbasedev, err, errp);
>>>> }
>>>>
>>>> - if (!vbasedev->dirty_pages_supported) {
>>>> + if (!vbasedev->dirty_pages_supported &&
>>>> + !vbasedev->bcontainer->dirty_pages_supported) {
>>>> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
>>>> error_setg(&err,
>>>> "%s: VFIO device doesn't support device dirty tracking",
>>>
>>> I'm not sure if this message needs to be updated, " VFIO device doesn't support device and IOMMU dirty tracking"
>>>
>>> Same for the below:
>>>
>>> warn_report("%s: VFIO device doesn't support device dirty tracking"
>>
>>
>> Ah yes, good catch. Additionally I think I should check device hwpt rather than
>> container::dirty_pages_supported i.e.
>>
>> if (!vbasedev->dirty_pages_supported &&
>> (vbasedev->hwpt && !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)))
>>
>> This makes sure that migration is blocked with more accuracy
>
> I retract this comment as I think it can all be easily detected by not OR-ing
> the setting of vbasedev->bcontainer->dirty_pages_supported. I should put a
> warn_report_once() there.
Something like this below.
To be clear: this is mostly a safeguard against a theoretical case that we don't
know exists. For example, on x86 this is homogeneous, and I suspect that is the
case for server ARM too. Embedded ARM might be different, as there are so many
incarnations of it.
@@ -267,6 +282,13 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
vbasedev->hwpt = hwpt;
QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
+
+ if (container->bcontainer.dirty_pages_supported &&
+ !iommufd_hwpt_dirty_tracking(hwpt)) {
+ warn_report("%s: IOMMU dirty tracking not supported\n", vbasedev->name);
+ }
+ container->bcontainer.dirty_pages_supported =
+ iommufd_hwpt_dirty_tracking(hwpt);
return true;
}
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 11/12] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-17 16:02 ` Joao Martins
@ 2024-07-17 16:54 ` Joao Martins
0 siblings, 0 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-17 16:54 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 17/07/2024 17:02, Joao Martins wrote:
> On 17/07/2024 16:35, Joao Martins wrote:
>> On 17/07/2024 10:20, Joao Martins wrote:
>>> On 17/07/2024 03:38, Duan, Zhenzhong wrote:
>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>>> index 34d4be2ce1b1..ce3d1b6e9a25 100644
>>>>> --- a/hw/vfio/migration.c
>>>>> +++ b/hw/vfio/migration.c
>>>>> @@ -1036,7 +1036,8 @@ bool vfio_migration_realize(VFIODevice
>>>>> *vbasedev, Error **errp)
>>>>> return !vfio_block_migration(vbasedev, err, errp);
>>>>> }
>>>>>
>>>>> - if (!vbasedev->dirty_pages_supported) {
>>>>> + if (!vbasedev->dirty_pages_supported &&
>>>>> + !vbasedev->bcontainer->dirty_pages_supported) {
>>>>> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
>>>>> error_setg(&err,
>>>>> "%s: VFIO device doesn't support device dirty tracking",
>>>>
>>>> I'm not sure if this message needs to be updated, " VFIO device doesn't support device and IOMMU dirty tracking"
>>>>
>>>> Same for the below:
>>>>
>>>> warn_report("%s: VFIO device doesn't support device dirty tracking"
>>>
>>>
>>> Ah yes, good catch. Additionally I think I should check device hwpt rather than
>>> container::dirty_pages_supported i.e.
>>>
>>> if (!vbasedev->dirty_pages_supported &&
>>> (vbasedev->hwpt && !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)))
>>>
>>> This makes sure that migration is blocked with more accuracy
>>
>> I retract this comment as I think it can all be easily detected by not OR-ing
>> the setting of vbasedev->bcontainer->dirty_pages_supported. I should put a
>> warn_report_once() there.
>
> Something like this below.
>
> To be clear: this is mostly a safe guard against a theoretic case that we don't
> know it exists. For example on x86, this is homogeneous and I suspect server ARM
> to be the case too. embedded ARM might be different as there's so many
> incantations of it.
>
Except that it won't work with hotplug :( so the previous snip was actually a
bit better.
> @@ -267,6 +282,13 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> vbasedev->hwpt = hwpt;
> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> +
> + if (container->bcontainer.dirty_pages_supported &&
> + !iommufd_hwpt_dirty_tracking(hwpt)) {
> + warn_report("%s: IOMMU dirty tracking not supported\n", vbasedev->name);
> + }
> + container->bcontainer.dirty_pages_supported =
> + iommufd_hwpt_dirty_tracking(hwpt);
> return true;
> }
>
>
^ permalink raw reply [flat|nested] 82+ messages in thread
* RE: [PATCH v4 11/12] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-17 9:20 ` Joao Martins
2024-07-17 15:35 ` Joao Martins
@ 2024-07-18 7:20 ` Duan, Zhenzhong
2024-07-18 9:05 ` Joao Martins
1 sibling, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2024-07-18 7:20 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v4 11/12] vfio/migration: Don't block migration device
>dirty tracking is unsupported
>
>On 17/07/2024 03:38, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: [PATCH v4 11/12] vfio/migration: Don't block migration device
>dirty
>>> tracking is unsupported
>>>
>>> By default VFIO migration is set to auto, which will support live
>>> migration if the migration capability is set *and* also dirty page
>>> tracking is supported.
>>>
>>> For testing purposes one can force enable without dirty page tracking
>>> via enable-migration=on, but that option is generally left for testing
>>> purposes.
>>>
>>> So starting with IOMMU dirty tracking it can use to acomodate the lack of
>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>> migration and thus enabling migration by default for those too.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> hw/vfio/migration.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index 34d4be2ce1b1..ce3d1b6e9a25 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -1036,7 +1036,8 @@ bool vfio_migration_realize(VFIODevice
>>> *vbasedev, Error **errp)
>>> return !vfio_block_migration(vbasedev, err, errp);
>>> }
>>>
>>> - if (!vbasedev->dirty_pages_supported) {
>>> + if (!vbasedev->dirty_pages_supported &&
>>> + !vbasedev->bcontainer->dirty_pages_supported) {
>>> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
>>> error_setg(&err,
>>> "%s: VFIO device doesn't support device dirty tracking",
>>
>> I'm not sure if this message needs to be updated, " VFIO device doesn't
>support device and IOMMU dirty tracking"
>>
>> Same for the below:
>>
>> warn_report("%s: VFIO device doesn't support device dirty tracking"
>
>
>Ah yes, good catch. Additionally I think I should check device hwpt rather
>than
>container::dirty_pages_supported i.e.
>
>if (!vbasedev->dirty_pages_supported &&
> (vbasedev->hwpt && !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)))
>
>This makes sure that migration is blocked with more accuracy
Yes, this is better. It looks like bcontainer->dirty_pages_supported is not as accurate as in the legacy VFIO days.
Thanks
Zhenzhong
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 11/12] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-18 7:20 ` Duan, Zhenzhong
@ 2024-07-18 9:05 ` Joao Martins
0 siblings, 0 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-18 9:05 UTC (permalink / raw)
To: Duan, Zhenzhong
Cc: qemu-devel@nongnu.org, Liu, Yi L, Eric Auger, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon
On 18/07/2024 08:20, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: Re: [PATCH v4 11/12] vfio/migration: Don't block migration device
>> dirty tracking is unsupported
>>
>> On 17/07/2024 03:38, Duan, Zhenzhong wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>> Subject: [PATCH v4 11/12] vfio/migration: Don't block migration device
>> dirty
>>>> tracking is unsupported
>>>>
>>>> By default VFIO migration is set to auto, which will support live
>>>> migration if the migration capability is set *and* also dirty page
>>>> tracking is supported.
>>>>
>>>> For testing purposes one can force enable without dirty page tracking
>>>> via enable-migration=on, but that option is generally left for testing
>>>> purposes.
>>>>
>>>> So starting with IOMMU dirty tracking it can use to acomodate the lack of
>>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>>> migration and thus enabling migration by default for those too.
>>>>
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> ---
>>>> hw/vfio/migration.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>> index 34d4be2ce1b1..ce3d1b6e9a25 100644
>>>> --- a/hw/vfio/migration.c
>>>> +++ b/hw/vfio/migration.c
>>>> @@ -1036,7 +1036,8 @@ bool vfio_migration_realize(VFIODevice
>>>> *vbasedev, Error **errp)
>>>> return !vfio_block_migration(vbasedev, err, errp);
>>>> }
>>>>
>>>> - if (!vbasedev->dirty_pages_supported) {
>>>> + if (!vbasedev->dirty_pages_supported &&
>>>> + !vbasedev->bcontainer->dirty_pages_supported) {
>>>> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
>>>> error_setg(&err,
>>>> "%s: VFIO device doesn't support device dirty tracking",
>>>
>>> I'm not sure if this message needs to be updated, " VFIO device doesn't
>> support device and IOMMU dirty tracking"
>>>
>>> Same for the below:
>>>
>>> warn_report("%s: VFIO device doesn't support device dirty tracking"
>>
>>
>> Ah yes, good catch. Additionally I think I should check device hwpt rather
>> than
>> container::dirty_pages_supported i.e.
>>
>> if (!vbasedev->dirty_pages_supported &&
>> (vbasedev->hwpt && !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)))
>>
>> This makes sure that migration is blocked with more accuracy
>
> Yes, this is better. Looks bcontainer->dirty_pages_supported is not as accurate as in legacy VFIO days.
>
Heh, that's just because legacy always marks it true (and marks everything as
dirty) regardless of what the hardware does :)
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 11/12] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-12 11:47 ` [PATCH v4 11/12] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
2024-07-17 2:38 ` Duan, Zhenzhong
@ 2024-07-17 12:57 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Eric Auger @ 2024-07-17 12:57 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 7/12/24 13:47, Joao Martins wrote:
> By default VFIO migration is set to auto, which will support live
> migration if the migration capability is set *and* also dirty page
> tracking is supported.
>
> For testing purposes one can force enable without dirty page tracking
> via enable-migration=on, but that option is generally left for testing
> purposes.
>
> So starting with IOMMU dirty tracking it can use to acomodate the lack of
accommodate
Eric
> VF dirty page tracking allowing us to minimize the VF requirements for
> migration and thus enabling migration by default for those too.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> hw/vfio/migration.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 34d4be2ce1b1..ce3d1b6e9a25 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -1036,7 +1036,8 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> return !vfio_block_migration(vbasedev, err, errp);
> }
>
> - if (!vbasedev->dirty_pages_supported) {
> + if (!vbasedev->dirty_pages_supported &&
> + !vbasedev->bcontainer->dirty_pages_supported) {
> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
> error_setg(&err,
> "%s: VFIO device doesn't support device dirty tracking",
^ permalink raw reply [flat|nested] 82+ messages in thread
* [PATCH v4 12/12] vfio/common: Allow disabling device dirty page tracking
2024-07-12 11:46 [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (10 preceding siblings ...)
2024-07-12 11:47 ` [PATCH v4 11/12] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
@ 2024-07-12 11:47 ` Joao Martins
2024-07-16 8:20 ` [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Duan, Zhenzhong
12 siblings, 0 replies; 82+ messages in thread
From: Joao Martins @ 2024-07-12 11:47 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
The property 'x-pre-copy-dirty-page-tracking' allows disabling dirty page
tracking during the whole VF pre-copy phase, though that means dirty
tracking will only be used at the start of the switchover phase.
Add an option that disables the VF dirty page tracking and falls
back to container-based dirty page tracking. This also allows
using IOMMU dirty tracking even on VFs with their own dirty
tracker scheme.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/hw/vfio/vfio-common.h | 1 +
hw/vfio/common.c | 3 +++
hw/vfio/migration.c | 3 ++-
hw/vfio/pci.c | 3 +++
4 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 760f31d84ac8..0ed4ebfb4696 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -138,6 +138,7 @@ typedef struct VFIODevice {
VFIOMigration *migration;
Error *migration_blocker;
OnOffAuto pre_copy_dirty_page_tracking;
+ OnOffAuto device_dirty_page_tracking;
bool dirty_pages_supported;
bool dirty_tracking;
HostIOMMUDevice *hiod;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index cc14f0e3fe24..070a4a2df020 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -199,6 +199,9 @@ bool vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer)
VFIODevice *vbasedev;
QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
+ if (vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) {
+ return false;
+ }
if (!vbasedev->dirty_pages_supported) {
return false;
}
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index ce3d1b6e9a25..9b41db367629 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -1036,7 +1036,8 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
return !vfio_block_migration(vbasedev, err, errp);
}
- if (!vbasedev->dirty_pages_supported &&
+ if ((!vbasedev->dirty_pages_supported ||
+ vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
!vbasedev->bcontainer->dirty_pages_supported) {
if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
error_setg(&err,
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 3fc72e898a25..5794d72bd3ab 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3361,6 +3361,9 @@ static Property vfio_pci_dev_properties[] = {
DEFINE_PROP_ON_OFF_AUTO("x-pre-copy-dirty-page-tracking", VFIOPCIDevice,
vbasedev.pre_copy_dirty_page_tracking,
ON_OFF_AUTO_ON),
+ DEFINE_PROP_ON_OFF_AUTO("x-device-dirty-page-tracking", VFIOPCIDevice,
+ vbasedev.device_dirty_page_tracking,
+ ON_OFF_AUTO_ON),
DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
display, ON_OFF_AUTO_OFF),
DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
--
2.17.2
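
As a rough model of the hw/vfio/common.c hunk above: the sketch below shows how a single device with x-device-dirty-page-tracking=off makes the container stop relying on device dirty tracking. The types and names are hypothetical stand-ins, and returning true after the loop is an assumption, since the end of the real function is not visible in the diff context.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's OnOffAuto. */
typedef enum { SKETCH_AUTO, SKETCH_ON, SKETCH_OFF } SketchOnOffAuto;

/* Hypothetical stand-in for the relevant VFIODevice fields. */
typedef struct SketchDev {
    SketchOnOffAuto device_dirty_page_tracking;
    bool dirty_pages_supported;
} SketchDev;

/*
 * Device dirty tracking is used only if every device both supports it
 * and has not had it disabled through the new property.
 */
bool sketch_all_device_dirty_tracking(const SketchDev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].device_dirty_page_tracking == SKETCH_OFF) {
            return false;   /* user opted out for this device */
        }
        if (!devs[i].dirty_pages_supported) {
            return false;   /* device has no tracker of its own */
        }
    }
    return true;            /* assumed behaviour after the loop */
}
```

When this returns false, the caller is expected to fall back to container-based (for example IOMMU) dirty tracking, as the commit message describes.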
^ permalink raw reply related [flat|nested] 82+ messages in thread
* RE: [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking
2024-07-12 11:46 [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (11 preceding siblings ...)
2024-07-12 11:47 ` [PATCH v4 12/12] vfio/common: Allow disabling device dirty page tracking Joao Martins
@ 2024-07-16 8:20 ` Duan, Zhenzhong
2024-07-16 9:22 ` Joao Martins
12 siblings, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2024-07-16 8:20 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking
>
>This small series adds support for IOMMU dirty tracking via the
>IOMMUFD backend. The hardware capability is available on most recent x86
>hardware. The series is organized as follows:
>
>* Patch 1-2: Fixes a regression in mdev support with IOMMUFD. This
> one is independent of the series but I happened to come across it
> while testing mdev with this series
I guess VFIO ap/ccw may need fixes too.
Will you help with that? Or I can take it if you want to focus on dirty tracking.
The fix may be trivial: just assign VFIODevice->mdev = true.
Thanks
Zhenzhong
>
>* Patch 3: Adds support for capabilities to iommufd_get_device_info()
>
>* Patches 4 - 10: IOMMUFD backend support for dirty tracking;
>
>Introduce auto domains -- Patch 5 goes into more detail, but the gist is that
>we will find and attach a device to a compatible IOMMU domain, or allocate
>a new
>hardware pagetable *or* rely on kernel IOAS attach (for mdevs). Afterwards
>the
>workflow is relatively simple:
>
>1) Probe device and allow dirty tracking in the HWPT
>2) Toggling dirty tracking on/off
>3) Read-and-clear of Dirty IOVAs
>
>The heuristics selected for (1) were to always request the HWPT for
>dirty tracking if supported, or rely on device dirty page tracking. This
>is a little simplistic and we aren't necessarily utilizing IOMMU dirty
>tracking even if we ask during hwpt allocation.
>
>The unmap case is deferred until further vIOMMU support with migration
>is added[3] which will then introduce the usage of
>IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl
>in the
>dma unmap bitmap flow.
>
>* Patches 11-12: Don't block live migration where there's no VF dirty
>tracker, considering that we have IOMMU dirty tracking.
>
>Comments and feedback appreciated. Thanks for all the review thus far!
>
>Cheers,
> Joao
>
>P.S. Suggest linux-next (or future v6.11) as hypervisor kernel as there's
>some bugs fixed there with regards to IOMMU hugepage dirty tracking.
>
>Changes since v3[5]:
>* Skip HostIOMMUDevice::realize for mdev, and introduce a helper to check
> if the VFIO device is mdev. (Zhenzhong)
>* Skip setting IOMMU device for mdev (Zhenzhong)
>* Add Zhenzhong review tag in patch 3
>* Utilize vbasedev::bcontainer::dirty_pages_supported instead of introducing
> a new HostIOMMUDevice capability and thus remove the cap patch from
> the series (Zhenzhong)
>* Move the HostIOMMUDevice::realize() to be part of VFIODevice
> initialization in attach_device() while skipping it altogether for mdev.
> (Cedric)
>* Due to the previous item, had to remove aw_bits because it depends on
> device attach being finished; instead defer it to when get_cap() gets
> called.
>* Skip auto domains for mdev instead of purposely erroring out (Zhenzhong)
>* Pass errp in all cases, and instead just free the error in case of -EINVAL
> in most patches, and also pass Error* in iommufd_backend_alloc_hwpt() and
> set/query dirty. This is made better thanks in part to skipping auto
> domains for mdev (Cedric)
>
>Changes since RFCv2[4]:
>* Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if
> we end up not actually toggling dirty tracking. (Avihai)
>* Fix error handling widely in auto domains logic and all patches (Avihai)
>* Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong)
>* New patches 1 and 2 taking into consideration previous comments.
>* Store hwpt::flags to know if we have dirty tracking (Avihai)
>* New patch 8, which allows querying dirty tracking support after
> provisioning. This is a cleaner way to check IOMMU dirty tracking support
> when vfio::migration is initialized, as opposed to RFCv2 via device caps.
> The device caps way is still used because at vfio attach we aren't yet with
> a fully initialized migration state.
>* Adopt error propagation in query,set dirty tracking
>* Misc improvements overall broadly and Avihai
>* Drop hugepages as it's a bit unrelated; I can pursue that patch
> separately. The main motivation is to provide a way to test
> without hugepages similar to what vfio_type1_iommu.disable_hugepages=1
> does.
>
>Changes since RFCv1[2]:
>* Remove intel/amd dirty tracking emulation enabling
>* Remove the dirtyrate improvement for VF/IOMMU dirty tracking
>[Will pursue these two in separate series]
>* Introduce auto domains support
>* Enforce dirty tracking following the IOMMUFD UAPI for this
>* Add support for toggling hugepages in IOMMUFD
>* Auto enable support when VF supports migration to use IOMMU
>when it doesn't have VF dirty tracking
>* Add a parameter to toggle VF dirty tracking
>
>[0] https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
>[1] https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.duan@intel.com/
>[2] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
>[3] https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/
>[4] https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
>[5] https://lore.kernel.org/qemu-devel/20240708143420.16953-1-joao.m.martins@oracle.com/
>
>Joao Martins (12):
> vfio/pci: Extract mdev check into an helper
> vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev
> backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW
> capabilities
> vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
> vfio/iommufd: Introduce auto domain creation
> vfio/{iommufd,container}: Remove caps::aw_bits
> vfio/{iommufd,container}: Initialize HostIOMMUDeviceCaps during
> attach_device()
> vfio/iommufd: Probe and request hwpt dirty tracking capability
> vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
> vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
> vfio/migration: Don't block migration device dirty tracking is
> unsupported
> vfio/common: Allow disabling device dirty page tracking
>
> include/hw/vfio/vfio-common.h | 13 +++
> include/sysemu/host_iommu_device.h | 2 +-
> include/sysemu/iommufd.h | 14 ++-
> backends/iommufd.c | 89 ++++++++++++++-
> hw/vfio/common.c | 17 +--
> hw/vfio/container.c | 11 +-
> hw/vfio/helpers.c | 18 +++
> hw/vfio/iommufd.c | 178 ++++++++++++++++++++++++++++-
> hw/vfio/migration.c | 4 +-
> hw/vfio/pci.c | 22 ++--
> backends/trace-events | 3 +
> 11 files changed, 339 insertions(+), 32 deletions(-)
>
>--
>2.17.2
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking
2024-07-16 8:20 ` [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Duan, Zhenzhong
@ 2024-07-16 9:22 ` Joao Martins
2024-07-18 7:50 ` Duan, Zhenzhong
0 siblings, 1 reply; 82+ messages in thread
From: Joao Martins @ 2024-07-16 9:22 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org, Cedric Le Goater
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Jason Gunthorpe,
Avihai Horon
On 16/07/2024 09:20, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking
>>
>> This small series adds IOMMU dirty tracking support via the
>> IOMMUFD backend. The hardware capability is available on most recent x86
>> hardware. The series is organized as follows:
>>
>> * Patches 1-2: Fix a regression in mdev support with IOMMUFD. This
>> one is independent of the series but happened to cross it
>> while testing mdev with this series
>
> I guess VFIO ap/ccw may need fixes too.
> Will you help with that? Or I can take it if you want to focus on dirty tracking.
> The fix may be trivial: just assign VFIODevice->mdev = true.
>
If you have something in mind already by all means go ahead.
But from the code are we sure these are mdev bus devices? They certainly show
up when grepping for 'mdev', but it's unclear whether that's an abbreviation
for 'My Device' or actually the mdev (mediated device) bus.
^ permalink raw reply [flat|nested] 82+ messages in thread
* RE: [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking
2024-07-16 9:22 ` Joao Martins
@ 2024-07-18 7:50 ` Duan, Zhenzhong
0 siblings, 0 replies; 82+ messages in thread
From: Duan, Zhenzhong @ 2024-07-18 7:50 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org, Cedric Le Goater
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Jason Gunthorpe,
Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking
>
>On 16/07/2024 09:20, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking
>>>
>>> This small series adds IOMMU dirty tracking support via the
>>> IOMMUFD backend. The hardware capability is available on most recent x86
>>> hardware. The series is organized as follows:
>>>
>>> * Patches 1-2: Fix a regression in mdev support with IOMMUFD. This
>>> one is independent of the series but happened to cross it
>>> while testing mdev with this series
>>
>> I guess VFIO ap/ccw may need fixes too.
>> Will you help with that? Or I can take it if you want to focus on dirty tracking.
>> The fix may be trivial: just assign VFIODevice->mdev = true.
>>
>
>If you have something in mind already by all means go ahead.
OK, I will send it after your 'dirty tracking' v5, as there is a dependency.
>
>But from the code are we sure these are mdev bus devices? They certainly show
>up when grepping for 'mdev', but it's unclear whether that's an abbreviation
>for 'My Device' or actually the mdev (mediated device) bus.
I think so; docs/system/s390x/vfio-[ap|ccw].rst shows /sys/bus/mdev.
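
As a rough standalone sketch of how that could be checked at runtime,
assuming the test is "does <sysfsdev>/subsystem resolve to /sys/bus/mdev"
(the same idea as the mdev check patch 1 moves into a helper). The function
names below are made up for illustration and are not QEMU APIs:

```c
#define _XOPEN_SOURCE 700   /* for realpath() and PATH_MAX */
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* True if an already-resolved subsystem path names the mdev bus. */
bool subsystem_is_mdev(const char *resolved_subsys)
{
    return resolved_subsys != NULL &&
           strcmp(resolved_subsys, "/sys/bus/mdev") == 0;
}

/*
 * Resolve <sysfsdev>/subsystem and report whether it points at the mdev
 * bus. Returns false if the path is too long or cannot be resolved.
 */
bool sysfsdev_is_mdev(const char *sysfsdev)
{
    char link[PATH_MAX];
    char *subsys;
    bool is_mdev;
    int n;

    assert(sysfsdev != NULL);

    n = snprintf(link, sizeof(link), "%s/subsystem", sysfsdev);
    if (n < 0 || (size_t)n >= sizeof(link)) {
        return false;
    }

    subsys = realpath(link, NULL);  /* malloc'ed result, or NULL on error */
    is_mdev = subsystem_is_mdev(subsys);
    free(subsys);                   /* free(NULL) is a no-op */
    return is_mdev;
}
```

If vfio-ap/ccw devices always resolve to /sys/bus/mdev, such a check would
return true for them without any per-device special casing.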
Thanks
Zhenzhong
^ permalink raw reply [flat|nested] 82+ messages in thread