* [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on IOMMU_GET_HW_INFO failure
2024-07-08 14:34 [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking Joao Martins
@ 2024-07-08 14:34 ` Joao Martins
2024-07-09 3:43 ` Duan, Zhenzhong
2024-07-08 14:34 ` [PATCH v3 02/10] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities Joao Martins
` (9 subsequent siblings)
10 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-08 14:34 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
mdevs aren't "physical" devices and when asking for backing IOMMU info, it
fails the entire provisioning of the guest. Fix that by filling caps info
when IOMMU_GET_HW_INFO succeeds plus discarding the error we would get into
iommufd_backend_get_device_info().
Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
hw/vfio/iommufd.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index c2f158e60386..a4d23f488b01 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -631,15 +631,13 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
hiod->agent = opaque;
- if (!iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
- &type, &data, sizeof(data), errp)) {
- return false;
+ if (iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
+ &type, &data, sizeof(data), NULL)) {
+ hiod->name = g_strdup(vdev->name);
+ caps->type = type;
+ caps->aw_bits = vfio_device_get_aw_bits(vdev);
}
- hiod->name = g_strdup(vdev->name);
- caps->type = type;
- caps->aw_bits = vfio_device_get_aw_bits(vdev);
-
return true;
}
--
2.17.2
^ permalink raw reply related [flat|nested] 48+ messages in thread
* RE: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on IOMMU_GET_HW_INFO failure
2024-07-08 14:34 ` [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on IOMMU_GET_HW_INFO failure Joao Martins
@ 2024-07-09 3:43 ` Duan, Zhenzhong
2024-07-09 8:56 ` Joao Martins
0 siblings, 1 reply; 48+ messages in thread
From: Duan, Zhenzhong @ 2024-07-09 3:43 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
Hi Joao,
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on
>IOMMU_GET_HW_INFO failure
>
>mdevs aren't "physical" devices and when asking for backing IOMMU info, it
>fails the entire provisioning of the guest. Fix that by filling caps info
>when IOMMU_GET_HW_INFO succeeds plus discarding the error we would
>get into
>iommufd_backend_get_device_info().
>
>Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
>Fixes: 930589520128 ("vfio/iommufd: Implement
>HostIOMMUDeviceClass::realize() handler")
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>---
> hw/vfio/iommufd.c | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
>diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>index c2f158e60386..a4d23f488b01 100644
>--- a/hw/vfio/iommufd.c
>+++ b/hw/vfio/iommufd.c
>@@ -631,15 +631,13 @@ static bool
>hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>
> hiod->agent = opaque;
>
>- if (!iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
>- &type, &data, sizeof(data), errp)) {
>- return false;
>+ if (iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
>+ &type, &data, sizeof(data), NULL)) {
This will make us miss the real error. What about bypassing host IOMMU device
creation for mdev as it's not "physical device", passing corresponding host IOMMU
device to vIOMMU make no sense.
Thanks
Zhenzhong
>+ hiod->name = g_strdup(vdev->name);
>+ caps->type = type;
>+ caps->aw_bits = vfio_device_get_aw_bits(vdev);
> }
>
>- hiod->name = g_strdup(vdev->name);
>- caps->type = type;
>- caps->aw_bits = vfio_device_get_aw_bits(vdev);
>-
> return true;
> }
>
>--
>2.17.2
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on IOMMU_GET_HW_INFO failure
2024-07-09 3:43 ` Duan, Zhenzhong
@ 2024-07-09 8:56 ` Joao Martins
2024-07-09 11:45 ` Joao Martins
0 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-09 8:56 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 09/07/2024 04:43, Duan, Zhenzhong wrote:
> Hi Joao,
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on
>> IOMMU_GET_HW_INFO failure
>>
>> mdevs aren't "physical" devices and when asking for backing IOMMU info, it
>> fails the entire provisioning of the guest. Fix that by filling caps info
>> when IOMMU_GET_HW_INFO succeeds plus discarding the error we would
>> get into
>> iommufd_backend_get_device_info().
>>
>> Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> Fixes: 930589520128 ("vfio/iommufd: Implement
>> HostIOMMUDeviceClass::realize() handler")
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> hw/vfio/iommufd.c | 12 +++++-------
>> 1 file changed, 5 insertions(+), 7 deletions(-)
>>
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index c2f158e60386..a4d23f488b01 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -631,15 +631,13 @@ static bool
>> hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>
>> hiod->agent = opaque;
>>
>> - if (!iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
>> - &type, &data, sizeof(data), errp)) {
>> - return false;
>> + if (iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
>> + &type, &data, sizeof(data), NULL)) {
>
> This will make us miss the real error. What about bypassing host IOMMU device
> creation for mdev as it's not "physical device", passing corresponding host IOMMU
> device to vIOMMU make no sense.
Yeap -- This was my second alternative.
I can add an helper for vfio_is_mdev()) and just call
iommufd_backend_get_device_info() if !vfio_is_mdev(). I am assuming you meant
to skip the initialization of HostIOMMUDeviceCaps::caps as I think that
initializing hiod still makes sense as we are still using a
TYPE_HOST_IOMMU_DEVICE_IOMMUFD_VFIO somewhat?
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on IOMMU_GET_HW_INFO failure
2024-07-09 8:56 ` Joao Martins
@ 2024-07-09 11:45 ` Joao Martins
2024-07-09 11:50 ` Joao Martins
0 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-09 11:45 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 09/07/2024 09:56, Joao Martins wrote:
> On 09/07/2024 04:43, Duan, Zhenzhong wrote:
>> Hi Joao,
>>
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on
>>> IOMMU_GET_HW_INFO failure
>>>
>>> mdevs aren't "physical" devices and when asking for backing IOMMU info, it
>>> fails the entire provisioning of the guest. Fix that by filling caps info
>>> when IOMMU_GET_HW_INFO succeeds plus discarding the error we would
>>> get into
>>> iommufd_backend_get_device_info().
>>>
>>> Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> Fixes: 930589520128 ("vfio/iommufd: Implement
>>> HostIOMMUDeviceClass::realize() handler")
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> hw/vfio/iommufd.c | 12 +++++-------
>>> 1 file changed, 5 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index c2f158e60386..a4d23f488b01 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -631,15 +631,13 @@ static bool
>>> hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>>
>>> hiod->agent = opaque;
>>>
>>> - if (!iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
>>> - &type, &data, sizeof(data), errp)) {
>>> - return false;
>>> + if (iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
>>> + &type, &data, sizeof(data), NULL)) {
>>
>> This will make us miss the real error. What about bypassing host IOMMU device
>> creation for mdev as it's not "physical device", passing corresponding host IOMMU
>> device to vIOMMU make no sense.
>
> Yeap -- This was my second alternative.
>
> I can add an helper for vfio_is_mdev()) and just call
> iommufd_backend_get_device_info() if !vfio_is_mdev(). I am assuming you meant
> to skip the initialization of HostIOMMUDeviceCaps::caps as I think that
> initializing hiod still makes sense as we are still using a
> TYPE_HOST_IOMMU_DEVICE_IOMMUFD_VFIO somewhat?
>
Something like this is what I've done with this patch, see below. I think it
matches what you suggested? Naturally there's a precedent patch that introduces
vfio_is_mdev().
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index a4d23f488b01..c0a019dffdb6 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -631,8 +631,9 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod,
void *opaque,
hiod->agent = opaque;
- if (iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
- &type, &data, sizeof(data), NULL)) {
+ if (!vfio_is_mdev(vdev) &&
+ iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
+ &type, &data, sizeof(data), errp)) {
hiod->name = g_strdup(vdev->name);
caps->type = type;
caps->aw_bits = vfio_device_get_aw_bits(vdev);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d95aa6b65788..f092c1537999 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3115,7 +3115,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
vfio_bars_register(vdev);
- if (!pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
+ if (!is_mdev && !pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
error_prepend(errp, "Failed to set iommu_device: ");
goto out_teardown;
}
@@ -3238,7 +3238,9 @@ out_deregister:
timer_free(vdev->intx.mmap_timer);
}
out_unset_idev:
- pci_device_unset_iommu_device(pdev);
+ if (!is_mdev) {
+ pci_device_unset_iommu_device(pdev);
+ }
out_teardown:
vfio_teardown_msi(vdev);
vfio_bars_exit(vdev);
@@ -3268,6 +3270,7 @@ static void vfio_exitfn(PCIDevice *pdev)
{
VFIOPCIDevice *vdev = VFIO_PCI(pdev);
VFIODevice *vbasedev = &vdev->vbasedev;
+ bool is_mdev = vfio_is_mdev(vbasedev);
vfio_unregister_req_notifier(vdev);
vfio_unregister_err_notifier(vdev);
@@ -3283,7 +3286,9 @@ static void vfio_exitfn(PCIDevice *pdev)
vfio_pci_disable_rp_atomics(vdev);
vfio_bars_exit(vdev);
vfio_migration_exit(vbasedev);
- pci_device_unset_iommu_device(pdev);
+ if (!is_mdev) {
+ pci_device_unset_iommu_device(pdev);
+ }
}
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on IOMMU_GET_HW_INFO failure
2024-07-09 11:45 ` Joao Martins
@ 2024-07-09 11:50 ` Joao Martins
2024-07-10 2:53 ` Duan, Zhenzhong
0 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-09 11:50 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 09/07/2024 12:45, Joao Martins wrote:
> On 09/07/2024 09:56, Joao Martins wrote:
>> On 09/07/2024 04:43, Duan, Zhenzhong wrote:
>>> Hi Joao,
>>>
>>>> -----Original Message-----
>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>> Subject: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on
>>>> IOMMU_GET_HW_INFO failure
>>>>
>>>> mdevs aren't "physical" devices and when asking for backing IOMMU info, it
>>>> fails the entire provisioning of the guest. Fix that by filling caps info
>>>> when IOMMU_GET_HW_INFO succeeds plus discarding the error we would
>>>> get into
>>>> iommufd_backend_get_device_info().
>>>>
>>>> Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> Fixes: 930589520128 ("vfio/iommufd: Implement
>>>> HostIOMMUDeviceClass::realize() handler")
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> ---
>>>> hw/vfio/iommufd.c | 12 +++++-------
>>>> 1 file changed, 5 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>> index c2f158e60386..a4d23f488b01 100644
>>>> --- a/hw/vfio/iommufd.c
>>>> +++ b/hw/vfio/iommufd.c
>>>> @@ -631,15 +631,13 @@ static bool
>>>> hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>>>
>>>> hiod->agent = opaque;
>>>>
>>>> - if (!iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
>>>> - &type, &data, sizeof(data), errp)) {
>>>> - return false;
>>>> + if (iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
>>>> + &type, &data, sizeof(data), NULL)) {
>>>
>>> This will make us miss the real error. What about bypassing host IOMMU device
>>> creation for mdev as it's not "physical device", passing corresponding host IOMMU
>>> device to vIOMMU make no sense.
>>
>> Yeap -- This was my second alternative.
>>
>> I can add an helper for vfio_is_mdev()) and just call
>> iommufd_backend_get_device_info() if !vfio_is_mdev(). I am assuming you meant
>> to skip the initialization of HostIOMMUDeviceCaps::caps as I think that
>> initializing hiod still makes sense as we are still using a
>> TYPE_HOST_IOMMU_DEVICE_IOMMUFD_VFIO somewhat?
>>
> Something like this is what I've done with this patch, see below. I think it
> matches what you suggested? Naturally there's a precedent patch that introduces
> vfio_is_mdev().
>
Sorry ignore the previous snip, it was the wrong version, see below instead.
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index c2f158e60386..987dd9779f94 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -631,6 +631,10 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice
*hiod, void *opaque,
hiod->agent = opaque;
+ if (vfio_is_mdev(vdev)) {
+ return true;
+ }
+
if (!iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
&type, &data, sizeof(data), errp)) {
return false;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d95aa6b65788..f092c1537999 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3115,7 +3115,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
vfio_bars_register(vdev);
- if (!pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
+ if (!is_mdev && !pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
error_prepend(errp, "Failed to set iommu_device: ");
goto out_teardown;
}
@@ -3238,7 +3238,9 @@ out_deregister:
timer_free(vdev->intx.mmap_timer);
}
out_unset_idev:
- pci_device_unset_iommu_device(pdev);
+ if (!is_mdev) {
+ pci_device_unset_iommu_device(pdev);
+ }
out_teardown:
vfio_teardown_msi(vdev);
vfio_bars_exit(vdev);
@@ -3268,6 +3270,7 @@ static void vfio_exitfn(PCIDevice *pdev)
{
VFIOPCIDevice *vdev = VFIO_PCI(pdev);
VFIODevice *vbasedev = &vdev->vbasedev;
+ bool is_mdev = vfio_is_mdev(vbasedev);
vfio_unregister_req_notifier(vdev);
vfio_unregister_err_notifier(vdev);
@@ -3283,7 +3286,9 @@ static void vfio_exitfn(PCIDevice *pdev)
vfio_pci_disable_rp_atomics(vdev);
vfio_bars_exit(vdev);
vfio_migration_exit(vbasedev);
- pci_device_unset_iommu_device(pdev);
+ if (!is_mdev) {
+ pci_device_unset_iommu_device(pdev);
+ }
}
^ permalink raw reply related [flat|nested] 48+ messages in thread
* RE: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on IOMMU_GET_HW_INFO failure
2024-07-09 11:50 ` Joao Martins
@ 2024-07-10 2:53 ` Duan, Zhenzhong
2024-07-10 9:29 ` Joao Martins
0 siblings, 1 reply; 48+ messages in thread
From: Duan, Zhenzhong @ 2024-07-10 2:53 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on
>IOMMU_GET_HW_INFO failure
>
>On 09/07/2024 12:45, Joao Martins wrote:
>> On 09/07/2024 09:56, Joao Martins wrote:
>>> On 09/07/2024 04:43, Duan, Zhenzhong wrote:
>>>> Hi Joao,
>>>>
>>>>> -----Original Message-----
>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>> Subject: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on
>>>>> IOMMU_GET_HW_INFO failure
>>>>>
>>>>> mdevs aren't "physical" devices and when asking for backing IOMMU
>info, it
>>>>> fails the entire provisioning of the guest. Fix that by filling caps info
>>>>> when IOMMU_GET_HW_INFO succeeds plus discarding the error we
>would
>>>>> get into
>>>>> iommufd_backend_get_device_info().
>>>>>
>>>>> Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>> Fixes: 930589520128 ("vfio/iommufd: Implement
>>>>> HostIOMMUDeviceClass::realize() handler")
>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>> ---
>>>>> hw/vfio/iommufd.c | 12 +++++-------
>>>>> 1 file changed, 5 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>> index c2f158e60386..a4d23f488b01 100644
>>>>> --- a/hw/vfio/iommufd.c
>>>>> +++ b/hw/vfio/iommufd.c
>>>>> @@ -631,15 +631,13 @@ static bool
>>>>> hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>>>>
>>>>> hiod->agent = opaque;
>>>>>
>>>>> - if (!iommufd_backend_get_device_info(vdev->iommufd, vdev-
>>devid,
>>>>> - &type, &data, sizeof(data), errp)) {
>>>>> - return false;
>>>>> + if (iommufd_backend_get_device_info(vdev->iommufd, vdev-
>>devid,
>>>>> + &type, &data, sizeof(data), NULL)) {
>>>>
>>>> This will make us miss the real error. What about bypassing host
>IOMMU device
>>>> creation for mdev as it's not "physical device", passing corresponding
>host IOMMU
>>>> device to vIOMMU make no sense.
>>>
>>> Yeap -- This was my second alternative.
>>>
>>> I can add an helper for vfio_is_mdev()) and just call
>>> iommufd_backend_get_device_info() if !vfio_is_mdev(). I am assuming
>you meant
>>> to skip the initialization of HostIOMMUDeviceCaps::caps as I think that
>>> initializing hiod still makes sense as we are still using a
>>> TYPE_HOST_IOMMU_DEVICE_IOMMUFD_VFIO somewhat?
>>>
>> Something like this is what I've done with this patch, see below. I think it
>> matches what you suggested? Naturally there's a precedent patch that
>introduces
>> vfio_is_mdev().
>>
>
>Sorry ignore the previous snip, it was the wrong version, see below instead.
>
>diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>index c2f158e60386..987dd9779f94 100644
>--- a/hw/vfio/iommufd.c
>+++ b/hw/vfio/iommufd.c
>@@ -631,6 +631,10 @@ static bool
>hiod_iommufd_vfio_realize(HostIOMMUDevice
>*hiod, void *opaque,
>
> hiod->agent = opaque;
>
>+ if (vfio_is_mdev(vdev)) {
>+ return true;
>+ }
>+
Not necessary to create a dummy object.
What about bypassing object_new(ops->hiod_typename) in vfio_attach_device()?
> if (!iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
> &type, &data, sizeof(data), errp)) {
> return false;
>diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>index d95aa6b65788..f092c1537999 100644
>--- a/hw/vfio/pci.c
>+++ b/hw/vfio/pci.c
>@@ -3115,7 +3115,7 @@ static void vfio_realize(PCIDevice *pdev, Error
>**errp)
>
> vfio_bars_register(vdev);
>
>- if (!pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
>+ if (!is_mdev && !pci_device_set_iommu_device(pdev, vbasedev->hiod,
>errp)) {
Yes.
> error_prepend(errp, "Failed to set iommu_device: ");
> goto out_teardown;
> }
>@@ -3238,7 +3238,9 @@ out_deregister:
> timer_free(vdev->intx.mmap_timer);
> }
> out_unset_idev:
>- pci_device_unset_iommu_device(pdev);
>+ if (!is_mdev) {
>+ pci_device_unset_iommu_device(pdev);
>+ }
> out_teardown:
> vfio_teardown_msi(vdev);
> vfio_bars_exit(vdev);
>@@ -3268,6 +3270,7 @@ static void vfio_exitfn(PCIDevice *pdev)
> {
> VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> VFIODevice *vbasedev = &vdev->vbasedev;
>+ bool is_mdev = vfio_is_mdev(vbasedev);
>
> vfio_unregister_req_notifier(vdev);
> vfio_unregister_err_notifier(vdev);
>@@ -3283,7 +3286,9 @@ static void vfio_exitfn(PCIDevice *pdev)
> vfio_pci_disable_rp_atomics(vdev);
> vfio_bars_exit(vdev);
> vfio_migration_exit(vbasedev);
>- pci_device_unset_iommu_device(pdev);
>+ if (!is_mdev) {
>+ pci_device_unset_iommu_device(pdev);
>+ }
Yes.
Thanks
Zhenzhong
> }
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on IOMMU_GET_HW_INFO failure
2024-07-10 2:53 ` Duan, Zhenzhong
@ 2024-07-10 9:29 ` Joao Martins
2024-07-10 9:54 ` Duan, Zhenzhong
0 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-10 9:29 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 10/07/2024 03:53, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: Re: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on
>> IOMMU_GET_HW_INFO failure
>>
>> On 09/07/2024 12:45, Joao Martins wrote:
>>> On 09/07/2024 09:56, Joao Martins wrote:
>>>> On 09/07/2024 04:43, Duan, Zhenzhong wrote:
>>>>> Hi Joao,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>>> Subject: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on
>>>>>> IOMMU_GET_HW_INFO failure
>>>>>>
>>>>>> mdevs aren't "physical" devices and when asking for backing IOMMU
>> info, it
>>>>>> fails the entire provisioning of the guest. Fix that by filling caps info
>>>>>> when IOMMU_GET_HW_INFO succeeds plus discarding the error we
>> would
>>>>>> get into
>>>>>> iommufd_backend_get_device_info().
>>>>>>
>>>>>> Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>> Fixes: 930589520128 ("vfio/iommufd: Implement
>>>>>> HostIOMMUDeviceClass::realize() handler")
>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>> ---
>>>>>> hw/vfio/iommufd.c | 12 +++++-------
>>>>>> 1 file changed, 5 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>> index c2f158e60386..a4d23f488b01 100644
>>>>>> --- a/hw/vfio/iommufd.c
>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>> @@ -631,15 +631,13 @@ static bool
>>>>>> hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>>>>>
>>>>>> hiod->agent = opaque;
>>>>>>
>>>>>> - if (!iommufd_backend_get_device_info(vdev->iommufd, vdev-
>>> devid,
>>>>>> - &type, &data, sizeof(data), errp)) {
>>>>>> - return false;
>>>>>> + if (iommufd_backend_get_device_info(vdev->iommufd, vdev-
>>> devid,
>>>>>> + &type, &data, sizeof(data), NULL)) {
>>>>>
>>>>> This will make us miss the real error. What about bypassing host
>> IOMMU device
>>>>> creation for mdev as it's not "physical device", passing corresponding
>> host IOMMU
>>>>> device to vIOMMU make no sense.
>>>>
>>>> Yeap -- This was my second alternative.
>>>>
>>>> I can add an helper for vfio_is_mdev()) and just call
>>>> iommufd_backend_get_device_info() if !vfio_is_mdev(). I am assuming
>> you meant
>>>> to skip the initialization of HostIOMMUDeviceCaps::caps as I think that
>>>> initializing hiod still makes sense as we are still using a
>>>> TYPE_HOST_IOMMU_DEVICE_IOMMUFD_VFIO somewhat?
>>>>
>>> Something like this is what I've done with this patch, see below. I think it
>>> matches what you suggested? Naturally there's a precedent patch that
>> introduces
>>> vfio_is_mdev().
>>>
>>
>> Sorry ignore the previous snip, it was the wrong version, see below instead.
>>
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index c2f158e60386..987dd9779f94 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -631,6 +631,10 @@ static bool
>> hiod_iommufd_vfio_realize(HostIOMMUDevice
>> *hiod, void *opaque,
>>
>> hiod->agent = opaque;
>>
>> + if (vfio_is_mdev(vdev)) {
>> + return true;
>> + }
>> +
>
> Not necessary to create a dummy object.
> What about bypassing object_new(ops->hiod_typename) in vfio_attach_device()?
>
Not sure I am parsing this. What dummy object you refer to here if it's not
vbasedev::hiod that remains unused? Also in a suggestion by Cedric, and
pre-seeding vbasedev::hiod during attach_device()[0]. So I will sort of do that
already, but your comments means we are allocating a dummy object anyways too?
Or are you perhaps suggesting something like:
@@ -1552,17 +1552,20 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
assert(ops);
if (!ops->attach_device(name, vbasedev, as, errp)) {
return false;
}
if (!vfio_mdev(vbasedev) &&
!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
?
[0]
https://lore.kernel.org/qemu-devel/4e85db04-fbaa-4a6b-b133-59170c471e24@oracle.com/
^ permalink raw reply [flat|nested] 48+ messages in thread
* RE: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on IOMMU_GET_HW_INFO failure
2024-07-10 9:29 ` Joao Martins
@ 2024-07-10 9:54 ` Duan, Zhenzhong
2024-07-10 9:56 ` Joao Martins
0 siblings, 1 reply; 48+ messages in thread
From: Duan, Zhenzhong @ 2024-07-10 9:54 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on
>IOMMU_GET_HW_INFO failure
>
>On 10/07/2024 03:53, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: Re: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on
>>> IOMMU_GET_HW_INFO failure
>>>
>>> On 09/07/2024 12:45, Joao Martins wrote:
>>>> On 09/07/2024 09:56, Joao Martins wrote:
>>>>> On 09/07/2024 04:43, Duan, Zhenzhong wrote:
>>>>>> Hi Joao,
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>>>> Subject: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on
>>>>>>> IOMMU_GET_HW_INFO failure
>>>>>>>
>>>>>>> mdevs aren't "physical" devices and when asking for backing
>IOMMU
>>> info, it
>>>>>>> fails the entire provisioning of the guest. Fix that by filling caps info
>>>>>>> when IOMMU_GET_HW_INFO succeeds plus discarding the error we
>>> would
>>>>>>> get into
>>>>>>> iommufd_backend_get_device_info().
>>>>>>>
>>>>>>> Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>> Fixes: 930589520128 ("vfio/iommufd: Implement
>>>>>>> HostIOMMUDeviceClass::realize() handler")
>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>>> ---
>>>>>>> hw/vfio/iommufd.c | 12 +++++-------
>>>>>>> 1 file changed, 5 insertions(+), 7 deletions(-)
>>>>>>>
>>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>>> index c2f158e60386..a4d23f488b01 100644
>>>>>>> --- a/hw/vfio/iommufd.c
>>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>>> @@ -631,15 +631,13 @@ static bool
>>>>>>> hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void
>*opaque,
>>>>>>>
>>>>>>> hiod->agent = opaque;
>>>>>>>
>>>>>>> - if (!iommufd_backend_get_device_info(vdev->iommufd, vdev-
>>>> devid,
>>>>>>> - &type, &data, sizeof(data), errp)) {
>>>>>>> - return false;
>>>>>>> + if (iommufd_backend_get_device_info(vdev->iommufd, vdev-
>>>> devid,
>>>>>>> + &type, &data, sizeof(data), NULL)) {
>>>>>>
>>>>>> This will make us miss the real error. What about bypassing host
>>> IOMMU device
>>>>>> creation for mdev as it's not "physical device", passing corresponding
>>> host IOMMU
>>>>>> device to vIOMMU make no sense.
>>>>>
>>>>> Yeap -- This was my second alternative.
>>>>>
>>>>> I can add an helper for vfio_is_mdev()) and just call
>>>>> iommufd_backend_get_device_info() if !vfio_is_mdev(). I am
>assuming
>>> you meant
>>>>> to skip the initialization of HostIOMMUDeviceCaps::caps as I think that
>>>>> initializing hiod still makes sense as we are still using a
>>>>> TYPE_HOST_IOMMU_DEVICE_IOMMUFD_VFIO somewhat?
>>>>>
>>>> Something like this is what I've done with this patch, see below. I think it
>>>> matches what you suggested? Naturally there's a precedent patch that
>>> introduces
>>>> vfio_is_mdev().
>>>>
>>>
>>> Sorry ignore the previous snip, it was the wrong version, see below
>instead.
>>>
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index c2f158e60386..987dd9779f94 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -631,6 +631,10 @@ static bool
>>> hiod_iommufd_vfio_realize(HostIOMMUDevice
>>> *hiod, void *opaque,
>>>
>>> hiod->agent = opaque;
>>>
>>> + if (vfio_is_mdev(vdev)) {
>>> + return true;
>>> + }
>>> +
>>
>> Not necessary to create a dummy object.
>> What about bypassing object_new(ops->hiod_typename) in
>vfio_attach_device()?
>>
>Not sure I am parsing this. What dummy object you refer to here if it's not
>vbasedev::hiod that remains unused? Also in a suggestion by Cedric, and
>pre-seeding vbasedev::hiod during attach_device()[0]. So I will sort of do that
>already, but your comments means we are allocating a dummy object
>anyways too?
Yes, with your snip change, it's allocated by object_new(ops->hiod_typename) but not realized
and never used else where.
>
>Or are you perhaps suggesting something like:
>
>@@ -1552,17 +1552,20 @@ bool vfio_attach_device(char *name,
>VFIODevice *vbasedev,
>
> assert(ops);
>
> if (!ops->attach_device(name, vbasedev, as, errp)) {
> return false;
> }
>
> if (!vfio_mdev(vbasedev) &&
> !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>errp)) {
>
>?
I mean bypass host IOMMU device thoroughly for mdev, like:
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1548,6 +1548,10 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
return false;
}
+ if (vfio_is_mdev(vdev)) {
+ return true;
+ }
+
hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
object_unref(hiod);
>
>
>[0]
>https://lore.kernel.org/qemu-devel/4e85db04-fbaa-4a6b-b133-
>59170c471e24@oracle.com/
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on IOMMU_GET_HW_INFO failure
2024-07-10 9:54 ` Duan, Zhenzhong
@ 2024-07-10 9:56 ` Joao Martins
0 siblings, 0 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-10 9:56 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 10/07/2024 10:54, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: Re: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on
>> IOMMU_GET_HW_INFO failure
>>
>> On 10/07/2024 03:53, Duan, Zhenzhong wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>> Subject: Re: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on
>>>> IOMMU_GET_HW_INFO failure
>>>>
>>>> On 09/07/2024 12:45, Joao Martins wrote:
>>>>> On 09/07/2024 09:56, Joao Martins wrote:
>>>>>> On 09/07/2024 04:43, Duan, Zhenzhong wrote:
>>>>>>> Hi Joao,
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>>>>> Subject: [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on
>>>>>>>> IOMMU_GET_HW_INFO failure
>>>>>>>>
>>>>>>>> mdevs aren't "physical" devices and when asking for backing
>> IOMMU
>>>> info, it
>>>>>>>> fails the entire provisioning of the guest. Fix that by filling caps info
>>>>>>>> when IOMMU_GET_HW_INFO succeeds plus discarding the error we
>>>> would
>>>>>>>> get into
>>>>>>>> iommufd_backend_get_device_info().
>>>>>>>>
>>>>>>>> Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>>> Fixes: 930589520128 ("vfio/iommufd: Implement
>>>>>>>> HostIOMMUDeviceClass::realize() handler")
>>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>>>> ---
>>>>>>>> hw/vfio/iommufd.c | 12 +++++-------
>>>>>>>> 1 file changed, 5 insertions(+), 7 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>>>> index c2f158e60386..a4d23f488b01 100644
>>>>>>>> --- a/hw/vfio/iommufd.c
>>>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>>>> @@ -631,15 +631,13 @@ static bool
>>>>>>>> hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void
>> *opaque,
>>>>>>>>
>>>>>>>> hiod->agent = opaque;
>>>>>>>>
>>>>>>>> - if (!iommufd_backend_get_device_info(vdev->iommufd, vdev-
>>>>> devid,
>>>>>>>> - &type, &data, sizeof(data), errp)) {
>>>>>>>> - return false;
>>>>>>>> + if (iommufd_backend_get_device_info(vdev->iommufd, vdev-
>>>>> devid,
>>>>>>>> + &type, &data, sizeof(data), NULL)) {
>>>>>>>
>>>>>>> This will make us miss the real error. What about bypassing host
>>>> IOMMU device
>>>>>>> creation for mdev as it's not "physical device", passing corresponding
>>>> host IOMMU
>>>>>>> device to vIOMMU make no sense.
>>>>>>
>>>>>> Yeap -- This was my second alternative.
>>>>>>
>>>>>> I can add an helper for vfio_is_mdev()) and just call
>>>>>> iommufd_backend_get_device_info() if !vfio_is_mdev(). I am
>> assuming
>>>> you meant
>>>>>> to skip the initialization of HostIOMMUDeviceCaps::caps as I think that
>>>>>> initializing hiod still makes sense as we are still using a
>>>>>> TYPE_HOST_IOMMU_DEVICE_IOMMUFD_VFIO somewhat?
>>>>>>
>>>>> Something like this is what I've done with this patch, see below. I think it
>>>>> matches what you suggested? Naturally there's a precedent patch that
>>>> introduces
>>>>> vfio_is_mdev().
>>>>>
>>>>
>>>> Sorry ignore the previous snip, it was the wrong version, see below
>> instead.
>>>>
>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>> index c2f158e60386..987dd9779f94 100644
>>>> --- a/hw/vfio/iommufd.c
>>>> +++ b/hw/vfio/iommufd.c
>>>> @@ -631,6 +631,10 @@ static bool
>>>> hiod_iommufd_vfio_realize(HostIOMMUDevice
>>>> *hiod, void *opaque,
>>>>
>>>> hiod->agent = opaque;
>>>>
>>>> + if (vfio_is_mdev(vdev)) {
>>>> + return true;
>>>> + }
>>>> +
>>>
>>> Not necessary to create a dummy object.
>>> What about bypassing object_new(ops->hiod_typename) in
>> vfio_attach_device()?
>>>
>> Not sure I am parsing this. What dummy object you refer to here if it's not
>> vbasedev::hiod that remains unused? Also in a suggestion by Cedric, and
>> pre-seeding vbasedev::hiod during attach_device()[0]. So I will sort of do that
>> already, but your comments means we are allocating a dummy object
>> anyways too?
>
> Yes, with your snip change, it's allocated by object_new(ops->hiod_typename) but not realized
> and never used else where.
>
>>
>> Or are you perhaps suggesting something like:
>>
>> @@ -1552,17 +1552,20 @@ bool vfio_attach_device(char *name,
>> VFIODevice *vbasedev,
>>
>> assert(ops);
>>
>> if (!ops->attach_device(name, vbasedev, as, errp)) {
>> return false;
>> }
>>
>> if (!vfio_mdev(vbasedev) &&
>> !HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>> errp)) {
>>
>> ?
>
> I mean bypass host IOMMU device thoroughly for mdev, like:
>
/me facepalm.
Makes sense!
I read your comment in my head as "What about by passing
object_new(ops->hiod_typename)", when it was 'bypassing' that you wrote.
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1548,6 +1548,10 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> return false;
> }
>
> + if (vfio_is_mdev(vdev)) {
> + return true;
> + }
> +
> hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
> if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
> object_unref(hiod);
>
>
>>
>>
>> [0]
>> https://lore.kernel.org/qemu-devel/4e85db04-fbaa-4a6b-b133-
>> 59170c471e24@oracle.com/
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v3 02/10] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities
2024-07-08 14:34 [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking Joao Martins
2024-07-08 14:34 ` [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on IOMMU_GET_HW_INFO failure Joao Martins
@ 2024-07-08 14:34 ` Joao Martins
2024-07-09 6:13 ` Duan, Zhenzhong
2024-07-08 14:34 ` [PATCH v3 03/10] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() Joao Martins
` (8 subsequent siblings)
10 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-08 14:34 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
The helper will be able to fetch vendor agnostic IOMMU capabilities
supported both by hardware and software. Right now it is only iommu dirty
tracking.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/sysemu/iommufd.h | 2 +-
backends/iommufd.c | 4 +++-
hw/vfio/iommufd.c | 4 +++-
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 9edfec604595..57d502a1c79a 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -49,7 +49,7 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
hwaddr iova, ram_addr_t size);
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
- Error **errp);
+ uint64_t *caps, Error **errp);
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 84fefbc9ee7a..2b3d51af26d2 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -210,7 +210,7 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
- Error **errp)
+ uint64_t *caps, Error **errp)
{
struct iommu_hw_info info = {
.size = sizeof(info),
@@ -226,6 +226,8 @@ bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
g_assert(type);
*type = info.out_data_type;
+ g_assert(caps);
+ *caps = info.out_capabilities;
return true;
}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index a4d23f488b01..9cee71659b1c 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -628,11 +628,13 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
union {
struct iommu_hw_info_vtd vtd;
} data;
+ uint64_t hw_caps;
hiod->agent = opaque;
if (iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
- &type, &data, sizeof(data), NULL)) {
+ &type, &data, sizeof(data),
+ &hw_caps, NULL)) {
hiod->name = g_strdup(vdev->name);
caps->type = type;
caps->aw_bits = vfio_device_get_aw_bits(vdev);
--
2.17.2
^ permalink raw reply related [flat|nested] 48+ messages in thread
* RE: [PATCH v3 02/10] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities
2024-07-08 14:34 ` [PATCH v3 02/10] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities Joao Martins
@ 2024-07-09 6:13 ` Duan, Zhenzhong
0 siblings, 0 replies; 48+ messages in thread
From: Duan, Zhenzhong @ 2024-07-09 6:13 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v3 02/10] backends/iommufd: Extend
>iommufd_backend_get_device_info() to fetch HW capabilities
>
>The helper will be able to fetch vendor agnostic IOMMU capabilities
>supported both by hardware and software. Right now it is only iommu dirty
>tracking.
>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Thanks
Zhenzhong
>---
> include/sysemu/iommufd.h | 2 +-
> backends/iommufd.c | 4 +++-
> hw/vfio/iommufd.c | 4 +++-
> 3 files changed, 7 insertions(+), 3 deletions(-)
>
>diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>index 9edfec604595..57d502a1c79a 100644
>--- a/include/sysemu/iommufd.h
>+++ b/include/sysemu/iommufd.h
>@@ -49,7 +49,7 @@ int
>iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> hwaddr iova, ram_addr_t size);
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>devid,
> uint32_t *type, void *data, uint32_t len,
>- Error **errp);
>+ uint64_t *caps, Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>TYPE_HOST_IOMMU_DEVICE "-iommufd"
> #endif
>diff --git a/backends/iommufd.c b/backends/iommufd.c
>index 84fefbc9ee7a..2b3d51af26d2 100644
>--- a/backends/iommufd.c
>+++ b/backends/iommufd.c
>@@ -210,7 +210,7 @@ int
>iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>devid,
> uint32_t *type, void *data, uint32_t len,
>- Error **errp)
>+ uint64_t *caps, Error **errp)
> {
> struct iommu_hw_info info = {
> .size = sizeof(info),
>@@ -226,6 +226,8 @@ bool
>iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>
> g_assert(type);
> *type = info.out_data_type;
>+ g_assert(caps);
>+ *caps = info.out_capabilities;
>
> return true;
> }
>diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>index a4d23f488b01..9cee71659b1c 100644
>--- a/hw/vfio/iommufd.c
>+++ b/hw/vfio/iommufd.c
>@@ -628,11 +628,13 @@ static bool
>hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> union {
> struct iommu_hw_info_vtd vtd;
> } data;
>+ uint64_t hw_caps;
>
> hiod->agent = opaque;
>
> if (iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
>- &type, &data, sizeof(data), NULL)) {
>+ &type, &data, sizeof(data),
>+ &hw_caps, NULL)) {
> hiod->name = g_strdup(vdev->name);
> caps->type = type;
> caps->aw_bits = vfio_device_get_aw_bits(vdev);
>--
>2.17.2
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v3 03/10] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
2024-07-08 14:34 [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking Joao Martins
2024-07-08 14:34 ` [PATCH v3 01/10] vfio/iommufd: Don't fail to realize on IOMMU_GET_HW_INFO failure Joao Martins
2024-07-08 14:34 ` [PATCH v3 02/10] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities Joao Martins
@ 2024-07-08 14:34 ` Joao Martins
2024-07-08 15:28 ` Cédric Le Goater
2024-07-08 14:34 ` [PATCH v3 04/10] vfio/iommufd: Introduce auto domain creation Joao Martins
` (7 subsequent siblings)
10 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-08 14:34 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
In preparation to implement auto domains have the attach function
return the errno it got during domain attach instead of a bool.
-EINVAL is tracked to track domain incompatibilities, and decide whether
to create a new IOMMU domain.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
hw/vfio/iommufd.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 9cee71659b1c..5a5993b17c2e 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -172,7 +172,7 @@ out:
return ret;
}
-static bool iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
+static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
Error **errp)
{
int iommufd = vbasedev->iommufd->fd;
@@ -187,12 +187,12 @@ static bool iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
error_setg_errno(errp, errno,
"[iommufd=%d] error attach %s (%d) to id=%d",
iommufd, vbasedev->name, vbasedev->fd, id);
- return false;
+ return -errno;
}
trace_iommufd_cdev_attach_ioas_hwpt(iommufd, vbasedev->name,
vbasedev->fd, id);
- return true;
+ return 0;
}
static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
@@ -216,7 +216,7 @@ static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
VFIOIOMMUFDContainer *container,
Error **errp)
{
- return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
+ return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
}
static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
--
2.17.2
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v3 03/10] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
2024-07-08 14:34 ` [PATCH v3 03/10] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() Joao Martins
@ 2024-07-08 15:28 ` Cédric Le Goater
2024-07-08 15:32 ` Joao Martins
0 siblings, 1 reply; 48+ messages in thread
From: Cédric Le Goater @ 2024-07-08 15:28 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
Hello Joao,
On 7/8/24 4:34 PM, Joao Martins wrote:
> In preparation to implement auto domains have the attach function
> return the errno it got during domain attach instead of a bool.
>
> -EINVAL is tracked to track domain incompatibilities, and decide whether
> to create a new IOMMU domain.
Please leave the return value as a bool unless there is a very
good reason not to.
Thanks,
C.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> hw/vfio/iommufd.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 9cee71659b1c..5a5993b17c2e 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -172,7 +172,7 @@ out:
> return ret;
> }
>
> -static bool iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
> +static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
> Error **errp)
> {
> int iommufd = vbasedev->iommufd->fd;
> @@ -187,12 +187,12 @@ static bool iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
> error_setg_errno(errp, errno,
> "[iommufd=%d] error attach %s (%d) to id=%d",
> iommufd, vbasedev->name, vbasedev->fd, id);
> - return false;
> + return -errno;
> }
>
> trace_iommufd_cdev_attach_ioas_hwpt(iommufd, vbasedev->name,
> vbasedev->fd, id);
> - return true;
> + return 0;
> }
>
> static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
> @@ -216,7 +216,7 @@ static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
> VFIOIOMMUFDContainer *container,
> Error **errp)
> {
> - return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
> + return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
> }
>
> static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 03/10] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
2024-07-08 15:28 ` Cédric Le Goater
@ 2024-07-08 15:32 ` Joao Martins
2024-07-08 16:28 ` Joao Martins
2024-07-09 6:20 ` Cédric Le Goater
0 siblings, 2 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-08 15:32 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 08/07/2024 16:28, Cédric Le Goater wrote:
> Hello Joao,
>
> On 7/8/24 4:34 PM, Joao Martins wrote:
>> In preparation to implement auto domains have the attach function
>> return the errno it got during domain attach instead of a bool.
>>
>> -EINVAL is tracked to track domain incompatibilities, and decide whether
>> to create a new IOMMU domain.
>
> Please leave the return value as a bool unless there is a very
> good reason not to.
>
Error* doesn't store the errno, and thus I can't actually test the number of
errno to know when to bail to the next hwpt. Maybe the commit message wasn't
clear enough there. But not sure if we have an alternative here? Or maybe Error
does store errno, and I totally missed it.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 03/10] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
2024-07-08 15:32 ` Joao Martins
@ 2024-07-08 16:28 ` Joao Martins
2024-07-09 6:20 ` Cédric Le Goater
1 sibling, 0 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-08 16:28 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 08/07/2024 16:32, Joao Martins wrote:
> On 08/07/2024 16:28, Cédric Le Goater wrote:
>> Hello Joao,
>>
>> On 7/8/24 4:34 PM, Joao Martins wrote:
>>> In preparation to implement auto domains have the attach function
>>> return the errno it got during domain attach instead of a bool.
>>>
>>> -EINVAL is tracked to track domain incompatibilities, and decide whether
>>> to create a new IOMMU domain.
>>
>> Please leave the return value as a bool unless there is a very
>> good reason not to.
>>
>
> Error* doesn't store the errno, and thus I can't actually test the number of
> errno to know when to bail to the next hwpt. Maybe the commit message wasn't
> clear enough there. But not sure if we have an alternative here? Or maybe Error
> does store errno, and I totally missed it.
Or I can just use 'errno' and keep the same return value as bool. But I didn't
do that because we are purposedly calling error_set*(errp, errno) with the errno
value that it felt the right thing to just return it. Also, considering how hard
the code in util/error.c saves/restores errno in all the helpers.
Joao
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 03/10] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
2024-07-08 15:32 ` Joao Martins
2024-07-08 16:28 ` Joao Martins
@ 2024-07-09 6:20 ` Cédric Le Goater
2024-07-09 8:56 ` Joao Martins
1 sibling, 1 reply; 48+ messages in thread
From: Cédric Le Goater @ 2024-07-09 6:20 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/8/24 5:32 PM, Joao Martins wrote:
> On 08/07/2024 16:28, Cédric Le Goater wrote:
>> Hello Joao,
>>
>> On 7/8/24 4:34 PM, Joao Martins wrote:
>>> In preparation to implement auto domains have the attach function
>>> return the errno it got during domain attach instead of a bool.
>>>
>>> -EINVAL is tracked to track domain incompatibilities, and decide whether
>>> to create a new IOMMU domain.
>>
>> Please leave the return value as a bool unless there is a very
>> good reason not to.
>>
>
> Error* doesn't store the errno, and thus I can't actually test the number of
> errno to know when to bail to the next hwpt. Maybe the commit message wasn't
> clear enough there. But not sure if we have an alternative here? Or maybe Error
> does store errno, and I totally missed it.
OK. Let's do the 'bool' -> 'int' change for iommufd_cdev_attach_ioas_hwpt().
I see how it is used later on in the series.
Thanks,
C.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 03/10] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
2024-07-09 6:20 ` Cédric Le Goater
@ 2024-07-09 8:56 ` Joao Martins
0 siblings, 0 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-09 8:56 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 09/07/2024 07:20, Cédric Le Goater wrote:
> On 7/8/24 5:32 PM, Joao Martins wrote:
>> On 08/07/2024 16:28, Cédric Le Goater wrote:
>>> Hello Joao,
>>>
>>> On 7/8/24 4:34 PM, Joao Martins wrote:
>>>> In preparation to implement auto domains have the attach function
>>>> return the errno it got during domain attach instead of a bool.
>>>>
>>>> -EINVAL is tracked to track domain incompatibilities, and decide whether
>>>> to create a new IOMMU domain.
>>>
>>> Please leave the return value as a bool unless there is a very
>>> good reason not to.
>>>
>>
>> Error* doesn't store the errno, and thus I can't actually test the number of
>> errno to know when to bail to the next hwpt. Maybe the commit message wasn't
>> clear enough there. But not sure if we have an alternative here? Or maybe Error
>> does store errno, and I totally missed it.
>
> OK. Let's do the 'bool' -> 'int' change for iommufd_cdev_attach_ioas_hwpt().
> I see how it is used later on in the series.
Got it
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v3 04/10] vfio/iommufd: Introduce auto domain creation
2024-07-08 14:34 [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking Joao Martins
` (2 preceding siblings ...)
2024-07-08 14:34 ` [PATCH v3 03/10] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() Joao Martins
@ 2024-07-08 14:34 ` Joao Martins
2024-07-09 6:26 ` Duan, Zhenzhong
2024-07-09 6:50 ` Cédric Le Goater
2024-07-08 14:34 ` [PATCH v3 05/10] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
` (6 subsequent siblings)
10 siblings, 2 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-08 14:34 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
There's generally two modes of operation for IOMMUFD:
* The simple user API which intends to perform relatively simple things
with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
and mainly performs IOAS_MAP and UNMAP.
* The native IOMMUFD API where you have fine grained control of the
IOMMU domain and model it accordingly. This is where most new feature
are being steered to.
For dirty tracking 2) is required, as it needs to ensure that
the stage-2/parent IOMMU domain will only attach devices
that support dirty tracking (so far it is all homogeneous in x86, likely
not the case for smmuv3). Such invariant on dirty tracking provides a
useful guarantee to VMMs that will refuse incompatible device
attachments for IOMMU domains.
Dirty tracking insurance is enforced via HWPT_ALLOC, which is
responsible for creating an IOMMU domain. This is contrast to the
'simple API' where the IOMMU domain is created by IOMMUFD automatically
when it attaches to VFIO (usually referred as autodomains) but it has
the needed handling for mdevs.
To support dirty tracking with the advanced IOMMUFD API, it needs
similar logic, where IOMMU domains are created and devices attached to
compatible domains. Essentially mimmicing kernel
iommufd_device_auto_get_domain(). If this fails (i.e. mdevs) it falls back
to IOAS attach (which again is always the case for mdevs).
The auto domain logic allows different IOMMU domains to be created when
DMA dirty tracking is not desired (and VF can provide it), and others where
it is. Here is not used in this way here given how VFIODevice migration
state is initialized after the device attachment. But such mixed mode of
IOMMU dirty tracking + device dirty tracking is an improvement that can
be added on. Keep the 'all of nothing' approach that we have been using
so far between container vs device dirty tracking.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/hw/vfio/vfio-common.h | 9 ++++
include/sysemu/iommufd.h | 4 ++
backends/iommufd.c | 29 +++++++++++
hw/vfio/iommufd.c | 91 +++++++++++++++++++++++++++++++++++
backends/trace-events | 1 +
5 files changed, 134 insertions(+)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e8ddf92bb185..82c5a4aaa61e 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
typedef struct IOMMUFDBackend IOMMUFDBackend;
+typedef struct VFIOIOASHwpt {
+ uint32_t hwpt_id;
+ QLIST_HEAD(, VFIODevice) device_list;
+ QLIST_ENTRY(VFIOIOASHwpt) next;
+} VFIOIOASHwpt;
+
typedef struct VFIOIOMMUFDContainer {
VFIOContainerBase bcontainer;
IOMMUFDBackend *be;
uint32_t ioas_id;
+ QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
} VFIOIOMMUFDContainer;
OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, VFIO_IOMMU_IOMMUFD);
@@ -134,6 +141,8 @@ typedef struct VFIODevice {
HostIOMMUDevice *hiod;
int devid;
IOMMUFDBackend *iommufd;
+ VFIOIOASHwpt *hwpt;
+ QLIST_ENTRY(VFIODevice) hwpt_next;
} VFIODevice;
struct VFIODeviceOps {
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 57d502a1c79a..35a8cec9780f 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -50,6 +50,10 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp);
+int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
+ uint32_t pt_id, uint32_t flags,
+ uint32_t data_type, uint32_t data_len,
+ void *data_ptr, uint32_t *out_hwpt);
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 2b3d51af26d2..f5f73eaf4a1a 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -208,6 +208,35 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
return ret;
}
+int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
+ uint32_t pt_id, uint32_t flags,
+ uint32_t data_type, uint32_t data_len,
+ void *data_ptr, uint32_t *out_hwpt)
+{
+ int ret, fd = be->fd;
+ struct iommu_hwpt_alloc alloc_hwpt = {
+ .size = sizeof(struct iommu_hwpt_alloc),
+ .flags = flags,
+ .dev_id = dev_id,
+ .pt_id = pt_id,
+ .data_type = data_type,
+ .data_len = data_len,
+ .data_uptr = (uint64_t)data_ptr,
+ };
+
+ ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
+ trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
+ data_len, (uint64_t)data_ptr,
+ alloc_hwpt.out_hwpt_id, ret);
+ if (ret) {
+ ret = -errno;
+ error_report("IOMMU_HWPT_ALLOC failed: %m");
+ } else {
+ *out_hwpt = alloc_hwpt.out_hwpt_id;
+ }
+ return ret;
+}
+
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 5a5993b17c2e..2ca9a32cc7b6 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -212,10 +212,95 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
return true;
}
+static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
+ VFIOIOMMUFDContainer *container,
+ Error **errp)
+{
+ IOMMUFDBackend *iommufd = vbasedev->iommufd;
+ uint32_t flags = 0;
+ VFIOIOASHwpt *hwpt;
+ uint32_t hwpt_id;
+ int ret;
+
+ /* Try to find a domain */
+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, NULL);
+ if (ret) {
+ /* -EINVAL means the domain is incompatible with the device. */
+ if (ret == -EINVAL) {
+ continue;
+ }
+
+ return false;
+ } else {
+ vbasedev->hwpt = hwpt;
+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
+ return true;
+ }
+ }
+
+ ret = iommufd_backend_alloc_hwpt(iommufd,
+ vbasedev->devid,
+ container->ioas_id, flags,
+ IOMMU_HWPT_DATA_NONE, 0, NULL, &hwpt_id);
+ if (ret) {
+ /*
+ * When there's no domains allocated we can still attempt a hardware
+ * pagetable allocation for an mdev, which ends up returning -ENOENT
+ * This is benign and meant to fallback into IOAS attach, hence don't
+ * set errp.
+ */
+ return false;
+ }
+
+ hwpt = g_malloc0(sizeof(*hwpt));
+ hwpt->hwpt_id = hwpt_id;
+ QLIST_INIT(&hwpt->device_list);
+
+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
+ if (ret) {
+ iommufd_backend_free_id(container->be, hwpt->hwpt_id);
+ g_free(hwpt);
+ return false;
+ }
+
+ vbasedev->hwpt = hwpt;
+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
+ QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
+ return true;
+}
+
+static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
+ VFIOIOMMUFDContainer *container)
+{
+ VFIOIOASHwpt *hwpt = vbasedev->hwpt;
+
+ QLIST_REMOVE(vbasedev, hwpt_next);
+ if (QLIST_EMPTY(&hwpt->device_list)) {
+ QLIST_REMOVE(hwpt, next);
+ iommufd_backend_free_id(container->be, hwpt->hwpt_id);
+ g_free(hwpt);
+ }
+}
+
static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
VFIOIOMMUFDContainer *container,
Error **errp)
{
+ if (iommufd_cdev_autodomains_get(vbasedev, container, errp)) {
+ return true;
+ }
+
+ /*
+ * iommufd_cdev_autodomains_get() may have an expected failure (-ENOENT)
+ * for mdevs as they aren't real physical devices. @errp is only set
+ * when real failures happened i.e. failing to attach to a newly created
+ * hardware pagetable. These are fatal and should fail the attachment.
+ */
+ if (*errp) {
+ return false;
+ }
+
return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
}
@@ -224,6 +309,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
{
Error *err = NULL;
+ if (vbasedev->hwpt) {
+ iommufd_cdev_autodomains_put(vbasedev, container);
+ return;
+ }
+
if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
error_report_err(err);
}
@@ -354,6 +444,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
container->be = vbasedev->iommufd;
container->ioas_id = ioas_id;
+ QLIST_INIT(&container->hwpt_list);
bcontainer = &container->bcontainer;
vfio_address_space_insert(space, bcontainer);
diff --git a/backends/trace-events b/backends/trace-events
index 211e6f374adc..4d8ac02fe7d6 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size
iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
+iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
--
2.17.2
^ permalink raw reply related [flat|nested] 48+ messages in thread
* RE: [PATCH v3 04/10] vfio/iommufd: Introduce auto domain creation
2024-07-08 14:34 ` [PATCH v3 04/10] vfio/iommufd: Introduce auto domain creation Joao Martins
@ 2024-07-09 6:26 ` Duan, Zhenzhong
2024-07-09 9:00 ` Joao Martins
2024-07-09 6:50 ` Cédric Le Goater
1 sibling, 1 reply; 48+ messages in thread
From: Duan, Zhenzhong @ 2024-07-09 6:26 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v3 04/10] vfio/iommufd: Introduce auto domain creation
>
>There's generally two modes of operation for IOMMUFD:
>
>* The simple user API which intends to perform relatively simple things
>with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>and mainly performs IOAS_MAP and UNMAP.
>
>* The native IOMMUFD API where you have fine grained control of the
>IOMMU domain and model it accordingly. This is where most new feature
>are being steered to.
>
>For dirty tracking 2) is required, as it needs to ensure that
>the stage-2/parent IOMMU domain will only attach devices
>that support dirty tracking (so far it is all homogeneous in x86, likely
>not the case for smmuv3). Such invariant on dirty tracking provides a
>useful guarantee to VMMs that will refuse incompatible device
>attachments for IOMMU domains.
>
>Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>responsible for creating an IOMMU domain. This is contrast to the
>'simple API' where the IOMMU domain is created by IOMMUFD
>automatically
>when it attaches to VFIO (usually referred as autodomains) but it has
>the needed handling for mdevs.
>
>To support dirty tracking with the advanced IOMMUFD API, it needs
>similar logic, where IOMMU domains are created and devices attached to
>compatible domains. Essentially mimmicing kernel
>iommufd_device_auto_get_domain(). If this fails (i.e. mdevs) it falls back
>to IOAS attach (which again is always the case for mdevs).
>
>The auto domain logic allows different IOMMU domains to be created when
>DMA dirty tracking is not desired (and VF can provide it), and others where
>it is. Here is not used in this way here given how VFIODevice migration
>state is initialized after the device attachment. But such mixed mode of
>IOMMU dirty tracking + device dirty tracking is an improvement that can
>be added on. Keep the 'all of nothing' approach that we have been using
>so far between container vs device dirty tracking.
>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>---
> include/hw/vfio/vfio-common.h | 9 ++++
> include/sysemu/iommufd.h | 4 ++
> backends/iommufd.c | 29 +++++++++++
> hw/vfio/iommufd.c | 91
>+++++++++++++++++++++++++++++++++++
> backends/trace-events | 1 +
> 5 files changed, 134 insertions(+)
>
>diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>common.h
>index e8ddf92bb185..82c5a4aaa61e 100644
>--- a/include/hw/vfio/vfio-common.h
>+++ b/include/hw/vfio/vfio-common.h
>@@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>
> typedef struct IOMMUFDBackend IOMMUFDBackend;
>
>+typedef struct VFIOIOASHwpt {
>+ uint32_t hwpt_id;
>+ QLIST_HEAD(, VFIODevice) device_list;
>+ QLIST_ENTRY(VFIOIOASHwpt) next;
>+} VFIOIOASHwpt;
>+
> typedef struct VFIOIOMMUFDContainer {
> VFIOContainerBase bcontainer;
> IOMMUFDBackend *be;
> uint32_t ioas_id;
>+ QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
> } VFIOIOMMUFDContainer;
>
> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>VFIO_IOMMU_IOMMUFD);
>@@ -134,6 +141,8 @@ typedef struct VFIODevice {
> HostIOMMUDevice *hiod;
> int devid;
> IOMMUFDBackend *iommufd;
>+ VFIOIOASHwpt *hwpt;
>+ QLIST_ENTRY(VFIODevice) hwpt_next;
> } VFIODevice;
>
> struct VFIODeviceOps {
>diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>index 57d502a1c79a..35a8cec9780f 100644
>--- a/include/sysemu/iommufd.h
>+++ b/include/sysemu/iommufd.h
>@@ -50,6 +50,10 @@ int
>iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp);
>+int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>+ uint32_t pt_id, uint32_t flags,
>+ uint32_t data_type, uint32_t data_len,
>+ void *data_ptr, uint32_t *out_hwpt);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>TYPE_HOST_IOMMU_DEVICE "-iommufd"
> #endif
>diff --git a/backends/iommufd.c b/backends/iommufd.c
>index 2b3d51af26d2..f5f73eaf4a1a 100644
>--- a/backends/iommufd.c
>+++ b/backends/iommufd.c
>@@ -208,6 +208,35 @@ int
>iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> return ret;
> }
>
>+int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>+ uint32_t pt_id, uint32_t flags,
>+ uint32_t data_type, uint32_t data_len,
>+ void *data_ptr, uint32_t *out_hwpt)
>+{
>+ int ret, fd = be->fd;
>+ struct iommu_hwpt_alloc alloc_hwpt = {
>+ .size = sizeof(struct iommu_hwpt_alloc),
>+ .flags = flags,
>+ .dev_id = dev_id,
>+ .pt_id = pt_id,
>+ .data_type = data_type,
>+ .data_len = data_len,
>+ .data_uptr = (uint64_t)data_ptr,
>+ };
>+
>+ ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>+ trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
>+ data_len, (uint64_t)data_ptr,
>+ alloc_hwpt.out_hwpt_id, ret);
>+ if (ret) {
>+ ret = -errno;
>+ error_report("IOMMU_HWPT_ALLOC failed: %m");
>+ } else {
>+ *out_hwpt = alloc_hwpt.out_hwpt_id;
>+ }
>+ return ret;
>+}
>+
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
>diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>index 5a5993b17c2e..2ca9a32cc7b6 100644
>--- a/hw/vfio/iommufd.c
>+++ b/hw/vfio/iommufd.c
>@@ -212,10 +212,95 @@ static bool
>iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
> return true;
> }
>
>+static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>+ VFIOIOMMUFDContainer *container,
>+ Error **errp)
>+{
>+ IOMMUFDBackend *iommufd = vbasedev->iommufd;
>+ uint32_t flags = 0;
>+ VFIOIOASHwpt *hwpt;
>+ uint32_t hwpt_id;
>+ int ret;
>+
>+ /* Try to find a domain */
>+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>NULL);
>+ if (ret) {
>+ /* -EINVAL means the domain is incompatible with the device. */
>+ if (ret == -EINVAL) {
>+ continue;
>+ }
>+
>+ return false;
>+ } else {
>+ vbasedev->hwpt = hwpt;
>+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>+ return true;
>+ }
>+ }
>+
>+ ret = iommufd_backend_alloc_hwpt(iommufd,
>+ vbasedev->devid,
>+ container->ioas_id, flags,
>+ IOMMU_HWPT_DATA_NONE, 0, NULL, &hwpt_id);
>+ if (ret) {
>+ /*
>+ * When there's no domains allocated we can still attempt a hardware
>+ * pagetable allocation for an mdev, which ends up returning -ENOENT
>+ * This is benign and meant to fallback into IOAS attach, hence don't
>+ * set errp.
>+ */
>+ return false;
>+ }
>+
>+ hwpt = g_malloc0(sizeof(*hwpt));
>+ hwpt->hwpt_id = hwpt_id;
>+ QLIST_INIT(&hwpt->device_list);
>+
>+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>+ if (ret) {
>+ iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>+ g_free(hwpt);
>+ return false;
>+ }
>+
>+ vbasedev->hwpt = hwpt;
>+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>+ QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>+ return true;
>+}
>+
>+static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>+ VFIOIOMMUFDContainer *container)
>+{
>+ VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>+
>+ QLIST_REMOVE(vbasedev, hwpt_next);
>+ if (QLIST_EMPTY(&hwpt->device_list)) {
>+ QLIST_REMOVE(hwpt, next);
>+ iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>+ g_free(hwpt);
>+ }
>+}
>+
> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
> VFIOIOMMUFDContainer *container,
> Error **errp)
> {
>+ if (iommufd_cdev_autodomains_get(vbasedev, container, errp)) {
>+ return true;
>+ }
>+
>+ /*
>+ * iommufd_cdev_autodomains_get() may have an expected failure (-
>ENOENT)
>+ * for mdevs as they aren't real physical devices. @errp is only set
>+ * when real failures happened i.e. failing to attach to a newly created
>+ * hardware pagetable. These are fatal and should fail the attachment.
>+ */
>+ if (*errp) {
Better to use ERRP_GUARD()
Thanks
Zhenzhong
>+ return false;
>+ }
>+
> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id,
>errp);
> }
>
>@@ -224,6 +309,11 @@ static void
>iommufd_cdev_detach_container(VFIODevice *vbasedev,
> {
> Error *err = NULL;
>
>+ if (vbasedev->hwpt) {
>+ iommufd_cdev_autodomains_put(vbasedev, container);
>+ return;
>+ }
>+
> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
> error_report_err(err);
> }
>@@ -354,6 +444,7 @@ static bool iommufd_cdev_attach(const char *name,
>VFIODevice *vbasedev,
> container =
>VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
> container->be = vbasedev->iommufd;
> container->ioas_id = ioas_id;
>+ QLIST_INIT(&container->hwpt_list);
>
> bcontainer = &container->bcontainer;
> vfio_address_space_insert(space, bcontainer);
>diff --git a/backends/trace-events b/backends/trace-events
>index 211e6f374adc..4d8ac02fe7d6 100644
>--- a/backends/trace-events
>+++ b/backends/trace-events
>@@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t
>ioas, uint64_t iova, uint64_t size
> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas,
>uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping:
>iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova,
>uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64"
>size=0x%"PRIx64" (%d)"
> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d
>ioas=%d"
>+iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t
>pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr,
>uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u
>flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u
>(%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>id=%d (%d)"
>--
>2.17.2
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 04/10] vfio/iommufd: Introduce auto domain creation
2024-07-09 6:26 ` Duan, Zhenzhong
@ 2024-07-09 9:00 ` Joao Martins
0 siblings, 0 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-09 9:00 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 09/07/2024 07:26, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: [PATCH v3 04/10] vfio/iommufd: Introduce auto domain creation
>>
>> There's generally two modes of operation for IOMMUFD:
>>
>> * The simple user API which intends to perform relatively simple things
>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>> and mainly performs IOAS_MAP and UNMAP.
>>
>> * The native IOMMUFD API where you have fine grained control of the
>> IOMMU domain and model it accordingly. This is where most new feature
>> are being steered to.
>>
>> For dirty tracking 2) is required, as it needs to ensure that
>> the stage-2/parent IOMMU domain will only attach devices
>> that support dirty tracking (so far it is all homogeneous in x86, likely
>> not the case for smmuv3). Such invariant on dirty tracking provides a
>> useful guarantee to VMMs that will refuse incompatible device
>> attachments for IOMMU domains.
>>
>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>> responsible for creating an IOMMU domain. This is contrast to the
>> 'simple API' where the IOMMU domain is created by IOMMUFD
>> automatically
>> when it attaches to VFIO (usually referred as autodomains) but it has
>> the needed handling for mdevs.
>>
>> To support dirty tracking with the advanced IOMMUFD API, it needs
>> similar logic, where IOMMU domains are created and devices attached to
>> compatible domains. Essentially mimmicing kernel
>> iommufd_device_auto_get_domain(). If this fails (i.e. mdevs) it falls back
>> to IOAS attach (which again is always the case for mdevs).
>>
>> The auto domain logic allows different IOMMU domains to be created when
>> DMA dirty tracking is not desired (and VF can provide it), and others where
>> it is. Here is not used in this way here given how VFIODevice migration
>> state is initialized after the device attachment. But such mixed mode of
>> IOMMU dirty tracking + device dirty tracking is an improvement that can
>> be added on. Keep the 'all of nothing' approach that we have been using
>> so far between container vs device dirty tracking.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/hw/vfio/vfio-common.h | 9 ++++
>> include/sysemu/iommufd.h | 4 ++
>> backends/iommufd.c | 29 +++++++++++
>> hw/vfio/iommufd.c | 91
>> +++++++++++++++++++++++++++++++++++
>> backends/trace-events | 1 +
>> 5 files changed, 134 insertions(+)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>> common.h
>> index e8ddf92bb185..82c5a4aaa61e 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>
>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>
>> +typedef struct VFIOIOASHwpt {
>> + uint32_t hwpt_id;
>> + QLIST_HEAD(, VFIODevice) device_list;
>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>> +} VFIOIOASHwpt;
>> +
>> typedef struct VFIOIOMMUFDContainer {
>> VFIOContainerBase bcontainer;
>> IOMMUFDBackend *be;
>> uint32_t ioas_id;
>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>> } VFIOIOMMUFDContainer;
>>
>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>> VFIO_IOMMU_IOMMUFD);
>> @@ -134,6 +141,8 @@ typedef struct VFIODevice {
>> HostIOMMUDevice *hiod;
>> int devid;
>> IOMMUFDBackend *iommufd;
>> + VFIOIOASHwpt *hwpt;
>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>> } VFIODevice;
>>
>> struct VFIODeviceOps {
>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>> index 57d502a1c79a..35a8cec9780f 100644
>> --- a/include/sysemu/iommufd.h
>> +++ b/include/sysemu/iommufd.h
>> @@ -50,6 +50,10 @@ int
>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>> devid,
>> uint32_t *type, void *data, uint32_t len,
>> uint64_t *caps, Error **errp);
>> +int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>> + uint32_t pt_id, uint32_t flags,
>> + uint32_t data_type, uint32_t data_len,
>> + void *data_ptr, uint32_t *out_hwpt);
>>
>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>> TYPE_HOST_IOMMU_DEVICE "-iommufd"
>> #endif
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index 2b3d51af26d2..f5f73eaf4a1a 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -208,6 +208,35 @@ int
>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> return ret;
>> }
>>
>> +int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>> + uint32_t pt_id, uint32_t flags,
>> + uint32_t data_type, uint32_t data_len,
>> + void *data_ptr, uint32_t *out_hwpt)
>> +{
>> + int ret, fd = be->fd;
>> + struct iommu_hwpt_alloc alloc_hwpt = {
>> + .size = sizeof(struct iommu_hwpt_alloc),
>> + .flags = flags,
>> + .dev_id = dev_id,
>> + .pt_id = pt_id,
>> + .data_type = data_type,
>> + .data_len = data_len,
>> + .data_uptr = (uint64_t)data_ptr,
>> + };
>> +
>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
>> + data_len, (uint64_t)data_ptr,
>> + alloc_hwpt.out_hwpt_id, ret);
>> + if (ret) {
>> + ret = -errno;
>> + error_report("IOMMU_HWPT_ALLOC failed: %m");
>> + } else {
>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>> + }
>> + return ret;
>> +}
>> +
>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>> devid,
>> uint32_t *type, void *data, uint32_t len,
>> uint64_t *caps, Error **errp)
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 5a5993b17c2e..2ca9a32cc7b6 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -212,10 +212,95 @@ static bool
>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>> return true;
>> }
>>
>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> + VFIOIOMMUFDContainer *container,
>> + Error **errp)
>> +{
>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>> + uint32_t flags = 0;
>> + VFIOIOASHwpt *hwpt;
>> + uint32_t hwpt_id;
>> + int ret;
>> +
>> + /* Try to find a domain */
>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>> NULL);
>> + if (ret) {
>> + /* -EINVAL means the domain is incompatible with the device. */
>> + if (ret == -EINVAL) {
>> + continue;
>> + }
>> +
>> + return false;
>> + } else {
>> + vbasedev->hwpt = hwpt;
>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>> + return true;
>> + }
>> + }
>> +
>> + ret = iommufd_backend_alloc_hwpt(iommufd,
>> + vbasedev->devid,
>> + container->ioas_id, flags,
>> + IOMMU_HWPT_DATA_NONE, 0, NULL, &hwpt_id);
>> + if (ret) {
>> + /*
>> + * When there's no domains allocated we can still attempt a hardware
>> + * pagetable allocation for an mdev, which ends up returning -ENOENT
>> + * This is benign and meant to fallback into IOAS attach, hence don't
>> + * set errp.
>> + */
>> + return false;
>> + }
>> +
>> + hwpt = g_malloc0(sizeof(*hwpt));
>> + hwpt->hwpt_id = hwpt_id;
>> + QLIST_INIT(&hwpt->device_list);
>> +
>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>> + if (ret) {
>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>> + g_free(hwpt);
>> + return false;
>> + }
>> +
>> + vbasedev->hwpt = hwpt;
>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>> + return true;
>> +}
>> +
>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>> + VFIOIOMMUFDContainer *container)
>> +{
>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>> +
>> + QLIST_REMOVE(vbasedev, hwpt_next);
>> + if (QLIST_EMPTY(&hwpt->device_list)) {
>> + QLIST_REMOVE(hwpt, next);
>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>> + g_free(hwpt);
>> + }
>> +}
>> +
>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
>> VFIOIOMMUFDContainer *container,
>> Error **errp)
>> {
>> + if (iommufd_cdev_autodomains_get(vbasedev, container, errp)) {
>> + return true;
>> + }
>> +
>> + /*
>> + * iommufd_cdev_autodomains_get() may have an expected failure (-
>> ENOENT)
>> + * for mdevs as they aren't real physical devices. @errp is only set
>> + * when real failures happened i.e. failing to attach to a newly created
>> + * hardware pagetable. These are fatal and should fail the attachment.
>> + */
>> + if (*errp) {
>
> Better to use ERRP_GUARD()
>
OK
> Thanks
> Zhenzhong
>
>> + return false;
>> + }
>> +
>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id,
>> errp);
>> }
>>
>> @@ -224,6 +309,11 @@ static void
>> iommufd_cdev_detach_container(VFIODevice *vbasedev,
>> {
>> Error *err = NULL;
>>
>> + if (vbasedev->hwpt) {
>> + iommufd_cdev_autodomains_put(vbasedev, container);
>> + return;
>> + }
>> +
>> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
>> error_report_err(err);
>> }
>> @@ -354,6 +444,7 @@ static bool iommufd_cdev_attach(const char *name,
>> VFIODevice *vbasedev,
>> container =
>> VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
>> container->be = vbasedev->iommufd;
>> container->ioas_id = ioas_id;
>> + QLIST_INIT(&container->hwpt_list);
>>
>> bcontainer = &container->bcontainer;
>> vfio_address_space_insert(space, bcontainer);
>> diff --git a/backends/trace-events b/backends/trace-events
>> index 211e6f374adc..4d8ac02fe7d6 100644
>> --- a/backends/trace-events
>> +++ b/backends/trace-events
>> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t
>> ioas, uint64_t iova, uint64_t size
>> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas,
>> uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping:
>> iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova,
>> uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64"
>> size=0x%"PRIx64" (%d)"
>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d
>> ioas=%d"
>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t
>> pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr,
>> uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u
>> flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u
>> (%d)"
>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>> id=%d (%d)"
>> --
>> 2.17.2
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 04/10] vfio/iommufd: Introduce auto domain creation
2024-07-08 14:34 ` [PATCH v3 04/10] vfio/iommufd: Introduce auto domain creation Joao Martins
2024-07-09 6:26 ` Duan, Zhenzhong
@ 2024-07-09 6:50 ` Cédric Le Goater
2024-07-09 9:09 ` Joao Martins
1 sibling, 1 reply; 48+ messages in thread
From: Cédric Le Goater @ 2024-07-09 6:50 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/8/24 4:34 PM, Joao Martins wrote:
> There's generally two modes of operation for IOMMUFD:
>
> * The simple user API which intends to perform relatively simple things
> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
> and mainly performs IOAS_MAP and UNMAP.
>
> * The native IOMMUFD API where you have fine grained control of the
> IOMMU domain and model it accordingly. This is where most new feature
> are being steered to.
>
> For dirty tracking 2) is required, as it needs to ensure that
> the stage-2/parent IOMMU domain will only attach devices
> that support dirty tracking (so far it is all homogeneous in x86, likely
> not the case for smmuv3). Such invariant on dirty tracking provides a
> useful guarantee to VMMs that will refuse incompatible device
> attachments for IOMMU domains.
>
> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
> responsible for creating an IOMMU domain. This is contrast to the
> 'simple API' where the IOMMU domain is created by IOMMUFD automatically
> when it attaches to VFIO (usually referred as autodomains) but it has
> the needed handling for mdevs.
>
> To support dirty tracking with the advanced IOMMUFD API, it needs
> similar logic, where IOMMU domains are created and devices attached to
> compatible domains. Essentially mimmicing kernel
> iommufd_device_auto_get_domain(). If this fails (i.e. mdevs) it falls back
> to IOAS attach (which again is always the case for mdevs).
>
> The auto domain logic allows different IOMMU domains to be created when
> DMA dirty tracking is not desired (and VF can provide it), and others where
> it is. Here is not used in this way here given how VFIODevice migration
> state is initialized after the device attachment. But such mixed mode of
> IOMMU dirty tracking + device dirty tracking is an improvement that can
> be added on. Keep the 'all of nothing' approach that we have been using
> so far between container vs device dirty tracking.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/hw/vfio/vfio-common.h | 9 ++++
> include/sysemu/iommufd.h | 4 ++
> backends/iommufd.c | 29 +++++++++++
> hw/vfio/iommufd.c | 91 +++++++++++++++++++++++++++++++++++
> backends/trace-events | 1 +
> 5 files changed, 134 insertions(+)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index e8ddf92bb185..82c5a4aaa61e 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>
> typedef struct IOMMUFDBackend IOMMUFDBackend;
>
> +typedef struct VFIOIOASHwpt {
> + uint32_t hwpt_id;
> + QLIST_HEAD(, VFIODevice) device_list;
> + QLIST_ENTRY(VFIOIOASHwpt) next;
> +} VFIOIOASHwpt;
> +
> typedef struct VFIOIOMMUFDContainer {
> VFIOContainerBase bcontainer;
> IOMMUFDBackend *be;
> uint32_t ioas_id;
> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
> } VFIOIOMMUFDContainer;
>
> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, VFIO_IOMMU_IOMMUFD);
> @@ -134,6 +141,8 @@ typedef struct VFIODevice {
> HostIOMMUDevice *hiod;
> int devid;
> IOMMUFDBackend *iommufd;
> + VFIOIOASHwpt *hwpt;
> + QLIST_ENTRY(VFIODevice) hwpt_next;
> } VFIODevice;
>
> struct VFIODeviceOps {
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 57d502a1c79a..35a8cec9780f 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -50,6 +50,10 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp);
> +int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> + uint32_t pt_id, uint32_t flags,
> + uint32_t data_type, uint32_t data_len,
> + void *data_ptr, uint32_t *out_hwpt);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
> #endif
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 2b3d51af26d2..f5f73eaf4a1a 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -208,6 +208,35 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> return ret;
> }
>
> +int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> + uint32_t pt_id, uint32_t flags,
> + uint32_t data_type, uint32_t data_len,
> + void *data_ptr, uint32_t *out_hwpt)
> +{
> + int ret, fd = be->fd;
> + struct iommu_hwpt_alloc alloc_hwpt = {
> + .size = sizeof(struct iommu_hwpt_alloc),
> + .flags = flags,
> + .dev_id = dev_id,
> + .pt_id = pt_id,
> + .data_type = data_type,
> + .data_len = data_len,
> + .data_uptr = (uint64_t)data_ptr,
> + };
> +
> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
> + data_len, (uint64_t)data_ptr,
> + alloc_hwpt.out_hwpt_id, ret);
> + if (ret) {
> + ret = -errno;
> + error_report("IOMMU_HWPT_ALLOC failed: %m");
Please add an 'Error **errp' parameter.
> + } else {
> + *out_hwpt = alloc_hwpt.out_hwpt_id;
> + }
> + return ret;
> +}
> +
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 5a5993b17c2e..2ca9a32cc7b6 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -212,10 +212,95 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
> return true;
> }
>
> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> + VFIOIOMMUFDContainer *container,
> + Error **errp)
> +{
> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
> + uint32_t flags = 0;
> + VFIOIOASHwpt *hwpt;
> + uint32_t hwpt_id;
> + int ret;
> +
> + /* Try to find a domain */
> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, NULL);
> + if (ret) {
> + /* -EINVAL means the domain is incompatible with the device. */
> + if (ret == -EINVAL) {
> + continue;
> + }
We should propagate an error in the false case.
> + return false;
> + } else {
> + vbasedev->hwpt = hwpt;
> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> + return true;
> + }
> + }
> +
> + ret = iommufd_backend_alloc_hwpt(iommufd,
> + vbasedev->devid,
> + container->ioas_id, flags,
> + IOMMU_HWPT_DATA_NONE, 0, NULL, &hwpt_id);
> + if (ret) {
> + /*
> + * When there's no domains allocated we can still attempt a hardware
> + * pagetable allocation for an mdev, which ends up returning -ENOENT
> + * This is benign and meant to fallback into IOAS attach, hence don't
> + * set errp.
> + */
same here.
> + return false;
> + }
> +
> + hwpt = g_malloc0(sizeof(*hwpt));
> + hwpt->hwpt_id = hwpt_id;
> + QLIST_INIT(&hwpt->device_list);
> +
> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
> + if (ret) {
> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
> + g_free(hwpt);
> + return false;
> + }
> +
> + vbasedev->hwpt = hwpt;
> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> + return true;
> +}
> +
> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
> + VFIOIOMMUFDContainer *container)
> +{
> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
> +
> + QLIST_REMOVE(vbasedev, hwpt_next);
> + if (QLIST_EMPTY(&hwpt->device_list)) {
> + QLIST_REMOVE(hwpt, next);
> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
> + g_free(hwpt);
> + }
> +}
> +
> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
> VFIOIOMMUFDContainer *container,
> Error **errp)
> {
> + if (iommufd_cdev_autodomains_get(vbasedev, container, errp)) {
> + return true;
> + }
> +
> + /*
> + * iommufd_cdev_autodomains_get() may have an expected failure (-ENOENT)
> + * for mdevs as they aren't real physical devices. @errp is only set
> + * when real failures happened i.e. failing to attach to a newly created
> + * hardware pagetable. These are fatal and should fail the attachment.
> + */
> + if (*errp) {
> + return false;
> + }
> +
> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
> }
>
> @@ -224,6 +309,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
> {
> Error *err = NULL;
>
> + if (vbasedev->hwpt) {
> + iommufd_cdev_autodomains_put(vbasedev, container);
> + return;
> + }
> +
> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
> error_report_err(err);
> }
> @@ -354,6 +444,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
> container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
> container->be = vbasedev->iommufd;
> container->ioas_id = ioas_id;
> + QLIST_INIT(&container->hwpt_list);
>
> bcontainer = &container->bcontainer;
> vfio_address_space_insert(space, bcontainer);
> diff --git a/backends/trace-events b/backends/trace-events
> index 211e6f374adc..4d8ac02fe7d6 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size
> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 04/10] vfio/iommufd: Introduce auto domain creation
2024-07-09 6:50 ` Cédric Le Goater
@ 2024-07-09 9:09 ` Joao Martins
0 siblings, 0 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-09 9:09 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 09/07/2024 07:50, Cédric Le Goater wrote:
> On 7/8/24 4:34 PM, Joao Martins wrote:
>> There's generally two modes of operation for IOMMUFD:
>>
>> * The simple user API which intends to perform relatively simple things
>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>> and mainly performs IOAS_MAP and UNMAP.
>>
>> * The native IOMMUFD API where you have fine grained control of the
>> IOMMU domain and model it accordingly. This is where most new feature
>> are being steered to.
>>
>> For dirty tracking 2) is required, as it needs to ensure that
>> the stage-2/parent IOMMU domain will only attach devices
>> that support dirty tracking (so far it is all homogeneous in x86, likely
>> not the case for smmuv3). Such invariant on dirty tracking provides a
>> useful guarantee to VMMs that will refuse incompatible device
>> attachments for IOMMU domains.
>>
>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>> responsible for creating an IOMMU domain. This is contrast to the
>> 'simple API' where the IOMMU domain is created by IOMMUFD automatically
>> when it attaches to VFIO (usually referred as autodomains) but it has
>> the needed handling for mdevs.
>>
>> To support dirty tracking with the advanced IOMMUFD API, it needs
>> similar logic, where IOMMU domains are created and devices attached to
>> compatible domains. Essentially mimmicing kernel
>> iommufd_device_auto_get_domain(). If this fails (i.e. mdevs) it falls back
>> to IOAS attach (which again is always the case for mdevs).
>>
>> The auto domain logic allows different IOMMU domains to be created when
>> DMA dirty tracking is not desired (and VF can provide it), and others where
>> it is. Here is not used in this way here given how VFIODevice migration
>> state is initialized after the device attachment. But such mixed mode of
>> IOMMU dirty tracking + device dirty tracking is an improvement that can
>> be added on. Keep the 'all of nothing' approach that we have been using
>> so far between container vs device dirty tracking.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/hw/vfio/vfio-common.h | 9 ++++
>> include/sysemu/iommufd.h | 4 ++
>> backends/iommufd.c | 29 +++++++++++
>> hw/vfio/iommufd.c | 91 +++++++++++++++++++++++++++++++++++
>> backends/trace-events | 1 +
>> 5 files changed, 134 insertions(+)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index e8ddf92bb185..82c5a4aaa61e 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>> +typedef struct VFIOIOASHwpt {
>> + uint32_t hwpt_id;
>> + QLIST_HEAD(, VFIODevice) device_list;
>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>> +} VFIOIOASHwpt;
>> +
>> typedef struct VFIOIOMMUFDContainer {
>> VFIOContainerBase bcontainer;
>> IOMMUFDBackend *be;
>> uint32_t ioas_id;
>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>> } VFIOIOMMUFDContainer;
>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, VFIO_IOMMU_IOMMUFD);
>> @@ -134,6 +141,8 @@ typedef struct VFIODevice {
>> HostIOMMUDevice *hiod;
>> int devid;
>> IOMMUFDBackend *iommufd;
>> + VFIOIOASHwpt *hwpt;
>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>> } VFIODevice;
>> struct VFIODeviceOps {
>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>> index 57d502a1c79a..35a8cec9780f 100644
>> --- a/include/sysemu/iommufd.h
>> +++ b/include/sysemu/iommufd.h
>> @@ -50,6 +50,10 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>> ioas_id,
>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>> uint32_t *type, void *data, uint32_t len,
>> uint64_t *caps, Error **errp);
>> +int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>> + uint32_t pt_id, uint32_t flags,
>> + uint32_t data_type, uint32_t data_len,
>> + void *data_ptr, uint32_t *out_hwpt);
>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
>> #endif
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index 2b3d51af26d2..f5f73eaf4a1a 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -208,6 +208,35 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be,
>> uint32_t ioas_id,
>> return ret;
>> }
>> +int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>> + uint32_t pt_id, uint32_t flags,
>> + uint32_t data_type, uint32_t data_len,
>> + void *data_ptr, uint32_t *out_hwpt)
>> +{
>> + int ret, fd = be->fd;
>> + struct iommu_hwpt_alloc alloc_hwpt = {
>> + .size = sizeof(struct iommu_hwpt_alloc),
>> + .flags = flags,
>> + .dev_id = dev_id,
>> + .pt_id = pt_id,
>> + .data_type = data_type,
>> + .data_len = data_len,
>> + .data_uptr = (uint64_t)data_ptr,
>> + };
>> +
>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
>> + data_len, (uint64_t)data_ptr,
>> + alloc_hwpt.out_hwpt_id, ret);
>> + if (ret) {
>> + ret = -errno;
>> + error_report("IOMMU_HWPT_ALLOC failed: %m");
>
> Please add an 'Error **errp' parameter.
>
Yes, but I need to do some special casing for mdev then. I'll expand below.
>> + } else {
>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>> + }
>> + return ret;
>> +}
>> +
>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>> uint32_t *type, void *data, uint32_t len,
>> uint64_t *caps, Error **errp)
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 5a5993b17c2e..2ca9a32cc7b6 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -212,10 +212,95 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice
>> *vbasedev, Error **errp)
>> return true;
>> }
>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> + VFIOIOMMUFDContainer *container,
>> + Error **errp)
>> +{
>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>> + uint32_t flags = 0;
>> + VFIOIOASHwpt *hwpt;
>> + uint32_t hwpt_id;
>> + int ret;
>> +
>> + /* Try to find a domain */
>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, NULL);
>> + if (ret) {
>> + /* -EINVAL means the domain is incompatible with the device. */
>> + if (ret == -EINVAL) {
>> + continue;
>> + }
>
> We should propagate an error in the false case.
>
Same as before due to mdev.
When we are dealing with physical devices this is correct to handle as an error
and stops the attach. However we may still get a -ENOENT for mdevs which is why
I wasn't making an error here.
It just so seems confusing that all the code tries not to special case mdev,
that I went along with it. But honestly on a second look I should really special
case mdev to avoid autodomains given that we know it will fail. This way I
should be able to propagate all errors causes and not wrongly hide errors that
might be applicable for physical devices.
Now the -EINVAL is not an error we are supposed to propagate, as we expect
-EINVAL as a 'this domain can't be used try the next one'.
>> + return false;
>> + } else {
>> + vbasedev->hwpt = hwpt;
>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>> + return true;
>> + }
>> + }
>> +
>> + ret = iommufd_backend_alloc_hwpt(iommufd,
>> + vbasedev->devid,
>> + container->ioas_id, flags,
>> + IOMMU_HWPT_DATA_NONE, 0, NULL, &hwpt_id);
>> + if (ret) {
>> + /*
>> + * When there's no domains allocated we can still attempt a hardware
>> + * pagetable allocation for an mdev, which ends up returning -ENOENT
>> + * This is benign and meant to fallback into IOAS attach, hence don't
>> + * set errp.
>> + */
>
> same here.
>
As the comment says the previous paragraph was the reason for not propagating an
error here.
>> + return false;
>> + }
>> +
>> + hwpt = g_malloc0(sizeof(*hwpt));
>> + hwpt->hwpt_id = hwpt_id;
>> + QLIST_INIT(&hwpt->device_list);
>> +
>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>> + if (ret) {
>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>> + g_free(hwpt);
>> + return false;
>> + }
>> +
>> + vbasedev->hwpt = hwpt;
>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>> + return true;
>> +}
>> +
>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>> + VFIOIOMMUFDContainer *container)
>> +{
>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>> +
>> + QLIST_REMOVE(vbasedev, hwpt_next);
>> + if (QLIST_EMPTY(&hwpt->device_list)) {
>> + QLIST_REMOVE(hwpt, next);
>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>> + g_free(hwpt);
>> + }
>> +}
>> +
>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
>> VFIOIOMMUFDContainer *container,
>> Error **errp)
>> {
>> + if (iommufd_cdev_autodomains_get(vbasedev, container, errp)) {
>> + return true;
>> + }
>> +
>> + /*
>> + * iommufd_cdev_autodomains_get() may have an expected failure (-ENOENT)
>> + * for mdevs as they aren't real physical devices. @errp is only set
>> + * when real failures happened i.e. failing to attach to a newly created
>> + * hardware pagetable. These are fatal and should fail the attachment.
>> + */
>> + if (*errp) {
>> + return false;
>> + }
>> +
>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
>> }
>> @@ -224,6 +309,11 @@ static void iommufd_cdev_detach_container(VFIODevice
>> *vbasedev,
>> {
>> Error *err = NULL;
>> + if (vbasedev->hwpt) {
>> + iommufd_cdev_autodomains_put(vbasedev, container);
>> + return;
>> + }
>> +
>> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
>> error_report_err(err);
>> }
>> @@ -354,6 +444,7 @@ static bool iommufd_cdev_attach(const char *name,
>> VFIODevice *vbasedev,
>> container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
>> container->be = vbasedev->iommufd;
>> container->ioas_id = ioas_id;
>> + QLIST_INIT(&container->hwpt_list);
>> bcontainer = &container->bcontainer;
>> vfio_address_space_insert(space, bcontainer);
>> diff --git a/backends/trace-events b/backends/trace-events
>> index 211e6f374adc..4d8ac02fe7d6 100644
>> --- a/backends/trace-events
>> +++ b/backends/trace-events
>> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t
>> iova, uint64_t size
>> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t
>> iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d
>> iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova,
>> uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64"
>> (%d)"
>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id,
>> uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t
>> out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u
>> len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>> id=%d (%d)"
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v3 05/10] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-08 14:34 [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking Joao Martins
` (3 preceding siblings ...)
2024-07-08 14:34 ` [PATCH v3 04/10] vfio/iommufd: Introduce auto domain creation Joao Martins
@ 2024-07-08 14:34 ` Joao Martins
2024-07-09 6:28 ` Cédric Le Goater
2024-07-08 14:34 ` [PATCH v3 06/10] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
` (5 subsequent siblings)
10 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-08 14:34 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
Probe hardware dirty tracking support by querying device hw capabilities via
IOMMUFD_GET_HW_INFO.
In preparation to using the dirty tracking UAPI, request dirty tracking in the
HWPT flags when the IOMMU supports dirty tracking.
The auto domain logic allows different IOMMU domains to be created when DMA
dirty tracking is not desired (and VF can provide it) while others doesn't have
it and want the IOMMU capability. This is not used in this way here given how
VFIODevice migration capability checking takes place *after* the device
attachment. But such granularity is a nice property that can be implemented
later on.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/hw/vfio/vfio-common.h | 1 +
hw/vfio/iommufd.c | 26 ++++++++++++++++++++++++++
2 files changed, 27 insertions(+)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 82c5a4aaa61e..7ce925cfab19 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -97,6 +97,7 @@ typedef struct IOMMUFDBackend IOMMUFDBackend;
typedef struct VFIOIOASHwpt {
uint32_t hwpt_id;
+ uint32_t hwpt_flags;
QLIST_HEAD(, VFIODevice) device_list;
QLIST_ENTRY(VFIOIOASHwpt) next;
} VFIOIOASHwpt;
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 2ca9a32cc7b6..1b5b46d28ed6 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -212,6 +212,20 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
return true;
}
+static bool iommufd_device_dirty_tracking(IOMMUFDBackend *iommufd,
+ VFIODevice *vbasedev)
+{
+ enum iommu_hw_info_type type;
+ uint64_t caps;
+
+ if (!iommufd_backend_get_device_info(iommufd, vbasedev->devid, &type,
+ NULL, 0, &caps, NULL)) {
+ return false;
+ }
+
+ return caps & IOMMU_HW_CAP_DIRTY_TRACKING;
+}
+
static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
VFIOIOMMUFDContainer *container,
Error **errp)
@@ -239,6 +253,15 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
}
}
+ /*
+ * This is quite early and VFIODevice isn't yet fully initialized,
+ * thus rely on IOMMU hardware capabilities as to whether IOMMU dirty
+ * tracking is going to be needed.
+ */
+ if (iommufd_device_dirty_tracking(iommufd, vbasedev)) {
+ flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
+ }
+
ret = iommufd_backend_alloc_hwpt(iommufd,
vbasedev->devid,
container->ioas_id, flags,
@@ -255,6 +278,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
hwpt = g_malloc0(sizeof(*hwpt));
hwpt->hwpt_id = hwpt_id;
+ hwpt->hwpt_flags = flags;
QLIST_INIT(&hwpt->device_list);
ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
@@ -267,6 +291,8 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
vbasedev->hwpt = hwpt;
QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
+ container->bcontainer.dirty_pages_supported |=
+ (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
return true;
}
--
2.17.2
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v3 05/10] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-08 14:34 ` [PATCH v3 05/10] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
@ 2024-07-09 6:28 ` Cédric Le Goater
2024-07-09 9:04 ` Joao Martins
0 siblings, 1 reply; 48+ messages in thread
From: Cédric Le Goater @ 2024-07-09 6:28 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/8/24 4:34 PM, Joao Martins wrote:
> Probe hardware dirty tracking support by querying device hw capabilities via
> IOMMUFD_GET_HW_INFO.
>
> In preparation to using the dirty tracking UAPI, request dirty tracking in the
> HWPT flags when the IOMMU supports dirty tracking.
>
> The auto domain logic allows different IOMMU domains to be created when DMA
> dirty tracking is not desired (and VF can provide it) while others doesn't have
> it and want the IOMMU capability. This is not used in this way here given how
> VFIODevice migration capability checking takes place *after* the device
> attachment. But such granularity is a nice property that can be implemented
> later on.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/hw/vfio/vfio-common.h | 1 +
> hw/vfio/iommufd.c | 26 ++++++++++++++++++++++++++
> 2 files changed, 27 insertions(+)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 82c5a4aaa61e..7ce925cfab19 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -97,6 +97,7 @@ typedef struct IOMMUFDBackend IOMMUFDBackend;
>
> typedef struct VFIOIOASHwpt {
> uint32_t hwpt_id;
> + uint32_t hwpt_flags;
> QLIST_HEAD(, VFIODevice) device_list;
> QLIST_ENTRY(VFIOIOASHwpt) next;
> } VFIOIOASHwpt;
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 2ca9a32cc7b6..1b5b46d28ed6 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -212,6 +212,20 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
> return true;
> }
>
> +static bool iommufd_device_dirty_tracking(IOMMUFDBackend *iommufd,
> + VFIODevice *vbasedev)
> +{
> + enum iommu_hw_info_type type;
> + uint64_t caps;
> +
> + if (!iommufd_backend_get_device_info(iommufd, vbasedev->devid, &type,
> + NULL, 0, &caps, NULL)) {
I think we should report the error and not ignore it.
That said, since we are probing the hw features of the host IOMMU device,
could we use the data cached in the HostIOMMUDevice struct instead ?
This means would need to move the ->realize() call doing the probing
before attaching the device in vfio_attach_device(). That way we would
catch probing errors in one place. Does this make sense ?
Thanks,
C.
> + return false;
> + }
> +
> + return caps & IOMMU_HW_CAP_DIRTY_TRACKING;
> +}
> +
> static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> VFIOIOMMUFDContainer *container,
> Error **errp)
> @@ -239,6 +253,15 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> }
> }
>
> + /*
> + * This is quite early and VFIODevice isn't yet fully initialized,
> + * thus rely on IOMMU hardware capabilities as to whether IOMMU dirty
> + * tracking is going to be needed.
> + */
> + if (iommufd_device_dirty_tracking(iommufd, vbasedev)) {
> + flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
> + }
> +
> ret = iommufd_backend_alloc_hwpt(iommufd,
> vbasedev->devid,
> container->ioas_id, flags,
> @@ -255,6 +278,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>
> hwpt = g_malloc0(sizeof(*hwpt));
> hwpt->hwpt_id = hwpt_id;
> + hwpt->hwpt_flags = flags;
> QLIST_INIT(&hwpt->device_list);
>
> ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
> @@ -267,6 +291,8 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> vbasedev->hwpt = hwpt;
> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> + container->bcontainer.dirty_pages_supported |=
> + (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
> return true;
> }
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 05/10] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-09 6:28 ` Cédric Le Goater
@ 2024-07-09 9:04 ` Joao Martins
2024-07-09 12:47 ` Joao Martins
0 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-09 9:04 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 09/07/2024 07:28, Cédric Le Goater wrote:
> On 7/8/24 4:34 PM, Joao Martins wrote:
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 2ca9a32cc7b6..1b5b46d28ed6 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -212,6 +212,20 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice
>> *vbasedev, Error **errp)
>> return true;
>> }
>> +static bool iommufd_device_dirty_tracking(IOMMUFDBackend *iommufd,
>> + VFIODevice *vbasedev)
>> +{
>> + enum iommu_hw_info_type type;
>> + uint64_t caps;
>> +
>> + if (!iommufd_backend_get_device_info(iommufd, vbasedev->devid, &type,
>> + NULL, 0, &caps, NULL)) {
>
> I think we should report the error and not ignore it.
>
> That said, since we are probing the hw features of the host IOMMU device,
> could we use the data cached in the HostIOMMUDevice struct instead ?
> This means would need to move the ->realize() call doing the probing
> before attaching the device in vfio_attach_device(). That way we would
> catch probing errors in one place. Does this make sense ?
Yeap. It also helps centralizing cap checking in addition.
This stanadlone use of iommufd_backend_get_device_info() was also annoying me a
little, and there doesn't seem to have a reason not to move the initialization
of caps earlier. I'll do that
Joao
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 05/10] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-09 9:04 ` Joao Martins
@ 2024-07-09 12:47 ` Joao Martins
2024-07-09 16:53 ` Joao Martins
0 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-09 12:47 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 09/07/2024 10:04, Joao Martins wrote:
> On 09/07/2024 07:28, Cédric Le Goater wrote:
>> On 7/8/24 4:34 PM, Joao Martins wrote:
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 2ca9a32cc7b6..1b5b46d28ed6 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -212,6 +212,20 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice
>>> *vbasedev, Error **errp)
>>> return true;
>>> }
>>> +static bool iommufd_device_dirty_tracking(IOMMUFDBackend *iommufd,
>>> + VFIODevice *vbasedev)
>>> +{
>>> + enum iommu_hw_info_type type;
>>> + uint64_t caps;
>>> +
>>> + if (!iommufd_backend_get_device_info(iommufd, vbasedev->devid, &type,
>>> + NULL, 0, &caps, NULL)) {
>>
>> I think we should report the error and not ignore it.
>>
>> That said, since we are probing the hw features of the host IOMMU device,
>> could we use the data cached in the HostIOMMUDevice struct instead ?
>> This means would need to move the ->realize() call doing the probing
>> before attaching the device in vfio_attach_device(). That way we would
>> catch probing errors in one place. Does this make sense ?
>
> Yeap. It also helps centralizing cap checking in addition.
>
> This stanadlone use of iommufd_backend_get_device_info() was also annoying me a
> little, and there doesn't seem to have a reason not to move the initialization
> of caps earlier. I'll do that
Maybe I was a little too early into this. I had the snip below, but I forgot
that vbasedev::devid / vbasedev::iommufd are used by hiod realize() and that is
done in the very beginning of ->attach_device() of iommufd backend. Not enterily
sure how to unravel that hmmm
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7cdb969fd396..42931891bf8e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1552,17 +1552,17 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
assert(ops);
- if (!ops->attach_device(name, vbasedev, as, errp)) {
- return false;
- }
-
hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
object_unref(hiod);
- ops->detach_device(vbasedev);
return false;
}
+
vbasedev->hiod = hiod;
+ if (!ops->attach_device(name, vbasedev, as, errp)) {
+ object_unref(hiod);
+ return false;
+ }
return true;
}
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v3 05/10] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-09 12:47 ` Joao Martins
@ 2024-07-09 16:53 ` Joao Martins
0 siblings, 0 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-09 16:53 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 09/07/2024 13:47, Joao Martins wrote:
> On 09/07/2024 10:04, Joao Martins wrote:
>> On 09/07/2024 07:28, Cédric Le Goater wrote:
>>> On 7/8/24 4:34 PM, Joao Martins wrote:
>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>> index 2ca9a32cc7b6..1b5b46d28ed6 100644
>>>> --- a/hw/vfio/iommufd.c
>>>> +++ b/hw/vfio/iommufd.c
>>>> @@ -212,6 +212,20 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice
>>>> *vbasedev, Error **errp)
>>>> return true;
>>>> }
>>>> +static bool iommufd_device_dirty_tracking(IOMMUFDBackend *iommufd,
>>>> + VFIODevice *vbasedev)
>>>> +{
>>>> + enum iommu_hw_info_type type;
>>>> + uint64_t caps;
>>>> +
>>>> + if (!iommufd_backend_get_device_info(iommufd, vbasedev->devid, &type,
>>>> + NULL, 0, &caps, NULL)) {
>>>
>>> I think we should report the error and not ignore it.
>>>
>>> That said, since we are probing the hw features of the host IOMMU device,
>>> could we use the data cached in the HostIOMMUDevice struct instead ?
>>> This means would need to move the ->realize() call doing the probing
>>> before attaching the device in vfio_attach_device(). That way we would
>>> catch probing errors in one place. Does this make sense ?
>>
>> Yeap. It also helps centralizing cap checking in addition.
>>
>> This stanadlone use of iommufd_backend_get_device_info() was also annoying me a
>> little, and there doesn't seem to have a reason not to move the initialization
>> of caps earlier. I'll do that
>
> Maybe I was a little too early into this. I had the snip below, but I forgot
> that vbasedev::devid / vbasedev::iommufd are used by hiod realize() and that is
> done in the very beginning of ->attach_device() of iommufd backend. Not enterily
> sure how to unravel that hmmm
>
Here's what I came up with (below). Does it matches what you expected?
One thing I wasn't quite clear is what you and Zhenzhong purpose with
HostIOMMUDevice. It looks quite geared towards IOMMUFD with 'enough'
compatiblity for legacy backend:
typedef struct HostIOMMUDeviceCaps {
uint32_t type;
uint8_t aw_bits;
+ struct {
+ void *hw_data;
+ uint64_t hw_data_len;
+ uint64_t hw_caps;
+ } iommufd;
} HostIOMMUDeviceCaps;
So I namespaced the new data we get from ioctl with iommufd, but maybe it wasn't
needed and this doesn't match the style? Let me know your thoughts
--------------->8----------------
From: Joao Martins <joao.m.martins@oracle.com>
Date: Tue, 9 Jul 2024 11:56:18 +0000
Subject: [PATCH] vfio/common: Initialize HostIOMMUDeviceCaps during
attach_device()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Fetch IOMMU hw raw caps behind the device and thus move part of what
happens in HostIOMMUDevice::realize() to be done during the attach
of the device. It allows us to cache the information obtained from
IOMMU_GET_HW_INFO from iommufd and then serialize to external agents.
This is in preparation to fetch parse hw capabilities and understand
if dirty tracking is supported by device backing IOMMU without
necessarily duplicating the amount of calls we do to IOMMU_GET_HW_INFO.
Suggested-by: Cédric Le Goater <clg@redhat.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/sysemu/host_iommu_device.h | 5 ++++
backends/host_iommu_device.c | 3 +++
hw/vfio/common.c | 7 ++++--
hw/vfio/iommufd.c | 39 +++++++++++++++++++++---------
4 files changed, 40 insertions(+), 14 deletions(-)
diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
index ee6c813c8b22..9f5f368d97f0 100644
--- a/include/sysemu/host_iommu_device.h
+++ b/include/sysemu/host_iommu_device.h
@@ -25,6 +25,11 @@
typedef struct HostIOMMUDeviceCaps {
uint32_t type;
uint8_t aw_bits;
+ struct {
+ void *hw_data;
+ uint64_t hw_data_len;
+ uint64_t hw_caps;
+ } iommufd;
} HostIOMMUDeviceCaps;
#define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
diff --git a/backends/host_iommu_device.c b/backends/host_iommu_device.c
index 8f2dda1beb9b..df74b740f8fe 100644
--- a/backends/host_iommu_device.c
+++ b/backends/host_iommu_device.c
@@ -29,5 +29,8 @@ static void host_iommu_device_finalize(Object *obj)
{
HostIOMMUDevice *hiod = HOST_IOMMU_DEVICE(obj);
+ g_free(hiod->caps.iommufd.hw_data);
+ hiod->caps.iommufd.hw_data_len = 0;
+
g_free(hiod->name);
}
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7cdb969fd396..f3e7fb550788 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1552,17 +1552,20 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
assert(ops);
+
+ hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
+ vbasedev->hiod = hiod;
+
if (!ops->attach_device(name, vbasedev, as, errp)) {
return false;
}
- hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
object_unref(hiod);
ops->detach_device(vbasedev);
+ vbasedev->hiod = NULL;
return false;
}
- vbasedev->hiod = hiod;
return true;
}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 69a13d240811..87c2d2f07d0a 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -381,6 +381,29 @@ error:
return false;
}
+static bool iommufd_cdev_hiod_caps_init(VFIODevice *vbasedev, Error **errp)
+{
+ HostIOMMUDeviceCaps *caps;
+ union {
+ struct iommu_hw_info_vtd vtd;
+ } hw_data;
+
+ g_assert(vbasedev->hiod);
+
+ /* Cache IOMMU hardware information ahead of HostIOMMUDevice::realize() */
+ caps = &vbasedev->hiod->caps;
+ caps->iommufd.hw_data = g_malloc0(sizeof(hw_data));
+ caps->iommufd.hw_data_len = sizeof(hw_data);
+ if (!iommufd_backend_get_device_info(vbasedev->iommufd, vbasedev->devid,
+ &caps->type, &caps->iommufd.hw_data,
+ caps->iommufd.hw_data_len,
+ &caps->iommufd.hw_caps, errp)) {
+ return false;
+ }
+
+ return true;
+}
+
static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp)
{
@@ -410,6 +433,10 @@ static bool iommufd_cdev_attach(const char *name,
VFIODevice *vbasedev,
space = vfio_get_address_space(as);
+ if (!iommufd_cdev_hiod_caps_init(vbasedev, errp)) {
+ goto err_alloc_ioas;
+ }
+
/* try to attach to an existing container in this space */
QLIST_FOREACH(bcontainer, &space->containers, next) {
container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
@@ -715,11 +742,6 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice
*hiod, void *opaque,
{
VFIODevice *vdev = opaque;
HostIOMMUDeviceCaps *caps = &hiod->caps;
- enum iommu_hw_info_type type;
- union {
- struct iommu_hw_info_vtd vtd;
- } data;
- uint64_t hw_caps;
hiod->agent = opaque;
@@ -727,14 +749,7 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice
*hiod, void *opaque,
return true;
}
- if (!iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
- &type, &data, sizeof(data),
- &hw_caps, errp)) {
- return false;
- }
-
hiod->name = g_strdup(vdev->name);
- caps->type = type;
caps->aw_bits = vfio_device_get_aw_bits(vdev);
return true;
--
2.17.2
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH v3 06/10] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
2024-07-08 14:34 [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking Joao Martins
` (4 preceding siblings ...)
2024-07-08 14:34 ` [PATCH v3 05/10] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
@ 2024-07-08 14:34 ` Joao Martins
2024-07-09 7:07 ` Cédric Le Goater
2024-07-08 14:34 ` [PATCH v3 07/10] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
` (4 subsequent siblings)
10 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-08 14:34 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
enables or disables dirty page tracking. It is used if the hwpt
has been created with dirty tracking supported domain (stored in
hwpt::flags) and it is called on the whole list of iommu domains
it is are tracking. On failure it rolls it back.
The checking of hwpt::flags is introduced here as a second user
and thus consolidate such check into a helper function
iommufd_hwpt_dirty_tracking().
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/sysemu/iommufd.h | 3 +++
backends/iommufd.c | 20 ++++++++++++++++++
hw/vfio/iommufd.c | 45 +++++++++++++++++++++++++++++++++++++++-
backends/trace-events | 1 +
4 files changed, 68 insertions(+), 1 deletion(-)
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 35a8cec9780f..1470377f55ba 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -54,6 +54,9 @@ int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
uint32_t pt_id, uint32_t flags,
uint32_t data_type, uint32_t data_len,
void *data_ptr, uint32_t *out_hwpt);
+int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
+ bool start);
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
+
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index f5f73eaf4a1a..69daabc27473 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -237,6 +237,26 @@ int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
return ret;
}
+int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
+ bool start)
+{
+ int ret;
+ struct iommu_hwpt_set_dirty_tracking set_dirty = {
+ .size = sizeof(set_dirty),
+ .hwpt_id = hwpt_id,
+ .flags = !start ? 0 : IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
+ };
+
+ ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
+ trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret);
+ if (ret) {
+ ret = -errno;
+ error_report("IOMMU_HWPT_SET_DIRTY_TRACKING failed: %s",
+ strerror(errno));
+ }
+ return ret;
+}
+
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 1b5b46d28ed6..158a98cb3b12 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -110,6 +110,48 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
iommufd_backend_disconnect(vbasedev->iommufd);
}
+static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
+{
+ return hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
+}
+
+static int iommufd_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
+ bool start, Error **errp)
+{
+ const VFIOIOMMUFDContainer *container =
+ container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+ int ret;
+ VFIOIOASHwpt *hwpt;
+
+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ if (!iommufd_hwpt_dirty_tracking(hwpt)) {
+ continue;
+ }
+
+ ret = iommufd_backend_set_dirty_tracking(container->be,
+ hwpt->hwpt_id, start);
+ if (ret) {
+ ret = -errno;
+ error_setg_errno(errp, errno,
+ "Failed to start dirty tracking on hwpt_id %u",
+ hwpt->hwpt_id);
+ goto err;
+ }
+ }
+
+ return 0;
+
+err:
+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ if (!iommufd_hwpt_dirty_tracking(hwpt)) {
+ continue;
+ }
+ iommufd_backend_set_dirty_tracking(container->be,
+ hwpt->hwpt_id, !start);
+ }
+ return ret;
+}
+
static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
{
ERRP_GUARD();
@@ -292,7 +334,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
container->bcontainer.dirty_pages_supported |=
- (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
+ iommufd_hwpt_dirty_tracking(hwpt);
return true;
}
@@ -734,6 +776,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
vioc->attach_device = iommufd_cdev_attach;
vioc->detach_device = iommufd_cdev_detach;
vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
+ vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
};
static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
diff --git a/backends/trace-events b/backends/trace-events
index 4d8ac02fe7d6..28aca3b859d4 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -16,3 +16,4 @@ iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t si
iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
+iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
--
2.17.2
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v3 06/10] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
2024-07-08 14:34 ` [PATCH v3 06/10] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
@ 2024-07-09 7:07 ` Cédric Le Goater
2024-07-09 9:13 ` Joao Martins
0 siblings, 1 reply; 48+ messages in thread
From: Cédric Le Goater @ 2024-07-09 7:07 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/8/24 4:34 PM, Joao Martins wrote:
> ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
> enables or disables dirty page tracking. It is used if the hwpt
> has been created with dirty tracking supported domain (stored in
> hwpt::flags) and it is called on the whole list of iommu domains
> it is are tracking. On failure it rolls it back.
>
> The checking of hwpt::flags is introduced here as a second user
> and thus consolidate such check into a helper function
> iommufd_hwpt_dirty_tracking().
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/sysemu/iommufd.h | 3 +++
> backends/iommufd.c | 20 ++++++++++++++++++
> hw/vfio/iommufd.c | 45 +++++++++++++++++++++++++++++++++++++++-
> backends/trace-events | 1 +
> 4 files changed, 68 insertions(+), 1 deletion(-)
>
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 35a8cec9780f..1470377f55ba 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -54,6 +54,9 @@ int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> uint32_t pt_id, uint32_t flags,
> uint32_t data_type, uint32_t data_len,
> void *data_ptr, uint32_t *out_hwpt);
> +int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
> + bool start);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
> +
> #endif
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index f5f73eaf4a1a..69daabc27473 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -237,6 +237,26 @@ int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> return ret;
> }
>
> +int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
> + bool start)
> +{
> + int ret;
> + struct iommu_hwpt_set_dirty_tracking set_dirty = {
> + .size = sizeof(set_dirty),
> + .hwpt_id = hwpt_id,
> + .flags = !start ? 0 : IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
> + };
> +
> + ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
> + trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret);
> + if (ret) {
> + ret = -errno;
> + error_report("IOMMU_HWPT_SET_DIRTY_TRACKING failed: %s",
> + strerror(errno));
> + }
> + return ret;
> +}
> +
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 1b5b46d28ed6..158a98cb3b12 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -110,6 +110,48 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
> iommufd_backend_disconnect(vbasedev->iommufd);
> }
>
> +static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
> +{
> + return hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
> +}
> +
> +static int iommufd_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
> + bool start, Error **errp)
> +{
> + const VFIOIOMMUFDContainer *container =
> + container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> + int ret;
> + VFIOIOASHwpt *hwpt;
> +
> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
> + continue;
> + }
> +
> + ret = iommufd_backend_set_dirty_tracking(container->be,
> + hwpt->hwpt_id, start);
> + if (ret) {
> + ret = -errno;
> + error_setg_errno(errp, errno,
> + "Failed to start dirty tracking on hwpt_id %u",
> + hwpt->hwpt_id);
This error looks redundant with the one printed out in the lower backend
version. Couldn't we add an 'Error **' parameter and return a bool ?
Thanks,
C.
> + goto err;
> + }
> + }
> +
> + return 0;
> +
> +err:
> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
> + continue;
> + }
> + iommufd_backend_set_dirty_tracking(container->be,
> + hwpt->hwpt_id, !start);
> + }
> + return ret;
> +}
> +
> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
> {
> ERRP_GUARD();
> @@ -292,7 +334,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> container->bcontainer.dirty_pages_supported |=
> - (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
> + iommufd_hwpt_dirty_tracking(hwpt);
> return true;
> }
>
> @@ -734,6 +776,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
> vioc->attach_device = iommufd_cdev_attach;
> vioc->detach_device = iommufd_cdev_detach;
> vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
> + vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
> };
>
> static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> diff --git a/backends/trace-events b/backends/trace-events
> index 4d8ac02fe7d6..28aca3b859d4 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -16,3 +16,4 @@ iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t si
> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
> iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
> +iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 06/10] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
2024-07-09 7:07 ` Cédric Le Goater
@ 2024-07-09 9:13 ` Joao Martins
0 siblings, 0 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-09 9:13 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 09/07/2024 08:07, Cédric Le Goater wrote:
> On 7/8/24 4:34 PM, Joao Martins wrote:
>> ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
>> enables or disables dirty page tracking. It is used if the hwpt
>> has been created with dirty tracking supported domain (stored in
>> hwpt::flags) and it is called on the whole list of iommu domains
>> it is are tracking. On failure it rolls it back.
>>
>> The checking of hwpt::flags is introduced here as a second user
>> and thus consolidate such check into a helper function
>> iommufd_hwpt_dirty_tracking().
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/sysemu/iommufd.h | 3 +++
>> backends/iommufd.c | 20 ++++++++++++++++++
>> hw/vfio/iommufd.c | 45 +++++++++++++++++++++++++++++++++++++++-
>> backends/trace-events | 1 +
>> 4 files changed, 68 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>> index 35a8cec9780f..1470377f55ba 100644
>> --- a/include/sysemu/iommufd.h
>> +++ b/include/sysemu/iommufd.h
>> @@ -54,6 +54,9 @@ int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>> dev_id,
>> uint32_t pt_id, uint32_t flags,
>> uint32_t data_type, uint32_t data_len,
>> void *data_ptr, uint32_t *out_hwpt);
>> +int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
>> + bool start);
>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
>> +
>> #endif
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index f5f73eaf4a1a..69daabc27473 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -237,6 +237,26 @@ int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be,
>> uint32_t dev_id,
>> return ret;
>> }
>> +int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
>> + bool start)
>> +{
>> + int ret;
>> + struct iommu_hwpt_set_dirty_tracking set_dirty = {
>> + .size = sizeof(set_dirty),
>> + .hwpt_id = hwpt_id,
>> + .flags = !start ? 0 : IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
>> + };
>> +
>> + ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
>> + trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret);
>> + if (ret) {
>> + ret = -errno;
>> + error_report("IOMMU_HWPT_SET_DIRTY_TRACKING failed: %s",
>> + strerror(errno));
>> + }
>> + return ret;
>> +}
>> +
>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>> uint32_t *type, void *data, uint32_t len,
>> uint64_t *caps, Error **errp)
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 1b5b46d28ed6..158a98cb3b12 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -110,6 +110,48 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice
>> *vbasedev)
>> iommufd_backend_disconnect(vbasedev->iommufd);
>> }
>> +static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>> +{
>> + return hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>> +}
>> +
>> +static int iommufd_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
>> + bool start, Error **errp)
>> +{
>> + const VFIOIOMMUFDContainer *container =
>> + container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>> + int ret;
>> + VFIOIOASHwpt *hwpt;
>> +
>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
>> + continue;
>> + }
>> +
>> + ret = iommufd_backend_set_dirty_tracking(container->be,
>> + hwpt->hwpt_id, start);
>> + if (ret) {
>> + ret = -errno;
>> + error_setg_errno(errp, errno,
>> + "Failed to start dirty tracking on hwpt_id %u",
>> + hwpt->hwpt_id);
>
>
>
> This error looks redundant with the one printed out in the lower backend
> version. Couldn't we add an 'Error **' parameter and return a bool ?
I'll add here as well.
Joao
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v3 07/10] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
2024-07-08 14:34 [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking Joao Martins
` (5 preceding siblings ...)
2024-07-08 14:34 ` [PATCH v3 06/10] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
@ 2024-07-08 14:34 ` Joao Martins
2024-07-09 7:05 ` Cédric Le Goater
2024-07-08 14:34 ` [PATCH v3 08/10] vfio/iommufd: Parse hw_caps and store dirty tracking support Joao Martins
` (3 subsequent siblings)
10 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-08 14:34 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
that fetches the bitmap that tells what was dirty in an IOVA
range.
A single bitmap is allocated and used across all the hwpts
sharing an IOAS which is then used in log_sync() to set Qemu
global bitmaps.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/sysemu/iommufd.h | 3 +++
backends/iommufd.c | 26 ++++++++++++++++++++++++++
hw/vfio/iommufd.c | 34 ++++++++++++++++++++++++++++++++++
backends/trace-events | 1 +
4 files changed, 64 insertions(+)
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 1470377f55ba..223f1ea14e84 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -56,6 +56,9 @@ int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
void *data_ptr, uint32_t *out_hwpt);
int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
bool start);
+int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
+ uint64_t iova, ram_addr_t size,
+ uint64_t page_size, uint64_t *data);
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 69daabc27473..b2d3bbd7c31b 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -257,6 +257,32 @@ int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
return ret;
}
+int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
+ uint64_t iova, ram_addr_t size,
+ uint64_t page_size, uint64_t *data)
+{
+ int ret;
+ struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
+ .size = sizeof(get_dirty_bitmap),
+ .hwpt_id = hwpt_id,
+ .iova = iova,
+ .length = size,
+ .page_size = page_size,
+ .data = (uintptr_t)data,
+ };
+
+ ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
+ trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
+ page_size, ret);
+ if (ret) {
+ error_report("IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"PRIx64
+ " size: 0x%"PRIx64") failed: %s", iova,
+ size, strerror(errno));
+ }
+
+ return !ret ? 0 : -errno;
+}
+
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 158a98cb3b12..9fad47baed9e 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -25,6 +25,7 @@
#include "qemu/cutils.h"
#include "qemu/chardev_open.h"
#include "pci.h"
+#include "exec/ram_addr.h"
static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
ram_addr_t size, void *vaddr, bool readonly)
@@ -152,6 +153,38 @@ err:
return ret;
}
+static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
+ VFIOBitmap *vbmap, hwaddr iova,
+ hwaddr size, Error **errp)
+{
+ VFIOIOMMUFDContainer *container = container_of(bcontainer,
+ VFIOIOMMUFDContainer,
+ bcontainer);
+ int ret;
+ VFIOIOASHwpt *hwpt;
+ unsigned long page_size;
+
+ page_size = qemu_real_host_page_size();
+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ if (!iommufd_hwpt_dirty_tracking(hwpt)) {
+ continue;
+ }
+
+ ret = iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
+ iova, size, page_size,
+ vbmap->bitmap);
+ if (ret) {
+ error_setg_errno(errp, ret,
+ "Failed to get dirty bitmap report, hwpt_id %u, iova: "
+ "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx,
+ hwpt->hwpt_id, iova, size);
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
{
ERRP_GUARD();
@@ -777,6 +810,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
vioc->detach_device = iommufd_cdev_detach;
vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
+ vioc->query_dirty_bitmap = iommufd_query_dirty_bitmap;
};
static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
diff --git a/backends/trace-events b/backends/trace-events
index 28aca3b859d4..40811a316215 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -17,3 +17,4 @@ iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
+iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
--
2.17.2
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v3 07/10] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
2024-07-08 14:34 ` [PATCH v3 07/10] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
@ 2024-07-09 7:05 ` Cédric Le Goater
2024-07-09 9:13 ` Joao Martins
0 siblings, 1 reply; 48+ messages in thread
From: Cédric Le Goater @ 2024-07-09 7:05 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/8/24 4:34 PM, Joao Martins wrote:
> ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
> that fetches the bitmap that tells what was dirty in an IOVA
> range.
>
> A single bitmap is allocated and used across all the hwpts
> sharing an IOAS which is then used in log_sync() to set Qemu
> global bitmaps.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/sysemu/iommufd.h | 3 +++
> backends/iommufd.c | 26 ++++++++++++++++++++++++++
> hw/vfio/iommufd.c | 34 ++++++++++++++++++++++++++++++++++
> backends/trace-events | 1 +
> 4 files changed, 64 insertions(+)
>
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 1470377f55ba..223f1ea14e84 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -56,6 +56,9 @@ int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> void *data_ptr, uint32_t *out_hwpt);
> int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
> bool start);
> +int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
> + uint64_t iova, ram_addr_t size,
> + uint64_t page_size, uint64_t *data);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
>
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 69daabc27473..b2d3bbd7c31b 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -257,6 +257,32 @@ int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
> return ret;
> }
>
> +int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
> + uint64_t iova, ram_addr_t size,
> + uint64_t page_size, uint64_t *data)
> +{
> + int ret;
> + struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
> + .size = sizeof(get_dirty_bitmap),
> + .hwpt_id = hwpt_id,
> + .iova = iova,
> + .length = size,
> + .page_size = page_size,
> + .data = (uintptr_t)data,
> + };
> +
> + ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
> + trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
> + page_size, ret);
> + if (ret) {
> + error_report("IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"PRIx64
> + " size: 0x%"PRIx64") failed: %s", iova,
> + size, strerror(errno));
> + }
> +
> + return !ret ? 0 : -errno;
> +}
> +
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 158a98cb3b12..9fad47baed9e 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -25,6 +25,7 @@
> #include "qemu/cutils.h"
> #include "qemu/chardev_open.h"
> #include "pci.h"
> +#include "exec/ram_addr.h"
>
> static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> ram_addr_t size, void *vaddr, bool readonly)
> @@ -152,6 +153,38 @@ err:
> return ret;
> }
>
> +static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> + VFIOBitmap *vbmap, hwaddr iova,
> + hwaddr size, Error **errp)
> +{
> + VFIOIOMMUFDContainer *container = container_of(bcontainer,
> + VFIOIOMMUFDContainer,
> + bcontainer);
> + int ret;
> + VFIOIOASHwpt *hwpt;
> + unsigned long page_size;
> +
> + page_size = qemu_real_host_page_size();
> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
> + continue;
> + }
> +
> + ret = iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
> + iova, size, page_size,
> + vbmap->bitmap);
> + if (ret) {
> + error_setg_errno(errp, ret,
> + "Failed to get dirty bitmap report, hwpt_id %u, iova: "
> + "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx,
> + hwpt->hwpt_id, iova, size);
This error looks redundant with the one printed out in
iommufd_backend_get_dirty_bitmap(). Couldn't we add an 'Error **'
parameter and simply return a bool ?
Thanks,
C.
> + return ret;
> + }
> + }
> +
> + return 0;
> +}
> +
> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
> {
> ERRP_GUARD();
> @@ -777,6 +810,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
> vioc->detach_device = iommufd_cdev_detach;
> vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
> vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
> + vioc->query_dirty_bitmap = iommufd_query_dirty_bitmap;
> };
>
> static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> diff --git a/backends/trace-events b/backends/trace-events
> index 28aca3b859d4..40811a316215 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -17,3 +17,4 @@ iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
> iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
> iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
> +iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 07/10] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
2024-07-09 7:05 ` Cédric Le Goater
@ 2024-07-09 9:13 ` Joao Martins
2024-07-09 12:41 ` Joao Martins
0 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-09 9:13 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 09/07/2024 08:05, Cédric Le Goater wrote:
> On 7/8/24 4:34 PM, Joao Martins wrote:
>> ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
>> that fetches the bitmap that tells what was dirty in an IOVA
>> range.
>>
>> A single bitmap is allocated and used across all the hwpts
>> sharing an IOAS which is then used in log_sync() to set Qemu
>> global bitmaps.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/sysemu/iommufd.h | 3 +++
>> backends/iommufd.c | 26 ++++++++++++++++++++++++++
>> hw/vfio/iommufd.c | 34 ++++++++++++++++++++++++++++++++++
>> backends/trace-events | 1 +
>> 4 files changed, 64 insertions(+)
>>
>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>> index 1470377f55ba..223f1ea14e84 100644
>> --- a/include/sysemu/iommufd.h
>> +++ b/include/sysemu/iommufd.h
>> @@ -56,6 +56,9 @@ int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>> dev_id,
>> void *data_ptr, uint32_t *out_hwpt);
>> int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
>> bool start);
>> +int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
>> + uint64_t iova, ram_addr_t size,
>> + uint64_t page_size, uint64_t *data);
>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index 69daabc27473..b2d3bbd7c31b 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -257,6 +257,32 @@ int iommufd_backend_set_dirty_tracking(IOMMUFDBackend
>> *be, uint32_t hwpt_id,
>> return ret;
>> }
>> +int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
>> + uint64_t iova, ram_addr_t size,
>> + uint64_t page_size, uint64_t *data)
>> +{
>> + int ret;
>> + struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
>> + .size = sizeof(get_dirty_bitmap),
>> + .hwpt_id = hwpt_id,
>> + .iova = iova,
>> + .length = size,
>> + .page_size = page_size,
>> + .data = (uintptr_t)data,
>> + };
>> +
>> + ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
>> + trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
>> + page_size, ret);
>> + if (ret) {
>> + error_report("IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"PRIx64
>> + " size: 0x%"PRIx64") failed: %s", iova,
>> + size, strerror(errno));
>> + }
>> +
>> + return !ret ? 0 : -errno;
>> +}
>> +
>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>> uint32_t *type, void *data, uint32_t len,
>> uint64_t *caps, Error **errp)
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 158a98cb3b12..9fad47baed9e 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -25,6 +25,7 @@
>> #include "qemu/cutils.h"
>> #include "qemu/chardev_open.h"
>> #include "pci.h"
>> +#include "exec/ram_addr.h"
>> static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>> ram_addr_t size, void *vaddr, bool readonly)
>> @@ -152,6 +153,38 @@ err:
>> return ret;
>> }
>> +static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>> + VFIOBitmap *vbmap, hwaddr iova,
>> + hwaddr size, Error **errp)
>> +{
>> + VFIOIOMMUFDContainer *container = container_of(bcontainer,
>> + VFIOIOMMUFDContainer,
>> + bcontainer);
>> + int ret;
>> + VFIOIOASHwpt *hwpt;
>> + unsigned long page_size;
>> +
>> + page_size = qemu_real_host_page_size();
>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
>> + continue;
>> + }
>> +
>> + ret = iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
>> + iova, size, page_size,
>> + vbmap->bitmap);
>> + if (ret) {
>> + error_setg_errno(errp, ret,
>> + "Failed to get dirty bitmap report, hwpt_id %u,
>> iova: "
>> + "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx,
>> + hwpt->hwpt_id, iova, size);
>
> This error looks redundant with the one printed out in
> iommufd_backend_get_dirty_bitmap(). Couldn't we add an 'Error **'
> parameter and simply return a bool ?
>
I'll add it.
Just as a sidebar: This is a odd pattern which seems somewhat spread, that
somehow we only care about @errno as something to put on a message, rather then
letting the user know what exact error code it had returned programmatically.
Here in this series it is only important for the device attach so likely doesn't
justify a Error structure enhancement, hence the rest of functions I introduced
here can just adopt bool+errp.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 07/10] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
2024-07-09 9:13 ` Joao Martins
@ 2024-07-09 12:41 ` Joao Martins
0 siblings, 0 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-09 12:41 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 09/07/2024 10:13, Joao Martins wrote:
> On 09/07/2024 08:05, Cédric Le Goater wrote:
>> On 7/8/24 4:34 PM, Joao Martins wrote:
>>> ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
>>> that fetches the bitmap that tells what was dirty in an IOVA
>>> range.
>>>
>>> A single bitmap is allocated and used across all the hwpts
>>> sharing an IOAS which is then used in log_sync() to set Qemu
>>> global bitmaps.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> include/sysemu/iommufd.h | 3 +++
>>> backends/iommufd.c | 26 ++++++++++++++++++++++++++
>>> hw/vfio/iommufd.c | 34 ++++++++++++++++++++++++++++++++++
>>> backends/trace-events | 1 +
>>> 4 files changed, 64 insertions(+)
>>>
>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>> index 1470377f55ba..223f1ea14e84 100644
>>> --- a/include/sysemu/iommufd.h
>>> +++ b/include/sysemu/iommufd.h
>>> @@ -56,6 +56,9 @@ int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>>> dev_id,
>>> void *data_ptr, uint32_t *out_hwpt);
>>> int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
>>> bool start);
>>> +int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
>>> + uint64_t iova, ram_addr_t size,
>>> + uint64_t page_size, uint64_t *data);
>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>> index 69daabc27473..b2d3bbd7c31b 100644
>>> --- a/backends/iommufd.c
>>> +++ b/backends/iommufd.c
>>> @@ -257,6 +257,32 @@ int iommufd_backend_set_dirty_tracking(IOMMUFDBackend
>>> *be, uint32_t hwpt_id,
>>> return ret;
>>> }
>>> +int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
>>> + uint64_t iova, ram_addr_t size,
>>> + uint64_t page_size, uint64_t *data)
>>> +{
>>> + int ret;
>>> + struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
>>> + .size = sizeof(get_dirty_bitmap),
>>> + .hwpt_id = hwpt_id,
>>> + .iova = iova,
>>> + .length = size,
>>> + .page_size = page_size,
>>> + .data = (uintptr_t)data,
>>> + };
>>> +
>>> + ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
>>> + trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
>>> + page_size, ret);
>>> + if (ret) {
>>> + error_report("IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"PRIx64
>>> + " size: 0x%"PRIx64") failed: %s", iova,
>>> + size, strerror(errno));
>>> + }
>>> +
>>> + return !ret ? 0 : -errno;
>>> +}
>>> +
>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>>> uint32_t *type, void *data, uint32_t len,
>>> uint64_t *caps, Error **errp)
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 158a98cb3b12..9fad47baed9e 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -25,6 +25,7 @@
>>> #include "qemu/cutils.h"
>>> #include "qemu/chardev_open.h"
>>> #include "pci.h"
>>> +#include "exec/ram_addr.h"
>>> static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>>> ram_addr_t size, void *vaddr, bool readonly)
>>> @@ -152,6 +153,38 @@ err:
>>> return ret;
>>> }
>>> +static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>> + VFIOBitmap *vbmap, hwaddr iova,
>>> + hwaddr size, Error **errp)
>>> +{
>>> + VFIOIOMMUFDContainer *container = container_of(bcontainer,
>>> + VFIOIOMMUFDContainer,
>>> + bcontainer);
>>> + int ret;
>>> + VFIOIOASHwpt *hwpt;
>>> + unsigned long page_size;
>>> +
>>> + page_size = qemu_real_host_page_size();
>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>> + if (!iommufd_hwpt_dirty_tracking(hwpt)) {
>>> + continue;
>>> + }
>>> +
>>> + ret = iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
>>> + iova, size, page_size,
>>> + vbmap->bitmap);
>>> + if (ret) {
>>> + error_setg_errno(errp, ret,
>>> + "Failed to get dirty bitmap report, hwpt_id %u,
>>> iova: "
>>> + "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx,
>>> + hwpt->hwpt_id, iova, size);
>>
>> This error looks redundant with the one printed out in
>> iommufd_backend_get_dirty_bitmap(). Couldn't we add an 'Error **'
>> parameter and simply return a bool ?
>>
>
> I'll add it.
>
> Just as a sidebar: This is a odd pattern which seems somewhat spread, that
> somehow we only care about @errno as something to put on a message, rather then
> letting the user know what exact error code it had returned programmatically.
> Here in this series it is only important for the device attach so likely doesn't
> justify a Error structure enhancement, hence the rest of functions I introduced
> here can just adopt bool+errp.
>
>
Looks like this. A bit odd as the container ops (and legacy backend) expect an
errno. But I am assuming that you intended that way.
-int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
- uint64_t iova, ram_addr_t size,
- uint64_t page_size, uint64_t *data)
+bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be,
+ uint32_t hwpt_id,
+ uint64_t iova, ram_addr_t size,
+ uint64_t page_size, uint64_t *data,
+ Error **errp)
{
int ret;
struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
@@ -273,14 +278,15 @@ int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be,
uint32_t hwpt_id,
ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
- page_size, ret);
+ page_size, ret ? errno : 0);
if (ret) {
- error_report("IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"PRIx64
- " size: 0x%"PRIx64") failed: %s", iova,
- size, strerror(errno));
+ error_setg_errno(errp, errno,
+ "IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"HWADDR_PRIx
+ " size: 0x%"HWADDR_PRIx") failed", iova, size);
+ return false;
}
- return !ret ? 0 : -errno;
+ return true;
}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 35cf8da22ef7..4369df34a6ac 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -160,25 +154,18 @@ static int iommufd_query_dirty_bitmap(const
VFIOContainerBase *bcontainer,
VFIOIOMMUFDContainer *container = container_of(bcontainer,
VFIOIOMMUFDContainer,
bcontainer);
- int ret;
+ unsigned long page_size = qemu_real_host_page_size();
VFIOIOASHwpt *hwpt;
- unsigned long page_size;
- page_size = qemu_real_host_page_size();
QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
if (!iommufd_hwpt_dirty_tracking(hwpt)) {
continue;
}
- ret = iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
- iova, size, page_size,
- vbmap->bitmap);
- if (ret) {
- error_setg_errno(errp, ret,
- "Failed to get dirty bitmap report, hwpt_id %u,
iova: "
- "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx,
- hwpt->hwpt_id, iova, size);
- return ret;
+ if (!iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
+ iova, size, page_size,
+ vbmap->bitmap)) {
+ return -EINVAL;
}
}
@@ -287,24 +274,11 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice
*vbasedev, Error **errp)
return true;
}
(...)
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH v3 08/10] vfio/iommufd: Parse hw_caps and store dirty tracking support
2024-07-08 14:34 [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking Joao Martins
` (6 preceding siblings ...)
2024-07-08 14:34 ` [PATCH v3 07/10] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
@ 2024-07-08 14:34 ` Joao Martins
2024-07-08 14:34 ` [PATCH v3 09/10] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
` (2 subsequent siblings)
10 siblings, 0 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-08 14:34 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
Fetch capabilities from IOMMU device and add a capability to
host-iommu-device to reflect whether backing IOMMU has dirty tracking.
This is in preparation to relax the migration eligibility when device
doesn't have dirty tracking.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/sysemu/host_iommu_device.h | 2 ++
backends/iommufd.c | 2 ++
hw/vfio/iommufd.c | 1 +
3 files changed, 5 insertions(+)
diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
index ee6c813c8b22..d38a31693482 100644
--- a/include/sysemu/host_iommu_device.h
+++ b/include/sysemu/host_iommu_device.h
@@ -25,6 +25,7 @@
typedef struct HostIOMMUDeviceCaps {
uint32_t type;
uint8_t aw_bits;
+ bool dirty_tracking;
} HostIOMMUDeviceCaps;
#define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
@@ -97,6 +98,7 @@ struct HostIOMMUDeviceClass {
*/
#define HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE 0
#define HOST_IOMMU_DEVICE_CAP_AW_BITS 1
+#define HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING 2
#define HOST_IOMMU_DEVICE_CAP_AW_BITS_MAX 64
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index b2d3bbd7c31b..9400d51004f0 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -314,6 +314,8 @@ static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
switch (cap) {
case HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE:
return caps->type;
+ case HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING:
+ return caps->dirty_tracking;
case HOST_IOMMU_DEVICE_CAP_AW_BITS:
return caps->aw_bits;
default:
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 9fad47baed9e..2678801f1cad 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -832,6 +832,7 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
hiod->name = g_strdup(vdev->name);
caps->type = type;
caps->aw_bits = vfio_device_get_aw_bits(vdev);
+ caps->dirty_tracking = (hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING);
}
return true;
--
2.17.2
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH v3 09/10] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-08 14:34 [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking Joao Martins
` (7 preceding siblings ...)
2024-07-08 14:34 ` [PATCH v3 08/10] vfio/iommufd: Parse hw_caps and store dirty tracking support Joao Martins
@ 2024-07-08 14:34 ` Joao Martins
2024-07-09 7:02 ` Cédric Le Goater
2024-07-10 10:38 ` Duan, Zhenzhong
2024-07-08 14:34 ` [PATCH v3 10/10] vfio/common: Allow disabling device dirty page tracking Joao Martins
2024-07-11 7:41 ` [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking Cédric Le Goater
10 siblings, 2 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-08 14:34 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
By default VFIO migration is set to auto, which will support live
migration if the migration capability is set *and* also dirty page
tracking is supported.
For testing purposes one can force enable without dirty page tracking
via enable-migration=on, but that option is generally left for testing
purposes.
So starting with IOMMU dirty tracking it can use to acomodate the lack of
VF dirty page tracking allowing us to minimize the VF requirements for
migration and thus enabling migration by default for those.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
hw/vfio/migration.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 34d4be2ce1b1..89195928666f 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -1012,6 +1012,7 @@ void vfio_reset_bytes_transferred(void)
*/
bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
{
+ HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(vbasedev->hiod);
Error *err = NULL;
int ret;
@@ -1036,7 +1037,10 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
return !vfio_block_migration(vbasedev, err, errp);
}
- if (!vbasedev->dirty_pages_supported) {
+ if (!vbasedev->dirty_pages_supported &&
+ (vbasedev->iommufd &&
+ !hiodc->get_cap(vbasedev->hiod,
+ HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL))) {
if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
error_setg(&err,
"%s: VFIO device doesn't support device dirty tracking",
--
2.17.2
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v3 09/10] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-08 14:34 ` [PATCH v3 09/10] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
@ 2024-07-09 7:02 ` Cédric Le Goater
2024-07-09 9:09 ` Joao Martins
2024-07-10 10:38 ` Duan, Zhenzhong
1 sibling, 1 reply; 48+ messages in thread
From: Cédric Le Goater @ 2024-07-09 7:02 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/8/24 4:34 PM, Joao Martins wrote:
> By default VFIO migration is set to auto, which will support live
> migration if the migration capability is set *and* also dirty page
> tracking is supported.
>
> For testing purposes one can force enable without dirty page tracking
> via enable-migration=on, but that option is generally left for testing
> purposes.
>
> So starting with IOMMU dirty tracking it can use to acomodate the lack of
> VF dirty page tracking allowing us to minimize the VF requirements for
> migration and thus enabling migration by default for those.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> hw/vfio/migration.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 34d4be2ce1b1..89195928666f 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -1012,6 +1012,7 @@ void vfio_reset_bytes_transferred(void)
> */
> bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> {
> + HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(vbasedev->hiod);
> Error *err = NULL;
> int ret;
>
> @@ -1036,7 +1037,10 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> return !vfio_block_migration(vbasedev, err, errp);
> }
>
> - if (!vbasedev->dirty_pages_supported) {
> + if (!vbasedev->dirty_pages_supported &&
> + (vbasedev->iommufd &&
I don't think we need to check ->iommufd. The class handler below will
return false for the vfio/legacy backend.
Thanks,
C.
> + !hiodc->get_cap(vbasedev->hiod,
> + HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL))) {
> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
> error_setg(&err,
> "%s: VFIO device doesn't support device dirty tracking",
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 09/10] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-09 7:02 ` Cédric Le Goater
@ 2024-07-09 9:09 ` Joao Martins
0 siblings, 0 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-09 9:09 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 09/07/2024 08:02, Cédric Le Goater wrote:
> On 7/8/24 4:34 PM, Joao Martins wrote:
>> By default VFIO migration is set to auto, which will support live
>> migration if the migration capability is set *and* also dirty page
>> tracking is supported.
>>
>> For testing purposes one can force enable without dirty page tracking
>> via enable-migration=on, but that option is generally left for testing
>> purposes.
>>
>> So starting with IOMMU dirty tracking it can use to acomodate the lack of
>> VF dirty page tracking allowing us to minimize the VF requirements for
>> migration and thus enabling migration by default for those.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> hw/vfio/migration.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 34d4be2ce1b1..89195928666f 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -1012,6 +1012,7 @@ void vfio_reset_bytes_transferred(void)
>> */
>> bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>> {
>> + HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(vbasedev->hiod);
>> Error *err = NULL;
>> int ret;
>> @@ -1036,7 +1037,10 @@ bool vfio_migration_realize(VFIODevice *vbasedev,
>> Error **errp)
>> return !vfio_block_migration(vbasedev, err, errp);
>> }
>> - if (!vbasedev->dirty_pages_supported) {
>> + if (!vbasedev->dirty_pages_supported &&
>> + (vbasedev->iommufd &&
>
>
> I don't think we need to check ->iommufd. The class handler below will
> return false for the vfio/legacy backend.
>
OK
Joao
^ permalink raw reply [flat|nested] 48+ messages in thread
* RE: [PATCH v3 09/10] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-08 14:34 ` [PATCH v3 09/10] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
2024-07-09 7:02 ` Cédric Le Goater
@ 2024-07-10 10:38 ` Duan, Zhenzhong
2024-07-10 10:59 ` Joao Martins
1 sibling, 1 reply; 48+ messages in thread
From: Duan, Zhenzhong @ 2024-07-10 10:38 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v3 09/10] vfio/migration: Don't block migration device dirty
>tracking is unsupported
>
>By default VFIO migration is set to auto, which will support live
>migration if the migration capability is set *and* also dirty page
>tracking is supported.
>
>For testing purposes one can force enable without dirty page tracking
>via enable-migration=on, but that option is generally left for testing
>purposes.
>
>So starting with IOMMU dirty tracking it can use to acomodate the lack of
>VF dirty page tracking allowing us to minimize the VF requirements for
>migration and thus enabling migration by default for those.
>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>---
> hw/vfio/migration.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>index 34d4be2ce1b1..89195928666f 100644
>--- a/hw/vfio/migration.c
>+++ b/hw/vfio/migration.c
>@@ -1012,6 +1012,7 @@ void vfio_reset_bytes_transferred(void)
> */
> bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> {
>+ HostIOMMUDeviceClass *hiodc =
>HOST_IOMMU_DEVICE_GET_CLASS(vbasedev->hiod);
> Error *err = NULL;
> int ret;
>
>@@ -1036,7 +1037,10 @@ bool vfio_migration_realize(VFIODevice
>*vbasedev, Error **errp)
> return !vfio_block_migration(vbasedev, err, errp);
> }
>
>- if (!vbasedev->dirty_pages_supported) {
>+ if (!vbasedev->dirty_pages_supported &&
>+ (vbasedev->iommufd &&
>+ !hiodc->get_cap(vbasedev->hiod,
>+ HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL))) {
What about below, this can avoid a new CAP define.
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -1036,7 +1036,7 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
return !vfio_block_migration(vbasedev, err, errp);
}
- if (!vbasedev->dirty_pages_supported) {
+ if (!vbasedev->dirty_pages_supported && !vbasedev->bcontainer->dirty_pages_supported) {
if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
error_setg(&err,
"%s: VFIO device doesn't support device dirty tracking",
Thanks
Zhenzhong
> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
> error_setg(&err,
> "%s: VFIO device doesn't support device dirty tracking",
>--
>2.17.2
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 09/10] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-10 10:38 ` Duan, Zhenzhong
@ 2024-07-10 10:59 ` Joao Martins
0 siblings, 0 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-10 10:59 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 10/07/2024 11:38, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: [PATCH v3 09/10] vfio/migration: Don't block migration device dirty
>> tracking is unsupported
>>
>> By default VFIO migration is set to auto, which will support live
>> migration if the migration capability is set *and* also dirty page
>> tracking is supported.
>>
>> For testing purposes one can force enable without dirty page tracking
>> via enable-migration=on, but that option is generally left for testing
>> purposes.
>>
>> So starting with IOMMU dirty tracking it can use to acomodate the lack of
>> VF dirty page tracking allowing us to minimize the VF requirements for
>> migration and thus enabling migration by default for those.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> hw/vfio/migration.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 34d4be2ce1b1..89195928666f 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -1012,6 +1012,7 @@ void vfio_reset_bytes_transferred(void)
>> */
>> bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>> {
>> + HostIOMMUDeviceClass *hiodc =
>> HOST_IOMMU_DEVICE_GET_CLASS(vbasedev->hiod);
>> Error *err = NULL;
>> int ret;
>>
>> @@ -1036,7 +1037,10 @@ bool vfio_migration_realize(VFIODevice
>> *vbasedev, Error **errp)
>> return !vfio_block_migration(vbasedev, err, errp);
>> }
>>
>> - if (!vbasedev->dirty_pages_supported) {
>> + if (!vbasedev->dirty_pages_supported &&
>> + (vbasedev->iommufd &&
>> + !hiodc->get_cap(vbasedev->hiod,
>> + HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL))) {
>
> What about below, this can avoid a new CAP define.
>
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -1036,7 +1036,7 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> return !vfio_block_migration(vbasedev, err, errp);
> }
>
> - if (!vbasedev->dirty_pages_supported) {
> + if (!vbasedev->dirty_pages_supported && !vbasedev->bcontainer->dirty_pages_supported) {
> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
> error_setg(&err,
> "%s: VFIO device doesn't support device dirty tracking",
>
From the kernel POV this is supposedly device specific, and the
container::dirty_pages_supported doesn't quite capture a case where the system
is less homogeneous (aka more than one hwpt where one has dirty tracking and the
other doesn't). So asking the HostIOMMUDevice sort of futureproof it (and better
represents the kernel interface). But I don't know of systems like this. And
furthemore mix and match of device dirty tracker with IOMMU dirty tracker isn't
supported in code, so for now I can look at bcontainer::dirty_pages_supported
and I'll remove the CAP addition.
Joao
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v3 10/10] vfio/common: Allow disabling device dirty page tracking
2024-07-08 14:34 [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking Joao Martins
` (8 preceding siblings ...)
2024-07-08 14:34 ` [PATCH v3 09/10] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
@ 2024-07-08 14:34 ` Joao Martins
2024-07-10 10:42 ` Duan, Zhenzhong
2024-07-11 7:41 ` [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking Cédric Le Goater
10 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-08 14:34 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
The property 'x-pre-copy-dirty-page-tracking' allows disabling the whole
tracking of VF pre-copy phase of dirty page tracking, though it means
that it will only be used at the start of the switchover phase.
Add an option that disables the VF dirty page tracking, and fall
back into container-based dirty page tracking. This also allows to
use IOMMU dirty tracking even on VFs with their own dirty
tracker scheme.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/hw/vfio/vfio-common.h | 1 +
hw/vfio/common.c | 3 +++
hw/vfio/migration.c | 3 ++-
hw/vfio/pci.c | 3 +++
4 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7ce925cfab19..9db3fd31cfae 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -137,6 +137,7 @@ typedef struct VFIODevice {
VFIOMigration *migration;
Error *migration_blocker;
OnOffAuto pre_copy_dirty_page_tracking;
+ OnOffAuto device_dirty_page_tracking;
bool dirty_pages_supported;
bool dirty_tracking;
HostIOMMUDevice *hiod;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7cdb969fd396..eaa33d9ee037 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -199,6 +199,9 @@ bool vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer)
VFIODevice *vbasedev;
QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
+ if (vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) {
+ return false;
+ }
if (!vbasedev->dirty_pages_supported) {
return false;
}
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 89195928666f..35d67332db20 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -1037,7 +1037,8 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
return !vfio_block_migration(vbasedev, err, errp);
}
- if (!vbasedev->dirty_pages_supported &&
+ if ((!vbasedev->dirty_pages_supported ||
+ vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
(vbasedev->iommufd &&
!hiodc->get_cap(vbasedev->hiod,
HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL))) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e03d9f3ba546..22819b2036b3 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3362,6 +3362,9 @@ static Property vfio_pci_dev_properties[] = {
DEFINE_PROP_ON_OFF_AUTO("x-pre-copy-dirty-page-tracking", VFIOPCIDevice,
vbasedev.pre_copy_dirty_page_tracking,
ON_OFF_AUTO_ON),
+ DEFINE_PROP_ON_OFF_AUTO("x-device-dirty-page-tracking", VFIOPCIDevice,
+ vbasedev.device_dirty_page_tracking,
+ ON_OFF_AUTO_ON),
DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
display, ON_OFF_AUTO_OFF),
DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
--
2.17.2
^ permalink raw reply related [flat|nested] 48+ messages in thread
* RE: [PATCH v3 10/10] vfio/common: Allow disabling device dirty page tracking
2024-07-08 14:34 ` [PATCH v3 10/10] vfio/common: Allow disabling device dirty page tracking Joao Martins
@ 2024-07-10 10:42 ` Duan, Zhenzhong
2024-07-10 10:51 ` Joao Martins
0 siblings, 1 reply; 48+ messages in thread
From: Duan, Zhenzhong @ 2024-07-10 10:42 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v3 10/10] vfio/common: Allow disabling device dirty page
>tracking
>
>The property 'x-pre-copy-dirty-page-tracking' allows disabling the whole
>tracking of VF pre-copy phase of dirty page tracking, though it means
>that it will only be used at the start of the switchover phase.
>
>Add an option that disables the VF dirty page tracking, and fall
>back into container-based dirty page tracking. This also allows to
>use IOMMU dirty tracking even on VFs with their own dirty
>tracker scheme.
>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>---
> include/hw/vfio/vfio-common.h | 1 +
> hw/vfio/common.c | 3 +++
> hw/vfio/migration.c | 3 ++-
> hw/vfio/pci.c | 3 +++
> 4 files changed, 9 insertions(+), 1 deletion(-)
>
>diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>common.h
>index 7ce925cfab19..9db3fd31cfae 100644
>--- a/include/hw/vfio/vfio-common.h
>+++ b/include/hw/vfio/vfio-common.h
>@@ -137,6 +137,7 @@ typedef struct VFIODevice {
> VFIOMigration *migration;
> Error *migration_blocker;
> OnOffAuto pre_copy_dirty_page_tracking;
>+ OnOffAuto device_dirty_page_tracking;
> bool dirty_pages_supported;
> bool dirty_tracking;
> HostIOMMUDevice *hiod;
>diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>index 7cdb969fd396..eaa33d9ee037 100644
>--- a/hw/vfio/common.c
>+++ b/hw/vfio/common.c
>@@ -199,6 +199,9 @@ bool vfio_devices_all_device_dirty_tracking(const
>VFIOContainerBase *bcontainer)
> VFIODevice *vbasedev;
>
> QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
>+ if (vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) {
>+ return false;
Maybe we can initialize vbasedev->dirty_pages_supported to false by checking vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF?
This way we can avoid extra check.
>+ }
> if (!vbasedev->dirty_pages_supported) {
> return false;
> }
>diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>index 89195928666f..35d67332db20 100644
>--- a/hw/vfio/migration.c
>+++ b/hw/vfio/migration.c
>@@ -1037,7 +1037,8 @@ bool vfio_migration_realize(VFIODevice
>*vbasedev, Error **errp)
> return !vfio_block_migration(vbasedev, err, errp);
> }
>
>- if (!vbasedev->dirty_pages_supported &&
>+ if ((!vbasedev->dirty_pages_supported ||
>+ vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
> (vbasedev->iommufd &&
> !hiodc->get_cap(vbasedev->hiod,
> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL))) {
>diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>index e03d9f3ba546..22819b2036b3 100644
>--- a/hw/vfio/pci.c
>+++ b/hw/vfio/pci.c
>@@ -3362,6 +3362,9 @@ static Property vfio_pci_dev_properties[] = {
> DEFINE_PROP_ON_OFF_AUTO("x-pre-copy-dirty-page-tracking",
>VFIOPCIDevice,
> vbasedev.pre_copy_dirty_page_tracking,
> ON_OFF_AUTO_ON),
>+ DEFINE_PROP_ON_OFF_AUTO("x-device-dirty-page-tracking",
>VFIOPCIDevice,
>+ vbasedev.device_dirty_page_tracking,
>+ ON_OFF_AUTO_ON),
> DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
> display, ON_OFF_AUTO_OFF),
> DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
>--
>2.17.2
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 10/10] vfio/common: Allow disabling device dirty page tracking
2024-07-10 10:42 ` Duan, Zhenzhong
@ 2024-07-10 10:51 ` Joao Martins
0 siblings, 0 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-10 10:51 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 10/07/2024 11:42, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: [PATCH v3 10/10] vfio/common: Allow disabling device dirty page
>> tracking
>>
>> The property 'x-pre-copy-dirty-page-tracking' allows disabling the whole
>> tracking of VF pre-copy phase of dirty page tracking, though it means
>> that it will only be used at the start of the switchover phase.
>>
>> Add an option that disables the VF dirty page tracking, and fall
>> back into container-based dirty page tracking. This also allows to
>> use IOMMU dirty tracking even on VFs with their own dirty
>> tracker scheme.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/hw/vfio/vfio-common.h | 1 +
>> hw/vfio/common.c | 3 +++
>> hw/vfio/migration.c | 3 ++-
>> hw/vfio/pci.c | 3 +++
>> 4 files changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>> common.h
>> index 7ce925cfab19..9db3fd31cfae 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -137,6 +137,7 @@ typedef struct VFIODevice {
>> VFIOMigration *migration;
>> Error *migration_blocker;
>> OnOffAuto pre_copy_dirty_page_tracking;
>> + OnOffAuto device_dirty_page_tracking;
>> bool dirty_pages_supported;
>> bool dirty_tracking;
>> HostIOMMUDevice *hiod;
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 7cdb969fd396..eaa33d9ee037 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -199,6 +199,9 @@ bool vfio_devices_all_device_dirty_tracking(const
>> VFIOContainerBase *bcontainer)
>> VFIODevice *vbasedev;
>>
>> QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
>> + if (vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) {
>> + return false;
>
> Maybe we can initialize vbasedev->dirty_pages_supported to false by checking vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF?
> This way we can avoid extra check.
>
Hmm, I guess it's a matter of style i.e. the device may support it
(vbasedev::dirty_pages_supported), but user disabled
(vbasedev::device_dirty_page_tracking). It's true that it simplifies, but feels
somewhat misleading?
>> + }
>> if (!vbasedev->dirty_pages_supported) {
>> return false;
>> }
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 89195928666f..35d67332db20 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -1037,7 +1037,8 @@ bool vfio_migration_realize(VFIODevice
>> *vbasedev, Error **errp)
>> return !vfio_block_migration(vbasedev, err, errp);
>> }
>>
>> - if (!vbasedev->dirty_pages_supported &&
>> + if ((!vbasedev->dirty_pages_supported ||
>> + vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
>> (vbasedev->iommufd &&
>> !hiodc->get_cap(vbasedev->hiod,
>> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL))) {
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index e03d9f3ba546..22819b2036b3 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3362,6 +3362,9 @@ static Property vfio_pci_dev_properties[] = {
>> DEFINE_PROP_ON_OFF_AUTO("x-pre-copy-dirty-page-tracking",
>> VFIOPCIDevice,
>> vbasedev.pre_copy_dirty_page_tracking,
>> ON_OFF_AUTO_ON),
>> + DEFINE_PROP_ON_OFF_AUTO("x-device-dirty-page-tracking",
>> VFIOPCIDevice,
>> + vbasedev.device_dirty_page_tracking,
>> + ON_OFF_AUTO_ON),
>> DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
>> display, ON_OFF_AUTO_OFF),
>> DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
>> --
>> 2.17.2
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking
2024-07-08 14:34 [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking Joao Martins
` (9 preceding siblings ...)
2024-07-08 14:34 ` [PATCH v3 10/10] vfio/common: Allow disabling device dirty page tracking Joao Martins
@ 2024-07-11 7:41 ` Cédric Le Goater
2024-07-11 8:33 ` Joao Martins
10 siblings, 1 reply; 48+ messages in thread
From: Cédric Le Goater @ 2024-07-11 7:41 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
Hello Joao,
On 7/8/24 4:34 PM, Joao Martins wrote:
> This small series adds support for IOMMU dirty tracking support via the
> IOMMUFD backend. The hardware capability is available on most recent x86
> hardware. The series is divided organized as follows:
>
> * Patch 1: Fixes a regression into mdev support with IOMMUFD. This
> one is independent of the series but happened to cross it
> while testing mdev with this series
>
> * Patch 2: Adds a support to iommufd_get_device_info() for capabilities
>
> * Patches 3 - 7: IOMMUFD backend support for dirty tracking;
>
> Introduce auto domains -- Patch 3 goes into more detail, but the gist is that
> we will find and attach a device to a compatible IOMMU domain, or allocate a new
> hardware pagetable *or* rely on kernel IOAS attach (for mdevs). Afterwards the
> workflow is relatively simple:
>
> 1) Probe device and allow dirty tracking in the HWPT
> 2) Toggling dirty tracking on/off
> 3) Read-and-clear of Dirty IOVAs
>
> The heuristics selected for (1) were to always request the HWPT for
> dirty tracking if supported, or rely on device dirty page tracking. This
> is a little simplistic and we aren't necessarily utilizing IOMMU dirty
> tracking even if we ask during hwpt allocation.
>
> The unmap case is deferred until further vIOMMU support with migration
> is added[3] which will then introduce the usage of
> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
> dma unmap bitmap flow.
>
> * Patches 8-10: Don't block live migration where there's no VF dirty
> tracker, considering that we have IOMMU dirty tracking.
>
> Comments and feedback appreciated.
>
> Cheers,
> Joao
>
> P.S. Suggest linux-next (or future v6.11) as hypervisor kernel as there's
> some bugs fixed there with regards to IOMMU hugepage dirty tracking.
>
> Changes since RFCv2[4]:
> * Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if
> we end up not actually toggling dirty tracking. (Avihai)
> * Fix error handling widely in auto domains logic and all patches (Avihai)
> * Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong)
> * New patches 1 and 2 taking into consideration previous comments.
> * Store hwpt::flags to know if we have dirty tracking (Avihai)
> * New patch 8, that allows to query dirty tracking support after
> provisioning. This is a cleaner way to check IOMMU dirty tracking support
> when vfio::migration is iniitalized, as opposed to RFCv2 via device caps.
> device caps way is still used because at vfio attach we aren't yet with
> a fully initialized migration state.
> * Adopt error propagation in query,set dirty tracking
> * Misc improvements overall broadly and Avihai
> * Drop hugepages as it's a bit unrelated; I can pursue that patch
> * separately. The main motivation is to provide a way to test
> without hugepages similar to what vfio_type1_iommu.disable_hugepages=1
> does.
>
> Changes since RFCv1[2]:
> * Remove intel/amd dirty tracking emulation enabling
> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking
> [Will pursue these two in separate series]
> * Introduce auto domains support
> * Enforce dirty tracking following the IOMMUFD UAPI for this
> * Add support for toggling hugepages in IOMMUFD
> * Auto enable support when VF supports migration to use IOMMU
> when it doesn't have VF dirty tracking
> * Add a parameter to toggle VF dirty tracking
>
> [0] https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
> [1] https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.duan@intel.com/
> [2] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
> [3] https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/
> [4] https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
>
> Joao Martins (10):
> vfio/iommufd: don't fail to realize on IOMMU_GET_HW_INFO failure
> backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities
> vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
> vfio/iommufd: Introduce auto domain creation
> vfio/iommufd: Probe and request hwpt dirty tracking capability
> vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
> vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
> vfio/iommufd: Parse hw_caps and store dirty tracking support
> vfio/migration: Don't block migration device dirty tracking is unsupported
> vfio/common: Allow disabling device dirty page tracking
>
> include/hw/vfio/vfio-common.h | 11 ++
> include/sysemu/host_iommu_device.h | 2 +
> include/sysemu/iommufd.h | 12 +-
> backends/iommufd.c | 81 ++++++++++-
> hw/vfio/common.c | 3 +
> hw/vfio/iommufd.c | 217 +++++++++++++++++++++++++++--
> hw/vfio/migration.c | 7 +-
> hw/vfio/pci.c | 3 +
> backends/trace-events | 3 +
> 9 files changed, 325 insertions(+), 14 deletions(-)
I am a bit confused with all the inline proposals. Would you mind
resending a v4 please ?
Regarding my comments on error handling,
The error should be set in case of failure, which means a routine
can not return 'false' or '-errno' and not setting 'Error **'
parameter at the same time.
If the returned value needs to be interpreted in some ways, for a
retry or any reason, then it makes sense to use an int, else please
use a bool. This is to avoid random negative values being interpreted
as an errno when they are not.
With VFIO migration support, low level errors (from the adapter FW
through the VFIO PCI variant driver) now reach to the core migration
subsystem. It is preferable to propagate this error, possibly literal,
to the VMM, monitor or libvirt. It's not fully symmetric today because
the log_global_stop handler for dirty tracking enablement is not
addressed. Anyhow, an effort on error reporting needs to be made and
any use of error_report() in a low level function is a sign for
improvement.
I think it would have value to probe early the host IOMMU device for
its HW features. If the results were cached in the HostIOMMUDevice
struct, it would then remove unnecessary and redundant calls to the
host kernel and avoid error handling in complex code paths. I hope
this is feasible. I haven't looked closely tbh.
We are reaching soft freeze, in ~10 days. There is a chance this
proposal could make it for 9.1.
Thanks,
C.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking
2024-07-11 7:41 ` [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking Cédric Le Goater
@ 2024-07-11 8:33 ` Joao Martins
2024-07-11 10:22 ` Duan, Zhenzhong
0 siblings, 1 reply; 48+ messages in thread
From: Joao Martins @ 2024-07-11 8:33 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 11/07/2024 08:41, Cédric Le Goater wrote:
> Hello Joao,
>
> On 7/8/24 4:34 PM, Joao Martins wrote:
>> This small series adds support for IOMMU dirty tracking support via the
>> IOMMUFD backend. The hardware capability is available on most recent x86
>> hardware. The series is divided organized as follows:
>>
>> * Patch 1: Fixes a regression into mdev support with IOMMUFD. This
>> one is independent of the series but happened to cross it
>> while testing mdev with this series
>>
>> * Patch 2: Adds a support to iommufd_get_device_info() for capabilities
>>
>> * Patches 3 - 7: IOMMUFD backend support for dirty tracking;
>>
>> Introduce auto domains -- Patch 3 goes into more detail, but the gist is that
>> we will find and attach a device to a compatible IOMMU domain, or allocate a new
>> hardware pagetable *or* rely on kernel IOAS attach (for mdevs). Afterwards the
>> workflow is relatively simple:
>>
>> 1) Probe device and allow dirty tracking in the HWPT
>> 2) Toggling dirty tracking on/off
>> 3) Read-and-clear of Dirty IOVAs
>>
>> The heuristics selected for (1) were to always request the HWPT for
>> dirty tracking if supported, or rely on device dirty page tracking. This
>> is a little simplistic and we aren't necessarily utilizing IOMMU dirty
>> tracking even if we ask during hwpt allocation.
>>
>> The unmap case is deferred until further vIOMMU support with migration
>> is added[3] which will then introduce the usage of
>> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
>> dma unmap bitmap flow.
>>
>> * Patches 8-10: Don't block live migration where there's no VF dirty
>> tracker, considering that we have IOMMU dirty tracking.
>>
>> Comments and feedback appreciated.
>>
>> Cheers,
>> Joao
>>
>> P.S. Suggest linux-next (or future v6.11) as hypervisor kernel as there's
>> some bugs fixed there with regards to IOMMU hugepage dirty tracking.
>>
>> Changes since RFCv2[4]:
>> * Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if
>> we end up not actually toggling dirty tracking. (Avihai)
>> * Fix error handling widely in auto domains logic and all patches (Avihai)
>> * Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong)
>> * New patches 1 and 2 taking into consideration previous comments.
>> * Store hwpt::flags to know if we have dirty tracking (Avihai)
>> * New patch 8, that allows to query dirty tracking support after
>> provisioning. This is a cleaner way to check IOMMU dirty tracking support
>> when vfio::migration is iniitalized, as opposed to RFCv2 via device caps.
>> device caps way is still used because at vfio attach we aren't yet with
>> a fully initialized migration state.
>> * Adopt error propagation in query,set dirty tracking
>> * Misc improvements overall broadly and Avihai
>> * Drop hugepages as it's a bit unrelated; I can pursue that patch
>> * separately. The main motivation is to provide a way to test
>> without hugepages similar to what vfio_type1_iommu.disable_hugepages=1
>> does.
>>
>> Changes since RFCv1[2]:
>> * Remove intel/amd dirty tracking emulation enabling
>> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking
>> [Will pursue these two in separate series]
>> * Introduce auto domains support
>> * Enforce dirty tracking following the IOMMUFD UAPI for this
>> * Add support for toggling hugepages in IOMMUFD
>> * Auto enable support when VF supports migration to use IOMMU
>> when it doesn't have VF dirty tracking
>> * Add a parameter to toggle VF dirty tracking
>>
>> [0]
>> https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
>> [1]
>> https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.duan@intel.com/
>> [2]
>> https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
>> [3]
>> https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/
>> [4]
>> https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
>>
>> Joao Martins (10):
>> vfio/iommufd: don't fail to realize on IOMMU_GET_HW_INFO failure
>> backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW
>> capabilities
>> vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
>> vfio/iommufd: Introduce auto domain creation
>> vfio/iommufd: Probe and request hwpt dirty tracking capability
>> vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
>> vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
>> vfio/iommufd: Parse hw_caps and store dirty tracking support
>> vfio/migration: Don't block migration device dirty tracking is unsupported
>> vfio/common: Allow disabling device dirty page tracking
>>
>> include/hw/vfio/vfio-common.h | 11 ++
>> include/sysemu/host_iommu_device.h | 2 +
>> include/sysemu/iommufd.h | 12 +-
>> backends/iommufd.c | 81 ++++++++++-
>> hw/vfio/common.c | 3 +
>> hw/vfio/iommufd.c | 217 +++++++++++++++++++++++++++--
>> hw/vfio/migration.c | 7 +-
>> hw/vfio/pci.c | 3 +
>> backends/trace-events | 3 +
>> 9 files changed, 325 insertions(+), 14 deletions(-)
>
>
> I am a bit confused with all the inline proposals. Would you mind
> resending a v4 please ?
>
Yeap, I'll send it out today, or worst case tomorrow morning.
> Regarding my comments on error handling,
>
> The error should be set in case of failure, which means a routine
> can not return 'false' or '-errno' and not setting 'Error **'
> parameter at the same time.
>
> If the returned value needs to be interpreted in some ways, for a
> retry or any reason, then it makes sense to use an int, else please
> use a bool. This is to avoid random negative values being interpreted
> as an errno when they are not.
>
OK, I'll retain the Error* creation even when expecting to test the errno.
> With VFIO migration support, low level errors (from the adapter FW
> through the VFIO PCI variant driver) now reach to the core migration
> subsystem. It is preferable to propagate this error, possibly literal,
> to the VMM, monitor or libvirt. It's not fully symmetric today because
> the log_global_stop handler for dirty tracking enablement is not
> addressed. Anyhow, an effort on error reporting needs to be made and
> any use of error_report() in a low level function is a sign for
> improvement.
>
Gotcha. My earlier comment was mostly that it sounded like there was no place
for returning -errno, but it seems it's not that binary and the Error* is the
thing that really matters here.
> I think it would have value to probe early the host IOMMU device for
> its HW features. If the results were cached in the HostIOMMUDevice
> struct, it would then remove unnecessary and redundant calls to the
> host kernel and avoid error handling in complex code paths. I hope
> this is feasible. I haven't looked closely tbh.
>
OK, I'll post in this series what I had inline[0], as that's what I did.
[0]
https://lore.kernel.org/qemu-devel/4e85db04-fbaa-4a6b-b133-59170c471e24@oracle.com/
The gotcha in my opinion is that I cache IOMMUFD specific data returned by the
GET_HW_INFO ioctl inside a new HostIOMMUDeviceCaps::iommufd. The reason being
that vfio_device_get_aw_bits() has a hidden assumption that the container is
already populated with the list of allowed iova ranges, which is not true for
the first device. So rather than have partial set of caps initialized, I
essentially ended up with fetching the raw caps and store them, and serialize
caps into named features (e.g. caps::aw_bits) in HostIOMMUDevice::realize().
> We are reaching soft freeze, in ~10 days. There is a chance this
> proposal could make it for 9.1.
>
> Thanks,
>
> C.
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* RE: [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking
2024-07-11 8:33 ` Joao Martins
@ 2024-07-11 10:22 ` Duan, Zhenzhong
2024-07-11 10:44 ` Joao Martins
0 siblings, 1 reply; 48+ messages in thread
From: Duan, Zhenzhong @ 2024-07-11 10:22 UTC (permalink / raw)
To: Joao Martins, Cédric Le Goater, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Jason Gunthorpe,
Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking
>
>On 11/07/2024 08:41, Cédric Le Goater wrote:
>> Hello Joao,
>>
>> On 7/8/24 4:34 PM, Joao Martins wrote:
>>> This small series adds support for IOMMU dirty tracking support via the
>>> IOMMUFD backend. The hardware capability is available on most recent
>x86
>>> hardware. The series is divided organized as follows:
>>>
>>> * Patch 1: Fixes a regression into mdev support with IOMMUFD. This
>>> one is independent of the series but happened to cross it
>>> while testing mdev with this series
>>>
>>> * Patch 2: Adds a support to iommufd_get_device_info() for capabilities
>>>
>>> * Patches 3 - 7: IOMMUFD backend support for dirty tracking;
>>>
>>> Introduce auto domains -- Patch 3 goes into more detail, but the gist is
>that
>>> we will find and attach a device to a compatible IOMMU domain, or
>allocate a new
>>> hardware pagetable *or* rely on kernel IOAS attach (for mdevs).
>Afterwards the
>>> workflow is relatively simple:
>>>
>>> 1) Probe device and allow dirty tracking in the HWPT
>>> 2) Toggling dirty tracking on/off
>>> 3) Read-and-clear of Dirty IOVAs
>>>
>>> The heuristics selected for (1) were to always request the HWPT for
>>> dirty tracking if supported, or rely on device dirty page tracking. This
>>> is a little simplistic and we aren't necessarily utilizing IOMMU dirty
>>> tracking even if we ask during hwpt allocation.
>>>
>>> The unmap case is deferred until further vIOMMU support with migration
>>> is added[3] which will then introduce the usage of
>>> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP
>ioctl in the
>>> dma unmap bitmap flow.
>>>
>>> * Patches 8-10: Don't block live migration where there's no VF dirty
>>> tracker, considering that we have IOMMU dirty tracking.
>>>
>>> Comments and feedback appreciated.
>>>
>>> Cheers,
>>> Joao
>>>
>>> P.S. Suggest linux-next (or future v6.11) as hypervisor kernel as there's
>>> some bugs fixed there with regards to IOMMU hugepage dirty tracking.
>>>
>>> Changes since RFCv2[4]:
>>> * Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING
>even if
>>> we end up not actually toggling dirty tracking. (Avihai)
>>> * Fix error handling widely in auto domains logic and all patches (Avihai)
>>> * Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong)
>>> * New patches 1 and 2 taking into consideration previous comments.
>>> * Store hwpt::flags to know if we have dirty tracking (Avihai)
>>> * New patch 8, that allows to query dirty tracking support after
>>> provisioning. This is a cleaner way to check IOMMU dirty tracking support
>>> when vfio::migration is iniitalized, as opposed to RFCv2 via device caps.
>>> device caps way is still used because at vfio attach we aren't yet with
>>> a fully initialized migration state.
>>> * Adopt error propagation in query,set dirty tracking
>>> * Misc improvements overall broadly and Avihai
>>> * Drop hugepages as it's a bit unrelated; I can pursue that patch
>>> * separately. The main motivation is to provide a way to test
>>> without hugepages similar to what
>vfio_type1_iommu.disable_hugepages=1
>>> does.
>>>
>>> Changes since RFCv1[2]:
>>> * Remove intel/amd dirty tracking emulation enabling
>>> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking
>>> [Will pursue these two in separate series]
>>> * Introduce auto domains support
>>> * Enforce dirty tracking following the IOMMUFD UAPI for this
>>> * Add support for toggling hugepages in IOMMUFD
>>> * Auto enable support when VF supports migration to use IOMMU
>>> when it doesn't have VF dirty tracking
>>> * Add a parameter to toggle VF dirty tracking
>>>
>>> [0]
>>> https://lore.kernel.org/qemu-devel/20240201072818.327930-1-
>zhenzhong.duan@intel.com/
>>> [1]
>>> https://lore.kernel.org/qemu-devel/20240201072818.327930-10-
>zhenzhong.duan@intel.com/
>>> [2]
>>> https://lore.kernel.org/qemu-devel/20220428211351.3897-1-
>joao.m.martins@oracle.com/
>>> [3]
>>> https://lore.kernel.org/qemu-devel/20230622214845.3980-1-
>joao.m.martins@oracle.com/
>>> [4]
>>> https://lore.kernel.org/qemu-devel/20240212135643.5858-1-
>joao.m.martins@oracle.com/
>>>
>>> Joao Martins (10):
>>> vfio/iommufd: don't fail to realize on IOMMU_GET_HW_INFO failure
>>> backends/iommufd: Extend iommufd_backend_get_device_info() to
>fetch HW
>>> capabilities
>>> vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
>>> vfio/iommufd: Introduce auto domain creation
>>> vfio/iommufd: Probe and request hwpt dirty tracking capability
>>> vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking
>support
>>> vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap
>support
>>> vfio/iommufd: Parse hw_caps and store dirty tracking support
>>> vfio/migration: Don't block migration device dirty tracking is
>unsupported
>>> vfio/common: Allow disabling device dirty page tracking
>>>
>>> include/hw/vfio/vfio-common.h | 11 ++
>>> include/sysemu/host_iommu_device.h | 2 +
>>> include/sysemu/iommufd.h | 12 +-
>>> backends/iommufd.c | 81 ++++++++++-
>>> hw/vfio/common.c | 3 +
>>> hw/vfio/iommufd.c | 217 +++++++++++++++++++++++++++--
>>> hw/vfio/migration.c | 7 +-
>>> hw/vfio/pci.c | 3 +
>>> backends/trace-events | 3 +
>>> 9 files changed, 325 insertions(+), 14 deletions(-)
>>
>>
>> I am a bit confused with all the inline proposals. Would you mind
>> resending a v4 please ?
>>
>
>Yeap, I'll send it out today, or worst case tomorrow morning.
>
>> Regarding my comments on error handling,
>>
>> The error should be set in case of failure, which means a routine
>> can not return 'false' or '-errno' and not setting 'Error **'
>> parameter at the same time.
>>
>> If the returned value needs to be interpreted in some ways, for a
>> retry or any reason, then it makes sense to use an int, else please
>> use a bool. This is to avoid random negative values being interpreted
>> as an errno when they are not.
>>
>OK, I'll retain the Error* creation even when expecting to test the errno.
>
>> With VFIO migration support, low level errors (from the adapter FW
>> through the VFIO PCI variant driver) now reach to the core migration
>> subsystem. It is preferable to propagate this error, possibly literal,
>> to the VMM, monitor or libvirt. It's not fully symmetric today because
>> the log_global_stop handler for dirty tracking enablement is not
>> addressed. Anyhow, an effort on error reporting needs to be made and
>> any use of error_report() in a low level function is a sign for
>> improvement.
>>
>Gotcha. My earlier comment was mostly that it sounded like there was no
>place
>for returning -errno, but it seems it's not that binary and the Error* is the
>thing that really matters here.
>
>> I think it would have value to probe early the host IOMMU device for
>> its HW features. If the results were cached in the HostIOMMUDevice
>> struct, it would then remove unnecessary and redundant calls to the
>> host kernel and avoid error handling in complex code paths. I hope
>> this is feasible. I haven't looked closely tbh.
>>
>OK, I'll post in this series what I had inline[0], as that's what I did.
>
>[0]
>https://lore.kernel.org/qemu-devel/4e85db04-fbaa-4a6b-b133-
>59170c471e24@oracle.com/
>
>The gotcha in my opinion is that I cache IOMMUFD specific data returned by
>the
>GET_HW_INFO ioctl inside a new HostIOMMUDeviceCaps::iommufd. The
>reason being
>that vfio_device_get_aw_bits() has a hidden assumption that the container
>is
>already populated with the list of allowed iova ranges, which is not true for
>the first device. So rather than have partial set of caps initialized, I
>essentially ended up with fetching the raw caps and store them, and serialize
>caps into named features (e.g. caps::aw_bits) in
>HostIOMMUDevice::realize().
Another way is to call vfio_device_get_aw_bits() and return its result directly
in get_cap(), then no need to initialize caps::aw_bits.
This way host IOMMU device can be moved ahead as Cédric suggested.
Thanks
Zhenzhong
>
>> We are reaching soft freeze, in ~10 days. There is a chance this
>> proposal could make it for 9.1.
>>
>> Thanks,
>>
>> C.
>>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking
2024-07-11 10:22 ` Duan, Zhenzhong
@ 2024-07-11 10:44 ` Joao Martins
0 siblings, 0 replies; 48+ messages in thread
From: Joao Martins @ 2024-07-11 10:44 UTC (permalink / raw)
To: Duan, Zhenzhong, Cédric Le Goater, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Jason Gunthorpe,
Avihai Horon
On 11/07/2024 11:22, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: Re: [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking
>>
>> On 11/07/2024 08:41, Cédric Le Goater wrote:
>>> Hello Joao,
>>>
>>> On 7/8/24 4:34 PM, Joao Martins wrote:
>>>> This small series adds support for IOMMU dirty tracking support via the
>>>> IOMMUFD backend. The hardware capability is available on most recent
>> x86
>>>> hardware. The series is divided organized as follows:
>>>>
>>>> * Patch 1: Fixes a regression into mdev support with IOMMUFD. This
>>>> one is independent of the series but happened to cross it
>>>> while testing mdev with this series
>>>>
>>>> * Patch 2: Adds a support to iommufd_get_device_info() for capabilities
>>>>
>>>> * Patches 3 - 7: IOMMUFD backend support for dirty tracking;
>>>>
>>>> Introduce auto domains -- Patch 3 goes into more detail, but the gist is
>> that
>>>> we will find and attach a device to a compatible IOMMU domain, or
>> allocate a new
>>>> hardware pagetable *or* rely on kernel IOAS attach (for mdevs).
>> Afterwards the
>>>> workflow is relatively simple:
>>>>
>>>> 1) Probe device and allow dirty tracking in the HWPT
>>>> 2) Toggling dirty tracking on/off
>>>> 3) Read-and-clear of Dirty IOVAs
>>>>
>>>> The heuristics selected for (1) were to always request the HWPT for
>>>> dirty tracking if supported, or rely on device dirty page tracking. This
>>>> is a little simplistic and we aren't necessarily utilizing IOMMU dirty
>>>> tracking even if we ask during hwpt allocation.
>>>>
>>>> The unmap case is deferred until further vIOMMU support with migration
>>>> is added[3] which will then introduce the usage of
>>>> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP
>> ioctl in the
>>>> dma unmap bitmap flow.
>>>>
>>>> * Patches 8-10: Don't block live migration where there's no VF dirty
>>>> tracker, considering that we have IOMMU dirty tracking.
>>>>
>>>> Comments and feedback appreciated.
>>>>
>>>> Cheers,
>>>> Joao
>>>>
>>>> P.S. Suggest linux-next (or future v6.11) as hypervisor kernel as there's
>>>> some bugs fixed there with regards to IOMMU hugepage dirty tracking.
>>>>
>>>> Changes since RFCv2[4]:
>>>> * Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING
>> even if
>>>> we end up not actually toggling dirty tracking. (Avihai)
>>>> * Fix error handling widely in auto domains logic and all patches (Avihai)
>>>> * Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong)
>>>> * New patches 1 and 2 taking into consideration previous comments.
>>>> * Store hwpt::flags to know if we have dirty tracking (Avihai)
>>>> * New patch 8, that allows to query dirty tracking support after
>>>> provisioning. This is a cleaner way to check IOMMU dirty tracking support
>>>> when vfio::migration is iniitalized, as opposed to RFCv2 via device caps.
>>>> device caps way is still used because at vfio attach we aren't yet with
>>>> a fully initialized migration state.
>>>> * Adopt error propagation in query,set dirty tracking
>>>> * Misc improvements overall broadly and Avihai
>>>> * Drop hugepages as it's a bit unrelated; I can pursue that patch
>>>> * separately. The main motivation is to provide a way to test
>>>> without hugepages similar to what
>> vfio_type1_iommu.disable_hugepages=1
>>>> does.
>>>>
>>>> Changes since RFCv1[2]:
>>>> * Remove intel/amd dirty tracking emulation enabling
>>>> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking
>>>> [Will pursue these two in separate series]
>>>> * Introduce auto domains support
>>>> * Enforce dirty tracking following the IOMMUFD UAPI for this
>>>> * Add support for toggling hugepages in IOMMUFD
>>>> * Auto enable support when VF supports migration to use IOMMU
>>>> when it doesn't have VF dirty tracking
>>>> * Add a parameter to toggle VF dirty tracking
>>>>
>>>> [0]
>>>> https://lore.kernel.org/qemu-devel/20240201072818.327930-1-
>> zhenzhong.duan@intel.com/
>>>> [1]
>>>> https://lore.kernel.org/qemu-devel/20240201072818.327930-10-
>> zhenzhong.duan@intel.com/
>>>> [2]
>>>> https://lore.kernel.org/qemu-devel/20220428211351.3897-1-
>> joao.m.martins@oracle.com/
>>>> [3]
>>>> https://lore.kernel.org/qemu-devel/20230622214845.3980-1-
>> joao.m.martins@oracle.com/
>>>> [4]
>>>> https://lore.kernel.org/qemu-devel/20240212135643.5858-1-
>> joao.m.martins@oracle.com/
>>>>
>>>> Joao Martins (10):
>>>> vfio/iommufd: don't fail to realize on IOMMU_GET_HW_INFO failure
>>>> backends/iommufd: Extend iommufd_backend_get_device_info() to
>> fetch HW
>>>> capabilities
>>>> vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
>>>> vfio/iommufd: Introduce auto domain creation
>>>> vfio/iommufd: Probe and request hwpt dirty tracking capability
>>>> vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking
>> support
>>>> vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap
>> support
>>>> vfio/iommufd: Parse hw_caps and store dirty tracking support
>>>> vfio/migration: Don't block migration device dirty tracking is
>> unsupported
>>>> vfio/common: Allow disabling device dirty page tracking
>>>>
>>>> include/hw/vfio/vfio-common.h | 11 ++
>>>> include/sysemu/host_iommu_device.h | 2 +
>>>> include/sysemu/iommufd.h | 12 +-
>>>> backends/iommufd.c | 81 ++++++++++-
>>>> hw/vfio/common.c | 3 +
>>>> hw/vfio/iommufd.c | 217 +++++++++++++++++++++++++++--
>>>> hw/vfio/migration.c | 7 +-
>>>> hw/vfio/pci.c | 3 +
>>>> backends/trace-events | 3 +
>>>> 9 files changed, 325 insertions(+), 14 deletions(-)
>>>
>>>
>>> I am a bit confused with all the inline proposals. Would you mind
>>> resending a v4 please ?
>>>
>>
>> Yeap, I'll send it out today, or worst case tomorrow morning.
>>
>>> Regarding my comments on error handling,
>>>
>>> The error should be set in case of failure, which means a routine
>>> can not return 'false' or '-errno' and not setting 'Error **'
>>> parameter at the same time.
>>>
>>> If the returned value needs to be interpreted in some ways, for a
>>> retry or any reason, then it makes sense to use an int, else please
>>> use a bool. This is to avoid random negative values being interpreted
>>> as an errno when they are not.
>>>
>> OK, I'll retain the Error* creation even when expecting to test the errno.
>>
>>> With VFIO migration support, low level errors (from the adapter FW
>>> through the VFIO PCI variant driver) now reach to the core migration
>>> subsystem. It is preferable to propagate this error, possibly literal,
>>> to the VMM, monitor or libvirt. It's not fully symmetric today because
>>> the log_global_stop handler for dirty tracking enablement is not
>>> addressed. Anyhow, an effort on error reporting needs to be made and
>>> any use of error_report() in a low level function is a sign for
>>> improvement.
>>>
>> Gotcha. My earlier comment was mostly that it sounded like there was no
>> place
>> for returning -errno, but it seems it's not that binary and the Error* is the
>> thing that really matters here.
>>
>>> I think it would have value to probe early the host IOMMU device for
>>> its HW features. If the results were cached in the HostIOMMUDevice
>>> struct, it would then remove unnecessary and redundant calls to the
>>> host kernel and avoid error handling in complex code paths. I hope
>>> this is feasible. I haven't looked closely tbh.
>>>
>> OK, I'll post in this series what I had inline[0], as that's what I did.
>>
>> [0]
>> https://lore.kernel.org/qemu-devel/4e85db04-fbaa-4a6b-b133-
>> 59170c471e24@oracle.com/
>>
>> The gotcha in my opinion is that I cache IOMMUFD specific data returned by
>> the
>> GET_HW_INFO ioctl inside a new HostIOMMUDeviceCaps::iommufd. The
>> reason being
>> that vfio_device_get_aw_bits() has a hidden assumption that the container
>> is
>> already populated with the list of allowed iova ranges, which is not true for
>> the first device. So rather than have partial set of caps initialized, I
>> essentially ended up with fetching the raw caps and store them, and serialize
>> caps into named features (e.g. caps::aw_bits) in
>> HostIOMMUDevice::realize().
>
> Another way is to call vfio_device_get_aw_bits() and return its result directly
> in get_cap(), then no need to initialize caps::aw_bits.
> This way host IOMMU device can be moved ahead as Cédric suggested.
Oh, yes, that's a great alternative. Let me adopt that instead and we don't need
to make so huge changes structure wise.
^ permalink raw reply [flat|nested] 48+ messages in thread