From: Joao Martins <joao.m.martins@oracle.com>
To: Avihai Horon <avihaih@nvidia.com>, qemu-devel@nongnu.org
Cc: Alex Williamson <alex.williamson@redhat.com>,
Cedric Le Goater <clg@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
David Hildenbrand <david@redhat.com>,
Philippe Mathieu-Daude <philmd@linaro.org>,
"Michael S. Tsirkin" <mst@redhat.com>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Jason Wang <jasowang@redhat.com>,
Richard Henderson <richard.henderson@linaro.org>,
Eduardo Habkost <eduardo@habkost.net>,
Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [PATCH v4 12/15] vfio/common: Support device dirty page tracking with vIOMMU
Date: Mon, 10 Jul 2023 14:49:29 +0100 [thread overview]
Message-ID: <743c8d19-2e74-0542-d39c-df75a2ebb4f3@oracle.com> (raw)
In-Reply-To: <3dd304a7-3ec2-9e6d-1916-adfbb0c417b6@nvidia.com>
On 09/07/2023 16:24, Avihai Horon wrote:
> On 23/06/2023 0:48, Joao Martins wrote:
>> Currently, device dirty page tracking with vIOMMU is not supported;
>> a migration blocker is added and migration is prevented.
>>
>> When vIOMMU is used, IOVA ranges are DMA mapped/unmapped on the fly as
>> requested by the vIOMMU. These IOVA ranges can potentially be mapped
>> anywhere in the vIOMMU IOVA space as advertised by the VMM.
>>
>> To support device dirty tracking when the vIOMMU is enabled, instead create
>> the dirty ranges based on the limits advertised by the vIOMMU, which leads
>> to tracking the whole IOVA space regardless of what the devices use.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/hw/vfio/vfio-common.h | 1 +
>> hw/vfio/common.c | 58 +++++++++++++++++++++++++++++------
>> hw/vfio/pci.c | 7 +++++
>> 3 files changed, 56 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index f41860988d6b..c4bafad084b4 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -71,6 +71,7 @@ typedef struct VFIOMigration {
>> typedef struct VFIOAddressSpace {
>> AddressSpace *as;
>> bool no_dma_translation;
>> + hwaddr max_iova;
>> QLIST_HEAD(, VFIOContainer) containers;
>> QLIST_ENTRY(VFIOAddressSpace) list;
>> } VFIOAddressSpace;
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index ecfb9afb3fb6..85fddef24026 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -428,6 +428,25 @@ static bool vfio_viommu_preset(void)
>> return false;
>> }
>>
>> +static int vfio_viommu_get_max_iova(hwaddr *max_iova)
>> +{
>> + VFIOAddressSpace *space;
>> +
>> + *max_iova = 0;
>> +
>> + QLIST_FOREACH(space, &vfio_address_spaces, list) {
>> + if (space->as == &address_space_memory) {
>> + continue;
>> + }
>> +
>> + if (*max_iova < space->max_iova) {
>> + *max_iova = space->max_iova;
>> + }
>> + }
>
> Looks like max_iova is a per VFIOAddressSpace property, so why do we need to
> iterate over all address spaces?
>
This was more future-proofing for when QEMU supports multiple vIOMMUs. In theory
this tracks the device address space, and if two different devices stand behind
different vIOMMUs, then this loop would compute the highest IOVA that we would
track with the host device dirty tracker.

But I realize this might introduce unnecessary complexity, and we should 'obey'
the advertised vIOMMU max_iova for the device. With Zhenzhong's blocker cleanup I
can make this just fetch the max_iova from the container's space and be done with it.
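
Roughly what I have in mind is something along these lines (untested sketch;
names follow this series, and the exact shape depends on how the blocker
cleanup lands):

static int vfio_viommu_get_max_iova(VFIOContainer *container, hwaddr *max_iova)
{
    VFIOAddressSpace *space = container->space;

    /* No vIOMMU behind this container's space, or no advertised limit. */
    if (space->as == &address_space_memory || !space->max_iova) {
        return -EINVAL;
    }

    *max_iova = space->max_iova;
    return 0;
}

vfio_dirty_tracking_init() would then just pass its container, and the walk
over vfio_address_spaces goes away entirely.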
Joao
> Thanks.
>
>> +
>> + return *max_iova == 0;
>> +}
>> +
>> int vfio_block_giommu_migration(Error **errp)
>> {
>> int ret;
>> @@ -1464,10 +1483,11 @@ static const MemoryListener vfio_dirty_tracking_listener = {
>> .region_add = vfio_listener_dirty_tracking_update,
>> };
>>
>> -static void vfio_dirty_tracking_init(VFIOContainer *container,
>> +static int vfio_dirty_tracking_init(VFIOContainer *container,
>> VFIODirtyRanges *ranges)
>> {
>> VFIODirtyRangesListener dirty;
>> + int ret;
>>
>> memset(&dirty, 0, sizeof(dirty));
>> dirty.ranges.min32 = UINT32_MAX;
>> @@ -1475,17 +1495,29 @@ static void vfio_dirty_tracking_init(VFIOContainer *container,
>> dirty.listener = vfio_dirty_tracking_listener;
>> dirty.container = container;
>>
>> - memory_listener_register(&dirty.listener,
>> - container->space->as);
>> + if (vfio_viommu_preset()) {
>> + hwaddr iommu_max_iova;
>> +
>> + ret = vfio_viommu_get_max_iova(&iommu_max_iova);
>> + if (ret) {
>> + return -EINVAL;
>> + }
>> +
>> + vfio_dirty_tracking_update(0, iommu_max_iova, &dirty.ranges);
>> + } else {
>> + memory_listener_register(&dirty.listener,
>> + container->space->as);
>> + /*
>> + * The memory listener is synchronous, and used to calculate the range
>> + * to dirty tracking. Unregister it after we are done as we are not
>> + * interested in any follow-up updates.
>> + */
>> + memory_listener_unregister(&dirty.listener);
>> + }
>>
>> *ranges = dirty.ranges;
>>
>> - /*
>> - * The memory listener is synchronous, and used to calculate the range
>> - * to dirty tracking. Unregister it after we are done as we are not
>> - * interested in any follow-up updates.
>> - */
>> - memory_listener_unregister(&dirty.listener);
>> + return 0;
>> }
>>
>> static void vfio_devices_dma_logging_stop(VFIOContainer *container)
>> @@ -1590,7 +1622,13 @@ static int vfio_devices_dma_logging_start(VFIOContainer *container)
>> VFIOGroup *group;
>> int ret = 0;
>>
>> - vfio_dirty_tracking_init(container, &ranges);
>> + ret = vfio_dirty_tracking_init(container, &ranges);
>> + if (ret) {
>> + error_report("Failed to init DMA logging ranges, err %d",
>> + ret);
>> + return -EOPNOTSUPP;
>> + }
>> +
>> feature = vfio_device_feature_dma_logging_start_create(container,
>> &ranges);
>> if (!feature) {
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 8a98e6ffc480..3bda5618c5b5 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2974,6 +2974,13 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>> &dma_translation);
>> space->no_dma_translation = !dma_translation;
>>
>> + /*
>> + * Support for advertised IOMMU address space boundaries is optional.
>> + * By default, it is not advertised i.e. space::max_iova is 0.
>> + */
>> + pci_device_iommu_get_attr(pdev, IOMMU_ATTR_MAX_IOVA,
>> + &space->max_iova);
>> +
>> QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
>> if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
>> error_setg(errp, "device is already attached");
>> --
>> 2.17.2
>>