From: Yi Liu <yi.l.liu@intel.com>
To: Zhenzhong Duan <zhenzhong.duan@intel.com>, <qemu-devel@nongnu.org>
Cc: <alex.williamson@redhat.com>, <clg@redhat.com>, <mst@redhat.com>,
<jasowang@redhat.com>, <clement.mathieu--drif@eviden.com>,
<eric.auger@redhat.com>, <joao.m.martins@oracle.com>,
<avihaih@nvidia.com>, <xudong.hao@intel.com>,
<giovanni.cabiddu@intel.com>, <mark.gross@intel.com>,
<arjan.van.de.ven@intel.com>
Subject: Re: [PATCH 4/5] intel_iommu: Optimize unmap_bitmap during migration
Date: Sun, 12 Oct 2025 18:31:10 +0800
Message-ID: <bc51d154-be8e-47d7-abe7-bcb9f93a7348@intel.com>
In-Reply-To: <20250910023701.244356-5-zhenzhong.duan@intel.com>
On 2025/9/10 10:37, Zhenzhong Duan wrote:
> If a VFIO device in guest switches from IOMMU domain to block domain,
> vtd_address_space_unmap() is called to unmap whole address space.
>
> If that happens during migration, migration fails with legacy VFIO
> backend as below:
>
> Status: failed (vfio_container_dma_unmap(0x561bbbd92d90, 0x100000000000, 0x100000000000) = -7 (Argument list too long))
This should be a giant and busy VM, right? Is a Fixes tag needed, by the way?
>
> Because legacy VFIO limits maximum bitmap size to 256MB which maps to 8TB on
> 4K page system, when 16TB sized UNMAP notification is sent, unmap_bitmap
> ioctl fails.
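Just to double-check the arithmetic (assuming 4K pages): with one bitmap
bit per 4K page, a 256MB bitmap covers 256MB * 8 * 4KB = 8TB of IOVA, so
a single 16TB UNMAP would need a 512MB bitmap and exceeds the legacy
limit. The numbers add up.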
>
> There is no such limitation with iommufd backend, but it's still not optimal
> to allocate large bitmap.
>
> Optimize it by iterating over DMAMap list to unmap each range with active
> mapping when migration is active. If migration is not active, unmapping the
> whole address space in one go is optimal.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com>
> ---
> hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 42 insertions(+)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 83c5e44413..6876dae727 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -37,6 +37,7 @@
> #include "system/system.h"
> #include "hw/i386/apic_internal.h"
> #include "kvm/kvm_i386.h"
> +#include "migration/misc.h"
> #include "migration/vmstate.h"
> #include "trace.h"
>
> @@ -4423,6 +4424,42 @@ static void vtd_dev_unset_iommu_device(PCIBus *bus, void *opaque, int devfn)
> vtd_iommu_unlock(s);
> }
>
> +/*
> + * Unmapping a large range in one go is not optimal during migration because
> + * a large dirty bitmap needs to be allocated while there may be only small
> + * mappings, iterate over DMAMap list to unmap each range with active mapping.
> + */
> +static void vtd_address_space_unmap_in_migration(VTDAddressSpace *as,
> + IOMMUNotifier *n)
> +{
> + const DMAMap *map;
> + const DMAMap target = {
> + .iova = n->start,
> + .size = n->end,
> + };
> + IOVATree *tree = as->iova_tree;
> +
> + /*
> + * DMAMap is created during IOMMU page table sync, it's either 4KB or huge
> + * page size and always a power of 2 in size. So the range of DMAMap could
> + * be used for UNMAP notification directly.
> + */
> + while ((map = iova_tree_find(tree, &target))) {
How about an empty iova_tree? If the guest has not mapped anything for
the device, the tree is empty and it is fine not to unmap anything.
However, if the device is attached to an identity domain, the iova_tree
is empty as well. Are we sure nothing needs to be unmapped in that case?
It looks like the answer is yes, but I suspect the unmap failure would
then show up on the vfio side instead. If so, a more complete fix needs
to be considered. :)
> + IOMMUTLBEvent event;
> +
> + event.type = IOMMU_NOTIFIER_UNMAP;
> + event.entry.iova = map->iova;
> + event.entry.addr_mask = map->size;
> + event.entry.target_as = &address_space_memory;
> + event.entry.perm = IOMMU_NONE;
> + /* This field is meaningless for unmap */
> + event.entry.translated_addr = 0;
> + memory_region_notify_iommu_one(n, &event);
> +
> + iova_tree_remove(tree, *map);
> + }
> +}
> +
> /* Unmap the whole range in the notifier's scope. */
> static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
> {
> @@ -4432,6 +4469,11 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
> IntelIOMMUState *s = as->iommu_state;
> DMAMap map;
>
> + if (migration_is_running()) {
If the range is not that big, it is still better to unmap it in one go,
right? If so, you might add a check on the range size here and fall back
to the iova_tree iteration only when it is needed.
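Something like the below is what I have in mind. Just a rough sketch, the
helper name and the exact limit calculation are made up here purely for
illustration:

static bool vtd_unmap_range_is_huge(IOMMUNotifier *n)
{
    /*
     * Legacy VFIO caps the dirty bitmap at 256MB, i.e. 8TB of IOVA with
     * 4K pages.  Below that, unmapping in one go should still be cheaper
     * than walking the IOVA tree.  n->end is inclusive, so this compares
     * size - 1 against the limit, which is close enough for a threshold.
     */
    const uint64_t limit = 256ULL * MiB * BITS_PER_BYTE *
                           qemu_real_host_page_size();

    return n->end - n->start >= limit;
}

and then here:

    if (migration_is_running() && vtd_unmap_range_is_huge(n)) {
        vtd_address_space_unmap_in_migration(as, n);
        return;
    }

That would keep the one-go path for the common small ranges and only pay
the tree walk for ranges where the legacy unmap_bitmap would fail anyway.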
> + vtd_address_space_unmap_in_migration(as, n);
> + return;
> + }
> +
> /*
> * Note: all the codes in this function has a assumption that IOVA
> * bits are no more than VTD_MGAW bits (which is restricted by
Regards,
Yi Liu