From: Zhenzhong Duan <zhenzhong.duan@intel.com>
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, mst@redhat.com,
jasowang@redhat.com, yi.l.liu@intel.com,
clement.mathieu--drif@eviden.com, eric.auger@redhat.com,
joao.m.martins@oracle.com, avihaih@nvidia.com,
xudong.hao@intel.com, giovanni.cabiddu@intel.com,
rohith.s.r@intel.com, mark.gross@intel.com,
arjan.van.de.ven@intel.com,
Zhenzhong Duan <zhenzhong.duan@intel.com>
Subject: [PATCH v3 6/8] intel_iommu: Fix unmap_bitmap failure with legacy VFIO backend
Date: Thu, 23 Oct 2025 22:09:20 -0400
Message-ID: <20251024020922.13053-7-zhenzhong.duan@intel.com>
In-Reply-To: <20251024020922.13053-1-zhenzhong.duan@intel.com>

If a VFIO device in the guest switches from an IOMMU domain to a block
domain, vtd_address_space_unmap() is called to unmap the whole address
space. If that happens during migration, migration fails with the legacy
VFIO backend as below:

Status: failed (vfio_container_dma_unmap(0x561bbbd92d90, 0x100000000000, 0x100000000000) = -7 (Argument list too long))

This is because the legacy VFIO backend limits the dirty bitmap to 256MB,
which covers 8TB of IOVA space on a 4K page system (one bit per page).
When a 16TB UNMAP notification is sent, the unmap_bitmap ioctl fails.
Such large UNMAP notifications normally come from the size of the IOVA
range rather than from the amount of system memory.
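
The limit can be checked with a little arithmetic: 256MB of bitmap is
2^31 bits, and at 4KB per bit that covers 2^43 bytes (8TB), whereas the
16TB (0x100000000000) range in the failure above would need a 512MB
bitmap. A minimal C sketch of that calculation follows; the helper names
are hypothetical, and the 256MB figure is the one quoted above rather
than a value read from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Bitmap bytes needed for range_bytes of IOVA: one dirty bit per page. */
static uint64_t bitmap_bytes_for_range(uint64_t range_bytes, uint64_t page_size)
{
    assert(page_size != 0);
    uint64_t pages = (range_bytes + page_size - 1) / page_size;
    return (pages + 7) / 8; /* round up to whole bytes */
}

/* Largest IOVA range that a bitmap of max_bitmap_bytes can describe. */
static uint64_t max_range_for_bitmap(uint64_t max_bitmap_bytes, uint64_t page_size)
{
    assert(page_size != 0);
    return max_bitmap_bytes * 8 * page_size;
}
```

With these helpers, max_range_for_bitmap(256ULL << 20, 4096) evaluates to
8ULL << 40 (8TB), and bitmap_bytes_for_range(0x100000000000ULL, 4096)
evaluates to 512ULL << 20 (512MB), twice the assumed limit.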

Fix it by iterating over the DMAMap list and unmapping each range that has
an active mapping when migration is active. If migration is not active,
unmapping the whole address space in one go remains optimal.
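
The per-range loop can be modelled in plain C as below. This is a
hypothetical, self-contained sketch rather than QEMU code: Range stands in
for DMAMap (with the same inclusive size convention, size = last - iova),
RangeSet is a small array standing in for the IOVA tree, and UnmapLog
records the UNMAP notifications that would be sent. The loop repeatedly
looks up any mapping overlapping [start, end], records it using its
inclusive size as the address mask, and removes it, until nothing
overlapping remains. The sketch computes the target's inclusive size as
end - start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 16

/* Hypothetical stand-in for DMAMap: size is inclusive (last - iova). */
typedef struct {
    uint64_t iova;
    uint64_t size;
} Range;

/* Hypothetical stand-in for the IOVA tree: a small unsorted array. */
typedef struct {
    Range maps[MAX_RANGES];
    size_t n;
} RangeSet;

/* Hypothetical notifier: records each UNMAP as (iova, addr_mask). */
typedef struct {
    Range events[MAX_RANGES];
    size_t n;
} UnmapLog;

/* Two inclusive intervals overlap unless one ends before the other starts. */
static int ranges_overlap(const Range *a, const Range *b)
{
    return a->iova <= b->iova + b->size && b->iova <= a->iova + a->size;
}

static const Range *rangeset_find(const RangeSet *s, const Range *target)
{
    for (size_t i = 0; i < s->n; i++) {
        if (ranges_overlap(&s->maps[i], target)) {
            return &s->maps[i];
        }
    }
    return NULL;
}

static void rangeset_remove(RangeSet *s, Range r)
{
    for (size_t i = 0; i < s->n; i++) {
        if (s->maps[i].iova == r.iova && s->maps[i].size == r.size) {
            s->maps[i] = s->maps[--s->n]; /* swap-remove, order not kept */
            return;
        }
    }
}

/* Send one UNMAP per mapping overlapping [start, end], then drop it. */
static void unmap_per_range(RangeSet *s, uint64_t start, uint64_t end,
                            UnmapLog *log)
{
    const Range target = { .iova = start, .size = end - start };
    const Range *map;

    assert(start <= end);
    while ((map = rangeset_find(s, &target))) {
        Range copy = *map;            /* copy first: removal overwrites *map */
        assert(log->n < MAX_RANGES);
        log->events[log->n++] = copy; /* addr_mask == inclusive size */
        rangeset_remove(s, copy);
    }
}
```

Because each notification covers only an existing mapping, the dirty
bitmap needed per call is bounded by the mapping size (4KB or a huge page)
instead of the whole notifier range, and mappings outside [start, end]
are left untouched.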

The iommufd backend has no such limitation, but allocating a large bitmap
is still not optimal there: there may be large holes between IOVA ranges,
and allocating bitmap space for those holes and dirty tracking them is
time-consuming and useless work.
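
To make the cost of the holes concrete, the sketch below compares the
bitmap needed to track a whole range against the sum of per-mapping
bitmaps. It assumes one bit per 4KB page rounded up to whole bytes (the
real backends may round to larger units); the Mapping type and helper
names are hypothetical, and the mappings in the usage are made-up
examples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping record; size is inclusive (last - iova), as in DMAMap. */
typedef struct {
    uint64_t iova;
    uint64_t size;
} Mapping;

/* Bytes of dirty bitmap for len bytes of IOVA: one bit per page, rounded up. */
static uint64_t bitmap_bytes(uint64_t len, uint64_t page_size)
{
    assert(page_size != 0);
    uint64_t pages = (len + page_size - 1) / page_size;
    return (pages + 7) / 8;
}

/* Cost of tracking the whole inclusive range [start, end] in one bitmap. */
static uint64_t whole_range_cost(uint64_t start, uint64_t end, uint64_t page_size)
{
    assert(start <= end);
    return bitmap_bytes(end - start + 1, page_size);
}

/* Cost of tracking only the mapped ranges, one bitmap per mapping. */
static uint64_t per_mapping_cost(const Mapping *maps, size_t n, uint64_t page_size)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        total += bitmap_bytes(maps[i].size + 1, page_size);
    }
    return total;
}
```

For a 16TB range containing one 4KB mapping and one 2MB mapping,
whole_range_cost() reports 512MB while per_mapping_cost() reports 65
bytes (1 byte for the 4KB page plus 64 bytes for the 512 pages of the 2MB
mapping).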

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Rohith S R <rohith.s.r@intel.com>
---
hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c402643b56..b00fdecaf8 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -37,6 +37,7 @@
 #include "system/system.h"
 #include "hw/i386/apic_internal.h"
 #include "kvm/kvm_i386.h"
+#include "migration/misc.h"
 #include "migration/vmstate.h"
 #include "trace.h"
@@ -4695,6 +4696,42 @@ static void vtd_dev_unset_iommu_device(PCIBus *bus, void *opaque, int devfn)
     vtd_iommu_unlock(s);
 }
 
+/*
+ * Unmapping a large range in one go is not optimal during migration because
+ * a large dirty bitmap needs to be allocated while there may be only small
+ * mappings. Iterate over the DMAMap list to unmap each active mapping instead.
+ */
+static void vtd_address_space_unmap_in_migration(VTDAddressSpace *as,
+                                                 IOMMUNotifier *n)
+{
+    const DMAMap *map;
+    const DMAMap target = {
+        .iova = n->start,
+        .size = n->end - n->start, /* Inclusive */
+    };
+    IOVATree *tree = as->iova_tree;
+
+    /*
+     * DMAMap is created during IOMMU page table sync; it is either 4KB or
+     * huge page sized and always a power of 2 in size, so the range of a
+     * DMAMap can be used for the UNMAP notification directly.
+     */
+    while ((map = iova_tree_find(tree, &target))) {
+        IOMMUTLBEvent event;
+
+        event.type = IOMMU_NOTIFIER_UNMAP;
+        event.entry.iova = map->iova;
+        event.entry.addr_mask = map->size;
+        event.entry.target_as = &address_space_memory;
+        event.entry.perm = IOMMU_NONE;
+        /* This field is meaningless for unmap */
+        event.entry.translated_addr = 0;
+        memory_region_notify_iommu_one(n, &event);
+
+        iova_tree_remove(tree, *map);
+    }
+}
+
 /* Unmap the whole range in the notifier's scope. */
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
 {
@@ -4704,6 +4741,11 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
     IntelIOMMUState *s = as->iommu_state;
     DMAMap map;
 
+    if (migration_is_running()) {
+        vtd_address_space_unmap_in_migration(as, n);
+        return;
+    }
+
     /*
      * Note: all the codes in this function has a assumption that IOVA
      * bits are no more than VTD_MGAW bits (which is restricted by
--
2.47.1