From: Zhenzhong Duan <zhenzhong.duan@intel.com>
To: qemu-devel@nongnu.org
Cc: mst@redhat.com, peterx@redhat.com, jasowang@redhat.com,
pbonzini@redhat.com, richard.henderson@linaro.org,
eduardo@habkost.net, marcel.apfelbaum@gmail.com,
alex.williamson@redhat.com, clg@redhat.com, david@redhat.com,
philmd@linaro.org, kwankhede@nvidia.com, cjia@nvidia.com,
yi.l.liu@intel.com, chao.p.peng@intel.com
Subject: [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty page sync
Date: Thu, 1 Jun 2023 14:33:18 +0800 [thread overview]
Message-ID: <20230601063320.139308-3-zhenzhong.duan@intel.com> (raw)
In-Reply-To: <20230601063320.139308-1-zhenzhong.duan@intel.com>
Peter Xu found a potential issue:
"The other thing is when I am looking at the new code I found that we
actually extended the replay() to be used also in dirty tracking of vfio,
in vfio_sync_dirty_bitmap(). For that maybe it's already broken if
unmap_all() because afaiu log_sync() can be called in migration thread
anytime during DMA so I think it means the device is prone to DMA with the
IOMMU pgtable quickly erased and rebuilt here, which means the DMA could
fail unexpectedly. Copy Alex, Kirti and Neo."
To eliminate this small window with empty mapping, we should remove the
call to unmap_all(). Besides that, introduce a new notifier type called
IOMMU_NOTIFIER_FULL_MAP to get full mappings as intel_iommu only notifies
changed mappings while VFIO dirty page sync needs full mappings. Thanks
to current implementation of iova tree, we could pick mappings from iova
trees directly instead of walking through guest IOMMU page table.
IOMMU_NOTIFIER_MAP is still used to get changed mappings for optimization
purpose. As long as notification for IOMMU_NOTIFIER_MAP could ensure shadow
page table in sync, then it's OK.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 49 +++++++++++++++++++++++++++++++++++--------
hw/vfio/common.c | 2 +-
include/exec/memory.h | 13 ++++++++++++
softmmu/memory.c | 4 ++++
4 files changed, 58 insertions(+), 10 deletions(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 94d52f4205d2..061fcded0dfb 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3819,6 +3819,41 @@ static int vtd_replay_hook(IOMMUTLBEvent *event, void *private)
return 0;
}
+static gboolean vtd_replay_full_map(DMAMap *map, gpointer *private)
+{
+ IOMMUTLBEvent event;
+
+ event.type = IOMMU_NOTIFIER_MAP;
+ event.entry.iova = map->iova;
+ event.entry.addr_mask = map->size;
+ event.entry.target_as = &address_space_memory;
+ event.entry.perm = map->perm;
+ event.entry.translated_addr = map->translated_addr;
+
+ return vtd_replay_hook(&event, private);
+}
+
+/*
+ * This is a fast path to notify the full mappings falling in the scope
+ * of IOMMU notifier. The call site should ensure no iova tree update by
+ * taking necessary locks(e.x. BQL).
+ */
+static int vtd_page_walk_full_map_fast_path(IOVATree *iova_tree,
+ IOMMUNotifier *n)
+{
+ DMAMap map;
+
+ map.iova = n->start;
+ map.size = n->end - n->start;
+ if (!iova_tree_find(iova_tree, &map)) {
+ return 0;
+ }
+
+ iova_tree_foreach_range_data(iova_tree, &map, vtd_replay_full_map,
+ (gpointer *)n);
+ return 0;
+}
+
static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
{
VTDAddressSpace *vtd_as = container_of(iommu_mr, VTDAddressSpace, iommu);
@@ -3826,13 +3861,6 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
uint8_t bus_n = pci_bus_num(vtd_as->bus);
VTDContextEntry ce;
- /*
- * The replay can be triggered by either a invalidation or a newly
- * created entry. No matter what, we release existing mappings
- * (it means flushing caches for UNMAP-only registers).
- */
- vtd_address_space_unmap(vtd_as, n);
-
if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
trace_vtd_replay_ce_valid(s->root_scalable ? "scalable mode" :
"legacy mode",
@@ -3850,8 +3878,11 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
.as = vtd_as,
.domain_id = vtd_get_domain_id(s, &ce, vtd_as->pasid),
};
-
- vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
+ if (n->notifier_flags & IOMMU_NOTIFIER_FULL_MAP) {
+ vtd_page_walk_full_map_fast_path(vtd_as->iova_tree, n);
+ } else {
+ vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
+ }
}
} else {
trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 78358ede2764..5dae4502b908 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1890,7 +1890,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,
iommu_notifier_init(&gdn.n,
vfio_iommu_map_dirty_notify,
- IOMMU_NOTIFIER_MAP,
+ IOMMU_NOTIFIER_FULL_MAP,
section->offset_within_region,
int128_get64(llend),
idx);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index c3661b2276c7..eecc3eec6702 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -142,6 +142,10 @@ struct IOMMUTLBEntry {
* events (e.g. VFIO). Both notifications must be accurate so that
* the shadow page table is fully in sync with the guest view.
*
+ * Besides MAP, there is a special use case called FULL_MAP which
+ * requests notification for all the existent mappings (e.g. VFIO
+ * dirty page sync).
+ *
* (2) When the device doesn't need accurate synchronizations of the
* vIOMMU page tables, it needs to register only with UNMAP or
* DEVIOTLB_UNMAP notifies.
@@ -164,6 +168,8 @@ typedef enum {
IOMMU_NOTIFIER_MAP = 0x2,
/* Notify changes on device IOTLB entries */
IOMMU_NOTIFIER_DEVIOTLB_UNMAP = 0x04,
+ /* Notify every existent entries */
+ IOMMU_NOTIFIER_FULL_MAP = 0x8,
} IOMMUNotifierFlag;
#define IOMMU_NOTIFIER_IOTLB_EVENTS (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
@@ -237,6 +243,13 @@ static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
hwaddr start, hwaddr end,
int iommu_idx)
{
+ /*
+ * memory_region_notify_iommu_one() needs IOMMU_NOTIFIER_MAP set to
+ * trigger notifier.
+ */
+ if (flags & IOMMU_NOTIFIER_FULL_MAP) {
+ flags |= IOMMU_NOTIFIER_MAP;
+ }
n->notify = fn;
n->notifier_flags = flags;
n->start = start;
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7d9494ce7028..0a8465007c66 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1922,6 +1922,10 @@ int memory_region_register_iommu_notifier(MemoryRegion *mr,
assert(n->iommu_idx >= 0 &&
n->iommu_idx < memory_region_iommu_num_indexes(iommu_mr));
+ if (n->notifier_flags & IOMMU_NOTIFIER_FULL_MAP) {
+ error_setg(errp, "FULL_MAP could only be used in replay");
+ }
+
QLIST_INSERT_HEAD(&iommu_mr->iommu_notify, n, node);
ret = memory_region_update_iommu_notify_flags(iommu_mr, errp);
if (ret) {
--
2.34.1
next prev parent reply other threads:[~2023-06-01 6:46 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-01 6:33 [PATCH v2 0/4] Optimize UNMAP call and bug fix Zhenzhong Duan
2023-06-01 6:33 ` [PATCH v2 1/4] util: Add iova_tree_foreach_range_data Zhenzhong Duan
2023-06-05 18:39 ` Peter Xu
2023-06-01 6:33 ` Zhenzhong Duan [this message]
2023-06-05 18:39 ` [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty page sync Peter Xu
2023-06-06 2:35 ` Duan, Zhenzhong
2023-06-06 15:41 ` Peter Xu
2023-06-07 3:14 ` Duan, Zhenzhong
2023-06-07 14:06 ` Peter Xu
2023-06-01 6:33 ` [PATCH v2 3/4] memory: Document update on replay() Zhenzhong Duan
2023-06-05 18:42 ` Peter Xu
2023-06-06 2:48 ` Duan, Zhenzhong
2023-06-01 6:33 ` [PATCH v2 4/4] intel_iommu: Optimize out some unnecessary UNMAP calls Zhenzhong Duan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230601063320.139308-3-zhenzhong.duan@intel.com \
--to=zhenzhong.duan@intel.com \
--cc=alex.williamson@redhat.com \
--cc=chao.p.peng@intel.com \
--cc=cjia@nvidia.com \
--cc=clg@redhat.com \
--cc=david@redhat.com \
--cc=eduardo@habkost.net \
--cc=jasowang@redhat.com \
--cc=kwankhede@nvidia.com \
--cc=marcel.apfelbaum@gmail.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=richard.henderson@linaro.org \
--cc=yi.l.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).