From: Peter Xu <peterx@redhat.com>
To: Zhenzhong Duan <zhenzhong.duan@intel.com>
Cc: qemu-devel@nongnu.org, mst@redhat.com, jasowang@redhat.com,
pbonzini@redhat.com, richard.henderson@linaro.org,
eduardo@habkost.net, marcel.apfelbaum@gmail.com,
alex.williamson@redhat.com, clg@redhat.com, david@redhat.com,
philmd@linaro.org, kwankhede@nvidia.com, cjia@nvidia.com,
yi.l.liu@intel.com, chao.p.peng@intel.com
Subject: Re: [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty page sync
Date: Mon, 5 Jun 2023 14:39:24 -0400
Message-ID: <ZH4r3FCIU8uOiV8h@x1n>
In-Reply-To: <20230601063320.139308-3-zhenzhong.duan@intel.com>
On Thu, Jun 01, 2023 at 02:33:18PM +0800, Zhenzhong Duan wrote:
> Peter Xu found a potential issue:
>
> "The other thing is when I am looking at the new code I found that we
> actually extended the replay() to be used also in dirty tracking of vfio,
> in vfio_sync_dirty_bitmap(). For that maybe it's already broken if
> unmap_all() because afaiu log_sync() can be called in migration thread
> anytime during DMA so I think it means the device is prone to DMA with the
> IOMMU pgtable quickly erased and rebuilt here, which means the DMA could
> fail unexpectedly. Copy Alex, Kirti and Neo."
>
> To eliminate this small window with an empty mapping, remove the call to
> unmap_all(). Besides that, introduce a new notifier type,
> IOMMU_NOTIFIER_FULL_MAP, to retrieve the full set of mappings:
> intel_iommu only notifies changed mappings, while VFIO dirty page sync
> needs all of them. Thanks to the current iova tree implementation, we can
> pick mappings directly from the iova tree instead of walking the guest
> IOMMU page table.
>
> IOMMU_NOTIFIER_MAP is still used to get changed mappings, as an
> optimization. As long as IOMMU_NOTIFIER_MAP notifications keep the shadow
> page table in sync, this is fine.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/i386/intel_iommu.c | 49 +++++++++++++++++++++++++++++++++++--------
> hw/vfio/common.c | 2 +-
> include/exec/memory.h | 13 ++++++++++++
> softmmu/memory.c | 4 ++++
> 4 files changed, 58 insertions(+), 10 deletions(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 94d52f4205d2..061fcded0dfb 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3819,6 +3819,41 @@ static int vtd_replay_hook(IOMMUTLBEvent *event, void *private)
> return 0;
> }
>
> +static gboolean vtd_replay_full_map(DMAMap *map, gpointer *private)
> +{
> + IOMMUTLBEvent event;
> +
> + event.type = IOMMU_NOTIFIER_MAP;
> + event.entry.iova = map->iova;
> + event.entry.addr_mask = map->size;
> + event.entry.target_as = &address_space_memory;
> + event.entry.perm = map->perm;
> + event.entry.translated_addr = map->translated_addr;
> +
> + return vtd_replay_hook(&event, private);
> +}
> +
> +/*
> + * This is a fast path to notify the full mappings falling in the scope
> + * of IOMMU notifier. The call site should ensure no iova tree update by
> + * taking necessary locks (e.g. BQL).
We should be accurate on the locking - I think it's the BQL so far.
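E.g. maybe something like this (assuming the BQL is indeed the only lock
protecting the iova tree on this path):

  /*
   * This is a fast path to notify all existing mappings that fall
   * within the scope of the IOMMU notifier.  The caller must hold the
   * BQL so that the iova tree cannot be updated while we walk it.
   */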
> + */
> +static int vtd_page_walk_full_map_fast_path(IOVATree *iova_tree,
> + IOMMUNotifier *n)
> +{
> + DMAMap map;
> +
> + map.iova = n->start;
> + map.size = n->end - n->start;
> + if (!iova_tree_find(iova_tree, &map)) {
> + return 0;
> + }
> +
> + iova_tree_foreach_range_data(iova_tree, &map, vtd_replay_full_map,
> + (gpointer *)n);
> + return 0;
> +}
> +
> static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> {
> VTDAddressSpace *vtd_as = container_of(iommu_mr, VTDAddressSpace, iommu);
> @@ -3826,13 +3861,6 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> uint8_t bus_n = pci_bus_num(vtd_as->bus);
> VTDContextEntry ce;
>
> - /*
> - * The replay can be triggered by either a invalidation or a newly
> - * created entry. No matter what, we release existing mappings
> - * (it means flushing caches for UNMAP-only registers).
> - */
> - vtd_address_space_unmap(vtd_as, n);
> -
> if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
> trace_vtd_replay_ce_valid(s->root_scalable ? "scalable mode" :
> "legacy mode",
> @@ -3850,8 +3878,11 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> .as = vtd_as,
> .domain_id = vtd_get_domain_id(s, &ce, vtd_as->pasid),
> };
> -
> - vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
> + if (n->notifier_flags & IOMMU_NOTIFIER_FULL_MAP) {
> + vtd_page_walk_full_map_fast_path(vtd_as->iova_tree, n);
> + } else {
> + vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
> + }
> }
> } else {
> trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 78358ede2764..5dae4502b908 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1890,7 +1890,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,
>
> iommu_notifier_init(&gdn.n,
> vfio_iommu_map_dirty_notify,
> - IOMMU_NOTIFIER_MAP,
> + IOMMU_NOTIFIER_FULL_MAP,
> section->offset_within_region,
> int128_get64(llend),
> idx);
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index c3661b2276c7..eecc3eec6702 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -142,6 +142,10 @@ struct IOMMUTLBEntry {
> * events (e.g. VFIO). Both notifications must be accurate so that
> * the shadow page table is fully in sync with the guest view.
> *
> + * Besides MAP, there is a special use case called FULL_MAP which
> + * requests notification for all the existing mappings (e.g. VFIO
> + * dirty page sync).
Why do we need FULL_MAP? Can we simply reimpl MAP?
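For example (a completely untested sketch, just to make the question
concrete): if the iova tree already mirrors everything that is currently
mapped, replay() could serve any notifier registered with MAP from the
tree directly, so no new flag would be needed:

    if (n->notifier_flags & IOMMU_NOTIFIER_MAP) {
        vtd_page_walk_full_map_fast_path(vtd_as->iova_tree, n);
    } else {
        vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
    }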
> + *
> * (2) When the device doesn't need accurate synchronizations of the
> * vIOMMU page tables, it needs to register only with UNMAP or
> * DEVIOTLB_UNMAP notifies.
> @@ -164,6 +168,8 @@ typedef enum {
> IOMMU_NOTIFIER_MAP = 0x2,
> /* Notify changes on device IOTLB entries */
> IOMMU_NOTIFIER_DEVIOTLB_UNMAP = 0x04,
> + /* Notify all existing entries */
> + IOMMU_NOTIFIER_FULL_MAP = 0x8,
> } IOMMUNotifierFlag;
>
> #define IOMMU_NOTIFIER_IOTLB_EVENTS (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
> @@ -237,6 +243,13 @@ static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
> hwaddr start, hwaddr end,
> int iommu_idx)
> {
> + /*
> + * memory_region_notify_iommu_one() needs IOMMU_NOTIFIER_MAP set to
> + * trigger notifier.
> + */
> + if (flags & IOMMU_NOTIFIER_FULL_MAP) {
> + flags |= IOMMU_NOTIFIER_MAP;
> + }
> n->notify = fn;
> n->notifier_flags = flags;
> n->start = start;
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 7d9494ce7028..0a8465007c66 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -1922,6 +1922,10 @@ int memory_region_register_iommu_notifier(MemoryRegion *mr,
> assert(n->iommu_idx >= 0 &&
> n->iommu_idx < memory_region_iommu_num_indexes(iommu_mr));
>
> + if (n->notifier_flags & IOMMU_NOTIFIER_FULL_MAP) {
> + error_setg(errp, "FULL_MAP could only be used in replay");
> + }
> +
> QLIST_INSERT_HEAD(&iommu_mr->iommu_notify, n, node);
> ret = memory_region_update_iommu_notify_flags(iommu_mr, errp);
> if (ret) {
> --
> 2.34.1
>
--
Peter Xu