qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Zhenzhong Duan <zhenzhong.duan@intel.com>
Cc: qemu-devel@nongnu.org, mst@redhat.com, jasowang@redhat.com,
	pbonzini@redhat.com, richard.henderson@linaro.org,
	eduardo@habkost.net, marcel.apfelbaum@gmail.com,
	alex.williamson@redhat.com, clg@redhat.com, david@redhat.com,
	philmd@linaro.org, kwankhede@nvidia.com, cjia@nvidia.com,
	yi.l.liu@intel.com, chao.p.peng@intel.com
Subject: Re: [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty page sync
Date: Mon, 5 Jun 2023 14:39:24 -0400	[thread overview]
Message-ID: <ZH4r3FCIU8uOiV8h@x1n> (raw)
In-Reply-To: <20230601063320.139308-3-zhenzhong.duan@intel.com>

On Thu, Jun 01, 2023 at 02:33:18PM +0800, Zhenzhong Duan wrote:
> Peter Xu found a potential issue:
> 
> "The other thing is when I am looking at the new code I found that we
> actually extended the replay() to be used also in dirty tracking of vfio,
> in vfio_sync_dirty_bitmap().  For that maybe it's already broken if
> unmap_all() because afaiu log_sync() can be called in migration thread
> anytime during DMA so I think it means the device is prone to DMA with the
> IOMMU pgtable quickly erased and rebuilt here, which means the DMA could
> fail unexpectedly.  Copy Alex, Kirti and Neo."
> 
> To eliminate this small window with empty mapping, we should remove the
> call to unmap_all(). Besides that, introduce a new notifier type called
> IOMMU_NOTIFIER_FULL_MAP to get full mappings as intel_iommu only notifies
> changed mappings while VFIO dirty page sync needs full mappings. Thanks
> to current implementation of iova tree, we could pick mappings from iova
> trees directly instead of walking through guest IOMMU page table.
> 
> IOMMU_NOTIFIER_MAP is still used to get changed mappings for optimization
> purpose. As long as notification for IOMMU_NOTIFIER_MAP could ensure shadow
> page table in sync, then it's OK.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>  hw/i386/intel_iommu.c | 49 +++++++++++++++++++++++++++++++++++--------
>  hw/vfio/common.c      |  2 +-
>  include/exec/memory.h | 13 ++++++++++++
>  softmmu/memory.c      |  4 ++++
>  4 files changed, 58 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 94d52f4205d2..061fcded0dfb 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3819,6 +3819,41 @@ static int vtd_replay_hook(IOMMUTLBEvent *event, void *private)
>      return 0;
>  }
>  
> +static gboolean vtd_replay_full_map(DMAMap *map, gpointer *private)
> +{
> +    IOMMUTLBEvent event;
> +
> +    event.type = IOMMU_NOTIFIER_MAP;
> +    event.entry.iova = map->iova;
> +    event.entry.addr_mask = map->size;
> +    event.entry.target_as = &address_space_memory;
> +    event.entry.perm = map->perm;
> +    event.entry.translated_addr = map->translated_addr;
> +
> +    return vtd_replay_hook(&event, private);
> +}
> +
> +/*
> + * This is a fast path to notify the full mappings falling in the scope
> + * of IOMMU notifier. The call site should ensure no iova tree update by
> + * taking necessary locks(e.x. BQL).

We should be accurate on the locking - I think it's the BQL so far.

> + */
> +static int vtd_page_walk_full_map_fast_path(IOVATree *iova_tree,
> +                                            IOMMUNotifier *n)
> +{
> +    DMAMap map;
> +
> +    map.iova = n->start;
> +    map.size = n->end - n->start;
> +    if (!iova_tree_find(iova_tree, &map)) {
> +        return 0;
> +    }
> +
> +    iova_tree_foreach_range_data(iova_tree, &map, vtd_replay_full_map,
> +                                 (gpointer *)n);
> +    return 0;
> +}
> +
>  static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
>  {
>      VTDAddressSpace *vtd_as = container_of(iommu_mr, VTDAddressSpace, iommu);
> @@ -3826,13 +3861,6 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
>      uint8_t bus_n = pci_bus_num(vtd_as->bus);
>      VTDContextEntry ce;
>  
> -    /*
> -     * The replay can be triggered by either a invalidation or a newly
> -     * created entry. No matter what, we release existing mappings
> -     * (it means flushing caches for UNMAP-only registers).
> -     */
> -    vtd_address_space_unmap(vtd_as, n);
> -
>      if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
>          trace_vtd_replay_ce_valid(s->root_scalable ? "scalable mode" :
>                                    "legacy mode",
> @@ -3850,8 +3878,11 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
>                  .as = vtd_as,
>                  .domain_id = vtd_get_domain_id(s, &ce, vtd_as->pasid),
>              };
> -
> -            vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
> +            if (n->notifier_flags & IOMMU_NOTIFIER_FULL_MAP) {
> +                vtd_page_walk_full_map_fast_path(vtd_as->iova_tree, n);
> +            } else {
> +                vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
> +            }
>          }
>      } else {
>          trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 78358ede2764..5dae4502b908 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1890,7 +1890,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,
>  
>                  iommu_notifier_init(&gdn.n,
>                                      vfio_iommu_map_dirty_notify,
> -                                    IOMMU_NOTIFIER_MAP,
> +                                    IOMMU_NOTIFIER_FULL_MAP,
>                                      section->offset_within_region,
>                                      int128_get64(llend),
>                                      idx);
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index c3661b2276c7..eecc3eec6702 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -142,6 +142,10 @@ struct IOMMUTLBEntry {
>   *       events (e.g. VFIO). Both notifications must be accurate so that
>   *       the shadow page table is fully in sync with the guest view.
>   *
> + *       Besides MAP, there is a special use case called FULL_MAP which
> + *       requests notification for all the existent mappings (e.g. VFIO
> + *       dirty page sync).

Why do we need FULL_MAP?  Can we simply reimpl MAP?

> + *
>   *   (2) When the device doesn't need accurate synchronizations of the
>   *       vIOMMU page tables, it needs to register only with UNMAP or
>   *       DEVIOTLB_UNMAP notifies.
> @@ -164,6 +168,8 @@ typedef enum {
>      IOMMU_NOTIFIER_MAP = 0x2,
>      /* Notify changes on device IOTLB entries */
>      IOMMU_NOTIFIER_DEVIOTLB_UNMAP = 0x04,
> +    /* Notify every existent entries */
> +    IOMMU_NOTIFIER_FULL_MAP = 0x8,
>  } IOMMUNotifierFlag;
>  
>  #define IOMMU_NOTIFIER_IOTLB_EVENTS (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
> @@ -237,6 +243,13 @@ static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
>                                         hwaddr start, hwaddr end,
>                                         int iommu_idx)
>  {
> +    /*
> +     * memory_region_notify_iommu_one() needs IOMMU_NOTIFIER_MAP set to
> +     * trigger notifier.
> +     */
> +    if (flags & IOMMU_NOTIFIER_FULL_MAP) {
> +        flags |= IOMMU_NOTIFIER_MAP;
> +    }
>      n->notify = fn;
>      n->notifier_flags = flags;
>      n->start = start;
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 7d9494ce7028..0a8465007c66 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -1922,6 +1922,10 @@ int memory_region_register_iommu_notifier(MemoryRegion *mr,
>      assert(n->iommu_idx >= 0 &&
>             n->iommu_idx < memory_region_iommu_num_indexes(iommu_mr));
>  
> +    if (n->notifier_flags & IOMMU_NOTIFIER_FULL_MAP) {
> +        error_setg(errp, "FULL_MAP could only be used in replay");
> +    }
> +
>      QLIST_INSERT_HEAD(&iommu_mr->iommu_notify, n, node);
>      ret = memory_region_update_iommu_notify_flags(iommu_mr, errp);
>      if (ret) {
> -- 
> 2.34.1
> 

-- 
Peter Xu



  reply	other threads:[~2023-06-05 18:40 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-01  6:33 [PATCH v2 0/4] Optimize UNMAP call and bug fix Zhenzhong Duan
2023-06-01  6:33 ` [PATCH v2 1/4] util: Add iova_tree_foreach_range_data Zhenzhong Duan
2023-06-05 18:39   ` Peter Xu
2023-06-01  6:33 ` [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty page sync Zhenzhong Duan
2023-06-05 18:39   ` Peter Xu [this message]
2023-06-06  2:35     ` Duan, Zhenzhong
2023-06-06 15:41       ` Peter Xu
2023-06-07  3:14         ` Duan, Zhenzhong
2023-06-07 14:06           ` Peter Xu
2023-06-01  6:33 ` [PATCH v2 3/4] memory: Document update on replay() Zhenzhong Duan
2023-06-05 18:42   ` Peter Xu
2023-06-06  2:48     ` Duan, Zhenzhong
2023-06-01  6:33 ` [PATCH v2 4/4] intel_iommu: Optimize out some unnecessary UNMAP calls Zhenzhong Duan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZH4r3FCIU8uOiV8h@x1n \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=chao.p.peng@intel.com \
    --cc=cjia@nvidia.com \
    --cc=clg@redhat.com \
    --cc=david@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=jasowang@redhat.com \
    --cc=kwankhede@nvidia.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=yi.l.liu@intel.com \
    --cc=zhenzhong.duan@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).