qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: "Duan, Zhenzhong" <zhenzhong.duan@intel.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"mst@redhat.com" <mst@redhat.com>,
	"jasowang@redhat.com" <jasowang@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"richard.henderson@linaro.org" <richard.henderson@linaro.org>,
	"eduardo@habkost.net" <eduardo@habkost.net>,
	"marcel.apfelbaum@gmail.com" <marcel.apfelbaum@gmail.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"clg@redhat.com" <clg@redhat.com>,
	"david@redhat.com" <david@redhat.com>,
	"philmd@linaro.org" <philmd@linaro.org>,
	"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
	"cjia@nvidia.com" <cjia@nvidia.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"Peng, Chao P" <chao.p.peng@intel.com>
Subject: Re: [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty page sync
Date: Wed, 7 Jun 2023 10:06:19 -0400	[thread overview]
Message-ID: <ZICO2/bsxC4cnDRX@x1n> (raw)
In-Reply-To: <SJ0PR11MB6744F1C210BFDB46BD9269389253A@SJ0PR11MB6744.namprd11.prod.outlook.com>

On Wed, Jun 07, 2023 at 03:14:07AM +0000, Duan, Zhenzhong wrote:
> 
> 
> >-----Original Message-----
> >From: Peter Xu <peterx@redhat.com>
> >Sent: Tuesday, June 6, 2023 11:42 PM
> >Subject: Re: [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty
> >page sync
> >
> ...
> >> >> a/include/exec/memory.h b/include/exec/memory.h index
> >> >> c3661b2276c7..eecc3eec6702 100644
> >> >> --- a/include/exec/memory.h
> >> >> +++ b/include/exec/memory.h
> >> >> @@ -142,6 +142,10 @@ struct IOMMUTLBEntry {
> >> >>   *       events (e.g. VFIO). Both notifications must be accurate so that
> >> >>   *       the shadow page table is fully in sync with the guest view.
> >> >>   *
> >> >> + *       Besides MAP, there is a special use case called FULL_MAP which
> >> >> + *       requests notification for all the existent mappings (e.g. VFIO
> >> >> + *       dirty page sync).
> >> >
> >> >Why do we need FULL_MAP?  Can we simply reimpl MAP?
> >>
> >> Sorry, I just realized IOMMU_NOTIFIER_FULL_MAP is confusing.
> >> Maybe IOMMU_NOTIFIER_MAP_FAST_PATH could be a bit more accurate.
> >>
> >> IIUC, currently replay() is called from two paths, one is VFIO device
> >> address space switch which walks over the IOMMU page table to setup
> >> initial mapping and cache it in IOVA tree. The other is VFIO dirty
> >> sync which walks over the IOMMU page table to notify the mapping,
> >> because we already cache the mapping in IOVA tree and VFIO dirty sync
> >> is protected by BQL, so I think it's fine to pick mapping from IOVA
> >> tree directly instead of walking over IOMMU page table. That's the
> >> reason of FULL_MAP (IOMMU_NOTIFIER_MAP_FAST_PATH better).
> >>
> >> About "reimpl MAP", do you mean to walk over IOMMU page table to
> >> notify all existing MAP events without checking with the IOVA tree for
> >> difference? If you prefer, I'll rewrite an implementation this way.
> >
> >We still need to maintain iova tree. IIUC that's the major complexity of vt-d
> >emulation, because we have that extra cache layer to sync with the real guest
> >iommu pgtables.
> 
> Can't agree more, looks only intel-iommu and virtio-iommu implemented such
> optimization for now.
> 
> >
> >But I think we were just wrong to also notify in the unmap_all() procedure.
> >
> >IIUC the right thing to do (keeping replay() the interface as-is, per it used to be
> >defined) is we should replace the unmap_all() to only evacuate the iova tree
> >(keeping all host mappings untouched, IOW, don't notify UNMAP), and do a
> >full resync there, which will notify all existing mappings as MAP.  Then we
> >don't interrupt with any existing mapping if there is (e.g. for the dirty sync
> >case), meanwhile we keep sync too to latest (for moving a vfio device into an
> >existing iommu group).
> >
> >Do you think that'll work for us?
> 
> Yes, I think I get your point.
> Below simple change will work in your suggested way, do you agree?
> 
> @@ -3825,13 +3833,10 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
>      IntelIOMMUState *s = vtd_as->iommu_state;
>      uint8_t bus_n = pci_bus_num(vtd_as->bus);
>      VTDContextEntry ce;
> +    DMAMap map = { .iova = 0, .size = HWADDR_MAX }
> 
> -    /*
> -     * The replay can be triggered by either a invalidation or a newly
> -     * created entry. No matter what, we release existing mappings
> -     * (it means flushing caches for UNMAP-only registers).
> -     */
> -    vtd_address_space_unmap(vtd_as, n);
> +    /* replay is protected by BQL, page walk will re-setup IOVA tree safely */
> +    iova_tree_remove(as->iova_tree, map);
> 
>      if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
>          trace_vtd_replay_ce_valid(s->root_scalable ? "scalable mode" :

Yes, thanks!

-- 
Peter Xu



  reply	other threads:[~2023-06-07 14:31 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-01  6:33 [PATCH v2 0/4] Optimize UNMAP call and bug fix Zhenzhong Duan
2023-06-01  6:33 ` [PATCH v2 1/4] util: Add iova_tree_foreach_range_data Zhenzhong Duan
2023-06-05 18:39   ` Peter Xu
2023-06-01  6:33 ` [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty page sync Zhenzhong Duan
2023-06-05 18:39   ` Peter Xu
2023-06-06  2:35     ` Duan, Zhenzhong
2023-06-06 15:41       ` Peter Xu
2023-06-07  3:14         ` Duan, Zhenzhong
2023-06-07 14:06           ` Peter Xu [this message]
2023-06-01  6:33 ` [PATCH v2 3/4] memory: Document update on replay() Zhenzhong Duan
2023-06-05 18:42   ` Peter Xu
2023-06-06  2:48     ` Duan, Zhenzhong
2023-06-01  6:33 ` [PATCH v2 4/4] intel_iommu: Optimize out some unnecessary UNMAP calls Zhenzhong Duan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZICO2/bsxC4cnDRX@x1n \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=chao.p.peng@intel.com \
    --cc=cjia@nvidia.com \
    --cc=clg@redhat.com \
    --cc=david@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=jasowang@redhat.com \
    --cc=kwankhede@nvidia.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=yi.l.liu@intel.com \
    --cc=zhenzhong.duan@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).