From: Peter Xu <peterx@redhat.com>
To: "Duan, Zhenzhong" <zhenzhong.duan@intel.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"mst@redhat.com" <mst@redhat.com>,
"jasowang@redhat.com" <jasowang@redhat.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"richard.henderson@linaro.org" <richard.henderson@linaro.org>,
"eduardo@habkost.net" <eduardo@habkost.net>,
"marcel.apfelbaum@gmail.com" <marcel.apfelbaum@gmail.com>,
"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
"clg@redhat.com" <clg@redhat.com>,
"david@redhat.com" <david@redhat.com>,
"philmd@linaro.org" <philmd@linaro.org>,
"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
"cjia@nvidia.com" <cjia@nvidia.com>,
"Liu, Yi L" <yi.l.liu@intel.com>,
"Peng, Chao P" <chao.p.peng@intel.com>
Subject: Re: [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty page sync
Date: Wed, 7 Jun 2023 10:06:19 -0400 [thread overview]
Message-ID: <ZICO2/bsxC4cnDRX@x1n> (raw)
In-Reply-To: <SJ0PR11MB6744F1C210BFDB46BD9269389253A@SJ0PR11MB6744.namprd11.prod.outlook.com>
On Wed, Jun 07, 2023 at 03:14:07AM +0000, Duan, Zhenzhong wrote:
>
>
> >-----Original Message-----
> >From: Peter Xu <peterx@redhat.com>
> >Sent: Tuesday, June 6, 2023 11:42 PM
> >Subject: Re: [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty
> >page sync
> >
> ...
> >> >> a/include/exec/memory.h b/include/exec/memory.h index
> >> >> c3661b2276c7..eecc3eec6702 100644
> >> >> --- a/include/exec/memory.h
> >> >> +++ b/include/exec/memory.h
> >> >> @@ -142,6 +142,10 @@ struct IOMMUTLBEntry {
> >> >> * events (e.g. VFIO). Both notifications must be accurate so that
> >> >> * the shadow page table is fully in sync with the guest view.
> >> >> *
> >> >> + * Besides MAP, there is a special use case called FULL_MAP which
> >> >> + * requests notification for all the existent mappings (e.g. VFIO
> >> >> + * dirty page sync).
> >> >
> >> >Why do we need FULL_MAP? Can we simply reimpl MAP?
> >>
> >> Sorry, I just realized IOMMU_NOTIFIER_FULL_MAP is confusing.
> >> Maybe IOMMU_NOTIFIER_MAP_FAST_PATH could be a bit more accurate.
> >>
> >> IIUC, currently replay() is called from two paths, one is VFIO device
> >> address space switch which walks over the IOMMU page table to setup
> >> initial mapping and cache it in IOVA tree. The other is VFIO dirty
> >> sync which walks over the IOMMU page table to notify the mapping,
> >> because we already cache the mapping in IOVA tree and VFIO dirty sync
> >> is protected by BQL, so I think it's fine to pick mapping from IOVA
> >> tree directly instead of walking over IOMMU page table. That's the
> >> reason of FULL_MAP (IOMMU_NOTIFIER_MAP_FAST_PATH better).
> >>
> >> About "reimpl MAP", do you mean to walk over IOMMU page table to
> >> notify all existing MAP events without checking with the IOVA tree for
> >> difference? If you prefer, I'll rewrite an implementation this way.
> >
> >We still need to maintain iova tree. IIUC that's the major complexity of vt-d
> >emulation, because we have that extra cache layer to sync with the real guest
> >iommu pgtables.
>
> Can't agree more, looks only intel-iommu and virtio-iommu implemented such
> optimization for now.
>
> >
> >But I think we were just wrong to also notify in the unmap_all() procedure.
> >
> >IIUC the right thing to do (keeping replay() the interface as-is, per it used to be
> >defined) is we should replace the unmap_all() to only evacuate the iova tree
> >(keeping all host mappings untouched, IOW, don't notify UNMAP), and do a
> >full resync there, which will notify all existing mappings as MAP. Then we
> >don't interrupt with any existing mapping if there is (e.g. for the dirty sync
> >case), meanwhile we keep sync too to latest (for moving a vfio device into an
> >existing iommu group).
> >
> >Do you think that'll work for us?
>
> Yes, I think I get your point.
> Below simple change will work in your suggested way, do you agree?
>
> @@ -3825,13 +3833,10 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> IntelIOMMUState *s = vtd_as->iommu_state;
> uint8_t bus_n = pci_bus_num(vtd_as->bus);
> VTDContextEntry ce;
> + DMAMap map = { .iova = 0, .size = HWADDR_MAX }
>
> - /*
> - * The replay can be triggered by either a invalidation or a newly
> - * created entry. No matter what, we release existing mappings
> - * (it means flushing caches for UNMAP-only registers).
> - */
> - vtd_address_space_unmap(vtd_as, n);
> + /* replay is protected by BQL, page walk will re-setup IOVA tree safely */
> + iova_tree_remove(as->iova_tree, map);
>
> if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
> trace_vtd_replay_ce_valid(s->root_scalable ? "scalable mode" :
Yes, thanks!
--
Peter Xu
next prev parent reply other threads:[~2023-06-07 14:31 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-01 6:33 [PATCH v2 0/4] Optimize UNMAP call and bug fix Zhenzhong Duan
2023-06-01 6:33 ` [PATCH v2 1/4] util: Add iova_tree_foreach_range_data Zhenzhong Duan
2023-06-05 18:39 ` Peter Xu
2023-06-01 6:33 ` [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty page sync Zhenzhong Duan
2023-06-05 18:39 ` Peter Xu
2023-06-06 2:35 ` Duan, Zhenzhong
2023-06-06 15:41 ` Peter Xu
2023-06-07 3:14 ` Duan, Zhenzhong
2023-06-07 14:06 ` Peter Xu [this message]
2023-06-01 6:33 ` [PATCH v2 3/4] memory: Document update on replay() Zhenzhong Duan
2023-06-05 18:42 ` Peter Xu
2023-06-06 2:48 ` Duan, Zhenzhong
2023-06-01 6:33 ` [PATCH v2 4/4] intel_iommu: Optimize out some unnecessary UNMAP calls Zhenzhong Duan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZICO2/bsxC4cnDRX@x1n \
--to=peterx@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=chao.p.peng@intel.com \
--cc=cjia@nvidia.com \
--cc=clg@redhat.com \
--cc=david@redhat.com \
--cc=eduardo@habkost.net \
--cc=jasowang@redhat.com \
--cc=kwankhede@nvidia.com \
--cc=marcel.apfelbaum@gmail.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=richard.henderson@linaro.org \
--cc=yi.l.liu@intel.com \
--cc=zhenzhong.duan@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.