From: Alex Williamson <alex.williamson@redhat.com>
To: "Huang, Kai" <kai.huang@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>, "Aviv B.D" <bd.aviv@gmail.com>,
Jan Kiszka <jan.kiszka@web.de>,
qemu-devel@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3 1/3] IOMMU: add VTD_CAP_CM to vIOMMU capability exposed to guest
Date: Tue, 7 Jun 2016 12:46:53 -0600
Message-ID: <20160607124653.693f30bb@ul30vt.home>
In-Reply-To: <d2744e04-1f13-e7d8-6933-1c5a2a8fd161@linux.intel.com>
On Tue, 7 Jun 2016 17:21:06 +1200
"Huang, Kai" <kai.huang@linux.intel.com> wrote:
> On 6/7/2016 3:58 PM, Alex Williamson wrote:
> > On Tue, 7 Jun 2016 11:20:32 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> >
> >> On Mon, Jun 06, 2016 at 11:02:11AM -0600, Alex Williamson wrote:
> >>> On Mon, 6 Jun 2016 21:43:17 +0800
> >>> Peter Xu <peterx@redhat.com> wrote:
> >>>
> >>>> On Mon, Jun 06, 2016 at 07:11:41AM -0600, Alex Williamson wrote:
> >>>>> On Mon, 6 Jun 2016 13:04:07 +0800
> >>>>> Peter Xu <peterx@redhat.com> wrote:
> >>>> [...]
> >>>>>> Besides the reason that there might be guests that do not support
> >>>>>> CM=1, will there be performance considerations? When the user's
> >>>>>> configuration does not require the CM capability (e.g., a generic VM
> >>>>>> configuration, without VFIO), shall we allow the user to disable the
> >>>>>> CM bit so that we can get better IOMMU performance (avoiding extra,
> >>>>>> useless invalidations)?
> >>>>>
> >>>>> With Alexey's proposed patch to add callback ops when the iommu
> >>>>> notifier list adds its first entry and removes its last, the
> >>>>> additional overhead of generating notifies when nobody is listening
> >>>>> can be avoided. These same callbacks would be the ones that need to
> >>>>> generate a hw_error if a notifier is added while running with CM=0.
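> >>>>>
> >>>>> A minimal sketch of the idea (hypothetical names, not Alexey's
> >>>>> actual interface; assumes QEMU's hw_error()):
> >>>>>
> >>>>>     struct iommu_mr {
> >>>>>         int notifier_count;     /* registered notifiers */
> >>>>>         bool caching_mode;      /* CM bit exposed to the guest */
> >>>>>     };
> >>>>>
> >>>>>     static void iommu_notifier_added(struct iommu_mr *mr)
> >>>>>     {
> >>>>>         if (mr->notifier_count++ == 0) {
> >>>>>             /* First listener: without CM=1 we cannot replay the
> >>>>>              * guest's mapping changes, so fail loudly. */
> >>>>>             if (!mr->caching_mode) {
> >>>>>                 hw_error("vfio with vIOMMU requires CM=1");
> >>>>>             }
> >>>>>             /* else: start generating map/unmap notifies */
> >>>>>         }
> >>>>>     }
> >>>>>
> >>>>>     static void iommu_notifier_removed(struct iommu_mr *mr)
> >>>>>     {
> >>>>>         if (--mr->notifier_count == 0) {
> >>>>>             /* Last listener gone: skip the notify overhead again */
> >>>>>         }
> >>>>>     }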
> >>>>
> >>>> Not familiar with Alexey's patch
> >>>
> >>> https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg00079.html
> >>
> >> Thanks for the pointer. :)
> >>
> >>>
> >>>> , but is that for VFIO only?
> >>>
> >>> vfio is currently the only user of the iommu notifier, but the
> >>> interface is generic, which is how it should (must) be.
> >>
> >> Yes.
> >>
> >>>
> >>>> I mean, if
> >>>> we configure CM=1, the guest kernel will send an invalidation request
> >>>> every time it creates new entries (context entries, or iotlb
> >>>> entries). Even without VFIO notifiers, the guest needs to trap into
> >>>> QEMU so it can process the invalidation requests. This is avoidable if
> >>>> we are not using VFIO devices at all (so there is no need to maintain
> >>>> any mappings), right?
> >>>
> >>> CM=1 only defines that not-present and invalid entries can be cached;
> >>> any change to existing entries requires an invalidation regardless of
> >>> CM. What you're looking for sounds more like ECAP.C:
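> >>>
> >>> For reference, these are distinct bits in distinct registers: CAP.CM
> >>> is bit 7 of the Capability Register, while ECAP.C is bit 0 of the
> >>> Extended Capability Register. The Linux driver decodes them roughly
> >>> like this:
> >>>
> >>>     #define cap_caching_mode(c)  (((c) >> 7) & 1)
> >>>     #define ecap_coherent(e)     ((e) & 0x1)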
> >>
> >> Yes, but I guess what I was talking about is the CM bit, not ECAP.C.
> >> When we clear/replace one context entry, the guest kernel will
> >> definitely send one context-entry invalidation to QEMU:
> >>
> >> static void domain_context_clear_one(struct intel_iommu *iommu, u8 bus, u8 devfn)
> >> {
> >>         if (!iommu)
> >>                 return;
> >>
> >>         clear_context_table(iommu, bus, devfn);
> >>         iommu->flush.flush_context(iommu, 0, 0, 0,
> >>                                    DMA_CCMD_GLOBAL_INVL);
> >>         iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
> >> }
> >>
> >> ... While if we are creating a new one (like when attaching a new VFIO
> >> device?), the flush is optional, depending on whether the CM bit is
> >> set:
> >>
> >> static int domain_context_mapping_one(struct dmar_domain *domain,
> >>                                       struct intel_iommu *iommu,
> >>                                       u8 bus, u8 devfn)
> >> {
> >> ...
> >>         /*
> >>          * It's a non-present to present mapping. If hardware doesn't cache
> >>          * non-present entry we only need to flush the write-buffer. If the
> >>          * _does_ cache non-present entries, then it does so in the special
> >>          * domain #0, which we have to flush:
> >>          */
> >>         if (cap_caching_mode(iommu->cap)) {
> >>                 iommu->flush.flush_context(iommu, 0,
> >>                                            (((u16)bus) << 8) | devfn,
> >>                                            DMA_CCMD_MASK_NOBIT,
> >>                                            DMA_CCMD_DEVICE_INVL);
> >>                 iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
> >>         } else {
> >>                 iommu_flush_write_buffer(iommu);
> >>         }
> >> ...
> >> }
> >>
> >> Only if cap_caching_mode() is set (which is bit 7, the CM bit) will we
> >> send these invalidations. What I meant is that we should allow the user
> >> to specify the CM bit, so that when we are not using VFIO devices, we
> >> can skip the above flush_context() and flush_iotlb() calls, etc. So,
> >> besides the fact that some guests do not support the CM bit (like
> >> Jailhouse), performance might be another reason to let users specify
> >> the CM bit themselves.
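> >>
> >> (For reference, the guest-visible side of this is just one bit in the
> >> emulated capability register -- roughly what this patch does:
> >>
> >>     #define VTD_CAP_CM  (1ULL << 7)   /* bit 7 of CAP_REG */
> >>     ...
> >>     s->cap |= VTD_CAP_CM;             /* advertise caching mode */
> >>
> >> so making it user-selectable would mean setting or clearing that bit
> >> based on a device property.)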
> >
> > I'm dubious of this; IOMMU drivers are already aware that hardware
> > flushes are expensive and batch them to optimize. The queued
> > invalidation mechanism itself is meant to allow asynchronous
> > invalidations. QEMU invalidating a virtual IOMMU might very well be
> > faster than hardware.
>
> Batching doesn't mean we can eliminate the IOTLB flushes for mappings
> going from non-present to present when CM=1, while with CM=0 those
> IOTLB flushes are not necessary, just as the code above shows.
> Therefore, generally speaking, CM=0 should have better performance than
> CM=1, even for QEMU's vIOMMU.
>
> In my understanding, the purpose of exposing CM=1 is to force the guest
> to do an IOTLB flush for each mapping change (including from
> non-present to present) so QEMU is able to emulate each mapping change
> from the guest (correct me if I am wrong). If the previous statement
> stands, CM=1 is really a workaround for making vfio-assigned devices
> and the vIOMMU work together, and unfortunately this cannot work on
> other vendors' IOMMUs without a CM bit, such as AMD's IOMMU.
>
> So what are the requirements for making vfio-assigned devices and the
> vIOMMU work together? I think it would be more helpful to implement a
> more generic solution that monitors and emulates the guest vIOMMU's
> page tables, rather than simply exposing CM=1 to the guest, as that
> only works on Intel's IOMMU.
>
> And what do you mean by asynchronous invalidations? I think the IOVA of
> the changed mappings cannot be used until the mappings are invalidated.
> It doesn't matter whether the invalidation is done via QI or a
> register.
OK, so you're arguing that CM=0 is more efficient than CM=1 because it
eliminates some portion of the invalidations required of the guest,
while at the same time arguing for a more general solution of shadow
page tables, which would trap into the vIOMMU at every update,
eliminating all the batching the guest IOMMU driver code does in an
attempt to reduce and consolidate the number of flushes. All of this
with only speculation about what might be more efficient. Can we get
the vIOMMU working, especially with assigned devices, before we
speculate further?
How do we expect a vIOMMU to be used in a guest? In the case of
emulated devices, what value does it provide? Are emulated devices
isolated from one another by the vIOMMU? No. Do we have 32-bit
emulated devices for which DMA translation at the vIOMMU is
significantly more efficient than bounce buffers within the guest?
Probably not, and if we did we could just emulate 64-bit devices. So I
assume that beyond being a development tool, the primary useful feature
of a vIOMMU is to expose devices to guest userspace (and thus nested
guests) via tools like vfio. Am I wrong here? In this use case, it's
most efficient to run with iommu=pt in the L1 guest, which would make
any sort of invalidation a very rare event. Making use of vfio inside
the L1 guest would then move a device from the static-identity domain
in the L1 guest into a new domain where the mappings are also
relatively static.

So I really don't see why we're trying to optimize the invalidation
path of the vIOMMU at this point. I also have to believe that the
hardware folks who designed VT-d saw a performance advantage to using
emulated VT-d with CM=1 versus shadowing all of the VT-d data
structures in the hypervisor, or they wouldn't have added this bit to
the specification.

Thanks,
Alex
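
P.S. By "run with iommu=pt" above I mean booting the L1 guest with
something like:

    intel_iommu=on iommu=pt

on its kernel command line, which leaves devices in the static-identity
(passthrough) domain until something like vfio moves them.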