From: "Huang, Kai" <kai.huang@linux.intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "Aviv B.D" <bd.aviv@gmail.com>, Jan Kiszka <jan.kiszka@web.de>,
qemu-devel@nongnu.org, Peter Xu <peterx@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3 1/3] IOMMU: add VTD_CAP_CM to vIOMMU capability exposed to guest
Date: Wed, 8 Jun 2016 10:39:19 +1200
Message-ID: <11e4b619-12ea-cda1-75f1-a106fadce96f@linux.intel.com>
In-Reply-To: <20160607124653.693f30bb@ul30vt.home>
On 6/8/2016 6:46 AM, Alex Williamson wrote:
> On Tue, 7 Jun 2016 17:21:06 +1200
> "Huang, Kai" <kai.huang@linux.intel.com> wrote:
>
>> On 6/7/2016 3:58 PM, Alex Williamson wrote:
>>> On Tue, 7 Jun 2016 11:20:32 +0800
>>> Peter Xu <peterx@redhat.com> wrote:
>>>
>>>> On Mon, Jun 06, 2016 at 11:02:11AM -0600, Alex Williamson wrote:
>>>>> On Mon, 6 Jun 2016 21:43:17 +0800
>>>>> Peter Xu <peterx@redhat.com> wrote:
>>>>>
>>>>>> On Mon, Jun 06, 2016 at 07:11:41AM -0600, Alex Williamson wrote:
>>>>>>> On Mon, 6 Jun 2016 13:04:07 +0800
>>>>>>> Peter Xu <peterx@redhat.com> wrote:
>>>>>> [...]
>>>>>>>> Besides the reason that there might be guests that do not support
>>>>>>>> CM=1, are there also performance considerations? When the user's
>>>>>>>> configuration does not require the CM capability (e.g., a generic VM
>>>>>>>> configuration without VFIO), shall we allow the user to disable the CM
>>>>>>>> bit so that we get better IOMMU performance (avoiding extra and
>>>>>>>> useless invalidations)?
>>>>>>>
>>>>>>> With Alexey's proposed patch to have callback ops when the iommu
>>>>>>> notifier list adds its first entry and removes its last, any of the
>>>>>>> additional overhead to generate notifies when nobody is listening can
>>>>>>> be avoided. These same callbacks would be the ones that need to
>>>>>>> generate a hw_error if a notifier is added while running in CM=0.
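
If I understand the proposal correctly, the shape of it is something like the
standalone sketch below. All names here are made up for illustration; this is
not the actual API from Alexey's patch:

/* Standalone sketch of "first/last notifier" callbacks -- hypothetical names. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct VIOMMU {
    bool cm_enabled;     /* emulated CAP.CM bit */
    int  notifier_count; /* number of listeners, e.g. vfio containers */
} VIOMMU;

/* Called when the notifier list goes from empty to non-empty. */
static void viommu_notify_started(VIOMMU *v)
{
    if (!v->cm_enabled) {
        /* Equivalent of hw_error(): a listener needs map/unmap events,
         * but with CM=0 the guest is not required to generate them. */
        fprintf(stderr, "device assignment requires caching mode (CM=1)\n");
        exit(EXIT_FAILURE);
    }
    /* Start generating map/unmap notifies from guest invalidations. */
}

/* Called when the notifier list goes back to empty. */
static void viommu_notify_stopped(VIOMMU *v)
{
    (void)v;
    /* Last listener is gone; plain emulation can skip the notify overhead. */
}

static void viommu_register_notifier(VIOMMU *v)
{
    if (v->notifier_count++ == 0) {
        viommu_notify_started(v);   /* first entry added */
    }
}

static void viommu_unregister_notifier(VIOMMU *v)
{
    if (--v->notifier_count == 0) {
        viommu_notify_stopped(v);   /* last entry removed */
    }
}

int main(void)
{
    VIOMMU v = { .cm_enabled = true, .notifier_count = 0 };

    viommu_register_notifier(&v);   /* e.g. vfio attaches to the vIOMMU */
    viommu_unregister_notifier(&v); /* the last listener goes away */
    return 0;
}
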
>>>>>>
>>>>>> Not familiar with Alexey's patch
>>>>>
>>>>> https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg00079.html
>>>>
>>>> Thanks for the pointer. :)
>>>>
>>>>>
>>>>>> , but is that for VFIO only?
>>>>>
>>>>> vfio is currently the only user of the iommu notifier, but the
>>>>> interface is generic, which is how it should (must) be.
>>>>
>>>> Yes.
>>>>
>>>>>
>>>>>> I mean, if
>>>>>> we configure CM=1, the guest kernel will send an invalidation request
>>>>>> every time it creates new entries (context entries, or iotlb
>>>>>> entries). Even without VFIO notifiers, the guest needs to trap into QEMU
>>>>>> so the invalidation requests can be processed. This is avoidable if we are not
>>>>>> using VFIO devices at all (so there is no need to maintain any mappings),
>>>>>> right?
>>>>>
>>>>> CM=1 only defines that not-present and invalid entries can be cached;
>>>>> any change to an existing entry requires an invalidation regardless of
>>>>> CM. What you're looking for sounds more like ECAP.C:
>>>>
>>>> Yes, but I guess what I was talking about is the CM bit, not ECAP.C.
>>>> When we clear/replace one context entry, the guest kernel will definitely
>>>> send one context-entry invalidation to QEMU:
>>>>
>>>> static void domain_context_clear_one(struct intel_iommu *iommu, u8 bus, u8 devfn)
>>>> {
>>>>         if (!iommu)
>>>>                 return;
>>>>
>>>>         clear_context_table(iommu, bus, devfn);
>>>>         iommu->flush.flush_context(iommu, 0, 0, 0,
>>>>                                    DMA_CCMD_GLOBAL_INVL);
>>>>         iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
>>>> }
>>>>
>>>> ... While if we are creating a new one (like when attaching a new VFIO
>>>> device?), whether to invalidate is optional, depending on whether the CM
>>>> bit is set:
>>>>
>>>> static int domain_context_mapping_one(struct dmar_domain *domain,
>>>>                                       struct intel_iommu *iommu,
>>>>                                       u8 bus, u8 devfn)
>>>> {
>>>>         ...
>>>>         /*
>>>>          * It's a non-present to present mapping. If hardware doesn't cache
>>>>          * non-present entry we only need to flush the write-buffer. If the
>>>>          * _does_ cache non-present entries, then it does so in the special
>>>>          * domain #0, which we have to flush:
>>>>          */
>>>>         if (cap_caching_mode(iommu->cap)) {
>>>>                 iommu->flush.flush_context(iommu, 0,
>>>>                                            (((u16)bus) << 8) | devfn,
>>>>                                            DMA_CCMD_MASK_NOBIT,
>>>>                                            DMA_CCMD_DEVICE_INVL);
>>>>                 iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
>>>>         } else {
>>>>                 iommu_flush_write_buffer(iommu);
>>>>         }
>>>>         ...
>>>> }
>>>>
>>>> Only if cap_caching_mode() is set (which is bit 7, the CM bit) do we
>>>> send these invalidations. What I meant is that we should allow the
>>>> user to specify the CM bit, so that when we are not using VFIO
>>>> devices, we can skip the above flush_context() and flush_iotlb()
>>>> calls, etc. So, besides the fact that some guests do not support the
>>>> CM bit (like Jailhouse), performance might be another reason to allow
>>>> the user to specify the CM bit themselves.
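
For reference, the cap_caching_mode() check above is just one bit in the
capability register. From my reading of the kernel and QEMU headers it is
roughly the following (quoted from memory, so please double-check against the
actual sources):

/* Linux, include/linux/intel-iommu.h (roughly): CAP register bit 7 is CM. */
#define cap_caching_mode(c)  (((c) >> 7) & 1)

/* QEMU, hw/i386/intel_iommu_internal.h (roughly): the bit this patch exposes. */
#define VTD_CAP_CM           (1ULL << 7)
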
>>>
>>> I'm dubious of this; IOMMU drivers are already aware that hardware
>>> flushes are expensive and do batching to optimize them. The queued
>>> invalidation mechanism itself is meant to allow asynchronous
>>> invalidations. QEMU invalidating a virtual IOMMU might very well be
>>> faster than hardware.
>>
>> Batching doesn't mean we can eliminate the IOTLB flushes for mappings
>> going from non-present to present when CM=1, while with CM=0 those
>> IOTLB flushes are not necessary, just as the code above shows. Therefore,
>> generally speaking, CM=0 should have better performance than CM=1, even
>> for QEMU's vIOMMU.
>>
>> In my understanding the purpose of exposing CM=1 is to force the guest to
>> do an IOTLB flush for each mapping change (including from non-present to
>> present) so that QEMU is able to emulate each mapping change from the
>> guest (correct me if I am wrong). If that statement stands, CM=1 is
>> really a workaround for making vfio-assigned devices and the vIOMMU work
>> together, and unfortunately this cannot work on other vendors' IOMMUs
>> that have no CM bit, such as AMD's IOMMU.
>>
>> So what are the requirements for making vfio-assigned devices and a vIOMMU
>> work together? I think it would be more helpful to implement a more
>> generic solution that monitors and emulates the guest vIOMMU's page tables,
>> rather than simply exposing CM=1 to the guest, as that only works on
>> Intel's IOMMU.
>>
>> And what do you mean by asynchronous invalidations? I think the IOVAs of
>> the changed mappings cannot be used until the mappings are invalidated; it
>> doesn't matter whether the invalidation is done via QI or a register write.
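
To make the "more generic solution" I mention above a bit more concrete, what
I have in mind is roughly the sketch below. Every name in it is made up for
illustration (none of this is existing QEMU code), and how a guest page-table
update is detected in the first place -- trapping invalidations,
write-protecting the tables, or something else -- is exactly the open question:

/* Illustrative sketch: replay a guest IOVA range into the host via vfio. */
#include <stdbool.h>
#include <stdint.h>

typedef struct GuestMapping {
    uint64_t iova;     /* guest IOVA covered by the entry */
    uint64_t gpa;      /* guest-physical address it maps to */
    uint64_t size;     /* region size described by the entry */
    bool     present;
} GuestMapping;

/* Hypothetical helpers, stubbed out so the sketch is self-contained. */
static GuestMapping guest_iommu_walk(uint64_t iova)
{
    return (GuestMapping){ .iova = iova, .gpa = 0, .size = 4096, .present = false };
}
static void vfio_dma_map(uint64_t iova, uint64_t gpa, uint64_t size)
{
    (void)iova; (void)gpa; (void)size; /* would program the host IOMMU */
}
static void vfio_dma_unmap(uint64_t iova, uint64_t size)
{
    (void)iova; (void)size;            /* would tear the mapping down */
}

/* Bring host mappings for [iova, iova + size) back in sync with the guest. */
static void shadow_sync_range(uint64_t iova, uint64_t size)
{
    uint64_t end = iova + size;

    while (iova < end) {
        GuestMapping m = guest_iommu_walk(iova);

        if (m.present) {
            vfio_dma_map(m.iova, m.gpa, m.size);
        } else {
            vfio_dma_unmap(m.iova, m.size);
        }
        iova += m.size;
    }
}

int main(void)
{
    shadow_sync_range(0, 1 << 20); /* e.g. the guest updated a 1 MB range */
    return 0;
}
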
>
> Ok, so you're arguing that CM=0 is more efficient than CM=1 because it
> eliminates some portion of the invalidations the guest needs to issue,
> while at the same time arguing for a more general solution of shadow
> page tables, which would trap into the vIOMMU on every update and
> eliminate all the batching done by the guest IOMMU driver code
> attempting to reduce and consolidate the number of flushes. All
> of this with only speculation on what might be more efficient. Can we
> get vIOMMU working, especially with assigned devices, before we
> speculate further?
>
> How do we expect a vIOMMU to be used in a guest? In the case of
> emulated devices, what value does it provide? Are emulated devices
> isolated from one another by the vIOMMU? No. Do we have 32-bit
> emulated devices for which DMA translation at the vIOMMU is
> significantly more efficient than bounce buffers within the guest?
> Probably not, and if we did we could just emulate 64-bit devices. So I
> assume that beyond being a development tool, the primary useful feature
> of a vIOMMU is exposing devices to guest userspace (and thus to nested
> guests) via tools like vfio. Am I wrong here? In this use case, it's
> most efficient to run with iommu=pt in the L1 guest, which would make
> any sort of invalidations a very rare event. Making use of vfio inside
> the L1 guest would then move a device from the static-identity domain
> in the L1 guest into a new domain where the mappings for that domain
> are also relatively static. So I really don't see why we're trying to
> optimize the invalidation of the vIOMMU at this point. I also have to
> believe that the hardware folks that designed VT-d believed there to be
> a performance advantage to using emulated VT-d with CM=1 versus
> shadowing all of the VT-d data structures in the hypervisor or they
> wouldn't have added this bit to the specification. Thanks,
Hi Alex,
Sorry for jumping into this discussion suddenly. Yes, I absolutely agree
that getting the vIOMMU working with assigned devices is more important
than arguing about vIOMMU performance with respect to the CM bit. Actually
I am very eager to see the vIOMMU working with vfio for assigned devices in
the guest, as I want to try DPDK via VFIO in a guest (this is the reason I
searched for vIOMMU support in QEMU and found this discussion). My concern
about the CM bit is not performance but that it is not a generic approach;
still, it is better than nothing :)
Thanks,
-Kai
>
> Alex
>
>