qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Huang, Kai" <kai.huang@linux.intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "Aviv B.D" <bd.aviv@gmail.com>, Jan Kiszka <jan.kiszka@web.de>,
	qemu-devel@nongnu.org, Peter Xu <peterx@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3 1/3] IOMMU: add VTD_CAP_CM to vIOMMU capability exposed to guest
Date: Wed, 8 Jun 2016 10:39:19 +1200	[thread overview]
Message-ID: <11e4b619-12ea-cda1-75f1-a106fadce96f@linux.intel.com> (raw)
In-Reply-To: <20160607124653.693f30bb@ul30vt.home>



On 6/8/2016 6:46 AM, Alex Williamson wrote:
> On Tue, 7 Jun 2016 17:21:06 +1200
> "Huang, Kai" <kai.huang@linux.intel.com> wrote:
>
>> On 6/7/2016 3:58 PM, Alex Williamson wrote:
>>> On Tue, 7 Jun 2016 11:20:32 +0800
>>> Peter Xu <peterx@redhat.com> wrote:
>>>
>>>> On Mon, Jun 06, 2016 at 11:02:11AM -0600, Alex Williamson wrote:
>>>>> On Mon, 6 Jun 2016 21:43:17 +0800
>>>>> Peter Xu <peterx@redhat.com> wrote:
>>>>>
>>>>>> On Mon, Jun 06, 2016 at 07:11:41AM -0600, Alex Williamson wrote:
>>>>>>> On Mon, 6 Jun 2016 13:04:07 +0800
>>>>>>> Peter Xu <peterx@redhat.com> wrote:
>>>>>> [...]
>>>>>>>> Besides the reason that there might have guests that do not support
>>>>>>>> CM=1, will there be performance considerations? When user's
>>>>>>>> configuration does not require CM capability (e.g., generic VM
>>>>>>>> configuration, without VFIO), shall we allow user to disable the CM
>>>>>>>> bit so that we can have better IOMMU performance (avoid extra and
>>>>>>>> useless invalidations)?
>>>>>>>
>>>>>>> With Alexey's proposed patch to have callback ops when the iommu
>>>>>>> notifier list adds its first entry and removes its last, any of the
>>>>>>> additional overhead to generate notifies when nobody is listening can
>>>>>>> be avoided.  These same callbacks would be the ones that need to
>>>>>>> generate a hw_error if a notifier is added while running in CM=0.
>>>>>>
>>>>>> Not familar with Alexey's patch
>>>>>
>>>>> https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg00079.html
>>>>
>>>> Thanks for the pointer. :)
>>>>
>>>>>
>>>>>> , but is that for VFIO only?
>>>>>
>>>>> vfio is currently the only user of the iommu notifier, but the
>>>>> interface is generic, which is how it should (must) be.
>>>>
>>>> Yes.
>>>>
>>>>>
>>>>>> I mean, if
>>>>>> we configured CMbit=1, guest kernel will send invalidation request
>>>>>> every time it creates new entries (context entries, or iotlb
>>>>>> entries). Even without VFIO notifiers, guest need to trap into QEMU
>>>>>> and process the invalidation requests. This is avoidable if we are not
>>>>>> using VFIO devices at all (so no need to maintain any mappings),
>>>>>> right?
>>>>>
>>>>> CM=1 only defines that not-present and invalid entries can be cached,
>>>>> any changes to existing entries requires an invalidation regardless of
>>>>> CM.  What you're looking for sounds more like ECAP.C:
>>>>
>>>> Yes, but I guess what I was talking about is CM bit but not ECAP.C.
>>>> When we clear/replace one context entry, guest kernel will definitely
>>>> send one context entry invalidation to QEMU:
>>>>
>>>> static void domain_context_clear_one(struct intel_iommu *iommu, u8 bus, u8 devfn)
>>>> {
>>>> 	if (!iommu)
>>>> 		return;
>>>>
>>>> 	clear_context_table(iommu, bus, devfn);
>>>> 	iommu->flush.flush_context(iommu, 0, 0, 0,
>>>> 					   DMA_CCMD_GLOBAL_INVL);
>>>> 	iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
>>>> }
>>>>
>>>> ... While if we are creating a new one (like attaching a new VFIO
>>>> device?), it's an optional behavior depending on whether CM bit is
>>>> set:
>>>>
>>>> static int domain_context_mapping_one(struct dmar_domain *domain,
>>>> 				      struct intel_iommu *iommu,
>>>> 				      u8 bus, u8 devfn)
>>>> {
>>>>     ...
>>>> 	/*
>>>> 	 * It's a non-present to present mapping. If hardware doesn't cache
>>>> 	 * non-present entry we only need to flush the write-buffer. If the
>>>> 	 * _does_ cache non-present entries, then it does so in the special
>>>> 	 * domain #0, which we have to flush:
>>>> 	 */
>>>> 	if (cap_caching_mode(iommu->cap)) {
>>>> 		iommu->flush.flush_context(iommu, 0,
>>>> 					   (((u16)bus) << 8) | devfn,
>>>> 					   DMA_CCMD_MASK_NOBIT,
>>>> 					   DMA_CCMD_DEVICE_INVL);
>>>> 		iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
>>>> 	} else {
>>>> 		iommu_flush_write_buffer(iommu);
>>>> 	}
>>>>     ...
>>>> }
>>>>
>>>> Only if cap_caching_mode() is set (which is bit 7, the CM bit), we
>>>> will send these invalidations. What I meant is that, we should allow
>>>> user to specify the CM bit, so that when we are not using VFIO
>>>> devices, we can skip the above flush_content() and flush_iotlb()
>>>> etc... So, besides the truth that we have some guests do not support
>>>> CM bit (like Jailhouse), performance might be another consideration
>>>> point that we should allow user to specify the CM bit themselfs.
>>>
>>> I'm dubious of this, IOMMU drivers are already aware that hardware
>>> flushes are expensive and do batching to optimize it.  The queued
>>> invalidation mechanism itself is meant to allow asynchronous
>>> invalidations.  QEMU invalidating a virtual IOMMU might very well be
>>> faster than hardware.
>>
>> Do batching doesn't mean we can eliminate the IOTLB flush for mappings
>> from non-present to present, in case of CM=1, while in case CM=0 those
>> IOTLB flush are not necessary, just like the code above shows. Therefore
>> generally speaking CM=0 should have better performance than CM=1, even
>> for Qemu's vIOMMU.
>>
>> In my understanding the purpose of exposing CM=1 is to force guest do
>> IOTLB flush for each mapping change (including from non-present to
>> present) so Qemu is able to emulate each mapping change from guest
>> (correct me if I was wrong). If previous statement stands, CM=1 is
>> really a workaround for making vfio assigned devices and vIOMMU work
>> together, and unfortunately this cannot work on other vendor's IOMMU
>> without CM bit, such as AMD's IOMMU.
>>
>> So what's the requirements of making vfio assigned devices and vIOMMU
>> work together? I think it should be more helpful to implement a more
>> generic solution to monitor and emulate guest vIOMMU's page table,
>> rather than simply exposing CM=1 to guest, as it only works on intel IOMMU.
>>
>> And what do you mean asynchronous invalidations? I think the iova of the
>> changed mappings cannot be used until the mappings are invalidated. It
>> doesn't matter whether the invalidation is done via QI or register.
>
> Ok, so you're arguing that CM=0 is more efficient than CM=1 because it
> eliminates some portion of the invalidations necessary by the guest,
> while at the same time arguing for a more general solution of shadow
> page tables which would trap into the vIOMMU at every update,
> eliminating all the batching done by the guest IOMMU driver code
> attempting to reduce and consolidate the number of flushes done.  All
> of this with only speculation on what might be more efficient.  Can we
> get vIOMMU working, especially with assigned devices, before we
> speculate further?
>
> How do we expect a vIOMMU to be used in a guest?  In the case of
> emulated devices, what value does it provide?  Are emulated devices
> isolated from one another by the vIOMMU?  No.  Do we have 32-bit
> emulated devices for which DMA translation at the vIOMMU is
> significantly more efficient than bounce buffers within the guest?
> Probably not, and if we did we could just emulate 64bit devices.  So I
> assume that beyond being a development tool, our primary useful feature
> of a vIOMMU is to expose devices to guest userspace (and thus nested
> guests) via tools like vfio.  Am I wrong here?  In this use case, it's
> most efficient to run with iommu=pt in the L1 guest, which would make
> any sort of invalidations a very rare event.  Making use of vfio inside
> the L1 guest would then move a device from the static-identity domain
> in the L1 guest into a new domain where the mappings for that domain
> are also relatively static.  So I really don't see why we're trying to
> optimize the invalidation of the vIOMMU at this point.  I also have to
> believe that the hardware folks that designed VT-d believed there to be
> a performance advantage to using emulated VT-d with CM=1 versus
> shadowing all of the VT-d data structures in the hypervisor or they
> wouldn't have added this bit to the specification.  Thanks,
Hi Alex,

Sorry for jumping to this discussion suddenly. Yes I absolutely agree 
that getting vIOMMU working with assigned devices is more important 
thing than arguing on vIOMMU performance on CM bit. Actually I am very 
eager to make vIOMMU working with vfio for assigned device in guest as I 
want to try DPDK via VFIO in guest (this is the reason I searched vIOMMU 
support in Qemu and found this discussion). My concern for CM bit is not 
performance, but it is not generic way, but still it is better than 
nothing :)

Thanks,
-Kai

>
> Alex
>
>

  reply	other threads:[~2016-06-07 22:39 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-21 16:19 [Qemu-devel] [PATCH v3 0/3] IOMMU: Add Support to VFIO devices with vIOMMU present Aviv B.D
2016-05-21 16:19 ` [Qemu-devel] [PATCH v3 1/3] IOMMU: add VTD_CAP_CM to vIOMMU capability exposed to guest Aviv B.D
2016-05-21 16:42   ` Jan Kiszka
2016-06-02  8:44     ` Peter Xu
2016-06-02 13:00       ` Alex Williamson
2016-06-02 13:14         ` Jan Kiszka
2016-06-02 13:17           ` Jan Kiszka
2016-06-02 16:15           ` Michael S. Tsirkin
2016-06-06  5:04           ` Peter Xu
2016-06-06 13:11             ` Alex Williamson
2016-06-06 13:43               ` Peter Xu
2016-06-06 17:02                 ` Alex Williamson
2016-06-07  3:20                   ` Peter Xu
2016-06-07  3:58                     ` Alex Williamson
2016-06-07  5:00                       ` Peter Xu
2016-06-07  5:21                       ` Huang, Kai
2016-06-07 18:46                         ` Alex Williamson
2016-06-07 22:39                           ` Huang, Kai [this message]
2016-05-24  8:14   ` Jason Wang
2016-05-24  9:25     ` Jan Kiszka
2016-05-28 16:12       ` Aviv B.D.
2016-05-28 16:34         ` Kiszka, Jan
2016-05-21 16:19 ` [Qemu-devel] [PATCH v3 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode Aviv B.D
2016-06-06  5:04   ` Peter Xu
2016-05-21 16:19 ` [Qemu-devel] [PATCH v3 3/3] IOMMU: Integrate between VFIO and vIOMMU to support device assignment Aviv B.D
2016-05-23 17:53   ` Alex Williamson
2016-05-26 20:58     ` Alex Williamson
2016-05-28 10:52       ` Aviv B.D.
2016-05-28 16:02         ` Alex Williamson
2016-05-28 16:10           ` Aviv B.D.
2016-05-28 17:39             ` Alex Williamson
2016-05-28 18:14               ` Aviv B.D.
2016-05-28 19:48                 ` Alex Williamson
2016-06-02 13:09                   ` Aviv B.D.
2016-06-02 13:34                     ` Alex Williamson
2016-06-06  8:09                       ` Peter Xu
2016-06-06 18:21                         ` Alex Williamson
2016-06-07 13:20                           ` Peter Xu
2016-06-06  7:38     ` Peter Xu
2016-06-06 17:30       ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=11e4b619-12ea-cda1-75f1-a106fadce96f@linux.intel.com \
    --to=kai.huang@linux.intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=bd.aviv@gmail.com \
    --cc=jan.kiszka@web.de \
    --cc=mst@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).