From: Jason Wang <jasowang@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, tianyu.lan@intel.com,
kevin.tian@intel.com, mst@redhat.com, jan.kiszka@siemens.com,
alex.williamson@redhat.com, bd.aviv@gmail.com
Subject: Re: [Qemu-devel] [PATCH RFC v3 14/14] intel_iommu: enable vfio devices
Date: Wed, 18 Jan 2017 16:36:05 +0800 [thread overview]
Message-ID: <f89c8ad4-a95d-bf66-33c6-f151531c2174@redhat.com> (raw)
In-Reply-To: <20170118081137.GS30108@pxdev.xzpeter.org>
On 2017年01月18日 16:11, Peter Xu wrote:
> On Wed, Jan 18, 2017 at 11:10:53AM +0800, Jason Wang wrote:
>>
>> On 2017年01月17日 22:45, Peter Xu wrote:
>>> On Mon, Jan 16, 2017 at 05:54:55PM +0800, Jason Wang wrote:
>>>> On 2017年01月16日 17:18, Peter Xu wrote:
>>>>>>> static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
>>>>>>> hwaddr addr, uint8_t am)
>>>>>>> {
>>>>>>> @@ -1222,6 +1251,7 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
>>>>>>> info.addr = addr;
>>>>>>> info.mask = ~((1 << am) - 1);
>>>>>>> g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
>>>>>>> + vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
>>>>>> Is the case of GLOBAL or DSI flush missed, or we don't care it at all?
>>>>> IMHO we don't. For device assignment, since we are having CM=1 here,
>>>>> we should have explicit page invalidations even if guest sends
>>>>> global/domain invalidations.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -- peterx
>>>> Is this spec required?
>>> I think not. IMO the spec is very coarse grained on describing cache
>>> mode...
>>>
>>>> Btw, it looks to me that both DSI and GLOBAL are
>>>> indeed explicit flush.
>>> Actually when cache mode is on, it is unclear to me on how we should
>>> treat domain/global invalidations, at least from the spec (as
>>> mentioned earlier). My understanding is that they are not "explicit
>>> flushes", which IMHO should only mean page selective IOTLB
>>> invalidations.
>> Probably not, at least from the view of performance. DSI and global should
>> be more efficient in some cases.
> I agree with you that DSI/GLOBAL flushes are more efficient in some
> way. But IMHO that does not mean these invalidations are "explicit
> invalidations", and I suspect whether cache mode has to coop with it.
Well, the spec does not forbid DSI/GLOBAL with CM and the driver codes
had used them for almost ten years. I can hardly believe it's wrong.
>
> But here I should add one more thing besides PSI - context entry
> invalidation should be one of "the explicit invalidations" as well,
> which we need to handle just like PSI when cache mode is on.
>
>>>> Just have a quick go through on driver codes and find this something
>>>> interesting in intel_iommu_flush_iotlb_psi():
>>>>
>>>> ...
>>>> /*
>>>> * Fallback to domain selective flush if no PSI support or the size is
>>>> * too big.
>>>> * PSI requires page size to be 2 ^ x, and the base address is naturally
>>>> * aligned to the size
>>>> */
>>>> if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
>>>> iommu->flush.flush_iotlb(iommu, did, 0, 0,
>>>> DMA_TLB_DSI_FLUSH);
>>>> else
>>>> iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
>>>> DMA_TLB_PSI_FLUSH);
>>>> ...
>>> I think this is interesting... and I doubt its correctness while with
>>> cache mode enabled.
>>>
>>> If so (sending domain invalidation instead of a big range of page
>>> invalidations), how should we capture which pages are unmapped in
>>> emulated IOMMU?
>> We don't need to track individual pages here, since all pages for a specific
>> domain were unmapped I believe?
> IMHO this might not be the correct behavior.
>
> If we receive one domain specific invalidation, I agree that we should
> invalidate the IOTLB cache for all the devices inside the domain.
> However, when cache mode is on, we should be depending on the PSIs to
> unmap each page (unless we want to unmap the whole address space, in
> this case it's very possible that the guest is just unmapping a range,
> not the entire space). If we convert several PSIs into one big DSI,
> IMHO we will leave those pages mapped/unmapped while we should
> unmap/map them.
Confused, do you have an example for this? (I fail to understand why DSI
can't work, at least implementation can convert DSI to several PSIs
internally).
Thanks
>
>>>> It looks like DSI_FLUSH is possible even for CM on.
>>>>
>>>> And in flush_unmaps():
>>>>
>>>> ...
>>>> /* In caching mode, global flushes turn emulation expensive */
>>>> if (!cap_caching_mode(iommu->cap))
>>>> iommu->flush.flush_iotlb(iommu, 0, 0, 0,
>>>> DMA_TLB_GLOBAL_FLUSH);
>>>> ...
>>>>
>>>> If I understand the comments correctly, GLOBAL is ok for CM too (though the
>>>> code did not do it for performance reason).
>>> I think it should be okay to send global flush for CM, but not sure
>>> whether we should notify anything when we receive it. Hmm, anyway, I
>>> think I need some more reading to make sure I understand the whole
>>> thing correctly. :)
>>>
>>> For example, when I see this commit:
>>>
>>> commit 78d5f0f500e6ba8f6cfd0673475ff4d941d705a2
>>> Author: Nadav Amit <nadav.amit@gmail.com>
>>> Date: Thu Apr 8 23:00:41 2010 +0300
>>>
>>> intel-iommu: Avoid global flushes with caching mode.
>>> While it may be efficient on real hardware, emulation of global
>>> invalidations is very expensive as all shadow entries must be examined.
>>> This patch changes the behaviour when caching mode is enabled (which is
>>> the case when IOMMU emulation takes place). In this case, page specific
>>> invalidation is used instead.
>>>
>>> Before I ask someone else besides qemu-devel, I am curious about
>>> whether there is existing VT-d emulation code (outside QEMU, of
>>> course) so that I can have a reference?
>> Yes, it has. The author of this patch - Nadav has done lots of research on
>> emulated IOMMU. See following papers:
>>
>> https://hal.inria.fr/inria-00493752/document
>> http://www.cse.iitd.ac.in/~sbansal/csl862-virt/readings/vIOMMU.pdf
> Thanks for these good materials. I will google the author for sure
> next time. :)
>
> -- peterx
next prev parent reply other threads:[~2017-01-18 8:36 UTC|newest]
Thread overview: 93+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-13 3:06 [Qemu-devel] [PATCH RFC v3 00/14] VT-d: vfio enablement and misc enhances Peter Xu
2017-01-13 3:06 ` [Qemu-devel] [PATCH RFC v3 01/14] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest Peter Xu
2017-01-20 8:32 ` Tian, Kevin
2017-01-20 8:54 ` Peter Xu
2017-01-20 8:59 ` Tian, Kevin
2017-01-20 9:11 ` Peter Xu
2017-01-20 9:20 ` Tian, Kevin
2017-01-20 9:30 ` Peter Xu
2017-01-20 15:42 ` Eric Blake
2017-01-22 2:32 ` Peter Xu
2017-01-13 3:06 ` [Qemu-devel] [PATCH RFC v3 02/14] intel_iommu: simplify irq region translation Peter Xu
2017-01-20 8:22 ` Tian, Kevin
2017-01-20 9:05 ` Peter Xu
2017-01-20 9:15 ` Tian, Kevin
2017-01-20 9:27 ` Peter Xu
2017-01-20 9:52 ` Tian, Kevin
2017-01-20 10:04 ` Peter Xu
2017-01-22 4:42 ` Tian, Kevin
2017-01-22 4:50 ` Peter Xu
2017-01-13 3:06 ` [Qemu-devel] [PATCH RFC v3 03/14] intel_iommu: renaming gpa to iova where proper Peter Xu
2017-01-20 8:27 ` Tian, Kevin
2017-01-20 9:23 ` Peter Xu
2017-01-20 9:41 ` Tian, Kevin
2017-01-13 3:06 ` [Qemu-devel] [PATCH RFC v3 04/14] intel_iommu: fix trace for inv desc handling Peter Xu
2017-01-13 7:46 ` Jason Wang
2017-01-13 9:13 ` Peter Xu
2017-01-13 9:33 ` Jason Wang
2017-01-13 3:06 ` [Qemu-devel] [PATCH RFC v3 05/14] intel_iommu: fix trace for addr translation Peter Xu
2017-01-13 3:06 ` [Qemu-devel] [PATCH RFC v3 06/14] intel_iommu: vtd_slpt_level_shift check level Peter Xu
2017-01-13 3:06 ` [Qemu-devel] [PATCH RFC v3 07/14] memory: add section range info for IOMMU notifier Peter Xu
2017-01-13 7:55 ` Jason Wang
2017-01-13 9:23 ` Peter Xu
2017-01-13 9:37 ` Jason Wang
2017-01-13 10:22 ` Peter Xu
2017-01-13 3:06 ` [Qemu-devel] [PATCH RFC v3 08/14] memory: provide iommu_replay_all() Peter Xu
2017-01-13 3:06 ` [Qemu-devel] [PATCH RFC v3 09/14] memory: introduce memory_region_notify_one() Peter Xu
2017-01-13 7:58 ` Jason Wang
2017-01-16 7:08 ` Peter Xu
2017-01-16 7:38 ` Jason Wang
2017-01-13 3:06 ` [Qemu-devel] [PATCH RFC v3 10/14] memory: add MemoryRegionIOMMUOps.replay() callback Peter Xu
2017-01-13 3:06 ` [Qemu-devel] [PATCH RFC v3 11/14] intel_iommu: provide its own replay() callback Peter Xu
2017-01-13 9:26 ` Jason Wang
2017-01-16 7:31 ` Peter Xu
2017-01-16 7:47 ` Jason Wang
2017-01-16 7:59 ` Peter Xu
2017-01-16 8:03 ` Jason Wang
2017-01-16 8:06 ` Peter Xu
2017-01-16 8:23 ` Jason Wang
2017-01-13 3:06 ` [Qemu-devel] [PATCH RFC v3 12/14] intel_iommu: do replay when context invalidate Peter Xu
2017-01-16 5:53 ` Jason Wang
2017-01-16 7:43 ` Peter Xu
2017-01-16 7:52 ` Jason Wang
2017-01-16 8:02 ` Peter Xu
2017-01-16 8:18 ` Peter Xu
2017-01-16 8:28 ` Jason Wang
2017-01-13 3:06 ` [Qemu-devel] [PATCH RFC v3 13/14] intel_iommu: allow dynamic switch of IOMMU region Peter Xu
2017-01-16 6:20 ` Jason Wang
2017-01-16 7:50 ` Peter Xu
2017-01-16 8:01 ` Jason Wang
2017-01-16 8:12 ` Peter Xu
2017-01-16 8:25 ` Jason Wang
2017-01-16 8:32 ` Peter Xu
2017-01-16 16:25 ` Michael S. Tsirkin
2017-01-17 14:53 ` Peter Xu
2017-01-16 19:53 ` Alex Williamson
2017-01-17 14:00 ` Peter Xu
2017-01-17 15:46 ` Alex Williamson
2017-01-18 7:49 ` Peter Xu
2017-01-19 8:20 ` Peter Xu
2017-01-13 3:06 ` [Qemu-devel] [PATCH RFC v3 14/14] intel_iommu: enable vfio devices Peter Xu
2017-01-16 6:30 ` Jason Wang
2017-01-16 9:18 ` Peter Xu
2017-01-16 9:54 ` Jason Wang
2017-01-17 14:45 ` Peter Xu
2017-01-18 3:10 ` Jason Wang
2017-01-18 8:11 ` Peter Xu
2017-01-18 8:36 ` Jason Wang [this message]
2017-01-18 8:46 ` Peter Xu
2017-01-18 9:38 ` Tian, Kevin
2017-01-18 10:06 ` Jason Wang
2017-01-19 3:32 ` Peter Xu
2017-01-19 3:36 ` Jason Wang
2017-01-19 3:16 ` Peter Xu
2017-01-19 6:22 ` Tian, Kevin
2017-01-19 9:38 ` Peter Xu
2017-01-19 6:44 ` Liu, Yi L
2017-01-19 7:02 ` Jason Wang
2017-01-19 7:02 ` Peter Xu
2017-01-16 9:20 ` Peter Xu
2017-01-13 15:58 ` [Qemu-devel] [PATCH RFC v3 00/14] VT-d: vfio enablement and misc enhances Michael S. Tsirkin
2017-01-14 2:59 ` Peter Xu
2017-01-17 15:07 ` Michael S. Tsirkin
2017-01-18 7:34 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f89c8ad4-a95d-bf66-33c6-f151531c2174@redhat.com \
--to=jasowang@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=bd.aviv@gmail.com \
--cc=jan.kiszka@siemens.com \
--cc=kevin.tian@intel.com \
--cc=mst@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=tianyu.lan@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).