qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Jason Wang <jasowang@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Jintack Lim <jintack@cs.columbia.edu>
Subject: Re: [Qemu-devel] [PATCH 08/10] intel-iommu: maintain per-device iova ranges
Date: Thu, 3 May 2018 14:04:42 +0800	[thread overview]
Message-ID: <20180503060442.GB2378@xz-mi> (raw)
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D19113D318@SHSMSX101.ccr.corp.intel.com>

On Fri, Apr 27, 2018 at 11:37:24PM +0000, Tian, Kevin wrote:

[...]

> > Self NAK on this...
> > 
> > More than half of the whole series tries to solve the solo problem
> > that we unmapped some pages that were already mapped, which proved
> > to
> > be wrong.  Now if we squash the change we will do the same wrong
> > thing, so we'll still have a very small window that the remapped page
> > be missing from a device's POV.
> > 
> > Now to solve this I suppose we'll need to cache every translation then
> > we can know whether a mapping has changed, and we only remap when it
> > really has changed.  But I'm afraid that can be a big amount of data
> > for nested guests.  For a most common 4G L2 guest, I think the worst
> > case (e.g., no huge page at all, no continuous pages) is 4G/4K=1M
> > entries in that tree.
> 
> I think one key factor to think about is the effect of PSI. From
> VT-d spec, all internal caches (IOTLB entries, paging-structure
> cache entries, etc.) associated with specified address range
> must be flushed accordingly, i.e. no cache on stale mapping.
> Now per-device iova ranges is new type of cache introduced
> by Qemu VT-d. It doesn't cache actual mapping but its purpose
> is to decide whether to notify VFIO for updated mapping. In
> this manner if we don't differentiate whether an entry is
> for stale mapping, looks the definition of PSI is broken.
> 
> ask another question. In reality how much improvement this
> patch can bring, i.e. is it usual to see guest map on an already
> mapped region, or unmap an already unmapped region?

The funny thing is that there is actually no MAP/UNMAP flag for
PSI/DSI.  For either MAP/UNMAP, guest send one PSI for that range, or
even a DSI (Domain Selective Invalidations).

This patch will mostly be helpful not really for PSIs, but for DSI.
Please have a look on Issue (4) that I mentioned in the cover letter.
This patch is the core part to solve that DMA error problem.

An example is that we can get DSI for a domain that already have
existing mappings. In QEMU, we handle DSI as a whole-range PSI, so
before this patch we will unmap those already mapped pages then remap
all of them.  However we can't map the same page again, so we cache
what we have mapped here.

> 
> > 
> > Is it really worth it to solve this possibly-buggy-guest-OS problem
> > with such a overhead?  I don't know..
> 
> If adding overhead removes the benefit of this patch, then 
> definitely not a good thing.

For the problem that we are going to solve, this patch is not really a
beneficial one, but fixes a critical bug.  Again, please refer to the
issue (4) of the cover letter for that 3ms window problem.

> 
> > 
> > I'm not sure whether it's still acceptable that we put this issue
> > aside.  We should know that normal OSs should not do this, and if they
> > do, IMHO it proves a buggy OS already (so even from hardware POV we
> > allow this, from software POV it should still be problematic), then
> > it'll have problem for sure, but only within the VM itself, and it
> > won't affect other VMs or the host.  That sounds still reasonable to
> > me so far.
> 
> As said earlier, what I'm worried is whether there is a way to
> detect such case when your assumption is violated. usually
> emulation can choose to not implement all the features which
> are supported on the real device, but it's done in a way that
> non-supported features/behaviors can be reported to guest
> (based on spec definition) thus guest knows the expectation
> from the emulated device...

IMHO the guest can't really detect this, but it'll found that the
device is not working functionally if it's doing something like what
Jason has mentioned.

Actually now I have had an idea if we really want to live well even
with Jason's example: maybe we'll need to identify PSI/DSI.  For DSI,
we don't remap for mapped pages; for PSI, we unmap and remap the
mapped pages.  That'll complicate the stuff a bit, but it should
satisfy all the people.

Thanks,

-- 
Peter Xu

  reply	other threads:[~2018-05-03  6:05 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-25  4:51 [Qemu-devel] [PATCH 00/10] intel-iommu: nested vIOMMU, cleanups, bug fixes Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 01/10] intel-iommu: send PSI always even if across PDEs Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 02/10] intel-iommu: remove IntelIOMMUNotifierNode Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 03/10] intel-iommu: add iommu lock Peter Xu
2018-04-25 16:26   ` Emilio G. Cota
2018-04-26  5:45     ` Peter Xu
2018-04-27  5:13   ` Jason Wang
2018-04-27  6:26     ` Peter Xu
2018-04-27  7:19       ` Tian, Kevin
2018-04-27  9:53         ` Peter Xu
2018-04-28  1:54           ` Tian, Kevin
2018-04-28  1:43       ` Jason Wang
2018-04-28  2:24         ` Peter Xu
2018-04-28  2:42           ` Jason Wang
2018-04-28  3:06             ` Peter Xu
2018-04-28  3:11               ` Jason Wang
2018-04-28  3:14             ` Peter Xu
2018-04-28  3:16               ` Jason Wang
2018-04-30  7:22               ` Paolo Bonzini
2018-04-30  7:20           ` Paolo Bonzini
2018-05-03  5:39             ` Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 04/10] intel-iommu: only do page walk for MAP notifiers Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 05/10] intel-iommu: introduce vtd_page_walk_info Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 06/10] intel-iommu: pass in address space when page walk Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 07/10] util: implement simple interval tree logic Peter Xu
2018-04-27  5:53   ` Jason Wang
2018-04-27  6:27     ` Peter Xu
2018-05-03  7:10     ` Peter Xu
2018-05-03  7:21       ` Jason Wang
2018-05-03  7:30         ` Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 08/10] intel-iommu: maintain per-device iova ranges Peter Xu
2018-04-27  6:07   ` Jason Wang
2018-04-27  6:34     ` Peter Xu
2018-04-27  7:02     ` Tian, Kevin
2018-04-27  7:28       ` Peter Xu
2018-04-27  7:44         ` Tian, Kevin
2018-04-27  9:55           ` Peter Xu
2018-04-27 11:40             ` Peter Xu
2018-04-27 23:37               ` Tian, Kevin
2018-05-03  6:04                 ` Peter Xu [this message]
2018-05-03  7:20                   ` Jason Wang
2018-05-03  7:28                     ` Peter Xu
2018-05-03  7:43                       ` Jason Wang
2018-05-03  7:53                         ` Peter Xu
2018-05-03  9:22                           ` Jason Wang
2018-05-03  9:53                             ` Peter Xu
2018-05-03 12:01                               ` Peter Xu
2018-04-28  1:49               ` Jason Wang
2018-04-25  4:51 ` [Qemu-devel] [PATCH 09/10] intel-iommu: don't unmap all for shadow page table Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 10/10] intel-iommu: remove notify_unmap for page walk Peter Xu
2018-04-25  5:05 ` [Qemu-devel] [PATCH 00/10] intel-iommu: nested vIOMMU, cleanups, bug fixes no-reply
2018-04-25  5:34   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180503060442.GB2378@xz-mi \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=jintack@cs.columbia.edu \
    --cc=kevin.tian@intel.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).