Re: [Qemu-devel] [PATCH v7 0/5] IOMMU: intel_iommu support map and unmap notifications

From: Alex Williamson <alex.williamson@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	"Aviv B.D" <bd.aviv@gmail.com>, Jason Wang <jasowang@redhat.com>,
	Jan Kiszka <jan.kiszka@siemens.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Lan, Tianyu" <tianyu.lan@intel.com>
Subject: Re: [Qemu-devel] [PATCH v7 0/5] IOMMU: intel_iommu support map and unmap notifications
Date: Fri, 2 Dec 2016 10:26:41 -0700
Message-ID: <20161202102641.28b81768@t450s.home>
In-Reply-To: <20161202055925.GC21601@pxdev.xzpeter.org>

On Fri, 2 Dec 2016 13:59:25 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Thu, Dec 01, 2016 at 04:21:38AM +0000, Tian, Kevin wrote:
> > > From: Peter Xu
> > > Sent: Wednesday, November 30, 2016 5:24 PM
> > > 
> > > On Mon, Nov 28, 2016 at 05:51:50PM +0200, Aviv B.D wrote:  
> > > > * intel_iommu's replay op is not implemented yet (May come in different patch
> > > >   set).
> > > >   The replay function is required for hotplug vfio device and to move devices
> > > >   between existing domains.  
> > > 
> > > > I have been thinking about this replay issue recently, and I am
> > > > starting to doubt whether the whole VT-d vIOMMU framework suits it...
> > > > 
> > > > Generally speaking, the current work throws away the IOMMU "domain"
> > > > layer. We maintain the mapping only per device, and we don't care
> > > > much about which domain it belongs to. This seems problematic.
> > > 
> > > > The simplest failing case is this (let's assume cache-mode is
> > > > enabled): we have two assigned devices A and B, both belonging to
> > > > the same domain 1, and domain 1 has one mapping, which
> > > > is the first page (iova range 0-0xfff). If the guest then wants to
> > > > invalidate the page, it'll notify the VT-d vIOMMU with an invalidation
> > > > message. If we do this invalidation per-device, we'll need to UNMAP
> > > > the region twice - once for A, once for B (with more devices, we
> > > > will unmap more times), and we can never know we have done duplicated
> > > > work since we don't keep domain info, so we don't know they are using
> > > > the same address space. The first unmap will work, and then the
> > > > remaining DMA unmaps will possibly fail with errors.  
> > 
> > Tianyu and I discussed this, and there is a bigger problem: today VFIO assumes 
> > only one address space per container, which is fine w/o vIOMMU (all devices in 
> > the same container share the same GPA->HPA translation). However, that's not the case
> > when vIOMMU is enabled, because guest Linux implements a per-device 
> > IOVA space. If a VFIO container includes multiple devices, it means 
> > multiple address spaces are required per container...  
> 
> IIUC the vfio container is created in:
> 
>   vfio_realize
>   vfio_get_group
>   vfio_connect_container
> 
> Along the way (for vfio_get_group()), we have:
> 
>   group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), errp);
> 
> Here the address space is per device. Without a vIOMMU, they all
> point to the same system address space. With a vIOMMU, however,
> that address space will be per-device, no?

Correct, with VT-d present, there will be a separate AddressSpace per
device, so each device will be placed into its own container.  This
is currently the only way to provide the flexibility that those
separate devices can be attached to different domains in the guest.  It
also automatically fails when devices share an iommu group on the
host but the guest attempts to use separate AddressSpaces.
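
As a rough illustration of that last point, here is a toy model of a
group being bound to exactly one address space.  It is only a sketch:
toy_as, toy_group and toy_get_group() are hypothetical names invented
for this example, not the QEMU structures or the real vfio_get_group().

```c
#include <stddef.h>

#define TOY_MAX_GROUPS 8

/* Hypothetical stand-in for an AddressSpace; only identity matters here. */
struct toy_as {
    const char *name;
};

/* Hypothetical stand-in for a host iommu group bound to one address space. */
struct toy_group {
    int groupid;
    const struct toy_as *as;
};

static struct toy_group toy_groups[TOY_MAX_GROUPS];
static size_t toy_n_groups;

/*
 * Bind a host iommu group to an address space.
 * Returns 0 if the group is new or already bound to the same address
 * space, and -1 if it is already bound to a different one (or if the
 * toy table is full).
 */
int toy_get_group(int groupid, const struct toy_as *as)
{
    for (size_t i = 0; i < toy_n_groups; i++) {
        if (toy_groups[i].groupid == groupid) {
            /* One group cannot live in two address spaces. */
            return toy_groups[i].as == as ? 0 : -1;
        }
    }
    if (toy_n_groups == TOY_MAX_GROUPS) {
        return -1;
    }
    toy_groups[toy_n_groups].groupid = groupid;
    toy_groups[toy_n_groups].as = as;
    toy_n_groups++;
    return 0;
}
```

In this model, two devices in the same host group that present the same
address space succeed, while a second device in that group presenting a
different per-device address space is rejected.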

Trouble comes when the guest is booted with iommu=pt as each container
will need to map the full guest memory, yet each container is accounted
separately for locked memory.  libvirt doesn't account for
$NUM_HOSTDEVS x $VM_MEM_SIZE for locked memory.
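
To put numbers on that, here is a minimal sketch of the arithmetic.
The helper locked_mem_needed() is hypothetical (it is not a libvirt or
QEMU function) and it assumes every container pins all of guest memory,
as in the iommu=pt case.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimate the locked-memory requirement in bytes.
 *
 * With one container per device (vIOMMU present, guest using iommu=pt),
 * each container maps all of guest memory and is accounted separately,
 * so the total is num_hostdevs * vm_mem_bytes.  With a single shared
 * container (no vIOMMU), guest memory is accounted once.
 *
 * Returns UINT64_MAX if the product would overflow.
 */
uint64_t locked_mem_needed(uint64_t vm_mem_bytes, unsigned num_hostdevs,
                           bool per_device_containers)
{
    if (num_hostdevs == 0) {
        return 0;
    }
    if (!per_device_containers) {
        return vm_mem_bytes;
    }
    if (vm_mem_bytes > UINT64_MAX / num_hostdevs) {
        return UINT64_MAX;
    }
    return vm_mem_bytes * num_hostdevs;
}
```

For example, a 4 GiB guest with three assigned devices would need
12 GiB of locked memory under this model, versus 4 GiB when all three
devices share one container.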

Ideally we could be more flexible with dynamic containers, but it's not
currently an option to move a group from one container to another w/o
first closing all the devices within the group.  Thanks,

Alex
