From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:37267)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <peterx@redhat.com>) id 1cCgsg-0006kt-RR
	for qemu-devel@nongnu.org; Fri, 02 Dec 2016 00:59:35 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <peterx@redhat.com>) id 1cCgsd-0000tM-Pe
	for qemu-devel@nongnu.org; Fri, 02 Dec 2016 00:59:34 -0500
Received: from mx1.redhat.com ([209.132.183.28]:40230)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from <peterx@redhat.com>) id 1cCgsd-0000qw-Gs
	for qemu-devel@nongnu.org; Fri, 02 Dec 2016 00:59:31 -0500
Date: Fri, 2 Dec 2016 13:59:25 +0800
From: Peter Xu <peterx@redhat.com>
Message-ID: <20161202055925.GC21601@pxdev.xzpeter.org>
References: <1480348315-13332-1-git-send-email-bd.aviv@gmail.com>
	<20161130092359.GC4731@pxdev.xzpeter.org>
	<AADFC41AFE54684AB9EE6CBC0274A5D18E078B22@SHSMSX101.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D18E078B22@SHSMSX101.ccr.corp.intel.com>
Subject: Re: [Qemu-devel] [PATCH v7 0/5] IOMMU: intel_iommu support map and
 unmap notifications
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: "Aviv B.D" <bd.aviv@gmail.com>, Jason Wang <jasowang@redhat.com>, Jan Kiszka <jan.kiszka@siemens.com>, Alex Williamson <alex.williamson@redhat.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, "Michael S. Tsirkin" <mst@redhat.com>, "Lan, Tianyu" <tianyu.lan@intel.com>

On Thu, Dec 01, 2016 at 04:21:38AM +0000, Tian, Kevin wrote:
> > From: Peter Xu
> > Sent: Wednesday, November 30, 2016 5:24 PM
> > 
> > On Mon, Nov 28, 2016 at 05:51:50PM +0200, Aviv B.D wrote:
> > > * intel_iommu's replay op is not implemented yet (May come in different patch
> > >   set).
> > >   The replay function is required for hotplug vfio device and to move devices
> > >   between existing domains.
> > 
> > I am thinking about this replay thing recently and now I start to
> > doubt whether the whole vt-d vIOMMU framework suits this...
> > 
> > Generally speaking, current work is throwing away the IOMMU "domain"
> > layer here. We maintain the mapping only per device, and we don't care
> > too much about which domain it belongs. This seems problematic.
> > 
> > A simplest wrong case for this is (let's assume cache-mode is
> > enabled): if we have two assigned devices A and B, both belong to the
> > same domain 1. Meanwhile, in domain 1 assume we have one mapping which
> > is the first page (iova range 0-0xfff). Then, if guest wants to
> > invalidate the page, it'll notify VT-d vIOMMU with an invalidation
> > message. If we do this invalidation per-device, we'll need to UNMAP
> > the region twice - once for A, once for B (if we have more devices, we
> > will unmap more times), and we can never know we have done duplicated
> > work since we don't keep domain info, so we don't know they are using
> > the same address space. The first unmap will work, and then we'll
> > possibly get some errors on the rest of dma unmap failures.
> 
> Tianyu and I discussed there is a bigger problem: today VFIO assumes 
> only one address space per container, which is fine w/o vIOMMU (all devices in 
> same container share same GPA->HPA translation). However it's not the case
> when vIOMMU is enabled, because guest Linux implements per-device 
> IOVA space. If a VFIO container includes multiple devices, it means 
> multiple address spaces required per container...

IIUC the vfio container is created in:

  vfio_realize
  vfio_get_group
  vfio_connect_container

Along the way (for vfio_get_group()), we have:

  group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), errp);

Here the address space is per device. Without a vIOMMU, every device
points to the same system address space. With a vIOMMU, however, each
device gets its own address space, no?
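
To make the question concrete, here is a toy model of what I mean. It is
only a sketch, not QEMU code: every Toy*/toy_* name below is made up for
illustration, and the only assumption is that the lookup behaves like
pci_device_iommu_address_space() as described above (the shared system
address space without a vIOMMU, a per-device one with it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT QEMU's AddressSpace/PCIDevice. */
typedef struct ToyAddressSpace {
    const char *name;
} ToyAddressSpace;

typedef struct ToyPCIDevice {
    const char *name;
    ToyAddressSpace iommu_as;   /* private IOVA space, used only with a vIOMMU */
} ToyPCIDevice;

/* The single address space every device shares when there is no vIOMMU. */
static ToyAddressSpace toy_system_as = { "system" };

/*
 * Plays the role of pci_device_iommu_address_space() in this sketch:
 * without a vIOMMU all devices resolve to the shared system address
 * space; with one, each device resolves to its own.
 */
static ToyAddressSpace *toy_device_address_space(ToyPCIDevice *dev,
                                                 bool viommu_enabled)
{
    assert(dev != NULL);
    return viommu_enabled ? &dev->iommu_as : &toy_system_as;
}

/* True when both devices would pass the same address space down to vfio. */
static bool toy_share_address_space(ToyPCIDevice *a, ToyPCIDevice *b,
                                    bool viommu_enabled)
{
    return toy_device_address_space(a, viommu_enabled) ==
           toy_device_address_space(b, viommu_enabled);
}
```

With two devices A and B, toy_share_address_space() is true when the
vIOMMU is off and false when it is on, which is the per-device behavior
I am asking about.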

-- peterx