Date: Fri, 2 Dec 2016 13:59:25 +0800
From: Peter Xu
Message-ID: <20161202055925.GC21601@pxdev.xzpeter.org>
References: <1480348315-13332-1-git-send-email-bd.aviv@gmail.com> <20161130092359.GC4731@pxdev.xzpeter.org>
Subject: Re: [Qemu-devel] [PATCH v7 0/5] IOMMU: intel_iommu support map and unmap notifications
To: "Tian, Kevin"
Cc: "Aviv B.D", Jason Wang, Jan Kiszka, Alex Williamson, "qemu-devel@nongnu.org", "Michael S. Tsirkin", "Lan, Tianyu"

On Thu, Dec 01, 2016 at 04:21:38AM +0000, Tian, Kevin wrote:
> > From: Peter Xu
> > Sent: Wednesday, November 30, 2016 5:24 PM
> >
> > On Mon, Nov 28, 2016 at 05:51:50PM +0200, Aviv B.D wrote:
> > > * intel_iommu's replay op is not implemented yet (May come in different patch
> > >   set).
> > >   The replay function is required for hotplug vfio device and to move devices
> > >   between existing domains.
> >
> > I have been thinking about this replay thing recently, and now I am
> > starting to doubt whether the whole vt-d vIOMMU framework suits this...
> >
> > Generally speaking, the current work throws away the IOMMU "domain"
> > layer here. We maintain the mapping only per device, and we don't care
> > too much about which domain it belongs to. This seems problematic.
> >
> > The simplest failing case is this (let's assume cache-mode is
> > enabled): we have two assigned devices A and B, both belonging to the
> > same domain 1. Meanwhile, assume domain 1 has one mapping, which
> > is the first page (iova range 0-0xfff). Then, if the guest wants to
> > invalidate the page, it'll notify the VT-d vIOMMU with an invalidation
> > message. If we do this invalidation per-device, we'll need to UNMAP
> > the region twice - once for A, once for B (if we have more devices, we
> > will unmap more times), and we can never know that we have done duplicated
> > work since we don't keep domain info, so we don't know they are using
> > the same address space. The first unmap will work, and then the
> > remaining DMA unmaps will probably fail with errors.
>
> Tianyu and I discussed this, and there is a bigger problem: today VFIO assumes
> only one address space per container, which is fine w/o vIOMMU (all devices in
> the same container share the same GPA->HPA translation). However, that's not
> the case when vIOMMU is enabled, because guest Linux implements a per-device
> IOVA space. If a VFIO container includes multiple devices, it means
> multiple address spaces are required per container...

IIUC the vfio container is created in:

  vfio_realize
  vfio_get_group
  vfio_connect_container

Along the way (for vfio_get_group()), we have:

  group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), errp);

Here the address space is per device. Without a vIOMMU, they will all
point to the same system address space. With a vIOMMU, however,
that address space will be per-device, no?

-- peterx
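
To make the duplicated-unmap scenario quoted above concrete, here is a minimal toy model in C. It is only a sketch and not QEMU or VFIO code: every name in it (toy_domain, toy_device, toy_map, toy_unmap, invalidate_per_device, invalidate_per_domain) is hypothetical, and a "mapping" is reduced to one flag per page. It assumes two devices share one domain and shows that walking the devices repeats the unmap, while tracking the domain lets it be done once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 16

/* Hypothetical stand-in for one guest IOMMU domain (one IOVA address space). */
struct toy_domain {
    bool mapped[MAX_PAGES];
};

/* Hypothetical stand-in for an assigned device attached to a domain. */
struct toy_device {
    struct toy_domain *domain;
};

/* Mark one page as mapped in the domain's address space. */
void toy_map(struct toy_domain *d, size_t page)
{
    assert(page < MAX_PAGES);
    d->mapped[page] = true;
}

/* Unmap one page. Returns 0 on success, -1 if the page was not mapped. */
int toy_unmap(struct toy_domain *d, size_t page)
{
    assert(page < MAX_PAGES);
    if (!d->mapped[page]) {
        return -1;
    }
    d->mapped[page] = false;
    return 0;
}

/*
 * Per-device invalidation: without domain info we unmap once per device,
 * even when several devices share the same address space.
 * Returns the number of unmaps that failed.
 */
int invalidate_per_device(struct toy_device *devs, size_t n, size_t page)
{
    int failures = 0;
    for (size_t i = 0; i < n; i++) {
        if (toy_unmap(devs[i].domain, page) < 0) {
            failures++;
        }
    }
    return failures;
}

/*
 * Per-domain invalidation: the shared address space is unmapped exactly once.
 * Returns the number of unmaps that failed (0 or 1).
 */
int invalidate_per_domain(struct toy_domain *d, size_t page)
{
    return toy_unmap(d, page) < 0 ? 1 : 0;
}
```

With devices A and B both pointing at the same toy_domain and page 0 mapped, invalidate_per_device() succeeds for A and fails for B, while invalidate_per_domain() issues a single unmap that succeeds.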