Date: Fri, 2 Dec 2016 14:08:59 +0800
From: Peter Xu
Message-ID: <20161202060859.GD21601@pxdev.xzpeter.org>
References: <1480348315-13332-1-git-send-email-bd.aviv@gmail.com> <20161130092359.GC4731@pxdev.xzpeter.org>
Subject: Re: [Qemu-devel] [PATCH v7 0/5] IOMMU: intel_iommu support map and unmap notifications
To: Lan Tianyu
Cc: "Aviv B.D", Jason Wang, Jan Kiszka, Alex Williamson, qemu-devel@nongnu.org, "Michael S. Tsirkin"

On Thu, Dec 01, 2016 at 04:27:52PM +0800, Lan Tianyu wrote:
> On 2016-11-30 17:23, Peter Xu wrote:
> > On Mon, Nov 28, 2016 at 05:51:50PM +0200, Aviv B.D wrote:
> >> * intel_iommu's replay op is not implemented yet (May come in different patch
> >>   set).
> >>   The replay function is required for hotplug vfio device and to move devices
> >>   between existing domains.
> >
> > I am thinking about this replay thing recently and now I start to
> > doubt whether the whole vt-d vIOMMU framework suits this...
> >
> > Generally speaking, current work is throwing away the IOMMU "domain"
> > layer here.
> > We maintain the mapping only per device, and we don't care
> > too much about which domain it belongs to. This seems problematic.
> >
> > The simplest failing case is this (let's assume cache-mode is
> > enabled): we have two assigned devices A and B, both belonging to the
> > same domain 1. Meanwhile, in domain 1 assume we have one mapping, which
> > is the first page (iova range 0-0xfff). Then, if the guest wants to
> > invalidate the page, it'll notify the VT-d vIOMMU with an invalidation
> > message. If we do this invalidation per-device, we'll need to UNMAP
> > the region twice - once for A, once for B (if we have more devices, we
> > will unmap more times), and we can never know we have done duplicated
> > work since we don't keep domain info, so we don't know they are using
> > the same address space. The first unmap will work, and then we'll
> > possibly get errors when the remaining DMA unmaps fail.
>
>
> Hi Peter:

Hi, Tianyu,

> According to VT-d spec 6.2.2.1, "Software must ensure that, if multiple
> context-entries (or extended-context-entries) are programmed
> with the same Domain-id (DID), such entries must be programmed with same
> value for the second-level page-table pointer (SLPTPTR) field, and same
> value for the PASID Table Pointer (PASIDTPTR) field.".
>
> So if two assigned devices may have different IO page tables, they should
> be put into different domains.

By default, devices will be put into different domains. However, it
should be legal to put two assigned devices into the same IOMMU
domain (in the guest), right? And we should handle both cases well,
IMHO.

Actually I just wrote a tool to do it based on vfio-pci:

  https://github.com/xzpeter/clibs/blob/master/gpl/userspace/vfio-bind-group/vfio-bind-group.c

If we run this tool in the guest with parameters like:

  ./vfio-bind-groups 00:03.0 00:04.0

It'll create one single domain, and put PCI devices 00:03.0 and 00:04.0
into the same IOMMU domain.

Thanks,

-- peterx
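
For illustration only, the duplicated-unmap case described in the quoted
text can be modelled with a small toy sketch. Nothing here is QEMU or VFIO
code: AddressSpace, invalidate_per_device and invalidate_per_domain are
hypothetical names made up for this example, and the sketch assumes that
unmapping a range which is not currently mapped is reported as a failure.

```python
class AddressSpace:
    """Toy stand-in for the host-side DMA mappings behind one guest domain."""

    def __init__(self):
        self.mapped = set()

    def map(self, iova, size):
        self.mapped.add((iova, size))

    def unmap(self, iova, size):
        # Assumption: unmapping a range that is not mapped counts as an error.
        if (iova, size) not in self.mapped:
            return False
        self.mapped.remove((iova, size))
        return True


def invalidate_per_device(device_space, iova, size):
    """One UNMAP per device, without knowing which devices share a domain."""
    return {dev: space.unmap(iova, size) for dev, space in device_space.items()}


def invalidate_per_domain(device_domain, domain_space, iova, size):
    """One UNMAP per distinct domain, however many devices belong to it."""
    domains = sorted(set(device_domain.values()))
    return {did: domain_space[did].unmap(iova, size) for did in domains}


def demo():
    # Devices A and B both sit in domain 1, which maps the first page.
    space = AddressSpace()
    space.map(0x0, 0x1000)
    per_device = invalidate_per_device({"A": space, "B": space}, 0x0, 0x1000)

    # Same setup again, but this time the domain layer is tracked.
    space.map(0x0, 0x1000)
    per_domain = invalidate_per_domain({"A": 1, "B": 1}, {1: space}, 0x0, 0x1000)
    return per_device, per_domain
```

In this toy model the per-device path unmaps successfully for A and then
fails for B, because both devices resolve to the same address space. The
per-domain path issues a single unmap for domain 1 and succeeds.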