From: Jike Song
Date: Mon, 23 Nov 2015 13:05:47 +0800
Subject: Re: [Qemu-devel] [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel
To: Alex Williamson
Cc: igvt-g@ml01.01.org, "Tian, Kevin", "Reddy, Raghuveer", qemu-devel,
 "White, Michael L", "Cowperthwaite, David J", intel-gfx@lists.freedesktop.org,
 "Li, Susie", "Dong, Eddie", linux-kernel@vger.kernel.org,
 xen-devel@lists.xen.org, Gerd Hoffmann, "Zhou, Chao", Paolo Bonzini,
 "Zhu, Libo", "Wang, Hongbo", "Lv, Zhiyuan"

On 11/21/2015 01:25 AM, Alex Williamson wrote:
> On Fri, 2015-11-20 at 08:10 +0000, Tian, Kevin wrote:
>>
>> Here is a more concrete example:
>>
>> KVMGT doesn't require the IOMMU. All DMA targets are already replaced
>> with HPAs through the shadow GTT, so DMA requests from the GPU all
>> contain HPAs.
>>
>> When the IOMMU is enabled, one simple approach is to have the vGPU
>> IOMMU driver configure the system IOMMU with an identity mapping
>> (HPA->HPA). We can't use (GPA->HPA), since GPAs from multiple VMs
>> conflict.
>>
>> However, we still have the host gfx driver running. When the IOMMU is
>> enabled, dma_alloc_*** will return an IOVA (drivers/iommu/iova.c) in
>> the host gfx driver, which will have IOVA->HPA programmed into the
>> system IOMMU.
>>
>> One IOMMU device entry can only translate one address space, so here
>> comes a conflict (HPA->HPA vs. IOVA->HPA). To solve this, the vGPU
>> IOMMU driver needs to allocate an IOVA from iova.c for each VM with a
>> vGPU assigned, and KVMGT will then program that IOVA into the shadow
>> GTT accordingly. This adds one additional mapping layer
>> (GPA->IOVA->HPA). In this way the two requirements can be unified,
>> since only the IOVA->HPA mapping needs to be built.
>>
>> So unlike the existing type1 IOMMU driver, which controls the IOMMU
>> alone, the vGPU IOMMU driver needs to cooperate with another agent
>> (iova.c here) to co-manage the system IOMMU. This may not impact the
>> existing VFIO framework; I just want to highlight the additional work
>> involved in implementing the vGPU IOMMU driver.
>
> Right, so the existing i915 driver needs to use the DMA API and calls
> like dma_map_page() to enable translations through the IOMMU. With
> dma_map_page(), the caller provides a page address (~HPA) and is
> returned an IOVA. So unfortunately you don't get to take the shortcut
> of having an identity mapping through the IOMMU unless you want to
> convert i915 entirely to using the IOMMU API, because we also can't
> have the conflict that an HPA could overlap an IOVA for a previously
> mapped page.
>
> The double translation, once through the GPU MMU and once through the
> system IOMMU, is going to happen regardless of whether we can identity
> map through the IOMMU. The only solution to this would be for the GPU
> to participate in ATS and provide pre-translated transactions from the
> GPU. All of this is internal to the i915 driver (or the vfio extension
> of that driver) and needs to be done regardless of what sort of
> interface we're using to expose the vGPU to QEMU. It just seems like
> VFIO provides a convenient way of doing this, since you'll have ready
> access to the HVA-GPA mappings for the user.
>
> I think the key points though are:
>
> * the VFIO type1 IOMMU stores GPA to HVA translations
> * get_user_pages() on the HVA will pin the page and give you a page
> * dma_map_page() receives that page, programs the system IOMMU and
>   provides an IOVA
> * the GPU MMU can then be programmed with the GPA to IOVA translations
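Just to check that I've parsed that sequence correctly before digging
in, my mental model of the per-entry flow on the KVMGT side is roughly
the sketch below. gvt_gpa_to_hva() and gvt_shadow_gtt_write() are
made-up placeholders for our own GPA->HVA lookup and shadow-GTT update,
not existing kernel APIs; only get_user_pages_fast()/dma_map_page() are
the real interfaces:

#include <linux/mm.h>
#include <linux/dma-mapping.h>

/* Placeholders for KVMGT's own helpers, not existing kernel APIs. */
unsigned long gvt_gpa_to_hva(unsigned long gpa);
void gvt_shadow_gtt_write(unsigned long gpa, dma_addr_t iova);

/* Rough sketch only: pin one guest page and map it for GPU DMA. */
static int gvt_shadow_one_entry(struct device *dev, unsigned long gpa)
{
        unsigned long hva = gvt_gpa_to_hva(gpa); /* GPA -> HVA, as kept by the type1 IOMMU */
        struct page *page;
        dma_addr_t iova;
        int ret;

        ret = get_user_pages_fast(hva, 1, 1, &page);    /* pin the backing page */
        if (ret != 1)
                return ret < 0 ? ret : -EFAULT;

        iova = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev, iova)) {     /* programs the system IOMMU */
                put_page(page);
                return -EFAULT;
        }

        gvt_shadow_gtt_write(gpa, iova);        /* GPA -> IOVA in the shadow GTT */
        return 0;
}

If that matches what you have in mind, the identity-mapping question
indeed goes away, since the GPU then only ever emits IOVAs.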
Thanks for such a nice example! I'll do my homework and get back to you
shortly :)

>
> Thanks,
> Alex
>

--
Thanks,
Jike