From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <1453143919.32741.169.camel@redhat.com>
From: Alex Williamson
Date: Mon, 18 Jan 2016 12:05:19 -0700
In-Reply-To: <569CA8AD.6070200@intel.com>
References: <569C5071.6080004@intel.com> <1453092476.32741.67.camel@redhat.com> <569CA8AD.6070200@intel.com>
Subject: Re: [Qemu-devel] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
To: Jike Song
Cc: "Ruan, Shuai", "Tian, Kevin", kvm@vger.kernel.org, "igvt-g@lists.01.org", qemu-devel, Gerd Hoffmann, Paolo Bonzini, Zhiyuan Lv

On Mon, 2016-01-18 at 16:56 +0800, Jike Song wrote:
> On 01/18/2016 12:47 PM, Alex Williamson wrote:
> > Hi Jike,
> >
> > On Mon, 2016-01-18 at 10:39 +0800, Jike Song wrote:
> > > Hi Alex, let's continue with a new thread :)
> > >
> > > Basically we agree with you: exposing vGPU via VFIO can make
> > > QEMU share as much code as possible with pcidev(PF or VF) assignment.
> > > And yes, different vGPU vendors can share quite a lot of the
> > > QEMU part, which will do good for upper layers such as libvirt.
> > >
> > >
> > > To achieve this, there are quite a lot to do, I'll summarize
> > > it below.
> > > I dived into VFIO for a while but still may have
> > > things misunderstood, so please correct me :)
> > >
> > >
> > >
> > > First, let me illustrate my understanding of current VFIO
> > > framework used to pass through a pcidev to guest:
> > >
> > >
> > >                  +----------------------------------+
> > >                  |            vfio qemu             |
> > >                  +-----+------------------------+---+
> > >                        |DMA                  ^  |CFG
> > > QEMU                   |map               IRQ|  |
> > > -----------------------|---------------------|--|-----------
> > > KERNEL    +------------|---------------------|--|----------+
> > >           | VFIO       |                     |  |          |
> > >           |            v                     |  v          |
> > >           |  +-------------------+     +-----+-----------+ |
> > > IOMMU     |  | vfio iommu driver |     | vfio bus driver | |
> > > API  <-------+                   |     |                 | |
> > > Layer     |  | e.g. type1        |     | e.g. vfio_pci   | |
> > >           |  +-------------------+     +-----------------+ |
> > >           +------------------------------------------------+
> > >
> > >
> > > Here when a particular pcidev is passed-through to a KVM guest,
> > > it is attached to vfio_pci driver in host, and guest memory
> > > is mapped into IOMMU via the type1 iommu driver.
> > >
> > >
> > > Then, the draft infrastructure of future VFIO-based vgpu:
> > >
> > >
> > >
> > >                  +-------------------------------------+
> > >                  |              vfio qemu              |
> > >                  +----+-------------------------+------+
> > >                       |DMA                   ^  |CFG
> > > QEMU                  |map                IRQ|  |
> > > ----------------------|----------------------|--|-----------
> > > KERNEL                |                      |  |
> > >          +------------|----------------------|--|----------+
> > >          |VFIO        |                      |  |          |
> > >          |            v                      |  v          |
> > >          | +--------------------+      +-----+-----------+ |
> > > DMA      | | vfio iommu driver  |      | vfio bus driver | |
> > > API <------+                    |      |                 | |
> > > Layer    | |  e.g. vfio_type2   |      |  e.g. vfio_vgpu | |
> > >          | +--------------------+      +-----------------+ |
> > >          |         |  ^                      |  ^          |
> > >          +---------|--|----------------------|--|----------+
> > >                    |  |                      |  |
> > >                    |  |                      v  |
> > >          +---------|--|----------+   +---------------------+
> > >          | +-------v-----------+ |   |                     |
> > >          | |                   | |   |                     |
> > >          | |      KVMGT        | |   |                     |
> > >          | |                   | |   |   host gfx driver   |
> > >          | +-------------------+ |   |                     |
> > >          |                       |   |                     |
> > >          |    KVM hypervisor     |   |                     |
> > >          +-----------------------+   +---------------------+
> > >
> > >         NOTE    vfio_type2 and vfio_vgpu are only *logically* parts
> > >                 of VFIO, they may be implemented in KVM hypervisor
> > >                 or host gfx driver.
> > >
> > >
> > >
> > > Here we need to implement a new vfio IOMMU driver instead of type1,
> > > let's call it vfio_type2 temporarily. The main difference from pcidev
> > > assignment is, vGPU doesn't have its own DMA requester id, so it has
> > > to share mappings with host and other vGPUs.
> > >
> > >         - type1 iommu driver maps gpa to hpa for passing through;
> > >           whereas type2 maps iova to hpa;
> > >
> > >         - hardware iommu is always needed by type1, whereas for
> > >           type2, hardware iommu is optional;
> > >
> > >         - type1 will invoke low-level IOMMU API (iommu_map et al) to
> > >           setup IOMMU page table directly, whereas type2 doesn't (only
> > >           need to invoke higher level DMA API like dma_map_page);
> >
> > Yes, the current type1 implementation is not compatible with vgpu since
> > there are not separate requester IDs on the bus and you probably don't
> > want or need to pin all of guest memory like we do for direct
> > assignment. However, let's separate the type1 user API from the
> > current implementation. It's quite easy within the vfio code to
> > consider "type1" to be an API specification that may have multiple
> > implementations. A minor code change would allow us to continue
> > looking for compatible iommu backends if the group we're trying to
> > attach is rejected.
>
> Would you elaborate a bit about 'iommu backends' here? Previously
> I thought that entire type1 will be duplicated. If not, what is supposed
> to add, a new vfio_dma_do_map?

I don't know that you necessarily want to re-use any of the
vfio_iommu_type1.c code as-is, it's just the API that we'll want to
keep consistent so QEMU doesn't need to learn about a new iommu
backend. Opportunities for sharing certainly may arise, you may want
to use a similar red-black tree for storing current mappings, the
pinning code may be similar, etc. We can evaluate on a case by case
basis whether it makes sense to pull out common code for each of those.

As for an iommu backend in general, if you look at the code flow
example in Documentation/vfio.txt, the user opens a container
(/dev/vfio/vfio) and a group (/dev/vfio/$GROUPNUM). The group is set
to associate with a container instance via VFIO_GROUP_SET_CONTAINER and
then an iommu model is set for the container with VFIO_SET_IOMMU.
Looking at drivers/vfio/vfio.c:vfio_ioctl_set_iommu(), we look for an
iommu backend that supports the requested extension (VFIO_TYPE1_IOMMU),
call the open() callback on it and then attempt to attach the group via
the attach_group() callback. At this latter callback, the iommu
backend can compare the device to those that it actually supports. For
instance the existing vfio_iommu_type1 will attempt to use the IOMMU
API and should fail if the device cannot be supported with that. The
current loop in vfio_ioctl_set_iommu() will exit in this case, but as
you can see in the code, it's easy to make it continue and look for
another iommu backend that supports the requested extension.

> > The benefit here is that QEMU could work
> > unmodified, using the type1 vfio-iommu API regardless of whether a
> > device is directly assigned or virtual.
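
To make the backend lookup described above a bit more concrete, here is
a toy Python model of it. This is a sketch only, not kernel code and not
how vfio.c is actually written: the class names, the keep_looking flag
and the per-device dict fields are all made up for illustration. The
only things taken from the discussion are the VFIO_TYPE1_IOMMU extension
check, the attach_group() step, and the fact that the current loop gives
up once a backend rejects the group.

```python
# Toy model of the iommu backend lookup; all names are hypothetical.

class Type1Backend:
    """Stands in for vfio_iommu_type1: needs devices behind a hardware IOMMU."""
    name = "type1"
    extensions = {"VFIO_TYPE1_IOMMU"}

    def attach_group(self, group):
        # Reject the group if any device cannot use the IOMMU API.
        return all(dev["behind_iommu"] for dev in group)


class VgpuBackend:
    """Hypothetical vgpu backend that exposes the same type1 user API."""
    name = "vgpu-type1"
    extensions = {"VFIO_TYPE1_IOMMU"}

    def attach_group(self, group):
        return all(dev["is_vgpu"] for dev in group)


def set_iommu(backends, extension, group, keep_looking):
    """Return the name of the backend that accepted the group, or None."""
    for backend in backends:
        if extension not in backend.extensions:
            continue
        if backend.attach_group(group):
            return backend.name
        if not keep_looking:
            # Behaviour described for the current loop: stop at the
            # first backend that rejects the group.
            return None
    return None
```

With keep_looking=False a vgpu group ends up with no backend at all;
with keep_looking=True it falls through to the hypothetical vgpu
backend, while an ordinary PCI group still lands on type1. Userspace
asks for VFIO_TYPE1_IOMMU in both cases.
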
> >
> > Let's look at the type1 interface; we have simple map and unmap
> > interfaces which map and unmap process virtual address space (vaddr) to
> > the device address space (iova). The host physical address is obtained
> > by pinning the vaddr. In the current implementation, a map operation
> > pins pages and populates the hardware iommu. A vgpu compatible
> > implementation might simply register the translation into a
> > kernel-based database to be called upon later. When the host graphics
> > driver needs to enable dma for the vgpu, it doesn't need to go to QEMU
> > for the translation, it already possesses the iova to vaddr mapping,
> > which becomes iova to hpa after a pinning operation.
> >
> > So, I would encourage you to look at creating a vgpu vfio iommu
> > backend that makes use of the type1 api since it will reduce the
> > changes necessary for userspace.
> >
>
> Yes, keeping type1 API sounds a great idea.
>
> > > We also need to implement a new 'bus' driver instead of vfio_pci,
> > > let's call it vfio_vgpu temporarily:
> > >
> > >         - vfio_pci is a real pci driver, it has a probe method called
> > >           during dev attaching; whereas the vfio_vgpu is a pseudo
> > >           driver, it won't attach any device - the GPU is always owned by
> > >           host gfx driver.
> > >           It has to do 'probing' elsewhere, but
> > >           still in host gfx driver attached to the device;
> > >
> > >         - pcidev(PF or VF) attached to vfio_pci has a natural path
> > >           in sysfs; whereas vgpu is purely a software concept:
> > >           vfio_vgpu needs to create/destroy vgpu instances,
> > >           maintain their paths in sysfs (e.g. "/sys/class/vgpu/intel/vgpu0")
> > >           etc. There should be something added in a higher layer
> > >           to do this (VFIO or DRM).
> > >
> > >         - vfio_pci in most case will allow QEMU to access pcidev
> > >           hardware; whereas vfio_vgpu is to access virtual resource
> > >           emulated by another device model;
> > >
> > >         - vfio_pci will inject an IRQ to guest only when physical IRQ
> > >           generated; whereas vfio_vgpu may inject an IRQ for emulation
> > >           purpose.
> > >           Anyway they can share the same injection interface;
> >
> > Here too, I think you're making assumptions based on an implementation
> > path. Personally, I think each vgpu should be a struct device and that
> > an iommu group should be created for each. I think this is a valid
> > abstraction; dma isolation is provided through something other than a
> > system-level iommu, but it's still provided. Without this, the entire
> > vfio core would need to be aware of vgpu, since the core operates on
> > devices and groups. I believe creating a struct device also gives you
> > basic probe and release support for a driver.
> >
>
> Indeed.
> BTW, that should be done in the 'bus' driver, right?

I think you have some flexibility between the graphics driver and the
vfio-vgpu driver in where this is done. If we want vfio-vgpu to be
more generic, then vgpu device creation and management should probably
be done in the graphics driver and vfio-vgpu should be able to probe
that device and call back into the graphics driver to handle requests.
If it turns out there's not much for vfio-vgpu to share, i.e. it's just
a passthrough for device specific emulation, then maybe we want a
vfio-intel-vgpu instead.

> > There will be a need for some sort of lifecycle management of a vgpu.
> > How is it created? Destroyed? Can it be given more or less resources
> > than other vgpus, etc. This could be implemented in sysfs for each
> > physical gpu with vgpu support, sort of like how we support sr-iov now,
> > the PF exports controls for creating VFs. The more commonality we can
> > get for lifecycle and device access for userspace, the better.
> >
>
> Will have a look at the VF managements, thanks for the info.
>
> > As for virtual vs physical resources and interrupts, part of the
> > purpose of vfio is to abstract a device into basic components. It's up
> > to the bus driver how accesses to each space map to the physical
> > device. Take for instance PCI config space, the existing vfio-pci
> > driver emulates some portions of config space for the user.
> >
> > > Questions:
> > >
> > >         [1] For VFIO No-IOMMU mode (!iommu_present), I saw it was reverted
> > >             in upstream ae5515d66362(Revert: "vfio: Include No-IOMMU mode").
> > >             In my opinion, vfio_type2 doesn't rely on it to support No-IOMMU
> > >             case, instead it needs a new implementation which fits both
> > >             w/ and w/o IOMMU. Is this correct?
> > >
> >
> > vfio no-iommu has also been re-added for v4.5 (03a76b60f8ba), this was
> > simply a case that the kernel development outpaced the intended user
> > and I didn't want to commit to the user api changes until it had been
> > completely vetted. In any case, vgpu should have no dependency
> > whatsoever on no-iommu. As above, I think vgpu should create virtual
> > devices and add them to an iommu group, similar to how no-iommu does,
> > but without the kernel tainting because you are actually providing
> > isolation through other means than a system iommu.
> >
>
> Thanks for confirmation.
>
> > > For things not mentioned above, we might have them discussed in
> > > other threads, or temporarily maintained in a TODO list (we might get
> > > back to them after the big picture get agreed):
> > >
> > >
> > >         - How to expose guest framebuffer via VFIO for SPICE;
> >
> > Potentially through a new, device specific region, which I think can be
> > done within the existing vfio API. The API can already expose an
> > arbitrary number of regions to the user, it's just a matter of how we
> > tell the user the purpose of a region index beyond the fixed set we map
> > to PCI resources.
> >
> > >         - How to avoid double translation with two-stage: GTT + IOMMU,
> > >           whether identity map is possible, and if yes, how to make it
> > >           more effectively;
> > >
> > >         - Application acceleration
> > >           You mentioned that with VFIO, a vGPU may be used by
> > >           applications to get GPU acceleration. It's a potential
> > >           opportunity to use vGPU for container usage, worthy of
> > >           further investigation.
> >
> > Yes, interesting topics. Thanks,
> >
>
> Looks that things get more clear overall, with small exceptions.
> Thanks for the advice:)

Yes, please let me know how I can help. Thanks,

Alex
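
P.S. A rough illustration of the "register the translation now, pin
later" idea from the type1 discussion above. This is a toy Python model
under stated assumptions, not kernel code: DeferredMappings, translate()
and the pin_page callback are hypothetical names, the 4 KiB page size is
assumed, and a plain dict stands in for whatever structure (a red-black
tree, for instance) a real backend might use.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


class DeferredMappings:
    """Records iova -> vaddr at map time; resolves to hpa only on demand."""

    def __init__(self, pin_page):
        # pin_page is a hypothetical callback: page-aligned vaddr -> hpa.
        self._pin_page = pin_page
        self._maps = {}  # iova base -> (vaddr base, size)

    def map(self, iova, vaddr, size):
        # Same shape as a type1 map request, but nothing is pinned here.
        self._maps[iova] = (vaddr, size)

    def unmap(self, iova):
        self._maps.pop(iova, None)

    def translate(self, iova):
        # Called when the host graphics driver needs a DMA address.
        for base, (vaddr, size) in self._maps.items():
            if base <= iova < base + size:
                va = vaddr + (iova - base)
                page = va - va % PAGE_SIZE
                # Pinning happens only now, one page at a time.
                return self._pin_page(page) + (va - page)
        raise KeyError(f"iova {iova:#x} is not mapped")
```

The point of the split is that map() stays cheap and does not pin all of
guest memory; only the pages the graphics driver actually asks about go
through pin_page.
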