Message-ID: <1453865854.3107.5.camel@redhat.com>
From: Alex Williamson
Date: Tue, 26 Jan 2016 20:37:34 -0700
In-Reply-To: <56A822C6.5020707@gmail.com>
Subject: Re: [Qemu-devel] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
To: Yang Zhang, "Tian, Kevin", "Song, Jike"
Cc: "Ruan, Shuai", Neo Jia, "kvm@vger.kernel.org", "igvt-g@lists.01.org", qemu-devel, Gerd Hoffmann, Paolo Bonzini, "Lv, Zhiyuan"

On Wed, 2016-01-27 at 09:52 +0800, Yang Zhang wrote:
> On 2016/1/27 6:56, Alex Williamson wrote:
> > On Tue, 2016-01-26 at 22:39 +0000, Tian, Kevin wrote:
> > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > Sent: Wednesday, January 27, 2016 6:27 AM
> > > >
> > > > On Tue, 2016-01-26 at 22:15 +0000, Tian, Kevin wrote:
> > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > > > Sent: Wednesday, January 27, 2016 6:08 AM
> > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Today KVMGT (not using VFIO yet) registers I/O emulation callbacks to
> > > > > > > > > KVM, so VM MMIO access will be forwarded to KVMGT directly for
> > > > > > > > > emulation in kernel. If we reuse above R/W flags, the whole emulation
> > > > > > > > > path would be unnecessarily long with obvious performance impact. We
> > > > > > > > > either need a new flag here to indicate in-kernel emulation (bias from
> > > > > > > > > passthrough support), or just hide the region alternatively (let KVMGT
> > > > > > > > > to handle I/O emulation itself like today).
> > > > > > > >
> > > > > > > > That sounds like a future optimization TBH.  There's very strict
> > > > > > > > layering between vfio and kvm.  Physical device assignment could make
> > > > > > > > use of it as well, avoiding a round trip through userspace when an
> > > > > > > > ioread/write would do.  Userspace also needs to orchestrate those kinds
> > > > > > > > of accelerators, there might be cases where userspace wants to see those
> > > > > > > > transactions for debugging or manipulating the device.  We can't simply
> > > > > > > > take shortcuts to provide such direct access.  Thanks,
> > > > > > > >
> > > > > > >
> > > > > > > But we have to balance such debugging flexibility and acceptable performance.
> > > > > > > To me the latter one is more important otherwise there'd be no real usage
> > > > > > > around this technique, while for debugging there are other alternative (e.g.
> > > > > > > ftrace) Consider some extreme case with 100k traps/second and then see
> > > > > > > how much impact a 2-3x longer emulation path can bring...
> > > > > >
> > > > > > Are you jumping to the conclusion that it cannot be done with proper
> > > > > > layering in place?  Performance is important, but it's not an excuse to
> > > > > > abandon designing interfaces between independent components.  Thanks,
> > > > > >
> > > > >
> > > > > Two are not controversial. My point is to remove unnecessary long trip
> > > > > as possible. After another thought, yes we can reuse existing read/write
> > > > > flags:
> > > > >    - KVMGT will expose a private control variable whether in-kernel
> > > > > delivery is required;
> > > >
> > > > But in-kernel delivery is never *required*.  Wouldn't userspace want to
> > > > deliver in-kernel any time it possibly could?
> > > >
> > > > >    - when the variable is true, KVMGT will register in-kernel MMIO
> > > > > emulation callbacks then VM MMIO request will be delivered to KVMGT
> > > > > directly;
> > > > >    - when the variable is false, KVMGT will not register anything.
> > > > > VM MMIO request will then be delivered to Qemu and then ioread/write
> > > > > will be used to finally reach KVMGT emulation logic;
> > > >
> > > > No, that means the interface is entirely dependent on a backdoor through
> > > > KVM.  Why can't userspace (QEMU) do something like register an MMIO
> > > > region with KVM handled via a provided file descriptor and offset,
> > > > couldn't KVM then call the file ops without a kernel exit?  Thanks,
> > > >
> > >
> > > Could you elaborate this thought? If it can achieve the purpose w/o
> > > a kernel exit definitely we can adapt to it. :-)
> >
> > I only thought of it when replying to the last email and have been doing
> > some research, but we already do quite a bit of synchronization through
> > file descriptors.  The kvm-vfio pseudo device uses a group file
> > descriptor to ensure a user has access to a group, allowing some degree
> > of interaction between modules.  Eventfds and irqfds already make use of
> > f_ops on file descriptors to poke data.  So, if KVM had information that
> > an MMIO region was backed by a file descriptor for which it already has
> > a reference via fdget() (and verified access rights and whatnot), then
> > it ought to be a simple matter to get to f_ops->read/write knowing the
> > base offset of that MMIO region.  Perhaps it could even simply use
> > __vfs_read/write().  Then we've got a proper reference to the file
> > descriptor for ownership purposes and we've transparently jumped across
> > modules without any implicit knowledge of the other end.  Could
> > it work?
> > Thanks,
>
> ioeventfd is a good example.
> As I know, all access to the MMIO of IGD is trapped into the kernel. Also,
> the PCI config space is emulated by Qemu. Same for the VGA, which is
> emulated too. I guess interrupts are also emulated (this means we cannot
> benefit from VT-d PI). The most important is that KVMGT doesn't require
> a hardware IOMMU. As we know, VFIO is for direct device assignment,
> but most things for KVMGT are emulated, so why should we use VFIO for it?

What is a vGPU?  It's a PCI device exposed to QEMU that needs to support
emulated and direct MMIO paths into the kernel driver, PCI config space
emulation, and various interrupt models.  What does the VFIO API
provide?  Exactly those things.

Yes, vfio is typically used for assigning physical devices, but it has a
very modular infrastructure which allows sub-drivers to be written that
can do much more complicated and device-specific passthrough and
emulation in the kernel.  vfio typically works with a platform IOMMU,
but any device that can provide isolation and translation services will
work.  In the case of graphics cards, there's effectively already an
IOMMU on the device; in the case of vGPU, this is mediated through the
physical GPU driver.

So what's the benefit?  VFIO already has the IOMMU and device access
interfaces, is already supported by QEMU and libvirt, and re-using these
for vGPU avoids a proliferation of new vendor-specific devices, each
with their own implementation of these interfaces and each requiring
unique libvirt and upper-level management device-specific knowledge.
That's why.  Thanks,

Alex
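
The file-descriptor idea quoted above (a dispatcher that holds a reference to an fd and forwards MMIO accesses to that fd's read/write operations at a known base offset) can be modelled in userspace. What follows is only an illustrative sketch, not KVM or VFIO code: the names `MmioRegion`, `MmioDispatcher` and `demo` are hypothetical, a temporary file stands in for the device file descriptor, and `os.pread`/`os.pwrite` stand in for the `f_ops->read/write` calls mentioned in the thread. It assumes a Unix-like system.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class MmioRegion:
    """Hypothetical description of a guest MMIO window backed by an fd."""
    guest_base: int  # guest-physical start address of the window
    size: int        # length of the window in bytes
    fd: int          # file descriptor that accesses are forwarded to
    fd_offset: int   # offset inside the fd where the window begins


class MmioDispatcher:
    """Stand-in for the hypervisor side: routes accesses by guest address."""

    def __init__(self):
        self._regions = []

    def register(self, region):
        self._regions.append(region)

    def _lookup(self, addr, length):
        # Find the region that fully contains [addr, addr + length).
        for r in self._regions:
            if r.guest_base <= addr and addr + length <= r.guest_base + r.size:
                return r
        raise LookupError(f"no fd-backed region covers {addr:#x}")

    def read(self, addr, length):
        r = self._lookup(addr, length)
        # Translate the guest address into an offset within the fd.
        return os.pread(r.fd, length, r.fd_offset + (addr - r.guest_base))

    def write(self, addr, data):
        r = self._lookup(addr, len(data))
        return os.pwrite(r.fd, data, r.fd_offset + (addr - r.guest_base))


def demo():
    # A zero-filled temporary file plays the role of the device fd.
    with tempfile.TemporaryFile() as f:
        f.write(bytes(0x2000))
        f.flush()
        fd = f.fileno()

        dispatcher = MmioDispatcher()
        dispatcher.register(
            MmioRegion(guest_base=0xF0000000, size=0x1000, fd=fd, fd_offset=0x1000)
        )

        # A "guest" write and read, routed purely via (fd, offset).
        dispatcher.write(0xF0000010, b"\xde\xad\xbe\xef")
        via_region = dispatcher.read(0xF0000010, 4)
        # Reading the fd directly at the translated offset sees the same data.
        direct = os.pread(fd, 4, 0x1010)
        return via_region, direct
```

The dispatcher knows nothing about what sits behind the descriptor beyond the base offset, which is the layering property argued for in the thread; a real implementation would additionally have to handle reference counting, access checks and concurrency.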