From mboxrd@z Thu Jan 1 00:00:00 1970
From: Neo Jia
Subject: Re: VFIO based vGPU (was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Tue, 26 Jan 2016 14:28:30 -0800
Message-ID: <20160126222830.GB21927@nvidia.com>
References: <1453092476.32741.67.camel@redhat.com> <569CA8AD.6070200@intel.com> <1453143919.32741.169.camel@redhat.com> <569F4C86.2070501@intel.com> <56A6083E.10703@intel.com> <1453757426.32741.614.camel@redhat.com> <20160126102003.GA14400@nvidia.com> <1453838773.15515.1.camel@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: "Tian, Kevin", "Song, Jike", Gerd Hoffmann, Paolo Bonzini, "Lv, Zhiyuan", "Ruan, Shuai", "kvm@vger.kernel.org", qemu-devel, "igvt-g@lists.01.org", Kirti Wankhede
To: Alex Williamson
Return-path:
Received: from hqemgate16.nvidia.com ([216.228.121.65]:6554 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751128AbcAZW2d convert rfc822-to-8bit (ORCPT ); Tue, 26 Jan 2016 17:28:33 -0500
Content-Disposition: inline
In-Reply-To: <1453838773.15515.1.camel@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Tue, Jan 26, 2016 at 01:06:13PM -0700, Alex Williamson wrote:
> On Tue, 2016-01-26 at 02:20 -0800, Neo Jia wrote:
> > On Mon, Jan 25, 2016 at 09:45:14PM +0000, Tian, Kevin wrote:
> > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> >
> > Hi Alex, Kevin and Jike,
> >
> > (Seems I shouldn't use an attachment; resending to the list with the
> > patches inline at the end.)
> >
> > Thanks for adding me to this technical discussion, a great opportunity
> > for us to design together and bring both the Intel and NVIDIA vGPU
> > solutions to the KVM platform.
> >
> > Instead of jumping directly to the proposal we have been working on
> > recently for NVIDIA vGPU on KVM, I think it is better for me to put out a
> > couple of quick comments / thoughts on the existing discussions in this
> > thread, as fundamentally I think we are solving the same problems: DMA,
> > interrupts and MMIO.
> >
> > Then we can look at what we have; hopefully we can reach some consensus soon.
> >
> > > Yes, and since you're creating and destroying the vgpu here, this is
> > > where I'd expect a struct device to be created and added to an IOMMU
> > > group.  The lifecycle management should really include links between
> > > the vGPU and physical GPU, which would be much, much easier to do with
> > > struct devices created here rather than at the point where we start
> > > doing vfio "stuff".
> >
> > In fact, to keep vfio-vgpu more generic, vgpu device creation and
> > management can be centralized and done in vfio-vgpu. That also includes
> > adding to the IOMMU group and the VFIO group.
>
> Is this really a good idea?  The concept of a vgpu is not unique to
> vfio, we want vfio to be a driver for a vgpu, not an integral part of
> the lifecycle of a vgpu.  That certainly doesn't exclude adding
> infrastructure to make lifecycle management of a vgpu more consistent
> between drivers, but it should be done independently of vfio.  I'll go
> back to the SR-IOV model, vfio is often used with SR-IOV VFs, but vfio
> does not create the VF, that's done in coordination with the PF making
> use of some PCI infrastructure for consistency between drivers.
>
> It seems like we need to take more advantage of the class and driver
> core support to perhaps set up a vgpu bus and class, with vfio-vgpu just
> being a driver for those devices.
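[Editorial aside: the driver-core pattern being suggested here -- a vgpu bus to which vfio-vgpu is just one of several possible drivers bound via probe() -- can be modeled in a few lines of userspace C. All names below are illustrative only; this is not the proposed kernel API.]

```c
#include <assert.h>
#include <stdio.h>

/* Toy userspace model of a "vgpu bus": drivers register with the bus,
 * and when a device appears each driver's probe() is offered the device
 * until one claims it. This mirrors the kernel driver-core matching
 * pattern Alex refers to; none of these names are real kernel symbols. */

struct vgpu_device {
    const char *name;
    const struct vgpu_driver *driver;   /* driver bound by probe, if any */
};

struct vgpu_driver {
    const char *name;
    int (*probe)(struct vgpu_device *dev);  /* return 0 to claim the device */
};

#define MAX_VGPU_DRIVERS 4
const struct vgpu_driver *vgpu_bus_drivers[MAX_VGPU_DRIVERS];
int vgpu_bus_ndrivers;

void vgpu_bus_register_driver(const struct vgpu_driver *drv)
{
    vgpu_bus_drivers[vgpu_bus_ndrivers++] = drv;
}

/* Device creation (e.g. via sysfs) ends with the device being offered
 * to each registered driver in turn. */
int vgpu_bus_add_device(struct vgpu_device *dev)
{
    for (int i = 0; i < vgpu_bus_ndrivers; i++) {
        if (vgpu_bus_drivers[i]->probe(dev) == 0) {
            dev->driver = vgpu_bus_drivers[i];
            return 0;
        }
    }
    return -1;  /* no driver claimed the device */
}

/* In this model, vfio-vgpu is just one driver on the bus. */
int vfio_vgpu_probe(struct vgpu_device *dev)
{
    printf("vfio-vgpu bound to %s\n", dev->name);
    return 0;
}

const struct vgpu_driver vfio_vgpu_driver = {
    .name  = "vfio-vgpu",
    .probe = vfio_vgpu_probe,
};
```

The point of the split is visible even in this sketch: device lifecycle (create/add) lives in the bus, while vfio-specific behavior lives entirely in one driver's callbacks.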
>
> > The graphics driver can register with vfio-vgpu to get management and
> > emulation callbacks.
> >
> > We already have struct vgpu_device in our proposal, which keeps a pointer
> > to the physical device.
> >
> > > - vfio_pci will inject an IRQ to guest only when physical IRQ
> > > generated; whereas vfio_vgpu may inject an IRQ for emulation
> > > purpose. Anyway they can share the same injection interface;
> >
> > The eventfd to inject the interrupt is known to vfio-vgpu; that fd should
> > be available to the graphics driver so that it can inject interrupts
> > directly when the physical device triggers an interrupt.
> >
> > Here is the proposal we have, please review.
> >
> > Please note the patches we have put out here are mainly for POC purposes,
> > to verify our understanding, reduce confusion and speed up our design,
> > although we are very happy to refine them into something that can
> > eventually be used by both parties and upstreamed.
> >
> > Linux vGPU kernel design
> > ==========================================
> >
> > Here we are proposing a generic Linux kernel module based on the VFIO
> > framework which allows different GPU vendors to plug in and provide their
> > GPU virtualization solution on KVM. The benefits of having such a generic
> > kernel module are:
> >
> > 1) Reuse the QEMU VFIO driver, supporting the VFIO UAPI
> >
> > 2) GPU HW agnostic management API for upper layer software such as libvirt
> >
> > 3) No duplicated VFIO kernel logic reimplemented by different GPU driver vendors
> >
> > 0. High level overview
> > ==========================================
> >
> >  user space:
> >                                 +-----------+  VFIO IOMMU IOCTLs
> >                       +---------| QEMU VFIO |-------------------------+
> >         VFIO IOCTLs   |         +-----------+                         |
> >                       |                                               |
> >  ---------------------|-----------------------------------------------|---------
> >                       |                                               |
> >   kernel space:       |  +--->----------->---+  (callback)            V
> >                       |  |                   v                 +------V-----+
> >   +----------+   +----V--^--+          +--+--+-----+           | VGPU       |
> >   |          |   |          |     +----| nvidia.ko +----->-----> TYPE1 IOMMU|
> >   | VFIO Bus <===| VGPU.ko  |<----|    +-----------+     |     +---++-------+
> >   |          |   |          |     | (register)           ^         ||
> >   +----------+   +-------+--+     |    +-----------+     |         ||
> >                          V        +----| i915.ko   +-----+     +---VV-------+
> >                          |             +-----^-----+           | TYPE1      |
> >                          |  (callback)       |                 | IOMMU      |
> >                          +-->------------>---+                 +------------+
> >
> >  access flow:
> >
> >   Guest MMIO / PCI config access
> >   |
> >   -------------------------------------------------
> >   |
> >   +-----> KVM VM_EXITs  (kernel)
> >           |
> >   -------------------------------------------------
> >           |
> >           +-----> QEMU VFIO driver (user)
> >                   |
> >   -------------------------------------------------
> >                   |
> >                   +----> VGPU kernel driver (kernel)
> >                          |
> >                          |
> >                          +----> vendor driver callback
> >
> >
> > 1. VGPU management interface
> > ==========================================
> >
> > This is the interface that allows upper layer software (mostly libvirt)
> > to query and configure virtual GPU devices in a HW agnostic fashion.
> > Also, this management interface gives the underlying GPU vendor the
> > flexibility to support virtual device hotplug, multiple virtual devices
> > per VM, multiple virtual devices from different physical devices, etc.
> >
> > 1.1 Under per-physical device sysfs:
> > ----------------------------------------------------------------------------------
> >
> > vgpu_supported_types - RO, lists the currently supported virtual GPU
> > types and their VGPU_IDs. VGPU_ID - a vGPU type identifier returned from
> > reads of "vgpu_supported_types".
> >
> > vgpu_create - WO, input syntax <VM_UUID:idx:VGPU_ID>, create a virtual
> > gpu device on a target physical GPU. idx: virtual device index inside a VM
> >
> > vgpu_destroy - WO, input syntax <VM_UUID:idx>, destroy a virtual gpu
> > device on a target physical GPU
>
>
> I've noted in previous discussions that we need to separate user policy
> from kernel policy here, the kernel policy should not require a "VM
> UUID".  A UUID simply represents a set of one or more devices and an
> index picks the device within the set.  Whether that UUID matches a VM
> or is independently used is up to the user policy when creating the
> device.
>
> Personally I'd also prefer to get rid of the concept of indexes within a
> UUID set of devices and instead have each device be independent.  This
> seems to be an imposition of the nvidia implementation onto the kernel
> interface design.
>

Hi Alex,

I agree with you that we should not put the UUID concept into a kernel API.
At this point (without any prototyping), I am thinking of using a list of
virtual devices instead of a UUID.

>
> > 1.3 Under vgpu class sysfs:
> > ----------------------------------------------------------------------------------
> >
> > vgpu_start - WO, input syntax <VM_UUID>, this will trigger the
> > registration interface to notify the GPU vendor driver to commit virtual
> > GPU resources for this target VM.
> >
> > Also, vgpu_start is a synchronous call; a successful return indicates
> > that all the requested vGPU resources have been fully committed and the
> > VMM should continue.
> >
> > vgpu_shutdown - WO, input syntax <VM_UUID>, this will trigger the
> > registration interface to notify the GPU vendor driver to release the
> > virtual GPU resources of this target VM.
> >
> > 1.4 Virtual device Hotplug
> > ----------------------------------------------------------------------------------
> >
> > To support virtual device hotplug, "vgpu_create" and "vgpu_destroy" can
> > be accessed during VM runtime, and the corresponding registration
> > callback will be invoked to allow the GPU vendor to support hotplug.
> >
> > To support hotplug, the vendor driver would take the necessary action to
> > handle the situation when a vgpu_create is done on a VM_UUID after
> > vgpu_start; that implies both create and start for that vgpu device.
> >
> > Similarly, vgpu_destroy implies a vgpu_shutdown on a running VM, but only
> > if the vendor driver supports vgpu hotplug.
> >
> > If hotplug is not supported and the VM is still running, the vendor
> > driver can return an error code to indicate that it is not supported.
> >
> > Separating create from start gives the flexibility to have:
> >
> > - multiple vgpu instances for a single VM, and
> > - the hotplug feature.
> >
> > 2. GPU driver vendor registration interface
> > ==========================================
> >
> > 2.1 Registration interface definition (include/linux/vgpu.h)
> > ----------------------------------------------------------------------------------
> >
> > extern int vgpu_register_device(struct pci_dev *dev,
> >                                 const struct gpu_device_ops *ops);
> >
> > extern void vgpu_unregister_device(struct pci_dev *dev);
> >
> > /**
> >  * struct gpu_device_ops - Structure to be registered for each physical GPU
> >  * to register the device to the vgpu module.
> >  *
> >  * @owner:                  The module owner.
> >  * @vgpu_supported_config:  Called to get information about supported vgpu
> >  *                          types.
> >  *                          @dev: pci device structure of physical GPU.
> >  *                          @config: should return string listing supported
> >  *                          config
> >  *                          Returns integer: success (0) or error (< 0)
> >  * @vgpu_create:            Called to allocate basic resources in graphics
> >  *                          driver for a particular vgpu.
> >  *                          @dev: physical pci device structure on which
> >  *                          vgpu should be created
> >  *                          @vm_uuid: uuid of the VM for which the vgpu is
> >  *                          intended
> >  *                          @instance: vgpu instance in that VM
> >  *                          @vgpu_id: the type of vgpu to be created
> >  *                          Returns integer: success (0) or error (< 0)
> >  * @vgpu_destroy:           Called to free resources in graphics driver for
> >  *                          a vgpu instance of that VM.
> >  *                          @dev: physical pci device structure to which
> >  *                          this vgpu points.
> >  *                          @vm_uuid: uuid of the VM to which the vgpu
> >  *                          belongs
> >  *                          @instance: vgpu instance in that VM
> >  *                          Returns integer: success (0) or error (< 0)
> >  *                          If the VM is running and vgpu_destroy is
> >  *                          called, the vGPU is being hot-unplugged.
> >  *                          Return an error if the VM is running and the
> >  *                          graphics driver doesn't support vgpu hotplug.
> >  * @vgpu_start:             Called to initiate the vGPU initialization
> >  *                          process in the graphics driver when the VM
> >  *                          boots, before qemu starts.
> >  *                          @vm_uuid: UUID of the VM which is booting.
> >  *                          Returns integer: success (0) or error (< 0)
> >  * @vgpu_shutdown:          Called to tear down vGPU related resources for
> >  *                          the VM.
> >  *                          @vm_uuid: UUID of the VM which is shutting
> >  *                          down.
> >  *                          Returns integer: success (0) or error (< 0)
> >  * @read:                   Read emulation callback.
> >  *                          @vdev: vgpu device structure
> >  *                          @buf: read buffer
> >  *                          @count: number of bytes to read
> >  *                          @address_space: specifies for which address
> >  *                          space the request is: pci_config_space, IO
> >  *                          register space or MMIO space.
> >  *                          Returns number of bytes read on success, or
> >  *                          error.
> >  * @write:                  Write emulation callback.
> >  *                          @vdev: vgpu device structure
> >  *                          @buf: write buffer
> >  *                          @count: number of bytes to be written
> >  *                          @address_space: specifies for which address
> >  *                          space the request is: pci_config_space, IO
> >  *                          register space or MMIO space.
> >  *                          Returns number of bytes written on success, or
> >  *                          error.
> >  * @vgpu_set_irqs:          Called to send the interrupt configuration
> >  *                          information that qemu set.
> >  *                          @vdev: vgpu device structure
> >  *                          @flags, index, start, count and *data: same as
> >  *                          in struct vfio_irq_set of the
> >  *                          VFIO_DEVICE_SET_IRQS API.
> >  *
> >  * A physical GPU that supports vGPU should be registered with the vgpu
> >  * module with a gpu_device_ops structure.
> >  */
> >
> > struct gpu_device_ops {
> >         struct module   *owner;
> >         int     (*vgpu_supported_config)(struct pci_dev *dev, char *config);
> >         int     (*vgpu_create)(struct pci_dev *dev, uuid_le vm_uuid,
> >                                uint32_t instance, uint32_t vgpu_id);
> >         int     (*vgpu_destroy)(struct pci_dev *dev, uuid_le vm_uuid,
> >                                 uint32_t instance);
> >         int     (*vgpu_start)(uuid_le vm_uuid);
> >         int     (*vgpu_shutdown)(uuid_le vm_uuid);
> >         ssize_t (*read)(struct vgpu_device *vdev, char *buf, size_t count,
> >                         uint32_t address_space, loff_t pos);
> >         ssize_t (*write)(struct vgpu_device *vdev, char *buf, size_t count,
> >                          uint32_t address_space, loff_t pos);
> >         int     (*vgpu_set_irqs)(struct vgpu_device *vdev, uint32_t flags,
> >                                  unsigned index, unsigned start, unsigned count,
> >                                  void *data);
> > };
>
>
> I wonder if it shouldn't be vfio-vgpu sub-drivers (ie, Intel and Nvidia)
> that register these ops with the main vfio-vgpu driver and they should
> also include a probe() function which allows us to associate a given
> vgpu device with a set of vendor ops.
>
>
> >
> > 2.2 Details for callbacks we haven't mentioned above.
> > ---------------------------------------------------------------------------------
> >
> > vgpu_supported_config: allows the vendor driver to specify the supported
> >                        vGPU type/configuration
> >
> > vgpu_create          : create a virtual GPU device, can be used for device
> >                        hotplug.
> >
> > vgpu_destroy         : destroy a virtual GPU device, can be used for
> >                        device hotplug.
> >
> > vgpu_start           : callback function to notify the vendor driver that
> >                        a vgpu device has come to life for a given virtual
> >                        machine.
> >
> > vgpu_shutdown        : callback function to notify the vendor driver
> >
> > read                 : callback to vendor driver to handle virtual device
> >                        config space or MMIO read access
> >
> > write                : callback to vendor driver to handle virtual device
> >                        config space or MMIO write access
> >
> > vgpu_set_irqs        : callback to vendor driver to pass along the
> >                        interrupt information for the target virtual
> >                        device, so the vendor driver can inject interrupts
> >                        into the virtual machine for this device.
> >
> > 2.3 Potential additional virtual device configuration registration interface:
> > ---------------------------------------------------------------------------------
> >
> > callback function to describe the MMAP behavior of the virtual GPU
> >
> > callback function to allow the GPU vendor driver to provide PCI config
> > space backing memory.
> >
> > 3. VGPU TYPE1 IOMMU
> > ==========================================
> >
> > Here we are providing a TYPE1 IOMMU for vGPU which will basically keep
> > track of the mappings and save the QEMU mm for later reference.
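[Editorial aside: the tracking described above -- remember which iova ranges the user mapped so they can be translated later and unmapped with a byte count returned -- can be sketched in userspace C. The real vfio type1 driver uses an RB tree; a linked list keeps the illustration short, and all names here are hypothetical.]

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy model of type1-IOMMU-style DMA tracking: record iova -> vaddr
 * ranges on map, look them up for translation/pinning, and return the
 * unmapped byte count on unmap. No real page pinning happens here. */

struct vgpu_dma {
    uint64_t iova;          /* guest/IO virtual address */
    uint64_t vaddr;         /* user (QEMU) virtual address backing it */
    uint64_t size;
    struct vgpu_dma *next;
};

static struct vgpu_dma *dma_list;

/* IOMMU_MAP_DMA analog: just record the mapping. */
void vgpu_dma_link(uint64_t iova, uint64_t vaddr, uint64_t size)
{
    struct vgpu_dma *d = malloc(sizeof(*d));
    d->iova  = iova;
    d->vaddr = vaddr;
    d->size  = size;
    d->next  = dma_list;
    dma_list = d;
}

/* Translate an iova to the user vaddr backing it, if tracked. */
int vgpu_dma_find(uint64_t iova, uint64_t *vaddr)
{
    for (struct vgpu_dma *d = dma_list; d; d = d->next) {
        if (iova >= d->iova && iova < d->iova + d->size) {
            *vaddr = d->vaddr + (iova - d->iova);
            return 0;
        }
    }
    return -1;
}

/* IOMMU_UNMAP_DMA analog: remove the range and return the unmapped
 * bytes, since the caller (QEMU) depends on that count. */
uint64_t vgpu_dma_unlink(uint64_t iova)
{
    for (struct vgpu_dma **p = &dma_list; *p; p = &(*p)->next) {
        if ((*p)->iova == iova) {
            struct vgpu_dma *d = *p;
            uint64_t size = d->size;
            *p = d->next;
            free(d);
            return size;
        }
    }
    return 0;   /* nothing unmapped */
}
```

Keeping this logic in common code rather than in each vendor driver is exactly the maintenance argument made below.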
> >
> > You can find the quick/ugly implementation in the attached patch file,
> > which is actually just a simple version of Alex's type1 IOMMU without the
> > actual real mapping when IOMMU_MAP_DMA / IOMMU_UNMAP_DMA is called.
> >
> > We have thought about providing another vendor driver registration
> > interface so such tracking information would be sent to the vendor
> > driver, which would use the QEMU mm to do the get_user_pages /
> > remap_pfn_range when required. After doing a quick implementation within
> > our driver, I noticed the following issues:
> >
> > 1) It puts OS/VFIO logic into the vendor driver, which will be a
> > maintenance issue.
> >
> > 2) Every driver vendor has to implement their own RB tree, instead of
> > reusing the common existing VFIO code (vfio_find/link/unlink_dma)
> >
> > 3) IOMMU_UNMAP_DMA is expected to return "unmapped bytes" back to the
> > caller/QEMU; better not to have anything inside a vendor driver that the
> > VFIO caller immediately depends on.
> >
> > Based on the above considerations, we decided to implement the DMA
> > tracking logic within the VGPU TYPE1 IOMMU code (ideally, this should be
> > merged into the current TYPE1 IOMMU code) and expose two symbols to the
> > outside for MMIO mapping and page translation and pinning.
> >
> > Also, with an mmap MMIO interface between virtual and physical, this
> > allows a para-virtualized guest driver to access its virtual MMIO without
> > taking an MMAP fault hit, and we can also support different MMIO sizes
> > between the virtual and physical device.
> >
> > int vgpu_map_virtual_bar
> > (
> >     uint64_t virt_bar_addr,
> >     uint64_t phys_bar_addr,
> >     uint32_t len,
> >     uint32_t flags
> > )
> >
> > EXPORT_SYMBOL(vgpu_map_virtual_bar);
>
>
> Per the implementation provided, this needs to be implemented in the
> vfio device driver, not in the iommu interface.  Finding the DMA mapping
> of the device and replacing it is wrong.  It should be remapped at the
> vfio device file interface using vm_ops.
>

So you are basically suggesting that we take an mmap fault, and within that
fault handler, go into the vendor driver to look up the "pre-registered"
mapping and remap there. Is my understanding correct?

>
> > int vgpu_dma_do_translate(dma_addr_t *gfn_buffer, uint32_t count)
> >
> > EXPORT_SYMBOL(vgpu_dma_do_translate);
> >
> > There is still a lot to be added and modified, such as supporting
> > multiple VMs and multiple virtual devices, tracking the mapped / pinned
> > regions within the VGPU IOMMU kernel driver, error handling, roll-back
> > and locked memory size per user, etc.
>
> Particularly, handling of mapping changes is completely missing.  This
> cannot be a point in time translation, the user is free to remap
> addresses whenever they wish and device translations need to be updated
> accordingly.
>

When you say "user", do you mean QEMU? Here, whatever DMA the guest driver
launches will first be pinned within the VM, and then registered to QEMU,
therefore the IOMMU memory listener; eventually the pages will be pinned by
the GPU or DMA engine.

Since we are keeping the upper level code the same, and thinking about the
passthrough case, where the GPU has already put the real IOVA into its PTEs,
I don't know how QEMU can change that mapping without causing an IOMMU fault
on an active DMA device.

>
> > 4. Modules
> > ==========================================
> >
> > Two new modules are introduced: vfio_iommu_type1_vgpu.ko and vgpu.ko
> >
> > vfio_iommu_type1_vgpu.ko - IOMMU TYPE1 driver supporting the IOMMU
> >                            TYPE1 v1 and v2 interface.
>
> Depending on how intrusive it is, this can possibly be done within the
> existing type1 driver.  Either that or we can split out common code for
> use by a separate module.
>
> > vgpu.ko                  - provides the registration interface and
> >                            virtual device VFIO access.
> >
> > 5. QEMU note
> > ==========================================
> >
> > To allow us to focus on the VGPU kernel driver prototyping, we have
> > introduced a new VFIO class - vgpu - inside QEMU, so we don't have to
> > change the existing vfio/pci.c file, and we use it as a reference for our
> > implementation. It is basically just a quick c & p from vfio/pci.c to
> > quickly meet our needs.
> >
> > Once this proposal is finalized, we will move to vfio/pci.c instead of a
> > new class, and probably the only thing required is a new way to discover
> > the device.
> >
> > 6. Examples
> > ==========================================
> >
> > On this server, we have two NVIDIA M60 GPUs.
> >
> > [root@cjia-vgx-kvm ~]# lspci -d 10de:13f2
> > 86:00.0 VGA compatible controller: NVIDIA Corporation Device 13f2 (rev a1)
> > 87:00.0 VGA compatible controller: NVIDIA Corporation Device 13f2 (rev a1)
> >
> > After nvidia.ko gets initialized, we can query the supported vGPU types
> > by reading "vgpu_supported_types" as follows:
> >
> > [root@cjia-vgx-kvm ~]# cat /sys/bus/pci/devices/0000\:86\:00.0/vgpu_supported_types
> > 11:GRID M60-0B
> > 12:GRID M60-0Q
> > 13:GRID M60-1B
> > 14:GRID M60-1Q
> > 15:GRID M60-2B
> > 16:GRID M60-2Q
> > 17:GRID M60-4Q
> > 18:GRID M60-8Q
> >
> > For example, the VM_UUID is c0b26072-dd1b-4340-84fe-bf338c510818, and we
> > would like to create a "GRID M60-4Q" VM on it.
> >
> > echo "c0b26072-dd1b-4340-84fe-bf338c510818:0:17" > /sys/bus/pci/devices/0000\:86\:00.0/vgpu_create
> >
> > Note: the number 0 here is the vGPU device index. So far the change is
> > not tested for multiple vgpu devices yet, but we will support that.
> >
> > At this moment, if you query "vgpu_supported_types" it will still show
> > all supported virtual GPU types, as no virtual GPU resource has been
> > committed yet.
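[Editorial aside: management software consuming the "VGPU_ID:name" lines shown above would need a small parser. A hedged sketch, assuming only the "11:GRID M60-0B"-style format visible in the example output:]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one "vgpu_supported_types" entry of the form "VGPU_ID:name",
 * e.g. "17:GRID M60-4Q". A real consumer (libvirt, etc.) would read
 * these lines from the per-device sysfs file; here we parse a string. */
int parse_vgpu_type(const char *line, unsigned *vgpu_id, char *name, size_t len)
{
    const char *colon = strchr(line, ':');

    if (!colon || sscanf(line, "%u", vgpu_id) != 1)
        return -1;                      /* malformed entry */

    snprintf(name, len, "%s", colon + 1);
    name[strcspn(name, "\n")] = '\0';   /* strip a trailing newline */
    return 0;
}
```

Note that since the listing shrinks as resources are committed (see the vgpu_start example below in the thread), a consumer has to re-read the file rather than cache it.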
> >
> > Starting the VM:
> >
> > echo "c0b26072-dd1b-4340-84fe-bf338c510818" > /sys/class/vgpu/vgpu_start
> >
> > Then the supported vGPU type query will return:
> >
> > [root@cjia-vgx-kvm /home/cjia]$
> > > cat /sys/bus/pci/devices/0000\:86\:00.0/vgpu_supported_types
> > 17:GRID M60-4Q
> >
> > So vgpu_supported_config needs to be called whenever a new virtual
> > device gets created, as the underlying HW might limit the supported
> > types if there are any existing VMs running.
> >
> > Then, when the VM gets shut down, a write to /sys/class/vgpu/vgpu_shutdown
> > will inform the GPU vendor driver to clean up resources.
> >
> > Eventually, those virtual GPUs can be removed by writing to vgpu_destroy
> > under the device sysfs.
>
>
> I'd like to hear Intel's thoughts on this interface.  Are there
> different vgpu capacities or priority classes that would necessitate
> different types of vgpus on Intel?
>
> I think there are some gaps in translating from named vgpu types to
> indexes here, along with my previous mention of the UUID/set oddity.
>
> Does Intel have a need for start and shutdown interfaces?
>
> Neo, wasn't there at some point information about how many of each type
> could be supported through these interfaces?  How does a user know their
> capacity limits?
>

Thanks for reminding me; I think we forgot to include that *important*
information in the output of "vgpu_supported_types".

Regarding capacity, we can provide the frame buffer size as part of the
"vgpu_supported_types" output as well; I would imagine those will
eventually show up in the OpenStack management interface or virt-manager.

Basically, yes, there would be a separate column showing the number of
instances you can create for each type of vGPU on a specific physical GPU.

Thanks,
Neo

> Thanks,
> Alex
>