From mboxrd@z Thu Jan 1 00:00:00 1970
From: Neo Jia
Subject: Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide
 common interface for vGPU.
Date: Tue, 16 Feb 2016 21:37:42 -0800
Message-ID: <20160217053742.GA8839@nvidia.com>
References: <20160216071304.GA6867@nvidia.com>
 <20160216073647.GB6867@nvidia.com> <20160216075310.GC6867@nvidia.com>
 <20160216084855.GA7717@nvidia.com> <20160217041743.GA7903@nvidia.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: "Ruan, Shuai" , "Song, Jike" , "kvm@vger.kernel.org" ,
 Kirti Wankhede , qemu-devel , Alex Williamson , Gerd Hoffmann ,
 Paolo Bonzini , "Lv, Zhiyuan"
To: "Tian, Kevin"
Return-path:
Received: from hqemgate14.nvidia.com ([216.228.121.143]:2586 "EHLO
 hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with
 ESMTP id S1751582AbcBQFhs convert rfc822-to-8bit (ORCPT );
 Wed, 17 Feb 2016 00:37:48 -0500
Content-Disposition: inline
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On Wed, Feb 17, 2016 at 05:04:31AM +0000, Tian, Kevin wrote:
> > From: Neo Jia
> > Sent: Wednesday, February 17, 2016 12:18 PM
> >
> > On Wed, Feb 17, 2016 at 03:31:24AM +0000, Tian, Kevin wrote:
> > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > Sent: Tuesday, February 16, 2016 4:49 PM
> > > >
> > > > On Tue, Feb 16, 2016 at 08:10:42AM +0000, Tian, Kevin wrote:
> > > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > > Sent: Tuesday, February 16, 2016 3:53 PM
> > > > > >
> > > > > > On Tue, Feb 16, 2016 at 07:40:47AM +0000, Tian, Kevin wrote:
> > > > > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > > > > Sent: Tuesday, February 16, 2016 3:37 PM
> > > > > > > >
> > > > > > > > On Tue, Feb 16, 2016 at 07:27:09AM +0000, Tian, Kevin wrote:
> > > > > > > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > > > > > > Sent: Tuesday, February 16, 2016 3:13 PM
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 16, 2016 at 06:49:30AM +0000, Tian, Kevin wrote:
> > > > > > > > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > > > > > > > > > Sent: Thursday, February 04, 2016 3:33 AM
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, 2016-02-03 at 09:28 +0100, Gerd Hoffmann wrote:
> > > > > > > > > > > > >   Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Actually I have a long puzzle in this area. Definitely
> > > > > > > > > > > > > > libvirt will use UUID to mark a VM. And obviously UUID
> > > > > > > > > > > > > > is not recorded within KVM. Then how does libvirt talk
> > > > > > > > > > > > > > to KVM based on UUID? It could be a good reference to
> > > > > > > > > > > > > > this design.
> > > > > > > > > > > > >
> > > > > > > > > > > > > libvirt keeps track which qemu instance belongs to which
> > > > > > > > > > > > > vm.  qemu also gets started with "-uuid ...", so one can
> > > > > > > > > > > > > query qemu via the monitor ("info uuid") to figure out
> > > > > > > > > > > > > what the uuid is.  It is also in the smbios tables so the
> > > > > > > > > > > > > guest can see it in the system information table.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The uuid is not visible to the kernel though, the kvm
> > > > > > > > > > > > > kernel driver doesn't know what the uuid is (and neither
> > > > > > > > > > > > > does vfio).  qemu uses file handles to talk to both kvm
> > > > > > > > > > > > > and vfio.  qemu notifies both kvm and vfio about any
> > > > > > > > > > > > > relevant events (guest address space changes etc) and
> > > > > > > > > > > > > connects file descriptors (eventfd -> irqfd).
> > > > > > > > > > > >
> > > > > > > > > > > > I think the original link to using a VM UUID for the vGPU
> > > > > > > > > > > > comes from NVIDIA having a userspace component which might
> > > > > > > > > > > > get launched from a udev event as the vGPU is created or the
> > > > > > > > > > > > set of vGPUs within that UUID is started.  Using the VM UUID
> > > > > > > > > > > > then gives them a way to associate that userspace process
> > > > > > > > > > > > with a VM instance.  Maybe it could register with libvirt
> > > > > > > > > > > > for some sort of service provided for the VM, I don't know.
> > > > > > > > > > >
> > > > > > > > > > > Intel doesn't have this requirement. It should be enough as
> > > > > > > > > > > long as libvirt maintains which sysfs vgpu node is associated
> > > > > > > > > > > with a VM UUID.
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > qemu needs a sysfs node as handle to the vfio device,
> > > > > > > > > > > > > something like /sys/devices/virtual/vgpu/<name>.  <name>
> > > > > > > > > > > > > can be a uuid if you want to have it that way, but it
> > > > > > > > > > > > > could be pretty much anything.  The sysfs node will
> > > > > > > > > > > > > probably show up as-is in the libvirt xml when assigning a
> > > > > > > > > > > > > vgpu to a vm.  So the name should be something stable
> > > > > > > > > > > > > (i.e. when using a uuid as the name you had better not
> > > > > > > > > > > > > generate a new one on each boot).
> > > > > > > > > > > >
> > > > > > > > > > > > Actually I don't think there's really a persistent naming
> > > > > > > > > > > > issue, that's probably where we diverge from the SR-IOV
> > > > > > > > > > > > model.  SR-IOV cannot dynamically add a new VF, it needs to
> > > > > > > > > > > > reset the number of VFs to zero, then re-allocate all of
> > > > > > > > > > > > them up to the new desired count.  That has some obvious
> > > > > > > > > > > > implications.  I think with both vendors here, we can
> > > > > > > > > > > > dynamically allocate new vGPUs, so I would expect that
> > > > > > > > > > > > libvirt would create each vGPU instance as it's needed.
> > > > > > > > > > > > None would be created by default without user interaction.
> > > > > > > > > > > >
> > > > > > > > > > > > Personally I think using a UUID makes sense, but it needs to
> > > > > > > > > > > > be userspace policy whether that UUID has any implicit
> > > > > > > > > > > > meaning like matching the VM UUID.  Having an index within a
> > > > > > > > > > > > UUID bothers me a bit, but it doesn't seem like too much of
> > > > > > > > > > > > a concession to enable the use case that NVIDIA is trying
> > > > > > > > > > > > to achieve.  Thanks,
> > > > > > > > > > >
> > > > > > > > > > > I would prefer making UUID an optional parameter, while not
> > > > > > > > > > > tying sysfs vgpu naming to UUID. This would be more flexible
> > > > > > > > > > > for different scenarios where UUID might not be required.
> > > > > > > > > >
> > > > > > > > > > Hi Kevin,
> > > > > > > > > >
> > > > > > > > > > Happy Chinese New Year!
> > > > > > > > > >
> > > > > > > > > > I think having UUID as the vgpu device name will allow us to
> > > > > > > > > > have a gpu vendor agnostic solution for the upper layer
> > > > > > > > > > software stack such as QEMU, which is supposed to open the
> > > > > > > > > > device.
> > > > > > > > >
> > > > > > > > > Qemu can use whatever sysfs path is provided to open the device,
> > > > > > > > > regardless of whether there is a UUID within the path...
> > > > > > > >
> > > > > > > > Hi Kevin,
> > > > > > > >
> > > > > > > > Then it will provide even more benefit to use UUID, as libvirt can
> > > > > > > > be implemented gpu vendor agnostic, right? :-)
> > > > > > > >
> > > > > > > > The UUID can be the VM UUID or a vGPU group object UUID, which
> > > > > > > > really depends on the high level software stack; again, the
> > > > > > > > benefit is being gpu vendor agnostic.
> > > > > > >
> > > > > > > There are cases where libvirt is not used while another mgmt. stack
> > > > > > > doesn't use UUID, e.g. in some Xen scenarios. So it's not about
> > > > > > > being GPU vendor agnostic. It's about being high level mgmt. stack
> > > > > > > agnostic. That's why we need to make UUID optional in this vGPU-core
> > > > > > > framework.
> > > > > >
> > > > > > Hi Kevin,
> > > > > >
> > > > > > As long as you have to create an object to represent a vGPU or vGPU
> > > > > > group, you will have a UUID, no matter which management stack you are
> > > > > > going to use.
> > > > > >
> > > > > > UUID is the most agnostic way to represent an object, I think.
> > > > > >
> > > > > > (a bit off topic since we are supposed to focus on VFIO on KVM)
> > > > > >
> > > > > > Since now you are talking about Xen, I am very happy to discuss that
> > > > > > with you. You can check how Xen has managed its objects via UUID in
> > > > > > xapi.
> > > > >
> > > > > Well, I'm not the expert in this area. IMHO UUID is just a user level
> > > > > attribute, which can be associated with any sysfs node and managed by
> > > > > the mgmt. stack itself, and then the sysfs path can be opened as the
> > > > > bridge between user/kernel. I don't understand the necessity of
> > > > > binding UUID internally within the vGPU core framework here. Alex gave
> > > > > one example of udev, but I didn't quite catch why only UUID can work
> > > > > there. Maybe you can elaborate on that requirement.
> > > >
> > > > Hi Kevin,
> > > >
> > > > UUID is just a way to represent an object.
> > > >
> > > > It is not binding, it is just a representation. I think here we are
> > > > just creating a convenient and generic way to represent a virtual gpu
> > > > device in sysfs.
> > > >
> > > > Having the UUID as part of the virtual gpu device name allows us to
> > > > easily find out the mapping.
> > > >
> > > > UUID can be anything; you can always use a UUID to represent the VMID
> > > > in the example you listed below, so you are actually gaining
> > > > flexibility by using UUID instead of VMID, as it can be supported by
> > > > both KVM and Xen. :-)
> > > >
> > > > Thanks,
> > > > Neo
> > > >
> > >
> > > Thanks Neo. I understand UUID has its merits in many usages. As you may
> > > see from my earlier reply, my main concern is whether it's a must to
> > > record this information within the kernel vGPU-core framework. We can
> > > still make it hypervisor agnostic even without using UUID, as long as
> > > there's a unified namespace created for all vgpus, like:
> > >
> > >     vgpu-vendor-0, vgpu-vendor-1, ...
> > >
> > > Then the high level mgmt. stack can associate a UUID with that
> > > namespace. So I hope you can help elaborate on the below description:
> > >
> > > > Having the UUID as part of the virtual gpu device name allows us to
> > > > easily find out the mapping.
> > >
> > Hi Kevin,
> >
> > The answer is simple: having a UUID as part of the device name will give
> > you a unique sysfs path that will be opened by QEMU.
> >
> > vgpu-vendor-0 and vgpu-vendor-1 will not be unique, as we can have
> > multiple virtual gpu devices per VM coming from the same or different
> > physical devices.
>
> That is not a problem. We can add physical device info too, like
> vgpu-vendor-0-0, vgpu-vendor-1-0, ...
>
> Please note Qemu doesn't care about the actual name. It just accepts a
> sysfs path to open.

Hi Kevin,

No, I think you are making things even more complicated than is required,
and the design is not generic anymore, as you are requiring QEMU to know
more than it needs to.

The way you name those devices will require QEMU to know the relation
between virtual devices and physical devices. I don't think that is good.

My flow is like this:

libvirt creates a VM object, which will have a UUID. It then uses the UUID
to create the virtual gpu devices and passes the UUID to QEMU (actually
QEMU already has the VM UUID), which will then just open up the unique
path.

Also, you need to consider that those 0-0 numbers are not as generic as
the UUID.

> > If you are worried about losing a meaningful name here, we can create a
> > sysfs file to capture the vendor device description if you like.
>
> Having the vgpu name descriptive is more informative imo. Users can
> simply check sysfs names to know raw information w/o relying on a 3rd
> party agent to query information around an opaque UUID.

You are actually arguing against your own design here, unfortunately. If
you look at your design carefully, it actually requires a 3rd party agent
to figure out the VM and virtual gpu device relation, as that relation is
never documented in sysfs.

In our current design, no 3rd party agent is required, as the VM UUID is
part of the QEMU command line already, and the VM UUID is already embedded
within the virtual device path.

Also, it doesn't require a 3rd party to retrieve information, as the
virtual device will just be a directory; we will have another file within
each virtual gpu device directory, and you can always cat that file to
retrieve the vendor information.
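To make that concrete, here is a minimal sketch of the layout being proposed.
Everything in it is illustrative: the directory names, the vendor_description
file, and the vendor string are hypothetical, and the whole thing is mocked
under a temp directory rather than the real sysfs, since the actual vgpu
sysfs ABI is still under discussion:

```shell
# Mock of the proposed layout (illustrative only): one directory per
# virtual gpu device, named <VM-UUID>-<vgpu_idx>, each holding a vendor
# description file a human can simply cat.
SYSFS_MOCK=$(mktemp -d)                          # stands in for /sys/devices/virtual/vgpu
VM_UUID="a3f1c2d4-5b6e-4f00-8000-123456789abc"   # would come from libvirt / qemu -uuid

# Management stack creates two vgpus for this VM
for idx in 0 1; do
    mkdir -p "$SYSFS_MOCK/$VM_UUID-$idx"
    echo "vendor: example-gpu profile-1q" > "$SYSFS_MOCK/$VM_UUID-$idx/vendor_description"
done

# QEMU only needs the VM UUID (already on its command line) to find the paths:
ls -d "$SYSFS_MOCK/$VM_UUID"-*

# And the vendor info is recoverable without any 3rd party agent:
cat "$SYSFS_MOCK/$VM_UUID-0/vendor_description"
```

The point of the sketch is that the VM-to-device mapping is encoded in the
path itself, so nothing beyond the UUID has to be communicated to QEMU.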
Let's use UUID-$vgpu_idx as the virtual device directory name, plus a
vendor description file within that directory, so we don't lose any
information and we also capture the VM and virtual device relation.

Thanks,
Neo

> That's why I prefer making UUID optional. By default the vgpu name would
> be some description string, and when a UUID is provided, the UUID can be
> appended to the string to serve your purpose.
>
> Thanks
> Kevin
>