From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51803) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aVuoC-00059a-Oy for qemu-devel@nongnu.org; Wed, 17 Feb 2016 00:37:54 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aVuo9-0004X3-Do for qemu-devel@nongnu.org; Wed, 17 Feb 2016 00:37:52 -0500
Received: from hqemgate14.nvidia.com ([216.228.121.143]:2588) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aVuo9-0004Wz-39 for qemu-devel@nongnu.org; Wed, 17 Feb 2016 00:37:49 -0500
Date: Tue, 16 Feb 2016 21:37:42 -0800
From: Neo Jia
Message-ID: <20160217053742.GA8839@nvidia.com>
References: <20160216071304.GA6867@nvidia.com> <20160216073647.GB6867@nvidia.com> <20160216075310.GC6867@nvidia.com> <20160216084855.GA7717@nvidia.com> <20160217041743.GA7903@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
In-Reply-To:
Subject: Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.
To: "Tian, Kevin"
Cc: "Ruan, Shuai", "Song, Jike", "kvm@vger.kernel.org", Kirti Wankhede, qemu-devel, Alex Williamson, Gerd Hoffmann, Paolo Bonzini, "Lv, Zhiyuan"

On Wed, Feb 17, 2016 at 05:04:31AM +0000, Tian, Kevin wrote:
> > From: Neo Jia
> > Sent: Wednesday, February 17, 2016 12:18 PM
> >
> > On Wed, Feb 17, 2016 at 03:31:24AM +0000, Tian, Kevin wrote:
> > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > Sent: Tuesday, February 16, 2016 4:49 PM
> > > >
> > > > On Tue, Feb 16, 2016 at 08:10:42AM +0000, Tian, Kevin wrote:
> > > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > > Sent: Tuesday, February 16, 2016 3:53 PM
> > > > > >
> > > > > > On Tue, Feb 16, 2016 at 07:40:47AM +0000, Tian, Kevin wrote:
> > > > > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > > > > Sent: Tuesday, February 16, 2016 3:37 PM
> > > > > > > >
> > > > > > > > On Tue, Feb 16, 2016 at 07:27:09AM +0000, Tian, Kevin wrote:
> > > > > > > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > > > > > > Sent: Tuesday, February 16, 2016 3:13 PM
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 16, 2016 at 06:49:30AM +0000, Tian, Kevin wrote:
> > > > > > > > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > > > > > > > > > Sent: Thursday, February 04, 2016 3:33 AM
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, 2016-02-03 at 09:28 +0100, Gerd Hoffmann wrote:
> > > > > > > > > > > > >   Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Actually I have a long puzzle in this area. Definitely libvirt will use UUID to
> > > > > > > > > > > > > > mark a VM. And obviously UUID is not recorded within KVM. Then how does
> > > > > > > > > > > > > > libvirt talk to KVM based on UUID? It could be a good reference to this design.
> > > > > > > > > > > > >
> > > > > > > > > > > > > libvirt keeps track which qemu instance belongs to which vm.
> > > > > > > > > > > > > qemu also gets started with "-uuid ...", so one can query qemu via
> > > > > > > > > > > > > monitor ("info uuid") to figure what the uuid is.  It is also in the
> > > > > > > > > > > > > smbios tables so the guest can see it in the system information table.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The uuid is not visible to the kernel though, the kvm kernel driver
> > > > > > > > > > > > > doesn't know what the uuid is (and neither does vfio).  qemu uses file
> > > > > > > > > > > > > handles to talk to both kvm and vfio.  qemu notifies both kvm and vfio
> > > > > > > > > > > > > about anything relevant events (guest address space changes etc) and
> > > > > > > > > > > > > connects file descriptors (eventfd -> irqfd).
> > > > > > > > > > > >
> > > > > > > > > > > > I think the original link to using a VM UUID for the vGPU comes from
> > > > > > > > > > > > NVIDIA having a userspace component which might get launched from a udev
> > > > > > > > > > > > event as the vGPU is created or the set of vGPUs within that UUID is
> > > > > > > > > > > > started.  Using the VM UUID then gives them a way to associate that
> > > > > > > > > > > > userspace process with a VM instance.  Maybe it could register with
> > > > > > > > > > > > libvirt for some sort of service provided for the VM, I don't know.
> > > > > > > > > > >
> > > > > > > > > > > Intel doesn't have this requirement. It should be enough as long as
> > > > > > > > > > > libvirt maintains which sysfs vgpu node is associated to a VM UUID.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > qemu needs a sysfs node as handle to the vfio device, something
> > > > > > > > > > > > > like /sys/devices/virtual/vgpu/.  can be a uuid if you want
> > > > > > > > > > > > > have it that way, but it could be pretty much anything.  The sysfs node
> > > > > > > > > > > > > will probably show up as-is in the libvirt xml when assign a vgpu to a
> > > > > > > > > > > > > vm.  So the name should be something stable (i.e. when using a uuid as
> > > > > > > > > > > > > name you should better not generate a new one on each boot).
> > > > > > > > > > > >
> > > > > > > > > > > > Actually I don't think there's really a persistent naming issue, that's
> > > > > > > > > > > > probably where we diverge from the SR-IOV model.  SR-IOV cannot
> > > > > > > > > > > > dynamically add a new VF, it needs to reset the number of VFs to zero,
> > > > > > > > > > > > then re-allocate all of them up to the new desired count.  That has some
> > > > > > > > > > > > obvious implications.  I think with both vendors here, we can
> > > > > > > > > > > > dynamically allocate new vGPUs, so I would expect that libvirt would
> > > > > > > > > > > > create each vGPU instance as it's needed.  None would be created by
> > > > > > > > > > > > default without user interaction.
> > > > > > > > > > > >
> > > > > > > > > > > > Personally I think using a UUID makes sense, but it needs to be
> > > > > > > > > > > > userspace policy whether that UUID has any implicit meaning like
> > > > > > > > > > > > matching the VM UUID.  Having an index within a UUID bothers me a bit,
> > > > > > > > > > > > but it doesn't seem like too much of a concession to enable the use case
> > > > > > > > > > > > that NVIDIA is trying to achieve.  Thanks,
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I would prefer to making UUID an optional parameter, while not tieing
> > > > > > > > > > > sysfs vgpu naming to UUID. This would be more flexible to different
> > > > > > > > > > > scenarios where UUID might not be required.
> > > > > > > > > >
> > > > > > > > > > Hi Kevin,
> > > > > > > > > >
> > > > > > > > > > Happy Chinese New Year!
> > > > > > > > > >
> > > > > > > > > > I think having UUID as the vgpu device name will allow us to have an gpu vendor
> > > > > > > > > > agnostic solution for the upper layer software stack such as QEMU, who is
> > > > > > > > > > supposed to open the device.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Qemu can use whatever sysfs path provided to open the device, regardless
> > > > > > > > > of whether there is an UUID within the path...
> > > > > > > > >
> > > > > > > >
> > > > > > > > Hi Kevin,
> > > > > > > >
> > > > > > > > Then it will provide even more benefit of using UUID as libvirt can be
> > > > > > > > implemented as gpu vendor agnostic, right? :-)
> > > > > > > >
> > > > > > > > The UUID can be VM UUID or vGPU group object UUID which really depends on the
> > > > > > > > high level software stack, again the benefit is gpu vendor agnostic.
> > > > > > > >
> > > > > > >
> > > > > > > There is case where libvirt is not used while another mgmt. stack doesn't use
> > > > > > > UUID, e.g. in some Xen scenarios.
> > > > > > > So it's not about GPU vendor agnostic. It's
> > > > > > > about high level mgmt. stack agnostic. That's why we need make UUID as
> > > > > > > optional in this vGPU-core framework.
> > > > > >
> > > > > > Hi Kevin,
> > > > > >
> > > > > > As long as you have to create an object to represent vGPU or vGPU group, you
> > > > > > will have UUID, no matter which management stack you are going to use.
> > > > > >
> > > > > > UUID is the most agnostic way to represent an object, I think.
> > > > > >
> > > > > > (a bit off topic since we are supposed to focus on VFIO on KVM)
> > > > > >
> > > > > > Since now you are talking about Xen, I am very happy to discuss that with you.
> > > > > > You can check how Xen has managed its object via UUID in xapi.
> > > > > >
> > > > >
> > > > > Well, I'm not the expert in this area. IMHO UUID is just an user level
> > > > > attribute, which can be associated to any sysfs node and managed by
> > > > > mgmt. stack itself, and then the sysfs path can be opened as the
> > > > > bridge between user/kernel. I don't understand the necessity of binding
> > > > > UUID internally within vGPU core framework here. Alex gave one example
> > > > > of udev, but I didn't quite catch why only UUID can work there. Maybe
> > > > > you can elaborate that requirement.
> > > >
> > > > Hi Kevin,
> > > >
> > > > UUID is just a way to represent an object.
> > > >
> > > > It is not binding, it is just a representation. I think here we are just
> > > > creating a convenient and generic way to represent a virtual gpu device on
> > > > sysfs.
> > > >
> > > > Having the UUID as part of the virtual gpu device name allows us easily find out
> > > > the mapping.
> > > >
> > > > UUID can be anything, you can always use an UUID to present VMID in the example
> > > > you listed below, so you are actually gaining flexibility by using UUID instead
> > > > of VMID as it can be supported by both KVM and Xen.
> > > > :-)
> > > >
> > > > Thanks,
> > > > Neo
> > > >
> > >
> > > Thanks Neo. I understand UUID has its merit in many usages. As you
> > > may see from my earlier reply, my main concern is whether it's a must
> > > to record this information within kernel vGPU-core framework. We can
> > > still make it hypervisor agnostic even not using UUID, as long as there's
> > > a unified namespace created for all vgpus, like:
> > > vgpu-vendor-0, vgpu-vendor-1, ...
> > >
> > > Then high level mgmt. stack can associate UUID to that namespace. So
> > > I hope you can help elaborate below description:
> > >
> >
> > > > Having the UUID as part of the virtual gpu device name allows us easily find out
> > > > the mapping.
> > >
> >
> > Hi Kevin,
> >
> > The answer is simple, having a UUID as part of the device name will give you a
> > unique sysfs path that will be opened by QEMU.
> >
> > vgpu-vendor-0 and vgpu-vendor-1 will not be unique as we can have multiple
> > virtual gpu devices per VM coming from same or different physical devices.
>
> That is not a problem. We can add physical device info too like vgpu-vendor-0-0,
> vgpu-vendor-1-0, ...
>
> Please note Qemu doesn't care about the actual name. It just accepts a sysfs path
> to open.

Hi Kevin,

No, I think you are making things even more complicated than required,
and it is not generic anymore as you are requiring QEMU to know more than
it needs to.

The way you name those devices will require QEMU to know the relation
between virtual devices and physical devices. I don't think that is good.

My flow is like this:

libvirt creates a VM object, which will have a UUID. Then it will use the
UUID to create virtual gpu devices, then it will pass the UUID to QEMU
(actually QEMU already has the VM UUID), and QEMU will just open up the
unique path.

Also, you need to consider that those 0-0 numbers are not as generic as
the UUID.

>
> >
> > If you are worried about losing meaningful name here, we can create a sysfs file
> > to capture the vendor device description if you like.
> >
>
> Having the vgpu name descriptive is more informative imo. User can simply check
> sysfs names to know raw information w/o relying on 3rd party agent to query
> information around an opaque UUID.
>

You are actually arguing against your own design here, unfortunately. If
you look at your design carefully, it is your design that actually requires
3rd party code to figure out the VM and virtual gpu device relation, as it
is never documented in the sysfs.

In our current design, no 3rd party agent is required, as the VM UUID is
part of the QEMU command already, and the VM UUID is already embedded
within the virtual device path.

Also, it doesn't require a 3rd party to retrieve information, as the
virtual device will just be a directory. We will have another file within
each virtual gpu device directory, and you can always cat that file to
retrieve vendor information.

Let's use UUID-$vgpu_idx as the virtual device directory name plus a
vendor description file within that directory, so we don't lose any
additional information and also capture the VM and virtual device relation.

Thanks,
Neo

> That's why I prefer to making UUID optional. By default vgpu name would be
> some description string, and when UUID is provided, UUID can be appended
> to the string to serve your purpose.
>
> Thanks
> Kevin
>
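
To make the naming proposal above a bit more concrete, here is a minimal
userspace sketch of the UUID-$vgpu_idx layout with a vendor description
file in each device directory. It is only an illustration under stated
assumptions, not the driver's interface: the helper names, the file name
"vendor_description", and the use of a temporary directory in place of the
real sysfs tree are all hypothetical (real sysfs nodes would be created by
the kernel driver, not by mkdir).

```python
import tempfile
import uuid
from pathlib import Path

# Hypothetical file name; the thread only says "a vendor description file".
VENDOR_FILE = "vendor_description"


def vgpu_dir_name(vm_uuid, vgpu_idx):
    """Build the UUID-$vgpu_idx directory name discussed in the thread."""
    return f"{vm_uuid}-{vgpu_idx}"


def create_vgpu(root, vm_uuid, vgpu_idx, vendor):
    """Stand-in for the management stack creating one virtual GPU node."""
    node = Path(root) / vgpu_dir_name(vm_uuid, vgpu_idx)
    node.mkdir(parents=True)
    (node / VENDOR_FILE).write_text(vendor + "\n")
    return node


def vgpus_of_vm(root, vm_uuid):
    """Find all vGPU nodes of a VM from its UUID alone, ordered by index."""
    nodes = Path(root).glob(f"{vm_uuid}-*")
    # The index is whatever follows the last hyphen of the directory name.
    return sorted(nodes, key=lambda p: int(p.name.rsplit("-", 1)[1]))


# Demo in a temporary directory standing in for the sysfs vgpu directory.
with tempfile.TemporaryDirectory() as root:
    vm = uuid.uuid4()  # the VM UUID that libvirt would also hand to QEMU
    create_vgpu(root, vm, 0, "vendor A")
    create_vgpu(root, vm, 1, "vendor B")
    for node in vgpus_of_vm(root, vm):
        # Equivalent of "cat <device dir>/vendor_description"
        print(node.name, "->", (node / VENDOR_FILE).read_text().strip())
```

The point of the sketch is that the VM-to-device relation is recoverable
from the directory names alone, while the descriptive vendor string is
still available by reading one file per device.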