From mboxrd@z Thu Jan 1 00:00:00 1970
From: Neo Jia
Subject: Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide
 common interface for vGPU.
Date: Tue, 16 Feb 2016 21:37:42 -0800
Message-ID: <20160217053742.GA8839@nvidia.com>
References: <20160216071304.GA6867@nvidia.com>
 <20160216073647.GB6867@nvidia.com> <20160216075310.GC6867@nvidia.com>
 <20160216084855.GA7717@nvidia.com> <20160217041743.GA7903@nvidia.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: "Ruan, Shuai" , "Song, Jike" , "kvm@vger.kernel.org" ,
 Kirti Wankhede , qemu-devel , Alex Williamson , Gerd Hoffmann ,
 Paolo Bonzini , "Lv, Zhiyuan"
To: "Tian, Kevin"
Return-path:
Received: from hqemgate14.nvidia.com ([216.228.121.143]:2586 "EHLO
 hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with
 ESMTP id S1751582AbcBQFhs convert rfc822-to-8bit (ORCPT );
 Wed, 17 Feb 2016 00:37:48 -0500
Content-Disposition: inline
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On Wed, Feb 17, 2016 at 05:04:31AM +0000, Tian, Kevin wrote:
> > From: Neo Jia
> > Sent: Wednesday, February 17, 2016 12:18 PM
> >
> > On Wed, Feb 17, 2016 at 03:31:24AM +0000, Tian, Kevin wrote:
> > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > Sent: Tuesday, February 16, 2016 4:49 PM
> > > >
> > > > On Tue, Feb 16, 2016 at 08:10:42AM +0000, Tian, Kevin wrote:
> > > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > > Sent: Tuesday, February 16, 2016 3:53 PM
> > > > > >
> > > > > > On Tue, Feb 16, 2016 at 07:40:47AM +0000, Tian, Kevin wrote:
> > > > > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > > > > Sent: Tuesday, February 16, 2016 3:37 PM
> > > > > > > >
> > > > > > > > On Tue, Feb 16, 2016 at 07:27:09AM +0000, Tian, Kevin wrote:
> > > > > > > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > > > > > > Sent: Tuesday, February 16, 2016 3:13 PM
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 16, 2016 at 06:49:30AM +0000, Tian, Kevin wrote:
> > > > > > > > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > > > > > > > > > Sent: Thursday, February 04, 2016 3:33 AM
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, 2016-02-03 at 09:28 +0100, Gerd Hoffmann wrote:
> > > > > > > > > > > > >   Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Actually I have a long puzzle in this area. Definitely
> > > > > > > > > > > > > > libvirt will use UUID to mark a VM. And obviously UUID
> > > > > > > > > > > > > > is not recorded within KVM. Then how does libvirt talk
> > > > > > > > > > > > > > to KVM based on UUID? It could be a good reference to
> > > > > > > > > > > > > > this design.
> > > > > > > > > > > > >
> > > > > > > > > > > > > libvirt keeps track which qemu instance belongs to which
> > > > > > > > > > > > > vm.  qemu also gets started with "-uuid ...", so one can
> > > > > > > > > > > > > query qemu via the monitor ("info uuid") to figure out
> > > > > > > > > > > > > what the uuid is.  It is also in the smbios tables so the
> > > > > > > > > > > > > guest can see it in the system information table.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The uuid is not visible to the kernel though, the kvm
> > > > > > > > > > > > > kernel driver doesn't know what the uuid is (and neither
> > > > > > > > > > > > > does vfio).  qemu uses file handles to talk to both kvm
> > > > > > > > > > > > > and vfio.  qemu notifies both kvm and vfio about any
> > > > > > > > > > > > > relevant events (guest address space changes etc) and
> > > > > > > > > > > > > connects file descriptors (eventfd -> irqfd).
> > > > > > > > > > > >
> > > > > > > > > > > > I think the original link to using a VM UUID for the vGPU
> > > > > > > > > > > > comes from NVIDIA having a userspace component which might
> > > > > > > > > > > > get launched from a udev event as the vGPU is created or the
> > > > > > > > > > > > set of vGPUs within that UUID is started.  Using the VM UUID
> > > > > > > > > > > > then gives them a way to associate that userspace process
> > > > > > > > > > > > with a VM instance.  Maybe it could register with libvirt
> > > > > > > > > > > > for some sort of service provided for the VM, I don't know.
> > > > > > > > > > >
> > > > > > > > > > > Intel doesn't have this requirement. It should be enough as
> > > > > > > > > > > long as libvirt maintains which sysfs vgpu node is associated
> > > > > > > > > > > with a VM UUID.
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > qemu needs a sysfs node as handle to the vfio device,
> > > > > > > > > > > > > something like /sys/devices/virtual/vgpu/<name>.  <name>
> > > > > > > > > > > > > can be a uuid if you want to have it that way, but it
> > > > > > > > > > > > > could be pretty much anything.  The sysfs node will
> > > > > > > > > > > > > probably show up as-is in the libvirt xml when assigning a
> > > > > > > > > > > > > vgpu to a vm.  So the name should be something stable
> > > > > > > > > > > > > (i.e. when using a uuid as the name you had better not
> > > > > > > > > > > > > generate a new one on each boot).
> > > > > > > > > > > >
> > > > > > > > > > > > Actually I don't think there's really a persistent naming
> > > > > > > > > > > > issue, that's probably where we diverge from the SR-IOV
> > > > > > > > > > > > model.  SR-IOV cannot dynamically add a new VF, it needs to
> > > > > > > > > > > > reset the number of VFs to zero, then re-allocate all of
> > > > > > > > > > > > them up to the new desired count.  That has some obvious
> > > > > > > > > > > > implications.  I think with both vendors here, we can
> > > > > > > > > > > > dynamically allocate new vGPUs, so I would expect that
> > > > > > > > > > > > libvirt would create each vGPU instance as it's needed.
> > > > > > > > > > > > None would be created by default without user interaction.
> > > > > > > > > > > >
> > > > > > > > > > > > Personally I think using a UUID makes sense, but it needs to
> > > > > > > > > > > > be userspace policy whether that UUID has any implicit
> > > > > > > > > > > > meaning like matching the VM UUID.  Having an index within a
> > > > > > > > > > > > UUID bothers me a bit, but it doesn't seem like too much of
> > > > > > > > > > > > a concession to enable the use case that NVIDIA is trying
> > > > > > > > > > > > to achieve.  Thanks,
> > > > > > > > > > >
> > > > > > > > > > > I would prefer making UUID an optional parameter, while not
> > > > > > > > > > > tying sysfs vgpu naming to UUID. This would be more flexible
> > > > > > > > > > > for different scenarios where UUID might not be required.
> > > > > > > > > >
> > > > > > > > > > Hi Kevin,
> > > > > > > > > >
> > > > > > > > > > Happy Chinese New Year!
> > > > > > > > > >
> > > > > > > > > > I think having UUID as the vgpu device name will allow us to
> > > > > > > > > > have a gpu vendor agnostic solution for the upper layer
> > > > > > > > > > software stack such as QEMU, which is supposed to open the
> > > > > > > > > > device.
> > > > > > > > >
> > > > > > > > > Qemu can use whatever sysfs path is provided to open the device,
> > > > > > > > > regardless of whether there is a UUID within the path...
> > > > > > > >
> > > > > > > > Hi Kevin,
> > > > > > > >
> > > > > > > > Then it will provide even more benefit to use UUID, as libvirt can
> > > > > > > > be implemented gpu vendor agnostic, right? :-)
> > > > > > > >
> > > > > > > > The UUID can be the VM UUID or a vGPU group object UUID, which
> > > > > > > > really depends on the high level software stack; again, the
> > > > > > > > benefit is being gpu vendor agnostic.
> > > > > > >
> > > > > > > There are cases where libvirt is not used while another mgmt. stack
> > > > > > > doesn't use UUID, e.g. in some Xen scenarios. So it's not about
> > > > > > > being GPU vendor agnostic. It's about being high level mgmt. stack
> > > > > > > agnostic. That's why we need to make UUID optional in this vGPU-core
> > > > > > > framework.
> > > > > >
> > > > > > Hi Kevin,
> > > > > >
> > > > > > As long as you have to create an object to represent a vGPU or vGPU
> > > > > > group, you will have a UUID, no matter which management stack you are
> > > > > > going to use.
> > > > > >
> > > > > > UUID is the most agnostic way to represent an object, I think.
> > > > > >
> > > > > > (a bit off topic since we are supposed to focus on VFIO on KVM)
> > > > > >
> > > > > > Since now you are talking about Xen, I am very happy to discuss that
> > > > > > with you. You can check how Xen has managed its objects via UUID in
> > > > > > xapi.
> > > > >
> > > > > Well, I'm not the expert in this area. IMHO UUID is just a user level
> > > > > attribute, which can be associated with any sysfs node and managed by
> > > > > the mgmt. stack itself, and then the sysfs path can be opened as the
> > > > > bridge between user/kernel. I don't understand the necessity of
> > > > > binding UUID internally within the vGPU core framework here. Alex gave
> > > > > one example of udev, but I didn't quite catch why only UUID can work
> > > > > there. Maybe you can elaborate on that requirement.
> > > >
> > > > Hi Kevin,
> > > >
> > > > UUID is just a way to represent an object.
> > > >
> > > > It is not binding, it is just a representation. I think here we are
> > > > just creating a convenient and generic way to represent a virtual gpu
> > > > device in sysfs.
> > > >
> > > > Having the UUID as part of the virtual gpu device name allows us to
> > > > easily find out the mapping.
> > > >
> > > > UUID can be anything; you can always use a UUID to represent the VMID
> > > > in the example you listed below, so you are actually gaining
> > > > flexibility by using UUID instead of VMID, as it can be supported by
> > > > both KVM and Xen. :-)
> > > >
> > > > Thanks,
> > > > Neo
> > > >
> > >
> > > Thanks Neo. I understand UUID has its merits in many usages. As you may
> > > see from my earlier reply, my main concern is whether it's a must to
> > > record this information within the kernel vGPU-core framework. We can
> > > still make it hypervisor agnostic even without using UUID, as long as
> > > there's a unified namespace created for all vgpus, like:
> > >
> > >     vgpu-vendor-0, vgpu-vendor-1, ...
> > >
> > > Then the high level mgmt. stack can associate a UUID with that
> > > namespace. So I hope you can help elaborate on the below description:
> > >
> > > > Having the UUID as part of the virtual gpu device name allows us to
> > > > easily find out the mapping.
> > >
> > Hi Kevin,
> >
> > The answer is simple: having a UUID as part of the device name will give
> > you a unique sysfs path that will be opened by QEMU.
> >
> > vgpu-vendor-0 and vgpu-vendor-1 will not be unique, as we can have
> > multiple virtual gpu devices per VM coming from the same or different
> > physical devices.
>
> That is not a problem. We can add physical device info too, like
> vgpu-vendor-0-0, vgpu-vendor-1-0, ...
>
> Please note Qemu doesn't care about the actual name. It just accepts a
> sysfs path to open.

Hi Kevin,

No, I think you are making things even more complicated than is required,
and the design is not generic anymore, as you are requiring QEMU to know
more than it needs to.

The way you name those devices will require QEMU to know the relation
between virtual devices and physical devices. I don't think that is good.

My flow is like this:

libvirt creates a VM object, which will have a UUID. It then uses the UUID
to create the virtual gpu devices and passes the UUID to QEMU (actually
QEMU already has the VM UUID), which will then just open up the unique
path.

Also, you need to consider that those 0-0 numbers are not as generic as
the UUID.

> > If you are worried about losing a meaningful name here, we can create a
> > sysfs file to capture the vendor device description if you like.
>
> Having the vgpu name descriptive is more informative imo. Users can
> simply check sysfs names to know raw information w/o relying on a 3rd
> party agent to query information around an opaque UUID.

You are actually arguing against your own design here, unfortunately. If
you look at your design carefully, it actually requires a 3rd party agent
to figure out the VM and virtual gpu device relation, as that relation is
never documented in sysfs.

In our current design, no 3rd party agent is required, as the VM UUID is
part of the QEMU command line already, and the VM UUID is already embedded
within the virtual device path.

Also, it doesn't require a 3rd party to retrieve information, as the
virtual device will just be a directory; we will have another file within
each virtual gpu device directory, and you can always cat that file to
retrieve the vendor information.
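To make that concrete, here is a minimal sketch of the layout being proposed.
Everything in it is illustrative: the directory names, the vendor_description
file, and the vendor string are hypothetical, and the whole thing is mocked
under a temp directory rather than the real sysfs, since the actual vgpu
sysfs ABI is still under discussion:

```shell
# Mock of the proposed layout (illustrative only): one directory per
# virtual gpu device, named <VM-UUID>-<vgpu_idx>, each holding a vendor
# description file a human can simply cat.
SYSFS_MOCK=$(mktemp -d)                          # stands in for /sys/devices/virtual/vgpu
VM_UUID="a3f1c2d4-5b6e-4f00-8000-123456789abc"   # would come from libvirt / qemu -uuid

# Management stack creates two vgpus for this VM
for idx in 0 1; do
    mkdir -p "$SYSFS_MOCK/$VM_UUID-$idx"
    echo "vendor: example-gpu profile-1q" > "$SYSFS_MOCK/$VM_UUID-$idx/vendor_description"
done

# QEMU only needs the VM UUID (already on its command line) to find the paths:
ls -d "$SYSFS_MOCK/$VM_UUID"-*

# And the vendor info is recoverable without any 3rd party agent:
cat "$SYSFS_MOCK/$VM_UUID-0/vendor_description"
```

The point of the sketch is that the VM-to-device mapping is encoded in the
path itself, so nothing beyond the UUID has to be communicated to QEMU.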
Let's use UUID-$vgpu_idx as the virtual device directory name, plus a
vendor description file within that directory, so we don't lose any
information and we also capture the VM and virtual device relation.

Thanks,
Neo

> That's why I prefer making UUID optional. By default the vgpu name would
> be some description string, and when a UUID is provided, the UUID can be
> appended to the string to serve your purpose.
>
> Thanks
> Kevin
>