From: Kirti Wankhede
Date: Sat, 3 Sep 2016 22:01:13 +0530
Subject: Re: [Qemu-devel] [libvirt] [PATCH v7 0/4] Add Mediated device support
To: John Ferlan, Paolo Bonzini, Michal Privoznik, Alex Williamson
Cc: "Song, Jike", cjia@nvidia.com, kvm@vger.kernel.org,
 libvir-list@redhat.com, "Tian, Kevin", qemu-devel@nongnu.org,
 kraxel@redhat.com, Laine Stump, bjsdjshi@linux.vnet.ibm.com

On 9/3/2016 1:59 AM, John Ferlan wrote:
>
> On 09/02/2016 02:33 PM, Kirti Wankhede wrote:
>>
>> On 9/2/2016 10:55 PM, Paolo Bonzini wrote:
>>>
>>> On 02/09/2016 19:15, Kirti Wankhede wrote:
>>>> On 9/2/2016 3:35 PM, Paolo Bonzini wrote:
>>>>>
>>>>>    <device>
>>>>>      <name>my-vgpu</name>
>>>>>      <parent>pci_0000_86_00_0</parent>
>>>>>      <capability type='mdev'>
>>>>>        <type id='...'/>
>>>>>        <uuid>0695d332-7831-493f-9e71-1c85c8911a08</uuid>
>>>>>      </capability>
>>>>>    </device>
>>>>>
>>>>> After creating the vGPU, if required by the host driver, all the other
>>>>> type ids would disappear from "virsh nodedev-dumpxml pci_0000_86_00_0" too.
>>>>
>>>> Thanks Paolo for the details.
>>>> 'nodedev-create' parses the xml file and accordingly writes to the
>>>> 'create' file in sysfs to create the mdev device. Right?
>>>> At this moment, does libvirt know which VM this device would be
>>>> associated with?
>>>
>>> No, the VM will associate to the nodedev through the UUID. The nodedev
>>> is created separately from the VM.
>>>
>>>>> When dumping the mdev with nodedev-dumpxml, it could show more complete
>>>>> info, again taken from sysfs:
>>>>>
>>>>>    <device>
>>>>>      <name>my-vgpu</name>
>>>>>      <parent>pci_0000_86_00_0</parent>
>>>>>      <capability type='mdev'>
>>>>>        <uuid>0695d332-7831-493f-9e71-1c85c8911a08</uuid>
>>>>>        <capability type='pci'>
>>>>>          ...
>>>>>          <vendor>NVIDIA</vendor>
>>>>>        </capability>
>>>>>      </capability>
>>>>>    </device>
>>>>>
>>>>> Notice how the parent has mdev inside pci; the vGPU, if it has to have
>>>>> pci at all, would have it inside mdev. This represents the difference
>>>>> between the mdev provider and the mdev device.
>>>>
>>>> The parent of an mdev device might not always be a PCI device. I think
>>>> we shouldn't consider it a PCI capability.
>>>
>>> The <capability type='pci'> in the vGPU means that it _will_ be exposed
>>> as a PCI device by VFIO.
>>>
>>> The <capability type='pci'> in the physical GPU means that the GPU is a
>>> PCI device.
>>>
>>
>> Ok. Got that.
>>
>>>>> Random proposal for the domain XML too:
>>>>>
>>>>>    <hostdev mode='subsystem' type='mdev'>
>>>>>      <source>
>>>>>        <uuid>0695d332-7831-493f-9e71-1c85c8911a08</uuid>
>>>>>      </source>
>>>>>      <address type='pci' .../>
>>>>>    </hostdev>
>>>>>
>>>>
>>>> When a user wants to assign two mdev devices to one VM, does the user
>>>> have to add two such entries, or group the two devices in one entry?
>>>
>>> Two entries, one per UUID, each with its own PCI address in the guest.
>>>
>>>> On the other mail thread with the same subject we are thinking of
>>>> creating a group of mdev devices to assign multiple mdev devices to
>>>> one VM.
>>>
>>> What is the advantage in managing mdev groups? (Sorry, didn't follow
>>> the other thread.)
>>>
>>
>> When an mdev device is created, resources from the physical device are
>> assigned to this device, but the resources are committed only when the
>> device goes 'online' ('start' in the v6 patch).
>> In the case of multiple vGPUs in a VM for the NVIDIA vGPU solution,
>> resources for all vGPU devices in a VM are committed in one place. So
>> we need to know the vGPUs assigned to a VM before QEMU starts.
>>
>> Grouping would help here, as Alex suggested in that mail. Pulling only
>> that part of the discussion here:
>>
>>> It seems then that the grouping needs to affect the iommu group so
>>> that you know that there's only a single owner for all the mdev
>>> devices within the group. IIRC, the bus drivers don't have any
>>> visibility to opening and releasing of the group itself to trigger the
>>> online/offline, but they can track opening of the device file
>>> descriptors within the group. Within the VFIO API the user cannot
>>> access the device without the device file descriptor, so a "first
>>> device opened" and "last device closed" trigger would provide the
>>> trigger points you need. Some sort of new sysfs interface would need
>>> to be invented to allow this sort of manipulation.
>>> Also we should probably keep sight of whether we feel this is
>>> sufficiently necessary for the complexity. If we can get by with only
>>> doing this grouping at creation time then we could define the "create"
>>> interface in various ways. For example:
>>>
>>> echo $UUID0 > create
>>>
>>> would create a single mdev named $UUID0 in its own group.
>>>
>>> echo {$UUID0,$UUID1} > create
>>>
>>> could create mdev devices $UUID0 and $UUID1 grouped together.
>>>
>>
>> I think this would create mdev devices of the same type on the same
>> parent device. We need to consider the case of multiple mdev devices of
>> different types and with different parents being grouped together.
>>
>>> We could even do:
>>>
>>> echo $UUID1:$GROUPA > create
>>>
>>> where $GROUPA is the group ID of a previously created mdev device into
>>> which $UUID1 is to be created and added to the same group.
>>
>> I was thinking about:
>>
>> echo $UUID0 > create
>>
>> which would create an mdev device, and
>>
>> echo $UUID0 > /sys/class/mdev/create_group
>>
>> which would add the created device to a group.
>>
>> For the multiple-devices case:
>>
>> echo $UUID0 > create
>> echo $UUID1 > create
>>
>> would create mdev devices, which could be of different types and have
>> different parents, and
>>
>> echo $UUID0, $UUID1 > /sys/class/mdev/create_group
>>
>> would add the devices to a group.
>> The mdev core module would create a new group with a unique number. On
>> mdev device 'destroy' that mdev device would be removed from its group,
>> and when there are no devices left in the group, the group would be
>> deleted. With this, the "first device opened" and "last device closed"
>> triggers can be used to commit resources.
>> Then libvirt uses the mdev device path to pass as an argument to QEMU,
>> the same as it does for VFIO. Libvirt doesn't have to care about the
>> group number.
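
To make that sequence concrete, the flow would look roughly like the
following. The sysfs paths, the <parent-A>/<parent-B> placeholders and
the QEMU invocation are only illustrative here; the exact layout is up
to the mdev core module and to libvirt:

echo $UUID0 > /sys/.../<parent-A>/mdev_create
echo $UUID1 > /sys/.../<parent-B>/mdev_create
echo $UUID0, $UUID1 > /sys/class/mdev/create_group

# QEMU (or libvirt on its behalf) then opens the devices by their sysfs
# path, as for any other VFIO device; the vendor driver commits
# resources on "first device opened" and releases them on "last device
# closed". Assuming the mdev devices show up under /sys/bus/mdev/devices/:
qemu-system-x86_64 ... \
    -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID0 \
    -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID1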
>>
>>
>
> The more complicated one makes this, the more difficult it is for the
> customer to configure and the longer it takes to get something out. I
> didn't follow the details of groups...
>
> What gets created from a pass through some *mdev/create_group?

My proposal here is that

echo $UUID1, $UUID2 > /sys/class/mdev/create_group

would create a group in the mdev core driver, which should be internal
to the mdev core module. In the mdev core module, a unique group number
would be saved in the mdev_device structure for each device belonging to
that group.

> Does some new udev device get created that is then fed to the guest?

No, a group is not a device. It would be an identifier used by the
vendor driver to identify the devices in a group.

> Seems painful to make two distinct/async passes through systemd/udev. I
> foresee testing nightmares with creating 3 vGPUs, processing a group
> request, while some other process/thread is deleting a vGPU... How do
> the vGPUs get marked so that the delete cannot happen?

How is the same case handled for a directly assigned device? I mean, a
device is unbound from its vendor's driver and bound to vfio_pci. How is
it guaranteed to stay bound to the vfio_pci module? Some other
process/thread might unbind it from the vfio_pci module.

> If a vendor wants to create their own utility to group vHBAs together
> and manage that grouping, then have at it... Doesn't seem to be
> something libvirt needs to be or should be managing... As I go running
> for cover...
>
> If having multiple types generated for a single vGPU, then consider the
> following XML:
>
>    <capability type='mdev'>
>      <type id='11'/>
>      <type id='11'/>
>      <type id='12'/>
>      [...]
>    </capability>
>
> then perhaps building the mdev_create input would be a comma-separated
> list of types to be added... "$UUID:11,11,12". Just a thought...
>

In that case the vGPUs are created on the same physical GPU. Consider
the case where two vGPUs on different physical devices need to be
assigned to a VM. Then those should be two different create commands:

echo $UUID0 > /sys/..//mdev_create
echo $UUID1 > /sys/..//mdev_create

Kirti.

>
> John
>
>> Thanks,
>> Kirti
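
P.S. To spell out that last point with a purely illustrative example
(the parent placeholders and paths below are not a proposed final
interface): a comma-separated type list such as "$UUID:11,11,12" can
only describe vGPUs carved out of a single parent device, while
create-then-group also covers vGPUs on different physical GPUs:

# one parent, several types in a single write (John's suggestion):
echo "$UUID:11,11,12" > /sys/.../<parent-A>/mdev_create

# two parents, one create per parent, grouping as a separate step:
echo $UUID0 > /sys/.../<parent-A>/mdev_create
echo $UUID1 > /sys/.../<parent-B>/mdev_create
echo $UUID0, $UUID1 > /sys/class/mdev/create_group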