From: Kirti Wankhede <kwankhede@nvidia.com>
To: John Ferlan <jferlan@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Michal Privoznik <mprivozn@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: "Song, Jike" <jike.song@intel.com>,
	"cjia@nvidia.com" <cjia@nvidia.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"libvir-list@redhat.com" <libvir-list@redhat.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"kraxel@redhat.com" <kraxel@redhat.com>,
	Laine Stump <laine@redhat.com>,
	"bjsdjshi@linux.vnet.ibm.com" <bjsdjshi@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [libvirt] [PATCH v7 0/4] Add Mediated device support
Date: Sat, 3 Sep 2016 22:01:13 +0530	[thread overview]
Message-ID: <d2dee16c-1c69-159f-dac8-42ff90ad15cd@nvidia.com> (raw)
In-Reply-To: <9863c9f8-77fd-61e8-708c-a6747dcd64ea@redhat.com>



On 9/3/2016 1:59 AM, John Ferlan wrote:
> 
> 
> On 09/02/2016 02:33 PM, Kirti Wankhede wrote:
>>
>> On 9/2/2016 10:55 PM, Paolo Bonzini wrote:
>>>
>>>
>>> On 02/09/2016 19:15, Kirti Wankhede wrote:
>>>> On 9/2/2016 3:35 PM, Paolo Bonzini wrote:
>>>>>    <device>
>>>>>      <name>my-vgpu</name>
>>>>>      <parent>pci_0000_86_00_0</parent>
>>>>>      <capability type='mdev'>
>>>>>        <type id='11'/>
>>>>>        <uuid>0695d332-7831-493f-9e71-1c85c8911a08</uuid>
>>>>>      </capability>
>>>>>    </device>
>>>>>
>>>>> After creating the vGPU, if required by the host driver, all the other
>>>>> type ids would disappear from "virsh nodedev-dumpxml pci_0000_86_00_0" too.
>>>>
>>>> Thanks Paolo for details.
>>>> 'nodedev-create' parses the XML file and accordingly writes to the
>>>> 'create' file in sysfs to create the mdev device. Right?
>>>> At this moment, does libvirt know which VM this device would be
>>>> associated with?
>>>
>>> No, the VM will associate to the nodedev through the UUID.  The nodedev
>>> is created separately from the VM.
>>>
>>>>> When dumping the mdev with nodedev-dumpxml, it could show more complete
>>>>> info, again taken from sysfs:
>>>>>
>>>>>    <device>
>>>>>      <name>my-vgpu</name>
>>>>>      <parent>pci_0000_86_00_0</parent>
>>>>>      <capability type='mdev'>
>>>>>        <uuid>0695d332-7831-493f-9e71-1c85c8911a08</uuid>
>>>>>        <!-- only the chosen type -->
>>>>>        <type id='11'>
>>>>>          <!-- ... snip ... -->
>>>>>        </type>
>>>>>        <capability type='pci'>
>>>>>          <!-- no domain/bus/slot/function of course -->
>>>>>          <!-- could show whatever PCI IDs are seen by the guest: -->
>>>>>          <product id='...'>...</product>
>>>>>          <vendor id='0x10de'>NVIDIA</vendor>
>>>>>        </capability>
>>>>>      </capability>
>>>>>    </device>
>>>>>
>>>>> Notice how the parent has mdev inside pci; the vGPU, if it has to have
>>>>> pci at all, would have it inside mdev.  This represents the difference
>>>>> between the mdev provider and the mdev device.
>>>>
>>>> The parent of an mdev device might not always be a PCI device. I think
>>>> we shouldn't consider it a PCI capability.
>>>
>>> The <capability type='pci'> in the vGPU means that it _will_ be exposed
>>> as a PCI device by VFIO.
>>>
>>> The <capability type='pci'> in the physical GPU means that the GPU is a
>>> PCI device.
>>>
>>
>> Ok. Got that.
>>
>>>>> Random proposal for the domain XML too:
>>>>>
>>>>>   <hostdev mode='subsystem' type='pci'>
>>>>>     <source type='mdev'>
>>>>>       <!-- possible alternative to uuid: <name>my-vgpu</name> ?!? -->
>>>>>       <uuid>0695d332-7831-493f-9e71-1c85c8911a08</uuid>
>>>>>     </source>
>>>>>     <address type='pci' bus='0' slot='2' function='0'/>
>>>>>   </hostdev>
>>>>>
>>>>
>>>> When a user wants to assign two mdev devices to one VM, does the user
>>>> have to add two such entries, or group the two devices in one entry?
>>>
>>> Two entries, one per UUID, each with its own PCI address in the guest.
>>>
>>>> On the other mail thread with the same subject we are thinking of
>>>> creating a group of mdev devices to assign multiple mdev devices to one VM.
>>>
>>> What is the advantage in managing mdev groups?  (Sorry didn't follow the
>>> other thread).
>>>
>>
>> When an mdev device is created, resources from the physical device are
>> assigned to it, but the resources are committed only when the device
>> goes 'online' ('start' in the v6 patch).
>> In the case of multiple vGPUs in a VM for the NVIDIA vGPU solution,
>> resources for all vGPU devices in a VM are committed in one place, so we
>> need to know the vGPUs assigned to a VM before QEMU starts.
>>
>> Grouping would help here, as Alex suggested in that mail. Pulling only
>> that part of the discussion here:
>>
>> <Alex> It seems then that the grouping needs to affect the iommu group
>> so that
>>> you know that there's only a single owner for all the mdev devices
>>> within the group.  IIRC, the bus drivers don't have any visibility
>>> to opening and releasing of the group itself to trigger the
>>> online/offline, but they can track opening of the device file
>>> descriptors within the group.  Within the VFIO API the user cannot
>>> access the device without the device file descriptor, so a "first
>>> device opened" and "last device closed" trigger would provide the
>>> trigger points you need.  Some sort of new sysfs interface would need
>>> to be invented to allow this sort of manipulation.
>>> Also we should probably keep sight of whether we feel this is
>>> sufficiently necessary for the complexity.  If we can get by with only
>>> doing this grouping at creation time then we could define the "create"
>>> interface in various ways.  For example:
>>>
>>> echo $UUID0 > create
>>>
>>> would create a single mdev named $UUID0 in its own group.
>>>
>>> echo {$UUID0,$UUID1} > create
>>>
>>> could create mdev devices $UUID0 and $UUID1 grouped together.
>>>
>> </Alex>
>>
>> <Kirti>
>> I think this would create mdev devices of the same type on the same
>> parent device. We need to consider the case of multiple mdev devices of
>> different types and with different parents being grouped together.
>> </Kirti>
>>
>> <Alex> We could even do:
>>>
>>> echo $UUID1:$GROUPA > create
>>>
>>> where $GROUPA is the group ID of a previously created mdev device into
>>> which $UUID1 is to be created and added to the same group.
>> </Alex>
>>
>> <Kirti>
>> I was thinking about:
>>
>>   echo $UUID0 > create
>>
>> would create an mdev device
>>
>>   echo $UUID0 > /sys/class/mdev/create_group
>>
>> would add the created device to a group.
>>
>> For the multiple-devices case:
>>   echo $UUID0 > create
>>   echo $UUID1 > create
>>
>> would create mdev devices which could be of different types and
>> different parents.
>>   echo $UUID0, $UUID1 > /sys/class/mdev/create_group
>>
>> would add the devices to a group.
>> The mdev core module would create a new group with a unique number. On
>> mdev device 'destroy', that mdev device would be removed from the group;
>> when there are no devices left in the group, the group would be deleted.
>> With this, the "first device opened" and "last device closed" triggers
>> can be used to commit resources.
>> Then libvirt uses the mdev device path to pass as an argument to QEMU,
>> same as it does for VFIO. Libvirt doesn't have to care about the group
>> number.
>> </Kirti>
>>
> 
> The more complicated one makes this, the more difficult it is for the
> customer to configure, and the longer it takes to get something out. I
> didn't follow the details of groups...
> 
> What gets created from a pass through some *mdev/create_group?  

My proposal here is that
  echo $UUID1,$UUID2 > /sys/class/mdev/create_group
would create a group in the mdev core driver, which should be internal to
the mdev core module. In the mdev core module, a unique group number would
be saved in the mdev_device structure for each device belonging to that
group.
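
As a sketch, the proposed flow seen from userspace would look like this.
Note that the mdev_create and create_group nodes are part of this proposal,
not an existing kernel interface, and the UUIDs and BDFs are only examples;
the script prints the sysfs writes instead of performing them so it can run
anywhere:

```shell
#!/bin/sh
# Dry-run sketch of the proposed grouping flow. All sysfs nodes here
# are hypothetical (part of this proposal), so the writes are printed
# rather than executed.
UUID0=0695d332-7831-493f-9e71-1c85c8911a08
UUID1=8e375c63-9c31-4a2b-b1a9-1c85c8911a09

# Create one mdev per parent; the parents and types may differ.
echo "echo $UUID0 > /sys/bus/pci/devices/0000:86:00.0/mdev_create"
echo "echo $UUID1 > /sys/bus/pci/devices/0000:87:00.0/mdev_create"

# Group them; the mdev core would allocate a unique group number and
# record it in each device's mdev_device structure internally.
echo "echo $UUID0,$UUID1 > /sys/class/mdev/create_group"
```

On 'destroy' each device would drop out of its group, and the group itself
would disappear with its last member.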

> Does
> some new udev device get create that then is fed to the guest?

No, a group is not a device. It is just an identifier that the vendor
driver uses to identify the devices in a group.

> Seems
> painful to make two distinct/async passes through systemd/udev. I
> foresee testing nightmares with creating 3 vGPUs, processing a group
> request, while some other process/thread is deleting a vGPU... How do
> the vGPUs get marked so that the delete cannot happen?
> 

How is the same case handled for a directly assigned device? I mean, a
device is unbound from its vendor's driver and bound to the vfio_pci
driver. How is it guaranteed to stay bound to the vfio_pci module? Some
other process/thread might unbind it from the vfio_pci module.
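
For comparison, here is a minimal sketch of that direct-assignment flow
(example BDF and PCI IDs; the writes are printed rather than executed).
Nothing in sysfs reserves the device between the two steps, which is the
same exposure the create/create_group split would have:

```shell
#!/bin/sh
# Rebinding a physical device to vfio-pci (example BDF and IDs).
# Printed as a dry run; on a real host these are plain sysfs writes.
BDF=0000:86:00.0
VENDOR=10de
DEVICE=13f2

# Detach the device from its current (vendor) driver.
echo "echo $BDF > /sys/bus/pci/devices/$BDF/driver/unbind"
# Teach vfio-pci the vendor:device ID so it picks the device up.
echo "echo $VENDOR $DEVICE > /sys/bus/pci/drivers/vfio-pci/new_id"
```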

> If a vendor wants to create their own utility to group vHBA's together
> and manage that grouping, then have at it...  Doesn't seem to be
> something libvirt needs to be or should be managing...  As I go running
> for cover...
> 
> If multiple types are generated for a single vGPU, then consider the
> following XML:
> 
>    <capability type='mdev'>
>      <type id='11' [other attributes]/>
>      <type id='11' [other attributes]/>
>      <type id='12' [other attributes]/>
>      [<uuid>...</uuid>]
>     </capability>
> 
> then perhaps building the mdev_create input would be a comma separated
> list of type's to be added... "$UUID:11,11,12". Just a thought...
> 

In that case the vGPUs are created on the same physical GPU. Consider the
case where two vGPUs on different physical devices need to be assigned to
a VM; then those should be two different create commands:

   echo $UUID0 > /sys/../<bdf1>/mdev_create
   echo $UUID1 > /sys/../<bdf2>/mdev_create
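
Assuming both creates succeed, libvirt would then hand each mdev to QEMU by
its sysfs path, e.g. via vfio-pci's sysfsdev option. The command line below
is only illustrative (example UUIDs, printed as a dry run):

```shell
#!/bin/sh
# Dry-run sketch: pass each created mdev to QEMU by its sysfs path.
# UUIDs are examples; /sys/bus/mdev/devices/<uuid> is where an mdev
# device appears once created.
UUID0=0695d332-7831-493f-9e71-1c85c8911a08
UUID1=8e375c63-9c31-4a2b-b1a9-1c85c8911a09

echo "qemu-system-x86_64 ... \\"
echo "  -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID0 \\"
echo "  -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID1"
```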

Kirti.
> 
> John
> 
>> Thanks,
>> Kirti
>>
>> --
>> libvir-list mailing list
>> libvir-list@redhat.com
>> https://www.redhat.com/mailman/listinfo/libvir-list
>>
