From: Alex Williamson <alex.williamson@redhat.com>
To: Neo Jia <cjia@nvidia.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
Kirti Wankhede <kwankhede@nvidia.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"kraxel@redhat.com" <kraxel@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Song, Jike" <jike.song@intel.com>,
"bjsdjshi@linux.vnet.ibm.com" <bjsdjshi@linux.vnet.ibm.com>
Subject: Re: [PATCH v6 1/4] vfio: Mediated device Core driver
Date: Tue, 16 Aug 2016 15:03:59 -0600 [thread overview]
Message-ID: <20160816150359.2bccc1f3@t450s.home> (raw)
In-Reply-To: <20160816061320.GA19918@nvidia.com>
On Mon, 15 Aug 2016 23:13:20 -0700
Neo Jia <cjia@nvidia.com> wrote:
> On Tue, Aug 16, 2016 at 05:58:54AM +0000, Tian, Kevin wrote:
> > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > Sent: Tuesday, August 16, 2016 1:44 PM
> > >
> > > On Tue, Aug 16, 2016 at 04:52:30AM +0000, Tian, Kevin wrote:
> > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > Sent: Tuesday, August 16, 2016 12:17 PM
> > > > >
> > > > > On Tue, Aug 16, 2016 at 03:50:44AM +0000, Tian, Kevin wrote:
> > > > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > > > Sent: Tuesday, August 16, 2016 11:46 AM
> > > > > > >
> > > > > > > On Tue, Aug 16, 2016 at 12:30:25AM +0000, Tian, Kevin wrote:
> > > > > > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > > > > > Sent: Tuesday, August 16, 2016 3:59 AM
> > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > For NVIDIA vGPU solution we need to know all devices assigned to a VM
> > > in
> > > > > > > > > > > one shot to commit resources of all vGPUs assigned to a VM along with
> > > > > > > > > > > some common resources.
> > > > > > > > > >
> > > > > > > > > > Kirti, can you elaborate the background about above one-shot commit
> > > > > > > > > > requirement? It's hard to understand such a requirement.
> > > > > > > > > >
> > > > > > > > > > As I relied in another mail, I really hope start/stop become a per-mdev
> > > > > > > > > > attribute instead of global one, e.g.:
> > > > > > > > > >
> > > > > > > > > > echo "0/1" >
> > > > > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start
> > > > > > > > > >
> > > > > > > > > > In many scenario the user space client may only want to talk to mdev
> > > > > > > > > > instance directly, w/o need to contact its parent device. Still take
> > > > > > > > > > live migration for example, I don't think Qemu wants to know parent
> > > > > > > > > > device of assigned mdev instances.
> > > > > > > > >
> > > > > > > > > Hi Kevin,
> > > > > > > > >
> > > > > > > > > Having a global /sys/class/mdev/mdev_start doesn't require anybody to
> > > know
> > > > > > > > > parent device. you can just do
> > > > > > > > >
> > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start
> > > > > > > > >
> > > > > > > > > or
> > > > > > > > >
> > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop
> > > > > > > > >
> > > > > > > > > without knowing the parent device.
> > > > > > > > >
> > > > > > > >
> > > > > > > > You can look at some existing sysfs example, e.g.:
> > > > > > > >
> > > > > > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online
> > > > > > > >
> > > > > > > > You may also argue why not using a global style:
> > > > > > > >
> > > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online
> > > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline
> > > > > > > >
> > > > > > > > There are many similar examples...
> > > > > > >
> > > > > > > Hi Kevin,
> > > > > > >
> > > > > > > My response above is to your question about using the global sysfs entry as you
> > > > > > > don't want to have the global path because
> > > > > > >
> > > > > > > "I don't think Qemu wants to know parent device of assigned mdev instances.".
> > > > > > >
> > > > > > > So I just want to confirm with you that (in case you miss):
> > > > > > >
> > > > > > > /sys/class/mdev/mdev_start | mdev_stop
> > > > > > >
> > > > > > > doesn't require the knowledge of parent device.
> > > > > > >
> > > > > >
> > > > > > Qemu is just one example, where your explanation of parent device
> > > > > > makes sense but still it's not good for Qemu to populate /sys/class/mdev
> > > > > > directly. Qemu is passed with the actual sysfs path of assigned mdev
> > > > > > instance, so any mdev attributes touched by Qemu should be put under
> > > > > > that node (e.g. start/stop for live migration usage as I explained earlier).
> > > > >
> > > > > Exactly, qemu is passed with the actual sysfs path.
> > > > >
> > > > > So, QEMU doesn't touch the file /sys/class/mdev/mdev_start | mdev_stop at all.
> > > > >
> > > > > QEMU will take the sysfs path as input:
> > > > >
> > > > > -device
> > > > >
> > > vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,i
> > > > > d=vgpu0
> > > >
> > > > no need of passing "id=vgpu0" here. If necessary you can put id as an attribute
> > > > under sysfs mdev node:
> > > >
> > > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/id
> > >
> > > I think we have moved away from the device index based on Alex's comment, so the
> > > device path will be:
> > >
> > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818
> >
> > pass /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818
> > as parameter, and then Qemu can access 'id' under that path. You
> > don't need to pass a separate 'id' field. That's my point.
As noted in previous reply, id is a QEMU device parameter, it's not a
property of the mdev device. I'd also caution against adding arbitrary
sysfs files with the expectation that QEMU will be able to manipulate
them. I believe one of the benefits of vfio vs legacy KVM device
assignment is that vfio is self contained through the vfio API. Each
time we expect QEMU to interact with the system via sysfs, that's a new
file that libvirt needs to add access to and a new attack vector where
we need to worry about security. I still think it's the right idea
from a sysfs perspective to move to per device files, or an "online"
file to replace both, but let's be a little more strategic how we
expect QEMU to interact with the device.
> > > > > As you are saying in live migration, QEMU needs to access "start" and "stop". Could
> > > you
> > > > > please share more details, such as how QEMU access the "start" and "stop" sysfs,
> > > > > when and what triggers that?
> > > > >
> > > >
> > > > A conceptual flow as below:
> > > >
> > > > 1. Quiescent mdev activity on the parent device (e.g. stop scheduling, wait for
> > > > in-flight DMA completed, etc.)
> > > >
> > > > echo "0" > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start
> > > >
> > > > 2. Save mdev state:
> > > >
> > > > cat /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/state > xxx
> > > >
> > > > 3. xxx will be part of the final VM image and copied to a new machine
> > > >
> > > > 4. Allocate/prepare mdev on the new machine for this VM
> > > >
> > > > 5. Restore mdev state:
> > > >
> > > > cat xxx > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/state
> > > > (might be a different path name)
> > > >
> > > > 6. start mdev on the new parent device:
> > > >
> > > > echo "1" > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start
For example, another solution to this would be a device specific vfio
region dedicated to starting and stopping the device an manipulating
state. A simplistic API might be that the first dword of this region
is reserved for control of the device, the ability to pause and resume
the device, and reads and writes to the remainder of the region collect
and restore device state, respectively. We'd need to spend more time
thinking about such an API, but something like this could be added
between the kernel and QEMU as a feature that doesn't change the
existing API. Thanks,
Alex
next prev parent reply other threads:[~2016-08-16 21:04 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-03 19:03 [PATCH v6 0/4] Add Mediated device support Kirti Wankhede
2016-08-03 19:03 ` [PATCH v6 1/4] vfio: Mediated device Core driver Kirti Wankhede
2016-08-04 7:21 ` Tian, Kevin
2016-08-05 6:13 ` Kirti Wankhede
2016-08-15 9:15 ` Tian, Kevin
2016-08-09 19:00 ` Alex Williamson
2016-08-12 18:44 ` Kirti Wankhede
2016-08-12 21:16 ` Alex Williamson
2016-08-13 0:37 ` Kirti Wankhede
2016-08-15 9:38 ` Tian, Kevin
2016-08-15 15:59 ` Alex Williamson
2016-08-15 22:09 ` Neo Jia
2016-08-15 22:52 ` Alex Williamson
2016-08-15 23:23 ` Neo Jia
2016-08-16 0:49 ` Tian, Kevin
2016-08-15 19:59 ` Neo Jia
2016-08-15 22:47 ` [Qemu-devel] " Alex Williamson
2016-08-15 23:54 ` Neo Jia
2016-08-16 0:18 ` Tian, Kevin
2016-08-16 20:30 ` [Qemu-devel] " Neo Jia
2016-08-16 20:51 ` Alex Williamson
2016-08-16 21:17 ` Neo Jia
2016-08-16 0:30 ` Tian, Kevin
2016-08-16 3:45 ` Neo Jia
2016-08-16 3:50 ` Tian, Kevin
2016-08-16 4:16 ` Neo Jia
2016-08-16 4:52 ` Tian, Kevin
2016-08-16 5:43 ` Neo Jia
2016-08-16 5:58 ` Tian, Kevin
2016-08-16 6:13 ` Neo Jia
2016-08-16 21:03 ` Alex Williamson [this message]
2016-08-16 12:49 ` [Qemu-devel] " Alex Williamson
2016-08-03 19:03 ` [PATCH v6 2/4] vfio: VFIO driver for mediated PCI device Kirti Wankhede
2016-08-03 21:03 ` kbuild test robot
2016-08-04 0:19 ` kbuild test robot
2016-08-09 19:00 ` Alex Williamson
2016-08-10 21:23 ` Kirti Wankhede
2016-08-10 23:00 ` Alex Williamson
2016-08-11 15:59 ` Kirti Wankhede
2016-08-11 16:24 ` Alex Williamson
2016-08-11 17:46 ` Kirti Wankhede
2016-08-11 18:43 ` Alex Williamson
2016-08-12 17:57 ` Kirti Wankhede
2016-08-12 21:25 ` Alex Williamson
2016-08-13 0:42 ` Kirti Wankhede
2016-08-03 19:03 ` [PATCH v6 3/4] vfio iommu: Add support for mediated devices Kirti Wankhede
2016-08-09 19:00 ` Alex Williamson
2016-08-11 14:22 ` Kirti Wankhede
2016-08-11 16:28 ` Alex Williamson
2016-08-03 19:03 ` [PATCH v6 4/4] docs: Add Documentation for Mediated devices Kirti Wankhede
2016-08-04 7:31 ` Tian, Kevin
2016-08-05 7:45 ` Kirti Wankhede
2016-08-24 22:36 ` [Qemu-devel] " Daniel P. Berrange
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160816150359.2bccc1f3@t450s.home \
--to=alex.williamson@redhat.com \
--cc=bjsdjshi@linux.vnet.ibm.com \
--cc=cjia@nvidia.com \
--cc=jike.song@intel.com \
--cc=kevin.tian@intel.com \
--cc=kraxel@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=kwankhede@nvidia.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).