kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Neo Jia <cjia@nvidia.com>,
	kvm@vger.kernel.org, Erik Skultety <eskultet@redhat.com>,
	libvirt <libvir-list@redhat.com>,
	Tina Zhang <tina.zhang@intel.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	Gerd Hoffmann <kraxel@redhat.com>, Laine Stump <laine@redhat.com>,
	Jiri Denemark <jdenemar@redhat.com>,
	intel-gvt-dev@lists.freedesktop.org
Subject: Re: [libvirt] [PATCH 0/3] sample: vfio mdev display devices.
Date: Wed, 25 Apr 2018 20:52:30 +0100	[thread overview]
Message-ID: <20180425195229.GK2496@work-vm> (raw)
In-Reply-To: <20180425120057.0fabb70e@w520.home>

* Alex Williamson (alex.williamson@redhat.com) wrote:
> On Wed, 25 Apr 2018 21:00:39 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
> > On 4/25/2018 4:29 AM, Alex Williamson wrote:
> > > On Wed, 25 Apr 2018 01:20:08 +0530
> > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > >   
> > >> On 4/24/2018 3:10 AM, Alex Williamson wrote:  
> > >>> On Wed, 18 Apr 2018 12:31:53 -0600
> > >>> Alex Williamson <alex.williamson@redhat.com> wrote:
> > >>>     
> > >>>> On Mon,  9 Apr 2018 12:35:10 +0200
> > >>>> Gerd Hoffmann <kraxel@redhat.com> wrote:
> > >>>>    
> > >>>>> This little series adds three drivers, for demo-ing and testing vfio
> > >>>>> display interface code.  There is one mdev device for each interface
> > >>>>> type (mdpy.ko for region and mbochs.ko for dmabuf).      
> > >>>>
> > >>>> Erik Skultety brought up a good question today regarding how libvirt is
> > >>>> meant to handle these different flavors of display interfaces and
> > >>>> knowing whether a given mdev device has display support at all.  It
> > >>>> seems that we cannot simply use the default display=auto because
> > >>>> libvirt needs to specifically configure gl support for a dmabuf type
> > >>>> interface versus not having such a requirement for a region interface,
> > >>>> perhaps even removing the emulated graphics in some cases (though I
> > >>>> don't think we have boot graphics through either solution yet).
> > >>>> Additionally, GVT-g seems to need the x-igd-opregion support
> > >>>> enabled(?), which is a non-starter for libvirt as it's an experimental
> > >>>> option!
> > >>>>
> > >>>> Currently the only way to determine display support is through the
> > >>>> VFIO_DEVICE_QUERY_GFX_PLANE ioctl, but for libvirt to probe that on
> > >>>> their own they'd need to get to the point where they could open the
> > >>>> vfio device and perform the ioctl.  That means opening a vfio
> > >>>> container, adding the group, setting the iommu type, and getting the
> > >>>> device.  I was initially a bit appalled at asking libvirt to do that,
> > >>>> but the alternative is to put this information in sysfs, but doing that
> > >>>> we risk that we need to describe every nuance of the mdev device
> > >>>> through sysfs and it becomes a dumping ground for every possible
> > >>>> feature an mdev device might have.
> ...    
> > >>>> So I was ready to return and suggest that maybe libvirt should probe
> > >>>> the device to know about these ancillary configuration details, but
> > >>>> then I remembered that both mdev vGPU vendors had external dependencies
> > >>>> to even allow probing the device.  KVMGT will fail to open the device
> > >>>> if it's not associated with an instance of KVM and NVIDIA vGPU, I
> > >>>> believe, will fail if the vGPU manager process cannot find the QEMU
> > >>>> instance to extract the VM UUID.  (Both of these were bad ideas)    
> > >>>
> > >>> Here's another proposal that's really growing on me:
> > >>>
> > >>>  * Fix the vendor drivers!  Allow devices to be opened and probed
> > >>>    without these external dependencies.
> > >>>  * Libvirt uses the existing vfio API to open the device and probe the
> > >>>    necessary ioctls, if it can't probe the device, the feature is
> > >>>    unavailable, ie. display=off, no migration.
> > >>>     
> > >>
> > >> I'm trying to think simpler mechanism using sysfs that could work for
> > >> any feature and knowing source-destination migration compatibility check
> > >> by libvirt before initiating migration.
> > >>
> > >> I have another proposal:
> > >> * Add a ioctl VFIO_DEVICE_PROBE_FEATURES
> > >> struct vfio_device_features {
> > >>     __u32 argsz;
> > >>     __u32 features;
> > >> }
> > >>
> > >> Define bit for each feature:
> > >> #define VFIO_DEVICE_FEATURE_DISPLAY_REGION	(1 << 0)
> > >> #define VFIO_DEVICE_FEATURE_DISPLAY_DMABUF	(1 << 1)
> > >> #define VFIO_DEVICE_FEATURE_MIGRATION		(1 << 2)
> > >>
> > >> * Vendor driver returns bitmask of supported features during
> > >> initialization phase.
> > >>
> > >> * In vfio core module, trap this ioctl for each device  in
> > >> vfio_device_fops_unl_ioctl(),  
> > > 
> > > Whoops, chicken and egg problem, VFIO_GROUP_GET_DEVICE_FD is our
> > > blocking point with mdev drivers, we can't get a device fd, so we can't
> > > call an ioctl on the device fd.
> > >   
> > 
> > I'm sorry, I thought we could expose features when QEMU initialize, but
> > libvirt needs to know supported features before QEMU initialize.
> > 
> > 
> > >> check features bitmask returned by vendor
> > >> driver and add a sysfs file if feature is supported that device. This
> > >> sysfs file would return 0/1.  
> > > 
> > > I don't understand why we have an ioctl interface, if the user can get
> > > to the device fd then we have existing interfaces to probe these
> > > things, it seems like you're just wanting to pass a features bitmap
> > > through to vfio_add_group_dev() that vfio-core would expose through
> > > sysfs, but a list of feature bits doesn't convey enough info except for
> > > the most basic uses.
> > >    
> > 
> > Yes, vfio_add_group_dev() seems to be better way to convey features to
> > vfio core.
> > 
> > >> For migration this bit will only indicate if host driver supports
> > >> migration feature.
> > >>
> > >> For source and destination compatibility check libvirt would need more
> > >> data/variables to check like,
> > >> * if same type of 'mdev_type' device create-able at destination,
> > >>    i.e. if ('mdev_type'->available_instances > 0)
> > >>
> > >> * if host_driver_version at source and destination are compatible.
> > >> Host driver from same release branch should be mostly compatible, but if
> > >> there are major changes in structures or APIs, host drivers from
> > >> different branches might not be compatible, for example, if source and
> > >> destination are from different branches and one of the structure had
> > >> changed, then data collected at source might not be compatible with
> > >> structures at destination and typecasting it to changed structures would
> > >> mess up migrated data during restoration.  
> > > 
> > > Of course now you're asking that libvirt understand the release
> > > versioning scheme of every vendor driver and that it remain
> > > programatically consistent.  We can't even do this with in-kernel
> > > drivers.  And in the end, still the best we can do is guess.
> > >  
> > 
> > Libvirt doesn't need to understand the version, libvirt need to do
> > strcmp version string from source and destination. If those are equal,
> > then libvirt would understand that they are compatible.
> 
> Who's to say that the driver version and migration compatibility have
> any relation at all?  Some drivers might focus on designing their own
> migration interface that can maintain compatibility across versions
> (QEMU does this), some drivers may only allow identical version
> migration (which is going to frustrate upper level management tools and
> customers - RHEL goes to great extents to support cross version
> migration).  We cannot have a one size fits all here that driver version
> defines completely the migration compatibility.

I'll agree; I don't know enough about these devices, but to give you
some example of things I'd expect to work:
   a) User adds new machines to their data centre with larger/newer
version of the same vendors GPU; in some cases that should work
(depending on vendor details etc)
   b) The same thing but with identical hardware but a newer driver on
the destination.

Obviously there will be some cut offs that say some versions are
incompatible;  but for normal migration we jump through serious hoops
to make sure stuff works; customers will expect the same with some
VFIO devices.

> > >> * if guest_driver_version is compatible with host driver at destination.
> > >> For mdev devices, guest driver communicates with host driver in some
> > >> form. If there are changes in structures/APIs of such communication,
> > >> guest driver at source might not be compatible with host driver at
> > >> destination.  
> > > 
> > > And another guess plus now the guest driver is involved which libvirt
> > > has no visibility to.
> > >    
> > 
> > Like above libvirt need to do strcmp.
> 
> Insufficient, imo
> 
> > >> 'available_instances' sysfs already exist, later two should be added by
> > >> vendor driver which libvirt can use for migration compatibility check.  
> > > 
> > > As noted previously, display and migration are not necessarily
> > > mdev-only features, it's possible that vfio-pci or vfio-platform could
> > > also implement these, so the sysfs interface cannot be restricted to
> > > the mdev template and lifecycle interface.
> > >   
> > 
> > I agree.
> > Feature bitmask passed to vfio core is not mdev specific. But here
> > 'available_instances' for migration compatibility check is mdev
> > specific. If mdev device is not create-able at destination, there is no
> > point in initiating migration by libvirt.
> 
> 'available_instances' for migration compatibility check...?  We use
> available_instances to know whether we have the resources to create a
> given mdev type.  It's certainly a prerequisite to have a device of the
> identical type at the migration target and how we define what is an
> identical device for a directly assigned PCI device is yet another
> overly complicated rat hole.  But an identical device doesn't
> necessarily imply migration compatibility and I think that's the
> problem we're tackling.  We cannot assume based only on the device type
> that migration is compatible, that's basically saying we're never going
> to have any bugs or oversights or new features in the migration stream.

Those things certainly happen; state that we forgot to transfer, new
features enables on devices, devices configured in different ways.

> Chatting with Laine, it may be worth a step back to include migration
> experts and people up the stack with more visibility to how openstack
> operates.  The issue here is that if vfio gains migration support then
> we have a portion of the migration stream that is not under the control
> of QEMU, we cannot necessarily tie it to a QEMU machine type and we
> cannot necessarily dictate how the vfio bus driver (vendor driver)
> handles versioning and compatibility.  My intent was to expose some
> sort of migration information through the vfio API so that upper level
> tools could determine source and target compatibility, but this in
> itself is I think something new that those tools need to agree how it
> might be done.  How would something like openstack want to handle not
> only finding a migration target with a compatible device, but also
> verifying if the device supports the migration format of the source
> device?
> 
> Alternatively, should we do anything?  Is the problem too hard and we
> should let the driver return an error when it receives an incompatible
> migration stream, aborting the migration?

It's a bit nasty; if you've hit the 'evacuate host' button then what
happens when you've got some incompatible hosts.

Dave

> > > One more try... we have a vfio_group fd.  This is created by the bus
> > > drivers calling vfio_add_group_dev() and registers a struct device, a
> > > struct vfio_device_ops, and private data.  Typically we only wire the
> > > device_ops to the resulting file descriptor we get from
> > > VFIO_GROUP_GET_DEVICE_FD, but could we enable sort of a nested ioctl
> > > through the group fd?  The ioctl would need to take a string arg to
> > > match to a device name, plus an ioctl cmd and arg for the device_ops
> > > ioctl.  The group ioctl would need to filter cmds to known, benign
> > > queries.  We'd also need to verify that the allowed ioctls have no
> > > dependencies on setup done in device_ops.open().  
> > 
> > So these ioctls would be called without devices open() call, doesn't
> > this seem to be against file operations standard?
> 
> vfio_device_ops is modeled largely after file operations, but I don't
> think we're bound by that for the interaction between vfio-core and the
> vfio bus drivers.  We could make a separate callback for unprivileged
> ioctls, but that seems like more work per driver when we really want to
> maintain the identical API, we just want to provide a more limited
> interface and change the calling point.
> 
> An issue I thought of for migration though is that this path wouldn't
> have access to the migration region and therefore if we place a header
> within that region containing the compatibility and versioning
> information, the user still couldn't access it.  This doesn't seem to
> be a blocker though as we could put that information within the region
> capability that defines the region as used for migration.  Possibly a
> device could have multiple migration regions with different formats
> for backwards compatibility, of course then we'd need a way to
> determine which to use and which combinations have been validated.
> Thanks,
> 
> Alex
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

  reply	other threads:[~2018-04-25 19:52 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20180409103513.8020-1-kraxel@redhat.com>
2018-04-09 10:35 ` [PATCH 1/3] sample: vfio mdev display - host device Gerd Hoffmann
2018-04-24  2:41   ` Alex Williamson
2018-04-24  6:29     ` Gerd Hoffmann
2018-04-09 10:35 ` [PATCH 2/3] sample: vfio mdev display - guest driver Gerd Hoffmann
2018-04-11 20:39   ` Bjorn Helgaas
2018-04-24  2:51   ` Alex Williamson
2018-04-25 21:03   ` Konrad Rzeszutek Wilk
2018-04-09 10:35 ` [PATCH 3/3] sample: vfio bochs vbe display (host device for bochs-drm) Gerd Hoffmann
2018-04-24  3:05   ` Alex Williamson
2018-04-18 18:31 ` [libvirt] [PATCH 0/3] sample: vfio mdev display devices Alex Williamson
2018-04-19  8:40   ` Gerd Hoffmann
2018-04-19 10:03     ` Zhenyu Wang
2018-04-19 14:20     ` Alex Williamson
2018-04-19 14:54     ` Paolo Bonzini
2018-04-23 21:40   ` Alex Williamson
2018-04-24  7:17     ` Gerd Hoffmann
2018-04-24 17:35       ` Alex Williamson
2018-04-25  9:49         ` Zhang, Tina
2018-04-24 19:50     ` Kirti Wankhede
2018-04-24 22:59       ` Alex Williamson
2018-04-25 15:30         ` Kirti Wankhede
2018-04-25 18:00           ` Alex Williamson
2018-04-25 19:52             ` Dr. David Alan Gilbert [this message]
2018-04-26 18:45               ` Kirti Wankhede
2018-04-26 18:55                 ` Dr. David Alan Gilbert
2018-04-27 17:21                   ` Alex Williamson
2018-05-03 18:58                   ` [libvirt] Expose vfio device display/migration to libvirt and above, was " Alex Williamson
2018-05-04  7:49                     ` Erik Skultety
2018-05-04 16:03                       ` Alex Williamson
2018-05-07  6:25                         ` Gerd Hoffmann
2018-07-20  4:56                           ` Yuan, Hang
2018-08-08  7:43                             ` Gerd Hoffmann
2018-05-10 11:00                         ` Erik Skultety
2018-05-10 15:57                           ` Alex Williamson
2018-05-04  9:16                     ` Daniel P. Berrangé
2018-05-04 17:06                       ` Alex Williamson
2018-05-07  6:15                     ` Gerd Hoffmann
2018-05-04  8:39                 ` [libvirt] " Erik Skultety
2018-04-26  3:44   ` Tian, Kevin
2018-04-26  6:14     ` Gerd Hoffmann
2018-04-26 15:44       ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180425195229.GK2496@work-vm \
    --to=dgilbert@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=cjia@nvidia.com \
    --cc=eskultet@redhat.com \
    --cc=intel-gvt-dev@lists.freedesktop.org \
    --cc=jdenemar@redhat.com \
    --cc=kraxel@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=laine@redhat.com \
    --cc=libvir-list@redhat.com \
    --cc=tina.zhang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).