From mboxrd@z Thu Jan 1 00:00:00 1970
From: Erik Skultety
Subject: Re: [libvirt] Expose vfio device display/migration to libvirt and above, was Re: [PATCH 0/3] sample: vfio mdev display devices.
Date: Fri, 4 May 2018 09:49:44 +0200
Message-ID: <20180504074944.GD8859@erzo-ntb>
References: <20180418123153.0f4f037d@w520.home>
 <20180423154003.12c5467a@w520.home>
 <20180424165918.5c2ef037@w520.home>
 <0a1d6487-0dfb-2ffc-4774-ebaf65c15892@nvidia.com>
 <20180425120057.0fabb70e@w520.home>
 <20180425195229.GK2496@work-vm>
 <20180426185522.GQ2631@work-vm>
 <20180503125800.76cc7582@w520.home>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Cc: Neo Jia, kvm@vger.kernel.org, libvirt, "Dr. David Alan Gilbert",
 Tina Zhang, Kirti Wankhede, Gerd Hoffmann, Laine Stump, Jiri Denemark,
 intel-gvt-dev@lists.freedesktop.org
To: Alex Williamson
Content-Disposition: inline
In-Reply-To: <20180503125800.76cc7582@w520.home>
Sender: libvir-list-bounces@redhat.com
Errors-To: libvir-list-bounces@redhat.com
List-Id: kvm.vger.kernel.org

On Thu, May 03, 2018 at 12:58:00PM -0600, Alex Williamson wrote:
> Hi,
>
> The previous discussion hasn't produced results, so let's start over.
> Here's the situation:
>
> - We currently have kernel and QEMU support for the QEMU vfio-pci
>   display option.
>
> - The default for this option is 'auto', so the device will attempt to
>   generate a display if the underlying device supports it, currently
>   only GVTg and some future release of NVIDIA vGPU (plus Gerd's
>   sample mdpy and mbochs).
>
> - The display option is implemented via two different mechanisms, a
>   vfio region (NVIDIA, mdpy) or a dma-buf (GVTg, mbochs).
>
> - Displays using dma-buf require OpenGL support, displays making
>   use of region support do not.
>
> - Enabling OpenGL support requires specific VM configurations, which
>   libvirt /may/ want to facilitate.
>
> - Probing display support for a given device is complicated by the
>   fact that GVTg and NVIDIA both impose requirements on the process
>   opening the device file descriptor through the vfio API:
>
>   - GVTg requires a KVM association or will fail to allow the device
>     to be opened.

How exactly is this association checked?

>   - NVIDIA requires that their vgpu-manager process can locate a UUID
>     for the VM via the process commandline.
>
>   - These are both horrible impositions and prevent libvirt from
>     simply probing the device itself.

So I feel like we're trying to solve a problem coming from one layer on a
bunch of different layers, which inherently prevents us from producing a
viable long-term solution without dragging in a significant amount of hacky,
nasty code, and it's not just the missing sysfs attributes I have in mind.
Why does NVIDIA's vgpu-manager need to locate the UUID of a QEMU VM? I
assume that's to prevent multiple VM instances from trying to use the same
mdev device, in which case couldn't the vgpu-manager instead track how many
"open" and "close" calls have been made on the same device? This is just
from a layman's perspective, but it would allow the following:

- when libvirt starts, it initializes all its drivers (let's focus on QEMU)
- as part of this initialization, libvirt probes QEMU for capabilities and
  caches them in order to use them when spawning VMs

Now, if we (theoretically) can settle on easing the restrictions Alex has
mentioned, we could in fact introduce a QMP command to probe these devices
and provide libvirt with useful information at that point in time. Of
course, since the 3rd-party vendor driver is "de-coupled" from QEMU,
libvirt would have no way of finding out that the driver has changed in the
meantime, and would thus still be using the old information it had
gathered, potentially causing the QEMU process to fail eventually.
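To make the idea a bit more concrete, here's a rough sketch of what the
libvirt-side probing could look like. To be clear, the 'query-vfio-display'
command and the shape of its reply are entirely made up for illustration;
only the QMP greeting/capabilities handshake shown is the standard one:

```python
import json
import socket


def build_qmp_message(command, arguments=None):
    """Serialize a QMP command; 'arguments' is optional."""
    msg = {"execute": command}
    if arguments is not None:
        msg["arguments"] = arguments
    return json.dumps(msg)


def qmp_probe_display(sock_path, sysfsdev):
    """Connect to a probing QEMU instance over its QMP UNIX socket,
    perform the standard capabilities handshake, and issue the
    hypothetical 'query-vfio-display' command for the given mdev
    sysfs path, returning the parsed reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        f = s.makefile("rw")
        json.loads(f.readline())  # server greeting banner
        f.write(build_qmp_message("qmp_capabilities") + "\n")
        f.flush()
        json.loads(f.readline())  # handshake ack
        f.write(build_qmp_message("query-vfio-display",
                                  {"sysfsdev": sysfsdev}) + "\n")
        f.flush()
        return json.loads(f.readline())
```

libvirt would presumably cache the reply alongside the other QEMU
capabilities it already gathers, so the staleness problem described above
(the vendor driver changing underneath the cache) applies to this sketch
as-is.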
But then again, there's very often a strong recommendation to reboot your
host after a driver update, especially in NVIDIA's case, which would make
this a non-issue. However, there's also a significant drawback to my
proposal which probably renders it completely useless (but we can continue
from there...), and that is that the devices would either have to be
present already (not an option), or QEMU would need to be enhanced so that
during QMP probing it would create a dummy device, open it, collect the
information libvirt needs, then close and remove it. If the driver doesn't
change in the meantime, this should be sufficient for a VM to be
successfully instantiated with a display, right?

>
> The above has pressed the need for investigating some sort of
> alternative API through which libvirt might introspect a vfio device
> and with vfio device migration on the horizon, it's natural that some
> sort of support for migration state compatibility for the device need be
> considered as a second user of such an API. However, we currently have
> no concept of migration compatibility on a per-device level as there
> are no migratable devices that live outside of the QEMU code base.
> It's therefore assumed that per device migration compatibility is
> encompassed by the versioned machine type for the overall VM. We need
> participation all the way to the top of the VM management stack to
> resolve this issue and it's dragging down the (possibly) more simple
> question of how do we resolve the display situation. Therefore I'm
> looking for alternatives for display that work within what we have
> available to us at the moment.
>
> Erik Skultety, who initially raised the display question, has identified
> one possible solution, which is to simply make the display configuration
> the user's problem (apologies if I've misinterpreted Erik).
> I believe
> this would work something like:
>
> - libvirt identifies a version of QEMU that includes 'display' support
>   for vfio-pci devices and defaults to adding display=off for every
>   vfio-pci device [have we chosen the wrong default (auto) in QEMU?].

From libvirt's POV, we should introduce a new XML attribute 'display' on
the mdev host device type with a default value of 'off', potentially
extending this to 'auto' once we have enough information to base our
decision on. We'll need to combine this with a new attribute value for the