qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>,
	Peter Maydell <peter.maydell@linaro.org>,
	Michael Roth <mdroth@linux.vnet.ibm.com>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>, Greg Kurz <groug@kaod.org>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
Date: Tue, 8 Aug 2017 13:40:08 -0600	[thread overview]
Message-ID: <20170808134008.0648ab2e@w520.home> (raw)
In-Reply-To: <20170727115042.GL2555@redhat.com>

On Thu, 27 Jul 2017 12:50:42 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Thu, Jul 27, 2017 at 08:53:48PM +1000, David Gibson wrote:
> > On Thu, Jul 27, 2017 at 10:11:48AM +0100, Peter Maydell wrote:  
> > > On 27 July 2017 at 02:30, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:  
> > > > In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> > > > quiesced by vfio-pci's finalize() routine until up to 6s after the
> > > > DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
> > > > much always crashing the host.  
> > > 
> > > My initial naive thought is that if the host kernel can crash then
> > > this is a host kernel bug... shouldn't the host kernel refuse
> > > the subsequent libvirt rebind if it would cause a crash ?  
> > 
> > I think so too, but I haven't been able to convince Alex.  Nor
> > find time to fix it in the kernel myself.  
> 
> I think we need to fix both the QEMU premature sending of DEVICE_DELETED
> and the kernel bug that allowed the crash.


Where do we stand on this for v2.10?  I'd like to see it get in.  There
may be things to fix in the kernel, some of them may already be fixed
in the latest development kernel, but ultimately the kernel considers
driver binding to be a trusted operation and if userspace doesn't
understand all the dependencies, they shouldn't be doing it.  In this
case libvirt is using the DEVICE_DELETED signal with the assumption
that the device has been fully released by QEMU, which is of course not
accurate (libvirt could test this, but chooses not to).  libvirt
therefore begins trying to unbind a device that is still in use, we try
to handle it, but see official kernel stance that userspace is
responsible for understanding device dependencies, so we can only do so
much.

IMO, the next step along those lines would be that libvirt needs to
understand that even once a device is fully released from QEMU, it's
not necessarily safe to re-bind the device to a host driver.  If the
device is a member of a group where other devices are still in use by
userspace, this will violate user/host device isolation and the kernel
will crash to protect itself.  At best I may be able to improve this to
killing the userspace process making use of the conflicting device, but
the kernel view is that userspace (libvirt) has mandated to bind the
device to the host driver and we must make it so, the user is
responsible for the consequences.  Thanks,

Alex

  reply	other threads:[~2017-08-08 19:40 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-27  1:30 [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Michael Roth
2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 1/3] qdev: store DeviceState's canonical path to use when unparenting Michael Roth
2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 2/3] Revert "qdev: Free QemuOpts when the QOM path goes away" Michael Roth
2017-07-31 15:51   ` Greg Kurz
2017-07-31 16:39     ` Michael Roth
2017-07-31 17:10       ` Greg Kurz
2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 3/3] qdev: defer DEVICE_DEL event until instance_finalize() Michael Roth
2017-07-31 17:11   ` Greg Kurz
2017-08-09 14:04   ` Auger Eric
2017-10-07  0:03     ` Michael Roth
2017-07-27  9:11 ` [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Peter Maydell
2017-07-27 10:53   ` David Gibson
2017-07-27 11:50     ` Daniel P. Berrange
2017-08-08 19:40       ` Alex Williamson [this message]
2017-08-09  5:08         ` David Gibson
2017-09-05 19:35           ` Greg Kurz
2017-07-27 11:54     ` Michael Roth
2017-07-27 14:47     ` Alex Williamson
2017-07-28  3:14       ` David Gibson
2017-08-09 14:53 ` Auger Eric
2017-10-03 22:21 ` Michael Roth
2017-10-04  6:01   ` David Gibson
2017-10-06 10:23   ` David Gibson
2017-10-06 12:31     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170808134008.0648ab2e@w520.home \
    --to=alex.williamson@redhat.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=groug@kaod.org \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).