From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45629) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dfJEj-00048J-Ig for qemu-devel@nongnu.org; Wed, 09 Aug 2017 01:08:55 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dfJEf-0002EC-A0 for qemu-devel@nongnu.org; Wed, 09 Aug 2017 01:08:53 -0400 Received: from ozlabs.org ([103.22.144.67]:46789) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dfJEe-0002DC-UK for qemu-devel@nongnu.org; Wed, 09 Aug 2017 01:08:49 -0400 Date: Wed, 9 Aug 2017 15:08:42 +1000 From: David Gibson Message-ID: <20170809050842.GI13670@umbus.fritz.box> References: <1501119055-4060-1-git-send-email-mdroth@linux.vnet.ibm.com> <20170727105348.GG7970@umbus.fritz.box> <20170727115042.GL2555@redhat.com> <20170808134008.0648ab2e@w520.home> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="dCSxeJc5W8HZXZrD" Content-Disposition: inline In-Reply-To: <20170808134008.0648ab2e@w520.home> Subject: Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alex Williamson Cc: "Daniel P. Berrange" , Peter Maydell , Michael Roth , QEMU Developers , Paolo Bonzini , Greg Kurz , Markus Armbruster --dCSxeJc5W8HZXZrD Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Aug 08, 2017 at 01:40:08PM -0600, Alex Williamson wrote: > On Thu, 27 Jul 2017 12:50:42 +0100 > "Daniel P. Berrange" wrote: >=20 > > On Thu, Jul 27, 2017 at 08:53:48PM +1000, David Gibson wrote: > > > On Thu, Jul 27, 2017 at 10:11:48AM +0100, Peter Maydell wrote: =20 > > > > On 27 July 2017 at 02:30, Michael Roth = wrote: =20 > > > > > In particular, Mellanox CX4 adapters on PowerNV hosts might not b= e fully > > > > > quiesced by vfio-pci's finalize() routine until up to 6s after the > > > > > DEVICE_DELETED was emitted, leading to detach-device on the libvi= rt side pretty > > > > > much always crashing the host. =20 > > > >=20 > > > > My initial naive thought is that if the host kernel can crash then > > > > this is a host kernel bug... shouldn't the host kernel refuse > > > > the subsequent libvirt rebind if it would cause a crash ? =20 > > >=20 > > > I think so too, but I haven't been able to convince Alex. Nor > > > find time to fix it in the kernel myself. =20 > >=20 > > I think we need to fix both the QEMU premature sending of DEVICE_DELETED > > and the kernel bug that allowed the crash. >=20 >=20 > Where do we stand on this for v2.10? I'd like to see it get in. There > may be things to fix in the kernel, some of them may already be fixed > in the latest development kernel, but ultimately the kernel considers > driver binding to be a trusted operation and if userspace doesn't > understand all the dependencies, they shouldn't be doing it. In this > case libvirt is using the DEVICE_DELETED signal with the assumption > that the device has been fully released by QEMU, which is of course not > accurate (libvirt could test this, but chooses not to). libvirt > therefore begins trying to unbind a device that is still in use, we try > to handle it, but see official kernel stance that userspace is > responsible for understanding device dependencies, so we can only do so > much. >=20 > IMO, the next step along those lines would be that libvirt needs to > understand that even once a device is fully released from QEMU, it's > not necessarily safe to re-bind the device to a host driver. If the > device is a member of a group where other devices are still in use by > userspace, this will violate user/host device isolation and the kernel > will crash to protect itself. At best I may be able to improve this to > killing the userspace process making use of the conflicting device, but > the kernel view is that userspace (libvirt) has mandated to bind the > device to the host driver and we must make it so, the user is > responsible for the consequences. Thanks, Merging it for 2.10 seems like a good idea to me to, but it's not really my area of expertise, and therefore not my call. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --dCSxeJc5W8HZXZrD Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlmKmNoACgkQbDjKyiDZ s5Ls+w//YmCyDX3Y9EfH9pfkczr1bNpvQv3v/BSkL+jaGXO09SYRA8Cuz/Im0T0T 1Lwe/8GviIqdcvg1NZUTui+o7mJaILgLRnYisnJUpxD4R8K35i93vuF70PVqZ13K YoVLPWUDGRe7aYnfluzGHnITy/vHEdHwCb9ECBmENaPbBPEkjjdxoswo4iYmTcg6 +/iFz+/U2pQ4kj016wiWXJ0K2ijPRE/X+nZHVdOe/Ao7D+upM0jB7AQACPu6/xji fMHyD6jRsdYMqmt0u5O2+u6V8Y5jJoS3t6fN5xAYbaKk6sCEpcusAuZscEeDEhxt MKWUDyxI8Axu33rkC82laDEmd09aeRXBOaWmCFXQs9hzSDKD8VQk2F+gHE3aqns0 MWqLb38iVnVn2HVHCU9TzYn03v7NO2LX4l9t34tDDxk6/N+l/XvAQNS1KB9BOg5w g9+/TZOAWJeHywh13yk4heS0outAF7yMDA72i73/o59urPsfd4Sv6hudGtvGkm50 6Mopce6ZSTCm7mn77AU5NcajuYNBDvcr7YXIA5ckEq6saOASnqvfhdAy3i1h5N6y CCHYsAA31Pr7wyAI6CLlPVpOR/PDm/qnvHbgv4Jr7s5lT9B29GiCDgMhtHI+KDXh AuUYs8JAg2lm+wzFDi9HL0kM/zFsswBok+6CxrwAVpM0hjkI3/U= =mZbV -----END PGP SIGNATURE----- --dCSxeJc5W8HZXZrD--