From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:40203) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a4MER-0001uK-FH for qemu-devel@nongnu.org; Thu, 03 Dec 2015 00:15:05 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a4MEP-0007Nm-SQ for qemu-devel@nongnu.org; Thu, 03 Dec 2015 00:15:03 -0500 Date: Thu, 3 Dec 2015 15:22:19 +1100 From: David Gibson Message-ID: <20151203042219.GI3107@voom.redhat.com> References: <1447907368-9208-1-git-send-email-david@gibson.dropbear.id.au> <1447907368-9208-2-git-send-email-david@gibson.dropbear.id.au> <1448315891.20382.261.camel@redhat.com> <20151201022346.GJ31343@voom.redhat.com> <1449086974.15753.123.camel@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="yzvKDKJiLNESc64M" Content-Disposition: inline In-Reply-To: <1449086974.15753.123.camel@redhat.com> Subject: Re: [Qemu-devel] [RFC 01/12] vfio: Start improving VFIO/EEH interface List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alex Williamson Cc: aik@ozlabs.ru, mdroth@linux.vnet.ibm.com, gwshan@au1.ibm.com, qemu-devel@nongnu.org, qemu-ppc@nongnu.org --yzvKDKJiLNESc64M Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Dec 02, 2015 at 01:09:34PM -0700, Alex Williamson wrote: > On Tue, 2015-12-01 at 13:23 +1100, David Gibson wrote: > > On Mon, Nov 23, 2015 at 02:58:11PM -0700, Alex Williamson wrote: > > > On Thu, 2015-11-19 at 15:29 +1100, David Gibson wrote: > > > > At present the code handling IBM's Enhanced Error Handling (EEH) in= terface > > > > on VFIO devices operates by bypassing the usual VFIO logic with > > > > vfio_container_ioctl(). That's a poorly designed interface with un= clear > > > > semantics about exactly what can be operated on. > > > >=20 > > > > In particular it operates on a single vfio container internally (he= nce the > > > > name), but takes an address space and group id, from which it deduc= es the > > > > container in a rather roundabout way. groupids are something that = code > > > > outside vfio shouldn't even be aware of. > > > >=20 > > > > This patch creates new interfaces for EEH operations. Internally we > > > > have vfio_eeh_container_op() which takes a VFIOContainer object > > > > directly. For external use we have vfio_eeh_as_ok() which determin= es > > > > if an AddressSpace is usable for EEH (at present this means it has a > > > > single container and at most a single group attached), and > > > > vfio_eeh_as_op() which will perform an operation on an AddressSpace= in > > > > the unambiguous case, and otherwise returns an error. > > > >=20 > > > > This interface still isn't great, but it's enough of an improvement= to > > > > allow a number of cleanups in other places. > > > >=20 > > > > Signed-off-by: David Gibson > > > > --- > > > > hw/vfio/common.c | 77 ++++++++++++++++++++++++++++++++++++++= ++++++++++++ > > > > include/hw/vfio/vfio.h | 2 ++ > > > > 2 files changed, 79 insertions(+) > > > >=20 > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > > > index 6797208..4733625 100644 > > > > --- a/hw/vfio/common.c > > > > +++ b/hw/vfio/common.c > > > > @@ -1002,3 +1002,80 @@ int vfio_container_ioctl(AddressSpace *as, i= nt32_t groupid, > > > > =20 > > > > return vfio_container_do_ioctl(as, groupid, req, param); > > > > } > > > > + > > > > +/* > > > > + * Interfaces for IBM EEH (Enhanced Error Handling) > > > > + */ > > > > +static bool vfio_eeh_container_ok(VFIOContainer *container) > > > > +{ > > > > + /* A broken kernel implementation means EEH operations won't w= ork > > > > + * correctly if there are multiple groups in a container */ > > > > + > > > > + if (!QLIST_EMPTY(&container->group_list) > > > > + && QLIST_NEXT(QLIST_FIRST(&container->group_list), contain= er_next)) { > > > > + return false; > > > > + } > > > > + > > > > + return true; > > > > +} > > >=20 > > > Seems like there are ways to make this a non-eeh specific function, > > > vfio_container_group_count(), vfio_container_group_empty_or_singleton= (), > > > etc. > >=20 > > I guess, but I don't know of anything else that needs to know, so is > > there a point? >=20 > Yes, long term maintainability. Simple functions that are named based > on what they do are building blocks for other users, even if we don't > yet know they exist. Functions tainted with the name and purpose of > their currently intended callers are cruft and code duplication waiting > to happen. Ok, point taken. > > Actually.. I could do with a another opinion here: so, logically EEH > > operations should be possible on a container basis - the kernel > > interface correctly reflects that (my previous comments that the > > interface was broken were mistaken). > >=20 > > The current kernel implementation *is* broken (and is non-trivial to > > fix) which is what this test is about. But is checking for a probably > > broken kernel state something that we ought to be checking for in > > qemu? As it stands when the kernel is fixed we'll need a new > > capability so that qemu can know to disable this test. > >=20 > > Should we instead just proceed with any container and just advise > > people not to attach multiple groups until the kernel is fixed? > >=20 > > A relevant point here might be that while I haven't implemented it so > > far, I think it will be possible to workaround the broken kernel with > > full functionality by forcing each group into a separate container and > > using one of a couple of possible different methods to handle EEH > > functionality across multiple containers on a vPHB. >=20 > This sounds vaguely similar to the discussions we're having around AER > handling. We really need to be able to translate a guest bus reset to a > host bus reset to enable guest participation in AER recovery, but iommu > grouping doesn't encompass any sort of shared bus property on x86 like > it does on power. Therefore the configurations where we can enable AER > are only a subset of what we can enable otherwise. However, not > everyone cares about AER recovery and perhaps the same is true of EEH. > So you really don't want to prevent useful configurations if the user > doesn't opt-in for that feature. >=20 > So for AER we're thinking about a new vfio-pci option, aer=3Don, that > indicates the device must be in a configuration that supports AER or the > VM instantiation (or device hotplug) should fail. Should EEH do > something similar? Yes, I think that's a good idea. I'd been thinking about a PHB option for enabling EEH, but I think one on the devices themselves makes things work better. > Should we share an option to make life easier for > libvirt so it doesn't need to care about EEH vs AER? My initial thought is yes, but I'm not really sure if there are wrinkles that could make that a problem. > If the kernel > interface is eventually fixed, maybe that just relaxes some of the > configuration parameters making EEH support easier to achieve, but still > optional? Thanks, So, yes, and that's good, but that's not really what I was asking about. The kernel *interface* is not broken, just the implementation. Which means when it's fixed it won't be discoverable unless we also add a capability advertising the fix. So the question is: do we workaround in qemu until such a capability comes along, or just assume that it's (potentially) working and declare it a kernel problem if it doesn't? --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --yzvKDKJiLNESc64M Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJWX8N7AAoJEGw4ysog2bOSAUEP/j8t3evH4ERWiT0cbcms5HEi MKAbtvZgVQ3C1X54aniW/REEQki34dvk+jflXKgxU9R63sa2AdIoyf5FQdVomQaK ZbUQhY3gsFPR7D+aAQjq2n1OakU16gQwzs15rSr+sAaoUzsf26BvzoLrWxlTi8OW tizIO71HRAHy6j0yU27oswwULjSXFdS9RP8tFCVJDqfAfxZM0kbvt1hTidYxeDVT bKox4HD3Zw8BU1JHc1QCFhaiRlE1Xj/cDOR3vKIEh5GQ3MbZrzRp2BPZ4UPkA8+Z nM+8dPnDXOwKMwAemKC1+ptM+9qjtPaH7qiDm6c0gOhGt+GHbPQCBCQ3nOec5JsY SYUZ6Ymn0ijxFLpY5Cdb0IPnzJ5/mbTpqA5AOyODrLHrI2pS5Yx5xSb3hB3sUQ8h VXZf9Xvcxt3u1XNnjqsKWn870irjiaHv37GacYY+Nc+DCGXxjZVDyAAn2Gv0c1dG Fkn+55kL9t9jntoejxw2BgibpxvYf/3GguNFcddSZJODNzKtI+OTbXKWLUGKjeaM EyHpHinK3MT2KXz/O68yvH+IERQVZMRxeQIKjGHypL7afnMdsU3+xRDF5FtFU8o4 8rz4oY++mvKg1Yi9D3LvmnJiIyEYsY9mgj+DNb2vNjfAwcqb3mHPUQj5JbftXl4B 0ZuYzBiHwm17/PnKOA29 =v2nU -----END PGP SIGNATURE----- --yzvKDKJiLNESc64M--