Re: [Qemu-devel] [RFC 01/12] vfio: Start improving VFIO/EEH interface

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: David Gibson <david@gibson.dropbear.id.au>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: aik@ozlabs.ru, mdroth@linux.vnet.ibm.com, gwshan@au1.ibm.com,
	qemu-devel@nongnu.org, qemu-ppc@nongnu.org
Subject: Re: [Qemu-devel] [RFC 01/12] vfio: Start improving VFIO/EEH interface
Date: Thu, 3 Dec 2015 15:22:19 +1100	[thread overview]
Message-ID: <20151203042219.GI3107@voom.redhat.com> (raw)
In-Reply-To: <1449086974.15753.123.camel@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 6534 bytes --]

On Wed, Dec 02, 2015 at 01:09:34PM -0700, Alex Williamson wrote:
> On Tue, 2015-12-01 at 13:23 +1100, David Gibson wrote:
> > On Mon, Nov 23, 2015 at 02:58:11PM -0700, Alex Williamson wrote:
> > > On Thu, 2015-11-19 at 15:29 +1100, David Gibson wrote:
> > > > At present the code handling IBM's Enhanced Error Handling (EEH) interface
> > > > on VFIO devices operates by bypassing the usual VFIO logic with
> > > > vfio_container_ioctl().  That's a poorly designed interface with unclear
> > > > semantics about exactly what can be operated on.
> > > > 
> > > > In particular it operates on a single vfio container internally (hence the
> > > > name), but takes an address space and group id, from which it deduces the
> > > > container in a rather roundabout way.  groupids are something that code
> > > > outside vfio shouldn't even be aware of.
> > > > 
> > > > This patch creates new interfaces for EEH operations.  Internally we
> > > > have vfio_eeh_container_op() which takes a VFIOContainer object
> > > > directly.  For external use we have vfio_eeh_as_ok() which determines
> > > > if an AddressSpace is usable for EEH (at present this means it has a
> > > > single container and at most a single group attached), and
> > > > vfio_eeh_as_op() which will perform an operation on an AddressSpace in
> > > > the unambiguous case, and otherwise returns an error.
> > > > 
> > > > This interface still isn't great, but it's enough of an improvement to
> > > > allow a number of cleanups in other places.
> > > > 
> > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > ---
> > > >  hw/vfio/common.c       | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  include/hw/vfio/vfio.h |  2 ++
> > > >  2 files changed, 79 insertions(+)
> > > > 
> > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > > index 6797208..4733625 100644
> > > > --- a/hw/vfio/common.c
> > > > +++ b/hw/vfio/common.c
> > > > @@ -1002,3 +1002,80 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> > > >  
> > > >      return vfio_container_do_ioctl(as, groupid, req, param);
> > > >  }
> > > > +
> > > > +/*
> > > > + * Interfaces for IBM EEH (Enhanced Error Handling)
> > > > + */
> > > > +static bool vfio_eeh_container_ok(VFIOContainer *container)
> > > > +{
> > > > +    /* A broken kernel implementation means EEH operations won't work
> > > > +     * correctly if there are multiple groups in a container */
> > > > +
> > > > +    if (!QLIST_EMPTY(&container->group_list)
> > > > +        && QLIST_NEXT(QLIST_FIRST(&container->group_list), container_next)) {
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    return true;
> > > > +}
> > > 
> > > Seems like there are ways to make this a non-eeh specific function,
> > > vfio_container_group_count(), vfio_container_group_empty_or_singleton(),
> > > etc.
> > 
> > I guess, but I don't know of anything else that needs to know, so is
> > there a point?
> 
> Yes, long term maintainability.  Simple functions that are named based
> on what they do are building blocks for other users, even if we don't
> yet know they exist.  Functions tainted with the name and purpose of
> their currently intended callers are cruft and code duplication waiting
> to happen.

Ok, point taken.

> > Actually.. I could do with a another opinion here: so, logically EEH
> > operations should be possible on a container basis - the kernel
> > interface correctly reflects that (my previous comments that the
> > interface was broken were mistaken).
> > 
> > The current kernel implementation *is* broken (and is non-trivial to
> > fix) which is what this test is about.  But is checking for a probably
> > broken kernel state something that we ought to be checking for in
> > qemu?  As it stands when the kernel is fixed we'll need a new
> > capability so that qemu can know to disable this test.
> > 
> > Should we instead just proceed with any container and just advise
> > people not to attach multiple groups until the kernel is fixed?
> > 
> > A relevant point here might be that while I haven't implemented it so
> > far, I think it will be possible to workaround the broken kernel with
> > full functionality by forcing each group into a separate container and
> > using one of a couple of possible different methods to handle EEH
> > functionality across multiple containers on a vPHB.
> 
> This sounds vaguely similar to the discussions we're having around AER
> handling.  We really need to be able to translate a guest bus reset to a
> host bus reset to enable guest participation in AER recovery, but iommu
> grouping doesn't encompass any sort of shared bus property on x86 like
> it does on power.  Therefore the configurations where we can enable AER
> are only a subset of what we can enable otherwise.  However, not
> everyone cares about AER recovery and perhaps the same is true of EEH.
> So you really don't want to prevent useful configurations if the user
> doesn't opt-in for that feature.
> 
> So for AER we're thinking about a new vfio-pci option, aer=on, that
> indicates the device must be in a configuration that supports AER or the
> VM instantiation (or device hotplug) should fail.  Should EEH do
> something similar?

Yes, I think that's a good idea.  I'd been thinking about a PHB option
for enabling EEH, but I think one on the devices themselves makes
things work better.

> Should we share an option to make life easier for
> libvirt so it doesn't need to care about EEH vs AER?

My initial thought is yes, but I'm not really sure if there are
wrinkles that could make that a problem.

> If the kernel
> interface is eventually fixed, maybe that just relaxes some of the
> configuration parameters making EEH support easier to achieve, but still
> optional?  Thanks,

So, yes, and that's good, but that's not really what I was asking
about.

The kernel *interface* is not broken, just the implementation.  Which
means when it's fixed it won't be discoverable unless we also add a
capability advertising the fix.

So the question is: do we workaround in qemu until such a capability
comes along, or just assume that it's (potentially) working and
declare it a kernel problem if it doesn't?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

next prev parent reply	other threads:[~2015-12-03  5:15 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-19  4:29 [Qemu-devel] [RFC 00/12] Merge EEH support into spapr-pci-host-bridge David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 01/12] vfio: Start improving VFIO/EEH interface David Gibson
2015-11-23 21:58   ` Alex Williamson
2015-12-01  2:23     ` David Gibson
2015-12-02 20:09       ` Alex Williamson
2015-12-03  4:22         ` David Gibson [this message]
2015-12-03 21:02           ` Alex Williamson
2015-11-19  4:29 ` [Qemu-devel] [RFC 02/12] spapr_pci: Switch to vfio_eeh_as_op() interface David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 03/12] spapr_pci: Eliminate class callbacks David Gibson
2015-11-24  9:00   ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
2015-11-25  6:22     ` David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 04/12] spapr_pci: Fold spapr_phb_vfio_eeh_configure() into spapr_pci code David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 05/12] spapr_pci: Fold spapr_phb_vfio_eeh_reset() " David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 06/12] spapr_pci: Fold spapr_phb_vfio_eeh_get_state() " David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 07/12] spapr_pci: Fold spapr_phb_vfio_eeh_set_option() " David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 08/12] spapr_pci: Fold spapr_phb_vfio_reset() " David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 09/12] spapr_pci: Allow EEH on spapr-pci-host-bridge David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 10/12] spapr_pci: (Mostly) remove spapr-pci-vfio-host-bridge David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 11/12] spapr_pci: Remove finish_realize hook David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 12/12] vfio: Eliminate vfio_container_ioctl() David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151203042219.GI3107@voom.redhat.com \
    --to=david@gibson.dropbear.id.au \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=gwshan@au1.ibm.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).