qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
	qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
	Gavin Shan <gwshan@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices
Date: Mon, 14 Sep 2015 14:04:07 +1000	[thread overview]
Message-ID: <20150914040407.GG2547@voom.fritz.box> (raw)
In-Reply-To: <1442001818.20355.680.camel@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 9309 bytes --]

On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote:
> On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote:
> > On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote:
> > > So far there were 2 limitations enforced on an emulated PHB
> > > regarding VFIO:
> > > 1) only one IOMMU group per IOMMU container was allowed and
> > > the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this;
> > > 2) only one IOMMU container per PHB was allowed as a group
> > > can only be attached to one container.
> > > 
> > > However these are not really necessary so we are removing them here.
> > > 
> > > This deprecates IOMMU group ID and changes vfio_container_do_ioctl()
> > > not to receive it. Instead of getting a container from a group ID,
> > > this calls ioctl() for every container attached to the PHB address space.
> > > This allows multiple containers on the same PHB which means multiple
> > > groups per PHB. Note that this will lead to every H_PUT_TCE&etc call
> > > to be passed to every container separately which will affect
> > > the performance. For 32bit devices it is still recommended to put
> > > every group to a separate PHB.
> > > 
> > > Since the existing VFIO code is already trying to share a container for
> > > multiple groups, just removing a group id from
> > > the vfio_container_do_ioctl() allows the existing code to share
> > > a container if it is supported by the host kernel.
> > > 
> > > This replaces the check for a group id to be set correctly with
> > > the check that it is not set.
> > > 
> > > This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState
> > > members is accessed here.
> > > 
> > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > > ---
> > >  hw/ppc/spapr_pci.c      | 10 +++++-----
> > >  hw/ppc/spapr_pci_vfio.c | 17 ++++++-----------
> > >  hw/vfio/common.c        | 22 ++++++----------------
> > >  include/hw/vfio/vfio.h  |  3 +--
> > >  4 files changed, 18 insertions(+), 34 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > > index 4e289cb..1b7559d 100644
> > > --- a/hw/ppc/spapr_pci.c
> > > +++ b/hw/ppc/spapr_pci.c
> > > @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
> > >      pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb);
> > >  
> > >      if (sphb->vfio_num) {
> > > -        if (sphb->iommugroupid == -1) {
> > > -            error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
> > > -            return -1;
> > > -        }
> > > -
> > >          ret = spapr_phb_vfio_dma_capabilities_update(sphb);
> > >          if (ret) {
> > >              error_report("Unable to get DMA32 info from VFIO");
> > > @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
> > >      PCIBus *bus;
> > >      uint64_t msi_window_size = 4096;
> > >  
> > > +    if ((sphb->iommugroupid != -1) &&
> > > +        object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) {
> > > +        error_report("Warning: iommugroupid is deprecated and will be ignored");
> > > +    }
> > > +
> > >      if (sphb->index != (uint32_t)-1) {
> > >          hwaddr windows_base;
> > >  
> > > diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> > > index f94d8a4..48137d5 100644
> > > --- a/hw/ppc/spapr_pci_vfio.c
> > > +++ b/hw/ppc/spapr_pci_vfio.c
> > > @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
> > >      struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
> > >      int ret;
> > >  
> > > -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > +    ret = vfio_container_ioctl(&sphb->iommu_as,
> > >                                 VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> > >      if (ret) {
> > >          return ret;
> > > @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
> > >          .op    = VFIO_EEH_PE_ENABLE
> > >      };
> > >  
> > > -    vfio_container_ioctl(&sphb->iommu_as,
> > > -                         sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
> > > +    vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > >  }
> > >  
> > >  int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> > > @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> > >          return RTAS_OUT_PARAM_ERROR;
> > >      }
> > >  
> > > -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > -                               VFIO_EEH_PE_OP, &op);
> > > +    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > >      if (ret < 0) {
> > >          return RTAS_OUT_HW_ERROR;
> > >      }
> > > @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
> > >      int ret;
> > >  
> > >      op.op = VFIO_EEH_PE_GET_STATE;
> > > -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > -                               VFIO_EEH_PE_OP, &op);
> > > +    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > >      if (ret < 0) {
> > >          return RTAS_OUT_PARAM_ERROR;
> > >      }
> > > @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
> > >          return RTAS_OUT_PARAM_ERROR;
> > >      }
> > >  
> > > -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > -                               VFIO_EEH_PE_OP, &op);
> > > +    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > >      if (ret < 0) {
> > >          return RTAS_OUT_HW_ERROR;
> > >      }
> > > @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
> > >      int ret;
> > >  
> > >      op.op = VFIO_EEH_PE_CONFIGURE;
> > > -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > -                               VFIO_EEH_PE_OP, &op);
> > > +    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > >      if (ret < 0) {
> > >          return RTAS_OUT_PARAM_ERROR;
> > >      }
> > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > index 85ee9b0..a00644e 100644
> > > --- a/hw/vfio/common.c
> > > +++ b/hw/vfio/common.c
> > > @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev)
> > >      close(vbasedev->fd);
> > >  }
> > >  
> > > -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
> > > -                                   int req, void *param)
> > > +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param)
> > >  {
> > > -    VFIOGroup *group;
> > >      VFIOContainer *container;
> > >      int ret = -1;
> > > +    VFIOAddressSpace *space = vfio_get_address_space(as);
> > >  
> > > -    group = vfio_get_group(groupid, as);
> > > -    if (!group) {
> > > -        error_report("vfio: group %d not registered", groupid);
> > > -        return ret;
> > > -    }
> > > -
> > > -    container = group->container;
> > > -    if (group->container) {
> > > +    QLIST_FOREACH(container, &space->containers, next) {
> > >          ret = ioctl(container->fd, req, param);
> > >          if (ret < 0) {
> > >              error_report("vfio: failed to ioctl %d to container: ret=%d, %s",
> > >                           _IOC_NR(req) - VFIO_BASE, ret, strerror(errno));
> > > +            return -errno;
> > >          }
> > >      }
> > 
> > This highlights how terrible this ioctl passthrough interface is; what's
> > a caller supposed to do on error?  Here we don't have visibility into
> > the ioctl being called in order to do any unwind on error.  The caller
> > doesn't have the context to unwind only the failed containers.  Is
> > returning errno here really sufficient for anyone to figure out how to
> > proceed or should we just cut our loses and abort()?  What's the least
> > horrible way we can realistically handle a failure here?  Thanks,
> 
> It seemed like I asked this before and it turns out that I did:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html
> 
> "Gavin says yes" is not really a supportable solution when I stumbled on
> it again and David doesn't know why it's safe either (nor does it answer
> my question of how does this work).  At a minimum, the reasoning why
> this is safe for the ioctls we currently allow here needs to be spelled
> out in a comment.  We could easily make the mistake of adding another
> ioctl to the passthrough for which those assumptions are not valid.  We
> may even want to abort if it's not one of the ioctls we've evaluated so
> we don't overlook this problem later.  Thanks,

Yeah, you're right.  This is a mess.

What we need to do here is to make vfio_container_ioctl() take a
VFIOContainer object as the name suggests.  The the callers will need
to either use it on a specific container, or iterate across all the
containers in the VFIOAddressSpace as appropriate for the specific
operation.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

  reply	other threads:[~2015-09-14  4:03 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-03  4:40 [Qemu-devel] [PATCH qemu v2 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb Alexey Kardashevskiy
2015-09-03  4:40 ` [Qemu-devel] [PATCH qemu v2 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy
2015-09-03  4:40 ` [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices Alexey Kardashevskiy
2015-09-10  2:43   ` Alex Williamson
2015-09-11 20:03     ` Alex Williamson
2015-09-14  4:04       ` David Gibson [this message]
2015-09-14  8:56         ` Alexey Kardashevskiy
2015-09-14 11:42           ` David Gibson
2015-09-14 14:37             ` Alexey Kardashevskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150914040407.GG2547@voom.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=gwshan@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).