qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: David Gibson <david@gibson.dropbear.id.au>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
	Gavin Shan <gwshan@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices
Date: Mon, 14 Sep 2015 18:56:35 +1000	[thread overview]
Message-ID: <55F68BC3.3030108@ozlabs.ru> (raw)
In-Reply-To: <20150914040407.GG2547@voom.fritz.box>

On 09/14/2015 02:04 PM, David Gibson wrote:
> On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote:
>> On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote:
>>> On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote:
>>>> So far there were 2 limitations enforced on an emulated PHB
>>>> regarding VFIO:
>>>> 1) only one IOMMU group per IOMMU container was allowed and
>>>> the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this;
>>>> 2) only one IOMMU container per PHB was allowed as a group
>>>> can only be attached to one container.
>>>>
>>>> However these are not really necessary so we are removing them here.
>>>>
>>>> This deprecates IOMMU group ID and changes vfio_container_do_ioctl()
>>>> not to receive it. Instead of getting a container from a group ID,
>>>> this calls ioctl() for every container attached to the PHB address space.
>>>> This allows multiple containers on the same PHB which means multiple
>>>> groups per PHB. Note that this will lead to every H_PUT_TCE&etc call
>>>> to be passed to every container separately which will affect
>>>> the performance. For 32bit devices it is still recommended to put
>>>> every group to a separate PHB.
>>>>
>>>> Since the existing VFIO code is already trying to share a container for
>>>> multiple groups, just removing a group id from
>>>> the vfio_container_do_ioctl() allows the existing code to share
>>>> a container if it is supported by the host kernel.
>>>>
>>>> This replaces the check for a group id to be set correctly with
>>>> the check that it is not set.
>>>>
>>>> This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState
>>>> members is accessed here.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>>   hw/ppc/spapr_pci.c      | 10 +++++-----
>>>>   hw/ppc/spapr_pci_vfio.c | 17 ++++++-----------
>>>>   hw/vfio/common.c        | 22 ++++++----------------
>>>>   include/hw/vfio/vfio.h  |  3 +--
>>>>   4 files changed, 18 insertions(+), 34 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>> index 4e289cb..1b7559d 100644
>>>> --- a/hw/ppc/spapr_pci.c
>>>> +++ b/hw/ppc/spapr_pci.c
>>>> @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
>>>>       pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb);
>>>>
>>>>       if (sphb->vfio_num) {
>>>> -        if (sphb->iommugroupid == -1) {
>>>> -            error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
>>>> -            return -1;
>>>> -        }
>>>> -
>>>>           ret = spapr_phb_vfio_dma_capabilities_update(sphb);
>>>>           if (ret) {
>>>>               error_report("Unable to get DMA32 info from VFIO");
>>>> @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>>>>       PCIBus *bus;
>>>>       uint64_t msi_window_size = 4096;
>>>>
>>>> +    if ((sphb->iommugroupid != -1) &&
>>>> +        object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) {
>>>> +        error_report("Warning: iommugroupid is deprecated and will be ignored");
>>>> +    }
>>>> +
>>>>       if (sphb->index != (uint32_t)-1) {
>>>>           hwaddr windows_base;
>>>>
>>>> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
>>>> index f94d8a4..48137d5 100644
>>>> --- a/hw/ppc/spapr_pci_vfio.c
>>>> +++ b/hw/ppc/spapr_pci_vfio.c
>>>> @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
>>>>       struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
>>>>       int ret;
>>>>
>>>> -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> +    ret = vfio_container_ioctl(&sphb->iommu_as,
>>>>                                  VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
>>>>       if (ret) {
>>>>           return ret;
>>>> @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
>>>>           .op    = VFIO_EEH_PE_ENABLE
>>>>       };
>>>>
>>>> -    vfio_container_ioctl(&sphb->iommu_as,
>>>> -                         sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
>>>> +    vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>>   }
>>>>
>>>>   int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
>>>> @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
>>>>           return RTAS_OUT_PARAM_ERROR;
>>>>       }
>>>>
>>>> -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> -                               VFIO_EEH_PE_OP, &op);
>>>> +    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>>       if (ret < 0) {
>>>>           return RTAS_OUT_HW_ERROR;
>>>>       }
>>>> @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
>>>>       int ret;
>>>>
>>>>       op.op = VFIO_EEH_PE_GET_STATE;
>>>> -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> -                               VFIO_EEH_PE_OP, &op);
>>>> +    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>>       if (ret < 0) {
>>>>           return RTAS_OUT_PARAM_ERROR;
>>>>       }
>>>> @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>>>>           return RTAS_OUT_PARAM_ERROR;
>>>>       }
>>>>
>>>> -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> -                               VFIO_EEH_PE_OP, &op);
>>>> +    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>>       if (ret < 0) {
>>>>           return RTAS_OUT_HW_ERROR;
>>>>       }
>>>> @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
>>>>       int ret;
>>>>
>>>>       op.op = VFIO_EEH_PE_CONFIGURE;
>>>> -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> -                               VFIO_EEH_PE_OP, &op);
>>>> +    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>>       if (ret < 0) {
>>>>           return RTAS_OUT_PARAM_ERROR;
>>>>       }
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index 85ee9b0..a00644e 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev)
>>>>       close(vbasedev->fd);
>>>>   }
>>>>
>>>> -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
>>>> -                                   int req, void *param)
>>>> +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param)
>>>>   {
>>>> -    VFIOGroup *group;
>>>>       VFIOContainer *container;
>>>>       int ret = -1;
>>>> +    VFIOAddressSpace *space = vfio_get_address_space(as);
>>>>
>>>> -    group = vfio_get_group(groupid, as);
>>>> -    if (!group) {
>>>> -        error_report("vfio: group %d not registered", groupid);
>>>> -        return ret;
>>>> -    }
>>>> -
>>>> -    container = group->container;
>>>> -    if (group->container) {
>>>> +    QLIST_FOREACH(container, &space->containers, next) {
>>>>           ret = ioctl(container->fd, req, param);
>>>>           if (ret < 0) {
>>>>               error_report("vfio: failed to ioctl %d to container: ret=%d, %s",
>>>>                            _IOC_NR(req) - VFIO_BASE, ret, strerror(errno));
>>>> +            return -errno;
>>>>           }
>>>>       }
>>>
>>> This highlights how terrible this ioctl passthrough interface is; what's
>>> a caller supposed to do on error?  Here we don't have visibility into
>>> the ioctl being called in order to do any unwind on error.  The caller
>>> doesn't have the context to unwind only the failed containers.  Is
>>> returning errno here really sufficient for anyone to figure out how to
>>> proceed or should we just cut our loses and abort()?  What's the least
>>> horrible way we can realistically handle a failure here?  Thanks,
>>
>> It seemed like I asked this before and it turns out that I did:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html
>>
>> "Gavin says yes" is not really a supportable solution when I stumbled on
>> it again and David doesn't know why it's safe either (nor does it answer
>> my question of how does this work).  At a minimum, the reasoning why
>> this is safe for the ioctls we currently allow here needs to be spelled
>> out in a comment.  We could easily make the mistake of adding another
>> ioctl to the passthrough for which those assumptions are not valid.  We
>> may even want to abort if it's not one of the ioctls we've evaluated so
>> we don't overlook this problem later.  Thanks,
>
> Yeah, you're right.  This is a mess.


Right. I have been testing/studying this lately a lot, will fix this too.


> What we need to do here is to make vfio_container_ioctl() take a
> VFIOContainer object as the name suggests.  The the callers will need
> to either use it on a specific container, or iterate across all the
> containers in the VFIOAddressSpace as appropriate for the specific
> operation.

The callers - spapr code - do not deal with containers, these are a VFIO 
internal thing managed by hw/vfio/common.c and attached to VFIOAddressSpace.

So s/vfio_container_ioctl/vfio_address_space_ioctl/ (or vfio_as_ioctl()?) 
makes more sense here. Which I can make a patch for. Or change 
vfio_container_ioctl() as you suggest (+make is static) and add public 
vfio_as_ioctl(). Which one to pick?



-- 
Alexey

  reply	other threads:[~2015-09-14  8:56 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-03  4:40 [Qemu-devel] [PATCH qemu v2 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb Alexey Kardashevskiy
2015-09-03  4:40 ` [Qemu-devel] [PATCH qemu v2 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy
2015-09-03  4:40 ` [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices Alexey Kardashevskiy
2015-09-10  2:43   ` Alex Williamson
2015-09-11 20:03     ` Alex Williamson
2015-09-14  4:04       ` David Gibson
2015-09-14  8:56         ` Alexey Kardashevskiy [this message]
2015-09-14 11:42           ` David Gibson
2015-09-14 14:37             ` Alexey Kardashevskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55F68BC3.3030108@ozlabs.ru \
    --to=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=gwshan@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).