From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:40550) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZbPZ7-0006ba-AA for qemu-devel@nongnu.org; Mon, 14 Sep 2015 04:56:47 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZbPZ3-0001Fe-VT for qemu-devel@nongnu.org; Mon, 14 Sep 2015 04:56:45 -0400 Received: from mail-pa0-f50.google.com ([209.85.220.50]:35721) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZbPZ3-0001F3-Nk for qemu-devel@nongnu.org; Mon, 14 Sep 2015 04:56:41 -0400 Received: by pacfv12 with SMTP id fv12so141247262pac.2 for ; Mon, 14 Sep 2015 01:56:41 -0700 (PDT) References: <1441255231-18569-1-git-send-email-aik@ozlabs.ru> <1441255231-18569-3-git-send-email-aik@ozlabs.ru> <1441853009.20355.560.camel@redhat.com> <1442001818.20355.680.camel@redhat.com> <20150914040407.GG2547@voom.fritz.box> From: Alexey Kardashevskiy Message-ID: <55F68BC3.3030108@ozlabs.ru> Date: Mon, 14 Sep 2015 18:56:35 +1000 MIME-Version: 1.0 In-Reply-To: <20150914040407.GG2547@voom.fritz.box> Content-Type: text/plain; charset=koi8-r; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson , Alex Williamson Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, Gavin Shan On 09/14/2015 02:04 PM, David Gibson wrote: > On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote: >> On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote: >>> On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote: >>>> So far there were 2 limitations enforced on an emulated PHB >>>> regarding VFIO: >>>> 1) only one IOMMU group per IOMMU container was allowed and >>>> the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this; >>>> 2) only one IOMMU container per PHB was allowed as a group >>>> can only be attached to one container. >>>> >>>> However these are not really necessary so we are removing them here. >>>> >>>> This deprecates IOMMU group ID and changes vfio_container_do_ioctl() >>>> not to receive it. Instead of getting a container from a group ID, >>>> this calls ioctl() for every container attached to the PHB address space. >>>> This allows multiple containers on the same PHB which means multiple >>>> groups per PHB. Note that this will lead to every H_PUT_TCE&etc call >>>> to be passed to every container separately which will affect >>>> the performance. For 32bit devices it is still recommended to put >>>> every group to a separate PHB. >>>> >>>> Since the existing VFIO code is already trying to share a container for >>>> multiple groups, just removing a group id from >>>> the vfio_container_do_ioctl() allows the existing code to share >>>> a container if it is supported by the host kernel. >>>> >>>> This replaces the check for a group id to be set correctly with >>>> the check that it is not set. >>>> >>>> This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState >>>> members is accessed here. >>>> >>>> Signed-off-by: Alexey Kardashevskiy >>>> --- >>>> hw/ppc/spapr_pci.c | 10 +++++----- >>>> hw/ppc/spapr_pci_vfio.c | 17 ++++++----------- >>>> hw/vfio/common.c | 22 ++++++---------------- >>>> include/hw/vfio/vfio.h | 3 +-- >>>> 4 files changed, 18 insertions(+), 34 deletions(-) >>>> >>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c >>>> index 4e289cb..1b7559d 100644 >>>> --- a/hw/ppc/spapr_pci.c >>>> +++ b/hw/ppc/spapr_pci.c >>>> @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb) >>>> pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb); >>>> >>>> if (sphb->vfio_num) { >>>> - if (sphb->iommugroupid == -1) { >>>> - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid); >>>> - return -1; >>>> - } >>>> - >>>> ret = spapr_phb_vfio_dma_capabilities_update(sphb); >>>> if (ret) { >>>> error_report("Unable to get DMA32 info from VFIO"); >>>> @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) >>>> PCIBus *bus; >>>> uint64_t msi_window_size = 4096; >>>> >>>> + if ((sphb->iommugroupid != -1) && >>>> + object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) { >>>> + error_report("Warning: iommugroupid is deprecated and will be ignored"); >>>> + } >>>> + >>>> if (sphb->index != (uint32_t)-1) { >>>> hwaddr windows_base; >>>> >>>> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c >>>> index f94d8a4..48137d5 100644 >>>> --- a/hw/ppc/spapr_pci_vfio.c >>>> +++ b/hw/ppc/spapr_pci_vfio.c >>>> @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb) >>>> struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) }; >>>> int ret; >>>> >>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>> + ret = vfio_container_ioctl(&sphb->iommu_as, >>>> VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info); >>>> if (ret) { >>>> return ret; >>>> @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb) >>>> .op = VFIO_EEH_PE_ENABLE >>>> }; >>>> >>>> - vfio_container_ioctl(&sphb->iommu_as, >>>> - sphb->iommugroupid, VFIO_EEH_PE_OP, &op); >>>> + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>> } >>>> >>>> int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, >>>> @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, >>>> return RTAS_OUT_PARAM_ERROR; >>>> } >>>> >>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>> - VFIO_EEH_PE_OP, &op); >>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>> if (ret < 0) { >>>> return RTAS_OUT_HW_ERROR; >>>> } >>>> @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state) >>>> int ret; >>>> >>>> op.op = VFIO_EEH_PE_GET_STATE; >>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>> - VFIO_EEH_PE_OP, &op); >>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>> if (ret < 0) { >>>> return RTAS_OUT_PARAM_ERROR; >>>> } >>>> @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option) >>>> return RTAS_OUT_PARAM_ERROR; >>>> } >>>> >>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>> - VFIO_EEH_PE_OP, &op); >>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>> if (ret < 0) { >>>> return RTAS_OUT_HW_ERROR; >>>> } >>>> @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb) >>>> int ret; >>>> >>>> op.op = VFIO_EEH_PE_CONFIGURE; >>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>> - VFIO_EEH_PE_OP, &op); >>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>> if (ret < 0) { >>>> return RTAS_OUT_PARAM_ERROR; >>>> } >>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >>>> index 85ee9b0..a00644e 100644 >>>> --- a/hw/vfio/common.c >>>> +++ b/hw/vfio/common.c >>>> @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev) >>>> close(vbasedev->fd); >>>> } >>>> >>>> -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid, >>>> - int req, void *param) >>>> +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param) >>>> { >>>> - VFIOGroup *group; >>>> VFIOContainer *container; >>>> int ret = -1; >>>> + VFIOAddressSpace *space = vfio_get_address_space(as); >>>> >>>> - group = vfio_get_group(groupid, as); >>>> - if (!group) { >>>> - error_report("vfio: group %d not registered", groupid); >>>> - return ret; >>>> - } >>>> - >>>> - container = group->container; >>>> - if (group->container) { >>>> + QLIST_FOREACH(container, &space->containers, next) { >>>> ret = ioctl(container->fd, req, param); >>>> if (ret < 0) { >>>> error_report("vfio: failed to ioctl %d to container: ret=%d, %s", >>>> _IOC_NR(req) - VFIO_BASE, ret, strerror(errno)); >>>> + return -errno; >>>> } >>>> } >>> >>> This highlights how terrible this ioctl passthrough interface is; what's >>> a caller supposed to do on error? Here we don't have visibility into >>> the ioctl being called in order to do any unwind on error. The caller >>> doesn't have the context to unwind only the failed containers. Is >>> returning errno here really sufficient for anyone to figure out how to >>> proceed or should we just cut our loses and abort()? What's the least >>> horrible way we can realistically handle a failure here? Thanks, >> >> It seemed like I asked this before and it turns out that I did: >> >> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html >> >> "Gavin says yes" is not really a supportable solution when I stumbled on >> it again and David doesn't know why it's safe either (nor does it answer >> my question of how does this work). At a minimum, the reasoning why >> this is safe for the ioctls we currently allow here needs to be spelled >> out in a comment. We could easily make the mistake of adding another >> ioctl to the passthrough for which those assumptions are not valid. We >> may even want to abort if it's not one of the ioctls we've evaluated so >> we don't overlook this problem later. Thanks, > > Yeah, you're right. This is a mess. Right. I have been testing/studying this lately a lot, will fix this too. > What we need to do here is to make vfio_container_ioctl() take a > VFIOContainer object as the name suggests. The the callers will need > to either use it on a specific container, or iterate across all the > containers in the VFIOAddressSpace as appropriate for the specific > operation. The callers - spapr code - do not deal with containers, these are a VFIO internal thing managed by hw/vfio/common.c and attached to VFIOAddressSpace. So s/vfio_container_ioctl/vfio_address_space_ioctl/ (or vfio_as_ioctl()?) makes more sense here. Which I can make a patch for. Or change vfio_container_ioctl() as you suggest (+make is static) and add public vfio_as_ioctl(). Which one to pick? -- Alexey