From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:49141) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZbKzl-00008v-DD for qemu-devel@nongnu.org; Mon, 14 Sep 2015 00:03:59 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZbKzh-0000pP-EW for qemu-devel@nongnu.org; Mon, 14 Sep 2015 00:03:57 -0400 Date: Mon, 14 Sep 2015 14:04:07 +1000 From: David Gibson Message-ID: <20150914040407.GG2547@voom.fritz.box> References: <1441255231-18569-1-git-send-email-aik@ozlabs.ru> <1441255231-18569-3-git-send-email-aik@ozlabs.ru> <1441853009.20355.560.camel@redhat.com> <1442001818.20355.680.camel@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="L1c6L/cjZjI9d0Eq" Content-Disposition: inline In-Reply-To: <1442001818.20355.680.camel@redhat.com> Subject: Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alex Williamson Cc: Alexey Kardashevskiy , qemu-ppc@nongnu.org, qemu-devel@nongnu.org, Gavin Shan --L1c6L/cjZjI9d0Eq Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote: > On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote: > > On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote: > > > So far there were 2 limitations enforced on an emulated PHB > > > regarding VFIO: > > > 1) only one IOMMU group per IOMMU container was allowed and > > > the spapr-pci-vfio-host-bridge device has an IOMMU ID property for th= is; > > > 2) only one IOMMU container per PHB was allowed as a group > > > can only be attached to one container. > > >=20 > > > However these are not really necessary so we are removing them here. > > >=20 > > > This deprecates IOMMU group ID and changes vfio_container_do_ioctl() > > > not to receive it. Instead of getting a container from a group ID, > > > this calls ioctl() for every container attached to the PHB address sp= ace. > > > This allows multiple containers on the same PHB which means multiple > > > groups per PHB. Note that this will lead to every H_PUT_TCE&etc call > > > to be passed to every container separately which will affect > > > the performance. For 32bit devices it is still recommended to put > > > every group to a separate PHB. > > >=20 > > > Since the existing VFIO code is already trying to share a container f= or > > > multiple groups, just removing a group id from > > > the vfio_container_do_ioctl() allows the existing code to share > > > a container if it is supported by the host kernel. > > >=20 > > > This replaces the check for a group id to be set correctly with > > > the check that it is not set. > > >=20 > > > This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState > > > members is accessed here. > > >=20 > > > Signed-off-by: Alexey Kardashevskiy > > > --- > > > hw/ppc/spapr_pci.c | 10 +++++----- > > > hw/ppc/spapr_pci_vfio.c | 17 ++++++----------- > > > hw/vfio/common.c | 22 ++++++---------------- > > > include/hw/vfio/vfio.h | 3 +-- > > > 4 files changed, 18 insertions(+), 34 deletions(-) > > >=20 > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > > > index 4e289cb..1b7559d 100644 > > > --- a/hw/ppc/spapr_pci.c > > > +++ b/hw/ppc/spapr_pci.c > > > @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPA= PRPHBState *sphb) > > > pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_device= s, sphb); > > > =20 > > > if (sphb->vfio_num) { > > > - if (sphb->iommugroupid =3D=3D -1) { > > > - error_report("Wrong IOMMU group ID %d", sphb->iommugroup= id); > > > - return -1; > > > - } > > > - > > > ret =3D spapr_phb_vfio_dma_capabilities_update(sphb); > > > if (ret) { > > > error_report("Unable to get DMA32 info from VFIO"); > > > @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev= , Error **errp) > > > PCIBus *bus; > > > uint64_t msi_window_size =3D 4096; > > > =20 > > > + if ((sphb->iommugroupid !=3D -1) && > > > + object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_B= RIDGE)) { > > > + error_report("Warning: iommugroupid is deprecated and will b= e ignored"); > > > + } > > > + > > > if (sphb->index !=3D (uint32_t)-1) { > > > hwaddr windows_base; > > > =20 > > > diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c > > > index f94d8a4..48137d5 100644 > > > --- a/hw/ppc/spapr_pci_vfio.c > > > +++ b/hw/ppc/spapr_pci_vfio.c > > > @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHB= State *sphb) > > > struct vfio_iommu_spapr_tce_info info =3D { .argsz =3D sizeof(in= fo) }; > > > int ret; > > > =20 > > > - ret =3D vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > + ret =3D vfio_container_ioctl(&sphb->iommu_as, > > > VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info); > > > if (ret) { > > > return ret; > > > @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sph= b) > > > .op =3D VFIO_EEH_PE_ENABLE > > > }; > > > =20 > > > - vfio_container_ioctl(&sphb->iommu_as, > > > - sphb->iommugroupid, VFIO_EEH_PE_OP, &op); > > > + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > > } > > > =20 > > > int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, > > > @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sp= hb, > > > return RTAS_OUT_PARAM_ERROR; > > > } > > > =20 > > > - ret =3D vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > - VFIO_EEH_PE_OP, &op); > > > + ret =3D vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &o= p); > > > if (ret < 0) { > > > return RTAS_OUT_HW_ERROR; > > > } > > > @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sph= b, int *state) > > > int ret; > > > =20 > > > op.op =3D VFIO_EEH_PE_GET_STATE; > > > - ret =3D vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > - VFIO_EEH_PE_OP, &op); > > > + ret =3D vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &o= p); > > > if (ret < 0) { > > > return RTAS_OUT_PARAM_ERROR; > > > } > > > @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb,= int option) > > > return RTAS_OUT_PARAM_ERROR; > > > } > > > =20 > > > - ret =3D vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > - VFIO_EEH_PE_OP, &op); > > > + ret =3D vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &o= p); > > > if (ret < 0) { > > > return RTAS_OUT_HW_ERROR; > > > } > > > @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *s= phb) > > > int ret; > > > =20 > > > op.op =3D VFIO_EEH_PE_CONFIGURE; > > > - ret =3D vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > - VFIO_EEH_PE_OP, &op); > > > + ret =3D vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &o= p); > > > if (ret < 0) { > > > return RTAS_OUT_PARAM_ERROR; > > > } > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > > index 85ee9b0..a00644e 100644 > > > --- a/hw/vfio/common.c > > > +++ b/hw/vfio/common.c > > > @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev) > > > close(vbasedev->fd); > > > } > > > =20 > > > -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid, > > > - int req, void *param) > > > +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *= param) > > > { > > > - VFIOGroup *group; > > > VFIOContainer *container; > > > int ret =3D -1; > > > + VFIOAddressSpace *space =3D vfio_get_address_space(as); > > > =20 > > > - group =3D vfio_get_group(groupid, as); > > > - if (!group) { > > > - error_report("vfio: group %d not registered", groupid); > > > - return ret; > > > - } > > > - > > > - container =3D group->container; > > > - if (group->container) { > > > + QLIST_FOREACH(container, &space->containers, next) { > > > ret =3D ioctl(container->fd, req, param); > > > if (ret < 0) { > > > error_report("vfio: failed to ioctl %d to container: ret= =3D%d, %s", > > > _IOC_NR(req) - VFIO_BASE, ret, strerror(err= no)); > > > + return -errno; > > > } > > > } > >=20 > > This highlights how terrible this ioctl passthrough interface is; what's > > a caller supposed to do on error? Here we don't have visibility into > > the ioctl being called in order to do any unwind on error. The caller > > doesn't have the context to unwind only the failed containers. Is > > returning errno here really sufficient for anyone to figure out how to > > proceed or should we just cut our loses and abort()? What's the least > > horrible way we can realistically handle a failure here? Thanks, >=20 > It seemed like I asked this before and it turns out that I did: >=20 > https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html >=20 > "Gavin says yes" is not really a supportable solution when I stumbled on > it again and David doesn't know why it's safe either (nor does it answer > my question of how does this work). At a minimum, the reasoning why > this is safe for the ioctls we currently allow here needs to be spelled > out in a comment. We could easily make the mistake of adding another > ioctl to the passthrough for which those assumptions are not valid. We > may even want to abort if it's not one of the ioctls we've evaluated so > we don't overlook this problem later. Thanks, Yeah, you're right. This is a mess. What we need to do here is to make vfio_container_ioctl() take a VFIOContainer object as the name suggests. The the callers will need to either use it on a specific container, or iterate across all the containers in the VFIOAddressSpace as appropriate for the specific operation. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --L1c6L/cjZjI9d0Eq Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJV9kc3AAoJEGw4ysog2bOSpZgQAN4QLnCq4x42kadeDv15Nf02 H57q89NWlyEsqq2sv9VASUYpEX3Zb0LZ0LyDuzTFWGZPdPiOtgprcTkGmotRj/8D MiG6V0/1mz5xPuhe+gMgCWF2PjtzpMAid6h6sDtzeZXxWU7P//6z5xRMn80sZNMk F45E/ZP0njiJ2bMeX7UFmhevAxhw3U4wB9tDR5DcPojj0AsBs8DfAHE1jK6HE4yF WENPfDsTjd6c54eIm7u/tm/ttvhq0/RuNdF9IKPFF198UDsgb+hOmkJdi2YezyWX Wkn3Wlp7AvjvnfDL/u/Rx9GYBeXKEXLGwMlT7jxb1O9+B2FFpRwxYawgJNp8Q2sA +zA1VIt9ZXQUahtSoMnoqDK29uWzr+Wu1JusQ9ec0482jVO4FsJUYTIUZaU9Ok5q PtzEXr3te8+GR58K7FwxnJXN12qa87Wt30PC46mLQ+Gq4UAaDSrTjcCuwAeTM5/k hcKBafGxYRrA69WUThlyfpSd9XXmH3dlPTv7JDBi0Ns2wq+ll0eMw34FLhR0OiqT 7Q9vDDEt+dc+W6D7PL5VO8OvC7EHHlMVbIoawHBBieHBTyM4ou07JhmIRnmuV6IG lIuf/Ne8DTWjNsJW7xEk3PlvP4AvWiHp/SB8LuwGRIGDRSRrCFbCBf35n3vrJhfT Wnyxnsd+Po3makI/706/ =ttc6 -----END PGP SIGNATURE----- --L1c6L/cjZjI9d0Eq--