From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37039) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZbSBl-0001my-5E for qemu-devel@nongnu.org; Mon, 14 Sep 2015 07:44:50 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZbSBj-00009u-3z for qemu-devel@nongnu.org; Mon, 14 Sep 2015 07:44:49 -0400 Date: Mon, 14 Sep 2015 21:42:07 +1000 From: David Gibson Message-ID: <20150914114207.GO2547@voom.fritz.box> References: <1441255231-18569-1-git-send-email-aik@ozlabs.ru> <1441255231-18569-3-git-send-email-aik@ozlabs.ru> <1441853009.20355.560.camel@redhat.com> <1442001818.20355.680.camel@redhat.com> <20150914040407.GG2547@voom.fritz.box> <55F68BC3.3030108@ozlabs.ru> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="96dIhm/ZjrNld+BP" Content-Disposition: inline In-Reply-To: <55F68BC3.3030108@ozlabs.ru> Subject: Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy Cc: Alex Williamson , qemu-ppc@nongnu.org, qemu-devel@nongnu.org, Gavin Shan --96dIhm/ZjrNld+BP Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Sep 14, 2015 at 06:56:35PM +1000, Alexey Kardashevskiy wrote: > On 09/14/2015 02:04 PM, David Gibson wrote: > >On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote: > >>On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote: > >>>On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote: > >>>>So far there were 2 limitations enforced on an emulated PHB > >>>>regarding VFIO: > >>>>1) only one IOMMU group per IOMMU container was allowed and > >>>>the spapr-pci-vfio-host-bridge device has an IOMMU ID property for th= is; > >>>>2) only one IOMMU container per PHB was allowed as a group > >>>>can only be attached to one container. > >>>> > >>>>However these are not really necessary so we are removing them here. > >>>> > >>>>This deprecates IOMMU group ID and changes vfio_container_do_ioctl() > >>>>not to receive it. Instead of getting a container from a group ID, > >>>>this calls ioctl() for every container attached to the PHB address sp= ace. > >>>>This allows multiple containers on the same PHB which means multiple > >>>>groups per PHB. Note that this will lead to every H_PUT_TCE&etc call > >>>>to be passed to every container separately which will affect > >>>>the performance. For 32bit devices it is still recommended to put > >>>>every group to a separate PHB. > >>>> > >>>>Since the existing VFIO code is already trying to share a container f= or > >>>>multiple groups, just removing a group id from > >>>>the vfio_container_do_ioctl() allows the existing code to share > >>>>a container if it is supported by the host kernel. > >>>> > >>>>This replaces the check for a group id to be set correctly with > >>>>the check that it is not set. > >>>> > >>>>This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState > >>>>members is accessed here. > >>>> > >>>>Signed-off-by: Alexey Kardashevskiy > >>>>--- > >>>> hw/ppc/spapr_pci.c | 10 +++++----- > >>>> hw/ppc/spapr_pci_vfio.c | 17 ++++++----------- > >>>> hw/vfio/common.c | 22 ++++++---------------- > >>>> include/hw/vfio/vfio.h | 3 +-- > >>>> 4 files changed, 18 insertions(+), 34 deletions(-) > >>>> > >>>>diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > >>>>index 4e289cb..1b7559d 100644 > >>>>--- a/hw/ppc/spapr_pci.c > >>>>+++ b/hw/ppc/spapr_pci.c > >>>>@@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPA= PRPHBState *sphb) > >>>> pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devic= es, sphb); > >>>> > >>>> if (sphb->vfio_num) { > >>>>- if (sphb->iommugroupid =3D=3D -1) { > >>>>- error_report("Wrong IOMMU group ID %d", sphb->iommugroup= id); > >>>>- return -1; > >>>>- } > >>>>- > >>>> ret =3D spapr_phb_vfio_dma_capabilities_update(sphb); > >>>> if (ret) { > >>>> error_report("Unable to get DMA32 info from VFIO"); > >>>>@@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev= , Error **errp) > >>>> PCIBus *bus; > >>>> uint64_t msi_window_size =3D 4096; > >>>> > >>>>+ if ((sphb->iommugroupid !=3D -1) && > >>>>+ object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_B= RIDGE)) { > >>>>+ error_report("Warning: iommugroupid is deprecated and will b= e ignored"); > >>>>+ } > >>>>+ > >>>> if (sphb->index !=3D (uint32_t)-1) { > >>>> hwaddr windows_base; > >>>> > >>>>diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c > >>>>index f94d8a4..48137d5 100644 > >>>>--- a/hw/ppc/spapr_pci_vfio.c > >>>>+++ b/hw/ppc/spapr_pci_vfio.c > >>>>@@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHB= State *sphb) > >>>> struct vfio_iommu_spapr_tce_info info =3D { .argsz =3D sizeof(i= nfo) }; > >>>> int ret; > >>>> > >>>>- ret =3D vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > >>>>+ ret =3D vfio_container_ioctl(&sphb->iommu_as, > >>>> VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info= ); > >>>> if (ret) { > >>>> return ret; > >>>>@@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sph= b) > >>>> .op =3D VFIO_EEH_PE_ENABLE > >>>> }; > >>>> > >>>>- vfio_container_ioctl(&sphb->iommu_as, > >>>>- sphb->iommugroupid, VFIO_EEH_PE_OP, &op); > >>>>+ vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > >>>> } > >>>> > >>>> int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, > >>>>@@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sp= hb, > >>>> return RTAS_OUT_PARAM_ERROR; > >>>> } > >>>> > >>>>- ret =3D vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > >>>>- VFIO_EEH_PE_OP, &op); > >>>>+ ret =3D vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &o= p); > >>>> if (ret < 0) { > >>>> return RTAS_OUT_HW_ERROR; > >>>> } > >>>>@@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sph= b, int *state) > >>>> int ret; > >>>> > >>>> op.op =3D VFIO_EEH_PE_GET_STATE; > >>>>- ret =3D vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > >>>>- VFIO_EEH_PE_OP, &op); > >>>>+ ret =3D vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &o= p); > >>>> if (ret < 0) { > >>>> return RTAS_OUT_PARAM_ERROR; > >>>> } > >>>>@@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb,= int option) > >>>> return RTAS_OUT_PARAM_ERROR; > >>>> } > >>>> > >>>>- ret =3D vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > >>>>- VFIO_EEH_PE_OP, &op); > >>>>+ ret =3D vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &o= p); > >>>> if (ret < 0) { > >>>> return RTAS_OUT_HW_ERROR; > >>>> } > >>>>@@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *s= phb) > >>>> int ret; > >>>> > >>>> op.op =3D VFIO_EEH_PE_CONFIGURE; > >>>>- ret =3D vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > >>>>- VFIO_EEH_PE_OP, &op); > >>>>+ ret =3D vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &o= p); > >>>> if (ret < 0) { > >>>> return RTAS_OUT_PARAM_ERROR; > >>>> } > >>>>diff --git a/hw/vfio/common.c b/hw/vfio/common.c > >>>>index 85ee9b0..a00644e 100644 > >>>>--- a/hw/vfio/common.c > >>>>+++ b/hw/vfio/common.c > >>>>@@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev) > >>>> close(vbasedev->fd); > >>>> } > >>>> > >>>>-static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid, > >>>>- int req, void *param) > >>>>+static int vfio_container_do_ioctl(AddressSpace *as, int req, void *= param) > >>>> { > >>>>- VFIOGroup *group; > >>>> VFIOContainer *container; > >>>> int ret =3D -1; > >>>>+ VFIOAddressSpace *space =3D vfio_get_address_space(as); > >>>> > >>>>- group =3D vfio_get_group(groupid, as); > >>>>- if (!group) { > >>>>- error_report("vfio: group %d not registered", groupid); > >>>>- return ret; > >>>>- } > >>>>- > >>>>- container =3D group->container; > >>>>- if (group->container) { > >>>>+ QLIST_FOREACH(container, &space->containers, next) { > >>>> ret =3D ioctl(container->fd, req, param); > >>>> if (ret < 0) { > >>>> error_report("vfio: failed to ioctl %d to container: re= t=3D%d, %s", > >>>> _IOC_NR(req) - VFIO_BASE, ret, strerror(er= rno)); > >>>>+ return -errno; > >>>> } > >>>> } > >>> > >>>This highlights how terrible this ioctl passthrough interface is; what= 's > >>>a caller supposed to do on error? Here we don't have visibility into > >>>the ioctl being called in order to do any unwind on error. The caller > >>>doesn't have the context to unwind only the failed containers. Is > >>>returning errno here really sufficient for anyone to figure out how to > >>>proceed or should we just cut our loses and abort()? What's the least > >>>horrible way we can realistically handle a failure here? Thanks, > >> > >>It seemed like I asked this before and it turns out that I did: > >> > >>https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html > >> > >>"Gavin says yes" is not really a supportable solution when I stumbled on > >>it again and David doesn't know why it's safe either (nor does it answer > >>my question of how does this work). At a minimum, the reasoning why > >>this is safe for the ioctls we currently allow here needs to be spelled > >>out in a comment. We could easily make the mistake of adding another > >>ioctl to the passthrough for which those assumptions are not valid. We > >>may even want to abort if it's not one of the ioctls we've evaluated so > >>we don't overlook this problem later. Thanks, > > > >Yeah, you're right. This is a mess. >=20 >=20 > Right. I have been testing/studying this lately a lot, will fix this too. >=20 >=20 > >What we need to do here is to make vfio_container_ioctl() take a > >VFIOContainer object as the name suggests. The the callers will need > >to either use it on a specific container, or iterate across all the > >containers in the VFIOAddressSpace as appropriate for the specific > >operation. >=20 > The callers - spapr code - do not deal with containers, these are a VFIO > internal thing managed by hw/vfio/common.c and attached to VFIOAddressSpa= ce. >=20 > So s/vfio_container_ioctl/vfio_address_space_ioctl/ (or vfio_as_ioctl()?) > makes more sense here. Which I can make a patch for. Or change > vfio_container_ioctl() as you suggest (+make is static) and add public > vfio_as_ioctl(). Which one to pick? No, that doesn't address the problem. The ioctl is operating on the container, the wrapper needs to take the container. Whether the ioctl can be safely replicated across all containers in the VFIOAddressSpace, or can only be used if you have additional information to select a specific container *cannot* be known unless you know exactly what ioctl() you're talking about. That decision has to move to the callers, which know what they're doing. If those callers don't know about individual containers now, then they have to learn. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --96dIhm/ZjrNld+BP Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJV9rKPAAoJEGw4ysog2bOSPYUP+QFEbjd3Ml/+mUF1yzZv9oan Wf9KKYQb5RvnzI+YmSNSK54NLtPdWt4FZu0vPPqwOlMtBeMatXmL/h2Gzi8fNRTM tyz7ZCN1S0jYUr0nsyDa52twXQR4bTXAoKiPU/0gxxCMrT6k7LIUlM73JcXMeAsO QMV4RUbY0twR5yGQFXREOUtQqwXG5Hxe7h2BZCPUeipm6IRsCUWy/Z+/v8BYKira w/NB2wNTitiSaC+Gg0G0sIQRNUtM4suBxE17dgPE8I926AKNtunnbURiDNKL8PR9 ya0kd6sWKYcxdkX9rrMCnY1UhlHCI5es1guxSzElafjHb6ySaLjM6a31O1lqq3mx XPfTbx9zZ2A0kW7HWEHMDRgqEegakjkDndHbNEZp6ECYy4RhiD+EDtRekfsYAlcO e9Exx40x1dwoojlBYms+WPbH637M2kz97QfPfWGn1VVdqnXo+zNYMJYmsENzbtLX Advr3ZAqfvoRnm7S9JkGe1pJswC/aFleOCBlNe/evzAoMjt5ZIHQi6D1DR/H4AN1 l79KXBqkxbs7lTvNi6xyBtdCtKJ4QkeiSNxu2X88Dz3oo90cxBvEsEGzXaiRZP1g DhEYZG5ziHLEgQAjY/M/KAP7jMFe5ChVrqR64Ubb6jh7VrYFmxycFPO4sS1yBmo6 2icFbhC0janNxK+eKlPn =GEFe -----END PGP SIGNATURE----- --96dIhm/ZjrNld+BP--