Date: Thu, 24 Sep 2015 09:43:24 +1000
From: David Gibson
Message-ID: <20150923234324.GC15944@voom.fritz.box>
References: <1442495357-26547-1-git-send-email-david@gibson.dropbear.id.au> <1442495357-26547-4-git-send-email-david@gibson.dropbear.id.au> <56027AA6.8090504@redhat.com> <20150923110706.GA15944@voom.fritz.box>
In-Reply-To: <20150923110706.GA15944@voom.fritz.box>
Subject: Re: [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities
To: Thomas Huth
Cc: lvivier@redhat.com, aik@ozlabs.ru, gwshan@linux.vnet.ibm.com, qemu-devel@nongnu.org, alex.williamson@redhat.com, qemu-ppc@nongnu.org, pbonzini@redhat.com

On Wed, Sep 23, 2015 at 09:07:06PM +1000, David Gibson wrote:
> On Wed, Sep 23, 2015 at 12:10:46PM +0200, Thomas Huth wrote:
> > On 17/09/15 15:09, David Gibson wrote:
> > > The current vfio core code assumes that the host IOMMU is capable of
> > > mapping any IOVA the guest wants to use to where we need.  However, real
> > > IOMMUs generally only support translating a certain range of IOVAs (the
> > > "DMA window"), not a full 64-bit address space.
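The "DMA window" idea can be illustrated with a minimal C sketch. It assumes the window is a single contiguous range described by a start address and a size. The names dma_window, dma_window_make and dma_window_contains are hypothetical and are not QEMU code. The window keeps its last address inclusively, so a window reaching the top of the 64-bit space can still be represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: the IOVA range a host IOMMU can translate. */
typedef struct {
    uint64_t start; /* first translatable IOVA */
    uint64_t last;  /* last translatable IOVA, inclusive */
} dma_window;

/* Build a window from a start address and a non-zero size. */
static dma_window dma_window_make(uint64_t start, uint64_t size)
{
    assert(size != 0);
    assert(start + (size - 1) >= start); /* reject wraparound past 2^64 */
    dma_window w = { start, start + (size - 1) };
    return w;
}

/* True if this window can translate the given IOVA. */
static bool dma_window_contains(dma_window w, uint64_t iova)
{
    return iova >= w.start && iova <= w.last;
}
```

For example, a 0..2GiB window built with dma_window_make(0, 1ULL << 31) accepts IOVA 0x7fffffff and rejects 0x80000000. A window covering the full 64-bit space would be { 0, UINT64_MAX }.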
> > >
> > > The common x86 IOMMUs support a wide enough range that guests are very
> > > unlikely to go beyond it in practice; however, the IOMMU used on IBM Power
> > > machines, in the default configuration, supports only a much more limited
> > > IOVA range, usually 0..2GiB.
> > >
> > > If the guest attempts to set up an IOVA range that the host IOMMU can't
> > > map, qemu won't report an error until it actually attempts to map a bad
> > > IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no guest
> > > visible IOMMU) then this will show up very quickly.  If there is a guest
> > > visible IOMMU, however, the problem might not show up until much later when
> > > the guest actually attempts to DMA with an IOVA the host can't handle.
> > >
> > > This patch adds a test so that we will detect earlier if the guest is
> > > attempting to use IOVA ranges that the host IOMMU won't be able to deal
> > > with.
> > >
> > > For now, we assume that "Type1" (x86) IOMMUs can support any IOVA; this is
> > > incorrect, but no worse than what we have already.  We can't do better for
> > > now because the Type1 kernel interface doesn't tell us what IOVA range the
> > > IOMMU actually supports.
> > >
> > > For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
> > > IOVA range and validate guest IOVA ranges against it, and this patch does
> > > so.
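The check described above can be sketched in standalone C. Two assumptions apply. First, plain uint64_t stands in for QEMU's hwaddr. Second, the hypothetical iommu_limits struct stands in for the min_iova/max_iova fields the patch adds to the container. Type1 gets the "any IOVA" limits, mirroring the patch's stated assumption. sPAPR limits are derived from a window start and size, as the patch does with the GET_INFO results. A region is passed with an exclusive end, like llend in vfio_listener_region_add, so the check compares end - 1 against the inclusive maximum.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the container's min_iova/max_iova fields. */
typedef struct {
    uint64_t min_iova; /* lowest IOVA the host IOMMU can map */
    uint64_t max_iova; /* highest IOVA the host IOMMU can map, inclusive */
} iommu_limits;

/* Type1: the kernel reports no range, so (like the patch) assume any IOVA. */
static iommu_limits type1_limits(void)
{
    iommu_limits l = { 0, UINT64_MAX };
    return l;
}

/* sPAPR TCE: limits derived from the 32-bit window's start and size. */
static iommu_limits spapr_limits(uint64_t window_start, uint64_t window_size)
{
    assert(window_size != 0);
    iommu_limits l = { window_start, window_start + window_size - 1 };
    return l;
}

/* Validate the region [iova, end), end exclusive.
 * Returns 0 if the host IOMMU can map it, else -EFAULT (the patch's choice). */
static int check_iova_region(const iommu_limits *l, uint64_t iova, uint64_t end)
{
    assert(end > iova); /* caller is assumed to skip empty regions */
    if (iova < l->min_iova || end - 1 > l->max_iova) {
        fprintf(stderr, "can't map guest IOVA region 0x%" PRIx64 "..0x%" PRIx64 "\n",
                iova, end - 1);
        return -EFAULT;
    }
    return 0;
}
```

With spapr_limits(0, 1ULL << 31), the region [0x1000, 0x80001000) fails immediately. Without the check, the failure would only surface when the first out-of-window mapping was attempted. Under type1_limits() the same region is accepted.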
> > >
> > > Signed-off-by: David Gibson
> > > ---
> > >  hw/vfio/common.c              | 42 +++++++++++++++++++++++++++++++++++++++---
> > >  include/hw/vfio/vfio-common.h |  6 ++++++
> > >  2 files changed, 45 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > index 9953b9c..c37f1a1 100644
> > > --- a/hw/vfio/common.c
> > > +++ b/hw/vfio/common.c
> > > @@ -344,14 +344,23 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > >      if (int128_ge(int128_make64(iova), llend)) {
> > >          return;
> > >      }
> > > +    end = int128_get64(llend);
> > > +
> > > +    if ((iova < container->iommu_data.min_iova)
> > > +        || ((end - 1) > container->iommu_data.max_iova)) {
> >
> > (Too many parentheses for my taste ;-))
>
> Yes, well, we've already established our tastes differ on that point.
>
> > > +        error_report("vfio: IOMMU container %p can't map guest IOVA region"
> > > +                     " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
> > > +                     container, iova, end - 1);
> > > +        ret = -EFAULT; /* FIXME: better choice here? */
> >
> > Maybe -EINVAL? ... but -EFAULT also sounds ok to me.
>
> I try to avoid EINVAL unless it's clearly the only right choice.  So
> many things use it that it tends to be very unhelpful when you get one.
>
> > > +        goto fail;
> > > +    }
> > ...
> > > @@ -712,6 +732,22 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
> > >              ret = -errno;
> > >              goto free_container_exit;
> > >          }
> > > +
> > > +        /*
> > > +         * FIXME: This only considers the host IOMMU's 32-bit window.
> > > +         * At some point we need to add support for the optional
> > > +         * 64-bit window and dynamic windows
> > > +         */
> > > +        info.argsz = sizeof(info);
> > > +        ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> > > +        if (ret) {
> > > +            error_report("vfio: VFIO_IOMMU_SPAPR_TCE_GET_INFO failed: %m");
> >
> > Isn't that %m a glibc extension only? ...
Well, this code likely only > > runs on Linux with a glibc, so it likely doesn't matter, I guess... >=20 > Yes, it is, but it's already used extensively within qemu. >=20 > > > + ret =3D -errno; > > > + goto free_container_exit; > > > + } > > > + container->iommu_data.min_iova =3D info.dma32_window_start; > > > + container->iommu_data.max_iova =3D container->iommu_data.min= _iova > > > + + info.dma32_window_size - 1; > > > } else { > > > error_report("vfio: No available IOMMU models"); > > > ret =3D -EINVAL; > > > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-com= mon.h > > > index aff18cd..88ec213 100644 > > > --- a/include/hw/vfio/vfio-common.h > > > +++ b/include/hw/vfio/vfio-common.h > > > @@ -71,6 +71,12 @@ typedef struct VFIOContainer { > > > MemoryListener listener; > > > int error; > > > bool initialized; > > > + /* > > > + * FIXME: This assumes the host IOMMU can support only a > > > + * single contiguous IOVA window. We may need to generalize > > > + * that in future > > > + */ > > > + hwaddr min_iova, max_iova; > >=20 > > Should that maybe be dma_addr_t instead of hwaddr ? >=20 > Ah, yes it probably should. Actually, on further consideration, no it shouldn't. hwaddr is what's used throughout the VFIO code, in address_space_translate() and in IOMMUTLBEntry, for both sides of the translation. In fact, I'm not entirely convinced there's any reason to have dma_addr_t distinct from hwaddr at all, but that's a cleanup for some other day. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! 
http://www.ozlabs.org/~dgibson