From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41319) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aY6Zj-0004ug-GZ for qemu-devel@nongnu.org; Tue, 23 Feb 2016 01:36:01 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aY6Zg-0001dr-A2 for qemu-devel@nongnu.org; Tue, 23 Feb 2016 01:35:59 -0500 Received: from ozlabs.org ([103.22.144.67]:44578) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aY6Zf-0001cj-9b for qemu-devel@nongnu.org; Tue, 23 Feb 2016 01:35:56 -0500 Date: Tue, 23 Feb 2016 17:20:56 +1100 From: David Gibson Message-ID: <20160223062056.GU2808@voom.fritz.box> References: <1456121379-13434-1-git-send-email-aik@ozlabs.ru> <20160222062631.GH2808@voom.fritz.box> <56CAF2B3.2030502@ozlabs.ru> <20160222121227.GN2808@voom.fritz.box> <56CBBFD7.3050806@ozlabs.ru> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="6wUvnnXCCq76soZt" Content-Disposition: inline In-Reply-To: <56CBBFD7.3050806@ozlabs.ru> Subject: Re: [Qemu-devel] [PATCH qemu] memory: Fix IOMMU replay base address List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy Cc: qemu-devel@nongnu.org --6wUvnnXCCq76soZt Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Feb 23, 2016 at 01:11:35PM +1100, Alexey Kardashevskiy wrote: > On 02/22/2016 11:12 PM, David Gibson wrote: > >On Mon, Feb 22, 2016 at 10:36:19PM +1100, Alexey Kardashevskiy wrote: > >>On 02/22/2016 05:26 PM, David Gibson wrote: > >>>On Mon, Feb 22, 2016 at 05:09:39PM +1100, Alexey Kardashevskiy wrote: > >>>>Since a788f227 "memory: Allow replay of IOMMU mapping notifications" > >>>>when new VFIO listener is added, all existing IOMMU mappings are repl= ayed. > >>>>However there is a problem that the base address of an IOMMU memory r= egion > >>>>(IOMMU MR) is ignored which is not a problem for the existing user (w= hich is > >>>>pseries) with its default 32bit DMA window starting at 0 but it is if= there is > >>>>another DMA window. > >>>> > >>>>This adjusts the replaying address by mr->addr. > >>> > >>>Uh.. this doesn't look right to me. AFAICT from the existing > >>>implementations the 'addr' parameter to the translate function is an > >>>offset within the memory region, which would make the original version > >>>correct. > >> > >>Ok, then spapr_tce_translate_iommu() needs to be fixed. Or I am missing > >>something here? The @addr field is definitely ignored now. > > > >I think at least one of us is missing something. In the normal AS > >translation path, IIUC, it will subtract the mr->addr value to give > >offsets which are passed to translate. It uses those offsets to find > >the TCE and returns a translated address. Where else the @addr be > >used? >=20 > Ok, this patch is definitely wrong. >=20 > The actual problem I am debugging is this: > 1. run guest with an emulated XHCI to have 2 DMA windows; > 2. hotplug vfio-pci device. > 3. observe failing vfio_dma_map(). >=20 > This triggers memory_region_register_iommu_notifier() + > memory_region_iommu_replay() which calls vfio_iommu_map_notify() in a loo= p. > For the second window (huge one, 0x800000000000000 bus address) it fails = as > iotlb.iova's returned by spapr_tce_translate_iommu() are offsets within > IOMMU MR. >=20 > But vfio_dma_map() (which is eventually called by vfio_iommu_map_notify()) > needs an absolute iotlb->iova address. >=20 > imho this should be correct: >=20 > diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c > index 2e02dc6..0d6b926 100644 > --- a/hw/ppc/spapr_iommu.c > +++ b/hw/ppc/spapr_iommu.c > @@ -126,7 +126,7 @@ static IOMMUTLBEntry > spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr, > hwaddr page_mask =3D IOMMU_PAGE_MASK(tcet->page_shift); >=20 > tce =3D tcet->table[addr >> tcet->page_shift]; > - ret.iova =3D addr & page_mask; > + ret.iova =3D (addr + iommu->addr) & page_mask; > ret.translated_addr =3D tce & page_mask; I wondered about that change, but I'd have to look closer to see if the iova field here is expected to be relative to the MR as well. It would be oddly inconsistent if it wasn't. > However this does not help either. >=20 > Because at the moment I create hardware windows from a helper which is > called from spapr_phb_hot_plug_child() (HotplugHandlerClass::plug() > callback). And this is too late as pci_qdev_realize("vfio-pci") calls: >=20 > vfio_connect_container -> ... -> vfio_listener_region_add -> .. -> > memory_region_iommu_replay > and the latter fails - there is no huge window in the host kernel yet. > What is the right way of fixing this?... Hmm.. I think we need a backend specific hook which will be called =66rom vfio_listener_region_add() when it sees a new iommu region, but before it calls replay. Basically that notifier is enumerating to the VFIO subsystem where the guest has IOMMU windows. With DDW on the host, we need something to mirror the guest side windows into the host (or reject them, if they're not compatible. Actually.. that hook should replace the test a little bit higher testing against container->min_iova and container->max_iova. With host-side DDW the set of allowable IOVAs is no longer a single, simple range - and it's now changeable. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --6wUvnnXCCq76soZt Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJWy/pIAAoJEGw4ysog2bOSB94QAL8brvNqKiUDCQglcawMTVr3 pLRSuqAGRuFFqQlj4/VDY+daEchhV66hpGxTtNT6MLKrxoHsrtrU1VIJ6cDVP24u 2b5Y1ivI50x5rxx5BxOkvB244xwuoeTI8ffeHYsFwpOcvBiIcRaLGwGpHpKBL51m 4keCsoNLU0ZB1C8+lx9CkfTPfDna4h8Wg02dgPlvMpFAtyKQ8ufQUYstykAZJ1AX xKv+fpEzXV1QPwx2xt38S1AHJDjwZ2irU53bFrxjJ3Zq6eZfarafLNT+a+skQXAe 6iZQhQM2BvwihH68rZ859iKt3ROoiebwtx7bvVDveZTLJcK7T2Hl4aa5vkkMRPJQ E12bfDNKnXKRTgllSZB8sI3gGdOW/kKtux7UrBOph9MasjEdXwYsKyu3C4cLRv6G q7WW84kHSGHRHZ4mWGLG5RTUHiCN/3/I953tTFO66BjEtqqkLbD8ECC/K7cVOiY7 RHSuSKS5Pkp5bOtlJnzibnnBZuDCRE+FU6cLC3Q9QPiNh9EVKKGKa6xkCi12FEZt V7jXFUlR0VmbegdRiBu5A0I9kJgl4su6eDZ/4jrFERbCpnO0eFsafZzN+yqD1wYM pes645Bqoh0MlPiTAHdS9yID+359uyEHCz0//xL5HaG1mQsHNnHoHAOBd/LGf6FU 6GrihQ3+bdcDF7zOU4NO =bRgv -----END PGP SIGNATURE----- --6wUvnnXCCq76soZt--