From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:58526)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ZhRGB-0005pC-MT for qemu-devel@nongnu.org;
	Wed, 30 Sep 2015 19:58:09 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1ZhRG8-0005a0-E4 for qemu-devel@nongnu.org;
	Wed, 30 Sep 2015 19:58:07 -0400
Date: Thu, 1 Oct 2015 09:51:11 +1000
From: David Gibson
Message-ID: <20150930235111.GF23574@voom>
References: <1443579237-9636-1-git-send-email-david@gibson.dropbear.id.au>
	<1443579237-9636-6-git-send-email-david@gibson.dropbear.id.au>
	<560BA488.9080200@redhat.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="8TaQrIeukR7mmbKf"
Content-Disposition: inline
In-Reply-To: <560BA488.9080200@redhat.com>
Subject: Re: [Qemu-devel] [PATCHv3 5/7] memory: Allow replay of IOMMU mapping notifications
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Laurent Vivier
Cc: thuth@redhat.com, qemu-devel@nongnu.org, aik@ozlabs.ru,
	mdroth@linux.vnet.ibm.com, abologna@redhat.com,
	alex.williamson@redhat.com, qemu-ppc@nongnu.org, pbonzini@redhat.com,
	gwshan@linux.vnet.ibm.com

--8TaQrIeukR7mmbKf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Sep 30, 2015 at 10:59:52AM +0200, Laurent Vivier wrote:
>
>
> On 30/09/2015 04:13, David Gibson wrote:
> > When we have guest visible IOMMUs, we allow notifiers to be registered
> > which will be informed of all changes to IOMMU mappings.  This is used by
> > vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
> >
> > However, unlike with a memory region listener, an iommu notifier won't be
> > told about any mappings which already exist in the (guest) IOMMU at the
> > time it is registered.
> > This can cause problems if hotplugging a VFIO
> > device onto a guest bus which had existing guest IOMMU mappings, but didn't
> > previously have any VFIO devices (and hence no host IOMMU mappings).
> >
> > This adds a memory_region_iommu_replay() function to handle this case.  It
> > replays any existing mappings in an IOMMU memory region to a specified
> > notifier.  Because the IOMMU memory region doesn't internally remember the
> > granularity of the guest IOMMU it has a small hack where the caller must
> > specify a granularity at which to replay mappings.
> >
> > If there are finer mappings in the guest IOMMU these will be reported in
> > the iotlb structures passed to the notifier which it must handle (probably
> > causing it to flag an error).  This isn't new - the VFIO iommu notifier
> > must already handle notifications about guest IOMMU mappings too short
> > for it to represent in the host IOMMU.
> >
> > Signed-off-by: David Gibson
> > ---
> >  include/exec/memory.h | 13 +++++++++++++
> >  memory.c              | 20 ++++++++++++++++++++
> >  2 files changed, 33 insertions(+)
> >
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 5baaf48..0f07159 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -583,6 +583,19 @@ void memory_region_notify_iommu(MemoryRegion *mr,
> >  void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
> >
> >  /**
> > + * memory_region_iommu_replay: replay existing IOMMU translations to
> > + * a notifier
> > + *
> > + * @mr: the memory region to observe
> > + * @n: the notifier to which to replay iommu mappings
> > + * @granularity: Minimum page granularity to replay notifications for
> > + * @is_write: Whether to treat the replay as a translate "write"
> > + *     through the iommu
> > + */
> > +void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n,
> > +                                hwaddr granularity, bool is_write);
> > +
> > +/**
> >   * memory_region_unregister_iommu_notifier: unregister a notifier for
> >   * changes to IOMMU translation entries.
> >   *
> > diff --git a/memory.c b/memory.c
> > index ef87363..1b03d22 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1403,6 +1403,26 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
> >      notifier_list_add(&mr->iommu_notify, n);
> >  }
> >
> > +void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n,
> > +                                hwaddr granularity, bool is_write)
> > +{
> > +    hwaddr addr;
> > +    IOMMUTLBEntry iotlb;
> > +
> > +    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> > +        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
>
> in iotlb, there is an "address_mask", on spapr, it is copied from
> "page_shift", which is SPAPR_TCE_PAGE_SHIFT (12 -> 4k).
>
> At first glance, we would like to use it to scan the memory region,
> but as granularity could be a greater value, I think it is a better choice.

Using address_mask doesn't quite work.  *If* you start with an
existing, valid translation, then you can use address mask to skip to
the end of it - that might be a useful optimization in future,
particularly if the guest IOMMU has variable page sizes.

But if you start on an address that doesn't have a current valid
translation in the IOMMU, then address_mask gets set to ~0, so it
doesn't give you any information on where to try next for a valid
mapping.  That's what the granularity parameter is needed for.

> But the question is: why the iotlb page_size is not equal to the
> granularity given by VFIO_IOMMU_GET_INFO _IO ?

Well, the iotlb page size is the page size from the *guest* iommu,
whereas VFIO_IOMMU_GET_INFO tells you the page size of the *host*
iommu.  In practice, they'll probably be the same, at least on setups
likely to work well with VFIO, but in theory they could be different.
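To make the address_mask point above concrete, here is a rough
standalone sketch of what such a skip optimization might look like.  It
is only an illustration under stated assumptions: ToyTLBEntry,
toy_translate() and toy_replay() are hypothetical stand-ins invented for
this example, not the QEMU types or functions, and the mappings are made
up.  When the lookup returns a valid entry the loop jumps to the end of
that translation; when it does not, the mask is ~0 and the loop can only
fall back on the caller's granularity.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; NOT the QEMU definitions. */
typedef uint64_t hwaddr;

typedef struct {
    hwaddr iova;       /* start of the translated page */
    hwaddr addr_mask;  /* page size - 1 if valid, ~0 if no translation */
    int perm;          /* 0 means no access, like IOMMU_NONE */
} ToyTLBEntry;

#define TOY_REGION_SIZE 0x10000ULL   /* hypothetical 64 KiB IOMMU window */

/* Hypothetical guest IOMMU: a 4 KiB page at 0x1000, a 16 KiB page at 0x4000. */
static ToyTLBEntry toy_translate(hwaddr addr)
{
    ToyTLBEntry e = { .iova = 0, .addr_mask = ~(hwaddr)0, .perm = 0 };

    if (addr >= 0x1000 && addr < 0x2000) {
        e.iova = 0x1000;
        e.addr_mask = 0xfff;
        e.perm = 3;
    } else if (addr >= 0x4000 && addr < 0x8000) {
        e.iova = 0x4000;
        e.addr_mask = 0x3fff;
        e.perm = 3;
    }
    return e;
}

/*
 * Walk the window and count the mappings that would be replayed.
 * *translate_calls receives the number of lookups performed.
 */
static unsigned toy_replay(hwaddr granularity, unsigned *translate_calls)
{
    unsigned notified = 0;
    hwaddr addr = 0;

    assert(granularity != 0);
    *translate_calls = 0;

    while (addr < TOY_REGION_SIZE) {
        ToyTLBEntry e = toy_translate(addr);
        hwaddr step = granularity;   /* fallback: the mask is ~0 here */

        (*translate_calls)++;
        if (e.perm != 0) {
            notified++;              /* a real notifier would be called here */
            /* Valid translation: skip to the end of this page. */
            step = (e.iova + e.addr_mask + 1) - addr;
        }

        /* Same wraparound guard as in the patch. */
        if (addr + step < addr) {
            break;
        }
        addr += step;
    }
    return notified;
}
```

With these toy mappings, toy_replay(0x1000, &calls) reports two
mappings using 13 lookups, where a fixed 4 KiB stride would need 16
lookups and would notify the 16 KiB mapping four times.  With a
granularity of 0x2000 the 4 KiB mapping at 0x1000 is never probed, which
is why the caller has to pick the granularity with care.
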
> > +        if (iotlb.perm != IOMMU_NONE) {
> > +            n->notify(n, &iotlb);
> > +        }
> > +
> > +        /* if (2^64 - MR size) < granularity, it's possible to get an
> > +         * infinite loop here.  This should catch such a wraparound */
> > +        if ((addr + granularity) < addr) {
> > +            break;
> > +        }
> > +    }
> > +}
> > +
> >  void memory_region_unregister_iommu_notifier(Notifier *n)
> >  {
> >      notifier_remove(n);
> >
>
> As my question is not about this particular patch but on another
> existing part, I can say:
>
> Reviewed-by: Laurent Vivier
>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

--8TaQrIeukR7mmbKf
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBAgAGBQJWDHVvAAoJEGw4ysog2bOStGMP/AgYj9wDubOAb+JFFz+uAP7j
BdD2TLhsVUHR36BgsU+a+N++jkz71I6+HD3tA3sk3neYwO/JuUPxMco7XQZrzdUy
WPgDGGef4dDmTJYLa1BL5EtUwulcESakFWGzhpyvEbpag8cn1fQnX/UeJ6Tmr3oA
aIKZiRzfgfpvFXkNbXopXg21iSvT4m6mq2PACLYW+GNePG1MoewZdggavJIkqeVp
lp+F6wXG48JShNYEMAuAVneKuoWd+ntNaWh4sq7UFmmUbqg1pwAemYsPP/RkiF7z
HQg+T/b/np2e31O+yGnWKP+0L0jAnpKiFx+sjuCAt6MEFoC4leOVySpZ7fXOK4g2
aVZdJYxajnet7Uuem6kae6vG1R8ywpmSQeLICRd0ZMjr5vXBkUPCuPvo3aaOkPXA
1RJEcka5jiJqQbV44xBVaCoU7bz78tCPjYREjb0BD5qRNLBdh6BHDgRtMK4lHHf/
Y4ACVcCCEkozeOOSFAMEvPQcScdkIpuM/XQU+DTb5ZTVHKGRf3Ug0w5D1+Y3rjI8
nXR5sQIkrrmoOOaWDlEfp4HUZVJbwmVcf4EWZUa6hwEXhKMoxWhlBZTkTFSKxKo5
mRv3UUNCEz4TMIXXXyeNbR67MKFRvLKeEc/3BUYV2TIn7nAR7RUipuy8ApGMZwfm
r7mTXIEFMdjqI9G9hsd1
=DBfi
-----END PGP SIGNATURE-----

--8TaQrIeukR7mmbKf--