From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:48625)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1ZGTsE-00048P-UV for qemu-devel@nongnu.org;
 Sat, 18 Jul 2015 11:18:00 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1ZGTsB-0004W1-NM for qemu-devel@nongnu.org;
 Sat, 18 Jul 2015 11:17:58 -0400
Date: Sun, 19 Jul 2015 01:17:53 +1000
From: David Gibson
Message-ID: <20150718151753.GB19189@voom.fritz.box>
References: <1436876514-2946-1-git-send-email-aik@ozlabs.ru>
 <1436876514-2946-5-git-send-email-aik@ozlabs.ru>
 <20150716051122.GA25179@voom.redhat.com>
 <55A8AB21.8080307@ozlabs.ru>
 <20150717133959.GG25179@voom.redhat.com>
 <55A92396.2030506@ozlabs.ru>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="98e8jtXdkpgskNou"
Content-Disposition: inline
In-Reply-To: <55A92396.2030506@ozlabs.ru>
Subject: Re: [Qemu-devel] [RFC PATCH qemu v3 4/4] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering)
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Alexey Kardashevskiy
Cc: Peter Crosthwaite , qemu-devel@nongnu.org, Michael Roth ,
 Alex Williamson , qemu-ppc@nongnu.org, Paolo Bonzini

--98e8jtXdkpgskNou
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, Jul 18, 2015 at 01:47:34AM +1000, Alexey Kardashevskiy wrote:
> On 07/17/2015 11:39 PM, David Gibson wrote:
> >On Fri, Jul 17, 2015 at 05:13:37PM +1000, Alexey Kardashevskiy wrote:
> >>On 07/16/2015 03:11 PM, David Gibson wrote:
> >>>On Tue, Jul 14, 2015 at 10:21:54PM +1000, Alexey Kardashevskiy wrote:
> >>>>This makes use of the new "memory registering" feature. The idea is
> >>>>to provide the userspace ability to notify the host kernel about pages
> >>>>which are going to be used for DMA.
> >>>>Having this information, the host
> >>>>kernel can pin them all once per user process, do locked pages
> >>>>accounting (once) and not spent time on doing that in real time with
> >>>>possible failures which cannot be handled nicely in some cases.
> >>>>
> >>>>This adds a guest RAM memory listener which notifies a VFIO container
> >>>>about memory which needs to be pinned/unpinned. VFIO MMIO regions
> >>>>(i.e. "skip dump" regions) are skipped.
> >>>>
> >>>>The feature is only enabled for SPAPR IOMMU v2. The host kernel changes
> >>>>are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does
> >>>>not call it when v2 is detected and enabled.
> >>>>
> >>>>This does not change the guest visible interface.
> >>>>
> >>>>Signed-off-by: Alexey Kardashevskiy
> >>>
> >>>I've looked at this in more depth now, and attempting to unify the
> >>>pre-reg and mapping listeners like this can't work - they need to be
> >>>listening on different address spaces: mapping actions need to be
> >>>listening on the PCI address space, whereas the pre-reg needs to be
> >>>listening on address_space_memory. For x86 - for now - those end up
> >>>being the same thing, but on Power they're not.
> >>>
> >>>We do need to be clear about what differences are due to the presence
> >>>of a guest IOMMU versus which are due to arch or underlying IOMMU
> >>>type. For now Power has a guest IOMMU and x86 doesn't, but that could
> >>>well change in future: we could well implement the guest side IOMMU
> >>>for x86 in future (or x86 could invent a paravirt IOMMU interface).
> >>>On the other side, BenH's experimental powernv machine type could
> >>>introduce Power machines without a guest side IOMMU (or at least an
> >>>optional guest side IOMMU).
> >>>
> >>>The quick and dirty approach here is:
> >>>  1. Leave the main listener as is
> >>>  2. Add a new pre-reg notifier to the spapr iommu specific code,
> >>>     which listens on address_space_memory, *not* the PCI space
> >>>
> >>>The more generally correct approach, which allows for more complex
> >>>IOMMU arrangements and the possibility of new IOMMU types with pre-reg
> >>>is:
> >>>  1. Have the core implement both a mapping listener and a pre-reg
> >>>     listener (optionally enabled by a per-iommu-type flag).
> >>>     Basically the first one sees what *is* mapped, the second sees
> >>>     what *could* be mapped.
> >>>
> >>>  2. As now, the mapping listener listens on PCI address space, if
> >>>     RAM blocks are added, immediately map them into the host IOMMU,
> >>>     if guest IOMMU blocks appear register a notifier which will
> >>>     mirror guest IOMMU mappings to the host IOMMU (this is what we
> >>>     do now).
> >>>
> >>>  3. The pre-reg listener also listens on the PCI address space. RAM
> >>>     blocks added are pre-registered immediately.
> >>
> >>
> >>PCI address space listeners won't be notified about RAM blocks on sPAPR.
> >
> >Sure they will - if any RAM blocks were mapped directly into PCI
> >address space, the listener would be notified. It's just that no RAM
> >blocks are directly mapped into PCI space, only partially mapped in
> >via IOMMU blocks.
>
> Right. No RAM blocks are mapped. So on *sPAPR* PCI AS listener won't be
> notified about *RAM*. But you say "they will". I am missing something here.

So, I'm thinking more generally than just the existing PAPR and x86
cases.

The current listener structure on the PCI address space is correct by
construction for all platforms. It handles the no guest IOMMU case
where RAM is mapped directly into PCI, and it handles the guest IOMMU
case. It handles the case where there isn't a guest IOMMU as such, but
some or all of RAM is mapped into PCI space at an offset.
It handles the case where a platform has a "bypass window" which has
all (or a large block) of RAM mapped with a simple offset, and another
window that has a paged IOMMU.

So what I'm saying is that the listener will see RAM blocks that are
actually mapped into PCI address space. That's exactly what we need
for the actual mapping calls. But for pre-reg we need to see RAM
blocks that *might* be mapped into the bus, and we want to see them
early.

> >But the idea is this scheme could handle a platform that has both a
> >"bypass" DMA window which maps directly onto a block of ram and an
> >IOMMU controlled DMA window. Or one which could have either setup
> >depending on circumstances (which is probably true of BenH's "powernv"
> >machine type).
> >
> >>>But, if guest
> >>>     IOMMU blocks are added, instead of registering a guest-iommu
> >>>     notifier,
> >>
> >>"guest-iommu notifier" is the one called via memory_region_notify_iommu()
> >>from H_PUT_TCE? "Instead" implies dropping it, how this can work?
> >
> >Because the other listener - the mapping listener at (2) handles that
> >part. The pre-reg listener doesn't.
>
>
> My bad, #2 included notifiers, right.
>
>
> >But as noted in by other mail this whole scheme doesn't work without a
> >way to discover an IOMMU region's target AS in advance, which doesn't
> >currently exist.
>
> We can add AS to IOMMU MR now, few lines of code :)

Not really, because it changes the semantics of what's possible with a
guest IOMMU. The current code allows the IOMMU to map in individual
pages from multiple target pools. That seems an unlikely thing to do
in practice, but I think it requires some thought before doing.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

--98e8jtXdkpgskNou
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBAgAGBQJVqm4hAAoJEGw4ysog2bOSqy4QALNSlwr7tWc5svBigsf4Yeqm
pbdew2tqy7lx1rFnkpuf2M0p8cO5zwKKsLWAX0GRs/Dx6GxZhEcx7icmmi3EnsN/
qAVaLVmqOQxo7XFaiT5czLbhRwr64r1l4PxO4LwsG8IhXZWfNS+zpXTdI5K+JrtH
XUc3Ugw3Qu86APfwxO/SXoHAK8oxRwfXy8FYDgMNSbDf7D/xCvaJF+SUi0cyYT5F
MQQB0RnMnZ7J7+I6mMhZrTzIGtcCFzBKVUER2cAlx10p6dN6EzzTRl6ifgTquW7F
OaQdE5YLeK/tnBZYp1K/8h60VaZo/Rzs1xY4UsM04Ng7lCJI47C8lgNaccZnV1LL
/beEK6amQWxkmQjB54vH+ZRzFCq7pPytk5aMgCZQpzstuq1uR7ShpCs+agLkPm+o
W/L0d9VBSuZ02APNcla0J8THw6hSLkcdrtnJbXRi8xBK8RPM4GVuTekYKR0NT49v
9coIh0RU6vSYm5XmUjBKazlpJQq8DhWH52h3nb/4ZEU9bkcqZnzLIt+zkNKp3A3W
17KtYPqmWBYWEd6Dxqq71I+HuwVFpDARPvIRzGBqG1kcOYaNGLhY6HNELMcfEViw
kUrKicZMdroIheVN6VFJKpiJgZHj5kMnFqc4NJ7t9FBlDhLVwFipbEbtl3fMaHY3
BNw7R+MYq9WYLut5FZrf
=zJ9T
-----END PGP SIGNATURE-----

--98e8jtXdkpgskNou--