From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47392) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZFy8G-0008RR-Dm for qemu-devel@nongnu.org; Fri, 17 Jul 2015 01:24:26 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZFy8D-0003zr-5k for qemu-devel@nongnu.org; Fri, 17 Jul 2015 01:24:24 -0400 Date: Fri, 17 Jul 2015 15:20:55 +1000 From: David Gibson Message-ID: <20150717052055.GF25179@voom.redhat.com> References: <1436876514-2946-1-git-send-email-aik@ozlabs.ru> <1436876514-2946-5-git-send-email-aik@ozlabs.ru> <20150716051122.GA25179@voom.redhat.com> <1437057899.1391.562.camel@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ahP6B03r4gLOj5uD" Content-Disposition: inline In-Reply-To: <1437057899.1391.562.camel@redhat.com> Subject: Re: [Qemu-devel] [RFC PATCH qemu v3 4/4] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alex Williamson Cc: Peter Crosthwaite , Alexey Kardashevskiy , qemu-devel@nongnu.org, Michael Roth , qemu-ppc@nongnu.org, Paolo Bonzini --ahP6B03r4gLOj5uD Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jul 16, 2015 at 08:44:59AM -0600, Alex Williamson wrote: > On Thu, 2015-07-16 at 15:11 +1000, David Gibson wrote: > > On Tue, Jul 14, 2015 at 10:21:54PM +1000, Alexey Kardashevskiy wrote: > > > This makes use of the new "memory registering" feature. The idea is > > > to provide the userspace ability to notify the host kernel about pages > > > which are going to be used for DMA. Having this information, the host > > > kernel can pin them all once per user process, do locked pages > > > accounting (once) and not spent time on doing that in real time with > > > possible failures which cannot be handled nicely in some cases. > > >=20 > > > This adds a guest RAM memory listener which notifies a VFIO container > > > about memory which needs to be pinned/unpinned. VFIO MMIO regions > > > (i.e. "skip dump" regions) are skipped. > > >=20 > > > The feature is only enabled for SPAPR IOMMU v2. The host kernel chang= es > > > are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this = does > > > not call it when v2 is detected and enabled. > > >=20 > > > This does not change the guest visible interface. > > >=20 > > > Signed-off-by: Alexey Kardashevskiy > >=20 > > I've looked at this in more depth now, and attempting to unify the > > pre-reg and mapping listeners like this can't work - they need to be > > listening on different address spaces: mapping actions need to be > > listening on the PCI address space, whereas the pre-reg needs to be > > listening on address_space_memory. For x86 - for now - those end up > > being the same thing, but on Power they're not. > >=20 > > We do need to be clear about what differences are due to the presence > > of a guest IOMMU versus which are due to arch or underlying IOMMU > > type. For now Power has a guest IOMMU and x86 doesn't, but that could > > well change in future: we could well implement the guest side IOMMU > > for x86 in future (or x86 could invent a paravirt IOMMU interface). > > On the other side, BenH's experimental powernv machine type could > > introduce Power machines without a guest side IOMMU (or at least an > > optional guest side IOMMU). > >=20 > > The quick and dirty approach here is: > > 1. Leave the main listener as is > > 2. Add a new pre-reg notifier to the spapr iommu specific code, > > which listens on address_space_memory, *not* the PCI space >=20 > It is dirty and that's exactly what I've been advising Alexey against > because we have entirely too much dirty spapr specific code that doesn't > need to be spapr specific. I don't see why separate address space > matters, that's done at the point of registering the listener and so far > doesn't play much a role in the actual listener behavior, just which > regions it sees. Well, there's two parts to this - the different address spaces means they need to be different listener instances. They also need different callbacks - or at least parameterized callback behaviour because they do different things (one maps, the other preregs). So I really don't see any sense in which these can be accomplished by the same listener. *Maybe* they could share some region walking code, but I'm not sure there's going to be anything of significant size here. > > The more generally correct approach, which allows for more complex > > IOMMU arrangements and the possibility of new IOMMU types with pre-reg > > is: > > 1. Have the core implement both a mapping listener and a pre-reg > > listener (optionally enabled by a per-iommu-type flag). > > Basically the first one sees what *is* mapped, the second sees > > what *could* be mapped. >=20 > This just sounds like different address spaces, address_space_memory vs > address_space_physical_memory Um.. what? I'm not even sure what you mean by address_space_physical_memory (I see no such thing in the source). The idea was that the (top level) pre-reg listener would spawn more listeners for any AS which could get (partially) mapped into the PCI addres space. But.. I looked a bit closer and realised this scheme doesn't actually work. IOMMU memory regions don't actually have a fixed target AS property (by which I mean the address space the IOMMU translates *into* rather than translates from - address_space_memory in most cases). Instead any individual IOMMU mapping can point to a different AS supplied in the IOMMUTLB structure. > > 2. As now, the mapping listener listens on PCI address space, if > > RAM blocks are added, immediately map them into the host IOMMU, > > if guest IOMMU blocks appear register a notifier which will > > mirror guest IOMMU mappings to the host IOMMU (this is what we > > do now). >=20 > Right, this is done now, nothing new required. Yes, I was just spelling that out for comparison with the other part. > > 3. The pre-reg listener also listens on the PCI address space. RAM > > blocks added are pre-registered immediately. But, if guest > > IOMMU blocks are added, instead of registering a guest-iommu > > notifier, we register another listener on the *target* AS of the > > guest IOMMU, same callbacks as this one. In practice that > > target AS will almost always resolve to address_space_memory, > > but this can at least in theory handle crazy guest setups with > > multiple layers of IOMMU. > >=20 > > 4. Have to ensure that the pre-reg callbacks always happen before > > the mapping calls. For a system with an IOMMU backend which > > requires pre-registration, but doesn't have a guest IOMMU, we > > need to pre-reg, then host-iommu-map RAM blocks that appear in > > PCI address space. > Both of these just seem like details of the iommu-type (pre-reg) > specific parts of the listener, I'm not understanding what's > fundamentally different about it that we can't do it now, within a > single listener, nor do I see how it's all that different from x86. So, we have two different address spaces to listen on (PCI and address_space_memory) so we need to different listener instances registered, clearly. Those two different listeners need to do different things. 1) maps memory blocks into VFIO and sets up a notifier to mirror guest IOMMU mappings into VFIO. 2) preregisters memory blocks, and it would be a bug if it ever saw a guest IOMMU. So, what's left to share? > x86 > mappings are more dynamic, but that's largely in legacy space for x86 > due to some bizarre compatibility things. x86 also tries to map mmio > regions for p2p, but these are just minor details to mappings that are > largely the same. Thanks, >=20 > Alex >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --ahP6B03r4gLOj5uD Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJVqJC3AAoJEGw4ysog2bOSkpcP/0jgJYJi77FEpzxpiYqdJB3g YoLhB3viDD5eNF/g5hEUZmczbOejD+diYQMWMtHxTd74IEhZm0zxoEy7xEe/U8ks ZjQI0tL/QJDBm+EiFEApzLNd9XFAnfcp2hkOf7ofqxWdVdXm6f44WRNY9agZqx7/ GmSGdLbLINXzBEuJQOGYZqO0V1wsH9nNyMFkO4aDmI8X/+tl1lNR7dkJIAdaxPyL nAe+97sXrdjnkIhftqbH3EWkIcVpYZG/WlWICooRxih2Tw4ijVeRXvEr15QfBOIP dC9YCodNeJVjy+Fnw84CI4mmnxdtZoP99OwL/m7ml0VxUSo7NR1I1p+LT9UMxmu6 4cFQFnib1t7hRH/5qjAVKOWCl/sA5gwL9Nq9KIwq243yXe+FFa8MrqQwdBo/05Ex /9R+7KjgVNMXAHVOe3LcLBXe+pmZWAHVpQT5pP2DbQL3DEE4BpQfshmpFU20Dxte PMaSBdInKGipoDe8nbNLT2d/WWLr3hXHGXBFoHOrzt1gf/A+2B9zBBtTqrPm5wa3 t5qJZ9WjJ7REzoGZ+TQX3+ZTYiLDrSh9RzS9SkoBKIPHJPd/wb6Vm7mBgLWseIA+ JXlHDaHdicIw01ac79jqtaYQo5TrpjeRrjNXzNAWDeqF+joyqWpuvsfNbG/77iKP bHWVgG80PLx4afiDSCjy =ikfA -----END PGP SIGNATURE----- --ahP6B03r4gLOj5uD--