From: Alexey Kardashevskiy
To: David Gibson
Cc: Peter Crosthwaite, qemu-devel@nongnu.org, Michael Roth, Alex Williamson, qemu-ppc@nongnu.org, Paolo Bonzini
Date: Sat, 18 Jul 2015 01:47:34 +1000
Subject: Re: [Qemu-devel] [RFC PATCH qemu v3 4/4] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering)
Message-ID: <55A92396.2030506@ozlabs.ru>
In-Reply-To: <20150717133959.GG25179@voom.redhat.com>
References: <1436876514-2946-1-git-send-email-aik@ozlabs.ru> <1436876514-2946-5-git-send-email-aik@ozlabs.ru> <20150716051122.GA25179@voom.redhat.com> <55A8AB21.8080307@ozlabs.ru> <20150717133959.GG25179@voom.redhat.com>

On 07/17/2015 11:39 PM, David Gibson wrote:
> On Fri, Jul 17, 2015 at 05:13:37PM +1000, Alexey Kardashevskiy wrote:
>> On 07/16/2015 03:11 PM, David Gibson wrote:
>>> On Tue, Jul 14, 2015 at 10:21:54PM +1000, Alexey Kardashevskiy wrote:
>>>> This makes use of the new "memory registering" feature. The idea is
>>>> to give userspace the ability to notify the host kernel about pages
>>>> which are going to be used for DMA.
>>>> Having this information, the host kernel can pin them all once per
>>>> user process, do locked pages accounting (once) and not spend time
>>>> doing that in real time with possible failures which cannot be
>>>> handled nicely in some cases.
>>>>
>>>> This adds a guest RAM memory listener which notifies a VFIO container
>>>> about memory which needs to be pinned/unpinned. VFIO MMIO regions
>>>> (i.e. "skip dump" regions) are skipped.
>>>>
>>>> The feature is only enabled for SPAPR IOMMU v2. The host kernel changes
>>>> are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does
>>>> not call it when v2 is detected and enabled.
>>>>
>>>> This does not change the guest visible interface.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy
>>>
>>> I've looked at this in more depth now, and attempting to unify the
>>> pre-reg and mapping listeners like this can't work: they need to be
>>> listening on different address spaces. Mapping actions need to be
>>> listening on the PCI address space, whereas the pre-reg needs to be
>>> listening on address_space_memory. For x86 (for now) those end up
>>> being the same thing, but on Power they're not.
>>>
>>> We do need to be clear about which differences are due to the presence
>>> of a guest IOMMU versus which are due to arch or underlying IOMMU
>>> type. For now Power has a guest IOMMU and x86 doesn't, but that could
>>> well change: we could implement the guest side IOMMU for x86 in
>>> future (or x86 could invent a paravirt IOMMU interface).
>>> On the other side, BenH's experimental powernv machine type could
>>> introduce Power machines without a guest side IOMMU (or at least an
>>> optional guest side IOMMU).
>>>
>>> The quick and dirty approach here is:
>>> 1. Leave the main listener as is
>>> 2. Add a new pre-reg notifier to the spapr iommu specific code,
>>>    which listens on address_space_memory, *not* the PCI space
>>>
>>> The more generally correct approach, which allows for more complex
>>> IOMMU arrangements and the possibility of new IOMMU types with pre-reg,
>>> is:
>>> 1. Have the core implement both a mapping listener and a pre-reg
>>>    listener (optionally enabled by a per-iommu-type flag).
>>>    Basically the first one sees what *is* mapped, the second sees
>>>    what *could* be mapped.
>>>
>>> 2. As now, the mapping listener listens on the PCI address space. If
>>>    RAM blocks are added, it immediately maps them into the host IOMMU;
>>>    if guest IOMMU blocks appear, it registers a notifier which will
>>>    mirror guest IOMMU mappings to the host IOMMU (this is what we
>>>    do now).
>>>
>>> 3. The pre-reg listener also listens on the PCI address space. RAM
>>>    blocks added are pre-registered immediately.
>>
>>
>> PCI address space listeners won't be notified about RAM blocks on sPAPR.
>
> Sure they will: if any RAM blocks were mapped directly into PCI
> address space, the listener would be notified. It's just that no RAM
> blocks are directly mapped into PCI space, only partially mapped in
> via IOMMU blocks.

Right. No RAM blocks are mapped. So on *sPAPR* the PCI AS listener won't be
notified about *RAM*. But you say "they will". I am missing something here.

> But the idea is this scheme could handle a platform that has both a
> "bypass" DMA window which maps directly onto a block of RAM and an
> IOMMU controlled DMA window. Or one which could have either setup
> depending on circumstances (which is probably true of BenH's "powernv"
> machine type).
>
>>>    But, if guest
>>>    IOMMU blocks are added, instead of registering a guest-iommu
>>>    notifier,
>>
>> Is the "guest-iommu notifier" the one called via memory_region_notify_iommu()
>> from H_PUT_TCE? "Instead" implies dropping it, so how can this work?
>
> Because the other listener, the mapping listener at (2), handles that
> part. The pre-reg listener doesn't.

My bad, #2 included notifiers, right.

> But as noted in my other mail, this whole scheme doesn't work without a
> way to discover an IOMMU region's target AS in advance, which doesn't
> currently exist.

We can add an AS to the IOMMU MR now; it is a few lines of code :)

>>>    we register another listener on the *target* AS of the
>>>    guest IOMMU, with the same callbacks as this one. In practice that
>>>    target AS will almost always resolve to address_space_memory,
>>>    but this can at least in theory handle crazy guest setups with
>>>    multiple layers of IOMMU.
>>>
>>> 4. We have to ensure that the pre-reg callbacks always happen before
>>>    the mapping calls. For a system with an IOMMU backend which
>>>    requires pre-registration, but doesn't have a guest IOMMU, we
>>>    need to pre-reg, then host-iommu-map RAM blocks that appear in
>>>    PCI address space.

-- 
Alexey
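A minimal C sketch of the two-listener scheme in steps 1 to 4 above, to make the ordering and the target-AS recursion concrete. It is not QEMU code and rests on several assumptions: Region, AddrSpace, Container and every function below are hypothetical stand-ins for QEMU memory regions, address spaces, listeners and the VFIO container; "registering a listener" is modelled as immediately walking the regions already present in an address space; and each IOMMU region is assumed to carry a pointer to its target AS, which is exactly the piece that does not currently exist. The pre-reg path follows IOMMU regions into their target AS and pins each RAM block once, the mapping path maps RAM directly or registers a guest IOMMU notifier, and an assert checks the step 4 rule that pre-registration happens before mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the scheme above; none of these names are QEMU API. */
typedef enum { REGION_RAM, REGION_IOMMU, REGION_MMIO_SKIP_DUMP } RegionKind;

struct AddrSpace;

typedef struct Region {
    RegionKind kind;
    const struct AddrSpace *target; /* IOMMU regions only: assumed link to the target AS */
} Region;

typedef struct AddrSpace {
    const Region **regions; /* regions currently present in this address space */
    size_t nr_regions;
} AddrSpace;

#define MAX_PREREG 16

/* Hypothetical per-container host state. */
typedef struct {
    bool needs_prereg;                /* e.g. a SPAPR IOMMU v2 backend */
    const Region *prereg[MAX_PREREG]; /* RAM blocks pre-registered (pinned) once */
    size_t nr_prereg;
    size_t nr_mapped;    /* RAM blocks mapped directly into the host IOMMU */
    size_t nr_notifiers; /* guest IOMMU notifiers registered */
} Container;

static bool is_preregistered(const Container *c, const Region *r)
{
    for (size_t i = 0; i < c->nr_prereg; i++) {
        if (c->prereg[i] == r) {
            return true;
        }
    }
    return false;
}

static void prereg_one(Container *c, const Region *r);

/* "Registering" the pre-reg listener on an AS replays the regions already in it. */
static void prereg_listen(Container *c, const AddrSpace *as)
{
    for (size_t i = 0; i < as->nr_regions; i++) {
        prereg_one(c, as->regions[i]);
    }
}

/* Pre-reg listener: sees what *could* be mapped. */
static void prereg_one(Container *c, const Region *r)
{
    switch (r->kind) {
    case REGION_RAM:
        /* RAM reachable through several windows is pinned only once. */
        if (!is_preregistered(c, r) && c->nr_prereg < MAX_PREREG) {
            c->prereg[c->nr_prereg++] = r;
        }
        break;
    case REGION_IOMMU:
        /* Step 3: no guest IOMMU notifier, listen on the target AS instead. */
        prereg_listen(c, r->target);
        break;
    case REGION_MMIO_SKIP_DUMP:
        break; /* device MMIO ("skip dump") is never pre-registered */
    }
}

/* Mapping listener: sees what *is* mapped. */
static void map_one(Container *c, const Region *r)
{
    switch (r->kind) {
    case REGION_RAM:
        /* Step 4: with a pre-reg backend, RAM must already be pinned. */
        assert(!c->needs_prereg || is_preregistered(c, r));
        c->nr_mapped++;
        break;
    case REGION_IOMMU:
        c->nr_notifiers++; /* would mirror guest IOMMU mappings to the host */
        break;
    case REGION_MMIO_SKIP_DUMP:
        break; /* ignored in this simplified sketch */
    }
}

/* A region appearing in the PCI AS: pre-reg callback first, then mapping. */
static void pci_region_add(Container *c, const Region *r)
{
    if (c->needs_prereg) {
        prereg_one(c, r);
    }
    map_one(c, r);
}

static void attach_pci_as(Container *c, const AddrSpace *pci_as)
{
    for (size_t i = 0; i < pci_as->nr_regions; i++) {
        pci_region_add(c, pci_as->regions[i]);
    }
}
```

With an sPAPR-like PCI AS holding only an IOMMU window onto system memory, attach_pci_as() pre-registers every RAM block, maps nothing directly and registers one guest IOMMU notifier. Adding a "bypass" window that points at one of those RAM blocks maps that block directly without pinning it twice. Calling the pre-reg callback first inside pci_region_add() is only this sketch's way of honouring step 4; in QEMU the ordering would presumably have to come from listener priorities, which is an assumption rather than something settled in this thread.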