From: Alexey Kardashevskiy
To: David Gibson
Cc: Alex Williamson, qemu-ppc@nongnu.org, qemu-devel@nongnu.org, Gavin Shan, Alexander Graf
Date: Wed, 24 Jun 2015 20:52:40 +1000
Subject: Re: [Qemu-devel] [PATCH qemu v8 00/14] spapr: vfio: Enable Dynamic DMA windows (DDW)
Message-ID: <558A8BF8.3080509@ozlabs.ru>
In-Reply-To: <20150623064442.GC13352@voom.redhat.com>
References: <1434627456-13745-1-git-send-email-aik@ozlabs.ru> <20150623064442.GC13352@voom.redhat.com>

On 06/23/2015 04:44 PM, David Gibson wrote:
> On Thu, Jun 18, 2015 at 09:37:22PM +1000, Alexey Kardashevskiy wrote:
>>
>> (cut-n-paste from kernel patchset)
>>
>> Each Partitionable Endpoint (IOMMU group) has an address range on a PCI bus
>> where devices are allowed to do DMA. These ranges are called DMA windows.
>> By default, there is a single DMA window, 1 or 2GB big, mapped at zero
>> on a PCI bus.
>>
>> PAPR defines a DDW RTAS API which allows pseries guests to query
>> the hypervisor about DDW support and capabilities (page size mask
>> for now).
>> A pseries guest may request an additional (to the default)
>> DMA window using this RTAS API.
>> The existing pseries Linux guests request an additional window as big as
>> the guest RAM and map the entire guest window, which effectively creates
>> a direct mapping of the guest memory to a PCI bus.
>>
>> This patchset reworks PPC64 IOMMU code and adds necessary structures
>> to support big windows.
>>
>> Once a Linux guest discovers the presence of DDW, it does:
>> 1. query the hypervisor about the number of available windows and page size masks;
>> 2. create a window with the biggest possible page size (today 4K/64K/16M);
>> 3. map the entire guest RAM via H_PUT_TCE* hypercalls;
>> 4. switch dma_ops to direct_dma_ops on the selected PE.
>>
>> Once this is done, H_PUT_TCE is not called anymore for 64bit devices and
>> the guest does not waste time on DMA map/unmap operations.
>>
>> Note that 32bit devices won't use DDW and will keep using the default
>> DMA window, so KVM optimizations will be required (to be posted later).
>>
>> This patchset adds DDW support for pseries. The host kernel changes are
>> required, posted as:
>>
>> [PATCH kernel v11 00/34] powerpc/iommu/vfio: Enable Dynamic DMA windows
>>
>> This patchset is based on git://github.com/dgibson/qemu.git spapr-next branch.
>
> A couple of general queries - this touches on the kernel part as well
> as the qemu part:
>
>  * Am I correct in thinking that the point in doing the
>    pre-registration stuff is to allow the kernel to handle PUT_TCE
>    in real mode?  i.e. that the advantage of doing preregistration
>    rather than accounting on the DMA_MAP and DMA_UNMAP itself only
>    appears once you have kernel KVM+VFIO acceleration?

Handling PUT_TCE involves two things:
1. get_user_pages_fast() and put_page();
2. updating locked_vm.

Both are tricky in real mode, but 2) is also tricky in virtual mode, as I
have to deal with multiple unrelated 32bit and 64bit windows (VFIO does not
care whether they belong to one or many processes) with IOMMU page
size == 4K, while gup/put_page work with 64K pages (our default page size
for the host kernel).

But yes, without keeping real mode handlers in mind, this thing could have
been made simpler.

>  * Do you have test numbers to show that it's still worthwhile to have
>    kernel acceleration once you have a guest using DDW?  With DDW in
>    play, even if PUT_TCE is slow, it should be called a lot less
>    often.

With DDW, the whole RAM is mapped once, at the guest's first
set_dma_mask(64bit) call, and that takes just a few PUT_TCE_INDIRECT calls.

If the guest uses DDW, real mode handlers cannot possibly beat it, and I
have reports that real mode handlers are noticeably slower than direct DMA
mapping (i.e. DDW) for 40Gb devices (10Gb seems to be fine, but I have not
tried a dozen guests yet).

> The reason I ask is that the preregistration handling is a pretty big
> chunk of code that inserts itself into some pretty core kernel data
> structures, all for one pretty specific use case.  We only want to do
> that if there's a strong justification for it.

Exactly. I periodically ask Ben and Paul whether we want to keep it, and
the answer is always yes :)

About "vfio: spapr: Move SPAPR-related code to a separate file" - I guess
I'd better remove it for now, right?

-- 
Alexey