From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: Alex Williamson <alex.williamson@redhat.com>,
qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
Gavin Shan <gwshan@linux.vnet.ibm.com>,
Alexander Graf <agraf@suse.de>
Subject: Re: [Qemu-devel] [PATCH qemu v8 00/14] spapr: vfio: Enable Dynamic DMA windows (DDW)
Date: Wed, 24 Jun 2015 20:52:40 +1000
Message-ID: <558A8BF8.3080509@ozlabs.ru>
In-Reply-To: <20150623064442.GC13352@voom.redhat.com>
On 06/23/2015 04:44 PM, David Gibson wrote:
> On Thu, Jun 18, 2015 at 09:37:22PM +1000, Alexey Kardashevskiy wrote:
>>
>> (cut-n-paste from kernel patchset)
>>
>> Each Partitionable Endpoint (IOMMU group) has an address range on a PCI bus
>> where devices are allowed to do DMA. These ranges are called DMA windows.
>> By default, there is a single DMA window, 1 or 2GB big, mapped at zero
>> on a PCI bus.
>>
>> PAPR defines a DDW RTAS API which allows pseries guests to
>> query the hypervisor about DDW support and capabilities (page size mask
>> for now). A pseries guest may request additional (to the default)
>> DMA windows using this RTAS API.
>> The existing pseries Linux guests request an additional window as big as
>> the guest RAM and map the entire guest RAM into it, which effectively
>> creates a direct mapping of the guest memory to a PCI bus.
>>
>> This patchset reworks PPC64 IOMMU code and adds necessary structures
>> to support big windows.
>>
>> Once a Linux guest discovers the presence of DDW, it does:
>> 1. query the hypervisor about the number of available windows and page size masks;
>> 2. create a window with the biggest possible page size (today 4K/64K/16M);
>> 3. map the entire guest RAM via H_PUT_TCE* hypercalls;
>> 4. switch dma_ops to direct_dma_ops on the selected PE.
>>
>> Once this is done, H_PUT_TCE is not called anymore for 64bit devices and
>> the guest does not waste time on DMA map/unmap operations.
>>
>> Note that 32bit devices won't use DDW and will keep using the default
>> DMA window so KVM optimizations will be required (to be posted later).
>>
>> This patchset adds DDW support for pseries. The host kernel changes are
>> required, posted as:
>>
>> [PATCH kernel v11 00/34] powerpc/iommu/vfio: Enable Dynamic DMA windows
>>
>> This patchset is based on git://github.com/dgibson/qemu.git spapr-next branch.
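To make steps 1-3 above concrete, here is a minimal C sketch of how a guest might pick the IOMMU page size and work out how many TCEs the window needs. It is not the Linux or QEMU code. The mask bits (0x1 = 4K, 0x2 = 64K, 0x4 = 16M) are an assumption modeled on the page sizes listed in step 2, rounding the window up to a power of two is also assumed, and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding of the DDW page-size mask (hypothetical constants). */
#define DDW_PGSZ_4K  0x1u
#define DDW_PGSZ_64K 0x2u
#define DDW_PGSZ_16M 0x4u

/* Step 2: shift of the biggest page size the hypervisor offers, 0 if none. */
static unsigned ddw_pick_page_shift(uint32_t mask)
{
    if (mask & DDW_PGSZ_16M)
        return 24;
    if (mask & DDW_PGSZ_64K)
        return 16;
    if (mask & DDW_PGSZ_4K)
        return 12;
    return 0;
}

/* Smallest power-of-two window (returned as a shift) covering ram_size bytes. */
static unsigned ddw_window_shift(uint64_t ram_size)
{
    unsigned shift = 0;
    while (shift < 63 && (1ULL << shift) < ram_size)
        shift++;
    return shift;
}

/* Step 3: number of TCEs (one per IOMMU page) needed to fill the window. */
static uint64_t ddw_num_tces(uint64_t ram_size, unsigned page_shift)
{
    assert(page_shift > 0 && page_shift < 64);
    unsigned wshift = ddw_window_shift(ram_size);
    return wshift > page_shift ? 1ULL << (wshift - page_shift) : 1;
}
```

For a 6GiB guest offered all three page sizes, this picks 16M pages (shift 24), an 8GiB window (shift 33), and 512 TCEs.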
>
> A couple of general queries - this touches on the kernel part as well
> as the qemu part:
>
> * Am I correct in thinking that the point in doing the
> pre-registration stuff is to allow the kernel to handle PUT_TCE
> in real mode? i.e. that the advantage of doing preregistration
> rather than accounting on the DMA_MAP and DMA_UNMAP itself only
> appears once you have kernel KVM+VFIO acceleration?
Handling PUT_TCE includes 2 things:
1. get_user_pages_fast() and put_page()
2. update locked_vm
Both are tricky in real mode, but 2) is also tricky in virtual mode, as I
have to deal with multiple unrelated 32bit and 64bit windows (VFIO does not
care if they belong to one or many processes) with an IOMMU page size of 4K,
while gup/put_page work with 64K pages (our default page size for the host kernel).
But yes, without keeping real mode handlers in mind, this thing could have
been made simpler.
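The locked_vm problem in 2) can be illustrated with a small C sketch. Assumptions: pre-registration is modeled as charging each 64K host page of a registered region to locked_vm exactly once. The per-TCE scheme it is compared against is a hypothetical naive alternative, not what VFIO actually does. All names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define HOST_PAGE_SHIFT  16  /* 64K host pages (assumed host config) */
#define IOMMU_PAGE_SHIFT 12  /* 4K IOMMU pages */

/* Host pages spanned by the user range [ua, ua + size). */
static uint64_t host_pages_spanned(uint64_t ua, uint64_t size)
{
    if (size == 0)
        return 0;
    assert(ua + size > ua);  /* the range must not wrap around */
    return ((ua + size - 1) >> HOST_PAGE_SHIFT) - (ua >> HOST_PAGE_SHIFT) + 1;
}

/* Assumed pre-registration model: charge the region's host pages once,
 * however many windows later map it. */
static uint64_t prereg_locked_pages(uint64_t ua, uint64_t size)
{
    return host_pages_spanned(ua, size);
}

/* Hypothetical naive scheme: charge one page per 4K TCE, separately in
 * every window that maps the same size bytes of memory. */
static uint64_t per_tce_locked_pages(unsigned nwindows, uint64_t size)
{
    return (uint64_t)nwindows * (size >> IOMMU_PAGE_SHIFT);
}
```

With 1GiB mapped through two windows using 4K TCEs, the naive count is 524288 pages against 16384 host pages for the pre-registered region. Partial overlap also matters: a 64K range starting at offset 0x8000 touches two host pages.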
> * Do you have test numbers to show that it's still worthwhile to have
> kernel acceleration once you have a guest using DDW? With DDW in
> play, even if PUT_TCE is slow, it should be called a lot less
> often.
With DDW, the whole RAM is mapped once, at the first set_dma_mask(64bit)
call made by the guest, and that takes just a few PUT_TCE_INDIRECT calls.
If the guest uses DDW, real mode handlers cannot possibly beat it, and I
have reports that real mode handlers are noticeably slower than direct DMA
mapping (i.e. DDW) for 40Gb devices (10Gb seems to be fine, but I have not
tried a dozen guests yet).
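Here is some back-of-the-envelope arithmetic behind "just a few PUT_TCE_INDIRECT calls", written as a tiny C helper. Assumption: one H_PUT_TCE_INDIRECT call passes at most 512 TCEs, i.e. a single 4K page of 8-byte entries. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-call limit: one 4K page holding 8-byte TCEs. */
#define TCES_PER_INDIRECT_CALL 512u

/* Hypothetical helper: H_PUT_TCE_INDIRECT calls needed to map ram_size
 * bytes using IOMMU pages of (1 << page_shift) bytes. */
static uint64_t put_tce_indirect_calls(uint64_t ram_size, unsigned page_shift)
{
    assert(page_shift < 64);
    uint64_t page = 1ULL << page_shift;
    uint64_t tces = (ram_size + page - 1) >> page_shift;  /* round up */
    return (tces + TCES_PER_INDIRECT_CALL - 1) / TCES_PER_INDIRECT_CALL;
}
```

Under that assumption, a 4GiB guest with 16M IOMMU pages needs a single call. 64GiB needs 8 calls with 16M pages but 2048 with 64K pages, which is why step 2 picks the biggest page size available.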
> The reason I ask is that the preregistration handling is a pretty big
> chunk of code that inserts itself into some pretty core kernel data
> structures, all for one pretty specific use case. We only want to do
> that if there's a strong justification for it.
Exactly. I keep asking Ben and Paul periodically if we want to keep it and
the answer is always yes :)
About "vfio: spapr: Move SPAPR-related code to a separate file": I guess I
had better remove it for now, right?
--
Alexey
Thread overview: 31+ messages
2015-06-18 11:37 [Qemu-devel] [PATCH qemu v8 00/14] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
2015-06-18 11:37 ` [Qemu-devel] [PATCH qemu v8 01/14] vmstate: Define VARRAY with VMS_ALLOC Alexey Kardashevskiy
2015-06-18 11:37 ` [Qemu-devel] [PATCH qemu v8 02/14] vfio: spapr: Move SPAPR-related code to a separate file Alexey Kardashevskiy
2015-06-18 21:10 ` Alex Williamson
2015-06-19 0:16 ` Alexey Kardashevskiy
2015-06-23 5:49 ` David Gibson
2015-06-18 11:37 ` [Qemu-devel] [PATCH qemu v8 03/14] spapr_pci_vfio: Enable multiple groups per container Alexey Kardashevskiy
2015-06-25 19:59 ` Alex Williamson
2015-06-30 3:32 ` Alexey Kardashevskiy
2015-06-18 11:37 ` [Qemu-devel] [PATCH qemu v8 04/14] spapr_pci: Convert finish_realize() to dma_capabilities_update()+dma_init_window() Alexey Kardashevskiy
2015-06-18 11:37 ` [Qemu-devel] [PATCH qemu v8 05/14] spapr_iommu: Move table allocation to helpers Alexey Kardashevskiy
2015-06-22 3:28 ` David Gibson
2015-06-18 11:37 ` [Qemu-devel] [PATCH qemu v8 06/14] spapr_iommu: Introduce "enabled" state for TCE table Alexey Kardashevskiy
2015-06-22 3:45 ` David Gibson
2015-06-18 11:37 ` [Qemu-devel] [PATCH qemu v8 07/14] spapr_iommu: Remove vfio_accel flag from sPAPRTCETable Alexey Kardashevskiy
2015-06-22 3:51 ` David Gibson
2015-06-18 11:37 ` [Qemu-devel] [PATCH qemu v8 08/14] spapr_iommu: Add root memory region Alexey Kardashevskiy
2015-06-18 11:37 ` [Qemu-devel] [PATCH qemu v8 09/14] spapr_pci: Do complete reset of DMA config when resetting PHB Alexey Kardashevskiy
2015-06-18 11:37 ` [Qemu-devel] [PATCH qemu v8 10/14] spapr_vfio_pci: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy
2015-06-22 4:41 ` David Gibson
2015-06-18 11:37 ` [Qemu-devel] [PATCH qemu v8 11/14] spapr_pci: Enable vfio-pci hotplug Alexey Kardashevskiy
2015-06-22 5:14 ` David Gibson
2015-06-18 11:37 ` [Qemu-devel] [PATCH qemu v8 12/14] linux headers update for DDW on SPAPR Alexey Kardashevskiy
2015-06-18 11:37 ` [Qemu-devel] [PATCH qemu v8 13/14] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering) Alexey Kardashevskiy
2015-06-18 11:37 ` [Qemu-devel] [PATCH qemu v8 14/14] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) Alexey Kardashevskiy
2015-06-23 6:38 ` David Gibson
2015-06-24 10:37 ` Alexey Kardashevskiy
2015-06-23 6:44 ` [Qemu-devel] [PATCH qemu v8 00/14] spapr: vfio: Enable Dynamic DMA windows (DDW) David Gibson
2015-06-24 10:52 ` Alexey Kardashevskiy [this message]
2015-06-25 19:59 ` Alex Williamson
2015-06-26 7:01 ` David Gibson