From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
	Gavin Shan <gwshan@linux.vnet.ibm.com>,
	Alexander Graf <agraf@suse.de>
Subject: Re: [Qemu-devel] [PATCH qemu v8 00/14] spapr: vfio: Enable Dynamic DMA windows (DDW)
Date: Wed, 24 Jun 2015 20:52:40 +1000
Message-ID: <558A8BF8.3080509@ozlabs.ru>
In-Reply-To: <20150623064442.GC13352@voom.redhat.com>

On 06/23/2015 04:44 PM, David Gibson wrote:
> On Thu, Jun 18, 2015 at 09:37:22PM +1000, Alexey Kardashevskiy wrote:
>>
>> (cut-n-paste from kernel patchset)
>>
>> Each Partitionable Endpoint (IOMMU group) has an address range on a PCI bus
>> where devices are allowed to do DMA. These ranges are called DMA windows.
>> By default, there is a single DMA window, 1 or 2GB big, mapped at zero
>> on a PCI bus.
>>
>> PAPR defines a DDW RTAS API which allows pseries guests to query
>> the hypervisor about DDW support and capabilities (page size mask
>> for now). A pseries guest may request additional DMA windows (on top
>> of the default one) using this RTAS API.
>> The existing pseries Linux guests request an additional window as big as
>> the guest RAM and map the entire window, which effectively creates a
>> direct mapping of guest memory to the PCI bus.
>>
>> This patchset reworks PPC64 IOMMU code and adds necessary structures
>> to support big windows.
>>
>> Once a Linux guest discovers the presence of DDW, it does:
>> 1. query hypervisor about number of available windows and page size masks;
>> 2. create a window with the biggest possible page size (today 4K/64K/16M);
>> 3. map the entire guest RAM via H_PUT_TCE* hypercalls;
>> 4. switch dma_ops to direct_dma_ops on the selected PE.
>>
>> Once this is done, H_PUT_TCE is not called anymore for 64bit devices and
>> the guest does not waste time on DMA map/unmap operations.
>>
>> Note that 32bit devices won't use DDW and will keep using the default
>> DMA window so KVM optimizations will be required (to be posted later).
>>
>> This patchset adds DDW support for pseries. The host kernel changes are
>> required, posted as:
>>
>> [PATCH kernel v11 00/34] powerpc/iommu/vfio: Enable Dynamic DMA windows
>>
>> This patchset is based on git://github.com/dgibson/qemu.git spapr-next branch.
>
> A couple of general queries - this touches on the kernel part as well
> as the qemu part:
>
>   * Am I correct in thinking that the point in doing the
>     pre-registration stuff is to allow the kernel to handle PUT_TCE
>     in real mode?  i.e. that the advantage of doing preregistration
>     rather than accounting on the DMA_MAP and DMA_UNMAP itself only
>     appears once you have kernel KVM+VFIO acceleration?


Handling PUT_TCE includes 2 things:
1. get_user_pages_fast() and put_page()
2. update locked_vm

Both are tricky in real mode, but 2) is also tricky in virtual mode as I 
have to deal with multiple unrelated 32bit and 64bit windows (VFIO does not 
care whether they belong to one or many processes), with an IOMMU page size 
of 4K while gup/put_page work with 64K pages (our default host kernel page 
size).

But yes, without keeping real mode handlers in mind, this thing could have 
been made simpler.


>   * Do you have test numbers to show that it's still worthwhile to have
>     kernel acceleration once you have a guest using DDW?  With DDW in
>     play, even if PUT_TCE is slow, it should be called a lot less
>     often.

With DDW, the whole of guest RAM is mapped once, when the guest first calls 
set_dma_mask(64bit); it takes just a few H_PUT_TCE_INDIRECT calls.

If the guest uses DDW, real mode handlers cannot possibly beat it, and I 
have reports that real mode handlers are noticeably slower than direct DMA 
mapping (i.e. DDW) for 40Gb devices (10Gb seems to be fine but I have not 
tried a dozen guests yet).


> The reason I ask is that the preregistration handling is a pretty big
> chunk of code that inserts itself into some pretty core kernel data
> structures, all for one pretty specific use case.  We only want to do
> that if there's a strong justification for it.

Exactly. I keep asking Ben and Paul periodically if we want to keep it and 
the answer is always yes :)


About "vfio: spapr: Move SPAPR-related code to a separate file" - I guess 
I'd be better off removing it for now, right?



-- 
Alexey

