qemu-devel.nongnu.org archive mirror
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Alexander Graf <agraf@suse.de>, qemu-devel@nongnu.org
Cc: Alex Williamson <alex.williamson@redhat.com>,
	qemu-ppc@nongnu.org, Gavin Shan <gwshan@linux.vnet.ibm.com>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [RFC PATCH v2 07/13] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls support
Date: Wed, 27 Aug 2014 23:56:04 +1000
Message-ID: <53FDE374.1040705@ozlabs.ru>
In-Reply-To: <53FDA6B0.6050408@suse.de>

On 08/27/2014 07:36 PM, Alexander Graf wrote:
> 
> 
> On 15.08.14 12:12, Alexey Kardashevskiy wrote:
>> This adds support for Dynamic DMA Windows (DDW) option defined by
>> the SPAPR specification which allows to have additional DMA window(s)
>> which can support page sizes other than 4K.
>>
>> The existing implementation of DDW in the guest tries to create one huge
>> DMA window with 64K or 16MB pages and map the entire guest RAM to. If it
>> succeeds, the guest switches to dma_direct_ops and never calls
>> TCE hypercalls (H_PUT_TCE,...) again. This enables VFIO devices to use
>> the entire RAM and not waste time on map/unmap.
>>
>> This adds 4 RTAS handlers:
>> * ibm,query-pe-dma-window
>> * ibm,create-pe-dma-window
>> * ibm,remove-pe-dma-window
>> * ibm,reset-pe-dma-window
>> These are registered from type_init() callback.
>>
>> These RTAS handlers are implemented in a separate file to avoid polluting
>> spapr_iommu.c with PHB.
>>
>> Since no PHB class implements new callback in this patch, no functional
>> change is expected.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>> Changes:
>> v2:
>> * double loop squashed to spapr_iommu_fixmask() helper
>> * added @ddw_num counter to PHB, it is used to generate LIOBN for new
>> window; it is reset on ddw-reset event
>> * added ULL to constants used in shift operations
>> * rtas_ibm_reset_pe_dma_window() and rtas_ibm_remove_pe_dma_window()
>> do not remove windows anymore, the PHB callback has to as it will reuse
>> the same code in case of guest reboot as well
>> ---
>>  hw/ppc/Makefile.objs        |   3 +
>>  hw/ppc/spapr_pci.c          |   3 +-
>>  hw/ppc/spapr_rtas_ddw.c     | 283 ++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/pci-host/spapr.h |  19 +++
>>  include/hw/ppc/spapr.h      |   6 +-
>>  trace-events                |   4 +
>>  6 files changed, 316 insertions(+), 2 deletions(-)
>>  create mode 100644 hw/ppc/spapr_rtas_ddw.c
>>
>> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
>> index edd44d0..9773294 100644
>> --- a/hw/ppc/Makefile.objs
>> +++ b/hw/ppc/Makefile.objs
>> @@ -7,6 +7,9 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o
>>  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
>>  obj-y += spapr_pci_vfio.o
>>  endif
>> +ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES), yy)
>> +obj-y += spapr_rtas_ddw.o
>> +endif
>>  # PowerPC 4xx boards
>>  obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
>>  obj-y += ppc4xx_pci.o
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index aa20c36..9b03d0d 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -759,7 +759,7 @@ static int spapr_pci_post_load(void *opaque, int version_id)
>>  
>>  static const VMStateDescription vmstate_spapr_pci = {
>>      .name = "spapr_pci",
>> -    .version_id = 2,
>> +    .version_id = 3,
>>      .minimum_version_id = 2,
>>      .pre_save = spapr_pci_pre_save,
>>      .post_load = spapr_pci_post_load,
>> @@ -775,6 +775,7 @@ static const VMStateDescription vmstate_spapr_pci = {
>>          VMSTATE_INT32(msi_devs_num, sPAPRPHBState),
>>          VMSTATE_STRUCT_VARRAY_ALLOC(msi_devs, sPAPRPHBState, msi_devs_num, 0,
>>                                      vmstate_spapr_pci_msi, spapr_pci_msi_mig),
>> +        VMSTATE_UINT32_V(ddw_num, sPAPRPHBState, 3),
>>          VMSTATE_END_OF_LIST()
>>      },
>>  };
>> diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c
>> new file mode 100644
>> index 0000000..2b5376a
>> --- /dev/null
>> +++ b/hw/ppc/spapr_rtas_ddw.c
>> @@ -0,0 +1,283 @@
>> +/*
>> + * QEMU sPAPR Dynamic DMA windows support
>> + *
>> + * Copyright (c) 2014 Alexey Kardashevskiy, IBM Corporation.
>> + *
>> + *  This program is free software; you can redistribute it and/or modify
>> + *  it under the terms of the GNU General Public License as published by
>> + *  the Free Software Foundation; either version 2 of the License,
>> + *  or (at your option) any later version.
>> + *
>> + *  This program is distributed in the hope that it will be useful,
>> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + *  GNU General Public License for more details.
>> + *
>> + *  You should have received a copy of the GNU General Public License
>> + *  along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include "hw/ppc/spapr.h"
>> +#include "hw/pci-host/spapr.h"
>> +#include "trace.h"
>> +
>> +static uint32_t spapr_iommu_fixmask(struct ppc_one_seg_page_size *sps,
>> +                                    uint32_t query_mask)
>> +{
>> +    int i, j;
>> +    uint32_t mask = 0;
>> +    const struct { int shift; uint32_t mask; } masks[] = {
>> +        { 12, DDW_PGSIZE_4K },
>> +        { 16, DDW_PGSIZE_64K },
>> +        { 24, DDW_PGSIZE_16M },
>> +        { 25, DDW_PGSIZE_32M },
>> +        { 26, DDW_PGSIZE_64M },
>> +        { 27, DDW_PGSIZE_128M },
>> +        { 28, DDW_PGSIZE_256M },
>> +        { 34, DDW_PGSIZE_16G },
>> +    };
>> +
>> +    for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
>> +        for (j = 0; j < ARRAY_SIZE(masks); ++j) {
>> +            if ((sps[i].page_shift == masks[j].shift) &&
>> +                    (query_mask & masks[j].mask)) {
>> +                mask |= masks[j].mask;
>> +            }
>> +        }
>> +    }
>> +
>> +    return mask;
>> +}
>> +
>> +static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu,
>> +                                         sPAPREnvironment *spapr,
>> +                                         uint32_t token, uint32_t nargs,
>> +                                         target_ulong args,
>> +                                         uint32_t nret, target_ulong rets)
>> +{
>> +    CPUPPCState *env = &cpu->env;
>> +    sPAPRPHBState *sphb;
>> +    sPAPRPHBClass *spc;
>> +    uint64_t buid;
>> +    uint32_t addr, pgmask = 0;
>> +    uint32_t windows_available = 0, page_size_mask = 0;
>> +    long ret;
>> +
>> +    if ((nargs != 3) || (nret != 5)) {
>> +        goto param_error_exit;
>> +    }
>> +
>> +    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
>> +    addr = rtas_ld(args, 0);
>> +    sphb = spapr_pci_find_phb(spapr, buid);
>> +    if (!sphb) {
>> +        goto param_error_exit;
>> +    }
>> +
>> +    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
>> +    if (!spc->ddw_query) {
>> +        goto hw_error_exit;
>> +    }
>> +
>> +    ret = spc->ddw_query(sphb, &windows_available, &page_size_mask);
>> +    trace_spapr_iommu_ddw_query(buid, addr, windows_available,
>> +                                page_size_mask, pgmask, ret);
>> +    if (ret) {
>> +        goto hw_error_exit;
>> +    }
>> +
>> +    /* Work out supported page masks */
>> +    pgmask = spapr_iommu_fixmask(env->sps.sps, page_size_mask);
>> +
>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +    rtas_st(rets, 1, windows_available);
>> +
>> +    /*
>> +     * This is "Largest contiguous block of TCEs allocated specifically
>> +     * for (that is, are reserved for) this PE".
>> +     * Return the maximum number as all RAM was in 4K pages.
>> +     */
>> +    rtas_st(rets, 2, ram_size >> SPAPR_TCE_PAGE_SHIFT);
>> +    rtas_st(rets, 3, pgmask);
>> +    rtas_st(rets, 4, pgmask); /* DMA migration mask */
>> +    return;
>> +
>> +hw_error_exit:
>> +    rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>> +    return;
>> +
>> +param_error_exit:
>> +    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +}
>> +
>> +static void rtas_ibm_create_pe_dma_window(PowerPCCPU *cpu,
>> +                                          sPAPREnvironment *spapr,
>> +                                          uint32_t token, uint32_t nargs,
>> +                                          target_ulong args,
>> +                                          uint32_t nret, target_ulong rets)
>> +{
>> +    sPAPRPHBState *sphb;
>> +    sPAPRPHBClass *spc;
>> +    sPAPRTCETable *tcet = NULL;
>> +    uint32_t addr, page_shift, window_shift, liobn;
>> +    uint64_t buid;
>> +    long ret;
>> +
>> +    if ((nargs != 5) || (nret != 4)) {
>> +        goto param_error_exit;
>> +    }
>> +
>> +    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
>> +    addr = rtas_ld(args, 0);
>> +    sphb = spapr_pci_find_phb(spapr, buid);
>> +    if (!sphb) {
>> +        goto param_error_exit;
>> +    }
>> +
>> +    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
>> +    if (!spc->ddw_create) {
>> +        goto hw_error_exit;
>> +    }
>> +
>> +    page_shift = rtas_ld(args, 3);
>> +    window_shift = rtas_ld(args, 4);
>> +    /* Default 32bit window#0 is always there so +1 */
>> +    liobn = SPAPR_PCI_LIOBN(sphb->index, sphb->ddw_num + 1);
> 
> What if you just initialize ddw_num to 1 on init?

All of this DDW code needs reworking again.

POWER8 allows exactly 2 DMA windows per PE; their start addresses are
fixed and determined by bit 59 of the PCI address.
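
To make the addressing concrete, here is a minimal sketch of how bit 59
would pick one of the two windows. The names ddw_window_index() and
ddw_window_start() are made up for illustration and are not part of the
patch, and the sketch assumes "bit 59" is counted from the least
significant bit (bit 0 = LSB):

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: "bit 59" counts from the least significant bit (bit 0 = LSB). */
#define DDW_WINDOW_SELECT_BIT 59
#define DDW_HIGH_WINDOW_START (1ULL << DDW_WINDOW_SELECT_BIT)

/* Hypothetical helper: 0 for the window starting at zero, 1 for the high one. */
static inline unsigned ddw_window_index(uint64_t pci_addr)
{
    return (unsigned)((pci_addr >> DDW_WINDOW_SELECT_BIT) & 1);
}

/* Hypothetical helper: the fixed bus address at which window 0 or 1 starts. */
static inline uint64_t ddw_window_start(unsigned index)
{
    assert(index < 2); /* exactly two windows per PE */
    return (uint64_t)index << DDW_WINDOW_SELECT_BIT;
}
```

Under that assumption the second window starts at 0x0800000000000000,
and any PCI address with bit 59 set falls into it.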

Normally guests just create an additional huge window and that's it. They
do not care whether QEMU advertises the "ddw-reset" RTAS token, and it is
all simple and nice.

But sles11sp3 is awesomely smart: if it sees the "ddw-reset" token, it
removes the default window and then creates a huge window. I can support
that, but guests (sles11 and newer) do not accept a huge window starting
from zero; they treat it as a failure (PAPR does not, and when I hacked
the guest to accept it, it worked).

So I'll do the first version without "reset".

However, "ddw-reset" is mandatory if we want to be PAPR 2.7 compliant. So
I'll probably allow removing the default DMA window, but when the guest
asks for a huge one, I will create the second window, which starts high,
and hope that if the guest ever asks for a second window, it will be ready
for it to start from zero :)
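
Roughly, the allocation policy I have in mind could look like the sketch
below. This is only an illustration of the plan, not code from the patch:
the pe_windows type and the pe_*() helpers are hypothetical, and the high
window start again assumes bit 59 counted from the LSB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PE_MAX_WINDOWS    2
#define HIGH_WINDOW_START (1ULL << 59) /* assumes bit 0 is the LSB */

/* Hypothetical per-PE state: slot 0 starts at zero, slot 1 starts high. */
typedef struct {
    bool in_use[PE_MAX_WINDOWS];
} pe_windows;

/* After reset only the default 32-bit window (slot 0) exists. */
static void pe_windows_reset(pe_windows *pe)
{
    pe->in_use[0] = true;
    pe->in_use[1] = false;
}

/* Remove a window; returns false if the slot is invalid or already free. */
static bool pe_remove_window(pe_windows *pe, unsigned slot)
{
    if (slot >= PE_MAX_WINDOWS || !pe->in_use[slot]) {
        return false;
    }
    pe->in_use[slot] = false;
    return true;
}

/*
 * Create a window, preferring the high slot even when slot 0 is free.
 * Slot 0 (start address zero) is used only when the high slot is taken.
 * Returns the slot index and stores the start address, or -1 if full.
 */
static int pe_create_window(pe_windows *pe, uint64_t *start)
{
    static const unsigned order[PE_MAX_WINDOWS] = { 1, 0 };
    unsigned i;

    for (i = 0; i < PE_MAX_WINDOWS; i++) {
        unsigned slot = order[i];

        if (!pe->in_use[slot]) {
            pe->in_use[slot] = true;
            *start = slot ? HIGH_WINDOW_START : 0;
            return (int)slot;
        }
    }
    return -1;
}
```

With this ordering, a guest that removes the default window and then asks
for a huge one still gets a window that starts high; only a later, second
request would land at zero.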



-- 
Alexey

