From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Thomas Huth <thuth@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>,
qemu-devel@nongnu.org, Gavin Shan <gwshan@linux.vnet.ibm.com>,
Alex Williamson <alex.williamson@redhat.com>,
qemu-ppc@nongnu.org, David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [PATCH qemu v10 14/14] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)
Date: Tue, 7 Jul 2015 20:43:44 +1000 [thread overview]
Message-ID: <559BAD60.50701@ozlabs.ru> (raw)
In-Reply-To: <20150707113335.66821a88@thh440s>
On 07/07/2015 07:33 PM, Thomas Huth wrote:
> On Mon, 6 Jul 2015 12:11:10 +1000
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
>> This adds support for Dynamic DMA Windows (DDW) option defined by
>> the SPAPR specification which allows to have additional DMA window(s)
>>
>> This implements DDW for emulated and VFIO devices. As all TCE root regions
>> are mapped at 0 and 64bit long (and actual tables are child regions),
>> this replaces memory_region_add_subregion() with _overlap() to make
>> QEMU memory API happy.
>>
>> This reserves RTAS token numbers for DDW calls.
>>
>> This implements helpers to interact with VFIO kernel interface.
>>
>> This changes the TCE table migration descriptor to support dynamic
>> tables as from now on, PHB will create as many stub TCE table objects
>> as PHB can possibly support but not all of them might be initialized at
>> the time of migration because DDW might or might not be requested by
>> the guest.
>>
>> The "ddw" property is enabled by default on a PHB but for compatibility
>> the pseries-2.3 machine and older disable it.
>>
>> This implements DDW for VFIO. The host kernel support is required.
>> This adds a "levels" property to PHB to control the number of levels
>> in the actual TCE table allocated by the host kernel, 0 is the default
>> value to tell QEMU to calculate the correct value. Current hardware
>> supports up to 5 levels.
>>
>> The existing linux guests try creating one additional huge DMA window
>> with 64K or 16MB pages and map the entire guest RAM to. If succeeded,
>> the guest switches to dma_direct_ops and never calls TCE hypercalls
>> (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
>> and not waste time on map/unmap later. This adds a "dma64_win_addr"
>> property which is a bus address for the 64bit window and by default
>> set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardware
>> uses and this allows having emulated and VFIO devices on the same bus.
>>
>> This adds 4 RTAS handlers:
>> * ibm,query-pe-dma-window
>> * ibm,create-pe-dma-window
>> * ibm,remove-pe-dma-window
>> * ibm,reset-pe-dma-window
>> These are registered from type_init() callback.
>>
>> These RTAS handlers are implemented in a separate file to avoid polluting
>> spapr_iommu.c with PCI.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
> ...
>> diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c
>> new file mode 100644
>> index 0000000..7539c6a
>> --- /dev/null
>> +++ b/hw/ppc/spapr_rtas_ddw.c
>> @@ -0,0 +1,300 @@
>> +/*
>> + * QEMU sPAPR Dynamic DMA windows support
>> + *
>> + * Copyright (c) 2014 Alexey Kardashevskiy, IBM Corporation.
>
> Happy new year?
>
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License,
>> + * or (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include "qemu/error-report.h"
>> +#include "hw/ppc/spapr.h"
>> +#include "hw/pci-host/spapr.h"
>> +#include "trace.h"
>> +
>> +static int spapr_phb_get_active_win_num_cb(Object *child, void *opaque)
>> +{
>> + sPAPRTCETable *tcet;
>> +
>> + tcet = (sPAPRTCETable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE);
>> + if (tcet && tcet->enabled) {
>> + ++*(unsigned *)opaque;
>> + }
>> + return 0;
>> +}
>> +
>> +static unsigned spapr_phb_get_active_win_num(sPAPRPHBState *sphb)
>> +{
>> + unsigned ret = 0;
>> +
>> + object_child_foreach(OBJECT(sphb), spapr_phb_get_active_win_num_cb, &ret);
>> +
>> + return ret;
>> +}
>> +
>> +static int spapr_phb_get_free_liobn_cb(Object *child, void *opaque)
>> +{
>> + sPAPRTCETable *tcet;
>> +
>> + tcet = (sPAPRTCETable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE);
>> + if (tcet && !tcet->enabled) {
>> + *(uint32_t *)opaque = tcet->liobn;
>> + return 1;
>> + }
>> + return 0;
>> +}
>> +
>> +static unsigned spapr_phb_get_free_liobn(sPAPRPHBState *sphb)
>> +{
>> + uint32_t liobn = 0;
>> +
>> + object_child_foreach(OBJECT(sphb), spapr_phb_get_free_liobn_cb, &liobn);
>> +
>> + return liobn;
>> +}
>> +
>> +static uint32_t spapr_query_mask(struct ppc_one_seg_page_size *sps,
>> + uint64_t page_mask)
>> +{
>> + int i, j;
>> + uint32_t mask = 0;
>> + const struct { int shift; uint32_t mask; } masks[] = {
>> + { 12, RTAS_DDW_PGSIZE_4K },
>> + { 16, RTAS_DDW_PGSIZE_64K },
>> + { 24, RTAS_DDW_PGSIZE_16M },
>> + { 25, RTAS_DDW_PGSIZE_32M },
>> + { 26, RTAS_DDW_PGSIZE_64M },
>> + { 27, RTAS_DDW_PGSIZE_128M },
>> + { 28, RTAS_DDW_PGSIZE_256M },
>> + { 34, RTAS_DDW_PGSIZE_16G },
>> + };
>> +
>> + for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
>> + for (j = 0; j < ARRAY_SIZE(masks); ++j) {
>> + if ((sps[i].page_shift == masks[j].shift) &&
>> + (page_mask & (1ULL << masks[j].shift))) {
>> + mask |= masks[j].mask;
>> + }
>> + }
>> + }
>> +
>> + return mask;
>> +}
>> +
>> +static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu,
>> + sPAPRMachineState *spapr,
>> + uint32_t token, uint32_t nargs,
>> + target_ulong args,
>> + uint32_t nret, target_ulong rets)
>> +{
>> + CPUPPCState *env = &cpu->env;
>> + sPAPRPHBState *sphb;
>> + uint64_t buid;
>> + uint32_t avail, addr, pgmask = 0;
>> + unsigned current;
>> +
>> + if ((nargs != 3) || (nret != 5)) {
>> + goto param_error_exit;
>> + }
>> +
>> + buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
>> + addr = rtas_ld(args, 0);
>> + sphb = spapr_pci_find_phb(spapr, buid);
>> + if (!sphb || !sphb->ddw_enabled) {
>> + goto param_error_exit;
>> + }
>> +
>> + current = spapr_phb_get_active_win_num(sphb);
>> + avail = (sphb->windows_supported > current) ?
>> + (sphb->windows_supported - current) : 0;
>> +
>> + /* Work out supported page masks */
>> + pgmask = spapr_query_mask(env->sps.sps, sphb->page_size_mask);
>> +
>> + rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> + rtas_st(rets, 1, avail);
>> +
>> + /*
>> + * This is "Largest contiguous block of TCEs allocated specifically
>> + * for (that is, are reserved for) this PE".
>> + * Return the maximum number as all RAM was in 4K pages.
>> + */
>> + rtas_st(rets, 2, sphb->dma64_window_size >> SPAPR_TCE_PAGE_SHIFT);
>> + rtas_st(rets, 3, pgmask);
>> + rtas_st(rets, 4, 0); /* DMA migration mask, not supported */
>> +
>> + trace_spapr_iommu_ddw_query(buid, addr, avail, sphb->dma64_window_size,
>> + pgmask);
>> + return;
>> +
>> +param_error_exit:
>> + rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +}
>> +
>> +static void rtas_ibm_create_pe_dma_window(PowerPCCPU *cpu,
>> + sPAPRMachineState *spapr,
>> + uint32_t token, uint32_t nargs,
>> + target_ulong args,
>> + uint32_t nret, target_ulong rets)
>> +{
>> + sPAPRPHBState *sphb;
>> + sPAPRTCETable *tcet = NULL;
>> + uint32_t addr, page_shift, window_shift, liobn;
>> + uint64_t buid;
>> + long ret;
>> +
>> + if ((nargs != 5) || (nret != 4)) {
>
> Pascal bracket style again :-(
Am I breaking any code design guideline here?
>
>> + goto param_error_exit;
>> + }
>> +
>> + buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
But here braces are ok? :-/
>> + addr = rtas_ld(args, 0);
>> + sphb = spapr_pci_find_phb(spapr, buid);
>> + if (!sphb || !sphb->ddw_enabled) {
>> + goto param_error_exit;
>> + }
>> +
>> + page_shift = rtas_ld(args, 3);
>> + window_shift = rtas_ld(args, 4);
>> + liobn = spapr_phb_get_free_liobn(sphb);
>> +
>> + if (!liobn || !(sphb->page_size_mask & (1ULL << page_shift))) {
>> + goto hw_error_exit;
>> + }
>> +
>> + ret = spapr_phb_dma_init_window(sphb, liobn, page_shift,
>> + 1ULL << window_shift);
>
> As already mentioned in a comment to another patch in this series, I
> think it maybe might be better to do some sanity checks on the
> window_shift value, too?
Well, as you suggested, I added a check to spapr_phb_dma_init_window()
which makes this code return RTAS_OUT_HW_ERROR. Or I can add this here:
if (window_shift < page_shift) {
goto param_error_exit;
}
and RTAS handler will return RTAS_OUT_PARAM_ERROR.
SPAPR does not say what is the correct reponse in this case...
>
>> + tcet = spapr_tce_find_by_liobn(liobn);
>> + trace_spapr_iommu_ddw_create(buid, addr, 1ULL << page_shift,
>> + 1ULL << window_shift,
>> + tcet ? tcet->bus_offset : 0xbaadf00d,
>> + liobn, ret);
>> + if (ret || !tcet) {
>> + goto hw_error_exit;
>> + }
>> +
>> + rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> + rtas_st(rets, 1, liobn);
>> + rtas_st(rets, 2, tcet->bus_offset >> 32);
>> + rtas_st(rets, 3, tcet->bus_offset & ((uint32_t) -1));
>
> Why don't you simply use 0xffffffff instead of ((uint32_t) -1) ?
> That's shorter and much easier to understand at a first glance than
> calulating the type-cast in your brain ;-)
At a first glance I cannot tell if there are 7 or 8 or 9 "f"s in
0xffffffff. I may accidentally add/remove one "f" and nobody will notice.
Such typecast of (-1) is quite typical.
>
>> +
>> + return;
>> +
>> +hw_error_exit:
>> + rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>> + return;
>> +
>> +param_error_exit:
>> + rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +}
>> +
>> +static void rtas_ibm_remove_pe_dma_window(PowerPCCPU *cpu,
>> + sPAPRMachineState *spapr,
>> + uint32_t token, uint32_t nargs,
>> + target_ulong args,
>> + uint32_t nret, target_ulong rets)
>> +{
>> + sPAPRPHBState *sphb;
>> + sPAPRTCETable *tcet;
>> + uint32_t liobn;
>> + long ret;
>> +
>> + if ((nargs != 1) || (nret != 1)) {
>
> (╯°□°)╯︵ ┻━┻
>
>> + goto param_error_exit;
>> + }
>> +
>> + liobn = rtas_ld(args, 0);
>> + tcet = spapr_tce_find_by_liobn(liobn);
>> + if (!tcet) {
>> + goto param_error_exit;
>> + }
>> +
>> + sphb = SPAPR_PCI_HOST_BRIDGE(OBJECT(tcet)->parent);
>> + if (!sphb || !sphb->ddw_enabled) {
>> + goto param_error_exit;
>> + }
>> +
>> + ret = spapr_phb_dma_remove_window(sphb, tcet);
>> + trace_spapr_iommu_ddw_remove(liobn, ret);
>> + if (ret) {
>> + goto hw_error_exit;
>> + }
>> +
>> + rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> + return;
>> +
>> +hw_error_exit:
>> + rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>> + return;
>> +
>> +param_error_exit:
>> + rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +}
>> +
>> +static void rtas_ibm_reset_pe_dma_window(PowerPCCPU *cpu,
>> + sPAPRMachineState *spapr,
>> + uint32_t token, uint32_t nargs,
>> + target_ulong args,
>> + uint32_t nret, target_ulong rets)
>> +{
>> + sPAPRPHBState *sphb;
>> + uint64_t buid;
>> + uint32_t addr;
>> + long ret;
>> +
>> + if ((nargs != 3) || (nret != 1)) {
>
> ┬─┬ ︵ /(.□. \)
>
>> + goto param_error_exit;
>> + }
>> +
>> + buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
>> + addr = rtas_ld(args, 0);
>> + sphb = spapr_pci_find_phb(spapr, buid);
>> + if (!sphb || !sphb->ddw_enabled) {
>> + goto param_error_exit;
>> + }
>> +
>> + ret = spapr_phb_dma_reset(sphb);
>> + trace_spapr_iommu_ddw_reset(buid, addr, ret);
>> + if (ret) {
>> + goto hw_error_exit;
>> + }
>> +
>> + rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +
>> + return;
>> +
>> +hw_error_exit:
>> + rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>> + return;
>> +
>> +param_error_exit:
>> + rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +}
>> +
>> +static void spapr_rtas_ddw_init(void)
>> +{
>> + spapr_rtas_register(RTAS_IBM_QUERY_PE_DMA_WINDOW,
>> + "ibm,query-pe-dma-window",
>> + rtas_ibm_query_pe_dma_window);
>> + spapr_rtas_register(RTAS_IBM_CREATE_PE_DMA_WINDOW,
>> + "ibm,create-pe-dma-window",
>> + rtas_ibm_create_pe_dma_window);
>> + spapr_rtas_register(RTAS_IBM_REMOVE_PE_DMA_WINDOW,
>> + "ibm,remove-pe-dma-window",
>> + rtas_ibm_remove_pe_dma_window);
>> + spapr_rtas_register(RTAS_IBM_RESET_PE_DMA_WINDOW,
>> + "ibm,reset-pe-dma-window",
>> + rtas_ibm_reset_pe_dma_window);
>> +}
>> +
>> +type_init(spapr_rtas_ddw_init)
>
> Thomas
>
--
Alexey
next prev parent reply other threads:[~2015-07-07 10:43 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-06 2:10 [Qemu-devel] [PATCH qemu v10 00/14] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
2015-07-06 2:10 ` [Qemu-devel] [PATCH qemu v10 01/14] linux-headers: Update to 4.2-rc1 Alexey Kardashevskiy
2015-07-06 11:18 ` Paolo Bonzini
2015-07-06 2:10 ` [Qemu-devel] [PATCH qemu v10 02/14] vmstate: Define VARRAY with VMS_ALLOC Alexey Kardashevskiy
2015-07-06 14:21 ` Thomas Huth
2015-07-06 2:10 ` [Qemu-devel] [PATCH qemu v10 03/14] spapr_pci: Convert finish_realize() to dma_capabilities_update()+dma_init_window() Alexey Kardashevskiy
2015-07-06 16:41 ` Laurent Vivier
2015-07-07 0:28 ` Alexey Kardashevskiy
2015-07-06 2:11 ` [Qemu-devel] [PATCH qemu v10 04/14] spapr_iommu: Move table allocation to helpers Alexey Kardashevskiy
2015-07-06 15:14 ` Thomas Huth
2015-07-06 15:43 ` Alexey Kardashevskiy
2015-07-06 2:11 ` [Qemu-devel] [PATCH qemu v10 05/14] spapr_iommu: Introduce "enabled" state for TCE table Alexey Kardashevskiy
2015-07-06 10:07 ` David Gibson
2015-07-06 17:04 ` Thomas Huth
2015-07-06 2:11 ` [Qemu-devel] [PATCH qemu v10 06/14] spapr_iommu: Remove vfio_accel flag from sPAPRTCETable Alexey Kardashevskiy
2015-07-06 16:45 ` Laurent Vivier
2015-07-06 17:11 ` Thomas Huth
2015-07-06 2:11 ` [Qemu-devel] [PATCH qemu v10 07/14] spapr_iommu: Add root memory region Alexey Kardashevskiy
2015-07-06 19:15 ` Thomas Huth
2015-07-06 2:11 ` [Qemu-devel] [PATCH qemu v10 08/14] spapr_pci: Do complete reset of DMA config when resetting PHB Alexey Kardashevskiy
2015-07-06 2:11 ` [Qemu-devel] [PATCH qemu v10 09/14] spapr_vfio_pci: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy
2015-07-06 21:13 ` Thomas Huth
2015-07-06 2:11 ` [Qemu-devel] [PATCH qemu v10 10/14] spapr_pci: Enable vfio-pci hotplug Alexey Kardashevskiy
2015-07-06 10:27 ` David Gibson
2015-07-06 21:31 ` Thomas Huth
2015-07-07 9:28 ` Alexey Kardashevskiy
2015-07-10 21:33 ` Michael Roth
2015-07-12 4:59 ` Alexey Kardashevskiy
2015-07-12 14:41 ` Michael Roth
2015-07-13 1:10 ` David Gibson
2015-07-13 7:06 ` Alexey Kardashevskiy
2015-07-06 2:11 ` [Qemu-devel] [PATCH qemu v10 11/14] spapr_pci_vfio: Enable multiple groups per container Alexey Kardashevskiy
2015-07-07 7:02 ` Thomas Huth
2015-07-06 2:11 ` [Qemu-devel] [PATCH qemu v10 12/14] vfio: Unregister IOMMU notifiers when container is destroyed Alexey Kardashevskiy
2015-07-06 10:33 ` David Gibson
2015-07-06 12:49 ` Alex Williamson
2015-07-06 12:59 ` Alexey Kardashevskiy
2015-07-06 13:45 ` Alex Williamson
2015-07-06 2:11 ` [Qemu-devel] [PATCH qemu v10 13/14] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering) Alexey Kardashevskiy
2015-07-06 13:42 ` Alex Williamson
2015-07-06 15:34 ` Alexey Kardashevskiy
2015-07-06 16:13 ` Alex Williamson
2015-07-07 0:29 ` David Gibson
2015-07-07 0:36 ` Alexey Kardashevskiy
2015-07-07 12:11 ` Alexey Kardashevskiy
2015-07-07 16:24 ` Alex Williamson
2015-07-08 6:26 ` Alexey Kardashevskiy
2015-07-08 14:51 ` Alex Williamson
2015-07-07 7:23 ` Thomas Huth
2015-07-07 10:05 ` Alexey Kardashevskiy
2015-07-07 10:21 ` Thomas Huth
2015-07-07 11:05 ` Alexey Kardashevskiy
2015-07-08 4:30 ` David Gibson
2015-07-08 6:24 ` Thomas Huth
2015-07-08 6:50 ` David Gibson
2015-07-08 7:07 ` Alexey Kardashevskiy
2015-07-08 14:47 ` Alex Williamson
2015-07-06 2:11 ` [Qemu-devel] [PATCH qemu v10 14/14] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) Alexey Kardashevskiy
2015-07-06 11:06 ` David Gibson
2015-07-06 11:27 ` Alexey Kardashevskiy
2015-07-07 9:46 ` Alexey Kardashevskiy
2015-07-07 4:58 ` David Gibson
2015-07-07 9:33 ` Thomas Huth
2015-07-07 10:43 ` Alexey Kardashevskiy [this message]
2015-07-07 11:35 ` Thomas Huth
2015-07-07 11:53 ` Alexey Kardashevskiy
2015-07-06 11:13 ` [Qemu-devel] [PATCH qemu v10 00/14] spapr: vfio: Enable Dynamic DMA windows (DDW) David Gibson
2015-07-06 15:54 ` Thomas Huth
2015-07-06 16:07 ` Alexey Kardashevskiy
2015-07-06 16:13 ` Thomas Huth
2015-07-08 4:34 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=559BAD60.50701@ozlabs.ru \
--to=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=gwshan@linux.vnet.ibm.com \
--cc=mdroth@linux.vnet.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
--cc=thuth@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.