From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57174) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZCQM0-0002E8-Uj for qemu-devel@nongnu.org; Tue, 07 Jul 2015 06:43:59 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZCQLx-00004t-KF for qemu-devel@nongnu.org; Tue, 07 Jul 2015 06:43:56 -0400 Received: from mail-pd0-f175.google.com ([209.85.192.175]:36342) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZCQLx-0008WN-C9 for qemu-devel@nongnu.org; Tue, 07 Jul 2015 06:43:53 -0400 Received: by pddu5 with SMTP id u5so36314251pdd.3 for ; Tue, 07 Jul 2015 03:43:52 -0700 (PDT) References: <1436148670-6592-1-git-send-email-aik@ozlabs.ru> <1436148670-6592-15-git-send-email-aik@ozlabs.ru> <20150707113335.66821a88@thh440s> From: Alexey Kardashevskiy Message-ID: <559BAD60.50701@ozlabs.ru> Date: Tue, 7 Jul 2015 20:43:44 +1000 MIME-Version: 1.0 In-Reply-To: <20150707113335.66821a88@thh440s> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Subject: Re: [Qemu-devel] [PATCH qemu v10 14/14] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Thomas Huth Cc: Michael Roth , qemu-devel@nongnu.org, Gavin Shan , Alex Williamson , qemu-ppc@nongnu.org, David Gibson On 07/07/2015 07:33 PM, Thomas Huth wrote: > On Mon, 6 Jul 2015 12:11:10 +1000 > Alexey Kardashevskiy wrote: > >> This adds support for Dynamic DMA Windows (DDW) option defined by >> the SPAPR specification which allows to have additional DMA window(s) >> >> This implements DDW for emulated and VFIO devices. As all TCE root regions >> are mapped at 0 and 64bit long (and actual tables are child regions), >> this replaces memory_region_add_subregion() with _overlap() to make >> QEMU memory API happy. >> >> This reserves RTAS token numbers for DDW calls. >> >> This implements helpers to interact with VFIO kernel interface. >> >> This changes the TCE table migration descriptor to support dynamic >> tables as from now on, PHB will create as many stub TCE table objects >> as PHB can possibly support but not all of them might be initialized at >> the time of migration because DDW might or might not be requested by >> the guest. >> >> The "ddw" property is enabled by default on a PHB but for compatibility >> the pseries-2.3 machine and older disable it. >> >> This implements DDW for VFIO. The host kernel support is required. >> This adds a "levels" property to PHB to control the number of levels >> in the actual TCE table allocated by the host kernel, 0 is the default >> value to tell QEMU to calculate the correct value. Current hardware >> supports up to 5 levels. >> >> The existing linux guests try creating one additional huge DMA window >> with 64K or 16MB pages and map the entire guest RAM to. If succeeded, >> the guest switches to dma_direct_ops and never calls TCE hypercalls >> (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM >> and not waste time on map/unmap later. This adds a "dma64_win_addr" >> property which is a bus address for the 64bit window and by default >> set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardware >> uses and this allows having emulated and VFIO devices on the same bus. >> >> This adds 4 RTAS handlers: >> * ibm,query-pe-dma-window >> * ibm,create-pe-dma-window >> * ibm,remove-pe-dma-window >> * ibm,reset-pe-dma-window >> These are registered from type_init() callback. >> >> These RTAS handlers are implemented in a separate file to avoid polluting >> spapr_iommu.c with PCI. >> >> Signed-off-by: Alexey Kardashevskiy >> --- > ... >> diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c >> new file mode 100644 >> index 0000000..7539c6a >> --- /dev/null >> +++ b/hw/ppc/spapr_rtas_ddw.c >> @@ -0,0 +1,300 @@ >> +/* >> + * QEMU sPAPR Dynamic DMA windows support >> + * >> + * Copyright (c) 2014 Alexey Kardashevskiy, IBM Corporation. > > Happy new year? > >> + * This program is free software; you can redistribute it and/or modify >> + * it under the terms of the GNU General Public License as published by >> + * the Free Software Foundation; either version 2 of the License, >> + * or (at your option) any later version. >> + * >> + * This program is distributed in the hope that it will be useful, >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> + * GNU General Public License for more details. >> + * >> + * You should have received a copy of the GNU General Public License >> + * along with this program; if not, see . >> + */ >> + >> +#include "qemu/error-report.h" >> +#include "hw/ppc/spapr.h" >> +#include "hw/pci-host/spapr.h" >> +#include "trace.h" >> + >> +static int spapr_phb_get_active_win_num_cb(Object *child, void *opaque) >> +{ >> + sPAPRTCETable *tcet; >> + >> + tcet = (sPAPRTCETable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE); >> + if (tcet && tcet->enabled) { >> + ++*(unsigned *)opaque; >> + } >> + return 0; >> +} >> + >> +static unsigned spapr_phb_get_active_win_num(sPAPRPHBState *sphb) >> +{ >> + unsigned ret = 0; >> + >> + object_child_foreach(OBJECT(sphb), spapr_phb_get_active_win_num_cb, &ret); >> + >> + return ret; >> +} >> + >> +static int spapr_phb_get_free_liobn_cb(Object *child, void *opaque) >> +{ >> + sPAPRTCETable *tcet; >> + >> + tcet = (sPAPRTCETable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE); >> + if (tcet && !tcet->enabled) { >> + *(uint32_t *)opaque = tcet->liobn; >> + return 1; >> + } >> + return 0; >> +} >> + >> +static unsigned spapr_phb_get_free_liobn(sPAPRPHBState *sphb) >> +{ >> + uint32_t liobn = 0; >> + >> + object_child_foreach(OBJECT(sphb), spapr_phb_get_free_liobn_cb, &liobn); >> + >> + return liobn; >> +} >> + >> +static uint32_t spapr_query_mask(struct ppc_one_seg_page_size *sps, >> + uint64_t page_mask) >> +{ >> + int i, j; >> + uint32_t mask = 0; >> + const struct { int shift; uint32_t mask; } masks[] = { >> + { 12, RTAS_DDW_PGSIZE_4K }, >> + { 16, RTAS_DDW_PGSIZE_64K }, >> + { 24, RTAS_DDW_PGSIZE_16M }, >> + { 25, RTAS_DDW_PGSIZE_32M }, >> + { 26, RTAS_DDW_PGSIZE_64M }, >> + { 27, RTAS_DDW_PGSIZE_128M }, >> + { 28, RTAS_DDW_PGSIZE_256M }, >> + { 34, RTAS_DDW_PGSIZE_16G }, >> + }; >> + >> + for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) { >> + for (j = 0; j < ARRAY_SIZE(masks); ++j) { >> + if ((sps[i].page_shift == masks[j].shift) && >> + (page_mask & (1ULL << masks[j].shift))) { >> + mask |= masks[j].mask; >> + } >> + } >> + } >> + >> + return mask; >> +} >> + >> +static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu, >> + sPAPRMachineState *spapr, >> + uint32_t token, uint32_t nargs, >> + target_ulong args, >> + uint32_t nret, target_ulong rets) >> +{ >> + CPUPPCState *env = &cpu->env; >> + sPAPRPHBState *sphb; >> + uint64_t buid; >> + uint32_t avail, addr, pgmask = 0; >> + unsigned current; >> + >> + if ((nargs != 3) || (nret != 5)) { >> + goto param_error_exit; >> + } >> + >> + buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2); >> + addr = rtas_ld(args, 0); >> + sphb = spapr_pci_find_phb(spapr, buid); >> + if (!sphb || !sphb->ddw_enabled) { >> + goto param_error_exit; >> + } >> + >> + current = spapr_phb_get_active_win_num(sphb); >> + avail = (sphb->windows_supported > current) ? >> + (sphb->windows_supported - current) : 0; >> + >> + /* Work out supported page masks */ >> + pgmask = spapr_query_mask(env->sps.sps, sphb->page_size_mask); >> + >> + rtas_st(rets, 0, RTAS_OUT_SUCCESS); >> + rtas_st(rets, 1, avail); >> + >> + /* >> + * This is "Largest contiguous block of TCEs allocated specifically >> + * for (that is, are reserved for) this PE". >> + * Return the maximum number as all RAM was in 4K pages. >> + */ >> + rtas_st(rets, 2, sphb->dma64_window_size >> SPAPR_TCE_PAGE_SHIFT); >> + rtas_st(rets, 3, pgmask); >> + rtas_st(rets, 4, 0); /* DMA migration mask, not supported */ >> + >> + trace_spapr_iommu_ddw_query(buid, addr, avail, sphb->dma64_window_size, >> + pgmask); >> + return; >> + >> +param_error_exit: >> + rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR); >> +} >> + >> +static void rtas_ibm_create_pe_dma_window(PowerPCCPU *cpu, >> + sPAPRMachineState *spapr, >> + uint32_t token, uint32_t nargs, >> + target_ulong args, >> + uint32_t nret, target_ulong rets) >> +{ >> + sPAPRPHBState *sphb; >> + sPAPRTCETable *tcet = NULL; >> + uint32_t addr, page_shift, window_shift, liobn; >> + uint64_t buid; >> + long ret; >> + >> + if ((nargs != 5) || (nret != 4)) { > > Pascal bracket style again :-( Am I breaking any code design guideline here? > >> + goto param_error_exit; >> + } >> + >> + buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2); But here braces are ok? :-/ >> + addr = rtas_ld(args, 0); >> + sphb = spapr_pci_find_phb(spapr, buid); >> + if (!sphb || !sphb->ddw_enabled) { >> + goto param_error_exit; >> + } >> + >> + page_shift = rtas_ld(args, 3); >> + window_shift = rtas_ld(args, 4); >> + liobn = spapr_phb_get_free_liobn(sphb); >> + >> + if (!liobn || !(sphb->page_size_mask & (1ULL << page_shift))) { >> + goto hw_error_exit; >> + } >> + >> + ret = spapr_phb_dma_init_window(sphb, liobn, page_shift, >> + 1ULL << window_shift); > > As already mentioned in a comment to another patch in this series, I > think it maybe might be better to do some sanity checks on the > window_shift value, too? Well, as you suggested, I added a check to spapr_phb_dma_init_window() which makes this code return RTAS_OUT_HW_ERROR. Or I can add this here: if (window_shift < page_shift) { goto param_error_exit; } and RTAS handler will return RTAS_OUT_PARAM_ERROR. SPAPR does not say what is the correct reponse in this case... > >> + tcet = spapr_tce_find_by_liobn(liobn); >> + trace_spapr_iommu_ddw_create(buid, addr, 1ULL << page_shift, >> + 1ULL << window_shift, >> + tcet ? tcet->bus_offset : 0xbaadf00d, >> + liobn, ret); >> + if (ret || !tcet) { >> + goto hw_error_exit; >> + } >> + >> + rtas_st(rets, 0, RTAS_OUT_SUCCESS); >> + rtas_st(rets, 1, liobn); >> + rtas_st(rets, 2, tcet->bus_offset >> 32); >> + rtas_st(rets, 3, tcet->bus_offset & ((uint32_t) -1)); > > Why don't you simply use 0xffffffff instead of ((uint32_t) -1) ? > That's shorter and much easier to understand at a first glance than > calulating the type-cast in your brain ;-) At a first glance I cannot tell if there are 7 or 8 or 9 "f"s in 0xffffffff. I may accidentally add/remove one "f" and nobody will notice. Such typecast of (-1) is quite typical. > >> + >> + return; >> + >> +hw_error_exit: >> + rtas_st(rets, 0, RTAS_OUT_HW_ERROR); >> + return; >> + >> +param_error_exit: >> + rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR); >> +} >> + >> +static void rtas_ibm_remove_pe_dma_window(PowerPCCPU *cpu, >> + sPAPRMachineState *spapr, >> + uint32_t token, uint32_t nargs, >> + target_ulong args, >> + uint32_t nret, target_ulong rets) >> +{ >> + sPAPRPHBState *sphb; >> + sPAPRTCETable *tcet; >> + uint32_t liobn; >> + long ret; >> + >> + if ((nargs != 1) || (nret != 1)) { > > (╯°□°)╯︵ ┻━┻ > >> + goto param_error_exit; >> + } >> + >> + liobn = rtas_ld(args, 0); >> + tcet = spapr_tce_find_by_liobn(liobn); >> + if (!tcet) { >> + goto param_error_exit; >> + } >> + >> + sphb = SPAPR_PCI_HOST_BRIDGE(OBJECT(tcet)->parent); >> + if (!sphb || !sphb->ddw_enabled) { >> + goto param_error_exit; >> + } >> + >> + ret = spapr_phb_dma_remove_window(sphb, tcet); >> + trace_spapr_iommu_ddw_remove(liobn, ret); >> + if (ret) { >> + goto hw_error_exit; >> + } >> + >> + rtas_st(rets, 0, RTAS_OUT_SUCCESS); >> + return; >> + >> +hw_error_exit: >> + rtas_st(rets, 0, RTAS_OUT_HW_ERROR); >> + return; >> + >> +param_error_exit: >> + rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR); >> +} >> + >> +static void rtas_ibm_reset_pe_dma_window(PowerPCCPU *cpu, >> + sPAPRMachineState *spapr, >> + uint32_t token, uint32_t nargs, >> + target_ulong args, >> + uint32_t nret, target_ulong rets) >> +{ >> + sPAPRPHBState *sphb; >> + uint64_t buid; >> + uint32_t addr; >> + long ret; >> + >> + if ((nargs != 3) || (nret != 1)) { > > ┬─┬ ︵ /(.□. \) > >> + goto param_error_exit; >> + } >> + >> + buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2); >> + addr = rtas_ld(args, 0); >> + sphb = spapr_pci_find_phb(spapr, buid); >> + if (!sphb || !sphb->ddw_enabled) { >> + goto param_error_exit; >> + } >> + >> + ret = spapr_phb_dma_reset(sphb); >> + trace_spapr_iommu_ddw_reset(buid, addr, ret); >> + if (ret) { >> + goto hw_error_exit; >> + } >> + >> + rtas_st(rets, 0, RTAS_OUT_SUCCESS); >> + >> + return; >> + >> +hw_error_exit: >> + rtas_st(rets, 0, RTAS_OUT_HW_ERROR); >> + return; >> + >> +param_error_exit: >> + rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR); >> +} >> + >> +static void spapr_rtas_ddw_init(void) >> +{ >> + spapr_rtas_register(RTAS_IBM_QUERY_PE_DMA_WINDOW, >> + "ibm,query-pe-dma-window", >> + rtas_ibm_query_pe_dma_window); >> + spapr_rtas_register(RTAS_IBM_CREATE_PE_DMA_WINDOW, >> + "ibm,create-pe-dma-window", >> + rtas_ibm_create_pe_dma_window); >> + spapr_rtas_register(RTAS_IBM_REMOVE_PE_DMA_WINDOW, >> + "ibm,remove-pe-dma-window", >> + rtas_ibm_remove_pe_dma_window); >> + spapr_rtas_register(RTAS_IBM_RESET_PE_DMA_WINDOW, >> + "ibm,reset-pe-dma-window", >> + rtas_ibm_reset_pe_dma_window); >> +} >> + >> +type_init(spapr_rtas_ddw_init) > > Thomas > -- Alexey