From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35685) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z5Y6z-0008IB-2C for qemu-devel@nongnu.org; Thu, 18 Jun 2015 07:36:05 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Z5Y6s-0002Jk-8A for qemu-devel@nongnu.org; Thu, 18 Jun 2015 07:36:00 -0400 Received: from mail-pd0-f172.google.com ([209.85.192.172]:36577) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z5Y6r-0002JJ-VK for qemu-devel@nongnu.org; Thu, 18 Jun 2015 07:35:54 -0400 Received: by pdjm12 with SMTP id m12so64807511pdj.3 for ; Thu, 18 Jun 2015 04:35:52 -0700 (PDT) References: <1429964684-23872-1-git-send-email-aik@ozlabs.ru> <1429964684-23872-14-git-send-email-aik@ozlabs.ru> <20150505124940.GS14090@voom.redhat.com> From: Alexey Kardashevskiy Message-ID: <5582AD10.3070400@ozlabs.ru> Date: Thu, 18 Jun 2015 21:35:44 +1000 MIME-Version: 1.0 In-Reply-To: <20150505124940.GS14090@voom.redhat.com> Content-Type: text/plain; charset=koi8-r; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH qemu v7 13/14] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson Cc: Alex Williamson , qemu-ppc@nongnu.org, qemu-devel@nongnu.org, Alexander Graf On 05/05/2015 10:49 PM, David Gibson wrote: > On Sat, Apr 25, 2015 at 10:24:43PM +1000, Alexey Kardashevskiy wrote: >> This adds support for Dynamic DMA Windows (DDW) option defined by >> the SPAPR specification which allows to have additional DMA window(s) >> >> This implements DDW for emulated and VFIO devices. As all TCE root regions >> are mapped at 0 and 64bit long (and actual tables are child regions), >> this replaces memory_region_add_subregion() with _overlap() to make >> QEMU memory API happy. >> >> This reserves RTAS token numbers for DDW calls. >> >> This implements helpers to interact with VFIO kernel interface. >> >> This changes the TCE table migration descriptor to support dynamic >> tables as from now on, PHB will create as many stub TCE table objects >> as PHB can possibly support but not all of them might be initialized at >> the time of migration because DDW might or might not be requested by >> the guest. >> >> The "ddw" property is enabled by default on a PHB but for compatibility >> the pseries-2.3 machine and older disable it. >> >> This implements DDW for VFIO. The host kernel support is required. >> This adds a "levels" property to PHB to control the number of levels >> in the actual TCE table allocated by the host kernel, 0 is the default >> value to tell QEMU to calculate the correct value. Current hardware >> supports up to 5 levels. >> >> The existing linux guests try creating one additional huge DMA window >> with 64K or 16MB pages and map the entire guest RAM to. If succeeded, >> the guest switches to dma_direct_ops and never calls TCE hypercalls >> (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM >> and not waste time on map/unmap later. >> >> This adds 4 RTAS handlers: >> * ibm,query-pe-dma-window >> * ibm,create-pe-dma-window >> * ibm,remove-pe-dma-window >> * ibm,reset-pe-dma-window >> These are registered from type_init() callback. >> >> These RTAS handlers are implemented in a separate file to avoid polluting >> spapr_iommu.c with PCI. >> >> Signed-off-by: Alexey Kardashevskiy > > Reviewed-by: David Gibson I saw this and decided there are no more coments but I was wrong :) >> --- >> Changes: >> v6: >> * rework as there is no more special device for VFIO PHB >> >> v5: >> * total rework >> * enabled for machines >2.3 >> * fixed migration >> * merged rtas handlers here >> >> v4: >> * reset handler is back in generalized form >> >> v3: >> * removed reset >> * windows_num is now 1 or bigger rather than 0-based value and it is only >> changed in PHB code, not in RTAS >> * added page mask check in create() >> * added SPAPR_PCI_DDW_MAX_WINDOWS to track how many windows are already >> created >> >> v2: >> * tested on hacked emulated E1000 >> * implemented DDW reset on the PHB reset >> * spapr_pci_ddw_remove/spapr_pci_ddw_reset are public for reuse by VFIO >> --- >> hw/ppc/Makefile.objs | 3 + >> hw/ppc/spapr.c | 10 +- >> hw/ppc/spapr_iommu.c | 35 +++++- >> hw/ppc/spapr_pci.c | 66 ++++++++-- >> hw/ppc/spapr_pci_vfio.c | 80 ++++++++++++ >> hw/ppc/spapr_rtas_ddw.c | 300 ++++++++++++++++++++++++++++++++++++++++++++ >> include/hw/pci-host/spapr.h | 21 ++++ >> include/hw/ppc/spapr.h | 17 ++- >> trace-events | 4 + >> 9 files changed, 521 insertions(+), 15 deletions(-) >> create mode 100644 hw/ppc/spapr_rtas_ddw.c >> >> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs >> index 437955d..c6b344f 100644 >> --- a/hw/ppc/Makefile.objs >> +++ b/hw/ppc/Makefile.objs >> @@ -7,6 +7,9 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o >> ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy) >> obj-y += spapr_pci_vfio.o >> endif >> +ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES), yy) >> +obj-y += spapr_rtas_ddw.o >> +endif >> # PowerPC 4xx boards >> obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o >> obj-y += ppc4xx_pci.o >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c >> index b28209f..fd7fdb3 100644 >> --- a/hw/ppc/spapr.c >> +++ b/hw/ppc/spapr.c >> @@ -1801,7 +1801,15 @@ static const TypeInfo spapr_machine_info = { >> }, >> }; >> >> +#define SPAPR_COMPAT_2_3 \ >> + {\ >> + .driver = TYPE_SPAPR_PCI_HOST_BRIDGE,\ >> + .property = "ddw",\ >> + .value = stringify(off),\ >> + } >> + >> #define SPAPR_COMPAT_2_2 \ >> + SPAPR_COMPAT_2_3, \ >> {\ >> .driver = TYPE_SPAPR_PCI_HOST_BRIDGE,\ >> .property = "mem_win_size",\ >> @@ -1853,7 +1861,7 @@ static const TypeInfo spapr_machine_2_2_info = { >> static void spapr_machine_2_3_class_init(ObjectClass *oc, void *data) >> { >> static GlobalProperty compat_props[] = { >> - SPAPR_COMPAT_2_2, >> + SPAPR_COMPAT_2_3, >> { /* end of list */ } >> }; >> MachineClass *mc = MACHINE_CLASS(oc); >> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c >> index 245534f..df4c72d 100644 >> --- a/hw/ppc/spapr_iommu.c >> +++ b/hw/ppc/spapr_iommu.c >> @@ -90,6 +90,15 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr, >> return ret; >> } >> >> +static void spapr_tce_table_pre_save(void *opaque) >> +{ >> + sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque); >> + >> + tcet->migtable = tcet->table; >> +} >> + >> +static void spapr_tce_table_do_enable(sPAPRTCETable *tcet); >> + >> static int spapr_tce_table_post_load(void *opaque, int version_id) >> { >> sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque); >> @@ -98,22 +107,42 @@ static int spapr_tce_table_post_load(void *opaque, int version_id) >> spapr_vio_set_bypass(tcet->vdev, tcet->bypass); >> } >> >> + if (!tcet->migtable) { > > What's the case where migtable will be NULL? IIUC an old->new > migration will result in the data saved for "table" being loaded into > "migtable". > > So "migtable" should only be NULL, when tce->enabled is also false? Seems to be true and this is just extra precaution. Remove? -- Alexey