From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:34020) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bBEg3-0002iQ-Ql for qemu-devel@nongnu.org; Fri, 10 Jun 2016 01:08:19 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bBEfv-0005mm-CM for qemu-devel@nongnu.org; Fri, 10 Jun 2016 01:08:14 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:36861) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bBEfv-0005mH-4J for qemu-devel@nongnu.org; Fri, 10 Jun 2016 01:08:07 -0400 Received: from pps.filterd (m0048827.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.11/8.16.0.11) with SMTP id u5A53rbE042501 for ; Fri, 10 Jun 2016 01:08:02 -0400 Received: from e23smtp01.au.ibm.com (e23smtp01.au.ibm.com [202.81.31.143]) by mx0a-001b2d01.pphosted.com with ESMTP id 23f9vdg3nk-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Fri, 10 Jun 2016 01:08:01 -0400 Received: from localhost by e23smtp01.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 10 Jun 2016 15:07:59 +1000 Date: Fri, 10 Jun 2016 10:37:05 +0530 From: Bharata B Rao Reply-To: bharata@linux.vnet.ibm.com References: <1465276743-7340-1-git-send-email-bharata@linux.vnet.ibm.com> <20160607233728.713.97557@loki> <20160608023554.GA8861@in.ibm.com> <20160608150512.665.12735@loki> <20160609020330.GG9226@voom.fritz.box> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160609020330.GG9226@voom.fritz.box> Message-Id: <20160610050705.GE8861@in.ibm.com> Subject: Re: [Qemu-devel] [PATCH v3] spapr: Ensure all LMBs are represented in ibm, dynamic-memory List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson Cc: Michael Roth , qemu-devel@nongnu.org, nfont@linux.vnet.ibm.com, aik@ozlabs.ru, qemu-ppc@nongnu.org On Thu, Jun 09, 2016 at 12:03:30PM +1000, David Gibson wrote: > On Wed, Jun 08, 2016 at 10:05:12AM -0500, Michael Roth wrote: > > Quoting Bharata B Rao (2016-06-07 21:35:54) > > > On Tue, Jun 07, 2016 at 06:37:28PM -0500, Michael Roth wrote: > > > > Quoting Bharata B Rao (2016-06-07 00:19:03) > > > > > Memory hotplug can fail for some combinations of RAM and maxmem when > > > > > DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends > > > > > on maximum addressable memory returned by guest and this value is currently > > > > > being calculated wrongly by the guest kernel routine memory_hotplug_max(). > > > > > While there is an attempt to fix the guest kernel, this patch works > > > > > around the problem within QEMU itself. > > > > > > > > > > memory_hotplug_max() routine in the guest kernel arrives at max > > > > > addressable memory by multiplying lmb-size with the lmb-count obtained > > > > > from ibm,dynamic-memory property. There are two assumptions here: > > > > > > > > > > - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM > > > > > where only hot-pluggable LMBs are present in this property. > > > > > - The memory area comprising of RAM and hotplug region is contiguous: This > > > > > needn't be true always for PowerKVM as there can be gap between > > > > > boot time RAM and hotplug region. > > > > > > > > > > To work around this guest kernel bug, ensure that ibm,dynamic-memory > > > > > has information about all the LMBs (RMA, boot-time LMBs, future > > > > > hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and > > > > > hotpluggable region). > > > > > > > > > > RMA is represented separately by memory@0 node. Hence mark RMA LMBs > > > > > and also the LMBs for the gap b/n RAM and hotpluggable region as > > > > > reserved so that these LMBs are not recounted/counted by guest. > > > > > > > > > > Signed-off-by: Bharata B Rao > > > > > --- > > > > > Changes in v3: > > > > > > > > > > - Not touching spapr_create_lmb_dr_connectors() so that we continue > > > > > to have DRC objects for only hotpluggable LMBs. > > > > > - Simplified the logic of creating dynamic-memory node based on comments > > > > > from Michael Roth and David Gibson. > > > > > > > > > > v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html > > > > > > > > > > hw/ppc/spapr.c | 51 ++++++++++++++++++++++++++++++++------------------ > > > > > include/hw/ppc/spapr.h | 5 +++-- > > > > > 2 files changed, 36 insertions(+), 20 deletions(-) > > > > > > > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > > > > index 0636642..9d1d43d 100644 > > > > > --- a/hw/ppc/spapr.c > > > > > +++ b/hw/ppc/spapr.c > > > > > @@ -762,14 +762,17 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt) > > > > > int ret, i, offset; > > > > > uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; > > > > > uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)}; > > > > > - uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size; > > > > > + uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size; > > > > > + uint32_t nr_lmbs = (spapr->hotplug_memory.base + > > > > > + memory_region_size(&spapr->hotplug_memory.mr)) / > > > > > + lmb_size; > > > > > uint32_t *int_buf, *cur_index, buf_len; > > > > > int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1; > > > > > > > > > > /* > > > > > - * Don't create the node if there are no DR LMBs. > > > > > + * Don't create the node if there is no hotpluggable memory > > > > > */ > > > > > - if (!nr_lmbs) { > > > > > + if (machine->ram_size == machine->maxram_size) { > > > > > return 0; > > > > > } > > > > > > > > > > @@ -805,24 +808,36 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt) > > > > > for (i = 0; i < nr_lmbs; i++) { > > > > > sPAPRDRConnector *drc; > > > > > sPAPRDRConnectorClass *drck; > > > > > > > > Since these ^ are only used if (i >= hotplug_lmb_start), it might be > > > > clearer to move them there now. > > > > > > Yes. > > > > > > > > > > > > - uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;; > > > > > + uint64_t addr = i * lmb_size; > > > > > uint32_t *dynamic_memory = cur_index; > > > > > > > > > > - drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB, > > > > > - addr/lmb_size); > > > > > - g_assert(drc); > > > > > - drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc); > > > > > - > > > > > - dynamic_memory[0] = cpu_to_be32(addr >> 32); > > > > > - dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff); > > > > > - dynamic_memory[2] = cpu_to_be32(drck->get_index(drc)); > > > > > - dynamic_memory[3] = cpu_to_be32(0); /* reserved */ > > > > > - dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL)); > > > > > - if (addr < machine->ram_size || > > > > > - memory_region_present(get_system_memory(), addr)) { > > > > > - dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED); > > > > > + if (i >= hotplug_lmb_start) { > > > > > + drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB, > > > > > + addr / lmb_size); > > > > > > > > Could just be i > > > > > > Hmm I thought I got all such occurances covered :( > > > > > > > > > > > > + g_assert(drc); > > > > > + drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc); > > > > > + > > > > > + dynamic_memory[0] = cpu_to_be32(addr >> 32); > > > > > + dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff); > > > > > + dynamic_memory[2] = cpu_to_be32(drck->get_index(drc)); > > > > > + dynamic_memory[3] = cpu_to_be32(0); /* reserved */ > > > > > + dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL)); > > > > > + if (memory_region_present(get_system_memory(), addr)) { > > > > > + dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED); > > > > > + } else { > > > > > + dynamic_memory[5] = cpu_to_be32(0); > > > > > + } > > > > > } else { > > > > > - dynamic_memory[5] = cpu_to_be32(0); > > > > > + /* > > > > > + * LMB information for RMA, boot time RAM and gap b/n RAM and > > > > > + * hotplug memory region -- all these are marked as reserved. > > > > > + */ > > > > > + dynamic_memory[0] = cpu_to_be32(0); > > > > > + dynamic_memory[1] = cpu_to_be32(0); > > > > > > > > Are we sure we shouldn't still encode the addr here? > > > > > > Since kernel won't look at reserved LMBs, I thought it should be fine. > > > We could populate the addr, but we don't have DRC for them and hence > > > no DRC index, so anyway we will not have full information. > > > > The 'DRC invalid' bit seems to suggests it's a perfectly valid scenario > > to have LMBs with no backing drc, so I would think the other fields > > should still be filled out appropriately, even if it might not get > > used by guest either way. > > That makes sense - but we should check that existing guests will > actually respect that DRC invalid bit. Existing guests don't even define DRC invalid bit yet. As Nathan said elsewhere, existing guest will check for reserved bit and not consider such LMBs. Regards, Bharata.