From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:36762) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bAqMK-00035e-L5 for qemu-devel@nongnu.org; Wed, 08 Jun 2016 23:10:18 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bAqMF-0004tB-Gu for qemu-devel@nongnu.org; Wed, 08 Jun 2016 23:10:15 -0400
Date: Thu, 9 Jun 2016 12:03:30 +1000
From: David Gibson
Message-ID: <20160609020330.GG9226@voom.fritz.box>
References: <1465276743-7340-1-git-send-email-bharata@linux.vnet.ibm.com> <20160607233728.713.97557@loki> <20160608023554.GA8861@in.ibm.com> <20160608150512.665.12735@loki>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="boT9Oj39GmgPxYhu"
Content-Disposition: inline
In-Reply-To: <20160608150512.665.12735@loki>
Subject: Re: [Qemu-devel] [PATCH v3] spapr: Ensure all LMBs are represented in ibm,dynamic-memory
To: Michael Roth
Cc: bharata@linux.vnet.ibm.com, qemu-devel@nongnu.org, nfont@linux.vnet.ibm.com, aik@ozlabs.ru, qemu-ppc@nongnu.org

--boT9Oj39GmgPxYhu
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, Jun 08, 2016 at 10:05:12AM -0500, Michael Roth wrote:
> Quoting Bharata B Rao (2016-06-07 21:35:54)
> > On Tue, Jun 07, 2016 at 06:37:28PM -0500, Michael Roth wrote:
> > > Quoting Bharata B Rao (2016-06-07 00:19:03)
> > > > Memory hotplug can fail for some combinations of RAM and maxmem when
> > > > DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
> > > > on maximum addressable memory returned by guest and this value is currently
> > > > being calculated wrongly by the guest kernel routine memory_hotplug_max().
> > > > While there is an attempt to fix the guest kernel, this patch works
> > > > around the problem within QEMU itself.
> > > >
> > > > memory_hotplug_max() routine in the guest kernel arrives at max
> > > > addressable memory by multiplying lmb-size with the lmb-count obtained
> > > > from ibm,dynamic-memory property. There are two assumptions here:
> > > >
> > > > - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
> > > >   where only hot-pluggable LMBs are present in this property.
> > > > - The memory area comprising of RAM and hotplug region is contiguous: This
> > > >   needn't be true always for PowerKVM as there can be gap between
> > > >   boot time RAM and hotplug region.
> > > >
> > > > To work around this guest kernel bug, ensure that ibm,dynamic-memory
> > > > has information about all the LMBs (RMA, boot-time LMBs, future
> > > > hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
> > > > hotpluggable region).
> > > >
> > > > RMA is represented separately by memory@0 node. Hence mark RMA LMBs
> > > > and also the LMBs for the gap b/n RAM and hotpluggable region as
> > > > reserved so that these LMBs are not recounted/counted by guest.
> > > >
> > > > Signed-off-by: Bharata B Rao
> > > > ---
> > > > Changes in v3:
> > > >
> > > > - Not touching spapr_create_lmb_dr_connectors() so that we continue
> > > >   to have DRC objects for only hotpluggable LMBs.
> > > > - Simplified the logic of creating dynamic-memory node based on comments
> > > >   from Michael Roth and David Gibson.
> > > >
> > > > v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html
> > > >
> > > >  hw/ppc/spapr.c         | 51 +++++++++++++++++++++++++++++++++------------------
> > > >  include/hw/ppc/spapr.h |  5 +++--
> > > >  2 files changed, 36 insertions(+), 20 deletions(-)
> > > >
> > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > index 0636642..9d1d43d 100644
> > > > --- a/hw/ppc/spapr.c
> > > > +++ b/hw/ppc/spapr.c
> > > > @@ -762,14 +762,17 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > > >      int ret, i, offset;
> > > >      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> > > >      uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> > > > -    uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size;
> > > > +    uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
> > > > +    uint32_t nr_lmbs = (spapr->hotplug_memory.base +
> > > > +                       memory_region_size(&spapr->hotplug_memory.mr)) /
> > > > +                       lmb_size;
> > > >      uint32_t *int_buf, *cur_index, buf_len;
> > > >      int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> > > >
> > > >      /*
> > > > -     * Don't create the node if there are no DR LMBs.
> > > > +     * Don't create the node if there is no hotpluggable memory
> > > >       */
> > > > -    if (!nr_lmbs) {
> > > > +    if (machine->ram_size == machine->maxram_size) {
> > > >          return 0;
> > > >      }
> > > >
> > > > @@ -805,24 +808,36 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > > >      for (i = 0; i < nr_lmbs; i++) {
> > > >          sPAPRDRConnector *drc;
> > > >          sPAPRDRConnectorClass *drck;
> > >
> > > Since these ^ are only used if (i >= hotplug_lmb_start), it might be
> > > clearer to move them there now.
> >
> > Yes.
> >
> > >
> > > > -        uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;;
> > > > +        uint64_t addr = i * lmb_size;
> > > >          uint32_t *dynamic_memory = cur_index;
> > > >
> > > > -        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > -                                       addr/lmb_size);
> > > > -        g_assert(drc);
> > > > -        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > -
> > > > -        dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > > -        dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > > -        dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > > -        dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > -        dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > > -        if (addr < machine->ram_size ||
> > > > -                    memory_region_present(get_system_memory(), addr)) {
> > > > -            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > > +        if (i >= hotplug_lmb_start) {
> > > > +            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > +                                           addr / lmb_size);
> > >
> > > Could just be i
> >
> > Hmm I thought I got all such occurances covered :(
> >
> > >
> > > > +            g_assert(drc);
> > > > +            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > +
> > > > +            dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > > +            dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > > +            dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > +            dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > > +            if (memory_region_present(get_system_memory(), addr)) {
> > > > +                dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > > +            } else {
> > > > +                dynamic_memory[5] = cpu_to_be32(0);
> > > > +            }
> > > >          } else {
> > > > -            dynamic_memory[5] = cpu_to_be32(0);
> > > > +            /*
> > > > +             * LMB information for RMA, boot time RAM and gap b/n RAM and
> > > > +             * hotplug memory region -- all these are marked as reserved.
> > > > +             */
> > > > +            dynamic_memory[0] = cpu_to_be32(0);
> > > > +            dynamic_memory[1] = cpu_to_be32(0);
> > >
> > > Are we sure we shouldn't still encode the addr here?
> >
> > Since kernel won't look at reserved LMBs, I thought it should be fine.
> > We could populate the addr, but we don't have DRC for them and hence
> > no DRC index, so anyway we will not have full information.
>
> The 'DRC invalid' bit seems to suggests it's a perfectly valid scenario
> to have LMBs with no backing drc, so I would think the other fields
> should still be filled out appropriately, even if it might not get
> used by guest either way.

That makes sense - but we should check that existing guests will
actually respect that DRC invalid bit.

> I think an argument could be made for not setting addr for LMBs to used
> to cover the RAM->hotplug gap, since that's padding rather than real
> memory. But effectively re-using addr 0 seems like more of a liability
> than a safeguard. If 'reserved' flag is doing what we expect, then it's
> probably best not to deviate from normal values any more than necessary,
> IMO.

Yes, I think addr should be set, even for the dummy LMBs.

> > > > +            dynamic_memory[2] = cpu_to_be32(0);
> > > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > +            dynamic_memory[4] = cpu_to_be32(-1);
> > > > +            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_RESERVED);
> > >
> > > LoPAPR Table 248 defines a "DRC invalid" bit at 0x00000020, which I
> > > think is what we'll want for cases where there's no backing DRC.
> >
> > Seems to work. I started with reserved based on Nathan's original
> > suggestion.
> >
> > So Nathan, which would be more appropriate here from kernel point of view ?
> > Reserved or 'DRC Invalid' ?
>
> Wouldn't we want an OR of both?

That might be the safest option, yes.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.
NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

--boT9Oj39GmgPxYhu--
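
For reference, a minimal C sketch of where the review comments in this thread point: count every LMB from address 0 to the end of the hotplug region, and for LMBs that have no backing DRC still encode the address while setting both the reserved and the DRC-invalid flags. This is only an illustration under stated assumptions, not the code that was merged. The function and macro names are hypothetical, the 0x20 value for "DRC invalid" is the one quoted from LoPAPR above, the 0x80 value for "reserved" is assumed, and cells are stored in host byte order here, whereas the patch wraps every store in cpu_to_be32().

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits for the last cell of an ibm,dynamic-memory entry.
 * 0x20 is the "DRC invalid" value quoted in the thread; 0x80 for
 * "reserved" is an assumption made for this sketch. */
#define LMB_FLAGS_DRC_INVALID 0x00000020u
#define LMB_FLAGS_RESERVED    0x00000080u

/* Each entry is six 32-bit cells, as in the quoted patch. */
#define LMB_CELLS 6

/* Guest-side estimate described in the commit message:
 * maximum addressable memory = lmb-size * lmb-count. */
static uint64_t guest_max_addr(uint64_t lmb_size, uint32_t nr_lmbs)
{
    return lmb_size * (uint64_t)nr_lmbs;
}

/* Entry count when the property covers everything from address 0 up to
 * the end of the hotplug region, including any gap after boot-time RAM. */
static uint32_t count_lmbs(uint64_t hotplug_base, uint64_t hotplug_size,
                           uint64_t lmb_size)
{
    assert(lmb_size != 0);
    return (uint32_t)((hotplug_base + hotplug_size) / lmb_size);
}

/* Fill the entry for LMB number i when it has no backing DRC (RMA,
 * boot-time RAM, or gap padding). The address is still encoded, and the
 * flags are the OR of "reserved" and "DRC invalid". Values are kept in
 * host byte order so they can be inspected directly. */
static void fill_nodrc_lmb(uint32_t cell[LMB_CELLS], uint32_t i,
                           uint64_t lmb_size)
{
    uint64_t addr = (uint64_t)i * lmb_size;

    cell[0] = (uint32_t)(addr >> 32);          /* address, high word */
    cell[1] = (uint32_t)(addr & 0xffffffffu);  /* address, low word */
    cell[2] = 0;                               /* no DRC index */
    cell[3] = 0;                               /* reserved */
    cell[4] = (uint32_t)-1;                    /* no NUMA node, as in the patch */
    cell[5] = LMB_FLAGS_RESERVED | LMB_FLAGS_DRC_INVALID;
}
```

As a worked example with assumed numbers: with 256 MiB LMBs, a hotplug region based at 8 GiB and 4 GiB in size gives (8 GiB + 4 GiB) / 256 MiB = 48 entries, so the guest's lmb-size * lmb-count estimate comes out at 12 GiB, which matches the end of the hotplug region even if boot-time RAM ends below 8 GiB.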