From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:56291)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1ZMTxZ-0007ly-1W for qemu-devel@nongnu.org;
    Tue, 04 Aug 2015 00:36:18 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1ZMTxX-0000f9-4b for qemu-devel@nongnu.org;
    Tue, 04 Aug 2015 00:36:16 -0400
Date: Tue, 4 Aug 2015 14:36:16 +1000
From: David Gibson
Message-ID: <20150804043616.GG3080@voom.redhat.com>
References: <1438580143-587-1-git-send-email-bharata@linux.vnet.ibm.com>
    <1438580143-587-5-git-send-email-bharata@linux.vnet.ibm.com>
    <20150803065501.GB31111@voom.redhat.com>
    <20150803075302.GB5776@in.ibm.com>
    <20150803223243.4156.40106@loki>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
    protocol="application/pgp-signature"; boundary="AzNpbZlgThVzWita"
Content-Disposition: inline
In-Reply-To: <20150803223243.4156.40106@loki>
Subject: Re: [Qemu-devel] [RFC PATCH v0 4/5] spapr: Support hotplug by specifying DRC count
To: Michael Roth
Cc: agraf@suse.de, qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
    nfont@linux.vnet.ibm.com, bharata@linux.vnet.ibm.com

--AzNpbZlgThVzWita
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Aug 03, 2015 at 05:32:43PM -0500, Michael Roth wrote:
> Quoting Bharata B Rao (2015-08-03 02:53:02)
> > On Mon, Aug 03, 2015 at 04:55:01PM +1000, David Gibson wrote:
> > > On Mon, Aug 03, 2015 at 11:05:42AM +0530, Bharata B Rao wrote:
> > > > Support hotplug identifier type RTAS_LOG_V6_HP_ID_DRC_COUNT that allows
> > > > hotplugging of DRCs by specifying the DRC count.
> > > >
> > > > While we are here, rename
> > > >
> > > > spapr_hotplug_req_add_event() to spapr_hotplug_req_add_by_index()
> > > > spapr_hotplug_req_remove_event() to spapr_hotplug_req_remove_by_index()
> > > >
> > > > so that they match with spapr_hotplug_req_add_by_count().
> > > >
> > > > Signed-off-by: Bharata B Rao
> > > > ---
> > > >  hw/ppc/spapr.c         |  2 +-
> > > >  hw/ppc/spapr_events.c  | 47 ++++++++++++++++++++++++++++++++++++++---------
> > > >  hw/ppc/spapr_pci.c     |  4 ++--
> > > >  include/hw/ppc/spapr.h |  8 ++++++--
> > > >  4 files changed, 47 insertions(+), 14 deletions(-)
> > > >
> > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > index 669dc43..13af9be 100644
> > > > --- a/hw/ppc/spapr.c
> > > > +++ b/hw/ppc/spapr.c
> > > > @@ -2072,7 +2072,7 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
> > > >  
> > > >          drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > >          drck->attach(drc, dev, fdt, fdt_offset, !dev->hotplugged, errp);
> > > > -        spapr_hotplug_req_add_event(drc);
> > > > +        spapr_hotplug_req_add_by_index(drc);
> > > >          addr += SPAPR_MEMORY_BLOCK_SIZE;
> > > >      }
> > > >  }
> > > > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > > > index 98bf7ae..744ea62 100644
> > > > --- a/hw/ppc/spapr_events.c
> > > > +++ b/hw/ppc/spapr_events.c
> > > > @@ -386,7 +386,9 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
> > > >      qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
> > > >  }
> > > >  
> > > > -static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
> > > > +static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
> > > > +                                    sPAPRDRConnectorType drc_type,
> > > > +                                    uint32_t drc)
> > > >  {
> > > >      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > > >      struct hp_log_full *new_hp;
> > > > @@ -395,8 +397,6 @@ static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
> > > >      struct rtas_event_log_v6_maina *maina;
> > > >      struct rtas_event_log_v6_mainb *mainb;
> > > >      struct rtas_event_log_v6_hp *hp;
> > > > -    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > -    sPAPRDRConnectorType drc_type = drck->get_type(drc);
> > > >  
> > > >      new_hp = g_malloc0(sizeof(struct hp_log_full));
> > > >      hdr = &new_hp->hdr;
> > > > @@ -427,8 +427,7 @@ static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
> > > >      hp->hdr.section_length = cpu_to_be16(sizeof(*hp));
> > > >      hp->hdr.section_version = 1; /* includes extended modifier */
> > > >      hp->hotplug_action = hp_action;
> > > > -    hp->drc.index = cpu_to_be32(drck->get_index(drc));
> > > > -    hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
> > > > +    hp->hotplug_identifier = hp_id;
> > > >  
> > > >      switch (drc_type) {
> > > >      case SPAPR_DR_CONNECTOR_TYPE_PCI:
> > > > @@ -445,19 +444,49 @@ static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
> > > >          return;
> > > >      }
> > > >  
> > > > +    if (hp_id == RTAS_LOG_V6_HP_ID_DRC_COUNT) {
> > > > +        hp->drc.count = cpu_to_be32(drc);
> > >
> > > I'm a bit confused as to how this message can work with *only* a
> > > count and not some sort of base index.
>
> Talked a bit with Nathan for some more insight on *why* it works:
>
> drc-count is actually how hotplug add/remove is done for pHyp cpu/mem
> resources, drc-index is used in special situations like memory/cpu
> failures, and for other drc types like pci.
>
> So in general, drc-count with no drc-index is supposed to work fine
> with existing guests/tools.
>
> As far as the *how*:
>
> guests just attempt to add LMBs until they fulfill the request, if
> there's a failure they abort/rewind and try the next LMB.

Huh, ok.

> There are some complications with QEMU though.
> Namely, our LMBs are
> hotplugged in the form of DIMM-backed LMBs, as opposed to pHyp where
> we hotplug from a logical pool of LMBs where there's no DIMM-level
> modeling of resources. So, with both pHyp and pKVM, LMB hotplug/unplug
> can fail for various reasons, but in the case of QEMU we have the
> unique no-dimm-backing-this-lmb failure. We fail this at the DRC
> level: if a guest attempts to set DRC/LMB allocation state to USED,
> and QEMU decides there's no DIMM backing this memory, it simply reports
> an error. This is in line with PAPR+. And due to the
> try/abort/rewind/next handling this all still works.
>
> Memory *unplug* is a little hairier:
>
> All the above applies, but prior to getting to the point of QEMU failing
> the DRC/LMB removal (via set-isolation-state:ISOLATED,
> set-allocation-state:UNUSED returning failure), the guest kernel will
> offline the LMB, migrating the memory if need be, so when the guest hits
> the DRC failure it has to bring the LMB back online. This unnecessary
> churn and memory migration is obviously not desirable, but we've yet
> to look into the performance aspects of it.

Ah, and presumably this migrate shuffle could continue for a long
time, as it works its way through each LMB until reaching the ones
connected to the removed DIMM.

> I think a drc-index being provided as a hint/starting point would be
> the right way to address this potential performance issue, but I'm hoping
> the approach in this series is sufficient for mem hotplug/unplug
> for existing guests that wouldn't support such a hint.

I guess so.  Although it does make me wonder if using the PC derived
DIMM model was really what we wanted for PAPR guests.  I guess
implementing the phyp LMB-pool model probably would have required a
lot of messing with the core qemu hotplug model, though.

> This drc-index hint would likely be introduced in tandem with a new
> ibm,client-set-architecture flag so QEMU can gracefully switch over for
> newer guests.
> Nathan is working the PAPR+ spec for this in the process
> of his work to move memory hotplug handling into guest kernel.
>
> >
> > Right and this can be an issue when we start supporting memory removal
> > and the current userspace drmgr tool will just go ahead and remove
> > 'count' number of LMBs that have been marked for removal and there is
> > really no association b/n these LMBs to the LMBs that make up the
> > pc-dimm device that is being removed.
> >
> > One solution is to extend the rtas event header and include the base
> > drc-index (with count type of identifier) so that we know exactly which
> > set of LMBs to remove for the given pc-dimm device.
> >
> > Michael has some thoughts about alternative ways in which we can achieve
> > removal of correct LMBs without needing to pass the base drc-index.
> >
> > Regards,
> > Bharata.
> >
>

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

--AzNpbZlgThVzWita
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBAgAGBQJVwEFAAAoJEGw4ysog2bOS5t0QALVgSM781tHrDphPX8okZxos
ip2yuM+wywDjm6zl4saU28uFykV0CcPt+KWEF9NhcA5r1QQ7drAyvAK0CXtCJC6d
fcLcaXH4xv1RNInQbiCwitCX02R83Lb/1sOkb8FVSyW0unbQ5btR0gNdwi9PicNb
DcEBZRn3ydv+UEjpFZsGvdwUZlF+2fxrIf4E8RVNy/jyHA00g5XVogj80Ymwi3pn
N1E9in77xJ09biqDoe7tIXJ4dgzL53tZTXq3Aar3hErtEbuL2FvmGe9Gq7aW9SIQ
O9ThidnuUKP1gWN+PYbD3qotr2pOe22K3rX7KSZ/Hs+YRCBUk7+U6AMQdV4pFSzK
tMXV0fijA5+nwuByCyDBFBMxjeHnz+0HOjhUybv+o8tcPvzVkMSFcfDi+I2m9iCj
gmzaaNesqIR5snn511QZh2jArKX9CYsmlB22YhP4SVmgzJonPolreoiTYhRMWF8+
biEFiaeo+i0fMdJiPkrr6xESBiPLxx3Ats2pSroIqgQFnDsMSD99cn+0Gk0/UKyY
TJBMmLZNqhlQDjbaNJeI2gtljAtBCbZSVAIsqZjfIC6g6YUcEI/CJq03x1pYWiZf
oTajWr8K6nwJOvHjIrR3WB9yleOPQJDIQNMde6n574Hmgmk8UWFHihudiBRqQKpy
/l2Zx80517A84ZFJXqxP
=ug7R
-----END PGP SIGNATURE-----

--AzNpbZlgThVzWita--
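
For readers following the thread, the try/abort/next handling of a drc-count
add request can be sketched roughly as below. This is only a toy model under
stated assumptions, not code from QEMU, drmgr, or the guest kernel: struct lmb,
lmb_acquire() and hot_add_by_count() are hypothetical names invented for the
illustration, and the "no DIMM backing this LMB" failure is reduced to a
boolean check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one entry per LMB DR connector. */
struct lmb {
    bool dimm_backed;   /* hypervisor side: a DIMM backs this LMB */
    bool in_use;        /* guest side: LMB is allocated and online */
};

/*
 * Stand-in for the guest setting the allocation state to USED.
 * It fails when no DIMM backs the LMB, mirroring the
 * no-dimm-backing-this-lmb failure described in the thread.
 */
static bool lmb_acquire(struct lmb *l)
{
    if (l->in_use || !l->dimm_backed) {
        return false;
    }
    l->in_use = true;
    return true;
}

/*
 * Try to satisfy a count-only add request by walking the LMBs in order.
 * A failed attempt is simply abandoned and the next LMB is tried.
 * Returns the number of LMBs added; *failed counts the rejected attempts.
 */
static size_t hot_add_by_count(struct lmb *lmbs, size_t n, size_t count,
                               size_t *failed)
{
    size_t added = 0;

    assert(failed != NULL);
    *failed = 0;

    for (size_t i = 0; i < n && added < count; i++) {
        if (lmbs[i].in_use) {
            continue;               /* already online, nothing to try */
        }
        if (lmb_acquire(&lmbs[i])) {
            added++;
        } else {
            (*failed)++;            /* abort this LMB, move to the next */
        }
    }
    return added;
}
```

The failed counter is a rough stand-in for the wasted work discussed above:
with no base drc-index as a starting hint, every unbacked LMB ahead of the
newly plugged DIMM costs one rejected attempt. On the unplug path each such
attempt would additionally involve offlining and re-onlining the LMB.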