From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:42970) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aZ9kf-0003cG-Jw for qemu-devel@nongnu.org; Thu, 25 Feb 2016 23:11:38 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aZ9ke-0004GV-5j for qemu-devel@nongnu.org; Thu, 25 Feb 2016 23:11:37 -0500 Received: from ozlabs.org ([2401:3900:2:1::2]:52632) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aZ9kd-0004GK-0j for qemu-devel@nongnu.org; Thu, 25 Feb 2016 23:11:36 -0500 Date: Fri, 26 Feb 2016 15:12:26 +1100 From: David Gibson Message-ID: <20160226041226.GI20657@voom.fritz.box> References: <20160218033952.GG15224@voom.fritz.box> <20160218113739.64b02461@nial.brq.redhat.com> <20160219043848.GZ15224@voom.fritz.box> <20160219164911.091b4351@nial.brq.redhat.com> <20160222025432.GD2808@voom.fritz.box> <20160223104645.6b64f60c@nial.brq.redhat.com> <20160224015417.GF2808@voom.fritz.box> <20160224151754.3e28094c@nial.brq.redhat.com> <20160225012543.GE22216@voom.redhat.com> <20160225134305.2c2a6aef@nial.brq.redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="jaTU8Y2VLE5tlY1O" Content-Disposition: inline In-Reply-To: <20160225134305.2c2a6aef@nial.brq.redhat.com> Subject: Re: [Qemu-devel] [RFC] QMP: add query-hotpluggable-cpus List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Igor Mammedov Cc: lvivier@redhat.com, thuth@redhat.com, Peter Krempa , ehabkost@redhat.com, aik@ozlabs.ru, qemu-devel@nongnu.org, agraf@suse.de, Markus Armbruster , abologna@redhat.com, bharata@linux.vnet.ibm.com, pbonzini@redhat.com, afaerber@suse.de --jaTU8Y2VLE5tlY1O Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Feb 25, 2016 at 01:43:05PM +0100, Igor Mammedov wrote: > On Thu, 25 Feb 2016 12:25:43 +1100 > David Gibson wrote: >=20 > > On Wed, Feb 24, 2016 at 03:17:54PM +0100, Igor Mammedov wrote: > > > On Wed, 24 Feb 2016 12:54:17 +1100 > > > David Gibson wrote: > > > =20 > > > > On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote: =20 > > > > > On Mon, 22 Feb 2016 13:54:32 +1100 > > > > > David Gibson wrote: > > > > > =20 > > > > > > On Fri, Feb 19, 2016 at 04:49:11PM +0100, Igor Mammedov wrote: = =20 > > > > > > > On Fri, 19 Feb 2016 15:38:48 +1100 > > > > > > > David Gibson wrote: > > > > > > >=20 > > > > > > > CCing thread a couple of libvirt guys. > > > > > > > =20 > > > > > > > > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wro= te: =20 > > > > > > > > > On Thu, 18 Feb 2016 14:39:52 +1100 > > > > > > > > > David Gibson wrote: > > > > > > > > > =20 > > > > > > > > > > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov= wrote: =20 > > > > > > > > > > > On Mon, 15 Feb 2016 20:43:41 +0100 > > > > > > > > > > > Markus Armbruster wrote: > > > > > > > > > > > =20 > > > > > > > > > > > > Igor Mammedov writes: > > > > > > > > > > > > =20 > > > > > > > > > > > > > it will allow mgmt to query present and possible = to hotplug CPUs > > > > > > > > > > > > > it is required from a target platform that wish t= o support > > > > > > > > > > > > > command to set board specific MachineClass.possib= le_cpus() hook, > > > > > > > > > > > > > which will return a list of possible CPUs with op= tions > > > > > > > > > > > > > that would be needed for hotplugging possible CPU= s. > > > > > > > > > > > > > > > > > > > > > > > > > > For RFC there are: > > > > > > > > > > > > > 'arch_id': 'int' - mandatory unique CPU number, > > > > > > > > > > > > > for x86 it's APIC ID for AR= M it's MPIDR > > > > > > > > > > > > > 'type': 'str' - CPU object type for usage with= device_add > > > > > > > > > > > > > > > > > > > > > > > > > > and a set of optional fields that would allows mg= mt tools > > > > > > > > > > > > > to know at what granularity and where a new CPU c= ould be > > > > > > > > > > > > > hotplugged; > > > > > > > > > > > > > [node],[socket],[core],[thread] > > > > > > > > > > > > > Hopefully that should cover needs for CPU hotplug= porposes for > > > > > > > > > > > > > magor targets and we can extend structure in futu= re adding > > > > > > > > > > > > > more fields if it will be needed. > > > > > > > > > > > > > > > > > > > > > > > > > > also for present CPUs there is a 'cpu_link' field= which > > > > > > > > > > > > > would allow mgmt inspect whatever object/abstract= ion > > > > > > > > > > > > > the target platform considers as CPU object. > > > > > > > > > > > > > > > > > > > > > > > > > > For RFC purposes implements only for x86 target s= o far. =20 > > > > > > > > > > > >=20 > > > > > > > > > > > > Adding ad hoc queries as we go won't scale. Could = this be solved by a > > > > > > > > > > > > generic introspection interface? =20 > > > > > > > > > > > Do you mean generic QOM introspection? > > > > > > > > > > >=20 > > > > > > > > > > > Using QOM we could have '/cpus' container and create = QOM links > > > > > > > > > > > for exiting (populated links) and possible (empty lin= ks) CPUs. > > > > > > > > > > > However in that case link's name will need have a spe= cial format > > > > > > > > > > > that will convey an information necessary for mgmt to= hotplug > > > > > > > > > > > a CPU object, at least: > > > > > > > > > > > - where: [node],[socket],[core],[thread] options > > > > > > > > > > > - optionally what CPU object to use with device_add= command =20 > > > > > > > > > >=20 > > > > > > > > > > Hmm.. is it not enough to follow the link and get the t= opology > > > > > > > > > > information by examining the target? =20 > > > > > > > > > One can't follow a link if it's an empty one, hence > > > > > > > > > CPU placement information should be provided somehow, > > > > > > > > > either: =20 > > > > > > > >=20 > > > > > > > > Ah, right, so the issue is determining the socket/core/thre= ad > > > > > > > > addresses that cpus which aren't yet present will have. > > > > > > > > =20 > > > > > > > > > * by precreating cpu-package objects with properties that > > > > > > > > > would describe it /could be inspected via OQM/ = =20 > > > > > > > >=20 > > > > > > > > So, we could do this, but I think the natural way would be = to have the > > > > > > > > information for each potential thread in the package. Just= putting > > > > > > > > say "core number" in the package itself assumes more than I= 'd like > > > > > > > > about how packages sit in the heirarchy. Plus, it means th= at > > > > > > > > management has a bunch of cases to deal with: package has a= ll the > > > > > > > > information, package has just a core id, package has just a= socket id, > > > > > > > > and so forth. > > > > > > > >=20 > > > > > > > > It is a but clunky that when the package is plugged, this i= nformation > > > > > > > > will have to sit parallel to the array of actual thread lin= ks. > > > > > > > > > > > > > > > > Markus or Andreas is there a natural way to present a list = of (node, > > > > > > > > socket, core, thread) tuples in the package object? Prefer= ably > > > > > > > > without having to create a whole bunch of "potential thread= " objects > > > > > > > > just for the purpose. =20 > > > > > > > I'm sorry but I couldn't parse above 2 paragraphs. The way I = see > > > > > > > whatever placement info QEMU will provide to mgmt, mgmt will = have > > > > > > > to deal with it in one way or another. > > > > > > > Perhaps rephrasing and adding some examples might help to exp= lain > > > > > > > suggestion a bit better? =20 > > > > > >=20 > > > > > > Ok, so what I'm saying is that I think describing a location fo= r the > > > > > > package itself could be problematic. For some cases it will be= ok, > > > > > > but depending on exactly what the package represents on a parti= cular > > > > > > platform there could be a lot of options for how to represent i= t. > > > > > >=20 > > > > > > What I'm suggesting instead is that instead of giving a locatio= n for > > > > > > itself, the package lists the locations of all the threads it w= ill > > > > > > contain when it is enabled/present/whatever. Those locations c= an be > > > > > > given as node/socket/core/thread tuples - which are properties = that > > > > > > cpu threads already need to have, so we're not making the possi= ble > > > > > > inadequacy of that information any worse than it already was. > > > > > >=20 > > > > > > Examples.. so I'm not really sure how to write QOM objects, but= I hope > > > > > > this is clear enough: > > > > > >=20 > > > > > > On x86 > > > > > > .../cpu-package[0] (type 'acpi-thread') > > > > > > present =3D true > > > > > > location[0] =3D (node 0, socket 0, core 0, thread 0) > > > > > > thread[0] =3D > > > > > > .../cpu-package[1] (type 'acpi-thread') > > > > > > present =3D false > > > > > > location[0] =3D (node 0, socket 0, core 0, thread 1) > > > > > >=20 > > > > > > On Power > > > > > > .../cpu-package[0] (type 'spapr-core') > > > > > > present =3D true > > > > > > location[0] =3D (node 0, socket 0, core 0, thread 0) > > > > > > location[1] =3D (node 0, socket 0, core 0, thread 1) > > > > > > ... > > > > > > location[7] =3D (node 0, socket 0, core 0, thread 7) > > > > > > thread[0] =3D > > > > > > ... > > > > > > thread[7] =3D >link...> > > > > > > .../cpu-package[1] (type 'spapr-core') > > > > > > present =3D false > > > > > > location[0] =3D (node 0, socket 0, core 0, thread 0) > > > > > > location[1] =3D (node 0, socket 0, core 0, thread 1) > > > > > > ... > > > > > > location[7] =3D (node 0, socket 0, core 0, thread 7) > > > > > >=20 > > > > > > Does that make sense? > > > > > > =20 > > > > > > > > > or > > > > > > > > > * via QMP/HMP command that would provide the same inform= ation > > > > > > > > > only without need to precreate anything. The only diff= erence > > > > > > > > > is that it allows to use -device/device_add for new CP= Us. =20 > > > > > > > >=20 > > > > > > > > I'd be ok with that option as well. I'd be thinking it wou= ld be > > > > > > > > implemented via a class method on the package object which = returns the > > > > > > > > addresses that its contained threads will have, whether or = not they're > > > > > > > > present right now. Does that make sense? =20 > > > > > > > In this RFC it's MachineClass.possible_cpus method which is a= bit more > > > > > > > flexible as it allows a board to describe possible CPU device= s (whatever > > > > > > > they might be: sockets|cores|threads|some_chip_module) and th= eir properties > > > > > > > without forcing board to precreate cpu_package objects which = should convey > > > > > > > the same info one way or another. =20 > > > > > >=20 > > > > > > Hmm.. so my RFC so far (at least the revised version based on > > > > > > Eduardo's comments) is that the cpu_package objects are always > > > > > > precreated. In future we might allow dynamic construction, but= that > > > > > > will require a bunch more thinking to designt the right interfa= ces. > > > > > > =20 > > > > > > > > > Considering that we would need to create HMP command so u= ser could > > > > > > > > > inspect possible CPUs from monitor, it would need to do t= he same as > > > > > > > > > QMP command regardless of whether it's cpu-package object= s or > > > > > > > > > just board calculated info a runtime. > > > > > > > > > =20 > > > > > > > > > > In the design Eduardo and I have been discussing we're = actually not > > > > > > > > > > planning to allow device_add to construct CPU packages = - at least, not > > > > > > > > > > for the time being. The idea is that the machine type = will construct > > > > > > > > > > enough packages for maxcpus, and management just toggle= s them on and > > > > > > > > > > off. =20 > > > > > > > > > Another question is how it would work wrt migration? = =20 > > > > > > > >=20 > > > > > > > > I'm assuming the "present" bits would be added to the migra= tion > > > > > > > > stream; seems straightforward enough to me. Is there some > > > > > > > > consideration I'm missing? =20 > > > > > > > It's hard to estimate how cpu-package objects might complicate > > > > > > > migration. It should not break migration for old machine types > > > > > > > and if possible it should work for backwards migration to old= er > > > > > > > QEMU versions (to be downstream friendly). =20 > > > > > >=20 > > > > > > So, the simple way to achieve that is to only instantiate the > > > > > > cpu-package objects on newer machine types. Older machine type= s will > > > > > > instatiate the cpu threads directly from the machine type in th= e old > > > > > > way, and (except for x86) won't allow cpu hotplug. > > > > > >=20 > > > > > > I think that's a reasonable first approach. Later we can look = at > > > > > > migrating a non-package setup to a package setup, if it looks l= ike > > > > > > that will be useful. > > > > > > =20 > > > > > > > If we go typical '-device/device_add whatever_cpu_device,foo_= options_list' > > > > > > > route then it would allow us to replicate older device models= without > > > > > > > issues (I don't expect any in x86 case) as it's what CPUs are= now under the hood. > > > > > > > This RFC doesn't force us to re-factor device models in order= to use > > > > > > > hotplug (where CPU objects are already self-sufficient device= s/hotplug capable). > > > > > > >=20 > > > > > > > It rather tries completely split interface aspect from how we= are > > > > > > > internally model CPU hotplug, and tries to solve issue with > > > > > > >=20 > > > > > > > -device/device_add for which we need to provide > > > > > > > 'what type to plug' and 'where to plug, which options to s= et to what' > > > > > > >=20 > > > > > > > It's 1st level per you proposal, later we can do 2nd level on= top of it > > > > > > > using cpu-packages(flip present property) to simplify mgmt's = job > > > > > > > if it still would really needed (i.e. mgmt won't be able to c= ope with > > > > > > > -device, which it already has support for). =20 > > > > > >=20 > > > > > > Yeah.. so the thing is, in the short term I'm really more inter= ested > > > > > > in the 2nd layer interface. It's something we can actually use, > > > > > > whereas the 1st layer interfaace still has a lot of potential > > > > > > complications. =20 > > > > > What complications do you see from POWER point if view? =20 > > > >=20 > > > > I don't relaly see any complications specific to Power. But the > > > > biggest issue, as far as I can tell is how do we advertise to the u= ser > > > > / management layer what sorts of CPUs can be hotplugged - how many, > > > > what types are possible and so forth. The constraints here could in > > > > theory be pretty complex. =20 > > > that's what query-hotpluggable-cpus does, but not for theoretical > > > set of platforms but rather a practical set that we a wanting > > > CPU hotplug for. > > > i.e. board returns a fixed board layout describing what cpu types > > > could be hotplugged and where at in terms of [socket/core/thread] > > > tuples, which maps well to current targets which need CPU hotplug > > > (power/s390/x86/ARM). > > >=20 > > > The rest of interface (i.e.) add/remove actions are handled by > > > reused -device/device_add - that mgmt has already support for and > > > works pretty well for migration as well > > > (no need to maintain machine version-ed compat glue is plus). > > >=20 > > > So any suggestions how to improve layout description returned > > > by query-hotpluggable-cpus command are welcome. > > > Even if we end up using QOM interface, suggestions will still > > > be useful as the other interface will need to convey the same info > > > just via other means. =20 > >=20 > > Yeah, as I mentioned elsewhere, I'm starting to come around to this > > basic approach, although I'm still a bit dubious about the specific > > format suggested. I don't have specific suggestions to improve it > > yet, but I'm working on it :). > >=20 > >=20 > > > > > > This is why Eduardo suggested - and I agreed - that it's probab= ly > > > > > > better to implement the "1st layer" as an internal structure/in= terface > > > > > > only, and implement the 2nd layer on top of that. When/if we n= eed to > > > > > > we can revisit a user-accessible interface to the 1st layer. = =20 > > > > > We are going around QOM based CPU introspecting interface for > > > > > years now and that's exactly what 2nd layer is, just another > > > > > implementation. I've just lost hope in this approach. > > > > >=20 > > > > > What I'm suggesting in this RFC is to forget controversial > > > > > QOM approach for now and use -device/device_add + QMP introspecti= on, > > > > > i.e. completely split interface from how boards internally implem= ent > > > > > CPU hotplug. =20 > > > >=20 > > > > I can see the appeal of that approach at this juncture. Hmm.. =20 > > > A lot of work has been done to make CPUs device_add compatible. =20 > >=20 > > So... it's been much discussed, but I'm still pretty unclear on how > > the device_add interface is supposed to work; at least in the context > > of non thread-granularity hotplug. > >=20 > > Basically, is it acceptable for: > > device_add vendor-model-cpu-core > >=20 > > to create, in addition to the core device, a bunch of additional > > devices (the individual threads), or is that the "object mutating its > > own topology" that Andreas objects to violently? > I think it's acceptable to have vendor-model-cpu-core device > considering it's platform limitation or socket if device model calls for = it. > I'm not sure that mutating applies to all objects but for Device > inherited classes there shouldn't be any. > i.e. > 1. create Device with instance_init - constructor that shouldn't fail ev= er > 2. set properties - > done by -device/device_add and also by device_post_init() for globa= ls > 3. set 'realize' property to ON - allowed to fail, completes device init= ialization > realize() hook must validate set earlier properties if it hasn't been > done earlier and complete all child objects initialization, Ok, does that include the initial construction of child objects? > children are should be at 'realized' state when parent's realize() > hook finishes without error. No further children are allowed to be > created and not properties are allowed to be set after Device is real= ized. > 4. Once realize() hook is executed, Device core code calls > plug hook if it supported hotplug_handler_plug() which usually > does the job of wiring Device to board. For more details see > device_set_realized(). > > On top of that Andreas would like that children weren't dynamically > allocated but embedded into parent, included in parent's > instance_size if possible i.e. children count is known at > instance_init() time. Right, which is not possible if we have a nr_threads property, as we want for the cases we're looking at now. > > If that is acceptable, where exactly should it be done? In the > > device's instance_init? in realize? somewhere else? > Not sure what question is about, does above answer it? > =20 > > > The missing piece is letting mgmt to know what CPUs and with > > > which options could be plugged in. =20 > >=20 > > Well, that's *a* missing piece, certainly.. > >=20 > > > And adding a query-hotpluggable-cpus QMP command looks like > > > a path of the least resistance that would work for power/s390/x86/ARM. > > > =20 > >=20 >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --jaTU8Y2VLE5tlY1O Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJWz9CqAAoJEGw4ysog2bOS4VgQAN02/gRD4CXlw+hV2rrmiG1F /s3Y9x48rcqdUMtC6o57tgNO2i+qIHymSDBNNWo+hFRyyKSNtp+fwaprTUfYL12S CU3ylq7AF7IdWPDt4j22hC1oNgBUYSX3NIfqpRaXSaOoZgJ8gy/55r2xS/7/cTlk An3IcKAOzgMQDjnj3YufLvpfWj8UKKkO+ibkAI7xtTYm5hv3aiCX+I39ziUUhkYY d42iEEuJG9fZ9UxUy/PCOyJ56GIzIPAjM7yd+MxkTd9E9rfrR9vze8OjOG0SI6sF 4sHvK+2wLL5HXBbnGq2DQiASwRzGGISifuRF0fsRQlal690v0Nij1KjYVjInxOHd NlV+dELugd8NQhiIeLzZNzRB3N9fz18i0L26hCTHJRYO1SQq7+60LAhvFtauk6aH vPBnRi9dBwlL7qgMW5yLd30tC1QMaC0ozhJqni1hEnH+C2cenDsBZtGl6e+QpFBL Knl4fofzWzp7P3grkZNOUhfisxij73Ubcb8nJ1Pcd0x6kcneAfxkfg9OKOfiSatV piucGgkanNUV/kVBWuVw4cDWgPpBY4XCVjrkUIuOaa2f/3y1rRBEQONqdr1YLifD c4gqqqudz762q0A3D3UxOR1Z/l0y4lPJX3/kgb0suEnIBDww/dGNoyNl1EU/e4tg WOF63jZnSpUa9ULGrrQP =Hxtu -----END PGP SIGNATURE----- --jaTU8Y2VLE5tlY1O--