From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:42541) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aYldl-0005eM-BW for qemu-devel@nongnu.org; Wed, 24 Feb 2016 21:26:55 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aYldi-0005eM-2Y for qemu-devel@nongnu.org; Wed, 24 Feb 2016 21:26:53 -0500
Received: from ozlabs.org ([103.22.144.67]:52715) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aYldh-0005e3-EE for qemu-devel@nongnu.org; Wed, 24 Feb 2016 21:26:49 -0500
Date: Thu, 25 Feb 2016 12:03:21 +1100
From: David Gibson
Message-ID: <20160225010321.GB22216@voom.redhat.com>
References: <20160218033952.GG15224@voom.fritz.box> <20160218113739.64b02461@nial.brq.redhat.com> <20160219043848.GZ15224@voom.fritz.box> <87h9h5uiy8.fsf@blackfin.pond.sub.org> <20160222023228.GC2808@voom.fritz.box> <87h9h1i07h.fsf@blackfin.pond.sub.org> <20160224015711.GG2808@voom.fritz.box> <87povm1ov1.fsf@blackfin.pond.sub.org> <20160224105119.GN2808@voom.fritz.box> <20160224120341.11a04f26@nial.brq.redhat.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="/NkBOFFp2J2Af1nK"
Content-Disposition: inline
In-Reply-To: <20160224120341.11a04f26@nial.brq.redhat.com>
Subject: Re: [Qemu-devel] [RFC] QMP: add query-hotpluggable-cpus
To: Igor Mammedov
Cc: lvivier@redhat.com, agraf@suse.de, thuth@redhat.com, ehabkost@redhat.com, aik@ozlabs.ru, qemu-devel@nongnu.org, Markus Armbruster , abologna@redhat.com, bharata@linux.vnet.ibm.com, pbonzini@redhat.com, afaerber@suse.de

--/NkBOFFp2J2Af1nK
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Feb 24, 2016 at 12:03:41PM +0100, Igor Mammedov wrote:
> On Wed, 24 Feb 2016 21:51:19 +1100
> David Gibson wrote:
>
> > On Wed, Feb
> > 24, 2016 at 09:42:10AM +0100, Markus Armbruster wrote:
> > > David Gibson writes:
> > >
> > > > On Mon, Feb 22, 2016 at 10:05:54AM +0100, Markus Armbruster wrote:
> > > >> David Gibson writes:
> > > >>
> > > >> > On Fri, Feb 19, 2016 at 10:51:11AM +0100, Markus Armbruster wrote:
> > > >> >> David Gibson writes:
> > > >> >>
> > > >> >> > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:
> > > >> >> >> On Thu, 18 Feb 2016 14:39:52 +1100
> > > >> >> >> David Gibson wrote:
> > > >> >> >>
> > > >> >> >> > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:
> > > >> >> >> > > On Mon, 15 Feb 2016 20:43:41 +0100
> > > >> >> >> > > Markus Armbruster wrote:
> > > >> >> >> > >
> > > >> >> >> > > > Igor Mammedov writes:
> > > >> >> >> > > >
> > > >> >> >> > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > >> >> >> > > > > it is required from a target platform that wishes to support
> > > >> >> >> > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > >> >> >> > > > > which will return a list of possible CPUs with options
> > > >> >> >> > > > > that would be needed for hotplugging possible CPUs.
> > > >> >> >> > > > >
> > > >> >> >> > > > > For RFC there are:
> > > >> >> >> > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > >> >> >> > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > >> >> >> > > > >    'type': 'str' - CPU object type for usage with device_add
> > > >> >> >> > > > >
> > > >> >> >> > > > > and a set of optional fields that would allow mgmt tools
> > > >> >> >> > > > > to know at what granularity and where a new CPU could be
> > > >> >> >> > > > > hotplugged;
> > > >> >> >> > > > > [node],[socket],[core],[thread]
> > > >> >> >> > > > > Hopefully that should cover needs for CPU hotplug purposes for
> > > >> >> >> > > > > major targets and we can extend structure in future adding
> > > >> >> >> > > > > more fields if it will be needed.
> > > >> >> >> > > > >
> > > >> >> >> > > > > also for present CPUs there is a 'cpu_link' field which
> > > >> >> >> > > > > would allow mgmt to inspect whatever object/abstraction
> > > >> >> >> > > > > the target platform considers as CPU object.
> > > >> >> >> > > > >
> > > >> >> >> > > > > For RFC purposes implements only for x86 target so far.
> > > >> >> >> > > >
> > > >> >> >> > > > Adding ad hoc queries as we go won't scale. Could this be solved by a
> > > >> >> >> > > > generic introspection interface?
> > > >> >> >> > > Do you mean generic QOM introspection?
> > > >> >> >> > >
> > > >> >> >> > > Using QOM we could have '/cpus' container and create QOM links
> > > >> >> >> > > for existing (populated links) and possible (empty links) CPUs.
> > > >> >> >> > > However in that case link's name will need to have a special format
> > > >> >> >> > > that will convey the information necessary for mgmt to hotplug
> > > >> >> >> > > a CPU object, at least:
> > > >> >> >> > >   - where: [node],[socket],[core],[thread] options
> > > >> >> >> > >   - optionally what CPU object to use with device_add command
> > > >> >> >> >
> > > >> >> >> > Hmm..
> > > >> >> >> > is it not enough to follow the link and get the topology
> > > >> >> >> > information by examining the target?
> > > >> >> >> One can't follow a link if it's an empty one, hence
> > > >> >> >> CPU placement information should be provided somehow,
> > > >> >> >> either:
> > > >> >> >
> > > >> >> > Ah, right, so the issue is determining the socket/core/thread
> > > >> >> > addresses that cpus which aren't yet present will have.
> > > >> >> >
> > > >> >> >>  * by precreating cpu-package objects with properties that
> > > >> >> >>    would describe it /could be inspected via QOM/
> > > >> >> >
> > > >> >> > So, we could do this, but I think the natural way would be to have the
> > > >> >> > information for each potential thread in the package. Just putting
> > > >> >> > say "core number" in the package itself assumes more than I'd like
> > > >> >> > about how packages sit in the hierarchy. Plus, it means that
> > > >> >> > management has a bunch of cases to deal with: package has all the
> > > >> >> > information, package has just a core id, package has just a socket id,
> > > >> >> > and so forth.
> > > >> >> >
> > > >> >> > It is a bit clunky that when the package is plugged, this information
> > > >> >> > will have to sit parallel to the array of actual thread links.
> > > >> >> >
> > > >> >> > Markus or Andreas, is there a natural way to present a list of (node,
> > > >> >> > socket, core, thread) tuples in the package object? Preferably
> > > >> >> > without having to create a whole bunch of "potential thread" objects
> > > >> >> > just for the purpose.
> > > >> >>
> > > >> >> I'm just a dabbler when it comes to QOM, but I can try.
> > > >> >>
> > > >> >> I view a concrete cpu-package device (subtype of the abstract
> > > >> >> cpu-package device) as a composite device containing stuff like actual
> > > >> >> cores.
> > > >> >
> > > >> > So.. the idea is it's a bit more abstract than that.
> > > >> > My intention is that the package lists - in some manner - each of the
> > > >> > threads (i.e. vcpus) it contains / can contain. Depending on the platform it
> > > >> > *might* also have internal structure such as cores / sockets, but it
> > > >> > doesn't have to. Either way, the contained threads will be listed in
> > > >> > a common way, as a flat array.
> > > >> >
> > > >> >> To create a composite device, you start with the outer shell, then plug
> > > >> >> in components one by one. Components can be nested arbitrarily deep.
> > > >> >>
> > > >> >> Perhaps you can define the concrete cpu-package shell in a way that lets
> > > >> >> you query what you need to know from a mere shell (no components
> > > >> >> plugged).
> > > >> >
> > > >> > Right.. that's exactly what I'm suggesting, but I don't know enough
> > > >> > about the presentation of basic data in QOM to know quite how to
> > > >> > accomplish it.
> > > >> >
> > > >> >> >> or
> > > >> >> >>  * via QMP/HMP command that would provide the same information
> > > >> >> >>    only without need to precreate anything. The only difference
> > > >> >> >>    is that it allows to use -device/device_add for new CPUs.
> > > >> >> >
> > > >> >> > I'd be ok with that option as well. I'd be thinking it would be
> > > >> >> > implemented via a class method on the package object which returns the
> > > >> >> > addresses that its contained threads will have, whether or not they're
> > > >> >> > present right now. Does that make sense?
> > > >> >>
> > > >> >> If you model CPU packages as composite cpu-package devices, then you
> > > >> >> should be able to plug and unplug these with device_add, unless plugging
> > > >> >> them requires complex wiring that can't be done in qdev / device_add,
> > > >> >> yet.
> > > >> >
> > > >> > There's a whole bunch of issues raised by allowing device_add of
> > > >> > cpus.
> > > >> > Although they're certainly interesting and probably useful, I'd
> > > >> > really like to punt on them for the time being, so we can get some
> > > >> > sort of cpu hotplug working on Power (and s390 and others).
> > > >>
> > > >> If you make it a device, you can still set
> > > >> cannot_instantiate_with_device_add_yet to disable -device / device_add
> > > >> for now, and unset it later, when you're ready for it.
> > > >
> > > > Yes, that was the plan.
> > > >
> > > >> > The idea of the cpu packages is that - at least for now - the user
> > > >> > can't control their contents apart from the single "present" bit.
> > > >> > They already know what they can contain.
> > > >>
> > > >> Composite devices commonly do. They're not general containers.
> > > >>
> > > >> The "present" bit sounds like you propose to "pre-plug" all the possible
> > > >> CPU packages, and thus reduce CPU hot plug/unplug to enabling/disabling
> > > >> pre-plugged CPU packages.
> > > >
> > > > Yes.
> > >
> > > I'm concerned this might suffer combinatorial explosion.
> > >
> > > qemu-system-x86_64 --cpu help shows more than two dozen CPUs. They can
> > > be configured in numerous arrangements of sockets, cores, threads. Many
> > > of these wouldn't be physically possible with older CPUs. Guest
> > > software might work even with physically impossible configurations, but
> > > arranging virtual models of physical hardware in physically impossible
> > > configurations invites trouble, and should best be avoided.
> > >
> > > I'm afraid I'm still in the guess-what-you-mean stage because I lack
> > > concrete examples to go with the abstract description. Can you
> > > enumerate the pre-plugged CPU packages for a board of your choice to
> > > give us a better idea of how your proposal would look like in practice?
> > > Then describe briefly what a management application would need to know
> > > about them, and what it would do with the knowledge?
> > >
> > > Perhaps a PC board would be the most useful, because PCs are probably
> > > second to none in random complexity :)
> >
> > Well, it may be moot at this point, since Andreas has objected
> > strongly to Bharata's draft for reasons I have yet to really figure
> > out.
> >
> > But I think the answer below will clarify this.
> >
> > > >> What if a board can take different kinds of CPU packages? Do we
> > > >> pre-plug all combinations? Then some combinations are nonsensical.
> > > >> How would we reject them?
> > > >
> > > > I'm not trying to solve all cases with the present bit handling - just
> > > > the currently common case of a machine with a fixed maximum number of
> > > > slots which are expected to contain identical processor units.
> > > >
> > > >> For instance, PC machines support a wide range of CPUs in various
> > > >> arrangements, but you generally need to use a single kind of CPU, and
> > > >> the kind of CPU restricts the possible arrangements. How would you
> > > >> model that?
> > > >
> > > > The idea is that the available slots are determined by the machine,
> > > > possibly using machine or global options. So for PC, -cpu and -smp
> > > > would determine the number of slots and what can go into them.
> > >
> > > Do these CPU packages come with "soldered-in" CPUs? Or do they provide
> > > slots where CPUs can be plugged in? From what I've read, I guess it's
> > > the latter, together with a "thou shalt not plug in different CPUs"
> > > commandment. Correct?
> >
> > No, they do in fact come with "soldered in" CPUs. Once the package is
> > constructed it is either absent, or supplies exactly one set of cpu
> > threads (and possibly other bits and pieces); there is no further
> > configuration.
> >
> > So:
> >     qemu-system-x86_64 -machine pc -cpu Haswell -smp 2,maxcpus=8
> >
> > Would give you 8 cpu packages. 2 would initially be present, the rest
> > would be absent. If you toggle an absent one to present, another
> > single-thread Haswell would appear in the guest.
> >
> >     qemu-system-x86_64 -machine pc -cpu Haswell \
> >         -smp 2,threads=2,cores=2,sockets=2,maxcpus=8
> >
> ok now let's imagine that mgmt set 'present'=on for pkg 7 and
> that needs to be migrated, how would target QEMU be able to recreate
> the state of source QEMU instance?

Ugh, yeah, I'm not sure that will work. I had just imagined that we'd
migrate the present bit for the pkg, and it would construct the
necessary threads on the far end. But ordering that with the transfer
of the thread state could get hairy.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

--/NkBOFFp2J2Af1nK
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBAgAGBQJWzlLYAAoJEGw4ysog2bOSp1sQANMxJ0kChX/DCzWCUJsW/t2I
20FmARBlx+EiUs2HKF1CD3+0Lda5WNCVc9mhGxc0BB9hNUk2GjhVR1wzq/xVFsWa
e5RhVOsLvmvbB4AO8UWMtqAcjKtYtw8r7oV+o9gCmCe7/AeG2EArDjq7DWf1O12B
jUiQzRa+Gt+z36IPGNbp8zuSK8kiPItNXnMTVP6vph5HpThpGkoFTX6hZVSSc+1a
RP6a9/UDG6vhOItL9HSpgGCYsNBsd6CJmaEMUdj5LD4a0t7uM5OP0I8mYA58FQME
QiY2LKKJCIVWoeK73kY3Y1vwPO0glRvy9PzfDsyOv5wRO3V1y5j2EsQDHN5H4W2C
JkvVtXOf7WMpdhqJ//wPbSYM0n0HSk8zrQ8hcxUlNbddQ0XLtbRUrGyyX2MuQjm3
z+nL1V9DE9XpRAU57zl0c+BY1UXTQ6mpNwTfedJZ6PY3OTm0eoSstoeVSPvEk/er
T0A/+SSoA4Yb5rbZP8o0AizmwIPYm41p9Z1vrCjWP3U7TyeSWqZ/iYs6Ht6nfg8T
SvjJH2lQaprltb1Uk4Ioiglo70h+cQ5/Q3znH1ivDB2GK5NxeHAtmEfQ0p2kav0V
+qwaFiTf536qDaHzNwEIiubsocrLoY7KnHzA+Q9sNIJd56WKsJpWVvT3FoJyh/Ee
xrRpiuG5ZeI71PDaTp43
=AR2a
-----END PGP SIGNATURE-----

--/NkBOFFp2J2Af1nK--
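
Editorial sketch (not part of the original message): a rough Python model of
the "pre-plugged package with a present bit" idea from the thread, for the
single-thread case of "-smp 2,maxcpus=8". Everything here is an assumption
made for illustration: CpuPackage, build_packages, snapshot_present and
restore_present are hypothetical names, not QEMU or QMP interfaces. It only
shows that carrying the per-package present bits would let a destination
rebuild which packages exist; it does not model the ordering against thread
state transfer, which is the open concern raised above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CpuPackage:
    """Hypothetical stand-in for one pre-plugged CPU package (not a QEMU type)."""
    index: int
    cpu_model: str
    present: bool = False


def build_packages(cpu_model: str, smp: int, maxcpus: int) -> List[CpuPackage]:
    """Create `maxcpus` single-thread packages; the first `smp` start present."""
    if not 0 < smp <= maxcpus:
        raise ValueError("need 0 < smp <= maxcpus")
    return [CpuPackage(i, cpu_model, present=(i < smp)) for i in range(maxcpus)]


def snapshot_present(packages: List[CpuPackage]) -> List[bool]:
    """Collect the present bits, i.e. what a source would have to send (assumed)."""
    return [p.present for p in packages]


def restore_present(packages: List[CpuPackage], bits: List[bool]) -> None:
    """Apply received present bits to a destination built with the same options."""
    if len(bits) != len(packages):
        raise ValueError("source and destination disagree on the number of packages")
    for pkg, bit in zip(packages, bits):
        pkg.present = bit


# Source: 8 packages, 2 present; management then enables package 7.
source = build_packages("Haswell", smp=2, maxcpus=8)
source[7].present = True

# Destination: started with the same command line, then given the present bits.
dest = build_packages("Haswell", smp=2, maxcpus=8)
restore_present(dest, snapshot_present(source))
print([p.index for p in dest if p.present])  # prints [0, 1, 7]
```

The destination here is only correct because it was built from identical
-cpu/-smp options; the sketch says nothing about when the per-thread state
would have to arrive relative to the present bits.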