Date: Thu, 23 Apr 2015 10:17:36 -0300
From: Eduardo Habkost
Message-ID: <20150423131736.GA17796@thinpad.lan.raisama.net>
References: <1427131923-4670-1-git-send-email-afaerber@suse.de> <5523D0FF.7090609@de.ibm.com> <20150423073233.GB26536@voom.redhat.com>
In-Reply-To: <20150423073233.GB26536@voom.redhat.com>
Subject: Re: [Qemu-devel] cpu modelling and hotplug (was: [PATCH RFC 0/4] target-i386: PC socket/core/thread modeling, part 1)
To: David Gibson
Cc: Peter Maydell, Bharata B Rao, qemu-devel@nongnu.org, Alexander Graf,
    Christian Borntraeger, "Jason J. Herne", Paolo Bonzini, Cornelia Huck,
    Igor Mammedov, Andreas Färber

On Thu, Apr 23, 2015 at 05:32:33PM +1000, David Gibson wrote:
> On Tue, Apr 07, 2015 at 02:43:43PM +0200, Christian Borntraeger wrote:
> > We had a call and I was asked to write a summary about our conclusion.
> > 
> > The more I wrote, the more I became uncertain whether we really came to a
> > conclusion, and the more certain I became that we want to define the
> > QMP/HMP/CLI interfaces first (or quite early in the process).
> > 
> > As discussed, I will provide an initial document as a discussion starter.
> > 
> > So here is my current understanding, with each piece of information on one
> > line, so that everybody can correct me or make additions:
> > 
> > current wrap-up of architecture support
> > -------------------
> > x86
> > - Topology possible
> >   - can be hierarchical
> >   - interfaces to query topology
> > - SMT: fanout in host, guest uses host threads to back guest vCPUs
> > - supports cpu hotplug via cpu_add
> > 
> > power
> > - Topology possible
> >   - interfaces to query topology?
> 
> For power, topology information is communicated via the
> "ibm,associativity" (and related) properties in the device tree. This
> can encode hierarchical topologies, but it is *not* bound to the
> socket/core/thread hierarchy. On the guest side on Power there's no
> real notion of "socket", just cores with specified proximities to
> various memory nodes.
> 
> > - SMT: Power8: no threads in host and full core passed in due to HW design;
> >   may change in the future
> > 
> > s/390
> > - Topology possible
> >   - can be hierarchical
> >   - interfaces to query topology
> > - always virtualized via PR/SM LPAR
> > - host topology from LPAR can be heterogeneous (e.g. 3 cpus in 1st socket, 4 in 2nd)
> > - SMT: fanout in host, guest uses host threads to back guest vCPUs
> > 
> > 
> > Current downsides of CPU definitions/hotplug
> > -----------------------------------------------
> > - -smp sockets=,cores=,threads= builds only a homogeneous topology
> > - cpu_add does not tell where to add
> > - artificial icc bus construct on x86 for several reasons (link, sysbus not hotpluggable..)
> 
> Artificial though it may be, I think having a "cpus" pseudo-bus is not
> such a bad idea

That was considered before[1][2]. We have use cases for adding additional
information about VCPUs to query-cpus, but we could simply use qom-get for
that. The only thing missing is a predictable QOM path for VCPU objects.
If we provide something like "/cpus/" links on all machines, callers could
simply use qom-get to get just the information they need, instead of getting
too much information from query-cpus (which also has the side-effect of
interrupting all running VCPUs to synchronize register information).

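For illustration, the difference could look something like this (the
"/machine/cpus/0" path and the "halted" property are made-up examples here,
no such canonical path exists today):

  -> { "execute": "qom-get",
       "arguments": { "path": "/machine/cpus/0", "property": "halted" } }
  <- { "return": false }

whereas query-cpus returns an entry with register and thread state for every
VCPU, whether the caller needs it or not:

  -> { "execute": "query-cpus" }
  <- { "return": [ { "CPU": 0, "current": true, "halted": false,
                     "thread_id": 3134, ... }, ... ] }
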
Quoting part of your proposal below:

> Ignoring NUMA topology (I'll come back to that in a moment) qemu
> should really only care about two things:
> 
> a) the unit of execution scheduling (a vCPU or "thread")
> b) the unit of plug/unplug
> 
> [...]
> 
> 3) I'm thinking we'd have a "cpus" virtual bus represented in QOM,
> which would contain the vCMs (also QOM objects). Their existence
> would be generic, though we'd almost certainly use arch and/or machine
> specific subtypes.
> 
> 4) There would be a (generic) way of finding the vCPUs (threads) in a
> vCM and the vCM for a specific vCPU.

What I propose now is a bit simpler: just a mechanism for enumerating
VCPUs/threads (a), which would replace query-cpus. Later we could also have
a generic mechanism for (b), if we decide to introduce a generic "CPU
module" abstraction for plug/unplug.

A more complex mechanism for enumerating vCMs and the vCPUs inside a vCM
would be a superset of (a), so in theory we wouldn't need both. But I
believe that: 1) we will take some time to define the details of the
vCM/plug/unplug abstractions; and 2) we already have use cases today[2]
that could benefit from a generic QOM path for (a).

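To make the enumeration part of (a) concrete, here is a minimal sketch that
reuses the existing qom-list command on a hypothetical "/machine/cpus"
container (the container path and the per-CPU link names are assumptions,
not an existing interface):

  -> { "execute": "qom-list", "arguments": { "path": "/machine/cpus" } }
  <- { "return": [ { "name": "0", "type": "link<cpu>" },
                   { "name": "1", "type": "link<cpu>" },
                   { "name": "type", "type": "string" } ] }

A client would then follow each link with qom-get to fetch only the
properties it actually needs.
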
[1] Message-ID: <20140516151641.GY3302@otherpad.lan.raisama.net>
    http://article.gmane.org/gmane.comp.emulators.qemu/273463
[2] Message-ID: <20150331131623.GG7031@thinpad.lan.raisama.net>
    http://article.gmane.org/gmane.comp.emulators.kvm.devel/134625

> > 
> > discussions
> > -------------------
> > - we want to be able to (most important question, IMHO)
> >   - hotplug CPUs on power/x86/s390 and maybe others
> >   - define topology information
> >   - bind the guest topology to the host topology in some way
> >     - to host nodes
> >     - maybe also for gang scheduling of threads (might face reluctance
> >       from the linux scheduler folks)
> >     - not really deeply outlined in this call
> > - QOM links must be allocated at boot time, but can be set later on
> > - nothing that we want to expose to users
> > - Machine provides QOM links that the device_add hotplug mechanism can
> >   use to add new CPUs into preallocated slots. "CPUs" can be groups of
> >   cores and/or threads.
> > - hotplug and initial config should use same semantics
> > - cpu and memory topology might be somewhat independent
> >   --> - define nodes
> >       - map CPUs to nodes
> >       - map memory to nodes
> > 
> > - hotplug per
> >   - socket
> >   - core
> >   - thread
> >   ?
> > Now comes the part where I am not sure if we came to a conclusion or not:
> > - hotplug/definition per core (but not per thread) seems to handle all cases
> >   - core might have multiple threads (and thus multiple cpustates)
> >   - as device statement (or object?)
> > - mapping of cpus to nodes or defining the topology not really
> >   outlined in this call
> > 
> > To be defined:
> > - QEMU command line for initial setup
> > - QEMU hmp/qmp interfaces for dynamic setup
> 
> So, I can't say I've entirely got my head around this, but here are my
> thoughts so far.
> 
> I think the basic problem here is that the fixed socket -> core ->
> thread hierarchy is something from x86 land that's become integrated
> into qemu's generic code where it doesn't entirely make sense.
> 
> Ignoring NUMA topology (I'll come back to that in a moment) qemu
> should really only care about two things:
> 
> a) the unit of execution scheduling (a vCPU or "thread")
> b) the unit of plug/unplug
> 
> Now, returning to NUMA topology. What the guest, and therefore qemu,
> really needs to know is the relative proximity of each thread to each
> block of memory. That usually forms some sort of node hierarchy, but
> it doesn't necessarily correspond to a socket->core->thread hierarchy
> you can see in physical units.
> 
> On Power, an arbitrary NUMA node hierarchy can be described in the
> device tree without reference to "cores" or "sockets", so really qemu
> has no business even talking about such units.
> 
> IIUC, on x86 the NUMA topology is bound up with the socket->core->thread
> hierarchy, so it needs to have a notion of those layers, but ideally
> that would be specific to the pc machine type.
> 
> So, here's what I'd propose:
> 
> 1) I think we really need some better terminology to refer to the unit
> of plug/unplug. Until someone comes up with something better, I'm
> going to use "CPU Module" (CM), to distinguish from the NUMA baggage
> of "socket" and also to refer more clearly to the thing that goes into
> the socket, rather than the socket itself.
> 
> 2) A Virtual CPU Module (vCM) need not correspond to a real physical
> object. For machine types which we want to faithfully represent a
> specific physical machine, it would. For generic or pure virtual
> machines, the vCMs would be as small as possible. So for current
> Power they'd be one virtual core; for future Power (maybe) or s390, a
> single virtual thread. For x86 I'm not sure what they'd be.
> 
> 3) I'm thinking we'd have a "cpus" virtual bus represented in QOM,
> which would contain the vCMs (also QOM objects). Their existence
> would be generic, though we'd almost certainly use arch and/or machine
> specific subtypes.
> 
> 4) There would be a (generic) way of finding the vCPUs (threads) in a
> vCM and the vCM for a specific vCPU.
> 
> 5) A vCM *might* have internal subdivisions into "cores" or "nodes" or
> "chips" or "MCMs" or whatever, but that would be up to the machine
> type specific code, and not represented in the QOM hierarchy.
> 
> 6) Obviously we'd need some backwards compat goo to sort out existing
> command line options referring to cores and sockets into the new
> representation. This will need machine type specific hooks - so for
> x86 it would need to set up the right vCM subdivisions and make sure
> the right NUMA topology info goes into ACPI. For -machine pseries I'm
> thinking that "-smp sockets=2,cores=1,threads=4" and "-smp
> sockets=1,cores=2,threads=4" should result in exactly the same thing
> internally.
> 
> Thoughts?
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

-- 
Eduardo