qemu-devel.nongnu.org archive mirror
From: David Gibson <david@gibson.dropbear.id.au>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>,
	peter.maydell@linaro.org, pkrempa@redhat.com, cohuck@redhat.com,
	qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Date: Thu, 19 Oct 2017 22:49:54 +1100
Message-ID: <20171019114954.GC13245@umbus>
In-Reply-To: <20171018202240.GD2942@localhost.localdomain>

On Wed, Oct 18, 2017 at 06:22:40PM -0200, Eduardo Habkost wrote:
> On Wed, Oct 18, 2017 at 04:30:10PM +0100, Daniel P. Berrange wrote:
> > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > 
> > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > >   
> > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > > > > > This series allows configuring NUMA mapping at runtime using the
> > > > > > > > QMP/HMP interface. To do that it introduces a new '-paused' CLI
> > > > > > > > option, which pauses QEMU before machine_init() is run, and adds
> > > > > > > > new set-numa-node HMP/QMP commands which, in conjunction with
> > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus, allow configuring
> > > > > > > > NUMA mapping for CPUs.
> > > > > > 
> > > > > > > What problem are we seeking to solve here compared to what we
> > > > > > > currently do for NUMA configuration?
> > > > > From RHBZ1382425
> > > > > "
> > > > > > The current -numa CLI interface is quite limited in how it allows
> > > > > > mapping CPUs to NUMA nodes, as it requires providing cpu_index values
> > > > > > which are non-obvious and depend on machine/arch. As a result libvirt
> > > > > > has to assume/re-implement the cpu_index allocation logic to provide
> > > > > > valid values for the -numa cpus=... QEMU CLI option.
> > > > 
> > > > In broad terms, this problem applies to every device / object libvirt
> > > > > asks QEMU to create. For everything else libvirt is able to assign an
> > > > > "id" string, which it can then use to identify the thing later. The
> > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > strings for each CPU - QEMU generates a pseudo-id internally which
> > > > libvirt has to infer. The latter is the same problem we had with
> > > > devices before '-device' was introduced allowing 'id' naming.
> > > > 
> > > > IMHO we should take the same approach with CPUs and start modelling 
> > > > the individual CPUs as something we can explicitly create with -object
> > > > or -device. That way libvirt can assign names and does not have to 
> > > > > care about CPU index values, and it all works just the same way as
> > > > > any other device / object we create.
> > > > 
> > > > ie instead of:
> > > > 
> > > >   -smp 8,sockets=4,cores=2,threads=1
> > > >   -numa node,nodeid=0,cpus=0-3
> > > >   -numa node,nodeid=1,cpus=4-7
> > > > 
> > > > we could do:
> > > > 
> > > >   -object numa-node,id=numa0
> > > >   -object numa-node,id=numa1
> > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > > > The follow-up question would be: where do "socket=3,core=1,thread=0"
> > > > come from? Currently these options are a function of
> > > > (-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
> > > > runtime after QEMU parses the -M and -smp options.
> > 
> > > NB, I realize my example was open to misinterpretation. The values I'm
> > > illustrating here for socket=3,core=1,thread=0 are *not* ID values; they
> > > are a plain enumeration of values, i.e. this is saying the 4th socket, the
> > > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > > with a core-id of 8, or 7038, or whatever architecture-specific numbering
> > > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > > level.
> 
> I believe we have been trying to avoid index numbers to identify
> entities as a reaction to the bad experience we had with the
> cpu_index/apic_id mess in the past.
> 
> An interface using arch-independent socket/core/thread indexes
> (not arch-dependent IDs) like you propose in the paragraph above
> could be a solution, as long as it is documented very clearly
> (and we include automated testing for those constraints).  But
> note that this is _not_ how the socket/core/thread IDs on the
> "-device *-cpu" and -numa command-line options work today.
> 
> Also, this might solve the problem for CPU socket/core/thread
> identification, but might not be enough for the messy device
> address assignment rules that libvirt needs to duplicate in
> src/qemu/qemu_domain_address.c today.
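
To make the distinction between plain ordinal indexes and arch-dependent
IDs concrete, here is a minimal Python sketch. It is only an illustration
under stated assumptions: enumerate_cpus() and to_arch_core_id() are
hypothetical helpers, not QEMU or libvirt functions, and the stride of 8
is made up to echo the "core-id of 8" example quoted above.

```python
# Hypothetical illustration only: none of these names exist in QEMU.
from itertools import product


def enumerate_cpus(sockets, cores, threads):
    """Return plain ordinal (socket, core, thread) tuples, socket-major."""
    return list(product(range(sockets), range(cores), range(threads)))


def to_arch_core_id(core_ordinal, stride=8):
    """Made-up mapping from a core ordinal to an arch-specific core-id."""
    return core_ordinal * stride


# Same shape as "-smp 8,sockets=4,cores=2,threads=1" in the quoted example.
cpus = enumerate_cpus(sockets=4, cores=2, threads=1)
last_socket, last_core, last_thread = cpus[-1]
```

The management app would only ever deal in the ordinals (4th socket, 2nd
core, 1st thread); a translation like to_arch_core_id() would stay
internal to the machine model.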

Note that describing socket/core/thread tuples as arch independent (or
even machine independent) is... debatable.  I mean it's flexible enough
that most platforms can be fit to that scheme without too much
straining.  But, there's no arch independent way of defining what each
level means in terms of its properties.

So, for example, on spapr - being paravirt - there's no real
distinction between cores and sockets, how you divide them up is
completely arbitrary.  I don't think we have any implemented, but it's
easy to imagine modelling a big server-type machine with more than 3
natural layers of hierarchy (say, thread, core, chip,
multi-chip-module, big-honkin-drawer-of-processors, ...).
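
As a rough sketch of why a fixed three-level tuple is only one possible
shape, the Python below treats the topology as an arbitrary list of
levels and flattens a per-level ordinal position into a linear index
(mixed-radix arithmetic).  The flatten() helper, the level names and the
counts are all assumptions made up for illustration; nothing here
reflects an existing QEMU interface.

```python
# Hypothetical sketch: a topology is a list of (level_name, count) pairs,
# outermost level first.  No such structure exists in QEMU under this name.

def flatten(levels, position):
    """Map per-level ordinals to a single linear index (mixed radix)."""
    if len(position) != len(levels):
        raise ValueError("position must have one ordinal per level")
    index = 0
    for (name, count), ordinal in zip(levels, position):
        if not 0 <= ordinal < count:
            raise ValueError(f"{name} ordinal {ordinal} out of range")
        index = index * count + ordinal
    return index


# The familiar three-level shape from the -smp example in this thread.
three_levels = [("socket", 4), ("core", 2), ("thread", 1)]

# An imagined deeper hierarchy; names and sizes are invented.
five_levels = [("drawer", 2), ("module", 2), ("chip", 2),
               ("core", 4), ("thread", 8)]

idx3 = flatten(three_levels, (3, 1, 0))
idx5 = flatten(five_levels, (1, 1, 1, 3, 7))
```

The same function handles both shapes, which is the point: the number of
levels and what each one means is a property of the machine, not
something an arch-independent interface can assume.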

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


