Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP

From: Eduardo Habkost <ehabkost@redhat.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>,
	peter maydell <peter.maydell@linaro.org>,
	pkrempa@redhat.com, cohuck@redhat.com, qemu-devel@nongnu.org,
	armbru@redhat.com, pbonzini@redhat.com,
	david@gibson.dropbear.id.au, Laine Stump <laine@redhat.com>,
	libvir-list@redhat.com
Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Date: Thu, 19 Oct 2017 17:56:49 -0200
Message-ID: <20171019195649.GJ2942@localhost.localdomain>
In-Reply-To: <20171019152859.GV8408@redhat.com>

On Thu, Oct 19, 2017 at 04:28:59PM +0100, Daniel P. Berrange wrote:
> On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:
> > ----- Original Message -----
> > > From: "Daniel P. Berrange" <berrange@redhat.com>
> > > To: "Igor Mammedov" <imammedo@redhat.com>
> > > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > > 
> > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > 
> > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > >   
> > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > > > > > > > This series allows configuring the NUMA mapping at runtime using the
> > > > > > > > QMP/HMP interface. For that to happen, it introduces a new '-paused'
> > > > > > > > CLI option, which allows pausing QEMU before machine_init() is run, and
> > > > > > > > adds new set-numa-node HMP/QMP commands which, in conjunction with
> > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus, allow configuring the
> > > > > > > > NUMA mapping for CPUs.
> > > > > > > 
> > > > > > > What problem are we seeking to solve here, compared to what we
> > > > > > > currently do for NUMA configuration?
> > > > > > From RHBZ1382425:
> > > > > > "
> > > > > > The current -numa CLI interface is quite limited in how it allows mapping
> > > > > > CPUs to NUMA nodes, as it requires providing cpu_index values, which
> > > > > > are non-obvious and depend on machine/arch. As a result, libvirt has to
> > > > > > assume/re-implement the cpu_index allocation logic to provide valid
> > > > > > values for the -numa cpus=... QEMU CLI option.
> > > > > > "
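To make the inference problem concrete, here is a minimal Python sketch of what a management layer ends up re-implementing. It assumes an x86-style linear numbering, cpu_index = (socket * cores + core) * threads + thread; that formula is an assumption for illustration only, other machine types may number CPUs differently (which is exactly the complaint above). The helper names `cpu_index` and `numa_node_args` are hypothetical.

```python
def cpu_index(socket: int, core: int, thread: int, cores: int, threads: int) -> int:
    # ASSUMPTION: x86-style linear numbering; not guaranteed for other machines.
    return (socket * cores + core) * threads + thread


def numa_node_args(sockets: int, cores: int, threads: int,
                   socket_to_node: dict) -> list:
    """Build '-numa node,nodeid=N,cpus=A-B' strings from a socket->node map."""
    per_node = {}
    for s in range(sockets):
        for c in range(cores):
            for t in range(threads):
                per_node.setdefault(socket_to_node[s], []).append(
                    cpu_index(s, c, t, cores, threads))
    args = []
    for node in sorted(per_node):
        ids = sorted(per_node[node])
        # Keep the sketch simple: accept only one contiguous range per node.
        if ids != list(range(ids[0], ids[-1] + 1)):
            raise ValueError(f"node {node} has non-contiguous cpu_index values")
        args.append(f"-numa node,nodeid={node},cpus={ids[0]}-{ids[-1]}")
    return args


# -smp 8,sockets=4,cores=2,threads=1 with sockets 0-1 on node 0, 2-3 on node 1
print(numa_node_args(4, 2, 1, {0: 0, 1: 0, 2: 1, 3: 1}))
```

Under that assumption this yields `cpus=0-3` for node 0 and `cpus=4-7` for node 1; if the machine numbers CPUs any other way, the generated ranges are silently wrong.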
> > > > > 
> > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > asks QEMU to create. For everything else, libvirt is able to assign an
> > > > > "id" string, which it can then use to identify the thing later. The
> > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > strings for each CPU - QEMU generates a pseudo-id internally which
> > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > devices before '-device' was introduced, allowing 'id' naming.
> > > > > 
> > > > > IMHO we should take the same approach with CPUs and start modelling
> > > > > the individual CPUs as something we can explicitly create with -object
> > > > > or -device. That way libvirt can assign names and does not have to
> > > > > care about CPU index values, and it all works just the same way as
> > > > > any other device / object we create.
> > > > > 
> > > > > i.e. instead of:
> > > > > 
> > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > >   -numa node,nodeid=0,cpus=0-3
> > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > 
> > > > > we could do:
> > > > > 
> > > > >   -object numa-node,id=numa0
> > > > >   -object numa-node,id=numa1
> > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
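A minimal sketch of how a management app could generate the lines above from a plain topology. It assumes the proposal-only `numa-node` and `cpu` object types (hypothetical here, not options QEMU is assumed to accept) and treats socket/core/thread as plain ordinals rather than architecture-specific IDs; `proposed_cpu_objects` is a hypothetical helper.

```python
def proposed_cpu_objects(sockets: int, cores: int, threads: int,
                         socket_to_node: dict) -> list:
    """Emit '-object numa-node' / '-object cpu' lines for the proposal.

    NOTE: 'numa-node' and 'cpu' are the hypothetical object types from the
    proposal. socket/core/thread are plain ordinals (0..N-1), not
    architecture-specific IDs.
    """
    lines = [f"-object numa-node,id=numa{n}"
             for n in sorted(set(socket_to_node.values()))]
    cpu = 0
    for s in range(sockets):
        for c in range(cores):
            for t in range(threads):
                lines.append(f"-object cpu,id=cpu{cpu},node=numa{socket_to_node[s]},"
                             f"socket={s},core={c},thread={t}")
                cpu += 1  # mgmt-assigned id, independent of any cpu_index
    return lines


for line in proposed_cpu_objects(4, 2, 1, {0: 0, 1: 0, 2: 1, 3: 1}):
    print(line)
```

With sockets=4, cores=2, threads=1 and sockets 0-1 on numa0, 2-3 on numa1, this reproduces the ten lines of the example; the ids are chosen by mgmt, so nothing has to be inferred from QEMU.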
> > > > The follow-up question would be: where do "socket=3,core=1,thread=0"
> > > > come from? Currently these options are a function of
> > > > (-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
> > > > runtime, after QEMU parses the -M and -smp options.
> > > 
> > > NB, I realize my example was open to misinterpretation. The values I'm
> > > illustrating here for socket=3,core=1,thread=0 are *not* ID values; they
> > > are a plain enumeration of values, i.e. this is saying the 4th socket, the
> > > 2nd core and the 1st thread. Internally QEMU might have the 2nd core
> > > with a core-id of 8, or 7038, or whatever architecture-specific numbering
> > > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > > level.
> > Even though the simplicity of fixed properties/values is tempting, and it might
> > even work for what we have currently implemented in QEMU (well, SPAPR would need
> > refactoring (if possible) to meet the requirements, plus compat handling for
> > current machines with sparse IDs), I have to disagree here and oppose it.
> > 
> > QEMU models concrete platforms/hw with certain non-abstract properties,
> > and it's libvirt's domain to translate platform-specific devices into
> > 'spherical' devices with abstract properties.
> > 
> > Now back to CPUs and the suggestion to fix the set of 'address' properties
> > and their values into a continuous enumeration range [0..N). That would:
> >   1. put the burden of hiding platform/device details on QEMU
> >       (which is already bad, as QEMU's job is to emulate it)
> >   2. with abstract 'address' properties and values, the user won't have
> >      a clue as to where a device is being attached (as QEMU would magically
> >      remap that to fit specific machine needs)
> >   2.1. with abstract 'address' properties and values, we can do away with
> >      socket/core/thread/whatnot, since they won't mean the same thing when
> >      considered from the platform point of view, so we can just drop all this
> >      nonsense and go back to cpu-index, which has all the properties you've
> >      suggested /abstract, [0..N)/.
> >   3. we currently settled on socket|core|thread-id properties as they are
> >      applicable to machines that support -device cpu, but it's up to the machine
> >      to pick which of these to use (x86: uses all, spapr: uses core-id only),
> >      and the current property set is open for extension if the need arises,
> >      without the need to redefine the interface. So a fixed list of properties
> >      [even ignoring the impact on values] doesn't scale.
> 
> Note that from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
> guest XML; we just provide an overall count of sockets/cores/threads, which is
> portable. The only arch-specific thing we would have to do is express constraints
> about the ratios of these - e.g. indicate in some way that ppc doesn't allow multiple
> threads per core.
> 
> > We even have the cpu-add command, which takes cpu-index as an argument, and the
> > -numa node,cpus=0..X CLI option; good luck figuring out which CPU goes
> > where and whether it makes any sense from the platform point of view.
> > 
> > That's why, when designing hotplug for the 'device_add cpu' interface, we ended up
> > with the new query-hotpluggable-cpus QMP command, which is currently used by libvirt
> > for hotplug:
> > 
> > This approach allows:
> >    1: the machine to publish properties/values that make sense from the emulated
> >       platform's point of view but are still understandable by a user of the given hw.
> >    2: the user to use them as opaque mandatory properties to create a CPU device if
> >       he/she doesn't care about where it's plugged.
> >    3: if the user cares about which CPU goes where, the properties defined by the
> >       machine provide that info from the emulated hw point of view, including
> >       platform-specific details.
> >    4: easy extension of the set of properties/values if the need arises, without
> >       breaking users (provided the user puts them all in the -device/device_add
> >       options, as it's supposed to).
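A minimal sketch of point 2, assuming the reply shape of the query-hotpluggable-cpus QMP command (a list of entries with "type", "vcpus-count", "props", plus "qom-path" for CPUs that are already plugged). The machine-defined props are copied verbatim into device_add arguments, or into an equivalent -device option, so new properties pass through without code changes. The sample reply is illustrative rather than captured from a real QEMU, and the id naming scheme is hypothetical.

```python
def device_add_commands(reply: dict, id_prefix: str = "cpu") -> list:
    """Build device_add QMP commands for every not-yet-plugged CPU slot.

    reply: parsed query-hotpluggable-cpus response.
    id_prefix: hypothetical mgmt-chosen naming scheme for device ids.
    """
    cmds = []
    for n, slot in enumerate(reply["return"]):
        if "qom-path" in slot:          # slot already populated, skip it
            continue
        args = {"driver": slot["type"], "id": f"{id_prefix}{n}"}
        args.update(slot["props"])      # opaque, machine-defined address props
        cmds.append({"execute": "device_add", "arguments": args})
    return cmds


def to_device_option(cmd: dict) -> str:
    """Render one device_add command as the equivalent -device CLI option."""
    args = dict(cmd["arguments"])       # copy, so the command is not mutated
    driver = args.pop("driver")
    return "-device " + ",".join([driver] + [f"{k}={v}" for k, v in args.items()])


# Illustrative reply (not captured from a real QEMU run).
sample_reply = {"return": [
    {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
     "props": {"socket-id": 1, "core-id": 0, "thread-id": 0}},
    {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
     "props": {"socket-id": 0, "core-id": 0, "thread-id": 0},
     "qom-path": "/machine/unattached/device[0]"},
]}

for cmd in device_add_commands(sample_reply):
    print(to_device_option(cmd))
```

Because the props are never interpreted, the same code keeps working if a machine starts publishing, say, a node-id property.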
> > 
> > But the current approach has a drawback: to call query-hotpluggable-cpus, the machine
> > has to be started first, which is fine for hotplug but not for specifying CLI options.
> > 
> > Currently that could be solved by starting QEMU twice when 'defining a domain',
> > where on the first run mgmt queries the board layout and caches it for all
> > subsequent starts of the defined machine (a change in machine/version/-smp/-cpu
> > invalidates the cache).
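A minimal sketch of the mgmt-side cache described here, assuming a hypothetical probe function that starts QEMU once and returns the board layout. Keying the cache on (machine type/version, -smp, -cpu) means any change in those produces a cache miss, which is the invalidation rule stated above; `LayoutKey`, `LayoutCache` and `fake_probe` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class LayoutKey:
    machine: str  # versioned machine type, e.g. "pc-i440fx-2.10"
    smp: str      # the -smp argument string
    cpu: str      # the -cpu model


class LayoutCache:
    """Hypothetical cache of board layouts probed by a first QEMU run."""

    def __init__(self, probe: Callable[[LayoutKey], list]):
        self._probe = probe  # assumed to start QEMU once and query the layout
        self._entries: Dict[LayoutKey, list] = {}

    def get(self, key: LayoutKey) -> list:
        # Any change to machine/version, -smp or -cpu yields a new key,
        # so a stale entry is simply never hit again.
        if key not in self._entries:
            self._entries[key] = self._probe(key)
        return self._entries[key]


probes = []


def fake_probe(key: LayoutKey) -> list:
    # Stand-in for "start QEMU and run query-hotpluggable-cpus".
    probes.append(key)
    return [f"layout for -smp {key.smp}"]


cache = LayoutCache(fake_probe)
cache.get(LayoutKey("pc-i440fx-2.10", "8,sockets=4,cores=2,threads=1", "qemu64"))
cache.get(LayoutKey("pc-i440fx-2.10", "8,sockets=4,cores=2,threads=1", "qemu64"))
cache.get(LayoutKey("pc-i440fx-2.10", "8,sockets=2,cores=4,threads=1", "qemu64"))
print(len(probes))  # 2: the repeated key was served from the cache
```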
> > 
> > This series allows avoiding that first-time restart: when creating a domain for
> > the first time, mgmt can query the layout and then specify the NUMA mapping without
> > restarting. It can cache the defined mapping, as the commands exactly match the
> > corresponding CLI options, and reuse the cached options on subsequent domain starts.
> > 
> > This approach could be extended further with the "device_add cpu" command,
> > so it would be possible to start QEMU with -smp 0,... and allow mgmt to
> > create CPUs with explicit IDs controlled by mgmt; again, mgmt may cache
> > these commands and reuse them on the CLI the next time the machine is started.
> > 
> > I think Eduardo's work on query-slots is a superset of query-hotpluggable-cpus,
> > but working towards the same goal: allowing mgmt to discover which hw is provided
> > by a specific machine and where/which hw could be plugged (like which slot supports
> > which kind of device and which 'address' should be used to attach a device
> > (socket|core... for CPUs, bus/function for PCI, ...)).
> 
> As mentioned elsewhere in the thread, the approach of defining the VM config
> incrementally via the monitor has significant downsides: it makes the config
> invisible in any logs of the ARGV, and it likely has a performance impact when
> starting up QEMU, particularly if it is used for more things going forward. To
> me these downsides are enough to make the suggested approach for CPUs impractical
> for libvirt to use.

Those downsides do exist, but we should weigh them against the
downsides of not allowing any information at all to flow from
QEMU to libvirt when starting a VM.

I believe the code in libvirt/src/qemu/qemu_domain_address.c is
a good illustration of those downsides.

-- 
Eduardo
