qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Daniel P. Berrange" <berrange@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
	peter.maydell@linaro.org, pkrempa@redhat.com, cohuck@redhat.com,
	armbru@redhat.com, qemu-devel@nongnu.org, pbonzini@redhat.com,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
Date: Mon, 23 Oct 2017 11:49:44 +0100	[thread overview]
Message-ID: <20171023104944.GH16472@redhat.com> (raw)
In-Reply-To: <20171023123620.64c5263c@nial.brq.redhat.com>

On Mon, Oct 23, 2017 at 12:36:20PM +0200, Igor Mammedov wrote:
> On Mon, 23 Oct 2017 10:53:16 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Mon, Oct 23, 2017 at 11:49:13AM +0200, Igor Mammedov wrote:
> > > On Fri, 20 Oct 2017 12:21:00 -0200
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > >   
> > > > On Fri, Oct 20, 2017 at 12:19:17PM +1100, David Gibson wrote:  
> > > > > On Thu, Oct 19, 2017 at 10:15:48PM -0200, Eduardo Habkost wrote:    
> > > > > > On Thu, Oct 19, 2017 at 09:42:18PM +1100, David Gibson wrote:    
> > > > > > > On Mon, Oct 16, 2017 at 02:59:16PM -0200, Eduardo Habkost wrote:    
> > > > > > > > On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:    
> > > > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > ---
> > > > > > > > >  include/sysemu/sysemu.h |  1 +
> > > > > > > > >  qemu-options.hx         | 15 ++++++++++++++
> > > > > > > > >  qmp.c                   |  5 +++++
> > > > > > > > >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > > > >  4 files changed, 74 insertions(+), 1 deletion(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > > > > > > > > index b213696..3feb94f 100644
> > > > > > > > > --- a/include/sysemu/sysemu.h
> > > > > > > > > +++ b/include/sysemu/sysemu.h
> > > > > > > > > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> > > > > > > > >      QEMU_WAKEUP_REASON_OTHER,
> > > > > > > > >  } WakeupReason;
> > > > > > > > >  
> > > > > > > > > +void qemu_exit_preconfig_request(void);
> > > > > > > > >  void qemu_system_reset_request(ShutdownCause reason);
> > > > > > > > >  void qemu_system_suspend_request(void);
> > > > > > > > >  void qemu_register_suspend_notifier(Notifier *notifier);
> > > > > > > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > > > > > > index 39225ae..bd44db8 100644
> > > > > > > > > --- a/qemu-options.hx
> > > > > > > > > +++ b/qemu-options.hx
> > > > > > > > > @@ -3498,6 +3498,21 @@ STEXI
> > > > > > > > >  Run the emulation in single step mode.
> > > > > > > > >  ETEXI
> > > > > > > > >  
> > > > > > > > > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > > > > > > > > +    "-paused [state=]postconf|preconf\n"
> > > > > > > > > +    "                postconf: pause QEMU after machine is initialized\n"
> > > > > > > > > +    "                preconf: pause QEMU before machine is initialized\n",
> > > > > > > > > +    QEMU_ARCH_ALL)    
> > > > > > > > 
> > > > > > > > I would like to allow pausing before machine-type is selected, so
> > > > > > > > management could run query-machines before choosing a
> > > > > > > > machine-type.  Would that need a third "-pause" mode, or will we
> > > > > > > > be able to change "preconf" to pause before select_machine() is
> > > > > > > > called?
> > > > > > > > 
> > > > > > > > The same probably applies to other things initialized before
> > > > > > > > machine_run_board_init() that could be configurable using QMP,
> > > > > > > > including but not limited to:
> > > > > > > > * Accelerator configuration
> > > > > > > > * Registering global properties
> > > > > > > > * RAM size
> > > > > > > > * SMP/CPU configuration    
> > > > > > > 
> > > > > > > Yeah.. having a bunch of different possible pause stages to select
> > > > > > > doesn't sound great.    
> > > > > > 
> > > > > > I agree.  The number of externally visible pause states should be
> > > > > > as small as possible.
> > > > > > 
> > > > > >     
> > > > > > >                       Could we avoid this by instead changing -S to
> > > > > > > pause at the earliest possible spot, but having any monitor commands
> > > > > > > that require a later stage automatically "fast forwarding" to the
> > > > > > > right phase?    
> > > > > > 
> > > > > > That would hide the internal details from the outside.  Sounds
> > > > > > nice, but adding new machine/device configuration QMP commands
> > > > > > while hiding the QEMU state from the outside sounds impossible.
> > > > > > 
> > > > > > For example, if we use -S today, this works:
> > > > > > 
> > > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}    
> > > > > >   -> {"execute":"qmp_capabilities"}    
> > > > > >   <- {"return": {}}    
> > > > > >   -> {"execute":"query-cpus"}    
> > > > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
> > > > > > 
> > > > > > This means "query-cpus" needs to fast-forward to the CPU creation
> > > > > > stage if we want to keep compatibility.
> > > > > > 
> > > > > > Now, assume we add a set-numa-node command like the one in this
> > > > > > series.  e.g.:
> > > > > > 
> > > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}    
> > > > > >   -> {"execute":"qmp_capabilities"}    
> > > > > >   <- {"return": {}}    
> > > > > >   -> {"execute":"set-numa-node" ... }    
> > > > > >   <- {"return": ...}
> > > > > > 
> > > > > > The command will work only if machine initialization didn't run
> > > > > > yet.
> > > > > > 
> > > > > > But now an innocent-looking query command would change QEMU state
> > > > > > in an unexpected way:
> > > > > > 
> > > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}    
> > > > > >   -> {"execute":"qmp_capabilities"}    
> > > > > >   <- {"return": {}}    
> > > > > >   -> {"execute":"query-cpus"}  [will silently fast-forward QEMU state]    
> > > > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}    
> > > > > >   -> {"execute":"set-numa-node" ... }    
> > > > > >   <- {"error": ...}  [the command will fail because the machine was already created]
> > > > > > 
> > > > > > This means we do have a externally visible "too late to use
> > > > > > set-numa-node" QEMU state, and query-cpus will have a externally
> > > > > > visible side effect.  Every QMP command would need to document
> > > > > > how it affects QEMU state in a externally visible way.
> > > > > > 
> > > > > > If QEMU pause state is still going to be externally visible this
> > > > > > way, I would prefer to let the client to explicitly tell what's
> > > > > > the state they want QEMU to be, instead of making QEMU change
> > > > > > state silently as a side effect of QMP commands.    
> > > > > 
> > > > > Yeah, good point.  My proposal would just have changed explicitly
> > > > > exposed ugly internal state to subtly exposed ugly internal state,
> > > > > which is probably worse :(.
> > > > > 
> > > > > 
> > > > > Ok.. next possibly bad idea..
> > > > > 
> > > > > What about a "re-exec" monitor command; it would take what's
> > > > > essentially a new command line, and basically restart qemu from the
> > > > > beginning, reparsing this new command line, but without actually 
> > > > > 
> > > > > Pro:
> > > > >   * Mitigates Daniel Berrange's concern about lots of qemu
> > > > >     configuration being buried in the qmp session - if libvirt logged
> > > > >     its last "re-exec" that would have what is generally needed.
> > > > >   * Lets libvirt do assorted investigation of options, then rewind to
> > > > >     choose what it actually wants    
> > > > 
> > > > Sounds like a superset of Paolo's "-machine none" proposal[1].
> > > > It would be a very simple interface, not sure it can be easily
> > > > implemented efficiently.
> > > > 
> > > > [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg488618.html
> > > >   
> > > > > 
> > > > > Con:
> > > > >   * Would require a bunch of auditing of structures/state to make sure
> > > > >     they can be re-initialized cleanly    
> > > > 
> > > > This sounds like a big obstacle.  QEMU still have too much global
> > > > state outside the machine/qdev tree.
> > > > 
> > > >   
> > > > >   * Would it be fast enough for libvirt to use?  Do we know if the
> > > > >     slowness which makes multiple qemu invocations by libvirt
> > > > >     unattractive is from the kernel/libc/ldso overhead, or from qemu's
> > > > >     internal start up processing?    
> > > > 
> > > > My gut feeling is that this could be too slow, if the scope of
> > > > "re-exec" is too big.
> > > > 
> > > > 
> > > > Now, let me try to go to the opposite extreme: I think you had a
> > > > good point in your previous proposal.  Why should we need to
> > > > restart/re-execute anything at all just because some bit of
> > > > configuration is being changed by libvirt?  Why commands like
> > > > set-numa-node should require QEMU to be in a state that is not
> > > > covered by -S?  If the guest is not running yet, there should be
> > > > no reason to require clients to explicitly pause/continue/restart
> > > > anything.  
> > > It's probably doable to do numa config at '-S' time for x86 (arm),
> > > since ACPI tables are regenerated on the first read (legacy fw_cfg
> > > would be a little problematic but probably could be 'fixed' as well)
> > > 
> > > But I can't say outright if it's doable for other targets,
> > > in general issue here is that '-S' pauses after machine_done is run
> > > and all necessary wiring board requires is finalized by then
> > > and no hooks run after unpause.
> > > If there is a general consensus to go this route, I can invest
> > > some time in making it work (then this series could be dropped)
> > > 
> > > Even so, postponing set-numa to '-S' won't address Daniel's concern,
> > > i.e. configuration would take several round trips of command to complete
> > > potentially oven slow network. But as it was said libvirt can cache
> > > new CLI options for further reuse.  
> > 
> > We can cache stuff from the generic "-m none" invokation, but we won't
> > cache stuff from invokation of a specific VM instance, because we can't
> > have confidence that such data is independant of the VM config. So we
> In case if cpu layout we have fixed set of options that influence it
> (-M foo_vXX -smp ...),  so from QEMU side it should be possible to
> promise it would stay stable.
> But such caching would be useful in other use cases as well.
> Is the issue in invalidating cached data in case of option(s) would
> change cached data?

For the caching to be useful, we need to have a good cache hit rate.
If the cache depends on alot of different CLI args, then you're going
to have to populate many caches each with low hit rate. The current
caching is done based on QEMU/libvirtd binary, so we have 1 cache miss
when QEMU or libvirt are upgraded, then 100% cache hit thereafter, so
the cache is very effective.

> > would likely just end up hardcoding the arch specific data in libvirt if
> > that was all QEMU provided.
> Another insane idea is to make algorithm introspectable, i.e.
> publish per machine code that would be used by both, mgmt and qemu
> to compute layout, for example in python. It's probably not issue
> for libvirt but qemu will have to embed python to make shared
> algorithm work. Not sure if it's acceptable from qemu pov.

That's not going to fly - we definitely cannot assume apps want or can
run python code. Libvirt has major users in many programming languages,
including Python, C, Go, Java, Vala and more.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

  reply	other threads:[~2017-10-23 10:49 UTC|newest]

Thread overview: 93+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-16 16:22 [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Igor Mammedov
2017-10-16 16:22 ` [Qemu-devel] [RFC 1/6] numa: postpone options post-processing till machine_run_board_init() Igor Mammedov
2017-10-17  5:49   ` David Gibson
2017-10-16 16:22 ` [Qemu-devel] [RFC 2/6] numa: split out NumaOptions parsing into parse_NumaOptions() Igor Mammedov
2017-10-18  3:27   ` David Gibson
2017-10-18 14:53     ` Eric Blake
2017-10-16 16:22 ` [Qemu-devel] [RFC 3/6] possible_cpus: add CPUArchId::type field Igor Mammedov
2017-10-18 11:12   ` [Qemu-devel] [RFC v2 " Igor Mammedov
2017-10-19  6:31     ` David Gibson
2017-10-31 14:01       ` Igor Mammedov
2017-11-06 18:02         ` Eduardo Habkost
2017-11-07 15:04           ` Cornelia Huck
2017-11-09  6:58             ` David Gibson
2017-11-09 20:02               ` Eduardo Habkost
2017-11-10 10:14                 ` Cornelia Huck
2017-11-10 12:34                   ` David Hildenbrand
2017-11-10 12:58                     ` Eduardo Habkost
2017-11-10 13:07                       ` David Hildenbrand
2017-11-21 14:02                 ` Igor Mammedov
2017-11-09  6:53           ` David Gibson
2017-10-16 16:22 ` [Qemu-devel] [RFC 4/6] CLI: add -paused option Igor Mammedov
2017-10-16 16:35   ` Daniel P. Berrange
2017-10-17  8:17     ` Igor Mammedov
2017-10-17 10:56       ` Laszlo Ersek
2017-10-17 11:11         ` Peter Krempa
2017-10-20 15:38     ` Eduardo Habkost
2017-10-16 16:59   ` Eduardo Habkost
2017-10-16 17:01     ` Paolo Bonzini
2017-10-16 17:17       ` Eduardo Habkost
2017-10-17  8:47         ` Paolo Bonzini
2017-10-17  9:25           ` Igor Mammedov
2017-10-17 14:48       ` Daniel P. Berrange
2017-10-17 15:21         ` Laszlo Ersek
2017-10-17 15:35           ` Daniel P. Berrange
2017-10-17 15:42             ` Laszlo Ersek
2017-10-17 15:47               ` Daniel P. Berrange
2017-10-17 15:47             ` Igor Mammedov
2017-10-17 15:52               ` Daniel P. Berrange
2017-10-17  9:10     ` Igor Mammedov
2017-10-19 10:42     ` David Gibson
2017-10-20  0:15       ` Eduardo Habkost
2017-10-20  1:19         ` David Gibson
2017-10-20 14:21           ` Eduardo Habkost
2017-10-23  9:49             ` Igor Mammedov
2017-10-23  9:53               ` Daniel P. Berrange
2017-10-23 10:36                 ` Igor Mammedov
2017-10-23 10:49                   ` Daniel P. Berrange [this message]
2017-10-23 11:18                     ` Igor Mammedov
2017-10-25 10:52                       ` Eduardo Habkost
2017-10-25 10:35               ` Eduardo Habkost
2017-10-23  9:30         ` Alex Bennée
2017-10-16 16:22 ` [Qemu-devel] [RFC 5/6] HMP: add set-numa-node command Igor Mammedov
2017-10-16 16:22 ` [Qemu-devel] [RFC 6/6] QMP: " Igor Mammedov
2017-10-16 16:36 ` [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Daniel P. Berrange
2017-10-16 17:05   ` Eduardo Habkost
2017-10-17  7:27   ` Igor Mammedov
2017-10-17 15:07     ` Daniel P. Berrange
2017-10-17 15:24       ` Laszlo Ersek
2017-10-17 16:06       ` Igor Mammedov
2017-10-17 16:09         ` Daniel P. Berrange
2017-10-17 16:18           ` Igor Mammedov
2017-10-18 12:59             ` Eduardo Habkost
2017-10-18 14:44               ` Igor Mammedov
2017-10-18 14:49                 ` Daniel P. Berrange
2017-10-18 15:24                   ` Igor Mammedov
2017-10-18 15:27                     ` Daniel P. Berrange
2017-10-18 20:11                       ` Eduardo Habkost
2017-10-18 15:30         ` Daniel P. Berrange
2017-10-18 20:22           ` Eduardo Habkost
2017-10-19 11:49             ` David Gibson
2017-10-19 12:23               ` Paolo Bonzini
2017-10-20  1:21                 ` David Gibson
2017-10-20 19:53                   ` Eduardo Habkost
2017-10-23  8:17                     ` Igor Mammedov
2017-10-23  8:45                     ` Igor Mammedov
2017-10-25  6:57                       ` Eduardo Habkost
2017-10-25  7:02                         ` Daniel P. Berrange
2017-10-25 13:37                           ` Eduardo Habkost
2017-10-19 15:21           ` Igor Mammedov
2017-10-19 15:28             ` Daniel P. Berrange
2017-10-19 19:56               ` Eduardo Habkost
2017-10-20  9:07                 ` Daniel P. Berrange
2017-10-20 20:07                   ` Eduardo Habkost
2017-10-23  8:53                     ` Igor Mammedov
2017-10-23 10:04                   ` Igor Mammedov
2017-10-23 10:19                     ` Daniel P. Berrange
2017-10-18 12:19       ` Paolo Bonzini
2017-10-18 12:27         ` Daniel P. Berrange
2017-10-18 12:33           ` Paolo Bonzini
2017-10-18 14:26             ` Igor Mammedov
2017-10-18 14:29               ` Paolo Bonzini
2017-10-18 14:54                 ` Igor Mammedov
2017-10-18 14:21           ` Igor Mammedov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171023104944.GH16472@redhat.com \
    --to=berrange@redhat.com \
    --cc=armbru@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=pkrempa@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).