Dynamic & heterogeneous machines, initial configuration: problems

From: Markus Armbruster @ 2024-01-31 20:14 UTC
  To: qemu-devel
  Cc: Philippe Mathieu-Daudé, Brian Cain, Warner Losh, Luc Michel,
	Bernhard Beschow, Paul Walmsley, Alessandro Di Federico,
	Mark Burton, Cédric Le Goater, Daniel P. Berrangé,
	Edgar E. Iglesias, LIU Zhiwei, Dr. David Alan Gilbert,
	Paolo Bonzini, Eduardo Habkost, Jim Shu, Richard Henderson,
	Alistair Francis, Alex Bennée, Anton Johansson

This memo is the fruit of discussions with Philippe Mathieu-Daudé.
Its errors are mine.

QEMU defines machines statically in C code.  We've long wished we could
define them dynamically in some suitable DSL.  This is what we call
"dynamic machines".

There's a need for machines that contain more than one target's CPUs.
This is what we call "heterogeneous machines".  They require a single
binary capable of emulating any of the targets involved.

There's substantial overlap with a seemingly unrelated problem:
machine-friendly initial configuration.

To keep the memo's length in check (sort of), it focuses on (known)
problems.

= Problem 1: Initial configuration =

Previously discussed in

    Subject: Redesign of QEMU startup & initial configuration
    Date: Thu, 02 Dec 2021 07:57:38 +0100
    Message-ID: <87lf13cx3x.fsf@dusky.pond.sub.org>

== What users want for initial configuration ==

1. QMP only

   Management applications need to use QMP for monitoring anyway.  They
   may want to use it for initial configuration, too.  Libvirt does.

   They still need to bootstrap a QMP monitor, and for that, CLI is fine
   as long as it's simple and stable.

2. CLI and configuration files

   Human users want a CLI and configuration files.

   CLI is good for quick tweaks, and to explore.

   For more permanent, non-trivial configuration, configuration files
   are more suitable, because they are easier to read, edit, and
   document than long command lines.

== What we have for initial configuration ==

Half of 1. and half of 2., satisfying nobody's needs.

Management applications need to do a lot of non-trivial initial
configuration with the CLI.

Human users struggle with inconsistent syntax, insufficiently expressive
configuration files, and huge command lines.

= Problem 2: Defining machines =

This is how I understand the problem.  Please correct me where I'm off.

== How we'd like to build machines ==

We want to build machines declaratively, by configuring devices and
their connections.

We want to build composite devices the same way.

The non-composite devices are provided by the QEMU binary.

Users want to build machines as variations of canned machine types
shipped with QEMU.  Some users may want to build their own machines from
scratch.

To enable all this, machine configuration needs to be composable and
dynamic.

Composable means configuration can be assembled from components,
recursively.

Dynamic means it can be done during qemu-system-FOO initial
configuration.

== What we have for defining machines ==

A QEMU binary provides a fixed set of device types, some of them
composite, and a fixed set of machine types.

Machines are QOM objects: instances of a concrete subtype of "machine".

Devices are usually QOM objects: instances of a concrete subtype of
"device".  Exceptions remain in old code nobody can be bothered to
update.

Both machine types and composite devices are built from devices
by code, i.e. imperatively, not declaratively.

The code can be parameterized.  For QOM objects, parameters should be
QOM properties, but machine type code additionally uses global old-style
configuration such as -drive and -serial.

Code may create default backends for convenience.  Machine type code may
also create backends to honor global old-style configuration.  Only some
backends are QOM objects.

Machine types split their code between object creation (QOM methods
.instance_init() and .instance_post_init()) and machine initialization
(MachineClass method .init()).  However, basically everything is done in
the latter.

QOM device types split their code between object creation and device
realization (qdev method .realize()).  The actual split varies widely
between devices.  Developers are commonly unsure what to put where.

After machine type code is done, the resulting machine can still be
adjusted with device cold plug and unplug: -device, device_add,
device_del.  Only works for a subset of the devices.

Related, but out of scope here: hot plug and unplug.

= Common sub-problem: qemu-system initial startup =

QAPI/QMP is our most capable, flexible, and mature configuration
interface.  We need to offer machine-friendly initial configuration via
QMP, and we'd very much like to have a QAPI-based CLI and configuration
files (see "What users want for initial configuration" above).

Dynamic machine configuration happens during initial startup.  This
makes it part of the larger initial configuration problem.  We want an
integrated solution for the larger configuration problem that includes
machine configuration.

Traditionally, QMP becomes available quite late, long after machine
initialization.  This precludes use of QMP for most parts of initial
configuration, including dynamic machine configuration.

To enable a bit of machine configuration via QMP, experimental CLI
option -preconfig delays part of initial startup including machine
initialization by moving it into QMP command x-exit-preconfig.  Only
selected commands (the ones marked 'allow-preconfig': true) are
available before x-exit-preconfig.
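
As an illustration of the -preconfig flow, here is a small Python sketch
that only builds the QMP messages a client might send; it does not talk
to QEMU.  The choice of set-numa-node as the preconfig-time command and
its argument values are assumptions made for the example.

```python
import json

def preconfig_session(numa_nodes):
    """Build the JSON lines for a hypothetical -preconfig session."""
    msgs = [{"execute": "qmp_capabilities"}]
    for nodeid in range(numa_nodes):
        # set-numa-node is marked 'allow-preconfig': true, so it is
        # accepted before x-exit-preconfig.  Argument values are
        # illustrative only.
        msgs.append({"execute": "set-numa-node",
                     "arguments": {"type": "node", "nodeid": nodeid}})
    # Leave preconfig state; machine initialization runs after this.
    msgs.append({"execute": "x-exit-preconfig"})
    return [json.dumps(m) for m in msgs]

session = preconfig_session(2)
for line in session:
    print(line)
```

Anything not marked 'allow-preconfig' would have to wait until after the
last message, which is the limitation described above.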

To enable arbitrary configuration via QMP, we need to make it available
before we complete configuring anything.

QMP is a concrete transport for an abstract interface.  Configuration
files could be another transport.  CLI, too.

= Problem 3: Loadable modules =

QOM wasn't designed for loadable modules.  Support for them was grafted
on, and there are serious deficiencies.

Building a loadable module results in a DSO.  Additionally, module
meta-data necessary to load it is compiled into the executables that can
load modules.  Actually loading a module can fail, e.g. when the module
was not deployed.

Loadable modules are designed to be transparent, i.e. users don't need
to know whether a module is compiled in or loadable.

QOM types don't exist until the module is initialized.  Compiled-in
modules are initialized early in startup.  Loadable modules are
initialized on load.

QMP command qom-list-types returns all QOM types.  To be able to find
them all, it needs to load all modules.  Modules that cannot be found
(or have dependencies that cannot be found) are silently ignored.  Any
other loading errors are reported to stderr with error_report_err(),
which is inappropriate.  In either case, the types provided by the
unloadable modules are not returned by the command.

We have two functions to look up an object class by name:
object_class_by_name() and module_object_class_by_name().  The latter
attempts to load a module when the type doesn't exist.  Again, modules
that cannot be found are silently ignored, and other loading errors are
reported with error_report_err(), which is inappropriate in certain
contexts.

When to use which of the two functions is unclear.  Existing usage may
well be wrong.

The QOM functions to create objects in-place (object_initialize(), ...)
or on the heap (object_new(), ...) cannot fail.  This is just fine in
QOM's original design.  It is not fine when a loadable module fails to
load.  Since the functions can't fail, they exit(1) then.

This means things like hot plugging a device provided by a loadable
module can crash a VM immediately.

Attempting to load all modules beforehand with qom-list-types does not
protect against this: we try to load again, fail again, and exit(1).
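
The failure mode can be shown with a toy Python model.  This is not QEMU
code: the registry, the module names and the function names are
hypothetical stand-ins that merely mirror the behavior described above.

```python
import sys

TYPES = {"compiled-in-dev": object}          # registered early in startup
MODULE_FOR_TYPE = {"modular-dev": "hw-foo"}  # metadata built into the binary
DEPLOYED_MODULES = set()                     # "hw-foo" was not deployed

def class_by_name(name):
    """Stand-in for object_class_by_name(): never loads a module."""
    return TYPES.get(name)

def module_class_by_name(name):
    """Stand-in for module_object_class_by_name(): load on a miss."""
    cls = TYPES.get(name)
    if cls is None and MODULE_FOR_TYPE.get(name) in DEPLOYED_MODULES:
        cls = TYPES[name] = object   # module init registers the type
    return cls                       # a missing module is silently ignored

def object_new(name):
    """Has no way to report failure to its caller, so it exits."""
    cls = module_class_by_name(name)
    if cls is None:
        sys.exit(1)
    return cls()

# A lookup fails quietly ...
print(module_class_by_name("modular-dev"))
# ... but creating the object takes the whole process down.
try:
    object_new("modular-dev")
    exit_status = None
except SystemExit as e:
    exit_status = e.code
print("exit status:", exit_status)
```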

= Problem 4: The /machine/unattached/ orphanage =

Is it okay for a QOM object to have no parent?

An object without a parent is not part of the composition tree; it has
no canonical path, and object_get_canonical_path() returns null.

Such objects can behave in wonky ways.  For instance,
object_property_set_link() treats a target object without a parent as
null.  If a linked object somehow loses its parent,
object_property_get_link() will return null even though the underlying C
pointer still points to the poor orphan.

This strongly suggests QOM was designed with the assumption that objects
always have a parent, except during initialization (before they are
connected to anything) and finalization (when no longer connected to
anything).  object_property_try_add_child()'s contract seems to confirm
this:

 * Child properties form the composition tree.  All objects need to be a child
 * of another object.  Objects can only be a child of one object.

Some functions to create objects take the new object's parent as a
parameter.  Examples: object_new_with_props(), object_new_with_propv(),
clock_new(), ...

Others set a fixed parent.  For instance, we always add character
backends to "/chardevs/", objects created with object-add in
"/objects/", devices created with device_add in "/machine/peripheral/"
(with ID) or "/machine/peripheral-anon/" (without ID), ...

There are also functions that don't set a parent: object_new(),
object_new_with_class(), qdev_new(), qdev_try_new(), ...  Setting a
parent is the caller's job then.  Invites misuse.  I'm aware of one
instance: @current_migration remains without a parent forever.

Not all callers care to set a parent themselves.  Instead, they rely on
the "/machine/unattached/" orphanage:

* qdev_connect_gpio_out_named() needs the input pin to have a parent.
  If it lacks one, it gets added to "/machine/unattached/" with a
  made-up name.

* device_set_realized() ensures realized devices have a parent by adding
  devices lacking one to "/machine/unattached/" with a made-up name.

* portio_list_add() adds a memory region.  If the caller doesn't specify
  the parent, "/machine/unattached/" is assumed.

* memory_region_init() adds a memory region, and may set the parent.  If
  the caller requests setting a parent without specifying one,
  "/machine/unattached/" is assumed.

* qemu_create_machine() adds the main system bus to
  "/machine/unattached/".

Except for the last one, the child names depend on execution order.  For
instance, device_set_realized() uses "device[N]", where N counts up from
zero.

These brittle, made-up names are visible in QMP QOM introspection.
Whether that's a stable interface is unclear.  Better not.

We don't rely on these names in C.  We follow pointers instead.

When we replace C code by configuration, we switch from pointers to
names.  Brittle names become a problem.
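
A toy Python model of the naming scheme shows why such names are
brittle.  The class and the device names "uart" and "timer" are
hypothetical; only the "device[N]" pattern is taken from the description
above.

```python
import itertools

class Unattached:
    """Toy stand-in for the /machine/unattached/ container."""
    def __init__(self):
        self._counter = itertools.count()
        self.children = {}

    def adopt(self, obj):
        # The made-up name reflects nothing but arrival order.
        name = f"device[{next(self._counter)}]"
        self.children[name] = obj
        return f"/machine/unattached/{name}"

def realize_all(devices):
    """Realize parentless devices in the given order; return their paths."""
    orphanage = Unattached()
    return {dev: orphanage.adopt(dev) for dev in devices}

a = realize_all(["uart", "timer"])
b = realize_all(["timer", "uart"])
print(a["uart"])
print(b["uart"])
```

The same device ends up at a different path as soon as the order of
realization changes, so a configuration that refers to it by path
breaks.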

= Problem 5: QOM lacks a clear life cycle =

QOM doesn't define a clear life cycle.

It has an implicit one:

    created ---------+
       |             |
       v             |
    parented <--+    |
       |        |    |
       v        |    |
    unparented -+    |
       |             |
       v             |
    destroyed <------+

I'm not aware of code that goes from "unparented" back to "parented".

Since the object becomes visible in the QOM graph when it is added to
its parent, object configuration (by setting properties) should probably
be finished by then.

Some subtypes define their own life cycle.

Devices (subtypes of TYPE_DEVICE) go

    created ---------+
       |             |
       v             |
    realized <--+    |
       |        |    |
       v        |    |
    unrealized -+    |
       |             |
       v             |
    destroyed <------+

I'm pretty sure we don't actually go from "unrealized" back to
"realized".

The device is to be configured (by setting properties) in state created.

The transition to realized can fail.  When it does, we go to destroyed
immediately.

If the device has no QOM parent when we try to realize, we make one up
(see problem 4).  Unrealize automatically removes the device from its
parent.  So the actual cycle is like

    created ------+
       |          |
       v          |
    parented      |
       |          |
       v          |
    realized      |
       |          |
       v          |
    unrealized    |
       |          |
       v          |
    unparented    |
       |          |
       v          |
    destroyed <---+

We may want to refine the life cycle further, e.g. to include reset.
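
The device cycle drawn above can be written down as an explicit
transition table.  The following Python sketch is a toy model with
hypothetical names, not QEMU code; it encodes exactly the arrows of the
last diagram and nothing more.

```python
# Allowed transitions, one entry per state in the diagram.
TRANSITIONS = {
    "created":    {"parented", "destroyed"},
    "parented":   {"realized"},
    "realized":   {"unrealized"},
    "unrealized": {"unparented"},
    "unparented": {"destroyed"},
    "destroyed":  set(),
}

class Device:
    def __init__(self):
        self.state = "created"

    def go(self, new_state):
        """Move to new_state, rejecting arrows the diagram lacks."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state} -> {new_state} not allowed")
        self.state = new_state

dev = Device()
for s in ("parented", "realized", "unrealized", "unparented", "destroyed"):
    dev.go(s)
print(dev.state)
```

An explicit table like this makes it easy to see which transitions are
defined at all; QOM itself has no such table today.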

User-creatable objects (objects that have interface
TYPE_USER_CREATABLE) go

    created -----+
       |         |
       v         |
    completed    |
       |         |
       v         |
    destroyed <--+

The object is to be configured (by setting properties) in state created.

The transition to completed can fail.  When it does, we go to destroyed
immediately.

The actual life cycle is

    created ------+
       |          |
       v          |
    parented      |
       |          |
       v          |
    completed     |
       |          |
       v          |
    unparented    |
       |          |
       v          |
    destroyed <---+

Somewhat related: machine init done notifiers let arbitrary code
(including object initialization) register a callback to be run when
machine initialization completes.

Ideally, a composite object's components go through the life cycle
together.  First, create all the components and assign parents.  This
also creates all the properties.  Then configure the object by setting
property values.  Finally, complete / realize all components.

However, when the number or type of components depends on property
values, creation has to be delayed, possibly even until complete /
realize.  This complicates their configuration.
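
To make the complication concrete, here is a toy Python sketch of a
composite whose component count depends on a property.  All names,
including the "num-cpus" property, are hypothetical.

```python
class Component:
    def __init__(self, name):
        self.name = name
        self.props = {}
        self.realized = False

class Composite:
    """Hypothetical composite; children depend on property "num-cpus"."""
    def __init__(self):
        self.props = {"num-cpus": 1}
        self.children = {}          # filled in realize(), not at creation

    def realize(self):
        # Components can only be created once "num-cpus" is final ...
        for i in range(self.props["num-cpus"]):
            self.children[f"cpu[{i}]"] = Component(f"cpu[{i}]")
        # ... and are realized right away, so there is no window in
        # which anyone outside could have configured them.
        for child in self.children.values():
            child.realized = True

soc = Composite()
configurable_before = list(soc.children)   # nothing to configure yet
soc.props["num-cpus"] = 4
soc.realize()
print(configurable_before, sorted(soc.children))
```

The ideal "create all, configure all, realize all" sequence does not
apply here: the components do not exist during the configure phase.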

Note that a machine is a (big) composite object.

For dynamic configuration, we likely want one useful life cycle for
everything, not one for devices, one for user-creatable objects, and a
not so useful one for everything else.

"Everything" will have to be more than what is available with -device
and -object now.

= Problem 6: QOM's object configuration interface =

QOM objects are configured by setting properties.

Properties have other uses, such as telemetry, control, and internal
versioning.  Properties are mostly undocumented, and their intended
purpose is commonly unclear.

Properties are added dynamically by C code.  In particular, setting a
property can add or delete properties.  This makes the configuration
interface dynamic.  Properties may need to be set in a certain order.
Such ordering constraints don't play well with declarative
configuration.

QOM type introspection can only report initial properties.  To find an
object's current properties, you need to introspect the object, not its
type.
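
A toy Python model of a dynamic property interface follows.  The object
and its property names ("mode", "peer") are made up for illustration and
do not correspond to any real QEMU type.

```python
class NetDev:
    """Hypothetical object whose property set changes at run time."""
    INITIAL_PROPS = ("mode",)       # all that type introspection could see

    def __init__(self):
        self.props = {"mode": "off"}

    def set(self, name, value):
        if name not in self.props:
            raise KeyError(name)
        self.props[name] = value
        # Setting "mode" adds or deletes another property.
        if name == "mode" and value == "tunnel":
            self.props.setdefault("peer", None)
        elif name == "mode":
            self.props.pop("peer", None)

dev = NetDev()
try:
    dev.set("peer", "10.0.0.1")     # wrong order: property not there yet
    order_matters = False
except KeyError:
    order_matters = True
dev.set("mode", "tunnel")
dev.set("peer", "10.0.0.1")
print(order_matters, sorted(dev.props))
```

A declarative description that lists "peer" before "mode" would fail
here, and the type-level list of properties never mentions "peer" at
all.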

The type information available via QOM introspection is mostly
undocumented, and much weaker than in QAPI/QMP introspection.

These introspection deficiencies can get in the way of more
sophisticated use of the interface.  Whether this affects declarative
machine specification is unclear.

Kevin Wolf proposed to move the configuration interface into the QAPI
schema, to make it compile-time static, introspectable, and to force us
to document it properly.

RFC patches:

    Subject: [RFC PATCH 00/12] QOM/QAPI integration part 1
    Date: Wed, 3 Nov 2021 18:29:50 +0100
    Message-Id: <20211103173002.209906-1-kwolf@redhat.com>

= Problem 7: Design of the machine specification DSL =

We want to specify machines and machine components in a declarative DSL.

From an abstract point of view, a machine or component is merely a graph
of QOM objects connected by child and link edges.  Objects and child
edges form the composition tree.

Such a composite object can be specified by listing its component
objects with their properties.  Special child and link properties
specify the edges.  All we need so far is a way to specify an object and
its properties, where special property values refer to other objects,
say by QOM path.
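
A minimal Python sketch of this idea, under stated assumptions: the spec
format below (a mapping from QOM path to type and properties, with
{"link": PATH} marking a link) is invented for the example, as are the
type and property names.  It is not a proposal for the actual DSL.

```python
SPEC = {
    "/machine/soc":       {"type": "soc", "props": {}},
    "/machine/soc/intc":  {"type": "intc", "props": {"num-irqs": 32}},
    "/machine/soc/uart0": {"type": "uart",
                           "props": {"irq-parent":
                                     {"link": "/machine/soc/intc"}}},
}

def build(spec):
    """Create all objects first, then wire up child and link edges."""
    objs = {path: {"type": d["type"], "props": {}, "children": {}}
            for path, d in spec.items()}
    for path, obj in objs.items():
        parent, _, name = path.rpartition("/")
        if parent in objs:                   # child edge, from the path
            objs[parent]["children"][name] = obj
        for key, val in spec[path]["props"].items():
            if isinstance(val, dict) and "link" in val:
                val = objs[val["link"]]      # link edge; KeyError if dangling
            obj["props"][key] = val
    return objs

objs = build(SPEC)
uart = objs["/machine/soc/uart0"]
print(uart["props"]["irq-parent"]["type"])
```

Creating every object before resolving links sidesteps ordering issues
within the spec, but only because these toy objects need no
configuration before they can be referenced.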

A composite object in turn has properties.  It could for instance expose
a property of one of its components.  It could also apply a scale
factor, or some other computation.  It could connect a single property
of its own to multiple component properties.  How can we specify all
this?

For practical machine specification, we need more than just the ability
to specify objects.  The C code uses loops to create multiple similar
objects, and functions to build abstractions such as composite objects
or complex connections.  How can we address the same needs in a
declarative DSL?

Complication: life cycle management (see problem 5).  We need to manage
a state transition from "object created, but not ready for use" to
"object configured and ready for use".  In what order do the objects
change state?

Moreover, we have quite a few graphs in QEMU:

* QOM composition tree
* Memory tree
* qdev tree(s) (legacy)
* irq graph
* reset wiring (tree or graph?)
* more?

Not all of them are modelled in QOM, I fear.  How do we plan to deal
with that?

= Problem 8: Singletons =

When the same global symbol is defined in multiple targets, we can't
link the targets into a single binary.  For instance, we declare
kvm_arch_init() in include/sysemu/kvm.h, and define it in each target
that supports KVM.  This is a problem, but it's one the linker reliably
flags for us.

When a global symbol defined in target-independent code is used by
multiple targets, the linker gives us a single global shared by all
targets.  This sharing may or may not work.

Consider macro @first_cpu, which retrieves the first CPU from global
variable @cpus_queue.  Target code commonly QOM-casts this CPUState
pointer to the target CPU state pointer.  Works fine as long as the
machine uses a single target CPU class, even if the binary contains many
of them.  However, a heterogeneous machine has more than one target CPU
class.
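
The hazard can be modelled in a few lines of Python.  This is a toy, not
QEMU code: the classes are stand-ins for per-target CPU types, and
qom_cast() stands in for a checked QOM cast.

```python
class CPUState:
    pass

class ARMCPU(CPUState):
    pass

class RISCVCPU(CPUState):
    pass

cpus_queue = []                      # one global list shared by all targets

def first_cpu():
    return cpus_queue[0] if cpus_queue else None

def qom_cast(obj, cls):
    """Toy stand-in for a checked QOM cast."""
    if not isinstance(obj, cls):
        raise TypeError(f"{type(obj).__name__} is not a {cls.__name__}")
    return obj

def arm_target_code():
    # Assumes the first CPU is always one of its own.
    return qom_cast(first_cpu(), ARMCPU)

cpus_queue[:] = [ARMCPU()]
arm_target_code()                    # homogeneous machine: fine

cpus_queue[:] = [RISCVCPU(), ARMCPU()]
try:
    arm_target_code()
    cast_failed = False
except TypeError:
    cast_failed = True
print(cast_failed)
```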

Some QOM objects are singletons.  For instance, TYPE_ISA_BUS is due to
its use of global @isabus, and TYPE_I8259 is due to its use of globals
@isa_pic and @slave_pic.  isa_bus_new() catches attempts to create more
than one instance.  i8259_init() does not.  Instead, it overwrites the
globals.
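
The two behaviors can be contrasted in a toy Python model.  The function
and variable names echo the ones above for readability, but the bodies
are invented and only mimic the described outcome.

```python
isabus = None          # global used by the bus type
isa_pic = None         # global used by the interrupt controller type

def isa_bus_new():
    """Refuses to create a second instance."""
    global isabus
    if isabus is not None:
        raise RuntimeError("can't create a second ISA bus")
    isabus = object()
    return isabus

def i8259_init():
    """Silently replaces whatever the global pointed to before."""
    global isa_pic
    isa_pic = object()
    return isa_pic

first_bus = isa_bus_new()
try:
    isa_bus_new()
    second_bus_rejected = False
except RuntimeError:
    second_bus_rejected = True

first_pic = i8259_init()
second_pic = i8259_init()
print(second_bus_rejected, isa_pic is second_pic)
```

In the second case the first instance still exists, but code going
through the global can no longer reach it.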

External interfaces also contain singletons.  Consider -machine
kernel=FNAME.  Fine until we have a heterogeneous machine running more
than one kernel simultaneously.


* Re: Dynamic & heterogeneous machines, initial configuration: problems
From: Zhao Liu @ 2024-02-01  8:06 UTC
  To: Markus Armbruster
  Cc: qemu-devel, Philippe Mathieu-Daudé, Brian Cain, Warner Losh,
	Luc Michel, Bernhard Beschow, Paul Walmsley,
	Alessandro Di Federico, Mark Burton, Cédric Le Goater,
	Daniel P. Berrangé, Edgar E. Iglesias, LIU Zhiwei,
	Dr. David Alan Gilbert, Paolo Bonzini, Eduardo Habkost, Jim Shu,
	Richard Henderson, Alistair Francis, Alex Bennée,
	Anton Johansson

Hi Markus,

On Wed, Jan 31, 2024 at 09:14:21PM +0100, Markus Armbruster wrote:
> Date: Wed, 31 Jan 2024 21:14:21 +0100
> From: Markus Armbruster <armbru@redhat.com>
> Subject: Dynamic & heterogeneous machines, initial configuration: problems
> 
> This memo is the fruit of discussions with Philippe Mathieu-Daudé.
> Its errors are mine.
> 
> QEMU defines machines statically in C code.  We've long wished we could
> define them dynamically in some suitable DSL.  This is what we call
> "dynamic machines".
> 
> There's a need for machines that contain more than one target's CPUs.
> This is what we call "heterogeneous machines".  They require a single
> binary capable of any of the targets involved.
> 
> There's substantial overlap with a seemingly unrelated problem:
> machine-friendly initial configuration.
> 
> To keep the memo's length in check (sort of), it focuses on (known)
> problems.
> 
> 
> = Problem 1: Initial configuration =
> 
> Previously discussed in
> 
>     Subject: Redesign of QEMU startup & initial configuration
>     Date: Thu, 02 Dec 2021 07:57:38 +0100
>     Message-ID: <87lf13cx3x.fsf@dusky.pond.sub.org>
> 
> 
> == What users want for initial configuration ==
> 
> 1. QMP only
> 
>    Management applications need to use QMP for monitoring anyway.  They
>    may want to use it for initial configuration, too.  Libvirt does.
> 
>    They still need to bootstrap a QMP monitor, and for that, CLI is fine
>    as long as it's simple and stable.
> 
> 2. CLI and configuration files
> 
>    Human users want a CLI and configuration files.
> 
>    CLI is good for quick tweaks, and to explore.
> 
>    For more permanent, non-trivial configuration, configuration files
>    are more suitable, because they are easier to read, edit, and
>    document than long command lines.
> 
> 
> == What we have for initial configuration ==
> 
> Half of 1. and half of 2., satisfying nobody's needs.
> 
> Management applications need to do a lot of non-trivial initial
> configuration with the CLI.
> 
> Human users struggle with inconsistent syntax, insufficiently expressive
> configuration files, and huge command lines.
> 
> 
> = Problem 2: Defining machines =
> 
> This is how I understand the problem.  Please correct me where I'm off.
> 
> 
> == How we'd like to build machines ==
> 
> We want to build machines declaratively, by configuring devices and
> their connections.
> 
> We want to build composite devices the same way.
> 
> The non-composite devices are provided by the QEMU binary.
> 
> Users want to build machines as variations of canned machine types
> shipped with QEMU.  Some users may want to build their own machines from
> scratch.
> 
> To enable all this, machine configuration needs to be composable and
> dynamic.
> 
> Composable means configuration can be assembled from components,
> recursively.
> 
> Dynamic means it can be done during qemu-system-FOO initial
> configuration.
> 
> 
> == What we have for defining machines ==
> 
> A QEMU binary provides a fixed set of device types, some of them
> composite, and a fixed set of machine types.
> 
> Machines are QOM objects: instance of a concrete subtype of "machine".
> 
> Devices are usually QOM objects: instance of a concrete subtype of
> "device".  Exceptions remain in old code nobody can be bothered to
> update.
> 
> Both machine types and composite devices are built from devices
> by code, i.e. imperatively, not declaratively.
> 
> The code can be parameterized.  For QOM objects, parameters should be
> QOM properties, but machine type code additionally uses global old-style
> configuration such as -drive and -serial.
> 
> Code may create default backends for convenience.  Machine type code may
> also create backends to honor global old-style configuration.  Only some
> backends are QOM objects.
> 
> Machine types split their code between object creation (QOM methods
> .instance_init() and .instance_post_init()) and machine initialization
> (MachineClass method .init()).  However, basically everything is done in
> the latter.
> 
> QOM device types split their code between object creation and device
> realization (qdev method .realize()).  The actual split varies widely
> between devices.  Developers are commonly unsure what to put where.
> 
> After machine type code is done, the resulting machine can still be
> adjusted with device cold plug and unplug: -device, device_add,
> device_del.  Only works for a subset of the devices.
> 
> Related, but out of scope here: hot plug and unplug.
> 
> 
> = Common sub-problem: qemu-system initial startup =
> 
> QAPI/QMP is our most capable, flexible, and mature configuration
> interface.  We need to offer machine-friendly initial configuration via
> QMP, and we'd very much like to have a QAPI-based CLI and configuration
> files (see "What users want for initial configuration" above).
> 
> Dynamic machine configuration happens during initial startup.  This
> makes it part of the larger initial configuration problem.  We want an
> integrated solution for the larger configuration problem that includes
> machine configuration.
> 
> Traditionally, QMP becomes available quite late, long after machine
> initialization.  This precludes use of QMP for most parts of initial
> configuration, including dynamic machine configuration.
> 
> To enable a bit of machine configuration via QMP, experimental CLI
> option -preconfig delays part of initial startup including machine
> initialization by moving it into QMP command x-exit-preconfig.  Only
> selected commands (the ones marked 'allow-preconfig': true) are
> available before x-exit-preconfig.
> 
> To enable arbitrary configuration via QMP, we need to make it available
> before we complete configuring anything.
> 
> QMP is a concrete transport for an abstract interface.  Configuration
> files could be another transport.  CLI, too.
> 
> 
> = Problem 3: Loadable modules =
> 
> QOM wasn't designed for loadable modules.  Support for them was grafted
> on, and there are serious deficiencies.
> 
> Building a loadable module results in a DSO.  Additionally, module
> meta-data necessary to load it is compiled into the executables that can
> load modules.  Actually loading a module can fail, e.g. when the module
> was not deployed.
> 
> Loadable modules are designed to be transparent, i.e. users don't need
> to know whether a module is compiled in or loadable.
> 
> QOM types don't exist until the module is initialized.  Compiled-in
> modules are initialized early in startup.  Loadable modules are
> initialized on load.
> 
> QMP command qom-list-types returns all QOM types.  To be able to find
> them all, it needs to load all modules.  Modules that cannot be found
> (or have dependencies that cannot be found) are silently ignored.  Any
> other loading errors are reported to stderr with error_report_err(),
> which is inappropriate.  In either case, the types provided by the
> unloadable modules are not returned by the command.
> 
> We have two functions to look up an object class by name:
> object_class_by_name() and module_object_class_by_name().  The latter
> attempts to load a module when the type doesn't exist.  Again, modules
> that cannot be found are silently ignored, and other loading errors are
> reported with error_report_err(), which is inappropriate in certain
> contexts.
> 
> When to use which of the two functions is unclear.  Existing usage may
> well be wrong.
> 
> The QOM functions to create objects in-place (object_initialize(), ...)
> or on the heap (object_new(), ...) cannot fail.  This is just fine in
> QOM's original design.  It is not fine when a loadable module fails to
> load.  Since the functions can't fail, they exit(1) then.
> 
> This means things like a hot plugging a device provided by a loadable
> module can crash a VM immediately.
> 
> Attempting to load all modules beforehand with qom-list-types does not
> protect against this: we try to load again, fail again, and exit(1).
> 
> 
> = Problem 4: The /machine/unattached/ orphanage =
> 
> Is it okay for a QOM object to have no parent?
> 
> An object without a parent is not part of the composition tree; it has
> no canonical path, and object_get_canonical_path() returns null.
> 
> Such objects can behave in wonky ways.  For instance,
> object_property_set_link() treats a target object without a parent as
> null.  If a linked object somehow loses its parent,
> object_property_get_link() will return null even though the underlying C
> pointer still points to the poor orphan.
> 
> This strongly suggests QOM was designed with the assumption that objects
> always have a parent, except during initialization (before they are
> connected to anything) and finalization (when no longer connected to
> anything).  object_property_try_add_child()'s contract seems to confirm
> this:
> 
>  * Child properties form the composition tree.  All objects need to be a child
>  * of another object.  Objects can only be a child of one object.
> 
> Some functions to create objects take the new object's parent as a
> parameter.  Example: object_new_with_props(), object_new_with_propv(),
> clock_new(), ...
> 
> Others set a fixed parent.  For instance, we always add character
> backends to "/chardevs/", objects created with object-add in
> "/objects/", devices created with device_add in "/machine/peripheral/"
> (with ID) or "/machine/peripheral-anon/" (without ID), ...
> 
> There are also functions that don't set a parent: object_new(),
> object_new_with_class(), qdev_new(), qdev_try_new(), ...  Setting a
> parent is the callers job then.  Invites misuse.  I'm aware of one
> instance: @current_migration remains without a parent forever.
> 
> Not all callers care to set a parent themselves.  Instead, they rely on
> the "/machine/unattached/" orphanage:
> 
> * qdev_connect_gpio_out_named() needs the input pin to have a parent.
>   If it lacks one, it gets added to "/machine/unattached/" with a
>   made-up name.
> 
> * device_set_realized() ensures realized devices have a parent by adding
>   devices lacking one to "/machine/unattached/" with a made-up name.
> 
> * portio_list_add() adds a memory region.  If the caller doesn't specify
>   the parent, "/machine/unattached/" is assumed.
> 
> * memory_region_init() adds a memory region, and may set the parent.  If
>   the caller requests setting a parent without specifying one,
>   "/machine/unattached/" is assumed.
> 
> * qemu_create_machine() adds the main system bus to
>   "/machine/unattached/".
> 
> Except for the last one, the child names depend on execution order.  For
> instance, device_set_realized() uses "device[N]", where N counts up from
> zero.
> 
> These brittle, made-up names are visible in QMP QOM introspection.
> Whether that's a stable interface is unclear.  Better not.
> 
> We don't rely on these names in C.  We follow pointers instead.
> 
> When we replace C code by configuration, we switch from pointers to
> names.  Brittle names become a problem.
> 
> 
> = Problem 5: QOM lacks a clear life cycle =
> 
> QOM doesn't define a clear life cycle.
> 
> It has an implicit one:
> 
>     created ---------+
>        |             |
>        v             |
>     parented <--+    |
>        |        |    |
>        v        |    |
>     unparented -+    |
>        |             |
>        v             |
>     destroyed <------+
> 
> I'm not aware of code that goes from "unparented" back to "parented".
> 
> Since the object becomes visible in the QOM graph at add to parent time,
> object configuration (by setting properties) should probably be finished
> then.
> 
> Some subtypes define their own life cycle.
> 
> Devices (subtypes of TYPE_DEVICE) go
> 
>     created ---------+
>        |             |
>        v             |
>     realized <--+    |
>        |        |    |
>        v        |    |
>     unrealized -+    |
>        |             |
>        v             |
>     destroyed <------+
> 
> I'm pretty sure we don't actually go from "unrealized" back to
> "realized".
> 
> The device is to be configured (by setting properties) in state created.
> 
> The transition to realized can fail.  When it does, we go to destroyed
> immediately.
> 
> If the device has no QOM parent when we try to realize, we make one up
> (see problem 4).  Unrealize automatically removes from parent.  So the
> actual cycle is like
> 
>     created ------+
>        |          |
>        v          |
>     parented      |
>        |          |
>        v          |
>     realized      |
>        |          |
>        v          |
>     unrealized    |
>        |          |
>        v          |
>     unparented    |
>        |          |
>     destroyed <---+
> 
> We way want to refine the life cycle further, e.g. to include reset.
> 
> User-creatable objects (objects that have interface
> TYPE_USER_CREATABLE) go
> 
>     created -----+
>        |         |
>        v         |
>     completed    |
>        |         |
>        v         |
>     destroyed <--+
> 
> The object is to be configured (by setting properties) in state created.
> 
> The transition to completed can fail.  When it does, we go to destroyed
> immediately.
> 
> The actual life cycle is
> 
>     created ------+
>        |          |
>        v          |
>     parented      |
>        |          |
>        v          |
>     completed     |
>        |          |
>        v          |
>     unparented    |
>        |          |
>        v          |
>     destroyed <---+
> 
> Somewhat related: machine init done notifiers let arbitrary code
> (including object initialization) register a callback to be run when
> machine initialization completes.
> 
> Ideally, a composite object's components go through the life cycle
> together.  First, create all the components and assign parents.  This
> also creates all the properties.  Then configure the object by setting
> property values.  Finally, complete / realize all components.
> 
> However, when the number or type of components depends on property
> values, creation has to be delayed, possibly even until complete /
> realize.  This complicates their configuration.
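
A toy illustration of that complication, assuming a hypothetical composite whose number of CPU components is set by a "num-cpus" property. ToyComposite, the property names and the "cpu[N]/freq" path syntax are all made up for this sketch.

```python
class ToyComposite:
    """Hypothetical composite whose component count depends on a property."""

    def __init__(self):
        self.props = {"num-cpus": 1}
        self.children = []      # components are created only in realize()
        self.realized = False

    def set_prop(self, path, value):
        head, _, rest = path.partition("/")
        if not rest:
            # Own property of the composite.
            self.props[head] = value
            return
        # Component property, e.g. "cpu[1]/freq".
        for child in self.children:
            if child["name"] == head:
                child[rest] = value
                return
        raise KeyError(f"no such component yet: {head}")

    def realize(self):
        # Creation is delayed until here because it depends on "num-cpus".
        self.children = [{"name": f"cpu[{i}]", "freq": 1000}
                         for i in range(self.props["num-cpus"])]
        self.realized = True
```

In this model, setting "cpu[1]/freq" before realize() raises KeyError because the component does not exist yet, so component configuration is pushed to after realize, which is the reverse of the ideal order described above.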
> 
> Note that a machine is a (big) composite object.
> 
> For dynamic configuration, we likely want one useful life cycle for
> everything, not one for devices, one for user-creatable objects, and a
> not so useful one for everything else.
> 
> "Everything" will have to be more than what is available with -device
> and -object now.
> 
> 
> = Problem 6: QOM's object configuration interface =
> 
> QOM objects are configured by setting properties.
> 
> Properties have other uses, such as telemetry, control, and internal
> versioning.  Properties are mostly undocumented, and their intended
> purpose is commonly unclear.
> 
> Properties are added dynamically by C code.  In particular, setting a
> property can add or delete properties.  This makes the configuration
> interface dynamic.  Properties may need to be set in a certain order.
> Such ordering constraints don't play well with declarative
> configuration.
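
A sketch of such an ordering constraint, using a made-up object where setting "backend" adds or deletes a "path" property. ToyObj and both property names are hypothetical and do not correspond to a real QEMU type.

```python
class ToyObj:
    """Hypothetical object whose property set changes as properties are set."""

    def __init__(self):
        # Only "backend" exists initially.
        self._setters = {"backend": self._set_backend}
        self.values = {}

    def _set_path(self, value):
        self.values["path"] = value

    def _set_backend(self, value):
        self.values["backend"] = value
        if value == "file":
            # Setting one property adds another one.
            self._setters["path"] = self._set_path
        else:
            # ... or deletes it again.
            self._setters.pop("path", None)
            self.values.pop("path", None)

    def set(self, name, value):
        if name not in self._setters:
            raise KeyError(f"property '{name}' not found")
        self._setters[name](value)

    def properties(self):
        return sorted(self._setters)
```

A configuration {"path": ..., "backend": "file"} works only if "backend" is applied first. A declarative format would either need to encode that order or avoid such properties entirely. The same model shows why introspecting the type (which knows only "backend") differs from introspecting a configured object.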
> 
> QOM type introspection can only report initial properties.  To find an
> object's current properties, you need to introspect the object, not its
> type.
> 
> The type information available via QOM introspection is mostly
> undocumented, and much weaker than in QAPI/QMP introspection.
> 
> These introspection deficiencies can get in the way of more
> sophisticated use of the interface.  Whether this affects declarative
> machine specification is unclear.
> 
> Kevin Wolf proposed to move the configuration interface into the QAPI
> schema, to make it compile-time static, introspectable, and to force us
> to document it properly.
> 
> RFC patches:
> 
>     Subject: [RFC PATCH 00/12] QOM/QAPI integration part 1
>     Date: Wed, 3 Nov 2021 18:29:50 +0100
>     Message-Id: <20211103173002.209906-1-kwolf@redhat.com>
> 
> 
> = Problem 7: Design of the machine specification DSL =
> 
> We want to specify machines and machine components in a declarative DSL.
> 
> From an abstract point of view, a machine or component is merely a graph
> of QOM objects connected by child and link edges.  Objects and child
> edges form the composition tree.
> 
> Such a composite object can be specified by listing its component
> objects with their properties.  Special child and link properties
> specify the edges.  All we need so far is a way to specify an object and
> its properties, where special property values refer to other objects,
> say by QOM path.
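
One way to picture "objects with properties, plus edges by QOM path" is a flat list resolved in two passes. This is a sketch only: the SPEC layout, the "toy-*" type names and build() are assumptions for illustration, not a proposed DSL.

```python
# Hypothetical declarative spec: one entry per object.  Child edges are
# implied by the path, link edges are given as QOM paths.
SPEC = [
    {"path": "/machine/soc", "type": "toy-soc"},
    {"path": "/machine/soc/cpu", "type": "toy-cpu",
     "links": {"memory": "/machine/ram"}},
    {"path": "/machine/ram", "type": "toy-ram", "props": {"size": 1 << 30}},
]


def build(spec):
    root = {"type": "toy-machine", "props": {}, "children": {}, "links": {}}
    objects = {"/machine": root}

    # Pass 1: create objects and child edges, parents before children.
    for entry in sorted(spec, key=lambda e: e["path"].count("/")):
        parent_path, name = entry["path"].rsplit("/", 1)
        if parent_path not in objects:
            raise KeyError(f"missing parent for {entry['path']}")
        obj = {"type": entry["type"], "props": dict(entry.get("props", {})),
               "children": {}, "links": {}}
        objects[parent_path]["children"][name] = obj
        objects[entry["path"]] = obj

    # Pass 2: resolve link edges now that every target exists.
    for entry in spec:
        for name, target in entry.get("links", {}).items():
            if target not in objects:
                raise KeyError(f"dangling link {name} -> {target}")
            objects[entry["path"]]["links"][name] = objects[target]

    return objects
```

Two passes let the spec list objects in any order, at the price of deferring link resolution. Where that fits into the life cycle is exactly the open question.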
> 
> A composite object has in turn properties.  It could for instance expose
> a property of one of its components.  It could also apply a scale
> factor, or some other computation.  It could connect a single own
> property to multiple component properties.  How can we specify all this?
> 
> For practical machine specification, we need more than just the ability
> to specify objects.  The C code uses loops to create multiple similar
> objects, and functions to build abstractions such as composite objects
> or complex connections.  How can we address the same needs in a
> declarative DSL?
> 
> Complication: life cycle management (see problem 5).  We need to manage
> a state transition from "object created, but not ready for use" to
> "object configured and ready for use".  In what order do the objects
> change state?
> 
> Moreover, we have quite a few graphs in QEMU:
> 
> * QOM composition tree
> * Memory tree
> * qdev tree(s) (legacy)
> * irq graph
> * reset wiring (tree or graph?)
> * more?
> 
> Not all of them are modelled in QOM, I fear.  How do we plan to deal
> with that?

Maybe also the graph for CPU topology.

CPU topology is not just an abstraction of the purely virtual topology
hierarchy, but also has a direct impact on how we should organize
heterogeneous cores and caches in a machine.

The heterogeneous machines I'm dealing with directly are the Intel
client CPUs (Alder Lake, Raptor Lake and Meteor Lake), which have the
same target but cores with very different performance/power efficiency.
Though the P cores and E cores of Intel hybrid CPUs have nearly
identical ISAs, they still have some differences, e.g., their PMUs have
different events.

Therefore, I think this case should also be included in the
heterogeneous machine category, right?

About the CPU topology graph, I also posted some RFCs:
* The first try: [RFC 00/52] Introduce hybrid CPU topology
https://lore.kernel.org/qemu-devel/20230213095035.158240-1-zhao1.liu@linux.intel.com/

* The second try: [RFC 00/41] qom-topo: Abstract Everything about CPU
  Topology:
https://lore.kernel.org/qemu-devel/20231130144203.2307629-1-zhao1.liu@linux.intel.com/

Unfortunately, no feedback was received on the second attempt.  I'd
like to be pointed in the right direction as well.  So hopefully, when
discussing heterogeneous machines, the Intel hybrid CPU case, especially
the hybrid topology scenarios, can also be taken into consideration.
This is the heterogeneous scenario we are trying to push in
virtualization.

Thanks,
Zhao

> 
> 
> = Problem 8: Singletons =
> 
> When the same global symbol is defined in multiple targets, we can't
> link the targets into a single binary.  For instance, we declare
> kvm_arch_init() in include/sysemu/kvm.h, and define it in each target
> that supports KVM.  This is a problem, but it's one the linker reliably
> flags for us.
> 
> When a global symbol defined in target-independent code is used by
> multiple targets, the linker gives us a single global shared by all
> targets.  This sharing may or may not work.
> 
> Consider macro @first_cpu, which retrieves the first CPU from global
> variable @cpus_queue.  Target code commonly QOM-casts this CPUState
> pointer to the target CPU state pointer.  Works fine as long as the
> machine uses a single target CPU class, even if the binary contains
> many of them.  However, a heterogeneous machine has more than one
> target CPU class.
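
A toy model of why the shared global breaks, assuming two made-up CPU classes in one machine. The class names, qom_cast() and riscv_target_code() are hypothetical; only the names first_cpu and cpus_queue are taken from the text above.

```python
class ToyCPU:
    """Stand-in for the target-independent CPU state."""


class ToyArmCPU(ToyCPU):
    pass


class ToyRiscvCPU(ToyCPU):
    pass


# Stand-in for the single global shared by all targets.
cpus_queue = []


def first_cpu():
    return cpus_queue[0] if cpus_queue else None


def qom_cast(obj, cls):
    # Checked downcast, loosely modelled on a QOM cast.
    if not isinstance(obj, cls):
        raise TypeError(
            f"{type(obj).__name__} is not a {cls.__name__}")
    return obj


def riscv_target_code():
    # Target code assumes the first CPU is one of its own.
    return qom_cast(first_cpu(), ToyRiscvCPU)
```

With only ToyRiscvCPU instances in cpus_queue the cast succeeds. As soon as a ToyArmCPU is queued first, the same target code fails, although nothing in that code changed.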
> 
> Some QOM objects are singletons.  For instance, TYPE_ISA_BUS is due to
> its use of global @isabus, and TYPE_I8259 is due to its use of globals
> @isa_pic and @slave_pic.  isa_bus_new() catches attempts to create more
> than one instance.  i8259_init() does not.  Instead, it overwrites the
> globals.
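
The two behaviours can be contrasted in a few lines. This is a Python caricature, not the C code: toy_isa_bus_new(), toy_i8259_init() and current_pic() are hypothetical stand-ins that only mimic "second instance is rejected" versus "second instance overwrites the global".

```python
_isabus = None     # stand-in for global @isabus
_isa_pic = None    # stand-in for global @isa_pic


class ToyBus:
    pass


class ToyPic:
    pass


def toy_isa_bus_new():
    global _isabus
    if _isabus is not None:
        # The guarded singleton: a second instance is an error.
        raise RuntimeError("can't create a second ISA bus")
    _isabus = ToyBus()
    return _isabus


def toy_i8259_init():
    global _isa_pic
    # The unguarded singleton: the global is silently overwritten.
    _isa_pic = ToyPic()
    return _isa_pic


def current_pic():
    return _isa_pic
```

After two calls to toy_i8259_init(), code that goes through the global only ever sees the second instance. The guarded variant at least fails loudly.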
> 
> External interfaces also contain singletons.  Consider -machine
> kernel=FNAME.  Fine until we have a heterogeneous machine running more
> than one kernel simultaneously.
> 


* Re: Dynamic & heterogeneous machines, initial configuration: problems
  2024-01-31 20:14 Dynamic & heterogeneous machines, initial configuration: problems Markus Armbruster
  2024-02-01  8:06 ` Zhao Liu
@ 2024-02-05 12:47 ` Daniel P. Berrangé
  1 sibling, 0 replies; 3+ messages in thread
From: Daniel P. Berrangé @ 2024-02-05 12:47 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Philippe Mathieu-Daudé, Brian Cain, Warner Losh,
	Luc Michel, Bernhard Beschow, Paul Walmsley,
	Alessandro Di Federico, Mark Burton, Cédric Le Goater,
	Edgar E. Iglesias, LIU Zhiwei, Dr. David Alan Gilbert,
	Paolo Bonzini, Eduardo Habkost, Jim Shu, Richard Henderson,
	Alistair Francis, Alex Bennée, Anton Johansson

On Wed, Jan 31, 2024 at 09:14:21PM +0100, Markus Armbruster wrote:
> == What users want for initial configuration ==
> 
> 1. QMP only
> 
>    Management applications need to use QMP for monitoring anyway.  They
>    may want to use it for initial configuration, too.  Libvirt does.
> 
>    They still need to bootstrap a QMP monitor, and for that, CLI is fine
>    as long as it's simple and stable.
> 
> 2. CLI and configuration files
> 
>    Human users want a CLI and configuration files.
> 
>    CLI is good for quick tweaks, and to explore.
> 
>    For more permanent, non-trivial configuration, configuration files
>    are more suitable, because they are easier to read, edit, and
>    document than long command lines.
> 
> 
> == What we have for initial configuration ==
> 
> Half of 1. and half of 2., satisfying nobody's needs.
> 
> Management applications need to do a lot of non-trivial initial
> configuration with the CLI.
> 
> Human users struggle with inconsistent syntax, insufficiently expressive
> configuration files, and huge command lines.

Our two sets of users (humans & machines) have pretty different
desires in many respects. To suit machines, we've made our config
more and more expressive & detailed. This has worked well for
machines. Humans have largely ignored most of it though, and
stuck with the massively simpler ("legacy") config approaches.

Every now & then though, humans are forced to use the modern low
level config to access some edge case feature not exposed in the
legacy syntax. Pain and suffering ensues.

I feel like we have become somewhat incapable of innovating on
features that are in the interests of humans, because our thought
processes get derailed by a desire to keep things fully expressive
for machines, and the human areas of code are often the most
crufty with highest risk of breakage.

My wish is that when we switch to a new binary, we exclusively
focus on machines, and build a human focused frontend above
that, so we have clean separation, and we can do whatever we
think is right for humans without being distracted by whether
machines can consume it or not, and vice versa.

> = Problem 2: Defining machines =
> 
> This is how I understand the problem.  Please correct me where I'm off.
> 
> 
> == How we'd like to build machines ==
> 
> We want to build machines declaratively, by configuring devices and
> their connections.
> 
> We want to build composite devices the same way.
> 
> The non-composite devices are provided by the QEMU binary.
> 
> Users want to build machines as variations of canned machine types
> shipped with QEMU.  Some users may want to build their own machines from
> scratch.
> 
> To enable all this, machine configuration needs to be composable and
> dynamic.
> 
> Composable means configuration can be assembled from components,
> recursively.
> 
> Dynamic means it can be done during qemu-system-FOO initial
> configuration.
> 
> 
> == What we have for defining machines ==
> 
> A QEMU binary provides a fixed set of device types, some of them
> composite, and a fixed set of machine types.
> 
> Machines are QOM objects: instance of a concrete subtype of "machine".
> 
> Devices are usually QOM objects: instance of a concrete subtype of
> "device".  Exceptions remain in old code nobody can be bothered to
> update.
> 
> Both machine types and composite devices are built from devices
> by code, i.e. imperatively, not declaratively.
> 
> The code can be parameterized.  For QOM objects, parameters should be
> QOM properties, but machine type code additionally uses global old-style
> configuration such as -drive and -serial.
> 
> Code may create default backends for convenience.  Machine type code may
> also create backends to honor global old-style configuration.  Only some
> backends are QOM objects.

The default devices/backends in machines are an artifact of us trying
to do something which suits both humans and machines, while at the same
time hardcoding machine definitions.

Our '-nodefaults' hack is a gross solution to this problem.

An ability to dynamically define machines could give us a far more
attractive solution to this problem.

<handwaving>

E.g. we could have a 'q35-minimal.cfg' configuration that defined
the minimal 'q35' machine type, along with a 'q35-recommended.cfg'
that added the typical extra devices, and possibly even a further
'q35-simple.cfg' that added the typical extra devices, along with
typical extra backend connectivity.

Using a 'q35-minimal.cfg' the mgmt app would have to set up all
backends and extra devices explicitly as it saw fit.

Using a 'q35-recommended.cfg' the human would merely have to
provide backend configuration.

Using a 'q35-simple.cfg' the human would merely need to provide
a disk image path.
</handwaving>

This would also solve our forever problem of "sensible defaults"
for RAM size, CPU model, etc being an undecidable problem. We
could have 'q35-tiny.cfg' and 'q35-huge.cfg', or any number of
other profile variants. Or we could have 'q35-windows.cfg' and
'q35-linux.cfg'.
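
To make the handwaving slightly more concrete, here is a sketch of profiles as layered data. Everything in it is an assumption for illustration: the dict layout, the merge() helper and the device/backend choices are not what such .cfg files would necessarily contain.

```python
def merge(base, overlay):
    """Return base with overlay applied recursively; inputs are not modified."""
    out = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out


# 'q35-minimal.cfg': just the machine, nothing else.
Q35_MINIMAL = {"machine": {"type": "q35"}, "devices": {}, "backends": {}}

# 'q35-recommended.cfg': minimal plus typical extra devices.
Q35_RECOMMENDED = merge(Q35_MINIMAL, {
    "devices": {
        "net0": {"driver": "virtio-net-pci"},
        "disk0": {"driver": "virtio-blk-pci"},
    },
})

# 'q35-simple.cfg': recommended plus typical backend connectivity.
Q35_SIMPLE = merge(Q35_RECOMMENDED, {
    "backends": {
        "net0": {"type": "user"},
        "disk0": {"type": "file", "filename": None},
    },
})

# The human only supplies a disk image path on top of the simple profile.
user_config = merge(Q35_SIMPLE, {
    "backends": {"disk0": {"filename": "disk.qcow2"}},
})
```

Variants such as 'q35-tiny.cfg' or 'q35-windows.cfg' would then be further overlays on the same base, rather than more C code.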

We wouldn't have needed to create 'microvm' at all, Kata could
have just defined a suitable config themselves which was optimal
for their needs. 

Essentially, once machine types are turned into data instead of
code, we improve life for humans and machines alike, and likely
eliminate entire classes of problems[1].

I wonder if machine types as data might also have a positive
impact on our migration compatibility support. Say we screw up
and break migrate compat between 2 QEMU releases. Fixing it
requires new code releases and builds. Fixing it with just a
data update may well be easier to consume. On the flip side,
however, if mgmt apps are maintaining their own configuration
for defining machines, they might have to take on full
responsibility for adding changes to their config to preserve
ABI.

With regards,
Daniel

[1] And create ourselves a whole suite of entertaining new problems
    to worry about, which will be a refreshing change from the old
    problems we're all bored of by now :-)
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
