* [Qemu-devel] [RFC v2 1/4] numa: split out NumaOptions parsing into parse_NumaOptions()
2017-12-28 17:22 [Qemu-devel] [RFC v2 0/4] enable numa configuration before machine is running from HMP/QMP Igor Mammedov
@ 2017-12-28 17:22 ` Igor Mammedov
2017-12-28 17:22 ` [Qemu-devel] [RFC v2 2/4] HMP: add set-numa-node command Igor Mammedov
` (4 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Igor Mammedov @ 2017-12-28 17:22 UTC (permalink / raw)
To: qemu-devel
Cc: eblake, armbru, ehabkost, pkrempa, david, peter.maydell, pbonzini
This will allow parse_NumaOptions() to be reused for parsing
configuration commands received via the QMP interface.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
include/sysemu/numa.h | 1 +
numa.c | 48 +++++++++++++++++++++++++++++-------------------
2 files changed, 30 insertions(+), 19 deletions(-)
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index b354521..4621312 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -23,6 +23,7 @@ struct NumaNodeMem {
};
extern NodeInfo numa_info[MAX_NODES];
+int parse_numa(void *opaque, QemuOpts *opts, Error **errp);
void parse_numa_opts(MachineState *ms);
void query_numa_node_mem(NumaNodeMem node_mem[]);
extern QemuOptsList qemu_numa_opts;
diff --git a/numa.c b/numa.c
index 7b9c33a..d157961 100644
--- a/numa.c
+++ b/numa.c
@@ -168,28 +168,11 @@ static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
have_numa_distance = true;
}
-static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
+static
+void parse_NumaOptions(MachineState *ms, NumaOptions *object, Error **errp)
{
- NumaOptions *object = NULL;
- MachineState *ms = opaque;
Error *err = NULL;
- {
- Visitor *v = opts_visitor_new(opts);
- visit_type_NumaOptions(v, NULL, &object, &err);
- visit_free(v);
- }
-
- if (err) {
- goto end;
- }
-
- /* Fix up legacy suffix-less format */
- if ((object->type == NUMA_OPTIONS_TYPE_NODE) && object->u.node.has_mem) {
- const char *mem_str = qemu_opt_get(opts, "mem");
- qemu_strtosz_MiB(mem_str, NULL, &object->u.node.mem);
- }
-
switch (object->type) {
case NUMA_OPTIONS_TYPE_NODE:
parse_numa_node(ms, &object->u.node, &err);
@@ -223,6 +206,33 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
}
end:
+ if (err) {
+ error_propagate(errp, err);
+ }
+}
+
+int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
+{
+ NumaOptions *object = NULL;
+ MachineState *ms = MACHINE(opaque);
+ Error *err = NULL;
+ Visitor *v = opts_visitor_new(opts);
+
+ visit_type_NumaOptions(v, NULL, &object, &err);
+ visit_free(v);
+ if (err) {
+ goto end;
+ }
+
+ /* Fix up legacy suffix-less format */
+ if ((object->type == NUMA_OPTIONS_TYPE_NODE) && object->u.node.has_mem) {
+ const char *mem_str = qemu_opt_get(opts, "mem");
+ qemu_strtosz_MiB(mem_str, NULL, &object->u.node.mem);
+ }
+
+ parse_NumaOptions(ms, object, &err);
+
+end:
qapi_free_NumaOptions(object);
if (err) {
error_report_err(err);
--
2.7.4
* [Qemu-devel] [RFC v2 2/4] HMP: add set-numa-node command
2017-12-28 17:22 [Qemu-devel] [RFC v2 0/4] enable numa configuration before machine is running from HMP/QMP Igor Mammedov
2017-12-28 17:22 ` [Qemu-devel] [RFC v2 1/4] numa: split out NumaOptions parsing into parse_NumaOptions() Igor Mammedov
@ 2017-12-28 17:22 ` Igor Mammedov
2017-12-28 17:22 ` [Qemu-devel] [RFC v2 3/4] QMP: " Igor Mammedov
` (3 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Igor Mammedov @ 2017-12-28 17:22 UTC (permalink / raw)
To: qemu-devel
Cc: eblake, armbru, ehabkost, pkrempa, david, peter.maydell, pbonzini
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
hmp.h | 1 +
hmp-commands.hx | 13 +++++++++++++
hmp.c | 23 +++++++++++++++++++++++
3 files changed, 37 insertions(+)
diff --git a/hmp.h b/hmp.h
index a6f56b1..d861038 100644
--- a/hmp.h
+++ b/hmp.h
@@ -147,5 +147,6 @@ void hmp_info_ramblock(Monitor *mon, const QDict *qdict);
void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict);
void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);
void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict);
+void hmp_set_numa_node(Monitor *mon, const QDict *qdict);
#endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 6d5ebdf..17f8504 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1816,6 +1816,19 @@ Print QOM properties of object at location @var{path}
ETEXI
{
+ .name = "set-numa-node",
+ .args_type = "numa:O",
+ .params = "see -numa CLI option for possible options",
+ .help = "assign CPU to numa node",
+ .cmd = hmp_set_numa_node,
+ },
+
+STEXI
+@item set-numa-node @var{options}
+Assign CPUs to a NUMA node; see the -numa CLI option for the possible @var{options}
+ETEXI
+
+ {
.name = "qom-set",
.args_type = "path:s,property:s,value:s",
.params = "path property value",
diff --git a/hmp.c b/hmp.c
index 35a7041..c8ed910 100644
--- a/hmp.c
+++ b/hmp.c
@@ -43,6 +43,7 @@
#include "hw/intc/intc.h"
#include "migration/snapshot.h"
#include "migration/misc.h"
+#include "sysemu/numa.h"
#ifdef CONFIG_SPICE
#include <spice/enums.h>
@@ -2918,3 +2919,25 @@ void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict)
}
hmp_handle_error(mon, &err);
}
+
+void hmp_set_numa_node(Monitor *mon, const QDict *qdict)
+{
+ QemuOpts *opts;
+ Error *err = NULL;
+ MachineState *ms = MACHINE(qdev_get_machine());
+
+ opts = qemu_opts_from_qdict(qemu_find_opts("numa"), qdict, &err);
+ if (err) {
+ goto end;
+ }
+
+ parse_numa(ms, opts, &err);
+ if (err) {
+ goto end;
+ }
+
+end:
+ if (err) {
+ hmp_handle_error(mon, &err);
+ }
+}
--
2.7.4
* [Qemu-devel] [RFC v2 3/4] QMP: add set-numa-node command
2017-12-28 17:22 [Qemu-devel] [RFC v2 0/4] enable numa configuration before machine is running from HMP/QMP Igor Mammedov
2017-12-28 17:22 ` [Qemu-devel] [RFC v2 1/4] numa: split out NumaOptions parsing into parse_NumaOptions() Igor Mammedov
2017-12-28 17:22 ` [Qemu-devel] [RFC v2 2/4] HMP: add set-numa-node command Igor Mammedov
@ 2017-12-28 17:22 ` Igor Mammedov
2017-12-28 17:22 ` [Qemu-devel] [RFC v2 4/4] numa: pc: reset machine if numa config has changed in prelaunch time Igor Mammedov
` (2 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Igor Mammedov @ 2017-12-28 17:22 UTC (permalink / raw)
To: qemu-devel
Cc: eblake, armbru, ehabkost, pkrempa, david, peter.maydell, pbonzini
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
hw/core/machine.c | 1 +
numa.c | 5 +++++
qapi-schema.json | 13 +++++++++++++
3 files changed, 19 insertions(+)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index c857f3f..212dfec 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -21,6 +21,7 @@
#include "qemu/error-report.h"
#include "qemu/cutils.h"
#include "sysemu/qtest.h"
+#include "qmp-commands.h"
static char *machine_get_accel(Object *obj, Error **errp)
{
diff --git a/numa.c b/numa.c
index d157961..fd2bf1c 100644
--- a/numa.c
+++ b/numa.c
@@ -442,6 +442,11 @@ void parse_numa_opts(MachineState *ms)
}
}
+void qmp_set_numa_node(NumaOptions *cmd, Error **errp)
+{
+ parse_NumaOptions(MACHINE(qdev_get_machine()), cmd, errp);
+}
+
void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
{
int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);
diff --git a/qapi-schema.json b/qapi-schema.json
index 5c06745..94ef197 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3189,3 +3189,16 @@
# Since: 2.11
##
{ 'command': 'watchdog-set-action', 'data' : {'action': 'WatchdogAction'} }
+
+##
+# @set-numa-node:
+#
+# Runtime equivalent of '-numa' CLI option, available at
+# preconfigure stage to configure numa mapping before initializing
+# machine.
+#
+# Since: 2.12
+##
+{ 'command': 'set-numa-node', 'boxed': true,
+ 'data': 'NumaOptions'
+}
--
2.7.4
* [Qemu-devel] [RFC v2 4/4] numa: pc: reset machine if numa config has changed in prelaunch time
2017-12-28 17:22 [Qemu-devel] [RFC v2 0/4] enable numa configuration before machine is running from HMP/QMP Igor Mammedov
` (2 preceding siblings ...)
2017-12-28 17:22 ` [Qemu-devel] [RFC v2 3/4] QMP: " Igor Mammedov
@ 2017-12-28 17:22 ` Igor Mammedov
2018-01-03 5:52 ` [Qemu-devel] [RFC v2 0/4] enable numa configuration before machine is running from HMP/QMP David Gibson
2018-01-03 14:17 ` Markus Armbruster
5 siblings, 0 replies; 8+ messages in thread
From: Igor Mammedov @ 2017-12-28 17:22 UTC (permalink / raw)
To: qemu-devel
Cc: eblake, armbru, ehabkost, pkrempa, david, peter.maydell, pbonzini
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
include/hw/boards.h | 1 +
hw/core/machine.c | 2 +-
hw/i386/pc.c | 1 +
numa.c | 14 +++++++++++---
vl.c | 4 ++++
5 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 156b16f..9b3ec6a 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -62,6 +62,7 @@ MachineClass *find_default_machine(void);
extern MachineState *current_machine;
void machine_run_board_init(MachineState *machine);
+void machine_numa_finish_init(MachineState *machine);
bool machine_usb(MachineState *machine);
bool machine_kernel_irqchip_allowed(MachineState *machine);
bool machine_kernel_irqchip_required(MachineState *machine);
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 212dfec..120b7ca 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -701,7 +701,7 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
return g_string_free(s, false);
}
-static void machine_numa_finish_init(MachineState *machine)
+void machine_numa_finish_init(MachineState *machine)
{
int i;
bool default_mapping;
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3fcf318..6c91554 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2244,6 +2244,7 @@ static void pc_machine_reset(void)
CPUState *cs;
X86CPU *cpu;
+ parse_numa_opts(MACHINE(qdev_get_machine()));
qemu_devices_reset();
/* Reset APIC after devices have been reset to cancel
diff --git a/numa.c b/numa.c
index fd2bf1c..83eaaea 100644
--- a/numa.c
+++ b/numa.c
@@ -53,7 +53,7 @@ static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
int nb_numa_nodes;
bool have_numa_distance;
NodeInfo numa_info[MAX_NODES];
-
+static bool numa_inited;
static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
Error **errp)
@@ -173,6 +173,11 @@ void parse_NumaOptions(MachineState *ms, NumaOptions *object, Error **errp)
{
Error *err = NULL;
+ if (numa_inited && runstate_check(RUN_STATE_PRELAUNCH)) {
+ qemu_system_reset_request(SHUTDOWN_CAUSE_HOST_QMP);
+ }
+ numa_inited = false;
+
switch (object->type) {
case NUMA_OPTIONS_TYPE_NODE:
parse_numa_node(ms, &object->u.node, &err);
@@ -352,9 +357,10 @@ void parse_numa_opts(MachineState *ms)
int i;
MachineClass *mc = MACHINE_GET_CLASS(ms);
- if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, ms, NULL)) {
- exit(1);
+ if (numa_inited) {
+ return;
}
+ numa_inited = true;
/*
* If memory hotplug is enabled (slots > 0) but without '-numa'
@@ -439,6 +445,8 @@ void parse_numa_opts(MachineState *ms)
/* Validation succeeded, now fill in any missing distances. */
complete_init_numa_distance();
}
+
+ machine_numa_finish_init(ms);
}
}
diff --git a/vl.c b/vl.c
index d3a5c5d..9e00604 100644
--- a/vl.c
+++ b/vl.c
@@ -4690,6 +4690,10 @@ int main(int argc, char **argv, char **envp)
current_machine->boot_order = boot_order;
current_machine->cpu_model = cpu_model;
+ if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa,
+ current_machine, NULL)) {
+ exit(1);
+ }
parse_numa_opts(current_machine);
/* parse features once if machine provides default cpu_type */
--
2.7.4
* Re: [Qemu-devel] [RFC v2 0/4] enable numa configuration before machine is running from HMP/QMP
2017-12-28 17:22 [Qemu-devel] [RFC v2 0/4] enable numa configuration before machine is running from HMP/QMP Igor Mammedov
` (3 preceding siblings ...)
2017-12-28 17:22 ` [Qemu-devel] [RFC v2 4/4] numa: pc: reset machine if numa config has changed in prelaunch time Igor Mammedov
@ 2018-01-03 5:52 ` David Gibson
2018-01-03 14:17 ` Markus Armbruster
5 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2018-01-03 5:52 UTC (permalink / raw)
To: Igor Mammedov
Cc: qemu-devel, eblake, armbru, ehabkost, pkrempa, peter.maydell,
pbonzini
On Thu, Dec 28, 2017 at 06:22:55PM +0100, Igor Mammedov wrote:
>
> As was suggested at [1] and at the BoF session where we discussed the subject,
> I'm posting a variant with late numa 'configuration', i.e. when QEMU is
> started with the '-S' option in paused state and numa is configured via
> monitor/QMP before machine cpus are allowed to run.
>
> The suggested idea was to try 'late' numa configuration, as it might result in
> a shortcut approach allowing us to reuse the current pause point (-S) versus
> adding another preconfig option with an earlier pause point.
> So this series tries to show how feasible this approach is.
>
> Currently numa options mainly affect only firmware blobs (ACPI/FDT tables),
> so it should have been possible to regenerate those blobs right before we start
> CPUs, which would allow us to set up numa configuration at the first pause
> point and get firmware blobs with updated numa information.
>
> The series implements the idea for x86 and spapr machines and uses machine
> reset to reconfigure firmware and other machine structures after each numa
> configuration command (HMP or QMP).
>
> It was relatively easy to implement for the above machines, as they already
> rebuild firmware blobs at reset time. But it was still a pain, as QEMU isn't
> written with dynamic reconfiguration in mind and one needs to update device
> state with new data (I think I've got it right, but I'm not 100% sure).
>
> However, when it comes to the last target supporting NUMA, ARM,
> all simplification versus v1 goes down the drain, since the FDT blob is built
> incrementally during machine_init(), -device, and machine_done() time, and
> it turns into a huge refactoring to isolate the scattered FDT pieces into
> a single FDT build function (like we do for ACPI). It's a job that we would
> need to do anyway for hotplug to work properly on ARM,
Kind of irrelevant to this series, but I agree. pseries started out
with the FDT being created almost statically at init time, with a few tiny
adjustments later on. But as the platform developed we needed to move
more and more of the FDT generation to later on (reset time, roughly).
For a long time we had an ugly split between the "skeleton" built at
init time and the stuff built at reset time, until I eventually moved
it all to reset time.
I'm pretty sure ARM will want the same thing, for hotplug as you
mention, but also for other things. I also think it'll save effort
overall to do it sooner rather than later.
I had stuff in the works for ages to make DT building easier,
including a full "live" DT model for qemu (fdt is a good format for
passing the DT from one unit to another, but it gets clunky to do lots
of manipulation with it). Unfortunately I've been sufficiently busy
with other things that I haven't really gotten anywhere with that for
the last year or more.
> but I don't think it
> should get in the way of numa refactoring.
> So that was the point where I gave up and decided to post only x86/spapr
> pieces for demo purposes.
Fair enough.
>
> I'm inclined towards avoiding the 'v2 shortcut' and going in the direction of
> v1, as I didn't see v2 as the right way in general, since one would have to:
> - build the machine / connect / initialize devices one way and then find out
> which devices / connections need to be fixed/updated with the new
> configuration; it's very fragile and easy to break.
>
> If I remember the BoF session correctly, the consensus was that we would like
> to have an early configuration interface (like v1) in the end, so I'd rather
> spend time on addressing v1's drawbacks instead of hacking machine init order
> to make numa work in a backwards way.
>
> CC: eblake@redhat.com
> CC: armbru@redhat.com
> CC: ehabkost@redhat.com
> CC: pkrempa@redhat.com
> CC: david@gibson.dropbear.id.au
> CC: peter.maydell@linaro.org
> CC: pbonzini@redhat.com
>
> [1]
> v1 for reference:
> [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> https://lists.nongnu.org/archive/html/qemu-devel/2017-10/msg03583.html
>
> PS:
> The exercise wasn't a waste, as it resulted in cleanups that were already merged.
>
>
> Igor Mammedov (4):
> numa: split out NumaOptions parsing into parse_NumaOptions()
> HMP: add set-numa-node command
> QMP: add set-numa-node command
> numa: pc: reset machine if numa config has changed in prelaunch time
>
> hmp.h | 1 +
> include/hw/boards.h | 1 +
> include/sysemu/numa.h | 1 +
> hmp-commands.hx | 13 +++++++++++
> hmp.c | 23 +++++++++++++++++++
> hw/core/machine.c | 3 ++-
> hw/i386/pc.c | 1 +
> numa.c | 63 +++++++++++++++++++++++++++++++++++----------------
> qapi-schema.json | 13 +++++++++++
> vl.c | 4 ++++
> 10 files changed, 102 insertions(+), 21 deletions(-)
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [Qemu-devel] [RFC v2 0/4] enable numa configuration before machine is running from HMP/QMP
2017-12-28 17:22 [Qemu-devel] [RFC v2 0/4] enable numa configuration before machine is running from HMP/QMP Igor Mammedov
` (4 preceding siblings ...)
2018-01-03 5:52 ` [Qemu-devel] [RFC v2 0/4] enable numa configuration before machine is running from HMP/QMP David Gibson
@ 2018-01-03 14:17 ` Markus Armbruster
2018-01-08 8:35 ` Igor Mammedov
5 siblings, 1 reply; 8+ messages in thread
From: Markus Armbruster @ 2018-01-03 14:17 UTC (permalink / raw)
To: Igor Mammedov
Cc: qemu-devel, peter.maydell, pkrempa, ehabkost, pbonzini, david
Igor Mammedov <imammedo@redhat.com> writes:
> As was suggested at [1] and at the BoF session where we discussed the subject,
> I'm posting a variant with late numa 'configuration', i.e. when QEMU is
> started with the '-S' option in paused state and numa is configured via
> monitor/QMP before machine cpus are allowed to run.
>
> The suggested idea was to try 'late' numa configuration, as it might result in
> a shortcut approach allowing us to reuse the current pause point (-S) versus
> adding another preconfig option with an earlier pause point.
> So this series tries to show how feasible this approach is.
>
> Currently numa options mainly affect only firmware blobs (ACPI/FDT tables),
> so it should have been possible to regenerate those blobs right before we start
> CPUs, which would allow us to set up numa configuration at the first pause
> point and get firmware blobs with updated numa information.
>
> The series implements the idea for x86 and spapr machines and uses machine
> reset to reconfigure firmware and other machine structures after each numa
> configuration command (HMP or QMP).
>
> It was relatively easy to implement for the above machines, as they already
> rebuild firmware blobs at reset time. But it was still a pain, as QEMU isn't
> written with dynamic reconfiguration in mind and one needs to update device
> state with new data (I think I've got it right, but I'm not 100% sure).
>
> However, when it comes to the last target supporting NUMA, ARM,
> all simplification versus v1 goes down the drain, since the FDT blob is built
> incrementally during machine_init(), -device, and machine_done() time, and
> it turns into a huge refactoring to isolate the scattered FDT pieces into
> a single FDT build function (like we do for ACPI). It's a job that we would
> need to do anyway for hotplug to work properly on ARM, but I don't think it
> should get in the way of numa refactoring.
> So that was the point where I gave up and decided to post only x86/spapr
> pieces for demo purposes.
>
> I'm inclined towards avoiding the 'v2 shortcut' and going in the direction of
> v1, as I didn't see v2 as the right way in general, since one would have to:
> - build the machine / connect / initialize devices one way and then find out
> which devices / connections need to be fixed/updated with the new
> configuration; it's very fragile and easy to break.
>
> If I remember the BoF session correctly, the consensus was that we would like
> to have an early configuration interface (like v1) in the end, so I'd rather
> spend time on addressing v1's drawbacks instead of hacking machine init order
> to make numa work in a backwards way.
It's been a while... Can you summarize v1 and its drawbacks?
> CC: eblake@redhat.com
> CC: armbru@redhat.com
> CC: ehabkost@redhat.com
> CC: pkrempa@redhat.com
> CC: david@gibson.dropbear.id.au
> CC: peter.maydell@linaro.org
> CC: pbonzini@redhat.com
>
> [1]
> v1 for reference:
> [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> https://lists.nongnu.org/archive/html/qemu-devel/2017-10/msg03583.html
>
> PS:
> The exercise wasn't a waste, as it resulted in cleanups that were already merged.
Good :)
* Re: [Qemu-devel] [RFC v2 0/4] enable numa configuration before machine is running from HMP/QMP
2018-01-03 14:17 ` Markus Armbruster
@ 2018-01-08 8:35 ` Igor Mammedov
0 siblings, 0 replies; 8+ messages in thread
From: Igor Mammedov @ 2018-01-08 8:35 UTC (permalink / raw)
To: Markus Armbruster
Cc: peter.maydell, pkrempa, ehabkost, qemu-devel, pbonzini, david
On Wed, 03 Jan 2018 15:17:49 +0100
Markus Armbruster <armbru@redhat.com> wrote:
> Igor Mammedov <imammedo@redhat.com> writes:
>
> > As was suggested at [1] and at the BoF session where we discussed the subject,
> > I'm posting a variant with late numa 'configuration', i.e. when QEMU is
> > started with the '-S' option in paused state and numa is configured via
> > monitor/QMP before machine cpus are allowed to run.
> >
> > The suggested idea was to try 'late' numa configuration, as it might result in
> > a shortcut approach allowing us to reuse the current pause point (-S) versus
> > adding another preconfig option with an earlier pause point.
> > So this series tries to show how feasible this approach is.
> >
> > Currently numa options mainly affect only firmware blobs (ACPI/FDT tables),
> > so it should have been possible to regenerate those blobs right before we start
> > CPUs, which would allow us to set up numa configuration at the first pause
> > point and get firmware blobs with updated numa information.
> >
> > The series implements the idea for x86 and spapr machines and uses machine
> > reset to reconfigure firmware and other machine structures after each numa
> > configuration command (HMP or QMP).
> >
> > It was relatively easy to implement for the above machines, as they already
> > rebuild firmware blobs at reset time. But it was still a pain, as QEMU isn't
> > written with dynamic reconfiguration in mind and one needs to update device
> > state with new data (I think I've got it right, but I'm not 100% sure).
> >
> > However, when it comes to the last target supporting NUMA, ARM,
> > all simplification versus v1 goes down the drain, since the FDT blob is built
> > incrementally during machine_init(), -device, and machine_done() time, and
> > it turns into a huge refactoring to isolate the scattered FDT pieces into
> > a single FDT build function (like we do for ACPI). It's a job that we would
> > need to do anyway for hotplug to work properly on ARM, but I don't think it
> > should get in the way of numa refactoring.
> > So that was the point where I gave up and decided to post only x86/spapr
> > pieces for demo purposes.
> >
> > I'm inclined towards avoiding the 'v2 shortcut' and going in the direction of
> > v1, as I didn't see v2 as the right way in general, since one would have to:
> > - build the machine / connect / initialize devices one way and then find out
> > which devices / connections need to be fixed/updated with the new
> > configuration; it's very fragile and easy to break.
> >
> > If I remember the BoF session correctly, the consensus was that we would like
> > to have an early configuration interface (like v1) in the end, so I'd rather
> > spend time on addressing v1's drawbacks instead of hacking machine init order
> > to make numa work in a backwards way.
>
> It's been a while... Can you summarize v1 and its drawbacks?
[...]
The goal of v1 and of this series is to provide a way to configure NUMA
mappings before the guest starts to run. For this we need to map
possible CPUs to NUMA nodes. The list of possible CPUs, their
address properties (socket|core|thread-ids) and the
corresponding values are a function of the (-M + -smp) options
and can currently be fetched with query-hotpluggable-cpus.
This series demos a way where the mapping is done at '-S' pause time
(right before CPUs start running), whereas v1 did it before
calling mc->machine_init() but after -M and -smp had already been
processed.
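
To make the mapping problem concrete, here is a minimal, self-contained
C sketch. It is not QEMU code: CpuSlot, enumerate_possible_cpus(),
assign_socket_to_node() and count_unassigned() are hypothetical names,
and a plain sockets x cores x threads topology is assumed as a stand-in
for what -M and -smp would determine. In QEMU the real list comes from
the board and is what query-hotpluggable-cpus reports.

```c
#include <assert.h>
#include <stddef.h>

#define NODE_UNASSIGNED (-1)

/* Hypothetical stand-in for one entry of the possible-CPUs list. */
typedef struct CpuSlot {
    int socket_id;
    int core_id;
    int thread_id;
    int node_id;    /* NUMA node, or NODE_UNASSIGNED */
} CpuSlot;

/*
 * Fill 'slots' with every possible CPU of the assumed topology.
 * Returns the number of slots written, or 0 if the topology is
 * invalid or does not fit into 'max' entries.
 */
size_t enumerate_possible_cpus(CpuSlot *slots, size_t max,
                               int sockets, int cores, int threads)
{
    size_t n = 0;

    if (sockets <= 0 || cores <= 0 || threads <= 0) {
        return 0;
    }
    if ((size_t)sockets * (size_t)cores * (size_t)threads > max) {
        return 0;
    }
    for (int s = 0; s < sockets; s++) {
        for (int c = 0; c < cores; c++) {
            for (int t = 0; t < threads; t++) {
                slots[n++] = (CpuSlot){ s, c, t, NODE_UNASSIGNED };
            }
        }
    }
    assert(n <= max);
    return n;
}

/* Map all CPUs of one socket to a node; returns how many were updated. */
size_t assign_socket_to_node(CpuSlot *slots, size_t n,
                             int socket_id, int node_id)
{
    size_t updated = 0;

    for (size_t i = 0; i < n; i++) {
        if (slots[i].socket_id == socket_id) {
            slots[i].node_id = node_id;
            updated++;
        }
    }
    return updated;
}

/* Number of possible CPUs that still have no NUMA node. */
size_t count_unassigned(const CpuSlot *slots, size_t n)
{
    size_t missing = 0;

    for (size_t i = 0; i < n; i++) {
        if (slots[i].node_id == NODE_UNASSIGNED) {
            missing++;
        }
    }
    return missing;
}
```

The point of the sketch is the ordering constraint: the slot list can
only be enumerated once the topology is known, and every slot should
have a node before anything that consumes the mapping is built.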
v1 added a new '-paused [state=]postconf|preconf' CLI option,
where:
- postconf: equivalent of the '-S' option, pausing QEMU after
  machine_done and right before CPUs start to run
- preconf: a new paused state for QEMU, right before the board-specific
  machine_init callback is run by machine_run_board_init()
The new 'preconf' state would allow defining the NUMA mapping early
using the query-hotpluggable-cpus/set-numa-node commands, so that
board code has all the necessary data when the machine is built
during the machine_init => devices init => machine_done stages,
without the need to refactor board code to fix up improperly
configured state later, as the v2 series does.
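
As an illustration of the two pause points, here is a small C sketch
of how the argument of the proposed '-paused [state=]postconf|preconf'
option could be parsed. It is an assumption-laden sketch, not the v1
implementation: PausePoint, parse_paused_arg() and
needs_rebuild_after_numa_change() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the two pause points described above. */
typedef enum PausePoint {
    PAUSE_PRECONF,   /* before the board's machine_init callback runs */
    PAUSE_POSTCONF,  /* after machine_done, before CPUs run (like -S) */
} PausePoint;

/*
 * Parse "preconf", "postconf", or the same with a "state=" prefix.
 * Returns true and stores the result in *out on success.
 */
bool parse_paused_arg(const char *arg, PausePoint *out)
{
    static const char prefix[] = "state=";

    assert(out != NULL);
    if (arg == NULL) {
        return false;
    }
    if (strncmp(arg, prefix, sizeof(prefix) - 1) == 0) {
        arg += sizeof(prefix) - 1;   /* skip the optional key */
    }
    if (strcmp(arg, "preconf") == 0) {
        *out = PAUSE_PRECONF;
        return true;
    }
    if (strcmp(arg, "postconf") == 0) {
        *out = PAUSE_POSTCONF;
        return true;
    }
    return false;
}

/*
 * At 'postconf' the machine has already been built, so a NUMA change
 * means already-built state has to be regenerated (the v2 approach).
 * At 'preconf' nothing has been built yet.
 */
bool needs_rebuild_after_numa_change(PausePoint p)
{
    return p == PAUSE_POSTCONF;
}
```

The second helper only encodes the trade-off in one place: configuring
at 'preconf' needs no later fixups, while configuring at 'postconf'
does.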
About drawbacks:
- users would need to add handling for the new option
- there is a new QEMU state to deal with, accessible to users via
  QMP/HMP while the machine is not yet initialized
- v1 blindly exposes all QMP commands at the pause point,
  and most of them won't work or will crash QEMU.
  I considered adding early/late white/black lists,
  but that's not really maintainable. It would be
  better if there were a way to specify directly in the
  QAPI schema at which stage commands are allowed to run,
  so it would be introspectable.
- dynamic configuration might not be usable/desirable for
  one-time guests (guestfish, virt-sandbox), as it might add
  to startup delay. But honestly, such use cases can continue
  using the pure CLI; we are not removing the CLI after all.
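
The idea of declaring, per command, at which stage it may run can be
sketched as a table lookup. This is only an illustration under stated
assumptions: the PHASE_* flags, CommandInfo, command_table and
check_command() are hypothetical, and the phase assigned to each
command below is made up for the example rather than taken from QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical startup phases, usable as a bitmask. */
enum {
    PHASE_PRECONF   = 1 << 0,  /* machine not yet initialized */
    PHASE_PRELAUNCH = 1 << 1,  /* machine built, CPUs not started */
    PHASE_RUNNING   = 1 << 2,
};

typedef struct CommandInfo {
    const char *name;
    unsigned allowed_phases;   /* would come from the schema */
} CommandInfo;

/* Illustrative policy only, not QEMU's actual one. */
static const CommandInfo command_table[] = {
    { "query-hotpluggable-cpus",
      PHASE_PRECONF | PHASE_PRELAUNCH | PHASE_RUNNING },
    { "set-numa-node", PHASE_PRECONF },
    { "cont", PHASE_PRELAUNCH | PHASE_RUNNING },
};

typedef enum CheckResult {
    CMD_OK,
    CMD_UNKNOWN,
    CMD_WRONG_PHASE,
} CheckResult;

/*
 * Decide whether 'name' may run in 'current_phase'. A dispatcher
 * would call this first and return a clean error instead of letting
 * the command crash on uninitialized state.
 */
CheckResult check_command(const char *name, unsigned current_phase)
{
    size_t count = sizeof(command_table) / sizeof(command_table[0]);

    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(command_table[i].name, name) == 0) {
            return (command_table[i].allowed_phases & current_phase)
                   ? CMD_OK : CMD_WRONG_PHASE;
        }
    }
    return CMD_UNKNOWN;
}
```

Keeping the allowed phases next to the command definition avoids
separately maintained white/black lists, and the same data could be
exposed through introspection.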
A bunch of ideas were discussed/suggested during v1:
- use the preconfig stage for other commands as well,
  including the ability to pick a machine and configure it
  step by step using QMP.
  It would be a large, complex rework and could probably be
  done incrementally, opening refactored QMP commands to the
  preconfig stage.
  So the questions here would be:
  - is it possible to move the 'preconfig' pause point to an
    earlier point later on without breaking the newly introduced
    set-numa-node command and query-hotpluggable-cpus?
    As a shortcut, they could check for machine existence
    and cleanly error out, saying that the machine should be
    created first.
  - provide a stable interface that would work even if we
    move the 'preconfig' pause point to earlier stages.
    Maybe it's possible to add a command like:
      set-cli-option ....
    instead of specialized ones like I did with 'set-numa-node'
- provide some sort of command dependency checks so that
  commands error out cleanly when QEMU is not in
  the state they expect it to be in.
- I'm omitting Daniel's suggestion to drop
  configuration at runtime altogether and use a fixed set
  of properties/values to specify CPU addresses/slots,
  so that libvirt could make up the CLI on its own without
  introspecting QEMU first.
> > v1 for reference:
> > [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > https://lists.nongnu.org/archive/html/qemu-devel/2017-10/msg03583.html
[...]