From: David Gibson <david@gibson.dropbear.id.au>
To: Greg Kurz <groug@kaod.org>
Cc: qemu-devel@nongnu.org, qemu-ppc@nongnu.org,
Cedric Le Goater <clg@kaod.org>
Subject: Re: [Qemu-devel] [PATCH v3 5/5] spapr: fix migration of ICPState objects from/to older QEMU
Date: Mon, 12 Jun 2017 22:24:56 +0800
Message-ID: <20170612142456.GJ18542@umbus>
In-Reply-To: <20170608115410.2e7a2511@bahia.ttt.fr.ibm.com>
On Thu, Jun 08, 2017 at 11:54:10AM +0200, Greg Kurz wrote:
> On Thu, 8 Jun 2017 14:08:57 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > On Wed, Jun 07, 2017 at 07:17:26PM +0200, Greg Kurz wrote:
> > > Commit 5bc8d26de20c ("spapr: allocate the ICPState object from under
> > > sPAPRCPUCore") moved ICPState objects from the machine to CPU cores.
> > > This is an improvement since we no longer allocate ICPState objects
> > > that will never be used. But it has the side-effect of breaking
> > > migration of older machine types from older QEMU versions.
> > >
> > > This patch allows spapr to register dummy "icp/server" entries to vmstate.
> > > These entries use a dedicated VMStateDescription that can swallow and
> > > discard state of an incoming migration stream, and that doesn't send anything
> > > on outgoing migration.
> > >
> > > As for real ICPState objects, the instance_id is the cpu_index of the
> > > corresponding vCPU, which happens to be equal to the generated instance_id
> > > of older machine types.
> > >
> > > The machine can unregister/register these entries when CPUs are dynamically
> > > plugged/unplugged.
> > >
> > > This is only available for pseries-2.9 and older machines, thanks to a
> > > compat property.
> > >
> > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > ---
> > > v3: - new logic entirely implemented in hw/ppc/spapr.c
> > > ---
> > > hw/ppc/spapr.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++-
> > > include/hw/ppc/spapr.h | 2 +
> > > 2 files changed, 88 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 9b7ae28939a8..c15b604978f0 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -124,9 +124,52 @@ error:
> > > return NULL;
> > > }
> > >
> > > +static bool pre_2_10_vmstate_dummy_icp_needed(void *opaque)
> > > +{
> > > + return false;
> > > +}
> >
> > Uh.. the needed function always returns false, how can that work?
> >
>
> The needed function is used for outgoing migration only:
>
> bool vmstate_save_needed(const VMStateDescription *vmsd, void *opaque)
> {
> if (vmsd->needed && !vmsd->needed(opaque)) {
> /* optional section not needed */
> return false;
> }
> return true;
> }
>
> The idea is that all ICPState objects that were created but not associated
> with a vCPU by pre-2.10 machine types don't need to be migrated at all, as
> their state hasn't changed.
>
> We don't even create these unneeded ICPState objects here, but simply
> register their ids to vmstate.
>
> > > +
> > > +static const VMStateDescription pre_2_10_vmstate_dummy_icp = {
> > > + .name = "icp/server",
> > > + .version_id = 1,
> > > + .minimum_version_id = 1,
> > > + .needed = pre_2_10_vmstate_dummy_icp_needed,
>
> Outgoing migration:
> - machines in older QEMU have unused ICPState objects (default state)
> - machine in QEMU 2.10 doesn't even have extra ICPState objects
>
> => don't send anything
>
> > > + .fields = (VMStateField[]) {
> > > + VMSTATE_UNUSED(4), /* uint32_t xirr */
> > > + VMSTATE_UNUSED(1), /* uint8_t pending_priority */
> > > + VMSTATE_UNUSED(1), /* uint8_t mfrr */
>
> Incoming migration from older QEMU: we don't have the extra ICPState objects.
>
> => accept the state and discard it
>
> > > + VMSTATE_END_OF_LIST()
> > > + },
> > > +};
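For reference, the asymmetry described above (the needed hook gates the save side only, while the load side simply consumes whatever arrives) can be modelled in a few lines of standalone C. This is a toy sketch, not QEMU code: ToyVMSD, toy_save_needed(), toy_load_skip() and toy_dummy_icp are hypothetical names, and the 4 + 1 + 1 byte count is taken from the VMSTATE_UNUSED() fields in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a VMStateDescription (hypothetical, not the QEMU type). */
typedef struct ToyVMSD {
    const char *name;
    bool (*needed)(void *opaque);  /* consulted on the save path only */
    size_t unused_bytes;           /* bytes to swallow on the load path */
} ToyVMSD;

/* Same decision as the vmstate_save_needed() snippet quoted above. */
static bool toy_save_needed(const ToyVMSD *vmsd, void *opaque)
{
    if (vmsd->needed && !vmsd->needed(opaque)) {
        return false;              /* section is skipped on outgoing migration */
    }
    return true;
}

/* Load path: ignores .needed and just advances past the unused fields. */
static size_t toy_load_skip(const ToyVMSD *vmsd, size_t stream_pos)
{
    return stream_pos + vmsd->unused_bytes;
}

static bool toy_dummy_icp_needed(void *opaque)
{
    (void)opaque;
    return false;                  /* never send a dummy ICP */
}

static const ToyVMSD toy_dummy_icp = {
    .name = "icp/server",
    .needed = toy_dummy_icp_needed,
    .unused_bytes = 4 + 1 + 1,     /* xirr, pending_priority, mfrr */
};
```

With this model, toy_save_needed(&toy_dummy_icp, NULL) is false, while toy_load_skip(&toy_dummy_icp, pos) still advances the stream position by six bytes.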
> > > +
> > > +static void pre_2_10_vmstate_register_dummy_icp(sPAPRMachineState *spapr, int i)
> > > +{
> > > + bool *flag = &spapr->pre_2_10_ignore_icp[i];
> > > +
> > > + g_assert(!*flag);
> >
> > Apart from this assert(), you never seem to test the values in the
> > pre_2_10_ignore_icp array, so it seems a bit pointless.
> >
>
> There's the opposite check in pre_2_10_vmstate_unregister_dummy_icp().
> But I agree it isn't really useful... it is there more out of paranoia :)

I'm all for paranoid assert()s if they can be made using data readily
to hand. Adding a data structure just for the purpose of making an
assert() later, not so much.

> > > + vmstate_register(NULL, i, &pre_2_10_vmstate_dummy_icp, flag);
> > > + *flag = true;
> > > +}
> > > +
> > > +static void pre_2_10_vmstate_unregister_dummy_icp(sPAPRMachineState *spapr,
> > > + int i)
> > > +{
> > > + bool *flag = &spapr->pre_2_10_ignore_icp[i];
> > > +
> > > + g_assert(*flag);
> > > + vmstate_unregister(NULL, &pre_2_10_vmstate_dummy_icp, flag);
> > > + *flag = false;
> > > +}
> > > +
> > > +static inline int xics_nr_servers(void)
> >
> > Maybe a different name to emphasise that this is only used for the
> > backwards compat logic.
> >
>
> It is also used to compute the "ibm,interrupt-server-ranges" DT prop.
>
> /* /interrupt controller */
> spapr_dt_xics(xics_nr_servers(), fdt, PHANDLE_XICP);

Ah, good point. Maybe rename to "max server number" or something,
since "nr_servers" isn't really accurate any more.

> > > +{
> > > + return DIV_ROUND_UP(max_cpus * kvmppc_smt_threads(), smp_threads);
> > > +}
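To make the "max server number" reading concrete, here is a minimal standalone sketch of the same arithmetic. The name xics_max_server_number() is only one possible spelling of the rename, and the three parameters are an assumption for illustration: they stand in for max_cpus, kvmppc_smt_threads() and smp_threads, which the patch reads directly rather than passing in. DIV_ROUND_UP is redefined locally so the snippet compiles on its own.

```c
#include <assert.h>

/* Local copy of the usual round-up division helper. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Upper bound on interrupt server numbers, not a count of ICPs in use.
 * Parameters are hypothetical stand-ins for max_cpus,
 * kvmppc_smt_threads() and smp_threads.
 */
static int xics_max_server_number(int max_cpus, int host_smt, int guest_threads)
{
    return DIV_ROUND_UP(max_cpus * host_smt, guest_threads);
}
```

For example, 16 maximum vCPUs on an SMT8 host with 4 guest threads per core gives 16 * 8 / 4 = 32 server numbers, even though at most 16 of them can ever be backed by a real ICPState.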
> > > +
> > > static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
> > > {
> > > sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> > > + sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
> > >
> > > if (kvm_enabled()) {
> > > if (machine_kernel_irqchip_allowed(machine) &&
> > > @@ -148,6 +191,15 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
> > > return;
> > > }
> > > }
> > > +
> > > + if (smc->pre_2_10_has_unused_icps) {
> > > + int i;
> > > +
> > > + spapr->pre_2_10_ignore_icp = g_malloc(xics_nr_servers());
> > > + for (i = 0; i < xics_nr_servers(); i++) {
> > > + pre_2_10_vmstate_register_dummy_icp(spapr, i);
> >
> > This registers a dummy ICP for every slot, some of which will have
> > real cpus / icps. That doesn't seem right.
> >
>
> This is initialization, before we even have actual CPUs. We start with
> dummy ICPs for every slot, but they get replaced by real ICPs when we
> plug CPU cores...... (see below)
>
> > > + }
> > > + }
> > > }
> > >
> > > static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
> > > @@ -976,7 +1028,6 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
> > > void *fdt;
> > > sPAPRPHBState *phb;
> > > char *buf;
> > > - int smt = kvmppc_smt_threads();
> > >
> > > fdt = g_malloc0(FDT_MAX_SIZE);
> > > _FDT((fdt_create_empty_tree(fdt, FDT_MAX_SIZE)));
> > > @@ -1016,7 +1067,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
> > > _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
> > >
> > > /* /interrupt controller */
> > > - spapr_dt_xics(DIV_ROUND_UP(max_cpus * smt, smp_threads), fdt, PHANDLE_XICP);
> > > + spapr_dt_xics(xics_nr_servers(), fdt, PHANDLE_XICP);
> > >
> > > ret = spapr_populate_memory(spapr, fdt);
> > > if (ret < 0) {
> > > @@ -2800,9 +2851,24 @@ static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > > Error **errp)
> > > {
> > > MachineState *ms = MACHINE(qdev_get_machine());
> > > + sPAPRMachineState *spapr = SPAPR_MACHINE(ms);
> > > CPUCore *cc = CPU_CORE(dev);
> > > CPUArchId *core_slot = spapr_find_cpu_slot(ms, cc->core_id, NULL);
> > >
> > > + if (spapr->pre_2_10_ignore_icp) {
> > > + sPAPRCPUCore *sc = SPAPR_CPU_CORE(OBJECT(dev));
> > > + sPAPRCPUCoreClass *scc = SPAPR_CPU_CORE_GET_CLASS(OBJECT(cc));
> > > + const char *typename = object_class_get_name(scc->cpu_class);
> > > + size_t size = object_type_get_instance_size(typename);
> > > + int i;
> > > +
> > > + for (i = 0; i < cc->nr_threads; i++) {
> > > + CPUState *cs = CPU(sc->threads + i * size);
> > > +
> > > + pre_2_10_vmstate_register_dummy_icp(spapr, cs->cpu_index);
> > > + }
> > > + }
> > > +
> > > assert(core_slot);
> > > core_slot->cpu = NULL;
> > > object_unparent(OBJECT(dev));
> > > @@ -2912,6 +2978,21 @@ static void spapr_core_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > > }
> > > }
> > > core_slot->cpu = OBJECT(dev);
> > > +
> > > + if (spapr->pre_2_10_ignore_icp) {
> > > + sPAPRCPUCoreClass *scc = SPAPR_CPU_CORE_GET_CLASS(OBJECT(cc));
> > > + const char *typename = object_class_get_name(scc->cpu_class);
> > > + size_t size = object_type_get_instance_size(typename);
> > > + int i;
> > > +
> > > + for (i = 0; i < cc->nr_threads; i++) {
> > > + sPAPRCPUCore *sc = SPAPR_CPU_CORE(dev);
> > > + void *obj = sc->threads + i * size;
> > > +
> > > + cs = CPU(obj);
> > > + pre_2_10_vmstate_unregister_dummy_icp(spapr, cs->cpu_index);
>
> ...... here.
>
> The opposite happens in spapr_core_unplug().
>
> > > + }
> > > + }
> > > }
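The register/unregister lifecycle described above can be sketched as a small standalone model: every slot starts out as a dummy, plugging a core drops the dummies for its threads, and unplugging puts them back. Everything here is hypothetical (the toy_ names, the fixed slot count of 8, and the simplification that the threads of a core have contiguous cpu_index values); it only tracks the flags, not any vmstate registration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_SERVERS 8  /* hypothetical number of slots */

/* true while a dummy "icp/server" entry is registered for the slot */
static bool toy_dummy_registered[TOY_NR_SERVERS];

static void toy_register_dummy(int i)
{
    assert(!toy_dummy_registered[i]);
    toy_dummy_registered[i] = true;
}

static void toy_unregister_dummy(int i)
{
    assert(toy_dummy_registered[i]);
    toy_dummy_registered[i] = false;
}

/* Machine init: no CPUs yet, so every slot gets a dummy entry. */
static void toy_machine_init(void)
{
    for (int i = 0; i < TOY_NR_SERVERS; i++) {
        toy_register_dummy(i);
    }
}

/* Core plug: real ICPs take over, so the dummies for its threads go away. */
static void toy_core_plug(int first_cpu_index, int nr_threads)
{
    for (int i = 0; i < nr_threads; i++) {
        toy_unregister_dummy(first_cpu_index + i);
    }
}

/* Core unplug: the opposite, dummies come back for the freed slots. */
static void toy_core_unplug(int first_cpu_index, int nr_threads)
{
    for (int i = 0; i < nr_threads; i++) {
        toy_register_dummy(first_cpu_index + i);
    }
}

static int toy_count_dummies(void)
{
    int n = 0;
    for (int i = 0; i < TOY_NR_SERVERS; i++) {
        n += toy_dummy_registered[i];
    }
    return n;
}
```

In this model the flag array is the only state. That matches the objection earlier in the thread: in the patch it is consulted only by the assertions.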
> > >
> > > static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > > @@ -3361,9 +3442,12 @@ static void spapr_machine_2_9_instance_options(MachineState *machine)
> > >
> > > static void spapr_machine_2_9_class_options(MachineClass *mc)
> > > {
> > > + sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
> > > +
> > > spapr_machine_2_10_class_options(mc);
> > > SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9);
> > > mc->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
> > > + smc->pre_2_10_has_unused_icps = true;
> > > }
> > >
> > > DEFINE_SPAPR_MACHINE(2_9, "2.9", false);
> > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > index f973b0284596..64382623199d 100644
> > > --- a/include/hw/ppc/spapr.h
> > > +++ b/include/hw/ppc/spapr.h
> > > @@ -53,6 +53,7 @@ struct sPAPRMachineClass {
> > > bool dr_lmb_enabled; /* enable dynamic-reconfig/hotplug of LMBs */
> > > bool use_ohci_by_default; /* use USB-OHCI instead of XHCI */
> > > const char *tcg_default_cpu; /* which (TCG) CPU to simulate by default */
> > > + bool pre_2_10_has_unused_icps;
> > > void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
> > > uint64_t *buid, hwaddr *pio,
> > > hwaddr *mmio32, hwaddr *mmio64,
> > > @@ -90,6 +91,7 @@ struct sPAPRMachineState {
> > > sPAPROptionVector *ov5_cas; /* negotiated (via CAS) option vectors */
> > > bool cas_reboot;
> > > bool cas_legacy_guest_workaround;
> > > + bool *pre_2_10_ignore_icp;
> > >
> > > Notifier epow_notifier;
> > > QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
> > >
> >
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson