From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: David Gibson <david@gibson.dropbear.id.au>,
lvivier@redhat.com, aik@ozlabs.ru, Greg Kurz <groug@kaod.org>,
qemu-devel@nongnu.org, qemu-ppc@nongnu.org
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/4] ppc: add CPU IRQ state to PPC VMStateDescription
Date: Mon, 11 Sep 2017 18:19:54 +0100 [thread overview]
Message-ID: <20170911171953.GE2150@work-vm> (raw)
In-Reply-To: <cc212d72-1eff-86ae-1e34-fdc1636015e5@ilande.co.uk>
* Mark Cave-Ayland (mark.cave-ayland@ilande.co.uk) wrote:
> On 11/09/17 11:48, David Gibson wrote:
>
> > On Mon, Sep 11, 2017 at 10:30:33AM +0100, Dr. David Alan Gilbert wrote:
> >> * Greg Kurz (groug@kaod.org) wrote:
> >>> On Sun, 10 Sep 2017 15:37:33 +0100
> >>> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> wrote:
> >>>
> >>>> Commit a90db15 "target-ppc: Convert ppc cpu savevm to VMStateDescription"
> >>>> appears to drop the internal CPU IRQ state from the migration stream. Whilst
> >>>> testing migration on g3beige/mac99 machines, test images would randomly fail to
> >>>> resume unless a key was pressed on the VGA console.
> >>>>
> >>>> Further investigation suggests that internal CPU IRQ state isn't being
> >>>> preserved and so interrupts asserted at the time of migration are lost. Adding
> >>>> the pending_interrupts and irq_input_state fields back into the migration
> >>>> stream appears to fix the problem here during local tests.
> >>>>
> >>>> As part of this commit we bump the vmstate_ppc version from 5 to 6 to handle
> >>>> the additional fields.
> >>>>
> >>>
> >>> And so this unconditionally breaks backward migration... what about adding
> >>> a subsection for this ?
> >>
> >> and wiring it to a flag on the machine type so that older machine types
> >> don't send it.
> >
> > Right, a subsection is certainly necessary to avoid breaking backwards
> > migration.
>
> The suggestion of using the VMSTATE_*_V macros with an increased version
> number came from Alexey's original review of the patch many months ago
> which is why I did it that way.
>
> Out of curiosity though, what is the criteria for supporting backwards
> migration? Obviously forward migration is supported as-is in this
> manner, so what determines if a patch needs to be backwards compatible
> and how far?
It's a bit fuzzy. Downstream we do backwards migration between various
versions - and those versions are pretty arbitrarily chosen.
Generally I prefer if we don't break that upstream either, although
it isn't tested much.
If it was in code that was specific to your g3beige I wouldn't mind;
but for ppc in general then if it breaks the server migration it'll
be a pain we'd have to then fix. Best to keep it working upstream.
But it's fairly easy to put new fields in a subsection and tie it
to a property; that makes it easy to switch it on/off in machine
types.
> > But apart from that I want to understand better exactly why this is
> > necessary. What's the state that's being lost, and is it really not
> > recoverable from anywhere else.
>
> The test case I have is installing Darwin PPC 6.02 with qemu-system-ppc
> TCG and repeatedly pausing, executing "savevm foo", then quitting and
> continuing with "-loadvm foo" during the installation phase. About 1 in
> 10 times the installer hangs after the loadvm until I press a key, at
> which point it carries on as normal.
>
> I then proceeded to going backwards through the git history until I
> found out that it was the removal of the pending_interrupts,
> irq_input_state and access_type fields during the conversion to
> VMStateDescription commit a90db15 which seemed to cause the problem.
>
> > The other thing that concerns me is how we're encoding the
> > information. These are essentially internal fields, not reflecting
> > something with an architected encoding - adding those to the migration
> > stream is often a bad idea - it inhibits our ability to rework
> > internal encodings.
>
> I'm not sure how this should be managed, however there was a similar
> issue with excp_prefix (which was also removed in a90db15) that was
> fixed in 2360b6e by calling a helper in cpu_post_load(). I can't easily
> see how could work with env.pending_interrupts and env.irq_input_state
> though.
Without knowing anything about this hardware... generally the migration
stream should reflect the real state of the system rather than internal
implementation detail, that way if you change the implementation you
don't need to fudge the state. Having said that, there's generally
some internal state that's perhaps not immediately obvious from specs
until you try and implement it.
Dave
> ATB,
>
> Mark.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
next prev parent reply other threads:[~2017-09-11 17:20 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-10 14:37 [Qemu-devel] [PATCH 0/4] ppc: migration fixes for TCG Mark Cave-Ayland
2017-09-10 14:37 ` [Qemu-devel] [PATCH 1/4] ppc: change CPUPPCState access_type from int to uint8_t Mark Cave-Ayland
2017-09-10 16:30 ` Laurent Vivier
2017-09-10 18:00 ` Mark Cave-Ayland
2017-09-10 14:37 ` [Qemu-devel] [PATCH 2/4] ppc: add CPU IRQ state to PPC VMStateDescription Mark Cave-Ayland
2017-09-11 7:50 ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2017-09-11 9:30 ` Dr. David Alan Gilbert
2017-09-11 10:48 ` David Gibson
2017-09-11 16:46 ` Mark Cave-Ayland
2017-09-11 17:19 ` Dr. David Alan Gilbert [this message]
2017-09-13 7:03 ` David Gibson
2017-09-12 16:21 ` Dr. David Alan Gilbert
2017-09-12 16:41 ` Mark Cave-Ayland
2017-09-12 16:46 ` Mark Cave-Ayland
2017-09-13 2:23 ` Alexey Kardashevskiy
2017-09-13 6:02 ` David Gibson
2017-09-13 16:44 ` Mark Cave-Ayland
2017-09-13 17:13 ` Greg Kurz
2017-09-14 3:48 ` David Gibson
2017-09-14 3:30 ` David Gibson
2017-09-10 14:37 ` [Qemu-devel] [PATCH 3/4] ppc: add CPU access_type into the migration stream Mark Cave-Ayland
2017-09-11 10:57 ` David Gibson
2017-09-11 16:52 ` Mark Cave-Ayland
2017-09-13 7:19 ` David Gibson
2017-09-13 17:17 ` Mark Cave-Ayland
2017-09-14 3:54 ` David Gibson
2017-09-10 14:37 ` [Qemu-devel] [PATCH 4/4] ppc: ensure we update the decrementer value during migration Mark Cave-Ayland
2017-09-13 7:12 ` David Gibson
2017-09-13 17:11 ` Mark Cave-Ayland
2017-09-13 17:58 ` Laurent Vivier
2017-09-14 3:52 ` David Gibson
2017-09-15 12:45 ` Mark Cave-Ayland
2017-09-19 8:36 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170911171953.GE2150@work-vm \
--to=dgilbert@redhat.com \
--cc=aik@ozlabs.ru \
--cc=david@gibson.dropbear.id.au \
--cc=groug@kaod.org \
--cc=lvivier@redhat.com \
--cc=mark.cave-ayland@ilande.co.uk \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).