qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>,
	lvivier@redhat.com, aik@ozlabs.ru, Greg Kurz <groug@kaod.org>,
	qemu-devel@nongnu.org, qemu-ppc@nongnu.org
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/4] ppc: add CPU IRQ state to PPC VMStateDescription
Date: Wed, 13 Sep 2017 17:03:38 +1000	[thread overview]
Message-ID: <20170913070338.GH7550@umbus.fritz.box> (raw)
In-Reply-To: <20170911171953.GE2150@work-vm>

[-- Attachment #1: Type: text/plain, Size: 5315 bytes --]

On Mon, Sep 11, 2017 at 06:19:54PM +0100, Dr. David Alan Gilbert wrote:
> * Mark Cave-Ayland (mark.cave-ayland@ilande.co.uk) wrote:
> > On 11/09/17 11:48, David Gibson wrote:
> > 
> > > On Mon, Sep 11, 2017 at 10:30:33AM +0100, Dr. David Alan Gilbert wrote:
> > >> * Greg Kurz (groug@kaod.org) wrote:
> > >>> On Sun, 10 Sep 2017 15:37:33 +0100
> > >>> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> wrote:
> > >>>
> > >>>> Commit a90db15 "target-ppc: Convert ppc cpu savevm to VMStateDescription"
> > >>>> appears to drop the internal CPU IRQ state from the migration stream. Whilst
> > >>>> testing migration on g3beige/mac99 machines, test images would randomly fail to
> > >>>> resume unless a key was pressed on the VGA console.
> > >>>>
> > >>>> Further investigation suggests that internal CPU IRQ state isn't being
> > >>>> preserved and so interrupts asserted at the time of migration are lost. Adding
> > >>>> the pending_interrupts and irq_input_state fields back into the migration
> > >>>> stream appears to fix the problem here during local tests.
> > >>>>
> > >>>> As part of this commit we bump the vmstate_ppc version from 5 to 6 to handle
> > >>>> the additional fields.
> > >>>>
> > >>>
> > >>> And so this unconditionally breaks backward migration... what about adding
> > >>> a subsection for this ?
> > >>
> > >> and wiring it to a flag on the machine type so that older machine types
> > >> don't send it.
> > > 
> > > Right, a subsection is certainly necessary to avoid breaking backwards
> > > migration.
> > 
> > The suggestion of using the VMSTATE_*_V macros with an increased version
> > number came from Alexey's original review of the patch many months ago
> > which is why I did it that way.
> > 
> > Out of curiosity though, what is the criteria for supporting backwards
> > migration? Obviously forward migration is supported as-is in this
> > manner, so what determines if a patch needs to be backwards compatible
> > and how far?
> 
> It's a bit fuzzy.  Downstream we do backwards migration between various
> versions - and those versions are pretty arbitrarily chosen.
> Generally I prefer if we don't break that upstream either, although
> it isn't tested much.

Right.  In this case not breaking backwards migration upstream is
essentially a favour to downstream, since unbreaking it downstream
once its broken upstream is a real PITA.

> If it was in code that was specific to your g3beige I wouldn't mind;
> but for ppc in general then if it breaks the server migration it'll
> be a pain we'd have to then fix.  Best to keep it working upstream.

Right.

> But it's fairly easy to put new fields in a subsection and tie it
> to a property;  that makes it easy to switch it on/off in machine
> types.

In this case I'm not sure we even need a property - I think we could
migrate it only when it's non-zero.  That shold only break it in cases
where actually it would already break.

> > > But apart from that I want to understand better exactly why this is
> > > necessary.  What's the state that's being lost, and is it really not
> > > recoverable from anywhere else.
> > 
> > The test case I have is installing Darwin PPC 6.02 with qemu-system-ppc
> > TCG and repeatedly pausing, executing "savevm foo", then quitting and
> > continuing with "-loadvm foo" during the installation phase. About 1 in
> > 10 times the installer hangs after the loadvm until I press a key, at
> > which point it carries on as normal.
> > 
> > I then proceeded to going backwards through the git history until I
> > found out that it was the removal of the pending_interrupts,
> > irq_input_state and access_type fields during the conversion to
> > VMStateDescription commit a90db15 which seemed to cause the problem.
> > 
> > > The other thing that concerns me is how we're encoding the
> > > information.  These are essentially internal fields, not reflecting
> > > something with an architected encoding - adding those to the migration
> > > stream is often a bad idea - it inhibits our ability to rework
> > > internal encodings.
> > 
> > I'm not sure how this should be managed, however there was a similar
> > issue with excp_prefix (which was also removed in a90db15) that was
> > fixed in 2360b6e by calling a helper in cpu_post_load(). I can't easily
> > see how could work with env.pending_interrupts and env.irq_input_state
> > though.
> 
> Without knowing anything about this hardware... generally the migration
> stream should reflect the real state of the system rather than internal
> implementation detail, that way if you change the implementation you
> don't need to fudge the state.  Having said that, there's generally
> some internal state that's perhaps not immediately obvious from specs
> until you try and implement it.

Right.  I'm not really sure how to handle this yet.  The CPU irq
numbers are pretty much arbitrarily assigned, I don't think much
thought has gone into them.  And if its going to become part of the
migration ABI, some thought needs to be put into it.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2017-09-13  7:28 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-10 14:37 [Qemu-devel] [PATCH 0/4] ppc: migration fixes for TCG Mark Cave-Ayland
2017-09-10 14:37 ` [Qemu-devel] [PATCH 1/4] ppc: change CPUPPCState access_type from int to uint8_t Mark Cave-Ayland
2017-09-10 16:30   ` Laurent Vivier
2017-09-10 18:00     ` Mark Cave-Ayland
2017-09-10 14:37 ` [Qemu-devel] [PATCH 2/4] ppc: add CPU IRQ state to PPC VMStateDescription Mark Cave-Ayland
2017-09-11  7:50   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2017-09-11  9:30     ` Dr. David Alan Gilbert
2017-09-11 10:48       ` David Gibson
2017-09-11 16:46         ` Mark Cave-Ayland
2017-09-11 17:19           ` Dr. David Alan Gilbert
2017-09-13  7:03             ` David Gibson [this message]
2017-09-12 16:21         ` Dr. David Alan Gilbert
2017-09-12 16:41           ` Mark Cave-Ayland
2017-09-12 16:46             ` Mark Cave-Ayland
2017-09-13  2:23               ` Alexey Kardashevskiy
2017-09-13  6:02                 ` David Gibson
2017-09-13 16:44                   ` Mark Cave-Ayland
2017-09-13 17:13                     ` Greg Kurz
2017-09-14  3:48                       ` David Gibson
2017-09-14  3:30                     ` David Gibson
2017-09-10 14:37 ` [Qemu-devel] [PATCH 3/4] ppc: add CPU access_type into the migration stream Mark Cave-Ayland
2017-09-11 10:57   ` David Gibson
2017-09-11 16:52     ` Mark Cave-Ayland
2017-09-13  7:19       ` David Gibson
2017-09-13 17:17         ` Mark Cave-Ayland
2017-09-14  3:54           ` David Gibson
2017-09-10 14:37 ` [Qemu-devel] [PATCH 4/4] ppc: ensure we update the decrementer value during migration Mark Cave-Ayland
2017-09-13  7:12   ` David Gibson
2017-09-13 17:11     ` Mark Cave-Ayland
2017-09-13 17:58       ` Laurent Vivier
2017-09-14  3:52         ` David Gibson
2017-09-15 12:45           ` Mark Cave-Ayland
2017-09-19  8:36             ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170913070338.GH7550@umbus.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=aik@ozlabs.ru \
    --cc=dgilbert@redhat.com \
    --cc=groug@kaod.org \
    --cc=lvivier@redhat.com \
    --cc=mark.cave-ayland@ilande.co.uk \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).