qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: peter.maydell@linaro.org, John Snow <jsnow@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v2 3/8] fdc: Introduce fdctrl->phase
Date: Fri, 29 May 2015 11:11:26 +0200	[thread overview]
Message-ID: <20150529091126.GC3804@noname.redhat.com> (raw)
In-Reply-To: <20150529083309.GB2127@work-vm>

Am 29.05.2015 um 10:33 hat Dr. David Alan Gilbert geschrieben:
> * Markus Armbruster (armbru@redhat.com) wrote:
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> > 
> > > * John Snow (jsnow@redhat.com) wrote:
> > >> 
> > >> 
> > >> On 05/21/2015 09:19 AM, Kevin Wolf wrote:
> > >> > The floppy controller spec describes three different controller phases,
> > >> > which are currently not explicitly modelled in our emulation. Instead,
> > >> > each phase is represented by a combination of flags in registers.
> > >> > 
> > >> > This patch makes explicit in which phase the controller currently is.
> > >> > 
> > >> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> 
> <snip>
> 
> > >> > +static bool fdc_phase_needed(void *opaque)
> > >> > +{
> > >> > +    FDCtrl *fdctrl = opaque;
> > >> > +
> > >> > +    return reconstruct_phase(fdctrl) != fdctrl->phase;
> > >> > +}
> > >
> > > OK, that is one way of doing it which should work, as long
> > > as most of the time that matches.
> > >
> > > My preference for subsections is normally to make them just
> > > conditional on machine type, that way backwards migration just
> > > works.  However, if the result of the backwards migration would
> > > be data corruption (which if I understand what you saying it could
> > > in this case) then it's arguably better to fail migration in those
> > > cases.
> > 
> > This isn't a matter of preference.
> > 
> > Subsections were designed to solves a problem we only have because we're
> > not perfect: device model bugs, namely forgetting to migrate a bit of
> > state, or failing to model it in the first place.
> > 
> > Subsections tied to machine types solve an entirely different problem:
> > the old version of the device is just fine, including migration, but we
> > now want a new and improved variant, which needs some more state
> > migrated.  Putting that piece of state into a subsection tied to the new
> > machine type avoids code duplication.
> > 
> > But what do you do for device model bugs serious enough to need fixing
> > even for existing machine types, when the fix needs additional state
> > migrated?  Subsections tied to machine types are *not* a solution there.
> > That's why this isn't about preference!  It's about having a bug to fix
> > or not.
> 
> My problem is that people keep adding subsections to fix minor device
> bugs because they think breaking migration is irrelevant.  If the bug
> being fixed causes data corruption then OK I agree it is probably better
> to break migration, otherwise  you need to make a decision about whether
> fixing the device bug during migration is better or worse than having
> the migration fail.
> Some people say 'Migration failure is ok - people just try it again';
> sorry, no, that's useless in environments where the VM has 100s of GB
> of RAM and takes hours to migrate only for it then to fail, and it
> was urgent that it was migrated off that source host.

If this is a problem in practice, it sounds a lot like we need to
improve our handling of that situation in general. Why do we abort
the whole migration when serialising the state fails? Can't we abort
just the completion and switch back to the live state, and then retry a
bit later?

Most, if not all, of the subsections you mentioned that fix minor bugs
fix some states while the device is actively used and shorty after we
should be back in a state where the subsection isn't needed.

This is exactly the "fails 1 in 100 migrations" cases you talked about,
and they would be practically fixed if you made a few more attempts to
complete the migration before you let it fail (or perhaps even retry
indefinitely, as long as the user doesn't cancel migration).

Kevin

  reply	other threads:[~2015-05-29  9:11 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-21 13:19 [Qemu-devel] [PATCH v2 0/8] fdc: Clean up and fix command processing Kevin Wolf
2015-05-21 13:19 ` [Qemu-devel] [PATCH v2 1/8] fdc: Rename fdctrl_reset_fifo() to fdctrl_to_command_phase() Kevin Wolf
2015-05-21 13:19 ` [Qemu-devel] [PATCH v2 2/8] fdc: Rename fdctrl_set_fifo() to fdctrl_to_result_phase() Kevin Wolf
2015-05-21 13:19 ` [Qemu-devel] [PATCH v2 3/8] fdc: Introduce fdctrl->phase Kevin Wolf
2015-05-21 21:55   ` John Snow
2015-05-28 17:29     ` Dr. David Alan Gilbert
2015-05-29  7:50       ` Markus Armbruster
2015-05-29  8:33         ` Dr. David Alan Gilbert
2015-05-29  9:11           ` Kevin Wolf [this message]
2015-05-29  9:38             ` Dr. David Alan Gilbert
2015-05-29 10:27               ` Kevin Wolf
2015-05-29 10:34                 ` Dr. David Alan Gilbert
2015-05-29 10:55                   ` Peter Maydell
2015-05-29 10:57                     ` Dr. David Alan Gilbert
2015-06-01 12:51                       ` Markus Armbruster
2015-05-29 10:59                   ` Kevin Wolf
2015-06-01 12:46           ` Markus Armbruster
2015-05-21 13:19 ` [Qemu-devel] [PATCH v2 4/8] fdc: Use phase in fdctrl_write_data() Kevin Wolf
2015-05-21 13:19 ` [Qemu-devel] [PATCH v2 5/8] fdc: Code cleanup " Kevin Wolf
2015-05-21 21:34   ` John Snow
2015-05-21 13:19 ` [Qemu-devel] [PATCH v2 6/8] fdc: Disentangle phases in fdctrl_read_data() Kevin Wolf
2015-05-21 13:19 ` [Qemu-devel] [PATCH v2 7/8] fdc: Fix MSR.RQM flag Kevin Wolf
2015-05-21 21:27   ` John Snow
2015-05-21 13:19 ` [Qemu-devel] [PATCH v2 8/8] fdc-test: Test state for existing cases more thoroughly Kevin Wolf
2015-06-02 17:37 ` [Qemu-devel] [PATCH v2 0/8] fdc: Clean up and fix command processing John Snow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150529091126.GC3804@noname.redhat.com \
    --to=kwolf@redhat.com \
    --cc=armbru@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).