From: "Daniel P. Berrange" <berrange@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Laurent Vivier <lvivier@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Juan Quintela <quintela@redhat.com>,
Alexey Perevalov <a.perevalov@samsung.com>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option
Date: Wed, 2 Aug 2017 10:28:47 +0100 [thread overview]
Message-ID: <20170802092847.GC4098@redhat.com> (raw)
In-Reply-To: <20170802055646.GE15080@pxdev.xzpeter.org>
On Wed, Aug 02, 2017 at 01:56:46PM +0800, Peter Xu wrote:
> On Tue, Aug 01, 2017 at 12:03:48PM +0100, Daniel P. Berrange wrote:
> > On Fri, Jul 28, 2017 at 04:06:25PM +0800, Peter Xu wrote:
> > > It will be used when we want to resume one paused migration.
> > >
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > > hmp-commands.hx | 7 ++++---
> > > hmp.c | 4 +++-
> > > migration/migration.c | 2 +-
> > > qapi-schema.json | 5 ++++-
> > > 4 files changed, 12 insertions(+), 6 deletions(-)
> >
> > I'm not seeing explicit info about how we handle the original failure
> > and how it relates to this resume command, but this feels like a
> > potentially racy approach to me.
> >
> > If we have a network problem between source & target, we could see
> > two results. Either the TCP stream will simply hang (it'll still
> > appear open to QEMU but no traffic will be flowing),
>
> (let's say this is the "1st condition")
>
> > or the connection
> > may actually break such that we get EOF and end up closing the file
> > descriptor.
>
> (let's say this is the "2nd condition")
>
> >
> > In the latter case, we're ok because the original channel is now
> > gone and we can safely establish the new one by issuing the new
> > 'migrate --resume URI' command.
> >
> > In the former case, however, there is the possibility that the
> > hang may come back to life at some point, concurrently with us
> > trying to do 'migrate --resume URI' and I'm unclear on the
> > semantics if that happens.
> >
> > Should the original connection carry on, and thus cause the
> > 'migrate --resume' command to fail, or will we forcably terminate
> > the original connection no matter what and use the new "resumed"
> > connection.
>
> Hmm yes, this is a good question. Currently this series is only
> handling the 2nd condition, say, when we can detect the error via
> system calls (IIUC we can know nothing when the 1st condition is
> encountered, we just e.g. block at the system calls as usual when
> reading the file handle). And currently the "resume" command is only
> allowed if the 2nd condition is detected (so it will never destroy an
> existing channel).
>
> If you see the next following patch, there is something like:
>
> if (has_resume && resume) {
> if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> error_setg(errp, "Cannot resume if there is no "
> "paused migration");
> return;
> }
> goto do_resume;
> }
>
> And here MIGRATION_STATUS_POSTCOPY_PAUSED will only be set when the
> 2nd condition is met.
>
> >
> > There's also synchronization with the target host - at the time we
> > want to recover, we need to be able to tell the target to accept
> > new incoming clients again, but we don't want to do that if the
> > original connection comes back to life.
>
> Yeah, I hacked this part in this v1 series (as you may have seen) to
> keep the ports open-forever. I am not sure whether that is acceptable,
> but looks not. :)
>
> How about this: when destination detected 2nd condition, it firstly
> switch to "postcopy-pause" state, then re-opens the accept channels.
> And it can turns the accept channels off when the state moves out of
> "postcopy-pause".
>
> >
> > It feels to me that if the mgmt app or admin believes the migration
> > is in a stuck state, we should be able to explicitly terminate the
> > existing connection via a monitor command. Then setup the target
> > host to accept new client, and then issue this migrate resume on
> > the source.
>
> Totally agree. That should be the only way to handle 1st condition
> well. However, would you mind if I postpone it a bit? IMHO as long as
> we can solve the 2nd condition nicely (which is the goal of this
> series), then it won't be too hard to continue support the 1st
> condition.
Sure, the 1st scenario is an easy bolt on to the second scenario. I
just wanted to be clear about what the target of these patches is,
because I think the 1st scenario is probably the most common one.
I guess if you have TCP keepalives enabled with a reasonably short
timeout, the 1st scenario will turn into the 2nd scenario fairly
quickly.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
next prev parent reply other threads:[~2017-08-02 9:29 UTC|newest]
Thread overview: 116+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-28 8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap Peter Xu
2017-07-31 16:34 ` Dr. David Alan Gilbert
2017-08-01 2:11 ` Peter Xu
2017-08-01 5:48 ` Alexey Perevalov
2017-08-01 6:02 ` Peter Xu
2017-08-01 6:12 ` Alexey Perevalov
2017-07-28 8:06 ` [Qemu-devel] [RFC 02/29] migration: fix comment disorder in RAMState Peter Xu
2017-07-31 16:39 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling Peter Xu
2017-07-31 16:53 ` Dr. David Alan Gilbert
2017-08-01 2:25 ` Peter Xu
2017-08-01 8:32 ` Daniel P. Berrange
2017-08-01 8:55 ` Dr. David Alan Gilbert
2017-08-02 3:21 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 04/29] bitmap: introduce bitmap_invert() Peter Xu
2017-07-31 17:11 ` Dr. David Alan Gilbert
2017-08-01 2:43 ` Peter Xu
2017-08-01 8:40 ` Dr. David Alan Gilbert
2017-08-02 3:20 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 05/29] bitmap: introduce bitmap_count_one() Peter Xu
2017-07-31 17:58 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 06/29] migration: dump str in migrate_set_state trace Peter Xu
2017-07-31 18:27 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 07/29] migration: better error handling with QEMUFile Peter Xu
2017-07-31 18:39 ` Dr. David Alan Gilbert
2017-08-01 5:49 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 08/29] migration: reuse mis->userfault_quit_fd Peter Xu
2017-07-31 18:42 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 09/29] migration: provide postcopy_fault_thread_notify() Peter Xu
2017-07-31 18:45 ` Dr. David Alan Gilbert
2017-08-01 3:01 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 10/29] migration: new property "x-postcopy-fast" Peter Xu
2017-07-31 18:52 ` Dr. David Alan Gilbert
2017-08-01 3:13 ` Peter Xu
2017-08-01 8:50 ` Dr. David Alan Gilbert
2017-08-02 3:31 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state Peter Xu
2017-07-28 15:53 ` Eric Blake
2017-07-31 7:02 ` Peter Xu
2017-07-31 19:06 ` Dr. David Alan Gilbert
2017-08-01 6:28 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy Peter Xu
2017-08-01 9:47 ` Dr. David Alan Gilbert
2017-08-02 5:06 ` Peter Xu
2017-08-03 14:03 ` Dr. David Alan Gilbert
2017-08-04 3:43 ` Peter Xu
2017-08-04 9:33 ` Dr. David Alan Gilbert
2017-08-04 9:44 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 13/29] migration: allow src return path to pause Peter Xu
2017-08-01 10:01 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 14/29] migration: allow send_rq to fail Peter Xu
2017-08-01 10:30 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 15/29] migration: allow fault thread to pause Peter Xu
2017-08-01 10:41 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option Peter Xu
2017-07-28 15:57 ` Eric Blake
2017-07-31 7:05 ` Peter Xu
2017-08-01 10:42 ` Dr. David Alan Gilbert
2017-08-01 11:03 ` Daniel P. Berrange
2017-08-02 5:56 ` Peter Xu
2017-08-02 9:28 ` Daniel P. Berrange [this message]
2017-07-28 8:06 ` [Qemu-devel] [RFC 17/29] migration: rebuild channel on source Peter Xu
2017-08-01 10:59 ` Dr. David Alan Gilbert
2017-08-02 6:14 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 18/29] migration: new state "postcopy-recover" Peter Xu
2017-08-01 11:36 ` Dr. David Alan Gilbert
2017-08-02 6:42 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 19/29] migration: let dst listen on port always Peter Xu
2017-08-01 10:56 ` Daniel P. Berrange
2017-08-02 7:02 ` Peter Xu
2017-08-02 9:26 ` Daniel P. Berrange
2017-08-02 11:02 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 20/29] migration: wakeup dst ram-load-thread for recover Peter Xu
2017-08-03 9:28 ` Dr. David Alan Gilbert
2017-08-04 5:46 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 21/29] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
2017-08-03 9:49 ` Dr. David Alan Gilbert
2017-08-04 6:08 ` Peter Xu
2017-08-04 6:15 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
2017-08-03 10:50 ` Dr. David Alan Gilbert
2017-08-04 6:59 ` Peter Xu
2017-08-04 9:49 ` Dr. David Alan Gilbert
2017-08-07 6:11 ` Peter Xu
2017-08-07 9:04 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
2017-08-03 11:05 ` Dr. David Alan Gilbert
2017-08-04 7:04 ` Peter Xu
2017-08-04 7:09 ` Peter Xu
2017-08-04 8:30 ` Dr. David Alan Gilbert
2017-08-04 9:22 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 24/29] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
2017-08-03 11:21 ` Dr. David Alan Gilbert
2017-08-04 7:23 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 25/29] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
2017-08-03 11:38 ` Dr. David Alan Gilbert
2017-08-04 7:39 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 26/29] migration: synchronize dirty bitmap for resume Peter Xu
2017-08-03 11:56 ` Dr. David Alan Gilbert
2017-08-04 7:49 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 27/29] migration: setup ramstate " Peter Xu
2017-08-03 12:37 ` Dr. David Alan Gilbert
2017-08-04 8:39 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 28/29] migration: final handshake for the resume Peter Xu
2017-08-03 13:47 ` Dr. David Alan Gilbert
2017-08-04 9:05 ` Peter Xu
2017-08-04 9:53 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed Peter Xu
2017-08-03 13:54 ` Dr. David Alan Gilbert
2017-08-04 8:52 ` Peter Xu
2017-08-04 9:52 ` Dr. David Alan Gilbert
2017-08-07 6:57 ` Peter Xu
2017-07-28 10:06 ` [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
2017-08-03 15:57 ` Dr. David Alan Gilbert
2017-08-21 7:47 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170802092847.GC4098@redhat.com \
--to=berrange@redhat.com \
--cc=a.perevalov@samsung.com \
--cc=aarcange@redhat.com \
--cc=dgilbert@redhat.com \
--cc=lvivier@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.