From: Peter Xu <peterx@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: qemu-devel@nongnu.org, Laurent Vivier <lvivier@redhat.com>,
"Daniel P . Berrange" <berrange@redhat.com>,
Alexey Perevalov <a.perevalov@samsung.com>,
Juan Quintela <quintela@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [Qemu-devel] [RFC v2 32/33] migration: allow migrate_incoming for paused VM
Date: Tue, 10 Oct 2017 18:08:30 +0800
Message-ID: <20171010100830.GD20686@pxdev.xzpeter.org>
In-Reply-To: <20171009172805.GP2374@work-vm>
On Mon, Oct 09, 2017 at 06:28:06PM +0100, Dr. David Alan Gilbert wrote:
[...]
> > > > /*
> > > > @@ -1291,14 +1301,25 @@ void migrate_del_blocker(Error *reason)
> > > > void qmp_migrate_incoming(const char *uri, Error **errp)
> > > > {
> > > > Error *local_err = NULL;
> > > > - static bool once = true;
> > > > + MigrationIncomingState *mis = migration_incoming_get_current();
> > > >
> > > > - if (!deferred_incoming) {
> > > > - error_setg(errp, "For use with '-incoming defer'");
> > > > + if (!deferred_incoming &&
> > > > + mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > > > + error_setg(errp, "For use with '-incoming defer'"
> > > > + " or PAUSED postcopy migration only.");
> > > > return;
> > > > }
> > > > - if (!once) {
> > > > - error_setg(errp, "The incoming migration has already been started");
> > >
> > > > What guards against someone doing a migrate_incoming after the successful
> > > completion of an incoming migration?
> >
> > If deferred incoming is not enabled, we should be protected by above
> > check on (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED). But yes I
> > think this is a problem if deferred incoming is used. Maybe I should
> > still keep the "once" check here for deferred migration, but I think I
> > can re-use the variable "deferred_incoming". Please see below.
> >
> > > Also with RDMA the following won't happen so I'm not quite sure what
> > > state we're in.
> >
> > Indeed. Currently there is still no good way to destroy the RDMA
> > accept handle easily since it's using its own qemu_set_fd_handler()
> > way to setup accept ports. But I think maybe I can solve this problem
> > with below issue together. Please see below.
> >
> > >
> > > When we get to non-blocking commands it's also a bit interesting - we
> > > could be getting an accept on the main thread at just the same time
> > > this is going down the OOB side.
> >
> > This is an interesting point. Thanks for noticing that.
> >
> > How about I do it the strict way? like this (hopefully this can solve
> > all the issues mentioned above):
> >
> > qmp_migrate_incoming()
> > {
> > if (deferred_incoming) {
> > // PASS, deferred incoming is set, and never triggered
> > } else if (state == POSTCOPY_PAUSED && listen_tag == 0) {
> > // PASS, we don't have an accept port
> > } else {
> > // FAIL
>
> One problem is at this point you can't say much about why you failed;
> my original migrate_incoming was like this, but then in 4debb5f5 I
> added the 'once' to allow you to distinguish the cases of trying to use
> migrate_incoming twice from never having used -incoming defer;
> Markus asked for that in the review: http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04079.html
Ah. Then let me revive the "once" variable:

  if (state == POSTCOPY_PAUSED && listen_tag == 0) {
      // PASS, we don't have an accept port and need recovery
  } else if (deferred_incoming) {
      if (!once) {
          once = true;
          // PASS, incoming is deferred
      } else {
          // FAIL: deferred incoming has been specified already
      }
  } else {
      // FAIL: neither do we need recovery, nor do we have deferred incoming
  }
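
To make the three outcomes concrete, here is a rough standalone C sketch of the same check. It is not the real QEMU code: IncomingCtx, MigState and migrate_incoming_check() are hypothetical names made up for illustration, and the error strings are only modelled on the ones quoted earlier in this thread. The point is that each failure path can report its own reason.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real migration state (illustration only). */
typedef enum {
    MIG_STATE_NONE,
    MIG_STATE_POSTCOPY_ACTIVE,
    MIG_STATE_POSTCOPY_PAUSED,
} MigState;

typedef struct {
    MigState state;
    unsigned int listen_tag;  /* 0 means no accept port is installed */
    bool deferred_incoming;   /* started with "-incoming defer" */
    bool once;                /* set by the first deferred migrate_incoming */
} IncomingCtx;

/*
 * Returns NULL when migrate_incoming may proceed, otherwise a message
 * naming the reason for the failure.
 */
const char *migrate_incoming_check(IncomingCtx *ctx)
{
    if (ctx->state == MIG_STATE_POSTCOPY_PAUSED && ctx->listen_tag == 0) {
        /* PASS: paused postcopy without an accept port, recovery needed */
        return NULL;
    }
    if (ctx->deferred_incoming) {
        if (!ctx->once) {
            /* PASS: first use of the deferred incoming migration */
            ctx->once = true;
            return NULL;
        }
        /* FAIL: deferred incoming was already consumed */
        return "The incoming migration has already been started";
    }
    /* FAIL: neither recovery nor deferred incoming applies */
    return "For use with '-incoming defer' or PAUSED postcopy migration only.";
}
```

With this shape, a second migrate_incoming on a deferred VM and a migrate_incoming on a VM that never had "-incoming defer" produce different messages.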
>
> > }
> >
> > qemu_start_incoming_migration(uri, &local_err);
>
> We still have to make sure that nothing in that takes a lock.

I think monitor_lock is needed when sending events, but that should be
fine - there is no chance of a page fault inside the monitor_lock
critical section.

For the rest, I didn't see any place that takes a lock. Hope I didn't
miss anything...
>
> > if (local_err) {
> > error_propagate(errp, local_err);
> > return;
> > }
> >
> > // stop allowing this
> > deferred_incoming = false;
>
> OK, this works I think as long as we have the requirement that
> only one OOB command can be executing at once. So that depends
> on the structure of your OOB stuff; if you can run multiple OOB
> at once then you can have two instances of this command running
> at the same time and this setting passes each other.
Indeed. IIUC Markus's proposal (and the latest version of the series)
won't allow OOB commands to run in parallel. They should be fast
commands, fast enough that there is no need to run them concurrently.
If they can ever run in parallel, we may need a lock.
>
> (You may have to be careful of the read of state and listen_tag
> since those are getting set from another thread).
IMHO it should be fine here - I'm checking listen_tag against zero,
and this function is the only place where we change it from zero to
non-zero. So as long as this function never runs in parallel with
itself (or we have the lock mentioned above), we should be good.
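
If the function ever could run in parallel, the check of listen_tag and the update would have to happen under one lock. A minimal sketch of that idea follows, using a plain pthread mutex rather than any QEMU primitive; incoming_lock, incoming_try_claim() and incoming_release() are hypothetical names, not existing code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical lock protecting listen_tag (illustration only). */
static pthread_mutex_t incoming_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int listen_tag;  /* 0 means no accept port is installed */

/*
 * Check-and-set in one critical section: only the caller that sees
 * listen_tag == 0 may install a new (non-zero) tag.
 */
bool incoming_try_claim(unsigned int new_tag)
{
    bool claimed = false;

    pthread_mutex_lock(&incoming_lock);
    if (listen_tag == 0 && new_tag != 0) {
        listen_tag = new_tag;
        claimed = true;
    }
    pthread_mutex_unlock(&incoming_lock);
    return claimed;
}

/* Drop the accept port so that a later recovery can claim it again. */
void incoming_release(void)
{
    pthread_mutex_lock(&incoming_lock);
    listen_tag = 0;
    pthread_mutex_unlock(&incoming_lock);
}
```

Two racing callers would then see exactly one success, which is the property the single-OOB-command rule gives us for free today.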
>
> > }
> >
> > To make sure it works, I may need to hack a unique listen tag for
> > RDMA for now, say, using (guint)(-1) to stand for the RDMA tag (instead
> > of really rewriting the RDMA code to use the watcher stuff with real
> > listen tags), like:
> >
> > #define MIG_LISTEN_TAG_RDMA_FAKE ((guint)(-1))
> >
> > bool migrate_incoming_detach_listen()
> > {
> > if (listen_tag) {
> > if (listen_tag != MIG_LISTEN_TAG_RDMA_FAKE) {
> > // RDMA has already detached the accept port
> > g_source_remove(listen_tag);
> > }
> > listen_tag = 0;
> > return true;
> > }
> > return false;
> > }
> >
> > Then when listen_tag != 0 it means that there is an accept port,
> > and as long as there is one port we don't allow changing it (like the
> > pseudo qmp_migrate_incoming() code I wrote).
>
> It's worth noting anyway that RDMA doesn't work with postcopy yet
> anyway (although I now have some ideas how we could fix that).
Ah, good to know.

Then I think I can avoid introducing this hacky tag at all. Instead,
I can add a proper comment explaining that the check does not apply to
RDMA (since we check for the POSTCOPY_PAUSED state before checking
listen_tag, it can never be an RDMA migration).
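
Without the fake tag, the detach helper could look roughly like the sketch below. This is only an illustration: source_remove_stub() stands in for GLib's g_source_remove() so that the snippet compiles on its own, and removed_tag exists purely to make the stub observable.

```c
#include <stdbool.h>

unsigned int listen_tag;   /* 0 means no accept port is installed */
unsigned int removed_tag;  /* records what the stub was asked to remove */

/* Stand-in for g_source_remove(); an assumption made for this sketch. */
static void source_remove_stub(unsigned int tag)
{
    removed_tag = tag;
}

/*
 * Detach the accept port, if any. Returns true if a port was detached.
 *
 * Note: this does not apply to RDMA. Callers only get here for a
 * POSTCOPY_PAUSED migration, and RDMA does not support postcopy, so
 * listen_tag always refers to a real event source.
 */
bool migrate_incoming_detach_listen(void)
{
    if (listen_tag) {
        source_remove_stub(listen_tag);
        listen_tag = 0;
        return true;
    }
    return false;
}
```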
Thanks,
--
Peter Xu