From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Laurent Vivier <lvivier@redhat.com>,
"Daniel P . Berrange" <berrange@redhat.com>,
Alexey Perevalov <a.perevalov@samsung.com>,
Juan Quintela <quintela@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [Qemu-devel] [RFC v2 32/33] migration: allow migrate_incoming for paused VM
Date: Mon, 9 Oct 2017 18:28:06 +0100
Message-ID: <20171009172805.GP2374@work-vm>
In-Reply-To: <20170928065407.GF17044@pxdev.xzpeter.org>
* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Sep 22, 2017 at 09:32:28PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > The migrate_incoming command was previously only used when we were
> > > providing "-incoming defer" on the command line, to defer the creation
> > > of the incoming migration channel.
> > >
> > > However, there is a similar requirement when we are paused during
> > > postcopy migration. The old incoming channel might have been destroyed
> > > already, so we may need a new channel for the recovery to happen.
> > >
> > > This patch leverages the same interface, but allows the user to specify
> > > an incoming migration channel even for paused postcopy.
> > >
> > > Meanwhile, migration listening ports are now always detached manually
> > > using the tag, rather than via the return values of the dispatchers.
> > >
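The dispatcher semantics behind this change can be modelled in a few lines. This is a minimal, GLib-free sketch, not QEMU code: source_add(), source_remove(), dispatch_all(), detach_listen(), accept_old() and accept_new() are hypothetical names. The only assumption carried over from GLib is the GSourceFunc contract: a callback that returns false is unregistered after dispatch, while one that returns true stays registered until it is removed by tag (the job g_source_remove() does in the patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy main loop (hypothetical, not GLib). As with a GSourceFunc, a
 * callback returning false is unregistered by the loop after dispatch;
 * one returning true stays registered until removed by its tag.
 */
typedef bool (*source_fn)(void);

#define MAX_SOURCES 4

static struct {
    unsigned tag;      /* 0 marks a free slot */
    source_fn fn;
} sources[MAX_SOURCES];
static unsigned next_tag = 1;

static unsigned source_add(source_fn fn)
{
    for (size_t i = 0; i < MAX_SOURCES; i++) {
        if (sources[i].tag == 0) {
            sources[i].tag = next_tag++;
            sources[i].fn = fn;
            return sources[i].tag;
        }
    }
    return 0; /* table full */
}

/* Analogue of g_source_remove(): false if no source has this tag. */
static bool source_remove(unsigned tag)
{
    for (size_t i = 0; tag != 0 && i < MAX_SOURCES; i++) {
        if (sources[i].tag == tag) {
            sources[i].tag = 0;
            return true;
        }
    }
    return false;
}

/* Dispatch each live source once, honouring its return value. */
static void dispatch_all(void)
{
    for (size_t i = 0; i < MAX_SOURCES; i++) {
        if (sources[i].tag != 0 && !sources[i].fn()) {
            sources[i].tag = 0;
        }
    }
}

/* Owner-side tag, loosely modelled on mis->listen_task_tag. */
static unsigned listen_tag;

/* Explicit detach by tag, in the spirit of migrate_incoming_detach_listen(). */
static bool detach_listen(void)
{
    if (listen_tag == 0) {
        return false;
    }
    bool removed = source_remove(listen_tag);
    assert(removed); /* the tag must still name a live source */
    (void)removed;
    listen_tag = 0;
    return true;
}

/* Old style: unregister by returning false; listen_tag goes stale. */
static bool accept_old(void)
{
    return false;
}

/* New style: detach through the owner's tag, then return true so the
 * loop does not try to remove the source a second time. */
static bool accept_new(void)
{
    detach_listen();
    return true;
}
```

With the old-style handler the owner keeps a tag for a source that no longer exists, which is why the pre-patch code cleared mis->listen_task_tag by hand; with the new style the tag stays valid until the one explicit detach.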
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  migration/exec.c      |  2 +-
> > >  migration/fd.c        |  2 +-
> > >  migration/migration.c | 39 +++++++++++++++++++++++++++++----------
> > >  migration/socket.c    |  2 +-
> > >  4 files changed, 32 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/migration/exec.c b/migration/exec.c
> > > index ef1fb4c..26fc37d 100644
> > > --- a/migration/exec.c
> > > +++ b/migration/exec.c
> > > @@ -49,7 +49,7 @@ static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
> > >  {
> > >      migration_channel_process_incoming(ioc);
> > >      object_unref(OBJECT(ioc));
> > > -    return FALSE; /* unregister */
> > > +    return TRUE; /* keep it registered */
> > >  }
> > >
> > >  /*
> > > diff --git a/migration/fd.c b/migration/fd.c
> > > index e9a548c..7d0aefa 100644
> > > --- a/migration/fd.c
> > > +++ b/migration/fd.c
> > > @@ -49,7 +49,7 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
> > >  {
> > >      migration_channel_process_incoming(ioc);
> > >      object_unref(OBJECT(ioc));
> > > -    return FALSE; /* unregister */
> > > +    return TRUE; /* keep it registered */
> > >  }
> > >
> > >  /*
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index daf356b..5812478 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -175,6 +175,17 @@ void migration_incoming_state_destroy(void)
> > >      qemu_event_destroy(&mis->main_thread_load_event);
> > >  }
> > >
> > > +static bool migrate_incoming_detach_listen(MigrationIncomingState *mis)
> > > +{
> > > +    if (mis->listen_task_tag) {
> > > +        /* Never fail */
> > > +        g_source_remove(mis->listen_task_tag);
> > > +        mis->listen_task_tag = 0;
> > > +        return true;
> > > +    }
> > > +    return false;
> > > +}
> > > +
> > >  static void migrate_generate_event(int new_state)
> > >  {
> > >      if (migrate_use_events()) {
> > > @@ -432,10 +443,9 @@ void migration_fd_process_incoming(QEMUFile *f)
> > >
> > >      /*
> > >       * When reach here, we should not need the listening port any
> > > -     * more. We'll detach the listening task soon, let's reset the
> > > -     * listen task tag.
> > > +     * more. Detach the listening port explicitly.
> > >       */
> > > -    mis->listen_task_tag = 0;
> > > +    migrate_incoming_detach_listen(mis);
> > >  }
> > >
> > >  /*
> > > @@ -1291,14 +1301,25 @@ void migrate_del_blocker(Error *reason)
> > >  void qmp_migrate_incoming(const char *uri, Error **errp)
> > >  {
> > >      Error *local_err = NULL;
> > > -    static bool once = true;
> > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > >
> > > -    if (!deferred_incoming) {
> > > -        error_setg(errp, "For use with '-incoming defer'");
> > > +    if (!deferred_incoming &&
> > > +        mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > > +        error_setg(errp, "For use with '-incoming defer'"
> > > +                   " or PAUSED postcopy migration only.");
> > >          return;
> > >      }
> > > -    if (!once) {
> > > -        error_setg(errp, "The incoming migration has already been started");
> >
> > What guards against someone doing a migrate_incoming after the successful
> > completion of an incoming migration?
>
> If deferred incoming is not enabled, we should be protected by the above
> check on (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED). But yes, I
> think this is a problem if deferred incoming is used. Maybe I should
> still keep the "once" check here for deferred migration, but I think I
> can re-use the variable "deferred_incoming". Please see below.
>
> > Also with RDMA the following won't happen so I'm not quite sure what
> > state we're in.
>
> Indeed. Currently there is still no easy way to destroy the RDMA
> accept handle, since RDMA uses its own qemu_set_fd_handler() mechanism
> to set up its accept port. But I think I may be able to solve this
> problem together with the issue below. Please see below.
>
> >
> > When we get to non-blocking commands it's also a bit interesting - we
> > could be getting an accept on the main thread at just the same time
> > this is going down the OOB side.
>
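One way to make the detach itself race-free is to treat the listen tag as a token that exactly one party can take. A minimal sketch, assuming C11 <stdatomic.h> (in-tree code would use QEMU's own atomic helpers instead); listen_tag, remove_source(), attach_listen() and detach_listen_once() are hypothetical names. Whichever of the main-thread accept path or the OOB command swaps the non-zero tag for 0 performs the removal; the other sees 0 and does nothing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: the listen tag is held atomically so that the
 * main-thread accept path and an out-of-band migrate_incoming cannot
 * both remove the same listener. Whoever swaps the non-zero tag for 0
 * owns the removal; any other caller sees 0 and backs off.
 */
static _Atomic unsigned listen_tag;
static atomic_int removals; /* how many times the stub below ran */

/* Stub standing in for g_source_remove(); it only counts calls. */
static void remove_source(unsigned tag)
{
    assert(tag != 0);
    (void)tag;
    atomic_fetch_add(&removals, 1);
}

/* Publish a new listener's tag (e.g. after a successful listen). */
static void attach_listen(unsigned tag)
{
    atomic_store(&listen_tag, tag);
}

/* Returns true only for the one caller that actually detached. */
static bool detach_listen_once(void)
{
    unsigned tag = atomic_exchange(&listen_tag, 0u);
    if (tag == 0) {
        return false; /* someone else got there first, or never attached */
    }
    remove_source(tag);
    return true;
}
```

This only closes the double-detach window; a check of mis->state followed by installing a new listener is still a check-then-act sequence and would need a lock or some other serialization.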
> This is an interesting point. Thanks for noticing that.
>
> How about I do it the strict way, like this? (Hopefully this can solve
> all the issues mentioned above.)
>
> qmp_migrate_incoming()
> {
>     if (deferred_incoming) {
>         // PASS, deferred incoming is set, and never triggered
>     } else if (state == POSTCOPY_PAUSED && listen_tag == 0) {
>         // PASS, we don't have an accept port
>     } else {
>         // FAIL
One problem is that at this point you can't say much about why you failed;
my original migrate_incoming was like this, but then in 4debb5f5 I
added the 'once' to allow you to distinguish the case of trying to use
migrate_incoming twice from that of never having used -incoming defer.
Markus asked for that in the review: http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04079.html
>     }
>
>     qemu_start_incoming_migration(uri, &local_err);
We still have to make sure that nothing in that takes a lock.
>     if (local_err) {
>         error_propagate(errp, local_err);
>         return;
>     }
>
>     // stop allowing this
>     deferred_incoming = false;
OK, I think this works as long as we have the requirement that
only one OOB command can be executing at once. So that depends
on the structure of your OOB stuff; if you can run multiple OOB
commands at once, then two instances of this command could be running
at the same time and race on this flag.
(You may have to be careful about the reads of state and listen_tag,
since those are being set from another thread.)
> }
>
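A sketch of the proposed check that keeps the refusal cases distinguishable, as the 'once' flag did, assuming single-threaded callers (it does not address the OOB concurrency point). The types, incoming_check(), incoming_start(), incoming_strerror() and the INC_* codes are hypothetical stand-ins for MigrationIncomingState, qmp_migrate_incoming() and its Error messages; listen_ok simulates whether qemu_start_incoming_migration() succeeded. The deferred flag is consumed only after the listener starts, so a failed attempt can be retried, and a second migrate_incoming after a deferred one gets its own message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the incoming-side state. */
enum incoming_state {
    INCOMING_NONE,
    INCOMING_ACTIVE,
    INCOMING_POSTCOPY_PAUSED,
    INCOMING_COMPLETED,
};

enum incoming_err {
    INC_OK,
    INC_NOT_DEFERRED,       /* never used -incoming defer, not paused */
    INC_ALREADY_STARTED,    /* deferred incoming was already consumed */
    INC_ALREADY_LISTENING,  /* paused, but an accept port still exists */
    INC_LISTEN_FAILED,      /* starting the new listener failed */
};

static const char *const incoming_err_msg[] = {
    [INC_OK]                = "",
    [INC_NOT_DEFERRED]      = "For use with '-incoming defer' or paused postcopy only",
    [INC_ALREADY_STARTED]   = "The incoming migration has already been started",
    [INC_ALREADY_LISTENING] = "An incoming channel is already listening",
    [INC_LISTEN_FAILED]     = "Failed to start the incoming channel",
};

struct incoming {
    bool deferred;          /* -incoming defer given, not yet consumed */
    bool deferred_used;     /* migrate_incoming already consumed it */
    enum incoming_state state;
    unsigned listen_tag;    /* non-zero while an accept port exists */
};

static const char *incoming_strerror(enum incoming_err err)
{
    return incoming_err_msg[err];
}

/* Each refusal gets its own code instead of a single FAIL branch. */
static enum incoming_err incoming_check(const struct incoming *mis)
{
    if (mis->deferred) {
        return INC_OK;
    }
    if (mis->state == INCOMING_POSTCOPY_PAUSED) {
        return mis->listen_tag ? INC_ALREADY_LISTENING : INC_OK;
    }
    return mis->deferred_used ? INC_ALREADY_STARTED : INC_NOT_DEFERRED;
}

/*
 * listen_ok simulates the outcome of qemu_start_incoming_migration();
 * the deferred flag is consumed only once the listener is up.
 */
static enum incoming_err incoming_start(struct incoming *mis, bool listen_ok,
                                        unsigned new_tag)
{
    enum incoming_err err = incoming_check(mis);
    if (err != INC_OK) {
        return err;
    }
    if (!listen_ok) {
        return INC_LISTEN_FAILED; /* flag untouched, so a retry is allowed */
    }
    assert(new_tag != 0);
    if (mis->deferred) {
        mis->deferred = false;
        mis->deferred_used = true;
    }
    mis->listen_tag = new_tag;
    return INC_OK;
}
```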
> To make sure it works, I may need to hack a unique listen tag for
> RDMA for now, say, using (guint)(-1) to stand for the RDMA tag (instead
> of really rewriting the RDMA code to use the watcher stuff with real
> listen tags), like:
>
> #define MIG_LISTEN_TAG_RDMA_FAKE ((guint)(-1))
>
> bool migrate_incoming_detach_listen()
> {
>     if (listen_tag) {
>         // RDMA has already detached the accept port, so only
>         // remove real watches
>         if (listen_tag != MIG_LISTEN_TAG_RDMA_FAKE) {
>             g_source_remove(listen_tag);
>         }
>         listen_tag = 0;
>         return true;
>     }
>     return false;
> }
>
> Then listen_tag != 0 means that there is an accept port, and as long
> as there is one we don't allow it to be changed (as in the pseudo
> qmp_migrate_incoming() code I wrote).
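The sentinel-tag idea can be checked in isolation. A minimal sketch: MIG_LISTEN_TAG_RDMA_FAKE follows the proposal, while remove_source() is a hypothetical stub for g_source_remove() and detach_listen(), has_accept_port(), listen_tag and removed_tag are illustrative names. It assumes, as the pseudocode's comment states, that RDMA has already detached its own accept port, so the fake tag must never reach the real remover.

```c
#include <assert.h>
#include <stdbool.h>

/* Sentinel from the proposal: RDMA has an accept port but no GSource. */
#define MIG_LISTEN_TAG_RDMA_FAKE ((unsigned)-1)

static unsigned listen_tag;   /* 0: no accept port */
static unsigned removed_tag;  /* last tag the stub below was given */

/* Hypothetical stub for g_source_remove(); records instead of removing. */
static void remove_source(unsigned tag)
{
    /* Neither "no port" nor the RDMA sentinel may reach the real remover. */
    assert(tag != 0 && tag != MIG_LISTEN_TAG_RDMA_FAKE);
    removed_tag = tag;
}

static bool detach_listen(void)
{
    if (listen_tag == 0) {
        return false;                  /* nothing to detach */
    }
    if (listen_tag != MIG_LISTEN_TAG_RDMA_FAKE) {
        remove_source(listen_tag);     /* a real watch: remove it */
    }
    /* For RDMA, the accept port is assumed to be detached already. */
    listen_tag = 0;
    return true;
}

/* Any non-zero tag, real or fake, means migrate_incoming must refuse. */
static bool has_accept_port(void)
{
    return listen_tag != 0;
}
```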
It's worth noting that RDMA doesn't work with postcopy yet anyway
(although I now have some ideas about how we could fix that).
Dave
> Would this work?
>
> >
> > Dave
> >
> > > +
> > > +    /*
> > > +     * Destroy the existing listening task, if any. Logically this
> > > +     * should not happen at all (for both deferred migration and
> > > +     * postcopy migration, the listening task should already have
> > > +     * been detached), so report an error, but still detach it safely.
> > > +     */
> > > +    if (migrate_incoming_detach_listen(mis)) {
> > > +        error_report("%s: detected existing listen channel, "
> > > +                     "while it should not exist", __func__);
> > > +        /* Continue */
> > >      }
> > >
> > >      qemu_start_incoming_migration(uri, &local_err);
> > > @@ -1307,8 +1328,6 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
> > >          error_propagate(errp, local_err);
> > >          return;
> > >      }
> > > -
> > > -    once = false;
> > >  }
> > >
> > >  bool migration_is_blocked(Error **errp)
> > >
> > > bool migration_is_blocked(Error **errp)
> > > diff --git a/migration/socket.c b/migration/socket.c
> > > index 6ee51ef..e3e453f 100644
> > > --- a/migration/socket.c
> > > +++ b/migration/socket.c
> > > @@ -154,7 +154,7 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
> > >  out:
> > >      /* Close listening socket as its no longer needed */
> > >      qio_channel_close(ioc, NULL);
> > > -    return FALSE; /* unregister */
> > > +    return TRUE; /* keep it registered */
> > >  }
> > >
> > >
> > > --
> > > 2.7.4
> > >
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> --
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK