From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Laurent Vivier <lvivier@redhat.com>,
"Daniel P . Berrange" <berrange@redhat.com>,
Alexey Perevalov <a.perevalov@samsung.com>,
Juan Quintela <quintela@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
Date: Tue, 10 Oct 2017 13:30:18 +0100 [thread overview]
Message-ID: <20171010123017.GE2132@work-vm> (raw)
In-Reply-To: <20171010093801.GC20686@pxdev.xzpeter.org>
* Peter Xu (peterx@redhat.com) wrote:
> On Mon, Oct 09, 2017 at 07:58:13PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Thu, Sep 21, 2017 at 08:29:03PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > When there is IO error on the incoming channel (e.g., network down),
> > > > > instead of bailing out immediately, we allow the dst vm to switch to the
> > > > > new POSTCOPY_PAUSE state. Currently it is still simple - it waits the
> > > > > new semaphore, until someone poke it for another attempt.
> > > > >
> > > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > > ---
> > > > > migration/migration.c | 1 +
> > > > > migration/migration.h | 3 +++
> > > > > migration/savevm.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++--
> > > > > migration/trace-events | 2 ++
> > > > > 4 files changed, 64 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > index 8d26ea8..80de212 100644
> > > > > --- a/migration/migration.c
> > > > > +++ b/migration/migration.c
> > > > > @@ -146,6 +146,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
> > > > > memset(&mis_current, 0, sizeof(MigrationIncomingState));
> > > > > qemu_mutex_init(&mis_current.rp_mutex);
> > > > > qemu_event_init(&mis_current.main_thread_load_event, false);
> > > > > + qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
> > > > > once = true;
> > > > > }
> > > > > return &mis_current;
> > > > > diff --git a/migration/migration.h b/migration/migration.h
> > > > > index 0c957c9..c423682 100644
> > > > > --- a/migration/migration.h
> > > > > +++ b/migration/migration.h
> > > > > @@ -60,6 +60,9 @@ struct MigrationIncomingState {
> > > > > /* The coroutine we should enter (back) after failover */
> > > > > Coroutine *migration_incoming_co;
> > > > > QemuSemaphore colo_incoming_sem;
> > > > > +
> > > > > + /* notify PAUSED postcopy incoming migrations to try to continue */
> > > > > + QemuSemaphore postcopy_pause_sem_dst;
> > > > > };
> > > > >
> > > > > MigrationIncomingState *migration_incoming_get_current(void);
> > > > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > > > index 7172f14..3777124 100644
> > > > > --- a/migration/savevm.c
> > > > > +++ b/migration/savevm.c
> > > > > @@ -1488,8 +1488,8 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
> > > > > */
> > > > > static void *postcopy_ram_listen_thread(void *opaque)
> > > > > {
> > > > > - QEMUFile *f = opaque;
> > > > > MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > + QEMUFile *f = mis->from_src_file;
> > > > > int load_res;
> > > > >
> > > > > migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> > > > > @@ -1503,6 +1503,14 @@ static void *postcopy_ram_listen_thread(void *opaque)
> > > > > */
> > > > > qemu_file_set_blocking(f, true);
> > > > > load_res = qemu_loadvm_state_main(f, mis);
> > > > > +
> > > > > + /*
> > > > > + * This is tricky, but, mis->from_src_file can change after it
> > > > > + * returns, when postcopy recovery happened. In the future, we may
> > > > > + * want a wrapper for the QEMUFile handle.
> > > > > + */
> > > > > + f = mis->from_src_file;
> > > > > +
> > > > > /* And non-blocking again so we don't block in any cleanup */
> > > > > qemu_file_set_blocking(f, false);
> > > > >
> > > > > @@ -1581,7 +1589,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
> > > > > /* Start up the listening thread and wait for it to signal ready */
> > > > > qemu_sem_init(&mis->listen_thread_sem, 0);
> > > > > qemu_thread_create(&mis->listen_thread, "postcopy/listen",
> > > > > - postcopy_ram_listen_thread, mis->from_src_file,
> > > > > + postcopy_ram_listen_thread, NULL,
> > > > > QEMU_THREAD_DETACHED);
> > > > > qemu_sem_wait(&mis->listen_thread_sem);
> > > > > qemu_sem_destroy(&mis->listen_thread_sem);
> > > > > @@ -1966,11 +1974,44 @@ void qemu_loadvm_state_cleanup(void)
> > > > > }
> > > > > }
> > > > >
> > > > > +/* Return true if we should continue the migration, or false. */
> > > > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> > > > > +{
> > > > > + trace_postcopy_pause_incoming();
> > > > > +
> > > > > + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > > > + MIGRATION_STATUS_POSTCOPY_PAUSED);
> > > > > +
> > > > > + assert(mis->from_src_file);
> > > > > + qemu_file_shutdown(mis->from_src_file);
> > > > > + qemu_fclose(mis->from_src_file);
> > > > > + mis->from_src_file = NULL;
> > > > > +
> > > > > + assert(mis->to_src_file);
> > > > > + qemu_mutex_lock(&mis->rp_mutex);
> > > > > + qemu_file_shutdown(mis->to_src_file);
> > > >
> > > > Should you not do the shutdown() before the lock?
> > > > For example if the other thread is stuck, with rp_mutex
> > > > held, trying to write to to_src_file, then you'll block
> > > > waiting for the mutex. If you call shutdown and then take
> > > > the lock, the other thread will error and release the lock.
> > >
> > > The problem is that IMHO QEMUFile is not yet thread-safe itself. So
> > > if we operate on it (even to shut it down) logically we need to have
> > > the lock, right?
> >
> > That probably needs fixing for 'shutdown' under the assumption that
> > a) No one has or is deleting/freeing the QEMUFile
> > b) No one is closing the QEMUFile
> >
> > The whole point of using shutdown() is it forces any stuck send()'s or
> > read()'s to fail rather than staying stuck.
>
> I see. I just noticed that actually qemu_file_shutdown() is
> thread-safe itself - it boils down to the system shutdown() call (as
> long as the above assumption is there).
>
> Let me call qemu_file_shutdown() first before taking the lock to make
> sure send()/recv() hang won't happen.
>
> >
> > > Then, IMHO the question would be: when will the send() be stuck in the
> > > other thread?
> > >
> > > Normally the only case I can think of is that source didn't recv()
> > > fast enough, and we even consumed all the write buffer in dst side (I
> > > don't really know how kernel manages the buffers though, and e.g. how
> > > the size of buffer is defined...).
> > >
> > > But when reach here, the channel (say, from_src_file and to_src_file,
> > > since both of them are using the same channel behind the QEMUFile
> > > interface) should already be broken in some way, then IIUC even there
> > > is a send() in the other thread, it should return at some point with a
> > > failure as well, just like how we reached here (possibly due to a
> > > read() failure).
> >
> > We have to be careful about this; a network can fail in a way it
> > gets stuck rather than fails - this can get stuck until a full TCP
> > disconnection; and that takes about 30mins (from memory).
> > The nice thing about using 'shutdown' is that you can kill the existing
> > connection if it's hung. (Which then makes an interesting question;
> > the rules in your migrate-incoming command become different if you
> > want to declare it's failed!). Having said that, you're right that at
> > this point stuff has already failed - so do we need the shutdown?
> > (You might want to do the shutdown as part of the recovery earlier
> > or as a separate command to force the failure)
>
> I assume if I call shutdown before the lock then we'll be good then.
The question is what happens if you only allow recovery if we're already
in postcopy-paused state; in the case of a hung socket, since no IO has
actually failed yet, you will still be in postcopy-active.
Dave
> >
> > > > I'm not quite sure what will happen if we end up calling this
> > > > before the main thread has been returned from postcopy and the
> > > > device loading is complete.
> > >
> > > IIUC you mean the time starts from when we got MIG_CMD_PACKAGED until
> > > main thread finishes handling that package?
> >
> > Yes.
> >
> > > Normally I think that should not matter much since during handling the
> > > package it should hardly fail (we were reading from a buffer QIO
> > > channel, no real IOs there)...
> >
> > Note that while the main thread is reading the package, the listener
> > thread is receiving pages, so you can legally get a failure at that
> > point when the fd fails as it's receiving pages at the same time
> > as reading the devices.
> > (There's an argument that if it fails before you've received all
> > your devices then perhaps you can just restart the source)
>
> Yes.
>
> >
> > > But I agree about the reasoning. How
> > > about one more patch to postpone the "active" to "postcopy-active"
> > > state change after the package is handled correctly? Like:
> > >
> > > --------------
> > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > index b5c3214034..8317b2a7e2 100644
> > > --- a/migration/savevm.c
> > > +++ b/migration/savevm.c
> > > @@ -1573,8 +1573,6 @@ static void *postcopy_ram_listen_thread(void *opaque)
> > > QEMUFile *f = mis->from_src_file;
> > > int load_res;
> > >
> > > - migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> > > - MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > qemu_sem_post(&mis->listen_thread_sem);
> > > trace_postcopy_ram_listen_thread_start();
> > >
> > > @@ -1817,6 +1815,9 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
> > > qemu_fclose(packf);
> > > object_unref(OBJECT(bioc));
> > >
> > > + migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> > > + MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > +
> > > return ret;
> > > }
> > > --------------
> > >
> > > This function will only be called with "postcopy-active" state.
> >
> > I *think* that's safe; you've got to be careful, but I can't see
> > anyone on the destination that cares about the destinction.
>
> Indeed, but I'd say that's the best thing I can think of (and the
> simplest). Even, not sure whether it'll be more clear if we set
> postcopy-active state right before starting the VM on destination,
> say, at the beginning of loadvm_postcopy_handle_run_bh().
>
> >
> > > > Also, at this point have we guaranteed no one else is about
> > > > to do an op on mis->to_src_file and will seg?
> > >
> > > I think no? Since IMHO the main thread is playing with the buffer QIO
> > > channel, rather than the real one?
> >
> > OK.
> >
> > > (btw, could I ask what's "seg"? :)
> >
> > just short for segmentation fault; sig 11.
>
> I see. Thanks!
>
> >
> > Dave
> >
> > > --
> > > Peter Xu
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> --
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
next prev parent reply other threads:[~2017-10-10 12:30 UTC|newest]
Thread overview: 86+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-30 8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
2017-08-30 8:31 ` [Qemu-devel] [RFC v2 01/33] bitmap: remove BITOP_WORD() Peter Xu
2017-09-20 8:41 ` Juan Quintela
2017-08-30 8:31 ` [Qemu-devel] [RFC v2 02/33] bitmap: introduce bitmap_count_one() Peter Xu
2017-09-20 8:25 ` Juan Quintela
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 03/33] bitmap: provide to_le/from_le helpers Peter Xu
2017-09-21 17:35 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 04/33] migration: dump str in migrate_set_state trace Peter Xu
2017-09-06 14:36 ` Dr. David Alan Gilbert
2017-09-20 8:44 ` Juan Quintela
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 05/33] migration: better error handling with QEMUFile Peter Xu
2017-09-21 17:51 ` Dr. David Alan Gilbert
2017-09-26 8:48 ` Peter Xu
2017-09-26 8:53 ` Dr. David Alan Gilbert
2017-09-26 9:13 ` Peter Xu
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 06/33] migration: reuse mis->userfault_quit_fd Peter Xu
2017-09-20 8:47 ` Juan Quintela
2017-09-20 9:06 ` Juan Quintela
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 07/33] migration: provide postcopy_fault_thread_notify() Peter Xu
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 08/33] migration: new postcopy-pause state Peter Xu
2017-09-21 17:57 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 09/33] migration: implement "postcopy-pause" src logic Peter Xu
2017-09-21 19:21 ` Dr. David Alan Gilbert
2017-09-26 9:35 ` Peter Xu
2017-10-09 15:32 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy Peter Xu
2017-09-21 19:29 ` Dr. David Alan Gilbert
2017-09-27 7:34 ` Peter Xu
2017-10-09 18:58 ` Dr. David Alan Gilbert
2017-10-10 9:38 ` Peter Xu
2017-10-10 11:31 ` Peter Xu
2017-10-31 18:57 ` Dr. David Alan Gilbert
2017-10-10 12:30 ` Dr. David Alan Gilbert [this message]
2017-10-11 3:00 ` Peter Xu
2017-10-12 12:19 ` Dr. David Alan Gilbert
2017-10-13 5:08 ` Peter Xu
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 11/33] migration: allow src return path to pause Peter Xu
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 12/33] migration: allow send_rq to fail Peter Xu
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 13/33] migration: allow fault thread to pause Peter Xu
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 14/33] qmp: hmp: add migrate "resume" option Peter Xu
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 15/33] migration: pass MigrationState to migrate_init() Peter Xu
2017-09-22 9:09 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 16/33] migration: rebuild channel on source Peter Xu
2017-09-22 9:56 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 17/33] migration: new state "postcopy-recover" Peter Xu
2017-09-22 10:08 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 18/33] migration: wakeup dst ram-load-thread for recover Peter Xu
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 19/33] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 20/33] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
2017-09-22 11:05 ` Dr. David Alan Gilbert
2017-09-27 10:04 ` Peter Xu
2017-10-09 19:12 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 21/33] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
2017-09-22 11:08 ` Dr. David Alan Gilbert
2017-09-27 10:11 ` Peter Xu
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 22/33] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
2017-09-22 11:13 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 23/33] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
2017-09-22 11:17 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 24/33] migration: synchronize dirty bitmap for resume Peter Xu
2017-09-22 11:33 ` Dr. David Alan Gilbert
2017-09-28 2:30 ` Peter Xu
2017-10-02 11:04 ` Dr. David Alan Gilbert
2017-10-09 3:55 ` Peter Xu
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 25/33] migration: setup ramstate " Peter Xu
2017-09-22 11:53 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 26/33] migration: final handshake for the resume Peter Xu
2017-09-22 11:56 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 27/33] migration: free SocketAddress where allocated Peter Xu
2017-09-22 20:08 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 28/33] migration: return incoming task tag for sockets Peter Xu
2017-09-22 20:11 ` Dr. David Alan Gilbert
2017-09-28 3:12 ` Peter Xu
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 29/33] migration: return incoming task tag for exec Peter Xu
2017-09-22 20:15 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 30/33] migration: return incoming task tag for fd Peter Xu
2017-09-22 20:15 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 31/33] migration: store listen task tag Peter Xu
2017-09-22 20:17 ` Dr. David Alan Gilbert
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 32/33] migration: allow migrate_incoming for paused VM Peter Xu
2017-09-22 20:32 ` Dr. David Alan Gilbert
2017-09-28 6:54 ` Peter Xu
2017-10-09 17:28 ` Dr. David Alan Gilbert
2017-10-10 10:08 ` Peter Xu
2017-08-30 8:32 ` [Qemu-devel] [RFC v2 33/33] migration: init dst in migration_object_init too Peter Xu
2017-09-22 20:37 ` Dr. David Alan Gilbert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171010123017.GE2132@work-vm \
--to=dgilbert@redhat.com \
--cc=a.perevalov@samsung.com \
--cc=aarcange@redhat.com \
--cc=berrange@redhat.com \
--cc=lvivier@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).