qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Laurent Vivier <lvivier@redhat.com>,
	"Daniel P . Berrange" <berrange@redhat.com>,
	Alexey Perevalov <a.perevalov@samsung.com>,
	Juan Quintela <quintela@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
Date: Mon, 9 Oct 2017 19:58:13 +0100	[thread overview]
Message-ID: <20171009185812.GR2374@work-vm> (raw)
In-Reply-To: <20170927073441.GD25011@pxdev.xzpeter.org>

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Sep 21, 2017 at 08:29:03PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > When there is IO error on the incoming channel (e.g., network down),
> > > instead of bailing out immediately, we allow the dst vm to switch to the
> > > new POSTCOPY_PAUSE state. Currently it is still simple - it waits the
> > > new semaphore, until someone poke it for another attempt.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  migration/migration.c  |  1 +
> > >  migration/migration.h  |  3 +++
> > >  migration/savevm.c     | 60 ++++++++++++++++++++++++++++++++++++++++++++++++--
> > >  migration/trace-events |  2 ++
> > >  4 files changed, 64 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 8d26ea8..80de212 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -146,6 +146,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
> > >          memset(&mis_current, 0, sizeof(MigrationIncomingState));
> > >          qemu_mutex_init(&mis_current.rp_mutex);
> > >          qemu_event_init(&mis_current.main_thread_load_event, false);
> > > +        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
> > >          once = true;
> > >      }
> > >      return &mis_current;
> > > diff --git a/migration/migration.h b/migration/migration.h
> > > index 0c957c9..c423682 100644
> > > --- a/migration/migration.h
> > > +++ b/migration/migration.h
> > > @@ -60,6 +60,9 @@ struct MigrationIncomingState {
> > >      /* The coroutine we should enter (back) after failover */
> > >      Coroutine *migration_incoming_co;
> > >      QemuSemaphore colo_incoming_sem;
> > > +
> > > +    /* notify PAUSED postcopy incoming migrations to try to continue */
> > > +    QemuSemaphore postcopy_pause_sem_dst;
> > >  };
> > >  
> > >  MigrationIncomingState *migration_incoming_get_current(void);
> > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > index 7172f14..3777124 100644
> > > --- a/migration/savevm.c
> > > +++ b/migration/savevm.c
> > > @@ -1488,8 +1488,8 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
> > >   */
> > >  static void *postcopy_ram_listen_thread(void *opaque)
> > >  {
> > > -    QEMUFile *f = opaque;
> > >      MigrationIncomingState *mis = migration_incoming_get_current();
> > > +    QEMUFile *f = mis->from_src_file;
> > >      int load_res;
> > >  
> > >      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> > > @@ -1503,6 +1503,14 @@ static void *postcopy_ram_listen_thread(void *opaque)
> > >       */
> > >      qemu_file_set_blocking(f, true);
> > >      load_res = qemu_loadvm_state_main(f, mis);
> > > +
> > > +    /*
> > > +     * This is tricky, but, mis->from_src_file can change after it
> > > +     * returns, when postcopy recovery happened. In the future, we may
> > > +     * want a wrapper for the QEMUFile handle.
> > > +     */
> > > +    f = mis->from_src_file;
> > > +
> > >      /* And non-blocking again so we don't block in any cleanup */
> > >      qemu_file_set_blocking(f, false);
> > >  
> > > @@ -1581,7 +1589,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
> > >      /* Start up the listening thread and wait for it to signal ready */
> > >      qemu_sem_init(&mis->listen_thread_sem, 0);
> > >      qemu_thread_create(&mis->listen_thread, "postcopy/listen",
> > > -                       postcopy_ram_listen_thread, mis->from_src_file,
> > > +                       postcopy_ram_listen_thread, NULL,
> > >                         QEMU_THREAD_DETACHED);
> > >      qemu_sem_wait(&mis->listen_thread_sem);
> > >      qemu_sem_destroy(&mis->listen_thread_sem);
> > > @@ -1966,11 +1974,44 @@ void qemu_loadvm_state_cleanup(void)
> > >      }
> > >  }
> > >  
> > > +/* Return true if we should continue the migration, or false. */
> > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> > > +{
> > > +    trace_postcopy_pause_incoming();
> > > +
> > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> > > +
> > > +    assert(mis->from_src_file);
> > > +    qemu_file_shutdown(mis->from_src_file);
> > > +    qemu_fclose(mis->from_src_file);
> > > +    mis->from_src_file = NULL;
> > > +
> > > +    assert(mis->to_src_file);
> > > +    qemu_mutex_lock(&mis->rp_mutex);
> > > +    qemu_file_shutdown(mis->to_src_file);
> > 
> > Should you not do the shutdown() before the lock?
> > For example if the other thread is stuck, with rp_mutex
> > held, trying to write to to_src_file, then you'll block
> > waiting for the mutex.  If you call shutdown and then take
> > the lock, the other thread will error and release the lock.
> 
> The problem is that IMHO QEMUFile is not yet thread-safe itself.  So
> if we operate on it (even to shut it down) logically we need to have
> the lock, right?

That probably needs fixing for 'shutdown' under the assumption that
   a) No one has or is deleting/freeing the QEMUFile
   b) No one is closing the QEMUFile

The whole point of using shutdown() is it forces any stuck send()'s or
read()'s to fail rather than staying stuck.

> Then, IMHO the question would be: when will the send() be stuck in the
> other thread?
> 
> Normally the only case I can think of is that source didn't recv()
> fast enough, and we even consumed all the write buffer in dst side (I
> don't really know how kernel manages the buffers though, and e.g. how
> the size of buffer is defined...).
> 
> But when reach here, the channel (say, from_src_file and to_src_file,
> since both of them are using the same channel behind the QEMUFile
> interface) should already be broken in some way, then IIUC even there
> is a send() in the other thread, it should return at some point with a
> failure as well, just like how we reached here (possibly due to a
> read() failure).

We have to be careful about this; a network can fail in a way it
gets stuck rather than fails - this can get stuck until a full TCP
disconnection; and that takes about 30mins (from memory).
The nice thing about using 'shutdown' is that you can kill the existing
connection if it's hung. (Which then makes an interesting question;
the rules in your migrate-incoming command become different if you
want to declare it's failed!).  Having said that, you're right that at
this point stuff has already failed - so do we need the shutdown?
(You might want to do the shutdown as part of the recovery earlier
or as a separate command to force the failure)

> > I'm not quite sure what will happen if we end up calling this
> > before the main thread has been returned from postcopy and the
> > device loading is complete.
> 
> IIUC you mean the time starts from when we got MIG_CMD_PACKAGED until
> main thread finishes handling that package?

Yes.

> Normally I think that should not matter much since during handling the
> package it should hardly fail (we were reading from a buffer QIO
> channel, no real IOs there)... 

Note that while the main thread is reading the package, the listener
thread is receiving pages, so you can legally get a failure at that
point when the fd fails as it's receiving pages at the same time
as reading the devices.
(There's an argument that if it fails before you've received all
your devices then perhaps you can just restart the source)

> But I agree about the reasoning.  How
> about one more patch to postpone the "active" to "postcopy-active"
> state change after the package is handled correctly?  Like:
> 
> --------------
> diff --git a/migration/savevm.c b/migration/savevm.c                     
> index b5c3214034..8317b2a7e2 100644 
> --- a/migration/savevm.c            
> +++ b/migration/savevm.c            
> @@ -1573,8 +1573,6 @@ static void *postcopy_ram_listen_thread(void *opaque)                                                                       
>      QEMUFile *f = mis->from_src_file;                                   
>      int load_res;                  
>                                     
> -    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,             
> -                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);   
>      qemu_sem_post(&mis->listen_thread_sem);                             
>      trace_postcopy_ram_listen_thread_start();                           
>                                     
> @@ -1817,6 +1815,9 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)                                                          
>      qemu_fclose(packf);            
>      object_unref(OBJECT(bioc));    
>                                     
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,             
> +                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);   
> +                                   
>      return ret;                    
>  }                                  
> --------------
> 
> This function will only be called with "postcopy-active" state.

I *think* that's safe; you've got to be careful, but I can't see
anyone on the destination that cares about the destinction.

> > Also, at this point have we guaranteed no one else is about
> > to do an op on mis->to_src_file and will seg?
> 
> I think no?  Since IMHO the main thread is playing with the buffer QIO
> channel, rather than the real one?

OK.

> (btw, could I ask what's "seg"? :)

just short for segmentation fault; sig 11.

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

  reply	other threads:[~2017-10-09 18:58 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
2017-08-30  8:31 ` [Qemu-devel] [RFC v2 01/33] bitmap: remove BITOP_WORD() Peter Xu
2017-09-20  8:41   ` Juan Quintela
2017-08-30  8:31 ` [Qemu-devel] [RFC v2 02/33] bitmap: introduce bitmap_count_one() Peter Xu
2017-09-20  8:25   ` Juan Quintela
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 03/33] bitmap: provide to_le/from_le helpers Peter Xu
2017-09-21 17:35   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 04/33] migration: dump str in migrate_set_state trace Peter Xu
2017-09-06 14:36   ` Dr. David Alan Gilbert
2017-09-20  8:44   ` Juan Quintela
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 05/33] migration: better error handling with QEMUFile Peter Xu
2017-09-21 17:51   ` Dr. David Alan Gilbert
2017-09-26  8:48     ` Peter Xu
2017-09-26  8:53       ` Dr. David Alan Gilbert
2017-09-26  9:13         ` Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 06/33] migration: reuse mis->userfault_quit_fd Peter Xu
2017-09-20  8:47   ` Juan Quintela
2017-09-20  9:06   ` Juan Quintela
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 07/33] migration: provide postcopy_fault_thread_notify() Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 08/33] migration: new postcopy-pause state Peter Xu
2017-09-21 17:57   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 09/33] migration: implement "postcopy-pause" src logic Peter Xu
2017-09-21 19:21   ` Dr. David Alan Gilbert
2017-09-26  9:35     ` Peter Xu
2017-10-09 15:32       ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy Peter Xu
2017-09-21 19:29   ` Dr. David Alan Gilbert
2017-09-27  7:34     ` Peter Xu
2017-10-09 18:58       ` Dr. David Alan Gilbert [this message]
2017-10-10  9:38         ` Peter Xu
2017-10-10 11:31           ` Peter Xu
2017-10-31 18:57             ` Dr. David Alan Gilbert
2017-10-10 12:30           ` Dr. David Alan Gilbert
2017-10-11  3:00             ` Peter Xu
2017-10-12 12:19               ` Dr. David Alan Gilbert
2017-10-13  5:08                 ` Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 11/33] migration: allow src return path to pause Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 12/33] migration: allow send_rq to fail Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 13/33] migration: allow fault thread to pause Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 14/33] qmp: hmp: add migrate "resume" option Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 15/33] migration: pass MigrationState to migrate_init() Peter Xu
2017-09-22  9:09   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 16/33] migration: rebuild channel on source Peter Xu
2017-09-22  9:56   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 17/33] migration: new state "postcopy-recover" Peter Xu
2017-09-22 10:08   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 18/33] migration: wakeup dst ram-load-thread for recover Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 19/33] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 20/33] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
2017-09-22 11:05   ` Dr. David Alan Gilbert
2017-09-27 10:04     ` Peter Xu
2017-10-09 19:12       ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 21/33] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
2017-09-22 11:08   ` Dr. David Alan Gilbert
2017-09-27 10:11     ` Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 22/33] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
2017-09-22 11:13   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 23/33] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
2017-09-22 11:17   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 24/33] migration: synchronize dirty bitmap for resume Peter Xu
2017-09-22 11:33   ` Dr. David Alan Gilbert
2017-09-28  2:30     ` Peter Xu
2017-10-02 11:04       ` Dr. David Alan Gilbert
2017-10-09  3:55         ` Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 25/33] migration: setup ramstate " Peter Xu
2017-09-22 11:53   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 26/33] migration: final handshake for the resume Peter Xu
2017-09-22 11:56   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 27/33] migration: free SocketAddress where allocated Peter Xu
2017-09-22 20:08   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 28/33] migration: return incoming task tag for sockets Peter Xu
2017-09-22 20:11   ` Dr. David Alan Gilbert
2017-09-28  3:12     ` Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 29/33] migration: return incoming task tag for exec Peter Xu
2017-09-22 20:15   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 30/33] migration: return incoming task tag for fd Peter Xu
2017-09-22 20:15   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 31/33] migration: store listen task tag Peter Xu
2017-09-22 20:17   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 32/33] migration: allow migrate_incoming for paused VM Peter Xu
2017-09-22 20:32   ` Dr. David Alan Gilbert
2017-09-28  6:54     ` Peter Xu
2017-10-09 17:28       ` Dr. David Alan Gilbert
2017-10-10 10:08         ` Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 33/33] migration: init dst in migration_object_init too Peter Xu
2017-09-22 20:37   ` Dr. David Alan Gilbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171009185812.GR2374@work-vm \
    --to=dgilbert@redhat.com \
    --cc=a.perevalov@samsung.com \
    --cc=aarcange@redhat.com \
    --cc=berrange@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).