qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Laurent Vivier <lvivier@redhat.com>,
	Alexey Perevalov <a.perevalov@samsung.com>,
	Juan Quintela <quintela@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed
Date: Fri, 4 Aug 2017 10:52:27 +0100	[thread overview]
Message-ID: <20170804095226.GE2805@work-vm> (raw)
In-Reply-To: <20170804085216.GO5561@pxdev.xzpeter.org>

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Aug 03, 2017 at 02:54:35PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > Firstly, MigThrError enumeration is introduced to describe the error in
> > > migration_detect_error() better. This gives the migration_thread() a
> > > chance to know whether a recovery has happened.
> > > 
> > > Then, if a recovery is detected, migration_thread() will reset its local
> > > variables to prepare for that.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  migration/migration.c | 40 +++++++++++++++++++++++++++++-----------
> > >  1 file changed, 29 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index ecebe30..439bc22 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -2159,6 +2159,15 @@ static bool postcopy_should_start(MigrationState *s)
> > >      return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
> > >  }
> > >  
> > > +typedef enum MigThrError {
> > > +    /* No error detected */
> > > +    MIG_THR_ERR_NONE = 0,
> > > +    /* Detected error, but resumed successfully */
> > > +    MIG_THR_ERR_RECOVERED = 1,
> > > +    /* Detected fatal error, need to exit */
> > > +    MIG_THR_ERR_FATAL = 2,
> > > +} MigThrError;
> > > +
> > 
> > Could you move this patch earlier to when postcopy_pause is created
> > so it's created with this enum?
> 
> Sure.
> 
> [...]
> 
> > > @@ -2319,6 +2327,7 @@ static void *migration_thread(void *opaque)
> > >      /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> > >      enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
> > >      bool enable_colo = migrate_colo_enabled();
> > > +    MigThrError thr_error;
> > >  
> > >      rcu_register_thread();
> > >  
> > > @@ -2395,8 +2404,17 @@ static void *migration_thread(void *opaque)
> > >           * Try to detect any kind of failures, and see whether we
> > >           * should stop the migration now.
> > >           */
> > > -        if (migration_detect_error(s)) {
> > > +        thr_error = migration_detect_error(s);
> > > +        if (thr_error == MIG_THR_ERR_FATAL) {
> > > +            /* Stop migration */
> > >              break;
> > > +        } else if (thr_error == MIG_THR_ERR_RECOVERED) {
> > > +            /*
> > > +             * Just recovered from a e.g. network failure, reset all
> > > +             * the local variables.
> > > +             */
> > > +            initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > +            initial_bytes = 0;
> > 
> > They don't seem that important to reset?
> 
> The problem is that we have this in migration_thread():
> 
>         if (current_time >= initial_time + BUFFER_DELAY) {
>             uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
>                                          initial_bytes;
>             uint64_t time_spent = current_time - initial_time;
>             double bandwidth = (double)transferred_bytes / time_spent;
>             threshold_size = bandwidth * s->parameters.downtime_limit;
>             ...
>         }
> 
> Here qemu_ftell() would possibly be very small since we have just
> resumed... and then transferred_bytes will be extremely huge since
> "qemu_ftell(s->to_dst_file) - initial_bytes" is actually negative...
> Then, with luck, we'll got extremely huge "bandwidth" as well.

Ah yes that's a good reason to reset it then; add a comment like
'important to avoid breaking transferred_bytes and bandwidth
calculation'

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

  reply	other threads:[~2017-08-04  9:52 UTC|newest]

Thread overview: 116+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap Peter Xu
2017-07-31 16:34   ` Dr. David Alan Gilbert
2017-08-01  2:11     ` Peter Xu
2017-08-01  5:48       ` Alexey Perevalov
2017-08-01  6:02         ` Peter Xu
2017-08-01  6:12           ` Alexey Perevalov
2017-07-28  8:06 ` [Qemu-devel] [RFC 02/29] migration: fix comment disorder in RAMState Peter Xu
2017-07-31 16:39   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling Peter Xu
2017-07-31 16:53   ` Dr. David Alan Gilbert
2017-08-01  2:25     ` Peter Xu
2017-08-01  8:32       ` Daniel P. Berrange
2017-08-01  8:55         ` Dr. David Alan Gilbert
2017-08-02  3:21           ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 04/29] bitmap: introduce bitmap_invert() Peter Xu
2017-07-31 17:11   ` Dr. David Alan Gilbert
2017-08-01  2:43     ` Peter Xu
2017-08-01  8:40       ` Dr. David Alan Gilbert
2017-08-02  3:20         ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 05/29] bitmap: introduce bitmap_count_one() Peter Xu
2017-07-31 17:58   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 06/29] migration: dump str in migrate_set_state trace Peter Xu
2017-07-31 18:27   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 07/29] migration: better error handling with QEMUFile Peter Xu
2017-07-31 18:39   ` Dr. David Alan Gilbert
2017-08-01  5:49     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 08/29] migration: reuse mis->userfault_quit_fd Peter Xu
2017-07-31 18:42   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 09/29] migration: provide postcopy_fault_thread_notify() Peter Xu
2017-07-31 18:45   ` Dr. David Alan Gilbert
2017-08-01  3:01     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 10/29] migration: new property "x-postcopy-fast" Peter Xu
2017-07-31 18:52   ` Dr. David Alan Gilbert
2017-08-01  3:13     ` Peter Xu
2017-08-01  8:50       ` Dr. David Alan Gilbert
2017-08-02  3:31         ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state Peter Xu
2017-07-28 15:53   ` Eric Blake
2017-07-31  7:02     ` Peter Xu
2017-07-31 19:06   ` Dr. David Alan Gilbert
2017-08-01  6:28     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy Peter Xu
2017-08-01  9:47   ` Dr. David Alan Gilbert
2017-08-02  5:06     ` Peter Xu
2017-08-03 14:03       ` Dr. David Alan Gilbert
2017-08-04  3:43         ` Peter Xu
2017-08-04  9:33           ` Dr. David Alan Gilbert
2017-08-04  9:44             ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 13/29] migration: allow src return path to pause Peter Xu
2017-08-01 10:01   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 14/29] migration: allow send_rq to fail Peter Xu
2017-08-01 10:30   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 15/29] migration: allow fault thread to pause Peter Xu
2017-08-01 10:41   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option Peter Xu
2017-07-28 15:57   ` Eric Blake
2017-07-31  7:05     ` Peter Xu
2017-08-01 10:42   ` Dr. David Alan Gilbert
2017-08-01 11:03   ` Daniel P. Berrange
2017-08-02  5:56     ` Peter Xu
2017-08-02  9:28       ` Daniel P. Berrange
2017-07-28  8:06 ` [Qemu-devel] [RFC 17/29] migration: rebuild channel on source Peter Xu
2017-08-01 10:59   ` Dr. David Alan Gilbert
2017-08-02  6:14     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 18/29] migration: new state "postcopy-recover" Peter Xu
2017-08-01 11:36   ` Dr. David Alan Gilbert
2017-08-02  6:42     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 19/29] migration: let dst listen on port always Peter Xu
2017-08-01 10:56   ` Daniel P. Berrange
2017-08-02  7:02     ` Peter Xu
2017-08-02  9:26       ` Daniel P. Berrange
2017-08-02 11:02         ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 20/29] migration: wakeup dst ram-load-thread for recover Peter Xu
2017-08-03  9:28   ` Dr. David Alan Gilbert
2017-08-04  5:46     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 21/29] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
2017-08-03  9:49   ` Dr. David Alan Gilbert
2017-08-04  6:08     ` Peter Xu
2017-08-04  6:15       ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
2017-08-03 10:50   ` Dr. David Alan Gilbert
2017-08-04  6:59     ` Peter Xu
2017-08-04  9:49       ` Dr. David Alan Gilbert
2017-08-07  6:11         ` Peter Xu
2017-08-07  9:04           ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
2017-08-03 11:05   ` Dr. David Alan Gilbert
2017-08-04  7:04     ` Peter Xu
2017-08-04  7:09       ` Peter Xu
2017-08-04  8:30         ` Dr. David Alan Gilbert
2017-08-04  9:22           ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 24/29] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
2017-08-03 11:21   ` Dr. David Alan Gilbert
2017-08-04  7:23     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 25/29] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
2017-08-03 11:38   ` Dr. David Alan Gilbert
2017-08-04  7:39     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 26/29] migration: synchronize dirty bitmap for resume Peter Xu
2017-08-03 11:56   ` Dr. David Alan Gilbert
2017-08-04  7:49     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 27/29] migration: setup ramstate " Peter Xu
2017-08-03 12:37   ` Dr. David Alan Gilbert
2017-08-04  8:39     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 28/29] migration: final handshake for the resume Peter Xu
2017-08-03 13:47   ` Dr. David Alan Gilbert
2017-08-04  9:05     ` Peter Xu
2017-08-04  9:53       ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed Peter Xu
2017-08-03 13:54   ` Dr. David Alan Gilbert
2017-08-04  8:52     ` Peter Xu
2017-08-04  9:52       ` Dr. David Alan Gilbert [this message]
2017-08-07  6:57         ` Peter Xu
2017-07-28 10:06 ` [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
2017-08-03 15:57 ` Dr. David Alan Gilbert
2017-08-21  7:47   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170804095226.GE2805@work-vm \
    --to=dgilbert@redhat.com \
    --cc=a.perevalov@samsung.com \
    --cc=aarcange@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).