All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Laurent Vivier <lvivier@redhat.com>,
	Alexey Perevalov <a.perevalov@samsung.com>,
	Juan Quintela <quintela@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [Qemu-devel] [RFC 18/29] migration: new state "postcopy-recover"
Date: Tue, 1 Aug 2017 12:36:22 +0100	[thread overview]
Message-ID: <20170801113622.GK2079@work-vm> (raw)
In-Reply-To: <1501229198-30588-19-git-send-email-peterx@redhat.com>

* Peter Xu (peterx@redhat.com) wrote:
> Introducing new migration state "postcopy-recover". If a migration
> procedure is paused and the connection is rebuilt afterward
> successfully, we'll switch the source VM state from "postcopy-paused" to
> the new state "postcopy-recover", then we'll do the resume logic in the
> migration thread (along with the return path thread).
> 
> This patch only do the state switch on source side. Another following up
> patch will handle the state switching on destination side using the same
> status bit.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c | 45 +++++++++++++++++++++++++++++++++++++++++----
>  qapi-schema.json      |  4 +++-
>  2 files changed, 44 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 64de0ee..3aabe11 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -495,6 +495,7 @@ static bool migration_is_setup_or_active(int state)
>      case MIGRATION_STATUS_ACTIVE:
>      case MIGRATION_STATUS_POSTCOPY_ACTIVE:
>      case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
>      case MIGRATION_STATUS_SETUP:
>          return true;
>  
> @@ -571,6 +572,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>      case MIGRATION_STATUS_CANCELLING:
>      case MIGRATION_STATUS_POSTCOPY_ACTIVE:
>      case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
>           /* TODO add some postcopy stats */
>          info->has_status = true;
>          info->has_total_time = true;
> @@ -2018,6 +2020,13 @@ static bool postcopy_should_start(MigrationState *s)
>      return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
>  }
>  
> +/* Return zero if success, or <0 for error */
> +static int postcopy_do_resume(MigrationState *s)
> +{
> +    /* TODO: do the resume logic */
> +    return 0;
> +}
> +
>  /*
>   * We don't return until we are in a safe state to continue current
>   * postcopy migration.  Returns true to continue the migration, or
> @@ -2026,7 +2035,9 @@ static bool postcopy_should_start(MigrationState *s)
>  static bool postcopy_pause(MigrationState *s)
>  {
>      assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> -    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +
> +do_pause:
> +    migrate_set_state(&s->state, s->state,
>                        MIGRATION_STATUS_POSTCOPY_PAUSED);
>  
>      /* Current channel is possibly broken. Release it. */
> @@ -2043,9 +2054,32 @@ static bool postcopy_pause(MigrationState *s)
>          qemu_sem_wait(&s->postcopy_pause_sem);
>      }
>  
> -    trace_postcopy_pause_continued();
> +    if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
> +        /* We were waken up by a recover procedure. Give it a shot */
>  
> -    return true;
> +        /*
> +         * Firstly, let's wake up the return path now, with a new
> +         * return path channel.
> +         */
> +        qemu_sem_post(&s->postcopy_pause_rp_sem);
> +
> +        /* Do the resume logic */
> +        if (postcopy_do_resume(s) == 0) {
> +            /* Let's continue! */
> +            trace_postcopy_pause_continued();
> +            return true;
> +        } else {
> +            /*
> +             * Something wrong happened during the recovery, let's
> +             * pause again. Pause is always better than throwing data
> +             * away.
> +             */
> +            goto do_pause;

You should be able to turn this around into a do {} while or similar
rather than goto.

Dave

> +        }
> +    } else {
> +        /* This is not right... Time to quit. */
> +        return false;
> +    }
>  }
>  
>  /* Return true if we want to stop the migration, otherwise false. */
> @@ -2300,7 +2334,10 @@ void migrate_fd_connect(MigrationState *s)
>      }
>  
>      if (resume) {
> -        /* TODO: do the resume logic */
> +        /* Wakeup the main migration thread to do the recovery */
> +        migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
> +                          MIGRATION_STATUS_POSTCOPY_RECOVER);
> +        qemu_sem_post(&s->postcopy_pause_sem);
>          return;
>      }
>  
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 27b7c4c..10f1f60 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -669,6 +669,8 @@
>  #
>  # @postcopy-paused: during postcopy but paused. (since 2.10)
>  #
> +# @postcopy-recover: trying to recover from a paused postcopy. (since 2.11)
> +#
>  # @completed: migration is finished.
>  #
>  # @failed: some error occurred during migration process.
> @@ -682,7 +684,7 @@
>  { 'enum': 'MigrationStatus',
>    'data': [ 'none', 'setup', 'cancelling', 'cancelled',
>              'active', 'postcopy-active', 'postcopy-paused',
> -            'completed', 'failed', 'colo' ] }
> +            'postcopy-recover', 'completed', 'failed', 'colo' ] }
>  
>  ##
>  # @MigrationInfo:
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

  reply	other threads:[~2017-08-01 11:36 UTC|newest]

Thread overview: 116+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap Peter Xu
2017-07-31 16:34   ` Dr. David Alan Gilbert
2017-08-01  2:11     ` Peter Xu
2017-08-01  5:48       ` Alexey Perevalov
2017-08-01  6:02         ` Peter Xu
2017-08-01  6:12           ` Alexey Perevalov
2017-07-28  8:06 ` [Qemu-devel] [RFC 02/29] migration: fix comment disorder in RAMState Peter Xu
2017-07-31 16:39   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling Peter Xu
2017-07-31 16:53   ` Dr. David Alan Gilbert
2017-08-01  2:25     ` Peter Xu
2017-08-01  8:32       ` Daniel P. Berrange
2017-08-01  8:55         ` Dr. David Alan Gilbert
2017-08-02  3:21           ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 04/29] bitmap: introduce bitmap_invert() Peter Xu
2017-07-31 17:11   ` Dr. David Alan Gilbert
2017-08-01  2:43     ` Peter Xu
2017-08-01  8:40       ` Dr. David Alan Gilbert
2017-08-02  3:20         ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 05/29] bitmap: introduce bitmap_count_one() Peter Xu
2017-07-31 17:58   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 06/29] migration: dump str in migrate_set_state trace Peter Xu
2017-07-31 18:27   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 07/29] migration: better error handling with QEMUFile Peter Xu
2017-07-31 18:39   ` Dr. David Alan Gilbert
2017-08-01  5:49     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 08/29] migration: reuse mis->userfault_quit_fd Peter Xu
2017-07-31 18:42   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 09/29] migration: provide postcopy_fault_thread_notify() Peter Xu
2017-07-31 18:45   ` Dr. David Alan Gilbert
2017-08-01  3:01     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 10/29] migration: new property "x-postcopy-fast" Peter Xu
2017-07-31 18:52   ` Dr. David Alan Gilbert
2017-08-01  3:13     ` Peter Xu
2017-08-01  8:50       ` Dr. David Alan Gilbert
2017-08-02  3:31         ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state Peter Xu
2017-07-28 15:53   ` Eric Blake
2017-07-31  7:02     ` Peter Xu
2017-07-31 19:06   ` Dr. David Alan Gilbert
2017-08-01  6:28     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy Peter Xu
2017-08-01  9:47   ` Dr. David Alan Gilbert
2017-08-02  5:06     ` Peter Xu
2017-08-03 14:03       ` Dr. David Alan Gilbert
2017-08-04  3:43         ` Peter Xu
2017-08-04  9:33           ` Dr. David Alan Gilbert
2017-08-04  9:44             ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 13/29] migration: allow src return path to pause Peter Xu
2017-08-01 10:01   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 14/29] migration: allow send_rq to fail Peter Xu
2017-08-01 10:30   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 15/29] migration: allow fault thread to pause Peter Xu
2017-08-01 10:41   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option Peter Xu
2017-07-28 15:57   ` Eric Blake
2017-07-31  7:05     ` Peter Xu
2017-08-01 10:42   ` Dr. David Alan Gilbert
2017-08-01 11:03   ` Daniel P. Berrange
2017-08-02  5:56     ` Peter Xu
2017-08-02  9:28       ` Daniel P. Berrange
2017-07-28  8:06 ` [Qemu-devel] [RFC 17/29] migration: rebuild channel on source Peter Xu
2017-08-01 10:59   ` Dr. David Alan Gilbert
2017-08-02  6:14     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 18/29] migration: new state "postcopy-recover" Peter Xu
2017-08-01 11:36   ` Dr. David Alan Gilbert [this message]
2017-08-02  6:42     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 19/29] migration: let dst listen on port always Peter Xu
2017-08-01 10:56   ` Daniel P. Berrange
2017-08-02  7:02     ` Peter Xu
2017-08-02  9:26       ` Daniel P. Berrange
2017-08-02 11:02         ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 20/29] migration: wakeup dst ram-load-thread for recover Peter Xu
2017-08-03  9:28   ` Dr. David Alan Gilbert
2017-08-04  5:46     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 21/29] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
2017-08-03  9:49   ` Dr. David Alan Gilbert
2017-08-04  6:08     ` Peter Xu
2017-08-04  6:15       ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
2017-08-03 10:50   ` Dr. David Alan Gilbert
2017-08-04  6:59     ` Peter Xu
2017-08-04  9:49       ` Dr. David Alan Gilbert
2017-08-07  6:11         ` Peter Xu
2017-08-07  9:04           ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
2017-08-03 11:05   ` Dr. David Alan Gilbert
2017-08-04  7:04     ` Peter Xu
2017-08-04  7:09       ` Peter Xu
2017-08-04  8:30         ` Dr. David Alan Gilbert
2017-08-04  9:22           ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 24/29] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
2017-08-03 11:21   ` Dr. David Alan Gilbert
2017-08-04  7:23     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 25/29] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
2017-08-03 11:38   ` Dr. David Alan Gilbert
2017-08-04  7:39     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 26/29] migration: synchronize dirty bitmap for resume Peter Xu
2017-08-03 11:56   ` Dr. David Alan Gilbert
2017-08-04  7:49     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 27/29] migration: setup ramstate " Peter Xu
2017-08-03 12:37   ` Dr. David Alan Gilbert
2017-08-04  8:39     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 28/29] migration: final handshake for the resume Peter Xu
2017-08-03 13:47   ` Dr. David Alan Gilbert
2017-08-04  9:05     ` Peter Xu
2017-08-04  9:53       ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed Peter Xu
2017-08-03 13:54   ` Dr. David Alan Gilbert
2017-08-04  8:52     ` Peter Xu
2017-08-04  9:52       ` Dr. David Alan Gilbert
2017-08-07  6:57         ` Peter Xu
2017-07-28 10:06 ` [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
2017-08-03 15:57 ` Dr. David Alan Gilbert
2017-08-21  7:47   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170801113622.GK2079@work-vm \
    --to=dgilbert@redhat.com \
    --cc=a.perevalov@samsung.com \
    --cc=aarcange@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.