qemu-devel.nongnu.org archive mirror
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: andrey.shinkevich@virtuozzo.com, qemu-devel@nongnu.org,
	quintela@redhat.com
Subject: Re: [PATCH v2 11/22] migration/savevm: don't worry if bitmap migration postcopy failed
Date: Mon, 17 Feb 2020 16:57:00 +0000
Message-ID: <20200217165700.GO3434@work-vm>
In-Reply-To: <20200217150246.29180-12-vsementsov@virtuozzo.com>

* Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> First, if only bitmaps postcopy is enabled (and not RAM postcopy),
> postcopy_pause_incoming crashes on the assertion assert(mis->to_src_file).
> 
> In any case, bitmaps postcopy is not designed to be recovered.
> The original idea is instead that if bitmaps postcopy fails, we just
> lose some bitmaps, which is not critical. So, on failure we just need
> to remove the unfinished bitmaps, and the guest should continue execution
> on the destination.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/savevm.c | 37 ++++++++++++++++++++++++++++++++-----
>  1 file changed, 32 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 1d4220ece8..7e9dd58ccb 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1812,6 +1812,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
>      MigrationIncomingState *mis = migration_incoming_get_current();
>      QEMUFile *f = mis->from_src_file;
>      int load_res;
> +    MigrationState *migr = migrate_get_current();
> +
> +    object_ref(OBJECT(migr));
>  
>      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>                                     MIGRATION_STATUS_POSTCOPY_ACTIVE);
> @@ -1838,11 +1841,24 @@ static void *postcopy_ram_listen_thread(void *opaque)
>  
>      trace_postcopy_ram_listen_thread_exit();
>      if (load_res < 0) {
> -        error_report("%s: loadvm failed: %d", __func__, load_res);
>          qemu_file_set_error(f, load_res);
> -        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> -                                       MIGRATION_STATUS_FAILED);
> -    } else {
> +        dirty_bitmap_mig_cancel_incoming();
> +        if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> +            !migrate_postcopy_ram() && migrate_dirty_bitmaps())
> +        {
> +            error_report("%s: loadvm failed during postcopy: %d. All state is "
> +                         "migrated except for dirty bitmaps. Some dirty "
> +                         "bitmaps may be lost, and present migrated dirty "
> +                         "bitmaps are correctly migrated and valid.",
> +                         __func__, load_res);
> +            load_res = 0; /* prevent further exit() */
> +        } else {
> +            error_report("%s: loadvm failed: %d", __func__, load_res);
> +            migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +                                           MIGRATION_STATUS_FAILED);
> +        }
> +    }
> +    if (load_res >= 0) {
>          /*
>           * This looks good, but it's possible that the device loading in the
>           * main thread hasn't finished yet, and so we might not be in 'RUN'
> @@ -1878,6 +1894,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
>      mis->have_listen_thread = false;
>      postcopy_state_set(POSTCOPY_INCOMING_END);
>  
> +    object_unref(OBJECT(migr));
> +
>      return NULL;
>  }
>  
> @@ -2429,6 +2447,8 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
>  {
>      trace_postcopy_pause_incoming();
>  
> +    assert(migrate_postcopy_ram());
> +
>      /* Clear the triggered bit to allow one recovery */
>      mis->postcopy_recover_triggered = false;
>  
> @@ -2513,15 +2533,22 @@ out:
>      if (ret < 0) {
>          qemu_file_set_error(f, ret);
>  
> +        /* Cancel bitmaps incoming regardless of recovery */
> +        dirty_bitmap_mig_cancel_incoming();
> +
>          /*
>           * If we are during an active postcopy, then we pause instead
>           * of bail out to at least keep the VM's dirty data.  Note
>           * that POSTCOPY_INCOMING_LISTENING stage is still not enough,
>           * during which we're still receiving device states and we
>           * still haven't yet started the VM on destination.
> +         *
> +         * Only RAM postcopy supports recovery. Still, if RAM postcopy is
> +         * enabled, canceled bitmaps postcopy will not affect RAM postcopy
> +         * recovering.
>           */
>          if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> -            postcopy_pause_incoming(mis)) {
> +            migrate_postcopy_ram() && postcopy_pause_incoming(mis)) {
>              /* Reset f to point to the newly created channel */
>              f = mis->from_src_file;
>              goto retry;
> -- 
> 2.21.0
> 
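For readers following the two error paths touched above, the decisions the
patch makes on a load failure can be modelled as two small predicates. This
is only a sketch under stated assumptions, not QEMU code: struct
incoming_cfg, enum listen_outcome, listen_thread_outcome() and
may_pause_for_recovery() are hypothetical names, and the three booleans
stand in for postcopy_state_get() == POSTCOPY_INCOMING_RUNNING,
migrate_postcopy_ram() and migrate_dirty_bitmaps().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the incoming-side configuration; not a QEMU type. */
struct incoming_cfg {
    bool postcopy_running; /* models postcopy_state_get() == POSTCOPY_INCOMING_RUNNING */
    bool ram_postcopy;     /* models migrate_postcopy_ram() */
    bool dirty_bitmaps;    /* models migrate_dirty_bitmaps() */
};

/* Hypothetical outcomes of the listen thread after loading finishes. */
enum listen_outcome {
    LISTEN_OK,                       /* load succeeded */
    LISTEN_CONTINUE_WITHOUT_BITMAPS, /* failure tolerated, some bitmaps lost */
    LISTEN_FAILED                    /* migration is marked as failed */
};

/*
 * Models the branch added to postcopy_ram_listen_thread(): a failure is
 * tolerated only when postcopy is running, RAM postcopy is off and dirty
 * bitmap migration is on, because then only bitmap data was still in flight.
 */
static enum listen_outcome
listen_thread_outcome(const struct incoming_cfg *cfg, int load_res)
{
    if (load_res >= 0) {
        return LISTEN_OK;
    }
    if (cfg->postcopy_running && !cfg->ram_postcopy && cfg->dirty_bitmaps) {
        return LISTEN_CONTINUE_WITHOUT_BITMAPS;
    }
    return LISTEN_FAILED;
}

/*
 * Models the precondition the patch adds before postcopy_pause_incoming()
 * is called: pausing for recovery is attempted only with RAM postcopy.
 */
static bool may_pause_for_recovery(const struct incoming_cfg *cfg, int ret)
{
    return ret < 0 && cfg->postcopy_running && cfg->ram_postcopy;
}
```

In the patch itself, dirty_bitmap_mig_cancel_incoming() runs on both error
paths before either decision is taken; the model leaves that side effect
out and keeps only the branching.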
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




