qemu-devel.nongnu.org archive mirror
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, fam@euphon.net, vsementsov@virtuozzo.com,
	quintela@redhat.com, qemu-devel@nongnu.org, dgilbert@redhat.com,
	stefanha@redhat.com, andrey.shinkevich@virtuozzo.com,
	den@openvz.org, mreitz@redhat.com, jsnow@redhat.com
Subject: [PATCH v3 17/21] migration/savevm: don't worry if bitmap migration postcopy failed
Date: Fri, 24 Jul 2020 11:43:23 +0300
Message-ID: <20200724084327.15665-18-vsementsov@virtuozzo.com>
In-Reply-To: <20200724084327.15665-1-vsementsov@virtuozzo.com>

First, if only bitmaps postcopy is enabled (and not RAM postcopy),
postcopy_pause_incoming() crashes on the assertion
assert(mis->to_src_file).

Second, bitmaps postcopy is not designed to be recovered anyway. The
original idea is that if bitmaps postcopy fails, we just lose some
bitmaps, which is not critical. So on failure we only need to remove
the unfinished bitmaps, and the guest should continue execution on the
destination.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 migration/savevm.c | 37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)
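
The failure handling described in the commit message can be read as a
small decision table. The sketch below is a standalone model of it,
not QEMU code: MigrationModel, LoadOutcome, listen_thread_outcome()
and loadvm_error_should_pause() are hypothetical names introduced for
illustration only. The three booleans stand in for
postcopy_state_get() == POSTCOPY_INCOMING_RUNNING,
migrate_postcopy_ram() and migrate_dirty_bitmaps() as used in the diff
below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; not QEMU identifiers. */
typedef enum {
    OUTCOME_OK,                       /* load finished without error */
    OUTCOME_CONTINUE_WITHOUT_BITMAPS, /* drop unfinished bitmaps, guest keeps running */
    OUTCOME_FAILED                    /* migration is marked as failed */
} LoadOutcome;

/* Hypothetical snapshot of the three conditions the patch checks. */
typedef struct {
    bool incoming_running; /* postcopy state is POSTCOPY_INCOMING_RUNNING */
    bool postcopy_ram;     /* RAM postcopy is enabled */
    bool dirty_bitmaps;    /* dirty bitmap migration is enabled */
} MigrationModel;

/*
 * Models the error branch of the listen thread: a load error is
 * tolerated only when postcopy is already running and bitmaps are the
 * only thing still being postcopied.
 */
LoadOutcome listen_thread_outcome(const MigrationModel *m, int load_res)
{
    if (load_res >= 0) {
        return OUTCOME_OK;
    }
    if (m->incoming_running && !m->postcopy_ram && m->dirty_bitmaps) {
        return OUTCOME_CONTINUE_WITHOUT_BITMAPS;
    }
    return OUTCOME_FAILED;
}

/*
 * Models the condition guarding postcopy_pause_incoming() in the last
 * hunk: pausing for recovery is attempted only when RAM postcopy is
 * enabled, because only RAM postcopy supports recovery.
 */
bool loadvm_error_should_pause(const MigrationModel *m, int ret)
{
    assert(ret < 0); /* only meaningful on the error path */
    return m->incoming_running && m->postcopy_ram;
}
```

In this model, a bitmaps-only postcopy that hits a load error yields
OUTCOME_CONTINUE_WITHOUT_BITMAPS and never reaches the pause path, so
the assertion that this patch adds to postcopy_pause_incoming() would
not trigger for it.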

diff --git a/migration/savevm.c b/migration/savevm.c
index 45c9dd9d8a..a843d202b5 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1813,6 +1813,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
     MigrationIncomingState *mis = migration_incoming_get_current();
     QEMUFile *f = mis->from_src_file;
     int load_res;
+    MigrationState *migr = migrate_get_current();
+
+    object_ref(OBJECT(migr));
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                                    MIGRATION_STATUS_POSTCOPY_ACTIVE);
@@ -1839,11 +1842,24 @@ static void *postcopy_ram_listen_thread(void *opaque)
 
     trace_postcopy_ram_listen_thread_exit();
     if (load_res < 0) {
-        error_report("%s: loadvm failed: %d", __func__, load_res);
         qemu_file_set_error(f, load_res);
-        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
-                                       MIGRATION_STATUS_FAILED);
-    } else {
+        dirty_bitmap_mig_cancel_incoming();
+        if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
+            !migrate_postcopy_ram() && migrate_dirty_bitmaps())
+        {
+            error_report("%s: loadvm failed during postcopy: %d. All states "
+                         "are migrated except dirty bitmaps. Some dirty "
+                         "bitmaps may be lost, and present migrated dirty "
+                         "bitmaps are correctly migrated and valid.",
+                         __func__, load_res);
+            load_res = 0; /* prevent further exit() */
+        } else {
+            error_report("%s: loadvm failed: %d", __func__, load_res);
+            migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                                           MIGRATION_STATUS_FAILED);
+        }
+    }
+    if (load_res >= 0) {
         /*
          * This looks good, but it's possible that the device loading in the
          * main thread hasn't finished yet, and so we might not be in 'RUN'
@@ -1879,6 +1895,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
     mis->have_listen_thread = false;
     postcopy_state_set(POSTCOPY_INCOMING_END);
 
+    object_unref(OBJECT(migr));
+
     return NULL;
 }
 
@@ -2437,6 +2455,8 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
 {
     trace_postcopy_pause_incoming();
 
+    assert(migrate_postcopy_ram());
+
     /* Clear the triggered bit to allow one recovery */
     mis->postcopy_recover_triggered = false;
 
@@ -2521,15 +2541,22 @@ out:
     if (ret < 0) {
         qemu_file_set_error(f, ret);
 
+        /* Cancel bitmaps incoming regardless of recovery */
+        dirty_bitmap_mig_cancel_incoming();
+
         /*
          * If we are during an active postcopy, then we pause instead
          * of bail out to at least keep the VM's dirty data.  Note
          * that POSTCOPY_INCOMING_LISTENING stage is still not enough,
          * during which we're still receiving device states and we
          * still haven't yet started the VM on destination.
+         *
+         * Only RAM postcopy supports recovery. Still, if RAM postcopy is
+         * enabled, canceled bitmaps postcopy will not affect RAM postcopy
+         * recovering.
          */
         if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
-            postcopy_pause_incoming(mis)) {
+            migrate_postcopy_ram() && postcopy_pause_incoming(mis)) {
             /* Reset f to point to the newly created channel */
             f = mis->from_src_file;
             goto retry;
-- 
2.21.0


