qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: peterx@redhat.com
To: qemu-devel@nongnu.org, Peter Maydell <peter.maydell@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	peterx@redhat.com, Fabiano Rosas <farosas@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Prasad Pandit <ppandit@redhat.com>,
	Avihai Horon <avihaih@nvidia.com>
Subject: [PULL 01/34] migration: Don't serialize devices in qemu_savevm_state_iterate()
Date: Mon, 11 Mar 2024 17:58:52 -0400	[thread overview]
Message-ID: <20240311215925.40618-2-peterx@redhat.com> (raw)
In-Reply-To: <20240311215925.40618-1-peterx@redhat.com>

From: Avihai Horon <avihaih@nvidia.com>

Commit 90697be8896c ("live migration: Serialize vmstate saving in stage
2") introduced device serialization in qemu_savevm_state_iterate(). The
rationale behind it was to first complete migration of slower changing
block devices and only then migrate the RAM, to avoid sending fast
changing RAM pages over and over.

This commit was added a long time ago, and while it was useful back
then, it is not the case anymore:
1. Block migration is deprecated, see commit 66db46ca83b8 ("migration:
   Deprecate block migration").
2. Today there are other iterative devices besides RAM and block, such
   as VFIO, which are registered for migration after RAM. With current
   serialization behavior, a fast changing device can block other
   devices from sending their data, which may prevent migration from
   converging in some cases.

The issue described in item 2 was observed in several VFIO migration
scenarios with switchover-ack capability enabled, where some workload on
the VM prevented RAM from ever reaching a hard zero, thus blocking VFIO
initial pre-copy data from being sent. Hence, destination could not ack
switchover and migration could not converge.

Fix that by not serializing iterative devices in
qemu_savevm_state_iterate().

Note that this still doesn't fully prevent device starvation. As
correctly pointed out by Peter [1], a fast changing device might
constantly consume all allocated bandwidth and block the following
devices. However, this scenario is more likely to happen only if
max-bandwidth is low.

[1] https://lore.kernel.org/qemu-devel/Zd6iw9dBhW6wKNxx@x1n/

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240304105339.20713-2-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index dc1fb9c0d3..e84b26e1c8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1390,7 +1390,8 @@ int qemu_savevm_state_resume_prepare(MigrationState *s)
 int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
 {
     SaveStateEntry *se;
-    int ret = 1;
+    bool all_finished = true;
+    int ret;
 
     trace_savevm_state_iterate();
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
@@ -1431,16 +1432,12 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
                          "%d(%s): %d",
                          se->section_id, se->idstr, ret);
             qemu_file_set_error(f, ret);
-        }
-        if (ret <= 0) {
-            /* Do not proceed to the next vmstate before this one reported
-               completion of the current stage. This serializes the migration
-               and reduces the probability that a faster changing state is
-               synchronized over and over again. */
-            break;
+            return ret;
+        } else if (!ret) {
+            all_finished = false;
         }
     }
-    return ret;
+    return all_finished;
 }
 
 static bool should_send_vmdesc(void)
-- 
2.44.0



  reply	other threads:[~2024-03-11 22:03 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-11 21:58 [PULL 00/34] Migration 20240311 patches peterx
2024-03-11 21:58 ` peterx [this message]
2024-03-11 21:58 ` [PULL 02/34] vfio/migration: Refactor vfio_save_state() return value peterx
2024-03-11 21:58 ` [PULL 03/34] vfio/migration: Add a note about migration rate limiting peterx
2024-03-11 21:58 ` [PULL 04/34] migration/ram: add additional check peterx
2024-03-11 21:58 ` [PULL 05/34] migration: Report error when shutdown fails peterx
2024-03-11 21:58 ` [PULL 06/34] migration: Remove SaveStateHandler and LoadStateHandler typedefs peterx
2024-03-11 21:58 ` [PULL 07/34] migration: Add documentation for SaveVMHandlers peterx
2024-03-11 21:58 ` [PULL 08/34] migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error peterx
2024-03-11 21:59 ` [PULL 09/34] migration/multifd: Don't fsync when closing QIOChannelFile peterx
2024-03-11 21:59 ` [PULL 10/34] migration/rdma: Fix a memory issue for migration peterx
2024-03-11 21:59 ` [PULL 11/34] physmem: Rename addr1 to more informative mr_addr in flatview_read/write() and similar peterx
2024-03-11 21:59 ` [PULL 12/34] physmem: Reduce local variable scope in flatview_read/write_continue() peterx
2024-03-11 21:59 ` [PULL 13/34] physmem: Factor out body of flatview_read/write_continue() loop peterx
2024-03-11 21:59 ` [PULL 14/34] physmem: Fix wrong address in large address_space_read/write_cached_slow() peterx
2024-03-11 21:59 ` [PULL 15/34] migration: Fix format in error message peterx
2024-03-11 21:59 ` [PULL 16/34] migration: export fewer options peterx
2024-03-11 21:59 ` [PULL 17/34] migration: remove migration.h references peterx
2024-03-11 21:59 ` [PULL 18/34] migration: export migration_is_setup_or_active peterx
2024-03-11 21:59 ` [PULL 19/34] migration: export migration_is_active peterx
2024-03-11 21:59 ` [PULL 20/34] migration: export migration_is_running peterx
2024-03-11 21:59 ` [PULL 21/34] migration: export vcpu_dirty_limit_period peterx
2024-03-11 21:59 ` [PULL 22/34] migration: migration_thread_is_self peterx
2024-03-11 21:59 ` [PULL 23/34] migration: migration_is_device peterx
2024-03-11 21:59 ` [PULL 24/34] migration: migration_file_set_error peterx
2024-03-11 21:59 ` [PULL 25/34] migration: privatize colo interfaces peterx
2024-03-11 21:59 ` [PULL 26/34] migration: delete unused accessors peterx
2024-03-11 21:59 ` [PULL 27/34] migration: purge MigrationState from public interface peterx
2024-03-11 21:59 ` [PULL 28/34] migration/multifd: Allow zero pages in file migration peterx
2024-03-11 21:59 ` [PULL 29/34] migration/multifd: Allow clearing of the file_bmap from multifd peterx
2024-03-11 21:59 ` [PULL 30/34] migration/multifd: Add new migration option zero-page-detection peterx
2024-03-11 21:59 ` [PULL 31/34] migration/multifd: Implement zero page transmission on the multifd thread peterx
2024-03-11 21:59 ` [PULL 32/34] migration/multifd: Implement ram_save_target_page_multifd to handle multifd version of MigrationOps::ram_save_target_page peterx
2024-03-11 21:59 ` [PULL 33/34] migration/multifd: Enable multifd zero page checking by default peterx
2024-03-11 21:59 ` [PULL 34/34] migration/multifd: Add new migration test cases for legacy zero page checking peterx
2024-03-12 13:07 ` [PULL 00/34] Migration 20240311 patches Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240311215925.40618-2-peterx@redhat.com \
    --to=peterx@redhat.com \
    --cc=avihaih@nvidia.com \
    --cc=david@redhat.com \
    --cc=farosas@suse.de \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=ppandit@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).