From: Juan Quintela <quintela@redhat.com>
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>,
	Fam Zheng <fam@euphon.net>, Cleber Rosa <crosa@redhat.com>,
	Eric Blake <eblake@redhat.com>,
	Li Zhijian <lizhijian@fujitsu.com>, Peter Xu <peterx@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	John Snow <jsnow@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	Leonardo Bras <leobras@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	Fabiano Rosas <farosas@suse.de>, Thomas Huth <thuth@redhat.com>,
	Fiona Ebner <f.ebner@proxmox.com>
Subject: [PULL 12/38] migration: hold the BQL during setup
Date: Mon, 16 Oct 2023 12:06:40 +0200
Message-ID: <20231016100706.2551-13-quintela@redhat.com>
In-Reply-To: <20231016100706.2551-1-quintela@redhat.com>

From: Fiona Ebner <f.ebner@proxmox.com>

This is intended to be a semantic revert of commit 9b09503752
("migration: run setup callbacks out of big lock"). There have been so
many changes since that commit (e.g. a new setup callback
dirty_bitmap_save_setup() that also needs to be adapted now) that it's
easier to do the revert manually.

For snapshots, the bdrv_writev_vmstate() function is used during setup
(in the QIOChannelBlock backing the QEMUFile), but calling it without
holding the BQL can lead to an assertion failure. To understand how,
first note the following:

1. Generated coroutine wrappers for block layer functions spawn the
coroutine and use AIO_WAIT_WHILE()/aio_poll() to wait for it (see the
simplified sketch below).
2. If the host OS switches threads at an inconvenient time, it can
happen that a bottom half scheduled for the main thread's AioContext
is executed as part of a vCPU thread's aio_poll().
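
To make point 1 concrete, here is a simplified, hypothetical sketch of
the pattern such a generated wrapper follows (the real wrappers are
emitted by scripts/block-coroutine-wrapper.py and differ in detail;
BdrvVmstateCo and the _entry helper are made-up names for
illustration):

int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
                        int64_t pos)
{
    if (qemu_in_coroutine()) {
        /* Already in coroutine context: call the coroutine version. */
        return bdrv_co_writev_vmstate(bs, qiov, pos);
    } else {
        /* Spawn the coroutine, then poll until it has a result. */
        BdrvVmstateCo s = {
            .bs = bs, .qiov = qiov, .pos = pos, .ret = -EINPROGRESS,
        };
        Coroutine *co = qemu_coroutine_create(bdrv_writev_vmstate_entry,
                                              &s);

        qemu_coroutine_enter(co);
        /*
         * AIO_WAIT_WHILE() contains the assertion discussed below:
         * when not polled from the context's home thread, it requires
         * qemu_get_current_aio_context() == qemu_get_aio_context().
         */
        AIO_WAIT_WHILE(bdrv_get_aio_context(bs), s.ret == -EINPROGRESS);
        return s.ret;
    }
}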

An example leading to the assertion failure is as follows:

main thread:
1. A snapshot-save QMP command gets issued.
2. snapshot_save_job_bh() is scheduled.

vCPU thread:
3. aio_poll() for the main thread's AioContext is called (e.g. when
the guest writes to a pflash device, as part of blk_pwrite, which is a
generated coroutine wrapper).
4. snapshot_save_job_bh() is executed as part of aio_poll().
5. qemu_savevm_state() is called.
6. qemu_mutex_unlock_iothread() is called. Now
qemu_get_current_aio_context() returns 0x0.
7. bdrv_writev_vmstate() is executed during the usual savevm setup
via qemu_fflush(). But this function is a generated coroutine wrapper,
so it uses AIO_WAIT_WHILE(). There, the assertion
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
will fail.

To fix it, ensure that the BQL is held during setup. While it would
only be needed for snapshots, adapting migration too avoids additional
logic for conditional locking/unlocking in the setup callbacks.
Writing the header could (in theory) also trigger qemu_fflush() and
thus bdrv_writev_vmstate(), so the locked section also covers the
qemu_savevm_state_header() call, even for migration, for consistency.

The section around multifd_send_sync_main() needs to be unlocked to
avoid a deadlock. In particular, multifd_save_setup() calls
socket_send_channel_create() with multifd_new_send_channel_async() as
a callback and then waits for the callback to signal via the
channels_ready semaphore. The connection happens via
qio_task_run_in_thread(), but the callback is only executed via
qio_task_thread_result(), which is scheduled on the main event loop.
Without unlocking the section, the main thread would never get to
process the task result and run the callback, so there would be no
signal via the channels_ready semaphore.
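
For illustration, a rough sketch of the dependency that would deadlock
if the BQL stayed held across this section (hypothetical pseudo-code;
only the function and semaphore names mentioned above are real, the
control flow is condensed):

/* migration thread (hypothetically still holding the BQL): */
qemu_mutex_lock_iothread();
multifd_save_setup();  /* calls socket_send_channel_create() and then
                        * waits on the channels_ready semaphore for
                        * multifd_new_send_channel_async() to run */

/* main loop thread: */
/* qio_task_thread_result() schedules multifd_new_send_channel_async()
 * on the main event loop, but dispatching it needs the BQL, which the
 * migration thread holds while waiting on channels_ready, so the
 * semaphore is never posted -> deadlock. */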

The comment in ram_init_bitmaps() was introduced by 4987783400
("migration: fix incorrect memory_global_dirty_log_start outside BQL")
and is removed because it referred to the qemu_mutex_lock_iothread()
call.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231013105839.415989-1-f.ebner@proxmox.com>
---
 include/migration/register.h   | 2 +-
 migration/block-dirty-bitmap.c | 3 ---
 migration/block.c              | 5 -----
 migration/migration.c          | 6 ++++++
 migration/ram.c                | 6 +++---
 migration/savevm.c             | 2 --
 6 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 2b12c6adec..fed1d04a3c 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -25,6 +25,7 @@ typedef struct SaveVMHandlers {
      * used to perform early checks.
      */
     int (*save_prepare)(void *opaque, Error **errp);
+    int (*save_setup)(QEMUFile *f, void *opaque);
     void (*save_cleanup)(void *opaque);
     int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
     int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
@@ -50,7 +51,6 @@ typedef struct SaveVMHandlers {
     int (*save_live_iterate)(QEMUFile *f, void *opaque);
 
     /* This runs outside the iothread lock!  */
-    int (*save_setup)(QEMUFile *f, void *opaque);
     /* Note for save_live_pending:
      * must_precopy:
      * - must be migrated in precopy or in stopped state
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 032fc5f405..03cb2e72ee 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1214,9 +1214,7 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
     DBMSaveState *s = &((DBMState *)opaque)->save;
     SaveBitmapState *dbms = NULL;
 
-    qemu_mutex_lock_iothread();
     if (init_dirty_bitmap_migration(s) < 0) {
-        qemu_mutex_unlock_iothread();
         return -1;
     }
 
@@ -1224,7 +1222,6 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
         send_bitmap_start(f, s, dbms);
     }
     qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
-    qemu_mutex_unlock_iothread();
     return 0;
 }
 
diff --git a/migration/block.c b/migration/block.c
index 5f930870a5..7cf70c1066 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -729,18 +729,13 @@ static int block_save_setup(QEMUFile *f, void *opaque)
     trace_migration_block_save("setup", block_mig_state.submitted,
                                block_mig_state.transferred);
 
-    qemu_mutex_lock_iothread();
     ret = init_blk_migration(f);
     if (ret < 0) {
-        qemu_mutex_unlock_iothread();
         return ret;
     }
 
     /* start track dirty blocks */
     ret = set_dirty_tracking();
-
-    qemu_mutex_unlock_iothread();
-
     if (ret) {
         return ret;
     }
diff --git a/migration/migration.c b/migration/migration.c
index 98151b1424..79fa11e3f6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3020,7 +3020,9 @@ static void *migration_thread(void *opaque)
     object_ref(OBJECT(s));
     update_iteration_initial_status(s);
 
+    qemu_mutex_lock_iothread();
     qemu_savevm_state_header(s->to_dst_file);
+    qemu_mutex_unlock_iothread();
 
     /*
      * If we opened the return path, we need to make sure dst has it
@@ -3048,7 +3050,9 @@ static void *migration_thread(void *opaque)
         qemu_savevm_send_colo_enable(s->to_dst_file);
     }
 
+    qemu_mutex_lock_iothread();
     qemu_savevm_state_setup(s->to_dst_file);
+    qemu_mutex_unlock_iothread();
 
     qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
                                MIGRATION_STATUS_ACTIVE);
@@ -3159,8 +3163,10 @@ static void *bg_migration_thread(void *opaque)
     ram_write_tracking_prepare();
 #endif
 
+    qemu_mutex_lock_iothread();
     qemu_savevm_state_header(s->to_dst_file);
     qemu_savevm_state_setup(s->to_dst_file);
+    qemu_mutex_unlock_iothread();
 
     qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
                                MIGRATION_STATUS_ACTIVE);
diff --git a/migration/ram.c b/migration/ram.c
index e8df4dc862..d3d9c8b65b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2891,8 +2891,6 @@ static void migration_bitmap_clear_discarded_pages(RAMState *rs)
 
 static void ram_init_bitmaps(RAMState *rs)
 {
-    /* For memory_global_dirty_log_start below.  */
-    qemu_mutex_lock_iothread();
     qemu_mutex_lock_ramlist();
 
     WITH_RCU_READ_LOCK_GUARD() {
@@ -2904,7 +2902,6 @@ static void ram_init_bitmaps(RAMState *rs)
         }
     }
     qemu_mutex_unlock_ramlist();
-    qemu_mutex_unlock_iothread();
 
     /*
      * After an eventual first bitmap sync, fixup the initial bitmap
@@ -3067,7 +3064,10 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
     migration_ops = g_malloc0(sizeof(MigrationOps));
     migration_ops->ram_save_target_page = ram_save_target_page_legacy;
+
+    qemu_mutex_unlock_iothread();
     ret = multifd_send_sync_main(f);
+    qemu_mutex_lock_iothread();
     if (ret < 0) {
         return ret;
     }
diff --git a/migration/savevm.c b/migration/savevm.c
index bce698b0af..8622f229e5 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1672,10 +1672,8 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
     }
     ms->to_dst_file = f;
 
-    qemu_mutex_unlock_iothread();
     qemu_savevm_state_header(f);
     qemu_savevm_state_setup(f);
-    qemu_mutex_lock_iothread();
 
     while (qemu_file_get_error(f) == 0) {
         if (qemu_savevm_state_iterate(f, false) > 0) {
-- 
2.41.0


