From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org, berrange@redhat.com, armbru@redhat.com,
Claudio Fontana <cfontana@suse.de>
Subject: Re: [PATCH v4 22/34] migration/multifd: Prepare multifd sync for fixed-ram migration
Date: Tue, 27 Feb 2024 11:52:47 +0800
Message-ID: <Zd1cj4jkIpUktu6k@x1n>
In-Reply-To: <87y1b6alej.fsf@suse.de>

On Mon, Feb 26, 2024 at 07:52:20PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Tue, Feb 20, 2024 at 07:41:26PM -0300, Fabiano Rosas wrote:
> >> The fixed-ram migration can be performed live or non-live, but it is
> >> always asynchronous, i.e. the source machine and the destination
> >> machine are not migrating at the same time. We only need some pieces
> >> of the multifd sync operations.
> >>
> >> multifd_send_sync_main()
> >> ------------------------
> >> Issued by the ram migration code on the migration thread, causes the
> >> multifd send channels to synchronize with the migration thread and
> >> makes the sending side emit a packet with the MULTIFD_FLUSH flag.
> >>
> >> With fixed-ram we want to maintain the sync on the sending side
> >> because that provides ordering between the rounds of dirty pages when
> >> migrating live.
> >>
> >> MULTIFD_FLUSH
> >> -------------
> >> On the receiving side, the presence of the MULTIFD_FLUSH flag on a
> >> packet causes the receiving channels to start synchronizing with the
> >> main thread.
> >>
> >> We're not using packets with fixed-ram, so there's no MULTIFD_FLUSH
> >> flag and therefore no channel sync on the receiving side.
> >>
> >> multifd_recv_sync_main()
> >> ------------------------
> >> Issued by the migration thread when the ram migration flag
> >> RAM_SAVE_FLAG_MULTIFD_FLUSH is received, causes the migration thread
> >> on the receiving side to start synchronizing with the recv
> >> channels. Due to compatibility, this is also issued when
> >> RAM_SAVE_FLAG_EOS is received.
> >>
> >> For fixed-ram we only need to synchronize the channels at the end of
> >> migration to avoid doing cleanup before the channels have finished
> >> their IO.
> >>
> >> Make sure the multifd syncs are only issued at the appropriate
> >> times. Note that due to pre-existing backward compatibility issues, we
> >> have the multifd_flush_after_each_section property that enables an
> >> older behavior of synchronizing channels more frequently (and
> >> inefficiently). Fixed-ram should always run with that property
> >> disabled (default).
> >
> > What if the user enables multifd_flush_after_each_section=true?
> >
> > IMHO we don't necessarily need to attach the fixed-ram loading flush to any
> > flag in the stream. For fixed-ram IIUC all the loads will happen in one
> > shot of ram_load() anyway when parsing the ramblock list, so.. how about we
> > decouple the fixed-ram load flush from the stream by always doing a sync in
> > ram_load() unconditionally?
>
> I would like to. But it's not possible because ram_load() is called once
> per section. So once for each EOS flag on the stream. We'll have at
> least two calls to ram_load(), once due to qemu_savevm_state_iterate()
> and another due to qemu_savevm_state_complete_precopy().
>
> The fact that fixed-ram can use just one load doesn't change the fact
> that we perform more than one "save". So we'll need to use the FLUSH
> flag in this case unfortunately.
After I re-read it, I found one more issue.

The recv side sync is currently "once and for all": it doesn't allow
multifd_recv_sync_main() to be invoked a second time, because the channels
only sync once, right before they quit. IMHO that makes the code much
harder to maintain, and we'd need a rich comment to explain why it happens
that way.
Ideally any "sync main" for recv threads should be callable multiple
times, and IMHO that's not really hard to achieve. It can actually make
the code much cleaner, by sharing some logic between the socket-based and
file-based paths in that regard.
I played with your branch and came up with something like the below, just
to show what I mean. It makes all the new fixed-ram tests pass here, while
allowing "sync main" on the recv side to be re-entrant, sharing the logic
with the socket-based path as much as possible:
=====
diff --git a/migration/multifd.c b/migration/multifd.c
index a0202b5661..28480f6cfe 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -86,10 +86,8 @@ struct {
/* number of created threads */
int count;
/*
- * For sockets: this is posted once for each MULTIFD_FLAG_SYNC flag.
- *
- * For files: this is only posted at the end of the file load to mark
- * completion of the load process.
+ * This is always posted by the recv threads, the main thread uses it
+ * to wait for recv threads to finish assigned tasks.
*/
QemuSemaphore sem_sync;
/* global number of generated multifd packets */
@@ -1316,38 +1314,55 @@ void multifd_recv_cleanup(void)
multifd_recv_cleanup_state();
}
-
-/*
- * Wait until all channels have finished receiving data. Once this
- * function returns, cleanup routines are safe to run.
- */
-static void multifd_file_recv_sync(void)
+static void multifd_recv_file_sync_request(void)
{
int i;
for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDRecvParams *p = &multifd_recv_state->params[i];
- trace_multifd_recv_sync_main_wait(p->id);
-
+ /*
+ * We play a trick here: instead of using a separate pending_sync
+ * to send a sync request (like what we do on senders), we simply
+ * kick the recv thread once without setting pending_job.
+ *
+ * If there's already a pending_job, the thread will only see it
+ * after it has processed the current one. If there's no
+ * pending_job, it'll see this kick immediately.
+ */
qemu_sem_post(&p->sem);
-
trace_multifd_recv_sync_main_signal(p->id);
- qemu_sem_wait(&p->sem_sync);
}
- return;
}
+/*
+ * Request a sync for all the multifd recv threads.
+ *
+ * For socket-based, sync request is much more complicated, which relies on
+ * collaborations between both explicit RAM_SAVE_FLAG_MULTIFD_FLUSH in the
+ * main stream, and MULTIFD_FLAG_SYNC flag in per-channel protocol. Here
+ * it should be invoked by the main stream request.
+ *
+ * For file-based, it is much simpler, because there's no need for a strong
+ * sync semantics between the main thread and the recv threads. What we
+ * need is only to make sure all recv threads finished their tasks.
+ */
void multifd_recv_sync_main(void)
{
+ bool file_based = !multifd_use_packets();
int i;
if (!migrate_multifd()) {
return;
}
- if (!multifd_use_packets()) {
- return multifd_file_recv_sync();
+ if (file_based) {
+ /*
+ * File-based multifd requires an explicit sync request because
+ * tasks are assigned by the main recv thread, rather than parsed
+ * through the multifd channels.
+ */
+ multifd_recv_file_sync_request();
}
for (i = 0; i < migrate_multifd_channels(); i++) {
@@ -1356,6 +1371,11 @@ void multifd_recv_sync_main(void)
trace_multifd_recv_sync_main_wait(p->id);
qemu_sem_wait(&multifd_recv_state->sem_sync);
}
+
+ if (file_based) {
+ return;
+ }
+
for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDRecvParams *p = &multifd_recv_state->params[i];
@@ -1420,11 +1440,12 @@ static void *multifd_recv_thread(void *opaque)
}
/*
- * Migration thread did not send work, break and signal
- * sem_sync so it knows we're not lagging behind.
+ * Migration thread did not send work, this emulates
+ * pending_sync, post sem_sync to notify the main thread.
*/
if (!qatomic_read(&p->pending_job)) {
- break;
+ qemu_sem_post(&multifd_recv_state->sem_sync);
+ continue;
}
has_data = !!p->data->size;
@@ -1449,10 +1470,6 @@ static void *multifd_recv_thread(void *opaque)
}
}
- if (!use_packets) {
- qemu_sem_post(&p->sem_sync);
- }
-
if (local_err) {
multifd_recv_terminate_threads(local_err);
error_free(local_err);
==========
Note that I used multifd_recv_state->sem_sync to send the message rather
than p->sem, not only because the socket-based path has similar logic
around that sem, but also because the main thread shouldn't care about
*which* recv thread has finished, only that *all* recv threads are idle.
Do you think this should work out for us in a nicer way?
Then there's the other issue: whether we should rely on the migration
stream to flush the recv threads. My answer is still, hopefully, a no.
In the ideal case, the fixed-ram image format should be tailored so as not
to use a live stream protocol at all. For example, during the ram
iterations we currently flush quite a lot of QEMU_VM_SECTION_PART sections
that contain mostly rubbish, each terminated with RAM_SAVE_FLAG_EOS, and
we keep doing that throughout the iteration loop. The real meat here is
that, while processing QEMU_VM_SECTION_PART, the src QEMU updates the
guest pages at fixed offsets in the file. That, however, doesn't
contribute anything valuable to the migration stream itself (the things
sent over to_dst_file).
AFAIU we chose to keep that logic only for simplicity, even though we know
those EOSs and all the RAM stream data are garbage. Now we're about to add
a dependency on part of that garbage, namely RAM_SAVE_FLAG_MULTIFD_FLUSH
in this case, which is useful for socket-based migration but shouldn't be
necessary for file-based.
I think there's a solution that doesn't involve ram_load(): ultimately,
fixed-ram stores all guest memory via the QEMU_VM_SECTION_START section of
ram, through RAM_SAVE_FLAG_MEM_SIZE (which leads to parse_ramblocks()).
If so, perhaps we can do a one-shot sync for file-based at the end of
parse_ramblocks()? That would decouple the recv-side sync_main for
file-based migration completely from any stream flags.
--
Peter Xu