qemu-devel.nongnu.org archive mirror
* [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram
@ 2024-02-20 22:41 Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 01/34] docs/devel/migration.rst: Document the file transport Fabiano Rosas
                   ` (35 more replies)
  0 siblings, 36 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

Hi,

In this v4:

- Added support for 'fd:'. With fixed-ram, that comes for free via the
  existing routing to file.c. With multifd I added a loop to create
  the channels.

- Dropped support for direct-io with fixed-ram _without_ multifd. This
  is something I said I would do for this version, but I had to drop
  it because performance is really bad. I think the single-threaded
  precopy code cannot cope with the extra latency/synchronicity of
  O_DIRECT.

- Dropped QIOTask related changes. The file migration now calls
  multifd_channel_connect() directly. Any error can now be returned
  all the way up to migrate_fd_connect(). We can also skip the
  channels_created semaphore logic when using fixed-ram.

- Moved the pwritev_read_contiguous code into a migration-specific
  file and dropped the write_base trick.

- Reduced the number of syncs to just one per RAM iteration plus one
  at the end on the send side, and a single one at the end on the recv
  side. The EOS flag cannot be skipped because it is used for control
  flow in ram_load_precopy.

The rest are minor changes; I have noted them in the patches
themselves.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1183853433

Series structure
================

This series enables fixed-ram in steps:

0) Cleanups                           [1-5]
1) QIOChannel interfaces              [6-10]
2) Fixed-ram format for precopy       [11-15]
3) Multifd adaptation without packets [16-19]
4) Fixed-ram format for multifd       [20-26]
5) Direct-io generic support          [27]
6) Direct-io for fixed-ram multifd with file: URI  [28-29]
7) Fdset interface for fixed-ram multifd  [30-34]

The majority of changes for this version are at step 3 due to the
rebase on top of the recent multifd cleanups.

Please take a look at the later patches in the series, step 5 onwards.

About fixed-ram
===============

Fixed-ram is a new stream format for the RAM section designed to
supplement the existing ``file:`` migration and make it compatible
with ``multifd``. This enables parallel migration of a guest's RAM to
a file.

The core of the feature is to ensure that each RAM page has a specific
offset in the resulting migration file. This enables the ``multifd``
threads to write exclusively to those offsets even if the guest is
constantly dirtying pages (i.e. live migration).

Another benefit is that the resulting file will have a bounded size,
since pages which are dirtied multiple times will always go to a fixed
location in the file, rather than constantly being added to a
sequential stream.

Having the pages at fixed offsets also allows the use of O_DIRECT for
save/restore of the migration stream, since the pages are guaranteed
to be written at offsets that respect O_DIRECT alignment restrictions.
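
As a rough illustration of the idea (this is not code from the series;
the helper and parameter names below are made up), a page's slot in the
file is simply a function of where its ramblock's pages region starts
and the page's index within the block:

  #include <stdint.h>

  /* hypothetical sketch: every page owns exactly one slot, so a page
   * that gets dirtied again is rewritten at the same location */
  static inline uint64_t fixed_ram_page_file_offset(uint64_t pages_region_start,
                                                    uint64_t page_index,
                                                    uint64_t page_size)
  {
      return pages_region_start + page_index * page_size;
  }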

Latest numbers
==============

=> guest: 128 GB RAM - 120 GB dirty - 1 vcpu in tight loop dirtying memory
=> host: 128 CPU AMD EPYC 7543 - 2 NVMe disks in RAID0 (8586 MiB/s) - xfs
=> pinned vcpus w/ NUMA shortest distances - average of 3 runs - results
   from query-migrate (mb/s = megabits/s, MB/s = megabytes/s)

non-live           | time (ms)   pages/s   mb/s   MB/s
-------------------+-----------------------------------
file               |    110512    256258   9549   1193
  + bg-snapshot    |    245660    119581   4303    537
-------------------+-----------------------------------
fixed-ram          |    157975    216877   6672    834
  + multifd 8 ch.  |     95922    292178  10982   1372
     + direct-io   |     23268   1936897  45330   5666
-------------------------------------------------------

live               | time (ms)   pages/s   mb/s   MB/s
-------------------+-----------------------------------
file               |         -         -      -      - (file grew 4x the VM size)
  + bg-snapshot    |    357635    141747   2974    371
-------------------+-----------------------------------
fixed-ram          |         -         -      -      - (no convergence in 5 min)
  + multifd 8 ch.  |    230812    497551  14900   1862
     + direct-io   |     27475   1788025  46736   5842
-------------------------------------------------------

Previous versions of this patchset have shown performance closer to
disk saturation, but due to the query-migrate bug[1] it's hard to be
confident in the previous numbers. I don't rule out the possibility of
a performance regression, but for now I can't spot anything that could
have caused one.

1- https://lore.kernel.org/r/20240219194457.26923-1-farosas@suse.de

v3:
https://lore.kernel.org/r/20231127202612.23012-1-farosas@suse.de
v2:
https://lore.kernel.org/r/20231023203608.26370-1-farosas@suse.de
v1:
https://lore.kernel.org/r/20230330180336.2791-1-farosas@suse.de

Fabiano Rosas (31):
  docs/devel/migration.rst: Document the file transport
  tests/qtest/migration: Rename fd_proto test
  tests/qtest/migration: Add a fd + file test
  migration/multifd: Remove p->quit from recv side
  migration/multifd: Release recv sem_sync earlier
  io: fsync before closing a file channel
  migration/qemu-file: add utility methods for working with seekable
    channels
  migration/ram: Introduce 'fixed-ram' migration capability
  migration: Add fixed-ram URI compatibility check
  migration/ram: Add outgoing 'fixed-ram' migration
  migration/ram: Add incoming 'fixed-ram' migration
  tests/qtest/migration: Add tests for fixed-ram file-based migration
  migration/multifd: Rename MultiFDSend|RecvParams::data to
    compress_data
  migration/multifd: Decouple recv method from pages
  migration/multifd: Allow multifd without packets
  migration/multifd: Allow receiving pages without packets
  migration/multifd: Add outgoing QIOChannelFile support
  migration/multifd: Add incoming QIOChannelFile support
  migration/multifd: Prepare multifd sync for fixed-ram migration
  migration/multifd: Support outgoing fixed-ram stream format
  migration/multifd: Support incoming fixed-ram stream format
  migration/multifd: Add fixed-ram support to fd: URI
  tests/qtest/migration: Add a multifd + fixed-ram migration test
  migration: Add direct-io parameter
  migration/multifd: Add direct-io support
  tests/qtest/migration: Add tests for file migration with direct-io
  monitor: Honor QMP request for fd removal immediately
  monitor: Extract fdset fd flags comparison into a function
  monitor: fdset: Match against O_DIRECT
  migration: Add support for fdset with multifd + file
  tests/qtest/migration: Add a test for fixed-ram with passing of fds

Nikolay Borisov (3):
  io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file
  io: Add generic pwritev/preadv interface
  io: implement io_pwritev/preadv for QIOChannelFile

 docs/devel/migration/features.rst   |   1 +
 docs/devel/migration/fixed-ram.rst  | 137 +++++++++
 docs/devel/migration/main.rst       |  22 ++
 include/exec/ramblock.h             |  13 +
 include/io/channel.h                |  83 ++++++
 include/migration/qemu-file-types.h |   2 +
 include/qemu/bitops.h               |  13 +
 include/qemu/osdep.h                |   2 +
 io/channel-file.c                   |  69 +++++
 io/channel.c                        |  58 ++++
 migration/fd.c                      |  30 ++
 migration/fd.h                      |   1 +
 migration/file.c                    | 258 +++++++++++++++-
 migration/file.h                    |   9 +
 migration/migration-hmp-cmds.c      |  11 +
 migration/migration.c               |  68 ++++-
 migration/multifd-zlib.c            |  26 +-
 migration/multifd-zstd.c            |  26 +-
 migration/multifd.c                 | 436 +++++++++++++++++++++-------
 migration/multifd.h                 |  27 +-
 migration/options.c                 |  66 +++++
 migration/options.h                 |   2 +
 migration/qemu-file.c               | 106 +++++++
 migration/qemu-file.h               |   6 +
 migration/ram.c                     | 333 ++++++++++++++++++++-
 migration/ram.h                     |   1 +
 migration/savevm.c                  |   1 +
 monitor/fds.c                       |  27 +-
 qapi/migration.json                 |  24 +-
 tests/qtest/migration-helpers.c     |  42 +++
 tests/qtest/migration-helpers.h     |   1 +
 tests/qtest/migration-test.c        | 303 ++++++++++++++++++-
 util/osdep.c                        |   9 +
 33 files changed, 2041 insertions(+), 172 deletions(-)
 create mode 100644 docs/devel/migration/fixed-ram.rst

-- 
2.35.3




* [PATCH v4 01/34] docs/devel/migration.rst: Document the file transport
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-23  3:01   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 02/34] tests/qtest/migration: Rename fd_proto test Fabiano Rosas
                   ` (34 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

When adding support for file migration with the file: transport, we
missed documenting it.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 docs/devel/migration/main.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
index 331252a92c..8024275d6d 100644
--- a/docs/devel/migration/main.rst
+++ b/docs/devel/migration/main.rst
@@ -41,6 +41,10 @@ over any transport.
 - exec migration: do the migration using the stdin/stdout through a process.
 - fd migration: do the migration using a file descriptor that is
   passed to QEMU.  QEMU doesn't care how this file descriptor is opened.
+- file migration: do the migration using a file that is passed to QEMU
+  by path. A file offset option is supported to allow a management
+  application to add its own metadata to the start of the file without
+  QEMU interference.
 
 In addition, support is included for migration using RDMA, which
 transports the page data using ``RDMA``, where the hardware takes care of
-- 
2.35.3
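
As a usage illustration only (not part of the patch): assuming the
``file:<path>,offset=<offset>`` URI syntax, a management application
could reserve the start of the file for its own metadata and point
QEMU past it. The path and offset below are made up:

  # management app keeps its metadata in the first 4 KiB of the file
  (qemu) migrate file:/var/lib/vms/guest.mig,offset=0x1000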




* [PATCH v4 02/34] tests/qtest/migration: Rename fd_proto test
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 01/34] docs/devel/migration.rst: Document the file transport Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-23  3:03   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 03/34] tests/qtest/migration: Add a fd + file test Fabiano Rosas
                   ` (33 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: berrange, armbru, Peter Xu, Claudio Fontana, Thomas Huth,
	Laurent Vivier, Paolo Bonzini

The next patch adds another fd test. Rename the existing one to be
closer to what's used in other tests, with the 'precopy' prefix.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration-test.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 8a5bb1752e..b729ce4d22 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2423,7 +2423,7 @@ static void test_migrate_fd_finish_hook(QTestState *from,
     qobject_unref(rsp);
 }
 
-static void test_migrate_fd_proto(void)
+static void test_migrate_precopy_fd_socket(void)
 {
     MigrateCommon args = {
         .listen_uri = "defer",
@@ -3527,7 +3527,8 @@ int main(int argc, char **argv)
 
     /* migration_test_add("/migration/ignore_shared", test_ignore_shared); */
 #ifndef _WIN32
-    migration_test_add("/migration/fd_proto", test_migrate_fd_proto);
+    migration_test_add("/migration/precopy/fd/tcp",
+                       test_migrate_precopy_fd_socket);
 #endif
     migration_test_add("/migration/validate_uuid", test_validate_uuid);
     migration_test_add("/migration/validate_uuid_error",
-- 
2.35.3




* [PATCH v4 03/34] tests/qtest/migration: Add a fd + file test
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 01/34] docs/devel/migration.rst: Document the file transport Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 02/34] tests/qtest/migration: Rename fd_proto test Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-23  3:08   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 04/34] migration/multifd: Remove p->quit from recv side Fabiano Rosas
                   ` (32 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: berrange, armbru, Peter Xu, Claudio Fontana, Thomas Huth,
	Laurent Vivier, Paolo Bonzini

The fd URI supports an fd that is backed by a file. The code should
select between QIOChannelFile and QIOChannelSocket, depending on the
type of the fd. Add a test for that.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration-test.c | 41 ++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b729ce4d22..83512bce85 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2433,6 +2433,45 @@ static void test_migrate_precopy_fd_socket(void)
     };
     test_precopy_common(&args);
 }
+
+static void *migrate_precopy_fd_file_start(QTestState *from, QTestState *to)
+{
+    g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
+    int src_flags = O_CREAT | O_RDWR;
+    int dst_flags = O_CREAT | O_RDWR;
+    int fds[2];
+
+    fds[0] = open(file, src_flags, 0660);
+    assert(fds[0] != -1);
+
+    fds[1] = open(file, dst_flags, 0660);
+    assert(fds[1] != -1);
+
+
+    qtest_qmp_fds_assert_success(to, &fds[0], 1,
+                                 "{ 'execute': 'getfd',"
+                                 "  'arguments': { 'fdname': 'fd-mig' }}");
+
+    qtest_qmp_fds_assert_success(from, &fds[1], 1,
+                                 "{ 'execute': 'getfd',"
+                                 "  'arguments': { 'fdname': 'fd-mig' }}");
+
+    close(fds[0]);
+    close(fds[1]);
+
+    return NULL;
+}
+
+static void test_migrate_precopy_fd_file(void)
+{
+    MigrateCommon args = {
+        .listen_uri = "defer",
+        .connect_uri = "fd:fd-mig",
+        .start_hook = migrate_precopy_fd_file_start,
+        .finish_hook = test_migrate_fd_finish_hook
+    };
+    test_file_common(&args, true);
+}
 #endif /* _WIN32 */
 
 static void do_test_validate_uuid(MigrateStart *args, bool should_fail)
@@ -3529,6 +3568,8 @@ int main(int argc, char **argv)
 #ifndef _WIN32
     migration_test_add("/migration/precopy/fd/tcp",
                        test_migrate_precopy_fd_socket);
+    migration_test_add("/migration/precopy/fd/file",
+                       test_migrate_precopy_fd_file);
 #endif
     migration_test_add("/migration/validate_uuid", test_validate_uuid);
     migration_test_add("/migration/validate_uuid_error",
-- 
2.35.3
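
For reference, the flow exercised by the new test can also be driven by
hand over QMP. This is only a sketch mirroring the commands in the test;
the file-backed descriptor itself has to be passed to the monitor along
with 'getfd' (e.g. via SCM_RIGHTS), and the destination side does the
same before 'migrate-incoming':

  -> { "execute": "getfd", "arguments": { "fdname": "fd-mig" } }
  <- { "return": {} }
  -> { "execute": "migrate", "arguments": { "uri": "fd:fd-mig" } }
  <- { "return": {} }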




* [PATCH v4 04/34] migration/multifd: Remove p->quit from recv side
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (2 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 03/34] tests/qtest/migration: Add a fd + file test Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-23  3:13   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 05/34] migration/multifd: Release recv sem_sync earlier Fabiano Rosas
                   ` (31 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

Like we did on the sending side, replace the p->quit per-channel flag
with a global atomic 'exiting' flag.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index adfe8c9a0a..fba00b9e8f 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -79,6 +79,19 @@ struct {
     MultiFDMethods *ops;
 } *multifd_send_state;
 
+struct {
+    MultiFDRecvParams *params;
+    /* number of created threads */
+    int count;
+    /* syncs main thread and channels */
+    QemuSemaphore sem_sync;
+    /* global number of generated multifd packets */
+    uint64_t packet_num;
+    int exiting;
+    /* multifd ops */
+    MultiFDMethods *ops;
+} *multifd_recv_state;
+
 /* Multifd without compression */
 
 /**
@@ -440,6 +453,11 @@ static bool multifd_send_should_exit(void)
     return qatomic_read(&multifd_send_state->exiting);
 }
 
+static bool multifd_recv_should_exit(void)
+{
+    return qatomic_read(&multifd_recv_state->exiting);
+}
+
 /*
  * The migration thread can wait on either of the two semaphores.  This
  * function can be used to kick the main thread out of waiting on either of
@@ -1063,24 +1081,16 @@ bool multifd_send_setup(void)
     return true;
 }
 
-struct {
-    MultiFDRecvParams *params;
-    /* number of created threads */
-    int count;
-    /* syncs main thread and channels */
-    QemuSemaphore sem_sync;
-    /* global number of generated multifd packets */
-    uint64_t packet_num;
-    /* multifd ops */
-    MultiFDMethods *ops;
-} *multifd_recv_state;
-
 static void multifd_recv_terminate_threads(Error *err)
 {
     int i;
 
     trace_multifd_recv_terminate_threads(err != NULL);
 
+    if (qatomic_xchg(&multifd_recv_state->exiting, 1)) {
+        return;
+    }
+
     if (err) {
         MigrationState *s = migrate_get_current();
         migrate_set_error(s, err);
@@ -1094,8 +1104,6 @@ static void multifd_recv_terminate_threads(Error *err)
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
-        qemu_mutex_lock(&p->mutex);
-        p->quit = true;
         /*
          * We could arrive here for two reasons:
          *  - normal quit, i.e. everything went fine, just finished
@@ -1105,7 +1113,6 @@ static void multifd_recv_terminate_threads(Error *err)
         if (p->c) {
             qio_channel_shutdown(p->c, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
         }
-        qemu_mutex_unlock(&p->mutex);
     }
 }
 
@@ -1210,7 +1217,7 @@ static void *multifd_recv_thread(void *opaque)
     while (true) {
         uint32_t flags;
 
-        if (p->quit) {
+        if (multifd_recv_should_exit()) {
             break;
         }
 
@@ -1274,6 +1281,7 @@ int multifd_recv_setup(Error **errp)
     multifd_recv_state = g_malloc0(sizeof(*multifd_recv_state));
     multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count);
     qatomic_set(&multifd_recv_state->count, 0);
+    qatomic_set(&multifd_recv_state->exiting, 0);
     qemu_sem_init(&multifd_recv_state->sem_sync, 0);
     multifd_recv_state->ops = multifd_ops[migrate_multifd_compression()];
 
@@ -1282,7 +1290,6 @@ int multifd_recv_setup(Error **errp)
 
         qemu_mutex_init(&p->mutex);
         qemu_sem_init(&p->sem_sync, 0);
-        p->quit = false;
         p->id = i;
         p->packet_len = sizeof(MultiFDPacket_t)
                       + sizeof(uint64_t) * page_count;
-- 
2.35.3




* [PATCH v4 05/34] migration/multifd: Release recv sem_sync earlier
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (3 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 04/34] migration/multifd: Remove p->quit from recv side Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-23  3:16   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 06/34] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file Fabiano Rosas
                   ` (30 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

Now that multifd_recv_terminate_threads() is called only once, release
the recv side sem_sync earlier like we do for the send side.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index fba00b9e8f..43f0820996 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1104,6 +1104,12 @@ static void multifd_recv_terminate_threads(Error *err)
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
+        /*
+         * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
+         * however try to wakeup it without harm in cleanup phase.
+         */
+        qemu_sem_post(&p->sem_sync);
+
         /*
          * We could arrive here for two reasons:
          *  - normal quit, i.e. everything went fine, just finished
@@ -1162,12 +1168,6 @@ void multifd_recv_cleanup(void)
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
-        /*
-         * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
-         * however try to wakeup it without harm in cleanup phase.
-         */
-        qemu_sem_post(&p->sem_sync);
-
         if (p->thread_created) {
             qemu_thread_join(&p->thread);
         }
-- 
2.35.3




* [PATCH v4 06/34] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (4 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 05/34] migration/multifd: Release recv sem_sync earlier Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 07/34] io: Add generic pwritev/preadv interface Fabiano Rosas
                   ` (29 subsequent siblings)
  35 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana, Nikolay Borisov

From: Nikolay Borisov <nborisov@suse.com>

Add a generic QIOChannel feature SEEKABLE which will be used by the
qemu_file* APIs. For the time being this will only be implemented for
file channels.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 include/io/channel.h | 1 +
 io/channel-file.c    | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/include/io/channel.h b/include/io/channel.h
index 5f9dbaab65..fcb19fd672 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -44,6 +44,7 @@ enum QIOChannelFeature {
     QIO_CHANNEL_FEATURE_LISTEN,
     QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
     QIO_CHANNEL_FEATURE_READ_MSG_PEEK,
+    QIO_CHANNEL_FEATURE_SEEKABLE,
 };
 
 
diff --git a/io/channel-file.c b/io/channel-file.c
index 4a12c61886..f91bf6db1c 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -36,6 +36,10 @@ qio_channel_file_new_fd(int fd)
 
     ioc->fd = fd;
 
+    if (lseek(fd, 0, SEEK_CUR) != (off_t)-1) {
+        qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE);
+    }
+
     trace_qio_channel_file_new_fd(ioc, fd);
 
     return ioc;
@@ -60,6 +64,10 @@ qio_channel_file_new_path(const char *path,
         return NULL;
     }
 
+    if (lseek(ioc->fd, 0, SEEK_CUR) != (off_t)-1) {
+        qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE);
+    }
+
     trace_qio_channel_file_new_path(ioc, path, flags, mode, ioc->fd);
 
     return ioc;
-- 
2.35.3




* [PATCH v4 07/34] io: Add generic pwritev/preadv interface
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (5 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 06/34] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 08/34] io: implement io_pwritev/preadv for QIOChannelFile Fabiano Rosas
                   ` (28 subsequent siblings)
  35 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana, Nikolay Borisov

From: Nikolay Borisov <nborisov@suse.com>

Introduce basic pwritev/preadv support in the generic channel layer.
A specific implementation will follow for the file channel, as this is
required in order to support migration streams with a fixed location
for each RAM page.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 include/io/channel.h | 82 ++++++++++++++++++++++++++++++++++++++++++++
 io/channel.c         | 58 +++++++++++++++++++++++++++++++
 2 files changed, 140 insertions(+)

diff --git a/include/io/channel.h b/include/io/channel.h
index fcb19fd672..7986c49c71 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -131,6 +131,16 @@ struct QIOChannelClass {
                            Error **errp);
 
     /* Optional callbacks */
+    ssize_t (*io_pwritev)(QIOChannel *ioc,
+                          const struct iovec *iov,
+                          size_t niov,
+                          off_t offset,
+                          Error **errp);
+    ssize_t (*io_preadv)(QIOChannel *ioc,
+                         const struct iovec *iov,
+                         size_t niov,
+                         off_t offset,
+                         Error **errp);
     int (*io_shutdown)(QIOChannel *ioc,
                        QIOChannelShutdown how,
                        Error **errp);
@@ -529,6 +539,78 @@ void qio_channel_set_follow_coroutine_ctx(QIOChannel *ioc, bool enabled);
 int qio_channel_close(QIOChannel *ioc,
                       Error **errp);
 
+/**
+ * qio_channel_pwritev
+ * @ioc: the channel object
+ * @iov: the array of memory regions to write data from
+ * @niov: the length of the @iov array
+ * @offset: offset in the channel where writes should begin
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Not all implementations will support this facility, so may report
+ * an error. To avoid errors, the caller may check for the feature
+ * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
+ *
+ * Behaves as qio_channel_writev_full, apart from not supporting
+ * sending of file handles as well as beginning the write at the
+ * passed @offset
+ *
+ */
+ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
+                            size_t niov, off_t offset, Error **errp);
+
+/**
+ * qio_channel_pwrite
+ * @ioc: the channel object
+ * @buf: the memory region to write data from
+ * @buflen: the number of bytes in @buf to write
+ * @offset: offset in the channel where writes should begin
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Not all implementations will support this facility, so may report
+ * an error. To avoid errors, the caller may check for the feature
+ * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
+ *
+ */
+ssize_t qio_channel_pwrite(QIOChannel *ioc, char *buf, size_t buflen,
+                           off_t offset, Error **errp);
+
+/**
+ * qio_channel_preadv
+ * @ioc: the channel object
+ * @iov: the array of memory regions to read data into
+ * @niov: the length of the @iov array
+ * @offset: offset in the channel where reads should begin
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Not all implementations will support this facility, so may report
+ * an error.  To avoid errors, the caller may check for the feature
+ * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
+ *
+ * Behaves as qio_channel_readv_full, apart from not supporting
+ * receiving of file handles as well as beginning the read at the
+ * passed @offset
+ *
+ */
+ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
+                           size_t niov, off_t offset, Error **errp);
+
+/**
+ * qio_channel_pread
+ * @ioc: the channel object
+ * @buf: the memory region to read data into
+ * @buflen: the number of bytes to read into @buf
+ * @offset: offset in the channel where reads should begin
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Not all implementations will support this facility, so may report
+ * an error.  To avoid errors, the caller may check for the feature
+ * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
+ *
+ */
+ssize_t qio_channel_pread(QIOChannel *ioc, char *buf, size_t buflen,
+                          off_t offset, Error **errp);
+
 /**
  * qio_channel_shutdown:
  * @ioc: the channel object
diff --git a/io/channel.c b/io/channel.c
index 86c5834510..a1f12f8e90 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -454,6 +454,64 @@ GSource *qio_channel_add_watch_source(QIOChannel *ioc,
 }
 
 
+ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
+                            size_t niov, off_t offset, Error **errp)
+{
+    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+    if (!klass->io_pwritev) {
+        error_setg(errp, "Channel does not support pwritev");
+        return -1;
+    }
+
+    if (!qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_SEEKABLE)) {
+        error_setg_errno(errp, EINVAL, "Requested channel is not seekable");
+        return -1;
+    }
+
+    return klass->io_pwritev(ioc, iov, niov, offset, errp);
+}
+
+ssize_t qio_channel_pwrite(QIOChannel *ioc, char *buf, size_t buflen,
+                           off_t offset, Error **errp)
+{
+    struct iovec iov = {
+        .iov_base = buf,
+        .iov_len = buflen
+    };
+
+    return qio_channel_pwritev(ioc, &iov, 1, offset, errp);
+}
+
+ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
+                           size_t niov, off_t offset, Error **errp)
+{
+    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+    if (!klass->io_preadv) {
+        error_setg(errp, "Channel does not support preadv");
+        return -1;
+    }
+
+    if (!qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_SEEKABLE)) {
+        error_setg_errno(errp, EINVAL, "Requested channel is not seekable");
+        return -1;
+    }
+
+    return klass->io_preadv(ioc, iov, niov, offset, errp);
+}
+
+ssize_t qio_channel_pread(QIOChannel *ioc, char *buf, size_t buflen,
+                          off_t offset, Error **errp)
+{
+    struct iovec iov = {
+        .iov_base = buf,
+        .iov_len = buflen
+    };
+
+    return qio_channel_preadv(ioc, &iov, 1, offset, errp);
+}
+
 int qio_channel_shutdown(QIOChannel *ioc,
                          QIOChannelShutdown how,
                          Error **errp)
-- 
2.35.3
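
As a caller-side sketch (not part of the patch), this is roughly how the
new helpers are meant to be used. It assumes the usual QEMU includes
("io/channel.h", "qapi/error.h") and a blocking channel created over a
regular file; the function name is made up:

  static int example_update_header(QIOChannel *ioc, Error **errp)
  {
      char buf[64] = "header v2";

      /* positioned I/O is only valid on seekable channels */
      if (!qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_SEEKABLE)) {
          error_setg(errp, "channel does not support positioned I/O");
          return -1;
      }

      /* overwrite the first 64 bytes without moving the channel position */
      if (qio_channel_pwrite(ioc, buf, sizeof(buf), 0, errp) < 0) {
          return -1;
      }

      /* read the same region back from its fixed offset */
      if (qio_channel_pread(ioc, buf, sizeof(buf), 0, errp) < 0) {
          return -1;
      }

      return 0;
  }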




* [PATCH v4 08/34] io: implement io_pwritev/preadv for QIOChannelFile
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (6 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 07/34] io: Add generic pwritev/preadv interface Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 09/34] io: fsync before closing a file channel Fabiano Rosas
                   ` (27 subsequent siblings)
  35 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana, Nikolay Borisov

From: Nikolay Borisov <nborisov@suse.com>

The upcoming 'fixed-ram' feature will require qemu to write data to
(and restore from) specific offsets of the migration file.

Add a minimal implementation of pwritev/preadv and expose them via the
io_pwritev and io_preadv interfaces.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 io/channel-file.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/io/channel-file.c b/io/channel-file.c
index f91bf6db1c..a6ad7770c6 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -146,6 +146,58 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
     return ret;
 }
 
+#ifdef CONFIG_PREADV
+static ssize_t qio_channel_file_preadv(QIOChannel *ioc,
+                                       const struct iovec *iov,
+                                       size_t niov,
+                                       off_t offset,
+                                       Error **errp)
+{
+    QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
+    ssize_t ret;
+
+ retry:
+    ret = preadv(fioc->fd, iov, niov, offset);
+    if (ret < 0) {
+        if (errno == EAGAIN) {
+            return QIO_CHANNEL_ERR_BLOCK;
+        }
+        if (errno == EINTR) {
+            goto retry;
+        }
+
+        error_setg_errno(errp, errno, "Unable to read from file");
+        return -1;
+    }
+
+    return ret;
+}
+
+static ssize_t qio_channel_file_pwritev(QIOChannel *ioc,
+                                        const struct iovec *iov,
+                                        size_t niov,
+                                        off_t offset,
+                                        Error **errp)
+{
+    QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
+    ssize_t ret;
+
+ retry:
+    ret = pwritev(fioc->fd, iov, niov, offset);
+    if (ret <= 0) {
+        if (errno == EAGAIN) {
+            return QIO_CHANNEL_ERR_BLOCK;
+        }
+        if (errno == EINTR) {
+            goto retry;
+        }
+        error_setg_errno(errp, errno, "Unable to write to file");
+        return -1;
+    }
+    return ret;
+}
+#endif /* CONFIG_PREADV */
+
 static int qio_channel_file_set_blocking(QIOChannel *ioc,
                                          bool enabled,
                                          Error **errp)
@@ -231,6 +283,10 @@ static void qio_channel_file_class_init(ObjectClass *klass,
     ioc_klass->io_writev = qio_channel_file_writev;
     ioc_klass->io_readv = qio_channel_file_readv;
     ioc_klass->io_set_blocking = qio_channel_file_set_blocking;
+#ifdef CONFIG_PREADV
+    ioc_klass->io_pwritev = qio_channel_file_pwritev;
+    ioc_klass->io_preadv = qio_channel_file_preadv;
+#endif
     ioc_klass->io_seek = qio_channel_file_seek;
     ioc_klass->io_close = qio_channel_file_close;
     ioc_klass->io_create_watch = qio_channel_file_create_watch;
-- 
2.35.3




* [PATCH v4 09/34] io: fsync before closing a file channel
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (7 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 08/34] io: implement io_pwritev/preadv for QIOChannelFile Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 10/34] migration/qemu-file: add utility methods for working with seekable channels Fabiano Rosas
                   ` (26 subsequent siblings)
  35 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

Make sure the data is flushed to disk before closing file
channels. This ensures the data is on disk and not lost in the event
of a host crash.

The immediate motivation is the migration code when migrating to a
file, but all QIOChannelFile users should benefit from the change.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
- improved commit message
---
 io/channel-file.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/io/channel-file.c b/io/channel-file.c
index a6ad7770c6..d4706fa592 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -242,6 +242,11 @@ static int qio_channel_file_close(QIOChannel *ioc,
 {
     QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
 
+    if (qemu_fdatasync(fioc->fd) < 0) {
+        error_setg_errno(errp, errno,
+                         "Unable to synchronize file data with storage device");
+        return -1;
+    }
     if (qemu_close(fioc->fd) < 0) {
         error_setg_errno(errp, errno,
                          "Unable to close file");
-- 
2.35.3




* [PATCH v4 10/34] migration/qemu-file: add utility methods for working with seekable channels
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (8 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 09/34] io: fsync before closing a file channel Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 11/34] migration/ram: Introduce 'fixed-ram' migration capability Fabiano Rosas
                   ` (25 subsequent siblings)
  35 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

Add utility methods that will be needed when implementing 'fixed-ram'
migration capability.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
- handling EAGAIN and partial reads/writes
- removed the list of functions from the commit message
---
 include/migration/qemu-file-types.h |   2 +
 migration/qemu-file.c               | 106 ++++++++++++++++++++++++++++
 migration/qemu-file.h               |   6 ++
 3 files changed, 114 insertions(+)

diff --git a/include/migration/qemu-file-types.h b/include/migration/qemu-file-types.h
index 9ba163f333..adec5abc07 100644
--- a/include/migration/qemu-file-types.h
+++ b/include/migration/qemu-file-types.h
@@ -50,6 +50,8 @@ unsigned int qemu_get_be16(QEMUFile *f);
 unsigned int qemu_get_be32(QEMUFile *f);
 uint64_t qemu_get_be64(QEMUFile *f);
 
+bool qemu_file_is_seekable(QEMUFile *f);
+
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
     qemu_put_be64(f, *pv);
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 94231ff295..b10c882629 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -33,6 +33,7 @@
 #include "options.h"
 #include "qapi/error.h"
 #include "rdma.h"
+#include "io/channel-file.h"
 
 #define IO_BUF_SIZE 32768
 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
@@ -255,6 +256,10 @@ static void qemu_iovec_release_ram(QEMUFile *f)
     memset(f->may_free, 0, sizeof(f->may_free));
 }
 
+bool qemu_file_is_seekable(QEMUFile *f)
+{
+    return qio_channel_has_feature(f->ioc, QIO_CHANNEL_FEATURE_SEEKABLE);
+}
 
 /**
  * Flushes QEMUFile buffer
@@ -447,6 +452,107 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size)
     }
 }
 
+void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
+                        off_t pos)
+{
+    Error *err = NULL;
+    size_t ret;
+
+    if (f->last_error) {
+        return;
+    }
+
+    qemu_fflush(f);
+    ret = qio_channel_pwrite(f->ioc, (char *)buf, buflen, pos, &err);
+
+    if (err) {
+        qemu_file_set_error_obj(f, -EIO, err);
+        return;
+    }
+
+    if ((ssize_t)ret == QIO_CHANNEL_ERR_BLOCK) {
+        qemu_file_set_error_obj(f, -EAGAIN, NULL);
+        return;
+    }
+
+    if (ret != buflen) {
+        error_setg(&err, "Partial write of size %zu, expected %zu", ret,
+                   buflen);
+        qemu_file_set_error_obj(f, -EIO, err);
+        return;
+    }
+
+    stat64_add(&mig_stats.qemu_file_transferred, buflen);
+
+    return;
+}
+
+
+size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
+                          off_t pos)
+{
+    Error *err = NULL;
+    size_t ret;
+
+    if (f->last_error) {
+        return 0;
+    }
+
+    ret = qio_channel_pread(f->ioc, (char *)buf, buflen, pos, &err);
+
+    if ((ssize_t)ret == -1 || err) {
+        qemu_file_set_error_obj(f, -EIO, err);
+        return 0;
+    }
+
+    if ((ssize_t)ret == QIO_CHANNEL_ERR_BLOCK) {
+        qemu_file_set_error_obj(f, -EAGAIN, NULL);
+        return 0;
+    }
+
+    if (ret != buflen) {
+        error_setg(&err, "Partial read of size %zu, expected %zu", ret, buflen);
+        qemu_file_set_error_obj(f, -EIO, err);
+        return 0;
+    }
+
+    return ret;
+}
+
+void qemu_set_offset(QEMUFile *f, off_t off, int whence)
+{
+    Error *err = NULL;
+    off_t ret;
+
+    if (qemu_file_is_writable(f)) {
+        qemu_fflush(f);
+    } else {
+        /* Drop all cached buffers if existed; will trigger a re-fill later */
+        f->buf_index = 0;
+        f->buf_size = 0;
+    }
+
+    ret = qio_channel_io_seek(f->ioc, off, whence, &err);
+    if (ret == (off_t)-1) {
+        qemu_file_set_error_obj(f, -EIO, err);
+    }
+}
+
+off_t qemu_get_offset(QEMUFile *f)
+{
+    Error *err = NULL;
+    off_t ret;
+
+    qemu_fflush(f);
+
+    ret = qio_channel_io_seek(f->ioc, 0, SEEK_CUR, &err);
+    if (ret == (off_t)-1) {
+        qemu_file_set_error_obj(f, -EIO, err);
+    }
+    return ret;
+}
+
+
 void qemu_put_byte(QEMUFile *f, int v)
 {
     if (f->last_error) {
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 8aec9fabf7..32fd4a34fd 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -75,6 +75,12 @@ QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 int qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
 int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
+void qemu_set_offset(QEMUFile *f, off_t off, int whence);
+off_t qemu_get_offset(QEMUFile *f);
+void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
+                        off_t pos);
+size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
+                          off_t pos);
 
 QIOChannel *qemu_file_get_ioc(QEMUFile *file);
 
-- 
2.35.3
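
As a caller-side sketch (not part of the patch), this is roughly how the
new QEMUFile helpers fit together. It assumes 'f' wraps a seekable
channel (e.g. a QIOChannelFile) opened for output; the 1 MiB slot is
made up:

  static void example_put_page_at_fixed_slot(QEMUFile *f)
  {
      uint8_t page[4096] = { 0 };
      off_t slot = 1024 * 1024;   /* hypothetical fixed slot for this page */

      if (!qemu_file_is_seekable(f)) {
          return;
      }

      /* place the page at its slot; the stream position is not moved */
      qemu_put_buffer_at(f, page, sizeof(page), slot);

      /*
       * on the load side, a QEMUFile opened on the same file would use
       * qemu_get_buffer_at(f, page, sizeof(page), slot) to fetch it back
       */
  }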




* [PATCH v4 11/34] migration/ram: Introduce 'fixed-ram' migration capability
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (9 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 10/34] migration/qemu-file: add utility methods for working with seekable channels Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-21  8:41   ` Markus Armbruster
                     ` (2 more replies)
  2024-02-20 22:41 ` [PATCH v4 12/34] migration: Add fixed-ram URI compatibility check Fabiano Rosas
                   ` (24 subsequent siblings)
  35 siblings, 3 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana, Eric Blake

Add a new migration capability 'fixed-ram'.

The core of the feature is to ensure that each RAM page has a specific
offset in the resulting migration stream. The reasons why we'd want
such behavior are:

 - The resulting file will have a bounded size, since pages which are
   dirtied multiple times will always go to a fixed location in the
   file, rather than constantly being added to a sequential
   stream. This eliminates cases where a VM with, say, 1G of RAM can
   result in a migration file that's 10s of GBs, provided that the
   workload constantly redirties memory.

 - It paves the way to implement O_DIRECT-enabled save/restore of the
   migration stream, since the pages are guaranteed to be written at
   aligned offsets.

 - It allows the usage of multifd so we can write RAM pages to the
   migration file in parallel.

For now, enabling the capability has no effect. The next couple of
patches implement the core functionality.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
- update migration.json to 9.0 and improve wording
- move docs to a separate file and add use cases information
---
 docs/devel/migration/features.rst  |   1 +
 docs/devel/migration/fixed-ram.rst | 137 +++++++++++++++++++++++++++++
 migration/options.c                |  34 +++++++
 migration/options.h                |   1 +
 migration/savevm.c                 |   1 +
 qapi/migration.json                |   6 +-
 6 files changed, 179 insertions(+), 1 deletion(-)
 create mode 100644 docs/devel/migration/fixed-ram.rst

diff --git a/docs/devel/migration/features.rst b/docs/devel/migration/features.rst
index a9acaf618e..4c708b679a 100644
--- a/docs/devel/migration/features.rst
+++ b/docs/devel/migration/features.rst
@@ -10,3 +10,4 @@ Migration has plenty of features to support different use cases.
    dirty-limit
    vfio
    virtio
+   fixed-ram
diff --git a/docs/devel/migration/fixed-ram.rst b/docs/devel/migration/fixed-ram.rst
new file mode 100644
index 0000000000..a6c0e5a360
--- /dev/null
+++ b/docs/devel/migration/fixed-ram.rst
@@ -0,0 +1,137 @@
+Fixed-ram
+=========
+
+Fixed-ram is a new stream format for the RAM section designed to
+supplement the existing ``file:`` migration and make it compatible
+with ``multifd``. This enables parallel migration of a guest's RAM to
+a file.
+
+The core of the feature is to ensure that each RAM page has a specific
+offset in the resulting migration file. This enables the ``multifd``
+threads to write exclusively to those offsets even if the guest is
+constantly dirtying pages (i.e. live migration). Another benefit is
+that the resulting file will have a bounded size, since pages which
+are dirtied multiple times will always go to a fixed location in the
+file, rather than constantly being added to a sequential
+stream. Having the pages at fixed offsets also allows the use of
+O_DIRECT for save/restore of the migration stream, since the pages
+are guaranteed to be written at O_DIRECT-aligned offsets.
+
+Usage
+-----
+
+On both source and destination, enable the ``multifd`` and
+``fixed-ram`` capabilities:
+
+    ``migrate_set_capability multifd on``
+
+    ``migrate_set_capability fixed-ram on``
+
+Use a ``file:`` URL for migration:
+
+    ``migrate file:/path/to/migration/file``
+
+Fixed-ram migration is best done non-live, i.e. by stopping the VM on
+the source side before migrating.
+
+For best performance enable the ``direct-io`` parameter as well:
+
+    ``migrate_set_parameter direct-io on``
+
+Use-cases
+---------
+
+The fixed-ram feature was designed for use cases where the migration
+stream will be directed to a file in the filesystem and not
+immediately restored on the destination VM [#]_. These could be
+thought of as snapshots. We can further categorize them into live and
+non-live.
+
+- Non-live snapshot
+
+If the use case requires a VM to be stopped before taking a snapshot,
+that's the ideal scenario for fixed-ram migration. Since it does not
+have to track dirty pages, the migration will write the RAM pages to
+disk as fast as it can.
+
+Note: if a snapshot is taken of a running VM, but the VM will be
+stopped after the snapshot by the admin, then consider stopping it
+right before the snapshot to benefit from the performance gains
+mentioned above.
+
+- Live snapshot
+
+If the use case requires that the VM keeps running during and after
+the snapshot operation, then fixed-ram migration can still be used,
+but will be less performant. Other strategies such as
+background-snapshot should be evaluated as well. One benefit of
+fixed-ram in this scenario is portability since background-snapshot
+depends on async dirty tracking (KVM_GET_DIRTY_LOG) which is not
+supported outside of Linux.
+
+.. [#] While this same effect could be obtained with the usage of
+       snapshots or the ``file:`` migration alone, fixed-ram provides
+       a performance increase for VMs with larger RAM sizes (10s to
+       100s of GiBs), especially if the VM has been stopped beforehand.
+
+RAM section format
+------------------
+
+Instead of having a sequential stream of pages that follow the
+RAMBlock headers, the dirty pages for a RAMBlock follow its own
+header. This ensures that each RAM page has a fixed offset in the
+resulting migration file.
+
+A bitmap is introduced to track which pages have been written in the
+migration file. Pages are written at a fixed location for every
+ramblock. Zero pages are ignored, as they would already be zero on
+the destination.
+
+::
+
+ Without fixed-ram:                  With fixed-ram:
+
+ ---------------------               --------------------------------
+ | ramblock 1 header |               | ramblock 1 header            |
+ ---------------------               --------------------------------
+ | ramblock 2 header |               | ramblock 1 fixed-ram header  |
+ ---------------------               --------------------------------
+ | ...               |               | padding to next 1MB boundary |
+ ---------------------               | ...                          |
+ | ramblock n header |               --------------------------------
+ ---------------------               | ramblock 1 pages             |
+ | RAM_SAVE_FLAG_EOS |               | ...                          |
+ ---------------------               --------------------------------
+ | stream of pages   |               | ramblock 2 header            |
+ | (iter 1)          |               --------------------------------
+ | ...               |               | ramblock 2 fixed-ram header  |
+ ---------------------               --------------------------------
+ | RAM_SAVE_FLAG_EOS |               | padding to next 1MB boundary |
+ ---------------------               | ...                          |
+ | stream of pages   |               --------------------------------
+ | (iter 2)          |               | ramblock 2 pages             |
+ | ...               |               | ...                          |
+ ---------------------               --------------------------------
+ | ...               |               | ...                          |
+ ---------------------               --------------------------------
+                                     | RAM_SAVE_FLAG_EOS            |
+                                     --------------------------------
+                                     | ...                          |
+                                     --------------------------------
+
+ where:
+  - ramblock header: the generic information for a ramblock, such as
+    idstr, used_len, etc.
+
+  - ramblock fixed-ram header: the information added by this feature:
+    bitmap of pages written, bitmap size and offset of pages in the
+    migration file.
+
+Restrictions
+------------
+
+Since pages are written to their relative offsets and out of order
+(due to the memory dirtying patterns), streaming channels such as
+sockets are not supported. A seekable channel such as a file is
+required. This can be verified in the QIOChannel by the presence of
+the QIO_CHANNEL_FEATURE_SEEKABLE feature.
diff --git a/migration/options.c b/migration/options.c
index 3e3e0b93b4..4909e5c72a 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -204,6 +204,7 @@ Property migration_properties[] = {
     DEFINE_PROP_MIG_CAP("x-switchover-ack",
                         MIGRATION_CAPABILITY_SWITCHOVER_ACK),
     DEFINE_PROP_MIG_CAP("x-dirty-limit", MIGRATION_CAPABILITY_DIRTY_LIMIT),
+    DEFINE_PROP_MIG_CAP("x-fixed-ram", MIGRATION_CAPABILITY_FIXED_RAM),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -263,6 +264,13 @@ bool migrate_events(void)
     return s->capabilities[MIGRATION_CAPABILITY_EVENTS];
 }
 
+bool migrate_fixed_ram(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return s->capabilities[MIGRATION_CAPABILITY_FIXED_RAM];
+}
+
 bool migrate_ignore_shared(void)
 {
     MigrationState *s = migrate_get_current();
@@ -645,6 +653,32 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
         }
     }
 
+    if (new_caps[MIGRATION_CAPABILITY_FIXED_RAM]) {
+        if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
+            error_setg(errp,
+                       "Fixed-ram migration is incompatible with multifd");
+            return false;
+        }
+
+        if (new_caps[MIGRATION_CAPABILITY_XBZRLE]) {
+            error_setg(errp,
+                       "Fixed-ram migration is incompatible with xbzrle");
+            return false;
+        }
+
+        if (new_caps[MIGRATION_CAPABILITY_COMPRESS]) {
+            error_setg(errp,
+                       "Fixed-ram migration is incompatible with compression");
+            return false;
+        }
+
+        if (new_caps[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+            error_setg(errp,
+                       "Fixed-ram migration is incompatible with postcopy ram");
+            return false;
+        }
+    }
+
     return true;
 }
 
diff --git a/migration/options.h b/migration/options.h
index 246c160aee..8680a10b79 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -31,6 +31,7 @@ bool migrate_compress(void);
 bool migrate_dirty_bitmaps(void);
 bool migrate_dirty_limit(void);
 bool migrate_events(void);
+bool migrate_fixed_ram(void);
 bool migrate_ignore_shared(void);
 bool migrate_late_block_activate(void);
 bool migrate_multifd(void);
diff --git a/migration/savevm.c b/migration/savevm.c
index d612c8a902..4b928dd6bb 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -245,6 +245,7 @@ static bool should_validate_capability(int capability)
     /* Validate only new capabilities to keep compatibility. */
     switch (capability) {
     case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
+    case MIGRATION_CAPABILITY_FIXED_RAM:
         return true;
     default:
         return false;
diff --git a/qapi/migration.json b/qapi/migration.json
index 5a565d9b8d..3fce5fe53e 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -531,6 +531,10 @@
 #     and can result in more stable read performance.  Requires KVM
 #     with accelerator property "dirty-ring-size" set.  (Since 8.1)
 #
+# @fixed-ram: Migrate using fixed offsets in the migration file for
+#     each RAM page.  Requires a migration URI that supports seeking,
+#     such as a file.  (since 9.0)
+#
 # Features:
 #
 # @deprecated: Member @block is deprecated.  Use blockdev-mirror with
@@ -555,7 +559,7 @@
            { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
            'validate-uuid', 'background-snapshot',
            'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
-           'dirty-limit'] }
+           'dirty-limit', 'fixed-ram'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.35.3
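
For illustration only (not part of the patch), the QMP equivalent of the
HMP commands in the new document would look roughly like this. The path
is made up, and multifd would be enabled in addition once the later
patches in the series make it compatible with fixed-ram:

  -> { "execute": "migrate-set-capabilities",
       "arguments": { "capabilities": [
           { "capability": "fixed-ram", "state": true } ] } }
  <- { "return": {} }
  -> { "execute": "migrate",
       "arguments": { "uri": "file:/path/to/migration/file" } }
  <- { "return": {} }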




* [PATCH v4 12/34] migration: Add fixed-ram URI compatibility check
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (10 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 11/34] migration/ram: Introduce 'fixed-ram' migration capability Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-26  3:11   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 13/34] migration/ram: Add outgoing 'fixed-ram' migration Fabiano Rosas
                   ` (23 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

The fixed-ram migration format needs a channel that supports seeking
to be able to write each page to an arbitrary offset in the migration
stream.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
 migration/migration.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index ab21de2cad..16da269847 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -142,10 +142,39 @@ static bool transport_supports_multi_channels(MigrationAddress *addr)
     return false;
 }
 
+static bool migration_needs_seekable_channel(void)
+{
+    return migrate_fixed_ram();
+}
+
+static bool transport_supports_seeking(MigrationAddress *addr)
+{
+    if (addr->transport == MIGRATION_ADDRESS_TYPE_FILE) {
+        return true;
+    }
+
+    /*
+     * At this point, the user might not yet have passed the file
+     * descriptor to QEMU, so we cannot know for sure whether it
+     * refers to a plain file or a socket. Let it through anyway.
+     */
+    if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) {
+        return addr->u.socket.type == SOCKET_ADDRESS_TYPE_FD;
+    }
+
+    return false;
+}
+
 static bool
 migration_channels_and_transport_compatible(MigrationAddress *addr,
                                             Error **errp)
 {
+    if (migration_needs_seekable_channel() &&
+        !transport_supports_seeking(addr)) {
+        error_setg(errp, "Migration requires seekable transport (e.g. file)");
+        return false;
+    }
+
     if (migration_needs_multiple_sockets() &&
         !transport_supports_multi_channels(addr)) {
         error_setg(errp, "Migration requires multi-channel URIs (e.g. tcp)");
-- 
2.35.3




* [PATCH v4 13/34] migration/ram: Add outgoing 'fixed-ram' migration
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (11 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 12/34] migration: Add fixed-ram URI compatibility check Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-26  4:03   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 14/34] migration/ram: Add incoming " Fabiano Rosas
                   ` (22 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: berrange, armbru, Peter Xu, Claudio Fontana, Nikolay Borisov,
	Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé

Implement the outgoing migration side for the 'fixed-ram' capability.

A bitmap is introduced to track which pages have been written in the
migration file. Pages are written at a fixed location for every
ramblock. Zero pages are skipped, since they would read as zero on the
destination anyway.

The migration stream is altered to put the dirty pages for a ramblock
after its header instead of having a sequential stream of pages that
follow the ramblock headers.

Without fixed-ram (current):        With fixed-ram (new):

 ---------------------               --------------------------------
 | ramblock 1 header |               | ramblock 1 header            |
 ---------------------               --------------------------------
 | ramblock 2 header |               | ramblock 1 fixed-ram header  |
 ---------------------               --------------------------------
 | ...               |               | padding to next 1MB boundary |
 ---------------------               | ...                          |
 | ramblock n header |               --------------------------------
 ---------------------               | ramblock 1 pages             |
 | RAM_SAVE_FLAG_EOS |               | ...                          |
 ---------------------               --------------------------------
 | stream of pages   |               | ramblock 2 header            |
 | (iter 1)          |               --------------------------------
 | ...               |               | ramblock 2 fixed-ram header  |
 ---------------------               --------------------------------
 | RAM_SAVE_FLAG_EOS |               | padding to next 1MB boundary |
 ---------------------               | ...                          |
 | stream of pages   |               --------------------------------
 | (iter 2)          |               | ramblock 2 pages             |
 | ...               |               | ...                          |
 ---------------------               --------------------------------
 | ...               |               | ...                          |
 ---------------------               --------------------------------
                                     | RAM_SAVE_FLAG_EOS            |
                                     --------------------------------
                                     | ...                          |
                                     --------------------------------

where:
 - ramblock header: the generic information for a ramblock, such as
   idstr, used_len, etc.

 - ramblock fixed-ram header: the new information added by this
   feature: the bitmap of pages written, the page size, and the
   offsets of the bitmap and of the pages in the migration file.
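
To make the layout concrete, a minimal sketch of how those offsets
translate into file positions (the field names match the FixedRamHeader
and RAMBlock changes below; the helpers themselves are illustrative
only):

    /* where in the migration file a given page of this ramblock lives */
    static uint64_t fixed_ram_page_file_offset(RAMBlock *block,
                                               ram_addr_t offset)
    {
        /* pages are stored contiguously, one slot per page of the block */
        return block->pages_offset + offset;
    }

    /* whether the page was written at all (zero pages are left out) */
    static bool fixed_ram_page_is_present(RAMBlock *block, ram_addr_t offset)
    {
        return test_bit(offset >> TARGET_PAGE_BITS, block->file_bmap);
    }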

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 include/exec/ramblock.h |  13 ++++
 migration/ram.c         | 131 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 135 insertions(+), 9 deletions(-)

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 3eb79723c6..6a66301219 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -44,6 +44,19 @@ struct RAMBlock {
     size_t page_size;
     /* dirty bitmap used during migration */
     unsigned long *bmap;
+
+    /*
+     * Below fields are only used by fixed-ram migration
+     */
+    /* bitmap of pages present in the migration file */
+    unsigned long *file_bmap;
+    /*
+     * offset in the file where pages belonging to this ramblock are
+     * saved; used only during migration to a file.
+     */
+    off_t bitmap_offset;
+    uint64_t pages_offset;
+
     /* bitmap of already received pages in postcopy */
     unsigned long *receivedmap;
 
diff --git a/migration/ram.c b/migration/ram.c
index 4649a81204..84c531722c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -94,6 +94,18 @@
 #define RAM_SAVE_FLAG_MULTIFD_FLUSH    0x200
 /* We can't use any flag that is bigger than 0x200 */
 
+/*
+ * fixed-ram migration supports O_DIRECT, so we need to make sure the
+ * userspace buffer, the IO operation size and the file offset are
+ * aligned according to the underlying device's block size. The first
+ * two are already aligned to page size, but we need to add padding to
+ * the file to align the offset.  We cannot read the block size
+ * dynamically because the migration file can be moved between
+ * different systems, so use 1M to cover most block sizes and to keep
+ * the file offset aligned at page size as well.
+ */
+#define FIXED_RAM_FILE_OFFSET_ALIGNMENT 0x100000
+
 XBZRLECacheStats xbzrle_counters;
 
 /* used by the search for pages to send */
@@ -1127,12 +1139,18 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
         return 0;
     }
 
+    stat64_add(&mig_stats.zero_pages, 1);
+
+    if (migrate_fixed_ram()) {
+        /* zero pages are not transferred with fixed-ram */
+        clear_bit(offset >> TARGET_PAGE_BITS, pss->block->file_bmap);
+        return 1;
+    }
+
     len += save_page_header(pss, file, pss->block, offset | RAM_SAVE_FLAG_ZERO);
     qemu_put_byte(file, 0);
     len += 1;
     ram_release_page(pss->block->idstr, offset);
-
-    stat64_add(&mig_stats.zero_pages, 1);
     ram_transferred_add(len);
 
     /*
@@ -1190,14 +1208,20 @@ static int save_normal_page(PageSearchStatus *pss, RAMBlock *block,
 {
     QEMUFile *file = pss->pss_channel;
 
-    ram_transferred_add(save_page_header(pss, pss->pss_channel, block,
-                                         offset | RAM_SAVE_FLAG_PAGE));
-    if (async) {
-        qemu_put_buffer_async(file, buf, TARGET_PAGE_SIZE,
-                              migrate_release_ram() &&
-                              migration_in_postcopy());
+    if (migrate_fixed_ram()) {
+        qemu_put_buffer_at(file, buf, TARGET_PAGE_SIZE,
+                           block->pages_offset + offset);
+        set_bit(offset >> TARGET_PAGE_BITS, block->file_bmap);
     } else {
-        qemu_put_buffer(file, buf, TARGET_PAGE_SIZE);
+        ram_transferred_add(save_page_header(pss, pss->pss_channel, block,
+                                             offset | RAM_SAVE_FLAG_PAGE));
+        if (async) {
+            qemu_put_buffer_async(file, buf, TARGET_PAGE_SIZE,
+                                  migrate_release_ram() &&
+                                  migration_in_postcopy());
+        } else {
+            qemu_put_buffer(file, buf, TARGET_PAGE_SIZE);
+        }
     }
     ram_transferred_add(TARGET_PAGE_SIZE);
     stat64_add(&mig_stats.normal_pages, 1);
@@ -2412,6 +2436,8 @@ static void ram_save_cleanup(void *opaque)
         block->clear_bmap = NULL;
         g_free(block->bmap);
         block->bmap = NULL;
+        g_free(block->file_bmap);
+        block->file_bmap = NULL;
     }
 
     xbzrle_cleanup();
@@ -2779,6 +2805,9 @@ static void ram_list_init_bitmaps(void)
              */
             block->bmap = bitmap_new(pages);
             bitmap_set(block->bmap, 0, pages);
+            if (migrate_fixed_ram()) {
+                block->file_bmap = bitmap_new(pages);
+            }
             block->clear_bmap_shift = shift;
             block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
         }
@@ -2916,6 +2945,60 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
     }
 }
 
+#define FIXED_RAM_HDR_VERSION 1
+struct FixedRamHeader {
+    uint32_t version;
+    /*
+     * The target's page size, so we know how many pages are in the
+     * bitmap.
+     */
+    uint64_t page_size;
+    /*
+     * The offset in the migration file where the pages bitmap is
+     * stored.
+     */
+    uint64_t bitmap_offset;
+    /*
+     * The offset in the migration file where the actual pages (data)
+     * are stored.
+     */
+    uint64_t pages_offset;
+} QEMU_PACKED;
+typedef struct FixedRamHeader FixedRamHeader;
+
+static void fixed_ram_setup_ramblock(QEMUFile *file, RAMBlock *block)
+{
+    g_autofree FixedRamHeader *header = NULL;
+    size_t header_size, bitmap_size;
+    long num_pages;
+
+    header = g_new0(FixedRamHeader, 1);
+    header_size = sizeof(FixedRamHeader);
+
+    num_pages = block->used_length >> TARGET_PAGE_BITS;
+    bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
+
+    /*
+     * Save the file offsets of where the bitmap and the pages should
+     * go as they are written at the end of migration and during the
+     * iterative phase, respectively.
+     */
+    block->bitmap_offset = qemu_get_offset(file) + header_size;
+    block->pages_offset = ROUND_UP(block->bitmap_offset +
+                                   bitmap_size,
+                                   FIXED_RAM_FILE_OFFSET_ALIGNMENT);
+
+    header->version = cpu_to_be32(FIXED_RAM_HDR_VERSION);
+    header->page_size = cpu_to_be64(TARGET_PAGE_SIZE);
+    header->bitmap_offset = cpu_to_be64(block->bitmap_offset);
+    header->pages_offset = cpu_to_be64(block->pages_offset);
+
+    qemu_put_buffer(file, (uint8_t *) header, header_size);
+
+    /* prepare offset for next ramblock */
+    qemu_set_offset(file, block->pages_offset + block->used_length, SEEK_SET);
+}
+
 /*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
@@ -2965,6 +3048,10 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
             if (migrate_ignore_shared()) {
                 qemu_put_be64(f, block->mr->addr);
             }
+
+            if (migrate_fixed_ram()) {
+                fixed_ram_setup_ramblock(f, block);
+            }
         }
     }
 
@@ -2998,6 +3085,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     return qemu_fflush(f);
 }
 
+static void ram_save_file_bmap(QEMUFile *f)
+{
+    RAMBlock *block;
+
+    RAMBLOCK_FOREACH_MIGRATABLE(block) {
+        long num_pages = block->used_length >> TARGET_PAGE_BITS;
+        long bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
+
+        qemu_put_buffer_at(f, (uint8_t *)block->file_bmap, bitmap_size,
+                           block->bitmap_offset);
+        ram_transferred_add(bitmap_size);
+    }
+}
+
 /**
  * ram_save_iterate: iterative stage for migration
  *
@@ -3187,6 +3288,18 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         return ret;
     }
 
+    if (migrate_fixed_ram()) {
+        ram_save_file_bmap(f);
+
+        if (qemu_file_get_error(f)) {
+            Error *local_err = NULL;
+            int err = qemu_file_get_error_obj(f, &local_err);
+
+            error_reportf_err(local_err, "Failed to write bitmap to file: ");
+            return -err;
+        }
+    }
+
     if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) {
         qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
     }
-- 
2.35.3




* [PATCH v4 14/34] migration/ram: Add incoming 'fixed-ram' migration
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (12 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 13/34] migration/ram: Add outgoing 'fixed-ram' migration Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-26  5:19   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 15/34] tests/qtest/migration: Add tests for fixed-ram file-based migration Fabiano Rosas
                   ` (21 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana, Nikolay Borisov

Add the necessary code to parse the format changes for the 'fixed-ram'
capability.

One of the more notable changes in behavior is that, in the 'fixed-ram'
case, RAM pages are restored in one go rather than by constantly looping
through the migration stream.
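
Condensed, the restore path introduced below (read_ramblock_fixed_ram())
boils down to walking runs of set bits in the per-ramblock bitmap and
reading each run straight into guest memory from its fixed file offset;
roughly (bounds/error checks and the FIXED_RAM_LOAD_BUF_SIZE cap
omitted):

    unsigned long bit, end;
    ram_addr_t offset;

    for (bit = find_first_bit(bitmap, num_pages); bit < num_pages;
         bit = find_next_bit(bitmap, num_pages, end + 1)) {
        end = find_next_zero_bit(bitmap, num_pages, bit + 1);

        /* one contiguous run of present pages: read it in a single call */
        offset = (ram_addr_t)bit << TARGET_PAGE_BITS;
        qemu_get_buffer_at(f, host_from_ram_block_offset(block, offset),
                           (end - bit) * TARGET_PAGE_SIZE,
                           block->pages_offset + offset);
    }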

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
- added error propagation for read_ramblock_fixed_ram()
- removed buf_size variable
---
 migration/ram.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 84c531722c..5932e1b8e1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -106,6 +106,12 @@
  */
 #define FIXED_RAM_FILE_OFFSET_ALIGNMENT 0x100000
 
+/*
+ * When doing fixed-ram migration, this is the amount we read from the
+ * pages region in the migration file at a time.
+ */
+#define FIXED_RAM_LOAD_BUF_SIZE 0x100000
+
 XBZRLECacheStats xbzrle_counters;
 
 /* used by the search for pages to send */
@@ -2999,6 +3005,35 @@ static void fixed_ram_setup_ramblock(QEMUFile *file, RAMBlock *block)
     qemu_set_offset(file, block->pages_offset + block->used_length, SEEK_SET);
 }
 
+static bool fixed_ram_read_header(QEMUFile *file, FixedRamHeader *header,
+                                  Error **errp)
+{
+    size_t ret, header_size = sizeof(FixedRamHeader);
+
+    ret = qemu_get_buffer(file, (uint8_t *)header, header_size);
+    if (ret != header_size) {
+        error_setg(errp, "Could not read whole fixed-ram migration header "
+                   "(expected %zd, got %zd bytes)", header_size, ret);
+        return false;
+    }
+
+    /* migration stream is big-endian */
+    header->version = be32_to_cpu(header->version);
+
+    if (header->version > FIXED_RAM_HDR_VERSION) {
+        error_setg(errp, "Migration fixed-ram capability version mismatch "
+                   "(expected %d, got %d)", FIXED_RAM_HDR_VERSION,
+                   header->version);
+        return false;
+    }
+
+    header->page_size = be64_to_cpu(header->page_size);
+    header->bitmap_offset = be64_to_cpu(header->bitmap_offset);
+    header->pages_offset = be64_to_cpu(header->pages_offset);
+
+    return true;
+}
+
 /*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
@@ -3900,6 +3935,102 @@ void colo_flush_ram_cache(void)
     trace_colo_flush_ram_cache_end();
 }
 
+static bool read_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
+                                    long num_pages, unsigned long *bitmap,
+                                    Error **errp)
+{
+    ERRP_GUARD();
+    unsigned long set_bit_idx, clear_bit_idx;
+    ram_addr_t offset;
+    void *host;
+    size_t read, unread, size;
+
+    for (set_bit_idx = find_first_bit(bitmap, num_pages);
+         set_bit_idx < num_pages;
+         set_bit_idx = find_next_bit(bitmap, num_pages, clear_bit_idx + 1)) {
+
+        clear_bit_idx = find_next_zero_bit(bitmap, num_pages, set_bit_idx + 1);
+
+        unread = TARGET_PAGE_SIZE * (clear_bit_idx - set_bit_idx);
+        offset = set_bit_idx << TARGET_PAGE_BITS;
+
+        while (unread > 0) {
+            host = host_from_ram_block_offset(block, offset);
+            if (!host) {
+                error_setg(errp, "page outside of ramblock %s range",
+                           block->idstr);
+                return false;
+            }
+
+            size = MIN(unread, FIXED_RAM_LOAD_BUF_SIZE);
+
+            read = qemu_get_buffer_at(f, host, size,
+                                      block->pages_offset + offset);
+            if (!read) {
+                goto err;
+            }
+            offset += read;
+            unread -= read;
+        }
+    }
+
+    return true;
+
+err:
+    qemu_file_get_error_obj(f, errp);
+    error_prepend(errp, "(%s) failed to read page " RAM_ADDR_FMT
+                  " from file offset %" PRIx64 ": ", block->idstr, offset,
+                  block->pages_offset + offset);
+    return false;
+}
+
+static void parse_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
+                                     ram_addr_t length, Error **errp)
+{
+    g_autofree unsigned long *bitmap = NULL;
+    FixedRamHeader header;
+    size_t bitmap_size;
+    long num_pages;
+
+    if (!fixed_ram_read_header(f, &header, errp)) {
+        return;
+    }
+
+    block->pages_offset = header.pages_offset;
+
+    /*
+     * Check the alignment of the file region that contains pages. We
+     * don't enforce FIXED_RAM_FILE_OFFSET_ALIGNMENT to allow that
+     * value to change in the future. Do only a sanity check with page
+     * size alignment.
+     */
+    if (!QEMU_IS_ALIGNED(block->pages_offset, TARGET_PAGE_SIZE)) {
+        error_setg(errp,
+                   "Error reading ramblock %s pages, region has bad alignment",
+                   block->idstr);
+        return;
+    }
+
+    num_pages = length / header.page_size;
+    bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
+
+    bitmap = g_malloc0(bitmap_size);
+    if (qemu_get_buffer_at(f, (uint8_t *)bitmap, bitmap_size,
+                           header.bitmap_offset) != bitmap_size) {
+        error_setg(errp, "Error reading dirty bitmap");
+        return;
+    }
+
+    if (!read_ramblock_fixed_ram(f, block, num_pages, bitmap, errp)) {
+        return;
+    }
+
+    /* Skip pages array */
+    qemu_set_offset(f, block->pages_offset + length, SEEK_SET);
+
+    return;
+}
+
 static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
 {
     int ret = 0;
@@ -3908,6 +4039,17 @@ static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
 
     assert(block);
 
+    if (migrate_fixed_ram()) {
+        Error *local_err = NULL;
+
+        parse_ramblock_fixed_ram(f, block, length, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return -EINVAL;
+        }
+        return 0;
+    }
+
     if (!qemu_ram_is_migratable(block)) {
         error_report("block %s should not be migrated !", block->idstr);
         return -EINVAL;
-- 
2.35.3




* [PATCH v4 15/34] tests/qtest/migration: Add tests for fixed-ram file-based migration
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (13 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 14/34] migration/ram: Add incoming " Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 16/34] migration/multifd: Rename MultiFDSend|RecvParams::data to compress_data Fabiano Rosas
                   ` (20 subsequent siblings)
  35 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: berrange, armbru, Peter Xu, Claudio Fontana, Thomas Huth,
	Laurent Vivier, Paolo Bonzini

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration-test.c | 59 ++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 83512bce85..d61f93b151 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2200,6 +2200,14 @@ static void *test_mode_reboot_start(QTestState *from, QTestState *to)
     return NULL;
 }
 
+static void *migrate_fixed_ram_start(QTestState *from, QTestState *to)
+{
+    migrate_set_capability(from, "fixed-ram", true);
+    migrate_set_capability(to, "fixed-ram", true);
+
+    return NULL;
+}
+
 static void test_mode_reboot(void)
 {
     g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
@@ -2214,6 +2222,32 @@ static void test_mode_reboot(void)
     test_file_common(&args, true);
 }
 
+static void test_precopy_file_fixed_ram_live(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+                                           FILE_TEST_FILENAME);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+        .start_hook = migrate_fixed_ram_start,
+    };
+
+    test_file_common(&args, false);
+}
+
+static void test_precopy_file_fixed_ram(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+                                           FILE_TEST_FILENAME);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+        .start_hook = migrate_fixed_ram_start,
+    };
+
+    test_file_common(&args, true);
+}
+
 static void test_precopy_tcp_plain(void)
 {
     MigrateCommon args = {
@@ -2462,6 +2496,13 @@ static void *migrate_precopy_fd_file_start(QTestState *from, QTestState *to)
     return NULL;
 }
 
+static void *migrate_fd_file_fixed_ram_start(QTestState *from, QTestState *to)
+{
+    migrate_fixed_ram_start(from, to);
+
+    return migrate_precopy_fd_file_start(from, to);
+}
+
 static void test_migrate_precopy_fd_file(void)
 {
     MigrateCommon args = {
@@ -2472,6 +2513,17 @@ static void test_migrate_precopy_fd_file(void)
     };
     test_file_common(&args, true);
 }
+
+static void test_migrate_precopy_fd_file_fixed_ram(void)
+{
+    MigrateCommon args = {
+        .listen_uri = "defer",
+        .connect_uri = "fd:fd-mig",
+        .start_hook = migrate_fd_file_fixed_ram_start,
+        .finish_hook = test_migrate_fd_finish_hook
+    };
+    test_file_common(&args, true);
+}
 #endif /* _WIN32 */
 
 static void do_test_validate_uuid(MigrateStart *args, bool should_fail)
@@ -3509,6 +3561,11 @@ int main(int argc, char **argv)
         migration_test_add("/migration/mode/reboot", test_mode_reboot);
     }
 
+    migration_test_add("/migration/precopy/file/fixed-ram",
+                       test_precopy_file_fixed_ram);
+    migration_test_add("/migration/precopy/file/fixed-ram/live",
+                       test_precopy_file_fixed_ram_live);
+
 #ifdef CONFIG_GNUTLS
     migration_test_add("/migration/precopy/unix/tls/psk",
                        test_precopy_unix_tls_psk);
@@ -3570,6 +3627,8 @@ int main(int argc, char **argv)
                        test_migrate_precopy_fd_socket);
     migration_test_add("/migration/precopy/fd/file",
                        test_migrate_precopy_fd_file);
+    migration_test_add("/migration/precopy/fd/file/fixed-ram",
+                       test_migrate_precopy_fd_file_fixed_ram);
 #endif
     migration_test_add("/migration/validate_uuid", test_validate_uuid);
     migration_test_add("/migration/validate_uuid_error",
-- 
2.35.3




* [PATCH v4 16/34] migration/multifd: Rename MultiFDSend|RecvParams::data to compress_data
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (14 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 15/34] tests/qtest/migration: Add tests for fixed-ram file-based migration Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 17/34] migration/multifd: Decouple recv method from pages Fabiano Rosas
                   ` (19 subsequent siblings)
  35 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

Use a more specific name for the compression data so the generic
'data' name can be used by the multifd core code.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd-zlib.c | 20 ++++++++++----------
 migration/multifd-zstd.c | 20 ++++++++++----------
 migration/multifd.h      |  4 ++--
 3 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 012e3bdea1..2a8f5fc9a6 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -69,7 +69,7 @@ static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
         err_msg = "out of memory for buf";
         goto err_free_zbuff;
     }
-    p->data = z;
+    p->compress_data = z;
     return 0;
 
 err_free_zbuff:
@@ -92,15 +92,15 @@ err_free_z:
  */
 static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
-    struct zlib_data *z = p->data;
+    struct zlib_data *z = p->compress_data;
 
     deflateEnd(&z->zs);
     g_free(z->zbuff);
     z->zbuff = NULL;
     g_free(z->buf);
     z->buf = NULL;
-    g_free(p->data);
-    p->data = NULL;
+    g_free(p->compress_data);
+    p->compress_data = NULL;
 }
 
 /**
@@ -117,7 +117,7 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
 static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 {
     MultiFDPages_t *pages = p->pages;
-    struct zlib_data *z = p->data;
+    struct zlib_data *z = p->compress_data;
     z_stream *zs = &z->zs;
     uint32_t out_size = 0;
     int ret;
@@ -194,7 +194,7 @@ static int zlib_recv_setup(MultiFDRecvParams *p, Error **errp)
     struct zlib_data *z = g_new0(struct zlib_data, 1);
     z_stream *zs = &z->zs;
 
-    p->data = z;
+    p->compress_data = z;
     zs->zalloc = Z_NULL;
     zs->zfree = Z_NULL;
     zs->opaque = Z_NULL;
@@ -224,13 +224,13 @@ static int zlib_recv_setup(MultiFDRecvParams *p, Error **errp)
  */
 static void zlib_recv_cleanup(MultiFDRecvParams *p)
 {
-    struct zlib_data *z = p->data;
+    struct zlib_data *z = p->compress_data;
 
     inflateEnd(&z->zs);
     g_free(z->zbuff);
     z->zbuff = NULL;
-    g_free(p->data);
-    p->data = NULL;
+    g_free(p->compress_data);
+    p->compress_data = NULL;
 }
 
 /**
@@ -246,7 +246,7 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p)
  */
 static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
-    struct zlib_data *z = p->data;
+    struct zlib_data *z = p->compress_data;
     z_stream *zs = &z->zs;
     uint32_t in_size = p->next_packet_size;
     /* we measure the change of total_out */
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index dc8fe43e94..593cf290ad 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -52,7 +52,7 @@ static int zstd_send_setup(MultiFDSendParams *p, Error **errp)
     struct zstd_data *z = g_new0(struct zstd_data, 1);
     int res;
 
-    p->data = z;
+    p->compress_data = z;
     z->zcs = ZSTD_createCStream();
     if (!z->zcs) {
         g_free(z);
@@ -90,14 +90,14 @@ static int zstd_send_setup(MultiFDSendParams *p, Error **errp)
  */
 static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
-    struct zstd_data *z = p->data;
+    struct zstd_data *z = p->compress_data;
 
     ZSTD_freeCStream(z->zcs);
     z->zcs = NULL;
     g_free(z->zbuff);
     z->zbuff = NULL;
-    g_free(p->data);
-    p->data = NULL;
+    g_free(p->compress_data);
+    p->compress_data = NULL;
 }
 
 /**
@@ -114,7 +114,7 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp)
 static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
 {
     MultiFDPages_t *pages = p->pages;
-    struct zstd_data *z = p->data;
+    struct zstd_data *z = p->compress_data;
     int ret;
     uint32_t i;
 
@@ -183,7 +183,7 @@ static int zstd_recv_setup(MultiFDRecvParams *p, Error **errp)
     struct zstd_data *z = g_new0(struct zstd_data, 1);
     int ret;
 
-    p->data = z;
+    p->compress_data = z;
     z->zds = ZSTD_createDStream();
     if (!z->zds) {
         g_free(z);
@@ -221,14 +221,14 @@ static int zstd_recv_setup(MultiFDRecvParams *p, Error **errp)
  */
 static void zstd_recv_cleanup(MultiFDRecvParams *p)
 {
-    struct zstd_data *z = p->data;
+    struct zstd_data *z = p->compress_data;
 
     ZSTD_freeDStream(z->zds);
     z->zds = NULL;
     g_free(z->zbuff);
     z->zbuff = NULL;
-    g_free(p->data);
-    p->data = NULL;
+    g_free(p->compress_data);
+    p->compress_data = NULL;
 }
 
 /**
@@ -248,7 +248,7 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
     uint32_t out_size = 0;
     uint32_t expected_size = p->normal_num * p->page_size;
     uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
-    struct zstd_data *z = p->data;
+    struct zstd_data *z = p->compress_data;
     int ret;
     int i;
 
diff --git a/migration/multifd.h b/migration/multifd.h
index 8a1cad0996..6c18732827 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -129,7 +129,7 @@ typedef struct {
     /* number of iovs used */
     uint32_t iovs_num;
     /* used for compression methods */
-    void *data;
+    void *compress_data;
 }  MultiFDSendParams;
 
 typedef struct {
@@ -185,7 +185,7 @@ typedef struct {
     /* num of non zero pages */
     uint32_t normal_num;
     /* used for de-compression methods */
-    void *data;
+    void *compress_data;
 } MultiFDRecvParams;
 
 typedef struct {
-- 
2.35.3




* [PATCH v4 17/34] migration/multifd: Decouple recv method from pages
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (15 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 16/34] migration/multifd: Rename MultiFDSend|RecvParams::data to compress_data Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 18/34] migration/multifd: Allow multifd without packets Fabiano Rosas
                   ` (18 subsequent siblings)
  35 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

The next patch will abstract the type of data being received by the
channels, so do some cleanup now to remove the references to pages and
the dependency on 'normal_num'.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd-zlib.c |  6 +++---
 migration/multifd-zstd.c |  6 +++---
 migration/multifd.c      | 13 ++++++++-----
 migration/multifd.h      |  4 ++--
 4 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 2a8f5fc9a6..6120faad65 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -234,7 +234,7 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p)
 }
 
 /**
- * zlib_recv_pages: read the data from the channel into actual pages
+ * zlib_recv: read the data from the channel into actual pages
  *
  * Read the compressed buffer, and uncompress it into the actual
  * pages.
@@ -244,7 +244,7 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p)
  * @p: Params for the channel that we are using
  * @errp: pointer to an error
  */
-static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
+static int zlib_recv(MultiFDRecvParams *p, Error **errp)
 {
     struct zlib_data *z = p->compress_data;
     z_stream *zs = &z->zs;
@@ -319,7 +319,7 @@ static MultiFDMethods multifd_zlib_ops = {
     .send_prepare = zlib_send_prepare,
     .recv_setup = zlib_recv_setup,
     .recv_cleanup = zlib_recv_cleanup,
-    .recv_pages = zlib_recv_pages
+    .recv = zlib_recv
 };
 
 static void multifd_zlib_register(void)
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 593cf290ad..cac236833d 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -232,7 +232,7 @@ static void zstd_recv_cleanup(MultiFDRecvParams *p)
 }
 
 /**
- * zstd_recv_pages: read the data from the channel into actual pages
+ * zstd_recv: read the data from the channel into actual pages
  *
  * Read the compressed buffer, and uncompress it into the actual
  * pages.
@@ -242,7 +242,7 @@ static void zstd_recv_cleanup(MultiFDRecvParams *p)
  * @p: Params for the channel that we are using
  * @errp: pointer to an error
  */
-static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
+static int zstd_recv(MultiFDRecvParams *p, Error **errp)
 {
     uint32_t in_size = p->next_packet_size;
     uint32_t out_size = 0;
@@ -310,7 +310,7 @@ static MultiFDMethods multifd_zstd_ops = {
     .send_prepare = zstd_send_prepare,
     .recv_setup = zstd_recv_setup,
     .recv_cleanup = zstd_recv_cleanup,
-    .recv_pages = zstd_recv_pages
+    .recv = zstd_recv
 };
 
 static void multifd_zstd_register(void)
diff --git a/migration/multifd.c b/migration/multifd.c
index 43f0820996..5a38cb222f 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -197,7 +197,7 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p)
 }
 
 /**
- * nocomp_recv_pages: read the data from the channel into actual pages
+ * nocomp_recv: read the data from the channel
  *
  * For no compression we just need to read things into the correct place.
  *
@@ -206,7 +206,7 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p)
  * @p: Params for the channel that we are using
  * @errp: pointer to an error
  */
-static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
+static int nocomp_recv(MultiFDRecvParams *p, Error **errp)
 {
     uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
 
@@ -228,7 +228,7 @@ static MultiFDMethods multifd_nocomp_ops = {
     .send_prepare = nocomp_send_prepare,
     .recv_setup = nocomp_recv_setup,
     .recv_cleanup = nocomp_recv_cleanup,
-    .recv_pages = nocomp_recv_pages
+    .recv = nocomp_recv
 };
 
 static MultiFDMethods *multifd_ops[MULTIFD_COMPRESSION__MAX] = {
@@ -1216,6 +1216,8 @@ static void *multifd_recv_thread(void *opaque)
 
     while (true) {
         uint32_t flags;
+        bool has_data = false;
+        p->normal_num = 0;
 
         if (multifd_recv_should_exit()) {
             break;
@@ -1237,10 +1239,11 @@ static void *multifd_recv_thread(void *opaque)
         flags = p->flags;
         /* recv methods don't know how to handle the SYNC flag */
         p->flags &= ~MULTIFD_FLAG_SYNC;
+        has_data = !!p->normal_num;
         qemu_mutex_unlock(&p->mutex);
 
-        if (p->normal_num) {
-            ret = multifd_recv_state->ops->recv_pages(p, &local_err);
+        if (has_data) {
+            ret = multifd_recv_state->ops->recv(p, &local_err);
             if (ret != 0) {
                 break;
             }
diff --git a/migration/multifd.h b/migration/multifd.h
index 6c18732827..9a6a7a72df 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -199,8 +199,8 @@ typedef struct {
     int (*recv_setup)(MultiFDRecvParams *p, Error **errp);
     /* Cleanup for receiving side */
     void (*recv_cleanup)(MultiFDRecvParams *p);
-    /* Read all pages */
-    int (*recv_pages)(MultiFDRecvParams *p, Error **errp);
+    /* Read all data */
+    int (*recv)(MultiFDRecvParams *p, Error **errp);
 } MultiFDMethods;
 
 void multifd_register_ops(int method, MultiFDMethods *ops);
-- 
2.35.3




* [PATCH v4 18/34] migration/multifd: Allow multifd without packets
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (16 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 17/34] migration/multifd: Decouple recv method from pages Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-26  5:57   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 19/34] migration/multifd: Allow receiving pages " Fabiano Rosas
                   ` (17 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

With the upcoming support for the new 'fixed-ram' migration stream
format, we cannot use multifd packets because each write into the
ramblock section of the migration file is expected to contain only
guest pages. The pages are written at their respective offsets relative
to the ramblock section header.

There is no space for the packet information and the expected gains
from the new approach come partly from being able to write the pages
sequentially without extraneous data in between.

The new format also simply does not need the packets: all necessary
information can be taken from the standard migration headers, with some
(future) changes to the multifd code.

Use the presence of the fixed-ram capability to decide whether to send
packets.

This only moves code under multifd_use_packets(); it has no effect for
now, as fixed-ram cannot yet be enabled with multifd.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 188 +++++++++++++++++++++++++++-----------------
 1 file changed, 117 insertions(+), 71 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 5a38cb222f..0a5279314d 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -92,6 +92,11 @@ struct {
     MultiFDMethods *ops;
 } *multifd_recv_state;
 
+static bool multifd_use_packets(void)
+{
+    return !migrate_fixed_ram();
+}
+
 /* Multifd without compression */
 
 /**
@@ -136,10 +141,11 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
 static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
 {
     bool use_zero_copy_send = migrate_zero_copy_send();
+    bool use_packets = multifd_use_packets();
     MultiFDPages_t *pages = p->pages;
     int ret;
 
-    if (!use_zero_copy_send) {
+    if (!use_zero_copy_send && use_packets) {
         /*
          * Only !zerocopy needs the header in IOV; zerocopy will
          * send it separately.
@@ -156,14 +162,16 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
     p->next_packet_size = pages->num * p->page_size;
     p->flags |= MULTIFD_FLAG_NOCOMP;
 
-    multifd_send_fill_packet(p);
+    if (use_packets) {
+        multifd_send_fill_packet(p);
 
-    if (use_zero_copy_send) {
-        /* Send header first, without zerocopy */
-        ret = qio_channel_write_all(p->c, (void *)p->packet,
-                                    p->packet_len, errp);
-        if (ret != 0) {
-            return -1;
+        if (use_zero_copy_send) {
+            /* Send header first, without zerocopy */
+            ret = qio_channel_write_all(p->c, (void *)p->packet,
+                                        p->packet_len, errp);
+            if (ret != 0) {
+                return -1;
+            }
         }
     }
 
@@ -215,11 +223,16 @@ static int nocomp_recv(MultiFDRecvParams *p, Error **errp)
                    p->id, flags, MULTIFD_FLAG_NOCOMP);
         return -1;
     }
-    for (int i = 0; i < p->normal_num; i++) {
-        p->iov[i].iov_base = p->host + p->normal[i];
-        p->iov[i].iov_len = p->page_size;
+
+    if (multifd_use_packets()) {
+        for (int i = 0; i < p->normal_num; i++) {
+            p->iov[i].iov_base = p->host + p->normal[i];
+            p->iov[i].iov_len = p->page_size;
+        }
+        return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
     }
-    return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
+
+    return 0;
 }
 
 static MultiFDMethods multifd_nocomp_ops = {
@@ -799,15 +812,18 @@ static void *multifd_send_thread(void *opaque)
     MigrationThread *thread = NULL;
     Error *local_err = NULL;
     int ret = 0;
+    bool use_packets = multifd_use_packets();
 
     thread = migration_threads_add(p->name, qemu_get_thread_id());
 
     trace_multifd_send_thread_start(p->id);
     rcu_register_thread();
 
-    if (multifd_send_initial_packet(p, &local_err) < 0) {
-        ret = -1;
-        goto out;
+    if (use_packets) {
+        if (multifd_send_initial_packet(p, &local_err) < 0) {
+            ret = -1;
+            goto out;
+        }
     }
 
     while (true) {
@@ -858,16 +874,20 @@ static void *multifd_send_thread(void *opaque)
              * it doesn't require explicit memory barriers.
              */
             assert(qatomic_read(&p->pending_sync));
-            p->flags = MULTIFD_FLAG_SYNC;
-            multifd_send_fill_packet(p);
-            ret = qio_channel_write_all(p->c, (void *)p->packet,
-                                        p->packet_len, &local_err);
-            if (ret != 0) {
-                break;
+
+            if (use_packets) {
+                p->flags = MULTIFD_FLAG_SYNC;
+                multifd_send_fill_packet(p);
+                ret = qio_channel_write_all(p->c, (void *)p->packet,
+                                            p->packet_len, &local_err);
+                if (ret != 0) {
+                    break;
+                }
+                /* p->next_packet_size will always be zero for a SYNC packet */
+                stat64_add(&mig_stats.multifd_bytes, p->packet_len);
+                p->flags = 0;
             }
-            /* p->next_packet_size will always be zero for a SYNC packet */
-            stat64_add(&mig_stats.multifd_bytes, p->packet_len);
-            p->flags = 0;
+
             qatomic_set(&p->pending_sync, false);
             qemu_sem_post(&p->sem_sync);
         }
@@ -1016,6 +1036,7 @@ bool multifd_send_setup(void)
     Error *local_err = NULL;
     int thread_count, ret = 0;
     uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
+    bool use_packets = multifd_use_packets();
     uint8_t i;
 
     if (!migrate_multifd()) {
@@ -1038,27 +1059,35 @@ bool multifd_send_setup(void)
         qemu_sem_init(&p->sem_sync, 0);
         p->id = i;
         p->pages = multifd_pages_init(page_count);
-        p->packet_len = sizeof(MultiFDPacket_t)
-                      + sizeof(uint64_t) * page_count;
-        p->packet = g_malloc0(p->packet_len);
-        p->packet->magic = cpu_to_be32(MULTIFD_MAGIC);
-        p->packet->version = cpu_to_be32(MULTIFD_VERSION);
+
+        if (use_packets) {
+            p->packet_len = sizeof(MultiFDPacket_t)
+                          + sizeof(uint64_t) * page_count;
+            p->packet = g_malloc0(p->packet_len);
+            p->packet->magic = cpu_to_be32(MULTIFD_MAGIC);
+            p->packet->version = cpu_to_be32(MULTIFD_VERSION);
+
+            /* We need one extra place for the packet header */
+            p->iov = g_new0(struct iovec, page_count + 1);
+        } else {
+            p->iov = g_new0(struct iovec, page_count);
+        }
         p->name = g_strdup_printf("multifdsend_%d", i);
-        /* We need one extra place for the packet header */
-        p->iov = g_new0(struct iovec, page_count + 1);
         p->page_size = qemu_target_page_size();
         p->page_count = page_count;
         p->write_flags = 0;
         multifd_new_send_channel_create(p);
     }
 
-    /*
-     * Wait until channel creation has started for all channels. The
-     * creation can still fail, but no more channels will be created
-     * past this point.
-     */
-    for (i = 0; i < thread_count; i++) {
-        qemu_sem_wait(&multifd_send_state->channels_created);
+    if (use_packets) {
+        /*
+         * Wait until channel creation has started for all channels. The
+         * creation can still fail, but no more channels will be created
+         * past this point.
+         */
+        for (i = 0; i < thread_count; i++) {
+            qemu_sem_wait(&multifd_send_state->channels_created);
+        }
     }
 
     for (i = 0; i < thread_count; i++) {
@@ -1108,7 +1137,9 @@ static void multifd_recv_terminate_threads(Error *err)
          * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
          * however try to wakeup it without harm in cleanup phase.
          */
-        qemu_sem_post(&p->sem_sync);
+        if (multifd_use_packets()) {
+            qemu_sem_post(&p->sem_sync);
+        }
 
         /*
          * We could arrive here for two reasons:
@@ -1182,7 +1213,7 @@ void multifd_recv_sync_main(void)
 {
     int i;
 
-    if (!migrate_multifd()) {
+    if (!migrate_multifd() || !multifd_use_packets()) {
         return;
     }
     for (i = 0; i < migrate_multifd_channels(); i++) {
@@ -1209,13 +1240,14 @@ static void *multifd_recv_thread(void *opaque)
 {
     MultiFDRecvParams *p = opaque;
     Error *local_err = NULL;
+    bool use_packets = multifd_use_packets();
     int ret;
 
     trace_multifd_recv_thread_start(p->id);
     rcu_register_thread();
 
     while (true) {
-        uint32_t flags;
+        uint32_t flags = 0;
         bool has_data = false;
         p->normal_num = 0;
 
@@ -1223,25 +1255,27 @@ static void *multifd_recv_thread(void *opaque)
             break;
         }
 
-        ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
-                                       p->packet_len, &local_err);
-        if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
-            break;
-        }
+        if (use_packets) {
+            ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
+                                           p->packet_len, &local_err);
+            if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
+                break;
+            }
 
-        qemu_mutex_lock(&p->mutex);
-        ret = multifd_recv_unfill_packet(p, &local_err);
-        if (ret) {
+            qemu_mutex_lock(&p->mutex);
+            ret = multifd_recv_unfill_packet(p, &local_err);
+            if (ret) {
+                qemu_mutex_unlock(&p->mutex);
+                break;
+            }
+
+            flags = p->flags;
+            /* recv methods don't know how to handle the SYNC flag */
+            p->flags &= ~MULTIFD_FLAG_SYNC;
+            has_data = !!p->normal_num;
             qemu_mutex_unlock(&p->mutex);
-            break;
         }
 
-        flags = p->flags;
-        /* recv methods don't know how to handle the SYNC flag */
-        p->flags &= ~MULTIFD_FLAG_SYNC;
-        has_data = !!p->normal_num;
-        qemu_mutex_unlock(&p->mutex);
-
         if (has_data) {
             ret = multifd_recv_state->ops->recv(p, &local_err);
             if (ret != 0) {
@@ -1249,9 +1283,11 @@ static void *multifd_recv_thread(void *opaque)
             }
         }
 
-        if (flags & MULTIFD_FLAG_SYNC) {
-            qemu_sem_post(&multifd_recv_state->sem_sync);
-            qemu_sem_wait(&p->sem_sync);
+        if (use_packets) {
+            if (flags & MULTIFD_FLAG_SYNC) {
+                qemu_sem_post(&multifd_recv_state->sem_sync);
+                qemu_sem_wait(&p->sem_sync);
+            }
         }
     }
 
@@ -1270,6 +1306,7 @@ int multifd_recv_setup(Error **errp)
 {
     int thread_count;
     uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
+    bool use_packets = multifd_use_packets();
     uint8_t i;
 
     /*
@@ -1294,9 +1331,12 @@ int multifd_recv_setup(Error **errp)
         qemu_mutex_init(&p->mutex);
         qemu_sem_init(&p->sem_sync, 0);
         p->id = i;
-        p->packet_len = sizeof(MultiFDPacket_t)
-                      + sizeof(uint64_t) * page_count;
-        p->packet = g_malloc0(p->packet_len);
+
+        if (use_packets) {
+            p->packet_len = sizeof(MultiFDPacket_t)
+                + sizeof(uint64_t) * page_count;
+            p->packet = g_malloc0(p->packet_len);
+        }
         p->name = g_strdup_printf("multifdrecv_%d", i);
         p->iov = g_new0(struct iovec, page_count);
         p->normal = g_new0(ram_addr_t, page_count);
@@ -1340,18 +1380,24 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 {
     MultiFDRecvParams *p;
     Error *local_err = NULL;
+    bool use_packets = multifd_use_packets();
     int id;
 
-    id = multifd_recv_initial_packet(ioc, &local_err);
-    if (id < 0) {
-        multifd_recv_terminate_threads(local_err);
-        error_propagate_prepend(errp, local_err,
-                                "failed to receive packet"
-                                " via multifd channel %d: ",
-                                qatomic_read(&multifd_recv_state->count));
-        return;
+    if (use_packets) {
+        id = multifd_recv_initial_packet(ioc, &local_err);
+        if (id < 0) {
+            multifd_recv_terminate_threads(local_err);
+            error_propagate_prepend(errp, local_err,
+                                    "failed to receive packet"
+                                    " via multifd channel %d: ",
+                                    qatomic_read(&multifd_recv_state->count));
+            return;
+        }
+        trace_multifd_recv_new_channel(id);
+    } else {
+        /* next patch gives this a meaningful value */
+        id = 0;
     }
-    trace_multifd_recv_new_channel(id);
 
     p = &multifd_recv_state->params[id];
     if (p->c != NULL) {
-- 
2.35.3




* [PATCH v4 19/34] migration/multifd: Allow receiving pages without packets
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (17 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 18/34] migration/multifd: Allow multifd without packets Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-26  6:58   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 20/34] migration/multifd: Add outgoing QIOChannelFile support Fabiano Rosas
                   ` (16 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

Currently multifd does not need to have knowledge of pages on the
receiving side because all the information needed is within the
packets that come in the stream.

We're about to add support for fixed-ram migration, which cannot use
packets because it expects the ramblock section in the migration file
to contain only guest page data.

Add a data structure to transfer pages between the ram migration code
and the multifd receiving threads.

We don't want to reuse MultiFDPages_t for two reasons:

a) multifd threads don't really need to know about the data they're
   receiving.

b) the receiving side has to be stopped to load the pages, which means
   we can experiment with larger granularities than page size when
   transferring data.
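
As a minimal sketch of the kind of descriptor this adds (only 'size' is
visible in the hunks below, so take the exact fields here as
assumptions for illustration):

    /* hypothetical shape, for illustration only */
    struct MultiFDRecvData {
        void *opaque;       /* where to put the data, e.g. a host address */
        size_t size;        /* how much to read; 0 means "no work queued" */
        off_t file_offset;  /* where in the migration file to read from */
    };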

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
@Peter: a 'quit' flag cannot be used instead of pending_job. The
receiving thread needs to know when there is no more data coming. If the
migration thread sets a 'quit' flag, the multifd thread would see the
flag right away and exit. The only way is to clear pending_job on the
thread and spin once more.
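
Put differently, the handoff is a small producer/consumer handshake
around pending_job; roughly (function names are made up, error paths
and the exit flag omitted, see multifd_recv() and multifd_recv_thread()
below):

    /* migration thread: queue one chunk of work on an idle channel */
    static void queue_work(MultiFDRecvParams *p, MultiFDRecvData **data)
    {
        MultiFDRecvData *tmp = p->data;

        assert(!tmp->size);
        p->data = *data;     /* hand the filled descriptor to the channel */
        *data = tmp;         /* take its empty one back for reuse */
        qatomic_set(&p->pending_job, true);
        qemu_sem_post(&p->sem);
    }

    /* recv thread: one loop iteration */
    static void handle_work(MultiFDRecvParams *p)
    {
        qemu_sem_wait(&p->sem);

        if (!qatomic_read(&p->pending_job)) {
            /* nothing queued: the source has run out of work */
            qemu_sem_post(&p->sem_sync);
            return;
        }

        /* read p->data->size bytes, then mark the channel idle again */
        p->data->size = 0;
        qatomic_set(&p->pending_job, false);
    }
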
---
 migration/file.c    |   1 +
 migration/multifd.c | 122 +++++++++++++++++++++++++++++++++++++++++---
 migration/multifd.h |  15 ++++++
 3 files changed, 131 insertions(+), 7 deletions(-)

diff --git a/migration/file.c b/migration/file.c
index 5d4975f43e..22d052a71f 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -6,6 +6,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "exec/ramblock.h"
 #include "qemu/cutils.h"
 #include "qapi/error.h"
 #include "channel.h"
diff --git a/migration/multifd.c b/migration/multifd.c
index 0a5279314d..45a0c7aaa8 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -81,9 +81,15 @@ struct {
 
 struct {
     MultiFDRecvParams *params;
+    MultiFDRecvData *data;
     /* number of created threads */
     int count;
-    /* syncs main thread and channels */
+    /*
+     * For sockets: this is posted once for each MULTIFD_FLAG_SYNC flag.
+     *
+     * For files: this is only posted at the end of the file load to mark
+     *            completion of the load process.
+     */
     QemuSemaphore sem_sync;
     /* global number of generated multifd packets */
     uint64_t packet_num;
@@ -1110,6 +1116,53 @@ bool multifd_send_setup(void)
     return true;
 }
 
+bool multifd_recv(void)
+{
+    int i;
+    static int next_recv_channel;
+    MultiFDRecvParams *p = NULL;
+    MultiFDRecvData *data = multifd_recv_state->data;
+
+    /*
+     * next_channel can remain from a previous migration that was
+     * using more channels, so ensure it doesn't overflow if the
+     * limit is lower now.
+     */
+    next_recv_channel %= migrate_multifd_channels();
+    for (i = next_recv_channel;; i = (i + 1) % migrate_multifd_channels()) {
+        if (multifd_recv_should_exit()) {
+            return false;
+        }
+
+        p = &multifd_recv_state->params[i];
+
+        /*
+         * Safe to read atomically without a lock because the flag is
+         * only set by this function below. Reading an old value of
+         * true is not an issue because it would only send us looking
+         * for the next idle channel.
+         */
+        if (qatomic_read(&p->pending_job) == false) {
+            next_recv_channel = (i + 1) % migrate_multifd_channels();
+            break;
+        }
+    }
+
+    assert(!p->data->size);
+    multifd_recv_state->data = p->data;
+    p->data = data;
+
+    qatomic_set(&p->pending_job, true);
+    qemu_sem_post(&p->sem);
+
+    return true;
+}
+
+MultiFDRecvData *multifd_get_recv_data(void)
+{
+    return multifd_recv_state->data;
+}
+
 static void multifd_recv_terminate_threads(Error *err)
 {
     int i;
@@ -1134,11 +1187,26 @@ static void multifd_recv_terminate_threads(Error *err)
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
         /*
-         * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
-         * however try to wakeup it without harm in cleanup phase.
+         * The migration thread and channels interact differently
+         * depending on the presence of packets.
          */
         if (multifd_use_packets()) {
+            /*
+             * The channel receives as long as there are packets. When
+             * packets end (i.e. MULTIFD_FLAG_SYNC is reached), the
+             * channel waits for the migration thread to sync. If the
+             * sync never happens, do it here.
+             */
             qemu_sem_post(&p->sem_sync);
+        } else {
+            /*
+             * The channel waits for the migration thread to give it
+             * work. When the migration thread runs out of work, it
+             * releases the channel and waits for any pending work to
+             * finish. If we reach here (e.g. due to error) before the
+             * work runs out, release the channel.
+             */
+            qemu_sem_post(&p->sem);
         }
 
         /*
@@ -1167,6 +1235,7 @@ static void multifd_recv_cleanup_channel(MultiFDRecvParams *p)
     p->c = NULL;
     qemu_mutex_destroy(&p->mutex);
     qemu_sem_destroy(&p->sem_sync);
+    qemu_sem_destroy(&p->sem);
     g_free(p->name);
     p->name = NULL;
     p->packet_len = 0;
@@ -1184,6 +1253,8 @@ static void multifd_recv_cleanup_state(void)
     qemu_sem_destroy(&multifd_recv_state->sem_sync);
     g_free(multifd_recv_state->params);
     multifd_recv_state->params = NULL;
+    g_free(multifd_recv_state->data);
+    multifd_recv_state->data = NULL;
     g_free(multifd_recv_state);
     multifd_recv_state = NULL;
 }
@@ -1251,11 +1322,11 @@ static void *multifd_recv_thread(void *opaque)
         bool has_data = false;
         p->normal_num = 0;
 
-        if (multifd_recv_should_exit()) {
-            break;
-        }
-
         if (use_packets) {
+            if (multifd_recv_should_exit()) {
+                break;
+            }
+
             ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
                                            p->packet_len, &local_err);
             if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
@@ -1274,6 +1345,26 @@ static void *multifd_recv_thread(void *opaque)
             p->flags &= ~MULTIFD_FLAG_SYNC;
             has_data = !!p->normal_num;
             qemu_mutex_unlock(&p->mutex);
+        } else {
+            /*
+             * No packets, so we need to wait for the vmstate code to
+             * give us work.
+             */
+            qemu_sem_wait(&p->sem);
+
+            if (multifd_recv_should_exit()) {
+                break;
+            }
+
+            /*
+             * Migration thread did not send work, break and signal
+             * sem_sync so it knows we're not lagging behind.
+             */
+            if (!qatomic_read(&p->pending_job)) {
+                break;
+            }
+
+            has_data = !!p->data->size;
         }
 
         if (has_data) {
@@ -1288,9 +1379,17 @@ static void *multifd_recv_thread(void *opaque)
                 qemu_sem_post(&multifd_recv_state->sem_sync);
                 qemu_sem_wait(&p->sem_sync);
             }
+        } else {
+            p->total_normal_pages += p->data->size / qemu_target_page_size();
+            p->data->size = 0;
+            qatomic_set(&p->pending_job, false);
         }
     }
 
+    if (!use_packets) {
+        qemu_sem_post(&p->sem_sync);
+    }
+
     if (local_err) {
         multifd_recv_terminate_threads(local_err);
         error_free(local_err);
@@ -1320,6 +1419,10 @@ int multifd_recv_setup(Error **errp)
     thread_count = migrate_multifd_channels();
     multifd_recv_state = g_malloc0(sizeof(*multifd_recv_state));
     multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count);
+
+    multifd_recv_state->data = g_new0(MultiFDRecvData, 1);
+    multifd_recv_state->data->size = 0;
+
     qatomic_set(&multifd_recv_state->count, 0);
     qatomic_set(&multifd_recv_state->exiting, 0);
     qemu_sem_init(&multifd_recv_state->sem_sync, 0);
@@ -1330,8 +1433,13 @@ int multifd_recv_setup(Error **errp)
 
         qemu_mutex_init(&p->mutex);
         qemu_sem_init(&p->sem_sync, 0);
+        qemu_sem_init(&p->sem, 0);
+        p->pending_job = false;
         p->id = i;
 
+        p->data = g_new0(MultiFDRecvData, 1);
+        p->data->size = 0;
+
         if (use_packets) {
             p->packet_len = sizeof(MultiFDPacket_t)
                 + sizeof(uint64_t) * page_count;
diff --git a/migration/multifd.h b/migration/multifd.h
index 9a6a7a72df..19188815a3 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -13,6 +13,8 @@
 #ifndef QEMU_MIGRATION_MULTIFD_H
 #define QEMU_MIGRATION_MULTIFD_H
 
+typedef struct MultiFDRecvData MultiFDRecvData;
+
 bool multifd_send_setup(void);
 void multifd_send_shutdown(void);
 int multifd_recv_setup(Error **errp);
@@ -23,6 +25,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
 int multifd_send_sync_main(void);
 bool multifd_queue_page(RAMBlock *block, ram_addr_t offset);
+bool multifd_recv(void);
+MultiFDRecvData *multifd_get_recv_data(void);
 
 /* Multifd Compression flags */
 #define MULTIFD_FLAG_SYNC (1 << 0)
@@ -63,6 +67,13 @@ typedef struct {
     RAMBlock *block;
 } MultiFDPages_t;
 
+struct MultiFDRecvData {
+    void *opaque;
+    size_t size;
+    /* for preadv */
+    off_t file_offset;
+};
+
 typedef struct {
     /* Fields are only written at creating/deletion time */
     /* No lock required for them, they are read only */
@@ -154,6 +165,8 @@ typedef struct {
 
     /* syncs main thread and channels */
     QemuSemaphore sem_sync;
+    /* sem where to wait for more work */
+    QemuSemaphore sem;
 
     /* this mutex protects the following parameters */
     QemuMutex mutex;
@@ -163,6 +176,8 @@ typedef struct {
     uint32_t flags;
     /* global number of generated multifd packets */
     uint64_t packet_num;
+    int pending_job;
+    MultiFDRecvData *data;
 
     /* thread local variables. No locking required */
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 20/34] migration/multifd: Add outgoing QIOChannelFile support
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (18 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 19/34] migration/multifd: Allow receiving pages " Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-26  7:10   ` Peter Xu
  2024-02-26  7:21   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 21/34] migration/multifd: Add incoming " Fabiano Rosas
                   ` (15 subsequent siblings)
  35 siblings, 2 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

Allow multifd to open file-backed channels. This will be used when
enabling the fixed-ram migration stream format which expects a
seekable transport.

The QIOChannel read and write methods will use the preadv/pwritev
versions, which don't update the file offset at each call, so we can
reuse the same fd for every channel without re-opening it.
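
To illustrate the point about positional I/O, here is a standalone
POSIX sketch (not QEMU code; the file name and offsets are made up):

#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("demo.bin", O_CREAT | O_TRUNC | O_WRONLY, 0660);
    char a[] = "AAAA", b[] = "BBBB";
    struct iovec iov_a = { .iov_base = a, .iov_len = sizeof(a) - 1 };
    struct iovec iov_b = { .iov_base = b, .iov_len = sizeof(b) - 1 };

    if (fd < 0) {
        return 1;
    }

    /*
     * Each "channel" writes at its own fixed offset; neither call
     * moves the fd's shared file position, so no coordination or
     * re-opening is needed.
     */
    pwritev(fd, &iov_a, 1, 0);
    pwritev(fd, &iov_b, 1, 4096);

    close(fd);
    return 0;
}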

Unlike the socket migration, the file migration doesn't need an
asynchronous channel creation process, so expose
multifd_channel_connect() and call it directly.

Note that this is just setup code and multifd cannot yet make use of
the file channels.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/file.c    | 40 ++++++++++++++++++++++++++++++++++++++--
 migration/file.h    |  5 +++++
 migration/multifd.c | 27 ++++++++++++++++++++++-----
 migration/multifd.h |  2 ++
 4 files changed, 67 insertions(+), 7 deletions(-)

diff --git a/migration/file.c b/migration/file.c
index 22d052a71f..ac9f6ae40a 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -12,12 +12,17 @@
 #include "channel.h"
 #include "file.h"
 #include "migration.h"
+#include "multifd.h"
 #include "io/channel-file.h"
 #include "io/channel-util.h"
 #include "trace.h"
 
 #define OFFSET_OPTION ",offset="
 
+static struct FileOutgoingArgs {
+    char *fname;
+} outgoing_args;
+
 /* Remove the offset option from @filespec and return it in @offsetp. */
 
 int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
@@ -37,6 +42,34 @@ int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
     return 0;
 }
 
+int file_send_channel_destroy(QIOChannel *ioc)
+{
+    if (ioc) {
+        qio_channel_close(ioc, NULL);
+    }
+    g_free(outgoing_args.fname);
+    outgoing_args.fname = NULL;
+
+    return 0;
+}
+
+bool file_send_channel_create(gpointer opaque, Error **errp)
+{
+    QIOChannelFile *ioc;
+    int flags = O_WRONLY;
+
+    ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
+    if (!ioc) {
+        return false;
+    }
+
+    if (!multifd_channel_connect(opaque, QIO_CHANNEL(ioc), errp)) {
+        return false;
+    }
+
+    return true;
+}
+
 void file_start_outgoing_migration(MigrationState *s,
                                    FileMigrationArgs *file_args, Error **errp)
 {
@@ -44,15 +77,18 @@ void file_start_outgoing_migration(MigrationState *s,
     g_autofree char *filename = g_strdup(file_args->filename);
     uint64_t offset = file_args->offset;
     QIOChannel *ioc;
+    int flags = O_CREAT | O_TRUNC | O_WRONLY;
+    mode_t mode = 0660;
 
     trace_migration_file_outgoing(filename);
 
-    fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
-                                     0600, errp);
+    fioc = qio_channel_file_new_path(filename, flags, mode, errp);
     if (!fioc) {
         return;
     }
 
+    outgoing_args.fname = g_strdup(filename);
+
     ioc = QIO_CHANNEL(fioc);
     if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 0) {
         return;
diff --git a/migration/file.h b/migration/file.h
index 37d6a08bfc..90794b494b 100644
--- a/migration/file.h
+++ b/migration/file.h
@@ -9,10 +9,15 @@
 #define QEMU_MIGRATION_FILE_H
 
 #include "qapi/qapi-types-migration.h"
+#include "io/task.h"
+#include "channel.h"
 
 void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp);
 
 void file_start_outgoing_migration(MigrationState *s,
                                    FileMigrationArgs *file_args, Error **errp);
 int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp);
+
+bool file_send_channel_create(gpointer opaque, Error **errp);
+int file_send_channel_destroy(QIOChannel *ioc);
 #endif
diff --git a/migration/multifd.c b/migration/multifd.c
index 45a0c7aaa8..507b497d52 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -17,6 +17,7 @@
 #include "exec/ramblock.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "file.h"
 #include "ram.h"
 #include "migration.h"
 #include "migration-stats.h"
@@ -28,6 +29,7 @@
 #include "threadinfo.h"
 #include "options.h"
 #include "qemu/yank.h"
+#include "io/channel-file.h"
 #include "io/channel-socket.h"
 #include "yank_functions.h"
 
@@ -680,6 +682,9 @@ static void multifd_send_terminate_threads(void)
 
 static int multifd_send_channel_destroy(QIOChannel *send)
 {
+    if (!multifd_use_packets()) {
+        return file_send_channel_destroy(send);
+    }
     return socket_send_channel_destroy(send);
 }
 
@@ -959,9 +964,8 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p,
     return true;
 }
 
-static bool multifd_channel_connect(MultiFDSendParams *p,
-                                    QIOChannel *ioc,
-                                    Error **errp)
+bool multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc,
+                             Error **errp)
 {
     qio_channel_set_delay(ioc, false);
 
@@ -1031,9 +1035,14 @@ out:
     error_free(local_err);
 }
 
-static void multifd_new_send_channel_create(gpointer opaque)
+static bool multifd_new_send_channel_create(gpointer opaque, Error **errp)
 {
+    if (!multifd_use_packets()) {
+        return file_send_channel_create(opaque, errp);
+    }
+
     socket_send_channel_create(multifd_new_send_channel_async, opaque);
+    return true;
 }
 
 bool multifd_send_setup(void)
@@ -1082,7 +1091,15 @@ bool multifd_send_setup(void)
         p->page_size = qemu_target_page_size();
         p->page_count = page_count;
         p->write_flags = 0;
-        multifd_new_send_channel_create(p);
+
+        if (!multifd_new_send_channel_create(p, &local_err)) {
+            /*
+             * File channel creation is synchronous; we don't need the
+             * semaphore below, so it's safe to return now.
+             */
+            assert(migrate_fixed_ram());
+            return false;
+        }
     }
 
     if (use_packets) {
diff --git a/migration/multifd.h b/migration/multifd.h
index 19188815a3..135f6ed098 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -228,5 +228,7 @@ static inline void multifd_send_prepare_header(MultiFDSendParams *p)
     p->iovs_num++;
 }
 
+bool multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc,
+                             Error **errp);
 
 #endif
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 21/34] migration/multifd: Add incoming QIOChannelFile support
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (19 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 20/34] migration/multifd: Add outgoing QIOChannelFile support Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-26  7:34   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 22/34] migration/multifd: Prepare multifd sync for fixed-ram migration Fabiano Rosas
                   ` (14 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

On the receiving side we don't need to differentiate between the main
channel and the multifd channels, so whichever channel is established
first becomes the main one. And since there are no packets, use the
atomic channel count to index into the params array.
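
A generic sketch of that id-assignment idea, using C11 atomics rather
than QEMU's qatomic helpers (names are illustrative; the patch reads
the running channel count, while this version hands out slots by
incrementing it):

#include <stdatomic.h>

/* Counter of channels established so far. */
static atomic_int recv_channel_count;

/*
 * Without packets there is no id handshake, so each newly connected
 * channel simply takes the next free slot in the params array; the
 * order in which channels arrive does not matter.
 */
static int next_recv_channel_slot(void)
{
    return atomic_fetch_add(&recv_channel_count, 1);
}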

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/file.c      | 34 ++++++++++++++++++++++++++--------
 migration/migration.c |  3 ++-
 migration/multifd.c   |  3 +--
 3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/migration/file.c b/migration/file.c
index ac9f6ae40a..a186dc592a 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -8,6 +8,7 @@
 #include "qemu/osdep.h"
 #include "exec/ramblock.h"
 #include "qemu/cutils.h"
+#include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "channel.h"
 #include "file.h"
@@ -15,6 +16,7 @@
 #include "multifd.h"
 #include "io/channel-file.h"
 #include "io/channel-util.h"
+#include "options.h"
 #include "trace.h"
 
 #define OFFSET_OPTION ",offset="
@@ -111,7 +113,8 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
     g_autofree char *filename = g_strdup(file_args->filename);
     QIOChannelFile *fioc = NULL;
     uint64_t offset = file_args->offset;
-    QIOChannel *ioc;
+    int channels = 1;
+    int i = 0, fd;
 
     trace_migration_file_incoming(filename);
 
@@ -120,13 +123,28 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
         return;
     }
 
-    ioc = QIO_CHANNEL(fioc);
-    if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 0) {
+    if (offset &&
+        qio_channel_io_seek(QIO_CHANNEL(fioc), offset, SEEK_SET, errp) < 0) {
         return;
     }
-    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming");
-    qio_channel_add_watch_full(ioc, G_IO_IN,
-                               file_accept_incoming_migration,
-                               NULL, NULL,
-                               g_main_context_get_thread_default());
+
+    if (migrate_multifd()) {
+        channels += migrate_multifd_channels();
+    }
+
+    fd = fioc->fd;
+
+    do {
+        QIOChannel *ioc = QIO_CHANNEL(fioc);
+
+        qio_channel_set_name(ioc, "migration-file-incoming");
+        qio_channel_add_watch_full(ioc, G_IO_IN,
+                                   file_accept_incoming_migration,
+                                   NULL, NULL,
+                                   g_main_context_get_thread_default());
+    } while (++i < channels && (fioc = qio_channel_file_new_fd(fd)));
+
+    if (!fioc) {
+        error_setg(errp, "Error creating migration incoming channel");
+    }
 }
diff --git a/migration/migration.c b/migration/migration.c
index 16da269847..e2218b9de7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -896,7 +896,8 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
     uint32_t channel_magic = 0;
     int ret = 0;
 
-    if (migrate_multifd() && !migrate_postcopy_ram() &&
+    if (migrate_multifd() && !migrate_fixed_ram() &&
+        !migrate_postcopy_ram() &&
         qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
         /*
          * With multiple channels, it is possible that we receive channels
diff --git a/migration/multifd.c b/migration/multifd.c
index 507b497d52..cb5f4fb3e0 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1520,8 +1520,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
         }
         trace_multifd_recv_new_channel(id);
     } else {
-        /* next patch gives this a meaningful value */
-        id = 0;
+        id = qatomic_read(&multifd_recv_state->count);
     }
 
     p = &multifd_recv_state->params[id];
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 22/34] migration/multifd: Prepare multifd sync for fixed-ram migration
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (20 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 21/34] migration/multifd: Add incoming " Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-26  7:47   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 23/34] migration/multifd: Support outgoing fixed-ram stream format Fabiano Rosas
                   ` (13 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

The fixed-ram migration can be performed live or non-live, but it is
always asynchronous, i.e. the source machine and the destination
machine are not migrating at the same time. We only need some pieces
of the multifd sync operations.

multifd_send_sync_main()
------------------------
  Issued by the ram migration code on the migration thread, causes the
  multifd send channels to synchronize with the migration thread and
  makes the sending side emit a packet with the MULTIFD_FLUSH flag.

  With fixed-ram we want to maintain the sync on the sending side
  because that provides ordering between the rounds of dirty pages when
  migrating live.

MULTIFD_FLUSH
-------------
  On the receiving side, the presence of the MULTIFD_FLUSH flag on a
  packet causes the receiving channels to start synchronizing with the
  main thread.

  We're not using packets with fixed-ram, so there's no MULTIFD_FLUSH
  flag and therefore no channel sync on the receiving side.

multifd_recv_sync_main()
------------------------
  Issued by the migration thread when the ram migration flag
  RAM_SAVE_FLAG_MULTIFD_FLUSH is received, causes the migration thread
  on the receiving side to start synchronizing with the recv
  channels. Due to compatibility, this is also issued when
  RAM_SAVE_FLAG_EOS is received.

  For fixed-ram we only need to synchronize the channels at the end of
  migration to avoid doing cleanup before the channels have finished
  their IO.

Make sure the multifd syncs are only issued at the appropriate
times. Note that due to pre-existing backward compatibility issues, we
have the multifd_flush_after_each_section property that enables an
older behavior of synchronizing channels more frequently (and
inefficiently). Fixed-ram should always run with that property
disabled (default).
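
The receive-side behaviour described under multifd_recv_sync_main()
above amounts to a drain barrier: kick every channel once, then wait
for each one to signal that its outstanding IO is finished. A minimal
sketch with POSIX semaphores (not the QEMU semaphore API; the channel
count and names are made up):

#include <semaphore.h>

#define NCHANNELS 4

/*
 * Per-channel semaphore pair: 'work' wakes the channel thread, 'done'
 * is posted by the thread once its outstanding IO has completed.
 */
static sem_t work[NCHANNELS];
static sem_t done[NCHANNELS];

static void final_recv_sync(void)
{
    for (int i = 0; i < NCHANNELS; i++) {
        sem_post(&work[i]);     /* release channel i with no new work */
    }
    for (int i = 0; i < NCHANNELS; i++) {
        sem_wait(&done[i]);     /* wait for it to drain and acknowledge */
    }
}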

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/ram.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 5932e1b8e1..c7050f6f68 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1369,8 +1369,11 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
                 if (ret < 0) {
                     return ret;
                 }
-                qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
-                qemu_fflush(f);
+
+                if (!migrate_fixed_ram()) {
+                    qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
+                    qemu_fflush(f);
+                }
             }
             /*
              * If memory migration starts over, we will meet a dirtied page
@@ -3112,7 +3115,8 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
         return ret;
     }
 
-    if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) {
+    if (migrate_multifd() && !migrate_multifd_flush_after_each_section()
+        && !migrate_fixed_ram()) {
         qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
     }
 
@@ -4253,6 +4257,15 @@ static int ram_load_precopy(QEMUFile *f)
             break;
         case RAM_SAVE_FLAG_EOS:
             /* normal exit */
+            if (migrate_fixed_ram()) {
+                /*
+                 * The EOS flag appears multiple times on the
+                 * stream. Fixed-ram needs only one sync at the
+                 * end. It will be done on the flush flag above.
+                 */
+                break;
+            }
+
             if (migrate_multifd() &&
                 migrate_multifd_flush_after_each_section()) {
                 multifd_recv_sync_main();
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 23/34] migration/multifd: Support outgoing fixed-ram stream format
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (21 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 22/34] migration/multifd: Prepare multifd sync for fixed-ram migration Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-26  8:08   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 24/34] migration/multifd: Support incoming " Fabiano Rosas
                   ` (12 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

The new fixed-ram stream format uses a file transport and puts ram
pages in the migration file at their respective offsets. This can be
done in parallel by using the pwritev system call, which takes iovecs
and an offset.
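
The offset rule behind "their respective offsets" fits in a few lines;
this is only an illustration, with pages_offset standing for the start
of the ramblock's pages region in the file and the other parameter
names made up:

#include <stdint.h>

/*
 * A page's location in the file is the start of its ramblock's pages
 * region plus the page's offset within the block. The result never
 * changes, no matter how often the page is re-sent, which is what
 * keeps the file bounded and lets several channels write concurrently
 * with pwritev.
 */
static uint64_t page_file_offset(uint64_t pages_offset,
                                 uintptr_t page_host, uintptr_t block_host)
{
    return pages_offset + (uint64_t)(page_host - block_host);
}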

Add support for enabling the new format along with multifd to make use
of the threading and page handling already in place.

This requires multifd to stop sending headers and leave the stream
format to the fixed-ram code. When it comes time to write the data, we
need to call a version of qio_channel_write that can take an offset.

Usage on HMP is:

(qemu) stop
(qemu) migrate_set_capability multifd on
(qemu) migrate_set_capability fixed-ram on
(qemu) migrate_set_parameter max-bandwidth 0
(qemu) migrate_set_parameter multifd-channels 8
(qemu) migrate file:migfile

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 include/qemu/bitops.h | 13 ++++++++++++
 migration/file.c      | 47 +++++++++++++++++++++++++++++++++++++++++++
 migration/file.h      |  2 ++
 migration/migration.c | 12 ++++++-----
 migration/multifd.c   | 24 ++++++++++++++++++++--
 migration/options.c   | 14 +++++++------
 migration/ram.c       | 17 +++++++++++++---
 migration/ram.h       |  1 +
 8 files changed, 114 insertions(+), 16 deletions(-)

diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index cb3526d1f4..2c0a2fe751 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -67,6 +67,19 @@ static inline void clear_bit(long nr, unsigned long *addr)
     *p &= ~mask;
 }
 
+/**
+ * clear_bit_atomic - Clears a bit in memory atomically
+ * @nr: Bit to clear
+ * @addr: Address to start counting from
+ */
+static inline void clear_bit_atomic(long nr, unsigned long *addr)
+{
+    unsigned long mask = BIT_MASK(nr);
+    unsigned long *p = addr + BIT_WORD(nr);
+
+    return qatomic_and(p, ~mask);
+}
+
 /**
  * change_bit - Toggle a bit in memory
  * @nr: Bit to change
diff --git a/migration/file.c b/migration/file.c
index a186dc592a..94e8e08363 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -148,3 +148,50 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
         error_setg(errp, "Error creating migration incoming channel");
     }
 }
+
+int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
+                            int niov, RAMBlock *block, Error **errp)
+{
+    ssize_t ret = -1;
+    int i, slice_idx, slice_num;
+    uintptr_t base, next, offset;
+    size_t len;
+
+    slice_idx = 0;
+    slice_num = 1;
+
+    /*
+     * If the iov array doesn't have contiguous elements, we need to
+     * split it in slices because we only have one file offset for the
+     * whole iov. Do this here so callers don't need to break the iov
+     * array themselves.
+     */
+    for (i = 0; i < niov; i++, slice_num++) {
+        base = (uintptr_t) iov[i].iov_base;
+
+        if (i != niov - 1) {
+            len = iov[i].iov_len;
+            next = (uintptr_t) iov[i + 1].iov_base;
+
+            if (base + len == next) {
+                continue;
+            }
+        }
+
+        /*
+         * Use the offset of the first element of the segment that
+         * we're sending.
+         */
+        offset = (uintptr_t) iov[slice_idx].iov_base - (uintptr_t) block->host;
+        ret = qio_channel_pwritev(ioc, &iov[slice_idx], slice_num,
+                                  block->pages_offset + offset, errp);
+        if (ret < 0) {
+            break;
+        }
+
+        slice_idx += slice_num;
+        slice_num = 0;
+    }
+
+    return (ret < 0) ? -1 : 0;
+}
diff --git a/migration/file.h b/migration/file.h
index 90794b494b..390dcc6821 100644
--- a/migration/file.h
+++ b/migration/file.h
@@ -20,4 +20,6 @@ int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp);
 
 bool file_send_channel_create(gpointer opaque, Error **errp);
 int file_send_channel_destroy(QIOChannel *ioc);
+int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
+                            int niov, RAMBlock *block, Error **errp);
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index e2218b9de7..32b291a282 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -134,12 +134,14 @@ static bool transport_supports_multi_channels(MigrationAddress *addr)
     if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) {
         SocketAddress *saddr = &addr->u.socket;
 
-        return saddr->type == SOCKET_ADDRESS_TYPE_INET ||
-               saddr->type == SOCKET_ADDRESS_TYPE_UNIX ||
-               saddr->type == SOCKET_ADDRESS_TYPE_VSOCK;
+        return (saddr->type == SOCKET_ADDRESS_TYPE_INET ||
+                saddr->type == SOCKET_ADDRESS_TYPE_UNIX ||
+                saddr->type == SOCKET_ADDRESS_TYPE_VSOCK);
+    } else if (addr->transport == MIGRATION_ADDRESS_TYPE_FILE) {
+        return migrate_fixed_ram();
+    } else {
+        return false;
     }
-
-    return false;
 }
 
 static bool migration_needs_seekable_channel(void)
diff --git a/migration/multifd.c b/migration/multifd.c
index cb5f4fb3e0..b251c58ec2 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -105,6 +105,17 @@ static bool multifd_use_packets(void)
     return !migrate_fixed_ram();
 }
 
+static void multifd_set_file_bitmap(MultiFDSendParams *p)
+{
+    MultiFDPages_t *pages = p->pages;
+
+    assert(pages->block);
+
+    for (int i = 0; i < p->pages->num; i++) {
+        ramblock_set_file_bmap_atomic(pages->block, pages->offset[i]);
+    }
+}
+
 /* Multifd without compression */
 
 /**
@@ -181,6 +192,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
                 return -1;
             }
         }
+    } else {
+        multifd_set_file_bitmap(p);
     }
 
     return 0;
@@ -860,8 +873,15 @@ static void *multifd_send_thread(void *opaque)
                 break;
             }
 
-            ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
-                                              0, p->write_flags, &local_err);
+            if (migrate_fixed_ram()) {
+                ret = file_write_ramblock_iov(p->c, p->iov, p->iovs_num,
+                                              p->pages->block, &local_err);
+            } else {
+                ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num,
+                                                  NULL, 0, p->write_flags,
+                                                  &local_err);
+            }
+
             if (ret != 0) {
                 break;
             }
diff --git a/migration/options.c b/migration/options.c
index 4909e5c72a..bfcd2d7132 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -654,12 +654,6 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
     }
 
     if (new_caps[MIGRATION_CAPABILITY_FIXED_RAM]) {
-        if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
-            error_setg(errp,
-                       "Fixed-ram migration is incompatible with multifd");
-            return false;
-        }
-
         if (new_caps[MIGRATION_CAPABILITY_XBZRLE]) {
             error_setg(errp,
                        "Fixed-ram migration is incompatible with xbzrle");
@@ -1252,6 +1246,14 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
     }
 #endif
 
+    if (migrate_fixed_ram() &&
+        ((params->has_multifd_compression && params->multifd_compression) ||
+         (params->tls_creds && *params->tls_creds))) {
+        error_setg(errp,
+                   "Fixed-ram only available for non-compressed non-TLS multifd migration");
+        return false;
+    }
+
     if (params->has_x_vcpu_dirty_limit_period &&
         (params->x_vcpu_dirty_limit_period < 1 ||
          params->x_vcpu_dirty_limit_period > 1000)) {
diff --git a/migration/ram.c b/migration/ram.c
index c7050f6f68..ad540ae9ce 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1149,7 +1149,7 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
 
     if (migrate_fixed_ram()) {
         /* zero pages are not transferred with fixed-ram */
-        clear_bit(offset >> TARGET_PAGE_BITS, pss->block->file_bmap);
+        clear_bit_atomic(offset >> TARGET_PAGE_BITS, pss->block->file_bmap);
         return 1;
     }
 
@@ -2445,8 +2445,6 @@ static void ram_save_cleanup(void *opaque)
         block->clear_bmap = NULL;
         g_free(block->bmap);
         block->bmap = NULL;
-        g_free(block->file_bmap);
-        block->file_bmap = NULL;
     }
 
     xbzrle_cleanup();
@@ -3135,9 +3133,22 @@ static void ram_save_file_bmap(QEMUFile *f)
         qemu_put_buffer_at(f, (uint8_t *)block->file_bmap, bitmap_size,
                            block->bitmap_offset);
         ram_transferred_add(bitmap_size);
+
+        /*
+         * Free the bitmap here to catch any synchronization issues
+         * with multifd channels. No channels should be sending pages
+         * after we've written the bitmap to file.
+         */
+        g_free(block->file_bmap);
+        block->file_bmap = NULL;
     }
 }
 
+void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset)
+{
+    set_bit_atomic(offset >> TARGET_PAGE_BITS, block->file_bmap);
+}
+
 /**
  * ram_save_iterate: iterative stage for migration
  *
diff --git a/migration/ram.h b/migration/ram.h
index 9b937a446b..b9ac0da587 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -75,6 +75,7 @@ bool ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
 bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
 void postcopy_preempt_shutdown_file(MigrationState *s);
 void *postcopy_preempt_thread(void *opaque);
+void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset);
 
 /* ram cache */
 int colo_init_ram_cache(void);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 24/34] migration/multifd: Support incoming fixed-ram stream format
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (22 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 23/34] migration/multifd: Support outgoing fixed-ram stream format Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-26  8:30   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 25/34] migration/multifd: Add fixed-ram support to fd: URI Fabiano Rosas
                   ` (11 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

For the incoming fixed-ram migration we need to read the ramblock
headers, get the pages bitmap and hand the host address of each
non-zero page to a multifd channel so that it can read the page data
from the file directly into guest memory.
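
A simplified standalone sketch of that flow (plain POSIX, one pread
per page for clarity; the real code batches contiguous pages up to a
buffer size and dispatches each chunk to a channel instead of reading
inline, and the names here are illustrative):

#include <stdint.h>
#include <unistd.h>

#define PAGE_SIZE 4096

/*
 * Walk the pages bitmap and read every set page from its fixed offset
 * in the file straight into the destination memory.
 */
static int load_pages(int fd, uint64_t pages_offset, uint8_t *host_base,
                      const unsigned long *bitmap, long num_pages)
{
    const int bits = 8 * sizeof(unsigned long);

    for (long i = 0; i < num_pages; i++) {
        if (!(bitmap[i / bits] & (1UL << (i % bits)))) {
            continue;   /* zero page, never written to the file */
        }
        if (pread(fd, host_base + i * PAGE_SIZE, PAGE_SIZE,
                  pages_offset + (uint64_t)i * PAGE_SIZE) != PAGE_SIZE) {
            return -1;
        }
    }
    return 0;
}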

Usage on HMP is:

(qemu) migrate_set_capability multifd on
(qemu) migrate_set_capability fixed-ram on
(qemu) migrate_incoming file:migfile

(the ram.h include needs to move because we were previously relying on
it being included from migration.c. Now file.h will start including
multifd.h before migration.o is processed.)

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/file.c    | 25 ++++++++++++++++++++++++-
 migration/file.h    |  2 ++
 migration/multifd.c | 34 ++++++++++++++++++++++++++++++----
 migration/multifd.h |  2 ++
 migration/ram.c     | 36 +++++++++++++++++++++++++++++++++---
 5 files changed, 91 insertions(+), 8 deletions(-)

diff --git a/migration/file.c b/migration/file.c
index 94e8e08363..1a18e608fc 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -13,7 +13,6 @@
 #include "channel.h"
 #include "file.h"
 #include "migration.h"
-#include "multifd.h"
 #include "io/channel-file.h"
 #include "io/channel-util.h"
 #include "options.h"
@@ -195,3 +194,27 @@ int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
 
     return (ret < 0) ? -1 : 0;
 }
+
+int multifd_file_recv_data(MultiFDRecvParams *p, Error **errp)
+{
+    MultiFDRecvData *data = p->data;
+    size_t ret;
+    uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
+
+    if (flags != MULTIFD_FLAG_NOCOMP) {
+        error_setg(errp, "multifd %u: flags received %x flags expected %x",
+                   p->id, flags, MULTIFD_FLAG_NOCOMP);
+        return -1;
+    }
+
+    ret = qio_channel_pread(p->c, (char *) data->opaque,
+                            data->size, data->file_offset, errp);
+    if (ret != data->size) {
+        error_prepend(errp,
+                      "multifd recv (%u): read 0x%zx, expected 0x%zx",
+                      p->id, ret, data->size);
+        return -1;
+    }
+
+    return 0;
+}
diff --git a/migration/file.h b/migration/file.h
index 390dcc6821..9fe8af73fc 100644
--- a/migration/file.h
+++ b/migration/file.h
@@ -11,6 +11,7 @@
 #include "qapi/qapi-types-migration.h"
 #include "io/task.h"
 #include "channel.h"
+#include "multifd.h"
 
 void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp);
 
@@ -22,4 +23,5 @@ bool file_send_channel_create(gpointer opaque, Error **errp);
 int file_send_channel_destroy(QIOChannel *ioc);
 int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
                             int niov, RAMBlock *block, Error **errp);
+int multifd_file_recv_data(MultiFDRecvParams *p, Error **errp);
 #endif
diff --git a/migration/multifd.c b/migration/multifd.c
index b251c58ec2..a0202b5661 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -18,7 +18,6 @@
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "file.h"
-#include "ram.h"
 #include "migration.h"
 #include "migration-stats.h"
 #include "socket.h"
@@ -251,9 +250,9 @@ static int nocomp_recv(MultiFDRecvParams *p, Error **errp)
             p->iov[i].iov_len = p->page_size;
         }
         return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
+    } else {
+        return multifd_file_recv_data(p, errp);
     }
-
-    return 0;
 }
 
 static MultiFDMethods multifd_nocomp_ops = {
@@ -1317,13 +1316,40 @@ void multifd_recv_cleanup(void)
     multifd_recv_cleanup_state();
 }
 
+
+/*
+ * Wait until all channels have finished receiving data. Once this
+ * function returns, cleanup routines are safe to run.
+ */
+static void multifd_file_recv_sync(void)
+{
+    int i;
+
+    for (i = 0; i < migrate_multifd_channels(); i++) {
+        MultiFDRecvParams *p = &multifd_recv_state->params[i];
+
+        trace_multifd_recv_sync_main_wait(p->id);
+
+        qemu_sem_post(&p->sem);
+
+        trace_multifd_recv_sync_main_signal(p->id);
+        qemu_sem_wait(&p->sem_sync);
+    }
+    return;
+}
+
 void multifd_recv_sync_main(void)
 {
     int i;
 
-    if (!migrate_multifd() || !multifd_use_packets()) {
+    if (!migrate_multifd()) {
         return;
     }
+
+    if (!multifd_use_packets()) {
+        return multifd_file_recv_sync();
+    }
+
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
diff --git a/migration/multifd.h b/migration/multifd.h
index 135f6ed098..8f89199721 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -13,6 +13,8 @@
 #ifndef QEMU_MIGRATION_MULTIFD_H
 #define QEMU_MIGRATION_MULTIFD_H
 
+#include "ram.h"
+
 typedef struct MultiFDRecvData MultiFDRecvData;
 
 bool multifd_send_setup(void);
diff --git a/migration/ram.c b/migration/ram.c
index ad540ae9ce..826ac745a0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -111,6 +111,7 @@
  * pages region in the migration file at a time.
  */
 #define FIXED_RAM_LOAD_BUF_SIZE 0x100000
+#define FIXED_RAM_MULTIFD_LOAD_BUF_SIZE 0x100000
 
 XBZRLECacheStats xbzrle_counters;
 
@@ -3950,6 +3951,27 @@ void colo_flush_ram_cache(void)
     trace_colo_flush_ram_cache_end();
 }
 
+static size_t ram_load_multifd_pages(void *host_addr, size_t size,
+                                     uint64_t offset)
+{
+    MultiFDRecvData *data = multifd_get_recv_data();
+
+    /*
+     * Pointing the opaque directly to the host buffer, no
+     * preprocessing needed.
+     */
+    data->opaque = host_addr;
+
+    data->file_offset = offset;
+    data->size = size;
+
+    if (!multifd_recv()) {
+        return 0;
+    }
+
+    return size;
+}
+
 static bool read_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
                                     long num_pages, unsigned long *bitmap,
                                     Error **errp)
@@ -3959,6 +3981,8 @@ static bool read_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
     ram_addr_t offset;
     void *host;
     size_t read, unread, size;
+    size_t buf_size = (migrate_multifd() ? FIXED_RAM_MULTIFD_LOAD_BUF_SIZE :
+                       FIXED_RAM_LOAD_BUF_SIZE);
 
     for (set_bit_idx = find_first_bit(bitmap, num_pages);
          set_bit_idx < num_pages;
@@ -3977,10 +4001,16 @@ static bool read_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
                 return false;
             }
 
-            size = MIN(unread, FIXED_RAM_LOAD_BUF_SIZE);
+            size = MIN(unread, buf_size);
+
+            if (migrate_multifd()) {
+                read = ram_load_multifd_pages(host, size,
+                                              block->pages_offset + offset);
+            } else {
+                read = qemu_get_buffer_at(f, host, size,
+                                          block->pages_offset + offset);
+            }
 
-            read = qemu_get_buffer_at(f, host, size,
-                                      block->pages_offset + offset);
             if (!read) {
                 goto err;
             }
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 25/34] migration/multifd: Add fixed-ram support to fd: URI
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (23 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 24/34] migration/multifd: Support incoming " Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-26  8:37   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 26/34] tests/qtest/migration: Add a multifd + fixed-ram migration test Fabiano Rosas
                   ` (10 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

If we receive a file descriptor that points to a regular file, there's
nothing stopping us from doing multifd migration with fixed-ram to
that file.
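
As an illustration of the "regular file" condition, here is a
standalone sketch; it is not the check QEMU itself performs, just the
property the change relies on:

#include <stdbool.h>
#include <sys/stat.h>

/*
 * An fd is usable for fixed-ram only if it refers to a regular, and
 * therefore seekable, file. Pipes and sockets would not do.
 */
static bool fd_is_regular_file(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0) {
        return false;
    }
    return S_ISREG(st.st_mode);
}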

Enable the fd: URI to work with multifd + fixed-ram.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/fd.c        | 30 ++++++++++++++++++++++++++++++
 migration/fd.h        |  1 +
 migration/file.c      | 12 +++++++++---
 migration/migration.c |  4 ++++
 4 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/migration/fd.c b/migration/fd.c
index 0eb677dcae..b7e4d071a4 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -19,14 +19,28 @@
 #include "fd.h"
 #include "migration.h"
 #include "monitor/monitor.h"
+#include "io/channel-file.h"
 #include "io/channel-util.h"
+#include "options.h"
 #include "trace.h"
 
 
+static struct FdOutgoingArgs {
+    int fd;
+} outgoing_args;
+
+int fd_args_get_fd(void)
+{
+    return outgoing_args.fd;
+}
+
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
 {
     QIOChannel *ioc;
     int fd = monitor_get_fd(monitor_cur(), fdname, errp);
+
+    outgoing_args.fd = -1;
+
     if (fd == -1) {
         return;
     }
@@ -38,6 +52,8 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
         return;
     }
 
+    outgoing_args.fd = fd;
+
     qio_channel_set_name(ioc, "migration-fd-outgoing");
     migration_channel_connect(s, ioc, NULL, NULL);
     object_unref(OBJECT(ioc));
@@ -73,4 +89,18 @@ void fd_start_incoming_migration(const char *fdname, Error **errp)
                                fd_accept_incoming_migration,
                                NULL, NULL,
                                g_main_context_get_thread_default());
+
+    if (migrate_multifd()) {
+        int channels = migrate_multifd_channels();
+
+        while (channels--) {
+            ioc = QIO_CHANNEL(qio_channel_file_new_fd(fd));
+
+            qio_channel_set_name(ioc, "migration-fd-incoming");
+            qio_channel_add_watch_full(ioc, G_IO_IN,
+                                       fd_accept_incoming_migration,
+                                       NULL, NULL,
+                                       g_main_context_get_thread_default());
+        }
+    }
 }
diff --git a/migration/fd.h b/migration/fd.h
index b901bc014e..1be980c130 100644
--- a/migration/fd.h
+++ b/migration/fd.h
@@ -20,4 +20,5 @@ void fd_start_incoming_migration(const char *fdname, Error **errp);
 
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
                                  Error **errp);
+int fd_args_get_fd(void);
 #endif
diff --git a/migration/file.c b/migration/file.c
index 1a18e608fc..27ccfc6a1d 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -11,6 +11,7 @@
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "channel.h"
+#include "fd.h"
 #include "file.h"
 #include "migration.h"
 #include "io/channel-file.h"
@@ -58,10 +59,15 @@ bool file_send_channel_create(gpointer opaque, Error **errp)
 {
     QIOChannelFile *ioc;
     int flags = O_WRONLY;
+    int fd = fd_args_get_fd();
 
-    ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
-    if (!ioc) {
-        return false;
+    if (fd && fd != -1) {
+        ioc = qio_channel_file_new_fd(fd);
+    } else {
+        ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
+        if (!ioc) {
+            return false;
+        }
     }
 
     if (!multifd_channel_connect(opaque, QIO_CHANNEL(ioc), errp)) {
diff --git a/migration/migration.c b/migration/migration.c
index 32b291a282..ce7e6f5065 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -134,6 +134,10 @@ static bool transport_supports_multi_channels(MigrationAddress *addr)
     if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) {
         SocketAddress *saddr = &addr->u.socket;
 
+        if (saddr->type == SOCKET_ADDRESS_TYPE_FD) {
+            return migrate_fixed_ram();
+        }
+
         return (saddr->type == SOCKET_ADDRESS_TYPE_INET ||
                 saddr->type == SOCKET_ADDRESS_TYPE_UNIX ||
                 saddr->type == SOCKET_ADDRESS_TYPE_VSOCK);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 26/34] tests/qtest/migration: Add a multifd + fixed-ram migration test
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (24 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 25/34] migration/multifd: Add fixed-ram support to fd: URI Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-26  8:42   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 27/34] migration: Add direct-io parameter Fabiano Rosas
                   ` (9 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: berrange, armbru, Peter Xu, Claudio Fontana, Thomas Huth,
	Laurent Vivier, Paolo Bonzini

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration-test.c | 68 ++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index d61f93b151..cb9f16f78e 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2248,6 +2248,46 @@ static void test_precopy_file_fixed_ram(void)
     test_file_common(&args, true);
 }
 
+static void *migrate_multifd_fixed_ram_start(QTestState *from, QTestState *to)
+{
+    migrate_fixed_ram_start(from, to);
+
+    migrate_set_parameter_int(from, "multifd-channels", 4);
+    migrate_set_parameter_int(to, "multifd-channels", 4);
+
+    migrate_set_capability(from, "multifd", true);
+    migrate_set_capability(to, "multifd", true);
+
+    return NULL;
+}
+
+static void test_multifd_file_fixed_ram_live(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+                                           FILE_TEST_FILENAME);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+        .start_hook = migrate_multifd_fixed_ram_start,
+    };
+
+    test_file_common(&args, false);
+}
+
+static void test_multifd_file_fixed_ram(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+                                           FILE_TEST_FILENAME);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+        .start_hook = migrate_multifd_fixed_ram_start,
+    };
+
+    test_file_common(&args, true);
+}
+
+
 static void test_precopy_tcp_plain(void)
 {
     MigrateCommon args = {
@@ -2524,6 +2564,25 @@ static void test_migrate_precopy_fd_file_fixed_ram(void)
     };
     test_file_common(&args, true);
 }
+
+static void *migrate_multifd_fd_fixed_ram_start(QTestState *from,
+                                                QTestState *to)
+{
+    migrate_multifd_fixed_ram_start(from, to);
+    return migrate_precopy_fd_file_start(from, to);
+}
+
+static void test_multifd_fd_fixed_ram(void)
+{
+    MigrateCommon args = {
+        .connect_uri = "fd:fd-mig",
+        .listen_uri = "defer",
+        .start_hook = migrate_multifd_fd_fixed_ram_start,
+        .finish_hook = test_migrate_fd_finish_hook
+    };
+
+    test_file_common(&args, true);
+}
 #endif /* _WIN32 */
 
 static void do_test_validate_uuid(MigrateStart *args, bool should_fail)
@@ -3566,6 +3625,15 @@ int main(int argc, char **argv)
     migration_test_add("/migration/precopy/file/fixed-ram/live",
                        test_precopy_file_fixed_ram_live);
 
+    migration_test_add("/migration/multifd/file/fixed-ram",
+                       test_multifd_file_fixed_ram);
+    migration_test_add("/migration/multifd/file/fixed-ram/live",
+                       test_multifd_file_fixed_ram_live);
+#ifndef _WIN32
+    migration_test_add("/migration/multifd/fd/fixed-ram",
+                       test_multifd_fd_fixed_ram);
+#endif
+
 #ifdef CONFIG_GNUTLS
     migration_test_add("/migration/precopy/unix/tls/psk",
                        test_precopy_unix_tls_psk);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 27/34] migration: Add direct-io parameter
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (25 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 26/34] tests/qtest/migration: Add a multifd + fixed-ram migration test Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-21  9:17   ` Markus Armbruster
  2024-02-26  8:50   ` Peter Xu
  2024-02-20 22:41 ` [PATCH v4 28/34] migration/multifd: Add direct-io support Fabiano Rosas
                   ` (8 subsequent siblings)
  35 siblings, 2 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana, Eric Blake

Add the direct-io migration parameter that tells the migration code to
use O_DIRECT when opening the migration stream file whenever possible.

This is currently only used with the fixed-ram migration, which
guarantees that the ram page writes are aligned as required by
O_DIRECT.
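
What "aligned writes" means in practice is shown by this standalone
sketch (plain POSIX plus Linux O_DIRECT; file name and sizes are made
up):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
#ifdef O_DIRECT
    const size_t align = 4096;
    void *buf = NULL;
    int fd = open("pages.bin",
                  O_CREAT | O_TRUNC | O_WRONLY | O_DIRECT, 0660);

    if (fd < 0 || posix_memalign(&buf, align, align) != 0) {
        return 1;
    }
    memset(buf, 0xaa, align);

    /*
     * Aligned buffer, aligned length, aligned offset: acceptable with
     * O_DIRECT. An unaligned write here would typically fail with
     * EINVAL.
     */
    if (pwrite(fd, buf, align, 0) != (ssize_t)align) {
        return 1;
    }

    free(buf);
    close(fd);
#endif
    return 0;
}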

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 include/qemu/osdep.h           |  2 ++
 migration/migration-hmp-cmds.c | 11 +++++++++++
 migration/options.c            | 30 ++++++++++++++++++++++++++++++
 qapi/migration.json            | 18 +++++++++++++++---
 4 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index c7053cdc2b..645c14a65d 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
 bool qemu_has_ofd_lock(void);
 #endif
 
+bool qemu_has_direct_io(void);
+
 #if defined(__HAIKU__) && defined(__i386__)
 #define FMT_pid "%ld"
 #elif defined(WIN64)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 99b49df5dd..77313346c2 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -392,6 +392,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, "%s: %s\n",
             MigrationParameter_str(MIGRATION_PARAMETER_MODE),
             qapi_enum_lookup(&MigMode_lookup, params->mode));
+
+        if (params->has_direct_io) {
+            monitor_printf(mon, "%s: %s\n",
+                           MigrationParameter_str(
+                               MIGRATION_PARAMETER_DIRECT_IO),
+                           params->direct_io ? "on" : "off");
+        }
     }
 
     qapi_free_MigrationParameters(params);
@@ -681,6 +688,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         p->has_mode = true;
         visit_type_MigMode(v, param, &p->mode, &err);
         break;
+    case MIGRATION_PARAMETER_DIRECT_IO:
+        p->has_direct_io = true;
+        visit_type_bool(v, param, &p->direct_io, &err);
+        break;
     default:
         assert(0);
     }
diff --git a/migration/options.c b/migration/options.c
index bfcd2d7132..b347dbc670 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -823,6 +823,22 @@ int migrate_decompress_threads(void)
     return s->parameters.decompress_threads;
 }
 
+bool migrate_direct_io(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    /* For now O_DIRECT is only supported with fixed-ram */
+    if (!s->capabilities[MIGRATION_CAPABILITY_FIXED_RAM]) {
+        return false;
+    }
+
+    if (s->parameters.has_direct_io) {
+        return s->parameters.direct_io;
+    }
+
+    return false;
+}
+
 uint64_t migrate_downtime_limit(void)
 {
     MigrationState *s = migrate_get_current();
@@ -1042,6 +1058,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->has_mode = true;
     params->mode = s->parameters.mode;
 
+    if (s->parameters.has_direct_io) {
+        params->has_direct_io = true;
+        params->direct_io = s->parameters.direct_io;
+    }
+
     return params;
 }
 
@@ -1077,6 +1098,7 @@ void migrate_params_init(MigrationParameters *params)
     params->has_x_vcpu_dirty_limit_period = true;
     params->has_vcpu_dirty_limit = true;
     params->has_mode = true;
+    params->has_direct_io = qemu_has_direct_io();
 }
 
 /*
@@ -1386,6 +1408,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_mode) {
         dest->mode = params->mode;
     }
+
+    if (params->has_direct_io) {
+        dest->direct_io = params->direct_io;
+    }
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1530,6 +1556,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
     if (params->has_mode) {
         s->parameters.mode = params->mode;
     }
+
+    if (params->has_direct_io) {
+        s->parameters.direct_io = params->direct_io;
+    }
 }
 
 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
diff --git a/qapi/migration.json b/qapi/migration.json
index 3fce5fe53e..41241a2178 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -878,6 +878,9 @@
 # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
 #        (Since 8.2)
 #
+# @direct-io: Open migration files with O_DIRECT when possible. This
+#     requires that the 'fixed-ram' capability is enabled. (since 9.0)
+#
 # Features:
 #
 # @deprecated: Member @block-incremental is deprecated.  Use
@@ -911,7 +914,8 @@
            'block-bitmap-mapping',
            { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
            'vcpu-dirty-limit',
-           'mode'] }
+           'mode',
+           'direct-io'] }
 
 ##
 # @MigrateSetParameters:
@@ -1070,6 +1074,9 @@
 # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
 #        (Since 8.2)
 #
+# @direct-io: Open migration files with O_DIRECT when possible. This
+#     requires that the 'fixed-ram' capability is enabled. (since 9.0)
+#
 # Features:
 #
 # @deprecated: Member @block-incremental is deprecated.  Use
@@ -1123,7 +1130,8 @@
             '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
                                             'features': [ 'unstable' ] },
             '*vcpu-dirty-limit': 'uint64',
-            '*mode': 'MigMode'} }
+            '*mode': 'MigMode',
+            '*direct-io': 'bool' } }
 
 ##
 # @migrate-set-parameters:
@@ -1298,6 +1306,9 @@
 # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
 #        (Since 8.2)
 #
+# @direct-io: Open migration files with O_DIRECT when possible. This
+#     requires that the 'fixed-ram' capability is enabled. (since 9.0)
+#
 # Features:
 #
 # @deprecated: Member @block-incremental is deprecated.  Use
@@ -1348,7 +1359,8 @@
             '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
                                             'features': [ 'unstable' ] },
             '*vcpu-dirty-limit': 'uint64',
-            '*mode': 'MigMode'} }
+            '*mode': 'MigMode',
+            '*direct-io': 'bool' } }
 
 ##
 # @query-migrate-parameters:
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 28/34] migration/multifd: Add direct-io support
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (26 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 27/34] migration: Add direct-io parameter Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 29/34] tests/qtest/migration: Add tests for file migration with direct-io Fabiano Rosas
                   ` (7 subsequent siblings)
  35 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/file.c      | 18 +++++++++++++++++-
 migration/migration.c | 24 ++++++++++++++++++++++++
 migration/options.h   |  1 +
 util/osdep.c          |  9 +++++++++
 4 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/migration/file.c b/migration/file.c
index 27ccfc6a1d..f1c7615fb6 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -57,10 +57,26 @@ int file_send_channel_destroy(QIOChannel *ioc)
 
 bool file_send_channel_create(gpointer opaque, Error **errp)
 {
-    QIOChannelFile *ioc;
+    QIOChannelFile *ioc = NULL;
     int flags = O_WRONLY;
     int fd = fd_args_get_fd();
 
+    if (migrate_direct_io()) {
+#ifdef O_DIRECT
+        /*
+         * Enable O_DIRECT for the secondary channels. These are used
+         * for sending ram pages and writes should be guaranteed to be
+         * aligned to at least page size.
+         */
+        flags |= O_DIRECT;
+#else
+        error_setg(errp, "System does not support O_DIRECT");
+        error_append_hint(errp,
+                          "Try disabling direct-io migration capability\n");
+        return false;
+#endif
+    }
+
     if (fd && fd != -1) {
         ioc = qio_channel_file_new_fd(fd);
     } else {
diff --git a/migration/migration.c b/migration/migration.c
index ce7e6f5065..ecc07c4847 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -153,6 +153,16 @@ static bool migration_needs_seekable_channel(void)
     return migrate_fixed_ram();
 }
 
+static bool migration_needs_multiple_fds(void)
+{
+    /*
+     * When doing direct-io, multifd requires two different,
+     * non-duplicated file descriptors so we can use one of them for
+     * unaligned IO.
+     */
+    return migrate_multifd() && migrate_direct_io();
+}
+
 static bool transport_supports_seeking(MigrationAddress *addr)
 {
     if (addr->transport == MIGRATION_ADDRESS_TYPE_FILE) {
@@ -171,6 +181,12 @@ static bool transport_supports_seeking(MigrationAddress *addr)
     return false;
 }
 
+static bool transport_supports_multiple_fds(MigrationAddress *addr)
+{
+    /* file: works because QEMU can open it multiple times */
+    return addr->transport == MIGRATION_ADDRESS_TYPE_FILE;
+}
+
 static bool
 migration_channels_and_transport_compatible(MigrationAddress *addr,
                                             Error **errp)
@@ -187,6 +203,14 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
         return false;
     }
 
+    if (migration_needs_multiple_fds() &&
+        !transport_supports_multiple_fds(addr)) {
+        error_setg(errp,
+                   "Migration with direct-io is incompatible with the fd: URI,"
+                   " use file: instead");
+        return false;
+    }
+
     return true;
 }
 
diff --git a/migration/options.h b/migration/options.h
index 8680a10b79..39cbc171f7 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -79,6 +79,7 @@ uint8_t migrate_cpu_throttle_increment(void);
 uint8_t migrate_cpu_throttle_initial(void);
 bool migrate_cpu_throttle_tailslow(void);
 int migrate_decompress_threads(void);
+bool migrate_direct_io(void);
 uint64_t migrate_downtime_limit(void);
 uint8_t migrate_max_cpu_throttle(void);
 uint64_t migrate_max_bandwidth(void);
diff --git a/util/osdep.c b/util/osdep.c
index e996c4744a..d0227a60ab 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -277,6 +277,15 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive)
 }
 #endif
 
+bool qemu_has_direct_io(void)
+{
+#ifdef O_DIRECT
+    return true;
+#else
+    return false;
+#endif
+}
+
 static int qemu_open_cloexec(const char *name, int flags, mode_t mode)
 {
     int ret;
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 29/34] tests/qtest/migration: Add tests for file migration with direct-io
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (27 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 28/34] migration/multifd: Add direct-io support Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 30/34] monitor: Honor QMP request for fd removal immediately Fabiano Rosas
                   ` (6 subsequent siblings)
  35 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: berrange, armbru, Peter Xu, Claudio Fontana, Thomas Huth,
	Laurent Vivier, Paolo Bonzini

The tests are only allowed to run on systems that know about O_DIRECT
and on filesystems which support it.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration-helpers.c | 39 ++++++++++++++++++++++++++++++
 tests/qtest/migration-helpers.h |  1 +
 tests/qtest/migration-test.c    | 42 +++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index e451dbdbed..4ae43db1ca 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -323,3 +323,42 @@ void migration_test_add(const char *path, void (*fn)(void))
     qtest_add_data_func_full(path, test, migration_test_wrapper,
                              migration_test_destroy);
 }
+
+#ifdef O_DIRECT
+/*
+ * Probe for O_DIRECT support on the filesystem. Since this is used
+ * for tests, be conservative, if anything fails, assume it's
+ * unsupported.
+ */
+bool probe_o_direct_support(const char *tmpfs)
+{
+    g_autofree char *filename = g_strdup_printf("%s/probe-o-direct", tmpfs);
+    int fd, flags = O_CREAT | O_RDWR | O_DIRECT;
+    void *buf;
+    ssize_t ret, len;
+    uint64_t offset;
+
+    fd = open(filename, flags, 0660);
+    if (fd < 0) {
+        unlink(filename);
+        return false;
+    }
+
+    /*
+     * Assuming 4k should be enough to satisfy O_DIRECT alignment
+     * requirements. The migration code uses 1M to be conservative.
+     */
+    len = 0x100000;
+    offset = 0x100000;
+
+    buf = g_malloc0(len);
+    ret = pwrite(fd, buf, len, offset);
+    unlink(filename);
+
+    if (ret < 0) {
+        return false;
+    }
+
+    return true;
+}
+#endif
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 3bf7ded1b9..d4d641899a 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -52,5 +52,6 @@ char *find_common_machine_version(const char *mtype, const char *var1,
                                   const char *var2);
 char *resolve_machine_version(const char *alias, const char *var1,
                               const char *var2);
+bool probe_o_direct_support(const char *tmpfs);
 void migration_test_add(const char *path, void (*fn)(void));
 #endif /* MIGRATION_HELPERS_H */
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index cb9f16f78e..0931ba18df 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2287,6 +2287,43 @@ static void test_multifd_file_fixed_ram(void)
     test_file_common(&args, true);
 }
 
+#ifdef O_DIRECT
+static void *migrate_fixed_ram_dio_start(QTestState *from,
+                                                 QTestState *to)
+{
+    migrate_fixed_ram_start(from, to);
+    migrate_set_parameter_bool(from, "direct-io", true);
+    migrate_set_parameter_bool(to, "direct-io", true);
+
+    return NULL;
+}
+
+static void *migrate_multifd_fixed_ram_dio_start(QTestState *from,
+                                                 QTestState *to)
+{
+    migrate_multifd_fixed_ram_start(from, to);
+    return migrate_fixed_ram_dio_start(from, to);
+}
+
+static void test_multifd_file_fixed_ram_dio(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+                                           FILE_TEST_FILENAME);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+        .start_hook = migrate_multifd_fixed_ram_dio_start,
+    };
+
+    if (!probe_o_direct_support(tmpfs)) {
+        g_test_skip("Filesystem does not support O_DIRECT");
+        return;
+    }
+
+    test_file_common(&args, true);
+}
+
+#endif /* O_DIRECT */
 
 static void test_precopy_tcp_plain(void)
 {
@@ -3634,6 +3671,11 @@ int main(int argc, char **argv)
                        test_multifd_fd_fixed_ram);
 #endif
 
+#ifdef O_DIRECT
+    migration_test_add("/migration/multifd/file/fixed-ram/dio",
+                       test_multifd_file_fixed_ram_dio);
+#endif
+
 #ifdef CONFIG_GNUTLS
     migration_test_add("/migration/precopy/unix/tls/psk",
                        test_precopy_unix_tls_psk);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 30/34] monitor: Honor QMP request for fd removal immediately
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (28 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 29/34] tests/qtest/migration: Add tests for file migration with direct-io Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-21  9:20   ` Markus Armbruster
  2024-02-20 22:41 ` [PATCH v4 31/34] monitor: Extract fdset fd flags comparison into a function Fabiano Rosas
                   ` (5 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

We're currently only removing an fd from the fdset if the VM is
running. This causes a QMP call to "remove-fd" to not actually remove
the fd if the VM happens to be stopped.

While the fd would eventually be removed when monitor_fdset_cleanup()
is called again, the user request should be honored and the fd
actually removed. Calling remove-fd + query-fdset shows a recently
removed fd still present.

The runstate_is_running() check was introduced by commit ebe52b592d
("monitor: Prevent removing fd from set during init"), which, judging
by the shortlog, was trying to avoid removing a yet-unduplicated fd
too early.

I don't see why an fd explicitly removed with qmp_remove_fd() should
be gated on runstate_is_running(). I'm assuming this was a mistake
made when adding the parentheses around the expression.

Move the runstate_is_running() check to apply only to the
QLIST_EMPTY(dup_fds) side of the expression and ignore it when
mon_fdset_fd->removed has been explicitly set.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 monitor/fds.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/monitor/fds.c b/monitor/fds.c
index d86c2c674c..4ec3b7eea9 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -173,9 +173,9 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
     MonFdsetFd *mon_fdset_fd_next;
 
     QLIST_FOREACH_SAFE(mon_fdset_fd, &mon_fdset->fds, next, mon_fdset_fd_next) {
-        if ((mon_fdset_fd->removed ||
-                (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0)) &&
-                runstate_is_running()) {
+        if (mon_fdset_fd->removed ||
+            (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0 &&
+             runstate_is_running())) {
             close(mon_fdset_fd->fd);
             g_free(mon_fdset_fd->opaque);
             QLIST_REMOVE(mon_fdset_fd, next);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 31/34] monitor: Extract fdset fd flags comparison into a function
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (29 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 30/34] monitor: Honor QMP request for fd removal immediately Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 32/34] monitor: fdset: Match against O_DIRECT Fabiano Rosas
                   ` (4 subsequent siblings)
  35 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

We're about to add one more condition to the flags comparison that
requires an ifdef. Move the code into a separate function now to make
it cleaner after the next patch.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 monitor/fds.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/monitor/fds.c b/monitor/fds.c
index 4ec3b7eea9..9a28e4b72b 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -406,6 +406,19 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
     return fdinfo;
 }
 
+#ifndef _WIN32
+static bool monitor_fdset_flags_match(int flags, int fd_flags)
+{
+    bool match = false;
+
+    if ((flags & O_ACCMODE) == (fd_flags & O_ACCMODE)) {
+        match = true;
+    }
+
+    return match;
+}
+#endif
+
 int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 {
 #ifdef _WIN32
@@ -431,7 +444,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
                 return -1;
             }
 
-            if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
+            if (monitor_fdset_flags_match(flags, mon_fd_flags)) {
                 fd = mon_fdset_fd->fd;
                 break;
             }
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 32/34] monitor: fdset: Match against O_DIRECT
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (30 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 31/34] monitor: Extract fdset fd flags comparison into a function Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-21  9:27   ` Markus Armbruster
  2024-02-20 22:41 ` [PATCH v4 33/34] migration: Add support for fdset with multifd + file Fabiano Rosas
                   ` (3 subsequent siblings)
  35 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

We're about to enable the use of O_DIRECT in the migration code and
due to the alignment restrictions imposed by filesystems we need to
make sure the flag is only used when doing aligned IO.

The migration will do parallel IO to different regions of a file, so
we need to use more than one file descriptor. Those cannot be obtained
by duplicating (dup()) since duplicated file descriptors share the
file status flags, including O_DIRECT. If one migration channel does
unaligned IO while another sets O_DIRECT to do aligned IO, the
filesystem would fail the unaligned operation.
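
As a minimal sketch of that dup() behaviour (not part of this patch;
"migfile" is just a placeholder path):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    static void dup_shares_o_direct(void)
    {
        int fd1 = open("migfile", O_WRONLY | O_CREAT, 0600);
        int fd2 = dup(fd1);   /* same open file description as fd1 */

        /* enabling O_DIRECT through fd2 ... */
        fcntl(fd2, F_SETFL, fcntl(fd2, F_GETFL) | O_DIRECT);

        /* ... is immediately visible through fd1 as well, so an
         * unaligned write on fd1 would now typically fail; only two
         * independent open() calls give truly independent flags */
        close(fd2);
        close(fd1);
    }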

The add-fd QMP command along with the fdset code are specifically
designed to allow the user to pass a set of file descriptors with
different access flags into QEMU to be later fetched by code that
needs to alternate between those flags when doing IO.

Extend the fdset matching function to behave the same with the
O_DIRECT flag.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 monitor/fds.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/monitor/fds.c b/monitor/fds.c
index 9a28e4b72b..42bf3eb982 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -413,6 +413,12 @@ static bool monitor_fdset_flags_match(int flags, int fd_flags)
 
     if ((flags & O_ACCMODE) == (fd_flags & O_ACCMODE)) {
         match = true;
+
+#ifdef O_DIRECT
+        if ((flags & O_DIRECT) != (fd_flags & O_DIRECT)) {
+            match = false;
+        }
+#endif
     }
 
     return match;
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 33/34] migration: Add support for fdset with multifd + file
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (31 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 32/34] monitor: fdset: Match against O_DIRECT Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-20 22:41 ` [PATCH v4 34/34] tests/qtest/migration: Add a test for fixed-ram with passing of fds Fabiano Rosas
                   ` (2 subsequent siblings)
  35 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana

Allow multifd to use an fdset when migrating to a file. This is useful
for the scenario where the management layer wants to have control over
the migration file.

By receiving the file descriptors directly, QEMU can delegate some
high level operating system operations to the management layer (such
as mandatory access control). The management layer might also want to
add its own headers before the migration stream.

Enable the "file:/dev/fdset/#" syntax for the multifd migration with
fixed-ram. The requirements for the fdset mechanism are as follows (a
sketch of a matching source-side setup follows the lists):

On the migration source side:

- the fdset must contain two fds that are not duplicates between
  themselves;
- if direct-io is to be used, exactly one of the fds must have the
  O_DIRECT flag set;
- the file must be opened with WRONLY both times.

On the migration destination side:

- the fdset must contain one fd;
- the file must be opened with RDONLY.
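
A minimal sketch (not part of this patch; the path, mode and use of
O_DIRECT are placeholders) of a source-side setup that satisfies the
requirements above, with both descriptors then handed to QEMU via two
add-fd commands on the same fdset:

    #define _GNU_SOURCE
    #include <fcntl.h>

    static void open_outgoing_fds(int fds[2])
    {
        /* main channel: buffered, unaligned writes are fine */
        fds[0] = open("/var/lib/vm.mig", O_CREAT | O_WRONLY, 0600);

        /* secondary channels: aligned writes only, hence O_DIRECT */
        fds[1] = open("/var/lib/vm.mig",
                      O_CREAT | O_WRONLY | O_DIRECT, 0600);
    }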

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 docs/devel/migration/main.rst | 18 +++++++
 migration/file.c              | 97 +++++++++++++++++++++++++++++++++--
 2 files changed, 111 insertions(+), 4 deletions(-)

diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
index 8024275d6d..ac6b6a8eb0 100644
--- a/docs/devel/migration/main.rst
+++ b/docs/devel/migration/main.rst
@@ -46,6 +46,24 @@ over any transport.
   application to add its own metadata to the start of the file without
   QEMU interference.
 
+  The file migration also supports using a file that has already been
+  opened. A set of file descriptors is passed to QEMU via an "fdset"
+  (see add-fd QMP command documentation). This method allows a
+  management application to have control over the migration file
+  opening operation. There are, however, strict requirements to this
+  interface:
+
+  On the migration source side:
+    - the fdset must contain two file descriptors that are not
+      duplicates between themselves;
+    - if the direct-io capability is to be used, exactly one of the
+      file descriptors must have the O_DIRECT flag set;
+    - the file must be opened with WRONLY both times.
+
+  On the migration destination side:
+    - the fdset must contain one file descriptor;
+    - the file must be opened with RDONLY.
+
 In addition, support is included for migration using RDMA, which
 transports the page data using ``RDMA``, where the hardware takes care of
 transporting the pages, and the load on the CPU is much lower.  While the
diff --git a/migration/file.c b/migration/file.c
index f1c7615fb6..95f3210faf 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -10,12 +10,14 @@
 #include "qemu/cutils.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "qapi/qapi-commands-misc.h"
 #include "channel.h"
 #include "fd.h"
 #include "file.h"
 #include "migration.h"
 #include "io/channel-file.h"
 #include "io/channel-util.h"
+#include "monitor/monitor.h"
 #include "options.h"
 #include "trace.h"
 
@@ -23,6 +25,7 @@
 
 static struct FileOutgoingArgs {
     char *fname;
+    int64_t fdset_id;
 } outgoing_args;
 
 /* Remove the offset option from @filespec and return it in @offsetp. */
@@ -44,6 +47,84 @@ int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
     return 0;
 }
 
+/*
+ * If the open flags and file status flags from the file descriptors
+ * in the fdset don't match what QEMU expects, errno gets set to
+ * EACCES. Let's provide a more user-friendly message.
+ */
+static void file_fdset_error(int flags, Error **errp)
+{
+    ERRP_GUARD();
+
+    if (errno == EACCES) {
+        /* ditch the previous error */
+        error_free(*errp);
+        *errp = NULL;
+
+        error_setg(errp, "Fdset is missing a file descriptor with flags: 0x%x",
+                   flags);
+    }
+}
+
+static void file_remove_fdset(void)
+{
+    if (outgoing_args.fdset_id != -1) {
+        qmp_remove_fd(outgoing_args.fdset_id, false, -1, NULL);
+        outgoing_args.fdset_id = -1;
+    }
+}
+
+/*
+ * Due to the behavior of the dup() system call, we need the fdset to
+ * have two non-duplicate fds so we can enable direct IO in the
+ * secondary channels without affecting the main channel.
+ */
+static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
+                             Error **errp)
+{
+    FdsetInfoList *fds_info;
+    FdsetFdInfoList *fd_info;
+    const char *fdset_id_str;
+    int nfds = 0;
+
+    *fdset_id = -1;
+
+    if (!strstart(filename, "/dev/fdset/", &fdset_id_str)) {
+        return true;
+    }
+
+    if (!migrate_multifd()) {
+        error_setg(errp, "fdset is only supported with multifd");
+        return false;
+    }
+
+    *fdset_id = qemu_parse_fd(fdset_id_str);
+
+    for (fds_info = qmp_query_fdsets(NULL); fds_info;
+         fds_info = fds_info->next) {
+
+        if (*fdset_id != fds_info->value->fdset_id) {
+            continue;
+        }
+
+        for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) {
+            if (nfds++ > 2) {
+                break;
+            }
+        }
+    }
+
+    if (nfds != 2) {
+        error_setg(errp, "Outgoing migration needs two fds in the fdset, "
+                   "got %d", nfds);
+        qmp_remove_fd(*fdset_id, false, -1, NULL);
+        *fdset_id = -1;
+        return false;
+    }
+
+    return true;
+}
+
 int file_send_channel_destroy(QIOChannel *ioc)
 {
     if (ioc) {
@@ -52,6 +133,7 @@ int file_send_channel_destroy(QIOChannel *ioc)
     g_free(outgoing_args.fname);
     outgoing_args.fname = NULL;
 
+    file_remove_fdset();
     return 0;
 }
 
@@ -82,6 +164,7 @@ bool file_send_channel_create(gpointer opaque, Error **errp)
     } else {
         ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
         if (!ioc) {
+            file_fdset_error(flags, errp);
             return false;
         }
     }
@@ -105,13 +188,18 @@ void file_start_outgoing_migration(MigrationState *s,
 
     trace_migration_file_outgoing(filename);
 
-    fioc = qio_channel_file_new_path(filename, flags, mode, errp);
-    if (!fioc) {
+    if (!file_parse_fdset(filename, &outgoing_args.fdset_id, errp)) {
         return;
     }
 
     outgoing_args.fname = g_strdup(filename);
 
+    fioc = qio_channel_file_new_path(filename, flags, mode, errp);
+    if (!fioc) {
+        file_fdset_error(flags, errp);
+        return;
+    }
+
     ioc = QIO_CHANNEL(fioc);
     if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 0) {
         return;
@@ -135,12 +223,13 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
     QIOChannelFile *fioc = NULL;
     uint64_t offset = file_args->offset;
     int channels = 1;
-    int i = 0, fd;
+    int i = 0, fd, flags = O_RDONLY;
 
     trace_migration_file_incoming(filename);
 
-    fioc = qio_channel_file_new_path(filename, O_RDONLY, 0, errp);
+    fioc = qio_channel_file_new_path(filename, flags, 0, errp);
     if (!fioc) {
+        file_fdset_error(flags, errp);
         return;
     }
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 34/34] tests/qtest/migration: Add a test for fixed-ram with passing of fds
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (32 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 33/34] migration: Add support for fdset with multifd + file Fabiano Rosas
@ 2024-02-20 22:41 ` Fabiano Rosas
  2024-02-23  2:59 ` [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Peter Xu
  2024-02-26  6:15 ` Peter Xu
  35 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-20 22:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: berrange, armbru, Peter Xu, Claudio Fontana, Thomas Huth,
	Laurent Vivier, Paolo Bonzini

Add a multifd test for fixed-ram with passing of fds into QEMU. This
is how libvirt will consume the feature.

There are a couple of details to the fdset mechanism:

- multifd needs two distinct file descriptors (not duplicated with
  dup()) on the outgoing side so it can enable O_DIRECT only on the
  channels that write with alignment. The dup() system call creates
  file descriptors that share status flags, of which O_DIRECT is one.

  The incoming side doesn't set O_DIRECT, so it can dup() fds and
  therefore only needs to receive one in the fdset.

- the open() access mode flags used for the fds passed into QEMU need
  to match the flags QEMU uses to open the file. Currently O_WRONLY
  for src and O_RDONLY for dst.

O_DIRECT is not supported on all systems/filesystems, so run the fdset
test without O_DIRECT if that's the case. The migration code should
still work in that scenario.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration-helpers.c |  7 ++-
 tests/qtest/migration-test.c    | 88 +++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 4ae43db1ca..3990da3c16 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -333,7 +333,7 @@ void migration_test_add(const char *path, void (*fn)(void))
 bool probe_o_direct_support(const char *tmpfs)
 {
     g_autofree char *filename = g_strdup_printf("%s/probe-o-direct", tmpfs);
-    int fd, flags = O_CREAT | O_RDWR | O_DIRECT;
+    int fd, flags = O_CREAT | O_RDWR | O_TRUNC | O_DIRECT;
     void *buf;
     ssize_t ret, len;
     uint64_t offset;
@@ -351,9 +351,12 @@ bool probe_o_direct_support(const char *tmpfs)
     len = 0x100000;
     offset = 0x100000;
 
-    buf = g_malloc0(len);
+    buf = aligned_alloc(len, len);
+    g_assert(buf);
+
     ret = pwrite(fd, buf, len, offset);
     unlink(filename);
+    g_free(buf);
 
     if (ret < 0) {
         return false;
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 0931ba18df..8c5536958d 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2323,8 +2323,91 @@ static void test_multifd_file_fixed_ram_dio(void)
     test_file_common(&args, true);
 }
 
+static void migrate_multifd_fixed_ram_fdset_dio_end(QTestState *from,
+                                                    QTestState *to,
+                                                    void *opaque)
+{
+    QDict *resp;
+    QList *fdsets;
+
+    /*
+     * Check that we removed the fdsets after migration, otherwise a
+     * second migration would fail due to too many fdsets.
+     */
+
+    resp = qtest_qmp(from, "{'execute': 'query-fdsets', "
+                     "'arguments': {}}");
+    g_assert(qdict_haskey(resp, "return"));
+    fdsets = qdict_get_qlist(resp, "return");
+    g_assert(fdsets && qlist_empty(fdsets));
+}
 #endif /* O_DIRECT */
 
+#ifndef _WIN32
+static void *migrate_multifd_fixed_ram_fdset(QTestState *from, QTestState *to)
+{
+    g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
+    int fds[3];
+    int src_flags = O_CREAT | O_WRONLY;
+    int dst_flags = O_CREAT | O_RDONLY;
+
+    /* main outgoing channel: no O_DIRECT */
+    fds[0] = open(file, src_flags, 0660);
+    assert(fds[0] != -1);
+
+#ifdef O_DIRECT
+    src_flags |= O_DIRECT;
+#endif
+
+    /* secondary outgoing channels */
+    fds[1] = open(file, src_flags, 0660);
+    assert(fds[1] != -1);
+
+    qtest_qmp_fds_assert_success(from, &fds[0], 1, "{'execute': 'add-fd', "
+                                 "'arguments': {'fdset-id': 1}}");
+
+    qtest_qmp_fds_assert_success(from, &fds[1], 1, "{'execute': 'add-fd', "
+                                 "'arguments': {'fdset-id': 1}}");
+
+    /* incoming channel */
+    fds[2] = open(file, dst_flags, 0660);
+    assert(fds[2] != -1);
+
+    qtest_qmp_fds_assert_success(to, &fds[2], 1, "{'execute': 'add-fd', "
+                                 "'arguments': {'fdset-id': 1}}");
+
+#ifdef O_DIRECT
+    migrate_multifd_fixed_ram_dio_start(from, to);
+#else
+    migrate_multifd_fixed_ram_start(from, to);
+#endif
+
+    return NULL;
+}
+
+static void test_multifd_file_fixed_ram_fdset(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=0x100");
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+        .start_hook = migrate_multifd_fixed_ram_fdset,
+#ifdef O_DIRECT
+        .finish_hook = migrate_multifd_fixed_ram_fdset_dio_end,
+#endif
+    };
+
+#ifdef O_DIRECT
+    if (!probe_o_direct_support(tmpfs)) {
+        g_test_skip("Filesystem does not support O_DIRECT");
+        return;
+    }
+#endif
+
+    test_file_common(&args, true);
+}
+#endif /* _WIN32 */
+
 static void test_precopy_tcp_plain(void)
 {
     MigrateCommon args = {
@@ -3676,6 +3759,11 @@ int main(int argc, char **argv)
                        test_multifd_file_fixed_ram_dio);
 #endif
 
+#ifndef _WIN32
+    qtest_add_func("/migration/multifd/file/fixed-ram/fdset",
+                   test_multifd_file_fixed_ram_fdset);
+#endif
+
 #ifdef CONFIG_GNUTLS
     migration_test_add("/migration/precopy/unix/tls/psk",
                        test_precopy_unix_tls_psk);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 11/34] migration/ram: Introduce 'fixed-ram' migration capability
  2024-02-20 22:41 ` [PATCH v4 11/34] migration/ram: Introduce 'fixed-ram' migration capability Fabiano Rosas
@ 2024-02-21  8:41   ` Markus Armbruster
  2024-02-21 13:24     ` Fabiano Rosas
  2024-02-26  3:07   ` Peter Xu
  2024-02-26  3:22   ` Peter Xu
  2 siblings, 1 reply; 79+ messages in thread
From: Markus Armbruster @ 2024-02-21  8:41 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, Peter Xu, Claudio Fontana, Eric Blake

Fabiano Rosas <farosas@suse.de> writes:

> Add a new migration capability 'fixed-ram'.
>
> The core of the feature is to ensure that each RAM page has a specific
> offset in the resulting migration stream. The reasons why we'd want
> such behavior are:
>
>  - The resulting file will have a bounded size, since pages which are
>    dirtied multiple times will always go to a fixed location in the
>    file, rather than constantly being added to a sequential
>    stream. This eliminates cases where a VM with, say, 1G of RAM can
>    result in a migration file that's 10s of GBs, provided that the
>    workload constantly redirties memory.
>
>  - It paves the way to implement O_DIRECT-enabled save/restore of the
>    migration stream as the pages are ensured to be written at aligned
>    offsets.
>
>  - It allows the usage of multifd so we can write RAM pages to the
>    migration file in parallel.
>
> For now, enabling the capability has no effect. The next couple of
> patches implement the core functionality.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index 5a565d9b8d..3fce5fe53e 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -531,6 +531,10 @@
>  #     and can result in more stable read performance.  Requires KVM
>  #     with accelerator property "dirty-ring-size" set.  (Since 8.1)
>  #
> +# @fixed-ram: Migrate using fixed offsets in the migration file for
> +#     each RAM page.  Requires a migration URI that supports seeking,
> +#     such as a file.  (since 9.0)
> +#
>  # Features:
>  #
>  # @deprecated: Member @block is deprecated.  Use blockdev-mirror with
> @@ -555,7 +559,7 @@
>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>             'validate-uuid', 'background-snapshot',
>             'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
> -           'dirty-limit'] }
> +           'dirty-limit', 'fixed-ram'] }
>  
>  ##
>  # @MigrationCapabilityStatus:

Can we find a better name than @fixed-ram?  @fixed-ram-offsets?
@use-seek?

Apart from that, QAPI schema
Acked-by: Markus Armbruster <armbru@redhat.com>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 27/34] migration: Add direct-io parameter
  2024-02-20 22:41 ` [PATCH v4 27/34] migration: Add direct-io parameter Fabiano Rosas
@ 2024-02-21  9:17   ` Markus Armbruster
  2024-02-26  8:50   ` Peter Xu
  1 sibling, 0 replies; 79+ messages in thread
From: Markus Armbruster @ 2024-02-21  9:17 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, Peter Xu, Claudio Fontana, Eric Blake

Fabiano Rosas <farosas@suse.de> writes:

> Add the direct-io migration parameter that tells the migration code to
> use O_DIRECT when opening the migration stream file whenever possible.
>
> This is currently only used with the fixed-ram migration that has a
> clear window guaranteed to perform aligned writes.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index 3fce5fe53e..41241a2178 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -878,6 +878,9 @@
>  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
>  #        (Since 8.2)
>  #
> +# @direct-io: Open migration files with O_DIRECT when possible. This
> +#     requires that the 'fixed-ram' capability is enabled. (since 9.0)

'fixed-ram' is a cross-reference to MigrationCapability member
fixed-ram.

For local members, @name is better than 'name', because @name carries
meaning, while 'name' could be anything.

Currently, @name is merely shorthand for ``name``, which is a reST
"inline literal", commonly used for short code snippets.  Rendered in
fixed-width font, unlike 'name'.

Making @name generate a link to the description would be a nice
improvement.

For non-local members, we can't make @name a link without also
specifying the thing it's a member of.

Let's stick to @name for member names, even non-local ones, so we get
the same font for all of them.
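
For instance, with that convention the first hunk above could read:

    # @direct-io: Open migration files with O_DIRECT when possible. This
    #     requires that the @fixed-ram capability is enabled. (since 9.0)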

> +#
>  # Features:
>  #
>  # @deprecated: Member @block-incremental is deprecated.  Use
> @@ -911,7 +914,8 @@
>             'block-bitmap-mapping',
>             { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
>             'vcpu-dirty-limit',
> -           'mode'] }
> +           'mode',
> +           'direct-io'] }
>  
>  ##
>  # @MigrateSetParameters:
> @@ -1070,6 +1074,9 @@
>  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
>  #        (Since 8.2)
>  #
> +# @direct-io: Open migration files with O_DIRECT when possible. This
> +#     requires that the 'fixed-ram' capability is enabled. (since 9.0)
> +#
>  # Features:
>  #
>  # @deprecated: Member @block-incremental is deprecated.  Use
> @@ -1123,7 +1130,8 @@
>              '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
>                                              'features': [ 'unstable' ] },
>              '*vcpu-dirty-limit': 'uint64',
> -            '*mode': 'MigMode'} }
> +            '*mode': 'MigMode',
> +            '*direct-io': 'bool' } }
>  
>  ##
>  # @migrate-set-parameters:
> @@ -1298,6 +1306,9 @@
>  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
>  #        (Since 8.2)
>  #
> +# @direct-io: Open migration files with O_DIRECT when possible. This
> +#     requires that the 'fixed-ram' capability is enabled. (since 9.0)
> +#
>  # Features:
>  #
>  # @deprecated: Member @block-incremental is deprecated.  Use
> @@ -1348,7 +1359,8 @@
>              '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
>                                              'features': [ 'unstable' ] },
>              '*vcpu-dirty-limit': 'uint64',
> -            '*mode': 'MigMode'} }
> +            '*mode': 'MigMode',
> +            '*direct-io': 'bool' } }
>  
>  ##
>  # @query-migrate-parameters:

Other than that, QAPI schema
Acked-by: Markus Armbruster <armbru@redhat.com>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 30/34] monitor: Honor QMP request for fd removal immediately
  2024-02-20 22:41 ` [PATCH v4 30/34] monitor: Honor QMP request for fd removal immediately Fabiano Rosas
@ 2024-02-21  9:20   ` Markus Armbruster
  0 siblings, 0 replies; 79+ messages in thread
From: Markus Armbruster @ 2024-02-21  9:20 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, Peter Xu, Claudio Fontana

Fabiano Rosas <farosas@suse.de> writes:

> We're currently only removing an fd from the fdset if the VM is
> running. This causes a QMP call to "remove-fd" to not actually remove
> the fd if the VM happens to be stopped.
>
> While the fd would eventually be removed when monitor_fdset_cleanup()
> is called again, the user request should be honored and the fd
> actually removed. Calling remove-fd + query-fdset shows a recently
> removed fd still present.
>
> The runstate_is_running() check was introduced by commit ebe52b592d
> ("monitor: Prevent removing fd from set during init"), which by the
> shortlog indicates that they were trying to avoid removing an
> yet-unduplicated fd too early.
>
> I don't see why an fd explicitly removed with qmp_remove_fd() should
> be under runstate_is_running(). I'm assuming this was a mistake when
> adding the parenthesis around the expression.
>
> Move the runstate_is_running() check to apply only to the
> QLIST_EMPTY(dup_fds) side of the expression and ignore it when
> mon_fdset_fd->removed has been explicitly set.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Eric, Kevin, your fingerprints are on commit ebe52b592d.  Could you have
a look at this fix?

> ---
>  monitor/fds.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/monitor/fds.c b/monitor/fds.c
> index d86c2c674c..4ec3b7eea9 100644
> --- a/monitor/fds.c
> +++ b/monitor/fds.c
> @@ -173,9 +173,9 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
>      MonFdsetFd *mon_fdset_fd_next;
>  
>      QLIST_FOREACH_SAFE(mon_fdset_fd, &mon_fdset->fds, next, mon_fdset_fd_next) {
> -        if ((mon_fdset_fd->removed ||
> -                (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0)) &&
> -                runstate_is_running()) {
> +        if (mon_fdset_fd->removed ||
> +            (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0 &&
> +             runstate_is_running())) {
>              close(mon_fdset_fd->fd);
>              g_free(mon_fdset_fd->opaque);
>              QLIST_REMOVE(mon_fdset_fd, next);



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 32/34] monitor: fdset: Match against O_DIRECT
  2024-02-20 22:41 ` [PATCH v4 32/34] monitor: fdset: Match against O_DIRECT Fabiano Rosas
@ 2024-02-21  9:27   ` Markus Armbruster
  2024-02-21 13:37     ` Fabiano Rosas
  0 siblings, 1 reply; 79+ messages in thread
From: Markus Armbruster @ 2024-02-21  9:27 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, Peter Xu, Claudio Fontana

Fabiano Rosas <farosas@suse.de> writes:

> We're about to enable the use of O_DIRECT in the migration code and
> due to the alignment restrictions imposed by filesystems we need to
> make sure the flag is only used when doing aligned IO.
>
> The migration will do parallel IO to different regions of a file, so
> we need to use more than one file descriptor. Those cannot be obtained
> by duplicating (dup()) since duplicated file descriptors share the
> file status flags, including O_DIRECT. If one migration channel does
> unaligned IO while another sets O_DIRECT to do aligned IO, the
> filesystem would fail the unaligned operation.
>
> The add-fd QMP command along with the fdset code are specifically
> designed to allow the user to pass a set of file descriptors with
> different access flags into QEMU to be later fetched by code that
> needs to alternate between those flags when doing IO.
>
> Extend the fdset matching function to behave the same with the
> O_DIRECT flag.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  monitor/fds.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/monitor/fds.c b/monitor/fds.c
> index 9a28e4b72b..42bf3eb982 100644
> --- a/monitor/fds.c
> +++ b/monitor/fds.c
> @@ -413,6 +413,12 @@ static bool monitor_fdset_flags_match(int flags, int fd_flags)
   static bool monitor_fdset_flags_match(int flags, int fd_flags)
   {
       bool match = false;
   
>      if ((flags & O_ACCMODE) == (fd_flags & O_ACCMODE)) {
>          match = true;
> +
> +#ifdef O_DIRECT
> +        if ((flags & O_DIRECT) != (fd_flags & O_DIRECT)) {
> +            match = false;
> +        }
> +#endif
>      }
>  
>      return match;
   }

I'd prefer something like

   static bool monitor_fdset_flags_match(int flags, int fd_flags)
   {
   #ifdef O_DIRECT
       if ((flags & O_DIRECT) != (fd_flags & O_DIRECT)) {
           return false;
       }
   #endif

       if ((flags & O_ACCMODE) != (fd_flags & O_ACCMODE)) {
           return false;

       }

       return true;
   }



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 11/34] migration/ram: Introduce 'fixed-ram' migration capability
  2024-02-21  8:41   ` Markus Armbruster
@ 2024-02-21 13:24     ` Fabiano Rosas
  2024-02-21 13:50       ` Daniel P. Berrangé
  0 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-21 13:24 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, berrange, Peter Xu, Claudio Fontana, Eric Blake

Markus Armbruster <armbru@redhat.com> writes:

> Fabiano Rosas <farosas@suse.de> writes:
>
>> Add a new migration capability 'fixed-ram'.
>>
>> The core of the feature is to ensure that each RAM page has a specific
>> offset in the resulting migration stream. The reasons why we'd want
>> such behavior are:
>>
>>  - The resulting file will have a bounded size, since pages which are
>>    dirtied multiple times will always go to a fixed location in the
>>    file, rather than constantly being added to a sequential
>>    stream. This eliminates cases where a VM with, say, 1G of RAM can
>>    result in a migration file that's 10s of GBs, provided that the
>>    workload constantly redirties memory.
>>
>>  - It paves the way to implement O_DIRECT-enabled save/restore of the
>>    migration stream as the pages are ensured to be written at aligned
>>    offsets.
>>
>>  - It allows the usage of multifd so we can write RAM pages to the
>>    migration file in parallel.
>>
>> For now, enabling the capability has no effect. The next couple of
>> patches implement the core functionality.
>>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> [...]
>
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 5a565d9b8d..3fce5fe53e 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -531,6 +531,10 @@
>>  #     and can result in more stable read performance.  Requires KVM
>>  #     with accelerator property "dirty-ring-size" set.  (Since 8.1)
>>  #
>> +# @fixed-ram: Migrate using fixed offsets in the migration file for
>> +#     each RAM page.  Requires a migration URI that supports seeking,
>> +#     such as a file.  (since 9.0)
>> +#
>>  # Features:
>>  #
>>  # @deprecated: Member @block is deprecated.  Use blockdev-mirror with
>> @@ -555,7 +559,7 @@
>>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>>             'validate-uuid', 'background-snapshot',
>>             'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
>> -           'dirty-limit'] }
>> +           'dirty-limit', 'fixed-ram'] }
>>  
>>  ##
>>  # @MigrationCapabilityStatus:
>
> Can we find a better name than @fixed-ram?  @fixed-ram-offsets?
> @use-seek?

I have no idea how we came to fixed-ram. The archives don't provide any
clarification. I find it confusing at first glance as well.

A little brainstorming on how fixed-ram is different from existing
migration:

Fixed-ram:
  uses a file, like the 'file:' migration;

  needs a seeking medium, such as a file;

  migrates ram by placing a page always in the same offset in the
  file, contrary to normal migration which streams the page changes
  continuously;

  ensures a migration file whose size is bounded by the VM's RAM size,
  contrary to normal 'file:' migration which creates a file with
  unbounded size;

  enables multi-threaded RAM migration, even though we only use it when
  multifd is enabled;

  uses scatter-gather APIs (pwritev, preadv);

So a few options:

(setting aside use-seek, which might be even more generic/vague)

- fixed-ram-offsets
- non-streaming (or streaming: false)
- ram-scatter-gather (ram-sg)
- parallel-ram (even with the slight inaccuracy that we sometimes do it single-threaded)

Remember we also use this name internally, so I think a broader
"feature" name is better than a super specific one.

Does anyone have a strong preference? Other suggestions?

> Apart from that, QAPI schema
> Acked-by: Markus Armbruster <armbru@redhat.com>

Thanks!


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 32/34] monitor: fdset: Match against O_DIRECT
  2024-02-21  9:27   ` Markus Armbruster
@ 2024-02-21 13:37     ` Fabiano Rosas
  2024-02-22  6:56       ` Markus Armbruster
  0 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-21 13:37 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, berrange, Peter Xu, Claudio Fontana

Markus Armbruster <armbru@redhat.com> writes:

> Fabiano Rosas <farosas@suse.de> writes:
>
>> We're about to enable the use of O_DIRECT in the migration code and
>> due to the alignment restrictions imposed by filesystems we need to
>> make sure the flag is only used when doing aligned IO.
>>
>> The migration will do parallel IO to different regions of a file, so
>> we need to use more than one file descriptor. Those cannot be obtained
>> by duplicating (dup()) since duplicated file descriptors share the
>> file status flags, including O_DIRECT. If one migration channel does
>> unaligned IO while another sets O_DIRECT to do aligned IO, the
>> filesystem would fail the unaligned operation.
>>
>> The add-fd QMP command along with the fdset code are specifically
>> designed to allow the user to pass a set of file descriptors with
>> different access flags into QEMU to be later fetched by code that
>> needs to alternate between those flags when doing IO.
>>
>> Extend the fdset matching function to behave the same with the
>> O_DIRECT flag.
>>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  monitor/fds.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/monitor/fds.c b/monitor/fds.c
>> index 9a28e4b72b..42bf3eb982 100644
>> --- a/monitor/fds.c
>> +++ b/monitor/fds.c
>> @@ -413,6 +413,12 @@ static bool monitor_fdset_flags_match(int flags, int fd_flags)
>    static bool monitor_fdset_flags_match(int flags, int fd_flags)
>    {
>        bool match = false;
>    
>>      if ((flags & O_ACCMODE) == (fd_flags & O_ACCMODE)) {
>>          match = true;
>> +
>> +#ifdef O_DIRECT
>> +        if ((flags & O_DIRECT) != (fd_flags & O_DIRECT)) {
>> +            match = false;
>> +        }
>> +#endif
>>      }
>>  
>>      return match;
>    }
>
> I'd prefer something like
>
>    static bool monitor_fdset_flags_match(int flags, int fd_flags)
>    {
>    #ifdef O_DIRECT
>        if ((flags & O_DIRECT) != (fd_flags & O_DIRECT)) {
>            return false;
>        }
>    #endif
>
>        if ((flags & O_ACCMODE) != (fd_flags & O_ACCMODE)) {
>            return false;
>
>        }
>
>        return true;
>    }

This makes the O_DIRECT flag dictate the outcome when it's present. I
want O_DIRECT to be considered only when all other flags have matched.

Otherwise we regress the original use-case if the user happened to have
put O_DIRECT in the flags. A non-match due to different O_ACCMODE would
become a match due to (possibly) matching O_DIRECT.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 11/34] migration/ram: Introduce 'fixed-ram' migration capability
  2024-02-21 13:24     ` Fabiano Rosas
@ 2024-02-21 13:50       ` Daniel P. Berrangé
  2024-02-21 15:05         ` Fabiano Rosas
  0 siblings, 1 reply; 79+ messages in thread
From: Daniel P. Berrangé @ 2024-02-21 13:50 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Markus Armbruster, qemu-devel, Peter Xu, Claudio Fontana,
	Eric Blake

On Wed, Feb 21, 2024 at 10:24:05AM -0300, Fabiano Rosas wrote:
> Markus Armbruster <armbru@redhat.com> writes:
> 
> > Fabiano Rosas <farosas@suse.de> writes:
> >
> >> Add a new migration capability 'fixed-ram'.
> >>
> >> The core of the feature is to ensure that each RAM page has a specific
> >> offset in the resulting migration stream. The reasons why we'd want
> >> such behavior are:
> >>
> >>  - The resulting file will have a bounded size, since pages which are
> >>    dirtied multiple times will always go to a fixed location in the
> >>    file, rather than constantly being added to a sequential
> >>    stream. This eliminates cases where a VM with, say, 1G of RAM can
> >>    result in a migration file that's 10s of GBs, provided that the
> >>    workload constantly redirties memory.
> >>
> >>  - It paves the way to implement O_DIRECT-enabled save/restore of the
> >>    migration stream as the pages are ensured to be written at aligned
> >>    offsets.
> >>
> >>  - It allows the usage of multifd so we can write RAM pages to the
> >>    migration file in parallel.
> >>
> >> For now, enabling the capability has no effect. The next couple of
> >> patches implement the core functionality.
> >>
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >
> > [...]
> >
> >> diff --git a/qapi/migration.json b/qapi/migration.json
> >> index 5a565d9b8d..3fce5fe53e 100644
> >> --- a/qapi/migration.json
> >> +++ b/qapi/migration.json
> >> @@ -531,6 +531,10 @@
> >>  #     and can result in more stable read performance.  Requires KVM
> >>  #     with accelerator property "dirty-ring-size" set.  (Since 8.1)
> >>  #
> >> +# @fixed-ram: Migrate using fixed offsets in the migration file for
> >> +#     each RAM page.  Requires a migration URI that supports seeking,
> >> +#     such as a file.  (since 9.0)
> >> +#
> >>  # Features:
> >>  #
> >>  # @deprecated: Member @block is deprecated.  Use blockdev-mirror with
> >> @@ -555,7 +559,7 @@
> >>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
> >>             'validate-uuid', 'background-snapshot',
> >>             'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
> >> -           'dirty-limit'] }
> >> +           'dirty-limit', 'fixed-ram'] }
> >>  
> >>  ##
> >>  # @MigrationCapabilityStatus:
> >
> > Can we find a better name than @fixed-ram?  @fixed-ram-offsets?
> > @use-seek?
> 
> I have no idea how we came to fixed-ram. The archives don't provide any
> clarification. I find it confusing at first glance as well.
> 
> A little brainstorming on how fixed-ram is different from exiting
> migration:
> 
> Fixed-ram:
>   uses a file, like the 'file:' migration;
> 
>   needs a seeking medium, such as a file;
> 
>   migrates ram by placing a page always in the same offset in the
>   file, contrary to normal migration which streams the page changes
>   continuously;
> 
>   ensures a migration file of size bounded to VM RAM size, contrary to
>   normal 'file:' migration which creates a file with unbounded size;
> 
>   enables multi-threaded RAM migration, even though we only use it when
>   multifd is enabled;
> 
>   uses scatter-gatter APIs (pwritev, preadv);
> 
> So a few options:
> 
> (disconsidering use-seek, it might be even more generic/vague)
> 
> - fixed-ram-offsets
> - non-streaming (or streaming: false)
> - ram-scatter-gather (ram-sg)
> - parallel-ram (even with the slight inaccuracy that we sometimes do it single-threaded)

I could add 'mapped-ram', as an alternative to 'fixed-ram'.

The key distinguishing & motivating feature here is that
RAM regions are mapped directly to file regions, instead
of just being streamed at arbitrary points.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 11/34] migration/ram: Introduce 'fixed-ram' migration capability
  2024-02-21 13:50       ` Daniel P. Berrangé
@ 2024-02-21 15:05         ` Fabiano Rosas
  0 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-21 15:05 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Markus Armbruster, qemu-devel, Peter Xu, Claudio Fontana,
	Eric Blake

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Feb 21, 2024 at 10:24:05AM -0300, Fabiano Rosas wrote:
>> Markus Armbruster <armbru@redhat.com> writes:
>> 
>> > Fabiano Rosas <farosas@suse.de> writes:
>> >
>> >> Add a new migration capability 'fixed-ram'.
>> >>
>> >> The core of the feature is to ensure that each RAM page has a specific
>> >> offset in the resulting migration stream. The reasons why we'd want
>> >> such behavior are:
>> >>
>> >>  - The resulting file will have a bounded size, since pages which are
>> >>    dirtied multiple times will always go to a fixed location in the
>> >>    file, rather than constantly being added to a sequential
>> >>    stream. This eliminates cases where a VM with, say, 1G of RAM can
>> >>    result in a migration file that's 10s of GBs, provided that the
>> >>    workload constantly redirties memory.
>> >>
>> >>  - It paves the way to implement O_DIRECT-enabled save/restore of the
>> >>    migration stream as the pages are ensured to be written at aligned
>> >>    offsets.
>> >>
>> >>  - It allows the usage of multifd so we can write RAM pages to the
>> >>    migration file in parallel.
>> >>
>> >> For now, enabling the capability has no effect. The next couple of
>> >> patches implement the core functionality.
>> >>
>> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> >
>> > [...]
>> >
>> >> diff --git a/qapi/migration.json b/qapi/migration.json
>> >> index 5a565d9b8d..3fce5fe53e 100644
>> >> --- a/qapi/migration.json
>> >> +++ b/qapi/migration.json
>> >> @@ -531,6 +531,10 @@
>> >>  #     and can result in more stable read performance.  Requires KVM
>> >>  #     with accelerator property "dirty-ring-size" set.  (Since 8.1)
>> >>  #
>> >> +# @fixed-ram: Migrate using fixed offsets in the migration file for
>> >> +#     each RAM page.  Requires a migration URI that supports seeking,
>> >> +#     such as a file.  (since 9.0)
>> >> +#
>> >>  # Features:
>> >>  #
>> >>  # @deprecated: Member @block is deprecated.  Use blockdev-mirror with
>> >> @@ -555,7 +559,7 @@
>> >>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>> >>             'validate-uuid', 'background-snapshot',
>> >>             'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
>> >> -           'dirty-limit'] }
>> >> +           'dirty-limit', 'fixed-ram'] }
>> >>  
>> >>  ##
>> >>  # @MigrationCapabilityStatus:
>> >
>> > Can we find a better name than @fixed-ram?  @fixed-ram-offsets?
>> > @use-seek?
>> 
>> I have no idea how we came to fixed-ram. The archives don't provide any
>> clarification. I find it confusing at first glance as well.
>> 
>> A little brainstorming on how fixed-ram is different from exiting
>> migration:
>> 
>> Fixed-ram:
>>   uses a file, like the 'file:' migration;
>> 
>>   needs a seeking medium, such as a file;
>> 
>>   migrates ram by placing a page always in the same offset in the
>>   file, contrary to normal migration which streams the page changes
>>   continuously;
>> 
>>   ensures a migration file of size bounded to VM RAM size, contrary to
>>   normal 'file:' migration which creates a file with unbounded size;
>> 
>>   enables multi-threaded RAM migration, even though we only use it when
>>   multifd is enabled;
>> 
>>   uses scatter-gatter APIs (pwritev, preadv);
>> 
>> So a few options:
>> 
>> (disconsidering use-seek, it might be even more generic/vague)
>> 
>> - fixed-ram-offsets
>> - non-streaming (or streaming: false)
>> - ram-scatter-gather (ram-sg)
>> - parallel-ram (even with the slight inaccuracy that we sometimes do it single-threaded)
>
> I could add 'mapped-ram', as an alternative to 'fixed-ram'.
>
> The key distinguishing & motivating feature here is that
> RAM regions are mapped directly to file regions, instead
> of just being streamed at arbitrary points.

"map" is certainly a good shorthand for the various "placed at relative
offsets" that I used throughout this series.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 32/34] monitor: fdset: Match against O_DIRECT
  2024-02-21 13:37     ` Fabiano Rosas
@ 2024-02-22  6:56       ` Markus Armbruster
  2024-02-22 13:26         ` Fabiano Rosas
  0 siblings, 1 reply; 79+ messages in thread
From: Markus Armbruster @ 2024-02-22  6:56 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, Peter Xu, Claudio Fontana

Fabiano Rosas <farosas@suse.de> writes:

> Markus Armbruster <armbru@redhat.com> writes:
>
>> Fabiano Rosas <farosas@suse.de> writes:
>>
>>> We're about to enable the use of O_DIRECT in the migration code and
>>> due to the alignment restrictions imposed by filesystems we need to
>>> make sure the flag is only used when doing aligned IO.
>>>
>>> The migration will do parallel IO to different regions of a file, so
>>> we need to use more than one file descriptor. Those cannot be obtained
>>> by duplicating (dup()) since duplicated file descriptors share the
>>> file status flags, including O_DIRECT. If one migration channel does
>>> unaligned IO while another sets O_DIRECT to do aligned IO, the
>>> filesystem would fail the unaligned operation.
>>>
>>> The add-fd QMP command along with the fdset code are specifically
>>> designed to allow the user to pass a set of file descriptors with
>>> different access flags into QEMU to be later fetched by code that
>>> needs to alternate between those flags when doing IO.
>>>
>>> Extend the fdset matching function to behave the same with the
>>> O_DIRECT flag.
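
(A tiny standalone illustration, not part of the patch, of the shared-flags
behaviour described above; "demo.bin" and the program itself are hypothetical:)

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("demo.bin", O_CREAT | O_WRONLY, 0600);
        int dupfd = dup(fd);

        /* Set O_DIRECT through one descriptor only.  (May fail with EINVAL
         * on filesystems without O_DIRECT support, e.g. tmpfs.) */
        if (fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_DIRECT) == 0) {
            /* The duplicate sees the flag too: both descriptors share the
             * same open file description and thus the same status flags. */
            printf("dup'd fd has O_DIRECT: %s\n",
                   (fcntl(dupfd, F_GETFL) & O_DIRECT) ? "yes" : "no");
        }

        close(dupfd);
        close(fd);
        return 0;
    }
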
>>>
>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>> ---
>>>  monitor/fds.c | 6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/monitor/fds.c b/monitor/fds.c
>>> index 9a28e4b72b..42bf3eb982 100644
>>> --- a/monitor/fds.c
>>> +++ b/monitor/fds.c
>>> @@ -413,6 +413,12 @@ static bool monitor_fdset_flags_match(int flags, int fd_flags)
>>    static bool monitor_fdset_flags_match(int flags, int fd_flags)
>>    {
>>        bool match = false;
>>    
>>>      if ((flags & O_ACCMODE) == (fd_flags & O_ACCMODE)) {
>>>          match = true;
>>> +
>>> +#ifdef O_DIRECT
>>> +        if ((flags & O_DIRECT) != (fd_flags & O_DIRECT)) {
>>> +            match = false;
>>> +        }
>>> +#endif
>>>      }
>>>  
>>>      return match;
>>    }
>>
>> I'd prefer something like
>>
>>    static bool monitor_fdset_flags_match(int flags, int fd_flags)
>>    {
>>    #ifdef O_DIRECT
>>        if ((flags & O_DIRECT) != (fd_flags & O_DIRECT)) {
>>            return false;
>>        }
>>    #endif
>>
>>        if ((flags & O_ACCMODE) != (fd_flags & O_ACCMODE)) {
>>            return false;
>>
>>        }
>>
>>        return true;
>>    }
>
> This makes the O_DIRECT flag dictate the outcome when it's present. I
> want O_DIRECT to be considered only when all other flags have matched.
>
> Otherwise we regress the original use-case if the user happened to have
> put O_DIRECT in the flags. A non-match due to different O_ACCMODE would
> become a match due to (possibly) matching O_DIRECT.

The fact that I missed this signifies one of two things: either I was
suffering from code review brain (quite possible!), or this needs a
comment and/or clearer coding.

If I understand you correctly, you want to return true when the bits
selected by the two masks together match.

If we didn't need ifdeffery, we wouldn't use nested conditionals for
comparing bits under a mask.  We'd use something like

        int mask = O_ACCMODE | O_DIRECT;

        return (flags & mask) == (fd_flags & mask);

Bring back the ifdeffery:

        int mask = O_ACCMODE;

    #ifdef O_DIRECT
        mask |= O_DIRECT;
    #endif

        return (flags & mask) == (fd_flags & mask);

Or maybe even

    #ifndef O_DIRECT
    #define O_DIRECT 0
    #endif

        int mask = O_ACCMODE | O_DIRECT;

        return (flags & mask) == (fd_flags & mask);

Not sure this is even worth a helper function.

Or am I stull suffering from code review brain?
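
For reference, here is the mask-based variant from the discussion above
written out as a self-contained sketch (illustrative only, not the committed
monitor/fds.c code; it assumes nothing beyond O_ACCMODE and, where defined,
O_DIRECT from <fcntl.h>):

    #include <fcntl.h>
    #include <stdbool.h>

    /*
     * Compare only the bits we care about: the access mode plus, where
     * the platform defines it, O_DIRECT.  Bits outside the mask never
     * influence the result, so a matching O_DIRECT cannot turn an
     * O_ACCMODE mismatch into a match.
     */
    static bool monitor_fdset_flags_match(int flags, int fd_flags)
    {
        int mask = O_ACCMODE;

    #ifdef O_DIRECT
        mask |= O_DIRECT;
    #endif

        return (flags & mask) == (fd_flags & mask);
    }
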



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 32/34] monitor: fdset: Match against O_DIRECT
  2024-02-22  6:56       ` Markus Armbruster
@ 2024-02-22 13:26         ` Fabiano Rosas
  2024-02-22 14:44           ` Markus Armbruster
  0 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-22 13:26 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, berrange, Peter Xu, Claudio Fontana

Markus Armbruster <armbru@redhat.com> writes:

> Fabiano Rosas <farosas@suse.de> writes:
>
>> Markus Armbruster <armbru@redhat.com> writes:
>>
>>> Fabiano Rosas <farosas@suse.de> writes:
>>>
>>>> We're about to enable the use of O_DIRECT in the migration code and
>>>> due to the alignment restrictions imposed by filesystems we need to
>>>> make sure the flag is only used when doing aligned IO.
>>>>
>>>> The migration will do parallel IO to different regions of a file, so
>>>> we need to use more than one file descriptor. Those cannot be obtained
>>>> by duplicating (dup()) since duplicated file descriptors share the
>>>> file status flags, including O_DIRECT. If one migration channel does
>>>> unaligned IO while another sets O_DIRECT to do aligned IO, the
>>>> filesystem would fail the unaligned operation.
>>>>
>>>> The add-fd QMP command along with the fdset code are specifically
>>>> designed to allow the user to pass a set of file descriptors with
>>>> different access flags into QEMU to be later fetched by code that
>>>> needs to alternate between those flags when doing IO.
>>>>
>>>> Extend the fdset matching function to behave the same with the
>>>> O_DIRECT flag.
>>>>
>>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>>> ---
>>>>  monitor/fds.c | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/monitor/fds.c b/monitor/fds.c
>>>> index 9a28e4b72b..42bf3eb982 100644
>>>> --- a/monitor/fds.c
>>>> +++ b/monitor/fds.c
>>>> @@ -413,6 +413,12 @@ static bool monitor_fdset_flags_match(int flags, int fd_flags)
>>>    static bool monitor_fdset_flags_match(int flags, int fd_flags)
>>>    {
>>>        bool match = false;
>>>    
>>>>      if ((flags & O_ACCMODE) == (fd_flags & O_ACCMODE)) {
>>>>          match = true;
>>>> +
>>>> +#ifdef O_DIRECT
>>>> +        if ((flags & O_DIRECT) != (fd_flags & O_DIRECT)) {
>>>> +            match = false;
>>>> +        }
>>>> +#endif
>>>>      }
>>>>  
>>>>      return match;
>>>    }
>>>
>>> I'd prefer something like
>>>
>>>    static bool monitor_fdset_flags_match(int flags, int fd_flags)
>>>    {
>>>    #ifdef O_DIRECT
>>>        if ((flags & O_DIRECT) != (fd_flags & O_DIRECT)) {
>>>            return false;
>>>        }
>>>    #endif
>>>
>>>        if ((flags & O_ACCMODE) != (fd_flags & O_ACCMODE)) {
>>>            return false;
>>>
>>>        }
>>>
>>>        return true;
>>>    }
>>
>> This makes the O_DIRECT flag dictate the outcome when it's present. I
>> want O_DIRECT to be considered only when all other flags have matched.
>>
>> Otherwise we regress the original use-case if the user happened to have
>> put O_DIRECT in the flags. A non-match due to different O_ACCMODE would
>> become a match due to (possibly) matching O_DIRECT.
>
> The fact that I missed this signifies one of two things: either I was
> suffering from code review brain (quite possible!), or this needs a
> comment and/or clearer coding.
>
> If I understand you correctly, you want to return true when the bits
> selected by the two masks together match.
>
> If we didn't need ifdeffery, we wouldn't use nested conditionals for
> comparing bits under a mask.  We'd use something like
>
>         int mask = O_ACCMODE | O_DIRECT;
>
>         return (flags & mask) == (fd_flags & mask);
>
> Bring back the ifdeffery:
>
>         int mask = O_ACCMODE;
>
>     #ifdef O_DIRECT
>         mask |= O_DIRECT;
>     #endif
>
>         return (flags & mask) == (fd_flags & mask);

Could be. I'll change it.

>
> Or maybe even
>
>     #ifndef O_DIRECT
>     #define O_DIRECT 0
>     #endif
>
>         int mask = O_ACCMODE | O_DIRECT;
>
>         return (flags & mask) == (fd_flags & mask);
>
> Not sure this is even worth a helper function.

Agreed.

>
> Or am I stull suffering from code review brain?

Yes, stull suffering. =)


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 32/34] monitor: fdset: Match against O_DIRECT
  2024-02-22 13:26         ` Fabiano Rosas
@ 2024-02-22 14:44           ` Markus Armbruster
  0 siblings, 0 replies; 79+ messages in thread
From: Markus Armbruster @ 2024-02-22 14:44 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, Peter Xu, Claudio Fontana

Fabiano Rosas <farosas@suse.de> writes:

> Markus Armbruster <armbru@redhat.com> writes:

[...]

>> Or am I stull suffering from code review brain?
>
> Yes, stull suffering. =)

%-}

/me hoists white flag



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (33 preceding siblings ...)
  2024-02-20 22:41 ` [PATCH v4 34/34] tests/qtest/migration: Add a test for fixed-ram with passing of fds Fabiano Rosas
@ 2024-02-23  2:59 ` Peter Xu
  2024-02-23 13:48   ` Claudio Fontana
  2024-02-23 14:22   ` Fabiano Rosas
  2024-02-26  6:15 ` Peter Xu
  35 siblings, 2 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-23  2:59 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:04PM -0300, Fabiano Rosas wrote:
> Latest numbers
> ==============
> 
> => guest: 128 GB RAM - 120 GB dirty - 1 vcpu in tight loop dirtying memory
> => host: 128 CPU AMD EPYC 7543 - 2 NVMe disks in RAID0 (8586 MiB/s) - xfs
> => pinned vcpus w/ NUMA shortest distances - average of 3 runs - results
>    from query-migrate
> 
> non-live           | time (ms)   pages/s   mb/s   MB/s
> -------------------+-----------------------------------
> file               |    110512    256258   9549   1193
>   + bg-snapshot    |    245660    119581   4303    537

Is this the one using userfault?  I'm surprised it's much slower when
enabled; logically a non-live snapshot should take similar loops to a
normal migration since it should have zero faults, so performance should
be similar.

> -------------------+-----------------------------------
> fixed-ram          |    157975    216877   6672    834
>   + multifd 8 ch.  |     95922    292178  10982   1372
>      + direct-io   |     23268   1936897  45330   5666
> -------------------------------------------------------
> 
> live               | time (ms)   pages/s   mb/s   MB/s
> -------------------+-----------------------------------
> file               |         -         -      -      - (file grew 4x the VM size)
>   + bg-snapshot    |    357635    141747   2974    371
> -------------------+-----------------------------------
> fixed-ram          |         -         -      -      - (no convergence in 5 min)
>   + multifd 8 ch.  |    230812    497551  14900   1862
>      + direct-io   |     27475   1788025  46736   5842
> -------------------------------------------------------

Also surprised by direct-io.. that is definitely something tremendous.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 01/34] docs/devel/migration.rst: Document the file transport
  2024-02-20 22:41 ` [PATCH v4 01/34] docs/devel/migration.rst: Document the file transport Fabiano Rosas
@ 2024-02-23  3:01   ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-23  3:01 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:05PM -0300, Fabiano Rosas wrote:
> When adding the support for file migration with the file: transport,
> we missed adding documentation for it.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 02/34] tests/qtest/migration: Rename fd_proto test
  2024-02-20 22:41 ` [PATCH v4 02/34] tests/qtest/migration: Rename fd_proto test Fabiano Rosas
@ 2024-02-23  3:03   ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-23  3:03 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Thomas Huth,
	Laurent Vivier, Paolo Bonzini

On Tue, Feb 20, 2024 at 07:41:06PM -0300, Fabiano Rosas wrote:
> Next patch adds another fd test. Rename the existing one closer to
> what's used on other tests, with the 'precopy' prefix.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 03/34] tests/qtest/migration: Add a fd + file test
  2024-02-20 22:41 ` [PATCH v4 03/34] tests/qtest/migration: Add a fd + file test Fabiano Rosas
@ 2024-02-23  3:08   ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-23  3:08 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Thomas Huth,
	Laurent Vivier, Paolo Bonzini

On Tue, Feb 20, 2024 at 07:41:07PM -0300, Fabiano Rosas wrote:
> The fd URI supports an fd that is backed by a file. The code should
> select between QIOChannelFile and QIOChannelSocket, depending on the
> type of the fd. Add a test for that.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 04/34] migration/multifd: Remove p->quit from recv side
  2024-02-20 22:41 ` [PATCH v4 04/34] migration/multifd: Remove p->quit from recv side Fabiano Rosas
@ 2024-02-23  3:13   ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-23  3:13 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:08PM -0300, Fabiano Rosas wrote:
> Like we did on the sending side, replace the p->quit per-channel flag
> with a global atomic 'exiting' flag.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 05/34] migration/multifd: Release recv sem_sync earlier
  2024-02-20 22:41 ` [PATCH v4 05/34] migration/multifd: Release recv sem_sync earlier Fabiano Rosas
@ 2024-02-23  3:16   ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-23  3:16 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:09PM -0300, Fabiano Rosas wrote:
> Now that multifd_recv_terminate_threads() is called only once, release
> the recv side sem_sync earlier like we do for the send side.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram
  2024-02-23  2:59 ` [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Peter Xu
@ 2024-02-23 13:48   ` Claudio Fontana
  2024-02-23 14:22   ` Fabiano Rosas
  1 sibling, 0 replies; 79+ messages in thread
From: Claudio Fontana @ 2024-02-23 13:48 UTC (permalink / raw)
  To: Peter Xu, Fabiano Rosas; +Cc: qemu-devel, berrange, armbru

On 2/23/24 03:59, Peter Xu wrote:
> On Tue, Feb 20, 2024 at 07:41:04PM -0300, Fabiano Rosas wrote:
>> Latest numbers
>> ==============
>>
>> => guest: 128 GB RAM - 120 GB dirty - 1 vcpu in tight loop dirtying memory
>> => host: 128 CPU AMD EPYC 7543 - 2 NVMe disks in RAID0 (8586 MiB/s) - xfs
>> => pinned vcpus w/ NUMA shortest distances - average of 3 runs - results
>>    from query-migrate
>>
>> non-live           | time (ms)   pages/s   mb/s   MB/s
>> -------------------+-----------------------------------
>> file               |    110512    256258   9549   1193
>>   + bg-snapshot    |    245660    119581   4303    537
> 
> Is this the one using userfault?  I'm surprised it's much slower when
> enabled; logically a non-live snapshot should take similar loops to a
> normal migration since it should have zero faults, so performance should
> be similar.
> 
>> -------------------+-----------------------------------
>> fixed-ram          |    157975    216877   6672    834
>>   + multifd 8 ch.  |     95922    292178  10982   1372
>>      + direct-io   |     23268   1936897  45330   5666
>> -------------------------------------------------------
>>
>> live               | time (ms)   pages/s   mb/s   MB/s
>> -------------------+-----------------------------------
>> file               |         -         -      -      - (file grew 4x the VM size)
>>   + bg-snapshot    |    357635    141747   2974    371
>> -------------------+-----------------------------------
>> fixed-ram          |         -         -      -      - (no convergence in 5 min)
>>   + multifd 8 ch.  |    230812    497551  14900   1862
>>      + direct-io   |     27475   1788025  46736   5842
>> -------------------------------------------------------
> 
> Also surprised by direct-io.. that is definitely something tremendous.
> 

Awesome! Can't wait to have this available for our customers.

Ciao,

Claudio




^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram
  2024-02-23  2:59 ` [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Peter Xu
  2024-02-23 13:48   ` Claudio Fontana
@ 2024-02-23 14:22   ` Fabiano Rosas
  1 sibling, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-23 14:22 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

Peter Xu <peterx@redhat.com> writes:

> On Tue, Feb 20, 2024 at 07:41:04PM -0300, Fabiano Rosas wrote:
>> Latest numbers
>> ==============
>> 
>> => guest: 128 GB RAM - 120 GB dirty - 1 vcpu in tight loop dirtying memory
>> => host: 128 CPU AMD EPYC 7543 - 2 NVMe disks in RAID0 (8586 MiB/s) - xfs
>> => pinned vcpus w/ NUMA shortest distances - average of 3 runs - results
>>    from query-migrate
>> 
>> non-live           | time (ms)   pages/s   mb/s   MB/s
>> -------------------+-----------------------------------
>> file               |    110512    256258   9549   1193
>>   + bg-snapshot    |    245660    119581   4303    537
>
> Is this the one using userfault?  I'm surprised it's much slower when
> enabled; logically a non-live snapshot should take similar loops to a
> normal migration since it should have zero faults, so performance should
> be similar.

I just enabled the background-snapshot capability. Is there extra setup
that must be done to enable this properly? The ufd_version_check from
migration-test returns true on this system.

>> -------------------+-----------------------------------
>> fixed-ram          |    157975    216877   6672    834
>>   + multifd 8 ch.  |     95922    292178  10982   1372
>>      + direct-io   |     23268   1936897  45330   5666
>> -------------------------------------------------------
>> 
>> live               | time (ms)   pages/s   mb/s   MB/s
>> -------------------+-----------------------------------
>> file               |         -         -      -      - (file grew 4x the VM size)
>>   + bg-snapshot    |    357635    141747   2974    371
>> -------------------+-----------------------------------
>> fixed-ram          |         -         -      -      - (no convergence in 5 min)
>>   + multifd 8 ch.  |    230812    497551  14900   1862
>>      + direct-io   |     27475   1788025  46736   5842
>> -------------------------------------------------------
>
> Also surprised by direct-io.. that is definitely something tremendous.

Indeed. That was the intention with this series all along.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 11/34] migration/ram: Introduce 'fixed-ram' migration capability
  2024-02-20 22:41 ` [PATCH v4 11/34] migration/ram: Introduce 'fixed-ram' migration capability Fabiano Rosas
  2024-02-21  8:41   ` Markus Armbruster
@ 2024-02-26  3:07   ` Peter Xu
  2024-02-26  3:22   ` Peter Xu
  2 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-26  3:07 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Eric Blake

On Tue, Feb 20, 2024 at 07:41:15PM -0300, Fabiano Rosas wrote:
> Add a new migration capability 'fixed-ram'.
> 
> The core of the feature is to ensure that each RAM page has a specific
> offset in the resulting migration stream. The reasons why we'd want
> such behavior are:
> 
>  - The resulting file will have a bounded size, since pages which are
>    dirtied multiple times will always go to a fixed location in the
>    file, rather than constantly being added to a sequential
>    stream. This eliminates cases where a VM with, say, 1G of RAM can
>    result in a migration file that's 10s of GBs, provided that the
>    workload constantly redirties memory.
> 
>  - It paves the way to implement O_DIRECT-enabled save/restore of the
>    migration stream as the pages are ensured to be written at aligned
>    offsets.
> 
>  - It allows the usage of multifd so we can write RAM pages to the
>    migration file in parallel.
> 
> For now, enabling the capability has no effect. The next couple of
> patches implement the core functionality.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> - update migration.json to 9.0 and improve wording
> - move docs to a separate file and add use cases information
> ---
>  docs/devel/migration/features.rst  |   1 +
>  docs/devel/migration/fixed-ram.rst | 137 +++++++++++++++++++++++++++++
>  migration/options.c                |  34 +++++++
>  migration/options.h                |   1 +
>  migration/savevm.c                 |   1 +
>  qapi/migration.json                |   6 +-
>  6 files changed, 179 insertions(+), 1 deletion(-)
>  create mode 100644 docs/devel/migration/fixed-ram.rst
> 
> diff --git a/docs/devel/migration/features.rst b/docs/devel/migration/features.rst
> index a9acaf618e..4c708b679a 100644
> --- a/docs/devel/migration/features.rst
> +++ b/docs/devel/migration/features.rst
> @@ -10,3 +10,4 @@ Migration has plenty of features to support different use cases.
>     dirty-limit
>     vfio
>     virtio
> +   fixed-ram
> diff --git a/docs/devel/migration/fixed-ram.rst b/docs/devel/migration/fixed-ram.rst
> new file mode 100644
> index 0000000000..a6c0e5a360
> --- /dev/null
> +++ b/docs/devel/migration/fixed-ram.rst
> @@ -0,0 +1,137 @@
> +Fixed-ram
> +=========
> +
> +Fixed-ram is a new stream format for the RAM section designed to
> +supplement the existing ``file:`` migration and make it compatible
> +with ``multifd``. This enables parallel migration of a guest's RAM to
> +a file.
> +
> +The core of the feature is to ensure that each RAM page has a specific
> +offset in the resulting migration file. This enables the ``multifd``
> +threads to write exclusively to those offsets even if the guest is
> +constantly dirtying pages (i.e. live migration). Another benefit is
> +that the resulting file will have a bounded size, since pages which
> +are dirtied multiple times will always go to a fixed location in the
> +file, rather than constantly being added to a sequential
> +stream. Having the pages at fixed offsets also allows the usage of
> +O_DIRECT for save/restore of the migration stream as the pages are
> +ensured to be written respecting O_DIRECT alignment restrictions.
> +
> +Usage
> +-----
> +
> +On both source and destination, enable the ``multifd`` and
> +``fixed-ram`` capabilities:
> +
> +    ``migrate_set_capability multifd on``
> +
> +    ``migrate_set_capability fixed-ram on``
> +
> +Use a ``file:`` URL for migration:
> +
> +    ``migrate file:/path/to/migration/file``
> +
> +Fixed-ram migration is best done non-live, i.e. by stopping the VM on
> +the source side before migrating.
> +
> +For best performance enable the ``direct-io`` capability as well:
> +
> +    ``migrate_set_capability direct-io on``
> +
> +Use-cases
> +---------
> +
> +The fixed-ram feature was designed for use cases where the migration
> +stream will be directed to a file in the filesystem and not
> +immediately restored on the destination VM [#]_. These could be
> +thought of as snapshots. We can further categorize them into live and
> +non-live.
> +
> +- Non-live snapshot
> +
> +If the use case requires a VM to be stopped before taking a snapshot,
> +that's the ideal scenario for fixed-ram migration. Not having to track
> +dirty pages, the migration will write the RAM pages to the disk as
> +fast as it can.
> +
> +Note: if a snapshot is taken of a running VM, but the VM will be
> +stopped after the snapshot by the admin, then consider stopping it
> +right before the snapshot to take advantage of the performance gains
> +mentioned above.
> +
> +- Live snapshot
> +
> +If the use case requires that the VM keeps running during and after
> +the snapshot operation, then fixed-ram migration can still be used,
> +but will be less performant. Other strategies such as
> +background-snapshot should be evaluated as well. One benefit of
> +fixed-ram in this scenario is portability since background-snapshot
> +depends on async dirty tracking (KVM_GET_DIRTY_LOG) which is not

Background snapshot uses userfaultfd-wp rather than KVM_GET_DIRTY_LOG.  The
statement is still correct though, that userfault is only supported on
Linux in general (wp is one sub-feature, representing "write-protect mode"),
so this should help portability, as it removes the dependency on the OS.

> +supported outside of Linux.
> +
> +.. [#] While this same effect could be obtained with the usage of
> +       snapshots or the ``file:`` migration alone, fixed-ram provides
> +       a performance increase for VMs with larger RAM sizes (10s to
> +       100s of GiBs), especially if the VM has been stopped beforehand.
> +
> +RAM section format
> +------------------
> +
> +Instead of having a sequential stream of pages that follow the
> +RAMBlock headers, the dirty pages for a RAMBlock follow its header
> +instead. This ensures that each RAM page has a fixed offset in the
> +resulting migration file.
> +
> +A bitmap is introduced to track which pages have been written in the
> +migration file. Pages are written at a fixed location for every
> +ramblock. Zero pages are ignored as they'd be zero in the destination
> +migration as well.
> +
> +::
> +
> + Without fixed-ram:                  With fixed-ram:
> +
> + ---------------------               --------------------------------
> + | ramblock 1 header |               | ramblock 1 header            |
> + ---------------------               --------------------------------
> + | ramblock 2 header |               | ramblock 1 fixed-ram header  |
> + ---------------------               --------------------------------
> + | ...               |               | padding to next 1MB boundary |
> + ---------------------               | ...                          |
> + | ramblock n header |               --------------------------------
> + ---------------------               | ramblock 1 pages             |
> + | RAM_SAVE_FLAG_EOS |               | ...                          |
> + ---------------------               --------------------------------
> + | stream of pages   |               | ramblock 2 header            |
> + | (iter 1)          |               --------------------------------
> + | ...               |               | ramblock 2 fixed-ram header  |
> + ---------------------               --------------------------------
> + | RAM_SAVE_FLAG_EOS |               | padding to next 1MB boundary |
> + ---------------------               | ...                          |
> + | stream of pages   |               --------------------------------
> + | (iter 2)          |               | ramblock 2 pages             |
> + | ...               |               | ...                          |
> + ---------------------               --------------------------------
> + | ...               |               | ...                          |
> + ---------------------               --------------------------------
> +                                     | RAM_SAVE_FLAG_EOS            |
> +                                     --------------------------------
> +                                     | ...                          |
> +                                     --------------------------------
> +
> + where:
> +  - ramblock header: the generic information for a ramblock, such as
> +    idstr, used_len, etc.
> +
> +  - ramblock fixed-ram header: the information added by this feature:
> +    bitmap of pages written, bitmap size and offset of pages in the
> +    migration file.
> +
> +Restrictions
> +------------
> +
> +Since pages are written to their relative offsets and out of order
> +(due to the memory dirtying patterns), streaming channels such as
> +sockets are not supported. A seekable channel such as a file is
> +required. This can be verified in the QIOChannel by the presence of
> +the QIO_CHANNEL_FEATURE_SEEKABLE.
> diff --git a/migration/options.c b/migration/options.c
> index 3e3e0b93b4..4909e5c72a 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -204,6 +204,7 @@ Property migration_properties[] = {
>      DEFINE_PROP_MIG_CAP("x-switchover-ack",
>                          MIGRATION_CAPABILITY_SWITCHOVER_ACK),
>      DEFINE_PROP_MIG_CAP("x-dirty-limit", MIGRATION_CAPABILITY_DIRTY_LIMIT),
> +    DEFINE_PROP_MIG_CAP("x-fixed-ram", MIGRATION_CAPABILITY_FIXED_RAM),

Let's directly use "fixed-ram" (or "mapped-ram", or whatever new name we
decide to use), as long as it's without the "x-" prefix?

migration_properties is not documented anywhere, and is mostly just for
debugging purposes.  We could have dropped all the "x-"s, IMHO.

>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> @@ -263,6 +264,13 @@ bool migrate_events(void)
>      return s->capabilities[MIGRATION_CAPABILITY_EVENTS];
>  }
>  
> +bool migrate_fixed_ram(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +
> +    return s->capabilities[MIGRATION_CAPABILITY_FIXED_RAM];
> +}
> +
>  bool migrate_ignore_shared(void)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -645,6 +653,32 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
>          }
>      }
>  
> +    if (new_caps[MIGRATION_CAPABILITY_FIXED_RAM]) {
> +        if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
> +            error_setg(errp,
> +                       "Fixed-ram migration is incompatible with multifd");
> +            return false;
> +        }
> +
> +        if (new_caps[MIGRATION_CAPABILITY_XBZRLE]) {
> +            error_setg(errp,
> +                       "Fixed-ram migration is incompatible with xbzrle");
> +            return false;
> +        }
> +
> +        if (new_caps[MIGRATION_CAPABILITY_COMPRESS]) {
> +            error_setg(errp,
> +                       "Fixed-ram migration is incompatible with compression");
> +            return false;
> +        }
> +
> +        if (new_caps[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
> +            error_setg(errp,
> +                       "Fixed-ram migration is incompatible with postcopy ram");
> +            return false;
> +        }
> +    }
> +
>      return true;
>  }
>  
> diff --git a/migration/options.h b/migration/options.h
> index 246c160aee..8680a10b79 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -31,6 +31,7 @@ bool migrate_compress(void);
>  bool migrate_dirty_bitmaps(void);
>  bool migrate_dirty_limit(void);
>  bool migrate_events(void);
> +bool migrate_fixed_ram(void);
>  bool migrate_ignore_shared(void);
>  bool migrate_late_block_activate(void);
>  bool migrate_multifd(void);
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d612c8a902..4b928dd6bb 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -245,6 +245,7 @@ static bool should_validate_capability(int capability)
>      /* Validate only new capabilities to keep compatibility. */
>      switch (capability) {
>      case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
> +    case MIGRATION_CAPABILITY_FIXED_RAM:
>          return true;
>      default:
>          return false;
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 5a565d9b8d..3fce5fe53e 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -531,6 +531,10 @@
>  #     and can result in more stable read performance.  Requires KVM
>  #     with accelerator property "dirty-ring-size" set.  (Since 8.1)
>  #
> +# @fixed-ram: Migrate using fixed offsets in the migration file for
> +#     each RAM page.  Requires a migration URI that supports seeking,
> +#     such as a file.  (since 9.0)
> +#
>  # Features:
>  #
>  # @deprecated: Member @block is deprecated.  Use blockdev-mirror with
> @@ -555,7 +559,7 @@
>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>             'validate-uuid', 'background-snapshot',
>             'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
> -           'dirty-limit'] }
> +           'dirty-limit', 'fixed-ram'] }
>  
>  ##
>  # @MigrationCapabilityStatus:
> -- 
> 2.35.3
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 12/34] migration: Add fixed-ram URI compatibility check
  2024-02-20 22:41 ` [PATCH v4 12/34] migration: Add fixed-ram URI compatibility check Fabiano Rosas
@ 2024-02-26  3:11   ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-26  3:11 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:16PM -0300, Fabiano Rosas wrote:
> The fixed-ram migration format needs a channel that supports seeking
> to be able to write each page to an arbitrary offset in the migration
> stream.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 11/34] migration/ram: Introduce 'fixed-ram' migration capability
  2024-02-20 22:41 ` [PATCH v4 11/34] migration/ram: Introduce 'fixed-ram' migration capability Fabiano Rosas
  2024-02-21  8:41   ` Markus Armbruster
  2024-02-26  3:07   ` Peter Xu
@ 2024-02-26  3:22   ` Peter Xu
  2 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-26  3:22 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Eric Blake

On Tue, Feb 20, 2024 at 07:41:15PM -0300, Fabiano Rosas wrote:
> + Without fixed-ram:                  With fixed-ram:
> +
> + ---------------------               --------------------------------
> + | ramblock 1 header |               | ramblock 1 header            |
> + ---------------------               --------------------------------
> + | ramblock 2 header |               | ramblock 1 fixed-ram header  |
> + ---------------------               --------------------------------
> + | ...               |               | padding to next 1MB boundary |
> + ---------------------               | ...                          |
> + | ramblock n header |               --------------------------------
> + ---------------------               | ramblock 1 pages             |
> + | RAM_SAVE_FLAG_EOS |               | ...                          |
> + ---------------------               --------------------------------
> + | stream of pages   |               | ramblock 2 header            |
> + | (iter 1)          |               --------------------------------
> + | ...               |               | ramblock 2 fixed-ram header  |
> + ---------------------               --------------------------------
> + | RAM_SAVE_FLAG_EOS |               | padding to next 1MB boundary |
> + ---------------------               | ...                          |
> + | stream of pages   |               --------------------------------
> + | (iter 2)          |               | ramblock 2 pages             |
> + | ...               |               | ...                          |
> + ---------------------               --------------------------------
> + | ...               |               | ...                          |
> + ---------------------               --------------------------------
> +                                     | RAM_SAVE_FLAG_EOS            |
> +                                     --------------------------------
> +                                     | ...                          |
> +                                     --------------------------------
> +
> + where:

Super-nit: you can drop the " " otherwise it's put into the quote.

> +  - ramblock header: the generic information for a ramblock, such as
> +    idstr, used_len, etc.
> +
> +  - ramblock fixed-ram header: the information added by this feature:
> +    bitmap of pages written, bitmap size and offset of pages in the
> +    migration file.
> +
> +Restrictions
> +------------
> +
> +Since pages are written to their relative offsets and out of order
> +(due to the memory dirtying patterns), streaming channels such as
> +sockets are not supported. A seekable channel such as a file is
> +required. This can be verified in the QIOChannel by the presence of
> +the QIO_CHANNEL_FEATURE_SEEKABLE.

Would it be worth also mentioning that it only provides fixed offsets for
"guest physical RAM"?  For example, GPU RAM won't apply as it is migrated
as part of device states, even if also iterable.  IOW, IIUC if there's a
VFIO device (or a few), the fixed-ram migration file will still be unbounded
in size, because the device can keep flushing stale vRAM to the image..

Maybe that's too specific, I'll leave that to you to decide whether to even
mention it.

-- 
Peter Xu
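
As a side note to the bounded-size property the quoted doc describes for
guest RAM: a toy, standalone illustration of why rewriting dirty pages at
fixed offsets keeps the file bounded (hypothetical demo program, not part of
the series; the file name, page size and offsets are made up):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define PAGE_SIZE 4096

    /*
     * Toy model of the fixed-ram layout: page N always lands at
     * pages_offset + N * PAGE_SIZE, so a re-dirtied page overwrites its
     * slot instead of growing a sequential stream.
     */
    int main(void)
    {
        int fd = open("ram.img", O_CREAT | O_RDWR | O_TRUNC, 0600);
        off_t pages_offset = 1 << 20;   /* mimic the 1MB-aligned pages area */
        char page[PAGE_SIZE];

        for (int iter = 0; iter < 3; iter++) {    /* re-dirty the same pages */
            for (uint64_t n = 0; n < 4; n++) {
                memset(page, 'A' + iter, sizeof(page));
                pwrite(fd, page, sizeof(page), pages_offset + n * PAGE_SIZE);
            }
        }

        /* Size stays at pages_offset + 4 * PAGE_SIZE, however often we loop. */
        printf("file size: %lld\n", (long long)lseek(fd, 0, SEEK_END));
        close(fd);
        return 0;
    }
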



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 13/34] migration/ram: Add outgoing 'fixed-ram' migration
  2024-02-20 22:41 ` [PATCH v4 13/34] migration/ram: Add outgoing 'fixed-ram' migration Fabiano Rosas
@ 2024-02-26  4:03   ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-26  4:03 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Nikolay Borisov,
	Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé

On Tue, Feb 20, 2024 at 07:41:17PM -0300, Fabiano Rosas wrote:
> Implement the outgoing migration side for the 'fixed-ram' capability.
> 
> A bitmap is introduced to track which pages have been written in the
> migration file. Pages are written at a fixed location for every
> ramblock. Zero pages are ignored as they'd be zero in the destination
> migration as well.
> 
> The migration stream is altered to put the dirty pages for a ramblock
> after its header instead of having a sequential stream of pages that
> follow the ramblock headers.
> 
> Without fixed-ram (current):        With fixed-ram (new):
> 
>  ---------------------               --------------------------------
>  | ramblock 1 header |               | ramblock 1 header            |
>  ---------------------               --------------------------------
>  | ramblock 2 header |               | ramblock 1 fixed-ram header  |
>  ---------------------               --------------------------------
>  | ...               |               | padding to next 1MB boundary |
>  ---------------------               | ...                          |
>  | ramblock n header |               --------------------------------
>  ---------------------               | ramblock 1 pages             |
>  | RAM_SAVE_FLAG_EOS |               | ...                          |
>  ---------------------               --------------------------------
>  | stream of pages   |               | ramblock 2 header            |
>  | (iter 1)          |               --------------------------------
>  | ...               |               | ramblock 2 fixed-ram header  |
>  ---------------------               --------------------------------
>  | RAM_SAVE_FLAG_EOS |               | padding to next 1MB boundary |
>  ---------------------               | ...                          |
>  | stream of pages   |               --------------------------------
>  | (iter 2)          |               | ramblock 2 pages             |
>  | ...               |               | ...                          |
>  ---------------------               --------------------------------
>  | ...               |               | ...                          |
>  ---------------------               --------------------------------
>                                      | RAM_SAVE_FLAG_EOS            |
>                                      --------------------------------
>                                      | ...                          |
>                                      --------------------------------
> 
> where:
>  - ramblock header: the generic information for a ramblock, such as
>    idstr, used_len, etc.
> 
>  - ramblock fixed-ram header: the new information added by this
>    feature: bitmap of pages written, bitmap size and offset of pages
>    in the migration file.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

Still one comment below:

[...]

> @@ -3187,6 +3288,18 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>          return ret;
>      }
>  
> +    if (migrate_fixed_ram()) {
> +        ram_save_file_bmap(f);
> +
> +        if (qemu_file_get_error(f)) {
> +            Error *local_err = NULL;
> +            int err = qemu_file_get_error_obj(f, &local_err);
> +
> +            error_reportf_err(local_err, "Failed to write bitmap to file: ");

We always report the error when we set s->error.

Ideally I think we should have Error** passed to the caller and set
s->error there, instead of reporting here.  But the whole error handling is
still a bit of a mess, so I guess we can do anything on top.

> +            return -err;
> +        }
> +    }

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 14/34] migration/ram: Add incoming 'fixed-ram' migration
  2024-02-20 22:41 ` [PATCH v4 14/34] migration/ram: Add incoming " Fabiano Rosas
@ 2024-02-26  5:19   ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-26  5:19 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Nikolay Borisov

On Tue, Feb 20, 2024 at 07:41:18PM -0300, Fabiano Rosas wrote:
> Add the necessary code to parse the format changes for the 'fixed-ram'
> capability.
> 
> One of the more notable changes in behavior is that in the 'fixed-ram'
> case ram pages are restored in one go rather than constantly looping
> through the migration stream.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

Two more nitpicks below.

> ---
> - added error propagation for read_ramblock_fixed_ram()
> - removed buf_size variable
> ---
>  migration/ram.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 142 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 84c531722c..5932e1b8e1 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -106,6 +106,12 @@
>   */
>  #define FIXED_RAM_FILE_OFFSET_ALIGNMENT 0x100000
>  
> +/*
> + * When doing fixed-ram migration, this is the amount we read from the
> + * pages region in the migration file at a time.
> + */
> +#define FIXED_RAM_LOAD_BUF_SIZE 0x100000
> +
>  XBZRLECacheStats xbzrle_counters;
>  
>  /* used by the search for pages to send */
> @@ -2999,6 +3005,35 @@ static void fixed_ram_setup_ramblock(QEMUFile *file, RAMBlock *block)
>      qemu_set_offset(file, block->pages_offset + block->used_length, SEEK_SET);
>  }
>  
> +static bool fixed_ram_read_header(QEMUFile *file, FixedRamHeader *header,
> +                                  Error **errp)
> +{
> +    size_t ret, header_size = sizeof(FixedRamHeader);
> +
> +    ret = qemu_get_buffer(file, (uint8_t *)header, header_size);
> +    if (ret != header_size) {
> +        error_setg(errp, "Could not read whole fixed-ram migration header "
> +                   "(expected %zd, got %zd bytes)", header_size, ret);
> +        return false;
> +    }
> +
> +    /* migration stream is big-endian */
> +    header->version = be32_to_cpu(header->version);
> +
> +    if (header->version > FIXED_RAM_HDR_VERSION) {
> +        error_setg(errp, "Migration fixed-ram capability version mismatch "
> +                   "(expected %d, got %d)", FIXED_RAM_HDR_VERSION,
> +                   header->version);

Instead of "mismatch", perhaps "not supported"?

It doesn't need to strictly match the macro defined, e.g., if we boost it
some day we could support more than one version.  However, a version larger
than the macro means it came from a newer QEMU, hence "not supported" seems
more appropriate.

> +        return false;
> +    }
> +
> +    header->page_size = be64_to_cpu(header->page_size);
> +    header->bitmap_offset = be64_to_cpu(header->bitmap_offset);
> +    header->pages_offset = be64_to_cpu(header->pages_offset);
> +
> +    return true;
> +}
> +
>  /*
>   * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
>   * long-running RCU critical section.  When rcu-reclaims in the code
> @@ -3900,6 +3935,102 @@ void colo_flush_ram_cache(void)
>      trace_colo_flush_ram_cache_end();
>  }
>  
> +static bool read_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
> +                                    long num_pages, unsigned long *bitmap,
> +                                    Error **errp)
> +{
> +    ERRP_GUARD();
> +    unsigned long set_bit_idx, clear_bit_idx;
> +    ram_addr_t offset;
> +    void *host;
> +    size_t read, unread, size;
> +
> +    for (set_bit_idx = find_first_bit(bitmap, num_pages);
> +         set_bit_idx < num_pages;
> +         set_bit_idx = find_next_bit(bitmap, num_pages, clear_bit_idx + 1)) {
> +
> +        clear_bit_idx = find_next_zero_bit(bitmap, num_pages, set_bit_idx + 1);
> +
> +        unread = TARGET_PAGE_SIZE * (clear_bit_idx - set_bit_idx);
> +        offset = set_bit_idx << TARGET_PAGE_BITS;
> +
> +        while (unread > 0) {
> +            host = host_from_ram_block_offset(block, offset);
> +            if (!host) {
> +                error_setg(errp, "page outside of ramblock %s range",
> +                           block->idstr);
> +                return false;
> +            }
> +
> +            size = MIN(unread, FIXED_RAM_LOAD_BUF_SIZE);
> +
> +            read = qemu_get_buffer_at(f, host, size,
> +                                      block->pages_offset + offset);
> +            if (!read) {
> +                goto err;
> +            }
> +            offset += read;
> +            unread -= read;
> +        }
> +    }
> +
> +    return true;
> +
> +err:
> +    qemu_file_get_error_obj(f, errp);
> +    error_prepend(errp, "(%s) failed to read page " RAM_ADDR_FMT
> +                  "from file offset %" PRIx64 ": ", block->idstr, offset,
> +                  block->pages_offset + offset);
> +    return false;
> +}
> +
> +static void parse_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
> +                                     ram_addr_t length, Error **errp)
> +{
> +    g_autofree unsigned long *bitmap = NULL;
> +    FixedRamHeader header;
> +    size_t bitmap_size;
> +    long num_pages;
> +
> +    if (!fixed_ram_read_header(f, &header, errp)) {
> +        return;
> +    }
> +
> +    block->pages_offset = header.pages_offset;
> +
> +    /*
> +     * Check the alignment of the file region that contains pages. We
> +     * don't enforce FIXED_RAM_FILE_OFFSET_ALIGNMENT to allow that
> +     * value to change in the future. Do only a sanity check with page
> +     * size alignment.
> +     */
> +    if (!QEMU_IS_ALIGNED(block->pages_offset, TARGET_PAGE_SIZE)) {
> +        error_setg(errp,
> +                   "Error reading ramblock %s pages, region has bad alignment",
> +                   block->idstr);
> +        return;
> +    }
> +
> +    num_pages = length / header.page_size;
> +    bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
> +
> +    bitmap = g_malloc0(bitmap_size);
> +    if (qemu_get_buffer_at(f, (uint8_t *)bitmap, bitmap_size,
> +                           header.bitmap_offset) != bitmap_size) {
> +        error_setg(errp, "Error reading dirty bitmap");
> +        return;
> +    }
> +
> +    if (!read_ramblock_fixed_ram(f, block, num_pages, bitmap, errp)) {
> +        return;
> +    }
> +
> +    /* Skip pages array */
> +    qemu_set_offset(f, block->pages_offset + length, SEEK_SET);
> +
> +    return;
> +}
> +
>  static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
>  {
>      int ret = 0;
> @@ -3908,6 +4039,17 @@ static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
>  
>      assert(block);
>  
> +    if (migrate_fixed_ram()) {
> +        Error *local_err = NULL;

We could move this to top, merge with the other local_err used by
qemu_ram_resize().

> +
> +        parse_ramblock_fixed_ram(f, block, length, &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return -EINVAL;
> +        }
> +        return 0;
> +    }
> +
>      if (!qemu_ram_is_migratable(block)) {
>          error_report("block %s should not be migrated !", block->idstr);
>          return -EINVAL;
> -- 
> 2.35.3
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 18/34] migration/multifd: Allow multifd without packets
  2024-02-20 22:41 ` [PATCH v4 18/34] migration/multifd: Allow multifd without packets Fabiano Rosas
@ 2024-02-26  5:57   ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-26  5:57 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:22PM -0300, Fabiano Rosas wrote:
> For the upcoming support to the new 'fixed-ram' migration stream
> format, we cannot use multifd packets because each write into the
> ramblock section in the migration file is expected to contain only the
> guest pages. They are written at their respective offsets relative to
> the ramblock section header.
> 
> There is no space for the packet information and the expected gains
> from the new approach come partly from being able to write the pages
> sequentially without extraneous data in between.
> 
> The new format also simply doesn't need the packets and all necessary
> information can be taken from the standard migration headers with some
> (future) changes to multifd code.
> 
> Use the presence of the fixed-ram capability to decide whether to send
> packets.
> 
> This only moves code under multifd_use_packets(), it has no effect for
> now as fixed-ram cannot yet be enabled with multifd.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Mostly good to me, but since we'll probably need at least one more round, I
left some more comments.

> ---
>  migration/multifd.c | 188 +++++++++++++++++++++++++++-----------------
>  1 file changed, 117 insertions(+), 71 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 5a38cb222f..0a5279314d 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -92,6 +92,11 @@ struct {
>      MultiFDMethods *ops;
>  } *multifd_recv_state;
>  
> +static bool multifd_use_packets(void)
> +{
> +    return !migrate_fixed_ram();
> +}
> +
>  /* Multifd without compression */
>  
>  /**
> @@ -136,10 +141,11 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
>  static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
>  {
>      bool use_zero_copy_send = migrate_zero_copy_send();
> +    bool use_packets = multifd_use_packets();
>      MultiFDPages_t *pages = p->pages;
>      int ret;
>  
> -    if (!use_zero_copy_send) {
> +    if (!use_zero_copy_send && use_packets) {
>          /*
>           * Only !zerocopy needs the header in IOV; zerocopy will
>           * send it separately.
> @@ -156,14 +162,16 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
>      p->next_packet_size = pages->num * p->page_size;
>      p->flags |= MULTIFD_FLAG_NOCOMP;

These two shouldn't be needed by fixed-ram, either?

IIUC only the IOV prepare and future zero page detection may be needed for
fixed-ram in nocomp_send_prepare(). Perhaps something like this would be
clearer?

static void nocomp_send_prepare_iovs(MultiFDSendParams *p)
{
    MultiFDPages_t *pages = p->pages;
    int i;

    for (i = 0; i < pages->num; i++) {
        p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
        p->iov[p->iovs_num].iov_len = p->page_size;
        p->iovs_num++;
    }
}

static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
{
    bool use_zero_copy_send = migrate_zero_copy_send();
    MultiFDPages_t *pages = p->pages;
    int ret;

    if (!multifd_use_packets()) {
        nocomp_send_prepare_iovs(p);
        return true;
    }

    if (!use_zero_copy_send) {
        /*
         * Only !zerocopy needs the header in IOV; zerocopy will
         * send it separately.
         */
        multifd_send_prepare_header(p);
    }

    nocomp_send_prepare_iovs(p);
    ...
}

Then in the future we can also put zero page detection logic into this new
nocomp_send_prepare_iovs(), iiuc.

>  
> -    multifd_send_fill_packet(p);
> +    if (use_packets) {
> +        multifd_send_fill_packet(p);
>  
> -    if (use_zero_copy_send) {
> -        /* Send header first, without zerocopy */
> -        ret = qio_channel_write_all(p->c, (void *)p->packet,
> -                                    p->packet_len, errp);
> -        if (ret != 0) {
> -            return -1;
> +        if (use_zero_copy_send) {
> +            /* Send header first, without zerocopy */
> +            ret = qio_channel_write_all(p->c, (void *)p->packet,
> +                                        p->packet_len, errp);
> +            if (ret != 0) {
> +                return -1;
> +            }
>          }
>      }
>  
> @@ -215,11 +223,16 @@ static int nocomp_recv(MultiFDRecvParams *p, Error **errp)
>                     p->id, flags, MULTIFD_FLAG_NOCOMP);
>          return -1;
>      }
> -    for (int i = 0; i < p->normal_num; i++) {
> -        p->iov[i].iov_base = p->host + p->normal[i];
> -        p->iov[i].iov_len = p->page_size;
> +
> +    if (multifd_use_packets()) {
> +        for (int i = 0; i < p->normal_num; i++) {
> +            p->iov[i].iov_base = p->host + p->normal[i];
> +            p->iov[i].iov_len = p->page_size;
> +        }
> +        return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
>      }
> -    return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
> +
> +    return 0;
>  }
>  
>  static MultiFDMethods multifd_nocomp_ops = {
> @@ -799,15 +812,18 @@ static void *multifd_send_thread(void *opaque)
>      MigrationThread *thread = NULL;
>      Error *local_err = NULL;
>      int ret = 0;
> +    bool use_packets = multifd_use_packets();
>  
>      thread = migration_threads_add(p->name, qemu_get_thread_id());
>  
>      trace_multifd_send_thread_start(p->id);
>      rcu_register_thread();
>  
> -    if (multifd_send_initial_packet(p, &local_err) < 0) {
> -        ret = -1;
> -        goto out;
> +    if (use_packets) {
> +        if (multifd_send_initial_packet(p, &local_err) < 0) {
> +            ret = -1;
> +            goto out;
> +        }
>      }
>  
>      while (true) {
> @@ -858,16 +874,20 @@ static void *multifd_send_thread(void *opaque)
>               * it doesn't require explicit memory barriers.
>               */
>              assert(qatomic_read(&p->pending_sync));
> -            p->flags = MULTIFD_FLAG_SYNC;
> -            multifd_send_fill_packet(p);
> -            ret = qio_channel_write_all(p->c, (void *)p->packet,
> -                                        p->packet_len, &local_err);
> -            if (ret != 0) {
> -                break;
> +
> +            if (use_packets) {
> +                p->flags = MULTIFD_FLAG_SYNC;
> +                multifd_send_fill_packet(p);
> +                ret = qio_channel_write_all(p->c, (void *)p->packet,
> +                                            p->packet_len, &local_err);
> +                if (ret != 0) {
> +                    break;
> +                }
> +                /* p->next_packet_size will always be zero for a SYNC packet */
> +                stat64_add(&mig_stats.multifd_bytes, p->packet_len);
> +                p->flags = 0;
>              }
> -            /* p->next_packet_size will always be zero for a SYNC packet */
> -            stat64_add(&mig_stats.multifd_bytes, p->packet_len);
> -            p->flags = 0;
> +
>              qatomic_set(&p->pending_sync, false);
>              qemu_sem_post(&p->sem_sync);
>          }
> @@ -1016,6 +1036,7 @@ bool multifd_send_setup(void)
>      Error *local_err = NULL;
>      int thread_count, ret = 0;
>      uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
> +    bool use_packets = multifd_use_packets();
>      uint8_t i;
>  
>      if (!migrate_multifd()) {
> @@ -1038,27 +1059,35 @@ bool multifd_send_setup(void)
>          qemu_sem_init(&p->sem_sync, 0);
>          p->id = i;
>          p->pages = multifd_pages_init(page_count);
> -        p->packet_len = sizeof(MultiFDPacket_t)
> -                      + sizeof(uint64_t) * page_count;
> -        p->packet = g_malloc0(p->packet_len);
> -        p->packet->magic = cpu_to_be32(MULTIFD_MAGIC);
> -        p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> +
> +        if (use_packets) {
> +            p->packet_len = sizeof(MultiFDPacket_t)
> +                          + sizeof(uint64_t) * page_count;
> +            p->packet = g_malloc0(p->packet_len);
> +            p->packet->magic = cpu_to_be32(MULTIFD_MAGIC);
> +            p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> +
> +            /* We need one extra place for the packet header */
> +            p->iov = g_new0(struct iovec, page_count + 1);
> +        } else {
> +            p->iov = g_new0(struct iovec, page_count);
> +        }
>          p->name = g_strdup_printf("multifdsend_%d", i);
> -        /* We need one extra place for the packet header */
> -        p->iov = g_new0(struct iovec, page_count + 1);
>          p->page_size = qemu_target_page_size();
>          p->page_count = page_count;
>          p->write_flags = 0;
>          multifd_new_send_channel_create(p);
>      }
>  
> -    /*
> -     * Wait until channel creation has started for all channels. The
> -     * creation can still fail, but no more channels will be created
> -     * past this point.
> -     */
> -    for (i = 0; i < thread_count; i++) {
> -        qemu_sem_wait(&multifd_send_state->channels_created);
> +    if (use_packets) {
> +        /*
> +         * Wait until channel creation has started for all channels. The
> +         * creation can still fail, but no more channels will be created
> +         * past this point.
> +         */
> +        for (i = 0; i < thread_count; i++) {
> +            qemu_sem_wait(&multifd_send_state->channels_created);
> +        }
>      }

If so, we may need documentation for channels_created explaining that it's
only used in "packet-typed" multifd migrations.  And it's not obvious when
reading this chunk why the thread management should be relevant to "packet"
mode at all.

Instead of doing so, IMHO it's much cleaner to leave it be and post
channels_created in your new file_send_channel_create() instead - even if we
know it's synchronous, it keeps the channels_created semantics simple.
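
Something like the below is what I mean (just a sketch; multifd_send_state
is static to multifd.c, so a tiny helper - multifd_send_channel_created()
below is hypothetical - would be needed to post the sem from file.c):

    /* multifd.c */
    void multifd_send_channel_created(void)
    {
        qemu_sem_post(&multifd_send_state->channels_created);
    }

    /* file.c */
    bool file_send_channel_create(gpointer opaque, Error **errp)
    {
        QIOChannelFile *ioc;
        bool ret = false;

        ioc = qio_channel_file_new_path(outgoing_args.fname, O_WRONLY, 0, errp);
        if (ioc && multifd_channel_connect(opaque, QIO_CHANNEL(ioc), errp)) {
            ret = true;
        }

        /*
         * Creation is synchronous for file channels, but posting here keeps
         * the channels_created accounting identical to the socket path.
         */
        multifd_send_channel_created();
        return ret;
    }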

>  
>      for (i = 0; i < thread_count; i++) {
> @@ -1108,7 +1137,9 @@ static void multifd_recv_terminate_threads(Error *err)
>           * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
>           * however try to wakeup it without harm in cleanup phase.
>           */
> -        qemu_sem_post(&p->sem_sync);
> +        if (multifd_use_packets()) {
> +            qemu_sem_post(&p->sem_sync);
> +        }
>  
>          /*
>           * We could arrive here for two reasons:
> @@ -1182,7 +1213,7 @@ void multifd_recv_sync_main(void)
>  {
>      int i;
>  
> -    if (!migrate_multifd()) {
> +    if (!migrate_multifd() || !multifd_use_packets()) {
>          return;
>      }
>      for (i = 0; i < migrate_multifd_channels(); i++) {
> @@ -1209,13 +1240,14 @@ static void *multifd_recv_thread(void *opaque)
>  {
>      MultiFDRecvParams *p = opaque;
>      Error *local_err = NULL;
> +    bool use_packets = multifd_use_packets();
>      int ret;
>  
>      trace_multifd_recv_thread_start(p->id);
>      rcu_register_thread();
>  
>      while (true) {
> -        uint32_t flags;
> +        uint32_t flags = 0;
>          bool has_data = false;
>          p->normal_num = 0;
>  
> @@ -1223,25 +1255,27 @@ static void *multifd_recv_thread(void *opaque)
>              break;
>          }
>  
> -        ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
> -                                       p->packet_len, &local_err);
> -        if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
> -            break;
> -        }
> +        if (use_packets) {
> +            ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
> +                                           p->packet_len, &local_err);
> +            if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
> +                break;
> +            }
>  
> -        qemu_mutex_lock(&p->mutex);
> -        ret = multifd_recv_unfill_packet(p, &local_err);
> -        if (ret) {
> +            qemu_mutex_lock(&p->mutex);
> +            ret = multifd_recv_unfill_packet(p, &local_err);
> +            if (ret) {
> +                qemu_mutex_unlock(&p->mutex);
> +                break;
> +            }
> +
> +            flags = p->flags;
> +            /* recv methods don't know how to handle the SYNC flag */
> +            p->flags &= ~MULTIFD_FLAG_SYNC;
> +            has_data = !!p->normal_num;
>              qemu_mutex_unlock(&p->mutex);
> -            break;
>          }
>  
> -        flags = p->flags;
> -        /* recv methods don't know how to handle the SYNC flag */
> -        p->flags &= ~MULTIFD_FLAG_SYNC;
> -        has_data = !!p->normal_num;
> -        qemu_mutex_unlock(&p->mutex);
> -
>          if (has_data) {
>              ret = multifd_recv_state->ops->recv(p, &local_err);
>              if (ret != 0) {
> @@ -1249,9 +1283,11 @@ static void *multifd_recv_thread(void *opaque)
>              }
>          }
>  
> -        if (flags & MULTIFD_FLAG_SYNC) {
> -            qemu_sem_post(&multifd_recv_state->sem_sync);
> -            qemu_sem_wait(&p->sem_sync);
> +        if (use_packets) {
> +            if (flags & MULTIFD_FLAG_SYNC) {
> +                qemu_sem_post(&multifd_recv_state->sem_sync);
> +                qemu_sem_wait(&p->sem_sync);
> +            }

Some comment explaining why this is only used in packet mode would be nice.

>          }
>      }
>  
> @@ -1270,6 +1306,7 @@ int multifd_recv_setup(Error **errp)
>  {
>      int thread_count;
>      uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
> +    bool use_packets = multifd_use_packets();
>      uint8_t i;
>  
>      /*
> @@ -1294,9 +1331,12 @@ int multifd_recv_setup(Error **errp)
>          qemu_mutex_init(&p->mutex);
>          qemu_sem_init(&p->sem_sync, 0);
>          p->id = i;
> -        p->packet_len = sizeof(MultiFDPacket_t)
> -                      + sizeof(uint64_t) * page_count;
> -        p->packet = g_malloc0(p->packet_len);
> +
> +        if (use_packets) {
> +            p->packet_len = sizeof(MultiFDPacket_t)
> +                + sizeof(uint64_t) * page_count;
> +            p->packet = g_malloc0(p->packet_len);
> +        }
>          p->name = g_strdup_printf("multifdrecv_%d", i);
>          p->iov = g_new0(struct iovec, page_count);
>          p->normal = g_new0(ram_addr_t, page_count);
> @@ -1340,18 +1380,24 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>  {
>      MultiFDRecvParams *p;
>      Error *local_err = NULL;
> +    bool use_packets = multifd_use_packets();
>      int id;
>  
> -    id = multifd_recv_initial_packet(ioc, &local_err);
> -    if (id < 0) {
> -        multifd_recv_terminate_threads(local_err);
> -        error_propagate_prepend(errp, local_err,
> -                                "failed to receive packet"
> -                                " via multifd channel %d: ",
> -                                qatomic_read(&multifd_recv_state->count));
> -        return;
> +    if (use_packets) {
> +        id = multifd_recv_initial_packet(ioc, &local_err);
> +        if (id < 0) {
> +            multifd_recv_terminate_threads(local_err);
> +            error_propagate_prepend(errp, local_err,
> +                                    "failed to receive packet"
> +                                    " via multifd channel %d: ",
> +                                    qatomic_read(&multifd_recv_state->count));
> +            return;
> +        }
> +        trace_multifd_recv_new_channel(id);
> +    } else {
> +        /* next patch gives this a meaningful value */
> +        id = 0;
>      }
> -    trace_multifd_recv_new_channel(id);
>  
>      p = &multifd_recv_state->params[id];
>      if (p->c != NULL) {
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram
  2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
                   ` (34 preceding siblings ...)
  2024-02-23  2:59 ` [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Peter Xu
@ 2024-02-26  6:15 ` Peter Xu
  35 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-26  6:15 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:04PM -0300, Fabiano Rosas wrote:
> 0) Cleanups                           [1-5]

While I am still reading the rest.. I queued these five first.

-- 
Peter Xu




* Re: [PATCH v4 19/34] migration/multifd: Allow receiving pages without packets
  2024-02-20 22:41 ` [PATCH v4 19/34] migration/multifd: Allow receiving pages " Fabiano Rosas
@ 2024-02-26  6:58   ` Peter Xu
  2024-02-26 19:19     ` Fabiano Rosas
  0 siblings, 1 reply; 79+ messages in thread
From: Peter Xu @ 2024-02-26  6:58 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:23PM -0300, Fabiano Rosas wrote:
> Currently multifd does not need to have knowledge of pages on the
> receiving side because all the information needed is within the
> packets that come in the stream.
> 
> We're about to add support to fixed-ram migration, which cannot use
> packets because it expects the ramblock section in the migration file
> to contain only the guest pages data.
> 
> Add a data structure to transfer pages between the ram migration code
> and the multifd receiving threads.
> 
> We don't want to reuse MultiFDPages_t for two reasons:
> 
> a) multifd threads don't really need to know about the data they're
>    receiving.
> 
> b) the receiving side has to be stopped to load the pages, which means
>    we can experiment with larger granularities than page size when
>    transferring data.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> @Peter: a 'quit' flag cannot be used instead of pending_job. The
> receiving thread needs know there's no more data coming. If the
> migration thread sets a 'quit' flag, the multifd thread would see the
> flag right away and exit.

Hmm.. isn't this exactly what we want?  I'll comment on this inline below.

> The only way is to clear pending_job on the
> thread and spin once more.
> ---
>  migration/file.c    |   1 +
>  migration/multifd.c | 122 +++++++++++++++++++++++++++++++++++++++++---
>  migration/multifd.h |  15 ++++++
>  3 files changed, 131 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/file.c b/migration/file.c
> index 5d4975f43e..22d052a71f 100644
> --- a/migration/file.c
> +++ b/migration/file.c
> @@ -6,6 +6,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "exec/ramblock.h"
>  #include "qemu/cutils.h"
>  #include "qapi/error.h"
>  #include "channel.h"
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 0a5279314d..45a0c7aaa8 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -81,9 +81,15 @@ struct {
>  
>  struct {
>      MultiFDRecvParams *params;
> +    MultiFDRecvData *data;
>      /* number of created threads */
>      int count;
> -    /* syncs main thread and channels */
> +    /*
> +     * For sockets: this is posted once for each MULTIFD_FLAG_SYNC flag.
> +     *
> +     * For files: this is only posted at the end of the file load to mark
> +     *            completion of the load process.
> +     */
>      QemuSemaphore sem_sync;
>      /* global number of generated multifd packets */
>      uint64_t packet_num;
> @@ -1110,6 +1116,53 @@ bool multifd_send_setup(void)
>      return true;
>  }
>  
> +bool multifd_recv(void)
> +{
> +    int i;
> +    static int next_recv_channel;
> +    MultiFDRecvParams *p = NULL;
> +    MultiFDRecvData *data = multifd_recv_state->data;

[1]

> +
> +    /*
> +     * next_channel can remain from a previous migration that was
> +     * using more channels, so ensure it doesn't overflow if the
> +     * limit is lower now.
> +     */
> +    next_recv_channel %= migrate_multifd_channels();
> +    for (i = next_recv_channel;; i = (i + 1) % migrate_multifd_channels()) {
> +        if (multifd_recv_should_exit()) {
> +            return false;
> +        }
> +
> +        p = &multifd_recv_state->params[i];
> +
> +        /*
> +         * Safe to read atomically without a lock because the flag is
> +         * only set by this function below. Reading an old value of
> +         * true is not an issue because it would only send us looking
> +         * for the next idle channel.
> +         */
> +        if (qatomic_read(&p->pending_job) == false) {
> +            next_recv_channel = (i + 1) % migrate_multifd_channels();
> +            break;
> +        }
> +    }

IIUC you'll need an smp_mb_acquire() here.  The ordering between reading
pending_job and the accesses below must be guaranteed, similar to the
sender side.

> +
> +    assert(!p->data->size);
> +    multifd_recv_state->data = p->data;

[2]

> +    p->data = data;
> +
> +    qatomic_set(&p->pending_job, true);

Then here:

       qatomic_store_release(&p->pending_job, true);

Please consider adding a comment above all acquire/release pairs, like on
the sender side.
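
Something like this, I mean (only a sketch, mirroring the sender side):

    /*
     * Order the read of pending_job == false above before touching
     * p->data below.  Pairs with the store-release in
     * multifd_recv_thread() that clears pending_job after it has reset
     * p->data->size.
     */
    smp_mb_acquire();

    assert(!p->data->size);
    multifd_recv_state->data = p->data;
    p->data = data;

    /*
     * Make the new p->data visible before the channel can observe
     * pending_job == true.
     */
    qatomic_store_release(&p->pending_job, true);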

> +    qemu_sem_post(&p->sem);
> +
> +    return true;
> +}
> +
> +MultiFDRecvData *multifd_get_recv_data(void)
> +{
> +    return multifd_recv_state->data;
> +}

Can also use it above [1].

I'm thinking maybe we can do something like:

#define  MULTIFD_RECV_DATA_GLOBAL  (multifd_recv_state->data)

Then we can also use it at [2], and replace multifd_get_recv_data()?
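
i.e. (just to illustrate), [1] and [2] would become:

    MultiFDRecvData *data = MULTIFD_RECV_DATA_GLOBAL;    /* [1] */
    ...
    MULTIFD_RECV_DATA_GLOBAL = p->data;                  /* [2] */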

> +
>  static void multifd_recv_terminate_threads(Error *err)
>  {
>      int i;
> @@ -1134,11 +1187,26 @@ static void multifd_recv_terminate_threads(Error *err)
>          MultiFDRecvParams *p = &multifd_recv_state->params[i];
>  
>          /*
> -         * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
> -         * however try to wakeup it without harm in cleanup phase.
> +         * The migration thread and channels interact differently
> +         * depending on the presence of packets.
>           */
>          if (multifd_use_packets()) {
> +            /*
> +             * The channel receives as long as there are packets. When
> +             * packets end (i.e. MULTIFD_FLAG_SYNC is reached), the
> +             * channel waits for the migration thread to sync. If the
> +             * sync never happens, do it here.
> +             */
>              qemu_sem_post(&p->sem_sync);
> +        } else {
> +            /*
> +             * The channel waits for the migration thread to give it
> +             * work. When the migration thread runs out of work, it
> +             * releases the channel and waits for any pending work to
> +             * finish. If we reach here (e.g. due to error) before the
> +             * work runs out, release the channel.
> +             */
> +            qemu_sem_post(&p->sem);
>          }
>  
>          /*
> @@ -1167,6 +1235,7 @@ static void multifd_recv_cleanup_channel(MultiFDRecvParams *p)
>      p->c = NULL;
>      qemu_mutex_destroy(&p->mutex);
>      qemu_sem_destroy(&p->sem_sync);
> +    qemu_sem_destroy(&p->sem);
>      g_free(p->name);
>      p->name = NULL;
>      p->packet_len = 0;
> @@ -1184,6 +1253,8 @@ static void multifd_recv_cleanup_state(void)
>      qemu_sem_destroy(&multifd_recv_state->sem_sync);
>      g_free(multifd_recv_state->params);
>      multifd_recv_state->params = NULL;
> +    g_free(multifd_recv_state->data);
> +    multifd_recv_state->data = NULL;
>      g_free(multifd_recv_state);
>      multifd_recv_state = NULL;
>  }
> @@ -1251,11 +1322,11 @@ static void *multifd_recv_thread(void *opaque)
>          bool has_data = false;
>          p->normal_num = 0;
>  
> -        if (multifd_recv_should_exit()) {
> -            break;
> -        }
> -
>          if (use_packets) {
> +            if (multifd_recv_should_exit()) {
> +                break;
> +            }
> +
>              ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
>                                             p->packet_len, &local_err);
>              if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
> @@ -1274,6 +1345,26 @@ static void *multifd_recv_thread(void *opaque)
>              p->flags &= ~MULTIFD_FLAG_SYNC;
>              has_data = !!p->normal_num;
>              qemu_mutex_unlock(&p->mutex);
> +        } else {
> +            /*
> +             * No packets, so we need to wait for the vmstate code to
> +             * give us work.
> +             */
> +            qemu_sem_wait(&p->sem);
> +
> +            if (multifd_recv_should_exit()) {
> +                break;
> +            }
> +
> +            /*
> +             * Migration thread did not send work, break and signal
> +             * sem_sync so it knows we're not lagging behind.
> +             */
> +            if (!qatomic_read(&p->pending_job)) {
> +                break;
> +            }

In reality, this _must_ be true when reaching here, right?  Since AFAIU
recv side p->sem is posted only in two conditions:

  1) when there is work (pending_job==true)
  2) when terminating threads (multifd_recv_should_exit==true)

Then if 2) is checked above, I assume 1) must be the case here?

> +
> +            has_data = !!p->data->size;
>          }
>  
>          if (has_data) {
> @@ -1288,9 +1379,17 @@ static void *multifd_recv_thread(void *opaque)
>                  qemu_sem_post(&multifd_recv_state->sem_sync);
>                  qemu_sem_wait(&p->sem_sync);
>              }
> +        } else {
> +            p->total_normal_pages += p->data->size / qemu_target_page_size();
> +            p->data->size = 0;
> +            qatomic_set(&p->pending_job, false);

I think it needs to be:

  qatomic_store_release(&p->pending_job, false);

?

So as to guarantee that when the other side sees pending_job==false, size
has already been reset.
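
i.e. (sketch):

    p->data->size = 0;
    /*
     * Pairs with the acquire on the migration thread side: once it
     * observes pending_job == false, size is guaranteed to be zero
     * already.
     */
    qatomic_store_release(&p->pending_job, false);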

>          }
>      }
>  
> +    if (!use_packets) {
> +        qemu_sem_post(&p->sem_sync);
> +    }
> +
>      if (local_err) {
>          multifd_recv_terminate_threads(local_err);
>          error_free(local_err);
> @@ -1320,6 +1419,10 @@ int multifd_recv_setup(Error **errp)
>      thread_count = migrate_multifd_channels();
>      multifd_recv_state = g_malloc0(sizeof(*multifd_recv_state));
>      multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count);
> +
> +    multifd_recv_state->data = g_new0(MultiFDRecvData, 1);
> +    multifd_recv_state->data->size = 0;
> +
>      qatomic_set(&multifd_recv_state->count, 0);
>      qatomic_set(&multifd_recv_state->exiting, 0);
>      qemu_sem_init(&multifd_recv_state->sem_sync, 0);
> @@ -1330,8 +1433,13 @@ int multifd_recv_setup(Error **errp)
>  
>          qemu_mutex_init(&p->mutex);
>          qemu_sem_init(&p->sem_sync, 0);
> +        qemu_sem_init(&p->sem, 0);
> +        p->pending_job = false;
>          p->id = i;
>  
> +        p->data = g_new0(MultiFDRecvData, 1);
> +        p->data->size = 0;
> +
>          if (use_packets) {
>              p->packet_len = sizeof(MultiFDPacket_t)
>                  + sizeof(uint64_t) * page_count;
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 9a6a7a72df..19188815a3 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -13,6 +13,8 @@
>  #ifndef QEMU_MIGRATION_MULTIFD_H
>  #define QEMU_MIGRATION_MULTIFD_H
>  
> +typedef struct MultiFDRecvData MultiFDRecvData;
> +
>  bool multifd_send_setup(void);
>  void multifd_send_shutdown(void);
>  int multifd_recv_setup(Error **errp);
> @@ -23,6 +25,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>  void multifd_recv_sync_main(void);
>  int multifd_send_sync_main(void);
>  bool multifd_queue_page(RAMBlock *block, ram_addr_t offset);
> +bool multifd_recv(void);
> +MultiFDRecvData *multifd_get_recv_data(void);
>  
>  /* Multifd Compression flags */
>  #define MULTIFD_FLAG_SYNC (1 << 0)
> @@ -63,6 +67,13 @@ typedef struct {
>      RAMBlock *block;
>  } MultiFDPages_t;
>  
> +struct MultiFDRecvData {
> +    void *opaque;
> +    size_t size;
> +    /* for preadv */
> +    off_t file_offset;
> +};
> +
>  typedef struct {
>      /* Fields are only written at creating/deletion time */
>      /* No lock required for them, they are read only */
> @@ -154,6 +165,8 @@ typedef struct {
>  
>      /* syncs main thread and channels */
>      QemuSemaphore sem_sync;
> +    /* sem where to wait for more work */
> +    QemuSemaphore sem;
>  
>      /* this mutex protects the following parameters */
>      QemuMutex mutex;
> @@ -163,6 +176,8 @@ typedef struct {
>      uint32_t flags;
>      /* global number of generated multifd packets */
>      uint64_t packet_num;
> +    int pending_job;
> +    MultiFDRecvData *data;
>  
>      /* thread local variables. No locking required */
>  
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [PATCH v4 20/34] migration/multifd: Add outgoing QIOChannelFile support
  2024-02-20 22:41 ` [PATCH v4 20/34] migration/multifd: Add outgoing QIOChannelFile support Fabiano Rosas
@ 2024-02-26  7:10   ` Peter Xu
  2024-02-26  7:21   ` Peter Xu
  1 sibling, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-26  7:10 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:24PM -0300, Fabiano Rosas wrote:
> Allow multifd to open file-backed channels. This will be used when
> enabling the fixed-ram migration stream format which expects a
> seekable transport.
> 
> The QIOChannel read and write methods will use the preadv/pwritev
> versions which don't update the file offset at each call so we can
> reuse the fd without re-opening for every channel.
> 
> Contrary to the socket migration, the file migration doesn't need an
> asynchronous channel creation process, so expose
> multifd_channel_connect() and call it directly.
> 
> Note that this is just setup code and multifd cannot yet make use of
> the file channels.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/file.c    | 40 ++++++++++++++++++++++++++++++++++++++--
>  migration/file.h    |  5 +++++
>  migration/multifd.c | 27 ++++++++++++++++++++++-----
>  migration/multifd.h |  2 ++
>  4 files changed, 67 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/file.c b/migration/file.c
> index 22d052a71f..ac9f6ae40a 100644
> --- a/migration/file.c
> +++ b/migration/file.c
> @@ -12,12 +12,17 @@
>  #include "channel.h"
>  #include "file.h"
>  #include "migration.h"
> +#include "multifd.h"
>  #include "io/channel-file.h"
>  #include "io/channel-util.h"
>  #include "trace.h"
>  
>  #define OFFSET_OPTION ",offset="
>  
> +static struct FileOutgoingArgs {
> +    char *fname;
> +} outgoing_args;
> +
>  /* Remove the offset option from @filespec and return it in @offsetp. */
>  
>  int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
> @@ -37,6 +42,34 @@ int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
>      return 0;
>  }
>  
> +int file_send_channel_destroy(QIOChannel *ioc)
> +{
> +    if (ioc) {
> +        qio_channel_close(ioc, NULL);
> +    }
> +    g_free(outgoing_args.fname);
> +    outgoing_args.fname = NULL;
> +
> +    return 0;
> +}
> +
> +bool file_send_channel_create(gpointer opaque, Error **errp)
> +{
> +    QIOChannelFile *ioc;
> +    int flags = O_WRONLY;
> +
> +    ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
> +    if (!ioc) {
> +        return false;
> +    }
> +
> +    if (!multifd_channel_connect(opaque, QIO_CHANNEL(ioc), errp)) {
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
>  void file_start_outgoing_migration(MigrationState *s,
>                                     FileMigrationArgs *file_args, Error **errp)
>  {
> @@ -44,15 +77,18 @@ void file_start_outgoing_migration(MigrationState *s,
>      g_autofree char *filename = g_strdup(file_args->filename);
>      uint64_t offset = file_args->offset;
>      QIOChannel *ioc;
> +    int flags = O_CREAT | O_TRUNC | O_WRONLY;
> +    mode_t mode = 0660;
>  
>      trace_migration_file_outgoing(filename);
>  
> -    fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
> -                                     0600, errp);
> +    fioc = qio_channel_file_new_path(filename, flags, mode, errp);

This change seems unrelated; could it be squashed into the previous patch,
where this code was introduced?

>      if (!fioc) {
>          return;
>      }
>  
> +    outgoing_args.fname = g_strdup(filename);
> +
>      ioc = QIO_CHANNEL(fioc);
>      if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 0) {
>          return;
> diff --git a/migration/file.h b/migration/file.h
> index 37d6a08bfc..90794b494b 100644
> --- a/migration/file.h
> +++ b/migration/file.h
> @@ -9,10 +9,15 @@
>  #define QEMU_MIGRATION_FILE_H
>  
>  #include "qapi/qapi-types-migration.h"
> +#include "io/task.h"
> +#include "channel.h"
>  
>  void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp);
>  
>  void file_start_outgoing_migration(MigrationState *s,
>                                     FileMigrationArgs *file_args, Error **errp);
>  int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp);
> +
> +bool file_send_channel_create(gpointer opaque, Error **errp);
> +int file_send_channel_destroy(QIOChannel *ioc);
>  #endif
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 45a0c7aaa8..507b497d52 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -17,6 +17,7 @@
>  #include "exec/ramblock.h"
>  #include "qemu/error-report.h"
>  #include "qapi/error.h"
> +#include "file.h"
>  #include "ram.h"
>  #include "migration.h"
>  #include "migration-stats.h"
> @@ -28,6 +29,7 @@
>  #include "threadinfo.h"
>  #include "options.h"
>  #include "qemu/yank.h"
> +#include "io/channel-file.h"
>  #include "io/channel-socket.h"
>  #include "yank_functions.h"
>  
> @@ -680,6 +682,9 @@ static void multifd_send_terminate_threads(void)
>  
>  static int multifd_send_channel_destroy(QIOChannel *send)
>  {
> +    if (!multifd_use_packets()) {
> +        return file_send_channel_destroy(send);
> +    }
>      return socket_send_channel_destroy(send);
>  }
>  
> @@ -959,9 +964,8 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p,
>      return true;
>  }
>  
> -static bool multifd_channel_connect(MultiFDSendParams *p,
> -                                    QIOChannel *ioc,
> -                                    Error **errp)
> +bool multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc,
> +                             Error **errp)
>  {
>      qio_channel_set_delay(ioc, false);
>  
> @@ -1031,9 +1035,14 @@ out:
>      error_free(local_err);
>  }
>  
> -static void multifd_new_send_channel_create(gpointer opaque)
> +static bool multifd_new_send_channel_create(gpointer opaque, Error **errp)
>  {
> +    if (!multifd_use_packets()) {
> +        return file_send_channel_create(opaque, errp);
> +    }
> +
>      socket_send_channel_create(multifd_new_send_channel_async, opaque);
> +    return true;
>  }
>  
>  bool multifd_send_setup(void)
> @@ -1082,7 +1091,15 @@ bool multifd_send_setup(void)
>          p->page_size = qemu_target_page_size();
>          p->page_count = page_count;
>          p->write_flags = 0;
> -        multifd_new_send_channel_create(p);
> +
> +        if (!multifd_new_send_channel_create(p, &local_err)) {
> +            /*
> +             * File channel creation is synchronous, we don't need the
> +             * semaphore below, it's safe to return now.
> +             */
> +            assert(migrate_fixed_ram());

This comment and assert() are slightly confusing to me.  Drop them?

IMHO it's always safe to return directly here; the channels_created sem
will be destroyed later anyway, so its count shouldn't matter.

And as I commented in the other email, IMHO it's cleaner to also post that
sem in file_send_channel_create().

> +            return -1;
> +        }
>      }
>  
>      if (use_packets) {
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 19188815a3..135f6ed098 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -228,5 +228,7 @@ static inline void multifd_send_prepare_header(MultiFDSendParams *p)
>      p->iovs_num++;
>  }
>  
> +bool multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc,
> +                             Error **errp);
>  
>  #endif
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [PATCH v4 20/34] migration/multifd: Add outgoing QIOChannelFile support
  2024-02-20 22:41 ` [PATCH v4 20/34] migration/multifd: Add outgoing QIOChannelFile support Fabiano Rosas
  2024-02-26  7:10   ` Peter Xu
@ 2024-02-26  7:21   ` Peter Xu
  1 sibling, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-26  7:21 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:24PM -0300, Fabiano Rosas wrote:
> +int file_send_channel_destroy(QIOChannel *ioc)
> +{
> +    if (ioc) {
> +        qio_channel_close(ioc, NULL);
> +    }
> +    g_free(outgoing_args.fname);
> +    outgoing_args.fname = NULL;

Ah, another thing: we may want to have file_cleanup_outgoing_migration()
from day one if possible..

https://lore.kernel.org/all/20240222095301.171137-5-peterx@redhat.com/
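
i.e. move the fname bookkeeping out of the per-channel destroy hook into
something along these lines (a sketch, mirroring the socket counterpart in
the link above; file_send_channel_destroy() would then only close the ioc):

    void file_cleanup_outgoing_migration(void)
    {
        g_free(outgoing_args.fname);
        outgoing_args.fname = NULL;
    }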

The other one is already in my queue, so feel free to rebase onto
migration-next directly if that happens before the next pull (I'll remember
to push soon; it's in -staging now).

Thanks,

-- 
Peter Xu




* Re: [PATCH v4 21/34] migration/multifd: Add incoming QIOChannelFile support
  2024-02-20 22:41 ` [PATCH v4 21/34] migration/multifd: Add incoming " Fabiano Rosas
@ 2024-02-26  7:34   ` Peter Xu
  2024-02-26  7:53     ` Peter Xu
  0 siblings, 1 reply; 79+ messages in thread
From: Peter Xu @ 2024-02-26  7:34 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:25PM -0300, Fabiano Rosas wrote:
> On the receiving side we don't need to differentiate between main
> channel and threads, so whichever channel is defined first gets to be
> the main one. And since there are no packets, use the atomic channel
> count to index into the params array.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/file.c      | 34 ++++++++++++++++++++++++++--------
>  migration/migration.c |  3 ++-
>  migration/multifd.c   |  3 +--
>  3 files changed, 29 insertions(+), 11 deletions(-)
> 
> diff --git a/migration/file.c b/migration/file.c
> index ac9f6ae40a..a186dc592a 100644
> --- a/migration/file.c
> +++ b/migration/file.c
> @@ -8,6 +8,7 @@
>  #include "qemu/osdep.h"
>  #include "exec/ramblock.h"
>  #include "qemu/cutils.h"
> +#include "qemu/error-report.h"
>  #include "qapi/error.h"
>  #include "channel.h"
>  #include "file.h"
> @@ -15,6 +16,7 @@
>  #include "multifd.h"
>  #include "io/channel-file.h"
>  #include "io/channel-util.h"
> +#include "options.h"
>  #include "trace.h"
>  
>  #define OFFSET_OPTION ",offset="
> @@ -111,7 +113,8 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
>      g_autofree char *filename = g_strdup(file_args->filename);
>      QIOChannelFile *fioc = NULL;
>      uint64_t offset = file_args->offset;
> -    QIOChannel *ioc;
> +    int channels = 1;
> +    int i = 0, fd;
>  
>      trace_migration_file_incoming(filename);
>  
> @@ -120,13 +123,28 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
>          return;
>      }
>  
> -    ioc = QIO_CHANNEL(fioc);
> -    if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 0) {
> +    if (offset &&
> +        qio_channel_io_seek(QIO_CHANNEL(fioc), offset, SEEK_SET, errp) < 0) {
>          return;
>      }
> -    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming");
> -    qio_channel_add_watch_full(ioc, G_IO_IN,
> -                               file_accept_incoming_migration,
> -                               NULL, NULL,
> -                               g_main_context_get_thread_default());
> +
> +    if (migrate_multifd()) {
> +        channels += migrate_multifd_channels();
> +    }
> +
> +    fd = fioc->fd;
> +
> +    do {
> +        QIOChannel *ioc = QIO_CHANNEL(fioc);
> +
> +        qio_channel_set_name(ioc, "migration-file-incoming");
> +        qio_channel_add_watch_full(ioc, G_IO_IN,
> +                                   file_accept_incoming_migration,
> +                                   NULL, NULL,
> +                                   g_main_context_get_thread_default());
> +    } while (++i < channels && (fioc = qio_channel_file_new_fd(fd)));

Note that reusing the fd here carries the future risk that one iochannel
can affect the others, as they all potentially share the same fd underneath;
I think it's the same as the "two qemufiles vs. one iochannel" issue that
we've been fighting recently.

IIUC the clean approach is still to open one fd for each iochannel.
Otherwise, as soon as one iochannel closes its fd, it immediately
invalidates all the other iochannels with something like a use-after-free
of that fd index; any operation on the fd also races with another fd being
opened concurrently.

Maybe we can already use a loop of qio_channel_file_new_path() calls?  The
OS should have the dentry etc. cached already, so I assume the subsequent
opens would be very fast.  Or are there other complexities I'm not aware of?
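
Something like this, perhaps (a rough sketch; error handling and the offset
handling for the extra channels glossed over):

    do {
        QIOChannel *ioc = QIO_CHANNEL(fioc);

        qio_channel_set_name(ioc, "migration-file-incoming");
        qio_channel_add_watch_full(ioc, G_IO_IN,
                                   file_accept_incoming_migration,
                                   NULL, NULL,
                                   g_main_context_get_thread_default());
    } while (++i < channels &&
             (fioc = qio_channel_file_new_path(filename, O_RDONLY, 0, errp)));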

> +
> +    if (!fioc) {
> +        error_setg(errp, "Error creating migration incoming channel");
> +    }
>  }
> diff --git a/migration/migration.c b/migration/migration.c
> index 16da269847..e2218b9de7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -896,7 +896,8 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
>      uint32_t channel_magic = 0;
>      int ret = 0;
>  
> -    if (migrate_multifd() && !migrate_postcopy_ram() &&
> +    if (migrate_multifd() && !migrate_fixed_ram() &&
> +        !migrate_postcopy_ram() &&
>          qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
>          /*
>           * With multiple channels, it is possible that we receive channels
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 507b497d52..cb5f4fb3e0 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1520,8 +1520,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>          }
>          trace_multifd_recv_new_channel(id);
>      } else {
> -        /* next patch gives this a meaningful value */
> -        id = 0;
> +        id = qatomic_read(&multifd_recv_state->count);
>      }
>  
>      p = &multifd_recv_state->params[id];
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [PATCH v4 22/34] migration/multifd: Prepare multifd sync for fixed-ram migration
  2024-02-20 22:41 ` [PATCH v4 22/34] migration/multifd: Prepare multifd sync for fixed-ram migration Fabiano Rosas
@ 2024-02-26  7:47   ` Peter Xu
  2024-02-26 22:52     ` Fabiano Rosas
  0 siblings, 1 reply; 79+ messages in thread
From: Peter Xu @ 2024-02-26  7:47 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:26PM -0300, Fabiano Rosas wrote:
> The fixed-ram migration can be performed live or non-live, but it is
> always asynchronous, i.e. the source machine and the destination
> machine are not migrating at the same time. We only need some pieces
> of the multifd sync operations.
> 
> multifd_send_sync_main()
> ------------------------
>   Issued by the ram migration code on the migration thread, causes the
>   multifd send channels to synchronize with the migration thread and
>   makes the sending side emit a packet with the MULTIFD_FLUSH flag.
> 
>   With fixed-ram we want to maintain the sync on the sending side
>   because that provides ordering between the rounds of dirty pages when
>   migrating live.
> 
> MULTIFD_FLUSH
> -------------
>   On the receiving side, the presence of the MULTIFD_FLUSH flag on a
>   packet causes the receiving channels to start synchronizing with the
>   main thread.
> 
>   We're not using packets with fixed-ram, so there's no MULTIFD_FLUSH
>   flag and therefore no channel sync on the receiving side.
> 
> multifd_recv_sync_main()
> ------------------------
>   Issued by the migration thread when the ram migration flag
>   RAM_SAVE_FLAG_MULTIFD_FLUSH is received, causes the migration thread
>   on the receiving side to start synchronizing with the recv
>   channels. Due to compatibility, this is also issued when
>   RAM_SAVE_FLAG_EOS is received.
> 
>   For fixed-ram we only need to synchronize the channels at the end of
>   migration to avoid doing cleanup before the channels have finished
>   their IO.
> 
> Make sure the multifd syncs are only issued at the appropriate
> times. Note that due to pre-existing backward compatibility issues, we
> have the multifd_flush_after_each_section property that enables an
> older behavior of synchronizing channels more frequently (and
> inefficiently). Fixed-ram should always run with that property
> disabled (default).

What if the user enables multifd_flush_after_each_section=true?

IMHO we don't necessarily need to attach the fixed-ram loading flush to any
flag in the stream.  For fixed-ram, IIUC, all the loads happen in one shot
of ram_load() anyway when parsing the ramblock list, so.. how about we
decouple the fixed-ram load flush from the stream by always doing a sync in
ram_load() unconditionally?

@@ -4368,6 +4367,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             ret = ram_load_precopy(f);
         }
     }
+
+    /*
+     * Fixed-ram migration may queue load tasks to multifd threads; make
+     * sure they're all done.
+     */
+    if (migrate_fixed_ram() && migrate_multifd()) {
+        multifd_recv_sync_main();
+    }
+
     trace_ram_load_complete(ret, seq_iter);
 
     return ret;

Then ram_load() always guarantees synchronous loading of pages, and
fixed-ram can completely ignore multifd flushes (and we also skip them for
ram_save_complete(), like this patch already does for the rest).

> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/ram.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 5932e1b8e1..c7050f6f68 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1369,8 +1369,11 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
>                  if (ret < 0) {
>                      return ret;
>                  }
> -                qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
> -                qemu_fflush(f);
> +
> +                if (!migrate_fixed_ram()) {
> +                    qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
> +                    qemu_fflush(f);
> +                }
>              }
>              /*
>               * If memory migration starts over, we will meet a dirtied page
> @@ -3112,7 +3115,8 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>          return ret;
>      }
>  
> -    if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) {
> +    if (migrate_multifd() && !migrate_multifd_flush_after_each_section()
> +        && !migrate_fixed_ram()) {
>          qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
>      }
>  
> @@ -4253,6 +4257,15 @@ static int ram_load_precopy(QEMUFile *f)
>              break;
>          case RAM_SAVE_FLAG_EOS:
>              /* normal exit */
> +            if (migrate_fixed_ram()) {
> +                /*
> +                 * The EOS flag appears multiple times on the
> +                 * stream. Fixed-ram needs only one sync at the
> +                 * end. It will be done on the flush flag above.
> +                 */
> +                break;
> +            }
> +
>              if (migrate_multifd() &&
>                  migrate_multifd_flush_after_each_section()) {
>                  multifd_recv_sync_main();
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [PATCH v4 21/34] migration/multifd: Add incoming QIOChannelFile support
  2024-02-26  7:34   ` Peter Xu
@ 2024-02-26  7:53     ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-26  7:53 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Mon, Feb 26, 2024 at 03:34:26PM +0800, Peter Xu wrote:
> On Tue, Feb 20, 2024 at 07:41:25PM -0300, Fabiano Rosas wrote:
> > On the receiving side we don't need to differentiate between main
> > channel and threads, so whichever channel is defined first gets to be
> > the main one. And since there are no packets, use the atomic channel
> > count to index into the params array.
> > 
> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
> > ---
> >  migration/file.c      | 34 ++++++++++++++++++++++++++--------
> >  migration/migration.c |  3 ++-
> >  migration/multifd.c   |  3 +--
> >  3 files changed, 29 insertions(+), 11 deletions(-)
> > 
> > diff --git a/migration/file.c b/migration/file.c
> > index ac9f6ae40a..a186dc592a 100644
> > --- a/migration/file.c
> > +++ b/migration/file.c
> > @@ -8,6 +8,7 @@
> >  #include "qemu/osdep.h"
> >  #include "exec/ramblock.h"
> >  #include "qemu/cutils.h"
> > +#include "qemu/error-report.h"
> >  #include "qapi/error.h"
> >  #include "channel.h"
> >  #include "file.h"
> > @@ -15,6 +16,7 @@
> >  #include "multifd.h"
> >  #include "io/channel-file.h"
> >  #include "io/channel-util.h"
> > +#include "options.h"
> >  #include "trace.h"
> >  
> >  #define OFFSET_OPTION ",offset="
> > @@ -111,7 +113,8 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
> >      g_autofree char *filename = g_strdup(file_args->filename);
> >      QIOChannelFile *fioc = NULL;
> >      uint64_t offset = file_args->offset;
> > -    QIOChannel *ioc;
> > +    int channels = 1;
> > +    int i = 0, fd;
> >  
> >      trace_migration_file_incoming(filename);
> >  
> > @@ -120,13 +123,28 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
> >          return;
> >      }
> >  
> > -    ioc = QIO_CHANNEL(fioc);
> > -    if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 0) {
> > +    if (offset &&
> > +        qio_channel_io_seek(QIO_CHANNEL(fioc), offset, SEEK_SET, errp) < 0) {
> >          return;
> >      }
> > -    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming");
> > -    qio_channel_add_watch_full(ioc, G_IO_IN,
> > -                               file_accept_incoming_migration,
> > -                               NULL, NULL,
> > -                               g_main_context_get_thread_default());
> > +
> > +    if (migrate_multifd()) {
> > +        channels += migrate_multifd_channels();
> > +    }
> > +
> > +    fd = fioc->fd;
> > +
> > +    do {
> > +        QIOChannel *ioc = QIO_CHANNEL(fioc);
> > +
> > +        qio_channel_set_name(ioc, "migration-file-incoming");
> > +        qio_channel_add_watch_full(ioc, G_IO_IN,
> > +                                   file_accept_incoming_migration,
> > +                                   NULL, NULL,
> > +                                   g_main_context_get_thread_default());
> > +    } while (++i < channels && (fioc = qio_channel_file_new_fd(fd)));
> 
> Note that reusing fd here has similar risk in the future that one iochannel
> can affect the other, as potentially all shares the same fd underneath; I
> think it's the same as "two qemufile v.s. one iochannel" issue that we're
> fighting recently.
> 
> IIUC the clean case is still that we open one fd for each iochannel.  Or
> e.g. as long as one iochannel close() its fd, it immediately invalidates
> all the rest iochannels on something like use-after-free of that fd index;
> any fd operates races with another fd being opened concurrently.
> 
> Maybe we can already use a loop of qio_channel_file_new_path()?  OS should
> already cached the dentry etc. so I assume the following ones should be
> super fast?  Or there's other complexities that I didn't aware?

Or simply use dup()?
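
i.e. (sketch; dup() failure handling omitted):

    } while (++i < channels && (fioc = qio_channel_file_new_fd(dup(fd))));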

> 
> > +
> > +    if (!fioc) {
> > +        error_setg(errp, "Error creating migration incoming channel");
> > +    }
> >  }
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 16da269847..e2218b9de7 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -896,7 +896,8 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
> >      uint32_t channel_magic = 0;
> >      int ret = 0;
> >  
> > -    if (migrate_multifd() && !migrate_postcopy_ram() &&
> > +    if (migrate_multifd() && !migrate_fixed_ram() &&
> > +        !migrate_postcopy_ram() &&
> >          qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
> >          /*
> >           * With multiple channels, it is possible that we receive channels
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 507b497d52..cb5f4fb3e0 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -1520,8 +1520,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
> >          }
> >          trace_multifd_recv_new_channel(id);
> >      } else {
> > -        /* next patch gives this a meaningful value */
> > -        id = 0;
> > +        id = qatomic_read(&multifd_recv_state->count);
> >      }
> >  
> >      p = &multifd_recv_state->params[id];
> > -- 
> > 2.35.3
> > 
> 
> -- 
> Peter Xu

-- 
Peter Xu




* Re: [PATCH v4 23/34] migration/multifd: Support outgoing fixed-ram stream format
  2024-02-20 22:41 ` [PATCH v4 23/34] migration/multifd: Support outgoing fixed-ram stream format Fabiano Rosas
@ 2024-02-26  8:08   ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-26  8:08 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:27PM -0300, Fabiano Rosas wrote:
> The new fixed-ram stream format uses a file transport and puts ram
> pages in the migration file at their respective offsets and can be
> done in parallel by using the pwritev system call which takes iovecs
> and an offset.
> 
> Add support to enabling the new format along with multifd to make use
> of the threading and page handling already in place.
> 
> This requires multifd to stop sending headers and leaving the stream
> format to the fixed-ram code. When it comes time to write the data, we
> need to call a version of qio_channel_write that can take an offset.
> 
> Usage on HMP is:
> 
> (qemu) stop
> (qemu) migrate_set_capability multifd on
> (qemu) migrate_set_capability fixed-ram on
> (qemu) migrate_set_parameter max-bandwidth 0
> (qemu) migrate_set_parameter multifd-channels 8
> (qemu) migrate file:migfile
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

Some nitpicks below.

> ---
>  include/qemu/bitops.h | 13 ++++++++++++
>  migration/file.c      | 47 +++++++++++++++++++++++++++++++++++++++++++
>  migration/file.h      |  2 ++
>  migration/migration.c | 12 ++++++-----
>  migration/multifd.c   | 24 ++++++++++++++++++++--
>  migration/options.c   | 14 +++++++------
>  migration/ram.c       | 17 +++++++++++++---
>  migration/ram.h       |  1 +
>  8 files changed, 114 insertions(+), 16 deletions(-)
> 
> diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
> index cb3526d1f4..2c0a2fe751 100644
> --- a/include/qemu/bitops.h
> +++ b/include/qemu/bitops.h
> @@ -67,6 +67,19 @@ static inline void clear_bit(long nr, unsigned long *addr)
>      *p &= ~mask;
>  }
>  
> +/**
> + * clear_bit_atomic - Clears a bit in memory atomically
> + * @nr: Bit to clear
> + * @addr: Address to start counting from
> + */
> +static inline void clear_bit_atomic(long nr, unsigned long *addr)
> +{
> +    unsigned long mask = BIT_MASK(nr);
> +    unsigned long *p = addr + BIT_WORD(nr);
> +
> +    return qatomic_and(p, ~mask);
> +}
> +
>  /**
>   * change_bit - Toggle a bit in memory
>   * @nr: Bit to change
> diff --git a/migration/file.c b/migration/file.c
> index a186dc592a..94e8e08363 100644
> --- a/migration/file.c
> +++ b/migration/file.c
> @@ -148,3 +148,50 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
>          error_setg(errp, "Error creating migration incoming channel");
>      }
>  }
> +
> +int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
> +                            int niov, RAMBlock *block, Error **errp)
> +{
> +    ssize_t ret = -1;
> +    int i, slice_idx, slice_num;
> +    uintptr_t base, next, offset;
> +    size_t len;
> +
> +    slice_idx = 0;
> +    slice_num = 1;
> +
> +    /*
> +     * If the iov array doesn't have contiguous elements, we need to
> +     * split it in slices because we only have one file offset for the
> +     * whole iov. Do this here so callers don't need to break the iov
> +     * array themselves.
> +     */
> +    for (i = 0; i < niov; i++, slice_num++) {
> +        base = (uintptr_t) iov[i].iov_base;
> +
> +        if (i != niov - 1) {
> +            len = iov[i].iov_len;
> +            next = (uintptr_t) iov[i + 1].iov_base;
> +
> +            if (base + len == next) {
> +                continue;
> +            }
> +        }
> +
> +        /*
> +         * Use the offset of the first element of the segment that
> +         * we're sending.
> +         */
> +        offset = (uintptr_t) iov[slice_idx].iov_base - (uintptr_t) block->host;

Wanna do a sanity check over offset v.s. block->used_length?
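
e.g. (sketch):

    if (offset >= block->used_length) {
        error_setg(errp, "offset 0x%" PRIxPTR " outside of ramblock %s range",
                   offset, block->idstr);
        ret = -1;
        break;
    }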

> +        ret = qio_channel_pwritev(ioc, &iov[slice_idx], slice_num,
> +                                  block->pages_offset + offset, errp);
> +        if (ret < 0) {
> +            break;
> +        }
> +
> +        slice_idx += slice_num;
> +        slice_num = 0;
> +    }
> +
> +    return (ret < 0) ? -1 : 0;

IMHO we don't need to hide the negative ret, hence:

  return (ret < 0) ? ret : 0;

> +}
> diff --git a/migration/file.h b/migration/file.h
> index 90794b494b..390dcc6821 100644
> --- a/migration/file.h
> +++ b/migration/file.h
> @@ -20,4 +20,6 @@ int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp);
>  
>  bool file_send_channel_create(gpointer opaque, Error **errp);
>  int file_send_channel_destroy(QIOChannel *ioc);
> +int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
> +                            int niov, RAMBlock *block, Error **errp);
>  #endif
> diff --git a/migration/migration.c b/migration/migration.c
> index e2218b9de7..32b291a282 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -134,12 +134,14 @@ static bool transport_supports_multi_channels(MigrationAddress *addr)
>      if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) {
>          SocketAddress *saddr = &addr->u.socket;
>  
> -        return saddr->type == SOCKET_ADDRESS_TYPE_INET ||
> -               saddr->type == SOCKET_ADDRESS_TYPE_UNIX ||
> -               saddr->type == SOCKET_ADDRESS_TYPE_VSOCK;
> +        return (saddr->type == SOCKET_ADDRESS_TYPE_INET ||
> +                saddr->type == SOCKET_ADDRESS_TYPE_UNIX ||
> +                saddr->type == SOCKET_ADDRESS_TYPE_VSOCK);
> +    } else if (addr->transport == MIGRATION_ADDRESS_TYPE_FILE) {
> +        return migrate_fixed_ram();
> +    } else {
> +        return false;
>      }
> -
> -    return false;
>  }
>  
>  static bool migration_needs_seekable_channel(void)
> diff --git a/migration/multifd.c b/migration/multifd.c
> index cb5f4fb3e0..b251c58ec2 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -105,6 +105,17 @@ static bool multifd_use_packets(void)
>      return !migrate_fixed_ram();
>  }
>  
> +static void multifd_set_file_bitmap(MultiFDSendParams *p)
> +{
> +    MultiFDPages_t *pages = p->pages;
> +
> +    assert(pages->block);
> +
> +    for (int i = 0; i < p->pages->num; i++) {
> +        ramblock_set_file_bmap_atomic(pages->block, pages->offset[i]);
> +    }
> +}
> +
>  /* Multifd without compression */
>  
>  /**
> @@ -181,6 +192,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
>                  return -1;
>              }
>          }
> +    } else {
> +        multifd_set_file_bitmap(p);

PS: if you liked my other proposal, you can move this to the entry when
handling migrate_fixed_ram().

>      }
>  
>      return 0;
> @@ -860,8 +873,15 @@ static void *multifd_send_thread(void *opaque)
>                  break;
>              }
>  
> -            ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
> -                                              0, p->write_flags, &local_err);
> +            if (migrate_fixed_ram()) {
> +                ret = file_write_ramblock_iov(p->c, p->iov, p->iovs_num,
> +                                              p->pages->block, &local_err);
> +            } else {
> +                ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num,
> +                                                  NULL, 0, p->write_flags,
> +                                                  &local_err);
> +            }
> +
>              if (ret != 0) {
>                  break;
>              }
> diff --git a/migration/options.c b/migration/options.c
> index 4909e5c72a..bfcd2d7132 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -654,12 +654,6 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
>      }
>  
>      if (new_caps[MIGRATION_CAPABILITY_FIXED_RAM]) {
> -        if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
> -            error_setg(errp,
> -                       "Fixed-ram migration is incompatible with multifd");
> -            return false;
> -        }
> -
>          if (new_caps[MIGRATION_CAPABILITY_XBZRLE]) {
>              error_setg(errp,
>                         "Fixed-ram migration is incompatible with xbzrle");
> @@ -1252,6 +1246,14 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
>      }
>  #endif
>  
> +    if (migrate_fixed_ram() &&
> +        ((params->has_multifd_compression && params->multifd_compression) ||
> +         (params->tls_creds && *params->tls_creds))) {

migrate_tls()?

> +        error_setg(errp,
> +                   "Fixed-ram only available for non-compressed non-TLS multifd migration");
> +        return false;
> +    }

IIUC this could miss the case where one sets the tls creds _before_ enabling the fixed-ram cap?
fixed-ram cap?

We can also check both places but I always think it awkward to duplicates.

For cross-(cap+param) checks maybe we can use migrate_prepare()?
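
e.g. something like this there (a sketch; whether migrate_prepare() is the
right spot is up for debate):

    if (migrate_fixed_ram() && migrate_tls()) {
        error_setg(errp, "Fixed-ram migration is incompatible with TLS");
        return false;
    }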

> +
>      if (params->has_x_vcpu_dirty_limit_period &&
>          (params->x_vcpu_dirty_limit_period < 1 ||
>           params->x_vcpu_dirty_limit_period > 1000)) {
> diff --git a/migration/ram.c b/migration/ram.c
> index c7050f6f68..ad540ae9ce 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1149,7 +1149,7 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
>  
>      if (migrate_fixed_ram()) {
>          /* zero pages are not transferred with fixed-ram */
> -        clear_bit(offset >> TARGET_PAGE_BITS, pss->block->file_bmap);
> +        clear_bit_atomic(offset >> TARGET_PAGE_BITS, pss->block->file_bmap);
>          return 1;
>      }
>  
> @@ -2445,8 +2445,6 @@ static void ram_save_cleanup(void *opaque)
>          block->clear_bmap = NULL;
>          g_free(block->bmap);
>          block->bmap = NULL;
> -        g_free(block->file_bmap);
> -        block->file_bmap = NULL;
>      }
>  
>      xbzrle_cleanup();
> @@ -3135,9 +3133,22 @@ static void ram_save_file_bmap(QEMUFile *f)
>          qemu_put_buffer_at(f, (uint8_t *)block->file_bmap, bitmap_size,
>                             block->bitmap_offset);
>          ram_transferred_add(bitmap_size);
> +
> +        /*
> +         * Free the bitmap here to catch any synchronization issues
> +         * with multifd channels. No channels should be sending pages
> +         * after we've written the bitmap to file.
> +         */
> +        g_free(block->file_bmap);
> +        block->file_bmap = NULL;
>      }
>  }
>  
> +void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset)
> +{
> +    set_bit_atomic(offset >> TARGET_PAGE_BITS, block->file_bmap);
> +}
> +
>  /**
>   * ram_save_iterate: iterative stage for migration
>   *
> diff --git a/migration/ram.h b/migration/ram.h
> index 9b937a446b..b9ac0da587 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -75,6 +75,7 @@ bool ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
>  bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
>  void postcopy_preempt_shutdown_file(MigrationState *s);
>  void *postcopy_preempt_thread(void *opaque);
> +void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset);
>  
>  /* ram cache */
>  int colo_init_ram_cache(void);
> -- 
> 2.35.3
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 24/34] migration/multifd: Support incoming fixed-ram stream format
  2024-02-20 22:41 ` [PATCH v4 24/34] migration/multifd: Support incoming " Fabiano Rosas
@ 2024-02-26  8:30   ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-26  8:30 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:28PM -0300, Fabiano Rosas wrote:
> For the incoming fixed-ram migration we need to read the ramblock
> headers, get the pages bitmap and send the host address of each
> non-zero page to the multifd channel thread for writing.
> 
> Usage on HMP is:
> 
> (qemu) migrate_set_capability multifd on
> (qemu) migrate_set_capability fixed-ram on
> (qemu) migrate_incoming file:migfile
> 
> (the ram.h include needs to move because we've been previously relying
> on it being included from migration.c. Now file.h will start including
> multifd.h before migration.o is processed)
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/file.c    | 25 ++++++++++++++++++++++++-
>  migration/file.h    |  2 ++
>  migration/multifd.c | 34 ++++++++++++++++++++++++++++++----
>  migration/multifd.h |  2 ++
>  migration/ram.c     | 36 +++++++++++++++++++++++++++++++++---
>  5 files changed, 91 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/file.c b/migration/file.c
> index 94e8e08363..1a18e608fc 100644
> --- a/migration/file.c
> +++ b/migration/file.c
> @@ -13,7 +13,6 @@
>  #include "channel.h"
>  #include "file.h"
>  #include "migration.h"
> -#include "multifd.h"
>  #include "io/channel-file.h"
>  #include "io/channel-util.h"
>  #include "options.h"
> @@ -195,3 +194,27 @@ int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
>  
>      return (ret < 0) ? -1 : 0;
>  }
> +
> +int multifd_file_recv_data(MultiFDRecvParams *p, Error **errp)
> +{
> +    MultiFDRecvData *data = p->data;
> +    size_t ret;
> +    uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
> +
> +    if (flags != MULTIFD_FLAG_NOCOMP) {
> +        error_setg(errp, "multifd %u: flags received %x flags expected %x",
> +                   p->id, flags, MULTIFD_FLAG_NOCOMP);
> +        return -1;
> +    }

This chunk can be dropped?  There are no packets in fixed-ram; this check is
only a no-op because MULTIFD_FLAG_NOCOMP==0, IIUC.

OTOH, the check that should be done is to make sure fixed-ram doesn't run
together with any multifd compression enabled.  I remember we discussed this
before.  Is it still missing?  Note that multifd compression has its own
parameter (rather than the COMPRESS capability), so it should be a check
against migrate_multifd_compression()==MULTIFD_COMPRESSION_NONE.

> +
> +    ret = qio_channel_pread(p->c, (char *) data->opaque,
> +                            data->size, data->file_offset, errp);
> +    if (ret != data->size) {
> +        error_prepend(errp,
> +                      "multifd recv (%u): read 0x%zx, expected 0x%zx",
> +                      p->id, ret, data->size);
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> diff --git a/migration/file.h b/migration/file.h
> index 390dcc6821..9fe8af73fc 100644
> --- a/migration/file.h
> +++ b/migration/file.h
> @@ -11,6 +11,7 @@
>  #include "qapi/qapi-types-migration.h"
>  #include "io/task.h"
>  #include "channel.h"
> +#include "multifd.h"
>  
>  void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp);
>  
> @@ -22,4 +23,5 @@ bool file_send_channel_create(gpointer opaque, Error **errp);
>  int file_send_channel_destroy(QIOChannel *ioc);
>  int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
>                              int niov, RAMBlock *block, Error **errp);
> +int multifd_file_recv_data(MultiFDRecvParams *p, Error **errp);
>  #endif
> diff --git a/migration/multifd.c b/migration/multifd.c
> index b251c58ec2..a0202b5661 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -18,7 +18,6 @@
>  #include "qemu/error-report.h"
>  #include "qapi/error.h"
>  #include "file.h"
> -#include "ram.h"
>  #include "migration.h"
>  #include "migration-stats.h"
>  #include "socket.h"
> @@ -251,9 +250,9 @@ static int nocomp_recv(MultiFDRecvParams *p, Error **errp)
>              p->iov[i].iov_len = p->page_size;
>          }
>          return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
> +    } else {
> +        return multifd_file_recv_data(p, errp);
>      }
> -
> -    return 0;
>  }
>  
>  static MultiFDMethods multifd_nocomp_ops = {
> @@ -1317,13 +1316,40 @@ void multifd_recv_cleanup(void)
>      multifd_recv_cleanup_state();
>  }
>  
> +
> +/*
> + * Wait until all channels have finished receiving data. Once this
> + * function returns, cleanup routines are safe to run.
> + */
> +static void multifd_file_recv_sync(void)
> +{
> +    int i;
> +
> +    for (i = 0; i < migrate_multifd_channels(); i++) {
> +        MultiFDRecvParams *p = &multifd_recv_state->params[i];
> +
> +        trace_multifd_recv_sync_main_wait(p->id);
> +
> +        qemu_sem_post(&p->sem);
> +
> +        trace_multifd_recv_sync_main_signal(p->id);
> +        qemu_sem_wait(&p->sem_sync);
> +    }
> +    return;
> +}
> +
>  void multifd_recv_sync_main(void)
>  {
>      int i;
>  
> -    if (!migrate_multifd() || !multifd_use_packets()) {
> +    if (!migrate_multifd()) {
>          return;
>      }
> +
> +    if (!multifd_use_packets()) {
> +        return multifd_file_recv_sync();
> +    }
> +
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          MultiFDRecvParams *p = &multifd_recv_state->params[i];
>  
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 135f6ed098..8f89199721 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -13,6 +13,8 @@
>  #ifndef QEMU_MIGRATION_MULTIFD_H
>  #define QEMU_MIGRATION_MULTIFD_H
>  
> +#include "ram.h"
> +
>  typedef struct MultiFDRecvData MultiFDRecvData;
>  
>  bool multifd_send_setup(void);
> diff --git a/migration/ram.c b/migration/ram.c
> index ad540ae9ce..826ac745a0 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -111,6 +111,7 @@
>   * pages region in the migration file at a time.
>   */
>  #define FIXED_RAM_LOAD_BUF_SIZE 0x100000
> +#define FIXED_RAM_MULTIFD_LOAD_BUF_SIZE 0x100000
>  
>  XBZRLECacheStats xbzrle_counters;
>  
> @@ -3950,6 +3951,27 @@ void colo_flush_ram_cache(void)
>      trace_colo_flush_ram_cache_end();
>  }
>  
> +static size_t ram_load_multifd_pages(void *host_addr, size_t size,
> +                                     uint64_t offset)
> +{
> +    MultiFDRecvData *data = multifd_get_recv_data();
> +
> +    /*
> +     * Pointing the opaque directly to the host buffer, no
> +     * preprocessing needed.
> +     */
> +    data->opaque = host_addr;
> +

nit: unneeded newline?  There's a similar one on the send side.  I'd drop the
comment altogether as it's not extremely helpful.  Maybe we can directly use
data->host_addr already (as it always reads the chunk into a host buffer)?

> +    data->file_offset = offset;
> +    data->size = size;
> +
> +    if (!multifd_recv()) {
> +        return 0;
> +    }
> +
> +    return size;
> +}
> +
>  static bool read_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
>                                      long num_pages, unsigned long *bitmap,
>                                      Error **errp)
> @@ -3959,6 +3981,8 @@ static bool read_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
>      ram_addr_t offset;
>      void *host;
>      size_t read, unread, size;
> +    size_t buf_size = (migrate_multifd() ? FIXED_RAM_MULTIFD_LOAD_BUF_SIZE :
> +                       FIXED_RAM_LOAD_BUF_SIZE);

Are they the same?  Maybe we don't need the new one until we want to make
it different?

>  
>      for (set_bit_idx = find_first_bit(bitmap, num_pages);
>           set_bit_idx < num_pages;
> @@ -3977,10 +4001,16 @@ static bool read_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
>                  return false;
>              }
>  
> -            size = MIN(unread, FIXED_RAM_LOAD_BUF_SIZE);
> +            size = MIN(unread, buf_size);
> +
> +            if (migrate_multifd()) {
> +                read = ram_load_multifd_pages(host, size,
> +                                              block->pages_offset + offset);
> +            } else {
> +                read = qemu_get_buffer_at(f, host, size,
> +                                          block->pages_offset + offset);
> +            }
>  
> -            read = qemu_get_buffer_at(f, host, size,
> -                                      block->pages_offset + offset);
>              if (!read) {
>                  goto err;
>              }
> -- 
> 2.35.3
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 25/34] migration/multifd: Add fixed-ram support to fd: URI
  2024-02-20 22:41 ` [PATCH v4 25/34] migration/multifd: Add fixed-ram support to fd: URI Fabiano Rosas
@ 2024-02-26  8:37   ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-26  8:37 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 20, 2024 at 07:41:29PM -0300, Fabiano Rosas wrote:
> If we receive a file descriptor that points to a regular file, there's
> nothing stopping us from doing multifd migration with fixed-ram to
> that file.
> 
> Enable the fd: URI to work with multifd + fixed-ram.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/fd.c        | 30 ++++++++++++++++++++++++++++++
>  migration/fd.h        |  1 +
>  migration/file.c      | 12 +++++++++---
>  migration/migration.c |  4 ++++
>  4 files changed, 44 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/fd.c b/migration/fd.c
> index 0eb677dcae..b7e4d071a4 100644
> --- a/migration/fd.c
> +++ b/migration/fd.c
> @@ -19,14 +19,28 @@
>  #include "fd.h"
>  #include "migration.h"
>  #include "monitor/monitor.h"
> +#include "io/channel-file.h"
>  #include "io/channel-util.h"
> +#include "options.h"
>  #include "trace.h"
>  
>  
> +static struct FdOutgoingArgs {
> +    int fd;
> +} outgoing_args;
> +
> +int fd_args_get_fd(void)
> +{
> +    return outgoing_args.fd;
> +}
> +
>  void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
>  {
>      QIOChannel *ioc;
>      int fd = monitor_get_fd(monitor_cur(), fdname, errp);
> +
> +    outgoing_args.fd = -1;

I suggest we either drop this, or close() it before releasing; having each fd
reference hold a real fd would be easier, IMHO.

Also, we'd want to free the fd (by closing it) just like we free
outgoing_args.fname?  Otherwise, if an fd is passed in, who is responsible for
closing it and releasing the fd resource?
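
Something along these lines is what I mean (only a sketch; the cleanup helper
name is mine and just mirrors how outgoing_args.fname is freed on the file:
side):

    /* fd.c keeps its own dup()'d descriptor, so it owns what it closes,
     * no matter who passed the fd in: */
    outgoing_args.fd = dup(fd);

    /* hypothetical cleanup hook: */
    void fd_cleanup_outgoing_migration(void)
    {
        if (outgoing_args.fd != -1) {
            close(outgoing_args.fd);
            outgoing_args.fd = -1;
        }
    }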

> +
>      if (fd == -1) {
>          return;
>      }
> @@ -38,6 +52,8 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
>          return;
>      }
>  
> +    outgoing_args.fd = fd;

If you agree with the above, then dup(fd) here.

> +
>      qio_channel_set_name(ioc, "migration-fd-outgoing");
>      migration_channel_connect(s, ioc, NULL, NULL);
>      object_unref(OBJECT(ioc));
> @@ -73,4 +89,18 @@ void fd_start_incoming_migration(const char *fdname, Error **errp)
>                                 fd_accept_incoming_migration,
>                                 NULL, NULL,
>                                 g_main_context_get_thread_default());
> +
> +    if (migrate_multifd()) {
> +        int channels = migrate_multifd_channels();
> +
> +        while (channels--) {
> +            ioc = QIO_CHANNEL(qio_channel_file_new_fd(fd));

dup(fd)?
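
For instance (just a sketch of the incoming loop with the dup() folded in;
the error handling is a guess):

    while (channels--) {
        int dupfd = dup(fd);

        if (dupfd < 0) {
            error_setg_errno(errp, errno, "failed to duplicate migration fd");
            return;
        }

        /* each QIOChannelFile now owns its own descriptor */
        ioc = QIO_CHANNEL(qio_channel_file_new_fd(dupfd));
        qio_channel_set_name(ioc, "migration-fd-incoming");
        qio_channel_add_watch_full(ioc, G_IO_IN,
                                   fd_accept_incoming_migration,
                                   NULL, NULL,
                                   g_main_context_get_thread_default());
    }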

> +
> +            qio_channel_set_name(ioc, "migration-fd-incoming");
> +            qio_channel_add_watch_full(ioc, G_IO_IN,
> +                                       fd_accept_incoming_migration,
> +                                       NULL, NULL,
> +                                       g_main_context_get_thread_default());
> +        }
> +    }
>  }
> diff --git a/migration/fd.h b/migration/fd.h
> index b901bc014e..1be980c130 100644
> --- a/migration/fd.h
> +++ b/migration/fd.h
> @@ -20,4 +20,5 @@ void fd_start_incoming_migration(const char *fdname, Error **errp);
>  
>  void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
>                                   Error **errp);
> +int fd_args_get_fd(void);
>  #endif
> diff --git a/migration/file.c b/migration/file.c
> index 1a18e608fc..27ccfc6a1d 100644
> --- a/migration/file.c
> +++ b/migration/file.c
> @@ -11,6 +11,7 @@
>  #include "qemu/error-report.h"
>  #include "qapi/error.h"
>  #include "channel.h"
> +#include "fd.h"
>  #include "file.h"
>  #include "migration.h"
>  #include "io/channel-file.h"
> @@ -58,10 +59,15 @@ bool file_send_channel_create(gpointer opaque, Error **errp)
>  {
>      QIOChannelFile *ioc;
>      int flags = O_WRONLY;
> +    int fd = fd_args_get_fd();
>  
> -    ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
> -    if (!ioc) {
> -        return false;
> +    if (fd && fd != -1) {
> +        ioc = qio_channel_file_new_fd(fd);
> +    } else {
> +        ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
> +        if (!ioc) {
> +            return false;
> +        }
>      }
>  
>      if (!multifd_channel_connect(opaque, QIO_CHANNEL(ioc), errp)) {
> diff --git a/migration/migration.c b/migration/migration.c
> index 32b291a282..ce7e6f5065 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -134,6 +134,10 @@ static bool transport_supports_multi_channels(MigrationAddress *addr)
>      if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) {
>          SocketAddress *saddr = &addr->u.socket;
>  
> +        if (saddr->type == SOCKET_ADDRESS_TYPE_FD) {
> +            return migrate_fixed_ram();
> +        }
> +
>          return (saddr->type == SOCKET_ADDRESS_TYPE_INET ||
>                  saddr->type == SOCKET_ADDRESS_TYPE_UNIX ||
>                  saddr->type == SOCKET_ADDRESS_TYPE_VSOCK);
> -- 
> 2.35.3
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 26/34] tests/qtest/migration: Add a multifd + fixed-ram migration test
  2024-02-20 22:41 ` [PATCH v4 26/34] tests/qtest/migration: Add a multifd + fixed-ram migration test Fabiano Rosas
@ 2024-02-26  8:42   ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-26  8:42 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Thomas Huth,
	Laurent Vivier, Paolo Bonzini

On Tue, Feb 20, 2024 at 07:41:30PM -0300, Fabiano Rosas wrote:
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

One question to double check with you:

[...]

> +#ifndef _WIN32
> +    migration_test_add("/migration/multifd/fd/fixed-ram",
> +                       test_multifd_fd_fixed_ram);
> +#endif

I know we mostly use _WIN32 for these checks, but why not CONFIG_POSIX?

commit d7613ee2165769303d0fa31069c4b6a840f0dae2
Author: Bin Meng <bin.meng@windriver.com>
Date:   Wed Aug 24 17:39:59 2022 +0800

    tests/qtest: migration-test: Skip running test_migrate_fd_proto on win32

It wanted to avoid socketpair(), which makes sense.  However, the QMP command
"getfd", for example, is guarded by CONFIG_POSIX.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 27/34] migration: Add direct-io parameter
  2024-02-20 22:41 ` [PATCH v4 27/34] migration: Add direct-io parameter Fabiano Rosas
  2024-02-21  9:17   ` Markus Armbruster
@ 2024-02-26  8:50   ` Peter Xu
  2024-02-26 13:28     ` Fabiano Rosas
  1 sibling, 1 reply; 79+ messages in thread
From: Peter Xu @ 2024-02-26  8:50 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Eric Blake

On Tue, Feb 20, 2024 at 07:41:31PM -0300, Fabiano Rosas wrote:
> Add the direct-io migration parameter that tells the migration code to
> use O_DIRECT when opening the migration stream file whenever possible.
> 
> This is currently only used with the fixed-ram migration that has a
> clear window guaranteed to perform aligned writes.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

I haven't read into this patch and the following ones yet.

I think we had a discussion last time, and the plan is that we can hopefully
merge part of fixed-ram already for 9.0 (March 12th softfreeze).

I suggest we focus on the first 26 patches and land them first if possible.
If you agree, feel free to respin without direct-io.  Then we can keep the
discussions separate: direct-io can be discussed concurrently, but posted as
another patchset (with proper based-on: tags)?

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 27/34] migration: Add direct-io parameter
  2024-02-26  8:50   ` Peter Xu
@ 2024-02-26 13:28     ` Fabiano Rosas
  0 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-26 13:28 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Eric Blake

Peter Xu <peterx@redhat.com> writes:

> On Tue, Feb 20, 2024 at 07:41:31PM -0300, Fabiano Rosas wrote:
>> Add the direct-io migration parameter that tells the migration code to
>> use O_DIRECT when opening the migration stream file whenever possible.
>> 
>> This is currently only used with the fixed-ram migration that has a
>> clear window guaranteed to perform aligned writes.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> I didn't read into this patch and followings yet.
>
> I think we have a discussion last time and the plan is we hopefully can
> merge part of fixed-ram already for 9.0 (March 12th softfreeze).
>
> I suggest we focus with the first 26 patches and land them first if
> possible.  If you agree then feel free to respin without direct-ios.  Then
> we can keep the discussions separate, and direct-ios can be concurrently
> discussed, but then posted as another patchset (with proper based-on:
> tags)?

Ok, I'll be working on the respin without direct-io.

Thanks

>
> Thanks,


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 19/34] migration/multifd: Allow receiving pages without packets
  2024-02-26  6:58   ` Peter Xu
@ 2024-02-26 19:19     ` Fabiano Rosas
  2024-02-26 20:54       ` Fabiano Rosas
  0 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-26 19:19 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

Peter Xu <peterx@redhat.com> writes:

> On Tue, Feb 20, 2024 at 07:41:23PM -0300, Fabiano Rosas wrote:
>> Currently multifd does not need to have knowledge of pages on the
>> receiving side because all the information needed is within the
>> packets that come in the stream.
>> 
>> We're about to add support to fixed-ram migration, which cannot use
>> packets because it expects the ramblock section in the migration file
>> to contain only the guest pages data.
>> 
>> Add a data structure to transfer pages between the ram migration code
>> and the multifd receiving threads.
>> 
>> We don't want to reuse MultiFDPages_t for two reasons:
>> 
>> a) multifd threads don't really need to know about the data they're
>>    receiving.
>> 
>> b) the receiving side has to be stopped to load the pages, which means
>>    we can experiment with larger granularities than page size when
>>    transferring data.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>> @Peter: a 'quit' flag cannot be used instead of pending_job. The
>> receiving thread needs to know there's no more data coming. If the
>> migration thread sets a 'quit' flag, the multifd thread would see the
>> flag right away and exit.
>
> Hmm.. isn't this exactly what we want?  I'll comment for this inline below.
>
>> The only way is to clear pending_job on the
>> thread and spin once more.
>> ---
>>  migration/file.c    |   1 +
>>  migration/multifd.c | 122 +++++++++++++++++++++++++++++++++++++++++---
>>  migration/multifd.h |  15 ++++++
>>  3 files changed, 131 insertions(+), 7 deletions(-)
>> 
>> diff --git a/migration/file.c b/migration/file.c
>> index 5d4975f43e..22d052a71f 100644
>> --- a/migration/file.c
>> +++ b/migration/file.c
>> @@ -6,6 +6,7 @@
>>   */
>>  
>>  #include "qemu/osdep.h"
>> +#include "exec/ramblock.h"
>>  #include "qemu/cutils.h"
>>  #include "qapi/error.h"
>>  #include "channel.h"
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index 0a5279314d..45a0c7aaa8 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -81,9 +81,15 @@ struct {
>>  
>>  struct {
>>      MultiFDRecvParams *params;
>> +    MultiFDRecvData *data;
>>      /* number of created threads */
>>      int count;
>> -    /* syncs main thread and channels */
>> +    /*
>> +     * For sockets: this is posted once for each MULTIFD_FLAG_SYNC flag.
>> +     *
>> +     * For files: this is only posted at the end of the file load to mark
>> +     *            completion of the load process.
>> +     */
>>      QemuSemaphore sem_sync;
>>      /* global number of generated multifd packets */
>>      uint64_t packet_num;
>> @@ -1110,6 +1116,53 @@ bool multifd_send_setup(void)
>>      return true;
>>  }
>>  
>> +bool multifd_recv(void)
>> +{
>> +    int i;
>> +    static int next_recv_channel;
>> +    MultiFDRecvParams *p = NULL;
>> +    MultiFDRecvData *data = multifd_recv_state->data;
>
> [1]
>
>> +
>> +    /*
>> +     * next_channel can remain from a previous migration that was
>> +     * using more channels, so ensure it doesn't overflow if the
>> +     * limit is lower now.
>> +     */
>> +    next_recv_channel %= migrate_multifd_channels();
>> +    for (i = next_recv_channel;; i = (i + 1) % migrate_multifd_channels()) {
>> +        if (multifd_recv_should_exit()) {
>> +            return false;
>> +        }
>> +
>> +        p = &multifd_recv_state->params[i];
>> +
>> +        /*
>> +         * Safe to read atomically without a lock because the flag is
>> +         * only set by this function below. Reading an old value of
>> +         * true is not an issue because it would only send us looking
>> +         * for the next idle channel.
>> +         */
>> +        if (qatomic_read(&p->pending_job) == false) {
>> +            next_recv_channel = (i + 1) % migrate_multifd_channels();
>> +            break;
>> +        }
>> +    }
>
> IIUC you'll need an smp_mb_acquire() here.  The ordering of "reading
> pending_job" and below must be guaranteed, similar to the sender side.
>

I've been thinking about this even on the sending side.

We shouldn't need the barrier here because there's a control flow
dependency on breaking the loop. I think pending_job *must* be read
prior to here, otherwise the program is just wrong. Does that make
sense?

>> +
>> +    assert(!p->data->size);
>> +    multifd_recv_state->data = p->data;
>
> [2]
>
>> +    p->data = data;
>> +
>> +    qatomic_set(&p->pending_job, true);
>
> Then here:
>
>        qatomic_store_release(&p->pending_job, true);

Ok.

>
> Please consider adding comments above all acquire/release pairs, like on the
> sender side too.
>
>> +    qemu_sem_post(&p->sem);
>> +
>> +    return true;
>> +}
>> +
>> +MultiFDRecvData *multifd_get_recv_data(void)
>> +{
>> +    return multifd_recv_state->data;
>> +}
>
> Can also use it above [1].
>
> I'm thinking maybe we can do something like:
>
> #define  MULTIFD_RECV_DATA_GLOBAL  (multifd_recv_state->data)
>
> Then we can also use it at [2], and replace multifd_get_recv_data()?
>

We need the helper because multifd_recv_state->data needs to be
accessible from ram.c in patch 24.

>> + static void multifd_recv_terminate_threads(Error *err) { int i; @@
>> -1134,11 +1187,26 @@ static void multifd_recv_terminate_threads(Error
>> *err) MultiFDRecvParams *p = &multifd_recv_state->params[i];
>>  
>>          /*
>> -         * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
>> -         * however try to wakeup it without harm in cleanup phase.
>> +         * The migration thread and channels interact differently
>> +         * depending on the presence of packets.
>>           */
>>          if (multifd_use_packets()) {
>> +            /*
>> +             * The channel receives as long as there are packets. When
>> +             * packets end (i.e. MULTIFD_FLAG_SYNC is reached), the
>> +             * channel waits for the migration thread to sync. If the
>> +             * sync never happens, do it here.
>> +             */
>>              qemu_sem_post(&p->sem_sync);
>> +        } else {
>> +            /*
>> +             * The channel waits for the migration thread to give it
>> +             * work. When the migration thread runs out of work, it
>> +             * releases the channel and waits for any pending work to
>> +             * finish. If we reach here (e.g. due to error) before the
>> +             * work runs out, release the channel.
>> +             */
>> +            qemu_sem_post(&p->sem);
>>          }
>>  
>>          /*
>> @@ -1167,6 +1235,7 @@ static void multifd_recv_cleanup_channel(MultiFDRecvParams *p)
>>      p->c = NULL;
>>      qemu_mutex_destroy(&p->mutex);
>>      qemu_sem_destroy(&p->sem_sync);
>> +    qemu_sem_destroy(&p->sem);
>>      g_free(p->name);
>>      p->name = NULL;
>>      p->packet_len = 0;
>> @@ -1184,6 +1253,8 @@ static void multifd_recv_cleanup_state(void)
>>      qemu_sem_destroy(&multifd_recv_state->sem_sync);
>>      g_free(multifd_recv_state->params);
>>      multifd_recv_state->params = NULL;
>> +    g_free(multifd_recv_state->data);
>> +    multifd_recv_state->data = NULL;
>>      g_free(multifd_recv_state);
>>      multifd_recv_state = NULL;
>>  }
>> @@ -1251,11 +1322,11 @@ static void *multifd_recv_thread(void *opaque)
>>          bool has_data = false;
>>          p->normal_num = 0;
>>  
>> -        if (multifd_recv_should_exit()) {
>> -            break;
>> -        }
>> -
>>          if (use_packets) {
>> +            if (multifd_recv_should_exit()) {
>> +                break;
>> +            }
>> +
>>              ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
>>                                             p->packet_len, &local_err);
>>              if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
>> @@ -1274,6 +1345,26 @@ static void *multifd_recv_thread(void *opaque)
>>              p->flags &= ~MULTIFD_FLAG_SYNC;
>>              has_data = !!p->normal_num;
>>              qemu_mutex_unlock(&p->mutex);
>> +        } else {
>> +            /*
>> +             * No packets, so we need to wait for the vmstate code to
>> +             * give us work.
>> +             */
>> +            qemu_sem_wait(&p->sem);
>> +
>> +            if (multifd_recv_should_exit()) {
>> +                break;
>> +            }
>> +
>> +            /*
>> +             * Migration thread did not send work, break and signal
>> +             * sem_sync so it knows we're not lagging behind.
>> +             */
>> +            if (!qatomic_read(&p->pending_job)) {
>> +                break;
>> +            }
>
> In reality, this _must_ be true when reaching here, right?  Since AFAIU
> recv side p->sem is posted only in two conditions:
>
>   1) when there is work (pending_job==true)
>   2) when terminating threads (multifd_recv_should_exit==true)

    3) at multifd_recv_sync_main (pending_job state is unknown)

>
> Then if 2) is checked above, I assume 1) must be the case here?
>

The issue is that 'exiting' is global while p->pending_job is
per-channel. Whenever we set 'exiting', there's no guarantee that all
channels have already passed the should_exit check. Some of them could
still have pending_job=true by the time they see the exiting flag.

We queue all the jobs and immediately call recv_sync_main. It doesn't
matter that all jobs are queued and that we know for sure the work is
done. What matters is that each channel gets to finish its work before
it sees the exit flag. And that depends on checking pending_job.

>> +
>> +            has_data = !!p->data->size;
>>          }
>>  
>>          if (has_data) {
>> @@ -1288,9 +1379,17 @@ static void *multifd_recv_thread(void *opaque)
>>                  qemu_sem_post(&multifd_recv_state->sem_sync);
>>                  qemu_sem_wait(&p->sem_sync);
>>              }
>> +        } else {
>> +            p->total_normal_pages += p->data->size / qemu_target_page_size();
>> +            p->data->size = 0;
>> +            qatomic_set(&p->pending_job, false);
>
> I think it needs to be:
>
>   qatomic_store_release(&p->pending_job, false);
>
> ?
>
> So as to guarantee when the other side sees pending_job==false, size must
> already have been reset.
>

Ok.

>>          }
>>      }
>>  
>> +    if (!use_packets) {
>> +        qemu_sem_post(&p->sem_sync);
>> +    }
>> +
>>      if (local_err) {
>>          multifd_recv_terminate_threads(local_err);
>>          error_free(local_err);
>> @@ -1320,6 +1419,10 @@ int multifd_recv_setup(Error **errp)
>>      thread_count = migrate_multifd_channels();
>>      multifd_recv_state = g_malloc0(sizeof(*multifd_recv_state));
>>      multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count);
>> +
>> +    multifd_recv_state->data = g_new0(MultiFDRecvData, 1);
>> +    multifd_recv_state->data->size = 0;
>> +
>>      qatomic_set(&multifd_recv_state->count, 0);
>>      qatomic_set(&multifd_recv_state->exiting, 0);
>>      qemu_sem_init(&multifd_recv_state->sem_sync, 0);
>> @@ -1330,8 +1433,13 @@ int multifd_recv_setup(Error **errp)
>>  
>>          qemu_mutex_init(&p->mutex);
>>          qemu_sem_init(&p->sem_sync, 0);
>> +        qemu_sem_init(&p->sem, 0);
>> +        p->pending_job = false;
>>          p->id = i;
>>  
>> +        p->data = g_new0(MultiFDRecvData, 1);
>> +        p->data->size = 0;
>> +
>>          if (use_packets) {
>>              p->packet_len = sizeof(MultiFDPacket_t)
>>                  + sizeof(uint64_t) * page_count;
>> diff --git a/migration/multifd.h b/migration/multifd.h
>> index 9a6a7a72df..19188815a3 100644
>> --- a/migration/multifd.h
>> +++ b/migration/multifd.h
>> @@ -13,6 +13,8 @@
>>  #ifndef QEMU_MIGRATION_MULTIFD_H
>>  #define QEMU_MIGRATION_MULTIFD_H
>>  
>> +typedef struct MultiFDRecvData MultiFDRecvData;
>> +
>>  bool multifd_send_setup(void);
>>  void multifd_send_shutdown(void);
>>  int multifd_recv_setup(Error **errp);
>> @@ -23,6 +25,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>>  void multifd_recv_sync_main(void);
>>  int multifd_send_sync_main(void);
>>  bool multifd_queue_page(RAMBlock *block, ram_addr_t offset);
>> +bool multifd_recv(void);
>> +MultiFDRecvData *multifd_get_recv_data(void);
>>  
>>  /* Multifd Compression flags */
>>  #define MULTIFD_FLAG_SYNC (1 << 0)
>> @@ -63,6 +67,13 @@ typedef struct {
>>      RAMBlock *block;
>>  } MultiFDPages_t;
>>  
>> +struct MultiFDRecvData {
>> +    void *opaque;
>> +    size_t size;
>> +    /* for preadv */
>> +    off_t file_offset;
>> +};
>> +
>>  typedef struct {
>>      /* Fields are only written at creating/deletion time */
>>      /* No lock required for them, they are read only */
>> @@ -154,6 +165,8 @@ typedef struct {
>>  
>>      /* syncs main thread and channels */
>>      QemuSemaphore sem_sync;
>> +    /* sem where to wait for more work */
>> +    QemuSemaphore sem;
>>  
>>      /* this mutex protects the following parameters */
>>      QemuMutex mutex;
>> @@ -163,6 +176,8 @@ typedef struct {
>>      uint32_t flags;
>>      /* global number of generated multifd packets */
>>      uint64_t packet_num;
>> +    int pending_job;
>> +    MultiFDRecvData *data;
>>  
>>      /* thread local variables. No locking required */
>>  
>> -- 
>> 2.35.3
>> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 19/34] migration/multifd: Allow receiving pages without packets
  2024-02-26 19:19     ` Fabiano Rosas
@ 2024-02-26 20:54       ` Fabiano Rosas
  0 siblings, 0 replies; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-26 20:54 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

Fabiano Rosas <farosas@suse.de> writes:

> Peter Xu <peterx@redhat.com> writes:
>
>> On Tue, Feb 20, 2024 at 07:41:23PM -0300, Fabiano Rosas wrote:
>>> Currently multifd does not need to have knowledge of pages on the
>>> receiving side because all the information needed is within the
>>> packets that come in the stream.
>>> 
>>> We're about to add support to fixed-ram migration, which cannot use
>>> packets because it expects the ramblock section in the migration file
>>> to contain only the guest pages data.
>>> 
>>> Add a data structure to transfer pages between the ram migration code
>>> and the multifd receiving threads.
>>> 
>>> We don't want to reuse MultiFDPages_t for two reasons:
>>> 
>>> a) multifd threads don't really need to know about the data they're
>>>    receiving.
>>> 
>>> b) the receiving side has to be stopped to load the pages, which means
>>>    we can experiment with larger granularities than page size when
>>>    transferring data.
>>> 
>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>> ---
>>> @Peter: a 'quit' flag cannot be used instead of pending_job. The
>>> receiving thread needs to know there's no more data coming. If the
>>> migration thread sets a 'quit' flag, the multifd thread would see the
>>> flag right away and exit.
>>
>> Hmm.. isn't this exactly what we want?  I'll comment for this inline below.
>>
>>> The only way is to clear pending_job on the
>>> thread and spin once more.
>>> ---
>>>  migration/file.c    |   1 +
>>>  migration/multifd.c | 122 +++++++++++++++++++++++++++++++++++++++++---
>>>  migration/multifd.h |  15 ++++++
>>>  3 files changed, 131 insertions(+), 7 deletions(-)
>>> 
>>> diff --git a/migration/file.c b/migration/file.c
>>> index 5d4975f43e..22d052a71f 100644
>>> --- a/migration/file.c
>>> +++ b/migration/file.c
>>> @@ -6,6 +6,7 @@
>>>   */
>>>  
>>>  #include "qemu/osdep.h"
>>> +#include "exec/ramblock.h"
>>>  #include "qemu/cutils.h"
>>>  #include "qapi/error.h"
>>>  #include "channel.h"
>>> diff --git a/migration/multifd.c b/migration/multifd.c
>>> index 0a5279314d..45a0c7aaa8 100644
>>> --- a/migration/multifd.c
>>> +++ b/migration/multifd.c
>>> @@ -81,9 +81,15 @@ struct {
>>>  
>>>  struct {
>>>      MultiFDRecvParams *params;
>>> +    MultiFDRecvData *data;
>>>      /* number of created threads */
>>>      int count;
>>> -    /* syncs main thread and channels */
>>> +    /*
>>> +     * For sockets: this is posted once for each MULTIFD_FLAG_SYNC flag.
>>> +     *
>>> +     * For files: this is only posted at the end of the file load to mark
>>> +     *            completion of the load process.
>>> +     */
>>>      QemuSemaphore sem_sync;
>>>      /* global number of generated multifd packets */
>>>      uint64_t packet_num;
>>> @@ -1110,6 +1116,53 @@ bool multifd_send_setup(void)
>>>      return true;
>>>  }
>>>  
>>> +bool multifd_recv(void)
>>> +{
>>> +    int i;
>>> +    static int next_recv_channel;
>>> +    MultiFDRecvParams *p = NULL;
>>> +    MultiFDRecvData *data = multifd_recv_state->data;
>>
>> [1]
>>
>>> +
>>> +    /*
>>> +     * next_channel can remain from a previous migration that was
>>> +     * using more channels, so ensure it doesn't overflow if the
>>> +     * limit is lower now.
>>> +     */
>>> +    next_recv_channel %= migrate_multifd_channels();
>>> +    for (i = next_recv_channel;; i = (i + 1) % migrate_multifd_channels()) {
>>> +        if (multifd_recv_should_exit()) {
>>> +            return false;
>>> +        }
>>> +
>>> +        p = &multifd_recv_state->params[i];
>>> +
>>> +        /*
>>> +         * Safe to read atomically without a lock because the flag is
>>> +         * only set by this function below. Reading an old value of
>>> +         * true is not an issue because it would only send us looking
>>> +         * for the next idle channel.
>>> +         */
>>> +        if (qatomic_read(&p->pending_job) == false) {
>>> +            next_recv_channel = (i + 1) % migrate_multifd_channels();
>>> +            break;
>>> +        }
>>> +    }
>>
>> IIUC you'll need an smp_mb_acquire() here.  The ordering of "reading
>> pending_job" and below must be guaranteed, similar to the sender side.
>>
>
> I've been thinking about this even on the sending side.
>
> We shouldn't need the barrier here because there's a control flow
> dependency on breaking the loop. I think pending_job *must* be read
> prior to here, otherwise the program is just wrong. Does that make
> sense?

Hm, never mind actually.  We need to order this against the data->size update
on the other thread anyway.
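
i.e. something like this pairing (a sketch; whether the consumer side ends up
as qatomic_load_acquire() or an explicit smp_mb_acquire() after the loop is
still open):

    /* recv thread: publish "idle" only after p->data is fully consumed */
    p->data->size = 0;
    qatomic_store_release(&p->pending_job, false);

    /* migration thread, in multifd_recv(): pairs with the release above,
     * so p->data->size is guaranteed to read as 0 here */
    if (!qatomic_load_acquire(&p->pending_job)) {
        assert(!p->data->size);
        /* safe to hand this channel a new MultiFDRecvData */
    }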



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 22/34] migration/multifd: Prepare multifd sync for fixed-ram migration
  2024-02-26  7:47   ` Peter Xu
@ 2024-02-26 22:52     ` Fabiano Rosas
  2024-02-27  3:52       ` Peter Xu
  0 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-26 22:52 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

Peter Xu <peterx@redhat.com> writes:

> On Tue, Feb 20, 2024 at 07:41:26PM -0300, Fabiano Rosas wrote:
>> The fixed-ram migration can be performed live or non-live, but it is
>> always asynchronous, i.e. the source machine and the destination
>> machine are not migrating at the same time. We only need some pieces
>> of the multifd sync operations.
>> 
>> multifd_send_sync_main()
>> ------------------------
>>   Issued by the ram migration code on the migration thread, causes the
>>   multifd send channels to synchronize with the migration thread and
>>   makes the sending side emit a packet with the MULTIFD_FLUSH flag.
>> 
>>   With fixed-ram we want to maintain the sync on the sending side
>>   because that provides ordering between the rounds of dirty pages when
>>   migrating live.
>> 
>> MULTIFD_FLUSH
>> -------------
>>   On the receiving side, the presence of the MULTIFD_FLUSH flag on a
>>   packet causes the receiving channels to start synchronizing with the
>>   main thread.
>> 
>>   We're not using packets with fixed-ram, so there's no MULTIFD_FLUSH
>>   flag and therefore no channel sync on the receiving side.
>> 
>> multifd_recv_sync_main()
>> ------------------------
>>   Issued by the migration thread when the ram migration flag
>>   RAM_SAVE_FLAG_MULTIFD_FLUSH is received, causes the migration thread
>>   on the receiving side to start synchronizing with the recv
>>   channels. Due to compatibility, this is also issued when
>>   RAM_SAVE_FLAG_EOS is received.
>> 
>>   For fixed-ram we only need to synchronize the channels at the end of
>>   migration to avoid doing cleanup before the channels have finished
>>   their IO.
>> 
>> Make sure the multifd syncs are only issued at the appropriate
>> times. Note that due to pre-existing backward compatibility issues, we
>> have the multifd_flush_after_each_section property that enables an
>> older behavior of synchronizing channels more frequently (and
>> inefficiently). Fixed-ram should always run with that property
>> disabled (default).
>
> What if the user enables multifd_flush_after_each_section=true?
>
> IMHO we don't necessarily need to attach the fixed-ram loading flush to any
> flag in the stream.  For fixed-ram IIUC all the loads will happen in one
> shot of ram_load() anyway when parsing the ramblock list, so.. how about we
> decouple the fixed-ram load flush from the stream by always do a sync in
> ram_load() unconditionally?

I would like to. But it's not possible because ram_load() is called once
per section. So once for each EOS flag on the stream. We'll have at
least two calls to ram_load(), once due to qemu_savevm_state_iterate()
and another due to qemu_savevm_state_complete_precopy().

The fact that fixed-ram can use just one load doesn't change the fact
that we perform more than one "save". So we'll need to use the FLUSH
flag in this case unfortunately.
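
Just to illustrate where the extra calls come from, the loadvm loop dispatches
each of those sections separately (heavily simplified, not verbatim savevm.c):

    /* qemu_loadvm_state_main(), roughly: every SECTION_PART/SECTION_END for
     * the "ram" SaveStateEntry means another ram_load() call */
    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
        switch (section_type) {
        case QEMU_VM_SECTION_PART:
        case QEMU_VM_SECTION_END:
            ret = qemu_loadvm_section_part_end(f, mis);  /* -> ram_load() */
            break;
        /* ... other section types ... */
        }
    }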

>
> @@ -4368,6 +4367,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>              ret = ram_load_precopy(f);
>          }
>      }
> +
> +    /*
> +     * Fixed-ram migration may queue load tasks to multifd threads; make
> +     * sure they're all done.
> +     */
> +    if (migrate_fixed_ram() && migrate_multifd()) {
> +        multifd_recv_sync_main();
> +    }
> +
>      trace_ram_load_complete(ret, seq_iter);
>  
>      return ret;
>
> Then ram_load() always guarantees synchronous loading of pages, and
> fixed-ram will completely ignore multifd flushes (then we also skip it for
> the ram_save_complete() like what this patch does for the rest).
>
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/ram.c | 19 ++++++++++++++++---
>>  1 file changed, 16 insertions(+), 3 deletions(-)
>> 
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 5932e1b8e1..c7050f6f68 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1369,8 +1369,11 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
>>                  if (ret < 0) {
>>                      return ret;
>>                  }
>> -                qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
>> -                qemu_fflush(f);
>> +
>> +                if (!migrate_fixed_ram()) {
>> +                    qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
>> +                    qemu_fflush(f);
>> +                }
>>              }
>>              /*
>>               * If memory migration starts over, we will meet a dirtied page
>> @@ -3112,7 +3115,8 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>          return ret;
>>      }
>>  
>> -    if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) {
>> +    if (migrate_multifd() && !migrate_multifd_flush_after_each_section()
>> +        && !migrate_fixed_ram()) {
>>          qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
>>      }
>>  
>> @@ -4253,6 +4257,15 @@ static int ram_load_precopy(QEMUFile *f)
>>              break;
>>          case RAM_SAVE_FLAG_EOS:
>>              /* normal exit */
>> +            if (migrate_fixed_ram()) {
>> +                /*
>> +                 * The EOS flag appears multiple times on the
>> +                 * stream. Fixed-ram needs only one sync at the
>> +                 * end. It will be done on the flush flag above.
>> +                 */
>> +                break;
>> +            }
>> +
>>              if (migrate_multifd() &&
>>                  migrate_multifd_flush_after_each_section()) {
>>                  multifd_recv_sync_main();
>> -- 
>> 2.35.3
>> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 22/34] migration/multifd: Prepare multifd sync for fixed-ram migration
  2024-02-26 22:52     ` Fabiano Rosas
@ 2024-02-27  3:52       ` Peter Xu
  2024-02-27 14:00         ` Fabiano Rosas
  0 siblings, 1 reply; 79+ messages in thread
From: Peter Xu @ 2024-02-27  3:52 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Mon, Feb 26, 2024 at 07:52:20PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Tue, Feb 20, 2024 at 07:41:26PM -0300, Fabiano Rosas wrote:
> >> The fixed-ram migration can be performed live or non-live, but it is
> >> always asynchronous, i.e. the source machine and the destination
> >> machine are not migrating at the same time. We only need some pieces
> >> of the multifd sync operations.
> >> 
> >> multifd_send_sync_main()
> >> ------------------------
> >>   Issued by the ram migration code on the migration thread, causes the
> >>   multifd send channels to synchronize with the migration thread and
> >>   makes the sending side emit a packet with the MULTIFD_FLUSH flag.
> >> 
> >>   With fixed-ram we want to maintain the sync on the sending side
> >>   because that provides ordering between the rounds of dirty pages when
> >>   migrating live.
> >> 
> >> MULTIFD_FLUSH
> >> -------------
> >>   On the receiving side, the presence of the MULTIFD_FLUSH flag on a
> >>   packet causes the receiving channels to start synchronizing with the
> >>   main thread.
> >> 
> >>   We're not using packets with fixed-ram, so there's no MULTIFD_FLUSH
> >>   flag and therefore no channel sync on the receiving side.
> >> 
> >> multifd_recv_sync_main()
> >> ------------------------
> >>   Issued by the migration thread when the ram migration flag
> >>   RAM_SAVE_FLAG_MULTIFD_FLUSH is received, causes the migration thread
> >>   on the receiving side to start synchronizing with the recv
> >>   channels. Due to compatibility, this is also issued when
> >>   RAM_SAVE_FLAG_EOS is received.
> >> 
> >>   For fixed-ram we only need to synchronize the channels at the end of
> >>   migration to avoid doing cleanup before the channels have finished
> >>   their IO.
> >> 
> >> Make sure the multifd syncs are only issued at the appropriate
> >> times. Note that due to pre-existing backward compatibility issues, we
> >> have the multifd_flush_after_each_section property that enables an
> >> older behavior of synchronizing channels more frequently (and
> >> inefficiently). Fixed-ram should always run with that property
> >> disabled (default).
> >
> > What if the user enables multifd_flush_after_each_section=true?
> >
> > IMHO we don't necessarily need to attach the fixed-ram loading flush to any
> > flag in the stream.  For fixed-ram IIUC all the loads will happen in one
> > shot of ram_load() anyway when parsing the ramblock list, so.. how about we
> > decouple the fixed-ram load flush from the stream by always do a sync in
> > ram_load() unconditionally?
> 
> I would like to. But it's not possible because ram_load() is called once
> per section. So once for each EOS flag on the stream. We'll have at
> least two calls to ram_load(), once due to qemu_savevm_state_iterate()
> and another due to qemu_savevm_state_complete_precopy().
> 
> The fact that fixed-ram can use just one load doesn't change the fact
> that we perform more than one "save". So we'll need to use the FLUSH
> flag in this case unfortunately.

After I re-read it, I found one more issue.

Now the recv side sync is "once and for all" - it doesn't allow sync_main to
be called a second time, because each thread only syncs once right before it
quits.  That IMHO makes the code much harder to maintain, and we'll need rich
comments to explain why that is happening.

Ideally any "sync main" for the recv threads can be called multiple times.
And IMHO it's not really hard.  It can actually make the code much cleaner by
merging some logic between socket-based and file-based in that regard.

I tried to play with your branch and propose something like this, just to
show what I meant.  This should allow all the new fixed-ram tests to pass
here, while making sync main on the recv side re-entrant and sharing the
logic with socket-based as much as possible:

=====
diff --git a/migration/multifd.c b/migration/multifd.c
index a0202b5661..28480f6cfe 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -86,10 +86,8 @@ struct {
     /* number of created threads */
     int count;
     /*
-     * For sockets: this is posted once for each MULTIFD_FLAG_SYNC flag.
-     *
-     * For files: this is only posted at the end of the file load to mark
-     *            completion of the load process.
+     * This is always posted by the recv threads, the main thread uses it
+     * to wait for recv threads to finish assigned tasks.
      */
     QemuSemaphore sem_sync;
     /* global number of generated multifd packets */
@@ -1316,38 +1314,55 @@ void multifd_recv_cleanup(void)
     multifd_recv_cleanup_state();
 }
 
-
-/*
- * Wait until all channels have finished receiving data. Once this
- * function returns, cleanup routines are safe to run.
- */
-static void multifd_file_recv_sync(void)
+static void multifd_recv_file_sync_request(void)
 {
     int i;
 
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
-        trace_multifd_recv_sync_main_wait(p->id);
-
+        /*
+         * We play a trick here: instead of using a separate pending_sync
+         * to send a sync request (like what we do on senders), we simply
+         * kick the recv thread once without setting pending_job.
+         *
+         * If there's already a pending_job, the thread will only see it
+         * after it processed the current.  If there's no pending_job,
+         * it'll see this immediately.
+         */
         qemu_sem_post(&p->sem);
-
         trace_multifd_recv_sync_main_signal(p->id);
-        qemu_sem_wait(&p->sem_sync);
     }
-    return;
 }
 
+/*
+ * Request a sync for all the multifd recv threads.
+ *
+ * For socket-based, sync request is much more complicated, which relies on
+ * collaborations between both explicit RAM_SAVE_FLAG_MULTIFD_FLUSH in the
+ * main stream, and MULTIFD_FLAG_SYNC flag in per-channel protocol.  Here
+ * it should be invoked by the main stream request.
+ *
+ * For file-based, it is much simpler, because there's no need for a strong
+ * sync semantics between the main thread and the recv threads.  What we
+ * need is only to make sure all recv threads finished their tasks.
+ */
 void multifd_recv_sync_main(void)
 {
+    bool file_based = !multifd_use_packets();
     int i;
 
     if (!migrate_multifd()) {
         return;
     }
 
-    if (!multifd_use_packets()) {
-        return multifd_file_recv_sync();
+    if (file_based) {
+        /*
+         * File-based multifd requires an explicit sync request because
+         * tasks are assigned by the main recv thread, rather than parsed
+         * through the multifd channels.
+         */
+        multifd_recv_file_sync_request();
     }
 
     for (i = 0; i < migrate_multifd_channels(); i++) {
@@ -1356,6 +1371,11 @@ void multifd_recv_sync_main(void)
         trace_multifd_recv_sync_main_wait(p->id);
         qemu_sem_wait(&multifd_recv_state->sem_sync);
     }
+
+    if (file_based) {
+        return;
+    }
+
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
@@ -1420,11 +1440,12 @@ static void *multifd_recv_thread(void *opaque)
             }
 
             /*
-             * Migration thread did not send work, break and signal
-             * sem_sync so it knows we're not lagging behind.
+             * Migration thread did not send work, this emulates
+             * pending_sync, post sem_sync to notify the main thread.
              */
             if (!qatomic_read(&p->pending_job)) {
-                break;
+                qemu_sem_post(&multifd_recv_state->sem_sync);
+                continue;
             }
 
             has_data = !!p->data->size;
@@ -1449,10 +1470,6 @@ static void *multifd_recv_thread(void *opaque)
         }
     }
 
-    if (!use_packets) {
-        qemu_sem_post(&p->sem_sync);
-    }
-
     if (local_err) {
         multifd_recv_terminate_threads(local_err);
         error_free(local_err);

==========

Note that I used multifd_recv_state->sem_sync for the notification rather than
p->sem, not only because socket-based has similar logic using that sem, but
also because the main thread shouldn't care about "which" recv thread has
finished, only that "all recv threads are idle".

Do you think this should work out for us in a nicer way?

Then there's the other issue, on whether we should rely on the migration
stream to flush the recv threads.  My answer is still hopefully a no.

In the ideal case, the fixed-ram image format should even be tailored to not
use a live stream protocol at all.  For example, during the ram iterations we
currently flush quite a lot of QEMU_VM_SECTION_PART sections that contain
mostly rubbish, each ended with RAM_SAVE_FLAG_EOS, and we keep doing this in
the iteration loop.  The real meat is that, while processing
QEMU_VM_SECTION_PART, the src QEMU updates the guest pages at fixed offsets in
the file.  That, however, doesn't really contribute anything valuable to the
migration stream itself (the things sent over to_dst_file).

AFAIU we chose to keep that logic only for simplicity, even if we know those
EOSs and all the RAM stream flags are garbage here.  Now we tend to add a
dependency on part of that garbage, namely RAM_SAVE_FLAG_MULTIFD_FLUSH in
this case, which is useful for socket-based migration but shouldn't be
necessary for file.

I think I have a solution besides ram_load(): ultimately fixed-ram stores all
guest memory in the QEMU_VM_SECTION_START section of ram, under
RAM_SAVE_FLAG_MEM_SIZE (which leads to parse_ramblocks()).  If so, perhaps we
can do a one-shot sync for file at the end of parse_ramblocks()?  Then we
decouple the recv-side sync_main for file-based migration completely from all
stream flags.
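
Roughly like this (a sketch only; I'm assuming the fixed-ram load path always
reaches the tail of parse_ramblocks()):

    /* at the end of parse_ramblocks(), after the per-ramblock loop */
    if (migrate_fixed_ram() && migrate_multifd()) {
        /* every load task was queued while parsing the ramblocks above; a
         * single wait for the recv threads is all the file case needs */
        multifd_recv_sync_main();
    }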

-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 22/34] migration/multifd: Prepare multifd sync for fixed-ram migration
  2024-02-27  3:52       ` Peter Xu
@ 2024-02-27 14:00         ` Fabiano Rosas
  2024-02-27 23:46           ` Peter Xu
  0 siblings, 1 reply; 79+ messages in thread
From: Fabiano Rosas @ 2024-02-27 14:00 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

Peter Xu <peterx@redhat.com> writes:

> On Mon, Feb 26, 2024 at 07:52:20PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Tue, Feb 20, 2024 at 07:41:26PM -0300, Fabiano Rosas wrote:
>> >> The fixed-ram migration can be performed live or non-live, but it is
>> >> always asynchronous, i.e. the source machine and the destination
>> >> machine are not migrating at the same time. We only need some pieces
>> >> of the multifd sync operations.
>> >> 
>> >> multifd_send_sync_main()
>> >> ------------------------
>> >>   Issued by the ram migration code on the migration thread, causes the
>> >>   multifd send channels to synchronize with the migration thread and
>> >>   makes the sending side emit a packet with the MULTIFD_FLUSH flag.
>> >> 
>> >>   With fixed-ram we want to maintain the sync on the sending side
>> >>   because that provides ordering between the rounds of dirty pages when
>> >>   migrating live.
>> >> 
>> >> MULTIFD_FLUSH
>> >> -------------
>> >>   On the receiving side, the presence of the MULTIFD_FLUSH flag on a
>> >>   packet causes the receiving channels to start synchronizing with the
>> >>   main thread.
>> >> 
>> >>   We're not using packets with fixed-ram, so there's no MULTIFD_FLUSH
>> >>   flag and therefore no channel sync on the receiving side.
>> >> 
>> >> multifd_recv_sync_main()
>> >> ------------------------
>> >>   Issued by the migration thread when the ram migration flag
>> >>   RAM_SAVE_FLAG_MULTIFD_FLUSH is received, causes the migration thread
>> >>   on the receiving side to start synchronizing with the recv
>> >>   channels. Due to compatibility, this is also issued when
>> >>   RAM_SAVE_FLAG_EOS is received.
>> >> 
>> >>   For fixed-ram we only need to synchronize the channels at the end of
>> >>   migration to avoid doing cleanup before the channels have finished
>> >>   their IO.
>> >> 
>> >> Make sure the multifd syncs are only issued at the appropriate
>> >> times. Note that due to pre-existing backward compatibility issues, we
>> >> have the multifd_flush_after_each_section property that enables an
>> >> older behavior of synchronizing channels more frequently (and
>> >> inefficiently). Fixed-ram should always run with that property
>> >> disabled (default).
>> >
>> > What if the user enables multifd_flush_after_each_section=true?
>> >
>> > IMHO we don't necessarily need to attach the fixed-ram loading flush to any
>> > flag in the stream.  For fixed-ram IIUC all the loads will happen in one
>> > shot of ram_load() anyway when parsing the ramblock list, so.. how about we
>> > decouple the fixed-ram load flush from the stream by always do a sync in
>> > ram_load() unconditionally?
>> 
>> I would like to. But it's not possible because ram_load() is called once
>> per section. So once for each EOS flag on the stream. We'll have at
>> least two calls to ram_load(), once due to qemu_savevm_state_iterate()
>> and another due to qemu_savevm_state_complete_precopy().
>> 
>> The fact that fixed-ram can use just one load doesn't change the fact
>> that we perform more than one "save". So we'll need to use the FLUSH
>> flag in this case unfortunately.
>
> After I re-read it, I found one more issue.
>
> Now the recv side sync is "once and for all" - it doesn't allow a second
> call to sync_main because it only syncs right before the thread quits.  That
> IMHO makes the code much harder to maintain, and we'll need a rich comment
> to explain why that is happening.
>
> Ideally any "sync main" for recv threads should be callable multiple times.
> And IMHO it's not really hard.  It can actually make the code much cleaner
> by merging some logic between socket-based and file-based in that regard.
>
> I tried to play with your branch and put together something like this, just
> to show what I meant. This should allow all the new fixed-ram tests to pass
> here, while allowing the sync main on the recv side to be re-entrant and
> sharing the logic with socket-based as much as possible:
>
> =====
> diff --git a/migration/multifd.c b/migration/multifd.c
> index a0202b5661..28480f6cfe 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -86,10 +86,8 @@ struct {
>      /* number of created threads */
>      int count;
>      /*
> -     * For sockets: this is posted once for each MULTIFD_FLAG_SYNC flag.
> -     *
> -     * For files: this is only posted at the end of the file load to mark
> -     *            completion of the load process.
> +     * This is always posted by the recv threads, the main thread uses it
> +     * to wait for recv threads to finish assigned tasks.
>       */
>      QemuSemaphore sem_sync;
>      /* global number of generated multifd packets */
> @@ -1316,38 +1314,55 @@ void multifd_recv_cleanup(void)
>      multifd_recv_cleanup_state();
>  }
>  
> -
> -/*
> - * Wait until all channels have finished receiving data. Once this
> - * function returns, cleanup routines are safe to run.
> - */
> -static void multifd_file_recv_sync(void)
> +static void multifd_recv_file_sync_request(void)
>  {
>      int i;
>  
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          MultiFDRecvParams *p = &multifd_recv_state->params[i];
>  
> -        trace_multifd_recv_sync_main_wait(p->id);
> -
> +        /*
> +         * We play a trick here: instead of using a separate pending_sync
> +         * to send a sync request (like what we do on senders), we simply
> +         * kick the recv thread once without setting pending_job.
> +         *
> +         * If there's already a pending_job, the thread will only see it
> +         * after it processed the current.  If there's no pending_job,
> +         * it'll see this immediately.
> +         */
>          qemu_sem_post(&p->sem);
> -
>          trace_multifd_recv_sync_main_signal(p->id);
> -        qemu_sem_wait(&p->sem_sync);
>      }
> -    return;
>  }
>  
> +/*
> + * Request a sync for all the multifd recv threads.
> + *
> + * For socket-based, sync request is much more complicated, which relies on
> + * collaborations between both explicit RAM_SAVE_FLAG_MULTIFD_FLUSH in the
> + * main stream, and MULTIFD_FLAG_SYNC flag in per-channel protocol.  Here
> + * it should be invoked by the main stream request.
> + *
> + * For file-based, it is much simpler, because there's no need for a strong
> + * sync semantics between the main thread and the recv threads.  What we
> + * need is only to make sure all recv threads finished their tasks.
> + */
>  void multifd_recv_sync_main(void)
>  {
> +    bool file_based = !multifd_use_packets();
>      int i;
>  
>      if (!migrate_multifd()) {
>          return;
>      }
>  
> -    if (!multifd_use_packets()) {
> -        return multifd_file_recv_sync();
> +    if (file_based) {
> +        /*
> +         * File-based multifd requires an explicit sync request because
> +         * tasks are assigned by the main recv thread, rather than parsed
> +         * through the multifd channels.
> +         */
> +        multifd_recv_file_sync_request();
>      }
>  
>      for (i = 0; i < migrate_multifd_channels(); i++) {
> @@ -1356,6 +1371,11 @@ void multifd_recv_sync_main(void)
>          trace_multifd_recv_sync_main_wait(p->id);
>          qemu_sem_wait(&multifd_recv_state->sem_sync);
>      }
> +
> +    if (file_based) {
> +        return;
> +    }
> +
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          MultiFDRecvParams *p = &multifd_recv_state->params[i];
>  
> @@ -1420,11 +1440,12 @@ static void *multifd_recv_thread(void *opaque)
>              }
>  
>              /*
> -             * Migration thread did not send work, break and signal
> -             * sem_sync so it knows we're not lagging behind.
> +             * Migration thread did not send work, this emulates
> +             * pending_sync, post sem_sync to notify the main thread.
>               */
>              if (!qatomic_read(&p->pending_job)) {
> -                break;
> +                qemu_sem_post(&multifd_recv_state->sem_sync);
> +                continue;
>              }
>  
>              has_data = !!p->data->size;
> @@ -1449,10 +1470,6 @@ static void *multifd_recv_thread(void *opaque)
>          }
>      }
>  
> -    if (!use_packets) {
> -        qemu_sem_post(&p->sem_sync);
> -    }
> -
>      if (local_err) {
>          multifd_recv_terminate_threads(local_err);
>          error_free(local_err);
>
> ==========
>
> Note that I used multifd_recv_state->sem_sync to send the message rather
> than p->sem, not only because socket-based has similar logic on using that
> sem, but also because main thread shouldn't care about "which" recv thread
> has finished, but "all recv threads are idle".
>
> Do you think this should work out for us in a nicer way?
>

I don't really like the interleaving of file and socket logic at
multifd_recv_sync_main(), but I can live with it.

Waiting on multifd_recv_state->sem_sync is problematic because if a recv
thread hits an error, that wait will hang forever.

Actually, I don't even see this being handled anywhere in the _current_
code, so we probably have a bug there. I guess we need to add one more
"post this sem just because" somewhere, probably in multifd_recv_kick_main().

> Then we talk about the other issue, whether we should rely on the migration
> stream to flush the recv threads.  My answer is still hopefully no.
>
> In the ideal case, the fixed-ram image format should even be tailored to
> not use a live stream protocol.  For example, during ram iterations we
> currently flush quite a lot of ram QEMU_VM_SECTION_PART sections that
> contain mostly rubbish, each ending with RAM_SAVE_FLAG_EOS, and we keep
> doing this in the iteration loop.  The real meat here is that during the
> processing of QEMU_VM_SECTION_PART, the src QEMU updates the guest pages
> at fixed offsets in the file.  That however doesn't really contribute
> anything valuable to the migration stream itself (the things sent over
> to_dst_file).
>
> AFAIU we chose to keep using that logic only for simplicity, even though we
> know those EOSs and all of that RAM stream data are garbage here.  Now we'd
> be adding a dependency on part of that garbage, RAM_SAVE_FLAG_MULTIFD_FLUSH
> in this case, which is useful for socket-based migration but shouldn't be
> necessary for file.
>
> I think I have a solution besides ram_load(): ultimately fixed-ram stores
> all of the guest memory under the QEMU_VM_SECTION_START section of ram, via
> RAM_SAVE_FLAG_MEM_SIZE (which leads to parse_ramblocks()).  If so, perhaps
> we can do a one-shot sync for file at the end of parse_ramblocks()?  Then
> we completely decouple the recv-side sync_main for file-based from all
> stream flags.

Yeah, that could work. I think I'll blacklist all unused flags using the
invalid_flags logic.
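
Concretely, the direction might look something like the sketch below - not
the final patch; migrate_fixed_ram() stands in for whatever the capability
helper ends up being called, and only the placement of the one-shot sync is
the point:

static int parse_ramblocks(QEMUFile *f, ram_addr_t total_ram_bytes)
{
    int ret = 0;

    /*
     * ... existing per-ramblock parsing, which with fixed-ram queues
     * all of the page IO onto the multifd recv threads ...
     */

    if (migrate_fixed_ram()) {   /* capability helper name assumed */
        /* one-shot sync: wait for every recv thread to drain its IO */
        multifd_recv_sync_main();
    }

    return ret;
}

The invalid_flags check in ram_load_precopy() would then simply reject
RAM_SAVE_FLAG_MULTIFD_FLUSH (and the other flags fixed-ram never emits)
instead of acting on them.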

Thanks


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 22/34] migration/multifd: Prepare multifd sync for fixed-ram migration
  2024-02-27 14:00         ` Fabiano Rosas
@ 2024-02-27 23:46           ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2024-02-27 23:46 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana

On Tue, Feb 27, 2024 at 11:00:44AM -0300, Fabiano Rosas wrote:
> I don't really like the interleaving of file and socket logic at
> multifd_recv_sync_main(), but I can live with it.

The idea was to share the "wait" part and the semaphore.  If you don't like
that form, an alternative is to provide three helpers (file_kick, wait,
socket_kick), then:

  if (file) {
    file_kick();
    wait();
  } else {
    wait();
    socket_kick();
  }
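
Spelled out a bit further - purely illustrative, with the helper names
assumed and the per-channel p->sem / state-level sem_sync usage taken from
the diff earlier in the thread - that could look along the lines of:

static void multifd_recv_file_kick(void)
{
    /* wake every recv thread so it reports back idle on sem_sync */
    for (int i = 0; i < migrate_multifd_channels(); i++) {
        qemu_sem_post(&multifd_recv_state->params[i].sem);
    }
}

static void multifd_recv_wait_threads(void)
{
    /* expect one post per thread on the state-level sem_sync */
    for (int i = 0; i < migrate_multifd_channels(); i++) {
        qemu_sem_wait(&multifd_recv_state->sem_sync);
    }
}

static void multifd_recv_socket_kick(void)
{
    /* release each channel so it resumes reading the next packet */
    for (int i = 0; i < migrate_multifd_channels(); i++) {
        qemu_sem_post(&multifd_recv_state->params[i].sem_sync);
    }
}

void multifd_recv_sync_main(void)
{
    if (!migrate_multifd()) {
        return;
    }
    if (!multifd_use_packets()) {
        multifd_recv_file_kick();
        multifd_recv_wait_threads();
    } else {
        multifd_recv_wait_threads();
        multifd_recv_socket_kick();
    }
}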

> 
> Waiting on multifd_recv_state->sem_sync is problematic because if a recv
> thread hits an error, that wait will hang forever.
>
> Actually, I don't even see this being handled anywhere in the _current_
> code, so we probably have a bug there. I guess we need to add one more
> "post this sem just because" somewhere, probably in multifd_recv_kick_main().

Maybe because dest qemu is even less of a concern? If something goes wrong
on dest, then src is probably already failing the migration, and libvirt or
an upper layer can directly kill dest qemu (while we can't do that to src).
But yeah, we should still fix it at some point, to make dest qemu quit
gracefully in error cases. It'll also help more in the future if multifd
supports postcopy, since then neither src nor dst can be killed.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 79+ messages in thread

end of thread, other threads:[~2024-02-27 23:48 UTC | newest]

Thread overview: 79+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-20 22:41 [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 01/34] docs/devel/migration.rst: Document the file transport Fabiano Rosas
2024-02-23  3:01   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 02/34] tests/qtest/migration: Rename fd_proto test Fabiano Rosas
2024-02-23  3:03   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 03/34] tests/qtest/migration: Add a fd + file test Fabiano Rosas
2024-02-23  3:08   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 04/34] migration/multifd: Remove p->quit from recv side Fabiano Rosas
2024-02-23  3:13   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 05/34] migration/multifd: Release recv sem_sync earlier Fabiano Rosas
2024-02-23  3:16   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 06/34] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 07/34] io: Add generic pwritev/preadv interface Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 08/34] io: implement io_pwritev/preadv for QIOChannelFile Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 09/34] io: fsync before closing a file channel Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 10/34] migration/qemu-file: add utility methods for working with seekable channels Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 11/34] migration/ram: Introduce 'fixed-ram' migration capability Fabiano Rosas
2024-02-21  8:41   ` Markus Armbruster
2024-02-21 13:24     ` Fabiano Rosas
2024-02-21 13:50       ` Daniel P. Berrangé
2024-02-21 15:05         ` Fabiano Rosas
2024-02-26  3:07   ` Peter Xu
2024-02-26  3:22   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 12/34] migration: Add fixed-ram URI compatibility check Fabiano Rosas
2024-02-26  3:11   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 13/34] migration/ram: Add outgoing 'fixed-ram' migration Fabiano Rosas
2024-02-26  4:03   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 14/34] migration/ram: Add incoming " Fabiano Rosas
2024-02-26  5:19   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 15/34] tests/qtest/migration: Add tests for fixed-ram file-based migration Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 16/34] migration/multifd: Rename MultiFDSend|RecvParams::data to compress_data Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 17/34] migration/multifd: Decouple recv method from pages Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 18/34] migration/multifd: Allow multifd without packets Fabiano Rosas
2024-02-26  5:57   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 19/34] migration/multifd: Allow receiving pages " Fabiano Rosas
2024-02-26  6:58   ` Peter Xu
2024-02-26 19:19     ` Fabiano Rosas
2024-02-26 20:54       ` Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 20/34] migration/multifd: Add outgoing QIOChannelFile support Fabiano Rosas
2024-02-26  7:10   ` Peter Xu
2024-02-26  7:21   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 21/34] migration/multifd: Add incoming " Fabiano Rosas
2024-02-26  7:34   ` Peter Xu
2024-02-26  7:53     ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 22/34] migration/multifd: Prepare multifd sync for fixed-ram migration Fabiano Rosas
2024-02-26  7:47   ` Peter Xu
2024-02-26 22:52     ` Fabiano Rosas
2024-02-27  3:52       ` Peter Xu
2024-02-27 14:00         ` Fabiano Rosas
2024-02-27 23:46           ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 23/34] migration/multifd: Support outgoing fixed-ram stream format Fabiano Rosas
2024-02-26  8:08   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 24/34] migration/multifd: Support incoming " Fabiano Rosas
2024-02-26  8:30   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 25/34] migration/multifd: Add fixed-ram support to fd: URI Fabiano Rosas
2024-02-26  8:37   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 26/34] tests/qtest/migration: Add a multifd + fixed-ram migration test Fabiano Rosas
2024-02-26  8:42   ` Peter Xu
2024-02-20 22:41 ` [PATCH v4 27/34] migration: Add direct-io parameter Fabiano Rosas
2024-02-21  9:17   ` Markus Armbruster
2024-02-26  8:50   ` Peter Xu
2024-02-26 13:28     ` Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 28/34] migration/multifd: Add direct-io support Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 29/34] tests/qtest/migration: Add tests for file migration with direct-io Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 30/34] monitor: Honor QMP request for fd removal immediately Fabiano Rosas
2024-02-21  9:20   ` Markus Armbruster
2024-02-20 22:41 ` [PATCH v4 31/34] monitor: Extract fdset fd flags comparison into a function Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 32/34] monitor: fdset: Match against O_DIRECT Fabiano Rosas
2024-02-21  9:27   ` Markus Armbruster
2024-02-21 13:37     ` Fabiano Rosas
2024-02-22  6:56       ` Markus Armbruster
2024-02-22 13:26         ` Fabiano Rosas
2024-02-22 14:44           ` Markus Armbruster
2024-02-20 22:41 ` [PATCH v4 33/34] migration: Add support for fdset with multifd + file Fabiano Rosas
2024-02-20 22:41 ` [PATCH v4 34/34] tests/qtest/migration: Add a test for fixed-ram with passing of fds Fabiano Rosas
2024-02-23  2:59 ` [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram Peter Xu
2024-02-23 13:48   ` Claudio Fontana
2024-02-23 14:22   ` Fabiano Rosas
2024-02-26  6:15 ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).