[PATCH v3 00/14] File-based migration support and fixed-ram features

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v3 00/14] File-based migration support and fixed-ram features
@ 2022-10-28 10:39 Nikolay Borisov
  2022-10-28 10:39 ` [PATCH v3 01/14] migration: support file: uri for source migration Nikolay Borisov
                   ` (14 more replies)
  0 siblings, 15 replies; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

Here's the 3rd version of file-based migration support [0]. For background
check the cover letter of the initial. The main changes are :

- Updated commit message as per Daniel Berrange's suggestino for Patches 1-2

- Fixed tab in various pages

- Added comments better explaining how json_writer_start_object in
qemu_savevm_state_header is matched and also squashed the analyze-migration.py
parts into patch 3

- Reworked the way pwritv/preadv are introduced. Now there are generic
callbacks in QIOChannel that are implemented for the QIOChannelFile.

- Separated the introduction of QEMUFile-related helpers from the patch
introducing the io interfaces.

- Added qtests for the file-based migration as well as for the fixed-ram
feature.

[0] https://lore.kernel.org/qemu-devel/20221004123733.2745519-1-nborisov@suse.com/

Nikolay Borisov (14):
  migration: support file: uri for source migration
  migration: Add support for 'file:' uri for incoming migration
  migration: Initial support of fixed-ram feature for
    analyze-migration.py
  io: Add generic pwritev/preadv interface
  io: implement io_pwritev for QIOChannelFile
  io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file
  migration/qemu-file: add utility methods for working with seekable
    channels
  io: Add preadv support to QIOChannelFile
  migration: add qemu_get_buffer_at
  migration/ram: Introduce 'fixed-ram' migration stream capability
  migration: Refactor precopy ram loading code
  migration: Add support for 'fixed-ram' migration restore
  tests: Add migrate_incoming_qmp helper
  tests/qtest: migration-test: Add tests for file-based migration

 include/exec/ramblock.h             |   7 +
 include/io/channel.h                |  50 +++++
 include/migration/qemu-file-types.h |   2 +
 io/channel-file.c                   |  61 ++++++
 io/channel.c                        |  26 +++
 migration/file.c                    |  38 ++++
 migration/file.h                    |  10 +
 migration/meson.build               |   1 +
 migration/migration.c               |  61 +++++-
 migration/migration.h               |   6 +
 migration/qemu-file.c               |  82 +++++++
 migration/qemu-file.h               |   4 +
 migration/ram.c                     | 328 +++++++++++++++++++++-------
 migration/savevm.c                  |  48 ++--
 qapi/migration.json                 |   2 +-
 scripts/analyze-migration.py        |  49 ++++-
 tests/qtest/migration-helpers.c     |  19 ++
 tests/qtest/migration-helpers.h     |   4 +
 tests/qtest/migration-test.c        |  46 ++++
 19 files changed, 743 insertions(+), 101 deletions(-)
 create mode 100644 migration/file.c
 create mode 100644 migration/file.h

--
2.34.1



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 01/14] migration: support file: uri for source migration
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
@ 2022-10-28 10:39 ` Nikolay Borisov
  2022-10-28 10:39 ` [PATCH v3 02/14] migration: Add support for 'file:' uri for incoming migration Nikolay Borisov
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

Implement support for a "file:" uri so that a migration can be initiated
directly to a file from QEMU.

Unlike other migration protocol backends, the 'file' protocol cannot
honour non-blocking mode. POSIX file/block storage will always report
ready to read/write, regardless of how slow the underlying storage
will be at servicing the request.

For outgoing migration this limitation is not a serious problem as
the migration data transfer always happens in a dedicated thread.
It may, however, result in delays in honouring a request to cancel
the migration operation.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 migration/file.c      | 23 +++++++++++++++++++++++
 migration/file.h      |  9 +++++++++
 migration/meson.build |  1 +
 migration/migration.c |  3 +++
 4 files changed, 36 insertions(+)
 create mode 100644 migration/file.c
 create mode 100644 migration/file.h

diff --git a/migration/file.c b/migration/file.c
new file mode 100644
index 000000000000..02896a7cab99
--- /dev/null
+++ b/migration/file.c
@@ -0,0 +1,23 @@
+#include "qemu/osdep.h"
+#include "channel.h"
+#include "io/channel-file.h"
+#include "file.h"
+#include "qemu/error-report.h"
+
+
+void file_start_outgoing_migration(MigrationState *s, const char *fname, Error **errp)
+{
+	QIOChannelFile *ioc;
+
+	ioc = qio_channel_file_new_path(fname, O_CREAT|O_TRUNC|O_WRONLY, 0660, errp);
+	if (!ioc) {
+		error_report("Error creating a channel");
+		return;
+	}
+
+	qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-outgoing");
+	migration_channel_connect(s, QIO_CHANNEL(ioc), NULL, NULL);
+	object_unref(OBJECT(ioc));
+}
+
+
diff --git a/migration/file.h b/migration/file.h
new file mode 100644
index 000000000000..d476eb1157f9
--- /dev/null
+++ b/migration/file.h
@@ -0,0 +1,9 @@
+#ifndef QEMU_MIGRATION_FILE_H
+#define QEMU_MIGRATION_FILE_H
+
+void file_start_outgoing_migration(MigrationState *s,
+                                   const char *filename,
+                                   Error **errp);
+
+#endif
+
diff --git a/migration/meson.build b/migration/meson.build
index 690487cf1a81..30a8392701c3 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -17,6 +17,7 @@ softmmu_ss.add(files(
   'colo.c',
   'exec.c',
   'fd.c',
+  'file.c',
   'global_state.c',
   'migration.c',
   'multifd.c',
diff --git a/migration/migration.c b/migration/migration.c
index 739bb683f34f..b5373db38535 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -20,6 +20,7 @@
 #include "migration/blocker.h"
 #include "exec.h"
 #include "fd.h"
+#include "file.h"
 #include "socket.h"
 #include "sysemu/runstate.h"
 #include "sysemu/sysemu.h"
@@ -2415,6 +2416,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         exec_start_outgoing_migration(s, p, &local_err);
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_outgoing_migration(s, p, &local_err);
+    } else if (strstart(uri, "file:", &p)) {
+	file_start_outgoing_migration(s, p, &local_err);
     } else {
         if (!(has_resume && resume)) {
             yank_unregister_instance(MIGRATION_YANK_INSTANCE);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 02/14] migration: Add support for 'file:' uri for incoming migration
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
  2022-10-28 10:39 ` [PATCH v3 01/14] migration: support file: uri for source migration Nikolay Borisov
@ 2022-10-28 10:39 ` Nikolay Borisov
  2023-02-10 15:58   ` Daniel P. Berrangé
  2022-10-28 10:39 ` [PATCH v3 03/14] migration: Initial support of fixed-ram feature for analyze-migration.py Nikolay Borisov
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

This is a counterpart to the 'file:' uri support for source migration,
now a file can also serve as the source of an incoming migration.

Unlike other migration protocol backends, the 'file' protocol cannot
honour non-blocking mode. POSIX file/block storage will always report
ready to read/write, regardless of how slow the underlying storage
will be at servicing the request.

For incoming migration this limitation may result in the main event
loop not being fully responsive while loading the VM state. This
won't impact the VM since it is not running at this phase, however,
it may impact management applications.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 migration/file.c      | 15 +++++++++++++++
 migration/file.h      |  1 +
 migration/migration.c |  2 ++
 3 files changed, 18 insertions(+)

diff --git a/migration/file.c b/migration/file.c
index 02896a7cab99..93eb718aa0f4 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -21,3 +21,18 @@ void file_start_outgoing_migration(MigrationState *s, const char *fname, Error *
 }
 
 
+void file_start_incoming_migration(const char *fname, Error **errp)
+{
+	QIOChannelFile *ioc;
+
+	ioc = qio_channel_file_new_path(fname, O_RDONLY, 0, errp);
+	if (!ioc) {
+		error_report("Error creating a channel");
+		return;
+	}
+
+	qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming");
+	migration_channel_process_incoming(QIO_CHANNEL(ioc));
+	object_unref(OBJECT(ioc));
+}
+
diff --git a/migration/file.h b/migration/file.h
index d476eb1157f9..cdbd291322d4 100644
--- a/migration/file.h
+++ b/migration/file.h
@@ -5,5 +5,6 @@ void file_start_outgoing_migration(MigrationState *s,
                                    const char *filename,
                                    Error **errp);
 
+void file_start_incoming_migration(const char *fname, Error **errp);
 #endif
 
diff --git a/migration/migration.c b/migration/migration.c
index b5373db38535..eafd887254dd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -506,6 +506,8 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
         exec_start_incoming_migration(p, errp);
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_incoming_migration(p, errp);
+    } else if (strstart(uri, "file:", &p)) {
+        file_start_incoming_migration(p, errp);
     } else {
         error_setg(errp, "unknown migration protocol: %s", uri);
     }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 02/14] migration: Add support for 'file:' uri for incoming migration
  2022-10-28 10:39 ` [PATCH v3 02/14] migration: Add support for 'file:' uri for incoming migration Nikolay Borisov
@ 2023-02-10 15:58   ` Daniel P. Berrangé
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel P. Berrangé @ 2023-02-10 15:58 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dgilbert, qemu-devel, jfehlig, Claudio.Fontana, dfaggioli

On Fri, Oct 28, 2022 at 01:39:02PM +0300, Nikolay Borisov wrote:
> This is a counterpart to the 'file:' uri support for source migration,
> now a file can also serve as the source of an incoming migration.
> 
> Unlike other migration protocol backends, the 'file' protocol cannot
> honour non-blocking mode. POSIX file/block storage will always report
> ready to read/write, regardless of how slow the underlying storage
> will be at servicing the request.
> 
> For incoming migration this limitation may result in the main event
> loop not being fully responsive while loading the VM state. This
> won't impact the VM since it is not running at this phase, however,
> it may impact management applications.

Looking at the QEMU incoming migration code, we're relying on
coroutines. Specifically we initiate the load of state / RAM
in process_incoming_migration_co, and the coroutine magic
is handled in qemu_fill_buffer in migration/qemu-file.c which
does the coroutine yield.

This should be allowing the QMP monitor to be functional while
incoming migration is loaded. That said, I expect a great many
QMP commands will cause problems if invoked, especially any
that hotplug/unplug stuff. IOW, probably only safe to use the
query-xxx commands in general.

With the unhelpful POSIX semantics for poll() on plain files /
block devices, the qemu_fill_buffer method is unlikely to ever
see  QIO_CHANNEL_ERR_BLOCK, so is unlikely to ever yield in the
coroutine. Thus the QMP monitor will not process data until the
migration load is finished.

For libvirt this is not actually a problem in practice from what
I understand, as we don't try to run any QMP commands, we merely
wait until we see an even indicating migration is finished. 

To solve this problem we would need to make the incoming migration
run inside a thread, instead of coroutine. This does not appear
like it would be too hard, mostly just need to replace the
coroutine creation command with a thread creation command and
switch the QEMUFile to blocking  mode.

The open question is whether there are any locking/concurrency
concerns with doing this. To some extent those concerns would
already exist, even with using coroutines, as there are other
threads present, and QMP commands can change state in the
middle of a migration load if an app is foolish enough to
request it.  So maybe switching to threads isn't actually
that hard ?

Either way, I think libvirt could live with the blocked QMP
monitor during incoming migrate in the short/medium term.
So this doesn't look like a blocker to me currently.

> 
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  migration/file.c      | 15 +++++++++++++++
>  migration/file.h      |  1 +
>  migration/migration.c |  2 ++
>  3 files changed, 18 insertions(+)
> 
> diff --git a/migration/file.c b/migration/file.c
> index 02896a7cab99..93eb718aa0f4 100644
> --- a/migration/file.c
> +++ b/migration/file.c
> @@ -21,3 +21,18 @@ void file_start_outgoing_migration(MigrationState *s, const char *fname, Error *
>  }
>  
>  
> +void file_start_incoming_migration(const char *fname, Error **errp)
> +{
> +	QIOChannelFile *ioc;
> +
> +	ioc = qio_channel_file_new_path(fname, O_RDONLY, 0, errp);
> +	if (!ioc) {
> +		error_report("Error creating a channel");
> +		return;
> +	}
> +
> +	qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming");
> +	migration_channel_process_incoming(QIO_CHANNEL(ioc));
> +	object_unref(OBJECT(ioc));
> +}
> +
> diff --git a/migration/file.h b/migration/file.h
> index d476eb1157f9..cdbd291322d4 100644
> --- a/migration/file.h
> +++ b/migration/file.h
> @@ -5,5 +5,6 @@ void file_start_outgoing_migration(MigrationState *s,
>                                     const char *filename,
>                                     Error **errp);
>  
> +void file_start_incoming_migration(const char *fname, Error **errp);
>  #endif
>  
> diff --git a/migration/migration.c b/migration/migration.c
> index b5373db38535..eafd887254dd 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -506,6 +506,8 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
>          exec_start_incoming_migration(p, errp);
>      } else if (strstart(uri, "fd:", &p)) {
>          fd_start_incoming_migration(p, errp);
> +    } else if (strstart(uri, "file:", &p)) {
> +        file_start_incoming_migration(p, errp);
>      } else {
>          error_setg(errp, "unknown migration protocol: %s", uri);
>      }
> -- 
> 2.34.1
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 03/14] migration: Initial support of fixed-ram feature for analyze-migration.py
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
  2022-10-28 10:39 ` [PATCH v3 01/14] migration: support file: uri for source migration Nikolay Borisov
  2022-10-28 10:39 ` [PATCH v3 02/14] migration: Add support for 'file:' uri for incoming migration Nikolay Borisov
@ 2022-10-28 10:39 ` Nikolay Borisov
  2023-02-10 16:13   ` Daniel P. Berrangé
  2022-10-28 10:39 ` [PATCH v3 04/14] io: Add generic pwritev/preadv interface Nikolay Borisov
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

In order to allow analyze-migration.py script to work with migration
streams that have the 'fixed-ram' capability set it's required to have
access to the stream's configuration object. This commit enables this
by making migration json writer part of MigrationState struct, allowing
the configuration object be serialized to json.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 migration/migration.c        |  5 ++++
 migration/migration.h        |  3 +++
 migration/savevm.c           | 47 ++++++++++++++++++++++------------
 scripts/analyze-migration.py | 49 +++++++++++++++++++++++++++++++++---
 4 files changed, 85 insertions(+), 19 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index eafd887254dd..11ceea340702 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1897,6 +1897,8 @@ static void migrate_fd_cleanup(MigrationState *s)
     g_free(s->hostname);
     s->hostname = NULL;
 
+    json_writer_free(s->vmdesc);
+
     qemu_savevm_state_cleanup();
 
     if (s->to_dst_file) {
@@ -2155,6 +2157,7 @@ void migrate_init(MigrationState *s)
     error_free(s->error);
     s->error = NULL;
     s->hostname = NULL;
+    s->vmdesc = NULL;
 
     migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
@@ -4270,6 +4273,8 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
         return;
     }
 
+    s->vmdesc = json_writer_new(false);
+
     if (multifd_save_setup(&local_err) != 0) {
         error_report_err(local_err);
         migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
diff --git a/migration/migration.h b/migration/migration.h
index cdad8aceaaab..96f27aba2210 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -17,6 +17,7 @@
 #include "exec/cpu-common.h"
 #include "hw/qdev-core.h"
 #include "qapi/qapi-types-migration.h"
+#include "qapi/qmp/json-writer.h"
 #include "qemu/thread.h"
 #include "qemu/coroutine_int.h"
 #include "io/channel.h"
@@ -261,6 +262,8 @@ struct MigrationState {
 
     int state;
 
+    JSONWriter *vmdesc;
+
     /* State related to return path */
     struct {
         /* Protected by qemu_file_lock */
diff --git a/migration/savevm.c b/migration/savevm.c
index 48e85c052c2c..44a222888306 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1137,13 +1137,23 @@ void qemu_savevm_non_migratable_list(strList **reasons)
 
 void qemu_savevm_state_header(QEMUFile *f)
 {
+    MigrationState *s = migrate_get_current();
     trace_savevm_state_header();
     qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
     qemu_put_be32(f, QEMU_VM_FILE_VERSION);
 
-    if (migrate_get_current()->send_configuration) {
+    if (s->send_configuration) {
         qemu_put_byte(f, QEMU_VM_CONFIGURATION);
-        vmstate_save_state(f, &vmstate_configuration, &savevm_state, 0);
+        /*
+         * This starts the main json object and is paired with the
+         * json_writer_end_object in
+         * qemu_savevm_state_complete_precopy_non_iterable
+         */
+        json_writer_start_object(s->vmdesc, NULL);
+        json_writer_start_object(s->vmdesc, "configuration");
+        vmstate_save_state(f, &vmstate_configuration, &savevm_state, s->vmdesc);
+        json_writer_end_object(s->vmdesc);
+
     }
 }
 
@@ -1364,15 +1374,16 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
                                                     bool in_postcopy,
                                                     bool inactivate_disks)
 {
-    g_autoptr(JSONWriter) vmdesc = NULL;
+    MigrationState *s = migrate_get_current();
     int vmdesc_len;
     SaveStateEntry *se;
     int ret;
 
-    vmdesc = json_writer_new(false);
-    json_writer_start_object(vmdesc, NULL);
-    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
-    json_writer_start_array(vmdesc, "devices");
+    if (!s->send_configuration) {
+        json_writer_start_object(s->vmdesc, NULL);
+    }
+    json_writer_int64(s->vmdesc, "page_size", qemu_target_page_size());
+    json_writer_start_array(s->vmdesc, "devices");
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
 
         if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
@@ -1385,12 +1396,12 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
 
         trace_savevm_section_start(se->idstr, se->section_id);
 
-        json_writer_start_object(vmdesc, NULL);
-        json_writer_str(vmdesc, "name", se->idstr);
-        json_writer_int64(vmdesc, "instance_id", se->instance_id);
+        json_writer_start_object(s->vmdesc, NULL);
+        json_writer_str(s->vmdesc, "name", se->idstr);
+        json_writer_int64(s->vmdesc, "instance_id", se->instance_id);
 
         save_section_header(f, se, QEMU_VM_SECTION_FULL);
-        ret = vmstate_save(f, se, vmdesc);
+        ret = vmstate_save(f, se, s->vmdesc);
         if (ret) {
             qemu_file_set_error(f, ret);
             return ret;
@@ -1398,7 +1409,7 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         trace_savevm_section_end(se->idstr, se->section_id, 0);
         save_section_footer(f, se);
 
-        json_writer_end_object(vmdesc);
+        json_writer_end_object(s->vmdesc);
     }
 
     if (inactivate_disks) {
@@ -1417,14 +1428,18 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         qemu_put_byte(f, QEMU_VM_EOF);
     }
 
-    json_writer_end_array(vmdesc);
-    json_writer_end_object(vmdesc);
-    vmdesc_len = strlen(json_writer_get(vmdesc));
+    json_writer_end_array(s->vmdesc);
+    /*
+     * This finishes the top level json object, its opoening counter part
+     * is either in this function or in qemu_savevm_state_header
+     */
+    json_writer_end_object(s->vmdesc);
+    vmdesc_len = strlen(json_writer_get(s->vmdesc));
 
     if (should_send_vmdesc()) {
         qemu_put_byte(f, QEMU_VM_VMDESCRIPTION);
         qemu_put_be32(f, vmdesc_len);
-        qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
+        qemu_put_buffer(f, (uint8_t *)json_writer_get(s->vmdesc), vmdesc_len);
     }
 
     return 0;
diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
index b82a1b0c58c4..9785a640fbf8 100755
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -23,6 +23,7 @@
 import collections
 import struct
 import sys
+import math
 
 
 def mkdir_p(path):
@@ -119,11 +120,16 @@ def __init__(self, file, version_id, ramargs, section_key):
         self.file = file
         self.section_key = section_key
         self.TARGET_PAGE_SIZE = ramargs['page_size']
+        self.TARGET_PAGE_BITS = math.log2(self.TARGET_PAGE_SIZE)
         self.dump_memory = ramargs['dump_memory']
         self.write_memory = ramargs['write_memory']
+        self.fixed_ram = ramargs['fixed-ram']
         self.sizeinfo = collections.OrderedDict()
+        self.bitmap_offset = collections.OrderedDict()
+        self.pages_offset = collections.OrderedDict()
         self.data = collections.OrderedDict()
         self.data['section sizes'] = self.sizeinfo
+        self.ram_read = False
         self.name = ''
         if self.write_memory:
             self.files = { }
@@ -140,7 +146,13 @@ def __str__(self):
     def getDict(self):
         return self.data
 
+    def write_or_dump_fixed_ram(self):
+        pass
+
     def read(self):
+        if self.fixed_ram and self.ram_read:
+            return
+
         # Read all RAM sections
         while True:
             addr = self.file.read64()
@@ -167,7 +179,25 @@ def read(self):
                         f.truncate(0)
                         f.truncate(len)
                         self.files[self.name] = f
+
+                    if self.fixed_ram:
+                        bitmap_len = self.file.read32()
+                        # skip the pages_offset which we don't need
+                        offset = self.file.tell() + 8
+                        self.bitmap_offset[self.name] = offset
+                        offset = ((offset + bitmap_len + self.TARGET_PAGE_SIZE - 1) // self.TARGET_PAGE_SIZE) * self.TARGET_PAGE_SIZE
+                        self.pages_offset[self.name] = offset
+                        self.file.file.seek(offset + len)
+
                 flags &= ~self.RAM_SAVE_FLAG_MEM_SIZE
+                if self.fixed_ram:
+                    self.ram_read = True
+                # now we should rewind to the ram page offset of the first
+                # ram section
+                if self.fixed_ram:
+                    if self.write_memory or self.dump_memory:
+                        self.write_or_dump_fixed_ram()
+                        return
 
             if flags & self.RAM_SAVE_FLAG_COMPRESS:
                 if flags & self.RAM_SAVE_FLAG_CONTINUE:
@@ -208,7 +238,7 @@ def read(self):
 
             # End of RAM section
             if flags & self.RAM_SAVE_FLAG_EOS:
-                break
+               return
 
             if flags != 0:
                 raise Exception("Unknown RAM flags: %x" % flags)
@@ -521,6 +551,7 @@ def read(self, desc_only = False, dump_memory = False, write_memory = False):
         ramargs['page_size'] = self.vmsd_desc['page_size']
         ramargs['dump_memory'] = dump_memory
         ramargs['write_memory'] = write_memory
+        ramargs['fixed-ram'] = False
         self.section_classes[('ram',0)][1] = ramargs
 
         while True:
@@ -528,8 +559,20 @@ def read(self, desc_only = False, dump_memory = False, write_memory = False):
             if section_type == self.QEMU_VM_EOF:
                 break
             elif section_type == self.QEMU_VM_CONFIGURATION:
-                section = ConfigurationSection(file)
-                section.read()
+                config_desc = self.vmsd_desc.get('configuration')
+                if config_desc is not None:
+                    config = VMSDSection(file, 1, config_desc, 'configuration')
+                    config.read()
+                    caps = config.data.get("configuration/capabilities")
+                    if caps is not None:
+                        caps = caps.data["capabilities"]
+                        if type(caps) != list:
+                            caps = [caps]
+                        for i in caps:
+                            # chomp out string length
+                            cap = i.data[1:].decode("utf8")
+                            if cap == "fixed-ram":
+                                ramargs['fixed-ram'] = True
             elif section_type == self.QEMU_VM_SECTION_START or section_type == self.QEMU_VM_SECTION_FULL:
                 section_id = file.read32()
                 name = file.readstr()
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 03/14] migration: Initial support of fixed-ram feature for analyze-migration.py
  2022-10-28 10:39 ` [PATCH v3 03/14] migration: Initial support of fixed-ram feature for analyze-migration.py Nikolay Borisov
@ 2023-02-10 16:13   ` Daniel P. Berrangé
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel P. Berrangé @ 2023-02-10 16:13 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dgilbert, qemu-devel, jfehlig, Claudio.Fontana, dfaggioli

On Fri, Oct 28, 2022 at 01:39:03PM +0300, Nikolay Borisov wrote:
> In order to allow analyze-migration.py script to work with migration
> streams that have the 'fixed-ram' capability set it's required to have
> access to the stream's configuration object. This commit enables this
> by making migration json writer part of MigrationState struct, allowing
> the configuration object be serialized to json.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  migration/migration.c        |  5 ++++
>  migration/migration.h        |  3 +++
>  migration/savevm.c           | 47 ++++++++++++++++++++++------------
>  scripts/analyze-migration.py | 49 +++++++++++++++++++++++++++++++++---
>  4 files changed, 85 insertions(+), 19 deletions(-)

This patch has a minor clash with a patch that's since merged
into git master

  commit e3bf5e68e2a97898f37834c47449101172ced123
  Author: David Hildenbrand <david@redhat.com>
  Date:   Tue Jan 17 12:22:43 2023 +0100

    migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup()

so needs some adjustment, but nothing too major

The actual goal of this change makes sense to me.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 04/14] io: Add generic pwritev/preadv interface
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
                   ` (2 preceding siblings ...)
  2022-10-28 10:39 ` [PATCH v3 03/14] migration: Initial support of fixed-ram feature for analyze-migration.py Nikolay Borisov
@ 2022-10-28 10:39 ` Nikolay Borisov
  2023-02-10 16:26   ` Daniel P. Berrangé
  2022-10-28 10:39 ` [PATCH v3 05/14] io: implement io_pwritev for QIOChannelFile Nikolay Borisov
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

Introduce basic pwriteve/preadv support in the generic channel layer.
SPecific implementation will follow for the file channel as this is
required in order to support migration streams with fixed location of
each ram page.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 include/io/channel.h | 49 ++++++++++++++++++++++++++++++++++++++++++++
 io/channel.c         | 26 +++++++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/include/io/channel.h b/include/io/channel.h
index c680ee748021..6b10bce8bbdf 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -124,6 +124,16 @@ struct QIOChannelClass {
                            Error **errp);
 
     /* Optional callbacks */
+    ssize_t (*io_pwritev)(QIOChannel *ioc,
+                       const struct iovec *iov,
+                       size_t niov,
+                       off_t offset,
+                       Error **errp);
+    ssize_t (*io_preadv)(QIOChannel *ioc,
+                      const struct iovec *iov,
+                      size_t niov,
+                      off_t offset,
+                      Error **errp);
     int (*io_shutdown)(QIOChannel *ioc,
                        QIOChannelShutdown how,
                        Error **errp);
@@ -504,6 +514,45 @@ int qio_channel_set_blocking(QIOChannel *ioc,
 int qio_channel_close(QIOChannel *ioc,
                       Error **errp);
 
+
+/**
+ * qio_channel_io_pwritev
+ * @ioc: the channel object
+ * @iov: the array of memory regions to write data from
+ * @niov: the length of the @iov array
+ * @offset: offset in the channel where writes should begin
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Not all implementations will support this facility, so may report an error.
+ * To avoid errors, the caller may check for the feature flag
+ * QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
+ *
+ * Behaves as qio_channel_writev_full, apart from not supporting sending of file
+ * handles as well as beginning the write at the passed @offset
+ *
+ */
+ssize_t qio_channel_io_pwritev(QIOChannel *ioc, const struct iovec *iov,
+                               size_t niov, off_t offset, Error **errp);
+
+
+/**
+ * qio_channel_io_preadv
+ * @ioc: the channel object
+ * @iov: the array of memory regions to read data into
+ * @niov: the length of the @iov array
+ * @offset: offset in the channel where writes should begin
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Not all implementations will support this facility, so may report an error.
+ * To avoid errors, the caller may check for the feature flag
+ * QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
+ *
+ * Behaves as qio_channel_readv_full, apart from not supporting receiving of file
+ * handles as well as beginning the read at the passed @offset
+ *
+ */
+ssize_t qio_channel_io_preadv(QIOChannel *ioc, const struct iovec *iov,
+                             size_t niov, off_t offset, Error **errp);
 /**
  * qio_channel_shutdown:
  * @ioc: the channel object
diff --git a/io/channel.c b/io/channel.c
index 0640941ac573..f5ac9499a7ad 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -437,6 +437,32 @@ GSource *qio_channel_add_watch_source(QIOChannel *ioc,
 }
 
 
+ssize_t qio_channel_io_pwritev(QIOChannel *ioc, const struct iovec *iov,
+                               size_t niov, off_t offset, Error **errp)
+{
+    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+    if (!klass->io_pwritev) {
+        error_setg(errp, "Channel does not support pwritev");
+        return -1;
+    }
+
+    return klass->io_pwritev(ioc, iov, niov, offset, errp);
+}
+
+ssize_t qio_channel_io_preadv(QIOChannel *ioc, const struct iovec *iov,
+                               size_t niov, off_t offset, Error **errp)
+{
+    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+    if (!klass->io_preadv) {
+        error_setg(errp, "Channel does not support preadv");
+        return -1;
+    }
+
+    return klass->io_preadv(ioc, iov, niov, offset, errp);
+}
+
 int qio_channel_shutdown(QIOChannel *ioc,
                          QIOChannelShutdown how,
                          Error **errp)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 04/14] io: Add generic pwritev/preadv interface
  2022-10-28 10:39 ` [PATCH v3 04/14] io: Add generic pwritev/preadv interface Nikolay Borisov
@ 2023-02-10 16:26   ` Daniel P. Berrangé
  2023-02-10 16:58     ` Daniel P. Berrangé
  0 siblings, 1 reply; 32+ messages in thread
From: Daniel P. Berrangé @ 2023-02-10 16:26 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dgilbert, qemu-devel, jfehlig, Claudio.Fontana, dfaggioli

On Fri, Oct 28, 2022 at 01:39:04PM +0300, Nikolay Borisov wrote:
> Introduce basic pwriteve/preadv support in the generic channel layer.
> SPecific implementation will follow for the file channel as this is
> required in order to support migration streams with fixed location of
> each ram page.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  include/io/channel.h | 49 ++++++++++++++++++++++++++++++++++++++++++++
>  io/channel.c         | 26 +++++++++++++++++++++++
>  2 files changed, 75 insertions(+)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index c680ee748021..6b10bce8bbdf 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -124,6 +124,16 @@ struct QIOChannelClass {
>                             Error **errp);
>  
>      /* Optional callbacks */
> +    ssize_t (*io_pwritev)(QIOChannel *ioc,
> +                       const struct iovec *iov,
> +                       size_t niov,
> +                       off_t offset,
> +                       Error **errp);
> +    ssize_t (*io_preadv)(QIOChannel *ioc,
> +                      const struct iovec *iov,
> +                      size_t niov,
> +                      off_t offset,
> +                      Error **errp);

nit-pick the alignment of the 2nd line of parameters onwards,
needs to be indented by 3 more spaces.


> +/**
> + * qio_channel_io_pwritev
> + * @ioc: the channel object
> + * @iov: the array of memory regions to write data from
> + * @niov: the length of the @iov array
> + * @offset: offset in the channel where writes should begin
> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Not all implementations will support this facility, so may report an error.
> + * To avoid errors, the caller may check for the feature flag
> + * QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
> + *
> + * Behaves as qio_channel_writev_full, apart from not supporting sending of file

Given this, we should have  "_full" suffix on these methods
too for consistent naming policy.

> + * handles as well as beginning the write at the passed @offset
> + *
> + */
> +ssize_t qio_channel_io_pwritev(QIOChannel *ioc, const struct iovec *iov,
> +                               size_t niov, off_t offset, Error **errp);
> +
> +
> +/**
> + * qio_channel_io_preadv
> + * @ioc: the channel object
> + * @iov: the array of memory regions to read data into
> + * @niov: the length of the @iov array
> + * @offset: offset in the channel where writes should begin
> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Not all implementations will support this facility, so may report an error.
> + * To avoid errors, the caller may check for the feature flag
> + * QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
> + *
> + * Behaves as qio_channel_readv_full, apart from not supporting receiving of file
> + * handles as well as beginning the read at the passed @offset
> + *
> + */
> +ssize_t qio_channel_io_preadv(QIOChannel *ioc, const struct iovec *iov,
> +                             size_t niov, off_t offset, Error **errp);
>  /**
>   * qio_channel_shutdown:
>   * @ioc: the channel object
> diff --git a/io/channel.c b/io/channel.c
> index 0640941ac573..f5ac9499a7ad 100644
> --- a/io/channel.c
> +++ b/io/channel.c
> @@ -437,6 +437,32 @@ GSource *qio_channel_add_watch_source(QIOChannel *ioc,
>  }
>  
>  
> +ssize_t qio_channel_io_pwritev(QIOChannel *ioc, const struct iovec *iov,
> +                               size_t niov, off_t offset, Error **errp)
> +{
> +    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> +
> +    if (!klass->io_pwritev) {
> +        error_setg(errp, "Channel does not support pwritev");
> +        return -1;
> +    }

This also possibly benefits from a validation check

    if (!qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_SEEKABLE)) {
        error_setg_errno(errp, EINVAL,
                         "Requested channel is not seekable")
        return -1;
    }

because the QIOChannelFile impl will support io_pwritev callback,
but not all the FDs it can use will support seeking.

Same check for preadv

> +
> +    return klass->io_pwritev(ioc, iov, niov, offset, errp);
> +}
> +
> +ssize_t qio_channel_io_preadv(QIOChannel *ioc, const struct iovec *iov,
> +                               size_t niov, off_t offset, Error **errp)
> +{
> +    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> +
> +    if (!klass->io_preadv) {
> +        error_setg(errp, "Channel does not support preadv");
> +        return -1;
> +    }
> +
> +    return klass->io_preadv(ioc, iov, niov, offset, errp);
> +}
> +
>  int qio_channel_shutdown(QIOChannel *ioc,
>                           QIOChannelShutdown how,
>                           Error **errp)
> -- 
> 2.34.1
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 04/14] io: Add generic pwritev/preadv interface
  2023-02-10 16:26   ` Daniel P. Berrangé
@ 2023-02-10 16:58     ` Daniel P. Berrangé
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel P. Berrangé @ 2023-02-10 16:58 UTC (permalink / raw)
  To: Nikolay Borisov, dgilbert, qemu-devel, jfehlig, Claudio.Fontana,
	dfaggioli

On Fri, Feb 10, 2023 at 04:26:22PM +0000, Daniel P. Berrangé wrote:
> On Fri, Oct 28, 2022 at 01:39:04PM +0300, Nikolay Borisov wrote:
> > Introduce basic pwriteve/preadv support in the generic channel layer.
> > SPecific implementation will follow for the file channel as this is
> > required in order to support migration streams with fixed location of
> > each ram page.
> > 
> > Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> > ---
> >  include/io/channel.h | 49 ++++++++++++++++++++++++++++++++++++++++++++
> >  io/channel.c         | 26 +++++++++++++++++++++++
> >  2 files changed, 75 insertions(+)
> > 
> > diff --git a/include/io/channel.h b/include/io/channel.h
> > index c680ee748021..6b10bce8bbdf 100644
> > --- a/include/io/channel.h
> > +++ b/include/io/channel.h
> > @@ -124,6 +124,16 @@ struct QIOChannelClass {
> >                             Error **errp);
> >  
> >      /* Optional callbacks */
> > +    ssize_t (*io_pwritev)(QIOChannel *ioc,
> > +                       const struct iovec *iov,
> > +                       size_t niov,
> > +                       off_t offset,
> > +                       Error **errp);
> > +    ssize_t (*io_preadv)(QIOChannel *ioc,
> > +                      const struct iovec *iov,
> > +                      size_t niov,
> > +                      off_t offset,
> > +                      Error **errp);
> 
> nit-pick the alignment of the 2nd line of parameters onwards,
> needs to be indented by 3 more spaces.
> 
> 
> > +/**
> > + * qio_channel_io_pwritev
> > + * @ioc: the channel object
> > + * @iov: the array of memory regions to write data from
> > + * @niov: the length of the @iov array
> > + * @offset: offset in the channel where writes should begin
> > + * @errp: pointer to a NULL-initialized error object
> > + *
> > + * Not all implementations will support this facility, so may report an error.
> > + * To avoid errors, the caller may check for the feature flag
> > + * QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
> > + *
> > + * Behaves as qio_channel_writev_full, apart from not supporting sending of file
> 
> Given this, we should have  "_full" suffix on these methods
> too for consistent naming policy.
> 
> > + * handles as well as beginning the write at the passed @offset
> > + *
> > + */
> > +ssize_t qio_channel_io_pwritev(QIOChannel *ioc, const struct iovec *iov,
> > +                               size_t niov, off_t offset, Error **errp);

In addition to giving this method the '_full' suffix, we should also
add the qio_channel_io_pwritev entrypoint which takes a single
buffer instead of iovec. The migration code in QEMUFile that is
added in a later patch in this series doesn't actually want to
use iovecs in the first place. So having the non-iov entrypoints
in QIOChannel will simplify the migration code.

> > +
> > +
> > +/**
> > + * qio_channel_io_preadv
> > + * @ioc: the channel object
> > + * @iov: the array of memory regions to read data into
> > + * @niov: the length of the @iov array
> > + * @offset: offset in the channel where writes should begin
> > + * @errp: pointer to a NULL-initialized error object
> > + *
> > + * Not all implementations will support this facility, so may report an error.
> > + * To avoid errors, the caller may check for the feature flag
> > + * QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
> > + *
> > + * Behaves as qio_channel_readv_full, apart from not supporting receiving of file
> > + * handles as well as beginning the read at the passed @offset
> > + *
> > + */
> > +ssize_t qio_channel_io_preadv(QIOChannel *ioc, const struct iovec *iov,
> > +                             size_t niov, off_t offset, Error **errp);
> >  /**
> >   * qio_channel_shutdown:
> >   * @ioc: the channel object
> > diff --git a/io/channel.c b/io/channel.c
> > index 0640941ac573..f5ac9499a7ad 100644
> > --- a/io/channel.c
> > +++ b/io/channel.c
> > @@ -437,6 +437,32 @@ GSource *qio_channel_add_watch_source(QIOChannel *ioc,
> >  }
> >  
> >  
> > +ssize_t qio_channel_io_pwritev(QIOChannel *ioc, const struct iovec *iov,
> > +                               size_t niov, off_t offset, Error **errp)
> > +{
> > +    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> > +
> > +    if (!klass->io_pwritev) {
> > +        error_setg(errp, "Channel does not support pwritev");
> > +        return -1;
> > +    }
> 
> This also possibly benefits from a validation check
> 
>     if (!qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_SEEKABLE)) {
>         error_setg_errno(errp, EINVAL,
>                          "Requested channel is not seekable")
>         return -1;
>     }
> 
> because the QIOChannelFile impl will support io_pwritev callback,
> but not all the FDs it can use will support seeking.
> 
> Same check for preadv
> 
> > +
> > +    return klass->io_pwritev(ioc, iov, niov, offset, errp);
> > +}
> > +
> > +ssize_t qio_channel_io_preadv(QIOChannel *ioc, const struct iovec *iov,
> > +                               size_t niov, off_t offset, Error **errp)
> > +{
> > +    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> > +
> > +    if (!klass->io_preadv) {
> > +        error_setg(errp, "Channel does not support preadv");
> > +        return -1;
> > +    }
> > +
> > +    return klass->io_preadv(ioc, iov, niov, offset, errp);
> > +}
> > +
> >  int qio_channel_shutdown(QIOChannel *ioc,
> >                           QIOChannelShutdown how,
> >                           Error **errp)
> > -- 
> > 2.34.1
> > 
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 05/14] io: implement io_pwritev for QIOChannelFile
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
                   ` (3 preceding siblings ...)
  2022-10-28 10:39 ` [PATCH v3 04/14] io: Add generic pwritev/preadv interface Nikolay Borisov
@ 2022-10-28 10:39 ` Nikolay Borisov
  2023-02-10 16:34   ` Daniel P. Berrangé
  2022-10-28 10:39 ` [PATCH v3 06/14] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file Nikolay Borisov
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

The upcoming 'fixed-ram' feature would require qemu to write data at
specific offsets of the file. Add a minimal implementation of pwritev
and expose it via the io_pwritev interface.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 io/channel-file.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/io/channel-file.c b/io/channel-file.c
index b67687c2aa64..a7a90c12dc2b 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -136,6 +136,30 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
     return ret;
 }
 
+static ssize_t qio_channel_file_pwritev(QIOChannel *ioc,
+                                        const struct iovec *iov,
+                                        size_t niov,
+                                        off_t offset,
+                                        Error **errp)
+{
+    QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
+    ssize_t ret;
+
+ retry:
+    ret = pwritev(fioc->fd, iov, niov, offset);
+    if (ret <= 0) {
+        if (errno == EAGAIN) {
+            return QIO_CHANNEL_ERR_BLOCK;
+        }
+        if (errno == EINTR) {
+            goto retry;
+        }
+        error_setg_errno(errp, errno, "Unable to write to file");
+        return -1;
+    }
+    return ret;
+}
+
 static int qio_channel_file_set_blocking(QIOChannel *ioc,
                                          bool enabled,
                                          Error **errp)
@@ -218,6 +242,7 @@ static void qio_channel_file_class_init(ObjectClass *klass,
     ioc_klass->io_writev = qio_channel_file_writev;
     ioc_klass->io_readv = qio_channel_file_readv;
     ioc_klass->io_set_blocking = qio_channel_file_set_blocking;
+    ioc_klass->io_pwritev = qio_channel_file_pwritev;
     ioc_klass->io_seek = qio_channel_file_seek;
     ioc_klass->io_close = qio_channel_file_close;
     ioc_klass->io_create_watch = qio_channel_file_create_watch;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 05/14] io: implement io_pwritev for QIOChannelFile
  2022-10-28 10:39 ` [PATCH v3 05/14] io: implement io_pwritev for QIOChannelFile Nikolay Borisov
@ 2023-02-10 16:34   ` Daniel P. Berrangé
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel P. Berrangé @ 2023-02-10 16:34 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dgilbert, qemu-devel, jfehlig, Claudio.Fontana, dfaggioli

On Fri, Oct 28, 2022 at 01:39:05PM +0300, Nikolay Borisov wrote:
> The upcoming 'fixed-ram' feature would require qemu to write data at
> specific offsets of the file. Add a minimal implementation of pwritev
> and expose it via the io_pwritev interface.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  io/channel-file.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 06/14] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
                   ` (4 preceding siblings ...)
  2022-10-28 10:39 ` [PATCH v3 05/14] io: implement io_pwritev for QIOChannelFile Nikolay Borisov
@ 2022-10-28 10:39 ` Nikolay Borisov
  2023-02-10 16:38   ` Daniel P. Berrangé
  2022-10-28 10:39 ` [PATCH v3 07/14] migration/qemu-file: add utility methods for working with seekable channels Nikolay Borisov
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

Add a generic QIOChannel feature SEEKABLE which would be used by the
qemu_file* apis. For the time being this will be only implemented for
file channels.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 include/io/channel.h | 1 +
 io/channel-file.c    | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/include/io/channel.h b/include/io/channel.h
index 6b10bce8bbdf..b645989e467c 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -41,6 +41,7 @@ enum QIOChannelFeature {
     QIO_CHANNEL_FEATURE_SHUTDOWN,
     QIO_CHANNEL_FEATURE_LISTEN,
     QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
+    QIO_CHANNEL_FEATURE_SEEKABLE,
 };
 
 
diff --git a/io/channel-file.c b/io/channel-file.c
index a7a90c12dc2b..e213a0fd7cd2 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -35,6 +35,11 @@ qio_channel_file_new_fd(int fd)
 
     ioc->fd = fd;
 
+    if (lseek(fd, 0, SEEK_CUR) != (off_t)-1) {
+        qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE);
+    }
+
+    qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE);
     trace_qio_channel_file_new_fd(ioc, fd);
 
     return ioc;
@@ -59,6 +64,10 @@ qio_channel_file_new_path(const char *path,
         return NULL;
     }
 
+    if (lseek(ioc->fd, 0, SEEK_CUR) != (off_t)-1) {
+        qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE);
+    }
+
     trace_qio_channel_file_new_path(ioc, path, flags, mode, ioc->fd);
 
     return ioc;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 06/14] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file
  2022-10-28 10:39 ` [PATCH v3 06/14] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file Nikolay Borisov
@ 2023-02-10 16:38   ` Daniel P. Berrangé
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel P. Berrangé @ 2023-02-10 16:38 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dgilbert, qemu-devel, jfehlig, Claudio.Fontana, dfaggioli

On Fri, Oct 28, 2022 at 01:39:06PM +0300, Nikolay Borisov wrote:
> Add a generic QIOChannel feature SEEKABLE which would be used by the
> qemu_file* apis. For the time being this will be only implemented for
> file channels.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  include/io/channel.h | 1 +
>  io/channel-file.c    | 9 +++++++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 6b10bce8bbdf..b645989e467c 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -41,6 +41,7 @@ enum QIOChannelFeature {
>      QIO_CHANNEL_FEATURE_SHUTDOWN,
>      QIO_CHANNEL_FEATURE_LISTEN,
>      QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
> +    QIO_CHANNEL_FEATURE_SEEKABLE,
>  };
>  
>  
> diff --git a/io/channel-file.c b/io/channel-file.c
> index a7a90c12dc2b..e213a0fd7cd2 100644
> --- a/io/channel-file.c
> +++ b/io/channel-file.c
> @@ -35,6 +35,11 @@ qio_channel_file_new_fd(int fd)
>  
>      ioc->fd = fd;
>  
> +    if (lseek(fd, 0, SEEK_CUR) != (off_t)-1) {
> +        qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE);
> +    }
> +
> +    qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE);

This second qio_channel_set_feature call is a rebase mistake I
presume. With that removed, the rest of the patch looks fine

  Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


>      trace_qio_channel_file_new_fd(ioc, fd);
>  
>      return ioc;
> @@ -59,6 +64,10 @@ qio_channel_file_new_path(const char *path,
>          return NULL;
>      }
>  
> +    if (lseek(ioc->fd, 0, SEEK_CUR) != (off_t)-1) {
> +        qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE);
> +    }
> +
>      trace_qio_channel_file_new_path(ioc, path, flags, mode, ioc->fd);
>  
>      return ioc;
> -- 
> 2.34.1
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 07/14] migration/qemu-file: add utility methods for working with seekable channels
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
                   ` (5 preceding siblings ...)
  2022-10-28 10:39 ` [PATCH v3 06/14] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file Nikolay Borisov
@ 2022-10-28 10:39 ` Nikolay Borisov
  2022-10-28 10:39 ` [PATCH v3 08/14] io: Add preadv support to QIOChannelFile Nikolay Borisov
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

Add
qemu_file_is_seekable/qemu_put_buffer_at/qemu_set_offset/qemu_get_offset
as those utility methods will be needed when implementing 'fixed-ram'
migration capability.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 include/migration/qemu-file-types.h |  2 +
 migration/qemu-file.c               | 59 +++++++++++++++++++++++++++++
 migration/qemu-file.h               |  3 ++
 3 files changed, 64 insertions(+)

diff --git a/include/migration/qemu-file-types.h b/include/migration/qemu-file-types.h
index 2867e3da84ab..eb0325ee8687 100644
--- a/include/migration/qemu-file-types.h
+++ b/include/migration/qemu-file-types.h
@@ -50,6 +50,8 @@ unsigned int qemu_get_be16(QEMUFile *f);
 unsigned int qemu_get_be32(QEMUFile *f);
 uint64_t qemu_get_be64(QEMUFile *f);
 
+bool qemu_file_is_seekable(QEMUFile *f);
+
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
     qemu_put_be64(f, *pv);
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 4f400c2e5265..d0e0ba6150f7 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -30,6 +30,7 @@
 #include "qemu-file.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "io/channel-file.h"
 
 #define IO_BUF_SIZE 32768
 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
@@ -260,6 +261,10 @@ static void qemu_iovec_release_ram(QEMUFile *f)
     memset(f->may_free, 0, sizeof(f->may_free));
 }
 
+bool qemu_file_is_seekable(QEMUFile *f)
+{
+    return qio_channel_has_feature(f->ioc, QIO_CHANNEL_FEATURE_SEEKABLE);
+}
 
 /**
  * Flushes QEMUFile buffer
@@ -538,6 +543,60 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size)
     }
 }
 
+void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos)
+{
+    Error *err = NULL;
+    struct iovec iov = { .iov_base = (char *)buf, .iov_len = buflen };
+
+    if (f->last_error) {
+        return;
+    }
+
+    qemu_fflush(f);
+
+    if (qio_channel_io_pwritev(f->ioc, &iov, 1, pos, &err) == (off_t)-1)
+        goto error;
+
+    return;
+
+ error:
+    qemu_file_set_error_obj(f, -EIO, err);
+    return;
+}
+
+void qemu_set_offset(QEMUFile *f, off_t off, int whence)
+{
+    Error *err = NULL;
+    off_t ret;
+
+    qemu_fflush(f);
+
+    if (!qemu_file_is_writable(f)) {
+	    f->buf_index = 0;
+	    f->buf_size = 0;
+    }
+
+    ret = qio_channel_io_seek(f->ioc, off, whence, &err);
+    if (ret == (off_t)-1) {
+        qemu_file_set_error_obj(f, -EIO, err);
+    }
+}
+
+off_t qemu_get_offset(QEMUFile *f)
+{
+    Error *err = NULL;
+    off_t ret;
+
+    qemu_fflush(f);
+
+    ret = qio_channel_io_seek(f->ioc, 0, SEEK_CUR, &err);
+    if (ret == (off_t)-1) {
+        qemu_file_set_error_obj(f, -EIO, err);
+    }
+    return ret;
+}
+
+
 void qemu_put_byte(QEMUFile *f, int v)
 {
     if (f->last_error) {
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index fa13d04d787c..33cfc07b81d1 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -148,6 +148,9 @@ int qemu_file_shutdown(QEMUFile *f);
 QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
+void qemu_set_offset(QEMUFile *f, off_t off, int whence);
+off_t qemu_get_offset(QEMUFile *f);
+void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 08/14] io: Add preadv support to QIOChannelFile
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
                   ` (6 preceding siblings ...)
  2022-10-28 10:39 ` [PATCH v3 07/14] migration/qemu-file: add utility methods for working with seekable channels Nikolay Borisov
@ 2022-10-28 10:39 ` Nikolay Borisov
  2023-02-10 16:59   ` Daniel P. Berrangé
  2022-10-28 10:39 ` [PATCH v3 09/14] migration: add qemu_get_buffer_at Nikolay Borisov
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

preadv is going to be needed when 'fixed-ram'-enabled stream are to be
restored. Add a minimal implementation of preadv for file channels and
expose it via the generic io_preadv interface.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 io/channel-file.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/io/channel-file.c b/io/channel-file.c
index e213a0fd7cd2..d2f4706b7f6d 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -145,6 +145,32 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
     return ret;
 }
 
+static ssize_t qio_channel_file_preadv(QIOChannel *ioc,
+                                       const struct iovec *iov,
+                                       size_t niov,
+                                       off_t offset,
+                                       Error **errp)
+{
+    QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
+    ssize_t ret;
+
+ retry:
+    ret = preadv(fioc->fd, iov, niov, offset);
+    if (ret < 0) {
+        if (errno == EAGAIN) {
+            return QIO_CHANNEL_ERR_BLOCK;
+        }
+        if (errno == EINTR) {
+            goto retry;
+        }
+
+        error_setg_errno(errp, errno, "Unable to read from file");
+        return -1;
+    }
+
+    return ret;
+}
+
 static ssize_t qio_channel_file_pwritev(QIOChannel *ioc,
                                         const struct iovec *iov,
                                         size_t niov,
@@ -252,6 +278,7 @@ static void qio_channel_file_class_init(ObjectClass *klass,
     ioc_klass->io_readv = qio_channel_file_readv;
     ioc_klass->io_set_blocking = qio_channel_file_set_blocking;
     ioc_klass->io_pwritev = qio_channel_file_pwritev;
+    ioc_klass->io_preadv = qio_channel_file_preadv;
     ioc_klass->io_seek = qio_channel_file_seek;
     ioc_klass->io_close = qio_channel_file_close;
     ioc_klass->io_create_watch = qio_channel_file_create_watch;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 08/14] io: Add preadv support to QIOChannelFile
  2022-10-28 10:39 ` [PATCH v3 08/14] io: Add preadv support to QIOChannelFile Nikolay Borisov
@ 2023-02-10 16:59   ` Daniel P. Berrangé
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel P. Berrangé @ 2023-02-10 16:59 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dgilbert, qemu-devel, jfehlig, Claudio.Fontana, dfaggioli

On Fri, Oct 28, 2022 at 01:39:08PM +0300, Nikolay Borisov wrote:
> preadv is going to be needed when 'fixed-ram'-enabled stream are to be
> restored. Add a minimal implementation of preadv for file channels and
> expose it via the generic io_preadv interface.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  io/channel-file.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/io/channel-file.c b/io/channel-file.c
> index e213a0fd7cd2..d2f4706b7f6d 100644
> --- a/io/channel-file.c
> +++ b/io/channel-file.c
> @@ -145,6 +145,32 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
>      return ret;
>  }
>  
> +static ssize_t qio_channel_file_preadv(QIOChannel *ioc,
> +                                       const struct iovec *iov,
> +                                       size_t niov,
> +                                       off_t offset,
> +                                       Error **errp)
> +{
> +    QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
> +    ssize_t ret;
> +
> + retry:
> +    ret = preadv(fioc->fd, iov, niov, offset);
> +    if (ret < 0) {
> +        if (errno == EAGAIN) {
> +            return QIO_CHANNEL_ERR_BLOCK;
> +        }
> +        if (errno == EINTR) {
> +            goto retry;
> +        }
> +
> +        error_setg_errno(errp, errno, "Unable to read from file");
> +        return -1;
> +    }
> +
> +    return ret;
> +}
> +
>  static ssize_t qio_channel_file_pwritev(QIOChannel *ioc,
>                                          const struct iovec *iov,
>                                          size_t niov,
> @@ -252,6 +278,7 @@ static void qio_channel_file_class_init(ObjectClass *klass,
>      ioc_klass->io_readv = qio_channel_file_readv;
>      ioc_klass->io_set_blocking = qio_channel_file_set_blocking;
>      ioc_klass->io_pwritev = qio_channel_file_pwritev;
> +    ioc_klass->io_preadv = qio_channel_file_preadv;
>      ioc_klass->io_seek = qio_channel_file_seek;
>      ioc_klass->io_close = qio_channel_file_close;
>      ioc_klass->io_create_watch = qio_channel_file_create_watch;

I'd suggest this patch should just be merged into the patch which
adds the io_pwritev callback.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 09/14] migration: add qemu_get_buffer_at
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
                   ` (7 preceding siblings ...)
  2022-10-28 10:39 ` [PATCH v3 08/14] io: Add preadv support to QIOChannelFile Nikolay Borisov
@ 2022-10-28 10:39 ` Nikolay Borisov
  2023-02-10 16:59   ` Daniel P. Berrangé
  2022-10-28 10:39 ` [PATCH v3 10/14] migration/ram: Introduce 'fixed-ram' migration stream capability Nikolay Borisov
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

Restoring a 'fixed-ram' enabled migration stream would require reading
from specific offsets in the file so add a helper to QEMUFile that uses
the newly introduced qio_channel_file_preadv.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 migration/qemu-file.c | 23 +++++++++++++++++++++++
 migration/qemu-file.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index d0e0ba6150f7..b24972d5728d 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -564,6 +564,29 @@ void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t po
     return;
 }
 
+
+size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos)
+{
+    Error *err = NULL;
+    struct iovec iov = { .iov_base = (char *)buf, .iov_len = buflen };
+    ssize_t ret;
+
+    if (f->last_error) {
+        return 0;
+    }
+
+    ret = qio_channel_io_preadv(f->ioc, &iov, 1, pos, &err);
+    if (ret == -1) {
+	    goto error;
+    }
+
+    return (size_t)ret;
+
+ error:
+    qemu_file_set_error_obj(f, -EIO, err);
+    return 0;
+}
+
 void qemu_set_offset(QEMUFile *f, off_t off, int whence)
 {
     Error *err = NULL;
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 33cfc07b81d1..ab10c3ad7e42 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -151,6 +151,7 @@ void qemu_file_set_blocking(QEMUFile *f, bool block);
 void qemu_set_offset(QEMUFile *f, off_t off, int whence);
 off_t qemu_get_offset(QEMUFile *f);
 void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos);
+size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 09/14] migration: add qemu_get_buffer_at
  2022-10-28 10:39 ` [PATCH v3 09/14] migration: add qemu_get_buffer_at Nikolay Borisov
@ 2023-02-10 16:59   ` Daniel P. Berrangé
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel P. Berrangé @ 2023-02-10 16:59 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dgilbert, qemu-devel, jfehlig, Claudio.Fontana, dfaggioli

On Fri, Oct 28, 2022 at 01:39:09PM +0300, Nikolay Borisov wrote:
> Restoring a 'fixed-ram' enabled migration stream would require reading
> from specific offsets in the file so add a helper to QEMUFile that uses
> the newly introduced qio_channel_file_preadv.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  migration/qemu-file.c | 23 +++++++++++++++++++++++
>  migration/qemu-file.h |  1 +
>  2 files changed, 24 insertions(+)
> 
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index d0e0ba6150f7..b24972d5728d 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -564,6 +564,29 @@ void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t po
>      return;
>  }
>  
> +
> +size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos)
> +{
> +    Error *err = NULL;
> +    struct iovec iov = { .iov_base = (char *)buf, .iov_len = buflen };
> +    ssize_t ret;
> +
> +    if (f->last_error) {
> +        return 0;
> +    }
> +
> +    ret = qio_channel_io_preadv(f->ioc, &iov, 1, pos, &err);

If we have a qio_channel_io_preadv that does NOT use iovecs,
then this code gets simpler, as the iovec wrapping can be
hidden in the QIOChannel code.

> +    if (ret == -1) {
> +	    goto error;
> +    }
> +
> +    return (size_t)ret;
> +
> + error:
> +    qemu_file_set_error_obj(f, -EIO, err);
> +    return 0;
> +}
> +
>  void qemu_set_offset(QEMUFile *f, off_t off, int whence)
>  {
>      Error *err = NULL;
> diff --git a/migration/qemu-file.h b/migration/qemu-file.h
> index 33cfc07b81d1..ab10c3ad7e42 100644
> --- a/migration/qemu-file.h
> +++ b/migration/qemu-file.h
> @@ -151,6 +151,7 @@ void qemu_file_set_blocking(QEMUFile *f, bool block);
>  void qemu_set_offset(QEMUFile *f, off_t off, int whence);
>  off_t qemu_get_offset(QEMUFile *f);
>  void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos);
> +size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos);
>  
>  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
>  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> -- 
> 2.34.1
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 10/14] migration/ram: Introduce 'fixed-ram' migration stream capability
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
                   ` (8 preceding siblings ...)
  2022-10-28 10:39 ` [PATCH v3 09/14] migration: add qemu_get_buffer_at Nikolay Borisov
@ 2022-10-28 10:39 ` Nikolay Borisov
  2023-02-10 17:11   ` Daniel P. Berrangé
  2022-10-28 10:39 ` [PATCH v3 11/14] migration: Refactor precopy ram loading code Nikolay Borisov
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

Implement 'fixed-ram' feature. The core of the feature is to ensure that
each ram page of the migration stream has a specific offset in the
resulting migration stream. The reason why we'd want such behavior are
two fold:

 - When doing a 'fixed-ram' migration the resulting file will have a
   bounded size, since pages which are dirtied multiple times will
   always go to a fixed location in the file, rather than constantly
   being added to a sequential stream. This eliminates cases where a vm
   with, say, 1g of ram can result in a migration file that's 10s of
   Gbs, provided that the workload constantly redirties memory.

 - It paves the way to implement DIO-enabled save/restore of the
   migration stream as the pages are ensured to be written at aligned
   offsets.

The features requires changing the format. First, a bitmap is introduced
which tracks which pages have been written (i.e are dirtied) during
migration and subsequently it's being written in the resultin file,
again at a fixed location for every ramblock. Zero pages are ignored as
they'd be zero in the destination migration as well. With the changed
format data would look like the following:

|name len|name|used_len|pc*|bitmap_size|pages_offset|bitmap|pages|

* pc - refers to the page_size/mr->addr members, so newly added members
begin from "bitmap_size".

This layout is initialized during ram_save_setup so instead of having a
sequential stream of pages that follow the ramblock headers the dirty
pages for a ramblock follow its header. Since all pages have a fixed
location RAM_SAVE_FLAG_EOS is no longer generated on every migration
iteration but there is effectively a single RAM_SAVE_FLAG_EOS right at
the end.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 include/exec/ramblock.h |  7 +++
 migration/migration.c   | 51 +++++++++++++++++++++-
 migration/migration.h   |  1 +
 migration/ram.c         | 97 +++++++++++++++++++++++++++++++++--------
 migration/savevm.c      |  1 +
 qapi/migration.json     |  2 +-
 6 files changed, 138 insertions(+), 21 deletions(-)

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 6cbedf9e0c9a..30216c1a41d3 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -43,6 +43,13 @@ struct RAMBlock {
     size_t page_size;
     /* dirty bitmap used during migration */
     unsigned long *bmap;
+    /* shadow dirty bitmap used when migrating to a file */
+    unsigned long *shadow_bmap;
+    /* offset in the file pages belonging to this ramblock are saved, used
+     * only during migration to a file
+     */
+    off_t bitmap_offset;
+    uint64_t pages_offset;
     /* bitmap of already received pages in postcopy */
     unsigned long *receivedmap;
 
diff --git a/migration/migration.c b/migration/migration.c
index 11ceea340702..c7383845a5b4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -165,7 +165,8 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
     MIGRATION_CAPABILITY_XBZRLE,
     MIGRATION_CAPABILITY_X_COLO,
     MIGRATION_CAPABILITY_VALIDATE_UUID,
-    MIGRATION_CAPABILITY_ZERO_COPY_SEND);
+    MIGRATION_CAPABILITY_ZERO_COPY_SEND,
+    MIGRATION_CAPABILITY_FIXED_RAM);
 
 /* When we add fault tolerance, we could have several
    migrations at once.  For now we don't need to add
@@ -1326,6 +1327,27 @@ static bool migrate_caps_check(bool *cap_list,
     }
 #endif
 
+    if (cap_list[MIGRATION_CAPABILITY_FIXED_RAM]) {
+	    if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
+		    error_setg(errp, "Directly mapped memory incompatible with multifd");
+		    return false;
+	    }
+
+	    if (cap_list[MIGRATION_CAPABILITY_XBZRLE]) {
+		    error_setg(errp, "Directly mapped memory incompatible with xbzrle");
+		    return false;
+	    }
+
+	    if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
+		    error_setg(errp, "Directly mapped memory incompatible with compression");
+		    return false;
+	    }
+
+	    if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+		    error_setg(errp, "Directly mapped memory incompatible with postcopy ram");
+		    return false;
+	    }
+    }
 
     /* incoming side only */
     if (runstate_check(RUN_STATE_INMIGRATE) &&
@@ -2630,6 +2652,11 @@ MultiFDCompression migrate_multifd_compression(void)
     return s->parameters.multifd_compression;
 }
 
+int migrate_fixed_ram(void)
+{
+    return migrate_get_current()->enabled_capabilities[MIGRATION_CAPABILITY_FIXED_RAM];
+}
+
 int migrate_multifd_zlib_level(void)
 {
     MigrationState *s;
@@ -4190,6 +4217,21 @@ static void *bg_migration_thread(void *opaque)
     return NULL;
 }
 
+static int
+migrate_check_fixed_ram(MigrationState *s, Error **errp)
+{
+    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_FIXED_RAM])
+        return 0;
+
+    if (!qemu_file_is_seekable(s->to_dst_file)) {
+        error_setg(errp, "Directly mapped memory requires a seekable transport");
+        return -1;
+    }
+
+    return 0;
+}
+
+
 void migrate_fd_connect(MigrationState *s, Error *error_in)
 {
     Error *local_err = NULL;
@@ -4265,6 +4307,12 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
         return;
     }
 
+    if (migrate_check_fixed_ram(s, &local_err) < 0) {
+	    migrate_fd_cleanup(s);
+	    migrate_fd_error(s, local_err);
+	    return;
+    }
+
     if (resume) {
         /* Wakeup the main migration thread to do the recovery */
         migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
@@ -4398,6 +4446,7 @@ static Property migration_properties[] = {
     DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
 
     /* Migration capabilities */
+    DEFINE_PROP_MIG_CAP("x-fixed-ram", MIGRATION_CAPABILITY_FIXED_RAM),
     DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
     DEFINE_PROP_MIG_CAP("x-rdma-pin-all", MIGRATION_CAPABILITY_RDMA_PIN_ALL),
     DEFINE_PROP_MIG_CAP("x-auto-converge", MIGRATION_CAPABILITY_AUTO_CONVERGE),
diff --git a/migration/migration.h b/migration/migration.h
index 96f27aba2210..9aab1b16f407 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -410,6 +410,7 @@ bool migrate_zero_blocks(void);
 bool migrate_dirty_bitmaps(void);
 bool migrate_ignore_shared(void);
 bool migrate_validate_uuid(void);
+int migrate_fixed_ram(void);
 
 bool migrate_auto_converge(void);
 bool migrate_use_multifd(void);
diff --git a/migration/ram.c b/migration/ram.c
index dc1de9ddbc68..4f5ddaff356b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1261,9 +1261,14 @@ static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
     int len = 0;
 
     if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
-        len += save_page_header(rs, file, block, offset | RAM_SAVE_FLAG_ZERO);
-        qemu_put_byte(file, 0);
-        len += 1;
+        if (migrate_fixed_ram()) {
+            /* for zero pages we don't need to do anything */
+            len = 1;
+        } else {
+            len += save_page_header(rs, file, block, offset | RAM_SAVE_FLAG_ZERO);
+            qemu_put_byte(file, 0);
+            len += 1;
+        }
         ram_release_page(block->idstr, offset);
     }
     return len;
@@ -1342,15 +1347,22 @@ static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
 static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
                             uint8_t *buf, bool async)
 {
-    ram_transferred_add(save_page_header(rs, rs->f, block,
-                                         offset | RAM_SAVE_FLAG_PAGE));
-    if (async) {
-        qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
-                              migrate_release_ram() &&
-                              migration_in_postcopy());
-    } else {
-        qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
-    }
+
+	if (migrate_fixed_ram()) {
+		qemu_put_buffer_at(rs->f, buf, TARGET_PAGE_SIZE,
+				   block->pages_offset + offset);
+		set_bit(offset >> TARGET_PAGE_BITS, block->shadow_bmap);
+	} else {
+		ram_transferred_add(save_page_header(rs, rs->f, block,
+						     offset | RAM_SAVE_FLAG_PAGE));
+		if (async) {
+			qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
+					      migrate_release_ram() &&
+					      migration_in_postcopy());
+		} else {
+			qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
+		}
+	}
     ram_transferred_add(TARGET_PAGE_SIZE);
     ram_counters.normal++;
     return 1;
@@ -2683,6 +2695,8 @@ static void ram_save_cleanup(void *opaque)
         block->clear_bmap = NULL;
         g_free(block->bmap);
         block->bmap = NULL;
+        g_free(block->shadow_bmap);
+        block->shadow_bmap = NULL;
     }
 
     xbzrle_cleanup();
@@ -3044,6 +3058,7 @@ static void ram_list_init_bitmaps(void)
              */
             block->bmap = bitmap_new(pages);
             bitmap_set(block->bmap, 0, pages);
+            block->shadow_bmap = bitmap_new(block->used_length >> TARGET_PAGE_BITS);
             block->clear_bmap_shift = shift;
             block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
         }
@@ -3226,12 +3241,34 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
             qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
             qemu_put_be64(f, block->used_length);
             if (migrate_postcopy_ram() && block->page_size !=
-                                          qemu_host_page_size) {
+                qemu_host_page_size) {
                 qemu_put_be64(f, block->page_size);
             }
             if (migrate_ignore_shared()) {
                 qemu_put_be64(f, block->mr->addr);
             }
+
+            if (migrate_fixed_ram()) {
+                long num_pages = block->used_length >> TARGET_PAGE_BITS;
+                long bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
+
+
+                /* Needed for external programs (think analyze-migration.py) */
+                qemu_put_be32(f, bitmap_size);
+
+                /*
+                 * Make pages offset aligned to TARGET_PAGE_SIZE to enable
+                 * DIO in the future. Also add 8 to account for the page offset
+                 * itself
+                 */
+                block->bitmap_offset = qemu_get_offset(f) + 8;
+                block->pages_offset = ROUND_UP(block->bitmap_offset +
+                                               bitmap_size, TARGET_PAGE_SIZE);
+                qemu_put_be64(f, block->pages_offset);
+
+                /* Now prepare offset for next ramblock */
+                qemu_set_offset(f, block->pages_offset + block->used_length, SEEK_SET);
+            }
         }
     }
 
@@ -3249,6 +3286,17 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }
 
+static void ram_save_shadow_bmap(QEMUFile *f)
+{
+    RAMBlock *block;
+
+    RAMBLOCK_FOREACH_MIGRATABLE(block) {
+        long num_pages = block->used_length >> TARGET_PAGE_BITS;
+        long bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
+        qemu_put_buffer_at(f, (uint8_t *)block->shadow_bmap, bitmap_size, block->bitmap_offset);
+    }
+}
+
 /**
  * ram_save_iterate: iterative stage for migration
  *
@@ -3358,9 +3406,15 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
             return ret;
         }
 
-        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-        qemu_fflush(f);
-        ram_transferred_add(8);
+	/*
+	 * For fixed ram we don't want to pollute the migration stream with
+	 * EOS flags.
+	 */
+	if (!migrate_fixed_ram()) {
+            qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+            qemu_fflush(f);
+            ram_transferred_add(8);
+	}
 
         ret = qemu_file_get_error(f);
     }
@@ -3405,7 +3459,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
             pages = ram_find_and_save_block(rs);
             /* no more blocks to sent */
             if (pages == 0) {
-                break;
+                if (migrate_fixed_ram()) {
+                    ram_save_shadow_bmap(f);
+                }
+            break;
             }
             if (pages < 0) {
                 ret = pages;
@@ -3428,8 +3485,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         return ret;
     }
 
-    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-    qemu_fflush(f);
+    if (!migrate_fixed_ram()) {
+        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+        qemu_fflush(f);
+    }
 
     return 0;
 }
diff --git a/migration/savevm.c b/migration/savevm.c
index 44a222888306..847a8bdfb6ce 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -240,6 +240,7 @@ static bool should_validate_capability(int capability)
     /* Validate only new capabilities to keep compatibility. */
     switch (capability) {
     case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
+    case MIGRATION_CAPABILITY_FIXED_RAM:
         return true;
     default:
         return false;
diff --git a/qapi/migration.json b/qapi/migration.json
index 88ecf86ac876..6196617171e8 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -485,7 +485,7 @@
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
-           'compress', 'events', 'postcopy-ram',
+           'compress', 'events', 'postcopy-ram', 'fixed-ram',
            { 'name': 'x-colo', 'features': [ 'unstable' ] },
            'release-ram',
            'block', 'return-path', 'pause-before-switchover', 'multifd',
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 10/14] migration/ram: Introduce 'fixed-ram' migration stream capability
  2022-10-28 10:39 ` [PATCH v3 10/14] migration/ram: Introduce 'fixed-ram' migration stream capability Nikolay Borisov
@ 2023-02-10 17:11   ` Daniel P. Berrangé
  2023-03-20 11:05     ` Claudio Fontana
  0 siblings, 1 reply; 32+ messages in thread
From: Daniel P. Berrangé @ 2023-02-10 17:11 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dgilbert, qemu-devel, jfehlig, Claudio.Fontana, dfaggioli

On Fri, Oct 28, 2022 at 01:39:10PM +0300, Nikolay Borisov wrote:
> Implement 'fixed-ram' feature. The core of the feature is to ensure that
> each ram page of the migration stream has a specific offset in the
> resulting migration stream. The reason why we'd want such behavior are
> two fold:
> 
>  - When doing a 'fixed-ram' migration the resulting file will have a
>    bounded size, since pages which are dirtied multiple times will
>    always go to a fixed location in the file, rather than constantly
>    being added to a sequential stream. This eliminates cases where a vm
>    with, say, 1g of ram can result in a migration file that's 10s of
>    Gbs, provided that the workload constantly redirties memory.
> 
>  - It paves the way to implement DIO-enabled save/restore of the
>    migration stream as the pages are ensured to be written at aligned
>    offsets.
> 
> The features requires changing the format. First, a bitmap is introduced
> which tracks which pages have been written (i.e are dirtied) during
> migration and subsequently it's being written in the resultin file,
> again at a fixed location for every ramblock. Zero pages are ignored as
> they'd be zero in the destination migration as well. With the changed
> format data would look like the following:
> 
> |name len|name|used_len|pc*|bitmap_size|pages_offset|bitmap|pages|
> 
> * pc - refers to the page_size/mr->addr members, so newly added members
> begin from "bitmap_size".
> 
> This layout is initialized during ram_save_setup so instead of having a
> sequential stream of pages that follow the ramblock headers the dirty
> pages for a ramblock follow its header. Since all pages have a fixed
> location RAM_SAVE_FLAG_EOS is no longer generated on every migration
> iteration but there is effectively a single RAM_SAVE_FLAG_EOS right at
> the end.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  include/exec/ramblock.h |  7 +++
>  migration/migration.c   | 51 +++++++++++++++++++++-
>  migration/migration.h   |  1 +
>  migration/ram.c         | 97 +++++++++++++++++++++++++++++++++--------
>  migration/savevm.c      |  1 +
>  qapi/migration.json     |  2 +-
>  6 files changed, 138 insertions(+), 21 deletions(-)

This patch probably ought to extends the docs/devel/migration.rst
file. Specifically the text following the 'Stream structure'
heading.

> 
> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> index 6cbedf9e0c9a..30216c1a41d3 100644
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -43,6 +43,13 @@ struct RAMBlock {
>      size_t page_size;
>      /* dirty bitmap used during migration */
>      unsigned long *bmap;
> +    /* shadow dirty bitmap used when migrating to a file */
> +    unsigned long *shadow_bmap;
> +    /* offset in the file pages belonging to this ramblock are saved, used
> +     * only during migration to a file
> +     */
> +    off_t bitmap_offset;
> +    uint64_t pages_offset;
>      /* bitmap of already received pages in postcopy */
>      unsigned long *receivedmap;
>  
> diff --git a/migration/migration.c b/migration/migration.c
> index 11ceea340702..c7383845a5b4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -165,7 +165,8 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
>      MIGRATION_CAPABILITY_XBZRLE,
>      MIGRATION_CAPABILITY_X_COLO,
>      MIGRATION_CAPABILITY_VALIDATE_UUID,
> -    MIGRATION_CAPABILITY_ZERO_COPY_SEND);
> +    MIGRATION_CAPABILITY_ZERO_COPY_SEND,
> +    MIGRATION_CAPABILITY_FIXED_RAM);
>  
>  /* When we add fault tolerance, we could have several
>     migrations at once.  For now we don't need to add
> @@ -1326,6 +1327,27 @@ static bool migrate_caps_check(bool *cap_list,
>      }
>  #endif
>  
> +    if (cap_list[MIGRATION_CAPABILITY_FIXED_RAM]) {
> +	    if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
> +		    error_setg(errp, "Directly mapped memory incompatible with multifd");
> +		    return false;
> +	    }
> +
> +	    if (cap_list[MIGRATION_CAPABILITY_XBZRLE]) {
> +		    error_setg(errp, "Directly mapped memory incompatible with xbzrle");
> +		    return false;
> +	    }
> +
> +	    if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
> +		    error_setg(errp, "Directly mapped memory incompatible with compression");
> +		    return false;
> +	    }
> +
> +	    if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
> +		    error_setg(errp, "Directly mapped memory incompatible with postcopy ram");
> +		    return false;
> +	    }
> +    }
>  
>      /* incoming side only */
>      if (runstate_check(RUN_STATE_INMIGRATE) &&
> @@ -2630,6 +2652,11 @@ MultiFDCompression migrate_multifd_compression(void)
>      return s->parameters.multifd_compression;
>  }
>  
> +int migrate_fixed_ram(void)
> +{
> +    return migrate_get_current()->enabled_capabilities[MIGRATION_CAPABILITY_FIXED_RAM];
> +}
> +
>  int migrate_multifd_zlib_level(void)
>  {
>      MigrationState *s;
> @@ -4190,6 +4217,21 @@ static void *bg_migration_thread(void *opaque)
>      return NULL;
>  }
>  
> +static int
> +migrate_check_fixed_ram(MigrationState *s, Error **errp)
> +{
> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_FIXED_RAM])
> +        return 0;
> +
> +    if (!qemu_file_is_seekable(s->to_dst_file)) {
> +        error_setg(errp, "Directly mapped memory requires a seekable transport");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +
>  void migrate_fd_connect(MigrationState *s, Error *error_in)
>  {
>      Error *local_err = NULL;
> @@ -4265,6 +4307,12 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>          return;
>      }
>  
> +    if (migrate_check_fixed_ram(s, &local_err) < 0) {
> +	    migrate_fd_cleanup(s);
> +	    migrate_fd_error(s, local_err);
> +	    return;
> +    }
> +
>      if (resume) {
>          /* Wakeup the main migration thread to do the recovery */
>          migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
> @@ -4398,6 +4446,7 @@ static Property migration_properties[] = {
>      DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
>  
>      /* Migration capabilities */
> +    DEFINE_PROP_MIG_CAP("x-fixed-ram", MIGRATION_CAPABILITY_FIXED_RAM),
>      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
>      DEFINE_PROP_MIG_CAP("x-rdma-pin-all", MIGRATION_CAPABILITY_RDMA_PIN_ALL),
>      DEFINE_PROP_MIG_CAP("x-auto-converge", MIGRATION_CAPABILITY_AUTO_CONVERGE),
> diff --git a/migration/migration.h b/migration/migration.h
> index 96f27aba2210..9aab1b16f407 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -410,6 +410,7 @@ bool migrate_zero_blocks(void);
>  bool migrate_dirty_bitmaps(void);
>  bool migrate_ignore_shared(void);
>  bool migrate_validate_uuid(void);
> +int migrate_fixed_ram(void);
>  
>  bool migrate_auto_converge(void);
>  bool migrate_use_multifd(void);
> diff --git a/migration/ram.c b/migration/ram.c
> index dc1de9ddbc68..4f5ddaff356b 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1261,9 +1261,14 @@ static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
>      int len = 0;
>  
>      if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
> -        len += save_page_header(rs, file, block, offset | RAM_SAVE_FLAG_ZERO);
> -        qemu_put_byte(file, 0);
> -        len += 1;
> +        if (migrate_fixed_ram()) {
> +            /* for zero pages we don't need to do anything */
> +            len = 1;
> +        } else {
> +            len += save_page_header(rs, file, block, offset | RAM_SAVE_FLAG_ZERO);
> +            qemu_put_byte(file, 0);
> +            len += 1;
> +        }
>          ram_release_page(block->idstr, offset);
>      }
>      return len;
> @@ -1342,15 +1347,22 @@ static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
>  static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
>                              uint8_t *buf, bool async)
>  {
> -    ram_transferred_add(save_page_header(rs, rs->f, block,
> -                                         offset | RAM_SAVE_FLAG_PAGE));
> -    if (async) {
> -        qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
> -                              migrate_release_ram() &&
> -                              migration_in_postcopy());
> -    } else {
> -        qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
> -    }
> +
> +	if (migrate_fixed_ram()) {
> +		qemu_put_buffer_at(rs->f, buf, TARGET_PAGE_SIZE,
> +				   block->pages_offset + offset);
> +		set_bit(offset >> TARGET_PAGE_BITS, block->shadow_bmap);
> +	} else {
> +		ram_transferred_add(save_page_header(rs, rs->f, block,
> +						     offset | RAM_SAVE_FLAG_PAGE));
> +		if (async) {
> +			qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
> +					      migrate_release_ram() &&
> +					      migration_in_postcopy());
> +		} else {
> +			qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
> +		}
> +	}
>      ram_transferred_add(TARGET_PAGE_SIZE);
>      ram_counters.normal++;
>      return 1;
> @@ -2683,6 +2695,8 @@ static void ram_save_cleanup(void *opaque)
>          block->clear_bmap = NULL;
>          g_free(block->bmap);
>          block->bmap = NULL;
> +        g_free(block->shadow_bmap);
> +        block->shadow_bmap = NULL;
>      }
>  
>      xbzrle_cleanup();
> @@ -3044,6 +3058,7 @@ static void ram_list_init_bitmaps(void)
>               */
>              block->bmap = bitmap_new(pages);
>              bitmap_set(block->bmap, 0, pages);
> +            block->shadow_bmap = bitmap_new(block->used_length >> TARGET_PAGE_BITS);
>              block->clear_bmap_shift = shift;
>              block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
>          }
> @@ -3226,12 +3241,34 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>              qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
>              qemu_put_be64(f, block->used_length);
>              if (migrate_postcopy_ram() && block->page_size !=
> -                                          qemu_host_page_size) {
> +                qemu_host_page_size) {
>                  qemu_put_be64(f, block->page_size);
>              }
>              if (migrate_ignore_shared()) {
>                  qemu_put_be64(f, block->mr->addr);
>              }
> +
> +            if (migrate_fixed_ram()) {
> +                long num_pages = block->used_length >> TARGET_PAGE_BITS;
> +                long bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
> +
> +
> +                /* Needed for external programs (think analyze-migration.py) */
> +                qemu_put_be32(f, bitmap_size);
> +
> +                /*
> +                 * Make pages offset aligned to TARGET_PAGE_SIZE to enable
> +                 * DIO in the future. Also add 8 to account for the page offset
> +                 * itself
> +                 */
> +                block->bitmap_offset = qemu_get_offset(f) + 8;
> +                block->pages_offset = ROUND_UP(block->bitmap_offset +
> +                                               bitmap_size, TARGET_PAGE_SIZE);

I'm not sure that rounding to TARGET_PAGE_SIZE is sufficient.

If we've built QEMU for a 32-bit i686 target, but are running on a 64-bit
kernel, is TARGET_PAGE_SIZE going to offer sufficient alignement to keep
the kernel happy.

Also O_DIRECT has alignment constraints beyond simply the page size. The
page size alignment is most important for the buffers being passed
back and forth. The on-disk alignment constraint is actually defined by
the filesystem and storage it is on, and on-disk alignment is what
matters when we decide on pages_offset.

If we want to allow saved state to be copied across different filesystems/
hosts, we need to consider the worst case alignment that would exist on
any conceivable host deployment now or in the future.

Given that RAM sizes are measured 1000's of MBs in any scenario where
we care about save/restore speed enough to use the fixed-ram feature,
we can afford to waste a little space.

IOW, we could round pages_offset upto the nearest 1 MB boundary,
which ought to be well enough aligned to cope with any constraint
we might imagine ? Or possibly even align to 4 MB, which is a common
size for small-ish huge pages.


> +                qemu_put_be64(f, block->pages_offset);
> +
> +                /* Now prepare offset for next ramblock */
> +                qemu_set_offset(f, block->pages_offset + block->used_length, SEEK_SET);
> +            }
>          }
>      }
>  
> @@ -3249,6 +3286,17 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      return 0;
>  }
>  
> +static void ram_save_shadow_bmap(QEMUFile *f)
> +{
> +    RAMBlock *block;
> +
> +    RAMBLOCK_FOREACH_MIGRATABLE(block) {
> +        long num_pages = block->used_length >> TARGET_PAGE_BITS;
> +        long bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
> +        qemu_put_buffer_at(f, (uint8_t *)block->shadow_bmap, bitmap_size, block->bitmap_offset);
> +    }
> +}
> +
>  /**
>   * ram_save_iterate: iterative stage for migration
>   *
> @@ -3358,9 +3406,15 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>              return ret;
>          }
>  
> -        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> -        qemu_fflush(f);
> -        ram_transferred_add(8);
> +	/*
> +	 * For fixed ram we don't want to pollute the migration stream with
> +	 * EOS flags.
> +	 */
> +	if (!migrate_fixed_ram()) {
> +            qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> +            qemu_fflush(f);
> +            ram_transferred_add(8);
> +	}
>  
>          ret = qemu_file_get_error(f);
>      }
> @@ -3405,7 +3459,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>              pages = ram_find_and_save_block(rs);
>              /* no more blocks to sent */
>              if (pages == 0) {
> -                break;
> +                if (migrate_fixed_ram()) {
> +                    ram_save_shadow_bmap(f);
> +                }
> +            break;
>              }
>              if (pages < 0) {
>                  ret = pages;
> @@ -3428,8 +3485,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>          return ret;
>      }
>  
> -    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> -    qemu_fflush(f);
> +    if (!migrate_fixed_ram()) {
> +        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> +        qemu_fflush(f);
> +    }
>  
>      return 0;
>  }
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 44a222888306..847a8bdfb6ce 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -240,6 +240,7 @@ static bool should_validate_capability(int capability)
>      /* Validate only new capabilities to keep compatibility. */
>      switch (capability) {
>      case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
> +    case MIGRATION_CAPABILITY_FIXED_RAM:
>          return true;
>      default:
>          return false;
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 88ecf86ac876..6196617171e8 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -485,7 +485,7 @@
>  ##
>  { 'enum': 'MigrationCapability',
>    'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
> -           'compress', 'events', 'postcopy-ram',
> +           'compress', 'events', 'postcopy-ram', 'fixed-ram',
>             { 'name': 'x-colo', 'features': [ 'unstable' ] },
>             'release-ram',
>             'block', 'return-path', 'pause-before-switchover', 'multifd',
> -- 
> 2.34.1
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 10/14] migration/ram: Introduce 'fixed-ram' migration stream capability
  2023-02-10 17:11   ` Daniel P. Berrangé
@ 2023-03-20 11:05     ` Claudio Fontana
  2023-03-20 11:08       ` Claudio Fontana
  0 siblings, 1 reply; 32+ messages in thread
From: Claudio Fontana @ 2023-03-20 11:05 UTC (permalink / raw)
  To: Daniel P. Berrangé, Nikolay Borisov
  Cc: dgilbert, qemu-devel, jfehlig, Claudio.Fontana, dfaggioli

On 2/10/23 18:11, Daniel P. Berrangé wrote:
> On Fri, Oct 28, 2022 at 01:39:10PM +0300, Nikolay Borisov wrote:
>> Implement 'fixed-ram' feature. The core of the feature is to ensure that
>> each ram page of the migration stream has a specific offset in the
>> resulting migration stream. The reason why we'd want such behavior are
>> two fold:
>>
>>  - When doing a 'fixed-ram' migration the resulting file will have a
>>    bounded size, since pages which are dirtied multiple times will
>>    always go to a fixed location in the file, rather than constantly
>>    being added to a sequential stream. This eliminates cases where a vm
>>    with, say, 1g of ram can result in a migration file that's 10s of
>>    Gbs, provided that the workload constantly redirties memory.
>>
>>  - It paves the way to implement DIO-enabled save/restore of the
>>    migration stream as the pages are ensured to be written at aligned
>>    offsets.
>>
>> The features requires changing the format. First, a bitmap is introduced
>> which tracks which pages have been written (i.e are dirtied) during
>> migration and subsequently it's being written in the resultin file,
>> again at a fixed location for every ramblock. Zero pages are ignored as
>> they'd be zero in the destination migration as well. With the changed
>> format data would look like the following:
>>
>> |name len|name|used_len|pc*|bitmap_size|pages_offset|bitmap|pages|
>>
>> * pc - refers to the page_size/mr->addr members, so newly added members
>> begin from "bitmap_size".
>>
>> This layout is initialized during ram_save_setup so instead of having a
>> sequential stream of pages that follow the ramblock headers the dirty
>> pages for a ramblock follow its header. Since all pages have a fixed
>> location RAM_SAVE_FLAG_EOS is no longer generated on every migration
>> iteration but there is effectively a single RAM_SAVE_FLAG_EOS right at
>> the end.
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> ---
>>  include/exec/ramblock.h |  7 +++
>>  migration/migration.c   | 51 +++++++++++++++++++++-
>>  migration/migration.h   |  1 +
>>  migration/ram.c         | 97 +++++++++++++++++++++++++++++++++--------
>>  migration/savevm.c      |  1 +
>>  qapi/migration.json     |  2 +-
>>  6 files changed, 138 insertions(+), 21 deletions(-)
> 
> This patch probably ought to extends the docs/devel/migration.rst
> file. Specifically the text following the 'Stream structure'
> heading.
> 
>>
>> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
>> index 6cbedf9e0c9a..30216c1a41d3 100644
>> --- a/include/exec/ramblock.h
>> +++ b/include/exec/ramblock.h
>> @@ -43,6 +43,13 @@ struct RAMBlock {
>>      size_t page_size;
>>      /* dirty bitmap used during migration */
>>      unsigned long *bmap;
>> +    /* shadow dirty bitmap used when migrating to a file */
>> +    unsigned long *shadow_bmap;
>> +    /* offset in the file pages belonging to this ramblock are saved, used
>> +     * only during migration to a file
>> +     */
>> +    off_t bitmap_offset;
>> +    uint64_t pages_offset;
>>      /* bitmap of already received pages in postcopy */
>>      unsigned long *receivedmap;
>>  
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 11ceea340702..c7383845a5b4 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -165,7 +165,8 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
>>      MIGRATION_CAPABILITY_XBZRLE,
>>      MIGRATION_CAPABILITY_X_COLO,
>>      MIGRATION_CAPABILITY_VALIDATE_UUID,
>> -    MIGRATION_CAPABILITY_ZERO_COPY_SEND);
>> +    MIGRATION_CAPABILITY_ZERO_COPY_SEND,
>> +    MIGRATION_CAPABILITY_FIXED_RAM);
>>  
>>  /* When we add fault tolerance, we could have several
>>     migrations at once.  For now we don't need to add
>> @@ -1326,6 +1327,27 @@ static bool migrate_caps_check(bool *cap_list,
>>      }
>>  #endif
>>  
>> +    if (cap_list[MIGRATION_CAPABILITY_FIXED_RAM]) {
>> +	    if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
>> +		    error_setg(errp, "Directly mapped memory incompatible with multifd");
>> +		    return false;
>> +	    }
>> +
>> +	    if (cap_list[MIGRATION_CAPABILITY_XBZRLE]) {
>> +		    error_setg(errp, "Directly mapped memory incompatible with xbzrle");
>> +		    return false;
>> +	    }
>> +
>> +	    if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
>> +		    error_setg(errp, "Directly mapped memory incompatible with compression");
>> +		    return false;
>> +	    }
>> +
>> +	    if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
>> +		    error_setg(errp, "Directly mapped memory incompatible with postcopy ram");
>> +		    return false;
>> +	    }
>> +    }
>>  
>>      /* incoming side only */
>>      if (runstate_check(RUN_STATE_INMIGRATE) &&
>> @@ -2630,6 +2652,11 @@ MultiFDCompression migrate_multifd_compression(void)
>>      return s->parameters.multifd_compression;
>>  }
>>  
>> +int migrate_fixed_ram(void)
>> +{
>> +    return migrate_get_current()->enabled_capabilities[MIGRATION_CAPABILITY_FIXED_RAM];
>> +}
>> +
>>  int migrate_multifd_zlib_level(void)
>>  {
>>      MigrationState *s;
>> @@ -4190,6 +4217,21 @@ static void *bg_migration_thread(void *opaque)
>>      return NULL;
>>  }
>>  
>> +static int
>> +migrate_check_fixed_ram(MigrationState *s, Error **errp)
>> +{
>> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_FIXED_RAM])
>> +        return 0;
>> +
>> +    if (!qemu_file_is_seekable(s->to_dst_file)) {
>> +        error_setg(errp, "Directly mapped memory requires a seekable transport");
>> +        return -1;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +
>>  void migrate_fd_connect(MigrationState *s, Error *error_in)
>>  {
>>      Error *local_err = NULL;
>> @@ -4265,6 +4307,12 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>>          return;
>>      }
>>  
>> +    if (migrate_check_fixed_ram(s, &local_err) < 0) {
>> +	    migrate_fd_cleanup(s);
>> +	    migrate_fd_error(s, local_err);
>> +	    return;
>> +    }
>> +
>>      if (resume) {
>>          /* Wakeup the main migration thread to do the recovery */
>>          migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
>> @@ -4398,6 +4446,7 @@ static Property migration_properties[] = {
>>      DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
>>  
>>      /* Migration capabilities */
>> +    DEFINE_PROP_MIG_CAP("x-fixed-ram", MIGRATION_CAPABILITY_FIXED_RAM),
>>      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
>>      DEFINE_PROP_MIG_CAP("x-rdma-pin-all", MIGRATION_CAPABILITY_RDMA_PIN_ALL),
>>      DEFINE_PROP_MIG_CAP("x-auto-converge", MIGRATION_CAPABILITY_AUTO_CONVERGE),
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 96f27aba2210..9aab1b16f407 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -410,6 +410,7 @@ bool migrate_zero_blocks(void);
>>  bool migrate_dirty_bitmaps(void);
>>  bool migrate_ignore_shared(void);
>>  bool migrate_validate_uuid(void);
>> +int migrate_fixed_ram(void);
>>  
>>  bool migrate_auto_converge(void);
>>  bool migrate_use_multifd(void);
>> diff --git a/migration/ram.c b/migration/ram.c
>> index dc1de9ddbc68..4f5ddaff356b 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1261,9 +1261,14 @@ static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
>>      int len = 0;
>>  
>>      if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>> -        len += save_page_header(rs, file, block, offset | RAM_SAVE_FLAG_ZERO);
>> -        qemu_put_byte(file, 0);
>> -        len += 1;
>> +        if (migrate_fixed_ram()) {
>> +            /* for zero pages we don't need to do anything */
>> +            len = 1;
>> +        } else {
>> +            len += save_page_header(rs, file, block, offset | RAM_SAVE_FLAG_ZERO);
>> +            qemu_put_byte(file, 0);
>> +            len += 1;
>> +        }
>>          ram_release_page(block->idstr, offset);
>>      }
>>      return len;
>> @@ -1342,15 +1347,22 @@ static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
>>  static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
>>                              uint8_t *buf, bool async)
>>  {
>> -    ram_transferred_add(save_page_header(rs, rs->f, block,
>> -                                         offset | RAM_SAVE_FLAG_PAGE));
>> -    if (async) {
>> -        qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
>> -                              migrate_release_ram() &&
>> -                              migration_in_postcopy());
>> -    } else {
>> -        qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
>> -    }
>> +
>> +	if (migrate_fixed_ram()) {
>> +		qemu_put_buffer_at(rs->f, buf, TARGET_PAGE_SIZE,
>> +				   block->pages_offset + offset);
>> +		set_bit(offset >> TARGET_PAGE_BITS, block->shadow_bmap);
>> +	} else {
>> +		ram_transferred_add(save_page_header(rs, rs->f, block,
>> +						     offset | RAM_SAVE_FLAG_PAGE));
>> +		if (async) {
>> +			qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
>> +					      migrate_release_ram() &&
>> +					      migration_in_postcopy());
>> +		} else {
>> +			qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
>> +		}
>> +	}
>>      ram_transferred_add(TARGET_PAGE_SIZE);
>>      ram_counters.normal++;
>>      return 1;
>> @@ -2683,6 +2695,8 @@ static void ram_save_cleanup(void *opaque)
>>          block->clear_bmap = NULL;
>>          g_free(block->bmap);
>>          block->bmap = NULL;
>> +        g_free(block->shadow_bmap);
>> +        block->shadow_bmap = NULL;
>>      }
>>  
>>      xbzrle_cleanup();
>> @@ -3044,6 +3058,7 @@ static void ram_list_init_bitmaps(void)
>>               */
>>              block->bmap = bitmap_new(pages);
>>              bitmap_set(block->bmap, 0, pages);
>> +            block->shadow_bmap = bitmap_new(block->used_length >> TARGET_PAGE_BITS);
>>              block->clear_bmap_shift = shift;
>>              block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
>>          }
>> @@ -3226,12 +3241,34 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>              qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
>>              qemu_put_be64(f, block->used_length);
>>              if (migrate_postcopy_ram() && block->page_size !=
>> -                                          qemu_host_page_size) {
>> +                qemu_host_page_size) {
>>                  qemu_put_be64(f, block->page_size);
>>              }
>>              if (migrate_ignore_shared()) {
>>                  qemu_put_be64(f, block->mr->addr);
>>              }
>> +
>> +            if (migrate_fixed_ram()) {
>> +                long num_pages = block->used_length >> TARGET_PAGE_BITS;
>> +                long bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
>> +
>> +
>> +                /* Needed for external programs (think analyze-migration.py) */
>> +                qemu_put_be32(f, bitmap_size);
>> +
>> +                /*
>> +                 * Make pages offset aligned to TARGET_PAGE_SIZE to enable
>> +                 * DIO in the future. Also add 8 to account for the page offset
>> +                 * itself
>> +                 */
>> +                block->bitmap_offset = qemu_get_offset(f) + 8;
>> +                block->pages_offset = ROUND_UP(block->bitmap_offset +
>> +                                               bitmap_size, TARGET_PAGE_SIZE);
> 
> I'm not sure that rounding to TARGET_PAGE_SIZE is sufficient.
> 
> If we've built QEMU for a 32-bit i686 target, but are running on a 64-bit
> kernel, is TARGET_PAGE_SIZE going to offer sufficient alignement to keep
> the kernel happy.
> 
> Also O_DIRECT has alignment constraints beyond simply the page size. The
> page size alignment is most important for the buffers being passed
> back and forth. The on-disk alignment constraint is actually defined by
> the filesystem and storage it is on, and on-disk alignment is what
> matters when we decide on pages_offset.

For comparison, in the libvirt iohelper the alignment of DIRECT I/O-friendly transfers is 64K,
as per libvirt commit:

commit 12291656b135ed788e41dadbd2d15e613ddea9b5
Author: Eric Blake <eblake@redhat.com>
Date:   Tue Jul 12 08:35:05 2011 -0600

    save: let iohelper work on O_DIRECT fds
    
    Required for a coming patch where iohelper will operate on O_DIRECT
    fds.  There, the user-space memory must be aligned to file system
    boundaries (at least 512, but using page-aligned works better, and
    some file systems prefer 64k).  Made tougher by the fact that
    VIR_ALLOC won't work on void *, but posix_memalign won't work on
    char * and isn't available everywhere.
    
    This patch makes some simplifying assumptions - namely, output
    to an O_DIRECT fd will only be attempted on an empty seekable
    file (hence, no need to worry about preserving existing data
    on a partial block, and ftruncate will work to undo the effects
    of having to round up the size of the last block written), and
    input from an O_DIRECT fd will only be attempted on a complete
    seekable file with the only possible short read at EOF.
    
    * configure.ac (AC_CHECK_FUNCS_ONCE): Check for posix_memalign.
    * src/util/iohelper.c (runIO): Use aligned memory, and handle
    quirks of O_DIRECT on last write.


> 
> If we want to allow saved state to be copied across different filesystems/
> hosts, we need to consider the worst case alignment that would exist on
> any conceivable host deployment now or in the future.
> 
> Given that RAM sizes are measured 1000's of MBs in any scenario where
> we care about save/restore speed enough to use the fixed-ram feature,
> we can afford to waste a little space.
> 
> IOW, we could round pages_offset upto the nearest 1 MB boundary,
> which ought to be well enough aligned to cope with any constraint
> we might imagine ? Or possibly even align to 4 MB, which is a common
> size for small-ish huge pages.
> 
> 
>> +                qemu_put_be64(f, block->pages_offset);
>> +
>> +                /* Now prepare offset for next ramblock */
>> +                qemu_set_offset(f, block->pages_offset + block->used_length, SEEK_SET);
>> +            }
>>          }
>>      }
>>  
>> @@ -3249,6 +3286,17 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>      return 0;
>>  }
>>  
>> +static void ram_save_shadow_bmap(QEMUFile *f)
>> +{
>> +    RAMBlock *block;
>> +
>> +    RAMBLOCK_FOREACH_MIGRATABLE(block) {
>> +        long num_pages = block->used_length >> TARGET_PAGE_BITS;
>> +        long bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
>> +        qemu_put_buffer_at(f, (uint8_t *)block->shadow_bmap, bitmap_size, block->bitmap_offset);
>> +    }
>> +}
>> +
>>  /**
>>   * ram_save_iterate: iterative stage for migration
>>   *
>> @@ -3358,9 +3406,15 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>>              return ret;
>>          }
>>  
>> -        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>> -        qemu_fflush(f);
>> -        ram_transferred_add(8);
>> +	/*
>> +	 * For fixed ram we don't want to pollute the migration stream with
>> +	 * EOS flags.
>> +	 */
>> +	if (!migrate_fixed_ram()) {
>> +            qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>> +            qemu_fflush(f);
>> +            ram_transferred_add(8);
>> +	}
>>  
>>          ret = qemu_file_get_error(f);
>>      }
>> @@ -3405,7 +3459,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>>              pages = ram_find_and_save_block(rs);
>>              /* no more blocks to sent */
>>              if (pages == 0) {
>> -                break;
>> +                if (migrate_fixed_ram()) {
>> +                    ram_save_shadow_bmap(f);
>> +                }
>> +            break;
>>              }
>>              if (pages < 0) {
>>                  ret = pages;
>> @@ -3428,8 +3485,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>>          return ret;
>>      }
>>  
>> -    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>> -    qemu_fflush(f);
>> +    if (!migrate_fixed_ram()) {
>> +        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>> +        qemu_fflush(f);
>> +    }
>>  
>>      return 0;
>>  }
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index 44a222888306..847a8bdfb6ce 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -240,6 +240,7 @@ static bool should_validate_capability(int capability)
>>      /* Validate only new capabilities to keep compatibility. */
>>      switch (capability) {
>>      case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
>> +    case MIGRATION_CAPABILITY_FIXED_RAM:
>>          return true;
>>      default:
>>          return false;
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 88ecf86ac876..6196617171e8 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -485,7 +485,7 @@
>>  ##
>>  { 'enum': 'MigrationCapability',
>>    'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
>> -           'compress', 'events', 'postcopy-ram',
>> +           'compress', 'events', 'postcopy-ram', 'fixed-ram',
>>             { 'name': 'x-colo', 'features': [ 'unstable' ] },
>>             'release-ram',
>>             'block', 'return-path', 'pause-before-switchover', 'multifd',
>> -- 
>> 2.34.1
>>
> 
> With regards,
> Daniel

Thanks,

Claudio


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 10/14] migration/ram: Introduce 'fixed-ram' migration stream capability
  2023-03-20 11:05     ` Claudio Fontana
@ 2023-03-20 11:08       ` Claudio Fontana
  0 siblings, 0 replies; 32+ messages in thread
From: Claudio Fontana @ 2023-03-20 11:08 UTC (permalink / raw)
  To: Daniel P. Berrangé, Fabiano Rosas
  Cc: dgilbert, qemu-devel, jfehlig, Claudio.Fontana, dfaggioli

Added Fabiano to the thread,

CLaudio

On 3/20/23 12:05, Claudio Fontana wrote:
> On 2/10/23 18:11, Daniel P. Berrangé wrote:
>> On Fri, Oct 28, 2022 at 01:39:10PM +0300, Nikolay Borisov wrote:
>>> Implement 'fixed-ram' feature. The core of the feature is to ensure that
>>> each ram page of the migration stream has a specific offset in the
>>> resulting migration stream. The reason why we'd want such behavior are
>>> two fold:
>>>
>>>  - When doing a 'fixed-ram' migration the resulting file will have a
>>>    bounded size, since pages which are dirtied multiple times will
>>>    always go to a fixed location in the file, rather than constantly
>>>    being added to a sequential stream. This eliminates cases where a vm
>>>    with, say, 1g of ram can result in a migration file that's 10s of
>>>    Gbs, provided that the workload constantly redirties memory.
>>>
>>>  - It paves the way to implement DIO-enabled save/restore of the
>>>    migration stream as the pages are ensured to be written at aligned
>>>    offsets.
>>>
>>> The features requires changing the format. First, a bitmap is introduced
>>> which tracks which pages have been written (i.e are dirtied) during
>>> migration and subsequently it's being written in the resultin file,
>>> again at a fixed location for every ramblock. Zero pages are ignored as
>>> they'd be zero in the destination migration as well. With the changed
>>> format data would look like the following:
>>>
>>> |name len|name|used_len|pc*|bitmap_size|pages_offset|bitmap|pages|
>>>
>>> * pc - refers to the page_size/mr->addr members, so newly added members
>>> begin from "bitmap_size".
>>>
>>> This layout is initialized during ram_save_setup so instead of having a
>>> sequential stream of pages that follow the ramblock headers the dirty
>>> pages for a ramblock follow its header. Since all pages have a fixed
>>> location RAM_SAVE_FLAG_EOS is no longer generated on every migration
>>> iteration but there is effectively a single RAM_SAVE_FLAG_EOS right at
>>> the end.
>>>
>>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>>> ---
>>>  include/exec/ramblock.h |  7 +++
>>>  migration/migration.c   | 51 +++++++++++++++++++++-
>>>  migration/migration.h   |  1 +
>>>  migration/ram.c         | 97 +++++++++++++++++++++++++++++++++--------
>>>  migration/savevm.c      |  1 +
>>>  qapi/migration.json     |  2 +-
>>>  6 files changed, 138 insertions(+), 21 deletions(-)
>>
>> This patch probably ought to extends the docs/devel/migration.rst
>> file. Specifically the text following the 'Stream structure'
>> heading.
>>
>>>
>>> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
>>> index 6cbedf9e0c9a..30216c1a41d3 100644
>>> --- a/include/exec/ramblock.h
>>> +++ b/include/exec/ramblock.h
>>> @@ -43,6 +43,13 @@ struct RAMBlock {
>>>      size_t page_size;
>>>      /* dirty bitmap used during migration */
>>>      unsigned long *bmap;
>>> +    /* shadow dirty bitmap used when migrating to a file */
>>> +    unsigned long *shadow_bmap;
>>> +    /* offset in the file pages belonging to this ramblock are saved, used
>>> +     * only during migration to a file
>>> +     */
>>> +    off_t bitmap_offset;
>>> +    uint64_t pages_offset;
>>>      /* bitmap of already received pages in postcopy */
>>>      unsigned long *receivedmap;
>>>  
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 11ceea340702..c7383845a5b4 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -165,7 +165,8 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
>>>      MIGRATION_CAPABILITY_XBZRLE,
>>>      MIGRATION_CAPABILITY_X_COLO,
>>>      MIGRATION_CAPABILITY_VALIDATE_UUID,
>>> -    MIGRATION_CAPABILITY_ZERO_COPY_SEND);
>>> +    MIGRATION_CAPABILITY_ZERO_COPY_SEND,
>>> +    MIGRATION_CAPABILITY_FIXED_RAM);
>>>  
>>>  /* When we add fault tolerance, we could have several
>>>     migrations at once.  For now we don't need to add
>>> @@ -1326,6 +1327,27 @@ static bool migrate_caps_check(bool *cap_list,
>>>      }
>>>  #endif
>>>  
>>> +    if (cap_list[MIGRATION_CAPABILITY_FIXED_RAM]) {
>>> +	    if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
>>> +		    error_setg(errp, "Directly mapped memory incompatible with multifd");
>>> +		    return false;
>>> +	    }
>>> +
>>> +	    if (cap_list[MIGRATION_CAPABILITY_XBZRLE]) {
>>> +		    error_setg(errp, "Directly mapped memory incompatible with xbzrle");
>>> +		    return false;
>>> +	    }
>>> +
>>> +	    if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
>>> +		    error_setg(errp, "Directly mapped memory incompatible with compression");
>>> +		    return false;
>>> +	    }
>>> +
>>> +	    if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
>>> +		    error_setg(errp, "Directly mapped memory incompatible with postcopy ram");
>>> +		    return false;
>>> +	    }
>>> +    }
>>>  
>>>      /* incoming side only */
>>>      if (runstate_check(RUN_STATE_INMIGRATE) &&
>>> @@ -2630,6 +2652,11 @@ MultiFDCompression migrate_multifd_compression(void)
>>>      return s->parameters.multifd_compression;
>>>  }
>>>  
>>> +int migrate_fixed_ram(void)
>>> +{
>>> +    return migrate_get_current()->enabled_capabilities[MIGRATION_CAPABILITY_FIXED_RAM];
>>> +}
>>> +
>>>  int migrate_multifd_zlib_level(void)
>>>  {
>>>      MigrationState *s;
>>> @@ -4190,6 +4217,21 @@ static void *bg_migration_thread(void *opaque)
>>>      return NULL;
>>>  }
>>>  
>>> +static int
>>> +migrate_check_fixed_ram(MigrationState *s, Error **errp)
>>> +{
>>> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_FIXED_RAM])
>>> +        return 0;
>>> +
>>> +    if (!qemu_file_is_seekable(s->to_dst_file)) {
>>> +        error_setg(errp, "Directly mapped memory requires a seekable transport");
>>> +        return -1;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +
>>>  void migrate_fd_connect(MigrationState *s, Error *error_in)
>>>  {
>>>      Error *local_err = NULL;
>>> @@ -4265,6 +4307,12 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>>>          return;
>>>      }
>>>  
>>> +    if (migrate_check_fixed_ram(s, &local_err) < 0) {
>>> +	    migrate_fd_cleanup(s);
>>> +	    migrate_fd_error(s, local_err);
>>> +	    return;
>>> +    }
>>> +
>>>      if (resume) {
>>>          /* Wakeup the main migration thread to do the recovery */
>>>          migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
>>> @@ -4398,6 +4446,7 @@ static Property migration_properties[] = {
>>>      DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
>>>  
>>>      /* Migration capabilities */
>>> +    DEFINE_PROP_MIG_CAP("x-fixed-ram", MIGRATION_CAPABILITY_FIXED_RAM),
>>>      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
>>>      DEFINE_PROP_MIG_CAP("x-rdma-pin-all", MIGRATION_CAPABILITY_RDMA_PIN_ALL),
>>>      DEFINE_PROP_MIG_CAP("x-auto-converge", MIGRATION_CAPABILITY_AUTO_CONVERGE),
>>> diff --git a/migration/migration.h b/migration/migration.h
>>> index 96f27aba2210..9aab1b16f407 100644
>>> --- a/migration/migration.h
>>> +++ b/migration/migration.h
>>> @@ -410,6 +410,7 @@ bool migrate_zero_blocks(void);
>>>  bool migrate_dirty_bitmaps(void);
>>>  bool migrate_ignore_shared(void);
>>>  bool migrate_validate_uuid(void);
>>> +int migrate_fixed_ram(void);
>>>  
>>>  bool migrate_auto_converge(void);
>>>  bool migrate_use_multifd(void);
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index dc1de9ddbc68..4f5ddaff356b 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -1261,9 +1261,14 @@ static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
>>>      int len = 0;
>>>  
>>>      if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>>> -        len += save_page_header(rs, file, block, offset | RAM_SAVE_FLAG_ZERO);
>>> -        qemu_put_byte(file, 0);
>>> -        len += 1;
>>> +        if (migrate_fixed_ram()) {
>>> +            /* for zero pages we don't need to do anything */
>>> +            len = 1;
>>> +        } else {
>>> +            len += save_page_header(rs, file, block, offset | RAM_SAVE_FLAG_ZERO);
>>> +            qemu_put_byte(file, 0);
>>> +            len += 1;
>>> +        }
>>>          ram_release_page(block->idstr, offset);
>>>      }
>>>      return len;
>>> @@ -1342,15 +1347,22 @@ static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
>>>  static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
>>>                              uint8_t *buf, bool async)
>>>  {
>>> -    ram_transferred_add(save_page_header(rs, rs->f, block,
>>> -                                         offset | RAM_SAVE_FLAG_PAGE));
>>> -    if (async) {
>>> -        qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
>>> -                              migrate_release_ram() &&
>>> -                              migration_in_postcopy());
>>> -    } else {
>>> -        qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
>>> -    }
>>> +
>>> +	if (migrate_fixed_ram()) {
>>> +		qemu_put_buffer_at(rs->f, buf, TARGET_PAGE_SIZE,
>>> +				   block->pages_offset + offset);
>>> +		set_bit(offset >> TARGET_PAGE_BITS, block->shadow_bmap);
>>> +	} else {
>>> +		ram_transferred_add(save_page_header(rs, rs->f, block,
>>> +						     offset | RAM_SAVE_FLAG_PAGE));
>>> +		if (async) {
>>> +			qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
>>> +					      migrate_release_ram() &&
>>> +					      migration_in_postcopy());
>>> +		} else {
>>> +			qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
>>> +		}
>>> +	}
>>>      ram_transferred_add(TARGET_PAGE_SIZE);
>>>      ram_counters.normal++;
>>>      return 1;
>>> @@ -2683,6 +2695,8 @@ static void ram_save_cleanup(void *opaque)
>>>          block->clear_bmap = NULL;
>>>          g_free(block->bmap);
>>>          block->bmap = NULL;
>>> +        g_free(block->shadow_bmap);
>>> +        block->shadow_bmap = NULL;
>>>      }
>>>  
>>>      xbzrle_cleanup();
>>> @@ -3044,6 +3058,7 @@ static void ram_list_init_bitmaps(void)
>>>               */
>>>              block->bmap = bitmap_new(pages);
>>>              bitmap_set(block->bmap, 0, pages);
>>> +            block->shadow_bmap = bitmap_new(block->used_length >> TARGET_PAGE_BITS);
>>>              block->clear_bmap_shift = shift;
>>>              block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
>>>          }
>>> @@ -3226,12 +3241,34 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>>              qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
>>>              qemu_put_be64(f, block->used_length);
>>>              if (migrate_postcopy_ram() && block->page_size !=
>>> -                                          qemu_host_page_size) {
>>> +                qemu_host_page_size) {
>>>                  qemu_put_be64(f, block->page_size);
>>>              }
>>>              if (migrate_ignore_shared()) {
>>>                  qemu_put_be64(f, block->mr->addr);
>>>              }
>>> +
>>> +            if (migrate_fixed_ram()) {
>>> +                long num_pages = block->used_length >> TARGET_PAGE_BITS;
>>> +                long bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
>>> +
>>> +
>>> +                /* Needed for external programs (think analyze-migration.py) */
>>> +                qemu_put_be32(f, bitmap_size);
>>> +
>>> +                /*
>>> +                 * Make pages offset aligned to TARGET_PAGE_SIZE to enable
>>> +                 * DIO in the future. Also add 8 to account for the page offset
>>> +                 * itself
>>> +                 */
>>> +                block->bitmap_offset = qemu_get_offset(f) + 8;
>>> +                block->pages_offset = ROUND_UP(block->bitmap_offset +
>>> +                                               bitmap_size, TARGET_PAGE_SIZE);
>>
>> I'm not sure that rounding to TARGET_PAGE_SIZE is sufficient.
>>
>> If we've built QEMU for a 32-bit i686 target, but are running on a 64-bit
>> kernel, is TARGET_PAGE_SIZE going to offer sufficient alignement to keep
>> the kernel happy.
>>
>> Also O_DIRECT has alignment constraints beyond simply the page size. The
>> page size alignment is most important for the buffers being passed
>> back and forth. The on-disk alignment constraint is actually defined by
>> the filesystem and storage it is on, and on-disk alignment is what
>> matters when we decide on pages_offset.
> 
> For comparison, in the libvirt iohelper the alignment of DIRECT I/O-friendly transfers is 64K,
> as per libvirt commit:
> 
> commit 12291656b135ed788e41dadbd2d15e613ddea9b5
> Author: Eric Blake <eblake@redhat.com>
> Date:   Tue Jul 12 08:35:05 2011 -0600
> 
>     save: let iohelper work on O_DIRECT fds
>     
>     Required for a coming patch where iohelper will operate on O_DIRECT
>     fds.  There, the user-space memory must be aligned to file system
>     boundaries (at least 512, but using page-aligned works better, and
>     some file systems prefer 64k).  Made tougher by the fact that
>     VIR_ALLOC won't work on void *, but posix_memalign won't work on
>     char * and isn't available everywhere.
>     
>     This patch makes some simplifying assumptions - namely, output
>     to an O_DIRECT fd will only be attempted on an empty seekable
>     file (hence, no need to worry about preserving existing data
>     on a partial block, and ftruncate will work to undo the effects
>     of having to round up the size of the last block written), and
>     input from an O_DIRECT fd will only be attempted on a complete
>     seekable file with the only possible short read at EOF.
>     
>     * configure.ac (AC_CHECK_FUNCS_ONCE): Check for posix_memalign.
>     * src/util/iohelper.c (runIO): Use aligned memory, and handle
>     quirks of O_DIRECT on last write.
> 
> 
>>
>> If we want to allow saved state to be copied across different filesystems/
>> hosts, we need to consider the worst case alignment that would exist on
>> any conceivable host deployment now or in the future.
>>
>> Given that RAM sizes are measured 1000's of MBs in any scenario where
>> we care about save/restore speed enough to use the fixed-ram feature,
>> we can afford to waste a little space.
>>
>> IOW, we could round pages_offset upto the nearest 1 MB boundary,
>> which ought to be well enough aligned to cope with any constraint
>> we might imagine ? Or possibly even align to 4 MB, which is a common
>> size for small-ish huge pages.
>>
>>
>>> +                qemu_put_be64(f, block->pages_offset);
>>> +
>>> +                /* Now prepare offset for next ramblock */
>>> +                qemu_set_offset(f, block->pages_offset + block->used_length, SEEK_SET);
>>> +            }
>>>          }
>>>      }
>>>  
>>> @@ -3249,6 +3286,17 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>>      return 0;
>>>  }
>>>  
>>> +static void ram_save_shadow_bmap(QEMUFile *f)
>>> +{
>>> +    RAMBlock *block;
>>> +
>>> +    RAMBLOCK_FOREACH_MIGRATABLE(block) {
>>> +        long num_pages = block->used_length >> TARGET_PAGE_BITS;
>>> +        long bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
>>> +        qemu_put_buffer_at(f, (uint8_t *)block->shadow_bmap, bitmap_size, block->bitmap_offset);
>>> +    }
>>> +}
>>> +
>>>  /**
>>>   * ram_save_iterate: iterative stage for migration
>>>   *
>>> @@ -3358,9 +3406,15 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>>>              return ret;
>>>          }
>>>  
>>> -        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>>> -        qemu_fflush(f);
>>> -        ram_transferred_add(8);
>>> +	/*
>>> +	 * For fixed ram we don't want to pollute the migration stream with
>>> +	 * EOS flags.
>>> +	 */
>>> +	if (!migrate_fixed_ram()) {
>>> +            qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>>> +            qemu_fflush(f);
>>> +            ram_transferred_add(8);
>>> +	}
>>>  
>>>          ret = qemu_file_get_error(f);
>>>      }
>>> @@ -3405,7 +3459,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>>>              pages = ram_find_and_save_block(rs);
>>>              /* no more blocks to sent */
>>>              if (pages == 0) {
>>> -                break;
>>> +                if (migrate_fixed_ram()) {
>>> +                    ram_save_shadow_bmap(f);
>>> +                }
>>> +            break;
>>>              }
>>>              if (pages < 0) {
>>>                  ret = pages;
>>> @@ -3428,8 +3485,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>>>          return ret;
>>>      }
>>>  
>>> -    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>>> -    qemu_fflush(f);
>>> +    if (!migrate_fixed_ram()) {
>>> +        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>>> +        qemu_fflush(f);
>>> +    }
>>>  
>>>      return 0;
>>>  }
>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>> index 44a222888306..847a8bdfb6ce 100644
>>> --- a/migration/savevm.c
>>> +++ b/migration/savevm.c
>>> @@ -240,6 +240,7 @@ static bool should_validate_capability(int capability)
>>>      /* Validate only new capabilities to keep compatibility. */
>>>      switch (capability) {
>>>      case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
>>> +    case MIGRATION_CAPABILITY_FIXED_RAM:
>>>          return true;
>>>      default:
>>>          return false;
>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>> index 88ecf86ac876..6196617171e8 100644
>>> --- a/qapi/migration.json
>>> +++ b/qapi/migration.json
>>> @@ -485,7 +485,7 @@
>>>  ##
>>>  { 'enum': 'MigrationCapability',
>>>    'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
>>> -           'compress', 'events', 'postcopy-ram',
>>> +           'compress', 'events', 'postcopy-ram', 'fixed-ram',
>>>             { 'name': 'x-colo', 'features': [ 'unstable' ] },
>>>             'release-ram',
>>>             'block', 'return-path', 'pause-before-switchover', 'multifd',
>>> -- 
>>> 2.34.1
>>>
>>
>> With regards,
>> Daniel
> 
> Thanks,
> 
> Claudio



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 11/14] migration: Refactor precopy ram loading code
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
                   ` (9 preceding siblings ...)
  2022-10-28 10:39 ` [PATCH v3 10/14] migration/ram: Introduce 'fixed-ram' migration stream capability Nikolay Borisov
@ 2022-10-28 10:39 ` Nikolay Borisov
  2022-10-28 10:39 ` [PATCH v3 12/14] migration: Add support for 'fixed-ram' migration restore Nikolay Borisov
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

To facilitate easier implementaiton of the 'fixed-ram' migration restore
factor out the code responsible for parsing the ramblocks headers. This
also makes ram_load_precopy easier to comprehend.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 migration/ram.c | 142 +++++++++++++++++++++++++++---------------------
 1 file changed, 80 insertions(+), 62 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 4f5ddaff356b..1dd68c221667 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4253,6 +4253,83 @@ void colo_flush_ram_cache(void)
     trace_colo_flush_ram_cache_end();
 }
 
+static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
+{
+    int ret = 0;
+    /* ADVISE is earlier, it shows the source has the postcopy capability on */
+    bool postcopy_advised = postcopy_is_advised();
+
+    assert(block);
+
+    if (!qemu_ram_is_migratable(block)) {
+        error_report("block %s should not be migrated !", block->idstr);
+        ret = -EINVAL;
+    }
+
+    if (length != block->used_length) {
+        Error *local_err = NULL;
+
+        ret = qemu_ram_resize(block, length, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+        }
+    }
+    /* For postcopy we need to check hugepage sizes match */
+    if (postcopy_advised && migrate_postcopy_ram() &&
+        block->page_size != qemu_host_page_size) {
+        uint64_t remote_page_size = qemu_get_be64(f);
+        if (remote_page_size != block->page_size) {
+            error_report("Mismatched RAM page size %s "
+                         "(local) %zd != %" PRId64, block->idstr,
+                         block->page_size, remote_page_size);
+            ret = -EINVAL;
+        }
+    }
+    if (migrate_ignore_shared()) {
+        hwaddr addr = qemu_get_be64(f);
+        if (ramblock_is_ignored(block) &&
+            block->mr->addr != addr) {
+            error_report("Mismatched GPAs for block %s "
+                         "%" PRId64 "!= %" PRId64, block->idstr,
+                         (uint64_t)addr,
+                         (uint64_t)block->mr->addr);
+            ret = -EINVAL;
+        }
+    }
+    ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG, block->idstr);
+
+    return ret;
+}
+
+static int parse_ramblocks(QEMUFile *f, ram_addr_t total_ram_bytes)
+{
+    int ret = 0;
+
+    /* Synchronize RAM block list */
+    while (!ret && total_ram_bytes) {
+        char id[256];
+        RAMBlock *block;
+        ram_addr_t length;
+        int len = qemu_get_byte(f);
+
+        qemu_get_buffer(f, (uint8_t *)id, len);
+        id[len] = 0;
+        length = qemu_get_be64(f);
+
+        block = qemu_ram_block_by_name(id);
+        if (block) {
+            ret = parse_ramblock(f, block, length);
+        } else {
+            error_report("Unknown ramblock \"%s\", cannot accept "
+                         "migration", id);
+            ret = -EINVAL;
+        }
+        total_ram_bytes -= length;
+    }
+
+    return ret;
+}
+
 /**
  * ram_load_precopy: load pages in precopy case
  *
@@ -4267,14 +4344,13 @@ static int ram_load_precopy(QEMUFile *f)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
     int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0;
-    /* ADVISE is earlier, it shows the source has the postcopy capability on */
-    bool postcopy_advised = postcopy_is_advised();
+
     if (!migrate_use_compression()) {
         invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE;
     }
 
     while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
-        ram_addr_t addr, total_ram_bytes;
+        ram_addr_t addr;
         void *host = NULL, *host_bak = NULL;
         uint8_t ch;
 
@@ -4345,65 +4421,7 @@ static int ram_load_precopy(QEMUFile *f)
 
         switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
         case RAM_SAVE_FLAG_MEM_SIZE:
-            /* Synchronize RAM block list */
-            total_ram_bytes = addr;
-            while (!ret && total_ram_bytes) {
-                RAMBlock *block;
-                char id[256];
-                ram_addr_t length;
-
-                len = qemu_get_byte(f);
-                qemu_get_buffer(f, (uint8_t *)id, len);
-                id[len] = 0;
-                length = qemu_get_be64(f);
-
-                block = qemu_ram_block_by_name(id);
-                if (block && !qemu_ram_is_migratable(block)) {
-                    error_report("block %s should not be migrated !", id);
-                    ret = -EINVAL;
-                } else if (block) {
-                    if (length != block->used_length) {
-                        Error *local_err = NULL;
-
-                        ret = qemu_ram_resize(block, length,
-                                              &local_err);
-                        if (local_err) {
-                            error_report_err(local_err);
-                        }
-                    }
-                    /* For postcopy we need to check hugepage sizes match */
-                    if (postcopy_advised && migrate_postcopy_ram() &&
-                        block->page_size != qemu_host_page_size) {
-                        uint64_t remote_page_size = qemu_get_be64(f);
-                        if (remote_page_size != block->page_size) {
-                            error_report("Mismatched RAM page size %s "
-                                         "(local) %zd != %" PRId64,
-                                         id, block->page_size,
-                                         remote_page_size);
-                            ret = -EINVAL;
-                        }
-                    }
-                    if (migrate_ignore_shared()) {
-                        hwaddr addr = qemu_get_be64(f);
-                        if (ramblock_is_ignored(block) &&
-                            block->mr->addr != addr) {
-                            error_report("Mismatched GPAs for block %s "
-                                         "%" PRId64 "!= %" PRId64,
-                                         id, (uint64_t)addr,
-                                         (uint64_t)block->mr->addr);
-                            ret = -EINVAL;
-                        }
-                    }
-                    ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
-                                          block->idstr);
-                } else {
-                    error_report("Unknown ramblock \"%s\", cannot "
-                                 "accept migration", id);
-                    ret = -EINVAL;
-                }
-
-                total_ram_bytes -= length;
-            }
+            ret = parse_ramblocks(f, addr);
             break;
 
         case RAM_SAVE_FLAG_ZERO:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 12/14] migration: Add support for 'fixed-ram' migration restore
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
                   ` (10 preceding siblings ...)
  2022-10-28 10:39 ` [PATCH v3 11/14] migration: Refactor precopy ram loading code Nikolay Borisov
@ 2022-10-28 10:39 ` Nikolay Borisov
  2022-10-28 10:39 ` [PATCH v3 13/14] tests: Add migrate_incoming_qmp helper Nikolay Borisov
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

Add the necessary code to parse the format changes for 'fixed-ram'
capability. One of the more notable changes in behavior is that in the
'fixed-ram' case ram pages are restored in one go rather than constantly
looping through the migration stream. Also due to idiosyncrasies of the
format I have added the 'ram_migrated' since it was easier to simply
return directly from ->load_state rather than introducing more
conditionals around the code to prevent ->load_state being called
multiple times (from
qemu_loadvm_section_start_full/qemu_loadvm_section_part_end i.e. from
multiple QEMU_VM_SECTION_(PART|END) flags).

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 migration/migration.h |  2 +
 migration/ram.c       | 95 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 9aab1b16f407..7a832d072415 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -96,6 +96,8 @@ struct MigrationIncomingState {
     bool           have_listen_thread;
     QemuThread     listen_thread;
 
+    bool ram_migrated;
+
     /* For the kernel to send us notifications */
     int       userfault_fd;
     /* To notify the fault_thread to wake, e.g., when need to quit */
diff --git a/migration/ram.c b/migration/ram.c
index 1dd68c221667..e085a2431f88 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4330,6 +4330,90 @@ static int parse_ramblocks(QEMUFile *f, ram_addr_t total_ram_bytes)
     return ret;
 }
 
+
+static int parse_ramblocks_fixed_ram(QEMUFile *f)
+{
+    int ret = 0;
+
+    while (!ret) {
+        char id[256];
+        RAMBlock *block;
+        ram_addr_t length;
+        unsigned long clear_bit_idx;
+        long num_pages, bitmap_size;
+        int len = qemu_get_byte(f);
+        g_autofree unsigned long *dirty_bitmap = NULL;
+
+        qemu_get_buffer(f, (uint8_t *)id, len);
+        id[len] = 0;
+        length = qemu_get_be64(f);
+
+        block = qemu_ram_block_by_name(id);
+        if (block) {
+            ret = parse_ramblock(f, block, length);
+            if (ret < 0) {
+                return ret;
+            }
+        } else {
+            error_report("Unknown ramblock \"%s\", cannot accept "
+                         "migration", id);
+            ret = -EINVAL;
+            continue;
+        }
+
+        /* 1. read the bitmap size */
+        num_pages = length >> TARGET_PAGE_BITS;
+        bitmap_size = qemu_get_be32(f);
+
+        assert(bitmap_size == BITS_TO_LONGS(num_pages)*sizeof(unsigned long));
+
+        block->pages_offset = qemu_get_be64(f);
+
+        /* 2. read the actual bitmap */
+        dirty_bitmap = g_malloc0(bitmap_size);
+        if (qemu_get_buffer(f, (uint8_t *)dirty_bitmap, bitmap_size) != bitmap_size) {
+            error_report("Error parsing dirty bitmap");
+            return -EINVAL;
+        }
+
+#define BUFSIZE (4*1024*1024)
+        for (unsigned long set_bit_idx = find_first_bit(dirty_bitmap, num_pages);
+             set_bit_idx < num_pages;
+             set_bit_idx = find_next_bit(dirty_bitmap, num_pages, clear_bit_idx + 1)) {
+
+            clear_bit_idx = find_next_zero_bit(dirty_bitmap, num_pages, set_bit_idx + 1);
+            unsigned long len = TARGET_PAGE_SIZE * (clear_bit_idx - set_bit_idx);
+            ram_addr_t offset = set_bit_idx << TARGET_PAGE_BITS;
+
+            for (size_t read = 0, completed = 0; completed < len; offset += read) {
+                void *host = host_from_ram_block_offset(block, offset);
+                size_t read_len = MIN(len, BUFSIZE);
+
+                read = qemu_get_buffer_at(f, host, read_len,
+                                          block->pages_offset + offset);
+                completed += read;
+            }
+        }
+
+        /* Skip pages array */
+        qemu_set_offset(f, block->pages_offset + length, SEEK_SET);
+
+        /* Check if this is the last ramblock */
+        if (qemu_get_be64(f) == RAM_SAVE_FLAG_EOS) {
+            ret = 1;
+        } else {
+            /*
+             * If not, adjust the internal file index to account for the
+             * previous 64 bit read
+             */
+            qemu_file_skip(f, -8);
+            ret = 0;
+        }
+    }
+
+    return ret;
+}
+
 /**
  * ram_load_precopy: load pages in precopy case
  *
@@ -4349,7 +4433,7 @@ static int ram_load_precopy(QEMUFile *f)
         invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE;
     }
 
-    while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
+    while (!ret && !(flags & RAM_SAVE_FLAG_EOS) && !mis->ram_migrated) {
         ram_addr_t addr;
         void *host = NULL, *host_bak = NULL;
         uint8_t ch;
@@ -4421,7 +4505,14 @@ static int ram_load_precopy(QEMUFile *f)
 
         switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
         case RAM_SAVE_FLAG_MEM_SIZE:
-            ret = parse_ramblocks(f, addr);
+            if (migrate_fixed_ram()) {
+                ret = parse_ramblocks_fixed_ram(f);
+                if (ret == 1) {
+                    mis->ram_migrated = true;
+                }
+            } else {
+                ret = parse_ramblocks(f, addr);
+            }
             break;
 
         case RAM_SAVE_FLAG_ZERO:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 13/14] tests: Add migrate_incoming_qmp helper
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
                   ` (11 preceding siblings ...)
  2022-10-28 10:39 ` [PATCH v3 12/14] migration: Add support for 'fixed-ram' migration restore Nikolay Borisov
@ 2022-10-28 10:39 ` Nikolay Borisov
  2023-02-10 17:13   ` Daniel P. Berrangé
  2022-10-28 10:39 ` [PATCH v3 14/14] tests/qtest: migration-test: Add tests for file-based migration Nikolay Borisov
  2023-02-09 13:32 ` [PATCH v3 00/14] File-based migration support and fixed-ram features Claudio Fontana
  14 siblings, 1 reply; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

file-based migration requires the target to initiate its migration after
the source has finished writing out the data in the file. Currently
there's no easy way to initiate 'migrate-incoming', allow this by
introducing migrate_incoming_qmp helper, similarly to migrate_qmp.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 tests/qtest/migration-helpers.c | 19 +++++++++++++++++++
 tests/qtest/migration-helpers.h |  4 ++++
 2 files changed, 23 insertions(+)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index f6f3c6680f57..8161495c2764 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -130,6 +130,25 @@ void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...)
     qobject_unref(rsp);
 }
 
+
+void migrate_incoming_qmp(QTestState *who, const char *uri, const char *fmt, ...)
+{
+    va_list ap;
+    QDict *args, *rsp;
+
+    va_start(ap, fmt);
+    args = qdict_from_vjsonf_nofail(fmt, ap);
+    va_end(ap);
+
+    g_assert(!qdict_haskey(args, "uri"));
+    qdict_put_str(args, "uri", uri);
+
+    rsp = qtest_qmp(who, "{ 'execute': 'migrate-incoming', 'arguments': %p}", args);
+
+    g_assert(qdict_haskey(rsp, "return"));
+    qobject_unref(rsp);
+}
+
 /*
  * Note: caller is responsible to free the returned object via
  * qobject_unref() after use
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index db0684de48b2..c0385aa27e11 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -30,6 +30,10 @@ QDict *qmp_command(QTestState *who, const char *command, ...);
 G_GNUC_PRINTF(3, 4)
 void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...);
 
+G_GNUC_PRINTF(3, 4)
+void migrate_incoming_qmp(QTestState *who, const char *uri,
+                          const char *fmt, ...);
+
 QDict *migrate_query(QTestState *who);
 QDict *migrate_query_not_failed(QTestState *who);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 13/14] tests: Add migrate_incoming_qmp helper
  2022-10-28 10:39 ` [PATCH v3 13/14] tests: Add migrate_incoming_qmp helper Nikolay Borisov
@ 2023-02-10 17:13   ` Daniel P. Berrangé
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel P. Berrangé @ 2023-02-10 17:13 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dgilbert, qemu-devel, jfehlig, Claudio.Fontana, dfaggioli

On Fri, Oct 28, 2022 at 01:39:13PM +0300, Nikolay Borisov wrote:
> file-based migration requires the target to initiate its migration after
> the source has finished writing out the data in the file. Currently
> there's no easy way to initiate 'migrate-incoming', allow this by
> introducing migrate_incoming_qmp helper, similarly to migrate_qmp.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  tests/qtest/migration-helpers.c | 19 +++++++++++++++++++
>  tests/qtest/migration-helpers.h |  4 ++++
>  2 files changed, 23 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 14/14] tests/qtest: migration-test: Add tests for file-based migration
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
                   ` (12 preceding siblings ...)
  2022-10-28 10:39 ` [PATCH v3 13/14] tests: Add migrate_incoming_qmp helper Nikolay Borisov
@ 2022-10-28 10:39 ` Nikolay Borisov
  2023-02-10 17:17   ` Daniel P. Berrangé
  2023-02-09 13:32 ` [PATCH v3 00/14] File-based migration support and fixed-ram features Claudio Fontana
  14 siblings, 1 reply; 32+ messages in thread
From: Nikolay Borisov @ 2022-10-28 10:39 UTC (permalink / raw)
  To: dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli, Nikolay Borisov

Add basic tests for file-based migration as well as for the 'fixed-ram'
feature.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 tests/qtest/migration-test.c | 46 ++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index ef4427ff4d41..de877473f193 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -748,6 +748,7 @@ static void test_migrate_end(QTestState *from, QTestState *to, bool test_dest)
     cleanup("migsocket");
     cleanup("src_serial");
     cleanup("dest_serial");
+    cleanup("migfile");
 }
 
 #ifdef CONFIG_GNUTLS
@@ -1359,6 +1360,14 @@ static void test_precopy_common(MigrateCommon *args)
          * hanging forever if migration didn't converge */
         wait_for_migration_complete(from);
 
+        /*
+         * For file based migration the target must begin its migration after
+         * the source has finished
+         */
+        if (strstr(args->connect_uri, "file:")) {
+            migrate_incoming_qmp(to, args->connect_uri, "{}");
+        }
+
         if (!got_stop) {
             qtest_qmp_eventwait(from, "STOP");
         }
@@ -1514,6 +1523,39 @@ static void test_precopy_unix_xbzrle(void)
     test_precopy_common(&args);
 }
 
+static void test_precopy_unix_file(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:%s/migfile", tmpfs);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+    };
+
+    test_precopy_common(&args);
+}
+
+static void *
+test_migrate_fixed_ram_start(QTestState *from,
+                             QTestState *to)
+{
+    migrate_set_capability(from, "fixed-ram", true);
+    migrate_set_capability(to, "fixed-ram", true);
+
+    return NULL;
+}
+
+static void test_precopy_unix_fixed_ram(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:%s/migfile", tmpfs);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+        .start_hook = test_migrate_fixed_ram_start,
+    };
+
+    test_precopy_common(&args);
+}
+
 static void test_precopy_tcp_plain(void)
 {
     MigrateCommon args = {
@@ -2506,6 +2548,10 @@ int main(int argc, char **argv)
                    test_precopy_unix_tls_psk);
 #endif
 
+    qtest_add_func("/migration/precopy/unix/file", test_precopy_unix_file);
+    qtest_add_func("/migration/precopy/unix/fixed-ram",
+                   test_precopy_unix_fixed_ram);
+
     if (has_uffd) {
         /*
          * NOTE: psk test is enough for postcopy, as other types of TLS
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 14/14] tests/qtest: migration-test: Add tests for file-based migration
  2022-10-28 10:39 ` [PATCH v3 14/14] tests/qtest: migration-test: Add tests for file-based migration Nikolay Borisov
@ 2023-02-10 17:17   ` Daniel P. Berrangé
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel P. Berrangé @ 2023-02-10 17:17 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dgilbert, qemu-devel, jfehlig, Claudio.Fontana, dfaggioli

On Fri, Oct 28, 2022 at 01:39:14PM +0300, Nikolay Borisov wrote:
> Add basic tests for file-based migration as well as for the 'fixed-ram'
> feature.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  tests/qtest/migration-test.c | 46 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index ef4427ff4d41..de877473f193 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -748,6 +748,7 @@ static void test_migrate_end(QTestState *from, QTestState *to, bool test_dest)
>      cleanup("migsocket");
>      cleanup("src_serial");
>      cleanup("dest_serial");
> +    cleanup("migfile");
>  }
>  
>  #ifdef CONFIG_GNUTLS
> @@ -1359,6 +1360,14 @@ static void test_precopy_common(MigrateCommon *args)
>           * hanging forever if migration didn't converge */
>          wait_for_migration_complete(from);
>  
> +        /*
> +         * For file based migration the target must begin its migration after
> +         * the source has finished
> +         */
> +        if (strstr(args->connect_uri, "file:")) {
> +            migrate_incoming_qmp(to, args->connect_uri, "{}");
> +        }
> +
>          if (!got_stop) {
>              qtest_qmp_eventwait(from, "STOP");
>          }
> @@ -1514,6 +1523,39 @@ static void test_precopy_unix_xbzrle(void)
>      test_precopy_common(&args);
>  }
>  
> +static void test_precopy_unix_file(void)
> +{
> +    g_autofree char *uri = g_strdup_printf("file:%s/migfile", tmpfs);
> +    MigrateCommon args = {
> +        .connect_uri = uri,
> +        .listen_uri = "defer",
> +    };
> +
> +    test_precopy_common(&args);
> +}
> +
> +static void *
> +test_migrate_fixed_ram_start(QTestState *from,
> +                             QTestState *to)
> +{
> +    migrate_set_capability(from, "fixed-ram", true);
> +    migrate_set_capability(to, "fixed-ram", true);
> +
> +    return NULL;
> +}
> +
> +static void test_precopy_unix_fixed_ram(void)
> +{
> +    g_autofree char *uri = g_strdup_printf("file:%s/migfile", tmpfs);
> +    MigrateCommon args = {
> +        .connect_uri = uri,
> +        .listen_uri = "defer",
> +        .start_hook = test_migrate_fixed_ram_start,
> +    };
> +
> +    test_precopy_common(&args);
> +}
> +
>  static void test_precopy_tcp_plain(void)
>  {
>      MigrateCommon args = {
> @@ -2506,6 +2548,10 @@ int main(int argc, char **argv)
>                     test_precopy_unix_tls_psk);
>  #endif
>  
> +    qtest_add_func("/migration/precopy/unix/file", test_precopy_unix_file);
> +    qtest_add_func("/migration/precopy/unix/fixed-ram",
> +                   test_precopy_unix_fixed_ram);

Minor point '/unix' would indicate this is testing UNIX socket backend
for migration. The paths for the tests would be better as

    /migration/precopy/file/stream-ram
    /migration/precopy/file/fixed-ram

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 00/14] File-based migration support and fixed-ram features
  2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
                   ` (13 preceding siblings ...)
  2022-10-28 10:39 ` [PATCH v3 14/14] tests/qtest: migration-test: Add tests for file-based migration Nikolay Borisov
@ 2023-02-09 13:32 ` Claudio Fontana
  2023-02-10 15:35   ` Daniel P. Berrangé
  14 siblings, 1 reply; 32+ messages in thread
From: Claudio Fontana @ 2023-02-09 13:32 UTC (permalink / raw)
  To: Nikolay Borisov, dgilbert, berrange
  Cc: qemu-devel, jfehlig, Claudio.Fontana, dfaggioli

Hello Daniel and all,

resurrecting this series from end of last year,

do we think that this is the right approach and first step to be able to provide good performance for virsh save and virsh restore?

For reference, our previous attempt to get the performance for our use case (libvirt + qemu virsh save and restore under 5 seconds for a 30GB VM on NVMe disk) is here:

https://listman.redhat.com/archives/libvir-list/2022-June/232252.html

However since the option of a libvirt-only solution is not acceptable for upstream libvirt, Nikolay's attempt to create a file:/// migration target in QEMU
seemed to be the next preparatory step.

Do we still agree on this way forward, any comments? Thanks,

Claudio

On 10/28/22 12:39, Nikolay Borisov wrote:
> Here's the 3rd version of file-based migration support [0]. For background
> check the cover letter of the initial. The main changes are :
> 
> - Updated commit message as per Daniel Berrange's suggestino for Patches 1-2
> 
> - Fixed tab in various pages
> 
> - Added comments better explaining how json_writer_start_object in
> qemu_savevm_state_header is matched and also squashed the analyze-migration.py
> parts into patch 3
> 
> - Reworked the way pwritv/preadv are introduced. Now there are generic
> callbacks in QIOChannel that are implemented for the QIOChannelFile.
> 
> - Separated the introduction of QEMUFile-related helpers from the patch
> introducing the io interfaces.
> 
> - Added qtests for the file-based migration as well as for the fixed-ram
> feature.
> 
> [0] https://lore.kernel.org/qemu-devel/20221004123733.2745519-1-nborisov@suse.com/
> 
> Nikolay Borisov (14):
>   migration: support file: uri for source migration
>   migration: Add support for 'file:' uri for incoming migration
>   migration: Initial support of fixed-ram feature for
>     analyze-migration.py
>   io: Add generic pwritev/preadv interface
>   io: implement io_pwritev for QIOChannelFile
>   io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file
>   migration/qemu-file: add utility methods for working with seekable
>     channels
>   io: Add preadv support to QIOChannelFile
>   migration: add qemu_get_buffer_at
>   migration/ram: Introduce 'fixed-ram' migration stream capability
>   migration: Refactor precopy ram loading code
>   migration: Add support for 'fixed-ram' migration restore
>   tests: Add migrate_incoming_qmp helper
>   tests/qtest: migration-test: Add tests for file-based migration
> 
>  include/exec/ramblock.h             |   7 +
>  include/io/channel.h                |  50 +++++
>  include/migration/qemu-file-types.h |   2 +
>  io/channel-file.c                   |  61 ++++++
>  io/channel.c                        |  26 +++
>  migration/file.c                    |  38 ++++
>  migration/file.h                    |  10 +
>  migration/meson.build               |   1 +
>  migration/migration.c               |  61 +++++-
>  migration/migration.h               |   6 +
>  migration/qemu-file.c               |  82 +++++++
>  migration/qemu-file.h               |   4 +
>  migration/ram.c                     | 328 +++++++++++++++++++++-------
>  migration/savevm.c                  |  48 ++--
>  qapi/migration.json                 |   2 +-
>  scripts/analyze-migration.py        |  49 ++++-
>  tests/qtest/migration-helpers.c     |  19 ++
>  tests/qtest/migration-helpers.h     |   4 +
>  tests/qtest/migration-test.c        |  46 ++++
>  19 files changed, 743 insertions(+), 101 deletions(-)
>  create mode 100644 migration/file.c
>  create mode 100644 migration/file.h
> 
> --
> 2.34.1
> 
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 00/14] File-based migration support and fixed-ram features
  2023-02-09 13:32 ` [PATCH v3 00/14] File-based migration support and fixed-ram features Claudio Fontana
@ 2023-02-10 15:35   ` Daniel P. Berrangé
  2023-03-20 11:14     ` Claudio Fontana
  0 siblings, 1 reply; 32+ messages in thread
From: Daniel P. Berrangé @ 2023-02-10 15:35 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Nikolay Borisov, dgilbert, qemu-devel, jfehlig, Claudio.Fontana,
	dfaggioli

On Thu, Feb 09, 2023 at 02:32:01PM +0100, Claudio Fontana wrote:
> Hello Daniel and all,
> 
> resurrecting this series from end of last year,
> 
> do we think that this is the right approach and first step to
> be able to provide good performance for virsh save and virsh
> restore?

I've looked through the series in some more detail now and will
send review comments separately. Overall, I'm pretty pleased with
the series and think it is on the right path. The new format it
provides should be amenable to parallel I/O with multifd and
be able to support O_DIRECT to avoid burning through the host I/O
cache.

There is obviously a bit of extra complexity from having a new
way to map RAM to the output, but it looks fairly well contained
in just a couple of places of the code. The performance wins
should be able to justify the extra maint burden IMHO.

> Do we still agree on this way forward, any comments? Thanks,

I'm not a migration maintainer, but overall I think it is
good.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 00/14] File-based migration support and fixed-ram features
  2023-02-10 15:35   ` Daniel P. Berrangé
@ 2023-03-20 11:14     ` Claudio Fontana
  2023-03-20 11:25       ` Daniel P. Berrangé
  0 siblings, 1 reply; 32+ messages in thread
From: Claudio Fontana @ 2023-03-20 11:14 UTC (permalink / raw)
  To: Daniel P. Berrangé, fabiano.rosas
  Cc: Nikolay Borisov, dgilbert, qemu-devel, jfehlig, Claudio.Fontana,
	dfaggioli

(adding Fabiano to the thread)

On 2/10/23 16:35, Daniel P. Berrangé wrote:
> On Thu, Feb 09, 2023 at 02:32:01PM +0100, Claudio Fontana wrote:
>> Hello Daniel and all,
>>
>> resurrecting this series from end of last year,
>>
>> do we think that this is the right approach and first step to
>> be able to provide good performance for virsh save and virsh
>> restore?
> 
> I've looked through the series in some more detail now and will
> send review comments separately. Overall, I'm pretty pleased with
> the series and think it is on the right path. The new format it
> provides should be amenable to parallel I/O with multifd and
> be able to support O_DIRECT to avoid burning through the host I/O
> cache.

Just wanted to add a thought we had with Fabiano a few days ago:

experimentally, it is clear that fixed-ram is an optimization, but the actual scalability seems to come from the successive parallel I/O with multifd.

Since the goal is being able to transfer _to disk_ (fdatasync) the whole 30G memory in 5 seconds, the need to split the cpu-intensive work into smaller tasks remains,
and the main scalability solution seems to come from the multifd part of the work (or another way to split the problem), combined with the O_DIRECT friendliness to avoid the trap of the cache trashing.

Not adding much, just highlighting that fixed-ram _alone_ does not seem to suffice, we probably need all pieces of the puzzle in place.

Thanks!

Claudio

> 
> There is obviously a bit of extra complexity from having a new
> way to map RAM to the output, but it looks fairly well contained
> in just a couple of places of the code. The performance wins
> should be able to justify the extra maint burden IMHO.
> 
>> Do we still agree on this way forward, any comments? Thanks,
> 
> I'm not a migration maintainer, but overall I think it is
> good.
> 
> With regards,
> Daniel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 00/14] File-based migration support and fixed-ram features
  2023-03-20 11:14     ` Claudio Fontana
@ 2023-03-20 11:25       ` Daniel P. Berrangé
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel P. Berrangé @ 2023-03-20 11:25 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: fabiano.rosas, Nikolay Borisov, dgilbert, qemu-devel, jfehlig,
	Claudio.Fontana, dfaggioli

On Mon, Mar 20, 2023 at 12:14:53PM +0100, Claudio Fontana wrote:
> (adding Fabiano to the thread)
> 
> On 2/10/23 16:35, Daniel P. Berrangé wrote:
> > On Thu, Feb 09, 2023 at 02:32:01PM +0100, Claudio Fontana wrote:
> >> Hello Daniel and all,
> >>
> >> resurrecting this series from end of last year,
> >>
> >> do we think that this is the right approach and first step to
> >> be able to provide good performance for virsh save and virsh
> >> restore?
> > 
> > I've looked through the series in some more detail now and will
> > send review comments separately. Overall, I'm pretty pleased with
> > the series and think it is on the right path. The new format it
> > provides should be amenable to parallel I/O with multifd and
> > be able to support O_DIRECT to avoid burning through the host I/O
> > cache.
> 
> Just wanted to add a thought we had with Fabiano a few days ago:
> 
> experimentally, it is clear that fixed-ram is an optimization, but the actual scalability seems to come from the successive parallel I/O with multifd.
> 
> Since the goal is being able to transfer _to disk_ (fdatasync) the whole 30G memory in 5 seconds, the need to split the cpu-intensive work into smaller tasks remains,
> and the main scalability solution seems to come from the multifd part of the work (or another way to split the problem), combined with the O_DIRECT friendliness to avoid the trap of the cache trashing.
> 
> Not adding much, just highlighting that fixed-ram _alone_ does not seem to suffice, we probably need all pieces of the puzzle in place.

Agreed, the only thing that fixed-ram alone really does is to
ensure you have a finite file size if you're saving the state
while the VM is running, because successive dirtying of the
same block will write to the same location in file. Nice to
avoid wasting storage and gives a small speed up on restore.

It isn't going to give the massive speedup you're looking for
though. multifd + O_DIRECT is a critical followup piece to
achieve the major performance improvement. The fixed-ram
feature makes it practical to use multifd, without needing
to split content streams across many files, or have to invent
a way to splice multiple streams into the same file.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-03-20 11:26 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-10-28 10:39 [PATCH v3 00/14] File-based migration support and fixed-ram features Nikolay Borisov
2022-10-28 10:39 ` [PATCH v3 01/14] migration: support file: uri for source migration Nikolay Borisov
2022-10-28 10:39 ` [PATCH v3 02/14] migration: Add support for 'file:' uri for incoming migration Nikolay Borisov
2023-02-10 15:58   ` Daniel P. Berrangé
2022-10-28 10:39 ` [PATCH v3 03/14] migration: Initial support of fixed-ram feature for analyze-migration.py Nikolay Borisov
2023-02-10 16:13   ` Daniel P. Berrangé
2022-10-28 10:39 ` [PATCH v3 04/14] io: Add generic pwritev/preadv interface Nikolay Borisov
2023-02-10 16:26   ` Daniel P. Berrangé
2023-02-10 16:58     ` Daniel P. Berrangé
2022-10-28 10:39 ` [PATCH v3 05/14] io: implement io_pwritev for QIOChannelFile Nikolay Borisov
2023-02-10 16:34   ` Daniel P. Berrangé
2022-10-28 10:39 ` [PATCH v3 06/14] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file Nikolay Borisov
2023-02-10 16:38   ` Daniel P. Berrangé
2022-10-28 10:39 ` [PATCH v3 07/14] migration/qemu-file: add utility methods for working with seekable channels Nikolay Borisov
2022-10-28 10:39 ` [PATCH v3 08/14] io: Add preadv support to QIOChannelFile Nikolay Borisov
2023-02-10 16:59   ` Daniel P. Berrangé
2022-10-28 10:39 ` [PATCH v3 09/14] migration: add qemu_get_buffer_at Nikolay Borisov
2023-02-10 16:59   ` Daniel P. Berrangé
2022-10-28 10:39 ` [PATCH v3 10/14] migration/ram: Introduce 'fixed-ram' migration stream capability Nikolay Borisov
2023-02-10 17:11   ` Daniel P. Berrangé
2023-03-20 11:05     ` Claudio Fontana
2023-03-20 11:08       ` Claudio Fontana
2022-10-28 10:39 ` [PATCH v3 11/14] migration: Refactor precopy ram loading code Nikolay Borisov
2022-10-28 10:39 ` [PATCH v3 12/14] migration: Add support for 'fixed-ram' migration restore Nikolay Borisov
2022-10-28 10:39 ` [PATCH v3 13/14] tests: Add migrate_incoming_qmp helper Nikolay Borisov
2023-02-10 17:13   ` Daniel P. Berrangé
2022-10-28 10:39 ` [PATCH v3 14/14] tests/qtest: migration-test: Add tests for file-based migration Nikolay Borisov
2023-02-10 17:17   ` Daniel P. Berrangé
2023-02-09 13:32 ` [PATCH v3 00/14] File-based migration support and fixed-ram features Claudio Fontana
2023-02-10 15:35   ` Daniel P. Berrangé
2023-03-20 11:14     ` Claudio Fontana
2023-03-20 11:25       ` Daniel P. Berrangé

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).