From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
Juan Quintela <quintela@redhat.com>,
"Michael S . Tsirkin" <mst@redhat.com>,
Michal Privoznik <mprivozn@redhat.com>
Subject: Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
Date: Mon, 9 Jan 2023 15:34:48 +0100
Message-ID: <482fadb5-7420-e07b-982d-5b0f3e8c42f8@redhat.com>
In-Reply-To: <Y7cFplyGc4hIrYZW@x1n>
On 05.01.23 18:15, Peter Xu wrote:
> On Thu, Jan 05, 2023 at 09:35:54AM +0100, David Hildenbrand wrote:
>> On 04.01.23 18:23, Peter Xu wrote:
>>> On Thu, Dec 22, 2022 at 12:02:10PM +0100, David Hildenbrand wrote:
>>>> Migrating device state before we start iterating is currently impossible.
>>>> Introduce and use qemu_savevm_state_start_precopy(), and use
>>>> a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether
>>>> state will be saved in qemu_savevm_state_start_precopy() or in
>>>> qemu_savevm_state_complete_precopy_*().
>>>
>>> Can something like this be done in qemu_savevm_state_setup()?
>>
>> Hi Peter,
>
> Hi, David,
>
>>
>> Do you mean
>>
>> (a) Moving qemu_savevm_state_start_precopy() effectively into
>> qemu_savevm_state_setup()
>>
>> (b) Using se->ops->save_setup()
>
> I meant (b).
>
>>
>> I first tried going via (b), but decided to go the current way of using a
>> proper vmstate with properties (instead of e.g., filling the stream
>> manually), which also made vmdesc handling possible (and significantly
>> cleaner).
>>
>> Regarding (a), I decided to not move logic of
>> qemu_savevm_state_start_precopy() into qemu_savevm_state_setup(), because it
>> looked cleaner to save device state with the BQL held and for background
>> snapshots, the VM has been stopped. To decouple device state saving from the
>> setup path, just like we do it right now for all vmstates.
>
> Is BQL required or optional? IIUC it's at least still not taken in the
> migration thread path, only in savevm path.
>
>>
>> Having that said, for virtio-mem, it would still work because that state is
>> immutable once migration starts, but it felt cleaner to separate the setup()
>> phase from actual device state saving.
>
> I get the point. My major concerns are:
>
> (1) The new migration priority is changing the semantic of original,
> making it over-complicated
>
> (2) The new precopy-start routine added one more step to the migration
> framework, while it's somehow overlapping (if not to say, mostly the
> same as..) save_setup().
>
> For (1): the old priority was only deciding the order of save entries in
> the global list, nothing more than that. Even if we want to have a
> precopy-start phase, I'd suggest we use something else and keep the
> migration priority simple. Otherwise we really need serious documentation
> for MigrationPriority and if so I'd rather don't bother and not reuse the
> priority field.
>
> For (2), if you see there're a bunch of save_setup() that already does
> things like transferring static data besides the device states. Besides
> the notorious ram_save_setup() there's also dirty_bitmap_save_setup() which
> also sends a bitmap during save_setup() and some others. It looks clean to
> me to do it in the same way as we used to.
>
> Reusing vmstate_save() and vmsd structures are useful too which I totally
> agree. So.. can we just call vmstate_save_state() in the save_setup() of
> the other new vmsd of virtio-mem?
I went halfway in that direction, by moving the logic into
qemu_savevm_state_setup() and avoiding a new migration priority. It seems
to work; see the patch below.

I think we could go one step further and do it from a save_setup() callback.
However, I'm not convinced that this would end up particularly cleaner
(mostly because of the vmdesc handling).

If there are strong feelings, I can look into that. Thanks.
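
As a rough illustration of the approach described above, the following is a minimal, self-contained model and not QEMU code. It assumes a simplified entry type (ModelEntry) and a stream that only records names (ModelStream); both are hypothetical stand-ins for SaveStateEntry and QEMUFile, and the device names are made up. It only shows the ordering: entries flagged immutable are saved in the setup phase, everything else in the completion phase.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a SaveStateEntry with a vmsd. */
typedef struct {
    const char *name;
    bool immutable;   /* mirrors the "immutable" flag added by the patch */
} ModelEntry;

/* Hypothetical stream: records the order in which entries are saved. */
typedef struct {
    const char *names[8];
    size_t count;
} ModelStream;

static void model_save(ModelStream *s, const ModelEntry *e)
{
    assert(s->count < 8);
    s->names[s->count++] = e->name;
}

/* Setup phase: only immutable entries are saved, before any RAM. */
static void model_setup(ModelStream *s, const ModelEntry *entries, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i].immutable) {
            model_save(s, &entries[i]);
        }
    }
}

/* Completion phase: skip entries that were already saved during setup. */
static void model_complete(ModelStream *s, const ModelEntry *entries, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!entries[i].immutable) {
            model_save(s, &entries[i]);
        }
    }
}

/* Runs both phases over a small, made-up device list. */
static ModelStream model_run_example(void)
{
    static const ModelEntry entries[] = {
        { "cpu", false },
        { "virtio-mem-early", true },   /* made-up name for illustration */
        { "virtio-net", false },
    };
    ModelStream s = { .count = 0 };

    model_setup(&s, entries, 3);
    /* RAM iteration would happen here in a real migration. */
    model_complete(&s, entries, 3);
    return s;
}
```

In this model, model_run_example() records "virtio-mem-early" first, followed by "cpu" and "virtio-net", although the immutable entry sits in the middle of the list. No priority field is involved; the flag alone decides the phase.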
From e501f80dbbca1260445a6dac03053f426fbb572d Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Tue, 20 Dec 2022 18:14:33 +0100
Subject: [PATCH] migration: Allow immutable device state to be migrated early
(i.e., before RAM)
For virtio-mem, we want to have the plugged/unplugged state of memory
blocks available before migrating any actual RAM content. This
information is immutable on the migration source while migration is active.
For example, we want to use this information for proper preallocation
support with migration: currently, we don't preallocate memory on the
migration target, and especially with hugetlb, we can easily run out of
hugetlb pages during RAM migration and will crash (SIGBUS) instead of
catching this gracefully via preallocation.
Migrating device state before we start iterating is currently impossible.
Let's allow for migrating such state during the setup state, indicating
applicable vmstate descriptors using an "immutable" flag.
We have to take care of properly including the early device state in the
vmdesc. Relying on migrate_get_current() to temporarily store the vmdesc is
a bit sub-optimal, but we use that explicitly or implicitly all over the
place already, so this barely matters in practice.
Note that only a few selected devices (i.e., ones seriously messing with
RAM setup) are supposed to make use of that.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/migration/vmstate.h | 5 +++
migration/migration.c | 4 ++
migration/migration.h | 4 ++
migration/savevm.c | 85 +++++++++++++++++++++++++++----------
4 files changed, 75 insertions(+), 23 deletions(-)
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index ad24aa1934..610e4c1e38 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -179,6 +179,11 @@ struct VMStateField {
struct VMStateDescription {
const char *name;
int unmigratable;
+ /*
+ * The state is immutable while migration is active and the state can
+ * be migrated early, during the setup phase.
+ */
+ int immutable;
int version_id;
int minimum_version_id;
MigrationPriority priority;
diff --git a/migration/migration.c b/migration/migration.c
index 52b5d39244..1d33a7efa0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2170,6 +2170,9 @@ void migrate_init(MigrationState *s)
s->vm_was_running = false;
s->iteration_initial_bytes = 0;
s->threshold_size = 0;
+
+ json_writer_free(s->vmdesc);
+ s->vmdesc = NULL;
}
int migrate_add_blocker_internal(Error *reason, Error **errp)
@@ -4445,6 +4448,7 @@ static void migration_instance_finalize(Object *obj)
qemu_sem_destroy(&ms->rp_state.rp_sem);
qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
error_free(ms->error);
+ json_writer_free(ms->vmdesc);
}
static void migration_instance_init(Object *obj)
diff --git a/migration/migration.h b/migration/migration.h
index ae4ffd3454..66511ce532 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -17,6 +17,7 @@
#include "exec/cpu-common.h"
#include "hw/qdev-core.h"
#include "qapi/qapi-types-migration.h"
+#include "qapi/qmp/json-writer.h"
#include "qemu/thread.h"
#include "qemu/coroutine_int.h"
#include "io/channel.h"
@@ -366,6 +367,9 @@ struct MigrationState {
* This save hostname when out-going migration starts
*/
char *hostname;
+
+ /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. */
+ JSONWriter *vmdesc;
};
void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/savevm.c b/migration/savevm.c
index a0cdb714f7..e77f643f52 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -42,7 +42,6 @@
#include "postcopy-ram.h"
#include "qapi/error.h"
#include "qapi/qapi-commands-migration.h"
-#include "qapi/qmp/json-writer.h"
#include "qapi/clone-visitor.h"
#include "qapi/qapi-builtin-visit.h"
#include "qapi/qmp/qerror.h"
@@ -1161,14 +1160,63 @@ bool qemu_savevm_state_guest_unplug_pending(void)
return false;
}
+static int qemu_savevm_state_precopy_one_non_iterable(SaveStateEntry *se,
+ QEMUFile *f,
+ JSONWriter *vmdesc)
+{
+ int ret;
+
+ if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
+ trace_savevm_section_skip(se->idstr, se->section_id);
+ return 0;
+ }
+
+ trace_savevm_section_start(se->idstr, se->section_id);
+
+ json_writer_start_object(vmdesc, NULL);
+ json_writer_str(vmdesc, "name", se->idstr);
+ json_writer_int64(vmdesc, "instance_id", se->instance_id);
+
+ save_section_header(f, se, QEMU_VM_SECTION_FULL);
+ ret = vmstate_save(f, se, vmdesc);
+ if (ret) {
+ qemu_file_set_error(f, ret);
+ return ret;
+ }
+ trace_savevm_section_end(se->idstr, se->section_id, 0);
+ save_section_footer(f, se);
+
+ json_writer_end_object(vmdesc);
+ return 0;
+}
+
void qemu_savevm_state_setup(QEMUFile *f)
{
- SaveStateEntry *se;
+ MigrationState *ms = migrate_get_current();
Error *local_err = NULL;
+ SaveStateEntry *se;
+ JSONWriter *vmdesc;
int ret;
+ assert(!ms->vmdesc);
+ ms->vmdesc = vmdesc = json_writer_new(false);
+ json_writer_start_object(vmdesc, NULL);
+ json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
+ json_writer_start_array(vmdesc, "devices");
+
trace_savevm_state_setup();
QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+ if (se->vmsd) {
+ if (!se->vmsd->immutable) {
+ continue;
+ }
+ ret = qemu_savevm_state_precopy_one_non_iterable(se, f, vmdesc);
+ if (ret) {
+ break;
+ }
+ continue;
+ }
+
if (!se->ops || !se->ops->save_setup) {
continue;
}
@@ -1364,41 +1412,28 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
bool in_postcopy,
bool inactivate_disks)
{
- g_autoptr(JSONWriter) vmdesc = NULL;
+ MigrationState *ms = migrate_get_current();
+ JSONWriter *vmdesc = ms->vmdesc;
int vmdesc_len;
SaveStateEntry *se;
int ret;
- vmdesc = json_writer_new(false);
- json_writer_start_object(vmdesc, NULL);
- json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
- json_writer_start_array(vmdesc, "devices");
- QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+ /* qemu_savevm_state_setup() is expected to be called first. */
+ assert(vmdesc);
+ QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
continue;
}
- if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
- trace_savevm_section_skip(se->idstr, se->section_id);
+ if (se->vmsd && se->vmsd->immutable) {
+ /* Already saved during qemu_savevm_state_setup(). */
continue;
}
- trace_savevm_section_start(se->idstr, se->section_id);
-
- json_writer_start_object(vmdesc, NULL);
- json_writer_str(vmdesc, "name", se->idstr);
- json_writer_int64(vmdesc, "instance_id", se->instance_id);
-
- save_section_header(f, se, QEMU_VM_SECTION_FULL);
- ret = vmstate_save(f, se, vmdesc);
+ ret = qemu_savevm_state_precopy_one_non_iterable(se, f, vmdesc);
if (ret) {
- qemu_file_set_error(f, ret);
return ret;
}
- trace_savevm_section_end(se->idstr, se->section_id, 0);
- save_section_footer(f, se);
-
- json_writer_end_object(vmdesc);
}
if (inactivate_disks) {
@@ -1427,6 +1462,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
}
+ /* Free it now to detect any inconsistencies. */
+ json_writer_free(vmdesc);
+ ms->vmdesc = NULL;
+
return 0;
}
--
2.39.0
--
Thanks,
David / dhildenb
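
The vmdesc handling in the patch above hands a single description object from the setup phase to the completion phase via the migration state. The sketch below models only that lifetime, under loud assumptions: ModelDesc is a hypothetical fixed-size string buffer standing in for JSONWriter, ModelMigrationState stands in for MigrationState, and the function names are invented for illustration. It is not the QEMU implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for JSONWriter: a comma-separated name list. */
typedef struct {
    char buf[256];
    size_t len;
} ModelDesc;

/* Hypothetical stand-in for MigrationState, holding the shared vmdesc. */
typedef struct {
    ModelDesc *vmdesc;
} ModelMigrationState;

static void model_desc_append(ModelDesc *d, const char *name)
{
    int n = snprintf(d->buf + d->len, sizeof(d->buf) - d->len, "%s%s",
                     d->len ? "," : "", name);

    /* Fail loudly instead of silently truncating the description. */
    assert(n >= 0 && (size_t)n < sizeof(d->buf) - d->len);
    d->len += (size_t)n;
}

/* Setup phase: create the description and record the early device. */
static void model_setup_phase(ModelMigrationState *ms, const char *early)
{
    assert(!ms->vmdesc);            /* no stale description may be left */
    ms->vmdesc = calloc(1, sizeof(*ms->vmdesc));
    assert(ms->vmdesc);
    model_desc_append(ms->vmdesc, early);
}

/* Completion phase: add the late device, emit, then free the description. */
static void model_complete_phase(ModelMigrationState *ms, const char *late,
                                 char *out, size_t out_size)
{
    ModelDesc *d = ms->vmdesc;

    assert(d);                      /* setup is expected to have run first */
    model_desc_append(d, late);
    snprintf(out, out_size, "%s", d->buf);

    /* Free it now so that a second completion without setup trips the assert. */
    free(d);
    ms->vmdesc = NULL;
}
```

Calling model_setup_phase() with "early" and then model_complete_phase() with "late" leaves "early,late" in the output buffer and resets the stored pointer to NULL, which matches the assert-based ordering checks used in the patch.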
Thread overview: 27+ messages
2022-12-22 11:02 [PATCH v3 0/6] virtio-mem: Handle preallocation with migration David Hildenbrand
2022-12-22 11:02 ` [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM) David Hildenbrand
2022-12-23 9:34 ` David Hildenbrand
2023-01-05 1:27 ` Michael S. Tsirkin
2023-01-05 8:20 ` David Hildenbrand
2023-01-04 17:23 ` Peter Xu
2023-01-05 8:35 ` David Hildenbrand
2023-01-05 17:15 ` Peter Xu
2023-01-09 14:34 ` David Hildenbrand [this message]
2023-01-09 19:54 ` Peter Xu
2023-01-10 10:18 ` David Hildenbrand
2023-01-10 11:52 ` David Hildenbrand
2023-01-10 20:03 ` Peter Xu
2023-01-11 13:48 ` David Hildenbrand
2023-01-11 16:35 ` Peter Xu
2023-01-11 16:58 ` David Hildenbrand
2023-01-11 17:28 ` Peter Xu
2023-01-11 17:44 ` David Hildenbrand
2022-12-22 11:02 ` [PATCH v3 2/6] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST() David Hildenbrand
2023-01-05 17:33 ` Dr. David Alan Gilbert
2022-12-22 11:02 ` [PATCH v3 3/6] migration: Factor out checks for advised and listening incomming postcopy David Hildenbrand
2023-01-05 17:18 ` Peter Xu
2023-01-09 14:39 ` David Hildenbrand
2023-01-09 14:42 ` David Hildenbrand
2022-12-22 11:02 ` [PATCH v3 4/6] virtio-mem: Fail if a memory backend with "prealloc=on" is specified David Hildenbrand
2022-12-22 11:02 ` [PATCH v3 5/6] virtio-mem: Migrate bitmap, size and sanity checks early David Hildenbrand
2022-12-22 11:02 ` [PATCH v3 6/6] virtio-mem: Proper support for preallocation with migration David Hildenbrand