* [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle
@ 2024-10-23 18:02 Peter Xu
2024-10-23 18:02 ` [PATCH v2 1/4] migration: Unexport dirty_bitmap_mig_init() in misc.h Peter Xu
` (5 more replies)
0 siblings, 6 replies; 11+ messages in thread
From: Peter Xu @ 2024-10-23 18:02 UTC (permalink / raw)
To: qemu-devel
Cc: peterx, Alex Williamson, Fabiano Rosas, Avihai Horon,
Cédric Le Goater
This is a follow up of below patch from Avihai as a replacement:
https://lore.kernel.org/qemu-devel/20241020130108.27148-3-avihaih@nvidia.com/
This is v2 of the series, and it became a more generic rework on how we do
migration object refcounts, so I skipped a changelog because most of this
is new things.
To put it simple, now I introduced another pointer to migration object, and
here's a simple explanation for both after all change applied (copy-paste
from one of the patch):
/*
* We have two pointers for the global migration objects. Both of them are
* initialized early during QEMU starts, but they have different lifecycles.
*
* - current_migration
*
* This variable reflects the whole lifecycle of the migration object
* (which each QEMU can only have one). It is valid until the migration
* object is destroyed.
*
* This is the object that internal migration so far use. For example,
* internal helper migrate_get_current() references it.
*
* When all migration code can always pass over a MigrationState* around,
* this variable can logically be dropped. But we're not yet there.
*
* - global_migration
*
* This is valid only until the migration object is still valid to the
* outside-migration world (until migration_shutdown()).
*
* This should normally be always set, cleared or accessed by the main
* thread only, rather than the migration thread.
*
* All the exported functions (in include/migration) should reference the
* exported migration object only to avoid race conditions, as
* current_migration can be freed concurrently by migration thread when
* the migration thread holds the last refcount.
*/
It allows all misc.h exported helpers to be used for the whole VM
lifecycle, so as to never crash QEMU with freed migration objects.
Thanks,
Peter Xu (4):
migration: Unexport dirty_bitmap_mig_init() in misc.h
migration: Reset current_migration properly
migration: Add global_migration
migration: Make all helpers in misc.h safe to use without migration
include/migration/misc.h | 29 ++++++++----
migration/migration.h | 4 ++
migration/migration.c | 99 +++++++++++++++++++++++++++++++++++-----
3 files changed, 113 insertions(+), 19 deletions(-)
--
2.45.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 1/4] migration: Unexport dirty_bitmap_mig_init() in misc.h
2024-10-23 18:02 [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle Peter Xu
@ 2024-10-23 18:02 ` Peter Xu
2024-10-23 18:02 ` [PATCH v2 2/4] migration: Reset current_migration properly Peter Xu
` (4 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Peter Xu @ 2024-10-23 18:02 UTC (permalink / raw)
To: qemu-devel
Cc: peterx, Alex Williamson, Fabiano Rosas, Avihai Horon,
Cédric Le Goater
It's only used within migration/, so it shouldn't be exported.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/migration/misc.h | 3 ---
migration/migration.h | 4 ++++
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/migration/misc.h b/include/migration/misc.h
index bfadc5613b..df57be6b5e 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -108,7 +108,4 @@ bool migration_incoming_postcopy_advised(void);
/* True if background snapshot is active */
bool migration_in_bg_snapshot(void);
-/* migration/block-dirty-bitmap.c */
-void dirty_bitmap_mig_init(void);
-
#endif
diff --git a/migration/migration.h b/migration/migration.h
index 7dc59c5e8d..0956e9274b 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -552,4 +552,8 @@ int migration_rp_wait(MigrationState *s);
void migration_rp_kick(MigrationState *s);
void migration_bitmap_sync_precopy(bool last_stage);
+
+/* migration/block-dirty-bitmap.c */
+void dirty_bitmap_mig_init(void);
+
#endif
--
2.45.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 2/4] migration: Reset current_migration properly
2024-10-23 18:02 [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle Peter Xu
2024-10-23 18:02 ` [PATCH v2 1/4] migration: Unexport dirty_bitmap_mig_init() in misc.h Peter Xu
@ 2024-10-23 18:02 ` Peter Xu
2024-10-23 18:02 ` [PATCH v2 3/4] migration: Add global_migration Peter Xu
` (3 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Peter Xu @ 2024-10-23 18:02 UTC (permalink / raw)
To: qemu-devel
Cc: peterx, Alex Williamson, Fabiano Rosas, Avihai Horon,
Cédric Le Goater
current_migration is never reset, even if the migration object is freed
already. It means anyone references that can trigger UAF and it'll be hard
to debug.
Properly clear the pointer now, so far by doing it in the finalize() (as we
know there's only one instance of it).
Add a TODO entry for it showing that we can do better in the future.
To make it clear, also initialize the variable in the instance_init() so
it's very well paired at least.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index bcb735869b..a82297db0f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -232,9 +232,9 @@ static int migration_stop_vm(MigrationState *s, RunState state)
void migration_object_init(void)
{
- /* This can only be called once. */
- assert(!current_migration);
- current_migration = MIGRATION_OBJ(object_new(TYPE_MIGRATION));
+ MIGRATION_OBJ(object_new(TYPE_MIGRATION));
+ /* This should be set when initialize the object */
+ assert(current_migration);
/*
* Init the migrate incoming object as well no matter whether
@@ -3877,12 +3877,31 @@ static void migration_instance_finalize(Object *obj)
qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
error_free(ms->error);
+
+ /*
+ * We know we only have one intance of migration, and when reaching
+ * here it means migration object is gone. Clear the global reference
+ * to reflect that.
+ */
+ current_migration = NULL;
}
static void migration_instance_init(Object *obj)
{
MigrationState *ms = MIGRATION_OBJ(obj);
+ /*
+ * There can only be one migration object globally. Keep a record of
+ * the pointer in current_migration, which will be reset after the
+ * object finalize().
+ *
+ * TODO: after migration/ code can always take a MigrationObject*
+ * pointer all over the place, logically we can drop current_migration
+ * variable.
+ */
+ assert(!current_migration);
+ current_migration = ms;
+
ms->state = MIGRATION_STATUS_NONE;
ms->mbps = -1;
ms->pages_per_second = -1;
--
2.45.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 3/4] migration: Add global_migration
2024-10-23 18:02 [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle Peter Xu
2024-10-23 18:02 ` [PATCH v2 1/4] migration: Unexport dirty_bitmap_mig_init() in misc.h Peter Xu
2024-10-23 18:02 ` [PATCH v2 2/4] migration: Reset current_migration properly Peter Xu
@ 2024-10-23 18:02 ` Peter Xu
2024-10-23 18:02 ` [PATCH v2 4/4] migration: Make all helpers in misc.h safe to use without migration Peter Xu
` (2 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Peter Xu @ 2024-10-23 18:02 UTC (permalink / raw)
To: qemu-devel
Cc: peterx, Alex Williamson, Fabiano Rosas, Avihai Horon,
Cédric Le Goater
Add a variable that is only used in exported / global migration helpers,
reflecting whether migration is available to the outside world.
Note that we haven't yet started using this variable, but hopefully that is
still better already because now we have an explicit place to say who owns
the initial migration refcount. In this case, it's the global_migration
pointer (rather than current_migration) that owns it. Then in shutdown()
we clear that pointer right after the unref() would make sense.
We'll start to use the variable in the next patch to provide thread safety
to all migration exported helpers.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 44 ++++++++++++++++++++++++++++++++++++++++---
1 file changed, 41 insertions(+), 3 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index a82297db0f..c4adad7972 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -96,7 +96,36 @@ enum mig_rp_message_type {
migrations at once. For now we don't need to add
dynamic creation of migration */
-static MigrationState *current_migration;
+/*
+ * We have two pointers for the global migration objects. Both of them are
+ * initialized early during QEMU starts, but they have different lifecycles.
+ *
+ * - current_migration
+ *
+ * This variable reflects the whole lifecycle of the migration object
+ * (which each QEMU can only have one). It is valid until the migration
+ * object is destroyed.
+ *
+ * This is the object that internal migration so far use. For example,
+ * internal helper migrate_get_current() references it.
+ *
+ * When all migration code can always pass over a MigrationState* around,
+ * this variable can logically be dropped. But we're not yet there.
+ *
+ * - global_migration
+ *
+ * This is valid only until the migration object is still valid to the
+ * outside-migration world (until migration_shutdown()).
+ *
+ * This should normally be always set, cleared or accessed by the main
+ * thread only, rather than the migration thread.
+ *
+ * All the exported functions (in include/migration) should reference the
+ * exported migration object only to avoid race conditions, as
+ * current_migration can be freed concurrently by migration thread when
+ * the migration thread holds the last refcount.
+ */
+static MigrationState *current_migration, *global_migration;
static MigrationIncomingState *current_incoming;
static GSList *migration_blockers[MIG_MODE__MAX];
@@ -232,7 +261,9 @@ static int migration_stop_vm(MigrationState *s, RunState state)
void migration_object_init(void)
{
- MIGRATION_OBJ(object_new(TYPE_MIGRATION));
+ /* The global variable holds the refcount */
+ global_migration = MIGRATION_OBJ(object_new(TYPE_MIGRATION));
+
/* This should be set when initialize the object */
assert(current_migration);
@@ -333,7 +364,14 @@ void migration_shutdown(void)
* stop the migration using this structure
*/
migration_cancel(NULL);
- object_unref(OBJECT(current_migration));
+
+ /*
+ * This marks that migration object is not visible anymore to
+ * outside-migration world. Migration might still hold a refcount if
+ * it's still in progress.
+ */
+ object_unref(OBJECT(global_migration));
+ global_migration = NULL;
/*
* Cancel outgoing migration of dirty bitmaps. It should
--
2.45.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 4/4] migration: Make all helpers in misc.h safe to use without migration
2024-10-23 18:02 [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle Peter Xu
` (2 preceding siblings ...)
2024-10-23 18:02 ` [PATCH v2 3/4] migration: Add global_migration Peter Xu
@ 2024-10-23 18:02 ` Peter Xu
2024-10-23 18:19 ` Peter Xu
2024-10-23 19:25 ` [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle Peter Xu
2024-10-23 19:32 ` Fabiano Rosas
5 siblings, 1 reply; 11+ messages in thread
From: Peter Xu @ 2024-10-23 18:02 UTC (permalink / raw)
To: qemu-devel
Cc: peterx, Alex Williamson, Fabiano Rosas, Avihai Horon,
Cédric Le Goater, Dr . David Alan Gilbert
Migration object can be freed (e.g. after migration_shutdown()) before some
other device codes can still run, while we do have a bunch of migration
helpers exported in migration/misc.h that logically can be invoked at any
time of QEMU, even during destruction of a VM.
Make all these functions safe to be called, especially, not crashing after
the migration object is unreferenced from the main context.
To achieve it, only reference global_migration variable in the exported
functions. The variable is only reset with BQL, so it's safe to access.
Add a comment in the header explaining how to guarantee thread safe on
using these functions, and we choose BQL because fundamentally that's how
it's working now. We can move to other things (e.g. RCU) whenever
necessary in the future but it's an overkill if we have BQL anyway in
most/all existing callers.
When at it, update some comments, e.g. migrate_announce_params() is
exported from options.c now.
Cc: Cédric Le Goater <clg@redhat.com>
Cc: Avihai Horon <avihaih@nvidia.com>
Cc: Fabiano Rosas <farosas@suse.de>
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/migration/misc.h | 26 +++++++++++++++++++++-----
migration/migration.c | 32 ++++++++++++++++++++++++++------
2 files changed, 47 insertions(+), 11 deletions(-)
diff --git a/include/migration/misc.h b/include/migration/misc.h
index df57be6b5e..48892f9672 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -19,8 +19,19 @@
#include "qapi/qapi-types-net.h"
#include "migration/client-options.h"
-/* migration/ram.c */
+/*
+ * Misc migration functions exported to be used in QEMU generic system
+ * code outside migration/.
+ *
+ * By default, BQL is recommended before using below functions to avoid
+ * race conditions (concurrent updates to global_migration variable). It's
+ * caller's responsibility to make sure it's thread safe otherwise when
+ * below helpers are used without BQL held. When unsure, take BQL always.
+ */
+/*
+ * migration/ram.c
+ */
typedef enum PrecopyNotifyReason {
PRECOPY_NOTIFY_SETUP = 0,
PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC = 1,
@@ -43,14 +54,19 @@ void ram_mig_init(void);
void qemu_guest_free_page_hint(void *addr, size_t len);
bool migrate_ram_is_ignored(RAMBlock *block);
-/* migration/block.c */
-
+/*
+ * migration/options.c
+ */
AnnounceParameters *migrate_announce_params(void);
-/* migration/savevm.c */
+/*
+ * migration/savevm.c
+ */
void dump_vmstate_json_to_file(FILE *out_fp);
-/* migration/migration.c */
+/*
+ * migration/migration.c
+ */
void migration_object_init(void);
void migration_shutdown(void);
bool migration_is_idle(void);
diff --git a/migration/migration.c b/migration/migration.c
index c4adad7972..667816cc65 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1157,7 +1157,11 @@ void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value)
*/
bool migration_is_setup_or_active(void)
{
- MigrationState *s = current_migration;
+ MigrationState *s = global_migration;
+
+ if (!s) {
+ return false;
+ }
switch (s->state) {
case MIGRATION_STATUS_ACTIVE:
@@ -1174,7 +1178,6 @@ bool migration_is_setup_or_active(void)
default:
return false;
-
}
}
@@ -1721,7 +1724,11 @@ bool migration_is_idle(void)
bool migration_is_active(void)
{
- MigrationState *s = current_migration;
+ MigrationState *s = global_migration;
+
+ if (!s) {
+ return false;
+ }
return (s->state == MIGRATION_STATUS_ACTIVE ||
s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
@@ -1729,14 +1736,23 @@ bool migration_is_active(void)
bool migration_is_device(void)
{
- MigrationState *s = current_migration;
+ MigrationState *s = global_migration;
+
+ if (!s) {
+ return false;
+ }
return s->state == MIGRATION_STATUS_DEVICE;
}
bool migration_thread_is_self(void)
{
- MigrationState *s = current_migration;
+ MigrationState *s = global_migration;
+
+ /* If no migration object, must not be the migration thread */
+ if (!s) {
+ return false;
+ }
return qemu_thread_is_self(&s->thread);
}
@@ -3113,7 +3129,11 @@ static MigThrError postcopy_pause(MigrationState *s)
void migration_file_set_error(int ret, Error *err)
{
- MigrationState *s = current_migration;
+ MigrationState *s = global_migration;
+
+ if (!s) {
+ return;
+ }
WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
if (s->to_dst_file) {
--
2.45.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2 4/4] migration: Make all helpers in misc.h safe to use without migration
2024-10-23 18:02 ` [PATCH v2 4/4] migration: Make all helpers in misc.h safe to use without migration Peter Xu
@ 2024-10-23 18:19 ` Peter Xu
0 siblings, 0 replies; 11+ messages in thread
From: Peter Xu @ 2024-10-23 18:19 UTC (permalink / raw)
To: QEMU Developers
Cc: Alex Williamson, Fabiano Rosas, Avihai Horon,
Cédric Le Goater, Dr . David Alan Gilbert
[-- Attachment #1: Type: text/plain, Size: 5252 bytes --]
On Wed, Oct 23, 2024, 2:02 p.m. Peter Xu <peterx@redhat.com> wrote:
> Migration object can be freed (e.g. after migration_shutdown()) before some
> other device codes can still run, while we do have a bunch of migration
> helpers exported in migration/misc.h that logically can be invoked at any
> time of QEMU, even during destruction of a VM.
>
> Make all these functions safe to be called, especially, not crashing after
> the migration object is unreferenced from the main context.
>
> To achieve it, only reference global_migration variable in the exported
> functions. The variable is only reset with BQL, so it's safe to access.
>
> Add a comment in the header explaining how to guarantee thread safe on
> using these functions, and we choose BQL because fundamentally that's how
> it's working now. We can move to other things (e.g. RCU) whenever
> necessary in the future but it's an overkill if we have BQL anyway in
> most/all existing callers.
>
> When at it, update some comments, e.g. migrate_announce_params() is
> exported from options.c now.
>
Err I wrongly fetched the list email then lost my local English fix with
s/when/while/. I'll make sure it's fixed ultimately when repost.
> Cc: Cédric Le Goater <clg@redhat.com>
> Cc: Avihai Horon <avihaih@nvidia.com>
> Cc: Fabiano Rosas <farosas@suse.de>
> Cc: Dr. David Alan Gilbert <dave@treblig.org>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> include/migration/misc.h | 26 +++++++++++++++++++++-----
> migration/migration.c | 32 ++++++++++++++++++++++++++------
> 2 files changed, 47 insertions(+), 11 deletions(-)
>
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index df57be6b5e..48892f9672 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -19,8 +19,19 @@
> #include "qapi/qapi-types-net.h"
> #include "migration/client-options.h"
>
> -/* migration/ram.c */
> +/*
> + * Misc migration functions exported to be used in QEMU generic system
> + * code outside migration/.
> + *
> + * By default, BQL is recommended before using below functions to avoid
> + * race conditions (concurrent updates to global_migration variable).
> It's
> + * caller's responsibility to make sure it's thread safe otherwise when
> + * below helpers are used without BQL held. When unsure, take BQL always.
> + */
>
> +/*
> + * migration/ram.c
> + */
> typedef enum PrecopyNotifyReason {
> PRECOPY_NOTIFY_SETUP = 0,
> PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC = 1,
> @@ -43,14 +54,19 @@ void ram_mig_init(void);
> void qemu_guest_free_page_hint(void *addr, size_t len);
> bool migrate_ram_is_ignored(RAMBlock *block);
>
> -/* migration/block.c */
> -
> +/*
> + * migration/options.c
> + */
> AnnounceParameters *migrate_announce_params(void);
> -/* migration/savevm.c */
>
> +/*
> + * migration/savevm.c
> + */
> void dump_vmstate_json_to_file(FILE *out_fp);
>
> -/* migration/migration.c */
> +/*
> + * migration/migration.c
> + */
> void migration_object_init(void);
> void migration_shutdown(void);
> bool migration_is_idle(void);
> diff --git a/migration/migration.c b/migration/migration.c
> index c4adad7972..667816cc65 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1157,7 +1157,11 @@ void
> migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value)
> */
> bool migration_is_setup_or_active(void)
> {
> - MigrationState *s = current_migration;
> + MigrationState *s = global_migration;
> +
> + if (!s) {
> + return false;
> + }
>
> switch (s->state) {
> case MIGRATION_STATUS_ACTIVE:
> @@ -1174,7 +1178,6 @@ bool migration_is_setup_or_active(void)
>
> default:
> return false;
> -
> }
> }
>
> @@ -1721,7 +1724,11 @@ bool migration_is_idle(void)
>
> bool migration_is_active(void)
> {
> - MigrationState *s = current_migration;
> + MigrationState *s = global_migration;
> +
> + if (!s) {
> + return false;
> + }
>
> return (s->state == MIGRATION_STATUS_ACTIVE ||
> s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> @@ -1729,14 +1736,23 @@ bool migration_is_active(void)
>
> bool migration_is_device(void)
> {
> - MigrationState *s = current_migration;
> + MigrationState *s = global_migration;
> +
> + if (!s) {
> + return false;
> + }
>
> return s->state == MIGRATION_STATUS_DEVICE;
> }
>
> bool migration_thread_is_self(void)
> {
> - MigrationState *s = current_migration;
> + MigrationState *s = global_migration;
> +
> + /* If no migration object, must not be the migration thread */
> + if (!s) {
> + return false;
> + }
>
> return qemu_thread_is_self(&s->thread);
> }
> @@ -3113,7 +3129,11 @@ static MigThrError postcopy_pause(MigrationState *s)
>
> void migration_file_set_error(int ret, Error *err)
> {
> - MigrationState *s = current_migration;
> + MigrationState *s = global_migration;
> +
> + if (!s) {
> + return;
> + }
>
> WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> if (s->to_dst_file) {
> --
> 2.45.0
>
>
[-- Attachment #2: Type: text/html, Size: 6708 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle
2024-10-23 18:02 [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle Peter Xu
` (3 preceding siblings ...)
2024-10-23 18:02 ` [PATCH v2 4/4] migration: Make all helpers in misc.h safe to use without migration Peter Xu
@ 2024-10-23 19:25 ` Peter Xu
2024-10-23 19:32 ` Fabiano Rosas
5 siblings, 0 replies; 11+ messages in thread
From: Peter Xu @ 2024-10-23 19:25 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Fabiano Rosas, Avihai Horon,
Cédric Le Goater
On Wed, Oct 23, 2024 at 02:02:12PM -0400, Peter Xu wrote:
> This is a follow up of below patch from Avihai as a replacement:
>
> https://lore.kernel.org/qemu-devel/20241020130108.27148-3-avihaih@nvidia.com/
>
> This is v2 of the series, and it became a more generic rework on how we do
> migration object refcounts, so I skipped a changelog because most of this
> is new things.
>
> To put it simple, now I introduced another pointer to migration object, and
> here's a simple explanation for both after all change applied (copy-paste
> from one of the patch):
>
> /*
> * We have two pointers for the global migration objects. Both of them are
> * initialized early during QEMU starts, but they have different lifecycles.
> *
> * - current_migration
> *
> * This variable reflects the whole lifecycle of the migration object
> * (which each QEMU can only have one). It is valid until the migration
> * object is destroyed.
> *
> * This is the object that internal migration so far use. For example,
> * internal helper migrate_get_current() references it.
> *
> * When all migration code can always pass over a MigrationState* around,
> * this variable can logically be dropped. But we're not yet there.
> *
> * - global_migration
> *
> * This is valid only until the migration object is still valid to the
> * outside-migration world (until migration_shutdown()).
> *
> * This should normally be always set, cleared or accessed by the main
> * thread only, rather than the migration thread.
> *
> * All the exported functions (in include/migration) should reference the
> * exported migration object only to avoid race conditions, as
> * current_migration can be freed concurrently by migration thread when
> * the migration thread holds the last refcount.
> */
>
> It allows all misc.h exported helpers to be used for the whole VM
> lifecycle, so as to never crash QEMU with freed migration objects.
>
> Thanks,
>
> Peter Xu (4):
> migration: Unexport dirty_bitmap_mig_init() in misc.h
> migration: Reset current_migration properly
> migration: Add global_migration
> migration: Make all helpers in misc.h safe to use without migration
>
> include/migration/misc.h | 29 ++++++++----
> migration/migration.h | 4 ++
> migration/migration.c | 99 +++++++++++++++++++++++++++++++++++-----
> 3 files changed, 113 insertions(+), 19 deletions(-)
Sent too soon. This breaks device-introspect-test.. Sorry. I'll look at
that and repost.
Meanwhile please still comment on the idea, especially when one disagrees.
--
Peter Xu
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle
2024-10-23 18:02 [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle Peter Xu
` (4 preceding siblings ...)
2024-10-23 19:25 ` [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle Peter Xu
@ 2024-10-23 19:32 ` Fabiano Rosas
2024-10-23 20:03 ` Peter Xu
5 siblings, 1 reply; 11+ messages in thread
From: Fabiano Rosas @ 2024-10-23 19:32 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: peterx, Alex Williamson, Avihai Horon, Cédric Le Goater
Peter Xu <peterx@redhat.com> writes:
> This is a follow up of below patch from Avihai as a replacement:
>
> https://lore.kernel.org/qemu-devel/20241020130108.27148-3-avihaih@nvidia.com/
>
> This is v2 of the series, and it became a more generic rework on how we do
> migration object refcounts, so I skipped a changelog because most of this
> is new things.
>
> To put it simple, now I introduced another pointer to migration object, and
> here's a simple explanation for both after all change applied (copy-paste
> from one of the patch):
>
> /*
> * We have two pointers for the global migration objects. Both of them are
> * initialized early during QEMU starts, but they have different lifecycles.
The very next person that needs to access one of those in migration code
will get this wrong.
> *
> * - current_migration
> *
> * This variable reflects the whole lifecycle of the migration object
> * (which each QEMU can only have one). It is valid until the migration
> * object is destroyed.
> *
> * This is the object that internal migration so far use. For example,
> * internal helper migrate_get_current() references it.
> *
> * When all migration code can always pass over a MigrationState* around,
> * this variable can logically be dropped. But we're not yet there.
Won't dropping it just bring us back to the situation before this patch?
I'd say we need the opposite, to stop using migrate_get_current()
everywhere in the migration code and instead expose the
current_migration via an internal header.
> *
> * - global_migration
Both are global, we can't prefix one of them with global_.
> *
> * This is valid only until the migration object is still valid to the
> * outside-migration world (until migration_shutdown()).
> *
> * This should normally be always set, cleared or accessed by the main
> * thread only, rather than the migration thread.
> *
> * All the exported functions (in include/migration) should reference the
> * exported migration object only to avoid race conditions, as
> * current_migration can be freed concurrently by migration thread when
> * the migration thread holds the last refcount.
> */
Have you thought of locking the migration object instead?
Having two global pointers to the same object with one having slightly
different lifecycle than the other will certainly lead to misuse.
I worry about mixing the usage of both globals due to some migration
code that needs to call these exported functions or external code
reaching into migration/ through some indirect path. Check global and
try to use current sort of scenarios (and vice-versa).
Similarly, what about when a lingering reference still exists, but
global_migration is already clear? Any migration code looking at
global_migration would do the wrong thing.
>
> It allows all misc.h exported helpers to be used for the whole VM
> lifecycle, so as to never crash QEMU with freed migration objects.
Isn't there a race with the last reference being dropped at
migration_shutdown() and global_migration still being set?
>
> Thanks,
>
> Peter Xu (4):
> migration: Unexport dirty_bitmap_mig_init() in misc.h
> migration: Reset current_migration properly
> migration: Add global_migration
> migration: Make all helpers in misc.h safe to use without migration
>
> include/migration/misc.h | 29 ++++++++----
> migration/migration.h | 4 ++
> migration/migration.c | 99 +++++++++++++++++++++++++++++++++++-----
> 3 files changed, 113 insertions(+), 19 deletions(-)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle
2024-10-23 19:32 ` Fabiano Rosas
@ 2024-10-23 20:03 ` Peter Xu
2024-10-23 21:03 ` Fabiano Rosas
0 siblings, 1 reply; 11+ messages in thread
From: Peter Xu @ 2024-10-23 20:03 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Alex Williamson, Avihai Horon, Cédric Le Goater
On Wed, Oct 23, 2024 at 04:32:01PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > This is a follow up of below patch from Avihai as a replacement:
> >
> > https://lore.kernel.org/qemu-devel/20241020130108.27148-3-avihaih@nvidia.com/
> >
> > This is v2 of the series, and it became a more generic rework on how we do
> > migration object refcounts, so I skipped a changelog because most of this
> > is new things.
> >
> > To put it simple, now I introduced another pointer to migration object, and
> > here's a simple explanation for both after all change applied (copy-paste
> > from one of the patch):
> >
> > /*
> > * We have two pointers for the global migration objects. Both of them are
> > * initialized early during QEMU starts, but they have different lifecycles.
>
> The very next person that needs to access one of those in migration code
> will get this wrong.
Migration code should never access global_migration except during init /
shutdown (or a renamed version of it), that's the plan.. so no one should
use it within migration/.
>
> > *
> > * - current_migration
> > *
> > * This variable reflects the whole lifecycle of the migration object
> > * (which each QEMU can only have one). It is valid until the migration
> > * object is destroyed.
> > *
> > * This is the object that internal migration so far use. For example,
> > * internal helper migrate_get_current() references it.
> > *
> > * When all migration code can always pass over a MigrationState* around,
> > * this variable can logically be dropped. But we're not yet there.
>
> Won't dropping it just bring us back to the situation before this patch?
> I'd say we need the opposite, to stop using migrate_get_current()
> everywhere in the migration code and instead expose the
> current_migration via an internal header.
I meant dropping all the global access to current_migration within
migration/ dir.
Consider all functions has the 1st parameter as MigrationState*, for
example. Then we don't need that for internal use, while global_migration
is still needed for external use, but only for external use.
>
> > *
> > * - global_migration
>
> Both are global, we can't prefix one of them with global_.
I can rename it to migration_export, but the question is whether the name
is the issue, or you'd think having two global variables pointing to
migration object is the issue / concern.
So I think fundaementally we indeed only need one global var there, if we
can cleanup the migration/ everywhere to always take the pointer from the
caller, then migration thread will take 1 refcount and that guarantees it's
available for migration/.
>
> > *
> > * This is valid only until the migration object is still valid to the
> > * outside-migration world (until migration_shutdown()).
> > *
> > * This should normally be always set, cleared or accessed by the main
> > * thread only, rather than the migration thread.
> > *
> > * All the exported functions (in include/migration) should reference the
> > * exported migration object only to avoid race conditions, as
> > * current_migration can be freed concurrently by migration thread when
> > * the migration thread holds the last refcount.
> > */
>
> Have you thought of locking the migration object instead?
Yes, logically we could use RCU if we want, then take BQL for example only
if update. but I worry that's an overkill: we'll need too many rcu read
lock all over the places..
>
> Having two global pointers to the same object with one having slightly
> different lifecycle than the other will certainly lead to misuse.
That's indeed tricky, it's just that this is so far the easiest change I
can think of.
>
> I worry about mixing the usage of both globals due to some migration
> code that needs to call these exported functions or external code
> reaching into migration/ through some indirect path. Check global and
> try to use current sort of scenarios (and vice-versa).
Yeh I get your concern. I actually tried to observe such usages (for now,
especially when migration/ uses the misc.h exported functions) and they're
all safe. I should have mentioned that.
For the other way round, I don't yet expect that to happen: the plan is
anything outside must call a function in include/migration/* and that must
only use global_migration.
>
> Similarly, what about when a lingering reference still exists, but
> global_migration is already clear? Any migration code looking at
> global_migration would do the wrong thing.
That's how it works: migration thread will take one refcount and that'll
happen when migration is running but when VM shutdowns itself. I checked
that all the migration code should be fine when using the exported
functions even if they should reference current_migration.
So logically if with such design, indeed internal migration/ code shouldn't
reference global_migration at all just to be always safe.
Any idea in your mind that can make this easier? I'm definitely open to
suggestions.
>
> >
> > It allows all misc.h exported helpers to be used for the whole VM
> > lifecycle, so as to never crash QEMU with freed migration objects.
>
> Isn't there a race with the last reference being dropped at
> migration_shutdown() and global_migration still being set?
It needs to be protected by BQL in this case, so anyone using exported
functions need to take BQL first.
>
> >
> > Thanks,
> >
> > Peter Xu (4):
> > migration: Unexport dirty_bitmap_mig_init() in misc.h
> > migration: Reset current_migration properly
> > migration: Add global_migration
> > migration: Make all helpers in misc.h safe to use without migration
> >
> > include/migration/misc.h | 29 ++++++++----
> > migration/migration.h | 4 ++
> > migration/migration.c | 99 +++++++++++++++++++++++++++++++++++-----
> > 3 files changed, 113 insertions(+), 19 deletions(-)
>
--
Peter Xu
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle
2024-10-23 20:03 ` Peter Xu
@ 2024-10-23 21:03 ` Fabiano Rosas
2024-10-23 21:43 ` Peter Xu
0 siblings, 1 reply; 11+ messages in thread
From: Fabiano Rosas @ 2024-10-23 21:03 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Alex Williamson, Avihai Horon, Cédric Le Goater
Peter Xu <peterx@redhat.com> writes:
> On Wed, Oct 23, 2024 at 04:32:01PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > This is a follow up of below patch from Avihai as a replacement:
>> >
>> > https://lore.kernel.org/qemu-devel/20241020130108.27148-3-avihaih@nvidia.com/
>> >
>> > This is v2 of the series, and it became a more generic rework on how we do
>> > migration object refcounts, so I skipped a changelog because most of this
>> > is new things.
>> >
>> > To put it simple, now I introduced another pointer to migration object, and
>> > here's a simple explanation for both after all change applied (copy-paste
>> > from one of the patch):
>> >
>> > /*
>> > * We have two pointers for the global migration objects. Both of them are
>> > * initialized early during QEMU starts, but they have different lifecycles.
>>
>> The very next person that needs to access one of those in migration code
>> will get this wrong.
>
> Migration code should never access global_migration except during init /
> shutdown (or a renamed version of it), that's the plan.. so no one should
> use it within migration/.
Right, but consider that it's easy enough for someone to look for a
global object to write in the code, find the wrong one and just use
it. It would then be up to reviewers to catch the mistake.
Look at this:
bool migration_is_idle(void)
{
MigrationState *s = current_migration;
...
}
bool migration_is_active(void)
{
MigrationState *s = global_migration;
...
}
Of course we'll get this wrong at some point.
Also, if in the future someone decides to call migration_is_idle() from
outside migration/, we'd be not only adding the burden to change the
variable, but also the functional change at shutdown time.
If we're to carry on with this idea, we'll need to play with some
headers to properly isolate these two usages. Something like:
migration.h:
bool migration_is_active_internal(MigrationState *s);
void set_global_obj(MigrationState *obj);
migration.c:
bool migration_is_active_internal(MigrationState *s)
{
}
void migration_object_init(void)
{
set_global_object(MIGRATION_OBJ(object_new(TYPE_MIGRATION)));
}
exports.h:
#include migration.h
bool migration_is_active(void);
exports.c:
static MigrationState *global_migration;
void set_global_obj(MigrationState *obj)
{
global_migration = obj;
}
bool migration_is_active(void)
{
return migration_is_active_internal(global_migration);
}
That way, the internal code never calls the exported functions with
global, always with current.
>>
>> > *
>> > * - current_migration
>> > *
>> > * This variable reflects the whole lifecycle of the migration object
>> > * (which each QEMU can only have one). It is valid until the migration
>> > * object is destroyed.
>> > *
>> > * This is the object that internal migration so far use. For example,
>> > * internal helper migrate_get_current() references it.
>> > *
>> > * When all migration code can always pass over a MigrationState* around,
>> > * this variable can logically be dropped. But we're not yet there.
>>
>> Won't dropping it just bring us back to the situation before this patch?
>> I'd say we need the opposite, to stop using migrate_get_current()
>> everywhere in the migration code and instead expose the
>> current_migration via an internal header.
>
> I meant dropping all the global access to current_migration within
> migration/ dir.
>
> Consider all functions has the 1st parameter as MigrationState*, for
> example. Then we don't need that for internal use, while global_migration
> is still needed for external use, but only for external use.
>
>>
>> > *
>> > * - global_migration
>>
>> Both are global, we can't prefix one of them with global_.
>
> I can rename it to migration_export, but the question is whether the name
> is the issue, or you'd think having two global variables pointing to
> migration object is the issue / concern.
>
> So I think fundaementally we indeed only need one global var there, if we
> can cleanup the migration/ everywhere to always take the pointer from the
> caller, then migration thread will take 1 refcount and that guarantees it's
> available for migration/.
We can discuss when we get to that point, but that might make the code
too cumbersome. Having to change signatures all the time to
include/remove MigrationState.
>
>>
>> > *
>> > * This is valid only until the migration object is still valid to the
>> > * outside-migration world (until migration_shutdown()).
>> > *
>> > * This should normally be always set, cleared or accessed by the main
>> > * thread only, rather than the migration thread.
>> > *
>> > * All the exported functions (in include/migration) should reference the
>> > * exported migration object only to avoid race conditions, as
>> > * current_migration can be freed concurrently by migration thread when
>> > * the migration thread holds the last refcount.
>> > */
>>
>> Have you thought of locking the migration object instead?
>
> Yes, logically we could use RCU if we want, then take BQL for example only
> if update. but I worry that's an overkill: we'll need too many rcu read
> lock all over the places..
>
>>
>> Having two global pointers to the same object with one having slightly
>> different lifecycle than the other will certainly lead to misuse.
>
> That's indeed tricky, it's just that this is so far the easiest change I
> can think of.
>
>>
>> I worry about mixing the usage of both globals due to some migration
>> code that needs to call these exported functions or external code
>> reaching into migration/ through some indirect path. Check global and
>> try to use current sort of scenarios (and vice-versa).
>
> Yeh I get your concern. I actually tried to observe such usages (for now,
> especially when migration/ uses the misc.h exported functions) and they're
> all safe. I should have mentioned that.
I'm afraid that's not enough, code changes after all. It's not possible
to carry these requirements to the future, we must account for the gaps
now.
>
> For the other way round, I don't yet expect that to happen: the plan is
> anything outside must call a function in include/migration/* and that must
> only use global_migration.
>
>>
>> Similarly, what about when a lingering reference still exists, but
>> global_migration is already clear? Any migration code looking at
>> global_migration would do the wrong thing.
>
> That's how it works: migration thread will take one refcount and that'll
> happen when migration is running but when VM shutdowns itself. I checked
> that all the migration code should be fine when using the exported
> functions even if they should reference current_migration.
>
> So logically if with such design, indeed internal migration/ code shouldn't
> reference global_migration at all just to be always safe.
>
> Any idea in your mind that can make this easier? I'm definitely open to
> suggestions.
>
>>
>> >
>> > It allows all misc.h exported helpers to be used for the whole VM
>> > lifecycle, so as to never crash QEMU with freed migration objects.
>>
>> Isn't there a race with the last reference being dropped at
>> migration_shutdown() and global_migration still being set?
>
> It needs to be protected by BQL in this case, so anyone using exported
> functions need to take BQL first.
Even code under migration/ ? All the various threads? The BQL is a
heavyweight lock. We shouldn't be adding more instances of it unless it
protects against something from the main loop/other subsystems.
>
>>
>> >
>> > Thanks,
>> >
>> > Peter Xu (4):
>> > migration: Unexport dirty_bitmap_mig_init() in misc.h
>> > migration: Reset current_migration properly
>> > migration: Add global_migration
>> > migration: Make all helpers in misc.h safe to use without migration
>> >
>> > include/migration/misc.h | 29 ++++++++----
>> > migration/migration.h | 4 ++
>> > migration/migration.c | 99 +++++++++++++++++++++++++++++++++++-----
>> > 3 files changed, 113 insertions(+), 19 deletions(-)
>>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle
2024-10-23 21:03 ` Fabiano Rosas
@ 2024-10-23 21:43 ` Peter Xu
0 siblings, 0 replies; 11+ messages in thread
From: Peter Xu @ 2024-10-23 21:43 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Alex Williamson, Avihai Horon, Cédric Le Goater
On Wed, Oct 23, 2024 at 06:03:36PM -0300, Fabiano Rosas wrote:
> Right, but consider that it's easy enough for someone to look for a
> global object to write in the code, find the wrong one and just use
> it. It would then be up to reviewers to catch the mistake.
>
> Look at this:
>
> bool migration_is_idle(void)
> {
> MigrationState *s = current_migration;
> ...
> }
This is actually something I overlooked (and just fixed it up before this
email..).. so yah I think it's too trivial indeed.
>
> bool migration_is_active(void)
> {
> MigrationState *s = global_migration;
> ...
> }
>
> Of course we'll get this wrong at some point.
>
> Also, if in the future someone decides to call migration_is_idle() from
> outside migration/, we'd be not only adding the burden to change the
> variable, but also the functional change at shutdown time.
>
> If we're to carry on with this idea, we'll need to play with some
> headers to properly isolate these two usages. Something like:
>
> migration.h:
> bool migration_is_active_internal(MigrationState *s);
> void set_global_obj(MigrationState *obj);
>
> migration.c:
> bool migration_is_active_internal(MigrationState *s)
> {
> }
>
> void migration_object_init(void)
> {
> set_global_object(MIGRATION_OBJ(object_new(TYPE_MIGRATION)));
> }
>
> exports.h:
> #include migration.h
> bool migration_is_active(void);
>
> exports.c:
> static MigrationState *global_migration;
>
> void set_global_obj(MigrationState *obj)
> {
> global_migration = obj;
> }
>
> bool migration_is_active(void)
> {
> return migration_is_active_internal(global_migration);
> }
>
> That way, the internal code never calls the exported functions with
> global, always with current.
Yes if so we'll need this if to make it clean.
Now I'm thinking whether we can make it easier, by using a mutex to protect
current_migration accesses, but only when outside migration/ (so migration
thread's refcount will make sure the migration code has safe access to the
variable). Tricky, but maybe working.
The 1st thing we may want to fix is, we never clear current_migration, but
QEMU does free the object after all refcounts released.. it means when
accessed after freed it's UAF and it'll be harder to debug (comparing to
NULL deref). So the 1st thing is to clear current_migration properly,
probably.
The only possible place to reset current_migration is in finalize() (as
proposed in this series), because it can be either main / migration thread
that does the last unref and invoke finalize(), there's no function we can
clear it manually besides the finalize().
But then this leads me to the QOM unit test crash, just noticing that QEMU
has device-introspect-test which can dynamically create a migration object
via qom-list-properties QMP command.
We'll need to think about how to declare a class that can only be initiated
once. Things like INTERFACE_INTERNAL can potentially hide a class (per we
talked just now), but you were right that could make qom-set harder to be
applicable to migration some day. The other approach can be
INTERFACE_SINGLETON so it should still apply to qom-set /
qom-list-properties / ... but it can't be created more than once. Either
of them needs more thoughts as prior work to allow mutex to protect
current_migration.
I'll think about it tomorrow to see what I'll propose next.
--
Peter Xu
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-10-23 21:43 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-23 18:02 [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle Peter Xu
2024-10-23 18:02 ` [PATCH v2 1/4] migration: Unexport dirty_bitmap_mig_init() in misc.h Peter Xu
2024-10-23 18:02 ` [PATCH v2 2/4] migration: Reset current_migration properly Peter Xu
2024-10-23 18:02 ` [PATCH v2 3/4] migration: Add global_migration Peter Xu
2024-10-23 18:02 ` [PATCH v2 4/4] migration: Make all helpers in misc.h safe to use without migration Peter Xu
2024-10-23 18:19 ` Peter Xu
2024-10-23 19:25 ` [PATCH v2 0/4] Migration: Make misc.h helpers available for whole VM lifecycle Peter Xu
2024-10-23 19:32 ` Fabiano Rosas
2024-10-23 20:03 ` Peter Xu
2024-10-23 21:03 ` Fabiano Rosas
2024-10-23 21:43 ` Peter Xu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).