* [PATCH 0/2] Migration: Make misc.h helpers available for whole VM lifecycle
@ 2024-10-22 16:07 Peter Xu
2024-10-22 16:07 ` [PATCH 1/2] migration: Make all helpers in misc.h safe to use without migration Peter Xu
2024-10-22 16:07 ` [PATCH 2/2] migration: Unexport dirty_bitmap_mig_init() in misc.h Peter Xu
0 siblings, 2 replies; 9+ messages in thread
From: Peter Xu @ 2024-10-22 16:07 UTC (permalink / raw)
To: qemu-devel
Cc: Avihai Horon, Alex Williamson, Fabiano Rosas, peterx,
Cédric Le Goater
This is a follow-up to the patch below from Avihai, intended as a replacement:
https://lore.kernel.org/qemu-devel/20241020130108.27148-3-avihaih@nvidia.com/
It allows all helpers exported from misc.h to be used for the whole VM
lifecycle, so that QEMU never crashes on a freed migration object.
I also added some comments explaining the locking requirements for using
the helpers, which used to be ambiguous. Hopefully that clarifies things too.
Thanks,
Peter Xu (2):
migration: Make all helpers in misc.h safe to use without migration
migration: Unexport dirty_bitmap_mig_init() in misc.h
include/migration/misc.h | 36 ++++++++++++++++++++++++++++--------
migration/migration.h | 4 ++++
migration/migration.c | 22 +++++++++++++++++++++-
3 files changed, 53 insertions(+), 9 deletions(-)
--
2.45.0
* [PATCH 1/2] migration: Make all helpers in misc.h safe to use without migration
2024-10-22 16:07 [PATCH 0/2] Migration: Make misc.h helpers available for whole VM lifecycle Peter Xu
@ 2024-10-22 16:07 ` Peter Xu
2024-10-22 16:11 ` Cédric Le Goater
2024-10-23 8:30 ` Avihai Horon
2024-10-22 16:07 ` [PATCH 2/2] migration: Unexport dirty_bitmap_mig_init() in misc.h Peter Xu
1 sibling, 2 replies; 9+ messages in thread
From: Peter Xu @ 2024-10-22 16:07 UTC (permalink / raw)
To: qemu-devel
Cc: Avihai Horon, Alex Williamson, Fabiano Rosas, peterx,
Cédric Le Goater, Dr . David Alan Gilbert
The migration object can be freed before some other device code runs, while
we do have a bunch of migration helpers exported in migration/misc.h that
logically can be invoked at any time in QEMU's lifetime, even during
destruction of a VM.
Make all these functions safe to call; in particular, they must not crash
after the migration object is freed.
Add a rich comment in the header explaining how to guarantee thread safety
when using these functions. We choose BQL because fundamentally that's how
it's working now. We can move to other things (e.g. RCU) whenever
necessary in the future, but that's overkill if we have BQL anyway in
most/all existing callers.
When at it, update some comments, e.g. migrate_announce_params() is
exported from options.c now.
Cc: Cédric Le Goater <clg@redhat.com>
Cc: Avihai Horon <avihaih@nvidia.com>
Cc: Fabiano Rosas <farosas@suse.de>
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/migration/misc.h | 33 ++++++++++++++++++++++++++++-----
migration/migration.c | 22 +++++++++++++++++++++-
2 files changed, 49 insertions(+), 6 deletions(-)
diff --git a/include/migration/misc.h b/include/migration/misc.h
index bfadc5613b..8d6812b8c7 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -19,8 +19,26 @@
#include "qapi/qapi-types-net.h"
#include "migration/client-options.h"
-/* migration/ram.c */
+/*
+ * Misc migration functions exported to be used in QEMU generic system
+ * code outside migration/.
+ *
+ * By default, BQL is required to use below functions to avoid race
+ * conditions (e.g. concurrent free of the migration object). It's
+ * caller's responsibility to make sure it's thread safe otherwise when
+ * below helpers are used without BQL held.
+ *
+ * One example of the special case is migration_thread(), who will take a
+ * refcount of the migration object. The refcount will make sure the
+ * migration object will not be freed concurrently when accessing through
+ * below helpers.
+ *
+ * When unsure, always take BQL first before using the helpers.
+ */
+/*
+ * migration/ram.c
+ */
typedef enum PrecopyNotifyReason {
PRECOPY_NOTIFY_SETUP = 0,
PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC = 1,
@@ -43,14 +61,19 @@ void ram_mig_init(void);
void qemu_guest_free_page_hint(void *addr, size_t len);
bool migrate_ram_is_ignored(RAMBlock *block);
-/* migration/block.c */
-
+/*
+ * migration/options.c
+ */
AnnounceParameters *migrate_announce_params(void);
-/* migration/savevm.c */
+/*
+ * migration/savevm.c
+ */
void dump_vmstate_json_to_file(FILE *out_fp);
-/* migration/migration.c */
+/*
+ * migration/migration.c
+ */
void migration_object_init(void);
void migration_shutdown(void);
bool migration_is_idle(void);
diff --git a/migration/migration.c b/migration/migration.c
index bcb735869b..27341eed50 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1121,6 +1121,10 @@ bool migration_is_setup_or_active(void)
{
MigrationState *s = current_migration;
+ if (!s) {
+ return false;
+ }
+
switch (s->state) {
case MIGRATION_STATUS_ACTIVE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
@@ -1136,7 +1140,6 @@ bool migration_is_setup_or_active(void)
default:
return false;
-
}
}
@@ -1685,6 +1688,10 @@ bool migration_is_active(void)
{
MigrationState *s = current_migration;
+ if (!s) {
+ return false;
+ }
+
return (s->state == MIGRATION_STATUS_ACTIVE ||
s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
}
@@ -1693,6 +1700,10 @@ bool migration_is_device(void)
{
MigrationState *s = current_migration;
+ if (!s) {
+ return false;
+ }
+
return s->state == MIGRATION_STATUS_DEVICE;
}
@@ -1700,6 +1711,11 @@ bool migration_thread_is_self(void)
{
MigrationState *s = current_migration;
+ /* If no migration object, must not be the migration thread */
+ if (!s) {
+ return false;
+ }
+
return qemu_thread_is_self(&s->thread);
}
@@ -3077,6 +3093,10 @@ void migration_file_set_error(int ret, Error *err)
{
MigrationState *s = current_migration;
+ if (!s) {
+ return;
+ }
+
WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
if (s->to_dst_file) {
qemu_file_set_error_obj(s->to_dst_file, ret, err);
--
2.45.0
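The guard pattern and the locking rule from the header comment can be sketched outside QEMU as follows. This is a minimal illustration, not the patch itself: `MigState`, `big_lock`, `current`, `mig_is_active()` and `device_query_active()` are hypothetical stand-ins for the migration object, the BQL, `current_migration` and a device-side caller, and a plain pthread mutex plays the role of the BQL.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not QEMU's real types or locks. */
typedef enum { STATUS_NONE, STATUS_ACTIVE, STATUS_DEVICE } Status;
typedef struct { Status state; } MigState;

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER; /* plays the BQL */
static MigState *current;                  /* plays current_migration */

/* Exported-style helper: the caller is expected to hold big_lock. */
static bool mig_is_active(void)
{
    MigState *s = current;

    if (!s) {
        return false;   /* no migration object: report "not active" */
    }
    return s->state == STATUS_ACTIVE;
}

/* Device-side caller following the "when unsure, take the lock" rule. */
static bool device_query_active(void)
{
    bool active;

    pthread_mutex_lock(&big_lock);
    active = mig_is_active();
    pthread_mutex_unlock(&big_lock);
    return active;
}
```

The guard only helps if the pointer is actually NULL whenever the object is gone; the sketch assumes whoever frees the object also clears `current` under the same lock.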
* [PATCH 2/2] migration: Unexport dirty_bitmap_mig_init() in misc.h
2024-10-22 16:07 [PATCH 0/2] Migration: Make misc.h helpers available for whole VM lifecycle Peter Xu
2024-10-22 16:07 ` [PATCH 1/2] migration: Make all helpers in misc.h safe to use without migration Peter Xu
@ 2024-10-22 16:07 ` Peter Xu
2024-10-22 16:13 ` Cédric Le Goater
1 sibling, 1 reply; 9+ messages in thread
From: Peter Xu @ 2024-10-22 16:07 UTC (permalink / raw)
To: qemu-devel
Cc: Avihai Horon, Alex Williamson, Fabiano Rosas, peterx,
Cédric Le Goater
It's only used within migration/, so it shouldn't be exported.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/migration/misc.h | 3 ---
migration/migration.h | 4 ++++
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 8d6812b8c7..e0e88b1c0c 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -131,7 +131,4 @@ bool migration_incoming_postcopy_advised(void);
/* True if background snapshot is active */
bool migration_in_bg_snapshot(void);
-/* migration/block-dirty-bitmap.c */
-void dirty_bitmap_mig_init(void);
-
#endif
diff --git a/migration/migration.h b/migration/migration.h
index 7dc59c5e8d..0956e9274b 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -552,4 +552,8 @@ int migration_rp_wait(MigrationState *s);
void migration_rp_kick(MigrationState *s);
void migration_bitmap_sync_precopy(bool last_stage);
+
+/* migration/block-dirty-bitmap.c */
+void dirty_bitmap_mig_init(void);
+
#endif
--
2.45.0
* Re: [PATCH 1/2] migration: Make all helpers in misc.h safe to use without migration
2024-10-22 16:07 ` [PATCH 1/2] migration: Make all helpers in misc.h safe to use without migration Peter Xu
@ 2024-10-22 16:11 ` Cédric Le Goater
2024-10-22 21:52 ` Peter Xu
2024-10-23 8:30 ` Avihai Horon
1 sibling, 1 reply; 9+ messages in thread
From: Cédric Le Goater @ 2024-10-22 16:11 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Avihai Horon, Alex Williamson, Fabiano Rosas,
Dr . David Alan Gilbert
On 10/22/24 18:07, Peter Xu wrote:
> Migration object can be freed before some other device codes run, while we
> do have a bunch of migration helpers exported in migration/misc.h that
> logically can be invoked at any time of QEMU, even during destruction of a
> VM.
>
> Make all these functions safe to be called, especially, not crashing after
> the migration object is freed.
>
> Add a rich comment in the header explaining how to guarantee thread safe on
> using these functions, and we choose BQL because fundamentally that's how
> it's working now. We can move to other things (e.g. RCU) whenever
> necessary in the future but it's an overkill if we have BQL anyway in
> most/all existing callers.
>
> When at it, update some comments, e.g. migrate_announce_params() is
While ?
> exported from options.c now.
>
> Cc: Cédric Le Goater <clg@redhat.com>
> Cc: Avihai Horon <avihaih@nvidia.com>
> Cc: Fabiano Rosas <farosas@suse.de>
> Cc: Dr. David Alan Gilbert <dave@treblig.org>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
* Re: [PATCH 2/2] migration: Unexport dirty_bitmap_mig_init() in misc.h
2024-10-22 16:07 ` [PATCH 2/2] migration: Unexport dirty_bitmap_mig_init() in misc.h Peter Xu
@ 2024-10-22 16:13 ` Cédric Le Goater
0 siblings, 0 replies; 9+ messages in thread
From: Cédric Le Goater @ 2024-10-22 16:13 UTC (permalink / raw)
To: Peter Xu, qemu-devel; +Cc: Avihai Horon, Alex Williamson, Fabiano Rosas
On 10/22/24 18:07, Peter Xu wrote:
> It's only used within migration/, so it shouldn't be exported.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
* Re: [PATCH 1/2] migration: Make all helpers in misc.h safe to use without migration
2024-10-22 16:11 ` Cédric Le Goater
@ 2024-10-22 21:52 ` Peter Xu
0 siblings, 0 replies; 9+ messages in thread
From: Peter Xu @ 2024-10-22 21:52 UTC (permalink / raw)
To: Cédric Le Goater
Cc: qemu-devel, Avihai Horon, Alex Williamson, Fabiano Rosas,
Dr . David Alan Gilbert
On Tue, Oct 22, 2024 at 06:11:19PM +0200, Cédric Le Goater wrote:
> On 10/22/24 18:07, Peter Xu wrote:
> > Migration object can be freed before some other device codes run, while we
> > do have a bunch of migration helpers exported in migration/misc.h that
> > logically can be invoked at any time of QEMU, even during destruction of a
> > VM.
> >
> > Make all these functions safe to be called, especially, not crashing after
> > the migration object is freed.
> >
> > Add a rich comment in the header explaining how to guarantee thread safe on
> > using these functions, and we choose BQL because fundamentally that's how
> > it's working now. We can move to other things (e.g. RCU) whenever
> > necessary in the future but it's an overkill if we have BQL anyway in
> > most/all existing callers.
> >
> > When at it, update some comments, e.g. migrate_announce_params() is
>
> While ?
Will fix. I'll wait for a while and see whether I should repost or just
fix it up when queueing.
>
> > exported from options.c now.
> >
> > Cc: Cédric Le Goater <clg@redhat.com>
> > Cc: Avihai Horon <avihaih@nvidia.com>
> > Cc: Fabiano Rosas <farosas@suse.de>
> > Cc: Dr. David Alan Gilbert <dave@treblig.org>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
>
>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks!
--
Peter Xu
* Re: [PATCH 1/2] migration: Make all helpers in misc.h safe to use without migration
2024-10-22 16:07 ` [PATCH 1/2] migration: Make all helpers in misc.h safe to use without migration Peter Xu
2024-10-22 16:11 ` Cédric Le Goater
@ 2024-10-23 8:30 ` Avihai Horon
2024-10-23 15:14 ` Peter Xu
1 sibling, 1 reply; 9+ messages in thread
From: Avihai Horon @ 2024-10-23 8:30 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Alex Williamson, Fabiano Rosas, Cédric Le Goater,
Dr . David Alan Gilbert
On 22/10/2024 19:07, Peter Xu wrote:
> Migration object can be freed before some other device codes run, while we
> do have a bunch of migration helpers exported in migration/misc.h that
> logically can be invoked at any time of QEMU, even during destruction of a
> VM.
>
> Make all these functions safe to be called, especially, not crashing after
> the migration object is freed.
>
> Add a rich comment in the header explaining how to guarantee thread safe on
> using these functions, and we choose BQL because fundamentally that's how
> it's working now. We can move to other things (e.g. RCU) whenever
> necessary in the future but it's an overkill if we have BQL anyway in
> most/all existing callers.
>
> When at it, update some comments, e.g. migrate_announce_params() is
> exported from options.c now.
>
> Cc: Cédric Le Goater <clg@redhat.com>
> Cc: Avihai Horon <avihaih@nvidia.com>
> Cc: Fabiano Rosas <farosas@suse.de>
> Cc: Dr. David Alan Gilbert <dave@treblig.org>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> include/migration/misc.h | 33 ++++++++++++++++++++++++++++-----
> migration/migration.c | 22 +++++++++++++++++++++-
> 2 files changed, 49 insertions(+), 6 deletions(-)
>
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index bfadc5613b..8d6812b8c7 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -19,8 +19,26 @@
> #include "qapi/qapi-types-net.h"
> #include "migration/client-options.h"
>
> -/* migration/ram.c */
> +/*
> + * Misc migration functions exported to be used in QEMU generic system
> + * code outside migration/.
> + *
> + * By default, BQL is required to use below functions to avoid race
> + * conditions (e.g. concurrent free of the migration object). It's
> + * caller's responsibility to make sure it's thread safe otherwise when
> + * below helpers are used without BQL held.
> + *
> + * One example of the special case is migration_thread(), who will take a
> + * refcount of the migration object. The refcount will make sure the
> + * migration object will not be freed concurrently when accessing through
> + * below helpers.
> + *
> + * When unsure, always take BQL first before using the helpers.
> + */
>
> +/*
> + * migration/ram.c
> + */
> typedef enum PrecopyNotifyReason {
> PRECOPY_NOTIFY_SETUP = 0,
> PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC = 1,
> @@ -43,14 +61,19 @@ void ram_mig_init(void);
> void qemu_guest_free_page_hint(void *addr, size_t len);
> bool migrate_ram_is_ignored(RAMBlock *block);
>
> -/* migration/block.c */
> -
> +/*
> + * migration/options.c
> + */
> AnnounceParameters *migrate_announce_params(void);
> -/* migration/savevm.c */
>
> +/*
> + * migration/savevm.c
> + */
> void dump_vmstate_json_to_file(FILE *out_fp);
>
> -/* migration/migration.c */
> +/*
> + * migration/migration.c
> + */
> void migration_object_init(void);
> void migration_shutdown(void);
> bool migration_is_idle(void);
> diff --git a/migration/migration.c b/migration/migration.c
> index bcb735869b..27341eed50 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1121,6 +1121,10 @@ bool migration_is_setup_or_active(void)
> {
> MigrationState *s = current_migration;
>
> + if (!s) {
> + return false;
> + }
> +
> switch (s->state) {
> case MIGRATION_STATUS_ACTIVE:
> case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> @@ -1136,7 +1140,6 @@ bool migration_is_setup_or_active(void)
>
> default:
> return false;
> -
> }
> }
>
> @@ -1685,6 +1688,10 @@ bool migration_is_active(void)
> {
> MigrationState *s = current_migration;
>
> + if (!s) {
> + return false;
> + }
> +
> return (s->state == MIGRATION_STATUS_ACTIVE ||
> s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> }
> @@ -1693,6 +1700,10 @@ bool migration_is_device(void)
> {
> MigrationState *s = current_migration;
>
> + if (!s) {
> + return false;
> + }
> +
> return s->state == MIGRATION_STATUS_DEVICE;
> }
>
> @@ -1700,6 +1711,11 @@ bool migration_thread_is_self(void)
> {
> MigrationState *s = current_migration;
>
> + /* If no migration object, must not be the migration thread */
> + if (!s) {
> + return false;
> + }
> +
> return qemu_thread_is_self(&s->thread);
> }
>
> @@ -3077,6 +3093,10 @@ void migration_file_set_error(int ret, Error *err)
> {
> MigrationState *s = current_migration;
>
> + if (!s) {
> + return;
> + }
> +
I think this is not enough because current_migration is never set to
NULL after it's destroyed.
Can we add "current_migration = NULL;" to migration_instance_finalize()?
Thanks.
> WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> if (s->to_dst_file) {
> qemu_file_set_error_obj(s->to_dst_file, ret, err);
> --
> 2.45.0
>
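The concern raised above can be illustrated with a small single-threaded sketch: a `if (!s)` guard is only useful if the global pointer is reset when the object goes away. The names here (`MigState`, `current`, `mig_init()`, `mig_finalize()`, `mig_is_device()`) are hypothetical stand-ins rather than QEMU code, and the sketch ignores threading entirely.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the migration object. */
typedef struct { int state; } MigState;

static MigState *current;   /* plays current_migration */

static void mig_init(void)
{
    current = calloc(1, sizeof(*current));
}

/* Finalizer in the style suggested in the review: free, then clear. */
static void mig_finalize(void)
{
    free(current);
    current = NULL;   /* without this line the guard below never fires */
}

static bool mig_is_device(void)
{
    MigState *s = current;

    if (!s) {
        return false;   /* object already gone */
    }
    return s->state == 1;   /* 1 plays MIGRATION_STATUS_DEVICE */
}
```

If `mig_finalize()` skipped the assignment, `current` would keep pointing at freed memory and `mig_is_device()` would dereference it despite the guard.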
* Re: [PATCH 1/2] migration: Make all helpers in misc.h safe to use without migration
2024-10-23 8:30 ` Avihai Horon
@ 2024-10-23 15:14 ` Peter Xu
2024-10-23 15:25 ` Peter Xu
0 siblings, 1 reply; 9+ messages in thread
From: Peter Xu @ 2024-10-23 15:14 UTC (permalink / raw)
To: Avihai Horon
Cc: qemu-devel, Alex Williamson, Fabiano Rosas, Cédric Le Goater,
Dr . David Alan Gilbert
On Wed, Oct 23, 2024 at 11:30:14AM +0300, Avihai Horon wrote:
>
> On 22/10/2024 19:07, Peter Xu wrote:
> > Migration object can be freed before some other device codes run, while we
> > do have a bunch of migration helpers exported in migration/misc.h that
> > logically can be invoked at any time of QEMU, even during destruction of a
> > VM.
> >
> > Make all these functions safe to be called, especially, not crashing after
> > the migration object is freed.
> >
> > Add a rich comment in the header explaining how to guarantee thread safe on
> > using these functions, and we choose BQL because fundamentally that's how
> > it's working now. We can move to other things (e.g. RCU) whenever
> > necessary in the future but it's an overkill if we have BQL anyway in
> > most/all existing callers.
> >
> > When at it, update some comments, e.g. migrate_announce_params() is
> > exported from options.c now.
> >
> > Cc: Cédric Le Goater <clg@redhat.com>
> > Cc: Avihai Horon <avihaih@nvidia.com>
> > Cc: Fabiano Rosas <farosas@suse.de>
> > Cc: Dr. David Alan Gilbert <dave@treblig.org>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > include/migration/misc.h | 33 ++++++++++++++++++++++++++++-----
> > migration/migration.c | 22 +++++++++++++++++++++-
> > 2 files changed, 49 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/migration/misc.h b/include/migration/misc.h
> > index bfadc5613b..8d6812b8c7 100644
> > --- a/include/migration/misc.h
> > +++ b/include/migration/misc.h
> > @@ -19,8 +19,26 @@
> > #include "qapi/qapi-types-net.h"
> > #include "migration/client-options.h"
> >
> > -/* migration/ram.c */
> > +/*
> > + * Misc migration functions exported to be used in QEMU generic system
> > + * code outside migration/.
> > + *
> > + * By default, BQL is required to use below functions to avoid race
> > + * conditions (e.g. concurrent free of the migration object). It's
> > + * caller's responsibility to make sure it's thread safe otherwise when
> > + * below helpers are used without BQL held.
> > + *
> > + * One example of the special case is migration_thread(), who will take a
> > + * refcount of the migration object. The refcount will make sure the
> > + * migration object will not be freed concurrently when accessing through
> > + * below helpers.
> > + *
> > + * When unsure, always take BQL first before using the helpers.
> > + */
> >
> > +/*
> > + * migration/ram.c
> > + */
> > typedef enum PrecopyNotifyReason {
> > PRECOPY_NOTIFY_SETUP = 0,
> > PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC = 1,
> > @@ -43,14 +61,19 @@ void ram_mig_init(void);
> > void qemu_guest_free_page_hint(void *addr, size_t len);
> > bool migrate_ram_is_ignored(RAMBlock *block);
> >
> > -/* migration/block.c */
> > -
> > +/*
> > + * migration/options.c
> > + */
> > AnnounceParameters *migrate_announce_params(void);
> > -/* migration/savevm.c */
> >
> > +/*
> > + * migration/savevm.c
> > + */
> > void dump_vmstate_json_to_file(FILE *out_fp);
> >
> > -/* migration/migration.c */
> > +/*
> > + * migration/migration.c
> > + */
> > void migration_object_init(void);
> > void migration_shutdown(void);
> > bool migration_is_idle(void);
> > diff --git a/migration/migration.c b/migration/migration.c
> > index bcb735869b..27341eed50 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1121,6 +1121,10 @@ bool migration_is_setup_or_active(void)
> > {
> > MigrationState *s = current_migration;
> >
> > + if (!s) {
> > + return false;
> > + }
> > +
> > switch (s->state) {
> > case MIGRATION_STATUS_ACTIVE:
> > case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> > @@ -1136,7 +1140,6 @@ bool migration_is_setup_or_active(void)
> >
> > default:
> > return false;
> > -
> > }
> > }
> >
> > @@ -1685,6 +1688,10 @@ bool migration_is_active(void)
> > {
> > MigrationState *s = current_migration;
> >
> > + if (!s) {
> > + return false;
> > + }
> > +
> > return (s->state == MIGRATION_STATUS_ACTIVE ||
> > s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > }
> > @@ -1693,6 +1700,10 @@ bool migration_is_device(void)
> > {
> > MigrationState *s = current_migration;
> >
> > + if (!s) {
> > + return false;
> > + }
> > +
> > return s->state == MIGRATION_STATUS_DEVICE;
> > }
> >
> > @@ -1700,6 +1711,11 @@ bool migration_thread_is_self(void)
> > {
> > MigrationState *s = current_migration;
> >
> > + /* If no migration object, must not be the migration thread */
> > + if (!s) {
> > + return false;
> > + }
> > +
> > return qemu_thread_is_self(&s->thread);
> > }
> >
> > @@ -3077,6 +3093,10 @@ void migration_file_set_error(int ret, Error *err)
> > {
> > MigrationState *s = current_migration;
> >
> > + if (!s) {
> > + return;
> > + }
> > +
>
> I think this is not enough because current_migration is never set to NULL
> after it's destroyed.
>
> Can we add "current_migration = NULL;" to migration_instance_finalize()?
Good point.
I thought it was already cleared in migration_shutdown(), but now I just
noticed why it can't be: we have too many dangling references in migration/,
so it needs to be there even if the migration thread holds one refcount.
I'll add one more patch as you suggested for now and repost soon. It's
not the cleanest to clear that global pointer in a finalize(), but
it looks like that's the only simple way forward.
Will repost soon, thanks.
--
Peter Xu
* Re: [PATCH 1/2] migration: Make all helpers in misc.h safe to use without migration
2024-10-23 15:14 ` Peter Xu
@ 2024-10-23 15:25 ` Peter Xu
0 siblings, 0 replies; 9+ messages in thread
From: Peter Xu @ 2024-10-23 15:25 UTC (permalink / raw)
To: Avihai Horon
Cc: qemu-devel, Alex Williamson, Fabiano Rosas, Cédric Le Goater,
Dr . David Alan Gilbert
On Wed, Oct 23, 2024 at 11:14:24AM -0400, Peter Xu wrote:
> On Wed, Oct 23, 2024 at 11:30:14AM +0300, Avihai Horon wrote:
> >
> > On 22/10/2024 19:07, Peter Xu wrote:
> > > Migration object can be freed before some other device codes run, while we
> > > do have a bunch of migration helpers exported in migration/misc.h that
> > > logically can be invoked at any time of QEMU, even during destruction of a
> > > VM.
> > >
> > > Make all these functions safe to be called, especially, not crashing after
> > > the migration object is freed.
> > >
> > > Add a rich comment in the header explaining how to guarantee thread safe on
> > > using these functions, and we choose BQL because fundamentally that's how
> > > it's working now. We can move to other things (e.g. RCU) whenever
> > > necessary in the future but it's an overkill if we have BQL anyway in
> > > most/all existing callers.
> > >
> > > When at it, update some comments, e.g. migrate_announce_params() is
> > > exported from options.c now.
> > >
> > > Cc: Cédric Le Goater <clg@redhat.com>
> > > Cc: Avihai Horon <avihaih@nvidia.com>
> > > Cc: Fabiano Rosas <farosas@suse.de>
> > > Cc: Dr. David Alan Gilbert <dave@treblig.org>
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > > include/migration/misc.h | 33 ++++++++++++++++++++++++++++-----
> > > migration/migration.c | 22 +++++++++++++++++++++-
> > > 2 files changed, 49 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/include/migration/misc.h b/include/migration/misc.h
> > > index bfadc5613b..8d6812b8c7 100644
> > > --- a/include/migration/misc.h
> > > +++ b/include/migration/misc.h
> > > @@ -19,8 +19,26 @@
> > > #include "qapi/qapi-types-net.h"
> > > #include "migration/client-options.h"
> > >
> > > -/* migration/ram.c */
> > > +/*
> > > + * Misc migration functions exported to be used in QEMU generic system
> > > + * code outside migration/.
> > > + *
> > > + * By default, BQL is required to use below functions to avoid race
> > > + * conditions (e.g. concurrent free of the migration object). It's
> > > + * caller's responsibility to make sure it's thread safe otherwise when
> > > + * below helpers are used without BQL held.
> > > + *
> > > + * One example of the special case is migration_thread(), who will take a
> > > + * refcount of the migration object. The refcount will make sure the
> > > + * migration object will not be freed concurrently when accessing through
> > > + * below helpers.
> > > + *
> > > + * When unsure, always take BQL first before using the helpers.
> > > + */
> > >
> > > +/*
> > > + * migration/ram.c
> > > + */
> > > typedef enum PrecopyNotifyReason {
> > > PRECOPY_NOTIFY_SETUP = 0,
> > > PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC = 1,
> > > @@ -43,14 +61,19 @@ void ram_mig_init(void);
> > > void qemu_guest_free_page_hint(void *addr, size_t len);
> > > bool migrate_ram_is_ignored(RAMBlock *block);
> > >
> > > -/* migration/block.c */
> > > -
> > > +/*
> > > + * migration/options.c
> > > + */
> > > AnnounceParameters *migrate_announce_params(void);
> > > -/* migration/savevm.c */
> > >
> > > +/*
> > > + * migration/savevm.c
> > > + */
> > > void dump_vmstate_json_to_file(FILE *out_fp);
> > >
> > > -/* migration/migration.c */
> > > +/*
> > > + * migration/migration.c
> > > + */
> > > void migration_object_init(void);
> > > void migration_shutdown(void);
> > > bool migration_is_idle(void);
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index bcb735869b..27341eed50 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -1121,6 +1121,10 @@ bool migration_is_setup_or_active(void)
> > > {
> > > MigrationState *s = current_migration;
> > >
> > > + if (!s) {
> > > + return false;
> > > + }
> > > +
> > > switch (s->state) {
> > > case MIGRATION_STATUS_ACTIVE:
> > > case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> > > @@ -1136,7 +1140,6 @@ bool migration_is_setup_or_active(void)
> > >
> > > default:
> > > return false;
> > > -
> > > }
> > > }
> > >
> > > @@ -1685,6 +1688,10 @@ bool migration_is_active(void)
> > > {
> > > MigrationState *s = current_migration;
> > >
> > > + if (!s) {
> > > + return false;
> > > + }
> > > +
> > > return (s->state == MIGRATION_STATUS_ACTIVE ||
> > > s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > }
> > > @@ -1693,6 +1700,10 @@ bool migration_is_device(void)
> > > {
> > > MigrationState *s = current_migration;
> > >
> > > + if (!s) {
> > > + return false;
> > > + }
> > > +
> > > return s->state == MIGRATION_STATUS_DEVICE;
> > > }
> > >
> > > @@ -1700,6 +1711,11 @@ bool migration_thread_is_self(void)
> > > {
> > > MigrationState *s = current_migration;
> > >
> > > + /* If no migration object, must not be the migration thread */
> > > + if (!s) {
> > > + return false;
> > > + }
> > > +
> > > return qemu_thread_is_self(&s->thread);
> > > }
> > >
> > > @@ -3077,6 +3093,10 @@ void migration_file_set_error(int ret, Error *err)
> > > {
> > > MigrationState *s = current_migration;
> > >
> > > + if (!s) {
> > > + return;
> > > + }
> > > +
> >
> > I think this is not enough because current_migration is never set to NULL
> > after it's destroyed.
> >
> > Can we add "current_migration = NULL;" to migration_instance_finalize()?
>
> Good point..
>
> I thought it was cleared already in migration_shutdown(), but now I just
> noticed why it can't - we have too many dangling references in migration/
> so that it needs to be there even if migration threads holds one refcount.
>
> I'll add one more patch as you suggested as of now and repost soon. It's
> not the cleanest that we clear that global pointer in a finalize(), but
> looks like that's the only simple way forward.
On second thought, it might not work. The issue is that when the last
refcount is held by migration_thread, I _think_ it means the migration
thread will invoke this finalize() and reset current_migration, which can
then race with a main thread calling any of the exported helpers again.
I'll think about it and prepare something else. Probably we'll need two
variables holding the object for now, while the exported functions should
only reference the global one. Then the global one can always be reset in
the main thread, with no race possible on exit.
--
Peter Xu
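One possible reading of the two-variable idea is sketched below. It is only an illustration under assumptions, not the eventual patch: `internal_mig`, `exported_mig`, `mig_ref()`, `mig_unref()`, `mig_object_init()`, `mig_shutdown()` and `mig_is_active()` are all hypothetical names, a pthread mutex plays the BQL, and a C11 atomic counter stands in for the object refcount. The exported helper reads only `exported_mig`, which is cleared by the main thread under the lock, so the final unref may happen in any thread without touching that pointer.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in QEMU. */
typedef struct {
    atomic_int refcount;
    int active;
} MigState;

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER; /* plays the BQL */
static MigState *internal_mig;  /* for internal code that holds its own ref */
static MigState *exported_mig;  /* for exported helpers; big_lock held only */

static MigState *mig_ref(MigState *s)
{
    atomic_fetch_add(&s->refcount, 1);
    return s;
}

/* May run in any thread; never touches exported_mig. */
static void mig_unref(MigState *s)
{
    if (atomic_fetch_sub(&s->refcount, 1) == 1) {
        free(s);   /* last reference dropped */
    }
}

/* Main thread, big_lock held. */
static void mig_object_init(void)
{
    MigState *s = calloc(1, sizeof(*s));

    if (!s) {
        abort();
    }
    atomic_init(&s->refcount, 1);
    internal_mig = s;
    exported_mig = s;
}

/* Main thread, big_lock held: hide the object, then drop our reference. */
static void mig_shutdown(void)
{
    MigState *s = exported_mig;

    exported_mig = NULL;
    if (s) {
        mig_unref(s);   /* internal_mig is deliberately left alone here */
    }
}

/* Exported-style helper, big_lock held. */
static bool mig_is_active(void)
{
    MigState *s = exported_mig;

    return s && s->active;
}
```

In this sketch a worker thread that took its own reference with `mig_ref()` can keep using the object after `mig_shutdown()`, while callers of `mig_is_active()` see NULL from that point on.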