* [PATCH v5 0/5] migration: do not exit on incoming failure
@ 2024-04-29 19:14 Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error Vladimir Sementsov-Ogievskiy
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw)
To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core
Hi all!
The series brings an option to not immediately exit on incoming
migration failure, giving a possibility to orchestrator to get the error
through QAPI and shutdown QEMU by "quit".
v5:
- add "migration: process_incoming_migration_co(): fix reporting s->error"
v4:
- add r-b and a-b by Fabiano and Markus
- improve wording in 04 as Markus suggested
v3:
- don't refactor the whole code around setting migration error, it seems
too much and necessary for the new feature itself
- add constant
- change behavior for HMP command
- split some things to separate patches
- and more, by Peter's suggestions
New behavior can be demonstrated like this:
bash:
(
cat <<EOF
{'execute': 'qmp_capabilities'}
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [{'capability': 'events', 'state': true}]}}
{'execute': 'migrate-incoming', 'arguments': {'uri': 'exec:echo x', 'exit-on-error': false}}
EOF
sleep 1
cat <<EOF
{'execute': 'query-migrate'}
{'execute': 'quit'}
EOF
) | ./build/qemu-system-x86_64 -incoming 'defer' -qmp stdio -nographic -nodefaults
output:
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 9}, "package": "v9.0.0-149-gb6295ad58c"}, "capabilities": ["oob"]}}
{"return": {}}
{"return": {}}
{"timestamp": {"seconds": 1714068847, "microseconds": 263907}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"timestamp": {"seconds": 1714068847, "microseconds": 266696}, "event": "MIGRATION", "data": {"status": "active"}}
qemu-system-x86_64: Not a migration stream
{"timestamp": {"seconds": 1714068847, "microseconds": 266766}, "event": "MIGRATION", "data": {"status": "failed"}}
{"return": {"status": "failed", "error-desc": "load of migration failed: Invalid argument"}}
{"timestamp": {"seconds": 1714068848, "microseconds": 237187}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}
Vladimir Sementsov-Ogievskiy (5):
migration: move trace-point from migrate_fd_error to migrate_set_error
migration: process_incoming_migration_co(): complete cleanup on
failure
migration: process_incoming_migration_co(): fix reporting s->error
migration: process_incoming_migration_co(): rework error reporting
qapi: introduce exit-on-error parameter for migrate-incoming
migration/migration-hmp-cmds.c | 2 +-
migration/migration.c | 76 +++++++++++++++++++++++-----------
migration/migration.h | 3 ++
migration/trace-events | 2 +-
qapi/migration.json | 7 +++-
system/vl.c | 3 +-
6 files changed, 64 insertions(+), 29 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error
2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure Vladimir Sementsov-Ogievskiy
` (3 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw)
To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core
Cover more cases by trace-point.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 4 +++-
migration/trace-events | 2 +-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index b5af6b5105..2dc6a063e9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1421,6 +1421,9 @@ static void migrate_fd_cleanup_bh(void *opaque)
void migrate_set_error(MigrationState *s, const Error *error)
{
QEMU_LOCK_GUARD(&s->error_mutex);
+
+ trace_migrate_error(error_get_pretty(error));
+
if (!s->error) {
s->error = error_copy(error);
}
@@ -1444,7 +1447,6 @@ static void migrate_error_free(MigrationState *s)
static void migrate_fd_error(MigrationState *s, const Error *error)
{
- trace_migrate_fd_error(error_get_pretty(error));
assert(s->to_dst_file == NULL);
migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
MIGRATION_STATUS_FAILED);
diff --git a/migration/trace-events b/migration/trace-events
index f0e1cb80c7..d0c44c3853 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -152,7 +152,7 @@ multifd_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostnam
# migration.c
migrate_set_state(const char *new_state) "new state %s"
migrate_fd_cleanup(void) ""
-migrate_fd_error(const char *error_desc) "error=%s"
+migrate_error(const char *error_desc) "error=%s"
migrate_fd_cancel(void) ""
migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
migrate_pending_exact(uint64_t size, uint64_t pre, uint64_t post) "exact pending size %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure
2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy
2024-04-29 19:22 ` Peter Xu
2024-04-29 19:14 ` [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error Vladimir Sementsov-Ogievskiy
` (2 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw)
To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core
Make call to migration_incoming_state_destroy(), instead of doing only
partial of it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 2dc6a063e9..0d26db47f7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -799,10 +799,7 @@ process_incoming_migration_co(void *opaque)
fail:
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_FAILED);
- qemu_fclose(mis->from_src_file);
-
- multifd_recv_cleanup();
- compress_threads_load_cleanup();
+ migration_incoming_state_destroy();
exit(EXIT_FAILURE);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy
2024-04-29 19:32 ` Peter Xu
2024-04-29 19:14 ` [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming Vladimir Sementsov-Ogievskiy
4 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw)
To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core
It's bad idea to leave critical section with error object freed, but
s->error still set, this theoretically may lead to use-after-free
crash. Let's avoid it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
migration/migration.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 0d26db47f7..58fd5819bc 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque)
migration_incoming_state_destroy();
}
+static void migrate_error_free(MigrationState *s)
+{
+ QEMU_LOCK_GUARD(&s->error_mutex);
+ if (s->error) {
+ error_free(s->error);
+ s->error = NULL;
+ }
+}
+
static void coroutine_fn
process_incoming_migration_co(void *opaque)
{
+ MigrationState *s = migrate_get_current();
MigrationIncomingState *mis = migration_incoming_get_current();
PostcopyState ps;
int ret;
@@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque)
}
if (ret < 0) {
- MigrationState *s = migrate_get_current();
-
if (migrate_has_error(s)) {
WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
- error_report_err(s->error);
+ error_report_err(error_copy(s->error));
}
}
error_report("load of migration failed: %s", strerror(-ret));
@@ -801,6 +809,7 @@ fail:
MIGRATION_STATUS_FAILED);
migration_incoming_state_destroy();
+ migrate_error_free(s);
exit(EXIT_FAILURE);
}
@@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s)
return qatomic_read(&s->error);
}
-static void migrate_error_free(MigrationState *s)
-{
- QEMU_LOCK_GUARD(&s->error_mutex);
- if (s->error) {
- error_free(s->error);
- s->error = NULL;
- }
-}
-
static void migrate_fd_error(MigrationState *s, const Error *error)
{
assert(s->to_dst_file == NULL);
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting
2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
` (2 preceding siblings ...)
2024-04-29 19:14 ` [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy
2024-04-29 19:39 ` Peter Xu
2024-04-29 19:14 ` [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming Vladimir Sementsov-Ogievskiy
4 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw)
To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core
Unify error reporting in the function. This simplifies the following
commit, which will not-exit-on-error behavior variant to the function.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 58fd5819bc..5489ff96df 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -748,11 +748,12 @@ process_incoming_migration_co(void *opaque)
MigrationIncomingState *mis = migration_incoming_get_current();
PostcopyState ps;
int ret;
+ Error *local_err = NULL;
assert(mis->from_src_file);
if (compress_threads_load_setup(mis->from_src_file)) {
- error_report("Failed to setup decompress threads");
+ error_setg(&local_err, "Failed to setup decompress threads");
goto fail;
}
@@ -789,16 +790,12 @@ process_incoming_migration_co(void *opaque)
}
if (ret < 0) {
- if (migrate_has_error(s)) {
- WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
- error_report_err(error_copy(s->error));
- }
- }
- error_report("load of migration failed: %s", strerror(-ret));
+ error_setg(&local_err, "load of migration failed: %s", strerror(-ret));
goto fail;
}
if (colo_incoming_co() < 0) {
+ error_setg(&local_err, "colo incoming failed");
goto fail;
}
@@ -809,6 +806,12 @@ fail:
MIGRATION_STATUS_FAILED);
migration_incoming_state_destroy();
+ if (migrate_has_error(s)) {
+ WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
+ error_report_err(error_copy(s->error));
+ }
+ }
+ error_report_err(local_err);
migrate_error_free(s);
exit(EXIT_FAILURE);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming
2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
` (3 preceding siblings ...)
2024-04-29 19:14 ` [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy
2024-04-29 19:41 ` Peter Xu
4 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw)
To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core
Now we do set MIGRATION_FAILED state, but don't give a chance to
orchestrator to query migration state and get the error.
Let's provide a possibility for QMP-based orchestrators to get an error
like with outgoing migration.
For hmp_migrate_incoming(), let's enable the new behavior: HMP is not
and ABI, it's mostly intended to use by developer and it makes sense
not to stop the process.
For x-exit-preconfig, let's keep the old behavior:
- it's called from init(), so here we want to keep current behavior by
default
- it does exit on error by itself as well
So, if we want to change the behavior of x-exit-preconfig, it should be
another patch.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Acked-by: Markus Armbruster <armbru@redhat.com>
---
migration/migration-hmp-cmds.c | 2 +-
migration/migration.c | 38 +++++++++++++++++++++++++++-------
migration/migration.h | 3 +++
qapi/migration.json | 7 ++++++-
system/vl.c | 3 ++-
5 files changed, 43 insertions(+), 10 deletions(-)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 7e96ae6ffd..23181bbee1 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -466,7 +466,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict)
}
QAPI_LIST_PREPEND(caps, g_steal_pointer(&channel));
- qmp_migrate_incoming(NULL, true, caps, &err);
+ qmp_migrate_incoming(NULL, true, caps, true, false, &err);
qapi_free_MigrationChannelList(caps);
end:
diff --git a/migration/migration.c b/migration/migration.c
index 5489ff96df..09f762c99e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -72,6 +72,8 @@
#define NOTIFIER_ELEM_INIT(array, elem) \
[elem] = NOTIFIER_WITH_RETURN_LIST_INITIALIZER((array)[elem])
+#define INMIGRATE_DEFAULT_EXIT_ON_ERROR true
+
static NotifierWithReturnList migration_state_notifiers[] = {
NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL),
NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT),
@@ -234,6 +236,8 @@ void migration_object_init(void)
qemu_cond_init(¤t_incoming->page_request_cond);
current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
+ current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR;
+
migration_object_check(current_migration, &error_fatal);
blk_mig_init();
@@ -806,14 +810,19 @@ fail:
MIGRATION_STATUS_FAILED);
migration_incoming_state_destroy();
- if (migrate_has_error(s)) {
- WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
- error_report_err(error_copy(s->error));
+ if (mis->exit_on_error) {
+ if (migrate_has_error(s)) {
+ WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
+ error_report_err(error_copy(s->error));
+ }
}
+ error_report_err(local_err);
+ migrate_error_free(s);
+ exit(EXIT_FAILURE);
+ } else {
+ migrate_set_error(s, local_err);
+ error_free(local_err);
}
- error_report_err(local_err);
- migrate_error_free(s);
- exit(EXIT_FAILURE);
}
/**
@@ -1322,6 +1331,15 @@ static void fill_destination_migration_info(MigrationInfo *info)
break;
}
info->status = mis->state;
+
+ if (!info->error_desc) {
+ MigrationState *s = migrate_get_current();
+ QEMU_LOCK_GUARD(&s->error_mutex);
+
+ if (s->error) {
+ info->error_desc = g_strdup(error_get_pretty(s->error));
+ }
+ }
}
MigrationInfo *qmp_query_migrate(Error **errp)
@@ -1796,10 +1814,13 @@ void migrate_del_blocker(Error **reasonp)
}
void qmp_migrate_incoming(const char *uri, bool has_channels,
- MigrationChannelList *channels, Error **errp)
+ MigrationChannelList *channels,
+ bool has_exit_on_error, bool exit_on_error,
+ Error **errp)
{
Error *local_err = NULL;
static bool once = true;
+ MigrationIncomingState *mis = migration_incoming_get_current();
if (!once) {
error_setg(errp, "The incoming migration has already been started");
@@ -1814,6 +1835,9 @@ void qmp_migrate_incoming(const char *uri, bool has_channels,
return;
}
+ mis->exit_on_error =
+ has_exit_on_error ? exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR;
+
qemu_start_incoming_migration(uri, has_channels, channels, &local_err);
if (local_err) {
diff --git a/migration/migration.h b/migration/migration.h
index 8045e39c26..95995a818e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -227,6 +227,9 @@ struct MigrationIncomingState {
* is needed as this field is updated serially.
*/
unsigned int switchover_ack_pending_num;
+
+ /* Do exit on incoming migration failure */
+ bool exit_on_error;
};
MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/qapi/migration.json b/qapi/migration.json
index 8c65b90328..9feed413b5 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1837,6 +1837,10 @@
# @channels: list of migration stream channels with each stream in the
# list connected to a destination interface endpoint.
#
+# @exit-on-error: Exit on incoming migration failure. Default true.
+# When set to false, the failure triggers a MIGRATION event, and
+# error details could be retrieved with query-migrate. (since 9.1)
+#
# Since: 2.3
#
# Notes:
@@ -1889,7 +1893,8 @@
##
{ 'command': 'migrate-incoming',
'data': {'*uri': 'str',
- '*channels': [ 'MigrationChannel' ] } }
+ '*channels': [ 'MigrationChannel' ],
+ '*exit-on-error': 'bool' } }
##
# @xen-save-devices-state:
diff --git a/system/vl.c b/system/vl.c
index 7756eac81e..79cd498395 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -2723,7 +2723,8 @@ void qmp_x_exit_preconfig(Error **errp)
if (incoming) {
Error *local_err = NULL;
if (strcmp(incoming, "defer") != 0) {
- qmp_migrate_incoming(incoming, false, NULL, &local_err);
+ qmp_migrate_incoming(incoming, false, NULL, true, true,
+ &local_err);
if (local_err) {
error_reportf_err(local_err, "-incoming %s: ", incoming);
exit(1);
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure
2024-04-29 19:14 ` [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:22 ` Peter Xu
0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2024-04-29 19:22 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core
On Mon, Apr 29, 2024 at 10:14:23PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Make call to migration_incoming_state_destroy(), instead of doing only
> partial of it.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
2024-04-29 19:14 ` [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:32 ` Peter Xu
2024-04-30 8:06 ` Vladimir Sementsov-Ogievskiy
0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2024-04-29 19:32 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core
On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> It's bad idea to leave critical section with error object freed, but
> s->error still set, this theoretically may lead to use-after-free
> crash. Let's avoid it.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
> migration/migration.c | 24 ++++++++++++------------
> 1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 0d26db47f7..58fd5819bc 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque)
> migration_incoming_state_destroy();
> }
>
> +static void migrate_error_free(MigrationState *s)
> +{
> + QEMU_LOCK_GUARD(&s->error_mutex);
> + if (s->error) {
> + error_free(s->error);
> + s->error = NULL;
> + }
> +}
> +
> static void coroutine_fn
> process_incoming_migration_co(void *opaque)
> {
> + MigrationState *s = migrate_get_current();
> MigrationIncomingState *mis = migration_incoming_get_current();
> PostcopyState ps;
> int ret;
> @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque)
> }
>
> if (ret < 0) {
> - MigrationState *s = migrate_get_current();
> -
> if (migrate_has_error(s)) {
> WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> - error_report_err(s->error);
> + error_report_err(error_copy(s->error));
This looks like a bugfix, agreed.
> }
> }
> error_report("load of migration failed: %s", strerror(-ret));
> @@ -801,6 +809,7 @@ fail:
> MIGRATION_STATUS_FAILED);
> migration_incoming_state_destroy();
>
> + migrate_error_free(s);
Would migration_incoming_state_destroy() be a better place?
One thing weird is we actually reuses MigrationState*'s error for incoming
too, but so far it looks ok as long as QEMU can't be both src & dst. Then
calling migrate_error_free even in incoming_state_destroy() looks ok.
This patch still looks like containing two changes. Better split them (or
just fix the bug only)?
Thanks,
> exit(EXIT_FAILURE);
> }
>
> @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s)
> return qatomic_read(&s->error);
> }
>
> -static void migrate_error_free(MigrationState *s)
> -{
> - QEMU_LOCK_GUARD(&s->error_mutex);
> - if (s->error) {
> - error_free(s->error);
> - s->error = NULL;
> - }
> -}
> -
> static void migrate_fd_error(MigrationState *s, const Error *error)
> {
> assert(s->to_dst_file == NULL);
> --
> 2.34.1
>
--
Peter Xu
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting
2024-04-29 19:14 ` [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:39 ` Peter Xu
2024-04-30 7:12 ` Vladimir Sementsov-Ogievskiy
0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2024-04-29 19:39 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core
On Mon, Apr 29, 2024 at 10:14:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Unify error reporting in the function. This simplifies the following
> commit, which will not-exit-on-error behavior variant to the function.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> ---
> migration/migration.c | 17 ++++++++++-------
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 58fd5819bc..5489ff96df 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -748,11 +748,12 @@ process_incoming_migration_co(void *opaque)
> MigrationIncomingState *mis = migration_incoming_get_current();
> PostcopyState ps;
> int ret;
> + Error *local_err = NULL;
>
> assert(mis->from_src_file);
>
> if (compress_threads_load_setup(mis->from_src_file)) {
> - error_report("Failed to setup decompress threads");
> + error_setg(&local_err, "Failed to setup decompress threads");
> goto fail;
> }
>
> @@ -789,16 +790,12 @@ process_incoming_migration_co(void *opaque)
> }
>
> if (ret < 0) {
> - if (migrate_has_error(s)) {
> - WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> - error_report_err(error_copy(s->error));
> - }
> - }
> - error_report("load of migration failed: %s", strerror(-ret));
> + error_setg(&local_err, "load of migration failed: %s", strerror(-ret));
> goto fail;
> }
>
> if (colo_incoming_co() < 0) {
> + error_setg(&local_err, "colo incoming failed");
> goto fail;
> }
>
> @@ -809,6 +806,12 @@ fail:
> MIGRATION_STATUS_FAILED);
> migration_incoming_state_destroy();
>
> + if (migrate_has_error(s)) {
> + WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> + error_report_err(error_copy(s->error));
> + }
> + }
> + error_report_err(local_err);
Here migrate_has_error() will always return true? If so we can drop it.
Meanwhile, IMHO it's easier we simply always keep the earliest error we see
and report that only, local_err is just one of the errors and whoever
reaches first will be reported. Something like:
migrate_set_error(local_err);
WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
error_report_err(error_copy(s->error));
}
exit(EXIT_FAILURE);
Then when with the exit-on-error thing:
migrate_set_error(local_err);
if (exit_on_error) {
WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
error_report_err(error_copy(s->error));
}
exit(EXIT_FAILURE);
}
Would this looks slightly cleaner?
--
Peter Xu
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming
2024-04-29 19:14 ` [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:41 ` Peter Xu
0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2024-04-29 19:41 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core
On Mon, Apr 29, 2024 at 10:14:26PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Now we do set MIGRATION_FAILED state, but don't give a chance to
> orchestrator to query migration state and get the error.
>
> Let's provide a possibility for QMP-based orchestrators to get an error
> like with outgoing migration.
>
> For hmp_migrate_incoming(), let's enable the new behavior: HMP is not
> and ABI, it's mostly intended to use by developer and it makes sense
> not to stop the process.
>
> For x-exit-preconfig, let's keep the old behavior:
> - it's called from init(), so here we want to keep current behavior by
> default
> - it does exit on error by itself as well
> So, if we want to change the behavior of x-exit-preconfig, it should be
> another patch.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> Acked-by: Markus Armbruster <armbru@redhat.com>
> ---
> migration/migration-hmp-cmds.c | 2 +-
> migration/migration.c | 38 +++++++++++++++++++++++++++-------
> migration/migration.h | 3 +++
> qapi/migration.json | 7 ++++++-
> system/vl.c | 3 ++-
> 5 files changed, 43 insertions(+), 10 deletions(-)
>
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 7e96ae6ffd..23181bbee1 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -466,7 +466,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict)
> }
> QAPI_LIST_PREPEND(caps, g_steal_pointer(&channel));
>
> - qmp_migrate_incoming(NULL, true, caps, &err);
> + qmp_migrate_incoming(NULL, true, caps, true, false, &err);
> qapi_free_MigrationChannelList(caps);
>
> end:
> diff --git a/migration/migration.c b/migration/migration.c
> index 5489ff96df..09f762c99e 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -72,6 +72,8 @@
> #define NOTIFIER_ELEM_INIT(array, elem) \
> [elem] = NOTIFIER_WITH_RETURN_LIST_INITIALIZER((array)[elem])
>
> +#define INMIGRATE_DEFAULT_EXIT_ON_ERROR true
> +
> static NotifierWithReturnList migration_state_notifiers[] = {
> NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL),
> NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT),
> @@ -234,6 +236,8 @@ void migration_object_init(void)
> qemu_cond_init(¤t_incoming->page_request_cond);
> current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
>
> + current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR;
> +
> migration_object_check(current_migration, &error_fatal);
>
> blk_mig_init();
> @@ -806,14 +810,19 @@ fail:
> MIGRATION_STATUS_FAILED);
> migration_incoming_state_destroy();
>
> - if (migrate_has_error(s)) {
> - WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> - error_report_err(error_copy(s->error));
> + if (mis->exit_on_error) {
> + if (migrate_has_error(s)) {
> + WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> + error_report_err(error_copy(s->error));
> + }
> }
> + error_report_err(local_err);
> + migrate_error_free(s);
> + exit(EXIT_FAILURE);
> + } else {
> + migrate_set_error(s, local_err);
> + error_free(local_err);
> }
> - error_report_err(local_err);
> - migrate_error_free(s);
> - exit(EXIT_FAILURE);
> }
(Please see my reply in previous patch)
>
> /**
> @@ -1322,6 +1331,15 @@ static void fill_destination_migration_info(MigrationInfo *info)
> break;
> }
> info->status = mis->state;
> +
> + if (!info->error_desc) {
This will also guarantee to happen? Can drop the check too if so.
Other than that looks good, thanks.
> + MigrationState *s = migrate_get_current();
> + QEMU_LOCK_GUARD(&s->error_mutex);
> +
> + if (s->error) {
> + info->error_desc = g_strdup(error_get_pretty(s->error));
> + }
> + }
> }
>
> MigrationInfo *qmp_query_migrate(Error **errp)
> @@ -1796,10 +1814,13 @@ void migrate_del_blocker(Error **reasonp)
> }
>
> void qmp_migrate_incoming(const char *uri, bool has_channels,
> - MigrationChannelList *channels, Error **errp)
> + MigrationChannelList *channels,
> + bool has_exit_on_error, bool exit_on_error,
> + Error **errp)
> {
> Error *local_err = NULL;
> static bool once = true;
> + MigrationIncomingState *mis = migration_incoming_get_current();
>
> if (!once) {
> error_setg(errp, "The incoming migration has already been started");
> @@ -1814,6 +1835,9 @@ void qmp_migrate_incoming(const char *uri, bool has_channels,
> return;
> }
>
> + mis->exit_on_error =
> + has_exit_on_error ? exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR;
> +
> qemu_start_incoming_migration(uri, has_channels, channels, &local_err);
>
> if (local_err) {
> diff --git a/migration/migration.h b/migration/migration.h
> index 8045e39c26..95995a818e 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -227,6 +227,9 @@ struct MigrationIncomingState {
> * is needed as this field is updated serially.
> */
> unsigned int switchover_ack_pending_num;
> +
> + /* Do exit on incoming migration failure */
> + bool exit_on_error;
> };
>
> MigrationIncomingState *migration_incoming_get_current(void);
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8c65b90328..9feed413b5 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1837,6 +1837,10 @@
> # @channels: list of migration stream channels with each stream in the
> # list connected to a destination interface endpoint.
> #
> +# @exit-on-error: Exit on incoming migration failure. Default true.
> +# When set to false, the failure triggers a MIGRATION event, and
> +# error details could be retrieved with query-migrate. (since 9.1)
> +#
> # Since: 2.3
> #
> # Notes:
> @@ -1889,7 +1893,8 @@
> ##
> { 'command': 'migrate-incoming',
> 'data': {'*uri': 'str',
> - '*channels': [ 'MigrationChannel' ] } }
> + '*channels': [ 'MigrationChannel' ],
> + '*exit-on-error': 'bool' } }
>
> ##
> # @xen-save-devices-state:
> diff --git a/system/vl.c b/system/vl.c
> index 7756eac81e..79cd498395 100644
> --- a/system/vl.c
> +++ b/system/vl.c
> @@ -2723,7 +2723,8 @@ void qmp_x_exit_preconfig(Error **errp)
> if (incoming) {
> Error *local_err = NULL;
> if (strcmp(incoming, "defer") != 0) {
> - qmp_migrate_incoming(incoming, false, NULL, &local_err);
> + qmp_migrate_incoming(incoming, false, NULL, true, true,
> + &local_err);
> if (local_err) {
> error_reportf_err(local_err, "-incoming %s: ", incoming);
> exit(1);
> --
> 2.34.1
>
--
Peter Xu
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting
2024-04-29 19:39 ` Peter Xu
@ 2024-04-30 7:12 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-30 7:12 UTC (permalink / raw)
To: Peter Xu; +Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core
On 29.04.24 22:39, Peter Xu wrote:
> On Mon, Apr 29, 2024 at 10:14:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Unify error reporting in the function. This simplifies the following
>> commit, which will not-exit-on-error behavior variant to the function.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> Reviewed-by: Fabiano Rosas <farosas@suse.de>
>> ---
>> migration/migration.c | 17 ++++++++++-------
>> 1 file changed, 10 insertions(+), 7 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 58fd5819bc..5489ff96df 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -748,11 +748,12 @@ process_incoming_migration_co(void *opaque)
>> MigrationIncomingState *mis = migration_incoming_get_current();
>> PostcopyState ps;
>> int ret;
>> + Error *local_err = NULL;
>>
>> assert(mis->from_src_file);
>>
>> if (compress_threads_load_setup(mis->from_src_file)) {
>> - error_report("Failed to setup decompress threads");
>> + error_setg(&local_err, "Failed to setup decompress threads");
>> goto fail;
>> }
>>
>> @@ -789,16 +790,12 @@ process_incoming_migration_co(void *opaque)
>> }
>>
>> if (ret < 0) {
>> - if (migrate_has_error(s)) {
>> - WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>> - error_report_err(error_copy(s->error));
>> - }
>> - }
>> - error_report("load of migration failed: %s", strerror(-ret));
>> + error_setg(&local_err, "load of migration failed: %s", strerror(-ret));
>> goto fail;
>> }
>>
>> if (colo_incoming_co() < 0) {
>> + error_setg(&local_err, "colo incoming failed");
>> goto fail;
>> }
>>
>> @@ -809,6 +806,12 @@ fail:
>> MIGRATION_STATUS_FAILED);
>> migration_incoming_state_destroy();
>>
>> + if (migrate_has_error(s)) {
>> + WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>> + error_report_err(error_copy(s->error));
>> + }
>> + }
>> + error_report_err(local_err);
>
> Here migrate_has_error() will always return true? If so we can drop it.
>
> Meanwhile, IMHO it's easier we simply always keep the earliest error we see
> and report that only, local_err is just one of the errors and whoever
> reaches first will be reported. Something like:
>
> migrate_set_error(local_err);
> WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> error_report_err(error_copy(s->error));
> }
> exit(EXIT_FAILURE);
>
> Then when with the exit-on-error thing:
>
> migrate_set_error(local_err);
> if (exit_on_error) {
> WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> error_report_err(error_copy(s->error));
> }
> exit(EXIT_FAILURE);
> }
>
> Would this looks slightly cleaner?
>
Yes, looks better, will do so
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
2024-04-29 19:32 ` Peter Xu
@ 2024-04-30 8:06 ` Vladimir Sementsov-Ogievskiy
2024-04-30 8:09 ` Vladimir Sementsov-Ogievskiy
0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-30 8:06 UTC (permalink / raw)
To: Peter Xu; +Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core
On 29.04.24 22:32, Peter Xu wrote:
> On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> It's bad idea to leave critical section with error object freed, but
>> s->error still set, this theoretically may lead to use-after-free
>> crash. Let's avoid it.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---
>> migration/migration.c | 24 ++++++++++++------------
>> 1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 0d26db47f7..58fd5819bc 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque)
>> migration_incoming_state_destroy();
>> }
>>
>> +static void migrate_error_free(MigrationState *s)
>> +{
>> + QEMU_LOCK_GUARD(&s->error_mutex);
>> + if (s->error) {
>> + error_free(s->error);
>> + s->error = NULL;
>> + }
>> +}
>> +
>> static void coroutine_fn
>> process_incoming_migration_co(void *opaque)
>> {
>> + MigrationState *s = migrate_get_current();
>> MigrationIncomingState *mis = migration_incoming_get_current();
>> PostcopyState ps;
>> int ret;
>> @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque)
>> }
>>
>> if (ret < 0) {
>> - MigrationState *s = migrate_get_current();
>> -
>> if (migrate_has_error(s)) {
>> WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>> - error_report_err(s->error);
>> + error_report_err(error_copy(s->error));
>
> This looks like a bugfix, agreed.
>
>> }
>> }
>> error_report("load of migration failed: %s", strerror(-ret));
>> @@ -801,6 +809,7 @@ fail:
>> MIGRATION_STATUS_FAILED);
>> migration_incoming_state_destroy();
>>
>> + migrate_error_free(s);
>
> Would migration_incoming_state_destroy() be a better place?
Hmm. But we want to call migration_incoming_state_destroy() in case when exit-on-error=false too. And in this case we want to keep the error for further query-migrate commands.
>
> One thing weird is we actually reuses MigrationState*'s error for incoming
> too, but so far it looks ok as long as QEMU can't be both src & dst. Then
> calling migrate_error_free even in incoming_state_destroy() looks ok.
>
> This patch still looks like containing two changes. Better split them (or
> just fix the bug only)?
>
> Thanks,
>
>> exit(EXIT_FAILURE);
>> }
>>
>> @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s)
>> return qatomic_read(&s->error);
>> }
>>
>> -static void migrate_error_free(MigrationState *s)
>> -{
>> - QEMU_LOCK_GUARD(&s->error_mutex);
>> - if (s->error) {
>> - error_free(s->error);
>> - s->error = NULL;
>> - }
>> -}
>> -
>> static void migrate_fd_error(MigrationState *s, const Error *error)
>> {
>> assert(s->to_dst_file == NULL);
>> --
>> 2.34.1
>>
>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
2024-04-30 8:06 ` Vladimir Sementsov-Ogievskiy
@ 2024-04-30 8:09 ` Vladimir Sementsov-Ogievskiy
2024-04-30 8:47 ` Vladimir Sementsov-Ogievskiy
0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-30 8:09 UTC (permalink / raw)
To: Peter Xu; +Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core
On 30.04.24 11:06, Vladimir Sementsov-Ogievskiy wrote:
> On 29.04.24 22:32, Peter Xu wrote:
>> On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> It's bad idea to leave critical section with error object freed, but
>>> s->error still set, this theoretically may lead to use-after-free
>>> crash. Let's avoid it.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>> ---
>>> migration/migration.c | 24 ++++++++++++------------
>>> 1 file changed, 12 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 0d26db47f7..58fd5819bc 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque)
>>> migration_incoming_state_destroy();
>>> }
>>> +static void migrate_error_free(MigrationState *s)
>>> +{
>>> + QEMU_LOCK_GUARD(&s->error_mutex);
>>> + if (s->error) {
>>> + error_free(s->error);
>>> + s->error = NULL;
>>> + }
>>> +}
>>> +
>>> static void coroutine_fn
>>> process_incoming_migration_co(void *opaque)
>>> {
>>> + MigrationState *s = migrate_get_current();
>>> MigrationIncomingState *mis = migration_incoming_get_current();
>>> PostcopyState ps;
>>> int ret;
>>> @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque)
>>> }
>>> if (ret < 0) {
>>> - MigrationState *s = migrate_get_current();
>>> -
>>> if (migrate_has_error(s)) {
>>> WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>>> - error_report_err(s->error);
>>> + error_report_err(error_copy(s->error));
>>
>> This looks like a bugfix, agreed.
>>
>>> }
>>> }
>>> error_report("load of migration failed: %s", strerror(-ret));
>>> @@ -801,6 +809,7 @@ fail:
>>> MIGRATION_STATUS_FAILED);
>>> migration_incoming_state_destroy();
>>> + migrate_error_free(s);
>>
>> Would migration_incoming_state_destroy() be a better place?
>
> Hmm. But we want to call migration_incoming_state_destroy() in case when exit-on-error=false too. And in this case we want to keep the error for further query-migrate commands.
Actually, I think we shouldn't care about freeing the error for exit() case. I think, we skip a lot of other cleanups which we normally do in main() in case of this exit().
>
>>
>> One thing weird is we actually reuses MigrationState*'s error for incoming
>> too, but so far it looks ok as long as QEMU can't be both src & dst. Then
>> calling migrate_error_free even in incoming_state_destroy() looks ok.
>>
>> This patch still looks like containing two changes. Better split them (or
>> just fix the bug only)?
>>
>> Thanks,
>>
>>> exit(EXIT_FAILURE);
>>> }
>>> @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s)
>>> return qatomic_read(&s->error);
>>> }
>>> -static void migrate_error_free(MigrationState *s)
>>> -{
>>> - QEMU_LOCK_GUARD(&s->error_mutex);
>>> - if (s->error) {
>>> - error_free(s->error);
>>> - s->error = NULL;
>>> - }
>>> -}
>>> -
>>> static void migrate_fd_error(MigrationState *s, const Error *error)
>>> {
>>> assert(s->to_dst_file == NULL);
>>> --
>>> 2.34.1
>>>
>>
>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
2024-04-30 8:09 ` Vladimir Sementsov-Ogievskiy
@ 2024-04-30 8:47 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-30 8:47 UTC (permalink / raw)
To: Peter Xu; +Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core
On 30.04.24 11:09, Vladimir Sementsov-Ogievskiy wrote:
> On 30.04.24 11:06, Vladimir Sementsov-Ogievskiy wrote:
>> On 29.04.24 22:32, Peter Xu wrote:
>>> On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> It's bad idea to leave critical section with error object freed, but
>>>> s->error still set, this theoretically may lead to use-after-free
>>>> crash. Let's avoid it.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>> ---
>>>> migration/migration.c | 24 ++++++++++++------------
>>>> 1 file changed, 12 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index 0d26db47f7..58fd5819bc 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque)
>>>> migration_incoming_state_destroy();
>>>> }
>>>> +static void migrate_error_free(MigrationState *s)
>>>> +{
>>>> + QEMU_LOCK_GUARD(&s->error_mutex);
>>>> + if (s->error) {
>>>> + error_free(s->error);
>>>> + s->error = NULL;
>>>> + }
>>>> +}
>>>> +
>>>> static void coroutine_fn
>>>> process_incoming_migration_co(void *opaque)
>>>> {
>>>> + MigrationState *s = migrate_get_current();
>>>> MigrationIncomingState *mis = migration_incoming_get_current();
>>>> PostcopyState ps;
>>>> int ret;
>>>> @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque)
>>>> }
>>>> if (ret < 0) {
>>>> - MigrationState *s = migrate_get_current();
>>>> -
>>>> if (migrate_has_error(s)) {
>>>> WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>>>> - error_report_err(s->error);
>>>> + error_report_err(error_copy(s->error));
>>>
>>> This looks like a bugfix, agreed.
>>>
>>>> }
>>>> }
>>>> error_report("load of migration failed: %s", strerror(-ret));
>>>> @@ -801,6 +809,7 @@ fail:
>>>> MIGRATION_STATUS_FAILED);
>>>> migration_incoming_state_destroy();
>>>> + migrate_error_free(s);
>>>
>>> Would migration_incoming_state_destroy() be a better place?
>>
>> Hmm. But we want to call migration_incoming_state_destroy() in case when exit-on-error=false too. And in this case we want to keep the error for further query-migrate commands.
>
> Actually, I think we shouldn't care about freeing the error for exit() case. I think, we skip a lot of other cleanups which we normally do in main() in case of this exit().
>
Or, just do the simplest fix of potential use-after-free, and don't care:
WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
error_report_err(s->error);
s->error = NULL;
}
- I'll send v6 with it.
>>
>>>
>>> One thing weird is we actually reuses MigrationState*'s error for incoming
>>> too, but so far it looks ok as long as QEMU can't be both src & dst. Then
>>> calling migrate_error_free even in incoming_state_destroy() looks ok.
>>>
>>> This patch still looks like containing two changes. Better split them (or
>>> just fix the bug only)?
>>>
>>> Thanks,
>>>
>>>> exit(EXIT_FAILURE);
>>>> }
>>>> @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s)
>>>> return qatomic_read(&s->error);
>>>> }
>>>> -static void migrate_error_free(MigrationState *s)
>>>> -{
>>>> - QEMU_LOCK_GUARD(&s->error_mutex);
>>>> - if (s->error) {
>>>> - error_free(s->error);
>>>> - s->error = NULL;
>>>> - }
>>>> -}
>>>> -
>>>> static void migrate_fd_error(MigrationState *s, const Error *error)
>>>> {
>>>> assert(s->to_dst_file == NULL);
>>>> --
>>>> 2.34.1
>>>>
>>>
>>
>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-04-30 8:48 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure Vladimir Sementsov-Ogievskiy
2024-04-29 19:22 ` Peter Xu
2024-04-29 19:14 ` [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error Vladimir Sementsov-Ogievskiy
2024-04-29 19:32 ` Peter Xu
2024-04-30 8:06 ` Vladimir Sementsov-Ogievskiy
2024-04-30 8:09 ` Vladimir Sementsov-Ogievskiy
2024-04-30 8:47 ` Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting Vladimir Sementsov-Ogievskiy
2024-04-29 19:39 ` Peter Xu
2024-04-30 7:12 ` Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming Vladimir Sementsov-Ogievskiy
2024-04-29 19:41 ` Peter Xu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).