* [PATCH v2 0/2] migration: do not exit on incoming failure
@ 2024-04-24 17:42 Vladimir Sementsov-Ogievskiy
2024-04-24 17:42 ` [PATCH v2 1/2] migration: rework migrate_set_error() to migrate_report_err() Vladimir Sementsov-Ogievskiy
2024-04-24 17:42 ` [PATCH v2 2/2] qapi: introduce exit-on-error paramter for migrate-incoming Vladimir Sementsov-Ogievskiy
0 siblings, 2 replies; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-24 17:42 UTC (permalink / raw)
To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core
Hi all!
The series brings an option to not immediately exit on incoming
migration failure, giving a possibility to orchestrator to get the error
through QAPI and shutdown QEMU by "quit".
Vladimir Sementsov-Ogievskiy (2):
migration: rework migrate_set_error() to migrate_report_err()
qapi: introduce exit-on-error paramter for migrate-incoming
migration/migration-hmp-cmds.c | 2 +-
migration/migration.c | 79 +++++++++++++++-------------------
migration/migration.h | 5 ++-
migration/multifd.c | 18 +++-----
migration/postcopy-ram.c | 3 +-
migration/savevm.c | 16 +++----
qapi/migration.json | 7 ++-
system/vl.c | 7 +--
8 files changed, 60 insertions(+), 77 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 1/2] migration: rework migrate_set_error() to migrate_report_err()
2024-04-24 17:42 [PATCH v2 0/2] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
@ 2024-04-24 17:42 ` Vladimir Sementsov-Ogievskiy
2024-04-24 19:40 ` Peter Xu
2024-04-24 17:42 ` [PATCH v2 2/2] qapi: introduce exit-on-error paramter for migrate-incoming Vladimir Sementsov-Ogievskiy
1 sibling, 1 reply; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-24 17:42 UTC (permalink / raw)
To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core
1. Most of callers want to free the error after call. Let's help them.
2. Some callers log the error, some not. We also have places where we
log the stored error. Let's instead simply log every migration
error.
3. Some callers have to retrieve current migration state only to pass
it to migrate_set_error(). In the new helper let's get the state
automatically.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
migration/migration.c | 48 ++++++++++++----------------------------
migration/migration.h | 2 +-
migration/multifd.c | 18 ++++++---------
migration/postcopy-ram.c | 3 +--
migration/savevm.c | 16 +++++---------
5 files changed, 28 insertions(+), 59 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 696762bc64..806b7b080b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -285,7 +285,7 @@ void migration_bh_schedule(QEMUBHFunc *cb, void *opaque)
void migration_cancel(const Error *error)
{
if (error) {
- migrate_set_error(current_migration, error);
+ migrate_report_err(error_copy(error));
}
if (migrate_dirty_limit()) {
qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
@@ -779,13 +779,6 @@ process_incoming_migration_co(void *opaque)
}
if (ret < 0) {
- MigrationState *s = migrate_get_current();
-
- if (migrate_has_error(s)) {
- WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
- error_report_err(s->error);
- }
- }
error_report("load of migration failed: %s", strerror(-ret));
goto fail;
}
@@ -1402,10 +1395,6 @@ static void migrate_fd_cleanup(MigrationState *s)
MIGRATION_STATUS_CANCELLED);
}
- if (s->error) {
- /* It is used on info migrate. We can't free it */
- error_report_err(error_copy(s->error));
- }
type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
MIG_EVENT_PRECOPY_DONE;
migration_call_notifiers(s, type, NULL);
@@ -1418,12 +1407,14 @@ static void migrate_fd_cleanup_bh(void *opaque)
migrate_fd_cleanup(opaque);
}
-void migrate_set_error(MigrationState *s, const Error *error)
+void migrate_report_err(Error *error)
{
+ MigrationState *s = migrate_get_current();
QEMU_LOCK_GUARD(&s->error_mutex);
if (!s->error) {
s->error = error_copy(error);
}
+ error_report_err(error);
}
bool migrate_has_error(MigrationState *s)
@@ -1448,7 +1439,7 @@ static void migrate_fd_error(MigrationState *s, const Error *error)
assert(s->to_dst_file == NULL);
migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
MIGRATION_STATUS_FAILED);
- migrate_set_error(s, error);
+ migrate_report_err(error_copy(error));
}
static void migrate_fd_cancel(MigrationState *s)
@@ -1863,8 +1854,7 @@ void qmp_migrate_pause(Error **errp)
/* Tell the core migration that we're pausing */
error_setg(&error, "Postcopy migration is paused by the user");
- migrate_set_error(ms, error);
- error_free(error);
+ migrate_report_err(error);
qemu_mutex_lock(&ms->qemu_file_lock);
if (ms->to_dst_file) {
@@ -2413,8 +2403,7 @@ static void *source_return_path_thread(void *opaque)
out:
if (err) {
- migrate_set_error(ms, err);
- error_free(err);
+ migrate_report_err(err);
trace_source_return_path_thread_bad_end();
}
@@ -2842,12 +2831,10 @@ static void migration_completion(MigrationState *s)
fail:
if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
- migrate_set_error(s, local_err);
- error_free(local_err);
+ migrate_report_err(local_err);
} else if (ret) {
error_setg_errno(&local_err, -ret, "Error in migration completion");
- migrate_set_error(s, local_err);
- error_free(local_err);
+ migrate_report_err(local_err);
}
migration_completion_failed(s, current_active_state);
@@ -3070,8 +3057,7 @@ static MigThrError migration_detect_error(MigrationState *s)
}
if (local_error) {
- migrate_set_error(s, local_error);
- error_free(local_error);
+ migrate_report_err(local_error);
}
if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret) {
@@ -3242,8 +3228,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
if (!in_postcopy && must_precopy <= s->threshold_size && can_switchover &&
qatomic_read(&s->start_postcopy)) {
if (postcopy_start(s, &local_err)) {
- migrate_set_error(s, local_err);
- error_report_err(local_err);
+ migrate_report_err(local_err);
}
return MIG_ITERATE_SKIP;
}
@@ -3487,8 +3472,7 @@ static void *migration_thread(void *opaque)
* devices to unplug. This to preserve migration state transitions.
*/
if (ret) {
- migrate_set_error(s, local_err);
- error_free(local_err);
+ migrate_report_err(local_err);
migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_FAILED);
goto out;
@@ -3613,8 +3597,7 @@ static void *bg_migration_thread(void *opaque)
* devices to unplug. This to preserve migration state transitions.
*/
if (ret) {
- migrate_set_error(s, local_err);
- error_free(local_err);
+ migrate_report_err(local_err);
migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_FAILED);
goto fail_setup;
@@ -3723,8 +3706,6 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
* It's normally done in migrate_fd_cleanup(), but call it here
* explicitly.
*/
- error_report_err(error_copy(s->error));
- } else {
migrate_fd_cleanup(s);
}
return;
@@ -3794,9 +3775,8 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
return;
fail:
- migrate_set_error(s, local_err);
+ migrate_report_err(local_err);
migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
- error_report_err(local_err);
migrate_fd_cleanup(s);
}
diff --git a/migration/migration.h b/migration/migration.h
index 8045e39c26..f9f436c33e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -475,7 +475,7 @@ void migration_incoming_process(void);
bool migration_has_all_channels(void);
-void migrate_set_error(MigrationState *s, const Error *error);
+void migrate_report_err(Error *error);
bool migrate_has_error(MigrationState *s);
void migrate_fd_connect(MigrationState *s, Error *error_in);
diff --git a/migration/multifd.c b/migration/multifd.c
index f317bff077..ce51744054 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -678,7 +678,7 @@ retry:
}
/* Multifd send side hit an error; remember it and prepare to quit */
-static void multifd_send_set_error(Error *err)
+static void multifd_report_err(Error *err)
{
/*
* We don't want to exit each threads twice. Depending on where
@@ -692,7 +692,7 @@ static void multifd_send_set_error(Error *err)
if (err) {
MigrationState *s = migrate_get_current();
- migrate_set_error(s, err);
+ migrate_report_err(err);
if (s->state == MIGRATION_STATUS_SETUP ||
s->state == MIGRATION_STATUS_PRE_SWITCHOVER ||
s->state == MIGRATION_STATUS_DEVICE ||
@@ -819,8 +819,7 @@ void multifd_send_shutdown(void)
Error *local_err = NULL;
if (!multifd_send_cleanup_channel(p, &local_err)) {
- migrate_set_error(migrate_get_current(), local_err);
- error_free(local_err);
+ migrate_report_err(local_err);
}
}
@@ -998,9 +997,8 @@ out:
if (ret) {
assert(local_err);
trace_multifd_send_error(p->id);
- multifd_send_set_error(local_err);
+ multifd_report_err(local_err);
multifd_send_kick_main(p);
- error_free(local_err);
}
rcu_unregister_thread();
@@ -1123,14 +1121,13 @@ out:
}
trace_multifd_new_send_channel_async_error(p->id, local_err);
- multifd_send_set_error(local_err);
+ multifd_report_err(local_err);
/*
* For error cases (TLS or non-TLS), IO channel is always freed here
* rather than when cleanup multifd: since p->c is not set, multifd
* cleanup code doesn't even know its existence.
*/
object_unref(OBJECT(ioc));
- error_free(local_err);
}
static bool multifd_new_send_channel_create(gpointer opaque, Error **errp)
@@ -1214,8 +1211,7 @@ bool multifd_send_setup(void)
}
if (ret) {
- migrate_set_error(s, local_err);
- error_report_err(local_err);
+ migrate_report_err(local_err);
migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
MIGRATION_STATUS_FAILED);
return false;
@@ -1287,7 +1283,7 @@ static void multifd_recv_terminate_threads(Error *err)
if (err) {
MigrationState *s = migrate_get_current();
- migrate_set_error(s, err);
+ migrate_report_err(error_copy(err));
if (s->state == MIGRATION_STATUS_SETUP ||
s->state == MIGRATION_STATUS_ACTIVE) {
migrate_set_state(&s->state, s->state,
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index eccff499cb..dff220d529 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1626,8 +1626,7 @@ postcopy_preempt_send_channel_done(MigrationState *s,
QIOChannel *ioc, Error *local_err)
{
if (local_err) {
- migrate_set_error(s, local_err);
- error_free(local_err);
+ migrate_report_err(local_err);
} else {
migration_ioc_register_yank(ioc);
s->postcopy_qemufile_src = qemu_file_new_output(ioc);
diff --git a/migration/savevm.c b/migration/savevm.c
index 5d200cf42a..839957b5d7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1103,14 +1103,12 @@ void qemu_savevm_send_open_return_path(QEMUFile *f)
int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len)
{
uint32_t tmp;
- MigrationState *ms = migrate_get_current();
Error *local_err = NULL;
if (len > MAX_VM_CMD_PACKAGED_SIZE) {
error_setg(&local_err, "%s: Unreasonably large packaged state: %zu",
__func__, len);
- migrate_set_error(ms, local_err);
- error_report_err(local_err);
+ migrate_report_err(local_err);
return -1;
}
@@ -1325,7 +1323,7 @@ int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
if (se->vmsd && se->vmsd->early_setup) {
ret = vmstate_save(f, se, ms->vmdesc, errp);
if (ret) {
- migrate_set_error(ms, *errp);
+ migrate_report_err(error_copy(*errp));
qemu_file_set_error(f, ret);
break;
}
@@ -1553,8 +1551,7 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
ret = vmstate_save(f, se, vmdesc, &local_err);
if (ret) {
- migrate_set_error(ms, local_err);
- error_report_err(local_err);
+ migrate_report_err(local_err);
qemu_file_set_error(f, ret);
return ret;
}
@@ -1571,8 +1568,7 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
if (ret) {
error_setg(&local_err, "%s: bdrv_inactivate_all() failed (%d)",
__func__, ret);
- migrate_set_error(ms, local_err);
- error_report_err(local_err);
+ migrate_report_err(local_err);
qemu_file_set_error(f, ret);
return ret;
}
@@ -1768,7 +1764,6 @@ void qemu_savevm_live_state(QEMUFile *f)
int qemu_save_device_state(QEMUFile *f)
{
- MigrationState *ms = migrate_get_current();
Error *local_err = NULL;
SaveStateEntry *se;
@@ -1786,8 +1781,7 @@ int qemu_save_device_state(QEMUFile *f)
}
ret = vmstate_save(f, se, NULL, &local_err);
if (ret) {
- migrate_set_error(ms, local_err);
- error_report_err(local_err);
+ migrate_report_err(local_err);
return ret;
}
}
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 2/2] qapi: introduce exit-on-error paramter for migrate-incoming
2024-04-24 17:42 [PATCH v2 0/2] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
2024-04-24 17:42 ` [PATCH v2 1/2] migration: rework migrate_set_error() to migrate_report_err() Vladimir Sementsov-Ogievskiy
@ 2024-04-24 17:42 ` Vladimir Sementsov-Ogievskiy
2024-04-24 22:02 ` Peter Xu
1 sibling, 1 reply; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-24 17:42 UTC (permalink / raw)
To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core
Now we do set MIGRATION_FAILED state, but don't give a chance to
orchestrator to query migration state and get the error.
Let's provide a possibility for QMP-based orchestrators to get an error
like with outgoing migration.
For x-exit-preconfig we can enable new behavior unconditionally, as
it's an unstable command.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
migration/migration-hmp-cmds.c | 2 +-
migration/migration.c | 31 +++++++++++++++++++++----------
migration/migration.h | 3 +++
qapi/migration.json | 7 ++++++-
system/vl.c | 7 +------
5 files changed, 32 insertions(+), 18 deletions(-)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 7e96ae6ffd..9c94a18029 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -466,7 +466,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict)
}
QAPI_LIST_PREPEND(caps, g_steal_pointer(&channel));
- qmp_migrate_incoming(NULL, true, caps, &err);
+ qmp_migrate_incoming(NULL, true, caps, true, true, &err);
qapi_free_MigrationChannelList(caps);
end:
diff --git a/migration/migration.c b/migration/migration.c
index 806b7b080b..fe72529ba7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -234,6 +234,8 @@ void migration_object_init(void)
qemu_cond_init(¤t_incoming->page_request_cond);
current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
+ current_incoming->exit_on_error = true;
+
migration_object_check(current_migration, &error_fatal);
blk_mig_init();
@@ -597,12 +599,15 @@ bool migrate_uri_parse(const char *uri, MigrationChannel **channel,
static void qemu_start_incoming_migration(const char *uri, bool has_channels,
MigrationChannelList *channels,
+ bool exit_on_error,
Error **errp)
{
g_autoptr(MigrationChannel) channel = NULL;
MigrationAddress *addr = NULL;
MigrationIncomingState *mis = migration_incoming_get_current();
+ mis->exit_on_error = exit_on_error;
+
/*
* Having preliminary checks for uri and channel
*/
@@ -738,11 +743,12 @@ process_incoming_migration_co(void *opaque)
MigrationIncomingState *mis = migration_incoming_get_current();
PostcopyState ps;
int ret;
+ Error *local_err = NULL;
assert(mis->from_src_file);
if (compress_threads_load_setup(mis->from_src_file)) {
- error_report("Failed to setup decompress threads");
+ error_setg(&local_err, "Failed to setup decompress threads");
goto fail;
}
@@ -779,25 +785,26 @@ process_incoming_migration_co(void *opaque)
}
if (ret < 0) {
- error_report("load of migration failed: %s", strerror(-ret));
+ error_setg(&local_err, "load of migration failed: %s", strerror(-ret));
goto fail;
}
if (colo_incoming_co() < 0) {
+ error_setg(&local_err, "colo incoming failed");
goto fail;
}
migration_bh_schedule(process_incoming_migration_bh, mis);
return;
fail:
+ migrate_report_err(local_err);
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_FAILED);
- qemu_fclose(mis->from_src_file);
-
- multifd_recv_cleanup();
- compress_threads_load_cleanup();
+ migration_incoming_state_destroy();
- exit(EXIT_FAILURE);
+ if (mis->exit_on_error) {
+ exit(EXIT_FAILURE);
+ }
}
/**
@@ -1785,7 +1792,9 @@ void migrate_del_blocker(Error **reasonp)
}
void qmp_migrate_incoming(const char *uri, bool has_channels,
- MigrationChannelList *channels, Error **errp)
+ MigrationChannelList *channels,
+ bool has_exit_on_error, bool exit_on_error,
+ Error **errp)
{
Error *local_err = NULL;
static bool once = true;
@@ -1803,7 +1812,9 @@ void qmp_migrate_incoming(const char *uri, bool has_channels,
return;
}
- qemu_start_incoming_migration(uri, has_channels, channels, &local_err);
+ qemu_start_incoming_migration(uri, has_channels, channels,
+ has_exit_on_error ? exit_on_error : true,
+ &local_err);
if (local_err) {
yank_unregister_instance(MIGRATION_YANK_INSTANCE);
@@ -1839,7 +1850,7 @@ void qmp_migrate_recover(const char *uri, Error **errp)
* only re-setup the migration stream and poke existing migration
* to continue using that newly established channel.
*/
- qemu_start_incoming_migration(uri, false, NULL, errp);
+ qemu_start_incoming_migration(uri, false, NULL, mis->exit_on_error, errp);
}
void qmp_migrate_pause(Error **errp)
diff --git a/migration/migration.h b/migration/migration.h
index f9f436c33e..089f710ee1 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -227,6 +227,9 @@ struct MigrationIncomingState {
* is needed as this field is updated serially.
*/
unsigned int switchover_ack_pending_num;
+
+ /* Do exit on incoming migration failure */
+ bool exit_on_error;
};
MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/qapi/migration.json b/qapi/migration.json
index 8c65b90328..9de8b98d0b 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1837,6 +1837,10 @@
# @channels: list of migration stream channels with each stream in the
# list connected to a destination interface endpoint.
#
+# @exit-on-error: Do exit on incoming migration failure. Default true.
+# When set to false, the error is reported by MIGRATION event and
+# error could be retrieved by query-migrate command. (since 9.1)
+#
# Since: 2.3
#
# Notes:
@@ -1889,7 +1893,8 @@
##
{ 'command': 'migrate-incoming',
'data': {'*uri': 'str',
- '*channels': [ 'MigrationChannel' ] } }
+ '*channels': [ 'MigrationChannel' ],
+ '*exit-on-error': 'bool' } }
##
# @xen-save-devices-state:
diff --git a/system/vl.c b/system/vl.c
index c644222982..f5b4c18200 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -2718,13 +2718,8 @@ void qmp_x_exit_preconfig(Error **errp)
}
if (incoming) {
- Error *local_err = NULL;
if (strcmp(incoming, "defer") != 0) {
- qmp_migrate_incoming(incoming, false, NULL, &local_err);
- if (local_err) {
- error_reportf_err(local_err, "-incoming %s: ", incoming);
- exit(1);
- }
+ qmp_migrate_incoming(incoming, false, NULL, true, true, errp);
}
} else if (autostart) {
qmp_cont(NULL);
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] migration: rework migrate_set_error() to migrate_report_err()
2024-04-24 17:42 ` [PATCH v2 1/2] migration: rework migrate_set_error() to migrate_report_err() Vladimir Sementsov-Ogievskiy
@ 2024-04-24 19:40 ` Peter Xu
2024-04-24 19:50 ` Vladimir Sementsov-Ogievskiy
0 siblings, 1 reply; 8+ messages in thread
From: Peter Xu @ 2024-04-24 19:40 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core
On Wed, Apr 24, 2024 at 08:42:44PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 1. Most of callers want to free the error after call. Let's help them.
>
> 2. Some callers log the error, some not. We also have places where we
> log the stored error. Let's instead simply log every migration
> error.
>
> 3. Some callers have to retrieve current migration state only to pass
> it to migrate_set_error(). In the new helper let's get the state
> automatically.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
> migration/migration.c | 48 ++++++++++++----------------------------
> migration/migration.h | 2 +-
> migration/multifd.c | 18 ++++++---------
> migration/postcopy-ram.c | 3 +--
> migration/savevm.c | 16 +++++---------
> 5 files changed, 28 insertions(+), 59 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 696762bc64..806b7b080b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -285,7 +285,7 @@ void migration_bh_schedule(QEMUBHFunc *cb, void *opaque)
> void migration_cancel(const Error *error)
> {
> if (error) {
> - migrate_set_error(current_migration, error);
> + migrate_report_err(error_copy(error));
> }
> if (migrate_dirty_limit()) {
> qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
> @@ -779,13 +779,6 @@ process_incoming_migration_co(void *opaque)
> }
>
> if (ret < 0) {
> - MigrationState *s = migrate_get_current();
> -
> - if (migrate_has_error(s)) {
> - WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> - error_report_err(s->error);
> - }
> - }
> error_report("load of migration failed: %s", strerror(-ret));
> goto fail;
> }
> @@ -1402,10 +1395,6 @@ static void migrate_fd_cleanup(MigrationState *s)
> MIGRATION_STATUS_CANCELLED);
> }
>
> - if (s->error) {
> - /* It is used on info migrate. We can't free it */
> - error_report_err(error_copy(s->error));
> - }
> type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
> MIG_EVENT_PRECOPY_DONE;
> migration_call_notifiers(s, type, NULL);
> @@ -1418,12 +1407,14 @@ static void migrate_fd_cleanup_bh(void *opaque)
> migrate_fd_cleanup(opaque);
> }
>
> -void migrate_set_error(MigrationState *s, const Error *error)
> +void migrate_report_err(Error *error)
> {
> + MigrationState *s = migrate_get_current();
Avoid passing in *s looks ok.
> QEMU_LOCK_GUARD(&s->error_mutex);
> if (!s->error) {
> s->error = error_copy(error);
I think I get your point, but then looks like this error_copy() should be
removed but forgotten?
I remember I had an attempt to do similarly (but only transfer the
ownership):
https://lore.kernel.org/qemu-devel/20230829214235.69309-3-peterx@redhat.com/
However I gave up later and I forgot why. One reason could be that I hit a
use-after-free, then found that well indeed leaking an Error is much better
than debugging a UAF.
So maybe we simply keep it like before? If you like such change, let's
just be extremely caucious.
> }
> + error_report_err(error);
The ideal case to me is we don't trigger an error_report() all over the
place. We're slightly going backward from that regard, IMHO..
Ideally I think we shouldn't dump anything to stderr, but user should
always query from qmp.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] migration: rework migrate_set_error() to migrate_report_err()
2024-04-24 19:40 ` Peter Xu
@ 2024-04-24 19:50 ` Vladimir Sementsov-Ogievskiy
2024-04-24 21:52 ` Peter Xu
0 siblings, 1 reply; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-24 19:50 UTC (permalink / raw)
To: Peter Xu; +Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core
On 24.04.24 22:40, Peter Xu wrote:
> On Wed, Apr 24, 2024 at 08:42:44PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> 1. Most of callers want to free the error after call. Let's help them.
>>
>> 2. Some callers log the error, some not. We also have places where we
>> log the stored error. Let's instead simply log every migration
>> error.
>>
>> 3. Some callers have to retrieve current migration state only to pass
>> it to migrate_set_error(). In the new helper let's get the state
>> automatically.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---
>> migration/migration.c | 48 ++++++++++++----------------------------
>> migration/migration.h | 2 +-
>> migration/multifd.c | 18 ++++++---------
>> migration/postcopy-ram.c | 3 +--
>> migration/savevm.c | 16 +++++---------
>> 5 files changed, 28 insertions(+), 59 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 696762bc64..806b7b080b 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -285,7 +285,7 @@ void migration_bh_schedule(QEMUBHFunc *cb, void *opaque)
>> void migration_cancel(const Error *error)
>> {
>> if (error) {
>> - migrate_set_error(current_migration, error);
>> + migrate_report_err(error_copy(error));
>> }
>> if (migrate_dirty_limit()) {
>> qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
>> @@ -779,13 +779,6 @@ process_incoming_migration_co(void *opaque)
>> }
>>
>> if (ret < 0) {
>> - MigrationState *s = migrate_get_current();
>> -
>> - if (migrate_has_error(s)) {
>> - WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>> - error_report_err(s->error);
>> - }
>> - }
>> error_report("load of migration failed: %s", strerror(-ret));
>> goto fail;
>> }
>> @@ -1402,10 +1395,6 @@ static void migrate_fd_cleanup(MigrationState *s)
>> MIGRATION_STATUS_CANCELLED);
>> }
>>
>> - if (s->error) {
>> - /* It is used on info migrate. We can't free it */
>> - error_report_err(error_copy(s->error));
>> - }
>> type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
>> MIG_EVENT_PRECOPY_DONE;
>> migration_call_notifiers(s, type, NULL);
>> @@ -1418,12 +1407,14 @@ static void migrate_fd_cleanup_bh(void *opaque)
>> migrate_fd_cleanup(opaque);
>> }
>>
>> -void migrate_set_error(MigrationState *s, const Error *error)
>> +void migrate_report_err(Error *error)
>> {
>> + MigrationState *s = migrate_get_current();
>
> Avoid passing in *s looks ok.
>
>> QEMU_LOCK_GUARD(&s->error_mutex);
>> if (!s->error) {
>> s->error = error_copy(error);
>
> I think I get your point, but then looks like this error_copy() should be
> removed but forgotten?
>
> I remember I had an attempt to do similarly (but only transfer the
> ownership):
>
> https://lore.kernel.org/qemu-devel/20230829214235.69309-3-peterx@redhat.com/
>
> However I gave up later and I forgot why. One reason could be that I hit a
> use-after-free, then found that well indeed leaking an Error is much better
> than debugging a UAF.
Hmm, yes I saw a leaked Error somewhere, and this patch should fix it. But may be I missed introducing this use-after-free again)
>
> So maybe we simply keep it like before? If you like such change, let's
> just be extremely caucious.
>
>> }
>> + error_report_err(error);
>
> The ideal case to me is we don't trigger an error_report() all over the
> place. We're slightly going backward from that regard, IMHO..
>
> Ideally I think we shouldn't dump anything to stderr, but user should
> always query from qmp.
>
Sounds reasonable to me. I'm just unsure, if keep it like before, how should I correctly update logging to stderr in process_incoming_migration_co(). Could we just drop error reporting to stderr? Or should it be kept as is for the case when exit_on_error is true?
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] migration: rework migrate_set_error() to migrate_report_err()
2024-04-24 19:50 ` Vladimir Sementsov-Ogievskiy
@ 2024-04-24 21:52 ` Peter Xu
2024-04-25 8:25 ` Vladimir Sementsov-Ogievskiy
0 siblings, 1 reply; 8+ messages in thread
From: Peter Xu @ 2024-04-24 21:52 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core
On Wed, Apr 24, 2024 at 10:50:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 24.04.24 22:40, Peter Xu wrote:
> > On Wed, Apr 24, 2024 at 08:42:44PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > 1. Most of callers want to free the error after call. Let's help them.
> > >
> > > 2. Some callers log the error, some not. We also have places where we
> > > log the stored error. Let's instead simply log every migration
> > > error.
> > >
> > > 3. Some callers have to retrieve current migration state only to pass
> > > it to migrate_set_error(). In the new helper let's get the state
> > > automatically.
> > >
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > ---
> > > migration/migration.c | 48 ++++++++++++----------------------------
> > > migration/migration.h | 2 +-
> > > migration/multifd.c | 18 ++++++---------
> > > migration/postcopy-ram.c | 3 +--
> > > migration/savevm.c | 16 +++++---------
> > > 5 files changed, 28 insertions(+), 59 deletions(-)
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 696762bc64..806b7b080b 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -285,7 +285,7 @@ void migration_bh_schedule(QEMUBHFunc *cb, void *opaque)
> > > void migration_cancel(const Error *error)
> > > {
> > > if (error) {
> > > - migrate_set_error(current_migration, error);
> > > + migrate_report_err(error_copy(error));
> > > }
> > > if (migrate_dirty_limit()) {
> > > qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
> > > @@ -779,13 +779,6 @@ process_incoming_migration_co(void *opaque)
> > > }
> > > if (ret < 0) {
> > > - MigrationState *s = migrate_get_current();
> > > -
> > > - if (migrate_has_error(s)) {
> > > - WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> > > - error_report_err(s->error);
> > > - }
> > > - }
> > > error_report("load of migration failed: %s", strerror(-ret));
> > > goto fail;
> > > }
> > > @@ -1402,10 +1395,6 @@ static void migrate_fd_cleanup(MigrationState *s)
> > > MIGRATION_STATUS_CANCELLED);
> > > }
> > > - if (s->error) {
> > > - /* It is used on info migrate. We can't free it */
> > > - error_report_err(error_copy(s->error));
> > > - }
> > > type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
> > > MIG_EVENT_PRECOPY_DONE;
> > > migration_call_notifiers(s, type, NULL);
> > > @@ -1418,12 +1407,14 @@ static void migrate_fd_cleanup_bh(void *opaque)
> > > migrate_fd_cleanup(opaque);
> > > }
> > > -void migrate_set_error(MigrationState *s, const Error *error)
> > > +void migrate_report_err(Error *error)
> > > {
> > > + MigrationState *s = migrate_get_current();
> >
> > Avoid passing in *s looks ok.
> >
> > > QEMU_LOCK_GUARD(&s->error_mutex);
> > > if (!s->error) {
> > > s->error = error_copy(error);
> >
> > I think I get your point, but then looks like this error_copy() should be
> > removed but forgotten?
> >
> > I remember I had an attempt to do similarly (but only transfer the
> > ownership):
> >
> > https://lore.kernel.org/qemu-devel/20230829214235.69309-3-peterx@redhat.com/
> >
> > However I gave up later and I forgot why. One reason could be that I hit a
> > use-after-free, then found that well indeed leaking an Error is much better
> > than debugging a UAF.
>
> Hmm, yes I saw a leaked Error somewhere, and this patch should fix it. But may be I missed introducing this use-after-free again)
Ah do you mean you fixed a bug? If so please use a standalone patch for
that and we'll need to copy stable.
I did notice that in this series on patch looks like does more than one
thing. It'll be helpful too for reviewers when patch can be split
properly.
>
> >
> > So maybe we simply keep it like before? If you like such change, let's
> > just be extremely caucious.
> >
> > > }
> > > + error_report_err(error);
> >
> > The ideal case to me is we don't trigger an error_report() all over the
> > place. We're slightly going backward from that regard, IMHO..
> >
> > Ideally I think we shouldn't dump anything to stderr, but user should
> > always query from qmp.
> >
>
> Sounds reasonable to me. I'm just unsure, if keep it like before, how
> should I correctly update logging to stderr in
> process_incoming_migration_co(). Could we just drop error reporting to
> stderr? Or should it be kept as is for the case when exit_on_error is
> true?
I'm not sure I get the question, but I'll comment in patch 2 very soon, so
we can move the discussion there.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] qapi: introduce exit-on-error paramter for migrate-incoming
2024-04-24 17:42 ` [PATCH v2 2/2] qapi: introduce exit-on-error paramter for migrate-incoming Vladimir Sementsov-Ogievskiy
@ 2024-04-24 22:02 ` Peter Xu
0 siblings, 0 replies; 8+ messages in thread
From: Peter Xu @ 2024-04-24 22:02 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core
On Wed, Apr 24, 2024 at 08:42:45PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Now we do set MIGRATION_FAILED state, but don't give a chance to
> orchestrator to query migration state and get the error.
>
> Let's provide a possibility for QMP-based orchestrators to get an error
> like with outgoing migration.
>
> For x-exit-preconfig we can enable new behavior unconditionally, as
> it's an unstable command.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
> migration/migration-hmp-cmds.c | 2 +-
> migration/migration.c | 31 +++++++++++++++++++++----------
> migration/migration.h | 3 +++
> qapi/migration.json | 7 ++++++-
> system/vl.c | 7 +------
> 5 files changed, 32 insertions(+), 18 deletions(-)
>
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 7e96ae6ffd..9c94a18029 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -466,7 +466,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict)
> }
> QAPI_LIST_PREPEND(caps, g_steal_pointer(&channel));
>
> - qmp_migrate_incoming(NULL, true, caps, &err);
> + qmp_migrate_incoming(NULL, true, caps, true, true, &err);
Hmm, to me HMP implies more on "this is a developer, and he/she can be
debugging stuff", in that case I'm thinking whether this would be a good
chance to let exit=false already? So that developers can keep the qemu
instances when error occurs.
IOW, I'd think it's fine to break HMP as that's not an ABI, so we choose
whatever makes sense to us.
> qapi_free_MigrationChannelList(caps);
>
> end:
> diff --git a/migration/migration.c b/migration/migration.c
> index 806b7b080b..fe72529ba7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -234,6 +234,8 @@ void migration_object_init(void)
> qemu_cond_init(¤t_incoming->page_request_cond);
> current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
>
> + current_incoming->exit_on_error = true;
Let's define a macro for the default value? Then we can use it below too [1].
> +
> migration_object_check(current_migration, &error_fatal);
>
> blk_mig_init();
> @@ -597,12 +599,15 @@ bool migrate_uri_parse(const char *uri, MigrationChannel **channel,
>
> static void qemu_start_incoming_migration(const char *uri, bool has_channels,
> MigrationChannelList *channels,
> + bool exit_on_error,
> Error **errp)
> {
> g_autoptr(MigrationChannel) channel = NULL;
> MigrationAddress *addr = NULL;
> MigrationIncomingState *mis = migration_incoming_get_current();
>
> + mis->exit_on_error = exit_on_error;
Maybe it's better to set this in qmp_migrate_incoming() directly? As then
we guarantee qmp_migrate_recover() won't ever try to touch it.
> +
> /*
> * Having preliminary checks for uri and channel
> */
> @@ -738,11 +743,12 @@ process_incoming_migration_co(void *opaque)
> MigrationIncomingState *mis = migration_incoming_get_current();
> PostcopyState ps;
> int ret;
> + Error *local_err = NULL;
>
> assert(mis->from_src_file);
>
> if (compress_threads_load_setup(mis->from_src_file)) {
> - error_report("Failed to setup decompress threads");
> + error_setg(&local_err, "Failed to setup decompress threads");
> goto fail;
> }
>
> @@ -779,25 +785,26 @@ process_incoming_migration_co(void *opaque)
> }
>
> if (ret < 0) {
> - error_report("load of migration failed: %s", strerror(-ret));
> + error_setg(&local_err, "load of migration failed: %s", strerror(-ret));
> goto fail;
> }
>
> if (colo_incoming_co() < 0) {
> + error_setg(&local_err, "colo incoming failed");
> goto fail;
> }
>
> migration_bh_schedule(process_incoming_migration_bh, mis);
> return;
> fail:
> + migrate_report_err(local_err);
> migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> MIGRATION_STATUS_FAILED);
> - qemu_fclose(mis->from_src_file);
> -
> - multifd_recv_cleanup();
> - compress_threads_load_cleanup();
> + migration_incoming_state_destroy();
IMHO it'll be good to split such change into another patch.
>
> - exit(EXIT_FAILURE);
> + if (mis->exit_on_error) {
> + exit(EXIT_FAILURE);
> + }
Instead of migrate_report_err(), I'd error_report() here.
if (exit_on_error) {
// error_report...
exit();
}
And perhaps dump nothing if don't exit? Then the user/developer can check
using info/query migrate.
> }
>
> /**
> @@ -1785,7 +1792,9 @@ void migrate_del_blocker(Error **reasonp)
> }
>
> void qmp_migrate_incoming(const char *uri, bool has_channels,
> - MigrationChannelList *channels, Error **errp)
> + MigrationChannelList *channels,
> + bool has_exit_on_error, bool exit_on_error,
> + Error **errp)
> {
> Error *local_err = NULL;
> static bool once = true;
> @@ -1803,7 +1812,9 @@ void qmp_migrate_incoming(const char *uri, bool has_channels,
> return;
> }
>
> - qemu_start_incoming_migration(uri, has_channels, channels, &local_err);
> + qemu_start_incoming_migration(uri, has_channels, channels,
> + has_exit_on_error ? exit_on_error : true,
[1]
> + &local_err);
>
> if (local_err) {
> yank_unregister_instance(MIGRATION_YANK_INSTANCE);
> @@ -1839,7 +1850,7 @@ void qmp_migrate_recover(const char *uri, Error **errp)
> * only re-setup the migration stream and poke existing migration
> * to continue using that newly established channel.
> */
> - qemu_start_incoming_migration(uri, false, NULL, errp);
> + qemu_start_incoming_migration(uri, false, NULL, mis->exit_on_error, errp);
(if we set it in qmp_migrate_incoming, we shouldn't need this change)
Thanks,
> }
>
> void qmp_migrate_pause(Error **errp)
> diff --git a/migration/migration.h b/migration/migration.h
> index f9f436c33e..089f710ee1 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -227,6 +227,9 @@ struct MigrationIncomingState {
> * is needed as this field is updated serially.
> */
> unsigned int switchover_ack_pending_num;
> +
> + /* Do exit on incoming migration failure */
> + bool exit_on_error;
> };
>
> MigrationIncomingState *migration_incoming_get_current(void);
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8c65b90328..9de8b98d0b 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1837,6 +1837,10 @@
> # @channels: list of migration stream channels with each stream in the
> # list connected to a destination interface endpoint.
> #
> +# @exit-on-error: Do exit on incoming migration failure. Default true.
> +# When set to false, the error is reported by MIGRATION event and
> +# error could be retrieved by query-migrate command. (since 9.1)
> +#
> # Since: 2.3
> #
> # Notes:
> @@ -1889,7 +1893,8 @@
> ##
> { 'command': 'migrate-incoming',
> 'data': {'*uri': 'str',
> - '*channels': [ 'MigrationChannel' ] } }
> + '*channels': [ 'MigrationChannel' ],
> + '*exit-on-error': 'bool' } }
>
> ##
> # @xen-save-devices-state:
> diff --git a/system/vl.c b/system/vl.c
> index c644222982..f5b4c18200 100644
> --- a/system/vl.c
> +++ b/system/vl.c
> @@ -2718,13 +2718,8 @@ void qmp_x_exit_preconfig(Error **errp)
> }
>
> if (incoming) {
> - Error *local_err = NULL;
> if (strcmp(incoming, "defer") != 0) {
> - qmp_migrate_incoming(incoming, false, NULL, &local_err);
> - if (local_err) {
> - error_reportf_err(local_err, "-incoming %s: ", incoming);
> - exit(1);
> - }
> + qmp_migrate_incoming(incoming, false, NULL, true, true, errp);
> }
> } else if (autostart) {
> qmp_cont(NULL);
> --
> 2.34.1
>
--
Peter Xu
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] migration: rework migrate_set_error() to migrate_report_err()
2024-04-24 21:52 ` Peter Xu
@ 2024-04-25 8:25 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-25 8:25 UTC (permalink / raw)
To: Peter Xu; +Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core
On 25.04.24 00:52, Peter Xu wrote:
> On Wed, Apr 24, 2024 at 10:50:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 24.04.24 22:40, Peter Xu wrote:
>>> On Wed, Apr 24, 2024 at 08:42:44PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> 1. Most of callers want to free the error after call. Let's help them.
>>>>
>>>> 2. Some callers log the error, some not. We also have places where we
>>>> log the stored error. Let's instead simply log every migration
>>>> error.
>>>>
>>>> 3. Some callers have to retrieve current migration state only to pass
>>>> it to migrate_set_error(). In the new helper let's get the state
>>>> automatically.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>> ---
>>>> migration/migration.c | 48 ++++++++++++----------------------------
>>>> migration/migration.h | 2 +-
>>>> migration/multifd.c | 18 ++++++---------
>>>> migration/postcopy-ram.c | 3 +--
>>>> migration/savevm.c | 16 +++++---------
>>>> 5 files changed, 28 insertions(+), 59 deletions(-)
>>>>
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index 696762bc64..806b7b080b 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -285,7 +285,7 @@ void migration_bh_schedule(QEMUBHFunc *cb, void *opaque)
>>>> void migration_cancel(const Error *error)
>>>> {
>>>> if (error) {
>>>> - migrate_set_error(current_migration, error);
>>>> + migrate_report_err(error_copy(error));
>>>> }
>>>> if (migrate_dirty_limit()) {
>>>> qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
>>>> @@ -779,13 +779,6 @@ process_incoming_migration_co(void *opaque)
>>>> }
>>>> if (ret < 0) {
>>>> - MigrationState *s = migrate_get_current();
>>>> -
>>>> - if (migrate_has_error(s)) {
>>>> - WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>>>> - error_report_err(s->error);
>>>> - }
>>>> - }
>>>> error_report("load of migration failed: %s", strerror(-ret));
>>>> goto fail;
>>>> }
>>>> @@ -1402,10 +1395,6 @@ static void migrate_fd_cleanup(MigrationState *s)
>>>> MIGRATION_STATUS_CANCELLED);
>>>> }
>>>> - if (s->error) {
>>>> - /* It is used on info migrate. We can't free it */
>>>> - error_report_err(error_copy(s->error));
>>>> - }
>>>> type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
>>>> MIG_EVENT_PRECOPY_DONE;
>>>> migration_call_notifiers(s, type, NULL);
>>>> @@ -1418,12 +1407,14 @@ static void migrate_fd_cleanup_bh(void *opaque)
>>>> migrate_fd_cleanup(opaque);
>>>> }
>>>> -void migrate_set_error(MigrationState *s, const Error *error)
>>>> +void migrate_report_err(Error *error)
>>>> {
>>>> + MigrationState *s = migrate_get_current();
>>>
>>> Avoid passing in *s looks ok.
>>>
>>>> QEMU_LOCK_GUARD(&s->error_mutex);
>>>> if (!s->error) {
>>>> s->error = error_copy(error);
>>>
>>> I think I get your point, but then looks like this error_copy() should be
>>> removed but forgotten?
>>>
>>> I remember I had an attempt to do similarly (but only transfer the
>>> ownership):
>>>
>>> https://lore.kernel.org/qemu-devel/20230829214235.69309-3-peterx@redhat.com/
>>>
>>> However I gave up later and I forgot why. One reason could be that I hit a
>>> use-after-free, then found that well indeed leaking an Error is much better
>>> than debugging a UAF.
>>
>> Hmm, yes I saw a leaked Error somewhere, and this patch should fix it. But may be I missed introducing this use-after-free again)
>
> Ah do you mean you fixed a bug? If so please use a standalone patch for
> that and we'll need to copy stable.
>
Hmm, this hunk:
@@ -1287,7 +1283,7 @@ static void multifd_recv_terminate_threads(Error *err)
if (err) {
MigrationState *s = migrate_get_current();
- migrate_set_error(s, err);
+ migrate_report_err(error_copy(err));
if (s->state == MIGRATION_STATUS_SETUP ||
s->state == MIGRATION_STATUS_ACTIVE) {
migrate_set_state(&s->state, s->state,
No, seems I don't fix bug, but exactly introduce use-after-free :). Will fix.
> I did notice that in this series on patch looks like does more than one
> thing. It'll be helpful too for reviewers when patch can be split
> properly.
Agree, OK
>
>>
>>>
>>> So maybe we simply keep it like before? If you like such change, let's
>>> just be extremely caucious.
>>>
>>>> }
>>>> + error_report_err(error);
>>>
>>> The ideal case to me is we don't trigger an error_report() all over the
>>> place. We're slightly going backward from that regard, IMHO..
>>>
>>> Ideally I think we shouldn't dump anything to stderr, but user should
>>> always query from qmp.
>>>
>>
>> Sounds reasonable to me. I'm just unsure, if keep it like before, how
>> should I correctly update logging to stderr in
>> process_incoming_migration_co(). Could we just drop error reporting to
>> stderr? Or should it be kept as is for the case when exit_on_error is
>> true?
>
> I'm not sure I get the question, but I'll comment in patch 2 very soon, so
> we can move the discussion there.
>
> Thanks,
>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-04-25 8:26 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-24 17:42 [PATCH v2 0/2] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
2024-04-24 17:42 ` [PATCH v2 1/2] migration: rework migrate_set_error() to migrate_report_err() Vladimir Sementsov-Ogievskiy
2024-04-24 19:40 ` Peter Xu
2024-04-24 19:50 ` Vladimir Sementsov-Ogievskiy
2024-04-24 21:52 ` Peter Xu
2024-04-25 8:25 ` Vladimir Sementsov-Ogievskiy
2024-04-24 17:42 ` [PATCH v2 2/2] qapi: introduce exit-on-error paramter for migrate-incoming Vladimir Sementsov-Ogievskiy
2024-04-24 22:02 ` Peter Xu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).