* [PATCH v5 0/5] migration: do not exit on incoming failure
@ 2024-04-29 19:14 Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error Vladimir Sementsov-Ogievskiy
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw)
To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core
Hi all!
The series brings an option to not immediately exit on incoming
migration failure, giving a possibility to orchestrator to get the error
through QAPI and shutdown QEMU by "quit".
v5:
- add "migration: process_incoming_migration_co(): fix reporting s->error"
v4:
- add r-b and a-b by Fabiano and Markus
- improve wording in 04 as Markus suggested
v3:
- don't refactor the whole code around setting migration error, it seems
too much and necessary for the new feature itself
- add constant
- change behavior for HMP command
- split some things to separate patches
- and more, by Peter's suggestions
New behavior can be demonstrated like this:
bash:
(
cat <<EOF
{'execute': 'qmp_capabilities'}
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [{'capability': 'events', 'state': true}]}}
{'execute': 'migrate-incoming', 'arguments': {'uri': 'exec:echo x', 'exit-on-error': false}}
EOF
sleep 1
cat <<EOF
{'execute': 'query-migrate'}
{'execute': 'quit'}
EOF
) | ./build/qemu-system-x86_64 -incoming 'defer' -qmp stdio -nographic -nodefaults
output:
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 9}, "package": "v9.0.0-149-gb6295ad58c"}, "capabilities": ["oob"]}}
{"return": {}}
{"return": {}}
{"timestamp": {"seconds": 1714068847, "microseconds": 263907}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"timestamp": {"seconds": 1714068847, "microseconds": 266696}, "event": "MIGRATION", "data": {"status": "active"}}
qemu-system-x86_64: Not a migration stream
{"timestamp": {"seconds": 1714068847, "microseconds": 266766}, "event": "MIGRATION", "data": {"status": "failed"}}
{"return": {"status": "failed", "error-desc": "load of migration failed: Invalid argument"}}
{"timestamp": {"seconds": 1714068848, "microseconds": 237187}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}
Vladimir Sementsov-Ogievskiy (5):
migration: move trace-point from migrate_fd_error to migrate_set_error
migration: process_incoming_migration_co(): complete cleanup on
failure
migration: process_incoming_migration_co(): fix reporting s->error
migration: process_incoming_migration_co(): rework error reporting
qapi: introduce exit-on-error parameter for migrate-incoming
migration/migration-hmp-cmds.c | 2 +-
migration/migration.c | 76 +++++++++++++++++++++++-----------
migration/migration.h | 3 ++
migration/trace-events | 2 +-
qapi/migration.json | 7 +++-
system/vl.c | 3 +-
6 files changed, 64 insertions(+), 29 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 14+ messages in thread* [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error 2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy 2024-04-29 19:14 ` [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure Vladimir Sementsov-Ogievskiy ` (3 subsequent siblings) 4 siblings, 0 replies; 14+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw) To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core Cover more cases by trace-point. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Fabiano Rosas <farosas@suse.de> --- migration/migration.c | 4 +++- migration/trace-events | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index b5af6b5105..2dc6a063e9 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1421,6 +1421,9 @@ static void migrate_fd_cleanup_bh(void *opaque) void migrate_set_error(MigrationState *s, const Error *error) { QEMU_LOCK_GUARD(&s->error_mutex); + + trace_migrate_error(error_get_pretty(error)); + if (!s->error) { s->error = error_copy(error); } @@ -1444,7 +1447,6 @@ static void migrate_error_free(MigrationState *s) static void migrate_fd_error(MigrationState *s, const Error *error) { - trace_migrate_fd_error(error_get_pretty(error)); assert(s->to_dst_file == NULL); migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_FAILED); diff --git a/migration/trace-events b/migration/trace-events index f0e1cb80c7..d0c44c3853 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -152,7 +152,7 @@ multifd_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostnam # migration.c migrate_set_state(const char *new_state) "new state %s" migrate_fd_cleanup(void) "" -migrate_fd_error(const char *error_desc) "error=%s" +migrate_error(const char *error_desc) "error=%s" migrate_fd_cancel(void) "" migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx" migrate_pending_exact(uint64_t size, uint64_t pre, uint64_t post) "exact pending size %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")" -- 2.34.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure 2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy 2024-04-29 19:14 ` [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy 2024-04-29 19:22 ` Peter Xu 2024-04-29 19:14 ` [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error Vladimir Sementsov-Ogievskiy ` (2 subsequent siblings) 4 siblings, 1 reply; 14+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw) To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core Make call to migration_incoming_state_destroy(), instead of doing only partial of it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Fabiano Rosas <farosas@suse.de> --- migration/migration.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 2dc6a063e9..0d26db47f7 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -799,10 +799,7 @@ process_incoming_migration_co(void *opaque) fail: migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_FAILED); - qemu_fclose(mis->from_src_file); - - multifd_recv_cleanup(); - compress_threads_load_cleanup(); + migration_incoming_state_destroy(); exit(EXIT_FAILURE); } -- 2.34.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure 2024-04-29 19:14 ` [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:22 ` Peter Xu 0 siblings, 0 replies; 14+ messages in thread From: Peter Xu @ 2024-04-29 19:22 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core On Mon, Apr 29, 2024 at 10:14:23PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Make call to migration_incoming_state_destroy(), instead of doing only > partial of it. > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> -- Peter Xu ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error 2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy 2024-04-29 19:14 ` [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error Vladimir Sementsov-Ogievskiy 2024-04-29 19:14 ` [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy 2024-04-29 19:32 ` Peter Xu 2024-04-29 19:14 ` [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting Vladimir Sementsov-Ogievskiy 2024-04-29 19:14 ` [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming Vladimir Sementsov-Ogievskiy 4 siblings, 1 reply; 14+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw) To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core It's bad idea to leave critical section with error object freed, but s->error still set, this theoretically may lead to use-after-free crash. Let's avoid it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> --- migration/migration.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 0d26db47f7..58fd5819bc 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque) migration_incoming_state_destroy(); } +static void migrate_error_free(MigrationState *s) +{ + QEMU_LOCK_GUARD(&s->error_mutex); + if (s->error) { + error_free(s->error); + s->error = NULL; + } +} + static void coroutine_fn process_incoming_migration_co(void *opaque) { + MigrationState *s = migrate_get_current(); MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { - MigrationState *s = migrate_get_current(); - if (migrate_has_error(s)) { WITH_QEMU_LOCK_GUARD(&s->error_mutex) { - error_report_err(s->error); + error_report_err(error_copy(s->error)); } } error_report("load of migration failed: %s", strerror(-ret)); @@ -801,6 +809,7 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); + migrate_error_free(s); exit(EXIT_FAILURE); } @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s) return qatomic_read(&s->error); } -static void migrate_error_free(MigrationState *s) -{ - QEMU_LOCK_GUARD(&s->error_mutex); - if (s->error) { - error_free(s->error); - s->error = NULL; - } -} - static void migrate_fd_error(MigrationState *s, const Error *error) { assert(s->to_dst_file == NULL); -- 2.34.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error 2024-04-29 19:14 ` [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:32 ` Peter Xu 2024-04-30 8:06 ` Vladimir Sementsov-Ogievskiy 0 siblings, 1 reply; 14+ messages in thread From: Peter Xu @ 2024-04-29 19:32 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote: > It's bad idea to leave critical section with error object freed, but > s->error still set, this theoretically may lead to use-after-free > crash. Let's avoid it. > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > --- > migration/migration.c | 24 ++++++++++++------------ > 1 file changed, 12 insertions(+), 12 deletions(-) > > diff --git a/migration/migration.c b/migration/migration.c > index 0d26db47f7..58fd5819bc 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque) > migration_incoming_state_destroy(); > } > > +static void migrate_error_free(MigrationState *s) > +{ > + QEMU_LOCK_GUARD(&s->error_mutex); > + if (s->error) { > + error_free(s->error); > + s->error = NULL; > + } > +} > + > static void coroutine_fn > process_incoming_migration_co(void *opaque) > { > + MigrationState *s = migrate_get_current(); > MigrationIncomingState *mis = migration_incoming_get_current(); > PostcopyState ps; > int ret; > @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque) > } > > if (ret < 0) { > - MigrationState *s = migrate_get_current(); > - > if (migrate_has_error(s)) { > WITH_QEMU_LOCK_GUARD(&s->error_mutex) { > - error_report_err(s->error); > + error_report_err(error_copy(s->error)); This looks like a bugfix, agreed. > } > } > error_report("load of migration failed: %s", strerror(-ret)); > @@ -801,6 +809,7 @@ fail: > MIGRATION_STATUS_FAILED); > migration_incoming_state_destroy(); > > + migrate_error_free(s); Would migration_incoming_state_destroy() be a better place? One thing weird is we actually reuses MigrationState*'s error for incoming too, but so far it looks ok as long as QEMU can't be both src & dst. Then calling migrate_error_free even in incoming_state_destroy() looks ok. This patch still looks like containing two changes. Better split them (or just fix the bug only)? Thanks, > exit(EXIT_FAILURE); > } > > @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s) > return qatomic_read(&s->error); > } > > -static void migrate_error_free(MigrationState *s) > -{ > - QEMU_LOCK_GUARD(&s->error_mutex); > - if (s->error) { > - error_free(s->error); > - s->error = NULL; > - } > -} > - > static void migrate_fd_error(MigrationState *s, const Error *error) > { > assert(s->to_dst_file == NULL); > -- > 2.34.1 > -- Peter Xu ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error 2024-04-29 19:32 ` Peter Xu @ 2024-04-30 8:06 ` Vladimir Sementsov-Ogievskiy 2024-04-30 8:09 ` Vladimir Sementsov-Ogievskiy 0 siblings, 1 reply; 14+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2024-04-30 8:06 UTC (permalink / raw) To: Peter Xu; +Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core On 29.04.24 22:32, Peter Xu wrote: > On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote: >> It's bad idea to leave critical section with error object freed, but >> s->error still set, this theoretically may lead to use-after-free >> crash. Let's avoid it. >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> >> --- >> migration/migration.c | 24 ++++++++++++------------ >> 1 file changed, 12 insertions(+), 12 deletions(-) >> >> diff --git a/migration/migration.c b/migration/migration.c >> index 0d26db47f7..58fd5819bc 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque) >> migration_incoming_state_destroy(); >> } >> >> +static void migrate_error_free(MigrationState *s) >> +{ >> + QEMU_LOCK_GUARD(&s->error_mutex); >> + if (s->error) { >> + error_free(s->error); >> + s->error = NULL; >> + } >> +} >> + >> static void coroutine_fn >> process_incoming_migration_co(void *opaque) >> { >> + MigrationState *s = migrate_get_current(); >> MigrationIncomingState *mis = migration_incoming_get_current(); >> PostcopyState ps; >> int ret; >> @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque) >> } >> >> if (ret < 0) { >> - MigrationState *s = migrate_get_current(); >> - >> if (migrate_has_error(s)) { >> WITH_QEMU_LOCK_GUARD(&s->error_mutex) { >> - error_report_err(s->error); >> + error_report_err(error_copy(s->error)); > > This looks like a bugfix, agreed. > >> } >> } >> error_report("load of migration failed: %s", strerror(-ret)); >> @@ -801,6 +809,7 @@ fail: >> MIGRATION_STATUS_FAILED); >> migration_incoming_state_destroy(); >> >> + migrate_error_free(s); > > Would migration_incoming_state_destroy() be a better place? Hmm. But we want to call migration_incoming_state_destroy() in case when exit-on-error=false too. And in this case we want to keep the error for further query-migrate commands. > > One thing weird is we actually reuses MigrationState*'s error for incoming > too, but so far it looks ok as long as QEMU can't be both src & dst. Then > calling migrate_error_free even in incoming_state_destroy() looks ok. > > This patch still looks like containing two changes. Better split them (or > just fix the bug only)? > > Thanks, > >> exit(EXIT_FAILURE); >> } >> >> @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s) >> return qatomic_read(&s->error); >> } >> >> -static void migrate_error_free(MigrationState *s) >> -{ >> - QEMU_LOCK_GUARD(&s->error_mutex); >> - if (s->error) { >> - error_free(s->error); >> - s->error = NULL; >> - } >> -} >> - >> static void migrate_fd_error(MigrationState *s, const Error *error) >> { >> assert(s->to_dst_file == NULL); >> -- >> 2.34.1 >> > -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error 2024-04-30 8:06 ` Vladimir Sementsov-Ogievskiy @ 2024-04-30 8:09 ` Vladimir Sementsov-Ogievskiy 2024-04-30 8:47 ` Vladimir Sementsov-Ogievskiy 0 siblings, 1 reply; 14+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2024-04-30 8:09 UTC (permalink / raw) To: Peter Xu; +Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core On 30.04.24 11:06, Vladimir Sementsov-Ogievskiy wrote: > On 29.04.24 22:32, Peter Xu wrote: >> On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote: >>> It's bad idea to leave critical section with error object freed, but >>> s->error still set, this theoretically may lead to use-after-free >>> crash. Let's avoid it. >>> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> >>> --- >>> migration/migration.c | 24 ++++++++++++------------ >>> 1 file changed, 12 insertions(+), 12 deletions(-) >>> >>> diff --git a/migration/migration.c b/migration/migration.c >>> index 0d26db47f7..58fd5819bc 100644 >>> --- a/migration/migration.c >>> +++ b/migration/migration.c >>> @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque) >>> migration_incoming_state_destroy(); >>> } >>> +static void migrate_error_free(MigrationState *s) >>> +{ >>> + QEMU_LOCK_GUARD(&s->error_mutex); >>> + if (s->error) { >>> + error_free(s->error); >>> + s->error = NULL; >>> + } >>> +} >>> + >>> static void coroutine_fn >>> process_incoming_migration_co(void *opaque) >>> { >>> + MigrationState *s = migrate_get_current(); >>> MigrationIncomingState *mis = migration_incoming_get_current(); >>> PostcopyState ps; >>> int ret; >>> @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque) >>> } >>> if (ret < 0) { >>> - MigrationState *s = migrate_get_current(); >>> - >>> if (migrate_has_error(s)) { >>> WITH_QEMU_LOCK_GUARD(&s->error_mutex) { >>> - error_report_err(s->error); >>> + error_report_err(error_copy(s->error)); >> >> This looks like a bugfix, agreed. >> >>> } >>> } >>> error_report("load of migration failed: %s", strerror(-ret)); >>> @@ -801,6 +809,7 @@ fail: >>> MIGRATION_STATUS_FAILED); >>> migration_incoming_state_destroy(); >>> + migrate_error_free(s); >> >> Would migration_incoming_state_destroy() be a better place? > > Hmm. But we want to call migration_incoming_state_destroy() in case when exit-on-error=false too. And in this case we want to keep the error for further query-migrate commands. Actually, I think we shouldn't care about freeing the error for exit() case. I think, we skip a lot of other cleanups which we normally do in main() in case of this exit(). > >> >> One thing weird is we actually reuses MigrationState*'s error for incoming >> too, but so far it looks ok as long as QEMU can't be both src & dst. Then >> calling migrate_error_free even in incoming_state_destroy() looks ok. >> >> This patch still looks like containing two changes. Better split them (or >> just fix the bug only)? >> >> Thanks, >> >>> exit(EXIT_FAILURE); >>> } >>> @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s) >>> return qatomic_read(&s->error); >>> } >>> -static void migrate_error_free(MigrationState *s) >>> -{ >>> - QEMU_LOCK_GUARD(&s->error_mutex); >>> - if (s->error) { >>> - error_free(s->error); >>> - s->error = NULL; >>> - } >>> -} >>> - >>> static void migrate_fd_error(MigrationState *s, const Error *error) >>> { >>> assert(s->to_dst_file == NULL); >>> -- >>> 2.34.1 >>> >> > -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error 2024-04-30 8:09 ` Vladimir Sementsov-Ogievskiy @ 2024-04-30 8:47 ` Vladimir Sementsov-Ogievskiy 0 siblings, 0 replies; 14+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2024-04-30 8:47 UTC (permalink / raw) To: Peter Xu; +Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core On 30.04.24 11:09, Vladimir Sementsov-Ogievskiy wrote: > On 30.04.24 11:06, Vladimir Sementsov-Ogievskiy wrote: >> On 29.04.24 22:32, Peter Xu wrote: >>> On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote: >>>> It's bad idea to leave critical section with error object freed, but >>>> s->error still set, this theoretically may lead to use-after-free >>>> crash. Let's avoid it. >>>> >>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> >>>> --- >>>> migration/migration.c | 24 ++++++++++++------------ >>>> 1 file changed, 12 insertions(+), 12 deletions(-) >>>> >>>> diff --git a/migration/migration.c b/migration/migration.c >>>> index 0d26db47f7..58fd5819bc 100644 >>>> --- a/migration/migration.c >>>> +++ b/migration/migration.c >>>> @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque) >>>> migration_incoming_state_destroy(); >>>> } >>>> +static void migrate_error_free(MigrationState *s) >>>> +{ >>>> + QEMU_LOCK_GUARD(&s->error_mutex); >>>> + if (s->error) { >>>> + error_free(s->error); >>>> + s->error = NULL; >>>> + } >>>> +} >>>> + >>>> static void coroutine_fn >>>> process_incoming_migration_co(void *opaque) >>>> { >>>> + MigrationState *s = migrate_get_current(); >>>> MigrationIncomingState *mis = migration_incoming_get_current(); >>>> PostcopyState ps; >>>> int ret; >>>> @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque) >>>> } >>>> if (ret < 0) { >>>> - MigrationState *s = migrate_get_current(); >>>> - >>>> if (migrate_has_error(s)) { >>>> WITH_QEMU_LOCK_GUARD(&s->error_mutex) { >>>> - error_report_err(s->error); >>>> + error_report_err(error_copy(s->error)); >>> >>> This looks like a bugfix, agreed. >>> >>>> } >>>> } >>>> error_report("load of migration failed: %s", strerror(-ret)); >>>> @@ -801,6 +809,7 @@ fail: >>>> MIGRATION_STATUS_FAILED); >>>> migration_incoming_state_destroy(); >>>> + migrate_error_free(s); >>> >>> Would migration_incoming_state_destroy() be a better place? >> >> Hmm. But we want to call migration_incoming_state_destroy() in case when exit-on-error=false too. And in this case we want to keep the error for further query-migrate commands. > > Actually, I think we shouldn't care about freeing the error for exit() case. I think, we skip a lot of other cleanups which we normally do in main() in case of this exit(). > Or, just do the simplest fix of potential use-after-free, and don't care: WITH_QEMU_LOCK_GUARD(&s->error_mutex) { error_report_err(s->error); s->error = NULL; } - I'll send v6 with it. >> >>> >>> One thing weird is we actually reuses MigrationState*'s error for incoming >>> too, but so far it looks ok as long as QEMU can't be both src & dst. Then >>> calling migrate_error_free even in incoming_state_destroy() looks ok. >>> >>> This patch still looks like containing two changes. Better split them (or >>> just fix the bug only)? >>> >>> Thanks, >>> >>>> exit(EXIT_FAILURE); >>>> } >>>> @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s) >>>> return qatomic_read(&s->error); >>>> } >>>> -static void migrate_error_free(MigrationState *s) >>>> -{ >>>> - QEMU_LOCK_GUARD(&s->error_mutex); >>>> - if (s->error) { >>>> - error_free(s->error); >>>> - s->error = NULL; >>>> - } >>>> -} >>>> - >>>> static void migrate_fd_error(MigrationState *s, const Error *error) >>>> { >>>> assert(s->to_dst_file == NULL); >>>> -- >>>> 2.34.1 >>>> >>> >> > -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting 2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy ` (2 preceding siblings ...) 2024-04-29 19:14 ` [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy 2024-04-29 19:39 ` Peter Xu 2024-04-29 19:14 ` [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming Vladimir Sementsov-Ogievskiy 4 siblings, 1 reply; 14+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw) To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core Unify error reporting in the function. This simplifies the following commit, which will not-exit-on-error behavior variant to the function. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Fabiano Rosas <farosas@suse.de> --- migration/migration.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 58fd5819bc..5489ff96df 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -748,11 +748,12 @@ process_incoming_migration_co(void *opaque) MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; + Error *local_err = NULL; assert(mis->from_src_file); if (compress_threads_load_setup(mis->from_src_file)) { - error_report("Failed to setup decompress threads"); + error_setg(&local_err, "Failed to setup decompress threads"); goto fail; } @@ -789,16 +790,12 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { - if (migrate_has_error(s)) { - WITH_QEMU_LOCK_GUARD(&s->error_mutex) { - error_report_err(error_copy(s->error)); - } - } - error_report("load of migration failed: %s", strerror(-ret)); + error_setg(&local_err, "load of migration failed: %s", strerror(-ret)); goto fail; } if (colo_incoming_co() < 0) { + error_setg(&local_err, "colo incoming failed"); goto fail; } @@ -809,6 +806,12 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); + if (migrate_has_error(s)) { + WITH_QEMU_LOCK_GUARD(&s->error_mutex) { + error_report_err(error_copy(s->error)); + } + } + error_report_err(local_err); migrate_error_free(s); exit(EXIT_FAILURE); } -- 2.34.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting 2024-04-29 19:14 ` [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:39 ` Peter Xu 2024-04-30 7:12 ` Vladimir Sementsov-Ogievskiy 0 siblings, 1 reply; 14+ messages in thread From: Peter Xu @ 2024-04-29 19:39 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core On Mon, Apr 29, 2024 at 10:14:25PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Unify error reporting in the function. This simplifies the following > commit, which will not-exit-on-error behavior variant to the function. > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > Reviewed-by: Fabiano Rosas <farosas@suse.de> > --- > migration/migration.c | 17 ++++++++++------- > 1 file changed, 10 insertions(+), 7 deletions(-) > > diff --git a/migration/migration.c b/migration/migration.c > index 58fd5819bc..5489ff96df 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -748,11 +748,12 @@ process_incoming_migration_co(void *opaque) > MigrationIncomingState *mis = migration_incoming_get_current(); > PostcopyState ps; > int ret; > + Error *local_err = NULL; > > assert(mis->from_src_file); > > if (compress_threads_load_setup(mis->from_src_file)) { > - error_report("Failed to setup decompress threads"); > + error_setg(&local_err, "Failed to setup decompress threads"); > goto fail; > } > > @@ -789,16 +790,12 @@ process_incoming_migration_co(void *opaque) > } > > if (ret < 0) { > - if (migrate_has_error(s)) { > - WITH_QEMU_LOCK_GUARD(&s->error_mutex) { > - error_report_err(error_copy(s->error)); > - } > - } > - error_report("load of migration failed: %s", strerror(-ret)); > + error_setg(&local_err, "load of migration failed: %s", strerror(-ret)); > goto fail; > } > > if (colo_incoming_co() < 0) { > + error_setg(&local_err, "colo incoming failed"); > goto fail; > } > > @@ -809,6 +806,12 @@ fail: > MIGRATION_STATUS_FAILED); > migration_incoming_state_destroy(); > > + if (migrate_has_error(s)) { > + WITH_QEMU_LOCK_GUARD(&s->error_mutex) { > + error_report_err(error_copy(s->error)); > + } > + } > + error_report_err(local_err); Here migrate_has_error() will always return true? If so we can drop it. Meanwhile, IMHO it's easier we simply always keep the earliest error we see and report that only, local_err is just one of the errors and whoever reaches first will be reported. Something like: migrate_set_error(local_err); WITH_QEMU_LOCK_GUARD(&s->error_mutex) { error_report_err(error_copy(s->error)); } exit(EXIT_FAILURE); Then when with the exit-on-error thing: migrate_set_error(local_err); if (exit_on_error) { WITH_QEMU_LOCK_GUARD(&s->error_mutex) { error_report_err(error_copy(s->error)); } exit(EXIT_FAILURE); } Would this looks slightly cleaner? -- Peter Xu ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting 2024-04-29 19:39 ` Peter Xu @ 2024-04-30 7:12 ` Vladimir Sementsov-Ogievskiy 0 siblings, 0 replies; 14+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2024-04-30 7:12 UTC (permalink / raw) To: Peter Xu; +Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core On 29.04.24 22:39, Peter Xu wrote: > On Mon, Apr 29, 2024 at 10:14:25PM +0300, Vladimir Sementsov-Ogievskiy wrote: >> Unify error reporting in the function. This simplifies the following >> commit, which will not-exit-on-error behavior variant to the function. >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> >> Reviewed-by: Fabiano Rosas <farosas@suse.de> >> --- >> migration/migration.c | 17 ++++++++++------- >> 1 file changed, 10 insertions(+), 7 deletions(-) >> >> diff --git a/migration/migration.c b/migration/migration.c >> index 58fd5819bc..5489ff96df 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -748,11 +748,12 @@ process_incoming_migration_co(void *opaque) >> MigrationIncomingState *mis = migration_incoming_get_current(); >> PostcopyState ps; >> int ret; >> + Error *local_err = NULL; >> >> assert(mis->from_src_file); >> >> if (compress_threads_load_setup(mis->from_src_file)) { >> - error_report("Failed to setup decompress threads"); >> + error_setg(&local_err, "Failed to setup decompress threads"); >> goto fail; >> } >> >> @@ -789,16 +790,12 @@ process_incoming_migration_co(void *opaque) >> } >> >> if (ret < 0) { >> - if (migrate_has_error(s)) { >> - WITH_QEMU_LOCK_GUARD(&s->error_mutex) { >> - error_report_err(error_copy(s->error)); >> - } >> - } >> - error_report("load of migration failed: %s", strerror(-ret)); >> + error_setg(&local_err, "load of migration failed: %s", strerror(-ret)); >> goto fail; >> } >> >> if (colo_incoming_co() < 0) { >> + error_setg(&local_err, "colo incoming failed"); >> goto fail; >> } >> >> @@ -809,6 +806,12 @@ fail: >> MIGRATION_STATUS_FAILED); >> migration_incoming_state_destroy(); >> >> + if (migrate_has_error(s)) { >> + WITH_QEMU_LOCK_GUARD(&s->error_mutex) { >> + error_report_err(error_copy(s->error)); >> + } >> + } >> + error_report_err(local_err); > > Here migrate_has_error() will always return true? If so we can drop it. > > Meanwhile, IMHO it's easier we simply always keep the earliest error we see > and report that only, local_err is just one of the errors and whoever > reaches first will be reported. Something like: > > migrate_set_error(local_err); > WITH_QEMU_LOCK_GUARD(&s->error_mutex) { > error_report_err(error_copy(s->error)); > } > exit(EXIT_FAILURE); > > Then when with the exit-on-error thing: > > migrate_set_error(local_err); > if (exit_on_error) { > WITH_QEMU_LOCK_GUARD(&s->error_mutex) { > error_report_err(error_copy(s->error)); > } > exit(EXIT_FAILURE); > } > > Would this looks slightly cleaner? > Yes, looks better, will do so -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming 2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy ` (3 preceding siblings ...) 2024-04-29 19:14 ` [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy 2024-04-29 19:41 ` Peter Xu 4 siblings, 1 reply; 14+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw) To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core Now we do set MIGRATION_FAILED state, but don't give a chance to orchestrator to query migration state and get the error. Let's provide a possibility for QMP-based orchestrators to get an error like with outgoing migration. For hmp_migrate_incoming(), let's enable the new behavior: HMP is not and ABI, it's mostly intended to use by developer and it makes sense not to stop the process. For x-exit-preconfig, let's keep the old behavior: - it's called from init(), so here we want to keep current behavior by default - it does exit on error by itself as well So, if we want to change the behavior of x-exit-preconfig, it should be another patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Acked-by: Markus Armbruster <armbru@redhat.com> --- migration/migration-hmp-cmds.c | 2 +- migration/migration.c | 38 +++++++++++++++++++++++++++------- migration/migration.h | 3 +++ qapi/migration.json | 7 ++++++- system/vl.c | 3 ++- 5 files changed, 43 insertions(+), 10 deletions(-) diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c index 7e96ae6ffd..23181bbee1 100644 --- a/migration/migration-hmp-cmds.c +++ b/migration/migration-hmp-cmds.c @@ -466,7 +466,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict) } QAPI_LIST_PREPEND(caps, g_steal_pointer(&channel)); - qmp_migrate_incoming(NULL, true, caps, &err); + qmp_migrate_incoming(NULL, true, caps, true, false, &err); qapi_free_MigrationChannelList(caps); end: diff --git a/migration/migration.c b/migration/migration.c index 5489ff96df..09f762c99e 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -72,6 +72,8 @@ #define NOTIFIER_ELEM_INIT(array, elem) \ [elem] = NOTIFIER_WITH_RETURN_LIST_INITIALIZER((array)[elem]) +#define INMIGRATE_DEFAULT_EXIT_ON_ERROR true + static NotifierWithReturnList migration_state_notifiers[] = { NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL), NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT), @@ -234,6 +236,8 @@ void migration_object_init(void) qemu_cond_init(¤t_incoming->page_request_cond); current_incoming->page_requested = g_tree_new(page_request_addr_cmp); + current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR; + migration_object_check(current_migration, &error_fatal); blk_mig_init(); @@ -806,14 +810,19 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); - if (migrate_has_error(s)) { - WITH_QEMU_LOCK_GUARD(&s->error_mutex) { - error_report_err(error_copy(s->error)); + if (mis->exit_on_error) { + if (migrate_has_error(s)) { + WITH_QEMU_LOCK_GUARD(&s->error_mutex) { + error_report_err(error_copy(s->error)); + } } + error_report_err(local_err); + migrate_error_free(s); + exit(EXIT_FAILURE); + } else { + migrate_set_error(s, local_err); + error_free(local_err); } - error_report_err(local_err); - migrate_error_free(s); - exit(EXIT_FAILURE); } /** @@ -1322,6 +1331,15 @@ static void fill_destination_migration_info(MigrationInfo *info) break; } info->status = mis->state; + + if (!info->error_desc) { + MigrationState *s = migrate_get_current(); + QEMU_LOCK_GUARD(&s->error_mutex); + + if (s->error) { + info->error_desc = g_strdup(error_get_pretty(s->error)); + } + } } MigrationInfo *qmp_query_migrate(Error **errp) @@ -1796,10 +1814,13 @@ void migrate_del_blocker(Error **reasonp) } void qmp_migrate_incoming(const char *uri, bool has_channels, - MigrationChannelList *channels, Error **errp) + MigrationChannelList *channels, + bool has_exit_on_error, bool exit_on_error, + Error **errp) { Error *local_err = NULL; static bool once = true; + MigrationIncomingState *mis = migration_incoming_get_current(); if (!once) { error_setg(errp, "The incoming migration has already been started"); @@ -1814,6 +1835,9 @@ void qmp_migrate_incoming(const char *uri, bool has_channels, return; } + mis->exit_on_error = + has_exit_on_error ? exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR; + qemu_start_incoming_migration(uri, has_channels, channels, &local_err); if (local_err) { diff --git a/migration/migration.h b/migration/migration.h index 8045e39c26..95995a818e 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -227,6 +227,9 @@ struct MigrationIncomingState { * is needed as this field is updated serially. */ unsigned int switchover_ack_pending_num; + + /* Do exit on incoming migration failure */ + bool exit_on_error; }; MigrationIncomingState *migration_incoming_get_current(void); diff --git a/qapi/migration.json b/qapi/migration.json index 8c65b90328..9feed413b5 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1837,6 +1837,10 @@ # @channels: list of migration stream channels with each stream in the # list connected to a destination interface endpoint. # +# @exit-on-error: Exit on incoming migration failure. Default true. +# When set to false, the failure triggers a MIGRATION event, and +# error details could be retrieved with query-migrate. (since 9.1) +# # Since: 2.3 # # Notes: @@ -1889,7 +1893,8 @@ ## { 'command': 'migrate-incoming', 'data': {'*uri': 'str', - '*channels': [ 'MigrationChannel' ] } } + '*channels': [ 'MigrationChannel' ], + '*exit-on-error': 'bool' } } ## # @xen-save-devices-state: diff --git a/system/vl.c b/system/vl.c index 7756eac81e..79cd498395 100644 --- a/system/vl.c +++ b/system/vl.c @@ -2723,7 +2723,8 @@ void qmp_x_exit_preconfig(Error **errp) if (incoming) { Error *local_err = NULL; if (strcmp(incoming, "defer") != 0) { - qmp_migrate_incoming(incoming, false, NULL, &local_err); + qmp_migrate_incoming(incoming, false, NULL, true, true, + &local_err); if (local_err) { error_reportf_err(local_err, "-incoming %s: ", incoming); exit(1); -- 2.34.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming 2024-04-29 19:14 ` [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:41 ` Peter Xu 0 siblings, 0 replies; 14+ messages in thread From: Peter Xu @ 2024-04-29 19:41 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core On Mon, Apr 29, 2024 at 10:14:26PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Now we do set MIGRATION_FAILED state, but don't give a chance to > orchestrator to query migration state and get the error. > > Let's provide a possibility for QMP-based orchestrators to get an error > like with outgoing migration. > > For hmp_migrate_incoming(), let's enable the new behavior: HMP is not > and ABI, it's mostly intended to use by developer and it makes sense > not to stop the process. > > For x-exit-preconfig, let's keep the old behavior: > - it's called from init(), so here we want to keep current behavior by > default > - it does exit on error by itself as well > So, if we want to change the behavior of x-exit-preconfig, it should be > another patch. > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > Acked-by: Markus Armbruster <armbru@redhat.com> > --- > migration/migration-hmp-cmds.c | 2 +- > migration/migration.c | 38 +++++++++++++++++++++++++++------- > migration/migration.h | 3 +++ > qapi/migration.json | 7 ++++++- > system/vl.c | 3 ++- > 5 files changed, 43 insertions(+), 10 deletions(-) > > diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c > index 7e96ae6ffd..23181bbee1 100644 > --- a/migration/migration-hmp-cmds.c > +++ b/migration/migration-hmp-cmds.c > @@ -466,7 +466,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict) > } > QAPI_LIST_PREPEND(caps, g_steal_pointer(&channel)); > > - qmp_migrate_incoming(NULL, true, caps, &err); > + qmp_migrate_incoming(NULL, true, caps, true, false, &err); > qapi_free_MigrationChannelList(caps); > > end: > diff --git a/migration/migration.c b/migration/migration.c > index 5489ff96df..09f762c99e 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -72,6 +72,8 @@ > #define NOTIFIER_ELEM_INIT(array, elem) \ > [elem] = NOTIFIER_WITH_RETURN_LIST_INITIALIZER((array)[elem]) > > +#define INMIGRATE_DEFAULT_EXIT_ON_ERROR true > + > static NotifierWithReturnList migration_state_notifiers[] = { > NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL), > NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT), > @@ -234,6 +236,8 @@ void migration_object_init(void) > qemu_cond_init(¤t_incoming->page_request_cond); > current_incoming->page_requested = g_tree_new(page_request_addr_cmp); > > + current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR; > + > migration_object_check(current_migration, &error_fatal); > > blk_mig_init(); > @@ -806,14 +810,19 @@ fail: > MIGRATION_STATUS_FAILED); > migration_incoming_state_destroy(); > > - if (migrate_has_error(s)) { > - WITH_QEMU_LOCK_GUARD(&s->error_mutex) { > - error_report_err(error_copy(s->error)); > + if (mis->exit_on_error) { > + if (migrate_has_error(s)) { > + WITH_QEMU_LOCK_GUARD(&s->error_mutex) { > + error_report_err(error_copy(s->error)); > + } > } > + error_report_err(local_err); > + migrate_error_free(s); > + exit(EXIT_FAILURE); > + } else { > + migrate_set_error(s, local_err); > + error_free(local_err); > } > - error_report_err(local_err); > - migrate_error_free(s); > - exit(EXIT_FAILURE); > } (Please see my reply in previous patch) > > /** > @@ -1322,6 +1331,15 @@ static void fill_destination_migration_info(MigrationInfo *info) > break; > } > info->status = mis->state; > + > + if (!info->error_desc) { This will also guarantee to happen? Can drop the check too if so. Other than that looks good, thanks. > + MigrationState *s = migrate_get_current(); > + QEMU_LOCK_GUARD(&s->error_mutex); > + > + if (s->error) { > + info->error_desc = g_strdup(error_get_pretty(s->error)); > + } > + } > } > > MigrationInfo *qmp_query_migrate(Error **errp) > @@ -1796,10 +1814,13 @@ void migrate_del_blocker(Error **reasonp) > } > > void qmp_migrate_incoming(const char *uri, bool has_channels, > - MigrationChannelList *channels, Error **errp) > + MigrationChannelList *channels, > + bool has_exit_on_error, bool exit_on_error, > + Error **errp) > { > Error *local_err = NULL; > static bool once = true; > + MigrationIncomingState *mis = migration_incoming_get_current(); > > if (!once) { > error_setg(errp, "The incoming migration has already been started"); > @@ -1814,6 +1835,9 @@ void qmp_migrate_incoming(const char *uri, bool has_channels, > return; > } > > + mis->exit_on_error = > + has_exit_on_error ? exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR; > + > qemu_start_incoming_migration(uri, has_channels, channels, &local_err); > > if (local_err) { > diff --git a/migration/migration.h b/migration/migration.h > index 8045e39c26..95995a818e 100644 > --- a/migration/migration.h > +++ b/migration/migration.h > @@ -227,6 +227,9 @@ struct MigrationIncomingState { > * is needed as this field is updated serially. > */ > unsigned int switchover_ack_pending_num; > + > + /* Do exit on incoming migration failure */ > + bool exit_on_error; > }; > > MigrationIncomingState *migration_incoming_get_current(void); > diff --git a/qapi/migration.json b/qapi/migration.json > index 8c65b90328..9feed413b5 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -1837,6 +1837,10 @@ > # @channels: list of migration stream channels with each stream in the > # list connected to a destination interface endpoint. > # > +# @exit-on-error: Exit on incoming migration failure. Default true. > +# When set to false, the failure triggers a MIGRATION event, and > +# error details could be retrieved with query-migrate. (since 9.1) > +# > # Since: 2.3 > # > # Notes: > @@ -1889,7 +1893,8 @@ > ## > { 'command': 'migrate-incoming', > 'data': {'*uri': 'str', > - '*channels': [ 'MigrationChannel' ] } } > + '*channels': [ 'MigrationChannel' ], > + '*exit-on-error': 'bool' } } > > ## > # @xen-save-devices-state: > diff --git a/system/vl.c b/system/vl.c > index 7756eac81e..79cd498395 100644 > --- a/system/vl.c > +++ b/system/vl.c > @@ -2723,7 +2723,8 @@ void qmp_x_exit_preconfig(Error **errp) > if (incoming) { > Error *local_err = NULL; > if (strcmp(incoming, "defer") != 0) { > - qmp_migrate_incoming(incoming, false, NULL, &local_err); > + qmp_migrate_incoming(incoming, false, NULL, true, true, > + &local_err); > if (local_err) { > error_reportf_err(local_err, "-incoming %s: ", incoming); > exit(1); > -- > 2.34.1 > -- Peter Xu ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-04-30 8:48 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy 2024-04-29 19:14 ` [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error Vladimir Sementsov-Ogievskiy 2024-04-29 19:14 ` [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure Vladimir Sementsov-Ogievskiy 2024-04-29 19:22 ` Peter Xu 2024-04-29 19:14 ` [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error Vladimir Sementsov-Ogievskiy 2024-04-29 19:32 ` Peter Xu 2024-04-30 8:06 ` Vladimir Sementsov-Ogievskiy 2024-04-30 8:09 ` Vladimir Sementsov-Ogievskiy 2024-04-30 8:47 ` Vladimir Sementsov-Ogievskiy 2024-04-29 19:14 ` [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting Vladimir Sementsov-Ogievskiy 2024-04-29 19:39 ` Peter Xu 2024-04-30 7:12 ` Vladimir Sementsov-Ogievskiy 2024-04-29 19:14 ` [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming Vladimir Sementsov-Ogievskiy 2024-04-29 19:41 ` Peter Xu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).