* [PATCH 0/3] migration: Error fixes and improvements
@ 2025-11-15 8:34 Markus Armbruster
2025-11-15 8:34 ` [PATCH 1/3] migration: Plug memory leaks after migrate_set_error() Markus Armbruster
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Markus Armbruster @ 2025-11-15 8:34 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, farosas
Maintainers decide what to take for 10.2, if anything.
Let me know if you'd prefer the "perhaps should take ownership" idea
in PATCH 1's commit message.
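
For reference, migrate_set_error() stores a copy, so the caller keeps
ownership of the original.  The pattern the fixes below establish is
(sketch):

    migrate_set_error(s, local_err);    /* stores a copy */
    error_free(local_err);              /* we still own the original */
    local_err = NULL;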
Markus Armbruster (3):
  migration: Plug memory leaks after migrate_set_error()
  migration: Use warn_reportf_err() where appropriate
  migration/postcopy-ram: Improve error reporting after loadvm failure

 migration/cpr-exec.c     |  3 ++-
 migration/multifd.c      |  6 ++++--
 migration/postcopy-ram.c | 17 ++++++++---------
 3 files changed, 14 insertions(+), 12 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 14+ messages in thread

* [PATCH 1/3] migration: Plug memory leaks after migrate_set_error()
  2025-11-15 8:34 [PATCH 0/3] migration: Error fixes and improvements Markus Armbruster
@ 2025-11-15 8:34 ` Markus Armbruster
  2025-11-15 8:34 ` [PATCH 2/3] migration: Use warn_reportf_err() where appropriate Markus Armbruster
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Markus Armbruster @ 2025-11-15 8:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas

migrate_set_error(s, err) stores a copy of @err in @s.  The original
@err is not freed.  Most callers free it immediately.  Some callers
free it later, or pass it on.  And some leak it.  Fix those.

Perhaps migrate_set_error(s, err) should take ownership of @err.  The
callers that free it immediately would become simpler, and avoid a
copy and a deallocation.  The others would have to pass
error_copy(err).

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/cpr-exec.c | 3 ++-
 migration/multifd.c  | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/migration/cpr-exec.c b/migration/cpr-exec.c
index d284f6e734..0b8344a86f 100644
--- a/migration/cpr-exec.c
+++ b/migration/cpr-exec.c
@@ -159,11 +159,12 @@ static void cpr_exec_cb(void *opaque)
         error_report_err(error_copy(err));
         migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
         migrate_set_error(s, err);
+        error_free(err);
+        err = NULL;
 
         /* Note, we can go from state COMPLETED to FAILED */
         migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);
 
-    err = NULL;
     if (!migration_block_activate(&err)) {
         /* error was already reported */
         error_free(err);
diff --git a/migration/multifd.c b/migration/multifd.c
index 98873cee74..a529c399e4 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -964,6 +964,7 @@ bool multifd_send_setup(void)
 
         if (!multifd_new_send_channel_create(p, &local_err)) {
             migrate_set_error(s, local_err);
+            error_free(local_err);
             ret = -1;
         }
     }
@@ -988,6 +989,7 @@ bool multifd_send_setup(void)
         ret = multifd_send_state->ops->send_setup(p, &local_err);
         if (ret) {
             migrate_set_error(s, local_err);
+            error_free(local_err);
             goto err;
         }
         assert(p->iov);
-- 
2.49.0

^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/3] migration: Use warn_reportf_err() where appropriate
  2025-11-15 8:34 [PATCH 0/3] migration: Error fixes and improvements Markus Armbruster
  2025-11-15 8:34 ` [PATCH 1/3] migration: Plug memory leaks after migrate_set_error() Markus Armbruster
@ 2025-11-15 8:34 ` Markus Armbruster
  2025-11-17 15:47   ` Fabiano Rosas
  2025-11-15 8:35 ` [PATCH 3/3] migration/postcopy-ram: Improve error reporting after loadvm failure Markus Armbruster
  2025-11-17 16:03 ` [PATCH 0/3] migration: Error fixes and improvements Peter Xu
  3 siblings, 1 reply; 14+ messages in thread
From: Markus Armbruster @ 2025-11-15 8:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas

Replace

    warn_report("...: %s", ..., error_get_pretty(err));

by

    warn_reportf_err(err, "...: ", ...);

Prior art: commit 5217f1887a8 (error: Use error_reportf_err() where
appropriate).

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/multifd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index a529c399e4..6210454838 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -464,8 +464,8 @@ static void migration_ioc_shutdown_gracefully(QIOChannel *ioc)
      */
     migration_tls_channel_end(ioc, &local_err);
     if (local_err) {
-        warn_report("Failed to gracefully terminate TLS connection: %s",
-                    error_get_pretty(local_err));
+        warn_reportf_err(local_err,
+                         "Failed to gracefully terminate TLS connection: ");
     }
 }
-- 
2.49.0

^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] migration: Use warn_reportf_err() where appropriate
  2025-11-15 8:34 ` [PATCH 2/3] migration: Use warn_reportf_err() where appropriate Markus Armbruster
@ 2025-11-17 15:47   ` Fabiano Rosas
  0 siblings, 0 replies; 14+ messages in thread
From: Fabiano Rosas @ 2025-11-17 15:47 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: peterx

Markus Armbruster <armbru@redhat.com> writes:

> Replace
>
>     warn_report("...: %s", ..., error_get_pretty(err));
>
> by
>
>     warn_reportf_err(err, "...: ", ...);
>
> Prior art: commit 5217f1887a8 (error: Use error_reportf_err() where
> appropriate).
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>

^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 3/3] migration/postcopy-ram: Improve error reporting after loadvm failure
  2025-11-15 8:34 [PATCH 0/3] migration: Error fixes and improvements Markus Armbruster
  2025-11-15 8:34 ` [PATCH 1/3] migration: Plug memory leaks after migrate_set_error() Markus Armbruster
  2025-11-15 8:34 ` [PATCH 2/3] migration: Use warn_reportf_err() where appropriate Markus Armbruster
@ 2025-11-15 8:35 ` Markus Armbruster
  2025-11-17 15:50   ` Fabiano Rosas
  2025-11-17 16:03 ` [PATCH 0/3] migration: Error fixes and improvements Peter Xu
  3 siblings, 1 reply; 14+ messages in thread
From: Markus Armbruster @ 2025-11-15 8:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas

One of the two error messages shows __func__.  Drop it; it doesn't
help users, and developers can grep for the message.  This also
permits de-duplicating the code to prepend to the error message.

Both error messages show a numeric error code.  I doubt that's
helpful, but I'm leaving it alone.

Use error_append_hint() for explaining that some dirty bitmaps may be
lost.  Polish the prose.

Don't faff around with g_clear_pointer(), it's not worth its keep
here.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/postcopy-ram.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 3f98dcb6fd..7c9fe61041 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2146,25 +2146,24 @@ static void *postcopy_listen_thread(void *opaque)
     if (load_res < 0) {
         qemu_file_set_error(f, load_res);
         dirty_bitmap_mig_cancel_incoming();
+        error_prepend(&local_err,
+                      "loadvm failed during postcopy: %d: ", load_res);
         if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
             !migrate_postcopy_ram() && migrate_dirty_bitmaps()) {
-            error_report("%s: loadvm failed during postcopy: %d: %s. All states "
-                         "are migrated except dirty bitmaps. Some dirty "
-                         "bitmaps may be lost, and present migrated dirty "
-                         "bitmaps are correctly migrated and valid.",
-                         __func__, load_res, error_get_pretty(local_err));
-            g_clear_pointer(&local_err, error_free);
+            error_append_hint(&local_err,
+                              "All state is migrated except dirty bitmaps."
+                              " Some dirty bitmaps may be lost, but any"
+                              " migrated dirty bitmaps are valid.");
+            error_report_err(local_err);
         } else {
             /*
              * Something went fatally wrong and we have a bad state, QEMU will
              * exit depending on if postcopy-exit-on-error is true, but the
              * migration cannot be recovered.
              */
-            error_prepend(&local_err,
-                          "loadvm failed during postcopy: %d: ", load_res);
             migrate_set_error(migr, local_err);
-            g_clear_pointer(&local_err, error_report_err);
+            error_report_err(local_err);
             migrate_set_state(&mis->state, mis->state,
                               MIGRATION_STATUS_FAILED);
             goto out;
         }
-- 
2.49.0

^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] migration/postcopy-ram: Improve error reporting after loadvm failure
  2025-11-15 8:35 ` [PATCH 3/3] migration/postcopy-ram: Improve error reporting after loadvm failure Markus Armbruster
@ 2025-11-17 15:50   ` Fabiano Rosas
  0 siblings, 0 replies; 14+ messages in thread
From: Fabiano Rosas @ 2025-11-17 15:50 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: peterx

Markus Armbruster <armbru@redhat.com> writes:

> One of the two error messages shows __func__.  Drop it; it doesn't
> help users, and developers can grep for the message.  This also
> permits de-duplicating the code to prepend to the error message.
>
> Both error messages show a numeric error code.  I doubt that's
> helpful, but I'm leaving it alone.
>
> Use error_append_hint() for explaining that some dirty bitmaps may be
> lost.  Polish the prose.
>
> Don't faff around with g_clear_pointer(), it's not worth its keep
> here.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] migration: Error fixes and improvements
  2025-11-15 8:34 [PATCH 0/3] migration: Error fixes and improvements Markus Armbruster
  ` (2 preceding siblings ...)
  2025-11-15 8:35 ` [PATCH 3/3] migration/postcopy-ram: Improve error reporting after loadvm failure Markus Armbruster
@ 2025-11-17 16:03 ` Peter Xu
  2025-11-18 7:44   ` Markus Armbruster
  3 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2025-11-17 16:03 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, farosas

On Sat, Nov 15, 2025 at 09:34:57AM +0100, Markus Armbruster wrote:
> Maintainers decide what to take for 10.2, if anything.
> 
> Let me know if you'd prefer the "perhaps should take ownership" idea
> in PATCH 1's commit message.

I recall I had such a patch previously, so I dug it out:

https://lore.kernel.org/all/20230705163502.331007-3-peterx@redhat.com/

I found that I dropped it in v3's post of that series, where I mentioned in
the change log that either way is not clean, so I dropped it until I
could have a better understanding:

https://lore.kernel.org/all/20231004220240.167175-1-peterx@redhat.com/

I think at that time I must have hit a use case where the caller forgot
to error_copy(), hence passing it in caused a UAF.  Then I thought a
memory leak was better than a UAF in the error path, if the API was going
to be used wrong either way.

But feel free to share your thoughts.  We can definitely revisit this.

I queued the series for this release, thanks Markus.

> 
> Markus Armbruster (3):
>   migration: Plug memory leaks after migrate_set_error()
>   migration: Use warn_reportf_err() where appropriate
>   migration/postcopy-ram: Improve error reporting after loadvm failure
> 
>  migration/cpr-exec.c     |  3 ++-
>  migration/multifd.c      |  6 ++++--
>  migration/postcopy-ram.c | 17 ++++++++---------
>  3 files changed, 14 insertions(+), 12 deletions(-)
> 
> -- 
> 2.49.0

-- 
Peter Xu

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] migration: Error fixes and improvements
  2025-11-17 16:03 ` [PATCH 0/3] migration: Error fixes and improvements Peter Xu
@ 2025-11-18 7:44   ` Markus Armbruster
  2025-11-18 17:35     ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Markus Armbruster @ 2025-11-18 7:44 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, farosas

Peter Xu <peterx@redhat.com> writes:

> On Sat, Nov 15, 2025 at 09:34:57AM +0100, Markus Armbruster wrote:
>> Maintainers decide what to take for 10.2, if anything.
>>
>> Let me know if you'd prefer the "perhaps should take ownership" idea
>> in PATCH 1's commit message.
>
> I recall I had such a patch previously, so I dug it out:
>
> https://lore.kernel.org/all/20230705163502.331007-3-peterx@redhat.com/
>
> I found that I dropped it in v3's post of that series, where I mentioned in
> the change log that either way is not clean, so I dropped it until I
> could have a better understanding:
>
> https://lore.kernel.org/all/20231004220240.167175-1-peterx@redhat.com/
>
> I think at that time I must have hit a use case where the caller forgot
> to error_copy(), hence passing it in caused a UAF.

Use-after-free can happen if the caller holds on to its reference until
after the stored Error is freed.  In other words, holding on to the
reference is safe only until the stored Error is freed, and any safety
argument will have to reason about the stored Error's lifetime.  No idea
how difficult or brittle such an argument would be.

> Then I thought a memory leak was better than a UAF in the error path,
> if the API was going to be used wrong either way.

Fair point.

> But feel free to share your thoughts.  We can definitely revisit this.

migrate_set_error(s, err) stores a copy of @err in @s unless @s already
has an Error stored.

I see 26 calls of migrate_set_error().

* 11 call error_free() immediately, and 2 call it via g_autoptr().  3
  neglect to call it.  My patch fixes them.  Total is 16 out of 26.

* 6 report and free with error_report_err(), i.e. we store the error for
  later *and* report it now.  Gives me an uneasy feeling.  How is the
  stored error handled?  Will it be reported again?  That would likely
  be wrong.

* 3 wrap migrate_set_error():

  - multifd_send_set_error()

    Its callers both call error_free() immediately.

  - migration_connect_set_error()

    3 callers.

    qmp_migrate() and qmp_migrate_finish() propagate to their callers,
    i.e. we store the error for later *and* have callers handle it now.
    Same uneasy feeling as above.

    One of migration_connect()'s callers passes NULL, the other calls
    error_free() immediately.

  - multifd_recv_terminate_threads()

    Two callers pass NULL, one calls error_free() immediately, and
    multifd_recv_new_channel() propagates.  Uneasy feeling again.

* qemu_savevm_state_setup() confuses me.  It sets @errp on failure, and
  stores some (uneasy feeling), but not all of these errors with
  migrate_set_error().  See below.

Summary:

* We're prone to leaking the Error passed to migrate_set_error().

* If we replaced it by a function that takes ownership, we may become
  prone to use-after-free.  I write "may" because I'm unsure when
  exactly a use after calling this ownership-taking function would be
  unsafe.

* The "forked" Error handling makes me uneasy.  I'm sure we do it for
  reasons.  Can you help me understand them?

> I queued the series for this release, thanks Markus.

Thanks!
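
To make the take-ownership idea from the summary concrete, a rough
sketch (untested; the locking mirrors today's migrate_set_error(), the
name is invented):

    /* Hypothetical variant that consumes @err instead of copying it */
    void migrate_take_error(MigrationState *s, Error *err)
    {
        QEMU_LOCK_GUARD(&s->error_mutex);
        if (!s->error) {
            s->error = err;        /* take ownership */
        } else {
            error_free(err);       /* keep only the first error */
        }
    }

Callers that free immediately would shrink to a single call; callers
that need to keep a reference would have to pass error_copy(err).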
Bonus content:

int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
{
    ERRP_GUARD();
    MigrationState *ms = migrate_get_current();
    JSONWriter *vmdesc = ms->vmdesc;
    SaveStateEntry *se;
    int ret = 0;

    if (vmdesc) {
        json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
        json_writer_start_array(vmdesc, "devices");
    }

    trace_savevm_state_setup();
    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {

The function does work in a loop.

This work can fail in two places, setting @errp.  When it does, we
return.

        if (se->vmsd && se->vmsd->early_setup) {

First one:

            ret = vmstate_save(f, se, vmdesc, errp);
            if (ret) {

vmstate_save() set @errp and returned an error code.

                migrate_set_error(ms, *errp);
                qemu_file_set_error(f, ret);

Store @errp in @ms, and the error code in @f.

Aside: storing error state in two places feels like one too many.

                break;

This break jumps to if (ret) { return ret; }, so it's a roundabout way
to return ret.  Meh.

            }
            continue;
        }

        if (!se->ops || !se->ops->save_setup) {
            continue;
        }
        if (se->ops->is_active) {
            if (!se->ops->is_active(se->opaque)) {
                continue;
            }
        }
        save_section_header(f, se, QEMU_VM_SECTION_START);

Second one:

        ret = se->ops->save_setup(f, se->opaque, errp);
        save_section_footer(f, se);
        if (ret < 0) {

->save_setup() set @errp and returned an error code.

            qemu_file_set_error(f, ret);

This time we store only the error code.  Why not @errp?

            break;

Again, a roundabout way to return ret.

        }

Obviously ret >= 0 here.  I believe @errp is not set.

    }

We get here either via break or normal loop termination.

If via break, ret != 0 and @errp set.

If via normal loop termination, ret >= 0 and @errp not set.

    if (ret) {
        return ret;

If via normal loop termination, and ret > 0, we return failure (I think)
without setting @errp.  I hope this can't happen, because ->save_setup()
never returns > 0.  But it's unclean even then.

    }

I trust @errp is still unset here.  This is the case if we take the
return right above when @errp has been set.

    /* TODO: Should we check that errp is set in case of failure ? */
    return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);

This time we store nothing.  Why?

}

I think this function would be easier to understand if we replaced break
by return.

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] migration: Error fixes and improvements
  2025-11-18 7:44   ` Markus Armbruster
@ 2025-11-18 17:35     ` Peter Xu
  2025-11-19 7:45       ` Markus Armbruster
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2025-11-18 17:35 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, farosas

On Tue, Nov 18, 2025 at 08:44:32AM +0100, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Sat, Nov 15, 2025 at 09:34:57AM +0100, Markus Armbruster wrote:
> >> Maintainers decide what to take for 10.2, if anything.
> >> 
> >> Let me know if you'd prefer the "perhaps should take ownership" idea
> >> in PATCH 1's commit message.
> > 
> > I recall I had such a patch previously, so I dug it out:
> > 
> > https://lore.kernel.org/all/20230705163502.331007-3-peterx@redhat.com/
> > 
> > I found that I dropped it in v3's post of that series, where I mentioned in
> > the change log that either way is not clean, so I dropped it until I
> > could have a better understanding:
> > 
> > https://lore.kernel.org/all/20231004220240.167175-1-peterx@redhat.com/
> > 
> > I think at that time I must have hit a use case where the caller forgot
> > to error_copy(), hence passing it in caused a UAF.
> 
> Use-after-free can happen if the caller holds on to its reference until
> after the stored Error is freed.  In other words, holding on to the
> reference is safe only until the stored Error is freed, and any safety
> argument will have to reason about the stored Error's lifetime.  No idea
> how difficult or brittle such an argument would be.
> 
> > Then I thought a memory leak was better than a UAF in the error path,
> > if the API was going to be used wrong either way.
> 
> Fair point.
> 
> > But feel free to share your thoughts.  We can definitely revisit this.
> 
> migrate_set_error(s, err) stores a copy of @err in @s unless @s already
> has an Error stored.
> 
> I see 26 calls of migrate_set_error().
> 
> * 11 call error_free() immediately, and 2 call it via g_autoptr().  3
>   neglect to call it.  My patch fixes them.  Total is 16 out of 26.

Yes, I appreciate that!

> 
> * 6 report and free with error_report_err(), i.e. we store the error for
>   later *and* report it now.  Gives me an uneasy feeling.  How is the
>   stored error handled?  Will it be reported again?  That would likely
>   be wrong.

Migration as a module outlives me working as a developer on QEMU.. so I
can only share my two cents that still reside in my memory, which may or
may not always be the truth..

I think one issue is migration used to error_report() things all over the
place, but then we thought it a good idea to try to remember the error so
that libvirt can query it at any time if necessary.  With that, starting
from QEMU 2.7 we introduced error-desc into query-migrate results.
That's what s->error was about.

But then, could we drop the error_report() / error_report_err()s?  Likely
not, because there might be users relying on the printed errors to see
what happened..  Making query-migrate the only source of error reports
adds complexity for those users and may cause confusion..  Hence the
extra references sometimes needed after migrate_set_error(), mostly to
keep the old behavior of printing the error out (I didn't check each of
them, though).

> 
> * 3 wrap migrate_set_error():
> 
>   - multifd_send_set_error()
> 
>     Its callers both call error_free() immediately.
> 
>   - migration_connect_set_error()
> 
>     3 callers.
> 
>     qmp_migrate() and qmp_migrate_finish() propagate to their callers,
>     i.e. we store the error for later *and* have callers handle it now.
>     Same uneasy feeling as above.
> 
>     One of migration_connect()'s callers passes NULL, the other calls
>     error_free() immediately.
> 
>   - multifd_recv_terminate_threads()
> 
>     Two callers pass NULL, one calls error_free() immediately, and
>     multifd_recv_new_channel() propagates.  Uneasy feeling again.
> 
> * qemu_savevm_state_setup() confuses me.  It sets @errp on failure, and
>   stores some (uneasy feeling), but not all of these errors with
>   migrate_set_error().  See below.

Yes, I also agree qemu_savevm_state_setup() is confusing on error
handling.  I'll comment below.

> 
> Summary:
> 
> * We're prone to leaking the Error passed to migrate_set_error().
> 
> * If we replaced it by a function that takes ownership, we may become
>   prone to use-after-free.  I write "may" because I'm unsure when
>   exactly a use after calling this ownership-taking function would be
>   unsafe.
> 
> * The "forked" Error handling makes me uneasy.  I'm sure we do it for
>   reasons.  Can you help me understand them?

Hope the above provides some context.  In short, IMHO it's a mixture of
two demands: "print the error immediately like the old behavior" and
"remember the error too for query".

> 
> > I queued the series for this release, thanks Markus.
> 
> Thanks!
> 
> 
> Bonus content:
> 
> int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
> {
>     ERRP_GUARD();
>     MigrationState *ms = migrate_get_current();
>     JSONWriter *vmdesc = ms->vmdesc;
>     SaveStateEntry *se;
>     int ret = 0;
> 
>     if (vmdesc) {
>         json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
>         json_writer_start_array(vmdesc, "devices");
>     }
> 
>     trace_savevm_state_setup();
>     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> 
> The function does work in a loop.
> 
> This work can fail in two places, setting @errp.  When it does, we
> return.
> 
>         if (se->vmsd && se->vmsd->early_setup) {
> 
> First one:
> 
>             ret = vmstate_save(f, se, vmdesc, errp);
>             if (ret) {
> 
> vmstate_save() set @errp and returned an error code.
> 
>                 migrate_set_error(ms, *errp);
>                 qemu_file_set_error(f, ret);
> 
> Store @errp in @ms, and the error code in @f.
> 
> Aside: storing error state in two places feels like one too many.

Correct.  It's once again a legacy of the migration code, which used to
set errors onto fds (e.g. that'll start to fail all qemufile APIs on this
fd, and the qemufile API is not well designed IMHO.. which is another
story), but then we started to have string-based Errors, and I guess we
didn't try hard enough to unify both.

> 
>                 break;
> 
> This break jumps to if (ret) { return ret; }, so it's a roundabout way
> to return ret.  Meh.

IIUC some people prefer it that way so the function returns at a single
point (easier to add common cleanup code, for example).  But I'd confess
I also prefer a direct return. :)
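
E.g. the first error path above could then read (a sketch only, keeping
the two stores as they are today):

            ret = vmstate_save(f, se, vmdesc, errp);
            if (ret) {
                migrate_set_error(ms, *errp);
                qemu_file_set_error(f, ret);
                return ret;    /* direct, instead of break */
            }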
> > } > > Obviously ret >= 0 here. I believe @errp is not set. I don't think in this path ret can be >0.. but yeah this is pretty unobvious even if true. That's also why I liked QEMU's current preferences of "bool fn(..., Error **errp)".. at least in new codes. Maybe I should ping Vladimir on his recent work here? https://lore.kernel.org/r/20251028231347.194844-1-vsementsov@yandex-team.ru That'll be part of such cleanup effort (and yes unfortunately many migration related cleanups will need a lot of code churns...). > > } > > We get here either via break or normal loop termination. > > If via break, ret != 0 and @errp set. > > If via normal loop termination, ret >= 0 and @errp not set. > > if (ret) { > return ret; > > If via normal loop termination, and ret > 0, we return failure (I think) > without setting @errp. I hope this can't happen, because ->save_setup() > never returns > 0. But it's unclean even then. > > } > > I trust @errp is still unset here. This is the case if we take the > return right above when @errp has been set. > > /* TODO: Should we check that errp is set in case of failure ? */ > return precopy_notify(PRECOPY_NOTIFY_SETUP, errp); > > This time we store nothing. Why? I think the clean way is we do not store anything when this function provided Error**. > > } > > I think this function would be easier to understand if we replaced break > by return. I'll see what I can do with this function. Thanks! -- Peter Xu ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] migration: Error fixes and improvements
  2025-11-18 17:35     ` Peter Xu
@ 2025-11-19 7:45       ` Markus Armbruster
  2025-11-19 20:58         ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Markus Armbruster @ 2025-11-19 7:45 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, farosas

Peter Xu <peterx@redhat.com> writes:

> On Tue, Nov 18, 2025 at 08:44:32AM +0100, Markus Armbruster wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Sat, Nov 15, 2025 at 09:34:57AM +0100, Markus Armbruster wrote:
>> >> Maintainers decide what to take for 10.2, if anything.
>> >> 
>> >> Let me know if you'd prefer the "perhaps should take ownership" idea
>> >> in PATCH 1's commit message.
>> > 
>> > I recall I had such a patch previously, so I dug it out:
>> > 
>> > https://lore.kernel.org/all/20230705163502.331007-3-peterx@redhat.com/
>> > 
>> > I found that I dropped it in v3's post of that series, where I mentioned in
>> > the change log that either way is not clean, so I dropped it until I
>> > could have a better understanding:
>> > 
>> > https://lore.kernel.org/all/20231004220240.167175-1-peterx@redhat.com/
>> > 
>> > I think at that time I must have hit a use case where the caller forgot
>> > to error_copy(), hence passing it in caused a UAF.
>> 
>> Use-after-free can happen if the caller holds on to its reference until
>> after the stored Error is freed.  In other words, holding on to the
>> reference is safe only until the stored Error is freed, and any safety
>> argument will have to reason about the stored Error's lifetime.  No idea
>> how difficult or brittle such an argument would be.
>> 
>> > Then I thought a memory leak was better than a UAF in the error path,
>> > if the API was going to be used wrong either way.
>> 
>> Fair point.
>> 
>> > But feel free to share your thoughts.  We can definitely revisit this.
>> 
>> migrate_set_error(s, err) stores a copy of @err in @s unless @s already
>> has an Error stored.
>> 
>> I see 26 calls of migrate_set_error().
>> 
>> * 11 call error_free() immediately, and 2 call it via g_autoptr().  3
>>   neglect to call it.  My patch fixes them.  Total is 16 out of 26.
>
> Yes, I appreciate that!

It was already big & complicated back when *I* started!

>> 
>> * 6 report and free with error_report_err(), i.e. we store the error for
>>   later *and* report it now.  Gives me an uneasy feeling.  How is the
>>   stored error handled?  Will it be reported again?  That would likely
>>   be wrong.
>
> Migration as a module outlives me working as a developer on QEMU.. so I
> can only share my two cents that still reside in my memory, which may or
> may not always be the truth..
>
> I think one issue is migration used to error_report() things all over the
> place,

Yes, except even error_report() didn't exist yet.  It made sense at the
time.

> but then we thought it a good idea to try to remember the error so
> that libvirt can query it at any time if necessary.  With that, starting
> from QEMU 2.7 we introduced error-desc into query-migrate results.
> That's what s->error was about.
>
> But then, could we drop the error_report() / error_report_err()s?  Likely
> not, because there might be users relying on the printed errors to see
> what happened..  Making query-migrate the only source of error reports
> adds complexity for those users and may cause confusion..  Hence the
> extra references sometimes needed after migrate_set_error(), mostly to
> keep the old behavior of printing the error out (I didn't check each of
> them, though).

Yes.
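
Concretely, the pattern at those call sites looks like this (sketch of
what several of them do today):

    migrate_set_error(s, err);     /* remember for query-migrate */
    error_report_err(err);         /* print like the old days; frees err */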
The core of migration is a long-running background task.  We have
command line options and monitor commands to start and control it.

QMP commands need to pass their errors to the QMP core by setting
@errp.  Any code they call should pass them the same way.
Conceptually easy, although converting old code to pass errors up
instead of reporting them can be tedious.

HMP commands should wrap around QMP commands and report to their
monitor.  No issues there, as far as I know.

The hairy part is the background task.

I believe it used to simply do its job, reporting errors to stderr along
the way, until it either succeeded or failed.  The errors reported made
success / failure "obvious" for users.

This can report multiple errors, which can be confusing.

Worse, it was no good for management applications.  These need to
observe migration as a state machine, with final success and error
states, where the error state comes with an indication of what went
wrong.  So we made migration store the first of certain errors in the
migration state in addition to reporting to stderr.

"First", because we store only when the state doesn't already have an
error.  "Certain", because I doubt we do it for all errors we report.

Compare this to how jobs solve this problem.  These are a much, much
later invention, and designed for management applications from the
start[*].  A job is a state machine.  Management applications can
observe and control the state.  Errors are not supposed to be reported,
they should be fed to the state machine, which goes into an error state
then.  The job is not supposed to do actual work in an error state.
Therefore, no further errors should be possible.  When something goes
wrong, we get a single error, stored in the job state, where the
management application can find it.

Migration is also a state machine, and we long ago retrofitted the means
for management applications to observe and control the state.  What we
haven't done is the disciplined feeding of errors to the state machine.
We can still get multiple errors.  We store the first of certain errors
where the management application can find it, but whether that error
suffices to explain what went wrong is a crapshoot.  As long as that's
the case, we need to spew the other errors to stderr, where a human can
find it.

>> * 3 wrap migrate_set_error():
>> 
>>   - multifd_send_set_error()
>> 
>>     Its callers both call error_free() immediately.
>> 
>>   - migration_connect_set_error()
>> 
>>     3 callers.
>> 
>>     qmp_migrate() and qmp_migrate_finish() propagate to their callers,
>>     i.e. we store the error for later *and* have callers handle it now.
>> >> * The "forked" Error handling makes me uneasy. I'm sure we do it for >> reasons. Can you help me understand them? > > Hope above would provide some context. In short, IMHO it's a mixture > demand of "print the error immediately like the old behavior" and "remember > the error too when query". I figure that's how far we were able to drag migration forward. More dragging is called for, but it's hard work, and there's so much else to do. >> > I queued the series for this release, thanks Markus. >> >> Thanks! >> >> >> Bonus content: >> >> int qemu_savevm_state_setup(QEMUFile *f, Error **errp) >> { >> ERRP_GUARD(); >> MigrationState *ms = migrate_get_current(); >> JSONWriter *vmdesc = ms->vmdesc; >> SaveStateEntry *se; >> int ret = 0; >> >> if (vmdesc) { >> json_writer_int64(vmdesc, "page_size", qemu_target_page_size()); >> json_writer_start_array(vmdesc, "devices"); >> } >> >> trace_savevm_state_setup(); >> QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { >> >> The function does work in a loop. >> >> This work can fail in two places, setting @errp. When it does, we >> return. >> >> if (se->vmsd && se->vmsd->early_setup) { >> >> First one: >> >> ret = vmstate_save(f, se, vmdesc, errp); >> if (ret) { >> >> vmstate_save() set @errp and returned an error code. >> >> migrate_set_error(ms, *errp); >> qemu_file_set_error(f, ret); >> >> Store @errp in @ms, and the error code in @f. >> >> Aside: storing error state in two places feels like one too many. > > Correct. It's once again legacy of migration code where it used to set > error onto fds (e.g. that'll start to fail all qemufile APIs on this fd, > and qemufile API is not well designed IMHO.. which is another story), but > then we start to have string-based Errors and I guess we didn't try harder > to unify both. > >> >> break; >> >> This break jumps to if (ret) { return ret; }, so it's a roundabout way >> to return ret. Meh. > > IIUC some people prefer such way so the function returns at a single point > (easier to add common cleanup codes, for example). But I'd confess I also > prefer a direct return. :) I don't believe in complicating control flow to make future cleanup easier to add. If we ever we add it, we need to review the entire function to make sure it's called when needed no matter what. >> >> } >> continue; >> } >> >> if (!se->ops || !se->ops->save_setup) { >> continue; >> } >> if (se->ops->is_active) { >> if (!se->ops->is_active(se->opaque)) { >> continue; >> } >> } >> save_section_header(f, se, QEMU_VM_SECTION_START); >> >> Second one: >> >> ret = se->ops->save_setup(f, se->opaque, errp); >> save_section_footer(f, se); >> if (ret < 0) { >> >> ->save_setup() set @errp and returned an error code. >> >> qemu_file_set_error(f, ret); >> >> This time we store only the error code. Why not @errp? > > IIUC migrate_set_error() above was redundant instead, because > qemu_savevm_state_setup()'s caller will do migrate_set_error(). %-} > Said that, we shouldn't need to randomly call qemu_file_set_error() either > deep in this function.. may need some cleanups. > >> >> break; >> >> Again, a roundabout way to return ret. >> >> } >> >> Obviously ret >= 0 here. I believe @errp is not set. > > I don't think in this path ret can be >0.. but yeah this is pretty > unobvious even if true. That's also why I liked QEMU's current preferences > of "bool fn(..., Error **errp)".. at least in new codes. Me too. When all we need the return value to express is success vs. failure, bool has served us much better than int. 
> Maybe I should ping Vladimir on his recent work here?
>
> https://lore.kernel.org/r/20251028231347.194844-1-vsementsov@yandex-team.ru
>
> That'll be part of such a cleanup effort (and yes, unfortunately many
> migration-related cleanups will need a lot of code churn...).

I know...

Can we afford modest efforts to reduce the mess one step at a time?

>> 
>>     }
>> 
>> We get here either via break or normal loop termination.
>> 
>> If via break, ret != 0 and @errp set.
>> 
>> If via normal loop termination, ret >= 0 and @errp not set.
>> 
>>     if (ret) {
>>         return ret;
>> 
>> If via normal loop termination, and ret > 0, we return failure (I think)
>> without setting @errp.  I hope this can't happen, because ->save_setup()
>> never returns > 0.  But it's unclean even then.
>> 
>>     }
>> 
>> I trust @errp is still unset here.  This is the case if we take the
>> return right above when @errp has been set.
>> 
>>     /* TODO: Should we check that errp is set in case of failure ? */
>>     return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);
>> 
>> This time we store nothing.  Why?
>
> I think the clean way is to not store anything when the function
> provides Error**.

Point!

>> 
>> }
>> 
>> I think this function would be easier to understand if we replaced break
>> by return.
>
> I'll see what I can do with this function.
>
> Thanks!

You're welcome!


[*] If the job abstraction had been available in time, migration would
totally be a job.  There's no *design* reason for it being not a job.
Plenty of implementation and backward compatibility reasons, though.

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] migration: Error fixes and improvements
  2025-11-19 7:45       ` Markus Armbruster
@ 2025-11-19 20:58         ` Peter Xu
  2025-11-20 10:30           ` Migration and the Job abstraction (was: [PATCH 0/3] migration: Error fixes and improvements) Markus Armbruster
  2025-11-21 12:38           ` [PATCH 0/3] migration: Error fixes and improvements Fabiano Rosas
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Xu @ 2025-11-19 20:58 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, farosas

On Wed, Nov 19, 2025 at 08:45:39AM +0100, Markus Armbruster wrote:

[...]

> The hairy part is the background task.
> 
> I believe it used to simply do its job, reporting errors to stderr along
> the way, until it either succeeded or failed.  The errors reported made
> success / failure "obvious" for users.
> 
> This can report multiple errors, which can be confusing.
> 
> Worse, it was no good for management applications.  These need to
> observe migration as a state machine, with final success and error
> states, where the error state comes with an indication of what went
> wrong.  So we made migration store the first of certain errors in the
> migration state in addition to reporting to stderr.
> 
> "First", because we store only when the state doesn't already have an
> error.  "Certain", because I doubt we do it for all errors we report.
> 
> Compare this to how jobs solve this problem.  These are a much, much
> later invention, and designed for management applications from the
> start[*].  A job is a state machine.  Management applications can
> observe and control the state.  Errors are not supposed to be reported,
> they should be fed to the state machine, which goes into an error state
> then.  The job is not supposed to do actual work in an error state.
> Therefore, no further errors should be possible.  When something goes
> wrong, we get a single error, stored in the job state, where the
> management application can find it.
> 
> Migration is also a state machine, and we long ago retrofitted the means
> for management applications to observe and control the state.  What we
> haven't done is the disciplined feeding of errors to the state machine.
> We can still get multiple errors.  We store the first of certain errors
> where the management application can find it, but whether that error
> suffices to explain what went wrong is a crapshoot.  As long as that's
> the case, we need to spew the other errors to stderr, where a human can
> find it.

Since the above mentions once more the possibility of reusing the Jobs
idea, I did try to list things explicitly this time: why I think it would
be challenging and maybe not as worthwhile (?) to do so; however, I might
be wrong.  I attached the list at the end of this email mostly for my
future self to reference; please feel free to comment, or to ignore all
of it!

IMHO it's not directly relevant to the error reporting issues.

IMHO rewriting migration with Jobs will not help much in error reporting,
because the challenge in refactoring on the migration side is not the
"Jobs" interfacing, but migration's internals.  Say, even if migration
provided a "job", it's the "job" impl that did the error reporting badly,
not the Jobs interfacing..
> > > > https://lore.kernel.org/r/20251028231347.194844-1-vsementsov@yandex-team.ru > > > > That'll be part of such cleanup effort (and yes unfortunately many > > migration related cleanups will need a lot of code churns...). > > I know... > > Can we afford modest efforts to reduce the mess one step at a time? Yes, I'll try to follow up on that. [...] > [*] If the job abstraction had been available in time, migration would > totally be a job. There's no *design* reason for it being not a job. > Plenty of implementation and backward compatibility reasons, though. There might be something common between Jobs that block uses and a migration process. If so, we can provide CommonJob and make MigrationJob and BlockJobs dependent on it. However, I sincerely don't know how much common function will there be. IOW, I doubt even in an imaginery world, if we could go back to when Jobs was designed and if we would make migration a Job too (note! snapshots is definitely a too simple migration scenario..). Is it possible after evaluation we still don't? I don't know, but I think it's possible. Thanks! Peter Possible challenges of adopting Jobs in migration flow ====================================================== - Many Jobs defined property doesn't directly suite migration - JobStatus is not directly suitable for migration purposes. There're some of the JobStatus that I can't think of any use (e.g. JOB_STATUS_WAITING, JOB_STATUS_PENDING, which is fine, because we can simply not use it), but there're other status that migration needs but isn't availble. Introducing them seems to be an overkill instead to block layer's use case. - Similarly to JobVerb. E.g. JOB_VERB_CHANGE doesn't seem to apply to any concept to migration, but it misses quite some others (e.g. JOB_VERB_SET_DOWNTIME, JOB_VERB_POSTCOPY_START, and more). - Similarly, JobInfo reports in current-progress (which is not optional but required), which may make perfect sense for block jobs. However migration is OTOH convergence-triggered process, or user-triggered (in case of postcopy). It doesn't have a quantified process but only "COMPLETED" / "IN_PROGRESS". - Another very major example that I have discussed a few times previously, Jobs are close attached to AioContext, while migration doesn't have, meanwhile migration is moving even further away from event driven model.. See: https://lore.kernel.org/all/20251022192612.2737648-1-peterx@redhat.com/#t There're just too many example showing that Jobs are defined almost only for block layer.. e.g. job-finalize (which may not make much sense in a migration context anyway..) mentions finalizing of graph changes, which also doesn't exist in migration process. So if we rewrite migration somehow with Jobs or keeping migration in mind designing Jobs, Jobs may need to be very bloated containing both migration and block layer requirements. - Migration involves "two" QEMU instances instead of one I'm guessing existing Jobs operations are not as such, and providing such mechanisms in "Jobs" only for migration may introduce unnecessary code that block layer will never use. E.g. postcopy migration attached the two QEMU instances to represent one VM instance. I do not have a clear picture in mind yet on how we can manage that if we see it as two separate Jobs on each side, and what happens if each side operates on its own Job with different purposes, and how we should connect two Jobs to say they're relevant (or maybe we don't need to?). 
- More challenges on dest QEMU (VM loader) than src QEMU Unlike on the src side, the dest QEMU, when in an incoming state, is not a VM at all yet, but waiting to receive the migration data to become a working VM. It's not a generic long term process, but a pure listening port of QEMU where QEMU can do nothing without this "job" being completed.. If we think about CPR it's even more complicated, because we essential require part of incoming process to happen before almost everything.. it may even include monitors being initialized. - Deep integration with other subsystems Migration is deeply integrated into many other subsystems (auto-converge being able to throttle vCPUs, RAM being able to ignore empty pages reported from balloons, dirty trackings per-module, etc.), so we're not sure if there'll be some limitation from Jobs (when designed with block layer in mind) that will make such transition harder. For example, we at least want to make sure Jobs won't have simple locks that will be held while running migration, that can further deadlock if the migration code may invoke something else that tries to re-take the Jobs lock, which may cause dead-locks. Or, since migration runs nowadays with quite some threads concurrently, whether the main migration Job can always properly synchronize between all of them with no problem (maybe yes, but I just don't know Jobs enough to say). This is also a relevant question about how much AioContext plays a role in core of Jobs idea and whether it can work well with complicated threaded environment. -- Peter Xu ^ permalink raw reply [flat|nested] 14+ messages in thread
* Migration and the Job abstraction (was: [PATCH 0/3] migration: Error fixes and improvements)
  2025-11-19 20:58         ` Peter Xu
@ 2025-11-20 10:30           ` Markus Armbruster
  2025-11-20 12:16             ` Kevin Wolf
  0 siblings, 1 reply; 14+ messages in thread
From: Markus Armbruster @ 2025-11-20 10:30 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, farosas, John Snow, Kevin Wolf

Peter Xu <peterx@redhat.com> writes:

> On Wed, Nov 19, 2025 at 08:45:39AM +0100, Markus Armbruster wrote:
>
> [...]
>
>> The hairy part is the background task.
>> 
>> I believe it used to simply do its job, reporting errors to stderr along
>> the way, until it either succeeded or failed.  The errors reported made
>> success / failure "obvious" for users.
>> 
>> This can report multiple errors, which can be confusing.
>> 
>> Worse, it was no good for management applications.  These need to
>> observe migration as a state machine, with final success and error
>> states, where the error state comes with an indication of what went
>> wrong.  So we made migration store the first of certain errors in the
>> migration state in addition to reporting to stderr.
>> 
>> "First", because we store only when the state doesn't already have an
>> error.  "Certain", because I doubt we do it for all errors we report.
>> 
>> Compare this to how jobs solve this problem.  These are a much, much
>> later invention, and designed for management applications from the
>> start[*].  A job is a state machine.  Management applications can
>> observe and control the state.  Errors are not supposed to be reported,
>> they should be fed to the state machine, which goes into an error state
>> then.  The job is not supposed to do actual work in an error state.
>> Therefore, no further errors should be possible.  When something goes
>> wrong, we get a single error, stored in the job state, where the
>> management application can find it.
>> 
>> Migration is also a state machine, and we long ago retrofitted the means
>> for management applications to observe and control the state.  What we
>> haven't done is the disciplined feeding of errors to the state machine.
>> We can still get multiple errors.  We store the first of certain errors
>> where the management application can find it, but whether that error
>> suffices to explain what went wrong is a crapshoot.  As long as that's
>> the case, we need to spew the other errors to stderr, where a human can
>> find it.
>
> Since the above mentions once more the possibility of reusing the Jobs
> idea, I did try to list things explicitly this time: why I think it would
> be challenging and maybe not as worthwhile (?) to do so; however, I might
> be wrong.  I attached the list at the end of this email mostly for my
> future self to reference; please feel free to comment, or to ignore all
> of it!

Challenging definitely, worthwhile unknown, but putting the issues in
writing can only help.

> IMHO it's not directly relevant to the error reporting issues.

There's no hard dependency; migration's error reporting wrongs can
certainly be righted without converting to the job abstraction.
Studying how the job abstraction does errors may still help.

> IMHO rewriting migration with Jobs will not help much in error reporting,
> because the challenge in refactoring on the migration side is not the
> "Jobs" interfacing, but migration's internals.  Say, even if migration
> provided a "job", it's the "job" impl that did the error reporting badly,
> not the Jobs interfacing..
the "job" impl will need to manage quite some threads on its > own, making sure errors are properly reported at least to the "job" > interface. Point taken. > Said that, I totally agree we should try to improve error reporting in > migration.. with / without Jobs. > > [...] > >> > Maybe I should ping Vladimir on his recent work here? >> > >> > https://lore.kernel.org/r/20251028231347.194844-1-vsementsov@yandex-team.ru >> > >> > That'll be part of such cleanup effort (and yes unfortunately many >> > migration related cleanups will need a lot of code churns...). >> >> I know... >> >> Can we afford modest efforts to reduce the mess one step at a time? > > Yes, I'll try to follow up on that. > > [...] > >> [*] If the job abstraction had been available in time, migration would >> totally be a job. There's no *design* reason for it being not a job. >> Plenty of implementation and backward compatibility reasons, though. > > There might be something common between Jobs that block uses and a > migration process. If so, we can provide CommonJob and make MigrationJob > and BlockJobs dependent on it. Ignore BlockJob here, focus on Job. BlockJob came first, in 2012: commit eeec61f2913 (block: add BlockJob interface for long-running operations). It then grew quite a bit to accomodate new job types and provide more control. Job appeared in 2018 as "infrastructure for generic background jobs that aren't tied to a block device" (commit 33e9e9bd62d (job: Create Job, JobDriver and job_create()). Substantial parts of the BlockJob interface are actually deprecated in favor of the Job interface. I believe BlockJob still exists for things that are actually block-specific, and possibly for things we neglected to move over. So, CommonJob already exists: it's Job, defined in qapi/job.json. > However, I sincerely don't know how much common function will there be. > IOW, I doubt even in an imaginery world, if we could go back to when Jobs > was designed and if we would make migration a Job too (note! snapshots is > definitely a too simple migration scenario..). Is it possible after > evaluation we still don't? I don't know, but I think it's possible. To find out, we'd have to examine how the migration state machine and interface could map to the Job state machine, and how the Job interface could map to migration's internal interfaces. The mapping may lose detail. Generic Job is supposed to cover the generic aspects. Some specific jobs may need job-specific interfaces we can't or prefer not to fit into Job. > Thanks! > Peter > > > > > Possible challenges of adopting Jobs in migration flow > ====================================================== > > - Many Jobs defined property doesn't directly suite migration > > - JobStatus is not directly suitable for migration purposes. There're > some of the JobStatus that I can't think of any use > (e.g. JOB_STATUS_WAITING, JOB_STATUS_PENDING, which is fine, because we > can simply not use it), but there're other status that migration needs > but isn't availble. Introducing them seems to be an overkill instead to > block layer's use case. The Job abstraction defines possible states and state transitions. Each job finds its own path from the initial state @created to the final state @concluded. If a state doesn't make sense for a certain type of job, it simply doesn't go there. So, job states migration doesn't want are only a problem if there is no path from start to finish that doesn't go through unwanted states. There may also be states migration wants that aren't job states. 
We could make them job states.  Or we map multiple migration states to a
single job state, i.e. have the job state *abstract* from migration
state details.

>   - Similarly for JobVerb.  E.g. JOB_VERB_CHANGE doesn't seem to apply
>     to any migration concept, but it misses quite some others
>     (e.g. JOB_VERB_SET_DOWNTIME, JOB_VERB_POSTCOPY_START, and more).

JobVerb is used internally to restrict certain job commands to certain
job states.  For instance, command job-dismiss is rejected unless job is
in state @concluded.

This governs the generic job-FOO commands.  It also covers the legacy
block-job-FOO commands, because these wrap around the same C core as the
job-FOO commands.

We could have commands specific to a certain job type (say migration
jobs) that bypass the JobVerb infrastructure, and do their own thing to
restrict themselves to certain states.  Probably stupid if the states
that matter are job states.  Probably necessary if they aren't (say a
more fine-grained migration state).

>   - Similarly, JobInfo reports current-progress (which is not optional
>     but required), which may make perfect sense for block jobs.
>     Migration OTOH is a convergence-triggered process, or user-triggered
>     (in case of postcopy).  It doesn't have quantifiable progress, only
>     "COMPLETED" / "IN_PROGRESS".

Is there really no way to track migration progress approximately?

Here's the relevant doc comment:

# @current-progress: Progress made until now.  The unit is arbitrary
#     and the value can only meaningfully be used for the ratio of
#     @current-progress to @total-progress.  The value is
#     monotonically increasing.
#
# @total-progress: Estimated @current-progress value at the completion
#     of the job.  This value can arbitrarily change while the job is
#     running, in both directions.

I think this should work fine for convergence-triggered finish.
@current-progress could be the number of "things" sent (for some
arbitrary, convenient choice of "things").  Monotonically increasing.
@total-progress then would have to be a more or less rough estimate of
@current-progress plus what still needs to be sent.  For RAM,
@current-progress could be number of pages sent, and @total-progress
could be number of pages sent + (possibly estimated) number of dirty
pages.  Multiply by page size if that makes adding the estimated size of
the non-RAM transfers easier.

I haven't thought about postcopy.

>   - Another very major example that I have discussed a few times
>     previously: Jobs are closely attached to AioContext, which migration
>     doesn't have; meanwhile migration is moving even further away from
>     the event-driven model..  See:
>
>     https://lore.kernel.org/all/20251022192612.2737648-1-peterx@redhat.com/#t
>
>   There are just too many examples showing that Jobs are defined almost
>   only for the block layer.. e.g. job-finalize (which may not make much
>   sense in a migration context anyway..) mentions finalizing graph
>   changes, which also doesn't exist in the migration process.
>
>   So if we rewrite migration somehow with Jobs, or design Jobs keeping
>   migration in mind, Jobs may need to become very bloated to contain
>   both migration and block layer requirements.
>
> - Migration involves "two" QEMU instances instead of one
>
>   I'm guessing existing Job operations are not like that, and providing
>   such mechanisms in "Jobs" only for migration may introduce unnecessary
>   code that the block layer will never use.
>
>   E.g. postcopy migration attaches the two QEMU instances to represent
>   one VM instance.
>   I do not have a clear picture in mind yet of how we can manage that
>   if we see it as two separate Jobs, one on each side, what happens if
>   each side operates on its own Job with different purposes, and how we
>   should connect the two Jobs to say they're related (or maybe we don't
>   need to?).
>
> - More challenges on dest QEMU (VM loader) than src QEMU
>
>   Unlike on the src side, the dest QEMU, when in an incoming state, is
>   not a VM at all yet, but is waiting to receive the migration data to
>   become a working VM.  It's not a generic long-term process, but a
>   pure listening port where QEMU can do nothing until this "job" is
>   completed..
>
>   If we think about CPR it's even more complicated, because we
>   essentially require part of the incoming process to happen before
>   almost everything else.. maybe even before the monitors are
>   initialized.
>
> - Deep integration with other subsystems
>
>   Migration is deeply integrated into many other subsystems
>   (auto-converge being able to throttle vCPUs, RAM being able to ignore
>   empty pages reported by balloons, dirty tracking per module, etc.),
>   so we're not sure if there'll be some limitation in Jobs (designed
>   with the block layer in mind) that will make such a transition
>   harder.
>
>   For example, we at least want to make sure Jobs won't hold simple
>   locks while running migration, which can deadlock if the migration
>   code invokes something else that tries to re-take the Jobs lock.
>
>   Or, since migration nowadays runs quite some threads concurrently,
>   whether the main migration Job can always properly synchronize
>   between all of them with no problem (maybe it can, but I just don't
>   know Jobs well enough to say).  This is also a relevant question of
>   how much of a role AioContext plays in the core of the Jobs idea and
>   whether it can work well in a complicated threaded environment.

Fair points!

Which ones are due to the external Job interface, and which ones are
"only" due to its current implementation?

Thanks a lot for writing all this!

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Migration and the Job abstraction (was: [PATCH 0/3] migration: Error fixes and improvements)
2025-11-20 10:30 ` Migration and the Job abstraction (was: [PATCH 0/3] migration: Error fixes and improvements) Markus Armbruster
@ 2025-11-20 12:16 ` Kevin Wolf
0 siblings, 0 replies; 14+ messages in thread
From: Kevin Wolf @ 2025-11-20 12:16 UTC (permalink / raw)
To: Markus Armbruster; +Cc: Peter Xu, qemu-devel, farosas, John Snow

On 20.11.2025 at 11:30, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> > On Wed, Nov 19, 2025 at 08:45:39AM +0100, Markus Armbruster wrote:
> >> [*] If the job abstraction had been available in time, migration would
> >> totally be a job. There's no *design* reason for it not being a job.
> >> Plenty of implementation and backward compatibility reasons, though.
> >
> > There might be something common between Jobs that block uses and a
> > migration process. If so, we can provide CommonJob and make MigrationJob
> > and BlockJobs dependent on it.

Conceptually, live migration and the mirror block job are _really_
similar. You have a bulk copy phase and you keep copying data that has
changed to bring both sides in sync. When both sides are close enough,
you stop new changes from coming in, copy the small remainder and finish
the thing.

The main difference is that mirror copies disk content whereas live
migration mostly copies RAM. But that's irrelevant conceptually. So it
makes a lot of sense to me that the same user-visible state machine
should be applicable to both. (I'm not saying that we have to do this,
just that I expect it to be possible.)

> > Possible challenges of adopting Jobs in migration flow
> > ======================================================
> >
> > - Many Jobs-defined properties don't directly suit migration
> >
> > - JobStatus is not directly suitable for migration purposes. There're
> > some JobStatus values that I can't think of any use for
> > (e.g. JOB_STATUS_WAITING, JOB_STATUS_PENDING, which is fine, because we
> > can simply not use them), but there're other statuses that migration
> > needs but aren't available. Introducing them seems to be overkill
> > forced onto the block layer's use case.

Which other status does live migration have that the user cares about?
Does it have to be a status on the Job level or could it be some kind of
substatus that could be represented by job-specific information in
query-jobs? (Which doesn't exist yet, but I think we have been talking
about it multiple times before.)

> The Job abstraction defines possible states and state transitions. Each
> job finds its own path from the initial state @created to the final
> state @concluded. If a state doesn't make sense for a certain type of
> job, it simply doesn't go there.

Note that most states are really managed by the common Job abstraction.
The job itself only goes potentially from RUNNING to READY, and then
implicitly to WAITING/ABORTING when it returns from .run(). Everything
else only exists so that management tools can orchestrate jobs in the
right way and can query errors before the job disappears.
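To make the mapping idea above concrete, a rough sketch of how
migration's fine-grained states could fold onto the generic job states.
Every name below is made up for illustration, not an existing enum:

    /* Sketch only: project migration states onto the handful of
     * generic job states a management tool observes. */
    typedef enum { MIG_SETUP, MIG_ACTIVE, MIG_POSTCOPY_ACTIVE,
                   MIG_PRE_SWITCHOVER, MIG_COMPLETED, MIG_FAILED,
                   MIG_CANCELLED } MigState;
    typedef enum { JOB_X_RUNNING, JOB_X_PENDING,
                   JOB_X_CONCLUDED } JobStateX;

    static JobStateX mig_to_job_state(MigState s)
    {
        switch (s) {
        case MIG_SETUP:
        case MIG_ACTIVE:
        case MIG_POSTCOPY_ACTIVE:
            return JOB_X_RUNNING;
        case MIG_PRE_SWITCHOVER:
            return JOB_X_PENDING;   /* waits for a finalize-like verb */
        case MIG_COMPLETED:
        case MIG_FAILED:
        case MIG_CANCELLED:
            return JOB_X_CONCLUDED; /* error, if any, stays queryable */
        }
        return JOB_X_RUNNING;       /* not reached */
    }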
I'm not sure if WAITING is really useless for migration. In theory, you
could have a job transaction of some mirror jobs for the disks and live
migration for the device state, and of course you want both to finish
and switch over at the same time. I'm also not completely sure if it
will actually be used in practice, but the Job infrastructure gives you
the option for free.

PENDING and the associated job-finalize already exist in live migration
in the form of the pause-before-switchover/pre-switchover status and
migrate-continue. So I don't think you can argue you have no use for
them.

> So, job states migration doesn't want are only a problem if there is no
> path from start to finish that doesn't go through unwanted states.
>
> There may also be states migration wants that aren't job states. We
> could make them job states. Or we map multiple migration states to a
> single job state, i.e. have the job state *abstract* from migration
> state details.
>
> > - Similarly to JobVerb. E.g. JOB_VERB_CHANGE doesn't seem to apply to
> > any concept in migration, but it misses quite some others
> > (e.g. JOB_VERB_SET_DOWNTIME, JOB_VERB_POSTCOPY_START, and more).

How is SET_DOWNTIME or POSTCOPY_START not just a form of CHANGE?

> JobVerb is used internally to restrict certain job commands to certain
> job states. For instance, command job-dismiss is rejected unless the job
> is in state @concluded.
>
> This governs the generic job-FOO commands. It also covers the legacy
> block-job-FOO commands, because these wrap around the same C core as the
> job-FOO commands.
>
> We could have commands specific to a certain job type (say migration
> jobs) that bypass the JobVerb infrastructure, and do their own thing to
> restrict themselves to certain states. Probably stupid if the states
> that matter are job states. Probably necessary if they aren't (say a
> more fine-grained migration state).

I suspect we would have to look at specific examples to figure out how
to represent them best. In general, I think a generic job-change (to be
added as a more generic version of block-job-change) and job-specific
data in query-jobs can cover a lot. You may want to have job-specific
QMP events outside of the Job mechanism, or we could have a generic one
to notify the user that something in the queryable state has changed.

> > - Similarly, JobInfo reports progress in current-progress (which is not
> > optional but required), which may make perfect sense for block jobs.
> > However migration is OTOH a convergence-triggered process, or
> > user-triggered (in case of postcopy). It doesn't have quantified
> > progress but only "COMPLETED" / "IN_PROGRESS".
>
> Is there really no way to track migration progress approximately?

Of course there is. mirror in its default configuration is no different.
When things are dirtied, the amount of work to do simply grows.

> Here's the relevant doc comment:
>
> # @current-progress: Progress made until now. The unit is arbitrary
> # and the value can only meaningfully be used for the ratio of
> # @current-progress to @total-progress. The value is
> # monotonically increasing.
> #
> # @total-progress: Estimated @current-progress value at the completion
> # of the job. This value can arbitrarily change while the job is
> # running, in both directions.
>
> I think this should work fine for convergence-triggered finish.
> @current-progress could be the number of "things" sent (for some
> arbitrary, convenient choice of "things"). Monotonically increasing.
> @total-progress then would have to be a more or less rough estimate of
> @current-progress plus what still needs to be sent. For RAM,
> @current-progress could be the number of pages sent, and @total-progress
> could be the number of pages sent + the (possibly estimated) number of
> dirty pages. Multiply by page size if that makes adding the estimated
> size of the non-RAM transfers easier.
>
> I haven't thought about postcopy.

Postcopy should ask for every "thing" only once, so if you knew the
number of remaining "things" when you switched to postcopy, you can
simply continue to increase current-progress and leave total-progress
unchanged (which might already be what automatically happens because the
number of dirty pages can't grow on an inactive source instance any
more).
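As a sketch of what that bookkeeping could boil down to (the counter
names here are invented, not migration's real ones):

    #include <stdint.h>

    /* Sketch only: derive the generic progress pair from migration's
     * own page accounting. */
    typedef struct {
        uint64_t pages_sent;    /* monotonically increasing */
        uint64_t pages_dirty;   /* estimate; can grow again in precopy */
    } MigCounters;

    static void mig_job_progress(const MigCounters *c,
                                 uint64_t *current, uint64_t *total)
    {
        *current = c->pages_sent;
        /* Precopy: re-dirtied pages push the total back up.
         * Postcopy: the paused source dirties nothing, so pages_dirty
         * only shrinks and the total effectively freezes. */
        *total = c->pages_sent + c->pages_dirty;
    }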
> > - Another very major example that I have discussed a few times
> > previously, Jobs are closely attached to AioContext, while migration
> > doesn't have one, meanwhile migration is moving even further away from
> > an event-driven model.. See:
> >
> > https://lore.kernel.org/all/20251022192612.2737648-1-peterx@redhat.com/#t

The main loop of a job runs in a coroutine in an AioContext, yes, and
that is the context where you will get callbacks from, too. If the job
doesn't explicitly put itself anywhere else, it will be in the main
thread. Nothing stops you from firing off as many worker threads as you
want to, though.

This separation of the migration thread and its management interface
running in a different thread isn't new, though: Your existing QMP
commands always run in the main thread, too.
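A minimal sketch of that pattern, with plain pthreads standing in for
QEMU's thread helpers (everything here is hypothetical):

    #include <pthread.h>
    #include <stddef.h>

    #define N_CHANNELS 4

    /* Sketch only: the job's main loop merely coordinates; the actual
     * data transfer runs in worker threads it spawns and joins. */
    static void *mig_channel_worker(void *opaque)
    {
        /* ... push pages down one migration channel ... */
        return NULL;
    }

    static int mig_job_run(void *opaque)
    {
        pthread_t workers[N_CHANNELS];

        for (int i = 0; i < N_CHANNELS; i++) {
            pthread_create(&workers[i], NULL, mig_channel_worker, opaque);
        }
        /* A real job coroutine would yield while waiting, not block. */
        for (int i = 0; i < N_CHANNELS; i++) {
            pthread_join(workers[i], NULL);
        }
        return 0;
    }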
> > There're just too many examples showing that Jobs are defined almost
> > only for the block layer.. e.g. job-finalize (which may not make much
> > sense in a migration context anyway..) mentions finalizing of graph
> > changes, which also doesn't exist in the migration process.

s/graph changes/switchover/ for migration. I suppose "changes to global
state" might be a more accurate generic description.

> > So if we rewrite migration somehow with Jobs, or design Jobs with
> > migration in mind, Jobs may need to be very bloated, containing both
> > migration and block layer requirements.
> >
> > - Migration involves "two" QEMU instances instead of one
> >
> > I'm guessing existing Jobs operations are not like that, and providing
> > such mechanisms in "Jobs" only for migration may introduce unnecessary
> > code that the block layer will never use.
> >
> > E.g. postcopy migration attaches the two QEMU instances to represent
> > one VM instance. I do not have a clear picture in mind yet on how we
> > can manage that if we see it as two separate Jobs on each side, and
> > what happens if each side operates on its own Job with different
> > purposes, and how we should connect two Jobs to say they're related
> > (or maybe we don't need to?).

Don't you already run two different code paths on the source and the
destination host? Why would they be the same job?

Block migration isn't very different. You typically have an NBD export
running on one side and have a mirror job connecting to it on the other
side. One part doesn't make sense without the other, but that doesn't
mean that they are the same thing.

> > - More challenges on dest QEMU (VM loader) than src QEMU
> >
> > Unlike on the src side, the dest QEMU, when in an incoming state, is
> > not a VM at all yet, but waiting to receive the migration data to
> > become a working VM. It's not a generic long-term process, but a pure
> > listening port of QEMU where QEMU can do nothing without this "job"
> > being completed..
> >
> > If we think about CPR it's even more complicated, because we
> > essentially require part of the incoming process to happen before
> > almost everything.. it may even include monitors being initialized.

I'm not sure that the destination side should be a job. I suppose with
postcopy migration you have at least some background task, but with
precopy you don't even have that. Is there much that the user can manage
on the destination side apart from just waiting for migration to finish?

> > - Deep integration with other subsystems
> >
> > Migration is deeply integrated into many other subsystems
> > (auto-converge being able to throttle vCPUs, RAM being able to ignore
> > empty pages reported from balloons, per-module dirty tracking, etc.),
> > so we're not sure if there'll be some limitation from Jobs (when
> > designed with the block layer in mind) that will make such a
> > transition harder.
> >
> > For example, we at least want to make sure Jobs won't hold simple
> > locks while running migration, which can deadlock if the migration
> > code invokes something else that tries to re-take the Jobs lock.

I don't know why anything in the actual system emulation would want to
access the migration job and take its locks? But that aside, jobs are
made for long-running background tasks. While they have a mutex to
protect the generic state, you're not supposed to hold it for a long
time.

> > Or, since migration nowadays runs quite some threads concurrently,
> > whether the main migration Job can always properly synchronize between
> > all of them with no problem (maybe yes, but I just don't know Jobs
> > enough to say). This is also a relevant question about how much
> > AioContext plays a role in the core of the Jobs idea and whether it
> > can work well with a complicated threaded environment.

Why wouldn't they be able to do that when a QMP handler in the main loop
can do it correctly?

Kevin

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] migration: Error fixes and improvements
2025-11-19 20:58 ` Peter Xu
2025-11-20 10:30 ` Migration and the Job abstraction (was: [PATCH 0/3] migration: Error fixes and improvements) Markus Armbruster
@ 2025-11-21 12:38 ` Fabiano Rosas
1 sibling, 0 replies; 14+ messages in thread
From: Fabiano Rosas @ 2025-11-21 12:38 UTC (permalink / raw)
To: Peter Xu, Markus Armbruster; +Cc: qemu-devel

Peter Xu <peterx@redhat.com> writes:
> On Wed, Nov 19, 2025 at 08:45:39AM +0100, Markus Armbruster wrote:
>
> [...]
>
>> The hairy part is the background task.
>>
>> I believe it used to simply do its job, reporting errors to stderr along
>> the way, until it either succeeded or failed. The errors reported made
>> success / failure "obvious" for users.
>>
>> This can report multiple errors, which can be confusing.
>>
>> Worse, it was no good for management applications. These need to
>> observe migration as a state machine, with final success and error
>> states, where the error state comes with an indication of what went
>> wrong. So we made migration store the first of certain errors in the
>> migration state in addition to reporting to stderr.
>>
>> "First", because we store only when the state doesn't already have an
>> error. "Certain", because I doubt we do it for all errors we report.
>>
>> Compare this to how jobs solve this problem. These are a much, much
>> later invention, and designed for management applications from the
>> start[*]. A job is a state machine. Management applications can
>> observe and control the state. Errors are not supposed to be reported,
>> they should be fed to the state machine, which goes into an error state
>> then. The job is not supposed to do actual work in an error state.
>> Therefore, no further errors should be possible. When something goes
>> wrong, we get a single error, stored in the job state, where the
>> management application can find it.
>>
>> Migration is also a state machine, and we long ago retrofitted the means
>> for management applications to observe and control the state. What we
>> haven't done is the disciplined feeding of errors to the state machine.
>> We can still get multiple errors. We store the first of certain errors
>> where the management application can find it, but whether that error
>> suffices to explain what went wrong is a crapshoot. As long as that's
>> the case, we need to spew the other errors to stderr, where a human can
>> find them.
>
> Since the above mentioned once more the possibility of reusing the Jobs
> idea, I did try to list things explicitly this time, that is, why I think
> it should be challenging and maybe not as worthwhile (?) to do so, however
> I might be wrong. I attached it at the end of this email almost for
> myself in the future to reference, please feel free to comment, or, to
> ignore all of those! IMHO it's not directly relevant to the error
> reporting issues.
>
> IMHO rewriting migration with Jobs will not help much in error reporting,
> because the challenge for refactoring from the migration side is not the
> "Jobs" interfacing, but the internals of migration. Say, even if
> migration provided a "job", it's the "job" impl that did error reporting
> badly, not the Jobs interfacing.. the "job" impl will need to manage
> quite some threads on its own, making sure errors are properly reported
> at least to the "job" interface.
>
> That said, I totally agree we should try to improve error reporting in
> migration.. with / without Jobs.
>
> [...]
>
>> > Maybe I should ping Vladimir on his recent work here?
>> > https://lore.kernel.org/r/20251028231347.194844-1-vsementsov@yandex-team.ru
>> >
>> > That'll be part of such cleanup effort (and yes, unfortunately many
>> > migration-related cleanups will need a lot of code churn...).
>>
>> I know...
>>
>> Can we afford modest efforts to reduce the mess one step at a time?
>
> Yes, I'll try to follow up on that.
>
> [...]
>
>> [*] If the job abstraction had been available in time, migration would
>> totally be a job. There's no *design* reason for it not being a job.
>> Plenty of implementation and backward compatibility reasons, though.
>
> There might be something common between Jobs that block uses and a
> migration process. If so, we can provide CommonJob and make MigrationJob
> and BlockJobs dependent on it.
>
> However, I sincerely don't know how much common functionality there will
> be. IOW, I doubt, even in an imaginary world, if we could go back to when
> Jobs was designed and if we would make migration a Job too (note!
> snapshots are definitely too simple a migration scenario..). Is it
> possible after evaluation we still don't? I don't know, but I think it's
> possible.
>
> Thanks!
> Peter
>
>
>
>
> Possible challenges of adopting Jobs in migration flow
> ======================================================
>
> - Many Jobs-defined properties don't directly suit migration
>
> - JobStatus is not directly suitable for migration purposes. There're
> some JobStatus values that I can't think of any use for
> (e.g. JOB_STATUS_WAITING, JOB_STATUS_PENDING, which is fine, because we
> can simply not use them), but there're other statuses that migration
> needs but aren't available. Introducing them seems to be overkill forced
> onto the block layer's use case.
>
> - Similarly to JobVerb. E.g. JOB_VERB_CHANGE doesn't seem to apply to
> any concept in migration, but it misses quite some others
> (e.g. JOB_VERB_SET_DOWNTIME, JOB_VERB_POSTCOPY_START, and more).
>
> - Similarly, JobInfo reports progress in current-progress (which is not
> optional but required), which may make perfect sense for block jobs.
> However migration is OTOH a convergence-triggered process, or
> user-triggered (in case of postcopy). It doesn't have quantified
> progress but only "COMPLETED" / "IN_PROGRESS".
>
> - Another very major example that I have discussed a few times
> previously, Jobs are closely attached to AioContext, while migration
> doesn't have one, meanwhile migration is moving even further away from
> an event-driven model.. See:
>
> https://lore.kernel.org/all/20251022192612.2737648-1-peterx@redhat.com/#t
>
> There're just too many examples showing that Jobs are defined almost
> only for the block layer.. e.g. job-finalize (which may not make much
> sense in a migration context anyway..) mentions finalizing of graph
> changes, which also doesn't exist in the migration process.
>
> So if we rewrite migration somehow with Jobs, or design Jobs with
> migration in mind, Jobs may need to be very bloated, containing both
> migration and block layer requirements.
>
> - Migration involves "two" QEMU instances instead of one
>
> I'm guessing existing Jobs operations are not like that, and providing
> such mechanisms in "Jobs" only for migration may introduce unnecessary
> code that the block layer will never use.
>
> E.g. postcopy migration attaches the two QEMU instances to represent one
> VM instance.
> I do not have a clear picture in mind yet on how we can
> manage that if we see it as two separate Jobs on each side, and what
> happens if each side operates on its own Job with different purposes, and
> how we should connect two Jobs to say they're related (or maybe we don't
> need to?).
>
> - More challenges on dest QEMU (VM loader) than src QEMU
>
> Unlike on the src side, the dest QEMU, when in an incoming state, is not
> a VM at all yet, but waiting to receive the migration data to become a
> working VM. It's not a generic long-term process, but a pure listening
> port of QEMU where QEMU can do nothing without this "job" being
> completed..
>
> If we think about CPR it's even more complicated, because we essentially
> require part of the incoming process to happen before almost everything..
> it may even include monitors being initialized.
>
> - Deep integration with other subsystems
>
> Migration is deeply integrated into many other subsystems (auto-converge
> being able to throttle vCPUs, RAM being able to ignore empty pages
> reported from balloons, per-module dirty tracking, etc.), so we're not
> sure if there'll be some limitation from Jobs (when designed with the
> block layer in mind) that will make such a transition harder.
>
> For example, we at least want to make sure Jobs won't hold simple locks
> while running migration, which can deadlock if the migration code
> invokes something else that tries to re-take the Jobs lock.
>
> Or, since migration nowadays runs quite some threads concurrently,
> whether the main migration Job can always properly synchronize between
> all of them with no problem (maybe yes, but I just don't know Jobs
> enough to say). This is also a relevant question about how much
> AioContext plays a role in the core of the Jobs idea and whether it can
> work well with a complicated threaded environment.

Thanks for looking into this, Peter! I'm saving it for future reference
as well! It was on my todo list to make such an analysis.

I hope Markus can comment on some of those and maybe we can still find a
way to converge, but I think I agree that migration is (at this point) a
little too particular to be retrofitted (which I'd be very much in favor
of, if it were at all feasible).

(wondering what happened in QEMU historically that we devised so many
well-designed interfaces, but chose to leave migration aside altogether)

(maybe this right here is what happened)

^ permalink raw reply [flat|nested] 14+ messages in thread