qemu-devel.nongnu.org archive mirror
* [PATCH 0/3] migration: Error fixes and improvements
@ 2025-11-15  8:34 Markus Armbruster
  2025-11-15  8:34 ` [PATCH 1/3] migration: Plug memory leaks after migrate_set_error() Markus Armbruster
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Markus Armbruster @ 2025-11-15  8:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas

Maintainers decide what to take for 10.2, if anything.

Let me know if you'd prefer the "perhaps should take ownership" idea
in PATCH 1's commit message.

Markus Armbruster (3):
  migration: Plug memory leaks after migrate_set_error()
  migration: Use warn_reportf_err() where appropriate
  migration/postcopy-ram: Improve error reporting after loadvm failure

 migration/cpr-exec.c     |  3 ++-
 migration/multifd.c      |  6 ++++--
 migration/postcopy-ram.c | 17 ++++++++---------
 3 files changed, 14 insertions(+), 12 deletions(-)

-- 
2.49.0




* [PATCH 1/3] migration: Plug memory leaks after migrate_set_error()
  2025-11-15  8:34 [PATCH 0/3] migration: Error fixes and improvements Markus Armbruster
@ 2025-11-15  8:34 ` Markus Armbruster
  2025-11-15  8:34 ` [PATCH 2/3] migration: Use warn_reportf_err() where appropriate Markus Armbruster
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Markus Armbruster @ 2025-11-15  8:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas

migrate_set_error(s, err) stores a copy of @err in @s.  The original
@err is not freed.  Most callers free it immediately.  Some callers
free it later, or pass it on.  And some leak it.  Fix those.

Perhaps migrate_set_error(s, err) should take ownership of @err.  The
callers that free it immediately would become simpler, and avoid a
copy and a deallocation.  The others would have to pass
error_copy(err).
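
A sketch of that variant (hypothetical, not part of this series; it
assumes the current copy-and-store implementation guarded by
s->error_mutex):

    /* Like migrate_set_error(), but takes ownership of @err */
    void migrate_take_error(MigrationState *s, Error *err)
    {
        QEMU_LOCK_GUARD(&s->error_mutex);
        if (!s->error) {
            s->error = err;
        } else {
            error_free(err);
        }
    }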

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/cpr-exec.c | 3 ++-
 migration/multifd.c  | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/migration/cpr-exec.c b/migration/cpr-exec.c
index d284f6e734..0b8344a86f 100644
--- a/migration/cpr-exec.c
+++ b/migration/cpr-exec.c
@@ -159,11 +159,12 @@ static void cpr_exec_cb(void *opaque)
     error_report_err(error_copy(err));
     migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
     migrate_set_error(s, err);
+    error_free(err);
+    err = NULL;
 
     /* Note, we can go from state COMPLETED to FAILED */
     migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);
 
-    err = NULL;
     if (!migration_block_activate(&err)) {
         /* error was already reported */
         error_free(err);
diff --git a/migration/multifd.c b/migration/multifd.c
index 98873cee74..a529c399e4 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -964,6 +964,7 @@ bool multifd_send_setup(void)
 
         if (!multifd_new_send_channel_create(p, &local_err)) {
             migrate_set_error(s, local_err);
+            error_free(local_err);
             ret = -1;
         }
     }
@@ -988,6 +989,7 @@ bool multifd_send_setup(void)
         ret = multifd_send_state->ops->send_setup(p, &local_err);
         if (ret) {
             migrate_set_error(s, local_err);
+            error_free(local_err);
             goto err;
         }
         assert(p->iov);
-- 
2.49.0




* [PATCH 2/3] migration: Use warn_reportf_err() where appropriate
  2025-11-15  8:34 [PATCH 0/3] migration: Error fixes and improvements Markus Armbruster
  2025-11-15  8:34 ` [PATCH 1/3] migration: Plug memory leaks after migrate_set_error() Markus Armbruster
@ 2025-11-15  8:34 ` Markus Armbruster
  2025-11-17 15:47   ` Fabiano Rosas
  2025-11-15  8:35 ` [PATCH 3/3] migration/postcopy-ram: Improve error reporting after loadvm failure Markus Armbruster
  2025-11-17 16:03 ` [PATCH 0/3] migration: Error fixes and improvements Peter Xu
  3 siblings, 1 reply; 14+ messages in thread
From: Markus Armbruster @ 2025-11-15  8:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas

Replace

    warn_report("...: %s", ..., error_get_pretty(err));

by

    warn_reportf_err(err, "...: ", ...);

Prior art: commit 5217f1887a8 (error: Use error_reportf_err() where
appropriate).

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/multifd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index a529c399e4..6210454838 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -464,8 +464,8 @@ static void migration_ioc_shutdown_gracefully(QIOChannel *ioc)
          */
         migration_tls_channel_end(ioc, &local_err);
         if (local_err) {
-            warn_report("Failed to gracefully terminate TLS connection: %s",
-                        error_get_pretty(local_err));
+            warn_reportf_err(local_err,
+                        "Failed to gracefully terminate TLS connection: ");
         }
     }
 
-- 
2.49.0




* [PATCH 3/3] migration/postcopy-ram: Improve error reporting after loadvm failure
  2025-11-15  8:34 [PATCH 0/3] migration: Error fixes and improvements Markus Armbruster
  2025-11-15  8:34 ` [PATCH 1/3] migration: Plug memory leaks after migrate_set_error() Markus Armbruster
  2025-11-15  8:34 ` [PATCH 2/3] migration: Use warn_reportf_err() where appropriate Markus Armbruster
@ 2025-11-15  8:35 ` Markus Armbruster
  2025-11-17 15:50   ` Fabiano Rosas
  2025-11-17 16:03 ` [PATCH 0/3] migration: Error fixes and improvements Peter Xu
  3 siblings, 1 reply; 14+ messages in thread
From: Markus Armbruster @ 2025-11-15  8:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas

One of the two error messages shows __func__.  Drop it; it doesn't help
users, and developers can grep for the message.  This also permits
de-duplicating the code that prepends to the error message.

Both error messages show a numeric error code.  I doubt that's
helpful, but I'm leaving it alone.

Use error_append_hint() for explaining that some dirty bitmaps may be
lost.  Polish the prose.

Don't faff around with g_clear_pointer(); it's not worth its keep
here.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/postcopy-ram.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 3f98dcb6fd..7c9fe61041 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2146,25 +2146,24 @@ static void *postcopy_listen_thread(void *opaque)
     if (load_res < 0) {
         qemu_file_set_error(f, load_res);
         dirty_bitmap_mig_cancel_incoming();
+        error_prepend(&local_err,
+                      "loadvm failed during postcopy: %d: ", load_res);
         if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
             !migrate_postcopy_ram() && migrate_dirty_bitmaps())
         {
-            error_report("%s: loadvm failed during postcopy: %d: %s. All states "
-                         "are migrated except dirty bitmaps. Some dirty "
-                         "bitmaps may be lost, and present migrated dirty "
-                         "bitmaps are correctly migrated and valid.",
-                         __func__, load_res, error_get_pretty(local_err));
-            g_clear_pointer(&local_err, error_free);
+            error_append_hint(&local_err,
+                              "All state is migrated except dirty bitmaps."
+                              " Some dirty bitmaps may be lost, but any"
+                              " migrated dirty bitmaps are valid.\n");
+            error_report_err(local_err);
         } else {
             /*
              * Something went fatally wrong and we have a bad state, QEMU will
              * exit depending on if postcopy-exit-on-error is true, but the
              * migration cannot be recovered.
              */
-            error_prepend(&local_err,
-                          "loadvm failed during postcopy: %d: ", load_res);
             migrate_set_error(migr, local_err);
-            g_clear_pointer(&local_err, error_report_err);
+            error_report_err(local_err);
             migrate_set_state(&mis->state, mis->state, MIGRATION_STATUS_FAILED);
             goto out;
         }
-- 
2.49.0




* Re: [PATCH 2/3] migration: Use warn_reportf_err() where appropriate
  2025-11-15  8:34 ` [PATCH 2/3] migration: Use warn_reportf_err() where appropriate Markus Armbruster
@ 2025-11-17 15:47   ` Fabiano Rosas
  0 siblings, 0 replies; 14+ messages in thread
From: Fabiano Rosas @ 2025-11-17 15:47 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: peterx

Markus Armbruster <armbru@redhat.com> writes:

> Replace
>
>     warn_report("...: %s", ..., error_get_pretty(err));
>
> by
>
>     warn_reportf_err(err, "...: ", ...);
>
> Prior art: commit 5217f1887a8 (error: Use error_reportf_err() where
> appropriate).
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>



* Re: [PATCH 3/3] migration/postcopy-ram: Improve error reporting after loadvm failure
  2025-11-15  8:35 ` [PATCH 3/3] migration/postcopy-ram: Improve error reporting after loadvm failure Markus Armbruster
@ 2025-11-17 15:50   ` Fabiano Rosas
  0 siblings, 0 replies; 14+ messages in thread
From: Fabiano Rosas @ 2025-11-17 15:50 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: peterx

Markus Armbruster <armbru@redhat.com> writes:

> One of the two error messages shows __func__.  Drop it; it doesn't help
> users, and developers can grep for the message.  This also permits
> de-duplicating the code that prepends to the error message.
>
> Both error messages show a numeric error code.  I doubt that's
> helpful, but I'm leaving it alone.
>
> Use error_append_hint() for explaining that some dirty bitmaps may be
> lost.  Polish the prose.
>
> Don't faff around with g_clear_pointer(); it's not worth its keep
> here.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>



* Re: [PATCH 0/3] migration: Error fixes and improvements
  2025-11-15  8:34 [PATCH 0/3] migration: Error fixes and improvements Markus Armbruster
                   ` (2 preceding siblings ...)
  2025-11-15  8:35 ` [PATCH 3/3] migration/postcopy-ram: Improve error reporting after loadvm failure Markus Armbruster
@ 2025-11-17 16:03 ` Peter Xu
  2025-11-18  7:44   ` Markus Armbruster
  3 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2025-11-17 16:03 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, farosas

On Sat, Nov 15, 2025 at 09:34:57AM +0100, Markus Armbruster wrote:
> Maintainers decide what to take for 10.2, if anything.
> 
> Let me know if you'd prefer the "perhaps should take ownership" idea
> in PATCH 1's commit message.

I recall I had such a patch previously, so I dug it out:

https://lore.kernel.org/all/20230705163502.331007-3-peterx@redhat.com/

I found that I dropped it in the v3 post of that series, where I mentioned
in the change log that neither way is clean, so I dropped it until I could
have a better understanding:

https://lore.kernel.org/all/20231004220240.167175-1-peterx@redhat.com/

I think at that time I must have hit a use case where the caller forgot to
error_copy(), so passing it in caused a UAF.  Then I thought a memory leak
was better than a UAF on the error path if the API was used wrong either
way.

But feel free to share your thoughts.  We can definitely revisit this.

I queued the series for this release, thanks Markus.

> 
> Markus Armbruster (3):
>   migration: Plug memory leaks after migrate_set_error()
>   migration: Use warn_reportf_err() where appropriate
>   migration/postcopy-ram: Improve error reporting after loadvm failure
> 
>  migration/cpr-exec.c     |  3 ++-
>  migration/multifd.c      |  6 ++++--
>  migration/postcopy-ram.c | 17 ++++++++---------
>  3 files changed, 14 insertions(+), 12 deletions(-)
> 
> -- 
> 2.49.0
> 
> 

-- 
Peter Xu




* Re: [PATCH 0/3] migration: Error fixes and improvements
  2025-11-17 16:03 ` [PATCH 0/3] migration: Error fixes and improvements Peter Xu
@ 2025-11-18  7:44   ` Markus Armbruster
  2025-11-18 17:35     ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Markus Armbruster @ 2025-11-18  7:44 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, farosas

Peter Xu <peterx@redhat.com> writes:

> On Sat, Nov 15, 2025 at 09:34:57AM +0100, Markus Armbruster wrote:
>> Maintainers decide what to take for 10.2, if anything.
>> 
>> Let me know if you'd prefer the "perhaps should take ownership" idea
>> in PATCH 1's commit message.
>
> I recall I had such a patch previously, so I dug it out:
>
> https://lore.kernel.org/all/20230705163502.331007-3-peterx@redhat.com/
>
> I found that I dropped it in the v3 post of that series, where I mentioned
> in the change log that neither way is clean, so I dropped it until I could
> have a better understanding:
>
> https://lore.kernel.org/all/20231004220240.167175-1-peterx@redhat.com/
>
> I think at that time I must have hit a use case where the caller forgot to
> error_copy(), so passing it in caused a UAF.

Use-after-free can happen if the caller holds on to its reference until
after the stored Error is freed.  In other words, holding on to the
reference is safe only until the stored Error is freed, and any safety
argument will have to reason about the stored Error's lifetime.  No idea
how difficult or brittle such an argument would be.

>                                              Then I thought a memory leak
> was better than a UAF on the error path if the API was used wrong either
> way.

Fair point.

> But feel free to share your thoughts.  We can definitely revisit this.

migrate_set_error(s, err) stores a copy of @err in @s unless @s already
has an Error stored.
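
For reference, the function is tiny.  From memory (so the details may be
off), it's roughly:

    void migrate_set_error(MigrationState *s, const Error *error)
    {
        QEMU_LOCK_GUARD(&s->error_mutex);
        if (!s->error) {
            s->error = error_copy(error);
        }
    }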

I see 26 calls of migrate_set_error().

* 11 call error_free() immediately, and 2 call it via g_autoptr().  3
  neglect to call it.  My patch fixes them.  Total is 16 out of 26.

* 6 report and free with error_report_err(), i.e. we store the error for
  later *and* report it now.  Gives me an uneasy feeling.  How is the
  stored error handled?  Will it be reported again?  That would likely
  be wrong.

* 3 wrap migrate_set_error():

  - multifd_send_set_error()

    Its callers both call error_free() immediately.

  - migration_connect_set_error()

    3 callers.

    qmp_migrate() and qmp_migrate_finish() propagate to their callers,
    i.e. we store the error for later *and* have callers handle it now.
    Same uneasy feeling as above.

    One of migration_connect()'s callers passes NULL, the other calls
    error_free() immediately.

  - multifd_recv_terminate_threads()

    Two callers pass NULL, one calls error_free() immediately, and
    multifd_recv_new_channel() propagates.  Uneasy feeling again.

* qemu_savevm_state_setup() confuses me.  It sets @errp on failure, and
  stores some (uneasy feeling), but not all of these errors with
  migrate_set_error().  See below.

Summary:

* We're prone to leaking the Error passed to migrate_set_error().

* If we replaced it by a function that takes ownership, we may become
  prone to use-after-free.  I write "may" because I'm unsure when
  exactly a use after calling this ownership-taking function would be
  unsafe.

* The "forked" Error handling makes me uneasy.  I'm sure we do it for
  reasons.  Can you help me understand them?

> I queued the series for this release, thanks Markus.

Thanks!


Bonus content:

    int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
    {
        ERRP_GUARD();
        MigrationState *ms = migrate_get_current();
        JSONWriter *vmdesc = ms->vmdesc;
        SaveStateEntry *se;
        int ret = 0;

        if (vmdesc) {
            json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
            json_writer_start_array(vmdesc, "devices");
        }

        trace_savevm_state_setup();
        QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {

The function does work in a loop.

This work can fail in two places, setting @errp.  When it does, we
return.

            if (se->vmsd && se->vmsd->early_setup) {

First one:

                ret = vmstate_save(f, se, vmdesc, errp);
                if (ret) {

vmstate_save() set @errp and returned an error code.

                    migrate_set_error(ms, *errp);
                    qemu_file_set_error(f, ret);

Store @errp in @ms, and the error code in @f.

Aside: storing error state in two places feels like one too many.

                    break;

This break jumps to if (ret) { return ret; }, so it's a roundabout way
to return ret.  Meh.

                }
                continue;
            }

            if (!se->ops || !se->ops->save_setup) {
                continue;
            }
            if (se->ops->is_active) {
                if (!se->ops->is_active(se->opaque)) {
                    continue;
                }
            }
            save_section_header(f, se, QEMU_VM_SECTION_START);

Second one:

            ret = se->ops->save_setup(f, se->opaque, errp);
            save_section_footer(f, se);
            if (ret < 0) {

->save_setup() set @errp and returned an error code.

                qemu_file_set_error(f, ret);

This time we store only the error code.  Why not @errp?

                break;

Again, a roundabout way to return ret.

            }

Obviously ret >= 0 here.  I believe @errp is not set.

        }

We get here either via break or normal loop termination.

If via break, ret != 0 and @errp set.

If via normal loop termination, ret >= 0 and @errp not set.

        if (ret) {
            return ret;

If via normal loop termination, and ret > 0, we return failure (I think)
without setting @errp.  I hope this can't happen, because ->save_setup()
never returns > 0.  But it's unclean even then.

        }

I trust @errp is still unset here.  This is the case if we take the
return right above when @errp has been set.

        /* TODO: Should we check that errp is set in case of failure ? */
        return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);

This time we store nothing.  Why?

    }

I think this function would be easier to understand if we replaced break
by return.
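
For illustration, the first failure site would then read (just the
mechanical transformation, untested):

    ret = vmstate_save(f, se, vmdesc, errp);
    if (ret) {
        migrate_set_error(ms, *errp);
        qemu_file_set_error(f, ret);
        return ret;
    }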




* Re: [PATCH 0/3] migration: Error fixes and improvements
  2025-11-18  7:44   ` Markus Armbruster
@ 2025-11-18 17:35     ` Peter Xu
  2025-11-19  7:45       ` Markus Armbruster
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2025-11-18 17:35 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, farosas

On Tue, Nov 18, 2025 at 08:44:32AM +0100, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Sat, Nov 15, 2025 at 09:34:57AM +0100, Markus Armbruster wrote:
> >> Maintainers decide what to take for 10.2, if anything.
> >> 
> >> Let me know if you'd prefer the "perhaps should take ownership" idea
> >> in PATCH 1's commit message.
> >
> > I recall I had such a patch previously, so I dug it out:
> >
> > https://lore.kernel.org/all/20230705163502.331007-3-peterx@redhat.com/
> >
> > I found that I dropped it in the v3 post of that series, where I mentioned
> > in the change log that neither way is clean, so I dropped it until I could
> > have a better understanding:
> >
> > https://lore.kernel.org/all/20231004220240.167175-1-peterx@redhat.com/
> >
> > I think at that time I must have hit a use case where the caller forgot to
> > error_copy(), so passing it in caused a UAF.
> 
> Use-after-free can happen if the caller holds on to its reference until
> after the stored Error is freed.  In other words, holding on to the
> reference is safe only until the stored Error is freed, and any safety
> argument will have to reason about the stored Error's lifetime.  No idea
> how difficult or brittle such an argument would be.
> 
> >                                              Then I thought a memory leak
> > was better than a UAF on the error path if the API was used wrong either
> > way.
> 
> Fair point.
> 
> > But feel free to share your thoughts.  We can definitely revisit this.
> 
> migrate_set_error(s, err) stores a copy of @err in @s unless @s already
> has an Error stored.
> 
> I see 26 calls of migrate_set_error().
> 
> * 11 call error_free() immediately, and 2 call it via g_autoptr().  3
>   neglect to call it.  My patch fixes them.  Total is 16 out of 26.

Yes, I appreciate that!

> 
> * 6 report and free with error_report_err(), i.e. we store the error for
>   later *and* report it now.  Gives me an uneasy feeling.  How is the
>   stored error handled?  Will it be reported again?  That would likely
>   be wrong.

Migration as a module predates my time as a developer on QEMU.. so I can
only share my two cents from what still resides in my memory, which may or
may not always be the truth..

I think one issue is migration used to error_report() things all over the
place, but then we thought it a good idea to try to remember the error so
that libvirt can query at any time if necessary.  With that, starting from
QEMU 2.7 we introduced error-desc into query-migrate results.  That's what
s->error was about.
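
To illustrate (a sketch from memory; the error text is made up):

    -> {"execute": "query-migrate"}
    <- {"return": {"status": "failed",
                   "error-desc": "Unable to write to socket: Broken pipe"}}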

But then, could we drop the error_report() / error_report_err()s?  Likely
not, because there might be users relying on the printed errors to see what
happened..  Making query-migrate the only source of error reports adds
complexity for those users and may cause confusion..  Hence the extra
references sometimes needed after migrate_set_error(), mostly to keep the
old behavior of printing the error (I didn't check each of them, though).

> 
> * 3 wrap migrate_set_error():
> 
>   - multifd_send_set_error()
> 
>     Its callers both call error_free() immediately.
> 
>   - migration_connect_set_error()
> 
>     3 callers.
> 
>     qmp_migrate() and qmp_migrate_finish() propagate to their callers,
>     i.e. we store the error for later *and* have callers handle it now.
>     Same uneasy feeling as above.
> 
>     One of migration_connect()'s callers passes NULL, the other calls
>     error_free() immediately.
> 
>   - multifd_recv_terminate_threads()
> 
>     Two callers pass NULL, one calls error_free() immediately, and
>     multifd_recv_new_channel() propagates.  Uneasy feeling again.
> 
> * qemu_savevm_state_setup() confuses me.  It sets @errp on failure, and
>   stores some (uneasy feeling), but not all of these errors with
>   migrate_set_error().  See below.

Yes, I also agree qemu_savevm_state_setup() is confusing on error
handling.  I'll comment below.

> 
> Summary:
> 
> * We're prone to leaking the Error passed to migrate_set_error().
> 
> * If we replaced it by a function that takes ownership, we may become
>   prone to use-after-free.  I write "may" because I'm unsure when
>   exactly a use after calling this ownership-taking function would be
>   unsafe.
> 
> * The "forked" Error handling makes me uneasy.  I'm sure we do it for
>   reasons.  Can you help me understand them?

Hope the above provides some context.  In short, IMHO it's a mix of two
demands: "print the error immediately like the old behavior" and "remember
the error too for queries".

> 
> > I queued the series for this release, thanks Markus.
> 
> Thanks!
> 
> 
> Bonus content:
> 
>     int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
>     {
>         ERRP_GUARD();
>         MigrationState *ms = migrate_get_current();
>         JSONWriter *vmdesc = ms->vmdesc;
>         SaveStateEntry *se;
>         int ret = 0;
> 
>         if (vmdesc) {
>             json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
>             json_writer_start_array(vmdesc, "devices");
>         }
> 
>         trace_savevm_state_setup();
>         QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> 
> The function does work in a loop.
> 
> This work can fail in two places, setting @errp.  When it does, we
> return.
> 
>             if (se->vmsd && se->vmsd->early_setup) {
> 
> First one:
> 
>                 ret = vmstate_save(f, se, vmdesc, errp);
>                 if (ret) {
> 
> vmstate_save() set @errp and returned an error code.
> 
>                     migrate_set_error(ms, *errp);
>                     qemu_file_set_error(f, ret);
> 
> Store @errp in @ms, and the error code in @f.
> 
> Aside: storing error state in two places feels like one too many.

Correct.  It's once again a legacy of migration code, which used to set
errors onto fds (e.g. that'll start to fail all qemufile APIs on this fd,
and the qemufile API is not well designed IMHO.. which is another story),
but then we started to have string-based Errors and I guess we didn't try
hard enough to unify both.

> 
>                     break;
> 
> This break jumps to if (ret) { return ret; }, so it's a roundabout way
> to return ret.  Meh.

IIUC some people prefer it that way so the function returns at a single
point (easier to add common cleanup code, for example).  But I confess I
also prefer a direct return. :)

> 
>                 }
>                 continue;
>             }
> 
>             if (!se->ops || !se->ops->save_setup) {
>                 continue;
>             }
>             if (se->ops->is_active) {
>                 if (!se->ops->is_active(se->opaque)) {
>                     continue;
>                 }
>             }
>             save_section_header(f, se, QEMU_VM_SECTION_START);
> 
> Second one:
> 
>             ret = se->ops->save_setup(f, se->opaque, errp);
>             save_section_footer(f, se);
>             if (ret < 0) {
> 
> ->save_setup() set @errp and returned an error code.
> 
>                 qemu_file_set_error(f, ret);
> 
> This time we store only the error code.  Why not @errp?

IIUC the migrate_set_error() above was redundant instead, because
qemu_savevm_state_setup()'s caller will do migrate_set_error().

That said, we shouldn't need to randomly call qemu_file_set_error() deep
in this function either.. may need some cleanups.

> 
>                 break;
> 
> Again, a roundabout way to return ret.
> 
>             }
> 
> Obviously ret >= 0 here.  I believe @errp is not set.

I don't think ret can be >0 in this path..  but yeah, this is pretty
unobvious even if true.  That's also why I like QEMU's current preference
for "bool fn(..., Error **errp)".. at least in new code.  Maybe I should
ping Vladimir on his recent work here?

https://lore.kernel.org/r/20251028231347.194844-1-vsementsov@yandex-team.ru

That'll be part of such a cleanup effort (and yes, unfortunately many
migration-related cleanups will need a lot of code churn...).

> 
>         }
> 
> We get here either via break or normal loop termination.
> 
> If via break, ret != 0 and @errp set.
> 
> If via normal loop termination, ret >= 0 and @errp not set.
> 
>         if (ret) {
>             return ret;
> 
> If via normal loop termination, and ret > 0, we return failure (I think)
> without setting @errp.  I hope this can't happen, because ->save_setup()
> never returns > 0.  But it's unclean even then.
> 
>         }
> 
> I trust @errp is still unset here.  This is the case if we take the
> return right above when @errp has been set.
> 
>         /* TODO: Should we check that errp is set in case of failure ? */
>         return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);
> 
> This time we store nothing.  Why?

I think the clean way is to not store anything when this function
provides Error**.

> 
>     }
> 
> I think this function would be easier to understand if we replaced break
> by return.

I'll see what I can do with this function.

Thanks!

-- 
Peter Xu




* Re: [PATCH 0/3] migration: Error fixes and improvements
  2025-11-18 17:35     ` Peter Xu
@ 2025-11-19  7:45       ` Markus Armbruster
  2025-11-19 20:58         ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Markus Armbruster @ 2025-11-19  7:45 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, farosas

Peter Xu <peterx@redhat.com> writes:

> On Tue, Nov 18, 2025 at 08:44:32AM +0100, Markus Armbruster wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Sat, Nov 15, 2025 at 09:34:57AM +0100, Markus Armbruster wrote:
>> >> Maintainers decide what to take for 10.2, if anything.
>> >> 
>> >> Let me know if you'd prefer the "perhaps should take ownership" idea
>> >> in PATCH 1's commit message.
>> >
>> > I recall I had such a patch previously, so I dug it out:
>> >
>> > https://lore.kernel.org/all/20230705163502.331007-3-peterx@redhat.com/
>> >
>> > I found that I dropped it in the v3 post of that series, where I mentioned
>> > in the change log that neither way is clean, so I dropped it until I could
>> > have a better understanding:
>> >
>> > https://lore.kernel.org/all/20231004220240.167175-1-peterx@redhat.com/
>> >
>> > I think at that time I must have hit a use case where the caller forgot to
>> > error_copy(), so passing it in caused a UAF.
>> 
>> Use-after-free can happen if the caller holds on to its reference until
>> after the stored Error is freed.  In other words, holding on to the
>> reference is safe only until the stored Error is freed, and any safety
>> argument will have to reason about the stored Error's lifetime.  No idea
>> how difficult or brittle such an argument would be.
>> 
>> >                                              Then I thought a memory leak
>> > was better than a UAF on the error path if the API was used wrong either
>> > way.
>> 
>> Fair point.
>> 
>> > But feel free to share your thoughts.  We can definitely revisit this.
>> 
>> migrate_set_error(s, err) stores a copy of @err in @s unless @s already
>> has an Error stored.
>> 
>> I see 26 calls of migrate_set_error().
>> 
>> * 11 call error_free() immediately, and 2 call it via g_autoptr().  3
>>   neglect to call it.  My patch fixes them.  Total is 16 out of 26.
>
> Yes, I appreciate that!
>
>> 
>> * 6 report and free with error_report_err(), i.e. we store the error for
>>   later *and* report it now.  Gives me an uneasy feeling.  How is the
>>   stored error handled?  Will it be reported again?  That would likely
>>   be wrong.
>
> Migration as a module predates my time as a developer on QEMU.. so I can
> only share my two cents from what still resides in my memory, which may or
> may not always be the truth..

It was already big & complicated back when *I* started!

> I think one issue is migration used to error_report() things all over the
> place,

Yes, except even error_report() didn't exist yet.  It made sense at the
time.

>        but then we thought it a good idea to try to remember the error so
> that libvirt can query at any time if necessary.  With that, starting from
> QEMU 2.7 we introduced error-desc into query-migrate results.  That's what
> s->error was about.
>
> But then, could we drop the error_report() / error_report_err()s?  Likely
> not, because there might be users relying on the printed errors to see what
> happened..  Making query-migrate the only source of error reports adds
> complexity for those users and may cause confusion..  Hence the extra
> references sometimes needed after migrate_set_error(), mostly to keep the
> old behavior of printing the error (I didn't check each of them, though).

Yes.

The core of migration is a long-running background task.  We have
command line options and monitor commands to start and control it.

QMP commands need to pass their errors to the QMP core by setting @errp.
Any code they call should pass them the same way.  Conceptually easy,
although converting old code to pass errors up instead of reporting them
can be tedious.

HMP commands should wrap around QMP commands and report to their monitor.
No issues there, as far as I know.

The hairy part is the background task.

I believe it used to simply do its job, reporting errors to stderr along
the way, until it either succeeded or failed.  The errors reported made
success / failure "obvious" for users.

This can report multiple errors, which can be confusing.

Worse, it was no good for management applications.  These need to
observe migration as a state machine, with final success and error
states, where the error state comes with an indication of what went
wrong.  So we made migration store the first of certain errors in the
migration state in addition to reporting to stderr.

"First", because we store only when the state doesn't already have an
error.  "Certain", because I doubt we do it for all errors we report.

Compare this to how jobs solve this problem.  These are a much, much
later invention, and designed for management applications from the
start[*].  A job is a state machine.  Management applications can
observe and control the state.  Errors are not supposed to be reported,
they should be fed to the state machine, which goes into an error state
then.  The job is not supposed to do actual work in an error state.
Therefore, no further errors should be possible.  When something goes
wrong, we get a single error, stored in the job state, where the
management application can find it.

Migration is also a state machine, and we long ago retrofitted the means
for management applications to observe and control the state.  What we
haven't done is the disciplined feeding of errors to the state machine.
We can still get multiple errors.  We store the first of certain errors
where the management application can find it, but whether that error
suffices to explain what went wrong is a crapshoot.  As long as that's
the case, we need to spew the other errors to stderr, where a human can
find them.

>> * 3 wrap migrate_set_error():
>> 
>>   - multifd_send_set_error()
>> 
>>     Its callers both call error_free() immediately.
>> 
>>   - migration_connect_set_error()
>> 
>>     3 callers.
>> 
>>     qmp_migrate() and qmp_migrate_finish() propagate to their callers,
>>     i.e. we store the error for later *and* have callers handle it now.
>>     Same uneasy feeling as above.
>> 
>>     One of migration_connect()'s callers passes NULL, the other calls
>>     error_free() immediately.
>> 
>>   - multifd_recv_terminate_threads()
>> 
>>     Two callers pass NULL, one calls error_free() immediately, and
>>     multifd_recv_new_channel() propagates.  Uneasy feeling again.
>> 
>> * qemu_savevm_state_setup() confuses me.  It sets @errp on failure, and
>>   stores some (uneasy feeling), but not all of these errors with
>>   migrate_set_error().  See below.
>
> Yes, I also agree qemu_savevm_state_setup() is confusing on error
> handling.  I'll comment below.
>
>> 
>> Summary:
>> 
>> * We're prone to leaking the Error passed to migrate_set_error().
>> 
>> * If we replaced it by a function that takes ownership, we may become
>>   prone to use-after-free.  I write "may" because I'm unsure when
>>   exactly a use after calling this ownership-taking function would be
>>   unsafe.
>> 
>> * The "forked" Error handling makes me uneasy.  I'm sure we do it for
>>   reasons.  Can you help me understand them?
>
> Hope the above provides some context.  In short, IMHO it's a mix of two
> demands: "print the error immediately like the old behavior" and "remember
> the error too for queries".

I figure that's how far we were able to drag migration forward.  More
dragging is called for, but it's hard work, and there's so much else to
do.

>> > I queued the series for this release, thanks Markus.
>> 
>> Thanks!
>> 
>> 
>> Bonus content:
>> 
>>     int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
>>     {
>>         ERRP_GUARD();
>>         MigrationState *ms = migrate_get_current();
>>         JSONWriter *vmdesc = ms->vmdesc;
>>         SaveStateEntry *se;
>>         int ret = 0;
>> 
>>         if (vmdesc) {
>>             json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
>>             json_writer_start_array(vmdesc, "devices");
>>         }
>> 
>>         trace_savevm_state_setup();
>>         QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>> 
>> The function does work in a loop.
>> 
>> This work can fail in two places, setting @errp.  When it does, we
>> return.
>> 
>>             if (se->vmsd && se->vmsd->early_setup) {
>> 
>> First one:
>> 
>>                 ret = vmstate_save(f, se, vmdesc, errp);
>>                 if (ret) {
>> 
>> vmstate_save() set @errp and returned an error code.
>> 
>>                     migrate_set_error(ms, *errp);
>>                     qemu_file_set_error(f, ret);
>> 
>> Store @errp in @ms, and the error code in @f.
>> 
>> Aside: storing error state in two places feels like one too many.
>
> Correct.  It's once again a legacy of migration code, which used to set
> errors onto fds (e.g. that'll start to fail all qemufile APIs on this fd,
> and the qemufile API is not well designed IMHO.. which is another story),
> but then we started to have string-based Errors and I guess we didn't try
> hard enough to unify both.
>
>> 
>>                     break;
>> 
>> This break jumps to if (ret) { return ret; }, so it's a roundabout way
>> to return ret.  Meh.
>
> IIUC some people prefer it that way so the function returns at a single
> point (easier to add common cleanup code, for example).  But I confess I
> also prefer a direct return. :)

I don't believe in complicating control flow to make future cleanup
easier to add.  If we ever add it, we need to review the entire
function to make sure it's called when needed no matter what.

>> 
>>                 }
>>                 continue;
>>             }
>> 
>>             if (!se->ops || !se->ops->save_setup) {
>>                 continue;
>>             }
>>             if (se->ops->is_active) {
>>                 if (!se->ops->is_active(se->opaque)) {
>>                     continue;
>>                 }
>>             }
>>             save_section_header(f, se, QEMU_VM_SECTION_START);
>> 
>> Second one:
>> 
>>             ret = se->ops->save_setup(f, se->opaque, errp);
>>             save_section_footer(f, se);
>>             if (ret < 0) {
>> 
>> ->save_setup() set @errp and returned an error code.
>> 
>>                 qemu_file_set_error(f, ret);
>> 
>> This time we store only the error code.  Why not @errp?
>
> IIUC the migrate_set_error() above was redundant instead, because
> qemu_savevm_state_setup()'s caller will do migrate_set_error().

%-}

> That said, we shouldn't need to randomly call qemu_file_set_error() deep
> in this function either.. may need some cleanups.
>
>> 
>>                 break;
>> 
>> Again, a roundabout way to return ret.
>> 
>>             }
>> 
>> Obviously ret >= 0 here.  I believe @errp is not set.
>
> I don't think ret can be >0 in this path..  but yeah, this is pretty
> unobvious even if true.  That's also why I like QEMU's current preference
> for "bool fn(..., Error **errp)".. at least in new code.

Me too.  When all we need the return value to express is success
vs. failure, bool has served us much better than int.

>                                                      Maybe I should
> ping Vladimir on his recent work here?
>
> https://lore.kernel.org/r/20251028231347.194844-1-vsementsov@yandex-team.ru
>
> That'll be part of such a cleanup effort (and yes, unfortunately many
> migration-related cleanups will need a lot of code churn...).

I know...

Can we afford modest efforts to reduce the mess one step at a time?

>>         }
>> 
>> We get here either via break or normal loop termination.
>> 
>> If via break, ret != 0 and @errp set.
>> 
>> If via normal loop termination, ret >= 0 and @errp not set.
>> 
>>         if (ret) {
>>             return ret;
>> 
>> If via normal loop termination, and ret > 0, we return failure (I think)
>> without setting @errp.  I hope this can't happen, because ->save_setup()
>> never returns > 0.  But it's unclean even then.
>> 
>>         }
>> 
>> I trust @errp is still unset here.  This is the case if we take the
>> return right above when @errp has been set.
>> 
>>         /* TODO: Should we check that errp is set in case of failure ? */
>>         return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);
>> 
>> This time we store nothing.  Why?
>
> I think the clean way is to not store anything when this function
> provides Error**.

Point!

>>     }
>> 
>> I think this function would be easier to understand if we replaced break
>> by return.
>
> I'll see what I can do with this function.
>
> Thanks!

You're welcome!


[*] If the job abstraction had been available in time, migration would
totally be a job.  There's no *design* reason for it being not a job.
Plenty of implementation and backward compatibility reasons, though.




* Re: [PATCH 0/3] migration: Error fixes and improvements
  2025-11-19  7:45       ` Markus Armbruster
@ 2025-11-19 20:58         ` Peter Xu
  2025-11-20 10:30           ` Migration and the Job abstraction (was: [PATCH 0/3] migration: Error fixes and improvements) Markus Armbruster
  2025-11-21 12:38           ` [PATCH 0/3] migration: Error fixes and improvements Fabiano Rosas
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Xu @ 2025-11-19 20:58 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, farosas

On Wed, Nov 19, 2025 at 08:45:39AM +0100, Markus Armbruster wrote:

[...]

> The hairy part is the background task.
> 
> I believe it used to simply do its job, reporting errors to stderr along
> the way, until it either succeeded or failed.  The errors reported made
> success / failure "obvious" for users.
> 
> This can report multiple errors, which can be confusing.
> 
> Worse, it was no good for management applications.  These need to
> observe migration as a state machine, with final success and error
> states, where the error state comes with an indication of what went
> wrong.  So we made migration store the first of certain errors in the
> migration state in addition to reporting to stderr.
> 
> "First", because we store only when the state doesn't already have an
> error.  "Certain", because I doubt we do it for all errors we report.
> 
> Compare this to how jobs solve this problem.  These are a much, much
> later invention, and designed for management applications from the
> start[*].  A job is a state machine.  Management applications can
> observe and control the state.  Errors are not supposed to be reported,
> they should be fed to the state machine, which goes into an error state
> then.  The job is not supposed to do actual work in an error state.
> Therefore, no further errors should be possible.  When something goes
> wrong, we get a single error, stored in the job state, where the
> management application can find it.
> 
> Migration is also a state machine, and we long ago retrofitted the means
> for management applications to observe and control the state.  What we
> haven't done is the disciplined feeding of errors to the state machine.
> We can still get multiple errors.  We store the first of certain errors
> where the management application can find it, but whether that error
> suffices to explain what went wrong is a crapshoot.  As long as that's
> the case, we need to spew the other errors to stderr, where a human can
> find them.

Since the above mentioned once more the possibility of reusing the Jobs
idea, I did try to list things explicitly this time: why I think it would
be challenging and maybe not as worthwhile (?) to do so, though I might be
wrong.  I attached it at the end of this email, mostly for my future self
to reference; please feel free to comment, or to ignore all of it!
IMHO it's not directly relevant to the error reporting issues.

IMHO rewriting migration with Jobs will not help much with error
reporting, because the challenge in refactoring on the migration side is
not the "Jobs" interfacing, but migration's internals.  Say, even if
migration provided a "job", it's the "job" impl that did error reporting
badly, not the Jobs interfacing.. the "job" impl will need to manage quite
a few threads on its own, making sure errors are properly reported at
least to the "job" interface.

That said, I totally agree we should try to improve error reporting in
migration.. with / without Jobs.

[...]

> > Maybe I should ping Vladimir on his recent work here?
> >
> > https://lore.kernel.org/r/20251028231347.194844-1-vsementsov@yandex-team.ru
> >
> > That'll be part of such cleanup effort (and yes unfortunately many
> > migration related cleanups will need a lot of code churns...).
> 
> I know...
> 
> Can we afford modest efforts to reduce the mess one step at a time?

Yes, I'll try to follow up on that.

[...]

> [*] If the job abstraction had been available in time, migration would
> totally be a job.  There's no *design* reason for it being not a job.
> Plenty of implementation and backward compatibility reasons, though.

There might be something in common between the Jobs that the block layer
uses and a migration process.  If so, we could provide a CommonJob and
make MigrationJob and BlockJobs depend on it.

However, I sincerely don't know how much common functionality there will
be.  IOW, I doubt it even in an imaginary world: if we could go back to
when Jobs was designed, would we make migration a Job too (note!
snapshots are definitely too simple a migration scenario..)?  Is it
possible that after evaluation we still wouldn't?  I don't know, but I
think it's possible.

Thanks!
Peter




Possible challenges of adopting Jobs in migration flow
======================================================

- Many Jobs-defined properties don't directly suit migration

  - JobStatus is not directly suitable for migration purposes.  There are
    some JobStatus values that I can't think of any use for
    (e.g. JOB_STATUS_WAITING, JOB_STATUS_PENDING, which is fine, because we
    can simply not use them), but there are other statuses that migration
    needs but that aren't available.  Introducing them seems to be overkill
    for the block layer's use case.

  - Similarly for JobVerb.  E.g. JOB_VERB_CHANGE doesn't seem to apply to
    any migration concept, but it misses quite a few others
    (e.g. JOB_VERB_SET_DOWNTIME, JOB_VERB_POSTCOPY_START, and more).

  - Similarly, JobInfo reports current-progress (which is not optional
    but required), which may make perfect sense for block jobs.  However,
    migration is OTOH a convergence-triggered process, or user-triggered
    (in the case of postcopy).  It doesn't have quantified progress, only
    "COMPLETED" / "IN_PROGRESS".

  - Another very major example that I have discussed a few times
    previously: Jobs are closely attached to an AioContext, while
    migration isn't, and meanwhile migration is moving even further away
    from the event-driven model..  See:

    https://lore.kernel.org/all/20251022192612.2737648-1-peterx@redhat.com/#t

  There are just too many examples showing that Jobs are defined almost
  only for the block layer.. e.g. job-finalize (which may not make much
  sense in a migration context anyway..) mentions finalizing of graph
  changes, which also doesn't exist in the migration process.

  So if we somehow rewrite migration with Jobs, or design Jobs with
  migration in mind, Jobs may need to become very bloated, containing both
  migration and block layer requirements.

- Migration involves "two" QEMU instances instead of one

  I'm guessing existing Jobs operations are not like that, and providing
  such mechanisms in "Jobs" only for migration may introduce unnecessary
  code that the block layer will never use.

  E.g. postcopy migration attaches the two QEMU instances to represent one
  VM instance.  I do not yet have a clear picture of how we can manage
  that if we see it as two separate Jobs, one on each side; what happens
  if each side operates on its own Job with different purposes; and how we
  should connect two Jobs to say they're related (or maybe we don't need
  to?).

- More challenges on dest QEMU (VM loader) than src QEMU

  Unlike on the src side, the dest QEMU, when in an incoming state, is not
  a VM at all yet, but waiting to receive the migration data to become a
  working VM. It's not a generic long-term process, but a pure listening
  port of QEMU where QEMU can do nothing without this "job" being
  completed..

  If we think about CPR it's even more complicated, because we essentially
  require part of the incoming process to happen before almost everything
  else.. which may even include monitors being initialized.

- Deep integration with other subsystems

  Migration is deeply integrated with many other subsystems (auto-converge
  being able to throttle vCPUs, RAM being able to ignore empty pages
  reported from balloons, per-module dirty tracking, etc.), so we're not
  sure if there'll be some limitation from Jobs (when designed with the
  block layer in mind) that will make such a transition harder.

  For example, we at least want to make sure Jobs won't have simple locks
  that are held while running migration, which can deadlock if the
  migration code invokes something else that tries to re-take the Jobs
  lock.

  Or, since migration nowadays runs with quite a few threads concurrently,
  whether the main migration Job can always properly synchronize all of
  them with no problem (maybe yes, but I just don't know Jobs well enough
  to say).  This is also a relevant question about how much AioContext
  plays a role in the core of the Jobs idea and whether it can work well
  in a complicated threaded environment.

-- 
Peter Xu




* Migration and the Job abstraction (was: [PATCH 0/3] migration: Error fixes and improvements)
  2025-11-19 20:58         ` Peter Xu
@ 2025-11-20 10:30           ` Markus Armbruster
  2025-11-20 12:16             ` Kevin Wolf
  2025-11-21 12:38           ` [PATCH 0/3] migration: Error fixes and improvements Fabiano Rosas
  1 sibling, 1 reply; 14+ messages in thread
From: Markus Armbruster @ 2025-11-20 10:30 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, farosas, John Snow, Kevin Wolf

Peter Xu <peterx@redhat.com> writes:

> On Wed, Nov 19, 2025 at 08:45:39AM +0100, Markus Armbruster wrote:
>
> [...]
>
>> The hairy part is the background task.
>> 
>> I believe it used to simply do its job, reporting errors to stderr along
>> the way, until it either succeeded or failed.  The errors reported made
>> success / failure "obvious" for users.
>> 
>> This can report multiple errors, which can be confusing.
>> 
>> Worse, it was no good for management applications.  These need to
>> observe migration as a state machine, with final success and error
>> states, where the error state comes with an indication of what went
>> wrong.  So we made migration store the first of certain errors in the
>> migration state in addition to reporting to stderr.
>> 
>> "First", because we store only when the state doesn't already have an
>> error.  "Certain", because I doubt we do it for all errors we report.
>> 
>> Compare this to how jobs solve this problem.  These are a much, much
>> later invention, and designed for management applications from the
>> start[*].  A job is a state machine.  Management applications can
>> observe and control the state.  Errors are not supposed to be reported,
>> they should be fed to the state machine, which goes into an error state
>> then.  The job is not supposed to do actual work in an error state.
>> Therefore, no further errors should be possible.  When something goes
>> wrong, we get a single error, stored in the job state, where the
>> management application can find it.
>> 
>> Migration is also a state machine, and we long ago retrofitted the means
>> for management applications to observe and control the state.  What we
>> haven't done is the disciplined feeding of errors to the state machine.
>> We can still get multiple errors.  We store the first of certain errors
>> where the management application can find it, but whether that error
>> suffices to explain what went wrong is a crapshoot.  As long as that's
>> the case, we need to spew the other errors to stderr, where a human can
>> find them.
>
> Since the above mentioned once more the possibility of reusing the Jobs
> idea, I did try to list things explicitly this time: why I think it would
> be challenging and maybe not as worthwhile (?) to do so, though I might be
> wrong.  I attached it at the end of this email, mostly for my future self
> to reference; please feel free to comment, or to ignore all of it!

Challenging definitely, worthwhile unknown, but putting the issues in
writing can only help.

> IMHO it's not directly relevant to the error reporting issues.

There's no hard dependency; migration's error reporting wrongs can
certainly be righted without converting to the job abstraction.
Studying how the job abstraction does errors may still help.

> IMHO rewriting migration with Jobs will not help much with error
> reporting, because the challenge in refactoring on the migration side is
> not the "Jobs" interfacing, but migration's internals.  Say, even if
> migration provided a "job", it's the "job" impl that did error reporting
> badly, not the Jobs interfacing.. the "job" impl will need to manage quite
> a few threads on its own, making sure errors are properly reported at
> least to the "job" interface.

Point taken.

> That said, I totally agree we should try to improve error reporting in
> migration.. with / without Jobs.
>
> [...]
>
>> > Maybe I should ping Vladimir on his recent work here?
>> >
>> > https://lore.kernel.org/r/20251028231347.194844-1-vsementsov@yandex-team.ru
>> >
>> > That'll be part of such cleanup effort (and yes unfortunately many
>> > migration related cleanups will need a lot of code churns...).
>> 
>> I know...
>> 
>> Can we afford modest efforts to reduce the mess one step at a time?
>
> Yes, I'll try to follow up on that.
>
> [...]
>
>> [*] If the job abstraction had been available in time, migration would
>> totally be a job.  There's no *design* reason for it being not a job.
>> Plenty of implementation and backward compatibility reasons, though.
>
> There might be something in common between the Jobs that the block layer
> uses and a migration process.  If so, we could provide a CommonJob and
> make MigrationJob and BlockJobs depend on it.

Ignore BlockJob here, focus on Job.

BlockJob came first, in 2012: commit eeec61f2913 (block: add BlockJob
interface for long-running operations).  It then grew quite a bit to
accomodate new job types and provide more control.

Job appeared in 2018 as "infrastructure for generic background jobs that
aren't tied to a block device" (commit 33e9e9bd62d (job: Create Job,
JobDriver and job_create()).  Substantial parts of the BlockJob
interface are actually deprecated in favor of the Job interface.  I
believe BlockJob still exists for things that are actually
block-specific, and possibly for things we neglected to move over.

So, CommonJob already exists: it's Job, defined in qapi/job.json.

> However, I sincerely don't know how much common functionality there will
> be.  IOW, I doubt it even in an imaginary world: if we could go back to
> when Jobs was designed, would we make migration a Job too (note!
> snapshots are definitely too simple a migration scenario..)?  Is it
> possible that after evaluation we still wouldn't?  I don't know, but I
> think it's possible.

To find out, we'd have to examine how the migration state machine and
interface could map to the Job state machine, and how the Job interface
could map to migration's internal interfaces.

The mapping may lose detail.  Generic Job is supposed to cover the
generic aspects.  Some specific jobs may need job-specific interfaces we
can't or prefer not to fit into Job.

> Thanks!
> Peter
>
>
>
>
> Possible challenges of adopting Jobs in migration flow
> ======================================================
>
> - Many Jobs-defined properties don't directly suit migration
>
>   - JobStatus is not directly suitable for migration purposes.  There are
>     some JobStatus values that I can't think of any use for
>     (e.g. JOB_STATUS_WAITING, JOB_STATUS_PENDING, which is fine, because we
>     can simply not use them), but there are other statuses that migration
>     needs but that aren't available.  Introducing them seems to be overkill
>     for the block layer's use case.

The Job abstraction defines possible states and state transitions.  Each
job finds its own path from the initial state @created to the final
state @concluded.  If a state doesn't make sense for a certain type of
job, it simply doesn't go there.

So, job states migration doesn't want are only a problem if there is no
path from start to finish that doesn't go through unwanted states.

There may also be states migration wants that aren't job states.  We
could make them job states.  Or we map multiple migration states to a
single job state, i.e. have the job state *abstract* from migration
state details.
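
To make the many-to-one mapping concrete, here's a minimal sketch.  The
enum values are the real QAPI-generated ones from qapi/migration.json
and qapi/job.json; the mapping choices themselves are invented for
illustration, not a proposal:

    /* Illustrative sketch only: one possible many-to-one mapping from
     * migration states to generic job states.  The enum values exist;
     * the mapping choices are made up. */
    static JobStatus migration_to_job_status(MigrationStatus s)
    {
        switch (s) {
        case MIGRATION_STATUS_SETUP:
            return JOB_STATUS_CREATED;
        case MIGRATION_STATUS_ACTIVE:
        case MIGRATION_STATUS_POSTCOPY_ACTIVE:
            return JOB_STATUS_RUNNING;      /* detail abstracted away */
        case MIGRATION_STATUS_PRE_SWITCHOVER:
            return JOB_STATUS_PENDING;      /* cf. migrate-continue */
        case MIGRATION_STATUS_COMPLETED:
        case MIGRATION_STATUS_FAILED:
        case MIGRATION_STATUS_CANCELLED:
            return JOB_STATUS_CONCLUDED;
        default:
            return JOB_STATUS_UNDEFINED;
        }
    }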

>   - Similarly for JobVerb.  E.g. JOB_VERB_CHANGE doesn't seem to apply to
>     any concept in migration, but it misses quite a few others
>     (e.g. JOB_VERB_SET_DOWNTIME, JOB_VERB_POSTCOPY_START, and more).

JobVerb is used internally to restrict certain job commands to certain
job states.  For instance, command job-dismiss is rejected unless job is
in state @concluded.

This governs the generic job-FOO commands.  It also covers the legacy
block-job-FOO commands, because these wrap around the same C core as the
job-FOO commands.

We could have commands specific to a certain job type (say migration
jobs) that bypass the JobVerb infrastructure, and do their own thing to
restrict themselves to certain states.  Probably stupid if the states
that matter are job states.  Probably necessary if they aren't (say a
more fine-grained migration state).
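
Under the hood this is just a table lookup.  A simplified sketch of the
shape (QEMU's real table lives in job.c; the entries below are partial
and approximate, and assume the QAPI-generated enums plus stdbool.h):

    /* Which job command (verb) is allowed in which job state, indexed
     * [verb][state]; true means allowed. */
    static const bool verb_allowed[JOB_VERB__MAX][JOB_STATUS__MAX] = {
        /* job-dismiss only in state @concluded, as described above */
        [JOB_VERB_DISMISS] = { [JOB_STATUS_CONCLUDED] = true },
        [JOB_VERB_CANCEL]  = { [JOB_STATUS_CREATED] = true,
                               [JOB_STATUS_RUNNING] = true,
                               [JOB_STATUS_READY]   = true },
        /* ... */
    };

    static bool job_verb_permitted(JobVerb verb, JobStatus status)
    {
        return verb_allowed[verb][status];
    }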

>   - Similarly, JobInfo reports current-progress (which is not optional
>     but required), which may make perfect sense for block jobs.  Migration,
>     OTOH, is a convergence-triggered process, or user-triggered (in the
>     case of postcopy).  It doesn't have quantified progress, only
>     "COMPLETED" / "IN_PROGRESS".

Is there really no way to track migration progress approximately?

Here's the relevant doc comment:

    # @current-progress: Progress made until now.  The unit is arbitrary
    #     and the value can only meaningfully be used for the ratio of
    #     @current-progress to @total-progress.  The value is
    #     monotonically increasing.
    #
    # @total-progress: Estimated @current-progress value at the completion
    #     of the job.  This value can arbitrarily change while the job is
    #     running, in both directions.

I think this should work fine for convergence-triggered finish.
@current-progress could be the number of "things" sent (for some
arbitrary, convenient choice of "things").  Monotonically increasing.
@total-progress then would have to be a more or less rough estimate of
@current-progress plus what still needs to be sent.  For RAM,
@current-progress could be number of pages sent, and @total-progress
could be number of pages sent + (possibly estimated) number of dirty
pages.  Multiply by page size if that makes adding the estimated size of
the non-RAM transfers easier.
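
In code, the arithmetic is trivial.  A sketch, with hypothetical names
standing in for whatever counters migration already keeps:

    #include <stdint.h>

    /* Sketch of the progress arithmetic above, in pages.  Both
     * parameters are hypothetical stand-ins for existing migration
     * counters. */
    static void ram_progress(uint64_t pages_sent,
                             uint64_t dirty_pages_estimate,
                             uint64_t *current, uint64_t *total)
    {
        *current = pages_sent;                       /* monotonic */
        *total = pages_sent + dirty_pages_estimate;  /* may shrink */
    }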

I haven't thought about postcopy.

>   - Another very major example that I have discussed a few times
>     previously: Jobs are closely attached to an AioContext, while migration
>     isn't; meanwhile migration is moving even further away from the
>     event-driven model..  See:
>
>     https://lore.kernel.org/all/20251022192612.2737648-1-peterx@redhat.com/#t
>
>   There are just too many examples showing that Jobs are defined almost
>   only for the block layer.. e.g. job-finalize (which may not make much
>   sense in a migration context anyway..) mentions finalizing of graph
>   changes, which also doesn't exist in the migration process.
>
>   So if we rewrite migration somehow with Jobs, or design Jobs keeping
>   migration in mind, Jobs may need to become very bloated, containing both
>   migration and block layer requirements.
>
> - Migration involves "two" QEMU instances instead of one
>
>   I'm guessing existing Jobs operations are not like that, and providing
>   such mechanisms in "Jobs" only for migration may introduce unnecessary
>   code that the block layer will never use.
>
>   E.g. postcopy migration attaches the two QEMU instances to represent one
>   VM instance.  I do not have a clear picture in mind yet on how we can
>   manage that if we see it as two separate Jobs on each side, and what
>   happens if each side operates on its own Job with different purposes, and
>   how we should connect two Jobs to say they're relevant (or maybe we don't
>   need to?).
>
> - More challenges on dest QEMU (VM loader) than src QEMU
>
>   Unlike on the src side, the dest QEMU, when in an incoming state, is not
>   a VM at all yet, but waiting to receive the migration data to become a
>   working VM. It's not a generic long term process, but a pure listening
>   port of QEMU where QEMU can do nothing without this "job" being
>   completed..
>
>   If we think about CPR it's even more complicated, because we essentially
>   require part of the incoming process to happen before almost everything..
>   it may even include monitors being initialized.
>
> - Deep integration with other subsystems
>
>   Migration is deeply integrated into many other subsystems (auto-converge
>   being able to throttle vCPUs, RAM being able to ignore empty pages
>   reported from balloons, per-module dirty tracking, etc.), so we're not
>   sure if there'll be some limitation from Jobs (when designed with block
>   layer in mind) that will make such transition harder.
>
>   For example, we at least want to make sure Jobs won't have simple locks
>   that are held while migration runs, which could deadlock if the
>   migration code invokes something else that tries to re-take the Jobs
>   lock.
>
>   Or, since migration nowadays runs with quite a few threads concurrently,
>   whether the main migration Job can always properly synchronize between
>   all of them with no problem (maybe yes, but I just don't know Jobs well
>   enough to say).  This is also a relevant question about how much
>   AioContext plays a role in the core of the Jobs idea and whether it can
>   work well in a complicated threaded environment.

Fair points!

Which ones are due to the external Job interface, and which ones are
"only" due to its current implementation?

Thanks a lot for writing all this!



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Migration and the Job abstraction (was: [PATCH 0/3] migration: Error fixes and improvements)
  2025-11-20 10:30           ` Migration and the Job abstraction (was: [PATCH 0/3] migration: Error fixes and improvements) Markus Armbruster
@ 2025-11-20 12:16             ` Kevin Wolf
  0 siblings, 0 replies; 14+ messages in thread
From: Kevin Wolf @ 2025-11-20 12:16 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Peter Xu, qemu-devel, farosas, John Snow

On 20.11.2025 at 11:30, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> > On Wed, Nov 19, 2025 at 08:45:39AM +0100, Markus Armbruster wrote:
> >> [*] If the job abstraction had been available in time, migration would
> >> totally be a job.  There's no *design* reason for it being not a job.
> >> Plenty of implementation and backward compatibility reasons, though.
> >
> > There might be something common between Jobs that block uses and a
> > migration process.  If so, we can provide CommonJob and make MigrationJob
> > and BlockJobs dependent on it.

Conceptually, live migration and the mirror block job are _really_
similar. You have a bulk copy phase and you keep copying data that has
changed to bring both sides in sync. When both sides are close enough,
you stop new changes from coming in, copy the small remainder and finish
the thing.

The main difference is that mirror copies disk content whereas live
migration mostly copies RAM. But that's irrelevant conceptually.

So it makes a lot of sense to me that the same user-visible state
machine should be applicable to both.

(I'm not saying that we have to do this, just that I expect it to be
possible.)

> > Possible challenges of adopting Jobs in migration flow
> > ======================================================
> >
> > - Many Job-defined properties don't directly suit migration
> >
> >   - JobStatus is not directly suitable for migration purposes.  There are
> >     some JobStatus values that I can't think of any use for
> >     (e.g. JOB_STATUS_WAITING, JOB_STATUS_PENDING, which is fine, because we
> >     can simply not use them), but there are other statuses that migration
> >     needs but that aren't available.  Introducing them seems to be overkill
> >     for the block layer's use case.

Which other status does live migration have that the user cares about?

Does it have to be a status on the Job level or could it be some kind of
substatus that could be represented by job-specific information in
query-jobs? (Which doesn't exist yet, but I think we have been talking
about it multiple times before.)

> The Job abstraction defines possible states and state transitions.  Each
> job finds its own path from the initial state @created to the final
> state @concluded.  If a state doesn't make sense for a certain type of
> job, it simply doesn't go there.

Note that most states are really managed by the common Job abstraction.
The job itself only goes potentially from RUNNING to READY, and then
implicitly to WAITING/ABORTING when it returns from .run().
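
Roughly, the driver's side of that contract looks like this (simplified
and from memory, not the literal interface; the helpers marked
hypothetical are invented for this sketch):

    /* Sketch of a job driver's .run() callback: do the work, signal
     * readiness once, and let the generic Job code drive the rest of
     * the state machine when .run() returns. */
    static int migration_job_run(Job *job, Error **errp)
    {
        while (!close_to_convergence()) {   /* hypothetical helper */
            copy_dirty_data();              /* hypothetical helper */
        }
        job_transition_to_ready(job);       /* RUNNING -> READY */
        /* keep syncing until completion is requested, then switch
         * over; returning non-zero with *errp set means ABORTING */
        keep_syncing_until_complete();      /* hypothetical helper */
        return 0;
    }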

Everything else only exists so that management tools can orchestrate
jobs in the right way and can query errors before the job disappears.

I'm not sure if WAITING is really useless for migration. In theory, you
could have a job transaction of some mirror jobs for the disks and live
migration for the device state, and of course you want both to finish
and switch over at the same time. I'm also not completely sure if it
will actually be used in practice, but the Job infrastructure gives you
the option for free.

PENDING and the associated job-finalize already exist in live migration
in the form of pause-before-switchover/pre-switchover status and
migrate-continue. So I don't think you can argue you have no use for it.

> So, job states migration doesn't want are only a problem if there is no
> path from start to finish that doesn't go through unwanted states.
> 
> There may also be states migration wants that aren't job states.  We
> could make them job states.  Or we map multiple migration states to a
> single job state, i.e. have the job state *abstract* from migration
> state details.
> 
> >   - Similarly for JobVerb.  E.g. JOB_VERB_CHANGE doesn't seem to apply to
> >     any concept in migration, but it misses quite a few others
> >     (e.g. JOB_VERB_SET_DOWNTIME, JOB_VERB_POSTCOPY_START, and more).

How is SET_DOWNTIME or POSTCOPY_START not just a form of CHANGE?

> JobVerb is used internally to restrict certain job commands to certain
> job states.  For instance, command job-dismiss is rejected unless job is
> in state @concluded.
> 
> This governs the generic job-FOO commands.  It also covers the legacy
> block-job-FOO commands, because these wrap around the same C core as the
> job-FOO commands.
> 
> We could have commands specific to a certain job type (say migration
> jobs) that bypass the JobVerb infrastructure, and do their own thing to
> restrict themselves to certain states.  Probably stupid if the states
> that matter are job states.  Probably necessary if they aren't (say a
> more fine-grained migration state).

I suspect we would have to look at specific examples to figure out how
to represent them best. In general, I think a generic job-change (to be
added as a more generic version of block-job-change) and job-specific
data in query-jobs can cover a lot.

You may want to have job-specific QMP events outside of the Job
mechanism, or we could have a generic one to notify the user that
something in the queryable state has changed.

> >   - Similarly, JobInfo reports current-progress (which is not optional
> >     but required), which may make perfect sense for block jobs.  Migration,
> >     OTOH, is a convergence-triggered process, or user-triggered (in the
> >     case of postcopy).  It doesn't have quantified progress, only
> >     "COMPLETED" / "IN_PROGRESS".
> 
> Is there really no way to track migration progress approximately?

Of course there is. mirror in its default configuration is no different.
When things are dirtied, the amount of work to do simply grows.

> Here's the relevant doc comment:
> 
>     # @current-progress: Progress made until now.  The unit is arbitrary
>     #     and the value can only meaningfully be used for the ratio of
>     #     @current-progress to @total-progress.  The value is
>     #     monotonically increasing.
>     #
>     # @total-progress: Estimated @current-progress value at the completion
>     #     of the job.  This value can arbitrarily change while the job is
>     #     running, in both directions.
> 
> I think this should work fine for convergence-triggered finish.
> @current-progress could be the number of "things" sent (for some
> arbitrary, convenient choice of "things").  Monotonically increasing.
> @total-progress then would have to be a more or less rough estimate of
> @current-progress plus what still needs to be sent.  For RAM,
> @current-progress could be number of pages sent, and @total-progress
> could be number of pages sent + (possibly estimated) number of dirty
> pages.  Multiply by page size if that makes adding the estimated size of
> the non-RAM transfers easier.
> 
> I haven't thought about postcopy.

Postcopy should ask for every "thing" only once, so if you knew the
number of remaining "things" when you switched to postcopy, you can
simply continue to increase current-progress and leave total-progress
unchanged (which might already be what automatically happens because the
number of dirty pages can't grow on an inactive source instance any
more).
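
As a sketch of that accounting, continuing the hypothetical page
counters from earlier in the thread (the names are invented, not
existing migration counters):

    /* Sketch: after switchover, @total is frozen and @current creeps
     * up to it as each remaining page is fetched exactly once. */
    static void postcopy_progress(uint64_t current_at_switchover,
                                  uint64_t pages_fetched_since,
                                  uint64_t total_at_switchover,
                                  uint64_t *current, uint64_t *total)
    {
        *current = current_at_switchover + pages_fetched_since;
        *total = total_at_switchover;   /* no new dirtying on src */
    }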

> >   - Another very major example that I have discussed a few times
> >     previously: Jobs are closely attached to an AioContext, while migration
> >     isn't; meanwhile migration is moving even further away from the
> >     event-driven model..  See:
> >
> >     https://lore.kernel.org/all/20251022192612.2737648-1-peterx@redhat.com/#t

The main loop of a job runs in a coroutine in an AioContext, yes, and
that is the context where you will get callbacks from, too. If the job
doesn't explicitly put itself anywhere else, it will be in the main
thread. Nothing stops you from firing off as many worker threads as you
want to, though.

This separation of the migration thread and its management interface
running in a different thread isn't new, though: Your existing QMP
commands always run in the main thread, too.

> >   There are just too many examples showing that Jobs are defined almost
> >   only for the block layer.. e.g. job-finalize (which may not make much
> >   sense in a migration context anyway..) mentions finalizing of graph
> >   changes, which also doesn't exist in the migration process.

s/graph changes/switchover/ for migration.

I suppose "changes to global state" might be a more accurate generic
description.

> >   So if we rewrite migration somehow with Jobs, or design Jobs keeping
> >   migration in mind, Jobs may need to become very bloated, containing both
> >   migration and block layer requirements.
> >
> > - Migration involves "two" QEMU instances instead of one
> >
> >   I'm guessing existing Jobs operations are not like that, and providing
> >   such mechanisms in "Jobs" only for migration may introduce unnecessary
> >   code that the block layer will never use.
> >
> >   E.g. postcopy migration attaches the two QEMU instances to represent one
> >   VM instance.  I do not have a clear picture in mind yet on how we can
> >   manage that if we see it as two separate Jobs on each side, and what
> >   happens if each side operates on its own Job with different purposes, and
> >   how we should connect two Jobs to say they're relevant (or maybe we don't
> >   need to?).

Don't you already run two different code paths on the source and the
destination host? Why would they be the same job?

Block migration isn't very different. You typically have an NBD export
running on one side and have a mirror job connecting to it on the other
side. One part doesn't make sense without the other, but that doesn't
mean that they are the same thing.

> > - More challenges on dest QEMU (VM loader) than src QEMU
> >
> >   Unlike on the src side, the dest QEMU, when in an incoming state, is not
> >   a VM at all yet, but waiting to receive the migration data to become a
> >   working VM. It's not a generic long term process, but a pure listening
> >   port of QEMU where QEMU can do nothing without this "job" being
> >   completed..
> >
> >   If we think about CPR it's even more complicated, because we essentially
> >   require part of the incoming process to happen before almost everything..
> >   it may even include monitors being initialized.

I'm not sure that the destination side should be a job. I suppose with
postcopy migration you have at least some background task, but with
precopy you don't even have that.

Is there much that the user can manage on the destination side apart
from just waiting for migration to finish?

> > - Deep integration with other subsystems
> >
> >   Migration is deeply integrated into many other subsystems (auto-converge
> >   being able to throttle vCPUs, RAM being able to ignore empty pages
> >   reported from balloons, per-module dirty tracking, etc.), so we're not
> >   sure if there'll be some limitation from Jobs (when designed with block
> >   layer in mind) that will make such transition harder.
> >
> >   For example, we at least want to make sure Jobs won't have simple locks
> >   that are held while migration runs, which could deadlock if the
> >   migration code invokes something else that tries to re-take the Jobs
> >   lock.

I don't know why anything in the actual system emulation would want to
access the migration job and take its locks?

But that aside, jobs are made for long running background tasks. While
they have a mutex to protect the generic state, you're not supposed to
hold it for a long time.

> >   Or, since migration nowadays runs with quite a few threads concurrently,
> >   whether the main migration Job can always properly synchronize between
> >   all of them with no problem (maybe yes, but I just don't know Jobs well
> >   enough to say).  This is also a relevant question about how much
> >   AioContext plays a role in the core of the Jobs idea and whether it can
> >   work well in a complicated threaded environment.

Why wouldn't they be able to do that when a QMP handler in the main loop
can do it correctly?

Kevin



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] migration: Error fixes and improvements
  2025-11-19 20:58         ` Peter Xu
  2025-11-20 10:30           ` Migration and the Job abstraction (was: [PATCH 0/3] migration: Error fixes and improvements) Markus Armbruster
@ 2025-11-21 12:38           ` Fabiano Rosas
  1 sibling, 0 replies; 14+ messages in thread
From: Fabiano Rosas @ 2025-11-21 12:38 UTC (permalink / raw)
  To: Peter Xu, Markus Armbruster; +Cc: qemu-devel

Peter Xu <peterx@redhat.com> writes:

> On Wed, Nov 19, 2025 at 08:45:39AM +0100, Markus Armbruster wrote:
>
> [...]
>
>> The hairy part is the background task.
>> 
>> I believe it used to simply do its job, reporting errors to stderr along
>> the way, until it either succeeded or failed.  The errors reported made
>> success / failure "obvious" for users.
>> 
>> This can report multiple errors, which can be confusing.
>> 
>> Worse, it was no good for management applications.  These need to
>> observe migration as a state machine, with final success and error
>> states, where the error state comes with an indication of what went
>> wrong.  So we made migration store the first of certain errors in the
>> migration state in addition to reporting to stderr.
>> 
>> "First", because we store only when the state doesn't already have an
>> error.  "Certain", because I doubt we do it for all errors we report.
>> 
>> Compare this to how jobs solve this problem.  These are a much, much
>> later invention, and designed for management applications from the
>> start[*].  A job is a state machine.  Management applications can
>> observe and control the state.  Errors are not supposed to be reported,
>> they should be fed to the state machine, which then goes into an error
>> state.  The job is not supposed to do actual work in an error state.
>> Therefore, no further errors should be possible.  When something goes
>> wrong, we get a single error, stored in the job state, where the
>> management application can find it.
>> 
>> Migration is also a state machine, and we long ago retrofitted the means
>> for management applications to observe and control the state.  What we
>> haven't done is the disciplined feeding of errors to the state machine.
>> We can still get multiple errors.  We store the first of certain errors
>> where the management application can find it, but whether that error
>> suffices to explain what went wrong is a crapshoot.  As long as that's
>> the case, we need to spew the other errors to stderr, where a human can
>> find them.
>
> Since the above mentioned once more the possibility of reusing the Jobs
> idea, I did try to list things explicitly this time, i.e. why I think it
> would be challenging and maybe not as worthwhile (?) to do so, though I
> might be wrong.  I attached it at the end of this email, almost for my own
> future reference; please feel free to comment, or to ignore all of it!
> IMHO it's not directly relevant to the error reporting issues.
>
> IMHO rewriting migration with Jobs will not help much in error reporting,
> because the challenge for refactoring from the migration side is not the
> "Jobs" interfacing, but the internals of migration.  Say, even if migration
> provided a "job", it's the "job" impl that did error reporting badly, not
> the Jobs interfacing.. the "job" impl will need to manage quite a few
> threads on its own, making sure errors are properly reported at least to
> the "job" interface.
>
> That said, I totally agree we should try to improve error reporting in
> migration.. with / without Jobs.
>
> [...]
>
>> > Maybe I should ping Vladimir on his recent work here?
>> >
>> > https://lore.kernel.org/r/20251028231347.194844-1-vsementsov@yandex-team.ru
>> >
>> > That'll be part of such cleanup effort (and yes unfortunately many
>> > migration related cleanups will need a lot of code churns...).
>> 
>> I know...
>> 
>> Can we afford modest efforts to reduce the mess one step at a time?
>
> Yes, I'll try to follow up on that.
>
> [...]
>
>> [*] If the job abstraction had been available in time, migration would
>> totally be a job.  There's no *design* reason for it being not a job.
>> Plenty of implementation and backward compatibility reasons, though.
>
> There might be something common between Jobs that block uses and a
> migration process.  If so, we can provide CommonJob and make MigrationJob
> and BlockJobs dependent on it.
>
> However, I sincerely don't know how much common functionality there will
> be.  IOW, I doubt, even in an imaginary world where we could go back to
> when Jobs was designed, whether we would make migration a Job too (note!
> snapshots is definitely too simple a migration scenario..).  Is it
> possible that after evaluation we still wouldn't?  I don't know, but I
> think it's possible.
>
> Thanks!
> Peter
>
>
>
>
> Possible challenges of adopting Jobs in migration flow
> ======================================================
>
> - Many Job-defined properties don't directly suit migration
>
>   - JobStatus is not directly suitable for migration purposes.  There are
>     some JobStatus values that I can't think of any use for
>     (e.g. JOB_STATUS_WAITING, JOB_STATUS_PENDING, which is fine, because we
>     can simply not use them), but there are other statuses that migration
>     needs but that aren't available.  Introducing them seems to be overkill
>     for the block layer's use case.
>
>   - Similarly for JobVerb.  E.g. JOB_VERB_CHANGE doesn't seem to apply to
>     any concept in migration, but it misses quite a few others
>     (e.g. JOB_VERB_SET_DOWNTIME, JOB_VERB_POSTCOPY_START, and more).
>
>   - Similarly, JobInfo reports current-progress (which is not optional
>     but required), which may make perfect sense for block jobs.  Migration,
>     OTOH, is a convergence-triggered process, or user-triggered (in the
>     case of postcopy).  It doesn't have quantified progress, only
>     "COMPLETED" / "IN_PROGRESS".
>
>   - Another very major example that I have discussed a few times
>     previously: Jobs are closely attached to an AioContext, while migration
>     isn't; meanwhile migration is moving even further away from the
>     event-driven model..  See:
>
>     https://lore.kernel.org/all/20251022192612.2737648-1-peterx@redhat.com/#t
>
>   There are just too many examples showing that Jobs are defined almost
>   only for the block layer.. e.g. job-finalize (which may not make much
>   sense in a migration context anyway..) mentions finalizing of graph
>   changes, which also doesn't exist in the migration process.
>
>   So if we rewrite migration somehow with Jobs, or design Jobs keeping
>   migration in mind, Jobs may need to become very bloated, containing both
>   migration and block layer requirements.
>
> - Migration involves "two" QEMU instances instead of one
>
>   I'm guessing existing Jobs operations are not like that, and providing
>   such mechanisms in "Jobs" only for migration may introduce unnecessary
>   code that the block layer will never use.
>
>   E.g. postcopy migration attaches the two QEMU instances to represent one
>   VM instance.  I do not have a clear picture in mind yet on how we can
>   manage that if we see it as two separate Jobs on each side, and what
>   happens if each side operates on its own Job with different purposes, and
>   how we should connect two Jobs to say they're relevant (or maybe we don't
>   need to?).
>
> - More challenges on dest QEMU (VM loader) than src QEMU
>
>   Unlike on the src side, the dest QEMU, when in an incoming state, is not
>   a VM at all yet, but waiting to receive the migration data to become a
>   working VM. It's not a generic long term process, but a pure listening
>   port of QEMU where QEMU can do nothing without this "job" being
>   completed..
>
>   If we think about CPR it's even more complicated, because we essentially
>   require part of the incoming process to happen before almost everything..
>   it may even include monitors being initialized.
>
> - Deep integration with other subsystems
>
>   Migration is deeply integrated into many other subsystems (auto-converge
>   being able to throttle vCPUs, RAM being able to ignore empty pages
>   reported from balloons, per-module dirty tracking, etc.), so we're not
>   sure if there'll be some limitation from Jobs (when designed with block
>   layer in mind) that will make such transition harder.
>
>   For example, we at least want to make sure Jobs won't have simple locks
>   that are held while migration runs, which could deadlock if the
>   migration code invokes something else that tries to re-take the Jobs
>   lock.
>
>   Or, since migration nowadays runs with quite a few threads concurrently,
>   whether the main migration Job can always properly synchronize between
>   all of them with no problem (maybe yes, but I just don't know Jobs well
>   enough to say).  This is also a relevant question about how much
>   AioContext plays a role in the core of the Jobs idea and whether it can
>   work well in a complicated threaded environment.

Thanks for looking into this, Peter! I'm saving it for future reference
as well! It was on my todo list to make such an analysis.

I hope Markus can comment on some of those and maybe we can still find a
way to converge, but I think I agree that migration is (at this point) a
little too particular to be retrofitted (which I'd be very much in favor
of, if it were at all feasible).

(wondering what happened in QEMU historically that we devised so many
well-designed interfaces, but chose to leave migration aside altogether)

(maybe this right here is what happened)


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads: [~2025-11-22  1:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
2025-11-15  8:34 [PATCH 0/3] migration: Error fixes and improvements Markus Armbruster
2025-11-15  8:34 ` [PATCH 1/3] migration: Plug memory leaks after migrate_set_error() Markus Armbruster
2025-11-15  8:34 ` [PATCH 2/3] migration: Use warn_reportf_err() where appropriate Markus Armbruster
2025-11-17 15:47   ` Fabiano Rosas
2025-11-15  8:35 ` [PATCH 3/3] migration/postcopy-ram: Improve error reporting after loadvm failure Markus Armbruster
2025-11-17 15:50   ` Fabiano Rosas
2025-11-17 16:03 ` [PATCH 0/3] migration: Error fixes and improvements Peter Xu
2025-11-18  7:44   ` Markus Armbruster
2025-11-18 17:35     ` Peter Xu
2025-11-19  7:45       ` Markus Armbruster
2025-11-19 20:58         ` Peter Xu
2025-11-20 10:30           ` Migration and the Job abstraction (was: [PATCH 0/3] migration: Error fixes and improvements) Markus Armbruster
2025-11-20 12:16             ` Kevin Wolf
2025-11-21 12:38           ` [PATCH 0/3] migration: Error fixes and improvements Fabiano Rosas
