From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>, Juraj Marcin <jmarcin@redhat.com>
Cc: qemu-devel@nongnu.org, Jiri Denemark <jdenemar@redhat.com>,
	"Dr. David Alan Gilbert" <dave@treblig.org>
Subject: Re: [PATCH 3/4] migration: Refactor incoming cleanup into migration_incoming_finish()
Date: Mon, 22 Sep 2025 14:40:37 -0300
Message-ID: <877bxqtmga.fsf@suse.de>
In-Reply-To: <aNFwdic2-d_Crctc@x1.local>

Peter Xu <peterx@redhat.com> writes:

> On Mon, Sep 22, 2025 at 02:58:38PM +0200, Juraj Marcin wrote:
>> Hi Fabiano,
>> 
>> On 2025-09-19 13:46, Fabiano Rosas wrote:
>> > Juraj Marcin <jmarcin@redhat.com> writes:
>> > 
>> > Hi Juraj,
>> > 
>> > Good patch, nice use of migration_has_failed()
>> 
>> Thanks!
>> 
>> > 
>> > > From: Juraj Marcin <jmarcin@redhat.com>
>> > >
>> > > Currently, there are two functions that are responsible for cleanup of
>> > > the incoming migration state. With successful precopy, it's the main
>> > > thread and with successful postcopy it's the listen thread. However, if
>> > > postcopy fails during the device load, both functions will try to do
>> > > the cleanup. Moreover, when the exit-on-error parameter was added, it was
>> > > applied only to precopy.
>> > >
>> > 
>> > Someone could be relying on postcopy always exiting on error while
>> > explicitly setting exit-on-error=false for precopy, and this patch would
>> > change the behavior incompatibly. Is this an issue? I'm willing to
>> > ignore it, but you guys know more about postcopy.
>> 
>> Good question. When going through the older patches where the postcopy
>> listen thread and later exit-on-error were implemented, it seemed more
>> like an oversight than an intentional omission. However, it might be
>> better not to break any potential users of this; we could add another
>> option, "exit-on-postcopy-error", that would allow such handling if
>> postcopy failed unrecoverably. I've already talked about such an option
>> with @jdenemar and he agreed with it.
>
> The idea for postcopy-ram is that it should never fail.. failing is never
> better than a pause.  Block dirty bitmap might be different, though, when
> enabled separately.
>
> For postcopy-ram, qemu_loadvm_state_main() will in reality only receive RAM
> updates. It'll almost always trigger the postcopy_pause_incoming() path
> when anything fails.
>
> For pure block-dirty-bitmap-only styled postcopy: for this exit-on-error, I
> also don't think we should really "exit on errors", even if the flag is
> set.  IIUC, it's not fatal to the VM if that failed, as described in:
>
> commit ee64722514fabcad2430982ade86180208f5be4f
> Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Date:   Mon Jul 27 22:42:32 2020 +0300
>
>     migration/savevm: don't worry if bitmap migration postcopy failed
>
>     ...
>
>     And anyway, bitmaps postcopy is not prepared to be somehow recovered.
>     The original idea instead is that if bitmaps postcopy failed, we just
>     lose some bitmaps, which is not critical. So, on failure we just need
>     to remove unfinished bitmaps and guest should continue execution on
>     destination.
>
> Hence, exiting here might be overkill.. need block developers to double
> check, though..
>
>> 
>> > 
>> > > This patch refactors common cleanup and exiting on error into a helper
>> > > function that can be started either from precopy or postcopy, reducing
>> > > the duplication. If the listen thread has been started (the postcopy
>> > > state is at least LISTENING), the listen thread is responsible for all
>> > > cleanup and exiting, otherwise it's the main thread's responsibility.
>> > 
>> > Don't the BHs also run in the main thread? I'm not sure this sentence is
>> > accurate.
>> 
>> Yeah, it is a bit inaccurate now that you mention it; BHs are indeed run
>> in the main thread. It would be more accurate to replace the main thread
>> with the process incoming migration coroutine, which would precisely
>> describe who has the responsibility to call the cleanup.
>> 
>> > 
>> > >
>> > > Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
>> > > ---
>> > >  migration/migration.c | 64 ++++++++++++++++++++++++-------------------
>> > >  migration/migration.h |  1 +
>> > >  migration/savevm.c    | 48 +++++++++++---------------------
>> > 
>> > Could someone act on the TODOs and move postcopy code into postcopy-ram?
>> > It's never too late to make things consistent.
>> 
>> I can take a look.
>> 
>> > 
>> > >  3 files changed, 53 insertions(+), 60 deletions(-)
>> > >
>> > > diff --git a/migration/migration.c b/migration/migration.c
>> > > index 2c0b3a7229..7222e3de13 100644
>> > > --- a/migration/migration.c
>> > > +++ b/migration/migration.c
>> > > @@ -442,9 +442,19 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
>> > >  void migration_incoming_state_destroy(void)
>> > >  {
>> > >      struct MigrationIncomingState *mis = migration_incoming_get_current();
>> > > +    PostcopyState ps = postcopy_state_get();
>> > >  
>> > >      multifd_recv_cleanup();
>> > >  
>> > > +    if (mis->have_listen_thread) {
>> > > +        qemu_thread_join(&mis->listen_thread);
>> > > +        mis->have_listen_thread = false;
>> > > +    }
>> > > +
>> > > +    if (ps != POSTCOPY_INCOMING_NONE) {
>> > > +        postcopy_ram_incoming_cleanup(mis);
>> > > +    }
>> > > +
>> > >      /*
>> > >       * RAM state cleanup needs to happen after multifd cleanup, because
>> > >       * multifd threads can use some of its states (receivedmap).
>> > > @@ -809,6 +819,23 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
>> > >      cpr_state_close();
>> > >  }
>> > >  
>> > > +void migration_incoming_finish(void)
>> > > +{
>> > > +    MigrationState *s = migrate_get_current();
>> > > +    MigrationIncomingState *mis = migration_incoming_get_current();
>> > > +
>> > > +    migration_incoming_state_destroy();
>> > > +
>> > > +    if (migration_has_failed(mis->state) && mis->exit_on_error) {
>> > > +        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>> > > +            error_report_err(s->error);
>> > > +            s->error = NULL;
>> > > +        }
>> > > +
>> > > +        exit(EXIT_FAILURE);
>> > > +    }
>> > > +}
>> > > +
>> > >  static void process_incoming_migration_bh(void *opaque)
>> > >  {
>> > >      MigrationIncomingState *mis = opaque;
>> > > @@ -861,7 +888,7 @@ static void process_incoming_migration_bh(void *opaque)
>> > >       */
>> > >      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>> > >                        MIGRATION_STATUS_COMPLETED);
>> > > -    migration_incoming_state_destroy();
>> > > +    migration_incoming_finish();
>> > >  }
>> > >  
>> > >  static void coroutine_fn
>> > > @@ -888,23 +915,13 @@ process_incoming_migration_co(void *opaque)
>> > >  
>> > >      ps = postcopy_state_get();
>> > >      trace_process_incoming_migration_co_end(ret, ps);
>> > > -    if (ps != POSTCOPY_INCOMING_NONE) {
>> > > -        if (ps == POSTCOPY_INCOMING_ADVISE) {
>> > > -            /*
>> > > -             * Where a migration had postcopy enabled (and thus went to advise)
>> > > -             * but managed to complete within the precopy period, we can use
>> > > -             * the normal exit.
>> > > -             */
>> > > -            postcopy_ram_incoming_cleanup(mis);
>> > > -        } else if (ret >= 0) {
>> > > -            /*
>> > > -             * Postcopy was started, cleanup should happen at the end of the
>> > > -             * postcopy thread.
>> > > -             */
>> > > -            trace_process_incoming_migration_co_postcopy_end_main();
>> > > -            goto out;
>> > > -        }
>> > > -        /* Else if something went wrong then just fall out of the normal exit */
>> > > +    if (ps >= POSTCOPY_INCOMING_LISTENING) {
>> > > +        /*
>> > > +         * Postcopy was started, cleanup should happen at the end of the
>> > > +         * postcopy thread.
>> > > +         */
>> > > +        trace_process_incoming_migration_co_postcopy_end_main();
>> > > +        goto out;
>> > >      }
>> > >  
>> > >      if (ret < 0) {
>> > > @@ -926,16 +943,7 @@ fail:
>> > >      migrate_set_error(s, local_err);
>> > >      error_free(local_err);
>> > >  
>> > > -    migration_incoming_state_destroy();
>> > > -
>> > > -    if (mis->exit_on_error) {
>> > > -        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>> > > -            error_report_err(s->error);
>> > > -            s->error = NULL;
>> > > -        }
>> > > -
>> > > -        exit(EXIT_FAILURE);
>> > > -    }
>> > > +    migration_incoming_finish();
>> > >  out:
>> > >      /* Pairs with the refcount taken in qmp_migrate_incoming() */
>> > >      migrate_incoming_unref_outgoing_state();
>> > > diff --git a/migration/migration.h b/migration/migration.h
>> > > index 2c2331f40d..67e3318467 100644
>> > > --- a/migration/migration.h
>> > > +++ b/migration/migration.h
>> > > @@ -518,6 +518,7 @@ void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
>> > >  void migration_fd_process_incoming(QEMUFile *f);
>> > >  void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
>> > >  void migration_incoming_process(void);
>> > > +void migration_incoming_finish(void);
>> > 
>> > What about merging migration_incoming_state_destroy and
>> > migration_incoming_finish and pairing all of this with migration_cleanup?
>> > 
>> > static void migration_cleanup_bh(void *opaque)
>> > {
>> >     migration_cleanup(opaque);
>> > }
>> > 
>> > static void migration_incoming_cleanup_bh(void *opaque)
>> > {
>> >     migration_incoming_cleanup(opaque);
>> > }
>> 
>> Yes, it would pair well. I guess it would also solve Peter's nitpick
>> about the change in process_incoming_migration_bh() above.
>> 
>> > >  
>> > >  bool  migration_has_all_channels(void);
>> > >  
>> > > diff --git a/migration/savevm.c b/migration/savevm.c
>> > > index fabbeb296a..d7eb416d48 100644
>> > > --- a/migration/savevm.c
>> > > +++ b/migration/savevm.c
>> > > @@ -2069,6 +2069,11 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
>> > >      return 0;
>> > >  }
>> > >  
>> > > +static void postcopy_ram_listen_thread_bh(void *opaque)
>> > > +{
>> > > +    migration_incoming_finish();
>> > > +}
>> > > +
>> > >  /*
>> > >   * Triggered by a postcopy_listen command; this thread takes over reading
>> > >   * the input stream, leaving the main thread free to carry on loading the rest
>> > > @@ -2122,52 +2127,31 @@ static void *postcopy_ram_listen_thread(void *opaque)
>> > >                           "bitmaps may be lost, and present migrated dirty "
>> > >                           "bitmaps are correctly migrated and valid.",
>> > >                           __func__, load_res);
>> > > -            load_res = 0; /* prevent further exit() */
>> > >          } else {
>> > >              error_report("%s: loadvm failed: %d", __func__, load_res);
>> > >              migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>> > >                                             MIGRATION_STATUS_FAILED);
>> > > +            goto out;
>> > >          }
>> > >      }
>> > > -    if (load_res >= 0) {
>> > > -        /*
>> > > -         * This looks good, but it's possible that the device loading in the
>> > > -         * main thread hasn't finished yet, and so we might not be in 'RUN'
>> > > -         * state yet; wait for the end of the main thread.
>> > > -         */
>> > > -        qemu_event_wait(&mis->main_thread_load_event);
>> > > -    }
>> > > -    postcopy_ram_incoming_cleanup(mis);
>> > > -
>> > > -    if (load_res < 0) {
>> > > -        /*
>> > > -         * If something went wrong then we have a bad state so exit;
>> > > -         * depending how far we got it might be possible at this point
>> > > -         * to leave the guest running and fire MCEs for pages that never
>> > > -         * arrived as a desperate recovery step.
>> > > -         */
>> > > -        rcu_unregister_thread();
>> > > -        exit(EXIT_FAILURE);
>> > > -    }
>> > > +    /*
>> > > +     * This looks good, but it's possible that the device loading in the
>> > > +     * main thread hasn't finished yet, and so we might not be in 'RUN'
>> > > +     * state yet; wait for the end of the main thread.
>> > > +     */
>> > > +    qemu_event_wait(&mis->main_thread_load_event);
>
> PS: I didn't notice this change; it looks like this may be better as a
> separate patch when moving it out of the if.  Meanwhile, I don't think we
> set it right either, in qemu_loadvm_state():
>
>     qemu_event_set(&mis->main_thread_load_event);
>
> The problem is that e.g. load_snapshot / qmp_xen_load_devices_state also
> set that event, even if there'll be no one to consume it.. not a huge deal,
> but maybe while moving it out of the if, we can also clean up the set()
> side, by moving the set() up into process_incoming_migration_co().
>
>> > >  
>> > >      migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>> > >                                     MIGRATION_STATUS_COMPLETED);
>> > > -    /*
>> > > -     * If everything has worked fine, then the main thread has waited
>> > > -     * for us to start, and we're the last use of the mis.
>> > > -     * (If something broke then qemu will have to exit anyway since it's
>> > > -     * got a bad migration state).
>> > > -     */
>> > > -    bql_lock();
>> > > -    migration_incoming_state_destroy();
>> > > -    bql_unlock();
>> > >  
>> > > +out:
>> > >      rcu_unregister_thread();
>> > > -    mis->have_listen_thread = false;
>> > >      postcopy_state_set(POSTCOPY_INCOMING_END);
>> > >  
>> > >      object_unref(OBJECT(migr));
>> > >  
>> > > +    migration_bh_schedule(postcopy_ram_listen_thread_bh, NULL);
>> > 
>> > Better to schedule before the object_unref to ensure there's always
>> > someone holding a reference?
>> 
>> True, I'll move it.
>
> Good point.  Though I'm not sure moving it up would help, because it'll
> be the BH that references the MigrationState*..

A while ago we wrapped QEMU BHs with migration_bh_dispatch_bh. The BHs
don't need to hold or drop a MigrationState reference anymore because the
dispatch callback does it:

void migration_bh_schedule(QEMUBHFunc *cb, void *opaque)
{
    ...
    object_ref(OBJECT(s));
    qemu_bh_schedule(bh);
}

static void migration_bh_dispatch_bh(void *opaque)
{
    ...        
    migbh->cb(migbh->opaque);
    object_unref(OBJECT(s));
    ...
}
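
So, as a rough sketch (just to illustrate the refcount flow, not
necessarily the final code), the tail of postcopy_ram_listen_thread() from
this patch could schedule the BH before dropping its own reference; the
wrapper takes its ref synchronously inside migration_bh_schedule(), so
there is no window where nobody holds the MigrationState:

out:
    rcu_unregister_thread();
    postcopy_state_set(POSTCOPY_INCOMING_END);

    /*
     * migration_bh_schedule() takes its own reference on the MigrationState
     * before queuing the BH, so dropping the listen thread's reference
     * afterwards (migr was referenced when the thread started) cannot free
     * it early.
     */
    migration_bh_schedule(postcopy_ram_listen_thread_bh, NULL);
    object_unref(OBJECT(migr));

    return NULL;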

>  So maybe we could unref at
> the end of postcopy_ram_listen_thread_bh().  If so, we should add a comment
> on ref() / unref() saying how they're paired.
>

>> 
>> > 
>> > > +
>> > >      return NULL;
>> > >  }
>> > >  
>> > > @@ -2217,7 +2201,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>> > >      mis->have_listen_thread = true;
>> > >      postcopy_thread_create(mis, &mis->listen_thread,
>> > >                             MIGRATION_THREAD_DST_LISTEN,
>> > > -                           postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
>> > > +                           postcopy_ram_listen_thread, QEMU_THREAD_JOINABLE);
>> > >      trace_loadvm_postcopy_handle_listen("return");
>> > >  
>> > >      return 0;
>> > 
>> 

