From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>, Juraj Marcin <jmarcin@redhat.com>
Cc: qemu-devel@nongnu.org, Jiri Denemark <jdenemar@redhat.com>,
"Dr. David Alan Gilbert" <dave@treblig.org>
Subject: Re: [PATCH 3/4] migration: Refactor incoming cleanup into migration_incoming_finish()
Date: Mon, 22 Sep 2025 14:40:37 -0300 [thread overview]
Message-ID: <877bxqtmga.fsf@suse.de> (raw)
In-Reply-To: <aNFwdic2-d_Crctc@x1.local>
Peter Xu <peterx@redhat.com> writes:
> On Mon, Sep 22, 2025 at 02:58:38PM +0200, Juraj Marcin wrote:
>> Hi Fabiano,
>>
>> On 2025-09-19 13:46, Fabiano Rosas wrote:
>> > Juraj Marcin <jmarcin@redhat.com> writes:
>> >
>> > Hi Juraj,
>> >
>> > Good patch, nice use of migrate_has_failed()
>>
>> Thanks!
>>
>> >
>> > > From: Juraj Marcin <jmarcin@redhat.com>
>> > >
>> > > Currently, there are two functions that are responsible for cleanup of
>> > > the incoming migration state. With successful precopy, it's the main
>> > > thread and with successful postcopy it's the listen thread. However, if
>> > > postcopy fails during in the device load, both functions will try to do
>> > > the cleanup. Moreover, when exit-on-error parameter was added, it was
>> > > applied only to precopy.
>> > >
>> >
>> > Someone could be relying in postcopy always exiting on error while
>> > explicitly setting exit-on-error=false for precopy and this patch would
>> > change the behavior incompatibly. Is this an issue? I'm willing to
>> > ignore it, but you guys know more about postcopy.
>>
>> Good question. When going through older patches where postcopy listen
>> thread and then where exit-on-error were implemented, it seemed more
>> like an overlook than intentional omission. However, it might be better
>> to not break any potential users of this, we could add another option,
>> "exit-on-postcopy-error" that would allow such handling if postscopy
>> failed unrecoverably. I've already talked about such option with
>> @jdenemar and he agreed with it.
>
> The idea for postcopy ram is, it should never fail.. as failing should
> never be better than a pause. Block dirty bitmap might be different,
> though, when enabled separately.
>
> For postcopy-ram, qemu_loadvm_state_main() will in reality only receive RAM
> updates. It'll almost always trigger the postcopy_pause_incoming() path
> when anything fails.
>
> For pure block-dirty-bitmap-only styled postcopy: for this exit-on-error, I
> also don't think we should really "exit on errors", even if the flag is
> set. IIUC, it's not fatal to the VM if that failed, as described in:
>
> commit ee64722514fabcad2430982ade86180208f5be4f
> Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Date: Mon Jul 27 22:42:32 2020 +0300
>
> migration/savevm: don't worry if bitmap migration postcopy failed
>
> ...
>
> And anyway, bitmaps postcopy is not prepared to be somehow recovered.
> The original idea instead is that if bitmaps postcopy failed, we just
> lose some bitmaps, which is not critical. So, on failure we just need
> to remove unfinished bitmaps and guest should continue execution on
> destination.
>
> Hence, exit here might be an overkill.. need block developers to double
> check, though..
>
>>
>> >
>> > > This patch refactors common cleanup and exiting on error into a helper
>> > > function that can be started either from precopy or postcopy, reducing
>> > > the duplication. If the listen thread has been started (the postcopy
>> > > state is at least LISTENING), the listen thread is responsible for all
>> > > cleanup and exiting, otherwise it's the main thread's responsibility.
>> >
>> > Don't the BHs also run in the main thread? I'm not sure this sentence is
>> > accurate.
>>
>> Yeah, it is a bit inaccurate now that you mention it, BHs are indeed run
>> in the main thread. More accurate would to replace the main thread with
>> process incoming migration coroutine, that would precisely describe who
>> has the responsibility to call the cleanup.
>>
>> >
>> > >
>> > > Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
>> > > ---
>> > > migration/migration.c | 64 ++++++++++++++++++++++++-------------------
>> > > migration/migration.h | 1 +
>> > > migration/savevm.c | 48 +++++++++++---------------------
>> >
>> > Could someone act on the TODOs and move postcopy code into postcopy-ram?
>> > It's never too late to make things consistent.
>>
>> I can take a look.
>>
>> >
>> > > 3 files changed, 53 insertions(+), 60 deletions(-)
>> > >
>> > > diff --git a/migration/migration.c b/migration/migration.c
>> > > index 2c0b3a7229..7222e3de13 100644
>> > > --- a/migration/migration.c
>> > > +++ b/migration/migration.c
>> > > @@ -442,9 +442,19 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
>> > > void migration_incoming_state_destroy(void)
>> > > {
>> > > struct MigrationIncomingState *mis = migration_incoming_get_current();
>> > > + PostcopyState ps = postcopy_state_get();
>> > >
>> > > multifd_recv_cleanup();
>> > >
>> > > + if (mis->have_listen_thread) {
>> > > + qemu_thread_join(&mis->listen_thread);
>> > > + mis->have_listen_thread = false;
>> > > + }
>> > > +
>> > > + if (ps != POSTCOPY_INCOMING_NONE) {
>> > > + postcopy_ram_incoming_cleanup(mis);
>> > > + }
>> > > +
>> > > /*
>> > > * RAM state cleanup needs to happen after multifd cleanup, because
>> > > * multifd threads can use some of its states (receivedmap).
>> > > @@ -809,6 +819,23 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
>> > > cpr_state_close();
>> > > }
>> > >
>> > > +void migration_incoming_finish(void)
>> > > +{
>> > > + MigrationState *s = migrate_get_current();
>> > > + MigrationIncomingState *mis = migration_incoming_get_current();
>> > > +
>> > > + migration_incoming_state_destroy();
>> > > +
>> > > + if (migration_has_failed(mis->state) && mis->exit_on_error) {
>> > > + WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>> > > + error_report_err(s->error);
>> > > + s->error = NULL;
>> > > + }
>> > > +
>> > > + exit(EXIT_FAILURE);
>> > > + }
>> > > +}
>> > > +
>> > > static void process_incoming_migration_bh(void *opaque)
>> > > {
>> > > MigrationIncomingState *mis = opaque;
>> > > @@ -861,7 +888,7 @@ static void process_incoming_migration_bh(void *opaque)
>> > > */
>> > > migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>> > > MIGRATION_STATUS_COMPLETED);
>> > > - migration_incoming_state_destroy();
>> > > + migration_incoming_finish();
>> > > }
>> > >
>> > > static void coroutine_fn
>> > > @@ -888,23 +915,13 @@ process_incoming_migration_co(void *opaque)
>> > >
>> > > ps = postcopy_state_get();
>> > > trace_process_incoming_migration_co_end(ret, ps);
>> > > - if (ps != POSTCOPY_INCOMING_NONE) {
>> > > - if (ps == POSTCOPY_INCOMING_ADVISE) {
>> > > - /*
>> > > - * Where a migration had postcopy enabled (and thus went to advise)
>> > > - * but managed to complete within the precopy period, we can use
>> > > - * the normal exit.
>> > > - */
>> > > - postcopy_ram_incoming_cleanup(mis);
>> > > - } else if (ret >= 0) {
>> > > - /*
>> > > - * Postcopy was started, cleanup should happen at the end of the
>> > > - * postcopy thread.
>> > > - */
>> > > - trace_process_incoming_migration_co_postcopy_end_main();
>> > > - goto out;
>> > > - }
>> > > - /* Else if something went wrong then just fall out of the normal exit */
>> > > + if (ps >= POSTCOPY_INCOMING_LISTENING) {
>> > > + /*
>> > > + * Postcopy was started, cleanup should happen at the end of the
>> > > + * postcopy thread.
>> > > + */
>> > > + trace_process_incoming_migration_co_postcopy_end_main();
>> > > + goto out;
>> > > }
>> > >
>> > > if (ret < 0) {
>> > > @@ -926,16 +943,7 @@ fail:
>> > > migrate_set_error(s, local_err);
>> > > error_free(local_err);
>> > >
>> > > - migration_incoming_state_destroy();
>> > > -
>> > > - if (mis->exit_on_error) {
>> > > - WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>> > > - error_report_err(s->error);
>> > > - s->error = NULL;
>> > > - }
>> > > -
>> > > - exit(EXIT_FAILURE);
>> > > - }
>> > > + migration_incoming_finish();
>> > > out:
>> > > /* Pairs with the refcount taken in qmp_migrate_incoming() */
>> > > migrate_incoming_unref_outgoing_state();
>> > > diff --git a/migration/migration.h b/migration/migration.h
>> > > index 2c2331f40d..67e3318467 100644
>> > > --- a/migration/migration.h
>> > > +++ b/migration/migration.h
>> > > @@ -518,6 +518,7 @@ void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
>> > > void migration_fd_process_incoming(QEMUFile *f);
>> > > void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
>> > > void migration_incoming_process(void);
>> > > +void migration_incoming_finish(void);
>> >
>> > What about merging migration_incoming_state_destroy and
>> > migration_incoming_finish and pair all of this with migration_cleanup?
>> >
>> > static void migration_cleanup_bh(void *opaque)
>> > {
>> > migration_cleanup(opaque);
>> > }
>> >
>> > static void migration_incoming_cleanup_bh(void *opaque)
>> > {
>> > migration_incoming_cleanup(opaque);
>> > }
>>
>> Yes, it would pair well. I guess it would also solve Peter's nitpick
>> about the change in process_incoming_migration_bh() above.
>>
>> > >
>> > > bool migration_has_all_channels(void);
>> > >
>> > > diff --git a/migration/savevm.c b/migration/savevm.c
>> > > index fabbeb296a..d7eb416d48 100644
>> > > --- a/migration/savevm.c
>> > > +++ b/migration/savevm.c
>> > > @@ -2069,6 +2069,11 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
>> > > return 0;
>> > > }
>> > >
>> > > +static void postcopy_ram_listen_thread_bh(void *opaque)
>> > > +{
>> > > + migration_incoming_finish();
>> > > +}
>> > > +
>> > > /*
>> > > * Triggered by a postcopy_listen command; this thread takes over reading
>> > > * the input stream, leaving the main thread free to carry on loading the rest
>> > > @@ -2122,52 +2127,31 @@ static void *postcopy_ram_listen_thread(void *opaque)
>> > > "bitmaps may be lost, and present migrated dirty "
>> > > "bitmaps are correctly migrated and valid.",
>> > > __func__, load_res);
>> > > - load_res = 0; /* prevent further exit() */
>> > > } else {
>> > > error_report("%s: loadvm failed: %d", __func__, load_res);
>> > > migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>> > > MIGRATION_STATUS_FAILED);
>> > > + goto out;
>> > > }
>> > > }
>> > > - if (load_res >= 0) {
>> > > - /*
>> > > - * This looks good, but it's possible that the device loading in the
>> > > - * main thread hasn't finished yet, and so we might not be in 'RUN'
>> > > - * state yet; wait for the end of the main thread.
>> > > - */
>> > > - qemu_event_wait(&mis->main_thread_load_event);
>> > > - }
>> > > - postcopy_ram_incoming_cleanup(mis);
>> > > -
>> > > - if (load_res < 0) {
>> > > - /*
>> > > - * If something went wrong then we have a bad state so exit;
>> > > - * depending how far we got it might be possible at this point
>> > > - * to leave the guest running and fire MCEs for pages that never
>> > > - * arrived as a desperate recovery step.
>> > > - */
>> > > - rcu_unregister_thread();
>> > > - exit(EXIT_FAILURE);
>> > > - }
>> > > + /*
>> > > + * This looks good, but it's possible that the device loading in the
>> > > + * main thread hasn't finished yet, and so we might not be in 'RUN'
>> > > + * state yet; wait for the end of the main thread.
>> > > + */
>> > > + qemu_event_wait(&mis->main_thread_load_event);
>
> PS: I didn't notice this change, looks like this may be better to be a
> separate patch when moving out of the if. Meanwhile, I don't think we set
> it right either, in qemu_loadvm_state():
>
> qemu_event_set(&mis->main_thread_load_event);
>
> The problem is e.g. load_snapshot / qmp_xen_load_devices_state also set
> that event, even if there'll be no one to consume it.. not a huge deal, but
> maybe while moving it out of the if, we can also cleanup the set() side
> too, by moving the set() upper into process_incoming_migration_co().
>
>> > >
>> > > migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>> > > MIGRATION_STATUS_COMPLETED);
>> > > - /*
>> > > - * If everything has worked fine, then the main thread has waited
>> > > - * for us to start, and we're the last use of the mis.
>> > > - * (If something broke then qemu will have to exit anyway since it's
>> > > - * got a bad migration state).
>> > > - */
>> > > - bql_lock();
>> > > - migration_incoming_state_destroy();
>> > > - bql_unlock();
>> > >
>> > > +out:
>> > > rcu_unregister_thread();
>> > > - mis->have_listen_thread = false;
>> > > postcopy_state_set(POSTCOPY_INCOMING_END);
>> > >
>> > > object_unref(OBJECT(migr));
>> > >
>> > > + migration_bh_schedule(postcopy_ram_listen_thread_bh, NULL);
>> >
>> > Better to schedule before the object_unref to ensure there's always
>> > someone holding a reference?
>>
>> True, I'll move it.
>
> Good point. Though I'm not sure moving it upper would help, because it'll
> be the BH that references the MigrationState*..
A while ago we wrapped QEMU BHs with migration_bh_dispatch_bh. The BHs
don't need to hold or drop MigrationState reference anymore because the
dispatch callback does it:
void migration_bh_schedule(QEMUBHFunc *cb, void *opaque)
{
...
object_ref(OBJECT(s));
qemu_bh_schedule(bh);
}
static void migration_bh_dispatch_bh(void *opaque)
{
...
migbh->cb(migbh->opaque);
object_unref(OBJECT(s));
...
}
> So maybe we could unref at
> the end of postcopy_ram_listen_thread_bh(). If so, we should add a comment
> on ref() / unref() saying how they're paired.
>
>>
>> >
>> > > +
>> > > return NULL;
>> > > }
>> > >
>> > > @@ -2217,7 +2201,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>> > > mis->have_listen_thread = true;
>> > > postcopy_thread_create(mis, &mis->listen_thread,
>> > > MIGRATION_THREAD_DST_LISTEN,
>> > > - postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
>> > > + postcopy_ram_listen_thread, QEMU_THREAD_JOINABLE);
>> > > trace_loadvm_postcopy_handle_listen("return");
>> > >
>> > > return 0;
>> >
>>
next prev parent reply other threads:[~2025-09-22 17:42 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-15 11:59 [PATCH 0/4] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
2025-09-15 11:59 ` [PATCH 1/4] migration: Do not try to start VM if disk activation fails Juraj Marcin
2025-09-19 16:12 ` Fabiano Rosas
2025-09-15 11:59 ` [PATCH 2/4] migration: Accept MigrationStatus in migration_has_failed() Juraj Marcin
2025-09-19 14:57 ` Peter Xu
2025-09-22 11:26 ` Juraj Marcin
2025-09-15 11:59 ` [PATCH 3/4] migration: Refactor incoming cleanup into migration_incoming_finish() Juraj Marcin
2025-09-19 15:53 ` Peter Xu
2025-09-19 16:46 ` Fabiano Rosas
2025-09-22 12:58 ` Juraj Marcin
2025-09-22 15:51 ` Peter Xu
2025-09-22 17:40 ` Fabiano Rosas [this message]
2025-09-22 17:48 ` Peter Xu
2025-09-23 14:58 ` Juraj Marcin
2025-09-23 16:17 ` Peter Xu
2025-09-15 11:59 ` [PATCH 4/4] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
2025-09-19 16:58 ` Peter Xu
2025-09-19 17:50 ` Peter Xu
2025-09-22 13:34 ` Juraj Marcin
2025-09-22 16:16 ` Peter Xu
2025-09-23 14:23 ` Juraj Marcin
2025-09-25 11:54 ` Jiří Denemark
2025-09-25 18:22 ` Peter Xu
2025-09-30 7:53 ` Jiří Denemark
2025-09-30 20:04 ` Peter Xu
2025-10-01 8:43 ` Jiří Denemark
2025-10-01 11:05 ` Dr. David Alan Gilbert
2025-10-01 14:26 ` Jiří Denemark
2025-10-01 15:53 ` Dr. David Alan Gilbert
2025-10-01 15:10 ` Daniel P. Berrangé
2025-10-02 12:17 ` Jiří Denemark
2025-10-02 13:12 ` Dr. David Alan Gilbert
2025-10-01 10:09 ` Juraj Marcin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=877bxqtmga.fsf@suse.de \
--to=farosas@suse.de \
--cc=dave@treblig.org \
--cc=jdenemar@redhat.com \
--cc=jmarcin@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).