From: Fabiano Rosas <farosas@suse.de>
To: Prasad Pandit <ppandit@redhat.com>
Cc: Peter Xu <peterx@redhat.com>,
qemu-devel@nongnu.org, Prasad Pandit <pjp@fedoraproject.org>
Subject: Re: [PATCH] migration: introduce MIGRATION_STATUS_FAILING
Date: Mon, 12 Jan 2026 16:45:16 -0300 [thread overview]
Message-ID: <87y0m2zkc3.fsf@suse.de> (raw)
In-Reply-To: <CAE8KmOx0ikDueu-znY14RCmp6weX_G+CJMUrQOmOuv-OPwPR+Q@mail.gmail.com>
Prasad Pandit <ppandit@redhat.com> writes:
> Hello Fabiano,
>
> On Wed, 7 Jan 2026 at 18:54, Fabiano Rosas <farosas@suse.de> wrote:
>> I like this because it forces us to determine more clearly what is the
>> necessary condition for a state change. This could eventually allow the
>> abstraction of the qapi_event_send_migration() to a higher
>> layer. Something like:
>>
>> void qmp_migrate() {
>> t:migrate=true
>>
>> migration_setup() :: do setup, mig_setup_done=true
>> migration_advance_state() :: checks the triggers, changes state and
>> sends the event
>>
>> migration_start() :: migrate, mig_done=true
>> failure, mig_failed=true
>> etc
>> migration_advance_state()
>>
>> migrate_vmstate() :: device state migration, mig_device_done=true
>> migration_advance_state()
>>
>> etc..
>> }
>>
>> IOW, we could do a better job of separating what is work, what is
>> migration control flow, what is error handling, etc.
>
> * Yes, indeed. Above skeleton code conveys the plausible
> segregation/stages well.
>
>> What I'm trying to convey is that we have:
>>
>> 1) events API that needs to be kept stable, this list of states that
>> libvirt sees and at what moments we emit them.
> ===
> qemuProcessHandleMigrationStatus & qemuMigrationUpdateJobType
> -> https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_process.c#L1766
> -> https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_migration.c?ref_type=heads#L1931
> ===
> * I was trying to see how libvirtd(8) handles QEMU migration states.
> Looking at the above functions there, it seems they don't do much with
> it. Only MIGRATION_STATUS_POSTCOPY_* has some handling, while other
> states are not handled for anything. Interestingly, there's no _FAILED
> state in there, maybe they call it _ERROR.
>
> * While I get the importance of not breaking APIs, still, simplifying
> migration states on the QEMU side should help them too.
>
Also remember libvirt is not the only consumer of events from QEMU,
there are other platforms as well.
>> 2) MigrationStatus being used as an internal record of the current
>> (loosely defined) migration phase. This is "arbitrary", hence we're
>> discussing adding a new MigrationStatus "just" to make sure we don't
>> start a new migration at the wrong moment.
>>
>> I'm trying to understand if you want to cover 1, 2 or both.
>>
>> I would suggest we first take all of the internal tracking, i.e. #2, the
>> "if (state==MIGRATION_STATUS)" code and convert them to use some other
>> state tracking, either the triggers as you suggest, or random booleans
>> sprinkled all over, it's not immediately important.
>>
>> Once that is done, then we could freeze the #1, MigrationStatus. It
>> would only change whenever we wanted to change the API and that should
>> be a well documented change.
>
> * Yes, sounds good. We could start with the QEMU internal state/phase
> tracking and then go to #1 above once we see how it all works in
> practice.
>
>> Ok, maybe I'm splittling hairs here, I was trying to understand whether
>> all of these "if (s->state ...)" have the same semantics.
>>
>> a) For cases such as CANCELLING: that could be a simple
>> s->trigger[MIGRATE_CANCEL]=1.
>>
>> (we're not removing the CANCELLING state due to the API stability, but
>> still)
>>
>> b) For error conditions: s->event[FAILED]=1, then (possibly at a later
>> point in migration_change_state):
>>
>> if (s->event[FAILED] && !s->trigger[MIGRATE_CANCEL]) {
>> migrate_set_state(s->state, MIGRATION_STATUS_FAILED);
>> }
>
> * Do we have to check !MIGRATE_CANCEL like this? It's not clean.
There are failures that happen _because_ we cancelled. As I've mentioned
somewhere else before, the cancellation is not "informed" to all threads
running migration code, there are some code paths that will simply fail
as a result of migration_cancel(). We need to allow cancelling to work
in a possibly stuck thread (such as a blocked recv in the return path),
but this means we end up calling qemu_file_shutdown indiscriminately.
In these cases, parts of the code would set FAILED, but that failure is
a result of cancelling. We've determined that migrate-cancel should
always lead to CANCELLED and a new migration should always be possible.
> Ideally if an error/failure event occurs before the user cancels, then
> cancel can be ignored, no? Because migration is anyway going to stop
> or end.
This is ok, call it an error and done.
> OTOH, if we cancel while processing an error/failure, end user
> may not see that error because we report - migration was cancelled.
>
This is interesting, I _think_ it wouldn't be possible to cancel while
handling an error due to BQL locked, the migrate-cancel wouldn't be
issued while migration_cleanup is ongoing. However, I don't think we ever
tested this scenario in particular. Maybe you could try to catch
something by modifying the /migration/cancel tests, if you're willing.
>> b) For postcopy resume/pause, etc, maybe an actual state machine that can
>> only be in one state would be helpful.
>>
>> c) For "we reached this point, so set this state", most of those could
>> just be an invocation to migration_change_state() and, as you
>> suggest, that would look for the evidence elsewhere to know what
>> state to set:
>>
>> if (s->trigger[MIGRATE] && s->event[COMPLETED]) {
>> migrate_set_state(s->state, MIGRATION_STATUS_COMPLETED);
>> }
>
> * Yes, right. We need to define/differentiate between _what_ is the
> state and _why_ is that state.
>
> * How do we go from here? Next step?
>
You could send a PoC patch with your idea fixing this FAILING bug? We'd
need a trigger for migrate, set_caps, etc and the failed event.
If that new patch doesn't get consensus then we merge this one and work
on a new design as time permits.
---
Aside from the QAPI states, there are some internal states we already
track with separate flags, e.g.:
rp_thread_created, start_postcopy, migration_thread_running,
switchover_acked, postcopy_package_loaded, fault_thread_quit,
preempt_thread_status, load_threads_abort.
A bit array could maybe cover all of these and more.
next prev parent reply other threads:[~2026-01-12 19:46 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-22 11:48 [PATCH] migration: introduce MIGRATION_STATUS_FAILING Prasad Pandit
2025-12-22 14:29 ` Fabiano Rosas
2025-12-23 11:17 ` Prasad Pandit
2025-12-23 15:04 ` Fabiano Rosas
2026-01-06 10:54 ` Prasad Pandit
2026-01-06 14:53 ` Peter Xu
2025-12-23 15:30 ` Peter Xu
2025-12-23 16:01 ` Fabiano Rosas
2026-01-06 11:45 ` Prasad Pandit
2026-01-06 13:47 ` Fabiano Rosas
2026-01-07 10:49 ` Prasad Pandit
2026-01-07 13:23 ` Fabiano Rosas
2026-01-12 13:05 ` Prasad Pandit
2026-01-12 19:45 ` Fabiano Rosas [this message]
2026-01-13 11:59 ` Prasad Pandit
2026-01-14 12:16 ` Fabiano Rosas
2026-01-06 15:09 ` Peter Xu
2026-01-07 11:11 ` Prasad Pandit
2026-01-07 17:03 ` Peter Xu
2025-12-23 15:17 ` Peter Xu
2025-12-23 15:36 ` Fabiano Rosas
2025-12-23 16:27 ` Peter Xu
2026-01-06 10:49 ` Prasad Pandit
2026-01-06 15:23 ` Peter Xu
2026-01-07 11:01 ` Prasad Pandit
2026-01-07 17:07 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87y0m2zkc3.fsf@suse.de \
--to=farosas@suse.de \
--cc=peterx@redhat.com \
--cc=pjp@fedoraproject.org \
--cc=ppandit@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.