From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: Prasad Pandit <ppandit@redhat.com>,
qemu-devel@nongnu.org, Prasad Pandit <pjp@fedoraproject.org>
Subject: Re: [PATCH] migration: introduce MIGRATION_STATUS_FAILING
Date: Tue, 23 Dec 2025 13:01:03 -0300 [thread overview]
Message-ID: <877budkx1c.fsf@suse.de> (raw)
In-Reply-To: <aUq1oA73W9rAdCgG@x1.local>
Peter Xu <peterx@redhat.com> writes:
> On Mon, Dec 22, 2025 at 11:29:57AM -0300, Fabiano Rosas wrote:
>> I'm fine with the general idea:
>>
>> i) FAILED and CANCELLED are terminal states. It makes sense to not have
>> work happen after they're set.
>>
>> ii) Using an intermediate state, assuming locking/atomic are correct is
>> a suitable fix for the issue.
>>
>> iii) Using a FAILING status seems appropriate.
>>
>> However,
>>
>> It would be great if we could stop exposing implementation details via
>> QAPI. Does the user really need to see events for CANCELLING and
>> FAILING?
>>
>> It would probably be easier if we kept MigrationStatus as QAPI only and
>> used a separate mechanism to track the internal states.
>>
>> That said, we could merge this as is to fix the bug and think about that
>> later.
>
> This bug looks to be there for a long time, IMHO we don't need to rush
> fixing it if we risk adding a new status and revert it quickly... Let's
> discuss it here, and it's a valid question indeed.
>
> One thing good about exposing such status via QAPI is, it can help us
> diagnose issues by seeing CANCELLING / FAILING even looking at
> query-migrate results (as normally when bug happens we can't see the
> internal status..), so that we know either it's explicitly cancelled, or
> something went wrong.
>
> If it's a completely hidden / internal status, we may see ACTIVE even if
> something wrong happened..
>
On the other hand, we could have more fine-grained statuses and track
them with tracepoints.
> My current hope is any mgmt should normally by default ignore new migration
> states.. If that's always achieved, it looks to me adding FAILING directly
> into migration status would still have some benefits on debugging.
>
Maybe for FAILING it's ok, as we already have CANCELLING and FAILED is
currently mismatched.
> [...]
>
>> > @@ -2907,7 +2914,7 @@ fail_closefb:
>> > qemu_fclose(fb);
>> > fail:
>> > if (ms->state != MIGRATION_STATUS_CANCELLING) {
>> > - migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
>> > + migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILING);
>> > }
>>
>> This is a good example where having MigrationStatus makes the code more
>> complicated. This could be exiting=true, running=false, etc. It
>> shouldn't matter why this routine failed. If we reach
>> migration_cleanup() and, at the very end, state is CANCELLING, then we
>> know the cancel command has caused this and set the state to
>> CANCELLED. If the state was something else, then it's unintended and we
>> set FAILED.
>
> If it'll be an internal status, we'll still need to identify if someone
> already have cancelled it, right?
>
I was thinking of keeping CANCELLING, since it's already in the API.
> Assuming we introduce stop_reason flag, making it:
>
> enum {
> MIG_STOP_REASON_CANCEL,
> MIG_STOP_REASON_FAIL,
> } MigrationStopReason;
>
> Then we can switch to CANCELLED / FAILED when cleanup from those reasons.
>
> Then here, logically we also need logic like:
>
> if (stop_reason != MIG_STOP_REASON_CANCEL) {
> stop_reason = MIG_STOP_REASON_FAIL;
> }
>
> Because we want to make sure when the user already triggered cancel, it
> won't show FAILED but only show CANCELLED at last?
next prev parent reply other threads:[~2025-12-23 16:02 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-22 11:48 [PATCH] migration: introduce MIGRATION_STATUS_FAILING Prasad Pandit
2025-12-22 14:29 ` Fabiano Rosas
2025-12-23 11:17 ` Prasad Pandit
2025-12-23 15:04 ` Fabiano Rosas
2026-01-06 10:54 ` Prasad Pandit
2026-01-06 14:53 ` Peter Xu
2025-12-23 15:30 ` Peter Xu
2025-12-23 16:01 ` Fabiano Rosas [this message]
2026-01-06 11:45 ` Prasad Pandit
2026-01-06 13:47 ` Fabiano Rosas
2026-01-07 10:49 ` Prasad Pandit
2026-01-07 13:23 ` Fabiano Rosas
2026-01-12 13:05 ` Prasad Pandit
2026-01-12 19:45 ` Fabiano Rosas
2026-01-13 11:59 ` Prasad Pandit
2026-01-14 12:16 ` Fabiano Rosas
2026-01-06 15:09 ` Peter Xu
2026-01-07 11:11 ` Prasad Pandit
2026-01-07 17:03 ` Peter Xu
2025-12-23 15:17 ` Peter Xu
2025-12-23 15:36 ` Fabiano Rosas
2025-12-23 16:27 ` Peter Xu
2026-01-06 10:49 ` Prasad Pandit
2026-01-06 15:23 ` Peter Xu
2026-01-07 11:01 ` Prasad Pandit
2026-01-07 17:07 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=877budkx1c.fsf@suse.de \
--to=farosas@suse.de \
--cc=peterx@redhat.com \
--cc=pjp@fedoraproject.org \
--cc=ppandit@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.