From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, "Juraj Marcin" <jmarcin@redhat.com>,
"Daniel P . Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH 2/3] migration: Make migration_has_failed() work even for CANCELLING
Date: Thu, 18 Sep 2025 10:43:19 -0300
Message-ID: <87ldmbvpu0.fsf@suse.de>
In-Reply-To: <aMsvitYEzHao0i83@x1.local>
Peter Xu <peterx@redhat.com> writes:
> On Wed, Sep 17, 2025 at 05:52:54PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > We set CANCELLED very late, which means migration_has_failed() may not work
>> > correctly if it's invoked before updating CANCELLING to CANCELLED.
>> >
>>
>> The prophecy is fulfilled.
>>
>> https://wiki.qemu.org/ToDo/LiveMigration#Migration_cancel_concurrency
>>
>> I'm not sure I'm convinced, for instance, CANCELLING is part of
>> migration_is_running(), while FAILED is not. This doesn't seem
>> right. Another point is that CANCELLING is not a final state, so we're
>> prone to later need a migration_has_finished_failing_now() helper. =)
>
> Considering we only have two users so far, and the other user doesn't care
> about CANCELLING (while the multifd shutdown cares?), then I assume it's ok
> to treat CANCELLING to be "has failed"? :) I didn't try to interpret "has
> failed" in English, but only for the sake of a universal helper that works
> for both places.
>
> Or maybe it can be is_failing() too? I don't have a strong feeling.
>
I'm not nitpicking on language. I'm pointing out that CANCELLING is a
transitory state, i.e. from migrate_cancel() until migration_cleanup(),
while FAILED is a terminal state: nothing happens after it.

But fine, I guess it's really only *my* assumptions being broken and not
the ones in the code.
>>
>> My mental model is that CANCELLING is a transitional, ongoing state
>> where we shouldn't really be making assumptions. Once FAILED is reached,
>> then we're sure in which general state everything is.
>>
>> How did you catch this? It was one of the cancel tests that failed? I
>> just noticed that multifd_send_shutdown() is called from
>> migration_cleanup() before it changes the state to CANCELLED. So current
>> code also has whatever issue you detected here.
>
> No test failed, it was only by code observation, mentioned below [1],
> exactly as you said.
>
> I just think when cancelling the tls sessions, we shouldn't dump the error
> messages anymore even if the bye failed.
Ok
> Or maybe we simply do not need to
> invoke migration_tls_channel_end() when CANCELLING / FAILED? That's
> relevant to your ask on the cover letter, we can discuss there.
>
> This is very trivial.
Nah, let me review the patch properly, please.
> Let me know what you think. I can also drop this
> patch when reposting v3 but fix the postcopy warning first, which reliably
> reproduces now with qtest.
>
>>
>> > Allowing that state will make migration_has_failed() work as expected even
>> > if it's invoked slightly earlier.
>> >
>> > One current user is the multifd code for the TLS graceful termination,
>> > where it's before updating to CANCELLED.
>
> [1]
>
>> >
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> > migration/migration.c | 3 ++-
>> > 1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/migration/migration.c b/migration/migration.c
>> > index 7015c2b5e0..397917b1b3 100644
>> > --- a/migration/migration.c
>> > +++ b/migration/migration.c
>> > @@ -1723,7 +1723,8 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type,
>> >
>> >  bool migration_has_failed(MigrationState *s)
>> >  {
>> > -    return (s->state == MIGRATION_STATUS_CANCELLED ||
>> > +    return (s->state == MIGRATION_STATUS_CANCELLING ||
>> > +            s->state == MIGRATION_STATUS_CANCELLED ||
>> >              s->state == MIGRATION_STATUS_FAILED);
>> >  }
>>