From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, "Juraj Marcin" <jmarcin@redhat.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH 2/3] migration: Make migration_has_failed() work even for CANCELLING
Date: Thu, 18 Sep 2025 10:43:19 -0300
Message-ID: <87ldmbvpu0.fsf@suse.de>
In-Reply-To: <aMsvitYEzHao0i83@x1.local>

Peter Xu <peterx@redhat.com> writes:

> On Wed, Sep 17, 2025 at 05:52:54PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > We set CANCELLED very late, which means migration_has_failed() may not work
>> > correctly if it's invoked before updating CANCELLING to CANCELLED.
>> >
>> 
>> The prophecy is fulfilled.
>> 
>> https://wiki.qemu.org/ToDo/LiveMigration#Migration_cancel_concurrency
>> 
>> I'm not sure I'm convinced. For instance, CANCELLING is part of
>> migration_is_running(), while FAILED is not. This doesn't seem
>> right. Another point is that CANCELLING is not a final state, so we're
>> prone to later need a migration_has_finished_failing_now() helper. =)
>
> Considering we only have two users so far, and the other user doesn't care
> about CANCELLING (while the multifd shutdown cares?), then I assume it's ok
> to treat CANCELLING to be "has failed"? :)  I didn't try to interpret "has
> failed" in English, but only for the sake of a universal helper that works
> for both places.
>
> Or maybe it can be is_failing() too?  I don't have a strong feeling.
>

I'm not nitpicking on language. I'm pointing out that CANCELLING is a
transitory state, i.e. from migrate_cancel() until migration_cleanup(),
while FAILED is a terminal state; nothing happens after it.

But fine, I guess it's really only *my* assumptions being broken and not
the ones in the code.
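
To make the transitional/terminal distinction concrete, here is a
standalone sketch. It is not QEMU code: the MigState enum is a stand-in
for a subset of the MIGRATION_STATUS_* values, and is_terminal() and
has_finished_failing() are hypothetical helpers that exist only for
illustration. Only has_failed() mirrors the patched
migration_has_failed() quoted below.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a subset of the migration states; names are illustrative. */
typedef enum {
    ST_ACTIVE,
    ST_CANCELLING,  /* transitional: cancel requested, cleanup not done yet */
    ST_CANCELLED,   /* terminal */
    ST_FAILED,      /* terminal */
    ST_COMPLETED,   /* terminal */
} MigState;

/* Mirrors the patched helper: CANCELLING already counts as "has failed". */
static bool has_failed(MigState s)
{
    return s == ST_CANCELLING || s == ST_CANCELLED || s == ST_FAILED;
}

/* Hypothetical helper: true once no further state change is expected. */
static bool is_terminal(MigState s)
{
    return s == ST_CANCELLED || s == ST_FAILED || s == ST_COMPLETED;
}

/* Hypothetical helper: the failure outcome is final. */
static bool has_finished_failing(MigState s)
{
    return has_failed(s) && is_terminal(s);
}

/* Small self-check showing where the two notions disagree. */
static void demo_states(void)
{
    assert(has_failed(ST_CANCELLING) && !is_terminal(ST_CANCELLING));
    assert(has_failed(ST_FAILED) && has_finished_failing(ST_FAILED));
    assert(!has_finished_failing(ST_CANCELLING));
    assert(!has_failed(ST_COMPLETED));
}
```

With the patch, CANCELLING is the one state for which has_failed() is
true while is_terminal() is still false. A caller that needs "failed
and nothing else will happen" would have to check both.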

>> 
>> My mental model is that CANCELLING is a transitional, ongoing state
>> where we shouldn't really be making assumptions. Once FAILED is reached,
>> then we're sure in which general state everything is.
>> 
>> How did you catch this? Was it one of the cancel tests that failed? I
>> just noticed that multifd_send_shutdown() is called from
>> migration_cleanup() before it changes the state to CANCELLED. So current
>> code also has whatever issue you detected here.
>
> No test failed, it was only by code observation, mentioned below [1],
> exactly as you said.
>
> I just think when cancelling the tls sessions, we shouldn't dump the error
> messages anymore even if the bye failed.

Ok

> Or maybe we simply do not need to
> invoke migration_tls_channel_end() when CANCELLING / FAILED?  That's
> relevant to your ask on the cover letter, we can discuss there.
>
> This is very trivial.

Nah, let me review the patch properly, please.

> Let me know what you think.  I can also drop this
> patch when reposting v3 but fix the postcopy warning first, which now
> reproduces reliably with qtest.
>
>> 
>> > Allowing that state makes migration_has_failed() work as expected even
>> > if it's invoked slightly earlier.
>> >
>> > One current user is the multifd code for the TLS graceful termination,
>> > where it's before updating to CANCELLED.
>
> [1]
>
>> >
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> >  migration/migration.c | 3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/migration/migration.c b/migration/migration.c
>> > index 7015c2b5e0..397917b1b3 100644
>> > --- a/migration/migration.c
>> > +++ b/migration/migration.c
>> > @@ -1723,7 +1723,8 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type,
>> >  
>> >  bool migration_has_failed(MigrationState *s)
>> >  {
>> > -    return (s->state == MIGRATION_STATUS_CANCELLED ||
>> > +    return (s->state == MIGRATION_STATUS_CANCELLING ||
>> > +            s->state == MIGRATION_STATUS_CANCELLED ||
>> >              s->state == MIGRATION_STATUS_FAILED);
>> >  }
>> 
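
The ordering issue discussed above (the shutdown path runs from cleanup
while the state is still CANCELLING) can be sketched in a few lines of
standalone C. Everything here is an assumption made for illustration:
the Mig struct, channel_shutdown() and cleanup() are hypothetical
stand-ins, and the rule "report a failed bye only if the migration has
not failed" is my reading of the thread, not the actual multifd code.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ST_ACTIVE, ST_CANCELLING, ST_CANCELLED, ST_FAILED } MigState;

typedef struct {
    MigState state;
    int errors_reported;   /* counts error messages that would be printed */
} Mig;

/* Helper as it was before the patch: CANCELLING is not covered. */
static bool has_failed_old(const Mig *m)
{
    return m->state == ST_CANCELLED || m->state == ST_FAILED;
}

/* Helper as proposed by the patch: CANCELLING is covered too. */
static bool has_failed_new(const Mig *m)
{
    return m->state == ST_CANCELLING || has_failed_old(m);
}

/* Hypothetical channel shutdown: assume the TLS bye always fails here. */
static void channel_shutdown(Mig *m, bool (*has_failed)(const Mig *))
{
    bool bye_ok = false;
    if (!bye_ok && !has_failed(m)) {
        m->errors_reported++;   /* error is surfaced to the user */
    }
}

/* Hypothetical cleanup: shutdown runs first, the state moves afterwards. */
static void cleanup(Mig *m, bool (*has_failed)(const Mig *))
{
    channel_shutdown(m, has_failed);
    if (m->state == ST_CANCELLING) {
        m->state = ST_CANCELLED;
    }
}

static void demo_ordering(void)
{
    Mig before = { ST_CANCELLING, 0 };
    cleanup(&before, has_failed_old);
    assert(before.errors_reported == 1);   /* spurious error on cancel */

    Mig after = { ST_CANCELLING, 0 };
    cleanup(&after, has_failed_new);
    assert(after.errors_reported == 0);    /* error suppressed */
    assert(after.state == ST_CANCELLED);
}
```

In this sketch both runs end in CANCELLED. The only difference is
whether the failed bye is reported while the state is still CANCELLING.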

