From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org, "Juraj Marcin" <jmarcin@redhat.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH 2/3] migration: Make migration_has_failed() work even for CANCELLING
Date: Wed, 17 Sep 2025 18:00:42 -0400
Message-ID: <aMsvitYEzHao0i83@x1.local>
In-Reply-To: <87wm5wvm1l.fsf@suse.de>

On Wed, Sep 17, 2025 at 05:52:54PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > We set CANCELLED very late, which means migration_has_failed() may not
> > work correctly if it's invoked before CANCELLING is updated to CANCELLED.
> >
> 
> The prophecy is fulfilled.
> 
> https://wiki.qemu.org/ToDo/LiveMigration#Migration_cancel_concurrency
> 
> I'm not sure I'm convinced, for instance, CANCELLING is part of
> migration_is_running(), while FAILED is not. This doesn't seem
> right. Another point is that CANCELLING is not a final state, so we're
> prone to later need a migration_has_finished_failing_now() helper. =)

Considering we only have two users so far, and the other user doesn't care
about CANCELLING (while the multifd shutdown does?), I assume it's OK to
treat CANCELLING as "has failed"? :)  I wasn't trying to interpret "has
failed" literally in English, only to provide a universal helper that works
for both places.

Or maybe it could be named is_failing() instead?  I don't have a strong
preference.
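
To make the two readings concrete, here is a minimal standalone sketch, not
QEMU code.  The SketchStatus enum is a trimmed, hypothetical stand-in for the
real migration status enum, and both function names are made up for
illustration.  It only shows that the two predicates differ in exactly one
case, the transitional CANCELLING state.

```c
#include <stdbool.h>

/* Hypothetical, trimmed stand-in for the migration status enum. */
typedef enum {
    SKETCH_STATUS_ACTIVE,
    SKETCH_STATUS_CANCELLING,
    SKETCH_STATUS_CANCELLED,
    SKETCH_STATUS_FAILED,
    SKETCH_STATUS_COMPLETED,
} SketchStatus;

/* Behaviour before the patch: only the two final states count. */
static bool sketch_has_failed_strict(SketchStatus state)
{
    return state == SKETCH_STATUS_CANCELLED ||
           state == SKETCH_STATUS_FAILED;
}

/* Behaviour with the patch: the transitional CANCELLING state counts too. */
static bool sketch_has_failed_or_failing(SketchStatus state)
{
    return state == SKETCH_STATUS_CANCELLING ||
           sketch_has_failed_strict(state);
}
```

An is_failing() helper would be the second predicate under a different name;
whether the name or the semantics should change is the open question here.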

> 
> My mental model is that CANCELLING is a transitional, ongoing state
> where we shouldn't really be making assumptions. Once FAILED is reached,
> then we're sure in which general state everything is.
> 
> How did you catch this? It was one of the cancel tests that failed? I
> just noticed that multifd_send_shutdown() is called from
> migration_cleanup() before it changes the state to CANCELLED. So current
> code also has whatever issue you detected here.

No test failed; I noticed it only by reading the code, as mentioned below
[1], exactly as you said.
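
The ordering can be modelled with a small standalone sketch.  This is not the
real migration_cleanup() or multifd_send_shutdown(); every name below is
hypothetical, and the only thing it is meant to show is that the shutdown
step runs while the state is still CANCELLING, so a helper that ignores
CANCELLING reports "not failed" at that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed stand-in for the migration status enum. */
typedef enum {
    SKETCH_STATUS_ACTIVE,
    SKETCH_STATUS_CANCELLING,
    SKETCH_STATUS_CANCELLED,
    SKETCH_STATUS_FAILED,
} SketchStatus;

typedef struct {
    SketchStatus state;
    bool count_cancelling;        /* true models the patched helper */
    bool failed_seen_at_shutdown; /* what the shutdown path observed */
} SketchMigration;

static bool sketch_has_failed(const SketchMigration *s)
{
    return (s->count_cancelling && s->state == SKETCH_STATUS_CANCELLING) ||
           s->state == SKETCH_STATUS_CANCELLED ||
           s->state == SKETCH_STATUS_FAILED;
}

/* Stand-in for the shutdown path: it only records what the helper says. */
static void sketch_send_shutdown(SketchMigration *s)
{
    s->failed_seen_at_shutdown = sketch_has_failed(s);
}

/* Stand-in for cleanup: shutdown first, final state update afterwards. */
static void sketch_cleanup(SketchMigration *s)
{
    assert(s != NULL);
    sketch_send_shutdown(s);               /* state is still CANCELLING */
    if (s->state == SKETCH_STATUS_CANCELLING) {
        s->state = SKETCH_STATUS_CANCELLED;
    }
}
```

With count_cancelling set to false, a cancelled migration reaches the
shutdown step with failed_seen_at_shutdown == false; with it set to true, the
shutdown step sees the migration as failed.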

I just think that when cancelling the TLS sessions, we shouldn't dump the
error messages anymore even if the bye fails.  Or maybe we simply don't need
to invoke migration_tls_channel_end() at all when CANCELLING / FAILED?
That's related to your question on the cover letter; we can discuss it there.
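
A rough standalone sketch of the first option, where the bye is still
attempted but a failure is not reported once the migration is cancelling or
has failed.  None of this is QEMU code: sketch_tls_bye() is a hypothetical
stub whose outcome is passed in by the caller, and the counter exists only so
the behaviour can be checked.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, trimmed stand-in for the migration status enum. */
typedef enum {
    SKETCH_STATUS_ACTIVE,
    SKETCH_STATUS_CANCELLING,
    SKETCH_STATUS_CANCELLED,
    SKETCH_STATUS_FAILED,
    SKETCH_STATUS_COMPLETED,
} SketchStatus;

/* Number of errors that were actually reported to the user. */
static int sketch_errors_reported;

static bool sketch_is_failing(SketchStatus state)
{
    return state == SKETCH_STATUS_CANCELLING ||
           state == SKETCH_STATUS_CANCELLED ||
           state == SKETCH_STATUS_FAILED;
}

/* Hypothetical stub for the TLS bye; 'bye_ok' simulates its outcome. */
static int sketch_tls_bye(bool bye_ok)
{
    return bye_ok ? 0 : -1;
}

static void sketch_channel_shutdown(SketchStatus state, bool bye_ok)
{
    int ret = sketch_tls_bye(bye_ok);

    /* Only complain about a failed bye if the migration was healthy. */
    if (ret < 0 && !sketch_is_failing(state)) {
        fprintf(stderr, "sketch: TLS bye failed\n");
        sketch_errors_reported++;
    }
}
```

The second option would instead return early from sketch_channel_shutdown()
when sketch_is_failing() is true, without attempting the bye at all.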

This is very trivial.  Let me know what you think.  I can also drop this
patch when I repost v3 but fix the postcopy warning first, which now
reproduces reliably with qtest.

> 
> > Allowing that state makes migration_has_failed() work as expected even
> > if it's invoked slightly earlier.
> >
> > One current user is the multifd code for the TLS graceful termination,
> > which runs before the state is updated to CANCELLED.

[1]

> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 7015c2b5e0..397917b1b3 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1723,7 +1723,8 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type,
> >  
> >  bool migration_has_failed(MigrationState *s)
> >  {
> > -    return (s->state == MIGRATION_STATUS_CANCELLED ||
> > +    return (s->state == MIGRATION_STATUS_CANCELLING ||
> > +            s->state == MIGRATION_STATUS_CANCELLED ||
> >              s->state == MIGRATION_STATUS_FAILED);
> >  }
> 

-- 
Peter Xu


