Re: [RFC PATCH] Fix race in live migration failure path

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Xu <peterx@redhat.com>
To: Shivam Kumar <shivam.kumar1@nutanix.com>
Cc: Fabiano Rosas <farosas@suse.de>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [RFC PATCH] Fix race in live migration failure path
Date: Thu, 23 Jan 2025 11:27:15 -0500	[thread overview]
Message-ID: <Z5Jt434QBy92NnBM@x1n> (raw)
In-Reply-To: <35F19D15-7FD0-43D1-B6A0-2FBB5FD9313B@nutanix.com>

On Thu, Jan 23, 2025 at 09:53:16AM +0000, Shivam Kumar wrote:
> 
> 
> > On 22 Jan 2025, at 10:10 PM, Peter Xu <peterx@redhat.com> wrote:
> > 
> > !-------------------------------------------------------------------|
> >  CAUTION: External Email
> > 
> > |-------------------------------------------------------------------!
> > 
> > Hi, Shivam,
> > 
> > On Wed, Jan 22, 2025 at 10:54:17AM +0000, Shivam Kumar wrote:
> >> There is one place where we set the migration status to FAILED but we don’t set
> >> s->error, i.e. in save_snapshot workflow, please check qemu_savevm_state but
> >> not sure setting s->error in this case is possible (or required), as it seems a
> >> different workflow to me.
> > 
> > Yes it's very different indeed.  I may not yet fully get the challenge on
> > how switching to s->error (implies FAILING) would affect this one, though.
> > I mean, in qemu_savevm_state() it tries to fetch qemufile errors with:
> > 
> >    ret = qemu_file_get_error(f);
> > 
> > IIUC we could also try to fetch s->error just like what migration thread
> > does, and if it sets means it's failing?
> Yes, I can just set s->error in qemu_savevm_state.
> @@ -1686,7 +1686,6 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
> -        status = MIGRATION_STATUS_FAILED;
> +       s->error = *errp;
> But my question was: is migrate_fd_cleanup called (where we will set the status
> to FAILED if s->error is set) in the snapshot workflow? 

I see what you meant.  It's not called indeed. We may need to process it
the same as what migrate_fd_cleanup() does.

So far the snapshot code reuses migration code in a partial way, so it's
not crystal clear where the line is, e.g., obviously it still moves the
migration state machine but it never shows "active" phase at all (even if
it has a major chunk of duration that it's actively migrating the data to
the snapshot files).  Here the state machine only goes from SETUP to either
FAILED or COMPLETED.

From that POV looks like we should duplicate such s->error detection logic
here on deciding whether to switch to FAILED, if we treat s->error as the
internal "single failure point" for migration.

> > 
> >> 
> >> In addition, one potentially real problem that I see is this comment in
> >> migration_detect_error:
> >> /*
> >> * For postcopy, we allow the network to be down for a
> >> * while. After that, it can be continued by a
> >> * recovery phase.
> >> */
> >> Let's say if we set s->error at some place and there was a file error on either
> >> source or destination (qemu_file_get_error_obj_any returns a positive value
> > 
> > This is trivial, but I suppose you meant s/positive/negative/ here.. as
> > qemufile's last_error should always be negative, iiuc.
> > 
> >> when called by migration_detect_error). We expect migration to fail in this
> >> case but migration will continue to run since post-copy migration is tolerant
> >> to file errors?
> > 
> > Yes it can halt at postcopy_pause().  I'm not yet understand why it's an
> > issue to using s->error, though.
> > 
> > In general, I'm thinking whether we could also check s->error in detect
> > error path like this:
> > 
> > ===8<===
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 2d1da917c7..fbd97395e0 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -3015,17 +3015,17 @@ static MigThrError migration_detect_error(MigrationState *s)
> >     ret = qemu_file_get_error_obj_any(s->to_dst_file,
> >                                       s->postcopy_qemufile_src,
> >                                       &local_error);
> > -    if (!ret) {
> > -        /* Everything is fine */
> > -        assert(!local_error);
> > -        return MIG_THR_ERR_NONE;
> > -    }
> > -
> > -    if (local_error) {
> > +    if (ret) {
> > +        /* Passover qemufile errors to s->error */
> > +        assert(local_error);
> >         migrate_set_error(s, local_error);
> >         error_free(local_error);
> >     }
> > 
> > +    if (!migrate_has_error(s)) {
> > +        return MIG_THR_ERR_NONE;
> > +    }
> > +
> >     if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret) {
> >         /*
> >          * For postcopy, we allow the network to be down for a
> > @@ -3037,6 +3037,8 @@ static MigThrError migration_detect_error(MigrationState *s)
> >         /*
> >          * For precopy (or postcopy with error outside IO), we fail
> >          * with no time.
> > +         *
> > +         * TODO: update FAILED only until the end of migration in BH.
> >          */
> >         migrate_set_state(&s->state, state, MIGRATION_STATUS_FAILED);
> >         trace_migration_thread_file_err();
> > ===8<===
> > 
> > I kept a TODO above, I would hope if you reworked everything to route
> > errors to s->error, then we can move this to the cleanup BH to avoid the
> > race.
> > 
> > Do you think that could work?
> 
> I meant: in case of post-copy, what if we have another error somewhere and
> s->error was set, but then we also saw a file error when we called
> qemu_file_get_error_obj_any. In this case, migration should fail IMO but it
> would be paused instead, right?

Yeah you got a point, but I see no good reason to cancel any postcopy
migration, no matter which error it is - either a qemufile error or another
- simply because postcopy cancel means VM crash.  There's nothing worse
that that..

So IMHO we could treat it the same as EIO errors in this case as of now,
and we always pause postcopy no matter which kind of error it hits.  At
least for non-recoverable errors we can have some active process to look
at on src QEMU instance, OTOH there's no direct benefit for us to
differenciate different error cases to crash VM earlier.

Thanks,

-- 
Peter Xu

next prev parent reply	other threads:[~2025-01-23 16:28 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-10 10:07 [RFC PATCH] Fix race in live migration failure path Shivam Kumar
2025-01-10 13:09 ` Fabiano Rosas
2025-01-13 16:29   ` Peter Xu
2025-01-22 10:54     ` Shivam Kumar
2025-01-22 16:40       ` Peter Xu
2025-01-23  9:53         ` Shivam Kumar
2025-01-23 16:27           ` Peter Xu [this message]
2025-01-23 18:05             ` Shivam Kumar
2025-01-23 20:40               ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z5Jt434QBy92NnBM@x1n \
    --to=peterx@redhat.com \
    --cc=farosas@suse.de \
    --cc=qemu-devel@nongnu.org \
    --cc=shivam.kumar1@nutanix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.