From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Leonardo Bras Soares Passos <lsoaresp@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Richard Henderson <rth@twiddle.net>,
	Igor Mammedov <imammedo@redhat.com>
Subject: Re: [PATCH RFC 4/5] cpu: Allow cpu_synchronize_all_post_init() to take an errp
Date: Mon, 13 Jun 2022 12:13:05 +0100
Message-ID: <Yqcbwemb7I/MpGWG@work-vm>
In-Reply-To: <YqNTDSV4P05pb+9l@xz-m1.local>

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Jun 09, 2022 at 05:02:29PM -0400, Peter Xu wrote:
> > On Wed, Jun 08, 2022 at 06:05:28PM +0100, Dr. David Alan Gilbert wrote:
> > > > @@ -2005,7 +2005,17 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
> > > >      /* TODO we should move all of this lot into postcopy_ram.c or a shared code
> > > >       * in migration.c
> > > >       */
> > > > -    cpu_synchronize_all_post_init();
> > > > +    cpu_synchronize_all_post_init(&local_err);
> > > > +    if (local_err) {
> > > > +        /*
> > > > +         * TODO: a better way to do this is to tell the src that we cannot
> > > > +         * run the VM here so hopefully we can keep the VM running on src
> > > > +         * and immediately halt the switch-over.  But that needs work.
> > > 
> > > Yes, I think it is possible; unlike some of the later errors in the same
> > > function, in this case we know no disks/network/etc have been touched,
> > > so we should be able to recover.
> > > I wonder if we can move the postcopy_state_set(POSTCOPY_INCOMING_RUNNING)
> > > out of loadvm_postcopy_handle_run to after this point.
> > > 
> > > We've already got the return path, so we should be able to signal the
> > > failure unless we're very unlucky.
> > 
> > Right.  It's just that for the new ACK we would definitely need to modify
> > the return path protocol, because none of the existing messages can carry
> > such information.
> > 
> > One idea is to reuse MIG_RP_MSG_RESUME_ACK; so far it has only been used
> > by postcopy recovery for the final handshake, always with payload=1 (which
> > is defined as MIGRATION_RESUME_ACK_VALUE).  We could try to fill in the
> > payload with some !1 value to tell the source that we NACK the migration,
> > so that the src fails the migration wherever possible?
> > 
> > That even seems to be compatible with the scenario of an old qemu migrating
> > to a new qemu, because when the old qemu notices a MIG_RP_MSG_RESUME_ACK
> > message with a !1 payload, it'll mark the rp bad:
> 
> Oh, it won't be compatible.  The clean way to do this is to modify the src
> qemu to halt in postcopy_start() and wait for that ack before continuing.
> That may need another cap/param to enable.

OK; I was wondering about sending an RP_MSG_SHUT with a failure; but if
you'd need to change the source anyway, it's still a problem.
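To make the handshake discussed above concrete, here is a minimal, self-contained C sketch of the ordering plus ACK/NACK idea. It is a toy model, not QEMU code: `ReturnPath`, `rp_send_resume_ack()`, `dest_handle_run()`, `source_check_verdict()` and `RESUME_NACK_VALUE` are hypothetical names; only the "payload 1 means ACK" convention (MIGRATION_RESUME_ACK_VALUE) comes from the thread. The destination restores CPU state before declaring itself running and reports the outcome over the return path, while the source waits for that verdict before committing to the switch-over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ACK payload used by today's handshake (MIGRATION_RESUME_ACK_VALUE). */
#define RESUME_ACK_VALUE  1u
/* Hypothetical: any "!1" payload would mean "destination refuses to run". */
#define RESUME_NACK_VALUE 0u

typedef enum { DEST_RUNNING, DEST_FAILED } DestState;

typedef enum {
    SRC_WAIT,     /* no verdict yet: keep waiting before switching over */
    SRC_PROCEED,  /* destination is running the guest: continue postcopy */
    SRC_ABORT,    /* destination refused: fail migration, keep guest on source */
} SourceDecision;

/* Toy single-message return path from destination to source. */
typedef struct {
    bool     has_msg;
    uint32_t payload;
} ReturnPath;

static void rp_send_resume_ack(ReturnPath *rp, uint32_t value)
{
    assert(!rp->has_msg);  /* the destination sends exactly one verdict */
    rp->has_msg = true;
    rp->payload = value;
}

/*
 * Destination: cpu_sync_ok stands in for the result of
 * cpu_synchronize_all_post_init().  The running state is only entered after
 * the CPU registers were restored, so on failure nothing (disks, network)
 * has been touched yet and the source can still keep the guest.
 */
static DestState dest_handle_run(ReturnPath *rp, bool cpu_sync_ok)
{
    if (!cpu_sync_ok) {
        rp_send_resume_ack(rp, RESUME_NACK_VALUE);
        return DEST_FAILED;
    }
    rp_send_resume_ack(rp, RESUME_ACK_VALUE);
    return DEST_RUNNING;
}

/* Source: decide what to do based on the destination's verdict. */
static SourceDecision source_check_verdict(const ReturnPath *rp)
{
    if (!rp->has_msg) {
        return SRC_WAIT;
    }
    return rp->payload == RESUME_ACK_VALUE ? SRC_PROCEED : SRC_ABORT;
}
```

Because an unmodified source treats any payload other than 1 as an illegal resume_ack and marks the return path bad, a scheme like this would still need the source-side halt (and probably a capability) mentioned above.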

> The thing is, I'm not very sure whether this will be worth it.
> 
> Put-register failures caused by non-compatible migrations should be rare.
> For the issue I was working on, it was actually a kernel bug that triggered
> it, but it was just hard to figure out what was wrong.  With properly working
> kernels and matching hosts, these failures should basically never happen.
> I'm worried that adding too much complexity would over-engineer things
> without much benefit.

OK that makes sense.

> In that case, I think it's reasonable to start with what this patchset
> provides, which at least allows us to fail in a crystal-clear way?

Yes, the clear error is important.
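For reference, the errp convention the patch relies on can be modelled in a few lines. This sketch assumes a stripped-down stand-in for QEMU's `Error` type and hypothetical helpers `set_error()`, `cpu_put_registers()` and `sync_all_post_init()`; it is not QEMU's API. It shows the property that makes the failure clear: the loop stops at the first vCPU whose registers could not be written and hands the caller an error naming that vCPU, which the caller then checks the same way the quoted hunk checks `local_err`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal stand-in for QEMU's Error object (not the real QEMU type). */
typedef struct Error {
    int  cpu_index;
    char msg[128];
} Error;

/* Hypothetical error_setg()-like helper; errp == NULL means "ignore". */
static void set_error(Error **errp, const char *what, int cpu_index)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);  /* like QEMU: never overwrite an existing error */
    Error *err = malloc(sizeof(*err));
    if (!err) {
        abort();
    }
    err->cpu_index = cpu_index;
    snprintf(err->msg, sizeof(err->msg), "%s failed on CPU %d",
             what, cpu_index);
    *errp = err;
}

/* Hypothetical per-vCPU hook; put_ok simulates whether the accelerator
 * managed to write this vCPU's registers. */
static bool cpu_put_registers(int cpu_index, bool put_ok, Error **errp)
{
    if (!put_ok) {
        set_error(errp, "put registers", cpu_index);
        return false;
    }
    return true;
}

/* Sketch of an errp-taking "synchronize all CPUs after init": stop at the
 * first vCPU that fails and report it to the caller. */
static void sync_all_post_init(const bool *put_ok, int ncpus, Error **errp)
{
    for (int i = 0; i < ncpus; i++) {
        if (!cpu_put_registers(i, put_ok[i], errp)) {
            return;
        }
    }
}
```

A caller would pass `&local_err`, test it afterwards, and report `local_err->msg` before freeing it; passing NULL ignores the error, in line with QEMU's errp convention.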

Dave

> > 
> >   if (migrate_handle_rp_resume_ack(ms, tmp32)) {
> >       mark_source_rp_bad(ms);
> >       goto out;
> >   }
> > 
> >   static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
> >   {
> >       trace_source_return_path_thread_resume_ack(value);
> >   
> >       if (value != MIGRATION_RESUME_ACK_VALUE) {
> >           error_report("%s: illegal resume_ack value %"PRIu32,
> >                        __func__, value);
> >           return -1;
> >       }
> >       ...
> >   }
> > 
> > If it looks generally good, I can try with such a change in v2.
> > 
> > Thanks,
> > 
> > -- 
> > Peter Xu
> 
> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



Thread overview: 10+ messages
2022-06-07 23:06 [PATCH RFC 0/5] CPU: Detect put cpu register errors for migrations Peter Xu
2022-06-07 23:06 ` [PATCH RFC 1/5] cpus-common: Introduce run_on_cpu_func2 which allows error returns Peter Xu
2022-06-07 23:06 ` [PATCH RFC 2/5] cpus-common: Add run_on_cpu2() Peter Xu
2022-06-07 23:06 ` [PATCH RFC 3/5] accel: Allow synchronize_post_init() to take an Error** Peter Xu
2022-06-07 23:06 ` [PATCH RFC 4/5] cpu: Allow cpu_synchronize_all_post_init() to take an errp Peter Xu
2022-06-08 17:05   ` Dr. David Alan Gilbert
2022-06-09 21:02     ` Peter Xu
2022-06-10 14:19       ` Peter Xu
2022-06-13 11:13         ` Dr. David Alan Gilbert [this message]
2022-06-07 23:06 ` [PATCH RFC 5/5] KVM: Hook kvm_arch_put_registers() errors to the caller Peter Xu