qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Leonardo Bras Soares Passos <lsoaresp@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Richard Henderson <rth@twiddle.net>,
	Igor Mammedov <imammedo@redhat.com>
Subject: Re: [PATCH RFC 4/5] cpu: Allow cpu_synchronize_all_post_init() to take an errp
Date: Mon, 13 Jun 2022 12:13:05 +0100	[thread overview]
Message-ID: <Yqcbwemb7I/MpGWG@work-vm> (raw)
In-Reply-To: <YqNTDSV4P05pb+9l@xz-m1.local>

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Jun 09, 2022 at 05:02:29PM -0400, Peter Xu wrote:
> > On Wed, Jun 08, 2022 at 06:05:28PM +0100, Dr. David Alan Gilbert wrote:
> > > > @@ -2005,7 +2005,17 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
> > > >      /* TODO we should move all of this lot into postcopy_ram.c or a shared code
> > > >       * in migration.c
> > > >       */
> > > > -    cpu_synchronize_all_post_init();
> > > > +    cpu_synchronize_all_post_init(&local_err);
> > > > +    if (local_err) {
> > > > +        /*
> > > > +         * TODO: a better way to do this is to tell the src that we cannot
> > > > +         * run the VM here so hopefully we can keep the VM running on src
> > > > +         * and immediately halt the switch-over.  But that needs work.
> > > 
> > > Yes, I think it is possible; unlike some of the later errors in the same
> > > function, in this case we know no disks/network/etc have been touched,
> > > so we should be able to recover.
> > > I wonder if we can move the postcopy_state_set(POSTCOPY_INCOMING_RUNNING)
> > > out of loadvm_postcopy_handle_run to after this point.
> > > 
> > > We've already got the return path, so we should be able to signal the
> > > failure unless we're very unlucky.
> > 
> > Right.  It's just that for the new ACK we may need to modify the return
> > path protocol for sure, because none of the existing ones can notify such
> > an information.
> > 
> > One idea is to reuse MIG_RP_MSG_RESUME_ACK, it was only used for postcopy
> > recovery before to do the final handshake with offload=1 only (which is
> > defined as MIGRATION_RESUME_ACK_VALUE).  We could try to fill in the
> > payload with some !1 value, to tell the source that we NACK the migration
> > then src fails the migration as long as possible?
> > 
> > That seems to be even compatibile with one old qemu migrating to a new qemu
> > scenario, because when the old qemu notices the MIG_RP_MSG_RESUME_ACK
> > message with !1 payload, it'll mark the rp bad:
> 
> Oh it won't be compatible..  The clean way to do this is we need to modify
> the src qemu to halt in postcopy_start() to wait for that ack before
> continue.  That may need another cap/param to enable.

OK; I was wondering aobut sending a RP_MSG_SHUT with a failure; but if
you'd need to change the source it's still a problem.

> The thing is I'm not very sure whether this will be worth it.
> 
> Non-compatible migrations should be rare on put register failures.  For the
> issue I was working on, it was actually a kernel bug that triggered it but
> it's just hard to figure out where's wrong.  With properly working kernels
> and matching hosts they should just not really heppen.  I'm worried adding
> too much complexity could over-engineer things without much benefits.

OK that makes sense.

> In that case, I'd think it proper if we start with what this patchset
> provides, which at least allows us to fail in a crystal clear way?

Yes, the clear error is important.

Dave

> > 
> >   if (migrate_handle_rp_resume_ack(ms, tmp32)) {
> >       mark_source_rp_bad(ms);
> >       goto out;
> >   }
> > 
> >   static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
> >   {
> >       trace_source_return_path_thread_resume_ack(value);
> >   
> >       if (value != MIGRATION_RESUME_ACK_VALUE) {
> >           error_report("%s: illegal resume_ack value %"PRIu32,
> >                        __func__, value);
> >           return -1;
> >       }
> >       ...
> >   }
> > 
> > If it looks generally good, I can try with such a change in v2.
> > 
> > Thanks,
> > 
> > -- 
> > Peter Xu
> 
> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



  reply	other threads:[~2022-06-13 11:14 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-07 23:06 [PATCH RFC 0/5] CPU: Detect put cpu register errors for migrations Peter Xu
2022-06-07 23:06 ` [PATCH RFC 1/5] cpus-common: Introduce run_on_cpu_func2 which allows error returns Peter Xu
2022-06-07 23:06 ` [PATCH RFC 2/5] cpus-common: Add run_on_cpu2() Peter Xu
2022-06-07 23:06 ` [PATCH RFC 3/5] accel: Allow synchronize_post_init() to take an Error** Peter Xu
2022-06-07 23:06 ` [PATCH RFC 4/5] cpu: Allow cpu_synchronize_all_post_init() to take an errp Peter Xu
2022-06-08 17:05   ` Dr. David Alan Gilbert
2022-06-09 21:02     ` Peter Xu
2022-06-10 14:19       ` Peter Xu
2022-06-13 11:13         ` Dr. David Alan Gilbert [this message]
2022-06-07 23:06 ` [PATCH RFC 5/5] KVM: Hook kvm_arch_put_registers() errors to the caller Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yqcbwemb7I/MpGWG@work-vm \
    --to=dgilbert@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=lsoaresp@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=rth@twiddle.net \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).