All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Steven Sistare <steven.sistare@oracle.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Cedric Le Goater <clg@redhat.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Marc-Andre Lureau <marcandre.lureau@redhat.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH V3 10/13] migration: stop vm for cpr
Date: Thu, 22 Feb 2024 17:30:32 +0800	[thread overview]
Message-ID: <ZdcUODYUBm8sfnOq@x1n> (raw)
In-Reply-To: <ZdcQFbIJ46sSPf2i@x1n>

On Thu, Feb 22, 2024 at 05:12:53PM +0800, Peter Xu wrote:
> On Wed, Feb 21, 2024 at 04:20:07PM -0500, Steven Sistare wrote:
> > On 2/20/2024 2:33 AM, Peter Xu wrote:
> > > On Thu, Feb 08, 2024 at 10:54:03AM -0800, Steve Sistare wrote:
> > >> When migration for cpr is initiated, stop the vm and set state
> > >> RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
> > >> possibility of ram and device state being out of sync, and guarantees
> > >> that a guest in the suspended state remains suspended, because qmp_cont
> > >> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
> > >>
> > >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> > >> ---
> > >>  include/migration/misc.h |  1 +
> > >>  migration/migration.c    | 32 +++++++++++++++++++++++++-------
> > >>  2 files changed, 26 insertions(+), 7 deletions(-)
> > >>
> > >> diff --git a/include/migration/misc.h b/include/migration/misc.h
> > >> index 6dc234b..54c99a3 100644
> > >> --- a/include/migration/misc.h
> > >> +++ b/include/migration/misc.h
> > >> @@ -60,6 +60,7 @@ void migration_object_init(void);
> > >>  void migration_shutdown(void);
> > >>  bool migration_is_idle(void);
> > >>  bool migration_is_active(MigrationState *);
> > >> +bool migrate_mode_is_cpr(MigrationState *);
> > >>  
> > >>  typedef enum MigrationEventType {
> > >>      MIG_EVENT_PRECOPY_SETUP,
> > >> diff --git a/migration/migration.c b/migration/migration.c
> > >> index d1fce9e..fc5c587 100644
> > >> --- a/migration/migration.c
> > >> +++ b/migration/migration.c
> > >> @@ -1603,6 +1603,11 @@ bool migration_is_active(MigrationState *s)
> > >>              s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > >>  }
> > >>  
> > >> +bool migrate_mode_is_cpr(MigrationState *s)
> > >> +{
> > >> +    return s->parameters.mode == MIG_MODE_CPR_REBOOT;
> > >> +}
> > >> +
> > >>  int migrate_init(MigrationState *s, Error **errp)
> > >>  {
> > >>      int ret;
> > >> @@ -2651,13 +2656,14 @@ static int migration_completion_precopy(MigrationState *s,
> > >>      bql_lock();
> > >>      migration_downtime_start(s);
> > >>  
> > >> -    s->vm_old_state = runstate_get();
> > >> -    global_state_store();
> > >> -
> > >> -    ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
> > >> -    trace_migration_completion_vm_stop(ret);
> > >> -    if (ret < 0) {
> > >> -        goto out_unlock;
> > >> +    if (!migrate_mode_is_cpr(s)) {
> > >> +        s->vm_old_state = runstate_get();
> > >> +        global_state_store();
> > >> +        ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
> > >> +        trace_migration_completion_vm_stop(ret);
> > >> +        if (ret < 0) {
> > >> +            goto out_unlock;
> > >> +        }
> > >>      }
> > >>  
> > >>      ret = migration_maybe_pause(s, current_active_state,
> > >> @@ -3576,6 +3582,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
> > >>      Error *local_err = NULL;
> > >>      uint64_t rate_limit;
> > >>      bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
> > >> +    int ret;
> > >>  
> > >>      /*
> > >>       * If there's a previous error, free it and prepare for another one.
> > >> @@ -3651,6 +3658,17 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
> > >>          goto fail;
> > >>      }
> > >>  
> > >> +    if (migrate_mode_is_cpr(s)) {
> > >> +        s->vm_old_state = runstate_get();
> > >> +        global_state_store();
> > >> +        ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
> > >> +        trace_migration_completion_vm_stop(ret);
> > >> +        if (ret < 0) {
> > >> +            error_setg(&local_err, "migration_stop_vm failed, error %d", -ret);
> > >> +            goto fail;
> > >> +        }
> > >> +    }
> > > 
> > > Could we have a helper function for the shared codes?
> > 
> > I propose to add code to migration_stop_vm to make it the helper.  Some call sites emit
> > more traces (via migration_stop_vm) as a result of my refactoring,
> 
> This should be fine.
> 
> > and postcopy start sets 
> > vm_old_state, which is not used thereafter in that path.  Those changes seem harmless to me.
> 
> Not only harmless, I think it was a bug to not set vm_old_state in
> postcopy_start()..  See:
> 
> https://issues.redhat.com/browse/RHEL-18061
> 
> I suspect that was the cause.
> 
> > Tell me what you think:
> 
> I'll have a closer look later, but so far it looks all good.
> 
> Thanks,
> 
> > 
> > -------------------------------------------------------
> > diff --git a/migration/migration.c b/migration/migration.c
> > index fc5c587..30d2b08 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -107,12 +107,6 @@ static int migration_maybe_pause(MigrationState *s,
> >  static void migrate_fd_cancel(MigrationState *s);
> >  static bool close_return_path_on_source(MigrationState *s);
> > 
> > -static void migration_downtime_start(MigrationState *s)
> > -{
> > -    trace_vmstate_downtime_checkpoint("src-downtime-start");
> > -    s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > -}

Ah.. one more thing: would you mind keep this helper even if it can be
squashed when sending formal patch?  I want to keep downtime start/end
super clear and paired as they're important hook points.  It should be
inlined anyway by the compiler.

> > -
> >  static void migration_downtime_end(MigrationState *s)
> >  {
> >      int64_t now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > @@ -161,11 +155,20 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpo
> >      return (a > b) - (a < b);
> >  }
> > 
> > -int migration_stop_vm(RunState state)
> > +static int migration_stop_vm(MigrationState *s, RunState state)
> >  {
> > -    int ret = vm_stop_force_state(state);
> > +    int ret;
> > +
> > +    trace_vmstate_downtime_checkpoint("src-downtime-start");
> > +    s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +
> > +    s->vm_old_state = runstate_get();
> > +    global_state_store();
> > +
> > +    ret = vm_stop_force_state(state);
> > 
> >      trace_vmstate_downtime_checkpoint("src-vm-stopped");
> > +    trace_migration_completion_vm_stop(ret);
> > 
> >      return ret;
> >  }
> > @@ -2454,10 +2457,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
> >      bql_lock();
> >      trace_postcopy_start_set_run();
> > 
> > -    migration_downtime_start(ms);
> > -
> > -    global_state_store();
> > -    ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
> > +    ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE);
> >      if (ret < 0) {
> >          goto fail;
> >      }
> > @@ -2654,13 +2654,9 @@ static int migration_completion_precopy(MigrationState *s,
> >      int ret;
> > 
> >      bql_lock();
> > -    migration_downtime_start(s);
> > 
> >      if (!migrate_mode_is_cpr(s)) {
> > -        s->vm_old_state = runstate_get();
> > -        global_state_store();
> > -        ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
> > -        trace_migration_completion_vm_stop(ret);
> > +        ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
> >          if (ret < 0) {
> >              goto out_unlock;
> >          }
> > @@ -3498,15 +3494,10 @@ static void *bg_migration_thread(void *opaque)
> >      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> > 
> >      trace_migration_thread_setup_complete();
> > -    migration_downtime_start(s);
> > 
> >      bql_lock();
> > 
> > -    s->vm_old_state = runstate_get();
> > -
> > -    global_state_store();
> > -    /* Forcibly stop VM before saving state of vCPUs and devices */
> > -    if (migration_stop_vm(RUN_STATE_PAUSED)) {
> > +    if (migration_stop_vm(s, RUN_STATE_PAUSED)) {
> >          goto fail;
> >      }
> >      /*
> > @@ -3659,10 +3650,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
> >      }
> > 
> >      if (migrate_mode_is_cpr(s)) {
> > -        s->vm_old_state = runstate_get();
> > -        global_state_store();
> > -        ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
> > -        trace_migration_completion_vm_stop(ret);
> > +        ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
> >          if (ret < 0) {
> >              error_setg(&local_err, "migration_stop_vm failed, error %d", -ret);
> >              goto fail;
> > diff --git a/migration/migration.h b/migration/migration.h
> > index aef8afb..65c0b61 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s);
> >   */
> >  void migration_rp_kick(MigrationState *s);
> > 
> > -int migration_stop_vm(RunState state);
> > -
> >  #endif
> > -------------------------------------------------------
> > 
> > - Steve
> > 
> 
> -- 
> Peter Xu

-- 
Peter Xu



  reply	other threads:[~2024-02-22  9:31 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-08 18:53 [PATCH V3 00/13] allow cpr-reboot for vfio Steve Sistare
2024-02-08 18:53 ` [PATCH V3 01/13] notify: pass error to notifier with return Steve Sistare
2024-02-12  9:08   ` David Hildenbrand
2024-02-08 18:53 ` [PATCH V3 02/13] migration: remove error from notifier data Steve Sistare
2024-02-12  9:08   ` David Hildenbrand
2024-02-08 18:53 ` [PATCH V3 03/13] migration: convert to NotifierWithReturn Steve Sistare
2024-02-12  9:10   ` David Hildenbrand
2024-02-08 18:53 ` [PATCH V3 04/13] migration: MigrationEvent for notifiers Steve Sistare
2024-02-12  9:11   ` David Hildenbrand
2024-02-20  6:38   ` Peter Xu
2024-02-08 18:53 ` [PATCH V3 05/13] migration: remove postcopy_after_devices Steve Sistare
2024-02-20  6:42   ` Peter Xu
2024-02-08 18:53 ` [PATCH V3 06/13] migration: MigrationNotifyFunc Steve Sistare
2024-02-12  9:14   ` David Hildenbrand
2024-02-20  6:48   ` Peter Xu
2024-02-08 18:54 ` [PATCH V3 07/13] migration: per-mode notifiers Steve Sistare
2024-02-12  9:16   ` David Hildenbrand
2024-02-20  6:51   ` Peter Xu
2024-02-08 18:54 ` [PATCH V3 08/13] migration: refactor migrate_fd_connect failures Steve Sistare
2024-02-12  9:17   ` David Hildenbrand
2024-02-08 18:54 ` [PATCH V3 09/13] migration: notifier error checking Steve Sistare
2024-02-12  9:24   ` David Hildenbrand
2024-02-12 15:37     ` Steven Sistare
2024-02-20  7:12   ` Peter Xu
2024-02-20 22:12     ` Steven Sistare
2024-02-08 18:54 ` [PATCH V3 10/13] migration: stop vm for cpr Steve Sistare
2024-02-20  7:33   ` Peter Xu
2024-02-20 22:19     ` Steven Sistare
2024-02-21 21:20     ` Steven Sistare
2024-02-22  9:12       ` Peter Xu
2024-02-22  9:30         ` Peter Xu [this message]
2024-02-22 13:29           ` Steven Sistare
2024-02-21 21:23     ` Steven Sistare
2024-02-22  9:03       ` Peter Xu
2024-02-22 13:24         ` Steven Sistare
2024-02-08 18:54 ` [PATCH V3 11/13] vfio: register container " Steve Sistare
2024-02-08 18:54 ` [PATCH V3 12/13] vfio: allow cpr-reboot migration if suspended Steve Sistare
2024-02-21 18:32   ` Steven Sistare
2024-02-08 18:54 ` [PATCH V3 13/13] migration: update cpr-reboot description Steve Sistare
2024-02-20  7:49 ` [PATCH V3 00/13] allow cpr-reboot for vfio Peter Xu
2024-02-20 22:32   ` Steven Sistare
2024-02-21  2:13     ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZdcUODYUBm8sfnOq@x1n \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=clg@redhat.com \
    --cc=david@redhat.com \
    --cc=farosas@suse.de \
    --cc=jasowang@redhat.com \
    --cc=kraxel@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=steven.sistare@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.