qemu-devel.nongnu.org archive mirror
From: Steven Sistare <steven.sistare@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, "Juan Quintela" <quintela@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH V1 2/3] migration: fix suspended runstate
Date: Wed, 21 Jun 2023 15:15:42 -0400	[thread overview]
Message-ID: <6adfae20-60fe-ae08-1685-160b2a1efab5@oracle.com> (raw)
In-Reply-To: <ZJIeR7svXvtHdgs4@x1n>

On 6/20/2023 5:46 PM, Peter Xu wrote:
> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
>> Migration of a guest in the suspended state is broken.  The incoming
>> migration code automatically tries to wake the guest, which IMO is
>> wrong -- the guest should end migration in the same state it started.
>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
>> bypasses vm_start().  The guest appears to be in the running state, but
>> it is not.
>>
>> To fix, leave the guest in the suspended state, but call
>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
>> later, when the client sends a system_wakeup command.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  migration/migration.c | 11 ++++-------
>>  softmmu/runstate.c    |  1 +
>>  2 files changed, 5 insertions(+), 7 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 17b4b47..851fe6d 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
>>          vm_start();
>>      } else {
>>          runstate_set(global_state_get_runstate());
>> +        if (runstate_check(RUN_STATE_SUSPENDED)) {
>> +            /* Force vm_start to be called later. */
>> +            qemu_system_start_on_wakeup_request();
>> +        }
> 
> Is this really needed, along with patch 1?
> 
> I have a very limited knowledge on suspension, so I'm prone to making
> mistakes..
> 
> But from what I read, qemu_system_wakeup_request() (the existing one, before
> patch 1 is applied) will set up wakeup_reason and kick the main thread
> using qemu_notify_event().  Then IIUC e.g. the vcpu wakeups will be done in
> the main thread later on, after qemu_wakeup_requested() returns true.

Correct, here:

    if (qemu_wakeup_requested()) {
        pause_all_vcpus();
        qemu_system_wakeup();
        notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
        wakeup_reason = QEMU_WAKEUP_REASON_NONE;
        resume_all_vcpus();
        qapi_event_send_wakeup();
    }

However, that is not sufficient, because vm_start() was never called on the incoming
side.  vm_start() runs the VM state change notifiers for RUN_STATE_RUNNING, among other things.


Without my fixes, it "works" because the outgoing migration automatically wakes a suspended
guest; that sets the runstate to RUNNING, which is then saved in the global state:

    void migration_completion(MigrationState *s)
        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
        global_state_store()

Then the incoming migration calls vm_start here:

    migration/migration.c
        if (!global_state_received() ||
            global_state_get_runstate() == RUN_STATE_RUNNING) {
            if (autostart) {
                vm_start();

vm_start must be called for correctness.

- Steve

>>      }
>>      /*
>>       * This must happen after any state changes since as soon as an external
>> @@ -2101,7 +2105,6 @@ static int postcopy_start(MigrationState *ms)
>>      qemu_mutex_lock_iothread();
>>      trace_postcopy_start_set_run();
>>  
>> -    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>>      global_state_store();
>>      ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>>      if (ret < 0) {
>> @@ -2307,7 +2310,6 @@ static void migration_completion(MigrationState *s)
>>      if (s->state == MIGRATION_STATUS_ACTIVE) {
>>          qemu_mutex_lock_iothread();
>>          s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> -        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>>  
>>          s->vm_old_state = runstate_get();
>>          global_state_store();
>> @@ -3102,11 +3104,6 @@ static void *bg_migration_thread(void *opaque)
>>  
>>      qemu_mutex_lock_iothread();
>>  
>> -    /*
>> -     * If VM is currently in suspended state, then, to make a valid runstate
>> -     * transition in vm_stop_force_state() we need to wakeup it up.
>> -     */
>> -    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
> 
> Removal of these three places seems reasonable to me, or we won't persist
> the SUSPEND state.
> 
> The comment above was the major reason I had thought it was needed
> (again, based on zero knowledge around this..), but perhaps it was just
> wrong?  I would assume vm_stop_force_state() will still just work with
> suspended, am I right?
> 
>>      s->vm_old_state = runstate_get();
>>  
>>      global_state_store();
>> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
>> index e127b21..771896c 100644
>> --- a/softmmu/runstate.c
>> +++ b/softmmu/runstate.c
>> @@ -159,6 +159,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>>      { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
>>      { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
>>      { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
>> +    { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED },
>>      { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
>>      { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
>>  
>> -- 
>> 1.8.3.1
>>
> 


