qemu-devel.nongnu.org archive mirror
From: Steven Sistare <steven.sistare@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, "Juan Quintela" <quintela@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Leonardo Bras" <leobras@redhat.com>
Subject: Re: [PATCH V6 05/14] migration: propagate suspended runstate
Date: Mon, 4 Dec 2023 17:23:57 -0500
Message-ID: <38a8b047-4c60-4e6f-9ccd-a307d0358f53@oracle.com>
In-Reply-To: <ZW4LX9FpfTj77TZv@x1n>

On 12/4/2023 12:24 PM, Peter Xu wrote:
> On Fri, Dec 01, 2023 at 11:23:33AM -0500, Steven Sistare wrote:
>>>> @@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, int version_id)
>>>>          return -EINVAL;
>>>>      }
>>>>      s->state = r;
>>>> +    vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED);
>>>
>>> IIUC the current vm_was_suspended (based on my reading of your patch) is not
>>> the same as a boolean representing "whether VM is suspended", but only a
>>> temporary field to remember that for a VM stop request.  To be explicit, I
>>> didn't see this flag set in qemu_system_suspend() in your previous patch.
>>>
>>> If so, we can already do:
>>>
>>>   vm_set_suspended(s->vm_was_suspended);
>>>
>>> Regardless of RUN_STATE_SUSPENDED?
>>
>> We need both terms of the expression.
>>
>> If the vm *is* suspended (RUN_STATE_SUSPENDED), then vm_was_suspended = false.
>> We call global_state_store prior to vm_stop_force_state, so the incoming
>> side sees s->state = RUN_STATE_SUSPENDED and s->vm_was_suspended = false.
> 
> Right.
> 
>> However, the runstate is RUN_STATE_INMIGRATE.  When incoming finishes by
>> calling vm_start, we need to restore the suspended state.  Thus in 
>> global_state_post_load, we must set vm_was_suspended = true.
> 
> Given the above, shouldn't global_state_get_runstate() (on dest) fetch
> SUSPENDED already?  Then I think it should call vm_start(SUSPENDED) if it is
> to start.

The V6 code does not pass a state to vm_start, and knowledge of vm_was_suspended
is confined to the global_state and cpus functions.  IMO this is a more modular
and robust solution: multiple sites may call vm_start(), and the right thing
happens at each of them.  Look at patch 6; the changes are minimal because
vm_start "just works".
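To make the two cases described in this thread concrete, the post-load rule
vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED) and an
argument-free vm_start() can be modelled outside QEMU.  This is a simplified
sketch under stated assumptions, not the series' code: ModelRunState,
was_suspended, cur_state, model_post_load() and model_vm_start() are
hypothetical stand-ins for RunState, the vm_was_suspended flag,
global_state_post_load() and vm_start(), and the sketch assumes the flag is
cleared once vm_start() has consumed it.

```c
#include <stdbool.h>

/* Hypothetical subset of RunState, for this model only. */
typedef enum {
    MODEL_INMIGRATE,
    MODEL_PAUSED,
    MODEL_RUNNING,
    MODEL_SUSPENDED,
} ModelRunState;

/* Stand-in for the vm_was_suspended flag owned by the cpus code. */
static bool was_suspended;

/* Runstate of the modelled destination VM while it is loading. */
static ModelRunState cur_state = MODEL_INMIGRATE;

/*
 * Model of the post-load rule.  Keep "suspended" if the source recorded
 * that a stopped VM had been suspended (flag carried in the stream), or
 * if the source was itself in the suspended runstate when it stored the
 * global state (in which case the flag it sent is false).
 */
static void model_post_load(ModelRunState r, bool sent_was_suspended)
{
    was_suspended = sent_was_suspended || r == MODEL_SUSPENDED;
}

/*
 * Model of vm_start(): callers pass no state, so every call site gets the
 * same behaviour.  Assumption: the flag is consumed (cleared) on start.
 */
static void model_vm_start(void)
{
    cur_state = was_suspended ? MODEL_SUSPENDED : MODEL_RUNNING;
    was_suspended = false;
}
```

Because the decision lives inside the modelled vm_start() rather than at each
caller, code that starts the VM, whether right after migration or later from a
paused state, needs no extra suspended-state logic.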

> Maybe you're talking about the special case where autostart==false?  We
> used to have this (existing process_incoming_migration_bh()):
> 
>     if (!global_state_received() ||
>         global_state_get_runstate() == RUN_STATE_RUNNING) {
>         if (autostart) {
>             vm_start();
>         } else {
>             runstate_set(RUN_STATE_PAUSED);
>         }
>     }
> 
> If so, maybe I get you, because in the "else" path we do seem to lose the
> SUSPENDED state again.  But in that case IMHO we should logically set
> vm_was_suspended only at the point where we "lose" it: we didn't lose it
> during migration, only when we decided to switch to PAUSED (due to
> autostart==false).  IOW, change the above to something like:
> 
>     state = global_state_get_runstate();
>     if (!global_state_received() || runstate_is_alive(state)) {
>         if (autostart) {
>             vm_start(state);
>         } else {
>             if (runstate_is_suspended(state)) {
>                 /* Remember suspended state before setting system to stopped */
>                 vm_was_suspended = true;
>             }
>             runstate_set(RUN_STATE_PAUSED);
>         }
>     }

This is similar to V5, which tested for suspended and fiddled with the runstate
at multiple call sites in migration and snapshot.  I believe V6 is cleaner.

> It may or may not make a functional difference compared with the current
> patch, though.  But it may be clearer to follow vm_was_suspended's strict
> definition.
> 
>>
>> If the vm *was* suspended, but is currently stopped (e.g. RUN_STATE_PAUSED),
>> then vm_was_suspended = true.  Migration from that state sets
>> vm_was_suspended = s->vm_was_suspended = true in global_state_post_load and
>> ends with runstate_set(RUN_STATE_PAUSED).
>>
>> I will add a comment here in the code.
>>  
>>>>      return 0;
>>>>  }
>>>> @@ -134,6 +143,7 @@ static const VMStateDescription vmstate_globalstate = {
>>>>      .fields = (VMStateField[]) {
>>>>          VMSTATE_UINT32(size, GlobalState),
>>>>          VMSTATE_BUFFER(runstate, GlobalState),
>>>> +        VMSTATE_BOOL(vm_was_suspended, GlobalState),
>>>>          VMSTATE_END_OF_LIST()
>>>>      },
>>>>  };
>>>
>>> I think this will break migration between old and new binaries,
>>> unfortunately.  And since the global state exists for nearly every VM, all
>>> VM setups would be affected, across all archs.
>>
>> Thanks, I keep forgetting that my binary tricks are no good here.  However,
>> I have one other trick up my sleeve, which is to store vm_was_suspended in
>> global_state.runstate[strlen(runstate) + 1], the byte just after the
>> string's NUL terminator.  It is forwards and backwards compatible, since that
>> byte is always 0 in older QEMU.  It can be implemented with a few lines of
>> code confined to global_state.c, versus many lines spread across files to do
>> it the conventional way using a compat property and a subsection.  Does that
>> sound OK?
> 
> Tricky!  But sounds okay to me.  I think you're inventing your own way of
> being compatible, with the benefit of not relying on the machine type.  If
> you go this route, please clearly document the layout and also what it
> looked like in old binaries.
> 
> I think it may be good to keep using strings, so that new binaries allow
> more than one string and we define the meaning of each (index 0: runstate,
> present since the start; index 2: suspended, perhaps using "1"/"0" to
> express it, while 0x00 means an old binary, etc.).
> 
> I hope this trick will need less code than the subsection solution;
> otherwise I'd still consider going with that, which is the "common
> solution".
> 
> Let's also see whether Juan/Fabiano/others have any opinions.

The disadvantage of using strings '0' and '1' is the additional check for
the backwards-compatible value 0x00.  No big deal, and I'll do that if you
prefer, but it seems unnecessary.  I had already written this for binary 0/1.
It is not yet tested, and still needs comments:

-----------
diff --git a/migration/global_state.c b/migration/global_state.c
index 4e2a9d8..8a59554 100644
--- a/migration/global_state.c
+++ b/migration/global_state.c
@@ -32,9 +32,10 @@ static GlobalState global_state;
 static void global_state_do_store(RunState state)
 {
     const char *state_str = RunState_str(state);
-    assert(strlen(state_str) < sizeof(global_state.runstate));
+    assert(strlen(state_str) < sizeof(global_state.runstate) - 2);
     strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
               state_str, '\0');
+    global_state.runstate[strlen(state_str) + 1] = vm_get_suspended();
 }

 void global_state_store(void)
@@ -68,6 +69,12 @@ static bool global_state_needed(void *opaque)
         return true;
     }

+    /* If the suspended state must be remembered, it is needed */
+
+    if (vm_get_suspended()) {
+        return true;
+    }
+
     /* If state is running or paused, it is not needed */

     if (strcmp(runstate, "running") == 0 ||
@@ -109,6 +116,8 @@ static int global_state_post_load(void *opaque, int version_
         return -EINVAL;
     }
     s->state = r;
+    bool vm_was_suspended = runstate[strlen(runstate) + 1];
+    vm_set_suspended(vm_was_suspended || r == RUN_STATE_SUSPENDED);

     return 0;
 }
------------
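The layout trick can be checked in isolation.  The following is a standalone
sketch, not the patch: encode_runstate(), decode_suspended(),
decode_suspended_str() and MODEL_RUNSTATE_SIZE are hypothetical names, and the
100-byte buffer size is an assumption.  Like the diff, it writes the flag into
the byte just after the runstate name's NUL terminator, and it shows that a
zero-padded buffer from an older binary decodes as "not suspended".  For
comparison it also decodes the '0'/'1' string variant discussed in this
thread, which needs the extra case for 0x00.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed size of the runstate buffer, for illustration only. */
#define MODEL_RUNSTATE_SIZE 100

/*
 * Store "<name>\0<flag>" and zero-pad the rest, as the proposed
 * global_state_do_store() does.  Older binaries zero-pad the whole
 * buffer, so the byte after the terminator is always 0 there.
 */
static void encode_runstate(uint8_t *buf, size_t size, const char *name,
                            bool suspended)
{
    size_t len = strlen(name);

    /* Room for the name, its NUL, the flag byte, and a trailing NUL. */
    assert(len < size - 2);
    memset(buf, 0, size);
    memcpy(buf, name, len);
    buf[len + 1] = suspended;
}

/* Binary variant (as in the diff): any nonzero byte means suspended. */
static bool decode_suspended(const uint8_t *buf, size_t size)
{
    const uint8_t *nul = memchr(buf, '\0', size);

    /* No terminator, or no byte after it: nothing was recorded. */
    if (nul == NULL || nul + 1 >= buf + size) {
        return false;
    }
    return nul[1] != 0;
}

/*
 * String variant suggested in review: '1' or '0' after the terminator.
 * It must additionally accept 0x00, which is what an older source sends.
 * Returns 1 (suspended), 0 (not suspended) or -1 (unexpected byte).
 */
static int decode_suspended_str(const uint8_t *buf, size_t size)
{
    const uint8_t *nul = memchr(buf, '\0', size);

    if (nul == NULL || nul + 1 >= buf + size) {
        return 0;
    }
    switch (nul[1]) {
    case '1':
        return 1;
    case '0':
    case '\0':          /* older binary: byte left as padding */
        return 0;
    default:
        return -1;
    }
}
```

Scanning with memchr() bounded by the buffer size keeps the sketch's decoders
from reading past the buffer if a name lacks a terminator; the diff instead
reads the flag only after the runstate name has been parsed successfully.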

- Steve



Thread overview: 57+ messages
2023-11-30 21:37 [PATCH V6 00/14] fix migration of suspended runstate Steve Sistare
2023-11-30 21:37 ` [PATCH V6 01/14] cpus: pass runstate to vm_prepare_start Steve Sistare
2023-11-30 21:37 ` [PATCH V6 02/14] cpus: vm_was_suspended Steve Sistare
2023-11-30 22:03   ` Peter Xu
2023-11-30 21:37 ` [PATCH V6 03/14] cpus: stop vm in suspended runstate Steve Sistare
2023-11-30 22:10   ` Peter Xu
2023-12-01 17:11     ` Steven Sistare
2023-12-04 16:35       ` Peter Xu
2023-12-04 16:41         ` Steven Sistare
2023-12-22 12:20   ` Markus Armbruster
2023-12-22 15:53     ` Steven Sistare
2023-12-23  5:41       ` Markus Armbruster
2024-01-03 13:09         ` Peter Xu
2024-01-03 13:32           ` Steven Sistare
2024-01-03 14:47         ` Steven Sistare
2024-01-08  7:43           ` Markus Armbruster
2023-11-30 21:37 ` [PATCH V6 04/14] cpus: vm_resume Steve Sistare
2023-12-05 21:36   ` Peter Xu
2023-11-30 21:37 ` [PATCH V6 05/14] migration: propagate suspended runstate Steve Sistare
2023-11-30 23:06   ` Peter Xu
2023-12-01 16:23     ` Steven Sistare
2023-12-04 17:24       ` Peter Xu
2023-12-04 19:31         ` Fabiano Rosas
2023-12-04 20:02           ` Peter Xu
2023-12-04 21:09             ` Fabiano Rosas
2023-12-04 22:04               ` Peter Xu
2023-12-05 12:44                 ` Fabiano Rosas
2023-12-05 14:14                   ` Steven Sistare
2023-12-05 16:18                   ` Peter Xu
2023-12-05 16:52                     ` Fabiano Rosas
2023-12-05 17:04                       ` Steven Sistare
2023-12-04 22:23         ` Steven Sistare [this message]
2023-12-05 16:50           ` Peter Xu
2023-12-05 17:48             ` Steven Sistare
2023-11-30 21:37 ` [PATCH V6 06/14] migration: preserve " Steve Sistare
2023-12-05 21:34   ` Peter Xu
2023-11-30 21:37 ` [PATCH V6 07/14] migration: preserve suspended for snapshot Steve Sistare
2023-12-05 21:35   ` Peter Xu
2023-11-30 21:37 ` [PATCH V6 08/14] migration: preserve suspended for bg_migration Steve Sistare
2023-12-05 21:35   ` Peter Xu
2023-11-30 21:37 ` [PATCH V6 09/14] tests/qtest: migration events Steve Sistare
2023-11-30 21:37 ` [PATCH V6 10/14] tests/qtest: option to suspend during migration Steve Sistare
2023-12-04 21:14   ` Fabiano Rosas
2023-11-30 21:37 ` [PATCH V6 11/14] tests/qtest: precopy migration with suspend Steve Sistare
2023-12-04 20:49   ` Peter Xu
2023-12-05 16:14     ` Steven Sistare
2023-12-05 21:07       ` Peter Xu
2023-11-30 21:37 ` [PATCH V6 12/14] tests/qtest: postcopy " Steve Sistare
2023-11-30 21:37 ` [PATCH V6 13/14] tests/qtest: bootfile per vm Steve Sistare
2023-12-04 21:13   ` Fabiano Rosas
2023-12-04 22:37     ` Peter Xu
2023-12-05 18:43       ` Steven Sistare
2023-11-30 21:37 ` [PATCH V6 14/14] tests/qtest: background migration with suspend Steve Sistare
2023-12-04 21:14   ` Fabiano Rosas
2023-12-05 18:52 ` [PATCH V6 00/14] fix migration of suspended runstate Steven Sistare
2023-12-05 19:19   ` Fabiano Rosas
2023-12-05 21:37   ` Peter Xu
