* [PATCH V1 0/3] fix migration of suspended runstate
From: Steve Sistare @ 2023-06-15 20:26 UTC
To: qemu-devel
Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé, Steve Sistare
Migration of a guest in the suspended runstate is broken.
The incoming migration code automatically tries to wake the guest,
which IMO is wrong -- the guest should end migration in the same
runstate it started in. Further, the automatic wakeup fails. The guest
appears to be running, but is not. See the commit messages for
the details.
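As an illustration, the failure can be reproduced with QMP alone; this
sketch assumes a guest that has entered S3 and a hypothetical destination
URI (query-status fields abridged):

    # src: the guest has suspended itself
    { "execute": "query-status" }
    { "return": { "status": "suspended", "running": false } }
    { "execute": "migrate", "arguments": { "uri": "tcp:dst-host:4444" } }

    # dst, before this series: the runstate claims running, but the
    # vcpus were resumed without vm_start(), so the guest is hung
    { "execute": "query-status" }
    { "return": { "status": "running", "running": true } }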
Steve Sistare (3):
vl: start on wakeup request
migration: fix suspended runstate
tests/qtest: live migration suspended state
include/sysemu/runstate.h | 1 +
migration/migration.c | 11 +++-----
softmmu/runstate.c | 16 +++++++++++-
tests/migration/i386/Makefile | 5 ++--
tests/migration/i386/a-b-bootblock.S | 49 +++++++++++++++++++++++++++++++++---
tests/migration/i386/a-b-bootblock.h | 22 ++++++++++------
tests/qtest/migration-helpers.c | 2 +-
tests/qtest/migration-test.c | 31 +++++++++++++++++++++--
8 files changed, 112 insertions(+), 25 deletions(-)
--
1.8.3.1
* [PATCH V1 1/3] vl: start on wakeup request
From: Steve Sistare @ 2023-06-15 20:26 UTC
To: qemu-devel
Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé, Steve Sistare
If qemu starts and loads a VM in the suspended state, then a later wakeup
request sets the state to running and tries to start execution, but it
bypasses vm_start() and its initialization steps, which is fatal for the
guest. See qemu_system_wakeup_request(), and qemu_system_wakeup() in
main_loop_should_exit().
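For reference, the wakeup handling in the main loop looks roughly like this
(paraphrased; error paths elided):

    if (qemu_wakeup_requested()) {
        pause_all_vcpus();
        qemu_system_wakeup();
        notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
        wakeup_reason = QEMU_WAKEUP_REASON_NONE;
        resume_all_vcpus();      /* vcpus run again, but vm_start() never ran */
        qapi_event_send_wakeup();
    }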
Define the qemu_system_start_on_wakeup_request() hook to cause vm_start() to
be called when processing the wakeup request. It will be called in a
subsequent migration patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
include/sysemu/runstate.h | 1 +
softmmu/runstate.c | 15 ++++++++++++++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 7beb29c..c12ad7d 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -57,6 +57,7 @@ void qemu_system_reset_request(ShutdownCause reason);
void qemu_system_suspend_request(void);
void qemu_register_suspend_notifier(Notifier *notifier);
bool qemu_wakeup_suspend_enabled(void);
+void qemu_system_start_on_wakeup_request(void);
void qemu_system_wakeup_request(WakeupReason reason, Error **errp);
void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
void qemu_register_wakeup_notifier(Notifier *notifier);
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index 1957caf..e127b21 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -343,6 +343,7 @@ void vm_state_notify(bool running, RunState state)
}
}
+static bool start_on_wakeup_requested;
static ShutdownCause reset_requested;
static ShutdownCause shutdown_requested;
static int shutdown_signal;
@@ -568,6 +569,11 @@ void qemu_register_suspend_notifier(Notifier *notifier)
notifier_list_add(&suspend_notifiers, notifier);
}
+void qemu_system_start_on_wakeup_request(void)
+{
+ start_on_wakeup_requested = true;
+}
+
void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
{
trace_system_wakeup_request(reason);
@@ -580,7 +586,14 @@ void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
if (!(wakeup_reason_mask & (1 << reason))) {
return;
}
- runstate_set(RUN_STATE_RUNNING);
+
+ if (start_on_wakeup_requested) {
+ start_on_wakeup_requested = false;
+ vm_start();
+ } else {
+ runstate_set(RUN_STATE_RUNNING);
+ }
+
wakeup_reason = reason;
qemu_notify_event();
}
--
1.8.3.1
* [PATCH V1 2/3] migration: fix suspended runstate
From: Steve Sistare @ 2023-06-15 20:26 UTC
To: qemu-devel
Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé, Steve Sistare
Migration of a guest in the suspended state is broken. The incoming
migration code automatically tries to wake the guest, which IMO is
wrong -- the guest should end migration in the same state it started.
Further, the wakeup is done by calling qemu_system_wakeup_request(), which
bypasses vm_start(). The guest appears to be in the running state, but
it is not.
To fix, leave the guest in the suspended state, but call
qemu_system_start_on_wakeup_request() so the guest is properly resumed
later, when the client sends a system_wakeup command.
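With this change the destination stays suspended until the management client
wakes it; a sketch of the expected QMP flow on the destination (fields
abridged):

    { "execute": "query-status" }
    { "return": { "status": "suspended", "running": false } }

    # now reaches vm_start() via the new hook
    { "execute": "system_wakeup" }
    { "return": {} }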
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
migration/migration.c | 11 ++++-------
softmmu/runstate.c | 1 +
2 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 17b4b47..851fe6d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
vm_start();
} else {
runstate_set(global_state_get_runstate());
+ if (runstate_check(RUN_STATE_SUSPENDED)) {
+ /* Force vm_start to be called later. */
+ qemu_system_start_on_wakeup_request();
+ }
}
/*
* This must happen after any state changes since as soon as an external
@@ -2101,7 +2105,6 @@ static int postcopy_start(MigrationState *ms)
qemu_mutex_lock_iothread();
trace_postcopy_start_set_run();
- qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
global_state_store();
ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
if (ret < 0) {
@@ -2307,7 +2310,6 @@ static void migration_completion(MigrationState *s)
if (s->state == MIGRATION_STATUS_ACTIVE) {
qemu_mutex_lock_iothread();
s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
- qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
s->vm_old_state = runstate_get();
global_state_store();
@@ -3102,11 +3104,6 @@ static void *bg_migration_thread(void *opaque)
qemu_mutex_lock_iothread();
- /*
- * If VM is currently in suspended state, then, to make a valid runstate
- * transition in vm_stop_force_state() we need to wakeup it up.
- */
- qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
s->vm_old_state = runstate_get();
global_state_store();
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index e127b21..771896c 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -159,6 +159,7 @@ static const RunStateTransition runstate_transitions_def[] = {
{ RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
{ RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
{ RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
+ { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED },
{ RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
{ RUN_STATE_SUSPENDED, RUN_STATE_COLO},
--
1.8.3.1
* [PATCH V1 3/3] tests/qtest: live migration suspended state
From: Steve Sistare @ 2023-06-15 20:26 UTC
To: qemu-devel
Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé, Steve Sistare
Add a test case to verify that the suspended state is handled correctly in
live migration. The test suspends the src, migrates, then wakes the dest.
Add an option to suspend the src in a-b-bootblock.S, which puts the guest
in S3 state after one round of writing to memory. The option is enabled by
poking a 1 into the suspend_me word in the boot block prior to starting the
src vm. Generate symbol offsets in a-b-bootblock.h so that the suspend_me
offset is known.
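For example, the generated header gains defines like these, and the test
flips the option by patching the boot sector image in place:

    #define SYM_start      0x00007c00
    #define SYM_suspend_me 0x00007caa

    /* in test_migrate_start() */
    x86_bootsect[SYM_suspend_me - SYM_start] = args->suspend;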
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
tests/migration/i386/Makefile | 5 ++--
tests/migration/i386/a-b-bootblock.S | 49 +++++++++++++++++++++++++++++++++---
tests/migration/i386/a-b-bootblock.h | 22 ++++++++++------
tests/qtest/migration-helpers.c | 2 +-
tests/qtest/migration-test.c | 31 +++++++++++++++++++++--
5 files changed, 92 insertions(+), 17 deletions(-)
diff --git a/tests/migration/i386/Makefile b/tests/migration/i386/Makefile
index 5c03241..37a72ae 100644
--- a/tests/migration/i386/Makefile
+++ b/tests/migration/i386/Makefile
@@ -4,9 +4,10 @@
.PHONY: all clean
all: a-b-bootblock.h
-a-b-bootblock.h: x86.bootsect
+a-b-bootblock.h: x86.bootsect x86.o
echo "$$__note" > header.tmp
xxd -i $< | sed -e 's/.*int.*//' >> header.tmp
+ nm x86.o | awk '{print "#define SYM_"$$3" 0x"$$1}' >> header.tmp
mv header.tmp $@
x86.bootsect: x86.boot
@@ -16,7 +17,7 @@ x86.boot: x86.o
$(CROSS_PREFIX)objcopy -O binary $< $@
x86.o: a-b-bootblock.S
- $(CROSS_PREFIX)gcc -m32 -march=i486 -c $< -o $@
+ $(CROSS_PREFIX)gcc -I.. -m32 -march=i486 -c $< -o $@
clean:
@rm -rf *.boot *.o *.bootsect
diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S
index 3d464c7..63d446f 100644
--- a/tests/migration/i386/a-b-bootblock.S
+++ b/tests/migration/i386/a-b-bootblock.S
@@ -9,6 +9,21 @@
#
# Author: dgilbert@redhat.com
+#include "migration-test.h"
+
+#define ACPI_ENABLE 0xf1
+#define ACPI_PORT_SMI_CMD 0xb2
+#define ACPI_PM_BASE 0x600
+#define PM1A_CNT_OFFSET 4
+
+#define ACPI_SCI_ENABLE 0x0001
+#define ACPI_SLEEP_TYPE 0x0400
+#define ACPI_SLEEP_ENABLE 0x2000
+#define SLEEP (ACPI_SCI_ENABLE + ACPI_SLEEP_TYPE + ACPI_SLEEP_ENABLE)
+
+#define LOW_ADDR X86_TEST_MEM_START
+#define HIGH_ADDR X86_TEST_MEM_END
+#define suspended HIGH_ADDR
.code16
.org 0x7c00
@@ -41,12 +56,11 @@ start: # at 0x7c00 ?
# bl keeps a counter so we limit the output speed
mov $0, %bl
mainloop:
- # Start from 1MB
- mov $(1024*1024),%eax
+ mov $LOW_ADDR,%eax
innerloop:
incb (%eax)
add $4096,%eax
- cmp $(100*1024*1024),%eax
+ cmp $HIGH_ADDR,%eax
jl innerloop
inc %bl
@@ -57,7 +71,30 @@ innerloop:
mov $0x3f8,%dx
outb %al,%dx
- jmp mainloop
+ # should this test suspend?
+ mov (suspend_me),%eax
+ cmp $0,%eax
+ je mainloop
+
+ # are we waking after suspend? do not suspend again.
+ mov $suspended,%eax
+ mov (%eax),%eax
+ cmp $1,%eax
+ je mainloop
+
+ # enable acpi
+ mov $ACPI_ENABLE,%al
+ outb %al,$ACPI_PORT_SMI_CMD
+
+ # suspend to ram
+ mov $suspended,%eax
+ movl $1,(%eax)
+ mov $SLEEP,%ax
+ mov $(ACPI_PM_BASE + PM1A_CNT_OFFSET),%dx
+ outw %ax,%dx
+ # not reached. The wakeup causes reset and restart at 0x7c00, and we
+ # do not save and restore registers as a real kernel would do.
+
# GDT magic from old (GPLv2) Grub startup.S
.p2align 2 /* force 4-byte alignment */
@@ -83,6 +120,10 @@ gdtdesc:
.word 0x27 /* limit */
.long gdt /* addr */
+ /* test launcher can poke a 1 here to exercise suspend */
+suspend_me:
+ .int 0
+
/* I'm a bootable disk */
.org 0x7dfe
.byte 0x55
diff --git a/tests/migration/i386/a-b-bootblock.h b/tests/migration/i386/a-b-bootblock.h
index b7b0fce..b39773f 100644
--- a/tests/migration/i386/a-b-bootblock.h
+++ b/tests/migration/i386/a-b-bootblock.h
@@ -4,20 +4,20 @@
* the header and the assembler differences in your patch submission.
*/
unsigned char x86_bootsect[] = {
- 0xfa, 0x0f, 0x01, 0x16, 0x78, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
+ 0xfa, 0x0f, 0x01, 0x16, 0xa4, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02,
0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41,
0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xb3, 0x00, 0xb8, 0x00, 0x00, 0x10,
0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40,
0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8,
- 0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00,
- 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00,
- 0x00, 0x9a, 0xcf, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00,
- 0x27, 0x00, 0x60, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
- 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
- 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
- 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xa1, 0xaa, 0x7c, 0x00, 0x00,
+ 0x83, 0xf8, 0x00, 0x74, 0xd3, 0xb8, 0x00, 0x00, 0x40, 0x06, 0x8b, 0x00,
+ 0x83, 0xf8, 0x01, 0x74, 0xc7, 0xb0, 0xf1, 0xe6, 0xb2, 0xb8, 0x00, 0x00,
+ 0x40, 0x06, 0xc7, 0x00, 0x01, 0x00, 0x00, 0x00, 0x66, 0xb8, 0x01, 0x24,
+ 0x66, 0xba, 0x04, 0x06, 0x66, 0xef, 0x66, 0x90, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00,
+ 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0x8c, 0x7c,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
@@ -49,3 +49,9 @@ unsigned char x86_bootsect[] = {
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x55, 0xaa
};
+#define SYM_gdt 0x00007c8c
+#define SYM_gdtdesc 0x00007ca4
+#define SYM_innerloop 0x00007c3d
+#define SYM_mainloop 0x00007c38
+#define SYM_start 0x00007c00
+#define SYM_suspend_me 0x00007caa
diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index be00c52..433d678 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -28,7 +28,7 @@ bool migrate_watch_for_stop(QTestState *who, const char *name,
{
bool *seen = opaque;
- if (g_str_equal(name, "STOP")) {
+ if (g_str_equal(name, "STOP") || g_str_equal(name, "SUSPEND")) {
*seen = true;
return true;
}
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b0c355b..73b07b3 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -121,7 +121,7 @@ static void init_bootfile(const char *bootpath, void *content, size_t len)
/*
* Wait for some output in the serial output file,
* we get an 'A' followed by an endless string of 'B's
- * but on the destination we won't have the A.
+ * but on the destination we won't have the A (unless we enabled suspend/resume)
*/
static void wait_for_serial(const char *side)
{
@@ -507,6 +507,8 @@ typedef struct {
bool use_dirty_ring;
const char *opts_source;
const char *opts_target;
+ /* suspend the src before migrating to dest. */
+ bool suspend;
} MigrateStart;
/*
@@ -617,6 +619,8 @@ static int test_migrate_start(QTestState **from, QTestState **to,
}
}
+ x86_bootsect[SYM_suspend_me - SYM_start] = args->suspend;
+
got_src_stop = false;
got_dst_resume = false;
bootpath = g_strdup_printf("%s/bootsect", tmpfs);
@@ -1475,7 +1479,13 @@ static void test_precopy_common(MigrateCommon *args)
*/
wait_for_migration_complete(to);
- qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
+ if (!args->start.suspend) {
+ qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
+ }
+ }
+
+ if (args->start.suspend) {
+ qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}");
}
if (!got_dst_resume) {
@@ -1508,6 +1518,18 @@ static void test_precopy_unix_plain(void)
test_precopy_common(&args);
}
+static void test_precopy_unix_suspend(void)
+{
+ g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+ MigrateCommon args = {
+ .listen_uri = uri,
+ .connect_uri = uri,
+ .live = true,
+ .start.suspend = true,
+ };
+
+ test_precopy_common(&args);
+}
static void test_precopy_unix_dirty_ring(void)
{
@@ -2682,6 +2704,11 @@ int main(int argc, char **argv)
module_call_init(MODULE_INIT_QOM);
+ if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
+ qtest_add_func("/migration/precopy/unix/suspend",
+ test_precopy_unix_suspend);
+ }
+
if (has_uffd) {
qtest_add_func("/migration/postcopy/plain", test_postcopy);
qtest_add_func("/migration/postcopy/recovery/plain",
--
1.8.3.1
* Re: [PATCH V1 2/3] migration: fix suspended runstate
From: Peter Xu @ 2023-06-20 21:46 UTC
To: Steve Sistare
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
> Migration of a guest in the suspended state is broken. The incoming
> migration code automatically tries to wake the guest, which IMO is
> wrong -- the guest should end migration in the same state it started.
> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
> bypasses vm_start(). The guest appears to be in the running state, but
> it is not.
>
> To fix, leave the guest in the suspended state, but call
> qemu_system_start_on_wakeup_request() so the guest is properly resumed
> later, when the client sends a system_wakeup command.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> migration/migration.c | 11 ++++-------
> softmmu/runstate.c | 1 +
> 2 files changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 17b4b47..851fe6d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
> vm_start();
> } else {
> runstate_set(global_state_get_runstate());
> + if (runstate_check(RUN_STATE_SUSPENDED)) {
> + /* Force vm_start to be called later. */
> + qemu_system_start_on_wakeup_request();
> + }
Is this really needed, along with patch 1?
I have a very limited knowledge on suspension, so I'm prone to making
mistakes..
But from what I read this, qemu_system_wakeup_request() (existing one, not
after patch 1 applied) will setup wakeup_reason and kick the main thread
using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in
the main thread later on after qemu_wakeup_requested() returns true.
> }
> /*
> * This must happen after any state changes since as soon as an external
> @@ -2101,7 +2105,6 @@ static int postcopy_start(MigrationState *ms)
> qemu_mutex_lock_iothread();
> trace_postcopy_start_set_run();
>
> - qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
> global_state_store();
> ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> if (ret < 0) {
> @@ -2307,7 +2310,6 @@ static void migration_completion(MigrationState *s)
> if (s->state == MIGRATION_STATUS_ACTIVE) {
> qemu_mutex_lock_iothread();
> s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> - qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>
> s->vm_old_state = runstate_get();
> global_state_store();
> @@ -3102,11 +3104,6 @@ static void *bg_migration_thread(void *opaque)
>
> qemu_mutex_lock_iothread();
>
> - /*
> - * If VM is currently in suspended state, then, to make a valid runstate
> - * transition in vm_stop_force_state() we need to wakeup it up.
> - */
> - qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
Removal of these three places seems reasonable to me, or we won't persist
the SUSPEND state.
The above comment was the major reason I had thought it was needed
(again, based on zero knowledge around this..), but perhaps it was just
wrong? I would assume vm_stop_force_state() will still just work with
suspended, am I right?
> s->vm_old_state = runstate_get();
>
> global_state_store();
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> index e127b21..771896c 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -159,6 +159,7 @@ static const RunStateTransition runstate_transitions_def[] = {
> { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
> { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
> { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
> + { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED },
> { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
> { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
>
> --
> 1.8.3.1
>
--
Peter Xu
* Re: [PATCH V1 3/3] tests/qtest: live migration suspended state
From: Peter Xu @ 2023-06-21 16:45 UTC
To: Steve Sistare
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On Thu, Jun 15, 2023 at 01:26:40PM -0700, Steve Sistare wrote:
> Add a test case to verify that the suspended state is handled correctly in
> live migration. The test suspends the src, migrates, then wakes the dest.
>
> Add an option to suspend the src in a-b-bootblock.S, which puts the guest
> in S3 state after one round of writing to memory. The option is enabled by
> poking a 1 into the suspend_me word in the boot block prior to starting the
> src vm. Generate symbol offsets in a-b-bootblock.h so that the suspend_me
> offset is known.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Thanks for the test case, mostly good to me, a few trivial comments /
questions below.
> ---
> tests/migration/i386/Makefile | 5 ++--
> tests/migration/i386/a-b-bootblock.S | 49 +++++++++++++++++++++++++++++++++---
> tests/migration/i386/a-b-bootblock.h | 22 ++++++++++------
> tests/qtest/migration-helpers.c | 2 +-
> tests/qtest/migration-test.c | 31 +++++++++++++++++++++--
> 5 files changed, 92 insertions(+), 17 deletions(-)
>
> diff --git a/tests/migration/i386/Makefile b/tests/migration/i386/Makefile
> index 5c03241..37a72ae 100644
> --- a/tests/migration/i386/Makefile
> +++ b/tests/migration/i386/Makefile
> @@ -4,9 +4,10 @@
> .PHONY: all clean
> all: a-b-bootblock.h
>
> -a-b-bootblock.h: x86.bootsect
> +a-b-bootblock.h: x86.bootsect x86.o
> echo "$$__note" > header.tmp
> xxd -i $< | sed -e 's/.*int.*//' >> header.tmp
> + nm x86.o | awk '{print "#define SYM_"$$3" 0x"$$1}' >> header.tmp
> mv header.tmp $@
>
> x86.bootsect: x86.boot
> @@ -16,7 +17,7 @@ x86.boot: x86.o
> $(CROSS_PREFIX)objcopy -O binary $< $@
>
> x86.o: a-b-bootblock.S
> - $(CROSS_PREFIX)gcc -m32 -march=i486 -c $< -o $@
> + $(CROSS_PREFIX)gcc -I.. -m32 -march=i486 -c $< -o $@
>
> clean:
> @rm -rf *.boot *.o *.bootsect
> diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S
> index 3d464c7..63d446f 100644
> --- a/tests/migration/i386/a-b-bootblock.S
> +++ b/tests/migration/i386/a-b-bootblock.S
> @@ -9,6 +9,21 @@
> #
> # Author: dgilbert@redhat.com
>
> +#include "migration-test.h"
> +
> +#define ACPI_ENABLE 0xf1
> +#define ACPI_PORT_SMI_CMD 0xb2
> +#define ACPI_PM_BASE 0x600
> +#define PM1A_CNT_OFFSET 4
> +
> +#define ACPI_SCI_ENABLE 0x0001
> +#define ACPI_SLEEP_TYPE 0x0400
> +#define ACPI_SLEEP_ENABLE 0x2000
> +#define SLEEP (ACPI_SCI_ENABLE + ACPI_SLEEP_TYPE + ACPI_SLEEP_ENABLE)
> +
> +#define LOW_ADDR X86_TEST_MEM_START
> +#define HIGH_ADDR X86_TEST_MEM_END
> +#define suspended HIGH_ADDR
>
> .code16
> .org 0x7c00
> @@ -41,12 +56,11 @@ start: # at 0x7c00 ?
> # bl keeps a counter so we limit the output speed
> mov $0, %bl
> mainloop:
> - # Start from 1MB
> - mov $(1024*1024),%eax
> + mov $LOW_ADDR,%eax
> innerloop:
> incb (%eax)
> add $4096,%eax
> - cmp $(100*1024*1024),%eax
> + cmp $HIGH_ADDR,%eax
> jl innerloop
>
> inc %bl
> @@ -57,7 +71,30 @@ innerloop:
> mov $0x3f8,%dx
> outb %al,%dx
>
> - jmp mainloop
> + # should this test suspend?
> + mov (suspend_me),%eax
> + cmp $0,%eax
> + je mainloop
> +
> + # are we waking after suspend? do not suspend again.
> + mov $suspended,%eax
So IIUC then it'll use 4 bytes over 100MB range which means we need at
least 100MB+4bytes.. not obvious for a HIGH_ADDR definition to me..
Could we just define a variable inside the section like suspend_me?
> + mov (%eax),%eax
> + cmp $1,%eax
> + je mainloop
> +
> + # enable acpi
> + mov $ACPI_ENABLE,%al
> + outb %al,$ACPI_PORT_SMI_CMD
> +
> + # suspend to ram
> + mov $suspended,%eax
> + movl $1,(%eax)
> + mov $SLEEP,%ax
> + mov $(ACPI_PM_BASE + PM1A_CNT_OFFSET),%dx
> + outw %ax,%dx
> + # not reached. The wakeup causes reset and restart at 0x7c00, and we
> + # do not save and restore registers as a real kernel would do.
> +
>
> # GDT magic from old (GPLv2) Grub startup.S
> .p2align 2 /* force 4-byte alignment */
> @@ -83,6 +120,10 @@ gdtdesc:
> .word 0x27 /* limit */
> .long gdt /* addr */
>
> + /* test launcher can poke a 1 here to exercise suspend */
> +suspend_me:
> + .int 0
> +
> /* I'm a bootable disk */
> .org 0x7dfe
> .byte 0x55
> diff --git a/tests/migration/i386/a-b-bootblock.h b/tests/migration/i386/a-b-bootblock.h
> index b7b0fce..b39773f 100644
> --- a/tests/migration/i386/a-b-bootblock.h
> +++ b/tests/migration/i386/a-b-bootblock.h
> @@ -4,20 +4,20 @@
> * the header and the assembler differences in your patch submission.
> */
> unsigned char x86_bootsect[] = {
> - 0xfa, 0x0f, 0x01, 0x16, 0x78, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
> + 0xfa, 0x0f, 0x01, 0x16, 0xa4, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
> 0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00,
> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02,
> 0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41,
> 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xb3, 0x00, 0xb8, 0x00, 0x00, 0x10,
> 0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40,
> 0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8,
> - 0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00,
> - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00,
> - 0x00, 0x9a, 0xcf, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00,
> - 0x27, 0x00, 0x60, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
> - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
> - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
> - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
> + 0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xa1, 0xaa, 0x7c, 0x00, 0x00,
> + 0x83, 0xf8, 0x00, 0x74, 0xd3, 0xb8, 0x00, 0x00, 0x40, 0x06, 0x8b, 0x00,
> + 0x83, 0xf8, 0x01, 0x74, 0xc7, 0xb0, 0xf1, 0xe6, 0xb2, 0xb8, 0x00, 0x00,
> + 0x40, 0x06, 0xc7, 0x00, 0x01, 0x00, 0x00, 0x00, 0x66, 0xb8, 0x01, 0x24,
> + 0x66, 0xba, 0x04, 0x06, 0x66, 0xef, 0x66, 0x90, 0x00, 0x00, 0x00, 0x00,
> + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00,
> + 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0x8c, 0x7c,
> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
> @@ -49,3 +49,9 @@ unsigned char x86_bootsect[] = {
> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x55, 0xaa
> };
>
> +#define SYM_gdt 0x00007c8c
> +#define SYM_gdtdesc 0x00007ca4
> +#define SYM_innerloop 0x00007c3d
> +#define SYM_mainloop 0x00007c38
> +#define SYM_start 0x00007c00
> +#define SYM_suspend_me 0x00007caa
> diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> index be00c52..433d678 100644
> --- a/tests/qtest/migration-helpers.c
> +++ b/tests/qtest/migration-helpers.c
> @@ -28,7 +28,7 @@ bool migrate_watch_for_stop(QTestState *who, const char *name,
> {
> bool *seen = opaque;
>
> - if (g_str_equal(name, "STOP")) {
> + if (g_str_equal(name, "STOP") || g_str_equal(name, "SUSPEND")) {
> *seen = true;
This is also a bit hackish.. "*seen" points to got_src_stop, so when
SUSPEND it'll set got_src_stop. It's not clear what stage we're in even if
that's set, irrelevant of the confusing naming after being able to SUSPEND.
Shall we just add one got_src_suspended here and explicitly check that when
suspend=true in test?
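For instance, a minimal sketch of that suggestion (names illustrative, not
from the patch):

    static bool got_src_suspended;

    static bool migrate_watch_for_suspend(QTestState *who, const char *name,
                                          QDict *event, void *opaque)
    {
        bool *seen = opaque;    /* points at got_src_suspended */

        if (g_str_equal(name, "SUSPEND")) {
            *seen = true;
            return true;
        }
        return false;
    }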
> return true;
> }
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index b0c355b..73b07b3 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -121,7 +121,7 @@ static void init_bootfile(const char *bootpath, void *content, size_t len)
> /*
> * Wait for some output in the serial output file,
> * we get an 'A' followed by an endless string of 'B's
> - * but on the destination we won't have the A.
> + * but on the destination we won't have the A (unless we enabled suspend/resume)
> */
> static void wait_for_serial(const char *side)
> {
> @@ -507,6 +507,8 @@ typedef struct {
> bool use_dirty_ring;
> const char *opts_source;
> const char *opts_target;
> + /* suspend the src before migrating to dest. */
> + bool suspend;
> } MigrateStart;
>
> /*
> @@ -617,6 +619,8 @@ static int test_migrate_start(QTestState **from, QTestState **to,
> }
> }
>
> + x86_bootsect[SYM_suspend_me - SYM_start] = args->suspend;
> +
> got_src_stop = false;
> got_dst_resume = false;
> bootpath = g_strdup_printf("%s/bootsect", tmpfs);
> @@ -1475,7 +1479,13 @@ static void test_precopy_common(MigrateCommon *args)
> */
> wait_for_migration_complete(to);
>
> - qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
> + if (!args->start.suspend) {
> + qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
> + }
This is the live==false path; if this test needs live=true then afaict this
path won't ever trigger.
Even if it triggers, "qmp_cont" skips for SUSPEND already, so I assume
we're fine.
> + }
> +
> + if (args->start.suspend) {
> + qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}");
Okay I did look up qmp_system_wakeup and I think it implicitly checks the
SUSPEND status, which is reasonable but not obvious. Not sure whether
that's intended.
Shall we just check it explicitly with a query-status, even if so?
If we keep relying on qmp_system_wakeup not failing, I suggest we add a
comment explaining that this explicitly checks machine state so we're sure
the SUSPEND state is migrated properly.
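E.g., an explicit check on the destination might look like this sketch:

    /* assert the migrated runstate before waking the guest */
    QDict *rsp = qtest_qmp(to, "{ 'execute': 'query-status' }");
    QDict *ret = qdict_get_qdict(rsp, "return");
    g_assert_cmpstr(qdict_get_str(ret, "status"), ==, "suspended");
    qobject_unref(rsp);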
> }
>
> if (!got_dst_resume) {
> @@ -1508,6 +1518,18 @@ static void test_precopy_unix_plain(void)
> test_precopy_common(&args);
> }
>
> +static void test_precopy_unix_suspend(void)
> +{
> + g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
> + MigrateCommon args = {
> + .listen_uri = uri,
> + .connect_uri = uri,
> + .live = true,
We need explicit comment for all .live=true cases to state why. Please
refer to:
/*
* Optional: whether the guest CPUs should be running during a precopy
* migration test. We used to always run with live but it took much
* longer so we reduced live tests to only the ones that have solid
* reason to be tested live-only. For each of the new test cases for
* precopy please provide justifications to use live explicitly (please
* refer to existing ones with live=true), or use live=off by default.
*/
bool live;
Thanks,
> + .start.suspend = true,
> + };
> +
> + test_precopy_common(&args);
> +}
>
> static void test_precopy_unix_dirty_ring(void)
> {
> @@ -2682,6 +2704,11 @@ int main(int argc, char **argv)
>
> module_call_init(MODULE_INIT_QOM);
>
> + if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
> + qtest_add_func("/migration/precopy/unix/suspend",
> + test_precopy_unix_suspend);
> + }
> +
> if (has_uffd) {
> qtest_add_func("/migration/postcopy/plain", test_postcopy);
> qtest_add_func("/migration/postcopy/recovery/plain",
> --
> 1.8.3.1
>
--
Peter Xu
* Re: [PATCH V1 2/3] migration: fix suspended runstate
From: Steven Sistare @ 2023-06-21 19:15 UTC
To: Peter Xu
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On 6/20/2023 5:46 PM, Peter Xu wrote:
> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
>> Migration of a guest in the suspended state is broken. The incoming
>> migration code automatically tries to wake the guest, which IMO is
>> wrong -- the guest should end migration in the same state it started.
>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
>> bypasses vm_start(). The guest appears to be in the running state, but
>> it is not.
>>
>> To fix, leave the guest in the suspended state, but call
>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
>> later, when the client sends a system_wakeup command.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> migration/migration.c | 11 ++++-------
>> softmmu/runstate.c | 1 +
>> 2 files changed, 5 insertions(+), 7 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 17b4b47..851fe6d 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
>> vm_start();
>> } else {
>> runstate_set(global_state_get_runstate());
>> + if (runstate_check(RUN_STATE_SUSPENDED)) {
>> + /* Force vm_start to be called later. */
>> + qemu_system_start_on_wakeup_request();
>> + }
>
> Is this really needed, along with patch 1?
>
> I have a very limited knowledge on suspension, so I'm prone to making
> mistakes..
>
> But from what I read this, qemu_system_wakeup_request() (existing one, not
> after patch 1 applied) will setup wakeup_reason and kick the main thread
> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in
> the main thread later on after qemu_wakeup_requested() returns true.
Correct, here:
if (qemu_wakeup_requested()) {
pause_all_vcpus();
qemu_system_wakeup();
notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
wakeup_reason = QEMU_WAKEUP_REASON_NONE;
resume_all_vcpus();
qapi_event_send_wakeup();
}
However, that is not sufficient, because vm_start() was never called on the incoming
side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things.
Without my fixes, it "works" because the outgoing migration automatically wakes a suspended
guest, which sets the state to running, which is saved in global state:
void migration_completion(MigrationState *s)
qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
global_state_store()
Then the incoming migration calls vm_start here:
migration/migration.c
if (!global_state_received() ||
global_state_get_runstate() == RUN_STATE_RUNNING) {
if (autostart) {
vm_start();
vm_start must be called for correctness.
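For context, vm_start() does roughly the following (simplified; argument
handling and error paths elided):

    int vm_prepare_start(void)
    {
        /* ... bail out if already running or a vmstop is pending ... */
        qapi_event_send_resume();
        cpu_enable_ticks();                     /* restart the guest clock */
        runstate_set(RUN_STATE_RUNNING);
        vm_state_notify(1, RUN_STATE_RUNNING);  /* vm change state handlers */
        return 0;
    }

    void vm_start(void)
    {
        if (!vm_prepare_start()) {
            resume_all_vcpus();
        }
    }

None of that happens when qemu_system_wakeup_request() alone flips the
runstate.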
- Steve
>> }
>> /*
>> * This must happen after any state changes since as soon as an external
>> @@ -2101,7 +2105,6 @@ static int postcopy_start(MigrationState *ms)
>> qemu_mutex_lock_iothread();
>> trace_postcopy_start_set_run();
>>
>> - qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>> global_state_store();
>> ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>> if (ret < 0) {
>> @@ -2307,7 +2310,6 @@ static void migration_completion(MigrationState *s)
>> if (s->state == MIGRATION_STATUS_ACTIVE) {
>> qemu_mutex_lock_iothread();
>> s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> - qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>>
>> s->vm_old_state = runstate_get();
>> global_state_store();
>> @@ -3102,11 +3104,6 @@ static void *bg_migration_thread(void *opaque)
>>
>> qemu_mutex_lock_iothread();
>>
>> - /*
>> - * If VM is currently in suspended state, then, to make a valid runstate
>> - * transition in vm_stop_force_state() we need to wakeup it up.
>> - */
>> - qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>
> Removal of these three places seems reasonable to me, or we won't persist
> the SUSPEND state.
>
> The above comment was the major reason I had thought it was needed
> (again, based on zero knowledge around this..), but perhaps it was just
> wrong? I would assume vm_stop_force_state() will still just work with
> suspended, am I right?
>
>> s->vm_old_state = runstate_get();
>>
>> global_state_store();
>> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
>> index e127b21..771896c 100644
>> --- a/softmmu/runstate.c
>> +++ b/softmmu/runstate.c
>> @@ -159,6 +159,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>> { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
>> { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
>> { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
>> + { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED },
>> { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
>> { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
>>
>> --
>> 1.8.3.1
>>
>
* Re: [PATCH V1 3/3] tests/qtest: live migration suspended state
From: Steven Sistare @ 2023-06-21 19:39 UTC
To: Peter Xu
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On 6/21/2023 12:45 PM, Peter Xu wrote:
> On Thu, Jun 15, 2023 at 01:26:40PM -0700, Steve Sistare wrote:
>> Add a test case to verify that the suspended state is handled correctly in
>> live migration. The test suspends the src, migrates, then wakes the dest.
>>
>> Add an option to suspend the src in a-b-bootblock.S, which puts the guest
>> in S3 state after one round of writing to memory. The option is enabled by
>> poking a 1 into the suspend_me word in the boot block prior to starting the
>> src vm. Generate symbol offsets in a-b-bootblock.h so that the suspend_me
>> offset is known.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>
> Thanks for the test case, mostly good to me, a few trivial comments /
> questions below.
>
>> ---
>> tests/migration/i386/Makefile | 5 ++--
>> tests/migration/i386/a-b-bootblock.S | 49 +++++++++++++++++++++++++++++++++---
>> tests/migration/i386/a-b-bootblock.h | 22 ++++++++++------
>> tests/qtest/migration-helpers.c | 2 +-
>> tests/qtest/migration-test.c | 31 +++++++++++++++++++++--
>> 5 files changed, 92 insertions(+), 17 deletions(-)
>>
>> diff --git a/tests/migration/i386/Makefile b/tests/migration/i386/Makefile
>> index 5c03241..37a72ae 100644
>> --- a/tests/migration/i386/Makefile
>> +++ b/tests/migration/i386/Makefile
>> @@ -4,9 +4,10 @@
>> .PHONY: all clean
>> all: a-b-bootblock.h
>>
>> -a-b-bootblock.h: x86.bootsect
>> +a-b-bootblock.h: x86.bootsect x86.o
>> echo "$$__note" > header.tmp
>> xxd -i $< | sed -e 's/.*int.*//' >> header.tmp
>> + nm x86.o | awk '{print "#define SYM_"$$3" 0x"$$1}' >> header.tmp
>> mv header.tmp $@
>>
>> x86.bootsect: x86.boot
>> @@ -16,7 +17,7 @@ x86.boot: x86.o
>> $(CROSS_PREFIX)objcopy -O binary $< $@
>>
>> x86.o: a-b-bootblock.S
>> - $(CROSS_PREFIX)gcc -m32 -march=i486 -c $< -o $@
>> + $(CROSS_PREFIX)gcc -I.. -m32 -march=i486 -c $< -o $@
>>
>> clean:
>> @rm -rf *.boot *.o *.bootsect
>> diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S
>> index 3d464c7..63d446f 100644
>> --- a/tests/migration/i386/a-b-bootblock.S
>> +++ b/tests/migration/i386/a-b-bootblock.S
>> @@ -9,6 +9,21 @@
>> #
>> # Author: dgilbert@redhat.com
>>
>> +#include "migration-test.h"
>> +
>> +#define ACPI_ENABLE 0xf1
>> +#define ACPI_PORT_SMI_CMD 0xb2
>> +#define ACPI_PM_BASE 0x600
>> +#define PM1A_CNT_OFFSET 4
>> +
>> +#define ACPI_SCI_ENABLE 0x0001
>> +#define ACPI_SLEEP_TYPE 0x0400
>> +#define ACPI_SLEEP_ENABLE 0x2000
>> +#define SLEEP (ACPI_SCI_ENABLE + ACPI_SLEEP_TYPE + ACPI_SLEEP_ENABLE)
>> +
>> +#define LOW_ADDR X86_TEST_MEM_START
>> +#define HIGH_ADDR X86_TEST_MEM_END
>> +#define suspended HIGH_ADDR
>>
>> .code16
>> .org 0x7c00
>> @@ -41,12 +56,11 @@ start: # at 0x7c00 ?
>> # bl keeps a counter so we limit the output speed
>> mov $0, %bl
>> mainloop:
>> - # Start from 1MB
>> - mov $(1024*1024),%eax
>> + mov $LOW_ADDR,%eax
>> innerloop:
>> incb (%eax)
>> add $4096,%eax
>> - cmp $(100*1024*1024),%eax
>> + cmp $HIGH_ADDR,%eax
>> jl innerloop
>>
>> inc %bl
>> @@ -57,7 +71,30 @@ innerloop:
>> mov $0x3f8,%dx
>> outb %al,%dx
>>
>> - jmp mainloop
>> + # should this test suspend?
>> + mov (suspend_me),%eax
>> + cmp $0,%eax
>> + je mainloop
>> +
>> + # are we waking after suspend? do not suspend again.
>> + mov $suspended,%eax
>
> So IIUC then it'll use 4 bytes over 100MB range which means we need at
> least 100MB+4bytes.. not obvious for a HIGH_ADDR definition to me..
>
> Could we just define a variable inside the section like suspend_me?
No, because modifications to this memory backing the boot block are not
copied to the destination. The dest reads a clean copy of the boot block
from disk, as specified by the qemu command line arguments.
>> + mov (%eax),%eax
>> + cmp $1,%eax
>> + je mainloop
>> +
>> + # enable acpi
>> + mov $ACPI_ENABLE,%al
>> + outb %al,$ACPI_PORT_SMI_CMD
>> +
>> + # suspend to ram
>> + mov $suspended,%eax
>> + movl $1,(%eax)
>> + mov $SLEEP,%ax
>> + mov $(ACPI_PM_BASE + PM1A_CNT_OFFSET),%dx
>> + outw %ax,%dx
>> + # not reached. The wakeup causes reset and restart at 0x7c00, and we
>> + # do not save and restore registers as a real kernel would do.
>> +
>>
>> # GDT magic from old (GPLv2) Grub startup.S
>> .p2align 2 /* force 4-byte alignment */
>> @@ -83,6 +120,10 @@ gdtdesc:
>> .word 0x27 /* limit */
>> .long gdt /* addr */
>>
>> + /* test launcher can poke a 1 here to exercise suspend */
>> +suspend_me:
>> + .int 0
>> +
>> /* I'm a bootable disk */
>> .org 0x7dfe
>> .byte 0x55
>> diff --git a/tests/migration/i386/a-b-bootblock.h b/tests/migration/i386/a-b-bootblock.h
>> index b7b0fce..b39773f 100644
>> --- a/tests/migration/i386/a-b-bootblock.h
>> +++ b/tests/migration/i386/a-b-bootblock.h
>> @@ -4,20 +4,20 @@
>> * the header and the assembler differences in your patch submission.
>> */
>> unsigned char x86_bootsect[] = {
>> - 0xfa, 0x0f, 0x01, 0x16, 0x78, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
>> + 0xfa, 0x0f, 0x01, 0x16, 0xa4, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
>> 0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00,
>> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02,
>> 0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41,
>> 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xb3, 0x00, 0xb8, 0x00, 0x00, 0x10,
>> 0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40,
>> 0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8,
>> - 0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00,
>> - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00,
>> - 0x00, 0x9a, 0xcf, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00,
>> - 0x27, 0x00, 0x60, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
>> - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
>> - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
>> - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
>> + 0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xa1, 0xaa, 0x7c, 0x00, 0x00,
>> + 0x83, 0xf8, 0x00, 0x74, 0xd3, 0xb8, 0x00, 0x00, 0x40, 0x06, 0x8b, 0x00,
>> + 0x83, 0xf8, 0x01, 0x74, 0xc7, 0xb0, 0xf1, 0xe6, 0xb2, 0xb8, 0x00, 0x00,
>> + 0x40, 0x06, 0xc7, 0x00, 0x01, 0x00, 0x00, 0x00, 0x66, 0xb8, 0x01, 0x24,
>> + 0x66, 0xba, 0x04, 0x06, 0x66, 0xef, 0x66, 0x90, 0x00, 0x00, 0x00, 0x00,
>> + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00,
>> + 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0x8c, 0x7c,
>> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
>> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
>> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
>> @@ -49,3 +49,9 @@ unsigned char x86_bootsect[] = {
>> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x55, 0xaa
>> };
>>
>> +#define SYM_gdt 0x00007c8c
>> +#define SYM_gdtdesc 0x00007ca4
>> +#define SYM_innerloop 0x00007c3d
>> +#define SYM_mainloop 0x00007c38
>> +#define SYM_start 0x00007c00
>> +#define SYM_suspend_me 0x00007caa
>> diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
>> index be00c52..433d678 100644
>> --- a/tests/qtest/migration-helpers.c
>> +++ b/tests/qtest/migration-helpers.c
>> @@ -28,7 +28,7 @@ bool migrate_watch_for_stop(QTestState *who, const char *name,
>> {
>> bool *seen = opaque;
>>
>> - if (g_str_equal(name, "STOP")) {
>> + if (g_str_equal(name, "STOP") || g_str_equal(name, "SUSPEND")) {
>> *seen = true;
>
> This is also a bit hackish.. "*seen" points to got_src_stop, so when
> SUSPEND it'll set got_src_stop. It's not clear what stage we're in even if
> that's set, irrelevant of the confusing naming after being able to SUSPEND.
>
> Shall we just add one got_src_suspended here and explicitly check that when
> suspend=true in test?
OK. I will move got_src_stop and got_src_suspend to the QTestState, and pass the
QTestState as the opaque pointer. Ditto for got_dst_resume for consistency.
Then we can delete a few globals as a bonus.
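A rough sketch of that direction (struct and names hypothetical):

    /* hypothetical per-vm event flags, replacing the globals */
    typedef struct {
        bool got_stop;
        bool got_suspend;
        bool got_resume;
    } QTestMigrationState;

    static bool migrate_watch_for_events(QTestState *who, const char *name,
                                         QDict *event, void *opaque)
    {
        QTestMigrationState *state = opaque;

        if (g_str_equal(name, "STOP")) {
            state->got_stop = true;
        } else if (g_str_equal(name, "SUSPEND")) {
            state->got_suspend = true;
        } else if (g_str_equal(name, "RESUME")) {
            state->got_resume = true;
        }
        return false;
    }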
>> return true;
>> }
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index b0c355b..73b07b3 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -121,7 +121,7 @@ static void init_bootfile(const char *bootpath, void *content, size_t len)
>> /*
>> * Wait for some output in the serial output file,
>> * we get an 'A' followed by an endless string of 'B's
>> - * but on the destination we won't have the A.
>> + * but on the destination we won't have the A (unless we enabled suspend/resume)
>> */
>> static void wait_for_serial(const char *side)
>> {
>> @@ -507,6 +507,8 @@ typedef struct {
>> bool use_dirty_ring;
>> const char *opts_source;
>> const char *opts_target;
>> + /* suspend the src before migrating to dest. */
>> + bool suspend;
>> } MigrateStart;
>>
>> /*
>> @@ -617,6 +619,8 @@ static int test_migrate_start(QTestState **from, QTestState **to,
>> }
>> }
>>
>> + x86_bootsect[SYM_suspend_me - SYM_start] = args->suspend;
>> +
>> got_src_stop = false;
>> got_dst_resume = false;
>> bootpath = g_strdup_printf("%s/bootsect", tmpfs);
>> @@ -1475,7 +1479,13 @@ static void test_precopy_common(MigrateCommon *args)
>> */
>> wait_for_migration_complete(to);
>>
>> - qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
>> + if (!args->start.suspend) {
>> + qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
>> + }
>
> This is the live==false path; if this test needs live=true then afaict this
> path won't ever trigger.
I verified that live==false works, but did not add a test case for it.
> Even if it triggers, "qmp_cont" skips for SUSPEND already, so I assume
> we're fine.
OK, I can delete the check for that reason.
>> + }
>> +
>> + if (args->start.suspend) {
>> + qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}");
>
> Okay I did look up qmp_system_wakeup and I think it implicitly checks the
> SUSPEND status, which is reasonable but not obvious. Not sure whether
> that's intended.
>
> Shall we just check it explicitly with a query-status, even if so?
>
> If we keep relying on qmp_system_wakeup not failing, I suggest we add a
> comment explaining that this explicitly checks machine state so we're sure
> the SUSPEND state is migrated properly.
I'll add a comment. I intended it this way, because it works, simply.
>> }
>>
>> if (!got_dst_resume) {
>> @@ -1508,6 +1518,18 @@ static void test_precopy_unix_plain(void)
>> test_precopy_common(&args);
>> }
>>
>> +static void test_precopy_unix_suspend(void)
>> +{
>> + g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
>> + MigrateCommon args = {
>> + .listen_uri = uri,
>> + .connect_uri = uri,
>> + .live = true,
>
> We need explicit comment for all .live=true cases to state why. Please
> refer to:
>
> /*
> * Optional: whether the guest CPUs should be running during a precopy
> * migration test. We used to always run with live but it took much
> * longer so we reduced live tests to only the ones that have solid
> * reason to be tested live-only. For each of the new test cases for
> * precopy please provide justifications to use live explicitly (please
> * refer to existing ones with live=true), or use live=off by default.
> */
> bool live;
>
> Thanks,
OK, I'll add a comment. Live runs as fast as non-live for this new test case
because the source is not re-dirtying memory.
- Steve
>
>> + .start.suspend = true,
>> + };
>> +
>> + test_precopy_common(&args);
>> +}
>>
>> static void test_precopy_unix_dirty_ring(void)
>> {
>> @@ -2682,6 +2704,11 @@ int main(int argc, char **argv)
>>
>> module_call_init(MODULE_INIT_QOM);
>>
>> + if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
>> + qtest_add_func("/migration/precopy/unix/suspend",
>> + test_precopy_unix_suspend);
>> + }
>> +
>> if (has_uffd) {
>> qtest_add_func("/migration/postcopy/plain", test_postcopy);
>> qtest_add_func("/migration/postcopy/recovery/plain",
>> --
>> 1.8.3.1
>>
>
* Re: [PATCH V1 3/3] tests/qtest: live migration suspended state
From: Peter Xu @ 2023-06-21 20:00 UTC
To: Steven Sistare
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On Wed, Jun 21, 2023 at 03:39:44PM -0400, Steven Sistare wrote:
> >> - jmp mainloop
> >> + # should this test suspend?
> >> + mov (suspend_me),%eax
> >> + cmp $0,%eax
> >> + je mainloop
> >> +
> >> + # are we waking after suspend? do not suspend again.
> >> + mov $suspended,%eax
> >
> > So IIUC then it'll use 4 bytes over 100MB range which means we need at
> > least 100MB+4bytes.. not obvious for a HIGH_ADDR definition to me..
> >
> > Could we just define a variable inside the section like suspend_me?
>
> No, because modifications to this memory backing the boot block are not
> copied to the destination. The dest reads a clean copy of the boot block
> from disk, as specified by the qemu command line arguments.
Oh okay, can we use HIGH_ADDR-4, then? I just still think it'll be nice if
we can keep HIGH_ADDR the high bar of the whole range.
Thanks,
--
Peter Xu
* Re: [PATCH V1 2/3] migration: fix suspended runstate
From: Peter Xu @ 2023-06-21 20:28 UTC
To: Steven Sistare
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
> On 6/20/2023 5:46 PM, Peter Xu wrote:
> > On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
> >> Migration of a guest in the suspended state is broken. The incoming
> >> migration code automatically tries to wake the guest, which IMO is
> >> wrong -- the guest should end migration in the same state it started.
> >> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
> >> bypasses vm_start(). The guest appears to be in the running state, but
> >> it is not.
> >>
> >> To fix, leave the guest in the suspended state, but call
> >> qemu_system_start_on_wakeup_request() so the guest is properly resumed
> >> later, when the client sends a system_wakeup command.
> >>
> >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >> ---
> >> migration/migration.c | 11 ++++-------
> >> softmmu/runstate.c | 1 +
> >> 2 files changed, 5 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/migration/migration.c b/migration/migration.c
> >> index 17b4b47..851fe6d 100644
> >> --- a/migration/migration.c
> >> +++ b/migration/migration.c
> >> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
> >> vm_start();
> >> } else {
> >> runstate_set(global_state_get_runstate());
> >> + if (runstate_check(RUN_STATE_SUSPENDED)) {
> >> + /* Force vm_start to be called later. */
> >> + qemu_system_start_on_wakeup_request();
> >> + }
> >
> > Is this really needed, along with patch 1?
> >
> > I have a very limited knowledge on suspension, so I'm prone to making
> > mistakes..
> >
> > But from what I read this, qemu_system_wakeup_request() (existing one, not
> > after patch 1 applied) will setup wakeup_reason and kick the main thread
> > using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in
> > the main thread later on after qemu_wakeup_requested() returns true.
>
> Correct, here:
>
> if (qemu_wakeup_requested()) {
> pause_all_vcpus();
> qemu_system_wakeup();
> notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
> wakeup_reason = QEMU_WAKEUP_REASON_NONE;
> resume_all_vcpus();
> qapi_event_send_wakeup();
> }
>
> However, that is not sufficient, because vm_start() was never called on the incoming
> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things.
>
>
> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended
> guest, which sets the state to running, which is saved in global state:
>
> void migration_completion(MigrationState *s)
> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
> global_state_store()
>
> Then the incoming migration calls vm_start here:
>
> migration/migration.c
> if (!global_state_received() ||
> global_state_get_runstate() == RUN_STATE_RUNNING) {
> if (autostart) {
> vm_start();
>
> vm_start must be called for correctness.
I see. Though I had a feeling that this is still not the right way to do
it, or at least not the cleanest.
One question: would the above work for postcopy when the VM is suspended
during the switchover?
I think I see your point that vm_start() (mostly vm_prepare_start())
contains a bunch of operations that maybe we must have before starting the
VM, but then.. should we just make that vm_start() unconditional when
loading the VM completes? I just don't see anything that won't need it
(besides -S), even COLO.
So I'm wondering about something like this:
===8<===
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
dirty_bitmap_mig_before_vm_start();
- if (!global_state_received() ||
- global_state_get_runstate() == RUN_STATE_RUNNING) {
- if (autostart) {
- vm_start();
- } else {
- runstate_set(RUN_STATE_PAUSED);
- }
- } else if (migration_incoming_colo_enabled()) {
+ if (migration_incoming_colo_enabled()) {
migration_incoming_disable_colo();
+ /* COLO should always have autostart=1 or we can enforce it here */
+ }
+
+ if (autostart) {
+ RunState run_state = global_state_get_runstate();
vm_start();
+ switch (run_state) {
+ case RUN_STATE_RUNNING:
+ break;
+ case RUN_STATE_SUSPENDED:
+ qemu_system_suspend();
+ break;
+ default:
+ runstate_set(run_state);
+ break;
+ }
} else {
- runstate_set(global_state_get_runstate());
+ runstate_set(RUN_STATE_PAUSED);
}
===8<===
IIUC this can drop qemu_system_start_on_wakeup_request() along with the
other global var. Would something like this work for us?
--
Peter Xu
* Re: [PATCH V1 3/3] tests/qtest: live migration suspended state
From: Steven Sistare @ 2023-06-22 21:28 UTC
To: Peter Xu
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On 6/21/2023 4:00 PM, Peter Xu wrote:
> On Wed, Jun 21, 2023 at 03:39:44PM -0400, Steven Sistare wrote:
>>>> - jmp mainloop
>>>> + # should this test suspend?
>>>> + mov (suspend_me),%eax
>>>> + cmp $0,%eax
>>>> + je mainloop
>>>> +
>>>> + # are we waking after suspend? do not suspend again.
>>>> + mov $suspended,%eax
>>>
>>> So IIUC then it'll use 4 bytes over 100MB range which means we need at
>>> least 100MB+4bytes.. not obvious for a HIGH_ADDR definition to me..
>>>
>>> Could we just define a variable inside the section like suspend_me?
>>
>> No, because modifications to this memory backing the boot block are not
>> copied to the destination. The dest reads a clean copy of the boot block
>> from disk, as specified by the qemu command line arguments.
>
> Oh okay, can we use HIGH_ADDR-4, then? I just still think it'll be nice if
> we can keep HIGH_ADDR the high bar of the whole range.
Sure. I'll use LOW_ADDR + 4, and add a comment.
- Steve
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V1 2/3] migration: fix suspended runstate
2023-06-21 20:28 ` Peter Xu
@ 2023-06-23 18:25 ` Steven Sistare
2023-06-23 19:56 ` Steven Sistare
2023-06-26 18:27 ` Peter Xu
0 siblings, 2 replies; 22+ messages in thread
From: Steven Sistare @ 2023-06-23 18:25 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On 6/21/2023 4:28 PM, Peter Xu wrote:
> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
>> On 6/20/2023 5:46 PM, Peter Xu wrote:
>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
>>>> Migration of a guest in the suspended state is broken. The incoming
>>>> migration code automatically tries to wake the guest, which IMO is
>>>> wrong -- the guest should end migration in the same state it started.
>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
>>>> bypasses vm_start(). The guest appears to be in the running state, but
>>>> it is not.
>>>>
>>>> To fix, leave the guest in the suspended state, but call
>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
>>>> later, when the client sends a system_wakeup command.
>>>>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>> ---
>>>> migration/migration.c | 11 ++++-------
>>>> softmmu/runstate.c | 1 +
>>>> 2 files changed, 5 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index 17b4b47..851fe6d 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
>>>> vm_start();
>>>> } else {
>>>> runstate_set(global_state_get_runstate());
>>>> + if (runstate_check(RUN_STATE_SUSPENDED)) {
>>>> + /* Force vm_start to be called later. */
>>>> + qemu_system_start_on_wakeup_request();
>>>> + }
>>>
>>> Is this really needed, along with patch 1?
>>>
>>> I have a very limited knowledge on suspension, so I'm prone to making
>>> mistakes..
>>>
>>> But from what I read this, qemu_system_wakeup_request() (existing one, not
>>> after patch 1 applied) will setup wakeup_reason and kick the main thread
>>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in
>>> the main thread later on after qemu_wakeup_requested() returns true.
>>
>> Correct, here:
>>
>> if (qemu_wakeup_requested()) {
>> pause_all_vcpus();
>> qemu_system_wakeup();
>> notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
>> wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>> resume_all_vcpus();
>> qapi_event_send_wakeup();
>> }
>>
>> However, that is not sufficient, because vm_start() was never called on the incoming
>> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things.
>>
>>
>> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended
>> guest, which sets the state to running, which is saved in global state:
>>
>> void migration_completion(MigrationState *s)
>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>> global_state_store()
>>
>> Then the incoming migration calls vm_start here:
>>
>> migration/migration.c
>> if (!global_state_received() ||
>> global_state_get_runstate() == RUN_STATE_RUNNING) {
>> if (autostart) {
>> vm_start();
>>
>> vm_start must be called for correctness.
>
> I see. Though I had a feeling that this is still not the right way to do,
> at least not as clean.
>
> One question is, would above work for postcopy when VM is suspended during
> the switchover?
Good catch, that is broken.
I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh
and now it works.
    if (global_state_get_runstate() == RUN_STATE_RUNNING) {
        if (autostart) {
            vm_start();
        } else {
            runstate_set(RUN_STATE_PAUSED);
        }
    } else {
        runstate_set(global_state_get_runstate());
        if (runstate_check(RUN_STATE_SUSPENDED)) {
            qemu_system_start_on_wakeup_request();
        }
    }
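As a self-contained illustration of the idea (a toy model, not QEMU code;
every name below is a stand-in for the real function), the incoming side
records that the full start was skipped, and the first wakeup performs it:

#include <stdbool.h>
#include <stdio.h>

typedef enum { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED } RunState;

static RunState run_state = RUN_STATE_SUSPENDED; /* state from the source */
static bool start_on_wakeup;                     /* models the new global */

static void vm_start(void)
{
    /* stands in for vm_prepare_start() + resume_all_vcpus() */
    run_state = RUN_STATE_RUNNING;
    printf("vm_start: notifiers called, ticks enabled, vcpus resumed\n");
}

static void incoming_migration_done(void)
{
    if (run_state == RUN_STATE_SUSPENDED) {
        start_on_wakeup = true;  /* qemu_system_start_on_wakeup_request() */
    }
}

static void wakeup_request(void)
{
    if (start_on_wakeup) {
        start_on_wakeup = false;
        vm_start();              /* full start, not a bare vcpu resume */
    } else {
        run_state = RUN_STATE_RUNNING;
    }
}

int main(void)
{
    incoming_migration_done();   /* guest stays suspended after switchover */
    wakeup_request();            /* guest only now truly starts running */
    return 0;
}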
> I think I see your point that vm_start() (mostly vm_prepare_start())
> contains a bunch of operations that maybe we must have before starting the
> VM, but then.. should we just make that vm_start() unconditional when
> loading VM completes? I just don't see anything won't need it (besides
> -S), even COLO.
>
> So I'm wondering about something like this:
>
> ===8<===
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
>
> dirty_bitmap_mig_before_vm_start();
>
> - if (!global_state_received() ||
> - global_state_get_runstate() == RUN_STATE_RUNNING) {
> - if (autostart) {
> - vm_start();
> - } else {
> - runstate_set(RUN_STATE_PAUSED);
> - }
> - } else if (migration_incoming_colo_enabled()) {
> + if (migration_incoming_colo_enabled()) {
> migration_incoming_disable_colo();
> + /* COLO should always have autostart=1 or we can enforce it here */
> + }
> +
> + if (autostart) {
> + RunState run_state = global_state_get_runstate();
> vm_start();
This will resume the guest for a brief time, against the user's wishes. Not OK IMO.
> + switch (run_state) {
> + case RUN_STATE_RUNNING:
> + break;
> + case RUN_STATE_SUSPENDED:
> + qemu_system_suspend();
qemu_system_suspend will not cause the guest to suspend. It is qemu's response after
the guest initiates a suspend.
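For reference, the body of qemu_system_suspend() in softmmu/runstate.c is
roughly the following (paraphrased from memory, so treat it as a sketch);
it only records a suspend that the guest already initiated:

void qemu_system_suspend(void)
{
    pause_all_vcpus();                        /* guest asked to suspend */
    notifier_list_notify(&suspend_notifiers, NULL);
    runstate_set(RUN_STATE_SUSPENDED);
    qapi_event_send_suspend();                /* tell the management layer */
}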
> + break;
> + default:
> + runstate_set(run_state);
> + break;
> + }
> } else {
> - runstate_set(global_state_get_runstate());
> + runstate_set(RUN_STATE_PAUSED);
> }
> ===8<===
>
> IIUC this can drop qemu_system_start_on_wakeup_request() along with the
> other global var. Would something like it work for us?
Afraid not. start_on_wake is the only correct solution I can think of.
- Steve
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V1 2/3] migration: fix suspended runstate
2023-06-23 18:25 ` Steven Sistare
@ 2023-06-23 19:56 ` Steven Sistare
2023-06-26 18:27 ` Peter Xu
1 sibling, 0 replies; 22+ messages in thread
From: Steven Sistare @ 2023-06-23 19:56 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On 6/23/2023 2:25 PM, Steven Sistare wrote:
> On 6/21/2023 4:28 PM, Peter Xu wrote:
>> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
>>> On 6/20/2023 5:46 PM, Peter Xu wrote:
>>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
>>>>> Migration of a guest in the suspended state is broken. The incoming
>>>>> migration code automatically tries to wake the guest, which IMO is
>>>>> wrong -- the guest should end migration in the same state it started.
>>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
>>>>> bypasses vm_start(). The guest appears to be in the running state, but
>>>>> it is not.
>>>>>
>>>>> To fix, leave the guest in the suspended state, but call
>>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
>>>>> later, when the client sends a system_wakeup command.
>>>>>
>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>> ---
>>>>> migration/migration.c | 11 ++++-------
>>>>> softmmu/runstate.c | 1 +
>>>>> 2 files changed, 5 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>> index 17b4b47..851fe6d 100644
>>>>> --- a/migration/migration.c
>>>>> +++ b/migration/migration.c
>>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
>>>>> vm_start();
>>>>> } else {
>>>>> runstate_set(global_state_get_runstate());
>>>>> + if (runstate_check(RUN_STATE_SUSPENDED)) {
>>>>> + /* Force vm_start to be called later. */
>>>>> + qemu_system_start_on_wakeup_request();
>>>>> + }
>>>>
>>>> Is this really needed, along with patch 1?
>>>>
>>>> I have a very limited knowledge on suspension, so I'm prone to making
>>>> mistakes..
>>>>
>>>> But from what I read this, qemu_system_wakeup_request() (existing one, not
>>>> after patch 1 applied) will setup wakeup_reason and kick the main thread
>>>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in
>>>> the main thread later on after qemu_wakeup_requested() returns true.
>>>
>>> Correct, here:
>>>
>>> if (qemu_wakeup_requested()) {
>>> pause_all_vcpus();
>>> qemu_system_wakeup();
>>> notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
>>> wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>>> resume_all_vcpus();
>>> qapi_event_send_wakeup();
>>> }
>>>
>>> However, that is not sufficient, because vm_start() was never called on the incoming
>>> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things.
>>>
>>>
>>> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended
>>> guest, which sets the state to running, which is saved in global state:
>>>
>>> void migration_completion(MigrationState *s)
>>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>>> global_state_store()
>>>
>>> Then the incoming migration calls vm_start here:
>>>
>>> migration/migration.c
>>> if (!global_state_received() ||
>>> global_state_get_runstate() == RUN_STATE_RUNNING) {
>>> if (autostart) {
>>> vm_start();
>>>
>>> vm_start must be called for correctness.
>>
>> I see. Though I had a feeling that this is still not the right way to do,
>> at least not as clean.
>>
>> One question is, would above work for postcopy when VM is suspended during
>> the switchover?
>
> Good catch, that is broken.
> I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh
> and now it works.
>
> if (global_state_get_runstate() == RUN_STATE_RUNNING) {
> if (autostart) {
> vm_start();
> } else {
> runstate_set(RUN_STATE_PAUSED);
> }
> } else {
> runstate_set(global_state_get_runstate());
> if (runstate_check(RUN_STATE_SUSPENDED)) {
> qemu_system_start_on_wakeup_request();
> }
> }
>
>> I think I see your point that vm_start() (mostly vm_prepare_start())
>> contains a bunch of operations that maybe we must have before starting the
>> VM, but then.. should we just make that vm_start() unconditional when
>> loading VM completes? I just don't see anything won't need it (besides
>> -S), even COLO.
>>
>> So I'm wondering about something like this:
>>
>> ===8<===
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
>>
>> dirty_bitmap_mig_before_vm_start();
>>
>> - if (!global_state_received() ||
>> - global_state_get_runstate() == RUN_STATE_RUNNING) {
>> - if (autostart) {
>> - vm_start();
>> - } else {
>> - runstate_set(RUN_STATE_PAUSED);
>> - }
>> - } else if (migration_incoming_colo_enabled()) {
>> + if (migration_incoming_colo_enabled()) {
>> migration_incoming_disable_colo();
>> + /* COLO should always have autostart=1 or we can enforce it here */
>> + }
>> +
>> + if (autostart) {
>> + RunState run_state = global_state_get_runstate();
>> vm_start();
>
> This will resume the guest for a brief time, against the user's wishes. Not OK IMO.
>
>> + switch (run_state) {
>> + case RUN_STATE_RUNNING:
>> + break;
>> + case RUN_STATE_SUSPENDED:
>> + qemu_system_suspend();
>
> qemu_system_suspend will not cause the guest to suspend. It is qemu's response after
> the guest initiates a suspend.
>
>> + break;
>> + default:
>> + runstate_set(run_state);
>> + break;
>> + }
>> } else {
>> - runstate_set(global_state_get_runstate());
>> + runstate_set(RUN_STATE_PAUSED);
>> }
>> ===8<===
>>
>> IIUC this can drop qemu_system_start_on_wakeup_request() along with the
>> other global var. Would something like it work for us?
>
> Afraid not. start_on_wake is the only correct solution I can think of.
>
> - Steve
Actually, the start_on_wake concept is indeed required, but I can hide the
details in softmmu/cpus.c:
static bool vm_never_started = true;

void vm_start(void)
{
    if (!vm_prepare_start(false)) {
        vm_never_started = false;
        resume_all_vcpus();
    }
}

void vm_wakeup(void)
{
    if (vm_never_started) {
        vm_start();
    } else {
        runstate_set(RUN_STATE_RUNNING);
    }
}
... and qemu_system_wakeup_request() calls vm_wakeup().
Now the migration code simply sets the run state (eg SUSPENDED), no more
calling qemu_system_start_on_wakeup_request.
- Steve
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V1 2/3] migration: fix suspended runstate
2023-06-23 18:25 ` Steven Sistare
2023-06-23 19:56 ` Steven Sistare
@ 2023-06-26 18:27 ` Peter Xu
2023-06-26 19:11 ` Peter Xu
2023-06-30 13:50 ` Steven Sistare
1 sibling, 2 replies; 22+ messages in thread
From: Peter Xu @ 2023-06-26 18:27 UTC (permalink / raw)
To: Steven Sistare
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On Fri, Jun 23, 2023 at 02:25:05PM -0400, Steven Sistare wrote:
> On 6/21/2023 4:28 PM, Peter Xu wrote:
> > On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
> >> On 6/20/2023 5:46 PM, Peter Xu wrote:
> >>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
> >>>> Migration of a guest in the suspended state is broken. The incoming
> >>>> migration code automatically tries to wake the guest, which IMO is
> >>>> wrong -- the guest should end migration in the same state it started.
> >>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
> >>>> bypasses vm_start(). The guest appears to be in the running state, but
> >>>> it is not.
> >>>>
> >>>> To fix, leave the guest in the suspended state, but call
> >>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
> >>>> later, when the client sends a system_wakeup command.
> >>>>
> >>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >>>> ---
> >>>> migration/migration.c | 11 ++++-------
> >>>> softmmu/runstate.c | 1 +
> >>>> 2 files changed, 5 insertions(+), 7 deletions(-)
> >>>>
> >>>> diff --git a/migration/migration.c b/migration/migration.c
> >>>> index 17b4b47..851fe6d 100644
> >>>> --- a/migration/migration.c
> >>>> +++ b/migration/migration.c
> >>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
> >>>> vm_start();
> >>>> } else {
> >>>> runstate_set(global_state_get_runstate());
> >>>> + if (runstate_check(RUN_STATE_SUSPENDED)) {
> >>>> + /* Force vm_start to be called later. */
> >>>> + qemu_system_start_on_wakeup_request();
> >>>> + }
> >>>
> >>> Is this really needed, along with patch 1?
> >>>
> >>> I have a very limited knowledge on suspension, so I'm prone to making
> >>> mistakes..
> >>>
> >>> But from what I read this, qemu_system_wakeup_request() (existing one, not
> >>> after patch 1 applied) will setup wakeup_reason and kick the main thread
> >>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in
> >>> the main thread later on after qemu_wakeup_requested() returns true.
> >>
> >> Correct, here:
> >>
> >> if (qemu_wakeup_requested()) {
> >> pause_all_vcpus();
> >> qemu_system_wakeup();
> >> notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
> >> wakeup_reason = QEMU_WAKEUP_REASON_NONE;
> >> resume_all_vcpus();
> >> qapi_event_send_wakeup();
> >> }
> >>
> >> However, that is not sufficient, because vm_start() was never called on the incoming
> >> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things.
> >>
> >>
> >> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended
> >> guest, which sets the state to running, which is saved in global state:
> >>
> >> void migration_completion(MigrationState *s)
> >> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
> >> global_state_store()
> >>
> >> Then the incoming migration calls vm_start here:
> >>
> >> migration/migration.c
> >> if (!global_state_received() ||
> >> global_state_get_runstate() == RUN_STATE_RUNNING) {
> >> if (autostart) {
> >> vm_start();
> >>
> >> vm_start must be called for correctness.
> >
> > I see. Though I had a feeling that this is still not the right way to do,
> > at least not as clean.
> >
> > One question is, would above work for postcopy when VM is suspended during
> > the switchover?
>
> Good catch, that is broken.
> I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh
> and now it works.
>
> if (global_state_get_runstate() == RUN_STATE_RUNNING) {
> if (autostart) {
> vm_start();
> } else {
> runstate_set(RUN_STATE_PAUSED);
> }
> } else {
> runstate_set(global_state_get_runstate());
> if (runstate_check(RUN_STATE_SUSPENDED)) {
> qemu_system_start_on_wakeup_request();
> }
> }
>
> > I think I see your point that vm_start() (mostly vm_prepare_start())
> > contains a bunch of operations that maybe we must have before starting the
> > VM, but then.. should we just make that vm_start() unconditional when
> > loading VM completes? I just don't see anything won't need it (besides
> > -S), even COLO.
> >
> > So I'm wondering about something like this:
> >
> > ===8<===
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
> >
> > dirty_bitmap_mig_before_vm_start();
> >
> > - if (!global_state_received() ||
> > - global_state_get_runstate() == RUN_STATE_RUNNING) {
> > - if (autostart) {
> > - vm_start();
> > - } else {
> > - runstate_set(RUN_STATE_PAUSED);
> > - }
> > - } else if (migration_incoming_colo_enabled()) {
> > + if (migration_incoming_colo_enabled()) {
> > migration_incoming_disable_colo();
> > + /* COLO should always have autostart=1 or we can enforce it here */
> > + }
> > +
> > + if (autostart) {
> > + RunState run_state = global_state_get_runstate();
> > vm_start();
>
> This will resume the guest for a brief time, against the user's wishes. Not OK IMO.
Ah okay..
Can we do whatever we need in vm_prepare_start(), then? I assume these
chunks are needed:
    /*
     * WHPX accelerator needs to know whether we are going to step
     * any CPUs, before starting the first one.
     */
    if (cpus_accel->synchronize_pre_resume) {
        cpus_accel->synchronize_pre_resume(step_pending);
    }

    /* We are sending this now, but the CPUs will be resumed shortly later */
    qapi_event_send_resume();

    cpu_enable_ticks();

While these below may only be needed for RUN_STATE_RUNNING (runstate_set()
will be needed regardless):

    runstate_set(RUN_STATE_RUNNING);
    vm_state_notify(1, RUN_STATE_RUNNING);
So here's another of my attempt (this time also taking
!global_state_received() into account)...
    RunState run_state;

    if (migration_incoming_colo_enabled()) {
        migration_incoming_disable_colo();
    }

    if (!global_state_received()) {
        run_state = RUN_STATE_RUNNING;
    } else {
        run_state = global_state_get_runstate();
    }

    if (autostart) {
        /* Part of vm_prepare_start(), may isolate into a helper? */
        if (cpus_accel->synchronize_pre_resume) {
            cpus_accel->synchronize_pre_resume(step_pending);
        }
        qapi_event_send_resume();
        cpu_enable_ticks();
        /* Setup the runstate on src */
        runstate_set(run_state);
        if (run_state == RUN_STATE_RUNNING) {
            vm_state_notify(1, RUN_STATE_RUNNING);
        }
    } else {
        runstate_set(RUN_STATE_PAUSED);
    }
The whole idea is still do whatever needed here besides resuming the vcpus,
rather than leaving the whole vm_start() into system wakeup. It then can
avoid touching the system wakeup code which seems cleaner.
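If the helper route were taken, the split might look roughly like this
(hypothetical sketch only; vm_prepare_start_common() is an invented name):

/* The "common" half would be safe even when the VM stays suspended. */
static void vm_prepare_start_common(bool step_pending)
{
    if (cpus_accel->synchronize_pre_resume) {
        cpus_accel->synchronize_pre_resume(step_pending);
    }
    qapi_event_send_resume();
    cpu_enable_ticks();
}

int vm_prepare_start(bool step_pending)
{
    /* ... existing checks ... */
    vm_prepare_start_common(step_pending);
    runstate_set(RUN_STATE_RUNNING);
    vm_state_notify(1, RUN_STATE_RUNNING);
    return 0;
}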
> > IIUC this can drop qemu_system_start_on_wakeup_request() along with the
> > other global var. Would something like it work for us?
>
> Afraid not. start_on_wake is the only correct solution I can think of.
Please check again above, I just hope we can avoid yet another global to
QEMU if possible.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V1 2/3] migration: fix suspended runstate
2023-06-26 18:27 ` Peter Xu
@ 2023-06-26 19:11 ` Peter Xu
2023-06-30 13:50 ` Steven Sistare
1 sibling, 0 replies; 22+ messages in thread
From: Peter Xu @ 2023-06-26 19:11 UTC (permalink / raw)
To: Steven Sistare
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On Mon, Jun 26, 2023 at 02:27:32PM -0400, Peter Xu wrote:
> On Fri, Jun 23, 2023 at 02:25:05PM -0400, Steven Sistare wrote:
> > On 6/21/2023 4:28 PM, Peter Xu wrote:
> > > On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
> > >> On 6/20/2023 5:46 PM, Peter Xu wrote:
> > >>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
> > >>>> Migration of a guest in the suspended state is broken. The incoming
> > >>>> migration code automatically tries to wake the guest, which IMO is
> > >>>> wrong -- the guest should end migration in the same state it started.
> > >>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
> > >>>> bypasses vm_start(). The guest appears to be in the running state, but
> > >>>> it is not.
> > >>>>
> > >>>> To fix, leave the guest in the suspended state, but call
> > >>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
> > >>>> later, when the client sends a system_wakeup command.
> > >>>>
> > >>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> > >>>> ---
> > >>>> migration/migration.c | 11 ++++-------
> > >>>> softmmu/runstate.c | 1 +
> > >>>> 2 files changed, 5 insertions(+), 7 deletions(-)
> > >>>>
> > >>>> diff --git a/migration/migration.c b/migration/migration.c
> > >>>> index 17b4b47..851fe6d 100644
> > >>>> --- a/migration/migration.c
> > >>>> +++ b/migration/migration.c
> > >>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
> > >>>> vm_start();
> > >>>> } else {
> > >>>> runstate_set(global_state_get_runstate());
> > >>>> + if (runstate_check(RUN_STATE_SUSPENDED)) {
> > >>>> + /* Force vm_start to be called later. */
> > >>>> + qemu_system_start_on_wakeup_request();
> > >>>> + }
> > >>>
> > >>> Is this really needed, along with patch 1?
> > >>>
> > >>> I have a very limited knowledge on suspension, so I'm prone to making
> > >>> mistakes..
> > >>>
> > >>> But from what I read this, qemu_system_wakeup_request() (existing one, not
> > >>> after patch 1 applied) will setup wakeup_reason and kick the main thread
> > >>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in
> > >>> the main thread later on after qemu_wakeup_requested() returns true.
> > >>
> > >> Correct, here:
> > >>
> > >> if (qemu_wakeup_requested()) {
> > >> pause_all_vcpus();
> > >> qemu_system_wakeup();
> > >> notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
> > >> wakeup_reason = QEMU_WAKEUP_REASON_NONE;
> > >> resume_all_vcpus();
> > >> qapi_event_send_wakeup();
> > >> }
> > >>
> > >> However, that is not sufficient, because vm_start() was never called on the incoming
> > >> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things.
> > >>
> > >>
> > >> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended
> > >> guest, which sets the state to running, which is saved in global state:
> > >>
> > >> void migration_completion(MigrationState *s)
> > >> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
> > >> global_state_store()
> > >>
> > >> Then the incoming migration calls vm_start here:
> > >>
> > >> migration/migration.c
> > >> if (!global_state_received() ||
> > >> global_state_get_runstate() == RUN_STATE_RUNNING) {
> > >> if (autostart) {
> > >> vm_start();
> > >>
> > >> vm_start must be called for correctness.
> > >
> > > I see. Though I had a feeling that this is still not the right way to do,
> > > at least not as clean.
> > >
> > > One question is, would above work for postcopy when VM is suspended during
> > > the switchover?
> >
> > Good catch, that is broken.
> > I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh
> > and now it works.
> >
> > if (global_state_get_runstate() == RUN_STATE_RUNNING) {
> > if (autostart) {
> > vm_start();
> > } else {
> > runstate_set(RUN_STATE_PAUSED);
> > }
> > } else {
> > runstate_set(global_state_get_runstate());
> > if (runstate_check(RUN_STATE_SUSPENDED)) {
> > qemu_system_start_on_wakeup_request();
> > }
> > }
> >
> > > I think I see your point that vm_start() (mostly vm_prepare_start())
> > > contains a bunch of operations that maybe we must have before starting the
> > > VM, but then.. should we just make that vm_start() unconditional when
> > > loading VM completes? I just don't see anything won't need it (besides
> > > -S), even COLO.
> > >
> > > So I'm wondering about something like this:
> > >
> > > ===8<===
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
> > >
> > > dirty_bitmap_mig_before_vm_start();
> > >
> > > - if (!global_state_received() ||
> > > - global_state_get_runstate() == RUN_STATE_RUNNING) {
> > > - if (autostart) {
> > > - vm_start();
> > > - } else {
> > > - runstate_set(RUN_STATE_PAUSED);
> > > - }
> > > - } else if (migration_incoming_colo_enabled()) {
> > > + if (migration_incoming_colo_enabled()) {
> > > migration_incoming_disable_colo();
> > > + /* COLO should always have autostart=1 or we can enforce it here */
> > > + }
> > > +
> > > + if (autostart) {
> > > + RunState run_state = global_state_get_runstate();
> > > vm_start();
> >
> > This will resume the guest for a brief time, against the user's wishes. Not OK IMO.
>
> Ah okay..
>
> Can we do whatever we need in vm_prepare_start(), then? I assume these
> chunks are needed:
>
> /*
> * WHPX accelerator needs to know whether we are going to step
> * any CPUs, before starting the first one.
> */
> if (cpus_accel->synchronize_pre_resume) {
> cpus_accel->synchronize_pre_resume(step_pending);
> }
>
> /* We are sending this now, but the CPUs will be resumed shortly later */
> qapi_event_send_resume();
>
> cpu_enable_ticks();
>
> While these may not be needed, but instead only needed if RUN_STATE_RUNNING
> below (runstate_set() will be needed regardless):
>
> runstate_set(RUN_STATE_RUNNING);
> vm_state_notify(1, RUN_STATE_RUNNING);
>
> So here's another of my attempt (this time also taking
> !global_state_received() into account)...
>
> RunState run_state;
>
> if (migration_incoming_colo_enabled()) {
> migration_incoming_disable_colo();
> }
>
> if (!global_state_received()) {
> run_state = RUN_STATE_RUNNING;
> } else {
> run_state = global_state_get_runstate();
> }
>
> if (autostart) {
> /* Part of vm_prepare_start(), may isolate into a helper? */
> if (cpus_accel->synchronize_pre_resume) {
> cpus_accel->synchronize_pre_resume(step_pending);
> }
> qapi_event_send_resume();
> cpu_enable_ticks();
> /* Setup the runstate on src */
> runstate_set(run_state);
> if (run_state == RUN_STATE_RUNNING) {
> vm_state_notify(1, RUN_STATE_RUNNING);
And here I at least forgot:
resume_all_vcpus();
The pseudo code is just to illustrate the whole idea. Definitely not tested
in any form, so sorry if it won't even compile..
> }
> } else {
> runstate_set(RUN_STATE_PAUSED);
> }
>
> The whole idea is still do whatever needed here besides resuming the vcpus,
> rather than leaving the whole vm_start() into system wakeup. It then can
> avoid touching the system wakeup code which seems cleaner.
>
> > > IIUC this can drop qemu_system_start_on_wakeup_request() along with the
> > > other global var. Would something like it work for us?
> >
> > Afraid not. start_on_wake is the only correct solution I can think of.
>
> Please check again above, I just hope we can avoid yet another global to
> QEMU if possible.
>
> Thanks,
>
> --
> Peter Xu
--
Peter Xu
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V1 2/3] migration: fix suspended runstate
2023-06-26 18:27 ` Peter Xu
2023-06-26 19:11 ` Peter Xu
@ 2023-06-30 13:50 ` Steven Sistare
2023-07-26 20:18 ` Peter Xu
1 sibling, 1 reply; 22+ messages in thread
From: Steven Sistare @ 2023-06-30 13:50 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On 6/26/2023 2:27 PM, Peter Xu wrote:
> On Fri, Jun 23, 2023 at 02:25:05PM -0400, Steven Sistare wrote:
>> On 6/21/2023 4:28 PM, Peter Xu wrote:
>>> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
>>>> On 6/20/2023 5:46 PM, Peter Xu wrote:
>>>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
>>>>>> Migration of a guest in the suspended state is broken. The incoming
>>>>>> migration code automatically tries to wake the guest, which IMO is
>>>>>> wrong -- the guest should end migration in the same state it started.
>>>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
>>>>>> bypasses vm_start(). The guest appears to be in the running state, but
>>>>>> it is not.
>>>>>>
>>>>>> To fix, leave the guest in the suspended state, but call
>>>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
>>>>>> later, when the client sends a system_wakeup command.
>>>>>>
>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>>> ---
>>>>>> migration/migration.c | 11 ++++-------
>>>>>> softmmu/runstate.c | 1 +
>>>>>> 2 files changed, 5 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>> index 17b4b47..851fe6d 100644
>>>>>> --- a/migration/migration.c
>>>>>> +++ b/migration/migration.c
>>>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
>>>>>> vm_start();
>>>>>> } else {
>>>>>> runstate_set(global_state_get_runstate());
>>>>>> + if (runstate_check(RUN_STATE_SUSPENDED)) {
>>>>>> + /* Force vm_start to be called later. */
>>>>>> + qemu_system_start_on_wakeup_request();
>>>>>> + }
>>>>>
>>>>> Is this really needed, along with patch 1?
>>>>>
>>>>> I have a very limited knowledge on suspension, so I'm prone to making
>>>>> mistakes..
>>>>>
>>>>> But from what I read this, qemu_system_wakeup_request() (existing one, not
>>>>> after patch 1 applied) will setup wakeup_reason and kick the main thread
>>>>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in
>>>>> the main thread later on after qemu_wakeup_requested() returns true.
>>>>
>>>> Correct, here:
>>>>
>>>> if (qemu_wakeup_requested()) {
>>>> pause_all_vcpus();
>>>> qemu_system_wakeup();
>>>> notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
>>>> wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>>>> resume_all_vcpus();
>>>> qapi_event_send_wakeup();
>>>> }
>>>>
>>>> However, that is not sufficient, because vm_start() was never called on the incoming
>>>> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things.
>>>>
>>>>
>>>> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended
>>>> guest, which sets the state to running, which is saved in global state:
>>>>
>>>> void migration_completion(MigrationState *s)
>>>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>>>> global_state_store()
>>>>
>>>> Then the incoming migration calls vm_start here:
>>>>
>>>> migration/migration.c
>>>> if (!global_state_received() ||
>>>> global_state_get_runstate() == RUN_STATE_RUNNING) {
>>>> if (autostart) {
>>>> vm_start();
>>>>
>>>> vm_start must be called for correctness.
>>>
>>> I see. Though I had a feeling that this is still not the right way to do,
>>> at least not as clean.
>>>
>>> One question is, would above work for postcopy when VM is suspended during
>>> the switchover?
>>
>> Good catch, that is broken.
>> I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh
>> and now it works.
>>
>> if (global_state_get_runstate() == RUN_STATE_RUNNING) {
>> if (autostart) {
>> vm_start();
>> } else {
>> runstate_set(RUN_STATE_PAUSED);
>> }
>> } else {
>> runstate_set(global_state_get_runstate());
>> if (runstate_check(RUN_STATE_SUSPENDED)) {
>> qemu_system_start_on_wakeup_request();
>> }
>> }
>>
>>> I think I see your point that vm_start() (mostly vm_prepare_start())
>>> contains a bunch of operations that maybe we must have before starting the
>>> VM, but then.. should we just make that vm_start() unconditional when
>>> loading VM completes? I just don't see anything won't need it (besides
>>> -S), even COLO.
>>>
>>> So I'm wondering about something like this:
>>>
>>> ===8<===
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
>>>
>>> dirty_bitmap_mig_before_vm_start();
>>>
>>> - if (!global_state_received() ||
>>> - global_state_get_runstate() == RUN_STATE_RUNNING) {
>>> - if (autostart) {
>>> - vm_start();
>>> - } else {
>>> - runstate_set(RUN_STATE_PAUSED);
>>> - }
>>> - } else if (migration_incoming_colo_enabled()) {
>>> + if (migration_incoming_colo_enabled()) {
>>> migration_incoming_disable_colo();
>>> + /* COLO should always have autostart=1 or we can enforce it here */
>>> + }
>>> +
>>> + if (autostart) {
>>> + RunState run_state = global_state_get_runstate();
>>> vm_start();
>>
>> This will resume the guest for a brief time, against the user's wishes. Not OK IMO.
>
> Ah okay..
>
> Can we do whatever we need in vm_prepare_start(), then? I assume these
> chunks are needed:
>
> /*
> * WHPX accelerator needs to know whether we are going to step
> * any CPUs, before starting the first one.
> */
> if (cpus_accel->synchronize_pre_resume) {
> cpus_accel->synchronize_pre_resume(step_pending);
> }
>
> /* We are sending this now, but the CPUs will be resumed shortly later */
> qapi_event_send_resume();
>
> cpu_enable_ticks();
>
> While these may not be needed, but instead only needed if RUN_STATE_RUNNING
> below (runstate_set() will be needed regardless):
>
> runstate_set(RUN_STATE_RUNNING);
> vm_state_notify(1, RUN_STATE_RUNNING);
>
> So here's another of my attempt (this time also taking
> !global_state_received() into account)...
>
> RunState run_state;
>
> if (migration_incoming_colo_enabled()) {
> migration_incoming_disable_colo();
> }
>
> if (!global_state_received()) {
> run_state = RUN_STATE_RUNNING;
> } else {
> run_state = global_state_get_runstate();
> }
>
> if (autostart) {
> /* Part of vm_prepare_start(), may isolate into a helper? */
> if (cpus_accel->synchronize_pre_resume) {
> cpus_accel->synchronize_pre_resume(step_pending);
> }
> qapi_event_send_resume();
> cpu_enable_ticks();
> /* Setup the runstate on src */
> runstate_set(run_state);
> if (run_state == RUN_STATE_RUNNING) {
> vm_state_notify(1, RUN_STATE_RUNNING);
> }
> } else {
> runstate_set(RUN_STATE_PAUSED);
> }
>
> The whole idea is still do whatever needed here besides resuming the vcpus,
> rather than leaving the whole vm_start() into system wakeup. It then can
> avoid touching the system wakeup code which seems cleaner.
The problem is that some actions cannot be performed at migration finish time,
such as vm_state_notify RUN_STATE_RUNNING. The wakeup code called later still
needs to know that vm_state_notify has not been called, and call it.
I just posted a new series with a cleaner wakeup, but it still uses a global.
- Steve
>>> IIUC this can drop qemu_system_start_on_wakeup_request() along with the
>>> other global var. Would something like it work for us?
>>
>> Afraid not. start_on_wake is the only correct solution I can think of.
>
> Please check again above, I just hope we can avoid yet another global to
> QEMU if possible.
>
> Thanks,
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V1 2/3] migration: fix suspended runstate
2023-06-30 13:50 ` Steven Sistare
@ 2023-07-26 20:18 ` Peter Xu
2023-08-14 18:53 ` Steven Sistare
0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2023-07-26 20:18 UTC (permalink / raw)
To: Steven Sistare
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On Fri, Jun 30, 2023 at 09:50:41AM -0400, Steven Sistare wrote:
> On 6/26/2023 2:27 PM, Peter Xu wrote:
> > On Fri, Jun 23, 2023 at 02:25:05PM -0400, Steven Sistare wrote:
> >> On 6/21/2023 4:28 PM, Peter Xu wrote:
> >>> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
> >>>> On 6/20/2023 5:46 PM, Peter Xu wrote:
> >>>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
> >>>>>> Migration of a guest in the suspended state is broken. The incoming
> >>>>>> migration code automatically tries to wake the guest, which IMO is
> >>>>>> wrong -- the guest should end migration in the same state it started.
> >>>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
> >>>>>> bypasses vm_start(). The guest appears to be in the running state, but
> >>>>>> it is not.
> >>>>>>
> >>>>>> To fix, leave the guest in the suspended state, but call
> >>>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
> >>>>>> later, when the client sends a system_wakeup command.
> >>>>>>
> >>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >>>>>> ---
> >>>>>> migration/migration.c | 11 ++++-------
> >>>>>> softmmu/runstate.c | 1 +
> >>>>>> 2 files changed, 5 insertions(+), 7 deletions(-)
> >>>>>>
> >>>>>> diff --git a/migration/migration.c b/migration/migration.c
> >>>>>> index 17b4b47..851fe6d 100644
> >>>>>> --- a/migration/migration.c
> >>>>>> +++ b/migration/migration.c
> >>>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
> >>>>>> vm_start();
> >>>>>> } else {
> >>>>>> runstate_set(global_state_get_runstate());
> >>>>>> + if (runstate_check(RUN_STATE_SUSPENDED)) {
> >>>>>> + /* Force vm_start to be called later. */
> >>>>>> + qemu_system_start_on_wakeup_request();
> >>>>>> + }
> >>>>>
> >>>>> Is this really needed, along with patch 1?
> >>>>>
> >>>>> I have a very limited knowledge on suspension, so I'm prone to making
> >>>>> mistakes..
> >>>>>
> >>>>> But from what I read this, qemu_system_wakeup_request() (existing one, not
> >>>>> after patch 1 applied) will setup wakeup_reason and kick the main thread
> >>>>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in
> >>>>> the main thread later on after qemu_wakeup_requested() returns true.
> >>>>
> >>>> Correct, here:
> >>>>
> >>>> if (qemu_wakeup_requested()) {
> >>>> pause_all_vcpus();
> >>>> qemu_system_wakeup();
> >>>> notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
> >>>> wakeup_reason = QEMU_WAKEUP_REASON_NONE;
> >>>> resume_all_vcpus();
> >>>> qapi_event_send_wakeup();
> >>>> }
> >>>>
> >>>> However, that is not sufficient, because vm_start() was never called on the incoming
> >>>> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things.
> >>>>
> >>>>
> >>>> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended
> >>>> guest, which sets the state to running, which is saved in global state:
> >>>>
> >>>> void migration_completion(MigrationState *s)
> >>>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
> >>>> global_state_store()
> >>>>
> >>>> Then the incoming migration calls vm_start here:
> >>>>
> >>>> migration/migration.c
> >>>> if (!global_state_received() ||
> >>>> global_state_get_runstate() == RUN_STATE_RUNNING) {
> >>>> if (autostart) {
> >>>> vm_start();
> >>>>
> >>>> vm_start must be called for correctness.
> >>>
> >>> I see. Though I had a feeling that this is still not the right way to do,
> >>> at least not as clean.
> >>>
> >>> One question is, would above work for postcopy when VM is suspended during
> >>> the switchover?
> >>
> >> Good catch, that is broken.
> >> I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh
> >> and now it works.
> >>
> >> if (global_state_get_runstate() == RUN_STATE_RUNNING) {
> >> if (autostart) {
> >> vm_start();
> >> } else {
> >> runstate_set(RUN_STATE_PAUSED);
> >> }
> >> } else {
> >> runstate_set(global_state_get_runstate());
> >> if (runstate_check(RUN_STATE_SUSPENDED)) {
> >> qemu_system_start_on_wakeup_request();
> >> }
> >> }
> >>
> >>> I think I see your point that vm_start() (mostly vm_prepare_start())
> >>> contains a bunch of operations that maybe we must have before starting the
> >>> VM, but then.. should we just make that vm_start() unconditional when
> >>> loading VM completes? I just don't see anything won't need it (besides
> >>> -S), even COLO.
> >>>
> >>> So I'm wondering about something like this:
> >>>
> >>> ===8<===
> >>> --- a/migration/migration.c
> >>> +++ b/migration/migration.c
> >>> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
> >>>
> >>> dirty_bitmap_mig_before_vm_start();
> >>>
> >>> - if (!global_state_received() ||
> >>> - global_state_get_runstate() == RUN_STATE_RUNNING) {
> >>> - if (autostart) {
> >>> - vm_start();
> >>> - } else {
> >>> - runstate_set(RUN_STATE_PAUSED);
> >>> - }
> >>> - } else if (migration_incoming_colo_enabled()) {
> >>> + if (migration_incoming_colo_enabled()) {
> >>> migration_incoming_disable_colo();
> >>> + /* COLO should always have autostart=1 or we can enforce it here */
> >>> + }
> >>> +
> >>> + if (autostart) {
> >>> + RunState run_state = global_state_get_runstate();
> >>> vm_start();
> >>
> >> This will resume the guest for a brief time, against the user's wishes. Not OK IMO.
> >
> > Ah okay..
> >
> > Can we do whatever we need in vm_prepare_start(), then? I assume these
> > chunks are needed:
> >
> > /*
> > * WHPX accelerator needs to know whether we are going to step
> > * any CPUs, before starting the first one.
> > */
> > if (cpus_accel->synchronize_pre_resume) {
> > cpus_accel->synchronize_pre_resume(step_pending);
> > }
> >
> > /* We are sending this now, but the CPUs will be resumed shortly later */
> > qapi_event_send_resume();
> >
> > cpu_enable_ticks();
> >
> > While these may not be needed, but instead only needed if RUN_STATE_RUNNING
> > below (runstate_set() will be needed regardless):
> >
> > runstate_set(RUN_STATE_RUNNING);
> > vm_state_notify(1, RUN_STATE_RUNNING);
> >
> > So here's another of my attempt (this time also taking
> > !global_state_received() into account)...
> >
> > RunState run_state;
> >
> > if (migration_incoming_colo_enabled()) {
> > migration_incoming_disable_colo();
> > }
> >
> > if (!global_state_received()) {
> > run_state = RUN_STATE_RUNNING;
> > } else {
> > run_state = global_state_get_runstate();
> > }
> >
> > if (autostart) {
> > /* Part of vm_prepare_start(), may isolate into a helper? */
> > if (cpus_accel->synchronize_pre_resume) {
> > cpus_accel->synchronize_pre_resume(step_pending);
> > }
> > qapi_event_send_resume();
> > cpu_enable_ticks();
> > /* Setup the runstate on src */
> > runstate_set(run_state);
> > if (run_state == RUN_STATE_RUNNING) {
> > vm_state_notify(1, RUN_STATE_RUNNING);
> > }
> > } else {
> > runstate_set(RUN_STATE_PAUSED);
> > }
> >
> > The whole idea is still do whatever needed here besides resuming the vcpus,
> > rather than leaving the whole vm_start() into system wakeup. It then can
> > avoid touching the system wakeup code which seems cleaner.
>
> The problem is that some actions cannot be performed at migration finish time,
> such as vm_state_notify RUN_STATE_RUNNING. The wakeup code called later still
> needs to know that vm_state_notify has not been called, and call it.
Sorry, very late reply..
Can we just call vm_state_notify() earlier?
I know I'm not familiar with this code at all.. but I'm still not fully
convinced a new global is needed for this. IMHO QEMU becomes harder to
maintain with such globals, when tons already exist.
>
> I just posted a new series with a cleaner wakeup, but it still uses a global.
I think the new version does not apply anymore to master. Do you have a
tree somewhere?
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V1 2/3] migration: fix suspended runstate
2023-07-26 20:18 ` Peter Xu
@ 2023-08-14 18:53 ` Steven Sistare
2023-08-14 19:37 ` Peter Xu
0 siblings, 1 reply; 22+ messages in thread
From: Steven Sistare @ 2023-08-14 18:53 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On 7/26/2023 4:18 PM, Peter Xu wrote:
> On Fri, Jun 30, 2023 at 09:50:41AM -0400, Steven Sistare wrote:
>> On 6/26/2023 2:27 PM, Peter Xu wrote:
>>> On Fri, Jun 23, 2023 at 02:25:05PM -0400, Steven Sistare wrote:
>>>> On 6/21/2023 4:28 PM, Peter Xu wrote:
>>>>> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
>>>>>> On 6/20/2023 5:46 PM, Peter Xu wrote:
>>>>>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
>>>>>>>> Migration of a guest in the suspended state is broken. The incoming
>>>>>>>> migration code automatically tries to wake the guest, which IMO is
>>>>>>>> wrong -- the guest should end migration in the same state it started.
>>>>>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
>>>>>>>> bypasses vm_start(). The guest appears to be in the running state, but
>>>>>>>> it is not.
>>>>>>>>
>>>>>>>> To fix, leave the guest in the suspended state, but call
>>>>>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
>>>>>>>> later, when the client sends a system_wakeup command.
>>>>>>>>
>>>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>>>>> ---
>>>>>>>> migration/migration.c | 11 ++++-------
>>>>>>>> softmmu/runstate.c | 1 +
>>>>>>>> 2 files changed, 5 insertions(+), 7 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>>>> index 17b4b47..851fe6d 100644
>>>>>>>> --- a/migration/migration.c
>>>>>>>> +++ b/migration/migration.c
>>>>>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
>>>>>>>> vm_start();
>>>>>>>> } else {
>>>>>>>> runstate_set(global_state_get_runstate());
>>>>>>>> + if (runstate_check(RUN_STATE_SUSPENDED)) {
>>>>>>>> + /* Force vm_start to be called later. */
>>>>>>>> + qemu_system_start_on_wakeup_request();
>>>>>>>> + }
>>>>>>>
>>>>>>> Is this really needed, along with patch 1?
>>>>>>>
>>>>>>> I have a very limited knowledge on suspension, so I'm prone to making
>>>>>>> mistakes..
>>>>>>>
>>>>>>> But from what I read this, qemu_system_wakeup_request() (existing one, not
>>>>>>> after patch 1 applied) will setup wakeup_reason and kick the main thread
>>>>>>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in
>>>>>>> the main thread later on after qemu_wakeup_requested() returns true.
>>>>>>
>>>>>> Correct, here:
>>>>>>
>>>>>> if (qemu_wakeup_requested()) {
>>>>>> pause_all_vcpus();
>>>>>> qemu_system_wakeup();
>>>>>> notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
>>>>>> wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>>>>>> resume_all_vcpus();
>>>>>> qapi_event_send_wakeup();
>>>>>> }
>>>>>>
>>>>>> However, that is not sufficient, because vm_start() was never called on the incoming
>>>>>> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things.
>>>>>>
>>>>>>
>>>>>> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended
>>>>>> guest, which sets the state to running, which is saved in global state:
>>>>>>
>>>>>> void migration_completion(MigrationState *s)
>>>>>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>>>>>> global_state_store()
>>>>>>
>>>>>> Then the incoming migration calls vm_start here:
>>>>>>
>>>>>> migration/migration.c
>>>>>> if (!global_state_received() ||
>>>>>> global_state_get_runstate() == RUN_STATE_RUNNING) {
>>>>>> if (autostart) {
>>>>>> vm_start();
>>>>>>
>>>>>> vm_start must be called for correctness.
>>>>>
>>>>> I see. Though I had a feeling that this is still not the right way to do,
>>>>> at least not as clean.
>>>>>
>>>>> One question is, would above work for postcopy when VM is suspended during
>>>>> the switchover?
>>>>
>>>> Good catch, that is broken.
>>>> I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh
>>>> and now it works.
>>>>
>>>> if (global_state_get_runstate() == RUN_STATE_RUNNING) {
>>>> if (autostart) {
>>>> vm_start();
>>>> } else {
>>>> runstate_set(RUN_STATE_PAUSED);
>>>> }
>>>> } else {
>>>> runstate_set(global_state_get_runstate());
>>>> if (runstate_check(RUN_STATE_SUSPENDED)) {
>>>> qemu_system_start_on_wakeup_request();
>>>> }
>>>> }
>>>>
>>>>> I think I see your point that vm_start() (mostly vm_prepare_start())
>>>>> contains a bunch of operations that maybe we must have before starting the
>>>>> VM, but then.. should we just make that vm_start() unconditional when
>>>>> loading VM completes? I just don't see anything won't need it (besides
>>>>> -S), even COLO.
>>>>>
>>>>> So I'm wondering about something like this:
>>>>>
>>>>> ===8<===
>>>>> --- a/migration/migration.c
>>>>> +++ b/migration/migration.c
>>>>> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
>>>>>
>>>>> dirty_bitmap_mig_before_vm_start();
>>>>>
>>>>> - if (!global_state_received() ||
>>>>> - global_state_get_runstate() == RUN_STATE_RUNNING) {
>>>>> - if (autostart) {
>>>>> - vm_start();
>>>>> - } else {
>>>>> - runstate_set(RUN_STATE_PAUSED);
>>>>> - }
>>>>> - } else if (migration_incoming_colo_enabled()) {
>>>>> + if (migration_incoming_colo_enabled()) {
>>>>> migration_incoming_disable_colo();
>>>>> + /* COLO should always have autostart=1 or we can enforce it here */
>>>>> + }
>>>>> +
>>>>> + if (autostart) {
>>>>> + RunState run_state = global_state_get_runstate();
>>>>> vm_start();
>>>>
>>>> This will resume the guest for a brief time, against the user's wishes. Not OK IMO.
>>>
>>> Ah okay..
>>>
>>> Can we do whatever we need in vm_prepare_start(), then? I assume these
>>> chunks are needed:
>>>
>>> /*
>>> * WHPX accelerator needs to know whether we are going to step
>>> * any CPUs, before starting the first one.
>>> */
>>> if (cpus_accel->synchronize_pre_resume) {
>>> cpus_accel->synchronize_pre_resume(step_pending);
>>> }
>>>
>>> /* We are sending this now, but the CPUs will be resumed shortly later */
>>> qapi_event_send_resume();
>>>
>>> cpu_enable_ticks();
>>>
>>> While these may not be needed, but instead only needed if RUN_STATE_RUNNING
>>> below (runstate_set() will be needed regardless):
>>>
>>> runstate_set(RUN_STATE_RUNNING);
>>> vm_state_notify(1, RUN_STATE_RUNNING);
>>>
>>> So here's another of my attempt (this time also taking
>>> !global_state_received() into account)...
>>>
>>> RunState run_state;
>>>
>>> if (migration_incoming_colo_enabled()) {
>>> migration_incoming_disable_colo();
>>> }
>>>
>>> if (!global_state_received()) {
>>> run_state = RUN_STATE_RUNNING;
>>> } else {
>>> run_state = global_state_get_runstate();
>>> }
>>>
>>> if (autostart) {
>>> /* Part of vm_prepare_start(), may isolate into a helper? */
>>> if (cpus_accel->synchronize_pre_resume) {
>>> cpus_accel->synchronize_pre_resume(step_pending);
>>> }
>>> qapi_event_send_resume();
>>> cpu_enable_ticks();
>>> /* Setup the runstate on src */
>>> runstate_set(run_state);
>>> if (run_state == RUN_STATE_RUNNING) {
>>> vm_state_notify(1, RUN_STATE_RUNNING);
>>> }
>>> } else {
>>> runstate_set(RUN_STATE_PAUSED);
>>> }
>>>
>>> The whole idea is still do whatever needed here besides resuming the vcpus,
>>> rather than leaving the whole vm_start() into system wakeup. It then can
>>> avoid touching the system wakeup code which seems cleaner.
>>
>> The problem is that some actions cannot be performed at migration finish time,
>> such as vm_state_notify RUN_STATE_RUNNING. The wakeup code called later still
>> needs to know that vm_state_notify has not been called, and call it.
>
> Sorry, very late reply..
And I just returned from vacation :)
> Can we just call vm_state_notify() earlier?
We cannot. The guest is not running yet, and will not be until later.
We cannot call notifiers that perform actions that complete, or react to,
the guest entering a running state.
> I know I'm not familiar with this code at all.. but I'm still not fully
> convinced a new global is needed for this. IMHO QEMU becomes harder to
> maintain with such globals, when tons already exist.
>
>> I just posted a new series with a cleaner wakeup, but it still uses a global.
See "[PATCH V2 01/10] vl: start on wakeup request" in the new series.
The transitions of the global are trivial and sensible:
vm_start()
    vm_started = true;

do_vm_stop()
    vm_started = false;
current_run_state is already a global, confined to runstate.c.
vm_started is a new qualifier on the state, and hence is also global.
If runstate were a field in MachineState, then I would add vm_started to
MachineState, but MachineState is not used in runstate.c.
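In sketch form, the transitions above would sit next to the existing
start/stop paths (a paraphrase of the description, not the literal patch):

static bool vm_started;

void vm_start(void)
{
    if (!vm_prepare_start(false)) {
        vm_started = true;
        resume_all_vcpus();
    }
}

static int do_vm_stop(RunState state, bool send_stop)
{
    int ret = 0;
    vm_started = false;
    /* ... existing pause_all_vcpus()/vm_state_notify() logic ... */
    return ret;
}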
> I think the new version does not apply anymore to master. Do you have a
> tree somewhere?
I do not, but I will rebase and post V3 in a moment.
- Steve
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V1 2/3] migration: fix suspended runstate
2023-08-14 18:53 ` Steven Sistare
@ 2023-08-14 19:37 ` Peter Xu
2023-08-16 17:48 ` Steven Sistare
0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2023-08-14 19:37 UTC (permalink / raw)
To: Steven Sistare
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote:
> > Can we just call vm_state_notify() earlier?
>
> We cannot. The guest is not running yet, and will not be until later.
> We cannot call notifiers that perform actions that complete, or react to,
> the guest entering a running state.
I tried to look at a few examples of the notifiees, and most of the ones I
read do not react to "vcpu running" but to "vm running" (in which case I
think the "suspended" mode falls into the "vm running" case); most of them
do not care about the RunState parameter passed in, only the bool "running".
In reality, when running=true, it must be RUNNING so far.
In that case, does it mean we should notify right after the switchover,
since after migration the vm is indeed running (only the vcpus are not,
during suspend)?
One example (of a possible issue) is vfio_vmstate_change(), where IIUC if
we try to suspend a VM the device should stay in VFIO_DEVICE_STATE_RUNNING;
this kind of proves to me that SUSPEND is actually one of the running=true
states.
If we postpone all notifiers until the next suspend wakeup, even after we
have switched over to the dest qemu, I think it means these devices will
not be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps in
VFIO_DEVICE_STATE_STOP.
Ideally I think we should call vm_state_notify() here with running=true and
state=SUSPEND, but since I do see that some hooks are not well prepared for
SUSPEND with running=true, I'd say we should, on the safe side, call
vm_state_notify(running=true, state=RUNNING) even for SUSPEND at the
switchover phase. With that IIUC it'll naturally work (e.g. when we wake up
again later we need to call no notifiers).
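In code form, the suggestion would be roughly (untested sketch, not a
patch):

    RunState rs = global_state_received() ? global_state_get_runstate()
                                          : RUN_STATE_RUNNING;

    if (autostart) {
        runstate_set(rs);
        /* SUSPENDED still counts as "vm running" for the notifiees. */
        vm_state_notify(1, RUN_STATE_RUNNING);
        if (rs == RUN_STATE_RUNNING) {
            resume_all_vcpus();      /* vcpus stay paused when suspended */
        }
    } else {
        runstate_set(RUN_STATE_PAUSED);
    }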
--
Peter Xu
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V1 2/3] migration: fix suspended runstate
2023-08-14 19:37 ` Peter Xu
@ 2023-08-16 17:48 ` Steven Sistare
2023-08-17 18:19 ` Peter Xu
0 siblings, 1 reply; 22+ messages in thread
From: Steven Sistare @ 2023-08-16 17:48 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On 8/14/2023 3:37 PM, Peter Xu wrote:
> On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote:
>>> Can we just call vm_state_notify() earlier?
>>
>> We cannot. The guest is not running yet, and will not be until later.
>> We cannot call notifiers that perform actions that complete, or react to,
>> the guest entering a running state.
>
> I tried to look at a few examples of the notifiees, and most of the ones I
> read do not react to "vcpu running" but to "vm running" (in which case I
> think the "suspended" mode falls into the "vm running" case); most of them
> do not care about the RunState parameter passed in, only the bool "running".
>
> In reality, when running=true, it must be RUNNING so far.
>
> In that case, does it mean we should notify right after the switchover,
> since after migration the vm is indeed running (only the vcpus are not,
> during suspend)?
I cannot parse your question, but maybe this answers it.
If the outgoing VM is running and not suspended, then the incoming side
tests for autostart==true and calls vm_start, which calls the notifiers,
right after the switchover.
> One example (of a possible issue) is vfio_vmstate_change(), where IIUC if
> we try to suspend a VM the device should stay in VFIO_DEVICE_STATE_RUNNING;
> this kind of proves to me that SUSPEND is actually one of the running=true
> states.
>
> If we postpone all notifiers until the next suspend wakeup, even after we
> have switched over to the dest qemu, I think it means these devices will
> not be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps in
> VFIO_DEVICE_STATE_STOP.
or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup.
AFAIK it is OK to remain in that state until wakeup is called later.
> Ideally I think we should call vm_state_notify() here with running=true
> and state=SUSPEND, but since I do see some hooks that are not well
> prepared for SUSPEND with running=true, I'd say we should play it safe
> and call vm_state_notify(running=true, state=RUNNING) even for SUSPEND
> at the switchover phase. With that, IIUC it'll naturally work (e.g. on a
> later wakeup we just don't need to call any notifiers).
Notifiers are just one piece; all the code in vm_prepare_start must be
called. Is it correct to call all of that long before we actually resume
the CPUs in wakeup? I don't know, but what is the point? The wakeup code
still needs modification to conditionally resume the vcpus. The scheme
would be roughly:

    loadvm_postcopy_handle_run_bh()
        runstate = global_state_get_runstate();
        if (runstate == RUN_STATE_RUNNING) {
            vm_start();
        } else if (runstate == RUN_STATE_SUSPENDED) {
            vm_prepare_start();    // the start of vm_start()
        }

    qemu_system_wakeup_request()
        if (some condition)
            resume_all_vcpus();    // the remainder of vm_start()
        else
            runstate_set(RUN_STATE_RUNNING);
How is that better than my patches?
  [PATCH V3 01/10] vl: start on wakeup request
  [PATCH V3 02/10] migration: preserve suspended runstate

    loadvm_postcopy_handle_run_bh()
        runstate = global_state_get_runstate();
        if (runstate == RUN_STATE_RUNNING)
            vm_start();
        else
            runstate_set(runstate);    // eg RUN_STATE_SUSPENDED

    qemu_system_wakeup_request()
        if (!vm_started)
            vm_start();
        else
            runstate_set(RUN_STATE_RUNNING);
Recall this thread started with your comment "It then can avoid touching the
system wakeup code which seems cleaner". We still need to touch the wakeup
code.
- Steve
* Re: [PATCH V1 2/3] migration: fix suspended runstate
2023-08-16 17:48 ` Steven Sistare
@ 2023-08-17 18:19 ` Peter Xu
2023-08-24 20:53 ` Steven Sistare
0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2023-08-17 18:19 UTC (permalink / raw)
To: Steven Sistare
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On Wed, Aug 16, 2023 at 01:48:13PM -0400, Steven Sistare wrote:
> On 8/14/2023 3:37 PM, Peter Xu wrote:
> > On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote:
> >>> Can we just call vm_state_notify() earlier?
> >>
> >> We cannot. The guest is not running yet, and will not be until later.
> >> We cannot call notifiers that perform actions that complete, or react to,
> >> the guest entering a running state.
> >
> > I tried to look at a few examples of the notifiees, and most of the ones
> > I read react not to "vcpu running" but to "vm running" (in which case I
> > think "suspended" falls into the "vm running" case); most of them don't
> > care about the RunState parameter passed in, only about the bool
> > "running".
> >
> > In reality, when running=true, the state so far must be RUNNING.
> >
> > In that case, does it mean we should notify right after the switchover,
> > since after migration the VM is indeed running, even if the vcpus are
> > still in suspend?
>
> I cannot parse your question, but maybe this answers it.
> If the outgoing VM is running and not suspended, then the incoming side
> tests for autostart==true and calls vm_start, which calls the notifiers,
> right after the switchover.
I meant that IMHO SUSPENDED should be seen as a "vm running" case, just
like RUNNING. Then we should invoke vm_prepare_start(); it just needs
some touch-ups.
>
> > One example (of a possible issue) is vfio_vmstate_change(), where IIUC
> > if we try to suspend a VM the device should stay in
> > VFIO_DEVICE_STATE_RUNNING; this kind of proves to me that SUSPEND is
> > actually one of the running=true states.
> >
> > If we postpone all the notifiers here, past the switchover to dest QEMU,
> > until the next suspend wakeup, I think it means these devices will not
> > be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps in
> > VFIO_DEVICE_STATE_STOP.
>
> or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup.
> AFAIK it is OK to remain in that state until wakeup is called later.
So let me provide another clue as to why I think we should call
vm_prepare_start()..

Firstly, I think the RESUME event should always be there right after we
switch over, suspended or not. I just noticed that your test case would
work because you put "wakeup" before RESUME. I'm not sure whether that's
correct. I'd bet people rely on that RESUME to identify the switchover.
More importantly, I'm wondering whether the RTC should still be running
in suspended mode? Sorry again if my knowledge there is limited, so
correct me otherwise - but my understanding is that during suspend (S1
or S3, frankly I can't tell which one this belongs to..) the RTC should
still be running along with the system clock. It means we _should_ at
least call cpu_enable_ticks() to enable it:

    /*
     * enable cpu_get_ticks()
     * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
     */
    void cpu_enable_ticks(void)

I think that'll enable cpu_get_tsc() and make it start working
correctly.
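FWIW, the tail of vm_prepare_start() already does both of the things I
mentioned (the RESUME event and the ticks); roughly, from memory, so
please double-check runstate.c:

    /* We are sending this now, but the CPUs will be resumed shortly later */
    qapi_event_send_resume();

    cpu_enable_ticks();
    runstate_set(RUN_STATE_RUNNING);
    vm_state_notify(true, RUN_STATE_RUNNING);

So calling vm_prepare_start() at switchover would give us the RESUME
event, the ticks, and the notifiers in one go.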
>
> > Ideally I think we should call vm_state_notify() here with running=true
> > and state=SUSPEND, but since I do see some hooks that are not well
> > prepared for SUSPEND with running=true, I'd say we should play it safe
> > and call vm_state_notify(running=true, state=RUNNING) even for SUSPEND
> > at the switchover phase. With that, IIUC it'll naturally work (e.g. on a
> > later wakeup we just don't need to call any notifiers).
>
> Notifiers are just one piece; all the code in vm_prepare_start must be
> called. Is it correct to call all of that long before we actually resume
> the CPUs in wakeup? I don't know, but what is the point?
The point is not only cleanliness (again, I really, really don't like
that new global.. sorry), but also that I now think we should make the
VM running.
> The wakeup code still needs modification to conditionally resume the
> vcpus. The scheme would be roughly:
>
>     loadvm_postcopy_handle_run_bh()
>         runstate = global_state_get_runstate();
>         if (runstate == RUN_STATE_RUNNING) {
>             vm_start();
>         } else if (runstate == RUN_STATE_SUSPENDED) {
>             vm_prepare_start();    // the start of vm_start()
>         }
>
>     qemu_system_wakeup_request()
>         if (some condition)
>             resume_all_vcpus();    // the remainder of vm_start()
>         else
>             runstate_set(RUN_STATE_RUNNING);
No, it doesn't: wakeup_reason is set there, and the main loop does the
resuming. See:

    if (qemu_wakeup_requested()) {
        pause_all_vcpus();
        qemu_system_wakeup();
        notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
        wakeup_reason = QEMU_WAKEUP_REASON_NONE;
        resume_all_vcpus();
        qapi_event_send_wakeup();
    }
>
> How is that better than my patches?
>   [PATCH V3 01/10] vl: start on wakeup request
>   [PATCH V3 02/10] migration: preserve suspended runstate
>
>     loadvm_postcopy_handle_run_bh()
>         runstate = global_state_get_runstate();
>         if (runstate == RUN_STATE_RUNNING)
>             vm_start();
>         else
>             runstate_set(runstate);    // eg RUN_STATE_SUSPENDED
>
>     qemu_system_wakeup_request()
>         if (!vm_started)
>             vm_start();
>         else
>             runstate_set(RUN_STATE_RUNNING);
>
> Recall this thread started with your comment "It then can avoid touching the
> system wakeup code which seems cleaner". We still need to touch the wakeup
> code.
Let me provide some code and reply to your new patchset inline.
Thanks.
--
Peter Xu
* Re: [PATCH V1 2/3] migration: fix suspended runstate
2023-08-17 18:19 ` Peter Xu
@ 2023-08-24 20:53 ` Steven Sistare
0 siblings, 0 replies; 22+ messages in thread
From: Steven Sistare @ 2023-08-24 20:53 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
Daniel P. Berrangé
On 8/17/2023 2:19 PM, Peter Xu wrote:
> On Wed, Aug 16, 2023 at 01:48:13PM -0400, Steven Sistare wrote:
>> On 8/14/2023 3:37 PM, Peter Xu wrote:
>>> On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote:
>>>>> Can we just call vm_state_notify() earlier?
>>>>
>>>> We cannot. The guest is not running yet, and will not be until later.
>>>> We cannot call notifiers that perform actions that complete, or react to,
>>>> the guest entering a running state.
>>>
>>> I tried to look at a few examples of the notifiees, and most of the ones
>>> I read react not to "vcpu running" but to "vm running" (in which case I
>>> think "suspended" falls into the "vm running" case); most of them don't
>>> care about the RunState parameter passed in, only about the bool
>>> "running".
>>>
>>> In reality, when running=true, the state so far must be RUNNING.
>>>
>>> In that case, does it mean we should notify right after the switchover,
>>> since after migration the VM is indeed running, even if the vcpus are
>>> still in suspend?
>>
>> I cannot parse your question, but maybe this answers it.
>> If the outgoing VM is running and not suspended, then the incoming side
>> tests for autostart==true and calls vm_start, which calls the notifiers,
>> right after the switchover.
>
> I meant that IMHO SUSPENDED should be seen as a "vm running" case, just
> like RUNNING. Then we should invoke vm_prepare_start(); it just needs
> some touch-ups.
>
>>
>>> One example (of a possible issue) is vfio_vmstate_change(), where IIUC
>>> if we try to suspend a VM the device should stay in
>>> VFIO_DEVICE_STATE_RUNNING; this kind of proves to me that SUSPEND is
>>> actually one of the running=true states.
>>>
>>> If we postpone all the notifiers here, past the switchover to dest QEMU,
>>> until the next suspend wakeup, I think it means these devices will not
>>> be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps in
>>> VFIO_DEVICE_STATE_STOP.
>>
>> or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup.
>> AFAIK it is OK to remain in that state until wakeup is called later.
>
> So let me provide another clue as to why I think we should call
> vm_prepare_start()..
>
> Firstly, I think the RESUME event should always be there right after we
> switch over, suspended or not. I just noticed that your test case would
> work because you put "wakeup" before RESUME. I'm not sure whether that's
> correct. I'd bet people rely on that RESUME to identify the switchover.
Yes, possibly.
> More importantly, I'm wondering whether the RTC should still be running
> in suspended mode? Sorry again if my knowledge there is limited, so
> correct me otherwise - but my understanding is that during suspend (S1
> or S3, frankly I can't tell which one this belongs to..) the RTC should
> still be running along with the system clock. It means we _should_ at
> least call cpu_enable_ticks() to enable it:
Agreed. The comment above cpu_get_ticks says:

    * return the time elapsed in VM between vm_start and vm_stop

Suspending a guest does not call vm_stop, so ticks keep running.
I also verified that experimentally.
>     /*
>      * enable cpu_get_ticks()
>      * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
>      */
>     void cpu_enable_ticks(void)
>
> I think that'll enable cpu_get_tsc() and make it start working
> correctly.
>
>>
>>> Ideally I think we should call vm_state_notify() here with running=true
>>> and state=SUSPEND, but since I do see some hooks that are not well
>>> prepared for SUSPEND with running=true, I'd say we should play it safe
>>> and call vm_state_notify(running=true, state=RUNNING) even for SUSPEND
>>> at the switchover phase. With that, IIUC it'll naturally work (e.g. on a
>>> later wakeup we just don't need to call any notifiers).
>>
>> Notifiers are just one piece; all the code in vm_prepare_start must be
>> called. Is it correct to call all of that long before we actually resume
>> the CPUs in wakeup? I don't know, but what is the point?
>
> The point is not only cleanliness (again, I really, really don't like
> that new global.. sorry), but also that I now think we should make the
> VM running.
I do believe it is safe to call vm_prepare_start immediately, since the vcpus
are never running when it is called.
>> The wakeup code still needs modification to conditionally resume the
>> vcpus. The scheme would be roughly:
>>
>>     loadvm_postcopy_handle_run_bh()
>>         runstate = global_state_get_runstate();
>>         if (runstate == RUN_STATE_RUNNING) {
>>             vm_start();
>>         } else if (runstate == RUN_STATE_SUSPENDED) {
>>             vm_prepare_start();    // the start of vm_start()
>>         }
>>
>>     qemu_system_wakeup_request()
>>         if (some condition)
>>             resume_all_vcpus();    // the remainder of vm_start()
>>         else
>>             runstate_set(RUN_STATE_RUNNING);
>
> No, it doesn't: wakeup_reason is set there, and the main loop does the
> resuming. See:
>
>     if (qemu_wakeup_requested()) {
>         pause_all_vcpus();
>         qemu_system_wakeup();
>         notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
>         wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>         resume_all_vcpus();
>         qapi_event_send_wakeup();
>     }
Agreed, we can rely on that main loop code to call resume_all_vcpus, and not
modify qemu_system_wakeup_request. That is cleaner.
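So the combined scheme would be roughly (a sketch only, assuming
vm_prepare_start can be called from the migration code):

    loadvm_postcopy_handle_run_bh()
        runstate = global_state_get_runstate();
        if (runstate == RUN_STATE_RUNNING) {
            vm_start();
        } else if (runstate == RUN_STATE_SUSPENDED) {
            vm_prepare_start();    // RESUME event, ticks, notifiers
        }

On a later wakeup, qemu_system_wakeup_request just records
wakeup_reason, and the main loop block you quoted pauses and resumes the
vcpus with no changes needed there.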
>> How is that better than my patches?
>>   [PATCH V3 01/10] vl: start on wakeup request
>>   [PATCH V3 02/10] migration: preserve suspended runstate
>>
>>     loadvm_postcopy_handle_run_bh()
>>         runstate = global_state_get_runstate();
>>         if (runstate == RUN_STATE_RUNNING)
>>             vm_start();
>>         else
>>             runstate_set(runstate);    // eg RUN_STATE_SUSPENDED
>>
>>     qemu_system_wakeup_request()
>>         if (!vm_started)
>>             vm_start();
>>         else
>>             runstate_set(RUN_STATE_RUNNING);
>>
>> Recall this thread started with your comment "It then can avoid touching the
>> system wakeup code which seems cleaner". We still need to touch the wakeup
>> code.
>
> Let me provide some code and reply to your new patchset inline.
Thank you, I have more comments there.
- Steve
Thread overview: 22+ messages
2023-06-15 20:26 [PATCH V1 0/3] fix migration of suspended runstate Steve Sistare
2023-06-15 20:26 ` [PATCH V1 1/3] vl: start on wakeup request Steve Sistare
2023-06-15 20:26 ` [PATCH V1 2/3] migration: fix suspended runstate Steve Sistare
2023-06-20 21:46 ` Peter Xu
2023-06-21 19:15 ` Steven Sistare
2023-06-21 20:28 ` Peter Xu
2023-06-23 18:25 ` Steven Sistare
2023-06-23 19:56 ` Steven Sistare
2023-06-26 18:27 ` Peter Xu
2023-06-26 19:11 ` Peter Xu
2023-06-30 13:50 ` Steven Sistare
2023-07-26 20:18 ` Peter Xu
2023-08-14 18:53 ` Steven Sistare
2023-08-14 19:37 ` Peter Xu
2023-08-16 17:48 ` Steven Sistare
2023-08-17 18:19 ` Peter Xu
2023-08-24 20:53 ` Steven Sistare
2023-06-15 20:26 ` [PATCH V1 3/3] tests/qtest: live migration suspended state Steve Sistare
2023-06-21 16:45 ` Peter Xu
2023-06-21 19:39 ` Steven Sistare
2023-06-21 20:00 ` Peter Xu
2023-06-22 21:28 ` Steven Sistare