* [PATCH 0/4] ppc: Fix migration issues with XICS and quiesce
@ 2025-08-19 22:39 Fabiano Rosas
2025-08-19 22:39 ` [PATCH 1/4] hw/intc/xics: Add missing call to register vmstate_icp_server Fabiano Rosas
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Fabiano Rosas @ 2025-08-19 22:39 UTC (permalink / raw)
To: qemu-devel; +Cc: Nicholas Piggin, Thomas Huth, Fabian Vogt, Peter Xu
Fabian reports two issues with migration on ppc64le, one of which comes
with a proposed fix that I include in this series.
1) XICS migration causes a guest hang after migration due to missing
ICP server state. Fix is to bring back the vmstate_register call
for that device. Breaks backward migration, but it was already
non-functional anyway.
2) pseries migration causes a guest hang after migration due to a new
variable used to track the stopped state of vcpus, which is not
migrated. Fix is to migrate the new variable. To avoid breaking
backward migration, a compat property is added. Breaks forward
migration, a workaround is proposed.
I also added some functional test changes because there are currently
no tests that can detect the kind of hangs seen here. RFC on those,
feel free to nitpick.
Thanks
CI run: https://gitlab.com/farosas/qemu/-/pipelines/1992482993
Fabian Vogt (1):
hw/intc/xics: Add missing call to register vmstate_icp_server
Fabiano Rosas (3):
tests/functional: Extract migration code into a new class
tests/functional: Add a OS level migration test for pseries
target/ppc: Fix env->quiesced migration
hw/core/machine.c | 1 +
hw/intc/xics.c | 2 ++
target/ppc/cpu.h | 1 +
target/ppc/cpu_init.c | 7 +++++
target/ppc/machine.c | 40 ++++++++++++++++++++++++
tests/functional/qemu_test/migration.py | 40 ++++++++++++++++++++++++
tests/functional/test_migration.py | 24 ++-------------
tests/functional/test_ppc64_pseries.py | 41 +++++++++++++++++++++++++
8 files changed, 135 insertions(+), 21 deletions(-)
create mode 100644 tests/functional/qemu_test/migration.py
--
2.35.3
* [PATCH 1/4] hw/intc/xics: Add missing call to register vmstate_icp_server
2025-08-19 22:39 [PATCH 0/4] ppc: Fix migration issues with XICS and quiesce Fabiano Rosas
@ 2025-08-19 22:39 ` Fabiano Rosas
2025-09-18 15:28 ` Gautam Menghani
2025-08-19 22:39 ` [RFC PATCH 2/4] tests/functional: Extract migration code into a new class Fabiano Rosas
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Fabiano Rosas @ 2025-08-19 22:39 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Thomas Huth, Fabian Vogt, Peter Xu,
Philippe Mathieu-Daudé, Harsh Prateek Bora
From: Fabian Vogt <fvogt@suse.de>
An obsolete wrapper function with a workaround was removed entirely,
without restoring the call it wrapped.
Without this, the guest is stuck after savevm/loadvm.
Fixes: 24ee9229fe31 ("ppc/spapr: remove deprecated machine pseries-2.9")
Signed-off-by: Fabian Vogt <fvogt@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/6187781.lOV4Wx5bFT@fvogt-thinkpad
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
hw/intc/xics.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index d9a199e883..200710eb6c 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -335,6 +335,8 @@ static void icp_realize(DeviceState *dev, Error **errp)
return;
}
}
+
+ vmstate_register(NULL, icp->cs->cpu_index, &vmstate_icp_server, icp);
}
static void icp_unrealize(DeviceState *dev)
--
2.35.3
* [RFC PATCH 2/4] tests/functional: Extract migration code into a new class
2025-08-19 22:39 [PATCH 0/4] ppc: Fix migration issues with XICS and quiesce Fabiano Rosas
2025-08-19 22:39 ` [PATCH 1/4] hw/intc/xics: Add missing call to register vmstate_icp_server Fabiano Rosas
@ 2025-08-19 22:39 ` Fabiano Rosas
2025-08-20 6:50 ` Thomas Huth
2025-08-19 22:39 ` [RFC PATCH 3/4] tests/functional: Add a OS level migration test for pseries Fabiano Rosas
2025-08-19 22:39 ` [PATCH 4/4] target/ppc: Fix env->quiesced migration Fabiano Rosas
3 siblings, 1 reply; 12+ messages in thread
From: Fabiano Rosas @ 2025-08-19 22:39 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Thomas Huth, Fabian Vogt, Peter Xu,
Philippe Mathieu-Daudé, Daniel P. Berrangé
Move some of the code from test_migration.py to a new class so it can
be reused to invoke migrations from other tests.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
I see this conflicts with Thomas' series, I'll update accordingly.
---
tests/functional/qemu_test/migration.py | 40 +++++++++++++++++++++++++
tests/functional/test_migration.py | 24 ++-------------
2 files changed, 43 insertions(+), 21 deletions(-)
create mode 100644 tests/functional/qemu_test/migration.py
diff --git a/tests/functional/qemu_test/migration.py b/tests/functional/qemu_test/migration.py
new file mode 100644
index 0000000000..37988704e8
--- /dev/null
+++ b/tests/functional/qemu_test/migration.py
@@ -0,0 +1,40 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# Migration test
+#
+# Copyright (c) 2019 Red Hat, Inc.
+#
+# Authors:
+# Cleber Rosa <crosa@redhat.com>
+# Caio Carrara <ccarrara@redhat.com>
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later. See the COPYING file in the top-level directory.
+
+import time
+
+
+class Migration():
+
+    @staticmethod
+    def migration_finished(vm):
+        return vm.cmd('query-migrate')['status'] in ('completed', 'failed')
+
+    def assert_migration(self, test, src_vm, dst_vm, timeout):
+
+        end = time.monotonic() + timeout
+        while time.monotonic() < end and not self.migration_finished(src_vm):
+            time.sleep(0.1)
+
+        end = time.monotonic() + timeout
+        while time.monotonic() < end and not self.migration_finished(dst_vm):
+            time.sleep(0.1)
+
+        test.assertEqual(src_vm.cmd('query-migrate')['status'], 'completed')
+        test.assertEqual(dst_vm.cmd('query-migrate')['status'], 'completed')
+        test.assertEqual(dst_vm.cmd('query-status')['status'], 'running')
+        test.assertEqual(src_vm.cmd('query-status')['status'],'postmigrate')
+
+    def migrate(self, test, source_vm, dest_vm, src_uri, timeout):
+        source_vm.qmp('migrate', uri=src_uri)
+        self.assert_migration(test, source_vm, dest_vm, timeout)
diff --git a/tests/functional/test_migration.py b/tests/functional/test_migration.py
index c4393c3543..1c75a98330 100755
--- a/tests/functional/test_migration.py
+++ b/tests/functional/test_migration.py
@@ -15,6 +15,7 @@
import time
from qemu_test import QemuSystemTest, skipIfMissingCommands
+from qemu_test.migration import Migration
from qemu_test.ports import Ports
@@ -22,25 +23,6 @@ class MigrationTest(QemuSystemTest):
timeout = 10
- @staticmethod
- def migration_finished(vm):
- return vm.cmd('query-migrate')['status'] in ('completed', 'failed')
-
- def assert_migration(self, src_vm, dst_vm):
-
- end = time.monotonic() + self.timeout
- while time.monotonic() < end and not self.migration_finished(src_vm):
- time.sleep(0.1)
-
- end = time.monotonic() + self.timeout
- while time.monotonic() < end and not self.migration_finished(dst_vm):
- time.sleep(0.1)
-
- self.assertEqual(src_vm.cmd('query-migrate')['status'], 'completed')
- self.assertEqual(dst_vm.cmd('query-migrate')['status'], 'completed')
- self.assertEqual(dst_vm.cmd('query-status')['status'], 'running')
- self.assertEqual(src_vm.cmd('query-status')['status'],'postmigrate')
-
def select_machine(self):
target_machine = {
'aarch64': 'quanta-gsj',
@@ -67,8 +49,8 @@ def do_migrate(self, dest_uri, src_uri=None):
source_vm = self.get_vm(name="source-qemu")
source_vm.add_args('-nodefaults')
source_vm.launch()
- source_vm.qmp('migrate', uri=src_uri)
- self.assert_migration(source_vm, dest_vm)
+
+ Migration().migrate(self, source_vm, dest_vm, src_uri, self.timeout)
def _get_free_port(self, ports):
port = ports.find_free_port()
--
2.35.3
* [RFC PATCH 3/4] tests/functional: Add a OS level migration test for pseries
2025-08-19 22:39 [PATCH 0/4] ppc: Fix migration issues with XICS and quiesce Fabiano Rosas
2025-08-19 22:39 ` [PATCH 1/4] hw/intc/xics: Add missing call to register vmstate_icp_server Fabiano Rosas
2025-08-19 22:39 ` [RFC PATCH 2/4] tests/functional: Extract migration code into a new class Fabiano Rosas
@ 2025-08-19 22:39 ` Fabiano Rosas
2025-08-20 7:03 ` Thomas Huth
2025-08-19 22:39 ` [PATCH 4/4] target/ppc: Fix env->quiesced migration Fabiano Rosas
3 siblings, 1 reply; 12+ messages in thread
From: Fabiano Rosas @ 2025-08-19 22:39 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Thomas Huth, Fabian Vogt, Peter Xu,
Harsh Prateek Bora
There's currently no OS level test for ppc64le. Add one such test by
reusing the boot level tests that are already present.
The test boots the source machine, waits for it to reach a mid-boot
message, migrates and checks that the destination has reached the
final boot message (VFS error due to no disk).
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
tests/functional/test_ppc64_pseries.py | 41 ++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/tests/functional/test_ppc64_pseries.py b/tests/functional/test_ppc64_pseries.py
index 67057934e8..7a7e0fe8ae 100755
--- a/tests/functional/test_ppc64_pseries.py
+++ b/tests/functional/test_ppc64_pseries.py
@@ -9,6 +9,8 @@
from qemu_test import QemuSystemTest, Asset
from qemu_test import wait_for_console_pattern
+from qemu_test.migration import Migration
+from qemu_test.ports import Ports
class pseriesMachine(QemuSystemTest):
@@ -87,5 +89,44 @@ def test_ppc64_linux_big_boot(self):
wait_for_console_pattern(self, console_pattern, self.panic_message)
wait_for_console_pattern(self, self.good_message, self.panic_message)
+    def test_ppc64_linux_migration(self):
+        with Ports() as ports:
+            port = ports.find_free_port()
+            if port is None:
+                self.skipTest('Failed to find a free port')
+            uri = 'tcp:localhost:%u' % port
+
+        kernel_path = self.ASSET_KERNEL.fetch()
+        kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE
+
+        self.set_machine('pseries')
+
+        dest_vm = self.get_vm('-incoming', uri, name="dest-qemu")
+        dest_vm.add_args('-smp', '4')
+        dest_vm.add_args('-nodefaults')
+        dest_vm.add_args('-kernel', kernel_path,
+                         '-append', kernel_command_line)
+        dest_vm.set_console()
+        dest_vm.launch()
+
+        source_vm = self.get_vm(name="source-qemu")
+        source_vm.add_args('-smp', '4')
+        source_vm.add_args('-nodefaults')
+        source_vm.add_args('-kernel', kernel_path,
+                           '-append', kernel_command_line)
+        source_vm.set_console()
+        source_vm.launch()
+
+        # ensure the boot has reached Linux
+        console_pattern = 'smp: Brought up 1 node, 4 CPUs'
+        wait_for_console_pattern(self, console_pattern, self.panic_message,
+                                 vm=source_vm)
+
+        Migration().migrate(self, source_vm, dest_vm, uri, self.timeout)
+
+        # ensure the boot proceeds after migration
+        wait_for_console_pattern(self, self.good_message, self.panic_message,
+                                 vm=dest_vm)
+
if __name__ == '__main__':
QemuSystemTest.main()
--
2.35.3
* [PATCH 4/4] target/ppc: Fix env->quiesced migration
2025-08-19 22:39 [PATCH 0/4] ppc: Fix migration issues with XICS and quiesce Fabiano Rosas
` (2 preceding siblings ...)
2025-08-19 22:39 ` [RFC PATCH 3/4] tests/functional: Add a OS level migration test for pseries Fabiano Rosas
@ 2025-08-19 22:39 ` Fabiano Rosas
2025-08-20 6:55 ` Thomas Huth
3 siblings, 1 reply; 12+ messages in thread
From: Fabiano Rosas @ 2025-08-19 22:39 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Thomas Huth, Fabian Vogt, Peter Xu,
Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
Yanan Wang, Zhao Liu, Chinmay Rath
The commit referenced (from QEMU 10.0) has changed the way the pseries
machine marks a cpu as quiesced. Previously, the cpu->halted value
from QEMU common cpu code was (incorrectly) used. With the fix, the
env->quiesced variable starts being used, which improves on the
original situation, but also causes a side effect after migration:
The env->quiesced is set at reset and never migrated, which causes the
destination QEMU to stop delivering interrupts and hang the machine.
To fix the issue from this point on, start migrating the env->quiesced
value.
For QEMU versions < 10.0, sending the new element on the stream would
cause migration to be aborted, so add the appropriate compatibility
property to omit the new subsection.
Independently of this patch, all migrations from QEMU versions < 10.0
will result in a hang since the older QEMU never migrates
env->quiesced. This is bad because it leaves machines already running
on the old QEMU without a migration path into newer versions.
As a workaround, clear env->quiesced in the new QEMU whenever
cpu->halted is also clear. This assumes rtas_stop_self() always sets
both flags at the same time. Migrations during secondaries bringup
(i.e. before rtas-start-cpu) will still cause a hang, but those are
early enough that requiring reboot would not be unreasonable.
Note that this was tested with -cpu power9 and -machine ic-mode=xive
due to another bug affecting migration of XICS guests. Tested both
forward and backward migration and savevm/loadvm from 9.2 and 10.0.
Reported-by: Fabian Vogt <fvogt@suse.de>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3079
Fixes: fb802acdc8b ("ppc/spapr: Fix RTAS stopped state")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
The choice of PowerPCCPU to hold the compat property is dubious. This
only affects pseries, but it seems like a layering violation to access
SpaprMachine from target/ppc/, suggestions welcome.
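
For reviewers who want to reason about the cases, the logic in this
patch can be approximated by the toy Python model below. It is only a
sketch under stated assumptions: ToyCpu and migrate() are made-up names,
the three fields stand in for the rtas-stopped-state property,
cpu->halted and env->quiesced, and a destination vcpu is assumed to
start with quiesced set, as it is at reset. It does not model LPCR or
any other migrated state.

```python
from dataclasses import dataclass


@dataclass
class ToyCpu:
    """Hypothetical stand-in for PowerPCCPU; not QEMU code."""
    rtas_stopped_state: bool = True  # compat property, off for old machine types
    halted: bool = False             # stands in for cpu->halted
    quiesced: bool = True            # stands in for env->quiesced (set at reset)


def rtas_stopped_needed(src):
    # Mirrors the .needed callback in the patch: the subsection is sent
    # only when the property is on and the vcpu is not quiesced.
    return src.rtas_stopped_state and not src.quiesced


def migrate(src, dst):
    """Load src state into a freshly reset dst, then apply the post-load step."""
    dst.halted = src.halted
    if rtas_stopped_needed(src):
        dst.quiesced = src.quiesced
    # Mirrors the cpu_post_load() workaround: with the property off, a
    # vcpu that is not halted cannot be in RTAS stopped state.
    if not dst.rtas_stopped_state and not dst.halted:
        dst.quiesced = False
    return dst
```

In this model a running vcpu ends up with quiesced cleared in both the
new-property and compat cases, while a halted vcpu on a compat machine
type keeps the reset value, which corresponds to the limitation
described in the commit message.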
---
hw/core/machine.c | 1 +
target/ppc/cpu.h | 1 +
target/ppc/cpu_init.c | 7 +++++++
target/ppc/machine.c | 40 ++++++++++++++++++++++++++++++++++++++++
4 files changed, 49 insertions(+)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index bd47527479..ea83c0876b 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -42,6 +42,7 @@ GlobalProperty hw_compat_10_0[] = {
{ "vfio-pci", "x-migration-load-config-after-iter", "off" },
{ "ramfb", "use-legacy-x86-rom", "true"},
{ "vfio-pci-nohotplug", "use-legacy-x86-rom", "true" },
+ { "powerpc64-cpu", "rtas-stopped-state", "false" },
};
const size_t hw_compat_10_0_len = G_N_ELEMENTS(hw_compat_10_0);
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 6b90543811..8ff453024b 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1470,6 +1470,7 @@ struct ArchCPU {
void *machine_data;
int32_t node_id; /* NUMA node this CPU belongs to */
PPCHash64Options *hash64_opts;
+ bool rtas_stopped_state;
/* Those resources are used only during code translation */
/* opcode handlers */
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index a0e77f2673..4380c6eb14 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -55,6 +55,11 @@
/* #define PPC_DEBUG_SPR */
/* #define USE_APPLE_GDB */
+static const Property powerpc_cpu_properties[] = {
+ DEFINE_PROP_BOOL("rtas-stopped-state", PowerPCCPU,
+ rtas_stopped_state, true),
+};
+
static inline void vscr_init(CPUPPCState *env, uint32_t val)
{
/* Altivec always uses round-to-nearest */
@@ -7525,6 +7530,8 @@ static void ppc_cpu_class_init(ObjectClass *oc, const void *data)
&pcc->parent_unrealize);
pcc->pvr_match = ppc_pvr_match_default;
+ device_class_set_props(dc, powerpc_cpu_properties);
+
resettable_class_set_parent_phases(rc, NULL, ppc_cpu_reset_hold, NULL,
&pcc->parent_phases);
diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index d72e5ecb94..8797233ebe 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -257,6 +257,23 @@ static int cpu_post_load(void *opaque, int version_id)
ppc_store_sdr1(env, env->spr[SPR_SDR1]);
}
+ if (!cpu->rtas_stopped_state) {
+ /*
+ * The source QEMU doesn't have fb802acdc8 and still uses halt
+ * + PM bits in LPCR to implement RTAS stopped state. The new
+ * QEMU will have put the newly created vcpus in that state,
+ * waiting for the start-cpu RTAS call. Clear the quiesced
+ * flag if possible, otherwise the newly-loaded machine will
+ * hang indefinitely due to quiesced state ignoring
+ * interrupts.
+ */
+
+ if (!CPU(cpu)->halted) {
+ /* not halted, so definitely not in RTAS stopped state */
+ env->quiesced = 0;
+ }
+ }
+
post_load_update_msr(env);
if (tcg_enabled()) {
@@ -649,6 +666,28 @@ static const VMStateDescription vmstate_reservation = {
}
};
+static bool rtas_stopped_needed(void *opaque)
+{
+ PowerPCCPU *cpu = opaque;
+
+ return cpu->rtas_stopped_state && !cpu->env.quiesced;
+}
+
+static const VMStateDescription vmstate_rtas_stopped = {
+ .name = "cpu/rtas_stopped",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .needed = rtas_stopped_needed,
+ .fields = (const VMStateField[]) {
+ /*
+ * "RTAS stopped" state, independent of halted state. For QEMU
+ * < 10.0, this is taken from cpu->halted at cpu_post_load()
+ */
+ VMSTATE_BOOL(env.quiesced, PowerPCCPU),
+ VMSTATE_END_OF_LIST()
+ }
+};
+
#ifdef TARGET_PPC64
static bool bhrb_needed(void *opaque)
{
@@ -715,6 +754,7 @@ const VMStateDescription vmstate_ppc_cpu = {
&vmstate_tlbmas,
&vmstate_compat,
&vmstate_reservation,
+ &vmstate_rtas_stopped,
NULL
}
};
--
2.35.3
* Re: [RFC PATCH 2/4] tests/functional: Extract migration code into a new class
2025-08-19 22:39 ` [RFC PATCH 2/4] tests/functional: Extract migration code into a new class Fabiano Rosas
@ 2025-08-20 6:50 ` Thomas Huth
0 siblings, 0 replies; 12+ messages in thread
From: Thomas Huth @ 2025-08-20 6:50 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel
Cc: Nicholas Piggin, Fabian Vogt, Peter Xu,
Philippe Mathieu-Daudé, Daniel P. Berrangé
On 20/08/2025 00.39, Fabiano Rosas wrote:
> Move some of the code from test_migration.py to a new class so it can
> be reused to invoke migrations from other tests.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> I see this conflicts with Thomas' series, I'll update accordingly.
> ---
> tests/functional/qemu_test/migration.py | 40 +++++++++++++++++++++++++
> tests/functional/test_migration.py | 24 ++-------------
> 2 files changed, 43 insertions(+), 21 deletions(-)
> create mode 100644 tests/functional/qemu_test/migration.py
>
> diff --git a/tests/functional/qemu_test/migration.py b/tests/functional/qemu_test/migration.py
> new file mode 100644
> index 0000000000..37988704e8
> --- /dev/null
> +++ b/tests/functional/qemu_test/migration.py
> @@ -0,0 +1,40 @@
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +#
> +# Migration test
> +#
> +# Copyright (c) 2019 Red Hat, Inc.
> +#
> +# Authors:
> +# Cleber Rosa <crosa@redhat.com>
> +# Caio Carrara <ccarrara@redhat.com>
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2 or
> +# later. See the COPYING file in the top-level directory.
> +
> +import time
> +
> +
> +class Migration():
> +
> + @staticmethod
> + def migration_finished(vm):
> + return vm.cmd('query-migrate')['status'] in ('completed', 'failed')
> +
> + def assert_migration(self, test, src_vm, dst_vm, timeout):
> +
> + end = time.monotonic() + timeout
> + while time.monotonic() < end and not self.migration_finished(src_vm):
> + time.sleep(0.1)
> +
> + end = time.monotonic() + timeout
> + while time.monotonic() < end and not self.migration_finished(dst_vm):
> + time.sleep(0.1)
> +
> + test.assertEqual(src_vm.cmd('query-migrate')['status'], 'completed')
> + test.assertEqual(dst_vm.cmd('query-migrate')['status'], 'completed')
> + test.assertEqual(dst_vm.cmd('query-status')['status'], 'running')
> + test.assertEqual(src_vm.cmd('query-status')['status'],'postmigrate')
> +
> + def migrate(self, test, source_vm, dest_vm, src_uri, timeout):
> + source_vm.qmp('migrate', uri=src_uri)
> + self.assert_migration(test, source_vm, dest_vm, timeout)
> diff --git a/tests/functional/test_migration.py b/tests/functional/test_migration.py
> index c4393c3543..1c75a98330 100755
> --- a/tests/functional/test_migration.py
> +++ b/tests/functional/test_migration.py
> @@ -15,6 +15,7 @@
> import time
I guess you could drop the "import time" here now?
Apart from that:
Reviewed-by: Thomas Huth <thuth@redhat.com>
> from qemu_test import QemuSystemTest, skipIfMissingCommands
> +from qemu_test.migration import Migration
> from qemu_test.ports import Ports
>
>
> @@ -22,25 +23,6 @@ class MigrationTest(QemuSystemTest):
>
> timeout = 10
>
> - @staticmethod
> - def migration_finished(vm):
> - return vm.cmd('query-migrate')['status'] in ('completed', 'failed')
> -
> - def assert_migration(self, src_vm, dst_vm):
> -
> - end = time.monotonic() + self.timeout
> - while time.monotonic() < end and not self.migration_finished(src_vm):
> - time.sleep(0.1)
> -
> - end = time.monotonic() + self.timeout
> - while time.monotonic() < end and not self.migration_finished(dst_vm):
> - time.sleep(0.1)
> -
> - self.assertEqual(src_vm.cmd('query-migrate')['status'], 'completed')
> - self.assertEqual(dst_vm.cmd('query-migrate')['status'], 'completed')
> - self.assertEqual(dst_vm.cmd('query-status')['status'], 'running')
> - self.assertEqual(src_vm.cmd('query-status')['status'],'postmigrate')
> -
> def select_machine(self):
> target_machine = {
> 'aarch64': 'quanta-gsj',
> @@ -67,8 +49,8 @@ def do_migrate(self, dest_uri, src_uri=None):
> source_vm = self.get_vm(name="source-qemu")
> source_vm.add_args('-nodefaults')
> source_vm.launch()
> - source_vm.qmp('migrate', uri=src_uri)
> - self.assert_migration(source_vm, dest_vm)
> +
> + Migration().migrate(self, source_vm, dest_vm, src_uri, self.timeout)
>
> def _get_free_port(self, ports):
> port = ports.find_free_port()
* Re: [PATCH 4/4] target/ppc: Fix env->quiesced migration
2025-08-19 22:39 ` [PATCH 4/4] target/ppc: Fix env->quiesced migration Fabiano Rosas
@ 2025-08-20 6:55 ` Thomas Huth
2025-08-20 15:07 ` Fabiano Rosas
0 siblings, 1 reply; 12+ messages in thread
From: Thomas Huth @ 2025-08-20 6:55 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel
Cc: Nicholas Piggin, Fabian Vogt, Peter Xu, Eduardo Habkost,
Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
Zhao Liu, Chinmay Rath
On 20/08/2025 00.39, Fabiano Rosas wrote:
> The commit referenced (from QEMU 10.0) has changed the way the pseries
> machine marks a cpu as quiesced. Previously, the cpu->halted value
> from QEMU common cpu code was (incorrectly) used. With the fix, the
> env->quiesced variable starts being used, which improves on the
> original situation, but also causes a side effect after migration:
>
> The env->quiesced is set at reset and never migrated, which causes the
> destination QEMU to stop delivering interrupts and hang the machine.
>
> To fix the issue from this point on, start migrating the env->quiesced
> value.
>
> For QEMU versions < 10.0, sending the new element on the stream would
> cause migration to be aborted, so add the appropriate compatibility
> property to omit the new subsection.
>
> Independently of this patch, all migrations from QEMU versions < 10.0
> will result in a hang since the older QEMU never migrates
> env->quiesced. This is bad because it leaves machines already running
> on the old QEMU without a migration path into newer versions.
>
> As a workaround, clear env->quiesced in the new QEMU whenever
> cpu->halted is also clear. This assumes rtas_stop_self() always sets
> both flags at the same time. Migrations during secondaries bringup
> (i.e. before rtas-start-cpu) will still cause a hang, but those are
> early enough that requiring reboot would not be unreasonable.
>
> Note that this was tested with -cpu power9 and -machine ic-mode=xive
> due to another bug affecting migration of XICS guests. Tested both
> forward and backward migration and savevm/loadvm from 9.2 and 10.0.
>
> Reported-by: Fabian Vogt <fvogt@suse.de>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3079
> Fixes: fb802acdc8b ("ppc/spapr: Fix RTAS stopped state")
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> The choice of PowerPCCPU to hold the compat property is dubious. This
> only affects pseries, but it seems like a layering violation to access
> SpaprMachine from target/ppc/, suggestions welcome.
> ---
> hw/core/machine.c | 1 +
> target/ppc/cpu.h | 1 +
> target/ppc/cpu_init.c | 7 +++++++
> target/ppc/machine.c | 40 ++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 49 insertions(+)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index bd47527479..ea83c0876b 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -42,6 +42,7 @@ GlobalProperty hw_compat_10_0[] = {
> { "vfio-pci", "x-migration-load-config-after-iter", "off" },
> { "ramfb", "use-legacy-x86-rom", "true"},
> { "vfio-pci-nohotplug", "use-legacy-x86-rom", "true" },
> + { "powerpc64-cpu", "rtas-stopped-state", "false" },
This is specific to ppc, so it should not go into the generic hw_compat_* array.
Please define a spapr_compat_10_0 array in
spapr_machine_10_0_class_options() and do another compat_props_add() for
that array there. (Similar to what is done for TYPE_SPAPR_PCI_HOST_BRIDGE in
spapr_machine_5_0_class_options() for example)
Thanks,
Thomas
> };
> const size_t hw_compat_10_0_len = G_N_ELEMENTS(hw_compat_10_0);
>
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 6b90543811..8ff453024b 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1470,6 +1470,7 @@ struct ArchCPU {
> void *machine_data;
> int32_t node_id; /* NUMA node this CPU belongs to */
> PPCHash64Options *hash64_opts;
> + bool rtas_stopped_state;
>
> /* Those resources are used only during code translation */
> /* opcode handlers */
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index a0e77f2673..4380c6eb14 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -55,6 +55,11 @@
> /* #define PPC_DEBUG_SPR */
> /* #define USE_APPLE_GDB */
>
> +static const Property powerpc_cpu_properties[] = {
> + DEFINE_PROP_BOOL("rtas-stopped-state", PowerPCCPU,
> + rtas_stopped_state, true),
> +};
> +
> static inline void vscr_init(CPUPPCState *env, uint32_t val)
> {
> /* Altivec always uses round-to-nearest */
> @@ -7525,6 +7530,8 @@ static void ppc_cpu_class_init(ObjectClass *oc, const void *data)
> &pcc->parent_unrealize);
> pcc->pvr_match = ppc_pvr_match_default;
>
> + device_class_set_props(dc, powerpc_cpu_properties);
> +
> resettable_class_set_parent_phases(rc, NULL, ppc_cpu_reset_hold, NULL,
> &pcc->parent_phases);
>
> diff --git a/target/ppc/machine.c b/target/ppc/machine.c
> index d72e5ecb94..8797233ebe 100644
> --- a/target/ppc/machine.c
> +++ b/target/ppc/machine.c
> @@ -257,6 +257,23 @@ static int cpu_post_load(void *opaque, int version_id)
> ppc_store_sdr1(env, env->spr[SPR_SDR1]);
> }
>
> + if (!cpu->rtas_stopped_state) {
> + /*
> + * The source QEMU doesn't have fb802acdc8 and still uses halt
> + * + PM bits in LPCR to implement RTAS stopped state. The new
> + * QEMU will have put the newly created vcpus in that state,
> + * waiting for the start-cpu RTAS call. Clear the quiesced
> + * flag if possible, otherwise the newly-loaded machine will
> + * hang indefinitely due to quiesced state ignoring
> + * interrupts.
> + */
> +
> + if (!CPU(cpu)->halted) {
> + /* not halted, so definitely not in RTAS stopped state */
> + env->quiesced = 0;
> + }
> + }
> +
> post_load_update_msr(env);
>
> if (tcg_enabled()) {
> @@ -649,6 +666,28 @@ static const VMStateDescription vmstate_reservation = {
> }
> };
>
> +static bool rtas_stopped_needed(void *opaque)
> +{
> + PowerPCCPU *cpu = opaque;
> +
> + return cpu->rtas_stopped_state && !cpu->env.quiesced;
> +}
> +
> +static const VMStateDescription vmstate_rtas_stopped = {
> + .name = "cpu/rtas_stopped",
> + .version_id = 1,
> + .minimum_version_id = 1,
> + .needed = rtas_stopped_needed,
> + .fields = (const VMStateField[]) {
> + /*
> + * "RTAS stopped" state, independent of halted state. For QEMU
> + * < 10.0, this is taken from cpu->halted at cpu_post_load()
> + */
> + VMSTATE_BOOL(env.quiesced, PowerPCCPU),
> + VMSTATE_END_OF_LIST()
> + }
> +};
> +
> #ifdef TARGET_PPC64
> static bool bhrb_needed(void *opaque)
> {
> @@ -715,6 +754,7 @@ const VMStateDescription vmstate_ppc_cpu = {
> &vmstate_tlbmas,
> &vmstate_compat,
> &vmstate_reservation,
> + &vmstate_rtas_stopped,
> NULL
> }
> };
* Re: [RFC PATCH 3/4] tests/functional: Add a OS level migration test for pseries
2025-08-19 22:39 ` [RFC PATCH 3/4] tests/functional: Add a OS level migration test for pseries Fabiano Rosas
@ 2025-08-20 7:03 ` Thomas Huth
2025-08-20 15:08 ` Fabiano Rosas
0 siblings, 1 reply; 12+ messages in thread
From: Thomas Huth @ 2025-08-20 7:03 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel
Cc: Nicholas Piggin, Fabian Vogt, Peter Xu, Harsh Prateek Bora
On 20/08/2025 00.39, Fabiano Rosas wrote:
> There's currently no OS level test for ppc64le. Add one such test by
> reusing the boot level tests that are already present.
>
> The test boots the source machine, waits for it to reach a mid-boot
> message, migrates and checks that the destination has reached the
> final boot message (VFS error due to no disk).
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> tests/functional/test_ppc64_pseries.py | 41 ++++++++++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
> diff --git a/tests/functional/test_ppc64_pseries.py b/tests/functional/test_ppc64_pseries.py
> index 67057934e8..7a7e0fe8ae 100755
> --- a/tests/functional/test_ppc64_pseries.py
> +++ b/tests/functional/test_ppc64_pseries.py
> @@ -9,6 +9,8 @@
>
> from qemu_test import QemuSystemTest, Asset
> from qemu_test import wait_for_console_pattern
> +from qemu_test.migration import Migration
> +from qemu_test.ports import Ports
>
> class pseriesMachine(QemuSystemTest):
>
> @@ -87,5 +89,44 @@ def test_ppc64_linux_big_boot(self):
> wait_for_console_pattern(self, console_pattern, self.panic_message)
> wait_for_console_pattern(self, self.good_message, self.panic_message)
>
> + def test_ppc64_linux_migration(self):
> + with Ports() as ports:
> + port = ports.find_free_port()
> + if port is None:
> + self.skipTest('Failed to find a free port')
> + uri = 'tcp:localhost:%u' % port
Hi,
this is not how to use the context for Ports: Once the "with" block is left,
the locking for the free port will be gone and you're subject to a race
condition with other tests running in parallel (see the __enter__ and
__exit__ methods in tests/functional/qemu_test/ports.py ... and yes, there
should be more documentation for this).
You've got to put everything up to the point where QEMU takes the port into
the "with" block, i.e. everything up to and including the
Migration().migrate() line.
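
The difference can be illustrated with a small self-contained sketch.
ToyPorts below is a hypothetical stand-in, not the real class from
tests/functional/qemu_test/ports.py; it only assumes what is described
above, namely that a port stays reserved while the "with" block is
active and is released on exit.

```python
class ToyPorts:
    """Hypothetical stand-in for qemu_test.ports.Ports, not the real class."""
    _reserved = set()  # ports currently held by any ToyPorts context

    def __enter__(self):
        self._mine = []
        return self

    def __exit__(self, exc_type, exc, tb):
        # Leaving the block drops every reservation made through it.
        for port in self._mine:
            ToyPorts._reserved.discard(port)
        return False

    def find_free_port(self, start=32768, end=32778):
        for port in range(start, end):
            if port not in ToyPorts._reserved:
                ToyPorts._reserved.add(port)
                self._mine.append(port)
                return port
        return None


def racy():
    with ToyPorts() as ports:
        port = ports.find_free_port()
    # The reservation is already gone here; another test could take the port.
    return port, port in ToyPorts._reserved


def safe():
    with ToyPorts() as ports:
        port = ports.find_free_port()
        # Everything that needs the port (launch, migrate) belongs here.
        held = port in ToyPorts._reserved
    return port, held
```

In racy() the port is no longer reserved by the time it would be used,
while in safe() it is still held at the point of use.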
Thomas
> + kernel_path = self.ASSET_KERNEL.fetch()
> + kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE
> +
> + self.set_machine('pseries')
> +
> + dest_vm = self.get_vm('-incoming', uri, name="dest-qemu")
> + dest_vm.add_args('-smp', '4')
> + dest_vm.add_args('-nodefaults')
> + dest_vm.add_args('-kernel', kernel_path,
> + '-append', kernel_command_line)
> + dest_vm.set_console()
> + dest_vm.launch()
> +
> + source_vm = self.get_vm(name="source-qemu")
> + source_vm.add_args('-smp', '4')
> + source_vm.add_args('-nodefaults')
> + source_vm.add_args('-kernel', kernel_path,
> + '-append', kernel_command_line)
> + source_vm.set_console()
> + source_vm.launch()
> +
> + # ensure the boot has reached Linux
> + console_pattern = 'smp: Brought up 1 node, 4 CPUs'
> + wait_for_console_pattern(self, console_pattern, self.panic_message,
> + vm=source_vm)
> +
> + Migration().migrate(self, source_vm, dest_vm, uri, self.timeout)
> +
> + # ensure the boot proceeds after migration
> + wait_for_console_pattern(self, self.good_message, self.panic_message,
> + vm=dest_vm)
> +
> if __name__ == '__main__':
> QemuSystemTest.main()
* Re: [PATCH 4/4] target/ppc: Fix env->quiesced migration
2025-08-20 6:55 ` Thomas Huth
@ 2025-08-20 15:07 ` Fabiano Rosas
2025-08-20 15:20 ` Thomas Huth
0 siblings, 1 reply; 12+ messages in thread
From: Fabiano Rosas @ 2025-08-20 15:07 UTC (permalink / raw)
To: Thomas Huth, qemu-devel
Cc: Nicholas Piggin, Fabian Vogt, Peter Xu, Eduardo Habkost,
Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
Zhao Liu, Chinmay Rath
Thomas Huth <thuth@redhat.com> writes:
> On 20/08/2025 00.39, Fabiano Rosas wrote:
>> The commit referenced (from QEMU 10.0) has changed the way the pseries
>> machine marks a cpu as quiesced. Previously, the cpu->halted value
>> from QEMU common cpu code was (incorrectly) used. With the fix, the
>> env->quiesced variable starts being used, which improves on the
>> original situation, but also causes a side effect after migration:
>>
>> The env->quiesced is set at reset and never migrated, which causes the
>> destination QEMU to stop delivering interrupts and hang the machine.
>>
>> To fix the issue from this point on, start migrating the env->quiesced
>> value.
>>
>> For QEMU versions < 10.0, sending the new element on the stream would
>> cause migration to be aborted, so add the appropriate compatibility
>> property to omit the new subsection.
>>
>> Independently of this patch, all migrations from QEMU versions < 10.0
>> will result in a hang since the older QEMU never migrates
>> env->quiesced. This is bad because it leaves machines already running
>> on the old QEMU without a migration path into newer versions.
>>
>> As a workaround, clear env->quiesced in the new QEMU whenever
>> cpu->halted is also clear. This assumes rtas_stop_self() always sets
>> both flags at the same time. Migrations during secondaries bringup
>> (i.e. before rtas-start-cpu) will still cause a hang, but those are
>> early enough that requiring reboot would not be unreasonable.
>>
>> Note that this was tested with -cpu power9 and -machine ic-mode=xive
>> due to another bug affecting migration of XICS guests. Tested both
>> forward and backward migration and savevm/loadvm from 9.2 and 10.0.
>>
>> Reported-by: Fabian Vogt <fvogt@suse.de>
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3079
>> Fixes: fb802acdc8b ("ppc/spapr: Fix RTAS stopped state")
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>> The choice of PowerPCCPU to hold the compat property is dubious. This
>> only affects pseries, but it seems like a layering violation to access
>> SpaprMachine from target/ppc/, suggestions welcome.
>> ---
>> hw/core/machine.c | 1 +
>> target/ppc/cpu.h | 1 +
>> target/ppc/cpu_init.c | 7 +++++++
>> target/ppc/machine.c | 40 ++++++++++++++++++++++++++++++++++++++++
>> 4 files changed, 49 insertions(+)
>>
>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>> index bd47527479..ea83c0876b 100644
>> --- a/hw/core/machine.c
>> +++ b/hw/core/machine.c
>> @@ -42,6 +42,7 @@ GlobalProperty hw_compat_10_0[] = {
>> { "vfio-pci", "x-migration-load-config-after-iter", "off" },
>> { "ramfb", "use-legacy-x86-rom", "true"},
>> { "vfio-pci-nohotplug", "use-legacy-x86-rom", "true" },
>> + { "powerpc64-cpu", "rtas-stopped-state", "false" },
>
> This is specific to ppc, so it should not go into the generic hw_compat_* array.
>
So arm-cpu in hw_compat_9_2 should not be there?
> Please define a spapr_compat_10_0 array in
> spapr_machine_10_0_class_options() and do another compat_props_add() for
> that array there. (Similar to what is done for TYPE_SPAPR_PCI_HOST_BRIDGE in
> spapr_machine_5_0_class_options() for example)
>
Ok, thanks for the pointer.
> Thanks,
> Thomas
>
>
>> };
>> const size_t hw_compat_10_0_len = G_N_ELEMENTS(hw_compat_10_0);
>>
>> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
>> index 6b90543811..8ff453024b 100644
>> --- a/target/ppc/cpu.h
>> +++ b/target/ppc/cpu.h
>> @@ -1470,6 +1470,7 @@ struct ArchCPU {
>> void *machine_data;
>> int32_t node_id; /* NUMA node this CPU belongs to */
>> PPCHash64Options *hash64_opts;
>> + bool rtas_stopped_state;
>>
>> /* Those resources are used only during code translation */
>> /* opcode handlers */
>> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
>> index a0e77f2673..4380c6eb14 100644
>> --- a/target/ppc/cpu_init.c
>> +++ b/target/ppc/cpu_init.c
>> @@ -55,6 +55,11 @@
>> /* #define PPC_DEBUG_SPR */
>> /* #define USE_APPLE_GDB */
>>
>> +static const Property powerpc_cpu_properties[] = {
>> + DEFINE_PROP_BOOL("rtas-stopped-state", PowerPCCPU,
>> + rtas_stopped_state, true),
>> +};
>> +
>> static inline void vscr_init(CPUPPCState *env, uint32_t val)
>> {
>> /* Altivec always uses round-to-nearest */
>> @@ -7525,6 +7530,8 @@ static void ppc_cpu_class_init(ObjectClass *oc, const void *data)
>> &pcc->parent_unrealize);
>> pcc->pvr_match = ppc_pvr_match_default;
>>
>> + device_class_set_props(dc, powerpc_cpu_properties);
>> +
>> resettable_class_set_parent_phases(rc, NULL, ppc_cpu_reset_hold, NULL,
>> &pcc->parent_phases);
>>
>> diff --git a/target/ppc/machine.c b/target/ppc/machine.c
>> index d72e5ecb94..8797233ebe 100644
>> --- a/target/ppc/machine.c
>> +++ b/target/ppc/machine.c
>> @@ -257,6 +257,23 @@ static int cpu_post_load(void *opaque, int version_id)
>> ppc_store_sdr1(env, env->spr[SPR_SDR1]);
>> }
>>
>> + if (!cpu->rtas_stopped_state) {
>> + /*
>> + * The source QEMU doesn't have fb802acdc8 and still uses halt
>> + * + PM bits in LPCR to implement RTAS stopped state. The new
>> + * QEMU will have put the newly created vcpus in that state,
>> + * waiting for the start-cpu RTAS call. Clear the quiesced
>> + * flag if possible, otherwise the newly-loaded machine will
>> + * hang indefinitely due to quiesced state ignoring
>> + * interrupts.
>> + */
>> +
>> + if (!CPU(cpu)->halted) {
>> + /* not halted, so definitely not in RTAS stopped state */
>> + env->quiesced = 0;
>> + }
>> + }
>> +
>> post_load_update_msr(env);
>>
>> if (tcg_enabled()) {
>> @@ -649,6 +666,28 @@ static const VMStateDescription vmstate_reservation = {
>> }
>> };
>>
>> +static bool rtas_stopped_needed(void *opaque)
>> +{
>> + PowerPCCPU *cpu = opaque;
>> +
>> + return cpu->rtas_stopped_state && !cpu->env.quiesced;
>> +}
>> +
>> +static const VMStateDescription vmstate_rtas_stopped = {
>> + .name = "cpu/rtas_stopped",
>> + .version_id = 1,
>> + .minimum_version_id = 1,
>> + .needed = rtas_stopped_needed,
>> + .fields = (const VMStateField[]) {
>> + /*
>> + * "RTAS stopped" state, independent of halted state. For QEMU
>> + * < 10.0, this is taken from cpu->halted at cpu_post_load()
>> + */
>> + VMSTATE_BOOL(env.quiesced, PowerPCCPU),
>> + VMSTATE_END_OF_LIST()
>> + }
>> +};
>> +
>> #ifdef TARGET_PPC64
>> static bool bhrb_needed(void *opaque)
>> {
>> @@ -715,6 +754,7 @@ const VMStateDescription vmstate_ppc_cpu = {
>> &vmstate_tlbmas,
>> &vmstate_compat,
>> &vmstate_reservation,
>> + &vmstate_rtas_stopped,
>> NULL
>> }
>> };
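The interaction between the compat property, the subsection's .needed callback and the cpu_post_load() workaround in the quoted patch can be hard to follow from the diff alone. The following is a rough Python model of that decision logic only. ToyCPU, subsection_needed and toy_migrate are hypothetical names; the model assumes that the halted flag always reaches the destination and that the destination starts with quiesced set, as the commit message describes for reset. It is a sketch for reasoning about the cases, not QEMU code.

```python
from dataclasses import dataclass


@dataclass
class ToyCPU:
    # Models the "rtas-stopped-state" compat property: True by default,
    # False for machine types that must stay compatible with older QEMU.
    rtas_stopped_state: bool = True
    halted: bool = False
    # Per the commit message, quiesced is set at reset.
    quiesced: bool = True


def subsection_needed(cpu):
    """Mirrors rtas_stopped_needed() from the quoted patch."""
    return cpu.rtas_stopped_state and not cpu.quiesced


def toy_migrate(src, dst):
    """Apply the modelled load sequence to a freshly reset destination."""
    # Assumption of this model: the halted flag is always transferred.
    dst.halted = src.halted

    # The subsection is only on the stream when the source needs it.
    if subsection_needed(src):
        dst.quiesced = src.quiesced

    # Mirrors the cpu_post_load() workaround: without the property, a CPU
    # that is not halted cannot be in the RTAS stopped state.
    if not dst.rtas_stopped_state and not dst.halted:
        dst.quiesced = False
    return dst


if __name__ == "__main__":
    cases = {
        "new machine type, running vcpu":
            (ToyCPU(quiesced=False), ToyCPU()),
        "new machine type, stopped vcpu":
            (ToyCPU(halted=True, quiesced=True), ToyCPU()),
        "compat machine type, running vcpu":
            (ToyCPU(rtas_stopped_state=False, quiesced=False),
             ToyCPU(rtas_stopped_state=False)),
        "compat machine type, halted vcpu":
            (ToyCPU(rtas_stopped_state=False, halted=True),
             ToyCPU(rtas_stopped_state=False)),
    }
    for name, (src, dst) in cases.items():
        print(name, "-> quiesced =", toy_migrate(src, dst).quiesced)
```

The last case corresponds to the limitation the commit message mentions: with the compat property off, a halted vcpu keeps the quiesced flag it got at reset, because the workaround only clears it when halted is also clear.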
* Re: [RFC PATCH 3/4] tests/functional: Add a OS level migration test for pseries
2025-08-20 7:03 ` Thomas Huth
@ 2025-08-20 15:08 ` Fabiano Rosas
0 siblings, 0 replies; 12+ messages in thread
From: Fabiano Rosas @ 2025-08-20 15:08 UTC (permalink / raw)
To: Thomas Huth, qemu-devel
Cc: Nicholas Piggin, Fabian Vogt, Peter Xu, Harsh Prateek Bora
Thomas Huth <thuth@redhat.com> writes:
> On 20/08/2025 00.39, Fabiano Rosas wrote:
>> There's currently no OS level test for ppc64le. Add one such test by
>> reusing the boot level tests that are already present.
>>
>> The test boots the source machine, waits for it to reach a mid-boot
>> message, migrates and checks that the destination has reached the
>> final boot message (VFS error due to no disk).
>>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>> tests/functional/test_ppc64_pseries.py | 41 ++++++++++++++++++++++++++
>> 1 file changed, 41 insertions(+)
>>
>> diff --git a/tests/functional/test_ppc64_pseries.py b/tests/functional/test_ppc64_pseries.py
>> index 67057934e8..7a7e0fe8ae 100755
>> --- a/tests/functional/test_ppc64_pseries.py
>> +++ b/tests/functional/test_ppc64_pseries.py
>> @@ -9,6 +9,8 @@
>>
>> from qemu_test import QemuSystemTest, Asset
>> from qemu_test import wait_for_console_pattern
>> +from qemu_test.migration import Migration
>> +from qemu_test.ports import Ports
>>
>> class pseriesMachine(QemuSystemTest):
>>
>> @@ -87,5 +89,44 @@ def test_ppc64_linux_big_boot(self):
>> wait_for_console_pattern(self, console_pattern, self.panic_message)
>> wait_for_console_pattern(self, self.good_message, self.panic_message)
>>
>> + def test_ppc64_linux_migration(self):
>> + with Ports() as ports:
>> + port = ports.find_free_port()
>> + if port is None:
>> + self.skipTest('Failed to find a free port')
>> + uri = 'tcp:localhost:%u' % port
>
> Hi,
>
> this is not how to use the context for Ports: Once the "with" block is left,
> the locking for the free port will be gone and you're subject to a race
> condition with other tests running in parallel (see the __enter__ and
> __exit__ methods in tests/functional/qemu_test/ports.py ... and yes, there
> should be more documentation for this).
>
Haha, I'm dumb. It never crossed my mind.
Thanks
* Re: [PATCH 4/4] target/ppc: Fix env->quiesced migration
2025-08-20 15:07 ` Fabiano Rosas
@ 2025-08-20 15:20 ` Thomas Huth
0 siblings, 0 replies; 12+ messages in thread
From: Thomas Huth @ 2025-08-20 15:20 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel
Cc: Nicholas Piggin, Fabian Vogt, Peter Xu, Eduardo Habkost,
Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
Zhao Liu, Chinmay Rath, qemu-arm
On 20/08/2025 17.07, Fabiano Rosas wrote:
> Thomas Huth <thuth@redhat.com> writes:
>
>> On 20/08/2025 00.39, Fabiano Rosas wrote:
>>> The commit referenced (from QEMU 10.0) has changed the way the pseries
>>> machine marks a cpu as quiesced. Previously, the cpu->halted value
>>> from QEMU common cpu code was (incorrectly) used. With the fix, the
>>> env->quiesced variable starts being used, which improves on the
>>> original situation, but also causes a side effect after migration:
>>>
>>> The env->quiesced is set at reset and never migrated, which causes the
>>> destination QEMU to stop delivering interrupts and hang the machine.
>>>
>>> To fix the issue from this point on, start migrating the env->quiesced
>>> value.
>>>
>>> For QEMU versions < 10.0, sending the new element on the stream would
>>> cause migration to be aborted, so add the appropriate compatibility
>>> property to omit the new subsection.
>>>
>>> Independently of this patch, all migrations from QEMU versions < 10.0
>>> will result in a hang since the older QEMU never migrates
>>> env->quiesced. This is bad because it leaves machines already running
>>> on the old QEMU without a migration path into newer versions.
>>>
>>> As a workaround, clear env->quiesced in the new QEMU whenever
>>> cpu->halted is also clear. This assumes rtas_stop_self() always sets
>>> both flags at the same time. Migrations during secondaries bringup
>>> (i.e. before rtas-start-cpu) will still cause a hang, but those are
>>> early enough that requiring reboot would not be unreasonable.
>>>
>>> Note that this was tested with -cpu power9 and -machine ic-mode=xive
>>> due to another bug affecting migration of XICS guests. Tested both
>>> forward and backward migration and savevm/loadvm from 9.2 and 10.0.
>>>
>>> Reported-by: Fabian Vogt <fvogt@suse.de>
>>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3079
>>> Fixes: fb802acdc8b ("ppc/spapr: Fix RTAS stopped state")
>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>> ---
>>> The choice of PowerPCCPU to hold the compat property is dubious. This
>>> only affects pseries, but it seems like a layering violation to access
>>> SpaprMachine from target/ppc/, suggestions welcome.
>>> ---
>>> hw/core/machine.c | 1 +
>>> target/ppc/cpu.h | 1 +
>>> target/ppc/cpu_init.c | 7 +++++++
>>> target/ppc/machine.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>> 4 files changed, 49 insertions(+)
>>>
>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>>> index bd47527479..ea83c0876b 100644
>>> --- a/hw/core/machine.c
>>> +++ b/hw/core/machine.c
>>> @@ -42,6 +42,7 @@ GlobalProperty hw_compat_10_0[] = {
>>> { "vfio-pci", "x-migration-load-config-after-iter", "off" },
>>> { "ramfb", "use-legacy-x86-rom", "true"},
>>> { "vfio-pci-nohotplug", "use-legacy-x86-rom", "true" },
>>> + { "powerpc64-cpu", "rtas-stopped-state", "false" },
>>
>> This is specific to ppc, so it should not go into the generic hw_compat_* array.
>>
>
> So arm-cpu in hw_compat_9_2 should not be there?
Right, this should get moved to the code in hw/arm/virt.c.
Same for arm-cpu in hw_compat_9_0 and for arm-gicv3-common in hw_compat_7_0.
Thomas
* Re: [PATCH 1/4] hw/intc/xics: Add missing call to register vmstate_icp_server
2025-08-19 22:39 ` [PATCH 1/4] hw/intc/xics: Add missing call to register vmstate_icp_server Fabiano Rosas
@ 2025-09-18 15:28 ` Gautam Menghani
0 siblings, 0 replies; 12+ messages in thread
From: Gautam Menghani @ 2025-09-18 15:28 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Nicholas Piggin, Thomas Huth, Fabian Vogt, Peter Xu,
Philippe Mathieu-Daudé, Harsh Prateek Bora
On Tue, Aug 19, 2025 at 07:39:02PM -0300, Fabiano Rosas wrote:
> From: Fabian Vogt <fvogt@suse.de>
>
> An obsolete wrapper function with a workaround was removed entirely,
> without restoring the call it wrapped.
>
> Without this, the guest is stuck after savevm/loadvm.
>
> Fixes: 24ee9229fe31 ("ppc/spapr: remove deprecated machine pseries-2.9")
> Signed-off-by: Fabian Vogt <fvogt@suse.de>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Link: https://lore.kernel.org/qemu-devel/6187781.lOV4Wx5bFT@fvogt-thinkpad
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> hw/intc/xics.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index d9a199e883..200710eb6c 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -335,6 +335,8 @@ static void icp_realize(DeviceState *dev, Error **errp)
> return;
> }
> }
> +
> + vmstate_register(NULL, icp->cs->cpu_index, &vmstate_icp_server, icp);
> }
>
> static void icp_unrealize(DeviceState *dev)
> --
> 2.35.3
>
>
I did some testing with QEMU-9.2.0 and 10.1.0 and my observations are:
1. QEMU-9.2.0
With XICS, both snapshots and migrations were broken; they work fine
with this patch.
2. QEMU-10.1.0
With XICS, both snapshots and migrations are broken: lockups are observed
(with and without this patch).
The 10.1.0 breakage can be fixed in a follow-up patch.
For now, since 9.2 works fine, please feel free to add
Reviewed-by: Gautam Menghani <gautam@linux.ibm.com>
Thanks,
Gautam