From: Misbah Anjum N <misanjum@linux.ibm.com>
To: Ani Sinha <anisinha@redhat.com>, Pbonzini <pbonzini@redhat.com>,
	Qemu Devel <qemu-devel@nongnu.org>,
	Qemu Ppc <qemu-ppc@nongnu.org>
Cc: npiggin@gmail.com, Harshpb <harshpb@linux.ibm.com>
Subject: Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
Date: Tue, 10 Mar 2026 14:09:49 +0530
Message-ID: <b088f554af34cbcc4d88271e227074e1@linux.ibm.com>
In-Reply-To: <e966c053-2e69-b185-532e-fcb4173b4daf@redhat.com>

Hi Ani and Paolo,

We have tested the code by applying both the original commit 
(98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch (commit 
9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le.
However, the issue persists. GDB debugging shows that the hang occurs 
in a different location from the one the fix addresses.

The original patch completely breaks KVM guest bringup on ppc64le, and 
the fix patch does not resolve the issue. Given the severity of this 
regression, we should either find a quick fix or revert the patch until 
a proper solution can be identified.

Analysis:
1. This is not a confidential guest. This is a regular KVM guest running 
on ppc64le.
2. The execution flow shows that qemu_system_reset() completes 
successfully and never enters the code path at 
system/runstate.c:529-543.
3. The hang occurs later in qemu_default_main() at system/main.c:49, 
in the call to bql_lock().
4. The ppc KVM guest boots fine with the previous commit - 
df8df3cb6b743372ebb335bd8404bc3d748da350
5. This suggests the issue is not with the error handling of 
-EOPNOTSUPP during reset, but with bql_lock() getting stuck in 
qemu_default_main().
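
For reference, here is a minimal standalone sketch of the return-code 
handling that the fix patch changes. It is not QEMU code: the enum and 
the function name classify_rebuild_result() are hypothetical and only 
mirror the three branches of the patch. In our trace this branch is 
never reached, which is why the fix cannot affect the hang.

```c
#include <errno.h>

/* Hypothetical outcome labels for illustration; not QEMU identifiers. */
enum rebuild_outcome {
    REBUILD_OK,           /* guest state rebuilt */
    REBUILD_UNSUPPORTED,  /* accelerator has no rebuild support */
    REBUILD_FAILED,       /* any other negative errno */
};

/*
 * Classify a rebuild_guest()-style return value into the three cases
 * that the fix patch distinguishes: -EOPNOTSUPP is reported but not
 * treated as fatal, any other negative value is an error, and a
 * non-negative value is success.
 */
static enum rebuild_outcome classify_rebuild_result(int ret)
{
    if (ret == -EOPNOTSUPP) {
        return REBUILD_UNSUPPORTED;
    }
    if (ret < 0) {
        return REBUILD_FAILED;
    }
    return REBUILD_OK;
}
```

With this split, only the REBUILD_FAILED case would correspond to 
stopping the VM with RUN_STATE_INTERNAL_ERROR.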

GDB Trace Analysis:
We set breakpoints at qemu_system_reset() and qemu_default_main() to 
trace the execution flow. The system completes qemu_system_reset() 
without entering the code path that your fix modifies 
(system/runstate.c:529-543).

# gdb --args /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine 
pseries,accel=kvm -enable-kvm -m 32768 -smp 
32,sockets=1,cores=32,threads=1 -nographic -serial pty -device 
virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive 
file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 
-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev 
bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0

(gdb) handle SIGUSR1 pass nostop noprint
Signal        Stop	Print	Pass to program	Description
SIGUSR1       No	No	Yes		User defined signal 1
(gdb) b qemu_system_reset
Breakpoint 1 at 0x69a688: file ../system/runstate.c, line 510.
(gdb) b qemu_default_main
Breakpoint 2 at 0xa9aeb8: file ../system/main.c, line 45.
(gdb) r

Starting program: /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 
-machine pseries,accel=kvm -enable-kvm -m 32768 -smp 
32,sockets=1,cores=32,threads=1 -nographic -serial pty -device 
virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive 
file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 
-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev 
bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0

Thread 1 "qemu-system-ppc" hit Breakpoint 1, qemu_system_reset 
(reason=reason@entry=SHUTDOWN_CAUSE_NONE) at ../system/runstate.c:513
513     AccelClass *ac = ACCEL_GET_CLASS(current_accel());
(gdb) n
517     mc = current_machine ? MACHINE_GET_CLASS(current_machine) : 
NULL;
(gdb) n
519     cpu_synchronize_all_states();
(gdb) n
521     switch (reason) {
(gdb) n
529     if (!cpus_are_resettable() &&
(gdb) n
553     if (mc && mc->reset) {
(gdb) n
554         mc->reset(current_machine, type);
(gdb) n
558     switch (reason) {
(gdb) n
574     if (cpus_are_resettable()) {
(gdb) n
583             cpu_synchronize_all_post_reset();
(gdb) n
587     vm_set_suspended(false);
(gdb) n
qdev_machine_creation_done () at ../hw/core/machine.c:1814
1814    register_global_state();
(gdb) n
qemu_machine_creation_done (errp=0x10123e028 <error_fatal>) at 
../system/vl.c:2785
2785    if (machine->cgs && !machine->cgs->ready) {
(gdb) n
2791    foreach_device_config_or_exit(DEV_GDB, gdbserver_start);
(gdb) n
2793    if (!vga_interface_created && !default_vga &&
(gdb) n
qmp_x_exit_preconfig (errp=errp@entry=0x10123e028 <error_fatal>) at 
../system/vl.c:2815
2815    if (loadvm) {
(gdb) n
2820    if (replay_mode != REPLAY_MODE_NONE) {
(gdb) n
2824    if (incoming) {
(gdb) n
2837    } else if (autostart) {
(gdb) n
2838        qmp_cont(NULL);
(gdb) n
qemu_init (argc=<optimized out>, argv=<optimized out>) at 
../system/vl.c:3849
3849    qemu_init_displays();
(gdb) n
3850    accel_setup_post(current_machine);
(gdb) n
3851    if (migrate_mode() != MIG_MODE_CPR_EXEC) {
(gdb) n
3852        os_setup_post();
(gdb) n
3854    resume_mux_open();
(gdb) n
main (argc=<optimized out>, argv=<optimized out>) at ../system/main.c:84
84      bql_unlock();
(gdb) n
85      replay_mutex_unlock();
(gdb) n
87      if (qemu_main) {
(gdb) n
93          qemu_default_main(NULL);
(gdb) n

Thread 1 "qemu-system-ppc" hit Breakpoint 2, qemu_default_main 
(opaque=opaque@entry=0x0) at ../system/main.c:48
48      replay_mutex_lock();
(gdb) n
49      bql_lock();
(gdb) n

<hangs>
<system becomes unresponsive at this point>
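
To illustrate the kind of hang seen at bql_lock(): a blocking lock call 
never returns if another thread holds the mutex and does not release 
it. The sketch below is a standalone pthread example, not QEMU code. It 
assumes the lock behaves like an ordinary pthread mutex, and we have 
not yet confirmed that another thread is actually holding the BQL here. 
The names probe_mutex() and probe_while_held() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/*
 * Try to take `m` for at most `timeout_sec` seconds.
 * Returns 0 if the mutex was acquired (it is released again),
 * ETIMEDOUT if it stayed held for the whole interval, or another
 * errno value on failure.
 */
static int probe_mutex(pthread_mutex_t *m, int timeout_sec)
{
    struct timespec deadline;
    int ret;

    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0) {
        return errno;
    }
    deadline.tv_sec += timeout_sec;

    ret = pthread_mutex_timedlock(m, &deadline);
    if (ret == 0) {
        pthread_mutex_unlock(m);
    }
    return ret;
}

struct probe_args {
    pthread_mutex_t *m;
    int timeout_sec;
    int result;
};

static void *probe_thread(void *opaque)
{
    struct probe_args *args = opaque;

    args->result = probe_mutex(args->m, args->timeout_sec);
    return NULL;
}

/*
 * Hold a mutex in the calling thread while a second thread probes it.
 * The second thread cannot acquire the mutex, so the probe times out;
 * a plain pthread_mutex_lock() in its place would block until the
 * holder unlocks.
 */
static int probe_while_held(int timeout_sec)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    struct probe_args args = { &m, timeout_sec, 0 };
    pthread_t tid;
    int ret;

    pthread_mutex_lock(&m);
    ret = pthread_create(&tid, NULL, probe_thread, &args);
    if (ret == 0) {
        pthread_join(tid, NULL);
    }
    pthread_mutex_unlock(&m);

    return ret != 0 ? ret : args.result;
}
```

In the QEMU case, the backtraces of all threads at the point of the 
hang would be needed to see which thread, if any, holds the lock.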


Thanks,
Misbah Anjum N <misanjumn@ibm.com>



On 2026-03-09 18:53, Ani Sinha wrote:
> Yes seems this is an issue and I will fix it. Not sure if the fix will
> address your issue though ...
> 
> Can you try the following patch?
> 
> From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00 2001
> From: Ani Sinha <anisinha@redhat.com>
> Date: Mon, 9 Mar 2026 18:44:40 +0530
> Subject: [PATCH] Fix reset for non-x86 archs that do not support reset 
> yet
> 
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> ---
>  system/runstate.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/system/runstate.c b/system/runstate.c
> index eca722b43c..c1f41284c9 100644
> --- a/system/runstate.c
> +++ b/system/runstate.c
> @@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
>          (current_machine->new_accel_vmfd_on_reset || 
> !cpus_are_resettable())) {
>          if (ac->rebuild_guest) {
>              ret = ac->rebuild_guest(current_machine);
> -            if (ret < 0) {
> +            if (ret < 0 && ret != -EOPNOTSUPP) {
>                  error_report("unable to rebuild guest: %s(%d)",
>                               strerror(-ret), ret);
>                  vm_stop(RUN_STATE_INTERNAL_ERROR);
> +            } else if (ret == -EOPNOTSUPP) {
> +                error_report("accelerator does not support reset!");
>              } else {
>                  info_report("virtual machine state has been rebuilt 
> with new "
>                              "guest file handle.");
> --
> 2.42.0
> 
> 
>> 
>> Is this a confidential guest that cannot be normally reset?
>> 


