* [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
@ 2026-03-06 10:52 Misbah Anjum N
  2026-03-09  8:28 ` Misbah Anjum N
  0 siblings, 1 reply; 19+ messages in thread
From: Misbah Anjum N @ 2026-03-06 10:52 UTC (permalink / raw)
  To: qemu-ppc; +Cc: qemu-devel, anisinha, pbonzini

Hi,
I'm reporting a critical regression on ppc64le that causes all KVM 
guests to hang immediately during startup. Git bisect identified commit 
98884e0cc10997a17ce9abfd6ff10be19224ca6a as the first bad commit. The 
commit completely breaks KVM functionality on ppc64le.

Regression Details:
Working Version: QEMU 10.2.50 (v10.2.0-1669-gffcf1a7981)
Broken Version: QEMU 10.2.50 (v10.2.0-1816-g3fb456e9a0)
Bad Commit: 98884e0cc10997a17ce9abfd6ff10be19224ca6a "accel/kvm: add 
changes required to support KVM VM file descriptor change"
Commit Link: 
https://gitlab.com/qemu-project/qemu/-/commit/98884e0cc10997a17ce9abfd6ff10be19224ca6a

Environment:
Host: Fedora 42, Kernel 7.0.0-rc2, Power11 (ppc64le)
Libvirt: 12.1.0
Guest: Fedora 42, Kernel 7.0.0-rc2
Machine Type: pseries with KVM acceleration

Build Configuration:
git clone https://gitlab.com/qemu-project/qemu.git
cd qemu
git submodule init
git submodule update --recursive
./configure --target-list=ppc64-softmmu --disable-tcg --prefix=/usr
make && make install

Reproduction:
Using virt-install:
/usr/bin/virt-install --connect=qemu:///system --hvm --accelerate \
  --name 'avocado-vt-vm1' --machine pseries --memory=32768 \
  --vcpu=32,sockets=1,cores=32,threads=1 --import --nographics \
  --os-variant rhel8.0 --serial pty --memballoon model=virtio \
  --controller type=scsi,model=virtio-scsi \
  --disk path=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,bus=scsi,size=10,format=qcow2 \
  --network=bridge=virbr0,model=virtio \
  --boot emulator=/usr/bin/qemu-system-ppc64
Result: Starting install...
         <hangs indefinitely with no output>

Using direct QEMU command:
/usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 \
  -machine pseries,accel=kvm -enable-kvm -m 32768 \
  -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty \
  -device virtio-balloon -device virtio-scsi-pci,id=scsi0 \
  -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 \
  -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 \
  -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
Result: <hangs indefinitely with no output>

Analysis:
The commit introduces VM file descriptor change support with 
architecture-specific hooks.
I attempted the following fixes without success:
1. Changed abort() to return 0; in stubs/kvm.c
2. Added early return in kvm_reset_vmfd() when 
kvm_arch_supports_vmfd_change() returns false

Git Bisect Log:
# git bisect bad
98884e0cc10997a17ce9abfd6ff10be19224ca6a is the first bad commit
commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a (HEAD)
Author: Ani Sinha <anisinha@redhat.com>
Date:   Wed Feb 25 09:19:10 2026 +0530

     accel/kvm: add changes required to support KVM VM file descriptor change

     This change adds common kvm specific support to handle KVM VM file descriptor
     change. KVM VM file descriptor can change as a part of confidential guest reset
     mechanism. A new function api kvm_arch_on_vmfd_change() per
     architecture platform is added in order to implement architecture specific
     changes required to support it. A subsequent patch will add x86 specific
     implementation for kvm_arch_on_vmfd_change() as currently only x86 supports
     confidential guest reset.

     Signed-off-by: Ani Sinha <anisinha@redhat.com>
     Link: https://lore.kernel.org/r/20260225035000.385950-6-anisinha@redhat.com
     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

  MAINTAINERS            |  6 ++++++
  accel/kvm/kvm-all.c    | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
  accel/kvm/trace-events |  1 +
  include/system/kvm.h   |  3 +++
  stubs/kvm.c            | 22 ++++++++++++++++++++++
  stubs/meson.build      |  1 +
  target/i386/kvm/kvm.c  | 10 ++++++++++
  7 files changed, 128 insertions(+), 3 deletions(-)
  create mode 100644 stubs/kvm.c

# git bisect log
git bisect start
git bisect good ffcf1a7981793973ffbd8100a7c3c6042d02ae23
git bisect bad 3fb456e9a0e9eef6a71d9b49bfff596a0f0046e9
git bisect bad e76c30bb13ecb9dc716fa629954bfb6253056ce2
git bisect good 9bdc612a18588975f5776ee4e562df607fea1b2c
git bisect bad 40c015e96942fd2a3e4d5ace6063b3333a3dd372
git bisect good df8df3cb6b743372ebb335bd8404bc3d748da350
git bisect bad 0f53f021ad1ede28dc8944686544e496cab02e5e
git bisect bad 9f0c2b3032639315faf141010a2603b0dbf56230
git bisect bad 98884e0cc10997a17ce9abfd6ff10be19224ca6a
first bad commit: [98884e0cc10997a17ce9abfd6ff10be19224ca6a]

Thanks,
Misbah Anjum N <misanjum@linux.ibm.com>



* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-06 10:52 [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c Misbah Anjum N
@ 2026-03-09  8:28 ` Misbah Anjum N
  2026-03-09 11:04   ` Harsh Prateek Bora
  0 siblings, 1 reply; 19+ messages in thread
From: Misbah Anjum N @ 2026-03-09  8:28 UTC (permalink / raw)
  To: Anisinha, Pbonzini, Qemu Devel, Qemu Ppc; +Cc: npiggin, harshpb

Hi Ani and Paolo,
Following up on my previous report, I've attempted additional debugging 
to isolate the issue on ppc64le.

I implemented the architecture-specific hooks for ppc64le. With the 
changes below applied, QEMU rebuilt, and the guest started with the 
direct qemu-system-ppc64 command, the hang persists: no output and 
complete unresponsiveness.

Could you suggest what additional changes are needed to ensure the VM FD 
change doesn't affect architectures that don't support this feature?

Tested with the following changes:
File: stubs/kvm.c
Changed the abort() call to return 0:
int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
{
    return 0;  /* Changed from abort() */
}

File: target/ppc/kvm.c
Added the following stubs:
int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
{
    /* ppc64le doesn't support VM FD changes for confidential guests */
    return 0;
}

bool kvm_arch_supports_vmfd_change(void)
{
    return false;
}

GDB Backtrace:
I ran QEMU under GDB to capture the hang state. The backtrace shows the 
vCPU thread is waiting on a condition variable:

Thread 4 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
__syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/powerpc/syscall_cancel.S:77
#0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/powerpc/syscall_cancel.S:77
#1  0x00007ffff58a9678 in __internal_syscall_cancel (nr=221) at cancellation.c:49
#2  0x00007ffff58aa220 in __futex_abstimed_wait_common64 (futex_word=0x10131ba10, expected=0, op=393, abstime=0x0, cancel=true) at futex-internal.c:57
#3  __futex_abstimed_wait_common (futex_word=0x10131ba10, expected=0, clockid=0, abstime=<optimized out>, private=0, cancel=true) at futex-internal.c:87
#4  __GI___futex_abstimed_wait_cancelable64 (futex_word=0x10131ba10, expected=0, clockid=0, abstime=0x0, private=0) at futex-internal.c:139
#5  0x00007ffff58ae0bc in __pthread_cond_wait_common (cond=0x10131b9f0, mutex=0x101222ce0 <bql>, clockid=0, abstime=0x0) at pthread_cond_wait.c:426
#6  ___pthread_cond_wait (cond=0x10131b9f0, mutex=0x101222ce0 <bql>) at pthread_cond_wait.c:458
#7  0x0000000100b9bea8 in qemu_cond_wait_impl (cond=0x10131b9f0, mutex=0x101222ce0 <bql>, file=0x100c59900 "../system/cpus.c", line=472) at ../util/qemu-thread-posix.c:240
#8  0x00000001006a0408 in qemu_process_cpu_events (cpu=0x1019dd260) at ../system/cpus.c:472
#9  0x0000000100913354 in kvm_vcpu_thread_fn (arg=0x1019dd260) at ../accel/kvm/kvm-accel-ops.c:50
#10 0x0000000100b9b30c in qemu_thread_start (args=0x1019f1fe0) at ../util/qemu-thread-posix.c:414
#11 0x00007ffff58aed94 in start_thread (arg=0x7ffff0bce320) at pthread_create.c:448
#12 0x00007ffff59555f8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone3.S:114
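
For readers unfamiliar with the pattern in frames #5 to #8, here is a minimal standalone sketch of a thread parked on a condition variable under a shared mutex. It is not QEMU code: FakeVcpu, can_run, fake_vcpu_thread() and run_fake_vcpu_once() are hypothetical names made up for illustration, and the mutex only plays the role that the bql mutex plays in the backtrace. The point is that the thread makes progress only once some other thread flips the predicate and signals; if nobody does, it stays in pthread_cond_wait() as seen above.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU; none of these names exist in QEMU. */
typedef struct {
    pthread_mutex_t lock;   /* plays the role of the big lock (bql) */
    pthread_cond_t  cond;   /* condition the thread sleeps on */
    bool            can_run;    /* predicate guarded by 'lock' */
    int             iterations; /* work done after being woken */
} FakeVcpu;

static void *fake_vcpu_thread(void *arg)
{
    FakeVcpu *cpu = arg;

    pthread_mutex_lock(&cpu->lock);
    /* Sleep until another thread sets can_run; loop guards against
     * spurious wakeups. Without a wakeup this never returns. */
    while (!cpu->can_run) {
        pthread_cond_wait(&cpu->cond, &cpu->lock);
    }
    cpu->iterations++;
    pthread_mutex_unlock(&cpu->lock);
    return NULL;
}

/* Starts the fake vCPU, wakes it once, and returns how many times it ran
 * (or -1 if the thread could not be created). */
int run_fake_vcpu_once(void)
{
    FakeVcpu cpu = { .can_run = false, .iterations = 0 };
    pthread_t tid;
    int result;

    pthread_mutex_init(&cpu.lock, NULL);
    pthread_cond_init(&cpu.cond, NULL);

    if (pthread_create(&tid, NULL, fake_vcpu_thread, &cpu) != 0) {
        result = -1;
    } else {
        /* The wakeup: change the predicate under the lock, then signal. */
        pthread_mutex_lock(&cpu.lock);
        cpu.can_run = true;
        pthread_cond_signal(&cpu.cond);
        pthread_mutex_unlock(&cpu.lock);

        pthread_join(tid, NULL);
        result = cpu.iterations;
    }

    pthread_cond_destroy(&cpu.cond);
    pthread_mutex_destroy(&cpu.lock);
    return result;
}
```

Calling run_fake_vcpu_once() from a test program returns 1 once the thread has been woken. Removing the three lines that set can_run and signal would leave the thread blocked in the same place the backtrace shows.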

Thanks,
Misbah Anjum N <misanjum@linux.ibm.com>





* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-09  8:28 ` Misbah Anjum N
@ 2026-03-09 11:04   ` Harsh Prateek Bora
  2026-03-09 13:11     ` Ani Sinha
  2026-03-09 13:30     ` Ani Sinha
  0 siblings, 2 replies; 19+ messages in thread
From: Harsh Prateek Bora @ 2026-03-09 11:04 UTC (permalink / raw)
  To: Misbah Anjum N, Anisinha, Pbonzini, Qemu Devel, Qemu Ppc; +Cc: npiggin

Hi Ani, Paolo,

I think the problem lies here:

For archs which don't support VM fd change, we are bailing out as below 
in kvm_reset_vmfd.


     /*
      * bail if the current architecture does not support VM file
      * descriptor change.
      */
     if (!kvm_arch_supports_vmfd_change()) {
         error_report("This target architecture does not support KVM VM "
                      "file descriptor change.");
         return -EOPNOTSUPP;
     }

However, when rebuild_guest (kvm_reset_vmfd) is called in
qemu_system_reset here:

     if ((reason == SHUTDOWN_CAUSE_GUEST_RESET ||
          reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET) &&
         (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
         if (ac->rebuild_guest) {
             ret = ac->rebuild_guest(current_machine);
             if (ret < 0) {
                 error_report("unable to rebuild guest: %s(%d)",
                              strerror(-ret), ret);
                 vm_stop(RUN_STATE_INTERNAL_ERROR);
             } else {
                 info_report("virtual machine state has been rebuilt with new "
                             "guest file handle.");
                 guest_state_rebuilt = true;
             }
         } else if (!cpus_are_resettable())  {
             error_report("accelerator does not support reset!");
         } else {
             error_report("accelerator does not support rebuilding guest state,"
                          " proceeding with normal reset!");
         }
     }


it just does a vm_stop if rebuild_guest returns < 0.

IMHO, this should handle -EOPNOTSUPP gracefully.
Please advise if this needs to be handled differently.
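
To make the suggestion concrete, below is a minimal standalone sketch of what "handle -EOPNOTSUPP gracefully" could look like as a decision function. This is an assumption about one possible shape of a fix, not the upstream change and not QEMU code: ResetAction, decide_reset_action() and the three rebuild_* helpers are hypothetical names invented here, with the function pointer standing in for ac->rebuild_guest(). The idea is to treat -EOPNOTSUPP as "fall back to a normal reset" when the CPUs are resettable, and to stop the VM only on other errors.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical outcome of the rebuild step; not a QEMU type. */
typedef enum {
    RESET_REBUILT,  /* guest state rebuilt with a new VM fd */
    RESET_NORMAL,   /* rebuild unsupported, proceed with normal reset */
    RESET_STOP_VM   /* hard failure, caller would stop the VM */
} ResetAction;

/* Hypothetical stand-in for ac->rebuild_guest(); returns 0 or -errno. */
typedef int (*rebuild_guest_fn)(void);

/* Example callbacks covering the three interesting cases. */
int rebuild_ok(void)          { return 0; }
int rebuild_unsupported(void) { return -EOPNOTSUPP; }
int rebuild_broken(void)      { return -EIO; }

ResetAction decide_reset_action(rebuild_guest_fn rebuild, bool cpus_resettable)
{
    int ret;

    assert(rebuild != NULL);
    ret = rebuild();

    if (ret == 0) {
        return RESET_REBUILT;
    }
    if (ret == -EOPNOTSUPP && cpus_resettable) {
        /* The architecture cannot change the VM fd, but an ordinary
         * reset is still possible, so do not stop the VM. */
        return RESET_NORMAL;
    }
    /* Any other error, or no fallback available: report and stop. */
    fprintf(stderr, "unable to rebuild guest: %s(%d)\n", strerror(-ret), ret);
    return RESET_STOP_VM;
}
```

With this shape, decide_reset_action(rebuild_unsupported, true) yields RESET_NORMAL instead of stopping the VM, while a genuine failure such as -EIO still yields RESET_STOP_VM. Whether the rebuild path should be entered at all on a non-confidential pseries guest is a separate question that this sketch does not answer.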

regards,
Harsh





* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-09 11:04   ` Harsh Prateek Bora
@ 2026-03-09 13:11     ` Ani Sinha
  2026-03-09 13:23       ` Ani Sinha
  2026-03-09 13:30     ` Ani Sinha
  1 sibling, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-03-09 13:11 UTC (permalink / raw)
  To: Harsh Prateek Bora
  Cc: Misbah Anjum N, Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin



> On 9 Mar 2026, at 4:34 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
> 
> Hi Ani, Paolo,
> 
> I think the problem lies here:
> 
> For archs which don't support VM fd change, we are bailing out as below in kvm_reset_vmfd.
> 
> 
>    /*
>     * bail if the current architecture does not support VM file
>     * descriptor change.
>     */
>    if (!kvm_arch_supports_vmfd_change()) {
>        error_report("This target architecture does not support KVM VM "
>                     "file descriptor change.");
>        return -EOPNOTSUPP;
>    }
> 
> However, when rebuild_guest (kvm_reset_vmfd) is called in
> qemu_system_reset here:
> 
>    if ((reason == SHUTDOWN_CAUSE_GUEST_RESET ||
>         reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET) &&
>        (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
>        if (ac->rebuild_guest) {
>            ret = ac->rebuild_guest(current_machine);
>            if (ret < 0) {
>                error_report("unable to rebuild guest: %s(%d)",
>                             strerror(-ret), ret);
>                vm_stop(RUN_STATE_INTERNAL_ERROR);
>            } else {
>                info_report("virtual machine state has been rebuilt with new "
>                            "guest file handle.");
>                guest_state_rebuilt = true;
>            }
>        } else if (!cpus_are_resettable())  {
>            error_report("accelerator does not support reset!");
>        } else {
>            error_report("accelerator does not support rebuilding guest state,"
>                         " proceeding with normal reset!");
>        }
>    }
> 
> 
> it just does a vm_stop if rebuild_guest returns < 0.
> 
> IMHO, This should handle -EOPNOTSUPP gracefully.
> Please advise if this needs to be taken care differently?

Is this a confidential guest that cannot be normally reset?


> 
> regards,
> Harsh
> 
> On 09/03/26 1:58 pm, Misbah Anjum N wrote:
>> Hi Ani and Paolo,
>> Following up on my previous report, I've attempted additional debugging to isolate the issue on ppc64le.
>> I implemented the architecture-specific hooks for ppc64le. After adding the following changes and recompiling QEMU and testing with the direct qemu-system-ppc64 command, the hang persists with the same issue - no output and complete unresponsiveness.
>> Could you suggest what additional changes are needed to ensure the VM FD change doesn't affect architectures that don't support this feature?
>> Tested with the following changes:
>> File: stubs/kvm.c
>> Changed the abort() call to return 0:
>> int kvm_arch_on_vmfd_change(MachineState *ms, KVMState s)
>> {
>> return 0;  / Changed from abort() */
>> }
>> File: target/ppc/kvm.c
>> Added the following stubs:
>> int kvm_arch_on_vmfd_change(MachineState *ms, KVMState s)
>> {
>> / ppc64le doesn't support VM FD changes for confidential guests */
>> return 0;
>> }
>> bool kvm_arch_supports_vmfd_change(void)
>> {
>> return false;
>> }
>> GDB Backtrace:
>> I ran QEMU under GDB to capture the hang state. The backtrace shows the vCPU thread is waiting on a condition variable:
>> Thread 4 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
>> __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/powerpc/ syscall_cancel.S:77
>> #0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/powerpc/ syscall_cancel.S:77
>> #1  0x00007ffff58a9678 in __internal_syscall_cancel (nr=221) at cancellation.c:49
>> #2  0x00007ffff58aa220 in __futex_abstimed_wait_common64 (futex_word=0x10131ba10, expected=0, op=393, abstime=0x0, cancel=true) at futex-internal.c:57
>> #3  __futex_abstimed_wait_common (futex_word=0x10131ba10, expected=0, clockid=0, abstime=<optimized out>, private=0, cancel=true) at futex- internal.c:87
>> #4  __GI___futex_abstimed_wait_cancelable64 (futex_word=0x10131ba10, expected=0, clockid=0, abstime=0x0, private=0) at futex-internal.c:139
>> #5  0x00007ffff58ae0bc in __pthread_cond_wait_common (cond=0x10131b9f0, mutex=0x101222ce0 <bql>, clockid=0, abstime=0x0) at pthread_cond_wait.c:426
>> #6  ___pthread_cond_wait (cond=0x10131b9f0, mutex=0x101222ce0 <bql>) at pthread_cond_wait.c:458
>> #7  0x0000000100b9bea8 in qemu_cond_wait_impl (cond=0x10131b9f0, mutex=0x101222ce0 <bql>, file=0x100c59900 "../system/cpus.c", line=472) at ../util/qemu-thread-posix.c:240
>> #8  0x00000001006a0408 in qemu_process_cpu_events (cpu=0x1019dd260) at ../system/cpus.c:472
>> #9  0x0000000100913354 in kvm_vcpu_thread_fn (arg=0x1019dd260) at ../ accel/kvm/kvm-accel-ops.c:50
>> #10 0x0000000100b9b30c in qemu_thread_start (args=0x1019f1fe0) at ../ util/qemu-thread-posix.c:414
>> #11 0x00007ffff58aed94 in start_thread (arg=0x7ffff0bce320) at pthread_create.c:448
>> #12 0x00007ffff59555f8 in __GI___clone3 () at ../sysdeps/unix/sysv/ linux/powerpc/powerpc64/clone3.S:114
>> Thanks,
>> Misbah Anjum N <misanjum@linux.ibm.com>
>> On 2026-03-06 16:22, Misbah Anjum N wrote:
>>> Hi,
>>> I'm reporting a critical regression on ppc64le that causes all KVM
>>> guests to hang immediately during startup. Git bisect identified
>>> commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a as the first bad
>>> commit. The commit completely breaks KVM functionality on ppc64le.
>>> 
>>> Regression Details:
>>> Working Version: QEMU 10.2.50 (v10.2.0-1669-gffcf1a7981)
>>> Broken Version: QEMU 10.2.50 (v10.2.0-1816-g3fb456e9a0)
>>> Bad Commit: 98884e0cc10997a17ce9abfd6ff10be19224ca6a "accel/kvm: add
>>> changes required to support KVM VM file descriptor change"
>>> Commit Link:
>>> https://gitlab.com/qemu-project/qemu/-/ commit/98884e0cc10997a17ce9abfd6ff10be19224ca6a
>>> 
>>> Environment:
>>> Host: Fedora 42, Kernel 7.0.0-rc2, Power11 (ppc64le)
>>> Libvirt: 12.1.0
>>> Guest: Fedora 42, Kernel 7.0.0-rc2
>>> Machine Type: pseries with KVM acceleration
>>> 
>>> Build Configuration:
>>> git clone https://gitlab.com/qemu-project/qemu.git
>>> cd qemu
>>> git submodule init
>>> git submodule update --recursive
>>> ./configure --target-list=ppc64-softmmu --disable-tcg --prefix=/usr
>>> make && make install
>>> 
>>> Reproduction:
>>> Using virt-install:
>>> /usr/bin/virt-install --connect=qemu:///system --hvm --accelerate
>>> --name 'avocado-vt-vm1' --machine pseries --memory=32768
>>> --vcpu=32,sockets=1,cores=32,threads=1 --import --nographics
>>> --os-variant rhel8.0 --serial pty --memballoon model=virtio
>>> --controller type=scsi,model=virtio-scsi --disk
>>> path=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel- ppc64le.qcow2,bus=scsi,size=10,format=qcow2
>>> --network=bridge=virbr0,model=virtio --boot
>>> emulator=/usr/bin/qemu-system-ppc64
>>> Result: Starting install...
>>>         <hangs indefinitely with no output>
>>> 
>>> Using direct QEMU command:
>>> /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine
>>> pseries,accel=kvm -enable-kvm -m 32768 -smp
>>> 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device
>>> virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive
>>> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel- ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2
>>> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev
>>> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>>> Result: <hangs indefinitely with no output>
>>> 
>>> Analysis:
>>> The commit introduces VM file descriptor change support with
>>> architecture-specific hooks.
>>> I attempted the following fixes without success:
>>> 1. Changed abort() to return 0; in stubs/kvm.c
>>> 2. Added early return in kvm_reset_vmfd() when
>>> kvm_arch_supports_vmfd_change() returns false
>>> 
>>> Git Bisect Log:
>>> # git bisect bad
>>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a is the first bad commit
>>> commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a (HEAD)
>>> Author: Ani Sinha <anisinha@redhat.com>
>>> Date:   Wed Feb 25 09:19:10 2026 +0530
>>> 
>>>     accel/kvm: add changes required to support KVM VM file descriptor change
>>> 
>>>     This change adds common kvm specific support to handle KVM VM file
>>> descriptor
>>>     change. KVM VM file descriptor can change as a part of
>>> confidential guest reset
>>>     mechanism. A new function api kvm_arch_on_vmfd_change() per
>>>     architecture platform is added in order to implement architecture specific
>>>     changes required to support it. A subsequent patch will add x86 specific
>>>     implementation for kvm_arch_on_vmfd_change() as currently only x86 supports
>>>     confidential guest reset.
>>> 
>>>     Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>     Link: https://lore.kernel.org/r/20260225035000.385950-6-anisinha@redhat.com
>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> 
>>>  MAINTAINERS            |  6 ++++++
>>>  accel/kvm/kvm-all.c    | 88
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>  accel/kvm/trace-events |  1 +
>>>  include/system/kvm.h   |  3 +++
>>>  stubs/kvm.c            | 22 ++++++++++++++++++++++
>>>  stubs/meson.build      |  1 +
>>>  target/i386/kvm/kvm.c  | 10 ++++++++++
>>>  7 files changed, 128 insertions(+), 3 deletions(-)
>>>  create mode 100644 stubs/kvm.c
>>> 
>>> # git bisect log
>>> git bisect start
>>> git bisect good ffcf1a7981793973ffbd8100a7c3c6042d02ae23
>>> git bisect bad 3fb456e9a0e9eef6a71d9b49bfff596a0f0046e9
>>> git bisect bad e76c30bb13ecb9dc716fa629954bfb6253056ce2
>>> git bisect good 9bdc612a18588975f5776ee4e562df607fea1b2c
>>> git bisect bad 40c015e96942fd2a3e4d5ace6063b3333a3dd372
>>> git bisect good df8df3cb6b743372ebb335bd8404bc3d748da350
>>> git bisect bad 0f53f021ad1ede28dc8944686544e496cab02e5e
>>> git bisect bad 9f0c2b3032639315faf141010a2603b0dbf56230
>>> git bisect bad 98884e0cc10997a17ce9abfd6ff10be19224ca6a
>>> first bad commit: [98884e0cc10997a17ce9abfd6ff10be19224ca6a]
>>> 
>>> Thanks,
>>> Misbah Anjum N <misanjum@linux.ibm.com>
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-09 13:11     ` Ani Sinha
@ 2026-03-09 13:23       ` Ani Sinha
  2026-03-10  8:39         ` Misbah Anjum N
  0 siblings, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-03-09 13:23 UTC (permalink / raw)
  To: Harsh Prateek Bora
  Cc: Misbah Anjum N, Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin

[-- Attachment #1: Type: text/plain, Size: 10021 bytes --]



On Mon, 9 Mar 2026, Ani Sinha wrote:

>
>
> > On 9 Mar 2026, at 4:34 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
> >
> > Hi Ani, Paolo,
> >
> > I think the problem lies here:
> >
> > For archs which don't support VM fd change, we bail out as below in kvm_reset_vmfd.
> >
> >
> >    /*
> >     * bail if the current architecture does not support VM file
> >     * descriptor change.
> >     */
> >    if (!kvm_arch_supports_vmfd_change()) {
> >        error_report("This target architecture does not support KVM VM "
> >                     "file descriptor change.");
> >        return -EOPNOTSUPP;
> >    }
> >
> > However, when rebuild_guest (kvm_reset_vmfd) is called in
> > qemu_system_reset here:
> >
> >    if ((reason == SHUTDOWN_CAUSE_GUEST_RESET ||
> >         reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET) &&
> >        (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
> >        if (ac->rebuild_guest) {
> >            ret = ac->rebuild_guest(current_machine);
> >            if (ret < 0) {
> >                error_report("unable to rebuild guest: %s(%d)",
> >                             strerror(-ret), ret);
> >                vm_stop(RUN_STATE_INTERNAL_ERROR);
> >            } else {
> >                info_report("virtual machine state has been rebuilt with new "
> >                            "guest file handle.");
> >                guest_state_rebuilt = true;
> >            }
> >        } else if (!cpus_are_resettable())  {
> >            error_report("accelerator does not support reset!");
> >        } else {
> >            error_report("accelerator does not support rebuilding guest state,"
> >                         " proceeding with normal reset!");
> >        }
> >    }
> >
> >
> > it just does a vm_stop if rebuild_guest returns < 0.
> >
> > IMHO, this should handle -EOPNOTSUPP gracefully.
> > Please advise if this needs to be handled differently.

Yes, this seems to be an issue and I will fix it. I am not sure the fix
will address your issue, though.

Can you try the following patch?

From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00 2001
From: Ani Sinha <anisinha@redhat.com>
Date: Mon, 9 Mar 2026 18:44:40 +0530
Subject: [PATCH] Fix reset for non-x86 archs that do not support reset yet

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 system/runstate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/system/runstate.c b/system/runstate.c
index eca722b43c..c1f41284c9 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
         (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
         if (ac->rebuild_guest) {
             ret = ac->rebuild_guest(current_machine);
-            if (ret < 0) {
+            if (ret < 0 && ret != -EOPNOTSUPP) {
                 error_report("unable to rebuild guest: %s(%d)",
                              strerror(-ret), ret);
                 vm_stop(RUN_STATE_INTERNAL_ERROR);
+            } else if (ret == -EOPNOTSUPP) {
+                error_report("accelerator does not support reset!");
             } else {
                 info_report("virtual machine state has been rebuilt with new "
                             "guest file handle.");
-- 
2.42.0
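
For clarity, the three-way handling that the patch above introduces can
be sketched in isolation. This is only an illustration: RebuildOutcome,
classify_rebuild_result() and rebuild_outcome_selftest() are hypothetical
names and not part of QEMU. Only the comparison against -EOPNOTSUPP
mirrors the diff.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names, for illustration only; not QEMU API. */
typedef enum {
    REBUILD_OK,          /* ret >= 0: guest state rebuilt with a new VM fd */
    REBUILD_UNSUPPORTED, /* ret == -EOPNOTSUPP: report it, do not stop the VM */
    REBUILD_FATAL        /* any other negative value: stop the VM */
} RebuildOutcome;

/* Same branch order as the patched code in qemu_system_reset(). */
static RebuildOutcome classify_rebuild_result(int ret)
{
    if (ret < 0 && ret != -EOPNOTSUPP) {
        return REBUILD_FATAL;
    } else if (ret == -EOPNOTSUPP) {
        return REBUILD_UNSUPPORTED;
    }
    return REBUILD_OK;
}

/* Small self-check of the three branches. */
static void rebuild_outcome_selftest(void)
{
    assert(classify_rebuild_result(0) == REBUILD_OK);
    assert(classify_rebuild_result(-EOPNOTSUPP) == REBUILD_UNSUPPORTED);
    assert(classify_rebuild_result(-EIO) == REBUILD_FATAL);
}
```

With this split, only errors other than -EOPNOTSUPP lead to the
vm_stop(RUN_STATE_INTERNAL_ERROR) branch.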


>
> Is this a confidential guest that cannot be normally reset?
>

> >> #2  0x00007ffff58aa220 in __futex_abstimed_wait_common64 (futex_word=0x10131ba10, expected=0, op=393, abstime=0x0, cancel=true) at futex-internal.c:57
> >> #3  __futex_abstimed_wait_common (futex_word=0x10131ba10, expected=0, clockid=0, abstime=<optimized out>, private=0, cancel=true) at futex-internal.c:87
> >> #4  __GI___futex_abstimed_wait_cancelable64 (futex_word=0x10131ba10, expected=0, clockid=0, abstime=0x0, private=0) at futex-internal.c:139
> >> #5  0x00007ffff58ae0bc in __pthread_cond_wait_common (cond=0x10131b9f0, mutex=0x101222ce0 <bql>, clockid=0, abstime=0x0) at pthread_cond_wait.c:426
> >> #6  ___pthread_cond_wait (cond=0x10131b9f0, mutex=0x101222ce0 <bql>) at pthread_cond_wait.c:458
> >> #7  0x0000000100b9bea8 in qemu_cond_wait_impl (cond=0x10131b9f0, mutex=0x101222ce0 <bql>, file=0x100c59900 "../system/cpus.c", line=472) at ../util/qemu-thread-posix.c:240
> >> #8  0x00000001006a0408 in qemu_process_cpu_events (cpu=0x1019dd260) at ../system/cpus.c:472
> >> #9  0x0000000100913354 in kvm_vcpu_thread_fn (arg=0x1019dd260) at ../accel/kvm/kvm-accel-ops.c:50
> >> #10 0x0000000100b9b30c in qemu_thread_start (args=0x1019f1fe0) at ../util/qemu-thread-posix.c:414
> >> #11 0x00007ffff58aed94 in start_thread (arg=0x7ffff0bce320) at pthread_create.c:448
> >> #12 0x00007ffff59555f8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone3.S:114
> >> Thanks,
> >> Misbah Anjum N <misanjum@linux.ibm.com>
> >> On 2026-03-06 16:22, Misbah Anjum N wrote:
> >>> Hi,
> >>> I'm reporting a critical regression on ppc64le that causes all KVM
> >>> guests to hang immediately during startup. Git bisect identified
> >>> commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a as the first bad
> >>> commit. The commit completely breaks KVM functionality on ppc64le.
> >>>
> >>> Regression Details:
> >>> Working Version: QEMU 10.2.50 (v10.2.0-1669-gffcf1a7981)
> >>> Broken Version: QEMU 10.2.50 (v10.2.0-1816-g3fb456e9a0)
> >>> Bad Commit: 98884e0cc10997a17ce9abfd6ff10be19224ca6a "accel/kvm: add
> >>> changes required to support KVM VM file descriptor change"
> >>> Commit Link:
> >>> https://gitlab.com/qemu-project/qemu/-/commit/98884e0cc10997a17ce9abfd6ff10be19224ca6a
> >>>
> >>> Environment:
> >>> Host: Fedora 42, Kernel 7.0.0-rc2, Power11 (ppc64le)
> >>> Libvirt: 12.1.0
> >>> Guest: Fedora 42, Kernel 7.0.0-rc2
> >>> Machine Type: pseries with KVM acceleration
> >>>
> >>> Build Configuration:
> >>> git clone https://gitlab.com/qemu-project/qemu.git
> >>> cd qemu
> >>> git submodule init
> >>> git submodule update --recursive
> >>> ./configure --target-list=ppc64-softmmu --disable-tcg --prefix=/usr
> >>> make && make install
> >>>
> >>> Reproduction:
> >>> Using virt-install:
> >>> /usr/bin/virt-install --connect=qemu:///system --hvm --accelerate
> >>> --name 'avocado-vt-vm1' --machine pseries --memory=32768
> >>> --vcpu=32,sockets=1,cores=32,threads=1 --import --nographics
> >>> --os-variant rhel8.0 --serial pty --memballoon model=virtio
> >>> --controller type=scsi,model=virtio-scsi --disk
> >>> path=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,bus=scsi,size=10,format=qcow2
> >>> --network=bridge=virbr0,model=virtio --boot
> >>> emulator=/usr/bin/qemu-system-ppc64
> >>> Result: Starting install...
> >>>         <hangs indefinitely with no output>
> >>>
> >>> Using direct QEMU command:
> >>> /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine
> >>> pseries,accel=kvm -enable-kvm -m 32768 -smp
> >>> 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device
> >>> virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive
> >>> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2
> >>> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev
> >>> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
> >>> Result: <hangs indefinitely with no output>
> >>>
> >>> Analysis:
> >>> The commit introduces VM file descriptor change support with
> >>> architecture-specific hooks.
> >>> I attempted the following fixes without success:
> >>> 1. Changed abort() to return 0; in stubs/kvm.c
> >>> 2. Added early return in kvm_reset_vmfd() when
> >>> kvm_arch_supports_vmfd_change() returns false
> >>>
> >>> Git Bisect Log:
> >>> # git bisect bad
> >>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a is the first bad commit
> >>> commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a (HEAD)
> >>> Author: Ani Sinha <anisinha@redhat.com>
> >>> Date:   Wed Feb 25 09:19:10 2026 +0530
> >>>
> >>>     accel/kvm: add changes required to support KVM VM file descriptor change
> >>>
> >>>     This change adds common kvm specific support to handle KVM VM file
> >>> descriptor
> >>>     change. KVM VM file descriptor can change as a part of
> >>> confidential guest reset
> >>>     mechanism. A new function api kvm_arch_on_vmfd_change() per
> >>>     architecture platform is added in order to implement architecture specific
> >>>     changes required to support it. A subsequent patch will add x86 specific
> >>>     implementation for kvm_arch_on_vmfd_change() as currently only x86 supports
> >>>     confidential guest reset.
> >>>
> >>>     Signed-off-by: Ani Sinha <anisinha@redhat.com>
> >>>     Link: https://lore.kernel.org/r/20260225035000.385950-6-anisinha@redhat.com
> >>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >>>
> >>>  MAINTAINERS            |  6 ++++++
> >>>  accel/kvm/kvm-all.c    | 88
> >>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
> >>>  accel/kvm/trace-events |  1 +
> >>>  include/system/kvm.h   |  3 +++
> >>>  stubs/kvm.c            | 22 ++++++++++++++++++++++
> >>>  stubs/meson.build      |  1 +
> >>>  target/i386/kvm/kvm.c  | 10 ++++++++++
> >>>  7 files changed, 128 insertions(+), 3 deletions(-)
> >>>  create mode 100644 stubs/kvm.c
> >>>
> >>> # git bisect log
> >>> git bisect start
> >>> git bisect good ffcf1a7981793973ffbd8100a7c3c6042d02ae23
> >>> git bisect bad 3fb456e9a0e9eef6a71d9b49bfff596a0f0046e9
> >>> git bisect bad e76c30bb13ecb9dc716fa629954bfb6253056ce2
> >>> git bisect good 9bdc612a18588975f5776ee4e562df607fea1b2c
> >>> git bisect bad 40c015e96942fd2a3e4d5ace6063b3333a3dd372
> >>> git bisect good df8df3cb6b743372ebb335bd8404bc3d748da350
> >>> git bisect bad 0f53f021ad1ede28dc8944686544e496cab02e5e
> >>> git bisect bad 9f0c2b3032639315faf141010a2603b0dbf56230
> >>> git bisect bad 98884e0cc10997a17ce9abfd6ff10be19224ca6a
> >>> first bad commit: [98884e0cc10997a17ce9abfd6ff10be19224ca6a]
> >>>
> >>> Thanks,
> >>> Misbah Anjum N <misanjum@linux.ibm.com>
> >
>
>

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-09 11:04   ` Harsh Prateek Bora
  2026-03-09 13:11     ` Ani Sinha
@ 2026-03-09 13:30     ` Ani Sinha
  1 sibling, 0 replies; 19+ messages in thread
From: Ani Sinha @ 2026-03-09 13:30 UTC (permalink / raw)
  To: Harsh Prateek Bora
  Cc: Misbah Anjum N, Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin



> On 9 Mar 2026, at 4:34 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
> 
> Hi Ani, Paolo,
> 
> I think the problem lies here:
> 
> For archs which don't support VM fd change, we bail out as below in kvm_reset_vmfd.
> 
> 
>    /*
>     * bail if the current architecture does not support VM file
>     * descriptor change.
>     */
>    if (!kvm_arch_supports_vmfd_change()) {
>        error_report("This target architecture does not support KVM VM "
>                     "file descriptor change.");
>        return -EOPNOTSUPP;
>    }
> 
> However, when rebuild_guest (kvm_reset_vmfd) is called in
> qemu_system_reset here:
> 
>    if ((reason == SHUTDOWN_CAUSE_GUEST_RESET ||
>         reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET) &&
>        (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {

This entire block is executed only if you manually enable new_accel_vmfd_on_reset or the CPUs are not resettable.
From looking at the code:

$ git grep kvm_mark_guest_state_protected
accel/kvm/kvm-all.c:void kvm_mark_guest_state_protected(void)
include/system/kvm.h:void kvm_mark_guest_state_protected(void);
target/i386/kvm/tdx.c:    kvm_mark_guest_state_protected();
target/i386/sev.c:        kvm_mark_guest_state_protected();
target/i386/sev.c:    kvm_mark_guest_state_protected();

It seems only SEV and TDX make the CPUs non-resettable.
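
The gating condition can be sketched on its own, to make it easier to see
when the rebuild path is reachable. This is a minimal sketch with
simplified stand-ins: ResetCause, rebuild_path_reachable() and
rebuild_gate_selftest() are hypothetical names, and the two booleans
stand for current_machine->new_accel_vmfd_on_reset and the result of
cpus_are_resettable().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ShutdownCause values used in the check. */
typedef enum {
    CAUSE_NONE,
    CAUSE_GUEST_RESET,
    CAUSE_HOST_QMP_SYSTEM_RESET
} ResetCause;

/*
 * Mirrors the outer "if" quoted above: the rebuild path is considered
 * only for a guest or QMP system reset, and only when the debug flag
 * is set or the CPUs cannot be reset normally.
 */
static bool rebuild_path_reachable(ResetCause reason,
                                   bool new_accel_vmfd_on_reset,
                                   bool cpus_resettable)
{
    return (reason == CAUSE_GUEST_RESET ||
            reason == CAUSE_HOST_QMP_SYSTEM_RESET) &&
           (new_accel_vmfd_on_reset || !cpus_resettable);
}

/* Small self-check covering the cases discussed in this thread. */
static void rebuild_gate_selftest(void)
{
    /* Regular guest: flag unset, CPUs resettable. */
    assert(!rebuild_path_reachable(CAUSE_GUEST_RESET, false, true));
    /* Debug flag set manually. */
    assert(rebuild_path_reachable(CAUSE_GUEST_RESET, true, true));
    /* CPUs not resettable (guest state marked protected). */
    assert(rebuild_path_reachable(CAUSE_HOST_QMP_SYSTEM_RESET, false, false));
}
```

With resettable CPUs and the flag unset, the function returns false for
every reset cause, which is what I would expect for a regular
(non-confidential) guest.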


>        if (ac->rebuild_guest) {
>            ret = ac->rebuild_guest(current_machine);
>            if (ret < 0) {
>                error_report("unable to rebuild guest: %s(%d)",
>                             strerror(-ret), ret);
>                vm_stop(RUN_STATE_INTERNAL_ERROR);
>            } else {
>                info_report("virtual machine state has been rebuilt with new "
>                            "guest file handle.");
>                guest_state_rebuilt = true;
>            }
>        } else if (!cpus_are_resettable())  {
>            error_report("accelerator does not support reset!");
>        } else {
>            error_report("accelerator does not support rebuilding guest state,"
>                         " proceeding with normal reset!");
>        }
>    }
> 
>>> 

<snip>


>>> Reproduction:
>>> Using virt-install:
>>> /usr/bin/virt-install --connect=qemu:///system --hvm --accelerate
>>> --name 'avocado-vt-vm1' --machine pseries --memory=32768
>>> --vcpu=32,sockets=1,cores=32,threads=1 --import --nographics
>>> --os-variant rhel8.0 --serial pty --memballoon model=virtio
>>> --controller type=scsi,model=virtio-scsi --disk
>>> path=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,bus=scsi,size=10,format=qcow2
>>> --network=bridge=virbr0,model=virtio --boot
>>> emulator=/usr/bin/qemu-system-ppc64
>>> Result: Starting install...
>>>         <hangs indefinitely with no output>
>>> 
>>> Using direct QEMU command:
>>> /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine
>>> pseries,accel=kvm -enable-kvm -m 32768 -smp
>>> 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device
>>> virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive
>>> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2
>>> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev
>>> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0

Hmm, this command line does not seem to indicate that it's a confidential VM or that you enabled that debug flag.


>>> Result: <hangs indefinitely with no output>
>>> 
>>> Analysis:
>>> The commit introduces VM file descriptor change support with
>>> architecture-specific hooks.
>>> I attempted the following fixes without success:
>>> 1. Changed abort() to return 0; in stubs/kvm.c
>>> 2. Added early return in kvm_reset_vmfd() when
>>> kvm_arch_supports_vmfd_change() returns false
>>> 
>>> Git Bisect Log:
>>> # git bisect bad
>>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a is the first bad commit
>>> commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a (HEAD)
>>> Author: Ani Sinha <anisinha@redhat.com>
>>> Date:   Wed Feb 25 09:19:10 2026 +0530
>>> 
>>>     accel/kvm: add changes required to support KVM VM file descriptor change
>>> 
>>>     This change adds common kvm specific support to handle KVM VM file
>>> descriptor
>>>     change. KVM VM file descriptor can change as a part of
>>> confidential guest reset
>>>     mechanism. A new function api kvm_arch_on_vmfd_change() per
>>>     architecture platform is added in order to implement architecture specific
>>>     changes required to support it. A subsequent patch will add x86 specific
>>>     implementation for kvm_arch_on_vmfd_change() as currently only x86 supports
>>>     confidential guest reset.
>>> 
>>>     Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>     Link: https://lore.kernel.org/r/20260225035000.385950-6-anisinha@redhat.com
>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> 
>>>  MAINTAINERS            |  6 ++++++
>>>  accel/kvm/kvm-all.c    | 88
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>  accel/kvm/trace-events |  1 +
>>>  include/system/kvm.h   |  3 +++
>>>  stubs/kvm.c            | 22 ++++++++++++++++++++++
>>>  stubs/meson.build      |  1 +
>>>  target/i386/kvm/kvm.c  | 10 ++++++++++
>>>  7 files changed, 128 insertions(+), 3 deletions(-)
>>>  create mode 100644 stubs/kvm.c
>>> 
>>> # git bisect log
>>> git bisect start
>>> git bisect good ffcf1a7981793973ffbd8100a7c3c6042d02ae23
>>> git bisect bad 3fb456e9a0e9eef6a71d9b49bfff596a0f0046e9
>>> git bisect bad e76c30bb13ecb9dc716fa629954bfb6253056ce2
>>> git bisect good 9bdc612a18588975f5776ee4e562df607fea1b2c
>>> git bisect bad 40c015e96942fd2a3e4d5ace6063b3333a3dd372
>>> git bisect good df8df3cb6b743372ebb335bd8404bc3d748da350
>>> git bisect bad 0f53f021ad1ede28dc8944686544e496cab02e5e
>>> git bisect bad 9f0c2b3032639315faf141010a2603b0dbf56230
>>> git bisect bad 98884e0cc10997a17ce9abfd6ff10be19224ca6a
>>> first bad commit: [98884e0cc10997a17ce9abfd6ff10be19224ca6a]
>>> 
>>> Thanks,
>>> Misbah Anjum N <misanjum@linux.ibm.com>
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-09 13:23       ` Ani Sinha
@ 2026-03-10  8:39         ` Misbah Anjum N
  2026-03-10  8:54           ` Ani Sinha
  0 siblings, 1 reply; 19+ messages in thread
From: Misbah Anjum N @ 2026-03-10  8:39 UTC (permalink / raw)
  To: Ani Sinha, Pbonzini, Qemu Devel, Qemu Ppc; +Cc: npiggin, Harshpb

Hi Ani and Paolo,

We have tested the code by applying both the original commit 
(98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch (commit 
9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le.
However, the issue persists. We've conducted GDB debugging that shows 
the hang is occurring in a different location than what the fix 
addresses.

The original patch breaks KVM guest bringup completely on ppc64le, and 
the fix patch does not resolve the issue. Given the severity of this 
regression, we should either find a quick fix or consider reverting the 
patch until a proper solution can be identified.

Analysis:
1. This is not a confidential guest. It is a regular KVM guest running 
on ppc64le.
2. The execution flow shows that qemu_system_reset() completes 
successfully and never enters the code path at lines 529-543.
3. The hang occurs later in qemu_default_main(), in the bql_lock() call 
at system/main.c:49.
4. The ppc KVM guest boots fine with the previous commit, 
df8df3cb6b743372ebb335bd8404bc3d748da350.
5. This suggests the issue is not with error handling of -EOPNOTSUPP 
during reset, but with bql_lock() getting stuck in qemu_default_main().

GDB Trace Analysis:
We set breakpoints at qemu_system_reset() and qemu_default_main() to 
trace the execution flow. The system successfully completes 
qemu_system_reset() without entering the problematic code path where the 
fix provided by you applies (system/runstate.c:529-543).

# gdb --args /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine 
pseries,accel=kvm -enable-kvm -m 32768 -smp 
32,sockets=1,cores=32,threads=1 -nographic -serial pty -device 
virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive 
file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 
-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev 
bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0

(gdb) handle SIGUSR1 pass nostop noprint
Signal        Stop	Print	Pass to program	Description
SIGUSR1       No	No	Yes		User defined signal 1
(gdb) b qemu_system_reset
Breakpoint 1 at 0x69a688: file ../system/runstate.c, line 510.
(gdb) b qemu_default_main
Breakpoint 2 at 0xa9aeb8: file ../system/main.c, line 45.
(gdb) r

Starting program: /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 
-machine pseries,accel=kvm -enable-kvm -m 32768 -smp 
32,sockets=1,cores=32,threads=1 -nographic -serial pty -device 
virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive 
file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 
-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev 
bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0

Thread 1 "qemu-system-ppc" hit Breakpoint 1, qemu_system_reset 
(reason=reason@entry=SHUTDOWN_CAUSE_NONE) at ../system/runstate.c:513
513     AccelClass *ac = ACCEL_GET_CLASS(current_accel());
(gdb) n
517     mc = current_machine ? MACHINE_GET_CLASS(current_machine) : 
NULL;
(gdb) n
519     cpu_synchronize_all_states();
(gdb) n
521     switch (reason) {
(gdb) n
529     if (!cpus_are_resettable() &&
(gdb) n
553     if (mc && mc->reset) {
(gdb) n
554         mc->reset(current_machine, type);
(gdb) n
558     switch (reason) {
(gdb) n
574     if (cpus_are_resettable()) {
(gdb) n
583             cpu_synchronize_all_post_reset();
(gdb) n
587     vm_set_suspended(false);
(gdb) n
qdev_machine_creation_done () at ../hw/core/machine.c:1814
1814    register_global_state();
(gdb) n
qemu_machine_creation_done (errp=0x10123e028 <error_fatal>) at 
../system/vl.c:2785
2785    if (machine->cgs && !machine->cgs->ready) {
(gdb) n
2791    foreach_device_config_or_exit(DEV_GDB, gdbserver_start);
(gdb) n
2793    if (!vga_interface_created && !default_vga &&
(gdb) n
qmp_x_exit_preconfig (errp=errp@entry=0x10123e028 <error_fatal>) at 
../system/vl.c:2815
2815    if (loadvm) {
(gdb) n
2820    if (replay_mode != REPLAY_MODE_NONE) {
(gdb) n
2824    if (incoming) {
(gdb) n
2837    } else if (autostart) {
(gdb) n
2838        qmp_cont(NULL);
(gdb) n
qemu_init (argc=<optimized out>, argv=<optimized out>) at 
../system/vl.c:3849
3849    qemu_init_displays();
(gdb) n
3850    accel_setup_post(current_machine);
(gdb) n
3851    if (migrate_mode() != MIG_MODE_CPR_EXEC) {
(gdb) n
3852        os_setup_post();
(gdb) n
3854    resume_mux_open();
(gdb) n
main (argc=<optimized out>, argv=<optimized out>) at ../system/main.c:84
84      bql_unlock();
(gdb) n
85      replay_mutex_unlock();
(gdb) n
87      if (qemu_main) {
(gdb) n
93          qemu_default_main(NULL);
(gdb) n

Thread 1 "qemu-system-ppc" hit Breakpoint 2, qemu_default_main 
(opaque=opaque@entry=0x0) at ../system/main.c:48
48      replay_mutex_lock();
(gdb) n
49      bql_lock();
(gdb) n

<hangs>
<system becomes unresponsive at this point>


Thanks,
Misbah Anjum N <misanjumn@ibm.com>



On 2026-03-09 18:53, Ani Sinha wrote:
> Yes seems this is an issue and I will fix it. Not sure if the fix will
> address your issue though ...
> 
> Can you try the following patch?
> 
> From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00 2001
> From: Ani Sinha <anisinha@redhat.com>
> Date: Mon, 9 Mar 2026 18:44:40 +0530
> Subject: [PATCH] Fix reset for non-x86 archs that do not support reset 
> yet
> 
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> ---
>  system/runstate.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/system/runstate.c b/system/runstate.c
> index eca722b43c..c1f41284c9 100644
> --- a/system/runstate.c
> +++ b/system/runstate.c
> @@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
>          (current_machine->new_accel_vmfd_on_reset || 
> !cpus_are_resettable())) {
>          if (ac->rebuild_guest) {
>              ret = ac->rebuild_guest(current_machine);
> -            if (ret < 0) {
> +            if (ret < 0 && ret != -EOPNOTSUPP) {
>                  error_report("unable to rebuild guest: %s(%d)",
>                               strerror(-ret), ret);
>                  vm_stop(RUN_STATE_INTERNAL_ERROR);
> +            } else if (ret == -EOPNOTSUPP) {
> +                error_report("accelerator does not support reset!");
>              } else {
>                  info_report("virtual machine state has been rebuilt 
> with new "
>                              "guest file handle.");
> --
> 2.42.0
> 
> 
>> 
>> Is this a confidential guest that cannot be normally reset?
>> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-10  8:39         ` Misbah Anjum N
@ 2026-03-10  8:54           ` Ani Sinha
  2026-03-10  9:08             ` Misbah Anjum N
  0 siblings, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-03-10  8:54 UTC (permalink / raw)
  To: Misbah Anjum N
  Cc: Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin, Harsh Prateek Bora



> On 10 Mar 2026, at 2:09 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
> 
> Hi Ani and Paolo,
> 
> We have tested the code by applying both the original commit (98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch (commit 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le.
> However, the issue persists. We've conducted GDB debugging that shows the hang is occurring in a different location than what the fix addresses.
> 
> Since the original patch is breaking KVM guest bringup completely on ppc64le, and the fix patch does not resolve the issue, given the severity of this regression (complete KVM breakage on ppc64le), we should either find a quick fix or consider reverting the patch until a proper solution can be identified.

Based on what you just described, it does not seem like the issue is related to 98884e0cc10997a17ce9abfd6ff10be19224ca6a at all. If you revert this patch in your local tree, can you confirm that your issue gets fixed?

> 
> Analysis:
> 1. This is not a confidential guest. This is a regular KVM guest running on ppc64le.
> 2. The execution flow shows that qemu_system_reset() completes successfully and never enters the code path at line 529-543

This is what I expected; therefore, no code related to CoCo (confidential computing) guest rebuilding is being executed. Your issue seems to be somewhere else.

> 3. The hang occurs later in qemu_default_main() at system/main.c:49, after calling bql_lock()
> 4. The ppc KVM guest boots fine with the previous commit - df8df3cb6b743372ebb335bd8404bc3d748da350
> 5. This suggests the issue is not with error handling of -EOPNOTSUPP during reset, but bql_lock() getting stuck in qemu_default_main()
> 
> GDB Trace Analysis:
> We set breakpoints at qemu_system_reset() and qemu_default_main() to trace the execution flow. The system successfully completes qemu_system_reset() without entering the problematic code path where the fix provided by you applies (system/runstate.c:529-543).
> 
> # gdb --args /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
> 
> (gdb) handle SIGUSR1 pass nostop noprint
> Signal        Stop Print Pass to program Description
> SIGUSR1       No No Yes User defined signal 1
> (gdb) b qemu_system_reset
> Breakpoint 1 at 0x69a688: file ../system/runstate.c, line 510.
> (gdb) b qemu_default_main
> Breakpoint 2 at 0xa9aeb8: file ../system/main.c, line 45.
> (gdb) r
> 
> Starting program: /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
> 
> Thread 1 "qemu-system-ppc" hit Breakpoint 1, qemu_system_reset (reason=reason@entry=SHUTDOWN_CAUSE_NONE) at ../system/runstate.c:513
> 513     AccelClass *ac = ACCEL_GET_CLASS(current_accel());
> (gdb) n
> 517     mc = current_machine ? MACHINE_GET_CLASS(current_machine) : NULL;
> (gdb) n
> 519     cpu_synchronize_all_states();
> (gdb) n
> 521     switch (reason) {
> (gdb) n
> 529     if (!cpus_are_resettable() &&
> (gdb) n
> 553     if (mc && mc->reset) {
> (gdb) n
> 554         mc->reset(current_machine, type);
> (gdb) n
> 558     switch (reason) {
> (gdb) n
> 574     if (cpus_are_resettable()) {
> (gdb) n
> 583             cpu_synchronize_all_post_reset();
> (gdb) n
> 587     vm_set_suspended(false);
> (gdb) n
> qdev_machine_creation_done () at ../hw/core/machine.c:1814
> 1814    register_global_state();
> (gdb) n
> qemu_machine_creation_done (errp=0x10123e028 <error_fatal>) at ../system/vl.c:2785
> 2785    if (machine->cgs && !machine->cgs->ready) {
> (gdb) n
> 2791    foreach_device_config_or_exit(DEV_GDB, gdbserver_start);
> (gdb) n
> 2793    if (!vga_interface_created && !default_vga &&
> (gdb) n
> qmp_x_exit_preconfig (errp=errp@entry=0x10123e028 <error_fatal>) at ../system/vl.c:2815
> 2815    if (loadvm) {
> (gdb) n
> 2820    if (replay_mode != REPLAY_MODE_NONE) {
> (gdb) n
> 2824    if (incoming) {
> (gdb) n
> 2837    } else if (autostart) {
> (gdb) n
> 2838        qmp_cont(NULL);
> (gdb) n
> qemu_init (argc=<optimized out>, argv=<optimized out>) at ../system/vl.c:3849
> 3849    qemu_init_displays();
> (gdb) n
> 3850    accel_setup_post(current_machine);
> (gdb) n
> 3851    if (migrate_mode() != MIG_MODE_CPR_EXEC) {
> (gdb) n
> 3852        os_setup_post();
> (gdb) n
> 3854    resume_mux_open();
> (gdb) n
> main (argc=<optimized out>, argv=<optimized out>) at ../system/main.c:84
> 84      bql_unlock();
> (gdb) n
> 85      replay_mutex_unlock();
> (gdb) n
> 87      if (qemu_main) {
> (gdb) n
> 93          qemu_default_main(NULL);
> (gdb) n
> 
> Thread 1 "qemu-system-ppc" hit Breakpoint 2, qemu_default_main (opaque=opaque@entry=0x0) at ../system/main.c:48
> 48      replay_mutex_lock();
> (gdb) n
> 49      bql_lock();
> (gdb) n
> 
> <hangs>
> <system becomes unresponsive at this point>
> 
> 
> Thanks,
> Misbah Anjum N <misanjumn@ibm.com>
> 
> 
> 
> On 2026-03-09 18:53, Ani Sinha wrote:
>> Yes seems this is an issue and I will fix it. Not sure if the fix will
>> address your issue though ...
>> Can you try the following patch?
>> From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00 2001
>> From: Ani Sinha <anisinha@redhat.com>
>> Date: Mon, 9 Mar 2026 18:44:40 +0530
>> Subject: [PATCH] Fix reset for non-x86 archs that do not support reset yet
>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>> ---
>> system/runstate.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>> diff --git a/system/runstate.c b/system/runstate.c
>> index eca722b43c..c1f41284c9 100644
>> --- a/system/runstate.c
>> +++ b/system/runstate.c
>> @@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
>>         (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
>>         if (ac->rebuild_guest) {
>>             ret = ac->rebuild_guest(current_machine);
>> -            if (ret < 0) {
>> +            if (ret < 0 && ret != -EOPNOTSUPP) {
>>                 error_report("unable to rebuild guest: %s(%d)",
>>                              strerror(-ret), ret);
>>                 vm_stop(RUN_STATE_INTERNAL_ERROR);
>> +            } else if (ret == -EOPNOTSUPP) {
>> +                error_report("accelerator does not support reset!");
>>             } else {
>>                 info_report("virtual machine state has been rebuilt with new "
>>                             "guest file handle.");
>> --
>> 2.42.0
>>> Is this a confidential guest that cannot be normally reset?
> 




* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-10  8:54           ` Ani Sinha
@ 2026-03-10  9:08             ` Misbah Anjum N
  2026-03-10  9:34               ` Ani Sinha
  0 siblings, 1 reply; 19+ messages in thread
From: Misbah Anjum N @ 2026-03-10  9:08 UTC (permalink / raw)
  To: Ani Sinha, Pbonzini, Qemu Devel, Qemu Ppc; +Cc: npiggin, Harsh Prateek Bora

On 2026-03-10 14:24, Ani Sinha wrote:
>> On 10 Mar 2026, at 2:09 PM, Misbah Anjum N <misanjum@linux.ibm.com> 
>> wrote:
>> 
>> Hi Ani and Paolo,
>> 
>> We have tested the code by applying both the original commit 
>> (98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch (commit 
>> 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le.
>> However, the issue persists. We've conducted GDB debugging that shows 
>> the hang is occurring in a different location than what the fix 
>> addresses.
>> 
>> Since the original patch is breaking KVM guest bringup completely on 
>> ppc64le, and the fix patch does not resolve the issue, given the 
>> severity of this regression (complete KVM breakage on ppc64le), we 
>> should either find a quick fix or consider reverting the patch until a 
>> proper solution can be identified.
> 
> Based on what you just described, it does not seem like the issue is
> related to 98884e0cc10997a17ce9abfd6ff10be19224ca6a at all. If you
> revert this patch in your local tree, can you confirm that your issue
> gets fixed?
> 

Yes, the issue is not seen with the immediate previous commit:

commit df8df3cb6b743372ebb335bd8404bc3d748da350 (ani-df8df3cb)
Author: Ani Sinha <anisinha@redhat.com>
Date:   Wed Feb 25 09:19:09 2026 +0530

     system/physmem: add helper to reattach existing memory after KVM VM 
fd change

     After the guest KVM file descriptor has changed as a part of the 
process of
     confidential guest reset mechanism, existing memory needs to be 
reattached to
     the new file descriptor. This change adds a helper function 
ram_block_rebind()
     for this purpose. The next patch will make use of this function.

     Signed-off-by: Ani Sinha <anisinha@redhat.com>
     Link: 
https://lore.kernel.org/r/20260225035000.385950-5-anisinha@redhat.com
     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Looks like the next patch enables the functionality of the previous 
patches in a way that causes bql_lock() to get stuck on architectures 
(ppc64le in this case) that do not support this feature yet.
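
The symptom described above can be pictured in isolation: bql_lock() takes a mutex, so it blocks for as long as some other thread holds that mutex. The following is a minimal standalone sketch using plain POSIX threads, not QEMU code. The names `big_lock`, `holder_thread()`, `big_lock_is_free()` and `demo_contention()` are hypothetical, and the idea that another thread is holding the lock in the ppc64le case is an assumption, not something the trace shows.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the BQL; this is not the QEMU lock itself. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Handshake state so the demo is deterministic. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cond = PTHREAD_COND_INITIALIZER;
static bool holder_has_lock;
static bool holder_may_release;

/* Takes big_lock and keeps it until the main thread allows release. */
static void *holder_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&big_lock);

    pthread_mutex_lock(&state_lock);
    holder_has_lock = true;
    pthread_cond_broadcast(&state_cond);
    while (!holder_may_release) {
        pthread_cond_wait(&state_cond, &state_lock);
    }
    pthread_mutex_unlock(&state_lock);

    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/* Non-blocking probe: true if big_lock could be taken right now. */
static bool big_lock_is_free(void)
{
    if (pthread_mutex_trylock(&big_lock) == 0) {
        pthread_mutex_unlock(&big_lock);
        return true;
    }
    return false;
}

/*
 * Returns true if the lock was observed busy while the holder owned it
 * and free again after the holder exited. A blocking lock call made
 * during the "busy" window would wait, which is what a hang looks like.
 */
static bool demo_contention(void)
{
    pthread_t tid;
    bool busy_while_held;

    holder_has_lock = false;
    holder_may_release = false;
    if (pthread_create(&tid, NULL, holder_thread, NULL) != 0) {
        return false;
    }

    /* Wait until the holder really owns big_lock. */
    pthread_mutex_lock(&state_lock);
    while (!holder_has_lock) {
        pthread_cond_wait(&state_cond, &state_lock);
    }
    pthread_mutex_unlock(&state_lock);

    busy_while_held = !big_lock_is_free();

    /* Let the holder drop the lock and finish. */
    pthread_mutex_lock(&state_lock);
    holder_may_release = true;
    pthread_cond_broadcast(&state_cond);
    pthread_mutex_unlock(&state_lock);
    pthread_join(tid, NULL);

    return busy_while_held && big_lock_is_free();
}
```

In the real process, a backtrace of every thread at the moment of the hang (for example gdb's `thread apply all bt`) would be the direct way to check which thread, if any, owns the lock.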

Did you validate your patches on other architectures that do not 
support this feature yet?

>> 
>> Analysis:
>> 1. This is not a confidential guest. This is a regular KVM guest 
>> running on ppc64le.
>> 2. The execution flow shows that qemu_system_reset() completes 
>> successfully and never enters the code path at line 529-543
> 
> This is what I expected and therefore, no code related to coco guest
> rebuilding is getting executed. Your issue seems to be somewhere else.
> 

The issue occurs only with the introduction of this patch and not with 
the previous upstream commit as explained above.

>> 3. The hang occurs later in qemu_default_main() at system/main.c:49, 
>> after calling bql_lock()
>> 4. The ppc KVM guest boots fine with the previous commit - 
>> df8df3cb6b743372ebb335bd8404bc3d748da350
>> 5. This suggests the issue is not with error handling of -EOPNOTSUPP 
>> during reset, but bql_lock() getting stuck in qemu_default_main()
>> 
>> GDB Trace Analysis:
>> We set breakpoints at qemu_system_reset() and qemu_default_main() to 
>> trace the execution flow. The system successfully completes 
>> qemu_system_reset() without entering the problematic code path where 
>> the fix provided by you applies (system/runstate.c:529-543).
>> 
>> # gdb --args /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine 
>> pseries,accel=kvm -enable-kvm -m 32768 -smp 
>> 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device 
>> virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive 
>> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 
>> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev 
>> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>> 
>> (gdb) handle SIGUSR1 pass nostop noprint
>> Signal        Stop Print Pass to program Description
>> SIGUSR1       No No Yes User defined signal 1
>> (gdb) b qemu_system_reset
>> Breakpoint 1 at 0x69a688: file ../system/runstate.c, line 510.
>> (gdb) b qemu_default_main
>> Breakpoint 2 at 0xa9aeb8: file ../system/main.c, line 45.
>> (gdb) r
>> 
>> Starting program: /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 
>> -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 
>> 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device 
>> virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive 
>> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 
>> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev 
>> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>> 
>> Thread 1 "qemu-system-ppc" hit Breakpoint 1, qemu_system_reset 
>> (reason=reason@entry=SHUTDOWN_CAUSE_NONE) at ../system/runstate.c:513
>> 513     AccelClass *ac = ACCEL_GET_CLASS(current_accel());
>> (gdb) n
>> 517     mc = current_machine ? MACHINE_GET_CLASS(current_machine) : 
>> NULL;
>> (gdb) n
>> 519     cpu_synchronize_all_states();
>> (gdb) n
>> 521     switch (reason) {
>> (gdb) n
>> 529     if (!cpus_are_resettable() &&
>> (gdb) n
>> 553     if (mc && mc->reset) {
>> (gdb) n
>> 554         mc->reset(current_machine, type);
>> (gdb) n
>> 558     switch (reason) {
>> (gdb) n
>> 574     if (cpus_are_resettable()) {
>> (gdb) n
>> 583             cpu_synchronize_all_post_reset();
>> (gdb) n
>> 587     vm_set_suspended(false);
>> (gdb) n
>> qdev_machine_creation_done () at ../hw/core/machine.c:1814
>> 1814    register_global_state();
>> (gdb) n
>> qemu_machine_creation_done (errp=0x10123e028 <error_fatal>) at 
>> ../system/vl.c:2785
>> 2785    if (machine->cgs && !machine->cgs->ready) {
>> (gdb) n
>> 2791    foreach_device_config_or_exit(DEV_GDB, gdbserver_start);
>> (gdb) n
>> 2793    if (!vga_interface_created && !default_vga &&
>> (gdb) n
>> qmp_x_exit_preconfig (errp=errp@entry=0x10123e028 <error_fatal>) at 
>> ../system/vl.c:2815
>> 2815    if (loadvm) {
>> (gdb) n
>> 2820    if (replay_mode != REPLAY_MODE_NONE) {
>> (gdb) n
>> 2824    if (incoming) {
>> (gdb) n
>> 2837    } else if (autostart) {
>> (gdb) n
>> 2838        qmp_cont(NULL);
>> (gdb) n
>> qemu_init (argc=<optimized out>, argv=<optimized out>) at 
>> ../system/vl.c:3849
>> 3849    qemu_init_displays();
>> (gdb) n
>> 3850    accel_setup_post(current_machine);
>> (gdb) n
>> 3851    if (migrate_mode() != MIG_MODE_CPR_EXEC) {
>> (gdb) n
>> 3852        os_setup_post();
>> (gdb) n
>> 3854    resume_mux_open();
>> (gdb) n
>> main (argc=<optimized out>, argv=<optimized out>) at 
>> ../system/main.c:84
>> 84      bql_unlock();
>> (gdb) n
>> 85      replay_mutex_unlock();
>> (gdb) n
>> 87      if (qemu_main) {
>> (gdb) n
>> 93          qemu_default_main(NULL);
>> (gdb) n
>> 
>> Thread 1 "qemu-system-ppc" hit Breakpoint 2, qemu_default_main 
>> (opaque=opaque@entry=0x0) at ../system/main.c:48
>> 48      replay_mutex_lock();
>> (gdb) n
>> 49      bql_lock();
>> (gdb) n
>> 
>> <hangs>
>> <system becomes unresponsive at this point>
>> 
>> 
>> Thanks,
>> Misbah Anjum N <misanjumn@ibm.com>
>> 
>> 
>> 
>> On 2026-03-09 18:53, Ani Sinha wrote:
>>> Yes seems this is an issue and I will fix it. Not sure if the fix 
>>> will
>>> address your issue though ...
>>> Can you try the following patch?
>>> From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00 
>>> 2001
>>> From: Ani Sinha <anisinha@redhat.com>
>>> Date: Mon, 9 Mar 2026 18:44:40 +0530
>>> Subject: [PATCH] Fix reset for non-x86 archs that do not support 
>>> reset yet
>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>> ---
>>> system/runstate.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>> diff --git a/system/runstate.c b/system/runstate.c
>>> index eca722b43c..c1f41284c9 100644
>>> --- a/system/runstate.c
>>> +++ b/system/runstate.c
>>> @@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
>>>         (current_machine->new_accel_vmfd_on_reset || 
>>> !cpus_are_resettable())) {
>>>         if (ac->rebuild_guest) {
>>>             ret = ac->rebuild_guest(current_machine);
>>> -            if (ret < 0) {
>>> +            if (ret < 0 && ret != -EOPNOTSUPP) {
>>>                 error_report("unable to rebuild guest: %s(%d)",
>>>                              strerror(-ret), ret);
>>>                 vm_stop(RUN_STATE_INTERNAL_ERROR);
>>> +            } else if (ret == -EOPNOTSUPP) {
>>> +                error_report("accelerator does not support reset!");
>>>             } else {
>>>                 info_report("virtual machine state has been rebuilt 
>>> with new "
>>>                             "guest file handle.");
>>> --
>>> 2.42.0
>>>> Is this a confidential guest that cannot be normally reset?
>> 



* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-10  9:08             ` Misbah Anjum N
@ 2026-03-10  9:34               ` Ani Sinha
  2026-03-10 10:05                 ` Misbah Anjum N
  0 siblings, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-03-10  9:34 UTC (permalink / raw)
  To: Misbah Anjum N
  Cc: Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin, Harsh Prateek Bora



> On 10 Mar 2026, at 2:38 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
> 
> On 2026-03-10 14:24, Ani Sinha wrote:
>>> On 10 Mar 2026, at 2:09 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>>> Hi Ani and Paolo,
>>> We have tested the code by applying both the original commit (98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch (commit 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le.
>>> However, the issue persists. We've conducted GDB debugging that shows the hang is occurring in a different location than what the fix addresses.
>>> Since the original patch is breaking KVM guest bringup completely on ppc64le, and the fix patch does not resolve the issue, given the severity of this regression (complete KVM breakage on ppc64le), we should either find a quick fix or consider reverting the patch until a proper solution can be identified.
>> Based on what you just described, it does not seem like the issue is
>> related to 98884e0cc10997a17ce9abfd6ff10be19224ca6a at all. If you
>> revert this patch in your local tree, can you confirm that your issue
>> gets fixed?
> 
> Yes, the issue is not seen with the immediate previous commit:
> 
> commit df8df3cb6b743372ebb335bd8404bc3d748da350 (ani-df8df3cb)
> Author: Ani Sinha <anisinha@redhat.com>
> Date:   Wed Feb 25 09:19:09 2026 +0530
> 
>    system/physmem: add helper to reattach existing memory after KVM VM fd change
> 
>    After the guest KVM file descriptor has changed as a part of the process of
>    confidential guest reset mechanism, existing memory needs to be reattached to
>    the new file descriptor. This change adds a helper function ram_block_rebind()
>    for this purpose. The next patch will make use of this function.
> 
>    Signed-off-by: Ani Sinha <anisinha@redhat.com>
>    Link: https://lore.kernel.org/r/20260225035000.385950-5-anisinha@redhat.com
>    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> Looks like the next patch enables the functionality of the previous patches in a way that causes bql_lock() to get stuck on architectures (ppc64le in this case) that do not support this feature yet.

This theory is not substantiated by code or evidence. 98884e0cc10997a17ce9abfd6ff10be19224ca6a introduces kvm_reset_vmfd() which is called by this block of code with the tip at 98884e0cc10997a17ce9abfd6ff10be19224ca6a :

   if (!cpus_are_resettable() &&
        (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
         reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
        if (ac->rebuild_guest) {
            ret = ac->rebuild_guest(current_machine);
            if (ret < 0) {
                error_report("unable to rebuild guest: %s(%d)",
                             strerror(-ret), ret);
                vm_stop(RUN_STATE_INTERNAL_ERROR);
            } else {
                info_report("virtual machine state has been rebuilt with new "
                            "guest file handle.");
                guest_state_rebuilt = true;
            }
        } else if (!cpus_are_resettable())  {
            error_report("accelerator does not support reset!");
        } else {
            error_report("accelerator does not support rebuilding guest state,"
                         " proceeding with normal reset!");
        }
    }

If cpus are resettable, this block will not be called and nothing that the patch introduces will have been executed.
So I think you guys need to explain a bit more why you so strongly feel this patch broke it. I am confused and unable to reason about this.
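
The guard at the top of that block can be modelled in isolation. This is a simplified sketch, not the QEMU source: `ModelShutdownCause` and `rebuild_block_entered()` are hypothetical names, and only the condition quoted above is reproduced. It covers this one condition and says nothing about other code the commit touches.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the ShutdownCause values that matter here;
 * the real enum has more members.
 */
typedef enum {
    CAUSE_NONE,
    CAUSE_GUEST_RESET,
    CAUSE_HOST_QMP_SYSTEM_RESET,
} ModelShutdownCause;

/*
 * Mirrors the quoted guard: the rebuild block is entered only when the
 * vCPUs are NOT resettable and the reset was requested by the guest or
 * through QMP.
 */
static bool rebuild_block_entered(bool cpus_resettable,
                                  ModelShutdownCause reason)
{
    return !cpus_resettable &&
           (reason == CAUSE_GUEST_RESET ||
            reason == CAUSE_HOST_QMP_SYSTEM_RESET);
}
```

For the startup reset in the GDB trace, where reason is SHUTDOWN_CAUSE_NONE, `rebuild_block_entered(cpus_resettable, CAUSE_NONE)` is false whatever the value of `cpus_resettable`, which matches the step from line 529 straight to line 553.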

> 
> Did you validate your patches on other architectures that do not support this feature yet?

As you have already seen, on other architectures, the entire block of code is not executed at all. Only SEV-ES, SEV-SNP and TDX currently exercise this.

> 
>>> Analysis:
>>> 1. This is not a confidential guest. This is a regular KVM guest running on ppc64le.
>>> 2. The execution flow shows that qemu_system_reset() completes successfully and never enters the code path at line 529-543
>> This is what I expected and therefore, no code related to coco guest
>> rebuilding is getting executed. Your issue seems to be somewhere else.
> 
> The issue occurs only with the introduction of this patch and not with the previous upstream commit as explained above.
> 
>>> 3. The hang occurs later in qemu_default_main() at system/main.c:49, after calling bql_lock()
>>> 4. The ppc KVM guest boots fine with the previous commit - df8df3cb6b743372ebb335bd8404bc3d748da350
>>> 5. This suggests the issue is not with error handling of -EOPNOTSUPP during reset, but bql_lock() getting stuck in qemu_default_main()





* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-10  9:34               ` Ani Sinha
@ 2026-03-10 10:05                 ` Misbah Anjum N
  2026-03-10 10:12                   ` Ani Sinha
  0 siblings, 1 reply; 19+ messages in thread
From: Misbah Anjum N @ 2026-03-10 10:05 UTC (permalink / raw)
  To: Ani Sinha, Pbonzini, Qemu Devel, Qemu Ppc
  Cc: npiggin, Harsh Prateek Bora, vaibhav, sbhat

On 2026-03-10 15:04, Ani Sinha wrote:
>> On 10 Mar 2026, at 2:38 PM, Misbah Anjum N <misanjum@linux.ibm.com> 
>> wrote:
>> 
>> On 2026-03-10 14:24, Ani Sinha wrote:
>>>> On 10 Mar 2026, at 2:09 PM, Misbah Anjum N <misanjum@linux.ibm.com> 
>>>> wrote:
>>>> Hi Ani and Paolo,
>>>> We have tested the code by applying both the original commit 
>>>> (98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch 
>>>> (commit 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le.
>>>> However, the issue persists. We've conducted GDB debugging that 
>>>> shows the hang is occurring in a different location than what the 
>>>> fix addresses.
>>>> Since the original patch is breaking KVM guest bringup completely on 
>>>> ppc64le, and the fix patch does not resolve the issue, given the 
>>>> severity of this regression (complete KVM breakage on ppc64le), we 
>>>> should either find a quick fix or consider reverting the patch until 
>>>> a proper solution can be identified.
>>> Based on what you just described, it does not seem like the issue is
>>> related to 98884e0cc10997a17ce9abfd6ff10be19224ca6a at all. If you
>>> revert this patch in your local tree, can you confirm that your issue
>>> gets fixed?
>> 
>> Yes, the issue is not seen with the immediate previous commit:
>> 
>> commit df8df3cb6b743372ebb335bd8404bc3d748da350 (ani-df8df3cb)
>> Author: Ani Sinha <anisinha@redhat.com>
>> Date:   Wed Feb 25 09:19:09 2026 +0530
>> 
>>    system/physmem: add helper to reattach existing memory after KVM VM 
>> fd change
>> 
>>    After the guest KVM file descriptor has changed as a part of the 
>> process of
>>    confidential guest reset mechanism, existing memory needs to be 
>> reattached to
>>    the new file descriptor. This change adds a helper function 
>> ram_block_rebind()
>>    for this purpose. The next patch will make use of this function.
>> 
>>    Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>    Link: 
>> https://lore.kernel.org/r/20260225035000.385950-5-anisinha@redhat.com
>>    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> 
>> Looks like the next patch enables the functionality of the 
>> previous patches in a way that causes bql_lock() to get stuck on 
>> architectures (ppc64le in this case) that do not support this 
>> feature yet.
> 
> This theory is not substantiated by code or evidence.
> 98884e0cc10997a17ce9abfd6ff10be19224ca6a introduces kvm_reset_vmfd()
> which is called by this block of code with the tip at
> 98884e0cc10997a17ce9abfd6ff10be19224ca6a :
> 
>    if (!cpus_are_resettable() &&
>         (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
>          reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
>         if (ac->rebuild_guest) {
>             ret = ac->rebuild_guest(current_machine);
>             if (ret < 0) {
>                 error_report("unable to rebuild guest: %s(%d)",
>                              strerror(-ret), ret);
>                 vm_stop(RUN_STATE_INTERNAL_ERROR);
>             } else {
>                 info_report("virtual machine state has been rebuilt 
> with new "
>                             "guest file handle.");
>                 guest_state_rebuilt = true;
>             }
>         } else if (!cpus_are_resettable())  {
>             error_report("accelerator does not support reset!");
>         } else {
>             error_report("accelerator does not support rebuilding guest 
> state,"
>                          " proceeding with normal reset!");
>         }
>     }
> 
> If cpus are resettable, this block will not be called and nothing that
> the patch introduces will have been executed.
> So I think you guys need to explain a bit more why you so strongly
> feel this patch broke it. I am confused and unable to reason about this.
> 
>> 
>> Did you validate your patches on other architectures that do not 
>> support this feature yet?
> 
> As you have already seen, on other architectures, the entire block of
> code is not executed at all. Only SEV-ES, SEV-SNP and TDX currently
> exercise this.
> 

I understand your concern about the code path analysis. Let me clarify 
our findings with concrete evidence.

Reproducibility Evidence:
With commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a applied, we are 
able to reproduce the hang issue 100% of the time across multiple test 
runs. When we revert to the previous commit 
df8df3cb6b743372ebb335bd8404bc3d748da350, the same KVM guest boots 
successfully 100% of the time.

This consistent reproducibility strongly indicates that commit 
98884e0cc10997a17ce9abfd6ff10be19224ca6a introduces the regression, 
even if the code path analysis suggests otherwise. The cause may 
therefore lie outside the reset block analysed above, elsewhere in the 
changes introduced by the patch series.

Since you led the development of this patch series, we would 
appreciate your help in figuring out this issue.

>> 
>>>> Analysis:
>>>> 1. This is not a confidential guest. This is a regular KVM guest 
>>>> running on ppc64le.
>>>> 2. The execution flow shows that qemu_system_reset() completes 
>>>> successfully and never enters the code path at line 529-543
>>> This is what I expected and therefore, no code related to coco guest
>>> rebuilding is getting executed. Your issue seems to be somewhere 
>>> else.
>> 
>> The issue occurs only with the introduction of this patch and not with 
>> the previous upstream commit as explained above.
>> 
>>>> 3. The hang occurs later in qemu_default_main() at system/main.c:49, 
>>>> after calling bql_lock()
>>>> 4. The ppc KVM guest boots fine with the previous commit - 
>>>> df8df3cb6b743372ebb335bd8404bc3d748da350
>>>> 5. This suggests the issue is not with error handling of -EOPNOTSUPP 
>>>> during reset, but bql_lock() getting stuck in qemu_default_main()
>>>> GDB Trace Analysis:
>>>> We set breakpoints at qemu_system_reset() and qemu_default_main() to 
>>>> trace the execution flow. The system successfully completes 
>>>> qemu_system_reset() without entering the problematic code path where 
>>>> the fix provided by you applies (system/runstate.c:529-543).
>>>> # gdb --args /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 
>>>> -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 
>>>> 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device 
>>>> virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive 
>>>> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 
>>>> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev 
>>>> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>>>> (gdb) handle SIGUSR1 pass nostop noprint
>>>> Signal        Stop Print Pass to program Description
>>>> SIGUSR1       No No Yes User defined signal 1
>>>> (gdb) b qemu_system_reset
>>>> Breakpoint 1 at 0x69a688: file ../system/runstate.c, line 510.
>>>> (gdb) b qemu_default_main
>>>> Breakpoint 2 at 0xa9aeb8: file ../system/main.c, line 45.
>>>> (gdb) r
>>>> Starting program: /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 
>>>> -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 
>>>> 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device 
>>>> virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive 
>>>> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 
>>>> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev 
>>>> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>>>> Thread 1 "qemu-system-ppc" hit Breakpoint 1, qemu_system_reset 
>>>> (reason=reason@entry=SHUTDOWN_CAUSE_NONE) at 
>>>> ../system/runstate.c:513
>>>> 513     AccelClass *ac = ACCEL_GET_CLASS(current_accel());
>>>> (gdb) n
>>>> 517     mc = current_machine ? MACHINE_GET_CLASS(current_machine) : 
>>>> NULL;
>>>> (gdb) n
>>>> 519     cpu_synchronize_all_states();
>>>> (gdb) n
>>>> 521     switch (reason) {
>>>> (gdb) n
>>>> 529     if (!cpus_are_resettable() &&
>>>> (gdb) n
>>>> 553     if (mc && mc->reset) {
>>>> (gdb) n
>>>> 554         mc->reset(current_machine, type);
>>>> (gdb) n
>>>> 558     switch (reason) {
>>>> (gdb) n
>>>> 574     if (cpus_are_resettable()) {
>>>> (gdb) n
>>>> 583             cpu_synchronize_all_post_reset();
>>>> (gdb) n
>>>> 587     vm_set_suspended(false);
>>>> (gdb) n
>>>> qdev_machine_creation_done () at ../hw/core/machine.c:1814
>>>> 1814    register_global_state();
>>>> (gdb) n
>>>> qemu_machine_creation_done (errp=0x10123e028 <error_fatal>) at 
>>>> ../system/vl.c:2785
>>>> 2785    if (machine->cgs && !machine->cgs->ready) {
>>>> (gdb) n
>>>> 2791    foreach_device_config_or_exit(DEV_GDB, gdbserver_start);
>>>> (gdb) n
>>>> 2793    if (!vga_interface_created && !default_vga &&
>>>> (gdb) n
>>>> qmp_x_exit_preconfig (errp=errp@entry=0x10123e028 <error_fatal>) at 
>>>> ../system/vl.c:2815
>>>> 2815    if (loadvm) {
>>>> (gdb) n
>>>> 2820    if (replay_mode != REPLAY_MODE_NONE) {
>>>> (gdb) n
>>>> 2824    if (incoming) {
>>>> (gdb) n
>>>> 2837    } else if (autostart) {
>>>> (gdb) n
>>>> 2838        qmp_cont(NULL);
>>>> (gdb) n
>>>> qemu_init (argc=<optimized out>, argv=<optimized out>) at 
>>>> ../system/vl.c:3849
>>>> 3849    qemu_init_displays();
>>>> (gdb) n
>>>> 3850    accel_setup_post(current_machine);
>>>> (gdb) n
>>>> 3851    if (migrate_mode() != MIG_MODE_CPR_EXEC) {
>>>> (gdb) n
>>>> 3852        os_setup_post();
>>>> (gdb) n
>>>> 3854    resume_mux_open();
>>>> (gdb) n
>>>> main (argc=<optimized out>, argv=<optimized out>) at 
>>>> ../system/main.c:84
>>>> 84      bql_unlock();
>>>> (gdb) n
>>>> 85      replay_mutex_unlock();
>>>> (gdb) n
>>>> 87      if (qemu_main) {
>>>> (gdb) n
>>>> 93          qemu_default_main(NULL);
>>>> (gdb) n
>>>> Thread 1 "qemu-system-ppc" hit Breakpoint 2, qemu_default_main 
>>>> (opaque=opaque@entry=0x0) at ../system/main.c:48
>>>> 48      replay_mutex_lock();
>>>> (gdb) n
>>>> 49      bql_lock();
>>>> (gdb) n
>>>> <hangs>
>>>> <system becomes unresponsive at this point>
>>>> Thanks,
>>>> Misbah Anjum N <misanjumn@ibm.com>
>>>> On 2026-03-09 18:53, Ani Sinha wrote:
>>>>> Yes seems this is an issue and I will fix it. Not sure if the fix 
>>>>> will
>>>>> address your issue though ...
>>>>> Can you try the following patch?
>>>>> From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00 
>>>>> 2001
>>>>> From: Ani Sinha <anisinha@redhat.com>
>>>>> Date: Mon, 9 Mar 2026 18:44:40 +0530
>>>>> Subject: [PATCH] Fix reset for non-x86 archs that do not support 
>>>>> reset yet
>>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>>> ---
>>>>> system/runstate.c | 4 +++-
>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>> diff --git a/system/runstate.c b/system/runstate.c
>>>>> index eca722b43c..c1f41284c9 100644
>>>>> --- a/system/runstate.c
>>>>> +++ b/system/runstate.c
>>>>> @@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
>>>>>        (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
>>>>>        if (ac->rebuild_guest) {
>>>>>            ret = ac->rebuild_guest(current_machine);
>>>>> -            if (ret < 0) {
>>>>> +            if (ret < 0 && ret != -EOPNOTSUPP) {
>>>>>                error_report("unable to rebuild guest: %s(%d)",
>>>>>                             strerror(-ret), ret);
>>>>>                vm_stop(RUN_STATE_INTERNAL_ERROR);
>>>>> +            } else if (ret == -EOPNOTSUPP) {
>>>>> +                error_report("accelerator does not support reset!");
>>>>>            } else {
>>>>>                info_report("virtual machine state has been rebuilt with new "
>>>>>                            "guest file handle.");
>>>>> --
>>>>> 2.42.0
>>>>>> Is this a confidential guest that cannot be normally reset?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-10 10:05                 ` Misbah Anjum N
@ 2026-03-10 10:12                   ` Ani Sinha
  2026-03-18  8:19                     ` Misbah Anjum N
  0 siblings, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-03-10 10:12 UTC (permalink / raw)
  To: Misbah Anjum N
  Cc: Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin, Harsh Prateek Bora,
	vaibhav, sbhat



> On 10 Mar 2026, at 3:35 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
> 
> On 2026-03-10 15:04, Ani Sinha wrote:
>>> On 10 Mar 2026, at 2:38 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>>> On 2026-03-10 14:24, Ani Sinha wrote:
>>>>> On 10 Mar 2026, at 2:09 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>>>>> Hi Ani and Paolo,
>>>>> We have tested the code by applying both the original commit (98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch (commit 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le.
>>>>> However, the issue persists. We've conducted GDB debugging that shows the hang is occurring in a different location than what the fix addresses.
>>>>> Since the original patch is breaking KVM guest bringup completely on ppc64le, and the fix patch does not resolve the issue, given the severity of this regression (complete KVM breakage on ppc64le), we should either find a quick fix or consider reverting the patch until a proper solution can be identified.
>>>> Based on what you just described, it does not seem like the issue is
>>>> related to 98884e0cc10997a17ce9abfd6ff10be19224ca6a at all. If you
>>>> revert this patch in your local tree, can you confirm that your issue
>>>> gets fixed?
>>> Yes, the issue is not seen with the immediate previous commit:
>>> commit df8df3cb6b743372ebb335bd8404bc3d748da350 (ani-df8df3cb)
>>> Author: Ani Sinha <anisinha@redhat.com>
>>> Date:   Wed Feb 25 09:19:09 2026 +0530
>>>   system/physmem: add helper to reattach existing memory after KVM VM fd change
>>>   After the guest KVM file descriptor has changed as a part of the process of
>>>   confidential guest reset mechanism, existing memory needs to be reattached to
>>>   the new file descriptor. This change adds a helper function ram_block_rebind()
>>>   for this purpose. The next patch will make use of this function.
>>>   Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>   Link: https://lore.kernel.org/r/20260225035000.385950-5-anisinha@redhat.com
>>>   Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Looks like the next patch is enabling the functionality of the previous patches in such a way which causes bql_lock() to get stuck on architectures (ppc64le in this case) which does not support this feature yet.
>> This theory is not substantiated by code or evidence.
>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a introduces kvm_reset_vmfd()
>> which is called by this block of code with the tip at
>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a :
>>   if (!cpus_are_resettable() &&
>>        (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
>>         reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
>>        if (ac->rebuild_guest) {
>>            ret = ac->rebuild_guest(current_machine);
>>            if (ret < 0) {
>>                error_report("unable to rebuild guest: %s(%d)",
>>                             strerror(-ret), ret);
>>                vm_stop(RUN_STATE_INTERNAL_ERROR);
>>            } else {
>>                info_report("virtual machine state has been rebuilt with new "
>>                            "guest file handle.");
>>                guest_state_rebuilt = true;
>>            }
>>        } else if (!cpus_are_resettable())  {
>>            error_report("accelerator does not support reset!");
>>        } else {
>>            error_report("accelerator does not support rebuilding guest state,"
>>                         " proceeding with normal reset!");
>>        }
>>    }
>> If cpus are resettable, this block will not be called and nothing that
>> the patch introduces will have been executed.
>> So I think you guys need to explain a bit more why you so strongly
>> feel this patch broke it. I am confused and unable to reason this.
>>> Did you validate your patches on other architectures which does not support this feature yet?
>> As you have already seen, on other architectures, the entire block of
>> code is not executed at all. Only SEV-ES, SEV-SNP and TDX currently
>> exercises this.
> 
> I understand your concern about the code path analysis. Let me clarify our findings with concrete evidence.
> 
> Reproducibility Evidence:
> With commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a applied, we are able to reproduce the hang issue 100% of the time across multiple test runs. When we revert to the previous commit df8df3cb6b743372ebb335bd8404bc3d748da350, the same KVM guest boots successfully 100% of the time.
> 
> This consistent reproducibility strongly indicates that commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a is introducing the regression, even if the code path analysis suggests otherwise. This suggests the issue may not be in the code path, but rather in the changes introduced by the patch series.
> 
> As the author who led the development of this patch series, we would appreciate your help in figuring out this issue.

I am really not sure what changes in that patch can cause this breakage in a completely unrelated area when the changes are not even executed.

> 
>>>>> Analysis:
>>>>> 1. This is not a confidential guest. This is a regular KVM guest running on ppc64le.
>>>>> 2. The execution flow shows that qemu_system_reset() completes successfully and never enters the code path at line 529-543
>>>> This is what I expected and therefore, no code related to coco guest
>>>> rebuilding is getting executed. Your issue seems to be somewhere else.
>>> The issue occurs only with the introduction of this patch and not with the previous upstream commit as explained above.
>>>>> 3. The hang occurs later in qemu_default_main() at system/main.c:49, after calling bql_lock()
>>>>> 4. The ppc KVM guest boots fine with the previous commit - df8df3cb6b743372ebb335bd8404bc3d748da350
>>>>> 5. This suggests the issue is not with error handling of -EOPNOTSUPP during reset, but bql_lock() getting stuck in qemu_default_main()
>>>>> GDB Trace Analysis:
>>>>> We set breakpoints at qemu_system_reset() and qemu_default_main() to trace the execution flow. The system successfully completes qemu_system_reset() without entering the problematic code path where the fix provided by you applies (system/runstate.c:529-543).
>>>>> # gdb --args /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>>>>> (gdb) handle SIGUSR1 pass nostop noprint
>>>>> Signal        Stop Print Pass to program Description
>>>>> SIGUSR1       No No Yes User defined signal 1
>>>>> (gdb) b qemu_system_reset
>>>>> Breakpoint 1 at 0x69a688: file ../system/runstate.c, line 510.
>>>>> (gdb) b qemu_default_main
>>>>> Breakpoint 2 at 0xa9aeb8: file ../system/main.c, line 45.
>>>>> (gdb) r
>>>>> Starting program: /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>>>>> Thread 1 "qemu-system-ppc" hit Breakpoint 1, qemu_system_reset (reason=reason@entry=SHUTDOWN_CAUSE_NONE) at ../system/runstate.c:513
>>>>> 513     AccelClass *ac = ACCEL_GET_CLASS(current_accel());
>>>>> (gdb) n
>>>>> 517     mc = current_machine ? MACHINE_GET_CLASS(current_machine) : NULL;
>>>>> (gdb) n
>>>>> 519     cpu_synchronize_all_states();
>>>>> (gdb) n
>>>>> 521     switch (reason) {
>>>>> (gdb) n
>>>>> 529     if (!cpus_are_resettable() &&
>>>>> (gdb) n
>>>>> 553     if (mc && mc->reset) {
>>>>> (gdb) n
>>>>> 554         mc->reset(current_machine, type);
>>>>> (gdb) n
>>>>> 558     switch (reason) {
>>>>> (gdb) n
>>>>> 574     if (cpus_are_resettable()) {
>>>>> (gdb) n
>>>>> 583             cpu_synchronize_all_post_reset();
>>>>> (gdb) n
>>>>> 587     vm_set_suspended(false);
>>>>> (gdb) n
>>>>> qdev_machine_creation_done () at ../hw/core/machine.c:1814
>>>>> 1814    register_global_state();
>>>>> (gdb) n
>>>>> qemu_machine_creation_done (errp=0x10123e028 <error_fatal>) at ../system/vl.c:2785
>>>>> 2785    if (machine->cgs && !machine->cgs->ready) {
>>>>> (gdb) n
>>>>> 2791    foreach_device_config_or_exit(DEV_GDB, gdbserver_start);
>>>>> (gdb) n
>>>>> 2793    if (!vga_interface_created && !default_vga &&
>>>>> (gdb) n
>>>>> qmp_x_exit_preconfig (errp=errp@entry=0x10123e028 <error_fatal>) at ../system/vl.c:2815
>>>>> 2815    if (loadvm) {
>>>>> (gdb) n
>>>>> 2820    if (replay_mode != REPLAY_MODE_NONE) {
>>>>> (gdb) n
>>>>> 2824    if (incoming) {
>>>>> (gdb) n
>>>>> 2837    } else if (autostart) {
>>>>> (gdb) n
>>>>> 2838        qmp_cont(NULL);
>>>>> (gdb) n
>>>>> qemu_init (argc=<optimized out>, argv=<optimized out>) at ../system/vl.c:3849
>>>>> 3849    qemu_init_displays();
>>>>> (gdb) n
>>>>> 3850    accel_setup_post(current_machine);
>>>>> (gdb) n
>>>>> 3851    if (migrate_mode() != MIG_MODE_CPR_EXEC) {
>>>>> (gdb) n
>>>>> 3852        os_setup_post();
>>>>> (gdb) n
>>>>> 3854    resume_mux_open();
>>>>> (gdb) n
>>>>> main (argc=<optimized out>, argv=<optimized out>) at ../system/main.c:84
>>>>> 84      bql_unlock();
>>>>> (gdb) n
>>>>> 85      replay_mutex_unlock();
>>>>> (gdb) n
>>>>> 87      if (qemu_main) {
>>>>> (gdb) n
>>>>> 93          qemu_default_main(NULL);
>>>>> (gdb) n
>>>>> Thread 1 "qemu-system-ppc" hit Breakpoint 2, qemu_default_main (opaque=opaque@entry=0x0) at ../system/main.c:48
>>>>> 48      replay_mutex_lock();
>>>>> (gdb) n
>>>>> 49      bql_lock();
>>>>> (gdb) n
>>>>> <hangs>
>>>>> <system becomes unresponsive at this point>
>>>>> Thanks,
>>>>> Misbah Anjum N <misanjumn@ibm.com>
>>>>> On 2026-03-09 18:53, Ani Sinha wrote:
>>>>>> Yes seems this is an issue and I will fix it. Not sure if the fix will
>>>>>> address your issue though ...
>>>>>> Can you try the following patch?
>>>>>> From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00 2001
>>>>>> From: Ani Sinha <anisinha@redhat.com>
>>>>>> Date: Mon, 9 Mar 2026 18:44:40 +0530
>>>>>> Subject: [PATCH] Fix reset for non-x86 archs that do not support reset yet
>>>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>>>> ---
>>>>>> system/runstate.c | 4 +++-
>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>> diff --git a/system/runstate.c b/system/runstate.c
>>>>>> index eca722b43c..c1f41284c9 100644
>>>>>> --- a/system/runstate.c
>>>>>> +++ b/system/runstate.c
>>>>>> @@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
>>>>>>       (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
>>>>>>       if (ac->rebuild_guest) {
>>>>>>           ret = ac->rebuild_guest(current_machine);
>>>>>> -            if (ret < 0) {
>>>>>> +            if (ret < 0 && ret != -EOPNOTSUPP) {
>>>>>>               error_report("unable to rebuild guest: %s(%d)",
>>>>>>                            strerror(-ret), ret);
>>>>>>               vm_stop(RUN_STATE_INTERNAL_ERROR);
>>>>>> +            } else if (ret == -EOPNOTSUPP) {
>>>>>> +                error_report("accelerator does not support reset!");
>>>>>>           } else {
>>>>>>               info_report("virtual machine state has been rebuilt with new "
>>>>>>                           "guest file handle.");
>>>>>> --
>>>>>> 2.42.0
>>>>>>> Is this a confidential guest that cannot be normally reset?




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-10 10:12                   ` Ani Sinha
@ 2026-03-18  8:19                     ` Misbah Anjum N
  2026-03-18  8:39                       ` Ani Sinha
  0 siblings, 1 reply; 19+ messages in thread
From: Misbah Anjum N @ 2026-03-18  8:19 UTC (permalink / raw)
  To: Ani Sinha, Pbonzini, Qemu Devel, Qemu Ppc
  Cc: npiggin, Harsh Prateek Bora, vaibhav, sbhat

Hi Ani and Paolo,

Following up on the KVM guest boot issue due to commit 98884e0c, I have 
conducted additional testing that reveals important new information 
about the nature of this issue.

The hang is specifically triggered when SMP is configured, that is, when 
the -smp parameter is provided in the QEMU command. This is also 
validated via KVM Unit Tests involving SMP, which are failing due to the 
same commit.

Test Results:
Without SMP (boots successfully):
/usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm \
   -enable-kvm -m 32768 -nographic -device virtio-balloon \
   -device virtio-scsi-pci,id=scsi0 \
   -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le-hpb.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 \
   -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 \
   -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0

SLOF **********************************************************************
QEMU Starting
  Build Date = Oct 26 2025 18:45:22
  FW Version = release 20251026
  Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /pci@800000020000000
...
...
Result: Guest boots successfully

With SMP (hangs indefinitely):
/usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm \
   -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \
   -device virtio-balloon -device virtio-scsi-pci,id=scsi0 \
   -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le-hpb.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 \
   -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 \
   -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
...
...
Result: Hangs indefinitely at bql_lock() in qemu_default_main()

KVM Unit Tests:
Running kvm-unit-tests confirms the SMP dependency. Note that tests 
explicitly involving SMP (smp, smp-smt, atomics) all fail with SIGKILL, 
while single-threaded tests pass.

# ./run_tests.sh
FAIL selftest-setup (terminated on SIGKILL)
PASS selftest-migration (2 tests)
PASS selftest-migration-skip (1 tests)
PASS migration-memory (1 tests)
PASS spapr_hcall (9 tests, 1 skipped)
PASS spapr_vpa (13 tests)
PASS rtas-get-time-of-day (10 tests)
PASS rtas-get-time-of-day-base (10 tests)
PASS rtas-set-time-of-day (5 tests)
PASS emulator (4 tests)
PASS interrupts (13 tests)
FAIL mmu (terminated on SIGKILL)
FAIL smp (terminated on SIGKILL)
FAIL smp-smt (terminated on SIGKILL)
SKIP smp-thread-single (qemu-system-ppc64: -accel tcg,thread=single: invalid accelerator tcg)
FAIL atomics (terminated on SIGKILL)
PASS atomics-migration (1 tests)
PASS timebase (12 tests, 1 known failures, 1 skipped)
SKIP timebase-icount (qemu-system-ppc64: -icount shift=5: cannot configure icount, TCG support not available)
FAIL h_cede_tm
PASS sprs (14 tests)
FAIL sprs-migration (14 tests, 1 unexpected failures)
PASS sieve

Thanks,
Misbah Anjum N <misanjum@linux.ibm.com>


On 2026-03-10 15:42, Ani Sinha wrote:
>>> This theory is not substantiated by code or evidence.
>>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a introduces kvm_reset_vmfd()
>>> which is called by this block of code with the tip at
>>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a :
>>>   if (!cpus_are_resettable() &&
>>>        (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
>>>         reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
>>>        if (ac->rebuild_guest) {
>>>            ret = ac->rebuild_guest(current_machine);
>>>            if (ret < 0) {
>>>                error_report("unable to rebuild guest: %s(%d)",
>>>                             strerror(-ret), ret);
>>>                vm_stop(RUN_STATE_INTERNAL_ERROR);
>>>            } else {
>>>                info_report("virtual machine state has been rebuilt with new "
>>>                            "guest file handle.");
>>>                guest_state_rebuilt = true;
>>>            }
>>>        } else if (!cpus_are_resettable())  {
>>>            error_report("accelerator does not support reset!");
>>>        } else {
>>>            error_report("accelerator does not support rebuilding guest state,"
>>>                         " proceeding with normal reset!");
>>>        }
>>>    }
>>> If cpus are resettable, this block will not be called and nothing that
>>> the patch introduces will have been executed.
>>> So I think you guys need to explain a bit more why you so strongly
>>> feel this patch broke it. I am confused and unable to reason this.
>>>> Did you validate your patches on other architectures which does not 
>>>> support this feature yet?
>>> As you have already seen, on other architectures, the entire block of
>>> code is not executed at all. Only SEV-ES, SEV-SNP and TDX currently
>>> exercises this.
>> 
>> I understand your concern about the code path analysis. Let me clarify 
>> our findings with concrete evidence.
>> 
>> Reproducibility Evidence:
>> With commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a applied, we are 
>> able to reproduce the hang issue 100% of the time across multiple test 
>> runs. When we revert to the previous commit 
>> df8df3cb6b743372ebb335bd8404bc3d748da350, the same KVM guest boots 
>> successfully 100% of the time.
>> 
>> This consistent reproducibility strongly indicates that commit 
>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a is introducing the 
>> regression, even if the code path analysis suggests otherwise. This 
>> suggests the issue may not be in the code path, but rather in the 
>> changes introduced by the patch series.
>> 
>> As the author who led the development of this patch series, we would 
>> appreciate your help in figuring out this issue.
> 
> I am really not sure what changes in that patch can cause this
> breakage in a completely unrelated area when the changes are not even
> executed.
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-18  8:19                     ` Misbah Anjum N
@ 2026-03-18  8:39                       ` Ani Sinha
  2026-03-18  9:30                         ` Ani Sinha
  0 siblings, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-03-18  8:39 UTC (permalink / raw)
  To: Misbah Anjum N
  Cc: Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin, Harsh Prateek Bora,
	vaibhav, sbhat



> On 18 Mar 2026, at 1:49 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
> 
> Hi Ani and Paolo,
> 
> Following up on the KVM guest boot issue due to commit 98884e0c, I have conducted additional testing that reveals important new information about the nature of this issue.
> 
> The hang is specifically triggered when SMP is configured, that is, when -smp parameter is provided in the QEMU command. This is also validated via KVM Unit Tests involving SMP which are failing due to the same commit.

So basically what we know is:
 - Issue seems to show on ppc64.
 - tree with df8df3cb6b at tip does not show issue.
 - tree with next commit 98884e0cc1 at tip shows the issue.
 - kvm_reset_vmfd() introduced by 98884e0cc1 is not called. 

You think any one of these commits is the cause of the issue (which I personally cannot agree with):

98884e0cc1 accel/kvm: add changes required to support KVM VM file descriptor change
df8df3cb6b system/physmem: add helper to reattach existing memory after KVM VM fd change
4003e5e65f hw/accel: add a per-accelerator callback to change VM accelerator handle
2391125f13 accel/kvm: add confidential class member to indicate guest rebuild capability
b3f0a55576 i386/kvm: avoid installing duplicate msr entries in msr_handlers

None of the above commits does anything SMP related.
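
For reference, the reason kvm_reset_vmfd() is not reached on this boot path can be sketched in standalone C. This is only an illustration, not QEMU code: the Cause enum and the enters_rebuild_block() and check_reported_case() helpers are hypothetical names, and only the shape of the outer condition is taken from the qemu_system_reset() block quoted earlier in this thread (tree with 98884e0cc1 at tip).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-ins for the ShutdownCause values
 * named in this thread. */
typedef enum {
    CAUSE_NONE,                  /* models SHUTDOWN_CAUSE_NONE */
    CAUSE_GUEST_RESET,           /* models SHUTDOWN_CAUSE_GUEST_RESET */
    CAUSE_HOST_QMP_SYSTEM_RESET  /* models SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET */
} Cause;

/* Hypothetical helper: mirrors only the outer `if` of the quoted block.
 * It returns true when the guest rebuild path would be entered. */
static bool enters_rebuild_block(bool cpus_resettable, Cause reason)
{
    return !cpus_resettable &&
           (reason == CAUSE_GUEST_RESET ||
            reason == CAUSE_HOST_QMP_SYSTEM_RESET);
}

/* The gdb trace in this thread shows reason == SHUTDOWN_CAUSE_NONE during
 * startup, and execution stepping from line 529 straight to line 553. */
static void check_reported_case(void)
{
    assert(!enters_rebuild_block(true, CAUSE_NONE));
    /* With reason NONE the block is skipped regardless of resettability. */
    assert(!enters_rebuild_block(false, CAUSE_NONE));
}
```

Under these assumptions the startup reset never enters the rebuild path, which is consistent with the trace. It does not explain the hang itself.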

> 
> Test Results:
> Without SMP (boots successfully):
> /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm \
>  -enable-kvm -m 32768 -nographic -device virtio-balloon \
>  -device virtio-scsi-pci,id=scsi0 \
>  -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le-hpb.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 \
>  -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 \
>  -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
> 
> SLOF **********************************************************************
> QEMU Starting
> Build Date = Oct 26 2025 18:45:22
> FW Version = release 20251026
> Press "s" to enter Open Firmware.
> 
> Populating /vdevice methods
> Populating /vdevice/vty@71000000
> Populating /vdevice/nvram@71000001
> Populating /pci@800000020000000
> ...
> ...
> Result: Guest boots successfully
> 
> With SMP (hangs indefinitely):
> /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm \
>  -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \
>  -device virtio-balloon -device virtio-scsi-pci,id=scsi0 \
>  -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le-hpb.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 \
>  -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 \
>  -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
> ...
> ...
> Result: Hangs indefinitely at bql_lock() in qemu_default_main()
> 
> KVM Unit Tests:
> Running kvm-unit-tests confirms the SMP dependency. Note that tests explicitly involving SMP (smp, smp-smt, atomics) all fail with SIGKILL, while single-threaded tests pass.
> 
> # ./run_tests.sh
> FAIL selftest-setup (terminated on SIGKILL)
> PASS selftest-migration (2 tests)
> PASS selftest-migration-skip (1 tests)
> PASS migration-memory (1 tests)
> PASS spapr_hcall (9 tests, 1 skipped)
> PASS spapr_vpa (13 tests)
> PASS rtas-get-time-of-day (10 tests)
> PASS rtas-get-time-of-day-base (10 tests)
> PASS rtas-set-time-of-day (5 tests)
> PASS emulator (4 tests)
> PASS interrupts (13 tests)
> FAIL mmu (terminated on SIGKILL)
> FAIL smp (terminated on SIGKILL)
> FAIL smp-smt (terminated on SIGKILL)
> SKIP smp-thread-single (qemu-system-ppc64: -accel tcg,thread=single: invalid accelerator tcg)
> FAIL atomics (terminated on SIGKILL)
> PASS atomics-migration (1 tests)
> PASS timebase (12 tests, 1 known failures, 1 skipped)
> SKIP timebase-icount (qemu-system-ppc64: -icount shift=5: cannot configure icount, TCG support not available)
> FAIL h_cede_tm
> PASS sprs (14 tests)
> FAIL sprs-migration (14 tests, 1 unexpected failures)
> PASS sieve
> 
> Thanks,
> Misbah Anjum N <misanjum@linux.ibm.com>
> 
> 
> On 2026-03-10 15:42, Ani Sinha wrote:
>>>> This theory is not substantiated by code or evidence.
>>>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a introduces kvm_reset_vmfd()
>>>> which is called by this block of code with the tip at
>>>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a :
>>>>  if (!cpus_are_resettable() &&
>>>>       (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
>>>>        reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
>>>>       if (ac->rebuild_guest) {
>>>>           ret = ac->rebuild_guest(current_machine);
>>>>           if (ret < 0) {
>>>>               error_report("unable to rebuild guest: %s(%d)",
>>>>                            strerror(-ret), ret);
>>>>               vm_stop(RUN_STATE_INTERNAL_ERROR);
>>>>           } else {
>>>>               info_report("virtual machine state has been rebuilt with new "
>>>>                           "guest file handle.");
>>>>               guest_state_rebuilt = true;
>>>>           }
>>>>       } else if (!cpus_are_resettable())  {
>>>>           error_report("accelerator does not support reset!");
>>>>       } else {
>>>>           error_report("accelerator does not support rebuilding guest state,"
>>>>                        " proceeding with normal reset!");
>>>>       }
>>>>   }
>>>> If cpus are resettable, this block will not be called and nothing that
>>>> the patch introduces will have been executed.
>>>> So I think you guys need to explain a bit more why you so strongly
>>>> feel this patch broke it. I am confused and unable to reason this.
>>>>> Did you validate your patches on other architectures which does not support this feature yet?
>>>> As you have already seen, on other architectures, the entire block of
>>>> code is not executed at all. Only SEV-ES, SEV-SNP and TDX currently
>>>> exercises this.
>>> I understand your concern about the code path analysis. Let me clarify our findings with concrete evidence.
>>> Reproducibility Evidence:
>>> With commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a applied, we are able to reproduce the hang issue 100% of the time across multiple test runs. When we revert to the previous commit df8df3cb6b743372ebb335bd8404bc3d748da350, the same KVM guest boots successfully 100% of the time.
>>> This consistent reproducibility strongly indicates that commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a is introducing the regression, even if the code path analysis suggests otherwise. This suggests the issue may not be in the code path, but rather in the changes introduced by the patch series.
>>> As the author who led the development of this patch series, we would appreciate your help in figuring out this issue.
>> I am really not sure what changes in that patch can cause this
>> breakage in a completely unrelated area when the changes are not even
>> executed.
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-18  8:39                       ` Ani Sinha
@ 2026-03-18  9:30                         ` Ani Sinha
  2026-04-06  8:54                           ` Misbah Anjum N
  0 siblings, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-03-18  9:30 UTC (permalink / raw)
  To: Misbah Anjum N
  Cc: Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin, Harsh Prateek Bora,
	vaibhav, sbhat



> On 18 Mar 2026, at 2:09 PM, Ani Sinha <anisinha@redhat.com> wrote:
> 
> 
> 
>> On 18 Mar 2026, at 1:49 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>> 
>> Hi Ani and Paolo,
>> 
>> Following up on the KVM guest boot issue due to commit 98884e0c, I have conducted additional testing that reveals important new information about the nature of this issue.
>> 
>> The hang is specifically triggered when SMP is configured, that is, when -smp parameter is provided in the QEMU command. This is also validated via KVM Unit Tests involving SMP which are failing due to the same commit.
> 
> So basically what we know is:
> - Issue seems to show on ppc64.
> - tree with df8df3cb6b at tip does not show issue.
> - tree with next commit 98884e0cc1 at tip shows the issue.
> - kvm_reset_vmfd() introduced by 98884e0cc1 is not called. 
> 
> You think any one these commits are the cause of the issue (which I personally cannot agree with):
> 
> 98884e0cc1 accel/kvm: add changes required to support KVM VM file descriptor change
> df8df3cb6b system/physmem: add helper to reattach existing memory after KVM VM fd change
> 4003e5e65f hw/accel: add a per-accelerator callback to change VM accelerator handle
> 2391125f13 accel/kvm: add confidential class member to indicate guest rebuild capability
> b3f0a55576 i386/kvm: avoid installing duplicate msr entries in msr_handlers
> 
> None of the above commits does anything SMP related.

One possible thing to try is:

Revert everything in stubs/kvm.c and hence changes in stubs/meson.build, include/system/kvm.h and in target/i386/kvm/kvm.c introduced by 98884e0cc1.
You will have to comment out calls to kvm_arch_supports_vmfd_change() and kvm_arch_on_vmfd_change() in kvm_reset_vmfd(). Since kvm_reset_vmfd() is not called anyway, it should make no difference if those calls are commented out.

Let me know what you get after doing the above.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-03-18  9:30                         ` Ani Sinha
@ 2026-04-06  8:54                           ` Misbah Anjum N
  2026-04-07  4:09                             ` Ani Sinha
  2026-04-09 16:18                             ` Harsh Prateek Bora
  0 siblings, 2 replies; 19+ messages in thread
From: Misbah Anjum N @ 2026-04-06  8:54 UTC (permalink / raw)
  To: Ani Sinha, Pbonzini, Qemu Devel, Qemu Ppc
  Cc: npiggin, Harsh Prateek Bora, vaibhav, sbhat

Hi Ani,
I've completed the testing you suggested. Unfortunately, the SMP hang 
still persists with these changes.

Changes made:
As requested, I reverted everything in stubs/kvm.c and the related 
changes in stubs/meson.build, include/system/kvm.h, and 
target/i386/kvm/kvm.c. I also commented out the calls to 
kvm_arch_supports_vmfd_change() and kvm_arch_on_vmfd_change() in 
kvm_reset_vmfd().

Test result:
The issue persists - guests still hang indefinitely during boot when SMP 
is configured.

Git diff:
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index cc5c42ce4d..04b9cbe7c9 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2622,11 +2622,12 @@ static int kvm_reset_vmfd(MachineState *ms)
       * bail if the current architecture does not support VM file
       * descriptor change.
       */
-    if (!kvm_arch_supports_vmfd_change()) {
+    /*if (!kvm_arch_supports_vmfd_change()) {
          error_report("This target architecture does not support KVM VM "
                       "file descriptor change.");
          return -EOPNOTSUPP;
      }
+    */

      s = KVM_STATE(ms->accelerator);
      kml = &s->memory_listener;
@@ -2659,10 +2660,10 @@ static int kvm_reset_vmfd(MachineState *ms)
      }
      assert(!err);

-    ret = kvm_arch_on_vmfd_change(ms, s);
+    /*ret = kvm_arch_on_vmfd_change(ms, s);
      if (ret < 0) {
          return ret;
-    }
+    }*/

      if (s->kernel_irqchip_allowed) {
          do_kvm_irqchip_create(s);

diff --git a/include/system/kvm.h b/include/system/kvm.h
index 5fc7251fd9..0dad0079ed 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -456,8 +456,6 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram_addr,

  #endif /* COMPILING_PER_TARGET */

-bool kvm_arch_supports_vmfd_change(void);
-int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s);

  void kvm_cpu_synchronize_state(CPUState *cpu);

diff --git a/stubs/kvm.c b/stubs/kvm.c
deleted file mode 100644
index 2db61d89a7..0000000000
--- a/stubs/kvm.c
+++ /dev/null
@@ -1,22 +0,0 @@
-/*
- * kvm target arch specific stubs
- *
- * Copyright (c) 2026 Red Hat, Inc.
- *
- * Author:
- *   Ani Sinha <anisinha@redhat.com>
- *
- * SPDX-License-Identifier: GPL-2.0-or-later
- */
-#include "qemu/osdep.h"
-#include "system/kvm.h"
-
-int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
-{
-    abort();
-}
-
-bool kvm_arch_supports_vmfd_change(void)
-{
-    return false;
-}

diff --git a/stubs/meson.build b/stubs/meson.build
index 6ae478bacc..8a07059500 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -74,7 +74,6 @@ if have_system
    if igvm.found()
      stub_ss.add(files('igvm.c'))
    endif
-  stub_ss.add(files('kvm.c'))
    stub_ss.add(files('target-get-monitor-def.c'))
    stub_ss.add(files('target-monitor-defs.c'))
    stub_ss.add(files('win32-kbd-hook.c'))

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 524b5276a6..3dfd9a5974 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3389,15 +3389,6 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
      return 0;
  }

-int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
-{
-    abort();
-}
-
-bool kvm_arch_supports_vmfd_change(void)
-{
-    return false;
-}

  int kvm_arch_init(MachineState *ms, KVMState *s)
  {


I've also tested with the latest QEMU build from master, and the issue 
still persists there as well. Could you suggest what additional 
debugging steps I should take to help identify the root cause?

Thanks,
Misbah Anjum N <misanjum@linux.ibm.com>


On 2026-03-18 15:00, Ani Sinha wrote:
> One possible thing to try is:
> 
> Revert everything in stubs/kvm.c and hence changes in
> stubs/meson.build, include/system/kvm.h and in target/i386/kvm/kvm.c
> introduced by 98884e0cc1.
> You will have to comment out calls to kvm_arch_supports_vmfd_change()
> and kvm_arch_on_vmfd_change() in kvm_reset_vmfd(). Since
> kvm_reset_vmfd() is not called anyway, it should make no difference
> if those calls are commented out.
> 
> Let me know what you get after doing the above.



* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-04-06  8:54                           ` Misbah Anjum N
@ 2026-04-07  4:09                             ` Ani Sinha
  2026-04-07 13:45                               ` Ani Sinha
  2026-04-09 16:18                             ` Harsh Prateek Bora
  1 sibling, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-04-07  4:09 UTC (permalink / raw)
  To: Misbah Anjum N
  Cc: Pbonzini, Qemu Devel, Qemu Ppc, npiggin, Harsh Prateek Bora,
	vaibhav, sbhat



> On 6 Apr 2026, at 2:24 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
> 
> Hi Ani,
> I've completed the testing you suggested. Unfortunately, the SMP hang still persists with these changes.
> 
> Changes made:
> As requested, I reverted everything in stubs/kvm.c and the related changes in stubs/meson.build, include/system/kvm.h, and target/i386/kvm/kvm.c. I also commented out the calls to kvm_arch_supports_vmfd_change() and kvm_arch_on_vmfd_change() in kvm_reset_vmfd().
> 
> Test result:
> The issue persists - guests still hang indefinitely during boot when SMP is configured.
> 
> Git diff:
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index cc5c42ce4d..04b9cbe7c9 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2622,11 +2622,12 @@ static int kvm_reset_vmfd(MachineState *ms)
>      * bail if the current architecture does not support VM file
>      * descriptor change.
>      */
> -    if (!kvm_arch_supports_vmfd_change()) {
> +    /*if (!kvm_arch_supports_vmfd_change()) {
>         error_report("This target architecture does not support KVM VM "
>                      "file descriptor change.");
>         return -EOPNOTSUPP;
>     }
> +    */
> 
>     s = KVM_STATE(ms->accelerator);
>     kml = &s->memory_listener;
> @@ -2659,10 +2660,10 @@ static int kvm_reset_vmfd(MachineState *ms)
>     }
>     assert(!err);
> 
> -    ret = kvm_arch_on_vmfd_change(ms, s);
> +    /*ret = kvm_arch_on_vmfd_change(ms, s);
>     if (ret < 0) {
>         return ret;
> -    }
> +    }*/
> 
>     if (s->kernel_irqchip_allowed) {
>         do_kvm_irqchip_create(s);
> 
> diff --git a/include/system/kvm.h b/include/system/kvm.h
> index 5fc7251fd9..0dad0079ed 100644
> --- a/include/system/kvm.h
> +++ b/include/system/kvm.h
> @@ -456,8 +456,6 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram_addr,
> 
> #endif /* COMPILING_PER_TARGET */
> 
> -bool kvm_arch_supports_vmfd_change(void);
> -int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s);
> 
> void kvm_cpu_synchronize_state(CPUState *cpu);
> 
> diff --git a/stubs/kvm.c b/stubs/kvm.c
> deleted file mode 100644
> index 2db61d89a7..0000000000
> --- a/stubs/kvm.c
> +++ /dev/null
> @@ -1,22 +0,0 @@
> -/*
> - * kvm target arch specific stubs
> - *
> - * Copyright (c) 2026 Red Hat, Inc.
> - *
> - * Author:
> - *   Ani Sinha <anisinha@redhat.com>
> - *
> - * SPDX-License-Identifier: GPL-2.0-or-later
> - */
> -#include "qemu/osdep.h"
> -#include "system/kvm.h"
> -
> -int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
> -{
> -    abort();
> -}
> -
> -bool kvm_arch_supports_vmfd_change(void)
> -{
> -    return false;
> -}
> 
> diff --git a/stubs/meson.build b/stubs/meson.build
> index 6ae478bacc..8a07059500 100644
> --- a/stubs/meson.build
> +++ b/stubs/meson.build
> @@ -74,7 +74,6 @@ if have_system
>   if igvm.found()
>     stub_ss.add(files('igvm.c'))
>   endif
> -  stub_ss.add(files('kvm.c'))
>   stub_ss.add(files('target-get-monitor-def.c'))
>   stub_ss.add(files('target-monitor-defs.c'))
>   stub_ss.add(files('win32-kbd-hook.c'))
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 524b5276a6..3dfd9a5974 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -3389,15 +3389,6 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
>     return 0;
> }
> 
> -int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
> -{
> -    abort();
> -}
> -
> -bool kvm_arch_supports_vmfd_change(void)
> -{
> -    return false;
> -}
> 
> int kvm_arch_init(MachineState *ms, KVMState *s)
> {
> 
> 
> I've also tested with the latest QEMU build from master, and the issue still persists there as well. Could you suggest what additional debugging steps I should take to help identify the root cause?

Since you mentioned that 98884e0cc1 is the root of the problem, the intention is to bisect within this patch and find the problematic hunk. You reported that reverting a major part of this patch did not make a difference. Maybe you can try reverting the following refactoring too (in addition to the reverts above) and see what happens.

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0d8b0c4347..cc5c42ce4d 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2415,11 +2415,9 @@ void kvm_irqchip_set_qemuirq_gsi(KVMState *s, qemu_irq irq, int gsi)
     g_hash_table_insert(s->gsimap, irq, GINT_TO_POINTER(gsi));
 }
 
-static void kvm_irqchip_create(KVMState *s)
+static void do_kvm_irqchip_create(KVMState *s)
 {
     int ret;
-
-    assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
     if (kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
         ;
     } else if (kvm_check_extension(s, KVM_CAP_S390_IRQCHIP)) {
@@ -2452,7 +2450,13 @@ static void kvm_irqchip_create(KVMState *s)
         fprintf(stderr, "Create kernel irqchip failed: %s\n", strerror(-ret));
         exit(1);
     }
+}
+
+static void kvm_irqchip_create(KVMState *s)
+{
+    assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
 
+    do_kvm_irqchip_create(s);
     kvm_kernel_irqchip = true;
     /* If we have an in-kernel IRQ chip then we must have asynchronous
      * interrupt delivery (though the reverse is not necessarily true)

If the issue still shows up after reverting the above, then I think 98884e0cc1 is not the root cause. The problem would be somewhere else, as we will have tried reverting almost everything from that patch except the implementation of kvm_reset_vmfd(), which, as you reported, isn’t called/executed.

> 
> Thanks,
> Misbah Anjum N <misanjum@linux.ibm.com>
> 
> 
> On 2026-03-18 15:00, Ani Sinha wrote:
>> One possible thing to try is:
>> Revert everything in stubs/kvm.c and hence changes in
>> stubs/meson.build, include/system/kvm.h and in target/i386/kvm/kvm.c
>> introduced by 98884e0cc1.
>> You will have to comment out calls to kvm_arch_supports_vmfd_change()
>> and kvm_arch_on_vmfd_change() in kvm_reset_vmfd(). Since
>> kvm_reset_vmfd() is not called anyway, it should make no difference
>> if those calls are commented out.
>> Let me know what you get after doing the above.
> 




* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-04-07  4:09                             ` Ani Sinha
@ 2026-04-07 13:45                               ` Ani Sinha
  0 siblings, 0 replies; 19+ messages in thread
From: Ani Sinha @ 2026-04-07 13:45 UTC (permalink / raw)
  To: Misbah Anjum N
  Cc: Pbonzini, Qemu Devel, Qemu Ppc, npiggin, Harsh Prateek Bora,
	vaibhav, sbhat



> On 7 Apr 2026, at 9:39 AM, Ani Sinha <anisinha@redhat.com> wrote:
> 
> 
> 
>> On 6 Apr 2026, at 2:24 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>> 
>> Hi Ani,
>> I've completed the testing you suggested. Unfortunately, the SMP hang still persists with these changes.
>> 
>> Changes made:
>> As requested, I reverted everything in stubs/kvm.c and the related changes in stubs/meson.build, include/system/kvm.h, and target/i386/kvm/kvm.c. I also commented out the calls to kvm_arch_supports_vmfd_change() and kvm_arch_on_vmfd_change() in kvm_reset_vmfd().
>> 
>> Test result:
>> The issue persists - guests still hang indefinitely during boot when SMP is configured.
>> 
>> Git diff:
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index cc5c42ce4d..04b9cbe7c9 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -2622,11 +2622,12 @@ static int kvm_reset_vmfd(MachineState *ms)
>>     * bail if the current architecture does not support VM file
>>     * descriptor change.
>>     */
>> -    if (!kvm_arch_supports_vmfd_change()) {
>> +    /*if (!kvm_arch_supports_vmfd_change()) {
>>        error_report("This target architecture does not support KVM VM "
>>                     "file descriptor change.");
>>        return -EOPNOTSUPP;
>>    }
>> +    */
>> 
>>    s = KVM_STATE(ms->accelerator);
>>    kml = &s->memory_listener;
>> @@ -2659,10 +2660,10 @@ static int kvm_reset_vmfd(MachineState *ms)
>>    }
>>    assert(!err);
>> 
>> -    ret = kvm_arch_on_vmfd_change(ms, s);
>> +    /*ret = kvm_arch_on_vmfd_change(ms, s);
>>    if (ret < 0) {
>>        return ret;
>> -    }
>> +    }*/
>> 
>>    if (s->kernel_irqchip_allowed) {
>>        do_kvm_irqchip_create(s);
>> 
>> diff --git a/include/system/kvm.h b/include/system/kvm.h
>> index 5fc7251fd9..0dad0079ed 100644
>> --- a/include/system/kvm.h
>> +++ b/include/system/kvm.h
>> @@ -456,8 +456,6 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram_addr,
>> 
>> #endif /* COMPILING_PER_TARGET */
>> 
>> -bool kvm_arch_supports_vmfd_change(void);
>> -int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s);
>> 
>> void kvm_cpu_synchronize_state(CPUState *cpu);
>> 
>> diff --git a/stubs/kvm.c b/stubs/kvm.c
>> deleted file mode 100644
>> index 2db61d89a7..0000000000
>> --- a/stubs/kvm.c
>> +++ /dev/null
>> @@ -1,22 +0,0 @@
>> -/*
>> - * kvm target arch specific stubs
>> - *
>> - * Copyright (c) 2026 Red Hat, Inc.
>> - *
>> - * Author:
>> - *   Ani Sinha <anisinha@redhat.com>
>> - *
>> - * SPDX-License-Identifier: GPL-2.0-or-later
>> - */
>> -#include "qemu/osdep.h"
>> -#include "system/kvm.h"
>> -
>> -int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
>> -{
>> -    abort();
>> -}
>> -
>> -bool kvm_arch_supports_vmfd_change(void)
>> -{
>> -    return false;
>> -}
>> 
>> diff --git a/stubs/meson.build b/stubs/meson.build
>> index 6ae478bacc..8a07059500 100644
>> --- a/stubs/meson.build
>> +++ b/stubs/meson.build
>> @@ -74,7 +74,6 @@ if have_system
>>  if igvm.found()
>>    stub_ss.add(files('igvm.c'))
>>  endif
>> -  stub_ss.add(files('kvm.c'))
>>  stub_ss.add(files('target-get-monitor-def.c'))
>>  stub_ss.add(files('target-monitor-defs.c'))
>>  stub_ss.add(files('win32-kbd-hook.c'))
>> 
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index 524b5276a6..3dfd9a5974 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -3389,15 +3389,6 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
>>    return 0;
>> }
>> 
>> -int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
>> -{
>> -    abort();
>> -}
>> -
>> -bool kvm_arch_supports_vmfd_change(void)
>> -{
>> -    return false;
>> -}
>> 
>> int kvm_arch_init(MachineState *ms, KVMState *s)
>> {
>> 
>> 
>> I've also tested with the latest QEMU build from master, and the issue still persists there as well. Could you suggest what additional debugging steps I should take to help identify the root cause?
> 
> Since you mentioned that 98884e0cc1 is the root of the problem, the intention is to bisect within this patch and find the problematic hunk. You reported that reverting a major part of this patch did not make a difference. Maybe you can try reverting the following refactoring too (in addition to the reverts above) and see what happens.
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 0d8b0c4347..cc5c42ce4d 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2415,11 +2415,9 @@ void kvm_irqchip_set_qemuirq_gsi(KVMState *s, qemu_irq irq, int gsi)
>     g_hash_table_insert(s->gsimap, irq, GINT_TO_POINTER(gsi));
> }
> 
> -static void kvm_irqchip_create(KVMState *s)
> +static void do_kvm_irqchip_create(KVMState *s)
> {
>     int ret;
> -
> -    assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
>     if (kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
>         ;
>     } else if (kvm_check_extension(s, KVM_CAP_S390_IRQCHIP)) {
> @@ -2452,7 +2450,13 @@ static void kvm_irqchip_create(KVMState *s)
>         fprintf(stderr, "Create kernel irqchip failed: %s\n", strerror(-ret));
>         exit(1);
>     }
> +}
> +
> +static void kvm_irqchip_create(KVMState *s)
> +{
> +    assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
> 
> +    do_kvm_irqchip_create(s);
>     kvm_kernel_irqchip = true;
>     /* If we have an in-kernel IRQ chip then we must have asynchronous
>      * interrupt delivery (though the reverse is not necessarily true)
> 
> If reverting the above still shows the issue, then I think 98884e0cc1 is not the root cause of the issue.

Actually, if you still see the issue, revert the following change as well:

@@ -4015,6 +4096,7 @@ static void kvm_accel_class_init(ObjectClass *oc, const void *data)
     AccelClass *ac = ACCEL_CLASS(oc);
     ac->name = "KVM";
     ac->init_machine = kvm_init;
+    ac->rebuild_guest = kvm_reset_vmfd;
     ac->has_memory = kvm_accel_has_memory;
     ac->allowed = &kvm_allowed;
     ac->gdbstub_supported_sstep_flags = kvm_gdbstub_sstep_flags;

And if you still see the issue after that, revert this too:

diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index e43d18a869..e4beda0148 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -14,6 +14,7 @@ kvm_destroy_vcpu(int cpu_index, unsigned long arch_cpu_id) "index: %d id: %lu"
 kvm_park_vcpu(int cpu_index, unsigned long arch_cpu_id) "index: %d id: %lu"
 kvm_unpark_vcpu(unsigned long arch_cpu_id, const char *msg) "id: %lu %s"
 kvm_irqchip_commit_routes(void) ""
+kvm_reset_vmfd(void) ""
 kvm_irqchip_add_msi_route(char *name, int vector, int virq) "dev %s vector %d virq %d"
 kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
 kvm_irqchip_release_virq(int virq) "virq %d"


Basically, keep reverting incrementally, one hunk at a time, until the guest boots. At that point you know which hunk is the cause of the problem.
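
A minimal sketch of one way to do such a revert file by file, assuming a git checkout of QEMU in which 98884e0cc1 is reachable. The helper name revert_file_from_commit is hypothetical (it is not part of QEMU or git); it only combines "git show" and "git apply -R", and it changes the working tree without creating commits. On a tree that has moved on since the commit, the reverse apply may fail if the context no longer matches.

```shell
#!/bin/sh
# Hypothetical helper: undo the changes a commit made to a single file,
# in the working tree only (no new commit, index untouched).
revert_file_from_commit() {
    rfc_commit=$1
    rfc_path=$2
    # "git show --format= <commit> -- <path>" prints just that file's part
    # of the patch; "git apply -R" applies it in reverse.
    git show --format= "$rfc_commit" -- "$rfc_path" | git apply -R
}

# Example use in a QEMU checkout (paths taken from this thread):
#   revert_file_from_commit 98884e0cc1 accel/kvm/trace-events
#   make            # then re-run the SMP guest boot test
```

For a single hunk inside a larger file, such as the ac->rebuild_guest line in accel/kvm/kvm-all.c, deleting the line by hand is probably simpler. "git checkout -- <path>" restores a file if a revert needs to be undone.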


> The issue is somewhere else as we have tried reverting almost everything from that patch except implementation of kvm_reset_vmfd() which as you reported isn’t called/executed.
> 
>> 
>> Thanks,
>> Misbah Anjum N <misanjum@linux.ibm.com>
>> 
>> 
>> On 2026-03-18 15:00, Ani Sinha wrote:
>>> One possible thing to try is:
>>> Revert everything in stubs/kvm.c and hence changes in
>>> stubs/meson.build, include/system/kvm.h and in target/i386/kvm/kvm.c
>>> introduced by 98884e0cc1.
>>> You will have to comment out calls to kvm_arch_supports_vmfd_change()
>>> and kvm_arch_on_vmfd_change() in kvm_reset_vmfd(). Since
>>> kvm_reset_vmfd() is not called anyway, it should make no difference
>>> if those calls are commented out.
>>> Let me know what you get after doing the above.





* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
  2026-04-06  8:54                           ` Misbah Anjum N
  2026-04-07  4:09                             ` Ani Sinha
@ 2026-04-09 16:18                             ` Harsh Prateek Bora
  1 sibling, 0 replies; 19+ messages in thread
From: Harsh Prateek Bora @ 2026-04-09 16:18 UTC (permalink / raw)
  To: Misbah Anjum N, Ani Sinha, Pbonzini, Qemu Devel, Qemu Ppc
  Cc: npiggin, vaibhav, sbhat, Gautam Menghani

Hi Misbah,

On 06/04/26 2:24 pm, Misbah Anjum N wrote:
> Hi Ani,
> I've completed the testing you suggested. Unfortunately, the SMP hang 
> still persists with these changes.
> 
> Changes made:
> As requested, I reverted everything in stubs/kvm.c and the related
> changes in stubs/meson.build, include/system/kvm.h, and
> target/i386/kvm/kvm.c. I also commented out the calls to
> kvm_arch_supports_vmfd_change() and kvm_arch_on_vmfd_change() in
> kvm_reset_vmfd().
> 
> Test result:
> The issue persists - guests still hang indefinitely during boot when SMP 
> is configured.

I have posted a patch here which should fix this smp hang issue:

https://lore.kernel.org/qemu-devel/20260409161042.55281-1-harshpb@linux.ibm.com/

Could you please help validate for different scenarios and share feedback?

Thanks
Harsh



Thread overview: 19+ messages
2026-03-06 10:52 [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c Misbah Anjum N
2026-03-09  8:28 ` Misbah Anjum N
2026-03-09 11:04   ` Harsh Prateek Bora
2026-03-09 13:11     ` Ani Sinha
2026-03-09 13:23       ` Ani Sinha
2026-03-10  8:39         ` Misbah Anjum N
2026-03-10  8:54           ` Ani Sinha
2026-03-10  9:08             ` Misbah Anjum N
2026-03-10  9:34               ` Ani Sinha
2026-03-10 10:05                 ` Misbah Anjum N
2026-03-10 10:12                   ` Ani Sinha
2026-03-18  8:19                     ` Misbah Anjum N
2026-03-18  8:39                       ` Ani Sinha
2026-03-18  9:30                         ` Ani Sinha
2026-04-06  8:54                           ` Misbah Anjum N
2026-04-07  4:09                             ` Ani Sinha
2026-04-07 13:45                               ` Ani Sinha
2026-04-09 16:18                             ` Harsh Prateek Bora
2026-03-09 13:30     ` Ani Sinha
