* [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
@ 2026-03-06 10:52 Misbah Anjum N
2026-03-09 8:28 ` Misbah Anjum N
0 siblings, 1 reply; 19+ messages in thread
From: Misbah Anjum N @ 2026-03-06 10:52 UTC (permalink / raw)
To: qemu-ppc; +Cc: qemu-devel, anisinha, pbonzini
Hi,
I'm reporting a critical regression on ppc64le that causes all KVM
guests to hang immediately during startup. Git bisect identified commit
98884e0cc10997a17ce9abfd6ff10be19224ca6a as the first bad commit. The
commit completely breaks KVM functionality on ppc64le.
Regression Details:
Working Version: QEMU 10.2.50 (v10.2.0-1669-gffcf1a7981)
Broken Version: QEMU 10.2.50 (v10.2.0-1816-g3fb456e9a0)
Bad Commit: 98884e0cc10997a17ce9abfd6ff10be19224ca6a "accel/kvm: add
changes required to support KVM VM file descriptor change"
Commit Link:
https://gitlab.com/qemu-project/qemu/-/commit/98884e0cc10997a17ce9abfd6ff10be19224ca6a
Environment:
Host: Fedora 42, Kernel 7.0.0-rc2, Power11 (ppc64le)
Libvirt: 12.1.0
Guest: Fedora 42, Kernel 7.0.0-rc2
Machine Type: pseries with KVM acceleration
Build Configuration:
git clone https://gitlab.com/qemu-project/qemu.git
cd qemu
git submodule init
git submodule update --recursive
./configure --target-list=ppc64-softmmu --disable-tcg --prefix=/usr
make && make install
Reproduction:
Using virt-install:
/usr/bin/virt-install --connect=qemu:///system --hvm --accelerate \
  --name 'avocado-vt-vm1' --machine pseries --memory=32768 \
  --vcpu=32,sockets=1,cores=32,threads=1 --import --nographics \
  --os-variant rhel8.0 --serial pty --memballoon model=virtio \
  --controller type=scsi,model=virtio-scsi \
  --disk path=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,bus=scsi,size=10,format=qcow2 \
  --network=bridge=virbr0,model=virtio \
  --boot emulator=/usr/bin/qemu-system-ppc64
Result: Starting install...
<hangs indefinitely with no output>
Using direct QEMU command:
/usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 \
  -machine pseries,accel=kvm -enable-kvm -m 32768 \
  -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty \
  -device virtio-balloon -device virtio-scsi-pci,id=scsi0 \
  -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 \
  -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 \
  -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
Result: <hangs indefinitely with no output>
Analysis:
The commit introduces VM file descriptor change support with
architecture-specific hooks.
I attempted the following fixes without success:
1. Changed abort() to return 0; in stubs/kvm.c
2. Added early return in kvm_reset_vmfd() when
kvm_arch_supports_vmfd_change() returns false
Git Bisect Log:
# git bisect bad
98884e0cc10997a17ce9abfd6ff10be19224ca6a is the first bad commit
commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a (HEAD)
Author: Ani Sinha <anisinha@redhat.com>
Date: Wed Feb 25 09:19:10 2026 +0530
accel/kvm: add changes required to support KVM VM file descriptor change
This change adds common kvm specific support to handle KVM VM file descriptor
change. KVM VM file descriptor can change as a part of confidential guest reset
mechanism. A new function api kvm_arch_on_vmfd_change() per
architecture platform is added in order to implement architecture specific
changes required to support it. A subsequent patch will add x86 specific
implementation for kvm_arch_on_vmfd_change() as currently only x86 supports
confidential guest reset.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-6-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
 MAINTAINERS            |  6 ++++++
 accel/kvm/kvm-all.c    | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 accel/kvm/trace-events |  1 +
 include/system/kvm.h   |  3 +++
 stubs/kvm.c            | 22 ++++++++++++++++++++++
 stubs/meson.build      |  1 +
 target/i386/kvm/kvm.c  | 10 ++++++++++
 7 files changed, 128 insertions(+), 3 deletions(-)
 create mode 100644 stubs/kvm.c
# git bisect log
git bisect start
git bisect good ffcf1a7981793973ffbd8100a7c3c6042d02ae23
git bisect bad 3fb456e9a0e9eef6a71d9b49bfff596a0f0046e9
git bisect bad e76c30bb13ecb9dc716fa629954bfb6253056ce2
git bisect good 9bdc612a18588975f5776ee4e562df607fea1b2c
git bisect bad 40c015e96942fd2a3e4d5ace6063b3333a3dd372
git bisect good df8df3cb6b743372ebb335bd8404bc3d748da350
git bisect bad 0f53f021ad1ede28dc8944686544e496cab02e5e
git bisect bad 9f0c2b3032639315faf141010a2603b0dbf56230
git bisect bad 98884e0cc10997a17ce9abfd6ff10be19224ca6a
first bad commit: [98884e0cc10997a17ce9abfd6ff10be19224ca6a]
Thanks,
Misbah Anjum N <misanjum@linux.ibm.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-06 10:52 [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c Misbah Anjum N
@ 2026-03-09 8:28 ` Misbah Anjum N
2026-03-09 11:04 ` Harsh Prateek Bora
0 siblings, 1 reply; 19+ messages in thread
From: Misbah Anjum N @ 2026-03-09 8:28 UTC (permalink / raw)
To: Anisinha, Pbonzini, Qemu Devel, Qemu Ppc; +Cc: npiggin, harshpb
Hi Ani and Paolo,
Following up on my previous report, I've attempted additional debugging
to isolate the issue on ppc64le.
I implemented the architecture-specific hooks for ppc64le (changes
below), rebuilt QEMU, and retested with the direct qemu-system-ppc64
command. The hang persists unchanged: no output, and the guest is
completely unresponsive.
Could you suggest what additional changes are needed to ensure the VM FD
change doesn't affect architectures that don't support this feature?
Tested with the following changes:
File: stubs/kvm.c
Changed the abort() call to return 0:
int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
{
    return 0; /* Changed from abort() */
}
File: target/ppc/kvm.c
Added the following stubs:
int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
{
    /* ppc64le doesn't support VM FD changes for confidential guests */
    return 0;
}

bool kvm_arch_supports_vmfd_change(void)
{
    return false;
}
GDB Backtrace:
I ran QEMU under GDB to capture the hang state. The backtrace shows the
vCPU thread is waiting on a condition variable:
Thread 4 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
__syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/powerpc/syscall_cancel.S:77
#0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/powerpc/syscall_cancel.S:77
#1  0x00007ffff58a9678 in __internal_syscall_cancel (nr=221) at cancellation.c:49
#2  0x00007ffff58aa220 in __futex_abstimed_wait_common64 (futex_word=0x10131ba10, expected=0, op=393, abstime=0x0, cancel=true) at futex-internal.c:57
#3  __futex_abstimed_wait_common (futex_word=0x10131ba10, expected=0, clockid=0, abstime=<optimized out>, private=0, cancel=true) at futex-internal.c:87
#4  __GI___futex_abstimed_wait_cancelable64 (futex_word=0x10131ba10, expected=0, clockid=0, abstime=0x0, private=0) at futex-internal.c:139
#5  0x00007ffff58ae0bc in __pthread_cond_wait_common (cond=0x10131b9f0, mutex=0x101222ce0 <bql>, clockid=0, abstime=0x0) at pthread_cond_wait.c:426
#6  ___pthread_cond_wait (cond=0x10131b9f0, mutex=0x101222ce0 <bql>) at pthread_cond_wait.c:458
#7  0x0000000100b9bea8 in qemu_cond_wait_impl (cond=0x10131b9f0, mutex=0x101222ce0 <bql>, file=0x100c59900 "../system/cpus.c", line=472) at ../util/qemu-thread-posix.c:240
#8  0x00000001006a0408 in qemu_process_cpu_events (cpu=0x1019dd260) at ../system/cpus.c:472
#9  0x0000000100913354 in kvm_vcpu_thread_fn (arg=0x1019dd260) at ../accel/kvm/kvm-accel-ops.c:50
#10 0x0000000100b9b30c in qemu_thread_start (args=0x1019f1fe0) at ../util/qemu-thread-posix.c:414
#11 0x00007ffff58aed94 in start_thread (arg=0x7ffff0bce320) at pthread_create.c:448
#12 0x00007ffff59555f8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone3.S:114
Thanks,
Misbah Anjum N <misanjum@linux.ibm.com>
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-09 8:28 ` Misbah Anjum N
@ 2026-03-09 11:04 ` Harsh Prateek Bora
2026-03-09 13:11 ` Ani Sinha
2026-03-09 13:30 ` Ani Sinha
0 siblings, 2 replies; 19+ messages in thread
From: Harsh Prateek Bora @ 2026-03-09 11:04 UTC (permalink / raw)
To: Misbah Anjum N, Anisinha, Pbonzini, Qemu Devel, Qemu Ppc; +Cc: npiggin
Hi Ani, Paolo,
I think the problem lies here:
For archs which don't support VM fd change, we are bailing out as below
in kvm_reset_vmfd.
    /*
     * bail if the current architecture does not support VM file
     * descriptor change.
     */
    if (!kvm_arch_supports_vmfd_change()) {
        error_report("This target architecture does not support KVM VM "
                     "file descriptor change.");
        return -EOPNOTSUPP;
    }
However, when rebuild_guest (kvm_reset_vmfd) is called in
qemu_system_reset here:
    if ((reason == SHUTDOWN_CAUSE_GUEST_RESET ||
         reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET) &&
        (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
        if (ac->rebuild_guest) {
            ret = ac->rebuild_guest(current_machine);
            if (ret < 0) {
                error_report("unable to rebuild guest: %s(%d)",
                             strerror(-ret), ret);
                vm_stop(RUN_STATE_INTERNAL_ERROR);
            } else {
                info_report("virtual machine state has been rebuilt with new "
                            "guest file handle.");
                guest_state_rebuilt = true;
            }
        } else if (!cpus_are_resettable()) {
            error_report("accelerator does not support reset!");
        } else {
            error_report("accelerator does not support rebuilding guest state,"
                         " proceeding with normal reset!");
        }
    }
It just does a vm_stop() if rebuild_guest returns < 0.
IMHO, this should handle -EOPNOTSUPP gracefully.
Please advise if this needs to be handled differently.
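To make the suggestion concrete, here is a minimal standalone sketch of the decision logic. It is not QEMU code. It assumes that falling back to the normal reset path is acceptable only when the vCPUs are resettable. The enum names, model_rebuild_guest(), model_system_reset() and the tolerate_eopnotsupp flag are all hypothetical and exist only for illustration. The branches for accelerators without a rebuild_guest hook are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for the shutdown causes named in the snippet above. */
typedef enum {
    CAUSE_GUEST_RESET,
    CAUSE_HOST_QMP_SYSTEM_RESET,
    CAUSE_OTHER,
} ResetCause;

/* What the reset path ends up doing (hypothetical names). */
typedef enum {
    ACTION_NORMAL_RESET,  /* ordinary reset, guest keeps running */
    ACTION_REBUILT,       /* guest state rebuilt with a new VM fd */
    ACTION_VM_STOPPED,    /* models vm_stop(RUN_STATE_INTERNAL_ERROR) */
} ResetAction;

/* Models the rebuild hook: bail out when the arch lacks vmfd-change support. */
static int model_rebuild_guest(bool arch_supports_vmfd_change)
{
    return arch_supports_vmfd_change ? 0 : -EOPNOTSUPP;
}

/*
 * Models the reset decision. With tolerate_eopnotsupp == false this follows
 * the quoted logic; with true, -EOPNOTSUPP falls back to a normal reset, but
 * only if the vCPUs can be reset normally (an assumption of this sketch).
 */
static ResetAction model_system_reset(ResetCause cause,
                                      bool new_vmfd_on_reset,
                                      bool cpus_resettable,
                                      bool arch_supports_vmfd_change,
                                      bool tolerate_eopnotsupp)
{
    bool is_reset = cause == CAUSE_GUEST_RESET ||
                    cause == CAUSE_HOST_QMP_SYSTEM_RESET;

    /* Rebuild is only attempted for resets that ask for (or need) a new fd. */
    if (!(is_reset && (new_vmfd_on_reset || !cpus_resettable))) {
        return ACTION_NORMAL_RESET;
    }

    int ret = model_rebuild_guest(arch_supports_vmfd_change);
    if (ret == 0) {
        return ACTION_REBUILT;
    }
    if (tolerate_eopnotsupp && ret == -EOPNOTSUPP && cpus_resettable) {
        return ACTION_NORMAL_RESET;  /* graceful fallback */
    }
    return ACTION_VM_STOPPED;
}
```

With tolerate_eopnotsupp set to false the model mirrors the snippet above: an architecture without vmfd-change support ends in ACTION_VM_STOPPED. With it set to true, the same inputs take ACTION_NORMAL_RESET as long as cpus_resettable is true. Whether that is the right policy for confidential guests is a question for the maintainers.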
regards,
Harsh
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-09 11:04 ` Harsh Prateek Bora
@ 2026-03-09 13:11 ` Ani Sinha
2026-03-09 13:23 ` Ani Sinha
2026-03-09 13:30 ` Ani Sinha
1 sibling, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-03-09 13:11 UTC (permalink / raw)
To: Harsh Prateek Bora
Cc: Misbah Anjum N, Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin
> On 9 Mar 2026, at 4:34 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>
> Hi Ani, Paolo,
>
> I think the problem lies here:
>
> For archs which doesnt support vm fd change, we are baling out as below in kvm_reset_vmfd.
>
>
> /*
> * bail if the current architecture does not support VM file
> * descriptor change.
> */
> if (!kvm_arch_supports_vmfd_change()) {
> error_report("This target architecture does not support KVM VM "
> "file descriptor change.");
> return -EOPNOTSUPP;
> }
>
> However, when rebuild_guest (kvm_reset_vmfd) is called in
> qemu_system_reset here:
>
> if ((reason == SHUTDOWN_CAUSE_GUEST_RESET ||
> reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET) &&
> (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
> if (ac->rebuild_guest) {
> ret = ac->rebuild_guest(current_machine);
> if (ret < 0) {
> error_report("unable to rebuild guest: %s(%d)",
> strerror(-ret), ret);
> vm_stop(RUN_STATE_INTERNAL_ERROR);
> } else {
> info_report("virtual machine state has been rebuilt with new "
> "guest file handle.");
> guest_state_rebuilt = true;
> }
> } else if (!cpus_are_resettable()) {
> error_report("accelerator does not support reset!");
> } else {
> error_report("accelerator does not support rebuilding guest state,"
> " proceeding with normal reset!");
> }
> }
>
>
> it just does a vm_stop if rebuild_guest returns < 0.
>
> IMHO, This should handle -EOPNOTSUPP gracefully.
> Please advise if this needs to be taken care differently?
Is this a confidential guest that cannot be normally reset?
>
> regards,
> Harsh
>
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-09 13:11 ` Ani Sinha
@ 2026-03-09 13:23 ` Ani Sinha
2026-03-10 8:39 ` Misbah Anjum N
0 siblings, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-03-09 13:23 UTC (permalink / raw)
To: Harsh Prateek Bora
Cc: Misbah Anjum N, Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin
[-- Attachment #1: Type: text/plain, Size: 10021 bytes --]
On Mon, 9 Mar 2026, Ani Sinha wrote:
>
>
> > On 9 Mar 2026, at 4:34 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
> >
> > Hi Ani, Paolo,
> >
> > I think the problem lies here:
> >
> > For archs which don't support VM fd change, we are bailing out as below in kvm_reset_vmfd.
> >
> >
> > /*
> > * bail if the current architecture does not support VM file
> > * descriptor change.
> > */
> > if (!kvm_arch_supports_vmfd_change()) {
> > error_report("This target architecture does not support KVM VM "
> > "file descriptor change.");
> > return -EOPNOTSUPP;
> > }
> >
> > However, when rebuild_guest (kvm_reset_vmfd) is called in
> > qemu_system_reset here:
> >
> > if ((reason == SHUTDOWN_CAUSE_GUEST_RESET ||
> > reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET) &&
> > (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
> > if (ac->rebuild_guest) {
> > ret = ac->rebuild_guest(current_machine);
> > if (ret < 0) {
> > error_report("unable to rebuild guest: %s(%d)",
> > strerror(-ret), ret);
> > vm_stop(RUN_STATE_INTERNAL_ERROR);
> > } else {
> > info_report("virtual machine state has been rebuilt with new "
> > "guest file handle.");
> > guest_state_rebuilt = true;
> > }
> > } else if (!cpus_are_resettable()) {
> > error_report("accelerator does not support reset!");
> > } else {
> > error_report("accelerator does not support rebuilding guest state,"
> > " proceeding with normal reset!");
> > }
> > }
> >
> >
> > it just does a vm_stop if rebuild_guest returns < 0.
> >
> > IMHO, this should handle -EOPNOTSUPP gracefully.
> > Please advise if this needs to be handled differently.
Yes, this does seem to be an issue and I will fix it. I am not sure the fix
will address your issue, though.
Can you try the following patch?
From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00 2001
From: Ani Sinha <anisinha@redhat.com>
Date: Mon, 9 Mar 2026 18:44:40 +0530
Subject: [PATCH] Fix reset for non-x86 archs that do not support reset yet
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
system/runstate.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/system/runstate.c b/system/runstate.c
index eca722b43c..c1f41284c9 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
(current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
if (ac->rebuild_guest) {
ret = ac->rebuild_guest(current_machine);
- if (ret < 0) {
+ if (ret < 0 && ret != -EOPNOTSUPP) {
error_report("unable to rebuild guest: %s(%d)",
strerror(-ret), ret);
vm_stop(RUN_STATE_INTERNAL_ERROR);
+ } else if (ret == -EOPNOTSUPP) {
+ error_report("accelerator does not support reset!");
} else {
info_report("virtual machine state has been rebuilt with new "
"guest file handle.");
--
2.42.0
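For readers following the branch order in the patch above, here is a minimal standalone sketch of how the return value of rebuild_guest() would be classified after the change. It is an illustration only, not QEMU code: the enum and the function name classify_rebuild_result() are hypothetical and exist just for this example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome labels; these names do not exist in QEMU. */
typedef enum {
    REBUILD_OK,           /* guest state was rebuilt with a new VM fd */
    REBUILD_UNSUPPORTED,  /* -EOPNOTSUPP: report it, but do not stop the VM */
    REBUILD_FATAL         /* any other negative errno: stop the VM */
} RebuildOutcome;

/*
 * Mirrors the order of the three branches in the proposed patch for a
 * given rebuild_guest() return value.
 */
static RebuildOutcome classify_rebuild_result(int ret)
{
    if (ret < 0 && ret != -EOPNOTSUPP) {
        return REBUILD_FATAL;
    } else if (ret == -EOPNOTSUPP) {
        return REBUILD_UNSUPPORTED;
    }
    return REBUILD_OK;
}
```

With this ordering, only errors other than -EOPNOTSUPP lead to vm_stop(); an unsupported architecture gets an error message and the reset continues.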
>
> Is this a confidential guest that cannot be normally reset?
>
> >> #2 0x00007ffff58aa220 in __futex_abstimed_wait_common64 (futex_word=0x10131ba10, expected=0, op=393, abstime=0x0, cancel=true) at futex-internal.c:57
> >> #3 __futex_abstimed_wait_common (futex_word=0x10131ba10, expected=0, clockid=0, abstime=<optimized out>, private=0, cancel=true) at futex-internal.c:87
> >> #4 __GI___futex_abstimed_wait_cancelable64 (futex_word=0x10131ba10, expected=0, clockid=0, abstime=0x0, private=0) at futex-internal.c:139
> >> #5 0x00007ffff58ae0bc in __pthread_cond_wait_common (cond=0x10131b9f0, mutex=0x101222ce0 <bql>, clockid=0, abstime=0x0) at pthread_cond_wait.c:426
> >> #6 ___pthread_cond_wait (cond=0x10131b9f0, mutex=0x101222ce0 <bql>) at pthread_cond_wait.c:458
> >> #7 0x0000000100b9bea8 in qemu_cond_wait_impl (cond=0x10131b9f0, mutex=0x101222ce0 <bql>, file=0x100c59900 "../system/cpus.c", line=472) at ../util/qemu-thread-posix.c:240
> >> #8 0x00000001006a0408 in qemu_process_cpu_events (cpu=0x1019dd260) at ../system/cpus.c:472
> >> #9 0x0000000100913354 in kvm_vcpu_thread_fn (arg=0x1019dd260) at ../accel/kvm/kvm-accel-ops.c:50
> >> #10 0x0000000100b9b30c in qemu_thread_start (args=0x1019f1fe0) at ../util/qemu-thread-posix.c:414
> >> #11 0x00007ffff58aed94 in start_thread (arg=0x7ffff0bce320) at pthread_create.c:448
> >> #12 0x00007ffff59555f8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone3.S:114
> >> Thanks,
> >> Misbah Anjum N <misanjum@linux.ibm.com>
> >> On 2026-03-06 16:22, Misbah Anjum N wrote:
> >>> Hi,
> >>> I'm reporting a critical regression on ppc64le that causes all KVM
> >>> guests to hang immediately during startup. Git bisect identified
> >>> commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a as the first bad
> >>> commit. The commit completely breaks KVM functionality on ppc64le.
> >>>
> >>> Regression Details:
> >>> Working Version: QEMU 10.2.50 (v10.2.0-1669-gffcf1a7981)
> >>> Broken Version: QEMU 10.2.50 (v10.2.0-1816-g3fb456e9a0)
> >>> Bad Commit: 98884e0cc10997a17ce9abfd6ff10be19224ca6a "accel/kvm: add
> >>> changes required to support KVM VM file descriptor change"
> >>> Commit Link:
> >>> https://gitlab.com/qemu-project/qemu/-/commit/98884e0cc10997a17ce9abfd6ff10be19224ca6a
> >>>
> >>> Environment:
> >>> Host: Fedora 42, Kernel 7.0.0-rc2, Power11 (ppc64le)
> >>> Libvirt: 12.1.0
> >>> Guest: Fedora 42, Kernel 7.0.0-rc2
> >>> Machine Type: pseries with KVM acceleration
> >>>
> >>> Build Configuration:
> >>> git clone https://gitlab.com/qemu-project/qemu.git
> >>> cd qemu
> >>> git submodule init
> >>> git submodule update --recursive
> >>> ./configure --target-list=ppc64-softmmu --disable-tcg --prefix=/usr
> >>> make && make install
> >>>
> >>> Reproduction:
> >>> Using virt-install:
> >>> /usr/bin/virt-install --connect=qemu:///system --hvm --accelerate
> >>> --name 'avocado-vt-vm1' --machine pseries --memory=32768
> >>> --vcpu=32,sockets=1,cores=32,threads=1 --import --nographics
> >>> --os-variant rhel8.0 --serial pty --memballoon model=virtio
> >>> --controller type=scsi,model=virtio-scsi --disk
> >>> path=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,bus=scsi,size=10,format=qcow2
> >>> --network=bridge=virbr0,model=virtio --boot
> >>> emulator=/usr/bin/qemu-system-ppc64
> >>> Result: Starting install...
> >>> <hangs indefinitely with no output>
> >>>
> >>> Using direct QEMU command:
> >>> /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine
> >>> pseries,accel=kvm -enable-kvm -m 32768 -smp
> >>> 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device
> >>> virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive
> >>> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2
> >>> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev
> >>> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
> >>> Result: <hangs indefinitely with no output>
> >>>
> >>> Analysis:
> >>> The commit introduces VM file descriptor change support with
> >>> architecture-specific hooks.
> >>> I attempted the following fixes without success:
> >>> 1. Changed abort() to return 0; in stubs/kvm.c
> >>> 2. Added early return in kvm_reset_vmfd() when
> >>> kvm_arch_supports_vmfd_change() returns false
> >>>
> >>> Git Bisect Log:
> >>> # git bisect bad
> >>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a is the first bad commit
> >>> commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a (HEAD)
> >>> Author: Ani Sinha <anisinha@redhat.com>
> >>> Date: Wed Feb 25 09:19:10 2026 +0530
> >>>
> >>> accel/kvm: add changes required to support KVM VM file descriptor change
> >>>
> >>> This change adds common kvm specific support to handle KVM VM file
> >>> descriptor
> >>> change. KVM VM file descriptor can change as a part of
> >>> confidential guest reset
> >>> mechanism. A new function api kvm_arch_on_vmfd_change() per
> >>> architecture platform is added in order to implement architecture specific
> >>> changes required to support it. A subsequent patch will add x86 specific
> >>> implementation for kvm_arch_on_vmfd_change() as currently only x86 supports
> >>> confidential guest reset.
> >>>
> >>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> >>> Link: https://lore.kernel.org/r/20260225035000.385950-6-anisinha@redhat.com
> >>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >>>
> >>> MAINTAINERS | 6 ++++++
> >>> accel/kvm/kvm-all.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
> >>> accel/kvm/trace-events | 1 +
> >>> include/system/kvm.h | 3 +++
> >>> stubs/kvm.c | 22 ++++++++++++++++++++++
> >>> stubs/meson.build | 1 +
> >>> target/i386/kvm/kvm.c | 10 ++++++++++
> >>> 7 files changed, 128 insertions(+), 3 deletions(-)
> >>> create mode 100644 stubs/kvm.c
> >>>
> >>> # git bisect log
> >>> git bisect start
> >>> git bisect good ffcf1a7981793973ffbd8100a7c3c6042d02ae23
> >>> git bisect bad 3fb456e9a0e9eef6a71d9b49bfff596a0f0046e9
> >>> git bisect bad e76c30bb13ecb9dc716fa629954bfb6253056ce2
> >>> git bisect good 9bdc612a18588975f5776ee4e562df607fea1b2c
> >>> git bisect bad 40c015e96942fd2a3e4d5ace6063b3333a3dd372
> >>> git bisect good df8df3cb6b743372ebb335bd8404bc3d748da350
> >>> git bisect bad 0f53f021ad1ede28dc8944686544e496cab02e5e
> >>> git bisect bad 9f0c2b3032639315faf141010a2603b0dbf56230
> >>> git bisect bad 98884e0cc10997a17ce9abfd6ff10be19224ca6a
> >>> first bad commit: [98884e0cc10997a17ce9abfd6ff10be19224ca6a]
> >>>
> >>> Thanks,
> >>> Misbah Anjum N <misanjum@linux.ibm.com>
> >
>
>
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-09 11:04 ` Harsh Prateek Bora
2026-03-09 13:11 ` Ani Sinha
@ 2026-03-09 13:30 ` Ani Sinha
1 sibling, 0 replies; 19+ messages in thread
From: Ani Sinha @ 2026-03-09 13:30 UTC (permalink / raw)
To: Harsh Prateek Bora
Cc: Misbah Anjum N, Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin
> On 9 Mar 2026, at 4:34 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>
> Hi Ani, Paolo,
>
> I think the problem lies here:
>
> For archs which don't support VM fd change, we are bailing out as below in kvm_reset_vmfd.
>
>
> /*
> * bail if the current architecture does not support VM file
> * descriptor change.
> */
> if (!kvm_arch_supports_vmfd_change()) {
> error_report("This target architecture does not support KVM VM "
> "file descriptor change.");
> return -EOPNOTSUPP;
> }
>
> However, when rebuild_guest (kvm_reset_vmfd) is called in
> qemu_system_reset here:
>
> if ((reason == SHUTDOWN_CAUSE_GUEST_RESET ||
> reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET) &&
> (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
This entire block is executed only if you either manually enable new_accel_vmfd_on_reset or the CPUs are not resettable.
From looking at the code:
$ git grep kvm_mark_guest_state_protected
accel/kvm/kvm-all.c:void kvm_mark_guest_state_protected(void)
include/system/kvm.h:void kvm_mark_guest_state_protected(void);
target/i386/kvm/tdx.c: kvm_mark_guest_state_protected();
target/i386/sev.c: kvm_mark_guest_state_protected();
target/i386/sev.c: kvm_mark_guest_state_protected();
It seems only SEV and TDX make the CPUs non-resettable.
> if (ac->rebuild_guest) {
> ret = ac->rebuild_guest(current_machine);
> if (ret < 0) {
> error_report("unable to rebuild guest: %s(%d)",
> strerror(-ret), ret);
> vm_stop(RUN_STATE_INTERNAL_ERROR);
> } else {
> info_report("virtual machine state has been rebuilt with new "
> "guest file handle.");
> guest_state_rebuilt = true;
> }
> } else if (!cpus_are_resettable()) {
> error_report("accelerator does not support reset!");
> } else {
> error_report("accelerator does not support rebuilding guest state,"
> " proceeding with normal reset!");
> }
> }
>
>>>
<snip>
>>> Reproduction:
>>> Using virt-install:
>>> /usr/bin/virt-install --connect=qemu:///system --hvm --accelerate
>>> --name 'avocado-vt-vm1' --machine pseries --memory=32768
>>> --vcpu=32,sockets=1,cores=32,threads=1 --import --nographics
>>> --os-variant rhel8.0 --serial pty --memballoon model=virtio
>>> --controller type=scsi,model=virtio-scsi --disk
>>> path=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,bus=scsi,size=10,format=qcow2
>>> --network=bridge=virbr0,model=virtio --boot
>>> emulator=/usr/bin/qemu-system-ppc64
>>> Result: Starting install...
>>> <hangs indefinitely with no output>
>>>
>>> Using direct QEMU command:
>>> /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine
>>> pseries,accel=kvm -enable-kvm -m 32768 -smp
>>> 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device
>>> virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive
>>> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2
>>> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev
>>> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
Hmm, this command line does not seem to indicate that it is a confidential VM or that you enabled that debug flag.
>>> Result: <hangs indefinitely with no output>
>>>
>>> Analysis:
>>> The commit introduces VM file descriptor change support with
>>> architecture-specific hooks.
>>> I attempted the following fixes without success:
>>> 1. Changed abort() to return 0; in stubs/kvm.c
>>> 2. Added early return in kvm_reset_vmfd() when
>>> kvm_arch_supports_vmfd_change() returns false
>>>
>>> Git Bisect Log:
>>> # git bisect bad
>>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a is the first bad commit
>>> commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a (HEAD)
>>> Author: Ani Sinha <anisinha@redhat.com>
>>> Date: Wed Feb 25 09:19:10 2026 +0530
>>>
>>> accel/kvm: add changes required to support KVM VM file descriptor change
>>>
>>> This change adds common kvm specific support to handle KVM VM file
>>> descriptor
>>> change. KVM VM file descriptor can change as a part of
>>> confidential guest reset
>>> mechanism. A new function api kvm_arch_on_vmfd_change() per
>>> architecture platform is added in order to implement architecture specific
>>> changes required to support it. A subsequent patch will add x86 specific
>>> implementation for kvm_arch_on_vmfd_change() as currently only x86 supports
>>> confidential guest reset.
>>>
>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>> Link: https://lore.kernel.org/r/20260225035000.385950-6-anisinha@redhat.com
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>
>>> MAINTAINERS | 6 ++++++
>>> accel/kvm/kvm-all.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>> accel/kvm/trace-events | 1 +
>>> include/system/kvm.h | 3 +++
>>> stubs/kvm.c | 22 ++++++++++++++++++++++
>>> stubs/meson.build | 1 +
>>> target/i386/kvm/kvm.c | 10 ++++++++++
>>> 7 files changed, 128 insertions(+), 3 deletions(-)
>>> create mode 100644 stubs/kvm.c
>>>
>>> # git bisect log
>>> git bisect start
>>> git bisect good ffcf1a7981793973ffbd8100a7c3c6042d02ae23
>>> git bisect bad 3fb456e9a0e9eef6a71d9b49bfff596a0f0046e9
>>> git bisect bad e76c30bb13ecb9dc716fa629954bfb6253056ce2
>>> git bisect good 9bdc612a18588975f5776ee4e562df607fea1b2c
>>> git bisect bad 40c015e96942fd2a3e4d5ace6063b3333a3dd372
>>> git bisect good df8df3cb6b743372ebb335bd8404bc3d748da350
>>> git bisect bad 0f53f021ad1ede28dc8944686544e496cab02e5e
>>> git bisect bad 9f0c2b3032639315faf141010a2603b0dbf56230
>>> git bisect bad 98884e0cc10997a17ce9abfd6ff10be19224ca6a
>>> first bad commit: [98884e0cc10997a17ce9abfd6ff10be19224ca6a]
>>>
>>> Thanks,
>>> Misbah Anjum N <misanjum@linux.ibm.com>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-09 13:23 ` Ani Sinha
@ 2026-03-10 8:39 ` Misbah Anjum N
2026-03-10 8:54 ` Ani Sinha
0 siblings, 1 reply; 19+ messages in thread
From: Misbah Anjum N @ 2026-03-10 8:39 UTC (permalink / raw)
To: Ani Sinha, Pbonzini, Qemu Devel, Qemu Ppc; +Cc: npiggin, Harshpb
Hi Ani and Paolo,
We have tested the code by applying both the original commit
(98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch (commit
9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le.
However, the issue persists. We've conducted GDB debugging that shows
the hang is occurring in a different location than what the fix
addresses.
The original patch completely breaks KVM guest bringup on ppc64le, and
the fix patch does not resolve the issue. Given the severity of this
regression, we should either find a quick fix or consider reverting the
patch until a proper solution can be identified.
Analysis:
1. This is not a confidential guest. This is a regular KVM guest running
on ppc64le.
2. The execution flow shows that qemu_system_reset() completes
successfully and never enters the code path at lines 529-543.
3. The hang occurs later in qemu_default_main() at system/main.c:49,
after calling bql_lock()
4. The ppc KVM guest boots fine with the previous commit -
df8df3cb6b743372ebb335bd8404bc3d748da350
5. This suggests the issue is not with error handling of -EOPNOTSUPP
during reset, but bql_lock() getting stuck in qemu_default_main()
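As a rough illustration of point 2, the sketch below models the condition that guards the rebuild path, based on the qemu_system_reset() snippet quoted earlier in this thread. It is a simplified standalone model, not the QEMU source: the enum, its values, and the function name rebuild_path_taken() are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the shutdown causes named in this thread; values are illustrative. */
typedef enum {
    CAUSE_NONE,
    CAUSE_GUEST_RESET,
    CAUSE_HOST_QMP_SYSTEM_RESET
} Cause;

/*
 * Hypothetical model of the guard around ac->rebuild_guest(): the path is
 * entered only for a guest or QMP reset, and only if a new VM fd was
 * requested on reset or the CPUs cannot be reset normally.
 */
static bool rebuild_path_taken(Cause reason, bool new_vmfd_on_reset,
                               bool cpus_resettable)
{
    bool reset_requested = reason == CAUSE_GUEST_RESET ||
                           reason == CAUSE_HOST_QMP_SYSTEM_RESET;

    return reset_requested && (new_vmfd_on_reset || !cpus_resettable);
}
```

Under this model, the initial reset seen in the trace (reason SHUTDOWN_CAUSE_NONE, regular guest with resettable CPUs) skips the rebuild path regardless of the other two inputs, which would match the observation that the fix patch is never reached.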
GDB Trace Analysis:
We set breakpoints at qemu_system_reset() and qemu_default_main() to
trace the execution flow. The system successfully completes
qemu_system_reset() without entering the problematic code path where the
fix provided by you applies (system/runstate.c:529-543).
# gdb --args /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine
pseries,accel=kvm -enable-kvm -m 32768 -smp
32,sockets=1,cores=32,threads=1 -nographic -serial pty -device
virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive
file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2
-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev
bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
(gdb) handle SIGUSR1 pass nostop noprint
Signal Stop Print Pass to program Description
SIGUSR1 No No Yes User defined signal 1
(gdb) b qemu_system_reset
Breakpoint 1 at 0x69a688: file ../system/runstate.c, line 510.
(gdb) b qemu_default_main
Breakpoint 2 at 0xa9aeb8: file ../system/main.c, line 45.
(gdb) r
Starting program: /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1
-machine pseries,accel=kvm -enable-kvm -m 32768 -smp
32,sockets=1,cores=32,threads=1 -nographic -serial pty -device
virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive
file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2
-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev
bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
Thread 1 "qemu-system-ppc" hit Breakpoint 1, qemu_system_reset
(reason=reason@entry=SHUTDOWN_CAUSE_NONE) at ../system/runstate.c:513
513 AccelClass *ac = ACCEL_GET_CLASS(current_accel());
(gdb) n
517 mc = current_machine ? MACHINE_GET_CLASS(current_machine) :
NULL;
(gdb) n
519 cpu_synchronize_all_states();
(gdb) n
521 switch (reason) {
(gdb) n
529 if (!cpus_are_resettable() &&
(gdb) n
553 if (mc && mc->reset) {
(gdb) n
554 mc->reset(current_machine, type);
(gdb) n
558 switch (reason) {
(gdb) n
574 if (cpus_are_resettable()) {
(gdb) n
583 cpu_synchronize_all_post_reset();
(gdb) n
587 vm_set_suspended(false);
(gdb) n
qdev_machine_creation_done () at ../hw/core/machine.c:1814
1814 register_global_state();
(gdb) n
qemu_machine_creation_done (errp=0x10123e028 <error_fatal>) at
../system/vl.c:2785
2785 if (machine->cgs && !machine->cgs->ready) {
(gdb) n
2791 foreach_device_config_or_exit(DEV_GDB, gdbserver_start);
(gdb) n
2793 if (!vga_interface_created && !default_vga &&
(gdb) n
qmp_x_exit_preconfig (errp=errp@entry=0x10123e028 <error_fatal>) at
../system/vl.c:2815
2815 if (loadvm) {
(gdb) n
2820 if (replay_mode != REPLAY_MODE_NONE) {
(gdb) n
2824 if (incoming) {
(gdb) n
2837 } else if (autostart) {
(gdb) n
2838 qmp_cont(NULL);
(gdb) n
qemu_init (argc=<optimized out>, argv=<optimized out>) at
../system/vl.c:3849
3849 qemu_init_displays();
(gdb) n
3850 accel_setup_post(current_machine);
(gdb) n
3851 if (migrate_mode() != MIG_MODE_CPR_EXEC) {
(gdb) n
3852 os_setup_post();
(gdb) n
3854 resume_mux_open();
(gdb) n
main (argc=<optimized out>, argv=<optimized out>) at ../system/main.c:84
84 bql_unlock();
(gdb) n
85 replay_mutex_unlock();
(gdb) n
87 if (qemu_main) {
(gdb) n
93 qemu_default_main(NULL);
(gdb) n
Thread 1 "qemu-system-ppc" hit Breakpoint 2, qemu_default_main
(opaque=opaque@entry=0x0) at ../system/main.c:48
48 replay_mutex_lock();
(gdb) n
49 bql_lock();
(gdb) n
<hangs>
<system becomes unresponsive at this point>
Thanks,
Misbah Anjum N <misanjumn@ibm.com>
On 2026-03-09 18:53, Ani Sinha wrote:
> Yes seems this is an issue and I will fix it. Not sure if the fix will
> address your issue though ...
>
> Can you try the following patch?
>
> From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00 2001
> From: Ani Sinha <anisinha@redhat.com>
> Date: Mon, 9 Mar 2026 18:44:40 +0530
> Subject: [PATCH] Fix reset for non-x86 archs that do not support reset
> yet
>
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> ---
> system/runstate.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/system/runstate.c b/system/runstate.c
> index eca722b43c..c1f41284c9 100644
> --- a/system/runstate.c
> +++ b/system/runstate.c
> @@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
> (current_machine->new_accel_vmfd_on_reset ||
> !cpus_are_resettable())) {
> if (ac->rebuild_guest) {
> ret = ac->rebuild_guest(current_machine);
> - if (ret < 0) {
> + if (ret < 0 && ret != -EOPNOTSUPP) {
> error_report("unable to rebuild guest: %s(%d)",
> strerror(-ret), ret);
> vm_stop(RUN_STATE_INTERNAL_ERROR);
> + } else if (ret == -EOPNOTSUPP) {
> + error_report("accelerator does not support reset!");
> } else {
> info_report("virtual machine state has been rebuilt
> with new "
> "guest file handle.");
> --
> 2.42.0
>
>
>>
>> Is this a confidential guest that cannot be normally reset?
>>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-10 8:39 ` Misbah Anjum N
@ 2026-03-10 8:54 ` Ani Sinha
2026-03-10 9:08 ` Misbah Anjum N
0 siblings, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-03-10 8:54 UTC (permalink / raw)
To: Misbah Anjum N
Cc: Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin, Harsh Prateek Bora
> On 10 Mar 2026, at 2:09 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>
> Hi Ani and Paolo,
>
> We have tested the code by applying both the original commit (98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch (commit 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le.
> However, the issue persists. We've conducted GDB debugging that shows the hang is occurring in a different location than what the fix addresses.
>
> Since the original patch is breaking KVM guest bringup completely on ppc64le, and the fix patch does not resolve the issue, given the severity of this regression (complete KVM breakage on ppc64le), we should either find a quick fix or consider reverting the patch until a proper solution can be identified.
Based on what you just described, it does not seem like the issue is related to 98884e0cc10997a17ce9abfd6ff10be19224ca6a at all. If you revert this patch in your local tree, can you confirm that your issue gets fixed?
>
> Analysis:
> 1. This is not a confidential guest. This is a regular KVM guest running on ppc64le.
> 2. The execution flow shows that qemu_system_reset() completes successfully and never enters the code path at line 529-543
This is what I expected and therefore, no code related to coco guest rebuilding is getting executed. Your issue seems to be somewhere else.
> 3. The hang occurs later in qemu_default_main() at system/main.c:49, after calling bql_lock()
> 4. The ppc KVM guest boots fine with the previous commit - df8df3cb6b743372ebb335bd8404bc3d748da350
> 5. This suggests the issue is not with error handling of -EOPNOTSUPP during reset, but bql_lock() getting stuck in qemu_default_main()
>
> GDB Trace Analysis:
> We set breakpoints at qemu_system_reset() and qemu_default_main() to trace the execution flow. The system successfully completes qemu_system_reset() without entering the problematic code path where the fix provided by you applies (system/runstate.c:529-543).
>
> # gdb --args /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>
> (gdb) handle SIGUSR1 pass nostop noprint
> Signal Stop Print Pass to program Description
> SIGUSR1 No No Yes User defined signal 1
> (gdb) b qemu_system_reset
> Breakpoint 1 at 0x69a688: file ../system/runstate.c, line 510.
> (gdb) b qemu_default_main
> Breakpoint 2 at 0xa9aeb8: file ../system/main.c, line 45.
> (gdb) r
>
> Starting program: /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>
> Thread 1 "qemu-system-ppc" hit Breakpoint 1, qemu_system_reset (reason=reason@entry=SHUTDOWN_CAUSE_NONE) at ../system/runstate.c:513
> 513 AccelClass *ac = ACCEL_GET_CLASS(current_accel());
> (gdb) n
> 517 mc = current_machine ? MACHINE_GET_CLASS(current_machine) : NULL;
> (gdb) n
> 519 cpu_synchronize_all_states();
> (gdb) n
> 521 switch (reason) {
> (gdb) n
> 529 if (!cpus_are_resettable() &&
> (gdb) n
> 553 if (mc && mc->reset) {
> (gdb) n
> 554 mc->reset(current_machine, type);
> (gdb) n
> 558 switch (reason) {
> (gdb) n
> 574 if (cpus_are_resettable()) {
> (gdb) n
> 583 cpu_synchronize_all_post_reset();
> (gdb) n
> 587 vm_set_suspended(false);
> (gdb) n
> qdev_machine_creation_done () at ../hw/core/machine.c:1814
> 1814 register_global_state();
> (gdb) n
> qemu_machine_creation_done (errp=0x10123e028 <error_fatal>) at ../system/vl.c:2785
> 2785 if (machine->cgs && !machine->cgs->ready) {
> (gdb) n
> 2791 foreach_device_config_or_exit(DEV_GDB, gdbserver_start);
> (gdb) n
> 2793 if (!vga_interface_created && !default_vga &&
> (gdb) n
> qmp_x_exit_preconfig (errp=errp@entry=0x10123e028 <error_fatal>) at ../system/vl.c:2815
> 2815 if (loadvm) {
> (gdb) n
> 2820 if (replay_mode != REPLAY_MODE_NONE) {
> (gdb) n
> 2824 if (incoming) {
> (gdb) n
> 2837 } else if (autostart) {
> (gdb) n
> 2838 qmp_cont(NULL);
> (gdb) n
> qemu_init (argc=<optimized out>, argv=<optimized out>) at ../system/vl.c:3849
> 3849 qemu_init_displays();
> (gdb) n
> 3850 accel_setup_post(current_machine);
> (gdb) n
> 3851 if (migrate_mode() != MIG_MODE_CPR_EXEC) {
> (gdb) n
> 3852 os_setup_post();
> (gdb) n
> 3854 resume_mux_open();
> (gdb) n
> main (argc=<optimized out>, argv=<optimized out>) at ../system/main.c:84
> 84 bql_unlock();
> (gdb) n
> 85 replay_mutex_unlock();
> (gdb) n
> 87 if (qemu_main) {
> (gdb) n
> 93 qemu_default_main(NULL);
> (gdb) n
>
> Thread 1 "qemu-system-ppc" hit Breakpoint 2, qemu_default_main (opaque=opaque@entry=0x0) at ../system/main.c:48
> 48 replay_mutex_lock();
> (gdb) n
> 49 bql_lock();
> (gdb) n
>
> <hangs>
> <system becomes unresponsive at this point>
>
>
> Thanks,
> Misbah Anjum N <misanjumn@ibm.com>
>
>
>
> On 2026-03-09 18:53, Ani Sinha wrote:
>> Yes seems this is an issue and I will fix it. Not sure if the fix will
>> address your issue though ...
>> Can you try the following patch?
>> From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00 2001
>> From: Ani Sinha <anisinha@redhat.com>
>> Date: Mon, 9 Mar 2026 18:44:40 +0530
>> Subject: [PATCH] Fix reset for non-x86 archs that do not support reset yet
>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>> ---
>> system/runstate.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>> diff --git a/system/runstate.c b/system/runstate.c
>> index eca722b43c..c1f41284c9 100644
>> --- a/system/runstate.c
>> +++ b/system/runstate.c
>> @@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
>> (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
>> if (ac->rebuild_guest) {
>> ret = ac->rebuild_guest(current_machine);
>> - if (ret < 0) {
>> + if (ret < 0 && ret != -EOPNOTSUPP) {
>> error_report("unable to rebuild guest: %s(%d)",
>> strerror(-ret), ret);
>> vm_stop(RUN_STATE_INTERNAL_ERROR);
>> + } else if (ret == -EOPNOTSUPP) {
>> + error_report("accelerator does not support reset!");
>> } else {
>> info_report("virtual machine state has been rebuilt with new "
>> "guest file handle.");
>> --
>> 2.42.0
>>> Is this a confidential guest that cannot be normally reset?
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-10 8:54 ` Ani Sinha
@ 2026-03-10 9:08 ` Misbah Anjum N
2026-03-10 9:34 ` Ani Sinha
0 siblings, 1 reply; 19+ messages in thread
From: Misbah Anjum N @ 2026-03-10 9:08 UTC (permalink / raw)
To: Ani Sinha, Pbonzini, Qemu Devel, Qemu Ppc; +Cc: npiggin, Harsh Prateek Bora
On 2026-03-10 14:24, Ani Sinha wrote:
>> On 10 Mar 2026, at 2:09 PM, Misbah Anjum N <misanjum@linux.ibm.com>
>> wrote:
>>
>> Hi Ani and Paolo,
>>
>> We have tested the code by applying both the original commit
>> (98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch (commit
>> 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le.
>> However, the issue persists. We've conducted GDB debugging that shows
>> the hang is occurring in a different location than what the fix
>> addresses.
>>
>> Since the original patch is breaking KVM guest bringup completely on
>> ppc64le, and the fix patch does not resolve the issue, given the
>> severity of this regression (complete KVM breakage on ppc64le), we
>> should either find a quick fix or consider reverting the patch until a
>> proper solution can be identified.
>
> Based on what you just described, it does not seem like the issue is
> related to 98884e0cc10997a17ce9abfd6ff10be19224ca6a at all. If you
> revert this patch in your local tree, can you confirm that your issue
> gets fixed?
>
Yes, the issue is not seen with the immediate previous commit:
commit df8df3cb6b743372ebb335bd8404bc3d748da350 (ani-df8df3cb)
Author: Ani Sinha <anisinha@redhat.com>
Date: Wed Feb 25 09:19:09 2026 +0530
system/physmem: add helper to reattach existing memory after KVM VM
fd change
After the guest KVM file descriptor has changed as a part of the
process of
confidential guest reset mechanism, existing memory needs to be
reattached to
the new file descriptor. This change adds a helper function
ram_block_rebind()
for this purpose. The next patch will make use of this function.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link:
https://lore.kernel.org/r/20260225035000.385950-5-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It looks like the next patch enables the functionality of the previous
patches in a way that causes bql_lock() to get stuck on architectures
(ppc64le in this case) that do not support this feature yet.
Did you validate your patches on other architectures that do not
support this feature yet?
>>
>> Analysis:
>> 1. This is not a confidential guest. This is a regular KVM guest
>> running on ppc64le.
>> 2. The execution flow shows that qemu_system_reset() completes
>> successfully and never enters the code path at line 529-543
>
> This is what I expected and therefore, no code related to coco guest
> rebuilding is getting executed. Your issue seems to be somewhere else.
>
The issue occurs only with the introduction of this patch and not with
the previous upstream commit as explained above.
>> 3. The hang occurs later in qemu_default_main() at system/main.c:49,
>> after calling bql_lock()
>> 4. The ppc KVM guest boots fine with the previous commit -
>> df8df3cb6b743372ebb335bd8404bc3d748da350
>> 5. This suggests the issue is not with error handling of -EOPNOTSUPP
>> during reset, but bql_lock() getting stuck in qemu_default_main()
>>
>> GDB Trace Analysis:
>> We set breakpoints at qemu_system_reset() and qemu_default_main() to
>> trace the execution flow. The system successfully completes
>> qemu_system_reset() without entering the problematic code path where
>> the fix provided by you applies (system/runstate.c:529-543).
>>
>> # gdb --args /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine
>> pseries,accel=kvm -enable-kvm -m 32768 -smp
>> 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device
>> virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive
>> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2
>> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev
>> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>>
>> (gdb) handle SIGUSR1 pass nostop noprint
>> Signal Stop Print Pass to program Description
>> SIGUSR1 No No Yes User defined signal 1
>> (gdb) b qemu_system_reset
>> Breakpoint 1 at 0x69a688: file ../system/runstate.c, line 510.
>> (gdb) b qemu_default_main
>> Breakpoint 2 at 0xa9aeb8: file ../system/main.c, line 45.
>> (gdb) r
>>
>> Starting program: /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1
>> -machine pseries,accel=kvm -enable-kvm -m 32768 -smp
>> 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device
>> virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive
>> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2
>> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev
>> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>>
>> Thread 1 "qemu-system-ppc" hit Breakpoint 1, qemu_system_reset
>> (reason=reason@entry=SHUTDOWN_CAUSE_NONE) at ../system/runstate.c:513
>> 513 AccelClass *ac = ACCEL_GET_CLASS(current_accel());
>> (gdb) n
>> 517 mc = current_machine ? MACHINE_GET_CLASS(current_machine) :
>> NULL;
>> (gdb) n
>> 519 cpu_synchronize_all_states();
>> (gdb) n
>> 521 switch (reason) {
>> (gdb) n
>> 529 if (!cpus_are_resettable() &&
>> (gdb) n
>> 553 if (mc && mc->reset) {
>> (gdb) n
>> 554 mc->reset(current_machine, type);
>> (gdb) n
>> 558 switch (reason) {
>> (gdb) n
>> 574 if (cpus_are_resettable()) {
>> (gdb) n
>> 583 cpu_synchronize_all_post_reset();
>> (gdb) n
>> 587 vm_set_suspended(false);
>> (gdb) n
>> qdev_machine_creation_done () at ../hw/core/machine.c:1814
>> 1814 register_global_state();
>> (gdb) n
>> qemu_machine_creation_done (errp=0x10123e028 <error_fatal>) at
>> ../system/vl.c:2785
>> 2785 if (machine->cgs && !machine->cgs->ready) {
>> (gdb) n
>> 2791 foreach_device_config_or_exit(DEV_GDB, gdbserver_start);
>> (gdb) n
>> 2793 if (!vga_interface_created && !default_vga &&
>> (gdb) n
>> qmp_x_exit_preconfig (errp=errp@entry=0x10123e028 <error_fatal>) at
>> ../system/vl.c:2815
>> 2815 if (loadvm) {
>> (gdb) n
>> 2820 if (replay_mode != REPLAY_MODE_NONE) {
>> (gdb) n
>> 2824 if (incoming) {
>> (gdb) n
>> 2837 } else if (autostart) {
>> (gdb) n
>> 2838 qmp_cont(NULL);
>> (gdb) n
>> qemu_init (argc=<optimized out>, argv=<optimized out>) at
>> ../system/vl.c:3849
>> 3849 qemu_init_displays();
>> (gdb) n
>> 3850 accel_setup_post(current_machine);
>> (gdb) n
>> 3851 if (migrate_mode() != MIG_MODE_CPR_EXEC) {
>> (gdb) n
>> 3852 os_setup_post();
>> (gdb) n
>> 3854 resume_mux_open();
>> (gdb) n
>> main (argc=<optimized out>, argv=<optimized out>) at
>> ../system/main.c:84
>> 84 bql_unlock();
>> (gdb) n
>> 85 replay_mutex_unlock();
>> (gdb) n
>> 87 if (qemu_main) {
>> (gdb) n
>> 93 qemu_default_main(NULL);
>> (gdb) n
>>
>> Thread 1 "qemu-system-ppc" hit Breakpoint 2, qemu_default_main
>> (opaque=opaque@entry=0x0) at ../system/main.c:48
>> 48 replay_mutex_lock();
>> (gdb) n
>> 49 bql_lock();
>> (gdb) n
>>
>> <hangs>
>> <system becomes unresponsive at this point>
>>
>>
>> Thanks,
>> Misbah Anjum N <misanjumn@ibm.com>
>>
>>
>>
>> On 2026-03-09 18:53, Ani Sinha wrote:
>>> Yes seems this is an issue and I will fix it. Not sure if the fix
>>> will
>>> address your issue though ...
>>> Can you try the following patch?
>>> From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00
>>> 2001
>>> From: Ani Sinha <anisinha@redhat.com>
>>> Date: Mon, 9 Mar 2026 18:44:40 +0530
>>> Subject: [PATCH] Fix reset for non-x86 archs that do not support
>>> reset yet
>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>> ---
>>> system/runstate.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>> diff --git a/system/runstate.c b/system/runstate.c
>>> index eca722b43c..c1f41284c9 100644
>>> --- a/system/runstate.c
>>> +++ b/system/runstate.c
>>> @@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
>>> (current_machine->new_accel_vmfd_on_reset ||
>>> !cpus_are_resettable())) {
>>> if (ac->rebuild_guest) {
>>> ret = ac->rebuild_guest(current_machine);
>>> - if (ret < 0) {
>>> + if (ret < 0 && ret != -EOPNOTSUPP) {
>>> error_report("unable to rebuild guest: %s(%d)",
>>> strerror(-ret), ret);
>>> vm_stop(RUN_STATE_INTERNAL_ERROR);
>>> + } else if (ret == -EOPNOTSUPP) {
>>> + error_report("accelerator does not support reset!");
>>> } else {
>>> info_report("virtual machine state has been rebuilt
>>> with new "
>>> "guest file handle.");
>>> --
>>> 2.42.0
>>>> Is this a confidential guest that cannot be normally reset?
>>
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-10 9:08 ` Misbah Anjum N
@ 2026-03-10 9:34 ` Ani Sinha
2026-03-10 10:05 ` Misbah Anjum N
0 siblings, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-03-10 9:34 UTC (permalink / raw)
To: Misbah Anjum N
Cc: Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin, Harsh Prateek Bora
> On 10 Mar 2026, at 2:38 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>
> On 2026-03-10 14:24, Ani Sinha wrote:
>>> On 10 Mar 2026, at 2:09 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>>> Hi Ani and Paolo,
>>> We have tested the code by applying both the original commit (98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch (commit 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le.
>>> However, the issue persists. We've conducted GDB debugging that shows the hang is occurring in a different location than what the fix addresses.
>>> Since the original patch is breaking KVM guest bringup completely on ppc64le, and the fix patch does not resolve the issue, given the severity of this regression (complete KVM breakage on ppc64le), we should either find a quick fix or consider reverting the patch until a proper solution can be identified.
>> Based on what you just described, it does not seem like the issue is
>> related to 98884e0cc10997a17ce9abfd6ff10be19224ca6a at all. If you
>> revert this patch in your local tree, can you confirm that your issue
>> gets fixed?
>
> Yes, the issue is not seen with the immediate previous commit:
>
> commit df8df3cb6b743372ebb335bd8404bc3d748da350 (ani-df8df3cb)
> Author: Ani Sinha <anisinha@redhat.com>
> Date: Wed Feb 25 09:19:09 2026 +0530
>
> system/physmem: add helper to reattach existing memory after KVM VM fd change
>
> After the guest KVM file descriptor has changed as a part of the process of
> confidential guest reset mechanism, existing memory needs to be reattached to
> the new file descriptor. This change adds a helper function ram_block_rebind()
> for this purpose. The next patch will make use of this function.
>
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> Link: https://lore.kernel.org/r/20260225035000.385950-5-anisinha@redhat.com
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>
> Looks like the next patch is enabling the functionality of the previous patches in such a way which causes bql_lock() to get stuck on architectures (ppc64le in this case) which does not support this feature yet.
This theory is not substantiated by code or evidence. 98884e0cc10997a17ce9abfd6ff10be19224ca6a introduces kvm_reset_vmfd(), which is called by this block of code (with the tip at 98884e0cc10997a17ce9abfd6ff10be19224ca6a):
    if (!cpus_are_resettable() &&
        (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
         reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
        if (ac->rebuild_guest) {
            ret = ac->rebuild_guest(current_machine);
            if (ret < 0) {
                error_report("unable to rebuild guest: %s(%d)",
                             strerror(-ret), ret);
                vm_stop(RUN_STATE_INTERNAL_ERROR);
            } else {
                info_report("virtual machine state has been rebuilt with new "
                            "guest file handle.");
                guest_state_rebuilt = true;
            }
        } else if (!cpus_are_resettable()) {
            error_report("accelerator does not support reset!");
        } else {
            error_report("accelerator does not support rebuilding guest state,"
                         " proceeding with normal reset!");
        }
    }
If the CPUs are resettable, this block is not entered and nothing that the patch introduces is executed.
So I think you need to explain a bit more why you feel so strongly that this patch broke it. I am confused and unable to reason about this.
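
To make the guard explicit, here is a minimal sketch of the outer condition of the block quoted above, assuming nothing beyond what that block shows. The enum is a stand-in that lists only the ShutdownCause values named in this thread, and rebuild_block_entered() is a hypothetical helper for illustration, not a QEMU function.

```c
#include <stdbool.h>

/* Stand-in for QEMU's ShutdownCause: only the values named in this thread. */
typedef enum {
    SHUTDOWN_CAUSE_NONE,
    SHUTDOWN_CAUSE_GUEST_RESET,
    SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET,
} ShutdownCause;

/*
 * Hypothetical helper (not part of QEMU): mirrors the outer "if" of the
 * quoted block. The first argument stands in for cpus_are_resettable().
 * Returns true only when the rebuild path would be entered.
 */
static bool rebuild_block_entered(bool cpus_resettable, ShutdownCause reason)
{
    return !cpus_resettable &&
           (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
            reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET);
}
```

Under this model, rebuild_block_entered(true, SHUTDOWN_CAUSE_GUEST_RESET) is false, and so is any call with SHUTDOWN_CAUSE_NONE, which is the reason shown in the quoted GDB trace at boot. Only a non-resettable guest with one of the two reset causes enters the block.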
>
> Did you validate your patches on other architectures which does not support this feature yet?
As you have already seen, on other architectures the entire block of code is not executed at all. Only SEV-ES, SEV-SNP and TDX currently exercise this.
>
>>> Analysis:
>>> 1. This is not a confidential guest. This is a regular KVM guest running on ppc64le.
>>> 2. The execution flow shows that qemu_system_reset() completes successfully and never enters the code path at line 529-543
>> This is what I expected and therefore, no code related to coco guest
>> rebuilding is getting executed. Your issue seems to be somewhere else.
>
> The issue occurs only with the introduction of this patch and not with the previous upstream commit as explained above.
>
>>> 3. The hang occurs later in qemu_default_main() at system/main.c:49, after calling bql_lock()
>>> 4. The ppc KVM guest boots fine with the previous commit - df8df3cb6b743372ebb335bd8404bc3d748da350
>>> 5. This suggests the issue is not with error handling of -EOPNOTSUPP during reset, but bql_lock() getting stuck in qemu_default_main()
>>> GDB Trace Analysis:
>>> We set breakpoints at qemu_system_reset() and qemu_default_main() to trace the execution flow. The system successfully completes qemu_system_reset() without entering the problematic code path where the fix provided by you applies (system/runstate.c:529-543).
>>> # gdb --args /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>>> (gdb) handle SIGUSR1 pass nostop noprint
>>> Signal Stop Print Pass to program Description
>>> SIGUSR1 No No Yes User defined signal 1
>>> (gdb) b qemu_system_reset
>>> Breakpoint 1 at 0x69a688: file ../system/runstate.c, line 510.
>>> (gdb) b qemu_default_main
>>> Breakpoint 2 at 0xa9aeb8: file ../system/main.c, line 45.
>>> (gdb) r
>>> Starting program: /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>>> Thread 1 "qemu-system-ppc" hit Breakpoint 1, qemu_system_reset (reason=reason@entry=SHUTDOWN_CAUSE_NONE) at ../system/runstate.c:513
>>> 513 AccelClass *ac = ACCEL_GET_CLASS(current_accel());
>>> (gdb) n
>>> 517 mc = current_machine ? MACHINE_GET_CLASS(current_machine) : NULL;
>>> (gdb) n
>>> 519 cpu_synchronize_all_states();
>>> (gdb) n
>>> 521 switch (reason) {
>>> (gdb) n
>>> 529 if (!cpus_are_resettable() &&
>>> (gdb) n
>>> 553 if (mc && mc->reset) {
>>> (gdb) n
>>> 554 mc->reset(current_machine, type);
>>> (gdb) n
>>> 558 switch (reason) {
>>> (gdb) n
>>> 574 if (cpus_are_resettable()) {
>>> (gdb) n
>>> 583 cpu_synchronize_all_post_reset();
>>> (gdb) n
>>> 587 vm_set_suspended(false);
>>> (gdb) n
>>> qdev_machine_creation_done () at ../hw/core/machine.c:1814
>>> 1814 register_global_state();
>>> (gdb) n
>>> qemu_machine_creation_done (errp=0x10123e028 <error_fatal>) at ../system/vl.c:2785
>>> 2785 if (machine->cgs && !machine->cgs->ready) {
>>> (gdb) n
>>> 2791 foreach_device_config_or_exit(DEV_GDB, gdbserver_start);
>>> (gdb) n
>>> 2793 if (!vga_interface_created && !default_vga &&
>>> (gdb) n
>>> qmp_x_exit_preconfig (errp=errp@entry=0x10123e028 <error_fatal>) at ../system/vl.c:2815
>>> 2815 if (loadvm) {
>>> (gdb) n
>>> 2820 if (replay_mode != REPLAY_MODE_NONE) {
>>> (gdb) n
>>> 2824 if (incoming) {
>>> (gdb) n
>>> 2837 } else if (autostart) {
>>> (gdb) n
>>> 2838 qmp_cont(NULL);
>>> (gdb) n
>>> qemu_init (argc=<optimized out>, argv=<optimized out>) at ../system/vl.c:3849
>>> 3849 qemu_init_displays();
>>> (gdb) n
>>> 3850 accel_setup_post(current_machine);
>>> (gdb) n
>>> 3851 if (migrate_mode() != MIG_MODE_CPR_EXEC) {
>>> (gdb) n
>>> 3852 os_setup_post();
>>> (gdb) n
>>> 3854 resume_mux_open();
>>> (gdb) n
>>> main (argc=<optimized out>, argv=<optimized out>) at ../system/main.c:84
>>> 84 bql_unlock();
>>> (gdb) n
>>> 85 replay_mutex_unlock();
>>> (gdb) n
>>> 87 if (qemu_main) {
>>> (gdb) n
>>> 93 qemu_default_main(NULL);
>>> (gdb) n
>>> Thread 1 "qemu-system-ppc" hit Breakpoint 2, qemu_default_main (opaque=opaque@entry=0x0) at ../system/main.c:48
>>> 48 replay_mutex_lock();
>>> (gdb) n
>>> 49 bql_lock();
>>> (gdb) n
>>> <hangs>
>>> <system becomes unresponsive at this point>
>>> Thanks,
>>> Misbah Anjum N <misanjumn@ibm.com>
>>> On 2026-03-09 18:53, Ani Sinha wrote:
>>>> Yes seems this is an issue and I will fix it. Not sure if the fix will
>>>> address your issue though ...
>>>> Can you try the following patch?
>>>> From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00 2001
>>>> From: Ani Sinha <anisinha@redhat.com>
>>>> Date: Mon, 9 Mar 2026 18:44:40 +0530
>>>> Subject: [PATCH] Fix reset for non-x86 archs that do not support reset yet
>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>> ---
>>>> system/runstate.c | 4 +++-
>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>> diff --git a/system/runstate.c b/system/runstate.c
>>>> index eca722b43c..c1f41284c9 100644
>>>> --- a/system/runstate.c
>>>> +++ b/system/runstate.c
>>>> @@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
>>>> (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
>>>> if (ac->rebuild_guest) {
>>>> ret = ac->rebuild_guest(current_machine);
>>>> - if (ret < 0) {
>>>> + if (ret < 0 && ret != -EOPNOTSUPP) {
>>>> error_report("unable to rebuild guest: %s(%d)",
>>>> strerror(-ret), ret);
>>>> vm_stop(RUN_STATE_INTERNAL_ERROR);
>>>> + } else if (ret == -EOPNOTSUPP) {
>>>> + error_report("accelerator does not support reset!");
>>>> } else {
>>>> info_report("virtual machine state has been rebuilt with new "
>>>> "guest file handle.");
>>>> --
>>>> 2.42.0
>>>>> Is this a confidential guest that cannot be normally reset?
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-10 9:34 ` Ani Sinha
@ 2026-03-10 10:05 ` Misbah Anjum N
2026-03-10 10:12 ` Ani Sinha
0 siblings, 1 reply; 19+ messages in thread
From: Misbah Anjum N @ 2026-03-10 10:05 UTC (permalink / raw)
To: Ani Sinha, Pbonzini, Qemu Devel, Qemu Ppc
Cc: npiggin, Harsh Prateek Bora, vaibhav, sbhat
On 2026-03-10 15:04, Ani Sinha wrote:
>> On 10 Mar 2026, at 2:38 PM, Misbah Anjum N <misanjum@linux.ibm.com>
>> wrote:
>>
>> On 2026-03-10 14:24, Ani Sinha wrote:
>>>> On 10 Mar 2026, at 2:09 PM, Misbah Anjum N <misanjum@linux.ibm.com>
>>>> wrote:
>>>> Hi Ani and Paolo,
>>>> We have tested the code by applying both the original commit
>>>> (98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch
>>>> (commit 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le.
>>>> However, the issue persists. We've conducted GDB debugging that
>>>> shows the hang is occurring in a different location than what the
>>>> fix addresses.
>>>> Since the original patch is breaking KVM guest bringup completely on
>>>> ppc64le, and the fix patch does not resolve the issue, given the
>>>> severity of this regression (complete KVM breakage on ppc64le), we
>>>> should either find a quick fix or consider reverting the patch until
>>>> a proper solution can be identified.
>>> Based on what you just described, it does not seem like the issue is
>>> related to 98884e0cc10997a17ce9abfd6ff10be19224ca6a at all. If you
>>> revert this patch in your local tree, can you confirm that your issue
>>> gets fixed?
>>
>> Yes, the issue is not seen with the immediate previous commit:
>>
>> commit df8df3cb6b743372ebb335bd8404bc3d748da350 (ani-df8df3cb)
>> Author: Ani Sinha <anisinha@redhat.com>
>> Date: Wed Feb 25 09:19:09 2026 +0530
>>
>> system/physmem: add helper to reattach existing memory after KVM VM
>> fd change
>>
>> After the guest KVM file descriptor has changed as a part of the
>> process of
>> confidential guest reset mechanism, existing memory needs to be
>> reattached to
>> the new file descriptor. This change adds a helper function
>> ram_block_rebind()
>> for this purpose. The next patch will make use of this function.
>>
>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>> Link:
>> https://lore.kernel.org/r/20260225035000.385950-5-anisinha@redhat.com
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>> Looks like the next patch is enabling the functionality of the
>> previous patches in such a way which causes bql_lock() to get stuck on
>> architectures (ppc64le in this case) which does not support this
>> feature yet.
>
> This theory is not substantiated by code or evidence.
> 98884e0cc10997a17ce9abfd6ff10be19224ca6a introduces kvm_reset_vmfd()
> which is called by this block of code with the tip at
> 98884e0cc10997a17ce9abfd6ff10be19224ca6a :
>
> if (!cpus_are_resettable() &&
> (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
> reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
> if (ac->rebuild_guest) {
> ret = ac->rebuild_guest(current_machine);
> if (ret < 0) {
> error_report("unable to rebuild guest: %s(%d)",
> strerror(-ret), ret);
> vm_stop(RUN_STATE_INTERNAL_ERROR);
> } else {
> info_report("virtual machine state has been rebuilt
> with new "
> "guest file handle.");
> guest_state_rebuilt = true;
> }
> } else if (!cpus_are_resettable()) {
> error_report("accelerator does not support reset!");
> } else {
> error_report("accelerator does not support rebuilding guest
> state,"
> " proceeding with normal reset!");
> }
> }
>
> If cpus are resettable, this block will not be called and nothing that
> the patch introduces will have been executed.
> So I think you guys need to explain a bit more why you so strongly
> feel this patch broke it. I am confused and unable to reason this.
>
>>
>> Did you validate your patches on other architectures which does not
>> support this feature yet?
>
> As you have already seen, on other architectures, the entire block of
> code is not executed at all. Only SEV-ES, SEV-SNP and TDX currently
> exercises this.
>
I understand your concern about the code path analysis. Let me clarify
our findings with concrete evidence.

Reproducibility Evidence:
With commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a applied, we can
reproduce the hang 100% of the time across multiple test runs. When we
revert to the previous commit
df8df3cb6b743372ebb335bd8404bc3d748da350, the same KVM guest boots
successfully 100% of the time.

This consistent reproducibility strongly indicates that commit
98884e0cc10997a17ce9abfd6ff10be19224ca6a introduces the regression,
even if the code path analysis suggests otherwise. The issue may
therefore lie not in the reset code path you analyzed, but elsewhere in
the changes introduced by the patch series.

Since you led the development of this patch series, we would
appreciate your help in figuring out this issue.
>>
>>>> Analysis:
>>>> 1. This is not a confidential guest. This is a regular KVM guest
>>>> running on ppc64le.
>>>> 2. The execution flow shows that qemu_system_reset() completes
>>>> successfully and never enters the code path at line 529-543
>>> This is what I expected and therefore, no code related to coco guest
>>> rebuilding is getting executed. Your issue seems to be somewhere
>>> else.
>>
>> The issue occurs only with the introduction of this patch and not with
>> the previous upstream commit as explained above.
>>
>>>> 3. The hang occurs later in qemu_default_main() at system/main.c:49,
>>>> after calling bql_lock()
>>>> 4. The ppc KVM guest boots fine with the previous commit -
>>>> df8df3cb6b743372ebb335bd8404bc3d748da350
>>>> 5. This suggests the issue is not with error handling of -EOPNOTSUPP
>>>> during reset, but bql_lock() getting stuck in qemu_default_main()
>>>> GDB Trace Analysis:
>>>> We set breakpoints at qemu_system_reset() and qemu_default_main() to
>>>> trace the execution flow. The system successfully completes
>>>> qemu_system_reset() without entering the problematic code path where
>>>> the fix provided by you applies (system/runstate.c:529-543).
>>>> # gdb --args /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1
>>>> -machine pseries,accel=kvm -enable-kvm -m 32768 -smp
>>>> 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device
>>>> virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive
>>>> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2
>>>> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev
>>>> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>>>> (gdb) handle SIGUSR1 pass nostop noprint
>>>> Signal Stop Print Pass to program Description
>>>> SIGUSR1 No No Yes User defined signal 1
>>>> (gdb) b qemu_system_reset
>>>> Breakpoint 1 at 0x69a688: file ../system/runstate.c, line 510.
>>>> (gdb) b qemu_default_main
>>>> Breakpoint 2 at 0xa9aeb8: file ../system/main.c, line 45.
>>>> (gdb) r
>>>> Starting program: /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1
>>>> -machine pseries,accel=kvm -enable-kvm -m 32768 -smp
>>>> 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device
>>>> virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive
>>>> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2
>>>> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev
>>>> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>>>> Thread 1 "qemu-system-ppc" hit Breakpoint 1, qemu_system_reset
>>>> (reason=reason@entry=SHUTDOWN_CAUSE_NONE) at
>>>> ../system/runstate.c:513
>>>> 513 AccelClass *ac = ACCEL_GET_CLASS(current_accel());
>>>> (gdb) n
>>>> 517 mc = current_machine ? MACHINE_GET_CLASS(current_machine) :
>>>> NULL;
>>>> (gdb) n
>>>> 519 cpu_synchronize_all_states();
>>>> (gdb) n
>>>> 521 switch (reason) {
>>>> (gdb) n
>>>> 529 if (!cpus_are_resettable() &&
>>>> (gdb) n
>>>> 553 if (mc && mc->reset) {
>>>> (gdb) n
>>>> 554 mc->reset(current_machine, type);
>>>> (gdb) n
>>>> 558 switch (reason) {
>>>> (gdb) n
>>>> 574 if (cpus_are_resettable()) {
>>>> (gdb) n
>>>> 583 cpu_synchronize_all_post_reset();
>>>> (gdb) n
>>>> 587 vm_set_suspended(false);
>>>> (gdb) n
>>>> qdev_machine_creation_done () at ../hw/core/machine.c:1814
>>>> 1814 register_global_state();
>>>> (gdb) n
>>>> qemu_machine_creation_done (errp=0x10123e028 <error_fatal>) at
>>>> ../system/vl.c:2785
>>>> 2785 if (machine->cgs && !machine->cgs->ready) {
>>>> (gdb) n
>>>> 2791 foreach_device_config_or_exit(DEV_GDB, gdbserver_start);
>>>> (gdb) n
>>>> 2793 if (!vga_interface_created && !default_vga &&
>>>> (gdb) n
>>>> qmp_x_exit_preconfig (errp=errp@entry=0x10123e028 <error_fatal>) at
>>>> ../system/vl.c:2815
>>>> 2815 if (loadvm) {
>>>> (gdb) n
>>>> 2820 if (replay_mode != REPLAY_MODE_NONE) {
>>>> (gdb) n
>>>> 2824 if (incoming) {
>>>> (gdb) n
>>>> 2837 } else if (autostart) {
>>>> (gdb) n
>>>> 2838 qmp_cont(NULL);
>>>> (gdb) n
>>>> qemu_init (argc=<optimized out>, argv=<optimized out>) at
>>>> ../system/vl.c:3849
>>>> 3849 qemu_init_displays();
>>>> (gdb) n
>>>> 3850 accel_setup_post(current_machine);
>>>> (gdb) n
>>>> 3851 if (migrate_mode() != MIG_MODE_CPR_EXEC) {
>>>> (gdb) n
>>>> 3852 os_setup_post();
>>>> (gdb) n
>>>> 3854 resume_mux_open();
>>>> (gdb) n
>>>> main (argc=<optimized out>, argv=<optimized out>) at
>>>> ../system/main.c:84
>>>> 84 bql_unlock();
>>>> (gdb) n
>>>> 85 replay_mutex_unlock();
>>>> (gdb) n
>>>> 87 if (qemu_main) {
>>>> (gdb) n
>>>> 93 qemu_default_main(NULL);
>>>> (gdb) n
>>>> Thread 1 "qemu-system-ppc" hit Breakpoint 2, qemu_default_main
>>>> (opaque=opaque@entry=0x0) at ../system/main.c:48
>>>> 48 replay_mutex_lock();
>>>> (gdb) n
>>>> 49 bql_lock();
>>>> (gdb) n
>>>> <hangs>
>>>> <system becomes unresponsive at this point>
>>>> Thanks,
>>>> Misbah Anjum N <misanjumn@ibm.com>
>>>> On 2026-03-09 18:53, Ani Sinha wrote:
>>>>> Yes seems this is an issue and I will fix it. Not sure if the fix
>>>>> will
>>>>> address your issue though ...
>>>>> Can you try the following patch?
>>>>> From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00
>>>>> 2001
>>>>> From: Ani Sinha <anisinha@redhat.com>
>>>>> Date: Mon, 9 Mar 2026 18:44:40 +0530
>>>>> Subject: [PATCH] Fix reset for non-x86 archs that do not support
>>>>> reset yet
>>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>>> ---
>>>>> system/runstate.c | 4 +++-
>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>> diff --git a/system/runstate.c b/system/runstate.c
>>>>> index eca722b43c..c1f41284c9 100644
>>>>> --- a/system/runstate.c
>>>>> +++ b/system/runstate.c
>>>>> @@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
>>>>> (current_machine->new_accel_vmfd_on_reset ||
>>>>> !cpus_are_resettable())) {
>>>>> if (ac->rebuild_guest) {
>>>>> ret = ac->rebuild_guest(current_machine);
>>>>> - if (ret < 0) {
>>>>> + if (ret < 0 && ret != -EOPNOTSUPP) {
>>>>> error_report("unable to rebuild guest: %s(%d)",
>>>>> strerror(-ret), ret);
>>>>> vm_stop(RUN_STATE_INTERNAL_ERROR);
>>>>> + } else if (ret == -EOPNOTSUPP) {
>>>>> + error_report("accelerator does not support
>>>>> reset!");
>>>>> } else {
>>>>> info_report("virtual machine state has been rebuilt
>>>>> with new "
>>>>> "guest file handle.");
>>>>> --
>>>>> 2.42.0
>>>>>> Is this a confidential guest that cannot be normally reset?
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-10 10:05 ` Misbah Anjum N
@ 2026-03-10 10:12 ` Ani Sinha
2026-03-18 8:19 ` Misbah Anjum N
0 siblings, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-03-10 10:12 UTC (permalink / raw)
To: Misbah Anjum N
Cc: Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin, Harsh Prateek Bora,
vaibhav, sbhat
> On 10 Mar 2026, at 3:35 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>
> On 2026-03-10 15:04, Ani Sinha wrote:
>>> On 10 Mar 2026, at 2:38 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>>> On 2026-03-10 14:24, Ani Sinha wrote:
>>>>> On 10 Mar 2026, at 2:09 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>>>>> Hi Ani and Paolo,
>>>>> We have tested the code by applying both the original commit (98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch (commit 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le.
>>>>> However, the issue persists. We've conducted GDB debugging that shows the hang is occurring in a different location than what the fix addresses.
>>>>> Since the original patch is breaking KVM guest bringup completely on ppc64le, and the fix patch does not resolve the issue, given the severity of this regression (complete KVM breakage on ppc64le), we should either find a quick fix or consider reverting the patch until a proper solution can be identified.
>>>> Based on what you just described, it does not seem like the issue is
>>>> related to 98884e0cc10997a17ce9abfd6ff10be19224ca6a at all. If you
>>>> revert this patch in your local tree, can you confirm that your issue
>>>> gets fixed?
>>> Yes, the issue is not seen with the immediate previous commit:
>>> commit df8df3cb6b743372ebb335bd8404bc3d748da350 (ani-df8df3cb)
>>> Author: Ani Sinha <anisinha@redhat.com>
>>> Date: Wed Feb 25 09:19:09 2026 +0530
>>> system/physmem: add helper to reattach existing memory after KVM VM fd change
>>> After the guest KVM file descriptor has changed as a part of the process of
>>> confidential guest reset mechanism, existing memory needs to be reattached to
>>> the new file descriptor. This change adds a helper function ram_block_rebind()
>>> for this purpose. The next patch will make use of this function.
>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>> Link: https://lore.kernel.org/r/20260225035000.385950-5-anisinha@redhat.com
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Looks like the next patch is enabling the functionality of the previous patches in such a way which causes bql_lock() to get stuck on architectures (ppc64le in this case) which does not support this feature yet.
>> This theory is not substantiated by code or evidence.
>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a introduces kvm_reset_vmfd()
>> which is called by this block of code with the tip at
>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a :
>> if (!cpus_are_resettable() &&
>> (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
>> reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
>> if (ac->rebuild_guest) {
>> ret = ac->rebuild_guest(current_machine);
>> if (ret < 0) {
>> error_report("unable to rebuild guest: %s(%d)",
>> strerror(-ret), ret);
>> vm_stop(RUN_STATE_INTERNAL_ERROR);
>> } else {
>> info_report("virtual machine state has been rebuilt with new "
>> "guest file handle.");
>> guest_state_rebuilt = true;
>> }
>> } else if (!cpus_are_resettable()) {
>> error_report("accelerator does not support reset!");
>> } else {
>> error_report("accelerator does not support rebuilding guest state,"
>> " proceeding with normal reset!");
>> }
>> }
>> If cpus are resettable, this block will not be called and nothing that
>> the patch introduces will have been executed.
>> So I think you guys need to explain a bit more why you so strongly
>> feel this patch broke it. I am confused and unable to reason this.
>>> Did you validate your patches on other architectures which does not support this feature yet?
>> As you have already seen, on other architectures, the entire block of
>> code is not executed at all. Only SEV-ES, SEV-SNP and TDX currently
>> exercises this.
>
> I understand your concern about the code path analysis. Let me clarify our findings with concrete evidence.
>
> Reproducibility Evidence:
> With commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a applied, we are able to reproduce the hang issue 100% of the time across multiple test runs. When we revert to the previous commit df8df3cb6b743372ebb335bd8404bc3d748da350, the same KVM guest boots successfully 100% of the time.
>
> This consistent reproducibility strongly indicates that commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a is introducing the regression, even if the code path analysis suggests otherwise. This suggests the issue may not be in the code path, but rather in the changes introduced by the patch series.
>
> Since you led the development of this patch series, we would appreciate your help in figuring out this issue.
I am really not sure what changes in that patch can cause this breakage in a completely unrelated area when the changes are not even executed.
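The guard quoted above can be modeled in isolation to see which inputs reach rebuild_guest(). This is a minimal sketch and not QEMU code: `Cause` and `enters_rebuild_block()` are hypothetical names, and only the three shutdown causes that matter in this thread are modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for QEMU's ShutdownCause (three values only). */
typedef enum {
    CAUSE_NONE,
    CAUSE_GUEST_RESET,
    CAUSE_HOST_QMP_SYSTEM_RESET,
} Cause;

/* Same boolean shape as the quoted condition in qemu_system_reset(). */
static bool enters_rebuild_block(bool cpus_resettable, Cause reason)
{
    return !cpus_resettable &&
           (reason == CAUSE_GUEST_RESET ||
            reason == CAUSE_HOST_QMP_SYSTEM_RESET);
}
```

With reason set to CAUSE_NONE, as in the GDB trace where reason=SHUTDOWN_CAUSE_NONE, the function returns false whatever the CPUs report. That is consistent with the trace stepping from line 529 straight to line 553.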
>
>>>>> Analysis:
>>>>> 1. This is not a confidential guest. This is a regular KVM guest running on ppc64le.
>>>>> 2. The execution flow shows that qemu_system_reset() completes successfully and never enters the code path at line 529-543
>>>> This is what I expected and therefore, no code related to coco guest
>>>> rebuilding is getting executed. Your issue seems to be somewhere else.
>>> The issue occurs only with the introduction of this patch and not with the previous upstream commit as explained above.
>>>>> 3. The hang occurs later in qemu_default_main() at system/main.c:49, after calling bql_lock()
>>>>> 4. The ppc KVM guest boots fine with the previous commit - df8df3cb6b743372ebb335bd8404bc3d748da350
>>>>> 5. This suggests the issue is not with error handling of -EOPNOTSUPP during reset, but bql_lock() getting stuck in qemu_default_main()
>>>>> GDB Trace Analysis:
>>>>> We set breakpoints at qemu_system_reset() and qemu_default_main() to trace the execution flow. The system successfully completes qemu_system_reset() without entering the problematic code path where the fix provided by you applies (system/runstate.c:529-543).
>>>>> # gdb --args /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>>>>> (gdb) handle SIGUSR1 pass nostop noprint
>>>>> Signal Stop Print Pass to program Description
>>>>> SIGUSR1 No No Yes User defined signal 1
>>>>> (gdb) b qemu_system_reset
>>>>> Breakpoint 1 at 0x69a688: file ../system/runstate.c, line 510.
>>>>> (gdb) b qemu_default_main
>>>>> Breakpoint 2 at 0xa9aeb8: file ../system/main.c, line 45.
>>>>> (gdb) r
>>>>> Starting program: /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>>>>> Thread 1 "qemu-system-ppc" hit Breakpoint 1, qemu_system_reset (reason=reason@entry=SHUTDOWN_CAUSE_NONE) at ../system/runstate.c:513
>>>>> 513 AccelClass *ac = ACCEL_GET_CLASS(current_accel());
>>>>> (gdb) n
>>>>> 517 mc = current_machine ? MACHINE_GET_CLASS(current_machine) : NULL;
>>>>> (gdb) n
>>>>> 519 cpu_synchronize_all_states();
>>>>> (gdb) n
>>>>> 521 switch (reason) {
>>>>> (gdb) n
>>>>> 529 if (!cpus_are_resettable() &&
>>>>> (gdb) n
>>>>> 553 if (mc && mc->reset) {
>>>>> (gdb) n
>>>>> 554 mc->reset(current_machine, type);
>>>>> (gdb) n
>>>>> 558 switch (reason) {
>>>>> (gdb) n
>>>>> 574 if (cpus_are_resettable()) {
>>>>> (gdb) n
>>>>> 583 cpu_synchronize_all_post_reset();
>>>>> (gdb) n
>>>>> 587 vm_set_suspended(false);
>>>>> (gdb) n
>>>>> qdev_machine_creation_done () at ../hw/core/machine.c:1814
>>>>> 1814 register_global_state();
>>>>> (gdb) n
>>>>> qemu_machine_creation_done (errp=0x10123e028 <error_fatal>) at ../system/vl.c:2785
>>>>> 2785 if (machine->cgs && !machine->cgs->ready) {
>>>>> (gdb) n
>>>>> 2791 foreach_device_config_or_exit(DEV_GDB, gdbserver_start);
>>>>> (gdb) n
>>>>> 2793 if (!vga_interface_created && !default_vga &&
>>>>> (gdb) n
>>>>> qmp_x_exit_preconfig (errp=errp@entry=0x10123e028 <error_fatal>) at ../system/vl.c:2815
>>>>> 2815 if (loadvm) {
>>>>> (gdb) n
>>>>> 2820 if (replay_mode != REPLAY_MODE_NONE) {
>>>>> (gdb) n
>>>>> 2824 if (incoming) {
>>>>> (gdb) n
>>>>> 2837 } else if (autostart) {
>>>>> (gdb) n
>>>>> 2838 qmp_cont(NULL);
>>>>> (gdb) n
>>>>> qemu_init (argc=<optimized out>, argv=<optimized out>) at ../system/vl.c:3849
>>>>> 3849 qemu_init_displays();
>>>>> (gdb) n
>>>>> 3850 accel_setup_post(current_machine);
>>>>> (gdb) n
>>>>> 3851 if (migrate_mode() != MIG_MODE_CPR_EXEC) {
>>>>> (gdb) n
>>>>> 3852 os_setup_post();
>>>>> (gdb) n
>>>>> 3854 resume_mux_open();
>>>>> (gdb) n
>>>>> main (argc=<optimized out>, argv=<optimized out>) at ../system/main.c:84
>>>>> 84 bql_unlock();
>>>>> (gdb) n
>>>>> 85 replay_mutex_unlock();
>>>>> (gdb) n
>>>>> 87 if (qemu_main) {
>>>>> (gdb) n
>>>>> 93 qemu_default_main(NULL);
>>>>> (gdb) n
>>>>> Thread 1 "qemu-system-ppc" hit Breakpoint 2, qemu_default_main (opaque=opaque@entry=0x0) at ../system/main.c:48
>>>>> 48 replay_mutex_lock();
>>>>> (gdb) n
>>>>> 49 bql_lock();
>>>>> (gdb) n
>>>>> <hangs>
>>>>> <system becomes unresponsive at this point>
>>>>> Thanks,
>>>>> Misbah Anjum N <misanjumn@ibm.com>
>>>>> On 2026-03-09 18:53, Ani Sinha wrote:
>>>>>> Yes seems this is an issue and I will fix it. Not sure if the fix will
>>>>>> address your issue though ...
>>>>>> Can you try the following patch?
>>>>>> From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00 2001
>>>>>> From: Ani Sinha <anisinha@redhat.com>
>>>>>> Date: Mon, 9 Mar 2026 18:44:40 +0530
>>>>>> Subject: [PATCH] Fix reset for non-x86 archs that do not support reset yet
>>>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>>>> ---
>>>>>> system/runstate.c | 4 +++-
>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>> diff --git a/system/runstate.c b/system/runstate.c
>>>>>> index eca722b43c..c1f41284c9 100644
>>>>>> --- a/system/runstate.c
>>>>>> +++ b/system/runstate.c
>>>>>> @@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
>>>>>> (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
>>>>>> if (ac->rebuild_guest) {
>>>>>> ret = ac->rebuild_guest(current_machine);
>>>>>> - if (ret < 0) {
>>>>>> + if (ret < 0 && ret != -EOPNOTSUPP) {
>>>>>> error_report("unable to rebuild guest: %s(%d)",
>>>>>> strerror(-ret), ret);
>>>>>> vm_stop(RUN_STATE_INTERNAL_ERROR);
>>>>>> + } else if (ret == -EOPNOTSUPP) {
>>>>>> + error_report("accelerator does not support reset!");
>>>>>> } else {
>>>>>> info_report("virtual machine state has been rebuilt with new "
>>>>>> "guest file handle.");
>>>>>> --
>>>>>> 2.42.0
>>>>>>> Is this a confidential guest that cannot be normally reset?
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-10 10:12 ` Ani Sinha
@ 2026-03-18 8:19 ` Misbah Anjum N
2026-03-18 8:39 ` Ani Sinha
0 siblings, 1 reply; 19+ messages in thread
From: Misbah Anjum N @ 2026-03-18 8:19 UTC (permalink / raw)
To: Ani Sinha, Pbonzini, Qemu Devel, Qemu Ppc
Cc: npiggin, Harsh Prateek Bora, vaibhav, sbhat
Hi Ani and Paolo,
Following up on the KVM guest boot issue due to commit 98884e0c, I have
conducted additional testing that reveals important new information
about the nature of this issue.
The hang is specifically triggered when SMP is configured, that is, when
-smp parameter is provided in the QEMU command. This is also validated
via KVM Unit Tests involving SMP which are failing due to the same
commit.
Test Results:
Without SMP (boots successfully):
/usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm \
-enable-kvm -m 32768 -nographic -device virtio-balloon \
-device virtio-scsi-pci,id=scsi0 \
-drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le-hpb.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 \
-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 \
-netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
SLOF
**********************************************************************
QEMU Starting
Build Date = Oct 26 2025 18:45:22
FW Version = release 20251026
Press "s" to enter Open Firmware.
Populating /vdevice methods
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /pci@800000020000000
...
...
Result: Guest boots successfully
With SMP (hangs indefinitely):
/usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm \
-enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \
-device virtio-balloon -device virtio-scsi-pci,id=scsi0 \
-drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le-hpb.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 \
-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 \
-netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
...
...
Result: Hangs indefinitely at bql_lock() in qemu_default_main()
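One reading of a hang inside bql_lock() is that another thread holds the lock and never releases it. The trace in this thread does not show that, so treat it as an assumption. The sketch below uses plain pthreads, not QEMU's BQL implementation, and `Probe`, `probe_thread()` and `probe_lock()` are hypothetical names. It shows how a bounded wait turns an indefinite block into an observable ETIMEDOUT.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Arguments and result for the probing thread. */
typedef struct {
    pthread_mutex_t *lock;
    int timeout_ms;
    int result;          /* 0 if acquired, ETIMEDOUT if still held */
} Probe;

/* Try to take the lock, giving up after timeout_ms instead of blocking. */
static void *probe_thread(void *opaque)
{
    Probe *p = opaque;
    struct timespec deadline;

    /* pthread_mutex_timedlock() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += p->timeout_ms / 1000;
    deadline.tv_nsec += (long)(p->timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    p->result = pthread_mutex_timedlock(p->lock, &deadline);
    if (p->result == 0) {
        pthread_mutex_unlock(p->lock);   /* only probing, release at once */
    }
    return NULL;
}

/* Run the probe on a second thread and return its result (-1 on error). */
static int probe_lock(pthread_mutex_t *lock, int timeout_ms)
{
    Probe p = { lock, timeout_ms, -1 };
    pthread_t tid;

    if (pthread_create(&tid, NULL, probe_thread, &p) != 0) {
        return -1;
    }
    pthread_join(tid, NULL);
    return p.result;
}
```

Build with -pthread. While the calling thread holds the mutex, probe_lock() reports ETIMEDOUT; once the mutex is released it reports 0. On a real hung QEMU process, a backtrace of all threads in GDB would be the more direct way to see which thread owns the lock.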
KVM Unit Tests:
Running kvm-unit-tests confirms the SMP dependency. Note that tests
explicitly involving SMP (smp, smp-smt, atomics) all fail with SIGKILL,
while single-threaded tests pass.
# ./run_tests.sh
FAIL selftest-setup (terminated on SIGKILL)
PASS selftest-migration (2 tests)
PASS selftest-migration-skip (1 tests)
PASS migration-memory (1 tests)
PASS spapr_hcall (9 tests, 1 skipped)
PASS spapr_vpa (13 tests)
PASS rtas-get-time-of-day (10 tests)
PASS rtas-get-time-of-day-base (10 tests)
PASS rtas-set-time-of-day (5 tests)
PASS emulator (4 tests)
PASS interrupts (13 tests)
FAIL mmu (terminated on SIGKILL)
FAIL smp (terminated on SIGKILL)
FAIL smp-smt (terminated on SIGKILL)
SKIP smp-thread-single (qemu-system-ppc64: -accel tcg,thread=single: invalid accelerator tcg)
FAIL atomics (terminated on SIGKILL)
PASS atomics-migration (1 tests)
PASS timebase (12 tests, 1 known failures, 1 skipped)
SKIP timebase-icount (qemu-system-ppc64: -icount shift=5: cannot configure icount, TCG support not available)
FAIL h_cede_tm
PASS sprs (14 tests)
FAIL sprs-migration (14 tests, 1 unexpected failures)
PASS sieve
Thanks,
Misbah Anjum N <misanjum@linux.ibm.com>
On 2026-03-10 15:42, Ani Sinha wrote:
>>> This theory is not substantiated by code or evidence.
>>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a introduces kvm_reset_vmfd()
>>> which is called by this block of code with the tip at
>>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a :
>>> if (!cpus_are_resettable() &&
>>> (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
>>> reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
>>> if (ac->rebuild_guest) {
>>> ret = ac->rebuild_guest(current_machine);
>>> if (ret < 0) {
>>> error_report("unable to rebuild guest: %s(%d)",
>>> strerror(-ret), ret);
>>> vm_stop(RUN_STATE_INTERNAL_ERROR);
>>> } else {
>>> info_report("virtual machine state has been rebuilt with new "
>>> "guest file handle.");
>>> guest_state_rebuilt = true;
>>> }
>>> } else if (!cpus_are_resettable()) {
>>> error_report("accelerator does not support reset!");
>>> } else {
>>> error_report("accelerator does not support rebuilding guest state,"
>>> " proceeding with normal reset!");
>>> }
>>> }
>>> If cpus are resettable, this block will not be called and nothing that
>>> the patch introduces will have been executed.
>>> So I think you guys need to explain a bit more why you so strongly
>>> feel this patch broke it. I am confused and unable to reason about this.
>>>> Did you validate your patches on other architectures that do not
>>>> support this feature yet?
>>> As you have already seen, on other architectures, the entire block of
>>> code is not executed at all. Only SEV-ES, SEV-SNP and TDX currently
>>> exercise this.
>>
>> I understand your concern about the code path analysis. Let me clarify
>> our findings with concrete evidence.
>>
>> Reproducibility Evidence:
>> With commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a applied, we are
>> able to reproduce the hang issue 100% of the time across multiple test
>> runs. When we revert to the previous commit
>> df8df3cb6b743372ebb335bd8404bc3d748da350, the same KVM guest boots
>> successfully 100% of the time.
>>
>> This consistent reproducibility strongly indicates that commit
>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a is introducing the
>> regression, even if the code path analysis suggests otherwise. This
>> suggests the issue may not be in the code path, but rather in the
>> changes introduced by the patch series.
>>
>> Since you led the development of this patch series, we would
>> appreciate your help in figuring out this issue.
>
> I am really not sure what changes in that patch can cause this
> breakage in a completely unrelated area when the changes are not even
> executed.
>
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-18 8:19 ` Misbah Anjum N
@ 2026-03-18 8:39 ` Ani Sinha
2026-03-18 9:30 ` Ani Sinha
0 siblings, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-03-18 8:39 UTC (permalink / raw)
To: Misbah Anjum N
Cc: Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin, Harsh Prateek Bora,
vaibhav, sbhat
> On 18 Mar 2026, at 1:49 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>
> Hi Ani and Paolo,
>
> Following up on the KVM guest boot issue due to commit 98884e0c, I have conducted additional testing that reveals important new information about the nature of this issue.
>
> The hang is specifically triggered when SMP is configured, that is, when -smp parameter is provided in the QEMU command. This is also validated via KVM Unit Tests involving SMP which are failing due to the same commit.
So basically what we know is:
- Issue seems to show on ppc64.
- tree with df8df3cb6b at tip does not show issue.
- tree with next commit 98884e0cc1 at tip shows the issue.
- kvm_reset_vmfd() introduced by 98884e0cc1 is not called.
You think one of these commits is the cause of the issue (which I personally cannot agree with):
98884e0cc1 accel/kvm: add changes required to support KVM VM file descriptor change
df8df3cb6b system/physmem: add helper to reattach existing memory after KVM VM fd change
4003e5e65f hw/accel: add a per-accelerator callback to change VM accelerator handle
2391125f13 accel/kvm: add confidential class member to indicate guest rebuild capability
b3f0a55576 i386/kvm: avoid installing duplicate msr entries in msr_handlers
None of the above commits does anything SMP related.
>
> Test Results:
> Without SMP (boots successfully):
> /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm \
> -enable-kvm -m 32768 -nographic -device virtio-balloon \
> -device virtio-scsi-pci,id=scsi0 \
> -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le-hpb.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 \
> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 \
> -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>
> SLOF **********************************************************************
> QEMU Starting
> Build Date = Oct 26 2025 18:45:22
> FW Version = release 20251026
> Press "s" to enter Open Firmware.
>
> Populating /vdevice methods
> Populating /vdevice/vty@71000000
> Populating /vdevice/nvram@71000001
> Populating /pci@800000020000000
> ...
> ...
> Result: Guest boots successfully
>
> With SMP (hangs indefinitely):
> /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm \
> -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \
> -device virtio-balloon -device virtio-scsi-pci,id=scsi0 \
> -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le-hpb.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 \
> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 \
> -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
> ...
> ...
> Result: Hangs indefinitely at bql_lock() in qemu_default_main()
>
> KVM Unit Tests:
> Running kvm-unit-tests confirms the SMP dependency. Note that tests explicitly involving SMP (smp, smp-smt, atomics) all fail with SIGKILL, while single-threaded tests pass.
>
> # ./run_tests.sh
> FAIL selftest-setup (terminated on SIGKILL)
> PASS selftest-migration (2 tests)
> PASS selftest-migration-skip (1 tests)
> PASS migration-memory (1 tests)
> PASS spapr_hcall (9 tests, 1 skipped)
> PASS spapr_vpa (13 tests)
> PASS rtas-get-time-of-day (10 tests)
> PASS rtas-get-time-of-day-base (10 tests)
> PASS rtas-set-time-of-day (5 tests)
> PASS emulator (4 tests)
> PASS interrupts (13 tests)
> FAIL mmu (terminated on SIGKILL)
> FAIL smp (terminated on SIGKILL)
> FAIL smp-smt (terminated on SIGKILL)
> SKIP smp-thread-single (qemu-system-ppc64: -accel tcg,thread=single: invalid accelerator tcg)
> FAIL atomics (terminated on SIGKILL)
> PASS atomics-migration (1 tests)
> PASS timebase (12 tests, 1 known failures, 1 skipped)
> SKIP timebase-icount (qemu-system-ppc64: -icount shift=5: cannot configure icount, TCG support not available)
> FAIL h_cede_tm
> PASS sprs (14 tests)
> FAIL sprs-migration (14 tests, 1 unexpected failures)
> PASS sieve
>
> Thanks,
> Misbah Anjum N <misanjum@linux.ibm.com>
>
>
> On 2026-03-10 15:42, Ani Sinha wrote:
>>>> This theory is not substantiated by code or evidence.
>>>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a introduces kvm_reset_vmfd()
>>>> which is called by this block of code with the tip at
>>>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a :
>>>> if (!cpus_are_resettable() &&
>>>> (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
>>>> reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
>>>> if (ac->rebuild_guest) {
>>>> ret = ac->rebuild_guest(current_machine);
>>>> if (ret < 0) {
>>>> error_report("unable to rebuild guest: %s(%d)",
>>>> strerror(-ret), ret);
>>>> vm_stop(RUN_STATE_INTERNAL_ERROR);
>>>> } else {
>>>> info_report("virtual machine state has been rebuilt with new "
>>>> "guest file handle.");
>>>> guest_state_rebuilt = true;
>>>> }
>>>> } else if (!cpus_are_resettable()) {
>>>> error_report("accelerator does not support reset!");
>>>> } else {
>>>> error_report("accelerator does not support rebuilding guest state,"
>>>> " proceeding with normal reset!");
>>>> }
>>>> }
>>>> If cpus are resettable, this block will not be called and nothing that
>>>> the patch introduces will have been executed.
>>>> So I think you guys need to explain a bit more why you so strongly
>>>> feel this patch broke it. I am confused and unable to reason about this.
>>>>> Did you validate your patches on other architectures that do not support this feature yet?
>>>> As you have already seen, on other architectures, the entire block of
>>>> code is not executed at all. Only SEV-ES, SEV-SNP and TDX currently
>>>> exercise this.
>>> I understand your concern about the code path analysis. Let me clarify our findings with concrete evidence.
>>> Reproducibility Evidence:
>>> With commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a applied, we are able to reproduce the hang issue 100% of the time across multiple test runs. When we revert to the previous commit df8df3cb6b743372ebb335bd8404bc3d748da350, the same KVM guest boots successfully 100% of the time.
>>> This consistent reproducibility strongly indicates that commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a is introducing the regression, even if the code path analysis suggests otherwise. This suggests the issue may not be in the code path, but rather in the changes introduced by the patch series.
>>> Since you led the development of this patch series, we would appreciate your help in figuring out this issue.
>> I am really not sure what changes in that patch can cause this
>> breakage in a completely unrelated area when the changes are not even
>> executed.
>
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-18 8:39 ` Ani Sinha
@ 2026-03-18 9:30 ` Ani Sinha
2026-04-06 8:54 ` Misbah Anjum N
0 siblings, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-03-18 9:30 UTC (permalink / raw)
To: Misbah Anjum N
Cc: Paolo Bonzini, qemu-devel, Qemu Ppc, npiggin, Harsh Prateek Bora,
vaibhav, sbhat
> On 18 Mar 2026, at 2:09 PM, Ani Sinha <anisinha@redhat.com> wrote:
>
>
>
>> On 18 Mar 2026, at 1:49 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>>
>> Hi Ani and Paolo,
>>
>> Following up on the KVM guest boot issue due to commit 98884e0c, I have conducted additional testing that reveals important new information about the nature of this issue.
>>
>> The hang is specifically triggered when SMP is configured, that is, when -smp parameter is provided in the QEMU command. This is also validated via KVM Unit Tests involving SMP which are failing due to the same commit.
>
> So basically what we know is:
> - Issue seems to show on ppc64.
> - tree with df8df3cb6b at tip does not show issue.
> - tree with next commit 98884e0cc1 at tip shows the issue.
> - kvm_reset_vmfd() introduced by 98884e0cc1 is not called.
>
> You think one of these commits is the cause of the issue (which I personally cannot agree with):
>
> 98884e0cc1 accel/kvm: add changes required to support KVM VM file descriptor change
> df8df3cb6b system/physmem: add helper to reattach existing memory after KVM VM fd change
> 4003e5e65f hw/accel: add a per-accelerator callback to change VM accelerator handle
> 2391125f13 accel/kvm: add confidential class member to indicate guest rebuild capability
> b3f0a55576 i386/kvm: avoid installing duplicate msr entries in msr_handlers
>
> None of the above commits does anything SMP related.
One possible thing to try is:
Revert everything in stubs/kvm.c, and with it the related changes in stubs/meson.build, include/system/kvm.h and target/i386/kvm/kvm.c introduced by 98884e0cc1.
You will have to comment out the calls to kvm_arch_supports_vmfd_change() and kvm_arch_on_vmfd_change() in kvm_reset_vmfd(). Since kvm_reset_vmfd() is not called anyway, it should make no difference if those calls are commented out.
Let me know what you get after doing the above.
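For reference, the pieces this experiment removes can be modeled as below. This is a simplified sketch under stated assumptions: the real hooks take MachineState and KVMState arguments, and `reset_vmfd_model()` is a hypothetical stand-in that keeps only the two hook calls from kvm_reset_vmfd(). The stub bodies follow stubs/kvm.c as added by 98884e0cc1.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stub behaviour as in stubs/kvm.c: no support for VM fd change. */
static bool stub_supports_vmfd_change(void)
{
    return false;
}

/* Stub behaviour as in stubs/kvm.c: reaching this is a programming error. */
static int stub_on_vmfd_change(void)
{
    abort();
}

/* Hypothetical model of the two hook checks inside kvm_reset_vmfd(). */
static int reset_vmfd_model(bool (*supports)(void), int (*on_change)(void))
{
    int ret;

    if (!supports()) {
        return -EOPNOTSUPP;      /* early bail-out, on_change never runs */
    }
    ret = on_change();
    if (ret < 0) {
        return ret;
    }
    return 0;
}
```

With the stub hooks, the model returns -EOPNOTSUPP before the aborting hook can run. And if the caller is never invoked at all, as the GDB trace suggests for this guest, neither hook is reached in the first place.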
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-03-18 9:30 ` Ani Sinha
@ 2026-04-06 8:54 ` Misbah Anjum N
2026-04-07 4:09 ` Ani Sinha
2026-04-09 16:18 ` Harsh Prateek Bora
0 siblings, 2 replies; 19+ messages in thread
From: Misbah Anjum N @ 2026-04-06 8:54 UTC (permalink / raw)
To: Ani Sinha, Pbonzini, Qemu Devel, Qemu Ppc
Cc: npiggin, Harsh Prateek Bora, vaibhav, sbhat
Hi Ani,
I've completed the testing you suggested. Unfortunately, the SMP hang
still persists with these changes.
Changes made:
As requested, I reverted everything in stubs/kvm.c and the related
changes in stubs/meson.build, include/system/kvm.h, and
target/i386/kvm/kvm.c. I also commented out the calls to
kvm_arch_supports_vmfd_change() and kvm_arch_on_vmfd_change() in
kvm_reset_vmfd().
Test result:
The issue persists - guests still hang indefinitely during boot when SMP
is configured.
Git diff:
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index cc5c42ce4d..04b9cbe7c9 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2622,11 +2622,12 @@ static int kvm_reset_vmfd(MachineState *ms)
* bail if the current architecture does not support VM file
* descriptor change.
*/
- if (!kvm_arch_supports_vmfd_change()) {
+ /*if (!kvm_arch_supports_vmfd_change()) {
error_report("This target architecture does not support KVM VM "
"file descriptor change.");
return -EOPNOTSUPP;
}
+ */
s = KVM_STATE(ms->accelerator);
kml = &s->memory_listener;
@@ -2659,10 +2660,10 @@ static int kvm_reset_vmfd(MachineState *ms)
}
assert(!err);
- ret = kvm_arch_on_vmfd_change(ms, s);
+ /*ret = kvm_arch_on_vmfd_change(ms, s);
if (ret < 0) {
return ret;
- }
+ }*/
if (s->kernel_irqchip_allowed) {
do_kvm_irqchip_create(s);
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 5fc7251fd9..0dad0079ed 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -456,8 +456,6 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram_addr,
#endif /* COMPILING_PER_TARGET */
-bool kvm_arch_supports_vmfd_change(void);
-int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s);
void kvm_cpu_synchronize_state(CPUState *cpu);
diff --git a/stubs/kvm.c b/stubs/kvm.c
deleted file mode 100644
index 2db61d89a7..0000000000
--- a/stubs/kvm.c
+++ /dev/null
@@ -1,22 +0,0 @@
-/*
- * kvm target arch specific stubs
- *
- * Copyright (c) 2026 Red Hat, Inc.
- *
- * Author:
- * Ani Sinha <anisinha@redhat.com>
- *
- * SPDX-License-Identifier: GPL-2.0-or-later
- */
-#include "qemu/osdep.h"
-#include "system/kvm.h"
-
-int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
-{
- abort();
-}
-
-bool kvm_arch_supports_vmfd_change(void)
-{
- return false;
-}
diff --git a/stubs/meson.build b/stubs/meson.build
index 6ae478bacc..8a07059500 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -74,7 +74,6 @@ if have_system
if igvm.found()
stub_ss.add(files('igvm.c'))
endif
- stub_ss.add(files('kvm.c'))
stub_ss.add(files('target-get-monitor-def.c'))
stub_ss.add(files('target-monitor-defs.c'))
stub_ss.add(files('win32-kbd-hook.c'))
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 524b5276a6..3dfd9a5974 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3389,15 +3389,6 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
return 0;
}
-int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
-{
- abort();
-}
-
-bool kvm_arch_supports_vmfd_change(void)
-{
- return false;
-}
int kvm_arch_init(MachineState *ms, KVMState *s)
{
I've also tested with the latest QEMU build from master, and the issue
still persists there as well. Could you suggest what additional
debugging steps I should take to help identify the root cause?
Thanks,
Misbah Anjum N <misanjum@linux.ibm.com>
On 2026-03-18 15:00, Ani Sinha wrote:
> One possible thing to try is:
>
> Revert everything in stubs/kvm.c and hence changes in
> stubs/meson.build, include/system/kvm.h and in target/i386/kvm/kvm.c
> introduced by 98884e0cc1 .
> You will have to comment out calls to kvm_arch_supports_vmfd_change()
> and kvm_arch_on_vmfd_change() in kvm_reset_vmfd(). Since
> kvm_reset_vmfd() is not called anyway, it should make no difference
> if those calls are commented out.
>
> Let me know what you get after doing the above.
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-04-06 8:54 ` Misbah Anjum N
@ 2026-04-07 4:09 ` Ani Sinha
2026-04-07 13:45 ` Ani Sinha
2026-04-09 16:18 ` Harsh Prateek Bora
1 sibling, 1 reply; 19+ messages in thread
From: Ani Sinha @ 2026-04-07 4:09 UTC (permalink / raw)
To: Misbah Anjum N
Cc: Pbonzini, Qemu Devel, Qemu Ppc, npiggin, Harsh Prateek Bora,
vaibhav, sbhat
> On 6 Apr 2026, at 2:24 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>
> Hi Ani,
> I've completed the testing you suggested. Unfortunately, the SMP hang still persists with these changes.
>
> Changes made:
> As requested, I reverted everything in stubs/kvm.c and the related changes in stubs/meson.build, include/system/kvm.h, and target/i386/kvm/kvm.c. I also commented out the calls to kvm_arch_supports_vmfd_change() and kvm_arch_on_vmfd_change() in kvm_reset_vmfd().
>
> Test result:
> The issue persists - guests still hang indefinitely during boot when SMP is configured.
>
> Git diff:
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index cc5c42ce4d..04b9cbe7c9 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2622,11 +2622,12 @@ static int kvm_reset_vmfd(MachineState *ms)
> * bail if the current architecture does not support VM file
> * descriptor change.
> */
> - if (!kvm_arch_supports_vmfd_change()) {
> + /*if (!kvm_arch_supports_vmfd_change()) {
> error_report("This target architecture does not support KVM VM "
> "file descriptor change.");
> return -EOPNOTSUPP;
> }
> + */
>
> s = KVM_STATE(ms->accelerator);
> kml = &s->memory_listener;
> @@ -2659,10 +2660,10 @@ static int kvm_reset_vmfd(MachineState *ms)
> }
> assert(!err);
>
> - ret = kvm_arch_on_vmfd_change(ms, s);
> + /*ret = kvm_arch_on_vmfd_change(ms, s);
> if (ret < 0) {
> return ret;
> - }
> + }*/
>
> if (s->kernel_irqchip_allowed) {
> do_kvm_irqchip_create(s);
>
> diff --git a/include/system/kvm.h b/include/system/kvm.h
> index 5fc7251fd9..0dad0079ed 100644
> --- a/include/system/kvm.h
> +++ b/include/system/kvm.h
> @@ -456,8 +456,6 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram_addr,
>
> #endif /* COMPILING_PER_TARGET */
>
> -bool kvm_arch_supports_vmfd_change(void);
> -int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s);
>
> void kvm_cpu_synchronize_state(CPUState *cpu);
>
> diff --git a/stubs/kvm.c b/stubs/kvm.c
> deleted file mode 100644
> index 2db61d89a7..0000000000
> --- a/stubs/kvm.c
> +++ /dev/null
> @@ -1,22 +0,0 @@
> -/*
> - * kvm target arch specific stubs
> - *
> - * Copyright (c) 2026 Red Hat, Inc.
> - *
> - * Author:
> - * Ani Sinha <anisinha@redhat.com>
> - *
> - * SPDX-License-Identifier: GPL-2.0-or-later
> - */
> -#include "qemu/osdep.h"
> -#include "system/kvm.h"
> -
> -int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
> -{
> - abort();
> -}
> -
> -bool kvm_arch_supports_vmfd_change(void)
> -{
> - return false;
> -}
>
> diff --git a/stubs/meson.build b/stubs/meson.build
> index 6ae478bacc..8a07059500 100644
> --- a/stubs/meson.build
> +++ b/stubs/meson.build
> @@ -74,7 +74,6 @@ if have_system
> if igvm.found()
> stub_ss.add(files('igvm.c'))
> endif
> - stub_ss.add(files('kvm.c'))
> stub_ss.add(files('target-get-monitor-def.c'))
> stub_ss.add(files('target-monitor-defs.c'))
> stub_ss.add(files('win32-kbd-hook.c'))
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 524b5276a6..3dfd9a5974 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -3389,15 +3389,6 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
> return 0;
> }
>
> -int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
> -{
> - abort();
> -}
> -
> -bool kvm_arch_supports_vmfd_change(void)
> -{
> - return false;
> -}
>
> int kvm_arch_init(MachineState *ms, KVMState *s)
> {
>
>
> I've also tested with the latest QEMU build from master, and the issue still persists there as well. Could you suggest what additional debugging steps I should take to help identify the root cause?
Since you mentioned that 98884e0cc1 is the root of the problem, the intention is to bisect within this patch and find the problematic hunk. You reported that reverting a major part of this patch did not make a difference. Maybe you can try reverting the following refactoring too (in addition to the reverts above) and see what happens.
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0d8b0c4347..cc5c42ce4d 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2415,11 +2415,9 @@ void kvm_irqchip_set_qemuirq_gsi(KVMState *s, qemu_irq irq, int gsi)
g_hash_table_insert(s->gsimap, irq, GINT_TO_POINTER(gsi));
}
-static void kvm_irqchip_create(KVMState *s)
+static void do_kvm_irqchip_create(KVMState *s)
{
int ret;
-
- assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
if (kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
;
} else if (kvm_check_extension(s, KVM_CAP_S390_IRQCHIP)) {
@@ -2452,7 +2450,13 @@ static void kvm_irqchip_create(KVMState *s)
fprintf(stderr, "Create kernel irqchip failed: %s\n", strerror(-ret));
exit(1);
}
+}
+
+static void kvm_irqchip_create(KVMState *s)
+{
+ assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
+ do_kvm_irqchip_create(s);
kvm_kernel_irqchip = true;
/* If we have an in-kernel IRQ chip then we must have asynchronous
* interrupt delivery (though the reverse is not necessarily true)
If the issue still shows after reverting the above, then I think 98884e0cc1 is not the root cause. The issue is somewhere else, as we will have reverted almost everything from that patch except the implementation of kvm_reset_vmfd(), which as you reported is not executed.
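The shape of that refactoring can be sketched on its own. The names below (`ModelState`, `do_irqchip_create_model()`, `irqchip_create_model()`, `model_kernel_irqchip`) are hypothetical stand-ins, and the helper body is reduced to a single flag. The sketch only shows the split: the original function keeps its assert and its flag update and delegates the creation step to a new helper that other callers can use directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for KVMState. */
typedef struct {
    bool split_is_auto;     /* models kernel_irqchip_split == ON_OFF_AUTO_AUTO */
    bool irqchip_created;   /* models the effect of the creation step */
} ModelState;

/* Models the global kvm_kernel_irqchip flag. */
static bool model_kernel_irqchip;

/* New helper: only the creation step, no assert and no global flag. */
static void do_irqchip_create_model(ModelState *s)
{
    s->irqchip_created = true;
}

/* Original entry point after the split: assert, delegate, set the flag. */
static void irqchip_create_model(ModelState *s)
{
    assert(!s->split_is_auto);
    do_irqchip_create_model(s);
    model_kernel_irqchip = true;
}
```

Callers of the original entry point see the same sequence as before the split, which is why this hunk would not be expected to change behaviour by itself. Reverting it is still a cheap way to rule it out.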
>
> Thanks,
> Misbah Anjum N <misanjum@linux.ibm.com>
>
>
> On 2026-03-18 15:00, Ani Sinha wrote:
>> One possible thing to try is:
>> Revert everything in stubs/kvm.c and hence changes in
>> stubs/meson.build, include/system/kvm.h and in target/i386/kvm/kvm.c
>> introduced by 98884e0cc1 .
>> You will have to comment out calls to kvm_arch_supports_vmfd_change()
>> and kvm_arch_on_vmfd_change() in kvm_reset_vmfd(). Since
>> kvm_reset_vmfd() is not called anyway, it should make no difference
>> if those calls are commented out.
>> Let me know what you get after doing the above.
>
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-04-07 4:09 ` Ani Sinha
@ 2026-04-07 13:45 ` Ani Sinha
0 siblings, 0 replies; 19+ messages in thread
From: Ani Sinha @ 2026-04-07 13:45 UTC (permalink / raw)
To: Misbah Anjum N
Cc: Pbonzini, Qemu Devel, Qemu Ppc, npiggin, Harsh Prateek Bora,
vaibhav, sbhat
> On 7 Apr 2026, at 9:39 AM, Ani Sinha <anisinha@redhat.com> wrote:
>
>
>
>> On 6 Apr 2026, at 2:24 PM, Misbah Anjum N <misanjum@linux.ibm.com> wrote:
>>
>> Hi Ani,
>> I've completed the testing you suggested. Unfortunately, the SMP hang still persists with these changes.
>>
>> Changes made:
>> As requested, I reverted everything in stubs/kvm.c and the related changes in stubs/meson.build, include/system/kvm.h, and target/i386/kvm/kvm.c. I also commented out the calls to kvm_arch_supports_vmfd_change() and kvm_arch_on_vmfd_change() in kvm_reset_vmfd().
>>
>> Test result:
>> The issue persists - guests still hang indefinitely during boot when SMP is configured.
>>
>> Git diff:
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index cc5c42ce4d..04b9cbe7c9 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -2622,11 +2622,12 @@ static int kvm_reset_vmfd(MachineState *ms)
>> * bail if the current architecture does not support VM file
>> * descriptor change.
>> */
>> - if (!kvm_arch_supports_vmfd_change()) {
>> + /*if (!kvm_arch_supports_vmfd_change()) {
>> error_report("This target architecture does not support KVM VM "
>> "file descriptor change.");
>> return -EOPNOTSUPP;
>> }
>> + */
>>
>> s = KVM_STATE(ms->accelerator);
>> kml = &s->memory_listener;
>> @@ -2659,10 +2660,10 @@ static int kvm_reset_vmfd(MachineState *ms)
>> }
>> assert(!err);
>>
>> - ret = kvm_arch_on_vmfd_change(ms, s);
>> + /*ret = kvm_arch_on_vmfd_change(ms, s);
>> if (ret < 0) {
>> return ret;
>> - }
>> + }*/
>>
>> if (s->kernel_irqchip_allowed) {
>> do_kvm_irqchip_create(s);
>>
>> diff --git a/include/system/kvm.h b/include/system/kvm.h
>> index 5fc7251fd9..0dad0079ed 100644
>> --- a/include/system/kvm.h
>> +++ b/include/system/kvm.h
>> @@ -456,8 +456,6 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram_addr,
>>
>> #endif /* COMPILING_PER_TARGET */
>>
>> -bool kvm_arch_supports_vmfd_change(void);
>> -int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s);
>>
>> void kvm_cpu_synchronize_state(CPUState *cpu);
>>
>> diff --git a/stubs/kvm.c b/stubs/kvm.c
>> deleted file mode 100644
>> index 2db61d89a7..0000000000
>> --- a/stubs/kvm.c
>> +++ /dev/null
>> @@ -1,22 +0,0 @@
>> -/*
>> - * kvm target arch specific stubs
>> - *
>> - * Copyright (c) 2026 Red Hat, Inc.
>> - *
>> - * Author:
>> - * Ani Sinha <anisinha@redhat.com>
>> - *
>> - * SPDX-License-Identifier: GPL-2.0-or-later
>> - */
>> -#include "qemu/osdep.h"
>> -#include "system/kvm.h"
>> -
>> -int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
>> -{
>> - abort();
>> -}
>> -
>> -bool kvm_arch_supports_vmfd_change(void)
>> -{
>> - return false;
>> -}
>>
>> diff --git a/stubs/meson.build b/stubs/meson.build
>> index 6ae478bacc..8a07059500 100644
>> --- a/stubs/meson.build
>> +++ b/stubs/meson.build
>> @@ -74,7 +74,6 @@ if have_system
>> if igvm.found()
>> stub_ss.add(files('igvm.c'))
>> endif
>> - stub_ss.add(files('kvm.c'))
>> stub_ss.add(files('target-get-monitor-def.c'))
>> stub_ss.add(files('target-monitor-defs.c'))
>> stub_ss.add(files('win32-kbd-hook.c'))
>>
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index 524b5276a6..3dfd9a5974 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -3389,15 +3389,6 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
>> return 0;
>> }
>>
>> -int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
>> -{
>> - abort();
>> -}
>> -
>> -bool kvm_arch_supports_vmfd_change(void)
>> -{
>> - return false;
>> -}
>>
>> int kvm_arch_init(MachineState *ms, KVMState *s)
>> {
>>
>>
>> I've also tested with the latest QEMU build from master, and the issue still persists there as well. Could you suggest what additional debugging steps I should take to help identify the root cause?
>
> Since you mentioned that 98884e0cc1 is the root of the problem, the intention is to bisect within this patch and find the problematic hunk. You reported that reverting a major part of this patch did not make a difference. Maybe you can try reverting the following refactoring too (in addition to the reverts above) and see what happens.
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 0d8b0c4347..cc5c42ce4d 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2415,11 +2415,9 @@ void kvm_irqchip_set_qemuirq_gsi(KVMState *s, qemu_irq irq, int gsi)
> g_hash_table_insert(s->gsimap, irq, GINT_TO_POINTER(gsi));
> }
>
> -static void kvm_irqchip_create(KVMState *s)
> +static void do_kvm_irqchip_create(KVMState *s)
> {
> int ret;
> -
> - assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
> if (kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
> ;
> } else if (kvm_check_extension(s, KVM_CAP_S390_IRQCHIP)) {
> @@ -2452,7 +2450,13 @@ static void kvm_irqchip_create(KVMState *s)
> fprintf(stderr, "Create kernel irqchip failed: %s\n", strerror(-ret));
> exit(1);
> }
> +}
> +
> +static void kvm_irqchip_create(KVMState *s)
> +{
> + assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
>
> + do_kvm_irqchip_create(s);
> kvm_kernel_irqchip = true;
> /* If we have an in-kernel IRQ chip then we must have asynchronous
> * interrupt delivery (though the reverse is not necessarily true)
>
> If reverting the above still shows the issue, then I think 98884e0cc1 is not the root cause of the issue.
Actually, if you still see the issue, revert the following change as well:
@@ -4015,6 +4096,7 @@ static void kvm_accel_class_init(ObjectClass *oc, const void *data)
AccelClass *ac = ACCEL_CLASS(oc);
ac->name = "KVM";
ac->init_machine = kvm_init;
+ ac->rebuild_guest = kvm_reset_vmfd;
ac->has_memory = kvm_accel_has_memory;
ac->allowed = &kvm_allowed;
ac->gdbstub_supported_sstep_flags = kvm_gdbstub_sstep_flags;
And if you still see the issue after that, revert this too:
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index e43d18a869..e4beda0148 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -14,6 +14,7 @@ kvm_destroy_vcpu(int cpu_index, unsigned long arch_cpu_id) "index: %d id: %lu"
kvm_park_vcpu(int cpu_index, unsigned long arch_cpu_id) "index: %d id: %lu"
kvm_unpark_vcpu(unsigned long arch_cpu_id, const char *msg) "id: %lu %s"
kvm_irqchip_commit_routes(void) ""
+kvm_reset_vmfd(void) ""
kvm_irqchip_add_msi_route(char *name, int vector, int virq) "dev %s vector %d virq %d"
kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
kvm_irqchip_release_virq(int virq) "virq %d"
Basically, keep reverting incrementally until it works; at that point you know which hunk is the cause of the problem.
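One way to script the "revert one piece at a time" step is sketched below. This is only a sketch under assumptions: it needs a git checkout that contains 98884e0cc1, it works per file rather than per hunk, and `revert_path_from_commit` is a hypothetical helper name that exists in neither QEMU nor git.

```shell
#!/bin/sh
# Sketch only: revert_path_from_commit is a hypothetical helper name,
# not part of QEMU or git.
#
# Reverse-apply the part of COMMIT that touches PATH in the working tree,
# leaving every other change from that commit in place.
revert_path_from_commit() {
    commit=$1
    path=$2
    # "git show --format=" prints only the diff for PATH; "git apply -R"
    # applies that diff in reverse. git apply refuses the whole patch if
    # it no longer matches (for example, when a later commit rewrote the
    # same lines), so a failed call leaves the file untouched.
    git show --format= "$commit" -- "$path" | git apply -R
}

# Example, using a path named in this thread:
#   revert_path_from_commit 98884e0cc1 accel/kvm/trace-events
# Then rebuild and boot the SMP guest again before reverting the next piece.
```

A file that carries several hunks, such as accel/kvm/kvm-all.c here, still needs the diff to be split by hand, and the build may need manual fixes when the remaining code still references something that was reverted.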
> The issue is somewhere else as we have tried reverting almost everything from that patch except implementation of kvm_reset_vmfd() which as you reported isn’t called/executed.
>
>>
>> Thanks,
>> Misbah Anjum N <misanjum@linux.ibm.com>
>>
>>
>> On 2026-03-18 15:00, Ani Sinha wrote:
>>> One possible thing to try is:
>>> Revert everything in stubs/kvm.c and hence changes in
>>> stubs/meson.build, include/system/kvm.h and in target/i386/kvm/kvm.c
>>> introduced by 98884e0cc1 .
>>> You will have to comment out calls to kvm_arch_supports_vmfd_change()
>>> and kvm_arch_on_vmfd_change() in kvm_reset_vmfd(). Since
>>> kvm_reset_vmfd() is not called anyway, it should make no difference
>>> if those calls are commented out.
>>> Let me know what you get after doing the above.
* Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
2026-04-06 8:54 ` Misbah Anjum N
2026-04-07 4:09 ` Ani Sinha
@ 2026-04-09 16:18 ` Harsh Prateek Bora
1 sibling, 0 replies; 19+ messages in thread
From: Harsh Prateek Bora @ 2026-04-09 16:18 UTC (permalink / raw)
To: Misbah Anjum N, Ani Sinha, Pbonzini, Qemu Devel, Qemu Ppc
Cc: npiggin, vaibhav, sbhat, Gautam Menghani
Hi Misbah,
On 06/04/26 2:24 pm, Misbah Anjum N wrote:
> Hi Ani,
> I've completed the testing you suggested. Unfortunately, the SMP hang
> still persists with these changes.
>
> Changes made:
> As requested, I reverted everything in stubs/kvm.c and the related
> changes in stubs/meson.build, include/system/kvm.h, and target/i386/kvm/
> kvm.c. I also commented out the calls to kvm_arch_supports_vmfd_change()
> and kvm_arch_on_vmfd_change() in kvm_reset_vmfd().
>
> Test result:
> The issue persists - guests still hang indefinitely during boot when SMP
> is configured.
I have posted a patch here which should fix this SMP hang issue:
https://lore.kernel.org/qemu-devel/20260409161042.55281-1-harshpb@linux.ibm.com/
Could you please help validate it in different scenarios and share feedback?
Thanks
Harsh
end of thread
Thread overview: 19+ messages
2026-03-06 10:52 [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c Misbah Anjum N
2026-03-09 8:28 ` Misbah Anjum N
2026-03-09 11:04 ` Harsh Prateek Bora
2026-03-09 13:11 ` Ani Sinha
2026-03-09 13:23 ` Ani Sinha
2026-03-10 8:39 ` Misbah Anjum N
2026-03-10 8:54 ` Ani Sinha
2026-03-10 9:08 ` Misbah Anjum N
2026-03-10 9:34 ` Ani Sinha
2026-03-10 10:05 ` Misbah Anjum N
2026-03-10 10:12 ` Ani Sinha
2026-03-18 8:19 ` Misbah Anjum N
2026-03-18 8:39 ` Ani Sinha
2026-03-18 9:30 ` Ani Sinha
2026-04-06 8:54 ` Misbah Anjum N
2026-04-07 4:09 ` Ani Sinha
2026-04-07 13:45 ` Ani Sinha
2026-04-09 16:18 ` Harsh Prateek Bora
2026-03-09 13:30 ` Ani Sinha