From: Misbah Anjum N <misanjum@linux.ibm.com>
To: Ani Sinha <anisinha@redhat.com>, Pbonzini <pbonzini@redhat.com>,
Qemu Devel <qemu-devel@nongnu.org>,
Qemu Ppc <qemu-ppc@nongnu.org>
Cc: npiggin@gmail.com, Harsh Prateek Bora <harshpb@linux.ibm.com>,
vaibhav@linux.ibm.com, sbhat@linux.ibm.com
Subject: Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c
Date: Wed, 18 Mar 2026 13:49:32 +0530
Message-ID: <526d7172f3933baf913bf4b105a6fa9a@linux.ibm.com>
In-Reply-To: <B2B682DB-1467-4C42-AF74-346BC61DE31B@redhat.com>
Hi Ani and Paolo,
Following up on the KVM guest boot issue due to commit 98884e0c, I have
done additional testing that narrows down the trigger.
The hang occurs only when SMP is configured, that is, when the -smp
parameter is passed on the QEMU command line. KVM unit tests that
involve SMP also fail with the same commit applied.
Test Results:
Without SMP (boots successfully):
/usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm \
-enable-kvm -m 32768 -nographic -device virtio-balloon \
-device virtio-scsi-pci,id=scsi0 \
-drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le-hpb.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 \
-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 \
-netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
SLOF
**********************************************************************
QEMU Starting
Build Date = Oct 26 2025 18:45:22
FW Version = release 20251026
Press "s" to enter Open Firmware.
Populating /vdevice methods
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /pci@800000020000000
...
...
Result: Guest boots successfully
With SMP (hangs indefinitely):
/usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm \
-enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \
-device virtio-balloon -device virtio-scsi-pci,id=scsi0 \
-drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le-hpb.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 \
-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 \
-netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
...
...
Result: Hangs indefinitely at bql_lock() in qemu_default_main()
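To see which thread holds the BQL while the main thread is blocked in
bql_lock(), backtraces of all threads in the hung process can be
collected. A minimal sketch, assuming gdb is installed on the host and
the QEMU PID is known; the helper names are hypothetical and not part of
QEMU or avocado-vt:

```python
import subprocess
from typing import List


def gdb_backtrace_cmd(pid: int) -> List[str]:
    """Build a gdb batch command that dumps backtraces of every thread."""
    # -p attaches to a running process, -batch exits once the commands
    # have run, -ex executes a single gdb command.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtraces(pid: int, timeout: float = 60.0) -> str:
    """Attach to the (assumed hung) process and return gdb's output."""
    result = subprocess.run(
        gdb_backtrace_cmd(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


if __name__ == "__main__":
    import sys

    # Usage: python3 bt.py <qemu-pid>
    print(dump_backtraces(int(sys.argv[1])))
```

Attaching usually needs the same user as the QEMU process or root,
depending on the host's ptrace settings.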
KVM Unit Tests:
Running kvm-unit-tests confirms the SMP dependency. The tests that
explicitly involve SMP (smp, smp-smt, atomics) all fail with SIGKILL,
while the single-threaded tests pass. selftest-setup and mmu are also
terminated by SIGKILL in this run.
# ./run_tests.sh
FAIL selftest-setup (terminated on SIGKILL)
PASS selftest-migration (2 tests)
PASS selftest-migration-skip (1 tests)
PASS migration-memory (1 tests)
PASS spapr_hcall (9 tests, 1 skipped)
PASS spapr_vpa (13 tests)
PASS rtas-get-time-of-day (10 tests)
PASS rtas-get-time-of-day-base (10 tests)
PASS rtas-set-time-of-day (5 tests)
PASS emulator (4 tests)
PASS interrupts (13 tests)
FAIL mmu (terminated on SIGKILL)
FAIL smp (terminated on SIGKILL)
FAIL smp-smt (terminated on SIGKILL)
SKIP smp-thread-single (qemu-system-ppc64: -accel tcg,thread=single: invalid accelerator tcg)
FAIL atomics (terminated on SIGKILL)
PASS atomics-migration (1 tests)
PASS timebase (12 tests, 1 known failures, 1 skipped)
SKIP timebase-icount (qemu-system-ppc64: -icount shift=5: cannot configure icount, TCG support not available)
FAIL h_cede_tm
PASS sprs (14 tests)
FAIL sprs-migration (14 tests, 1 unexpected failures)
PASS sieve
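For triage, the results above can be grouped by status, with SIGKILL
terminations separated from ordinary failures. A minimal sketch: the
regular expression is an assumption about the line format based only on
the output shown here, and summarize() is a hypothetical helper, not part
of kvm-unit-tests.

```python
import re
from collections import defaultdict

# Assumed line shape, inferred from the run_tests.sh output above:
#   STATUS name (optional detail)
LINE_RE = re.compile(r"^(PASS|FAIL|SKIP)\s+(\S+)(?:\s+\((.*)\))?\s*$")


def summarize(output: str) -> dict:
    """Group test names by status; SIGKILL failures get their own bucket."""
    groups = defaultdict(list)
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore the shell prompt line and anything unrecognised
        status, name, detail = match.groups()
        if status == "FAIL" and detail and "SIGKILL" in detail:
            status = "FAIL_SIGKILL"
        groups[status].append(name)
    return dict(groups)


# A few lines copied from the run above.
SAMPLE = """\
FAIL selftest-setup (terminated on SIGKILL)
PASS spapr_vpa (13 tests)
FAIL smp (terminated on SIGKILL)
FAIL h_cede_tm
SKIP timebase-icount (qemu-system-ppc64: -icount shift=5: cannot configure icount, TCG support not available)
"""

if __name__ == "__main__":
    for status, names in summarize(SAMPLE).items():
        print(status + ":", ", ".join(names))
```

Keeping the SIGKILL bucket separate makes it easier to compare the
timed-out tests against the ones that fail for other reasons (h_cede_tm,
sprs-migration).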
Thanks,
Misbah Anjum N <misanjum@linux.ibm.com>
On 2026-03-10 15:42, Ani Sinha wrote:
>>> This theory is not substantiated by code or evidence.
>>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a introduces kvm_reset_vmfd()
>>> which is called by this block of code with the tip at
>>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a :
>>> if (!cpus_are_resettable() &&
>>>     (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
>>>      reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
>>>     if (ac->rebuild_guest) {
>>>         ret = ac->rebuild_guest(current_machine);
>>>         if (ret < 0) {
>>>             error_report("unable to rebuild guest: %s(%d)",
>>>                          strerror(-ret), ret);
>>>             vm_stop(RUN_STATE_INTERNAL_ERROR);
>>>         } else {
>>>             info_report("virtual machine state has been rebuilt with new "
>>>                         "guest file handle.");
>>>             guest_state_rebuilt = true;
>>>         }
>>>     } else if (!cpus_are_resettable()) {
>>>         error_report("accelerator does not support reset!");
>>>     } else {
>>>         error_report("accelerator does not support rebuilding guest state,"
>>>                      " proceeding with normal reset!");
>>>     }
>>> }
>>> If cpus are resettable, this block will not be called and nothing that
>>> the patch introduces will have been executed.
>>> So I think you guys need to explain a bit more why you so strongly
>>> feel this patch broke it. I am confused and unable to reason this.
>>>> Did you validate your patches on other architectures which does not
>>>> support this feature yet?
>>> As you have already seen, on other architectures, the entire block of
>>> code is not executed at all. Only SEV-ES, SEV-SNP and TDX currently
>>> exercises this.
>>
>> I understand your concern about the code path analysis. Let me clarify
>> our findings with concrete evidence.
>>
>> Reproducibility Evidence:
>> With commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a applied, we are
>> able to reproduce the hang issue 100% of the time across multiple test
>> runs. When we revert to the previous commit
>> df8df3cb6b743372ebb335bd8404bc3d748da350, the same KVM guest boots
>> successfully 100% of the time.
>>
>> This consistent reproducibility strongly indicates that commit
>> 98884e0cc10997a17ce9abfd6ff10be19224ca6a is introducing the
>> regression, even if the code path analysis suggests otherwise. This
>> suggests the issue may not be in the code path, but rather in the
>> changes introduced by the patch series.
>>
>> As the author who led the development of this patch series, we would
>> appreciate your help in figuring out this issue.
>
> I am really not sure what changes in that patch can cause this
> breakage in a completely unrelated area when the changes are not even
> executed.
>