qemu-devel.nongnu.org archive mirror
From: "Nicholas Piggin" <npiggin@gmail.com>
To: "misanjum" <misanjum@linux.ibm.com>, <qemu-ppc@nongnu.org>
Cc: <qemu-devel@nongnu.org>
Subject: Re: [BUG][powerpc] KVM Guest Boot Failure – Hangs at "Booting Linux via __start()”
Date: Tue, 18 Mar 2025 13:20:56 +1000
Message-ID: <D8J2HBJWT4FD.11OT57ZT2LQGN@gmail.com>
In-Reply-To: <fbb76ecc616d4065de7ab56d1311f876@linux.ibm.com>

Thanks for the report.

Tricky problem. A secondary CPU is hanging before it is started by the
primary via an RTAS call.

That secondary keeps calling kvm_cpu_exec(), which keeps exiting
early with EXCP_HLT because kvm_arch_process_async_events() returns
true, since that CPU has ->halted=1. It then just goes around the run
loop again because there is an interrupt pending (DEC).

So it never runs. It also never releases the BQL, and another CPU,
the primary which is actually supposed to be running, is stuck in
spapr_set_all_lpcrs() in run_on_cpu() waiting for the BQL.
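
A toy model of the loop described above, for illustration only. This is
not QEMU code: every toy_* name is hypothetical, and the real
kvm_cpu_exec() and run loop do far more. It only shows why a CPU with
->halted set and an interrupt pending spins without ever entering the
guest or reaching the point where it would sleep.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the state discussed above. */
enum { TOY_EXCP_HLT = 1, TOY_RAN_GUEST = 2 };

struct toy_cpu {
    bool halted;            /* as set by start_powered_off */
    bool interrupt_pending; /* e.g. the decrementer (DEC) */
};

/* Models the early exit: a halted CPU never enters the guest. */
static int toy_cpu_exec(const struct toy_cpu *cpu)
{
    if (cpu->halted) {
        return TOY_EXCP_HLT;
    }
    return TOY_RAN_GUEST;
}

/*
 * Models the outer run loop. A halted CPU with nothing pending would
 * go to sleep (and let other threads make progress); one with an
 * interrupt pending goes straight back around instead.
 * Returns how many iterations spun, capped at max_iters.
 */
static int toy_run_loop(const struct toy_cpu *cpu, int max_iters)
{
    int spins = 0;

    for (int i = 0; i < max_iters; i++) {
        if (toy_cpu_exec(cpu) != TOY_EXCP_HLT) {
            break;          /* the guest actually ran */
        }
        if (!cpu->interrupt_pending) {
            break;          /* would sleep here instead of spinning */
        }
        spins++;            /* halted + pending: around again */
    }
    return spins;
}
```

In this model a CPU with halted and interrupt_pending both set uses up
every iteration it is given, while clearing either flag lets the loop
exit immediately.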

This patch just exposes the bug, I think, by causing the interrupt,
although I'm not quite sure why it was okay previously (-ve decrementer
values should be causing a timer exception too). The timer exception
should not be taken as an interrupt by those secondary CPUs, and it
isn't, because it is masked, until set_all_lpcrs sets an LPCR value
that enables powersave wakeup on decrementer interrupt.

The start_powered_off state just sets ->halted, which makes it look
like a powersaving state. Logically I think it's not the same thing
as far as spapr goes. I don't know why start_powered_off only sets
->halted, and not ->stop/stopped as well.

Not sure how best to solve it cleanly. I'll send a revert if I can't
get something working soon.

Thanks,
Nick

On Tue Mar 18, 2025 at 7:09 AM AEST, misanjum wrote:
> Bug Description:
> Encountering a boot failure when launching a KVM guest with 
> qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor 
> crashes.
>
>
> Reproduction Steps:
> # qemu-system-ppc64 --version
> QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f)
> Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers
>
> # /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 \
>    -machine pseries,accel=kvm \
>    -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \
>    -device virtio-scsi-pci,id=scsi \
>    -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 \
>    -device scsi-hd,drive=drive0,bus=scsi.0 \
>    -netdev bridge,id=net0,br=virbr0 \
>    -device virtio-net-pci,netdev=net0 \
>    -serial pty \
>    -device virtio-balloon-pci \
>    -cpu host
> QEMU 9.2.50 monitor - type 'help' for more information
> char device redirected to /dev/pts/2 (label serial0)
> (qemu)
> (qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
> Falling back to kernel-irqchip=off
> ** Qemu Hang
>
> (In another ssh session)
> # screen /dev/pts/2
> Preparing to boot Linux version 6.10.4-200.fc40.ppc64le (mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 (Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 15:20:17 UTC 2024
> Detected machine type: 0000000000000101
> command line: BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M
> Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
> Calling ibm,client-architecture-support... done
> memory layout at init:
>    memory_limit : 0000000000000000 (16 MB aligned)
>    alloc_bottom : 0000000008200000
>    alloc_top    : 0000000030000000
>    alloc_top_hi : 0000000800000000
>    rmo_top      : 0000000030000000
>    ram_top      : 0000000800000000
> instantiating rtas at 0x000000002fff0000... done
> prom_hold_cpus: skipped
> copying OF device tree...
> Building dt strings...
> Building dt structure...
> Device tree strings 0x0000000008210000 -> 0x0000000008210bd0
> Device tree struct  0x0000000008220000 -> 0x0000000008230000
> Quiescing Open Firmware ...
> Booting Linux via __start() @ 0x0000000000440000 ...
> ** Guest Console Hang
>
>
> Git Bisect:
> Performing git bisect points to the following patch:
> # git bisect bad
> e8291ec16da80566c121c68d9112be458954d90b is the first bad commit
> commit e8291ec16da80566c121c68d9112be458954d90b (HEAD)
> Author: Nicholas Piggin <npiggin@gmail.com>
> Date:   Thu Dec 19 13:40:31 2024 +1000
>
>      target/ppc: fix timebase register reset state
>
>      (H)DEC and PURR get reset before icount does, which causes them to be
>      skewed and not match the init state. This can cause replay to not
>      match the recorded trace exactly. For DEC and HDEC this is usually not
>      noticable since they tend to get programmed before affecting the
>      target machine. PURR has been observed to cause replay bugs when
>      running Linux.
>
>      Fix this by resetting using a time of 0.
>
>      Message-ID: <20241219034035.1826173-2-npiggin@gmail.com>
>      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>
>   hw/ppc/ppc.c | 11 ++++++++---
>   1 file changed, 8 insertions(+), 3 deletions(-)
>
>
> Reverting the patch helps boot the guest.
> Thanks,
> Misbah Anjum N


