From: Gavin Shan <gshan@redhat.com>
To: Salil Mehta <salil.mehta@opnsrc.net>
Cc: qemu-devel@nongnu.org, qemu-arm@nongnu.org, mst@redhat.com,
salil.mehta@huawei.com, maz@kernel.org, jean-philippe@linaro.org,
jonathan.cameron@huawei.com, lpieralisi@kernel.org,
peter.maydell@linaro.org, richard.henderson@linaro.org,
imammedo@redhat.com, armbru@redhat.com, andrew.jones@linux.dev,
david@redhat.com, philmd@linaro.org, eric.auger@redhat.com,
will@kernel.org, ardb@kernel.org, oliver.upton@linux.dev,
pbonzini@redhat.com, rafael@kernel.org,
borntraeger@linux.ibm.com, alex.bennee@linaro.org,
gustavo.romero@linaro.org, npiggin@gmail.com,
harshpb@linux.ibm.com, linux@armlinux.org.uk,
darren@os.amperecomputing.com, ilkka@os.amperecomputing.com,
vishnu@os.amperecomputing.com,
gankulkarni@os.amperecomputing.com, karl.heubaum@oracle.com,
miguel.luis@oracle.com, zhukeqian1@huawei.com,
wangxiongfeng2@huawei.com, wangyanan55@huawei.com,
wangzhou1@hisilicon.com, linuxarm@huawei.com,
jiakernel2@gmail.com, maobibo@loongson.cn,
lixianglai@loongson.cn, shahuang@redhat.com, zhao1.liu@intel.com,
Keqian Zhu <zhuqian1@huawei.com>
Subject: Re: [PATCH RFC V6 05/24] arm/virt,kvm: Pre-create KVM vCPUs for 'disabled' QOM vCPUs at machine init
Date: Thu, 23 Oct 2025 11:58:19 +1000
Message-ID: <fa5d5a65-616f-405a-a5e5-f8b70ff32193@redhat.com>
In-Reply-To: <CAJ7pxeYEpJGhtL1-3qFEJYTzL-s19fF-it6p5dkq=fg384wBpg@mail.gmail.com>
Hi Salil,
On 10/23/25 10:35 AM, Salil Mehta wrote:
> On Thu, Oct 23, 2025 at 12:14 AM Gavin Shan <gshan@redhat.com> wrote:
>> On 10/23/25 4:50 AM, Salil Mehta wrote:
>>> On Wed, Oct 22, 2025 at 6:18 PM Salil Mehta <salil.mehta@opnsrc.net> wrote:
>>>> On Wed, Oct 22, 2025 at 10:37 AM Gavin Shan <gshan@redhat.com> wrote:
>>>>> On 10/1/25 11:01 AM, salil.mehta@opnsrc.net wrote:
>>>>>> From: Salil Mehta <salil.mehta@huawei.com>
[...]
>>>>>> +void kvm_arm_create_host_vcpu(ARMCPU *cpu)
>>>>>> +{
>>>>>> +    CPUState *cs = CPU(cpu);
>>>>>> +    unsigned long vcpu_id = cs->cpu_index;
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    ret = kvm_create_vcpu(cs);
>>>>>> +    if (ret < 0) {
>>>>>> +        error_report("Failed to create host vcpu %ld", vcpu_id);
>>>>>> +        abort();
>>>>>> +    }
>>>>>> +
>>>>>> +    /*
>>>>>> +     * Initialize the vCPU in the host. This will reset the sys regs
>>>>>> +     * for this vCPU; related registers like MPIDR_EL1 etc. also get
>>>>>> +     * programmed during this call to the host. These are referenced
>>>>>> +     * later while setting device attributes of the GICR during GICv3
>>>>>> +     * reset.
>>>>>> +     */
>>>>>> +    ret = kvm_arch_init_vcpu(cs);
>>>>>> +    if (ret < 0) {
>>>>>> +        error_report("Failed to initialize host vcpu %ld", vcpu_id);
>>>>>> +        abort();
>>>>>> +    }
>>>>>> +
>>>>>> +    /*
>>>>>> +     * Park the created vCPU. It shall be picked up via kvm_get_vcpu()
>>>>>> +     * when threads are created during realization of ARM vCPUs.
>>>>>> +     */
>>>>>> +    kvm_park_vcpu(cs);
>>>>>> +}
>>>>>> +
>>>>>
>>>>> I don't think we're able to simply call kvm_arch_init_vcpu() in the lazy
>>>>> realization path. Otherwise, it can trigger a crash dump on my NVIDIA
>>>>> Grace Hopper machine, where SVE is supported by default.
>>>>
>>>> Thanks for reporting this. That is not true. As long as we initialize KVM
>>>> correctly and finalize the features like SVE, we should be fine. In fact,
>>>> this is precisely what we are doing right now.
>>>>
>>>> To understand the crash, I need a bit more info.
>>>>
>>>> 1# Is it happening because KVM_ARM_VCPU_INIT is failing? If yes, then can
>>>> you check within KVM whether it is failing because
>>>>    a. the features specified by QEMU do not match the defaults within KVM
>>>>       (hint: check kvm_vcpu_init_check_features())?
>>>>    b. or it is complaining about an init feature change (kvm_vcpu_init_changed())?
>>>> 2# Or is it happening during the setting of the vector lengths or the
>>>> finalizing of features?
>>>>
>>>> int kvm_arch_init_vcpu(CPUState *cs)
>>>> {
>>>>     [...]
>>>>     /* Do KVM_ARM_VCPU_INIT ioctl */
>>>>     ret = kvm_arm_vcpu_init(cpu);                           ---->[1]
>>>>     if (ret) {
>>>>         return ret;
>>>>     }
>>>>
>>>>     if (cpu_isar_feature(aa64_sve, cpu)) {
>>>>         ret = kvm_arm_sve_set_vls(cpu);                     ---->[2]
>>>>         if (ret) {
>>>>             return ret;
>>>>         }
>>>>         ret = kvm_arm_vcpu_finalize(cpu, KVM_ARM_VCPU_SVE); ---->[3]
>>>>         if (ret) {
>>>>             return ret;
>>>>         }
>>>>     }
>>>>     [...]
>>>> }
>>>>
>>>> I think it's happening because the vector length is left uninitialized.
>>>> That initialization happens in the context of arm_cpu_finalize_features(),
>>>> which I forgot to call before calling the KVM finalize.
>>>>
>>>>>
>>>>> kvm_arch_init_vcpu() is supposed to be called in the realization path in the
>>>>> current implementation (without this series) because the parameters (features)
>>>>> to KVM_ARM_VCPU_INIT are populated at vCPU realization time.
>>>>
>>>> Not necessarily. It is just meant to initialize KVM. If we take care of
>>>> the KVM requirements in a similar way to what the realize path does, we
>>>> should be fine. Can you try adding the patch below to your code and test
>>>> whether it works?
>>>>
>>>> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
>>>> index c4b68a0b17..1091593478 100644
>>>> --- a/target/arm/kvm.c
>>>> +++ b/target/arm/kvm.c
>>>> @@ -1068,6 +1068,9 @@ void kvm_arm_create_host_vcpu(ARMCPU *cpu)
>>>>          abort();
>>>>      }
>>>>
>>>> +    /* finalize the features like SVE, SME etc */
>>>> +    arm_cpu_finalize_features(cpu, &error_abort);
>>>> +
>>>>      /*
>>>>       * Initialize the vCPU in the host. This will reset the sys regs
>>>>       * for this vCPU and related registers like MPIDR_EL1 etc. also
>>>>
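>>>> For context, the ordering matters because (as I understand it)
>>>> arm_cpu_finalize_features() is what settles the SVE vector-length
>>>> configuration that kvm_arm_sve_set_vls() later pushes to KVM. A rough
>>>> sketch of the resulting flow in kvm_arm_create_host_vcpu(), assuming the
>>>> hunk above is applied:
>>>>
>>>>     ret = kvm_create_vcpu(cs);                    /* vCPU fd now exists  */
>>>>     ...
>>>>     arm_cpu_finalize_features(cpu, &error_abort); /* computes SVE VLs    */
>>>>     ret = kvm_arch_init_vcpu(cs);                 /* VCPU_INIT, set VLs, */
>>>>     ...                                           /* finalize VCPU_SVE   */
>>>>     kvm_park_vcpu(cs);                            /* park for later use  */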
>>>>
>>>>
>>>>
>>>>>
>>>>> $ /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>>>>> --enable-kvm -machine virt,gic-version=3 -cpu host \
>>>>> -smp cpus=4,disabledcpus=2 -m 1024M \
>>>>> -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image \
>>>>> -initrd /home/gavin/sandbox/images/rootfs.cpio.xz -nographic
>>>>> qemu-system-aarch64: Failed to initialize host vcpu 4
>>>>> Aborted (core dumped)
>>>>>
>>>>> Backtrace
>>>>> =========
>>>>> (gdb) bt
>>>>> #0 0x0000ffff9106bc80 in __pthread_kill_implementation () at /lib64/libc.so.6
>>>>> #1 0x0000ffff9101aa40 [PAC] in raise () at /lib64/libc.so.6
>>>>> #2 0x0000ffff91005988 [PAC] in abort () at /lib64/libc.so.6
>>>>> #3 0x0000aaaab1cc26b8 [PAC] in kvm_arm_create_host_vcpu (cpu=0xaaaab9ab1bc0)
>>>>> at ../target/arm/kvm.c:1081
>>>>> #4 0x0000aaaab1cd0c94 in virt_setup_lazy_vcpu_realization (cpuobj=0xaaaab9ab1bc0, vms=0xaaaab98870a0)
>>>>> at ../hw/arm/virt.c:2483
>>>>> #5 0x0000aaaab1cd180c in machvirt_init (machine=0xaaaab98870a0) at ../hw/arm/virt.c:2777
>>>>> #6 0x0000aaaab160f220 in machine_run_board_init
>>>>> (machine=0xaaaab98870a0, mem_path=0x0, errp=0xfffffa86bdc8) at ../hw/core/machine.c:1722
>>>>> #7 0x0000aaaab1a25ef4 in qemu_init_board () at ../system/vl.c:2723
>>>>> #8 0x0000aaaab1a2635c in qmp_x_exit_preconfig (errp=0xaaaab38a50f0 <error_fatal>)
>>>>> at ../system/vl.c:2821
>>>>> #9 0x0000aaaab1a28b08 in qemu_init (argc=15, argv=0xfffffa86c1f8) at ../system/vl.c:3882
>>>>> #10 0x0000aaaab221d9e4 in main (argc=15, argv=0xfffffa86c1f8) at ../system/main.c:71
>>>>
>>>>
>>>> Thank you for this. Please let me know if the above fix works, and also
>>>> share the return values in case you encounter errors.
>>>
>>> I've pushed the fix to below branch for your convenience:
>>>
>>> Branch: https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.2
>>> Fix: https://github.com/salil-mehta/qemu/commit/1f1fbc0998ffb1fe26140df3c336bf2be2aa8669
>>>
>>
>> I guess the rfc-v6.2 branch isn't ready for testing because it runs into
>> another crash dump, like below.
>
>
> rfc-v6.2 is not crashing on Kunpeng920, where I tested. But this chip does
> not have some ARM extensions like SVE, so unfortunately I can't test
> SVE/SME/PAuth etc. support.
>
> Can you disable SVE and then try whether it comes up, just to narrow the
> case down?
>
Right, this crash dump shouldn't be encountered if SVE isn't supported. I already
had the workaround "-cpu host,sve=off" in place to keep my tests moving forward...
>>
>> host$ /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>> -accel kvm -machine virt,gic-version=host,nvdimm=on \
>> -cpu host,sve=on \
>> -smp maxcpus=4,cpus=2,disabledcpus=2,sockets=2,clusters=2,cores=1,threads=1 \
>> -m 4096M,slots=16,maxmem=128G \
>> -object memory-backend-ram,id=mem0,size=2048M \
>> -object memory-backend-ram,id=mem1,size=2048M \
>> -numa node,nodeid=0,memdev=mem0,cpus=0-1 \
>> -numa node,nodeid=1,memdev=mem1,cpus=2-3 \
>> -L /home/gavin/sandbox/qemu.main/build/pc-bios \
>> -monitor none -serial mon:stdio -nographic -gdb tcp::6666 \
>> -qmp tcp:localhost:5555,server,wait=off \
>> -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \
>> -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image \
>> -initrd /home/gavin/sandbox/images/rootfs.cpio.xz \
>> -append memhp_default_state=online_movable
>> :
>> :
>> guest$ cd /sys/devices/system/cpu/
>> guest$ cat present enabled online
>> 0-3
>> 0-1
>> 0-1
>> (qemu) device_set host-arm-cpu,socket-id=1,cluster-id=0,core-id=0,thread-id=0,admin-state=enable
>> qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (2): Operation not permitted
>
>
> Ah, I see. I think I understand the issue. It's complaining about calling
> the finalize twice. Is it possible for you to check, as I do not have a way
> to test it?
>
>
> int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
> {
>     switch (feature) {
>     case KVM_ARM_VCPU_SVE:
>         [...]
>         if (kvm_arm_vcpu_sve_finalized(vcpu))
>             return -EPERM; -----> this is where it must be popping up?
>         [...]
> }
>
Right, I think that's the case: QEMU tries to finalize the SVE capability twice,
which is the real problem. I'm explaining what I found below, which should be
helpful for the forthcoming revisions.
machvirt_init
  virt_setup_lazy_vcpu_realization
    arm_cpu_finalize_features
    kvm_arm_create_host_vcpu
      kvm_create_vcpu                      // New fd is created
      kvm_arch_init_vcpu
        kvm_arm_vcpu_init
        kvm_arm_sve_set_vls
        kvm_arm_vcpu_finalize              // (A) SVE capability is finalized

device_set_admin_power_state
  device_pre_poweron
    virt_machine_device_pre_poweron
      virt_cpu_pre_poweron
        qdev_realize
          arm_cpu_realizefn
            cpu_exec_realizefn
            arm_cpu_finalize_features      // Called for the second time
            qemu_init_vcpu
              kvm_start_vcpu_thread
                kvm_vcpu_thread_fn
                  kvm_init_vcpu
                    kvm_create_vcpu        // Called for the second time
                    kvm_arch_init_vcpu     // Called for the second time
                      kvm_arm_vcpu_init
                      kvm_arm_sve_set_vls  // (B) Failed here
                      kvm_arm_vcpu_finalize
(B) is where we try to finalize the SVE capability again, but it has already
been finalized at (A). Finalizing the SVE capability twice is disallowed by
KVM on the host side.
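One possible direction, just as a sketch: skip the second KVM_ARM_VCPU_INIT
(and thus the SVE finalization) for a vCPU that was already initialized and
parked at machine init, so kvm_arm_vcpu_finalize() runs exactly once per vCPU
fd. Note that the 'lazy_vcpu_initialized' flag below is made up purely for
illustration; it doesn't exist in the series or in upstream QEMU:

    /*
     * Hypothetical sketch only: record that the parked vCPU already went
     * through KVM_ARM_VCPU_INIT (and SVE finalization) at machine init,
     * and skip the second round when the vCPU thread starts up.
     */
    static int kvm_init_vcpu_once(CPUState *cs, Error **errp)
    {
        int ret = 0;

        if (!cs->lazy_vcpu_initialized) {    /* hypothetical field */
            ret = kvm_arch_init_vcpu(cs);    /* VCPU_INIT + set VLs + finalize */
            if (ret < 0) {
                error_setg_errno(errp, -ret, "kvm_arch_init_vcpu failed");
                return ret;
            }
            cs->lazy_vcpu_initialized = true;
        }

        return ret;
    }

Alternatively, the feature finalization itself could be made idempotent, but
skipping the redundant VCPU_INIT on the realize path looks simpler to me.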
>>
>> I picked the fix (the last patch in the rfc-v6.2 branch) into the rfc-v6
>> branch; the same crash dump can be seen.
>
> Are you getting the previously reported abort, or the new problem above?
>
Previously, the VM couldn't be started at all. After your fix is applied, the
VM is able to start. The QEMU crash dump seen on the attempt to hot-add a vCPU
is a new problem.
Thanks,
Gavin