From: "Scott J. Goldman" <scottjgo@gmail.com>
To: "Mohamed Mediouni" <mohamed@unpredictable.fr>,
"Scott J. Goldman" <scottjgo@gmail.com>
Cc: qemu-devel@nongnu.org, qemu-arm@nongnu.org,
"Peter Maydell" <peter.maydell@linaro.org>,
"Alexander Graf" <agraf@csgraf.de>,
"Phil Dennis-Jordan" <phil@philjordan.eu>,
"Roman Bolshakov" <rbolshakov@ddn.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>
Subject: Re: [PATCH v3] target/arm/hvf: Fix WFI halting to stop idle vCPU spinning
Date: Tue, 12 May 2026 22:25:49 -0400 [thread overview]
Message-ID: <DIH6YGXVPFD4.2O7V71XKGBIB@gmail.com> (raw)
In-Reply-To: <E8468087-647B-45B5-AF4E-85DD15A6D35C@unpredictable.fr>
On Tue May 12, 2026 at 9:21 PM EDT, Mohamed Mediouni wrote:
>
>> On 27. Apr 2026, at 21:55, Scott J. Goldman <scottjgo@gmail.com> wrote:
>>
>> Commit b5f8f77271 ("accel/hvf: Implement WFI without using pselect()")
>> changed hvf_wfi() from blocking the vCPU thread with pselect() to
>> returning EXCP_HLT, intending QEMU's main event loop to handle the
>> idle wait. However, cpu->halted was never set, so cpu_thread_is_idle()
>> always returns false and the vCPU thread spins at 100% CPU per core
>> while the guest is idle.
>>
>> Fix this by:
>>
>> 1. Setting cpu->halted = 1 in hvf_wfi() so the vCPU thread sleeps on
>> halt_cond in qemu_process_cpu_events().
>>
>> 2. Arming a per-vCPU QEMU_CLOCK_VIRTUAL timer to fire when the guest's
>> virtual timer (CNTV_CVAL_EL0) would expire. This is necessary
>> because HVF only delivers HV_EXIT_REASON_VTIMER_ACTIVATED during
>> hv_vcpu_run(), which is not called while the CPU is halted. The
>> timer callback mirrors the VTIMER_ACTIVATED handler: it raises the
>> vtimer IRQ through the GIC and marks vtimer_masked, causing the
>> interrupt delivery chain to wake the vCPU via qemu_cpu_kick().
>>
>> 3. Clearing cpu->halted in hvf_arch_vcpu_exec() when cpu_has_work()
>> indicates a pending interrupt, and cancelling the WFI timer.
>>
>> 4. Re-arming the WFI timer from hvf_vm_state_change() on the resume
>> transition for any halted vCPU, since the QEMUTimer is per-instance
>> state and is not migrated. After cpu_synchronize_all_states() the
>> migrated vtimer state is mirrored in env, so we can read CNTV_CTL
>> and CNTV_CVAL from there. If the vtimer has already expired by the
>> time the destination resumes, hvf_wfi_timer_cb() is invoked
>> directly so the halted vCPU is woken up.
>>
>> Fixes: b5f8f77271 ("accel/hvf: Implement WFI without using pselect()")
>> Signed-off-by: Scott J. Goldman <scottjgo@gmail.com>
>> ---
>> Changes since v2:
>> - Use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_HOST so the timer
>> pauses with the VM and a halted vCPU isn't woken (or its IRQ
>> raised) while the user has stopped the guest. (Peter)
>> - Convert vtimer ticks to nanoseconds with muldiv64() to avoid
>> intermediate overflow. (Peter)
>> - Re-arm the WFI timer from hvf_vm_state_change() on the resume
>> transition so a halted vCPU on the migration destination is
>> woken when its vtimer expires (the QEMUTimer is per-instance
>> state and isn't migrated). (Peter)
>> v2: https://lore.kernel.org/qemu-devel/20260410055045.63001-1-scottjgo@gmail.com/
>> v1: https://lore.kernel.org/qemu-devel/20260410044726.61853-1-scottjgo@gmail.com/
>
> For QEMU 11.0 (for backporting to stable):
>
> Reviewed-by: Mohamed Mediouni <mohamed@unpredictable.fr>
>
> For QEMU 11.1:
>
> Adding some checks for !hvf_irqchip_in_kernel() needed
> but can do them on my side if you prefer.
>
> Looks ready apart from that bit.
Hi Mohamed,
TFTR - I sent a follow-up patch that gates the wfi timer as you requested.
For applying this v3 version to the stable tree, is there anyone else I
should ping? I'm not very familiar with the process.
Thanks!
-sjg
>
>>
>> include/system/hvf_int.h | 1 +
>> target/arm/hvf/hvf.c | 124 ++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 124 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/system/hvf_int.h b/include/system/hvf_int.h
>> index 2621164cb2..58fb865eba 100644
>> --- a/include/system/hvf_int.h
>> +++ b/include/system/hvf_int.h
>> @@ -48,6 +48,7 @@ struct AccelCPUState {
>> hv_vcpu_exit_t *exit;
>> bool vtimer_masked;
>> bool guest_debug_enabled;
>> + struct QEMUTimer *wfi_timer;
>> #endif
>> };
>>
>> diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
>> index 678afe5c8e..a19d7a5e1f 100644
>> --- a/target/arm/hvf/hvf.c
>> +++ b/target/arm/hvf/hvf.c
>> @@ -28,6 +28,7 @@
>> #include "hw/core/boards.h"
>> #include "hw/core/irq.h"
>> #include "qemu/main-loop.h"
>> +#include "qemu/timer.h"
>> #include "system/cpus.h"
>> #include "arm-powerctl.h"
>> #include "target/arm/cpu.h"
>> @@ -301,6 +302,8 @@ void hvf_arm_init_debug(void)
>> #define TMR_CTL_IMASK (1 << 1)
>> #define TMR_CTL_ISTATUS (1 << 2)
>>
>> +static void hvf_wfi_timer_cb(void *opaque);
>> +
>> static uint32_t chosen_ipa_bit_size;
>>
>> typedef struct HVFVTimer {
>> @@ -1214,6 +1217,9 @@ void hvf_arch_vcpu_destroy(CPUState *cpu)
>> {
>> hv_return_t ret;
>>
>> + timer_free(cpu->accel->wfi_timer);
>> + cpu->accel->wfi_timer = NULL;
>> +
>> ret = hv_vcpu_destroy(cpu->accel->fd);
>> assert_hvf_ok(ret);
>> }
>> @@ -1352,6 +1358,9 @@ int hvf_arch_init_vcpu(CPUState *cpu)
>> arm_cpu->isar.idregs[ID_AA64MMFR0_EL1_IDX]);
>> assert_hvf_ok(ret);
>>
>> + cpu->accel->wfi_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>> + hvf_wfi_timer_cb, cpu);
>> +
> adding !hvf_irqchip_in_kernel()
>
>> aarch64_add_sme_properties(OBJECT(cpu));
>> return 0;
>> }
>> @@ -2027,8 +2036,67 @@ static uint64_t hvf_vtimer_val_raw(void)
>> return mach_absolute_time() - hvf_state->vtimer_offset;
>> }
>>
>> +static void hvf_wfi_timer_cb(void *opaque)
>> +{
>> + CPUState *cpu = opaque;
>> + ARMCPU *arm_cpu = ARM_CPU(cpu);
>> +
>> + /*
>> + * vtimer expired while the CPU was halted for WFI.
>> + * Mirror HV_EXIT_REASON_VTIMER_ACTIVATED: raise the vtimer
>> + * interrupt and mark as masked so hvf_sync_vtimer() will
>> + * check and unmask when the guest handles it.
>> + *
>> + * The interrupt delivery chain (GIC -> cpu_interrupt ->
>> + * qemu_cpu_kick) wakes the vCPU thread from halt_cond.
>> + */
>> + qemu_set_irq(arm_cpu->gt_timer_outputs[GTIMER_VIRT], 1);
>> + cpu->accel->vtimer_masked = true;
>> +}
>> +
>> +/*
>> + * Arm a host-side QEMU_CLOCK_VIRTUAL timer to fire when the guest's
>> + * vtimer (CNTV_CVAL_EL0) is scheduled to expire. HVF only delivers
>> + * HV_EXIT_REASON_VTIMER_ACTIVATED during hv_vcpu_run(), which we won't
>> + * call while the vCPU is halted, so we need this to wake the vCPU.
>> + *
>> + * QEMU_CLOCK_VIRTUAL pauses while the VM is stopped, which keeps the
>> + * timer in lockstep with the guest's view of vtime across pause/resume.
>> + *
>> + * Caller must supply the current CNTV_CTL_EL0 and CNTV_CVAL_EL0 values,
>> + * since the appropriate source (HVF vs. env) depends on context.
>> + *
>> + * Returns 0 if the timer was armed (or if the vtimer is disabled/masked
>> + * and the vCPU should still halt waiting on another event), or -1 if
>> + * the vtimer has already expired.
>> + */
>> +static int hvf_arm_wfi_timer(CPUState *cpu, uint64_t ctl, uint64_t cval)
>> +{
>> + ARMCPU *arm_cpu = ARM_CPU(cpu);
>> + uint64_t now;
>> + int64_t delta_ns;
>> +
>> + if (!(ctl & TMR_CTL_ENABLE) || (ctl & TMR_CTL_IMASK)) {
>> + return 0;
>> + }
>> +
>> + now = hvf_vtimer_val_raw();
>> + if (cval <= now) {
>> + return -1;
>> + }
>> +
>> + delta_ns = muldiv64(cval - now, NANOSECONDS_PER_SECOND,
>> + arm_cpu->gt_cntfrq_hz);
>> + timer_mod(cpu->accel->wfi_timer,
>> + qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + delta_ns);
>> + return 0;
>> +}
>> +
>> static int hvf_wfi(CPUState *cpu)
>> {
>> + uint64_t ctl, cval;
>> + hv_return_t r;
>> +
>> if (cpu_has_work(cpu)) {
>> /*
>> * Don't bother to go into our "low power state" if
>> @@ -2037,6 +2105,22 @@ static int hvf_wfi(CPUState *cpu)
>> return 0;
>> }
>>
>> + /*
>> + * Read the vtimer state directly from HVF. We're on the vCPU thread,
>> + * just exited from hv_vcpu_run(), so HVF holds the authoritative
>> + * values and env may be stale.
>> + */
>> + r = hv_vcpu_get_sys_reg(cpu->accel->fd, HV_SYS_REG_CNTV_CTL_EL0, &ctl);
>> + assert_hvf_ok(r);
>> + r = hv_vcpu_get_sys_reg(cpu->accel->fd, HV_SYS_REG_CNTV_CVAL_EL0, &cval);
>> + assert_hvf_ok(r);
>> +
>> + if (hvf_arm_wfi_timer(cpu, ctl, cval) < 0) {
>> + /* vtimer already expired, don't halt */
>> + return 0;
>> + }
>> +
>> + cpu->halted = 1;
>> return EXCP_HLT;
>> }
>>
>> @@ -2332,7 +2416,11 @@ int hvf_arch_vcpu_exec(CPUState *cpu)
>> hv_return_t r;
>>
>> if (cpu->halted) {
>> - return EXCP_HLT;
>> + if (!cpu_has_work(cpu)) {
>> + return EXCP_HLT;
>> + }
>> + cpu->halted = 0;
>> + timer_del(cpu->accel->wfi_timer);
> !hvf_irqchip_in_kernel(), we shouldn’t have the wfi timer otherwise
>> }
>>
>> flush_cpu_state(cpu);
>> @@ -2376,11 +2464,45 @@ static const VMStateDescription vmstate_hvf_vtimer = {
>> static void hvf_vm_state_change(void *opaque, bool running, RunState state)
>> {
>> HVFVTimer *s = opaque;
>> + CPUState *cpu;
>>
>> if (running) {
>> /* Update vtimer offset on all CPUs */
>> hvf_state->vtimer_offset = mach_absolute_time() - s->vtimer_val;
>> cpu_synchronize_all_states();
>> +
>> + /*
>> + * After migration restore (or any resume), the wfi_timer is not
>> + * scheduled on this QEMU instance, so re-arm it for any halted
>> + * vCPU with a pending vtimer. For a non-migration resume the
>> + * QEMU_CLOCK_VIRTUAL timer was already scheduled; recomputing the
>> + * deadline produces the same value and is a harmless no-op.
>> + *
>> + * cpu_synchronize_all_states() above ensures env mirrors the
>> + * authoritative vtimer state (whether that came from HVF or from
>> + * the migration stream), so we can safely read it here from the
>> + * iothread.
>> + */
>
> This should be gated behind !hvf_irqchip_in_kernel()
>
>> + CPU_FOREACH(cpu) {
>> + ARMCPU *arm_cpu;
>> + uint64_t ctl, cval;
>> +
>> + if (!cpu->accel || !cpu->halted) {
>> + continue;
>> + }
>> +
>> + arm_cpu = ARM_CPU(cpu);
>> + ctl = arm_cpu->env.cp15.c14_timer[GTIMER_VIRT].ctl;
>> + cval = arm_cpu->env.cp15.c14_timer[GTIMER_VIRT].cval;
>> +
>> + if (hvf_arm_wfi_timer(cpu, ctl, cval) < 0) {
>> + /*
>> + * vtimer already expired while we were paused; raise the
>> + * IRQ now so the halted vCPU wakes up.
>> + */
>> + hvf_wfi_timer_cb(cpu);
>> + }
>> + }
>> } else {
>> /* Remember vtimer value on every pause */
>> s->vtimer_val = hvf_vtimer_val_raw();
>> --
>> 2.50.1 (Apple Git-155)
>>
>>
next prev parent reply other threads:[~2026-05-13 2:26 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-10 4:47 [PATCH] target/arm/hvf: Fix WFI halting to stop idle vCPU spinning Scott J. Goldman
2026-04-10 5:06 ` Mohamed Mediouni
2026-04-10 5:28 ` Scott J. Goldman
2026-04-10 5:50 ` [PATCH v2] " Scott J. Goldman
2026-04-10 6:18 ` Mohamed Mediouni
2026-04-16 21:20 ` Scott J. Goldman
2026-04-17 9:57 ` Philippe Mathieu-Daudé
2026-04-17 20:30 ` Scott J. Goldman
2026-04-21 23:24 ` Scott J. Goldman
2026-04-27 11:15 ` Peter Maydell
2026-04-27 18:42 ` Scott J. Goldman
2026-04-27 19:55 ` [PATCH v3] " Scott J. Goldman
2026-05-12 16:46 ` Scott J. Goldman
2026-05-12 19:18 ` Peter Maydell
2026-05-12 17:54 ` Mohamed Mediouni
2026-05-12 20:36 ` Scott J. Goldman
2026-05-12 20:51 ` Mohamed Mediouni
2026-05-13 1:21 ` Mohamed Mediouni
2026-05-13 2:21 ` [PATCH v11.1+ v4] " Scott J. Goldman
2026-05-15 10:39 ` Peter Maydell
2026-05-13 2:25 ` Scott J. Goldman [this message]
2026-05-13 7:14 ` [PATCH v3] " Mohamed Mediouni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DIH6YGXVPFD4.2O7V71XKGBIB@gmail.com \
--to=scottjgo@gmail.com \
--cc=agraf@csgraf.de \
--cc=mohamed@unpredictable.fr \
--cc=peter.maydell@linaro.org \
--cc=phil@philjordan.eu \
--cc=philmd@linaro.org \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=rbolshakov@ddn.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.