QEMU-Devel Archive on lore.kernel.org
From: "Scott J. Goldman" <scottjgo@gmail.com>
To: "Mohamed Mediouni" <mohamed@unpredictable.fr>,
	"Scott J. Goldman" <scottjgo@gmail.com>
Cc: qemu-devel@nongnu.org, qemu-arm@nongnu.org,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Alexander Graf" <agraf@csgraf.de>,
	"Phil Dennis-Jordan" <phil@philjordan.eu>,
	"Roman Bolshakov" <rbolshakov@ddn.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>
Subject: Re: [PATCH v3] target/arm/hvf: Fix WFI halting to stop idle vCPU spinning
Date: Tue, 12 May 2026 22:25:49 -0400	[thread overview]
Message-ID: <DIH6YGXVPFD4.2O7V71XKGBIB@gmail.com> (raw)
In-Reply-To: <E8468087-647B-45B5-AF4E-85DD15A6D35C@unpredictable.fr>

On Tue May 12, 2026 at 9:21 PM EDT, Mohamed Mediouni wrote:
>
>> On 27. Apr 2026, at 21:55, Scott J. Goldman <scottjgo@gmail.com> wrote:
>> 
>> Commit b5f8f77271 ("accel/hvf: Implement WFI without using pselect()")
>> changed hvf_wfi() from blocking the vCPU thread with pselect() to
>> returning EXCP_HLT, intending QEMU's main event loop to handle the
>> idle wait. However, cpu->halted was never set, so cpu_thread_is_idle()
>> always returns false and the vCPU thread spins at 100% CPU per core
>> while the guest is idle.
>> 
>> Fix this by:
>> 
>> 1. Setting cpu->halted = 1 in hvf_wfi() so the vCPU thread sleeps on
>>  halt_cond in qemu_process_cpu_events().
>> 
>> 2. Arming a per-vCPU QEMU_CLOCK_VIRTUAL timer to fire when the guest's
>>  virtual timer (CNTV_CVAL_EL0) would expire. This is necessary
>>  because HVF only delivers HV_EXIT_REASON_VTIMER_ACTIVATED during
>>  hv_vcpu_run(), which is not called while the CPU is halted. The
>>  timer callback mirrors the VTIMER_ACTIVATED handler: it raises the
>>  vtimer IRQ through the GIC and marks vtimer_masked, causing the
>>  interrupt delivery chain to wake the vCPU via qemu_cpu_kick().
>> 
>> 3. Clearing cpu->halted in hvf_arch_vcpu_exec() when cpu_has_work()
>>  indicates a pending interrupt, and cancelling the WFI timer.
>> 
>> 4. Re-arming the WFI timer from hvf_vm_state_change() on the resume
>>  transition for any halted vCPU, since the QEMUTimer is per-instance
>>  state and is not migrated. After cpu_synchronize_all_states() the
>>  migrated vtimer state is mirrored in env, so we can read CNTV_CTL
>>  and CNTV_CVAL from there. If the vtimer has already expired by the
>>  time the destination resumes, hvf_wfi_timer_cb() is invoked
>>  directly so the halted vCPU is woken up.
>> 
>> Fixes: b5f8f77271 ("accel/hvf: Implement WFI without using pselect()")
>> Signed-off-by: Scott J. Goldman <scottjgo@gmail.com>
>> ---
>> Changes since v2:
>> - Use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_HOST so the timer
>>  pauses with the VM and a halted vCPU isn't woken (or its IRQ
>>  raised) while the user has stopped the guest. (Peter)
>> - Convert vtimer ticks to nanoseconds with muldiv64() to avoid
>>  intermediate overflow. (Peter)
>> - Re-arm the WFI timer from hvf_vm_state_change() on the resume
>>  transition so a halted vCPU on the migration destination is
>>  woken when its vtimer expires (the QEMUTimer is per-instance
>>  state and isn't migrated). (Peter)
>> v2: https://lore.kernel.org/qemu-devel/20260410055045.63001-1-scottjgo@gmail.com/
>> v1: https://lore.kernel.org/qemu-devel/20260410044726.61853-1-scottjgo@gmail.com/
>
> For QEMU 11.0 (for backporting to stable):
>
> Reviewed-by: Mohamed Mediouni <mohamed@unpredictable.fr>
>
> For QEMU 11.1:
>
> Some checks for !hvf_irqchip_in_kernel() are needed,
> but I can do them on my side if you prefer.
>
> Looks ready apart from that bit.

Hi Mohamed,

Thanks for the review - I sent a follow-up patch that gates the wfi timer
as you requested. For getting this v3 applied to the stable tree, is there
anyone else I should ping? I'm not very familiar with the process.
 
Thanks!
-sjg


>
>> 
>> include/system/hvf_int.h |   1 +
>> target/arm/hvf/hvf.c     | 124 ++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 124 insertions(+), 1 deletion(-)
>> 
>> diff --git a/include/system/hvf_int.h b/include/system/hvf_int.h
>> index 2621164cb2..58fb865eba 100644
>> --- a/include/system/hvf_int.h
>> +++ b/include/system/hvf_int.h
>> @@ -48,6 +48,7 @@ struct AccelCPUState {
>>    hv_vcpu_exit_t *exit;
>>    bool vtimer_masked;
>>    bool guest_debug_enabled;
>> +    struct QEMUTimer *wfi_timer;
>> #endif
>> };
>> 
>> diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
>> index 678afe5c8e..a19d7a5e1f 100644
>> --- a/target/arm/hvf/hvf.c
>> +++ b/target/arm/hvf/hvf.c
>> @@ -28,6 +28,7 @@
>> #include "hw/core/boards.h"
>> #include "hw/core/irq.h"
>> #include "qemu/main-loop.h"
>> +#include "qemu/timer.h"
>> #include "system/cpus.h"
>> #include "arm-powerctl.h"
>> #include "target/arm/cpu.h"
>> @@ -301,6 +302,8 @@ void hvf_arm_init_debug(void)
>> #define TMR_CTL_IMASK   (1 << 1)
>> #define TMR_CTL_ISTATUS (1 << 2)
>> 
>> +static void hvf_wfi_timer_cb(void *opaque);
>> +
>> static uint32_t chosen_ipa_bit_size;
>> 
>> typedef struct HVFVTimer {
>> @@ -1214,6 +1217,9 @@ void hvf_arch_vcpu_destroy(CPUState *cpu)
>> {
>>    hv_return_t ret;
>> 
>> +    timer_free(cpu->accel->wfi_timer);
>> +    cpu->accel->wfi_timer = NULL;
>> +
>>    ret = hv_vcpu_destroy(cpu->accel->fd);
>>    assert_hvf_ok(ret);
>> }
>> @@ -1352,6 +1358,9 @@ int hvf_arch_init_vcpu(CPUState *cpu)
>>                              arm_cpu->isar.idregs[ID_AA64MMFR0_EL1_IDX]);
>>    assert_hvf_ok(ret);
>> 
>> +    cpu->accel->wfi_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>> +                                          hvf_wfi_timer_cb, cpu);
>> +
> This needs to be gated behind !hvf_irqchip_in_kernel()
>
>>     aarch64_add_sme_properties(OBJECT(cpu));
>>    return 0;
>> }
>> @@ -2027,8 +2036,67 @@ static uint64_t hvf_vtimer_val_raw(void)
>>    return mach_absolute_time() - hvf_state->vtimer_offset;
>> }
>> 
>> +static void hvf_wfi_timer_cb(void *opaque)
>> +{
>> +    CPUState *cpu = opaque;
>> +    ARMCPU *arm_cpu = ARM_CPU(cpu);
>> +
>> +    /*
>> +     * vtimer expired while the CPU was halted for WFI.
>> +     * Mirror HV_EXIT_REASON_VTIMER_ACTIVATED: raise the vtimer
>> +     * interrupt and mark as masked so hvf_sync_vtimer() will
>> +     * check and unmask when the guest handles it.
>> +     *
>> +     * The interrupt delivery chain (GIC -> cpu_interrupt ->
>> +     * qemu_cpu_kick) wakes the vCPU thread from halt_cond.
>> +     */
>> +    qemu_set_irq(arm_cpu->gt_timer_outputs[GTIMER_VIRT], 1);
>> +    cpu->accel->vtimer_masked = true;
>> +}
>> +
>> +/*
>> + * Arm a host-side QEMU_CLOCK_VIRTUAL timer to fire when the guest's
>> + * vtimer (CNTV_CVAL_EL0) is scheduled to expire. HVF only delivers
>> + * HV_EXIT_REASON_VTIMER_ACTIVATED during hv_vcpu_run(), which we won't
>> + * call while the vCPU is halted, so we need this to wake the vCPU.
>> + *
>> + * QEMU_CLOCK_VIRTUAL pauses while the VM is stopped, which keeps the
>> + * timer in lockstep with the guest's view of vtime across pause/resume.
>> + *
>> + * Caller must supply the current CNTV_CTL_EL0 and CNTV_CVAL_EL0 values,
>> + * since the appropriate source (HVF vs. env) depends on context.
>> + *
>> + * Returns 0 if the timer was armed (or if the vtimer is disabled/masked
>> + * and the vCPU should still halt waiting on another event), or -1 if
>> + * the vtimer has already expired.
>> + */
>> +static int hvf_arm_wfi_timer(CPUState *cpu, uint64_t ctl, uint64_t cval)
>> +{
>> +    ARMCPU *arm_cpu = ARM_CPU(cpu);
>> +    uint64_t now;
>> +    int64_t delta_ns;
>> +
>> +    if (!(ctl & TMR_CTL_ENABLE) || (ctl & TMR_CTL_IMASK)) {
>> +        return 0;
>> +    }
>> +
>> +    now = hvf_vtimer_val_raw();
>> +    if (cval <= now) {
>> +        return -1;
>> +    }
>> +
>> +    delta_ns = muldiv64(cval - now, NANOSECONDS_PER_SECOND,
>> +                        arm_cpu->gt_cntfrq_hz);
>> +    timer_mod(cpu->accel->wfi_timer,
>> +              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + delta_ns);
>> +    return 0;
>> +}
>> +
>> static int hvf_wfi(CPUState *cpu)
>> {
>> +    uint64_t ctl, cval;
>> +    hv_return_t r;
>> +
>>    if (cpu_has_work(cpu)) {
>>        /*
>>         * Don't bother to go into our "low power state" if
>> @@ -2037,6 +2105,22 @@ static int hvf_wfi(CPUState *cpu)
>>        return 0;
>>    }
>> 
>> +    /*
>> +     * Read the vtimer state directly from HVF. We're on the vCPU thread,
>> +     * just exited from hv_vcpu_run(), so HVF holds the authoritative
>> +     * values and env may be stale.
>> +     */
>> +    r = hv_vcpu_get_sys_reg(cpu->accel->fd, HV_SYS_REG_CNTV_CTL_EL0, &ctl);
>> +    assert_hvf_ok(r);
>> +    r = hv_vcpu_get_sys_reg(cpu->accel->fd, HV_SYS_REG_CNTV_CVAL_EL0, &cval);
>> +    assert_hvf_ok(r);
>> +
>> +    if (hvf_arm_wfi_timer(cpu, ctl, cval) < 0) {
>> +        /* vtimer already expired, don't halt */
>> +        return 0;
>> +    }
>> +
>> +    cpu->halted = 1;
>>    return EXCP_HLT;
>> }
>> 
>> @@ -2332,7 +2416,11 @@ int hvf_arch_vcpu_exec(CPUState *cpu)
>>    hv_return_t r;
>> 
>>    if (cpu->halted) {
>> -        return EXCP_HLT;
>> +        if (!cpu_has_work(cpu)) {
>> +            return EXCP_HLT;
>> +        }
>> +        cpu->halted = 0;
>> +        timer_del(cpu->accel->wfi_timer);
> This needs an !hvf_irqchip_in_kernel() check - we shouldn't have the wfi timer otherwise
>>     }
>> 
>>    flush_cpu_state(cpu);
>> @@ -2376,11 +2464,45 @@ static const VMStateDescription vmstate_hvf_vtimer = {
>> static void hvf_vm_state_change(void *opaque, bool running, RunState state)
>> {
>>    HVFVTimer *s = opaque;
>> +    CPUState *cpu;
>> 
>>    if (running) {
>>        /* Update vtimer offset on all CPUs */
>>        hvf_state->vtimer_offset = mach_absolute_time() - s->vtimer_val;
>>        cpu_synchronize_all_states();
>> +
>> +        /*
>> +         * After migration restore (or any resume), the wfi_timer is not
>> +         * scheduled on this QEMU instance, so re-arm it for any halted
>> +         * vCPU with a pending vtimer. For a non-migration resume the
>> +         * QEMU_CLOCK_VIRTUAL timer was already scheduled; recomputing the
>> +         * deadline produces the same value and is a harmless no-op.
>> +         *
>> +         * cpu_synchronize_all_states() above ensures env mirrors the
>> +         * authoritative vtimer state (whether that came from HVF or from
>> +         * the migration stream), so we can safely read it here from the
>> +         * iothread.
>> +         */
>
> This should be gated behind !hvf_irqchip_in_kernel()
>
>> +        CPU_FOREACH(cpu) {
>> +            ARMCPU *arm_cpu;
>> +            uint64_t ctl, cval;
>> +
>> +            if (!cpu->accel || !cpu->halted) {
>> +                continue;
>> +            }
>> +
>> +            arm_cpu = ARM_CPU(cpu);
>> +            ctl = arm_cpu->env.cp15.c14_timer[GTIMER_VIRT].ctl;
>> +            cval = arm_cpu->env.cp15.c14_timer[GTIMER_VIRT].cval;
>> +
>> +            if (hvf_arm_wfi_timer(cpu, ctl, cval) < 0) {
>> +                /*
>> +                 * vtimer already expired while we were paused; raise the
>> +                 * IRQ now so the halted vCPU wakes up.
>> +                 */
>> +                hvf_wfi_timer_cb(cpu);
>> +            }
>> +        }
>>    } else {
>>        /* Remember vtimer value on every pause */
>>        s->vtimer_val = hvf_vtimer_val_raw();
>> -- 
>> 2.50.1 (Apple Git-155)
>> 
>> 



Thread overview: 22+ messages
2026-04-10  4:47 [PATCH] target/arm/hvf: Fix WFI halting to stop idle vCPU spinning Scott J. Goldman
2026-04-10  5:06 ` Mohamed Mediouni
2026-04-10  5:28   ` Scott J. Goldman
2026-04-10  5:50 ` [PATCH v2] " Scott J. Goldman
2026-04-10  6:18   ` Mohamed Mediouni
2026-04-16 21:20     ` Scott J. Goldman
2026-04-17  9:57       ` Philippe Mathieu-Daudé
2026-04-17 20:30         ` Scott J. Goldman
2026-04-21 23:24           ` Scott J. Goldman
2026-04-27 11:15   ` Peter Maydell
2026-04-27 18:42     ` Scott J. Goldman
2026-04-27 19:55   ` [PATCH v3] " Scott J. Goldman
2026-05-12 16:46     ` Scott J. Goldman
2026-05-12 19:18       ` Peter Maydell
2026-05-12 17:54     ` Mohamed Mediouni
2026-05-12 20:36       ` Scott J. Goldman
2026-05-12 20:51         ` Mohamed Mediouni
2026-05-13  1:21     ` Mohamed Mediouni
2026-05-13  2:21       ` [PATCH v11.1+ v4] " Scott J. Goldman
2026-05-15 10:39         ` Peter Maydell
2026-05-13  2:25       ` Scott J. Goldman [this message]
2026-05-13  7:14         ` [PATCH v3] " Mohamed Mediouni
