From: cdall@linaro.org (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v4 14/20] KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit
Date: Mon, 20 Nov 2017 12:15:40 +0100 [thread overview]
Message-ID: <20171120111540.GE28855@cbox> (raw)
In-Reply-To: <CAHyh4xhxCcZ3EKLCkZhhzvbj753J5gTRutM6wA4rkuDDxmn_FA@mail.gmail.com>
On Thu, Nov 16, 2017 at 03:30:39PM -0500, Jintack Lim wrote:
> Hi Christoffer,
>
> On Fri, Oct 20, 2017 at 7:49 AM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > From: Christoffer Dall <cdall@linaro.org>
> >
> > We don't need to save and restore the hardware timer state and examine
> > if it generates interrupts on every entry/exit to the guest. The
> > timer hardware is perfectly capable of telling us when it has expired
> > by signaling interrupts.
> >
> > When taking a vtimer interrupt in the host, we don't want to mess with
> > the timer configuration; we just want to forward the physical interrupt
> > to the guest as a virtual interrupt. We can use the split priority drop
> > and deactivate feature of the GIC to do this, which leaves an EOI'ed
> > interrupt active on the physical distributor, making sure we don't keep
> > taking timer interrupts which would prevent the guest from running. We
> > can then forward the physical interrupt to the VM using the HW bit in
> > the LR of the GIC, as we already do, which lets the guest directly
> > deactivate both the physical and virtual interrupt simultaneously,
> > allowing the timer hardware to generate a new physical interrupt, and
> > thereby a VM exit, when the timer output is asserted again later on.
> >
> > We do need to capture this state when migrating VCPUs between physical
> > CPUs, however, and for that we use the vcpu put/load functions, which
> > are called through preempt notifiers whenever the thread is scheduled
> > away from the CPU, or directly when we return from the ioctl to
> > userspace.
> >
> > One caveat is that we have to save and restore the timer state in both
> > kvm_timer_vcpu_[put/load] and kvm_timer_[schedule/unschedule], because
> > we can have the following flows:
> >
> > 1. kvm_vcpu_block
> > 2. kvm_timer_schedule
> > 3. schedule
> > 4. kvm_timer_vcpu_put (preempt notifier)
> > 5. schedule (vcpu thread gets scheduled back)
> > 6. kvm_timer_vcpu_load (preempt notifier)
> > 7. kvm_timer_unschedule
> >
> > And a version where we don't actually call schedule:
> >
> > 1. kvm_vcpu_block
> > 2. kvm_timer_schedule
> > 7. kvm_timer_unschedule
> >
> > Since kvm_timer_[schedule/unschedule] may not be followed by put/load,
> > but put/load may also be called independently, we call the timer
> > save/restore functions from both paths. Since they rely on the loaded
> > flag to never save/restore when unnecessary, this doesn't cause any
> > harm, and we ensure that all invocations of either set of functions
> > work as intended.
> >
> > An added benefit beyond not having to read and write the timer sysregs
> > on every entry and exit is that we no longer have to actively write the
> > active state to the physical distributor, because we configured the
> > irq for the vtimer to only get a priority drop when handling the
> > interrupt in the GIC driver (we called irq_set_vcpu_affinity()), and
> > the interrupt stays active after firing on the host.
> >
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >
> > Notes:
> > Changes since v3:
> > - Added comments explaining the 'loaded' flag and made other clarifying
> > comments.
> > - No longer rely on the armed flag to conditionally save/restore state,
> > as we already rely on the 'loaded' flag to not repetitively
> > save/restore state.
> > - Reworded parts of the commit message.
> > - Removed renames not belonging to this patch.
> > - Added warning in kvm_arch_timer_handler in case we see spurious
> > interrupts, for example if the hardware doesn't retire the
> > level-triggered timer signal fast enough.
> >
> > include/kvm/arm_arch_timer.h | 16 ++-
> > virt/kvm/arm/arch_timer.c | 237 +++++++++++++++++++++++++++----------------
> > virt/kvm/arm/arm.c | 19 +++-
> > 3 files changed, 178 insertions(+), 94 deletions(-)
> >
> > diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> > index 184c3ef2df93..c538f707e1c1 100644
> > --- a/include/kvm/arm_arch_timer.h
> > +++ b/include/kvm/arm_arch_timer.h
> > @@ -31,8 +31,15 @@ struct arch_timer_context {
> > /* Timer IRQ */
> > struct kvm_irq_level irq;
> >
> > - /* Active IRQ state caching */
> > - bool active_cleared_last;
> > + /*
> > + * We have multiple paths which can save/restore the timer state
> > + * onto the hardware, so we need some way of keeping track of
> > + * where the latest state is.
> > + *
> > + * loaded == true: State is loaded on the hardware registers.
> > + * loaded == false: State is stored in memory.
> > + */
> > + bool loaded;
> >
> > /* Virtual offset */
> > u64 cntvoff;
> > @@ -78,10 +85,15 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
> >
> > u64 kvm_phys_timer_read(void);
> >
> > +void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu);
> > void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
> >
> > void kvm_timer_init_vhe(void);
> >
> > #define vcpu_vtimer(v) (&(v)->arch.timer_cpu.vtimer)
> > #define vcpu_ptimer(v) (&(v)->arch.timer_cpu.ptimer)
> > +
> > +void enable_el1_phys_timer_access(void);
> > +void disable_el1_phys_timer_access(void);
> > +
> > #endif
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index eac1b3d83a86..ec685c1f3b78 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -46,10 +46,9 @@ static const struct kvm_irq_level default_vtimer_irq = {
> > .level = 1,
> > };
> >
> > -void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
> > -{
> > - vcpu_vtimer(vcpu)->active_cleared_last = false;
> > -}
> > +static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
> > +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
> > + struct arch_timer_context *timer_ctx);
> >
> > u64 kvm_phys_timer_read(void)
> > {
> > @@ -69,17 +68,45 @@ static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
> > cancel_work_sync(work);
> > }
> >
> > -static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> > +static void kvm_vtimer_update_mask_user(struct kvm_vcpu *vcpu)
> > {
> > - struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> > + struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> >
> > /*
> > - * We disable the timer in the world switch and let it be
> > - * handled by kvm_timer_sync_hwstate(). Getting a timer
> > - * interrupt at this point is a sure sign of some major
> > - * breakage.
> > + * When using a userspace irqchip with the architected timers, we must
> > + * prevent continuously exiting from the guest, and therefore mask the
> > + * physical interrupt by disabling it on the host interrupt controller
> > + * when the virtual level is high, such that the guest can make
> > + * forward progress. Once we detect the output level being
> > + * de-asserted, we unmask the interrupt again so that we exit from the
> > + * guest when the timer fires.
> > */
> > - pr_warn("Unexpected interrupt %d on vcpu %p\n", irq, vcpu);
> > + if (vtimer->irq.level)
> > + disable_percpu_irq(host_vtimer_irq);
> > + else
> > + enable_percpu_irq(host_vtimer_irq, 0);
> > +}
> > +
> > +static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> > +{
> > + struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> > + struct arch_timer_context *vtimer;
> > +
> > + if (!vcpu) {
> > + pr_warn_once("Spurious arch timer IRQ on non-VCPU thread\n");
> > + return IRQ_NONE;
> > + }
> > + vtimer = vcpu_vtimer(vcpu);
> > +
> > + if (!vtimer->irq.level) {
> > + vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> > + if (kvm_timer_irq_can_fire(vtimer))
> > + kvm_timer_update_irq(vcpu, true, vtimer);
> > + }
> > +
> > + if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
> > + kvm_vtimer_update_mask_user(vcpu);
> > +
> > return IRQ_HANDLED;
> > }
> >
> > @@ -215,7 +242,6 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
> > {
> > int ret;
> >
> > - timer_ctx->active_cleared_last = false;
> > timer_ctx->irq.level = new_level;
> > trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
> > timer_ctx->irq.level);
> > @@ -271,10 +297,16 @@ static void phys_timer_emulate(struct kvm_vcpu *vcpu,
> > soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
> > }
> >
> > -static void timer_save_state(struct kvm_vcpu *vcpu)
> > +static void vtimer_save_state(struct kvm_vcpu *vcpu)
> > {
> > struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > + unsigned long flags;
> > +
> > + local_irq_save(flags);
> > +
> > + if (!vtimer->loaded)
> > + goto out;
> >
> > if (timer->enabled) {
> > vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> > @@ -283,6 +315,10 @@ static void timer_save_state(struct kvm_vcpu *vcpu)
> >
> > /* Disable the virtual timer */
> > write_sysreg_el0(0, cntv_ctl);
> > +
> > + vtimer->loaded = false;
> > +out:
> > + local_irq_restore(flags);
> > }
> >
> > /*
> > @@ -296,6 +332,8 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
> > struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> >
> > + vtimer_save_state(vcpu);
> > +
> > /*
> > * No need to schedule a background timer if any guest timer has
> > * already expired, because kvm_vcpu_block will return before putting
> > @@ -318,22 +356,34 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
> > soft_timer_start(&timer->bg_timer, kvm_timer_earliest_exp(vcpu));
> > }
> >
> > -static void timer_restore_state(struct kvm_vcpu *vcpu)
> > +static void vtimer_restore_state(struct kvm_vcpu *vcpu)
> > {
> > struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > + unsigned long flags;
> > +
> > + local_irq_save(flags);
> > +
> > + if (vtimer->loaded)
> > + goto out;
> >
> > if (timer->enabled) {
> > write_sysreg_el0(vtimer->cnt_cval, cntv_cval);
> > isb();
> > write_sysreg_el0(vtimer->cnt_ctl, cntv_ctl);
> > }
> > +
> > + vtimer->loaded = true;
> > +out:
> > + local_irq_restore(flags);
> > }
> >
> > void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
> > {
> > struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >
> > + vtimer_restore_state(vcpu);
> > +
> > soft_timer_cancel(&timer->bg_timer, &timer->expired);
> > }
> >
> > @@ -352,61 +402,45 @@ static void set_cntvoff(u64 cntvoff)
> > kvm_call_hyp(__kvm_timer_set_cntvoff, low, high);
> > }
> >
> > -static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)
> > +static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
> > {
> > struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > bool phys_active;
> > int ret;
> >
> > - /*
> > - * If we enter the guest with the virtual input level to the VGIC
> > - * asserted, then we have already told the VGIC what we need to, and
> > - * we don't need to exit from the guest until the guest deactivates
> > - * the already injected interrupt, so therefore we should set the
> > - * hardware active state to prevent unnecessary exits from the guest.
> > - *
> > - * Also, if we enter the guest with the virtual timer interrupt active,
> > - * then it must be active on the physical distributor, because we set
> > - * the HW bit and the guest must be able to deactivate the virtual and
> > - * physical interrupt at the same time.
> > - *
> > - * Conversely, if the virtual input level is deasserted and the virtual
> > - * interrupt is not active, then always clear the hardware active state
> > - * to ensure that hardware interrupts from the timer triggers a guest
> > - * exit.
> > - */
> > phys_active = vtimer->irq.level ||
> > - kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
> > -
> > - /*
> > - * We want to avoid hitting the (re)distributor as much as
> > - * possible, as this is a potentially expensive MMIO access
> > - * (not to mention locks in the irq layer), and a solution for
> > - * this is to cache the "active" state in memory.
> > - *
> > - * Things to consider: we cannot cache an "active set" state,
> > - * because the HW can change this behind our back (it becomes
> > - * "clear" in the HW). We must then restrict the caching to
> > - * the "clear" state.
> > - *
> > - * The cache is invalidated on:
> > - * - vcpu put, indicating that the HW cannot be trusted to be
> > - * in a sane state on the next vcpu load,
> > - * - any change in the interrupt state
> > - *
> > - * Usage conditions:
> > - * - cached value is "active clear"
> > - * - value to be programmed is "active clear"
> > - */
> > - if (vtimer->active_cleared_last && !phys_active)
> > - return;
> > + kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
> >
> > ret = irq_set_irqchip_state(host_vtimer_irq,
> > IRQCHIP_STATE_ACTIVE,
> > phys_active);
> > WARN_ON(ret);
> > +}
> >
> > - vtimer->active_cleared_last = !phys_active;
> > +static void kvm_timer_vcpu_load_user(struct kvm_vcpu *vcpu)
> > +{
> > + kvm_vtimer_update_mask_user(vcpu);
> > +}
> > +
> > +void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
> > +{
> > + struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > + struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > +
> > + if (unlikely(!timer->enabled))
> > + return;
> > +
> > + if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
> > + kvm_timer_vcpu_load_user(vcpu);
> > + else
> > + kvm_timer_vcpu_load_vgic(vcpu);
> > +
> > + set_cntvoff(vtimer->cntvoff);
> > +
> > + vtimer_restore_state(vcpu);
> > +
> > + if (has_vhe())
> > + disable_el1_phys_timer_access();
>
> Same question here :)
>
Same answer as below.
> > }
> >
> > bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
> > @@ -426,23 +460,6 @@ bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
> > ptimer->irq.level != plevel;
> > }
> >
> > -static void kvm_timer_flush_hwstate_user(struct kvm_vcpu *vcpu)
> > -{
> > - struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > -
> > - /*
> > - * To prevent continuously exiting from the guest, we mask the
> > - * physical interrupt such that the guest can make forward progress.
> > - * Once we detect the output level being deasserted, we unmask the
> > - * interrupt again so that we exit from the guest when the timer
> > - * fires.
> > - */
> > - if (vtimer->irq.level)
> > - disable_percpu_irq(host_vtimer_irq);
> > - else
> > - enable_percpu_irq(host_vtimer_irq, 0);
> > -}
> > -
> > /**
> > * kvm_timer_flush_hwstate - prepare timers before running the vcpu
> > * @vcpu: The vcpu pointer
> > @@ -455,23 +472,61 @@ static void kvm_timer_flush_hwstate_user(struct kvm_vcpu *vcpu)
> > void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> > {
> > struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > - struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > + struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> >
> > if (unlikely(!timer->enabled))
> > return;
> >
> > - kvm_timer_update_state(vcpu);
> > + if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
> > + kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
> >
> > /* Set the background timer for the physical timer emulation. */
> > phys_timer_emulate(vcpu, vcpu_ptimer(vcpu));
> > +}
> >
> > - if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
> > - kvm_timer_flush_hwstate_user(vcpu);
> > - else
> > - kvm_timer_flush_hwstate_vgic(vcpu);
> > +void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
> > +{
> > + struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >
> > - set_cntvoff(vtimer->cntvoff);
> > - timer_restore_state(vcpu);
> > + if (unlikely(!timer->enabled))
> > + return;
> > +
> > + if (has_vhe())
> > + enable_el1_phys_timer_access();
>
> I wonder why we need to enable the EL1 physical timer access on VHE
> systems (assuming TGE bit is set at this point)? EL2 can access it
> regardless of EL1PTEN bit status, and EL0 access is controlled by
> EL0PTEN.
Yeah, my code is bogus; you already addressed that. I think I wrote the
first version of these patches prior to you fixing the physical timer
trap configuration for VHE systems.
>
> In any case, since the cnthctl_el2 format changes when E2H == 1, don't
> we need to take this into account in the implementation of
> enable_el1_phys_timer_access()?
>
You are indeed right. Nice catch!
Fix incoming.
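
To illustrate the point: with HCR_EL2.E2H == 1 the EL1PCEN/EL1PCTEN
enable bits sit at CNTHCTL_EL2[11:10] instead of CNTHCTL_EL2[1:0], so
an E2H-aware helper would have to shift them, along these lines (a
rough, untested sketch only; the final fix may well just stop touching
these bits on VHE, and the CNTHCTL_EL1PCEN/CNTHCTL_EL1PCTEN macros are
assumed to be the existing bit 1/bit 0 definitions):

void enable_el1_phys_timer_access(void)
{
	/*
	 * This path only runs on VHE, where HCR_EL2.E2H == 1 and the
	 * EL1 physical timer/counter enable bits live in
	 * CNTHCTL_EL2[11:10] rather than CNTHCTL_EL2[1:0].
	 */
	u64 val = read_sysreg(cnthctl_el2);

	val |= (CNTHCTL_EL1PCEN | CNTHCTL_EL1PCTEN) << 10;
	write_sysreg(val, cnthctl_el2);
}

(with the matching shift in the disable path, of course)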
-Christoffer