Re: [PATCH v1 11/11] KVM: arm64: Implement lazy vCPU state sync for non-protected guests

From: Vincent Donnefort <vdonnefort@google.com>
To: tabba@google.com
Cc: Marc Zyngier <maz@kernel.org>, Oliver Upton <oupton@kernel.org>,
	Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Quentin Perret <qperret@google.com>,
	Sebastian Ene <sebastianene@google.com>,
	Per Larsen <perlarsen@google.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Joey Gouly <joey.gouly@arm.com>,
	Steffen Eiden <seiden@linux.ibm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Hyunwoo Kim <imv4bel@gmail.com>,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 11/11] KVM: arm64: Implement lazy vCPU state sync for non-protected guests
Date: Mon, 15 Jun 2026 17:25:20 +0100
Message-ID: <ajAncPp3nOGcWD1U@google.com>
In-Reply-To: <20260612065925.755562-12-tabba@google.com>

On Fri, Jun 12, 2026 at 07:59:25AM +0100, tabba@google.com wrote:
> pKVM copies a non-protected guest's register context between the host
> and the hypervisor on every world switch, even when the host never
> inspects it. Defer the copy: on entry, flush the host context into the
> hyp vCPU only when the host marked it dirty (PKVM_HOST_STATE_DIRTY); on
> exit, leave it in the hyp vCPU and copy it back only when the host needs
> it, via a __pkvm_vcpu_sync_state hypercall on trap handling or at vcpu
> put. A protected guest's context is copied as before, since lazy sync
> only helps where the host is trusted to see the guest's registers.
> 
> The PC is the exception: it is copied back on every exit so the
> kvm_exit tracepoint reports the guest's real exit PC rather than the
> value left by the previous sync.
> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h   |  1 +
>  arch/arm64/include/asm/kvm_host.h  |  2 +
>  arch/arm64/kvm/arm.c               |  7 +++
>  arch/arm64/kvm/handle_exit.c       | 22 ++++++++
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 88 ++++++++++++++++++++++++++++--
>  5 files changed, 115 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 043495f7fc78..6e1135b3ded4 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -113,6 +113,7 @@ enum __kvm_host_smccc_func {
>  	__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
> +	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_sync_state,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
>  
>  	MARKER(__KVM_HOST_SMCCC_FUNC_MAX)
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index a49042bfa801..1ef660774adc 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1113,6 +1113,8 @@ struct kvm_vcpu_arch {
>  /* SError pending for nested guest */
>  #define NESTED_SERROR_PENDING	__vcpu_single_flag(sflags, BIT(8))
>  
> +/* pKVM host vcpu state is dirty, needs resync (nVHE-only) */

nit: with hVHE, I guess we can just drop the "(nVHE-only)" part?

> +#define PKVM_HOST_STATE_DIRTY	__vcpu_single_flag(iflags, BIT(4))
>  
>  /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
>  #define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) +	\
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index c9f36932c980..a5c54e37778b 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -734,6 +734,10 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	if (is_protected_kvm_enabled()) {
>  		kvm_call_hyp(__vgic_v3_save_aprs, &vcpu->arch.vgic_cpu.vgic_v3);
>  		kvm_call_hyp_nvhe(__pkvm_vcpu_put);
> +
> +		/* __pkvm_vcpu_put implies a sync of the state */
> +		if (!kvm_vm_is_protected(vcpu->kvm))
> +			vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
>  	}
>  
>  	kvm_vcpu_put_debug(vcpu);
> @@ -961,6 +965,9 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
>  		return ret;
>  
>  	if (is_protected_kvm_enabled()) {
> +		/* Start with the vcpu in a dirty state */
> +		if (!kvm_vm_is_protected(vcpu->kvm))
> +			vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
>  		ret = pkvm_create_hyp_vm(kvm);
>  		if (ret)
>  			return ret;
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 54aedf93c78b..dccc3786548b 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -422,6 +422,21 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu)
>  {
>  	int handled;
>  
> +	/*
> +	 * If we run a non-protected VM when protection is enabled
> +	 * system-wide, resync the state from the hypervisor and mark
> +	 * it as dirty on the host side if it wasn't dirty already
> +	 * (which could happen if preemption has taken place).
> +	 */
> +	if (is_protected_kvm_enabled() && !kvm_vm_is_protected(vcpu->kvm)) {
> +		preempt_disable();

nit: since this series introduces guard(), this one could be
guard(preempt)().
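
To illustrate the shape I have in mind, here is a minimal userspace sketch,
not kernel code. It models the scope-based guard with the GCC/Clang
cleanup attribute. Every name in it (model_vcpu, MODEL_GUARD_PREEMPT,
model_sync_state, the counters) is a hypothetical stand-in, and the
counter bump stands in for the __pkvm_vcpu_sync_state hypercall:

```c
/* Userspace model of a scope-based preemption guard; all names are stand-ins. */
#include <stdbool.h>

static int preempt_depth;	/* models the preempt count */
static int sync_calls;		/* counts modelled sync hypercalls */

struct model_vcpu {
	bool host_state_dirty;	/* models PKVM_HOST_STATE_DIRTY */
};

static void model_preempt_disable(void)
{
	preempt_depth++;
}

/* Cleanup handler: receives a pointer to the guard variable. */
static void model_preempt_enable(int *unused)
{
	(void)unused;
	preempt_depth--;
}

/*
 * Rough analogue of guard(preempt)(): disable now, re-enable
 * automatically when the enclosing scope ends.
 */
#define MODEL_GUARD_PREEMPT() \
	int model_guard __attribute__((cleanup(model_preempt_enable), unused)) = \
		(model_preempt_disable(), 0)

static void model_sync_state(struct model_vcpu *vcpu)
{
	MODEL_GUARD_PREEMPT();

	if (!vcpu->host_state_dirty) {
		sync_calls++;	/* stands in for the hypercall */
		vcpu->host_state_dirty = true;
	}
	/* No explicit enable: the guard releases on every return path. */
}
```

The point is only that the explicit preempt_enable() goes away and the
release cannot be missed if the block later grows an early return.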

> +		if (!(vcpu_get_flag(vcpu, PKVM_HOST_STATE_DIRTY))) {
> +			kvm_call_hyp_nvhe(__pkvm_vcpu_sync_state);
> +			vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> +		}
> +		preempt_enable();
> +	}
> +
>  	/*
>  	 * See ARM ARM B1.14.1: "Hyp traps on instructions
>  	 * that fail their condition code check"
> @@ -489,6 +504,13 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
>  /* For exit types that need handling before we can be preempted */
>  void handle_exit_early(struct kvm_vcpu *vcpu, int exception_index)
>  {
> +	/*
> +	 * We just exited, so the state is clean from a hypervisor
> +	 * perspective.
> +	 */
> +	if (is_protected_kvm_enabled())
> +		vcpu_clear_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> +
>  	if (ARM_SERROR_PENDING(exception_index)) {
>  		if (this_cpu_has_cap(ARM64_HAS_RAS_EXTN)) {
>  			u64 disr = kvm_vcpu_get_disr(vcpu);
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 23e644c24a03..02383b372258 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -139,6 +139,49 @@ static void sync_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
>  		host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
>  }
>  
> +
> +static void __copy_vcpu_state(const struct kvm_vcpu *from_vcpu,
> +			      struct kvm_vcpu *to_vcpu)
> +{
> +	int i;
> +
> +	to_vcpu->arch.ctxt.regs		= from_vcpu->arch.ctxt.regs;
> +	to_vcpu->arch.ctxt.spsr_abt	= from_vcpu->arch.ctxt.spsr_abt;
> +	to_vcpu->arch.ctxt.spsr_und	= from_vcpu->arch.ctxt.spsr_und;
> +	to_vcpu->arch.ctxt.spsr_irq	= from_vcpu->arch.ctxt.spsr_irq;
> +	to_vcpu->arch.ctxt.spsr_fiq	= from_vcpu->arch.ctxt.spsr_fiq;
> +	to_vcpu->arch.ctxt.fp_regs	= from_vcpu->arch.ctxt.fp_regs;
> +
> +	/*
> +	 * Copy the sysregs, but don't mess with the timer state which
> +	 * is directly handled by EL1 and is expected to be preserved.
> +	 * enum vcpu_sysreg is sparse: VNCR-mapped registers take values
> +	 * derived from their VNCR page offset, so the timer registers do
> +	 * not form a contiguous numeric range and must be skipped by name.
> +	 */
> +	for (i = 1; i < NR_SYS_REGS; i++) {
> +		switch (i) {
> +		case CNTVOFF_EL2:
> +		case CNTV_CVAL_EL0:
> +		case CNTV_CTL_EL0:
> +		case CNTP_CVAL_EL0:
> +		case CNTP_CTL_EL0:
> +			continue;
> +		}
> +		to_vcpu->arch.ctxt.sys_regs[i] = from_vcpu->arch.ctxt.sys_regs[i];
> +	}
> +}
> +
> +static void __sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> +{
> +	__copy_vcpu_state(&hyp_vcpu->vcpu, hyp_vcpu->host_vcpu);
> +}
> +
> +static void __flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> +{
> +	__copy_vcpu_state(hyp_vcpu->host_vcpu, &hyp_vcpu->vcpu);
> +}

nit: Could these be flush/sync_hyp_vcpu_state? Everything else here is
called "state", and we already have flush_debug_state() below.

> +
>  static void flush_debug_state(struct pkvm_hyp_vcpu *hyp_vcpu)
>  {
>  	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> @@ -168,7 +211,17 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
>  	fpsimd_sve_flush();
>  	flush_debug_state(hyp_vcpu);
>  
> -	hyp_vcpu->vcpu.arch.ctxt	= host_vcpu->arch.ctxt;
> +	/*
> +	 * If we deal with a non-protected guest and the state is potentially
> +	 * dirty (from a host perspective), copy the state back into the hyp
> +	 * vcpu.
> +	 */
> +	if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
> +		if (vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY))
> +			__flush_hyp_vcpu(hyp_vcpu);
> +	} else {
> +		hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
> +	}
>  
>  	hyp_vcpu->vcpu.arch.mdcr_el2	= host_vcpu->arch.mdcr_el2;
>  	hyp_vcpu->vcpu.arch.hcr_el2 &= ~(HCR_TWI | HCR_TWE);
> @@ -191,9 +244,11 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
>  	fpsimd_sve_sync(&hyp_vcpu->vcpu);
>  	sync_debug_state(hyp_vcpu);
>  
> -	host_vcpu->arch.ctxt		= hyp_vcpu->vcpu.arch.ctxt;
> -
> -	host_vcpu->arch.hcr_el2		= hyp_vcpu->vcpu.arch.hcr_el2;
> +	if (pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> +		host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
> +	else
> +		/* Keep the PC current for the kvm_exit tracepoint (lazy ctxt sync). */
> +		host_vcpu->arch.ctxt.regs.pc = hyp_vcpu->vcpu.arch.ctxt.regs.pc;
>  
>  	host_vcpu->arch.fault		= hyp_vcpu->vcpu.arch.fault;
>  
> @@ -227,8 +282,30 @@ static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
>  {
>  	struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
>  
> -	if (hyp_vcpu)
> +	if (hyp_vcpu) {
> +		struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> +
> +		if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu) &&
> +		    !vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY)) {
> +			__sync_hyp_vcpu(hyp_vcpu);
> +		}
> +
>  		pkvm_put_hyp_vcpu(hyp_vcpu);
> +	}
> +}
> +
> +static void handle___pkvm_vcpu_sync_state(struct kvm_cpu_context *host_ctxt)
> +{
> +	struct pkvm_hyp_vcpu *hyp_vcpu;
> +
> +	if (!is_protected_kvm_enabled())
> +		return;

Since "KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls",
we got rid of those is_protected_kvm_enabled() checks for pKVM-only HVCs.
(Also, this one is declared in the pKVM-only section of the HVCs.)
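
As a rough sketch of the handler without that check (a userspace model,
not the real code): model_hyp_vcpu, model_loaded_vcpu and
model_handle_sync_state are hypothetical stand-ins, and a single int
stands in for the register context.

```c
/* Userspace model of the sync handler; all names are stand-ins. */
#include <stdbool.h>
#include <stddef.h>

struct model_hyp_vcpu {
	bool is_protected;
	int host_ctxt;	/* host's copy of the context */
	int hyp_ctxt;	/* hypervisor's copy of the context */
};

/* Models pkvm_get_loaded_hyp_vcpu(): NULL when nothing is loaded. */
static struct model_hyp_vcpu *model_loaded_vcpu;

static void model_handle_sync_state(void)
{
	struct model_hyp_vcpu *hyp_vcpu = model_loaded_vcpu;

	/*
	 * No global "is pKVM enabled" test here: the handler is assumed
	 * to be reachable only when pKVM is enabled.
	 */
	if (!hyp_vcpu || hyp_vcpu->is_protected)
		return;

	hyp_vcpu->host_ctxt = hyp_vcpu->hyp_ctxt;
}
```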

> +
> +	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
> +	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> +		return;
> +
> +	__sync_hyp_vcpu(hyp_vcpu);
>  }
>  
>  static struct kvm_vcpu *__get_host_hyp_vcpus(struct kvm_vcpu *arg,
> @@ -859,6 +936,7 @@ static const hcall_t host_hcall[] = {
>  	HANDLE_FUNC(__pkvm_finalize_teardown_vm),
>  	HANDLE_FUNC(__pkvm_vcpu_load),
>  	HANDLE_FUNC(__pkvm_vcpu_put),
> +	HANDLE_FUNC(__pkvm_vcpu_sync_state),
>  	HANDLE_FUNC(__pkvm_tlb_flush_vmid),
>  };
>  
> -- 
> 2.54.0.1136.gdb2ca164c4-goog
>
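
For anyone following along, this is how I read the overall protocol for
non-protected guests, as a small userspace model. It is only a sketch of
my understanding: the names are hypothetical, a single int stands in for
the register context, and the host-side and hyp-side halves of the flag
handling are folded into one function each.

```c
/* Userspace model of the lazy sync protocol; all names are stand-ins. */
#include <stdbool.h>

struct model_vcpu {
	bool host_dirty;	/* models PKVM_HOST_STATE_DIRTY */
	int host_ctxt;		/* host's copy of the register context */
	int hyp_ctxt;		/* hypervisor's copy */
	int copies;		/* number of full context copies performed */
};

/* Entry: flush host -> hyp only if the host may have changed its copy. */
static void model_enter(struct model_vcpu *v)
{
	if (v->host_dirty) {
		v->hyp_ctxt = v->host_ctxt;
		v->copies++;
	}
}

/* Exit: the guest ran, so the hyp copy is the up-to-date one. */
static void model_exit(struct model_vcpu *v, int guest_result)
{
	v->hyp_ctxt = guest_result;
	v->host_dirty = false;
}

/* Trap handling or vcpu put: sync hyp -> host on demand, at most once. */
static void model_sync(struct model_vcpu *v)
{
	if (!v->host_dirty) {
		v->host_ctxt = v->hyp_ctxt;
		v->copies++;
		v->host_dirty = true;
	}
}
```

In this model, back-to-back enter/exit cycles with no sync in between
perform no context copies at all.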
