From: Fuad Tabba
To: Marc Zyngier, Oliver Upton, kvmarm@lists.linux.dev,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Cc: Catalin Marinas, Will Deacon, Joey Gouly, Steffen Eiden,
 Suzuki K Poulose, Zenghui Yu, Vincent Donnefort, Quentin Perret,
 Sebastian Ene, Hyunwoo Kim, Fuad Tabba
Subject: [PATCH v3 8/8] KVM: arm64: Implement lazy vCPU state sync for non-protected guests
Date: Fri, 26 Jun 2026 08:04:08 +0100
Message-Id: <20260626070408.3420953-9-fuad.tabba@linux.dev>
In-Reply-To: <20260626070408.3420953-1-fuad.tabba@linux.dev>
References: <20260626070408.3420953-1-fuad.tabba@linux.dev>

pKVM copies a non-protected guest's register context between the host and
the hypervisor on every world switch, even when the host never inspects it.

Defer the copy: on entry, flush the host context into the hyp vCPU only
when the host marked it dirty (PKVM_HOST_STATE_DIRTY); on exit, leave it
in the hyp vCPU and copy it back only when the host needs it, via a
__pkvm_vcpu_sync_state hypercall or at vcpu put. A protected guest's
context is copied as before, since lazy sync only helps where the host is
trusted to see the guest's registers.

PC and PSTATE are the exception: they are copied back on every exit so
the kvm_exit tracepoint reports the guest's real exit PC, and the run
loop's vcpu_mode_is_bad_32bit() and SError-masking checks evaluate the
guest's current PSTATE rather than the value left by the previous sync.

The host needs the full context when it is about to read it (trap
handling) or write it (the SError injection that writes ESR_EL1). Sync
both from handle_exit_early(), which runs non-preemptible so the loaded
hyp vCPU is stable without a preempt guard.

Signed-off-by: Fuad Tabba
---
 arch/arm64/include/asm/kvm_asm.h   |  1 +
 arch/arm64/include/asm/kvm_host.h  |  2 +
 arch/arm64/kvm/arm.c               |  7 +++
 arch/arm64/kvm/handle_exit.c       | 23 ++++++++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 86 ++++++++++++++++++++++++++++--
 5 files changed, 114 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 043495f7fc78b..6e1135b3ded44 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -113,6 +113,7 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
+	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_sync_state,
 	__KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
 
 	MARKER(__KVM_HOST_SMCCC_FUNC_MAX)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2faa60df847d2..caa39ee5125f2 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1068,6 +1068,8 @@ struct kvm_vcpu_arch {
 #define INCREMENT_PC __vcpu_single_flag(iflags, BIT(1))
 /* Target EL/MODE (not a single flag, but let's abuse the macro) */
 #define EXCEPT_MASK __vcpu_single_flag(iflags, GENMASK(3, 1))
+/* Host-set: the hyp flushes the non-protected vCPU state in on entry */
+#define PKVM_HOST_STATE_DIRTY __vcpu_single_flag(iflags, BIT(4))
 
 /* Helpers to encode exceptions with minimum fuss */
 #define __EXCEPT_MASK_VAL unpack_vcpu_flag(EXCEPT_MASK)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3732ee9eb0d4e..4e89558d80278 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -733,6 +733,10 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	if (is_protected_kvm_enabled()) {
 		kvm_call_hyp(__vgic_v3_save_aprs, &vcpu->arch.vgic_cpu.vgic_v3);
 		kvm_call_hyp_nvhe(__pkvm_vcpu_put);
+
+		/* __pkvm_vcpu_put implies a sync of the state */
+		if (!kvm_vm_is_protected(vcpu->kvm))
+			vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
 	}
 
 	kvm_vcpu_put_debug(vcpu);
@@ -964,6 +968,9 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 		return ret;
 
 	if (is_protected_kvm_enabled()) {
+		/* Start with the vcpu in a dirty state */
+		if (!kvm_vm_is_protected(vcpu->kvm))
+			vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
 		ret = pkvm_create_hyp_vm(kvm);
 		if (ret)
 			return ret;
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 54aedf93c78b6..29108e5c0206e 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -486,9 +486,32 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
 	}
 }
 
+static void handle_exit_pkvm_state(struct kvm_vcpu *vcpu, int exception_index)
+{
+	int exception_code = ARM_EXCEPTION_CODE(exception_index);
+
+	if (!is_protected_kvm_enabled() || kvm_vm_is_protected(vcpu->kvm))
+		return;
+
+	/*
+	 * Sync the context back when the host will read (trap) or write
+	 * (SError) it. Preempt-off here, so the loaded hyp vCPU is stable.
+	 */
+	if (exception_code == ARM_EXCEPTION_TRAP ||
+	    exception_code == ARM_EXCEPTION_EL1_SERROR ||
+	    ARM_SERROR_PENDING(exception_index)) {
+		kvm_call_hyp_nvhe(__pkvm_vcpu_sync_state);
+		vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
+	} else {
+		vcpu_clear_flag(vcpu, PKVM_HOST_STATE_DIRTY);
+	}
+}
+
 /* For exit types that need handling before we can be preempted */
 void handle_exit_early(struct kvm_vcpu *vcpu, int exception_index)
 {
+	handle_exit_pkvm_state(vcpu, exception_index);
+
 	if (ARM_SERROR_PENDING(exception_index)) {
 		if (this_cpu_has_cap(ARM64_HAS_RAS_EXTN)) {
 			u64 disr = kvm_vcpu_get_disr(vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 0194965930e61..acf53aae4fe43 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -141,6 +141,48 @@ static void sync_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
 		host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
 }
 
+static void __copy_vcpu_state(const struct kvm_vcpu *from_vcpu,
+			      struct kvm_vcpu *to_vcpu)
+{
+	int i;
+
+	to_vcpu->arch.ctxt.regs = from_vcpu->arch.ctxt.regs;
+	to_vcpu->arch.ctxt.spsr_abt = from_vcpu->arch.ctxt.spsr_abt;
+	to_vcpu->arch.ctxt.spsr_und = from_vcpu->arch.ctxt.spsr_und;
+	to_vcpu->arch.ctxt.spsr_irq = from_vcpu->arch.ctxt.spsr_irq;
+	to_vcpu->arch.ctxt.spsr_fiq = from_vcpu->arch.ctxt.spsr_fiq;
+	to_vcpu->arch.ctxt.fp_regs = from_vcpu->arch.ctxt.fp_regs;
+
+	/*
+	 * Copy the sysregs, but don't mess with the timer state which
+	 * is directly handled by EL1 and is expected to be preserved.
+	 * enum vcpu_sysreg is sparse: VNCR-mapped registers take values
+	 * derived from their VNCR page offset, so the timer registers do
+	 * not form a contiguous numeric range and must be skipped by name.
+	 */
+	for (i = 1; i < NR_SYS_REGS; i++) {
+		switch (i) {
+		case CNTVOFF_EL2:
+		case CNTV_CVAL_EL0:
+		case CNTV_CTL_EL0:
+		case CNTP_CVAL_EL0:
+		case CNTP_CTL_EL0:
+			continue;
+		}
+		to_vcpu->arch.ctxt.sys_regs[i] = from_vcpu->arch.ctxt.sys_regs[i];
+	}
+}
+
+static void sync_hyp_vcpu_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	__copy_vcpu_state(&hyp_vcpu->vcpu, hyp_vcpu->host_vcpu);
+}
+
+static void flush_hyp_vcpu_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	__copy_vcpu_state(hyp_vcpu->host_vcpu, &hyp_vcpu->vcpu);
+}
+
 static void flush_debug_state(struct pkvm_hyp_vcpu *hyp_vcpu)
 {
 	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
@@ -170,7 +212,17 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 	fpsimd_sve_flush();
 	flush_debug_state(hyp_vcpu);
 
-	hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
+	/*
+	 * If we deal with a non-protected guest and the state is potentially
+	 * dirty (from a host perspective), copy the state back into the hyp
+	 * vcpu.
+	 */
+	if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
+		if (vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY))
+			flush_hyp_vcpu_state(hyp_vcpu);
+	} else {
+		hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
+	}
 
 	/* __hyp_running_vcpu must be NULL in a guest context. */
 	hyp_vcpu->vcpu.arch.ctxt.__hyp_running_vcpu = NULL;
@@ -201,9 +253,13 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 	fpsimd_sve_sync(&hyp_vcpu->vcpu);
 	sync_debug_state(hyp_vcpu);
 
-	host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
-
-	host_vcpu->arch.hcr_el2 = hyp_vcpu->vcpu.arch.hcr_el2;
+	if (pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
+		host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
+	} else {
+		/* Keep PC (tracepoint) and PSTATE (vcpu_mode_is_bad_32bit) current. */
+		host_vcpu->arch.ctxt.regs.pc = hyp_vcpu->vcpu.arch.ctxt.regs.pc;
+		host_vcpu->arch.ctxt.regs.pstate = hyp_vcpu->vcpu.arch.ctxt.regs.pstate;
+	}
 
 	host_vcpu->arch.fault = hyp_vcpu->vcpu.arch.fault;
 
@@ -237,8 +293,27 @@ static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
 {
 	struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
 
-	if (hyp_vcpu)
+	if (hyp_vcpu) {
+		struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+
+		if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu) &&
+		    !vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY)) {
+			sync_hyp_vcpu_state(hyp_vcpu);
+		}
+
 		pkvm_put_hyp_vcpu(hyp_vcpu);
+	}
+}
+
+static void handle___pkvm_vcpu_sync_state(struct kvm_cpu_context *host_ctxt)
+{
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+
+	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+		return;
+
+	sync_hyp_vcpu_state(hyp_vcpu);
 }
 
 static struct kvm_vcpu *__get_host_hyp_vcpus(struct kvm_vcpu *arg,
@@ -869,6 +944,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_finalize_teardown_vm),
 	HANDLE_FUNC(__pkvm_vcpu_load),
 	HANDLE_FUNC(__pkvm_vcpu_put),
+	HANDLE_FUNC(__pkvm_vcpu_sync_state),
 	HANDLE_FUNC(__pkvm_tlb_flush_vmid),
 };
 
-- 
2.39.5
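
Illustrative note, not part of the patch: the dirty-flag protocol described
in the commit message can be modelled in a few lines of user-space C. This
is only a sketch under stated assumptions. Every model_* name is
hypothetical, the three-field context stands in for the real
kvm_cpu_context, and host_dirty plays the role of PKVM_HOST_STATE_DIRTY. It
does not model protected guests, the timer-register exclusion, or
preemption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the guest register context. */
struct model_ctxt {
	unsigned long pc;
	unsigned long pstate;
	unsigned long x0;		/* represents "everything else" */
};

struct model_vcpu {
	struct model_ctxt host;		/* the host's copy */
	struct model_ctxt hyp;		/* the hypervisor's copy */
	bool host_dirty;		/* models PKVM_HOST_STATE_DIRTY */
	unsigned int full_copies;	/* counts full-context copies */
};

/* Entry: flush host -> hyp only if the host marked its copy dirty. */
static void model_flush(struct model_vcpu *v)
{
	if (v->host_dirty) {
		v->hyp = v->host;
		v->full_copies++;
	}
}

/* Exit: always publish PC and PSTATE; the rest stays in the hyp copy. */
static void model_sync_on_exit(struct model_vcpu *v)
{
	v->host.pc = v->hyp.pc;
	v->host.pstate = v->hyp.pstate;
}

/* Early exit handling: full sync only if the host will touch the context. */
static void model_handle_exit_early(struct model_vcpu *v, bool host_needs_ctxt)
{
	if (host_needs_ctxt) {
		v->host = v->hyp;
		v->full_copies++;
		v->host_dirty = true;
	} else {
		v->host_dirty = false;
	}
}

/* vcpu put: sync if the host copy is not already current, then mark dirty. */
static void model_put(struct model_vcpu *v)
{
	if (!v->host_dirty) {
		v->host = v->hyp;
		v->full_copies++;
	}
	v->host_dirty = true;
}

/* One simulated guest run: the guest advances PC and modifies x0. */
static void model_run(struct model_vcpu *v, bool host_needs_ctxt)
{
	model_flush(v);
	v->hyp.pc += 4;
	v->hyp.x0 += 1;
	model_sync_on_exit(v);
	model_handle_exit_early(v, host_needs_ctxt);
}

/* Three exits, only the last one a trap; returns the full-copy count. */
static unsigned int model_demo(void)
{
	struct model_vcpu v = { .host_dirty = true };	/* first run starts dirty */

	model_run(&v, false);
	model_run(&v, false);
	assert(v.host.pc == 8 && v.host.x0 == 0);	/* PC current, x0 stale */
	model_run(&v, true);
	assert(v.host.x0 == 3);				/* trap pulled the full context */
	model_put(&v);					/* already synced: no extra copy */
	return v.full_copies;
}
```

In this model, model_demo() performs two full-context copies across three
exits, where copying in both directions on every world switch would take
six. Calling it from a trivial main() is enough to exercise the asserts.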