[PATCH v1 00/11] KVM: arm64: Rework pKVM vCPU state synchronisation

* [PATCH v1 00/11] KVM: arm64: Rework pKVM vCPU state synchronisation
@ 2026-06-12  6:59 tabba
  2026-06-12  6:59 ` [PATCH v1 01/11] KVM: arm64: Add scoped resource management (guard) for hyp_spinlock tabba
                   ` (10 more replies)
  0 siblings, 11 replies; 23+ messages in thread
From: tabba @ 2026-06-12  6:59 UTC
  To: Marc Zyngier, Oliver Upton
  Cc: Fuad Tabba, Will Deacon, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Sebastian Ene, Per Larsen, Suzuki K Poulose,
	Zenghui Yu, Joey Gouly, Steffen Eiden, Mark Rutland,
	Jonathan Cameron, Hyunwoo Kim, linux-arm-kernel, kvmarm,
	linux-kernel

Hi folks,

Building on Will's pKVM infrastructure series [1], this series reworks
how pKVM moves vCPU state between the host and EL2, and stops copying a
non-protected guest's state on every world switch.

EL2 gains proper primitives for the state it transfers: vCPU lookup
helpers, and VGIC flush/sync primitives that reduce how much host state
EL2 dereferences. The series also moves some preparatory code (such as
sys reg access and PSCI helpers) to shared headers and HYP, and defers
copying a non-protected guest's register state back to the host until
the host actually needs it, instead of copying on every exit.
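
As a rough illustration of the lazy-copy idea described above, here is a
minimal user-space sketch. It is not the code from patch 11: the types
(toy_regs, toy_vcpu), the host_stale flag and both helpers are
hypothetical names invented for this example, and the real series may
track staleness quite differently.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical register file; the real vCPU context is far larger. */
struct toy_regs {
	unsigned long x[4];
};

/* Hypothetical vCPU with one copy per side of the host/EL2 boundary. */
struct toy_vcpu {
	struct toy_regs host;	/* copy the host reads */
	struct toy_regs hyp;	/* copy the hypervisor runs with */
	bool host_stale;	/* host copy is behind the hyp copy */
	unsigned int nr_copies;	/* counts copies, for illustration only */
};

static struct toy_vcpu demo_vcpu;

/* On guest exit, record the new state but do not copy it to the host. */
static void toy_guest_exit(struct toy_vcpu *v, unsigned long x0)
{
	v->hyp.x[0] = x0;
	v->host_stale = true;
}

/* The copy happens only when the host asks for the registers. */
static const struct toy_regs *toy_host_get_regs(struct toy_vcpu *v)
{
	if (v->host_stale) {
		memcpy(&v->host, &v->hyp, sizeof(v->host));
		v->host_stale = false;
		v->nr_copies++;
	}

	return &v->host;
}
```

In this sketch, any number of calls to toy_guest_exit() followed by one
toy_host_get_regs() costs a single copy, where an eager scheme would
copy once per exit.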

This is the first of two series moving pKVM vCPU state management to
EL2. The follow-up completes the job for protected VMs: state
isolation, PSCI handling at EL2, and the resulting API behaviour.

The series is structured as follows:

  01-03:  Guard/scoped-resource support for hyp_spinlock and KVM locking
          (Marc asked for this to land as a prequel to a series that uses it).
  04-07:  Preparatory refactoring (MPIDR, sys reg access, vCPU reset, PSCI
          helpers) to shared headers and HYP.
  08:     Host and hypervisor vCPU lookup primitives.
  09-10:  VGIC: reduce EL2's exposure to host state, add flush/sync primitives.
  11:     Lazy state sync for non-protected guests.

Based on v7.1-rc7.

[1] https://lore.kernel.org/all/20260105154939.11041-1-will@kernel.org/

Cheers,
/fuad

Fuad Tabba (8):
  KVM: arm64: Add scoped resource management (guard) for hyp_spinlock
  KVM: arm64: Use guard(hyp_spinlock) in pKVM hypervisor code
  KVM: arm64: Use guard()/scoped_guard() in arm64 KVM EL1 code
  KVM: arm64: Extract MPIDR computation into a shared header
  KVM: arm64: Make vcpu_{read,write}_sys_reg available to HYP code
  KVM: arm64: Factor out reusable vCPU reset helpers
  KVM: arm64: Move PSCI helper functions to a shared header
  KVM: arm64: Implement lazy vCPU state sync for non-protected guests

Marc Zyngier (3):
  KVM: arm64: Add host and hypervisor vCPU lookup primitives
  KVM: arm64: Minimise EL2's exposure of host VGIC state during world
    switch
  KVM: arm64: Add primitives to flush/sync the VGIC state at EL2

 arch/arm64/include/asm/kvm_arm.h           |  12 +
 arch/arm64/include/asm/kvm_asm.h           |   1 +
 arch/arm64/include/asm/kvm_emulate.h       |  80 ++++++-
 arch/arm64/include/asm/kvm_host.h          |   2 +
 arch/arm64/kvm/arm.c                       |  21 +-
 arch/arm64/kvm/handle_exit.c               |  22 ++
 arch/arm64/kvm/hyp/include/nvhe/spinlock.h |   6 +
 arch/arm64/kvm/hyp/nvhe/ffa.c              | 154 +++++--------
 arch/arm64/kvm/hyp/nvhe/hyp-main.c         | 255 ++++++++++++++++++---
 arch/arm64/kvm/hyp/nvhe/mm.c               |  37 +--
 arch/arm64/kvm/hyp/nvhe/page_alloc.c       |  13 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c             |  86 +++----
 arch/arm64/kvm/mmu.c                       |  80 +++----
 arch/arm64/kvm/pkvm.c                      |  26 +--
 arch/arm64/kvm/psci.c                      |  47 +---
 arch/arm64/kvm/reset.c                     |  68 +-----
 arch/arm64/kvm/sys_regs.c                  |  14 +-
 arch/arm64/kvm/sys_regs.h                  |  19 ++
 include/kvm/arm_psci.h                     |  28 +++
 19 files changed, 562 insertions(+), 409 deletions(-)


base-commit: 4549871118cf616eecdd2d939f78e3b9e1dddc48
-- 
2.54.0.1136.gdb2ca164c4-goog




* [PATCH v1 01/11] KVM: arm64: Add scoped resource management (guard) for hyp_spinlock
  2026-06-12  6:59 [PATCH v1 00/11] KVM: arm64: Rework pKVM vCPU state synchronisation tabba
@ 2026-06-12  6:59 ` tabba
  2026-06-12  6:59 ` [PATCH v1 02/11] KVM: arm64: Use guard(hyp_spinlock) in pKVM hypervisor code tabba
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: tabba @ 2026-06-12  6:59 UTC
  To: Marc Zyngier, Oliver Upton
  Cc: Fuad Tabba, Will Deacon, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Sebastian Ene, Per Larsen, Suzuki K Poulose,
	Zenghui Yu, Joey Gouly, Steffen Eiden, Mark Rutland,
	Jonathan Cameron, Hyunwoo Kim, linux-arm-kernel, kvmarm,
	linux-kernel

The nVHE hypervisor manages hyp_spinlock_t locks by hand across error
paths, where a missed unlock deadlocks the next CPU to take the lock.
Wire hyp_spinlock_t into <linux/cleanup.h> via DEFINE_LOCK_GUARD_1 so
callers can use guard(hyp_spinlock) and scoped_guard(hyp_spinlock),
letting later patches replace the manual lock/unlock pairs.
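
For readers unfamiliar with the mechanism, the following user-space
sketch shows the compiler feature that scope-based guards build on: the
GCC/Clang cleanup variable attribute. Everything here (toy_lock,
TOY_GUARD, lookup) is a hypothetical stand-in, not the kernel's
implementation; the real macros live in <linux/cleanup.h> and generate
considerably more than this.

```c
#include <stdbool.h>

/* Toy stand-in for hyp_spinlock_t: only records whether it is held. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l) { l->held = true; }
static void toy_lock_release(struct toy_lock *l) { l->held = false; }

/* Called by the compiler when the guard variable leaves scope. */
static void toy_guard_exit(struct toy_lock **g)
{
	toy_lock_release(*g);
}

/* Hypothetical macro playing the role of guard(hyp_spinlock)(lock). */
#define TOY_GUARD(lock)							\
	struct toy_lock *toy_guard_var					\
		__attribute__((cleanup(toy_guard_exit), unused)) =	\
		(toy_lock_acquire(lock), (lock))

static struct toy_lock table_lock;

/* Every return path drops the lock without an explicit unlock call. */
static int lookup(int idx)
{
	TOY_GUARD(&table_lock);

	if (idx < 0)
		return -1;	/* lock released on this path */

	return idx * 2;		/* and on this one */
}
```

After either return from lookup(), table_lock.held is false again, which
is the property the manual lock/unlock pairs had to maintain by hand.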

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/hyp/include/nvhe/spinlock.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
index 7c7ea8c55405..63ba826d8e3d 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
@@ -13,6 +13,8 @@
 #ifndef __ARM64_KVM_NVHE_SPINLOCK_H__
 #define __ARM64_KVM_NVHE_SPINLOCK_H__
 
+#include <linux/cleanup.h>
+
 #include <asm/alternative.h>
 #include <asm/lse.h>
 #include <asm/rwonce.h>
@@ -98,6 +100,10 @@ static inline void hyp_spin_unlock(hyp_spinlock_t *lock)
 	: "memory");
 }
 
+DEFINE_LOCK_GUARD_1(hyp_spinlock, hyp_spinlock_t,
+		    hyp_spin_lock(_T->lock),
+		    hyp_spin_unlock(_T->lock))
+
 static inline bool hyp_spin_is_locked(hyp_spinlock_t *lock)
 {
 	hyp_spinlock_t lockval = READ_ONCE(*lock);
-- 
2.54.0.1136.gdb2ca164c4-goog




* [PATCH v1 02/11] KVM: arm64: Use guard(hyp_spinlock) in pKVM hypervisor code
  2026-06-12  6:59 [PATCH v1 00/11] KVM: arm64: Rework pKVM vCPU state synchronisation tabba
  2026-06-12  6:59 ` [PATCH v1 01/11] KVM: arm64: Add scoped resource management (guard) for hyp_spinlock tabba
@ 2026-06-12  6:59 ` tabba
  2026-06-12  6:59 ` [PATCH v1 03/11] KVM: arm64: Use guard()/scoped_guard() in arm64 KVM EL1 code tabba
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: tabba @ 2026-06-12  6:59 UTC
  To: Marc Zyngier, Oliver Upton
  Cc: Fuad Tabba, Will Deacon, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Sebastian Ene, Per Larsen, Suzuki K Poulose,
	Zenghui Yu, Joey Gouly, Steffen Eiden, Mark Rutland,
	Jonathan Cameron, Hyunwoo Kim, linux-arm-kernel, kvmarm,
	linux-kernel

Convert the manual hyp_spin_lock()/hyp_spin_unlock() pairs in
arch/arm64/kvm/hyp/nvhe/{pkvm,mm,page_alloc,ffa}.c to
guard(hyp_spinlock) and scoped_guard(hyp_spinlock), dropping several
unlock-only goto labels in favour of direct returns.
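
The conversions that collapse "err = f(); unlock; return err;" into
"return f();" rely on the return expression being evaluated before the
guard's cleanup runs. The user-space sketch below illustrates that
ordering with the GCC/Clang cleanup attribute; all names are
hypothetical and the lock is a plain flag, not a real spinlock.

```c
#include <stdbool.h>

struct toy_lock {
	bool held;
};

static struct toy_lock pgd_lock;
static bool seen_held;	/* what the callee observed about the lock */

static void toy_unlock_cleanup(struct toy_lock **g)
{
	(*g)->held = false;
}

/* Stand-in for a helper that must be called with the lock held. */
static int map_locked(int size)
{
	seen_held = pgd_lock.held;
	return size > 0 ? 0 : -22;	/* -22 mirrors -EINVAL */
}

/* Old shape: temporary variable plus explicit unlock. */
static int create_mappings_manual(int size)
{
	int err;

	pgd_lock.held = true;
	err = map_locked(size);
	pgd_lock.held = false;

	return err;
}

/* New shape: the callee runs first, then the cleanup drops the lock. */
static int create_mappings_guarded(int size)
{
	struct toy_lock *g
		__attribute__((cleanup(toy_unlock_cleanup), unused)) = &pgd_lock;

	pgd_lock.held = true;
	return map_locked(size);
}
```

Both variants return the same value and leave the lock released; in the
guarded one, map_locked() still sees the lock held.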

hyp_fixblock_lock in mm.c is left as an explicit lock/unlock pair: it is
acquired in hyp_fixblock_map() and released in hyp_fixblock_unmap(), so
its critical section spans two functions and cannot be expressed as a
single lexical scope.
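
To make the lexical-scope limitation concrete, here is a small
user-space sketch. The names are hypothetical toy versions, not the
functions in mm.c: one pair hands a held lock from a map function to an
unmap function, and a second map function shows what a scope-based
guard would do instead.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_lock {
	bool held;
};

static struct toy_lock fixblock_lock;
static char fixblock[64];

/* Acquire in one function: the caller gets the mapping with the lock held. */
static void *toy_fixblock_map(void)
{
	fixblock_lock.held = true;
	return fixblock;
}

/* Release in another function, once the caller is done with the mapping. */
static void toy_fixblock_unmap(void)
{
	fixblock_lock.held = false;
}

static void toy_release(struct toy_lock **g)
{
	(*g)->held = false;
}

/*
 * With a scope-based guard, the lock is dropped as soon as this function
 * returns, so the caller would use the mapping unprotected.
 */
static void *toy_fixblock_map_guarded(void)
{
	struct toy_lock *g
		__attribute__((cleanup(toy_release), unused)) = &fixblock_lock;

	fixblock_lock.held = true;
	return fixblock;
}
```

The guarded variant returns with the lock already released, which is
why a lock whose critical section spans a map/unmap pair has to stay an
explicit lock/unlock pair.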

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/hyp/nvhe/ffa.c        | 154 +++++++++++----------------
 arch/arm64/kvm/hyp/nvhe/mm.c         |  37 ++-----
 arch/arm64/kvm/hyp/nvhe/page_alloc.c |  13 +--
 arch/arm64/kvm/hyp/nvhe/pkvm.c       |  86 +++++----------
 4 files changed, 105 insertions(+), 185 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/ffa.c b/arch/arm64/kvm/hyp/nvhe/ffa.c
index 1af722771178..46cd4fa924be 100644
--- a/arch/arm64/kvm/hyp/nvhe/ffa.c
+++ b/arch/arm64/kvm/hyp/nvhe/ffa.c
@@ -313,17 +313,16 @@ static void do_ffa_rxtx_unmap(struct arm_smccc_1_2_regs *res,
 			      struct kvm_cpu_context *ctxt)
 {
 	DECLARE_REG(u32, id, ctxt, 1);
-	int ret = 0;
 
 	if (id != HOST_FFA_ID) {
-		ret = FFA_RET_INVALID_PARAMETERS;
-		goto out;
+		ffa_to_smccc_res(res, FFA_RET_INVALID_PARAMETERS);
+		return;
 	}
 
-	hyp_spin_lock(&host_buffers.lock);
+	guard(hyp_spinlock)(&host_buffers.lock);
 	if (!host_buffers.tx) {
-		ret = FFA_RET_INVALID_PARAMETERS;
-		goto out_unlock;
+		ffa_to_smccc_res(res, FFA_RET_INVALID_PARAMETERS);
+		return;
 	}
 
 	hyp_unpin_shared_mem(host_buffers.tx, host_buffers.tx + 1);
@@ -336,10 +335,7 @@ static void do_ffa_rxtx_unmap(struct arm_smccc_1_2_regs *res,
 
 	ffa_unmap_hyp_buffers();
 
-out_unlock:
-	hyp_spin_unlock(&host_buffers.lock);
-out:
-	ffa_to_smccc_res(res, ret);
+	ffa_to_smccc_res(res, 0);
 }
 
 static u32 __ffa_host_share_ranges(struct ffa_mem_region_addr_range *ranges,
@@ -418,18 +414,20 @@ static void do_ffa_mem_frag_tx(struct arm_smccc_1_2_regs *res,
 	DECLARE_REG(u32, fraglen, ctxt, 3);
 	DECLARE_REG(u32, endpoint_id, ctxt, 4);
 	struct ffa_mem_region_addr_range *buf;
-	int ret = FFA_RET_INVALID_PARAMETERS;
+	int ret;
 	u32 nr_ranges;
 
-	if (fraglen > KVM_FFA_MBOX_NR_PAGES * PAGE_SIZE)
-		goto out;
+	if (fraglen > KVM_FFA_MBOX_NR_PAGES * PAGE_SIZE ||
+	    fraglen % sizeof(*buf)) {
+		ffa_to_smccc_res(res, FFA_RET_INVALID_PARAMETERS);
+		return;
+	}
 
-	if (fraglen % sizeof(*buf))
-		goto out;
-
-	hyp_spin_lock(&host_buffers.lock);
-	if (!host_buffers.tx)
-		goto out_unlock;
+	guard(hyp_spinlock)(&host_buffers.lock);
+	if (!host_buffers.tx) {
+		ffa_to_smccc_res(res, FFA_RET_INVALID_PARAMETERS);
+		return;
+	}
 
 	buf = hyp_buffers.tx;
 	memcpy(buf, host_buffers.tx, fraglen);
@@ -444,19 +442,14 @@ static void do_ffa_mem_frag_tx(struct arm_smccc_1_2_regs *res,
 		 */
 		ffa_mem_reclaim(res, handle_lo, handle_hi, 0);
 		WARN_ON(res->a0 != FFA_SUCCESS);
-		goto out_unlock;
+		ffa_to_smccc_res(res, ret);
+		return;
 	}
 
 	ffa_mem_frag_tx(res, handle_lo, handle_hi, fraglen, endpoint_id);
 	if (res->a0 != FFA_SUCCESS && res->a0 != FFA_MEM_FRAG_RX)
 		WARN_ON(ffa_host_unshare_ranges(buf, nr_ranges));
 
-out_unlock:
-	hyp_spin_unlock(&host_buffers.lock);
-out:
-	if (ret)
-		ffa_to_smccc_res(res, ret);
-
 	/*
 	 * If for any reason this did not succeed, we're in trouble as we have
 	 * now lost the content of the previous fragments and we can't rollback
@@ -465,7 +458,6 @@ static void do_ffa_mem_frag_tx(struct arm_smccc_1_2_regs *res,
 	 * sharing/donating them again and may possibly lead to subsequent
 	 * failures, but this will not compromise confidentiality.
 	 */
-	return;
 }
 
 static void __do_ffa_mem_xfer(const u64 func_id,
@@ -480,29 +472,29 @@ static void __do_ffa_mem_xfer(const u64 func_id,
 	struct ffa_composite_mem_region *reg;
 	struct ffa_mem_region *buf;
 	u32 offset, nr_ranges, checked_offset;
-	int ret = 0;
+	int ret;
 
 	if (addr_mbz || npages_mbz || fraglen > len ||
 	    fraglen > KVM_FFA_MBOX_NR_PAGES * PAGE_SIZE) {
-		ret = FFA_RET_INVALID_PARAMETERS;
-		goto out;
+		ffa_to_smccc_res(res, FFA_RET_INVALID_PARAMETERS);
+		return;
 	}
 
 	if (fraglen < sizeof(struct ffa_mem_region) +
 		      sizeof(struct ffa_mem_region_attributes)) {
-		ret = FFA_RET_INVALID_PARAMETERS;
-		goto out;
+		ffa_to_smccc_res(res, FFA_RET_INVALID_PARAMETERS);
+		return;
 	}
 
-	hyp_spin_lock(&host_buffers.lock);
+	guard(hyp_spinlock)(&host_buffers.lock);
 	if (!host_buffers.tx) {
-		ret = FFA_RET_INVALID_PARAMETERS;
-		goto out_unlock;
+		ffa_to_smccc_res(res, FFA_RET_INVALID_PARAMETERS);
+		return;
 	}
 
 	if (len > ffa_desc_buf.len) {
-		ret = FFA_RET_NO_MEMORY;
-		goto out_unlock;
+		ffa_to_smccc_res(res, FFA_RET_NO_MEMORY);
+		return;
 	}
 
 	buf = hyp_buffers.tx;
@@ -512,53 +504,41 @@ static void __do_ffa_mem_xfer(const u64 func_id,
 			ffa_mem_desc_offset(buf, 0, hyp_ffa_version);
 	offset = ep_mem_access->composite_off;
 	if (!offset || buf->ep_count != 1 || buf->sender_id != HOST_FFA_ID) {
-		ret = FFA_RET_INVALID_PARAMETERS;
-		goto out_unlock;
+		ffa_to_smccc_res(res, FFA_RET_INVALID_PARAMETERS);
+		return;
 	}
 
 	if (check_add_overflow(offset, sizeof(struct ffa_composite_mem_region), &checked_offset)) {
-		ret = FFA_RET_INVALID_PARAMETERS;
-		goto out_unlock;
+		ffa_to_smccc_res(res, FFA_RET_INVALID_PARAMETERS);
+		return;
 	}
 
 	if (fraglen < checked_offset) {
-		ret = FFA_RET_INVALID_PARAMETERS;
-		goto out_unlock;
+		ffa_to_smccc_res(res, FFA_RET_INVALID_PARAMETERS);
+		return;
 	}
 
 	reg = (void *)buf + offset;
 	nr_ranges = ((void *)buf + fraglen) - (void *)reg->constituents;
 	if (nr_ranges % sizeof(reg->constituents[0])) {
-		ret = FFA_RET_INVALID_PARAMETERS;
-		goto out_unlock;
+		ffa_to_smccc_res(res, FFA_RET_INVALID_PARAMETERS);
+		return;
 	}
 
 	nr_ranges /= sizeof(reg->constituents[0]);
 	ret = ffa_host_share_ranges(reg->constituents, nr_ranges);
-	if (ret)
-		goto out_unlock;
+	if (ret) {
+		ffa_to_smccc_res(res, ret);
+		return;
+	}
 
 	ffa_mem_xfer(res, func_id, len, fraglen);
 	if (fraglen != len) {
-		if (res->a0 != FFA_MEM_FRAG_RX)
-			goto err_unshare;
-
-		if (res->a3 != fraglen)
-			goto err_unshare;
+		if (res->a0 != FFA_MEM_FRAG_RX || res->a3 != fraglen)
+			WARN_ON(ffa_host_unshare_ranges(reg->constituents, nr_ranges));
 	} else if (res->a0 != FFA_SUCCESS) {
-		goto err_unshare;
+		WARN_ON(ffa_host_unshare_ranges(reg->constituents, nr_ranges));
 	}
-
-out_unlock:
-	hyp_spin_unlock(&host_buffers.lock);
-out:
-	if (ret)
-		ffa_to_smccc_res(res, ret);
-	return;
-
-err_unshare:
-	WARN_ON(ffa_host_unshare_ranges(reg->constituents, nr_ranges));
-	goto out_unlock;
 }
 
 #define do_ffa_mem_xfer(fid, res, ctxt)				\
@@ -578,12 +558,11 @@ static void do_ffa_mem_reclaim(struct arm_smccc_1_2_regs *res,
 	struct ffa_composite_mem_region *reg;
 	u32 offset, len, fraglen, fragoff;
 	struct ffa_mem_region *buf;
-	int ret = 0;
 	u64 handle;
 
 	handle = PACK_HANDLE(handle_lo, handle_hi);
 
-	hyp_spin_lock(&host_buffers.lock);
+	guard(hyp_spinlock)(&host_buffers.lock);
 
 	buf = hyp_buffers.tx;
 	*buf = (struct ffa_mem_region) {
@@ -594,7 +573,7 @@ static void do_ffa_mem_reclaim(struct arm_smccc_1_2_regs *res,
 	ffa_retrieve_req(res, sizeof(*buf));
 	buf = hyp_buffers.rx;
 	if (res->a0 != FFA_MEM_RETRIEVE_RESP)
-		goto out_unlock;
+		return;
 
 	len = res->a1;
 	fraglen = res->a2;
@@ -609,15 +588,15 @@ static void do_ffa_mem_reclaim(struct arm_smccc_1_2_regs *res,
 	 */
 	if (WARN_ON(offset > len ||
 		    fraglen > KVM_FFA_MBOX_NR_PAGES * PAGE_SIZE)) {
-		ret = FFA_RET_ABORTED;
 		ffa_rx_release(res);
-		goto out_unlock;
+		ffa_to_smccc_res(res, FFA_RET_ABORTED);
+		return;
 	}
 
 	if (len > ffa_desc_buf.len) {
-		ret = FFA_RET_NO_MEMORY;
 		ffa_rx_release(res);
-		goto out_unlock;
+		ffa_to_smccc_res(res, FFA_RET_NO_MEMORY);
+		return;
 	}
 
 	buf = ffa_desc_buf.buf;
@@ -627,8 +606,8 @@ static void do_ffa_mem_reclaim(struct arm_smccc_1_2_regs *res,
 	for (fragoff = fraglen; fragoff < len; fragoff += fraglen) {
 		ffa_mem_frag_rx(res, handle_lo, handle_hi, fragoff);
 		if (res->a0 != FFA_MEM_FRAG_TX) {
-			ret = FFA_RET_INVALID_PARAMETERS;
-			goto out_unlock;
+			ffa_to_smccc_res(res, FFA_RET_INVALID_PARAMETERS);
+			return;
 		}
 
 		fraglen = res->a3;
@@ -638,17 +617,12 @@ static void do_ffa_mem_reclaim(struct arm_smccc_1_2_regs *res,
 
 	ffa_mem_reclaim(res, handle_lo, handle_hi, flags);
 	if (res->a0 != FFA_SUCCESS)
-		goto out_unlock;
+		return;
 
 	reg = (void *)buf + offset;
 	/* If the SPMD was happy, then we should be too. */
 	WARN_ON(ffa_host_unshare_ranges(reg->constituents,
 					reg->addr_range_cnt));
-out_unlock:
-	hyp_spin_unlock(&host_buffers.lock);
-
-	if (ret)
-		ffa_to_smccc_res(res, ret);
 }
 
 /*
@@ -774,13 +748,13 @@ static void do_ffa_version(struct arm_smccc_1_2_regs *res,
 		return;
 	}
 
-	hyp_spin_lock(&version_lock);
+	guard(hyp_spinlock)(&version_lock);
 	if (has_version_negotiated) {
 		if (FFA_MINOR_VERSION(ffa_req_version) < FFA_MINOR_VERSION(hyp_ffa_version))
 			res->a0 = FFA_RET_NOT_SUPPORTED;
 		else
 			res->a0 = hyp_ffa_version;
-		goto unlock;
+		return;
 	}
 
 	/*
@@ -793,7 +767,7 @@ static void do_ffa_version(struct arm_smccc_1_2_regs *res,
 			.a1 = ffa_req_version,
 		}, res);
 		if ((s32)res->a0 == FFA_RET_NOT_SUPPORTED)
-			goto unlock;
+			return;
 
 		hyp_ffa_version = ffa_req_version;
 	}
@@ -804,8 +778,6 @@ static void do_ffa_version(struct arm_smccc_1_2_regs *res,
 		smp_store_release(&has_version_negotiated, true);
 		res->a0 = hyp_ffa_version;
 	}
-unlock:
-	hyp_spin_unlock(&version_lock);
 }
 
 static void do_ffa_part_get(struct arm_smccc_1_2_regs *res,
@@ -818,10 +790,10 @@ static void do_ffa_part_get(struct arm_smccc_1_2_regs *res,
 	DECLARE_REG(u32, flags, ctxt, 5);
 	u32 count, partition_sz, copy_sz;
 
-	hyp_spin_lock(&host_buffers.lock);
+	guard(hyp_spinlock)(&host_buffers.lock);
 	if (!host_buffers.rx) {
 		ffa_to_smccc_res(res, FFA_RET_BUSY);
-		goto out_unlock;
+		return;
 	}
 
 	hyp_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) {
@@ -834,16 +806,16 @@ static void do_ffa_part_get(struct arm_smccc_1_2_regs *res,
 	}, res);
 
 	if (res->a0 != FFA_SUCCESS)
-		goto out_unlock;
+		return;
 
 	count = res->a2;
 	if (!count)
-		goto out_unlock;
+		return;
 
 	if (hyp_ffa_version > FFA_VERSION_1_0) {
 		/* Get the number of partitions deployed in the system */
 		if (flags & 0x1)
-			goto out_unlock;
+			return;
 
 		partition_sz  = res->a3;
 	} else {
@@ -854,12 +826,10 @@ static void do_ffa_part_get(struct arm_smccc_1_2_regs *res,
 	copy_sz = partition_sz * count;
 	if (copy_sz > KVM_FFA_MBOX_NR_PAGES * PAGE_SIZE) {
 		ffa_to_smccc_res(res, FFA_RET_ABORTED);
-		goto out_unlock;
+		return;
 	}
 
 	memcpy(host_buffers.rx, hyp_buffers.rx, copy_sz);
-out_unlock:
-	hyp_spin_unlock(&host_buffers.lock);
 }
 
 bool kvm_host_ffa_handler(struct kvm_cpu_context *host_ctxt, u32 func_id)
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 3b0bee496bff..56c3eb4a2251 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -35,13 +35,8 @@ static DEFINE_PER_CPU(struct hyp_fixmap_slot, fixmap_slots);
 static int __pkvm_create_mappings(unsigned long start, unsigned long size,
 				  unsigned long phys, enum kvm_pgtable_prot prot)
 {
-	int err;
-
-	hyp_spin_lock(&pkvm_pgd_lock);
-	err = kvm_pgtable_hyp_map(&pkvm_pgtable, start, size, phys, prot);
-	hyp_spin_unlock(&pkvm_pgd_lock);
-
-	return err;
+	guard(hyp_spinlock)(&pkvm_pgd_lock);
+	return kvm_pgtable_hyp_map(&pkvm_pgtable, start, size, phys, prot);
 }
 
 static int __pkvm_alloc_private_va_range(unsigned long start, size_t size)
@@ -80,10 +75,9 @@ int pkvm_alloc_private_va_range(size_t size, unsigned long *haddr)
 	unsigned long addr;
 	int ret;
 
-	hyp_spin_lock(&pkvm_pgd_lock);
+	guard(hyp_spinlock)(&pkvm_pgd_lock);
 	addr = __io_map_base;
 	ret = __pkvm_alloc_private_va_range(addr, size);
-	hyp_spin_unlock(&pkvm_pgd_lock);
 
 	*haddr = addr;
 
@@ -137,13 +131,8 @@ int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot
 
 int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
 {
-	int ret;
-
-	hyp_spin_lock(&pkvm_pgd_lock);
-	ret = pkvm_create_mappings_locked(from, to, prot);
-	hyp_spin_unlock(&pkvm_pgd_lock);
-
-	return ret;
+	guard(hyp_spinlock)(&pkvm_pgd_lock);
+	return pkvm_create_mappings_locked(from, to, prot);
 }
 
 int hyp_back_vmemmap(phys_addr_t back)
@@ -340,22 +329,17 @@ static int create_fixblock(void)
 	if (i >= hyp_memblock_nr)
 		return -EINVAL;
 
-	hyp_spin_lock(&pkvm_pgd_lock);
+	guard(hyp_spinlock)(&pkvm_pgd_lock);
 	addr = ALIGN(__io_map_base, PMD_SIZE);
 	ret = __pkvm_alloc_private_va_range(addr, PMD_SIZE);
 	if (ret)
-		goto unlock;
+		return ret;
 
 	ret = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, PMD_SIZE, phys, PAGE_HYP);
 	if (ret)
-		goto unlock;
+		return ret;
 
-	ret = kvm_pgtable_walk(&pkvm_pgtable, addr, PMD_SIZE, &walker);
-
-unlock:
-	hyp_spin_unlock(&pkvm_pgd_lock);
-
-	return ret;
+	return kvm_pgtable_walk(&pkvm_pgtable, addr, PMD_SIZE, &walker);
 #else
 	return 0;
 #endif
@@ -437,7 +421,7 @@ int pkvm_create_stack(phys_addr_t phys, unsigned long *haddr)
 	size_t size;
 	int ret;
 
-	hyp_spin_lock(&pkvm_pgd_lock);
+	guard(hyp_spinlock)(&pkvm_pgd_lock);
 
 	prev_base = __io_map_base;
 	/*
@@ -463,7 +447,6 @@ int pkvm_create_stack(phys_addr_t phys, unsigned long *haddr)
 		if (ret)
 			__io_map_base = prev_base;
 	}
-	hyp_spin_unlock(&pkvm_pgd_lock);
 
 	*haddr = addr + size;
 
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index a1eb27a1a747..f43d8ad507e9 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -167,18 +167,16 @@ void hyp_put_page(struct hyp_pool *pool, void *addr)
 {
 	struct hyp_page *p = hyp_virt_to_page(addr);
 
-	hyp_spin_lock(&pool->lock);
+	guard(hyp_spinlock)(&pool->lock);
 	__hyp_put_page(pool, p);
-	hyp_spin_unlock(&pool->lock);
 }
 
 void hyp_get_page(struct hyp_pool *pool, void *addr)
 {
 	struct hyp_page *p = hyp_virt_to_page(addr);
 
-	hyp_spin_lock(&pool->lock);
+	guard(hyp_spinlock)(&pool->lock);
 	hyp_page_ref_inc(p);
-	hyp_spin_unlock(&pool->lock);
 }
 
 void hyp_split_page(struct hyp_page *p)
@@ -200,22 +198,19 @@ void *hyp_alloc_pages(struct hyp_pool *pool, u8 order)
 	struct hyp_page *p;
 	u8 i = order;
 
-	hyp_spin_lock(&pool->lock);
+	guard(hyp_spinlock)(&pool->lock);
 
 	/* Look for a high-enough-order page */
 	while (i <= pool->max_order && list_empty(&pool->free_area[i]))
 		i++;
-	if (i > pool->max_order) {
-		hyp_spin_unlock(&pool->lock);
+	if (i > pool->max_order)
 		return NULL;
-	}
 
 	/* Extract it from the tree at the right order */
 	p = node_to_page(pool->free_area[i].next);
 	p = __hyp_extract_page(pool, p, order);
 
 	hyp_set_page_refcounted(p);
-	hyp_spin_unlock(&pool->lock);
 
 	return hyp_page_to_virt(p);
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index eb1c10120f9f..7d843afd8c0a 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -258,32 +258,27 @@ struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
 	if (__this_cpu_read(loaded_hyp_vcpu))
 		return NULL;
 
-	hyp_spin_lock(&vm_table_lock);
+	guard(hyp_spinlock)(&vm_table_lock);
 	hyp_vm = get_vm_by_handle(handle);
 	if (!hyp_vm || hyp_vm->kvm.arch.pkvm.is_dying)
-		goto unlock;
+		return NULL;
 
 	if (hyp_vm->kvm.created_vcpus <= vcpu_idx)
-		goto unlock;
+		return NULL;
 
 	/* Pairs with smp_store_release() in register_hyp_vcpu(). */
 	hyp_vcpu = smp_load_acquire(&hyp_vm->vcpus[vcpu_idx]);
 	if (!hyp_vcpu)
-		goto unlock;
+		return NULL;
 
 	/* Ensure vcpu isn't loaded on more than one cpu simultaneously. */
-	if (unlikely(hyp_vcpu->loaded_hyp_vcpu)) {
-		hyp_vcpu = NULL;
-		goto unlock;
-	}
+	if (unlikely(hyp_vcpu->loaded_hyp_vcpu))
+		return NULL;
 
 	hyp_vcpu->loaded_hyp_vcpu = this_cpu_ptr(&loaded_hyp_vcpu);
 	hyp_page_ref_inc(hyp_virt_to_page(hyp_vm));
-unlock:
-	hyp_spin_unlock(&vm_table_lock);
 
-	if (hyp_vcpu)
-		__this_cpu_write(loaded_hyp_vcpu, hyp_vcpu);
+	__this_cpu_write(loaded_hyp_vcpu, hyp_vcpu);
 	return hyp_vcpu;
 }
 
@@ -291,11 +286,10 @@ void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 {
 	struct pkvm_hyp_vm *hyp_vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
 
-	hyp_spin_lock(&vm_table_lock);
+	guard(hyp_spinlock)(&vm_table_lock);
 	hyp_vcpu->loaded_hyp_vcpu = NULL;
 	__this_cpu_write(loaded_hyp_vcpu, NULL);
 	hyp_page_ref_dec(hyp_virt_to_page(hyp_vm));
-	hyp_spin_unlock(&vm_table_lock);
 }
 
 struct pkvm_hyp_vcpu *pkvm_get_loaded_hyp_vcpu(void)
@@ -308,20 +302,18 @@ struct pkvm_hyp_vm *get_pkvm_hyp_vm(pkvm_handle_t handle)
 {
 	struct pkvm_hyp_vm *hyp_vm;
 
-	hyp_spin_lock(&vm_table_lock);
+	guard(hyp_spinlock)(&vm_table_lock);
 	hyp_vm = get_vm_by_handle(handle);
 	if (hyp_vm)
 		hyp_page_ref_inc(hyp_virt_to_page(hyp_vm));
-	hyp_spin_unlock(&vm_table_lock);
 
 	return hyp_vm;
 }
 
 void put_pkvm_hyp_vm(struct pkvm_hyp_vm *hyp_vm)
 {
-	hyp_spin_lock(&vm_table_lock);
+	guard(hyp_spinlock)(&vm_table_lock);
 	hyp_page_ref_dec(hyp_virt_to_page(hyp_vm));
-	hyp_spin_unlock(&vm_table_lock);
 }
 
 struct pkvm_hyp_vm *get_np_pkvm_hyp_vm(pkvm_handle_t handle)
@@ -620,13 +612,8 @@ static int __insert_vm_table_entry(pkvm_handle_t handle,
 static int insert_vm_table_entry(pkvm_handle_t handle,
 				 struct pkvm_hyp_vm *hyp_vm)
 {
-	int ret;
-
-	hyp_spin_lock(&vm_table_lock);
-	ret = __insert_vm_table_entry(handle, hyp_vm);
-	hyp_spin_unlock(&vm_table_lock);
-
-	return ret;
+	guard(hyp_spinlock)(&vm_table_lock);
+	return __insert_vm_table_entry(handle, hyp_vm);
 }
 
 /*
@@ -701,9 +688,8 @@ int __pkvm_reserve_vm(void)
 {
 	int ret;
 
-	hyp_spin_lock(&vm_table_lock);
+	guard(hyp_spinlock)(&vm_table_lock);
 	ret = allocate_vm_table_entry();
-	hyp_spin_unlock(&vm_table_lock);
 
 	if (ret < 0)
 		return ret;
@@ -722,10 +708,9 @@ void __pkvm_unreserve_vm(pkvm_handle_t handle)
 	if (unlikely(!vm_table))
 		return;
 
-	hyp_spin_lock(&vm_table_lock);
+	guard(hyp_spinlock)(&vm_table_lock);
 	if (likely(idx < KVM_MAX_PVMS && vm_table[idx] == RESERVED_ENTRY))
 		remove_vm_table_entry(handle);
-	hyp_spin_unlock(&vm_table_lock);
 }
 
 #ifdef CONFIG_NVHE_EL2_DEBUG
@@ -785,9 +770,8 @@ struct pkvm_hyp_vcpu *init_selftest_vm(void *virt)
 
 void teardown_selftest_vm(void)
 {
-	hyp_spin_lock(&vm_table_lock);
+	guard(hyp_spinlock)(&vm_table_lock);
 	remove_vm_table_entry(selftest_vm.kvm.arch.pkvm.handle);
-	hyp_spin_unlock(&vm_table_lock);
 }
 #endif /* CONFIG_NVHE_EL2_DEBUG */
 
@@ -973,20 +957,15 @@ static struct pkvm_hyp_vm *get_pkvm_unref_hyp_vm_locked(pkvm_handle_t handle)
 int __pkvm_start_teardown_vm(pkvm_handle_t handle)
 {
 	struct pkvm_hyp_vm *hyp_vm;
-	int ret = 0;
 
-	hyp_spin_lock(&vm_table_lock);
+	guard(hyp_spinlock)(&vm_table_lock);
 	hyp_vm = get_pkvm_unref_hyp_vm_locked(handle);
-	if (!hyp_vm || hyp_vm->kvm.arch.pkvm.is_dying) {
-		ret = -EINVAL;
-		goto unlock;
-	}
+	if (!hyp_vm || hyp_vm->kvm.arch.pkvm.is_dying)
+		return -EINVAL;
 
 	hyp_vm->kvm.arch.pkvm.is_dying = true;
-unlock:
-	hyp_spin_unlock(&vm_table_lock);
 
-	return ret;
+	return 0;
 }
 
 int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
@@ -996,22 +975,19 @@ int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
 	struct kvm *host_kvm;
 	unsigned int idx;
 	size_t vm_size;
-	int err;
 
-	hyp_spin_lock(&vm_table_lock);
-	hyp_vm = get_pkvm_unref_hyp_vm_locked(handle);
-	if (!hyp_vm || !hyp_vm->kvm.arch.pkvm.is_dying) {
-		err = -EINVAL;
-		goto err_unlock;
+	scoped_guard(hyp_spinlock, &vm_table_lock) {
+		hyp_vm = get_pkvm_unref_hyp_vm_locked(handle);
+		if (!hyp_vm || !hyp_vm->kvm.arch.pkvm.is_dying)
+			return -EINVAL;
+
+		host_kvm = hyp_vm->host_kvm;
+
+		/* Ensure the VMID is clean before it can be reallocated */
+		__kvm_tlb_flush_vmid(&hyp_vm->kvm.arch.mmu);
+		remove_vm_table_entry(handle);
 	}
 
-	host_kvm = hyp_vm->host_kvm;
-
-	/* Ensure the VMID is clean before it can be reallocated */
-	__kvm_tlb_flush_vmid(&hyp_vm->kvm.arch.mmu);
-	remove_vm_table_entry(handle);
-	hyp_spin_unlock(&vm_table_lock);
-
 	/* Reclaim guest pages (including page-table pages) */
 	mc = &host_kvm->arch.pkvm.teardown_mc;
 	stage2_mc = &host_kvm->arch.pkvm.stage2_teardown_mc;
@@ -1042,10 +1018,6 @@ int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
 	teardown_donated_memory(mc, hyp_vm, vm_size);
 	hyp_unpin_shared_mem(host_kvm, host_kvm + 1);
 	return 0;
-
-err_unlock:
-	hyp_spin_unlock(&vm_table_lock);
-	return err;
 }
 
 static u64 __pkvm_memshare_page_req(struct kvm_vcpu *vcpu, u64 ipa)
-- 
2.54.0.1136.gdb2ca164c4-goog




* [PATCH v1 03/11] KVM: arm64: Use guard()/scoped_guard() in arm64 KVM EL1 code
  2026-06-12  6:59 [PATCH v1 00/11] KVM: arm64: Rework pKVM vCPU state synchronisation tabba
  2026-06-12  6:59 ` [PATCH v1 01/11] KVM: arm64: Add scoped resource management (guard) for hyp_spinlock tabba
  2026-06-12  6:59 ` [PATCH v1 02/11] KVM: arm64: Use guard(hyp_spinlock) in pKVM hypervisor code tabba
@ 2026-06-12  6:59 ` tabba
  2026-06-12  6:59 ` [PATCH v1 04/11] KVM: arm64: Extract MPIDR computation into a shared header tabba
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: tabba @ 2026-06-12  6:59 UTC
  To: Marc Zyngier, Oliver Upton
  Cc: Fuad Tabba, Will Deacon, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Sebastian Ene, Per Larsen, Suzuki K Poulose,
	Zenghui Yu, Joey Gouly, Steffen Eiden, Mark Rutland,
	Jonathan Cameron, Hyunwoo Kim, linux-arm-kernel, kvmarm,
	linux-kernel

Convert the manual mutex_lock()/mutex_unlock() and
spin_lock()/spin_unlock() pairs in
arch/arm64/kvm/{pkvm,arm,mmu,reset,psci}.c to guard(mutex),
guard(spinlock) and scoped_guard(), dropping unlock-only goto labels in
favour of direct returns. Centralised cleanup gotos that still serve
other resources are preserved.

reset.c uses scoped_guard() rather than guard() so the lock covers only
the small read/update window inside kvm_reset_vcpu(), leaving the rest
of the function outside the critical section.
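
The difference between the two forms can be sketched in user space as
follows. This is only an illustration under stated assumptions:
TOY_SCOPED_GUARD is a hypothetical macro that imitates the general shape
of a scoped guard (a one-iteration for loop whose loop variable carries
the GCC/Clang cleanup attribute), and toy_vcpu/toy_reset_vcpu are
invented stand-ins, not the code in reset.c.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_lock {
	bool held;
};

static struct toy_lock *toy_acquire(struct toy_lock *l)
{
	l->held = true;
	return l;
}

static void toy_release(struct toy_lock **g)
{
	(*g)->held = false;
}

/*
 * Hypothetical scoped guard: the body runs exactly once, and the lock is
 * released when the for statement is left. Note that 'lock' is evaluated
 * twice, which is acceptable only in a sketch.
 */
#define TOY_SCOPED_GUARD(lock)						\
	for (struct toy_lock *toy_sg					\
		     __attribute__((cleanup(toy_release), unused)) =	\
		     toy_acquire(lock), *toy_once = (lock);		\
	     toy_once; toy_once = NULL)

struct toy_vcpu {
	struct toy_lock lock;
	bool reset_pending;
	int pc;
};

static struct toy_vcpu demo_vcpu = { .reset_pending = true, .pc = 0x1000 };
static bool lock_held_after_window;

static int toy_reset_vcpu(struct toy_vcpu *vcpu)
{
	bool pending = false;

	/* Only the read/update window is covered by the lock. */
	TOY_SCOPED_GUARD(&vcpu->lock) {
		pending = vcpu->reset_pending;
		vcpu->reset_pending = false;
	}

	/* Everything below runs outside the critical section. */
	lock_held_after_window = vcpu->lock.held;
	if (pending)
		vcpu->pc = 0;

	return pending;
}
```

A plain guard placed at the same point would instead keep the lock held
until the end of toy_reset_vcpu().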

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/arm.c   | 14 +++-----
 arch/arm64/kvm/mmu.c   | 80 +++++++++++++++---------------------------
 arch/arm64/kvm/pkvm.c  | 26 ++++++--------
 arch/arm64/kvm/psci.c  | 17 ++++-----
 arch/arm64/kvm/reset.c |  8 ++---
 5 files changed, 53 insertions(+), 92 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 9453321ef8c6..c9f36932c980 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -793,9 +793,7 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
-	int ret = 0;
-
-	spin_lock(&vcpu->arch.mp_state_lock);
+	guard(spinlock)(&vcpu->arch.mp_state_lock);
 
 	switch (mp_state->mp_state) {
 	case KVM_MP_STATE_RUNNABLE:
@@ -808,12 +806,10 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 		kvm_arm_vcpu_suspend(vcpu);
 		break;
 	default:
-		ret = -EINVAL;
+		return -EINVAL;
 	}
 
-	spin_unlock(&vcpu->arch.mp_state_lock);
-
-	return ret;
+	return 0;
 }
 
 /**
@@ -1726,15 +1722,13 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
 	/*
 	 * Handle the "start in power-off" case.
 	 */
-	spin_lock(&vcpu->arch.mp_state_lock);
+	guard(spinlock)(&vcpu->arch.mp_state_lock);
 
 	if (power_off)
 		__kvm_arm_vcpu_power_off(vcpu);
 	else
 		WRITE_ONCE(vcpu->arch.mp_state.mp_state, KVM_MP_STATE_RUNNABLE);
 
-	spin_unlock(&vcpu->arch.mp_state_lock);
-
 	return 0;
 }
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4da9281312eb..d18f4ce7ceae 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -391,13 +391,13 @@ static void stage2_flush_vm(struct kvm *kvm)
  */
 void __init free_hyp_pgds(void)
 {
-	mutex_lock(&kvm_hyp_pgd_mutex);
-	if (hyp_pgtable) {
-		kvm_pgtable_hyp_destroy(hyp_pgtable);
-		kfree(hyp_pgtable);
-		hyp_pgtable = NULL;
-	}
-	mutex_unlock(&kvm_hyp_pgd_mutex);
+	guard(mutex)(&kvm_hyp_pgd_mutex);
+	if (!hyp_pgtable)
+		return;
+
+	kvm_pgtable_hyp_destroy(hyp_pgtable);
+	kfree(hyp_pgtable);
+	hyp_pgtable = NULL;
 }
 
 static bool kvm_host_owns_hyp_mappings(void)
@@ -424,16 +424,11 @@ static bool kvm_host_owns_hyp_mappings(void)
 int __create_hyp_mappings(unsigned long start, unsigned long size,
 			  unsigned long phys, enum kvm_pgtable_prot prot)
 {
-	int err;
-
 	if (WARN_ON(!kvm_host_owns_hyp_mappings()))
 		return -EINVAL;
 
-	mutex_lock(&kvm_hyp_pgd_mutex);
-	err = kvm_pgtable_hyp_map(hyp_pgtable, start, size, phys, prot);
-	mutex_unlock(&kvm_hyp_pgd_mutex);
-
-	return err;
+	guard(mutex)(&kvm_hyp_pgd_mutex);
+	return kvm_pgtable_hyp_map(hyp_pgtable, start, size, phys, prot);
 }
 
 static phys_addr_t kvm_kaddr_to_phys(void *kaddr)
@@ -481,56 +476,42 @@ static int share_pfn_hyp(u64 pfn)
 {
 	struct rb_node **node, *parent;
 	struct hyp_shared_pfn *this;
-	int ret = 0;
 
-	mutex_lock(&hyp_shared_pfns_lock);
+	guard(mutex)(&hyp_shared_pfns_lock);
 	this = find_shared_pfn(pfn, &node, &parent);
 	if (this) {
 		this->count++;
-		goto unlock;
+		return 0;
 	}
 
 	this = kzalloc_obj(*this);
-	if (!this) {
-		ret = -ENOMEM;
-		goto unlock;
-	}
+	if (!this)
+		return -ENOMEM;
 
 	this->pfn = pfn;
 	this->count = 1;
 	rb_link_node(&this->node, parent, node);
 	rb_insert_color(&this->node, &hyp_shared_pfns);
-	ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp, pfn);
-unlock:
-	mutex_unlock(&hyp_shared_pfns_lock);
-
-	return ret;
+	return kvm_call_hyp_nvhe(__pkvm_host_share_hyp, pfn);
 }
 
 static int unshare_pfn_hyp(u64 pfn)
 {
 	struct rb_node **node, *parent;
 	struct hyp_shared_pfn *this;
-	int ret = 0;
 
-	mutex_lock(&hyp_shared_pfns_lock);
+	guard(mutex)(&hyp_shared_pfns_lock);
 	this = find_shared_pfn(pfn, &node, &parent);
-	if (WARN_ON(!this)) {
-		ret = -ENOENT;
-		goto unlock;
-	}
+	if (WARN_ON(!this))
+		return -ENOENT;
 
 	this->count--;
 	if (this->count)
-		goto unlock;
+		return 0;
 
 	rb_erase(&this->node, &hyp_shared_pfns);
 	kfree(this);
-	ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, pfn);
-unlock:
-	mutex_unlock(&hyp_shared_pfns_lock);
-
-	return ret;
+	return kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, pfn);
 }
 
 int kvm_share_hyp(void *from, void *to)
@@ -655,7 +636,7 @@ int hyp_alloc_private_va_range(size_t size, unsigned long *haddr)
 	unsigned long base;
 	int ret = 0;
 
-	mutex_lock(&kvm_hyp_pgd_mutex);
+	guard(mutex)(&kvm_hyp_pgd_mutex);
 
 	/*
 	 * This assumes that we have enough space below the idmap
@@ -670,8 +651,6 @@ int hyp_alloc_private_va_range(size_t size, unsigned long *haddr)
 	base = io_map_base - size;
 	ret = __hyp_alloc_private_va_range(base);
 
-	mutex_unlock(&kvm_hyp_pgd_mutex);
-
 	if (!ret)
 		*haddr = base;
 
@@ -714,17 +693,16 @@ int create_hyp_stack(phys_addr_t phys_addr, unsigned long *haddr)
 	size_t size;
 	int ret;
 
-	mutex_lock(&kvm_hyp_pgd_mutex);
-	/*
-	 * Efficient stack verification using the NVHE_STACK_SHIFT bit implies
-	 * an alignment of our allocation on the order of the size.
-	 */
-	size = NVHE_STACK_SIZE * 2;
-	base = ALIGN_DOWN(io_map_base - size, size);
+	scoped_guard(mutex, &kvm_hyp_pgd_mutex) {
+		/*
+		 * Efficient stack verification using the NVHE_STACK_SHIFT bit implies
+		 * an alignment of our allocation on the order of the size.
+		 */
+		size = NVHE_STACK_SIZE * 2;
+		base = ALIGN_DOWN(io_map_base - size, size);
 
-	ret = __hyp_alloc_private_va_range(base);
-
-	mutex_unlock(&kvm_hyp_pgd_mutex);
+		ret = __hyp_alloc_private_va_range(base);
+	}
 
 	if (ret) {
 		kvm_err("Cannot allocate hyp stack guard page\n");
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 053e4f733e4b..a39111b70f9f 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -190,39 +190,33 @@ bool pkvm_hyp_vm_is_created(struct kvm *kvm)
 
 int pkvm_create_hyp_vm(struct kvm *kvm)
 {
-	int ret = 0;
-
 	/*
 	 * Synchronise with kvm_arch_prepare_memory_region(), as we
 	 * prevent memslot modifications on a pVM that has been run.
 	 */
-	mutex_lock(&kvm->slots_lock);
-	mutex_lock(&kvm->arch.config_lock);
-	if (!pkvm_hyp_vm_is_created(kvm))
-		ret = __pkvm_create_hyp_vm(kvm);
-	mutex_unlock(&kvm->arch.config_lock);
-	mutex_unlock(&kvm->slots_lock);
+	guard(mutex)(&kvm->slots_lock);
+	guard(mutex)(&kvm->arch.config_lock);
 
-	return ret;
+	if (!pkvm_hyp_vm_is_created(kvm))
+		return __pkvm_create_hyp_vm(kvm);
+
+	return 0;
 }
 
 int pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu)
 {
-	int ret = 0;
+	guard(mutex)(&vcpu->kvm->arch.config_lock);
 
-	mutex_lock(&vcpu->kvm->arch.config_lock);
 	if (!vcpu_get_flag(vcpu, VCPU_PKVM_FINALIZED))
-		ret = __pkvm_create_hyp_vcpu(vcpu);
-	mutex_unlock(&vcpu->kvm->arch.config_lock);
+		return __pkvm_create_hyp_vcpu(vcpu);
 
-	return ret;
+	return 0;
 }
 
 void pkvm_destroy_hyp_vm(struct kvm *kvm)
 {
-	mutex_lock(&kvm->arch.config_lock);
+	guard(mutex)(&kvm->arch.config_lock);
 	__pkvm_destroy_hyp_vm(kvm);
-	mutex_unlock(&kvm->arch.config_lock);
 }
 
 int pkvm_init_host_vm(struct kvm *kvm, unsigned long type)
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 3b5dbe9a0a0e..e1389c525e9d 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -62,7 +62,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	struct vcpu_reset_state *reset_state;
 	struct kvm *kvm = source_vcpu->kvm;
 	struct kvm_vcpu *vcpu = NULL;
-	int ret = PSCI_RET_SUCCESS;
 	unsigned long cpu_id;
 
 	cpu_id = smccc_get_arg1(source_vcpu);
@@ -78,14 +77,13 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	if (!vcpu)
 		return PSCI_RET_INVALID_PARAMS;
 
-	spin_lock(&vcpu->arch.mp_state_lock);
+	guard(spinlock)(&vcpu->arch.mp_state_lock);
+
 	if (!kvm_arm_vcpu_stopped(vcpu)) {
 		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
-			ret = PSCI_RET_ALREADY_ON;
+			return PSCI_RET_ALREADY_ON;
 		else
-			ret = PSCI_RET_INVALID_PARAMS;
-
-		goto out_unlock;
+			return PSCI_RET_INVALID_PARAMS;
 	}
 
 	reset_state = &vcpu->arch.reset_state;
@@ -113,9 +111,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	WRITE_ONCE(vcpu->arch.mp_state.mp_state, KVM_MP_STATE_RUNNABLE);
 	kvm_vcpu_wake_up(vcpu);
 
-out_unlock:
-	spin_unlock(&vcpu->arch.mp_state_lock);
-	return ret;
+	return PSCI_RET_SUCCESS;
 }
 
 static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
@@ -176,9 +172,8 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type, u64 flags)
 	 * re-initialized.
 	 */
 	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
-		spin_lock(&tmp->arch.mp_state_lock);
+		guard(spinlock)(&tmp->arch.mp_state_lock);
 		WRITE_ONCE(tmp->arch.mp_state.mp_state, KVM_MP_STATE_STOPPED);
-		spin_unlock(&tmp->arch.mp_state_lock);
 	}
 	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
 
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index b963fd975aac..60969d90bdd3 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -193,10 +193,10 @@ void kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	bool loaded;
 	u32 pstate;
 
-	spin_lock(&vcpu->arch.mp_state_lock);
-	reset_state = vcpu->arch.reset_state;
-	vcpu->arch.reset_state.reset = false;
-	spin_unlock(&vcpu->arch.mp_state_lock);
+	scoped_guard(spinlock, &vcpu->arch.mp_state_lock) {
+		reset_state = vcpu->arch.reset_state;
+		vcpu->arch.reset_state.reset = false;
+	}
 
 	preempt_disable();
 	loaded = (vcpu->cpu != -1);
-- 
2.54.0.1136.gdb2ca164c4-goog




* [PATCH v1 04/11] KVM: arm64: Extract MPIDR computation into a shared header
  2026-06-12  6:59 [PATCH v1 00/11] KVM: arm64: Rework pKVM vCPU state synchronisation tabba
                   ` (2 preceding siblings ...)
  2026-06-12  6:59 ` [PATCH v1 03/11] KVM: arm64: Use guard()/scoped_guard() in arm64 KVM EL1 code tabba
@ 2026-06-12  6:59 ` tabba
  2026-06-12  6:59 ` [PATCH v1 05/11] KVM: arm64: Make vcpu_{read,write}_sys_reg available to HYP code tabba
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: tabba @ 2026-06-12  6:59 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: Fuad Tabba, Will Deacon, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Sebastian Ene, Per Larsen, Suzuki K Poulose,
	Zenghui Yu, Joey Gouly, Steffen Eiden, Mark Rutland,
	Jonathan Cameron, Hyunwoo Kim, linux-arm-kernel, kvmarm,
	linux-kernel

Extract the vCPU MPIDR computation embedded in reset_mpidr() into a
kvm_calculate_mpidr() inline in sys_regs.h, so that other callers can
compute a vCPU's MPIDR without duplicating the logic. A follow-up
series reuses it to reset protected vCPUs at EL2.
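
For illustration only (not part of the patch), the vcpu_id to MPIDR
mapping can be modelled in userspace roughly as below. The demo_* names
are hypothetical, and the shifts 0, 8 and 16 are assumed to be what
MPIDR_LEVEL_SHIFT() evaluates to for levels 0 to 2:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed results of MPIDR_LEVEL_SHIFT() for levels 0, 1 and 2. */
#define DEMO_AFF0_SHIFT	0
#define DEMO_AFF1_SHIFT	8
#define DEMO_AFF2_SHIFT	16

/* Userspace model of the mapping; takes the vcpu_id directly. */
static inline uint64_t demo_calculate_mpidr(uint32_t vcpu_id)
{
	uint64_t mpidr;

	/* Only 4 bits go into Aff0, 8 bits each into Aff1 and Aff2. */
	mpidr = (uint64_t)(vcpu_id & 0x0f) << DEMO_AFF0_SHIFT;
	mpidr |= (uint64_t)((vcpu_id >> 4) & 0xff) << DEMO_AFF1_SHIFT;
	mpidr |= (uint64_t)((vcpu_id >> 12) & 0xff) << DEMO_AFF2_SHIFT;
	mpidr |= (1ULL << 31);	/* bit 31 is always set, as in the patch */

	return mpidr;
}

/* A few worked values. */
static inline void demo_mpidr_selfcheck(void)
{
	assert(demo_calculate_mpidr(0) == 0x80000000ULL);
	assert(demo_calculate_mpidr(15) == 0x8000000fULL);
	assert(demo_calculate_mpidr(16) == 0x80000100ULL);	/* spills into Aff1 */
	assert(demo_calculate_mpidr(0x1234) == 0x80012304ULL);
}
```

With this layout, vCPUs 0 to 15 share Aff1 == 0 and vCPU 16 is the
first one with Aff1 == 1.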

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/sys_regs.c | 14 +-------------
 arch/arm64/kvm/sys_regs.h | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index fa5c93c7a135..869a4bac96d6 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -979,21 +979,9 @@ static u64 reset_actlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 
 static u64 reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 {
-	u64 mpidr;
+	u64 mpidr = kvm_calculate_mpidr(vcpu);
 
-	/*
-	 * Map the vcpu_id into the first three affinity level fields of
-	 * the MPIDR. We limit the number of VCPUs in level 0 due to a
-	 * limitation to 16 CPUs in that level in the ICC_SGIxR registers
-	 * of the GICv3 to be able to address each CPU directly when
-	 * sending IPIs.
-	 */
-	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
-	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
-	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
-	mpidr |= (1ULL << 31);
 	vcpu_write_sys_reg(vcpu, mpidr, MPIDR_EL1);
-
 	return mpidr;
 }
 
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index 2a983664220c..bd56a45abbf9 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -222,6 +222,25 @@ find_reg(const struct sys_reg_params *params, const struct sys_reg_desc table[],
 	return __inline_bsearch((void *)pval, table, num, sizeof(table[0]), match_sys_reg);
 }
 
+static inline u64 kvm_calculate_mpidr(const struct kvm_vcpu *vcpu)
+{
+	u64 mpidr;
+
+	/*
+	 * Map the vcpu_id into the first three affinity level fields of
+	 * the MPIDR. We limit the number of VCPUs in level 0 due to a
+	 * limitation to 16 CPUs in that level in the ICC_SGIxR registers
+	 * of the GICv3 to be able to address each CPU directly when
+	 * sending IPIs.
+	 */
+	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
+	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
+	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
+	mpidr |= (1ULL << 31);
+
+	return mpidr;
+}
+
 const struct sys_reg_desc *get_reg_by_id(u64 id,
 					 const struct sys_reg_desc table[],
 					 unsigned int num);
-- 
2.54.0.1136.gdb2ca164c4-goog




* [PATCH v1 05/11] KVM: arm64: Make vcpu_{read,write}_sys_reg available to HYP code
  2026-06-12  6:59 [PATCH v1 00/11] KVM: arm64: Rework pKVM vCPU state synchronisation tabba
                   ` (3 preceding siblings ...)
  2026-06-12  6:59 ` [PATCH v1 04/11] KVM: arm64: Extract MPIDR computation into a shared header tabba
@ 2026-06-12  6:59 ` tabba
  2026-06-12  7:17   ` sashiko-bot
  2026-06-12  6:59 ` [PATCH v1 06/11] KVM: arm64: Factor out reusable vCPU reset helpers tabba
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: tabba @ 2026-06-12  6:59 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: Fuad Tabba, Will Deacon, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Sebastian Ene, Per Larsen, Suzuki K Poulose,
	Zenghui Yu, Joey Gouly, Steffen Eiden, Mark Rutland,
	Jonathan Cameron, Hyunwoo Kim, linux-arm-kernel, kvmarm,
	linux-kernel

The vcpu_{read,write}_sys_reg() accessors are host-only, so helpers
built on them such as kvm_vcpu_set_be()/kvm_vcpu_is_be() cannot be
shared with hyp code. Add _vcpu_read_sys_reg()/_vcpu_write_sys_reg()
inlines in kvm_emulate.h that dispatch on is_nvhe_hyp_code() to the
host- or hyp-side accessor. A follow-up series uses this to share that
emulation code at EL2.
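
As a rough userspace sketch of the dispatch pattern (not the kernel
code): everything prefixed demo_ is hypothetical, and
DEMO_NVHE_HYP_BUILD stands in for the compile-time condition that
is_nvhe_hyp_code() is assumed to test, so that each build only keeps
one branch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_sysreg { DEMO_SCTLR_EL1, DEMO_NR_SYS_REGS };

struct demo_vcpu {
	uint64_t sys_regs[DEMO_NR_SYS_REGS];	/* in-memory copy */
	uint64_t hw_regs[DEMO_NR_SYS_REGS];	/* stand-in for live CPU state */
	bool loaded_on_cpu;
};

/* Stand-in for is_nvhe_hyp_code(): constant per compilation unit. */
static inline bool demo_is_nvhe_hyp_code(void)
{
#ifdef DEMO_NVHE_HYP_BUILD
	return true;
#else
	return false;
#endif
}

/* Host-side accessor: may have to consult live state. */
static inline uint64_t demo_host_read(struct demo_vcpu *vcpu,
				      enum demo_sysreg reg)
{
	return vcpu->loaded_on_cpu ? vcpu->hw_regs[reg] : vcpu->sys_regs[reg];
}

/* Shared accessor: the hyp side only touches the in-memory copy. */
static inline uint64_t demo_read_sys_reg(struct demo_vcpu *vcpu,
					 enum demo_sysreg reg)
{
	assert(reg < DEMO_NR_SYS_REGS);

	if (!demo_is_nvhe_hyp_code())
		return demo_host_read(vcpu, reg);

	return vcpu->sys_regs[reg];
}
```

Callers such as the endianness helpers then use the shared accessor and
do not need to know which side they were built for.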

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/include/asm/kvm_emulate.h | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5bf3d7e1d92c..aed9fc0b717b 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -506,6 +506,22 @@ static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 	return __vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
 }
 
+static inline u64 _vcpu_read_sys_reg(struct kvm_vcpu *vcpu, enum vcpu_sysreg reg)
+{
+	if (!is_nvhe_hyp_code())
+		return vcpu_read_sys_reg(vcpu, reg);
+
+	return __vcpu_sys_reg(vcpu, reg);
+}
+
+static inline void _vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, enum vcpu_sysreg reg)
+{
+	if (!is_nvhe_hyp_code())
+		vcpu_write_sys_reg(vcpu, val, reg);
+	else
+		__vcpu_assign_sys_reg(vcpu, reg, val);
+}
+
 static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
 {
 	if (vcpu_mode_is_32bit(vcpu)) {
@@ -516,9 +532,9 @@ static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
 
 		r = vcpu_has_nv(vcpu) ? SCTLR_EL2 : SCTLR_EL1;
 
-		sctlr = vcpu_read_sys_reg(vcpu, r);
+		sctlr = _vcpu_read_sys_reg(vcpu, r);
 		sctlr |= SCTLR_ELx_EE;
-		vcpu_write_sys_reg(vcpu, sctlr, r);
+		_vcpu_write_sys_reg(vcpu, sctlr, r);
 	}
 }
 
@@ -533,7 +549,7 @@ static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
 	r = is_hyp_ctxt(vcpu) ? SCTLR_EL2 : SCTLR_EL1;
 	bit = vcpu_mode_priv(vcpu) ? SCTLR_ELx_EE : SCTLR_EL1_E0E;
 
-	return vcpu_read_sys_reg(vcpu, r) & bit;
+	return _vcpu_read_sys_reg(vcpu, r) & bit;
 }
 
 static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,
-- 
2.54.0.1136.gdb2ca164c4-goog




* [PATCH v1 06/11] KVM: arm64: Factor out reusable vCPU reset helpers
  2026-06-12  6:59 [PATCH v1 00/11] KVM: arm64: Rework pKVM vCPU state synchronisation tabba
                   ` (4 preceding siblings ...)
  2026-06-12  6:59 ` [PATCH v1 05/11] KVM: arm64: Make vcpu_{read,write}_sys_reg available to HYP code tabba
@ 2026-06-12  6:59 ` tabba
  2026-06-12  6:59 ` [PATCH v1 07/11] KVM: arm64: Move PSCI helper functions to a shared header tabba
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: tabba @ 2026-06-12  6:59 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: Fuad Tabba, Will Deacon, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Sebastian Ene, Per Larsen, Suzuki K Poulose,
	Zenghui Yu, Joey Gouly, Steffen Eiden, Mark Rutland,
	Jonathan Cameron, Hyunwoo Kim, linux-arm-kernel, kvmarm,
	linux-kernel

Pull the reusable pieces out of kvm_reset_vcpu(): expose the reset
PSTATE values in kvm_arm.h, and split the core register reset and the
PSCI-driven reset into kvm_reset_vcpu_core() and kvm_reset_vcpu_psci().
A follow-up series reuses these to reset protected vCPUs at EL2.
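
To illustrate the entry-point handling that ends up in
kvm_reset_vcpu_psci(), here is a small userspace model. It is a sketch
only: demo_psci_entry() and struct demo_reset_result are hypothetical
and cover just the Thumb fixup, not endianness or the flag clearing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_reset_result {
	uint64_t pc;
	bool thumb;
};

/* Model of the entry-point fixup done for a PSCI-requested reset. */
static inline struct demo_reset_result demo_psci_entry(uint64_t target_pc,
						       bool mode_is_32bit)
{
	struct demo_reset_result res = { .pc = target_pc, .thumb = false };

	/* Bit 0 set on a 32-bit entry point selects Thumb; strip it. */
	if (mode_is_32bit && (target_pc & 1)) {
		res.pc = target_pc & ~1ULL;
		res.thumb = true;
	}

	return res;
}

/* A few worked cases. */
static inline void demo_psci_entry_selfcheck(void)
{
	struct demo_reset_result r;

	r = demo_psci_entry(0x8001, true);	/* Thumb entry point */
	assert(r.pc == 0x8000 && r.thumb);

	r = demo_psci_entry(0x8000, true);	/* ARM entry point */
	assert(r.pc == 0x8000 && !r.thumb);

	r = demo_psci_entry(0x8001, false);	/* 64-bit: PC left untouched */
	assert(r.pc == 0x8001 && !r.thumb);
}
```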

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/include/asm/kvm_arm.h     | 12 ++++++
 arch/arm64/include/asm/kvm_emulate.h | 58 +++++++++++++++++++++++++++
 arch/arm64/kvm/reset.c               | 60 ++--------------------------
 3 files changed, 73 insertions(+), 57 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 3f9233b5a130..aba4ec09acd2 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -348,4 +348,16 @@
 	{ PSR_AA32_MODE_UND,	"32-bit UND" },	\
 	{ PSR_AA32_MODE_SYS,	"32-bit SYS" }
 
+/*
+ * ARMv8 Reset Values
+ */
+#define VCPU_RESET_PSTATE_EL1	(PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT | \
+				 PSR_F_BIT | PSR_D_BIT)
+
+#define VCPU_RESET_PSTATE_EL2	(PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT | \
+				 PSR_F_BIT | PSR_D_BIT)
+
+#define VCPU_RESET_PSTATE_SVC	(PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
+				 PSR_AA32_I_BIT | PSR_AA32_F_BIT)
+
 #endif /* __ARM64_KVM_ARM_H__ */
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index aed9fc0b717b..8436e71c402d 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -704,4 +704,62 @@ static inline void vcpu_set_hcrx(struct kvm_vcpu *vcpu)
 			vcpu->arch.hcrx_el2 |= HCRX_EL2_EnASR;
 	}
 }
+
+/* Reset a vcpu's core registers. */
+static inline void kvm_reset_vcpu_core(struct kvm_vcpu *vcpu)
+{
+	u32 pstate;
+
+	if (vcpu_el1_is_32bit(vcpu)) {
+		pstate = VCPU_RESET_PSTATE_SVC;
+	} else if (vcpu_has_nv(vcpu)) {
+		pstate = VCPU_RESET_PSTATE_EL2;
+	} else {
+		pstate = VCPU_RESET_PSTATE_EL1;
+	}
+
+	/* Reset core registers */
+	memset(vcpu_gp_regs(vcpu), 0, sizeof(*vcpu_gp_regs(vcpu)));
+	memset(&vcpu->arch.ctxt.fp_regs, 0, sizeof(vcpu->arch.ctxt.fp_regs));
+	vcpu->arch.ctxt.spsr_abt = 0;
+	vcpu->arch.ctxt.spsr_und = 0;
+	vcpu->arch.ctxt.spsr_irq = 0;
+	vcpu->arch.ctxt.spsr_fiq = 0;
+	vcpu_gp_regs(vcpu)->pstate = pstate;
+}
+
+/* PSCI reset handling for a vcpu. */
+static inline void kvm_reset_vcpu_psci(struct kvm_vcpu *vcpu,
+				       struct vcpu_reset_state *reset_state)
+{
+	unsigned long target_pc = reset_state->pc;
+
+	/* Gracefully handle Thumb2 entry point */
+	if (vcpu_mode_is_32bit(vcpu) && (target_pc & 1)) {
+		target_pc &= ~1UL;
+		vcpu_set_thumb(vcpu);
+	}
+
+	/* Propagate caller endianness */
+	if (reset_state->be)
+		kvm_vcpu_set_be(vcpu);
+
+	*vcpu_pc(vcpu) = target_pc;
+
+	/*
+	 * We may come from a state where either a PC update was
+	 * pending (SMC call resulting in PC being incremented to
+	 * skip the SMC) or a pending exception. Make sure we get
+	 * rid of all that, as this cannot be valid out of reset.
+	 *
+	 * Note that clearing the exception mask also clears PC
+	 * updates, but that's an implementation detail, and we
+	 * really want to make it explicit.
+	 */
+	vcpu_clear_flag(vcpu, PENDING_EXCEPTION);
+	vcpu_clear_flag(vcpu, EXCEPT_MASK);
+	vcpu_clear_flag(vcpu, INCREMENT_PC);
+	vcpu_set_reg(vcpu, 0, reset_state->r0);
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 60969d90bdd3..e22d0be9e57c 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -34,18 +34,6 @@
 static u32 __ro_after_init kvm_ipa_limit;
 unsigned int __ro_after_init kvm_host_sve_max_vl;
 
-/*
- * ARMv8 Reset Values
- */
-#define VCPU_RESET_PSTATE_EL1	(PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT | \
-				 PSR_F_BIT | PSR_D_BIT)
-
-#define VCPU_RESET_PSTATE_EL2	(PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT | \
-				 PSR_F_BIT | PSR_D_BIT)
-
-#define VCPU_RESET_PSTATE_SVC	(PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
-				 PSR_AA32_I_BIT | PSR_AA32_F_BIT)
-
 unsigned int __ro_after_init kvm_sve_max_vl;
 
 int __init kvm_arm_init_sve(void)
@@ -191,7 +179,6 @@ void kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_reset_state reset_state;
 	bool loaded;
-	u32 pstate;
 
 	scoped_guard(spinlock, &vcpu->arch.mp_state_lock) {
 		reset_state = vcpu->arch.reset_state;
@@ -210,21 +197,8 @@ void kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		kvm_vcpu_reset_sve(vcpu);
 	}
 
-	if (vcpu_el1_is_32bit(vcpu))
-		pstate = VCPU_RESET_PSTATE_SVC;
-	else if (vcpu_has_nv(vcpu))
-		pstate = VCPU_RESET_PSTATE_EL2;
-	else
-		pstate = VCPU_RESET_PSTATE_EL1;
-
 	/* Reset core registers */
-	memset(vcpu_gp_regs(vcpu), 0, sizeof(*vcpu_gp_regs(vcpu)));
-	memset(&vcpu->arch.ctxt.fp_regs, 0, sizeof(vcpu->arch.ctxt.fp_regs));
-	vcpu->arch.ctxt.spsr_abt = 0;
-	vcpu->arch.ctxt.spsr_und = 0;
-	vcpu->arch.ctxt.spsr_irq = 0;
-	vcpu->arch.ctxt.spsr_fiq = 0;
-	vcpu_gp_regs(vcpu)->pstate = pstate;
+	kvm_reset_vcpu_core(vcpu);
 
 	/* Reset system registers */
 	kvm_reset_sys_regs(vcpu);
@@ -233,36 +207,8 @@ void kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	 * Additional reset state handling that PSCI may have imposed on us.
 	 * Must be done after all the sys_reg reset.
 	 */
-	if (reset_state.reset) {
-		unsigned long target_pc = reset_state.pc;
-
-		/* Gracefully handle Thumb2 entry point */
-		if (vcpu_mode_is_32bit(vcpu) && (target_pc & 1)) {
-			target_pc &= ~1UL;
-			vcpu_set_thumb(vcpu);
-		}
-
-		/* Propagate caller endianness */
-		if (reset_state.be)
-			kvm_vcpu_set_be(vcpu);
-
-		*vcpu_pc(vcpu) = target_pc;
-
-		/*
-		 * We may come from a state where either a PC update was
-		 * pending (SMC call resulting in PC being increpented to
-		 * skip the SMC) or a pending exception. Make sure we get
-		 * rid of all that, as this cannot be valid out of reset.
-		 *
-		 * Note that clearing the exception mask also clears PC
-		 * updates, but that's an implementation detail, and we
-		 * really want to make it explicit.
-		 */
-		vcpu_clear_flag(vcpu, PENDING_EXCEPTION);
-		vcpu_clear_flag(vcpu, EXCEPT_MASK);
-		vcpu_clear_flag(vcpu, INCREMENT_PC);
-		vcpu_set_reg(vcpu, 0, reset_state.r0);
-	}
+	if (reset_state.reset)
+		kvm_reset_vcpu_psci(vcpu, &reset_state);
 
 	/* Reset timer */
 	kvm_timer_vcpu_reset(vcpu);
-- 
2.54.0.1136.gdb2ca164c4-goog



* [PATCH v1 07/11] KVM: arm64: Move PSCI helper functions to a shared header
  2026-06-12  6:59 [PATCH v1 00/11] KVM: arm64: Rework pKVM vCPU state synchronisation tabba
                   ` (5 preceding siblings ...)
  2026-06-12  6:59 ` [PATCH v1 06/11] KVM: arm64: Factor out reusable vCPU reset helpers tabba
@ 2026-06-12  6:59 ` tabba
  2026-06-12  6:59 ` [PATCH v1 08/11] KVM: arm64: Add host and hypervisor vCPU lookup primitives tabba
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: tabba @ 2026-06-12  6:59 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: Fuad Tabba, Will Deacon, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Sebastian Ene, Per Larsen, Suzuki K Poulose,
	Zenghui Yu, Joey Gouly, Steffen Eiden, Mark Rutland,
	Jonathan Cameron, Hyunwoo Kim, linux-arm-kernel, kvmarm,
	linux-kernel

Move kvm_psci_valid_affinity() and kvm_psci_narrow_to_32bit() from
psci.c to include/kvm/arm_psci.h, and move psci_affinity_mask() there
too, renaming it kvm_psci_affinity_mask() now that it is no longer
file-local. A follow-up series handles some protected-guest PSCI calls
at EL2 using these helpers.
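
For reference, the affinity mask computation can be exercised in
userspace as sketched below. The demo_* names are hypothetical, and the
constants are assumed to match arm64's MPIDR_HWID_BITMASK
(0xff00ffffff) and MPIDR_LEVEL_BITS (8):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed arm64 values of MPIDR_HWID_BITMASK and MPIDR_LEVEL_BITS. */
#define DEMO_MPIDR_HWID_BITMASK	0xff00ffffffULL
#define DEMO_MPIDR_LEVEL_BITS	8

/* Mask selecting the affinity fields at and above the given level. */
static inline uint64_t demo_psci_affinity_mask(unsigned int affinity_level)
{
	if (affinity_level <= 3)
		return DEMO_MPIDR_HWID_BITMASK &
			~((1ULL << (affinity_level * DEMO_MPIDR_LEVEL_BITS)) - 1);

	return 0;
}

/* An affinity value is valid if it only uses MPIDR affinity bits. */
static inline bool demo_psci_valid_affinity(uint64_t affinity)
{
	return !(affinity & ~DEMO_MPIDR_HWID_BITMASK);
}

/* Worked values for each level. */
static inline void demo_psci_affinity_selfcheck(void)
{
	assert(demo_psci_affinity_mask(0) == 0xff00ffffffULL);
	assert(demo_psci_affinity_mask(1) == 0xff00ffff00ULL);
	assert(demo_psci_affinity_mask(2) == 0xff00ff0000ULL);
	assert(demo_psci_affinity_mask(3) == 0xff00000000ULL);
	assert(demo_psci_affinity_mask(4) == 0);	/* invalid level */
}
```

A zero mask is what kvm_psci_vcpu_affinity_info() treats as an invalid
affinity level.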

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/psci.c  | 30 +-----------------------------
 include/kvm/arm_psci.h | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index e1389c525e9d..228e5040c379 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -21,16 +21,6 @@
  * as described in ARM document number ARM DEN 0022A.
  */
 
-#define AFFINITY_MASK(level)	~((0x1UL << ((level) * MPIDR_LEVEL_BITS)) - 1)
-
-static unsigned long psci_affinity_mask(unsigned long affinity_level)
-{
-	if (affinity_level <= 3)
-		return MPIDR_HWID_BITMASK & AFFINITY_MASK(affinity_level);
-
-	return 0;
-}
-
 static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 {
 	/*
@@ -51,12 +41,6 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 	return PSCI_RET_SUCCESS;
 }
 
-static inline bool kvm_psci_valid_affinity(struct kvm_vcpu *vcpu,
-					   unsigned long affinity)
-{
-	return !(affinity & ~MPIDR_HWID_BITMASK);
-}
-
 static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 {
 	struct vcpu_reset_state *reset_state;
@@ -131,7 +115,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
 		return PSCI_RET_INVALID_PARAMS;
 
 	/* Determine target affinity mask */
-	target_affinity_mask = psci_affinity_mask(lowest_affinity_level);
+	target_affinity_mask = kvm_psci_affinity_mask(lowest_affinity_level);
 	if (!target_affinity_mask)
 		return PSCI_RET_INVALID_PARAMS;
 
@@ -215,18 +199,6 @@ static void kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
 	run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
 }
 
-static void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
-{
-	int i;
-
-	/*
-	 * Zero the input registers' upper 32 bits. They will be fully
-	 * zeroed on exit, so we're fine changing them in place.
-	 */
-	for (i = 1; i < 4; i++)
-		vcpu_set_reg(vcpu, i, lower_32_bits(vcpu_get_reg(vcpu, i)));
-}
-
 static unsigned long kvm_psci_check_allowed_function(struct kvm_vcpu *vcpu, u32 fn)
 {
 	/*
diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
index cbaec804eb83..f12b74a4b176 100644
--- a/include/kvm/arm_psci.h
+++ b/include/kvm/arm_psci.h
@@ -38,6 +38,34 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
 	return KVM_ARM_PSCI_0_1;
 }
 
+/* Narrow the PSCI register arguments (r1 to r3) to 32 bits. */
+static inline void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
+{
+	int i;
+
+	/*
+	 * Zero the input registers' upper 32 bits. They will be fully
+	 * zeroed on exit, so we're fine changing them in place.
+	 */
+	for (i = 1; i < 4; i++)
+		vcpu_set_reg(vcpu, i, lower_32_bits(vcpu_get_reg(vcpu, i)));
+}
+
+static inline bool kvm_psci_valid_affinity(struct kvm_vcpu *vcpu,
+					   unsigned long affinity)
+{
+	return !(affinity & ~MPIDR_HWID_BITMASK);
+}
+
+
+static inline unsigned long kvm_psci_affinity_mask(unsigned long affinity_level)
+{
+	if (affinity_level <= 3)
+		return MPIDR_HWID_BITMASK &
+			~((0x1UL << (affinity_level * MPIDR_LEVEL_BITS)) - 1);
+
+	return 0;
+}
 
 int kvm_psci_call(struct kvm_vcpu *vcpu);
 
-- 
2.54.0.1136.gdb2ca164c4-goog




* [PATCH v1 08/11] KVM: arm64: Add host and hypervisor vCPU lookup primitives
  2026-06-12  6:59 [PATCH v1 00/11] KVM: arm64: Rework pKVM vCPU state synchronisation tabba
                   ` (6 preceding siblings ...)
  2026-06-12  6:59 ` [PATCH v1 07/11] KVM: arm64: Move PSCI helper functions to a shared header tabba
@ 2026-06-12  6:59 ` tabba
  2026-06-12  7:08   ` sashiko-bot
  2026-06-12  6:59 ` [PATCH v1 09/11] KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch tabba
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: tabba @ 2026-06-12  6:59 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: Fuad Tabba, Will Deacon, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Sebastian Ene, Per Larsen, Suzuki K Poulose,
	Zenghui Yu, Joey Gouly, Steffen Eiden, Mark Rutland,
	Jonathan Cameron, Hyunwoo Kim, linux-arm-kernel, kvmarm,
	linux-kernel

From: Marc Zyngier <maz@kernel.org>

The nVHE hypervisor repeatedly resolves a host vCPU into the EL2
address space and validates that the loaded hyp vCPU matches it, with
that logic open-coded in each handler.

Add __get_host_hyp_vcpus() and the get_host_hyp_vcpus() macro, which
translate the host vCPU into the hypervisor's address space and, when
pKVM is enabled, also return the loaded hyp vCPU if it matches. If pKVM
is enabled but the loaded hyp vCPU does not correspond to the requested
host vCPU, both the host and hyp vCPU are returned as NULL. Convert
handle___kvm_vcpu_run() to use it.
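
The return contract described above can be modelled in userspace as
follows. This is only a sketch: the demo_* types and globals are
hypothetical stand-ins for is_protected_kvm_enabled() and the per-CPU
loaded hyp vCPU, and the kern_hyp_va() translation is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_host_vcpu { int id; };

struct demo_hyp_vcpu {
	struct demo_host_vcpu *host_vcpu;	/* host vCPU this one shadows */
};

/* Stand-ins for is_protected_kvm_enabled() and the loaded hyp vCPU. */
static bool demo_pkvm_enabled;
static struct demo_hyp_vcpu *demo_loaded_hyp_vcpu;

static struct demo_host_vcpu *
demo_get_host_hyp_vcpus(struct demo_host_vcpu *host_vcpu,
			struct demo_hyp_vcpu **hyp_vcpup)
{
	struct demo_hyp_vcpu *hyp_vcpu = NULL;

	assert(hyp_vcpup);

	if (demo_pkvm_enabled) {
		hyp_vcpu = demo_loaded_hyp_vcpu;

		/* Nothing loaded, or loaded for another vCPU: reject both. */
		if (!hyp_vcpu || hyp_vcpu->host_vcpu != host_vcpu) {
			hyp_vcpu = NULL;
			host_vcpu = NULL;
		}
	}

	*hyp_vcpup = hyp_vcpu;
	return host_vcpu;
}
```

A handler therefore only needs two checks: a NULL host vCPU means the
request is rejected, and a non-NULL hyp vCPU means the pKVM path is
taken.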

No functional change intended.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 52 ++++++++++++++++++++++--------
 1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 06db299c37a8..420fb19a6476 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -195,14 +195,45 @@ static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
 		pkvm_put_hyp_vcpu(hyp_vcpu);
 }
 
-static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
+static struct kvm_vcpu *__get_host_hyp_vcpus(struct kvm_vcpu *arg,
+					     struct pkvm_hyp_vcpu **hyp_vcpup)
 {
-	DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 1);
-	int ret;
+	struct kvm_vcpu *host_vcpu = kern_hyp_va(arg);
+	struct pkvm_hyp_vcpu *hyp_vcpu = NULL;
 
 	if (unlikely(is_protected_kvm_enabled())) {
-		struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+		hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
 
+		if (!hyp_vcpu || hyp_vcpu->host_vcpu != host_vcpu) {
+			hyp_vcpu = NULL;
+			host_vcpu = NULL;
+		}
+	}
+
+	*hyp_vcpup = hyp_vcpu;
+	return host_vcpu;
+}
+
+#define get_host_hyp_vcpus(ctxt, regnr, hyp_vcpup)			\
+	({								\
+		DECLARE_REG(struct kvm_vcpu *, __vcpu, ctxt, regnr);	\
+		__get_host_hyp_vcpus(__vcpu, hyp_vcpup);		\
+	})
+
+static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
+{
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	struct kvm_vcpu *host_vcpu;
+	int ret;
+
+	host_vcpu = get_host_hyp_vcpus(host_ctxt, 1, &hyp_vcpu);
+
+	if (!host_vcpu) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (unlikely(hyp_vcpu)) {
 		/*
 		 * KVM (and pKVM) doesn't support SME guests for now, and
 		 * ensures that SME features aren't enabled in pstate when
@@ -214,23 +245,16 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
 			goto out;
 		}
 
-		if (!hyp_vcpu) {
-			ret = -EINVAL;
-			goto out;
-		}
-
 		flush_hyp_vcpu(hyp_vcpu);
 
 		ret = __kvm_vcpu_run(&hyp_vcpu->vcpu);
 
 		sync_hyp_vcpu(hyp_vcpu);
 	} else {
-		struct kvm_vcpu *vcpu = kern_hyp_va(host_vcpu);
-
 		/* The host is fully trusted, run its vCPU directly. */
-		fpsimd_lazy_switch_to_guest(vcpu);
-		ret = __kvm_vcpu_run(vcpu);
-		fpsimd_lazy_switch_to_host(vcpu);
+		fpsimd_lazy_switch_to_guest(host_vcpu);
+		ret = __kvm_vcpu_run(host_vcpu);
+		fpsimd_lazy_switch_to_host(host_vcpu);
 	}
 out:
 	cpu_reg(host_ctxt, 1) =  ret;
-- 
2.54.0.1136.gdb2ca164c4-goog




* [PATCH v1 09/11] KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch
  2026-06-12  6:59 [PATCH v1 00/11] KVM: arm64: Rework pKVM vCPU state synchronisation tabba
                   ` (7 preceding siblings ...)
  2026-06-12  6:59 ` [PATCH v1 08/11] KVM: arm64: Add host and hypervisor vCPU lookup primitives tabba
@ 2026-06-12  6:59 ` tabba
  2026-06-12  7:24   ` sashiko-bot
  2026-06-12  6:59 ` [PATCH v1 10/11] KVM: arm64: Add primitives to flush/sync the VGIC state at EL2 tabba
  2026-06-12  6:59 ` [PATCH v1 11/11] KVM: arm64: Implement lazy vCPU state sync for non-protected guests tabba
  10 siblings, 1 reply; 23+ messages in thread
From: tabba @ 2026-06-12  6:59 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: Fuad Tabba, Will Deacon, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Sebastian Ene, Per Larsen, Suzuki K Poulose,
	Zenghui Yu, Joey Gouly, Steffen Eiden, Mark Rutland,
	Jonathan Cameron, Hyunwoo Kim, linux-arm-kernel, kvmarm,
	linux-kernel

From: Marc Zyngier <maz@kernel.org>

The host passes a vgic_v3_cpu_if pointer to the __vgic_v3_save_aprs and
__vgic_v3_restore_vmcr_aprs hypercalls, which EL2 dereferences
wholesale. That exposes the host's full VGIC emulation state to the
hypervisor, against pKVM's isolation goals.

Recover the host vCPU from the supplied cpu_if via container_of() and
copy only vgic_vmcr and the active priority registers between EL2's
hyp-side state and the host vCPU, so EL2 no longer dereferences the
host's vgic_v3_cpu_if directly.
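
The two steps (recover the enclosing vCPU, then copy a fixed set of
fields) can be sketched in userspace as below. All demo_* names and the
struct layout are hypothetical and much simpler than the real
vgic_v3_cpu_if; only the shape of the technique is the same:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_cpu_if {
	uint32_t vmcr;
	uint32_t ap0r[4];
	uint32_t ap1r[4];
	uint64_t other_state[16];	/* everything EL2 should leave alone */
};

struct demo_vcpu {
	int id;
	struct {
		struct {
			struct demo_cpu_if v3;
		} vgic_cpu;
	} arch;
};

/* Recover the enclosing vCPU from a pointer to its embedded cpu_if. */
static inline struct demo_vcpu *demo_vcpu_from_cpu_if(struct demo_cpu_if *cif)
{
	return demo_container_of(cif, struct demo_vcpu, arch.vgic_cpu.v3);
}

/* Copy only the VMCR and the active priority registers. */
static inline void demo_copy_vmcr_aprs(struct demo_cpu_if *dst,
				       const struct demo_cpu_if *src)
{
	size_t i;

	assert(dst != src);

	dst->vmcr = src->vmcr;
	for (i = 0; i < sizeof(dst->ap0r) / sizeof(dst->ap0r[0]); i++) {
		dst->ap0r[i] = src->ap0r[i];
		dst->ap1r[i] = src->ap1r[i];
	}
}
```

Anything outside those fields (other_state in the sketch) is never read
or written by the copy.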

Signed-off-by: Marc Zyngier <maz@kernel.org>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 67 ++++++++++++++++++++++++++++--
 1 file changed, 63 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 420fb19a6476..2f165b6c7b07 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -7,6 +7,8 @@
 #include <hyp/adjust_pc.h>
 #include <hyp/switch.h>
 
+#include <linux/irqchip/arm-gic-v3.h>
+
 #include <asm/pgtable-types.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
@@ -220,6 +222,16 @@ static struct kvm_vcpu *__get_host_hyp_vcpus(struct kvm_vcpu *arg,
 		__get_host_hyp_vcpus(__vcpu, hyp_vcpup);		\
 	})
 
+#define get_host_hyp_vcpus_from_vgic_v3_cpu_if(ctxt, regnr, hyp_vcpup)		\
+	({									\
+		DECLARE_REG(struct vgic_v3_cpu_if *, cif, ctxt, regnr);\
+		struct kvm_vcpu *__vcpu = container_of(cif,			\
+						       struct kvm_vcpu,		\
+						       arch.vgic_cpu.vgic_v3);	\
+										\
+		__get_host_hyp_vcpus(__vcpu, hyp_vcpup);			\
+	})
+
 static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
 {
 	struct pkvm_hyp_vcpu *hyp_vcpu;
@@ -489,16 +501,63 @@ static void handle___vgic_v3_init_lrs(struct kvm_cpu_context *host_ctxt)
 
 static void handle___vgic_v3_save_aprs(struct kvm_cpu_context *host_ctxt)
 {
-	DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	struct kvm_vcpu *host_vcpu;
 
-	__vgic_v3_save_aprs(kern_hyp_va(cpu_if));
+	host_vcpu = get_host_hyp_vcpus_from_vgic_v3_cpu_if(host_ctxt, 1,
+							   &hyp_vcpu);
+	if (!host_vcpu)
+		return;
+
+	if (unlikely(hyp_vcpu)) {
+		struct vgic_v3_cpu_if *hyp_cpu_if, *host_cpu_if;
+		int i;
+
+		hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
+		__vgic_v3_save_aprs(hyp_cpu_if);
+
+		host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
+		host_cpu_if->vgic_vmcr = hyp_cpu_if->vgic_vmcr;
+		for (i = 0; i < ARRAY_SIZE(host_cpu_if->vgic_ap0r); i++) {
+			host_cpu_if->vgic_ap0r[i] = hyp_cpu_if->vgic_ap0r[i];
+			host_cpu_if->vgic_ap1r[i] = hyp_cpu_if->vgic_ap1r[i];
+		}
+	} else {
+		__vgic_v3_save_aprs(&host_vcpu->arch.vgic_cpu.vgic_v3);
+	}
 }
 
 static void handle___vgic_v3_restore_vmcr_aprs(struct kvm_cpu_context *host_ctxt)
 {
-	DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	struct kvm_vcpu *host_vcpu;
 
-	__vgic_v3_restore_vmcr_aprs(kern_hyp_va(cpu_if));
+	host_vcpu = get_host_hyp_vcpus_from_vgic_v3_cpu_if(host_ctxt, 1,
+							   &hyp_vcpu);
+	if (!host_vcpu)
+		return;
+
+	if (unlikely(hyp_vcpu)) {
+		struct vgic_v3_cpu_if *hyp_cpu_if, *host_cpu_if;
+		int i;
+
+		hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
+		host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
+
+		hyp_cpu_if->vgic_vmcr = host_cpu_if->vgic_vmcr;
+		/* Should be a one-off */
+		hyp_cpu_if->vgic_sre = (ICC_SRE_EL1_DIB |
+					ICC_SRE_EL1_DFB |
+					ICC_SRE_EL1_SRE);
+		for (i = 0; i < ARRAY_SIZE(host_cpu_if->vgic_ap0r); i++) {
+			hyp_cpu_if->vgic_ap0r[i] = host_cpu_if->vgic_ap0r[i];
+			hyp_cpu_if->vgic_ap1r[i] = host_cpu_if->vgic_ap1r[i];
+		}
+
+		__vgic_v3_restore_vmcr_aprs(hyp_cpu_if);
+	} else {
+		__vgic_v3_restore_vmcr_aprs(&host_vcpu->arch.vgic_cpu.vgic_v3);
+	}
 }
 
 static void handle___pkvm_init(struct kvm_cpu_context *host_ctxt)
-- 
2.54.0.1136.gdb2ca164c4-goog




* [PATCH v1 10/11] KVM: arm64: Add primitives to flush/sync the VGIC state at EL2
  2026-06-12  6:59 [PATCH v1 00/11] KVM: arm64: Rework pKVM vCPU state synchronisation tabba
                   ` (8 preceding siblings ...)
  2026-06-12  6:59 ` [PATCH v1 09/11] KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch tabba
@ 2026-06-12  6:59 ` tabba
  2026-06-12  7:23   ` sashiko-bot
  2026-06-12  6:59 ` [PATCH v1 11/11] KVM: arm64: Implement lazy vCPU state sync for non-protected guests tabba
  10 siblings, 1 reply; 23+ messages in thread
From: tabba @ 2026-06-12  6:59 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: Fuad Tabba, Will Deacon, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Sebastian Ene, Per Larsen, Suzuki K Poulose,
	Zenghui Yu, Joey Gouly, Steffen Eiden, Mark Rutland,
	Jonathan Cameron, Hyunwoo Kim, linux-arm-kernel, kvmarm,
	linux-kernel

From: Marc Zyngier <maz@kernel.org>

pKVM performs its own world switch for protected VMs but has no
primitives to move the per-vCPU VGIC state between the host and
hypervisor vCPU contexts.

Add flush_hyp_vgic_state() and sync_hyp_vgic_state(). Flush copies
vgic_hcr, the in-use list registers and used_lrs from the host into the
hyp vCPU and pins vgic_sre to a fixed value; sync copies vgic_hcr,
vgic_vmcr and the in-use list registers back. The active priority
registers are handled separately by the save/restore-aprs path.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 50 +++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 8 deletions(-)
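
A minimal, user-space sketch of the flush/sync split described in the commit message above. It assumes a cut-down struct sketch_cpu_if and passes max_lrs in as a parameter, where the patch derives it from ICH_VTR_EL2; all SKETCH_* names, sketch_flush() and sketch_sync() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_LRS		16
/* Illustrative bit layout for the pinned SRE value (DIB | DFB | SRE). */
#define SKETCH_SRE_PINNED	((1u << 2) | (1u << 1) | (1u << 0))

struct sketch_cpu_if {
	uint32_t vgic_hcr;
	uint32_t vgic_vmcr;
	uint32_t vgic_sre;
	unsigned int used_lrs;
	uint64_t vgic_lr[SKETCH_MAX_LRS];
};

/* Host -> hyp: clamp used_lrs, pin vgic_sre, copy only the in-use LRs. */
static void sketch_flush(struct sketch_cpu_if *hyp,
			 const struct sketch_cpu_if *host,
			 unsigned int max_lrs)
{
	unsigned int used_lrs = host->used_lrs;
	unsigned int i;

	assert(max_lrs <= SKETCH_MAX_LRS);

	/* Never trust the host-provided count beyond what hardware has. */
	if (used_lrs > max_lrs)
		used_lrs = max_lrs;

	hyp->vgic_hcr = host->vgic_hcr;
	hyp->vgic_sre = SKETCH_SRE_PINNED;
	hyp->used_lrs = used_lrs;

	for (i = 0; i < used_lrs; i++)
		hyp->vgic_lr[i] = host->vgic_lr[i];
}

/* Hyp -> host: copy back vgic_hcr, vgic_vmcr and the in-use LRs. */
static void sketch_sync(struct sketch_cpu_if *host,
			const struct sketch_cpu_if *hyp)
{
	unsigned int i;

	host->vgic_hcr = hyp->vgic_hcr;
	host->vgic_vmcr = hyp->vgic_vmcr;

	for (i = 0; i < hyp->used_lrs; i++)
		host->vgic_lr[i] = hyp->vgic_lr[i];
}
```

Because the sync loop is bounded by the hyp-side used_lrs, which was clamped at flush time, an out-of-range count written by the host cannot drive either loop past the array in this model.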

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 2f165b6c7b07..23e644c24a03 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -99,6 +99,46 @@ static void fpsimd_sve_sync(struct kvm_vcpu *vcpu)
 	*host_data_ptr(fp_owner) = FP_STATE_HOST_OWNED;
 }
 
+static void flush_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+	struct vgic_v3_cpu_if *host_cpu_if, *hyp_cpu_if;
+	unsigned int used_lrs, max_lrs, i;
+
+	host_cpu_if	= &host_vcpu->arch.vgic_cpu.vgic_v3;
+	hyp_cpu_if	= &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
+
+	max_lrs		= (read_gicreg(ICH_VTR_EL2) & ICH_VTR_EL2_ListRegs) + 1;
+	used_lrs	= host_cpu_if->used_lrs;
+	used_lrs	= min(used_lrs, max_lrs);
+
+	hyp_cpu_if->vgic_hcr	= host_cpu_if->vgic_hcr;
+	/* Should be a one-off */
+	hyp_cpu_if->vgic_sre	= (ICC_SRE_EL1_DIB |
+				   ICC_SRE_EL1_DFB |
+				   ICC_SRE_EL1_SRE);
+	hyp_cpu_if->used_lrs	= used_lrs;
+
+	for (i = 0; i < used_lrs; i++)
+		hyp_cpu_if->vgic_lr[i] = host_cpu_if->vgic_lr[i];
+}
+
+static void sync_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+	struct vgic_v3_cpu_if *host_cpu_if, *hyp_cpu_if;
+	unsigned int i;
+
+	host_cpu_if	= &host_vcpu->arch.vgic_cpu.vgic_v3;
+	hyp_cpu_if	= &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
+
+	host_cpu_if->vgic_hcr = hyp_cpu_if->vgic_hcr;
+	host_cpu_if->vgic_vmcr = hyp_cpu_if->vgic_vmcr;
+
+	for (i = 0; i < hyp_cpu_if->used_lrs; i++)
+		host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
+}
+
 static void flush_debug_state(struct pkvm_hyp_vcpu *hyp_vcpu)
 {
 	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
@@ -139,7 +179,7 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 
 	hyp_vcpu->vcpu.arch.vsesr_el2	= host_vcpu->arch.vsesr_el2;
 
-	hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3 = host_vcpu->arch.vgic_cpu.vgic_v3;
+	flush_hyp_vgic_state(hyp_vcpu);
 
 	hyp_vcpu->vcpu.arch.pid = host_vcpu->arch.pid;
 }
@@ -147,9 +187,6 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 {
 	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
-	struct vgic_v3_cpu_if *hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
-	struct vgic_v3_cpu_if *host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
-	unsigned int i;
 
 	fpsimd_sve_sync(&hyp_vcpu->vcpu);
 	sync_debug_state(hyp_vcpu);
@@ -162,10 +199,7 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 
 	host_vcpu->arch.iflags		= hyp_vcpu->vcpu.arch.iflags;
 
-	host_cpu_if->vgic_hcr		= hyp_cpu_if->vgic_hcr;
-	host_cpu_if->vgic_vmcr		= hyp_cpu_if->vgic_vmcr;
-	for (i = 0; i < hyp_cpu_if->used_lrs; ++i)
-		host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
+	sync_hyp_vgic_state(hyp_vcpu);
 }
 
 static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
-- 
2.54.0.1136.gdb2ca164c4-goog




* [PATCH v1 11/11] KVM: arm64: Implement lazy vCPU state sync for non-protected guests
  2026-06-12  6:59 [PATCH v1 00/11] KVM: arm64: Rework pKVM vCPU state synchronisation tabba
                   ` (9 preceding siblings ...)
  2026-06-12  6:59 ` [PATCH v1 10/11] KVM: arm64: Add primitives to flush/sync the VGIC state at EL2 tabba
@ 2026-06-12  6:59 ` tabba
  2026-06-12  7:19   ` sashiko-bot
  10 siblings, 1 reply; 23+ messages in thread
From: tabba @ 2026-06-12  6:59 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: Fuad Tabba, Will Deacon, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Sebastian Ene, Per Larsen, Suzuki K Poulose,
	Zenghui Yu, Joey Gouly, Steffen Eiden, Mark Rutland,
	Jonathan Cameron, Hyunwoo Kim, linux-arm-kernel, kvmarm,
	linux-kernel

pKVM copies a non-protected guest's register context between the host
and the hypervisor on every world switch, even when the host never
inspects it. Defer the copy: on entry, flush the host context into the
hyp vCPU only when the host marked it dirty (PKVM_HOST_STATE_DIRTY); on
exit, leave it in the hyp vCPU and copy it back only when the host needs
it, via a __pkvm_vcpu_sync_state hypercall on trap handling or at vcpu
put. A protected guest's context is copied as before, since lazy sync
only helps where the host is trusted to see the guest's registers.

The PC is the exception: it is copied back on every exit so the
kvm_exit tracepoint reports the guest's real exit PC rather than the
value left by the previous sync.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/include/asm/kvm_asm.h   |  1 +
 arch/arm64/include/asm/kvm_host.h  |  2 +
 arch/arm64/kvm/arm.c               |  7 +++
 arch/arm64/kvm/handle_exit.c       | 22 ++++++++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 88 ++++++++++++++++++++++++++++--
 5 files changed, 115 insertions(+), 5 deletions(-)
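
A minimal, user-space model of the lazy-sync protocol described in the commit message above. It collapses the host and EL2 sides into one struct for brevity, reduces the register context to a PC and one general-purpose register, and counts full context copies so the saving is visible. struct sketch_vcpu, sketch_enter(), sketch_exit() and sketch_sync_state() are hypothetical names, not the patch's functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced register context. */
struct sketch_ctxt {
	uint64_t pc;
	uint64_t x0;
};

struct sketch_vcpu {
	struct sketch_ctxt host;	/* host's view of the guest */
	struct sketch_ctxt hyp;		/* hypervisor's private copy */
	bool host_state_dirty;		/* models PKVM_HOST_STATE_DIRTY */
	unsigned int full_copies;	/* instrumentation only */
};

/* Guest entry: flush host -> hyp only if the host copy may have changed. */
static void sketch_enter(struct sketch_vcpu *v)
{
	assert(v);

	if (v->host_state_dirty) {
		v->hyp = v->host;
		v->full_copies++;
	}
}

/* Guest exit: copy back the PC only, then mark the host copy clean. */
static void sketch_exit(struct sketch_vcpu *v)
{
	assert(v);

	v->host.pc = v->hyp.pc;
	v->host_state_dirty = false;
}

/* On-demand sync: the host needs the full context (trap handling, put). */
static void sketch_sync_state(struct sketch_vcpu *v)
{
	assert(v);

	if (!v->host_state_dirty) {
		v->host = v->hyp;
		v->full_copies++;
		v->host_state_dirty = true;
	}
}
```

In this model an exit that the host handles without calling sketch_sync_state() costs no full copy in either direction; the price is that every host-side field other than the PC is stale until a sync happens.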

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 043495f7fc78..6e1135b3ded4 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -113,6 +113,7 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
+	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_sync_state,
 	__KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
 
 	MARKER(__KVM_HOST_SMCCC_FUNC_MAX)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a49042bfa801..1ef660774adc 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1113,6 +1113,8 @@ struct kvm_vcpu_arch {
 /* SError pending for nested guest */
 #define NESTED_SERROR_PENDING	__vcpu_single_flag(sflags, BIT(8))
 
+/* pKVM host vcpu state is dirty, needs resync (nVHE-only) */
+#define PKVM_HOST_STATE_DIRTY	__vcpu_single_flag(iflags, BIT(4))
 
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
 #define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) +	\
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c9f36932c980..a5c54e37778b 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -734,6 +734,10 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	if (is_protected_kvm_enabled()) {
 		kvm_call_hyp(__vgic_v3_save_aprs, &vcpu->arch.vgic_cpu.vgic_v3);
 		kvm_call_hyp_nvhe(__pkvm_vcpu_put);
+
+		/* __pkvm_vcpu_put implies a sync of the state */
+		if (!kvm_vm_is_protected(vcpu->kvm))
+			vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
 	}
 
 	kvm_vcpu_put_debug(vcpu);
@@ -961,6 +965,9 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 		return ret;
 
 	if (is_protected_kvm_enabled()) {
+		/* Start with the vcpu in a dirty state */
+		if (!kvm_vm_is_protected(vcpu->kvm))
+			vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
 		ret = pkvm_create_hyp_vm(kvm);
 		if (ret)
 			return ret;
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 54aedf93c78b..dccc3786548b 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -422,6 +422,21 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu)
 {
 	int handled;
 
+	/*
+	 * If we run a non-protected VM when protection is enabled
+	 * system-wide, resync the state from the hypervisor and mark
+	 * it as dirty on the host side if it wasn't dirty already
+	 * (which could happen if preemption has taken place).
+	 */
+	if (is_protected_kvm_enabled() && !kvm_vm_is_protected(vcpu->kvm)) {
+		preempt_disable();
+		if (!(vcpu_get_flag(vcpu, PKVM_HOST_STATE_DIRTY))) {
+			kvm_call_hyp_nvhe(__pkvm_vcpu_sync_state);
+			vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
+		}
+		preempt_enable();
+	}
+
 	/*
 	 * See ARM ARM B1.14.1: "Hyp traps on instructions
 	 * that fail their condition code check"
@@ -489,6 +504,13 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
 /* For exit types that need handling before we can be preempted */
 void handle_exit_early(struct kvm_vcpu *vcpu, int exception_index)
 {
+	/*
+	 * We just exited, so the state is clean from a hypervisor
+	 * perspective.
+	 */
+	if (is_protected_kvm_enabled())
+		vcpu_clear_flag(vcpu, PKVM_HOST_STATE_DIRTY);
+
 	if (ARM_SERROR_PENDING(exception_index)) {
 		if (this_cpu_has_cap(ARM64_HAS_RAS_EXTN)) {
 			u64 disr = kvm_vcpu_get_disr(vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 23e644c24a03..02383b372258 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -139,6 +139,49 @@ static void sync_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
 		host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
 }
 
+
+static void __copy_vcpu_state(const struct kvm_vcpu *from_vcpu,
+			      struct kvm_vcpu *to_vcpu)
+{
+	int i;
+
+	to_vcpu->arch.ctxt.regs		= from_vcpu->arch.ctxt.regs;
+	to_vcpu->arch.ctxt.spsr_abt	= from_vcpu->arch.ctxt.spsr_abt;
+	to_vcpu->arch.ctxt.spsr_und	= from_vcpu->arch.ctxt.spsr_und;
+	to_vcpu->arch.ctxt.spsr_irq	= from_vcpu->arch.ctxt.spsr_irq;
+	to_vcpu->arch.ctxt.spsr_fiq	= from_vcpu->arch.ctxt.spsr_fiq;
+	to_vcpu->arch.ctxt.fp_regs	= from_vcpu->arch.ctxt.fp_regs;
+
+	/*
+	 * Copy the sysregs, but don't mess with the timer state which
+	 * is directly handled by EL1 and is expected to be preserved.
+	 * enum vcpu_sysreg is sparse: VNCR-mapped registers take values
+	 * derived from their VNCR page offset, so the timer registers do
+	 * not form a contiguous numeric range and must be skipped by name.
+	 */
+	for (i = 1; i < NR_SYS_REGS; i++) {
+		switch (i) {
+		case CNTVOFF_EL2:
+		case CNTV_CVAL_EL0:
+		case CNTV_CTL_EL0:
+		case CNTP_CVAL_EL0:
+		case CNTP_CTL_EL0:
+			continue;
+		}
+		to_vcpu->arch.ctxt.sys_regs[i] = from_vcpu->arch.ctxt.sys_regs[i];
+	}
+}
+
+static void __sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	__copy_vcpu_state(&hyp_vcpu->vcpu, hyp_vcpu->host_vcpu);
+}
+
+static void __flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	__copy_vcpu_state(hyp_vcpu->host_vcpu, &hyp_vcpu->vcpu);
+}
+
 static void flush_debug_state(struct pkvm_hyp_vcpu *hyp_vcpu)
 {
 	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
@@ -168,7 +211,17 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 	fpsimd_sve_flush();
 	flush_debug_state(hyp_vcpu);
 
-	hyp_vcpu->vcpu.arch.ctxt	= host_vcpu->arch.ctxt;
+	/*
+	 * If we deal with a non-protected guest and the state is potentially
+	 * dirty (from a host perspective), copy the state back into the hyp
+	 * vcpu.
+	 */
+	if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
+		if (vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY))
+			__flush_hyp_vcpu(hyp_vcpu);
+	} else {
+		hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
+	}
 
 	hyp_vcpu->vcpu.arch.mdcr_el2	= host_vcpu->arch.mdcr_el2;
 	hyp_vcpu->vcpu.arch.hcr_el2 &= ~(HCR_TWI | HCR_TWE);
@@ -191,9 +244,11 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 	fpsimd_sve_sync(&hyp_vcpu->vcpu);
 	sync_debug_state(hyp_vcpu);
 
-	host_vcpu->arch.ctxt		= hyp_vcpu->vcpu.arch.ctxt;
-
-	host_vcpu->arch.hcr_el2		= hyp_vcpu->vcpu.arch.hcr_el2;
+	if (pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+		host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
+	else
+		/* Keep the PC current for the kvm_exit tracepoint (lazy ctxt sync). */
+		host_vcpu->arch.ctxt.regs.pc = hyp_vcpu->vcpu.arch.ctxt.regs.pc;
 
 	host_vcpu->arch.fault		= hyp_vcpu->vcpu.arch.fault;
 
@@ -227,8 +282,30 @@ static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
 {
 	struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
 
-	if (hyp_vcpu)
+	if (hyp_vcpu) {
+		struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+
+		if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu) &&
+		    !vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY)) {
+			__sync_hyp_vcpu(hyp_vcpu);
+		}
+
 		pkvm_put_hyp_vcpu(hyp_vcpu);
+	}
+}
+
+static void handle___pkvm_vcpu_sync_state(struct kvm_cpu_context *host_ctxt)
+{
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+
+	if (!is_protected_kvm_enabled())
+		return;
+
+	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+		return;
+
+	__sync_hyp_vcpu(hyp_vcpu);
 }
 
 static struct kvm_vcpu *__get_host_hyp_vcpus(struct kvm_vcpu *arg,
@@ -859,6 +936,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_finalize_teardown_vm),
 	HANDLE_FUNC(__pkvm_vcpu_load),
 	HANDLE_FUNC(__pkvm_vcpu_put),
+	HANDLE_FUNC(__pkvm_vcpu_sync_state),
 	HANDLE_FUNC(__pkvm_tlb_flush_vmid),
 };
 
-- 
2.54.0.1136.gdb2ca164c4-goog




* Re: [PATCH v1 08/11] KVM: arm64: Add host and hypervisor vCPU lookup primitives
  2026-06-12  6:59 ` [PATCH v1 08/11] KVM: arm64: Add host and hypervisor vCPU lookup primitives tabba
@ 2026-06-12  7:08   ` sashiko-bot
  2026-06-12  7:15     ` Fuad Tabba
  0 siblings, 1 reply; 23+ messages in thread
From: sashiko-bot @ 2026-06-12  7:08 UTC (permalink / raw)
  To: tabba; +Cc: kvmarm, Oliver Upton, Marc Zyngier

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] flush_hyp_vcpu() incorrectly copies MDCR_EL2 from the untrusted host context for protected VMs, violating EL2 security guidelines.
--

commit 11498ed516d4ebc65b4ad00086974239ee0e2eb4
Author: Marc Zyngier <maz@kernel.org>
KVM: arm64: Add host and hypervisor vCPU lookup primitives

This commit introduces primitives to translate a host vCPU into the
hypervisor's address space. It refactors handle___kvm_vcpu_run() to use
these new lookup primitives.

> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
[ ... ]
> @@ -214,23 +245,16 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
>  			goto out;
>  		}
>  
> -		if (!hyp_vcpu) {
> -			ret = -EINVAL;
> -			goto out;
> -		}
> -
>  		flush_hyp_vcpu(hyp_vcpu);

[Severity: High]
This is a pre-existing issue, but does flush_hyp_vcpu() incorrectly copy
mdcr_el2 from the untrusted host context for protected VMs?

Looking at flush_hyp_vcpu() in arch/arm64/kvm/hyp/nvhe/hyp-main.c:

static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
{
        struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
        [ ... ]
        hyp_vcpu->vcpu.arch.mdcr_el2 = host_vcpu->arch.mdcr_el2;
        [ ... ]
}

According to the pKVM EL2 Security and Trust Boundary guidelines, trap
configuration registers like mdcr_el2 must be initialized in an EL2-private
context for protected VMs. They must not rely on values written by the host.

Because host_vcpu resides in untrusted host memory, this unconditional
assignment could allow a compromised host to overwrite the mdcr_el2
configuration of a protected VM. This could potentially allow the host to
manipulate guest debug exceptions and break guest confidentiality or
hypervisor isolation.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260612065925.755562-1-tabba@google.com?part=8


* Re: [PATCH v1 08/11] KVM: arm64: Add host and hypervisor vCPU lookup primitives
  2026-06-12  7:08   ` sashiko-bot
@ 2026-06-12  7:15     ` Fuad Tabba
  0 siblings, 0 replies; 23+ messages in thread
From: Fuad Tabba @ 2026-06-12  7:15 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: kvmarm, Oliver Upton, Marc Zyngier, Will Deacon

Hello Sashiko my old friend,

I came prepared for you...

On Fri, 12 Jun 2026 at 08:08, <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
>
> Pre-existing issues:
> - [High] flush_hyp_vcpu() incorrectly copies MDCR_EL2 from the untrusted host context for protected VMs, violating EL2 security guidelines.
> --
>
> commit 11498ed516d4ebc65b4ad00086974239ee0e2eb4
> Author: Marc Zyngier <maz@kernel.org>
> KVM: arm64: Add host and hypervisor vCPU lookup primitives
>
> This commit introduces primitives to translate a host vCPU into the
> hypervisor's address space. It refactors handle___kvm_vcpu_run() to use
> these new lookup primitives.
>
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> [ ... ]
> > @@ -214,23 +245,16 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
> >                       goto out;
> >               }
> >
> > -             if (!hyp_vcpu) {
> > -                     ret = -EINVAL;
> > -                     goto out;
> > -             }
> > -
> >               flush_hyp_vcpu(hyp_vcpu);
>
> [Severity: High]
> This is a pre-existing issue, but does flush_hyp_vcpu() incorrectly copy
> mdcr_el2 from the untrusted host context for protected VMs?
>
> Looking at flush_hyp_vcpu() in arch/arm64/kvm/hyp/nvhe/hyp-main.c:
>
> static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> {
>         struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
>         [ ... ]
>         hyp_vcpu->vcpu.arch.mdcr_el2 = host_vcpu->arch.mdcr_el2;
>         [ ... ]
> }
>
> According to the pKVM EL2 Security and Trust Boundary guidelines, trap
> configuration registers like mdcr_el2 must be initialized in an EL2-private
> context for protected VMs. They must not rely on values written by the host.
>
> Because host_vcpu resides in untrusted host memory, this unconditional
> assignment could allow a compromised host to overwrite the mdcr_el2
> configuration of a protected VM. This could potentially allow the host to
> manipulate guest debug exceptions and break guest confidentiality or
> hypervisor isolation.

The assignment is not unconditional. It is inside the
`if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu))` guard (line 607) and only
executes for non-protected VMs.

For protected VMs, mdcr_el2 is initialized at vCPU-init time by
pvm_init_traps_mdcr() (pkvm.c) and is never overwritten from host
memory on the flush path.

Cheers,
/fuad

>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260612065925.755562-1-tabba@google.com?part=8


* Re: [PATCH v1 05/11] KVM: arm64: Make vcpu_{read,write}_sys_reg available to HYP code
  2026-06-12  6:59 ` [PATCH v1 05/11] KVM: arm64: Make vcpu_{read,write}_sys_reg available to HYP code tabba
@ 2026-06-12  7:17   ` sashiko-bot
  2026-06-12  7:53     ` Fuad Tabba
  0 siblings, 1 reply; 23+ messages in thread
From: sashiko-bot @ 2026-06-12  7:17 UTC (permalink / raw)
  To: tabba; +Cc: kvmarm, Oliver Upton, Marc Zyngier

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Hardware/Memory State Desynchronization in EL2 sysreg accessors
--

commit 69d4753f169931c6c1990fc9118b5a6cffe54554
Author: tabba@google.com <tabba@google.com>

KVM: arm64: Make vcpu_{read,write}_sys_reg available to HYP code

The vcpu_{read,write}_sys_reg() accessors are host-only, so helpers
built on them such as kvm_vcpu_set_be()/kvm_vcpu_is_be() cannot be
shared with hyp code. Add _vcpu_read_sys_reg()/_vcpu_write_sys_reg()
inlines in kvm_emulate.h that dispatch on is_nvhe_hyp_code() to the
host- or hyp-side accessor. A follow-up series uses this to share that
emulation code at EL2.

> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 5bf3d7e1d92c7..aed9fc0b717b1 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -506,6 +506,22 @@ static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>  	return __vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
>  }
>  
> +static inline u64 _vcpu_read_sys_reg(struct kvm_vcpu *vcpu, enum vcpu_sysreg reg)
> +{
> +	if (!is_nvhe_hyp_code())
> +		return vcpu_read_sys_reg(vcpu, reg);
> +
> +	return __vcpu_sys_reg(vcpu, reg);
> +}
> +
> +static inline void _vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, enum vcpu_sysreg reg)
> +{
> +	if (!is_nvhe_hyp_code())
> +		vcpu_write_sys_reg(vcpu, val, reg);
> +	else
> +		__vcpu_assign_sys_reg(vcpu, reg, val);
> +}

[Severity: High]
If these accessors are called from nVHE EL2 code during guest exit handling,
will they read and write from the in-memory sys_regs array instead of the
active hardware registers?

Since the guest's system registers remain live in the CPU hardware during the
run loop:

    do {
        __guest_enter(vcpu);
    } while (fixup_guest_exit(vcpu, &exit_code));

and aren't saved to memory until __sysreg_save_state_nvhe() runs after the
loop terminates, it seems _vcpu_read_sys_reg() might read stale values from
memory.

Likewise, would _vcpu_write_sys_reg() updates write to memory without updating
the active hardware registers before the guest is re-entered?

> +
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
>  {
>  	if (vcpu_mode_is_32bit(vcpu)) {
> @@ -516,9 +532,9 @@ static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
>  
>  		r = vcpu_has_nv(vcpu) ? SCTLR_EL2 : SCTLR_EL1;
>  
> -		sctlr = vcpu_read_sys_reg(vcpu, r);
> +		sctlr = _vcpu_read_sys_reg(vcpu, r);
>  		sctlr |= SCTLR_ELx_EE;
> -		vcpu_write_sys_reg(vcpu, sctlr, r);
> +		_vcpu_write_sys_reg(vcpu, sctlr, r);
>  	}
>  }

[Severity: High]
When kvm_vcpu_set_be() is used at EL2, could this result in the guest being
re-entered without the new endianness taking effect?

Whether updating the 32-bit pstate via *vcpu_cpsr() or the 64-bit SCTLR via
_vcpu_write_sys_reg(), the modifications appear to happen to the in-memory
state rather than the active hardware registers (like SPSR_EL2 or SCTLR_EL1).

If so, wouldn't the updates be lost and unobserved by the guest upon re-entry?

>  
> @@ -533,7 +549,7 @@ static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
>  	r = is_hyp_ctxt(vcpu) ? SCTLR_EL2 : SCTLR_EL1;
>  	bit = vcpu_mode_priv(vcpu) ? SCTLR_ELx_EE : SCTLR_EL1_E0E;
>  
> -	return vcpu_read_sys_reg(vcpu, r) & bit;
> +	return _vcpu_read_sys_reg(vcpu, r) & bit;
>  }

[Severity: High]
Could this result in kvm_vcpu_is_be() reading an incorrect endianness state
at EL2?

If the guest modifies SCTLR_EL1 and traps to EL2, calling this helper would
use _vcpu_read_sys_reg() which accesses the stale memory array instead of the
current hardware register state.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260612065925.755562-1-tabba@google.com?part=5


* Re: [PATCH v1 11/11] KVM: arm64: Implement lazy vCPU state sync for non-protected guests
  2026-06-12  6:59 ` [PATCH v1 11/11] KVM: arm64: Implement lazy vCPU state sync for non-protected guests tabba
@ 2026-06-12  7:19   ` sashiko-bot
  2026-06-12  9:51     ` Fuad Tabba
  0 siblings, 1 reply; 23+ messages in thread
From: sashiko-bot @ 2026-06-12  7:19 UTC (permalink / raw)
  To: tabba; +Cc: kvmarm, Marc Zyngier, Oliver Upton

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] Lazy vCPU state sync bypasses vcpu_mode_is_bad_32bit() check on non-trap VM exits due to stale CPSR.
- [High] Exceptions injected by the host and other context modifications during the KVM_RUN loop are silently dropped, causing infinite VM exit loops or ignored requests.

Pre-existing issues:
- [Critical] Protected VM register state is unconditionally copied between the hypervisor and untrusted host, violating pKVM confidentiality and integrity guarantees.
--

commit 96ecd36038c29097f333a40bf0eebd7b62bffc92
Author: tabba@google.com <tabba@google.com>
Subject: KVM: arm64: Implement lazy vCPU state sync for non-protected guests

This commit implements lazy synchronization of vCPU register state for
non-protected guests running under pKVM. It defers copying the context from
the host to the hypervisor until the host marks it dirty, and from the
hypervisor to the host until the host requires it. The PC is synced on every
exit to maintain accurate tracepoints.

> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 54aedf93c78b6..dccc3786548b0 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
[ ... ]
> @@ -489,6 +504,13 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
>  /* For exit types that need handling before we can be preempted */
>  void handle_exit_early(struct kvm_vcpu *vcpu, int exception_index)
>  {
> +	/*
> +	 * We just exited, so the state is clean from a hypervisor
> +	 * perspective.
> +	 */
> +	if (is_protected_kvm_enabled())
> +		vcpu_clear_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> +

[Severity: High]
When a guest exits (such as for ARM_EXCEPTION_IL or an IRQ), this clears the
PKVM_HOST_STATE_DIRTY flag.

If the host handles the exit or processes pending requests (like
KVM_REQ_VCPU_RESET or injecting undefined/SErrors), those operations modify
the host's copy of the guest register context (vcpu->arch.ctxt).

However, since these paths bypass handle_trap_exceptions(), the
PKVM_HOST_STATE_DIRTY flag is never set again.

On the next guest entry, flush_hyp_vcpu() at EL2 will see the dirty flag is
clear and skip calling __flush_hyp_vcpu().

Will this cause exceptions injected by the host and other context modifications
made during the KVM_RUN loop to be silently dropped? For ARM_EXCEPTION_IL,
does this mean the hypervisor will re-enter the guest using the old,
unmodified register state, causing an instant re-trap?

>  	if (ARM_SERROR_PENDING(exception_index)) {
>  		if (this_cpu_has_cap(ARM64_HAS_RAS_EXTN)) {
>  			u64 disr = kvm_vcpu_get_disr(vcpu);
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 23e644c24a03a..02383b372258f 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
[ ... ]
> @@ -168,7 +211,17 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
>  	fpsimd_sve_flush();
>  	flush_debug_state(hyp_vcpu);
>  
> -	hyp_vcpu->vcpu.arch.ctxt	= host_vcpu->arch.ctxt;
> +	/*
> +	 * If we deal with a non-protected guest and the state is potentially
> +	 * dirty (from a host perspective), copy the state back into the hyp
> +	 * vcpu.
> +	 */
> +	if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
> +		if (vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY))
> +			__flush_hyp_vcpu(hyp_vcpu);
> +	} else {
> +		hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;

[Severity: Critical]
This is a pre-existing issue, but I have a question about this copy operation.

This executes at EL2 and copies the entire vcpu.arch.ctxt from host_vcpu
(memory accessible to the untrusted host EL1) into the protected VM's
private state.

Could this violate guest integrity by allowing the host to hijack the VM's
control flow?

> +	}
>  
>  	hyp_vcpu->vcpu.arch.mdcr_el2	= host_vcpu->arch.mdcr_el2;
>  	hyp_vcpu->vcpu.arch.hcr_el2 &= ~(HCR_TWI | HCR_TWE);
[ ... ]
> @@ -191,9 +244,11 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
>  	fpsimd_sve_sync(&hyp_vcpu->vcpu);
>  	sync_debug_state(hyp_vcpu);
>  
> -	host_vcpu->arch.ctxt		= hyp_vcpu->vcpu.arch.ctxt;
> -
> -	host_vcpu->arch.hcr_el2		= hyp_vcpu->vcpu.arch.hcr_el2;
> +	if (pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> +		host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;

[Severity: Critical]
This is a pre-existing issue, but I have a similar question about copying
the protected guest's register state back to the host here.

Does copying hyp_vcpu->vcpu.arch.ctxt back into the untrusted
host_vcpu->arch.ctxt violate pKVM confidentiality guarantees by leaking
the state to the host?

> +	else
> +		/* Keep the PC current for the kvm_exit tracepoint (lazy ctxt sync). */
> +		host_vcpu->arch.ctxt.regs.pc = hyp_vcpu->vcpu.arch.ctxt.regs.pc;

[Severity: High]
Since handle_trap_exceptions() is only called for traps, an exit like
ARM_EXCEPTION_IRQ won't trigger the __pkvm_vcpu_sync_state hypercall.

Because this code only synchronizes the pc field for non-protected guests,
vcpu->arch.ctxt.regs.pstate (CPSR) remains stale in the host vCPU context.

Does this lazy synchronization leave the host with a stale CPSR on non-trap
exits?

If the guest transitioned to an illegal 32-bit mode immediately before the
IRQ was taken, would the KVM_RUN inner loop evaluate vcpu_mode_is_bad_32bit()
using this stale CPSR, bypassing KVM's architectural protection against illegal
AArch32 execution?
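
To make the concern concrete, here is a minimal userspace model of the
PC-only copy. All names (model_vcpu, model_lazy_sync, and so on) are
hypothetical stand-ins, not the kernel structures or the patch's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structures; not the real layouts. */
struct model_regs {
	uint64_t pc;
	uint64_t pstate;
};

struct model_vcpu {
	struct model_regs regs;
};

/* Mirrors the non-protected branch quoted above: only the PC is copied. */
static void model_lazy_sync(struct model_vcpu *host,
			    const struct model_vcpu *hyp)
{
	host->regs.pc = hyp->regs.pc;
}

/* Full register copy, as an explicit sync request would perform. */
static void model_full_sync(struct model_vcpu *host,
			    const struct model_vcpu *hyp)
{
	host->regs = hyp->regs;
}

/* True when the host's view of PSTATE differs from what EL2 holds. */
static bool model_host_pstate_is_stale(const struct model_vcpu *host,
				       const struct model_vcpu *hyp)
{
	return host->regs.pstate != hyp->regs.pstate;
}
```

In this model, any host-side check that reads regs.pstate after
model_lazy_sync() sees the value from the last full sync, not the value
the guest had at exit.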

>  
>  	host_vcpu->arch.fault		= hyp_vcpu->vcpu.arch.fault;
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260612065925.755562-1-tabba@google.com?part=11

* Re: [PATCH v1 10/11] KVM: arm64: Add primitives to flush/sync the VGIC state at EL2
  2026-06-12  6:59 ` [PATCH v1 10/11] KVM: arm64: Add primitives to flush/sync the VGIC state at EL2 tabba
@ 2026-06-12  7:23   ` sashiko-bot
  2026-06-12  8:14     ` Fuad Tabba
  0 siblings, 1 reply; 23+ messages in thread
From: sashiko-bot @ 2026-06-12  7:23 UTC (permalink / raw)
  To: tabba; +Cc: Oliver Upton, Marc Zyngier, kvmarm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Unconditional pinning of vgic_sre to SRE=1 breaks GICv2 emulation for non-protected guests.
--

commit 3d9e59afaa5e29a6ce57ece8ccacae434b38915d
Author: Marc Zyngier <maz@kernel.org>

KVM: arm64: Add primitives to flush/sync the VGIC state at EL2

This commit adds flush_hyp_vgic_state() and sync_hyp_vgic_state()
primitives to move per-vCPU VGIC state between the host and hypervisor
contexts for pKVM.

> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 2f165b6c7b071..23e644c24a03a 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
[ ... ]
> @@ -99,6 +99,46 @@ static void fpsimd_sve_sync(struct kvm_vcpu *vcpu)
[ ... ]
> +static void flush_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
> +{
> +	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> +	struct vgic_v3_cpu_if *host_cpu_if, *hyp_cpu_if;
> +	unsigned int used_lrs, max_lrs, i;
> +
> +	host_cpu_if	= &host_vcpu->arch.vgic_cpu.vgic_v3;
> +	hyp_cpu_if	= &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
> +
> +	max_lrs		= (read_gicreg(ICH_VTR_EL2) & ICH_VTR_EL2_ListRegs) + 1;
> +	used_lrs	= host_cpu_if->used_lrs;
> +	used_lrs	= min(used_lrs, max_lrs);
> +
> +	hyp_cpu_if->vgic_hcr	= host_cpu_if->vgic_hcr;
> +	/* Should be a one-off */
> +	hyp_cpu_if->vgic_sre	= (ICC_SRE_EL1_DIB |
> +				   ICC_SRE_EL1_DFB |
> +				   ICC_SRE_EL1_SRE);

[Severity: High]
Does hardcoding ICC_SRE_EL1_SRE here break GICv2 emulation for non-protected
guests?

When __vgic_v3_restore_state() executes on guest entry, it will program
ICC_SRE_EL1 with this value. Setting the SRE bit to 1 disables the GICv2
memory-mapped interface (GICV) for EL1. Non-protected guests configured to
use GICv2 rely on vgic_sre == 0 to keep the memory-mapped interface enabled.

Should this value be copied from the host-set values for non-protected VMs
rather than being unconditionally pinned?

> +	hyp_cpu_if->used_lrs	= used_lrs;
> +
> +	for (i = 0; i < used_lrs; i++)
> +		hyp_cpu_if->vgic_lr[i] = host_cpu_if->vgic_lr[i];
> +}
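
The list-register clamp in the quoted function can be sketched in
isolation as below. This is a userspace model under stated assumptions:
the mask constant is a local stand-in for ICH_VTR_EL2_ListRegs (the
ListRegs field occupies bits [4:0] and encodes the number of list
registers minus one), and the function names are hypothetical.

```c
#include <stdint.h>

/* Local stand-in for ICH_VTR_EL2_ListRegs: ListRegs is bits [4:0]. */
#define MODEL_ICH_VTR_LISTREGS_MASK	0x1fu

/* Number of implemented list registers: the field holds (count - 1). */
static unsigned int model_max_lrs(uint64_t ich_vtr)
{
	return (unsigned int)(ich_vtr & MODEL_ICH_VTR_LISTREGS_MASK) + 1;
}

/* Never trust the host-provided count beyond what the hardware has. */
static unsigned int model_clamp_used_lrs(unsigned int host_used_lrs,
					 uint64_t ich_vtr)
{
	unsigned int max_lrs = model_max_lrs(ich_vtr);

	return host_used_lrs < max_lrs ? host_used_lrs : max_lrs;
}
```

The clamp matters because used_lrs comes from host memory and later
bounds the copy loop over vgic_lr[].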

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260612065925.755562-1-tabba@google.com?part=10

* Re: [PATCH v1 09/11] KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch
  2026-06-12  6:59 ` [PATCH v1 09/11] KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch tabba
@ 2026-06-12  7:24   ` sashiko-bot
  2026-06-12  8:05     ` Fuad Tabba
  0 siblings, 1 reply; 23+ messages in thread
From: sashiko-bot @ 2026-06-12  7:24 UTC (permalink / raw)
  To: tabba; +Cc: kvmarm, Oliver Upton, Marc Zyngier

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [Critical] The GICv5 equivalents of the VGIC save/restore hypercalls (`handle___vgic_v5_save_apr` and `handle___vgic_v5_restore_vmcr_apr`) directly dereference host-provided pointers without validation, allowing a complete bypass of pKVM isolation.
- [Critical] The `handle___kvm_adjust_pc` hypercall directly dereferences a host-provided `vcpu` pointer without validation, granting the host an arbitrary memory corruption primitive against the hypervisor.
--

commit a5d2ecd901c7ff6f71119eb6ee605aaa1a3fee90
Author: Marc Zyngier <maz@kernel.org>

KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch

The host passes a vgic_v3_cpu_if pointer to the __vgic_v3_save_aprs and
__vgic_v3_restore_vmcr_aprs hypercalls, which EL2 dereferences
wholesale. That exposes the host's full VGIC emulation state to the
hypervisor, against pKVM's isolation goals.

Recover the host vCPU from the supplied cpu_if via container_of() and
copy only vgic_vmcr and the active priority registers between EL2's
hyp-side state and the host vCPU, so EL2 no longer dereferences the
host's vgic_v3_cpu_if directly.
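
For readers unfamiliar with the container_of() step, a minimal
userspace sketch follows. The macro is a simplified equivalent of the
kernel's (no type checking), and the struct and function names are
hypothetical. Note that the pointer arithmetic alone validates nothing;
how the patch validates the recovered vCPU is not shown in this excerpt.

```c
#include <stddef.h>

/* Simplified userspace equivalent of container_of(); no type checking. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced versions of the kernel structures. */
struct model_cpu_if {
	unsigned int vmcr;
};

struct model_vcpu {
	int vcpu_id;
	struct model_cpu_if vgic_v3;
};

/* Walk back from the embedded cpu_if to the vCPU that contains it. */
static struct model_vcpu *model_vcpu_from_cpu_if(struct model_cpu_if *cpu_if)
{
	return model_container_of(cpu_if, struct model_vcpu, vgic_v3);
}
```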

> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 420fb19a64760..2f165b6c7b071 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c

[ ... ]

> @@ -489,16 +501,63 @@ static void handle___vgic_v3_init_lrs(struct kvm_cpu_context *host_ctxt)
>  
>  static void handle___vgic_v3_save_aprs(struct kvm_cpu_context *host_ctxt)
>  {
> -	DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
> +	struct pkvm_hyp_vcpu *hyp_vcpu;
> +	struct kvm_vcpu *host_vcpu;
>  
> -	__vgic_v3_save_aprs(kern_hyp_va(cpu_if));
> +	host_vcpu = get_host_hyp_vcpus_from_vgic_v3_cpu_if(host_ctxt, 1,
> +							   &hyp_vcpu);
> +	if (!host_vcpu)
> +		return;

[ ... ]

> +	} else {
> +		__vgic_v3_save_aprs(&host_vcpu->arch.vgic_cpu.vgic_v3);
> +	}
>  }

[Severity: Critical]
This is a pre-existing issue, but while this patch addresses the VGIC v3 host
pointer dereference bypass by using get_host_hyp_vcpus_from_vgic_v3_cpu_if(),
does the same pKVM isolation bypass exist in the GICv5 equivalents?

Looking at handle___vgic_v5_save_apr() and handle___vgic_v5_restore_vmcr_apr()
in arch/arm64/kvm/hyp/nvhe/hyp-main.c:

handle___vgic_v5_save_apr(struct kvm_cpu_context *host_ctxt) {
    DECLARE_REG(struct vgic_v5_cpu_if *, cpu_if, host_ctxt, 1);
    __vgic_v5_save_apr(kern_hyp_va(cpu_if));
}

These handlers appear to take a cpu_if pointer directly from the untrusted
host context and apply kern_hyp_va() without validation. Could a malicious
host provide an address that translates to an arbitrary EL2-private memory
location, granting an arbitrary write or read primitive when
__vgic_v5_save_apr() or __vgic_v5_restore_vmcr_apr() is invoked?

[Severity: Critical]
This is also a pre-existing issue, but does a similar arbitrary memory
corruption primitive exist in handle___kvm_adjust_pc()?

Looking at handle___kvm_adjust_pc() in arch/arm64/kvm/hyp/nvhe/hyp-main.c:

static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt) {
    DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
    __kvm_adjust_pc(kern_hyp_va(vcpu));
}

And __kvm_adjust_pc() in arch/arm64/kvm/hyp/exception.c:

void __kvm_adjust_pc(struct kvm_vcpu *vcpu) {
    if (vcpu_get_flag(vcpu, PENDING_EXCEPTION)) {
        kvm_inject_exception(vcpu);
        vcpu_clear_flag(vcpu, PENDING_EXCEPTION);
        vcpu_clear_flag(vcpu, EXCEPT_MASK);
    }
    ...
}

It seems the host-provided vcpu pointer is converted via kern_hyp_va()
without the __get_host_hyp_vcpus() validation barrier. Could a malicious
host inject a pointer that translates into EL2-private memory (like
hypervisor page tables) and set the PENDING_EXCEPTION bit, forcing
__kvm_adjust_pc() to write exception vector addresses and clear flags at
specific offsets within the target structure?
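
As an illustration of the kind of validation barrier meant here, the
sketch below accepts an untrusted pointer only if it matches an entry
the hypervisor already tracks. This is a hypothetical model; it is not
how __get_host_hyp_vcpus() is implemented, which this excerpt does not
show.

```c
#include <stddef.h>

struct model_vcpu {
	int id;
};

/* Hypothetical registry of vCPUs the hypervisor already knows about. */
struct model_registry {
	struct model_vcpu *known[4];
	size_t nr;
};

/* Return the pointer only if it matches a registered vCPU, else NULL. */
static struct model_vcpu *model_validate_vcpu(const struct model_registry *reg,
					      struct model_vcpu *untrusted)
{
	for (size_t i = 0; i < reg->nr; i++) {
		if (reg->known[i] == untrusted)
			return untrusted;
	}

	return NULL;
}
```

A handler built this way would bail out on NULL before dereferencing
anything, instead of translating the raw pointer and using it directly.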

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260612065925.755562-1-tabba@google.com?part=9

* Re: [PATCH v1 05/11] KVM: arm64: Make vcpu_{read,write}_sys_reg available to HYP code
  2026-06-12  7:17   ` sashiko-bot
@ 2026-06-12  7:53     ` Fuad Tabba
  0 siblings, 0 replies; 23+ messages in thread
From: Fuad Tabba @ 2026-06-12  7:53 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: kvmarm, Oliver Upton, Marc Zyngier

Hello again,

On Fri, 12 Jun 2026 at 08:17, <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] Hardware/Memory State Desynchronization in EL2 sysreg accessors
> --
>
> commit 69d4753f169931c6c1990fc9118b5a6cffe54554
> Author: tabba@google.com <tabba@google.com>
>
> KVM: arm64: Make vcpu_{read,write}_sys_reg available to HYP code
>
> The vcpu_{read,write}_sys_reg() accessors are host-only, so helpers
> built on them such as kvm_vcpu_set_be()/kvm_vcpu_is_be() cannot be
> shared with hyp code. Add _vcpu_read_sys_reg()/_vcpu_write_sys_reg()
> inlines in kvm_emulate.h that dispatch on is_nvhe_hyp_code() to the
> host- or hyp-side accessor. A follow-up series uses this to share that
> emulation code at EL2.
>
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > index 5bf3d7e1d92c7..aed9fc0b717b1 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -506,6 +506,22 @@ static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
> >       return __vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
> >  }
> >
> > +static inline u64 _vcpu_read_sys_reg(struct kvm_vcpu *vcpu, enum vcpu_sysreg reg)
> > +{
> > +     if (!is_nvhe_hyp_code())
> > +             return vcpu_read_sys_reg(vcpu, reg);
> > +
> > +     return __vcpu_sys_reg(vcpu, reg);
> > +}
> > +
> > +static inline void _vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, enum vcpu_sysreg reg)
> > +{
> > +     if (!is_nvhe_hyp_code())
> > +             vcpu_write_sys_reg(vcpu, val, reg);
> > +     else
> > +             __vcpu_assign_sys_reg(vcpu, reg, val);
> > +}
>
> [Severity: High]
> If these accessors are called from nVHE EL2 code during guest exit handling,
> will they read and write from the in-memory sys_regs array instead of the
> active hardware registers?
>
> Since the guest's system registers remain live in the CPU hardware during the
> run loop:
>
>     do {
>         __guest_enter(vcpu);
>     } while (fixup_guest_exit(vcpu, &exit_code));
>
> and aren't saved to memory until __sysreg_save_state_nvhe() runs after the
> loop terminates, it seems _vcpu_read_sys_reg() might read stale values from
> memory.
>
> Likewise, would _vcpu_write_sys_reg() updates write to memory without updating
> the active hardware registers before the guest is re-entered?
>
> > +
> >  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> >  {
> >       if (vcpu_mode_is_32bit(vcpu)) {
> > @@ -516,9 +532,9 @@ static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> >
> >               r = vcpu_has_nv(vcpu) ? SCTLR_EL2 : SCTLR_EL1;
> >
> > -             sctlr = vcpu_read_sys_reg(vcpu, r);
> > +             sctlr = _vcpu_read_sys_reg(vcpu, r);
> >               sctlr |= SCTLR_ELx_EE;
> > -             vcpu_write_sys_reg(vcpu, sctlr, r);
> > +             _vcpu_write_sys_reg(vcpu, sctlr, r);
> >       }
> >  }
>
> [Severity: High]
> When kvm_vcpu_set_be() is used at EL2, could this result in the guest being
> re-entered without the new endianness taking effect?
>
> Whether updating the 32-bit pstate via *vcpu_cpsr() or the 64-bit SCTLR via
> _vcpu_write_sys_reg(), the modifications appear to happen to the in-memory
> state rather than the active hardware registers (like SPSR_EL2 or SCTLR_EL1).
>
> If so, wouldn't the updates be lost and unobserved by the guest upon re-entry?
>
> >
> > @@ -533,7 +549,7 @@ static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
> >       r = is_hyp_ctxt(vcpu) ? SCTLR_EL2 : SCTLR_EL1;
> >       bit = vcpu_mode_priv(vcpu) ? SCTLR_ELx_EE : SCTLR_EL1_E0E;
> >
> > -     return vcpu_read_sys_reg(vcpu, r) & bit;
> > +     return _vcpu_read_sys_reg(vcpu, r) & bit;
> >  }
>
> [Severity: High]
> Could this result in kvm_vcpu_is_be() reading an incorrect endianness state
> at EL2?
>
> If the guest modifies SCTLR_EL1 and traps to EL2, calling this helper would
> use _vcpu_read_sys_reg() which accesses the stale memory array instead of the
> current hardware register state.

This patch is no-functional-change: _vcpu_{read,write}_sys_reg() only diverge
from vcpu_{read,write}_sys_reg() when is_nvhe_hyp_code() is true, and at this
commit neither kvm_vcpu_set_be() nor kvm_vcpu_is_be() has an nVHE-EL2 caller.
The nVHE branch uses __vcpu_sys_reg() because vcpu_read_sys_reg() is host-only;
under nVHE that host accessor reads from memory anyway (locate_register()
returns SR_LOC_MEMORY when SYSREGS_ON_CPU is clear, enforced by the
WARN_ON_ONCE(!has_vhe() && loc != SR_LOC_MEMORY); the HW path uses VHE-only
SYS_*_EL12). So the EL2 branch reads exactly what the host accessor would, via
the same accessor all existing EL2 sysreg emulation already uses (exception.c
does the same dispatch keyed on has_vhe()).
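
A toy model of that dispatch, for illustration only. The at_hyp field
is a runtime stand-in for is_nvhe_hyp_code() (which in the kernel is
resolved at build time), the host accessor is reduced to a plain memory
read to reflect the nVHE case described above, and all names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum model_sysreg { MODEL_SCTLR_EL1, MODEL_NR_SYS_REGS };

struct model_vcpu {
	uint64_t sys_regs[MODEL_NR_SYS_REGS];	/* in-memory register copy */
	bool at_hyp;			/* stand-in for is_nvhe_hyp_code() */
	unsigned int host_accessor_calls;	/* instrumentation only */
};

/* Host-side accessor; under nVHE it ends up reading memory as well. */
static uint64_t model_host_read(struct model_vcpu *vcpu, enum model_sysreg reg)
{
	vcpu->host_accessor_calls++;
	return vcpu->sys_regs[reg];
}

/* Dispatch on the execution context, as the new inline helper does. */
static uint64_t model_read_sys_reg(struct model_vcpu *vcpu,
				   enum model_sysreg reg)
{
	if (!vcpu->at_hyp)
		return model_host_read(vcpu, reg);

	return vcpu->sys_regs[reg];
}
```

Both branches return the same in-memory value in this model; only the
path taken differs.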

Cheers,
/fuad
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260612065925.755562-1-tabba@google.com?part=5

* Re: [PATCH v1 09/11] KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch
  2026-06-12  7:24   ` sashiko-bot
@ 2026-06-12  8:05     ` Fuad Tabba
  2026-06-12  8:09       ` Fuad Tabba
  0 siblings, 1 reply; 23+ messages in thread
From: Fuad Tabba @ 2026-06-12  8:05 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: kvmarm, Oliver Upton, Marc Zyngier

On Fri, 12 Jun 2026 at 08:24, <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
>
> Pre-existing issues:
> - [Critical] The GICv5 equivalents of the VGIC save/restore hypercalls (`handle___vgic_v5_save_apr` and `handle___vgic_v5_restore_vmcr_apr`) directly dereference host-provided pointers without validation, allowing a complete bypass of pKVM isolation.
> - [Critical] The `handle___kvm_adjust_pc` hypercall directly dereferences a host-provided `vcpu` pointer without validation, granting the host an arbitrary memory corruption primitive against the hypervisor.
> --
>
> commit a5d2ecd901c7ff6f71119eb6ee605aaa1a3fee90
> Author: Marc Zyngier <maz@kernel.org>
>
> KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch
>
> The host passes a vgic_v3_cpu_if pointer to the __vgic_v3_save_aprs and
> __vgic_v3_restore_vmcr_aprs hypercalls, which EL2 dereferences
> wholesale. That exposes the host's full VGIC emulation state to the
> hypervisor, against pKVM's isolation goals.
>
> Recover the host vCPU from the supplied cpu_if via container_of() and
> copy only vgic_vmcr and the active priority registers between EL2's
> hyp-side state and the host vCPU, so EL2 no longer dereferences the
> host's vgic_v3_cpu_if directly.
>
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index 420fb19a64760..2f165b6c7b071 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
>
> [ ... ]
>
> > @@ -489,16 +501,63 @@ static void handle___vgic_v3_init_lrs(struct kvm_cpu_context *host_ctxt)
> >
> >  static void handle___vgic_v3_save_aprs(struct kvm_cpu_context *host_ctxt)
> >  {
> > -     DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
> > +     struct pkvm_hyp_vcpu *hyp_vcpu;
> > +     struct kvm_vcpu *host_vcpu;
> >
> > -     __vgic_v3_save_aprs(kern_hyp_va(cpu_if));
> > +     host_vcpu = get_host_hyp_vcpus_from_vgic_v3_cpu_if(host_ctxt, 1,
> > +                                                        &hyp_vcpu);
> > +     if (!host_vcpu)
> > +             return;
>
> [ ... ]
>
> > +     } else {
> > +             __vgic_v3_save_aprs(&host_vcpu->arch.vgic_cpu.vgic_v3);
> > +     }
> >  }
>
> [Severity: Critical]
> This is a pre-existing issue, but while this patch addresses the VGIC v3 host
> pointer dereference bypass by using get_host_hyp_vcpus_from_vgic_v3_cpu_if(),
> does the same pKVM isolation bypass exist in the GICv5 equivalents?

GICv5 is not supported in protected mode. vgic-v5.c:62-64 skips
GICv5 registration entirely when is_protected_kvm_enabled(), so no
GICv5-based guest can be created under pKVM. The host never reaches
these hypercalls through any legitimate code path.

>
> Looking at handle___vgic_v5_save_apr() and handle___vgic_v5_restore_vmcr_apr()
> in arch/arm64/kvm/hyp/nvhe/hyp-main.c:
>
> handle___vgic_v5_save_apr(struct kvm_cpu_context *host_ctxt) {
>     DECLARE_REG(struct vgic_v5_cpu_if *, cpu_if, host_ctxt, 1);
>     __vgic_v5_save_apr(kern_hyp_va(cpu_if));
> }
>
> These handlers appear to take a cpu_if pointer directly from the untrusted
> host context and apply kern_hyp_va() without validation. Could a malicious
> host provide an address that translates to an arbitrary EL2-private memory
> location, granting an arbitrary write or read primitive when
> __vgic_v5_save_apr() or __vgic_v5_restore_vmcr_apr() is invoked?
>
> [Severity: Critical]
> This is also a pre-existing issue, but does a similar arbitrary memory
> corruption primitive exist in handle___kvm_adjust_pc()?

Yes, pre-existing. It is addressed later in this series by
"Prevent host PC adjustments for protected vCPUs" (patch 10/11).

Cheers,
/fuad

>
> Looking at handle___kvm_adjust_pc() in arch/arm64/kvm/hyp/nvhe/hyp-main.c:
>
> static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt) {
>     DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
>     __kvm_adjust_pc(kern_hyp_va(vcpu));
> }
>
> And __kvm_adjust_pc() in arch/arm64/kvm/hyp/exception.c:
>
> void __kvm_adjust_pc(struct kvm_vcpu *vcpu) {
>     if (vcpu_get_flag(vcpu, PENDING_EXCEPTION)) {
>         kvm_inject_exception(vcpu);
>         vcpu_clear_flag(vcpu, PENDING_EXCEPTION);
>         vcpu_clear_flag(vcpu, EXCEPT_MASK);
>     }
>     ...
> }
>
> It seems the host-provided vcpu pointer is converted via kern_hyp_va()
> without the __get_host_hyp_vcpus() validation barrier. Could a malicious
> host inject a pointer that translates into EL2-private memory (like
> hypervisor page tables) and set the PENDING_EXCEPTION bit, forcing
> __kvm_adjust_pc() to write exception vector addresses and clear flags at
> specific offsets within the target structure?
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260612065925.755562-1-tabba@google.com?part=9

* Re: [PATCH v1 09/11] KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch
  2026-06-12  8:05     ` Fuad Tabba
@ 2026-06-12  8:09       ` Fuad Tabba
  0 siblings, 0 replies; 23+ messages in thread
From: Fuad Tabba @ 2026-06-12  8:09 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: kvmarm, Oliver Upton, Marc Zyngier

On Fri, 12 Jun 2026 at 09:05, Fuad Tabba <tabba@google.com> wrote:
>
> On Fri, 12 Jun 2026 at 08:24, <sashiko-bot@kernel.org> wrote:
> >
> > Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> >
> > Pre-existing issues:
> > - [Critical] The GICv5 equivalents of the VGIC save/restore hypercalls (`handle___vgic_v5_save_apr` and `handle___vgic_v5_restore_vmcr_apr`) directly dereference host-provided pointers without validation, allowing a complete bypass of pKVM isolation.
> > - [Critical] The `handle___kvm_adjust_pc` hypercall directly dereferences a host-provided `vcpu` pointer without validation, granting the host an arbitrary memory corruption primitive against the hypervisor.
> > --
> >
> > commit a5d2ecd901c7ff6f71119eb6ee605aaa1a3fee90
> > Author: Marc Zyngier <maz@kernel.org>
> >
> > KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch
> >
> > The host passes a vgic_v3_cpu_if pointer to the __vgic_v3_save_aprs and
> > __vgic_v3_restore_vmcr_aprs hypercalls, which EL2 dereferences
> > wholesale. That exposes the host's full VGIC emulation state to the
> > hypervisor, against pKVM's isolation goals.
> >
> > Recover the host vCPU from the supplied cpu_if via container_of() and
> > copy only vgic_vmcr and the active priority registers between EL2's
> > hyp-side state and the host vCPU, so EL2 no longer dereferences the
> > host's vgic_v3_cpu_if directly.
> >
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > index 420fb19a64760..2f165b6c7b071 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> >
> > [ ... ]
> >
> > > @@ -489,16 +501,63 @@ static void handle___vgic_v3_init_lrs(struct kvm_cpu_context *host_ctxt)
> > >
> > >  static void handle___vgic_v3_save_aprs(struct kvm_cpu_context *host_ctxt)
> > >  {
> > > -     DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
> > > +     struct pkvm_hyp_vcpu *hyp_vcpu;
> > > +     struct kvm_vcpu *host_vcpu;
> > >
> > > -     __vgic_v3_save_aprs(kern_hyp_va(cpu_if));
> > > +     host_vcpu = get_host_hyp_vcpus_from_vgic_v3_cpu_if(host_ctxt, 1,
> > > +                                                        &hyp_vcpu);
> > > +     if (!host_vcpu)
> > > +             return;
> >
> > [ ... ]
> >
> > > +     } else {
> > > +             __vgic_v3_save_aprs(&host_vcpu->arch.vgic_cpu.vgic_v3);
> > > +     }
> > >  }
> >
> > [Severity: Critical]
> > This is a pre-existing issue, but while this patch addresses the VGIC v3 host
> > pointer dereference bypass by using get_host_hyp_vcpus_from_vgic_v3_cpu_if(),
> > does the same pKVM isolation bypass exist in the GICv5 equivalents?
>
> GICv5 is not supported in protected mode. vgic-v5.c:62-64 skips
> GICv5 registration entirely when is_protected_kvm_enabled(), so no
> GICv5-based guest can be created under pKVM. The host never reaches
> these hypercalls through any legitimate code path.
>
> >
> > Looking at handle___vgic_v5_save_apr() and handle___vgic_v5_restore_vmcr_apr()
> > in arch/arm64/kvm/hyp/nvhe/hyp-main.c:
> >
> > handle___vgic_v5_save_apr(struct kvm_cpu_context *host_ctxt) {
> >     DECLARE_REG(struct vgic_v5_cpu_if *, cpu_if, host_ctxt, 1);
> >     __vgic_v5_save_apr(kern_hyp_va(cpu_if));
> > }
> >
> > These handlers appear to take a cpu_if pointer directly from the untrusted
> > host context and apply kern_hyp_va() without validation. Could a malicious
> > host provide an address that translates to an arbitrary EL2-private memory
> > location, granting an arbitrary write or read primitive when
> > __vgic_v5_save_apr() or __vgic_v5_restore_vmcr_apr() is invoked?
> >
> > [Severity: Critical]
> > This is also a pre-existing issue, but does a similar arbitrary memory
> > corruption primitive exist in handle___kvm_adjust_pc()?
>
> Yes, pre-existing. It is addressed later in this series by
> "Prevent host PC adjustments for protected vCPUs" (patch 10/11).

Sorry, it is pre-existing, and the fix is in the follow-up series, not this one.

> Cheers,
> /fuad
>
> >
> > Looking at handle___kvm_adjust_pc() in arch/arm64/kvm/hyp/nvhe/hyp-main.c:
> >
> > static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt) {
> >     DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
> >     __kvm_adjust_pc(kern_hyp_va(vcpu));
> > }
> >
> > And __kvm_adjust_pc() in arch/arm64/kvm/hyp/exception.c:
> >
> > void __kvm_adjust_pc(struct kvm_vcpu *vcpu) {
> >     if (vcpu_get_flag(vcpu, PENDING_EXCEPTION)) {
> >         kvm_inject_exception(vcpu);
> >         vcpu_clear_flag(vcpu, PENDING_EXCEPTION);
> >         vcpu_clear_flag(vcpu, EXCEPT_MASK);
> >     }
> >     ...
> > }
> >
> > It seems the host-provided vcpu pointer is converted via kern_hyp_va()
> > without the __get_host_hyp_vcpus() validation barrier. Could a malicious
> > host inject a pointer that translates into EL2-private memory (like
> > hypervisor page tables) and set the PENDING_EXCEPTION bit, forcing
> > __kvm_adjust_pc() to write exception vector addresses and clear flags at
> > specific offsets within the target structure?
> >
> > --
> > Sashiko AI review · https://sashiko.dev/#/patchset/20260612065925.755562-1-tabba@google.com?part=9

* Re: [PATCH v1 10/11] KVM: arm64: Add primitives to flush/sync the VGIC state at EL2
  2026-06-12  7:23   ` sashiko-bot
@ 2026-06-12  8:14     ` Fuad Tabba
  0 siblings, 0 replies; 23+ messages in thread
From: Fuad Tabba @ 2026-06-12  8:14 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: Oliver Upton, Marc Zyngier, kvmarm

On Fri, 12 Jun 2026 at 08:23, <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] Unconditional pinning of vgic_sre to SRE=1 breaks GICv2 emulation for non-protected guests.
> --
>
> commit 3d9e59afaa5e29a6ce57ece8ccacae434b38915d
> Author: Marc Zyngier <maz@kernel.org>
>
> KVM: arm64: Add primitives to flush/sync the VGIC state at EL2
>
> This commit adds flush_hyp_vgic_state() and sync_hyp_vgic_state()
> primitives to move per-vCPU VGIC state between the host and hypervisor
> contexts for pKVM.
>
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index 2f165b6c7b071..23e644c24a03a 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> [ ... ]
> > @@ -99,6 +99,46 @@ static void fpsimd_sve_sync(struct kvm_vcpu *vcpu)
> [ ... ]
> > +static void flush_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
> > +{
> > +     struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> > +     struct vgic_v3_cpu_if *host_cpu_if, *hyp_cpu_if;
> > +     unsigned int used_lrs, max_lrs, i;
> > +
> > +     host_cpu_if     = &host_vcpu->arch.vgic_cpu.vgic_v3;
> > +     hyp_cpu_if      = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
> > +
> > +     max_lrs         = (read_gicreg(ICH_VTR_EL2) & ICH_VTR_EL2_ListRegs) + 1;
> > +     used_lrs        = host_cpu_if->used_lrs;
> > +     used_lrs        = min(used_lrs, max_lrs);
> > +
> > +     hyp_cpu_if->vgic_hcr    = host_cpu_if->vgic_hcr;
> > +     /* Should be a one-off */
> > +     hyp_cpu_if->vgic_sre    = (ICC_SRE_EL1_DIB |
> > +                                ICC_SRE_EL1_DFB |
> > +                                ICC_SRE_EL1_SRE);
>
> [Severity: High]
> Does hardcoding ICC_SRE_EL1_SRE here break GICv2 emulation for non-protected
> guests?

No. GICv2 is not supported under pKVM. vgic_v2_probe() (vgic-v2.c)
returns -ENXIO when is_protected_kvm_enabled(), so no GICv2 guest can
be created. Every guest reaching flush_hyp_vgic_state is GICv3.

> When __vgic_v3_restore_state() executes on guest entry, it will program
> ICC_SRE_EL1 with this value. Setting the SRE bit to 1 disables the GICv2
> memory-mapped interface (GICV) for EL1. Non-protected guests configured to
> use GICv2 rely on vgic_sre == 0 to keep the memory-mapped interface enabled.
>
> Should this value be copied from the host-set values for non-protected VMs
> rather than being unconditionally pinned?

The host sets the same value for GICv3 guests (vgic-v3.c),
so copying and hardcoding produce the same result. GICv2 is blocked
under pKVM (vgic_v2_probe returns -ENXIO), so the vgic_sre=0 branch
is unreachable. Hardcoding is intentional for the protected case
(EL2 must not trust the host value) and harmless for non-protected
(the values are identical).
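
Spelled out as a small sketch: the three bits are SRE (bit 0), DFB
(bit 1) and DIB (bit 2) of ICC_SRE_EL1, so the pinned value is 0x7. The
MODEL_ macros and the function below are local, hypothetical stand-ins
rather than the kernel definitions.

```c
#include <stdint.h>

/* Local stand-ins for the ICC_SRE_EL1 bit definitions. */
#define MODEL_ICC_SRE_EL1_SRE	(1u << 0)	/* system register interface */
#define MODEL_ICC_SRE_EL1_DFB	(1u << 1)	/* disable FIQ bypass */
#define MODEL_ICC_SRE_EL1_DIB	(1u << 2)	/* disable IRQ bypass */

/*
 * Model of the flush: the host-provided value is deliberately ignored
 * and the hyp copy is pinned to a fixed value.
 */
static uint32_t model_hyp_vgic_sre(uint32_t host_vgic_sre)
{
	(void)host_vgic_sre;

	return MODEL_ICC_SRE_EL1_DIB |
	       MODEL_ICC_SRE_EL1_DFB |
	       MODEL_ICC_SRE_EL1_SRE;
}
```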

Cheers,
/fuad

>
> > +     hyp_cpu_if->used_lrs    = used_lrs;
> > +
> > +     for (i = 0; i < used_lrs; i++)
> > +             hyp_cpu_if->vgic_lr[i] = host_cpu_if->vgic_lr[i];
> > +}
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260612065925.755562-1-tabba@google.com?part=10

* Re: [PATCH v1 11/11] KVM: arm64: Implement lazy vCPU state sync for non-protected guests
  2026-06-12  7:19   ` sashiko-bot
@ 2026-06-12  9:51     ` Fuad Tabba
  0 siblings, 0 replies; 23+ messages in thread
From: Fuad Tabba @ 2026-06-12  9:51 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: kvmarm, Marc Zyngier, Oliver Upton, Will Deacon

You finally got me Sashiko... you finally got me (and on the last patch)...

Will fix in the respin. Details below...

On Fri, 12 Jun 2026 at 08:19, <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
>
> New issues:
> - [High] Lazy vCPU state sync bypasses vcpu_mode_is_bad_32bit() check on non-trap VM exits due to stale CPSR.
> - [High] Exceptions injected by the host and other context modifications during the KVM_RUN loop are silently dropped, causing infinite VM exit loops or ignored requests.
>
> Pre-existing issues:
> - [Critical] Protected VM register state is unconditionally copied between the hypervisor and untrusted host, violating pKVM confidentiality and integrity guarantees.
> --
>
> commit 96ecd36038c29097f333a40bf0eebd7b62bffc92
> Author: tabba@google.com <tabba@google.com>
> Subject: KVM: arm64: Implement lazy vCPU state sync for non-protected guests
>
> This commit implements lazy synchronization of vCPU register state for
> non-protected guests running under pKVM. It defers copying the context from
> the host to the hypervisor until the host marks it dirty, and from the
> hypervisor to the host until the host requires it. The PC is synced on every
> exit to maintain accurate tracepoints.
>
> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > index 54aedf93c78b6..dccc3786548b0 100644
> > --- a/arch/arm64/kvm/handle_exit.c
> > +++ b/arch/arm64/kvm/handle_exit.c
> [ ... ]
> > @@ -489,6 +504,13 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
> >  /* For exit types that need handling before we can be preempted */
> >  void handle_exit_early(struct kvm_vcpu *vcpu, int exception_index)
> >  {
> > +     /*
> > +      * We just exited, so the state is clean from a hypervisor
> > +      * perspective.
> > +      */
> > +     if (is_protected_kvm_enabled())
> > +             vcpu_clear_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> > +
>
> [Severity: High]
> When a guest exits (such as for ARM_EXCEPTION_IL or an IRQ), this clears the
> PKVM_HOST_STATE_DIRTY flag.
>
> If the host handles the exit or processes pending requests (like
> KVM_REQ_VCPU_RESET or injecting undefined/SErrors), those operations modify
> the host's copy of the guest register context (vcpu->arch.ctxt).
>
> However, since these paths bypass handle_trap_exceptions(), the
> PKVM_HOST_STATE_DIRTY flag is never set again.
>
> On the next guest entry, flush_hyp_vcpu() at EL2 will see the dirty flag is
> clear and skip calling __flush_hyp_vcpu().
>
> Will this cause exceptions injected by the host and other context modifications
> made during the KVM_RUN loop to be silently dropped? For ARM_EXCEPTION_IL,
> does this mean the hypervisor will re-enter the guest using the old,
> unmodified register state, causing an instant re-trap?

No loop: PENDING_EXCEPTION rides in iflags (flushed unconditionally), so EL2
re-derives and delivers the exception on re-entry. It did surface one real gap:
enter_exception64() doesn't write ESR, so an unmasked SError's ESR_EL1 syndrome
was dropped under lazy sync. Fixed in the respin (sync + dirty before host-side
SError injection).
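
A reduced model of the gap, for illustration. The struct layout, the
flag value and the function name are hypothetical; the only behaviour
taken from the discussion is that iflags is flushed unconditionally
while the register context is flushed only when the host marked it
dirty.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PENDING_EXCEPTION	(1u << 0)	/* hypothetical flag bit */

struct model_ctxt {
	uint64_t pc;
	uint64_t esr_el1;
};

struct model_vcpu {
	struct model_ctxt ctxt;
	uint8_t iflags;
	bool host_state_dirty;
};

/* Lazy flush: ctxt only when dirty, iflags on every entry. */
static void model_flush(struct model_vcpu *hyp, const struct model_vcpu *host)
{
	if (host->host_state_dirty)
		hyp->ctxt = host->ctxt;

	hyp->iflags = host->iflags;
}
```

If the host writes esr_el1 and sets the pending flag without marking
the state dirty, the hyp copy sees the flag but not the syndrome, which
is the dropped-ESR case described above.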

>
> >       if (ARM_SERROR_PENDING(exception_index)) {
> >               if (this_cpu_has_cap(ARM64_HAS_RAS_EXTN)) {
> >                       u64 disr = kvm_vcpu_get_disr(vcpu);
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index 23e644c24a03a..02383b372258f 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> [ ... ]
> > @@ -168,7 +211,17 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> >       fpsimd_sve_flush();
> >       flush_debug_state(hyp_vcpu);
> >
> > -     hyp_vcpu->vcpu.arch.ctxt        = host_vcpu->arch.ctxt;
> > +     /*
> > +      * If we deal with a non-protected guest and the state is potentially
> > +      * dirty (from a host perspective), copy the state back into the hyp
> > +      * vcpu.
> > +      */
> > +     if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
> > +             if (vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY))
> > +                     __flush_hyp_vcpu(hyp_vcpu);
> > +     } else {
> > +             hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
>
> [Severity: Critical]
> This is a pre-existing issue, but I have a question about this copy operation.
>
> This executes at EL2 and copies the entire vcpu.arch.ctxt from host_vcpu
> (memory accessible to the untrusted host EL1) into the protected VM's
> private state.
>
> Could this violate guest integrity by allowing the host to hijack the VM's
> control flow?

Existing upstream behaviour, unchanged here. The follow-up series replaces
this copy with per-field marshalling, which is what establishes protected
vCPU-state isolation.
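
For readers unfamiliar with the distinction, a rough sketch of whole-struct
copy versus per-field marshalling follows. This is purely illustrative and
is not the follow-up series' code: the toy_* names, the struct layout and
the idea of selecting GPRs with a bitmask are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_GPRS 31

/* Toy register context; not the kernel's struct kvm_cpu_context. */
struct toy_ctxt {
	uint64_t regs[TOY_NR_GPRS];
	uint64_t pc;
	uint64_t pstate;
};

/* Whole-struct copy: the host's view overwrites everything, PC included. */
static void toy_flush_whole(struct toy_ctxt *hyp, const struct toy_ctxt *host)
{
	assert(hyp && host);
	*hyp = *host;
}

/*
 * Per-field marshalling: only the GPRs named in gpr_mask are taken from
 * the host. PC and PSTATE are never copied, so they stay under the
 * hypervisor's control.
 */
static void toy_flush_fields(struct toy_ctxt *hyp, const struct toy_ctxt *host,
			     uint32_t gpr_mask)
{
	assert(hyp && host);
	for (unsigned int i = 0; i < TOY_NR_GPRS; i++) {
		if (gpr_mask & (1u << i))
			hyp->regs[i] = host->regs[i];
	}
}
```

With toy_flush_fields(), a host that rewrites pc in its own copy has no
effect on the hyp copy; with toy_flush_whole() it does.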

>
> > +     }
> >
> >       hyp_vcpu->vcpu.arch.mdcr_el2    = host_vcpu->arch.mdcr_el2;
> >       hyp_vcpu->vcpu.arch.hcr_el2 &= ~(HCR_TWI | HCR_TWE);
> [ ... ]
> > @@ -191,9 +244,11 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> >       fpsimd_sve_sync(&hyp_vcpu->vcpu);
> >       sync_debug_state(hyp_vcpu);
> >
> > -     host_vcpu->arch.ctxt            = hyp_vcpu->vcpu.arch.ctxt;
> > -
> > -     host_vcpu->arch.hcr_el2         = hyp_vcpu->vcpu.arch.hcr_el2;
> > +     if (pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> > +             host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
>
> [Severity: Critical]
> This is a pre-existing issue, but I have a similar question about copying
> the protected guest's register state back to the host here.
>
> Does copying hyp_vcpu->vcpu.arch.ctxt back into the untrusted
> host_vcpu->arch.ctxt violate pKVM confidentiality guarantees by leaking
> the state to the host?

Same as the flush side: existing behaviour, unchanged here. Isolation is
established by the follow-up series' marshalling.

>
> > +     else
> > +             /* Keep the PC current for the kvm_exit tracepoint (lazy ctxt sync). */
> > +             host_vcpu->arch.ctxt.regs.pc = hyp_vcpu->vcpu.arch.ctxt.regs.pc;
>
> [Severity: High]
> Since handle_trap_exceptions() is only called for traps, an exit like
> ARM_EXCEPTION_IRQ won't trigger the __pkvm_vcpu_sync_state hypercall.
>
> Because this code only synchronizes the pc field for non-protected guests,
> vcpu->arch.ctxt.regs.pstate (CPSR) remains stale in the host vCPU context.
>
> Does this lazy synchronization leave the host with a stale CPSR on non-trap
> exits?
>
> If the guest transitioned to an illegal 32-bit mode immediately before the
> IRQ was taken, would the KVM_RUN inner loop evaluate vcpu_mode_is_bad_32bit()
> using this stale CPSR, bypassing KVM's architectural protection against illegal
> AArch32 execution?

Yes: stale PSTATE on non-trap exits, non-protected only. Fixed in the
respin (PSTATE synced alongside PC every exit).
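
A small userspace model of the difference, for illustration only. The
toy_* names are made up, and toy_mode_is_32bit() is a much simplified
stand-in for a check like vcpu_mode_is_bad_32bit(): it only tests the
execution-state bit (bit 4 of the saved PSTATE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 4 of the saved PSTATE is set when the guest is in AArch32. */
#define TOY_PSR_MODE32_BIT 0x10u

/* Toy subset of the register context; not the kernel's struct. */
struct toy_regs {
	uint64_t pc;
	uint64_t pstate;
};

/* v1 behaviour on non-trap exits: only PC reaches the host copy. */
static void toy_sync_pc_only(struct toy_regs *host, const struct toy_regs *hyp)
{
	assert(host && hyp);
	host->pc = hyp->pc;
}

/* Respin behaviour as described above: PC and PSTATE on every exit. */
static void toy_sync_pc_pstate(struct toy_regs *host,
			       const struct toy_regs *hyp)
{
	assert(host && hyp);
	host->pc = hyp->pc;
	host->pstate = hyp->pstate;
}

/* Simplified host-side check that reads the host's copy of PSTATE. */
static bool toy_mode_is_32bit(const struct toy_regs *regs)
{
	assert(regs);
	return (regs->pstate & TOY_PSR_MODE32_BIT) != 0;
}
```

If the guest switched to a 32-bit mode just before an IRQ exit,
toy_sync_pc_only() leaves the host's pstate at its old 64-bit value and
the check misses the switch; toy_sync_pc_pstate() lets the check see it.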

Cheers,
/fuad

>
> >
> >       host_vcpu->arch.fault           = hyp_vcpu->vcpu.arch.fault;
> >
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260612065925.755562-1-tabba@google.com?part=11


Thread overview: 23+ messages
2026-06-12  6:59 [PATCH v1 00/11] KVM: arm64: Rework pKVM vCPU state synchronisation tabba
2026-06-12  6:59 ` [PATCH v1 01/11] KVM: arm64: Add scoped resource management (guard) for hyp_spinlock tabba
2026-06-12  6:59 ` [PATCH v1 02/11] KVM: arm64: Use guard(hyp_spinlock) in pKVM hypervisor code tabba
2026-06-12  6:59 ` [PATCH v1 03/11] KVM: arm64: Use guard()/scoped_guard() in arm64 KVM EL1 code tabba
2026-06-12  6:59 ` [PATCH v1 04/11] KVM: arm64: Extract MPIDR computation into a shared header tabba
2026-06-12  6:59 ` [PATCH v1 05/11] KVM: arm64: Make vcpu_{read,write}_sys_reg available to HYP code tabba
2026-06-12  7:17   ` sashiko-bot
2026-06-12  7:53     ` Fuad Tabba
2026-06-12  6:59 ` [PATCH v1 06/11] KVM: arm64: Factor out reusable vCPU reset helpers tabba
2026-06-12  6:59 ` [PATCH v1 07/11] KVM: arm64: Move PSCI helper functions to a shared header tabba
2026-06-12  6:59 ` [PATCH v1 08/11] KVM: arm64: Add host and hypervisor vCPU lookup primitives tabba
2026-06-12  7:08   ` sashiko-bot
2026-06-12  7:15     ` Fuad Tabba
2026-06-12  6:59 ` [PATCH v1 09/11] KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch tabba
2026-06-12  7:24   ` sashiko-bot
2026-06-12  8:05     ` Fuad Tabba
2026-06-12  8:09       ` Fuad Tabba
2026-06-12  6:59 ` [PATCH v1 10/11] KVM: arm64: Add primitives to flush/sync the VGIC state at EL2 tabba
2026-06-12  7:23   ` sashiko-bot
2026-06-12  8:14     ` Fuad Tabba
2026-06-12  6:59 ` [PATCH v1 11/11] KVM: arm64: Implement lazy vCPU state sync for non-protected guests tabba
2026-06-12  7:19   ` sashiko-bot
2026-06-12  9:51     ` Fuad Tabba
