* [PATCH v3 1/8] KVM: arm64: Extract MPIDR computation into a shared header
2026-06-26 7:04 [PATCH v3 0/8] KVM: arm64: Rework pKVM vCPU state synchronisation Fuad Tabba
@ 2026-06-26 7:04 ` Fuad Tabba
2026-06-26 7:04 ` [PATCH v3 2/8] KVM: arm64: Make vcpu_{read,write}_sys_reg available to HYP code Fuad Tabba
` (6 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Fuad Tabba @ 2026-06-26 7:04 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, kvmarm, linux-arm-kernel,
linux-kernel
Cc: Catalin Marinas, Will Deacon, Joey Gouly, Steffen Eiden,
Suzuki K Poulose, Zenghui Yu, Vincent Donnefort, Quentin Perret,
Sebastian Ene, Hyunwoo Kim, Fuad Tabba
Extract the vCPU MPIDR computation embedded in reset_mpidr() into a
kvm_calculate_mpidr() inline in sys_regs.h, so it can be computed
without duplicating the logic. A follow-up series reuses it to reset
protected vCPUs at EL2.
No functional change intended.
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Fuad Tabba <fuad.tabba@linux.dev>
---
arch/arm64/kvm/sys_regs.c | 14 +-------------
arch/arm64/kvm/sys_regs.h | 19 +++++++++++++++++++
2 files changed, 20 insertions(+), 13 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 33c921df19b54..674fabe1d40d1 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -976,21 +976,9 @@ static u64 reset_actlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
static u64 reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
{
- u64 mpidr;
+ u64 mpidr = kvm_calculate_mpidr(vcpu);
- /*
- * Map the vcpu_id into the first three affinity level fields of
- * the MPIDR. We limit the number of VCPUs in level 0 due to a
- * limitation to 16 CPUs in that level in the ICC_SGIxR registers
- * of the GICv3 to be able to address each CPU directly when
- * sending IPIs.
- */
- mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
- mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
- mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
- mpidr |= (1ULL << 31);
vcpu_write_sys_reg(vcpu, mpidr, MPIDR_EL1);
-
return mpidr;
}
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index 2a983664220ce..bd56a45abbf9c 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -222,6 +222,25 @@ find_reg(const struct sys_reg_params *params, const struct sys_reg_desc table[],
return __inline_bsearch((void *)pval, table, num, sizeof(table[0]), match_sys_reg);
}
+static inline u64 kvm_calculate_mpidr(const struct kvm_vcpu *vcpu)
+{
+ u64 mpidr;
+
+ /*
+ * Map the vcpu_id into the first three affinity level fields of
+ * the MPIDR. We limit the number of VCPUs in level 0 due to a
+ * limitation to 16 CPUs in that level in the ICC_SGIxR registers
+ * of the GICv3 to be able to address each CPU directly when
+ * sending IPIs.
+ */
+ mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
+ mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
+ mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
+ mpidr |= (1ULL << 31);
+
+ return mpidr;
+}
+
const struct sys_reg_desc *get_reg_by_id(u64 id,
const struct sys_reg_desc table[],
unsigned int num);
--
2.39.5
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH v3 2/8] KVM: arm64: Make vcpu_{read,write}_sys_reg available to HYP code
2026-06-26 7:04 [PATCH v3 0/8] KVM: arm64: Rework pKVM vCPU state synchronisation Fuad Tabba
2026-06-26 7:04 ` [PATCH v3 1/8] KVM: arm64: Extract MPIDR computation into a shared header Fuad Tabba
@ 2026-06-26 7:04 ` Fuad Tabba
2026-06-26 7:04 ` [PATCH v3 3/8] KVM: arm64: Factor out reusable vCPU reset helpers Fuad Tabba
` (5 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Fuad Tabba @ 2026-06-26 7:04 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, kvmarm, linux-arm-kernel,
linux-kernel
Cc: Catalin Marinas, Will Deacon, Joey Gouly, Steffen Eiden,
Suzuki K Poulose, Zenghui Yu, Vincent Donnefort, Quentin Perret,
Sebastian Ene, Hyunwoo Kim, Fuad Tabba
The vcpu_{read,write}_sys_reg() accessors are host-only, so helpers
built on them such as kvm_vcpu_set_be()/kvm_vcpu_is_be() cannot be
shared with hyp code. exception.c already wraps them in
__vcpu_{read,write}_sys_reg(), which pick the host- or hyp-side accessor
via has_vhe() and so are valid in any context.
Move those wrappers to kvm_emulate.h as kvm_vcpu_{read,write}_sys_reg()
and switch the callers over, so a follow-up series can share that
emulation code at EL2.
No functional change intended.
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Fuad Tabba <fuad.tabba@linux.dev>
---
arch/arm64/include/asm/kvm_emulate.h | 22 +++++++++++++++---
arch/arm64/kvm/hyp/exception.c | 34 ++++++++--------------------
2 files changed, 28 insertions(+), 28 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5bf3d7e1d92c7..80b30fead3d16 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -506,6 +506,22 @@ static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
return __vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
}
+static inline u64 kvm_vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
+{
+ if (has_vhe())
+ return vcpu_read_sys_reg(vcpu, reg);
+
+ return __vcpu_sys_reg(vcpu, reg);
+}
+
+static inline void kvm_vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
+{
+ if (has_vhe())
+ vcpu_write_sys_reg(vcpu, val, reg);
+ else
+ __vcpu_assign_sys_reg(vcpu, reg, val);
+}
+
static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
{
if (vcpu_mode_is_32bit(vcpu)) {
@@ -516,9 +532,9 @@ static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
r = vcpu_has_nv(vcpu) ? SCTLR_EL2 : SCTLR_EL1;
- sctlr = vcpu_read_sys_reg(vcpu, r);
+ sctlr = kvm_vcpu_read_sys_reg(vcpu, r);
sctlr |= SCTLR_ELx_EE;
- vcpu_write_sys_reg(vcpu, sctlr, r);
+ kvm_vcpu_write_sys_reg(vcpu, sctlr, r);
}
}
@@ -533,7 +549,7 @@ static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
r = is_hyp_ctxt(vcpu) ? SCTLR_EL2 : SCTLR_EL1;
bit = vcpu_mode_priv(vcpu) ? SCTLR_ELx_EE : SCTLR_EL1_E0E;
- return vcpu_read_sys_reg(vcpu, r) & bit;
+ return kvm_vcpu_read_sys_reg(vcpu, r) & bit;
}
static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index bef40ddb16dbc..2cb68dc7d441e 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -20,22 +20,6 @@
#error Hypervisor code only!
#endif
-static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
-{
- if (has_vhe())
- return vcpu_read_sys_reg(vcpu, reg);
-
- return __vcpu_sys_reg(vcpu, reg);
-}
-
-static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
-{
- if (has_vhe())
- vcpu_write_sys_reg(vcpu, val, reg);
- else
- __vcpu_assign_sys_reg(vcpu, reg, val);
-}
-
static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long target_mode,
u64 val)
{
@@ -101,14 +85,14 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
switch (target_mode) {
case PSR_MODE_EL1h:
- vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL1);
- sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
- __vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
+ vbar = kvm_vcpu_read_sys_reg(vcpu, VBAR_EL1);
+ sctlr = kvm_vcpu_read_sys_reg(vcpu, SCTLR_EL1);
+ kvm_vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
break;
case PSR_MODE_EL2h:
- vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL2);
- sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL2);
- __vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
+ vbar = kvm_vcpu_read_sys_reg(vcpu, VBAR_EL2);
+ sctlr = kvm_vcpu_read_sys_reg(vcpu, SCTLR_EL2);
+ kvm_vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
break;
default:
/* Don't do that */
@@ -185,7 +169,7 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
*/
static unsigned long get_except32_cpsr(struct kvm_vcpu *vcpu, u32 mode)
{
- u32 sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
+ u32 sctlr = kvm_vcpu_read_sys_reg(vcpu, SCTLR_EL1);
unsigned long old, new;
old = *vcpu_cpsr(vcpu);
@@ -281,7 +265,7 @@ static void enter_exception32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
{
unsigned long spsr = *vcpu_cpsr(vcpu);
bool is_thumb = (spsr & PSR_AA32_T_BIT);
- u32 sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
+ u32 sctlr = kvm_vcpu_read_sys_reg(vcpu, SCTLR_EL1);
u32 return_address;
*vcpu_cpsr(vcpu) = get_except32_cpsr(vcpu, mode);
@@ -305,7 +289,7 @@ static void enter_exception32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
if (sctlr & (1 << 13))
vect_offset += 0xffff0000;
else /* always have security exceptions */
- vect_offset += __vcpu_read_sys_reg(vcpu, VBAR_EL1);
+ vect_offset += kvm_vcpu_read_sys_reg(vcpu, VBAR_EL1);
*vcpu_pc(vcpu) = vect_offset;
}
--
2.39.5
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH v3 3/8] KVM: arm64: Factor out reusable vCPU reset helpers
2026-06-26 7:04 [PATCH v3 0/8] KVM: arm64: Rework pKVM vCPU state synchronisation Fuad Tabba
2026-06-26 7:04 ` [PATCH v3 1/8] KVM: arm64: Extract MPIDR computation into a shared header Fuad Tabba
2026-06-26 7:04 ` [PATCH v3 2/8] KVM: arm64: Make vcpu_{read,write}_sys_reg available to HYP code Fuad Tabba
@ 2026-06-26 7:04 ` Fuad Tabba
2026-06-26 7:04 ` [PATCH v3 4/8] KVM: arm64: Move PSCI helper functions to a shared header Fuad Tabba
` (4 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Fuad Tabba @ 2026-06-26 7:04 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, kvmarm, linux-arm-kernel,
linux-kernel
Cc: Catalin Marinas, Will Deacon, Joey Gouly, Steffen Eiden,
Suzuki K Poulose, Zenghui Yu, Vincent Donnefort, Quentin Perret,
Sebastian Ene, Hyunwoo Kim, Fuad Tabba
Pull the reusable pieces out of kvm_reset_vcpu(): expose the reset
PSTATE values in kvm_arm.h, and split the core register reset and the
PSCI-driven reset into kvm_reset_vcpu_core() and kvm_reset_vcpu_psci().
A follow-up series reuses these to reset protected vCPUs at EL2.
No functional change intended.
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Fuad Tabba <fuad.tabba@linux.dev>
---
arch/arm64/include/asm/kvm_arm.h | 12 ++++++
arch/arm64/include/asm/kvm_emulate.h | 57 ++++++++++++++++++++++++++
arch/arm64/kvm/reset.c | 60 ++--------------------------
3 files changed, 72 insertions(+), 57 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 3f9233b5a1308..aba4ec09acd23 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -348,4 +348,16 @@
{ PSR_AA32_MODE_UND, "32-bit UND" }, \
{ PSR_AA32_MODE_SYS, "32-bit SYS" }
+/*
+ * ARMv8 Reset Values
+ */
+#define VCPU_RESET_PSTATE_EL1 (PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT | \
+ PSR_F_BIT | PSR_D_BIT)
+
+#define VCPU_RESET_PSTATE_EL2 (PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT | \
+ PSR_F_BIT | PSR_D_BIT)
+
+#define VCPU_RESET_PSTATE_SVC (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
+ PSR_AA32_I_BIT | PSR_AA32_F_BIT)
+
#endif /* __ARM64_KVM_ARM_H__ */
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 80b30fead3d16..2385d8855fcfd 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -704,4 +704,61 @@ static inline void vcpu_set_hcrx(struct kvm_vcpu *vcpu)
vcpu->arch.hcrx_el2 |= HCRX_EL2_EnASR;
}
}
+
+/* Reset a vcpu's core registers. */
+static inline void kvm_reset_vcpu_core(struct kvm_vcpu *vcpu)
+{
+ u32 pstate;
+
+ if (vcpu_el1_is_32bit(vcpu))
+ pstate = VCPU_RESET_PSTATE_SVC;
+ else if (vcpu_has_nv(vcpu))
+ pstate = VCPU_RESET_PSTATE_EL2;
+ else
+ pstate = VCPU_RESET_PSTATE_EL1;
+
+ /* Reset core registers */
+ memset(vcpu_gp_regs(vcpu), 0, sizeof(*vcpu_gp_regs(vcpu)));
+ memset(&vcpu->arch.ctxt.fp_regs, 0, sizeof(vcpu->arch.ctxt.fp_regs));
+ vcpu->arch.ctxt.spsr_abt = 0;
+ vcpu->arch.ctxt.spsr_und = 0;
+ vcpu->arch.ctxt.spsr_irq = 0;
+ vcpu->arch.ctxt.spsr_fiq = 0;
+ vcpu_gp_regs(vcpu)->pstate = pstate;
+}
+
+/* PSCI reset handling for a vcpu. */
+static inline void kvm_reset_vcpu_psci(struct kvm_vcpu *vcpu,
+ struct vcpu_reset_state *reset_state)
+{
+ unsigned long target_pc = reset_state->pc;
+
+ /* Gracefully handle Thumb2 entry point */
+ if (vcpu_mode_is_32bit(vcpu) && (target_pc & 1)) {
+ target_pc &= ~1UL;
+ vcpu_set_thumb(vcpu);
+ }
+
+ /* Propagate caller endianness */
+ if (reset_state->be)
+ kvm_vcpu_set_be(vcpu);
+
+ *vcpu_pc(vcpu) = target_pc;
+
+ /*
+ * We may come from a state where either a PC update was
+ * pending (SMC call resulting in PC being increpented to
+ * skip the SMC) or a pending exception. Make sure we get
+ * rid of all that, as this cannot be valid out of reset.
+ *
+ * Note that clearing the exception mask also clears PC
+ * updates, but that's an implementation detail, and we
+ * really want to make it explicit.
+ */
+ vcpu_clear_flag(vcpu, PENDING_EXCEPTION);
+ vcpu_clear_flag(vcpu, EXCEPT_MASK);
+ vcpu_clear_flag(vcpu, INCREMENT_PC);
+ vcpu_set_reg(vcpu, 0, reset_state->r0);
+}
+
#endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index b963fd975aaca..10eb7249aa9e8 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -34,18 +34,6 @@
static u32 __ro_after_init kvm_ipa_limit;
unsigned int __ro_after_init kvm_host_sve_max_vl;
-/*
- * ARMv8 Reset Values
- */
-#define VCPU_RESET_PSTATE_EL1 (PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT | \
- PSR_F_BIT | PSR_D_BIT)
-
-#define VCPU_RESET_PSTATE_EL2 (PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT | \
- PSR_F_BIT | PSR_D_BIT)
-
-#define VCPU_RESET_PSTATE_SVC (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
- PSR_AA32_I_BIT | PSR_AA32_F_BIT)
-
unsigned int __ro_after_init kvm_sve_max_vl;
int __init kvm_arm_init_sve(void)
@@ -191,7 +179,6 @@ void kvm_reset_vcpu(struct kvm_vcpu *vcpu)
{
struct vcpu_reset_state reset_state;
bool loaded;
- u32 pstate;
spin_lock(&vcpu->arch.mp_state_lock);
reset_state = vcpu->arch.reset_state;
@@ -210,21 +197,8 @@ void kvm_reset_vcpu(struct kvm_vcpu *vcpu)
kvm_vcpu_reset_sve(vcpu);
}
- if (vcpu_el1_is_32bit(vcpu))
- pstate = VCPU_RESET_PSTATE_SVC;
- else if (vcpu_has_nv(vcpu))
- pstate = VCPU_RESET_PSTATE_EL2;
- else
- pstate = VCPU_RESET_PSTATE_EL1;
-
/* Reset core registers */
- memset(vcpu_gp_regs(vcpu), 0, sizeof(*vcpu_gp_regs(vcpu)));
- memset(&vcpu->arch.ctxt.fp_regs, 0, sizeof(vcpu->arch.ctxt.fp_regs));
- vcpu->arch.ctxt.spsr_abt = 0;
- vcpu->arch.ctxt.spsr_und = 0;
- vcpu->arch.ctxt.spsr_irq = 0;
- vcpu->arch.ctxt.spsr_fiq = 0;
- vcpu_gp_regs(vcpu)->pstate = pstate;
+ kvm_reset_vcpu_core(vcpu);
/* Reset system registers */
kvm_reset_sys_regs(vcpu);
@@ -233,36 +207,8 @@ void kvm_reset_vcpu(struct kvm_vcpu *vcpu)
* Additional reset state handling that PSCI may have imposed on us.
* Must be done after all the sys_reg reset.
*/
- if (reset_state.reset) {
- unsigned long target_pc = reset_state.pc;
-
- /* Gracefully handle Thumb2 entry point */
- if (vcpu_mode_is_32bit(vcpu) && (target_pc & 1)) {
- target_pc &= ~1UL;
- vcpu_set_thumb(vcpu);
- }
-
- /* Propagate caller endianness */
- if (reset_state.be)
- kvm_vcpu_set_be(vcpu);
-
- *vcpu_pc(vcpu) = target_pc;
-
- /*
- * We may come from a state where either a PC update was
- * pending (SMC call resulting in PC being increpented to
- * skip the SMC) or a pending exception. Make sure we get
- * rid of all that, as this cannot be valid out of reset.
- *
- * Note that clearing the exception mask also clears PC
- * updates, but that's an implementation detail, and we
- * really want to make it explicit.
- */
- vcpu_clear_flag(vcpu, PENDING_EXCEPTION);
- vcpu_clear_flag(vcpu, EXCEPT_MASK);
- vcpu_clear_flag(vcpu, INCREMENT_PC);
- vcpu_set_reg(vcpu, 0, reset_state.r0);
- }
+ if (reset_state.reset)
+ kvm_reset_vcpu_psci(vcpu, &reset_state);
/* Reset timer */
kvm_timer_vcpu_reset(vcpu);
--
2.39.5
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH v3 4/8] KVM: arm64: Move PSCI helper functions to a shared header
2026-06-26 7:04 [PATCH v3 0/8] KVM: arm64: Rework pKVM vCPU state synchronisation Fuad Tabba
` (2 preceding siblings ...)
2026-06-26 7:04 ` [PATCH v3 3/8] KVM: arm64: Factor out reusable vCPU reset helpers Fuad Tabba
@ 2026-06-26 7:04 ` Fuad Tabba
2026-06-26 7:16 ` sashiko-bot
2026-06-26 7:04 ` [PATCH v3 5/8] KVM: arm64: Add host and hypervisor vCPU lookup primitives Fuad Tabba
` (3 subsequent siblings)
7 siblings, 1 reply; 18+ messages in thread
From: Fuad Tabba @ 2026-06-26 7:04 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, kvmarm, linux-arm-kernel,
linux-kernel
Cc: Catalin Marinas, Will Deacon, Joey Gouly, Steffen Eiden,
Suzuki K Poulose, Zenghui Yu, Vincent Donnefort, Quentin Perret,
Sebastian Ene, Hyunwoo Kim, Fuad Tabba
Move kvm_psci_valid_affinity() and kvm_psci_narrow_to_32bit() from
psci.c to include/kvm/arm_psci.h, and move psci_affinity_mask() there
too, renaming it kvm_psci_affinity_mask() now that it is no longer
file-local. A follow-up series handles some protected-guest PSCI calls
at EL2 using these helpers.
No functional change intended.
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Fuad Tabba <fuad.tabba@linux.dev>
---
arch/arm64/kvm/psci.c | 30 +-----------------------------
include/kvm/arm_psci.h | 27 +++++++++++++++++++++++++++
2 files changed, 28 insertions(+), 29 deletions(-)
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 3b5dbe9a0a0ea..e3db84400d1f8 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -21,16 +21,6 @@
* as described in ARM document number ARM DEN 0022A.
*/
-#define AFFINITY_MASK(level) ~((0x1UL << ((level) * MPIDR_LEVEL_BITS)) - 1)
-
-static unsigned long psci_affinity_mask(unsigned long affinity_level)
-{
- if (affinity_level <= 3)
- return MPIDR_HWID_BITMASK & AFFINITY_MASK(affinity_level);
-
- return 0;
-}
-
static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
{
/*
@@ -51,12 +41,6 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
return PSCI_RET_SUCCESS;
}
-static inline bool kvm_psci_valid_affinity(struct kvm_vcpu *vcpu,
- unsigned long affinity)
-{
- return !(affinity & ~MPIDR_HWID_BITMASK);
-}
-
static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
{
struct vcpu_reset_state *reset_state;
@@ -135,7 +119,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
return PSCI_RET_INVALID_PARAMS;
/* Determine target affinity mask */
- target_affinity_mask = psci_affinity_mask(lowest_affinity_level);
+ target_affinity_mask = kvm_psci_affinity_mask(lowest_affinity_level);
if (!target_affinity_mask)
return PSCI_RET_INVALID_PARAMS;
@@ -220,18 +204,6 @@ static void kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
}
-static void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
-{
- int i;
-
- /*
- * Zero the input registers' upper 32 bits. They will be fully
- * zeroed on exit, so we're fine changing them in place.
- */
- for (i = 1; i < 4; i++)
- vcpu_set_reg(vcpu, i, lower_32_bits(vcpu_get_reg(vcpu, i)));
-}
-
static unsigned long kvm_psci_check_allowed_function(struct kvm_vcpu *vcpu, u32 fn)
{
/*
diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
index cbaec804eb839..f86a006d67136 100644
--- a/include/kvm/arm_psci.h
+++ b/include/kvm/arm_psci.h
@@ -38,6 +38,33 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
return KVM_ARM_PSCI_0_1;
}
+/* Narrow the PSCI register arguments (r1 to r3) to 32 bits. */
+static inline void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
+{
+ int i;
+
+ /*
+ * Zero the input registers' upper 32 bits. They will be fully
+ * zeroed on exit, so we're fine changing them in place.
+ */
+ for (i = 1; i < 4; i++)
+ vcpu_set_reg(vcpu, i, lower_32_bits(vcpu_get_reg(vcpu, i)));
+}
+
+static inline bool kvm_psci_valid_affinity(struct kvm_vcpu *vcpu,
+ unsigned long affinity)
+{
+ return !(affinity & ~MPIDR_HWID_BITMASK);
+}
+
+static inline unsigned long kvm_psci_affinity_mask(unsigned long affinity_level)
+{
+ if (affinity_level <= 3)
+ return MPIDR_HWID_BITMASK &
+ ~((0x1UL << (affinity_level * MPIDR_LEVEL_BITS)) - 1);
+
+ return 0;
+}
int kvm_psci_call(struct kvm_vcpu *vcpu);
--
2.39.5
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v3 4/8] KVM: arm64: Move PSCI helper functions to a shared header
2026-06-26 7:04 ` [PATCH v3 4/8] KVM: arm64: Move PSCI helper functions to a shared header Fuad Tabba
@ 2026-06-26 7:16 ` sashiko-bot
2026-06-26 7:20 ` Fuad Tabba
0 siblings, 1 reply; 18+ messages in thread
From: sashiko-bot @ 2026-06-26 7:16 UTC (permalink / raw)
To: Fuad Tabba; +Cc: Oliver Upton, Marc Zyngier, kvmarm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] The newly shared header `include/kvm/arm_psci.h` references functions without including their defining headers (`<asm/kvm_emulate.h>` and `<linux/wordpart.h>`), violating the practice of self-contained headers.
--
commit 6015f90f953617e9e81e02b97f5532daaa5dfdd7
Author: Fuad Tabba <fuad.tabba@linux.dev>
KVM: arm64: Move PSCI helper functions to a shared header
This commit moves kvm_psci_valid_affinity(), kvm_psci_narrow_to_32bit(),
and psci_affinity_mask() from psci.c to the shared include/kvm/arm_psci.h
header. This prepares the helpers for use in protected-guest PSCI calls
at EL2.
> diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
> index cbaec804eb839..f86a006d67136 100644
> --- a/include/kvm/arm_psci.h
> +++ b/include/kvm/arm_psci.h
[ ... ]
> +/* Narrow the PSCI register arguments (r1 to r3) to 32 bits. */
> +static inline void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
> +{
> + int i;
> +
> + /*
> + * Zero the input registers' upper 32 bits. They will be fully
> + * zeroed on exit, so we're fine changing them in place.
> + */
> + for (i = 1; i < 4; i++)
> + vcpu_set_reg(vcpu, i, lower_32_bits(vcpu_get_reg(vcpu, i)));
> +}
[Severity: Low]
Does this new shared header need to include asm/kvm_emulate.h and
linux/wordpart.h to remain self-contained?
This inline function relies on lower_32_bits(), vcpu_get_reg(), and
vcpu_set_reg(), but the definitions are not directly included here.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260626070408.3420953-1-fuad.tabba@linux.dev?part=4
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v3 4/8] KVM: arm64: Move PSCI helper functions to a shared header
2026-06-26 7:16 ` sashiko-bot
@ 2026-06-26 7:20 ` Fuad Tabba
0 siblings, 0 replies; 18+ messages in thread
From: Fuad Tabba @ 2026-06-26 7:20 UTC (permalink / raw)
To: sashiko-reviews; +Cc: Oliver Upton, Marc Zyngier, kvmarm
On Fri, 26 Jun 2026 at 08:16, <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Low] The newly shared header `include/kvm/arm_psci.h` references functions without including their defining headers (`<asm/kvm_emulate.h>` and `<linux/wordpart.h>`), violating the practice of self-contained headers.
> --
>
> commit 6015f90f953617e9e81e02b97f5532daaa5dfdd7
> Author: Fuad Tabba <fuad.tabba@linux.dev>
>
> KVM: arm64: Move PSCI helper functions to a shared header
>
> This commit moves kvm_psci_valid_affinity(), kvm_psci_narrow_to_32bit(),
> and psci_affinity_mask() from psci.c to the shared include/kvm/arm_psci.h
> header. This prepares the helpers for use in protected-guest PSCI calls
> at EL2.
We've been throught this before my friend [1]
The header was already not self-contained before this commit: the
pre-existing kvm_psci_version() uses vcpu_has_feature() from
asm/kvm_emulate.h without including it. Every .c file that includes
arm_psci.h already includes kvm_emulate.h. The other dependencies
(lower_32_bits, MPIDR_HWID_BITMASK) are transitively available through
linux/kvm_host.h.
Cheers,
/fuad
[1] https://lore.kernel.org/all/CA+EHjTwpB2fmoYGfkP5RVZcB4sgbVPQ0=QR5dyUJnpQ_VQa+pA@mail.gmail.com/
>
> > diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
> > index cbaec804eb839..f86a006d67136 100644
> > --- a/include/kvm/arm_psci.h
> > +++ b/include/kvm/arm_psci.h
> [ ... ]
> > +/* Narrow the PSCI register arguments (r1 to r3) to 32 bits. */
> > +static inline void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
> > +{
> > + int i;
> > +
> > + /*
> > + * Zero the input registers' upper 32 bits. They will be fully
> > + * zeroed on exit, so we're fine changing them in place.
> > + */
> > + for (i = 1; i < 4; i++)
> > + vcpu_set_reg(vcpu, i, lower_32_bits(vcpu_get_reg(vcpu, i)));
> > +}
>
> [Severity: Low]
> Does this new shared header need to include asm/kvm_emulate.h and
> linux/wordpart.h to remain self-contained?
>
> This inline function relies on lower_32_bits(), vcpu_get_reg(), and
> vcpu_set_reg(), but the definitions are not directly included here.
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260626070408.3420953-1-fuad.tabba@linux.dev?part=4
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 5/8] KVM: arm64: Add host and hypervisor vCPU lookup primitives
2026-06-26 7:04 [PATCH v3 0/8] KVM: arm64: Rework pKVM vCPU state synchronisation Fuad Tabba
` (3 preceding siblings ...)
2026-06-26 7:04 ` [PATCH v3 4/8] KVM: arm64: Move PSCI helper functions to a shared header Fuad Tabba
@ 2026-06-26 7:04 ` Fuad Tabba
2026-06-26 7:04 ` [PATCH v3 6/8] KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch Fuad Tabba
` (2 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Fuad Tabba @ 2026-06-26 7:04 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, kvmarm, linux-arm-kernel,
linux-kernel
Cc: Catalin Marinas, Will Deacon, Joey Gouly, Steffen Eiden,
Suzuki K Poulose, Zenghui Yu, Vincent Donnefort, Quentin Perret,
Sebastian Ene, Hyunwoo Kim, Fuad Tabba
From: Marc Zyngier <maz@kernel.org>
The nVHE hypervisor repeatedly resolves a host vCPU into the EL2
address space and validates that the loaded hyp vCPU matches it, with
that logic open-coded in each handler.
Add __get_host_hyp_vcpus() and the get_host_hyp_vcpus() macro, which
translate the host vCPU into the hypervisor's address space and, when
pKVM is enabled, also return the loaded hyp vCPU if it matches. If pKVM
is enabled but the loaded hyp vCPU does not correspond to the requested
host vCPU, both the host and hyp vCPU are returned as NULL. Convert
handle___kvm_vcpu_run() to use it.
No functional change intended.
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Co-developed-by: Fuad Tabba <fuad.tabba@linux.dev>
Signed-off-by: Fuad Tabba <fuad.tabba@linux.dev>
---
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 52 ++++++++++++++++++++++--------
1 file changed, 38 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 1d01c6e547f5d..8923f594c2640 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -212,14 +212,45 @@ static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
pkvm_put_hyp_vcpu(hyp_vcpu);
}
-static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
+static struct kvm_vcpu *__get_host_hyp_vcpus(struct kvm_vcpu *arg,
+ struct pkvm_hyp_vcpu **hyp_vcpup)
{
- DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 1);
- int ret;
+ struct kvm_vcpu *host_vcpu = kern_hyp_va(arg);
+ struct pkvm_hyp_vcpu *hyp_vcpu = NULL;
if (unlikely(is_protected_kvm_enabled())) {
- struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+ hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+ if (!hyp_vcpu || hyp_vcpu->host_vcpu != host_vcpu) {
+ hyp_vcpu = NULL;
+ host_vcpu = NULL;
+ }
+ }
+
+ *hyp_vcpup = hyp_vcpu;
+ return host_vcpu;
+}
+
+#define get_host_hyp_vcpus(ctxt, regnr, hyp_vcpup) \
+ ({ \
+ DECLARE_REG(struct kvm_vcpu *, __vcpu, ctxt, regnr); \
+ __get_host_hyp_vcpus(__vcpu, hyp_vcpup); \
+ })
+
+static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
+{
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+ struct kvm_vcpu *host_vcpu;
+ int ret;
+
+ host_vcpu = get_host_hyp_vcpus(host_ctxt, 1, &hyp_vcpu);
+
+ if (!host_vcpu) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (unlikely(hyp_vcpu)) {
/*
* KVM (and pKVM) doesn't support SME guests for now, and
* ensures that SME features aren't enabled in pstate when
@@ -231,23 +262,16 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
goto out;
}
- if (!hyp_vcpu) {
- ret = -EINVAL;
- goto out;
- }
-
flush_hyp_vcpu(hyp_vcpu);
ret = __kvm_vcpu_run(&hyp_vcpu->vcpu);
sync_hyp_vcpu(hyp_vcpu);
} else {
- struct kvm_vcpu *vcpu = kern_hyp_va(host_vcpu);
-
/* The host is fully trusted, run its vCPU directly. */
- fpsimd_lazy_switch_to_guest(vcpu);
- ret = __kvm_vcpu_run(vcpu);
- fpsimd_lazy_switch_to_host(vcpu);
+ fpsimd_lazy_switch_to_guest(host_vcpu);
+ ret = __kvm_vcpu_run(host_vcpu);
+ fpsimd_lazy_switch_to_host(host_vcpu);
}
out:
cpu_reg(host_ctxt, 1) = ret;
--
2.39.5
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH v3 6/8] KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch
2026-06-26 7:04 [PATCH v3 0/8] KVM: arm64: Rework pKVM vCPU state synchronisation Fuad Tabba
` (4 preceding siblings ...)
2026-06-26 7:04 ` [PATCH v3 5/8] KVM: arm64: Add host and hypervisor vCPU lookup primitives Fuad Tabba
@ 2026-06-26 7:04 ` Fuad Tabba
2026-06-26 7:24 ` sashiko-bot
2026-06-26 7:04 ` [PATCH v3 7/8] KVM: arm64: Add primitives to flush/sync the VGIC state at EL2 Fuad Tabba
2026-06-26 7:04 ` [PATCH v3 8/8] KVM: arm64: Implement lazy vCPU state sync for non-protected guests Fuad Tabba
7 siblings, 1 reply; 18+ messages in thread
From: Fuad Tabba @ 2026-06-26 7:04 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, kvmarm, linux-arm-kernel,
linux-kernel
Cc: Catalin Marinas, Will Deacon, Joey Gouly, Steffen Eiden,
Suzuki K Poulose, Zenghui Yu, Vincent Donnefort, Quentin Perret,
Sebastian Ene, Hyunwoo Kim, Fuad Tabba
From: Marc Zyngier <maz@kernel.org>
The host passes a vgic_v3_cpu_if pointer to the __vgic_v3_save_aprs and
__vgic_v3_restore_vmcr_aprs hypercalls, which EL2 dereferences
wholesale. That exposes the host's full VGIC emulation state to the
hypervisor, against pKVM's isolation goals.
Recover the host vCPU from the supplied cpu_if via container_of() and
copy only vgic_vmcr and the active priority registers between EL2's
hyp-side state and the host vCPU, so EL2 no longer dereferences the
host's vgic_v3_cpu_if directly.
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Co-developed-by: Fuad Tabba <fuad.tabba@linux.dev>
Signed-off-by: Fuad Tabba <fuad.tabba@linux.dev>
---
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 67 ++++++++++++++++++++++++++++--
1 file changed, 63 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 8923f594c2640..f25ee39715282 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -7,6 +7,8 @@
#include <hyp/adjust_pc.h>
#include <hyp/switch.h>
+#include <linux/irqchip/arm-gic-v3.h>
+
#include <asm/pgtable-types.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
@@ -237,6 +239,16 @@ static struct kvm_vcpu *__get_host_hyp_vcpus(struct kvm_vcpu *arg,
__get_host_hyp_vcpus(__vcpu, hyp_vcpup); \
})
+#define get_host_hyp_vcpus_from_vgic_v3_cpu_if(ctxt, regnr, hyp_vcpup) \
+ ({ \
+ DECLARE_REG(struct vgic_v3_cpu_if *, cif, ctxt, regnr);\
+ struct kvm_vcpu *__vcpu = container_of(cif, \
+ struct kvm_vcpu, \
+ arch.vgic_cpu.vgic_v3); \
+ \
+ __get_host_hyp_vcpus(__vcpu, hyp_vcpup); \
+ })
+
static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
{
struct pkvm_hyp_vcpu *hyp_vcpu;
@@ -506,16 +518,63 @@ static void handle___vgic_v3_init_lrs(struct kvm_cpu_context *host_ctxt)
static void handle___vgic_v3_save_aprs(struct kvm_cpu_context *host_ctxt)
{
- DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+ struct kvm_vcpu *host_vcpu;
- __vgic_v3_save_aprs(kern_hyp_va(cpu_if));
+ host_vcpu = get_host_hyp_vcpus_from_vgic_v3_cpu_if(host_ctxt, 1,
+ &hyp_vcpu);
+ if (!host_vcpu)
+ return;
+
+ if (unlikely(hyp_vcpu)) {
+ struct vgic_v3_cpu_if *hyp_cpu_if, *host_cpu_if;
+ int i;
+
+ hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
+ __vgic_v3_save_aprs(hyp_cpu_if);
+
+ host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
+ host_cpu_if->vgic_vmcr = hyp_cpu_if->vgic_vmcr;
+ for (i = 0; i < ARRAY_SIZE(host_cpu_if->vgic_ap0r); i++) {
+ host_cpu_if->vgic_ap0r[i] = hyp_cpu_if->vgic_ap0r[i];
+ host_cpu_if->vgic_ap1r[i] = hyp_cpu_if->vgic_ap1r[i];
+ }
+ } else {
+ __vgic_v3_save_aprs(&host_vcpu->arch.vgic_cpu.vgic_v3);
+ }
}
static void handle___vgic_v3_restore_vmcr_aprs(struct kvm_cpu_context *host_ctxt)
{
- DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+ struct kvm_vcpu *host_vcpu;
- __vgic_v3_restore_vmcr_aprs(kern_hyp_va(cpu_if));
+ host_vcpu = get_host_hyp_vcpus_from_vgic_v3_cpu_if(host_ctxt, 1,
+ &hyp_vcpu);
+ if (!host_vcpu)
+ return;
+
+ if (unlikely(hyp_vcpu)) {
+ struct vgic_v3_cpu_if *hyp_cpu_if, *host_cpu_if;
+ int i;
+
+ hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
+ host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
+
+ hyp_cpu_if->vgic_vmcr = host_cpu_if->vgic_vmcr;
+ /* Should be a one-off */
+ hyp_cpu_if->vgic_sre = (ICC_SRE_EL1_DIB |
+ ICC_SRE_EL1_DFB |
+ ICC_SRE_EL1_SRE);
+ for (i = 0; i < ARRAY_SIZE(host_cpu_if->vgic_ap0r); i++) {
+ hyp_cpu_if->vgic_ap0r[i] = host_cpu_if->vgic_ap0r[i];
+ hyp_cpu_if->vgic_ap1r[i] = host_cpu_if->vgic_ap1r[i];
+ }
+
+ __vgic_v3_restore_vmcr_aprs(hyp_cpu_if);
+ } else {
+ __vgic_v3_restore_vmcr_aprs(&host_vcpu->arch.vgic_cpu.vgic_v3);
+ }
}
static void handle___pkvm_init(struct kvm_cpu_context *host_ctxt)
--
2.39.5
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v3 6/8] KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch
2026-06-26 7:04 ` [PATCH v3 6/8] KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch Fuad Tabba
@ 2026-06-26 7:24 ` sashiko-bot
2026-06-26 7:30 ` Fuad Tabba
0 siblings, 1 reply; 18+ messages in thread
From: sashiko-bot @ 2026-06-26 7:24 UTC (permalink / raw)
To: Fuad Tabba; +Cc: kvmarm, Oliver Upton, Marc Zyngier
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [Critical] The patch correctly fixes TOCTOU and arbitrary read/write vulnerabilities for GICv3 by preventing EL2 from directly dereferencing host-provided pointers in `handle___vgic_v3_save_aprs` and `handle___vgic_v3_restore_vmcr_aprs`. However, it leaves the exact same vulnerability unpatched in the GICv5 equivalents: `handle___vgic_v5_save_apr` and `handle___vgic_v5_restore_vmcr_apr`. These functions take a host-supplied pointer (`cpu_if`), which they pass directly to the inner save/restore functions via `kern_hyp_va`. Because `kern_hyp_va` is idempotent for hypervisor linear-map pointers, the untrusted host (which is an adversary in the pKVM threat model) can pass an arbitrary EL2 memory address. This allows the host to execute arbitrary memory reads and writes in the hypervisor (e.g., writing the value of `SYS_ICH_APR_EL2` to arbitrary EL2 memory or reading arbitrary memory into sysregs), completely bypassing pKVM isolation.
--
commit 815e460f9f860b3c06ec7eb259e1f2d75d196c33
Author: Marc Zyngier <maz@kernel.org>
KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch
The host passes a vgic_v3_cpu_if pointer to the __vgic_v3_save_aprs and
__vgic_v3_restore_vmcr_aprs hypercalls, which EL2 dereferences
wholesale. That exposes the host's full VGIC emulation state to the
hypervisor, against pKVM's isolation goals.
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 45d717889f6ea..f3233ee343a39 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
[ ... ]
> @@ -508,16 +520,63 @@ static void handle___vgic_v3_init_lrs(struct kvm_cpu_context *host_ctxt)
>
> static void handle___vgic_v3_save_aprs(struct kvm_cpu_context *host_ctxt)
> {
> - DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
> + struct pkvm_hyp_vcpu *hyp_vcpu;
> + struct kvm_vcpu *host_vcpu;
>
> - __vgic_v3_save_aprs(kern_hyp_va(cpu_if));
> + host_vcpu = get_host_hyp_vcpus_from_vgic_v3_cpu_if(host_ctxt, 1,
> + &hyp_vcpu);
> + if (!host_vcpu)
> + return;
> +
> + if (unlikely(hyp_vcpu)) {
> + struct vgic_v3_cpu_if *hyp_cpu_if, *host_cpu_if;
> + int i;
> +
> + hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
> + __vgic_v3_save_aprs(hyp_cpu_if);
> +
> + host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
> + host_cpu_if->vgic_vmcr = hyp_cpu_if->vgic_vmcr;
> + for (i = 0; i < ARRAY_SIZE(host_cpu_if->vgic_ap0r); i++) {
> + host_cpu_if->vgic_ap0r[i] = hyp_cpu_if->vgic_ap0r[i];
> + host_cpu_if->vgic_ap1r[i] = hyp_cpu_if->vgic_ap1r[i];
> + }
> + } else {
> + __vgic_v3_save_aprs(&host_vcpu->arch.vgic_cpu.vgic_v3);
> + }
> }
>
> static void handle___vgic_v3_restore_vmcr_aprs(struct kvm_cpu_context *host_ctxt)
> {
> - DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
> + struct pkvm_hyp_vcpu *hyp_vcpu;
> + struct kvm_vcpu *host_vcpu;
> +
> + host_vcpu = get_host_hyp_vcpus_from_vgic_v3_cpu_if(host_ctxt, 1,
> + &hyp_vcpu);
> + if (!host_vcpu)
> + return;
>
> - __vgic_v3_restore_vmcr_aprs(kern_hyp_va(cpu_if));
> + if (unlikely(hyp_vcpu)) {
> + struct vgic_v3_cpu_if *hyp_cpu_if, *host_cpu_if;
> + int i;
> +
> + hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
> + host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
> +
> + hyp_cpu_if->vgic_vmcr = host_cpu_if->vgic_vmcr;
> + /* Should be a one-off */
> + hyp_cpu_if->vgic_sre = (ICC_SRE_EL1_DIB |
> + ICC_SRE_EL1_DFB |
> + ICC_SRE_EL1_SRE);
> + for (i = 0; i < ARRAY_SIZE(host_cpu_if->vgic_ap0r); i++) {
> + hyp_cpu_if->vgic_ap0r[i] = host_cpu_if->vgic_ap0r[i];
> + hyp_cpu_if->vgic_ap1r[i] = host_cpu_if->vgic_ap1r[i];
> + }
> +
> + __vgic_v3_restore_vmcr_aprs(hyp_cpu_if);
> + } else {
> + __vgic_v3_restore_vmcr_aprs(&host_vcpu->arch.vgic_cpu.vgic_v3);
> + }
> }
[Severity: Critical]
This is a pre-existing issue, but while this patch correctly fixes the TOCTOU
and arbitrary read/write vulnerabilities for GICv3, it seems the same pattern
remains unpatched in the GICv5 equivalents:
arch/arm64/kvm/hyp/nvhe/hyp-main.c:handle___vgic_v5_save_apr() {
DECLARE_REG(struct vgic_v5_cpu_if *, cpu_if, host_ctxt, 1);
__vgic_v5_save_apr(kern_hyp_va(cpu_if));
}
arch/arm64/kvm/hyp/nvhe/hyp-main.c:handle___vgic_v5_restore_vmcr_apr() {
DECLARE_REG(struct vgic_v5_cpu_if *, cpu_if, host_ctxt, 1);
__vgic_v5_restore_vmcr_apr(kern_hyp_va(cpu_if));
}
Since kern_hyp_va() is idempotent for hypervisor linear-map pointers, could
the untrusted host (which is an adversary in the pKVM threat model) pass an
arbitrary EL2 memory address here?
Would this allow the host to execute arbitrary memory reads and writes in the
hypervisor (for example, writing the value of SYS_ICH_APR_EL2 to arbitrary
EL2 memory or reading arbitrary memory into sysregs), which might bypass pKVM
isolation?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260626070408.3420953-1-fuad.tabba@linux.dev?part=6
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v3 6/8] KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch
2026-06-26 7:24 ` sashiko-bot
@ 2026-06-26 7:30 ` Fuad Tabba
0 siblings, 0 replies; 18+ messages in thread
From: Fuad Tabba @ 2026-06-26 7:30 UTC (permalink / raw)
To: sashiko-reviews; +Cc: kvmarm, Oliver Upton, Marc Zyngier
On Fri, 26 Jun 2026 at 08:24, <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
>
> Pre-existing issues:
> - [Critical] The patch correctly fixes TOCTOU and arbitrary read/write vulnerabilities for GICv3 by preventing EL2 from directly dereferencing host-provided pointers in `handle___vgic_v3_save_aprs` and `handle___vgic_v3_restore_vmcr_aprs`. However, it leaves the exact same vulnerability unpatched in the GICv5 equivalents: `handle___vgic_v5_save_apr` and `handle___vgic_v5_restore_vmcr_apr`. These functions take a host-supplied pointer (`cpu_if`), which they pass directly to the inner save/restore functions via `kern_hyp_va`. Because `kern_hyp_va` is idempotent for hypervisor linear-map pointers, the untrusted host (which is an adversary in the pKVM threat model) can pass an arbitrary EL2 memory address. This allows the host to execute arbitrary memory reads and writes in the hypervisor (e.g., writing the value of `SYS_ICH_APR_EL2` to arbitrary EL2 memory or reading arbitrary memory into sysregs), completely bypassing pKVM isolation.
> --
>
> commit 815e460f9f860b3c06ec7eb259e1f2d75d196c33
> Author: Marc Zyngier <maz@kernel.org>
>
> KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch
>
> The host passes a vgic_v3_cpu_if pointer to the __vgic_v3_save_aprs and
> __vgic_v3_restore_vmcr_aprs hypercalls, which EL2 dereferences
> wholesale. That exposes the host's full VGIC emulation state to the
> hypervisor, against pKVM's isolation goals.
>
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index 45d717889f6ea..f3233ee343a39 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
>
> [ ... ]
>
> > @@ -508,16 +520,63 @@ static void handle___vgic_v3_init_lrs(struct kvm_cpu_context *host_ctxt)
> >
> > static void handle___vgic_v3_save_aprs(struct kvm_cpu_context *host_ctxt)
> > {
> > - DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
> > + struct pkvm_hyp_vcpu *hyp_vcpu;
> > + struct kvm_vcpu *host_vcpu;
> >
> > - __vgic_v3_save_aprs(kern_hyp_va(cpu_if));
> > + host_vcpu = get_host_hyp_vcpus_from_vgic_v3_cpu_if(host_ctxt, 1,
> > + &hyp_vcpu);
> > + if (!host_vcpu)
> > + return;
> > +
> > + if (unlikely(hyp_vcpu)) {
> > + struct vgic_v3_cpu_if *hyp_cpu_if, *host_cpu_if;
> > + int i;
> > +
> > + hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
> > + __vgic_v3_save_aprs(hyp_cpu_if);
> > +
> > + host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
> > + host_cpu_if->vgic_vmcr = hyp_cpu_if->vgic_vmcr;
> > + for (i = 0; i < ARRAY_SIZE(host_cpu_if->vgic_ap0r); i++) {
> > + host_cpu_if->vgic_ap0r[i] = hyp_cpu_if->vgic_ap0r[i];
> > + host_cpu_if->vgic_ap1r[i] = hyp_cpu_if->vgic_ap1r[i];
> > + }
> > + } else {
> > + __vgic_v3_save_aprs(&host_vcpu->arch.vgic_cpu.vgic_v3);
> > + }
> > }
> >
> > static void handle___vgic_v3_restore_vmcr_aprs(struct kvm_cpu_context *host_ctxt)
> > {
> > - DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
> > + struct pkvm_hyp_vcpu *hyp_vcpu;
> > + struct kvm_vcpu *host_vcpu;
> > +
> > + host_vcpu = get_host_hyp_vcpus_from_vgic_v3_cpu_if(host_ctxt, 1,
> > + &hyp_vcpu);
> > + if (!host_vcpu)
> > + return;
> >
> > - __vgic_v3_restore_vmcr_aprs(kern_hyp_va(cpu_if));
> > + if (unlikely(hyp_vcpu)) {
> > + struct vgic_v3_cpu_if *hyp_cpu_if, *host_cpu_if;
> > + int i;
> > +
> > + hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
> > + host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
> > +
> > + hyp_cpu_if->vgic_vmcr = host_cpu_if->vgic_vmcr;
> > + /* Should be a one-off */
> > + hyp_cpu_if->vgic_sre = (ICC_SRE_EL1_DIB |
> > + ICC_SRE_EL1_DFB |
> > + ICC_SRE_EL1_SRE);
> > + for (i = 0; i < ARRAY_SIZE(host_cpu_if->vgic_ap0r); i++) {
> > + hyp_cpu_if->vgic_ap0r[i] = host_cpu_if->vgic_ap0r[i];
> > + hyp_cpu_if->vgic_ap1r[i] = host_cpu_if->vgic_ap1r[i];
> > + }
> > +
> > + __vgic_v3_restore_vmcr_aprs(hyp_cpu_if);
> > + } else {
> > + __vgic_v3_restore_vmcr_aprs(&host_vcpu->arch.vgic_cpu.vgic_v3);
> > + }
> > }
>
> [Severity: Critical]
> This is a pre-existing issue, but while this patch correctly fixes the TOCTOU
> and arbitrary read/write vulnerabilities for GICv3, it seems the same pattern
> remains unpatched in the GICv5 equivalents:
>
> arch/arm64/kvm/hyp/nvhe/hyp-main.c:handle___vgic_v5_save_apr() {
> DECLARE_REG(struct vgic_v5_cpu_if *, cpu_if, host_ctxt, 1);
> __vgic_v5_save_apr(kern_hyp_va(cpu_if));
> }
>
> arch/arm64/kvm/hyp/nvhe/hyp-main.c:handle___vgic_v5_restore_vmcr_apr() {
> DECLARE_REG(struct vgic_v5_cpu_if *, cpu_if, host_ctxt, 1);
> __vgic_v5_restore_vmcr_apr(kern_hyp_va(cpu_if));
> }
>
> Since kern_hyp_va() is idempotent for hypervisor linear-map pointers, could
> the untrusted host (which is an adversary in the pKVM threat model) pass an
> arbitrary EL2 memory address here?
>
> Would this allow the host to execute arbitrary memory reads and writes in the
> hypervisor (for example, writing the value of SYS_ICH_APR_EL2 to arbitrary
> EL2 memory or reading arbitrary memory into sysregs), which might bypass pKVM
> isolation?
We covered this too [1]:
GICv5 is not supported in protected mode (is_protected_kvm_enabled()
gate in vgic-v5.c skips device registration entirely), so the v5
handlers are unreachable.
Cheers,
/fuad
[1] https://lore.kernel.org/all/CA+EHjTxSZOwDBwVidTnjkNCzS-F9x=YXfZFQs0pB0OyJWcby7Q@mail.gmail.com/
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260626070408.3420953-1-fuad.tabba@linux.dev?part=6
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 7/8] KVM: arm64: Add primitives to flush/sync the VGIC state at EL2
2026-06-26 7:04 [PATCH v3 0/8] KVM: arm64: Rework pKVM vCPU state synchronisation Fuad Tabba
` (5 preceding siblings ...)
2026-06-26 7:04 ` [PATCH v3 6/8] KVM: arm64: Minimise EL2's exposure of host VGIC state during world switch Fuad Tabba
@ 2026-06-26 7:04 ` Fuad Tabba
2026-06-26 7:27 ` sashiko-bot
2026-06-26 7:04 ` [PATCH v3 8/8] KVM: arm64: Implement lazy vCPU state sync for non-protected guests Fuad Tabba
7 siblings, 1 reply; 18+ messages in thread
From: Fuad Tabba @ 2026-06-26 7:04 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, kvmarm, linux-arm-kernel,
linux-kernel
Cc: Catalin Marinas, Will Deacon, Joey Gouly, Steffen Eiden,
Suzuki K Poulose, Zenghui Yu, Vincent Donnefort, Quentin Perret,
Sebastian Ene, Hyunwoo Kim, Fuad Tabba
From: Marc Zyngier <maz@kernel.org>
pKVM performs its own world switch for protected VMs but has no
primitives to move the per-vCPU VGIC state between the host and
hypervisor vCPU contexts.
Add flush_hyp_vgic_state() and sync_hyp_vgic_state(). Flush copies
vgic_hcr, the in-use list registers and used_lrs from the host into the
hyp vCPU and pins vgic_sre to a fixed value; sync copies vgic_hcr,
vgic_vmcr and the in-use list registers back. The active priority
registers are handled separately by the save/restore-aprs path.
Bound used_lrs by hyp_gicv3_nr_lr, the cached implemented-LR count,
instead of reading ICH_VTR_EL2 on each entry. That clamps the
host-supplied value and avoids a per-entry sysreg read that is costly
under NV.
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Co-developed-by: Fuad Tabba <fuad.tabba@linux.dev>
Signed-off-by: Fuad Tabba <fuad.tabba@linux.dev>
---
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 55 ++++++++++++++++++++++--------
1 file changed, 41 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index f25ee39715282..0194965930e61 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -102,6 +102,45 @@ static void fpsimd_sve_sync(struct kvm_vcpu *vcpu)
*host_data_ptr(fp_owner) = FP_STATE_HOST_OWNED;
}
+static void flush_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+ struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+ struct vgic_v3_cpu_if *host_cpu_if, *hyp_cpu_if;
+ unsigned int used_lrs, i;
+
+ host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
+ hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
+
+ used_lrs = host_cpu_if->used_lrs;
+ used_lrs = min(used_lrs, hyp_gicv3_nr_lr);
+
+ hyp_cpu_if->vgic_hcr = host_cpu_if->vgic_hcr;
+ /* Should be a one-off */
+ hyp_cpu_if->vgic_sre = (ICC_SRE_EL1_DIB |
+ ICC_SRE_EL1_DFB |
+ ICC_SRE_EL1_SRE);
+ hyp_cpu_if->used_lrs = used_lrs;
+
+ for (i = 0; i < used_lrs; i++)
+ hyp_cpu_if->vgic_lr[i] = host_cpu_if->vgic_lr[i];
+}
+
+static void sync_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+ struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+ struct vgic_v3_cpu_if *host_cpu_if, *hyp_cpu_if;
+ unsigned int i;
+
+ host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
+ hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
+
+ host_cpu_if->vgic_hcr = hyp_cpu_if->vgic_hcr;
+ host_cpu_if->vgic_vmcr = hyp_cpu_if->vgic_vmcr;
+
+ for (i = 0; i < hyp_cpu_if->used_lrs; i++)
+ host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
+}
+
static void flush_debug_state(struct pkvm_hyp_vcpu *hyp_vcpu)
{
struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
@@ -150,13 +189,7 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
hyp_vcpu->vcpu.arch.vsesr_el2 = host_vcpu->arch.vsesr_el2;
- hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3 = host_vcpu->arch.vgic_cpu.vgic_v3;
-
- /* Bound used_lrs by the number of implemented list registers. */
- hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3.used_lrs =
- min_t(unsigned int,
- hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3.used_lrs,
- hyp_gicv3_nr_lr);
+ flush_hyp_vgic_state(hyp_vcpu);
hyp_vcpu->vcpu.arch.pid = host_vcpu->arch.pid;
}
@@ -164,9 +197,6 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
{
struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
- struct vgic_v3_cpu_if *hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
- struct vgic_v3_cpu_if *host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
- unsigned int i;
fpsimd_sve_sync(&hyp_vcpu->vcpu);
sync_debug_state(hyp_vcpu);
@@ -179,10 +209,7 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
host_vcpu->arch.iflags = hyp_vcpu->vcpu.arch.iflags;
- host_cpu_if->vgic_hcr = hyp_cpu_if->vgic_hcr;
- host_cpu_if->vgic_vmcr = hyp_cpu_if->vgic_vmcr;
- for (i = 0; i < hyp_cpu_if->used_lrs; ++i)
- host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
+ sync_hyp_vgic_state(hyp_vcpu);
}
static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
--
2.39.5
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v3 7/8] KVM: arm64: Add primitives to flush/sync the VGIC state at EL2
2026-06-26 7:04 ` [PATCH v3 7/8] KVM: arm64: Add primitives to flush/sync the VGIC state at EL2 Fuad Tabba
@ 2026-06-26 7:27 ` sashiko-bot
2026-06-26 7:33 ` Fuad Tabba
0 siblings, 1 reply; 18+ messages in thread
From: sashiko-bot @ 2026-06-26 7:27 UTC (permalink / raw)
To: Fuad Tabba; +Cc: Oliver Upton, kvmarm, Marc Zyngier
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Host updates to vgic_vmcr during MMIO emulation are silently discarded and reverted due to missing flush on vCPU entry and unconditional overwrite on vCPU exit.
- [High] flush_hyp_vgic_state() hardcodes vgic_sre to GICv3 values, permanently breaking GICv2 emulation by preventing ICH_VMCR_EL2 from being written.
--
commit 7488e8d4b40f79a0ec0bb726d02c6676c9426897
Author: Marc Zyngier <maz@kernel.org>
KVM: arm64: Add primitives to flush/sync the VGIC state at EL2
This commit introduces flush_hyp_vgic_state() and sync_hyp_vgic_state() to
move per-vCPU VGIC state between host and hypervisor vCPU contexts for
protected VMs. It bounds used_lrs and hardcodes vgic_sre to a fixed value,
replacing the previous direct struct copy.
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index f3233ee343a39..2db56146ec493 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -104,6 +104,45 @@ static void fpsimd_sve_sync(struct kvm_vcpu *vcpu)
> *host_data_ptr(fp_owner) = FP_STATE_HOST_OWNED;
> }
>
> +static void flush_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
> +{
> + struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> + struct vgic_v3_cpu_if *host_cpu_if, *hyp_cpu_if;
> + unsigned int used_lrs, i;
> +
> + host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
> + hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
> +
> + used_lrs = host_cpu_if->used_lrs;
> + used_lrs = min(used_lrs, hyp_gicv3_nr_lr);
> +
> + hyp_cpu_if->vgic_hcr = host_cpu_if->vgic_hcr;
[Severity: High]
Does this code lose host-side updates to vgic_vmcr?
When a guest triggers a GICv2 MMIO trap, the host KVM run loop processes
it synchronously and updates host_cpu_if->vgic_vmcr. Because
flush_hyp_vgic_state() does not copy vgic_vmcr into the hypervisor state,
the hypervisor retains its stale value.
When the guest exits, sync_hyp_vgic_state() copies the stale hypervisor
value back, which will permanently overwrite and clobber the host's update.
Could this lead to broken GICv2 CPU interface emulation and lost interrupts?
> + /* Should be a one-off */
> + hyp_cpu_if->vgic_sre = (ICC_SRE_EL1_DIB |
> + ICC_SRE_EL1_DFB |
> + ICC_SRE_EL1_SRE);
[Severity: High]
Can hardcoding vgic_sre to GICv3 values here break GICv2 emulation?
For VMs using GICv2 emulation on GICv3 hardware, the host sets vgic_sre = 0
to enable compatibility mode. By unconditionally overriding it to GICv3
values here, the !cpu_if->vgic_sre check will evaluate to false in
__vgic_v3_activate_traps(), which prevents the hypervisor from writing to
ICH_VMCR_EL2.
This appears to leave the hardware with a stale or zeroed VMCR and masks
guest interrupts. Is this intended?
> + hyp_cpu_if->used_lrs = used_lrs;
> +
> + for (i = 0; i < used_lrs; i++)
> + hyp_cpu_if->vgic_lr[i] = host_cpu_if->vgic_lr[i];
> +}
> +
> +static void sync_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
> +{
> + struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> + struct vgic_v3_cpu_if *host_cpu_if, *hyp_cpu_if;
> + unsigned int i;
> +
> + host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
> + hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
> +
> + host_cpu_if->vgic_hcr = hyp_cpu_if->vgic_hcr;
> + host_cpu_if->vgic_vmcr = hyp_cpu_if->vgic_vmcr;
[Severity: High]
As mentioned above, this unconditionally overwrites the host's vgic_vmcr
with the hypervisor's copy, which was never flushed from the host on entry.
Could this cause the host's MMIO emulation updates to be discarded?
> +
> + for (i = 0; i < hyp_cpu_if->used_lrs; i++)
> + host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260626070408.3420953-1-fuad.tabba@linux.dev?part=7
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v3 7/8] KVM: arm64: Add primitives to flush/sync the VGIC state at EL2
2026-06-26 7:27 ` sashiko-bot
@ 2026-06-26 7:33 ` Fuad Tabba
0 siblings, 0 replies; 18+ messages in thread
From: Fuad Tabba @ 2026-06-26 7:33 UTC (permalink / raw)
To: sashiko-reviews; +Cc: Oliver Upton, kvmarm, Marc Zyngier
On Fri, 26 Jun 2026 at 08:27, <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [High] Host updates to vgic_vmcr during MMIO emulation are silently discarded and reverted due to missing flush on vCPU entry and unconditional overwrite on vCPU exit.
VMCR is not part of the per-entry flush/sync; it is handled by the
separate restore_vmcr_aprs/save_aprs cadence at vcpu_load/vcpu_put
time. No code path modifies host_cpu_if->vgic_vmcr between
sync_hyp_vgic_state (exit) and save_aprs (put). GICv2 emulation is
not supported under pKVM.
> - [High] flush_hyp_vgic_state() hardcodes vgic_sre to GICv3 values, permanently breaking GICv2 emulation by preventing ICH_VMCR_EL2 from being written.
GICv2 is not supported under pKVM (vgic_v2_probe returns -ENXIO when
is_protected_kvm_enabled()). The hardcoded SRE value matches the
GICv3-only configuration.
/fuad
> --
>
> commit 7488e8d4b40f79a0ec0bb726d02c6676c9426897
> Author: Marc Zyngier <maz@kernel.org>
>
> KVM: arm64: Add primitives to flush/sync the VGIC state at EL2
>
> This commit introduces flush_hyp_vgic_state() and sync_hyp_vgic_state() to
> move per-vCPU VGIC state between host and hypervisor vCPU contexts for
> protected VMs. It bounds used_lrs and hardcodes vgic_sre to a fixed value,
> replacing the previous direct struct copy.
>
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index f3233ee343a39..2db56146ec493 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > @@ -104,6 +104,45 @@ static void fpsimd_sve_sync(struct kvm_vcpu *vcpu)
> > *host_data_ptr(fp_owner) = FP_STATE_HOST_OWNED;
> > }
> >
> > +static void flush_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
> > +{
> > + struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> > + struct vgic_v3_cpu_if *host_cpu_if, *hyp_cpu_if;
> > + unsigned int used_lrs, i;
> > +
> > + host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
> > + hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
> > +
> > + used_lrs = host_cpu_if->used_lrs;
> > + used_lrs = min(used_lrs, hyp_gicv3_nr_lr);
> > +
> > + hyp_cpu_if->vgic_hcr = host_cpu_if->vgic_hcr;
>
> [Severity: High]
> Does this code lose host-side updates to vgic_vmcr?
>
> When a guest triggers a GICv2 MMIO trap, the host KVM run loop processes
> it synchronously and updates host_cpu_if->vgic_vmcr. Because
> flush_hyp_vgic_state() does not copy vgic_vmcr into the hypervisor state,
> the hypervisor retains its stale value.
>
> When the guest exits, sync_hyp_vgic_state() copies the stale hypervisor
> value back, which will permanently overwrite and clobber the host's update.
> Could this lead to broken GICv2 CPU interface emulation and lost interrupts?
>
> > + /* Should be a one-off */
> > + hyp_cpu_if->vgic_sre = (ICC_SRE_EL1_DIB |
> > + ICC_SRE_EL1_DFB |
> > + ICC_SRE_EL1_SRE);
>
> [Severity: High]
> Can hardcoding vgic_sre to GICv3 values here break GICv2 emulation?
>
> For VMs using GICv2 emulation on GICv3 hardware, the host sets vgic_sre = 0
> to enable compatibility mode. By unconditionally overriding it to GICv3
> values here, the !cpu_if->vgic_sre check will evaluate to false in
> __vgic_v3_activate_traps(), which prevents the hypervisor from writing to
> ICH_VMCR_EL2.
>
> This appears to leave the hardware with a stale or zeroed VMCR and masks
> guest interrupts. Is this intended?
>
> > + hyp_cpu_if->used_lrs = used_lrs;
> > +
> > + for (i = 0; i < used_lrs; i++)
> > + hyp_cpu_if->vgic_lr[i] = host_cpu_if->vgic_lr[i];
> > +}
> > +
> > +static void sync_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
> > +{
> > + struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> > + struct vgic_v3_cpu_if *host_cpu_if, *hyp_cpu_if;
> > + unsigned int i;
> > +
> > + host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
> > + hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
> > +
> > + host_cpu_if->vgic_hcr = hyp_cpu_if->vgic_hcr;
> > + host_cpu_if->vgic_vmcr = hyp_cpu_if->vgic_vmcr;
>
> [Severity: High]
> As mentioned above, this unconditionally overwrites the host's vgic_vmcr
> with the hypervisor's copy, which was never flushed from the host on entry.
> Could this cause the host's MMIO emulation updates to be discarded?
>
> > +
> > + for (i = 0; i < hyp_cpu_if->used_lrs; i++)
> > + host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
> > +}
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260626070408.3420953-1-fuad.tabba@linux.dev?part=7
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 8/8] KVM: arm64: Implement lazy vCPU state sync for non-protected guests
2026-06-26 7:04 [PATCH v3 0/8] KVM: arm64: Rework pKVM vCPU state synchronisation Fuad Tabba
` (6 preceding siblings ...)
2026-06-26 7:04 ` [PATCH v3 7/8] KVM: arm64: Add primitives to flush/sync the VGIC state at EL2 Fuad Tabba
@ 2026-06-26 7:04 ` Fuad Tabba
2026-06-26 7:26 ` Vincent Donnefort
2026-06-26 7:29 ` sashiko-bot
7 siblings, 2 replies; 18+ messages in thread
From: Fuad Tabba @ 2026-06-26 7:04 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, kvmarm, linux-arm-kernel,
linux-kernel
Cc: Catalin Marinas, Will Deacon, Joey Gouly, Steffen Eiden,
Suzuki K Poulose, Zenghui Yu, Vincent Donnefort, Quentin Perret,
Sebastian Ene, Hyunwoo Kim, Fuad Tabba
pKVM copies a non-protected guest's register context between the host
and the hypervisor on every world switch, even when the host never
inspects it. Defer the copy: on entry, flush the host context into the
hyp vCPU only when the host marked it dirty (PKVM_HOST_STATE_DIRTY); on
exit, leave it in the hyp vCPU and copy it back only when the host needs
it, via a __pkvm_vcpu_sync_state hypercall or at vcpu put. A protected
guest's context is copied as before, since lazy sync only helps where
the host is trusted to see the guest's registers.
PC and PSTATE are the exception: they are copied back on every exit so
the kvm_exit tracepoint reports the guest's real exit PC, and the run
loop's vcpu_mode_is_bad_32bit() and SError-masking checks evaluate the
guest's current PSTATE rather than the value left by the previous sync.
The host needs the full context when it is about to read it (trap
handling) or write it (the SError injection that writes ESR_EL1). Sync
both from handle_exit_early(), which runs non-preemptible so the loaded
hyp vCPU is stable without a preempt guard.
Signed-off-by: Fuad Tabba <fuad.tabba@linux.dev>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/include/asm/kvm_host.h | 2 +
arch/arm64/kvm/arm.c | 7 +++
arch/arm64/kvm/handle_exit.c | 23 ++++++++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 86 ++++++++++++++++++++++++++++--
5 files changed, 114 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 043495f7fc78b..6e1135b3ded44 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -113,6 +113,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
+ __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_sync_state,
__KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
MARKER(__KVM_HOST_SMCCC_FUNC_MAX)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2faa60df847d2..caa39ee5125f2 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1068,6 +1068,8 @@ struct kvm_vcpu_arch {
#define INCREMENT_PC __vcpu_single_flag(iflags, BIT(1))
/* Target EL/MODE (not a single flag, but let's abuse the macro) */
#define EXCEPT_MASK __vcpu_single_flag(iflags, GENMASK(3, 1))
+/* Host-set: the hyp flushes the non-protected vCPU state in on entry */
+#define PKVM_HOST_STATE_DIRTY __vcpu_single_flag(iflags, BIT(4))
/* Helpers to encode exceptions with minimum fuss */
#define __EXCEPT_MASK_VAL unpack_vcpu_flag(EXCEPT_MASK)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3732ee9eb0d4e..4e89558d80278 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -733,6 +733,10 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
if (is_protected_kvm_enabled()) {
kvm_call_hyp(__vgic_v3_save_aprs, &vcpu->arch.vgic_cpu.vgic_v3);
kvm_call_hyp_nvhe(__pkvm_vcpu_put);
+
+ /* __pkvm_vcpu_put implies a sync of the state */
+ if (!kvm_vm_is_protected(vcpu->kvm))
+ vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
}
kvm_vcpu_put_debug(vcpu);
@@ -964,6 +968,9 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
return ret;
if (is_protected_kvm_enabled()) {
+ /* Start with the vcpu in a dirty state */
+ if (!kvm_vm_is_protected(vcpu->kvm))
+ vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
ret = pkvm_create_hyp_vm(kvm);
if (ret)
return ret;
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 54aedf93c78b6..29108e5c0206e 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -486,9 +486,32 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
}
}
+static void handle_exit_pkvm_state(struct kvm_vcpu *vcpu, int exception_index)
+{
+ int exception_code = ARM_EXCEPTION_CODE(exception_index);
+
+ if (!is_protected_kvm_enabled() || kvm_vm_is_protected(vcpu->kvm))
+ return;
+
+ /*
+ * Sync the context back when the host will read (trap) or write
+ * (SError) it. Preempt-off here, so the loaded hyp vCPU is stable.
+ */
+ if (exception_code == ARM_EXCEPTION_TRAP ||
+ exception_code == ARM_EXCEPTION_EL1_SERROR ||
+ ARM_SERROR_PENDING(exception_index)) {
+ kvm_call_hyp_nvhe(__pkvm_vcpu_sync_state);
+ vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
+ } else {
+ vcpu_clear_flag(vcpu, PKVM_HOST_STATE_DIRTY);
+ }
+}
+
/* For exit types that need handling before we can be preempted */
void handle_exit_early(struct kvm_vcpu *vcpu, int exception_index)
{
+ handle_exit_pkvm_state(vcpu, exception_index);
+
if (ARM_SERROR_PENDING(exception_index)) {
if (this_cpu_has_cap(ARM64_HAS_RAS_EXTN)) {
u64 disr = kvm_vcpu_get_disr(vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 0194965930e61..acf53aae4fe43 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -141,6 +141,48 @@ static void sync_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
}
+static void __copy_vcpu_state(const struct kvm_vcpu *from_vcpu,
+ struct kvm_vcpu *to_vcpu)
+{
+ int i;
+
+ to_vcpu->arch.ctxt.regs = from_vcpu->arch.ctxt.regs;
+ to_vcpu->arch.ctxt.spsr_abt = from_vcpu->arch.ctxt.spsr_abt;
+ to_vcpu->arch.ctxt.spsr_und = from_vcpu->arch.ctxt.spsr_und;
+ to_vcpu->arch.ctxt.spsr_irq = from_vcpu->arch.ctxt.spsr_irq;
+ to_vcpu->arch.ctxt.spsr_fiq = from_vcpu->arch.ctxt.spsr_fiq;
+ to_vcpu->arch.ctxt.fp_regs = from_vcpu->arch.ctxt.fp_regs;
+
+ /*
+ * Copy the sysregs, but don't mess with the timer state which
+ * is directly handled by EL1 and is expected to be preserved.
+ * enum vcpu_sysreg is sparse: VNCR-mapped registers take values
+ * derived from their VNCR page offset, so the timer registers do
+ * not form a contiguous numeric range and must be skipped by name.
+ */
+ for (i = 1; i < NR_SYS_REGS; i++) {
+ switch (i) {
+ case CNTVOFF_EL2:
+ case CNTV_CVAL_EL0:
+ case CNTV_CTL_EL0:
+ case CNTP_CVAL_EL0:
+ case CNTP_CTL_EL0:
+ continue;
+ }
+ to_vcpu->arch.ctxt.sys_regs[i] = from_vcpu->arch.ctxt.sys_regs[i];
+ }
+}
+
+static void sync_hyp_vcpu_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+ __copy_vcpu_state(&hyp_vcpu->vcpu, hyp_vcpu->host_vcpu);
+}
+
+static void flush_hyp_vcpu_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+ __copy_vcpu_state(hyp_vcpu->host_vcpu, &hyp_vcpu->vcpu);
+}
+
static void flush_debug_state(struct pkvm_hyp_vcpu *hyp_vcpu)
{
struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
@@ -170,7 +212,17 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
fpsimd_sve_flush();
flush_debug_state(hyp_vcpu);
- hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
+ /*
+ * If we deal with a non-protected guest and the state is potentially
+ * dirty (from a host perspective), copy the state back into the hyp
+ * vcpu.
+ */
+ if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
+ if (vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY))
+ flush_hyp_vcpu_state(hyp_vcpu);
+ } else {
+ hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
+ }
/* __hyp_running_vcpu must be NULL in a guest context. */
hyp_vcpu->vcpu.arch.ctxt.__hyp_running_vcpu = NULL;
@@ -201,9 +253,13 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
fpsimd_sve_sync(&hyp_vcpu->vcpu);
sync_debug_state(hyp_vcpu);
- host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
-
- host_vcpu->arch.hcr_el2 = hyp_vcpu->vcpu.arch.hcr_el2;
+ if (pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
+ host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
+ } else {
+ /* Keep PC (tracepoint) and PSTATE (vcpu_mode_is_bad_32bit) current. */
+ host_vcpu->arch.ctxt.regs.pc = hyp_vcpu->vcpu.arch.ctxt.regs.pc;
+ host_vcpu->arch.ctxt.regs.pstate = hyp_vcpu->vcpu.arch.ctxt.regs.pstate;
+ }
host_vcpu->arch.fault = hyp_vcpu->vcpu.arch.fault;
@@ -237,8 +293,27 @@ static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
{
struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
- if (hyp_vcpu)
+ if (hyp_vcpu) {
+ struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+
+ if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu) &&
+ !vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY)) {
+ sync_hyp_vcpu_state(hyp_vcpu);
+ }
+
pkvm_put_hyp_vcpu(hyp_vcpu);
+ }
+}
+
+static void handle___pkvm_vcpu_sync_state(struct kvm_cpu_context *host_ctxt)
+{
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+
+ hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+ if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+ return;
+
+ sync_hyp_vcpu_state(hyp_vcpu);
}
static struct kvm_vcpu *__get_host_hyp_vcpus(struct kvm_vcpu *arg,
@@ -869,6 +944,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_finalize_teardown_vm),
HANDLE_FUNC(__pkvm_vcpu_load),
HANDLE_FUNC(__pkvm_vcpu_put),
+ HANDLE_FUNC(__pkvm_vcpu_sync_state),
HANDLE_FUNC(__pkvm_tlb_flush_vmid),
};
--
2.39.5
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v3 8/8] KVM: arm64: Implement lazy vCPU state sync for non-protected guests
2026-06-26 7:04 ` [PATCH v3 8/8] KVM: arm64: Implement lazy vCPU state sync for non-protected guests Fuad Tabba
@ 2026-06-26 7:26 ` Vincent Donnefort
2026-06-26 7:29 ` sashiko-bot
1 sibling, 0 replies; 18+ messages in thread
From: Vincent Donnefort @ 2026-06-26 7:26 UTC (permalink / raw)
To: Fuad Tabba
Cc: Marc Zyngier, Oliver Upton, kvmarm, linux-arm-kernel,
linux-kernel, Catalin Marinas, Will Deacon, Joey Gouly,
Steffen Eiden, Suzuki K Poulose, Zenghui Yu, Quentin Perret,
Sebastian Ene, Hyunwoo Kim, Fuad Tabba
On Fri, Jun 26, 2026 at 08:04:08AM +0100, Fuad Tabba wrote:
> pKVM copies a non-protected guest's register context between the host
> and the hypervisor on every world switch, even when the host never
> inspects it. Defer the copy: on entry, flush the host context into the
> hyp vCPU only when the host marked it dirty (PKVM_HOST_STATE_DIRTY); on
> exit, leave it in the hyp vCPU and copy it back only when the host needs
> it, via a __pkvm_vcpu_sync_state hypercall or at vcpu put. A protected
> guest's context is copied as before, since lazy sync only helps where
> the host is trusted to see the guest's registers.
>
> PC and PSTATE are the exception: they are copied back on every exit so
> the kvm_exit tracepoint reports the guest's real exit PC, and the run
> loop's vcpu_mode_is_bad_32bit() and SError-masking checks evaluate the
> guest's current PSTATE rather than the value left by the previous sync.
>
> The host needs the full context when it is about to read it (trap
> handling) or write it (the SError injection that writes ESR_EL1). Sync
> both from handle_exit_early(), which runs non-preemptible so the loaded
> hyp vCPU is stable without a preempt guard.
>
> Signed-off-by: Fuad Tabba <fuad.tabba@linux.dev>
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
> ---
> arch/arm64/include/asm/kvm_asm.h | 1 +
> arch/arm64/include/asm/kvm_host.h | 2 +
> arch/arm64/kvm/arm.c | 7 +++
> arch/arm64/kvm/handle_exit.c | 23 ++++++++
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 86 ++++++++++++++++++++++++++++--
> 5 files changed, 114 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 043495f7fc78b..6e1135b3ded44 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -113,6 +113,7 @@ enum __kvm_host_smccc_func {
> __KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
> __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
> __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
> + __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_sync_state,
> __KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
>
> MARKER(__KVM_HOST_SMCCC_FUNC_MAX)
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 2faa60df847d2..caa39ee5125f2 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1068,6 +1068,8 @@ struct kvm_vcpu_arch {
> #define INCREMENT_PC __vcpu_single_flag(iflags, BIT(1))
> /* Target EL/MODE (not a single flag, but let's abuse the macro) */
> #define EXCEPT_MASK __vcpu_single_flag(iflags, GENMASK(3, 1))
> +/* Host-set: the hyp flushes the non-protected vCPU state in on entry */
> +#define PKVM_HOST_STATE_DIRTY __vcpu_single_flag(iflags, BIT(4))
>
> /* Helpers to encode exceptions with minimum fuss */
> #define __EXCEPT_MASK_VAL unpack_vcpu_flag(EXCEPT_MASK)
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 3732ee9eb0d4e..4e89558d80278 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -733,6 +733,10 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> if (is_protected_kvm_enabled()) {
> kvm_call_hyp(__vgic_v3_save_aprs, &vcpu->arch.vgic_cpu.vgic_v3);
> kvm_call_hyp_nvhe(__pkvm_vcpu_put);
> +
> + /* __pkvm_vcpu_put implies a sync of the state */
> + if (!kvm_vm_is_protected(vcpu->kvm))
> + vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> }
>
> kvm_vcpu_put_debug(vcpu);
> @@ -964,6 +968,9 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> return ret;
>
> if (is_protected_kvm_enabled()) {
> + /* Start with the vcpu in a dirty state */
> + if (!kvm_vm_is_protected(vcpu->kvm))
> + vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> ret = pkvm_create_hyp_vm(kvm);
> if (ret)
> return ret;
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 54aedf93c78b6..29108e5c0206e 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -486,9 +486,32 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
> }
> }
>
> +static void handle_exit_pkvm_state(struct kvm_vcpu *vcpu, int exception_index)
> +{
> + int exception_code = ARM_EXCEPTION_CODE(exception_index);
> +
> + if (!is_protected_kvm_enabled() || kvm_vm_is_protected(vcpu->kvm))
> + return;
> +
> + /*
> + * Sync the context back when the host will read (trap) or write
> + * (SError) it. Preempt-off here, so the loaded hyp vCPU is stable.
> + */
> + if (exception_code == ARM_EXCEPTION_TRAP ||
> + exception_code == ARM_EXCEPTION_EL1_SERROR ||
> + ARM_SERROR_PENDING(exception_index)) {
> + kvm_call_hyp_nvhe(__pkvm_vcpu_sync_state);
> + vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> + } else {
> + vcpu_clear_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> + }
> +}
> +
> /* For exit types that need handling before we can be preempted */
> void handle_exit_early(struct kvm_vcpu *vcpu, int exception_index)
> {
> + handle_exit_pkvm_state(vcpu, exception_index);
> +
> if (ARM_SERROR_PENDING(exception_index)) {
> if (this_cpu_has_cap(ARM64_HAS_RAS_EXTN)) {
> u64 disr = kvm_vcpu_get_disr(vcpu);
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 0194965930e61..acf53aae4fe43 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -141,6 +141,48 @@ static void sync_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
> host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
> }
>
> +static void __copy_vcpu_state(const struct kvm_vcpu *from_vcpu,
> + struct kvm_vcpu *to_vcpu)
> +{
> + int i;
> +
> + to_vcpu->arch.ctxt.regs = from_vcpu->arch.ctxt.regs;
> + to_vcpu->arch.ctxt.spsr_abt = from_vcpu->arch.ctxt.spsr_abt;
> + to_vcpu->arch.ctxt.spsr_und = from_vcpu->arch.ctxt.spsr_und;
> + to_vcpu->arch.ctxt.spsr_irq = from_vcpu->arch.ctxt.spsr_irq;
> + to_vcpu->arch.ctxt.spsr_fiq = from_vcpu->arch.ctxt.spsr_fiq;
> + to_vcpu->arch.ctxt.fp_regs = from_vcpu->arch.ctxt.fp_regs;
> +
> + /*
> + * Copy the sysregs, but don't mess with the timer state which
> + * is directly handled by EL1 and is expected to be preserved.
> + * enum vcpu_sysreg is sparse: VNCR-mapped registers take values
> + * derived from their VNCR page offset, so the timer registers do
> + * not form a contiguous numeric range and must be skipped by name.
> + */
> + for (i = 1; i < NR_SYS_REGS; i++) {
> + switch (i) {
> + case CNTVOFF_EL2:
> + case CNTV_CVAL_EL0:
> + case CNTV_CTL_EL0:
> + case CNTP_CVAL_EL0:
> + case CNTP_CTL_EL0:
> + continue;
> + }
> + to_vcpu->arch.ctxt.sys_regs[i] = from_vcpu->arch.ctxt.sys_regs[i];
> + }
> +}
> +
> +static void sync_hyp_vcpu_state(struct pkvm_hyp_vcpu *hyp_vcpu)
> +{
> + __copy_vcpu_state(&hyp_vcpu->vcpu, hyp_vcpu->host_vcpu);
> +}
> +
> +static void flush_hyp_vcpu_state(struct pkvm_hyp_vcpu *hyp_vcpu)
> +{
> + __copy_vcpu_state(hyp_vcpu->host_vcpu, &hyp_vcpu->vcpu);
> +}
> +
> static void flush_debug_state(struct pkvm_hyp_vcpu *hyp_vcpu)
> {
> struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> @@ -170,7 +212,17 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> fpsimd_sve_flush();
> flush_debug_state(hyp_vcpu);
>
> - hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
> + /*
> + * If we deal with a non-protected guest and the state is potentially
> + * dirty (from a host perspective), copy the state back into the hyp
> + * vcpu.
> + */
> + if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
> + if (vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY))
> + flush_hyp_vcpu_state(hyp_vcpu);
> + } else {
> + hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
> + }
>
> /* __hyp_running_vcpu must be NULL in a guest context. */
> hyp_vcpu->vcpu.arch.ctxt.__hyp_running_vcpu = NULL;
> @@ -201,9 +253,13 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> fpsimd_sve_sync(&hyp_vcpu->vcpu);
> sync_debug_state(hyp_vcpu);
>
> - host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
> -
> - host_vcpu->arch.hcr_el2 = hyp_vcpu->vcpu.arch.hcr_el2;
> + if (pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
> + host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
> + } else {
> + /* Keep PC (tracepoint) and PSTATE (vcpu_mode_is_bad_32bit) current. */
> + host_vcpu->arch.ctxt.regs.pc = hyp_vcpu->vcpu.arch.ctxt.regs.pc;
> + host_vcpu->arch.ctxt.regs.pstate = hyp_vcpu->vcpu.arch.ctxt.regs.pstate;
> + }
>
> host_vcpu->arch.fault = hyp_vcpu->vcpu.arch.fault;
>
> @@ -237,8 +293,27 @@ static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
> {
> struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
>
> - if (hyp_vcpu)
> + if (hyp_vcpu) {
> + struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> +
> + if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu) &&
> + !vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY)) {
> + sync_hyp_vcpu_state(hyp_vcpu);
> + }
> +
> pkvm_put_hyp_vcpu(hyp_vcpu);
> + }
> +}
> +
> +static void handle___pkvm_vcpu_sync_state(struct kvm_cpu_context *host_ctxt)
> +{
> + struct pkvm_hyp_vcpu *hyp_vcpu;
> +
> + hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
> + if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> + return;
> +
> + sync_hyp_vcpu_state(hyp_vcpu);
> }
>
> static struct kvm_vcpu *__get_host_hyp_vcpus(struct kvm_vcpu *arg,
> @@ -869,6 +944,7 @@ static const hcall_t host_hcall[] = {
> HANDLE_FUNC(__pkvm_finalize_teardown_vm),
> HANDLE_FUNC(__pkvm_vcpu_load),
> HANDLE_FUNC(__pkvm_vcpu_put),
> + HANDLE_FUNC(__pkvm_vcpu_sync_state),
> HANDLE_FUNC(__pkvm_tlb_flush_vmid),
> };
>
> --
> 2.39.5
>
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v3 8/8] KVM: arm64: Implement lazy vCPU state sync for non-protected guests
2026-06-26 7:04 ` [PATCH v3 8/8] KVM: arm64: Implement lazy vCPU state sync for non-protected guests Fuad Tabba
2026-06-26 7:26 ` Vincent Donnefort
@ 2026-06-26 7:29 ` sashiko-bot
2026-06-26 7:36 ` Fuad Tabba
1 sibling, 1 reply; 18+ messages in thread
From: sashiko-bot @ 2026-06-26 7:29 UTC (permalink / raw)
To: Fuad Tabba; +Cc: Oliver Upton, Marc Zyngier, kvmarm
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [High] Asynchronous host modifications to non-protected guest sysregs (e.g., PMU overflows updating PMOVSSET_EL0) are lost because lazy vCPU state sync clears the dirty flag on IRQ exits and later overwrites the host's updates with stale hypervisor state.
- [High] The patch removes the synchronization of `hcr_el2` from the hypervisor back to the host on guest exit, breaking the delivery tracking of virtual SErrors and causing infinite virtual SError loops.
Pre-existing issues:
- [Critical] pKVM completely breaks register confidentiality and integrity for protected VMs by copying the entire execution context (`arch.ctxt`) to and from untrusted host memory on every world switch.
--
commit 44f38b89413b112e6bbd7b360cf03c7937ed43a4
Author: Fuad Tabba <fuad.tabba@linux.dev>
KVM: arm64: Implement lazy vCPU state sync for non-protected guests
pKVM copies a non-protected guest's register context between the host
and the hypervisor on every world switch. This commit defers the copy
by flushing context on entry only if marked dirty, and syncing back
on exit only when requested.
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 54aedf93c78b6..29108e5c0206e 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
[ ... ]
> @@ -486,9 +486,32 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
> }
> }
>
> +static void handle_exit_pkvm_state(struct kvm_vcpu *vcpu, int exception_index)
> +{
> + int exception_code = ARM_EXCEPTION_CODE(exception_index);
> +
> + if (!is_protected_kvm_enabled() || kvm_vm_is_protected(vcpu->kvm))
> + return;
> +
> + /*
> + * Sync the context back when the host will read (trap) or write
> + * (SError) it. Preempt-off here, so the loaded hyp vCPU is stable.
> + */
> + if (exception_code == ARM_EXCEPTION_TRAP ||
> + exception_code == ARM_EXCEPTION_EL1_SERROR ||
> + ARM_SERROR_PENDING(exception_index)) {
> + kvm_call_hyp_nvhe(__pkvm_vcpu_sync_state);
> + vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> + } else {
> + vcpu_clear_flag(vcpu, PKVM_HOST_STATE_DIRTY);
[Severity: High]
Is it possible for asynchronous host updates to be lost here? If a physical
IRQ (like a PMU overflow) causes a guest exit, this else branch clears
PKVM_HOST_STATE_DIRTY.
If the host IRQ handler then updates PMOVSSET_EL0 in the host's
vcpu->arch.ctxt, the next guest entry in flush_hyp_vcpu() will skip
copying sys_regs to the hyp vCPU. The guest will see stale state.
If the guest later triggers a TRAP exit, __pkvm_vcpu_sync_state() is
called, which copies the stale hypervisor sys_regs back to the host,
permanently erasing the host's asynchronous update.
> + }
> +}
[ ... ]
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 2db56146ec493..4a8445b636e60 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
[ ... ]
> @@ -172,7 +214,17 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> fpsimd_sve_flush();
> flush_debug_state(hyp_vcpu);
>
> - hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
> + /*
> + * If we deal with a non-protected guest and the state is potentially
> + * dirty (from a host perspective), copy the state back into the hyp
> + * vcpu.
> + */
> + if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
> + if (vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY))
> + flush_hyp_vcpu_state(hyp_vcpu);
> + } else {
> + hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
[Severity: Critical]
This is a pre-existing issue, but does trusting arch.ctxt violate the pKVM
threat model? Copying the entire execution context directly from untrusted
host memory into the protected guest context on entry breaks integrity,
allowing the host to arbitrarily modify a protected guest's registers.
> + }
>
> /* __hyp_running_vcpu must be NULL in a guest context. */
> hyp_vcpu->vcpu.arch.ctxt.__hyp_running_vcpu = NULL;
[ ... ]
> @@ -203,9 +255,13 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> fpsimd_sve_sync(&hyp_vcpu->vcpu);
> sync_debug_state(hyp_vcpu);
>
> - host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
> -
> - host_vcpu->arch.hcr_el2 = hyp_vcpu->vcpu.arch.hcr_el2;
[Severity: High]
Could removing the hcr_el2 sync result in infinite virtual SError loops?
When the host injects a virtual SError by setting HCR_VSE, flush_hyp_vcpu()
flows this bit to the hyp vCPU. The hardware clears the physical
HCR_EL2.VSE bit when the guest takes the virtual SError, and KVM clears it
in the hyp vCPU's hcr_el2.
Without syncing this back to the host here, the host's hcr_el2 retains
the stale HCR_VSE=1. On the next entry, flush_hyp_vcpu() will re-apply the
stale HCR_VSE bit and re-inject the SError continuously.
> + if (pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
> + host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
[Severity: Critical]
This is a pre-existing issue, but can writing arch.ctxt out to the host
leak sensitive register state? For protected VMs, writing the private guest
context directly out to the untrusted host on exit breaks confidentiality.
> + } else {
> + /* Keep PC (tracepoint) and PSTATE (vcpu_mode_is_bad_32bit) current. */
> + host_vcpu->arch.ctxt.regs.pc = hyp_vcpu->vcpu.arch.ctxt.regs.pc;
> + host_vcpu->arch.ctxt.regs.pstate = hyp_vcpu->vcpu.arch.ctxt.regs.pstate;
> + }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260626070408.3420953-1-fuad.tabba@linux.dev?part=8
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v3 8/8] KVM: arm64: Implement lazy vCPU state sync for non-protected guests
2026-06-26 7:29 ` sashiko-bot
@ 2026-06-26 7:36 ` Fuad Tabba
0 siblings, 0 replies; 18+ messages in thread
From: Fuad Tabba @ 2026-06-26 7:36 UTC (permalink / raw)
To: sashiko-reviews; +Cc: Oliver Upton, Marc Zyngier, kvmarm
On Fri, 26 Jun 2026 at 08:29, <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
>
> New issues:
> - [High] Asynchronous host modifications to non-protected guest sysregs (e.g., PMU overflows updating PMOVSSET_EL0) are lost because lazy vCPU state sync clears the dirty flag on IRQ exits and later overwrites the host's updates with stale hypervisor state.
No code path modifies vcpu->arch.ctxt.sys_regs[] during host IRQ
handling. Guest PMU state changes go through the trap emulation path
(ARM_EXCEPTION_TRAP), which triggers the sync.
> - [High] The patch removes the synchronization of `hcr_el2` from the hypervisor back to the host on guest exit, breaking the delivery tracking of virtual SErrors and causing infinite virtual SError loops.
HCR_VSE is synced back explicitly (hyp-main.c, sync_hyp_vcpu):
host_vcpu->arch.hcr_el2 &= ~HCR_VSE;
host_vcpu->arch.hcr_el2 |= hyp_vcpu->vcpu.arch.hcr_el2 & HCR_VSE;
The full hcr_el2 copy was replaced by targeted per-bit sync, not
removed.
>
> Pre-existing issues:
> - [Critical] pKVM completely breaks register confidentiality and integrity for protected VMs by copying the entire execution context (`arch.ctxt`) to and from untrusted host memory on every world switch.
This is the stated scope of the follow-up series, as described in
the cover letter: "The follow-up completes the job for protected
VMs: state isolation, PSCI handling at EL2, and the resulting API
behaviour."
And that's a wrap!
/fuad
> --
>
> commit 44f38b89413b112e6bbd7b360cf03c7937ed43a4
> Author: Fuad Tabba <fuad.tabba@linux.dev>
>
> KVM: arm64: Implement lazy vCPU state sync for non-protected guests
>
> pKVM copies a non-protected guest's register context between the host
> and the hypervisor on every world switch. This commit defers the copy
> by flushing context on entry only if marked dirty, and syncing back
> on exit only when requested.
>
> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > index 54aedf93c78b6..29108e5c0206e 100644
> > --- a/arch/arm64/kvm/handle_exit.c
> > +++ b/arch/arm64/kvm/handle_exit.c
> [ ... ]
> > @@ -486,9 +486,32 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
> > }
> > }
> >
> > +static void handle_exit_pkvm_state(struct kvm_vcpu *vcpu, int exception_index)
> > +{
> > + int exception_code = ARM_EXCEPTION_CODE(exception_index);
> > +
> > + if (!is_protected_kvm_enabled() || kvm_vm_is_protected(vcpu->kvm))
> > + return;
> > +
> > + /*
> > + * Sync the context back when the host will read (trap) or write
> > + * (SError) it. Preempt-off here, so the loaded hyp vCPU is stable.
> > + */
> > + if (exception_code == ARM_EXCEPTION_TRAP ||
> > + exception_code == ARM_EXCEPTION_EL1_SERROR ||
> > + ARM_SERROR_PENDING(exception_index)) {
> > + kvm_call_hyp_nvhe(__pkvm_vcpu_sync_state);
> > + vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> > + } else {
> > + vcpu_clear_flag(vcpu, PKVM_HOST_STATE_DIRTY);
>
> [Severity: High]
> Is it possible for asynchronous host updates to be lost here? If a physical
> IRQ (like a PMU overflow) causes a guest exit, this else branch clears
> PKVM_HOST_STATE_DIRTY.
>
> If the host IRQ handler then updates PMOVSSET_EL0 in the host's
> vcpu->arch.ctxt, the next guest entry in flush_hyp_vcpu() will skip
> copying sys_regs to the hyp vCPU. The guest will see stale state.
>
> If the guest later triggers a TRAP exit, __pkvm_vcpu_sync_state() is
> called, which copies the stale hypervisor sys_regs back to the host,
> permanently erasing the host's asynchronous update.
>
> > + }
> > +}
> [ ... ]
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index 2db56146ec493..4a8445b636e60 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> [ ... ]
> > @@ -172,7 +214,17 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> > fpsimd_sve_flush();
> > flush_debug_state(hyp_vcpu);
> >
> > - hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
> > + /*
> > + * If we deal with a non-protected guest and the state is potentially
> > + * dirty (from a host perspective), copy the state back into the hyp
> > + * vcpu.
> > + */
> > + if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
> > + if (vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY))
> > + flush_hyp_vcpu_state(hyp_vcpu);
> > + } else {
> > + hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
>
> [Severity: Critical]
> This is a pre-existing issue, but does trusting arch.ctxt violate the pKVM
> threat model? Copying the entire execution context directly from untrusted
> host memory into the protected guest context on entry breaks integrity,
> allowing the host to arbitrarily modify a protected guest's registers.
>
> > + }
> >
> > /* __hyp_running_vcpu must be NULL in a guest context. */
> > hyp_vcpu->vcpu.arch.ctxt.__hyp_running_vcpu = NULL;
> [ ... ]
> > @@ -203,9 +255,13 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> > fpsimd_sve_sync(&hyp_vcpu->vcpu);
> > sync_debug_state(hyp_vcpu);
> >
> > - host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
> > -
> > - host_vcpu->arch.hcr_el2 = hyp_vcpu->vcpu.arch.hcr_el2;
>
> [Severity: High]
> Could removing the hcr_el2 sync result in infinite virtual SError loops?
>
> When the host injects a virtual SError by setting HCR_VSE, flush_hyp_vcpu()
> flows this bit to the hyp vCPU. The hardware clears the physical
> HCR_EL2.VSE bit when the guest takes the virtual SError, and KVM clears it
> in the hyp vCPU's hcr_el2.
>
> Without syncing this back to the host here, the host's hcr_el2 retains
> the stale HCR_VSE=1. On the next entry, flush_hyp_vcpu() will re-apply the
> stale HCR_VSE bit and re-inject the SError continuously.
>
> > + if (pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
> > + host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
>
> [Severity: Critical]
> This is a pre-existing issue, but can writing arch.ctxt out to the host
> leak sensitive register state? For protected VMs, writing the private guest
> context directly out to the untrusted host on exit breaks confidentiality.
>
> > + } else {
> > + /* Keep PC (tracepoint) and PSTATE (vcpu_mode_is_bad_32bit) current. */
> > + host_vcpu->arch.ctxt.regs.pc = hyp_vcpu->vcpu.arch.ctxt.regs.pc;
> > + host_vcpu->arch.ctxt.regs.pstate = hyp_vcpu->vcpu.arch.ctxt.regs.pstate;
> > + }
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260626070408.3420953-1-fuad.tabba@linux.dev?part=8
^ permalink raw reply [flat|nested] 18+ messages in thread