* [PATCH 0/3] KVM: arm64: Load the stage-2 MMU from vcpu_load() for VHE
From: Oliver Upton @ 2023-10-06 9:35 UTC (permalink / raw)
To: kvmarm
Cc: kvm, Marc Zyngier, James Morse, Suzuki K Poulose, Zenghui Yu,
Oliver Upton
Unlike nVHE, there is no need to switch the stage-2 MMU around on guest
entry/exit in VHE mode, as the host runs at EL2. Despite this, KVM
reloads the stage-2 MMU context on every guest entry, which is needless.
This series moves the setup of the stage-2 MMU context to vcpu_load()
when running in VHE mode. This is likely to be a win across the board,
but also allows us to remove an ISB on the guest entry path for systems
with one of the speculative AT errata.
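For the impatient, the end state looks roughly like this (a sketch
assembled from the patches that follow; see patch 3 for the real thing):
void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu)
{
	/* Switch EL0/EL1 sysregs and trap controls once, at load time... */
	__vcpu_load_switch_sysregs(vcpu);
	__vcpu_load_activate_traps(vcpu);
	/* ...and keep the stage-2 MMU context loaded until vcpu_put(). */
	__load_stage2(vcpu->arch.hw_mmu, vcpu->arch.hw_mmu->arch);
}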
None of my machines affected by the AT errata are VHE-capable, so it'd
be appreciated if someone could give this series a go and make sure I
haven't wrecked anything.
Oliver Upton (3):
KVM: arm64: Don't zero VTTBR in __tlb_switch_to_host()
KVM: arm64: Rename helpers for VHE vCPU load/put
KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe()
arch/arm64/include/asm/kvm_host.h | 4 ++--
arch/arm64/include/asm/kvm_hyp.h | 2 ++
arch/arm64/kvm/arm.c | 4 ++--
arch/arm64/kvm/hyp/vhe/switch.c | 33 ++++++++++++++++++------------
arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 11 ++++------
arch/arm64/kvm/hyp/vhe/tlb.c | 1 -
6 files changed, 30 insertions(+), 25 deletions(-)
base-commit: 6465e260f48790807eef06b583b38ca9789b6072
--
2.42.0.609.gbb76f46606-goog
* [PATCH 1/3] KVM: arm64: Don't zero VTTBR in __tlb_switch_to_host()
From: Oliver Upton @ 2023-10-06 9:35 UTC (permalink / raw)
To: kvmarm
Cc: kvm, Marc Zyngier, James Morse, Suzuki K Poulose, Zenghui Yu,
Oliver Upton
HCR_EL2.TGE=0 is sufficient to disable stage-2 translation, so there's
no need to explicitly zero VTTBR_EL2. Note that this is exactly what we
do on the guest exit path in __kvm_vcpu_run_vhe() already.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
arch/arm64/kvm/hyp/vhe/tlb.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index 46bd43f61d76..f3f2e142e4f4 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -66,7 +66,6 @@ static void __tlb_switch_to_host(struct tlb_inv_context *cxt)
* We're done with the TLB operation, let's restore the host's
* view of HCR_EL2.
*/
- write_sysreg(0, vttbr_el2);
write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
isb();
--
2.42.0.609.gbb76f46606-goog
* [PATCH 2/3] KVM: arm64: Rename helpers for VHE vCPU load/put
From: Oliver Upton @ 2023-10-06 9:35 UTC (permalink / raw)
To: kvmarm
Cc: kvm, Marc Zyngier, James Morse, Suzuki K Poulose, Zenghui Yu,
Oliver Upton
The names for the helpers we expose to the general KVM code are a bit
imprecise; they switch the EL0 + EL1 sysreg context and set up trap
controls that do not need to change on every guest entry/exit. Rename +
shuffle things around a bit in preparation for loading the stage-2 MMU
context on vcpu_load().
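To summarize the end state (taken from the diff below):
	kvm_vcpu_load_vhe()                [new entry point]
	  __vcpu_load_switch_sysregs()     [was kvm_vcpu_load_sysregs_vhe()]
	  __vcpu_load_activate_traps()     [was activate_traps_vhe_load()]
	kvm_vcpu_put_vhe()                 [new entry point]
	  __vcpu_put_deactivate_traps()    [was deactivate_traps_vhe_put()]
	  __vcpu_put_switch_sysregs()      [was kvm_vcpu_put_sysregs_vhe()]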
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
arch/arm64/include/asm/kvm_host.h | 4 ++--
arch/arm64/include/asm/kvm_hyp.h | 2 ++
arch/arm64/kvm/arm.c | 4 ++--
arch/arm64/kvm/hyp/vhe/switch.c | 18 +++++++++++++++---
arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 11 ++++-------
5 files changed, 25 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index af06ccb7ee34..e2d38c7d6555 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1109,8 +1109,8 @@ static inline bool kvm_set_pmuserenr(u64 val)
}
#endif
-void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu);
-void kvm_vcpu_put_sysregs_vhe(struct kvm_vcpu *vcpu);
+void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu);
+void kvm_vcpu_put_vhe(struct kvm_vcpu *vcpu);
int __init kvm_set_ipa_limit(void);
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 66efd67ea7e8..80c779d1307a 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -93,6 +93,8 @@ void __timer_disable_traps(struct kvm_vcpu *vcpu);
void __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt);
void __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt);
#else
+void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu);
+void __vcpu_put_switch_sysregs(struct kvm_vcpu *vcpu);
void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt);
void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt);
void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 4866b3f7b4ea..39c969c05990 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -448,7 +448,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_vgic_load(vcpu);
kvm_timer_vcpu_load(vcpu);
if (has_vhe())
- kvm_vcpu_load_sysregs_vhe(vcpu);
+ kvm_vcpu_load_vhe(vcpu);
kvm_arch_vcpu_load_fp(vcpu);
kvm_vcpu_pmu_restore_guest(vcpu);
if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
@@ -472,7 +472,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
kvm_arch_vcpu_put_debug_state_flags(vcpu);
kvm_arch_vcpu_put_fp(vcpu);
if (has_vhe())
- kvm_vcpu_put_sysregs_vhe(vcpu);
+ kvm_vcpu_put_vhe(vcpu);
kvm_timer_vcpu_put(vcpu);
kvm_vgic_put(vcpu);
kvm_vcpu_pmu_restore_host(vcpu);
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 6537f58b1a8c..d05b6a08dcde 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -93,12 +93,12 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
NOKPROBE_SYMBOL(__deactivate_traps);
/*
- * Disable IRQs in {activate,deactivate}_traps_vhe_{load,put}() to
+ * Disable IRQs in __vcpu_{load,put}_{activate,deactivate}_traps() to
* prevent a race condition between context switching of PMUSERENR_EL0
* in __{activate,deactivate}_traps_common() and IPIs that attempts to
* update PMUSERENR_EL0. See also kvm_set_pmuserenr().
*/
-void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
+static void __vcpu_load_activate_traps(struct kvm_vcpu *vcpu)
{
unsigned long flags;
@@ -107,7 +107,7 @@ void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
local_irq_restore(flags);
}
-void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu)
+static void __vcpu_put_deactivate_traps(struct kvm_vcpu *vcpu)
{
unsigned long flags;
@@ -116,6 +116,18 @@ void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu)
local_irq_restore(flags);
}
+void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu)
+{
+ __vcpu_load_switch_sysregs(vcpu);
+ __vcpu_load_activate_traps(vcpu);
+}
+
+void kvm_vcpu_put_vhe(struct kvm_vcpu *vcpu)
+{
+ __vcpu_put_deactivate_traps(vcpu);
+ __vcpu_put_switch_sysregs(vcpu);
+}
+
static const exit_handler_fn hyp_exit_handlers[] = {
[0 ... ESR_ELx_EC_MAX] = NULL,
[ESR_ELx_EC_CP15_32] = kvm_hyp_handle_cp15_32,
diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
index b35a178e7e0d..8e1e0d5033b6 100644
--- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
@@ -52,7 +52,7 @@ void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt)
NOKPROBE_SYMBOL(sysreg_restore_guest_state_vhe);
/**
- * kvm_vcpu_load_sysregs_vhe - Load guest system registers to the physical CPU
+ * __vcpu_load_switch_sysregs - Load guest system registers to the physical CPU
*
* @vcpu: The VCPU pointer
*
@@ -62,7 +62,7 @@ NOKPROBE_SYMBOL(sysreg_restore_guest_state_vhe);
* and loading system register state early avoids having to load them on
* every entry to the VM.
*/
-void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu)
+void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
struct kvm_cpu_context *host_ctxt;
@@ -92,12 +92,10 @@ void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu)
__sysreg_restore_el1_state(guest_ctxt);
vcpu_set_flag(vcpu, SYSREGS_ON_CPU);
-
- activate_traps_vhe_load(vcpu);
}
/**
- * kvm_vcpu_put_sysregs_vhe - Restore host system registers to the physical CPU
+ * __vcpu_put_switch_syregs - Restore host system registers to the physical CPU
*
* @vcpu: The VCPU pointer
*
@@ -107,13 +105,12 @@ void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu)
* and deferring saving system register state until we're no longer running the
* VCPU avoids having to save them on every exit from the VM.
*/
-void kvm_vcpu_put_sysregs_vhe(struct kvm_vcpu *vcpu)
+void __vcpu_put_switch_sysregs(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
struct kvm_cpu_context *host_ctxt;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
- deactivate_traps_vhe_put(vcpu);
__sysreg_save_el1_state(guest_ctxt);
__sysreg_save_user_state(guest_ctxt);
--
2.42.0.609.gbb76f46606-goog
* [PATCH 3/3] KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe()
From: Oliver Upton @ 2023-10-06 9:36 UTC (permalink / raw)
To: kvmarm
Cc: kvm, Marc Zyngier, James Morse, Suzuki K Poulose, Zenghui Yu,
Oliver Upton
To date, the VHE code has aggressively reloaded the stage-2 MMU context
on every guest entry, despite the fact that this isn't necessary. This
was probably done for consistency with the nVHE code, which needs to
switch in/out the stage-2 MMU context as both the host and guest run at
EL1.
Hoist __load_stage2() into kvm_vcpu_load_vhe(), thus avoiding a reload
on every guest entry/exit. This is likely to be beneficial to systems
with one of the speculative AT errata, as there is now one less context
synchronization event on the guest entry path. Additionally, it is
possible that implementations have hitched correctness mitigations to
writes to VTTBR_EL2, which are now elided on guest re-entry.
Note that __tlb_switch_to_guest() is deliberately left untouched as it
can be called outside the context of a running vCPU.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
arch/arm64/kvm/hyp/vhe/switch.c | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index d05b6a08dcde..b0cafd7c5f8f 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -120,6 +120,7 @@ void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu)
{
__vcpu_load_switch_sysregs(vcpu);
__vcpu_load_activate_traps(vcpu);
+ __load_stage2(vcpu->arch.hw_mmu, vcpu->arch.hw_mmu->arch);
}
void kvm_vcpu_put_vhe(struct kvm_vcpu *vcpu)
@@ -182,17 +183,11 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
sysreg_save_host_state_vhe(host_ctxt);
/*
- * ARM erratum 1165522 requires us to configure both stage 1 and
- * stage 2 translation for the guest context before we clear
- * HCR_EL2.TGE.
- *
- * We have already configured the guest's stage 1 translation in
- * kvm_vcpu_load_sysregs_vhe above. We must now call
- * __load_stage2 before __activate_traps, because
- * __load_stage2 configures stage 2 translation, and
- * __activate_traps clear HCR_EL2.TGE (among other things).
+ * Note that ARM erratum 1165522 requires us to configure both stage 1
+ * and stage 2 translation for the guest context before we clear
+ * HCR_EL2.TGE. The stage 1 and stage 2 guest context has already been
+ * loaded on the CPU in kvm_vcpu_load_vhe().
*/
- __load_stage2(vcpu->arch.hw_mmu, vcpu->arch.hw_mmu->arch);
__activate_traps(vcpu);
__kvm_adjust_pc(vcpu);
--
2.42.0.609.gbb76f46606-goog
* Re: [PATCH 1/3] KVM: arm64: Don't zero VTTBR in __tlb_switch_to_host()
From: Oliver Upton @ 2023-10-06 9:41 UTC (permalink / raw)
To: kvmarm; +Cc: kvm, Marc Zyngier, James Morse, Suzuki K Poulose, Zenghui Yu
On Fri, Oct 06, 2023 at 09:35:58AM +0000, Oliver Upton wrote:
> HCR_EL2.TGE=0 is sufficient to disable stage-2 translation, so there's
TGE=1, obviously :) (With TGE set, the host's translation regime has no
stage 2, so whatever is left in VTTBR_EL2 is simply ignored.)
> no need to explicitly zero VTTBR_EL2. Note that this is exactly what we
> do on the guest exit path in __kvm_vcpu_run_vhe() already.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
> arch/arm64/kvm/hyp/vhe/tlb.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
> index 46bd43f61d76..f3f2e142e4f4 100644
> --- a/arch/arm64/kvm/hyp/vhe/tlb.c
> +++ b/arch/arm64/kvm/hyp/vhe/tlb.c
> @@ -66,7 +66,6 @@ static void __tlb_switch_to_host(struct tlb_inv_context *cxt)
> * We're done with the TLB operation, let's restore the host's
> * view of HCR_EL2.
> */
> - write_sysreg(0, vttbr_el2);
> write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
> isb();
>
> --
> 2.42.0.609.gbb76f46606-goog
>
--
Thanks,
Oliver
* Re: [PATCH 0/3] KVM: arm64: Load the stage-2 MMU from vcpu_load() for VHE
From: Marc Zyngier @ 2023-10-06 11:13 UTC (permalink / raw)
To: Oliver Upton; +Cc: kvmarm, kvm, James Morse, Suzuki K Poulose, Zenghui Yu
On Fri, 06 Oct 2023 10:35:57 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Unlike nVHE, there is no need to switch the stage-2 MMU around on guest
> entry/exit in VHE mode, as the host runs at EL2. Despite this, KVM
> reloads the stage-2 MMU context on every guest entry, which is needless.
>
> This series moves the setup of the stage-2 MMU context to vcpu_load()
> when running in VHE mode. This is likely to be a win across the board,
> but also allows us to remove an ISB on the guest entry path for systems
> with one of the speculative AT errata.
>
> None of my machines affected by the AT errata are VHE-capable, so it'd
> be appreciated if someone could give this series a go and make sure I
> haven't wrecked anything.
It totally breaks on my A55 board. Running a single guest seems OK,
but running a number of them concurrently makes them explode early on
(faults in EFI...)
I guess we end up running with the wrong VTTBR at times, which would
be interesting...
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH 0/3] KVM: arm64: Load the stage-2 MMU from vcpu_load() for VHE
From: Marc Zyngier @ 2023-10-06 12:34 UTC (permalink / raw)
To: Oliver Upton; +Cc: kvmarm, kvm, James Morse, Suzuki K Poulose, Zenghui Yu
On Fri, 06 Oct 2023 12:13:00 +0100,
Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 06 Oct 2023 10:35:57 +0100,
> Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > Unlike nVHE, there is no need to switch the stage-2 MMU around on guest
> > entry/exit in VHE mode, as the host runs at EL2. Despite this, KVM
> > reloads the stage-2 MMU context on every guest entry, which is needless.
> >
> > This series moves the setup of the stage-2 MMU context to vcpu_load()
> > when running in VHE mode. This is likely to be a win across the board,
> > but also allows us to remove an ISB on the guest entry path for systems
> > with one of the speculative AT errata.
> >
> > None of my machines affected by the AT errata are VHE-capable, so it'd
> > be appreciated if someone could give this series a go and make sure I
> > haven't wrecked anything.
>
> It totally breaks on my A55 board. Running a single guest seems OK,
> but running a number of them concurrently makes them explode early on
> (faults in EFI...)
>
> I guess we end up running with the wrong VTTBR at times, which would
> be interesting...
Fun fact:
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index b0cafd7c5f8f..40c84db5884a 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -195,6 +195,11 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
sysreg_restore_guest_state_vhe(guest_ctxt);
__debug_switch_to_guest(vcpu);
+ WARN_ONCE(kvm_get_vttbr(vcpu->arch.hw_mmu) != read_sysreg(vttbr_el2),
+ "Oh crap %llx vs %llx\n",
+ kvm_get_vttbr(vcpu->arch.hw_mmu),
+ read_sysreg(vttbr_el2));
+
if (is_hyp_ctxt(vcpu))
vcpu_set_flag(vcpu, VCPU_HYP_CONTEXT);
else
[ 36.190355] Oh crap 10000057a6001 vs 57a6001
My bet is that the VMID isn't allocated on first load, and everything
goes downhill from there.
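Decoding the two values (assuming 16-bit VMIDs, so the VMID lives in
VTTBR_EL2[63:48], BADDR in [47:1] and CnP in bit 0):
	0x0001_0000_057a_6001 -> VMID 1, BADDR 0x57a6000, CnP
	0x0000_0000_057a_6001 -> VMID 0, BADDR 0x57a6000, CnP
Same page tables, but the CPU is still running on the stage-2 that was
loaded with the not-yet-allocated VMID 0 at vcpu_load() time.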
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH 0/3] KVM: arm64: Load the stage-2 MMU from vcpu_load() for VHE
From: Marc Zyngier @ 2023-10-06 13:33 UTC (permalink / raw)
To: Oliver Upton; +Cc: kvmarm, kvm, James Morse, Suzuki K Poulose, Zenghui Yu
On Fri, 06 Oct 2023 13:34:34 +0100,
Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 06 Oct 2023 12:13:00 +0100,
> Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Fri, 06 Oct 2023 10:35:57 +0100,
> > Oliver Upton <oliver.upton@linux.dev> wrote:
> > >
> > > Unlike nVHE, there is no need to switch the stage-2 MMU around on guest
> > > entry/exit in VHE mode, as the host runs at EL2. Despite this, KVM
> > > reloads the stage-2 MMU context on every guest entry, which is needless.
> > >
> > > This series moves the setup of the stage-2 MMU context to vcpu_load()
> > > when running in VHE mode. This is likely to be a win across the board,
> > > but also allows us to remove an ISB on the guest entry path for systems
> > > with one of the speculative AT errata.
> > >
> > > None of my machines affected by the AT errata are VHE-capable, so it'd
> > > be appreciated if someone could give this series a go and make sure I
> > > haven't wrecked anything.
> >
> > It totally breaks on my A55 board. Running a single guest seems OK,
> > but running a number of them concurrently makes them explode early on
> > (faults in EFI...)
> >
> > I guess we end up running with the wrong VTTBR at times, which would
> > be interesting...
>
> Fun fact:
>
> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> index b0cafd7c5f8f..40c84db5884a 100644
> --- a/arch/arm64/kvm/hyp/vhe/switch.c
> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> @@ -195,6 +195,11 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> sysreg_restore_guest_state_vhe(guest_ctxt);
> __debug_switch_to_guest(vcpu);
>
> + WARN_ONCE(kvm_get_vttbr(vcpu->arch.hw_mmu) != read_sysreg(vttbr_el2),
> + "Oh crap %llx vs %llx\n",
> + kvm_get_vttbr(vcpu->arch.hw_mmu),
> + read_sysreg(vttbr_el2));
> +
> if (is_hyp_ctxt(vcpu))
> vcpu_set_flag(vcpu, VCPU_HYP_CONTEXT);
> else
>
> [ 36.190355] Oh crap 10000057a6001 vs 57a6001
>
> My bet is that the VMID isn't allocated on first load, and everything
> goes downhill from there.
So I was correct that the VMID isn't allocated on the first run, and
the following patch should address that particular problem:
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e2d38c7d6555..759adee42018 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1025,7 +1025,7 @@ int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
extern unsigned int __ro_after_init kvm_arm_vmid_bits;
int __init kvm_arm_vmid_alloc_init(void);
void __init kvm_arm_vmid_alloc_free(void);
-void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
+bool kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
void kvm_arm_vmid_clear_active(void);
static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 39c969c05990..584be562b1d4 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -950,7 +950,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
* making a thread's VMID inactive. So we need to call
* kvm_arm_vmid_update() in non-premptible context.
*/
- kvm_arm_vmid_update(&vcpu->arch.hw_mmu->vmid);
+ if (kvm_arm_vmid_update(&vcpu->arch.hw_mmu->vmid) &&
+ has_vhe())
+ __load_stage2(vcpu->arch.hw_mmu,
+ vcpu->arch.hw_mmu->arch);
kvm_pmu_flush_hwstate(vcpu);
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
index 7fe8ba1a2851..281e4f86d9a2 100644
--- a/arch/arm64/kvm/vmid.c
+++ b/arch/arm64/kvm/vmid.c
@@ -135,10 +135,11 @@ void kvm_arm_vmid_clear_active(void)
atomic64_set(this_cpu_ptr(&active_vmids), VMID_ACTIVE_INVALID);
}
-void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
+bool kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
{
unsigned long flags;
u64 vmid, old_active_vmid;
+ bool updated = false;
vmid = atomic64_read(&kvm_vmid->id);
@@ -156,17 +157,21 @@ void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
if (old_active_vmid != 0 && vmid_gen_match(vmid) &&
0 != atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
old_active_vmid, vmid))
- return;
+ return false;
raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
/* Check that our VMID belongs to the current generation. */
vmid = atomic64_read(&kvm_vmid->id);
- if (!vmid_gen_match(vmid))
+ if (!vmid_gen_match(vmid)) {
vmid = new_vmid(kvm_vmid);
+ updated = true;
+ }
atomic64_set(this_cpu_ptr(&active_vmids), vmid);
raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
+
+ return updated;
}
/*
However, this isn't enough.
[ 63.450113] Oh crap 400000435c001 vs 3000004430001
So there are situations where we end up with the wrong VTTBR, rather
than the wrong VMID, which is even worse. Haven't worked out the
scenario yet, but it apparently involves being preempted by a vcpu
from a different VM and not doing the right thing.
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH 0/3] KVM: arm64: Load the stage-2 MMU from vcpu_load() for VHE
From: Marc Zyngier @ 2023-10-06 15:03 UTC (permalink / raw)
To: Oliver Upton; +Cc: kvmarm, kvm, James Morse, Suzuki K Poulose, Zenghui Yu
On Fri, 06 Oct 2023 14:33:04 +0100,
Marc Zyngier <maz@kernel.org> wrote:
Still talking to myself! :D
[...]
>
> However, this isn't enough.
>
> [ 63.450113] Oh crap 400000435c001 vs 3000004430001
>
> So there are situations where we end up with the wrong VTTBR, rather
> than the wrong VMID, which is even worse. Haven't worked out the
> scenario yet, but it apparently involves being preempted by a vcpu
> from a different VM and not doing the right thing.
Actually, no. It is the MMU notifiers kicking in and performing TLB
invalidation for a guest while we're in the context of another. The
joy of running 4 large VMs on a box with 2GB of RAM, basically running
from swap.
The trace looks like this:
[ 66.147484] Call trace:
[ 66.149899] dump_backtrace+0xa0/0x128
[ 66.153607] show_stack+0x20/0x38
[ 66.156884] dump_stack_lvl+0x78/0xc8
[ 66.160507] dump_stack+0x18/0x28
[ 66.163784] __tlb_switch_to_guest+0x50/0x148
[ 66.168097] __kvm_tlb_flush_vmid_ipa+0x3c/0xc8
[ 66.172582] stage2_unmap_put_pte+0xd0/0xe8
[ 66.176722] stage2_unmap_walker+0x160/0x1c0
[ 66.180948] __kvm_pgtable_visit+0x170/0x1f8
[ 66.185174] __kvm_pgtable_walk+0x94/0xc8
[ 66.189142] __kvm_pgtable_visit+0xd8/0x1f8
[ 66.193282] __kvm_pgtable_walk+0x94/0xc8
[ 66.197249] __kvm_pgtable_visit+0xd8/0x1f8
[ 66.201389] __kvm_pgtable_walk+0x94/0xc8
[ 66.205357] kvm_pgtable_walk+0xd4/0x170
[ 66.209238] kvm_pgtable_stage2_unmap+0x54/0xd0
[ 66.213723] stage2_apply_range+0x9c/0x108
[ 66.217777] __unmap_stage2_range+0x34/0x70
[ 66.221917] kvm_unmap_gfn_range+0x38/0x58
[ 66.225970] kvm_mmu_notifier_invalidate_range_start+0xe8/0x310
[ 66.231835] mn_hlist_invalidate_range_start+0x80/0x158
[ 66.237010] __mmu_notifier_invalidate_range_start+0x40/0x78
[ 66.242617] try_to_migrate_one+0x8b0/0xa10
[ 66.246757] rmap_walk_anon+0xec/0x268
[ 66.250465] try_to_migrate+0xc8/0x120
[ 66.254174] migrate_folio_unmap+0x180/0x438
[ 66.258401] migrate_pages_batch+0x14c/0x798
[ 66.262627] migrate_pages_sync+0x8c/0x258
[ 66.266680] migrate_pages+0x4f0/0x690
[ 66.270389] compact_zone+0x1d8/0x6b8
[ 66.274012] compact_zone_order+0xa0/0xf0
[ 66.277979] try_to_compact_pages+0xfc/0x378
[ 66.282205] __alloc_pages_direct_compact+0x80/0x398
[ 66.287122] __alloc_pages_slowpath.constprop.0+0x328/0x868
[ 66.292642] __alloc_pages+0x2cc/0x358
[ 66.296350] __folio_alloc+0x24/0x68
[ 66.299887] vma_alloc_folio+0x2ac/0x340
[ 66.303768] do_huge_pmd_anonymous_page+0xb0/0x3b8
[ 66.308512] __handle_mm_fault+0x31c/0x358
[ 66.312566] handle_mm_fault+0x64/0x270
[ 66.316360] faultin_page+0x74/0x130
[ 66.319897] __get_user_pages+0xc8/0x340
[ 66.323778] get_user_pages_unlocked+0xc8/0x3b8
[ 66.328263] hva_to_pfn+0xfc/0x338
[ 66.331627] __gfn_to_pfn_memslot+0xa8/0x100
[ 66.335853] user_mem_abort+0x17c/0x7c0
[ 66.339648] kvm_handle_guest_abort+0x2f4/0x3d8
[ 66.344133] handle_trap_exceptions+0x44/0xb8
[ 66.348445] handle_exit+0x4c/0x118
[ 66.351895] kvm_arch_vcpu_ioctl_run+0x24c/0x5e0
[ 66.356467] kvm_vcpu_ioctl+0x28c/0x9e0
[ 66.360262] __arm64_sys_ioctl+0xc0/0xe8
[ 66.364143] invoke_syscall+0x50/0x128
[ 66.367852] el0_svc_common.constprop.0+0x48/0xf0
[ 66.372509] do_el0_svc+0x24/0x38
[ 66.375787] el0_svc+0x3c/0xe0
[ 66.378805] el0t_64_sync_handler+0xc0/0xc8
[ 66.382945] el0t_64_sync+0x1a4/0x1a8
Which says it all: we're reclaiming pages from a guest while faulting
on another, resulting in the wrong MMU setup being left behind. Damn!
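In sequence form (illustrative, reconstructed from the trace above):
	vcpu_load() for a vCPU of VM A      ->  VTTBR_EL2 = A's stage-2
	run loop: stage-2 fault, allocation falls into direct reclaim
	MMU notifier unmaps pages of VM B:
	    __tlb_switch_to_guest(B's mmu)  ->  VTTBR_EL2 = B's stage-2
	    TLBI ...
	    __tlb_switch_to_host()          ->  VTTBR_EL2 left pointing at B
	guest entry for VM A                ->  runs on VM B's stage-2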
Here's the sum of my hacks, which keeps the box alive.
Thanks,
M.
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e2d38c7d6555..759adee42018 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1025,7 +1025,7 @@ int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
extern unsigned int __ro_after_init kvm_arm_vmid_bits;
int __init kvm_arm_vmid_alloc_init(void);
void __init kvm_arm_vmid_alloc_free(void);
-void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
+bool kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
void kvm_arm_vmid_clear_active(void);
static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 39c969c05990..584be562b1d4 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -950,7 +950,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
* making a thread's VMID inactive. So we need to call
* kvm_arm_vmid_update() in non-premptible context.
*/
- kvm_arm_vmid_update(&vcpu->arch.hw_mmu->vmid);
+ if (kvm_arm_vmid_update(&vcpu->arch.hw_mmu->vmid) &&
+ has_vhe())
+ __load_stage2(vcpu->arch.hw_mmu,
+ vcpu->arch.hw_mmu->arch);
kvm_pmu_flush_hwstate(vcpu);
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index f3f2e142e4f4..bc510c7e53f1 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -12,6 +12,7 @@
struct tlb_inv_context {
unsigned long flags;
+ struct kvm_s2_mmu *mmu;
u64 tcr;
u64 sctlr;
};
@@ -19,10 +20,16 @@ struct tlb_inv_context {
static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
struct tlb_inv_context *cxt)
{
+ struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
u64 val;
local_irq_save(cxt->flags);
+ if (vcpu && mmu != vcpu->arch.hw_mmu)
+ cxt->mmu = vcpu->arch.hw_mmu;
+ else
+ cxt->mmu = NULL;
+
if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
/*
* For CPUs that are affected by ARM errata 1165522 or 1530923,
@@ -64,8 +71,10 @@ static void __tlb_switch_to_host(struct tlb_inv_context *cxt)
{
/*
* We're done with the TLB operation, let's restore the host's
- * view of HCR_EL2.
+ * view of HCR_EL2 and current S2 MMU context.
*/
+ if (cxt->mmu)
+ __load_stage2(cxt->mmu, cxt->mmu->arch);
write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
isb();
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
index 7fe8ba1a2851..806223b7022a 100644
--- a/arch/arm64/kvm/vmid.c
+++ b/arch/arm64/kvm/vmid.c
@@ -135,10 +135,11 @@ void kvm_arm_vmid_clear_active(void)
atomic64_set(this_cpu_ptr(&active_vmids), VMID_ACTIVE_INVALID);
}
-void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
+bool kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
{
unsigned long flags;
u64 vmid, old_active_vmid;
+ bool updated = false;
vmid = atomic64_read(&kvm_vmid->id);
@@ -156,17 +157,21 @@ void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
if (old_active_vmid != 0 && vmid_gen_match(vmid) &&
0 != atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
old_active_vmid, vmid))
- return;
+ return false;
raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
/* Check that our VMID belongs to the current generation. */
vmid = atomic64_read(&kvm_vmid->id);
- if (!vmid_gen_match(vmid))
+ if (!vmid_gen_match(vmid)) {
vmid = new_vmid(kvm_vmid);
+ updated = true;
+ }
atomic64_set(this_cpu_ptr(&active_vmids), vmid);
raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
+
+ return updated;
}
/*
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH 0/3] KVM: arm64: Load the stage-2 MMU from vcpu_load() for VHE
From: Oliver Upton @ 2023-10-06 18:11 UTC (permalink / raw)
To: Marc Zyngier; +Cc: kvmarm, kvm, James Morse, Suzuki K Poulose, Zenghui Yu
Hey Marc,
On Fri, Oct 06, 2023 at 04:03:32PM +0100, Marc Zyngier wrote:
> On Fri, 06 Oct 2023 14:33:04 +0100,
> Marc Zyngier <maz@kernel.org> wrote:
>
> Still talking to myself! :D
Seems like you had a good bit of fun :)
> >
> > However, this isn't enough.
> >
> > [ 63.450113] Oh crap 400000435c001 vs 3000004430001
> >
> > So there are situations where we end up with the wrong VTTBR, rather
> > than the wrong VMID, which is even worse. Haven't worked out the
> > scenario yet, but it apparently involves being preempted by a vcpu
> > from a different VM and not doing the right thing.
>
> Actually, no. It is the MMU notifiers kicking in and performing TLB
> invalidation for a guest while we're in the context of another. The
> joy of running 4 large VMs on a box with 2GB of RAM, basically running
> from swap.
Whelp, looks like my self-rule of no patches on the list after midnight
is in force again. Clearly this was all quite gently tested; thanks for
being the guinea pig.
> Here's the sum of my hacks, which keeps the box alive.
Thanks! I'll roll it into v2 so we have something that actually works.
--
Thanks,
Oliver
* Re: [PATCH 2/3] KVM: arm64: Rename helpers for VHE vCPU load/put
From: kernel test robot @ 2023-10-06 22:06 UTC (permalink / raw)
To: Oliver Upton, kvmarm
Cc: oe-kbuild-all, kvm, Marc Zyngier, James Morse, Suzuki K Poulose,
Zenghui Yu, Oliver Upton
Hi Oliver,
kernel test robot noticed the following build warnings:
[auto build test WARNING on 6465e260f48790807eef06b583b38ca9789b6072]
url: https://github.com/intel-lab-lkp/linux/commits/Oliver-Upton/KVM-arm64-Don-t-zero-VTTBR-in-__tlb_switch_to_host/20231006-173738
base: 6465e260f48790807eef06b583b38ca9789b6072
patch link: https://lore.kernel.org/r/20231006093600.1250986-3-oliver.upton%40linux.dev
patch subject: [PATCH 2/3] KVM: arm64: Rename helpers for VHE vCPU load/put
config: arm64-defconfig (https://download.01.org/0day-ci/archive/20231007/202310070514.3iAWEi1d-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231007/202310070514.3iAWEi1d-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310070514.3iAWEi1d-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> arch/arm64/kvm/hyp/vhe/sysreg-sr.c:109: warning: expecting prototype for __vcpu_put_switch_syregs(). Prototype was for __vcpu_put_switch_sysregs() instead
vim +109 arch/arm64/kvm/hyp/vhe/sysreg-sr.c
13aeb9b400c5d7 David Brazdil 2020-06-25 96
13aeb9b400c5d7 David Brazdil 2020-06-25 97 /**
129fe154f5bca6 Oliver Upton 2023-10-06 98 * __vcpu_put_switch_syregs - Restore host system registers to the physical CPU
13aeb9b400c5d7 David Brazdil 2020-06-25 99 *
13aeb9b400c5d7 David Brazdil 2020-06-25 100 * @vcpu: The VCPU pointer
13aeb9b400c5d7 David Brazdil 2020-06-25 101 *
13aeb9b400c5d7 David Brazdil 2020-06-25 102 * Save guest system registers that do not affect the host's execution, for
13aeb9b400c5d7 David Brazdil 2020-06-25 103 * example EL1 system registers on a VHE system where the host kernel
13aeb9b400c5d7 David Brazdil 2020-06-25 104 * runs at EL2. This function is called from KVM's vcpu_put() function
13aeb9b400c5d7 David Brazdil 2020-06-25 105 * and deferring saving system register state until we're no longer running the
13aeb9b400c5d7 David Brazdil 2020-06-25 106 * VCPU avoids having to save them on every exit from the VM.
13aeb9b400c5d7 David Brazdil 2020-06-25 107 */
129fe154f5bca6 Oliver Upton 2023-10-06 108 void __vcpu_put_switch_sysregs(struct kvm_vcpu *vcpu)
13aeb9b400c5d7 David Brazdil 2020-06-25 @109 {
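For reference, the warning flags the kernel-doc comment added in patch
2; the fix is presumably just the one-character typo correction:
- * __vcpu_put_switch_syregs - Restore host system registers to the physical CPU
+ * __vcpu_put_switch_sysregs - Restore host system registers to the physical CPU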
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki