From: Marc Zyngier <maz@kernel.org>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
James Morse <james.morse@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Zenghui Yu <yuzenghui@huawei.com>
Subject: Re: [PATCH 0/3] KVM: arm64: Load the stage-2 MMU from vcpu_load() for VHE
Date: Fri, 06 Oct 2023 16:03:32 +0100 [thread overview]
Message-ID: <86y1gfn67v.wl-maz@kernel.org> (raw)
In-Reply-To: <86zg0vnaen.wl-maz@kernel.org>
On Fri, 06 Oct 2023 14:33:04 +0100,
Marc Zyngier <maz@kernel.org> wrote:
Still talking to myself! :D
[...]
>
> However, this isn't enough.
>
> [ 63.450113] Oh crap 400000435c001 vs 3000004430001
>
> So there are situations where we end-up with the wrong VTTBR, rather
> than the wrong VMID, which is even worse. Haven't worked out the
> scenario yet, but it apparently involves being preempted by a vcpu
> from a different VM and not doing the right thing.
Actually, no. It is the MMU notifiers kicking in and performing TLB
invalidation for a guest while we're in the context of another. The
joy of running 4 large VMs on a box with 2GB of RAM, basically running
from swap.
The trace looks like this:
[ 66.147484] Call trace:
[ 66.149899] dump_backtrace+0xa0/0x128
[ 66.153607] show_stack+0x20/0x38
[ 66.156884] dump_stack_lvl+0x78/0xc8
[ 66.160507] dump_stack+0x18/0x28
[ 66.163784] __tlb_switch_to_guest+0x50/0x148
[ 66.168097] __kvm_tlb_flush_vmid_ipa+0x3c/0xc8
[ 66.172582] stage2_unmap_put_pte+0xd0/0xe8
[ 66.176722] stage2_unmap_walker+0x160/0x1c0
[ 66.180948] __kvm_pgtable_visit+0x170/0x1f8
[ 66.185174] __kvm_pgtable_walk+0x94/0xc8
[ 66.189142] __kvm_pgtable_visit+0xd8/0x1f8
[ 66.193282] __kvm_pgtable_walk+0x94/0xc8
[ 66.197249] __kvm_pgtable_visit+0xd8/0x1f8
[ 66.201389] __kvm_pgtable_walk+0x94/0xc8
[ 66.205357] kvm_pgtable_walk+0xd4/0x170
[ 66.209238] kvm_pgtable_stage2_unmap+0x54/0xd0
[ 66.213723] stage2_apply_range+0x9c/0x108
[ 66.217777] __unmap_stage2_range+0x34/0x70
[ 66.221917] kvm_unmap_gfn_range+0x38/0x58
[ 66.225970] kvm_mmu_notifier_invalidate_range_start+0xe8/0x310
[ 66.231835] mn_hlist_invalidate_range_start+0x80/0x158
[ 66.237010] __mmu_notifier_invalidate_range_start+0x40/0x78
[ 66.242617] try_to_migrate_one+0x8b0/0xa10
[ 66.246757] rmap_walk_anon+0xec/0x268
[ 66.250465] try_to_migrate+0xc8/0x120
[ 66.254174] migrate_folio_unmap+0x180/0x438
[ 66.258401] migrate_pages_batch+0x14c/0x798
[ 66.262627] migrate_pages_sync+0x8c/0x258
[ 66.266680] migrate_pages+0x4f0/0x690
[ 66.270389] compact_zone+0x1d8/0x6b8
[ 66.274012] compact_zone_order+0xa0/0xf0
[ 66.277979] try_to_compact_pages+0xfc/0x378
[ 66.282205] __alloc_pages_direct_compact+0x80/0x398
[ 66.287122] __alloc_pages_slowpath.constprop.0+0x328/0x868
[ 66.292642] __alloc_pages+0x2cc/0x358
[ 66.296350] __folio_alloc+0x24/0x68
[ 66.299887] vma_alloc_folio+0x2ac/0x340
[ 66.303768] do_huge_pmd_anonymous_page+0xb0/0x3b8
[ 66.308512] __handle_mm_fault+0x31c/0x358
[ 66.312566] handle_mm_fault+0x64/0x270
[ 66.316360] faultin_page+0x74/0x130
[ 66.319897] __get_user_pages+0xc8/0x340
[ 66.323778] get_user_pages_unlocked+0xc8/0x3b8
[ 66.328263] hva_to_pfn+0xfc/0x338
[ 66.331627] __gfn_to_pfn_memslot+0xa8/0x100
[ 66.335853] user_mem_abort+0x17c/0x7c0
[ 66.339648] kvm_handle_guest_abort+0x2f4/0x3d8
[ 66.344133] handle_trap_exceptions+0x44/0xb8
[ 66.348445] handle_exit+0x4c/0x118
[ 66.351895] kvm_arch_vcpu_ioctl_run+0x24c/0x5e0
[ 66.356467] kvm_vcpu_ioctl+0x28c/0x9e0
[ 66.360262] __arm64_sys_ioctl+0xc0/0xe8
[ 66.364143] invoke_syscall+0x50/0x128
[ 66.367852] el0_svc_common.constprop.0+0x48/0xf0
[ 66.372509] do_el0_svc+0x24/0x38
[ 66.375787] el0_svc+0x3c/0xe0
[ 66.378805] el0t_64_sync_handler+0xc0/0xc8
[ 66.382945] el0t_64_sync+0x1a4/0x1a8
Which says it all: we're reclaiming pages from a guest while faulting
on another, resulting in the wrong MMU setup being left behind. Damn!
There's the sum of my hacks, which keeps the box alive.
Thanks,
M.
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e2d38c7d6555..759adee42018 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1025,7 +1025,7 @@ int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
extern unsigned int __ro_after_init kvm_arm_vmid_bits;
int __init kvm_arm_vmid_alloc_init(void);
void __init kvm_arm_vmid_alloc_free(void);
-void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
+bool kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
void kvm_arm_vmid_clear_active(void);
static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 39c969c05990..584be562b1d4 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -950,7 +950,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
* making a thread's VMID inactive. So we need to call
* kvm_arm_vmid_update() in non-premptible context.
*/
- kvm_arm_vmid_update(&vcpu->arch.hw_mmu->vmid);
+ if (kvm_arm_vmid_update(&vcpu->arch.hw_mmu->vmid) &&
+ has_vhe())
+ __load_stage2(vcpu->arch.hw_mmu,
+ vcpu->arch.hw_mmu->arch);
kvm_pmu_flush_hwstate(vcpu);
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index f3f2e142e4f4..bc510c7e53f1 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -12,6 +12,7 @@
struct tlb_inv_context {
unsigned long flags;
+ struct kvm_s2_mmu *mmu;
u64 tcr;
u64 sctlr;
};
@@ -19,10 +20,16 @@ struct tlb_inv_context {
static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
struct tlb_inv_context *cxt)
{
+ struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
u64 val;
local_irq_save(cxt->flags);
+ if (vcpu && mmu != vcpu->arch.hw_mmu)
+ cxt->mmu = vcpu->arch.hw_mmu;
+ else
+ cxt->mmu = NULL;
+
if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
/*
* For CPUs that are affected by ARM errata 1165522 or 1530923,
@@ -64,8 +71,10 @@ static void __tlb_switch_to_host(struct tlb_inv_context *cxt)
{
/*
* We're done with the TLB operation, let's restore the host's
- * view of HCR_EL2.
+ * view of HCR_EL2 and current S2 MMU context.
*/
+ if (cxt->mmu)
+ __load_stage2(cxt->mmu, cxt->mmu->arch);
write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
isb();
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
index 7fe8ba1a2851..806223b7022a 100644
--- a/arch/arm64/kvm/vmid.c
+++ b/arch/arm64/kvm/vmid.c
@@ -135,10 +135,11 @@ void kvm_arm_vmid_clear_active(void)
atomic64_set(this_cpu_ptr(&active_vmids), VMID_ACTIVE_INVALID);
}
-void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
+bool kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
{
unsigned long flags;
u64 vmid, old_active_vmid;
+ bool updated = false;
vmid = atomic64_read(&kvm_vmid->id);
@@ -156,17 +157,21 @@ void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
if (old_active_vmid != 0 && vmid_gen_match(vmid) &&
0 != atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
old_active_vmid, vmid))
- return;
+ return false;
raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
/* Check that our VMID belongs to the current generation. */
vmid = atomic64_read(&kvm_vmid->id);
- if (!vmid_gen_match(vmid))
+ if (!vmid_gen_match(vmid)) {
vmid = new_vmid(kvm_vmid);
+ updated = true;
+ }
atomic64_set(this_cpu_ptr(&active_vmids), vmid);
raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
+
+ return updated;
}
/*
--
Without deviation from the norm, progress is not possible.
next prev parent reply other threads:[~2023-10-06 15:03 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-06 9:35 [PATCH 0/3] KVM: arm64: Load the stage-2 MMU from vcpu_load() for VHE Oliver Upton
2023-10-06 9:35 ` [PATCH 1/3] KVM: arm64: Don't zero VTTBR in __tlb_switch_to_host() Oliver Upton
2023-10-06 9:41 ` Oliver Upton
2023-10-06 9:35 ` [PATCH 2/3] KVM: arm64: Rename helpers for VHE vCPU load/put Oliver Upton
2023-10-06 22:06 ` kernel test robot
2023-10-06 9:36 ` [PATCH 3/3] KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe() Oliver Upton
2023-10-06 11:13 ` [PATCH 0/3] KVM: arm64: Load the stage-2 MMU from vcpu_load() for VHE Marc Zyngier
2023-10-06 12:34 ` Marc Zyngier
2023-10-06 13:33 ` Marc Zyngier
2023-10-06 15:03 ` Marc Zyngier [this message]
2023-10-06 18:11 ` Oliver Upton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86y1gfn67v.wl-maz@kernel.org \
--to=maz@kernel.org \
--cc=james.morse@arm.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=oliver.upton@linux.dev \
--cc=suzuki.poulose@arm.com \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox