* [PATCH 0/5] KVM: Fix race conditions in kvm_arch_flush_shadow_all()
@ 2026-05-04 22:42 James Houghton
2026-05-04 22:42 ` [PATCH 1/5] KVM: arm64: Grab KVM MMU write lock " James Houghton
` (5 more replies)
0 siblings, 6 replies; 8+ messages in thread
From: James Houghton @ 2026-05-04 22:42 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Sean Christopherson, Gavin Shan, Shaoqin Huang,
Ricardo Koller, Tianrui Zhao, Bibo Mao, Huacai Chen, James Hogan,
linux-arm-kernel, kvmarm, loongarch, linux-mips, kvm,
linux-kernel, James Houghton
Hi Paolo,
syzbot running on Google production kernels ran into a double-free on
KVM/arm64 in kvm_mmu_free_memory_cache(). It turns out that loongarch
and mips also have a similar problem.
kvm_arch_flush_shadow_all() can be called on the same VM
concurrently, leading to double-frees on arm64 and mips. loongarch
is also affected: it can at least underflow some counters; I'm not sure
what else can break.
To get into this scenario, we need a process (P1) to share an open
VM with another process (P2). If P1 closes its VM fd, leaving P2 holding
the last reference, then there is a race between P1 exiting (exit_mm())
and P2 dropping its last reference to the VM.
exit_mm() and kvm_vm_release() both call kvm_mmu_notifier_release() on
the same KVM, and the only locks held are the KVM srcu lock and the MMU
notifier srcu lock.
Please see the arm64 patch for another description of the same race with
more context on the ensuing double-free in KVM/arm64.
The first three patches fix the broken architectures; each of those
patches has stable Cc'ed with what I think are the appropriate Fixes
tags.
After patching the locking for the broken architectures, it seems better
simply to have KVM take the MMU lock exclusively before calling
kvm_arch_flush_shadow_all() so that architectures don't need to worry
about it. Feel free to drop that patch, the fourth one, if you disagree
with it.
The fifth patch provides a repro (with a crude kernel patch to reliably
demonstrate the double-free). Please do not merge this.
The arm64 patch has been tested with the repro. The loongarch and mips
patches have been compile-tested only.
kvm_arch_guest_memory_reclaimed() is only implemented by one
architecture: x86. Its implementation does not need the KVM MMU lock to
be held.
This series is based on 7.1-rc2.
James Houghton (5):
KVM: arm64: Grab KVM MMU write lock in kvm_arch_flush_shadow_all()
KVM: loongarch: Grab MMU lock in kvm_arch_flush_shadow_all()
KVM: mips: Grab MMU lock in kvm_arch_flush_shadow_all()
KVM: Hold MMU lock exclusively when calling
kvm_arch_flush_shadow_all()
DO NOT MERGE: KVM: selftests: Reproducer for arm64 double-free
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/include/asm/kvm_mmu.h | 1 +
arch/arm64/kvm/mmu.c | 39 +++++-
arch/arm64/kvm/nested.c | 4 +-
arch/loongarch/kvm/mmu.c | 2 +
arch/mips/kvm/mips.c | 2 +
arch/mips/kvm/mmu.c | 2 +
arch/riscv/kvm/mmu.c | 4 +-
arch/riscv/kvm/vm.c | 2 +
arch/x86/kvm/mmu/mmu.c | 4 +-
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/transfer_fd_test.c | 129 ++++++++++++++++++
virt/kvm/kvm_main.c | 3 +
13 files changed, 184 insertions(+), 10 deletions(-)
create mode 100644 tools/testing/selftests/kvm/transfer_fd_test.c
base-commit: 6d35786de28116ecf78797a62b84e6bf3c45aa5a
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/5] KVM: arm64: Grab KVM MMU write lock in kvm_arch_flush_shadow_all()
2026-05-04 22:42 [PATCH 0/5] KVM: Fix race conditions in kvm_arch_flush_shadow_all() James Houghton
@ 2026-05-04 22:42 ` James Houghton
2026-05-04 23:10 ` James Houghton
2026-05-04 22:42 ` [PATCH 2/5] KVM: loongarch: Grab MMU " James Houghton
` (4 subsequent siblings)
5 siblings, 1 reply; 8+ messages in thread
From: James Houghton @ 2026-05-04 22:42 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Sean Christopherson, Gavin Shan, Shaoqin Huang,
Ricardo Koller, Tianrui Zhao, Bibo Mao, Huacai Chen, James Hogan,
linux-arm-kernel, kvmarm, loongarch, linux-mips, kvm,
linux-kernel, James Houghton, stable
kvm_arch_flush_shadow_all() may sometimes be called on the same `kvm`
concurrently, in the event that the VM's `mm` goes through __mmput() at
the same time that the last reference to the KVM is dropped.
  T1                            T2
  KVM_CREATE_VM
                                Get VM file from T1
  close VM
  exit_mm()                     close VM

T1: exit_mm() -> kvm_mmu_notifier_release() -> kvm_flush_shadow_all(),
    with only the KVM srcu read lock held.

T2: kvm_vm_release() -> mmu_notifier_unregister() ->
    kvm_mmu_notifier_release() -> kvm_flush_shadow_all(),
    again with only the KVM srcu read lock held.
This leads to a potential double-free of kvm->arch.mmu.split_page_cache
(freed via kvm_mmu_free_memory_cache()) and, now with NV, of
kvm->arch.nested_mmus.
Cc: stable@vger.kernel.org
Fixes: e7bf7a490c68 ("KVM: arm64: Split huge pages when dirty logging is enabled")
Signed-off-by: James Houghton <jthoughton@google.com>
---
arch/arm64/include/asm/kvm_mmu.h | 1 +
arch/arm64/kvm/mmu.c | 23 +++++++++++++++++++----
arch/arm64/kvm/nested.c | 4 +++-
3 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 01e9c72d6aa7..30d5c24fcebb 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -178,6 +178,7 @@ void stage2_unmap_vm(struct kvm *kvm);
int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type);
void kvm_uninit_stage2_mmu(struct kvm *kvm);
void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
+void kvm_free_stage2_pgd_locked(struct kvm_s2_mmu *mmu);
int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
phys_addr_t pa, unsigned long size, bool writable);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d089c107d9b7..4bab407d43bb 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1021,7 +1021,9 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
void kvm_uninit_stage2_mmu(struct kvm *kvm)
{
- kvm_free_stage2_pgd(&kvm->arch.mmu);
+ lockdep_assert_held_write(&kvm->mmu_lock);
+
+ kvm_free_stage2_pgd_locked(&kvm->arch.mmu);
kvm_mmu_free_memory_cache(&kvm->arch.mmu.split_page_cache);
}
@@ -1095,12 +1097,14 @@ void stage2_unmap_vm(struct kvm *kvm)
srcu_read_unlock(&kvm->srcu, idx);
}
-void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
+static void __kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu, bool locked)
{
struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
struct kvm_pgtable *pgt = NULL;
- write_lock(&kvm->mmu_lock);
+ if (!locked)
+ write_lock(&kvm->mmu_lock);
+
pgt = mmu->pgt;
if (pgt) {
mmu->pgd_phys = 0;
@@ -1111,7 +1115,8 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
if (kvm_is_nested_s2_mmu(kvm, mmu))
kvm_init_nested_s2_mmu(mmu);
- write_unlock(&kvm->mmu_lock);
+ if (!locked)
+ write_unlock(&kvm->mmu_lock);
if (pgt) {
kvm_stage2_destroy(pgt);
@@ -1119,6 +1124,16 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
}
}
+void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
+{
+ __kvm_free_stage2_pgd(mmu, false);
+}
+
+void kvm_free_stage2_pgd_locked(struct kvm_s2_mmu *mmu)
+{
+ __kvm_free_stage2_pgd(mmu, true);
+}
+
static void hyp_mc_free_fn(void *addr, void *mc)
{
struct kvm_hyp_memcache *memcache = mc;
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 883b6c1008fb..977598bff5e6 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -1190,11 +1190,13 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
int i;
+ guard(write_lock)(&kvm->mmu_lock);
+
for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
if (!WARN_ON(atomic_read(&mmu->refcnt)))
- kvm_free_stage2_pgd(mmu);
+ kvm_free_stage2_pgd_locked(mmu);
}
kvfree(kvm->arch.nested_mmus);
kvm->arch.nested_mmus = NULL;
--
2.54.0.545.g6539524ca2-goog
* [PATCH 2/5] KVM: loongarch: Grab MMU lock in kvm_arch_flush_shadow_all()
2026-05-04 22:42 [PATCH 0/5] KVM: Fix race conditions in kvm_arch_flush_shadow_all() James Houghton
2026-05-04 22:42 ` [PATCH 1/5] KVM: arm64: Grab KVM MMU write lock " James Houghton
@ 2026-05-04 22:42 ` James Houghton
2026-05-04 22:42 ` [PATCH 3/5] KVM: mips: " James Houghton
` (3 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: James Houghton @ 2026-05-04 22:42 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Sean Christopherson, Gavin Shan, Shaoqin Huang,
Ricardo Koller, Tianrui Zhao, Bibo Mao, Huacai Chen, James Hogan,
linux-arm-kernel, kvmarm, loongarch, linux-mips, kvm,
linux-kernel, James Houghton, stable
kvm_arch_flush_shadow_all() may be called concurrently on the same
`kvm`. This could at least result in accounting mistakes (e.g.
underflows on `kvm->stat.*pages`).
Cc: stable@vger.kernel.org
Fixes: 752e2cd7b4fb ("LoongArch: KVM: Implement kvm mmu operations")
Signed-off-by: James Houghton <jthoughton@google.com>
---
Note: This is compile-tested only!
arch/loongarch/kvm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index a7fa458e3360..5dbce9b18e1c 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -486,7 +486,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
- kvm_flush_range(kvm, 0, kvm->arch.gpa_size >> PAGE_SHIFT, 0);
+ kvm_flush_range(kvm, 0, kvm->arch.gpa_size >> PAGE_SHIFT, 1);
}
void kvm_arch_flush_shadow_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
--
2.54.0.545.g6539524ca2-goog
* [PATCH 3/5] KVM: mips: Grab MMU lock in kvm_arch_flush_shadow_all()
2026-05-04 22:42 [PATCH 0/5] KVM: Fix race conditions in kvm_arch_flush_shadow_all() James Houghton
2026-05-04 22:42 ` [PATCH 1/5] KVM: arm64: Grab KVM MMU write lock " James Houghton
2026-05-04 22:42 ` [PATCH 2/5] KVM: loongarch: Grab MMU " James Houghton
@ 2026-05-04 22:42 ` James Houghton
2026-05-04 22:42 ` [PATCH 4/5] KVM: Hold MMU lock exclusively when calling kvm_arch_flush_shadow_all() James Houghton
` (2 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: James Houghton @ 2026-05-04 22:42 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Sean Christopherson, Gavin Shan, Shaoqin Huang,
Ricardo Koller, Tianrui Zhao, Bibo Mao, Huacai Chen, James Hogan,
linux-arm-kernel, kvmarm, loongarch, linux-mips, kvm,
linux-kernel, James Houghton, stable
kvm_mips_flush_gpa_pt() expects the MMU lock to be held, but it is not
held in this path.
This can lead to a double-free of page table entries if
kvm_arch_flush_shadow_all() is called twice on the same `kvm`
concurrently; such a scenario is possible.
Cc: stable@vger.kernel.org
Fixes: b62091108633 ("KVM: MIPS: Implement kvm_arch_flush_shadow_all/memslot")
Signed-off-by: James Houghton <jthoughton@google.com>
---
Note: This is compile-tested only!
arch/mips/kvm/mips.c | 2 ++
arch/mips/kvm/mmu.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index a53abbba43ea..463b6c4aa62c 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -180,6 +180,8 @@ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl,
void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
+ guard(spinlock)(&kvm->mmu_lock);
+
/* Flush whole GPA */
kvm_mips_flush_gpa_pt(kvm, 0, ~0);
kvm_flush_remote_tlbs(kvm);
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index d2c3b6b41f18..5045833f8116 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -269,6 +269,8 @@ static bool kvm_mips_flush_gpa_pgd(pgd_t *pgd, unsigned long start_gpa,
*/
bool kvm_mips_flush_gpa_pt(struct kvm *kvm, gfn_t start_gfn, gfn_t end_gfn)
{
+ lockdep_assert_held(&kvm->mmu_lock);
+
return kvm_mips_flush_gpa_pgd(kvm->arch.gpa_mm.pgd,
start_gfn << PAGE_SHIFT,
end_gfn << PAGE_SHIFT);
--
2.54.0.545.g6539524ca2-goog
* [PATCH 4/5] KVM: Hold MMU lock exclusively when calling kvm_arch_flush_shadow_all()
2026-05-04 22:42 [PATCH 0/5] KVM: Fix race conditions in kvm_arch_flush_shadow_all() James Houghton
` (2 preceding siblings ...)
2026-05-04 22:42 ` [PATCH 3/5] KVM: mips: " James Houghton
@ 2026-05-04 22:42 ` James Houghton
2026-05-04 22:42 ` [PATCH 5/5] DO NOT MERGE: KVM: selftests: Reproducer for arm64 double-free James Houghton
2026-05-04 22:44 ` [PATCH 0/5] KVM: Fix race conditions in kvm_arch_flush_shadow_all() James Houghton
5 siblings, 0 replies; 8+ messages in thread
From: James Houghton @ 2026-05-04 22:42 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Sean Christopherson, Gavin Shan, Shaoqin Huang,
Ricardo Koller, Tianrui Zhao, Bibo Mao, Huacai Chen, James Hogan,
linux-arm-kernel, kvmarm, loongarch, linux-mips, kvm,
linux-kernel, James Houghton
All architectures that non-trivially implement this function grab the
KVM MMU lock exclusively to prevent things like double-freeing of page
table entries and caches. Do so generically to somewhat simplify the
architecture-specific logic.
Without this change, it is possible for kvm_arch_flush_shadow_all() to
be called on the same `kvm` concurrently: the `kvm`'s mm can be
destroyed (exit_mm()) at the same time that the final reference to the
`kvm` is dropped.
Signed-off-by: James Houghton <jthoughton@google.com>
---
arch/arm64/kvm/nested.c | 2 +-
arch/loongarch/kvm/mmu.c | 4 +++-
arch/mips/kvm/mips.c | 2 +-
arch/riscv/kvm/mmu.c | 4 ++--
arch/riscv/kvm/vm.c | 2 ++
arch/x86/kvm/mmu/mmu.c | 4 +---
virt/kvm/kvm_main.c | 3 +++
7 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 977598bff5e6..ba2e6c98bd45 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -1190,7 +1190,7 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
int i;
- guard(write_lock)(&kvm->mmu_lock);
+ lockdep_assert_held_write(&kvm->mmu_lock);
for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index 5dbce9b18e1c..120001da26e4 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -486,7 +486,9 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
- kvm_flush_range(kvm, 0, kvm->arch.gpa_size >> PAGE_SHIFT, 1);
+ lockdep_assert_held(&kvm->mmu_lock);
+
+ kvm_flush_range(kvm, 0, kvm->arch.gpa_size >> PAGE_SHIFT, 0);
}
void kvm_arch_flush_shadow_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 463b6c4aa62c..4ad9e21a3321 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -180,7 +180,7 @@ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl,
void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
- guard(spinlock)(&kvm->mmu_lock);
+ lockdep_assert_held(&kvm->mmu_lock);
/* Flush whole GPA */
kvm_mips_flush_gpa_pt(kvm, 0, ~0);
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 2d3def024270..c1b9333f17eb 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -558,7 +558,8 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
struct kvm_gstage gstage;
void *pgd = NULL;
- spin_lock(&kvm->mmu_lock);
+ lockdep_assert_held(&kvm->mmu_lock);
+
if (kvm->arch.pgd) {
kvm_riscv_gstage_init(&gstage, kvm);
kvm_riscv_gstage_unmap_range(&gstage, 0UL,
@@ -568,7 +569,6 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
kvm->arch.pgd_phys = 0;
kvm->arch.pgd_levels = 0;
}
- spin_unlock(&kvm->mmu_lock);
if (pgd)
free_pages((unsigned long)pgd, get_order(kvm_riscv_gstage_pgd_size));
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index a9f083feeb76..f704a64bfc48 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -38,7 +38,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
r = kvm_riscv_gstage_vmid_init(kvm);
if (r) {
+ spin_lock(&kvm->mmu_lock);
kvm_riscv_mmu_free_pgd(kvm);
+ spin_unlock(&kvm->mmu_lock);
return r;
}
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 892246204435..6e6f94046b3f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7352,7 +7352,7 @@ static void kvm_mmu_zap_all(struct kvm *kvm)
LIST_HEAD(invalid_list);
int ign;
- write_lock(&kvm->mmu_lock);
+ lockdep_assert_held_write(&kvm->mmu_lock);
restart:
list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
if (WARN_ON_ONCE(sp->role.invalid))
@@ -7367,8 +7367,6 @@ static void kvm_mmu_zap_all(struct kvm *kvm)
if (tdp_mmu_enabled)
kvm_tdp_mmu_zap_all(kvm);
-
- write_unlock(&kvm->mmu_lock);
}
void kvm_arch_flush_shadow_all(struct kvm *kvm)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 89489996fbc1..f5affd3bfda8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -340,7 +340,10 @@ void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
static void kvm_flush_shadow_all(struct kvm *kvm)
{
+ KVM_MMU_LOCK(kvm);
kvm_arch_flush_shadow_all(kvm);
+ KVM_MMU_UNLOCK(kvm);
+
kvm_arch_guest_memory_reclaimed(kvm);
}
--
2.54.0.545.g6539524ca2-goog
* [PATCH 5/5] DO NOT MERGE: KVM: selftests: Reproducer for arm64 double-free
2026-05-04 22:42 [PATCH 0/5] KVM: Fix race conditions in kvm_arch_flush_shadow_all() James Houghton
` (3 preceding siblings ...)
2026-05-04 22:42 ` [PATCH 4/5] KVM: Hold MMU lock exclusively when calling kvm_arch_flush_shadow_all() James Houghton
@ 2026-05-04 22:42 ` James Houghton
2026-05-04 22:44 ` [PATCH 0/5] KVM: Fix race conditions in kvm_arch_flush_shadow_all() James Houghton
5 siblings, 0 replies; 8+ messages in thread
From: James Houghton @ 2026-05-04 22:42 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Sean Christopherson, Gavin Shan, Shaoqin Huang,
Ricardo Koller, Tianrui Zhao, Bibo Mao, Huacai Chen, James Hogan,
linux-arm-kernel, kvmarm, loongarch, linux-mips, kvm,
linux-kernel, James Houghton
Add a delay and a counter around the split_page_cache free to reliably
demonstrate the double-free. Please see the cover letter and the arm64
fix patch for more information.
The selftest is partially written by Gemini.
Assisted-by: Gemini:gemini-3-flash-preview
Not-signed-off-by: James Houghton <jthoughton@google.com>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kvm/mmu.c | 16 +++
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/transfer_fd_test.c | 129 ++++++++++++++++++
4 files changed, 147 insertions(+)
create mode 100644 tools/testing/selftests/kvm/transfer_fd_test.c
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 65eead8362e0..5072fc2e2eb8 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -192,6 +192,7 @@ struct kvm_s2_mmu {
* Protected by kvm->slots_lock.
*/
struct kvm_mmu_memory_cache split_page_cache;
+ int is_freeing;
uint64_t split_page_chunk_size;
struct kvm_arch *arch;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4bab407d43bb..fa05900a5124 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1004,6 +1004,8 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
mmu->split_page_chunk_size = KVM_ARM_EAGER_SPLIT_CHUNK_SIZE_DEFAULT;
mmu->split_page_cache.gfp_zero = __GFP_ZERO;
+ mmu->is_freeing = 0;
+
mmu->pgd_phys = __pa(pgt->pgd);
if (kvm_is_nested_s2_mmu(kvm, mmu))
@@ -1021,10 +1023,24 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
void kvm_uninit_stage2_mmu(struct kvm *kvm)
{
+ int is_freeing;
+ ktime_t s;
+
lockdep_assert_held_write(&kvm->mmu_lock);
kvm_free_stage2_pgd_locked(&kvm->arch.mmu);
+
+ is_freeing = ++kvm->arch.mmu.is_freeing;
+ s = ktime_get();
+
+ /* Sleep for 10ms */
+ while (ktime_to_ns(ktime_get()) - ktime_to_ns(s) < 1E7) {}
+
+ WARN(is_freeing > 1, "detected double-free of split page cache");
+
kvm_mmu_free_memory_cache(&kvm->arch.mmu.split_page_cache);
+
+ kvm->arch.mmu.is_freeing--;
}
static void stage2_unmap_memslot(struct kvm *kvm,
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 9118a5a51b89..53a1b9c7bff8 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -66,6 +66,7 @@ TEST_GEN_PROGS_COMMON += kvm_page_table_test
TEST_GEN_PROGS_COMMON += set_memory_region_test
TEST_GEN_PROGS_COMMON += memslot_modification_stress_test
TEST_GEN_PROGS_COMMON += memslot_perf_test
+TEST_GEN_PROGS_COMMON += transfer_fd_test
# Compiled test targets
TEST_GEN_PROGS_x86 = $(TEST_GEN_PROGS_COMMON)
diff --git a/tools/testing/selftests/kvm/transfer_fd_test.c b/tools/testing/selftests/kvm/transfer_fd_test.c
new file mode 100644
index 000000000000..ff2adff9954b
--- /dev/null
+++ b/tools/testing/selftests/kvm/transfer_fd_test.c
@@ -0,0 +1,129 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Test VM file descriptor transfer via Unix Domain Sockets.
+ */
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <errno.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+
+static void send_fd(int sock, int fd)
+{
+ struct msghdr msg = {0};
+ struct cmsghdr *cmsg;
+ char buf[CMSG_SPACE(sizeof(int))];
+ struct iovec io = {
+ .iov_base = "a",
+ .iov_len = 1,
+ };
+
+ msg.msg_iov = &io;
+ msg.msg_iovlen = 1;
+ msg.msg_control = buf;
+ msg.msg_controllen = sizeof(buf);
+
+ cmsg = CMSG_FIRSTHDR(&msg);
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+
+ *((int *)CMSG_DATA(cmsg)) = fd;
+
+ TEST_ASSERT(sendmsg(sock, &msg, 0) == 1, "sendmsg failed, errno: %d", errno);
+}
+
+static int recv_fd(int sock)
+{
+ struct msghdr msg = {0};
+ struct cmsghdr *cmsg;
+ char buf[CMSG_SPACE(sizeof(int))];
+ char dummy;
+ struct iovec io = {
+ .iov_base = &dummy,
+ .iov_len = 1,
+ };
+ int fd;
+
+ msg.msg_iov = &io;
+ msg.msg_iovlen = 1;
+ msg.msg_control = buf;
+ msg.msg_controllen = sizeof(buf);
+
+ TEST_ASSERT(recvmsg(sock, &msg, 0) == 1, "recvmsg failed, errno: %d", errno);
+
+ cmsg = CMSG_FIRSTHDR(&msg);
+ TEST_ASSERT(cmsg && cmsg->cmsg_level == SOL_SOCKET &&
+ cmsg->cmsg_type == SCM_RIGHTS, "No FD received");
+
+ fd = *((int *)CMSG_DATA(cmsg));
+ return fd;
+}
+
+int main(int argc, char **argv)
+{
+ pthread_barrierattr_t attr;
+ pthread_barrier_t *barrier;
+ int socks[2];
+ pid_t pid;
+ int ret;
+
+ barrier = mmap(NULL, sizeof(*barrier), PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+ TEST_ASSERT(barrier != MAP_FAILED, "mmap failed, errno: %d", errno);
+
+ ret = pthread_barrierattr_init(&attr);
+ TEST_ASSERT(!ret, "pthread_barrierattr_init failed, ret: %d", ret);
+
+ ret = pthread_barrierattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
+ TEST_ASSERT(!ret, "pthread_barrierattr_setpshared failed, ret: %d", ret);
+
+ ret = pthread_barrier_init(barrier, &attr, 2);
+ TEST_ASSERT(!ret, "pthread_barrier_init failed, ret: %d", ret);
+
+ ret = socketpair(AF_UNIX, SOCK_STREAM, 0, socks);
+ TEST_ASSERT(ret == 0, "socketpair failed, errno: %d", errno);
+
+ pid = fork();
+ TEST_ASSERT(pid >= 0, "fork failed, errno: %d", errno);
+
+ if (pid > 0) {
+ struct kvm_vm *vm;
+
+ close(socks[1]);
+
+ vm = vm_create_barebones();
+
+ send_fd(socks[0], vm->fd);
+ close(socks[0]);
+
+ /* Drop *ALL* refs to this VM. */
+ close(vm->fd);
+ close(vm->kvm_fd);
+ if (vm->stats.fd >= 0)
+ close(vm->stats.fd);
+
+ pthread_barrier_wait(barrier);
+
+ /* Trigger the exit_mm() side of the race. */
+ _exit(0);
+ } else {
+ int vm_fd;
+
+ close(socks[0]);
+
+ vm_fd = recv_fd(socks[1]);
+ close(socks[1]);
+
+ pthread_barrier_wait(barrier);
+
+ /* Drop the final ref of the VM, triggering the kvm_destroy_vm()
+ * side of the race. */
+ close(vm_fd);
+ }
+
+ return 0;
+}
--
2.54.0.545.g6539524ca2-goog
* Re: [PATCH 0/5] KVM: Fix race conditions in kvm_arch_flush_shadow_all()
2026-05-04 22:42 [PATCH 0/5] KVM: Fix race conditions in kvm_arch_flush_shadow_all() James Houghton
` (4 preceding siblings ...)
2026-05-04 22:42 ` [PATCH 5/5] DO NOT MERGE: KVM: selftests: Reproducer for arm64 double-free James Houghton
@ 2026-05-04 22:44 ` James Houghton
5 siblings, 0 replies; 8+ messages in thread
From: James Houghton @ 2026-05-04 22:44 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Sean Christopherson, Gavin Shan, Shaoqin Huang,
Ricardo Koller, Tianrui Zhao, Bibo Mao, Huacai Chen, James Hogan,
linux-arm-kernel, kvmarm, loongarch, linux-mips, kvm,
linux-kernel
On Mon, May 4, 2026 at 3:42 PM James Houghton <jthoughton@google.com> wrote:
>
> Hi Paolo,
>
> syzbot running on Google production kernels ran into a double-free on
> KVM/arm64 in kvm_mmu_free_memory_cache().
Here is the full splat:
==================================================================
BUG: KASAN: double-free in kvfree+0x30/0x40 mm/slub.c:7247
Free of addr ffff000810394000 by task syz.25.15225/156017
CPU: 2 UID: 0 PID: 156017 Comm: syz.25.15225 Kdump: loaded Not tainted
6.18.16-smp-DEV #1 NONE
Hardware name: Google
Call trace:
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:499 (C)
__dump_stack+0x30/0x40 lib/dump_stack.c:94
dump_stack_lvl+0x198/0x264 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:402 [inline]
print_report+0xbc/0x254 mm/kasan/report.c:506
kasan_report_invalid_free+0x80/0xd0 mm/kasan/report.c:581
check_slab_allocation+0x134/0x1a0 mm/kasan/common.c:-1
__kasan_slab_pre_free+0x30/0x40 mm/kasan/common.c:261
kasan_slab_pre_free include/linux/kasan.h:199 [inline]
slab_free_hook mm/slub.c:2535 [inline]
slab_free mm/slub.c:6730 [inline]
kfree+0x140/0x548 mm/slub.c:6941
kvfree+0x30/0x40 mm/slub.c:7247
kvm_mmu_free_memory_cache+0x22c/0x2a0 virt/kvm/kvm_main.c:679
kvm_uninit_stage2_mmu+0x30/0x40 arch/arm64/kvm/mmu.c:1048
kvm_arch_flush_shadow_all+0x178/0x198 arch/arm64/kvm/nested.c:1113
kvm_flush_shadow_all virt/kvm/kvm_main.c:579 [inline]
kvm_mmu_notifier_release+0x48/0xac virt/kvm/kvm_main.c:1287
mn_hlist_release mm/mmu_notifier.c:321 [inline]
__mmu_notifier_release+0x29c/0x46c mm/mmu_notifier.c:359
mmu_notifier_release include/linux/mmu_notifier.h:406 [inline]
exit_mmap+0x100/0xb08 mm/mmap.c:1265
__mmput+0xb4/0x360 kernel/fork.c:1183
mmput+0x70/0xa8 kernel/fork.c:1206
exit_mm kernel/exit.c:616 [inline]
do_exit+0xc1c/0x26f4 kernel/exit.c:1025
do_group_exit+0x194/0x22c kernel/exit.c:1180
get_signal+0x1614/0x197c kernel/signal.c:3056
arch_do_signal_or_restart+0x25c/0x4a5c arch/arm64/kernel/signal.c:1637
exit_to_user_mode_loop+0x84/0x230 kernel/entry/common.c:44
exit_to_user_mode_prepare include/linux/irq-entry-common.h:228 [inline]
arm64_exit_to_user_mode arch/arm64/kernel/entry-common.c:103 [inline]
el0_svc+0x168/0x1dc arch/arm64/kernel/entry-common.c:750
el0t_64_sync_handler+0x68/0xdc arch/arm64/kernel/entry-common.c:768
el0t_64_sync+0x1b0/0x1b4 arch/arm64/kernel/entry.S:596
Allocated by task 156010:
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x30/0x68 mm/kasan/common.c:78
kasan_save_alloc_info+0x40/0x50 mm/kasan/generic.c:573
poison_kmalloc_redzone mm/kasan/common.c:401 [inline]
__kasan_kmalloc+0x9c/0xb4 mm/kasan/common.c:418
kasan_kmalloc include/linux/kasan.h:263 [inline]
__do_kmalloc_node mm/slub.c:5712 [inline]
__kvmalloc_node_noprof+0x4e0/0x7d0 mm/slub.c:7204
kvmalloc_array_node_noprof include/linux/slab.h:1140 [inline]
__kvm_mmu_topup_memory_cache+0x460/0x5b8 virt/kvm/kvm_main.c:638
kvm_mmu_split_huge_pages+0x1d8/0x338 arch/arm64/kvm/mmu.c:143
kvm_mmu_split_memory_region arch/arm64/kvm/mmu.c:1306 [inline]
kvm_arch_commit_memory_region+0x648/0x814 arch/arm64/kvm/mmu.c:2432
kvm_commit_memory_region virt/kvm/kvm_main.c:2244 [inline]
kvm_set_memslot+0xbec/0x120c virt/kvm/kvm_main.c:2506
kvm_set_memory_region+0x7f4/0x9ec virt/kvm/kvm_main.c:2643
kvm_vm_ioctl_set_memory_region+0x74/0xdc virt/kvm/kvm_main.c:2677
kvm_vm_ioctl+0xc28/0xe50 virt/kvm/kvm_main.c:6368
vfs_ioctl fs/ioctl.c:52 [inline]
__do_sys_ioctl fs/ioctl.c:598 [inline]
__se_sys_ioctl fs/ioctl.c:584 [inline]
__arm64_sys_ioctl+0x14c/0x1c4 fs/ioctl.c:584
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0xa4/0x25c arch/arm64/kernel/syscall.c:49
el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132
do_el0_svc+0x4c/0x5c arch/arm64/kernel/syscall.c:151
el0_svc+0x5c/0x1dc arch/arm64/kernel/entry-common.c:749
el0t_64_sync_handler+0x68/0xdc arch/arm64/kernel/entry-common.c:768
el0t_64_sync+0x1b0/0x1b4 arch/arm64/kernel/entry.S:596
Freed by task 156010:
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x30/0x68 mm/kasan/common.c:78
__kasan_save_free_info+0x54/0x6c mm/kasan/generic.c:587
kasan_save_free_info mm/kasan/kasan.h:406 [inline]
poison_slab_object mm/kasan/common.c:253 [inline]
__kasan_slab_free+0x74/0xa4 mm/kasan/common.c:285
kasan_slab_free include/linux/kasan.h:235 [inline]
slab_free_hook mm/slub.c:2590 [inline]
slab_free mm/slub.c:6730 [inline]
kfree+0x17c/0x548 mm/slub.c:6941
kvfree+0x30/0x40 mm/slub.c:7247
kvm_mmu_free_memory_cache+0x22c/0x2a0 virt/kvm/kvm_main.c:679
kvm_uninit_stage2_mmu+0x30/0x40 arch/arm64/kvm/mmu.c:1048
kvm_arch_flush_shadow_all+0x178/0x198 arch/arm64/kvm/nested.c:1113
kvm_flush_shadow_all virt/kvm/kvm_main.c:579 [inline]
kvm_mmu_notifier_release+0x48/0xac virt/kvm/kvm_main.c:1287
mmu_notifier_unregister+0xf0/0x330 mm/mmu_notifier.c:815
kvm_destroy_vm virt/kvm/kvm_main.c:1767 [inline]
kvm_put_kvm+0x664/0xaf0 virt/kvm/kvm_main.c:1825
kvm_vm_release+0x4c/0x60 virt/kvm/kvm_main.c:1848
__fput+0x340/0x754 fs/file_table.c:495
____fput+0x20/0x58 fs/file_table.c:523
task_work_run+0x1d8/0x268 kernel/task_work.c:233
exit_task_work include/linux/task_work.h:40 [inline]
do_exit+0xcc0/0x26f4 kernel/exit.c:1037
do_group_exit+0x194/0x22c kernel/exit.c:1180
get_signal+0x1614/0x197c kernel/signal.c:3056
arch_do_signal_or_restart+0x25c/0x4a5c arch/arm64/kernel/signal.c:1637
exit_to_user_mode_loop+0x84/0x230 kernel/entry/common.c:44
exit_to_user_mode_prepare include/linux/irq-entry-common.h:228 [inline]
arm64_exit_to_user_mode arch/arm64/kernel/entry-common.c:103 [inline]
el0_svc+0x168/0x1dc arch/arm64/kernel/entry-common.c:750
el0t_64_sync_handler+0x68/0xdc arch/arm64/kernel/entry-common.c:768
el0t_64_sync+0x1b0/0x1b4 arch/arm64/kernel/entry.S:596
The buggy address belongs to the object at ffff000810394000
which belongs to the cache kmalloc-cg-8k of size 8192
The buggy address is located 0 bytes inside of
8192-byte region [ffff000810394000, ffff000810396000)
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x810390
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff0008ceb3d481
anon flags: 0x100000000000040(head|node=0|zone=1)
page_type: f5(slab)
raw: 0100000000000040 ffff00001000b200 0000000000000000 0000000000000001
raw: 0000000000000000 0000000080020002 00000000f5000000 ffff0008ceb3d481
head: 0100000000000040 ffff00001000b200 0000000000000000 0000000000000001
head: 0000000000000000 0000000080020002 00000000f5000000 ffff0008ceb3d481
head: 0100000000000003 fffffdffe040e401 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask
0x1528c0(GFP_NOWAIT|__GFP_SENSITIVE|__GFP_IO|__GFP_FS|__GFP_NORETRY|__GFP_COMP|__GFP_HARDWALL),
pid 18319, tgid 17452 (ContainerMgr), ts 882149108930, free_ts
882148954562
set_page_owner include/linux/page_owner.h:32 [inline]
__post_alloc_hook+0x268/0x288 mm/page_alloc.c:1962
prep_new_page mm/page_alloc.c:1976 [inline]
get_page_from_freelist+0x2dc0/0x2f1c mm/page_alloc.c:4089
__alloc_frozen_pages_noprof+0x254/0x120c mm/page_alloc.c:6001
alloc_pages_mpol+0x1e4/0x444 mm/mempolicy.c:2428
alloc_frozen_pages_noprof+0xe0/0x210 mm/mempolicy.c:2499
alloc_slab_page mm/slub.c:3106 [inline]
allocate_slab+0xc0/0x48c mm/slub.c:3280
new_slab mm/slub.c:3334 [inline]
___slab_alloc+0xc4c/0x13d4 mm/slub.c:4717
__slab_alloc+0x3c/0x6c mm/slub.c:4840
__slab_alloc_node mm/slub.c:4916 [inline]
slab_alloc_node mm/slub.c:5338 [inline]
__do_kmalloc_node mm/slub.c:5711 [inline]
__kvmalloc_node_noprof+0x5dc/0x7d0 mm/slub.c:7204
seq_buf_alloc fs/seq_file.c:38 [inline]
seq_read_iter+0x510/0xd08 fs/seq_file.c:254
proc_reg_read_iter+0x174/0x2a0 fs/proc/inode.c:299
new_sync_read fs/read_write.c:491 [inline]
vfs_read+0x580/0xac8 fs/read_write.c:572
ksys_read+0x128/0x220 fs/read_write.c:715
__do_sys_read fs/read_write.c:724 [inline]
__se_sys_read fs/read_write.c:722 [inline]
__arm64_sys_read+0x7c/0x90 fs/read_write.c:722
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0xa4/0x25c arch/arm64/kernel/syscall.c:49
el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132
page last free pid 18319 tgid 17452 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
__free_pages_prepare mm/page_alloc.c:1483 [inline]
__free_frozen_pages+0x1274/0x1798 mm/page_alloc.c:3064
free_frozen_pages+0x14/0x20 mm/page_alloc.c:3114
free_large_kmalloc+0xc0/0x154 mm/slub.c:6867
kfree+0x338/0x548 mm/slub.c:6935
kvfree+0x30/0x40 mm/slub.c:7247
seq_release+0x58/0x7c fs/seq_file.c:368
proc_seq_release+0x9c/0xbc fs/proc/generic.c:623
close_pdeo+0x1b0/0x340 fs/proc/inode.c:242
proc_reg_release+0x144/0x174 fs/proc/inode.c:547
__fput+0x340/0x754 fs/file_table.c:495
fput_close_sync+0x110/0x2c0 fs/file_table.c:600
__do_sys_close fs/open.c:1591 [inline]
__se_sys_close fs/open.c:1576 [inline]
__arm64_sys_close+0x7c/0x118 fs/open.c:1576
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0xa4/0x25c arch/arm64/kernel/syscall.c:49
el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132
do_el0_svc+0x4c/0x5c arch/arm64/kernel/syscall.c:151
el0_svc+0x5c/0x1dc arch/arm64/kernel/entry-common.c:749
Memory state around the buggy address:
ffff000810393f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff000810393f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff000810394000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff000810394080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff000810394100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
* Re: [PATCH 1/5] KVM: arm64: Grab KVM MMU write lock in kvm_arch_flush_shadow_all()
2026-05-04 22:42 ` [PATCH 1/5] KVM: arm64: Grab KVM MMU write lock " James Houghton
@ 2026-05-04 23:10 ` James Houghton
0 siblings, 0 replies; 8+ messages in thread
From: James Houghton @ 2026-05-04 23:10 UTC (permalink / raw)
To: jthoughton
Cc: chenhuacai, gshan, jhogan, joey.gouly, kvm, kvmarm,
linux-arm-kernel, linux-kernel, linux-mips, loongarch, maobibo,
maz, oupton, pbonzini, ricarkol, seanjc, shahuang, stable,
suzuki.poulose, yuzenghui, zhaotianrui
On Mon, May 4, 2026 at 3:42 PM James Houghton <jthoughton@google.com> wrote:
>
> kvm_arch_flush_shadow_all() may sometimes be called on the same `kvm`
> concurrently in the event that the KVM's `mm` is __mmput() at the
> same time that the last reference to the KVM is being dropped.
>
> T1 T2
> KVM_CREATE_VM
> Get VM file from T1
> close VM
> exit_mm() close VM
>
> T1: exit_mm() -> kvm_mmu_notifier_release() -> kvm_flush_shadow_all(),
> with only the KVM srcu read lock held.
>
> T2: kvm_vm_release() ---> mmu_notifier_unregister() ->
> kvm_mmu_notifier_release() -> kvm_flush_shadow_all(),
> again, with only the KVM srcu read lock held.
>
> This leads to a potential double-free in
> kvm_mmu_free_memory_cache() and, now with NV, of
> kvm->arch.nested_mmus.
>
> Cc: stable@vger.kernel.org
> Fixes: e7bf7a490c68 ("KVM: arm64: Split huge pages when dirty logging is enabled")
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
> arch/arm64/include/asm/kvm_mmu.h | 1 +
> arch/arm64/kvm/mmu.c | 23 +++++++++++++++++++----
> arch/arm64/kvm/nested.c | 4 +++-
> 3 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 01e9c72d6aa7..30d5c24fcebb 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -178,6 +178,7 @@ void stage2_unmap_vm(struct kvm *kvm);
> int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type);
> void kvm_uninit_stage2_mmu(struct kvm *kvm);
> void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
> +void kvm_free_stage2_pgd_locked(struct kvm_s2_mmu *mmu);
> int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> phys_addr_t pa, unsigned long size, bool writable);
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d089c107d9b7..4bab407d43bb 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1021,7 +1021,9 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
>
> void kvm_uninit_stage2_mmu(struct kvm *kvm)
> {
> - kvm_free_stage2_pgd(&kvm->arch.mmu);
> + lockdep_assert_held_write(&kvm->mmu_lock);
*facepalm*.... this doesn't account for the other callers of
kvm_uninit_stage2_mmu(). They will get lockdep warnings.
I've attached a diff to the bottom of this reply that *does* deal with them.
:( Sorry.
I'm guessing Marc or Oliver will probably want this patch to look quite
different, so I'll wait to hear from them before actually sending a v2.
In the meantime, I'll properly retest with lockdep enabled.
> +
> + kvm_free_stage2_pgd_locked(&kvm->arch.mmu);
> kvm_mmu_free_memory_cache(&kvm->arch.mmu.split_page_cache);
> }
>
> @@ -1095,12 +1097,14 @@ void stage2_unmap_vm(struct kvm *kvm)
> srcu_read_unlock(&kvm->srcu, idx);
> }
>
> -void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
> +static void __kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu, bool locked)
> {
> struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
> struct kvm_pgtable *pgt = NULL;
>
> - write_lock(&kvm->mmu_lock);
> + if (!locked)
> + write_lock(&kvm->mmu_lock);
> +
> pgt = mmu->pgt;
> if (pgt) {
> mmu->pgd_phys = 0;
> @@ -1111,7 +1115,8 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
> if (kvm_is_nested_s2_mmu(kvm, mmu))
> kvm_init_nested_s2_mmu(mmu);
>
> - write_unlock(&kvm->mmu_lock);
> + if (!locked)
> + write_unlock(&kvm->mmu_lock);
>
> if (pgt) {
> kvm_stage2_destroy(pgt);
> @@ -1119,6 +1124,16 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
> }
> }
>
> +void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
> +{
> + __kvm_free_stage2_pgd(mmu, false);
> +}
> +
> +void kvm_free_stage2_pgd_locked(struct kvm_s2_mmu *mmu)
> +{
> + __kvm_free_stage2_pgd(mmu, true);
> +}
> +
> static void hyp_mc_free_fn(void *addr, void *mc)
> {
> struct kvm_hyp_memcache *memcache = mc;
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 883b6c1008fb..977598bff5e6 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -1190,11 +1190,13 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
> {
> int i;
>
> + guard(write_lock)(&kvm->mmu_lock);
> +
> for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
>
> if (!WARN_ON(atomic_read(&mmu->refcnt)))
> - kvm_free_stage2_pgd(mmu);
> + kvm_free_stage2_pgd_locked(mmu);
> }
> kvfree(kvm->arch.nested_mmus);
> kvm->arch.nested_mmus = NULL;
> --
> 2.54.0.545.g6539524ca2-goog
And here is the diff that should fix this patch. (Sorry!!)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 30d5c24fcebb..e32e844943be 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -177,6 +177,7 @@ void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t e
void stage2_unmap_vm(struct kvm *kvm);
int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type);
void kvm_uninit_stage2_mmu(struct kvm *kvm);
+void kvm_uninit_stage2_mmu_locked(struct kvm *kvm);
void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
void kvm_free_stage2_pgd_locked(struct kvm_s2_mmu *mmu);
int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4bab407d43bb..98ba8116676c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1019,14 +1019,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
return err;
}
-void kvm_uninit_stage2_mmu(struct kvm *kvm)
-{
- lockdep_assert_held_write(&kvm->mmu_lock);
-
- kvm_free_stage2_pgd_locked(&kvm->arch.mmu);
- kvm_mmu_free_memory_cache(&kvm->arch.mmu.split_page_cache);
-}
-
static void stage2_unmap_memslot(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
@@ -1134,6 +1126,24 @@ void kvm_free_stage2_pgd_locked(struct kvm_s2_mmu *mmu)
__kvm_free_stage2_pgd(mmu, true);
}
+static void __kvm_uninit_stage2_mmu(struct kvm *kvm, bool locked)
+{
+ __kvm_free_stage2_pgd(&kvm->arch.mmu, locked);
+ kvm_mmu_free_memory_cache(&kvm->arch.mmu.split_page_cache);
+}
+
+void kvm_uninit_stage2_mmu(struct kvm *kvm)
+{
+ __kvm_uninit_stage2_mmu(kvm, false);
+}
+
+void kvm_uninit_stage2_mmu_locked(struct kvm *kvm)
+{
+ lockdep_assert_held_write(&kvm->mmu_lock);
+
+ __kvm_uninit_stage2_mmu(kvm, true);
+}
+
static void hyp_mc_free_fn(void *addr, void *mc)
{
struct kvm_hyp_memcache *memcache = mc;
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 977598bff5e6..f61f0244f0fb 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -1201,7 +1201,7 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
kvfree(kvm->arch.nested_mmus);
kvm->arch.nested_mmus = NULL;
kvm->arch.nested_mmus_size = 0;
- kvm_uninit_stage2_mmu(kvm);
+ kvm_uninit_stage2_mmu_locked(kvm);
}
/*
Thread overview: 8+ messages
2026-05-04 22:42 [PATCH 0/5] KVM: Fix race conditions in kvm_arch_flush_shadow_all() James Houghton
2026-05-04 22:42 ` [PATCH 1/5] KVM: arm64: Grab KVM MMU write lock " James Houghton
2026-05-04 23:10 ` James Houghton
2026-05-04 22:42 ` [PATCH 2/5] KVM: loongarch: Grab MMU " James Houghton
2026-05-04 22:42 ` [PATCH 3/5] KVM: mips: " James Houghton
2026-05-04 22:42 ` [PATCH 4/5] KVM: Hold MMU lock exclusively when calling kvm_arch_flush_shadow_all() James Houghton
2026-05-04 22:42 ` [PATCH 5/5] DO NOT MERGE: KVM: selftests: Reproducer for arm64 double-free James Houghton
2026-05-04 22:44 ` [PATCH 0/5] KVM: Fix race conditions in kvm_arch_flush_shadow_all() James Houghton