From: Sean Christopherson <seanjc@google.com>
To: James Houghton <jthoughton@google.com>
Cc: chenhuacai@kernel.org, gshan@redhat.com, jhogan@kernel.org,
joey.gouly@arm.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
loongarch@lists.linux.dev, maobibo@loongson.cn, maz@kernel.org,
oupton@kernel.org, pbonzini@redhat.com, ricarkol@google.com,
shahuang@redhat.com, stable@vger.kernel.org,
suzuki.poulose@arm.com, yuzenghui@huawei.com,
zhaotianrui@loongson.cn
Subject: Re: [PATCH 1/5] KVM: arm64: Grab KVM MMU write lock in kvm_arch_flush_shadow_all()
Date: Tue, 5 May 2026 10:05:15 -0700 [thread overview]
Message-ID: <afohshVlK9YcBk-f@google.com> (raw)
In-Reply-To: <20260504231048.1184273-1-jthoughton@google.com>
On Mon, May 04, 2026, James Houghton wrote:
> On Mon, May 4, 2026 at 3:42 PM James Houghton <jthoughton@google.com> wrote:
> >
> > kvm_arch_flush_shadow_all() may be called concurrently on the same `kvm`
> > when the KVM's `mm` goes through __mmput() at the same time that the
> > last reference to the KVM is being dropped.
> >
> > T1 T2
> > KVM_CREATE_VM
> > Get VM file from T1
> > close VM
> > exit_mm() close VM
> >
> > T1: exit_mm() -> kvm_mmu_notifier_release() -> kvm_flush_shadow_all(),
> > with only the KVM srcu read lock held.
> >
> > T2: kvm_vm_release() ---> mmu_notifier_unregister() ->
> > kvm_mmu_notifier_release() -> kvm_flush_shadow_all(),
> > again, with only the KVM srcu read lock held.
> >
> > This leads to a potential double-free of
> > kvm->arch.kvm_mmu_free_memory_cache and, now with NV, of
> > kvm->arch.nested_mmus.
...
> > void kvm_uninit_stage2_mmu(struct kvm *kvm)
> > {
> > - kvm_free_stage2_pgd(&kvm->arch.mmu);
> > + lockdep_assert_held_write(&kvm->mmu_lock);
>
> *facepalm*.... this doesn't account for the other callers of
> kvm_uninit_stage2_mmu(). They will get lockdep warnings.
>
> I've attached a diff to the bottom of this reply that *does* deal with them.
> :( Sorry.
...
> > diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> > index 883b6c1008fb..977598bff5e6 100644
> > --- a/arch/arm64/kvm/nested.c
> > +++ b/arch/arm64/kvm/nested.c
> > @@ -1190,11 +1190,13 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
> > {
> > int i;
> >
> > + guard(write_lock)(&kvm->mmu_lock);
> > +
> > for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> > struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> >
> > if (!WARN_ON(atomic_read(&mmu->refcnt)))
> > - kvm_free_stage2_pgd(mmu);
> > + kvm_free_stage2_pgd_locked(mmu);
> > }
> > kvfree(kvm->arch.nested_mmus);
> > kvm->arch.nested_mmus = NULL;
> > --
> > 2.54.0.545.g6539524ca2-goog
>
> And here is the diff that should fix this patch. (Sorry!!)
There are more issues. kvm->arch.mmu.split_page_cache can be freed by
kvm_arch_commit_memory_region(), which holds slots_lock and slots_arch_lock,
but not mmu_lock.
IMO, the handling of kvm->arch.mmu.split_page_cache should be reworked. I don't
entirely get the motivation for aggressively freeing the cache. The cache will
only be filled if KVM actually does eager page splitting, so it's not like KVM is
burning pages for setups that will never use the cache.
Maybe I'm underestimating how many pages arm64 needs in the worst case scenario?
(I can't follow the math, too many macros.)  But if KVM is configuring the cache
with a capacity that's _so_ high that the "wasted" memory is problematic, then we
should probably revisit the capacity and the algorithm.  E.g. if KVM is splitting
from 1GiB => 4KiB in a single pass (I can't tell if KVM does this on arm64), then
we could break that into a 1GiB => 2MiB => 4KiB sequence.
Thread overview: 13+ messages
2026-05-04 22:42 [PATCH 0/5] KVM: Fix race conditions in kvm_arch_flush_shadow_all() James Houghton
2026-05-04 22:42 ` [PATCH 1/5] KVM: arm64: Grab KVM MMU write lock " James Houghton
2026-05-04 23:10 ` James Houghton
2026-05-05 17:05 ` Sean Christopherson [this message]
2026-05-05 18:01 ` James Houghton
2026-05-05 18:16 ` Sean Christopherson
2026-05-05 20:14 ` Sean Christopherson
2026-05-06 2:27 ` Bibo Mao
2026-05-04 22:42 ` [PATCH 2/5] KVM: loongarch: Grab MMU " James Houghton
2026-05-04 22:42 ` [PATCH 3/5] KVM: mips: " James Houghton
2026-05-04 22:42 ` [PATCH 4/5] KVM: Hold MMU lock exclusively when calling kvm_arch_flush_shadow_all() James Houghton
2026-05-04 22:42 ` [PATCH 5/5] DO NOT MERGE: KVM: selftests: Reproducer for arm64 double-free James Houghton
2026-05-04 22:44 ` [PATCH 0/5] KVM: Fix race conditions in kvm_arch_flush_shadow_all() James Houghton