Re: Potential bug in TDP MMU - Sean Christopherson

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Ignat Korchagin <ignat@cloudflare.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	stevensd@chromium.org, kernel-team <kernel-team@cloudflare.com>
Subject: Re: Potential bug in TDP MMU
Date: Sat, 11 Dec 2021 02:39:46 +0000	[thread overview]
Message-ID: <YbQPcsnpowmCP7G8@google.com> (raw)
In-Reply-To: <CALrw=nE+yGtRi-0bFFwXa9R8ydHKV7syRYeAYuC0EBTvdFiidQ@mail.gmail.com>

On Fri, Dec 10, 2021, Ignat Korchagin wrote:
> I've been trying to figure out the difference between "good" runs and
> "bad" runs of gvisor. So, if I've been running the following bpftrace
> onliner:

...

> That is, I never get a stack with
> kvm_tdp_mmu_put_root->..->kvm_set_pfn_dirty with a "good" run.
> Perhaps, this may shed some light onto what is going on.

Hmm, a little?

Based on the WARN backtrace, KVM encounters an entire chain of valid, present TDP
MMU paging structures _after_ exit_mm() in the do_exit() path, as the call to
task_work_run() in do_exit() occurs after exit_mm().

That means that kvm_mmu_zap_all() is guaranteed to have been called before the
fatal kvm_arch_destroy_vm(), as either:

  a) exit_mm() put the last reference to mm_users and thus called __mmput ->
     exit_mmap() -> mmu_notifier_release() -> ... -> kvm_mmu_zap_all().

  b) Something else had a reference to mm_users, and so KVM's ->release hook was
     invoked by kvm_destroy_vm() -> mmu_notifier_unregister().

It's probably fairly safe to assume this is a TDP MMU bug, which rules out races
or bad refcounts in other areas.

That means that KVM (a) is somehow losing track of a root, (b) isn't zapping all
SPTEs in kvm_mmu_zap_all(), or (c) is installing a SPTE after the mm has been released.

(a) is unlikely because kvm_tdp_mmu_get_vcpu_root_hpa() is the only way for a
vCPU to get a reference, and it holds mmu_lock for write, doesn't yield, and
either gets a root from the list or adds a root to the list.

(b) is unlikely because I would expect the fallout to be much larger and not
unique to your setup.

That leaves (c), which isn't all that likely either.  I can think of a variety of
ways KVM might write a defunct SPTE, but I can't concoct a scenario where an
entire tree of a present paging structures is written.

Can you run with the below debug patch and see if you get a hit in the failure
scenario?  Or possibly even a non-failure scenario?  This should either confirm
or rule out (c).

---
 arch/x86/kvm/mmu/mmu.c     | 2 ++
 arch/x86/kvm/mmu/tdp_mmu.c | 5 +++++
 include/linux/kvm_host.h   | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1ccee4d17481..e4e283a38570 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5939,6 +5939,8 @@ void kvm_mmu_zap_all(struct kvm *kvm)
 	LIST_HEAD(invalid_list);
 	int ign;

+	atomic_set(&kvm->mm_released, 1);
+
 	write_lock(&kvm->mmu_lock);
 restart:
 	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index b69e47e68307..432ccf05f446 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -504,6 +504,9 @@ static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm,
 {
 	lockdep_assert_held_read(&kvm->mmu_lock);

+	WARN_ON(atomic_read(&kvm->mm_released) &&
+		new_spte && !is_removed_spte(new_spte));
+
 	/*
 	 * Do not change removed SPTEs. Only the thread that froze the SPTE
 	 * may modify it.
@@ -577,6 +580,8 @@ static inline void __tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
 {
 	lockdep_assert_held_write(&kvm->mmu_lock);

+	WARN_ON(atomic_read(&kvm->mm_released) && new_spte);
+
 	/*
 	 * No thread should be using this function to set SPTEs to the
 	 * temporary removed SPTE value.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e7bfcc3b6b0b..8e76e2f6c3be 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -569,6 +569,8 @@ struct kvm {

 	struct mutex slots_lock;

+	atomic_t mm_released;
+
 	/*
 	 * Protects the arch-specific fields of struct kvm_memory_slots in
 	 * use by the VM. To be used under the slots_lock (above) or in a

base-commit: 1c10f4b4877ffaed602d12ff8cbbd5009e82c970
--

next prev parent reply	other threads:[~2021-12-11  2:39 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-29 21:44 Potential bug in TDP MMU Ignat Korchagin
2021-11-30  9:29 ` Paolo Bonzini
2021-11-30 10:58   ` Ignat Korchagin
2021-11-30 10:59     ` Ignat Korchagin
2021-11-30 11:11     ` Paolo Bonzini
2021-11-30 11:19       ` Ignat Korchagin
2021-11-30 11:43         ` Ignat Korchagin
2021-11-30 11:49           ` Paolo Bonzini
2021-11-30 12:13           ` Paolo Bonzini
2021-11-30 20:23     ` Sean Christopherson
2021-12-01 23:44       ` Ignat Korchagin
2021-12-10 23:04         ` Ignat Korchagin
2021-12-11  1:34           ` David Matlack
2021-12-11  1:49             ` Paolo Bonzini
2021-12-11 17:46               ` David Matlack
2021-12-11  1:37           ` Paolo Bonzini
2021-12-11  2:39           ` Sean Christopherson [this message]
2021-12-11  9:38             ` Ignat Korchagin
2021-12-11 20:49             ` Paolo Bonzini
2021-12-13 16:14               ` Sean Christopherson

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:1ccee4d1748 dfblob:e4e283a3857 dfblob:b69e47e6830
dfblob:432ccf05f44 dfblob:e7bfcc3b6b0 dfblob:8e76e2f6c3b )
 OR (
bs:"Re: Potential bug in TDP MMU" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YbQPcsnpowmCP7G8@google.com \
    --to=seanjc@google.com \
    --cc=ignat@cloudflare.com \
    --cc=kernel-team@cloudflare.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=stevensd@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox