From: Sean Christopherson <seanjc@google.com>
To: Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
David Matlack <dmatlack@google.com>,
Pattara Teerapong <pteerapong@google.com>
Subject: [PATCH 1/8] KVM: x86/mmu: Zap invalidated TDP MMU roots at 4KiB granularity
Date: Wed, 10 Jan 2024 18:00:41 -0800 [thread overview]
Message-ID: <20240111020048.844847-2-seanjc@google.com> (raw)
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
Zap invalidated TDP MMU roots at maximum granularity, i.e. with more
frequent conditional resched checkpoints, in order to avoid running for an
extended duration (milliseconds, or worse) without honoring a reschedule
request. And for kernels running with full or real-time preempt models,
zapping at 4KiB granularity also provides significantly reduced latency
for other tasks that are contending for mmu_lock (which isn't necessarily
an overall win for KVM, but KVM should do its best to honor the kernel's
preemption model).
To keep KVM's assertion that zapping at 1GiB granularity is functionally
ok, which is the main reason 1GiB was selected in the past, skip straight
to zapping at 1GiB if KVM is configured to prove the MMU. Zapping roots
is far more common than a vCPU replacing a 1GiB page table with a hugepage,
e.g. generally happens multiple times during boot, and so keeping the test
coverage provided by root zaps is desirable, just not for production.
Cc: David Matlack <dmatlack@google.com>
Cc: Pattara Teerapong <pteerapong@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/tdp_mmu.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 6ae19b4ee5b1..372da098d3ce 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -734,15 +734,26 @@ static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
rcu_read_lock();
/*
- * To avoid RCU stalls due to recursively removing huge swaths of SPs,
- * split the zap into two passes. On the first pass, zap at the 1gb
- * level, and then zap top-level SPs on the second pass. "1gb" is not
- * arbitrary, as KVM must be able to zap a 1gb shadow page without
- * inducing a stall to allow in-place replacement with a 1gb hugepage.
+ * Zap roots in multiple passes of decreasing granularity, i.e. zap at
+ * 4KiB=>2MiB=>1GiB=>root, in order to better honor need_resched() (all
+ * preempt models) or mmu_lock contention (full or real-time models).
+ * Zapping at finer granularity marginally increases the total time of
+ * the zap, but in most cases the zap itself isn't latency sensitive.
*
- * Because zapping a SP recurses on its children, stepping down to
- * PG_LEVEL_4K in the iterator itself is unnecessary.
+ * If KVM is configured to prove the MMU, skip the 4KiB and 2MiB zaps
+ * in order to mimic the page fault path, which can replace a 1GiB page
+ * table with an equivalent 1GiB hugepage, i.e. can get saddled with
+ * zapping a 1GiB region that's fully populated with 4KiB SPTEs. This
+ * allows verifying that KVM can safely zap 1GiB regions, e.g. without
+ * inducing RCU stalls, without relying on a relatively rare event
+ * (zapping roots is orders of magnitude more common). Note, because
+ * zapping a SP recurses on its children, stepping down to PG_LEVEL_4K
+ * in the iterator itself is unnecessary.
*/
+ if (!IS_ENABLED(CONFIG_KVM_PROVE_MMU)) {
+ __tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_4K);
+ __tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_2M);
+ }
__tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_1G);
__tdp_mmu_zap_root(kvm, root, shared, root->role.level);
--
2.43.0.275.g3460e3d667-goog
next prev parent reply other threads:[~2024-01-11 2:00 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-11 2:00 [PATCH 0/8] KVM: x86/mmu: Allow TDP MMU (un)load to run in parallel Sean Christopherson
2024-01-11 2:00 ` Sean Christopherson [this message]
2024-01-11 2:00 ` [PATCH 2/8] KVM: x86/mmu: Don't do TLB flush when zappings SPTEs in invalid roots Sean Christopherson
2024-01-11 2:00 ` [PATCH 3/8] KVM: x86/mmu: Allow passing '-1' for "all" as_id for TDP MMU iterators Sean Christopherson
2024-01-11 2:00 ` [PATCH 4/8] KVM: x86/mmu: Skip invalid roots when zapping leaf SPTEs for GFN range Sean Christopherson
2024-01-11 2:00 ` [PATCH 5/8] KVM: x86/mmu: Skip invalid TDP MMU roots when write-protecting SPTEs Sean Christopherson
2024-01-11 2:00 ` [PATCH 6/8] KVM: x86/mmu: Check for usable TDP MMU root while holding mmu_lock for read Sean Christopherson
2024-02-06 10:09 ` Xu Yilun
2024-02-06 14:51 ` Xu Yilun
2024-02-06 18:21 ` Sean Christopherson
2024-02-07 14:54 ` Xu Yilun
2024-01-11 2:00 ` [PATCH 7/8] KVM: x86/mmu: Alloc TDP MMU roots " Sean Christopherson
2024-02-06 15:39 ` Xu Yilun
2024-02-06 18:10 ` Sean Christopherson
2024-02-07 15:13 ` Xu Yilun
2024-01-11 2:00 ` [PATCH 8/8] KVM: x86/mmu: Free TDP MMU roots while holding mmy_lock " Sean Christopherson
2024-02-23 1:35 ` [PATCH 0/8] KVM: x86/mmu: Allow TDP MMU (un)load to run in parallel Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240111020048.844847-2-seanjc@google.com \
--to=seanjc@google.com \
--cc=dmatlack@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=pteerapong@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox