From: Sean Christopherson <seanjc@google.com>
To: Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
"Yan Zhao" <yan.y.zhao@intel.com>,
"Sagi Shahar" <sagis@google.com>,
"Alex Bennée" <alex.bennee@linaro.org>,
"David Matlack" <dmatlack@google.com>,
"James Houghton" <jthoughton@google.com>
Subject: [PATCH 02/18] KVM: x86/mmu: Always set SPTE's dirty bit if it's created as writable
Date: Thu, 10 Oct 2024 19:10:34 -0700 [thread overview]
Message-ID: <20241011021051.1557902-3-seanjc@google.com> (raw)
In-Reply-To: <20241011021051.1557902-1-seanjc@google.com>
When creating a SPTE, always set the Dirty bit if the Writable bit is set,
i.e. if KVM is creating a writable mapping. If two (or more) vCPUs are
racing to install a writable SPTE on a !PRESENT fault, only the "winning"
vCPU will create a SPTE with W=1 and D=1, all "losers" will generate a
SPTE with W=1 && D=0.
As a result, tdp_mmu_map_handle_target_level() will fail to detect that
the losing faults are effectively spurious, and will overwrite the D=1
SPTE with a D=0 SPTE. For normal VMs, overwriting a present SPTE is a
small performance blip; KVM blasts a remote TLB flush, but otherwise life
goes on.
For upcoming TDX VMs, overwriting a present SPTE is much more costly, and
can even lead to the VM being terminated if KVM isn't careful, e.g. if KVM
attempts TDH.MEM.PAGE.AUG because the TDX code doesn't detect that the
new SPTE is actually the same as the old SPTE (which would be a bug in its
own right).
Suggested-by: Sagi Shahar <sagis@google.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/spte.c | 28 +++++++++-------------------
1 file changed, 9 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index e5af69a8f101..09ce93c4916a 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -219,30 +219,21 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
if (pte_access & ACC_WRITE_MASK) {
spte |= PT_WRITABLE_MASK | shadow_mmu_writable_mask;
- /*
- * When overwriting an existing leaf SPTE, and the old SPTE was
- * writable, skip trying to unsync shadow pages as any relevant
- * shadow pages must already be unsync, i.e. the hash lookup is
- * unnecessary (and expensive).
- *
- * The same reasoning applies to dirty page/folio accounting;
- * KVM marked the folio dirty when the old SPTE was created,
- * thus there's no need to mark the folio dirty again.
- *
- * Note, both cases rely on KVM not changing PFNs without first
- * zapping the old SPTE, which is guaranteed by both the shadow
- * MMU and the TDP MMU.
- */
- if (is_last_spte(old_spte, level) && is_writable_pte(old_spte))
- goto out;
-
/*
* Unsync shadow pages that are reachable by the new, writable
* SPTE. Write-protect the SPTE if the page can't be unsync'd,
* e.g. it's write-tracked (upper-level SPs) or has one or more
* shadow pages and unsync'ing pages is not allowed.
+ *
+ * When overwriting an existing leaf SPTE, and the old SPTE was
+ * writable, skip trying to unsync shadow pages as any relevant
+ * shadow pages must already be unsync, i.e. the hash lookup is
+ * unnecessary (and expensive). Note, this relies on KVM not
+ * changing PFNs without first zapping the old SPTE, which is
+ * guaranteed by both the shadow MMU and the TDP MMU.
*/
- if (mmu_try_to_unsync_pages(vcpu->kvm, slot, gfn, synchronizing, prefetch)) {
+ if ((!is_last_spte(old_spte, level) || !is_writable_pte(old_spte)) &&
+ mmu_try_to_unsync_pages(vcpu->kvm, slot, gfn, synchronizing, prefetch)) {
wrprot = true;
pte_access &= ~ACC_WRITE_MASK;
spte &= ~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
@@ -252,7 +243,6 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
if (pte_access & ACC_WRITE_MASK)
spte |= spte_shadow_dirty_mask(spte);
-out:
if (prefetch && !synchronizing)
spte = mark_spte_for_access_track(spte);
--
2.47.0.rc1.288.g06298d1525-goog
next prev parent reply other threads:[~2024-10-11 2:10 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-11 2:10 [PATCH 00/18] KVM: x86/mmu: A/D cleanups (on top of kvm_follow_pfn) Sean Christopherson
2024-10-11 2:10 ` [PATCH 01/18] KVM: x86/mmu: Flush remote TLBs iff MMU-writable flag is cleared from RO SPTE Sean Christopherson
2024-10-11 2:10 ` Sean Christopherson [this message]
2024-10-11 2:10 ` [PATCH 03/18] KVM: x86/mmu: Fold all of make_spte()'s writable handling into one if-else Sean Christopherson
2024-10-11 2:10 ` [PATCH 04/18] KVM: x86/mmu: Don't force flush if SPTE update clears Accessed bit Sean Christopherson
2024-10-11 2:10 ` [PATCH 05/18] KVM: x86/mmu: Don't flush TLBs when clearing Dirty bit in shadow MMU Sean Christopherson
2024-10-11 2:10 ` [PATCH 06/18] KVM: x86/mmu: Drop ignored return value from kvm_tdp_mmu_clear_dirty_slot() Sean Christopherson
2024-10-11 2:10 ` [PATCH 07/18] KVM: x86/mmu: Fold mmu_spte_update_no_track() into mmu_spte_update() Sean Christopherson
2024-10-11 2:10 ` [PATCH 08/18] KVM: x86/mmu: WARN and flush if resolving a TDP MMU fault clears MMU-writable Sean Christopherson
2024-10-11 2:10 ` [PATCH 09/18] KVM: x86/mmu: Add a dedicated flag to track if A/D bits are globally enabled Sean Christopherson
2024-10-11 2:10 ` [PATCH 10/18] KVM: x86/mmu: Set shadow_accessed_mask for EPT even if A/D bits disabled Sean Christopherson
2024-10-11 2:10 ` [PATCH 11/18] KVM: x86/mmu: Set shadow_dirty_mask " Sean Christopherson
2024-10-11 2:10 ` [PATCH 12/18] KVM: x86/mmu: Use Accessed bit even when _hardware_ A/D bits are disabled Sean Christopherson
2024-10-11 2:10 ` [PATCH 13/18] KVM: x86/mmu: Process only valid TDP MMU roots when aging a gfn range Sean Christopherson
2024-10-11 2:10 ` [PATCH 14/18] KVM: x86/mmu: Stop processing TDP MMU roots for test_age if young SPTE found Sean Christopherson
2024-10-17 16:52 ` Paolo Bonzini
2024-10-11 2:10 ` [PATCH 15/18] KVM: x86/mmu: Dedup logic for detecting TLB flushes on leaf SPTE changes Sean Christopherson
2024-10-17 16:53 ` Paolo Bonzini
2024-10-11 2:10 ` [PATCH 16/18] KVM: x86/mmu: Set Dirty bit for new SPTEs, even if _hardware_ A/D bits are disabled Sean Christopherson
2024-10-11 2:10 ` [PATCH 17/18] KVM: Allow arch code to elide TLB flushes when aging a young page Sean Christopherson
2024-10-11 2:10 ` [PATCH 18/18] KVM: x86: Don't emit TLB flushes when aging SPTEs for mmu_notifiers Sean Christopherson
2024-10-17 16:55 ` [PATCH 00/18] KVM: x86/mmu: A/D cleanups (on top of kvm_follow_pfn) Paolo Bonzini
2024-10-31 19:51 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241011021051.1557902-3-seanjc@google.com \
--to=seanjc@google.com \
--cc=alex.bennee@linaro.org \
--cc=dmatlack@google.com \
--cc=jthoughton@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=sagis@google.com \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox