From: Sean Christopherson <seanjc@google.com>
To: Ben Gardon <bgardon@google.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
Peter Shier <pshier@google.com>,
David Matlack <dmatlack@google.com>,
Mingwei Zhang <mizhang@google.com>,
Yulei Zhang <yulei.kernel@gmail.com>,
Wanpeng Li <kernellwp@gmail.com>,
Xiao Guangrong <xiaoguangrong.eric@gmail.com>,
Kai Huang <kai.huang@intel.com>,
Keqian Zhu <zhukeqian1@huawei.com>,
David Hildenbrand <david@redhat.com>
Subject: Re: [RFC 11/19] KVM: x86/mmu: Factor shadow_zero_check out of make_spte
Date: Thu, 18 Nov 2021 16:37:29 +0000
Message-ID: <YZaBSf+bPc69WR1R@google.com>
In-Reply-To: <YZXIqAHftH4d+B9Y@google.com>
On Thu, Nov 18, 2021, Sean Christopherson wrote:
> Another idea. The only difference between 5-level and 4-level is that 5-level
> fills in index [4], and I'm pretty sure 4-level doesn't touch that index. For
> PAE NPT (32-bit SVM), the shadow root level will never change, so that's not an issue.
>
> Nested NPT is the only case where anything for an EPT/NPT MMU can change, because
> that follows EFER.NX.
>
> In other words, the non-nested TDP reserved bits don't need to be recalculated
> regardless of level, they can just fill in 5-level and leave it be.
>
> E.g. something like the below. The sp->role.direct check could be removed if we
> forced EFER.NX for nested NPT.
>
> It's a bit ugly in that we'd pass both @kvm and @vcpu, so that needs some more
> thought, but at minimum it means there's no need to recalc the reserved bits.
Ok, I think my final vote is to have the reserved bits passed in, but with the
non-nested TDP reserved bits being computed at MMU init.
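To illustrate the "compute once at init, fill in all five levels" idea, here is a rough userspace sketch. It is not kernel code: struct toy_rsvd_bits, toy_init_rsvd_bits() and toy_is_rsvd() are made-up stand-ins, and the mask layout (everything from MAXPHYADDR up to bit 51 is reserved) is a simplifying assumption for the example only.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_LEVELS 5

/* Hypothetical stand-in for the per-level reserved-bit masks. */
struct toy_rsvd_bits {
	/* index 0 is level 1 (4K), index 4 is level 5 */
	uint64_t mask[TOY_MAX_LEVELS];
};

/* Mask with bits [hi:lo] set, inclusive; yields 0 if lo > hi. */
static inline uint64_t toy_bits(int lo, int hi)
{
	assert(lo >= 0 && lo <= 63 && hi >= 0 && hi <= 63);
	return (~0ULL >> (63 - hi)) & (~0ULL << lo);
}

/*
 * Run once at "MMU init".  All five levels are always filled in; a
 * 4-level walk simply never looks at index 4, so nothing has to be
 * recomputed if the root level differs.
 */
static inline void toy_init_rsvd_bits(struct toy_rsvd_bits *r, int maxphyaddr)
{
	for (int i = 0; i < TOY_MAX_LEVELS; i++)
		r->mask[i] = toy_bits(maxphyaddr, 51);
}

/* Nonzero if @spte has any bit set that is reserved at @level. */
static inline int toy_is_rsvd(const struct toy_rsvd_bits *r, uint64_t spte,
			      int level)
{
	assert(level >= 1 && level <= TOY_MAX_LEVELS);
	return (spte & r->mask[level - 1]) != 0;
}
```

With a table like this, the check at level 4 and the check at level 5 read different slots of the same init-time structure, which is why the level alone would not force a recalculation.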
I would also prefer to keep the existing make_spte() name so that there's no churn
in those call sites, and to make the relationship between the wrapper, make_spte(),
and the "real" helper, __make_spte(), more obvious and aligned with the usual
kernel style.
So with the kvm_vcpu_ad_need_write_protect() change and my proposed hack-a-fix for
kvm_x86_get_mt_mask(), the end result would look like:
bool __make_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
		 struct kvm_memory_slot *slot, unsigned int pte_access,
		 gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch,
		 bool can_unsync, bool host_writable, u64 *new_spte,
		 struct rsvd_bits_validate *shadow_rsvd_bits)
{
	int level = sp->role.level;
	u64 spte = SPTE_MMU_PRESENT_MASK;
	bool wrprot = false;

	if (sp->role.ad_disabled)
		spte |= SPTE_TDP_AD_DISABLED_MASK;
	else if (kvm_mmu_page_ad_need_write_protect(sp))
		spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;

	/*
	 * For the EPT case, shadow_present_mask is 0 if hardware
	 * supports exec-only page table entries. In that case,
	 * ACC_USER_MASK and shadow_user_mask are used to represent
	 * read access. See FNAME(gpte_access) in paging_tmpl.h.
	 */
	spte |= shadow_present_mask;
	if (!prefetch)
		spte |= spte_shadow_accessed_mask(spte);

	if (level > PG_LEVEL_4K && (pte_access & ACC_EXEC_MASK) &&
	    is_nx_huge_page_enabled()) {
		pte_access &= ~ACC_EXEC_MASK;
	}

	if (pte_access & ACC_EXEC_MASK)
		spte |= shadow_x_mask;
	else
		spte |= shadow_nx_mask;

	if (pte_access & ACC_USER_MASK)
		spte |= shadow_user_mask;

	if (level > PG_LEVEL_4K)
		spte |= PT_PAGE_SIZE_MASK;
	if (tdp_enabled)
		spte |= static_call(kvm_x86_get_mt_mask)(kvm, gfn,
							 kvm_is_mmio_pfn(pfn));

	if (host_writable)
		spte |= shadow_host_writable_mask;
	else
		pte_access &= ~ACC_WRITE_MASK;

	if (!kvm_is_mmio_pfn(pfn))
		spte |= shadow_me_mask;

	spte |= (u64)pfn << PAGE_SHIFT;

	if (pte_access & ACC_WRITE_MASK) {
		spte |= PT_WRITABLE_MASK | shadow_mmu_writable_mask;

		/*
		 * Optimization: for pte sync, if spte was writable the hash
		 * lookup is unnecessary (and expensive). Write protection
		 * is responsibility of kvm_mmu_get_page / kvm_mmu_sync_roots.
		 * Same reasoning can be applied to dirty page accounting.
		 */
		if (is_writable_pte(old_spte))
			goto out;

		/*
		 * Unsync shadow pages that are reachable by the new, writable
		 * SPTE. Write-protect the SPTE if the page can't be unsync'd,
		 * e.g. it's write-tracked (upper-level SPs) or has one or more
		 * shadow pages and unsync'ing pages is not allowed.
		 */
		if (mmu_try_to_unsync_pages(kvm, slot, gfn, can_unsync, prefetch)) {
			pgprintk("%s: found shadow page for %llx, marking ro\n",
				 __func__, gfn);
			wrprot = true;
			pte_access &= ~ACC_WRITE_MASK;
			spte &= ~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
		}
	}

	if (pte_access & ACC_WRITE_MASK)
		spte |= spte_shadow_dirty_mask(spte);

out:
	if (prefetch)
		spte = mark_spte_for_access_track(spte);

	WARN_ONCE(is_rsvd_spte(shadow_rsvd_bits, spte, level),
		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
		  get_rsvd_bits(shadow_rsvd_bits, spte, level));

	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
		/* Enforced by kvm_mmu_hugepage_adjust. */
		WARN_ON(level > PG_LEVEL_4K);
		mark_page_dirty_in_slot(kvm, slot, gfn);
	}

	*new_spte = spte;
	return wrprot;
}

bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
	       struct kvm_memory_slot *slot,
	       unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
	       u64 old_spte, bool prefetch, bool can_unsync,
	       bool host_writable, u64 *new_spte)
{
	return __make_spte(vcpu->kvm, sp, slot, pte_access, gfn, pfn, old_spte,
			   prefetch, can_unsync, host_writable, new_spte,
			   &vcpu->arch.mmu->shadow_zero_check);
}
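
For what it's worth, the calling convention can be modeled in a few lines of userspace C. This is only a sketch under assumptions: every demo_* name is invented for illustration, the single-mask demo_rsvd is a drastic simplification of the real reserved-bit state, and where a VM-wide copy of the init-time bits would actually live is left open. The inner helper is spelled demo_do_make_spte() rather than with a leading double underscore because such identifiers are reserved outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature of the reserved-bit state: one mask, no levels. */
struct demo_rsvd {
	uint64_t mask;
};

struct demo_kvm {
	struct demo_rsvd tdp_rsvd;	/* assumed: computed once at init */
};

struct demo_vcpu {
	struct demo_kvm *kvm;
	struct demo_rsvd mmu_rsvd;	/* per-vCPU MMU state */
};

/*
 * Inner helper: needs no vCPU, the caller passes the reserved bits in.
 * Returns true if the SPTE it built has no reserved bits set.
 */
static inline bool demo_do_make_spte(uint64_t pfn, const struct demo_rsvd *rsvd,
				     uint64_t *new_spte)
{
	uint64_t spte = (pfn << 12) | 1;	/* bit 0 stands in for "present" */

	assert(new_spte);
	*new_spte = spte;
	return !(spte & rsvd->mask);
}

/* Wrapper with the existing name: pulls the bits from the vCPU's MMU. */
static inline bool demo_make_spte(struct demo_vcpu *vcpu, uint64_t pfn,
				  uint64_t *new_spte)
{
	return demo_do_make_spte(pfn, &vcpu->mmu_rsvd, new_spte);
}

/* A caller without a vCPU hands in the init-time bits instead. */
static inline bool demo_make_spte_no_vcpu(struct demo_kvm *kvm, uint64_t pfn,
					  uint64_t *new_spte)
{
	return demo_do_make_spte(pfn, &kvm->tdp_rsvd, new_spte);
}
```

Existing callers keep using the wrapper unchanged, and only the vCPU-less path has to know where its reserved bits come from.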