From: Sean Christopherson <seanjc@google.com>
To: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Paolo Bonzini <pbonzini@redhat.com>,
Lai Jiangshan <jiangshan.ljs@antgroup.com>,
Jonathan Corbet <corbet@lwn.net>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
linux-doc@vger.kernel.org
Subject: Re: [RFC PATCH V3 3/4] KVM: X86: Alloc role.pae_root shadow page
Date: Tue, 12 Apr 2022 21:14:26 +0000 [thread overview]
Message-ID: <YlXrshJa2Sd1WQ0P@google.com> (raw)
In-Reply-To: <20220330132152.4568-4-jiangshanlai@gmail.com>
On Wed, Mar 30, 2022, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>
> Currently pae_root is special root page, this patch adds facility to
> allow using kvm_mmu_get_page() to allocate pae_root shadow page.
I don't think this will work for shadow paging. CR3 only has to be 32-byte aligned
for PAE paging. Unless I'm missing something subtle in the code, KVM will incorrectly
reuse a pae_root if the guest puts multiple PAE CR3s on a single page because KVM's
gfn calculation will drop bits 11:5.
Handling this as a one-off is probably easier. For TDP, only 32-bit KVM with NPT
benefits from reusing roots, IMO and shaving a few pages in that case is not worth
the complexity.
> @@ -332,7 +337,8 @@ union kvm_mmu_page_role {
> unsigned ad_disabled:1;
> unsigned guest_mode:1;
> unsigned glevel:4;
> - unsigned :2;
> + unsigned pae_root:1;
If we do end up adding a role bit, it can simply be "root", which may or may not
be useful for other things. is_pae_root() is then a combo of root+level. This
will clean up the code a bit as role.root is (mostly?) hardcoded based on the
function, e.g. root allocators set it, child allocators clear it.
> + unsigned :1;
>
> /*
> * This is left at the top of the word so that
> @@ -699,6 +705,7 @@ struct kvm_vcpu_arch {
> struct kvm_mmu_memory_cache mmu_shadow_page_cache;
> struct kvm_mmu_memory_cache mmu_gfn_array_cache;
> struct kvm_mmu_memory_cache mmu_page_header_cache;
> + void *mmu_pae_root_cache;
>
> /*
> * QEMU userspace and the guest each have their own FPU state.
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index d53037df8177..81ccaa7c1165 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -694,6 +694,35 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
> }
> }
>
> +static int mmu_topup_pae_root_cache(struct kvm_vcpu *vcpu)
> +{
> + struct page *page;
> +
> + if (vcpu->arch.mmu->shadow_root_level != PT32E_ROOT_LEVEL)
> + return 0;
> + if (vcpu->arch.mmu_pae_root_cache)
> + return 0;
> +
> + page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_DMA32);
> + if (!page)
> + return -ENOMEM;
> + vcpu->arch.mmu_pae_root_cache = page_address(page);
> +
> + /*
> + * CR3 is only 32 bits when PAE paging is used, thus it's impossible to
> + * get the CPU to treat the PDPTEs as encrypted. Decrypt the page so
> + * that KVM's writes and the CPU's reads get along. Note, this is
> + * only necessary when using shadow paging, as 64-bit NPT can get at
> + * the C-bit even when shadowing 32-bit NPT, and SME isn't supported
> + * by 32-bit kernels (when KVM itself uses 32-bit NPT).
> + */
> + if (!tdp_enabled)
> + set_memory_decrypted((unsigned long)vcpu->arch.mmu_pae_root_cache, 1);
> + else
> + WARN_ON_ONCE(shadow_me_mask);
> + return 0;
> +}
> +
> static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> {
> int r;
> @@ -705,6 +734,9 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> return r;
> r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> PT64_ROOT_MAX_LEVEL);
> + if (r)
> + return r;
> + r = mmu_topup_pae_root_cache(vcpu);
This doesn't need to be called from the common mmu_topup_memory_caches(), e.g. it
will unnecessarily require allocating another DMA32 page when handling a page fault.
I'd rather call this directly kvm_mmu_load(), which also makes it more obvious
that the cache really is only used for roots.
> if (r)
> return r;
> if (maybe_indirect) {
> @@ -717,12 +749,23 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> PT64_ROOT_MAX_LEVEL);
> }
>
...
> static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
> struct kvm_mmu_page *sp)
> {
> + struct kvm_mmu_page *parent_sp = sptep_to_sp(sptep);
> u64 spte;
>
> BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
>
> - spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
> + if (!parent_sp->role.pae_root)
Hmm, without role.root, this could be:
if (sp->role.level == (PT32E_ROOT_level - 1) &&
((__pa(sptep) & PT64_BASE_ADDR_MASK) == vcpu->arch.mmu->root.hpa))
spte = make_pae_pdpte(sp->spt);
else
spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
Which is gross, but it works. We could also do FNAME(link_shadow_page) to send
PAE roots down a dedicated path (also gross). Point being, I don't think we
strictly need a "root" flag unless the PAE roots are put in mmu_page_hash.
> + spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
> + else
> + spte = make_pae_pdpte(sp->spt);
>
> mmu_spte_set(sptep, spte);
>
> @@ -4782,6 +4847,8 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
> role.base.level = kvm_mmu_get_tdp_level(vcpu);
> role.base.direct = true;
> role.base.has_4_byte_gpte = false;
> + if (role.base.level == PT32E_ROOT_LEVEL)
> + role.base.pae_root = 1;
>
> return role;
> }
> @@ -4848,6 +4915,9 @@ kvm_calc_shadow_mmu_root_page_role(struct kvm_vcpu *vcpu,
> else
> role.base.level = PT64_ROOT_4LEVEL;
>
> + if (role.base.level == PT32E_ROOT_LEVEL)
> + role.base.pae_root = 1;
> +
> return role;
> }
>
> @@ -4893,6 +4963,8 @@ kvm_calc_shadow_npt_root_page_role(struct kvm_vcpu *vcpu,
>
> role.base.direct = false;
> role.base.level = kvm_mmu_get_tdp_level(vcpu);
> + if (role.base.level == PT32E_ROOT_LEVEL)
> + role.base.pae_root = 1;
>
> return role;
> }
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 67489a060eba..1015f33e0758 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -1043,6 +1043,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> .access = 0x7,
> .quadrant = 0x3,
> .glevel = 0xf,
> + .pae_root = 0x1,
> };
>
> /*
> --
> 2.19.1.6.gb485710b
>
next prev parent reply other threads:[~2022-04-12 22:30 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-30 13:21 [RFC PATCH V3 0/4] KVM: X86: Add and use shadow page with level expanded or acting as pae_root Lai Jiangshan
2022-03-30 13:21 ` [RFC PATCH V3 1/4] KVM: X86: Add arguement gfn and role to kvm_mmu_alloc_page() Lai Jiangshan
2022-03-30 13:21 ` [RFC PATCH V3 2/4] KVM: X86: Introduce role.glevel for level expanded pagetable Lai Jiangshan
2022-03-30 16:01 ` Lai Jiangshan
2022-04-12 21:31 ` Sean Christopherson
2022-04-13 4:13 ` Lai Jiangshan
2022-04-13 8:38 ` Paolo Bonzini
2022-04-13 14:42 ` Sean Christopherson
2022-04-13 14:46 ` Paolo Bonzini
2022-04-13 15:32 ` Sean Christopherson
2022-04-13 16:03 ` Paolo Bonzini
2022-04-14 15:51 ` Sean Christopherson
2022-04-14 16:32 ` Lai Jiangshan
2022-03-30 13:21 ` [RFC PATCH V3 3/4] KVM: X86: Alloc role.pae_root shadow page Lai Jiangshan
2022-04-12 21:14 ` Sean Christopherson [this message]
2022-04-14 9:07 ` Lai Jiangshan
2022-04-14 9:08 ` Paolo Bonzini
2022-04-14 9:32 ` Lai Jiangshan
2022-04-14 10:04 ` Paolo Bonzini
2022-04-14 11:06 ` Lai Jiangshan
2022-04-14 14:12 ` Paolo Bonzini
2022-04-14 14:42 ` Sean Christopherson
2022-04-14 13:35 ` Lai Jiangshan
2022-04-14 14:52 ` Sean Christopherson
2022-03-30 13:21 ` [RFC PATCH V3 4/4] KVM: X86: Use passthrough and pae_root shadow page for 32bit guests Lai Jiangshan
2022-04-12 21:34 ` Sean Christopherson
2022-04-12 9:35 ` [RFC PATCH V3 0/4] KVM: X86: Add and use shadow page with level expanded or acting as pae_root Lai Jiangshan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YlXrshJa2Sd1WQ0P@google.com \
--to=seanjc@google.com \
--cc=bp@alien8.de \
--cc=corbet@lwn.net \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=jiangshan.ljs@antgroup.com \
--cc=jiangshanlai@gmail.com \
--cc=jmattson@google.com \
--cc=joro@8bytes.org \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=tglx@linutronix.de \
--cc=vkuznets@redhat.com \
--cc=wanpengli@tencent.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.