From: Sean Christopherson <seanjc@google.com>
To: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Lai Jiangshan <laijs@linux.alibaba.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [RFC PATCH 4/6] KVM: X86: Introduce role.level_promoted
Date: Tue, 4 Jan 2022 22:14:18 +0000	[thread overview]
Message-ID: <YdTGuoh2vYPsRcu+@google.com> (raw)
In-Reply-To: <20211210092508.7185-5-jiangshanlai@gmail.com>

On Fri, Dec 10, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> Level pormotion occurs when mmu->shadow_root_level > mmu->root_level.

s/pormotion/promotion (in multiple places)

That said, I strongly prefer "passthrough" or maybe "nop" over "level_promoted".
Promoted in the context of page level usually refers to making a hugepage, which
is not the case here.
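
E.g., purely a sketch of the rename, keeping the same bit position as the
level_promoted hunk below:

	unsigned guest_mode:1;
	unsigned passthrough:1;
	unsigned :5;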

> 
> There are several cases that can cause level pormotion:
> 
> shadow mmu (shadow paging for 32 bit guest):
> 	case1:	gCR0_PG=1,gEFER_LMA=0,gCR4_PSE=0
> 
> shadow nested NPT (for 32bit L1 hypervisor):
> 	case2:	gCR0_PG=1,gEFER_LMA=0,gCR4_PSE=0,hEFER_LMA=0
> 	case3:	gCR0_PG=1,gEFER_LMA=0,hEFER_LMA=1
> 
> shadow nested NPT (for 64bit L1 hypervisor):
> 	case4:	gEFER_LMA=1,gCR4_LA57=0,hEFER_LMA=1,hCR4_LA57=1
> 
> When level pormotion occurs (32bit guest, case1-3), special roots are
> often used.  But case4 does not use special roots.  It uses a shadow page
> without being fully aware of its specialty.  It might work accidentally:
> 	1) The root_page (root_sp->spt) is allocated with level = 5,
> 	   and root_sp->spt[0] is allocated with the same gfn and the
> 	   same role except role.level = 4.  Luckily they are
> 	   different shadow pages.
> 	2) FNAME(walk_addr_generic) sets walker->table_gfn[4] and
> 	   walker->pt_access[4], which are normally unused when
> 	   mmu->shadow_root_level == mmu->root_level == 4, so that
> 	   FNAME(fetch) can use them to allocate shadow page for
> 	   root_sp->spt[0] and link them when shadow_root_level == 5.
> 
> But it has problems.
> If the guest switches from gCR4_LA57=0 to gCR4_LA57=1 (or vice versa)
> and uses the same gfn as the root for the nNPT before and after
> switching gCR4_LA57, the host (hCR4_LA57=1) would use the same root_sp
> for the guest even though the guest switches gCR4_LA57.  The guest will
> see unexpected pages mapped and L2 can hurt L1.  It is lucky that the
> problem can't hurt L0.
> 
> The root_sp should be like role.direct=1 sometimes: its contents are
> not backed by gptes, root_sp->gfns is meaningless.  For a normal high
> level sp, sp->gfns is often unused and kept zero, but it could be
> relevant and meaningful when sp->gfns is used because they are backed
> by concrete gptes.  For the level-promoted root_sp described before,
> root_sp is just a portal to contribute root_sp->spt[0]; root_sp should
> not have root_sp->gfns, and root_sp->spt[0] should not be dropped if
> gpte[0] of the root gfn is changed.
> 
> This patch adds role.level_promoted to address the two problems.
> role.level_promoted is set when shadow paging is in use and
> role.level > gMMU.level.
> 
> An alternative way to fix the problem of case4 is to also use the
> special root pml5_root for it.  But that would require changing many
> other places because of the assumption that special roots are only
> used for 32bit guests.
> 
> This patch also paves the way to use level promoted shadow pages for
> case1-3, but that requires special handling of PAE paging, so the
> extensive usage of it is not included.
> 
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  3 ++-
>  arch/x86/kvm/mmu/mmu.c          | 15 +++++++++++++--
>  arch/x86/kvm/mmu/paging_tmpl.h  |  1 +
>  3 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 88ecf53f0d2b..6465c83794fc 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -334,7 +334,8 @@ union kvm_mmu_page_role {
>  		unsigned smap_andnot_wp:1;
>  		unsigned ad_disabled:1;
>  		unsigned guest_mode:1;
> -		unsigned :6;
> +		unsigned level_promoted:1;
> +		unsigned :5;

The massive comment above this union needs to be updated.

>  		/*
>  		 * This is left at the top of the word so that
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 54e7cbc15380..4769253e9024 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -767,6 +767,9 @@ static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
>  
>  static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
>  {
> +	if (sp->role.level_promoted)
> +		return sp->gfn;
> +
>  	if (!sp->role.direct)
>  		return sp->gfns[index];
>  
> @@ -776,6 +779,8 @@ static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
>  static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn)
>  {
>  	if (!sp->role.direct) {
> +		if (WARN_ON_ONCE(sp->role.level_promoted && gfn != sp->gfn))
> +			return;
>  		sp->gfns[index] = gfn;

This is wrong, sp->gfns is NULL when sp->role.level_promoted is true.  I believe
you want something similar to the "get" flow, e.g.

	if (sp->role.passthrough) {
		WARN_ON_ONCE(gfn != sp->gfn);
		return;
	}

	if (!sp->role.direct) {
		...
	}

Alternatively, should we mark passthrough shadow pages as direct=1?  That would
naturally handle this code, and for things like reexecute_instruction()'s usage
of kvm_mmu_unprotect_page(), I don't think passthrough shadow pages should be
considered indirect, e.g. zapping them won't help and the shadow page can't become
unsync.
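
Completely untested, and assuming the bit is renamed "passthrough", but
e.g. in kvm_calc_shadow_npt_root_page_role() the idea would look
something like:

	role.base.level = kvm_mmu_get_tdp_level(vcpu);
	if (role.base.level > role_regs_to_root_level(regs)) {
		role.base.passthrough = 1;
		/* No gptes back a passthrough root, treat it as direct. */
		role.base.direct = 1;
	}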

>  		return;
>  	}
> @@ -1702,7 +1707,7 @@ static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>  	hlist_del(&sp->hash_link);
>  	list_del(&sp->link);
>  	free_page((unsigned long)sp->spt);
> -	if (!sp->role.direct)
> +	if (!sp->role.direct && !sp->role.level_promoted)

Hmm, at this point, it may be better to just omit the check entirely.  @sp is
zero allocated and free_page() plays nice with NULL "pointers".  I believe TDX
and maybe SNP support will need to do more work here, i.e. may add back a similar
check, but if so then checking sp->gfns directly for !NULL is preferable.
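
I.e. (sketch only, relying on sp->gfns staying NULL when it was never
allocated):

	hlist_del(&sp->hash_link);
	list_del(&sp->link);
	free_page((unsigned long)sp->spt);
	/* NULL for direct pages; free_page() plays nice with NULL. */
	free_page((unsigned long)sp->gfns);
	kmem_cache_free(mmu_page_header_cache, sp);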

>  		free_page((unsigned long)sp->gfns);
>  	kmem_cache_free(mmu_page_header_cache, sp);
>  }
> @@ -1740,7 +1745,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, gfn_t gfn,
>  
>  	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
>  	sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
> -	if (!role.direct)
> +	if (!(role.direct || role.level_promoted))

I personally prefer the style of the previous check, mainly because I'm horrible
at reading !(x || y).

	if (!role.direct && !role.passthrough)

>  		sp->gfns = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_gfn_array_cache);
>  	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
>  
> @@ -2084,6 +2089,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  		quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
>  		role.quadrant = quadrant;
>  	}
> +	if (role.level_promoted && (level <= vcpu->arch.mmu->root_level))
> +		role.level_promoted = 0;
>  
>  	sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
>  	for_each_valid_sp(vcpu->kvm, sp, sp_list) {
> @@ -4836,6 +4843,8 @@ kvm_calc_shadow_npt_root_page_role(struct kvm_vcpu *vcpu,
>  
>  	role.base.direct = false;
>  	role.base.level = kvm_mmu_get_tdp_level(vcpu);
> +	if (role.base.level > role_regs_to_root_level(regs))
> +		role.base.level_promoted = 1;
>  
>  	return role;
>  }
> @@ -5228,6 +5237,8 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
>  	kvm_mmu_audit(vcpu, AUDIT_PRE_PTE_WRITE);
>  
>  	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
> +		if (sp->role.level_promoted)
> +			continue;
>  		if (detect_write_misaligned(sp, gpa, bytes) ||
>  		      detect_write_flooding(sp)) {
>  			kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 5c78300fc7d9..16ac276d342a 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -1043,6 +1043,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>  		.level = 0xf,
>  		.access = 0x7,
>  		.quadrant = 0x3,
> +		.level_promoted = 0x1,
>  	};
>  
>  	/*
> -- 
> 2.19.1.6.gb485710b
> 
