Re: [PATCH 2/6] KVM: x86/mmu: Properly account NX huge page workaround for nonpaging MMUs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mingwei Zhang <mizhang@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/6] KVM: x86/mmu: Properly account NX huge page workaround for nonpaging MMUs
Date: Mon, 11 Apr 2022 17:40:11 +0000	[thread overview]
Message-ID: <YlRn+8bYsHqNIbTU@google.com> (raw)
In-Reply-To: <20220409003847.819686-3-seanjc@google.com>

On Sat, Apr 09, 2022, Sean Christopherson wrote:
> Account and track NX huge pages for nonpaging MMUs so that a future
> enhancement to precisely check if shadow page cannot be replaced by a NX
> huge page doesn't get false positives.  Without correct tracking, KVM can
> get stuck in a loop if an instruction is fetching and writing data on the
> same huge page, e.g. KVM installs a small executable page on the fetch
> fault, replaces it with an NX huge page on the write fault, and faults
> again on the fetch.
> 
> Alternatively, and perhaps ideally, KVM would simply not enforce the
> workaround for nonpaging MMUs.  The guest has no page tables to abuse
> and KVM is guaranteed to switch to a different MMU on CR0.PG being
> toggled so there're no security or performance concerns.  But getting
> make_spte() to play nice now and in the future is unnecessarily complex.
> In the current code base, make_spte() can enforce the mitigation if TDP
> is enabled or the MMU is indirect, but other in-flight patches aim to
> drop the @vcpu param[*].  Without a @vcpu, KVM could either pass in the
> correct information and/or derive it from the shadow page, but the former
> is ugly and the latter subtly non-trivial due to the possitibility of
> direct shadow pages in indirect MMUs.  Given that using shadow paging
> with an unpaged guest is far from top priority in terms of performance,
> _and_ has been subjected to the workaround since its inception, keep it
> simple and just fix the accounting glitch.
> 
> [*] https://lore.kernel.org/all/20220321224358.1305530-5-bgardon@google.com
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/mmu.h      |  9 +++++++++
>  arch/x86/kvm/mmu/mmu.c  |  2 +-
>  arch/x86/kvm/mmu/spte.c | 11 +++++++++++
>  3 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 671cfeccf04e..89df062d5921 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -191,6 +191,15 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  		.user = err & PFERR_USER_MASK,
>  		.prefetch = prefetch,
>  		.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
> +
> +		/*
> +		 * Note, enforcing the NX huge page mitigation for nonpaging
> +		 * MMUs (shadow paging, CR0.PG=0 in the guest) is completely
> +		 * unnecessary.  The guest doesn't have any page tables to
> +		 * abuse and is guaranteed to switch to a different MMU when
> +		 * CR0.PG is toggled on (may not always be guaranteed when KVM
> +		 * is using TDP).  See make_spte() for details.
> +		 */
>  		.nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(),

hmm. I think there could be a minor issue here (even in original code).
The nx_huge_page_workaround_enabled is attached here with page fault.
However, at the time of make_spte(), we call is_nx_huge_page_enabled()
again. Since this function will directly check the module parameter,
there might be a race condition here. eg., at the time of page fault,
the workround was 'true', while by the time we reach make_spte(), the
parameter was set to 'false'.

I have not figured out what the side effect is. But I feel like the
make_spte() should just follow the information in kvm_page_fault instead
of directly querying the global config.
>
>  		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index d230d2d78ace..9416445afa3e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2954,7 +2954,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  				      it.level - 1, true, ACC_ALL);
>  
>  		link_shadow_page(vcpu, it.sptep, sp);
> -		if (fault->is_tdp && fault->huge_page_disallowed)
> +		if (fault->huge_page_disallowed)
>  			account_nx_huge_page(vcpu->kvm, sp,
>  					     fault->req_level >= it.level);
>  	}
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index 4739b53c9734..14ad821cb0c7 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -115,6 +115,17 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>  	if (!prefetch)
>  		spte |= spte_shadow_accessed_mask(spte);
>  
> +	/*
> +	 * For simplicity, enforce the NX huge page mitigation even if not
> +	 * strictly necessary.  KVM could ignore if the mitigation if paging is
> +	 * disabled in the guest, but KVM would then have to ensure a new MMU
> +	 * is loaded (or all shadow pages zapped) when CR0.PG is toggled on,
> +	 * and that's a net negative for performance when TDP is enabled.  KVM
> +	 * could ignore the mitigation if TDP is disabled and CR0.PG=0, as KVM
> +	 * will always switch to a new MMU if paging is enabled in the guest,
> +	 * but that adds complexity just to optimize a mode that is anything
> +	 * but performance critical.
> +	 */
>  	if (level > PG_LEVEL_4K && (pte_access & ACC_EXEC_MASK) &&
>  	    is_nx_huge_page_enabled()) {
>  		pte_access &= ~ACC_EXEC_MASK;
> -- 
> 2.35.1.1178.g4f1659d476-goog
>

next prev parent reply	other threads:[~2022-04-11 17:40 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-09  0:38 [PATCH 0/6] KVM: x86: Apply NX mitigation more precisely Sean Christopherson
2022-04-09  0:38 ` [PATCH 1/6] KVM: x86/mmu: Tag disallowed NX huge pages even if they're not tracked Sean Christopherson
2022-04-09  0:38 ` [PATCH 2/6] KVM: x86/mmu: Properly account NX huge page workaround for nonpaging MMUs Sean Christopherson
2022-04-11 17:40   ` Mingwei Zhang [this message]
2022-04-11 18:33     ` Sean Christopherson
2022-04-11 22:05       ` Mingwei Zhang
2022-04-09  0:38 ` [PATCH 3/6] KVM: x86/mmu: Set disallowed_nx_huge_page in TDP MMU before setting SPTE Sean Christopherson
2022-04-09  0:38 ` [PATCH 4/6] KVM: x86/mmu: Track the number of TDP MMU pages, but not the actual pages Sean Christopherson
2022-04-09  0:38 ` [PATCH 5/6] KVM: x86/mmu: Add helper to convert SPTE value to its shadow page Sean Christopherson
2022-04-09  0:38 ` [PATCH 6/6] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust() Sean Christopherson
2022-07-18  5:30 ` [PATCH 0/6] KVM: x86: Apply NX mitigation more precisely Mingwei Zhang
2022-07-18 16:09   ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YlRn+8bYsHqNIbTU@google.com \
    --to=mizhang@google.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.