Kernel KVM virtualization development
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Kevin Cheng <chengkev@google.com>
Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org,  yosry@kernel.org
Subject: Re: [PATCH V3 3/4] KVM: VMX: Fix nested EPT violation injection of GVA_IS_VALID/GVA_TRANSLATED bits
Date: Fri, 22 May 2026 15:07:10 -0700	[thread overview]
Message-ID: <ahDTjhbEGe1HwwpG@google.com> (raw)
In-Reply-To: <20260313071033.4153209-4-chengkev@google.com>

On Fri, Mar 13, 2026, Kevin Cheng wrote:
> Make the OR of EPT_VIOLATION_GVA_IS_VALID and
> EPT_VIOLATION_GVA_TRANSLATED from the hardware exit qualification
> conditional on the fault originating from a hardware EPT violation
> exit. The hardware exit qualification reflects the original VM exit,
> which may not be an EPT violation at all, e.g. if KVM is emulating
> an I/O instruction and the memory operand's translation through L1's
> EPT fails. In that case, bits 7-8 of the exit qualification have
> completely different semantics (or are simply zero), and OR'ing them
> into the injected EPT violation corrupts the GVA_IS_VALID/
> GVA_TRANSLATED information.
> 
> Use the hardware_nested_page_fault flag introduced in the previous
> patch to distinguish hardware EPT violation exits from
> emulation-triggered faults. For hardware exits, take the
> GVA_IS_VALID/GVA_TRANSLATED bits from the hardware exit qualification.
> For emulation faults, take them from fault->exit_qualification, which
> is populated by the nested_mmu walker in paging_tmpl.h.
> 
> Replace the #if PTTYPE != PTTYPE_EPT preprocessor guards in
> paging_tmpl.h with a runtime kvm_nested_fault_is_ept() helper that
> checks guest_mmu to determine whether the nested fault is EPT vs NPT,
> and sets the appropriate field (exit_qualification for EPT, error_code
> for NPF) accordingly.

Same comments on the changelog.

> Signed-off-by: Kevin Cheng <chengkev@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c         | 10 ++++++++++
>  arch/x86/kvm/mmu/paging_tmpl.h | 22 +++++++++++++++-------
>  arch/x86/kvm/vmx/nested.c      |  9 +++++----
>  3 files changed, 30 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 3dce38ffee76..aabf4ac39c43 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5272,6 +5272,9 @@ static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
>  	return false;
>  }
>  
> +static bool kvm_nested_fault_is_ept(struct kvm_vcpu *vcpu,
> +				    struct x86_exception *exception);
> +
>  #define PTTYPE_EPT 18 /* arbitrary */
>  #define PTTYPE PTTYPE_EPT
>  #include "paging_tmpl.h"
> @@ -5285,6 +5288,13 @@ static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
>  #include "paging_tmpl.h"
>  #undef PTTYPE
>  
> +static bool kvm_nested_fault_is_ept(struct kvm_vcpu *vcpu,
> +				    struct x86_exception *exception)
> +{
> +	WARN_ON_ONCE(!exception->nested_page_fault);
> +	return vcpu->arch.guest_mmu.page_fault == ept_page_fault;

Happily, on top the MBEC+GMET support, this goes away.

> +}
> +
>  static void __reset_rsvds_bits_mask(struct rsvd_bits_validate *rsvd_check,
>  				    u64 pa_bits_rsvd, int level, bool nx,
>  				    bool gbpages, bool pse, bool amd)
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index ea2b7569f8a4..15be93d735ab 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -386,9 +386,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  					     nested_access, &walker->fault);
>  
>  		if (unlikely(real_gpa == INVALID_GPA)) {
> -#if PTTYPE != PTTYPE_EPT
> -			walker->fault.error_code |= PFERR_GUEST_PAGE_MASK;
> -#endif
> +			/*
> +			 * Set EPT Violation flags even if the fault is an
> +			 * EPT Misconfig, fault.exit_qualification is ignored
> +			 * for EPT Misconfigs.
> +			 */
> +			if (kvm_nested_fault_is_ept(vcpu, &walker->fault))
> +				walker->fault.exit_qualification |= EPT_VIOLATION_GVA_IS_VALID;
> +			else
> +				walker->fault.error_code |= PFERR_GUEST_PAGE_MASK;
>  			return 0;
>  		}
>  
> @@ -447,9 +453,11 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  
>  	real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(gfn), access, &walker->fault);
>  	if (real_gpa == INVALID_GPA) {
> -#if PTTYPE != PTTYPE_EPT
> -		walker->fault.error_code |= PFERR_GUEST_FINAL_MASK;
> -#endif
> +		if (kvm_nested_fault_is_ept(vcpu, &walker->fault))
> +			walker->fault.exit_qualification |= EPT_VIOLATION_GVA_IS_VALID |
> +							    EPT_VIOLATION_GVA_TRANSLATED;
> +		else
> +			walker->fault.error_code |= PFERR_GUEST_FINAL_MASK;
>  		return 0;

And these become:

diff --git arch/x86/kvm/mmu/paging_tmpl.h arch/x86/kvm/mmu/paging_tmpl.h
index 5b2410ed7e45..b3a2f7b59797 100644
--- arch/x86/kvm/mmu/paging_tmpl.h
+++ arch/x86/kvm/mmu/paging_tmpl.h
@@ -502,7 +502,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
         * [2:0] - Derive from the access bits. The exit_qualification might be
         *         out of date if it is serving an EPT misconfiguration.
         * [5:3] - Calculated by the page walk of the guest EPT page tables
-        * [7:11] - Derived from [7:11] of real exit_qualification
+        * [7:8] - Dervived from "fault stage" access bits
+        * [9:11] - Derived from [9:11] of real exit_qualification
         *
         * The other bits are set to 0.
         */
@@ -516,6 +517,14 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
                else
                        walker->fault.exit_qualification |= EPT_VIOLATION_ACC_READ;
 
+               /*
+                * KVM doesn't emulate features that access GPAs directly, e.g.
+                * Intel Processor Trace.  Assume the GVA is always valid; when
+                * propagating faults from hardware, KVM will discard this info
+                * and use the EXIT_QUALIFICATION bits from the VMCS.
+                */
+               walker->fault.exit_qualification |= EPT_VIOLATION_GVA_IS_VALID;
+
                /*
                 * Accesses to guest paging structures are either "reads" or
                 * "read+write" accesses, so consider them the latter if write_fault
@@ -523,6 +532,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
                 */
                if (access & PFERR_GUEST_PAGE_MASK)
                        walker->fault.exit_qualification |= EPT_VIOLATION_ACC_READ;
+               else
+                       walker->fault.exit_qualification |= EPT_VIOLATION_GVA_TRANSLATED;
 
                /*
                 * Note, pte_access holds the raw RWX bits from the EPTE, not

>  	}
>  
> @@ -496,7 +504,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  	 * [2:0] - Derive from the access bits. The exit_qualification might be
>  	 *         out of date if it is serving an EPT misconfiguration.
>  	 * [5:3] - Calculated by the page walk of the guest EPT page tables
> -	 * [7:8] - Derived from [7:8] of real exit_qualification
> +	 * [7:8] - Set at the kvm_translate_gpa() call sites above
>  	 *
>  	 * The other bits are set to 0.
>  	 */
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 937aeb474af7..39f8504f5cf2 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -443,11 +443,12 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
>  			vm_exit_reason = EXIT_REASON_EPT_MISCONFIG;
>  			exit_qualification = 0;
>  		} else {
> -			exit_qualification = fault->exit_qualification;
> -			exit_qualification |= vmx_get_exit_qual(vcpu) &
> -					      (EPT_VIOLATION_GVA_IS_VALID |
> -					       EPT_VIOLATION_GVA_TRANSLATED);
>  			vm_exit_reason = EXIT_REASON_EPT_VIOLATION;
> +			exit_qualification = fault->exit_qualification;
> +			if (fault->hardware_nested_page_fault)
> +				exit_qualification |= vmx_get_exit_qual(vcpu) &
> +						      (EPT_VIOLATION_GVA_IS_VALID |
> +						       EPT_VIOLATION_GVA_TRANSLATED);

Similar to the goof in NPT, effectively merging emulated and hardware information
is wrong.  On top of the MBEC+GMET changes:

			u64 mask = EPT_VIOLATION_GVA_IS_VALID |
				   EPT_VIOLATION_GVA_TRANSLATED;

			if (vmx->nested.msrs.ept_caps & VMX_EPT_ADVANCED_VMEXIT_INFO_BIT)
				mask |= EPT_VIOLATION_GVA_USER |
					EPT_VIOLATION_GVA_WRITABLE |
					EPT_VIOLATION_GVA_NX;

			exit_qualification = fault->exit_qualification & ~mask;

			/*
			 * Use the EXIT_QUALIFICATION from the VMCS if and only
			 * if the hardware VM-Exit from L2 was an EPT Violation.
			 * If the fault is synthesized, then EXIT_QUALIFICATION
			 * is stale and/or holds entirely different data.  And
			 * conversely, KVM _must_ rely on EXIT_QUALIFICATION if
			 * the fault came from hardware, because KVM only sees
			 * and walks the faulting GPA.
			 */
			if (from_hardware)
				exit_qualification |= vmx_get_exit_qual(vcpu) & mask;
			else
				exit_qualification |= fault->exit_qualification & mask;

			vm_exit_reason = EXIT_REASON_EPT_VIOLATION;

  reply	other threads:[~2026-05-22 22:07 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-13  7:10 [PATCH V3 0/4] KVM: X86: Correctly populate nested page fault injection error information Kevin Cheng
2026-03-13  7:10 ` [PATCH V3 1/4] KVM: x86: Widen x86_exception's error_code to 64 bits Kevin Cheng
2026-03-13  7:10 ` [PATCH V3 2/4] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK Kevin Cheng
2026-05-22 22:04   ` Sean Christopherson
2026-03-13  7:10 ` [PATCH V3 3/4] KVM: VMX: Fix nested EPT violation injection of GVA_IS_VALID/GVA_TRANSLATED bits Kevin Cheng
2026-05-22 22:07   ` Sean Christopherson [this message]
2026-03-13  7:10 ` [PATCH V3 4/4] KVM: selftests: Add nested page fault injection test Kevin Cheng
2026-05-22 22:33   ` Sean Christopherson
2026-05-22 22:34 ` [PATCH V3 0/4] KVM: X86: Correctly populate nested page fault injection error information Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahDTjhbEGe1HwwpG@google.com \
    --to=seanjc@google.com \
    --cc=chengkev@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=yosry@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox