From: Sean Christopherson <seanjc@google.com>
To: Kevin Cheng <chengkev@google.com>
Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, yosry@kernel.org
Subject: Re: [PATCH V3 3/4] KVM: VMX: Fix nested EPT violation injection of GVA_IS_VALID/GVA_TRANSLATED bits
Date: Fri, 22 May 2026 15:07:10 -0700
Message-ID: <ahDTjhbEGe1HwwpG@google.com>
In-Reply-To: <20260313071033.4153209-4-chengkev@google.com>
On Fri, Mar 13, 2026, Kevin Cheng wrote:
> Make the OR of EPT_VIOLATION_GVA_IS_VALID and
> EPT_VIOLATION_GVA_TRANSLATED from the hardware exit qualification
> conditional on the fault originating from a hardware EPT violation
> exit. The hardware exit qualification reflects the original VM exit,
> which may not be an EPT violation at all, e.g. if KVM is emulating
> an I/O instruction and the memory operand's translation through L1's
> EPT fails. In that case, bits 7-8 of the exit qualification have
> completely different semantics (or are simply zero), and OR'ing them
> into the injected EPT violation corrupts the GVA_IS_VALID/
> GVA_TRANSLATED information.
>
> Use the hardware_nested_page_fault flag introduced in the previous
> patch to distinguish hardware EPT violation exits from
> emulation-triggered faults. For hardware exits, take the
> GVA_IS_VALID/GVA_TRANSLATED bits from the hardware exit qualification.
> For emulation faults, take them from fault->exit_qualification, which
> is populated by the nested_mmu walker in paging_tmpl.h.
>
> Replace the #if PTTYPE != PTTYPE_EPT preprocessor guards in
> paging_tmpl.h with a runtime kvm_nested_fault_is_ept() helper that
> checks guest_mmu to determine whether the nested fault is EPT vs NPT,
> and sets the appropriate field (exit_qualification for EPT, error_code
> for NPF) accordingly.

Same comments on the changelog as for patch 2/4.

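For illustration only, here is a user-space sketch of the selection rule the changelog describes. The function name and the hardware_fault parameter are hypothetical stand-ins for nested_ept_inject_page_fault() and fault->hardware_nested_page_fault. Bits 7 and 8 follow the SDM's EPT violation exit qualification layout. The sketch shows that a stale VMCS value is ignored when the fault was synthesized by emulation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 7 and 8 of the EPT violation exit qualification (Intel SDM). */
#define GVA_IS_VALID	(1ull << 7)
#define GVA_TRANSLATED	(1ull << 8)
#define GVA_MASK	(GVA_IS_VALID | GVA_TRANSLATED)

/*
 * Hypothetical model of the v3 logic: borrow bits 7:8 from the hardware
 * exit qualification only if the fault came from a hardware EPT violation.
 * For synthesized faults the VMCS value describes some other VM-Exit.
 */
static uint64_t inject_qual_v3(uint64_t fault_qual, uint64_t hw_qual,
			       bool hardware_fault)
{
	uint64_t qual = fault_qual;

	if (hardware_fault)
		qual |= hw_qual & GVA_MASK;
	return qual;
}
```

For example, an emulation fault with a stale VMCS value whose bit 8 happens to be set (0x100) keeps only the walker's bits: inject_qual_v3(0x1, 0x100, false) is 0x1.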
> Signed-off-by: Kevin Cheng <chengkev@google.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 10 ++++++++++
> arch/x86/kvm/mmu/paging_tmpl.h | 22 +++++++++++++++-------
> arch/x86/kvm/vmx/nested.c | 9 +++++----
> 3 files changed, 30 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 3dce38ffee76..aabf4ac39c43 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5272,6 +5272,9 @@ static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
> return false;
> }
>
> +static bool kvm_nested_fault_is_ept(struct kvm_vcpu *vcpu,
> + struct x86_exception *exception);
> +
> #define PTTYPE_EPT 18 /* arbitrary */
> #define PTTYPE PTTYPE_EPT
> #include "paging_tmpl.h"
> @@ -5285,6 +5288,13 @@ static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
> #include "paging_tmpl.h"
> #undef PTTYPE
>
> +static bool kvm_nested_fault_is_ept(struct kvm_vcpu *vcpu,
> + struct x86_exception *exception)
> +{
> + WARN_ON_ONCE(!exception->nested_page_fault);
> + return vcpu->arch.guest_mmu.page_fault == ept_page_fault;

Happily, on top of the MBEC+GMET support, this helper goes away.

> +}
> +
> static void __reset_rsvds_bits_mask(struct rsvd_bits_validate *rsvd_check,
> u64 pa_bits_rsvd, int level, bool nx,
> bool gbpages, bool pse, bool amd)
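The runtime helper comes down to a function-pointer identity check on guest_mmu. Below is a self-contained sketch; struct mmu, the two handlers, and nested_fault_is_ept() are hypothetical simplifications of KVM's types, not kernel code. A compile-time PTTYPE check presumably can't answer this question. The walk that calls kvm_translate_gpa() at these sites is of L2's own page tables, which use a non-EPT PTTYPE even when L1 uses EPT.

```c
#include <assert.h>
#include <stdbool.h>

struct vcpu;	/* opaque; only passed through in this sketch */

/* Hypothetical stand-in for struct kvm_mmu: only the fault handler matters. */
struct mmu {
	int (*page_fault)(struct vcpu *vcpu);
};

/* Two distinct handlers standing in for the EPT and NPT walkers. */
static int ept_page_fault(struct vcpu *vcpu) { (void)vcpu; return 1; }
static int npt_page_fault(struct vcpu *vcpu) { (void)vcpu; return 2; }

/* Classify the nested fault by which walker L1's MMU (guest_mmu) uses. */
static bool nested_fault_is_ept(const struct mmu *guest_mmu)
{
	return guest_mmu->page_fault == ept_page_fault;
}
```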
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index ea2b7569f8a4..15be93d735ab 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -386,9 +386,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
> nested_access, &walker->fault);
>
> if (unlikely(real_gpa == INVALID_GPA)) {
> -#if PTTYPE != PTTYPE_EPT
> - walker->fault.error_code |= PFERR_GUEST_PAGE_MASK;
> -#endif
> + /*
> + * Set EPT Violation flags even if the fault is an
> + * EPT Misconfig, fault.exit_qualification is ignored
> + * for EPT Misconfigs.
> + */
> + if (kvm_nested_fault_is_ept(vcpu, &walker->fault))
> + walker->fault.exit_qualification |= EPT_VIOLATION_GVA_IS_VALID;
> + else
> + walker->fault.error_code |= PFERR_GUEST_PAGE_MASK;
> return 0;
> }
>
> @@ -447,9 +453,11 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>
> real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(gfn), access, &walker->fault);
> if (real_gpa == INVALID_GPA) {
> -#if PTTYPE != PTTYPE_EPT
> - walker->fault.error_code |= PFERR_GUEST_FINAL_MASK;
> -#endif
> + if (kvm_nested_fault_is_ept(vcpu, &walker->fault))
> + walker->fault.exit_qualification |= EPT_VIOLATION_GVA_IS_VALID |
> + EPT_VIOLATION_GVA_TRANSLATED;
> + else
> + walker->fault.error_code |= PFERR_GUEST_FINAL_MASK;
> return 0;

And these two call-site changes become:

diff --git arch/x86/kvm/mmu/paging_tmpl.h arch/x86/kvm/mmu/paging_tmpl.h
index 5b2410ed7e45..b3a2f7b59797 100644
--- arch/x86/kvm/mmu/paging_tmpl.h
+++ arch/x86/kvm/mmu/paging_tmpl.h
@@ -502,7 +502,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
* [2:0] - Derive from the access bits. The exit_qualification might be
* out of date if it is serving an EPT misconfiguration.
* [5:3] - Calculated by the page walk of the guest EPT page tables
- * [7:11] - Derived from [7:11] of real exit_qualification
+ * [7:8] - Derived from "fault stage" access bits
+ * [9:11] - Derived from [9:11] of real exit_qualification
*
* The other bits are set to 0.
*/
@@ -516,6 +517,14 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
else
walker->fault.exit_qualification |= EPT_VIOLATION_ACC_READ;
+ /*
+ * KVM doesn't emulate features that access GPAs directly, e.g.
+ * Intel Processor Trace. Assume the GVA is always valid; when
+ * propagating faults from hardware, KVM will discard this info
+ * and use the EXIT_QUALIFICATION bits from the VMCS.
+ */
+ walker->fault.exit_qualification |= EPT_VIOLATION_GVA_IS_VALID;
+
/*
* Accesses to guest paging structures are either "reads" or
* "read+write" accesses, so consider them the latter if write_fault
@@ -523,6 +532,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
*/
if (access & PFERR_GUEST_PAGE_MASK)
walker->fault.exit_qualification |= EPT_VIOLATION_ACC_READ;
+ else
+ walker->fault.exit_qualification |= EPT_VIOLATION_GVA_TRANSLATED;
/*
* Note, pte_access holds the raw RWX bits from the EPTE, not
> }
>
> @@ -496,7 +504,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
> * [2:0] - Derive from the access bits. The exit_qualification might be
> * out of date if it is serving an EPT misconfiguration.
> * [5:3] - Calculated by the page walk of the guest EPT page tables
> - * [7:8] - Derived from [7:8] of real exit_qualification
> + * [7:8] - Set at the kvm_translate_gpa() call sites above
> *
> * The other bits are set to 0.
> */
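The following is a minimal sketch of how the replacement hunk proposed above derives bits 7:8 from the access type of the faulting stage. It assumes PFERR_GUEST_PAGE_MASK is bit 33, as in KVM. The helper name is hypothetical and this is not the kernel code. Bit 7 is always set. Bit 8 is set unless the failing access was to one of L2's paging-structure entries.

```c
#include <assert.h>
#include <stdint.h>

#define EPT_GVA_IS_VALID	(1ull << 7)
#define EPT_GVA_TRANSLATED	(1ull << 8)
/* Assumed to mirror KVM's PFERR_GUEST_PAGE_MASK (bit 33). */
#define GUEST_PAGE_ACCESS	(1ull << 33)

/* Hypothetical helper: synthesize exit qualification bits 7:8. */
static uint64_t synth_gva_bits(uint64_t access)
{
	/* KVM doesn't emulate GPA-only accesses, so the GVA is always valid. */
	uint64_t bits = EPT_GVA_IS_VALID;

	/* Final translation, not a paging-structure access: GVA was translated. */
	if (!(access & GUEST_PAGE_ACCESS))
		bits |= EPT_GVA_TRANSLATED;
	return bits;
}
```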
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 937aeb474af7..39f8504f5cf2 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -443,11 +443,12 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
> vm_exit_reason = EXIT_REASON_EPT_MISCONFIG;
> exit_qualification = 0;
> } else {
> - exit_qualification = fault->exit_qualification;
> - exit_qualification |= vmx_get_exit_qual(vcpu) &
> - (EPT_VIOLATION_GVA_IS_VALID |
> - EPT_VIOLATION_GVA_TRANSLATED);
> vm_exit_reason = EXIT_REASON_EPT_VIOLATION;
> + exit_qualification = fault->exit_qualification;
> + if (fault->hardware_nested_page_fault)
> + exit_qualification |= vmx_get_exit_qual(vcpu) &
> + (EPT_VIOLATION_GVA_IS_VALID |
> + EPT_VIOLATION_GVA_TRANSLATED);

Similar to the goof in the NPT patch, merging emulated and hardware-provided
information is wrong. On top of the MBEC+GMET changes, this should instead be:

		u64 mask = EPT_VIOLATION_GVA_IS_VALID |
			   EPT_VIOLATION_GVA_TRANSLATED;

		if (vmx->nested.msrs.ept_caps & VMX_EPT_ADVANCED_VMEXIT_INFO_BIT)
			mask |= EPT_VIOLATION_GVA_USER |
				EPT_VIOLATION_GVA_WRITABLE |
				EPT_VIOLATION_GVA_NX;

		exit_qualification = fault->exit_qualification & ~mask;

		/*
		 * Use the EXIT_QUALIFICATION from the VMCS if and only
		 * if the hardware VM-Exit from L2 was an EPT Violation.
		 * If the fault is synthesized, then EXIT_QUALIFICATION
		 * is stale and/or holds entirely different data. And
		 * conversely, KVM _must_ rely on EXIT_QUALIFICATION if
		 * the fault came from hardware, because KVM only sees
		 * and walks the faulting GPA.
		 */
		if (from_hardware)
			exit_qualification |= vmx_get_exit_qual(vcpu) & mask;
		else
			exit_qualification |= fault->exit_qualification & mask;

		vm_exit_reason = EXIT_REASON_EPT_VIOLATION;
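To make the "no merging" point concrete, here is a user-space sketch of the suggested logic with the kernel context replaced by plain parameters. from_hardware and advanced_info are hypothetical stand-ins for the hardware-fault flag and the VMX_EPT_ADVANCED_VMEXIT_INFO_BIT check. The bit positions for GVA_USER/WRITABLE/NX (9-11) are assumed from the SDM's advanced VM-exit information layout. The GVA-related bits come from exactly one source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GVA_IS_VALID	(1ull << 7)
#define GVA_TRANSLATED	(1ull << 8)
/* Advanced VM-exit information bits (assumed positions, per the SDM). */
#define GVA_USER	(1ull << 9)
#define GVA_WRITABLE	(1ull << 10)
#define GVA_NX		(1ull << 11)

static uint64_t ept_exit_qual(uint64_t fault_qual, uint64_t hw_qual,
			      bool from_hardware, bool advanced_info)
{
	uint64_t mask = GVA_IS_VALID | GVA_TRANSLATED;
	uint64_t qual;

	if (advanced_info)
		mask |= GVA_USER | GVA_WRITABLE | GVA_NX;

	/* Keep the synthesized bits that are outside the GVA-related mask. */
	qual = fault_qual & ~mask;

	/* Take the GVA-related bits from exactly one source, never both. */
	if (from_hardware)
		qual |= hw_qual & mask;
	else
		qual |= fault_qual & mask;
	return qual;
}
```

If the walker synthesized GVA_IS_VALID|GVA_TRANSLATED (0x181 with a read access) but hardware reported a paging-structure access (0x80), the result is 0x81. OR'ing both sources, as the v3 patch does, would yield 0x181 for the same inputs.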
Thread overview: 9+ messages
2026-03-13 7:10 [PATCH V3 0/4] KVM: X86: Correctly populate nested page fault injection error information Kevin Cheng
2026-03-13 7:10 ` [PATCH V3 1/4] KVM: x86: Widen x86_exception's error_code to 64 bits Kevin Cheng
2026-03-13 7:10 ` [PATCH V3 2/4] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK Kevin Cheng
2026-05-22 22:04 ` Sean Christopherson
2026-03-13 7:10 ` [PATCH V3 3/4] KVM: VMX: Fix nested EPT violation injection of GVA_IS_VALID/GVA_TRANSLATED bits Kevin Cheng
2026-05-22 22:07 ` Sean Christopherson [this message]
2026-03-13 7:10 ` [PATCH V3 4/4] KVM: selftests: Add nested page fault injection test Kevin Cheng
2026-05-22 22:33 ` Sean Christopherson
2026-05-22 22:34 ` [PATCH V3 0/4] KVM: X86: Correctly populate nested page fault injection error information Sean Christopherson