* [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info
@ 2026-05-22 23:26 Sean Christopherson
2026-05-22 23:26 ` [PATCH v4 1/5] KVM: x86: Widen x86_exception's error_code to 64 bits Sean Christopherson
` (5 more replies)
0 siblings, 6 replies; 15+ messages in thread
From: Sean Christopherson @ 2026-05-22 23:26 UTC
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Kevin Cheng
Kevin's series to fix how KVM populates error information when injecting
nested page faults (NPF on SVM, EPT violations on VMX) to L1 during
instruction emulation.
See v3 for the full cover letter.
v4:
- Pass @from_hardware directly instead of stuffing a flag in x86_exception.
- Use the bits in @access (thanks to MBEC+GMET) to get the fault stage.
- Check the entire PFEC/EXIT_QUAL in the selftest.
- Use hardware _or_ KVM information, never merge the two.
- Name the selftest nested_tdp_fault_test.
v3:
- https://lore.kernel.org/all/20260313071033.4153209-1-chengkev@google.com
- Introduce hardware_nested_page_fault in struct x86_exception to
distinguish hardware NPF/EPT exits from emulation-triggered faults
as per Sean
- For SVM, take PFERR_GUEST_FAULT_STAGE bits from hardware exit_info_1
on hardware NPF exits, and from fault->error_code on emulation
faults
- For VMX, conditionally OR hardware exit qualification GVA_IS_VALID/
GVA_TRANSLATED bits only for hardware EPT violation exits as per
Sean
- Replace #if PTTYPE != PTTYPE_EPT preprocessor guards in
paging_tmpl.h with runtime kvm_nested_fault_is_ept() helper that
checks guest_mmu as per Sean
v2:
- https://lore.kernel.org/all/20260224071822.369326-1-chengkev@google.com
- Split out the widening of the x86_exception error code into a
separate patch as per Sean.
- Added a WARN if both PFERR_GUEST_* bits are set and force the
exit_info_1 to PFERR_GUEST_FINAL_MASK if this occurs.
- Removed the selftest TDP helpers as per Sean
- Added a patch to populate the EPT violation bits for VMX nested page
faults as per Sean.
- Expanded the added selftest to support VMX and also added a test
case for write protected pages using the INS instruction.
v1: https://lore.kernel.org/all/20260121004906.2373989-1-chengkev@google.com
Kevin Cheng (4):
KVM: x86: Widen x86_exception's error_code to 64 bits
KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK
bits
KVM: VMX: Synthesize nested EPT violation GVA_IS_VALID/GVA_TRANSLATED
bits
KVM: selftests: Add nested page fault injection test
Sean Christopherson (1):
KVM: x86: Tell ->inject_page_fault() whether or a fault came from
hardware
arch/x86/include/asm/kvm_host.h | 20 +-
arch/x86/kvm/kvm_emulate.h | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 36 +-
arch/x86/kvm/svm/nested.c | 38 ++-
arch/x86/kvm/vmx/nested.c | 29 +-
arch/x86/kvm/x86.c | 16 +-
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/include/x86/processor.h | 9 +
.../selftests/kvm/x86/nested_tdp_fault_test.c | 313 ++++++++++++++++++
9 files changed, 422 insertions(+), 42 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86/nested_tdp_fault_test.c
base-commit: 66939c1603bd5579e63278f9dc72cba5b79da9b5
--
2.54.0.794.g4f17f83d09-goog
* [PATCH v4 1/5] KVM: x86: Widen x86_exception's error_code to 64 bits
2026-05-22 23:26 [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info Sean Christopherson
@ 2026-05-22 23:26 ` Sean Christopherson
2026-05-22 23:26 ` [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or a fault came from hardware Sean Christopherson
` (4 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2026-05-22 23:26 UTC
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Kevin Cheng
From: Kevin Cheng <chengkev@google.com>
Widen the error_code field in struct x86_exception from u16 to u64 to
accommodate AMD's NPF error code, which defines information bits above
bit 31, e.g. PFERR_GUEST_FINAL_MASK (bit 32), and PFERR_GUEST_PAGE_MASK
(bit 33).
Retain the u16 type for the local errcode variable in walk_addr_generic
as the walker synthesizes conventional #PF error codes that are
architecturally limited to bits 15:0.
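As a rough illustration of why the field width matters, the sketch below
stores an error code in a 16-bit and a 64-bit variable. The helper names
store_in_u16() and store_in_u64() are hypothetical and exist only for this
example; the bit positions are the ones named in the changelog above.

```c
#include <stdint.h>

/* Bit positions as named in the changelog above. */
#define PFERR_GUEST_FINAL_MASK	(1ULL << 32)
#define PFERR_GUEST_PAGE_MASK	(1ULL << 33)

/* Hypothetical helper: models the old u16 field, which drops bits 63:16. */
static uint64_t store_in_u16(uint64_t error_code)
{
	uint16_t narrow = (uint16_t)error_code;

	return narrow;
}

/* Hypothetical helper: models the widened u64 field, which keeps all bits. */
static uint64_t store_in_u64(uint64_t error_code)
{
	uint64_t wide = error_code;

	return wide;
}
```

With a 16-bit field, PFERR_GUEST_PAGE_MASK is silently lost while the low
permission bits survive, which is presumably why the walker can keep using a
u16 for the conventional #PF bits only.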
Signed-off-by: Kevin Cheng <chengkev@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/kvm_emulate.h | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 6 ++++++
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 72aece9ef575..f5df31a52996 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -22,7 +22,7 @@ enum x86_intercept_stage;
struct x86_exception {
u8 vector;
bool error_code_valid;
- u16 error_code;
+ u64 error_code;
bool nested_page_fault;
union {
u64 address; /* cr2 or nested page fault gpa */
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 07100bbfc270..51f8b4522314 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -328,6 +328,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
const int write_fault = access & PFERR_WRITE_MASK;
const int user_fault = access & PFERR_USER_MASK;
const int fetch_fault = access & PFERR_FETCH_MASK;
+ /*
+ * Note! Track the error_code that's common to legacy shadow paging
+ * and NPT shadow paging as a u16 to guard against unintentionally
+ * setting any of bits 63:16. Architecturally, the #PF error code is
+ * 32 bits, and Intel CPUs don't support setting bits 31:16.
+ */
u16 errcode = 0;
gpa_t real_gpa;
gfn_t gfn;
--
2.54.0.794.g4f17f83d09-goog
* [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or a fault came from hardware
2026-05-22 23:26 [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info Sean Christopherson
2026-05-22 23:26 ` [PATCH v4 1/5] KVM: x86: Widen x86_exception's error_code to 64 bits Sean Christopherson
@ 2026-05-22 23:26 ` Sean Christopherson
2026-05-26 18:18 ` Yosry Ahmed
2026-05-22 23:26 ` [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits Sean Christopherson
` (3 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2026-05-22 23:26 UTC
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Kevin Cheng
When injecting a page fault (including nested TDP faults into L1), tell the
injection routine whether or not the fault originated in hardware, i.e. if
KVM is effectively forwarding a fault it intercepted. For nested TDP fault
injection, KVM needs to grab PAGE_WALK vs. GUEST_FINAL information from the
VMCB/VMCS, _if_ the fault originated in hardware.
No functional change intended (nothing uses the new param, yet...).
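The plumbing pattern can be sketched in isolation as follows. This is a
simplified userspace model, not the KVM code: struct fault_info, struct
mmu_ops, record_inject() and the two inject helpers are hypothetical stand-ins
for struct x86_exception, struct kvm_mmu and the real injection routines.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct x86_exception. */
struct fault_info {
	unsigned long address;
};

/* Hypothetical stand-in for struct kvm_mmu, with the extra parameter. */
struct mmu_ops {
	void (*inject_page_fault)(struct fault_info *fault, bool from_hardware);
};

/* Records the last value seen so that the plumbing can be observed. */
static bool last_from_hardware;

static void record_inject(struct fault_info *fault, bool from_hardware)
{
	(void)fault;
	last_from_hardware = from_hardware;
}

/* Inner helper: the caller states explicitly where the fault came from. */
static void do_inject_emulated_page_fault(struct mmu_ops *ops,
					  struct fault_info *fault,
					  bool from_hardware)
{
	ops->inject_page_fault(fault, from_hardware);
}

/* Wrapper for pure emulation paths, which never forward a hardware fault. */
static void inject_emulated_page_fault(struct mmu_ops *ops,
				       struct fault_info *fault)
{
	do_inject_emulated_page_fault(ops, fault, false);
}
```

Keeping the old name as a thin wrapper that passes false means existing
emulation callers need no changes, and only the paths that forward an
intercepted fault have to opt in with true.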
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 18 ++++++++++++++----
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
arch/x86/kvm/svm/nested.c | 3 ++-
arch/x86/kvm/vmx/nested.c | 3 ++-
arch/x86/kvm/x86.c | 16 +++++++++-------
5 files changed, 28 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 271bdd109a98..d11063c36f03 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -484,7 +484,8 @@ struct kvm_mmu {
u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index);
int (*page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
void (*inject_page_fault)(struct kvm_vcpu *vcpu,
- struct x86_exception *fault);
+ struct x86_exception *fault,
+ bool from_hardware);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
gpa_t gva_or_gpa, u64 access,
struct x86_exception *exception);
@@ -2305,9 +2306,18 @@ void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long payload);
void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned int nr,
bool has_error_code, u32 error_code);
-void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
-void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
- struct x86_exception *fault);
+void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault,
+ bool from_hardware);
+void __kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+ struct x86_exception *fault,
+ bool from_hardware);
+
+static inline void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+ struct x86_exception *fault)
+{
+ __kvm_inject_emulated_page_fault(vcpu, fault, false);
+}
+
bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 51f8b4522314..cc9c7deb34bc 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -813,7 +813,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
*/
if (!r) {
if (!fault->prefetch)
- kvm_inject_emulated_page_fault(vcpu, &walker.fault);
+ __kvm_inject_emulated_page_fault(vcpu, &walker.fault, true);
return RET_PF_RETRY;
}
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 4ef9bc6a553f..1c1a5e322d18 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -34,7 +34,8 @@
#define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK
static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
- struct x86_exception *fault)
+ struct x86_exception *fault,
+ bool from_hardware)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4690a4d23709..3bb7eaa7b2a5 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -411,7 +411,8 @@ static void nested_ept_invalidate_addr(struct kvm_vcpu *vcpu, gpa_t eptp,
}
static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
- struct x86_exception *fault)
+ struct x86_exception *fault,
+ bool from_hardware)
{
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
struct vcpu_vmx *vmx = to_vmx(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cecb2f84e5e0..aa2f8f43d94c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -969,7 +969,8 @@ static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err)
EMULTYPE_COMPLETE_USER_EXIT);
}
-void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
+void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault,
+ bool from_hardware)
{
++vcpu->stat.pf_guest;
@@ -986,8 +987,9 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
fault->address);
}
-void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
- struct x86_exception *fault)
+void __kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+ struct x86_exception *fault,
+ bool from_hardware)
{
struct kvm_mmu *fault_mmu;
WARN_ON_ONCE(fault->vector != PF_VECTOR);
@@ -1004,9 +1006,9 @@ void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
kvm_mmu_invalidate_addr(vcpu, fault_mmu, fault->address,
KVM_MMU_ROOT_CURRENT);
- fault_mmu->inject_page_fault(vcpu, fault);
+ fault_mmu->inject_page_fault(vcpu, fault, from_hardware);
}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_inject_emulated_page_fault);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_inject_emulated_page_fault);
void kvm_inject_nmi(struct kvm_vcpu *vcpu)
{
@@ -14065,7 +14067,7 @@ bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
fault.nested_page_fault = false;
fault.address = work->arch.token;
fault.async_page_fault = true;
- kvm_inject_page_fault(vcpu, &fault);
+ kvm_inject_page_fault(vcpu, &fault, false);
return true;
} else {
/*
@@ -14236,7 +14238,7 @@ void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_c
fault.address = gva;
fault.async_page_fault = false;
}
- vcpu->arch.walk_mmu->inject_page_fault(vcpu, &fault);
+ vcpu->arch.walk_mmu->inject_page_fault(vcpu, &fault, true);
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_fixup_and_inject_pf_error);
--
2.54.0.794.g4f17f83d09-goog
* [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits
2026-05-22 23:26 [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info Sean Christopherson
2026-05-22 23:26 ` [PATCH v4 1/5] KVM: x86: Widen x86_exception's error_code to 64 bits Sean Christopherson
2026-05-22 23:26 ` [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or a fault came from hardware Sean Christopherson
@ 2026-05-22 23:26 ` Sean Christopherson
2026-05-26 18:31 ` Yosry Ahmed
2026-05-22 23:27 ` [PATCH v4 4/5] KVM: VMX: Synthesize nested EPT violation GVA_IS_VALID/GVA_TRANSLATED bits Sean Christopherson
` (2 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2026-05-22 23:26 UTC
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Kevin Cheng
From: Kevin Cheng <chengkev@google.com>
Fix KVM's generation of PFERR_GUEST_{PAGE,FINAL}_MASK bits when injecting a
Nested Page Fault into L1. Currently, KVM blindly stuffs GUEST_FINAL into
L1, which is blatantly wrong given that KVM obviously generates NPFs for
page table accesses.
There are two paths that trigger NPF injection: hardware NPF exits (from
L2) and emulation-triggered faults, i.e. when KVM detects a NPF as part of
emulating an L2 GVA access. For the hardware case, use the bits verbatim
from the VMCB, as KVM is simply forwarding a NPF to L1. For the emulation
case, propagate the GUEST_{PAGE,FINAL} bits from the access field (which
were recently added for MBEC+GMET support).
To differentiate between the two cases, use the @from_hardware param that is
plumbed into ->inject_page_fault(), which is true when KVM is injecting an
NPF in response to an NPF exit from L2.
To help guard against future goofs, assert that exactly one of GUEST_PAGE
or GUEST_FINAL is set when injecting a NPF. Unlike VMX, there are no
(known) cases where hardware doesn't set either bit, and KVM should always
set one or the other when emulating a GVA access.
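The selection logic described above can be modeled as a small pure function.
This is a sketch under assumptions, not the patch itself: npf_exit_info_1()
is a hypothetical name, the WARN is omitted, and the VMCB is reduced to a
plain hw_exit_info_1 argument.

```c
#include <stdbool.h>
#include <stdint.h>

#define PFERR_GUEST_FINAL_MASK	(1ULL << 32)
#define PFERR_GUEST_PAGE_MASK	(1ULL << 33)
#define PFERR_GUEST_FAULT_STAGE_MASK \
	(PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)

/*
 * Hypothetical model of building exit_info_1 for an injected NPF: take the
 * fault stage from hardware or from the walker's error code, never both.
 */
static uint64_t npf_exit_info_1(uint64_t hw_exit_info_1, uint64_t error_code,
				bool from_hardware)
{
	uint64_t stage;

	if (from_hardware)
		stage = hw_exit_info_1 & PFERR_GUEST_FAULT_STAGE_MASK;
	else
		stage = error_code & PFERR_GUEST_FAULT_STAGE_MASK;

	/* Exactly one stage bit must be set; fall back to "final" if not. */
	if (stage != PFERR_GUEST_FINAL_MASK && stage != PFERR_GUEST_PAGE_MASK)
		stage = PFERR_GUEST_FINAL_MASK;

	/* The remaining bits always come from the synthesized error code. */
	return stage | (error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);
}
```

For example, with from_hardware set and hardware reporting GUEST_PAGE, the
result carries GUEST_PAGE even if the walker's error code says GUEST_FINAL.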
Signed-off-by: Kevin Cheng <chengkev@google.com>
[sean: use plumbed in @access bits, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/mmu/paging_tmpl.h | 15 +++++---------
arch/x86/kvm/svm/nested.c | 35 ++++++++++++++++++++++-----------
3 files changed, 31 insertions(+), 21 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d11063c36f03..e1c4151d6693 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -284,6 +284,8 @@ enum x86_intercept_stage;
#define PFERR_GUEST_RMP_MASK BIT_ULL(31)
#define PFERR_GUEST_FINAL_MASK BIT_ULL(32)
#define PFERR_GUEST_PAGE_MASK BIT_ULL(33)
+#define PFERR_GUEST_FAULT_STAGE_MASK \
+ (PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)
#define PFERR_GUEST_ENC_MASK BIT_ULL(34)
#define PFERR_GUEST_SIZEM_MASK BIT_ULL(35)
#define PFERR_GUEST_VMPL_MASK BIT_ULL(36)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index cc9c7deb34bc..66eee6914234 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -397,16 +397,6 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
nested_access | PFERR_GUEST_PAGE_MASK,
&walker->fault, 0);
- /*
- * FIXME: This can happen if emulation (for of an INS/OUTS
- * instruction) triggers a nested page fault. The exit
- * qualification / exit info field will incorrectly have
- * "guest page access" as the nested page fault's cause,
- * instead of "guest page structure access". To fix this,
- * the x86_exception struct should be augmented with enough
- * information to fix the exit_qualification or exit_info_1
- * fields.
- */
if (unlikely(real_gpa == INVALID_GPA))
return 0;
@@ -548,6 +538,11 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
walker->fault.nested_page_fault = mmu != vcpu->arch.walk_mmu;
walker->fault.async_page_fault = false;
+#if PTTYPE != PTTYPE_EPT
+ if (walker->fault.nested_page_fault)
+ walker->fault.error_code |= access & PFERR_GUEST_FAULT_STAGE_MASK;
+#endif
+
trace_kvm_mmu_walker_error(walker->fault.error_code);
return 0;
}
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 1c1a5e322d18..28ac5d5c990d 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -39,19 +39,32 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
{
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb;
+ u64 fault_stage;
- if (vmcb->control.exit_code != SVM_EXIT_NPF) {
- /*
- * TODO: track the cause of the nested page fault, and
- * correctly fill in the high bits of exit_info_1.
- */
- vmcb->control.exit_code = SVM_EXIT_NPF;
- vmcb->control.exit_info_1 = (1ULL << 32);
- vmcb->control.exit_info_2 = fault->address;
- }
+ /*
+ * For hardware NPF exits, the GUEST_FAULT_STAGE bits are only
+ * available in the hardware exit_info_1, since the guest_mmu
+ * walker doesn't know whether the faulting GPA was a page table
+ * page or final page from L2's perspective.
+ */
+ if (from_hardware)
+ fault_stage = vmcb->control.exit_info_1 &
+ PFERR_GUEST_FAULT_STAGE_MASK;
+ else
+ fault_stage = fault->error_code & PFERR_GUEST_FAULT_STAGE_MASK;
- vmcb->control.exit_info_1 &= ~0xffffffffULL;
- vmcb->control.exit_info_1 |= fault->error_code;
+ /*
+ * All nested page faults should be annotated as occurring on the
+ * final translation *or* the page walk. Arbitrarily choose "final"
+ * if KVM is buggy and enumerated both or neither.
+ */
+ if (WARN_ON_ONCE(hweight64(fault_stage) != 1))
+ fault_stage = PFERR_GUEST_FINAL_MASK;
+
+ vmcb->control.exit_code = SVM_EXIT_NPF;
+ vmcb->control.exit_info_1 = fault_stage |
+ (fault->error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);
+ vmcb->control.exit_info_2 = fault->address;
nested_svm_vmexit(svm);
}
--
2.54.0.794.g4f17f83d09-goog
* [PATCH v4 4/5] KVM: VMX: Synthesize nested EPT violation GVA_IS_VALID/GVA_TRANSLATED bits
2026-05-22 23:26 [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info Sean Christopherson
` (2 preceding siblings ...)
2026-05-22 23:26 ` [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits Sean Christopherson
@ 2026-05-22 23:27 ` Sean Christopherson
2026-05-22 23:27 ` [PATCH v4 5/5] KVM: selftests: Add nested page fault injection test Sean Christopherson
2026-05-27 18:10 ` [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info Sean Christopherson
5 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2026-05-22 23:27 UTC
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Kevin Cheng
From: Kevin Cheng <chengkev@google.com>
When injecting an EPT Violation into L1 in response to a fault detected
while emulating an L2 GVA access, synthesize the GVA_IS_VALID and
GVA_TRANSLATED bits using information provided by the walker, instead of
pulling the bits from vmcs02.EXIT_QUALIFICATION. The information in
vmcs02.EXIT_QUALIFICATION is valid/correct if and only if the fault being
injected into L1 is the direct result of an EPT Violation VM-Exit from L2.
E.g. if KVM is emulating an I/O instruction and the memory operand's
translation through L1's EPT fails, using vmcs02.EXIT_QUALIFICATION is
wrong as the semantics for EXIT_QUALIFICATION would be for an I/O exit,
not an EPT Violation exit.
Opportunistically clean up the formatting for creating the mask of bits
to pull from vmcs02.EXIT_QUALIFICATION.
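The "hardware or KVM, never both" rule for the masked bits can be sketched as
a standalone function. ept_exit_qual() is a hypothetical name used only here;
the real code reads the hardware value from the VMCS and builds the mask from
the advertised EPT capabilities, both of which are plain arguments in this
model.

```c
#include <stdbool.h>
#include <stdint.h>

#define EPT_VIOLATION_ACC_READ		(1ULL << 0)
#define EPT_VIOLATION_ACC_WRITE		(1ULL << 1)
#define EPT_VIOLATION_GVA_IS_VALID	(1ULL << 7)
#define EPT_VIOLATION_GVA_TRANSLATED	(1ULL << 8)

/*
 * Hypothetical model: bits outside @mask always come from the walker, bits
 * inside @mask come from hardware only if the exit was a real EPT Violation.
 */
static uint64_t ept_exit_qual(uint64_t walker_qual, uint64_t hw_qual,
			      uint64_t mask, bool from_hardware)
{
	uint64_t qual = walker_qual & ~mask;

	if (from_hardware)
		qual |= hw_qual & mask;
	else
		qual |= walker_qual & mask;

	return qual;
}
```

In the emulation case hw_qual is ignored entirely, which matches the point
that EXIT_QUALIFICATION may describe an unrelated exit, e.g. an I/O exit.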
Signed-off-by: Kevin Cheng <chengkev@google.com>
[sean: use plumbed in @access bits, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/paging_tmpl.h | 13 ++++++++++++-
arch/x86/kvm/vmx/nested.c | 26 +++++++++++++++++++++-----
2 files changed, 33 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 66eee6914234..df3ae0c7ec2c 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -502,7 +502,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
* [2:0] - Derive from the access bits. The exit_qualification might be
* out of date if it is serving an EPT misconfiguration.
* [5:3] - Calculated by the page walk of the guest EPT page tables
- * [7:11] - Derived from [7:11] of real exit_qualification
+ * [7:8] - Derived from "fault stage" access bits
+ * [9:11] - Derived from [9:11] of real exit_qualification
*
* The other bits are set to 0.
*/
@@ -516,6 +517,14 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
else
walker->fault.exit_qualification |= EPT_VIOLATION_ACC_READ;
+ /*
+ * KVM doesn't emulate features that access GPAs directly, e.g.
+ * Intel Processor Trace. Assume the GVA is always valid; when
+ * propagating faults from hardware, KVM will discard this info
+ * and use the EXIT_QUALIFICATION bits from the VMCS.
+ */
+ walker->fault.exit_qualification |= EPT_VIOLATION_GVA_IS_VALID;
+
/*
* Accesses to guest paging structures are either "reads" or
* "read+write" accesses, so consider them the latter if write_fault
@@ -523,6 +532,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
*/
if (access & PFERR_GUEST_PAGE_MASK)
walker->fault.exit_qualification |= EPT_VIOLATION_ACC_READ;
+ else
+ walker->fault.exit_qualification |= EPT_VIOLATION_GVA_TRANSLATED;
/*
* Note, pte_access holds the raw RWX bits from the EPTE, not
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 3bb7eaa7b2a5..a78ce0080963 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -445,13 +445,29 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
exit_qualification = 0;
} else {
u64 mask = EPT_VIOLATION_GVA_IS_VALID |
- EPT_VIOLATION_GVA_TRANSLATED;
+ EPT_VIOLATION_GVA_TRANSLATED;
+
if (vmx->nested.msrs.ept_caps & VMX_EPT_ADVANCED_VMEXIT_INFO_BIT)
mask |= EPT_VIOLATION_GVA_USER |
- EPT_VIOLATION_GVA_WRITABLE |
- EPT_VIOLATION_GVA_NX;
- exit_qualification = fault->exit_qualification;
- exit_qualification |= vmx_get_exit_qual(vcpu) & mask;
+ EPT_VIOLATION_GVA_WRITABLE |
+ EPT_VIOLATION_GVA_NX;
+
+ exit_qualification = fault->exit_qualification & ~mask;
+
+ /*
+ * Use the EXIT_QUALIFICATION from the VMCS if and only
+ * if the hardware VM-Exit from L2 was an EPT Violation.
+ * If the fault is synthesized, then EXIT_QUALIFICATION
+ * is stale and/or holds entirely different data. And
+ * conversely, KVM _must_ rely on EXIT_QUALIFICATION if
+ * the fault came from hardware, because KVM only sees
+ * and walks the faulting GPA.
+ */
+ if (from_hardware)
+ exit_qualification |= vmx_get_exit_qual(vcpu) & mask;
+ else
+ exit_qualification |= fault->exit_qualification & mask;
+
vm_exit_reason = EXIT_REASON_EPT_VIOLATION;
}
--
2.54.0.794.g4f17f83d09-goog
* [PATCH v4 5/5] KVM: selftests: Add nested page fault injection test
2026-05-22 23:26 [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info Sean Christopherson
` (3 preceding siblings ...)
2026-05-22 23:27 ` [PATCH v4 4/5] KVM: VMX: Synthesize nested EPT violation GVA_IS_VALID/GVA_TRANSLATED bits Sean Christopherson
@ 2026-05-22 23:27 ` Sean Christopherson
2026-05-27 18:10 ` [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info Sean Christopherson
5 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2026-05-22 23:27 UTC
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Kevin Cheng
From: Kevin Cheng <chengkev@google.com>
Add a test that exercises nested page fault injection during L2
execution. L2 executes I/O string instructions (OUTSB/INSB) that access
memory restricted in L1's nested page tables (NPT/EPT), triggering a
nested page fault that L0 must inject to L1.
The test supports both AMD SVM (NPF) and Intel VMX (EPT violation) and
verifies that:
- The exit reason is an NPF/EPT violation
- The access type and permission bits are correct
- The faulting GPA is correct
Four test cases are implemented:
- Unmap the final data page (final translation fault, OUTSB read)
- Unmap a PT page (page walk fault, OUTSB read)
- Write-protect the final data page (protection violation, INSB write)
- Write-protect a PT page (protection violation on A/D update, OUTSB
read)
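For reference, the NPF error codes the SVM side of the test expects for these
cases can be summarized as a lookup. The values mirror the asserts in
l1_svm_code() in the diff below; expected_npf_error_code() itself is a
hypothetical helper written for this summary, and the low PFERR bit positions
are the architectural #PF error code bits.

```c
#include <stdint.h>

#define PFERR_PRESENT_MASK	(1ULL << 0)
#define PFERR_WRITE_MASK	(1ULL << 1)
#define PFERR_USER_MASK		(1ULL << 2)
#define PFERR_GUEST_FINAL_MASK	(1ULL << 32)
#define PFERR_GUEST_PAGE_MASK	(1ULL << 33)

enum test_type {
	TEST_FINAL_PAGE_UNMAPPED,
	TEST_PT_PAGE_UNMAPPED,
	TEST_FINAL_PAGE_WRITE_PROTECTED,
	TEST_PT_PAGE_WRITE_PROTECTED,
};

/* Hypothetical helper: expected exit_info_1 per test case, 0 if unknown. */
static uint64_t expected_npf_error_code(enum test_type type)
{
	switch (type) {
	case TEST_FINAL_PAGE_UNMAPPED:
		/* OUTSB read of a not-present data page. */
		return PFERR_USER_MASK | PFERR_GUEST_FINAL_MASK;
	case TEST_PT_PAGE_UNMAPPED:
		/* Page table accesses are treated as read+write. */
		return PFERR_WRITE_MASK | PFERR_USER_MASK |
		       PFERR_GUEST_PAGE_MASK;
	case TEST_FINAL_PAGE_WRITE_PROTECTED:
		/* INSB write to a present, read-only data page. */
		return PFERR_PRESENT_MASK | PFERR_WRITE_MASK |
		       PFERR_USER_MASK | PFERR_GUEST_FINAL_MASK;
	case TEST_PT_PAGE_WRITE_PROTECTED:
		/* Walk needs write access to a present, read-only PT page. */
		return PFERR_PRESENT_MASK | PFERR_WRITE_MASK |
		       PFERR_USER_MASK | PFERR_GUEST_PAGE_MASK;
	}
	return 0;
}
```

Note that the two page table cases differ only in the PRESENT bit, and every
case sets exactly one of GUEST_FINAL or GUEST_PAGE.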
Signed-off-by: Kevin Cheng <chengkev@google.com>
[sean: name it nested_tdp_fault_test, consolidate asserts]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/include/x86/processor.h | 9 +
.../selftests/kvm/x86/nested_tdp_fault_test.c | 313 ++++++++++++++++++
3 files changed, 323 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86/nested_tdp_fault_test.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 82fa943b9503..2908eca1647a 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -97,6 +97,7 @@ TEST_GEN_PROGS_x86 += x86/nested_emulation_test
TEST_GEN_PROGS_x86 += x86/nested_exceptions_test
TEST_GEN_PROGS_x86 += x86/nested_invalid_cr3_test
TEST_GEN_PROGS_x86 += x86/nested_set_state_test
+TEST_GEN_PROGS_x86 += x86/nested_tdp_fault_test
TEST_GEN_PROGS_x86 += x86/nested_tsc_adjust_test
TEST_GEN_PROGS_x86 += x86/nested_tsc_scaling_test
TEST_GEN_PROGS_x86 += x86/nested_vmsave_vmload_test
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 851ffcd3340c..06878e7c7347 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1573,6 +1573,15 @@ u64 *tdp_get_pte(struct kvm_vm *vm, u64 l2_gpa);
#define PFERR_GUEST_PAGE_MASK BIT_ULL(PFERR_GUEST_PAGE_BIT)
#define PFERR_IMPLICIT_ACCESS BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
+#define EPT_VIOLATION_ACC_READ BIT(0)
+#define EPT_VIOLATION_ACC_WRITE BIT(1)
+#define EPT_VIOLATION_ACC_INSTR BIT(2)
+#define EPT_VIOLATION_PROT_READ BIT(3)
+#define EPT_VIOLATION_PROT_WRITE BIT(4)
+#define EPT_VIOLATION_PROT_EXEC BIT(5)
+#define EPT_VIOLATION_GVA_IS_VALID BIT(7)
+#define EPT_VIOLATION_GVA_TRANSLATED BIT(8)
+
bool sys_clocksource_is_based_on_tsc(void);
#endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/x86/nested_tdp_fault_test.c b/tools/testing/selftests/kvm/x86/nested_tdp_fault_test.c
new file mode 100644
index 000000000000..fa95568f55ff
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/nested_tdp_fault_test.c
@@ -0,0 +1,313 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025, Google, Inc.
+ */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+#include "vmx.h"
+
+#define L2_GUEST_STACK_SIZE 64
+
+enum test_type {
+ TEST_FINAL_PAGE_UNMAPPED, /* Final data page not present */
+ TEST_PT_PAGE_UNMAPPED, /* Page table page not present */
+ TEST_FINAL_PAGE_WRITE_PROTECTED, /* Final data page read-only */
+ TEST_PT_PAGE_WRITE_PROTECTED, /* Page table page read-only */
+};
+
+static gva_t l2_test_page;
+static void (*l2_entry)(void);
+
+#define TEST_IO_PORT 0x80
+#define TEST1_VADDR 0x8000000ULL
+#define TEST2_VADDR 0x10000000ULL
+#define TEST3_VADDR 0x18000000ULL
+#define TEST4_VADDR 0x20000000ULL
+
+/*
+ * L2 executes OUTS reading from l2_test_page, triggering a nested page
+ * fault on the read access.
+ */
+static void l2_guest_code_outs(void)
+{
+ asm volatile("outsb" ::"S"(l2_test_page), "d"(TEST_IO_PORT) : "memory");
+ GUEST_FAIL("L2 should not reach here");
+}
+
+/*
+ * L2 executes INS writing to l2_test_page, triggering a nested page
+ * fault on the write access.
+ */
+static void l2_guest_code_ins(void)
+{
+ asm volatile("insb" ::"D"(l2_test_page), "d"(TEST_IO_PORT) : "memory");
+ GUEST_FAIL("L2 should not reach here");
+}
+
+#define GUEST_ASSERT_EXIT_QUAL(ac_eq, ex_eq) \
+ __GUEST_ASSERT((ac_eq) == (ex_eq), \
+ "Wanted EXIT_QUAL '0x%lx', got '0x%lx'", ex_eq, ac_eq)
+
+static void l1_vmx_code(struct vmx_pages *vmx, u64 expected_fault_gpa,
+ u64 test_type)
+{
+ unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+ u64 exit_qual;
+
+ GUEST_ASSERT(vmx->vmcs_gpa);
+ GUEST_ASSERT(prepare_for_vmx_operation(vmx));
+ GUEST_ASSERT(load_vmcs(vmx));
+
+ prepare_vmcs(vmx, l2_entry, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+ GUEST_ASSERT(!vmlaunch());
+
+ /* Verify we got an EPT violation exit */
+ __GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_EPT_VIOLATION,
+ "Expected EPT violation (0x%x), got 0x%lx",
+ EXIT_REASON_EPT_VIOLATION,
+ vmreadz(VM_EXIT_REASON));
+
+ __GUEST_ASSERT(vmreadz(GUEST_PHYSICAL_ADDRESS) == expected_fault_gpa,
+ "Expected guest_physical_address = 0x%lx, got 0x%lx",
+ expected_fault_gpa,
+ vmreadz(GUEST_PHYSICAL_ADDRESS));
+
+ exit_qual = vmreadz(EXIT_QUALIFICATION);
+
+ /*
+ * Note, EPT page table accesses are always read+write, e.g. so that
+ * the CPU can do A/D updates at-will.
+ */
+ switch (test_type) {
+ case TEST_FINAL_PAGE_UNMAPPED:
+ GUEST_ASSERT_EXIT_QUAL(exit_qual, EPT_VIOLATION_ACC_READ |
+ EPT_VIOLATION_GVA_IS_VALID |
+ EPT_VIOLATION_GVA_TRANSLATED);
+ break;
+ case TEST_PT_PAGE_UNMAPPED:
+ GUEST_ASSERT_EXIT_QUAL(exit_qual, EPT_VIOLATION_ACC_READ |
+ EPT_VIOLATION_ACC_WRITE |
+ EPT_VIOLATION_GVA_IS_VALID);
+ break;
+ case TEST_FINAL_PAGE_WRITE_PROTECTED:
+ GUEST_ASSERT_EXIT_QUAL(exit_qual, EPT_VIOLATION_ACC_WRITE |
+ EPT_VIOLATION_PROT_READ |
+ EPT_VIOLATION_PROT_EXEC |
+ EPT_VIOLATION_GVA_IS_VALID |
+ EPT_VIOLATION_GVA_TRANSLATED);
+ break;
+ case TEST_PT_PAGE_WRITE_PROTECTED:
+ GUEST_ASSERT_EXIT_QUAL(exit_qual, EPT_VIOLATION_ACC_READ |
+ EPT_VIOLATION_ACC_WRITE |
+ EPT_VIOLATION_PROT_READ |
+ EPT_VIOLATION_PROT_EXEC |
+ EPT_VIOLATION_GVA_IS_VALID);
+ break;
+ }
+
+ GUEST_DONE();
+}
+
+#define GUEST_ASSERT_NPF_EC(ac_ec, ex_ec) \
+ __GUEST_ASSERT((ac_ec) == (ex_ec), \
+ "Wanted NPF error code '0x%lx', got '0x%lx'", (u64)(ex_ec), ac_ec)
+
+
+static void l1_svm_code(struct svm_test_data *svm, u64 expected_fault_gpa,
+ u64 test_type)
+{
+ unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+ struct vmcb *vmcb = svm->vmcb;
+ u64 exit_info_1;
+
+ generic_svm_setup(svm, l2_entry,
+ &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+ run_guest(vmcb, svm->vmcb_gpa);
+
+ /* Verify we got an NPF exit */
+ __GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_NPF,
+ "Expected NPF exit (0x%x), got 0x%lx", SVM_EXIT_NPF,
+ vmcb->control.exit_code);
+
+ __GUEST_ASSERT(vmcb->control.exit_info_2 == expected_fault_gpa,
+ "Expected exit_info_2 = 0x%lx, got 0x%lx",
+ expected_fault_gpa,
+ vmcb->control.exit_info_2);
+
+ exit_info_1 = vmcb->control.exit_info_1;
+
+ /*
+ * Note, without GMET enabled, NPT walks are always user accesses. And
+ * like EPT, page table accesses are always read+write.
+ */
+ switch (test_type) {
+ case TEST_FINAL_PAGE_UNMAPPED:
+ GUEST_ASSERT_NPF_EC(exit_info_1, PFERR_USER_MASK |
+ PFERR_GUEST_FINAL_MASK);
+ break;
+ case TEST_PT_PAGE_UNMAPPED:
+ GUEST_ASSERT_NPF_EC(exit_info_1, PFERR_WRITE_MASK |
+ PFERR_USER_MASK |
+ PFERR_GUEST_PAGE_MASK);
+ break;
+ case TEST_FINAL_PAGE_WRITE_PROTECTED:
+ GUEST_ASSERT_NPF_EC(exit_info_1, PFERR_PRESENT_MASK |
+ PFERR_WRITE_MASK |
+ PFERR_USER_MASK |
+ PFERR_GUEST_FINAL_MASK);
+ break;
+ case TEST_PT_PAGE_WRITE_PROTECTED:
+ GUEST_ASSERT_NPF_EC(exit_info_1, PFERR_PRESENT_MASK |
+ PFERR_WRITE_MASK |
+ PFERR_USER_MASK |
+ PFERR_GUEST_PAGE_MASK);
+ break;
+ }
+
+ GUEST_DONE();
+}
+
+static void l1_guest_code(void *data, u64 expected_fault_gpa,
+ u64 test_type)
+{
+ if (this_cpu_has(X86_FEATURE_VMX))
+ l1_vmx_code(data, expected_fault_gpa, test_type);
+ else
+ l1_svm_code(data, expected_fault_gpa, test_type);
+}
+
+/* Returns the GPA of the PT page that maps @vaddr. */
+static u64 get_pt_gpa_for_vaddr(struct kvm_vm *vm, u64 vaddr)
+{
+ u64 *pte;
+
+ pte = vm_get_pte(vm, vaddr);
+ TEST_ASSERT(pte && (*pte & 0x1), "PTE not present for vaddr 0x%lx",
+ (unsigned long)vaddr);
+
+ return addr_hva2gpa(vm, (void *)((u64)pte & ~0xFFFULL));
+}
+
+static void run_test(enum test_type type)
+{
+ gpa_t expected_fault_gpa;
+ gva_t nested_gva;
+
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ struct ucall uc;
+
+ vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+ vm_enable_tdp(vm);
+
+ if (kvm_cpu_has(X86_FEATURE_VMX))
+ vcpu_alloc_vmx(vm, &nested_gva);
+ else
+ vcpu_alloc_svm(vm, &nested_gva);
+
+ switch (type) {
+ case TEST_FINAL_PAGE_UNMAPPED:
+ /*
+ * Unmap the final data page from NPT/EPT. The guest page
+ * table walk succeeds, but the final GPA->HPA translation
+ * fails. L2 reads from the page via OUTS.
+ */
+ l2_entry = l2_guest_code_outs;
+ l2_test_page = vm_alloc(vm, vm->page_size, TEST1_VADDR);
+ expected_fault_gpa = addr_gva2gpa(vm, l2_test_page);
+ break;
+ case TEST_PT_PAGE_UNMAPPED:
+ /*
+ * Unmap a page table page from NPT/EPT. The hardware page
+ * table walk fails when translating the PT page's GPA
+ * through NPT/EPT. L2 reads from the page via OUTS.
+ */
+ l2_entry = l2_guest_code_outs;
+ l2_test_page = vm_alloc(vm, vm->page_size, TEST2_VADDR);
+ expected_fault_gpa = get_pt_gpa_for_vaddr(vm, l2_test_page);
+ break;
+ case TEST_FINAL_PAGE_WRITE_PROTECTED:
+ /*
+ * Write-protect the final data page in NPT/EPT. The page
+ * is present and readable, but not writable. L2 writes to
+ * the page via INS, triggering a protection violation.
+ */
+ l2_entry = l2_guest_code_ins;
+ l2_test_page = vm_alloc(vm, vm->page_size, TEST3_VADDR);
+ expected_fault_gpa = addr_gva2gpa(vm, l2_test_page);
+ break;
+ case TEST_PT_PAGE_WRITE_PROTECTED:
+ /*
+ * Write-protect a page table page in NPT/EPT. The page is
+ * present and readable, but not writable. The guest page
+ * table walk needs write access to set A/D bits, so it
+ * triggers a protection violation on the PT page.
+ * L2 reads from the page via OUTS.
+ */
+ l2_entry = l2_guest_code_outs;
+ l2_test_page = vm_alloc(vm, vm->page_size, TEST4_VADDR);
+ expected_fault_gpa = get_pt_gpa_for_vaddr(vm, l2_test_page);
+ break;
+ }
+
+ tdp_identity_map_default_memslots(vm);
+
+ if (type == TEST_FINAL_PAGE_WRITE_PROTECTED ||
+ type == TEST_PT_PAGE_WRITE_PROTECTED)
+ *tdp_get_pte(vm, expected_fault_gpa) &= ~PTE_WRITABLE_MASK(&vm->stage2_mmu);
+ else
+ *tdp_get_pte(vm, expected_fault_gpa) &= ~(PTE_PRESENT_MASK(&vm->stage2_mmu) |
+ PTE_READABLE_MASK(&vm->stage2_mmu) |
+ PTE_WRITABLE_MASK(&vm->stage2_mmu) |
+ PTE_EXECUTABLE_MASK(&vm->stage2_mmu));
+
+ sync_global_to_guest(vm, l2_entry);
+ sync_global_to_guest(vm, l2_test_page);
+ vcpu_args_set(vcpu, 3, nested_gva, expected_fault_gpa, (u64)type);
+
+ /*
+ * For the INS-based write test, KVM emulates the instruction and
+ * first reads from the I/O port, which exits to userspace.
+ * Re-enter the guest so emulation can proceed to the memory
+ * write, where the nested page fault is triggered.
+ */
+ for (;;) {
+ vcpu_run(vcpu);
+
+ if (vcpu->run->exit_reason == KVM_EXIT_IO &&
+ vcpu->run->io.port == TEST_IO_PORT &&
+ vcpu->run->io.direction == KVM_EXIT_IO_IN) {
+ continue;
+ }
+ break;
+ }
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_DONE:
+ break;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ default:
+ TEST_FAIL("Unexpected exit reason: %d", vcpu->run->exit_reason);
+ }
+
+ kvm_vm_free(vm);
+}
+
+int main(int argc, char *argv[])
+{
+ TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX) || kvm_cpu_has(X86_FEATURE_SVM));
+ TEST_REQUIRE(kvm_cpu_has_tdp());
+
+ run_test(TEST_FINAL_PAGE_UNMAPPED);
+ run_test(TEST_PT_PAGE_UNMAPPED);
+ run_test(TEST_FINAL_PAGE_WRITE_PROTECTED);
+ run_test(TEST_PT_PAGE_WRITE_PROTECTED);
+
+ return 0;
+}
--
2.54.0.794.g4f17f83d09-goog
* Re: [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or a fault came from hardware
2026-05-22 23:26 ` [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or a fault came from hardware Sean Christopherson
@ 2026-05-26 18:18 ` Yosry Ahmed
2026-05-26 18:48 ` Sean Christopherson
0 siblings, 1 reply; 15+ messages in thread
From: Yosry Ahmed @ 2026-05-26 18:18 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
On Fri, May 22, 2026 at 4:27 PM Sean Christopherson <seanjc@google.com> wrote:
>
> When injecting a page fault (including nested TDP faults into L1), tell the
> injection routine whether or not the fault originated in hardware, i.e. if
> KVM is effectively forwarding a fault it intercepted. For nested TDP fault
> injection, KVM needs to grab PAGE_WALK vs. GUEST_FINAL information from the
> VMCB/VMCS, _if_ the fault originated in hardware.
>
> No functional change intended (nothing uses the new param, yet...).
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/include/asm/kvm_host.h | 18 ++++++++++++++----
> arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
> arch/x86/kvm/svm/nested.c | 3 ++-
> arch/x86/kvm/vmx/nested.c | 3 ++-
> arch/x86/kvm/x86.c | 16 +++++++++-------
> 5 files changed, 28 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 271bdd109a98..d11063c36f03 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -484,7 +484,8 @@ struct kvm_mmu {
> u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index);
> int (*page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
> void (*inject_page_fault)(struct kvm_vcpu *vcpu,
> - struct x86_exception *fault);
> + struct x86_exception *fault,
> + bool from_hardware);
Probably a bit late to ask this question, but why do we need
from_hardware (or the previous hardware_nested_page_fault) as opposed
to just checking exit_code / exit_reason? Is it possible to get an
NPF/EPT violation but then synthesize a different one into L1 rather
than forwarding the one we got from HW?
* Re: [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits
2026-05-22 23:26 ` [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits Sean Christopherson
@ 2026-05-26 18:31 ` Yosry Ahmed
2026-05-26 18:44 ` Sean Christopherson
0 siblings, 1 reply; 15+ messages in thread
From: Yosry Ahmed @ 2026-05-26 18:31 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
On Fri, May 22, 2026 at 4:27 PM Sean Christopherson <seanjc@google.com> wrote:
>
> From: Kevin Cheng <chengkev@google.com>
>
> Fix KVM's generation of PFERR_GUEST_{PAGE,FINAL}_MASK bits when injecting a
> Nested Page Fault into L1. Currently, KVM blindly stuffs GUEST_FINAL into
> L1, which is blatantly wrong given that KVM obviously generates NPFs for
> page table accesses.
>
> There are two paths that trigger NPF injection: hardware NPF exits (from
> L2) and emulation-triggered faults, i.e. when KVM detects a NPF as part of
> emulating an L2 GVA access. For the hardware case, use the bits verbatim
> from the VMCB, as KVM is simply forwarding a NPF to L1. For the emulation
> case, propagate the GUEST_{PAGE,FINAL} bits from the access field (which
> were recently added for MBEC+GMET support).
>
> To differentiate between the two cases, add "hardware_nested_page_fault"
> to "struct x86_exception", and set it when injecting a NPF in response to
> an NPF exit from L2.
hardware_nested_page_fault is no more.
>
> To help guard against future goofs, assert that exactly one of GUEST_PAGE
> or GUEST_FINAL is set when injecting a NPF. Unlike VMX, there are no
> (known) cases where hardware doesn't set either bit, and KVM should always
> set one or the other when emulating a GVA access.
>
> Signed-off-by: Kevin Cheng <chengkev@google.com>
> [sean: use plumbed in @access bits, massage changelog]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
[..]
> @@ -39,19 +39,32 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> struct vmcb *vmcb = svm->vmcb;
> + u64 fault_stage;
>
> - if (vmcb->control.exit_code != SVM_EXIT_NPF) {
> - /*
> - * TODO: track the cause of the nested page fault, and
> - * correctly fill in the high bits of exit_info_1.
> - */
> - vmcb->control.exit_code = SVM_EXIT_NPF;
> - vmcb->control.exit_info_1 = (1ULL << 32);
> - vmcb->control.exit_info_2 = fault->address;
> - }
> + /*
> + * For hardware NPF exits, the GUEST_FAULT_STAGE bits are only
> + * available in the hardware exit_info_1, since the guest_mmu
> + * walker doesn't know whether the faulting GPA was a page table
> + * page or final page from L2's perspective.
> + */
> + if (from_hardware)
> + fault_stage = vmcb->control.exit_info_1 &
> + PFERR_GUEST_FAULT_STAGE_MASK;
> + else
> + fault_stage = fault->error_code & PFERR_GUEST_FAULT_STAGE_MASK;
>
> - vmcb->control.exit_info_1 &= ~0xffffffffULL;
> - vmcb->control.exit_info_1 |= fault->error_code;
> + /*
> + * All nested page faults should be annotated as occurring on the
> + * final translation *or* the page walk. Arbitrarily choose "final"
> + * if KVM is buggy and enumerated both or neither.
> + */
> + if (WARN_ON_ONCE(hweight64(fault_stage) != 1))
> + fault_stage = PFERR_GUEST_FINAL_MASK;
> +
> + vmcb->control.exit_code = SVM_EXIT_NPF;
> + vmcb->control.exit_info_1 = fault_stage |
> + (fault->error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);
Do we need to do this in the common path? If from_hardware=true, can
the fault injected by KVM have different flags from the one produced
by hardware? I guess the answer is yes (e.g. if KVM is doing
write-protection?). Might be worth a comment.
> + vmcb->control.exit_info_2 = fault->address;
>
> nested_svm_vmexit(svm);
> }
> --
> 2.54.0.794.g4f17f83d09-goog
>
>
* Re: [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits
2026-05-26 18:31 ` Yosry Ahmed
@ 2026-05-26 18:44 ` Sean Christopherson
2026-05-26 18:50 ` Yosry Ahmed
0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2026-05-26 18:44 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
On Tue, May 26, 2026, Yosry Ahmed wrote:
> On Fri, May 22, 2026 at 4:27 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > From: Kevin Cheng <chengkev@google.com>
> >
> > Fix KVM's generation of PFERR_GUEST_{PAGE,FINAL}_MASK bits when injecting a
> > Nested Page Fault into L1. Currently, KVM blindly stuffs GUEST_FINAL into
> > L1, which is blatantly wrong given that KVM obviously generates NPFs for
> > page table accesses.
> >
> > There are two paths that trigger NPF injection: hardware NPF exits (from
> > L2) and emulation-triggered faults, i.e. when KVM detects a NPF as part of
> > emulating an L2 GVA access. For the hardware case, use the bits verbatim
> > from the VMCB, as KVM is simply forwarding a NPF to L1. For the emulation
> > case, propagate the GUEST_{PAGE,FINAL} bits from the access field (which
> > were recently added for MBEC+GMET support).
> >
> > To differentiate between the two cases, add "hardware_nested_page_fault"
> > to "struct x86_exception", and set it when injecting a NPF in response to
> > an NPF exit from L2.
>
> hardware_nested_page_fault is no more.
Hrm, I suspect I unintentionally discarded a changelog update; I distinctly
remember rewriting this. *sigh*
> > To help guard against future goofs, assert that exactly one of GUEST_PAGE
> > or GUEST_FINAL is set when injecting a NPF. Unlike VMX, there are no
> > (known) cases where hardware doesn't set either bit, and KVM should always
> > set one or the other when emulating a GVA access.
> >
> > Signed-off-by: Kevin Cheng <chengkev@google.com>
> > [sean: use plumbed in @access bits, massage changelog]
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> [..]
> > @@ -39,19 +39,32 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
> > {
> > struct vcpu_svm *svm = to_svm(vcpu);
> > struct vmcb *vmcb = svm->vmcb;
> > + u64 fault_stage;
> >
> > - if (vmcb->control.exit_code != SVM_EXIT_NPF) {
> > - /*
> > - * TODO: track the cause of the nested page fault, and
> > - * correctly fill in the high bits of exit_info_1.
> > - */
> > - vmcb->control.exit_code = SVM_EXIT_NPF;
> > - vmcb->control.exit_info_1 = (1ULL << 32);
> > - vmcb->control.exit_info_2 = fault->address;
> > - }
> > + /*
> > + * For hardware NPF exits, the GUEST_FAULT_STAGE bits are only
> > + * available in the hardware exit_info_1, since the guest_mmu
> > + * walker doesn't know whether the faulting GPA was a page table
> > + * page or final page from L2's perspective.
> > + */
> > + if (from_hardware)
> > + fault_stage = vmcb->control.exit_info_1 &
> > + PFERR_GUEST_FAULT_STAGE_MASK;
> > + else
> > + fault_stage = fault->error_code & PFERR_GUEST_FAULT_STAGE_MASK;
> >
> > - vmcb->control.exit_info_1 &= ~0xffffffffULL;
> > - vmcb->control.exit_info_1 |= fault->error_code;
> > + /*
> > + * All nested page faults should be annotated as occurring on the
> > + * final translation *or* the page walk. Arbitrarily choose "final"
> > + * if KVM is buggy and enumerated both or neither.
> > + */
> > + if (WARN_ON_ONCE(hweight64(fault_stage) != 1))
> > + fault_stage = PFERR_GUEST_FINAL_MASK;
> > +
> > + vmcb->control.exit_code = SVM_EXIT_NPF;
> > + vmcb->control.exit_info_1 = fault_stage |
> > + (fault->error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);
>
> Do we need to do this in the common path?
What do you mean by "this"? Pulling flags from fault->error_code?
> If from_hardware=true, can the fault injected by KVM have different flags
> from the one produced by hardware?
Flags, yes. fault_stage, no.
> I guess the answer is yes, (e.g. if KVM is doing write-protection?). Might be
> worth a comment.
Or if L1 has modified its TDP PTEs in memory, but hasn't yet flushed TLBs. In
that case, KVM's software walker can see the updated PTEs, while hardware may
have seen something else.
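A minimal userspace sketch of the selection logic discussed above, not the actual KVM code. The helper name sketch_npf_exit_info_1() and its parameters are hypothetical, and the PFERR_* masks are local copies of the x86 error-code bit positions so the block compiles on its own. It models taking the fault stage from hardware when forwarding an NPF, taking it from KVM's own error code otherwise, and always taking the remaining flags from KVM's software walk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the error-code bits, defined here only for the sketch. */
#define PFERR_PRESENT_MASK	(1ULL << 0)
#define PFERR_WRITE_MASK	(1ULL << 1)
#define PFERR_USER_MASK		(1ULL << 2)
#define PFERR_GUEST_FINAL_MASK	(1ULL << 32)
#define PFERR_GUEST_PAGE_MASK	(1ULL << 33)
#define PFERR_GUEST_FAULT_STAGE_MASK \
	(PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)

/*
 * Hypothetical helper: build the exit_info_1 value to report to L1.
 * @hw_exit_info_1 is what hardware reported on the NPF exit (only
 * meaningful if @from_hardware), @sw_error_code is what the software
 * walker produced.
 */
static uint64_t sketch_npf_exit_info_1(uint64_t hw_exit_info_1,
				       uint64_t sw_error_code,
				       bool from_hardware)
{
	uint64_t stage;

	/* Use hardware _or_ software for the stage, never a merge of both. */
	if (from_hardware)
		stage = hw_exit_info_1 & PFERR_GUEST_FAULT_STAGE_MASK;
	else
		stage = sw_error_code & PFERR_GUEST_FAULT_STAGE_MASK;

	/* Exactly one stage bit must be set; fall back to "final" if not. */
	if (stage == 0 || (stage & (stage - 1)))
		stage = PFERR_GUEST_FINAL_MASK;

	/* All non-stage flags come from the software walk. */
	return stage | (sw_error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);
}
```

With from_hardware=true, a hardware value carrying GUEST_PAGE and a software error code carrying PRESENT|WRITE|USER yields GUEST_PAGE|PRESENT|WRITE|USER, i.e. the stage is hardware's while the flags reflect what the software walker saw. The real code WARNs on the fallback path; the sketch silently picks "final".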
* Re: [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or a fault came from hardware
2026-05-26 18:18 ` Yosry Ahmed
@ 2026-05-26 18:48 ` Sean Christopherson
2026-05-26 18:52 ` Yosry Ahmed
0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2026-05-26 18:48 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
On Tue, May 26, 2026, Yosry Ahmed wrote:
> On Fri, May 22, 2026 at 4:27 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > When injecting a page fault (including nested TDP faults into L1), tell the
> > injection routine whether or not the fault originated in hardware, i.e. if
> > KVM is effectively forwarding a fault it intercepted. For nested TDP fault
> > injection, KVM needs to grab PAGE_WALK vs. GUEST_FINAL information from the
> > VMCB/VMCS, _if_ the fault originated in hardware.
> >
> > No functional change intended (nothing uses the new param, yet...).
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/include/asm/kvm_host.h | 18 ++++++++++++++----
> > arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
> > arch/x86/kvm/svm/nested.c | 3 ++-
> > arch/x86/kvm/vmx/nested.c | 3 ++-
> > arch/x86/kvm/x86.c | 16 +++++++++-------
> > 5 files changed, 28 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 271bdd109a98..d11063c36f03 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -484,7 +484,8 @@ struct kvm_mmu {
> > u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index);
> > int (*page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
> > void (*inject_page_fault)(struct kvm_vcpu *vcpu,
> > - struct x86_exception *fault);
> > + struct x86_exception *fault,
> > + bool from_hardware);
>
> Probably a bit late to ask this question, but why do we need
> from_hardware (or the previous hardware_nested_page_fault) as opposed
> to just checking exit_code / exit_reason? Is it possible to get an
> NPF/EPT violation but then synthesize a different one into L1 rather
> than forwarding the one we got from HW?
Yes. E.g. if an access to emulated MMIO from L2 hits a !PRESENT fault (EPT Violation
or #NPF), e.g. because MMIO caching is disabled or it's the first time the GPA has
been accessed by L2, then KVM will enter the emulator. If emulating the MMIO
access then hits a TDP fault, e.g. because L2 was accessing MMIO with a MOVQ
(memory-to-memory move), or because L1 has since unmapped the code stream, then
the TDP fault synthesized to L1 will not be the "same" fault that triggered the
VM-Exit.
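To make the scenario concrete, here is a small standalone model, assuming hypothetical names throughout (sketch_exit_code, sketch_fault, and both helpers are made up for illustration and are not KVM APIs). It shows why inferring the origin from the last exit code gives the wrong answer once the emulator synthesizes a second fault after an NPF exit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exit codes for the model; not KVM's real values. */
enum sketch_exit_code { SKETCH_EXIT_NPF, SKETCH_EXIT_IOIO };

/* Hypothetical record of a fault that is about to be injected into L1. */
struct sketch_fault {
	uint64_t gpa;
	bool from_hardware;	/* set by the caller, which knows the origin */
};

/* The heuristic under discussion: guess the origin from the last exit. */
static bool sketch_guess_from_exit_code(enum sketch_exit_code last_exit)
{
	return last_exit == SKETCH_EXIT_NPF;
}

/*
 * Emulator path: L2 exited with an NPF on an MMIO GPA, and emulation of
 * that access then faulted on a different GPA. The resulting fault is
 * synthesized by software even though the last exit was an NPF.
 */
static struct sketch_fault sketch_emulate_after_mmio_npf(uint64_t second_gpa)
{
	struct sketch_fault fault = {
		.gpa = second_gpa,
		.from_hardware = false,
	};

	return fault;
}
```

In this model the last exit code is still "NPF" when the emulator's fault is injected, so the exit-code heuristic claims a hardware origin while the explicit flag correctly says software. That mismatch is the case an explicit from_hardware argument is meant to cover.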
* Re: [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits
2026-05-26 18:44 ` Sean Christopherson
@ 2026-05-26 18:50 ` Yosry Ahmed
2026-05-27 18:14 ` Sean Christopherson
0 siblings, 1 reply; 15+ messages in thread
From: Yosry Ahmed @ 2026-05-26 18:50 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
> > > + vmcb->control.exit_code = SVM_EXIT_NPF;
> > > + vmcb->control.exit_info_1 = fault_stage |
> > > + (fault->error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);
> >
> > Do we need to do this in the common path?
>
> What do you mean by "this"? Pulling flags from fault->error_code?
Yes, sorry if that wasn't clear.
>
> > If from_hardware=true, can the fault injected by KVM have different flags
> > from the one produced by hardware?
>
> Flags, yes. fault_stage, no.
Right, I meant the flags.
>
> > I guess the answer is yes, (e.g. if KVM is doing write-protection?). Might be
> > worth a comment.
>
> Or if L1 has modified its TDP PTEs in memory, but hasn't yet flushed TLBs. In
> that case, KVM's software walker can see the updated PTEs, while hardware may
> have seen something else.
Makes sense. A comment would be helpful for laymen like myself.
* Re: [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or a fault came from hardware
2026-05-26 18:48 ` Sean Christopherson
@ 2026-05-26 18:52 ` Yosry Ahmed
2026-05-27 18:11 ` Sean Christopherson
0 siblings, 1 reply; 15+ messages in thread
From: Yosry Ahmed @ 2026-05-26 18:52 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
On Tue, May 26, 2026 at 11:48 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, May 26, 2026, Yosry Ahmed wrote:
> > On Fri, May 22, 2026 at 4:27 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > When injecting a page fault (including nested TDP faults into L1), tell the
> > > injection routine whether or not the fault originated in hardware, i.e. if
> > > KVM is effectively forwarding a fault it intercepted. For nested TDP fault
> > > injection, KVM needs to grab PAGE_WALK vs. GUEST_FINAL information from the
> > > VMCB/VMCS, _if_ the fault originated in hardware.
> > >
> > > No functional change intended (nothing uses the new param, yet...).
> > >
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > > arch/x86/include/asm/kvm_host.h | 18 ++++++++++++++----
> > > arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
> > > arch/x86/kvm/svm/nested.c | 3 ++-
> > > arch/x86/kvm/vmx/nested.c | 3 ++-
> > > arch/x86/kvm/x86.c | 16 +++++++++-------
> > > 5 files changed, 28 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 271bdd109a98..d11063c36f03 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -484,7 +484,8 @@ struct kvm_mmu {
> > > u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index);
> > > int (*page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
> > > void (*inject_page_fault)(struct kvm_vcpu *vcpu,
> > > - struct x86_exception *fault);
> > > + struct x86_exception *fault,
> > > + bool from_hardware);
> >
> > Probably a bit late to ask this question, but why do we need
> > from_hardware (or the previous hardware_nested_page_fault) as opposed
> > to just checking exit_code / exit_reason? Is it possible to get an
> > NPF/EPT violation but then synthesize a different one into L1 rather
> > than forwarding the one we got from HW?
>
> Yes. E.g. if access to emulated MMIO from L2 hit a !PRESENT fault (EPT Violation
> or #NPF), e.g. because MMIO caching is disabled or it's the first time the GPA has
> been accessed by L2, then KVM will enter the emulator. If emulating the MMIO
> access then hits a TDP fault, e.g. because L2 was accessing MMIO with a MOVQ
> (memory-to-memory move), or because L1 has since unmapped the code stream, then
> the TDP fault synthesized to L1 will not be the "same" fault that triggered the
> VM-Exit.
Interesting, thanks for the example. Probably worth documenting this
somewhere (changelog? comment?).
* Re: [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info
2026-05-22 23:26 [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info Sean Christopherson
` (4 preceding siblings ...)
2026-05-22 23:27 ` [PATCH v4 5/5] KVM: selftests: Add nested page fault injection test Sean Christopherson
@ 2026-05-27 18:10 ` Sean Christopherson
5 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2026-05-27 18:10 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Kevin Cheng
On Fri, 22 May 2026 16:26:56 -0700, Sean Christopherson wrote:
> Kevin's series to fix how KVM populates error information when injecting
> nested page faults (NPF on SVM, EPT violations on VMX) to L1 during
> instruction emulation.
>
> See v3 for the full cover letter.
>
> v4:
> - Pass @from_hardware directly instead of stuffing a flag in x86_exception.
> - Use the bits in @access (thanks to MBEC+GMET) to get the fault stage.
> - Check the entire PFEC/EXIT_QUAL in the selftest.
> - Use hardware _or_ KVM information, never merge the two.
> - Name the selftest nested_tdp_fault_test.
>
> [...]
Applied to kvm-x86 misc, thanks!
[1/5] KVM: x86: Widen x86_exception's error_code to 64 bits
https://github.com/kvm-x86/linux/commit/bb24edbb673f
[2/5] KVM: x86: Tell ->inject_page_fault() whether or a fault came from hardware
https://github.com/kvm-x86/linux/commit/fe0b872d7500
[3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits
https://github.com/kvm-x86/linux/commit/297c2fe249db
[4/5] KVM: VMX: Synthesize nested EPT violation GVA_IS_VALID/GVA_TRANSLATED bits
https://github.com/kvm-x86/linux/commit/96b067b59ad9
[5/5] KVM: selftests: Add nested page fault injection test
https://github.com/kvm-x86/linux/commit/0de1020f7bbb
--
https://github.com/kvm-x86/linux/tree/next
* Re: [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or a fault came from hardware
2026-05-26 18:52 ` Yosry Ahmed
@ 2026-05-27 18:11 ` Sean Christopherson
0 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2026-05-27 18:11 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
On Tue, May 26, 2026, Yosry Ahmed wrote:
> On Tue, May 26, 2026 at 11:48 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Tue, May 26, 2026, Yosry Ahmed wrote:
> > > On Fri, May 22, 2026 at 4:27 PM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > When injecting a page fault (including nested TDP faults into L1), tell the
> > > > injection routine whether or not the fault originated in hardware, i.e. if
> > > > KVM is effectively forwarding a fault it intercepted. For nested TDP fault
> > > > injection, KVM needs to grab PAGE_WALK vs. GUEST_FINAL information from the
> > > > VMCB/VMCS, _if_ the fault originated in hardware.
> > > >
> > > > No functional change intended (nothing uses the new param, yet...).
> > > >
> > > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > > ---
> > > > arch/x86/include/asm/kvm_host.h | 18 ++++++++++++++----
> > > > arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
> > > > arch/x86/kvm/svm/nested.c | 3 ++-
> > > > arch/x86/kvm/vmx/nested.c | 3 ++-
> > > > arch/x86/kvm/x86.c | 16 +++++++++-------
> > > > 5 files changed, 28 insertions(+), 14 deletions(-)
> > > >
> > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > index 271bdd109a98..d11063c36f03 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -484,7 +484,8 @@ struct kvm_mmu {
> > > > u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index);
> > > > int (*page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
> > > > void (*inject_page_fault)(struct kvm_vcpu *vcpu,
> > > > - struct x86_exception *fault);
> > > > + struct x86_exception *fault,
> > > > + bool from_hardware);
> > >
> > > Probably a bit late to ask this question, but why do we need
> > > from_hardware (or the previous hardware_nested_page_fault) as opposed
> > > to just checking exit_code / exit_reason? Is it possible to get an
> > > NPF/EPT violation but then synthesize a different one into L1 rather
> > > than forwarding the one we got from HW?
> >
> > Yes. E.g. if access to emulated MMIO from L2 hit a !PRESENT fault (EPT Violation
> > or #NPF), e.g. because MMIO caching is disabled or it's the first time the GPA has
> > been accessed by L2, then KVM will enter the emulator. If emulating the MMIO
> > access then hits a TDP fault, e.g. because L2 was accessing MMIO with a MOVQ
> > (memory-to-memory move), or because L1 has since unmapped the code stream, then
> > the TDP fault synthesized to L1 will not be the "same" fault that triggered the
> > VM-Exit.
>
> Interesting, thanks for the example. Probably worth documenting this
> somewhere (changelog? comment?).
I added a version of the above to the changelog.
* Re: [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits
2026-05-26 18:50 ` Yosry Ahmed
@ 2026-05-27 18:14 ` Sean Christopherson
0 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2026-05-27 18:14 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
On Tue, May 26, 2026, Yosry Ahmed wrote:
> > > > + vmcb->control.exit_code = SVM_EXIT_NPF;
> > > > + vmcb->control.exit_info_1 = fault_stage |
> > > > + (fault->error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);
> > >
> > > Do we need to do this in the common path?
> >
> > What do you mean by "this"? Pulling flags from fault->error_code?
>
> Yes, sorry if that wasn't clear.
>
> >
> > > If from_hardware=true, can the fault injected by KVM have different flags
> > > from the one produced by hardware?
> >
> > Flags, yes. fault_stage, no.
>
> Right, I meant the flags.
>
> >
> > > I guess the answer is yes, (e.g. if KVM is doing write-protection?). Might be
> > > worth a comment.
> >
> > Or if L1 has modified its TDP PTEs in memory, but hasn't yet flushed TLBs. In
> > that case, KVM's software walker can see the updated PTEs, while hardware may
> > have seen something else.
>
> Makes sense. A comment would be helpful for laymen like myself.
I elected to not add a comment for now, because I'm not 100% confident the nSVM
code is correct, and so didn't want to stealth in a comment that wasn't correct
either. It's certainly much better than it was, but especially with GMET in play,
I need to stare more to convince myself it handles all the edge cases correctly.
Thread overview: 15+ messages
2026-05-22 23:26 [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info Sean Christopherson
2026-05-22 23:26 ` [PATCH v4 1/5] KVM: x86: Widen x86_exception's error_code to 64 bits Sean Christopherson
2026-05-22 23:26 ` [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or a fault came from hardware Sean Christopherson
2026-05-26 18:18 ` Yosry Ahmed
2026-05-26 18:48 ` Sean Christopherson
2026-05-26 18:52 ` Yosry Ahmed
2026-05-27 18:11 ` Sean Christopherson
2026-05-22 23:26 ` [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits Sean Christopherson
2026-05-26 18:31 ` Yosry Ahmed
2026-05-26 18:44 ` Sean Christopherson
2026-05-26 18:50 ` Yosry Ahmed
2026-05-27 18:14 ` Sean Christopherson
2026-05-22 23:27 ` [PATCH v4 4/5] KVM: VMX: Synthesize nested EPT violation GVA_IS_VALID/GVA_TRANSLATED bits Sean Christopherson
2026-05-22 23:27 ` [PATCH v4 5/5] KVM: selftests: Add nested page fault injection test Sean Christopherson
2026-05-27 18:10 ` [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info Sean Christopherson