* [PATCH v4 1/5] KVM: x86: Widen x86_exception's error_code to 64 bits
2026-05-22 23:26 [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info Sean Christopherson
@ 2026-05-22 23:26 ` Sean Christopherson
2026-05-22 23:26 ` [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or not a fault came from hardware Sean Christopherson
From: Sean Christopherson @ 2026-05-22 23:26 UTC
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Kevin Cheng
From: Kevin Cheng <chengkev@google.com>
Widen the error_code field in struct x86_exception from u16 to u64 to
accommodate AMD's NPF error code, which defines information bits above
bit 31, e.g. PFERR_GUEST_FINAL_MASK (bit 32) and PFERR_GUEST_PAGE_MASK
(bit 33).
Retain the u16 type for the local errcode variable in walk_addr_generic(),
as the walker only synthesizes conventional #PF error code bits, all of
which live in bits 15:0.
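To make the truncation concrete, here is a minimal userspace sketch (not KVM code). The two struct types and the roundtrip_u16()/roundtrip_u64() helpers are hypothetical stand-ins; only the PFERR_* bit positions are taken from this series. Storing an NPF error code in a u16 silently discards bits 32 and 33.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as used by KVM's PFERR_* definitions, redefined here so the
 * sketch builds outside the kernel. */
#define BIT_ULL(n)		(1ULL << (n))
#define PFERR_WRITE_MASK	BIT_ULL(1)
#define PFERR_GUEST_FINAL_MASK	BIT_ULL(32)
#define PFERR_GUEST_PAGE_MASK	BIT_ULL(33)

/* The NPF "stage" bits live above bit 15, so a u16 can never hold them. */
static_assert(PFERR_GUEST_FINAL_MASK > UINT16_MAX, "stage bits exceed u16");

/* Hypothetical stand-ins for the old and new layouts of error_code. */
struct exception_u16 { uint16_t error_code; };
struct exception_u64 { uint64_t error_code; };

/* Returns the error code as it reads back after a trip through a u16 field. */
uint64_t roundtrip_u16(uint64_t error_code)
{
	struct exception_u16 e = { .error_code = (uint16_t)error_code };

	return e.error_code;
}

/* Returns the error code as it reads back after a trip through a u64 field. */
uint64_t roundtrip_u64(uint64_t error_code)
{
	struct exception_u64 e = { .error_code = error_code };

	return e.error_code;
}
```

For example, roundtrip_u16(PFERR_WRITE_MASK | PFERR_GUEST_PAGE_MASK) returns just PFERR_WRITE_MASK, while the u64 round trip preserves both bits.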
Signed-off-by: Kevin Cheng <chengkev@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/kvm_emulate.h | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 6 ++++++
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 72aece9ef575..f5df31a52996 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -22,7 +22,7 @@ enum x86_intercept_stage;
struct x86_exception {
u8 vector;
bool error_code_valid;
- u16 error_code;
+ u64 error_code;
bool nested_page_fault;
union {
u64 address; /* cr2 or nested page fault gpa */
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 07100bbfc270..51f8b4522314 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -328,6 +328,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
const int write_fault = access & PFERR_WRITE_MASK;
const int user_fault = access & PFERR_USER_MASK;
const int fetch_fault = access & PFERR_FETCH_MASK;
+ /*
+ * Note! Track the error_code that's common to legacy shadow paging
+ * and NPT shadow paging as a u16 to guard against unintentionally
+ * setting any of bits 63:16. Architecturally, the #PF error code is
+ * 32 bits, and Intel CPUs don't support setting bits 31:16.
+ */
u16 errcode = 0;
gpa_t real_gpa;
gfn_t gfn;
--
2.54.0.794.g4f17f83d09-goog
* [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or not a fault came from hardware
2026-05-22 23:26 [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info Sean Christopherson
2026-05-22 23:26 ` [PATCH v4 1/5] KVM: x86: Widen x86_exception's error_code to 64 bits Sean Christopherson
@ 2026-05-22 23:26 ` Sean Christopherson
2026-05-26 18:18 ` Yosry Ahmed
2026-05-22 23:26 ` [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits Sean Christopherson
From: Sean Christopherson @ 2026-05-22 23:26 UTC
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Kevin Cheng
When injecting a page fault (including nested TDP faults into L1), tell the
injection routine whether or not the fault originated in hardware, i.e. if
KVM is effectively forwarding a fault it intercepted. For nested TDP fault
injection, KVM needs to grab PAGE_WALK vs. GUEST_FINAL information from the
VMCB/VMCS, _if_ the fault originated in hardware.
No functional change intended (nothing uses the new param, yet...).
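A simplified userspace model of the new plumbing, assuming hypothetical struct mmu/struct fault types and function names that only mirror the shape of ->inject_page_fault(), __kvm_inject_emulated_page_fault() and the kvm_inject_emulated_page_fault() wrapper in the diff below (none of these are real KVM APIs). Existing emulator call sites keep their old signature and implicitly report a software-synthesized fault, while call sites that forward an intercepted fault opt in with from_hardware=true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the KVM structures. */
struct fault {
	uint64_t error_code;
	uint64_t address;
};

struct mmu {
	/* Mirrors the shape of ->inject_page_fault() after this patch. */
	void (*inject_page_fault)(struct mmu *mmu, struct fault *fault,
				  bool from_hardware);
	bool last_from_hardware;	/* recorded for illustration only */
	int nr_injected;
};

/* Example callback: records what it was told. A nested TDP handler would
 * consult the VMCB/VMCS only when from_hardware is true. */
void record_injection(struct mmu *mmu, struct fault *fault, bool from_hardware)
{
	(void)fault;
	mmu->last_from_hardware = from_hardware;
	mmu->nr_injected++;
}

/* Plays the role of __kvm_inject_emulated_page_fault(): callers that know
 * where the fault came from pass it explicitly. */
void inject_emulated_page_fault_from(struct mmu *mmu, struct fault *fault,
				     bool from_hardware)
{
	assert(mmu->inject_page_fault);
	mmu->inject_page_fault(mmu, fault, from_hardware);
}

/* Plays the role of the kvm_inject_emulated_page_fault() wrapper: existing
 * callers keep their signature and report a software-synthesized fault. */
void inject_emulated_page_fault(struct mmu *mmu, struct fault *fault)
{
	inject_emulated_page_fault_from(mmu, fault, false);
}
```

Keeping the old name as a thin wrapper limits churn to the few call sites that actually forward a hardware fault.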
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 18 ++++++++++++++----
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
arch/x86/kvm/svm/nested.c | 3 ++-
arch/x86/kvm/vmx/nested.c | 3 ++-
arch/x86/kvm/x86.c | 16 +++++++++-------
5 files changed, 28 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 271bdd109a98..d11063c36f03 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -484,7 +484,8 @@ struct kvm_mmu {
u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index);
int (*page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
void (*inject_page_fault)(struct kvm_vcpu *vcpu,
- struct x86_exception *fault);
+ struct x86_exception *fault,
+ bool from_hardware);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
gpa_t gva_or_gpa, u64 access,
struct x86_exception *exception);
@@ -2305,9 +2306,18 @@ void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long payload);
void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned int nr,
bool has_error_code, u32 error_code);
-void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
-void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
- struct x86_exception *fault);
+void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault,
+ bool from_hardware);
+void __kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+ struct x86_exception *fault,
+ bool from_hardware);
+
+static inline void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+ struct x86_exception *fault)
+{
+ __kvm_inject_emulated_page_fault(vcpu, fault, false);
+}
+
bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 51f8b4522314..cc9c7deb34bc 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -813,7 +813,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
*/
if (!r) {
if (!fault->prefetch)
- kvm_inject_emulated_page_fault(vcpu, &walker.fault);
+ __kvm_inject_emulated_page_fault(vcpu, &walker.fault, true);
return RET_PF_RETRY;
}
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 4ef9bc6a553f..1c1a5e322d18 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -34,7 +34,8 @@
#define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK
static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
- struct x86_exception *fault)
+ struct x86_exception *fault,
+ bool from_hardware)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4690a4d23709..3bb7eaa7b2a5 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -411,7 +411,8 @@ static void nested_ept_invalidate_addr(struct kvm_vcpu *vcpu, gpa_t eptp,
}
static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
- struct x86_exception *fault)
+ struct x86_exception *fault,
+ bool from_hardware)
{
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
struct vcpu_vmx *vmx = to_vmx(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cecb2f84e5e0..aa2f8f43d94c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -969,7 +969,8 @@ static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err)
EMULTYPE_COMPLETE_USER_EXIT);
}
-void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
+void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault,
+ bool from_hardware)
{
++vcpu->stat.pf_guest;
@@ -986,8 +987,9 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
fault->address);
}
-void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
- struct x86_exception *fault)
+void __kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+ struct x86_exception *fault,
+ bool from_hardware)
{
struct kvm_mmu *fault_mmu;
WARN_ON_ONCE(fault->vector != PF_VECTOR);
@@ -1004,9 +1006,9 @@ void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
kvm_mmu_invalidate_addr(vcpu, fault_mmu, fault->address,
KVM_MMU_ROOT_CURRENT);
- fault_mmu->inject_page_fault(vcpu, fault);
+ fault_mmu->inject_page_fault(vcpu, fault, from_hardware);
}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_inject_emulated_page_fault);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_inject_emulated_page_fault);
void kvm_inject_nmi(struct kvm_vcpu *vcpu)
{
@@ -14065,7 +14067,7 @@ bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
fault.nested_page_fault = false;
fault.address = work->arch.token;
fault.async_page_fault = true;
- kvm_inject_page_fault(vcpu, &fault);
+ kvm_inject_page_fault(vcpu, &fault, false);
return true;
} else {
/*
@@ -14236,7 +14238,7 @@ void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_c
fault.address = gva;
fault.async_page_fault = false;
}
- vcpu->arch.walk_mmu->inject_page_fault(vcpu, &fault);
+ vcpu->arch.walk_mmu->inject_page_fault(vcpu, &fault, true);
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_fixup_and_inject_pf_error);
--
2.54.0.794.g4f17f83d09-goog
* Re: [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or not a fault came from hardware
2026-05-22 23:26 ` [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or not a fault came from hardware Sean Christopherson
@ 2026-05-26 18:18 ` Yosry Ahmed
2026-05-26 18:48 ` Sean Christopherson
From: Yosry Ahmed @ 2026-05-26 18:18 UTC
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
On Fri, May 22, 2026 at 4:27 PM Sean Christopherson <seanjc@google.com> wrote:
>
> When injecting a page fault (including nested TDP faults into L1), tell the
> injection routine whether or not the fault originated in hardware, i.e. if
> KVM is effectively forwarding a fault it intercepted. For nested TDP fault
> injection, KVM needs to grab PAGE_WALK vs. GUEST_FINAL information from the
> VMCB/VMCS, _if_ the fault originated in hardware.
>
> No functional change intended (nothing uses the new param, yet...).
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/include/asm/kvm_host.h | 18 ++++++++++++++----
> arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
> arch/x86/kvm/svm/nested.c | 3 ++-
> arch/x86/kvm/vmx/nested.c | 3 ++-
> arch/x86/kvm/x86.c | 16 +++++++++-------
> 5 files changed, 28 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 271bdd109a98..d11063c36f03 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -484,7 +484,8 @@ struct kvm_mmu {
> u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index);
> int (*page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
> void (*inject_page_fault)(struct kvm_vcpu *vcpu,
> - struct x86_exception *fault);
> + struct x86_exception *fault,
> + bool from_hardware);
Probably a bit late to ask this question, but why do we need
from_hardware (or the previous hardware_nested_page_fault) as opposed
to just checking exit_code / exit_reason? Is it possible to get an
NPF/EPT violation but then synthesize a different one into L1 rather
than forwarding the one we got from HW?
* Re: [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or not a fault came from hardware
2026-05-26 18:18 ` Yosry Ahmed
@ 2026-05-26 18:48 ` Sean Christopherson
2026-05-26 18:52 ` Yosry Ahmed
From: Sean Christopherson @ 2026-05-26 18:48 UTC
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
On Tue, May 26, 2026, Yosry Ahmed wrote:
> [..]
> Probably a bit late to ask this question, but why do we need
> from_hardware (or the previous hardware_nested_page_fault) as opposed
> to just checking exit_code / exit_reason? Is it possible to get an
> NPF/EPT violation but then synthesize a different one into L1 rather
> than forwarding the one we got from HW?
Yes. For example, if an access to emulated MMIO from L2 hits a !PRESENT fault
(EPT Violation or #NPF), e.g. because MMIO caching is disabled or because L2 is
accessing the GPA for the first time, then KVM will enter the emulator. If
emulating the MMIO access then hits a TDP fault, e.g. because L2 was accessing
MMIO with a MOVS (memory-to-memory move), or because L1 has since unmapped the
code stream, then the TDP fault synthesized to L1 will not be the "same" fault
that triggered the VM-Exit.
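To illustrate why the exit reason alone is not enough, here is a hedged userspace sketch of the scenario above. The struct, both helper functions and the concrete bit values in the usage example are hypothetical: they assume hardware reported GUEST_FINAL for the original MMIO access, while the emulator's later walk faulted on a GPA backing one of L2's page tables (GUEST_PAGE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n)			(1ULL << (n))
#define PFERR_GUEST_FINAL_MASK		BIT_ULL(32)
#define PFERR_GUEST_PAGE_MASK		BIT_ULL(33)
#define PFERR_GUEST_FAULT_STAGE_MASK	(PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)

static_assert((PFERR_GUEST_FINAL_MASK & PFERR_GUEST_PAGE_MASK) == 0,
	      "stage bits must be distinct");

/* Hypothetical snapshot of the state visible when KVM injects an NPF into L1. */
struct npf_injection {
	bool exit_was_npf;		/* L2's VM-Exit reason was NPF */
	bool from_hardware;		/* KVM is forwarding that very fault */
	uint64_t hw_exit_info_1;	/* what hardware reported for the VM-Exit */
	uint64_t sw_error_code;		/* what KVM's own walk produced */
};

/* Alternative rule: trust hardware whenever the exit reason is NPF. */
uint64_t stage_by_exit_reason(const struct npf_injection *s)
{
	return (s->exit_was_npf ? s->hw_exit_info_1 : s->sw_error_code) &
	       PFERR_GUEST_FAULT_STAGE_MASK;
}

/* Rule keyed off the origin of the fault, as with @from_hardware. */
uint64_t stage_by_origin(const struct npf_injection *s)
{
	return (s->from_hardware ? s->hw_exit_info_1 : s->sw_error_code) &
	       PFERR_GUEST_FAULT_STAGE_MASK;
}
```

With exit_was_npf=true and from_hardware=false, stage_by_exit_reason() returns the stale GUEST_FINAL from the original VM-Exit, whereas stage_by_origin() returns GUEST_PAGE from KVM's own walk.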
* Re: [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or not a fault came from hardware
2026-05-26 18:48 ` Sean Christopherson
@ 2026-05-26 18:52 ` Yosry Ahmed
2026-05-27 18:11 ` Sean Christopherson
From: Yosry Ahmed @ 2026-05-26 18:52 UTC
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
On Tue, May 26, 2026 at 11:48 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, May 26, 2026, Yosry Ahmed wrote:
> > [..]
> > Probably a bit late to ask this question, but why do we need
> > from_hardware (or the previous hardware_nested_page_fault) as opposed
> > to just checking exit_code / exit_reason? Is it possible to get an
> > NPF/EPT violation but then synthesize a different one into L1 rather
> > than forwarding the one we got from HW?
>
> Yes. For example, if an access to emulated MMIO from L2 hits a !PRESENT fault
> (EPT Violation or #NPF), e.g. because MMIO caching is disabled or because L2 is
> accessing the GPA for the first time, then KVM will enter the emulator. If
> emulating the MMIO access then hits a TDP fault, e.g. because L2 was accessing
> MMIO with a MOVS (memory-to-memory move), or because L1 has since unmapped the
> code stream, then the TDP fault synthesized to L1 will not be the "same" fault
> that triggered the VM-Exit.
Interesting, thanks for the example. Probably worth documenting this
somewhere (changelog? comment?).
* Re: [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or not a fault came from hardware
2026-05-26 18:52 ` Yosry Ahmed
@ 2026-05-27 18:11 ` Sean Christopherson
From: Sean Christopherson @ 2026-05-27 18:11 UTC
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
On Tue, May 26, 2026, Yosry Ahmed wrote:
> On Tue, May 26, 2026 at 11:48 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Tue, May 26, 2026, Yosry Ahmed wrote:
> > > [..]
> > > Probably a bit late to ask this question, but why do we need
> > > from_hardware (or the previous hardware_nested_page_fault) as opposed
> > > to just checking exit_code / exit_reason? Is it possible to get an
> > > NPF/EPT violation but then synthesize a different one into L1 rather
> > > than forwarding the one we got from HW?
> >
> > Yes. For example, if an access to emulated MMIO from L2 hits a !PRESENT fault
> > (EPT Violation or #NPF), e.g. because MMIO caching is disabled or because L2 is
> > accessing the GPA for the first time, then KVM will enter the emulator. If
> > emulating the MMIO access then hits a TDP fault, e.g. because L2 was accessing
> > MMIO with a MOVS (memory-to-memory move), or because L1 has since unmapped the
> > code stream, then the TDP fault synthesized to L1 will not be the "same" fault
> > that triggered the VM-Exit.
>
> Interesting, thanks for the example. Probably worth documenting this
> somewhere (changelog? comment?).
I added a version of the above to the changelog.
* [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits
2026-05-22 23:26 [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info Sean Christopherson
2026-05-22 23:26 ` [PATCH v4 1/5] KVM: x86: Widen x86_exception's error_code to 64 bits Sean Christopherson
2026-05-22 23:26 ` [PATCH v4 2/5] KVM: x86: Tell ->inject_page_fault() whether or not a fault came from hardware Sean Christopherson
@ 2026-05-22 23:26 ` Sean Christopherson
2026-05-26 18:31 ` Yosry Ahmed
2026-05-22 23:27 ` [PATCH v4 4/5] KVM: VMX: Synthesize nested EPT violation GVA_IS_VALID/GVA_TRANSLATED bits Sean Christopherson
From: Sean Christopherson @ 2026-05-22 23:26 UTC
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Kevin Cheng
From: Kevin Cheng <chengkev@google.com>
Fix KVM's generation of PFERR_GUEST_{PAGE,FINAL}_MASK bits when injecting a
Nested Page Fault into L1. Currently, KVM blindly stuffs GUEST_FINAL into
L1, which is blatantly wrong given that KVM obviously generates NPFs for
page table accesses.
There are two paths that trigger NPF injection: hardware NPF exits (from
L2) and emulation-triggered faults, i.e. when KVM detects a NPF as part of
emulating an L2 GVA access. For the hardware case, use the bits verbatim
from the VMCB, as KVM is simply forwarding a NPF to L1. For the emulation
case, propagate the GUEST_{PAGE,FINAL} bits from the access field (which
were recently added for MBEC+GMET support).
To differentiate between the two cases, add "hardware_nested_page_fault"
to "struct x86_exception", and set it when injecting a NPF in response to
an NPF exit from L2.
To help guard against future goofs, assert that exactly one of GUEST_PAGE
or GUEST_FINAL is set when injecting a NPF. Unlike VMX, there are no
(known) cases where hardware doesn't set either bit, and KVM should always
set one or the other when emulating a GVA access.
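A minimal sketch of the "exactly one of GUEST_PAGE or GUEST_FINAL" rule, written as plain C so it can be exercised outside the kernel. exactly_one_bit() stands in for hweight64(x) == 1, the *bogus out-parameter stands in for WARN_ON_ONCE(), and pick_fault_stage() is a hypothetical helper that mirrors, but is not, the logic in nested_svm_inject_npf_exit() below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n)			(1ULL << (n))
#define PFERR_GUEST_FINAL_MASK		BIT_ULL(32)
#define PFERR_GUEST_PAGE_MASK		BIT_ULL(33)
#define PFERR_GUEST_FAULT_STAGE_MASK	(PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)

static_assert((PFERR_GUEST_FINAL_MASK & PFERR_GUEST_PAGE_MASK) == 0,
	      "stage bits must be distinct");

/* True iff exactly one bit is set; equivalent to hweight64(x) == 1. */
static inline bool exactly_one_bit(uint64_t x)
{
	return x && !(x & (x - 1));
}

/*
 * Pick the fault stage for an NPF injected into L1: hardware's value when
 * forwarding an intercepted NPF, KVM's own value when emulating. Falls back
 * to GUEST_FINAL (and sets *bogus) if the chosen source reports both stages
 * or neither.
 */
uint64_t pick_fault_stage(bool from_hardware, uint64_t hw_exit_info_1,
			  uint64_t sw_error_code, bool *bogus)
{
	uint64_t stage = (from_hardware ? hw_exit_info_1 : sw_error_code) &
			 PFERR_GUEST_FAULT_STAGE_MASK;

	*bogus = !exactly_one_bit(stage);
	return *bogus ? PFERR_GUEST_FINAL_MASK : stage;
}
```

Falling back to a fixed, architecturally valid value keeps L1 from ever seeing both or neither stage bit, even if KVM has a bug.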
Signed-off-by: Kevin Cheng <chengkev@google.com>
[sean: use plumbed in @access bits, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/mmu/paging_tmpl.h | 15 +++++---------
arch/x86/kvm/svm/nested.c | 35 ++++++++++++++++++++++-----------
3 files changed, 31 insertions(+), 21 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d11063c36f03..e1c4151d6693 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -284,6 +284,8 @@ enum x86_intercept_stage;
#define PFERR_GUEST_RMP_MASK BIT_ULL(31)
#define PFERR_GUEST_FINAL_MASK BIT_ULL(32)
#define PFERR_GUEST_PAGE_MASK BIT_ULL(33)
+#define PFERR_GUEST_FAULT_STAGE_MASK \
+ (PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)
#define PFERR_GUEST_ENC_MASK BIT_ULL(34)
#define PFERR_GUEST_SIZEM_MASK BIT_ULL(35)
#define PFERR_GUEST_VMPL_MASK BIT_ULL(36)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index cc9c7deb34bc..66eee6914234 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -397,16 +397,6 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
nested_access | PFERR_GUEST_PAGE_MASK,
&walker->fault, 0);
- /*
- * FIXME: This can happen if emulation (for of an INS/OUTS
- * instruction) triggers a nested page fault. The exit
- * qualification / exit info field will incorrectly have
- * "guest page access" as the nested page fault's cause,
- * instead of "guest page structure access". To fix this,
- * the x86_exception struct should be augmented with enough
- * information to fix the exit_qualification or exit_info_1
- * fields.
- */
if (unlikely(real_gpa == INVALID_GPA))
return 0;
@@ -548,6 +538,11 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
walker->fault.nested_page_fault = mmu != vcpu->arch.walk_mmu;
walker->fault.async_page_fault = false;
+#if PTTYPE != PTTYPE_EPT
+ if (walker->fault.nested_page_fault)
+ walker->fault.error_code |= access & PFERR_GUEST_FAULT_STAGE_MASK;
+#endif
+
trace_kvm_mmu_walker_error(walker->fault.error_code);
return 0;
}
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 1c1a5e322d18..28ac5d5c990d 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -39,19 +39,32 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
{
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb;
+ u64 fault_stage;
- if (vmcb->control.exit_code != SVM_EXIT_NPF) {
- /*
- * TODO: track the cause of the nested page fault, and
- * correctly fill in the high bits of exit_info_1.
- */
- vmcb->control.exit_code = SVM_EXIT_NPF;
- vmcb->control.exit_info_1 = (1ULL << 32);
- vmcb->control.exit_info_2 = fault->address;
- }
+ /*
+ * For hardware NPF exits, the GUEST_FAULT_STAGE bits are only
+ * available in the hardware exit_info_1, since the guest_mmu
+ * walker doesn't know whether the faulting GPA was a page table
+ * page or final page from L2's perspective.
+ */
+ if (from_hardware)
+ fault_stage = vmcb->control.exit_info_1 &
+ PFERR_GUEST_FAULT_STAGE_MASK;
+ else
+ fault_stage = fault->error_code & PFERR_GUEST_FAULT_STAGE_MASK;
- vmcb->control.exit_info_1 &= ~0xffffffffULL;
- vmcb->control.exit_info_1 |= fault->error_code;
+ /*
+ * All nested page faults should be annotated as occurring on the
+ * final translation *or* the page walk. Arbitrarily choose "final"
+ * if KVM is buggy and enumerated both or neither.
+ */
+ if (WARN_ON_ONCE(hweight64(fault_stage) != 1))
+ fault_stage = PFERR_GUEST_FINAL_MASK;
+
+ vmcb->control.exit_code = SVM_EXIT_NPF;
+ vmcb->control.exit_info_1 = fault_stage |
+ (fault->error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);
+ vmcb->control.exit_info_2 = fault->address;
nested_svm_vmexit(svm);
}
--
2.54.0.794.g4f17f83d09-goog
* Re: [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits
2026-05-22 23:26 ` [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits Sean Christopherson
@ 2026-05-26 18:31 ` Yosry Ahmed
2026-05-26 18:44 ` Sean Christopherson
From: Yosry Ahmed @ 2026-05-26 18:31 UTC
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
On Fri, May 22, 2026 at 4:27 PM Sean Christopherson <seanjc@google.com> wrote:
>
> From: Kevin Cheng <chengkev@google.com>
>
> Fix KVM's generation of PFERR_GUEST_{PAGE,FINAL}_MASK bits when injecting a
> Nested Page Fault into L1. Currently, KVM blindly stuffs GUEST_FINAL into
> L1, which is blatantly wrong given that KVM obviously generates NPFs for
> page table accesses.
>
> There are two paths that trigger NPF injection: hardware NPF exits (from
> L2) and emulation-triggered faults, i.e. when KVM detects a NPF as part of
> emulating an L2 GVA access. For the hardware case, use the bits verbatim
> from the VMCB, as KVM is simply forwarding a NPF to L1. For the emulation
> case, propagate the GUEST_{PAGE,FINAL} bits from the access field (which
> were recently added for MBEC+GMET support).
>
> To differentiate between the two cases, add "hardware_nested_page_fault"
> to "struct x86_exception", and set it when injecting a NPF in response to
> an NPF exit from L2.
hardware_nested_page_fault is no more.
>
> To help guard against future goofs, assert that exactly one of GUEST_PAGE
> or GUEST_FINAL is set when injecting a NPF. Unlike VMX, there are no
> (known) cases where hardware doesn't set either bit, and KVM should always
> set one or the other when emulating a GVA access.
>
> Signed-off-by: Kevin Cheng <chengkev@google.com>
> [sean: use plumbed in @access bits, massage changelog]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
[..]
> @@ -39,19 +39,32 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> struct vmcb *vmcb = svm->vmcb;
> + u64 fault_stage;
>
> - if (vmcb->control.exit_code != SVM_EXIT_NPF) {
> - /*
> - * TODO: track the cause of the nested page fault, and
> - * correctly fill in the high bits of exit_info_1.
> - */
> - vmcb->control.exit_code = SVM_EXIT_NPF;
> - vmcb->control.exit_info_1 = (1ULL << 32);
> - vmcb->control.exit_info_2 = fault->address;
> - }
> + /*
> + * For hardware NPF exits, the GUEST_FAULT_STAGE bits are only
> + * available in the hardware exit_info_1, since the guest_mmu
> + * walker doesn't know whether the faulting GPA was a page table
> + * page or final page from L2's perspective.
> + */
> + if (from_hardware)
> + fault_stage = vmcb->control.exit_info_1 &
> + PFERR_GUEST_FAULT_STAGE_MASK;
> + else
> + fault_stage = fault->error_code & PFERR_GUEST_FAULT_STAGE_MASK;
>
> - vmcb->control.exit_info_1 &= ~0xffffffffULL;
> - vmcb->control.exit_info_1 |= fault->error_code;
> + /*
> + * All nested page faults should be annotated as occurring on the
> + * final translation *or* the page walk. Arbitrarily choose "final"
> + * if KVM is buggy and enumerated both or neither.
> + */
> + if (WARN_ON_ONCE(hweight64(fault_stage) != 1))
> + fault_stage = PFERR_GUEST_FINAL_MASK;
> +
> + vmcb->control.exit_code = SVM_EXIT_NPF;
> + vmcb->control.exit_info_1 = fault_stage |
> + (fault->error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);
Do we need to do this in the common path? If from_hardware=true, can
the fault injected by KVM have different flags from the one produced
by hardware? I guess the answer is yes (e.g. if KVM is doing
write-protection?). Might be worth a comment.
> + vmcb->control.exit_info_2 = fault->address;
>
> nested_svm_vmexit(svm);
> }
* Re: [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits
2026-05-26 18:31 ` Yosry Ahmed
@ 2026-05-26 18:44 ` Sean Christopherson
2026-05-26 18:50 ` Yosry Ahmed
From: Sean Christopherson @ 2026-05-26 18:44 UTC
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
On Tue, May 26, 2026, Yosry Ahmed wrote:
> On Fri, May 22, 2026 at 4:27 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > From: Kevin Cheng <chengkev@google.com>
> >
> > Fix KVM's generation of PFERR_GUEST_{PAGE,FINAL}_MASK bits when injecting a
> > Nested Page Fault into L1. Currently, KVM blindly stuffs GUEST_FINAL into
> > L1, which is blatantly wrong given that KVM obviously generates NPFs for
> > page table accesses.
> >
> > There are two paths that trigger NPF injection: hardware NPF exits (from
> > L2) and emulation-triggered faults, i.e. when KVM detects a NPF as part of
> > emulating an L2 GVA access. For the hardware case, use the bits verbatim
> > from the VMCB, as KVM is simply forwarding a NPF to L1. For the emulation
> > case, propagate the GUEST_{PAGE,FINAL} bits from the access field (which
> > were recently added for MBEC+GMET support).
> >
> > To differentiate between the two cases, add "hardware_nested_page_fault"
> > to "struct x86_exception", and set it when injecting a NPF in response to
> > an NPF exit from L2.
>
> hardware_nested_page_fault is no more.
Hrm, I suspect I unintentionally discarded a changelog update; I distinctly
remember rewriting this. *sigh*
> [..]
> > @@ -39,19 +39,32 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
> > {
> > struct vcpu_svm *svm = to_svm(vcpu);
> > struct vmcb *vmcb = svm->vmcb;
> > + u64 fault_stage;
> >
> > - if (vmcb->control.exit_code != SVM_EXIT_NPF) {
> > - /*
> > - * TODO: track the cause of the nested page fault, and
> > - * correctly fill in the high bits of exit_info_1.
> > - */
> > - vmcb->control.exit_code = SVM_EXIT_NPF;
> > - vmcb->control.exit_info_1 = (1ULL << 32);
> > - vmcb->control.exit_info_2 = fault->address;
> > - }
> > + /*
> > + * For hardware NPF exits, the GUEST_FAULT_STAGE bits are only
> > + * available in the hardware exit_info_1, since the guest_mmu
> > + * walker doesn't know whether the faulting GPA was a page table
> > + * page or final page from L2's perspective.
> > + */
> > + if (from_hardware)
> > + fault_stage = vmcb->control.exit_info_1 &
> > + PFERR_GUEST_FAULT_STAGE_MASK;
> > + else
> > + fault_stage = fault->error_code & PFERR_GUEST_FAULT_STAGE_MASK;
> >
> > - vmcb->control.exit_info_1 &= ~0xffffffffULL;
> > - vmcb->control.exit_info_1 |= fault->error_code;
> > + /*
> > + * All nested page faults should be annotated as occurring on the
> > + * final translation *or* the page walk. Arbitrarily choose "final"
> > + * if KVM is buggy and enumerated both or neither.
> > + */
> > + if (WARN_ON_ONCE(hweight64(fault_stage) != 1))
> > + fault_stage = PFERR_GUEST_FINAL_MASK;
> > +
> > + vmcb->control.exit_code = SVM_EXIT_NPF;
> > + vmcb->control.exit_info_1 = fault_stage |
> > + (fault->error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);
>
> Do we need to do this in the common path?
What do you mean by "this"? Pulling flags from fault->error_code?
> If from_hardware=true, can the fault injected by KVM have different flags
> from the one produced by hardware?
Flags, yes. fault_stage, no.
> I guess the answer is yes, (e.g. if KVM is doing write-protection?). Might be
> worth a comment.
Or if L1 has modified its TDP PTEs in memory, but hasn't yet flushed TLBs. In
that case, KVM's software walker can see the updated PTEs, while hardware may
have seen something else.
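The split described above can be modeled outside the kernel. The sketch below is a hypothetical user-space rendition of the v4 exit_info_1 composition, not KVM code. The fault stage is taken from hardware only when forwarding a real NPF, and every other bit comes from KVM's software walk of L1's NPT. Bit positions follow patch 1/5 (GUEST_FINAL is bit 32, GUEST_PAGE is bit 33). The function name is invented, and __builtin_popcountll() (GCC/Clang) plus an fprintf() stand in for hweight64() and WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions per patch 1/5: GUEST_FINAL is bit 32, GUEST_PAGE is bit 33. */
#define PFERR_GUEST_FINAL_MASK       (1ULL << 32)
#define PFERR_GUEST_PAGE_MASK        (1ULL << 33)
#define PFERR_GUEST_FAULT_STAGE_MASK (PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)

/*
 * Hypothetical model of nested_svm_inject_npf_exit()'s exit_info_1:
 * the stage bit comes from hardware when forwarding a real NPF, while
 * all other flags come from KVM's software walk (which may differ, e.g.
 * if L1 changed its NPT entries without flushing TLBs).
 */
static uint64_t compose_npf_exit_info_1(bool from_hardware,
					uint64_t hw_exit_info_1,
					uint64_t sw_error_code)
{
	uint64_t fault_stage;

	if (from_hardware)
		fault_stage = hw_exit_info_1 & PFERR_GUEST_FAULT_STAGE_MASK;
	else
		fault_stage = sw_error_code & PFERR_GUEST_FAULT_STAGE_MASK;

	/* Exactly one stage bit must be set; otherwise fall back to "final". */
	if (__builtin_popcountll(fault_stage) != 1) {
		fprintf(stderr, "bogus fault stage 0x%llx\n",
			(unsigned long long)fault_stage);
		fault_stage = PFERR_GUEST_FINAL_MASK;
	}

	return fault_stage | (sw_error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);
}
```

In this model, a hardware-forwarded fault keeps hardware's stage bit even if the software walk reported a different one, while permission flags such as WRITE always reflect the software walk; with both or neither stage bit set, it falls back to GUEST_FINAL as the patch does.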
* Re: [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits
From: Yosry Ahmed @ 2026-05-26 18:50 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
> > > + vmcb->control.exit_code = SVM_EXIT_NPF;
> > > + vmcb->control.exit_info_1 = fault_stage |
> > > + (fault->error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);
> >
> > Do we need to do this in the common path?
>
> What do you mean by "this"? Pulling flags from fault->error_code?
Yes, sorry if that wasn't clear.
>
> > If from_hardware=true, can the fault injected by KVM have different flags
> > from the one produced by hardware?
>
> Flags, yes. fault_stage, no.
Right, I meant the flags.
>
> > I guess the answer is yes, (e.g. if KVM is doing write-protection?). Might be
> > worth a comment.
>
> Or if L1 has modified its TDP PTEs in memory, but hasn't yet flushed TLBs. In
> that case, KVM's software walker can see the updated PTEs, while hardware may
> have seen something else.
Makes sense. A comment would be helpful for laymen like myself.
* Re: [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits
From: Sean Christopherson @ 2026-05-27 18:14 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, Kevin Cheng
On Tue, May 26, 2026, Yosry Ahmed wrote:
> > > > + vmcb->control.exit_code = SVM_EXIT_NPF;
> > > > + vmcb->control.exit_info_1 = fault_stage |
> > > > + (fault->error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);
> > >
> > > Do we need to do this in the common path?
> >
> > What do you mean by "this"? Pulling flags from fault->error_code?
>
> Yes, sorry if that wasn't clear.
>
> >
> > > If from_hardware=true, can the fault injected by KVM have different flags
> > > from the one produced by hardware?
> >
> > Flags, yes. fault_stage, no.
>
> Right, I meant the flags.
>
> >
> > > I guess the answer is yes, (e.g. if KVM is doing write-protection?). Might be
> > > worth a comment.
> >
> > Or if L1 has modified its TDP PTEs in memory, but hasn't yet flushed TLBs. In
> > that case, KVM's software walker can see the updated PTEs, while hardware may
> > have seen something else.
>
> Makes sense. A comment would be helpful for laymen like myself.
I elected to not add a comment for now, because I'm not 100% confident the nSVM
code is correct, and so didn't want to stealth in a comment that wasn't correct
either. It's certainly much better than it was, but especially with GMET in play,
I need to stare more to convince myself it handles all the edge cases correctly.
* [PATCH v4 4/5] KVM: VMX: Synthesize nested EPT violation GVA_IS_VALID/GVA_TRANSLATED bits
From: Sean Christopherson @ 2026-05-22 23:27 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Kevin Cheng
From: Kevin Cheng <chengkev@google.com>
When injecting an EPT Violation into L1 in response to a fault detected
while emulating an L2 GVA access, synthesize the GVA_IS_VALID and
GVA_TRANSLATED bits using information provided by the walker, instead of
pulling the bits from vmcs02.EXIT_QUALIFICATION. The information in
vmcs02.EXIT_QUALIFICATION is valid/correct if and only if the fault being
injected into L1 is the direct result of an EPT Violation VM-Exit from L2.
E.g. if KVM is emulating an I/O instruction and the memory operand's
translation through L1's EPT fails, using vmcs02.EXIT_QUALIFICATION is
wrong as the semantics for EXIT_QUALIFICATION would be for an I/O exit,
not an EPT Violation exit.
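As a rough illustration of the rule above, the following hypothetical user-space model takes the GVA bits from exactly one source: the walker for synthesized faults, vmcs02.EXIT_QUALIFICATION for faults forwarded from hardware. All function names are invented; bits 7 and 8 match the selftest's processor.h, and treating bits 11:9 as the advanced-info mask is an assumption based on the [9:11] note in paging_tmpl.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))

/* Bit positions as defined in the selftest's processor.h (patch 5/5). */
#define EPT_VIOLATION_ACC_READ       BIT(0)
#define EPT_VIOLATION_ACC_WRITE      BIT(1)
#define EPT_VIOLATION_GVA_IS_VALID   BIT(7)
#define EPT_VIOLATION_GVA_TRANSLATED BIT(8)
/* Assumption: bits 11:9 carry the "advanced" GVA info ([9:11] in paging_tmpl.h). */
#define EPT_VIOLATION_GVA_ADVANCED   (BIT(9) | BIT(10) | BIT(11))

/* Hypothetical helper: GVA bits the software walker would synthesize. */
static uint64_t synth_gva_bits(bool paging_structure_access)
{
	/* KVM doesn't emulate GPA-only accesses, so the GVA is always valid. */
	uint64_t qual = EPT_VIOLATION_GVA_IS_VALID;

	/* Only a fault on the final translation has a translated GVA. */
	if (!paging_structure_access)
		qual |= EPT_VIOLATION_GVA_TRANSLATED;
	return qual;
}

/* Hypothetical model of the new else-branch in nested_ept_inject_page_fault(). */
static uint64_t nested_ept_exit_qual(bool from_hardware, bool advanced_info,
				     uint64_t walker_qual, uint64_t vmcs_qual)
{
	uint64_t mask = EPT_VIOLATION_GVA_IS_VALID | EPT_VIOLATION_GVA_TRANSLATED;
	uint64_t qual;

	if (advanced_info)
		mask |= EPT_VIOLATION_GVA_ADVANCED;

	/* Access and permission bits always come from the walker. */
	qual = walker_qual & ~mask;

	/* GVA bits come from exactly one source, never a merge of both. */
	if (from_hardware)
		qual |= vmcs_qual & mask;
	else
		qual |= walker_qual & mask;
	return qual;
}
```

For an emulated I/O instruction, vmcs_qual holds I/O exit data, so the model ignores it entirely; for a forwarded EPT Violation, only hardware knows whether the faulting GPA belonged to a paging structure, so its GVA bits win.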
Opportunistically clean up the formatting for creating the mask of bits
to pull from vmcs02.EXIT_QUALIFICATION.
Signed-off-by: Kevin Cheng <chengkev@google.com>
[sean: use plumbed in @access bits, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/paging_tmpl.h | 13 ++++++++++++-
arch/x86/kvm/vmx/nested.c | 26 +++++++++++++++++++++-----
2 files changed, 33 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 66eee6914234..df3ae0c7ec2c 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -502,7 +502,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
* [2:0] - Derive from the access bits. The exit_qualification might be
* out of date if it is serving an EPT misconfiguration.
* [5:3] - Calculated by the page walk of the guest EPT page tables
- * [7:11] - Derived from [7:11] of real exit_qualification
+ * [7:8] - Derived from "fault stage" access bits
+ * [9:11] - Derived from [9:11] of real exit_qualification
*
* The other bits are set to 0.
*/
@@ -516,6 +517,14 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
else
walker->fault.exit_qualification |= EPT_VIOLATION_ACC_READ;
+ /*
+ * KVM doesn't emulate features that access GPAs directly, e.g.
+ * Intel Processor Trace. Assume the GVA is always valid; when
+ * propagating faults from hardware, KVM will discard this info
+ * and use the EXIT_QUALIFICATION bits from the VMCS.
+ */
+ walker->fault.exit_qualification |= EPT_VIOLATION_GVA_IS_VALID;
+
/*
* Accesses to guest paging structures are either "reads" or
* "read+write" accesses, so consider them the latter if write_fault
@@ -523,6 +532,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
*/
if (access & PFERR_GUEST_PAGE_MASK)
walker->fault.exit_qualification |= EPT_VIOLATION_ACC_READ;
+ else
+ walker->fault.exit_qualification |= EPT_VIOLATION_GVA_TRANSLATED;
/*
* Note, pte_access holds the raw RWX bits from the EPTE, not
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 3bb7eaa7b2a5..a78ce0080963 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -445,13 +445,29 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
exit_qualification = 0;
} else {
u64 mask = EPT_VIOLATION_GVA_IS_VALID |
- EPT_VIOLATION_GVA_TRANSLATED;
+ EPT_VIOLATION_GVA_TRANSLATED;
+
if (vmx->nested.msrs.ept_caps & VMX_EPT_ADVANCED_VMEXIT_INFO_BIT)
mask |= EPT_VIOLATION_GVA_USER |
- EPT_VIOLATION_GVA_WRITABLE |
- EPT_VIOLATION_GVA_NX;
- exit_qualification = fault->exit_qualification;
- exit_qualification |= vmx_get_exit_qual(vcpu) & mask;
+ EPT_VIOLATION_GVA_WRITABLE |
+ EPT_VIOLATION_GVA_NX;
+
+ exit_qualification = fault->exit_qualification & ~mask;
+
+ /*
+ * Use the EXIT_QUALIFICATION from the VMCS if and only
+ * if the hardware VM-Exit from L2 was an EPT Violation.
+ * If the fault is synthesized, then EXIT_QUALIFICATION
+ * is stale and/or holds entirely different data. And
+ * conversely, KVM _must_ rely on EXIT_QUALIFICATION if
+ * the fault came from hardware, because KVM only sees
+ * and walks the faulting GPA.
+ */
+ if (from_hardware)
+ exit_qualification |= vmx_get_exit_qual(vcpu) & mask;
+ else
+ exit_qualification |= fault->exit_qualification & mask;
+
vm_exit_reason = EXIT_REASON_EPT_VIOLATION;
}
--
2.54.0.794.g4f17f83d09-goog
* [PATCH v4 5/5] KVM: selftests: Add nested page fault injection test
From: Sean Christopherson @ 2026-05-22 23:27 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Kevin Cheng
From: Kevin Cheng <chengkev@google.com>
Add a test that exercises nested page fault injection during L2
execution. L2 executes I/O string instructions (OUTSB/INSB) that access
memory restricted in L1's nested page tables (NPT/EPT), triggering a
nested page fault that L0 must inject into L1.
The test supports both AMD SVM (NPF) and Intel VMX (EPT violation) and
verifies that:
- The exit reason is an NPF/EPT violation
- The access type and permission bits are correct
- The faulting GPA is correct
Four test cases are implemented:
- Unmap the final data page (final translation fault, OUTSB read)
- Unmap a PT page (page walk fault, OUTSB read)
- Write-protect the final data page (protection violation, INSB write)
- Write-protect a PT page (protection violation on A/D update, OUTSB
read)
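The expected values for these cases vary along three axes: paging-structure page vs. final page, present-but-read-only vs. not present, and write (INSB) vs. read (OUTSB). A hypothetical cross-check of the constants hard-coded in l1_vmx_code() and l1_svm_code() might look like the sketch below. The helper names are invented, and the rules are inferred only from the selftest's own asserts and comments (no GMET, paging-structure accesses treated as read+write, write-protected entries left readable and executable), not taken from the SDM or APM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* x86 #PF error code bits; GUEST_FINAL/GUEST_PAGE positions per patch 1/5. */
#define PFERR_PRESENT_MASK     BIT_ULL(0)
#define PFERR_WRITE_MASK       BIT_ULL(1)
#define PFERR_USER_MASK        BIT_ULL(2)
#define PFERR_GUEST_FINAL_MASK BIT_ULL(32)
#define PFERR_GUEST_PAGE_MASK  BIT_ULL(33)

/* EPT Violation exit qualification bits, as in the selftest's processor.h. */
#define EPT_VIOLATION_ACC_READ       BIT_ULL(0)
#define EPT_VIOLATION_ACC_WRITE      BIT_ULL(1)
#define EPT_VIOLATION_PROT_READ      BIT_ULL(3)
#define EPT_VIOLATION_PROT_EXEC      BIT_ULL(5)
#define EPT_VIOLATION_GVA_IS_VALID   BIT_ULL(7)
#define EPT_VIOLATION_GVA_TRANSLATED BIT_ULL(8)

/*
 * pt_page: the fault hits an L2 page-table page rather than the data page
 * present: the TDP entry is present but write-protected (vs. unmapped)
 * write:   L2 writes memory (INSB) rather than reads it (OUTSB)
 */
static uint64_t expected_npf_ec(bool pt_page, bool present, bool write)
{
	uint64_t ec = PFERR_USER_MASK;	/* no GMET: NPT walks are user accesses */

	if (present)
		ec |= PFERR_PRESENT_MASK;
	if (write || pt_page)		/* paging-structure accesses are read+write */
		ec |= PFERR_WRITE_MASK;
	return ec | (pt_page ? PFERR_GUEST_PAGE_MASK : PFERR_GUEST_FINAL_MASK);
}

static uint64_t expected_ept_qual(bool pt_page, bool present, bool write)
{
	uint64_t qual = EPT_VIOLATION_GVA_IS_VALID;

	if (!write || pt_page)
		qual |= EPT_VIOLATION_ACC_READ;
	if (write || pt_page)
		qual |= EPT_VIOLATION_ACC_WRITE;
	if (present)			/* write-protected entry remains R+X */
		qual |= EPT_VIOLATION_PROT_READ | EPT_VIOLATION_PROT_EXEC;
	if (!pt_page)			/* fault on the final translation */
		qual |= EPT_VIOLATION_GVA_TRANSLATED;
	return qual;
}
```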
Signed-off-by: Kevin Cheng <chengkev@google.com>
[sean: name it nested_tdp_fault_test, consolidate asserts]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/include/x86/processor.h | 9 +
.../selftests/kvm/x86/nested_tdp_fault_test.c | 313 ++++++++++++++++++
3 files changed, 323 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86/nested_tdp_fault_test.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 82fa943b9503..2908eca1647a 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -97,6 +97,7 @@ TEST_GEN_PROGS_x86 += x86/nested_emulation_test
TEST_GEN_PROGS_x86 += x86/nested_exceptions_test
TEST_GEN_PROGS_x86 += x86/nested_invalid_cr3_test
TEST_GEN_PROGS_x86 += x86/nested_set_state_test
+TEST_GEN_PROGS_x86 += x86/nested_tdp_fault_test
TEST_GEN_PROGS_x86 += x86/nested_tsc_adjust_test
TEST_GEN_PROGS_x86 += x86/nested_tsc_scaling_test
TEST_GEN_PROGS_x86 += x86/nested_vmsave_vmload_test
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 851ffcd3340c..06878e7c7347 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1573,6 +1573,15 @@ u64 *tdp_get_pte(struct kvm_vm *vm, u64 l2_gpa);
#define PFERR_GUEST_PAGE_MASK BIT_ULL(PFERR_GUEST_PAGE_BIT)
#define PFERR_IMPLICIT_ACCESS BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
+#define EPT_VIOLATION_ACC_READ BIT(0)
+#define EPT_VIOLATION_ACC_WRITE BIT(1)
+#define EPT_VIOLATION_ACC_INSTR BIT(2)
+#define EPT_VIOLATION_PROT_READ BIT(3)
+#define EPT_VIOLATION_PROT_WRITE BIT(4)
+#define EPT_VIOLATION_PROT_EXEC BIT(5)
+#define EPT_VIOLATION_GVA_IS_VALID BIT(7)
+#define EPT_VIOLATION_GVA_TRANSLATED BIT(8)
+
bool sys_clocksource_is_based_on_tsc(void);
#endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/x86/nested_tdp_fault_test.c b/tools/testing/selftests/kvm/x86/nested_tdp_fault_test.c
new file mode 100644
index 000000000000..fa95568f55ff
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/nested_tdp_fault_test.c
@@ -0,0 +1,313 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025, Google, Inc.
+ */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+#include "vmx.h"
+
+#define L2_GUEST_STACK_SIZE 64
+
+enum test_type {
+ TEST_FINAL_PAGE_UNMAPPED, /* Final data page not present */
+ TEST_PT_PAGE_UNMAPPED, /* Page table page not present */
+ TEST_FINAL_PAGE_WRITE_PROTECTED, /* Final data page read-only */
+ TEST_PT_PAGE_WRITE_PROTECTED, /* Page table page read-only */
+};
+
+static gva_t l2_test_page;
+static void (*l2_entry)(void);
+
+#define TEST_IO_PORT 0x80
+#define TEST1_VADDR 0x8000000ULL
+#define TEST2_VADDR 0x10000000ULL
+#define TEST3_VADDR 0x18000000ULL
+#define TEST4_VADDR 0x20000000ULL
+
+/*
+ * L2 executes OUTS reading from l2_test_page, triggering a nested page
+ * fault on the read access.
+ */
+static void l2_guest_code_outs(void)
+{
+ asm volatile("outsb" ::"S"(l2_test_page), "d"(TEST_IO_PORT) : "memory");
+ GUEST_FAIL("L2 should not reach here");
+}
+
+/*
+ * L2 executes INS writing to l2_test_page, triggering a nested page
+ * fault on the write access.
+ */
+static void l2_guest_code_ins(void)
+{
+ asm volatile("insb" ::"D"(l2_test_page), "d"(TEST_IO_PORT) : "memory");
+ GUEST_FAIL("L2 should not reach here");
+}
+
+#define GUEST_ASSERT_EXIT_QUAL(ac_eq, ex_eq) \
+ __GUEST_ASSERT((ac_eq) == (ex_eq), \
+ "Wanted EXIT_QUAL '0x%lx', got '0x%lx'", ex_eq, ac_eq)
+
+static void l1_vmx_code(struct vmx_pages *vmx, u64 expected_fault_gpa,
+ u64 test_type)
+{
+ unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+ u64 exit_qual;
+
+ GUEST_ASSERT(vmx->vmcs_gpa);
+ GUEST_ASSERT(prepare_for_vmx_operation(vmx));
+ GUEST_ASSERT(load_vmcs(vmx));
+
+ prepare_vmcs(vmx, l2_entry, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+ GUEST_ASSERT(!vmlaunch());
+
+ /* Verify we got an EPT violation exit */
+ __GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_EPT_VIOLATION,
+ "Expected EPT violation (0x%x), got 0x%lx",
+ EXIT_REASON_EPT_VIOLATION,
+ vmreadz(VM_EXIT_REASON));
+
+ __GUEST_ASSERT(vmreadz(GUEST_PHYSICAL_ADDRESS) == expected_fault_gpa,
+ "Expected guest_physical_address = 0x%lx, got 0x%lx",
+ expected_fault_gpa,
+ vmreadz(GUEST_PHYSICAL_ADDRESS));
+
+ exit_qual = vmreadz(EXIT_QUALIFICATION);
+
+ /*
+ * Note, EPT page table accesses are always read+write, e.g. so that
+ * the CPU can do A/D updates at-will.
+ */
+ switch (test_type) {
+ case TEST_FINAL_PAGE_UNMAPPED:
+ GUEST_ASSERT_EXIT_QUAL(exit_qual, EPT_VIOLATION_ACC_READ |
+ EPT_VIOLATION_GVA_IS_VALID |
+ EPT_VIOLATION_GVA_TRANSLATED);
+ break;
+ case TEST_PT_PAGE_UNMAPPED:
+ GUEST_ASSERT_EXIT_QUAL(exit_qual, EPT_VIOLATION_ACC_READ |
+ EPT_VIOLATION_ACC_WRITE |
+ EPT_VIOLATION_GVA_IS_VALID);
+ break;
+ case TEST_FINAL_PAGE_WRITE_PROTECTED:
+ GUEST_ASSERT_EXIT_QUAL(exit_qual, EPT_VIOLATION_ACC_WRITE |
+ EPT_VIOLATION_PROT_READ |
+ EPT_VIOLATION_PROT_EXEC |
+ EPT_VIOLATION_GVA_IS_VALID |
+ EPT_VIOLATION_GVA_TRANSLATED);
+ break;
+ case TEST_PT_PAGE_WRITE_PROTECTED:
+ GUEST_ASSERT_EXIT_QUAL(exit_qual, EPT_VIOLATION_ACC_READ |
+ EPT_VIOLATION_ACC_WRITE |
+ EPT_VIOLATION_PROT_READ |
+ EPT_VIOLATION_PROT_EXEC |
+ EPT_VIOLATION_GVA_IS_VALID);
+ break;
+ }
+
+ GUEST_DONE();
+}
+
+#define GUEST_ASSERT_NPF_EC(ac_ec, ex_ec) \
+ __GUEST_ASSERT((ac_ec) == (ex_ec), \
+ "Wanted NPF error code '0x%lx', got '0x%lx'", (u64)(ex_ec), ac_ec)
+
+
+static void l1_svm_code(struct svm_test_data *svm, u64 expected_fault_gpa,
+ u64 test_type)
+{
+ unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+ struct vmcb *vmcb = svm->vmcb;
+ u64 exit_info_1;
+
+ generic_svm_setup(svm, l2_entry,
+ &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+ run_guest(vmcb, svm->vmcb_gpa);
+
+ /* Verify we got an NPF exit */
+ __GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_NPF,
+ "Expected NPF exit (0x%x), got 0x%lx", SVM_EXIT_NPF,
+ vmcb->control.exit_code);
+
+ __GUEST_ASSERT(vmcb->control.exit_info_2 == expected_fault_gpa,
+ "Expected exit_info_2 = 0x%lx, got 0x%lx",
+ expected_fault_gpa,
+ vmcb->control.exit_info_2);
+
+ exit_info_1 = vmcb->control.exit_info_1;
+
+ /*
+ * Note, without GMET enabled, NPT walks are always user accesses. And
+ * like EPT, page table accesses are always read+write.
+ */
+ switch (test_type) {
+ case TEST_FINAL_PAGE_UNMAPPED:
+ GUEST_ASSERT_NPF_EC(exit_info_1, PFERR_USER_MASK |
+ PFERR_GUEST_FINAL_MASK);
+ break;
+ case TEST_PT_PAGE_UNMAPPED:
+ GUEST_ASSERT_NPF_EC(exit_info_1, PFERR_WRITE_MASK |
+ PFERR_USER_MASK |
+ PFERR_GUEST_PAGE_MASK);
+ break;
+ case TEST_FINAL_PAGE_WRITE_PROTECTED:
+ GUEST_ASSERT_NPF_EC(exit_info_1, PFERR_PRESENT_MASK |
+ PFERR_WRITE_MASK |
+ PFERR_USER_MASK |
+ PFERR_GUEST_FINAL_MASK);
+ break;
+ case TEST_PT_PAGE_WRITE_PROTECTED:
+ GUEST_ASSERT_NPF_EC(exit_info_1, PFERR_PRESENT_MASK |
+ PFERR_WRITE_MASK |
+ PFERR_USER_MASK |
+ PFERR_GUEST_PAGE_MASK);
+ break;
+ }
+
+ GUEST_DONE();
+}
+
+static void l1_guest_code(void *data, u64 expected_fault_gpa,
+ u64 test_type)
+{
+ if (this_cpu_has(X86_FEATURE_VMX))
+ l1_vmx_code(data, expected_fault_gpa, test_type);
+ else
+ l1_svm_code(data, expected_fault_gpa, test_type);
+}
+
+/* Returns the GPA of the PT page that maps @vaddr. */
+static u64 get_pt_gpa_for_vaddr(struct kvm_vm *vm, u64 vaddr)
+{
+ u64 *pte;
+
+ pte = vm_get_pte(vm, vaddr);
+ TEST_ASSERT(pte && (*pte & 0x1), "PTE not present for vaddr 0x%lx",
+ (unsigned long)vaddr);
+
+ return addr_hva2gpa(vm, (void *)((u64)pte & ~0xFFFULL));
+}
+
+static void run_test(enum test_type type)
+{
+ gpa_t expected_fault_gpa;
+ gva_t nested_gva;
+
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ struct ucall uc;
+
+ vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+ vm_enable_tdp(vm);
+
+ if (kvm_cpu_has(X86_FEATURE_VMX))
+ vcpu_alloc_vmx(vm, &nested_gva);
+ else
+ vcpu_alloc_svm(vm, &nested_gva);
+
+ switch (type) {
+ case TEST_FINAL_PAGE_UNMAPPED:
+ /*
+ * Unmap the final data page from NPT/EPT. The guest page
+ * table walk succeeds, but the final GPA->HPA translation
+ * fails. L2 reads from the page via OUTS.
+ */
+ l2_entry = l2_guest_code_outs;
+ l2_test_page = vm_alloc(vm, vm->page_size, TEST1_VADDR);
+ expected_fault_gpa = addr_gva2gpa(vm, l2_test_page);
+ break;
+ case TEST_PT_PAGE_UNMAPPED:
+ /*
+ * Unmap a page table page from NPT/EPT. The hardware page
+ * table walk fails when translating the PT page's GPA
+ * through NPT/EPT. L2 reads from the page via OUTS.
+ */
+ l2_entry = l2_guest_code_outs;
+ l2_test_page = vm_alloc(vm, vm->page_size, TEST2_VADDR);
+ expected_fault_gpa = get_pt_gpa_for_vaddr(vm, l2_test_page);
+ break;
+ case TEST_FINAL_PAGE_WRITE_PROTECTED:
+ /*
+ * Write-protect the final data page in NPT/EPT. The page
+ * is present and readable, but not writable. L2 writes to
+ * the page via INS, triggering a protection violation.
+ */
+ l2_entry = l2_guest_code_ins;
+ l2_test_page = vm_alloc(vm, vm->page_size, TEST3_VADDR);
+ expected_fault_gpa = addr_gva2gpa(vm, l2_test_page);
+ break;
+ case TEST_PT_PAGE_WRITE_PROTECTED:
+ /*
+ * Write-protect a page table page in NPT/EPT. The page is
+ * present and readable, but not writable. The guest page
+ * table walk needs write access to set A/D bits, so it
+ * triggers a protection violation on the PT page.
+ * L2 reads from the page via OUTS.
+ */
+ l2_entry = l2_guest_code_outs;
+ l2_test_page = vm_alloc(vm, vm->page_size, TEST4_VADDR);
+ expected_fault_gpa = get_pt_gpa_for_vaddr(vm, l2_test_page);
+ break;
+ }
+
+ tdp_identity_map_default_memslots(vm);
+
+ if (type == TEST_FINAL_PAGE_WRITE_PROTECTED ||
+ type == TEST_PT_PAGE_WRITE_PROTECTED)
+ *tdp_get_pte(vm, expected_fault_gpa) &= ~PTE_WRITABLE_MASK(&vm->stage2_mmu);
+ else
+ *tdp_get_pte(vm, expected_fault_gpa) &= ~(PTE_PRESENT_MASK(&vm->stage2_mmu) |
+ PTE_READABLE_MASK(&vm->stage2_mmu) |
+ PTE_WRITABLE_MASK(&vm->stage2_mmu) |
+ PTE_EXECUTABLE_MASK(&vm->stage2_mmu));
+
+ sync_global_to_guest(vm, l2_entry);
+ sync_global_to_guest(vm, l2_test_page);
+ vcpu_args_set(vcpu, 3, nested_gva, expected_fault_gpa, (u64)type);
+
+ /*
+ * For the INS-based write test, KVM emulates the instruction and
+ * first reads from the I/O port, which exits to userspace.
+ * Re-enter the guest so emulation can proceed to the memory
+ * write, where the nested page fault is triggered.
+ */
+ for (;;) {
+ vcpu_run(vcpu);
+
+ if (vcpu->run->exit_reason == KVM_EXIT_IO &&
+ vcpu->run->io.port == TEST_IO_PORT &&
+ vcpu->run->io.direction == KVM_EXIT_IO_IN) {
+ continue;
+ }
+ break;
+ }
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_DONE:
+ break;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ default:
+ TEST_FAIL("Unexpected exit reason: %d", vcpu->run->exit_reason);
+ }
+
+ kvm_vm_free(vm);
+}
+
+int main(int argc, char *argv[])
+{
+ TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX) || kvm_cpu_has(X86_FEATURE_SVM));
+ TEST_REQUIRE(kvm_cpu_has_tdp());
+
+ run_test(TEST_FINAL_PAGE_UNMAPPED);
+ run_test(TEST_PT_PAGE_UNMAPPED);
+ run_test(TEST_FINAL_PAGE_WRITE_PROTECTED);
+ run_test(TEST_PT_PAGE_WRITE_PROTECTED);
+
+ return 0;
+}
--
2.54.0.794.g4f17f83d09-goog
* Re: [PATCH v4 0/5] KVM: X86: Fix nested TDP error code info
From: Sean Christopherson @ 2026-05-27 18:10 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Kevin Cheng
On Fri, 22 May 2026 16:26:56 -0700, Sean Christopherson wrote:
> Kevin's series to fix how KVM populates error information when injecting
> nested page faults (NPF on SVM, EPT violations on VMX) to L1 during
> instruction emulation.
>
> See v3 for the full cover letter.
>
> v4:
> - Pass @from_hardware directly instead of stuffing a flag in x86_exception.
> - Use the bits in @access (thanks to MBEC+GMET) to get the fault stage.
> - Check the entire PFEC/EXIT_QUAL in the selftest.
> - Use hardware _or_ KVM information, never merge the two.
> - Name the selftest nested_tdp_fault_test.
>
> [...]
Applied to kvm-x86 misc, thanks!
[1/5] KVM: x86: Widen x86_exception's error_code to 64 bits
https://github.com/kvm-x86/linux/commit/bb24edbb673f
[2/5] KVM: x86: Tell ->inject_page_fault() whether or a fault came from hardware
https://github.com/kvm-x86/linux/commit/fe0b872d7500
[3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits
https://github.com/kvm-x86/linux/commit/297c2fe249db
[4/5] KVM: VMX: Synthesize nested EPT violation GVA_IS_VALID/GVA_TRANSLATED bits
https://github.com/kvm-x86/linux/commit/96b067b59ad9
[5/5] KVM: selftests: Add nested page fault injection test
https://github.com/kvm-x86/linux/commit/0de1020f7bbb
--
https://github.com/kvm-x86/linux/tree/next