public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/3] KVM: SVM: Set PFERR_GUEST_{PAGE,FINAL}_MASK for nested NPF and add selftest
@ 2026-01-21  0:49 Kevin Cheng
  2026-01-21  0:49 ` [PATCH 1/3] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK Kevin Cheng
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Kevin Cheng @ 2026-01-21  0:49 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, linux-kernel, yosry.ahmed, Kevin Cheng

This series fixes the setting of PFERR_GUEST_PAGE_MASK and
PFERR_GUEST_FINAL_MASK when injecting a nested page fault to L1,
and adds a selftest to verify the behavior.

When a nested page fault occurs, L1 needs to know whether the fault
happened during the page table walk (on a PT page) or on the final
data page translation. This information is conveyed through
PFERR_GUEST_PAGE_MASK (bit 33) and PFERR_GUEST_FINAL_MASK (bit 32)
in exit_info_1.

Currently, these bits are not set when an NPF is injected during
instruction emulation where the original exit reason was not NPF
(e.g., during string I/O emulation). This series fixes that and adds
test coverage.

Patch 1 adds the PFERR_GUEST_{PAGE,FINAL}_MASK bits to the fault error
code in paging_tmpl.h when the GPA->HPA translation fails during a
guest page table walk or final page translation.

Patch 2 adds TDP unmap helper functions to the selftest library,
enabling tests to selectively unmap pages from the NPT to trigger
nested page faults.

Patch 3 adds a selftest that exercises the nested NPF injection path
by having L2 execute an OUTS instruction with the source address
unmapped from L1's NPT. The test verifies that the correct
PFERR_GUEST_* bit is set and that exit_info_2 contains the faulting
GPA.

Kevin Cheng (3):
  KVM: SVM: Fix nested NPF injection to set
    PFERR_GUEST_{PAGE,FINAL}_MASK
  KVM: selftests: Add TDP unmap helpers
  KVM: selftests: Add nested NPF injection test for SVM

 arch/x86/kvm/kvm_emulate.h                    |   2 +-
 arch/x86/kvm/mmu/paging_tmpl.h                |  22 ++-
 arch/x86/kvm/svm/nested.c                     |  11 +-
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../selftests/kvm/include/x86/processor.h     |   6 +
 .../testing/selftests/kvm/lib/x86/processor.c |  53 ++++++
 .../selftests/kvm/x86/svm_nested_npf_test.c   | 154 ++++++++++++++++++
 7 files changed, 230 insertions(+), 19 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86/svm_nested_npf_test.c


base-commit: 38f626812b20ad22ab6dc9cfe6d811850f2d8244
--
2.52.0.457.g6b5491de43-goog


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK
  2026-01-21  0:49 [PATCH 0/3] KVM: SVM: Set PFERR_GUEST_{PAGE,FINAL}_MASK for nested NPF and add selftest Kevin Cheng
@ 2026-01-21  0:49 ` Kevin Cheng
  2026-01-21 22:07   ` Sean Christopherson
  2026-01-21  0:49 ` [PATCH 2/3] KVM: selftests: Add TDP unmap helpers Kevin Cheng
  2026-01-21  0:49 ` [PATCH 3/3] KVM: selftests: Add nested NPF injection test for SVM Kevin Cheng
  2 siblings, 1 reply; 11+ messages in thread
From: Kevin Cheng @ 2026-01-21  0:49 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, linux-kernel, yosry.ahmed, Kevin Cheng

When KVM emulates an instruction for L2 and encounters a nested page
fault (e.g., during string I/O emulation), nested_svm_inject_npf_exit()
injects an NPF to L1. However, the code incorrectly hardcodes
(1ULL << 32) for exit_info_1's upper bits when the original exit was
not an NPF. This always sets PFERR_GUEST_FINAL_MASK even when the fault
occurred on a page table page, preventing L1 from correctly identifying
the cause of the fault.

Set PFERR_GUEST_PAGE_MASK in the error code when a nested page fault
occurs during a guest page table walk, and PFERR_GUEST_FINAL_MASK when
the fault occurs on the final GPA-to-HPA translation.

Widen error_code in struct x86_exception from u16 to u64 to accommodate
the PFERR_GUEST_* bits (bits 32 and 33).

Update nested_svm_inject_npf_exit() to use fault->error_code directly
instead of hardcoding the upper bits. Also add a WARN_ON_ONCE if neither
PFERR_GUEST_FINAL_MASK nor PFERR_GUEST_PAGE_MASK is set, as this would
indicate a bug in the page fault handling code.

Signed-off-by: Kevin Cheng <chengkev@google.com>
---
 arch/x86/kvm/kvm_emulate.h     |  2 +-
 arch/x86/kvm/mmu/paging_tmpl.h | 22 ++++++++++------------
 arch/x86/kvm/svm/nested.c      | 11 +++++------
 3 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index fb3dab4b5a53e..ff4f9b0a01ff7 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -22,7 +22,7 @@ enum x86_intercept_stage;
 struct x86_exception {
 	u8 vector;
 	bool error_code_valid;
-	u16 error_code;
+	u64 error_code;
 	bool nested_page_fault;
 	u64 address; /* cr2 or nested page fault gpa */
 	u8 async_page_fault;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 901cd2bd40b84..923179bfd5c74 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -379,18 +379,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 		real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(table_gfn),
 					     nested_access, &walker->fault);
 
-		/*
-		 * FIXME: This can happen if emulation (for of an INS/OUTS
-		 * instruction) triggers a nested page fault.  The exit
-		 * qualification / exit info field will incorrectly have
-		 * "guest page access" as the nested page fault's cause,
-		 * instead of "guest page structure access".  To fix this,
-		 * the x86_exception struct should be augmented with enough
-		 * information to fix the exit_qualification or exit_info_1
-		 * fields.
-		 */
-		if (unlikely(real_gpa == INVALID_GPA))
+		if (unlikely(real_gpa == INVALID_GPA)) {
+#if PTTYPE != PTTYPE_EPT
+			walker->fault.error_code |= PFERR_GUEST_PAGE_MASK;
+#endif
 			return 0;
+		}
 
 		slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(real_gpa));
 		if (!kvm_is_visible_memslot(slot))
@@ -446,8 +440,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 #endif
 
 	real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(gfn), access, &walker->fault);
-	if (real_gpa == INVALID_GPA)
+	if (real_gpa == INVALID_GPA) {
+#if PTTYPE != PTTYPE_EPT
+		walker->fault.error_code |= PFERR_GUEST_FINAL_MASK;
+#endif
 		return 0;
+	}
 
 	walker->gfn = real_gpa >> PAGE_SHIFT;
 
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de90b104a0dd5..f8dfd5c333023 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -40,18 +40,17 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
 	struct vmcb *vmcb = svm->vmcb;
 
 	if (vmcb->control.exit_code != SVM_EXIT_NPF) {
-		/*
-		 * TODO: track the cause of the nested page fault, and
-		 * correctly fill in the high bits of exit_info_1.
-		 */
-		vmcb->control.exit_code = SVM_EXIT_NPF;
-		vmcb->control.exit_info_1 = (1ULL << 32);
+		vmcb->control.exit_info_1 = fault->error_code;
 		vmcb->control.exit_info_2 = fault->address;
 	}
 
+	vmcb->control.exit_code = SVM_EXIT_NPF;
 	vmcb->control.exit_info_1 &= ~0xffffffffULL;
 	vmcb->control.exit_info_1 |= fault->error_code;
 
+	WARN_ON_ONCE(!(vmcb->control.exit_info_1 &
+		       (PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)));
+
 	nested_svm_vmexit(svm);
 }
 
-- 
2.52.0.457.g6b5491de43-goog



* [PATCH 2/3] KVM: selftests: Add TDP unmap helpers
  2026-01-21  0:49 [PATCH 0/3] KVM: SVM: Set PFERR_GUEST_{PAGE,FINAL}_MASK for nested NPF and add selftest Kevin Cheng
  2026-01-21  0:49 ` [PATCH 1/3] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK Kevin Cheng
@ 2026-01-21  0:49 ` Kevin Cheng
  2026-01-21 22:21   ` Sean Christopherson
  2026-01-21  0:49 ` [PATCH 3/3] KVM: selftests: Add nested NPF injection test for SVM Kevin Cheng
  2 siblings, 1 reply; 11+ messages in thread
From: Kevin Cheng @ 2026-01-21  0:49 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, linux-kernel, yosry.ahmed, Kevin Cheng

Add __virt_pg_unmap(), __tdp_unmap(), and tdp_unmap() as counterparts
to the existing __virt_pg_map(), __tdp_map(), and tdp_map() functions.
These helpers allow tests to selectively unmap pages from the TDP/NPT,
enabling testing of NPT faults for unmapped pages.

Signed-off-by: Kevin Cheng <chengkev@google.com>
---
 .../selftests/kvm/include/x86/processor.h     |  6 +++
 .../testing/selftests/kvm/lib/x86/processor.c | 53 +++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 6bfffc3b0a332..23ec5030a1d1f 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1487,6 +1487,12 @@ void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t
 void tdp_identity_map_default_memslots(struct kvm_vm *vm);
 void tdp_identity_map_1g(struct kvm_vm *vm,  uint64_t addr, uint64_t size);
 
+void __virt_pg_unmap(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
+		     int level);
+void __tdp_unmap(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t size,
+		 int level);
+void tdp_unmap(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t size);
+
 /*
  * Basic CPU control in CR0
  */
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index ab869a98bbdce..8cb0d74aaa41e 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -338,6 +338,40 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
 	}
 }
 
+void __virt_pg_unmap(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
+		     int level)
+{
+	uint64_t *pte = &mmu->pgd;
+	int current_level;
+
+	TEST_ASSERT(level >= PG_LEVEL_4K && level <= mmu->pgtable_levels,
+		    "Invalid level %d", level);
+
+	/* Walk down to target level */
+	for (current_level = mmu->pgtable_levels;
+	     current_level > level;
+	     current_level--) {
+		pte = virt_get_pte(vm, mmu, pte, vaddr, current_level);
+
+		TEST_ASSERT(is_present_pte(mmu, pte),
+			    "Entry not present at level %d for vaddr 0x%lx",
+			    current_level, vaddr);
+		TEST_ASSERT(!is_huge_pte(mmu, pte),
+			    "Unexpected huge page at level %d for vaddr 0x%lx",
+			    current_level, vaddr);
+	}
+
+	/* Get the PTE at target level */
+	pte = virt_get_pte(vm, mmu, pte, vaddr, level);
+
+	TEST_ASSERT(is_present_pte(mmu, pte),
+		    "Entry not present at level %d for vaddr 0x%lx",
+		    level, vaddr);
+
+	/* Clear the PTE */
+	*pte = 0;
+}
+
 static bool vm_is_target_pte(struct kvm_mmu *mmu, uint64_t *pte,
 			     int *level, int current_level)
 {
@@ -541,6 +575,25 @@ void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size)
 	__tdp_map(vm, addr, addr, size, PG_LEVEL_1G);
 }
 
+void __tdp_unmap(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t size,
+		 int level)
+{
+	size_t page_size = PG_LEVEL_SIZE(level);
+	size_t npages = size / page_size;
+
+	TEST_ASSERT(nested_paddr + size > nested_paddr, "Address overflow");
+
+	while (npages--) {
+		__virt_pg_unmap(vm, &vm->stage2_mmu, nested_paddr, level);
+		nested_paddr += page_size;
+	}
+}
+
+void tdp_unmap(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t size)
+{
+	__tdp_unmap(vm, nested_paddr, size, PG_LEVEL_4K);
+}
+
 /*
  * Set Unusable Segment
  *
-- 
2.52.0.457.g6b5491de43-goog



* [PATCH 3/3] KVM: selftests: Add nested NPF injection test for SVM
  2026-01-21  0:49 [PATCH 0/3] KVM: SVM: Set PFERR_GUEST_{PAGE,FINAL}_MASK for nested NPF and add selftest Kevin Cheng
  2026-01-21  0:49 ` [PATCH 1/3] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK Kevin Cheng
  2026-01-21  0:49 ` [PATCH 2/3] KVM: selftests: Add TDP unmap helpers Kevin Cheng
@ 2026-01-21  0:49 ` Kevin Cheng
  2026-01-23  1:13   ` Sean Christopherson
  2 siblings, 1 reply; 11+ messages in thread
From: Kevin Cheng @ 2026-01-21  0:49 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, linux-kernel, yosry.ahmed, Kevin Cheng

Add a test that exercises nested NPF injection when the original VM
exit was not an NPF. This tests the code path in
nested_svm_inject_npf_exit() where exit_code != SVM_EXIT_NPF.

L2 executes an OUTS instruction with the source address mapped in L2's
page tables but not in L1's NPT. KVM emulates the string I/O, and when
it tries to read the source operand, the GPA->HPA translation fails.
KVM then injects an NPF to L1 even though the original exit was IOIO.

The test verifies that:
  - The exit code is converted to SVM_EXIT_NPF
  - exit_info_1 has the appropriate PFERR_GUEST_* bit set
  - exit_info_2 contains the correct faulting GPA

Two test cases are implemented:
  - Test 1: Unmap the final data page from NPT (PFERR_GUEST_FINAL_MASK)
  - Test 2: Unmap a PT page from NPT (PFERR_GUEST_PAGE_MASK)

Signed-off-by: Kevin Cheng <chengkev@google.com>
---
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../selftests/kvm/x86/svm_nested_npf_test.c   | 154 ++++++++++++++++++
 2 files changed, 155 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86/svm_nested_npf_test.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index e88699e227ddf..8babe6e228e11 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -112,6 +112,7 @@ TEST_GEN_PROGS_x86 += x86/svm_vmcall_test
 TEST_GEN_PROGS_x86 += x86/svm_int_ctl_test
 TEST_GEN_PROGS_x86 += x86/svm_nested_shutdown_test
 TEST_GEN_PROGS_x86 += x86/svm_nested_soft_inject_test
+TEST_GEN_PROGS_x86 += x86/svm_nested_npf_test
 TEST_GEN_PROGS_x86 += x86/tsc_scaling_sync
 TEST_GEN_PROGS_x86 += x86/sync_regs_test
 TEST_GEN_PROGS_x86 += x86/ucna_injection_test
diff --git a/tools/testing/selftests/kvm/x86/svm_nested_npf_test.c b/tools/testing/selftests/kvm/x86/svm_nested_npf_test.c
new file mode 100644
index 0000000000000..c0a894acbc483
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/svm_nested_npf_test.c
@@ -0,0 +1,154 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * svm_nested_npf_test
+ *
+ * Test nested NPF injection when the original VM exit was not an NPF.
+ * This exercises nested_svm_inject_npf_exit() with exit_code != SVM_EXIT_NPF.
+ *
+ * L2 executes OUTS with the source address mapped in L2's page tables but
+ * not in L1's NPT. KVM emulates the string I/O instruction, and when it
+ * tries to read the source operand, the GPA->HPA translation fails. KVM
+ * then injects an NPF to L1 even though the original exit was IOIO.
+ *
+ * Test 1: Final data page GPA not in NPT (PFERR_GUEST_FINAL_MASK)
+ * Test 2: Page table page GPA not in NPT (PFERR_GUEST_PAGE_MASK)
+ *
+ * Copyright (C) 2025, Google, Inc.
+ */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+
+#define L2_GUEST_STACK_SIZE 64
+
+enum test_type {
+	TEST_FINAL_PAGE_UNMAPPED, /* Final data page GPA not in NPT */
+	TEST_PT_PAGE_UNMAPPED, /* Page table page GPA not in NPT */
+};
+
+static void *l2_test_page;
+
+#define TEST_IO_PORT 0x80
+#define TEST1_VADDR 0x8000000ULL
+#define TEST2_VADDR 0x10000000ULL
+
+/*
+ * L2 executes OUTS with source at l2_test_page, triggering a nested NPF.
+ * The address is mapped in L2's page tables, but either the data page or
+ * a PT page is unmapped from L1's NPT, causing the fault.
+ */
+static void l2_guest_code(void *unused)
+{
+	asm volatile("outsb" ::"S"(l2_test_page), "d"(TEST_IO_PORT) : "memory");
+	GUEST_ASSERT(0);
+}
+
+static void l1_guest_code(struct svm_test_data *svm, void *expected_fault_gpa,
+						  uint64_t exit_info_1_mask)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	struct vmcb *vmcb = svm->vmcb;
+
+	generic_svm_setup(svm, l2_guest_code,
+			  &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	run_guest(vmcb, svm->vmcb_gpa);
+
+	/* Verify we got an NPF exit (converted from IOIO by KVM) */
+	__GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_NPF,
+		       "Expected NPF exit (0x%x), got 0x%lx", SVM_EXIT_NPF,
+		       vmcb->control.exit_code);
+
+	/* Check for PFERR_GUEST_FINAL_MASK or PFERR_GUEST_PAGE_MASK */
+	__GUEST_ASSERT(vmcb->control.exit_info_1 & exit_info_1_mask,
+		       "Expected exit_info_1 to have 0x%lx set, got 0x%lx",
+		       (unsigned long)exit_info_1_mask,
+		       (unsigned long)vmcb->control.exit_info_1);
+
+	__GUEST_ASSERT(vmcb->control.exit_info_2 == (u64)expected_fault_gpa,
+		       "Expected exit_info_2 = 0x%lx, got 0x%lx",
+		       (unsigned long)expected_fault_gpa,
+		       (unsigned long)vmcb->control.exit_info_2);
+
+	GUEST_DONE();
+}
+
+/* Returns the GPA of the PT page that maps @vaddr. */
+static uint64_t get_pt_gpa_for_vaddr(struct kvm_vm *vm, uint64_t vaddr)
+{
+	uint64_t *pte;
+
+	pte = vm_get_pte(vm, vaddr);
+	TEST_ASSERT(pte && (*pte & 0x1), "PTE not present for vaddr 0x%lx",
+		    (unsigned long)vaddr);
+
+	return addr_hva2gpa(vm, (void *)((uint64_t)pte & ~0xFFFULL));
+}
+
+static void run_test(enum test_type type)
+{
+	vm_paddr_t expected_fault_gpa;
+	uint64_t exit_info_1_mask;
+	vm_vaddr_t svm_gva;
+
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	struct ucall uc;
+
+	vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+	vm_enable_npt(vm);
+	vcpu_alloc_svm(vm, &svm_gva);
+
+	if (type == TEST_FINAL_PAGE_UNMAPPED) {
+		/*
+		 * Test 1: Unmap the final data page from NPT. The page table
+		 * walk succeeds, but the final GPA->HPA translation fails.
+		 */
+		l2_test_page =
+			(void *)vm_vaddr_alloc(vm, vm->page_size, TEST1_VADDR);
+		expected_fault_gpa = addr_gva2gpa(vm, (vm_vaddr_t)l2_test_page);
+		exit_info_1_mask = PFERR_GUEST_FINAL_MASK;
+	} else {
+		/*
+		 * Test 2: Unmap a PT page from NPT. The hardware page table
+		 * walk fails when translating the PT page's GPA through NPT.
+		 */
+		l2_test_page =
+			(void *)vm_vaddr_alloc(vm, vm->page_size, TEST2_VADDR);
+		expected_fault_gpa =
+			get_pt_gpa_for_vaddr(vm, (vm_vaddr_t)l2_test_page);
+		exit_info_1_mask = PFERR_GUEST_PAGE_MASK;
+	}
+
+	tdp_identity_map_default_memslots(vm);
+	tdp_unmap(vm, expected_fault_gpa, vm->page_size);
+
+	sync_global_to_guest(vm, l2_test_page);
+	vcpu_args_set(vcpu, 3, svm_gva, expected_fault_gpa, exit_info_1_mask);
+
+	vcpu_run(vcpu);
+
+	switch (get_ucall(vcpu, &uc)) {
+	case UCALL_DONE:
+		break;
+	case UCALL_ABORT:
+		REPORT_GUEST_ASSERT(uc);
+	default:
+		TEST_FAIL("Unexpected exit reason: %d", vcpu->run->exit_reason);
+	}
+
+	kvm_vm_free(vm);
+}
+
+int main(int argc, char *argv[])
+{
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM));
+	TEST_REQUIRE(kvm_cpu_has_npt());
+
+	run_test(TEST_FINAL_PAGE_UNMAPPED);
+	run_test(TEST_PT_PAGE_UNMAPPED);
+
+	return 0;
+}
-- 
2.52.0.457.g6b5491de43-goog



* Re: [PATCH 1/3] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK
  2026-01-21  0:49 ` [PATCH 1/3] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK Kevin Cheng
@ 2026-01-21 22:07   ` Sean Christopherson
  2026-01-22  0:45     ` Yosry Ahmed
  0 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2026-01-21 22:07 UTC (permalink / raw)
  To: Kevin Cheng; +Cc: pbonzini, kvm, linux-kernel, yosry.ahmed

On Wed, Jan 21, 2026, Kevin Cheng wrote:
> When KVM emulates an instruction for L2 and encounters a nested page
> fault (e.g., during string I/O emulation), nested_svm_inject_npf_exit()
> injects an NPF to L1. However, the code incorrectly hardcodes
> (1ULL << 32) for exit_info_1's upper bits when the original exit was
> not an NPF. This always sets PFERR_GUEST_FINAL_MASK even when the fault
> occurred on a page table page, preventing L1 from correctly identifying
> the cause of the fault.
> 
> Set PFERR_GUEST_PAGE_MASK in the error code when a nested page fault
> occurs during a guest page table walk, and PFERR_GUEST_FINAL_MASK when
> the fault occurs on the final GPA-to-HPA translation.
> 
> Widen error_code in struct x86_exception from u16 to u64 to accommodate
> the PFERR_GUEST_* bits (bits 32 and 33).

Please do this in a separate patch.  Intel CPUs straight up don't support 32-bit
error codes, let alone 64-bit error codes, so this seemingly innocuous change
needs to be accompanied by a lengthy changelog that effectively audits all usage
to "prove" this change is ok.

> Update nested_svm_inject_npf_exit() to use fault->error_code directly
> instead of hardcoding the upper bits. Also add a WARN_ON_ONCE if neither
> PFERR_GUEST_FINAL_MASK nor PFERR_GUEST_PAGE_MASK is set, as this would
> indicate a bug in the page fault handling code.
> 
> Signed-off-by: Kevin Cheng <chengkev@google.com>
> ---
>  arch/x86/kvm/kvm_emulate.h     |  2 +-
>  arch/x86/kvm/mmu/paging_tmpl.h | 22 ++++++++++------------
>  arch/x86/kvm/svm/nested.c      | 11 +++++------
>  3 files changed, 16 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
> index fb3dab4b5a53e..ff4f9b0a01ff7 100644
> --- a/arch/x86/kvm/kvm_emulate.h
> +++ b/arch/x86/kvm/kvm_emulate.h
> @@ -22,7 +22,7 @@ enum x86_intercept_stage;
>  struct x86_exception {
>  	u8 vector;
>  	bool error_code_valid;
> -	u16 error_code;
> +	u64 error_code;
>  	bool nested_page_fault;
>  	u64 address; /* cr2 or nested page fault gpa */
>  	u8 async_page_fault;
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 901cd2bd40b84..923179bfd5c74 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -379,18 +379,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  		real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(table_gfn),
>  					     nested_access, &walker->fault);
>  
> -		/*
> -		 * FIXME: This can happen if emulation (for of an INS/OUTS
> -		 * instruction) triggers a nested page fault.  The exit
> -		 * qualification / exit info field will incorrectly have
> -		 * "guest page access" as the nested page fault's cause,
> -		 * instead of "guest page structure access".  To fix this,
> -		 * the x86_exception struct should be augmented with enough
> -		 * information to fix the exit_qualification or exit_info_1
> -		 * fields.
> -		 */
> -		if (unlikely(real_gpa == INVALID_GPA))
> +		if (unlikely(real_gpa == INVALID_GPA)) {
> +#if PTTYPE != PTTYPE_EPT
> +			walker->fault.error_code |= PFERR_GUEST_PAGE_MASK;
> +#endif

Why exclude EPT?  EPT doesn't use the error code _verbatim_, but EPT shares the
concept/reporting of intermediate vs. final walks.

>  			return 0;
> +		}
>  
>  		slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(real_gpa));
>  		if (!kvm_is_visible_memslot(slot))
> @@ -446,8 +440,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  #endif
>  
>  	real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(gfn), access, &walker->fault);
> -	if (real_gpa == INVALID_GPA)
> +	if (real_gpa == INVALID_GPA) {
> +#if PTTYPE != PTTYPE_EPT
> +		walker->fault.error_code |= PFERR_GUEST_FINAL_MASK;
> +#endif

Same thing here.

>  		return 0;
> +	}
>  
>  	walker->gfn = real_gpa >> PAGE_SHIFT;
>  
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index de90b104a0dd5..f8dfd5c333023 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -40,18 +40,17 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
>  	struct vmcb *vmcb = svm->vmcb;
>  
>  	if (vmcb->control.exit_code != SVM_EXIT_NPF) {
> -		/*
> -		 * TODO: track the cause of the nested page fault, and
> -		 * correctly fill in the high bits of exit_info_1.
> -		 */
> -		vmcb->control.exit_code = SVM_EXIT_NPF;
> -		vmcb->control.exit_info_1 = (1ULL << 32);
> +		vmcb->control.exit_info_1 = fault->error_code;
>  		vmcb->control.exit_info_2 = fault->address;
>  	}
>  
> +	vmcb->control.exit_code = SVM_EXIT_NPF;
>  	vmcb->control.exit_info_1 &= ~0xffffffffULL;
>  	vmcb->control.exit_info_1 |= fault->error_code;

So... what happens when exit_info_1 already has PFERR_GUEST_PAGE_MASK, and then
@fault sets PFERR_GUEST_FINAL_MASK?  Presumably that can't/shouldn't happen, but
nothing in the changelog explains why such a scenario is impossible, and nothing
in the code hardens KVM against such goofs.

> +	WARN_ON_ONCE(!(vmcb->control.exit_info_1 &
> +		       (PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)));
> +
>  	nested_svm_vmexit(svm);
>  }
>  
> -- 
> 2.52.0.457.g6b5491de43-goog
> 


* Re: [PATCH 2/3] KVM: selftests: Add TDP unmap helpers
  2026-01-21  0:49 ` [PATCH 2/3] KVM: selftests: Add TDP unmap helpers Kevin Cheng
@ 2026-01-21 22:21   ` Sean Christopherson
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2026-01-21 22:21 UTC (permalink / raw)
  To: Kevin Cheng; +Cc: pbonzini, kvm, linux-kernel, yosry.ahmed

On Wed, Jan 21, 2026, Kevin Cheng wrote:
> Add __virt_pg_unmap(), __tdp_unmap(), and tdp_unmap() as counterparts
> to the existing __virt_pg_map(), __tdp_map(), and tdp_map() functions.
> These helpers allow tests to selectively unmap pages from the TDP/NPT,
> enabling testing of NPT faults for unmapped pages.

This is both overkill and insufficient, just do:

	*tdp_get_pte(vm, <addr>) &= ~PTE_PRESENT_MASK(&vm->stage2_mmu);

Then when a test wants to validate more than just !PRESENT we don't need to add
another API.


* Re: [PATCH 1/3] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK
  2026-01-21 22:07   ` Sean Christopherson
@ 2026-01-22  0:45     ` Yosry Ahmed
  2026-01-28 15:48       ` Sean Christopherson
  0 siblings, 1 reply; 11+ messages in thread
From: Yosry Ahmed @ 2026-01-22  0:45 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Kevin Cheng, pbonzini, kvm, linux-kernel

On Wed, Jan 21, 2026 at 02:07:56PM -0800, Sean Christopherson wrote:
> On Wed, Jan 21, 2026, Kevin Cheng wrote:
> > When KVM emulates an instruction for L2 and encounters a nested page
> > fault (e.g., during string I/O emulation), nested_svm_inject_npf_exit()
> > injects an NPF to L1. However, the code incorrectly hardcodes
> > (1ULL << 32) for exit_info_1's upper bits when the original exit was
> > not an NPF. This always sets PFERR_GUEST_FINAL_MASK even when the fault
> > occurred on a page table page, preventing L1 from correctly identifying
> > the cause of the fault.
> > 
> > Set PFERR_GUEST_PAGE_MASK in the error code when a nested page fault
> > occurs during a guest page table walk, and PFERR_GUEST_FINAL_MASK when
> > the fault occurs on the final GPA-to-HPA translation.
> > 
> > Widen error_code in struct x86_exception from u16 to u64 to accommodate
> > the PFERR_GUEST_* bits (bits 32 and 33).
> 
> Please do this in a separate patch.  Intel CPUs straight up don't support 32-bit
> error codes, let alone 64-bit error codes, so this seemingly innocuous change
> needs to be accompanied by a lengthy changelog that effectively audits all usage
> to "prove" this change is ok.

Semi-jokingly, we can add error_code_hi to track the high bits and
side-step the problem for Intel (déjà vu?).

> 
> > Update nested_svm_inject_npf_exit() to use fault->error_code directly
> > instead of hardcoding the upper bits. Also add a WARN_ON_ONCE if neither
> > PFERR_GUEST_FINAL_MASK nor PFERR_GUEST_PAGE_MASK is set, as this would
> > indicate a bug in the page fault handling code.
> > 
> > Signed-off-by: Kevin Cheng <chengkev@google.com>
[..]
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index de90b104a0dd5..f8dfd5c333023 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -40,18 +40,17 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
> >  	struct vmcb *vmcb = svm->vmcb;
> >  
> >  	if (vmcb->control.exit_code != SVM_EXIT_NPF) {
> > -		/*
> > -		 * TODO: track the cause of the nested page fault, and
> > -		 * correctly fill in the high bits of exit_info_1.
> > -		 */
> > -		vmcb->control.exit_code = SVM_EXIT_NPF;
> > -		vmcb->control.exit_info_1 = (1ULL << 32);
> > +		vmcb->control.exit_info_1 = fault->error_code;
> >  		vmcb->control.exit_info_2 = fault->address;
> >  	}
> >  
> > +	vmcb->control.exit_code = SVM_EXIT_NPF;
> >  	vmcb->control.exit_info_1 &= ~0xffffffffULL;
> >  	vmcb->control.exit_info_1 |= fault->error_code;
> 
> So... what happens when exit_info_1 already has PFERR_GUEST_PAGE_MASK, and then
> @fault sets PFERR_GUEST_FINAL_MASK?  Presumably that can't/shouldn't happen,
> but nothing in the changelog explains why such a scenario is
> impossible, and nothing in the code hardens KVM against such goofs.

I guess we can update the WARN below to check for that as well, and
fallback to the current behavior (set PFERR_GUEST_FINAL_MASK):

	fault_stage = vmcb->control.exit_info_1 &
			(PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK);
	if (WARN_ON_ONCE(fault_stage != PFERR_GUEST_FINAL_MASK &&
			 fault_stage != PFERR_GUEST_PAGE_MASK))
		vmcb->control.exit_info_1 |= PFERR_GUEST_FINAL_MASK;

> 
> > +	WARN_ON_ONCE(!(vmcb->control.exit_info_1 &
> > +		       (PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)));
> > +
> >  	nested_svm_vmexit(svm);
> >  }
> >  
> > -- 
> > 2.52.0.457.g6b5491de43-goog
> > 


* Re: [PATCH 3/3] KVM: selftests: Add nested NPF injection test for SVM
  2026-01-21  0:49 ` [PATCH 3/3] KVM: selftests: Add nested NPF injection test for SVM Kevin Cheng
@ 2026-01-23  1:13   ` Sean Christopherson
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2026-01-23  1:13 UTC (permalink / raw)
  To: Kevin Cheng; +Cc: pbonzini, kvm, linux-kernel, yosry.ahmed

On Wed, Jan 21, 2026, Kevin Cheng wrote:
> ---
>  tools/testing/selftests/kvm/Makefile.kvm      |   1 +
>  .../selftests/kvm/x86/svm_nested_npf_test.c   | 154 ++++++++++++++++++
>  2 files changed, 155 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/x86/svm_nested_npf_test.c
> 
> diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
> index e88699e227ddf..8babe6e228e11 100644
> --- a/tools/testing/selftests/kvm/Makefile.kvm
> +++ b/tools/testing/selftests/kvm/Makefile.kvm
> @@ -112,6 +112,7 @@ TEST_GEN_PROGS_x86 += x86/svm_vmcall_test
>  TEST_GEN_PROGS_x86 += x86/svm_int_ctl_test
>  TEST_GEN_PROGS_x86 += x86/svm_nested_shutdown_test
>  TEST_GEN_PROGS_x86 += x86/svm_nested_soft_inject_test
> +TEST_GEN_PROGS_x86 += x86/svm_nested_npf_test

a, b, c, d, e, f, g, h, i, j, k, l, m, N, o, p, q, r, S, t, u, v, w, x, y, z

>  TEST_GEN_PROGS_x86 += x86/tsc_scaling_sync
>  TEST_GEN_PROGS_x86 += x86/sync_regs_test
>  TEST_GEN_PROGS_x86 += x86/ucna_injection_test
> diff --git a/tools/testing/selftests/kvm/x86/svm_nested_npf_test.c b/tools/testing/selftests/kvm/x86/svm_nested_npf_test.c
> new file mode 100644
> index 0000000000000..c0a894acbc483
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86/svm_nested_npf_test.c
> @@ -0,0 +1,154 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * svm_nested_npf_test
> + *
> + * Test nested NPF injection when the original VM exit was not an NPF.
> + * This exercises nested_svm_inject_npf_exit() with exit_code != SVM_EXIT_NPF.
> + *
> + * L2 executes OUTS with the source address mapped in L2's page tables but
> + * not in L1's NPT. KVM emulates the string I/O instruction, and when it
> + * tries to read the source operand, the GPA->HPA translation fails. KVM
> + * then injects an NPF to L1 even though the original exit was IOIO.
> + *
> + * Test 1: Final data page GPA not in NPT (PFERR_GUEST_FINAL_MASK)
> + * Test 2: Page table page GPA not in NPT (PFERR_GUEST_PAGE_MASK)

Please don't add file-level comments (the Copyright is fine), because things like
the name of the test/file inevitably become stale, and they're useless, and the
description of _what_ the test is doing is almost always more helpful if the
comment is closer to the code it's documenting.

> + *
> + * Copyright (C) 2025, Google, Inc.
> + */
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +#include "svm_util.h"
> +
> +#define L2_GUEST_STACK_SIZE 64
> +
> +enum test_type {
> +	TEST_FINAL_PAGE_UNMAPPED, /* Final data page GPA not in NPT */
> +	TEST_PT_PAGE_UNMAPPED, /* Page table page GPA not in NPT */
> +};
> +
> +static void *l2_test_page;

Why store it as a "void *"?  Just track a vm_vaddr_t and avoid a bunch of casts.

> +
> +#define TEST_IO_PORT 0x80
> +#define TEST1_VADDR 0x8000000ULL
> +#define TEST2_VADDR 0x10000000ULL
> +
> +/*
> + * L2 executes OUTS with source at l2_test_page, triggering a nested NPF.
> + * The address is mapped in L2's page tables, but either the data page or
> + * a PT page is unmapped from L1's NPT, causing the fault.
> + */
> +static void l2_guest_code(void *unused)
> +{
> +	asm volatile("outsb" ::"S"(l2_test_page), "d"(TEST_IO_PORT) : "memory");
> +	GUEST_ASSERT(0);

	GUEST_FAIL
> +}
> +

...

> +static void run_test(enum test_type type)
> +{
> +	vm_paddr_t expected_fault_gpa;
> +	uint64_t exit_info_1_mask;
> +	vm_vaddr_t svm_gva;
> +
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_vm *vm;
> +	struct ucall uc;
> +
> +	vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
> +	vm_enable_npt(vm);
> +	vcpu_alloc_svm(vm, &svm_gva);
> +
> +	if (type == TEST_FINAL_PAGE_UNMAPPED) {
> +		/*
> +		 * Test 1: Unmap the final data page from NPT. The page table
> +		 * walk succeeds, but the final GPA->HPA translation fails.
> +		 */
> +		l2_test_page =
> +			(void *)vm_vaddr_alloc(vm, vm->page_size, TEST1_VADDR);
> +		expected_fault_gpa = addr_gva2gpa(vm, (vm_vaddr_t)l2_test_page);
> +		exit_info_1_mask = PFERR_GUEST_FINAL_MASK;
> +	} else {
> +		/*
> +		 * Test 2: Unmap a PT page from NPT. The hardware page table
> +		 * walk fails when translating the PT page's GPA through NPT.
> +		 */
> +		l2_test_page =
> +			(void *)vm_vaddr_alloc(vm, vm->page_size, TEST2_VADDR);
> +		expected_fault_gpa =
> +			get_pt_gpa_for_vaddr(vm, (vm_vaddr_t)l2_test_page);
> +		exit_info_1_mask = PFERR_GUEST_PAGE_MASK;
> +	}
> +
> +	tdp_identity_map_default_memslots(vm);
> +	tdp_unmap(vm, expected_fault_gpa, vm->page_size);

Hrm.  This should really be a vendor agnostic test.  The exact results are
vendor specific, but the core concept and pretty much all of the configuration
is nearly identical.

It'd also be nice to support more than just !PRESENT, e.g. to verify protection
violations and other things that set PFERR/EXIT_QUAL bits.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK
  2026-01-22  0:45     ` Yosry Ahmed
@ 2026-01-28 15:48       ` Sean Christopherson
  2026-02-04 16:22         ` Kevin Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2026-01-28 15:48 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: Kevin Cheng, pbonzini, kvm, linux-kernel

On Thu, Jan 22, 2026, Yosry Ahmed wrote:
> On Wed, Jan 21, 2026 at 02:07:56PM -0800, Sean Christopherson wrote:
> > On Wed, Jan 21, 2026, Kevin Cheng wrote:
> > > When KVM emulates an instruction for L2 and encounters a nested page
> > > fault (e.g., during string I/O emulation), nested_svm_inject_npf_exit()
> > > injects an NPF to L1. However, the code incorrectly hardcodes
> > > (1ULL << 32) for exit_info_1's upper bits when the original exit was
> > > not an NPF. This always sets PFERR_GUEST_FINAL_MASK even when the fault
> > > occurred on a page table page, preventing L1 from correctly identifying
> > > the cause of the fault.
> > > 
> > > Set PFERR_GUEST_PAGE_MASK in the error code when a nested page fault
> > > occurs during a guest page table walk, and PFERR_GUEST_FINAL_MASK when
> > > the fault occurs on the final GPA-to-HPA translation.
> > > 
> > > Widen error_code in struct x86_exception from u16 to u64 to accommodate
> > > the PFERR_GUEST_* bits (bits 32 and 33).
> > 
> > Please do this in a separate patch.  Intel CPUs straight up don't support 32-bit
> > error codes, let alone 64-bit error codes, so this seemingly innocuous change
> > needs to be accompanied by a lengthy changelog that effectively audits all usage
> > to "prove" this change is ok.
> 
> Semi-jokingly, we can add error_code_hi to track the high bits and
> side-step the problem for Intel (dejavu?).

Technically, it would require three fields: u16 error_code, u16 error_code_hi,
and u32 error_code_ultra_hi.  :-D

Isolating the (ultra) hi flags is very tempting, but I worry that it would lead
to long term pain, e.g. because inevitably we'll forget to grab the hi flags at
some point.  I'd rather audit the current code and ensure that KVM truncates the
error code as needed.

VMX is probably a-ok, e.g. see commit eba9799b5a6e ("KVM: VMX: Drop bits 31:16
when shoving exception error code into VMCS").  I'd be more worried about SVM,
where it's legal to shove a 32-bit value into the error code, i.e. where KVM
might not have existing explicit truncation.

> > > Update nested_svm_inject_npf_exit() to use fault->error_code directly
> > > instead of hardcoding the upper bits. Also add a WARN_ON_ONCE if neither
> > > PFERR_GUEST_FINAL_MASK nor PFERR_GUEST_PAGE_MASK is set, as this would
> > > indicate a bug in the page fault handling code.
> > > 
> > > Signed-off-by: Kevin Cheng <chengkev@google.com>
> [..]
> > > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > > index de90b104a0dd5..f8dfd5c333023 100644
> > > --- a/arch/x86/kvm/svm/nested.c
> > > +++ b/arch/x86/kvm/svm/nested.c
> > > @@ -40,18 +40,17 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
> > >  	struct vmcb *vmcb = svm->vmcb;
> > >  
> > >  	if (vmcb->control.exit_code != SVM_EXIT_NPF) {
> > > -		/*
> > > -		 * TODO: track the cause of the nested page fault, and
> > > -		 * correctly fill in the high bits of exit_info_1.
> > > -		 */
> > > -		vmcb->control.exit_code = SVM_EXIT_NPF;
> > > -		vmcb->control.exit_info_1 = (1ULL << 32);
> > > +		vmcb->control.exit_info_1 = fault->error_code;
> > >  		vmcb->control.exit_info_2 = fault->address;
> > >  	}
> > >  
> > > +	vmcb->control.exit_code = SVM_EXIT_NPF;
> > >  	vmcb->control.exit_info_1 &= ~0xffffffffULL;
> > >  	vmcb->control.exit_info_1 |= fault->error_code;
> > 
> > So... what happens when exit_info_1 already has PFERR_GUEST_PAGE_MASK, and then
> > @fault sets PFERR_GUEST_FINAL_MASK?  Presumably that can't/shouldn't happen,
> > but nothing in the changelog explains why such a scenario is
> > impossible, and nothing in the code hardens KVM against such goofs.
> 
> I guess we can update the WARN below to check for that as well, and
> fallback to the current behavior (set PFERR_GUEST_FINAL_MASK):
> 
> 	fault_stage = vmcb->control.exit_info_1 &
> 			(PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK);
> 	if (WARN_ON_ONCE(fault_stage != PFERR_GUEST_FINAL_MASK &&
> 			 fault_stage != PFERR_GUEST_PAGE_MASK))
> 		vmcb->control.exit_info_1 |= PFERR_GUEST_FINAL_MASK;

Except that doesn't do the right thing if both bits are set.  And we can use
hweight64(), which is a single POPCNT on modern CPUs.

Might be easiest to add something like PFERR_GUEST_FAULT_STAGE_MASK, then do:

	/*
	 * All nested page faults should be annotated as occurring on the final
	 * translation *OR* the page walk.  Arbitrarily choose "final" if KVM
	 * is buggy and enumerated both or none.
	 */
	if (WARN_ON_ONCE(hweight64(vmcb->control.exit_info_1 &
				   PFERR_GUEST_FAULT_STAGE_MASK) != 1)) {
		vmcb->control.exit_info_1 &= ~PFERR_GUEST_FAULT_STAGE_MASK;
		vmcb->control.exit_info_1 |= PFERR_GUEST_FINAL_MASK;
	}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK
  2026-01-28 15:48       ` Sean Christopherson
@ 2026-02-04 16:22         ` Kevin Cheng
  2026-02-06  0:21           ` Sean Christopherson
  0 siblings, 1 reply; 11+ messages in thread
From: Kevin Cheng @ 2026-02-04 16:22 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Yosry Ahmed, pbonzini, kvm, linux-kernel

On Wed, Jan 28, 2026 at 10:48 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Jan 22, 2026, Yosry Ahmed wrote:
> > On Wed, Jan 21, 2026 at 02:07:56PM -0800, Sean Christopherson wrote:
> > > On Wed, Jan 21, 2026, Kevin Cheng wrote:
> > > > When KVM emulates an instruction for L2 and encounters a nested page
> > > > fault (e.g., during string I/O emulation), nested_svm_inject_npf_exit()
> > > > injects an NPF to L1. However, the code incorrectly hardcodes
> > > > (1ULL << 32) for exit_info_1's upper bits when the original exit was
> > > > not an NPF. This always sets PFERR_GUEST_FINAL_MASK even when the fault
> > > > occurred on a page table page, preventing L1 from correctly identifying
> > > > the cause of the fault.
> > > >
> > > > Set PFERR_GUEST_PAGE_MASK in the error code when a nested page fault
> > > > occurs during a guest page table walk, and PFERR_GUEST_FINAL_MASK when
> > > > the fault occurs on the final GPA-to-HPA translation.
> > > >
> > > > Widen error_code in struct x86_exception from u16 to u64 to accommodate
> > > > the PFERR_GUEST_* bits (bits 32 and 33).
> > >
> > > Please do this in a separate patch.  Intel CPUs straight up don't support 32-bit
> > > error codes, let alone 64-bit error codes, so this seemingly innocuous change
> > > needs to be accompanied by a lengthy changelog that effectively audits all usage
> > > to "prove" this change is ok.
> >
> > Semi-jokingly, we can add error_code_hi to track the high bits and
> > side-step the problem for Intel (dejavu?).
>
> Technically, it would require three fields: u16 error_code, u16 error_code_hi,
> and u32 error_code_ultra_hi.  :-D
>
> Isolating the (ultra) hi flags is very tempting, but I worry that it would lead
> to long term pain, e.g. because inevitably we'll forget to grab the hi flags at
> some point.  I'd rather audit the current code and ensure that KVM truncates the
> error code as needed.
>
> VMX is probably a-ok, e.g. see commit eba9799b5a6e ("KVM: VMX: Drop bits 31:16
> when shoving exception error code into VMCS").  I'd be more worried about SVM,
> where it's legal to shove a 32-bit value into the error code, i.e. where KVM
> might not have existing explicit truncation.

As I understand it, Intel CPUs don't allow for setting bits 31:16 of
the error code, but AMD CPUs allow bits 31:16 to be set. The
x86_exception error_code field is currently a u16, so it is always
truncated to u16 by default. In that case, after widening the error
code to 64 bits, do I have to ensure that any usage of the error code
that isn't for NPF truncates it to 16 bits? Or do I just need to
verify that all SVM usages of the error_code for exceptions truncate
the 64 bits down to 32 bits and all VMX usages truncate to 16 bits?

Just wanted to clarify because I think the wording of that statement
is confusing me into thinking that maybe there is something wrong with
32 bit error codes for SVM?

If the only usage of the widened field is NPF, wouldn't it be better
to go with an additional field like Yosry suggested (I see that VMX
has the added exit_qualification field in the struct)?

>
> > > > Update nested_svm_inject_npf_exit() to use fault->error_code directly
> > > > instead of hardcoding the upper bits. Also add a WARN_ON_ONCE if neither
> > > > PFERR_GUEST_FINAL_MASK nor PFERR_GUEST_PAGE_MASK is set, as this would
> > > > indicate a bug in the page fault handling code.
> > > >
> > > > Signed-off-by: Kevin Cheng <chengkev@google.com>
> > [..]
> > > > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > > > index de90b104a0dd5..f8dfd5c333023 100644
> > > > --- a/arch/x86/kvm/svm/nested.c
> > > > +++ b/arch/x86/kvm/svm/nested.c
> > > > @@ -40,18 +40,17 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
> > > >   struct vmcb *vmcb = svm->vmcb;
> > > >
> > > >   if (vmcb->control.exit_code != SVM_EXIT_NPF) {
> > > > -         /*
> > > > -          * TODO: track the cause of the nested page fault, and
> > > > -          * correctly fill in the high bits of exit_info_1.
> > > > -          */
> > > > -         vmcb->control.exit_code = SVM_EXIT_NPF;
> > > > -         vmcb->control.exit_info_1 = (1ULL << 32);
> > > > +         vmcb->control.exit_info_1 = fault->error_code;
> > > >           vmcb->control.exit_info_2 = fault->address;
> > > >   }
> > > >
> > > > + vmcb->control.exit_code = SVM_EXIT_NPF;
> > > >   vmcb->control.exit_info_1 &= ~0xffffffffULL;
> > > >   vmcb->control.exit_info_1 |= fault->error_code;
> > >
> > > So... what happens when exit_info_1 already has PFERR_GUEST_PAGE_MASK, and then
> > > @fault sets PFERR_GUEST_FINAL_MASK?  Presumably that can't/shouldn't happen,
> > > but nothing in the changelog explains why such a scenario is
> > > impossible, and nothing in the code hardens KVM against such goofs.
> >
> > I guess we can update the WARN below to check for that as well, and
> > fallback to the current behavior (set PFERR_GUEST_FINAL_MASK):
> >
> >       fault_stage = vmcb->control.exit_info_1 &
> >                       (PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK);
> >       if (WARN_ON_ONCE(fault_stage != PFERR_GUEST_FINAL_MASK &&
> >                        fault_stage != PFERR_GUEST_PAGE_MASK))
> >               vmcb->control.exit_info_1 |= PFERR_GUEST_FINAL_MASK;
>
> Except that doesn't do the right thing if both bits are set.  And we can use
> hweight64(), which is a single POPCNT on modern CPUs.
>
> Might be easiest to add something like PFERR_GUEST_FAULT_STAGE_MASK, then do:
>
>         /*
>          * All nested page faults should be annotated as occurring on the final
>          * translation *OR* the page walk.  Arbitrarily choose "final" if KVM
>          * is buggy and enumerated both or none.
>          */
>         if (WARN_ON_ONCE(hweight64(vmcb->control.exit_info_1 &
>                                    PFERR_GUEST_FAULT_STAGE_MASK) != 1)) {
>                 vmcb->control.exit_info_1 &= ~PFERR_GUEST_FAULT_STAGE_MASK;
>                 vmcb->control.exit_info_1 |= PFERR_GUEST_FINAL_MASK;
>         }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK
  2026-02-04 16:22         ` Kevin Cheng
@ 2026-02-06  0:21           ` Sean Christopherson
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2026-02-06  0:21 UTC (permalink / raw)
  To: Kevin Cheng; +Cc: Yosry Ahmed, pbonzini, kvm, linux-kernel

On Wed, Feb 04, 2026, Kevin Cheng wrote:
> On Wed, Jan 28, 2026 at 10:48 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Jan 22, 2026, Yosry Ahmed wrote:
> > > On Wed, Jan 21, 2026 at 02:07:56PM -0800, Sean Christopherson wrote:
> > > > On Wed, Jan 21, 2026, Kevin Cheng wrote:
> > > > > When KVM emulates an instruction for L2 and encounters a nested page
> > > > > fault (e.g., during string I/O emulation), nested_svm_inject_npf_exit()
> > > > > injects an NPF to L1. However, the code incorrectly hardcodes
> > > > > (1ULL << 32) for exit_info_1's upper bits when the original exit was
> > > > > not an NPF. This always sets PFERR_GUEST_FINAL_MASK even when the fault
> > > > > occurred on a page table page, preventing L1 from correctly identifying
> > > > > the cause of the fault.
> > > > >
> > > > > Set PFERR_GUEST_PAGE_MASK in the error code when a nested page fault
> > > > > occurs during a guest page table walk, and PFERR_GUEST_FINAL_MASK when
> > > > > the fault occurs on the final GPA-to-HPA translation.
> > > > >
> > > > > Widen error_code in struct x86_exception from u16 to u64 to accommodate
> > > > > the PFERR_GUEST_* bits (bits 32 and 33).
> > > >
> > > > Please do this in a separate patch.  Intel CPUs straight up don't support 32-bit
> > > > error codes, let alone 64-bit error codes, so this seemingly innocuous change
> > > > needs to be accompanied by a lengthy changelog that effectively audits all usage
> > > > to "prove" this change is ok.
> > >
> > > Semi-jokingly, we can add error_code_hi to track the high bits and
> > > side-step the problem for Intel (dejavu?).
> >
> > Technically, it would require three fields: u16 error_code, u16 error_code_hi,
> > and u32 error_code_ultra_hi.  :-D
> >
> > Isolating the (ultra) hi flags is very tempting, but I worry that it would lead
> > to long term pain, e.g. because inevitably we'll forget to grab the hi flags at
> > some point.  I'd rather audit the current code and ensure that KVM truncates the
> > error code as needed.
> >
> > VMX is probably a-ok, e.g. see commit eba9799b5a6e ("KVM: VMX: Drop bits 31:16
> > when shoving exception error code into VMCS").  I'd be more worried about SVM,
> > where it's legal to shove a 32-bit value into the error code, i.e. where KVM
> > might not have existing explicit truncation.
> 
> As I understand it, Intel CPUs don't allow for setting bits 31:16 of
> the error code, but AMD CPUs allow bits 31:16 to be set.

Yep.

> The x86_exception error_code field is currently a u16, so it is always truncated
> to u16 by default. In that case, after widening the error code to 64 bits, do
> I have to ensure that any usage of the error code that isn't for NPF
> truncates it to 16 bits?
>
> Or do I just need to verify that all SVM usages of the error_code for
> exceptions truncate the 64 bits down to 32 bits and all VMX usages truncate
> to 16 bits?

Hmm, good question.

I was going to say "the second one", but that's actually meaningless because
(a) "struct kvm_queued_exception" stores the error code as a u32, which it should,
and (b) event_inj_err is also a u32, i.e. it's impossible to shove a 64-bit error
code into hardware on SVM.

And thinking through this more, if there is _existing_ code that tries to set
bits > 15 in the error_code, then UBSAN would likely have detected the issue,
e.g. due to trying to OR in a "bad" value.

Aha!  A serendipitous quirk in this patch is that it does NOT change the local
error_code in FNAME(walk_addr_generic) from a u16 to a u64.

We should double down on that with a comment.  That'd give me enough confidence
that we aren't likely to break legacy shadow paging now or in the future.  E.g.
in a patch to change x86_exception.error_code to a u64, also do:

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 901cd2bd40b8..f1790aa9e391 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -317,6 +317,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
        const int write_fault = access & PFERR_WRITE_MASK;
        const int user_fault  = access & PFERR_USER_MASK;
        const int fetch_fault = access & PFERR_FETCH_MASK;
+       /*
+        * Note!  Track the error_code that's common to legacy shadow paging
+        * and NPT shadow paging as a u16 to guard against unintentionally
+        * setting any of bits 63:16.  Architecturally, the #PF error code is
+        * 32 bits, and Intel CPUs don't support setting bits 31:16.
+        */
        u16 errcode = 0;
        gpa_t real_gpa;
        gfn_t gfn;

> Just wanted to clarify because I think the wording of that statement
> is confusing me into thinking that maybe there is something wrong with
> 32 bit error codes for SVM?

It's more that I am confident that either KVM already truncates the error code
on VMX, or that we'll notice *really* quickly, because failure to truncate an
error code will generate a VM-Fail.

On SVM, we could royally screw up an error code and it's entirely possible we
wouldn't notice until some random guest breaks in some weird way.

> If the only usage of the widened field is NPF, wouldn't it be better
> to go with an additional field like Yosry suggested (I see that VMX
> has the added exit_qualification field in the struct)?

No?  paging_tmpl.h is used to shadow all flavors of nested NPT as well as all
flavors of legacy paging.  And more importantly, unlike EPT, nested NPT isn't
otherwise special cased.  As shown by this patch, it _is_ possible to identify
nested NPT in select flows, and we could certainly do so more generically by
checking if the MMU is nested, but I'd prefer not to special case any particular
type of shadow paging without a strong reason to do so.

And more importantly, if we have two (or three) error code fields, then we need
to remember to pull data from all error code fields.  I.e. in an effort to avoid
introducing bugs, we would actually make it easier to introduce _other_ bugs.

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-02-06  0:21 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-21  0:49 [PATCH 0/3] KVM: SVM: Set PFERR_GUEST_{PAGE,FINAL}_MASK for nested NPF and add selftest Kevin Cheng
2026-01-21  0:49 ` [PATCH 1/3] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK Kevin Cheng
2026-01-21 22:07   ` Sean Christopherson
2026-01-22  0:45     ` Yosry Ahmed
2026-01-28 15:48       ` Sean Christopherson
2026-02-04 16:22         ` Kevin Cheng
2026-02-06  0:21           ` Sean Christopherson
2026-01-21  0:49 ` [PATCH 2/3] KVM: selftests: Add TDP unmap helpers Kevin Cheng
2026-01-21 22:21   ` Sean Christopherson
2026-01-21  0:49 ` [PATCH 3/3] KVM: selftests: Add nested NPF injection test for SVM Kevin Cheng
2026-01-23  1:13   ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox