kvm.vger.kernel.org archive mirror
* [PATCH 0/4] Process some MMIO-related errors without KVM exit
@ 2024-09-23 14:18 Ivan Orlov
  2024-09-23 14:18 ` [PATCH 1/4] KVM: vmx, svm, mmu: Fix MMIO during event delivery handling Ivan Orlov
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Ivan Orlov @ 2024-09-23 14:18 UTC (permalink / raw)
  To: hpa, bp, dave.hansen, mingo, pbonzini, seanjc, shuah, tglx
  Cc: Ivan Orlov, jalliste, nh-open-source, kvm, linux-kernel,
	linux-kselftest, x86

Currently, KVM may return a variety of internal errors to the VMM when
the guest accesses MMIO, and some of them could be handled gracefully
at the KVM level instead. Moreover, some MMIO-related errors are
handled differently in VMX than in SVM, which produces inconsistent
behaviour and should be fixed. This patch series introduces KVM-level
handling for the following situations:

1) The guest accesses MMIO during event delivery: trigger a triple
fault instead of the internal error on VMX and the infinite loop on SVM

2) The guest fetches an instruction from MMIO: inject a #UD and resume
guest execution without an internal error

Additionally, this patch series includes a KVM selftest which covers
different cases of MMIO misuse.

Also, update set_memory_region_test to expect a triple fault when
starting a VM with no RAM.

Ivan Orlov (4):
  KVM: vmx, svm, mmu: Fix MMIO during event delivery handling
  KVM: x86: Inject UD when fetching from MMIO
  selftests: KVM: Change expected exit code in test_zero_memory_regions
  selftests: KVM: Add new test for faulty mmio usage

 arch/x86/include/asm/kvm_host.h               |   6 +
 arch/x86/kvm/emulate.c                        |   3 +
 arch/x86/kvm/kvm_emulate.h                    |   1 +
 arch/x86/kvm/mmu/mmu.c                        |  13 +-
 arch/x86/kvm/svm/svm.c                        |   4 +
 arch/x86/kvm/vmx/vmx.c                        |  21 +-
 arch/x86/kvm/x86.c                            |   7 +-
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/set_memory_region_test.c    |   3 +-
 .../selftests/kvm/x86_64/faulty_mmio.c        | 199 ++++++++++++++++++
 10 files changed, 242 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/faulty_mmio.c

-- 
2.43.0


* [PATCH 1/4] KVM: vmx, svm, mmu: Fix MMIO during event delivery handling
  2024-09-23 14:18 [PATCH 0/4] Process some MMIO-related errors without KVM exit Ivan Orlov
@ 2024-09-23 14:18 ` Ivan Orlov
  2024-09-23 14:18 ` [PATCH 2/4] KVM: x86: Inject UD when fetching from MMIO Ivan Orlov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Ivan Orlov @ 2024-09-23 14:18 UTC (permalink / raw)
  To: hpa, bp, dave.hansen, mingo, pbonzini, seanjc, shuah, tglx
  Cc: Ivan Orlov, jalliste, nh-open-source, kvm, linux-kernel,
	linux-kselftest, x86

Currently, the situation when the guest accesses MMIO during event
delivery is handled differently in VMX and SVM: on VMX, KVM returns an
internal error with suberror = KVM_INTERNAL_ERROR_DELIVERY_EV, while
SVM simply goes into an infinite loop, trying to execute the faulty
instruction again and again.

Such a situation can happen when the guest sets the IDTR (or GDTR)
descriptor base to point to an MMIO region. As this issue can be
triggered from the guest, it is not an "internal" KVM error and it
should be gracefully handled by KVM.

Eliminate the SVM/VMX difference by triggering a triple fault when MMIO
happens during event delivery. As we don't have a reliable way to
detect an MMIO operation on SVM before actually looking at the GPA,
move the problem detection into the common KVM x86 layer (into the
kvm_mmu_page_fault function) and add the PFERR_EVT_DELIVERY flag,
which gets set in the SVM/VMX-specific vmexit handlers to signal
that we are in the middle of event delivery.

Signed-off-by: Ivan Orlov <iorlov@amazon.com>
---
 arch/x86/include/asm/kvm_host.h |  6 ++++++
 arch/x86/kvm/mmu/mmu.c          | 13 ++++++++++++-
 arch/x86/kvm/svm/svm.c          |  4 ++++
 arch/x86/kvm/vmx/vmx.c          | 21 ++++++++-------------
 4 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4a68cb3eba78..292657fda063 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -284,6 +284,12 @@ enum x86_intercept_stage;
 				 PFERR_WRITE_MASK |		\
 				 PFERR_PRESENT_MASK)
 
+/*
+ * EVT_DELIVERY is a KVM-defined flag used to indicate that a vmexit occurred
+ * during event delivery.
+ */
+#define PFERR_EVT_DELIVERY		BIT_ULL(50)
+
 /* apic attention bits */
 #define KVM_APIC_CHECK_VAPIC	0
 /*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7813d28b082f..80db379766fb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5992,8 +5992,19 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 			return -EFAULT;
 
 		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
-		if (r == RET_PF_EMULATE)
+		if (r == RET_PF_EMULATE) {
+			/*
+			 * Request a triple fault if the guest accesses MMIO during event delivery.
+			 * It can happen if the guest sets the IDTR base to point to an MMIO
+			 * range. This is not allowed and there is no way to recover from it.
+			 */
+			if (error_code & PFERR_EVT_DELIVERY) {
+				pr_warn("Guest accesses MMIO during event delivery. Requesting triple fault.\n");
+				kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+				return 1;
+			}
 			goto emulate;
+		}
 	}
 
 	if (r == RET_PF_INVALID) {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5ab2c92c7331..b83ca69b0e57 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2058,6 +2058,7 @@ static int npf_interception(struct kvm_vcpu *vcpu)
 
 	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
+	u32 int_type = svm->vmcb->control.exit_int_info & SVM_EXITINTINFO_TYPE_MASK;
 
 	/*
 	 * WARN if hardware generates a fault with an error code that collides
@@ -2071,6 +2072,9 @@ static int npf_interception(struct kvm_vcpu *vcpu)
 	if (sev_snp_guest(vcpu->kvm) && (error_code & PFERR_GUEST_ENC_MASK))
 		error_code |= PFERR_PRIVATE_ACCESS;
 
+	if (int_type)
+		error_code |= PFERR_EVT_DELIVERY;
+
 	trace_kvm_page_fault(vcpu, fault_address, error_code);
 	rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
 				static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 733a0c45d1a6..4d136fee7d63 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5823,7 +5823,12 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 
 static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	gpa_t gpa;
+	u64 error_code = PFERR_RSVD_MASK;
+
+	if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK)
+		error_code |= PFERR_EVT_DELIVERY;
 
 	if (vmx_check_emulate_instruction(vcpu, EMULTYPE_PF, NULL, 0))
 		return 1;
@@ -5839,7 +5844,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 		return kvm_skip_emulated_instruction(vcpu);
 	}
 
-	return kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
+	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
 
 static int handle_nmi_window(struct kvm_vcpu *vcpu)
@@ -6532,20 +6537,14 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 		return 0;
 	}
 
-	/*
-	 * Note:
-	 * Do not try to fix EXIT_REASON_EPT_MISCONFIG if it caused by
-	 * delivery event since it indicates guest is accessing MMIO.
-	 * The vm-exit can be triggered again after return to guest that
-	 * will cause infinite loop.
-	 */
 	if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
 	    (exit_reason.basic != EXIT_REASON_EXCEPTION_NMI &&
 	     exit_reason.basic != EXIT_REASON_EPT_VIOLATION &&
 	     exit_reason.basic != EXIT_REASON_PML_FULL &&
 	     exit_reason.basic != EXIT_REASON_APIC_ACCESS &&
 	     exit_reason.basic != EXIT_REASON_TASK_SWITCH &&
-	     exit_reason.basic != EXIT_REASON_NOTIFY)) {
+	     exit_reason.basic != EXIT_REASON_NOTIFY &&
+	     exit_reason.basic != EXIT_REASON_EPT_MISCONFIG)) {
 		int ndata = 3;
 
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
@@ -6553,10 +6552,6 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 		vcpu->run->internal.data[0] = vectoring_info;
 		vcpu->run->internal.data[1] = exit_reason.full;
 		vcpu->run->internal.data[2] = vmx_get_exit_qual(vcpu);
-		if (exit_reason.basic == EXIT_REASON_EPT_MISCONFIG) {
-			vcpu->run->internal.data[ndata++] =
-				vmcs_read64(GUEST_PHYSICAL_ADDRESS);
-		}
 		vcpu->run->internal.data[ndata++] = vcpu->arch.last_vmentry_cpu;
 		vcpu->run->internal.ndata = ndata;
 		return 0;
-- 
2.43.0


* [PATCH 2/4] KVM: x86: Inject UD when fetching from MMIO
  2024-09-23 14:18 [PATCH 0/4] Process some MMIO-related errors without KVM exit Ivan Orlov
  2024-09-23 14:18 ` [PATCH 1/4] KVM: vmx, svm, mmu: Fix MMIO during event delivery handling Ivan Orlov
@ 2024-09-23 14:18 ` Ivan Orlov
  2024-09-23 14:18 ` [PATCH 3/4] selftests: KVM: Change expected exit code in test_zero_memory_regions Ivan Orlov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Ivan Orlov @ 2024-09-23 14:18 UTC (permalink / raw)
  To: hpa, bp, dave.hansen, mingo, pbonzini, seanjc, shuah, tglx
  Cc: Ivan Orlov, jalliste, nh-open-source, kvm, linux-kernel,
	linux-kselftest, x86

Currently, we return a KVM internal error with suberror =
KVM_INTERNAL_ERROR_EMULATION if the guest tries to fetch an instruction
from an MMIO range, as we simply can't decode it.

I believe this is not the best thing to do, considering that

  1) we don't give the VMM enough information about the issue we faced
  2) the issue is triggered by the guest itself, so it is not a KVM
     "internal" error.

Inject a #UD into the guest instead and resume its execution without
returning an error to the VMM, just as would happen if we couldn't find
a valid instruction at the MMIO address.

Signed-off-by: Ivan Orlov <iorlov@amazon.com>
---
 arch/x86/kvm/emulate.c     | 3 +++
 arch/x86/kvm/kvm_emulate.h | 1 +
 arch/x86/kvm/x86.c         | 7 ++++++-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e72aed25d721..d610c47fa1f4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4742,10 +4742,13 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
 	ctxt->fetch.end = ctxt->fetch.data + insn_len;
 	ctxt->opcode_len = 1;
 	ctxt->intercept = x86_intercept_none;
+	ctxt->is_mmio_fetch = false;
 	if (insn_len > 0)
 		memcpy(ctxt->fetch.data, insn, insn_len);
 	else {
 		rc = __do_insn_fetch_bytes(ctxt, 1);
+		if (rc == X86EMUL_IO_NEEDED)
+			ctxt->is_mmio_fetch = true;
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 	}
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 55a18e2f2dcd..46c0d1111ec1 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -362,6 +362,7 @@ struct x86_emulate_ctxt {
 	u8 seg_override;
 	u64 d;
 	unsigned long _eip;
+	bool is_mmio_fetch;
 
 	/* Here begins the usercopy section. */
 	struct operand src;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c983c8e434b8..4fb57280ec7b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8857,7 +8857,12 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 
 	kvm_queue_exception(vcpu, UD_VECTOR);
 
-	if (!is_guest_mode(vcpu) && kvm_x86_call(get_cpl)(vcpu) == 0) {
+	/*
+	 * Don't return an internal error if the emulation failure is caused by a fetch
+	 * from an MMIO address. Injecting a #UD should be enough.
+	 */
+	if (!is_guest_mode(vcpu) && kvm_x86_call(get_cpl)(vcpu) == 0 &&
+	    !vcpu->arch.emulate_ctxt->is_mmio_fetch) {
 		prepare_emulation_ctxt_failure_exit(vcpu);
 		return 0;
 	}
-- 
2.43.0


* [PATCH 3/4] selftests: KVM: Change expected exit code in test_zero_memory_regions
  2024-09-23 14:18 [PATCH 0/4] Process some MMIO-related errors without KVM exit Ivan Orlov
  2024-09-23 14:18 ` [PATCH 1/4] KVM: vmx, svm, mmu: Fix MMIO during event delivery handling Ivan Orlov
  2024-09-23 14:18 ` [PATCH 2/4] KVM: x86: Inject UD when fetching from MMIO Ivan Orlov
@ 2024-09-23 14:18 ` Ivan Orlov
  2024-09-23 14:18 ` [PATCH 4/4] selftests: KVM: Add new test for faulty mmio usage Ivan Orlov
  2024-09-23 17:04 ` [PATCH 0/4] Process some MMIO-related errors without KVM exit Sean Christopherson
  4 siblings, 0 replies; 12+ messages in thread
From: Ivan Orlov @ 2024-09-23 14:18 UTC (permalink / raw)
  To: hpa, bp, dave.hansen, mingo, pbonzini, seanjc, shuah, tglx
  Cc: Ivan Orlov, jalliste, nh-open-source, kvm, linux-kernel,
	linux-kselftest, x86

Update the test_zero_memory_regions test case in set_memory_region_test
to expect the updated exit code when the VM starts with no RAM: we now
issue a triple fault in such a case instead of an internal error.

Signed-off-by: Ivan Orlov <iorlov@amazon.com>
---
 tools/testing/selftests/kvm/set_memory_region_test.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index bb8002084f52..d84d86668932 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -331,7 +331,8 @@ static void test_zero_memory_regions(void)
 
 	vm_ioctl(vm, KVM_SET_NR_MMU_PAGES, (void *)64ul);
 	vcpu_run(vcpu);
-	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_INTERNAL_ERROR);
+	/* No memory at all, we should triple fault */
+	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_SHUTDOWN);
 
 	kvm_vm_free(vm);
 }
-- 
2.43.0


* [PATCH 4/4] selftests: KVM: Add new test for faulty mmio usage
  2024-09-23 14:18 [PATCH 0/4] Process some MMIO-related errors without KVM exit Ivan Orlov
                   ` (2 preceding siblings ...)
  2024-09-23 14:18 ` [PATCH 3/4] selftests: KVM: Change expected exit code in test_zero_memory_regions Ivan Orlov
@ 2024-09-23 14:18 ` Ivan Orlov
  2024-09-23 17:04 ` [PATCH 0/4] Process some MMIO-related errors without KVM exit Sean Christopherson
  4 siblings, 0 replies; 12+ messages in thread
From: Ivan Orlov @ 2024-09-23 14:18 UTC (permalink / raw)
  To: hpa, bp, dave.hansen, mingo, pbonzini, seanjc, shuah, tglx
  Cc: Ivan Orlov, jalliste, nh-open-source, kvm, linux-kernel,
	linux-kselftest, x86

Implement a test which covers "weird" MMIO usage. The test has 4 test
cases:

1) The guest sets the IDT/GDT base to point to an MMIO region. A triple
fault and shutdown are expected there.
2) The guest jumps to an MMIO address. Fetches from MMIO are not
permitted, so a #UD is expected there.
3) The guest sets an IDT entry to point to an MMIO range. MMIO here
happens after event delivery, so a #UD is expected.
4) The guest points the #UD IDT entry to an MMIO range and then causes
a #UD. We should not go into an infinite loop here, as we are
constantly pushing exception info onto the stack and it will eventually
overflow.

These test cases depend on the previous patches in this series.

Signed-off-by: Ivan Orlov <iorlov@amazon.com>
---
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/x86_64/faulty_mmio.c        | 199 ++++++++++++++++++
 2 files changed, 200 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/faulty_mmio.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 0c4b254ab56b..d9928c54e851 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -129,6 +129,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/amx_test
 TEST_GEN_PROGS_x86_64 += x86_64/max_vcpuid_cap_test
 TEST_GEN_PROGS_x86_64 += x86_64/triple_fault_event_test
 TEST_GEN_PROGS_x86_64 += x86_64/recalc_apic_map_test
+TEST_GEN_PROGS_x86_64 += x86_64/faulty_mmio
 TEST_GEN_PROGS_x86_64 += access_tracking_perf_test
 TEST_GEN_PROGS_x86_64 += demand_paging_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
diff --git a/tools/testing/selftests/kvm/x86_64/faulty_mmio.c b/tools/testing/selftests/kvm/x86_64/faulty_mmio.c
new file mode 100644
index 000000000000..b83c1d646696
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/faulty_mmio.c
@@ -0,0 +1,199 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This test covers error processing when doing weird things with MMIO addresses,
+ * e.g. jumping into an MMIO range or specifying it as the IDT / GDT descriptor base.
+ */
+#include <stdio.h>
+#include "kvm_util.h"
+#include "processor.h"
+#include <stdint.h>
+#include <stdlib.h>
+
+#define MMIO_ADDR 0xDEADBEE000UL
+/* This address is not canonical, so any reference will result in #GP */
+#define GP_ADDR 0xDEADBEEFDEADBEEFULL
+
+enum test_desc_type {
+	TEST_DESC_IDT,
+	TEST_DESC_GDT,
+};
+
+static const struct desc_ptr faulty_desc = {
+	.address = MMIO_ADDR,
+	.size = 0xFFF,
+};
+
+static void faulty_desc_guest_code(enum test_desc_type dtype)
+{
+	if (dtype == TEST_DESC_IDT)
+		__asm__ __volatile__("lidt %0"::"m"(faulty_desc));
+	else
+		__asm__ __volatile__("lgdt %0"::"m"(faulty_desc));
+
+	/* Generate a #GP */
+	*((uint8_t *)GP_ADDR) = 0x1;
+
+	/* We should never reach this point */
+	GUEST_ASSERT(0);
+}
+
+/*
+ * This test tries to point the IDT / GDT descriptor to an MMIO range.
+ * This action should cause a triple fault in guest, as it happens when
+ * your descriptors are messed up on the actual hardware.
+ */
+static void test_faulty_desc(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	enum test_desc_type dtype_tests[] = { TEST_DESC_IDT, TEST_DESC_GDT };
+
+	for (i = 0; i < ARRAY_SIZE(dtype_tests); i++) {
+		vm = vm_create_with_one_vcpu(&vcpu, faulty_desc_guest_code);
+		vcpu_args_set(vcpu, 1, dtype_tests[i]);
+		virt_map(vm, MMIO_ADDR, MMIO_ADDR, 1);
+
+		vcpu_run(vcpu);
+		TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_SHUTDOWN);
+		kvm_vm_free(vm);
+	}
+}
+
+static void jump_to_mmio_guest_code(bool write_first)
+{
+	void (*f)(void) = (void *)(MMIO_ADDR);
+
+	if (write_first) {
+		/*
+		 * On VMX, we get different vmexit codes for the first and the second
+		 * access to the MMIO address: the first causes an EPT violation, the
+		 * second an EPT misconfig. Check that we get a #UD in both cases.
+		 */
+		*((char *)MMIO_ADDR) = 0x1;
+	}
+
+	f();
+
+	/* We should never reach this point */
+	GUEST_ASSERT(0);
+}
+
+static void guest_ud_handler(struct ex_regs *regs)
+{
+	GUEST_DONE();
+}
+
+/*
+ * This test tries to jump to an MMIO address. As fetching the instructions
+ * from MMIO is not supported by KVM and doesn't make any practical sense,
+ * KVM should handle it gracefully and inject #UD into guest.
+ */
+static void test_jump_to_mmio(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+	struct kvm_run *run;
+	struct ucall uc;
+	int i;
+
+	bool test_cases_write_first[] = { false, true };
+
+	for (i = 0; i < ARRAY_SIZE(test_cases_write_first); i++) {
+		vm = vm_create_with_one_vcpu(&vcpu, jump_to_mmio_guest_code);
+		virt_map(vm, MMIO_ADDR, MMIO_ADDR, 1);
+		vcpu_args_set(vcpu, 1, test_cases_write_first[i]);
+		vm_install_exception_handler(vm, UD_VECTOR, guest_ud_handler);
+
+		run = vcpu->run;
+
+		vcpu_run(vcpu);
+		if (test_cases_write_first[i] && run->exit_reason == KVM_EXIT_MMIO) {
+			/* Process first MMIO access if required */
+			vcpu_run(vcpu);
+		}
+
+		/* If #UD was injected correctly, our #UD handler will issue UCALL_DONE */
+		TEST_ASSERT_KVM_EXIT_REASON(vcpu, UCALL_EXIT_REASON);
+		TEST_ASSERT(get_ucall(vcpu, &uc) == UCALL_DONE,
+			    "Guest should have gone into #UD handler when jumping to MMIO address, however it didn't happen");
+		kvm_vm_free(vm);
+	}
+}
+
+static void faulty_idte_guest_code(void)
+{
+	/*
+	 * We trigger a #GP here, and as its IDT entry points to an MMIO range,
+	 * we should get a #UD, as instruction fetching from an MMIO address is prohibited
+	 */
+	*((uint8_t *)GP_ADDR) = 0x1;
+
+	/* We should never reach this point */
+	GUEST_ASSERT(0);
+}
+
+/*
+ * When an IDT entry points to an MMIO address, it is handled as a jump to an MMIO
+ * address and should cause a #UD in the guest, as fetches from MMIO are not supported.
+ * It should not cause a triple fault, so we don't expect KVM_EXIT_SHUTDOWN here.
+ */
+static void test_faulty_idte(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+	struct ucall uc;
+
+	vm = vm_create_with_one_vcpu(&vcpu, faulty_idte_guest_code);
+	virt_map(vm, MMIO_ADDR, MMIO_ADDR, 1);
+
+	/* The #GP vector points to an MMIO range; jumping to it will trigger a #UD */
+	vm_install_exception_handler(vm, GP_VECTOR, (void *)MMIO_ADDR);
+	vm_install_exception_handler(vm, UD_VECTOR, guest_ud_handler);
+
+	vcpu_run(vcpu);
+	/* If we reach #UD handler it will issue UCALL_DONE */
+	TEST_ASSERT_KVM_EXIT_REASON(vcpu, UCALL_EXIT_REASON);
+	TEST_ASSERT(get_ucall(vcpu, &uc) == UCALL_DONE,
+		    "Guest should have gone into #UD handler when jumping to MMIO address, however it didn't happen");
+	kvm_vm_free(vm);
+}
+
+static void faulty_ud_idte_guest_code(void)
+{
+	asm("ud2");
+
+	/* We should never reach this point */
+	GUEST_ASSERT(0);
+}
+
+/*
+ * This test checks that we won't hang in the infinite loop if the #UD handler
+ * also causes #UD (as it points to an MMIO address). In this situation, we will
+ * run out of stack eventually, which will cause a triple fault
+ */
+static void test_faulty_ud_handler(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+
+	vm = vm_create_with_one_vcpu(&vcpu, faulty_ud_idte_guest_code);
+	virt_map(vm, MMIO_ADDR, MMIO_ADDR, 1);
+
+	vm_install_exception_handler(vm, UD_VECTOR, (void *)MMIO_ADDR);
+
+	vcpu_run(vcpu);
+	/* The #UD taken when vectoring to the #UD handler should overflow the stack, causing a triple fault */
+	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_SHUTDOWN);
+	kvm_vm_free(vm);
+}
+
+int main(void)
+{
+	test_faulty_desc();
+	test_jump_to_mmio();
+	test_faulty_idte();
+	test_faulty_ud_handler();
+
+	return EXIT_SUCCESS;
+}
-- 
2.43.0


* Re: [PATCH 0/4] Process some MMIO-related errors without KVM exit
  2024-09-23 14:18 [PATCH 0/4] Process some MMIO-related errors without KVM exit Ivan Orlov
                   ` (3 preceding siblings ...)
  2024-09-23 14:18 ` [PATCH 4/4] selftests: KVM: Add new test for faulty mmio usage Ivan Orlov
@ 2024-09-23 17:04 ` Sean Christopherson
  2024-09-23 19:38   ` Allister, Jack
  4 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2024-09-23 17:04 UTC (permalink / raw)
  To: Ivan Orlov
  Cc: hpa, bp, dave.hansen, mingo, pbonzini, shuah, tglx, jalliste,
	nh-open-source, kvm, linux-kernel, linux-kselftest, x86

On Mon, Sep 23, 2024, Ivan Orlov wrote:
> Currently, KVM may return a variety of internal errors to VMM when
> accessing MMIO, and some of them could be gracefully handled on the KVM
> level instead. Moreover, some of the MMIO-related errors are handled
> differently in VMX in comparison with SVM, which produces certain
> inconsistency and should be fixed. This patch series introduces
> KVM-level handling for the following situations:
> 
> 1) Guest is accessing MMIO during event delivery: triple fault instead
> of internal error on VMX and infinite loop on SVM
> 
> 2) Guest fetches an instruction from MMIO: inject #UD and resume guest
> execution without internal error

No.  This is not architectural behavior.  It's not even remotely close to
architectural behavior.  KVM's behavior isn't great, but making up _guest visible_
behavior is not going to happen.

* Re: [PATCH 0/4] Process some MMIO-related errors without KVM exit
  2024-09-23 17:04 ` [PATCH 0/4] Process some MMIO-related errors without KVM exit Sean Christopherson
@ 2024-09-23 19:38   ` Allister, Jack
  2024-09-23 21:46     ` Sean Christopherson
  2024-09-24  7:38     ` Sean Christopherson
  0 siblings, 2 replies; 12+ messages in thread
From: Allister, Jack @ 2024-09-23 19:38 UTC (permalink / raw)
  To: Orlov, Ivan, seanjc@google.com
  Cc: linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	mingo@redhat.com, tglx@linutronix.de, Allister, Jack,
	pbonzini@redhat.com, nh-open-source@amazon.com, shuah@kernel.org,
	kvm@vger.kernel.org, x86@kernel.org

On Mon, 2024-09-23 at 10:04 -0700, Sean Christopherson wrote:
> 
> On Mon, Sep 23, 2024, Ivan Orlov wrote:
> > Currently, KVM may return a variety of internal errors to VMM when
> > accessing MMIO, and some of them could be gracefully handled on the
> > KVM
> > level instead. Moreover, some of the MMIO-related errors are
> > handled
> > differently in VMX in comparison with SVM, which produces certain
> > inconsistency and should be fixed. This patch series introduces
> > KVM-level handling for the following situations:
> > 
> > 1) Guest is accessing MMIO during event delivery: triple fault
> > instead
> > of internal error on VMX and infinite loop on SVM
> > 
> > 2) Guest fetches an instruction from MMIO: inject #UD and resume
> > guest
> > execution without internal error
> 
> No.  This is not architectural behavior.  It's not even remotely
> close to
> architectural behavior.  KVM's behavior isn't great, but making up
> _guest visible_
> behavior is not going to happen.

Is this a no to the whole series or from the cover letter? 

For patch 1 we have observed that if a guest has incorrectly set its
IDT base to point inside of an MMIO region it will result in a triple
fault (bare metal Cascade Lake Intel). Yes, a sane operating system is
not really going to be setting its IDT or GDT base to point into
an MMIO region, but we've seen occurrences. Normally when other
external things have gone horribly wrong.
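
For reference, the failing configuration boils down to something like
the following guest code (a rough sketch; MMIO_ADDR, GP_ADDR and struct
desc_ptr are the definitions used by the selftest in patch 4):

	/* Point the IDT at a GPA that no memslot backs, i.e. MMIO */
	static const struct desc_ptr faulty_idt = {
		.address = MMIO_ADDR,
		.size = 0xFFF,
	};

	__asm__ __volatile__("lidt %0" :: "m"(faulty_idt));

	/*
	 * Trigger any exception, e.g. a #GP via a write to a
	 * non-canonical address: delivering it forces the CPU to fetch
	 * the IDT entry from the MMIO range.
	 */
	*(volatile uint8_t *)GP_ADDR = 0x1;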

Ivan can clarify as to what's been seen on AMD platforms regarding the
infinite loop for patch one. This was also tested on bare metal
hardware. Injection of the #UD within patch 2 may be debatable but I
believe Ivan has some more data from experiments backing this up.

Best regards,
Jack




* Re: [PATCH 0/4] Process some MMIO-related errors without KVM exit
  2024-09-23 19:38   ` Allister, Jack
@ 2024-09-23 21:46     ` Sean Christopherson
  2024-09-24  9:54       ` Ivan Orlov
  2024-09-24  7:38     ` Sean Christopherson
  1 sibling, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2024-09-23 21:46 UTC (permalink / raw)
  To: Jack Allister
  Cc: Ivan Orlov, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	tglx@linutronix.de, pbonzini@redhat.com,
	nh-open-source@amazon.com, shuah@kernel.org, kvm@vger.kernel.org,
	x86@kernel.org

On Mon, Sep 23, 2024, Jack Allister wrote:
> On Mon, 2024-09-23 at 10:04 -0700, Sean Christopherson wrote:
> > 
> > On Mon, Sep 23, 2024, Ivan Orlov wrote:
> > > Currently, KVM may return a variety of internal errors to VMM when
> > > accessing MMIO, and some of them could be gracefully handled on the
> > > KVM
> > > level instead. Moreover, some of the MMIO-related errors are
> > > handled
> > > differently in VMX in comparison with SVM, which produces certain
> > > inconsistency and should be fixed. This patch series introduces
> > > KVM-level handling for the following situations:
> > > 
> > > 1) Guest is accessing MMIO during event delivery: triple fault
> > > instead
> > > of internal error on VMX and infinite loop on SVM
> > > 
> > > 2) Guest fetches an instruction from MMIO: inject #UD and resume
> > > guest
> > > execution without internal error
> > 
> > No.  This is not architectural behavior.  It's not even remotely
> > close to
> > architectural behavior.  KVM's behavior isn't great, but making up
> > _guest visible_
> > behavior is not going to happen.
> 
> Is this a no to the whole series or from the cover letter? 

The whole series.

> For patch 1 we have observed that if a guest has incorrectly set its
> IDT base to point inside of an MMIO region it will result in a triple
> fault (bare metal Cascade Lake Intel).

That happens because the IDT is garbage and/or the CPU is getting master abort
semantics back, not because anything in the x86 architecture says that accessing
MMIO during exception vectoring goes straight to shutdown.

> Yes, a sane operating system is not really going to be setting its IDT
> or GDT base to point into an MMIO region, but we've seen occurrences.
> Normally when other external things have gone horribly wrong.
> 
> Ivan can clarify as to what's been seen on AMD platforms regarding the
> infinite loop for patch one. This was also tested on bare metal
> hardware. Injection of the #UD within patch 2 may be debatable but I
> believe Ivan has some more data from experiments backing this up.

I have no problems improving KVM's handling of scenarios that KVM can't emulate,
but there needs to be reasonable justification for taking on complexity, and KVM
must not make up guest visible behavior.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] Process some MMIO-related errors without KVM exit
  2024-09-23 19:38   ` Allister, Jack
  2024-09-23 21:46     ` Sean Christopherson
@ 2024-09-24  7:38     ` Sean Christopherson
  1 sibling, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2024-09-24  7:38 UTC (permalink / raw)
  To: Jack Allister
  Cc: Ivan Orlov, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	tglx@linutronix.de, pbonzini@redhat.com,
	nh-open-source@amazon.com, shuah@kernel.org, kvm@vger.kernel.org,
	x86@kernel.org

On Mon, Sep 23, 2024, Jack Allister wrote:
> On Mon, 2024-09-23 at 10:04 -0700, Sean Christopherson wrote:
> > 
> > On Mon, Sep 23, 2024, Ivan Orlov wrote:
> > > Currently, KVM may return a variety of internal errors to VMM when
> > > accessing MMIO, and some of them could be gracefully handled on the
> > > KVM
> > > level instead. Moreover, some of the MMIO-related errors are
> > > handled
> > > differently in VMX in comparison with SVM, which produces certain
> > > inconsistency and should be fixed. This patch series introduces
> > > KVM-level handling for the following situations:
> > > 
> > > 1) Guest is accessing MMIO during event delivery: triple fault
> > > instead
> > > of internal error on VMX and infinite loop on SVM
> > > 
> > > 2) Guest fetches an instruction from MMIO: inject #UD and resume
> > > guest
> > > execution without internal error
> > 
> > No.  This is not architectural behavior.  It's not even remotely close to
> > architectural behavior.  KVM's behavior isn't great, but making up _guest
> > visible_ behavior is not going to happen.
> 
> Is this a no to the whole series or from the cover letter? 

The whole series.

> For patch 1 we have observed that if a guest has incorrectly set its
> IDT base to point inside of an MMIO region it will result in a triple
> fault (bare metal Cascade Lake Intel).

The triple fault occurs because the MMIO read returns garbage, e.g. because it
gets back master abort semantics.

> Yes, a sane operating system is not really going to be setting its IDT
> or GDT base to point into an MMIO region, but we've seen occurrences.

Sure, but that doesn't make it architecturally correct to synthesize arbitrary
faults.

> Normally when other external things have gone horribly wrong.
> 
> Ivan can clarify as to what's been seen on AMD platforms regarding the
> infinite loop for patch one.

So it sounds like what you really want to do is not put the vCPU into an infinite
loop.  Have you tried kvm/next or kvm-x86/next, which have fixes for infinite
loops on TDP faults?  Specifically, these commits:

  98a69b96caca3e07aff57ca91fd7cc3a3853871a KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
  d859b16161c81ee929b7b02a85227b8e3250bc97 KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
  6b3dcabc10911711eba15816d808e2a18f130406 KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
  2876624e1adcd9a3a3ffa8c4fe3bf8dbba969d95 KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
  4df685664bed04794ad72b58d8af1fa4fcc60261 KVM: x86: Update retry protection fields when forcing retry on emulation failure
  dabc4ff70c35756bc107bc5d035d0f0746396a9a KVM: x86: Apply retry protection to "unprotect on failure" path
  19ab2c8be070160be70a88027b3b93106fef7b89 KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
  620525739521376a65a690df899e1596d56791f8 KVM: x86: Remove manual pfn lookup when retrying #PF after failed emulation
  b299c273c06f005976cdc1b9e9299d492527607e KVM: x86/mmu: Move event re-injection unprotect+retry into common path
  29e495bdf847ac6ad0e0d03e5db39a3ed9f12858 KVM: x86/mmu: Always walk guest PTEs with WRITE access when unprotecting
  b7e948898e772ac900950c0dac4ca90e905cd0c0 KVM: x86/mmu: Don't try to unprotect an INVALID_GPA
  2df354e37c1398a85bb43cbbf1f913eb3f91d035 KVM: x86: Fold retry_instruction() into x86_emulate_instruction()
  41e6e367d576ce1801dc5c2b106e14cde35e3c80 KVM: x86: Move EMULTYPE_ALLOW_RETRY_PF to x86_emulate_instruction()
  dfaae8447c53819749cf3ba10ce24d3c609752e3 KVM: x86/mmu: Try "unprotect for retry" iff there are indirect SPs
  01dd4d319207c4cfd51a1c9a1812909e944d8c86 KVM: x86/mmu: Apply retry protection to "fast nTDP unprotect" path
  9c19129e535bfff85bdfcb5a804e19e5aae935b2 KVM: x86: Store gpa as gpa_t, not unsigned long, when unprotecting for retry
  019f3f84a40c88b68ca4d455306b92c20733e784 KVM: x86: Get RIP from vCPU state when storing it to last_retry_eip
  c1edcc41c3603c65f34000ae031a20971f4e56f9 KVM: x86: Retry to-be-emulated insn in "slow" unprotect path iff sp is zapped
  2fb2b7877b3a4cac4de070ef92437b38f13559b0 KVM: x86/mmu: Skip emulation on page fault iff 1+ SPs were unprotected
  989a84c93f592e6b288fb3b96d2eeec827d75bef KVM: x86/mmu: Trigger unprotect logic only on write-protection page faults
  4ececec19a0914873634ad69bbaca5557c33e855 KVM: x86/mmu: Replace PFERR_NESTED_GUEST_PAGE with a more descriptive helper

> This was also tested on bare metal hardware. Injection of the #UD within
> patch 2 may be debatable but I believe Ivan has some more data from
> experiments backing this up.

Heh, it's not debatable.  Fetching from MMIO is perfectly legal.  Again, any #UD
you see on bare metal is all but guaranteed to be due to fetching garbage.

* Re: [PATCH 0/4] Process some MMIO-related errors without KVM exit
  2024-09-23 21:46     ` Sean Christopherson
@ 2024-09-24  9:54       ` Ivan Orlov
  2024-09-26  0:06         ` Sean Christopherson
  0 siblings, 1 reply; 12+ messages in thread
From: Ivan Orlov @ 2024-09-24  9:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jack Allister, Ivan Orlov, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	tglx@linutronix.de, pbonzini@redhat.com,
	nh-open-source@amazon.com, shuah@kernel.org, kvm@vger.kernel.org,
	x86@kernel.org

On Mon, Sep 23, 2024 at 02:46:17PM -0700, Sean Christopherson wrote:
> > >
> > > No.  This is not architectural behavior.  It's not even remotely
> > > close to
> > > architectural behavior.  KVM's behavior isn't great, but making up
> > > _guest visible_
> > > behavior is not going to happen.
> >
> > Is this a no to the whole series or from the cover letter?
> 
> The whole series.
> 
> > For patch 1 we have observed that if a guest has incorrectly set its
> > IDT base to point inside of an MMIO region it will result in a triple
> > fault (bare metal Cascade Lake Intel).
> 
> That happens because the IDT is garbage and/or the CPU is getting master abort
> semantics back, not because anything in the x86 architecture says that accessing
> MMIO during exception vectoring goes straight to shutdown.
>

Hi Sean, thank you for the detailed reply.

Yes, I ran the reproducer on my AMD Ryzen 5 once again, and it seems like
pointing the IDT descriptor base to a framebuffer works perfectly fine,
without any triple faults, so injecting a triple fault into the guest is
not a correct solution.

However, I believe KVM should demonstrate consistent behaviour in
handling MMIO during event delivery on VMX/SVM: either return an event
delivery error in both cases or go into an infinite loop in both cases.
I'm going to test kvm/next with the commits you mentioned, and send a
fix if it still hits an infinite loop on SVM.

Also, since detecting such an issue at the KVM level doesn't introduce
much complexity, I reckon returning a sane error flag would be nice. For
instance, we could set one of the 'internal.data' elements to identify
that the problem occurred due to MMIO during event delivery.

> > Yes, a sane operating system is not really going to be setting its IDT
> > or GDT base to point into an MMIO region, but we've seen occurrences.
> > Normally when other external things have gone horribly wrong.
> >
> > Ivan can clarify as to what's been seen on AMD platforms regarding the
> > infinite loop for patch one. This was also tested on bare metal
> > hardware. Injection of the #UD within patch 2 may be debatable but I
> > believe Ivan has some more data from experiments backing this up.
> 
> I have no problems improving KVM's handling of scenarios that KVM can't emulate,
> but there needs to be reasonable justification for taking on complexity, and KVM
> must not make up guest visible behavior.

Regarding the #UD-related change: the way I formulated it in the
cover letter and the patch is confusing, sorry. We are _already_ enqueuing
a #UD when fetching from MMIO, so I believe it is already architecturally
incorrect (see handle_emulation_failure in arch/x86/kvm/x86.c). However,
we return an emulation failure after that, and I believe how we do this
is debatable. I maintain we should either set a flag in emulation_failure.flags
to indicate that the error happened due to a fetch from MMIO (to give more
information to the VMM), or we shouldn't return an error at all... Maybe it
should be KVM_EXIT_MMIO with some flag set? What do you think?

Thank you!

Kind regards,
Ivan Orlov

* Re: [PATCH 0/4] Process some MMIO-related errors without KVM exit
  2024-09-24  9:54       ` Ivan Orlov
@ 2024-09-26  0:06         ` Sean Christopherson
  2024-09-27 12:13           ` Ivan Orlov
  0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2024-09-26  0:06 UTC (permalink / raw)
  To: Ivan Orlov
  Cc: Jack Allister, Ivan Orlov, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	tglx@linutronix.de, pbonzini@redhat.com,
	nh-open-source@amazon.com, shuah@kernel.org, kvm@vger.kernel.org,
	x86@kernel.org

On Tue, Sep 24, 2024, Ivan Orlov wrote:
> On Mon, Sep 23, 2024 at 02:46:17PM -0700, Sean Christopherson wrote:
> > > >
> > > > No.  This is not architectural behavior.  It's not even remotely
> > > > close to
> > > > architectural behavior.  KVM's behavior isn't great, but making up
> > > > _guest visible_
> > > > behavior is not going to happen.
> > >
> > > Is this a no to the whole series or from the cover letter?
> > 
> > The whole series.
> > 
> > > For patch 1 we have observed that if a guest has incorrectly set its
> > > IDT base to point inside of an MMIO region it will result in a triple
> > > fault (bare metal Cascade Lake Intel).
> > 
> > That happens because the IDT is garbage and/or the CPU is getting master abort
> > semantics back, not because anything in the x86 architecture says that accessing
> > MMIO during exception vectoring goes straight to shutdown.
> >
> 
> Hi Sean, thank you for the detailed reply.
> 
> Yes, I ran the reproducer on my AMD Ryzen 5 once again, and it seems like
> pointing the IDT descriptor base to a framebuffer works perfectly fine,
> without any triple faults, so injecting it into guest is not a correct
> solution.
> 
> However, I believe KVM should demonstrate consistent behaviour in
> handling MMIO during event delivery on vmx/svm, so either by returning
> an event delivery error in both cases or going into infinite loop in
> both cases. I'm going to test the kvm/next with the commits you
> mentioned, and send a fix if it still hits an infinite loop on SVM.
> 
> Also, I reckon as detecting such an issue on the KVM level doesn't
> introduce much complexity, returning a sane error flag would be nice. For
> instance, we could set one of the 'internal.data' elements to identify
> that the problem occured due to MMIO during event delivery
> 
> > > Yes, a sane operating system is not really going to be setting its IDT
> > > or GDT base to point into an MMIO region, but we've seen occurrences.
> > > Normally when other external things have gone horribly wrong.
> > >
> > > Ivan can clarify as to what's been seen on AMD platforms regarding the
> > > infinite loop for patch one. This was also tested on bare metal
> > > hardware. Injection of the #UD within patch 2 may be debatable but I
> > > believe Ivan has some more data from experiments backing this up.
> > 
> > I have no problems improving KVM's handling of scenarios that KVM can't emulate,
> > but there needs to be reasonable justification for taking on complexity, and KVM
> > must not make up guest visible behavior.
> 
> Regarding the #UD-related change: the way I formulated it in the
> cover letter and the patch is confusing, sorry. We are _already_ enqueuing
> a #UD when fetching from MMIO, so I believe it is already architecturally
> incorrect (see handle_emulation_failure in arch/x86/kvm/x86.c). However,
> we return an emulation failure after that,

Yeah, but only because the alternative sucks worse.  If KVM unconditionally exited
with an emulation error, then unsuspecting (read: old) VMMs would likely terminate
the guest, which gives guest userspace a way to DoS the entire VM, especially on
older CPUs where KVM needs to emulate much more often.

	if (kvm->arch.exit_on_emulation_error ||
	    (emulation_type & EMULTYPE_SKIP)) {
		prepare_emulation_ctxt_failure_exit(vcpu);
		return 0;
	}

	kvm_queue_exception(vcpu, UD_VECTOR);

	if (!is_guest_mode(vcpu) && kvm_x86_call(get_cpl)(vcpu) == 0) {
		prepare_emulation_ctxt_failure_exit(vcpu);
		return 0;
	}

	return 1;

And that's exactly why KVM_CAP_EXIT_ON_EMULATION_FAILURE exists.  VMMs that know
they won't unintentionally give guest userspace what amounts to a privilege
escalation can trap the emulation failure, do some logging or whatever, and then
take whatever action it wants to take.
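
E.g. the userspace side is just a few lines; a rough sketch, where
vm_fd and run are the VMM's usual VM fd and mapped kvm_run struct, and
error handling is omitted:

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_EXIT_ON_EMULATION_FAILURE,
		.args[0] = 1,
	};

	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	/* ... later, in the vCPU run loop ... */
	case KVM_EXIT_INTERNAL_ERROR:
		if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION &&
		    (run->emulation_failure.flags &
		     KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES)) {
			/*
			 * run->emulation_failure.insn_bytes holds up to
			 * insn_size bytes of the instruction that failed to
			 * emulate; log it, emulate it in userspace, or kill
			 * the guest, per the VMM's policy.
			 */
		}
		break;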

> and I believe how we do this
> is debatable. I maintain we should either set a flag in emulation_failure.flags
> to indicate that the error happened due to a fetch from MMIO (to give more
> information to the VMM),

Generally speaking, I'm not opposed to adding more information along those lines.
Though realistically, I don't know that an extra flag is warranted in this case,
as it shouldn't be _that_ hard for userspace to deduce what went wrong, especially
if KVM_TRANSLATE2[*] lands (though I'm somewhat curious as to why QEMU doesn't do
the page walks itself).

[*] https://lore.kernel.org/all/20240910152207.38974-1-nikwip@amazon.de

> or we shouldn't return an error at all... Maybe it should be KVM_EXIT_MMIO with
> some flag set? What do you think?

It'd be a breaking change and added complexity, for no benefit as far as I can
tell.  KVM_EXIT_INTERNAL_ERROR is _not_ a death sentence, or at least it doesn't
have to be.  Most VMMs do terminate the guest, but nothing is stopping userspace
from grabbing RIP and emulating the instruction.  I.e. userspace doesn't need
"permission" in the form of KVM_EXIT_MMIO to try and keep the guest alive.

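E.g. (sketch, with vcpu_fd being the vCPU's fd):

	struct kvm_regs regs;

	ioctl(vcpu_fd, KVM_GET_REGS, &regs);
	/*
	 * regs.rip points at the instruction KVM failed to emulate;
	 * combined with the insn_bytes from the emulation_failure exit
	 * above, userspace has everything it needs to emulate the
	 * access itself and resume the guest instead of terminating it.
	 */
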
* Re: [PATCH 0/4] Process some MMIO-related errors without KVM exit
  2024-09-26  0:06         ` Sean Christopherson
@ 2024-09-27 12:13           ` Ivan Orlov
  0 siblings, 0 replies; 12+ messages in thread
From: Ivan Orlov @ 2024-09-27 12:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jack Allister, Ivan Orlov, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	tglx@linutronix.de, pbonzini@redhat.com,
	nh-open-source@amazon.com, shuah@kernel.org, kvm@vger.kernel.org,
	x86@kernel.org

On Wed, Sep 25, 2024 at 05:06:45PM -0700, Sean Christopherson wrote:
> 
> Yeah, but only because the alternative sucks worse.  If KVM unconditionally exited
> with an emulation error, then unsuspecting (read: old) VMMs would likely terminate
> the guest, which gives guest userspace a way to DoS the entire VM, especially on
> older CPUs where KVM needs to emulate much more often.
> 
>         if (kvm->arch.exit_on_emulation_error ||
>             (emulation_type & EMULTYPE_SKIP)) {
>                 prepare_emulation_ctxt_failure_exit(vcpu);
>                 return 0;
>         }
> 
>         kvm_queue_exception(vcpu, UD_VECTOR);
> 
>         if (!is_guest_mode(vcpu) && kvm_x86_call(get_cpl)(vcpu) == 0) {
>                 prepare_emulation_ctxt_failure_exit(vcpu);
>                 return 0;
>         }
> 
>         return 1;
> 
> And that's exactly why KVM_CAP_EXIT_ON_EMULATION_FAILURE exists.  VMMs that know
> they won't unintentionally give guest userspace what amounts to a privilege
> escalation can trap the emulation failure, do some logging or whatever, and then
> take whatever action it wants to take.
> 

Hi Sean,

Makes sense, thank you for the explanation.

> > and I believe how we do this
> > is debatable. I maintain we should either set a flag in emulation_failure.flags
> > to indicate that the error happened due to fetch from mmio (to give more
> > information to VMM),
> 
> Generally speaking, I'm not opposed to adding more information along those lines.
> Though realistically, I don't know that an extra flag is warranted in this case,
> as it shouldn't be _that_ hard for userspace to deduce what went wrong, especially
> if KVM_TRANSLATE2[*] lands (though I'm somewhat curious as to why QEMU doesn't do
> the page walks itself).
> 
> [*] https://lore.kernel.org/all/20240910152207.38974-1-nikwip@amazon.de
> 

Fair enough, but I still believe that it would be good to provide more
information about the failure to the VMM (considering that KVM tries to
emulate the instruction anyway, adding a flag won't introduce any
performance overhead). I'll think about how we could do that without
being redundant :)

> > or we shouldn't return an error at all... Maybe it should be KVM_EXIT_MMIO with
> > some flag set? What do you think?
> 
> It'd be a breaking change and added complexity, for no benefit as far as I can
> tell.  KVM_EXIT_INTERNAL_ERROR is _not_ a death sentence, or at least it doesn't
> have to be.  Most VMMs do terminate the guest, but nothing is stopping userspace
> from grabbing RIP and emulating the instruction.  I.e. userspace doesn't need
> "permission" in the form of KVM_EXIT_MMIO to try and keep the guest alive.

Yeah, I just thought that "internal error" is not the best exit code for
situations when the guest fetches from MMIO (since that is a perfectly
legal operation from the architectural point of view). But I agree that
it would be a breaking change without functional benefit (especially if
we provide a flag about what happened :) ).

P.S. I tested the latest kvm/next, and if we set the IDT descriptor base
to an MMIO address it still falls into an infinite loop on SVM. I'm going
to send a fix in the next couple of days.

Kind regards,
Ivan Orlov
