public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] KVM: x86: Accelerate reading CR3 for guest debug
@ 2025-11-21 19:32 Yosry Ahmed
  2025-11-21 19:32 ` [PATCH 1/3] KVM: x86: Add CR3 to guest debug info Yosry Ahmed
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Yosry Ahmed @ 2025-11-21 19:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Ken Hofsass, kvm, linux-kernel, Yosry Ahmed

Some guest debuggers use the value of CR3 to attribute a debug event to
a guest process. Providing CR3 in the debug info makes this
significantly faster than doing KVM_GET_SREGS every time, so add support
for that. Also extend the debug_regs selftest to cover this.

Yosry Ahmed (3):
  KVM: x86: Add CR3 to guest debug info
  KVM: selftests: Use TEST_ASSERT_EQ() in debug_regs
  KVM: selftests: Verify CR3 in debug_regs

 arch/x86/include/uapi/asm/kvm.h              |  1 +
 arch/x86/kvm/svm/svm.c                       |  2 +
 arch/x86/kvm/vmx/vmx.c                       |  2 +
 arch/x86/kvm/x86.c                           |  3 +
 include/uapi/linux/kvm.h                     |  1 +
 tools/testing/selftests/kvm/x86/debug_regs.c | 82 ++++++++++----------
 6 files changed, 48 insertions(+), 43 deletions(-)


base-commit: d209f1ea367750edfcba7db8d199a856e4186511
-- 
2.52.0.rc2.455.g230fcf2819-goog


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] KVM: x86: Add CR3 to guest debug info
  2025-11-21 19:32 [PATCH 0/3] KVM: x86: Accelerate reading CR3 for guest debug Yosry Ahmed
@ 2025-11-21 19:32 ` Yosry Ahmed
  2025-11-21 21:01   ` Sean Christopherson
  2025-11-21 19:32 ` [PATCH 2/3] KVM: selftests: Use TEST_ASSERT_EQ() in debug_regs Yosry Ahmed
  2025-11-21 19:32 ` [PATCH 3/3] KVM: selftests: Verify CR3 " Yosry Ahmed
  2 siblings, 1 reply; 8+ messages in thread
From: Yosry Ahmed @ 2025-11-21 19:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Ken Hofsass, kvm, linux-kernel, Yosry Ahmed

Add the value of CR3 to the information returned to userspace on
KVM_EXIT_DEBUG. Use KVM_CAP_X86_GUEST_DEBUG_CR3 to advertise this.

During guest debugging, the value of CR3 can be used by VM debuggers to
(roughly) identify the process running in the guest. This can be used to
index debugging events by process, or filter events from some processes
and quickly skip them.

Currently, debuggers would need to use the KVM_GET_SREGS ioctl on every
event to get the value of CR3, which considerably slows things down.
This can be easily avoided by adding the value of CR3 to the captured
debugging info.

Signed-off-by: Ken Hofsass <hofsass@google.com>
Co-developed-by: Ken Hofsass <hofsass@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/include/uapi/asm/kvm.h | 1 +
 arch/x86/kvm/svm/svm.c          | 2 ++
 arch/x86/kvm/vmx/vmx.c          | 2 ++
 arch/x86/kvm/x86.c              | 3 +++
 include/uapi/linux/kvm.h        | 1 +
 5 files changed, 9 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 7ceff6583652..c351e458189b 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -293,6 +293,7 @@ struct kvm_debug_exit_arch {
 	__u64 pc;
 	__u64 dr6;
 	__u64 dr7;
+	__u64 cr3;
 };
 
 #define KVM_GUESTDBG_USE_SW_BP		0x00010000
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f56c2d895011..85982e96b927 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1920,6 +1920,7 @@ static int db_interception(struct kvm_vcpu *vcpu)
 		kvm_run->debug.arch.dr7 = svm->vmcb->save.dr7;
 		kvm_run->debug.arch.pc =
 			svm->vmcb->save.cs.base + svm->vmcb->save.rip;
+		kvm_run->debug.arch.cr3 = svm->vmcb->save.cr3;
 		kvm_run->debug.arch.exception = DB_VECTOR;
 		return 0;
 	}
@@ -1934,6 +1935,7 @@ static int bp_interception(struct kvm_vcpu *vcpu)
 
 	kvm_run->exit_reason = KVM_EXIT_DEBUG;
 	kvm_run->debug.arch.pc = svm->vmcb->save.cs.base + svm->vmcb->save.rip;
+	kvm_run->debug.arch.cr3 = svm->vmcb->save.cr3;
 	kvm_run->debug.arch.exception = BP_VECTOR;
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4cbe8c84b636..9fd8a6140af4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5454,6 +5454,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		kvm_run->exit_reason = KVM_EXIT_DEBUG;
 		kvm_run->debug.arch.pc = kvm_get_linear_rip(vcpu);
 		kvm_run->debug.arch.exception = ex_no;
+		kvm_run->debug.arch.cr3 = kvm_read_cr3(vcpu);
 		break;
 	case AC_VECTOR:
 		if (vmx_guest_inject_ac(vcpu)) {
@@ -5685,6 +5686,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
 			vcpu->run->debug.arch.dr6 = DR6_BD | DR6_ACTIVE_LOW;
 			vcpu->run->debug.arch.dr7 = dr7;
 			vcpu->run->debug.arch.pc = kvm_get_linear_rip(vcpu);
+			vcpu->run->debug.arch.cr3 = kvm_read_cr3(vcpu);
 			vcpu->run->debug.arch.exception = DB_VECTOR;
 			vcpu->run->exit_reason = KVM_EXIT_DEBUG;
 			return 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c6d899d53dd..636ad63ac34a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4855,6 +4855,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_MEMORY_FAULT_INFO:
 	case KVM_CAP_X86_GUEST_MODE:
 	case KVM_CAP_ONE_REG:
+	case KVM_CAP_X86_GUEST_DEBUG_CR3:
 		r = 1;
 		break;
 	case KVM_CAP_PRE_FAULT_MEMORY:
@@ -9195,6 +9196,7 @@ static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu)
 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
 		kvm_run->debug.arch.dr6 = DR6_BS | DR6_ACTIVE_LOW;
 		kvm_run->debug.arch.pc = kvm_get_linear_rip(vcpu);
+		kvm_run->debug.arch.cr3 = kvm_read_cr3(vcpu);
 		kvm_run->debug.arch.exception = DB_VECTOR;
 		kvm_run->exit_reason = KVM_EXIT_DEBUG;
 		return 0;
@@ -9277,6 +9279,7 @@ static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu,
 		if (dr6 != 0) {
 			kvm_run->debug.arch.dr6 = dr6 | DR6_ACTIVE_LOW;
 			kvm_run->debug.arch.pc = eip;
+			kvm_run->debug.arch.cr3 = kvm_read_cr3(vcpu);
 			kvm_run->debug.arch.exception = DB_VECTOR;
 			kvm_run->exit_reason = KVM_EXIT_DEBUG;
 			*r = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 52f6000ab020..58842b74fca1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -963,6 +963,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_RISCV_MP_STATE_RESET 242
 #define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
 #define KVM_CAP_GUEST_MEMFD_FLAGS 244
+#define KVM_CAP_X86_GUEST_DEBUG_CR3 245
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
-- 
2.52.0.rc2.455.g230fcf2819-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3] KVM: selftests: Use TEST_ASSERT_EQ() in debug_regs
  2025-11-21 19:32 [PATCH 0/3] KVM: x86: Accelerate reading CR3 for guest debug Yosry Ahmed
  2025-11-21 19:32 ` [PATCH 1/3] KVM: x86: Add CR3 to guest debug info Yosry Ahmed
@ 2025-11-21 19:32 ` Yosry Ahmed
  2025-11-21 19:32 ` [PATCH 3/3] KVM: selftests: Verify CR3 " Yosry Ahmed
  2 siblings, 0 replies; 8+ messages in thread
From: Yosry Ahmed @ 2025-11-21 19:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Ken Hofsass, kvm, linux-kernel, Yosry Ahmed

The test currently uses calls of TEST_ASSERT() that combine all checked
conditions for every test case. This makes it unclear which value is
incorrect without visually parsing the output.

The only useful content in the error message is the test case,
especially for test cases running in loops where even the line number of
the failed assertion does not provide full information.

Switch to using TEST_ASSERT_EQ(), and print the test case currently
being asserted to keep the information intact. The test output is a lot
more verbose now, but debuggability trumps conciseness (right?).

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 tools/testing/selftests/kvm/x86/debug_regs.c | 66 +++++++-------------
 1 file changed, 24 insertions(+), 42 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86/debug_regs.c b/tools/testing/selftests/kvm/x86/debug_regs.c
index 2d814c1d1dc4..563e52217cdd 100644
--- a/tools/testing/selftests/kvm/x86/debug_regs.c
+++ b/tools/testing/selftests/kvm/x86/debug_regs.c
@@ -104,20 +104,19 @@ int main(void)
 	run = vcpu->run;
 
 	/* Test software BPs - int3 */
+	pr_info("Testing INT3\n");
 	memset(&debug, 0, sizeof(debug));
 	debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;
 	vcpu_guest_debug_set(vcpu, &debug);
 	vcpu_run(vcpu);
-	TEST_ASSERT(run->exit_reason == KVM_EXIT_DEBUG &&
-		    run->debug.arch.exception == BP_VECTOR &&
-		    run->debug.arch.pc == CAST_TO_RIP(sw_bp),
-		    "INT3: exit %d exception %d rip 0x%llx (should be 0x%llx)",
-		    run->exit_reason, run->debug.arch.exception,
-		    run->debug.arch.pc, CAST_TO_RIP(sw_bp));
+	TEST_ASSERT_EQ(run->exit_reason, KVM_EXIT_DEBUG);
+	TEST_ASSERT_EQ(run->debug.arch.exception, BP_VECTOR);
+	TEST_ASSERT_EQ(run->debug.arch.pc, CAST_TO_RIP(sw_bp));
 	vcpu_skip_insn(vcpu, 1);
 
 	/* Test instruction HW BP over DR[0-3] */
 	for (i = 0; i < 4; i++) {
+		pr_info("Testing INS_HW_BP DR[%d]\n", i);
 		memset(&debug, 0, sizeof(debug));
 		debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_HW_BP;
 		debug.arch.debugreg[i] = CAST_TO_RIP(hw_bp);
@@ -125,21 +124,17 @@ int main(void)
 		vcpu_guest_debug_set(vcpu, &debug);
 		vcpu_run(vcpu);
 		target_dr6 = 0xffff0ff0 | (1UL << i);
-		TEST_ASSERT(run->exit_reason == KVM_EXIT_DEBUG &&
-			    run->debug.arch.exception == DB_VECTOR &&
-			    run->debug.arch.pc == CAST_TO_RIP(hw_bp) &&
-			    run->debug.arch.dr6 == target_dr6,
-			    "INS_HW_BP (DR%d): exit %d exception %d rip 0x%llx "
-			    "(should be 0x%llx) dr6 0x%llx (should be 0x%llx)",
-			    i, run->exit_reason, run->debug.arch.exception,
-			    run->debug.arch.pc, CAST_TO_RIP(hw_bp),
-			    run->debug.arch.dr6, target_dr6);
+		TEST_ASSERT_EQ(run->exit_reason, KVM_EXIT_DEBUG);
+		TEST_ASSERT_EQ(run->debug.arch.exception, DB_VECTOR);
+		TEST_ASSERT_EQ(run->debug.arch.pc, CAST_TO_RIP(hw_bp));
+		TEST_ASSERT_EQ(run->debug.arch.dr6, target_dr6);
 	}
 	/* Skip "nop" */
 	vcpu_skip_insn(vcpu, 1);
 
 	/* Test data access HW BP over DR[0-3] */
 	for (i = 0; i < 4; i++) {
+		pr_info("Testing DATA_HW_BP DR[%d]\n", i);
 		memset(&debug, 0, sizeof(debug));
 		debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_HW_BP;
 		debug.arch.debugreg[i] = CAST_TO_RIP(guest_value);
@@ -148,15 +143,10 @@ int main(void)
 		vcpu_guest_debug_set(vcpu, &debug);
 		vcpu_run(vcpu);
 		target_dr6 = 0xffff0ff0 | (1UL << i);
-		TEST_ASSERT(run->exit_reason == KVM_EXIT_DEBUG &&
-			    run->debug.arch.exception == DB_VECTOR &&
-			    run->debug.arch.pc == CAST_TO_RIP(write_data) &&
-			    run->debug.arch.dr6 == target_dr6,
-			    "DATA_HW_BP (DR%d): exit %d exception %d rip 0x%llx "
-			    "(should be 0x%llx) dr6 0x%llx (should be 0x%llx)",
-			    i, run->exit_reason, run->debug.arch.exception,
-			    run->debug.arch.pc, CAST_TO_RIP(write_data),
-			    run->debug.arch.dr6, target_dr6);
+		TEST_ASSERT_EQ(run->exit_reason, KVM_EXIT_DEBUG);
+		TEST_ASSERT_EQ(run->debug.arch.exception, DB_VECTOR);
+		TEST_ASSERT_EQ(run->debug.arch.pc, CAST_TO_RIP(write_data));
+		TEST_ASSERT_EQ(run->debug.arch.dr6, target_dr6);
 		/* Rollback the 4-bytes "mov" */
 		vcpu_skip_insn(vcpu, -7);
 	}
@@ -167,6 +157,7 @@ int main(void)
 	target_rip = CAST_TO_RIP(ss_start);
 	target_dr6 = 0xffff4ff0ULL;
 	for (i = 0; i < ARRAY_SIZE(ss_size); i++) {
+		pr_info("Testing SINGLE_STEP (%d)\n", i);
 		target_rip += ss_size[i];
 		memset(&debug, 0, sizeof(debug));
 		debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP |
@@ -174,33 +165,24 @@ int main(void)
 		debug.arch.debugreg[7] = 0x00000400;
 		vcpu_guest_debug_set(vcpu, &debug);
 		vcpu_run(vcpu);
-		TEST_ASSERT(run->exit_reason == KVM_EXIT_DEBUG &&
-			    run->debug.arch.exception == DB_VECTOR &&
-			    run->debug.arch.pc == target_rip &&
-			    run->debug.arch.dr6 == target_dr6,
-			    "SINGLE_STEP[%d]: exit %d exception %d rip 0x%llx "
-			    "(should be 0x%llx) dr6 0x%llx (should be 0x%llx)",
-			    i, run->exit_reason, run->debug.arch.exception,
-			    run->debug.arch.pc, target_rip, run->debug.arch.dr6,
-			    target_dr6);
+		TEST_ASSERT_EQ(run->exit_reason, KVM_EXIT_DEBUG);
+		TEST_ASSERT_EQ(run->debug.arch.exception, DB_VECTOR);
+		TEST_ASSERT_EQ(run->debug.arch.pc, target_rip);
+		TEST_ASSERT_EQ(run->debug.arch.dr6, target_dr6);
 	}
 
 	/* Finally test global disable */
+	pr_info("Testing DR7.GD\n");
 	memset(&debug, 0, sizeof(debug));
 	debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_HW_BP;
 	debug.arch.debugreg[7] = 0x400 | DR7_GD;
 	vcpu_guest_debug_set(vcpu, &debug);
 	vcpu_run(vcpu);
 	target_dr6 = 0xffff0ff0 | DR6_BD;
-	TEST_ASSERT(run->exit_reason == KVM_EXIT_DEBUG &&
-		    run->debug.arch.exception == DB_VECTOR &&
-		    run->debug.arch.pc == CAST_TO_RIP(bd_start) &&
-		    run->debug.arch.dr6 == target_dr6,
-			    "DR7.GD: exit %d exception %d rip 0x%llx "
-			    "(should be 0x%llx) dr6 0x%llx (should be 0x%llx)",
-			    run->exit_reason, run->debug.arch.exception,
-			    run->debug.arch.pc, target_rip, run->debug.arch.dr6,
-			    target_dr6);
+	TEST_ASSERT_EQ(run->exit_reason, KVM_EXIT_DEBUG);
+	TEST_ASSERT_EQ(run->debug.arch.exception, DB_VECTOR);
+	TEST_ASSERT_EQ(run->debug.arch.pc, CAST_TO_RIP(bd_start));
+	TEST_ASSERT_EQ(run->debug.arch.dr6, target_dr6);
 
 	/* Disable all debug controls, run to the end */
 	memset(&debug, 0, sizeof(debug));
-- 
2.52.0.rc2.455.g230fcf2819-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3] KVM: selftests: Verify CR3 in debug_regs
  2025-11-21 19:32 [PATCH 0/3] KVM: x86: Accelerate reading CR3 for guest debug Yosry Ahmed
  2025-11-21 19:32 ` [PATCH 1/3] KVM: x86: Add CR3 to guest debug info Yosry Ahmed
  2025-11-21 19:32 ` [PATCH 2/3] KVM: selftests: Use TEST_ASSERT_EQ() in debug_regs Yosry Ahmed
@ 2025-11-21 19:32 ` Yosry Ahmed
  2 siblings, 0 replies; 8+ messages in thread
From: Yosry Ahmed @ 2025-11-21 19:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Ken Hofsass, kvm, linux-kernel, Yosry Ahmed

If KVM_CAP_X86_GUEST_DEBUG_CR3 is set, check that the value of CR3 in
struct kvm_run on KVM_EXIT_DEBUG matches that returned by KVM_GET_SREGS.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 tools/testing/selftests/kvm/x86/debug_regs.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/x86/debug_regs.c b/tools/testing/selftests/kvm/x86/debug_regs.c
index 563e52217cdd..ecad92789182 100644
--- a/tools/testing/selftests/kvm/x86/debug_regs.c
+++ b/tools/testing/selftests/kvm/x86/debug_regs.c
@@ -80,8 +80,9 @@ static void vcpu_skip_insn(struct kvm_vcpu *vcpu, int insn_len)
 
 int main(void)
 {
+	unsigned long long target_dr6, target_rip, target_cr3;
 	struct kvm_guest_debug debug;
-	unsigned long long target_dr6, target_rip;
+	struct kvm_sregs sregs;
 	struct kvm_vcpu *vcpu;
 	struct kvm_run *run;
 	struct kvm_vm *vm;
@@ -103,6 +104,14 @@ int main(void)
 	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
 	run = vcpu->run;
 
+	if (kvm_has_cap(KVM_CAP_X86_GUEST_DEBUG_CR3)) {
+		pr_info("Debug info includes guest CR3\n");
+		vcpu_sregs_get(vcpu, &sregs);
+		target_cr3 = sregs.cr3;
+	} else {
+		target_cr3 = 0;
+	}
+
 	/* Test software BPs - int3 */
 	pr_info("Testing INT3\n");
 	memset(&debug, 0, sizeof(debug));
@@ -112,6 +121,7 @@ int main(void)
 	TEST_ASSERT_EQ(run->exit_reason, KVM_EXIT_DEBUG);
 	TEST_ASSERT_EQ(run->debug.arch.exception, BP_VECTOR);
 	TEST_ASSERT_EQ(run->debug.arch.pc, CAST_TO_RIP(sw_bp));
+	TEST_ASSERT_EQ(run->debug.arch.cr3, target_cr3);
 	vcpu_skip_insn(vcpu, 1);
 
 	/* Test instruction HW BP over DR[0-3] */
@@ -128,6 +138,7 @@ int main(void)
 		TEST_ASSERT_EQ(run->debug.arch.exception, DB_VECTOR);
 		TEST_ASSERT_EQ(run->debug.arch.pc, CAST_TO_RIP(hw_bp));
 		TEST_ASSERT_EQ(run->debug.arch.dr6, target_dr6);
+		TEST_ASSERT_EQ(run->debug.arch.cr3, target_cr3);
 	}
 	/* Skip "nop" */
 	vcpu_skip_insn(vcpu, 1);
@@ -147,6 +158,7 @@ int main(void)
 		TEST_ASSERT_EQ(run->debug.arch.exception, DB_VECTOR);
 		TEST_ASSERT_EQ(run->debug.arch.pc, CAST_TO_RIP(write_data));
 		TEST_ASSERT_EQ(run->debug.arch.dr6, target_dr6);
+		TEST_ASSERT_EQ(run->debug.arch.cr3, target_cr3);
 		/* Rollback the 4-bytes "mov" */
 		vcpu_skip_insn(vcpu, -7);
 	}
@@ -169,6 +181,7 @@ int main(void)
 		TEST_ASSERT_EQ(run->debug.arch.exception, DB_VECTOR);
 		TEST_ASSERT_EQ(run->debug.arch.pc, target_rip);
 		TEST_ASSERT_EQ(run->debug.arch.dr6, target_dr6);
+		TEST_ASSERT_EQ(run->debug.arch.cr3, target_cr3);
 	}
 
 	/* Finally test global disable */
@@ -183,6 +196,7 @@ int main(void)
 	TEST_ASSERT_EQ(run->debug.arch.exception, DB_VECTOR);
 	TEST_ASSERT_EQ(run->debug.arch.pc, CAST_TO_RIP(bd_start));
 	TEST_ASSERT_EQ(run->debug.arch.dr6, target_dr6);
+	TEST_ASSERT_EQ(run->debug.arch.cr3, target_cr3);
 
 	/* Disable all debug controls, run to the end */
 	memset(&debug, 0, sizeof(debug));
-- 
2.52.0.rc2.455.g230fcf2819-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] KVM: x86: Add CR3 to guest debug info
  2025-11-21 19:32 ` [PATCH 1/3] KVM: x86: Add CR3 to guest debug info Yosry Ahmed
@ 2025-11-21 21:01   ` Sean Christopherson
  2025-11-21 23:12     ` Yosry Ahmed
  0 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2025-11-21 21:01 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: Paolo Bonzini, Ken Hofsass, kvm, linux-kernel

On Fri, Nov 21, 2025, Yosry Ahmed wrote:
> Add the value of CR3 to the information returned to userspace on
> KVM_EXIT_DEBUG. Use KVM_CAP_X86_GUEST_DEBUG_CR3 to advertise this.
> 
> During guest debugging, the value of CR3 can be used by VM debuggers to
> (roughly) identify the process running in the guest. This can be used to
> index debugging events by process, or filter events from some processes
> and quickly skip them.
> 
> Currently, debuggers would need to use the KVM_GET_SREGS ioctl on every
> event to get the value of CR3, which considerably slows things down.
> This can be easily avoided by adding the value of CR3 to the captured
> debugging info.
> 
> Signed-off-by: Ken Hofsass <hofsass@google.com>
> Co-developed-by: Ken Hofsass <hofsass@google.com>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/include/uapi/asm/kvm.h | 1 +
>  arch/x86/kvm/svm/svm.c          | 2 ++
>  arch/x86/kvm/vmx/vmx.c          | 2 ++
>  arch/x86/kvm/x86.c              | 3 +++
>  include/uapi/linux/kvm.h        | 1 +
>  5 files changed, 9 insertions(+)
> 
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 7ceff6583652..c351e458189b 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -293,6 +293,7 @@ struct kvm_debug_exit_arch {
>  	__u64 pc;
>  	__u64 dr6;
>  	__u64 dr7;
> +	__u64 cr3;
>  };

I really, really don't like this.  It "solves" a very specific problem for a very
specific use case without any consideration for uAPI, precedence or maintenance.
E.g. in most cases, CR3 without CR0, CR4, EFER, etc. is largely meaningless.  The
only thing it's really useful for is an opaque guest process identifer.

KVM already provides kvm_run.kvm_valid_regs to let userspace grab register state
on exit to userspace.  If userspace is debugging, why not simply save all regs on
exit?

If the answer is "because it slows down all other exits", then I would much rather
give userspace the ability to conditionally save registers based on the exit reason,
e.g. something like this (completely untested, no CAP, etc.)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c6d899d53dd..337043d49ee6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -127,7 +127,7 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
 static void update_cr8_intercept(struct kvm_vcpu *vcpu);
 static void process_nmi(struct kvm_vcpu *vcpu);
 static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
-static void store_regs(struct kvm_vcpu *vcpu);
+static void kvm_run_save_regs_on_exit(struct kvm_vcpu *vcpu);
 static int sync_regs(struct kvm_vcpu *vcpu);
 static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu);
 
@@ -10487,6 +10487,8 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
 {
        struct kvm_run *kvm_run = vcpu->run;
 
+       kvm_run_save_regs_on_exit(vcpu);
+
        kvm_run->if_flag = kvm_x86_call(get_if_flag)(vcpu);
        kvm_run->cr8 = kvm_get_cr8(vcpu);
        kvm_run->apic_base = vcpu->arch.apic_base;
@@ -11978,8 +11980,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 
 out:
        kvm_put_guest_fpu(vcpu);
-       if (kvm_run->kvm_valid_regs && likely(!vcpu->arch.guest_state_protected))
-               store_regs(vcpu);
        post_kvm_run_save(vcpu);
        kvm_vcpu_srcu_read_unlock(vcpu);
 
@@ -12598,10 +12598,30 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
        return 0;
 }
 
-static void store_regs(struct kvm_vcpu *vcpu)
+static void kvm_run_save_regs_on_exit(struct kvm_vcpu *vcpu)
 {
+       struct kvm_run *run = vcpu->run;
+       u32 nr_exit_reasons = sizeof(run->kvm_save_regs_on_exit) * BITS_PER_BYTE;
+       u64 valid_regs = READ_ONCE(run->kvm_valid_regs);
+       u32 exit_reason = READ_ONCE(run->exit_reason);
+
        BUILD_BUG_ON(sizeof(struct kvm_sync_regs) > SYNC_REGS_SIZE_BYTES);
 
+       if (!valid_regs)
+               return;
+
+       if (unlikely(!vcpu->arch.guest_state_protected))
+               return;
+
+       if (valid_regs & KVM_SYNC_REGS_CONDITIONAL) {
+               if (exit_reason >= nr_exit_reasons)
+                       return;
+
+               exit_reason = array_index_nospec(exit_reason, nr_exit_reasons);
+               if (!test_bit(exit_reason, (void *)run->kvm_save_regs_on_exit))
+                       return;
+       }
+
        if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_REGS)
                __get_regs(vcpu, &vcpu->run->s.regs.regs);
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 52f6000ab020..452805c1337b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -494,8 +494,12 @@ struct kvm_run {
                struct kvm_sync_regs regs;
                char padding[SYNC_REGS_SIZE_BYTES];
        } s;
+
+       __u64 kvm_save_regs_on_exit[16];
 };
 
+#define KVM_SYNC_REGS_CONDITIONAL      _BITULL(63)
+
 /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
 
 struct kvm_coalesced_mmio_zone {

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] KVM: x86: Add CR3 to guest debug info
  2025-11-21 21:01   ` Sean Christopherson
@ 2025-11-21 23:12     ` Yosry Ahmed
  2025-11-24 14:45       ` Sean Christopherson
  0 siblings, 1 reply; 8+ messages in thread
From: Yosry Ahmed @ 2025-11-21 23:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Ken Hofsass, kvm, linux-kernel

On Fri, Nov 21, 2025 at 01:01:40PM -0800, Sean Christopherson wrote:
> On Fri, Nov 21, 2025, Yosry Ahmed wrote:
> > Add the value of CR3 to the information returned to userspace on
> > KVM_EXIT_DEBUG. Use KVM_CAP_X86_GUEST_DEBUG_CR3 to advertise this.
> > 
> > During guest debugging, the value of CR3 can be used by VM debuggers to
> > (roughly) identify the process running in the guest. This can be used to
> > index debugging events by process, or filter events from some processes
> > and quickly skip them.
> > 
> > Currently, debuggers would need to use the KVM_GET_SREGS ioctl on every
> > event to get the value of CR3, which considerably slows things down.
> > This can be easily avoided by adding the value of CR3 to the captured
> > debugging info.
> > 
> > Signed-off-by: Ken Hofsass <hofsass@google.com>
> > Co-developed-by: Ken Hofsass <hofsass@google.com>
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> >  arch/x86/include/uapi/asm/kvm.h | 1 +
> >  arch/x86/kvm/svm/svm.c          | 2 ++
> >  arch/x86/kvm/vmx/vmx.c          | 2 ++
> >  arch/x86/kvm/x86.c              | 3 +++
> >  include/uapi/linux/kvm.h        | 1 +
> >  5 files changed, 9 insertions(+)
> > 
> > diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> > index 7ceff6583652..c351e458189b 100644
> > --- a/arch/x86/include/uapi/asm/kvm.h
> > +++ b/arch/x86/include/uapi/asm/kvm.h
> > @@ -293,6 +293,7 @@ struct kvm_debug_exit_arch {
> >  	__u64 pc;
> >  	__u64 dr6;
> >  	__u64 dr7;
> > +	__u64 cr3;
> >  };
> 
> I really, really don't like this.  It "solves" a very specific problem for a very
> specific use case without any consideration for uAPI, precedence or maintenance.
> E.g. in most cases, CR3 without CR0, CR4, EFER, etc. is largely meaningless.  The
> only thing it's really useful for is an opaque guest process identifer.

To be fair, I never advertised it to be anything more than that :P

> 
> KVM already provides kvm_run.kvm_valid_regs to let userspace grab register state
> on exit to userspace.  If userspace is debugging, why not simply save all regs on
> exit?
> 
> If the answer is "because it slows down all other exits", then I would much rather
> give userspace the ability to conditionally save registers based on the exit reason,
> e.g. something like this (completely untested, no CAP, etc.)

I like this approach conceptually, but I think it's an overkill for this
use case tbh. Especially the memory usage, that's 1K per vCPU for the
bitmap. I know it can be smaller, but probably not small either because
it will be a problem if we run out of bits.

I think it may be sufficient for now to add something
KVM_SYNC_REGS_DEBUG to sync registers on KVM_EXIT_DEBUG. Not very
generic, but I don't think a lot exit reasons will make use of this.

That being said, let me take a closer look at our VMM and see what
options could work for us before spending more time on this. We
currently use CR3 as implemented in this patch, so this would have been
a drop-in replacement. For anything else I need to make sure our VMM
will be able to use it first.

> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0c6d899d53dd..337043d49ee6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -127,7 +127,7 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
>  static void update_cr8_intercept(struct kvm_vcpu *vcpu);
>  static void process_nmi(struct kvm_vcpu *vcpu);
>  static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
> -static void store_regs(struct kvm_vcpu *vcpu);
> +static void kvm_run_save_regs_on_exit(struct kvm_vcpu *vcpu);
>  static int sync_regs(struct kvm_vcpu *vcpu);
>  static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu);
>  
> @@ -10487,6 +10487,8 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
>  {
>         struct kvm_run *kvm_run = vcpu->run;
>  
> +       kvm_run_save_regs_on_exit(vcpu);
> +
>         kvm_run->if_flag = kvm_x86_call(get_if_flag)(vcpu);
>         kvm_run->cr8 = kvm_get_cr8(vcpu);
>         kvm_run->apic_base = vcpu->arch.apic_base;
> @@ -11978,8 +11980,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  
>  out:
>         kvm_put_guest_fpu(vcpu);
> -       if (kvm_run->kvm_valid_regs && likely(!vcpu->arch.guest_state_protected))
> -               store_regs(vcpu);
>         post_kvm_run_save(vcpu);
>         kvm_vcpu_srcu_read_unlock(vcpu);
>  
> @@ -12598,10 +12598,30 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
>         return 0;
>  }
>  
> -static void store_regs(struct kvm_vcpu *vcpu)
> +static void kvm_run_save_regs_on_exit(struct kvm_vcpu *vcpu)
>  {
> +       struct kvm_run *run = vcpu->run;
> +       u32 nr_exit_reasons = sizeof(run->kvm_save_regs_on_exit) * BITS_PER_BYTE;
> +       u64 valid_regs = READ_ONCE(run->kvm_valid_regs);
> +       u32 exit_reason = READ_ONCE(run->exit_reason);
> +
>         BUILD_BUG_ON(sizeof(struct kvm_sync_regs) > SYNC_REGS_SIZE_BYTES);
>  
> +       if (!valid_regs)
> +               return;
> +
> +       if (unlikely(!vcpu->arch.guest_state_protected))
> +               return;
> +
> +       if (valid_regs & KVM_SYNC_REGS_CONDITIONAL) {
> +               if (exit_reason >= nr_exit_reasons)
> +                       return;
> +
> +               exit_reason = array_index_nospec(exit_reason, nr_exit_reasons);
> +               if (!test_bit(exit_reason, (void *)run->kvm_save_regs_on_exit))
> +                       return;
> +       }
> +
>         if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_REGS)
>                 __get_regs(vcpu, &vcpu->run->s.regs.regs);
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 52f6000ab020..452805c1337b 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -494,8 +494,12 @@ struct kvm_run {
>                 struct kvm_sync_regs regs;
>                 char padding[SYNC_REGS_SIZE_BYTES];
>         } s;
> +
> +       __u64 kvm_save_regs_on_exit[16];
>  };
>  
> +#define KVM_SYNC_REGS_CONDITIONAL      _BITULL(63)
> +
>  /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
>  
>  struct kvm_coalesced_mmio_zone {

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] KVM: x86: Add CR3 to guest debug info
  2025-11-21 23:12     ` Yosry Ahmed
@ 2025-11-24 14:45       ` Sean Christopherson
  2025-11-24 15:35         ` Yosry Ahmed
  0 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2025-11-24 14:45 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: Paolo Bonzini, Ken Hofsass, kvm, linux-kernel

On Fri, Nov 21, 2025, Yosry Ahmed wrote:
> On Fri, Nov 21, 2025 at 01:01:40PM -0800, Sean Christopherson wrote:
> > On Fri, Nov 21, 2025, Yosry Ahmed wrote:
> > KVM already provides kvm_run.kvm_valid_regs to let userspace grab register state
> > on exit to userspace.  If userspace is debugging, why not simply save all regs on
> > exit?
> > 
> > If the answer is "because it slows down all other exits", then I would much rather
> > give userspace the ability to conditionally save registers based on the exit reason,
> > e.g. something like this (completely untested, no CAP, etc.)
> 
> I like this approach conceptually, but I think it's an overkill for this
> use case tbh. Especially the memory usage, that's 1K per vCPU for the
> bitmap. I know it can be smaller, but probably not small either because
> it will be a problem if we run out of bits.
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 52f6000ab020..452805c1337b 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -494,8 +494,12 @@ struct kvm_run {
> >                 struct kvm_sync_regs regs;
> >                 char padding[SYNC_REGS_SIZE_BYTES];
> >         } s;
> > +
> > +       __u64 kvm_save_regs_on_exit[16];

Heh, check your math.  It's 1024 bits, 128 bytes.  Reserving space for 1024 exits
is likely extreme overkill given that KVM is sitting at 40 exits after ~18 years,
so as you say we could cut that down significantly depending on how willing we are
to risk having to add kvm_save_regs_on_exit2 in the future.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] KVM: x86: Add CR3 to guest debug info
  2025-11-24 14:45       ` Sean Christopherson
@ 2025-11-24 15:35         ` Yosry Ahmed
  0 siblings, 0 replies; 8+ messages in thread
From: Yosry Ahmed @ 2025-11-24 15:35 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Ken Hofsass, kvm, linux-kernel

On Mon, Nov 24, 2025 at 06:45:22AM -0800, Sean Christopherson wrote:
> On Fri, Nov 21, 2025, Yosry Ahmed wrote:
> > On Fri, Nov 21, 2025 at 01:01:40PM -0800, Sean Christopherson wrote:
> > > On Fri, Nov 21, 2025, Yosry Ahmed wrote:
> > > KVM already provides kvm_run.kvm_valid_regs to let userspace grab register state
> > > on exit to userspace.  If userspace is debugging, why not simply save all regs on
> > > exit?
> > > 
> > > If the answer is "because it slows down all other exits", then I would much rather
> > > give userspace the ability to conditionally save registers based on the exit reason,
> > > e.g. something like this (completely untested, no CAP, etc.)
> > 
> > I like this approach conceptually, but I think it's an overkill for this
> > use case tbh. Especially the memory usage, that's 1K per vCPU for the
> > bitmap. I know it can be smaller, but probably not small either because
> > it will be a problem if we run out of bits.
> > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > index 52f6000ab020..452805c1337b 100644
> > > --- a/include/uapi/linux/kvm.h
> > > +++ b/include/uapi/linux/kvm.h
> > > @@ -494,8 +494,12 @@ struct kvm_run {
> > >                 struct kvm_sync_regs regs;
> > >                 char padding[SYNC_REGS_SIZE_BYTES];
> > >         } s;
> > > +
> > > +       __u64 kvm_save_regs_on_exit[16];
> 
> Heh, check your math.  It's 1024 bits, 128 bytes.  Reserving space for 1024 exits
> is likely extreme overkill given that KVM is sitting at 40 exits after ~18 years,
> so as you say we could cut that down significantly depending on how willing we are
> to risk having to add kvm_save_regs_on_exit2 in the future.

Oh well, I am gonna try and blame this one on Friday afternoon :D

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-11-24 15:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-21 19:32 [PATCH 0/3] KVM: x86: Accelerate reading CR3 for guest debug Yosry Ahmed
2025-11-21 19:32 ` [PATCH 1/3] KVM: x86: Add CR3 to guest debug info Yosry Ahmed
2025-11-21 21:01   ` Sean Christopherson
2025-11-21 23:12     ` Yosry Ahmed
2025-11-24 14:45       ` Sean Christopherson
2025-11-24 15:35         ` Yosry Ahmed
2025-11-21 19:32 ` [PATCH 2/3] KVM: selftests: Use TEST_ASSERT_EQ() in debug_regs Yosry Ahmed
2025-11-21 19:32 ` [PATCH 3/3] KVM: selftests: Verify CR3 " Yosry Ahmed

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox