* [PATCH v5 0/5] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough
@ 2025-05-22 0:55 Maxim Levitsky
2025-05-22 0:55 ` [PATCH v5 1/5] KVM: x86: Convert vcpu_run()'s immediate exit param into a generic bitmap Maxim Levitsky
` (5 more replies)
0 siblings, 6 replies; 15+ messages in thread
From: Maxim Levitsky @ 2025-05-22 0:55 UTC (permalink / raw)
To: kvm
Cc: H. Peter Anvin, Thomas Gleixner, Sean Christopherson, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini,
Maxim Levitsky
Currently, KVM allows the guest to set IA32_DEBUGCTL to any value it wants,
capped only by a bitmask of allowed bits (except on nested VM-Entry, where KVM
apparently doesn't even check this set of allowed bits; this patch series also
fixes that).
However, some IA32_DEBUGCTL bits can be useful for the host, e.g.
IA32_DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM, which isolates the PMU from the
influence of the host's SMM.
Reshuffle some of the code to allow this bit (currently the only one) to be
passed through from its host value to the guest.
Note that the host value of this bit can be toggled by writing 0 or 1 to
/sys/devices/cpu/freeze_on_smi.
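For illustration, toggling the knob from a test harness is just a sysfs write;
a minimal sketch (hypothetical helper, not part of this series):

        #include <fcntl.h>
        #include <unistd.h>

        /* Flip the host's DEBUGCTLMSR_FREEZE_IN_SMM via the sysfs knob. */
        static int set_freeze_on_smi(int enable)
        {
                int fd = open("/sys/devices/cpu/freeze_on_smi", O_WRONLY);
                int ret = -1;

                if (fd < 0)
                        return -1;
                if (write(fd, enable ? "1" : "0", 1) == 1)
                        ret = 0;
                close(fd);
                return ret;
        }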
This was tested on an Intel(R) Xeon(R) Silver 4410Y with KVM unit tests and
KVM selftests running in parallel with a tight loop writing to IO port 0xB2,
which on this machine generates #SMIs.
SMI generation was also verified by reading MSR 0x34 (MSR_SMI_COUNT), which
reports the current count of #SMIs received.
Despite the flood of #SMIs, the tests survived with this patch series applied.
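The flood loop can be sketched roughly as follows (hypothetical example, not
the exact test code; assumes ioperm() access to port 0xB2 and the msr driver
for MSR 0x34):

        #include <stdio.h>
        #include <stdint.h>
        #include <fcntl.h>
        #include <unistd.h>
        #include <sys/io.h>

        int main(void)
        {
                uint64_t smi_count;
                unsigned long i = 0;
                int msr_fd = open("/dev/cpu/0/msr", O_RDONLY);

                if (ioperm(0xb2, 1, 1))         /* needs CAP_SYS_RAWIO */
                        return 1;

                for (;;) {
                        outb(0, 0xb2);          /* triggers an #SMI on this machine */

                        /* Occasionally read MSR 0x34 (MSR_SMI_COUNT) to confirm #SMIs arrive. */
                        if (++i % 100000 == 0 && msr_fd >= 0 &&
                            pread(msr_fd, &smi_count, sizeof(smi_count), 0x34) == sizeof(smi_count))
                                printf("SMI count: %llu\n", (unsigned long long)smi_count);
                }
        }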
V5: addressed the review feedback. Thanks.
I also decided to wrap the reads/writes of GUEST_IA32_DEBUGCTL in pmu_intel.c
as well, just for the sake of consistency.
Best regards,
Maxim Levitsky
Maxim Levitsky (3):
KVM: nVMX: check vmcs12->guest_ia32_debugctl value given by L2
KVM: VMX: wrap guest access to IA32_DEBUGCTL with wrappers
KVM: VMX: preserve DEBUGCTLMSR_FREEZE_IN_SMM
Sean Christopherson (2):
KVM: x86: Convert vcpu_run()'s immediate exit param into a generic
bitmap
KVM: x86: Drop kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag
arch/x86/include/asm/kvm-x86-ops.h | 1 -
arch/x86/include/asm/kvm_host.h | 9 ++++++--
arch/x86/kvm/svm/svm.c | 14 +++++++-----
arch/x86/kvm/vmx/main.c | 15 +++----------
arch/x86/kvm/vmx/nested.c | 7 +++---
arch/x86/kvm/vmx/pmu_intel.c | 8 +++----
arch/x86/kvm/vmx/tdx.c | 3 ++-
arch/x86/kvm/vmx/vmx.c | 36 +++++++++++++++++++++---------
arch/x86/kvm/vmx/vmx.h | 3 +++
arch/x86/kvm/vmx/x86_ops.h | 4 ++--
arch/x86/kvm/x86.c | 18 ++++++++++-----
11 files changed, 71 insertions(+), 47 deletions(-)
--
2.46.0
* [PATCH v5 1/5] KVM: x86: Convert vcpu_run()'s immediate exit param into a generic bitmap
2025-05-22 0:55 [PATCH v5 0/5] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough Maxim Levitsky
@ 2025-05-22 0:55 ` Maxim Levitsky
2025-05-22 17:41 ` Sean Christopherson
2025-05-22 0:55 ` [PATCH v5 2/5] KVM: x86: Drop kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag Maxim Levitsky
` (4 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Maxim Levitsky @ 2025-05-22 0:55 UTC (permalink / raw)
To: kvm
Cc: H. Peter Anvin, Thomas Gleixner, Sean Christopherson, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini,
Maxim Levitsky
From: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 6 +++++-
arch/x86/kvm/svm/svm.c | 4 ++--
arch/x86/kvm/vmx/main.c | 6 +++---
arch/x86/kvm/vmx/tdx.c | 3 ++-
arch/x86/kvm/vmx/vmx.c | 3 ++-
arch/x86/kvm/vmx/x86_ops.h | 4 ++--
arch/x86/kvm/x86.c | 11 ++++++++---
7 files changed, 24 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6c06f3d6e081..72e7bb48e7ac 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1671,6 +1671,10 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
return dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
}
+enum kvm_x86_run_flags {
+ KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0),
+};
+
struct kvm_x86_ops {
const char *name;
@@ -1752,7 +1756,7 @@ struct kvm_x86_ops {
int (*vcpu_pre_run)(struct kvm_vcpu *vcpu);
enum exit_fastpath_completion (*vcpu_run)(struct kvm_vcpu *vcpu,
- bool force_immediate_exit);
+ u64 run_flags);
int (*handle_exit)(struct kvm_vcpu *vcpu,
enum exit_fastpath_completion exit_fastpath);
int (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cc1c721ba067..c8b8a9947057 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4259,9 +4259,9 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
guest_state_exit_irqoff();
}
-static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
- bool force_immediate_exit)
+static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
{
+ bool force_immediate_exit = run_flags & KVM_RUN_FORCE_IMMEDIATE_EXIT;
struct vcpu_svm *svm = to_svm(vcpu);
bool spec_ctrl_intercepted = msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 94d5d907d37b..a8e80d66e77a 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -176,12 +176,12 @@ static int vt_vcpu_pre_run(struct kvm_vcpu *vcpu)
return vmx_vcpu_pre_run(vcpu);
}
-static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
+static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
{
if (is_td_vcpu(vcpu))
- return tdx_vcpu_run(vcpu, force_immediate_exit);
+ return tdx_vcpu_run(vcpu, run_flags);
- return vmx_vcpu_run(vcpu, force_immediate_exit);
+ return vmx_vcpu_run(vcpu, run_flags);
}
static int vt_handle_exit(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b952bc673271..7dbfad28debc 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1020,8 +1020,9 @@ static void tdx_load_host_xsave_state(struct kvm_vcpu *vcpu)
DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI | \
DEBUGCTLMSR_FREEZE_IN_SMM)
-fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
+fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
{
+ bool force_immediate_exit = run_flags & KVM_RUN_FORCE_IMMEDIATE_EXIT;
struct vcpu_tdx *tdx = to_tdx(vcpu);
struct vcpu_vt *vt = to_vt(vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ef2d7208dd20..609563da270c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7324,8 +7324,9 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
guest_state_exit_irqoff();
}
-fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
+fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
{
+ bool force_immediate_exit = run_flags & KVM_RUN_FORCE_IMMEDIATE_EXIT;
struct vcpu_vmx *vmx = to_vmx(vcpu);
unsigned long cr3, cr4;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 6bf8be570b2e..e1dfacd6b41f 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -21,7 +21,7 @@ void vmx_vm_destroy(struct kvm *kvm);
int vmx_vcpu_precreate(struct kvm *kvm);
int vmx_vcpu_create(struct kvm_vcpu *vcpu);
int vmx_vcpu_pre_run(struct kvm_vcpu *vcpu);
-fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit);
+fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags);
void vmx_vcpu_free(struct kvm_vcpu *vcpu);
void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
@@ -132,7 +132,7 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
void tdx_vcpu_free(struct kvm_vcpu *vcpu);
void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu);
-fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit);
+fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags);
void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
void tdx_vcpu_put(struct kvm_vcpu *vcpu);
bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f6ce044b090a..45e8a9eb438a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10752,6 +10752,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
fastpath_t exit_fastpath;
+ u64 run_flags;
bool req_immediate_exit = false;
@@ -10996,8 +10997,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
goto cancel_injection;
}
- if (req_immediate_exit)
+ run_flags = 0;
+ if (req_immediate_exit) {
+ run_flags |= KVM_RUN_FORCE_IMMEDIATE_EXIT;
kvm_make_request(KVM_REQ_EVENT, vcpu);
+ }
fpregs_assert_state_consistent();
if (test_thread_flag(TIF_NEED_FPU_LOAD))
@@ -11034,8 +11038,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
WARN_ON_ONCE((kvm_vcpu_apicv_activated(vcpu) != kvm_vcpu_apicv_active(vcpu)) &&
(kvm_get_apic_mode(vcpu) != LAPIC_MODE_DISABLED));
- exit_fastpath = kvm_x86_call(vcpu_run)(vcpu,
- req_immediate_exit);
+ exit_fastpath = kvm_x86_call(vcpu_run)(vcpu, run_flags);
if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
break;
@@ -11047,6 +11050,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
break;
}
+ run_flags = 0;
+
/* Note, VM-Exits that go down the "slow" path are accounted below. */
++vcpu->stat.exits;
}
--
2.46.0
* [PATCH v5 2/5] KVM: x86: Drop kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag
2025-05-22 0:55 [PATCH v5 0/5] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough Maxim Levitsky
2025-05-22 0:55 ` [PATCH v5 1/5] KVM: x86: Convert vcpu_run()'s immediate exit param into a generic bitmap Maxim Levitsky
@ 2025-05-22 0:55 ` Maxim Levitsky
2025-05-22 17:42 ` Sean Christopherson
2025-05-22 0:55 ` [PATCH v5 3/5] KVM: nVMX: check vmcs12->guest_ia32_debugctl value given by L2 Maxim Levitsky
` (3 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Maxim Levitsky @ 2025-05-22 0:55 UTC (permalink / raw)
To: kvm
Cc: H. Peter Anvin, Thomas Gleixner, Sean Christopherson, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini,
Maxim Levitsky
From: Sean Christopherson <seanjc@google.com>
Instruct vendor code to load the guest's DR6 into hardware via a new
KVM_RUN flag, and remove kvm_x86_ops.set_dr6(), whose sole purpose was to
load vcpu->arch.dr6 into hardware when DR6 can be read/written directly
by the guest.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 -
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/svm.c | 10 ++++++----
arch/x86/kvm/vmx/main.c | 9 ---------
arch/x86/kvm/vmx/vmx.c | 9 +++------
arch/x86/kvm/x86.c | 2 +-
6 files changed, 11 insertions(+), 22 deletions(-)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 79406bf07a1c..a2248817470c 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -49,7 +49,6 @@ KVM_X86_OP(set_idt)
KVM_X86_OP(get_gdt)
KVM_X86_OP(set_gdt)
KVM_X86_OP(sync_dirty_debug_regs)
-KVM_X86_OP(set_dr6)
KVM_X86_OP(set_dr7)
KVM_X86_OP(cache_reg)
KVM_X86_OP(get_rflags)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 72e7bb48e7ac..32ed568babcf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1673,6 +1673,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
enum kvm_x86_run_flags {
KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0),
+ KVM_RUN_LOAD_GUEST_DR6 = BIT(1),
};
struct kvm_x86_ops {
@@ -1725,7 +1726,6 @@ struct kvm_x86_ops {
void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void (*sync_dirty_debug_regs)(struct kvm_vcpu *vcpu);
- void (*set_dr6)(struct kvm_vcpu *vcpu, unsigned long value);
void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value);
void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c8b8a9947057..026b28051fff 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4308,10 +4308,13 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
svm_hv_update_vp_id(svm->vmcb, vcpu);
/*
- * Run with all-zero DR6 unless needed, so that we can get the exact cause
- * of a #DB.
+ * Run with all-zero DR6 unless the guest can write DR6 freely, so that
+ * KVM can get the exact cause of a #DB. Note, loading guest DR6 from
+ * KVM's snapshot is only necessary when DR accesses won't exit.
*/
- if (likely(!(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)))
+ if (unlikely(run_flags & KVM_RUN_LOAD_GUEST_DR6))
+ svm_set_dr6(vcpu, vcpu->arch.dr6);
+ else if (likely(!(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)))
svm_set_dr6(vcpu, DR6_ACTIVE_LOW);
clgi();
@@ -5119,7 +5122,6 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.set_idt = svm_set_idt,
.get_gdt = svm_get_gdt,
.set_gdt = svm_set_gdt,
- .set_dr6 = svm_set_dr6,
.set_dr7 = svm_set_dr7,
.sync_dirty_debug_regs = svm_sync_dirty_debug_regs,
.cache_reg = svm_cache_reg,
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index a8e80d66e77a..28f854055e2c 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -498,14 +498,6 @@ static void vt_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
vmx_set_gdt(vcpu, dt);
}
-static void vt_set_dr6(struct kvm_vcpu *vcpu, unsigned long val)
-{
- if (is_td_vcpu(vcpu))
- return;
-
- vmx_set_dr6(vcpu, val);
-}
-
static void vt_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
{
if (is_td_vcpu(vcpu))
@@ -945,7 +937,6 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.set_idt = vt_set_idt,
.get_gdt = vt_get_gdt,
.set_gdt = vt_set_gdt,
- .set_dr6 = vt_set_dr6,
.set_dr7 = vt_set_dr7,
.sync_dirty_debug_regs = vt_sync_dirty_debug_regs,
.cache_reg = vt_cache_reg,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 609563da270c..9953de0cb32a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5611,12 +5611,6 @@ void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
set_debugreg(DR6_RESERVED, 6);
}
-void vmx_set_dr6(struct kvm_vcpu *vcpu, unsigned long val)
-{
- lockdep_assert_irqs_disabled();
- set_debugreg(vcpu->arch.dr6, 6);
-}
-
void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
{
vmcs_writel(GUEST_DR7, val);
@@ -7371,6 +7365,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
vcpu->arch.regs_dirty = 0;
+ if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
+ set_debugreg(vcpu->arch.dr6, 6);
+
/*
* Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
* prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 45e8a9eb438a..38875a38be52 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11019,7 +11019,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(vcpu->arch.eff_db[3], 3);
/* When KVM_DEBUGREG_WONT_EXIT, dr6 is accessible in guest. */
if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT))
- kvm_x86_call(set_dr6)(vcpu, vcpu->arch.dr6);
+ run_flags |= KVM_RUN_LOAD_GUEST_DR6;
} else if (unlikely(hw_breakpoint_active())) {
set_debugreg(0, 7);
}
--
2.46.0
* [PATCH v5 3/5] KVM: nVMX: check vmcs12->guest_ia32_debugctl value given by L2
2025-05-22 0:55 [PATCH v5 0/5] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough Maxim Levitsky
2025-05-22 0:55 ` [PATCH v5 1/5] KVM: x86: Convert vcpu_run()'s immediate exit param into a generic bitmap Maxim Levitsky
2025-05-22 0:55 ` [PATCH v5 2/5] KVM: x86: Drop kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag Maxim Levitsky
@ 2025-05-22 0:55 ` Maxim Levitsky
2025-05-22 21:31 ` Sean Christopherson
2025-05-22 0:55 ` [PATCH v5 4/5] KVM: VMX: wrap guest access to IA32_DEBUGCTL with wrappers Maxim Levitsky
` (2 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Maxim Levitsky @ 2025-05-22 0:55 UTC (permalink / raw)
To: kvm
Cc: H. Peter Anvin, Thomas Gleixner, Sean Christopherson, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini,
Maxim Levitsky, Chao Gao
Check the vmcs12 guest_ia32_debugctl value before loading it, to avoid L2
being able to load arbitrary values into the hardware IA32_DEBUGCTL.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
arch/x86/kvm/vmx/nested.c | 3 ++-
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/vmx/vmx.h | 1 +
3 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e073e3008b16..00f2b762710c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3146,7 +3146,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
- CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
+ (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
+ CC(vmcs12->guest_ia32_debugctl & ~vmx_get_supported_debugctl(vcpu, false))))
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9953de0cb32a..9046ee2e9a04 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2179,7 +2179,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
return (unsigned long)data;
}
-static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
{
u64 debugctl = 0;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 6d1e40ecc024..eb924c2acfd0 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -414,6 +414,7 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
}
void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated);
/*
* Note, early Intel manuals have the write-low and read-high bitmap offsets
--
2.46.0
* [PATCH v5 4/5] KVM: VMX: wrap guest access to IA32_DEBUGCTL with wrappers
2025-05-22 0:55 [PATCH v5 0/5] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough Maxim Levitsky
` (2 preceding siblings ...)
2025-05-22 0:55 ` [PATCH v5 3/5] KVM: nVMX: check vmcs12->guest_ia32_debugctl value given by L2 Maxim Levitsky
@ 2025-05-22 0:55 ` Maxim Levitsky
2025-05-22 17:47 ` Sean Christopherson
2025-05-22 0:55 ` [PATCH v5 5/5] KVM: VMX: preserve DEBUGCTLMSR_FREEZE_IN_SMM Maxim Levitsky
2025-05-22 18:06 ` [PATCH v5 0/5] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough Sean Christopherson
5 siblings, 1 reply; 15+ messages in thread
From: Maxim Levitsky @ 2025-05-22 0:55 UTC (permalink / raw)
To: kvm
Cc: H. Peter Anvin, Thomas Gleixner, Sean Christopherson, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini,
Maxim Levitsky
Introduce two wrappers, vmx_guest_debugctl_read() and vmx_guest_debugctl_write(),
which read/write the GUEST_IA32_DEBUGCTL VMCS field.
In the next patch these wrappers will be used to hide the
DEBUGCTLMSR_FREEZE_IN_SMM bit from the guest.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
arch/x86/kvm/vmx/nested.c | 4 ++--
arch/x86/kvm/vmx/pmu_intel.c | 8 ++++----
arch/x86/kvm/vmx/vmx.c | 18 +++++++++++++++---
arch/x86/kvm/vmx/vmx.h | 2 ++
4 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 00f2b762710c..b505f3f7e9ab 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2653,7 +2653,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
if (vmx->nested.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
- vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
+ vmx_guest_debugctl_write(vcpu, vmcs12->guest_ia32_debugctl);
} else {
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
@@ -4789,7 +4789,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
__vmx_set_segment(vcpu, &seg, VCPU_SREG_LDTR);
kvm_set_dr(vcpu, 7, 0x400);
- vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
+ vmx_guest_debugctl_write(vcpu, 0);
if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr,
vmcs12->vm_exit_msr_load_count))
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 8a94b52c5731..578b4ef58260 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -652,11 +652,11 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
*/
static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
{
- u64 data = vmcs_read64(GUEST_IA32_DEBUGCTL);
+ u64 data = vmx_guest_debugctl_read();
if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) {
data &= ~DEBUGCTLMSR_LBR;
- vmcs_write64(GUEST_IA32_DEBUGCTL, data);
+ vmx_guest_debugctl_write(vcpu, data);
}
}
@@ -729,7 +729,7 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
if (!lbr_desc->event) {
vmx_disable_lbr_msrs_passthrough(vcpu);
- if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)
+ if (vmx_guest_debugctl_read() & DEBUGCTLMSR_LBR)
goto warn;
if (test_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use))
goto warn;
@@ -751,7 +751,7 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
{
- if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
+ if (!(vmx_guest_debugctl_read() & DEBUGCTLMSR_LBR))
intel_pmu_release_guest_lbr_event(vcpu);
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9046ee2e9a04..cfab76b40780 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2154,7 +2154,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
break;
case MSR_IA32_DEBUGCTLMSR:
- msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
+ msr_info->data = vmx_guest_debugctl_read();
break;
default:
find_uret_msr:
@@ -2194,6 +2194,16 @@ u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
return debugctl;
}
+void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
+{
+ vmcs_write64(GUEST_IA32_DEBUGCTL, val);
+}
+
+u64 vmx_guest_debugctl_read(void)
+{
+ return vmcs_read64(GUEST_IA32_DEBUGCTL);
+}
+
/*
* Writes msr value into the appropriate "register".
* Returns 0 on success, non-0 otherwise.
@@ -2279,7 +2289,8 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
VM_EXIT_SAVE_DEBUG_CONTROLS)
get_vmcs12(vcpu)->guest_ia32_debugctl = data;
- vmcs_write64(GUEST_IA32_DEBUGCTL, data);
+ vmx_guest_debugctl_write(vcpu, data);
+
if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
(data & DEBUGCTLMSR_LBR))
intel_pmu_create_guest_lbr_event(vcpu);
@@ -4795,7 +4806,8 @@ static void init_vmcs(struct vcpu_vmx *vmx)
vmcs_write32(GUEST_SYSENTER_CS, 0);
vmcs_writel(GUEST_SYSENTER_ESP, 0);
vmcs_writel(GUEST_SYSENTER_EIP, 0);
- vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
+
+ vmx_guest_debugctl_write(&vmx->vcpu, 0);
if (cpu_has_vmx_tpr_shadow()) {
vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index eb924c2acfd0..dffc0e18e5f7 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -415,6 +415,8 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated);
+void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val);
+u64 vmx_guest_debugctl_read(void);
/*
* Note, early Intel manuals have the write-low and read-high bitmap offsets
--
2.46.0
* [PATCH v5 5/5] KVM: VMX: preserve DEBUGCTLMSR_FREEZE_IN_SMM
2025-05-22 0:55 [PATCH v5 0/5] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough Maxim Levitsky
` (3 preceding siblings ...)
2025-05-22 0:55 ` [PATCH v5 4/5] KVM: VMX: wrap guest access to IA32_DEBUGCTL with wrappers Maxim Levitsky
@ 2025-05-22 0:55 ` Maxim Levitsky
2025-05-22 17:54 ` Sean Christopherson
2025-05-22 18:06 ` [PATCH v5 0/5] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough Sean Christopherson
5 siblings, 1 reply; 15+ messages in thread
From: Maxim Levitsky @ 2025-05-22 0:55 UTC (permalink / raw)
To: kvm
Cc: H. Peter Anvin, Thomas Gleixner, Sean Christopherson, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini,
Maxim Levitsky
Pass through the host's DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM to
GUEST_IA32_DEBUGCTL without the guest seeing this value.
Since the value of the host DEBUGCTL can in theory change between VM runs,
check whether it has changed and, if so, reload GUEST_IA32_DEBUGCTL with the
new value.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx/vmx.c | 6 +++++-
arch/x86/kvm/x86.c | 7 +++++--
3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 32ed568babcf..6bbde18a5783 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1674,6 +1674,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
enum kvm_x86_run_flags {
KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0),
KVM_RUN_LOAD_GUEST_DR6 = BIT(1),
+ KVM_RUN_LOAD_DEBUGCTL = BIT(2),
};
struct kvm_x86_ops {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cfab76b40780..c70fe7cbede6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2196,12 +2196,13 @@ u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
{
+ val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
vmcs_write64(GUEST_IA32_DEBUGCTL, val);
}
u64 vmx_guest_debugctl_read(void)
{
- return vmcs_read64(GUEST_IA32_DEBUGCTL);
+ return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM;
}
/*
@@ -7380,6 +7381,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
set_debugreg(vcpu->arch.dr6, 6);
+ if (run_flags & KVM_RUN_LOAD_DEBUGCTL)
+ vmx_guest_debugctl_write(vcpu, vmx_guest_debugctl_read());
+
/*
* Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
* prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 38875a38be52..3663cd6721ae 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10752,7 +10752,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
fastpath_t exit_fastpath;
- u64 run_flags;
+ u64 run_flags, debug_ctl;
bool req_immediate_exit = false;
@@ -11024,7 +11024,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
- vcpu->arch.host_debugctl = get_debugctlmsr();
+ debug_ctl = get_debugctlmsr();
+ if (!vcpu->arch.guest_state_protected && (debug_ctl != vcpu->arch.host_debugctl))
+ run_flags |= KVM_RUN_LOAD_DEBUGCTL;
+ vcpu->arch.host_debugctl = debug_ctl;
guest_timing_enter_irqoff();
--
2.46.0
* Re: [PATCH v5 1/5] KVM: x86: Convert vcpu_run()'s immediate exit param into a generic bitmap
2025-05-22 0:55 ` [PATCH v5 1/5] KVM: x86: Convert vcpu_run()'s immediate exit param into a generic bitmap Maxim Levitsky
@ 2025-05-22 17:41 ` Sean Christopherson
0 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2025-05-22 17:41 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, H. Peter Anvin, Thomas Gleixner, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini
On Wed, May 21, 2025, Maxim Levitsky wrote:
> ---
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index b952bc673271..7dbfad28debc 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1020,8 +1020,9 @@ static void tdx_load_host_xsave_state(struct kvm_vcpu *vcpu)
> DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI | \
> DEBUGCTLMSR_FREEZE_IN_SMM)
>
> -fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
> +fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
> {
> + bool force_immediate_exit = run_flags & KVM_RUN_FORCE_IMMEDIATE_EXIT;
Talking to myself, but I think it makes sense to drop the local force_immediate_exit
entirely, specifically so that the WARN_ON_ONCE() can just yell on run_flags being
non-zero. All immediate usage of run_flags is mutually exclusive with TDX.
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 3cfe89aad68e..9a758d8b38ea 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1018,20 +1018,20 @@ static void tdx_load_host_xsave_state(struct kvm_vcpu *vcpu)
DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI | \
DEBUGCTLMSR_FREEZE_IN_SMM)
-fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
+fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
{
struct vcpu_tdx *tdx = to_tdx(vcpu);
struct vcpu_vt *vt = to_vt(vcpu);
/*
- * force_immediate_exit requires vCPU entering for events injection with
- * an immediately exit followed. But The TDX module doesn't guarantee
- * entry, it's already possible for KVM to _think_ it completely entry
- * to the guest without actually having done so.
- * Since KVM never needs to force an immediate exit for TDX, and can't
- * do direct injection, just warn on force_immediate_exit.
+ * WARN if KVM wants to force an immediate exit, as the TDX module does
+ * not guarantee entry into the guest, i.e. it's possible for KVM to
+ * _think_ it completed entry to the guest and forced an immediate exit
+ * without actually having done so. Luckily, KVM never needs to force
+ * an immediate exit for TDX (KVM can't do direct event injection, so
+ * just WARN and continue on).
*/
- WARN_ON_ONCE(force_immediate_exit);
+ WARN_ON_ONCE(run_flags);
/*
* Wait until retry of SEPT-zap-related SEAMCALL completes before
@@ -1041,7 +1041,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
if (unlikely(READ_ONCE(to_kvm_tdx(vcpu->kvm)->wait_for_sept_zap)))
return EXIT_FASTPATH_EXIT_HANDLED;
- trace_kvm_entry(vcpu, force_immediate_exit);
+ trace_kvm_entry(vcpu, run_flags & KVM_RUN_FORCE_IMMEDIATE_EXIT);
if (pi_test_on(&vt->pi_desc)) {
apic->send_IPI_self(POSTED_INTR_VECTOR);
* Re: [PATCH v5 2/5] KVM: x86: Drop kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag
2025-05-22 0:55 ` [PATCH v5 2/5] KVM: x86: Drop kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag Maxim Levitsky
@ 2025-05-22 17:42 ` Sean Christopherson
0 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2025-05-22 17:42 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, H. Peter Anvin, Thomas Gleixner, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini
On Wed, May 21, 2025, Maxim Levitsky wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> Instruct vendor code to load the guest's DR6 into hardware via a new
> KVM_RUN flag, and remove kvm_x86_ops.set_dr6(), whose sole purpose was to
> load vcpu->arch.dr6 into hardware when DR6 can be read/written directly
> by the guest.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 -
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/svm/svm.c | 10 ++++++----
> arch/x86/kvm/vmx/main.c | 9 ---------
> arch/x86/kvm/vmx/vmx.c | 9 +++------
> arch/x86/kvm/x86.c | 2 +-
> 6 files changed, 11 insertions(+), 22 deletions(-)
As alluded to in the previous patch, TDX should WARN, because guest DR6 is owned
by the TDX module (the KVM_DEBUGREG_AUTO_SWITCH guard prevents KVM_RUN_LOAD_GUEST_DR6
from being set).
* Re: [PATCH v5 4/5] KVM: VMX: wrap guest access to IA32_DEBUGCTL with wrappers
2025-05-22 0:55 ` [PATCH v5 4/5] KVM: VMX: wrap guest access to IA32_DEBUGCTL with wrappers Maxim Levitsky
@ 2025-05-22 17:47 ` Sean Christopherson
0 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2025-05-22 17:47 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, H. Peter Anvin, Thomas Gleixner, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini
This doesn't just wrap guest access, it wraps all access.
On Wed, May 21, 2025, Maxim Levitsky wrote:
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 00f2b762710c..b505f3f7e9ab 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2653,7 +2653,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> if (vmx->nested.nested_run_pending &&
> (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
> kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
> - vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
> + vmx_guest_debugctl_write(vcpu, vmcs12->guest_ia32_debugctl);
> } else {
> kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
> vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
I think it makes sense to use the accessors for this case as well. Conceptually,
pre_vmenter_debugctl holds the guest value. The fact that it holds the combined
value _as written_ is rather subtle, and could change for the worse, e.g. it'd
be quite unfortunate/hilarious if someone converted the read path to use the
getter, and in doing so introduced a bug.
* Re: [PATCH v5 5/5] KVM: VMX: preserve DEBUGCTLMSR_FREEZE_IN_SMM
2025-05-22 0:55 ` [PATCH v5 5/5] KVM: VMX: preserve DEBUGCTLMSR_FREEZE_IN_SMM Maxim Levitsky
@ 2025-05-22 17:54 ` Sean Christopherson
0 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2025-05-22 17:54 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, H. Peter Anvin, Thomas Gleixner, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini
On Wed, May 21, 2025, Maxim Levitsky wrote:
> Pass through the host's DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM to the guest
> GUEST_IA32_DEBUGCTL without the guest seeing this value.
>
> Since the value of the host DEBUGCTL can in theory change between VM runs,
> check if has changed, and if yes, then reload the GUEST_IA32_DEBUGCTL with
> the new value.
>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/vmx/vmx.c | 6 +++++-
> arch/x86/kvm/x86.c | 7 +++++--
> 3 files changed, 11 insertions(+), 3 deletions(-)
SVM and TDX definitely should WARN (though TDX can simply reuse the WARN on a
non-zero run_flags), if only to document that KVM isn't buggy.
> @@ -7380,6 +7381,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
> if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
> set_debugreg(vcpu->arch.dr6, 6);
>
> + if (run_flags & KVM_RUN_LOAD_DEBUGCTL)
> + vmx_guest_debugctl_write(vcpu, vmx_guest_debugctl_read());
There's a rather amusing and subtle nested VMX bug. On a VM-Fail that is missed
by KVM, KVM will have done vcpu_enter_guest() => vmx_vcpu_run() with vmcs02,
i.e. will have updated the host_debugctl snapshot, but won't explicitly write
vmcs01 because nested_vmx_restore_host_state() doesn't emulate a VM-Exit (it mostly
restores state that KVM shoved into its software model).
I mention that here, because I was already wondering if it made sense to add a
helper to perform the VMWRITE if and only if necessary. I was leaning "no",
because for this path, it should always be necessary. But with the nested VM-Fail
path in play, it will often be unnecessary.
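E.g., a conditional-write helper could look roughly like this (sketch only;
name and placement hypothetical):

        /*
         * Sketch: refresh the FREEZE_IN_SMM bit in GUEST_IA32_DEBUGCTL from the
         * host's value, issuing the VMWRITE only if the field actually changes.
         */
        static void vmx_refresh_guest_debugctl(struct kvm_vcpu *vcpu)
        {
                u64 cur = vmcs_read64(GUEST_IA32_DEBUGCTL);
                u64 val = (cur & ~DEBUGCTLMSR_FREEZE_IN_SMM) |
                          (vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM);

                if (val != cur)
                        vmcs_write64(GUEST_IA32_DEBUGCTL, val);
        }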
* Re: [PATCH v5 0/5] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough
2025-05-22 0:55 [PATCH v5 0/5] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough Maxim Levitsky
` (4 preceding siblings ...)
2025-05-22 0:55 ` [PATCH v5 5/5] KVM: VMX: preserve DEBUGCTLMSR_FREEZE_IN_SMM Maxim Levitsky
@ 2025-05-22 18:06 ` Sean Christopherson
2025-05-22 22:45 ` Sean Christopherson
5 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2025-05-22 18:06 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, H. Peter Anvin, Thomas Gleixner, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini
On Wed, May 21, 2025, Maxim Levitsky wrote:
> V5: addressed the review feedback. Thanks.
I'll send v6 later today. I want to include the TDX fix[*] in this series, and
hopefully we can save some back-and-forth on the series (I want to get this into
6.16-rc1). My plan is to shove this into a dedicated topic branch by end of week,
and then send a separate pull request for the branch sometime next week.
[*] https://lore.kernel.org/all/aC0IwYfNvuo_vUDU@google.com
* Re: [PATCH v5 3/5] KVM: nVMX: check vmcs12->guest_ia32_debugctl value given by L2
2025-05-22 0:55 ` [PATCH v5 3/5] KVM: nVMX: check vmcs12->guest_ia32_debugctl value given by L2 Maxim Levitsky
@ 2025-05-22 21:31 ` Sean Christopherson
2025-05-22 22:44 ` Sean Christopherson
0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2025-05-22 21:31 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, H. Peter Anvin, Thomas Gleixner, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini,
Chao Gao
On Wed, May 21, 2025, Maxim Levitsky wrote:
> Check the vmcs12 guest_ia32_debugctl value before loading it, to avoid L2
> being able to load arbitrary values to hardware IA32_DEBUGCTL.
>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
> arch/x86/kvm/vmx/nested.c | 3 ++-
> arch/x86/kvm/vmx/vmx.c | 2 +-
> arch/x86/kvm/vmx/vmx.h | 1 +
> 3 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index e073e3008b16..00f2b762710c 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3146,7 +3146,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
> return -EINVAL;
>
> if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
> - CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
> + (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
> + CC(vmcs12->guest_ia32_debugctl & ~vmx_get_supported_debugctl(vcpu, false))))
This is a breaking change. For better or worse (read: worse), KVM's ABI is to
drop BTF and LBR if they're unsupported (the former is always unsupported).
Failure to honor that ABI means L1 can't explicitly load what it thinks is its
current value into L2.
I'll slot in a patch to provide another helper for checking the validity of
DEBUGCTL. I think I've managed to cobble together something that isn't too
horrific (options are a bit limited due to the existing ugliness).
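E.g., roughly along these lines (hypothetical shape; the actual patch may
differ), keeping the silently-dropped bits out of the hard check:

        /*
         * Sketch: reject truly unsupported bits, but keep BTF and LBR out of
         * the hard check since KVM's ABI is to silently drop them when they're
         * unsupported.
         */
        static bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data,
                                          bool host_initiated)
        {
                u64 invalid = data & ~vmx_get_supported_debugctl(vcpu, host_initiated);

                return !(invalid & ~(DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR));
        }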
* Re: [PATCH v5 3/5] KVM: nVMX: check vmcs12->guest_ia32_debugctl value given by L2
2025-05-22 21:31 ` Sean Christopherson
@ 2025-05-22 22:44 ` Sean Christopherson
2025-06-04 14:02 ` Sean Christopherson
0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2025-05-22 22:44 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, H. Peter Anvin, Thomas Gleixner, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini,
Chao Gao, Jim Mattson
+Jim
On Thu, May 22, 2025, Sean Christopherson wrote:
> On Wed, May 21, 2025, Maxim Levitsky wrote:
> > Check the vmcs12 guest_ia32_debugctl value before loading it, to avoid L2
> > being able to load arbitrary values to hardware IA32_DEBUGCTL.
> >
> > Reviewed-by: Chao Gao <chao.gao@intel.com>
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> > arch/x86/kvm/vmx/nested.c | 3 ++-
> > arch/x86/kvm/vmx/vmx.c | 2 +-
> > arch/x86/kvm/vmx/vmx.h | 1 +
> > 3 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index e073e3008b16..00f2b762710c 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -3146,7 +3146,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
> > return -EINVAL;
> >
> > if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
> > - CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
> > + (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
> > + CC(vmcs12->guest_ia32_debugctl & ~vmx_get_supported_debugctl(vcpu, false))))
>
> This is a breaking change. For better or worse (read: worse), KVM's ABI is to
> drop BTF and LBR if they're unsupported (the former is always unsupported).
> Failure to honor that ABI means L1 can't explicitly load what it thinks is its
> current value into L2.
>
> I'll slot in a patch to provide another helper for checking the validity of
> DEBUGCTL. I think I've managed to cobble together something that isn't too
> horrific (options are a bit limited due to the existing ugliness).
And then Jim ruined my day. :-)
As evidenced by this hilarious KUT testcase, it's entirely possible there are
existing KVM guests that are utilizing/abusing DEBUGCTL features.
/*
* Optional RTM test for hardware that supports RTM, to
* demonstrate that the current volume 3 of the SDM
* (325384-067US), table 27-1 is incorrect. Bit 16 of the exit
* qualification for debug exceptions is not reserved. It is
* set to 1 if a debug exception (#DB) or a breakpoint
* exception (#BP) occurs inside an RTM region while advanced
* debugging of RTM transactional regions is enabled.
*/
if (this_cpu_has(X86_FEATURE_RTM)) {
vmcs_write(ENT_CONTROLS,
vmcs_read(ENT_CONTROLS) | ENT_LOAD_DBGCTLS);
/*
* Set DR7.RTM[bit 11] and IA32_DEBUGCTL.RTM[bit 15]
* in the guest to enable advanced debugging of RTM
* transactional regions.
*/
vmcs_write(GUEST_DR7, BIT(11));
vmcs_write(GUEST_DEBUGCTL, BIT(15));
single_step_guest("Hardware delivered single-step in "
"transactional region", starting_dr6, 0);
check_db_exit(false, false, false, &xbegin, BIT(16),
starting_dr6);
} else {
vmcs_write(GUEST_RIP, (u64)&skip_rtm);
enter_guest();
}
For RTM specifically, disallowing DEBUGCTL.RTM but allowing DR7.RTM seems a bit
silly. Unless there's a security concern, that can probably be fixed by adding
support for RTM. Note, there's also a virtualization hole here, as KVM doesn't
vet DR7 beyond checking that bits 63:32 are zero, i.e. a guest could set DR7.RTM
even if KVM doesn't advertise support. Of course, closing that hole would require
completely dropping support for disabling DR interception, since VMX doesn't
give per-DR controls.
For the other bits, I don't see a good solution. The only viable options I see
are to silently drop all unsupported bits (maybe with a quirk?), or enforce all
bits and cross our fingers that no L1 VMM is running guests with those bits set
in GUEST_DEBUGCTL.
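To make the two options concrete, a rough sketch (the function name and the
sanitize/enforce switch are hypothetical):

        /*
         * Hypothetical sketch contrasting the two options for
         * vmcs12.GUEST_IA32_DEBUGCTL; "sanitize" would presumably be quirk-gated.
         */
        static int nested_vmx_load_guest_debugctl(struct kvm_vcpu *vcpu,
                                                  struct vmcs12 *vmcs12,
                                                  bool sanitize)
        {
                u64 supported = vmx_get_supported_debugctl(vcpu, false);
                u64 data = vmcs12->guest_ia32_debugctl;

                if (sanitize)                           /* silently drop unsupported bits */
                        data &= supported;
                else if (CC(data & ~supported))         /* enforce, fail VM-Enter */
                        return -EINVAL;

                vmx_guest_debugctl_write(vcpu, data);
                return 0;
        }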
* Re: [PATCH v5 0/5] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough
2025-05-22 18:06 ` [PATCH v5 0/5] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough Sean Christopherson
@ 2025-05-22 22:45 ` Sean Christopherson
0 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2025-05-22 22:45 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, H. Peter Anvin, Thomas Gleixner, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini
On Thu, May 22, 2025, Sean Christopherson wrote:
> On Wed, May 21, 2025, Maxim Levitsky wrote:
> > V5: addressed the review feedback. Thanks.
>
> I'll send v6 later today.
This ain't happening. See the conversation I'm having with myself in patch 3.
I do have a refreshed and tested (ignoring the nested mess) series prepped, i.e.
no need for you to work on a v6, I just won't get it posted today.
* Re: [PATCH v5 3/5] KVM: nVMX: check vmcs12->guest_ia32_debugctl value given by L2
2025-05-22 22:44 ` Sean Christopherson
@ 2025-06-04 14:02 ` Sean Christopherson
0 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2025-06-04 14:02 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, H. Peter Anvin, Thomas Gleixner, Dave Hansen,
Borislav Petkov, Ingo Molnar, linux-kernel, x86, Paolo Bonzini,
Chao Gao, Jim Mattson
On Thu, May 22, 2025, Sean Christopherson wrote:
> +Jim
>
> On Thu, May 22, 2025, Sean Christopherson wrote:
> > On Wed, May 21, 2025, Maxim Levitsky wrote:
> > > Check the vmcs12 guest_ia32_debugctl value before loading it, to avoid L2
> > > being able to load arbitrary values to hardware IA32_DEBUGCTL.
> > >
> > > Reviewed-by: Chao Gao <chao.gao@intel.com>
> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > ---
> > > arch/x86/kvm/vmx/nested.c | 3 ++-
> > > arch/x86/kvm/vmx/vmx.c | 2 +-
> > > arch/x86/kvm/vmx/vmx.h | 1 +
> > > 3 files changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > index e073e3008b16..00f2b762710c 100644
> > > --- a/arch/x86/kvm/vmx/nested.c
> > > +++ b/arch/x86/kvm/vmx/nested.c
> > > @@ -3146,7 +3146,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
> > > return -EINVAL;
> > >
> > > if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
> > > - CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
> > > + (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
> > > + CC(vmcs12->guest_ia32_debugctl & ~vmx_get_supported_debugctl(vcpu, false))))
> >
> > This is a breaking change. For better or worse (read: worse), KVM's ABI is to
> > drop BTF and LBR if they're unsupported (the former is always unsupported).
> > Failure to honor that ABI means L1 can't explicitly load what it thinks is its
> > current value into L2.
> >
> > I'll slot in a patch to provide another helper for checking the validity of
> > DEBUGCTL. I think I've managed to cobble together something that isn't too
> > horrific (options are a bit limited due to the existing ugliness).
>
> And then Jim ruined my day. :-)
>
> As evidenced by this hilarious KUT testcase, it's entirely possible there are
> existing KVM guests that are utilizing/abusing DEBUGCTL features.
>
> /*
> * Optional RTM test for hardware that supports RTM, to
> * demonstrate that the current volume 3 of the SDM
> * (325384-067US), table 27-1 is incorrect. Bit 16 of the exit
> * qualification for debug exceptions is not reserved. It is
> * set to 1 if a debug exception (#DB) or a breakpoint
> * exception (#BP) occurs inside an RTM region while advanced
> * debugging of RTM transactional regions is enabled.
> */
> if (this_cpu_has(X86_FEATURE_RTM)) {
> vmcs_write(ENT_CONTROLS,
> vmcs_read(ENT_CONTROLS) | ENT_LOAD_DBGCTLS);
> /*
> * Set DR7.RTM[bit 11] and IA32_DEBUGCTL.RTM[bit 15]
> * in the guest to enable advanced debugging of RTM
> * transactional regions.
> */
> vmcs_write(GUEST_DR7, BIT(11));
> vmcs_write(GUEST_DEBUGCTL, BIT(15));
> single_step_guest("Hardware delivered single-step in "
> "transactional region", starting_dr6, 0);
> check_db_exit(false, false, false, &xbegin, BIT(16),
> starting_dr6);
> } else {
> vmcs_write(GUEST_RIP, (u64)&skip_rtm);
> enter_guest();
> }
>
> For RTM specifically, disallowing DEBUGCTL.RTM but allowing DR7.RTM seems a bit
> silly. Unless there's a security concern, that can probably be fixed by adding
> support for RTM. Note, there's also a virtualization hole here, as KVM doesn't
> vet DR7 beyond checking that bits 63:32 are zero, i.e. a guest could set DR7.RTM
> even if KVM doesn't advertise support. Of course, closing that hole would require
> completely dropping support for disabling DR interception, since VMX doesn't
> give per-DR controls.
>
> For the other bits, I don't see a good solution. The only viable options I see
> are to silently drop all unsupported bits (maybe with a quirk?), or enforce all
> bits and cross our fingers that no L1 VMM is running guests with those bits set
> in GUEST_DEBUGCTL.
Paolo and I discussed this in PUCK this morning.
We agree trying to close the DR7 virtualization hole would be futile, and that we
should add support for DEBUGCTL.RTM to avoid breaking use of that specific bit.
For the other DEBUGCTL bits, none of them actually work (although, somewhat
amusingly, FREEZE_WHILE_SMM would "work" for real SMIs, which aren't visible to
L1), so we're going to roll the dice, cross our fingers that no existing workload
is setting those bits only in vmcs12.GUEST_DEBUGCTL, and enforce
vmx_get_supported_debugctl() at VM-Enter without any quirk.
6 TR: Setting this bit to 1 enables branch trace messages to be sent.
7 BTS: Setting this bit enables branch trace messages (BTMs) to be logged in a BTS buffer.
8 BTINT: When clear, BTMs are logged in a BTS buffer in circular fashion.
9 BTS_OFF_OS: When set, BTS or BTM is skipped if CPL = 0.
10 BTS_OFF_USR: When set, BTS or BTM is skipped if CPL > 0.
11 FREEZE_LBRS_ON_PMI: When set, the LBR stack is frozen on a PMI request.
12 FREEZE_PERFMON_ON_PMI: When set, each ENABLE bit of the global counter control MSR are frozen on a PMI request.
13 ENABLE_UNCORE_PMI: When set, enables the logical processor to receive and generate PMI on behalf of the uncore.
14 FREEZE_WHILE_SMM: When set, freezes perfmon and trace messages while in SMM.