* [PATCH v2 0/5] KVM: SVM: Fix DEBUGCTL bugs
@ 2025-02-27 1:13 Sean Christopherson
2025-02-27 1:13 ` [PATCH v2 1/5] KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value Sean Christopherson
` (4 more replies)
0 siblings, 5 replies; 13+ messages in thread
From: Sean Christopherson @ 2025-02-27 1:13 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
whanos
Fix a long-lurking bug in SVM where KVM runs the guest with the host's
DEBUGCTL if LBR virtualization is disabled. AMD CPUs rather stupidly
context switch DEBUGCTL if and only if LBR virtualization is enabled (not
just supported, but fully enabled).
The bug has gone unnoticed because until recently, the only bits that
KVM would leave set were things like BTF, which are guest visible but
won't cause functional problems unless guest software is being especially
particular about #DBs.
The bug was exposed by the addition of BusLockTrap ("Detect" in the kernel),
as the resulting #DBs due to split-lock accesses in guest userspace (lol
Steam) get reflected into the guest by KVM.
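To make the failure mode concrete, here is a minimal userspace C model of one VMRUN/#VMEXIT round trip. It is a sketch under stated assumptions, not KVM code: the physical MSR is a plain variable, and struct vcpu_model, write_debugctl() and vmrun_round_trip() are hypothetical names. With fix == false it reproduces the bug (without LBR virtualization the guest runs with the host's value); with fix == true it loads the guest's own value around the run, as the v2 changelog below describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the physical DEBUGCTL MSR of one CPU and counts MSR writes. */
static uint64_t hw_debugctl;
static unsigned int nr_wrmsr;

static void write_debugctl(uint64_t val)
{
	hw_debugctl = val;
	nr_wrmsr++;
}

/* Hypothetical per-vCPU state; the names are illustrative, not KVM's. */
struct vcpu_model {
	bool lbr_virt;            /* LBR virtualization fully enabled */
	uint64_t guest_debugctl;  /* guest value, i.e. what the VMCB holds */
	uint64_t host_debugctl;   /* snapshot of the host value */
};

/*
 * Simulates one VMRUN/#VMEXIT round trip and returns the DEBUGCTL value
 * the guest ran with.  Assumes hw_debugctl holds the host value on entry.
 * fix == false models the bug: without LBR virtualization nothing swaps
 * the MSR, so the guest inherits whatever the host had loaded.
 */
static uint64_t vmrun_round_trip(const struct vcpu_model *v, bool fix)
{
	bool manual = fix && !v->lbr_virt &&
		      v->host_debugctl != v->guest_debugctl;
	uint64_t in_guest;

	if (manual)
		write_debugctl(v->guest_debugctl);   /* before VMRUN */

	if (v->lbr_virt)
		in_guest = v->guest_debugctl;  /* hardware swaps on VMRUN/#VMEXIT */
	else
		in_guest = hw_debugctl;        /* whatever is currently loaded */

	if (manual)
		write_debugctl(v->host_debugctl);    /* after #VMEXIT */

	return in_guest;
}
```

Skipping the MSR writes when host and guest values already match presumably keeps the common case (both 0) free of any extra WRMSR on every entry.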
v2:
- Load the guest's DEBUGCTL instead of simply zeroing it on VMRUN.
- Drop bits 5:2 from the guest's DEBUGCTL so that KVM doesn't let the guest
unintentionally enable BusLockTrap (AMD repurposed the bits). [Ravi]
- Collect a review. [Xiaoyao]
- Make bits 5:2 fully reserved, in a separate not-for-stable patch.
v1: https://lore.kernel.org/all/20250224181315.2376869-1-seanjc@google.com
Sean Christopherson (5):
KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
KVM: x86: Snapshot the host's DEBUGCTL in common x86
KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is
disabled
KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
KVM: SVM: Treat DEBUGCTL[5:2] as reserved
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm/svm.c | 15 +++++++++++++++
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 8 ++------
arch/x86/kvm/vmx/vmx.h | 2 --
arch/x86/kvm/x86.c | 2 ++
6 files changed, 21 insertions(+), 9 deletions(-)
base-commit: fed48e2967f402f561d80075a20c5c9e16866e53
--
2.48.1.711.g2feabab25a-goog
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 1/5] KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
From: Sean Christopherson @ 2025-02-27 1:13 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
whanos
Drop bits 5:2 from the guest's effective DEBUGCTL value, as AMD changed
the architectural behavior of the bits and broke backwards compatibility.
On CPUs without BusLockTrap (or at least, in APMs from before ~2023),
bits 5:2 controlled the behavior of external pins:
Performance-Monitoring/Breakpoint Pin-Control (PBi)—Bits 5:2, read/write.
Software uses these bits to control the type of information reported by
the four external performance-monitoring/breakpoint pins on the
processor. When a PBi bit is cleared to 0, the corresponding external pin
(BPi) reports performance-monitor information. When a PBi bit is set to
1, the corresponding external pin (BPi) reports breakpoint information.
With the introduction of BusLockTrap, presumably to be compatible with
Intel CPUs, AMD redefined bit 2 to be BLCKDB:
Bus Lock #DB Trap (BLCKDB)—Bit 2, read/write. Software sets this bit to
enable generation of a #DB trap following successful execution of a bus
lock when CPL is > 0.
and redefined bits 5:3 (and bit 6) as "6:3 Reserved MBZ".
Ideally, KVM would treat bits 5:2 as reserved. Defer that change to a
feature cleanup to avoid breaking existing guests in LTS kernels. For now,
drop the bits to retain backwards compatibility (of a sort).
Note, dropping bits 5:2 is still a guest-visible change, e.g. if the guest
is enabling LBRs *and* the legacy PBi bits, then the state of the PBi bits
is visible to the guest, whereas now the guest will always see '0'.
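The resulting write path can be sketched in userspace C. This is an illustrative model, not the kernel code: GENMASK64() is a local stand-in for the kernel's GENMASK(), and set_guest_debugctl() is a hypothetical name for the relevant part of svm_set_msr(). Bits 5:2 are silently dropped first, and only then is the value checked against the new, stricter DEBUGCTL_RESERVED_BITS.

```c
#include <stdint.h>

/* Local stand-in for the kernel's GENMASK(h, l) on 64-bit values. */
#define GENMASK64(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

#define DEBUGCTLMSR_LBR (1ULL << 0)
#define DEBUGCTLMSR_BTF (1ULL << 1)

/* The reserved mask as redefined by this patch. */
#define DEBUGCTL_RESERVED_BITS (~(DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR))

/*
 * Models the check after this patch: drop bits 5:2 (legacy PBi, or
 * BLCKDB plus reserved bits on newer CPUs), then reject anything else
 * outside LBR|BTF.  Returns 0 and stores the effective value on success;
 * returns 1 on failure, which KVM treats as a failed write.
 */
static int set_guest_debugctl(uint64_t data, uint64_t *effective)
{
	data &= ~GENMASK64(5, 2);
	if (data & DEBUGCTL_RESERVED_BITS)
		return 1;
	*effective = data;
	return 0;
}
```

So a legacy guest writing 0x3f still succeeds and ends up with only LBR|BTF set, while a write that sets bit 6 is rejected.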
Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/svm.c | 12 ++++++++++++
arch/x86/kvm/svm/svm.h | 2 +-
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b8aa0f36850f..2280bd1d0863 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3165,6 +3165,18 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
kvm_pr_unimpl_wrmsr(vcpu, ecx, data);
break;
}
+
+ /*
+ * AMD changed the architectural behavior of bits 5:2. On CPUs
+ * without BusLockTrap, bits 5:2 control "external pins", but
+ * on CPUs that support BusLockDetect, bit 2 enables BusLockTrap
+ * and bits 5:3 are reserved-to-zero. Sadly, old KVM allowed
+ * the guest to set bits 5:2 despite not actually virtualizing
+ * Performance-Monitoring/Breakpoint external pins. Drop bits
+ * 5:2 for backwards compatibility.
+ */
+ data &= ~GENMASK(5, 2);
+
if (data & DEBUGCTL_RESERVED_BITS)
return 1;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5b159f017055..f573548b7b41 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -582,7 +582,7 @@ static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
/* svm.c */
#define MSR_INVALID 0xffffffffU
-#define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
+#define DEBUGCTL_RESERVED_BITS (~(DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR))
extern bool dump_invalid_vmcb;
--
2.48.1.711.g2feabab25a-goog
* [PATCH v2 2/5] KVM: x86: Snapshot the host's DEBUGCTL in common x86
From: Sean Christopherson @ 2025-02-27 1:13 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
whanos
Move KVM's snapshot of DEBUGCTL to kvm_vcpu_arch and take the snapshot in
common x86, so that SVM can also use the snapshot.
Opportunistically change the field to a u64. While bits 63:32 are reserved
on AMD, not mentioned at all in Intel's SDM, and managed as an "unsigned
long" by the kernel, DEBUGCTL is an MSR and therefore a 64-bit value.
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx/vmx.c | 8 ++------
arch/x86/kvm/vmx/vmx.h | 2 --
arch/x86/kvm/x86.c | 1 +
4 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3506f497741b..02bffe6b54c8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -781,6 +781,7 @@ struct kvm_vcpu_arch {
u32 pkru;
u32 hflags;
u64 efer;
+ u64 host_debugctl;
u64 apic_base;
struct kvm_lapic *apic; /* kernel irqchip context */
bool load_eoi_exitmap_pending;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b71392989609..729c224b72dd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1514,16 +1514,12 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
*/
void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
- struct vcpu_vmx *vmx = to_vmx(vcpu);
-
if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm))
shrink_ple_window(vcpu);
vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
vmx_vcpu_pi_load(vcpu, cpu);
-
- vmx->host_debugctlmsr = get_debugctlmsr();
}
void vmx_vcpu_put(struct kvm_vcpu *vcpu)
@@ -7458,8 +7454,8 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
}
/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
- if (vmx->host_debugctlmsr)
- update_debugctlmsr(vmx->host_debugctlmsr);
+ if (vcpu->arch.host_debugctl)
+ update_debugctlmsr(vcpu->arch.host_debugctl);
#ifndef CONFIG_X86_64
/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 8b111ce1087c..951e44dc9d0e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -340,8 +340,6 @@ struct vcpu_vmx {
/* apic deadline value in host tsc */
u64 hv_deadline_tsc;
- unsigned long host_debugctlmsr;
-
/*
* Only bits masked by msr_ia32_feature_control_valid_bits can be set in
* msr_ia32_feature_control. FEAT_CTL_LOCKED is always included
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58b82d6fd77c..09c3d27cc01a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4991,6 +4991,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
/* Save host pkru register if supported */
vcpu->arch.host_pkru = read_pkru();
+ vcpu->arch.host_debugctl = get_debugctlmsr();
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
--
2.48.1.711.g2feabab25a-goog
* [PATCH v2 3/5] KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
From: Sean Christopherson @ 2025-02-27 1:13 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
whanos
Manually load the guest's DEBUGCTL prior to VMRUN (and restore the host's
value on #VMEXIT) if it diverges from the host's value and LBR
virtualization is disabled, as hardware only context switches DEBUGCTL if
LBR virtualization is fully enabled. Running the guest with the host's
value has likely been mildly problematic for quite some time, e.g. it will
result in undesirable behavior if BTF diverges.
But the bug became fatal with the introduction of Bus Lock Trap ("Detect"
in kernel parlance) support for AMD (commit 408eb7417a92
("x86/bus_lock: Add support for AMD")), as a bus lock in the guest will
trigger an unexpected #DB.
Note, suppressing the bus lock #DB, i.e. simply resuming the guest without
injecting a #DB, is not an option. It wouldn't address the general issue
with DEBUGCTL, e.g. for things like BTF, and there are other guest-visible
side effects if BusLockTrap is left enabled.
If BusLockTrap is disabled, then DR6.BLD is reserved-to-1; any attempts to
clear it by software are ignored. But if BusLockTrap is enabled, software
can clear DR6.BLD:
Software enables bus lock trap by setting DebugCtl MSR[BLCKDB] (bit 2)
to 1. When bus lock trap is enabled, ... The processor indicates that
this #DB was caused by a bus lock by clearing DR6[BLD] (bit 11). DR6[11]
previously had been defined to be always 1.
and clearing DR6.BLD is "sticky" in that other #DBs don't set it back to
'1':
All other #DB exceptions leave DR6[BLD] unmodified
E.g. leaving BusLockTrap enabled can confuse a legacy guest that writes '0'
to reset DR6.
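The DR6.BLD semantics quoted above can be modeled in a few lines of C. This is a simplified sketch: struct dr6_model and its helpers are hypothetical, only BLD (bit 11) and BS (bit 14) are modeled, and all other DR6 bits are ignored. It shows how a guest that resets DR6 by writing 0 ends up with BLD clear after an ordinary single-step #DB once BLCKDB is active behind its back, i.e. DR6 then reports a bus lock that never happened.

```c
#include <stdbool.h>
#include <stdint.h>

#define DR6_BLD (1u << 11)  /* active-low: 0 reports a bus lock */
#define DR6_BS  (1u << 14)  /* single-step */

/* Hypothetical model of DR6 on a BusLockTrap-capable CPU (other bits ignored). */
struct dr6_model {
	bool blckdb;    /* DEBUGCTL[BLCKDB] (bit 2) currently set */
	uint32_t dr6;
};

/* Software write: while BusLockTrap is disabled, BLD is reserved-to-1. */
static void write_dr6(struct dr6_model *m, uint32_t val)
{
	if (!m->blckdb)
		val |= DR6_BLD;
	m->dr6 = val;
}

/* Bus lock #DB: the processor clears BLD. */
static void bus_lock_db(struct dr6_model *m)
{
	m->dr6 &= ~DR6_BLD;
}

/* Any other #DB (single-step here) leaves BLD unmodified. */
static void single_step_db(struct dr6_model *m)
{
	m->dr6 |= DR6_BS;
}

/* True if DR6 claims that a bus lock caused the #DB. */
static bool reports_bus_lock(const struct dr6_model *m)
{
	return !(m->dr6 & DR6_BLD);
}
```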
Reported-by: rangemachine@gmail.com
Reported-by: whanos@sergal.fun
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219787
Closes: https://lore.kernel.org/all/bug-219787-28872@https.bugzilla.kernel.org%2F
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/svm.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2280bd1d0863..3924b9b198f4 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4265,6 +4265,16 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
clgi();
kvm_load_guest_xsave_state(vcpu);
+ /*
+ * Hardware only context switches DEBUGCTL if LBR virtualization is
+ * enabled. Manually load DEBUGCTL if necessary (and restore it after
+ * VM-Exit), as running with the host's DEBUGCTL can negatively affect
+ * guest state and can even be fatal, e.g. due to Bus Lock Detect.
+ */
+ if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) &&
+ vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl)
+ update_debugctlmsr(0);
+
kvm_wait_lapic_expire(vcpu);
/*
@@ -4292,6 +4302,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
+ if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) &&
+ vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl)
+ update_debugctlmsr(vcpu->arch.host_debugctl);
+
kvm_load_host_xsave_state(vcpu);
stgi();
--
2.48.1.711.g2feabab25a-goog
* [PATCH v2 4/5] KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
From: Sean Christopherson @ 2025-02-27 1:13 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
whanos
Snapshot the host's DEBUGCTL after disabling IRQs, as perf can toggle
debugctl bits from IRQ context, e.g. when enabling/disabling events via
smp_call_function_single(). Taking the snapshot (long) before IRQs are
disabled could result in KVM effectively clobbering DEBUGCTL due to using
a stale snapshot.
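The race can be illustrated with a small single-threaded C simulation, under these assumptions: the "IRQ" is a direct call to a hypothetical perf_ipi() that toggles one DEBUGCTL bit, and the exit path follows the VMX restore logic shown in patch 2/5 (DEBUGCTL is zeroed on VM-Exit and the snapshot is written back only if non-zero). simulate() and its parameters are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

static uint64_t hw_debugctl;    /* models the physical DEBUGCTL MSR */
static uint64_t host_snapshot;  /* models vcpu->arch.host_debugctl */

/* Hypothetical perf IPI handler that toggles one DEBUGCTL bit. */
static void perf_ipi(uint64_t bit)
{
	hw_debugctl ^= bit;
}

/*
 * Runs one simulated entry/exit and returns the host's DEBUGCTL afterwards.
 * snapshot_late == false: snapshot taken at vcpu_load() time, before the
 * IPI lands (old behavior).  snapshot_late == true: snapshot taken after
 * IRQs are disabled, so the IPI has already been processed (this patch).
 */
static uint64_t simulate(uint64_t initial, uint64_t ipi_bit, bool snapshot_late)
{
	hw_debugctl = initial;

	if (!snapshot_late)
		host_snapshot = hw_debugctl;

	perf_ipi(ipi_bit);   /* IRQ arrives while IRQs are still enabled */

	if (snapshot_late)
		host_snapshot = hw_debugctl;

	hw_debugctl = 0;     /* VM-Exit zeroes DEBUGCTL (VMX) */
	if (host_snapshot)   /* restore only if non-zero, as in vmx_vcpu_run() */
		hw_debugctl = host_snapshot;

	return hw_debugctl;
}
```

With a stale snapshot, perf's attempt to set the bit is lost (0 instead of 1), and its attempt to clear the bit is undone (1 instead of 0); taking the snapshot after IRQs are disabled closes that window.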
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 09c3d27cc01a..a2cd734beef5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4991,7 +4991,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
/* Save host pkru register if supported */
vcpu->arch.host_pkru = read_pkru();
- vcpu->arch.host_debugctl = get_debugctlmsr();
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
@@ -10984,6 +10983,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
+ vcpu->arch.host_debugctl = get_debugctlmsr();
+
guest_timing_enter_irqoff();
for (;;) {
--
2.48.1.711.g2feabab25a-goog
* [PATCH v2 5/5] KVM: SVM: Treat DEBUGCTL[5:2] as reserved
From: Sean Christopherson @ 2025-02-27 1:13 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
whanos
Stop ignoring DEBUGCTL[5:2] on AMD CPUs and instead treat them as reserved.
KVM has never properly virtualized AMD's legacy PBi bits, but did allow
the guest (and host userspace) to set the bits. To avoid breaking guests
when running on CPUs with BusLockTrap, which redefined bit 2 to BLCKDB and
made bits 5:3 reserved, a previous KVM change ignored bits 5:2, e.g. so
that legacy guest software wouldn't inadvertently enable BusLockTrap or
hit a VMRUN failure due to setting reserved bits.
To allow for virtualizing BusLockTrap and whatever future features may use
bits 5:3, treat bits 5:2 as reserved (and hope that doing so doesn't break
any existing guests).
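A hedged sketch of what treating bits 5:2 as reserved buys: the check becomes strict (nothing is silently dropped), and a future opt-in can carve a bit back out of the reserved mask. guest_has_buslocktrap and DEBUGCTL_BLCKDB are hypothetical names for illustration only; with the flag false, the mask equals ~(DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR), i.e. the DEBUGCTL_RESERVED_BITS definition from patch 1/5.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBUGCTLMSR_LBR  (1ULL << 0)
#define DEBUGCTLMSR_BTF  (1ULL << 1)
#define DEBUGCTL_BLCKDB  (1ULL << 2)  /* hypothetical name for bit 2 */

/*
 * Reserved mask for guest DEBUGCTL writes.  guest_has_buslocktrap is a
 * hypothetical knob for future BusLockTrap virtualization; when false the
 * result equals ~(DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR).
 */
static uint64_t debugctl_reserved_bits(bool guest_has_buslocktrap)
{
	uint64_t allowed = DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF;

	if (guest_has_buslocktrap)
		allowed |= DEBUGCTL_BLCKDB;
	return ~allowed;
}

/* Strict check: nothing is dropped; returns 1 if the write must fail. */
static int check_guest_debugctl(uint64_t data, bool guest_has_buslocktrap)
{
	return (data & debugctl_reserved_bits(guest_has_buslocktrap)) ? 1 : 0;
}
```

Compared with patch 1/5, a legacy guest that sets the PBi bits now has its write fail instead of having the bits quietly discarded, which is the compatibility risk acknowledged above.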
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/svm.c | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3924b9b198f4..7fc99c30d2cc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3166,17 +3166,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
break;
}
- /*
- * AMD changed the architectural behavior of bits 5:2. On CPUs
- * without BusLockTrap, bits 5:2 control "external pins", but
- * on CPUs that support BusLockDetect, bit 2 enables BusLockTrap
- * and bits 5:3 are reserved-to-zero. Sadly, old KVM allowed
- * the guest to set bits 5:2 despite not actually virtualizing
- * Performance-Monitoring/Breakpoint external pins. Drop bits
- * 5:2 for backwards compatibility.
- */
- data &= ~GENMASK(5, 2);
-
if (data & DEBUGCTL_RESERVED_BITS)
return 1;
--
2.48.1.711.g2feabab25a-goog
* Re: [PATCH v2 3/5] KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
From: Ravi Bangoria @ 2025-02-27 13:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Xiaoyao Li, rangemachine,
whanos, Ravi Bangoria
Hi Sean,
> @@ -4265,6 +4265,16 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
> clgi();
> kvm_load_guest_xsave_state(vcpu);
>
> + /*
> + * Hardware only context switches DEBUGCTL if LBR virtualization is
> + * enabled. Manually load DEBUGCTL if necessary (and restore it after
> + * VM-Exit), as running with the host's DEBUGCTL can negatively affect
> + * guest state and can even be fatal, e.g. due to Bus Lock Detect.
> + */
> + if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) &&
> + vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl)
> + update_debugctlmsr(0);
^^^^^^^^^^^^^^^^^^^^^
You mean:
update_debugctlmsr(svm->vmcb->save.dbgctl);
?
Somewhat related but independent: CPU automatically clears DEBUGCTL[BTF]
on #DB exception. So, when DEBUGCTL is save/restored by KVM (i.e. when
LBR virtualization is disabled), it's KVM's responsibility to clear
DEBUGCTL[BTF].
---
@@ -2090,6 +2090,14 @@ static int db_interception(struct kvm_vcpu *vcpu)
(KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) &&
!svm->nmi_singlestep) {
u32 payload = svm->vmcb->save.dr6 ^ DR6_ACTIVE_LOW;
+
+ /*
+ * CPU automatically clears DEBUGCTL[BTF] on #DB exception.
+ * Simulate it when DEBUGCTL isn't auto save/restored.
+ */
+ if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK))
+ svm->vmcb->save.dbgctl &= ~0x2;
+
kvm_queue_exception_p(vcpu, DB_VECTOR, payload);
return 1;
}
---
Thanks,
Ravi
* Re: [PATCH v2 3/5] KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
From: Ravi Bangoria @ 2025-02-27 14:09 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Xiaoyao Li, rangemachine,
whanos, Ravi Bangoria
> Somewhat related but independent: CPU automatically clears DEBUGCTL[BTF]
> on #DB exception. So, when DEBUGCTL is save/restored by KVM (i.e. when
> LBR virtualization is disabled), it's KVM's responsibility to clear
> DEBUGCTL[BTF].
Found this with below KUT test.
(I wasn't sure whether I should send a separate series for kernel fix + KUT
patch, or you can squash kernel fix in your patch and I shall send only KUT
patch. So for now, sending it as a reply here.)
---
diff --git a/x86/debug.c b/x86/debug.c
index f493567c..2d204c63 100644
--- a/x86/debug.c
+++ b/x86/debug.c
@@ -409,6 +409,45 @@ static noinline unsigned long singlestep_with_sti_hlt(void)
return start_rip;
}
+static noinline unsigned long __run_basic_block_ss_test(void)
+{
+ unsigned long start_rip;
+
+ wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_BTF);
+
+ asm volatile(
+ "pushf\n\t"
+ "pop %%rax\n\t"
+ "or $(1<<8),%%rax\n\t"
+ "push %%rax\n\t"
+ "popf\n\t"
+ "1: nop\n\t"
+ "jmp 2f\n\t"
+ "nop\n\t"
+ "2: lea 1b(%%rip), %0\n\t"
+ : "=r" (start_rip) : : "rax"
+ );
+
+ return start_rip;
+}
+
+static void run_basic_block_ss_test(void)
+{
+ unsigned long jmp_target;
+ unsigned long debugctl;
+
+ write_dr6(0);
+ jmp_target = __run_basic_block_ss_test() + 4;
+
+ report(is_single_step_db(dr6[0]) && db_addr[0] == jmp_target,
+ "Basic Block Single-step #DB: 0x%lx == 0x%lx", db_addr[0],
+ jmp_target);
+
+ debugctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
+ /* CPU should automatically clear DEBUGCTL[BTF] on #DB exception */
+ report(debugctl == 0, "DebugCtl[BTF] reset post #DB. 0x%lx", debugctl);
+}
+
int main(int ac, char **av)
{
unsigned long cr4;
@@ -475,6 +514,12 @@ int main(int ac, char **av)
run_ss_db_test(singlestep_with_movss_blocking_and_dr7_gd);
run_ss_db_test(singlestep_with_sti_hlt);
+ /* Seems DEBUGCTL[BTF] is not supported on Intel. Run it only on AMD */
+ if (this_cpu_has(X86_FEATURE_SVM)) {
+ n = 0;
+ run_basic_block_ss_test();
+ }
+
n = 0;
write_dr1((void *)&value);
write_dr6(DR6_BS);
---
Thanks,
Ravi
* Re: [PATCH v2 3/5] KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
From: Sean Christopherson @ 2025-02-27 14:29 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Paolo Bonzini, kvm, linux-kernel, Xiaoyao Li, rangemachine,
whanos
On Thu, Feb 27, 2025, Ravi Bangoria wrote:
> Hi Sean,
>
> > @@ -4265,6 +4265,16 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
> > clgi();
> > kvm_load_guest_xsave_state(vcpu);
> >
> > + /*
> > + * Hardware only context switches DEBUGCTL if LBR virtualization is
> > + * enabled. Manually load DEBUGCTL if necessary (and restore it after
> > + * VM-Exit), as running with the host's DEBUGCTL can negatively affect
> > + * guest state and can even be fatal, e.g. due to Bus Lock Detect.
> > + */
> > + if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) &&
> > + vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl)
> > + update_debugctlmsr(0);
>
> ^^^^^^^^^^^^^^^^^^^^^
> You mean:
> update_debugctlmsr(svm->vmcb->save.dbgctl);
> ?
Argh, yes.
> Somewhat related but independent: CPU automatically clears DEBUGCTL[BTF]
> on #DB exception. So, when DEBUGCTL is save/restored by KVM (i.e. when
> LBR virtualization is disabled), it's KVM's responsibility to clear
> DEBUGCTL[BTF].
> ---
> @@ -2090,6 +2090,14 @@ static int db_interception(struct kvm_vcpu *vcpu)
> (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) &&
> !svm->nmi_singlestep) {
> u32 payload = svm->vmcb->save.dr6 ^ DR6_ACTIVE_LOW;
> +
> + /*
> + * CPU automatically clears DEBUGCTL[BTF] on #DB exception.
> + * Simulate it when DEBUGCTL isn't auto save/restored.
> + */
> + if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK))
> + svm->vmcb->save.dbgctl &= ~0x2;
Any reason not to clear it unconditionally?
svm->vmcb->save.dbgctl &= ~DEBUGCTLMSR_BTF;
> kvm_queue_exception_p(vcpu, DB_VECTOR, payload);
> return 1;
> }
> ---
>
> Thanks,
> Ravi
* Re: [PATCH v2 3/5] KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
From: Sean Christopherson @ 2025-02-27 14:30 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Paolo Bonzini, kvm, linux-kernel, Xiaoyao Li, rangemachine,
whanos
On Thu, Feb 27, 2025, Ravi Bangoria wrote:
> > Somewhat related but independent: CPU automatically clears DEBUGCTL[BTF]
> > on #DB exception. So, when DEBUGCTL is save/restored by KVM (i.e. when
> > LBR virtualization is disabled), it's KVM's responsibility to clear
> > DEBUGCTL[BTF].
>
> Found this with below KUT test.
>
> (I wasn't sure whether I should send a separate series for kernel fix + KUT
> patch, or you can squash kernel fix in your patch and I shall send only KUT
> patch. So for now, sending it as a reply here.)
Go ahead and send the KUT test. The two repositories evolve independently no
matter the order; just include a Link to the lore thread for the kernel fix/discussion.
Thanks a ton for writing a test!
* Re: [PATCH v2 3/5] KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
From: Ravi Bangoria @ 2025-02-27 14:44 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Xiaoyao Li, rangemachine,
whanos, Ravi Bangoria
>> Somewhat related but independent: CPU automatically clears DEBUGCTL[BTF]
>> on #DB exception. So, when DEBUGCTL is save/restored by KVM (i.e. when
>> LBR virtualization is disabled), it's KVM's responsibility to clear
>> DEBUGCTL[BTF].
>> ---
>> @@ -2090,6 +2090,14 @@ static int db_interception(struct kvm_vcpu *vcpu)
>> (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) &&
>> !svm->nmi_singlestep) {
>> u32 payload = svm->vmcb->save.dr6 ^ DR6_ACTIVE_LOW;
>> +
>> + /*
>> + * CPU automatically clears DEBUGCTL[BTF] on #DB exception.
>> + * Simulate it when DEBUGCTL isn't auto save/restored.
>> + */
>> + if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK))
>> + svm->vmcb->save.dbgctl &= ~0x2;
>
> Any reason not to clear it unconditionally?
>
> svm->vmcb->save.dbgctl &= ~DEBUGCTLMSR_BTF;
No particular reason, just that HW would have already done it when LBRV
is enabled.
Thanks,
Ravi
* Re: [PATCH v2 3/5] KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
From: Sean Christopherson @ 2025-02-27 17:20 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Paolo Bonzini, kvm, linux-kernel, Xiaoyao Li, rangemachine,
whanos
On Thu, Feb 27, 2025, Ravi Bangoria wrote:
> > Somewhat related but independent: CPU automatically clears DEBUGCTL[BTF]
> > on #DB exception. So, when DEBUGCTL is save/restored by KVM (i.e. when
> > LBR virtualization is disabled), it's KVM's responsibility to clear
> > DEBUGCTL[BTF].
>
> Found this with below KUT test.
>
> (I wasn't sure whether I should send a separate series for kernel fix + KUT
> patch, or you can squash kernel fix in your patch and I shall send only KUT
> patch. So for now, sending it as a reply here.)
Actually, I'll post this along with some other cleanups to the test, and a fix
for Intel if needed (it _should_ pass on Intel). All the open-coded EFLAGS.TF
literals can be replaced, and clobbering arithmetic flags with SS is really, really
gross.
> ---
> diff --git a/x86/debug.c b/x86/debug.c
> index f493567c..2d204c63 100644
> --- a/x86/debug.c
> +++ b/x86/debug.c
> @@ -409,6 +409,45 @@ static noinline unsigned long singlestep_with_sti_hlt(void)
> return start_rip;
> }
>
> +static noinline unsigned long __run_basic_block_ss_test(void)
> +{
> + unsigned long start_rip;
> +
> + wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_BTF);
> +
> + asm volatile(
> + "pushf\n\t"
> + "pop %%rax\n\t"
> + "or $(1<<8),%%rax\n\t"
> + "push %%rax\n\t"
> + "popf\n\t"
> + "1: nop\n\t"
> + "jmp 2f\n\t"
> + "nop\n\t"
> + "2: lea 1b(%%rip), %0\n\t"
> + : "=r" (start_rip) : : "rax"
> + );
> +
> + return start_rip;
> +}
> +
> +static void run_basic_block_ss_test(void)
> +{
> + unsigned long jmp_target;
> + unsigned long debugctl;
> +
> + write_dr6(0);
> + jmp_target = __run_basic_block_ss_test() + 4;
> +
> + report(is_single_step_db(dr6[0]) && db_addr[0] == jmp_target,
> + "Basic Block Single-step #DB: 0x%lx == 0x%lx", db_addr[0],
> + jmp_target);
> +
> + debugctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
> + /* CPU should automatically clear DEBUGCTL[BTF] on #DB exception */
> + report(debugctl == 0, "DebugCtl[BTF] reset post #DB. 0x%lx", debugctl);
> +}
> +
> int main(int ac, char **av)
> {
> unsigned long cr4;
> @@ -475,6 +514,12 @@ int main(int ac, char **av)
> run_ss_db_test(singlestep_with_movss_blocking_and_dr7_gd);
> run_ss_db_test(singlestep_with_sti_hlt);
>
> + /* Seems DEBUGCTL[BTF] is not supported on Intel. Run it only on AMD */
> + if (this_cpu_has(X86_FEATURE_SVM)) {
> + n = 0;
> + run_basic_block_ss_test();
> + }
> +
> n = 0;
> write_dr1((void *)&value);
> write_dr6(DR6_BS);
> ---
>
> Thanks,
> Ravi
* Re: [PATCH v2 3/5] KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
From: Sean Christopherson @ 2025-02-27 17:55 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Paolo Bonzini, kvm, linux-kernel, Xiaoyao Li, rangemachine,
whanos
On Thu, Feb 27, 2025, Sean Christopherson wrote:
> On Thu, Feb 27, 2025, Ravi Bangoria wrote:
> > > Somewhat related but independent: CPU automatically clears DEBUGCTL[BTF]
> > > on #DB exception. So, when DEBUGCTL is save/restored by KVM (i.e. when
> > > LBR virtualization is disabled), it's KVM's responsibility to clear
> > > DEBUGCTL[BTF].
> >
> > Found this with below KUT test.
> >
> > (I wasn't sure whether I should send a separate series for kernel fix + KUT
> > patch, or you can squash kernel fix in your patch and I shall send only KUT
> > patch. So for now, sending it as a reply here.)
>
> Actually, I'll post this along with some other cleanups to the test, and a fix
> for Intel if needed (it _should_ pass on Intel).
*sigh*
I forgot that KVM doesn't actually support DEBUGCTL_BTF. VMX drops the flag
entirely, SVM doesn't clear BTF on #DB, the emulator doesn't honor it, it doesn't
play nice with KVM_GUESTDBG_SINGLESTEP, and who knows what else.
I could hack in enough support to get it limping, but I most definitely don't want
to do that for an LTS backport. The only way it has worked in any capacity on AMD
is if the guest happened to enable LBRs at the same time. So rather than trying
to go straight to a half-baked implementation, I think the least awful option is
to give SVM the same treatment and explicitly squash BTF. And then bribe someone
to put in the effort to get it fully functional (or at least, as close to fully
functional as we can get it).