* [PATCH 1/3] KVM: x86: Snapshot the host's DEBUGCTL in common x86
2025-02-24 18:13 [PATCH 0/3] KVM: SVM: Zero DEBUGCTL before VMRUN if necessary Sean Christopherson
@ 2025-02-24 18:13 ` Sean Christopherson
2025-02-25 16:19 ` Xiaoyao Li
2025-02-24 18:13 ` [PATCH 2/3] KVM: SVM: Manually zero/restore DEBUGCTL if LBR virtualization is disabled Sean Christopherson
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2025-02-24 18:13 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, rangemachine, whanos, Ravi Bangoria,
Peter Zijlstra
Move KVM's snapshot of DEBUGCTL to kvm_vcpu_arch and take the snapshot in
common x86, so that SVM can also use the snapshot.
Opportunistically change the field to a u64. While bits 63:32 are reserved
on AMD, not mentioned at all in Intel's SDM, and managed as an "unsigned
long" by the kernel, DEBUGCTL is an MSR and therefore a 64-bit value.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx/vmx.c | 8 ++------
arch/x86/kvm/vmx/vmx.h | 2 --
arch/x86/kvm/x86.c | 1 +
4 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3506f497741b..02bffe6b54c8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -781,6 +781,7 @@ struct kvm_vcpu_arch {
u32 pkru;
u32 hflags;
u64 efer;
+ u64 host_debugctl;
u64 apic_base;
struct kvm_lapic *apic; /* kernel irqchip context */
bool load_eoi_exitmap_pending;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b71392989609..729c224b72dd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1514,16 +1514,12 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
*/
void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
- struct vcpu_vmx *vmx = to_vmx(vcpu);
-
if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm))
shrink_ple_window(vcpu);
vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
vmx_vcpu_pi_load(vcpu, cpu);
-
- vmx->host_debugctlmsr = get_debugctlmsr();
}
void vmx_vcpu_put(struct kvm_vcpu *vcpu)
@@ -7458,8 +7454,8 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
}
/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
- if (vmx->host_debugctlmsr)
- update_debugctlmsr(vmx->host_debugctlmsr);
+ if (vcpu->arch.host_debugctl)
+ update_debugctlmsr(vcpu->arch.host_debugctl);
#ifndef CONFIG_X86_64
/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 8b111ce1087c..951e44dc9d0e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -340,8 +340,6 @@ struct vcpu_vmx {
/* apic deadline value in host tsc */
u64 hv_deadline_tsc;
- unsigned long host_debugctlmsr;
-
/*
* Only bits masked by msr_ia32_feature_control_valid_bits can be set in
* msr_ia32_feature_control. FEAT_CTL_LOCKED is always included
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58b82d6fd77c..09c3d27cc01a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4991,6 +4991,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
/* Save host pkru register if supported */
vcpu->arch.host_pkru = read_pkru();
+ vcpu->arch.host_debugctl = get_debugctlmsr();
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
--
2.48.1.658.g4767266eb4-goog
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH 1/3] KVM: x86: Snapshot the host's DEBUGCTL in common x86
2025-02-24 18:13 ` [PATCH 1/3] KVM: x86: Snapshot the host's DEBUGCTL in common x86 Sean Christopherson
@ 2025-02-25 16:19 ` Xiaoyao Li
0 siblings, 0 replies; 8+ messages in thread
From: Xiaoyao Li @ 2025-02-25 16:19 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, rangemachine, whanos, Ravi Bangoria,
Peter Zijlstra
On 2/25/2025 2:13 AM, Sean Christopherson wrote:
> Move KVM's snapshot of DEBUGCTL to kvm_vcpu_arch and take the snapshot in
> common x86, so that SVM can also use the snapshot.
>
> Opportunistically change the field to a u64. While bits 63:32 are reserved
> on AMD, not mentioned at all in Intel's SDM, and managed as an "unsigned
> long" by the kernel, DEBUGCTL is an MSR and therefore a 64-bit value.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/vmx/vmx.c | 8 ++------
> arch/x86/kvm/vmx/vmx.h | 2 --
> arch/x86/kvm/x86.c | 1 +
> 4 files changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3506f497741b..02bffe6b54c8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -781,6 +781,7 @@ struct kvm_vcpu_arch {
> u32 pkru;
> u32 hflags;
> u64 efer;
> + u64 host_debugctl;
> u64 apic_base;
> struct kvm_lapic *apic; /* kernel irqchip context */
> bool load_eoi_exitmap_pending;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index b71392989609..729c224b72dd 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1514,16 +1514,12 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
> */
> void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> {
> - struct vcpu_vmx *vmx = to_vmx(vcpu);
> -
> if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm))
> shrink_ple_window(vcpu);
>
> vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
>
> vmx_vcpu_pi_load(vcpu, cpu);
> -
> - vmx->host_debugctlmsr = get_debugctlmsr();
> }
>
> void vmx_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -7458,8 +7454,8 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
> }
>
> /* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
> - if (vmx->host_debugctlmsr)
> - update_debugctlmsr(vmx->host_debugctlmsr);
> + if (vcpu->arch.host_debugctl)
> + update_debugctlmsr(vcpu->arch.host_debugctl);
>
> #ifndef CONFIG_X86_64
> /*
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 8b111ce1087c..951e44dc9d0e 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -340,8 +340,6 @@ struct vcpu_vmx {
> /* apic deadline value in host tsc */
> u64 hv_deadline_tsc;
>
> - unsigned long host_debugctlmsr;
> -
> /*
> * Only bits masked by msr_ia32_feature_control_valid_bits can be set in
> * msr_ia32_feature_control. FEAT_CTL_LOCKED is always included
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 58b82d6fd77c..09c3d27cc01a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4991,6 +4991,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>
> /* Save host pkru register if supported */
> vcpu->arch.host_pkru = read_pkru();
> + vcpu->arch.host_debugctl = get_debugctlmsr();
>
> /* Apply any externally detected TSC adjustments (due to suspend) */
> if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/3] KVM: SVM: Manually zero/restore DEBUGCTL if LBR virtualization is disabled
2025-02-24 18:13 [PATCH 0/3] KVM: SVM: Zero DEBUGCTL before VMRUN if necessary Sean Christopherson
2025-02-24 18:13 ` [PATCH 1/3] KVM: x86: Snapshot the host's DEBUGCTL in common x86 Sean Christopherson
@ 2025-02-24 18:13 ` Sean Christopherson
2025-02-26 6:15 ` Ravi Bangoria
2025-02-24 18:13 ` [PATCH 3/3] KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs Sean Christopherson
2025-02-25 15:56 ` [PATCH 0/3] KVM: SVM: Zero DEBUGCTL before VMRUN if necessary Peter Zijlstra
3 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2025-02-24 18:13 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, rangemachine, whanos, Ravi Bangoria,
Peter Zijlstra
Manually zero DEBUGCTL prior to VMRUN if the host's value is non-zero and
LBR virtualization is disabled, as hardware only context switches DEBUGCTL
if LBR virtualization is fully enabled. Running the guest with the host's
value has likely been mildly problematic for quite some time, e.g. it will
result in undesirable behavior if host is running with BTF=1.
But the bug became fatal with the introduction of Bus Lock Trap ("Detect"
in kernel paralance) support for AMD (commit 408eb7417a92
("x86/bus_lock: Add support for AMD")), as a bus lock in the guest will
trigger an unexpected #DB.
Note, suppressing the bus lock #DB, i.e. simply resuming the guest without
injecting a #DB, is not an option. It wouldn't address the general issue
with DEBUGCTL, e.g. for things like BTF, and there are other guest-visible
side effects if BusLockTrap is left enabled.
If BusLockTrap is disabled, then DR6.BLD is reserved-to-1; any attempts to
clear it by software are ignored. But if BusLockTrap is enabled, software
can clear DR6.BLD:
Software enables bus lock trap by setting DebugCtl MSR[BLCKDB] (bit 2)
to 1. When bus lock trap is enabled, ... The processor indicates that
this #DB was caused by a bus lock by clearing DR6[BLD] (bit 11). DR6[11]
previously had been defined to be always 1.
and clearing DR6.BLD is "sticky" in that it's not set (i.e. lowered) by
other #DBs:
All other #DB exceptions leave DR6[BLD] unmodified
E.g. leaving BusLockTrap enable can confuse a legacy guest that writes '0'
to reset DR6.
Reported-by: rangemachine@gmail.com
Reported-by: whanos@sergal.fun
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219787
Closes: https://lore.kernel.org/all/bug-219787-28872@https.bugzilla.kernel.org%2F
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/svm.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b8aa0f36850f..d5519e592cb3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4253,6 +4253,16 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
clgi();
kvm_load_guest_xsave_state(vcpu);
+ /*
+ * Hardware only context switches DEBUGCTL if LBR virtualization is
+ * enabled. Manually zero DEBUGCTL if necessary (and restore it after
+ * VM-Exit), as running with the host's DEBUGCTL can negatively affect
+ * guest state and can even be fatal, e.g. due to Bus Lock Detect.
+ */
+ if (vcpu->arch.host_debugctl &&
+ !(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK))
+ update_debugctlmsr(0);
+
kvm_wait_lapic_expire(vcpu);
/*
@@ -4280,6 +4290,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
+ if (vcpu->arch.host_debugctl &&
+ !(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK))
+ update_debugctlmsr(vcpu->arch.host_debugctl);
+
kvm_load_host_xsave_state(vcpu);
stgi();
--
2.48.1.658.g4767266eb4-goog
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH 2/3] KVM: SVM: Manually zero/restore DEBUGCTL if LBR virtualization is disabled
2025-02-24 18:13 ` [PATCH 2/3] KVM: SVM: Manually zero/restore DEBUGCTL if LBR virtualization is disabled Sean Christopherson
@ 2025-02-26 6:15 ` Ravi Bangoria
2025-02-26 15:42 ` Sean Christopherson
0 siblings, 1 reply; 8+ messages in thread
From: Ravi Bangoria @ 2025-02-26 6:15 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, rangemachine, whanos,
Peter Zijlstra, Ravi Bangoria
Hi Sean,
On 24-Feb-25 11:43 PM, Sean Christopherson wrote:
> Manually zero DEBUGCTL prior to VMRUN if the host's value is non-zero and
> LBR virtualization is disabled, as hardware only context switches DEBUGCTL
> if LBR virtualization is fully enabled. Running the guest with the host's
> value has likely been mildly problematic for quite some time, e.g. it will
> result in undesirable behavior if host is running with BTF=1.
>
> But the bug became fatal with the introduction of Bus Lock Trap ("Detect"
> in kernel paralance) support for AMD (commit 408eb7417a92
> ("x86/bus_lock: Add support for AMD")), as a bus lock in the guest will
> trigger an unexpected #DB.
>
> Note, suppressing the bus lock #DB, i.e. simply resuming the guest without
> injecting a #DB, is not an option. It wouldn't address the general issue
> with DEBUGCTL, e.g. for things like BTF, and there are other guest-visible
> side effects if BusLockTrap is left enabled.
>
> If BusLockTrap is disabled, then DR6.BLD is reserved-to-1; any attempts to
> clear it by software are ignored. But if BusLockTrap is enabled, software
> can clear DR6.BLD:
>
> Software enables bus lock trap by setting DebugCtl MSR[BLCKDB] (bit 2)
> to 1. When bus lock trap is enabled, ... The processor indicates that
> this #DB was caused by a bus lock by clearing DR6[BLD] (bit 11). DR6[11]
> previously had been defined to be always 1.
>
> and clearing DR6.BLD is "sticky" in that it's not set (i.e. lowered) by
> other #DBs:
>
> All other #DB exceptions leave DR6[BLD] unmodified
>
> E.g. leaving BusLockTrap enable can confuse a legacy guest that writes '0'
> to reset DR6.
What if guest sets DEBUGCTL[BusLockTrapEn] and runs an application which
causes a bus lock? Guest will receive #DB due to bus lock, even though
guest CPUID says BusLockTrap isn't supported. Should KVM prevent guest
to write to DEBUGCTL[BusLockTrapEn]? Something like:
---
@@ -3168,6 +3168,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
if (data & DEBUGCTL_RESERVED_BITS)
return 1;
+ if ((data & DEBUGCTLMSR_BUS_LOCK_DETECT) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
+ return 1;
+
svm_get_lbr_vmcb(svm)->save.dbgctl = data;
svm_update_lbrv(vcpu);
break;
---
Thanks,
Ravi
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH 2/3] KVM: SVM: Manually zero/restore DEBUGCTL if LBR virtualization is disabled
2025-02-26 6:15 ` Ravi Bangoria
@ 2025-02-26 15:42 ` Sean Christopherson
0 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2025-02-26 15:42 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Paolo Bonzini, kvm, linux-kernel, rangemachine, whanos,
Peter Zijlstra
On Wed, Feb 26, 2025, Ravi Bangoria wrote:
> Hi Sean,
>
> On 24-Feb-25 11:43 PM, Sean Christopherson wrote:
> > Manually zero DEBUGCTL prior to VMRUN if the host's value is non-zero and
> > LBR virtualization is disabled, as hardware only context switches DEBUGCTL
> > if LBR virtualization is fully enabled. Running the guest with the host's
> > value has likely been mildly problematic for quite some time, e.g. it will
> > result in undesirable behavior if host is running with BTF=1.
> >
> > But the bug became fatal with the introduction of Bus Lock Trap ("Detect"
> > in kernel paralance) support for AMD (commit 408eb7417a92
> > ("x86/bus_lock: Add support for AMD")), as a bus lock in the guest will
> > trigger an unexpected #DB.
> >
> > Note, suppressing the bus lock #DB, i.e. simply resuming the guest without
> > injecting a #DB, is not an option. It wouldn't address the general issue
> > with DEBUGCTL, e.g. for things like BTF, and there are other guest-visible
> > side effects if BusLockTrap is left enabled.
> >
> > If BusLockTrap is disabled, then DR6.BLD is reserved-to-1; any attempts to
> > clear it by software are ignored. But if BusLockTrap is enabled, software
> > can clear DR6.BLD:
> >
> > Software enables bus lock trap by setting DebugCtl MSR[BLCKDB] (bit 2)
> > to 1. When bus lock trap is enabled, ... The processor indicates that
> > this #DB was caused by a bus lock by clearing DR6[BLD] (bit 11). DR6[11]
> > previously had been defined to be always 1.
> >
> > and clearing DR6.BLD is "sticky" in that it's not set (i.e. lowered) by
> > other #DBs:
> >
> > All other #DB exceptions leave DR6[BLD] unmodified
> >
> > E.g. leaving BusLockTrap enable can confuse a legacy guest that writes '0'
> > to reset DR6.
>
> What if guest sets DEBUGCTL[BusLockTrapEn] and runs an application which
> causes a bus lock? Guest will receive #DB due to bus lock, even though
> guest CPUID says BusLockTrap isn't supported. Should KVM prevent guest
> to write to DEBUGCTL[BusLockTrapEn]? Something like:
Ugh, right, AMD's legacy DEBUGCTL_RESERVED_BITS weirdness. Ideally, KVM would
make bits 5:2 reserved. I suspect we could get away with that, because VMX has
rejected all bits except BTF and LBR since the beginning. But I really, really
don't want to deal with more guest breakage due to sending such a change to
stable kernels, so for an immediate fix, I'll add a patch to drop those bits.
That'll still be a guest-visible change, e.g. if the guest is enabling LBRs *and*
the legacy PBi bits, then the state of the PBi bits would be accurate. But given
KVM's craptastic handling of DEBUGCTL, I highly doubt dropping bits 5:2 will break
anything.
*sigh*
And that's exposes yet another bug in this code. Zeroing DEBUGCTL before VMRUN
is wrong if the guest has enabled BTF. KVM should *load* the guest's desired
value if DEBUGCTL == BTF, i.e. if BTF is enabled but LBRs are not.
> ---
> @@ -3168,6 +3168,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> if (data & DEBUGCTL_RESERVED_BITS)
> return 1;
>
> + if ((data & DEBUGCTLMSR_BUS_LOCK_DETECT) &&
> + !guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
> + return 1;
> +
> svm_get_lbr_vmcb(svm)->save.dbgctl = data;
> svm_update_lbrv(vcpu);
> break;
> ---
>
> Thanks,
> Ravi
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 3/3] KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
2025-02-24 18:13 [PATCH 0/3] KVM: SVM: Zero DEBUGCTL before VMRUN if necessary Sean Christopherson
2025-02-24 18:13 ` [PATCH 1/3] KVM: x86: Snapshot the host's DEBUGCTL in common x86 Sean Christopherson
2025-02-24 18:13 ` [PATCH 2/3] KVM: SVM: Manually zero/restore DEBUGCTL if LBR virtualization is disabled Sean Christopherson
@ 2025-02-24 18:13 ` Sean Christopherson
2025-02-25 15:56 ` [PATCH 0/3] KVM: SVM: Zero DEBUGCTL before VMRUN if necessary Peter Zijlstra
3 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2025-02-24 18:13 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, rangemachine, whanos, Ravi Bangoria,
Peter Zijlstra
Snapshot the host's DEBUGCTL after disabling IRQs, as perf can toggle
debugctl bits from IRQ context, e.g. when enabling/disabling events via
smp_call_function_single(). Taking the snapshot (long) before IRQs are
disabled could result in KVM effectively clobbering DEBUGCTL due to using
a stale snapshot.
Cc: stable@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 09c3d27cc01a..a2cd734beef5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4991,7 +4991,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
/* Save host pkru register if supported */
vcpu->arch.host_pkru = read_pkru();
- vcpu->arch.host_debugctl = get_debugctlmsr();
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
@@ -10984,6 +10983,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
+ vcpu->arch.host_debugctl = get_debugctlmsr();
+
guest_timing_enter_irqoff();
for (;;) {
--
2.48.1.658.g4767266eb4-goog
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH 0/3] KVM: SVM: Zero DEBUGCTL before VMRUN if necessary
2025-02-24 18:13 [PATCH 0/3] KVM: SVM: Zero DEBUGCTL before VMRUN if necessary Sean Christopherson
` (2 preceding siblings ...)
2025-02-24 18:13 ` [PATCH 3/3] KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs Sean Christopherson
@ 2025-02-25 15:56 ` Peter Zijlstra
3 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2025-02-25 15:56 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, rangemachine, whanos,
Ravi Bangoria
On Mon, Feb 24, 2025 at 10:13:12AM -0800, Sean Christopherson wrote:
> PeterZ,
>
> Can you confirm that the last patch (snapshot and restore DEBUGCTL with
> IRQs disabled) is actually necessary? I'm 99% certain it is, but I'm
> holding out hope that it somehow isn't, because I don't love the idea of
> adding a RDMSR to every VM-Entry.
I think you're right. I mean, I'd have to go double check and trace the
various call paths again, but I'd be very surprised if we can't change
DEBUGCTL from NMI context.
> Assuming DEBUGCTL can indeed get modified in IRQ context, it probably
> makes sense to add a per-CPU cache to eliminate the RDMSR. Unfortunately,
> there are quite a few open-coded WRMSRs, so it's not a trivial change.
This, I'm surprised we've not yet done that.
> On to the main event...
>
> Fix a long-lurking bug in SVM where KVM runs the guest with the host's
> DEBUGCTL if LBR virtualization is disabled. AMD CPUs rather stupidly
> context switch DEBUGCTL if and only if LBR virtualization is enabled (not
> just supported, but fully enabled).
>
> The bug has gone unnoticed because until recently, the only bits that
> KVM would leave set were things like BTF, which are guest visible but
> won't cause functional problems unless guest software is being especially
> particular about #DBs.
>
> The bug was exposed by the addition of BusLockTrap ("Detect" in the kernel),
> as the resulting #DBs due to split-lock accesses in guest userspace (lol
> Steam) get reflected into the guest by KVM.
Hehe, yeah, games. Yeah we ran into that with bus-lock on intel too :-)
^ permalink raw reply [flat|nested] 8+ messages in thread