linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs
@ 2025-02-27 22:24 Sean Christopherson
  2025-02-27 22:24 ` [PATCH v3 1/6] KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value Sean Christopherson
                   ` (9 more replies)
  0 siblings, 10 replies; 17+ messages in thread
From: Sean Christopherson @ 2025-02-27 22:24 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
	whanos

Fix a long-lurking bug in SVM where KVM runs the guest with the host's
DEBUGCTL if LBR virtualization is disabled.  AMD CPUs rather stupidly
context switch DEBUGCTL if and only if LBR virtualization is enabled (not
just supported, but fully enabled).

The bug has gone unnoticed because until recently, the only bits that
KVM would leave set were things like BTF, which are guest visible but
won't cause functional problems unless guest software is being especially
particular about #DBs.

The bug was exposed by the addition of BusLockTrap ("Detect" in the kernel),
as the resulting #DBs due to split-lock accesses in guest userspace (lol
Steam) get reflected into the guest by KVM.

Note, I don't love suppressing DEBUGCTL.BTF, but practically speaking that's
likely the behavior that SVM guests have gotten the vast, vast majority of
the time, and given that it's the behavior on Intel, it's (hopefully) a safe
option for a fix, e.g. versus trying to add proper BTF virtualization on the
fly.

v3:
 - Suppress BTF, as KVM doesn't actually support it. [Ravi]
 - Actually load the guest's DEBUGCTL (though amusingly, with BTF squashed,
   it's guaranteed to be '0' in this scenario). [Ravi]

v2:
 - Load the guest's DEBUGCTL instead of simply zeroing it on VMRUN.
 - Drop bits 5:3 from guest DEBUGCTL so that KVM doesn't let the guest
   unintentionally enable BusLockTrap (AMD repurposed bits). [Ravi]
 - Collect a review. [Xiaoyao]
 - Make bits 5:3 fully reserved, in a separate not-for-stable patch.

v1: https://lore.kernel.org/all/20250224181315.2376869-1-seanjc@google.com


Sean Christopherson (6):
  KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
  KVM: SVM: Suppress DEBUGCTL.BTF on AMD
  KVM: x86: Snapshot the host's DEBUGCTL in common x86
  KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is
    disabled
  KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
  KVM: SVM: Treat DEBUGCTL[5:2] as reserved

 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm/svm.c          | 24 ++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h          |  2 +-
 arch/x86/kvm/vmx/vmx.c          |  8 ++------
 arch/x86/kvm/vmx/vmx.h          |  2 --
 arch/x86/kvm/x86.c              |  2 ++
 6 files changed, 30 insertions(+), 9 deletions(-)


base-commit: fed48e2967f402f561d80075a20c5c9e16866e53
-- 
2.48.1.711.g2feabab25a-goog


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/6] KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
  2025-02-27 22:24 [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Sean Christopherson
@ 2025-02-27 22:24 ` Sean Christopherson
  2025-02-27 22:24 ` [PATCH v3 2/6] KVM: SVM: Suppress DEBUGCTL.BTF on AMD Sean Christopherson
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2025-02-27 22:24 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
	whanos

Drop bits 5:2 from the guest's effective DEBUGCTL value, as AMD changed
the architectural behavior of the bits and broke backwards compatibility.
On CPUs without BusLockTrap (or at least, in APMs from before ~2023),
bits 5:2 controlled the behavior of external pins:

  Performance-Monitoring/Breakpoint Pin-Control (PBi)—Bits 5:2, read/write.
  Software uses thesebits to control the type of information reported by
  the four external performance-monitoring/breakpoint pins on the
  processor. When a PBi bit is cleared to 0, the corresponding external pin
  (BPi) reports performance-monitor information. When a PBi bit is set to
  1, the corresponding external pin (BPi) reports breakpoint information.

With the introduction of BusLockTrap, presumably to be compatible with
Intel CPUs, AMD redefined bit 2 to be BLCKDB:

  Bus Lock #DB Trap (BLCKDB)—Bit 2, read/write. Software sets this bit to
  enable generation of a #DB trap following successful execution of a bus
  lock when CPL is > 0.

and redefined bits 5:3 (and bit 6) as "6:3 Reserved MBZ".

Ideally, KVM would treat bits 5:2 as reserved.  Defer that change to a
feature cleanup to avoid breaking existing guest in LTS kernels.  For now,
drop the bits to retain backwards compatibility (of a sort).

Note, dropping bits 5:2 is still a guest-visible change, e.g. if the guest
is enabling LBRs *and* the legacy PBi bits, then the state of the PBi bits
is visible to the guest, whereas now the guest will always see '0'.

Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 12 ++++++++++++
 arch/x86/kvm/svm/svm.h |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b8aa0f36850f..2280bd1d0863 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3165,6 +3165,18 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 			kvm_pr_unimpl_wrmsr(vcpu, ecx, data);
 			break;
 		}
+
+		/*
+		 * AMD changed the architectural behavior of bits 5:2.  On CPUs
+		 * without BusLockTrap, bits 5:2 control "external pins", but
+		 * on CPUs that support BusLockDetect, bit 2 enables BusLockTrap
+		 * and bits 5:3 are reserved-to-zero.  Sadly, old KVM allowed
+		 * the guest to set bits 5:2 despite not actually virtualizing
+		 * Performance-Monitoring/Breakpoint external pins.  Drop bits
+		 * 5:2 for backwards compatibility.
+		 */
+		data &= ~GENMASK(5, 2);
+
 		if (data & DEBUGCTL_RESERVED_BITS)
 			return 1;
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5b159f017055..f573548b7b41 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -582,7 +582,7 @@ static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
 /* svm.c */
 #define MSR_INVALID				0xffffffffU
 
-#define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
+#define DEBUGCTL_RESERVED_BITS (~(DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR))
 
 extern bool dump_invalid_vmcb;
 
-- 
2.48.1.711.g2feabab25a-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 2/6] KVM: SVM: Suppress DEBUGCTL.BTF on AMD
  2025-02-27 22:24 [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Sean Christopherson
  2025-02-27 22:24 ` [PATCH v3 1/6] KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value Sean Christopherson
@ 2025-02-27 22:24 ` Sean Christopherson
  2025-02-27 22:24 ` [PATCH v3 3/6] KVM: x86: Snapshot the host's DEBUGCTL in common x86 Sean Christopherson
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2025-02-27 22:24 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
	whanos

Mark BTF as reserved in DEBUGCTL on AMD, as KVM doesn't actually support
BTF, and fully enabling BTF virtualization is non-trivial due to
interactions with the emulator, guest_debug, #DB interception, nested SVM,
etc.

Don't inject #GP if the guest attempts to set BTF, as there's no way to
communicate lack of support to the guest, and instead suppress the flag
and treat the WRMSR as (partially) unsupported.

In short, make KVM behave the same on AMD and Intel (VMX already squashes
BTF).

Note, due to other bugs in KVM's handling of DEBUGCTL, the only way BTF
has "worked" in any capacity is if the guest simultaneously enables LBRs.

Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 9 +++++++++
 arch/x86/kvm/svm/svm.h | 2 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2280bd1d0863..b70c754686c4 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3177,6 +3177,15 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		 */
 		data &= ~GENMASK(5, 2);
 
+		/*
+		 * Suppress BTF as KVM doesn't virtualize BTF, but there's no
+		 * way to communicate lack of support to the guest.
+		 */
+		if (data & DEBUGCTLMSR_BTF) {
+			kvm_pr_unimpl_wrmsr(vcpu, MSR_IA32_DEBUGCTLMSR, data);
+			data &= ~DEBUGCTLMSR_BTF;
+		}
+
 		if (data & DEBUGCTL_RESERVED_BITS)
 			return 1;
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f573548b7b41..798c11e755e2 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -582,7 +582,7 @@ static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
 /* svm.c */
 #define MSR_INVALID				0xffffffffU
 
-#define DEBUGCTL_RESERVED_BITS (~(DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR))
+#define DEBUGCTL_RESERVED_BITS (~DEBUGCTLMSR_LBR)
 
 extern bool dump_invalid_vmcb;
 
-- 
2.48.1.711.g2feabab25a-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 3/6] KVM: x86: Snapshot the host's DEBUGCTL in common x86
  2025-02-27 22:24 [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Sean Christopherson
  2025-02-27 22:24 ` [PATCH v3 1/6] KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value Sean Christopherson
  2025-02-27 22:24 ` [PATCH v3 2/6] KVM: SVM: Suppress DEBUGCTL.BTF on AMD Sean Christopherson
@ 2025-02-27 22:24 ` Sean Christopherson
  2025-02-27 22:24 ` [PATCH v3 4/6] KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled Sean Christopherson
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2025-02-27 22:24 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
	whanos

Move KVM's snapshot of DEBUGCTL to kvm_vcpu_arch and take the snapshot in
common x86, so that SVM can also use the snapshot.

Opportunistically change the field to a u64.  While bits 63:32 are reserved
on AMD, not mentioned at all in Intel's SDM, and managed as an "unsigned
long" by the kernel, DEBUGCTL is an MSR and therefore a 64-bit value.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx/vmx.c          | 8 ++------
 arch/x86/kvm/vmx/vmx.h          | 2 --
 arch/x86/kvm/x86.c              | 1 +
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3506f497741b..02bffe6b54c8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -781,6 +781,7 @@ struct kvm_vcpu_arch {
 	u32 pkru;
 	u32 hflags;
 	u64 efer;
+	u64 host_debugctl;
 	u64 apic_base;
 	struct kvm_lapic *apic;    /* kernel irqchip context */
 	bool load_eoi_exitmap_pending;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b71392989609..729c224b72dd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1514,16 +1514,12 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
  */
 void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-
 	if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm))
 		shrink_ple_window(vcpu);
 
 	vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
 
 	vmx_vcpu_pi_load(vcpu, cpu);
-
-	vmx->host_debugctlmsr = get_debugctlmsr();
 }
 
 void vmx_vcpu_put(struct kvm_vcpu *vcpu)
@@ -7458,8 +7454,8 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 	}
 
 	/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
-	if (vmx->host_debugctlmsr)
-		update_debugctlmsr(vmx->host_debugctlmsr);
+	if (vcpu->arch.host_debugctl)
+		update_debugctlmsr(vcpu->arch.host_debugctl);
 
 #ifndef CONFIG_X86_64
 	/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 8b111ce1087c..951e44dc9d0e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -340,8 +340,6 @@ struct vcpu_vmx {
 	/* apic deadline value in host tsc */
 	u64 hv_deadline_tsc;
 
-	unsigned long host_debugctlmsr;
-
 	/*
 	 * Only bits masked by msr_ia32_feature_control_valid_bits can be set in
 	 * msr_ia32_feature_control. FEAT_CTL_LOCKED is always included
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58b82d6fd77c..09c3d27cc01a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4991,6 +4991,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	/* Save host pkru register if supported */
 	vcpu->arch.host_pkru = read_pkru();
+	vcpu->arch.host_debugctl = get_debugctlmsr();
 
 	/* Apply any externally detected TSC adjustments (due to suspend) */
 	if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
-- 
2.48.1.711.g2feabab25a-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 4/6] KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
  2025-02-27 22:24 [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Sean Christopherson
                   ` (2 preceding siblings ...)
  2025-02-27 22:24 ` [PATCH v3 3/6] KVM: x86: Snapshot the host's DEBUGCTL in common x86 Sean Christopherson
@ 2025-02-27 22:24 ` Sean Christopherson
  2025-02-27 22:24 ` [PATCH v3 5/6] KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs Sean Christopherson
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2025-02-27 22:24 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
	whanos

Manually load the guest's DEBUGCTL prior to VMRUN (and restore the host's
value on #VMEXIT) if it diverges from the host's value and LBR
virtualization is disabled, as hardware only context switches DEBUGCTL if
LBR virtualization is fully enabled.  Running the guest with the host's
value has likely been mildly problematic for quite some time, e.g. it will
result in undesirable behavior if BTF diverges (with the caveat that KVM
now suppresses guest BTF due to lack of support).

But the bug became fatal with the introduction of Bus Lock Trap ("Detect"
in kernel paralance) support for AMD (commit 408eb7417a92
("x86/bus_lock: Add support for AMD")), as a bus lock in the guest will
trigger an unexpected #DB.

Note, suppressing the bus lock #DB, i.e. simply resuming the guest without
injecting a #DB, is not an option.  It wouldn't address the general issue
with DEBUGCTL, e.g. for things like BTF, and there are other guest-visible
side effects if BusLockTrap is left enabled.

If BusLockTrap is disabled, then DR6.BLD is reserved-to-1; any attempts to
clear it by software are ignored.  But if BusLockTrap is enabled, software
can clear DR6.BLD:

  Software enables bus lock trap by setting DebugCtl MSR[BLCKDB] (bit 2)
  to 1.  When bus lock trap is enabled, ... The processor indicates that
  this #DB was caused by a bus lock by clearing DR6[BLD] (bit 11).  DR6[11]
  previously had been defined to be always 1.

and clearing DR6.BLD is "sticky" in that it's not set (i.e. lowered) by
other #DBs:

  All other #DB exceptions leave DR6[BLD] unmodified

E.g. leaving BusLockTrap enable can confuse a legacy guest that writes '0'
to reset DR6.

Reported-by: rangemachine@gmail.com
Reported-by: whanos@sergal.fun
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219787
Closes: https://lore.kernel.org/all/bug-219787-28872@https.bugzilla.kernel.org%2F
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b70c754686c4..78664f9b45c5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4274,6 +4274,16 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
 	clgi();
 	kvm_load_guest_xsave_state(vcpu);
 
+	/*
+	 * Hardware only context switches DEBUGCTL if LBR virtualization is
+	 * enabled.  Manually load DEBUGCTL if necessary (and restore it after
+	 * VM-Exit), as running with the host's DEBUGCTL can negatively affect
+	 * guest state and can even be fatal, e.g. due to Bus Lock Detect.
+	 */
+	if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) &&
+	    vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl)
+		update_debugctlmsr(svm->vmcb->save.dbgctl);
+
 	kvm_wait_lapic_expire(vcpu);
 
 	/*
@@ -4301,6 +4311,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
 	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
 		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
 
+	if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) &&
+	    vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl)
+		update_debugctlmsr(vcpu->arch.host_debugctl);
+
 	kvm_load_host_xsave_state(vcpu);
 	stgi();
 
-- 
2.48.1.711.g2feabab25a-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 5/6] KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
  2025-02-27 22:24 [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Sean Christopherson
                   ` (3 preceding siblings ...)
  2025-02-27 22:24 ` [PATCH v3 4/6] KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled Sean Christopherson
@ 2025-02-27 22:24 ` Sean Christopherson
  2025-02-27 22:24 ` [PATCH v3 6/6] KVM: SVM: Treat DEBUGCTL[5:2] as reserved Sean Christopherson
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2025-02-27 22:24 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
	whanos

Snapshot the host's DEBUGCTL after disabling IRQs, as perf can toggle
debugctl bits from IRQ context, e.g. when enabling/disabling events via
smp_call_function_single().  Taking the snapshot (long) before IRQs are
disabled could result in KVM effectively clobbering DEBUGCTL due to using
a stale snapshot.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 09c3d27cc01a..a2cd734beef5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4991,7 +4991,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	/* Save host pkru register if supported */
 	vcpu->arch.host_pkru = read_pkru();
-	vcpu->arch.host_debugctl = get_debugctlmsr();
 
 	/* Apply any externally detected TSC adjustments (due to suspend) */
 	if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
@@ -10984,6 +10983,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		set_debugreg(0, 7);
 	}
 
+	vcpu->arch.host_debugctl = get_debugctlmsr();
+
 	guest_timing_enter_irqoff();
 
 	for (;;) {
-- 
2.48.1.711.g2feabab25a-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 6/6] KVM: SVM: Treat DEBUGCTL[5:2] as reserved
  2025-02-27 22:24 [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Sean Christopherson
                   ` (4 preceding siblings ...)
  2025-02-27 22:24 ` [PATCH v3 5/6] KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs Sean Christopherson
@ 2025-02-27 22:24 ` Sean Christopherson
  2025-02-28  9:31 ` [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Ravi Bangoria
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2025-02-27 22:24 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
	whanos

Stop ignoring DEBUGCTL[5:2] on AMD CPUs and instead treat them as reserved.
KVM has never properly virtualized AMD's legacy PBi bits, but did allow
the guest (and host userspace) to set the bits.  To avoid breaking guests
when running on CPUs with BusLockTrap, which redefined bit 2 to BLCKDB and
made bits 5:3 reserved, a previous KVM change ignored bits 5:3, e.g. so
that legacy guest software wouldn't inadvertently enable BusLockTrap or
hit a VMRUN failure due to setting reserved.

To allow for virtualizing BusLockTrap and whatever future features may use
bits 5:3, treat bits 5:2 as reserved (and hope that doing so doesn't break
any existing guests).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 78664f9b45c5..fc9f9a624d93 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3166,17 +3166,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 			break;
 		}
 
-		/*
-		 * AMD changed the architectural behavior of bits 5:2.  On CPUs
-		 * without BusLockTrap, bits 5:2 control "external pins", but
-		 * on CPUs that support BusLockDetect, bit 2 enables BusLockTrap
-		 * and bits 5:3 are reserved-to-zero.  Sadly, old KVM allowed
-		 * the guest to set bits 5:2 despite not actually virtualizing
-		 * Performance-Monitoring/Breakpoint external pins.  Drop bits
-		 * 5:2 for backwards compatibility.
-		 */
-		data &= ~GENMASK(5, 2);
-
 		/*
 		 * Suppress BTF as KVM doesn't virtualize BTF, but there's no
 		 * way to communicate lack of support to the guest.
-- 
2.48.1.711.g2feabab25a-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs
  2025-02-27 22:24 [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Sean Christopherson
                   ` (5 preceding siblings ...)
  2025-02-27 22:24 ` [PATCH v3 6/6] KVM: SVM: Treat DEBUGCTL[5:2] as reserved Sean Christopherson
@ 2025-02-28  9:31 ` Ravi Bangoria
  2025-02-28 14:04   ` Sean Christopherson
  2025-02-28 23:40 ` Sean Christopherson
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 17+ messages in thread
From: Ravi Bangoria @ 2025-02-28  9:31 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Xiaoyao Li, rangemachine, whanos,
	Ravi Bangoria

On 28-Feb-25 3:54 AM, Sean Christopherson wrote:
> Fix a long-lurking bug in SVM where KVM runs the guest with the host's
> DEBUGCTL if LBR virtualization is disabled.  AMD CPUs rather stupidly
> context switch DEBUGCTL if and only if LBR virtualization is enabled (not
> just supported, but fully enabled).
> 
> The bug has gone unnoticed because until recently, the only bits that
> KVM would leave set were things like BTF, which are guest visible but
> won't cause functional problems unless guest software is being especially
> particular about #DBs.
> 
> The bug was exposed by the addition of BusLockTrap ("Detect" in the kernel),
> as the resulting #DBs due to split-lock accesses in guest userspace (lol
> Steam) get reflected into the guest by KVM.
> 
> Note, I don't love suppressing DEBUGCTL.BTF, but practically speaking that's
> likely the behavior that SVM guests have gotten the vast, vast majority of
> the time, and given that it's the behavior on Intel, it's (hopefully) a safe
> option for a fix, e.g. versus trying to add proper BTF virtualization on the
> fly.
> 
> v3:
>  - Suppress BTF, as KVM doesn't actually support it. [Ravi]
>  - Actually load the guest's DEBUGCTL (though amusingly, with BTF squashed,
>    it's guaranteed to be '0' in this scenario). [Ravi]
> 
> v2:
>  - Load the guest's DEBUGCTL instead of simply zeroing it on VMRUN.
>  - Drop bits 5:3 from guest DEBUGCTL so that KVM doesn't let the guest
>    unintentionally enable BusLockTrap (AMD repurposed bits). [Ravi]
>  - Collect a review. [Xiaoyao]
>  - Make bits 5:3 fully reserved, in a separate not-for-stable patch.
> 
> v1: https://lore.kernel.org/all/20250224181315.2376869-1-seanjc@google.com

For the series,

Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@amd.com>

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs
  2025-02-28  9:31 ` [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Ravi Bangoria
@ 2025-02-28 14:04   ` Sean Christopherson
  0 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2025-02-28 14:04 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Paolo Bonzini, kvm, linux-kernel, Xiaoyao Li, rangemachine,
	whanos

On Fri, Feb 28, 2025, Ravi Bangoria wrote:
> On 28-Feb-25 3:54 AM, Sean Christopherson wrote:
> > Fix a long-lurking bug in SVM where KVM runs the guest with the host's
> > DEBUGCTL if LBR virtualization is disabled.  AMD CPUs rather stupidly
> > context switch DEBUGCTL if and only if LBR virtualization is enabled (not
> > just supported, but fully enabled).
> > 
> > The bug has gone unnoticed because until recently, the only bits that
> > KVM would leave set were things like BTF, which are guest visible but
> > won't cause functional problems unless guest software is being especially
> > particular about #DBs.
> > 
> > The bug was exposed by the addition of BusLockTrap ("Detect" in the kernel),
> > as the resulting #DBs due to split-lock accesses in guest userspace (lol
> > Steam) get reflected into the guest by KVM.
> > 
> > Note, I don't love suppressing DEBUGCTL.BTF, but practically speaking that's
> > likely the behavior that SVM guests have gotten the vast, vast majority of
> > the time, and given that it's the behavior on Intel, it's (hopefully) a safe
> > option for a fix, e.g. versus trying to add proper BTF virtualization on the
> > fly.
> > 
> > v3:
> >  - Suppress BTF, as KVM doesn't actually support it. [Ravi]
> >  - Actually load the guest's DEBUGCTL (though amusingly, with BTF squashed,
> >    it's guaranteed to be '0' in this scenario). [Ravi]
> > 
> > v2:
> >  - Load the guest's DEBUGCTL instead of simply zeroing it on VMRUN.
> >  - Drop bits 5:3 from guest DEBUGCTL so that KVM doesn't let the guest
> >    unintentionally enable BusLockTrap (AMD repurposed bits). [Ravi]
> >  - Collect a review. [Xiaoyao]
> >  - Make bits 5:3 fully reserved, in a separate not-for-stable patch.
> > 
> > v1: https://lore.kernel.org/all/20250224181315.2376869-1-seanjc@google.com
> 
> For the series,
> 
> Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@amd.com>

Thank you for all your help, much appreciated!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs
  2025-02-27 22:24 [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Sean Christopherson
                   ` (6 preceding siblings ...)
  2025-02-28  9:31 ` [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Ravi Bangoria
@ 2025-02-28 23:40 ` Sean Christopherson
  2025-04-02  3:57 ` Maxim Levitsky
  2025-05-02 21:51 ` Sean Christopherson
  9 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2025-02-28 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
	whanos

On Thu, 27 Feb 2025 14:24:05 -0800, Sean Christopherson wrote:
> Fix a long-lurking bug in SVM where KVM runs the guest with the host's
> DEBUGCTL if LBR virtualization is disabled.  AMD CPUs rather stupidly
> context switch DEBUGCTL if and only if LBR virtualization is enabled (not
> just supported, but fully enabled).
> 
> The bug has gone unnoticed because until recently, the only bits that
> KVM would leave set were things like BTF, which are guest visible but
> won't cause functional problems unless guest software is being especially
> particular about #DBs.
> 
> [...]

Applied 1-5 to kvm-x86 fixes (for 6.14).  I'm going to hold off on making
DEBUGCTL[5:2] reserved until at least 6.15.

[1/6] KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
      https://github.com/kvm-x86/linux/commit/ee89e8013383
[2/6] KVM: SVM: Suppress DEBUGCTL.BTF on AMD
      https://github.com/kvm-x86/linux/commit/d0eac42f5cec
[3/6] KVM: x86: Snapshot the host's DEBUGCTL in common x86
      https://github.com/kvm-x86/linux/commit/fb71c7959356
[4/6] KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
      https://github.com/kvm-x86/linux/commit/433265870ab3
[5/6] KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
      https://github.com/kvm-x86/linux/commit/189ecdb3e112
[6/6] KVM: SVM: Treat DEBUGCTL[5:2] as reserved
      (no commit info)

--
https://github.com/kvm-x86/linux/tree/next

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs
  2025-02-27 22:24 [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Sean Christopherson
                   ` (7 preceding siblings ...)
  2025-02-28 23:40 ` Sean Christopherson
@ 2025-04-02  3:57 ` Maxim Levitsky
  2025-04-08 15:08   ` Maxim Levitsky
  2025-04-08 22:43   ` Sean Christopherson
  2025-05-02 21:51 ` Sean Christopherson
  9 siblings, 2 replies; 17+ messages in thread
From: Maxim Levitsky @ 2025-04-02  3:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
	whanos

On Thu, 2025-02-27 at 14:24 -0800, Sean Christopherson wrote:
> Fix a long-lurking bug in SVM where KVM runs the guest with the host's
> DEBUGCTL if LBR virtualization is disabled.  AMD CPUs rather stupidly
> context switch DEBUGCTL if and only if LBR virtualization is enabled (not
> just supported, but fully enabled).
> 
> The bug has gone unnoticed because until recently, the only bits that
> KVM would leave set were things like BTF, which are guest visible but
> won't cause functional problems unless guest software is being especially
> particular about #DBs.
> 
> The bug was exposed by the addition of BusLockTrap ("Detect" in the kernel),
> as the resulting #DBs due to split-lock accesses in guest userspace (lol
> Steam) get reflected into the guest by KVM.
> 
> Note, I don't love suppressing DEBUGCTL.BTF, but practically speaking that's
> likely the behavior that SVM guests have gotten the vast, vast majority of
> the time, and given that it's the behavior on Intel, it's (hopefully) a safe
> option for a fix, e.g. versus trying to add proper BTF virtualization on the
> fly.
> 
> v3:
>  - Suppress BTF, as KVM doesn't actually support it. [Ravi]
>  - Actually load the guest's DEBUGCTL (though amusingly, with BTF squashed,
>    it's guaranteed to be '0' in this scenario). [Ravi]
> 
> v2:
>  - Load the guest's DEBUGCTL instead of simply zeroing it on VMRUN.
>  - Drop bits 5:3 from guest DEBUGCTL so that KVM doesn't let the guest
>    unintentionally enable BusLockTrap (AMD repurposed bits). [Ravi]
>  - Collect a review. [Xiaoyao]
>  - Make bits 5:3 fully reserved, in a separate not-for-stable patch.
> 
> v1: https://lore.kernel.org/all/20250224181315.2376869-1-seanjc@google.com
> 


Hi,

Amusingly there is another DEBUGCTL issue, which I just got to the bottom of.
(if I am not mistaken of course).

We currently don't let the guest set DEBUGCTL.FREEZE_WHILE_SMM and neither
set it ourselves in GUEST_IA32_DEBUGCTL vmcs field, even when supported by the host
(If I read the code correctly, I didn't verify this in runtime)

This means that the host #SMIs will interfere with the guest PMU.
In particular this causes the 'pmu' kvm-unit-test to fail, which is something that our CI caught.

I think that kvm should just set this bit, or even better, use the host value of this bit,
and hide it from the guest, because the guest shouldn't know about host's smm, 
and we AFAIK don't really support freezing perfmon when the guest enters its own emulated SMM.

What do you think? I'll post patches if you think that this is a good idea.
(A temp hack to set this bit always in GUEST_IA32_DEBUGCTL fixed the problem for me)

I also need to check if AMD also has this feature, or if this is Intel specific.

Best regards,
	Maxim Levitsky

> 
> Sean Christopherson (6):
>   KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
>   KVM: SVM: Suppress DEBUGCTL.BTF on AMD
>   KVM: x86: Snapshot the host's DEBUGCTL in common x86
>   KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is
>     disabled
>   KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
>   KVM: SVM: Treat DEBUGCTL[5:2] as reserved
> 
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/svm/svm.c          | 24 ++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.h          |  2 +-
>  arch/x86/kvm/vmx/vmx.c          |  8 ++------
>  arch/x86/kvm/vmx/vmx.h          |  2 --
>  arch/x86/kvm/x86.c              |  2 ++
>  6 files changed, 30 insertions(+), 9 deletions(-)
> 
> 
> base-commit: fed48e2967f402f561d80075a20c5c9e16866e53



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs
  2025-04-02  3:57 ` Maxim Levitsky
@ 2025-04-08 15:08   ` Maxim Levitsky
  2025-04-08 22:43   ` Sean Christopherson
  1 sibling, 0 replies; 17+ messages in thread
From: Maxim Levitsky @ 2025-04-08 15:08 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
	whanos

On Tue, 2025-04-01 at 23:57 -0400, Maxim Levitsky wrote:
> On Thu, 2025-02-27 at 14:24 -0800, Sean Christopherson wrote:
> > Fix a long-lurking bug in SVM where KVM runs the guest with the host's
> > DEBUGCTL if LBR virtualization is disabled.  AMD CPUs rather stupidly
> > context switch DEBUGCTL if and only if LBR virtualization is enabled (not
> > just supported, but fully enabled).
> > 
> > The bug has gone unnoticed because until recently, the only bits that
> > KVM would leave set were things like BTF, which are guest visible but
> > won't cause functional problems unless guest software is being especially
> > particular about #DBs.
> > 
> > The bug was exposed by the addition of BusLockTrap ("Detect" in the kernel),
> > as the resulting #DBs due to split-lock accesses in guest userspace (lol
> > Steam) get reflected into the guest by KVM.
> > 
> > Note, I don't love suppressing DEBUGCTL.BTF, but practically speaking that's
> > likely the behavior that SVM guests have gotten the vast, vast majority of
> > the time, and given that it's the behavior on Intel, it's (hopefully) a safe
> > option for a fix, e.g. versus trying to add proper BTF virtualization on the
> > fly.
> > 
> > v3:
> >  - Suppress BTF, as KVM doesn't actually support it. [Ravi]
> >  - Actually load the guest's DEBUGCTL (though amusingly, with BTF squashed,
> >    it's guaranteed to be '0' in this scenario). [Ravi]
> > 
> > v2:
> >  - Load the guest's DEBUGCTL instead of simply zeroing it on VMRUN.
> >  - Drop bits 5:3 from guest DEBUGCTL so that KVM doesn't let the guest
> >    unintentionally enable BusLockTrap (AMD repurposed bits). [Ravi]
> >  - Collect a review. [Xiaoyao]
> >  - Make bits 5:3 fully reserved, in a separate not-for-stable patch.
> > 
> > v1: https://lore.kernel.org/all/20250224181315.2376869-1-seanjc@google.com
> > 
> 
> Hi,
> 
> Amusingly there is another DEBUGCTL issue, which I just got to the bottom of.
> (if I am not mistaken of course).
> 
> We currently don't let the guest set DEBUGCTL.FREEZE_WHILE_SMM and neither
> set it ourselves in GUEST_IA32_DEBUGCTL vmcs field, even when supported by the host
> (If I read the code correctly, I didn't verify this in runtime)
> 
> This means that the host #SMIs will interfere with the guest PMU.
> In particular this causes the 'pmu' kvm-unit-test to fail, which is something that our CI caught.
> 
> I think that kvm should just set this bit, or even better, use the host value of this bit,
> and hide it from the guest, because the guest shouldn't know about host's smm, 
> and we AFAIK don't really support freezing perfmon when the guest enters its own emulated SMM.
> 
> What do you think? I'll post patches if you think that this is a good idea.
> (A temp hack to set this bit always in GUEST_IA32_DEBUGCTL fixed the problem for me)
> 
> I also need to check if AMD also has this feature, or if this is Intel specific.

Any update?

Best regards,
	Maxim Levitsky

> 
> Best regards,
> 	Maxim Levitsky
> 
> > Sean Christopherson (6):
> >   KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
> >   KVM: SVM: Suppress DEBUGCTL.BTF on AMD
> >   KVM: x86: Snapshot the host's DEBUGCTL in common x86
> >   KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is
> >     disabled
> >   KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
> >   KVM: SVM: Treat DEBUGCTL[5:2] as reserved
> > 
> >  arch/x86/include/asm/kvm_host.h |  1 +
> >  arch/x86/kvm/svm/svm.c          | 24 ++++++++++++++++++++++++
> >  arch/x86/kvm/svm/svm.h          |  2 +-
> >  arch/x86/kvm/vmx/vmx.c          |  8 ++------
> >  arch/x86/kvm/vmx/vmx.h          |  2 --
> >  arch/x86/kvm/x86.c              |  2 ++
> >  6 files changed, 30 insertions(+), 9 deletions(-)
> > 
> > 
> > base-commit: fed48e2967f402f561d80075a20c5c9e16866e53



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs
  2025-04-02  3:57 ` Maxim Levitsky
  2025-04-08 15:08   ` Maxim Levitsky
@ 2025-04-08 22:43   ` Sean Christopherson
  2025-04-09 20:52     ` Maxim Levitsky
  2025-04-14  6:32     ` Sandipan Das
  1 sibling, 2 replies; 17+ messages in thread
From: Sean Christopherson @ 2025-04-08 22:43 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li,
	rangemachine, whanos

On Tue, Apr 01, 2025, Maxim Levitsky wrote:
> On Thu, 2025-02-27 at 14:24 -0800, Sean Christopherson wrote:
> > Fix a long-lurking bug in SVM where KVM runs the guest with the host's
> > DEBUGCTL if LBR virtualization is disabled.  AMD CPUs rather stupidly
> > context switch DEBUGCTL if and only if LBR virtualization is enabled (not
> > just supported, but fully enabled).
> > 
> > The bug has gone unnoticed because until recently, the only bits that
> > KVM would leave set were things like BTF, which are guest visible but
> > won't cause functional problems unless guest software is being especially
> > particular about #DBs.
> > 
> > The bug was exposed by the addition of BusLockTrap ("Detect" in the kernel),
> > as the resulting #DBs due to split-lock accesses in guest userspace (lol
> > Steam) get reflected into the guest by KVM.
> > 
> > Note, I don't love suppressing DEBUGCTL.BTF, but practically speaking that's
> > likely the behavior that SVM guests have gotten the vast, vast majority of
> > the time, and given that it's the behavior on Intel, it's (hopefully) a safe
> > option for a fix, e.g. versus trying to add proper BTF virtualization on the
> > fly.
> > 
> > v3:
> >  - Suppress BTF, as KVM doesn't actually support it. [Ravi]
> >  - Actually load the guest's DEBUGCTL (though amusingly, with BTF squashed,
> >    it's guaranteed to be '0' in this scenario). [Ravi]
> > 
> > v2:
> >  - Load the guest's DEBUGCTL instead of simply zeroing it on VMRUN.
> >  - Drop bits 5:3 from guest DEBUGCTL so that KVM doesn't let the guest
> >    unintentionally enable BusLockTrap (AMD repurposed bits). [Ravi]
> >  - Collect a review. [Xiaoyao]
> >  - Make bits 5:3 fully reserved, in a separate not-for-stable patch.
> > 
> > v1: https://lore.kernel.org/all/20250224181315.2376869-1-seanjc@google.com
> > 
> 
> 
> Hi,
> 
> Amusingly there is another DEBUGCTL issue, which I just got to the bottom of.
> (if I am not mistaken of course).
> 
> We currently don't let the guest set DEBUGCTL.FREEZE_WHILE_SMM and neither
> set it ourselves in GUEST_IA32_DEBUGCTL vmcs field, even when supported by the host
> (If I read the code correctly, I didn't verify this in runtime)

Ugh, SMM.  Yeah, KVM doesn't propagate DEBUGCTLMSR_FREEZE_IN_SMM to the guest
value.  KVM intercepts reads and writes to DEBUGCTL, so it should be easy enough
to shove the bit in on writes, and drop it on reads.

> This means that the host #SMIs will interfere with the guest PMU.  In
> particular this causes the 'pmu' kvm-unit-test to fail, which is something
> that our CI caught.
> 
> I think that kvm should just set this bit, or even better, use the host value
> of this bit, and hide it from the guest, because the guest shouldn't know
> about host's smm, and we AFAIK don't really support freezing perfmon when the
> guest enters its own emulated SMM.

Agreed.  Easy thing is to use the host's value, so that KVM doesn't need to check
for its existence.  I can't think of anything that would go sideways by freezing
perfmon if the host happens to take an SMI.

> What do you think? I'll post patches if you think that this is a good idea.
> (A temp hack to set this bit always in GUEST_IA32_DEBUGCTL fixed the problem for me)
> 
> I also need to check if AMD also has this feature, or if this is Intel specific.

Intel only.  I assume/think/hope AMD's Host/Guest Only field in the event selector
effectively hides SMM from the guest.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs
  2025-04-08 22:43   ` Sean Christopherson
@ 2025-04-09 20:52     ` Maxim Levitsky
  2025-04-14  6:32     ` Sandipan Das
  1 sibling, 0 replies; 17+ messages in thread
From: Maxim Levitsky @ 2025-04-09 20:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li,
	rangemachine, whanos

On Tue, 2025-04-08 at 15:43 -0700, Sean Christopherson wrote:
> On Tue, Apr 01, 2025, Maxim Levitsky wrote:
> > On Thu, 2025-02-27 at 14:24 -0800, Sean Christopherson wrote:
> > > Fix a long-lurking bug in SVM where KVM runs the guest with the host's
> > > DEBUGCTL if LBR virtualization is disabled.  AMD CPUs rather stupidly
> > > context switch DEBUGCTL if and only if LBR virtualization is enabled (not
> > > just supported, but fully enabled).
> > > 
> > > The bug has gone unnoticed because until recently, the only bits that
> > > KVM would leave set were things like BTF, which are guest visible but
> > > won't cause functional problems unless guest software is being especially
> > > particular about #DBs.
> > > 
> > > The bug was exposed by the addition of BusLockTrap ("Detect" in the kernel),
> > > as the resulting #DBs due to split-lock accesses in guest userspace (lol
> > > Steam) get reflected into the guest by KVM.
> > > 
> > > Note, I don't love suppressing DEBUGCTL.BTF, but practically speaking that's
> > > likely the behavior that SVM guests have gotten the vast, vast majority of
> > > the time, and given that it's the behavior on Intel, it's (hopefully) a safe
> > > option for a fix, e.g. versus trying to add proper BTF virtualization on the
> > > fly.
> > > 
> > > v3:
> > >  - Suppress BTF, as KVM doesn't actually support it. [Ravi]
> > >  - Actually load the guest's DEBUGCTL (though amusingly, with BTF squashed,
> > >    it's guaranteed to be '0' in this scenario). [Ravi]
> > > 
> > > v2:
> > >  - Load the guest's DEBUGCTL instead of simply zeroing it on VMRUN.
> > >  - Drop bits 5:3 from guest DEBUGCTL so that KVM doesn't let the guest
> > >    unintentionally enable BusLockTrap (AMD repurposed bits). [Ravi]
> > >  - Collect a review. [Xiaoyao]
> > >  - Make bits 5:3 fully reserved, in a separate not-for-stable patch.
> > > 
> > > v1: https://lore.kernel.org/all/20250224181315.2376869-1-seanjc@google.com
> > > 
> > 
> > Hi,
> > 
> > Amusingly there is another DEBUGCTL issue, which I just got to the bottom of.
> > (if I am not mistaken of course).
> > 
> > We currently don't let the guest set DEBUGCTL.FREEZE_WHILE_SMM and neither
> > set it ourselves in GUEST_IA32_DEBUGCTL vmcs field, even when supported by the host
> > (If I read the code correctly, I didn't verify this in runtime)
> 
> Ugh, SMM.  Yeah, KVM doesn't propagate DEBUGCTLMSR_FREEZE_IN_SMM to the guest
> value.  KVM intercepts reads and writes to DEBUGCTL, so it should be easy enough
> to shove the bit in on writes, and drop it on reads.
> 
> > This means that the host #SMIs will interfere with the guest PMU.  In
> > particular this causes the 'pmu' kvm-unit-test to fail, which is something
> > that our CI caught.
> > 
> > I think that kvm should just set this bit, or even better, use the host value
> > of this bit, and hide it from the guest, because the guest shouldn't know
> > about host's smm, and we AFAIK don't really support freezing perfmon when the
> > guest enters its own emulated SMM.
> 
> Agreed.  Easy thing is to use the host's value, so that KVM doesn't need to check
> for its existence.  I can't think of anything that would go sideways by freezing
> perfmon if the host happens to take an SMI.
> 
> > What do you think? I'll post patches if you think that this is a good idea.
> > (A temp hack to set this bit always in GUEST_IA32_DEBUGCTL fixed the problem for me)
> > 
> > I also need to check if AMD also has this feature, or if this is Intel specific.
> 
> Intel only.  I assume/think/hope AMD's Host/Guest Only field in the event selector
> effectively hides SMM from the guest.
> 

Hi,

I will post a patch soon then. I just got my hands on the CI machine where the test failed
and yes, the machine receives about 8 #SMIs per second on each core. Oh well...

BTW pmu_counters_test selftest is also affected since it counts # of retired instructions.
With #SMI getting in the way, the number of course soars.

It doesn't fail often at this rate but it does when the test test is done for sufficient
number or times or you just get lucky.


Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs
  2025-04-08 22:43   ` Sean Christopherson
  2025-04-09 20:52     ` Maxim Levitsky
@ 2025-04-14  6:32     ` Sandipan Das
  2025-04-14 14:43       ` Maxim Levitsky
  1 sibling, 1 reply; 17+ messages in thread
From: Sandipan Das @ 2025-04-14  6:32 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: Paolo Bonzini, kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li,
	rangemachine, whanos, nikunj.dadhania

On 4/9/2025 4:13 AM, Sean Christopherson wrote:
> On Tue, Apr 01, 2025, Maxim Levitsky wrote:
>> On Thu, 2025-02-27 at 14:24 -0800, Sean Christopherson wrote:
>>> Fix a long-lurking bug in SVM where KVM runs the guest with the host's
>>> DEBUGCTL if LBR virtualization is disabled.  AMD CPUs rather stupidly
>>> context switch DEBUGCTL if and only if LBR virtualization is enabled (not
>>> just supported, but fully enabled).
>>>
>>> The bug has gone unnoticed because until recently, the only bits that
>>> KVM would leave set were things like BTF, which are guest visible but
>>> won't cause functional problems unless guest software is being especially
>>> particular about #DBs.
>>>
>>> The bug was exposed by the addition of BusLockTrap ("Detect" in the kernel),
>>> as the resulting #DBs due to split-lock accesses in guest userspace (lol
>>> Steam) get reflected into the guest by KVM.
>>>
>>> Note, I don't love suppressing DEBUGCTL.BTF, but practically speaking that's
>>> likely the behavior that SVM guests have gotten the vast, vast majority of
>>> the time, and given that it's the behavior on Intel, it's (hopefully) a safe
>>> option for a fix, e.g. versus trying to add proper BTF virtualization on the
>>> fly.
>>>
>>> v3:
>>>  - Suppress BTF, as KVM doesn't actually support it. [Ravi]
>>>  - Actually load the guest's DEBUGCTL (though amusingly, with BTF squashed,
>>>    it's guaranteed to be '0' in this scenario). [Ravi]
>>>
>>> v2:
>>>  - Load the guest's DEBUGCTL instead of simply zeroing it on VMRUN.
>>>  - Drop bits 5:3 from guest DEBUGCTL so that KVM doesn't let the guest
>>>    unintentionally enable BusLockTrap (AMD repurposed bits). [Ravi]
>>>  - Collect a review. [Xiaoyao]
>>>  - Make bits 5:3 fully reserved, in a separate not-for-stable patch.
>>>
>>> v1: https://lore.kernel.org/all/20250224181315.2376869-1-seanjc@google.com
>>>
>>
>>
>> Hi,
>>
>> Amusingly there is another DEBUGCTL issue, which I just got to the bottom of.
>> (if I am not mistaken of course).
>>
>> We currently don't let the guest set DEBUGCTL.FREEZE_WHILE_SMM and neither
>> set it ourselves in GUEST_IA32_DEBUGCTL vmcs field, even when supported by the host
>> (If I read the code correctly, I didn't verify this in runtime)
> 
> Ugh, SMM.  Yeah, KVM doesn't propagate DEBUGCTLMSR_FREEZE_IN_SMM to the guest
> value.  KVM intercepts reads and writes to DEBUGCTL, so it should be easy enough
> to shove the bit in on writes, and drop it on reads.
> 
>> This means that the host #SMIs will interfere with the guest PMU.  In
>> particular this causes the 'pmu' kvm-unit-test to fail, which is something
>> that our CI caught.
>>
>> I think that kvm should just set this bit, or even better, use the host value
>> of this bit, and hide it from the guest, because the guest shouldn't know
>> about host's smm, and we AFAIK don't really support freezing perfmon when the
>> guest enters its own emulated SMM.
> 
> Agreed.  Easy thing is to use the host's value, so that KVM doesn't need to check
> for its existence.  I can't think of anything that would go sideways by freezing
> perfmon if the host happens to take an SMI.
> 
>> What do you think? I'll post patches if you think that this is a good idea.
>> (A temp hack to set this bit always in GUEST_IA32_DEBUGCTL fixed the problem for me)
>>
>> I also need to check if AMD also has this feature, or if this is Intel specific.
> 
> Intel only.  I assume/think/hope AMD's Host/Guest Only field in the event selector
> effectively hides SMM from the guest.

Just using the GuestOnly bit does not hide SMM activity from guests. SMIs are
generally intercepted (kvm_amd.intercept_smi defaults to true) and handled in the
host context. So guest PMCs are isolated by a combination of having the GuestOnly
bit set and the #VMEXITs resulting from SMI interception.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs
  2025-04-14  6:32     ` Sandipan Das
@ 2025-04-14 14:43       ` Maxim Levitsky
  0 siblings, 0 replies; 17+ messages in thread
From: Maxim Levitsky @ 2025-04-14 14:43 UTC (permalink / raw)
  To: Sandipan Das, Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li,
	rangemachine, whanos, nikunj.dadhania

On Mon, 2025-04-14 at 12:02 +0530, Sandipan Das wrote:
> On 4/9/2025 4:13 AM, Sean Christopherson wrote:
> > On Tue, Apr 01, 2025, Maxim Levitsky wrote:
> > > On Thu, 2025-02-27 at 14:24 -0800, Sean Christopherson wrote:
> > > > Fix a long-lurking bug in SVM where KVM runs the guest with the host's
> > > > DEBUGCTL if LBR virtualization is disabled.  AMD CPUs rather stupidly
> > > > context switch DEBUGCTL if and only if LBR virtualization is enabled (not
> > > > just supported, but fully enabled).
> > > > 
> > > > The bug has gone unnoticed because until recently, the only bits that
> > > > KVM would leave set were things like BTF, which are guest visible but
> > > > won't cause functional problems unless guest software is being especially
> > > > particular about #DBs.
> > > > 
> > > > The bug was exposed by the addition of BusLockTrap ("Detect" in the kernel),
> > > > as the resulting #DBs due to split-lock accesses in guest userspace (lol
> > > > Steam) get reflected into the guest by KVM.
> > > > 
> > > > Note, I don't love suppressing DEBUGCTL.BTF, but practically speaking that's
> > > > likely the behavior that SVM guests have gotten the vast, vast majority of
> > > > the time, and given that it's the behavior on Intel, it's (hopefully) a safe
> > > > option for a fix, e.g. versus trying to add proper BTF virtualization on the
> > > > fly.
> > > > 
> > > > v3:
> > > >  - Suppress BTF, as KVM doesn't actually support it. [Ravi]
> > > >  - Actually load the guest's DEBUGCTL (though amusingly, with BTF squashed,
> > > >    it's guaranteed to be '0' in this scenario). [Ravi]
> > > > 
> > > > v2:
> > > >  - Load the guest's DEBUGCTL instead of simply zeroing it on VMRUN.
> > > >  - Drop bits 5:3 from guest DEBUGCTL so that KVM doesn't let the guest
> > > >    unintentionally enable BusLockTrap (AMD repurposed bits). [Ravi]
> > > >  - Collect a review. [Xiaoyao]
> > > >  - Make bits 5:3 fully reserved, in a separate not-for-stable patch.
> > > > 
> > > > v1: https://lore.kernel.org/all/20250224181315.2376869-1-seanjc@google.com
> > > > 
> > > 
> > > Hi,
> > > 
> > > Amusingly there is another DEBUGCTL issue, which I just got to the bottom of.
> > > (if I am not mistaken of course).
> > > 
> > > We currently don't let the guest set DEBUGCTL.FREEZE_WHILE_SMM and neither
> > > set it ourselves in GUEST_IA32_DEBUGCTL vmcs field, even when supported by the host
> > > (If I read the code correctly, I didn't verify this in runtime)
> > 
> > Ugh, SMM.  Yeah, KVM doesn't propagate DEBUGCTLMSR_FREEZE_IN_SMM to the guest
> > value.  KVM intercepts reads and writes to DEBUGCTL, so it should be easy enough
> > to shove the bit in on writes, and drop it on reads.
> > 
> > > This means that the host #SMIs will interfere with the guest PMU.  In
> > > particular this causes the 'pmu' kvm-unit-test to fail, which is something
> > > that our CI caught.
> > > 
> > > I think that kvm should just set this bit, or even better, use the host value
> > > of this bit, and hide it from the guest, because the guest shouldn't know
> > > about host's smm, and we AFAIK don't really support freezing perfmon when the
> > > guest enters its own emulated SMM.
> > 
> > Agreed.  Easy thing is to use the host's value, so that KVM doesn't need to check
> > for its existence.  I can't think of anything that would go sideways by freezing
> > perfmon if the host happens to take an SMI.
> > 
> > > What do you think? I'll post patches if you think that this is a good idea.
> > > (A temp hack to set this bit always in GUEST_IA32_DEBUGCTL fixed the problem for me)
> > > 
> > > I also need to check if AMD also has this feature, or if this is Intel specific.
> > 
> > Intel only.  I assume/think/hope AMD's Host/Guest Only field in the event selector
> > effectively hides SMM from the guest.
> 
> Just using the GuestOnly bit does not hide SMM activity from guests. SMIs are
> generally intercepted (kvm_amd.intercept_smi defaults to true)

Hi,

Actually this setting doesn't really work these days, at lesat not on my Zen2 machine (3070x).

Long ago I tested it, and despite loading the system with SMIs either via APIC or via 0xB2 ioport write, 
where in both cases I noticed significant slowdown of a VM, pinned on the receiving CPU I got no SMI VM exits.

BIOS likely has an option to override this setting. 

I guess the reason is security, because with SVM,
one can effectively block the SMIs from being processed on the host.

Best regards,
	Maxim Levitsky


>  and handled in the
> host context. So guest PMCs are isolated by a combination of having the GuestOnly
> bit set and the #VMEXITs resulting from SMI interception.
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs
  2025-02-27 22:24 [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Sean Christopherson
                   ` (8 preceding siblings ...)
  2025-04-02  3:57 ` Maxim Levitsky
@ 2025-05-02 21:51 ` Sean Christopherson
  9 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2025-05-02 21:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ravi Bangoria, Xiaoyao Li, rangemachine,
	whanos

On Thu, 27 Feb 2025 14:24:05 -0800, Sean Christopherson wrote:
> Fix a long-lurking bug in SVM where KVM runs the guest with the host's
> DEBUGCTL if LBR virtualization is disabled.  AMD CPUs rather stupidly
> context switch DEBUGCTL if and only if LBR virtualization is enabled (not
> just supported, but fully enabled).
> 
> The bug has gone unnoticed because until recently, the only bits that
> KVM would leave set were things like BTF, which are guest visible but
> won't cause functional problems unless guest software is being especially
> particular about #DBs.
> 
> [...]

Applied patch 6 to kvm-x86 svm (1-5 already went into 6.15).

[6/6] KVM: SVM: Treat DEBUGCTL[5:2] as reserved
      https://github.com/kvm-x86/linux/commit/5ecdb48dd918

--
https://github.com/kvm-x86/linux/tree/next

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-05-02 21:52 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-27 22:24 [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Sean Christopherson
2025-02-27 22:24 ` [PATCH v3 1/6] KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value Sean Christopherson
2025-02-27 22:24 ` [PATCH v3 2/6] KVM: SVM: Suppress DEBUGCTL.BTF on AMD Sean Christopherson
2025-02-27 22:24 ` [PATCH v3 3/6] KVM: x86: Snapshot the host's DEBUGCTL in common x86 Sean Christopherson
2025-02-27 22:24 ` [PATCH v3 4/6] KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled Sean Christopherson
2025-02-27 22:24 ` [PATCH v3 5/6] KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs Sean Christopherson
2025-02-27 22:24 ` [PATCH v3 6/6] KVM: SVM: Treat DEBUGCTL[5:2] as reserved Sean Christopherson
2025-02-28  9:31 ` [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs Ravi Bangoria
2025-02-28 14:04   ` Sean Christopherson
2025-02-28 23:40 ` Sean Christopherson
2025-04-02  3:57 ` Maxim Levitsky
2025-04-08 15:08   ` Maxim Levitsky
2025-04-08 22:43   ` Sean Christopherson
2025-04-09 20:52     ` Maxim Levitsky
2025-04-14  6:32     ` Sandipan Das
2025-04-14 14:43       ` Maxim Levitsky
2025-05-02 21:51 ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).