* [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context
@ 2024-10-09 17:49 Sean Christopherson
2024-10-09 17:49 ` [PATCH v4 1/4] KVM: x86: Bypass register cache when querying CPL from kvm_sched_out() Sean Christopherson
` (5 more replies)
0 siblings, 6 replies; 13+ messages in thread
From: Sean Christopherson @ 2024-10-09 17:49 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Maxim Levitsky
Fix a (VMX only) bug reported by Maxim where KVM caches a stale SS.AR_BYTES
when involuntary preemption schedules out a vCPU during vmx_vcpu_reset(), and
ultimately clobbers the VMCS's SS.AR_BYTES if userspace does KVM_GET_SREGS
=> KVM_SET_SREGS, i.e. if userspace writes the stale value back into KVM.
v4, as this is a spiritual successor to Maxim's earlier series.
Patch 1 fixes the underlying problem by avoiding the cache in kvm_sched_out().
Patch 2 fixes vmx_vcpu_reset() to invalidate the cache _after_ writing the
VMCS, which also fixes the VMCS clobbering bug, but isn't as robust of a fix
for KVM as a whole, e.g. any other flow that invalidates the cache too "early"
would be susceptible to the bug, and on its own doesn't allow for the
hardening in patch 3.
Patch 3 hardens KVM against using the register caches from !TASK context.
Except for PMI callbacks, which are tightly bounded, i.e. can't run while
KVM is modifying segment information, using the register caches from IRQ/NMI
is unsafe.
Patch 4 is a tangentially related cleanup.
v3: https://lore.kernel.org/all/20240725175232.337266-1-mlevitsk@redhat.com
Maxim Levitsky (1):
KVM: VMX: reset the segment cache after segment init in
vmx_vcpu_reset()
Sean Christopherson (3):
KVM: x86: Bypass register cache when querying CPL from kvm_sched_out()
KVM: x86: Add lockdep-guarded asserts on register cache usage
KVM: x86: Use '0' for guest RIP if PMI encounters protected guest
state
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/kvm_cache_regs.h | 17 +++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/vmx/main.c | 1 +
arch/x86/kvm/vmx/vmx.c | 29 +++++++++++++++++++++--------
arch/x86/kvm/vmx/vmx.h | 1 +
arch/x86/kvm/x86.c | 15 ++++++++++++++-
8 files changed, 57 insertions(+), 9 deletions(-)
base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v4 1/4] KVM: x86: Bypass register cache when querying CPL from kvm_sched_out()
2024-10-09 17:49 [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context Sean Christopherson
@ 2024-10-09 17:49 ` Sean Christopherson
2024-10-30 21:11 ` Maxim Levitsky
2024-10-09 17:50 ` [PATCH v4 2/4] KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset() Sean Christopherson
` (4 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2024-10-09 17:49 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Maxim Levitsky
When querying guest CPL to determine if a vCPU was preempted while in
kernel mode, bypass the register cache, i.e. always read SS.AR_BYTES from
the VMCS on Intel CPUs. If the kernel is running with full preemption
enabled, using the register cache in the preemption path can result in
stale and/or uninitialized data being cached in the segment cache.
In particular, the following scenario is currently possible:
- vCPU is just created, and the vCPU thread is preempted before
SS.AR_BYTES is written in vmx_vcpu_reset().
- When scheduling out the vCPU task, kvm_arch_vcpu_in_kernel() =>
vmx_get_cpl() reads and caches '0' for SS.AR_BYTES.
- vmx_vcpu_reset() => seg_setup() configures SS.AR_BYTES, but doesn't
invoke vmx_segment_cache_clear() to invalidate the cache.
As a result, KVM retains a stale value in the cache, which can be read,
e.g. via KVM_GET_SREGS. Usually this is not a problem because the VMX
segment cache is reset on each VM-Exit, but if the userspace VMM (e.g. KVM
selftests) reads and writes system registers just after the vCPU was
created, _without_ modifying SS.AR_BYTES, userspace will write back the
stale '0' value and ultimately will trigger a VM-Entry failure due to
incorrect SS segment type.
Note, the VM-Enter failure can also be avoided by moving the call to
vmx_segment_cache_clear() until after vmx_vcpu_reset() initializes all
segments. However, while that change is correct and desirable (and will
come along shortly), it does not address the underlying problem that
accessing KVM's register caches from !task context is generally unsafe.
In addition to fixing the immediate bug, bypassing the cache for this
particular case will allow hardening KVM's register caching logic to assert
that the caches are accessed only when KVM _knows_ it is safe to do so.
Fixes: de63ad4cf497 ("KVM: X86: implement the logic for spinlock optimization")
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Closes: https://lore.kernel.org/all/20240716022014.240960-3-mlevitsk@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/vmx/main.c | 1 +
arch/x86/kvm/vmx/vmx.c | 23 ++++++++++++++++++-----
arch/x86/kvm/vmx/vmx.h | 1 +
arch/x86/kvm/x86.c | 8 +++++++-
7 files changed, 30 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 861d080ed4c6..5aff7222e40f 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -34,6 +34,7 @@ KVM_X86_OP(set_msr)
KVM_X86_OP(get_segment_base)
KVM_X86_OP(get_segment)
KVM_X86_OP(get_cpl)
+KVM_X86_OP(get_cpl_no_cache)
KVM_X86_OP(set_segment)
KVM_X86_OP(get_cs_db_l_bits)
KVM_X86_OP(is_valid_cr0)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6d9f763a7bb9..3ae90df0a177 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1656,6 +1656,7 @@ struct kvm_x86_ops {
void (*get_segment)(struct kvm_vcpu *vcpu,
struct kvm_segment *var, int seg);
int (*get_cpl)(struct kvm_vcpu *vcpu);
+ int (*get_cpl_no_cache)(struct kvm_vcpu *vcpu);
void (*set_segment)(struct kvm_vcpu *vcpu,
struct kvm_segment *var, int seg);
void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9df3e1e5ae81..50f6b0e03d04 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5031,6 +5031,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.get_segment = svm_get_segment,
.set_segment = svm_set_segment,
.get_cpl = svm_get_cpl,
+ .get_cpl_no_cache = svm_get_cpl,
.get_cs_db_l_bits = svm_get_cs_db_l_bits,
.is_valid_cr0 = svm_is_valid_cr0,
.set_cr0 = svm_set_cr0,
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 7668e2fb8043..92d35cc6cd15 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -50,6 +50,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.get_segment = vmx_get_segment,
.set_segment = vmx_set_segment,
.get_cpl = vmx_get_cpl,
+ .get_cpl_no_cache = vmx_get_cpl_no_cache,
.get_cs_db_l_bits = vmx_get_cs_db_l_bits,
.is_valid_cr0 = vmx_is_valid_cr0,
.set_cr0 = vmx_set_cr0,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a4438358c5e..12dd7009efbe 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3568,16 +3568,29 @@ u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
return vmx_read_guest_seg_base(to_vmx(vcpu), seg);
}
-int vmx_get_cpl(struct kvm_vcpu *vcpu)
+static int __vmx_get_cpl(struct kvm_vcpu *vcpu, bool no_cache)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ int ar;
if (unlikely(vmx->rmode.vm86_active))
return 0;
- else {
- int ar = vmx_read_guest_seg_ar(vmx, VCPU_SREG_SS);
- return VMX_AR_DPL(ar);
- }
+
+ if (no_cache)
+ ar = vmcs_read32(GUEST_SS_AR_BYTES);
+ else
+ ar = vmx_read_guest_seg_ar(vmx, VCPU_SREG_SS);
+ return VMX_AR_DPL(ar);
+}
+
+int vmx_get_cpl(struct kvm_vcpu *vcpu)
+{
+ return __vmx_get_cpl(vcpu, false);
+}
+
+int vmx_get_cpl_no_cache(struct kvm_vcpu *vcpu)
+{
+ return __vmx_get_cpl(vcpu, true);
}
static u32 vmx_segment_access_rights(struct kvm_segment *var)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 2325f773a20b..bcf40c7f3a38 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -385,6 +385,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
unsigned long fs_base, unsigned long gs_base);
int vmx_get_cpl(struct kvm_vcpu *vcpu);
+int vmx_get_cpl_no_cache(struct kvm_vcpu *vcpu);
bool vmx_emulation_required(struct kvm_vcpu *vcpu);
unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu);
void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83fe0a78146f..830073294640 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5094,7 +5094,13 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
int idx;
if (vcpu->preempted) {
- vcpu->arch.preempted_in_kernel = kvm_arch_vcpu_in_kernel(vcpu);
+ /*
+ * Assume protected guests are in-kernel. Inefficient yielding
+ * due to false positives is preferable to never yielding due
+ * to false negatives.
+ */
+ vcpu->arch.preempted_in_kernel = vcpu->arch.guest_state_protected ||
+ !kvm_x86_call(get_cpl_no_cache)(vcpu);
/*
* Take the srcu lock as memslots will be accessed to check the gfn
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v4 2/4] KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
2024-10-09 17:49 [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context Sean Christopherson
2024-10-09 17:49 ` [PATCH v4 1/4] KVM: x86: Bypass register cache when querying CPL from kvm_sched_out() Sean Christopherson
@ 2024-10-09 17:50 ` Sean Christopherson
2024-10-09 17:50 ` [PATCH v4 3/4] KVM: x86: Add lockdep-guarded asserts on register cache usage Sean Christopherson
` (3 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2024-10-09 17:50 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Maxim Levitsky
From: Maxim Levitsky <mlevitsk@redhat.com>
Reset the segment cache after segment initialization in vmx_vcpu_reset()
to harden KVM against caching stale/uninitialized data. Without the
recent fix to bypass the cache in kvm_arch_vcpu_put(), the following
scenario is possible:
- vCPU is just created, and the vCPU thread is preempted before
SS.AR_BYTES is written in vmx_vcpu_reset().
- When scheduling out the vCPU task, kvm_arch_vcpu_in_kernel() =>
vmx_get_cpl() reads and caches '0' for SS.AR_BYTES.
- vmx_vcpu_reset() => seg_setup() configures SS.AR_BYTES, but doesn't
invoke vmx_segment_cache_clear() to invalidate the cache.
As a result, KVM retains a stale value in the cache, which can be read,
e.g. via KVM_GET_SREGS. Usually this is not a problem because the VMX
segment cache is reset on each VM-Exit, but if the userspace VMM (e.g. KVM
selftests) reads and writes system registers just after the vCPU was
created, _without_ modifying SS.AR_BYTES, userspace will write back the
stale '0' value and ultimately will trigger a VM-Entry failure due to
incorrect SS segment type.
Invalidating the cache after writing the VMCS doesn't address the general
issue of cache accesses from IRQ context being unsafe, but it does prevent
KVM from clobbering the VMCS, i.e. mitigates the harm done _if_ KVM has a
bug that results in an unsafe cache access.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Fixes: 2fb92db1ec08 ("KVM: VMX: Cache vmcs segment fields")
[sean: rework changelog to account for previous patch]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/vmx.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 12dd7009efbe..a11faab67b4a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4901,9 +4901,6 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmx->hv_deadline_tsc = -1;
kvm_set_cr8(vcpu, 0);
- vmx_segment_cache_clear(vmx);
- kvm_register_mark_available(vcpu, VCPU_EXREG_SEGMENTS);
-
seg_setup(VCPU_SREG_CS);
vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
vmcs_writel(GUEST_CS_BASE, 0xffff0000ul);
@@ -4930,6 +4927,9 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmcs_writel(GUEST_IDTR_BASE, 0);
vmcs_write32(GUEST_IDTR_LIMIT, 0xffff);
+ vmx_segment_cache_clear(vmx);
+ kvm_register_mark_available(vcpu, VCPU_EXREG_SEGMENTS);
+
vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0);
vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, 0);
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v4 3/4] KVM: x86: Add lockdep-guarded asserts on register cache usage
2024-10-09 17:49 [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context Sean Christopherson
2024-10-09 17:49 ` [PATCH v4 1/4] KVM: x86: Bypass register cache when querying CPL from kvm_sched_out() Sean Christopherson
2024-10-09 17:50 ` [PATCH v4 2/4] KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset() Sean Christopherson
@ 2024-10-09 17:50 ` Sean Christopherson
2024-10-30 21:13 ` Maxim Levitsky
2024-10-09 17:50 ` [PATCH v4 4/4] KVM: x86: Use '0' for guest RIP if PMI encounters protected guest state Sean Christopherson
` (2 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2024-10-09 17:50 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Maxim Levitsky
When lockdep is enabled, assert that KVM accesses the register caches if
and only if cache fills are guaranteed to consume fresh data, i.e. when
KVM is in control of the code sequence.  Concretely, the caches
can only be used from task context (synchronous) or when handling a PMI
VM-Exit (asynchronous, but only in specific windows where the caches are
in a known, stable state).
Generally speaking, there are very few flows where reading register state
from an asynchronous context is correct or even necessary. So, rather
than trying to figure out a generic solution, simply disallow using the
caches outside of task context by default, and deal with any future
exceptions on a case-by-case basis _if_ they arise.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/kvm_cache_regs.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index b1eb46e26b2e..36a8786db291 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -43,6 +43,18 @@ BUILD_KVM_GPR_ACCESSORS(r14, R14)
BUILD_KVM_GPR_ACCESSORS(r15, R15)
#endif
+/*
+ * Using the register cache from interrupt context is generally not allowed, as
+ * caching a register and marking it available/dirty can't be done atomically,
+ * i.e. accesses from interrupt context may clobber state or read stale data if
+ * the vCPU task is in the process of updating the cache. The exception is if
+ * KVM is handling a PMI IRQ/NMI VM-Exit, as that bound code sequence doesn't
+ * touch the cache, it runs after the cache is reset (post VM-Exit), and PMIs
+ * need to access several registers that are cacheable.
+ */
+#define kvm_assert_register_caching_allowed(vcpu) \
+ lockdep_assert_once(in_task() || kvm_arch_pmi_in_guest(vcpu))
+
/*
* avail dirty
* 0 0 register in VMCS/VMCB
@@ -53,24 +65,28 @@ BUILD_KVM_GPR_ACCESSORS(r15, R15)
static inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
enum kvm_reg reg)
{
+ kvm_assert_register_caching_allowed(vcpu);
return test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
}
static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
enum kvm_reg reg)
{
+ kvm_assert_register_caching_allowed(vcpu);
return test_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
}
static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
enum kvm_reg reg)
{
+ kvm_assert_register_caching_allowed(vcpu);
__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
}
static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
enum kvm_reg reg)
{
+ kvm_assert_register_caching_allowed(vcpu);
__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
}
@@ -84,6 +100,7 @@ static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
static __always_inline bool kvm_register_test_and_mark_available(struct kvm_vcpu *vcpu,
enum kvm_reg reg)
{
+ kvm_assert_register_caching_allowed(vcpu);
return arch___test_and_set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
}
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v4 4/4] KVM: x86: Use '0' for guest RIP if PMI encounters protected guest state
2024-10-09 17:49 [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context Sean Christopherson
` (2 preceding siblings ...)
2024-10-09 17:50 ` [PATCH v4 3/4] KVM: x86: Add lockdep-guarded asserts on register cache usage Sean Christopherson
@ 2024-10-09 17:50 ` Sean Christopherson
2024-10-30 21:13 ` Maxim Levitsky
2024-10-10 13:06 ` [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context Paolo Bonzini
2024-10-31 19:51 ` Sean Christopherson
5 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2024-10-09 17:50 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Maxim Levitsky
Explicitly return '0' for guest RIP when handling a PMI VM-Exit for a vCPU
with protected guest state, i.e. when KVM can't read the real RIP. While
there is no "right" value, and profiling a protected guest is rather futile,
returning the last known RIP is worse than returning obviously "bad" data.
E.g. for SEV-ES+, the last known RIP will often point somewhere in the
guest's boot flow.
Opportunistically add WARNs to effectively assert that the in_kernel() and
get_ip() callbacks are restricted to the common PMI handler, as the return
values for the protected guest state case are largely arbitrary, i.e. only
make any sense whatsoever for PMIs, where the returned values have no
functional impact and thus don't truly matter.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 830073294640..516cf6c71567 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13213,6 +13213,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
{
+ WARN_ON_ONCE(!kvm_arch_pmi_in_guest(vcpu));
+
if (vcpu->arch.guest_state_protected)
return true;
@@ -13221,6 +13223,11 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)
{
+ WARN_ON_ONCE(!kvm_arch_pmi_in_guest(vcpu));
+
+ if (vcpu->arch.guest_state_protected)
+ return 0;
+
return kvm_rip_read(vcpu);
}
--
2.47.0.rc1.288.g06298d1525-goog
* Re: [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context
2024-10-09 17:49 [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context Sean Christopherson
` (3 preceding siblings ...)
2024-10-09 17:50 ` [PATCH v4 4/4] KVM: x86: Use '0' for guest RIP if PMI encounters protected guest state Sean Christopherson
@ 2024-10-10 13:06 ` Paolo Bonzini
2024-10-10 16:17 ` Sean Christopherson
2024-10-31 19:51 ` Sean Christopherson
5 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2024-10-10 13:06 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm, linux-kernel, Maxim Levitsky
On 10/9/24 19:49, Sean Christopherson wrote:
> Fix a (VMX only) bug reported by Maxim where KVM caches a stale SS.AR_BYTES
> when involuntary preemption schedules out a vCPU during vmx_vcpu_reset(), and
> ultimately clobbers the VMCS's SS.AR_BYTES if userspace does KVM_GET_SREGS
> => KVM_SET_SREGS, i.e. if userspace writes the stale value back into KVM.
>
> v4, as this is a spiritual successor to Maxim's earlier series.
>
> Patch 1 fixes the underlying problem by avoiding the cache in kvm_sched_out().
I think we want this one in stable?
Thanks,
Paolo
> Patch 2 fixes vmx_vcpu_reset() to invalidate the cache _after_ writing the
> VMCS, which also fixes the VMCS clobbering bug, but isn't as robust of a fix
> for KVM as a whole, e.g. any other flow that invalidates the cache too "early"
> would be susceptible to the bug, and on its own doesn't allow for the
> hardening in patch 3.
>
> Patch 3 hardens KVM against using the register caches from !TASK context.
> Except for PMI callbacks, which are tightly bounded, i.e. can't run while
> KVM is modifying segment information, using the register caches from IRQ/NMI
> is unsafe.
>
> Patch 4 is a tangentially related cleanup.
>
> v3: https://lore.kernel.org/all/20240725175232.337266-1-mlevitsk@redhat.com
>
> Maxim Levitsky (1):
> KVM: VMX: reset the segment cache after segment init in
> vmx_vcpu_reset()
>
> Sean Christopherson (3):
> KVM: x86: Bypass register cache when querying CPL from kvm_sched_out()
> KVM: x86: Add lockdep-guarded asserts on register cache usage
> KVM: x86: Use '0' for guest RIP if PMI encounters protected guest
> state
>
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/kvm_cache_regs.h | 17 +++++++++++++++++
> arch/x86/kvm/svm/svm.c | 1 +
> arch/x86/kvm/vmx/main.c | 1 +
> arch/x86/kvm/vmx/vmx.c | 29 +++++++++++++++++++++--------
> arch/x86/kvm/vmx/vmx.h | 1 +
> arch/x86/kvm/x86.c | 15 ++++++++++++++-
> 8 files changed, 57 insertions(+), 9 deletions(-)
>
>
> base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
* Re: [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context
2024-10-10 13:06 ` [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context Paolo Bonzini
@ 2024-10-10 16:17 ` Sean Christopherson
2024-10-10 16:24 ` Paolo Bonzini
0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2024-10-10 16:17 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kvm, linux-kernel, Maxim Levitsky
On Thu, Oct 10, 2024, Paolo Bonzini wrote:
> On 10/9/24 19:49, Sean Christopherson wrote:
> > Fix a (VMX only) bug reported by Maxim where KVM caches a stale SS.AR_BYTES
> > when involuntary preemption schedules out a vCPU during vmx_vcpu_reset(), and
> > ultimately clobbers the VMCS's SS.AR_BYTES if userspace does KVM_GET_SREGS
> > => KVM_SET_SREGS, i.e. if userspace writes the stale value back into KVM.
> >
> > v4, as this is a spiritual successor to Maxim's earlier series.
> >
> > Patch 1 fixes the underlying problem by avoiding the cache in kvm_sched_out().
>
> I think we want this one in stable?
If anything, we should send Maxim's patch to stable trees. While not a complete
fix, it resolves the only known scenario where caching SS.AR_BYTES is truly
problematic, it's as low risk as patches get, and it's much more likely to backport
cleanly to older kernels.
> > Patch 2 fixes vmx_vcpu_reset() to invalidate the cache _after_ writing the
> > VMCS, which also fixes the VMCS clobbering bug, but isn't as robust of a fix
> > for KVM as a whole, e.g. any other flow that invalidates the cache too "early"
> > would be susceptible to the bug, and on its own doesn't allow for the
> > hardening in patch 3.
* Re: [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context
2024-10-10 16:17 ` Sean Christopherson
@ 2024-10-10 16:24 ` Paolo Bonzini
0 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2024-10-10 16:24 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm, linux-kernel, Maxim Levitsky
On Thu, Oct 10, 2024 at 6:17 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Oct 10, 2024, Paolo Bonzini wrote:
> > On 10/9/24 19:49, Sean Christopherson wrote:
> > > Fix a (VMX only) bug reported by Maxim where KVM caches a stale SS.AR_BYTES
> > > when involuntary preemption schedules out a vCPU during vmx_vcpu_reset(), and
> > > ultimately clobbers the VMCS's SS.AR_BYTES if userspace does KVM_GET_SREGS
> > > => KVM_SET_SREGS, i.e. if userspace writes the stale value back into KVM.
> > >
> > > v4, as this is a spiritual successor to Maxim's earlier series.
> > >
> > > Patch 1 fixes the underlying problem by avoiding the cache in kvm_sched_out().
> >
> > I think we want this one in stable?
>
> If anything, we should send Maxim's patch to stable trees. While not a complete
> fix, it resolves the only known scenario where caching SS.AR_BYTES is truly
> problematic, it's as low risk as patches get, and it's much more likely to backport
> cleanly to older kernels.
Ok, this works for me.
Paolo
* Re: [PATCH v4 1/4] KVM: x86: Bypass register cache when querying CPL from kvm_sched_out()
2024-10-09 17:49 ` [PATCH v4 1/4] KVM: x86: Bypass register cache when querying CPL from kvm_sched_out() Sean Christopherson
@ 2024-10-30 21:11 ` Maxim Levitsky
0 siblings, 0 replies; 13+ messages in thread
From: Maxim Levitsky @ 2024-10-30 21:11 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel
On Wed, 2024-10-09 at 10:49 -0700, Sean Christopherson wrote:
> When querying guest CPL to determine if a vCPU was preempted while in
> kernel mode, bypass the register cache, i.e. always read SS.AR_BYTES from
> the VMCS on Intel CPUs. If the kernel is running with full preemption
> enabled, using the register cache in the preemption path can result in
> stale and/or uninitialized data being cached in the segment cache.
>
> In particular the following scenario is currently possible:
>
> - vCPU is just created, and the vCPU thread is preempted before
> SS.AR_BYTES is written in vmx_vcpu_reset().
>
> - When scheduling out the vCPU task, kvm_arch_vcpu_in_kernel() =>
> vmx_get_cpl() reads and caches '0' for SS.AR_BYTES.
>
> - vmx_vcpu_reset() => seg_setup() configures SS.AR_BYTES, but doesn't
> invoke vmx_segment_cache_clear() to invalidate the cache.
>
> As a result, KVM retains a stale value in the cache, which can be read,
> e.g. via KVM_GET_SREGS. Usually this is not a problem because the VMX
> segment cache is reset on each VM-Exit, but if the userspace VMM (e.g. KVM
> selftests) reads and writes system registers just after the vCPU was
> created, _without_ modifying SS.AR_BYTES, userspace will write back the
> stale '0' value and ultimately will trigger a VM-Entry failure due to
> incorrect SS segment type.
>
> Note, the VM-Enter failure can also be avoided by moving the call to
> vmx_segment_cache_clear() until after vmx_vcpu_reset() initializes all
> segments. However, while that change is correct and desirable (and will
> come along shortly), it does not address the underlying problem that
> accessing KVM's register caches from !task context is generally unsafe.
>
> In addition to fixing the immediate bug, bypassing the cache for this
> particular case will allow hardening KVM's register caching logic to assert
> that the caches are accessed only when KVM _knows_ it is safe to do so.
>
> Fixes: de63ad4cf497 ("KVM: X86: implement the logic for spinlock optimization")
> Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
> Closes: https://lore.kernel.org/all/20240716022014.240960-3-mlevitsk@redhat.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/svm/svm.c | 1 +
> arch/x86/kvm/vmx/main.c | 1 +
> arch/x86/kvm/vmx/vmx.c | 23 ++++++++++++++++++-----
> arch/x86/kvm/vmx/vmx.h | 1 +
> arch/x86/kvm/x86.c | 8 +++++++-
> 7 files changed, 30 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 861d080ed4c6..5aff7222e40f 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -34,6 +34,7 @@ KVM_X86_OP(set_msr)
> KVM_X86_OP(get_segment_base)
> KVM_X86_OP(get_segment)
> KVM_X86_OP(get_cpl)
> +KVM_X86_OP(get_cpl_no_cache)
> KVM_X86_OP(set_segment)
> KVM_X86_OP(get_cs_db_l_bits)
> KVM_X86_OP(is_valid_cr0)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6d9f763a7bb9..3ae90df0a177 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1656,6 +1656,7 @@ struct kvm_x86_ops {
> void (*get_segment)(struct kvm_vcpu *vcpu,
> struct kvm_segment *var, int seg);
> int (*get_cpl)(struct kvm_vcpu *vcpu);
> + int (*get_cpl_no_cache)(struct kvm_vcpu *vcpu);
> void (*set_segment)(struct kvm_vcpu *vcpu,
> struct kvm_segment *var, int seg);
> void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 9df3e1e5ae81..50f6b0e03d04 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5031,6 +5031,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> .get_segment = svm_get_segment,
> .set_segment = svm_set_segment,
> .get_cpl = svm_get_cpl,
> + .get_cpl_no_cache = svm_get_cpl,
> .get_cs_db_l_bits = svm_get_cs_db_l_bits,
> .is_valid_cr0 = svm_is_valid_cr0,
> .set_cr0 = svm_set_cr0,
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index 7668e2fb8043..92d35cc6cd15 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -50,6 +50,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
> .get_segment = vmx_get_segment,
> .set_segment = vmx_set_segment,
> .get_cpl = vmx_get_cpl,
> + .get_cpl_no_cache = vmx_get_cpl_no_cache,
> .get_cs_db_l_bits = vmx_get_cs_db_l_bits,
> .is_valid_cr0 = vmx_is_valid_cr0,
> .set_cr0 = vmx_set_cr0,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 1a4438358c5e..12dd7009efbe 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -3568,16 +3568,29 @@ u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
> return vmx_read_guest_seg_base(to_vmx(vcpu), seg);
> }
>
> -int vmx_get_cpl(struct kvm_vcpu *vcpu)
> +static int __vmx_get_cpl(struct kvm_vcpu *vcpu, bool no_cache)
> {
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> + int ar;
>
> if (unlikely(vmx->rmode.vm86_active))
> return 0;
> - else {
> - int ar = vmx_read_guest_seg_ar(vmx, VCPU_SREG_SS);
> - return VMX_AR_DPL(ar);
> - }
> +
> + if (no_cache)
> + ar = vmcs_read32(GUEST_SS_AR_BYTES);
> + else
> + ar = vmx_read_guest_seg_ar(vmx, VCPU_SREG_SS);
> + return VMX_AR_DPL(ar);
> +}
> +
> +int vmx_get_cpl(struct kvm_vcpu *vcpu)
> +{
> + return __vmx_get_cpl(vcpu, false);
> +}
> +
> +int vmx_get_cpl_no_cache(struct kvm_vcpu *vcpu)
> +{
> + return __vmx_get_cpl(vcpu, true);
> }
>
> static u32 vmx_segment_access_rights(struct kvm_segment *var)
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 2325f773a20b..bcf40c7f3a38 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -385,6 +385,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
> void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
> unsigned long fs_base, unsigned long gs_base);
> int vmx_get_cpl(struct kvm_vcpu *vcpu);
> +int vmx_get_cpl_no_cache(struct kvm_vcpu *vcpu);
> bool vmx_emulation_required(struct kvm_vcpu *vcpu);
> unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu);
> void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 83fe0a78146f..830073294640 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5094,7 +5094,13 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> int idx;
>
> if (vcpu->preempted) {
> - vcpu->arch.preempted_in_kernel = kvm_arch_vcpu_in_kernel(vcpu);
> + /*
> + * Assume protected guests are in-kernel. Inefficient yielding
> + * due to false positives is preferable to never yielding due
> + * to false negatives.
> + */
> + vcpu->arch.preempted_in_kernel = vcpu->arch.guest_state_protected ||
> + !kvm_x86_call(get_cpl_no_cache)(vcpu);
>
> /*
> * Take the srcu lock as memslots will be accessed to check the gfn
Initially I thought that we would need to do this for at least the CPL and the RIP,
but if it is done only for the CPL, it is reasonable IMHO.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Best regards,
Maxim Levitsky
* Re: [PATCH v4 3/4] KVM: x86: Add lockdep-guarded asserts on register cache usage
2024-10-09 17:50 ` [PATCH v4 3/4] KVM: x86: Add lockdep-guarded asserts on register cache usage Sean Christopherson
@ 2024-10-30 21:13 ` Maxim Levitsky
0 siblings, 0 replies; 13+ messages in thread
From: Maxim Levitsky @ 2024-10-30 21:13 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel
On Wed, 2024-10-09 at 10:50 -0700, Sean Christopherson wrote:
> When lockdep is enabled, assert that KVM accesses the register caches if
> and only if cache fills are guaranteed to consume fresh data, i.e. when
> KVM is in control of the code sequence. Concretely, the caches
> can only be used from task context (synchronous) or when handling a PMI
> VM-Exit (asynchronous, but only in specific windows where the caches are
> in a known, stable state).
>
> Generally speaking, there are very few flows where reading register state
> from an asynchronous context is correct or even necessary. So, rather
> than trying to figure out a generic solution, simply disallow using the
> caches outside of task context by default, and deal with any future
> exceptions on a case-by-case basis _if_ they arise.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/kvm_cache_regs.h | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
> index b1eb46e26b2e..36a8786db291 100644
> --- a/arch/x86/kvm/kvm_cache_regs.h
> +++ b/arch/x86/kvm/kvm_cache_regs.h
> @@ -43,6 +43,18 @@ BUILD_KVM_GPR_ACCESSORS(r14, R14)
> BUILD_KVM_GPR_ACCESSORS(r15, R15)
> #endif
>
> +/*
> + * Using the register cache from interrupt context is generally not allowed, as
> + * caching a register and marking it available/dirty can't be done atomically,
> + * i.e. accesses from interrupt context may clobber state or read stale data if
> + * the vCPU task is in the process of updating the cache. The exception is if
> * KVM is handling a PMI IRQ/NMI VM-Exit, as that bounded code sequence doesn't
> + * touch the cache, it runs after the cache is reset (post VM-Exit), and PMIs
> + * need to access several registers that are cacheable.
> + */
> +#define kvm_assert_register_caching_allowed(vcpu) \
> + lockdep_assert_once(in_task() || kvm_arch_pmi_in_guest(vcpu))
> +
> /*
> * avail dirty
> * 0 0 register in VMCS/VMCB
> @@ -53,24 +65,28 @@ BUILD_KVM_GPR_ACCESSORS(r15, R15)
> static inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
> enum kvm_reg reg)
> {
> + kvm_assert_register_caching_allowed(vcpu);
> return test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
> }
>
> static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
> enum kvm_reg reg)
> {
> + kvm_assert_register_caching_allowed(vcpu);
> return test_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
> }
>
> static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
> enum kvm_reg reg)
> {
> + kvm_assert_register_caching_allowed(vcpu);
> __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
> }
>
> static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
> enum kvm_reg reg)
> {
> + kvm_assert_register_caching_allowed(vcpu);
> __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
> __set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
> }
> @@ -84,6 +100,7 @@ static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
> static __always_inline bool kvm_register_test_and_mark_available(struct kvm_vcpu *vcpu,
> enum kvm_reg reg)
> {
> + kvm_assert_register_caching_allowed(vcpu);
> return arch___test_and_set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
> }
>
Using lockdep for purposes that aren't strictly about locking is odd, but these
asserts do guard against races, so I guess this is a fair use of the assert.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Best regards,
Maxim Levitsky
* Re: [PATCH v4 4/4] KVM: x86: Use '0' for guest RIP if PMI encounters protected guest state
2024-10-09 17:50 ` [PATCH v4 4/4] KVM: x86: Use '0' for guest RIP if PMI encounters protected guest state Sean Christopherson
@ 2024-10-30 21:13 ` Maxim Levitsky
0 siblings, 0 replies; 13+ messages in thread
From: Maxim Levitsky @ 2024-10-30 21:13 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel
On Wed, 2024-10-09 at 10:50 -0700, Sean Christopherson wrote:
> Explicitly return '0' for guest RIP when handling a PMI VM-Exit for a vCPU
> with protected guest state, i.e. when KVM can't read the real RIP. While
> there is no "right" value, and profiling a protected guest is rather futile,
> returning the last known RIP is worse than returning obviously "bad" data.
> E.g. for SEV-ES+, the last known RIP will often point somewhere in the
> guest's boot flow.
>
> Opportunistically add WARNs to effectively assert that the in_kernel() and
> get_ip() callbacks are restricted to the common PMI handler, as the return
> values for the protected guest state case are largely arbitrary, i.e. only
> make any sense whatsoever for PMIs, where the returned values have no
> functional impact and thus don't truly matter.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/x86.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 830073294640..516cf6c71567 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13213,6 +13213,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>
> bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
> {
> + WARN_ON_ONCE(!kvm_arch_pmi_in_guest(vcpu));
> +
> if (vcpu->arch.guest_state_protected)
> return true;
>
> @@ -13221,6 +13223,11 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
>
> unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)
> {
> + WARN_ON_ONCE(!kvm_arch_pmi_in_guest(vcpu));
> +
> + if (vcpu->arch.guest_state_protected)
> + return 0;
> +
> return kvm_rip_read(vcpu);
> }
>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Best regards,
Maxim Levitsky
* Re: [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context
2024-10-09 17:49 [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context Sean Christopherson
` (4 preceding siblings ...)
2024-10-10 13:06 ` [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context Paolo Bonzini
@ 2024-10-31 19:51 ` Sean Christopherson
2024-11-01 19:28 ` Sean Christopherson
5 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2024-10-31 19:51 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Maxim Levitsky
On Wed, 09 Oct 2024 10:49:58 -0700, Sean Christopherson wrote:
> Fix a (VMX only) bug reported by Maxim where KVM caches a stale SS.AR_BYTES
> when involuntary preemption schedules out a vCPU during vmx_vcpu_reset(), and
> ultimately clobbers the VMCS's SS.AR_BYTES if userspace does KVM_GET_SREGS
> => KVM_SET_SREGS, i.e. if userspace writes the stale value back into KVM.
>
> v4, as this is a spiritual successor to Maxim's earlier series.
>
> [...]
Applied 1 and 3-4 to kvm-x86 misc. Patch 2 went into 6.12. Thanks!
[1/4] KVM: x86: Bypass register cache when querying CPL from kvm_sched_out()
https://github.com/kvm-x86/linux/commit/8c8e90f79c56
[2/4] KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
(no commit info)
[3/4] KVM: x86: Add lockdep-guarded asserts on register cache usage
https://github.com/kvm-x86/linux/commit/21abefc6958d
[4/4] KVM: x86: Use '0' for guest RIP if PMI encounters protected guest state
https://github.com/kvm-x86/linux/commit/a395d143ef40
--
https://github.com/kvm-x86/linux/tree/next
* Re: [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context
2024-10-31 19:51 ` Sean Christopherson
@ 2024-11-01 19:28 ` Sean Christopherson
0 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2024-11-01 19:28 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kvm, linux-kernel, Maxim Levitsky
On Thu, Oct 31, 2024, Sean Christopherson wrote:
> On Wed, 09 Oct 2024 10:49:58 -0700, Sean Christopherson wrote:
> > Fix a (VMX only) bug reported by Maxim where KVM caches a stale SS.AR_BYTES
> > when involuntary preemption schedules out a vCPU during vmx_vcpu_reset(), and
> > ultimately clobbers the VMCS's SS.AR_BYTES if userspace does KVM_GET_SREGS
> > => KVM_SET_SREGS, i.e. if userspace writes the stale value back into KVM.
> >
> > v4, as this is a spiritual successor to Maxim's earlier series.
> >
> > [...]
>
> Applied 1 and 3-4 to kvm-x86 misc. Patch 2 went into 6.12. Thanks!
>
> [1/4] KVM: x86: Bypass register cache when querying CPL from kvm_sched_out()
> https://github.com/kvm-x86/linux/commit/8c8e90f79c56
> [2/4] KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
> (no commit info)
> [3/4] KVM: x86: Add lockdep-guarded asserts on register cache usage
> https://github.com/kvm-x86/linux/commit/21abefc6958d
> [4/4] KVM: x86: Use '0' for guest RIP if PMI encounters protected guest state
> https://github.com/kvm-x86/linux/commit/a395d143ef40
FYI, I rebased misc to v6.12-rc5, as patches in another series had already been
taken through the tip tree. New hashes:
[1/4] KVM: x86: Bypass register cache when querying CPL from kvm_sched_out()
https://github.com/kvm-x86/linux/commit/f0e7012c4b93
...
[3/4] KVM: x86: Add lockdep-guarded asserts on register cache usage
https://github.com/kvm-x86/linux/commit/1c932fc7620d
[4/4] KVM: x86: Use '0' for guest RIP if PMI encounters protected guest state
https://github.com/kvm-x86/linux/commit/eecf3985459a
Thread overview: 13+ messages
2024-10-09 17:49 [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context Sean Christopherson
2024-10-09 17:49 ` [PATCH v4 1/4] KVM: x86: Bypass register cache when querying CPL from kvm_sched_out() Sean Christopherson
2024-10-30 21:11 ` Maxim Levitsky
2024-10-09 17:50 ` [PATCH v4 2/4] KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset() Sean Christopherson
2024-10-09 17:50 ` [PATCH v4 3/4] KVM: x86: Add lockdep-guarded asserts on register cache usage Sean Christopherson
2024-10-30 21:13 ` Maxim Levitsky
2024-10-09 17:50 ` [PATCH v4 4/4] KVM: x86: Use '0' for guest RIP if PMI encounters protected guest state Sean Christopherson
2024-10-30 21:13 ` Maxim Levitsky
2024-10-10 13:06 ` [PATCH v4 0/4] KVM: x86: Fix and harden reg caching from !TASK context Paolo Bonzini
2024-10-10 16:17 ` Sean Christopherson
2024-10-10 16:24 ` Paolo Bonzini
2024-10-31 19:51 ` Sean Christopherson
2024-11-01 19:28 ` Sean Christopherson