* [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL
@ 2024-04-10 14:34 Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 01/10] KVM: VMX: " Chao Gao
` (9 more replies)
0 siblings, 10 replies; 21+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Chao Gao, Adam Dunlap,
Arjan van de Ven, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Ilpo Järvinen, Ingo Molnar, Jithu Joseph, Jonathan Corbet,
Josh Poimboeuf, Kan Liang, linux-doc, Maciej S. Szmigiero,
Nikolay Borisov, Paolo Bonzini, Peter Zijlstra, Rick Edgecombe,
Sandipan Das, Sean Christopherson, Thomas Gleixner, Vegard Nossum,
x86
Hi all,
This series is tagged RFC because I would like your feedback on:
1. the KVM<->userspace ABI defined in patch 1
I am wondering whether we can allow userspace to configure the mask
and the forced value during the guest's lifetime, on a per-vCPU basis.
This way, in conjunction with "virtual MSRs" or any other interface,
userspace can adjust the hardware mitigations applied to the guest during
the guest's lifetime, e.g., for the best performance.
2. Intel-defined virtual MSRs vs. a new interface
Another OS has already adopted the Intel-defined virtual MSRs. Given
this, I am not sure whether defining a new interface is still
preferable, as ending up with two interfaces for the same purpose would
add complexity.
So I want to reconfirm whether the suggestion at [1], to define a new
interface through community collaboration, still stands.
Below is the cover letter:
Background
==========
Branch History Injection (BHI) is a special form of Spectre variant 2
in which an attacker may manipulate branch history before transitioning
from user to supervisor mode (or from VMX non-root/guest to root mode)
in an effort to make an indirect branch predictor select a specific
predictor entry for an indirect branch, so that a disclosure gadget at
the predicted target executes transiently.
To mitigate BHI attacks, the kernel may use the hardware mitigation,
i.e., BHI_DIS_S, or, when the hardware mitigation is not supported,
resort to a SW loop, i.e., the BHB-clearing sequence.
Problem
=======
However, the SW loop is effective on pre-SPR (Sapphire Rapids) parts but
not on SPR and future parts. This creates a mitigation effectiveness
problem for virtual machines:
Migrating a guest that uses the SW loop from a pre-SPR part to a part where
the SW loop is ineffective (e.g., an SPR or future part) leaves the
guest vulnerable to BHI.
[On bare metal, this isn't a problem, because parts on which the SW loop
is ineffective always support BHI_DIS_S, which is a preferable
mitigation to the SW loop anyway.]
Solution
========
This series proposes that QEMU+KVM deploy BHI_DIS_S for the guest using
"virtualize IA32_SPEC_CTRL" if the SW loop is ineffective on the host.
Note that "virtualize IA32_SPEC_CTRL" allows the VMM to prevent the
guest from changing some bits of the IA32_SPEC_CTRL MSR without
intercepting the guest's writes to the MSR.
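To make the "virtualize IA32_SPEC_CTRL" semantics concrete, here is a
minimal C sketch of a guest WRMSR/RDMSR under the model described in the
api.rst text of patch 1: masked bits keep their current hardware value,
all other bits take the guest's value, and the guest reads back the
shadow. The struct and function names are hypothetical and are not KVM
code; the bit positions are the architectural IA32_SPEC_CTRL bits as
defined in arch/x86/include/asm/msr-index.h.

```c
#include <stdint.h>

/* Architectural IA32_SPEC_CTRL bits (see asm/msr-index.h). */
#define SPEC_CTRL_IBRS       (1ULL << 0)
#define SPEC_CTRL_BHI_DIS_S  (1ULL << 10)

/* Hypothetical model of the state involved in SPEC_CTRL virtualization. */
struct spec_ctrl_virt {
	uint64_t mask;   /* IA32_SPEC_CTRL_MASK: bits the guest cannot change */
	uint64_t shadow; /* IA32_SPEC_CTRL_SHADOW: value the guest reads back */
	uint64_t hw;     /* value actually held in the IA32_SPEC_CTRL MSR */
};

/*
 * Model a guest WRMSR: unmasked bits come from the guest, masked bits keep
 * their current hardware value. No VM-exit is involved.
 */
void guest_wrmsr_spec_ctrl(struct spec_ctrl_virt *v, uint64_t val)
{
	v->shadow = val;
	v->hw = (val & ~v->mask) | (v->hw & v->mask);
}

/* Model a guest RDMSR: the guest sees the shadow, not the hardware value. */
uint64_t guest_rdmsr_spec_ctrl(const struct spec_ctrl_virt *v)
{
	return v->shadow;
}
```

For example, with BHI_DIS_S in the mask and already set in hardware, a
guest that writes only IBRS runs with IBRS | BHI_DIS_S while still
reading back IBRS.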
This solution leads to a new problem:
Deploying BHI_DIS_S for the guest may cause unnecessary performance loss
if the guest is using other mitigations for BHI or doesn't care about BHI
attacks at all.
To avoid this unnecessary performance loss, we want to allow the guest
to opt out of BHI_DIS_S in this case. The idea is to let the guest report
to KVM/QEMU whether it is using the SW loop. KVM/QEMU then won't deploy
BHI_DIS_S for the guest if the SW loop isn't in use.
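The opt-out policy above can be sketched from the VMM's side as a small
decision function. This is only an illustration under stated assumptions:
the enum and function are hypothetical, and treating a guest that never
reports anything conservatively (i.e., still forcing BHI_DIS_S) is an
assumption about how a VMM might handle guests unaware of the virtual MSRs.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPEC_CTRL_BHI_DIS_S (1ULL << 10)

/* Hypothetical summary of what the guest has reported, if anything. */
enum guest_bhi_report {
	GUEST_BHI_NO_REPORT,      /* guest never used the reporting interface */
	GUEST_BHI_SW_LOOP_USED,   /* guest says it relies on the SW loop */
	GUEST_BHI_SW_LOOP_UNUSED  /* guest says it does not use the SW loop */
};

/* Return the IA32_SPEC_CTRL bits the VMM would force on for this guest. */
uint64_t bhi_forced_bits(bool host_sw_loop_effective,
			 enum guest_bhi_report report)
{
	if (host_sw_loop_effective)
		return 0;	/* SW loop works here: nothing to fix */
	if (report == GUEST_BHI_SW_LOOP_UNUSED)
		return 0;	/* guest opted out: avoid the performance cost */
	return SPEC_CTRL_BHI_DIS_S;	/* assumption: no report => be safe */
}
```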
Intel defines a set of para-virtualized MSRs [2] for guests to report
software mitigation status. This series emulates the para-virtualized
MSRs in KVM.
Overall, the series has two parts:
1. patch 1-3: Define the KVM ABI for userspace VMMs (e.g., QEMU) to deploy
hardware mitigations for the guest, solving the mitigation effectiveness
problem when migrating guests across parts with different microarchitectures.
2. patch 4-10: Emulate virtual MSRs so that the guest can report its software
mitigation status, avoiding the unnecessary performance loss.
[1] https://lore.kernel.org/all/ZH9kwlg2Ac9IER7Y@google.com/
[2] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html#inpage-nav-4
Chao Gao (4):
KVM: VMX: Cache IA32_SPEC_CTRL_SHADOW field of VMCS
KVM: nVMX: Enable SPEC_CTRL virtualizaton for vmcs02
KVM: VMX: Cache force_spec_ctrl_value/mask for each vCPU
KVM: VMX: Advertise MITI_ENUM_RETPOLINE_S_SUPPORT
Daniel Sneddon (1):
KVM: VMX: Virtualize Intel IA32_SPEC_CTRL
Pawan Gupta (2):
x86/bugs: Use Virtual MSRs to request BHI_DIS_S
x86/bugs: Use Virtual MSRs to request RRSBA_DIS_S
Zhang Chen (3):
KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support
KVM: VMX: Advertise MITIGATION_CTRL support
KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
Documentation/virt/kvm/api.rst | 39 +++++++
arch/x86/include/asm/kvm_host.h | 4 +
arch/x86/include/asm/msr-index.h | 24 +++++
arch/x86/include/asm/vmx.h | 5 +
arch/x86/include/asm/vmxfeatures.h | 2 +
arch/x86/kernel/cpu/bugs.c | 33 ++++++
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/cpu/cpu.h | 1 +
arch/x86/kvm/svm/svm.c | 3 +
arch/x86/kvm/vmx/capabilities.h | 5 +
arch/x86/kvm/vmx/nested.c | 30 ++++++
arch/x86/kvm/vmx/vmx.c | 162 +++++++++++++++++++++++++++--
arch/x86/kvm/vmx/vmx.h | 21 +++-
arch/x86/kvm/x86.c | 49 ++++++++-
arch/x86/kvm/x86.h | 1 +
include/uapi/linux/kvm.h | 4 +
16 files changed, 376 insertions(+), 8 deletions(-)
base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
--
2.39.3
* [RFC PATCH v3 01/10] KVM: VMX: Virtualize Intel IA32_SPEC_CTRL
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-12 4:07 ` Jim Mattson
2024-04-10 14:34 ` [RFC PATCH v3 02/10] KVM: VMX: Cache IA32_SPEC_CTRL_SHADOW field of VMCS Chao Gao
` (8 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Sean Christopherson, Chao Gao,
Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, linux-doc
From: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Currently KVM disables interception of IA32_SPEC_CTRL after a non-zero
value is written to IA32_SPEC_CTRL by the guest. The guest is then allowed
to write any value directly to hardware. There is a tertiary VM-execution
control for IA32_SPEC_CTRL that allows bits in IA32_SPEC_CTRL to be
masked to prevent guests from changing those bits.
Add controls for setting the IA32_SPEC_CTRL mask and the desired value
of the masked bits.
These new controls are especially helpful for protecting guests that
don't know about BHI_DIS_S and that are running on hardware that
supports it. This allows the hypervisor to set BHI_DIS_S to fully
protect the guest.
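The argument checks that this patch applies to KVM_CAP_FORCE_SPEC_CTRL in
kvm_vm_ioctl_enable_cap() can be restated as a standalone predicate: the
capability must be supported, no vCPU may exist yet, the forced value may
only use bits KVM can force on, and every forced bit must also be in the
mask. A minimal sketch; the function name and its parameters are
illustrative, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative restatement of the KVM_CAP_FORCE_SPEC_CTRL checks:
 *   mask  = args[0], bits the guest may not modify
 *   value = args[1], value forced into those bits
 */
bool force_spec_ctrl_args_valid(uint64_t supported, unsigned int created_vcpus,
				uint64_t mask, uint64_t value)
{
	if (!supported || created_vcpus)
		return false;	/* unsupported, or vCPUs already created */
	if (value & ~supported)
		return false;	/* value uses bits KVM cannot force on */
	if (value & ~mask)
		return false;	/* a forced-on bit must also be masked */
	return true;		/* the mask itself may contain any bits */
}
```

Note that forcing a bit to 0 (mask bit set, value bit clear) passes even
when KVM doesn't support forcing that bit on, matching the comment in
the x86.c hunk.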
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
[ add a new ioctl to report supported bits. Fix the inverted check ]
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
Documentation/virt/kvm/api.rst | 39 +++++++++++++++++
arch/x86/include/asm/kvm_host.h | 4 ++
arch/x86/include/asm/vmx.h | 5 +++
arch/x86/include/asm/vmxfeatures.h | 2 +
arch/x86/kvm/vmx/capabilities.h | 5 +++
arch/x86/kvm/vmx/vmx.c | 68 +++++++++++++++++++++++++++---
arch/x86/kvm/vmx/vmx.h | 3 +-
arch/x86/kvm/x86.c | 30 +++++++++++++
arch/x86/kvm/x86.h | 1 +
include/uapi/linux/kvm.h | 4 ++
10 files changed, 155 insertions(+), 6 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0b5a33ee71ee..b6eeb1d6eb65 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6352,6 +6352,19 @@ a single guest_memfd file, but the bound ranges must not overlap).
See KVM_SET_USER_MEMORY_REGION2 for additional details.
+4.143 KVM_GET_SUPPORTED_FORCE_SPEC_CTRL
+---------------------------------------
+
+:Capability: KVM_CAP_FORCE_SPEC_CTRL
+:Architectures: x86
+:Type: system ioctl
+:Parameters: u64 supported_bitmask (out)
+:Returns: 0 on success, -EFAULT if supported_bitmask cannot be accessed
+
+Returns a bitmask of SPEC_CTRL MSR bits that can be forced on. Any bit can be
+forced to 0 (i.e., the guest can be prevented from setting it) even if KVM
+doesn't support that bit.
+
5. The kvm_run structure
========================
@@ -8063,6 +8076,32 @@ error/annotated fault.
See KVM_EXIT_MEMORY_FAULT for more information.
+7.35 KVM_CAP_FORCE_SPEC_CTRL
+----------------------------
+
+:Architectures: x86
+:Parameters: args[0] contains the bitmask to prevent guests from modifying those
+ bits
+ args[1] contains the desired value to set in IA32_SPEC_CTRL for the
+ masked bits
+:Returns: 0 on success, -EINVAL if args[0] or args[1] contain invalid values
+
+This capability allows userspace to configure the value of IA32_SPEC_CTRL and
+which bits the VM can and cannot modify. This is especially useful when a VM is
+migrated to newer hardware with hardware-based speculation mitigations that
+were not previously available to the VM.
+
+IA32_SPEC_CTRL virtualization works by introducing the IA32_SPEC_CTRL shadow
+and mask fields. When IA32_SPEC_CTRL is virtualized and the guest writes to it,
+the value written to the MSR is:
+
+(GUEST_WRMSR_VAL & ~MASK) | (REAL_MSR_VAL & MASK).
+
+No bit that is masked can be modified by the guest.
+
+The shadow field contains the value the guest wrote to the MSR and is what is
+returned to the guest when the virtualized MSR is read.
+
8. Other capabilities.
======================
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 16e07a2eee19..8220414cf697 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1404,6 +1404,10 @@ struct kvm_arch {
u32 notify_window;
u32 notify_vmexit_flags;
+
+ u64 force_spec_ctrl_mask;
+ u64 force_spec_ctrl_value;
+
/*
* If exit_on_emulation_error is set, and the in-kernel instruction
* emulator fails to emulate an instruction, allow userspace
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 4dba17363008..f65651a3898c 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -84,6 +84,7 @@
* Definitions of Tertiary Processor-Based VM-Execution Controls.
*/
#define TERTIARY_EXEC_IPI_VIRT VMCS_CONTROL_BIT(IPI_VIRT)
+#define TERTIARY_EXEC_SPEC_CTRL_SHADOW VMCS_CONTROL_BIT(SPEC_CTRL_SHADOW)
#define PIN_BASED_EXT_INTR_MASK VMCS_CONTROL_BIT(INTR_EXITING)
#define PIN_BASED_NMI_EXITING VMCS_CONTROL_BIT(NMI_EXITING)
@@ -236,6 +237,10 @@ enum vmcs_field {
TERTIARY_VM_EXEC_CONTROL_HIGH = 0x00002035,
PID_POINTER_TABLE = 0x00002042,
PID_POINTER_TABLE_HIGH = 0x00002043,
+ IA32_SPEC_CTRL_MASK = 0x0000204A,
+ IA32_SPEC_CTRL_MASK_HIGH = 0x0000204B,
+ IA32_SPEC_CTRL_SHADOW = 0x0000204C,
+ IA32_SPEC_CTRL_SHADOW_HIGH = 0x0000204D,
GUEST_PHYSICAL_ADDRESS = 0x00002400,
GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401,
VMCS_LINK_POINTER = 0x00002800,
diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
index 266daf5b5b84..6dbfe9004d92 100644
--- a/arch/x86/include/asm/vmxfeatures.h
+++ b/arch/x86/include/asm/vmxfeatures.h
@@ -90,4 +90,6 @@
/* Tertiary Processor-Based VM-Execution Controls, word 3 */
#define VMX_FEATURE_IPI_VIRT ( 3*32+ 4) /* Enable IPI virtualization */
+#define VMX_FEATURE_SPEC_CTRL_SHADOW ( 3*32+ 7) /* IA32_SPEC_CTRL shadow */
+
#endif /* _ASM_X86_VMXFEATURES_H */
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 41a4533f9989..6c51a5abb16b 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -138,6 +138,11 @@ static inline bool cpu_has_tertiary_exec_ctrls(void)
CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
}
+static inline bool cpu_has_spec_ctrl_shadow(void)
+{
+ return vmcs_config.cpu_based_3rd_exec_ctrl & TERTIARY_EXEC_SPEC_CTRL_SHADOW;
+}
+
static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
{
return vmcs_config.cpu_based_2nd_exec_ctrl &
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c37a89eda90f..a6154d725025 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2008,7 +2008,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
!guest_has_spec_ctrl_msr(vcpu))
return 1;
- msr_info->data = to_vmx(vcpu)->spec_ctrl;
+ if (cpu_has_spec_ctrl_shadow())
+ msr_info->data = vmcs_read64(IA32_SPEC_CTRL_SHADOW);
+ else
+ msr_info->data = to_vmx(vcpu)->spec_ctrl;
break;
case MSR_IA32_SYSENTER_CS:
msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
@@ -2148,6 +2151,19 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
return debugctl;
}
+static void vmx_set_spec_ctrl(struct kvm_vcpu *vcpu, u64 val)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ vmx->spec_ctrl = val;
+
+ if (cpu_has_spec_ctrl_shadow()) {
+ vmcs_write64(IA32_SPEC_CTRL_SHADOW, val);
+
+ vmx->spec_ctrl |= vcpu->kvm->arch.force_spec_ctrl_value;
+ }
+}
+
/*
* Writes msr value into the appropriate "register".
* Returns 0 on success, non-0 otherwise.
@@ -2273,7 +2289,8 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (kvm_spec_ctrl_test_value(data))
return 1;
- vmx->spec_ctrl = data;
+ vmx_set_spec_ctrl(vcpu, data);
+
if (!data)
break;
@@ -4785,6 +4802,23 @@ static void init_vmcs(struct vcpu_vmx *vmx)
if (cpu_has_vmx_xsaves())
vmcs_write64(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP);
+ if (cpu_has_spec_ctrl_shadow()) {
+ vmcs_write64(IA32_SPEC_CTRL_SHADOW, 0);
+
+ /*
+ * Note, IA32_SPEC_CTRL_{SHADOW,MASK} subtly behave *very*
+ * differently than other shadow+mask combinations. Attempts
+ * to modify bits in MASK are silently ignored and do NOT cause
+ * a VM-Exit. This allows the host to force bits to be set or
+ * cleared on behalf of the guest, while still allowing the
+ * guest to modify other bits at will, without triggering VM-Exits.
+ */
+ if (kvm->arch.force_spec_ctrl_mask)
+ vmcs_write64(IA32_SPEC_CTRL_MASK, kvm->arch.force_spec_ctrl_mask);
+ else
+ vmcs_write64(IA32_SPEC_CTRL_MASK, 0);
+ }
+
if (enable_pml) {
vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
@@ -4853,7 +4887,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
__vmx_vcpu_reset(vcpu);
vmx->rmode.vm86_active = 0;
- vmx->spec_ctrl = 0;
+ vmx_set_spec_ctrl(vcpu, 0);
vmx->msr_ia32_umwait_control = 0;
@@ -7211,8 +7245,14 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
if (!cpu_feature_enabled(X86_FEATURE_MSR_SPEC_CTRL))
return;
- if (flags & VMX_RUN_SAVE_SPEC_CTRL)
- vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
+ if (flags & VMX_RUN_SAVE_SPEC_CTRL) {
+ if (cpu_has_spec_ctrl_shadow())
+ vmx->spec_ctrl = (vmcs_read64(IA32_SPEC_CTRL_SHADOW) &
+ ~vmx->vcpu.kvm->arch.force_spec_ctrl_mask) |
+ vmx->vcpu.kvm->arch.force_spec_ctrl_value;
+ else
+ vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
+ }
/*
* If the guest/host SPEC_CTRL values differ, restore the host value.
@@ -8598,6 +8638,24 @@ static __init int hardware_setup(void)
kvm_caps.tsc_scaling_ratio_frac_bits = 48;
kvm_caps.has_bus_lock_exit = cpu_has_vmx_bus_lock_detection();
kvm_caps.has_notify_vmexit = cpu_has_notify_vmexit();
+ kvm_caps.supported_force_spec_ctrl = 0;
+
+ if (cpu_has_spec_ctrl_shadow()) {
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_IBRS;
+
+ if (boot_cpu_has(X86_FEATURE_STIBP))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_STIBP;
+
+ if (boot_cpu_has(X86_FEATURE_SSBD))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_SSBD;
+
+ if (boot_cpu_has(X86_FEATURE_RRSBA_CTRL) &&
+ (host_arch_capabilities & ARCH_CAP_RRSBA))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_RRSBA_DIS_S;
+
+ if (boot_cpu_has(X86_FEATURE_BHI_CTRL))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_BHI_DIS_S;
+ }
set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 65786dbe7d60..f26ac82b5a59 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -578,7 +578,8 @@ static inline u8 vmx_get_rvi(void)
#define KVM_REQUIRED_VMX_TERTIARY_VM_EXEC_CONTROL 0
#define KVM_OPTIONAL_VMX_TERTIARY_VM_EXEC_CONTROL \
- (TERTIARY_EXEC_IPI_VIRT)
+ (TERTIARY_EXEC_IPI_VIRT | \
+ TERTIARY_EXEC_SPEC_CTRL_SHADOW)
#define BUILD_CONTROLS_SHADOW(lname, uname, bits) \
static inline void lname##_controls_set(struct vcpu_vmx *vmx, u##bits val) \
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 984ea2089efc..9a59b5a93d0e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4836,6 +4836,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
r |= BIT(KVM_X86_SW_PROTECTED_VM);
break;
+ case KVM_CAP_FORCE_SPEC_CTRL:
+ r = !!kvm_caps.supported_force_spec_ctrl;
+ break;
default:
break;
}
@@ -4990,6 +4993,13 @@ long kvm_arch_dev_ioctl(struct file *filp,
r = kvm_x86_dev_has_attr(&attr);
break;
}
+ case KVM_GET_SUPPORTED_FORCE_SPEC_CTRL: {
+ r = 0;
+ if (copy_to_user(argp, &kvm_caps.supported_force_spec_ctrl,
+ sizeof(kvm_caps.supported_force_spec_ctrl)))
+ r = -EFAULT;
+ break;
+ }
default:
r = -EINVAL;
break;
@@ -6729,6 +6739,26 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
mutex_unlock(&kvm->lock);
break;
+ case KVM_CAP_FORCE_SPEC_CTRL:
+ r = -EINVAL;
+
+ mutex_lock(&kvm->lock);
+
+ /*
+ * Note, only the value is restricted to known bits that KVM
+ * can force on. Userspace is allowed to set any mask bits,
+ * i.e. can prevent the guest from setting a bit, even if KVM
+ * doesn't support the bit.
+ */
+ if (kvm_caps.supported_force_spec_ctrl && !kvm->created_vcpus &&
+ !(~kvm_caps.supported_force_spec_ctrl & cap->args[1]) &&
+ !(~cap->args[0] & cap->args[1])) {
+ kvm->arch.force_spec_ctrl_mask = cap->args[0];
+ kvm->arch.force_spec_ctrl_value = cap->args[1];
+ r = 0;
+ }
+ mutex_unlock(&kvm->lock);
+ break;
default:
r = -EINVAL;
break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a8b71803777b..6dd12776b310 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -29,6 +29,7 @@ struct kvm_caps {
u64 supported_xcr0;
u64 supported_xss;
u64 supported_perf_cap;
+ u64 supported_force_spec_ctrl;
};
void kvm_spurious_fault(void);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..fb918bdb930c 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -917,6 +917,7 @@ struct kvm_enable_cap {
#define KVM_CAP_MEMORY_ATTRIBUTES 233
#define KVM_CAP_GUEST_MEMFD 234
#define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_FORCE_SPEC_CTRL 236
struct kvm_irq_routing_irqchip {
__u32 irqchip;
@@ -1243,6 +1244,9 @@ struct kvm_vfio_spapr_tce {
#define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr)
#define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr)
+/* Available with KVM_CAP_FORCE_SPEC_CTRL */
+#define KVM_GET_SUPPORTED_FORCE_SPEC_CTRL _IOR(KVMIO, 0xe4, __u64)
+
/*
* ioctls for vcpu fds
*/
--
2.39.3
* [RFC PATCH v3 02/10] KVM: VMX: Cache IA32_SPEC_CTRL_SHADOW field of VMCS
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 01/10] KVM: VMX: " Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 03/10] KVM: nVMX: Enable SPEC_CTRL virtualizaton for vmcs02 Chao Gao
` (7 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Chao Gao, Sean Christopherson,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
This field is effectively the value of the IA32_SPEC_CTRL MSR from the
guest's point of view. Cache it for nested VMX transitions. The value
should be propagated between vmcs01 and vmcs02 so that, from the guest's
point of view, the IA32_SPEC_CTRL MSR doesn't change unexpectedly across
nested VMX transitions.
The guest may change the IA32_SPEC_CTRL_SHADOW field if the
IA32_SPEC_CTRL MSR is passed through to the guest. So, update the cache
right after VM-exit to ensure it is always consistent with the guest's
view.
As a bonus, vmx_get_msr() can return the cached value directly, with no
need for a VMREAD.
No functional change intended.
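A simplified model of the caching scheme described above: the cache is
refreshed from the VMCS once right after VM-exit (because the guest can
write the MSR without exiting), and emulated RDMSR is served from the
cache. The struct, the vmread_count counter and the function names are
hypothetical; the recomputation of spec_ctrl mirrors the expression used
in vmx_spec_ctrl_restore_host(). It ignores the VMX_RUN_SAVE_SPEC_CTRL
condition for brevity.

```c
#include <stdint.h>

/* Hypothetical stand-in for the VMCS field and the per-vCPU cache. */
struct vcpu_model {
	uint64_t vmcs_spec_ctrl_shadow; /* IA32_SPEC_CTRL_SHADOW in the VMCS */
	uint64_t spec_ctrl_shadow;      /* cached copy (guest's view) */
	uint64_t spec_ctrl;             /* value to load into hardware */
	uint64_t force_mask, force_value;
	unsigned int vmread_count;      /* counts simulated VMREADs */
};

static uint64_t model_vmread_shadow(struct vcpu_model *v)
{
	v->vmread_count++;
	return v->vmcs_spec_ctrl_shadow;
}

/* Right after VM-exit: one VMREAD refreshes the cache and spec_ctrl. */
void model_after_vmexit(struct vcpu_model *v)
{
	v->spec_ctrl_shadow = model_vmread_shadow(v);
	v->spec_ctrl = (v->spec_ctrl_shadow & ~v->force_mask) | v->force_value;
}

/* Emulated RDMSR: served from the cache, no VMREAD needed. */
uint64_t model_get_spec_ctrl(const struct vcpu_model *v)
{
	return v->spec_ctrl_shadow;
}
```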
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 12 ++++++++----
arch/x86/kvm/vmx/vmx.h | 6 ++++++
2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a6154d725025..93c208f009cf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2009,7 +2009,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
if (cpu_has_spec_ctrl_shadow())
- msr_info->data = vmcs_read64(IA32_SPEC_CTRL_SHADOW);
+ msr_info->data = to_vmx(vcpu)->spec_ctrl_shadow;
else
msr_info->data = to_vmx(vcpu)->spec_ctrl;
break;
@@ -2158,6 +2158,7 @@ static void vmx_set_spec_ctrl(struct kvm_vcpu *vcpu, u64 val)
vmx->spec_ctrl = val;
if (cpu_has_spec_ctrl_shadow()) {
+ vmx->spec_ctrl_shadow = val;
vmcs_write64(IA32_SPEC_CTRL_SHADOW, val);
vmx->spec_ctrl |= vcpu->kvm->arch.force_spec_ctrl_value;
@@ -4803,6 +4804,7 @@ static void init_vmcs(struct vcpu_vmx *vmx)
vmcs_write64(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP);
if (cpu_has_spec_ctrl_shadow()) {
+ vmx->spec_ctrl_shadow = 0;
vmcs_write64(IA32_SPEC_CTRL_SHADOW, 0);
/*
@@ -7246,12 +7248,14 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
return;
if (flags & VMX_RUN_SAVE_SPEC_CTRL) {
- if (cpu_has_spec_ctrl_shadow())
- vmx->spec_ctrl = (vmcs_read64(IA32_SPEC_CTRL_SHADOW) &
+ if (cpu_has_spec_ctrl_shadow()) {
+ vmx->spec_ctrl_shadow = vmcs_read64(IA32_SPEC_CTRL_SHADOW);
+ vmx->spec_ctrl = (vmx->spec_ctrl_shadow &
~vmx->vcpu.kvm->arch.force_spec_ctrl_mask) |
vmx->vcpu.kvm->arch.force_spec_ctrl_value;
- else
+ } else {
vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
+ }
}
/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index f26ac82b5a59..97324f6ee01c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -281,6 +281,12 @@ struct vcpu_vmx {
#endif
u64 spec_ctrl;
+ /*
+ * Cache IA32_SPEC_CTRL_SHADOW field of VMCS, i.e., the value of
+ * MSR_IA32_SPEC_CTRL in guest's view.
+ */
+ u64 spec_ctrl_shadow;
+
u32 msr_ia32_umwait_control;
/*
--
2.39.3
* [RFC PATCH v3 03/10] KVM: nVMX: Enable SPEC_CTRL virtualizaton for vmcs02
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 01/10] KVM: VMX: " Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 02/10] KVM: VMX: Cache IA32_SPEC_CTRL_SHADOW field of VMCS Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 04/10] x86/bugs: Use Virtual MSRs to request BHI_DIS_S Chao Gao
` (6 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Chao Gao, Sean Christopherson,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
Enable SPEC_CTRL virtualization for vmcs02 to prevent nested guests from
changing the SPEC_CTRL bits that userspace doesn't allow a guest to change.
Propagate the SPEC_CTRL virtualization bit of the tertiary VM-execution
controls from vmcs01 to vmcs02 and program the IA32_SPEC_CTRL mask as
requested by the userspace VMM.
With SPEC_CTRL virtualization enabled, the guest reads the shadow value
in the VMCS. To ensure a consistent view across nested VMX transitions,
propagate the shadow value between vmcs01 and vmcs02.
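A simplified model of the propagation described above, assuming each VMCS
carries its own IA32_SPEC_CTRL_SHADOW field and KVM's cached value is
refreshed on every VM-exit (as in the previous patch). It ignores the
nested_run_pending condition and MSR-load areas; all names are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: each VMCS has its own IA32_SPEC_CTRL_SHADOW field. */
struct nested_model {
	uint64_t vmcs01_shadow;
	uint64_t vmcs02_shadow;
	uint64_t *current;          /* shadow field of the loaded VMCS */
	uint64_t spec_ctrl_shadow;  /* KVM's cached guest view */
};

/* Any VM-exit: refresh the cache from whichever VMCS is loaded. */
void model_vmexit(struct nested_model *m)
{
	m->spec_ctrl_shadow = *m->current;
}

/* Nested VM-entry (L1 -> L2): switch to vmcs02 and carry the value over. */
void model_nested_vmentry(struct nested_model *m)
{
	m->current = &m->vmcs02_shadow;
	*m->current = m->spec_ctrl_shadow;
}

/* Nested VM-exit (L2 -> L1): switch back to vmcs01 and carry it back. */
void model_nested_vmexit(struct nested_model *m)
{
	m->current = &m->vmcs01_shadow;
	*m->current = m->spec_ctrl_shadow;
}
```

With this flow, a value L1 wrote without exiting is what L2 reads after
nested VM-entry, and an L2 write is what L1 reads after nested VM-exit.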
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/nested.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d05ddf751491..174790b2ffbc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2381,6 +2381,20 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
secondary_exec_controls_set(vmx, exec_control);
}
+ /*
+ * TERTIARY EXEC CONTROLS
+ */
+ if (cpu_has_tertiary_exec_ctrls()) {
+ exec_control = __tertiary_exec_controls_get(vmcs01);
+
+ exec_control &= TERTIARY_EXEC_SPEC_CTRL_SHADOW;
+ if (exec_control & TERTIARY_EXEC_SPEC_CTRL_SHADOW)
+ vmcs_write64(IA32_SPEC_CTRL_MASK,
+ vmx->vcpu.kvm->arch.force_spec_ctrl_mask);
+
+ tertiary_exec_controls_set(vmx, exec_control);
+ }
+
/*
* ENTRY CONTROLS
*
@@ -2625,6 +2639,19 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
if (kvm_caps.has_tsc_control)
vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
+ /*
+ * L2 after nested VM-entry should observe the same value of
+ * IA32_SPEC_CTRL MSR as L1 unless:
+ * a. L1 loads IA32_SPEC_CTRL via MSR-load area.
+ * b. L1 enables IA32_SPEC_CTRL virtualization. This cannot
+ * happen since KVM doesn't expose this feature to L1.
+ *
+ * Propagate spec_ctrl_shadow (the value guest will get via RDMSR)
+ * to vmcs02. Later nested_vmx_load_msr() will take care of case a.
+ */
+ if (vmx->nested.nested_run_pending && cpu_has_spec_ctrl_shadow())
+ vmcs_write64(IA32_SPEC_CTRL_SHADOW, vmx->spec_ctrl_shadow);
+
nested_vmx_transition_tlb_flush(vcpu, vmcs12, true);
if (nested_cpu_has_ept(vmcs12))
@@ -4883,6 +4910,9 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
vmx_update_cpu_dirty_logging(vcpu);
}
+ if (cpu_has_spec_ctrl_shadow())
+ vmcs_write64(IA32_SPEC_CTRL_SHADOW, vmx->spec_ctrl_shadow);
+
/* Unpin physical memory we referred to in vmcs02 */
kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false);
kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
--
2.39.3
* [RFC PATCH v3 04/10] x86/bugs: Use Virtual MSRs to request BHI_DIS_S
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
` (2 preceding siblings ...)
2024-04-10 14:34 ` [RFC PATCH v3 03/10] KVM: nVMX: Enable SPEC_CTRL virtualizaton for vmcs02 Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 05/10] x86/bugs: Use Virtual MSRs to request RRSBA_DIS_S Chao Gao
` (5 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Chao Gao, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Peter Zijlstra, Josh Poimboeuf, Ilpo Järvinen,
Sean Christopherson, Kai Huang, Jithu Joseph, Kan Liang,
Paolo Bonzini, Sandipan Das, Vegard Nossum, Nikolay Borisov,
Rick Edgecombe, Adam Dunlap, Arjan van de Ven
From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
BHI is mitigated either by the hardware control BHI_DIS_S or by a
software sequence. On platforms that support BHI_DIS_S, the software
sequence may be ineffective at mitigating BHI. Guests that are not aware
of BHI_DIS_S on the host and deploy the ineffective software sequence
clear_bhb_loop() may become vulnerable to BHI.
To overcome this problem, Intel has defined a virtual MSR interface
through which guests can report their mitigation status and request that
the VMM deploy relevant hardware mitigations.
Use this virtual MSR interface to tell the VMM that the guest is using the
short software sequence. Based on this information, a VMM can deploy
BHI_DIS_S for the guest using virtual SPEC_CTRL.
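The guest-side handshake in virt_mitigation_ctrl_init() can be restated
as pure logic, with the three MSR reads replaced by parameters. This is a
sketch for review purposes only: bhi_virt_ctrl() is hypothetical, while
the bit definitions match the ones this patch adds to msr-index.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit definitions as added to msr-index.h by this patch. */
#define ARCH_CAP_VIRTUAL_ENUM              (1ULL << 63)
#define VIRT_ENUM_MITIGATION_CTRL_SUPPORT  (1ULL << 0)
#define MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT  (1ULL << 0)
#define MITI_CTRL_BHB_CLEAR_SEQ_S_USED     (1ULL << 0)

/*
 * Given what the guest would read from the enumeration MSRs, compute the
 * new MSR_VIRTUAL_MITIGATION_CTRL value. Returns false if the guest must
 * not touch the control MSR at all.
 */
bool bhi_virt_ctrl(uint64_t arch_cap, uint64_t virt_enum, uint64_t miti_enum,
		   bool clear_bhb_loop_used, uint64_t old_ctrl, uint64_t *new_ctrl)
{
	if (!(arch_cap & ARCH_CAP_VIRTUAL_ENUM))
		return false;	/* MSR_VIRTUAL_ENUMERATION not present */
	if (!(virt_enum & VIRT_ENUM_MITIGATION_CTRL_SUPPORT))
		return false;	/* no mitigation control via virtual MSRs */
	if (!(miti_enum & MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT))
		return false;	/* VMM cannot deploy BHI_DIS_S for us */

	/* Report whether the short BHB-clearing sequence is in use. */
	if (clear_bhb_loop_used)
		*new_ctrl = old_ctrl | MITI_CTRL_BHB_CLEAR_SEQ_S_USED;
	else
		*new_ctrl = old_ctrl & ~MITI_CTRL_BHB_CLEAR_SEQ_S_USED;
	return true;
}
```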
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/include/asm/msr-index.h | 18 ++++++++++++++++++
arch/x86/kernel/cpu/bugs.c | 26 ++++++++++++++++++++++++++
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/cpu/cpu.h | 1 +
4 files changed, 46 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e72c2b872957..18a4081bf5cb 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -196,6 +196,7 @@
* IA32_XAPIC_DISABLE_STATUS MSR
* supported
*/
+#define ARCH_CAP_VIRTUAL_ENUM BIT_ULL(63) /* MSR_VIRTUAL_ENUMERATION supported */
#define MSR_IA32_FLUSH_CMD 0x0000010b
#define L1D_FLUSH BIT(0) /*
@@ -1178,6 +1179,23 @@
#define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F
+/* Intel virtual MSRs */
+#define MSR_VIRTUAL_ENUMERATION 0x50000000
+#define VIRT_ENUM_MITIGATION_CTRL_SUPPORT BIT(0) /*
+ * Mitigation ctrl via virtual
+ * MSRs supported
+ */
+
+#define MSR_VIRTUAL_MITIGATION_ENUM 0x50000001
+#define MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT BIT(0) /* VMM supports BHI_DIS_S */
+
+#define MSR_VIRTUAL_MITIGATION_CTRL 0x50000002
+#define MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT 0 /*
+ * Request VMM to deploy
+ * BHI_DIS_S mitigation
+ */
+#define MITI_CTRL_BHB_CLEAR_SEQ_S_USED BIT(MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT)
+
/* AMD-V MSRs */
#define MSR_VM_CR 0xc0010114
#define MSR_VM_IGNNE 0xc0010115
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 295463707e68..e74e4c51d387 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -50,6 +50,8 @@ static void __init l1d_flush_select_mitigation(void);
static void __init srso_select_mitigation(void);
static void __init gds_select_mitigation(void);
+void virt_mitigation_ctrl_init(void);
+
/* The base value of the SPEC_CTRL MSR without task-specific bits set */
u64 x86_spec_ctrl_base;
EXPORT_SYMBOL_GPL(x86_spec_ctrl_base);
@@ -171,6 +173,8 @@ void __init cpu_select_mitigations(void)
*/
srso_select_mitigation();
gds_select_mitigation();
+
+ virt_mitigation_ctrl_init();
}
/*
@@ -1680,6 +1684,28 @@ static void __init bhi_select_mitigation(void)
pr_info("Spectre BHI mitigation: SW BHB clearing on syscall\n");
}
+void virt_mitigation_ctrl_init(void)
+{
+ u64 msr_virt_enum, msr_mitigation_enum;
+
+ if (!(x86_read_arch_cap_msr() & ARCH_CAP_VIRTUAL_ENUM))
+ return;
+
+ rdmsrl(MSR_VIRTUAL_ENUMERATION, msr_virt_enum);
+ if (!(msr_virt_enum & VIRT_ENUM_MITIGATION_CTRL_SUPPORT))
+ return;
+
+ rdmsrl(MSR_VIRTUAL_MITIGATION_ENUM, msr_mitigation_enum);
+
+ if (msr_mitigation_enum & MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT) {
+ /* When BHI short seq is being used, request BHI_DIS_S */
+ if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP))
+ msr_set_bit(MSR_VIRTUAL_MITIGATION_CTRL, MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT);
+ else
+ msr_clear_bit(MSR_VIRTUAL_MITIGATION_CTRL, MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT);
+ }
+}
+
static void __init spectre_v2_select_mitigation(void)
{
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 754d91857d63..29f16655a7a0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1960,6 +1960,7 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
update_gds_msr();
tsx_ap_init();
+ virt_mitigation_ctrl_init();
}
void print_cpu_info(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index ea9e07d57c8d..1cddf506b6ae 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -87,6 +87,7 @@ void cpu_select_mitigations(void);
extern void x86_spec_ctrl_setup_ap(void);
extern void update_srbds_msr(void);
extern void update_gds_msr(void);
+extern void virt_mitigation_ctrl_init(void);
extern enum spectre_v2_mitigation spectre_v2_enabled;
--
2.39.3
* [RFC PATCH v3 05/10] x86/bugs: Use Virtual MSRs to request RRSBA_DIS_S
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
` (3 preceding siblings ...)
2024-04-10 14:34 ` [RFC PATCH v3 04/10] x86/bugs: Use Virtual MSRs to request BHI_DIS_S Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 06/10] KVM: VMX: Cache force_spec_ctrl_value/mask for each vCPU Chao Gao
` (4 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Chao Gao, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Peter Zijlstra, Josh Poimboeuf, Ilpo Järvinen, Tony Luck,
Maciej S. Szmigiero, Kan Liang, Paolo Bonzini, Sandipan Das
From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
On CPUs with RRSBA behavior, a guest using the retpoline mitigation could
become vulnerable to BHI. On such CPUs, when the RSB underflows, a RET may
take its prediction from the BTB. Although these predictions are limited to
the same domain, they may be controllable from userspace using BHI.
Alder Lake and newer CPUs have the RRSBA_DIS_S knob in MSR_SPEC_CTRL to
disable RRSBA behavior. A guest migrating from an older CPU may not be
aware of RRSBA_DIS_S. Use MSR_VIRTUAL_MITIGATION_CTRL to request that the
VMM deploy RRSBA_DIS_S when the retpoline mitigation is in use.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/include/asm/msr-index.h | 6 ++++++
arch/x86/kernel/cpu/bugs.c | 7 +++++++
2 files changed, 13 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 18a4081bf5cb..469ab38c0ec8 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1188,6 +1188,7 @@
#define MSR_VIRTUAL_MITIGATION_ENUM 0x50000001
#define MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT BIT(0) /* VMM supports BHI_DIS_S */
+#define MITI_ENUM_RETPOLINE_S_SUPPORT BIT(1) /* VMM supports RRSBA_DIS_S */
#define MSR_VIRTUAL_MITIGATION_CTRL 0x50000002
#define MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT 0 /*
@@ -1195,6 +1196,11 @@
* BHI_DIS_S mitigation
*/
#define MITI_CTRL_BHB_CLEAR_SEQ_S_USED BIT(MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT)
+#define MITI_CTRL_RETPOLINE_S_USED_BIT 1 /*
+ * Request VMM to deploy
+ * RRSBA_DIS_S mitigation
+ */
+#define MITI_CTRL_RETPOLINE_S_USED BIT(MITI_CTRL_RETPOLINE_S_USED_BIT)
/* AMD-V MSRs */
#define MSR_VM_CR 0xc0010114
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index e74e4c51d387..766f4340eddf 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1704,6 +1704,13 @@ void virt_mitigation_ctrl_init(void)
else
msr_clear_bit(MSR_VIRTUAL_MITIGATION_CTRL, MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT);
}
+ if (msr_mitigation_enum & MITI_ENUM_RETPOLINE_S_SUPPORT) {
+ /* When retpoline is being used, request RRSBA_DIS_S */
+ if (boot_cpu_has(X86_FEATURE_RETPOLINE))
+ msr_set_bit(MSR_VIRTUAL_MITIGATION_CTRL, MITI_CTRL_RETPOLINE_S_USED_BIT);
+ else
+ msr_clear_bit(MSR_VIRTUAL_MITIGATION_CTRL, MITI_CTRL_RETPOLINE_S_USED_BIT);
+ }
}
static void __init spectre_v2_select_mitigation(void)
--
2.39.3
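For readers following the guest side, the read-modify-write logic that virt_mitigation_ctrl_init() performs after this patch can be modeled in plain user-space C as below. This is a hedged sketch, not kernel code: model_mitigation_ctrl() is a hypothetical helper, the MSR_VIRTUAL_MITIGATION_CTRL access is replaced by a plain value, and the boot_cpu_has() checks become boolean parameters. It assumes the caller has already confirmed VIRT_ENUM_MITIGATION_CTRL_SUPPORT; only the bit layout is taken from the msr-index.h hunks in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))

/* Bit layout as defined in msr-index.h by this series */
#define MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT BIT(0)
#define MITI_ENUM_RETPOLINE_S_SUPPORT     BIT(1)
#define MITI_CTRL_BHB_CLEAR_SEQ_S_USED    BIT(0)
#define MITI_CTRL_RETPOLINE_S_USED        BIT(1)

/*
 * Hypothetical model of virt_mitigation_ctrl_init(): starting from the
 * current MSR_VIRTUAL_MITIGATION_CTRL value, set or clear each "used"
 * bit, but only for mitigations the VMM enumerates support for.
 * Bits for unsupported mitigations are left untouched.
 */
static uint64_t model_mitigation_ctrl(uint64_t miti_enum, uint64_t ctrl,
				      bool bhb_loop_used, bool retpoline_used)
{
	if (miti_enum & MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT) {
		if (bhb_loop_used)	/* X86_FEATURE_CLEAR_BHB_LOOP */
			ctrl |= MITI_CTRL_BHB_CLEAR_SEQ_S_USED;
		else
			ctrl &= ~MITI_CTRL_BHB_CLEAR_SEQ_S_USED;
	}

	if (miti_enum & MITI_ENUM_RETPOLINE_S_SUPPORT) {
		if (retpoline_used)	/* X86_FEATURE_RETPOLINE */
			ctrl |= MITI_CTRL_RETPOLINE_S_USED;
		else
			ctrl &= ~MITI_CTRL_RETPOLINE_S_USED;
	}

	return ctrl;
}
```

Using a pure function keeps the decision separate from the MSR accessors, which makes the "only touch enumerated bits" rule easy to check in isolation.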
* [RFC PATCH v3 06/10] KVM: VMX: Cache force_spec_ctrl_value/mask for each vCPU
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
` (4 preceding siblings ...)
2024-04-10 14:34 ` [RFC PATCH v3 05/10] x86/bugs: Use Virtual MSRs to request RRSBA_DIS_S Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 07/10] KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support Chao Gao
` (3 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Chao Gao, Sean Christopherson,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
Cache force_spec_ctrl_value/mask for each vCPU so that KVM can adjust the
mask/value for each vCPU according to the software mitigations the vCPU
is using.
KVM_CAP_FORCE_SPEC_CTRL allows the userspace VMM to proactively enable
hardware mitigations (by setting some bits in the IA32_SPEC_CTRL MSR) to
protect the guest from becoming vulnerable to some security issues after
live migration. E.g., a guest that uses the short BHB-clearing sequence
for BHI becomes vulnerable to BHI if it is migrated from a pre-SPR part to
an SPR part. The current solution is for the userspace VMM to deploy
BHI_DIS_S for all guests migrated to SPR parts from pre-SPR parts.
But KVM_CAP_FORCE_SPEC_CTRL isn't flexible: the userspace VMM may
configure KVM to enable BHI_DIS_S for guests which don't care about BHI
at all or which use other mitigations (e.g., the TSX abort sequence) for
BHI. This causes unnecessary overhead for those guests.
To reduce the overhead, the idea is to let the guest tell the VMM which
software mitigations it is using via Intel-defined virtual MSRs [1]. This
information from the guest is much more accurate, so KVM can adjust
hardware mitigations accordingly and minimize the performance impact on
the guest.
The Intel-defined virtual MSRs have per-thread scope, so vCPUs _can_
program different values into them. This means KVM may need to apply a
different mask/value to the IA32_SPEC_CTRL MSR for each vCPU. So, cache
force_spec_ctrl_value/mask for each vCPU in preparation for adding
support for the Intel-defined virtual MSRs.
[1]: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/nested.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 11 +++++++----
arch/x86/kvm/vmx/vmx.h | 7 +++++++
3 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 174790b2ffbc..efbc871d0466 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2390,7 +2390,7 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
exec_control &= TERTIARY_EXEC_SPEC_CTRL_SHADOW;
if (exec_control & TERTIARY_EXEC_SPEC_CTRL_SHADOW)
vmcs_write64(IA32_SPEC_CTRL_MASK,
- vmx->vcpu.kvm->arch.force_spec_ctrl_mask);
+ vmx->force_spec_ctrl_mask);
tertiary_exec_controls_set(vmx, exec_control);
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 93c208f009cf..cdfcc1290d82 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2161,7 +2161,7 @@ static void vmx_set_spec_ctrl(struct kvm_vcpu *vcpu, u64 val)
vmx->spec_ctrl_shadow = val;
vmcs_write64(IA32_SPEC_CTRL_SHADOW, val);
- vmx->spec_ctrl |= vcpu->kvm->arch.force_spec_ctrl_value;
+ vmx->spec_ctrl |= vmx->force_spec_ctrl_value;
}
}
@@ -4803,6 +4803,9 @@ static void init_vmcs(struct vcpu_vmx *vmx)
if (cpu_has_vmx_xsaves())
vmcs_write64(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP);
+ vmx->force_spec_ctrl_mask = kvm->arch.force_spec_ctrl_mask;
+ vmx->force_spec_ctrl_value = kvm->arch.force_spec_ctrl_value;
+
if (cpu_has_spec_ctrl_shadow()) {
vmx->spec_ctrl_shadow = 0;
vmcs_write64(IA32_SPEC_CTRL_SHADOW, 0);
@@ -4816,7 +4819,7 @@ static void init_vmcs(struct vcpu_vmx *vmx)
* guest modify other bits at will, without triggering VM-Exits.
*/
if (kvm->arch.force_spec_ctrl_mask)
- vmcs_write64(IA32_SPEC_CTRL_MASK, kvm->arch.force_spec_ctrl_mask);
+ vmcs_write64(IA32_SPEC_CTRL_MASK, vmx->force_spec_ctrl_mask);
else
vmcs_write64(IA32_SPEC_CTRL_MASK, 0);
}
@@ -7251,8 +7254,8 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
if (cpu_has_spec_ctrl_shadow()) {
vmx->spec_ctrl_shadow = vmcs_read64(IA32_SPEC_CTRL_SHADOW);
vmx->spec_ctrl = (vmx->spec_ctrl_shadow &
- ~vmx->vcpu.kvm->arch.force_spec_ctrl_mask) |
- vmx->vcpu.kvm->arch.force_spec_ctrl_value;
+ ~vmx->force_spec_ctrl_mask) |
+ vmx->force_spec_ctrl_value;
} else {
vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
}
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 97324f6ee01c..a4dfe538e5a8 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -287,6 +287,13 @@ struct vcpu_vmx {
*/
u64 spec_ctrl_shadow;
+ /*
+ * Mask and value of SPEC_CTRL MSR bits which the guest is not allowed to
+ * change.
+ */
+ u64 force_spec_ctrl_mask;
+ u64 force_spec_ctrl_value;
+
u32 msr_ia32_umwait_control;
/*
--
2.39.3
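A minimal user-space model of the per-vCPU state introduced here may help: each vCPU starts from the VM-wide KVM_CAP_FORCE_SPEC_CTRL mask/value (as init_vmcs() does in this patch), and the value KVM ends up holding for hardware is (shadow & ~mask) | value, matching the vmx_spec_ctrl_restore_host() hunk. The struct and function names are hypothetical; the SPEC_CTRL bit positions are assumed to follow the kernel's msr-index.h.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))
#define SPEC_CTRL_IBRS        BIT(0)
#define SPEC_CTRL_RRSBA_DIS_S BIT(6)
#define SPEC_CTRL_BHI_DIS_S   BIT(10)

/* VM-wide values configured via KVM_CAP_FORCE_SPEC_CTRL (hypothetical model) */
struct vm_model {
	uint64_t force_mask;
	uint64_t force_value;
};

/* Per-vCPU copy, mirroring the new fields in struct vcpu_vmx */
struct vcpu_model {
	uint64_t force_mask;
	uint64_t force_value;
	uint64_t spec_ctrl_shadow;	/* value the guest believes it wrote */
};

/* Each vCPU inherits the VM-wide defaults at init time. */
static void vcpu_model_init(struct vcpu_model *v, const struct vm_model *vm)
{
	v->force_mask = vm->force_mask;
	v->force_value = vm->force_value;
	v->spec_ctrl_shadow = 0;
}

/* Guest-owned bits come from the shadow; forced bits come from force_value. */
static uint64_t vcpu_model_hw_spec_ctrl(const struct vcpu_model *v)
{
	return (v->spec_ctrl_shadow & ~v->force_mask) | v->force_value;
}
```

Because the mask/value live in the vCPU, a later change to one vCPU (for example, adding RRSBA_DIS_S) does not leak into its siblings.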
* [RFC PATCH v3 07/10] KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
` (5 preceding siblings ...)
2024-04-10 14:34 ` [RFC PATCH v3 06/10] KVM: VMX: Cache force_spec_ctrl_value/mask for each vCPU Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-12 4:22 ` Jim Mattson
2024-04-10 14:34 ` [RFC PATCH v3 08/10] KVM: VMX: Advertise MITIGATION_CTRL support Chao Gao
` (2 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Zhang Chen, Chao Gao,
Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
From: Zhang Chen <chen.zhang@intel.com>
Bit 63 of the IA32_ARCH_CAPABILITIES MSR indicates the availability of
VIRTUAL_ENUMERATION_MSR (index 0x50000000), which enumerates features
such as mitigation enumeration. The guest in turn uses mitigation
enumeration to report the software mitigations it is using.
Advertise ARCH_CAP_VIRTUAL_ENUM support for VMX and emulate reads/writes
of VIRTUAL_ENUMERATION_MSR. For now, VIRTUAL_ENUMERATION_MSR always reads as 0.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Co-developed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 2 ++
arch/x86/kvm/x86.c | 16 +++++++++++++++-
4 files changed, 37 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d1a9f9951635..e3406971a8b7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4288,6 +4288,7 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
{
switch (index) {
case MSR_IA32_MCG_EXT_CTL:
+ case MSR_VIRTUAL_ENUMERATION:
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
return false;
case MSR_IA32_SMBASE:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cdfcc1290d82..dcb06406fd09 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1955,6 +1955,8 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
return !(msr->data & ~valid_bits);
}
+#define VIRTUAL_ENUMERATION_VALID_BITS 0ULL
+
static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
switch (msr->index) {
@@ -1962,6 +1964,9 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
if (!nested)
return 1;
return vmx_get_vmx_msr(&vmcs_config.nested, msr->index, &msr->data);
+ case MSR_VIRTUAL_ENUMERATION:
+ msr->data = VIRTUAL_ENUMERATION_VALID_BITS;
+ return 0;
default:
return KVM_MSR_RET_INVALID;
}
@@ -2113,6 +2118,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_DEBUGCTLMSR:
msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
break;
+ case MSR_VIRTUAL_ENUMERATION:
+ if (!msr_info->host_initiated &&
+ !(vcpu->arch.arch_capabilities & ARCH_CAP_VIRTUAL_ENUM))
+ return 1;
+ msr_info->data = vmx->msr_virtual_enumeration;
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_info->index);
@@ -2457,6 +2468,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
ret = kvm_set_msr_common(vcpu, msr_info);
break;
+ case MSR_VIRTUAL_ENUMERATION:
+ if (!msr_info->host_initiated)
+ return 1;
+ if (data & ~VIRTUAL_ENUMERATION_VALID_BITS)
+ return 1;
+
+ vmx->msr_virtual_enumeration = data;
+ break;
default:
find_uret_msr:
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a4dfe538e5a8..0519cf6187ac 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -294,6 +294,8 @@ struct vcpu_vmx {
u64 force_spec_ctrl_mask;
u64 force_spec_ctrl_value;
+ u64 msr_virtual_enumeration;
+
u32 msr_ia32_umwait_control;
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9a59b5a93d0e..4721b6fe7641 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1564,6 +1564,7 @@ static const u32 emulated_msrs_all[] = {
MSR_K7_HWCR,
MSR_KVM_POLL_CONTROL,
+ MSR_VIRTUAL_ENUMERATION,
};
static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
@@ -1579,6 +1580,7 @@ static const u32 msr_based_features_all_except_vmx[] = {
MSR_IA32_UCODE_REV,
MSR_IA32_ARCH_CAPABILITIES,
MSR_IA32_PERF_CAPABILITIES,
+ MSR_VIRTUAL_ENUMERATION,
};
static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
@@ -1621,7 +1623,8 @@ static bool kvm_is_immutable_feature_msr(u32 msr)
ARCH_CAP_PSCHANGE_MC_NO | ARCH_CAP_TSX_CTRL_MSR | ARCH_CAP_TAA_NO | \
ARCH_CAP_SBDR_SSDP_NO | ARCH_CAP_FBSDP_NO | ARCH_CAP_PSDP_NO | \
ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO | \
- ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR | ARCH_CAP_BHI_NO)
+ ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR | ARCH_CAP_BHI_NO | \
+ ARCH_CAP_VIRTUAL_ENUM)
static u64 kvm_get_arch_capabilities(void)
{
@@ -1635,6 +1638,17 @@ static u64 kvm_get_arch_capabilities(void)
*/
data |= ARCH_CAP_PSCHANGE_MC_NO;
+ /*
+ * Virtual enumeration is a paravirt feature. The only usage for now
+ * is to bridge the gap caused by microarchitecture changes between
+ * different Intel processors. And its usage is linked to "virtualize
+ * IA32_SPEC_CTRL" which is a VMX feature. Whether AMD SVM can benefit
+ * from the same usage and how to implement it is still unclear. Limit
+ * virtual enumeration to VMX.
+ */
+ if (static_call(kvm_x86_has_emulated_msr)(NULL, MSR_VIRTUAL_ENUMERATION))
+ data |= ARCH_CAP_VIRTUAL_ENUM;
+
/*
* If we're doing cache flushes (either "always" or "cond")
* we will do one whenever the guest does a vmlaunch/vmresume.
--
2.39.3
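The access rules this patch adds for MSR_VIRTUAL_ENUMERATION can be summarized with a small user-space model, assuming KVM's convention that a non-zero return makes the access fail (a #GP for the guest). Guest reads are allowed only when ARCH_CAP_VIRTUAL_ENUM (bit 63) is exposed, writes are host-only, and with VIRTUAL_ENUMERATION_VALID_BITS equal to 0 only the value 0 is accepted. All names other than the two macros copied from the patch are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))
#define ARCH_CAP_VIRTUAL_ENUM          BIT(63)
#define VIRTUAL_ENUMERATION_VALID_BITS 0ULL	/* as of this patch */

struct vcpu_model {
	uint64_t arch_capabilities;	/* guest-visible IA32_ARCH_CAPABILITIES */
	uint64_t virtual_enumeration;	/* emulated MSR_VIRTUAL_ENUMERATION */
};

/* Returns 0 on success, 1 if the access would be rejected. */
static int model_get_virtual_enum(const struct vcpu_model *v,
				  bool host_initiated, uint64_t *data)
{
	/* The guest may read it only if ARCH_CAPABILITIES advertises it. */
	if (!host_initiated && !(v->arch_capabilities & ARCH_CAP_VIRTUAL_ENUM))
		return 1;
	*data = v->virtual_enumeration;
	return 0;
}

static int model_set_virtual_enum(struct vcpu_model *v,
				  bool host_initiated, uint64_t data)
{
	/* Read-only for the guest; userspace configures it. */
	if (!host_initiated)
		return 1;
	/* Reject any bit KVM does not support yet. */
	if (data & ~VIRTUAL_ENUMERATION_VALID_BITS)
		return 1;
	v->virtual_enumeration = data;
	return 0;
}
```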
* [RFC PATCH v3 08/10] KVM: VMX: Advertise MITIGATION_CTRL support
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
` (6 preceding siblings ...)
2024-04-10 14:34 ` [RFC PATCH v3 07/10] KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 10/10] KVM: VMX: Advertise MITI_ENUM_RETPOLINE_S_SUPPORT Chao Gao
9 siblings, 0 replies; 21+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Zhang Chen, Chao Gao,
Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
From: Zhang Chen <chen.zhang@intel.com>
Advertise MITIGATION_CTRL support and emulate accesses to its two
associated MSRs.
MITIGATION_CTRL is enumerated by bit 0 of MSR_VIRTUAL_ENUMERATION. If it
is supported, two virtual MSRs, MSR_VIRTUAL_MITIGATION_ENUM (0x50000001)
and MSR_VIRTUAL_MITIGATION_CTRL (0x50000002), are available.
The guest can use the two MSRs to report its software mitigation status.
Based on this information, KVM can deploy alternative mitigations (e.g.,
hardware mitigations) for the guest if some of its software mitigations
are not effective on the host.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Co-developed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/svm/svm.c | 2 ++
arch/x86/kvm/vmx/vmx.c | 36 +++++++++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.h | 3 +++
arch/x86/kvm/x86.c | 3 +++
4 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e3406971a8b7..8a080592aa54 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4289,6 +4289,8 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
switch (index) {
case MSR_IA32_MCG_EXT_CTL:
case MSR_VIRTUAL_ENUMERATION:
+ case MSR_VIRTUAL_MITIGATION_ENUM:
+ case MSR_VIRTUAL_MITIGATION_CTRL:
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
return false;
case MSR_IA32_SMBASE:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index dcb06406fd09..cc260b14f8df 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1955,7 +1955,9 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
return !(msr->data & ~valid_bits);
}
-#define VIRTUAL_ENUMERATION_VALID_BITS 0ULL
+#define VIRTUAL_ENUMERATION_VALID_BITS VIRT_ENUM_MITIGATION_CTRL_SUPPORT
+#define MITI_ENUM_VALID_BITS 0ULL
+#define MITI_CTRL_VALID_BITS 0ULL
static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
@@ -1967,6 +1969,9 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
case MSR_VIRTUAL_ENUMERATION:
msr->data = VIRTUAL_ENUMERATION_VALID_BITS;
return 0;
+ case MSR_VIRTUAL_MITIGATION_ENUM:
+ msr->data = MITI_ENUM_VALID_BITS;
+ return 0;
default:
return KVM_MSR_RET_INVALID;
}
@@ -2124,6 +2129,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
msr_info->data = vmx->msr_virtual_enumeration;
break;
+ case MSR_VIRTUAL_MITIGATION_ENUM:
+ if (!msr_info->host_initiated &&
+ !(vmx->msr_virtual_enumeration & VIRT_ENUM_MITIGATION_CTRL_SUPPORT))
+ return 1;
+ msr_info->data = vmx->msr_virtual_mitigation_enum;
+ break;
+ case MSR_VIRTUAL_MITIGATION_CTRL:
+ if (!msr_info->host_initiated &&
+ !(vmx->msr_virtual_enumeration & VIRT_ENUM_MITIGATION_CTRL_SUPPORT))
+ return 1;
+ msr_info->data = vmx->msr_virtual_mitigation_ctrl;
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_info->index);
@@ -2476,7 +2493,23 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vmx->msr_virtual_enumeration = data;
break;
+ case MSR_VIRTUAL_MITIGATION_ENUM:
+ if (!msr_info->host_initiated)
+ return 1;
+ if (data & ~MITI_ENUM_VALID_BITS)
+ return 1;
+
+ vmx->msr_virtual_mitigation_enum = data;
+ break;
+ case MSR_VIRTUAL_MITIGATION_CTRL:
+ if (!msr_info->host_initiated &&
+ !(vmx->msr_virtual_enumeration & VIRT_ENUM_MITIGATION_CTRL_SUPPORT))
+ return 1;
+ if (data & ~MITI_CTRL_VALID_BITS)
+ return 1;
+ vmx->msr_virtual_mitigation_ctrl = data;
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_index);
@@ -4901,6 +4934,7 @@ static void __vmx_vcpu_reset(struct kvm_vcpu *vcpu)
*/
vmx->pi_desc.nv = POSTED_INTR_VECTOR;
vmx->pi_desc.sn = 1;
+ vmx->msr_virtual_mitigation_ctrl = 0;
}
static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 0519cf6187ac..7be5dd5dde6c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -296,6 +296,9 @@ struct vcpu_vmx {
u64 msr_virtual_enumeration;
+ u64 msr_virtual_mitigation_enum;
+ u64 msr_virtual_mitigation_ctrl;
+
u32 msr_ia32_umwait_control;
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4721b6fe7641..f55d26d7c79a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1565,6 +1565,8 @@ static const u32 emulated_msrs_all[] = {
MSR_K7_HWCR,
MSR_KVM_POLL_CONTROL,
MSR_VIRTUAL_ENUMERATION,
+ MSR_VIRTUAL_MITIGATION_ENUM,
+ MSR_VIRTUAL_MITIGATION_CTRL,
};
static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
@@ -1581,6 +1583,7 @@ static const u32 msr_based_features_all_except_vmx[] = {
MSR_IA32_ARCH_CAPABILITIES,
MSR_IA32_PERF_CAPABILITIES,
MSR_VIRTUAL_ENUMERATION,
+ MSR_VIRTUAL_MITIGATION_ENUM,
};
static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
--
2.39.3
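The emulation rules for the two mitigation MSRs in this patch can be sketched as a user-space model. It assumes KVM's convention that returning 1 rejects the access; as in the patch, guest access to either MSR requires VIRT_ENUM_MITIGATION_CTRL_SUPPORT, MSR_VIRTUAL_MITIGATION_ENUM is writable only by the host, and both valid-bit masks are still 0 at this point in the series. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))
#define MSR_VIRTUAL_MITIGATION_ENUM       0x50000001u
#define MSR_VIRTUAL_MITIGATION_CTRL       0x50000002u
#define VIRT_ENUM_MITIGATION_CTRL_SUPPORT BIT(0)
/* Both are 0 in this patch; later patches in the series widen them. */
#define MITI_ENUM_VALID_BITS 0ULL
#define MITI_CTRL_VALID_BITS 0ULL

struct vcpu_model {
	uint64_t virtual_enumeration;
	uint64_t miti_enum;
	uint64_t miti_ctrl;
};

static bool model_miti_enumerated(const struct vcpu_model *v)
{
	return v->virtual_enumeration & VIRT_ENUM_MITIGATION_CTRL_SUPPORT;
}

/* Returns 0 on success, 1 if the access would be rejected. */
static int model_get_miti_msr(const struct vcpu_model *v, uint32_t index,
			      uint64_t *data, bool host_initiated)
{
	if (!host_initiated && !model_miti_enumerated(v))
		return 1;
	switch (index) {
	case MSR_VIRTUAL_MITIGATION_ENUM:
		*data = v->miti_enum;
		return 0;
	case MSR_VIRTUAL_MITIGATION_CTRL:
		*data = v->miti_ctrl;
		return 0;
	default:
		return 1;
	}
}

static int model_set_miti_msr(struct vcpu_model *v, uint32_t index,
			      uint64_t data, bool host_initiated)
{
	switch (index) {
	case MSR_VIRTUAL_MITIGATION_ENUM:
		/* What the VMM supports is configured by userspace only. */
		if (!host_initiated || (data & ~MITI_ENUM_VALID_BITS))
			return 1;
		v->miti_enum = data;
		return 0;
	case MSR_VIRTUAL_MITIGATION_CTRL:
		/* The guest reports the mitigations it uses here. */
		if (!host_initiated && !model_miti_enumerated(v))
			return 1;
		if (data & ~MITI_CTRL_VALID_BITS)
			return 1;
		v->miti_ctrl = data;
		return 0;
	default:
		return 1;
	}
}
```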
* [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
` (7 preceding siblings ...)
2024-04-10 14:34 ` [RFC PATCH v3 08/10] KVM: VMX: Advertise MITIGATION_CTRL support Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-06-11 1:34 ` Sean Christopherson
2024-04-10 14:34 ` [RFC PATCH v3 10/10] KVM: VMX: Advertise MITI_ENUM_RETPOLINE_S_SUPPORT Chao Gao
9 siblings, 1 reply; 21+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Zhang Chen, Chao Gao,
Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
From: Zhang Chen <chen.zhang@intel.com>
Allow the guest to report whether the short BHB-clearing sequence is in use.
KVM will deploy BHI_DIS_S for the guest if the short BHB-clearing
sequence is in use, the processor supports BHI_CTRL and it doesn't
enumerate BHI_NO.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc260b14f8df..c5ceaebd954b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1956,8 +1956,8 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
}
#define VIRTUAL_ENUMERATION_VALID_BITS VIRT_ENUM_MITIGATION_CTRL_SUPPORT
-#define MITI_ENUM_VALID_BITS 0ULL
-#define MITI_CTRL_VALID_BITS 0ULL
+#define MITI_ENUM_VALID_BITS MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT
+#define MITI_CTRL_VALID_BITS MITI_CTRL_BHB_CLEAR_SEQ_S_USED
static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
@@ -2204,7 +2204,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
struct vmx_uret_msr *msr;
int ret = 0;
u32 msr_index = msr_info->index;
- u64 data = msr_info->data;
+ u64 data = msr_info->data, spec_ctrl_mask = 0;
u32 index;
switch (msr_index) {
@@ -2508,6 +2508,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (data & ~MITI_CTRL_VALID_BITS)
return 1;
+ if (data & MITI_CTRL_BHB_CLEAR_SEQ_S_USED &&
+ kvm_cpu_cap_has(X86_FEATURE_BHI_CTRL) &&
+ !(host_arch_capabilities & ARCH_CAP_BHI_NO))
+ spec_ctrl_mask |= SPEC_CTRL_BHI_DIS_S;
+
+ /*
+ * Intercept IA32_SPEC_CTRL to disallow guest from changing
+ * certain bits if "virtualize IA32_SPEC_CTRL" isn't supported
+ * e.g., in nested case.
+ */
+ if (spec_ctrl_mask && !cpu_has_spec_ctrl_shadow())
+ vmx_enable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
+
+ /*
+ * KVM_CAP_FORCE_SPEC_CTRL takes precedence over
+ * MSR_VIRTUAL_MITIGATION_CTRL.
+ */
+ spec_ctrl_mask &= ~vmx->vcpu.kvm->arch.force_spec_ctrl_mask;
+
+ vmx->force_spec_ctrl_mask = vmx->vcpu.kvm->arch.force_spec_ctrl_mask |
+ spec_ctrl_mask;
+ vmx->force_spec_ctrl_value = vmx->vcpu.kvm->arch.force_spec_ctrl_value |
+ spec_ctrl_mask;
+ vmx_set_spec_ctrl(&vmx->vcpu, vmx->spec_ctrl_shadow);
+
vmx->msr_virtual_mitigation_ctrl = data;
break;
default:
--
2.39.3
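The way the MSR_VIRTUAL_MITIGATION_CTRL write path in this patch merges the guest's report with the VM-wide KVM_CAP_FORCE_SPEC_CTRL settings can be modeled as follows. This is a hedged user-space sketch: the host checks become booleans, update_force_ctrl() and struct force_ctrl are hypothetical, and SPEC_CTRL_BHI_DIS_S is assumed to be bit 10 as in the kernel's msr-index.h. The key point is that any bit already covered by the VM-wide mask is left to userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))
#define SPEC_CTRL_BHI_DIS_S            BIT(10)
#define MITI_CTRL_BHB_CLEAR_SEQ_S_USED BIT(0)

struct force_ctrl {
	uint64_t mask;
	uint64_t value;
};

/* Compute the per-vCPU mask/value from the guest report and VM-wide values. */
static struct force_ctrl update_force_ctrl(uint64_t miti_ctrl, bool has_bhi_ctrl,
					   bool host_bhi_no, struct force_ctrl vm)
{
	uint64_t extra = 0;
	struct force_ctrl vcpu;

	/* Short BHB-clearing sequence isn't enough: force BHI_DIS_S on. */
	if ((miti_ctrl & MITI_CTRL_BHB_CLEAR_SEQ_S_USED) &&
	    has_bhi_ctrl && !host_bhi_no)
		extra |= SPEC_CTRL_BHI_DIS_S;

	/* KVM_CAP_FORCE_SPEC_CTRL takes precedence for bits it controls. */
	extra &= ~vm.mask;

	vcpu.mask = vm.mask | extra;
	vcpu.value = vm.value | extra;	/* extra bits are forced to 1 */
	return vcpu;
}
```

Usage: if userspace has put BHI_DIS_S in the VM-wide mask with a value of 0, a guest report does not turn it back on, because the bit is stripped from extra before it is merged.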
* [RFC PATCH v3 10/10] KVM: VMX: Advertise MITI_ENUM_RETPOLINE_S_SUPPORT
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
` (8 preceding siblings ...)
2024-04-10 14:34 ` [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
9 siblings, 0 replies; 21+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Chao Gao, Zhang Chen,
Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
Allow the guest to report whether retpoline is used in supervisor mode.
KVM will deploy RRSBA_DIS_S for the guest if the guest is using retpoline
and the processor supports RRSBA_CTRL and enumerates RRSBA.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c5ceaebd954b..235cb6ad69c0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1956,8 +1956,10 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
}
#define VIRTUAL_ENUMERATION_VALID_BITS VIRT_ENUM_MITIGATION_CTRL_SUPPORT
-#define MITI_ENUM_VALID_BITS MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT
-#define MITI_CTRL_VALID_BITS MITI_CTRL_BHB_CLEAR_SEQ_S_USED
+#define MITI_ENUM_VALID_BITS (MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT | \
+ MITI_ENUM_RETPOLINE_S_SUPPORT)
+#define MITI_CTRL_VALID_BITS (MITI_CTRL_BHB_CLEAR_SEQ_S_USED | \
+ MITI_CTRL_RETPOLINE_S_USED)
static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
@@ -2508,6 +2510,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (data & ~MITI_CTRL_VALID_BITS)
return 1;
+ if (data & MITI_CTRL_RETPOLINE_S_USED &&
+ kvm_cpu_cap_has(X86_FEATURE_RRSBA_CTRL) &&
+ host_arch_capabilities & ARCH_CAP_RRSBA)
+ spec_ctrl_mask |= SPEC_CTRL_RRSBA_DIS_S;
+
if (data & MITI_CTRL_BHB_CLEAR_SEQ_S_USED &&
kvm_cpu_cap_has(X86_FEATURE_BHI_CTRL) &&
!(host_arch_capabilities & ARCH_CAP_BHI_NO))
--
2.39.3
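With this patch applied, the set of SPEC_CTRL bits KVM derives from the guest's MSR_VIRTUAL_MITIGATION_CTRL report covers both mitigations. A user-space model of just that decision is sketched below; the host feature checks (kvm_cpu_cap_has() and host_arch_capabilities in the patch) are replaced by a hypothetical struct of booleans, and the SPEC_CTRL bit positions are assumed to follow the kernel's msr-index.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))
#define SPEC_CTRL_RRSBA_DIS_S          BIT(6)
#define SPEC_CTRL_BHI_DIS_S            BIT(10)
#define MITI_CTRL_BHB_CLEAR_SEQ_S_USED BIT(0)
#define MITI_CTRL_RETPOLINE_S_USED     BIT(1)

/* Hypothetical stand-in for the host capability checks in vmx_set_msr(). */
struct host_caps {
	bool rrsba_ctrl;	/* X86_FEATURE_RRSBA_CTRL */
	bool rrsba;		/* ARCH_CAP_RRSBA */
	bool bhi_ctrl;		/* X86_FEATURE_BHI_CTRL */
	bool bhi_no;		/* ARCH_CAP_BHI_NO */
};

/* SPEC_CTRL bits to force on, before KVM_CAP_FORCE_SPEC_CTRL precedence. */
static uint64_t extra_spec_ctrl_bits(uint64_t miti_ctrl, struct host_caps h)
{
	uint64_t mask = 0;

	/* Retpoline relies on RETs not falling back to the BTB. */
	if ((miti_ctrl & MITI_CTRL_RETPOLINE_S_USED) && h.rrsba_ctrl && h.rrsba)
		mask |= SPEC_CTRL_RRSBA_DIS_S;

	/* The short BHB-clearing sequence may be insufficient on this host. */
	if ((miti_ctrl & MITI_CTRL_BHB_CLEAR_SEQ_S_USED) && h.bhi_ctrl && !h.bhi_no)
		mask |= SPEC_CTRL_BHI_DIS_S;

	return mask;
}
```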
* Re: [RFC PATCH v3 01/10] KVM: VMX: Virtualize Intel IA32_SPEC_CTRL
@ 2024-04-11 4:15 kernel test robot
0 siblings, 0 replies; 21+ messages in thread
From: kernel test robot @ 2024-04-11 4:15 UTC (permalink / raw)
Cc: oe-kbuild-all, llvm
In-Reply-To: <20240410143446.797262-2-chao.gao@intel.com>
References: <20240410143446.797262-2-chao.gao@intel.com>
TO: Chao Gao <chao.gao@intel.com>
Hi Chao,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:
[auto build test WARNING on 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702]
url: https://github.com/intel-lab-lkp/linux/commits/Chao-Gao/KVM-VMX-Virtualize-Intel-IA32_SPEC_CTRL/20240410-224015
base: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
patch link: https://lore.kernel.org/r/20240410143446.797262-2-chao.gao%40intel.com
patch subject: [RFC PATCH v3 01/10] KVM: VMX: Virtualize Intel IA32_SPEC_CTRL
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20240411/202404111234.ubrDd2tE-lkp@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240411/202404111234.ubrDd2tE-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404111234.ubrDd2tE-lkp@intel.com/
All warnings (new ones prefixed by >>):
vmlinux.o: warning: objtool: balance_leaf+0x7738: stack state mismatch: cfa1=4+376 cfa2=4+368
>> vmlinux.o: warning: objtool: vmx_spec_ctrl_restore_host+0x21: call to cpu_has_spec_ctrl_shadow() leaves .noinstr.text section
vmlinux.o: warning: objtool: set_ftrace_ops_ro+0x46: relocation to !ENDBR: .text+0x3bedb4
vmlinux.o: warning: objtool: bad call to elf_init_reloc_text_sym() for data symbol .rodata
objdump-func vmlinux.o vmx_spec_ctrl_restore_host:
0000 0000000000001ec0 <vmx_spec_ctrl_restore_host>:
0000 1ec0: f3 0f 1e fa endbr64
0004 1ec4: 41 56 push %r14
0006 1ec6: 53 push %rbx
0007 1ec7: 65 48 8b 1d 00 00 00 00 mov %gs:0x0(%rip),%rbx # 1ecf <vmx_spec_ctrl_restore_host+0xf> 1ecb: R_X86_64_PC32 x86_spec_ctrl_current-0x4
000f 1ecf: e9 00 00 00 00 jmp 1ed4 <vmx_spec_ctrl_restore_host+0x14> 1ed0: R_X86_64_PLT32 .altinstr_aux+0x566
0014 1ed4: f3 0f 1e fa endbr64
0018 1ed8: 49 89 fe mov %rdi,%r14
001b 1edb: 40 f6 c6 02 test $0x2,%sil
001f 1edf: 74 47 je 1f28 <vmx_spec_ctrl_restore_host+0x68>
0021 1ee1: e8 00 00 00 00 call 1ee6 <vmx_spec_ctrl_restore_host+0x26> 1ee2: R_X86_64_PLT32 .text+0x1ff87c
0026 1ee6: 84 c0 test %al,%al
0028 1ee8: 74 29 je 1f13 <vmx_spec_ctrl_restore_host+0x53>
002a 1eea: 66 90 xchg %ax,%ax
002c 1eec: b8 4c 20 00 00 mov $0x204c,%eax
0031 1ef1: 0f 78 c0 vmread %rax,%rax
0034 1ef4: 0f 86 9d 00 00 00 jbe 1f97 <vmx_spec_ctrl_restore_host+0xd7>
003a 1efa: 49 8b 0e mov (%r14),%rcx
003d 1efd: 48 8b 91 60 a1 00 00 mov 0xa160(%rcx),%rdx
0044 1f04: 48 f7 d2 not %rdx
0047 1f07: 48 21 c2 and %rax,%rdx
004a 1f0a: 48 0b 91 68 a1 00 00 or 0xa168(%rcx),%rdx
0051 1f11: eb 0e jmp 1f21 <vmx_spec_ctrl_restore_host+0x61>
0053 1f13: b9 48 00 00 00 mov $0x48,%ecx
0058 1f18: 0f 32 rdmsr
005a 1f1a: 48 c1 e2 20 shl $0x20,%rdx
005e 1f1e: 48 09 c2 or %rax,%rdx
0061 1f21: 49 89 96 38 1f 00 00 mov %rdx,0x1f38(%r14)
0068 1f28: e9 00 00 00 00 jmp 1f2d <vmx_spec_ctrl_restore_host+0x6d> 1f29: R_X86_64_PLT32 .altinstr_aux+0x578
006d 1f2d: f3 0f 1e fa endbr64
0071 1f31: 48 89 da mov %rbx,%rdx
0074 1f34: 48 c1 ea 20 shr $0x20,%rdx
0078 1f38: b9 48 00 00 00 mov $0x48,%ecx
007d 1f3d: 89 d8 mov %ebx,%eax
007f 1f3f: 0f 30 wrmsr
0081 1f41: 90 nop
0082 1f42: 90 nop
0083 1f43: 90 nop
0084 1f44: f3 0f 1e fa endbr64
0088 1f48: 5b pop %rbx
0089 1f49: 41 5e pop %r14
008b 1f4b: 31 c0 xor %eax,%eax
008d 1f4d: 31 c9 xor %ecx,%ecx
008f 1f4f: 31 ff xor %edi,%edi
0091 1f51: 31 d2 xor %edx,%edx
0093 1f53: 31 f6 xor %esi,%esi
0095 1f55: 2e e9 00 00 00 00 cs jmp 1f5b <vmx_spec_ctrl_restore_host+0x9b> 1f57: R_X86_64_PLT32 __x86_return_thunk-0x4
009b 1f5b: f3 0f 1e fa endbr64
009f 1f5f: 49 39 9e 38 1f 00 00 cmp %rbx,0x1f38(%r14)
00a6 1f66: 74 d9 je 1f41 <vmx_spec_ctrl_restore_host+0x81>
00a8 1f68: eb c3 jmp 1f2d <vmx_spec_ctrl_restore_host+0x6d>
00aa 1f6a: f3 0f 1e fa endbr64
00ae 1f6e: 81 3d 00 00 00 00 09 13 00 00 cmpl $0x1309,0x0(%rip) # 1f78 <vmx_spec_ctrl_restore_host+0xb8> 1f70: R_X86_64_PC32 nr_evmcs_1_fields-0x8
00b8 1f78: 72 3f jb 1fb9 <vmx_spec_ctrl_restore_host+0xf9>
00ba 1f7a: 0f b7 05 00 00 00 00 movzwl 0x0(%rip),%eax # 1f81 <vmx_spec_ctrl_restore_host+0xc1> 1f7d: R_X86_64_PC32 vmcs_field_to_evmcs_1+0x4c1c
00c1 1f81: 48 85 c0 test %rax,%rax
00c4 1f84: 74 33 je 1fb9 <vmx_spec_ctrl_restore_host+0xf9>
00c6 1f86: 65 48 8b 0d 00 00 00 00 mov %gs:0x0(%rip),%rcx # 1f8e <vmx_spec_ctrl_restore_host+0xce> 1f8a: R_X86_64_PC32 current_vmcs-0x4
00ce 1f8e: 48 8b 04 01 mov (%rcx,%rax,1),%rax
00d2 1f92: e9 63 ff ff ff jmp 1efa <vmx_spec_ctrl_restore_host+0x3a>
00d7 1f97: f3 0f 1e fa endbr64
00db 1f9b: 90 nop
00dc 1f9c: bf 4c 20 00 00 mov $0x204c,%edi
00e1 1fa1: e8 00 00 00 00 call 1fa6 <vmx_spec_ctrl_restore_host+0xe6> 1fa2: R_X86_64_PLT32 vmread_error-0x4
00e6 1fa6: 90 nop
00e7 1fa7: eb 09 jmp 1fb2 <vmx_spec_ctrl_restore_host+0xf2>
00e9 1fa9: f3 0f 1e fa endbr64
00ed 1fad: e8 00 00 00 00 call 1fb2 <vmx_spec_ctrl_restore_host+0xf2> 1fae: R_X86_64_PLT32 kvm_spurious_fault-0x4
00f2 1fb2: 31 c0 xor %eax,%eax
00f4 1fb4: e9 41 ff ff ff jmp 1efa <vmx_spec_ctrl_restore_host+0x3a>
00f9 1fb9: 80 3d 00 00 00 00 00 cmpb $0x0,0x0(%rip) # 1fc0 <vmx_spec_ctrl_restore_host+0x100> 1fbb: R_X86_64_PC32 .data.once+0x88
0100 1fc0: 75 f0 jne 1fb2 <vmx_spec_ctrl_restore_host+0xf2>
0102 1fc2: c6 05 00 00 00 00 01 movb $0x1,0x0(%rip) # 1fc9 <vmx_spec_ctrl_restore_host+0x109> 1fc4: R_X86_64_PC32 .data.once+0x88
0109 1fc9: 90 nop
010a 1fca: be 4c 20 00 00 mov $0x204c,%esi
010f 1fcf: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 1fd2: R_X86_64_32S .rodata.str1.1+0xb59591
0116 1fd6: e8 00 00 00 00 call 1fdb <vmx_spec_ctrl_restore_host+0x11b> 1fd7: R_X86_64_PLT32 __warn_printk-0x4
011b 1fdb: 90 nop
011c 1fdc: 0f 0b ud2
011e 1fde: 90 nop
011f 1fdf: 90 nop
0120 1fe0: eb d0 jmp 1fb2 <vmx_spec_ctrl_restore_host+0xf2>
0122 1fe2: 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 data16 data16 data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
0131 1ff1: 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 data16 data16 data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
0140 2000: 90 nop
0141 2001: 90 nop
0142 2002: 90 nop
0143 2003: 90 nop
0144 2004: 90 nop
0145 2005: 90 nop
0146 2006: 90 nop
0147 2007: 90 nop
0148 2008: 90 nop
0149 2009: 90 nop
014a 200a: 90 nop
014b 200b: 90 nop
014c 200c: 90 nop
014d 200d: 90 nop
014e 200e: 90 nop
014f 200f: 90 nop
0150 2010: 90 nop
0151 2011: 90 nop
0152 2012: 90 nop
0153 2013: 90 nop
0154 2014: 90 nop
0155 2015: 90 nop
0156 2016: 90 nop
0157 2017: 90 nop
0158 2018: 90 nop
0159 2019: 90 nop
015a 201a: 90 nop
015b 201b: 90 nop
015c 201c: 90 nop
015d 201d: 90 nop
015e 201e: 90 nop
015f 201f: 90 nop
0160 2020: 90 nop
0161 2021: 90 nop
0162 2022: 90 nop
0163 2023: 90 nop
0164 2024: 90 nop
0165 2025: 90 nop
0166 2026: 90 nop
0167 2027: 90 nop
0168 2028: 90 nop
0169 2029: 90 nop
016a 202a: 90 nop
016b 202b: 90 nop
016c 202c: 90 nop
016d 202d: 90 nop
016e 202e: 90 nop
016f 202f: 90 nop
0170 2030: 90 nop
0171 2031: 90 nop
0172 2032: 90 nop
0173 2033: 90 nop
0174 2034: 90 nop
0175 2035: 90 nop
0176 2036: 90 nop
0177 2037: 90 nop
0178 2038: 90 nop
0179 2039: 90 nop
017a 203a: 90 nop
017b 203b: 90 nop
017c 203c: 90 nop
017d 203d: 90 nop
017e 203e: 90 nop
017f 203f: 90 nop
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC PATCH v3 01/10] KVM: VMX: Virtualize Intel IA32_SPEC_CTRL
2024-04-10 14:34 ` [RFC PATCH v3 01/10] KVM: VMX: " Chao Gao
@ 2024-04-12 4:07 ` Jim Mattson
2024-04-12 10:18 ` Chao Gao
0 siblings, 1 reply; 21+ messages in thread
From: Jim Mattson @ 2024-04-12 4:07 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta,
Sean Christopherson, Paolo Bonzini, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-doc
On Wed, Apr 10, 2024 at 7:35 AM Chao Gao <chao.gao@intel.com> wrote:
>
> From: Daniel Sneddon <daniel.sneddon@linux.intel.com>
>
> Currently KVM disables interception of IA32_SPEC_CTRL after the guest
> writes a non-zero value to it. The guest is then allowed to write any
> value directly to hardware. There is a tertiary control for
> IA32_SPEC_CTRL. This control allows for bits in IA32_SPEC_CTRL to be
> masked to prevent guests from changing those bits.
>
> Add controls to set the mask for IA32_SPEC_CTRL and the desired value
> for the masked bits.
>
> These new controls are especially helpful for protecting guests that
> don't know about BHI_DIS_S and that are running on hardware that
> supports it. This allows the hypervisor to set BHI_DIS_S to fully
> protect the guest.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> [ add a new ioctl to report supported bits. Fix the inverted check ]
> Signed-off-by: Chao Gao <chao.gao@intel.com>
This looks quite Intel-centric. Isn't this feature essentially the
same as AMD's V_SPEC_CTRL? Can't we consolidate the code, rather than
having completely independent implementations for AMD and Intel?
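The mask/value semantics in the commit message reduce to a single bit-select. The sketch below is a minimal model, not the patch's code. `effective_spec_ctrl()` is a hypothetical helper. It assumes the hardware takes masked bits from the hypervisor-chosen value and all other bits from the guest. The bit positions are copied from Linux's `msr-index.h`, where BHI_DIS_S is bit 10.

```c
#include <stdint.h>

/* Bit positions as in Linux's arch/x86/include/asm/msr-index.h. */
#define SPEC_CTRL_IBRS      (1ULL << 0)
#define SPEC_CTRL_STIBP     (1ULL << 1)
#define SPEC_CTRL_BHI_DIS_S (1ULL << 10)

/*
 * Hypothetical helper: the value in the hardware IA32_SPEC_CTRL while the
 * guest runs. Bits set in force_mask are owned by the hypervisor and take
 * force_value; all other bits are whatever the guest last wrote.
 */
static uint64_t effective_spec_ctrl(uint64_t guest_written,
				    uint64_t force_mask,
				    uint64_t force_value)
{
	return (guest_written & ~force_mask) | (force_value & force_mask);
}
```

Under this model, a guest that knows nothing about BHI_DIS_S still runs with that bit set once the hypervisor puts it in both the mask and the value. The same bit-select can also force a bit off, the capability Chao questions later in the thread. The guest-visible value would presumably be kept separately, in the IA32_SPEC_CTRL shadow that patch 02 caches.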
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC PATCH v3 07/10] KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support
2024-04-10 14:34 ` [RFC PATCH v3 07/10] KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support Chao Gao
@ 2024-04-12 4:22 ` Jim Mattson
0 siblings, 0 replies; 21+ messages in thread
From: Jim Mattson @ 2024-04-12 4:22 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta, Zhang Chen,
Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
On Wed, Apr 10, 2024 at 8:08 AM Chao Gao <chao.gao@intel.com> wrote:
>
> From: Zhang Chen <chen.zhang@intel.com>
>
> Bit 63 of the IA32_ARCH_CAPABILITIES MSR indicates availability of the
> VIRTUAL_ENUMERATION_MSR (index 0x50000000), which enumerates features
> such as mitigation enumeration, which in turn is used by the guest to
> report the software mitigations it is using.
>
> Advertise ARCH_CAP_VIRTUAL_ENUM support for VMX and emulate read/write
> of the VIRTUAL_ENUMERATION_MSR. For now, VIRTUAL_ENUMERATION_MSR is always 0.
>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> Co-developed-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> arch/x86/kvm/svm/svm.c | 1 +
> arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++++++
> arch/x86/kvm/vmx/vmx.h | 2 ++
> arch/x86/kvm/x86.c | 16 +++++++++++++++-
> 4 files changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d1a9f9951635..e3406971a8b7 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4288,6 +4288,7 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
> {
> switch (index) {
> case MSR_IA32_MCG_EXT_CTL:
> + case MSR_VIRTUAL_ENUMERATION:
> case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
> return false;
> case MSR_IA32_SMBASE:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index cdfcc1290d82..dcb06406fd09 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1955,6 +1955,8 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
> return !(msr->data & ~valid_bits);
> }
>
> +#define VIRTUAL_ENUMERATION_VALID_BITS 0ULL
> +
> static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> {
> switch (msr->index) {
> @@ -1962,6 +1964,9 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> if (!nested)
> return 1;
> return vmx_get_vmx_msr(&vmcs_config.nested, msr->index, &msr->data);
> + case MSR_VIRTUAL_ENUMERATION:
> + msr->data = VIRTUAL_ENUMERATION_VALID_BITS;
> + return 0;
> default:
> return KVM_MSR_RET_INVALID;
> }
> @@ -2113,6 +2118,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_IA32_DEBUGCTLMSR:
> msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> break;
> + case MSR_VIRTUAL_ENUMERATION:
> + if (!msr_info->host_initiated &&
> + !(vcpu->arch.arch_capabilities & ARCH_CAP_VIRTUAL_ENUM))
> + return 1;
> + msr_info->data = vmx->msr_virtual_enumeration;
> + break;
> default:
> find_uret_msr:
> msr = vmx_find_uret_msr(vmx, msr_info->index);
> @@ -2457,6 +2468,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> }
> ret = kvm_set_msr_common(vcpu, msr_info);
> break;
> + case MSR_VIRTUAL_ENUMERATION:
> + if (!msr_info->host_initiated)
> + return 1;
> + if (data & ~VIRTUAL_ENUMERATION_VALID_BITS)
> + return 1;
> +
> + vmx->msr_virtual_enumeration = data;
> + break;
>
> default:
> find_uret_msr:
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index a4dfe538e5a8..0519cf6187ac 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -294,6 +294,8 @@ struct vcpu_vmx {
> u64 force_spec_ctrl_mask;
> u64 force_spec_ctrl_value;
>
> + u64 msr_virtual_enumeration;
> +
> u32 msr_ia32_umwait_control;
>
> /*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9a59b5a93d0e..4721b6fe7641 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1564,6 +1564,7 @@ static const u32 emulated_msrs_all[] = {
>
> MSR_K7_HWCR,
> MSR_KVM_POLL_CONTROL,
> + MSR_VIRTUAL_ENUMERATION,
> };
>
> static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
> @@ -1579,6 +1580,7 @@ static const u32 msr_based_features_all_except_vmx[] = {
> MSR_IA32_UCODE_REV,
> MSR_IA32_ARCH_CAPABILITIES,
> MSR_IA32_PERF_CAPABILITIES,
> + MSR_VIRTUAL_ENUMERATION,
> };
>
> static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
> @@ -1621,7 +1623,8 @@ static bool kvm_is_immutable_feature_msr(u32 msr)
> ARCH_CAP_PSCHANGE_MC_NO | ARCH_CAP_TSX_CTRL_MSR | ARCH_CAP_TAA_NO | \
> ARCH_CAP_SBDR_SSDP_NO | ARCH_CAP_FBSDP_NO | ARCH_CAP_PSDP_NO | \
> ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO | \
> - ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR | ARCH_CAP_BHI_NO)
> + ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR | ARCH_CAP_BHI_NO | \
> + ARCH_CAP_VIRTUAL_ENUM)
>
> static u64 kvm_get_arch_capabilities(void)
> {
> @@ -1635,6 +1638,17 @@ static u64 kvm_get_arch_capabilities(void)
> */
> data |= ARCH_CAP_PSCHANGE_MC_NO;
>
> + /*
> + * Virtual enumeration is a paravirt feature. The only usage for now
> + * is to bridge the gap caused by microarchitecture changes between
> + * different Intel processors. And its usage is linked to "virtualize
> + * IA32_SPEC_CTRL" which is a VMX feature. Whether AMD SVM can benefit
> + * from the same usage and how to implement it is still unclear. Limit
> + * virtual enumeration to VMX.
> + */
Virtualize IA32_SPEC_CTRL has been an SVM feature for years. See
https://lore.kernel.org/kvm/160738054169.28590.5171339079028237631.stgit@bmoger-ubuntu/.
> + if (static_call(kvm_x86_has_emulated_msr)(NULL, MSR_VIRTUAL_ENUMERATION))
> + data |= ARCH_CAP_VIRTUAL_ENUM;
> +
> /*
> * If we're doing cache flushes (either "always" or "cond")
> * we will do one whenever the guest does a vmlaunch/vmresume.
> --
> 2.39.3
>
>
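The access rules that patch 07 adds to vmx_get_msr() and vmx_set_msr() can be lifted into a small model for testing. This is a sketch, not KVM code. `struct vcpu_model`, `get_virtual_enum()` and `set_virtual_enum()` are hypothetical. The return convention is assumed to mirror KVM's: 0 means success, and 1 means the access failed, which KVM turns into a #GP for guest-initiated accesses.

```c
#include <stdbool.h>
#include <stdint.h>

#define ARCH_CAP_VIRTUAL_ENUM          (1ULL << 63) /* bit 63, per the commit message */
#define VIRTUAL_ENUMERATION_VALID_BITS 0ULL         /* no bits defined yet in patch 07 */

struct vcpu_model {                 /* hypothetical stand-in for vCPU state */
	uint64_t arch_capabilities;     /* guest-visible IA32_ARCH_CAPABILITIES */
	uint64_t msr_virtual_enumeration;
};

/* Guest reads fault unless ARCH_CAP_VIRTUAL_ENUM is exposed; host reads always work. */
static int get_virtual_enum(const struct vcpu_model *v, bool host_initiated,
			    uint64_t *data)
{
	if (!host_initiated && !(v->arch_capabilities & ARCH_CAP_VIRTUAL_ENUM))
		return 1;
	*data = v->msr_virtual_enumeration;
	return 0;
}

/* The MSR is read-only for the guest; the host may only set defined bits. */
static int set_virtual_enum(struct vcpu_model *v, bool host_initiated,
			    uint64_t data)
{
	if (!host_initiated)
		return 1;
	if (data & ~VIRTUAL_ENUMERATION_VALID_BITS)
		return 1;
	v->msr_virtual_enumeration = data;
	return 0;
}
```

Because VIRTUAL_ENUMERATION_VALID_BITS is 0 in this patch, the only value userspace can store is 0, consistent with the commit message.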
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC PATCH v3 01/10] KVM: VMX: Virtualize Intel IA32_SPEC_CTRL
2024-04-12 4:07 ` Jim Mattson
@ 2024-04-12 10:18 ` Chao Gao
2024-06-03 23:55 ` Sean Christopherson
0 siblings, 1 reply; 21+ messages in thread
From: Chao Gao @ 2024-04-12 10:18 UTC (permalink / raw)
To: Jim Mattson
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta,
Sean Christopherson, Paolo Bonzini, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-doc
On Thu, Apr 11, 2024 at 09:07:31PM -0700, Jim Mattson wrote:
>On Wed, Apr 10, 2024 at 7:35 AM Chao Gao <chao.gao@intel.com> wrote:
>>
>> From: Daniel Sneddon <daniel.sneddon@linux.intel.com>
>>
>> Currently KVM disables interception of IA32_SPEC_CTRL after the guest
>> writes a non-zero value to it. The guest is then allowed to write any
>> value directly to hardware. There is a tertiary control for
>> IA32_SPEC_CTRL. This control allows for bits in IA32_SPEC_CTRL to be
>> masked to prevent guests from changing those bits.
>>
>> Add controls to set the mask for IA32_SPEC_CTRL and the desired value
>> for the masked bits.
>>
>> These new controls are especially helpful for protecting guests that
>> don't know about BHI_DIS_S and that are running on hardware that
>> supports it. This allows the hypervisor to set BHI_DIS_S to fully
>> protect the guest.
>>
>> Suggested-by: Sean Christopherson <seanjc@google.com>
>> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
>> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
>> [ add a new ioctl to report supported bits. Fix the inverted check ]
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>
>This looks quite Intel-centric. Isn't this feature essentially the
>same as AMD's V_SPEC_CTRL?
Yes, they are almost the same. One small difference is that Intel's version can
force some bits off, though I don't see how forcing bits off can be useful.
>Can't we consolidate the code, rather than
>having completely independent implementations for AMD and Intel?
We surely can consolidate the code. I will do this.
I have a question about V_SPEC_CTRL. With V_SPEC_CTRL, the SPEC_CTRL MSR retains
the host's value on VM-enter:
.macro RESTORE_GUEST_SPEC_CTRL
/* No need to do anything if SPEC_CTRL is unset or V_SPEC_CTRL is set */
ALTERNATIVE_2 "", \
"jmp 800f", X86_FEATURE_MSR_SPEC_CTRL, \
"", X86_FEATURE_V_SPEC_CTRL
Does this mean all mitigations used by the host will be enabled for the guest
and guests cannot disable them?
Is this intentional? It looks suboptimal. Why not set the SPEC_CTRL value to 0 and
let the guest decide which features to enable? On the VMX side, we need the host to
apply certain hardware mitigations (i.e., BHI_DIS_S and RRSBA_DIS_S) for the guest
because BHI's software mitigation may be ineffective. I am not sure why SVM is
enabling all mitigations used by the host for guests. Wouldn't it be better to
enable them on an as-needed basis?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC PATCH v3 01/10] KVM: VMX: Virtualize Intel IA32_SPEC_CTRL
2024-04-12 10:18 ` Chao Gao
@ 2024-06-03 23:55 ` Sean Christopherson
0 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2024-06-03 23:55 UTC (permalink / raw)
To: Chao Gao
Cc: Jim Mattson, kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta,
Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, linux-doc
On Fri, Apr 12, 2024, Chao Gao wrote:
> On Thu, Apr 11, 2024 at 09:07:31PM -0700, Jim Mattson wrote:
> >On Wed, Apr 10, 2024 at 7:35 AM Chao Gao <chao.gao@intel.com> wrote:
> >>
> >> From: Daniel Sneddon <daniel.sneddon@linux.intel.com>
> >>
> >> Currently KVM disables interception of IA32_SPEC_CTRL after the guest
> >> writes a non-zero value to it. The guest is then allowed to write any
> >> value directly to hardware. There is a tertiary control for
> >> IA32_SPEC_CTRL. This control allows for bits in IA32_SPEC_CTRL to be
> >> masked to prevent guests from changing those bits.
> >>
> >> Add controls to set the mask for IA32_SPEC_CTRL and the desired value
> >> for the masked bits.
> >>
> >> These new controls are especially helpful for protecting guests that
> >> don't know about BHI_DIS_S and that are running on hardware that
> >> supports it. This allows the hypervisor to set BHI_DIS_S to fully
> >> protect the guest.
> >>
> >> Suggested-by: Sean Christopherson <seanjc@google.com>
> >> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
> >> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> >> [ add a new ioctl to report supported bits. Fix the inverted check ]
> >> Signed-off-by: Chao Gao <chao.gao@intel.com>
> >
> >This looks quite Intel-centric. Isn't this feature essentially the
> >same as AMD's V_SPEC_CTRL?
In spirit, yes. In practice, not really. The implementations required for each
end up being quite different. I think the only bit of code that could be reused
by SVM, and isn't already, is the generation of supported_force_spec_ctrl.
+ kvm_caps.supported_force_spec_ctrl = 0;
+
+ if (cpu_has_spec_ctrl_shadow()) {
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_IBRS;
+
+ if (boot_cpu_has(X86_FEATURE_STIBP))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_STIBP;
+
+ if (boot_cpu_has(X86_FEATURE_SSBD))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_SSBD;
+
+ if (boot_cpu_has(X86_FEATURE_RRSBA_CTRL) &&
+ (host_arch_capabilities & ARCH_CAP_RRSBA))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_RRSBA_DIS_S;
+
+ if (boot_cpu_has(X86_FEATURE_BHI_CTRL))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_BHI_DIS_S;
+ }
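For reference, the snippet above can be lifted out of KVM into a standalone function that takes the host feature checks as inputs. This is a minimal sketch. The hypothetical `struct host_feats` stands in for `cpu_has_spec_ctrl_shadow()`, `boot_cpu_has()` and `host_arch_capabilities`. The SPEC_CTRL_* bit positions are taken from Linux's `msr-index.h` and should be checked against the tree in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as in Linux's arch/x86/include/asm/msr-index.h. */
#define SPEC_CTRL_IBRS        (1ULL << 0)
#define SPEC_CTRL_STIBP       (1ULL << 1)
#define SPEC_CTRL_SSBD        (1ULL << 2)
#define SPEC_CTRL_RRSBA_DIS_S (1ULL << 6)
#define SPEC_CTRL_BHI_DIS_S   (1ULL << 10)

struct host_feats {            /* hypothetical summary of the checks above */
	bool spec_ctrl_shadow;     /* cpu_has_spec_ctrl_shadow() */
	bool stibp;                /* X86_FEATURE_STIBP */
	bool ssbd;                 /* X86_FEATURE_SSBD */
	bool rrsba_ctrl;           /* X86_FEATURE_RRSBA_CTRL */
	bool arch_cap_rrsba;       /* ARCH_CAP_RRSBA in host_arch_capabilities */
	bool bhi_ctrl;             /* X86_FEATURE_BHI_CTRL */
};

static uint64_t supported_force_spec_ctrl(const struct host_feats *f)
{
	uint64_t bits = 0;

	/* Without "virtualize IA32_SPEC_CTRL", nothing can be forced. */
	if (!f->spec_ctrl_shadow)
		return 0;

	bits |= SPEC_CTRL_IBRS;
	if (f->stibp)
		bits |= SPEC_CTRL_STIBP;
	if (f->ssbd)
		bits |= SPEC_CTRL_SSBD;
	/* Only offered when the control exists and the part exhibits RRSBA. */
	if (f->rrsba_ctrl && f->arch_cap_rrsba)
		bits |= SPEC_CTRL_RRSBA_DIS_S;
	if (f->bhi_ctrl)
		bits |= SPEC_CTRL_BHI_DIS_S;
	return bits;
}
```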
> Yes, they are almost the same. One small difference is that Intel's version can
> force some bits off, though I don't see how forcing bits off can be useful.
Another not-so-small difference is that Intel's version can also force bits *on*,
and force them on only for the guest with minimal overhead.
> >Can't we consolidate the code, rather than
> >having completely independent implementations for AMD and Intel?
>
> We surely can consolidate the code. I will do this.
>
> I have a question about V_SPEC_CTRL. With V_SPEC_CTRL, the SPEC_CTRL MSR retains
> the host's value on VM-enter:
>
> .macro RESTORE_GUEST_SPEC_CTRL
> /* No need to do anything if SPEC_CTRL is unset or V_SPEC_CTRL is set */
> ALTERNATIVE_2 "", \
> "jmp 800f", X86_FEATURE_MSR_SPEC_CTRL, \
> "", X86_FEATURE_V_SPEC_CTRL
>
> Does this mean all mitigations used by the host will be enabled for the guest
> and guests cannot disable them?
Yes.
> Is this intentional? It looks suboptimal. Why not set the SPEC_CTRL value to 0 and
> let the guest decide which features to enable? On the VMX side, we need the host to
> apply certain hardware mitigations (i.e., BHI_DIS_S and RRSBA_DIS_S) for the guest
> because BHI's software mitigation may be ineffective. I am not sure why SVM is
> enabling all mitigations used by the host for guests. Wouldn't it be better to
> enable them on an as-needed basis?
AMD's V_SPEC_CTRL doesn't provide a fast context switch of SPEC_CTRL; it performs
a bitwise-OR of the host and guest values. So to load a subset (or superset) of
the host protections, KVM would need to do an extra WRMSR before VMRUN, and again
after VMRUN.
That said, I have no idea whether or not avoiding WRMSR on AMD is optimal.
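Sean's description of V_SPEC_CTRL can be modelled in a few lines, together with the ALTERNATIVE_2 logic Chao quotes. This is a simplified sketch. It assumes the value in effect while the guest runs is exactly `MSR | guest`. `struct spec_ctrl_run`, `svm_vmrun_spec_ctrl()` and `svm_manual_spec_ctrl_restore()` are hypothetical names. The counter only tallies the extra WRMSRs Sean mentions.

```c
#include <stdbool.h>
#include <stdint.h>

struct spec_ctrl_run {          /* hypothetical result of one VMRUN */
	uint64_t effective;         /* SPEC_CTRL in force while the guest runs */
	int wrmsrs;                 /* extra MSR writes KVM had to issue */
};

/*
 * V_SPEC_CTRL ORs the guest's value into whatever the MSR holds at VMRUN.
 * By default that is the host value, so the guest inherits every host
 * mitigation. Using a different host-side value for the guest costs one
 * WRMSR before VMRUN and another to restore the host value afterwards.
 */
static struct spec_ctrl_run svm_vmrun_spec_ctrl(uint64_t host, uint64_t guest,
						bool override,
						uint64_t host_for_guest)
{
	struct spec_ctrl_run r = { 0, 0 };
	uint64_t msr = host;

	if (override && host_for_guest != host) {
		msr = host_for_guest;   /* WRMSR before VMRUN */
		r.wrmsrs++;
	}
	r.effective = msr | guest;
	if (msr != host)
		r.wrmsrs++;             /* WRMSR after #VMEXIT to restore the host */
	return r;
}

/*
 * RESTORE_GUEST_SPEC_CTRL's ALTERNATIVE_2: the software restore path (the
 * jump to label 800) is patched in only when the CPU has SPEC_CTRL but
 * lacks V_SPEC_CTRL; otherwise the macro does nothing.
 */
static bool svm_manual_spec_ctrl_restore(bool msr_spec_ctrl, bool v_spec_ctrl)
{
	return msr_spec_ctrl && !v_spec_ctrl;
}
```

For example, with host = 0x5 and guest = 0, the default path yields 0x5 with no extra writes. Overriding the host-side value to 0 yields 0, at the cost of two WRMSRs. With OR semantics the hypervisor also cannot clear a bit the guest sets, so under this model forcing bits off is only possible with Intel's mask.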
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
2024-04-10 14:34 ` [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT Chao Gao
@ 2024-06-11 1:34 ` Sean Christopherson
2024-06-11 10:48 ` Chao Gao
0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2024-06-11 1:34 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta, Zhang Chen,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
On Wed, Apr 10, 2024, Chao Gao wrote:
> From: Zhang Chen <chen.zhang@intel.com>
>
> Allow the guest to report whether the short BHB-clearing sequence is in use.
>
> KVM will deploy BHI_DIS_S for the guest if the short BHB-clearing
> sequence is in use and the processor doesn't enumerate BHI_NO.
>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> arch/x86/kvm/vmx/vmx.c | 31 ++++++++++++++++++++++++++++---
> 1 file changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index cc260b14f8df..c5ceaebd954b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1956,8 +1956,8 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
> }
>
> #define VIRTUAL_ENUMERATION_VALID_BITS VIRT_ENUM_MITIGATION_CTRL_SUPPORT
> -#define MITI_ENUM_VALID_BITS 0ULL
> -#define MITI_CTRL_VALID_BITS 0ULL
> +#define MITI_ENUM_VALID_BITS MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT
> +#define MITI_CTRL_VALID_BITS MITI_CTRL_BHB_CLEAR_SEQ_S_USED
>
> static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> {
> @@ -2204,7 +2204,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> struct vmx_uret_msr *msr;
> int ret = 0;
> u32 msr_index = msr_info->index;
> - u64 data = msr_info->data;
> + u64 data = msr_info->data, spec_ctrl_mask = 0;
> u32 index;
>
> switch (msr_index) {
> @@ -2508,6 +2508,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (data & ~MITI_CTRL_VALID_BITS)
> return 1;
>
> + if (data & MITI_CTRL_BHB_CLEAR_SEQ_S_USED &&
> + kvm_cpu_cap_has(X86_FEATURE_BHI_CTRL) &&
> + !(host_arch_capabilities & ARCH_CAP_BHI_NO))
> + spec_ctrl_mask |= SPEC_CTRL_BHI_DIS_S;
> +
> + /*
> + * Intercept IA32_SPEC_CTRL to disallow guest from changing
> + * certain bits if "virtualize IA32_SPEC_CTRL" isn't supported
> + * e.g., in nested case.
> + */
> + if (spec_ctrl_mask && !cpu_has_spec_ctrl_shadow())
> + vmx_enable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
> +
> + /*
> + * KVM_CAP_FORCE_SPEC_CTRL takes precedence over
> + * MSR_VIRTUAL_MITIGATION_CTRL.
> + */
> + spec_ctrl_mask &= ~vmx->vcpu.kvm->arch.force_spec_ctrl_mask;
> +
> + vmx->force_spec_ctrl_mask = vmx->vcpu.kvm->arch.force_spec_ctrl_mask |
> + spec_ctrl_mask;
> + vmx->force_spec_ctrl_value = vmx->vcpu.kvm->arch.force_spec_ctrl_value |
> + spec_ctrl_mask;
> + vmx_set_spec_ctrl(&vmx->vcpu, vmx->spec_ctrl_shadow);
> +
> vmx->msr_virtual_mitigation_ctrl = data;
> break;
I continue to find all of this unpalatable. The guest tells KVM what software
mitigations the guest is using, and then KVM is supposed to translate that into
some hardware functionality? And merge that with userspace's own overrides?
Blech.
With KVM_CAP_FORCE_SPEC_CTRL, I don't see any reason for KVM to support the
Intel-defined virtual MSRs. If the userspace VMM wants to play nice with the
Intel-defined stuff, then userspace can advertise the MSRs and use an MSR filter
to intercept and "emulate" the MSRs. They should be set-and-forget MSRs, so
there's no need for KVM to handle them for performance reasons.
That way KVM doesn't need to deal with the virtual MSRs, userspace can make
an informed decision when deciding how to set KVM_CAP_FORCE_SPEC_CTRL, and as a
bonus, rollouts for new mitigation thingies should be faster as updating userspace
is typically easier than updating the kernel/KVM.
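As a pure function, the policy in the hunk Sean objects to is small, which shows why it could equally live in the VMM. This is a sketch, not the patch. `struct force_spec_ctrl`, `merge_force()` and `bhi_request()` are hypothetical names. The BHI_DIS_S bit position is taken from `msr-index.h`. The merge reproduces the three statements in the patch that compute `force_spec_ctrl_mask` and `force_spec_ctrl_value`.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPEC_CTRL_BHI_DIS_S (1ULL << 10)   /* as in msr-index.h */

struct force_spec_ctrl {
	uint64_t mask;
	uint64_t value;
};

/*
 * Bits requested on the guest's behalf are dropped if userspace already
 * controls them through the VM-wide KVM_CAP_FORCE_SPEC_CTRL mask (so
 * userspace takes precedence); the remaining bits are forced on.
 */
static struct force_spec_ctrl merge_force(struct force_spec_ctrl vm,
					  uint64_t guest_requested)
{
	struct force_spec_ctrl out;

	guest_requested &= ~vm.mask;
	out.mask = vm.mask | guest_requested;
	out.value = vm.value | guest_requested;
	return out;
}

/* The patch's condition for deploying BHI_DIS_S on the guest's behalf. */
static uint64_t bhi_request(bool short_bhb_seq_used, bool has_bhi_ctrl,
			    bool host_bhi_no)
{
	return (short_bhb_seq_used && has_bhi_ctrl && !host_bhi_no) ?
	       SPEC_CTRL_BHI_DIS_S : 0;
}
```

With no VM-wide override, a guest reporting the short BHB-clearing sequence gets BHI_DIS_S forced on. If userspace has already put BHI_DIS_S in the VM-wide mask with a value of 0, the guest's request is ignored and the bit stays off.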
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
2024-06-11 1:34 ` Sean Christopherson
@ 2024-06-11 10:48 ` Chao Gao
2024-06-11 13:34 ` Sean Christopherson
0 siblings, 1 reply; 21+ messages in thread
From: Chao Gao @ 2024-06-11 10:48 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta, Zhang Chen,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
>> + if (data & MITI_CTRL_BHB_CLEAR_SEQ_S_USED &&
>> + kvm_cpu_cap_has(X86_FEATURE_BHI_CTRL) &&
>> + !(host_arch_capabilities & ARCH_CAP_BHI_NO))
>> + spec_ctrl_mask |= SPEC_CTRL_BHI_DIS_S;
>> +
>> + /*
>> + * Intercept IA32_SPEC_CTRL to disallow guest from changing
>> + * certain bits if "virtualize IA32_SPEC_CTRL" isn't supported
>> + * e.g., in nested case.
>> + */
>> + if (spec_ctrl_mask && !cpu_has_spec_ctrl_shadow())
>> + vmx_enable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
>> +
>> + /*
>> + * KVM_CAP_FORCE_SPEC_CTRL takes precedence over
>> + * MSR_VIRTUAL_MITIGATION_CTRL.
>> + */
>> + spec_ctrl_mask &= ~vmx->vcpu.kvm->arch.force_spec_ctrl_mask;
>> +
>> + vmx->force_spec_ctrl_mask = vmx->vcpu.kvm->arch.force_spec_ctrl_mask |
>> + spec_ctrl_mask;
>> + vmx->force_spec_ctrl_value = vmx->vcpu.kvm->arch.force_spec_ctrl_value |
>> + spec_ctrl_mask;
>> + vmx_set_spec_ctrl(&vmx->vcpu, vmx->spec_ctrl_shadow);
>> +
>> vmx->msr_virtual_mitigation_ctrl = data;
>> break;
>
>I continue to find all of this unpalatable. The guest tells KVM what software
>mitigations the guest is using, and then KVM is supposed to translate that into
>some hardware functionality? And merge that with userspace's own overrides?
Yes. It is ugly. I will drop all Intel-defined stuff from KVM. Actually, I
wanted to punt to userspace ...
>
>Blech.
>
>With KVM_CAP_FORCE_SPEC_CTRL, I don't see any reason for KVM to support the
>Intel-defined virtual MSRs. If the userspace VMM wants to play nice with the
>Intel-defined stuff, then userspace can advertise the MSRs and use an MSR filter
>to intercept and "emulate" the MSRs. They should be set-and-forget MSRs, so
>there's no need for KVM to handle them for performance reasons.
... I had this idea of implementing policy-related stuff in userspace, and I wrote
in the cover-letter:
"""
1. the KVM<->userspace ABI defined in patch 1
I am wondering if we can allow userspace to configure the mask
and the shadow value during the guest's lifetime, and do so on a per-vCPU basis.
This way, in conjunction with "virtual MSRs" or any other interface,
userspace can adjust the hardware mitigations applied to the guest during
its lifetime, e.g., for the best performance.
"""
As said, this requires some tweaks to KVM_CAP_FORCE_SPEC_CTRL, such as making
the mask and shadow values adjustable and applicable on a per-vCPU basis. The
tweaks are not necessarily for Intel-defined virtual MSRs; if there were other
preferable interfaces, they could also benefit from these changes.
Any objections to these tweaks to KVM_CAP_FORCE_SPEC_CTRL?
>
>That way KVM doesn't need to deal with the virtual MSRs, userspace can make
>an informed decision when deciding how to set KVM_CAP_FORCE_SPEC_CTRL, and as a
>bonus, rollouts for new mitigation thingies should be faster as updating userspace
>is typically easier than updating the kernel/KVM.
Good point!
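The userspace alternative Sean describes would start with an MSR filter covering the virtual MSRs. This is a minimal sketch. It assumes an x86 host with `<linux/kvm.h>` from a 5.10 or newer kernel. The structure and flag names are real KVM uAPI, but `build_virt_msr_filter()` is hypothetical. The indices 0x50000001 and 0x50000002 (for the mitigation enumeration and control MSRs) are an assumption; only 0x50000000 appears in this thread.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>

/*
 * 0x50000000 (VIRTUAL_ENUMERATION) is from patch 07; the next two indices
 * are ASSUMED to be the mitigation enumeration and control MSRs.
 */
#define VIRT_MSR_BASE  0x50000000u
#define VIRT_MSR_COUNT 3u

/*
 * One bitmap bit per MSR; a clear bit denies the access in KVM. With
 * KVM_CAP_X86_USER_SPACE_MSR enabled for KVM_MSR_EXIT_REASON_FILTER, a
 * denied access exits to userspace instead of raising #GP in the guest.
 */
static uint8_t virt_msr_bitmap[(VIRT_MSR_COUNT + 7) / 8];

static void build_virt_msr_filter(struct kvm_msr_filter *f)
{
	memset(f, 0, sizeof(*f));
	memset(virt_msr_bitmap, 0, sizeof(virt_msr_bitmap));

	f->flags = KVM_MSR_FILTER_DEFAULT_ALLOW;  /* other MSRs unaffected */
	f->ranges[0].flags = KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE;
	f->ranges[0].base = VIRT_MSR_BASE;
	f->ranges[0].nmsrs = VIRT_MSR_COUNT;
	f->ranges[0].bitmap = virt_msr_bitmap;
}
```

A VMM would then enable `KVM_CAP_X86_USER_SPACE_MSR` with `KVM_MSR_EXIT_REASON_FILTER` and install the filter with `ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter)`. It would then service the resulting `KVM_EXIT_X86_RDMSR`/`KVM_EXIT_X86_WRMSR` exits and translate guest writes into whatever KVM_CAP_FORCE_SPEC_CTRL settings it chooses. Since these are set-and-forget MSRs, the exit cost should not matter.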
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
2024-06-11 10:48 ` Chao Gao
@ 2024-06-11 13:34 ` Sean Christopherson
2024-06-11 14:08 ` Chao Gao
0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2024-06-11 13:34 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta, Zhang Chen,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
On Tue, Jun 11, 2024, Chao Gao wrote:
> >I continue to find all of this unpalatable. The guest tells KVM what software
> >mitigations the guest is using, and then KVM is supposed to translate that into
> >some hardware functionality? And merge that with userspace's own overrides?
>
> Yes. It is ugly. I will drop all Intel-defined stuff from KVM. Actually, I
> wanted to punt to userspace ...
>
> >
> >Blech.
> >
> >With KVM_CAP_FORCE_SPEC_CTRL, I don't see any reason for KVM to support the
> >Intel-defined virtual MSRs. If the userspace VMM wants to play nice with the
> >Intel-defined stuff, then userspace can advertise the MSRs and use an MSR filter
> >to intercept and "emulate" the MSRs. They should be set-and-forget MSRs, so
> >there's no need for KVM to handle them for performance reasons.
>
> ... I had this idea of implementing policy-related stuff in userspace, and I wrote
> in the cover-letter:
>
> """
> 1. the KVM<->userspace ABI defined in patch 1
>
> I am wondering if we can allow userspace to configure the mask
> and the shadow value during the guest's lifetime, and do so on a per-vCPU basis.
> This way, in conjunction with "virtual MSRs" or any other interface,
> userspace can adjust the hardware mitigations applied to the guest during
> its lifetime, e.g., for the best performance.
> """
Gah, sorry, I speed read the cover letter and didn't take the time to process that.
> As said, this requires some tweaks to KVM_CAP_FORCE_SPEC_CTRL, such as making
> the mask and shadow values adjustable and applicable on a per-vCPU basis. The
> tweaks are not necessarily for Intel-defined virtual MSRs; if there were other
> preferable interfaces, they could also benefit from these changes.
>
> Any objections to these tweaks to KVM_CAP_FORCE_SPEC_CTRL?
Why does KVM_CAP_FORCE_SPEC_CTRL need to be per-vCPU? Won't the CPU bugs and
mitigations be system-wide / VM-wide?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
2024-06-11 13:34 ` Sean Christopherson
@ 2024-06-11 14:08 ` Chao Gao
2024-06-11 16:32 ` Sean Christopherson
0 siblings, 1 reply; 21+ messages in thread
From: Chao Gao @ 2024-06-11 14:08 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta, Zhang Chen,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
On Tue, Jun 11, 2024 at 06:34:49AM -0700, Sean Christopherson wrote:
>On Tue, Jun 11, 2024, Chao Gao wrote:
>> >I continue to find all of this unpalatable. The guest tells KVM what software
>> >mitigations the guest is using, and then KVM is supposed to translate that into
>> >some hardware functionality? And merge that with userspace's own overrides?
>>
>> Yes. It is ugly. I will drop all Intel-defined stuff from KVM. Actually, I
>> wanted to punt to userspace ...
>>
>> >
>> >Blech.
>> >
>> >With KVM_CAP_FORCE_SPEC_CTRL, I don't see any reason for KVM to support the
>> >Intel-defined virtual MSRs. If the userspace VMM wants to play nice with the
>> >Intel-defined stuff, then userspace can advertise the MSRs and use an MSR filter
>> >to intercept and "emulate" the MSRs. They should be set-and-forget MSRs, so
>> >there's no need for KVM to handle them for performance reasons.
>>
>> ... I had this idea of implementing policy-related stuff in userspace, and I wrote
>> in the cover-letter:
>>
>> """
>> 1. the KVM<->userspace ABI defined in patch 1
>>
>> I am wondering if we can allow userspace to configure the mask
>> and the shadow value during the guest's lifetime, and do so on a per-vCPU basis.
>> This way, in conjunction with "virtual MSRs" or any other interface,
>> userspace can adjust the hardware mitigations applied to the guest during
>> its lifetime, e.g., for the best performance.
>> """
>
>Gah, sorry, I speed read the cover letter and didn't take the time to process that.
>
>> As said, this requires some tweaks to KVM_CAP_FORCE_SPEC_CTRL, such as making
>> the mask and shadow values adjustable and applicable on a per-vCPU basis. The
>> tweaks are not necessarily for Intel-defined virtual MSRs; if there were other
>> preferable interfaces, they could also benefit from these changes.
>>
>> Any objections to these tweaks to KVM_CAP_FORCE_SPEC_CTRL?
>
>Why does KVM_CAP_FORCE_SPEC_CTRL need to be per-vCPU? Won't the CPU bugs and
>mitigations be system-wide / VM-wide?
Because spec_ctrl is per-vCPU and Intel-defined virtual MSRs are also per-vCPU.
That is, a guest __can__ configure different values for the virtual MSRs on different
vCPUs, even though a sane guest won't do this. If KVM doesn't want to rule out
the possibility of supporting Intel-defined virtual MSRs in userspace, or any
other per-vCPU interface, KVM_CAP_FORCE_SPEC_CTRL needs to be per-vCPU.
Implementation-wise, being per-vCPU is simpler because otherwise, once userspace
adjusts the hardware mitigations to enforce, KVM needs to kick all vCPUs, which
adds more complexity.
And IMO, requiring guests to deploy the same mitigations on all vCPUs is an
unnecessary limitation.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
2024-06-11 14:08 ` Chao Gao
@ 2024-06-11 16:32 ` Sean Christopherson
0 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2024-06-11 16:32 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta, Zhang Chen,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
On Tue, Jun 11, 2024, Chao Gao wrote:
> On Tue, Jun 11, 2024 at 06:34:49AM -0700, Sean Christopherson wrote:
> >> As said, this requires some tweaks to KVM_CAP_FORCE_SPEC_CTRL, such as making
> >> the mask and shadow values adjustable and applicable on a per-vCPU basis. The
> >> tweaks are not necessarily for Intel-defined virtual MSRs; if there were other
> >> preferable interfaces, they could also benefit from these changes.
> >>
> >> Any objections to these tweaks to KVM_CAP_FORCE_SPEC_CTRL?
> >
> >Why does KVM_CAP_FORCE_SPEC_CTRL need to be per-vCPU? Won't the CPU bugs and
> >mitigations be system-wide / VM-wide?
>
> Because spec_ctrl is per-vCPU and Intel-defined virtual MSRs are also per-vCPU.
I figured that was the answer, but part of me was hopeful :-)
> That is, a guest __can__ configure different values for the virtual MSRs on different
> vCPUs, even though a sane guest won't do this. If KVM doesn't want to rule out
> the possibility of supporting Intel-defined virtual MSRs in userspace, or any
> other per-vCPU interface, KVM_CAP_FORCE_SPEC_CTRL needs to be per-vCPU.
>
> Implementation-wise, being per-vCPU is simpler because otherwise, once userspace
> adjusts the hardware mitigations to enforce, KVM needs to kick all vCPUs, which
> adds more complexity.
+1, I even typed up as much before reading this paragraph.
> And IMO, requiring guests to deploy the same mitigations on all vCPUs is an
> unnecessary limitation.
Yeah, I can see how it would make things weird for no good reason.
So yeah, if the only thing stopping us from letting userspace deal with the virtual
MSRs is converting to a vCPU-scoped ioctl(), then by all means, let's do that.
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2024-06-11 16:32 UTC | newest]
Thread overview: 21+ messages
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 01/10] KVM: VMX: " Chao Gao
2024-04-12 4:07 ` Jim Mattson
2024-04-12 10:18 ` Chao Gao
2024-06-03 23:55 ` Sean Christopherson
2024-04-10 14:34 ` [RFC PATCH v3 02/10] KVM: VMX: Cache IA32_SPEC_CTRL_SHADOW field of VMCS Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 03/10] KVM: nVMX: Enable SPEC_CTRL virtualizaton for vmcs02 Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 04/10] x86/bugs: Use Virtual MSRs to request BHI_DIS_S Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 05/10] x86/bugs: Use Virtual MSRs to request RRSBA_DIS_S Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 06/10] KVM: VMX: Cache force_spec_ctrl_value/mask for each vCPU Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 07/10] KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support Chao Gao
2024-04-12 4:22 ` Jim Mattson
2024-04-10 14:34 ` [RFC PATCH v3 08/10] KVM: VMX: Advertise MITIGATION_CTRL support Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT Chao Gao
2024-06-11 1:34 ` Sean Christopherson
2024-06-11 10:48 ` Chao Gao
2024-06-11 13:34 ` Sean Christopherson
2024-06-11 14:08 ` Chao Gao
2024-06-11 16:32 ` Sean Christopherson
2024-04-10 14:34 ` [RFC PATCH v3 10/10] KVM: VMX: Advertise MITI_ENUM_RETPOLINE_S_SUPPORT Chao Gao
-- strict thread matches above, loose matches on Subject: below --
2024-04-11 4:15 [RFC PATCH v3 01/10] KVM: VMX: Virtualize Intel IA32_SPEC_CTRL kernel test robot