* [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL
@ 2024-04-10 14:34 Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 01/10] KVM: VMX: " Chao Gao
` (9 more replies)
0 siblings, 10 replies; 20+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Chao Gao, Adam Dunlap,
Arjan van de Ven, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Ilpo Järvinen, Ingo Molnar, Jithu Joseph, Jonathan Corbet,
Josh Poimboeuf, Kan Liang, linux-doc, Maciej S. Szmigiero,
Nikolay Borisov, Paolo Bonzini, Peter Zijlstra, Rick Edgecombe,
Sandipan Das, Sean Christopherson, Thomas Gleixner, Vegard Nossum,
x86
Hi all,
This series is tagged as RFC because I want to seek your feedback on
1. the KVM<->userspace ABI defined in patch 1
I am wondering if we can allow userspace to configure the mask
and the shadow value during the guest's lifetime, and do it on a per-vCPU
basis. This way, in conjunction with "virtual MSRs" or any other interface,
userspace can adjust the hardware mitigations applied to the guest during
the guest's lifetime, e.g., for the best performance.
2. Intel-defined virtual MSRs vs. a new interface
The situation is that another OS has already adopted the Intel-defined
virtual MSRs. Given this, I am not sure whether defining a new interface
is still preferable, as it would add complexity if we end up with two
interfaces for the same purpose.
So, I just want to reconfirm whether the suggestion remains to define a
new interface through community collaboration, as suggested at [1].
Below is the cover letter:
Background
==========
Branch History Injection (BHI) is a special form of Spectre variant 2,
where an attacker may manipulate branch history before transitioning
from user to supervisor mode (or from VMX non-root/guest to root mode)
in an effort to cause an indirect branch predictor to select a specific
predictor entry for an indirect branch, so that a disclosure gadget at
the predicted target transiently executes.
To mitigate BHI attacks, the kernel may use the hardware mitigation, i.e.,
BHI_DIS_S, or, when the hardware mitigation is not supported, resort to a
SW loop, i.e., the BHB-clearing sequence.
Problem
=======
However, the SW loop is effective on pre-SPR parts but not on SPR and
future parts. This creates a mitigation effectiveness problem for virtual
machines:
Migrating a guest using the SW loop on a pre-SPR part to a part where
the SW loop is ineffective (e.g., an SPR or future part) leaves the
guest vulnerable to BHI.
[This isn't a problem on bare metal, because parts on which the SW loop
is ineffective always support BHI_DIS_S, which is preferable to the SW
loop.]
Solution
========
This series proposes that QEMU+KVM deploy BHI_DIS_S using "virtualize
IA32_SPEC_CTRL" for the guest when the SW loop is ineffective on the host.
Note: "virtualize IA32_SPEC_CTRL" allows the VMM to prevent the guest
from changing some bits of the IA32_SPEC_CTRL MSR without intercepting
the guest's writes to the MSR.
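To make the semantics concrete, here is a minimal illustrative sketch of
what happens on a guest WRMSR when the feature is active. The formula is
the one documented in patch 1; using BHI_DIS_S (bit 10 of IA32_SPEC_CTRL)
as the forced bit is just an example:

	/*
	 * Illustrative only: the VMM forces BHI_DIS_S with
	 * mask = value = BIT(10). The forced value bits live in the
	 * real MSR, so (real & mask) == (value & mask).
	 */
	shadow = guest_wrmsr_val;	/* what the guest reads back */
	hw_val = (guest_wrmsr_val & ~mask) | (value & mask); /* what hardware uses */

The guest can still toggle the unmasked bits (e.g., IBRS or STIBP) freely
and without VM-exits, while BHI_DIS_S stays set.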
This solution leads to a new problem:
Deploying BHI_DIS_S for the guest may cause unnecessary performance loss
if the guest is using other mitigations for BHI or doesn't care about BHI
attacks at all.
To avoid this unnecessary performance loss, we want to allow the guest
to opt out of BHI_DIS_S in such cases. The idea is to let the guest report
to KVM/QEMU whether it is using the SW loop. KVM/QEMU then won't deploy
BHI_DIS_S for the guest if the SW loop isn't in use.
Intel defines a set of para-virtualized MSRs [2] for guests to report
software mitigation status. This series emulates the para-virtualized
MSRs in KVM.
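As a rough sketch, the guest-side handshake that patches 4-5 implement
looks like this (error handling omitted; the MSR indices and bit names
are the ones introduced by this series):

	/* Sketch of virt_mitigation_ctrl_init() from patch 4. */
	if (x86_read_arch_cap_msr() & ARCH_CAP_VIRTUAL_ENUM) {		/* bit 63 */
		rdmsrl(MSR_VIRTUAL_ENUMERATION, virt_enum);		/* 0x50000000 */
		if (virt_enum & VIRT_ENUM_MITIGATION_CTRL_SUPPORT) {
			rdmsrl(MSR_VIRTUAL_MITIGATION_ENUM, miti_enum);	/* 0x50000001 */
			if (miti_enum & MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT)
				msr_set_bit(MSR_VIRTUAL_MITIGATION_CTRL, /* 0x50000002 */
					    MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT);
		}
	}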
Overall, the series has two parts:
1. patch 1-3: Define the KVM ABI for userspace VMMs (e.g., QEMU) to deploy
hardware mitigations for the guest, to solve the mitigation effectiveness
problem when migrating guests across parts with different microarchitectures.
2. patch 4-10: Emulate virtual MSRs so that the guest can report software
mitigation status to avoid the unnecessary performance loss.
[1] https://lore.kernel.org/all/ZH9kwlg2Ac9IER7Y@google.com/
[2] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html#inpage-nav-4
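For reference, below is a minimal userspace sketch (not part of the series)
of how a VMM such as QEMU might consume the ABI from patches 1-3. It
assumes the query ioctl is issued on the /dev/kvm fd, matching the
kvm_arch_dev_ioctl() handler in patch 1, and that SPEC_CTRL_BHI_DIS_S is
bit 10:

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Force BHI_DIS_S on for a VM; must be called before creating vCPUs. */
	static int force_bhi_dis_s(int kvm_fd, int vm_fd)
	{
		uint64_t supported = 0;
		struct kvm_enable_cap cap = { .cap = KVM_CAP_FORCE_SPEC_CTRL };

		if (ioctl(kvm_fd, KVM_GET_SUPPORTED_FORCE_SPEC_CTRL, &supported) ||
		    !(supported & (1ULL << 10)))	/* BHI_DIS_S */
			return -1;

		cap.args[0] = 1ULL << 10;	/* mask: guest cannot change BHI_DIS_S */
		cap.args[1] = 1ULL << 10;	/* value: force BHI_DIS_S to 1 */
		return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
	}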
Chao Gao (4):
KVM: VMX: Cache IA32_SPEC_CTRL_SHADOW field of VMCS
KVM: nVMX: Enable SPEC_CTRL virtualization for vmcs02
KVM: VMX: Cache force_spec_ctrl_value/mask for each vCPU
KVM: VMX: Advertise MITI_ENUM_RETPOLINE_S_SUPPORT
Daniel Sneddon (1):
KVM: VMX: Virtualize Intel IA32_SPEC_CTRL
Pawan Gupta (2):
x86/bugs: Use Virtual MSRs to request BHI_DIS_S
x86/bugs: Use Virtual MSRs to request RRSBA_DIS_S
Zhang Chen (3):
KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support
KVM: VMX: Advertise MITIGATION_CTRL support
KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
Documentation/virt/kvm/api.rst | 39 +++++++
arch/x86/include/asm/kvm_host.h | 4 +
arch/x86/include/asm/msr-index.h | 24 +++++
arch/x86/include/asm/vmx.h | 5 +
arch/x86/include/asm/vmxfeatures.h | 2 +
arch/x86/kernel/cpu/bugs.c | 33 ++++++
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/cpu/cpu.h | 1 +
arch/x86/kvm/svm/svm.c | 3 +
arch/x86/kvm/vmx/capabilities.h | 5 +
arch/x86/kvm/vmx/nested.c | 30 ++++++
arch/x86/kvm/vmx/vmx.c | 162 +++++++++++++++++++++++++++--
arch/x86/kvm/vmx/vmx.h | 21 +++-
arch/x86/kvm/x86.c | 49 ++++++++-
arch/x86/kvm/x86.h | 1 +
include/uapi/linux/kvm.h | 4 +
16 files changed, 376 insertions(+), 8 deletions(-)
base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
--
2.39.3
* [RFC PATCH v3 01/10] KVM: VMX: Virtualize Intel IA32_SPEC_CTRL
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-12 4:07 ` Jim Mattson
2024-04-10 14:34 ` [RFC PATCH v3 02/10] KVM: VMX: Cache IA32_SPEC_CTRL_SHADOW field of VMCS Chao Gao
` (8 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Sean Christopherson, Chao Gao,
Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, linux-doc
From: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Currently KVM disables interception of IA32_SPEC_CTRL after a non-zero
value is written to IA32_SPEC_CTRL by the guest. The guest is then allowed
to write any value directly to hardware. There is a tertiary control for
IA32_SPEC_CTRL that allows bits in IA32_SPEC_CTRL to be masked to prevent
guests from changing those bits.
Add controls to set the mask for IA32_SPEC_CTRL and the desired value for
the masked bits.
These new controls are especially helpful for protecting guests that
don't know about BHI_DIS_S and that are running on hardware that
supports it. This allows the hypervisor to set BHI_DIS_S to fully
protect the guest.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
[ add a new ioctl to report supported bits. Fix the inverted check ]
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
Documentation/virt/kvm/api.rst | 39 +++++++++++++++++
arch/x86/include/asm/kvm_host.h | 4 ++
arch/x86/include/asm/vmx.h | 5 +++
arch/x86/include/asm/vmxfeatures.h | 2 +
arch/x86/kvm/vmx/capabilities.h | 5 +++
arch/x86/kvm/vmx/vmx.c | 68 +++++++++++++++++++++++++++---
arch/x86/kvm/vmx/vmx.h | 3 +-
arch/x86/kvm/x86.c | 30 +++++++++++++
arch/x86/kvm/x86.h | 1 +
include/uapi/linux/kvm.h | 4 ++
10 files changed, 155 insertions(+), 6 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0b5a33ee71ee..b6eeb1d6eb65 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6352,6 +6352,19 @@ a single guest_memfd file, but the bound ranges must not overlap).
See KVM_SET_USER_MEMORY_REGION2 for additional details.
+4.143 KVM_GET_SUPPORTED_FORCE_SPEC_CTRL
+---------------------------------------
+
+:Capability: KVM_CAP_FORCE_SPEC_CTRL
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: u64 supported_bitmask (out)
+:Returns: 0 on success, -EFAULT if supported_bitmask cannot be accessed
+
+Returns a bitmask of SPEC_CTRL MSR bits which can be forced on. All bits can be
+forced to 0 (i.e., the guest is prevented from setting them) even if KVM
+doesn't support the bit.
+
5. The kvm_run structure
========================
@@ -8063,6 +8076,32 @@ error/annotated fault.
See KVM_EXIT_MEMORY_FAULT for more information.
+7.35 KVM_CAP_FORCE_SPEC_CTRL
+----------------------------
+
+:Architectures: x86
+:Parameters: args[0] contains the bitmask to prevent guests from modifying those
+ bits
+ args[1] contains the desired value to set in IA32_SPEC_CTRL for the
+ masked bits
+:Returns: 0 on success, -EINVAL if args[0] or args[1] contain invalid values
+
+This capability allows userspace to configure the value of IA32_SPEC_CTRL and
+what bits the VM can and cannot access. This is especially useful when a VM is
+migrated to newer hardware with hardware-based speculation mitigations not
+provided to the VM previously.
+
+IA32_SPEC_CTRL virtualization works by introducing the IA32_SPEC_CTRL shadow
+and mask fields. When the MSR is virtualized and the guest writes to
+IA32_SPEC_CTRL, the value written to hardware is:
+
+(GUEST_WRMSR_VAL & ~MASK) | (REAL_MSR_VAL & MASK).
+
+No bit that is masked can be modified by the guest.
+
+The shadow field contains the value the guest wrote to the MSR and is what is
+returned to the guest when the virtualized MSR is read.
+
8. Other capabilities.
======================
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 16e07a2eee19..8220414cf697 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1404,6 +1404,10 @@ struct kvm_arch {
u32 notify_window;
u32 notify_vmexit_flags;
+
+ u64 force_spec_ctrl_mask;
+ u64 force_spec_ctrl_value;
+
/*
* If exit_on_emulation_error is set, and the in-kernel instruction
* emulator fails to emulate an instruction, allow userspace
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 4dba17363008..f65651a3898c 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -84,6 +84,7 @@
* Definitions of Tertiary Processor-Based VM-Execution Controls.
*/
#define TERTIARY_EXEC_IPI_VIRT VMCS_CONTROL_BIT(IPI_VIRT)
+#define TERTIARY_EXEC_SPEC_CTRL_SHADOW VMCS_CONTROL_BIT(SPEC_CTRL_SHADOW)
#define PIN_BASED_EXT_INTR_MASK VMCS_CONTROL_BIT(INTR_EXITING)
#define PIN_BASED_NMI_EXITING VMCS_CONTROL_BIT(NMI_EXITING)
@@ -236,6 +237,10 @@ enum vmcs_field {
TERTIARY_VM_EXEC_CONTROL_HIGH = 0x00002035,
PID_POINTER_TABLE = 0x00002042,
PID_POINTER_TABLE_HIGH = 0x00002043,
+ IA32_SPEC_CTRL_MASK = 0x0000204A,
+ IA32_SPEC_CTRL_MASK_HIGH = 0x0000204B,
+ IA32_SPEC_CTRL_SHADOW = 0x0000204C,
+ IA32_SPEC_CTRL_SHADOW_HIGH = 0x0000204D,
GUEST_PHYSICAL_ADDRESS = 0x00002400,
GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401,
VMCS_LINK_POINTER = 0x00002800,
diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
index 266daf5b5b84..6dbfe9004d92 100644
--- a/arch/x86/include/asm/vmxfeatures.h
+++ b/arch/x86/include/asm/vmxfeatures.h
@@ -90,4 +90,6 @@
/* Tertiary Processor-Based VM-Execution Controls, word 3 */
#define VMX_FEATURE_IPI_VIRT ( 3*32+ 4) /* Enable IPI virtualization */
+#define VMX_FEATURE_SPEC_CTRL_SHADOW ( 3*32+ 7) /* IA32_SPEC_CTRL shadow */
+
#endif /* _ASM_X86_VMXFEATURES_H */
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 41a4533f9989..6c51a5abb16b 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -138,6 +138,11 @@ static inline bool cpu_has_tertiary_exec_ctrls(void)
CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
}
+static inline bool cpu_has_spec_ctrl_shadow(void)
+{
+ return vmcs_config.cpu_based_3rd_exec_ctrl & TERTIARY_EXEC_SPEC_CTRL_SHADOW;
+}
+
static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
{
return vmcs_config.cpu_based_2nd_exec_ctrl &
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c37a89eda90f..a6154d725025 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2008,7 +2008,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
!guest_has_spec_ctrl_msr(vcpu))
return 1;
- msr_info->data = to_vmx(vcpu)->spec_ctrl;
+ if (cpu_has_spec_ctrl_shadow())
+ msr_info->data = vmcs_read64(IA32_SPEC_CTRL_SHADOW);
+ else
+ msr_info->data = to_vmx(vcpu)->spec_ctrl;
break;
case MSR_IA32_SYSENTER_CS:
msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
@@ -2148,6 +2151,19 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
return debugctl;
}
+static void vmx_set_spec_ctrl(struct kvm_vcpu *vcpu, u64 val)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ vmx->spec_ctrl = val;
+
+ if (cpu_has_spec_ctrl_shadow()) {
+ vmcs_write64(IA32_SPEC_CTRL_SHADOW, val);
+
+ vmx->spec_ctrl |= vcpu->kvm->arch.force_spec_ctrl_value;
+ }
+}
+
/*
* Writes msr value into the appropriate "register".
* Returns 0 on success, non-0 otherwise.
@@ -2273,7 +2289,8 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (kvm_spec_ctrl_test_value(data))
return 1;
- vmx->spec_ctrl = data;
+ vmx_set_spec_ctrl(vcpu, data);
+
if (!data)
break;
@@ -4785,6 +4802,23 @@ static void init_vmcs(struct vcpu_vmx *vmx)
if (cpu_has_vmx_xsaves())
vmcs_write64(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP);
+ if (cpu_has_spec_ctrl_shadow()) {
+ vmcs_write64(IA32_SPEC_CTRL_SHADOW, 0);
+
+ /*
+ * Note, IA32_SPEC_CTRL_{SHADOW,MASK} subtly behave *very*
+ * differently than other shadow+mask combinations. Attempts
+ * to modify bits in MASK are silently ignored and do NOT cause
+ * a VM-Exit. This allows the host to force bits to be set or
+ * cleared on behalf of the guest, while still allowing the
+ * guest to modify other bits at will, without triggering VM-Exits.
+ */
+ if (kvm->arch.force_spec_ctrl_mask)
+ vmcs_write64(IA32_SPEC_CTRL_MASK, kvm->arch.force_spec_ctrl_mask);
+ else
+ vmcs_write64(IA32_SPEC_CTRL_MASK, 0);
+ }
+
if (enable_pml) {
vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
@@ -4853,7 +4887,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
__vmx_vcpu_reset(vcpu);
vmx->rmode.vm86_active = 0;
- vmx->spec_ctrl = 0;
+ vmx_set_spec_ctrl(vcpu, 0);
vmx->msr_ia32_umwait_control = 0;
@@ -7211,8 +7245,14 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
if (!cpu_feature_enabled(X86_FEATURE_MSR_SPEC_CTRL))
return;
- if (flags & VMX_RUN_SAVE_SPEC_CTRL)
- vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
+ if (flags & VMX_RUN_SAVE_SPEC_CTRL) {
+ if (cpu_has_spec_ctrl_shadow())
+ vmx->spec_ctrl = (vmcs_read64(IA32_SPEC_CTRL_SHADOW) &
+ ~vmx->vcpu.kvm->arch.force_spec_ctrl_mask) |
+ vmx->vcpu.kvm->arch.force_spec_ctrl_value;
+ else
+ vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
+ }
/*
* If the guest/host SPEC_CTRL values differ, restore the host value.
@@ -8598,6 +8638,24 @@ static __init int hardware_setup(void)
kvm_caps.tsc_scaling_ratio_frac_bits = 48;
kvm_caps.has_bus_lock_exit = cpu_has_vmx_bus_lock_detection();
kvm_caps.has_notify_vmexit = cpu_has_notify_vmexit();
+ kvm_caps.supported_force_spec_ctrl = 0;
+
+ if (cpu_has_spec_ctrl_shadow()) {
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_IBRS;
+
+ if (boot_cpu_has(X86_FEATURE_STIBP))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_STIBP;
+
+ if (boot_cpu_has(X86_FEATURE_SSBD))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_SSBD;
+
+ if (boot_cpu_has(X86_FEATURE_RRSBA_CTRL) &&
+ (host_arch_capabilities & ARCH_CAP_RRSBA))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_RRSBA_DIS_S;
+
+ if (boot_cpu_has(X86_FEATURE_BHI_CTRL))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_BHI_DIS_S;
+ }
set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 65786dbe7d60..f26ac82b5a59 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -578,7 +578,8 @@ static inline u8 vmx_get_rvi(void)
#define KVM_REQUIRED_VMX_TERTIARY_VM_EXEC_CONTROL 0
#define KVM_OPTIONAL_VMX_TERTIARY_VM_EXEC_CONTROL \
- (TERTIARY_EXEC_IPI_VIRT)
+ (TERTIARY_EXEC_IPI_VIRT | \
+ TERTIARY_EXEC_SPEC_CTRL_SHADOW)
#define BUILD_CONTROLS_SHADOW(lname, uname, bits) \
static inline void lname##_controls_set(struct vcpu_vmx *vmx, u##bits val) \
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 984ea2089efc..9a59b5a93d0e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4836,6 +4836,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
r |= BIT(KVM_X86_SW_PROTECTED_VM);
break;
+ case KVM_CAP_FORCE_SPEC_CTRL:
+ r = !!kvm_caps.supported_force_spec_ctrl;
+ break;
default:
break;
}
@@ -4990,6 +4993,13 @@ long kvm_arch_dev_ioctl(struct file *filp,
r = kvm_x86_dev_has_attr(&attr);
break;
}
+ case KVM_GET_SUPPORTED_FORCE_SPEC_CTRL: {
+ r = 0;
+ if (copy_to_user(argp, &kvm_caps.supported_force_spec_ctrl,
+ sizeof(kvm_caps.supported_force_spec_ctrl)))
+ r = -EFAULT;
+ break;
+ }
default:
r = -EINVAL;
break;
@@ -6729,6 +6739,26 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
mutex_unlock(&kvm->lock);
break;
+ case KVM_CAP_FORCE_SPEC_CTRL:
+ r = -EINVAL;
+
+ mutex_lock(&kvm->lock);
+
+ /*
+ * Note, only the value is restricted to known bits that KVM
+ * can force on. Userspace is allowed to set any mask bits,
+ * i.e. can prevent the guest from setting a bit, even if KVM
+ * doesn't support the bit.
+ */
+ if (kvm_caps.supported_force_spec_ctrl && !kvm->created_vcpus &&
+ !(~kvm_caps.supported_force_spec_ctrl & cap->args[1]) &&
+ !(~cap->args[0] & cap->args[1])) {
+ kvm->arch.force_spec_ctrl_mask = cap->args[0];
+ kvm->arch.force_spec_ctrl_value = cap->args[1];
+ r = 0;
+ }
+ mutex_unlock(&kvm->lock);
+ break;
default:
r = -EINVAL;
break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a8b71803777b..6dd12776b310 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -29,6 +29,7 @@ struct kvm_caps {
u64 supported_xcr0;
u64 supported_xss;
u64 supported_perf_cap;
+ u64 supported_force_spec_ctrl;
};
void kvm_spurious_fault(void);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..fb918bdb930c 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -917,6 +917,7 @@ struct kvm_enable_cap {
#define KVM_CAP_MEMORY_ATTRIBUTES 233
#define KVM_CAP_GUEST_MEMFD 234
#define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_FORCE_SPEC_CTRL 236
struct kvm_irq_routing_irqchip {
__u32 irqchip;
@@ -1243,6 +1244,9 @@ struct kvm_vfio_spapr_tce {
#define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr)
#define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr)
+/* Available with KVM_CAP_FORCE_SPEC_CTRL */
+#define KVM_GET_SUPPORTED_FORCE_SPEC_CTRL _IOR(KVMIO, 0xe4, __u64)
+
/*
* ioctls for vcpu fds
*/
--
2.39.3
* [RFC PATCH v3 02/10] KVM: VMX: Cache IA32_SPEC_CTRL_SHADOW field of VMCS
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 01/10] KVM: VMX: " Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 03/10] KVM: nVMX: Enable SPEC_CTRL virtualization for vmcs02 Chao Gao
` (7 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Chao Gao, Sean Christopherson,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
This field is effectively the value of the IA32_SPEC_CTRL MSR from the
guest's view. Cache it for nested VMX transitions. The value should be
propagated between vmcs01 and vmcs02 so that, from the guest's view, the
IA32_SPEC_CTRL MSR doesn't change magically across nested VMX
transitions.
The IA32_SPEC_CTRL_SHADOW field may be changed by the guest if the
IA32_SPEC_CTRL MSR is passed through to the guest. So, update the cache
right after VM-exit to ensure it is always consistent with the guest's
view.
A bonus is that vmx_get_msr() can return the cached value directly,
avoiding a VMREAD.
No functional change intended.
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 12 ++++++++----
arch/x86/kvm/vmx/vmx.h | 6 ++++++
2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a6154d725025..93c208f009cf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2009,7 +2009,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
if (cpu_has_spec_ctrl_shadow())
- msr_info->data = vmcs_read64(IA32_SPEC_CTRL_SHADOW);
+ msr_info->data = to_vmx(vcpu)->spec_ctrl_shadow;
else
msr_info->data = to_vmx(vcpu)->spec_ctrl;
break;
@@ -2158,6 +2158,7 @@ static void vmx_set_spec_ctrl(struct kvm_vcpu *vcpu, u64 val)
vmx->spec_ctrl = val;
if (cpu_has_spec_ctrl_shadow()) {
+ vmx->spec_ctrl_shadow = val;
vmcs_write64(IA32_SPEC_CTRL_SHADOW, val);
vmx->spec_ctrl |= vcpu->kvm->arch.force_spec_ctrl_value;
@@ -4803,6 +4804,7 @@ static void init_vmcs(struct vcpu_vmx *vmx)
vmcs_write64(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP);
if (cpu_has_spec_ctrl_shadow()) {
+ vmx->spec_ctrl_shadow = 0;
vmcs_write64(IA32_SPEC_CTRL_SHADOW, 0);
/*
@@ -7246,12 +7248,14 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
return;
if (flags & VMX_RUN_SAVE_SPEC_CTRL) {
- if (cpu_has_spec_ctrl_shadow())
- vmx->spec_ctrl = (vmcs_read64(IA32_SPEC_CTRL_SHADOW) &
+ if (cpu_has_spec_ctrl_shadow()) {
+ vmx->spec_ctrl_shadow = vmcs_read64(IA32_SPEC_CTRL_SHADOW);
+ vmx->spec_ctrl = (vmx->spec_ctrl_shadow &
~vmx->vcpu.kvm->arch.force_spec_ctrl_mask) |
vmx->vcpu.kvm->arch.force_spec_ctrl_value;
- else
+ } else {
vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
+ }
}
/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index f26ac82b5a59..97324f6ee01c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -281,6 +281,12 @@ struct vcpu_vmx {
#endif
u64 spec_ctrl;
+ /*
+ * Cache IA32_SPEC_CTRL_SHADOW field of VMCS, i.e., the value of
+ * MSR_IA32_SPEC_CTRL in guest's view.
+ */
+ u64 spec_ctrl_shadow;
+
u32 msr_ia32_umwait_control;
/*
--
2.39.3
* [RFC PATCH v3 03/10] KVM: nVMX: Enable SPEC_CTRL virtualization for vmcs02
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 01/10] KVM: VMX: " Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 02/10] KVM: VMX: Cache IA32_SPEC_CTRL_SHADOW field of VMCS Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 04/10] x86/bugs: Use Virtual MSRs to request BHI_DIS_S Chao Gao
` (6 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Chao Gao, Sean Christopherson,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
Enable SPEC_CTRL virtualization for vmcs02 to prevent nested guests from
changing the SPEC_CTRL bits that userspace doesn't allow a guest to
change.
Propagate the tertiary VM-exec controls from vmcs01 to vmcs02 and program
the mask of the SPEC_CTRL MSR as the userspace VMM requested.
With SPEC_CTRL virtualization enabled, the guest will read the shadow
value from the VMCS. To ensure a consistent view across nested VMX
transitions, propagate the shadow value between vmcs01 and vmcs02.
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/nested.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d05ddf751491..174790b2ffbc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2381,6 +2381,20 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
secondary_exec_controls_set(vmx, exec_control);
}
+ /*
+ * TERTIARY EXEC CONTROLS
+ */
+ if (cpu_has_tertiary_exec_ctrls()) {
+ exec_control = __tertiary_exec_controls_get(vmcs01);
+
+ exec_control &= TERTIARY_EXEC_SPEC_CTRL_SHADOW;
+ if (exec_control & TERTIARY_EXEC_SPEC_CTRL_SHADOW)
+ vmcs_write64(IA32_SPEC_CTRL_MASK,
+ vmx->vcpu.kvm->arch.force_spec_ctrl_mask);
+
+ tertiary_exec_controls_set(vmx, exec_control);
+ }
+
/*
* ENTRY CONTROLS
*
@@ -2625,6 +2639,19 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
if (kvm_caps.has_tsc_control)
vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
+ /*
+ * L2 after nested VM-entry should observe the same value of
+ * IA32_SPEC_CTRL MSR as L1 unless:
+ * a. L1 loads IA32_SPEC_CTRL via MSR-load area.
+ * b. L1 enables IA32_SPEC_CTRL virtualization. This cannot
+ * happen since KVM doesn't expose this feature to L1.
+ *
+ * Propagate spec_ctrl_shadow (the value guest will get via RDMSR)
+ * to vmcs02. Later nested_vmx_load_msr() will take care of case a.
+ */
+ if (vmx->nested.nested_run_pending && cpu_has_spec_ctrl_shadow())
+ vmcs_write64(IA32_SPEC_CTRL_SHADOW, vmx->spec_ctrl_shadow);
+
nested_vmx_transition_tlb_flush(vcpu, vmcs12, true);
if (nested_cpu_has_ept(vmcs12))
@@ -4883,6 +4910,9 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
vmx_update_cpu_dirty_logging(vcpu);
}
+ if (cpu_has_spec_ctrl_shadow())
+ vmcs_write64(IA32_SPEC_CTRL_SHADOW, vmx->spec_ctrl_shadow);
+
/* Unpin physical memory we referred to in vmcs02 */
kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false);
kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
--
2.39.3
* [RFC PATCH v3 04/10] x86/bugs: Use Virtual MSRs to request BHI_DIS_S
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
` (2 preceding siblings ...)
2024-04-10 14:34 ` [RFC PATCH v3 03/10] KVM: nVMX: Enable SPEC_CTRL virtualization for vmcs02 Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 05/10] x86/bugs: Use Virtual MSRs to request RRSBA_DIS_S Chao Gao
` (5 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Chao Gao, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Peter Zijlstra, Josh Poimboeuf, Ilpo Järvinen,
Sean Christopherson, Kai Huang, Jithu Joseph, Kan Liang,
Paolo Bonzini, Sandipan Das, Vegard Nossum, Nikolay Borisov,
Rick Edgecombe, Adam Dunlap, Arjan van de Ven
From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
The mitigation for BHI is to use the hardware control BHI_DIS_S or the
software sequence. On platforms that support BHI_DIS_S, the software
sequence may be ineffective at mitigating BHI. Guests that are not aware
of BHI_DIS_S on the host, and deploy the ineffective software sequence
clear_bhb_loop(), may become vulnerable to BHI.
To overcome this problem, Intel has defined a virtual MSR interface
through which guests can report their mitigation status and request the
VMM to deploy relevant hardware mitigations.
Use this virtual MSR interface to tell the VMM that the guest is using the
short software sequence. Based on this information, the VMM can deploy
BHI_DIS_S for the guest using virtual SPEC_CTRL.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/include/asm/msr-index.h | 18 ++++++++++++++++++
arch/x86/kernel/cpu/bugs.c | 26 ++++++++++++++++++++++++++
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/cpu/cpu.h | 1 +
4 files changed, 46 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e72c2b872957..18a4081bf5cb 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -196,6 +196,7 @@
* IA32_XAPIC_DISABLE_STATUS MSR
* supported
*/
+#define ARCH_CAP_VIRTUAL_ENUM BIT_ULL(63) /* MSR_VIRTUAL_ENUMERATION supported */
#define MSR_IA32_FLUSH_CMD 0x0000010b
#define L1D_FLUSH BIT(0) /*
@@ -1178,6 +1179,23 @@
#define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F
+/* Intel virtual MSRs */
+#define MSR_VIRTUAL_ENUMERATION 0x50000000
+#define VIRT_ENUM_MITIGATION_CTRL_SUPPORT BIT(0) /*
+ * Mitigation ctrl via virtual
+ * MSRs supported
+ */
+
+#define MSR_VIRTUAL_MITIGATION_ENUM 0x50000001
+#define MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT BIT(0) /* VMM supports BHI_DIS_S */
+
+#define MSR_VIRTUAL_MITIGATION_CTRL 0x50000002
+#define MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT 0 /*
+ * Request VMM to deploy
+ * BHI_DIS_S mitigation
+ */
+#define MITI_CTRL_BHB_CLEAR_SEQ_S_USED BIT(MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT)
+
/* AMD-V MSRs */
#define MSR_VM_CR 0xc0010114
#define MSR_VM_IGNNE 0xc0010115
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 295463707e68..e74e4c51d387 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -50,6 +50,8 @@ static void __init l1d_flush_select_mitigation(void);
static void __init srso_select_mitigation(void);
static void __init gds_select_mitigation(void);
+void virt_mitigation_ctrl_init(void);
+
/* The base value of the SPEC_CTRL MSR without task-specific bits set */
u64 x86_spec_ctrl_base;
EXPORT_SYMBOL_GPL(x86_spec_ctrl_base);
@@ -171,6 +173,8 @@ void __init cpu_select_mitigations(void)
*/
srso_select_mitigation();
gds_select_mitigation();
+
+ virt_mitigation_ctrl_init();
}
/*
@@ -1680,6 +1684,28 @@ static void __init bhi_select_mitigation(void)
pr_info("Spectre BHI mitigation: SW BHB clearing on syscall\n");
}
+void virt_mitigation_ctrl_init(void)
+{
+ u64 msr_virt_enum, msr_mitigation_enum;
+
+ if (!(x86_read_arch_cap_msr() & ARCH_CAP_VIRTUAL_ENUM))
+ return;
+
+ rdmsrl(MSR_VIRTUAL_ENUMERATION, msr_virt_enum);
+ if (!(msr_virt_enum & VIRT_ENUM_MITIGATION_CTRL_SUPPORT))
+ return;
+
+ rdmsrl(MSR_VIRTUAL_MITIGATION_ENUM, msr_mitigation_enum);
+
+ if (msr_mitigation_enum & MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT) {
+ /* When BHI short seq is being used, request BHI_DIS_S */
+ if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP))
+ msr_set_bit(MSR_VIRTUAL_MITIGATION_CTRL, MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT);
+ else
+ msr_clear_bit(MSR_VIRTUAL_MITIGATION_CTRL, MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT);
+ }
+}
+
static void __init spectre_v2_select_mitigation(void)
{
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 754d91857d63..29f16655a7a0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1960,6 +1960,7 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
update_gds_msr();
tsx_ap_init();
+ virt_mitigation_ctrl_init();
}
void print_cpu_info(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index ea9e07d57c8d..1cddf506b6ae 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -87,6 +87,7 @@ void cpu_select_mitigations(void);
extern void x86_spec_ctrl_setup_ap(void);
extern void update_srbds_msr(void);
extern void update_gds_msr(void);
+extern void virt_mitigation_ctrl_init(void);
extern enum spectre_v2_mitigation spectre_v2_enabled;
--
2.39.3
* [RFC PATCH v3 05/10] x86/bugs: Use Virtual MSRs to request RRSBA_DIS_S
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
` (3 preceding siblings ...)
2024-04-10 14:34 ` [RFC PATCH v3 04/10] x86/bugs: Use Virtual MSRs to request BHI_DIS_S Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 06/10] KVM: VMX: Cache force_spec_ctrl_value/mask for each vCPU Chao Gao
` (4 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Chao Gao, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Peter Zijlstra, Josh Poimboeuf, Ilpo Järvinen, Tony Luck,
Maciej S. Szmigiero, Kan Liang, Paolo Bonzini, Sandipan Das
From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
On CPUs with RRSBA behavior, a guest using the retpoline mitigation could
become vulnerable to BHI. On such CPUs, when the RSB underflows, a RET
could take its prediction from the BTB. Although these predictions are
limited to the same domain, they may be controllable from userspace using
BHI.
Alder Lake and newer CPUs have the RRSBA_DIS_S knob in MSR_SPEC_CTRL to
disable RRSBA behavior. A guest migrating from an older CPU may not be
aware of RRSBA_DIS_S. Use MSR_VIRTUAL_MITIGATION_CTRL to request the VMM
to deploy RRSBA_DIS_S when the retpoline mitigation is in use.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/include/asm/msr-index.h | 6 ++++++
arch/x86/kernel/cpu/bugs.c | 7 +++++++
2 files changed, 13 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 18a4081bf5cb..469ab38c0ec8 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1188,6 +1188,7 @@
#define MSR_VIRTUAL_MITIGATION_ENUM 0x50000001
#define MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT BIT(0) /* VMM supports BHI_DIS_S */
+#define MITI_ENUM_RETPOLINE_S_SUPPORT BIT(1) /* VMM supports RRSBA_DIS_S */
#define MSR_VIRTUAL_MITIGATION_CTRL 0x50000002
#define MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT 0 /*
@@ -1195,6 +1196,11 @@
* BHI_DIS_S mitigation
*/
#define MITI_CTRL_BHB_CLEAR_SEQ_S_USED BIT(MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT)
+#define MITI_CTRL_RETPOLINE_S_USED_BIT 1 /*
+ * Request VMM to deploy
+ * RRSBA_DIS_S mitigation
+ */
+#define MITI_CTRL_RETPOLINE_S_USED BIT(MITI_CTRL_RETPOLINE_S_USED_BIT)
/* AMD-V MSRs */
#define MSR_VM_CR 0xc0010114
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index e74e4c51d387..766f4340eddf 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1704,6 +1704,13 @@ void virt_mitigation_ctrl_init(void)
else
msr_clear_bit(MSR_VIRTUAL_MITIGATION_CTRL, MITI_CTRL_BHB_CLEAR_SEQ_S_USED_BIT);
}
+ if (msr_mitigation_enum & MITI_ENUM_RETPOLINE_S_SUPPORT) {
+ /* When retpoline is being used, request RRSBA_DIS_S */
+ if (boot_cpu_has(X86_FEATURE_RETPOLINE))
+ msr_set_bit(MSR_VIRTUAL_MITIGATION_CTRL, MITI_CTRL_RETPOLINE_S_USED_BIT);
+ else
+ msr_clear_bit(MSR_VIRTUAL_MITIGATION_CTRL, MITI_CTRL_RETPOLINE_S_USED_BIT);
+ }
}
static void __init spectre_v2_select_mitigation(void)
--
2.39.3
* [RFC PATCH v3 06/10] KVM: VMX: Cache force_spec_ctrl_value/mask for each vCPU
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
` (4 preceding siblings ...)
2024-04-10 14:34 ` [RFC PATCH v3 05/10] x86/bugs: Use Virtual MSRs to request RRSBA_DIS_S Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 07/10] KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support Chao Gao
` (3 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Chao Gao, Sean Christopherson,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
Cache force_spec_ctrl_value/mask for each vCPU so that KVM can adjust the
mask/value for each vCPU according to the software mitigations the vCPU
is using.
KVM_CAP_FORCE_SPEC_CTRL allows the userspace VMM to proactively enable
hardware mitigations (by setting some bits in the IA32_SPEC_CTRL MSR) to
protect the guest from becoming vulnerable to some security issues after
live migration. E.g., a guest using the short BHB-clearing sequence for
BHI that is migrated from a pre-SPR part to an SPR part will become
vulnerable to BHI. The current solution is for the userspace VMM to
deploy BHI_DIS_S for all guests migrated from pre-SPR parts to SPR parts.
But KVM_CAP_FORCE_SPEC_CTRL isn't flexible, because the userspace VMM may
configure KVM to enable BHI_DIS_S for guests which don't care about BHI
at all or are using other mitigations (e.g., the TSX abort sequence) for
BHI. This would cause unnecessary overhead for the guest.
To reduce the overhead, the idea is to let the guest communicate to the
VMM which software mitigations are being used, via the Intel-defined
virtual MSRs [1]. This information from guests is much more accurate. KVM
can adjust hardware mitigations accordingly to reduce the performance
impact on the guest as much as possible.
The Intel-defined virtual MSRs are per-thread scoped. vCPUs _can_ program
different values into them. This means KVM may need to apply a different
mask/value to the IA32_SPEC_CTRL MSR per vCPU. So, cache
force_spec_ctrl_value/mask for each vCPU in preparation for adding
support for the Intel-defined virtual MSRs.
[1]: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/nested.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 11 +++++++----
arch/x86/kvm/vmx/vmx.h | 7 +++++++
3 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 174790b2ffbc..efbc871d0466 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2390,7 +2390,7 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
exec_control &= TERTIARY_EXEC_SPEC_CTRL_SHADOW;
if (exec_control & TERTIARY_EXEC_SPEC_CTRL_SHADOW)
vmcs_write64(IA32_SPEC_CTRL_MASK,
- vmx->vcpu.kvm->arch.force_spec_ctrl_mask);
+ vmx->force_spec_ctrl_mask);
tertiary_exec_controls_set(vmx, exec_control);
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 93c208f009cf..cdfcc1290d82 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2161,7 +2161,7 @@ static void vmx_set_spec_ctrl(struct kvm_vcpu *vcpu, u64 val)
vmx->spec_ctrl_shadow = val;
vmcs_write64(IA32_SPEC_CTRL_SHADOW, val);
- vmx->spec_ctrl |= vcpu->kvm->arch.force_spec_ctrl_value;
+ vmx->spec_ctrl |= vmx->force_spec_ctrl_value;
}
}
@@ -4803,6 +4803,9 @@ static void init_vmcs(struct vcpu_vmx *vmx)
if (cpu_has_vmx_xsaves())
vmcs_write64(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP);
+ vmx->force_spec_ctrl_mask = kvm->arch.force_spec_ctrl_mask;
+ vmx->force_spec_ctrl_value = kvm->arch.force_spec_ctrl_value;
+
if (cpu_has_spec_ctrl_shadow()) {
vmx->spec_ctrl_shadow = 0;
vmcs_write64(IA32_SPEC_CTRL_SHADOW, 0);
@@ -4816,7 +4819,7 @@ static void init_vmcs(struct vcpu_vmx *vmx)
* guest modify other bits at will, without triggering VM-Exits.
*/
if (kvm->arch.force_spec_ctrl_mask)
- vmcs_write64(IA32_SPEC_CTRL_MASK, kvm->arch.force_spec_ctrl_mask);
+ vmcs_write64(IA32_SPEC_CTRL_MASK, vmx->force_spec_ctrl_mask);
else
vmcs_write64(IA32_SPEC_CTRL_MASK, 0);
}
@@ -7251,8 +7254,8 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
if (cpu_has_spec_ctrl_shadow()) {
vmx->spec_ctrl_shadow = vmcs_read64(IA32_SPEC_CTRL_SHADOW);
vmx->spec_ctrl = (vmx->spec_ctrl_shadow &
- ~vmx->vcpu.kvm->arch.force_spec_ctrl_mask) |
- vmx->vcpu.kvm->arch.force_spec_ctrl_value;
+ ~vmx->force_spec_ctrl_mask) |
+ vmx->force_spec_ctrl_value;
} else {
vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
}
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 97324f6ee01c..a4dfe538e5a8 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -287,6 +287,13 @@ struct vcpu_vmx {
*/
u64 spec_ctrl_shadow;
+ /*
+ * Mask and value of SPEC_CTRL MSR bits which the guest is not allowed to
+ * change.
+ */
+ u64 force_spec_ctrl_mask;
+ u64 force_spec_ctrl_value;
+
u32 msr_ia32_umwait_control;
/*
--
2.39.3
* [RFC PATCH v3 07/10] KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
` (5 preceding siblings ...)
2024-04-10 14:34 ` [RFC PATCH v3 06/10] KVM: VMX: Cache force_spec_ctrl_value/mask for each vCPU Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-12 4:22 ` Jim Mattson
2024-04-10 14:34 ` [RFC PATCH v3 08/10] KVM: VMX: Advertise MITIGATION_CTRL support Chao Gao
` (2 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Zhang Chen, Chao Gao,
Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
From: Zhang Chen <chen.zhang@intel.com>
Bit 63 of the IA32_ARCH_CAPABILITIES MSR indicates the availability of
the VIRTUAL_ENUMERATION_MSR (index 0x50000000), which enumerates features
such as mitigation enumeration, which in turn is used by the guest to
report the software mitigations it is using.
Advertise ARCH_CAP_VIRTUAL_ENUM support for VMX and emulate read/write
of the VIRTUAL_ENUMERATION_MSR. For now, VIRTUAL_ENUMERATION_MSR is
always 0.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Co-developed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 2 ++
arch/x86/kvm/x86.c | 16 +++++++++++++++-
4 files changed, 37 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d1a9f9951635..e3406971a8b7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4288,6 +4288,7 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
{
switch (index) {
case MSR_IA32_MCG_EXT_CTL:
+ case MSR_VIRTUAL_ENUMERATION:
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
return false;
case MSR_IA32_SMBASE:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cdfcc1290d82..dcb06406fd09 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1955,6 +1955,8 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
return !(msr->data & ~valid_bits);
}
+#define VIRTUAL_ENUMERATION_VALID_BITS 0ULL
+
static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
switch (msr->index) {
@@ -1962,6 +1964,9 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
if (!nested)
return 1;
return vmx_get_vmx_msr(&vmcs_config.nested, msr->index, &msr->data);
+ case MSR_VIRTUAL_ENUMERATION:
+ msr->data = VIRTUAL_ENUMERATION_VALID_BITS;
+ return 0;
default:
return KVM_MSR_RET_INVALID;
}
@@ -2113,6 +2118,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_DEBUGCTLMSR:
msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
break;
+ case MSR_VIRTUAL_ENUMERATION:
+ if (!msr_info->host_initiated &&
+ !(vcpu->arch.arch_capabilities & ARCH_CAP_VIRTUAL_ENUM))
+ return 1;
+ msr_info->data = vmx->msr_virtual_enumeration;
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_info->index);
@@ -2457,6 +2468,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
ret = kvm_set_msr_common(vcpu, msr_info);
break;
+ case MSR_VIRTUAL_ENUMERATION:
+ if (!msr_info->host_initiated)
+ return 1;
+ if (data & ~VIRTUAL_ENUMERATION_VALID_BITS)
+ return 1;
+
+ vmx->msr_virtual_enumeration = data;
+ break;
default:
find_uret_msr:
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a4dfe538e5a8..0519cf6187ac 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -294,6 +294,8 @@ struct vcpu_vmx {
u64 force_spec_ctrl_mask;
u64 force_spec_ctrl_value;
+ u64 msr_virtual_enumeration;
+
u32 msr_ia32_umwait_control;
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9a59b5a93d0e..4721b6fe7641 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1564,6 +1564,7 @@ static const u32 emulated_msrs_all[] = {
MSR_K7_HWCR,
MSR_KVM_POLL_CONTROL,
+ MSR_VIRTUAL_ENUMERATION,
};
static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
@@ -1579,6 +1580,7 @@ static const u32 msr_based_features_all_except_vmx[] = {
MSR_IA32_UCODE_REV,
MSR_IA32_ARCH_CAPABILITIES,
MSR_IA32_PERF_CAPABILITIES,
+ MSR_VIRTUAL_ENUMERATION,
};
static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
@@ -1621,7 +1623,8 @@ static bool kvm_is_immutable_feature_msr(u32 msr)
ARCH_CAP_PSCHANGE_MC_NO | ARCH_CAP_TSX_CTRL_MSR | ARCH_CAP_TAA_NO | \
ARCH_CAP_SBDR_SSDP_NO | ARCH_CAP_FBSDP_NO | ARCH_CAP_PSDP_NO | \
ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO | \
- ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR | ARCH_CAP_BHI_NO)
+ ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR | ARCH_CAP_BHI_NO | \
+ ARCH_CAP_VIRTUAL_ENUM)
static u64 kvm_get_arch_capabilities(void)
{
@@ -1635,6 +1638,17 @@ static u64 kvm_get_arch_capabilities(void)
*/
data |= ARCH_CAP_PSCHANGE_MC_NO;
+ /*
+ * Virtual enumeration is a paravirt feature. The only usage for now
+ * is to bridge the gap caused by microarchitecture changes between
+ * different Intel processors. And its usage is linked to "virtualize
+ * IA32_SPEC_CTRL" which is a VMX feature. Whether AMD SVM can benefit
+ * from the same usage and how to implement it is still unclear. Limit
+ * virtual enumeration to VMX.
+ */
+ if (static_call(kvm_x86_has_emulated_msr)(NULL, MSR_VIRTUAL_ENUMERATION))
+ data |= ARCH_CAP_VIRTUAL_ENUM;
+
/*
* If we're doing cache flushes (either "always" or "cond")
* we will do one whenever the guest does a vmlaunch/vmresume.
--
2.39.3
* [RFC PATCH v3 08/10] KVM: VMX: Advertise MITIGATION_CTRL support
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
` (6 preceding siblings ...)
2024-04-10 14:34 ` [RFC PATCH v3 07/10] KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT Chao Gao
2024-04-10 14:34 ` [RFC PATCH v3 10/10] KVM: VMX: Advertise MITI_ENUM_RETPOLINE_S_SUPPORT Chao Gao
9 siblings, 0 replies; 20+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Zhang Chen, Chao Gao,
Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
From: Zhang Chen <chen.zhang@intel.com>
Advertise MITIGATION_CTRL support and emulate accesses to two associated
MSRs.
MITIGATION_CTRL is enumerated by bit 0 of MSR_VIRTUAL_ENUMERATION. If
supported, two virtual MSRs MSR_VIRTUAL_MITIGATION_ENUM(0x50000001) and
MSR_VIRTUAL_MITIGATION_CTRL(0x50000002) are available.
The guest can use the two MSRs to report its software mitigation status.
Based on this information, KVM can deploy alternative mitigations (e.g.,
hardware mitigations) for the guest if some of its software mitigations
are not effective on the host.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Co-developed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/svm/svm.c | 2 ++
arch/x86/kvm/vmx/vmx.c | 36 +++++++++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.h | 3 +++
arch/x86/kvm/x86.c | 3 +++
4 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e3406971a8b7..8a080592aa54 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4289,6 +4289,8 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
switch (index) {
case MSR_IA32_MCG_EXT_CTL:
case MSR_VIRTUAL_ENUMERATION:
+ case MSR_VIRTUAL_MITIGATION_ENUM:
+ case MSR_VIRTUAL_MITIGATION_CTRL:
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
return false;
case MSR_IA32_SMBASE:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index dcb06406fd09..cc260b14f8df 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1955,7 +1955,9 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
return !(msr->data & ~valid_bits);
}
-#define VIRTUAL_ENUMERATION_VALID_BITS 0ULL
+#define VIRTUAL_ENUMERATION_VALID_BITS VIRT_ENUM_MITIGATION_CTRL_SUPPORT
+#define MITI_ENUM_VALID_BITS 0ULL
+#define MITI_CTRL_VALID_BITS 0ULL
static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
@@ -1967,6 +1969,9 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
case MSR_VIRTUAL_ENUMERATION:
msr->data = VIRTUAL_ENUMERATION_VALID_BITS;
return 0;
+ case MSR_VIRTUAL_MITIGATION_ENUM:
+ msr->data = MITI_ENUM_VALID_BITS;
+ return 0;
default:
return KVM_MSR_RET_INVALID;
}
@@ -2124,6 +2129,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
msr_info->data = vmx->msr_virtual_enumeration;
break;
+ case MSR_VIRTUAL_MITIGATION_ENUM:
+ if (!msr_info->host_initiated &&
+ !(vmx->msr_virtual_enumeration & VIRT_ENUM_MITIGATION_CTRL_SUPPORT))
+ return 1;
+ msr_info->data = vmx->msr_virtual_mitigation_enum;
+ break;
+ case MSR_VIRTUAL_MITIGATION_CTRL:
+ if (!msr_info->host_initiated &&
+ !(vmx->msr_virtual_enumeration & VIRT_ENUM_MITIGATION_CTRL_SUPPORT))
+ return 1;
+ msr_info->data = vmx->msr_virtual_mitigation_ctrl;
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_info->index);
@@ -2476,7 +2493,23 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vmx->msr_virtual_enumeration = data;
break;
+ case MSR_VIRTUAL_MITIGATION_ENUM:
+ if (!msr_info->host_initiated)
+ return 1;
+ if (data & ~MITI_ENUM_VALID_BITS)
+ return 1;
+
+ vmx->msr_virtual_mitigation_enum = data;
+ break;
+ case MSR_VIRTUAL_MITIGATION_CTRL:
+ if (!msr_info->host_initiated &&
+ !(vmx->msr_virtual_enumeration & VIRT_ENUM_MITIGATION_CTRL_SUPPORT))
+ return 1;
+ if (data & ~MITI_CTRL_VALID_BITS)
+ return 1;
+ vmx->msr_virtual_mitigation_ctrl = data;
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_index);
@@ -4901,6 +4934,7 @@ static void __vmx_vcpu_reset(struct kvm_vcpu *vcpu)
*/
vmx->pi_desc.nv = POSTED_INTR_VECTOR;
vmx->pi_desc.sn = 1;
+ vmx->msr_virtual_mitigation_ctrl = 0;
}
static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 0519cf6187ac..7be5dd5dde6c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -296,6 +296,9 @@ struct vcpu_vmx {
u64 msr_virtual_enumeration;
+ u64 msr_virtual_mitigation_enum;
+ u64 msr_virtual_mitigation_ctrl;
+
u32 msr_ia32_umwait_control;
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4721b6fe7641..f55d26d7c79a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1565,6 +1565,8 @@ static const u32 emulated_msrs_all[] = {
MSR_K7_HWCR,
MSR_KVM_POLL_CONTROL,
MSR_VIRTUAL_ENUMERATION,
+ MSR_VIRTUAL_MITIGATION_ENUM,
+ MSR_VIRTUAL_MITIGATION_CTRL,
};
static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
@@ -1581,6 +1583,7 @@ static const u32 msr_based_features_all_except_vmx[] = {
MSR_IA32_ARCH_CAPABILITIES,
MSR_IA32_PERF_CAPABILITIES,
MSR_VIRTUAL_ENUMERATION,
+ MSR_VIRTUAL_MITIGATION_ENUM,
};
static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
--
2.39.3
* [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
` (7 preceding siblings ...)
2024-04-10 14:34 ` [RFC PATCH v3 08/10] KVM: VMX: Advertise MITIGATION_CTRL support Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
2024-06-11 1:34 ` Sean Christopherson
2024-04-10 14:34 ` [RFC PATCH v3 10/10] KVM: VMX: Advertise MITI_ENUM_RETPOLINE_S_SUPPORT Chao Gao
9 siblings, 1 reply; 20+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Zhang Chen, Chao Gao,
Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
From: Zhang Chen <chen.zhang@intel.com>
Allow the guest to report whether the short BHB-clearing sequence is in
use. KVM will deploy BHI_DIS_S for the guest if the short BHB-clearing
sequence is in use and the processor doesn't enumerate BHI_NO.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc260b14f8df..c5ceaebd954b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1956,8 +1956,8 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
}
#define VIRTUAL_ENUMERATION_VALID_BITS VIRT_ENUM_MITIGATION_CTRL_SUPPORT
-#define MITI_ENUM_VALID_BITS 0ULL
-#define MITI_CTRL_VALID_BITS 0ULL
+#define MITI_ENUM_VALID_BITS MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT
+#define MITI_CTRL_VALID_BITS MITI_CTRL_BHB_CLEAR_SEQ_S_USED
static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
@@ -2204,7 +2204,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
struct vmx_uret_msr *msr;
int ret = 0;
u32 msr_index = msr_info->index;
- u64 data = msr_info->data;
+ u64 data = msr_info->data, spec_ctrl_mask = 0;
u32 index;
switch (msr_index) {
@@ -2508,6 +2508,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (data & ~MITI_CTRL_VALID_BITS)
return 1;
+ if (data & MITI_CTRL_BHB_CLEAR_SEQ_S_USED &&
+ kvm_cpu_cap_has(X86_FEATURE_BHI_CTRL) &&
+ !(host_arch_capabilities & ARCH_CAP_BHI_NO))
+ spec_ctrl_mask |= SPEC_CTRL_BHI_DIS_S;
+
+ /*
+ * Intercept IA32_SPEC_CTRL to disallow the guest from changing
+ * certain bits if "virtualize IA32_SPEC_CTRL" isn't supported,
+ * e.g., in the nested case.
+ */
+ if (spec_ctrl_mask && !cpu_has_spec_ctrl_shadow())
+ vmx_enable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
+
+ /*
+ * KVM_CAP_FORCE_SPEC_CTRL takes precedence over
+ * MSR_VIRTUAL_MITIGATION_CTRL.
+ */
+ spec_ctrl_mask &= ~vmx->vcpu.kvm->arch.force_spec_ctrl_mask;
+
+ vmx->force_spec_ctrl_mask = vmx->vcpu.kvm->arch.force_spec_ctrl_mask |
+ spec_ctrl_mask;
+ vmx->force_spec_ctrl_value = vmx->vcpu.kvm->arch.force_spec_ctrl_value |
+ spec_ctrl_mask;
+ vmx_set_spec_ctrl(&vmx->vcpu, vmx->spec_ctrl_shadow);
+
vmx->msr_virtual_mitigation_ctrl = data;
break;
default:
--
2.39.3
* [RFC PATCH v3 10/10] KVM: VMX: Advertise MITI_ENUM_RETPOLINE_S_SUPPORT
2024-04-10 14:34 [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL Chao Gao
` (8 preceding siblings ...)
2024-04-10 14:34 ` [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT Chao Gao
@ 2024-04-10 14:34 ` Chao Gao
9 siblings, 0 replies; 20+ messages in thread
From: Chao Gao @ 2024-04-10 14:34 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: daniel.sneddon, pawan.kumar.gupta, Chao Gao, Zhang Chen,
Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
Allow the guest to report whether retpoline is used in supervisor mode.
KVM will deploy RRSBA_DIS_S for the guest if the guest is using retpoline
and the processor enumerates RRSBA.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c5ceaebd954b..235cb6ad69c0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1956,8 +1956,10 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
}
#define VIRTUAL_ENUMERATION_VALID_BITS VIRT_ENUM_MITIGATION_CTRL_SUPPORT
-#define MITI_ENUM_VALID_BITS MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT
-#define MITI_CTRL_VALID_BITS MITI_CTRL_BHB_CLEAR_SEQ_S_USED
+#define MITI_ENUM_VALID_BITS (MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT | \
+ MITI_ENUM_RETPOLINE_S_SUPPORT)
+#define MITI_CTRL_VALID_BITS (MITI_CTRL_BHB_CLEAR_SEQ_S_USED | \
+ MITI_CTRL_RETPOLINE_S_USED)
static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
@@ -2508,6 +2510,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (data & ~MITI_CTRL_VALID_BITS)
return 1;
+ if (data & MITI_CTRL_RETPOLINE_S_USED &&
+ kvm_cpu_cap_has(X86_FEATURE_RRSBA_CTRL) &&
+ host_arch_capabilities & ARCH_CAP_RRSBA)
+ spec_ctrl_mask |= SPEC_CTRL_RRSBA_DIS_S;
+
if (data & MITI_CTRL_BHB_CLEAR_SEQ_S_USED &&
kvm_cpu_cap_has(X86_FEATURE_BHI_CTRL) &&
!(host_arch_capabilities & ARCH_CAP_BHI_NO))
--
2.39.3
* Re: [RFC PATCH v3 01/10] KVM: VMX: Virtualize Intel IA32_SPEC_CTRL
2024-04-10 14:34 ` [RFC PATCH v3 01/10] KVM: VMX: " Chao Gao
@ 2024-04-12 4:07 ` Jim Mattson
2024-04-12 10:18 ` Chao Gao
0 siblings, 1 reply; 20+ messages in thread
From: Jim Mattson @ 2024-04-12 4:07 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta,
Sean Christopherson, Paolo Bonzini, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-doc
On Wed, Apr 10, 2024 at 7:35 AM Chao Gao <chao.gao@intel.com> wrote:
>
> From: Daniel Sneddon <daniel.sneddon@linux.intel.com>
>
> Currently, KVM disables interception of IA32_SPEC_CTRL after a non-zero value
> is written to IA32_SPEC_CTRL by the guest. The guest is allowed to write any
> value directly to hardware. There is a tertiary control for
> IA32_SPEC_CTRL. This control allows for bits in IA32_SPEC_CTRL to be
> masked to prevent guests from changing those bits.
>
> Add controls setting the mask for IA32_SPEC_CTRL and desired value for
> masked bits.
>
> These new controls are especially helpful for protecting guests that
> don't know about BHI_DIS_S and that are running on hardware that
> supports it. This allows the hypervisor to set BHI_DIS_S to fully
> protect the guest.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> [ add a new ioctl to report supported bits. Fix the inverted check ]
> Signed-off-by: Chao Gao <chao.gao@intel.com>
This looks quite Intel-centric. Isn't this feature essentially the
same as AMD's V_SPEC_CTRL? Can't we consolidate the code, rather than
having completely independent implementations for AMD and Intel?
* Re: [RFC PATCH v3 07/10] KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support
2024-04-10 14:34 ` [RFC PATCH v3 07/10] KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support Chao Gao
@ 2024-04-12 4:22 ` Jim Mattson
0 siblings, 0 replies; 20+ messages in thread
From: Jim Mattson @ 2024-04-12 4:22 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta, Zhang Chen,
Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
On Wed, Apr 10, 2024 at 8:08 AM Chao Gao <chao.gao@intel.com> wrote:
>
> From: Zhang Chen <chen.zhang@intel.com>
>
> Bit 63 of the IA32_ARCH_CAPABILITIES MSR indicates availability of the
> VIRTUAL_ENUMERATION_MSR (index 0x50000000), which enumerates features
> such as mitigation enumeration, which in turn is used by the guest to
> report the software mitigations it is using.
>
> Advertise ARCH_CAP_VIRTUAL_ENUM support for VMX and emulate read/write
> of the VIRTUAL_ENUMERATION_MSR. For now, the VIRTUAL_ENUMERATION_MSR is always 0.
>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> Co-developed-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> arch/x86/kvm/svm/svm.c | 1 +
> arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++++++
> arch/x86/kvm/vmx/vmx.h | 2 ++
> arch/x86/kvm/x86.c | 16 +++++++++++++++-
> 4 files changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d1a9f9951635..e3406971a8b7 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4288,6 +4288,7 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
> {
> switch (index) {
> case MSR_IA32_MCG_EXT_CTL:
> + case MSR_VIRTUAL_ENUMERATION:
> case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
> return false;
> case MSR_IA32_SMBASE:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index cdfcc1290d82..dcb06406fd09 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1955,6 +1955,8 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
> return !(msr->data & ~valid_bits);
> }
>
> +#define VIRTUAL_ENUMERATION_VALID_BITS 0ULL
> +
> static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> {
> switch (msr->index) {
> @@ -1962,6 +1964,9 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> if (!nested)
> return 1;
> return vmx_get_vmx_msr(&vmcs_config.nested, msr->index, &msr->data);
> + case MSR_VIRTUAL_ENUMERATION:
> + msr->data = VIRTUAL_ENUMERATION_VALID_BITS;
> + return 0;
> default:
> return KVM_MSR_RET_INVALID;
> }
> @@ -2113,6 +2118,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_IA32_DEBUGCTLMSR:
> msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> break;
> + case MSR_VIRTUAL_ENUMERATION:
> + if (!msr_info->host_initiated &&
> + !(vcpu->arch.arch_capabilities & ARCH_CAP_VIRTUAL_ENUM))
> + return 1;
> + msr_info->data = vmx->msr_virtual_enumeration;
> + break;
> default:
> find_uret_msr:
> msr = vmx_find_uret_msr(vmx, msr_info->index);
> @@ -2457,6 +2468,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> }
> ret = kvm_set_msr_common(vcpu, msr_info);
> break;
> + case MSR_VIRTUAL_ENUMERATION:
> + if (!msr_info->host_initiated)
> + return 1;
> + if (data & ~VIRTUAL_ENUMERATION_VALID_BITS)
> + return 1;
> +
> + vmx->msr_virtual_enumeration = data;
> + break;
>
> default:
> find_uret_msr:
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index a4dfe538e5a8..0519cf6187ac 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -294,6 +294,8 @@ struct vcpu_vmx {
> u64 force_spec_ctrl_mask;
> u64 force_spec_ctrl_value;
>
> + u64 msr_virtual_enumeration;
> +
> u32 msr_ia32_umwait_control;
>
> /*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9a59b5a93d0e..4721b6fe7641 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1564,6 +1564,7 @@ static const u32 emulated_msrs_all[] = {
>
> MSR_K7_HWCR,
> MSR_KVM_POLL_CONTROL,
> + MSR_VIRTUAL_ENUMERATION,
> };
>
> static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
> @@ -1579,6 +1580,7 @@ static const u32 msr_based_features_all_except_vmx[] = {
> MSR_IA32_UCODE_REV,
> MSR_IA32_ARCH_CAPABILITIES,
> MSR_IA32_PERF_CAPABILITIES,
> + MSR_VIRTUAL_ENUMERATION,
> };
>
> static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
> @@ -1621,7 +1623,8 @@ static bool kvm_is_immutable_feature_msr(u32 msr)
> ARCH_CAP_PSCHANGE_MC_NO | ARCH_CAP_TSX_CTRL_MSR | ARCH_CAP_TAA_NO | \
> ARCH_CAP_SBDR_SSDP_NO | ARCH_CAP_FBSDP_NO | ARCH_CAP_PSDP_NO | \
> ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO | \
> - ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR | ARCH_CAP_BHI_NO)
> + ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR | ARCH_CAP_BHI_NO | \
> + ARCH_CAP_VIRTUAL_ENUM)
>
> static u64 kvm_get_arch_capabilities(void)
> {
> @@ -1635,6 +1638,17 @@ static u64 kvm_get_arch_capabilities(void)
> */
> data |= ARCH_CAP_PSCHANGE_MC_NO;
>
> + /*
> + * Virtual enumeration is a paravirt feature. The only usage for now
> + * is to bridge the gap caused by microarchitecture changes between
> + * different Intel processors. And its usage is linked to "virtualize
> + * IA32_SPEC_CTRL" which is a VMX feature. Whether AMD SVM can benefit
> + * from the same usage and how to implement it is still unclear. Limit
> + * virtual enumeration to VMX.
> + */
Virtualize IA32_SPEC_CTRL has been an SVM feature for years. See
https://lore.kernel.org/kvm/160738054169.28590.5171339079028237631.stgit@bmoger-ubuntu/.
> + if (static_call(kvm_x86_has_emulated_msr)(NULL, MSR_VIRTUAL_ENUMERATION))
> + data |= ARCH_CAP_VIRTUAL_ENUM;
> +
> /*
> * If we're doing cache flushes (either "always" or "cond")
> * we will do one whenever the guest does a vmlaunch/vmresume.
> --
> 2.39.3
>
>
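For illustration, guest-side discovery of the enumeration MSR might look
roughly like this (a sketch; ARCH_CAP_VIRTUAL_ENUM is bit 63 per the patch,
MSR_VIRTUAL_ENUMERATION is index 0x50000000, and rdmsrl() is the usual
kernel helper):

	u64 arch_cap, virt_enum = 0;

	rdmsrl(MSR_IA32_ARCH_CAPABILITIES, arch_cap);
	if (arch_cap & ARCH_CAP_VIRTUAL_ENUM)	/* BIT_ULL(63) */
		rdmsrl(MSR_VIRTUAL_ENUMERATION, virt_enum);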
* Re: [RFC PATCH v3 01/10] KVM: VMX: Virtualize Intel IA32_SPEC_CTRL
2024-04-12 4:07 ` Jim Mattson
@ 2024-04-12 10:18 ` Chao Gao
2024-06-03 23:55 ` Sean Christopherson
0 siblings, 1 reply; 20+ messages in thread
From: Chao Gao @ 2024-04-12 10:18 UTC (permalink / raw)
To: Jim Mattson
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta,
Sean Christopherson, Paolo Bonzini, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-doc
On Thu, Apr 11, 2024 at 09:07:31PM -0700, Jim Mattson wrote:
>On Wed, Apr 10, 2024 at 7:35 AM Chao Gao <chao.gao@intel.com> wrote:
>>
>> From: Daniel Sneddon <daniel.sneddon@linux.intel.com>
>>
>> Currently, KVM disables interception of IA32_SPEC_CTRL after a non-zero value
>> is written to IA32_SPEC_CTRL by the guest. The guest is allowed to write any
>> value directly to hardware. There is a tertiary control for
>> IA32_SPEC_CTRL. This control allows for bits in IA32_SPEC_CTRL to be
>> masked to prevent guests from changing those bits.
>>
>> Add controls setting the mask for IA32_SPEC_CTRL and desired value for
>> masked bits.
>>
>> These new controls are especially helpful for protecting guests that
>> don't know about BHI_DIS_S and that are running on hardware that
>> supports it. This allows the hypervisor to set BHI_DIS_S to fully
>> protect the guest.
>>
>> Suggested-by: Sean Christopherson <seanjc@google.com>
>> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
>> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
>> [ add a new ioctl to report supported bits. Fix the inverted check ]
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>
>This looks quite Intel-centric. Isn't this feature essentially the
>same as AMD's V_SPEC_CTRL?
Yes, they are almost the same. One small difference is that Intel's version can
force some bits off, though I don't see how forcing bits off can be useful.
>Can't we consolidate the code, rather than
>having completely independent implementations for AMD and Intel?
We surely can consolidate the code. I will do this.
I have a question about V_SPEC_CTRL. With V_SPEC_CTRL, the SPEC_CTRL MSR retains
the host's value on VM-enter:
.macro RESTORE_GUEST_SPEC_CTRL
/* No need to do anything if SPEC_CTRL is unset or V_SPEC_CTRL is set */
ALTERNATIVE_2 "", \
"jmp 800f", X86_FEATURE_MSR_SPEC_CTRL, \
"", X86_FEATURE_V_SPEC_CTRL
Does this mean all mitigations used by the host will be enabled for the guest
and guests cannot disable them?
Is this intentional? This looks suboptimal. Why not set the SPEC_CTRL value to 0
and let the guest decide which features to enable? On the VMX side, we need the
host to apply certain hardware mitigations (i.e., BHI_DIS_S and RRSBA_DIS_S) for
the guest because BHI's software mitigation may be ineffective. I am not sure why
SVM is enabling all mitigations used by the host for guests. Wouldn't it be
better to enable them on an as-needed basis?
* Re: [RFC PATCH v3 01/10] KVM: VMX: Virtualize Intel IA32_SPEC_CTRL
2024-04-12 10:18 ` Chao Gao
@ 2024-06-03 23:55 ` Sean Christopherson
0 siblings, 0 replies; 20+ messages in thread
From: Sean Christopherson @ 2024-06-03 23:55 UTC (permalink / raw)
To: Chao Gao
Cc: Jim Mattson, kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta,
Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, linux-doc
On Fri, Apr 12, 2024, Chao Gao wrote:
> On Thu, Apr 11, 2024 at 09:07:31PM -0700, Jim Mattson wrote:
> >On Wed, Apr 10, 2024 at 7:35 AM Chao Gao <chao.gao@intel.com> wrote:
> >>
> >> From: Daniel Sneddon <daniel.sneddon@linux.intel.com>
> >>
> >> Currently, KVM disables interception of IA32_SPEC_CTRL after a non-zero value
> >> is written to IA32_SPEC_CTRL by the guest. The guest is allowed to write any
> >> value directly to hardware. There is a tertiary control for
> >> IA32_SPEC_CTRL. This control allows for bits in IA32_SPEC_CTRL to be
> >> masked to prevent guests from changing those bits.
> >>
> >> Add controls setting the mask for IA32_SPEC_CTRL and desired value for
> >> masked bits.
> >>
> >> These new controls are especially helpful for protecting guests that
> >> don't know about BHI_DIS_S and that are running on hardware that
> >> supports it. This allows the hypervisor to set BHI_DIS_S to fully
> >> protect the guest.
> >>
> >> Suggested-by: Sean Christopherson <seanjc@google.com>
> >> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
> >> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> >> [ add a new ioctl to report supported bits. Fix the inverted check ]
> >> Signed-off-by: Chao Gao <chao.gao@intel.com>
> >
> >This looks quite Intel-centric. Isn't this feature essentially the
> >same as AMD's V_SPEC_CTRL?
In spirit, yes. In practice, not really. The implementations required for each
end up being quite different. I think the only bit of code that could be reused
by SVM, and isn't already, is the generation of supported_force_spec_ctrl.
+ kvm_caps.supported_force_spec_ctrl = 0;
+
+ if (cpu_has_spec_ctrl_shadow()) {
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_IBRS;
+
+ if (boot_cpu_has(X86_FEATURE_STIBP))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_STIBP;
+
+ if (boot_cpu_has(X86_FEATURE_SSBD))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_SSBD;
+
+ if (boot_cpu_has(X86_FEATURE_RRSBA_CTRL) &&
+ (host_arch_capabilities & ARCH_CAP_RRSBA))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_RRSBA_DIS_S;
+
+ if (boot_cpu_has(X86_FEATURE_BHI_CTRL))
+ kvm_caps.supported_force_spec_ctrl |= SPEC_CTRL_BHI_DIS_S;
+ }
> Yes, they are almost the same. One small difference is that Intel's version can
> force some bits off, though I don't see how forcing bits off can be useful.
Another not-so-small difference is that Intel's version can also force bits *on*,
and force them on only for the guest with minimal overhead.
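The net effect of the mask/shadow pair boils down to something like this (a
sketch of the architectural behavior, not KVM code; the names are
illustrative):

	/*
	 * Bits set in the mask are owned by the VMM: guest RDMSR returns
	 * the shadow (the guest's last written value), while the CPU runs
	 * with the VMM-chosen value in the masked bits.
	 */
	u64 effective_spec_ctrl(u64 guest_written, u64 mask, u64 forced_value)
	{
		return (guest_written & ~mask) | (forced_value & mask);
	}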
> >Can't we consolidate the code, rather than
> >having completely independent implementations for AMD and Intel?
>
> We surely can consolidate the code. I will do this.
>
> I have a question about V_SPEC_CTRL. With V_SPEC_CTRL, the SPEC_CTRL MSR retains
> the host's value on VM-enter:
>
> .macro RESTORE_GUEST_SPEC_CTRL
> /* No need to do anything if SPEC_CTRL is unset or V_SPEC_CTRL is set */
> ALTERNATIVE_2 "", \
> "jmp 800f", X86_FEATURE_MSR_SPEC_CTRL, \
> "", X86_FEATURE_V_SPEC_CTRL
>
> Does this mean all mitigations used by the host will be enabled for the guest
> and guests cannot disable them?
Yes.
> Is this intentional? This looks suboptimal. Why not set the SPEC_CTRL value to 0
> and let the guest decide which features to enable? On the VMX side, we need the
> host to apply certain hardware mitigations (i.e., BHI_DIS_S and RRSBA_DIS_S) for
> the guest because BHI's software mitigation may be ineffective. I am not sure why
> SVM is enabling all mitigations used by the host for guests. Wouldn't it be
> better to enable them on an as-needed basis?
AMD's V_SPEC_CTRL doesn't provide a fast context switch of SPEC_CTRL; it performs
a bitwise-OR of the host and guest values. So to load a subset (or superset) of
the host protections, KVM would need to do an extra WRMSR before VMRUN, and again
after VMRUN.
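In other words, the effective guest value with V_SPEC_CTRL is (a sketch of
the semantics described above, not actual code):

	/* Host-enabled mitigation bits can't be cleared by the guest. */
	effective_spec_ctrl = host_spec_ctrl | vmcb_guest_spec_ctrl;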
That said, I have no idea whether or not avoiding WRMSR on AMD is optimal.
* Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
2024-04-10 14:34 ` [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT Chao Gao
@ 2024-06-11 1:34 ` Sean Christopherson
2024-06-11 10:48 ` Chao Gao
0 siblings, 1 reply; 20+ messages in thread
From: Sean Christopherson @ 2024-06-11 1:34 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta, Zhang Chen,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
On Wed, Apr 10, 2024, Chao Gao wrote:
> From: Zhang Chen <chen.zhang@intel.com>
>
> Allow the guest to report whether the short BHB-clearing sequence is in use.
>
> KVM will deploy BHI_DIS_S for the guest if the short BHB-clearing
> sequence is in use and the processor doesn't enumerate BHI_NO.
>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> arch/x86/kvm/vmx/vmx.c | 31 ++++++++++++++++++++++++++++---
> 1 file changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index cc260b14f8df..c5ceaebd954b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1956,8 +1956,8 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
> }
>
> #define VIRTUAL_ENUMERATION_VALID_BITS VIRT_ENUM_MITIGATION_CTRL_SUPPORT
> -#define MITI_ENUM_VALID_BITS 0ULL
> -#define MITI_CTRL_VALID_BITS 0ULL
> +#define MITI_ENUM_VALID_BITS MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT
> +#define MITI_CTRL_VALID_BITS MITI_CTRL_BHB_CLEAR_SEQ_S_USED
>
> static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> {
> @@ -2204,7 +2204,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> struct vmx_uret_msr *msr;
> int ret = 0;
> u32 msr_index = msr_info->index;
> - u64 data = msr_info->data;
> + u64 data = msr_info->data, spec_ctrl_mask = 0;
> u32 index;
>
> switch (msr_index) {
> @@ -2508,6 +2508,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (data & ~MITI_CTRL_VALID_BITS)
> return 1;
>
> + if (data & MITI_CTRL_BHB_CLEAR_SEQ_S_USED &&
> + kvm_cpu_cap_has(X86_FEATURE_BHI_CTRL) &&
> + !(host_arch_capabilities & ARCH_CAP_BHI_NO))
> + spec_ctrl_mask |= SPEC_CTRL_BHI_DIS_S;
> +
> + /*
> + * Intercept IA32_SPEC_CTRL to disallow guest from changing
> + * certain bits if "virtualize IA32_SPEC_CTRL" isn't supported
> + * e.g., in nested case.
> + */
> + if (spec_ctrl_mask && !cpu_has_spec_ctrl_shadow())
> + vmx_enable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
> +
> + /*
> + * KVM_CAP_FORCE_SPEC_CTRL takes precedence over
> + * MSR_VIRTUAL_MITIGATION_CTRL.
> + */
> + spec_ctrl_mask &= ~vmx->vcpu.kvm->arch.force_spec_ctrl_mask;
> +
> + vmx->force_spec_ctrl_mask = vmx->vcpu.kvm->arch.force_spec_ctrl_mask |
> + spec_ctrl_mask;
> + vmx->force_spec_ctrl_value = vmx->vcpu.kvm->arch.force_spec_ctrl_value |
> + spec_ctrl_mask;
> + vmx_set_spec_ctrl(&vmx->vcpu, vmx->spec_ctrl_shadow);
> +
> vmx->msr_virtual_mitigation_ctrl = data;
> break;
I continue to find all of this unpalatable. The guest tells KVM what software
mitigations the guest is using, and then KVM is supposed to translate that into
some hardware functionality? And merge that with userspace's own overrides?
Blech.
With KVM_CAP_FORCE_SPEC_CTRL, I don't see any reason for KVM to support the
Intel-defined virtual MSRs. If the userspace VMM wants to play nice with the
Intel-defined stuff, then userspace can advertise the MSRs and use an MSR filter
to intercept and "emulate" the MSRs. They should be set-and-forget MSRs, so
there's no need for KVM to handle them for performance reasons.
That way KVM doesn't need to deal with the virtual MSRs, userspace can make
an informed decision when deciding how to set KVM_CAP_FORCE_SPEC_CTRL, and as a
bonus, rollouts for new mitigation thingies should be faster as updating userspace
is typically easier than updating the kernel/KVM.
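The userspace side of that could be wired up roughly like so (a sketch: the
KVM_CAP_X86_USER_SPACE_MSR and KVM_X86_SET_MSR_FILTER pieces are existing
uAPI, while the exact virtual-MSR indices beyond MSR_VIRTUAL_ENUMERATION at
0x50000000 are assumed here):

	/* Exit to userspace for MSR accesses denied by the filter. */
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_X86_USER_SPACE_MSR,
		.args[0] = KVM_MSR_EXIT_REASON_FILTER,
	};
	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	/* Clear bits in the bitmap mean "filtered", i.e., intercepted. */
	uint8_t bitmap = 0;
	struct kvm_msr_filter filter = {
		.flags = KVM_MSR_FILTER_DEFAULT_ALLOW,
		.ranges[0] = {
			.flags  = KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE,
			.base   = 0x50000000,
			.nmsrs  = 3,
			.bitmap = &bitmap,
		},
	};
	ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter);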
* Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
2024-06-11 1:34 ` Sean Christopherson
@ 2024-06-11 10:48 ` Chao Gao
2024-06-11 13:34 ` Sean Christopherson
0 siblings, 1 reply; 20+ messages in thread
From: Chao Gao @ 2024-06-11 10:48 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta, Zhang Chen,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
>> + if (data & MITI_CTRL_BHB_CLEAR_SEQ_S_USED &&
>> + kvm_cpu_cap_has(X86_FEATURE_BHI_CTRL) &&
>> + !(host_arch_capabilities & ARCH_CAP_BHI_NO))
>> + spec_ctrl_mask |= SPEC_CTRL_BHI_DIS_S;
>> +
>> + /*
>> + * Intercept IA32_SPEC_CTRL to disallow guest from changing
>> + * certain bits if "virtualize IA32_SPEC_CTRL" isn't supported
>> + * e.g., in nested case.
>> + */
>> + if (spec_ctrl_mask && !cpu_has_spec_ctrl_shadow())
>> + vmx_enable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
>> +
>> + /*
>> + * KVM_CAP_FORCE_SPEC_CTRL takes precedence over
>> + * MSR_VIRTUAL_MITIGATION_CTRL.
>> + */
>> + spec_ctrl_mask &= ~vmx->vcpu.kvm->arch.force_spec_ctrl_mask;
>> +
>> + vmx->force_spec_ctrl_mask = vmx->vcpu.kvm->arch.force_spec_ctrl_mask |
>> + spec_ctrl_mask;
>> + vmx->force_spec_ctrl_value = vmx->vcpu.kvm->arch.force_spec_ctrl_value |
>> + spec_ctrl_mask;
>> + vmx_set_spec_ctrl(&vmx->vcpu, vmx->spec_ctrl_shadow);
>> +
>> vmx->msr_virtual_mitigation_ctrl = data;
>> break;
>
>I continue to find all of this unpalatable. The guest tells KVM what software
>mitigations the guest is using, and then KVM is supposed to translate that into
>some hardware functionality? And merge that with userspace's own overrides?
Yes. It is ugly. I will drop all Intel-defined stuff from KVM. Actually, I
wanted to punt to userspace ...
>
>Blech.
>
>With KVM_CAP_FORCE_SPEC_CTRL, I don't see any reason for KVM to support the
>Intel-defined virtual MSRs. If the userspace VMM wants to play nice with the
>Intel-defined stuff, then userspace can advertise the MSRs and use an MSR filter
>to intercept and "emulate" the MSRs. They should be set-and-forget MSRs, so
>there's no need for KVM to handle them for performance reasons.
... I had this idea of implementing policy-related stuff in userspace, and I wrote
in the cover-letter:
"""
1. the KVM<->userspace ABI defined in patch 1
I am wondering if we can allow the userspace to configure the mask
and the shadow value during guest's lifetime and do it on a vCPU basis.
this way, in conjunction with "virtual MSRs" or any other interfaces,
the userspace can adjust hardware mitigations applied to the guest during
guest's lifetime e.g., for the best performance.
"""
As said, this requires some tweaks to KVM_CAP_FORCE_SPEC_CTRL, such as making
the mask and shadow values adjustable and applicable on a per-vCPU basis. The
tweaks are not necessarily for Intel-defined virtual MSRs; if there were other
preferable interfaces, they could also benefit from these changes.
Any objections to these tweaks to KVM_CAP_FORCE_SPEC_CTRL?
>
>That way KVM doesn't need to deal with the virtual MSRs, userspace can make
>an informed decision when deciding how to set KVM_CAP_FORCE_SPEC_CTRL, and as a
>bonus, rollouts for new mitigation thingies should be faster as updating userspace
>is typically easier than updating the kernel/KVM.
Good point!
* Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
2024-06-11 10:48 ` Chao Gao
@ 2024-06-11 13:34 ` Sean Christopherson
2024-06-11 14:08 ` Chao Gao
0 siblings, 1 reply; 20+ messages in thread
From: Sean Christopherson @ 2024-06-11 13:34 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta, Zhang Chen,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
On Tue, Jun 11, 2024, Chao Gao wrote:
> >I continue to find all of this unpalatable. The guest tells KVM what software
> >mitigations the guest is using, and then KVM is supposed to translate that into
> >some hardware functionality? And merge that with userspace's own overrides?
>
> Yes. It is ugly. I will drop all Intel-defined stuff from KVM. Actually, I
> wanted to punt to userspace ...
>
> >
> >Blech.
> >
> >With KVM_CAP_FORCE_SPEC_CTRL, I don't see any reason for KVM to support the
> >Intel-defined virtual MSRs. If the userspace VMM wants to play nice with the
> >Intel-defined stuff, then userspace can advertise the MSRs and use an MSR filter
> >to intercept and "emulate" the MSRs. They should be set-and-forget MSRs, so
> >there's no need for KVM to handle them for performance reasons.
>
> ... I had this idea of implementing policy-related stuff in userspace, and I wrote
> in the cover-letter:
>
> """
> 1. the KVM<->userspace ABI defined in patch 1
>
> I am wondering if we can allow the userspace to configure the mask
> and the shadow value during guest's lifetime and do it on a vCPU basis.
> this way, in conjunction with "virtual MSRs" or any other interfaces,
> the userspace can adjust hardware mitigations applied to the guest during
> guest's lifetime e.g., for the best performance.
> """
Gah, sorry, I speed-read the cover letter and didn't take the time to process that.
> As said, this requires some tweaks to KVM_CAP_FORCE_SPEC_CTRL, such as making
> the mask and shadow values adjustable and applicable on a per-vCPU basis. The
> tweaks are not necessarily for Intel-defined virtual MSRs; if there were other
> preferable interfaces, they could also benefit from these changes.
>
> Any objections to these tweaks to KVM_CAP_FORCE_SPEC_CTRL?
Why does KVM_CAP_FORCE_SPEC_CTRL need to be per-vCPU? Won't the CPU bugs and
mitigations be system-wide / VM-wide?
* Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
2024-06-11 13:34 ` Sean Christopherson
@ 2024-06-11 14:08 ` Chao Gao
2024-06-11 16:32 ` Sean Christopherson
0 siblings, 1 reply; 20+ messages in thread
From: Chao Gao @ 2024-06-11 14:08 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta, Zhang Chen,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
On Tue, Jun 11, 2024 at 06:34:49AM -0700, Sean Christopherson wrote:
>On Tue, Jun 11, 2024, Chao Gao wrote:
>> >I continue to find all of this unpalatable. The guest tells KVM what software
>> >mitigations the guest is using, and then KVM is supposed to translate that into
>> >some hardware functionality? And merge that with userspace's own overrides?
>>
>> Yes. It is ugly. I will drop all Intel-defined stuff from KVM. Actually, I
>> wanted to punt to userspace ...
>>
>> >
>> >Blech.
>> >
>> >With KVM_CAP_FORCE_SPEC_CTRL, I don't see any reason for KVM to support the
>> >Intel-defined virtual MSRs. If the userspace VMM wants to play nice with the
>> >Intel-defined stuff, then userspace can advertise the MSRs and use an MSR filter
>> >to intercept and "emulate" the MSRs. They should be set-and-forget MSRs, so
>> >there's no need for KVM to handle them for performance reasons.
>>
>> ... I had this idea of implementing policy-related stuff in userspace, and I wrote
>> in the cover-letter:
>>
>> """
>> 1. the KVM<->userspace ABI defined in patch 1
>>
>> I am wondering if we can allow the userspace to configure the mask
>> and the shadow value during guest's lifetime and do it on a vCPU basis.
>> this way, in conjunction with "virtual MSRs" or any other interfaces,
>> the userspace can adjust hardware mitigations applied to the guest during
>> guest's lifetime e.g., for the best performance.
>> """
>
>Gah, sorry, I speed read the cover letter and didn't take the time to process that.
>
>> As said, this requires some tweaks to KVM_CAP_FORCE_SPEC_CTRL, such as making
>> the mask and shadow values adjustable and applicable on a per-vCPU basis. The
>> tweaks are not necessarily for Intel-defined virtual MSRs; if there were other
>> preferable interfaces, they could also benefit from these changes.
>>
>> Any objections to these tweaks to KVM_CAP_FORCE_SPEC_CTRL?
>
>Why does KVM_CAP_FORCE_SPEC_CTRL need to be per-vCPU? Won't the CPU bugs and
>mitigations be system-wide / VM-wide?
Because spec_ctrl is per-vCPU and Intel-defined virtual MSRs are also per-vCPU.
That is, a guest __can__ configure different values in the virtual MSRs on
different vCPUs, even though a sane guest won't do this. If KVM doesn't want to
rule out the possibility of supporting the Intel-defined virtual MSRs in
userspace, or any other per-vCPU interface, KVM_CAP_FORCE_SPEC_CTRL needs to be
per-vCPU.
Implementation-wise, being per-vCPU is simpler because, otherwise, once userspace
adjusts the hardware mitigations to enforce, KVM needs to kick all vCPUs. This
will add more complexity.
And IMO, requiring guests to deploy the same mitigations on all vCPUs is an
unnecessary limitation.
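As a concrete (and purely hypothetical) sketch of what the vCPU-scoped tweak
could look like from userspace, assuming the mask/value pair moves into the
KVM_ENABLE_CAP args and is accepted on a vCPU fd:

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_FORCE_SPEC_CTRL,
		.args[0] = SPEC_CTRL_BHI_DIS_S,	/* bits the guest can't change */
		.args[1] = SPEC_CTRL_BHI_DIS_S,	/* value forced for those bits */
	};
	ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);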
* Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
2024-06-11 14:08 ` Chao Gao
@ 2024-06-11 16:32 ` Sean Christopherson
0 siblings, 0 replies; 20+ messages in thread
From: Sean Christopherson @ 2024-06-11 16:32 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, daniel.sneddon, pawan.kumar.gupta, Zhang Chen,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
On Tue, Jun 11, 2024, Chao Gao wrote:
> On Tue, Jun 11, 2024 at 06:34:49AM -0700, Sean Christopherson wrote:
> >> As said, this requires some tweaks to KVM_CAP_FORCE_SPEC_CTRL, such as making
> >> the mask and shadow values adjustable and applicable on a per-vCPU basis. The
> >> tweaks are not necessarily for Intel-defined virtual MSRs; if there were other
> >> preferable interfaces, they could also benefit from these changes.
> >>
> >> Any objections to these tweaks to KVM_CAP_FORCE_SPEC_CTRL?
> >
> >Why does KVM_CAP_FORCE_SPEC_CTRL need to be per-vCPU? Won't the CPU bugs and
> >mitigations be system-wide / VM-wide?
>
> Because spec_ctrl is per-vCPU and Intel-defined virtual MSRs are also per-vCPU.
I figured that was the answer, but part of me was hopeful :-)
> That is, a guest __can__ configure different values in the virtual MSRs on
> different vCPUs, even though a sane guest won't do this. If KVM doesn't want to
> rule out the possibility of supporting the Intel-defined virtual MSRs in
> userspace, or any other per-vCPU interface, KVM_CAP_FORCE_SPEC_CTRL needs to be
> per-vCPU.
>
> Implementation-wise, being per-vCPU is simpler because, otherwise, once userspace
> adjusts the hardware mitigations to enforce, KVM needs to kick all vCPUs. This
> will add more complexity.
+1, I even typed up as much before reading this paragraph.
> And IMO, requiring guests to deploy the same mitigations on all vCPUs is an
> unnecessary limitation.
Yeah, I can see how it would make things weird for no good reason.
So yeah, if the only thing stopping us from letting userspace deal with the virtual
MSRs is converting to a vCPU-scoped ioctl(), then by all means, let's do that.