[PATCH v6 0/5] KVM: Expose speculation control feature to guests

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v6 0/5] KVM: Expose speculation control feature to guests
@ 2018-02-01 21:59 KarimAllah Ahmed
  2018-02-01 21:59 ` [PATCH v6 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX KarimAllah Ahmed
                   ` (4 more replies)
  0 siblings, 5 replies; 34+ messages in thread
From: KarimAllah Ahmed @ 2018-02-01 21:59 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: KarimAllah Ahmed, Andi Kleen, Andrea Arcangeli, Andy Lutomirski,
	Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov,
	Dan Williams, Dave Hansen, David Woodhouse, Greg Kroah-Hartman,
	H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel,
	Jun Nakajima, Laura Abbott, Linus Torvalds,
	Masami Hiramatsu <m

Add direct access to speculation control MSRs for KVM guests. This allows the
guest to protect itself against Spectre V2 using IBRS+IBPB instead of a
retpoline+IBPB based approach.

It also exposes the ARCH_CAPABILITIES MSR which is used by Intel processors to
indicate RDCL_NO and IBRS_ALL.

Keep in mind that the SVM part of the patch is unchanged this time. Mostly to
get feedback/confirmation about the nested handling for VMX first, once this is
done I will update SVM as well.

v6:
- Do not penalize (save/restore IBRS) all L2 guests when anyone of them starts
  using the SPEC_CTRL.

v5:
- svm: add PRED_CMD and SPEC_CTRL to direct_access_msrs list.
- vmx: check also for X86_FEATURE_SPEC_CTRL for msr reads and writes.
- vmx: Use MSR_TYPE_W instead of MSR_TYPE_R for the nested IBPB MSR
- rewrite commit message for IBPB patch [2/5] (Ashok)

v4:
- Add IBRS passthrough for SVM (5/5).
- Handle nested guests properly.
- expose F(IBRS) in kvm_cpuid_8000_0008_ebx_x86_features

Ashok Raj (1):
  KVM: x86: Add IBPB support

KarimAllah Ahmed (4):
  KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
  KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL

 arch/x86/kvm/cpuid.c |  22 ++++--
 arch/x86/kvm/cpuid.h |   1 +
 arch/x86/kvm/svm.c   |  87 +++++++++++++++++++++++
 arch/x86/kvm/vmx.c   | 196 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c   |   1 +
 5 files changed, 299 insertions(+), 8 deletions(-)

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org

-- 
2.7.4

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v6 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
  2018-02-01 21:59 [PATCH v6 0/5] KVM: Expose speculation control feature to guests KarimAllah Ahmed
@ 2018-02-01 21:59 ` KarimAllah Ahmed
  2018-02-02 17:37   ` Jim Mattson
  2018-02-01 21:59 ` [PATCH v6 2/5] KVM: x86: Add IBPB support KarimAllah Ahmed
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: KarimAllah Ahmed @ 2018-02-01 21:59 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: KarimAllah Ahmed, Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, David Woodhouse

[dwmw2: Stop using KF() for bits in it, too]
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/cpuid.c | 8 +++-----
 arch/x86/kvm/cpuid.h | 1 +
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..c0eb337 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void)
 
 #define F(x) bit(X86_FEATURE_##x)
 
-/* These are scattered features in cpufeatures.h. */
-#define KVM_CPUID_BIT_AVX512_4VNNIW     2
-#define KVM_CPUID_BIT_AVX512_4FMAPS     3
+/* For scattered features from cpufeatures.h; we currently expose none */
 #define KF(x) bit(KVM_CPUID_BIT_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
-		KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+		F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
@@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
 				entry->ecx &= ~F(PKU);
 			entry->edx &= kvm_cpuid_7_0_edx_x86_features;
-			entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
+			cpuid_mask(&entry->edx, CPUID_7_EDX);
 		} else {
 			entry->ebx = 0;
 			entry->ecx = 0;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index c2cea66..9a327d5 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
 	[CPUID_7_ECX]         = {         7, 0, CPUID_ECX},
 	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
+	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
 };
 
 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
  2018-02-01 21:59 ` [PATCH v6 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX KarimAllah Ahmed
@ 2018-02-02 17:37   ` Jim Mattson
  0 siblings, 0 replies; 34+ messages in thread
From: Jim Mattson @ 2018-02-02 17:37 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm list, LKML, the arch/x86 maintainers, Paolo Bonzini,
	Radim Krčmář, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, David Woodhouse

On Thu, Feb 1, 2018 at 1:59 PM, KarimAllah Ahmed <karahmed@amazon.de> wrote:
> [dwmw2: Stop using KF() for bits in it, too]
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-01 21:59 [PATCH v6 0/5] KVM: Expose speculation control feature to guests KarimAllah Ahmed
  2018-02-01 21:59 ` [PATCH v6 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX KarimAllah Ahmed
@ 2018-02-01 21:59 ` KarimAllah Ahmed
  2018-02-02 17:49   ` Konrad Rzeszutek Wilk
                     ` (2 more replies)
  2018-02-01 21:59 ` [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KarimAllah Ahmed
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 34+ messages in thread
From: KarimAllah Ahmed @ 2018-02-01 21:59 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Ashok Raj, Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Paolo Bonzini, Peter Zijlstra, David Woodhouse, KarimAllah Ahmed

From: Ashok Raj <ashok.raj@intel.com>

The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
control mechanism. It keeps earlier branches from influencing
later ones.

Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
It's a command that ensures predicted branch targets aren't used after
the barrier. Although IBRS and IBPB are enumerated by the same CPUID
enumeration, IBPB is very different.

IBPB helps mitigate against three potential attacks:

* Mitigate guests from being attacked by other guests.
  - This is addressed by issing IBPB when we do a guest switch.

* Mitigate attacks from guest/ring3->host/ring3.
  These would require a IBPB during context switch in host, or after
  VMEXIT. The host process has two ways to mitigate
  - Either it can be compiled with retpoline
  - If its going through context switch, and has set !dumpable then
    there is a IBPB in that path.
    (Tim's patch: https://patchwork.kernel.org/patch/10192871)
  - The case where after a VMEXIT you return back to Qemu might make
    Qemu attackable from guest when Qemu isn't compiled with retpoline.
  There are issues reported when doing IBPB on every VMEXIT that resulted
  in some tsc calibration woes in guest.

* Mitigate guest/ring0->host/ring0 attacks.
  When host kernel is using retpoline it is safe against these attacks.
  If host kernel isn't using retpoline we might need to do a IBPB flush on
  every VMEXIT.

Even when using retpoline for indirect calls, in certain conditions 'ret'
can use the BTB on Skylake-era CPUs. There are other mitigations
available like RSB stuffing/clearing.

* IBPB is issued only for SVM during svm_free_vcpu().
  VMX has a vmclear and SVM doesn't.  Follow discussion here:
  https://lkml.org/lkml/2018/1/15/146

Please refer to the following spec for more details on the enumeration
and control.

Refer here to get documentation about mitigations.

https://software.intel.com/en-us/side-channel-security-support

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
           - vmx: expose PRED_CMD if guest has it in CPUID
           - svm: only pass through IBPB if guest has it in CPUID
           - vmx: support !cpu_has_vmx_msr_bitmap()]
           - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
        PRED_CMD is a write-only MSR]

Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
---
v6:
- introduce msr_write_intercepted_l01

v5:
- Use MSR_TYPE_W instead of MSR_TYPE_R for the MSR.
- Always merge the bitmaps unconditionally.
- Add PRED_CMD to direct_access_msrs.
- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
- rewrite the commit message (from ashok.raj@)
---
 arch/x86/kvm/cpuid.c | 11 +++++++-
 arch/x86/kvm/svm.c   | 28 ++++++++++++++++++
 arch/x86/kvm/vmx.c   | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 116 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c0eb337..033004d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
 		0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
+	/* cpuid 0x80000008.ebx */
+	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+		F(IBPB);
+
 	/* cpuid 0xC0000001.edx */
 	const u32 kvm_cpuid_C000_0001_edx_x86_features =
 		F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		if (!g_phys_as)
 			g_phys_as = phys_as;
 		entry->eax = g_phys_as | (virt_as << 8);
-		entry->ebx = entry->edx = 0;
+		entry->edx = 0;
+		/* IBPB isn't necessarily present in hardware cpuid */
+		if (boot_cpu_has(X86_FEATURE_IBPB))
+			entry->ebx |= F(IBPB);
+		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
 		break;
 	}
 	case 0x80000019:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f40d0da..254eefb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -249,6 +249,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_CSTAR,				.always = true  },
 	{ .index = MSR_SYSCALL_MASK,			.always = true  },
 #endif
+	{ .index = MSR_IA32_PRED_CMD,			.always = false },
 	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
@@ -529,6 +530,7 @@ struct svm_cpu_data {
 	struct kvm_ldttss_desc *tss_desc;
 
 	struct page *save_area;
+	struct vmcb *current_vmcb;
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -1703,11 +1705,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
+	/*
+	 * The vmcb page can be recycled, causing a false negative in
+	 * svm_vcpu_load(). So do a full IBPB now.
+	 */
+	indirect_branch_prediction_barrier();
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
 	int i;
 
 	if (unlikely(cpu != vcpu->cpu)) {
@@ -1736,6 +1744,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (static_cpu_has(X86_FEATURE_RDTSCP))
 		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 
+	if (sd->current_vmcb != svm->vmcb) {
+		sd->current_vmcb = svm->vmcb;
+		indirect_branch_prediction_barrier();
+	}
 	avic_vcpu_load(vcpu, cpu);
 }
 
@@ -3684,6 +3696,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr);
 		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
+			return 1;
+
+		if (data & ~PRED_CMD_IBPB)
+			return 1;
+
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+		if (is_guest_mode(vcpu))
+			break;
+		set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
+		break;
 	case MSR_STAR:
 		svm->vmcb->save.star = data;
 		break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d46a61b..263eb1f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -592,6 +592,7 @@ struct vcpu_vmx {
 	u64 		      msr_host_kernel_gs_base;
 	u64 		      msr_guest_kernel_gs_base;
 #endif
+
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
 	u32 secondary_exec_control;
@@ -936,6 +937,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
 static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
 					    u16 error_code);
 static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
+							  u32 msr, int type);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -1907,6 +1910,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 	vmcs_write32(EXCEPTION_BITMAP, eb);
 }
 
+/*
+ * Check if MSR is intercepted for L01 MSR bitmap.
+ */
+static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
+{
+	unsigned long *msr_bitmap;
+	int f = sizeof(unsigned long);
+
+	if (!cpu_has_vmx_msr_bitmap())
+		return true;
+
+	msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
+
+	if (msr <= 0x1fff) {
+		return !!test_bit(msr, msr_bitmap + 0x800 / f);
+	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+		msr &= 0x1fff;
+		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
+	}
+
+	return true;
+}
+
 static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
 		unsigned long entry, unsigned long exit)
 {
@@ -2285,6 +2311,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
 		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
 		vmcs_load(vmx->loaded_vmcs->vmcs);
+		indirect_branch_prediction_barrier();
 	}
 
 	if (!already_loaded) {
@@ -3342,6 +3369,34 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr_info);
 		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+			return 1;
+
+		if (data & ~PRED_CMD_IBPB)
+			return 1;
+
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+
+		/*
+		 * For non-nested:
+		 * When it's written (to non-zero) for the first time, pass
+		 * it through.
+		 *
+		 * For nested:
+		 * The handling of the MSR bitmap for L2 guests is done in
+		 * nested_vmx_merge_msr_bitmap. We should not touch the
+		 * vmcs02.msr_bitmap here since it gets completely overwritten
+		 * in the merging.
+		 */
+		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
+					      MSR_TYPE_W);
+		break;
 	case MSR_IA32_CR_PAT:
 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
 			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -10044,9 +10099,23 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	struct page *page;
 	unsigned long *msr_bitmap_l1;
 	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
+	/*
+	 * pred_cmd is trying to verify two things:
+	 *
+	 * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
+	 *    ensures that we do not accidentally generate an L02 MSR bitmap
+	 *    from the L12 MSR bitmap that is too permissive.
+	 * 2. That L1 or L2s have actually used the MSR. This avoids
+	 *    unnecessarily merging of the bitmap if the MSR is unused. This
+	 *    works properly because we only update the L01 MSR bitmap lazily.
+	 *    So even if L0 should pass L1 these MSRs, the L01 bitmap is only
+	 *    updated to reflect this when L1 (or its L2s) actually write to
+	 *    the MSR.
+	 */
+	bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
 
-	/* This shortcut is ok because we support only x2APIC MSRs so far. */
-	if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
+	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
+	    !pred_cmd)
 		return false;
 
 	page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
@@ -10079,6 +10148,13 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 				MSR_TYPE_W);
 		}
 	}
+
+	if (pred_cmd)
+		nested_vmx_disable_intercept_for_msr(
+					msr_bitmap_l1, msr_bitmap_l0,
+					MSR_IA32_PRED_CMD,
+					MSR_TYPE_W);
+
 	kunmap(page);
 	kvm_release_page_clean(page);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-01 21:59 ` [PATCH v6 2/5] KVM: x86: Add IBPB support KarimAllah Ahmed
@ 2018-02-02 17:49   ` Konrad Rzeszutek Wilk
  2018-02-02 18:02     ` David Woodhouse
  2018-02-16  3:44   ` Jim Mattson
  2018-05-03  1:27   ` Wanpeng Li
  2 siblings, 1 reply; 34+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 17:49 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Ashok Raj, Asit Mallick, Dave Hansen,
	Arjan Van De Ven, Tim Chen, Linus Torvalds, Andrea Arcangeli,
	Andi Kleen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Andy Lutomirski, Greg KH, Paolo Bonzini, Peter Zijlstra,
	David Woodhouse

On Thu, Feb 01, 2018 at 10:59:43PM +0100, KarimAllah Ahmed wrote:
> From: Ashok Raj <ashok.raj@intel.com>
> 
> The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
> control mechanism. It keeps earlier branches from influencing
> later ones.
> 
> Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
> It's a command that ensures predicted branch targets aren't used after
> the barrier. Although IBRS and IBPB are enumerated by the same CPUID
> enumeration, IBPB is very different.
> 
> IBPB helps mitigate against three potential attacks:
> 
> * Mitigate guests from being attacked by other guests.
>   - This is addressed by issing IBPB when we do a guest switch.
> 
> * Mitigate attacks from guest/ring3->host/ring3.
>   These would require a IBPB during context switch in host, or after
>   VMEXIT. The host process has two ways to mitigate
>   - Either it can be compiled with retpoline
>   - If its going through context switch, and has set !dumpable then
>     there is a IBPB in that path.
>     (Tim's patch: https://patchwork.kernel.org/patch/10192871)
>   - The case where after a VMEXIT you return back to Qemu might make
>     Qemu attackable from guest when Qemu isn't compiled with retpoline.
>   There are issues reported when doing IBPB on every VMEXIT that resulted
>   in some tsc calibration woes in guest.
> 
> * Mitigate guest/ring0->host/ring0 attacks.
>   When host kernel is using retpoline it is safe against these attacks.
>   If host kernel isn't using retpoline we might need to do a IBPB flush on
>   every VMEXIT.
> 
> Even when using retpoline for indirect calls, in certain conditions 'ret'
> can use the BTB on Skylake-era CPUs. There are other mitigations
> available like RSB stuffing/clearing.
> 
> * IBPB is issued only for SVM during svm_free_vcpu().
>   VMX has a vmclear and SVM doesn't.  Follow discussion here:
>   https://lkml.org/lkml/2018/1/15/146
> 
> Please refer to the following spec for more details on the enumeration
> and control.
> 
> Refer here to get documentation about mitigations.
> 
> https://software.intel.com/en-us/side-channel-security-support
> 
> [peterz: rebase and changelog rewrite]
> [karahmed: - rebase
>            - vmx: expose PRED_CMD if guest has it in CPUID
>            - svm: only pass through IBPB if guest has it in CPUID
>            - vmx: support !cpu_has_vmx_msr_bitmap()]
>            - vmx: support nested]
> [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
>         PRED_CMD is a write-only MSR]
> 
> Cc: Asit Mallick <asit.k.mallick@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

with some small nits.
> ---
> v6:
> - introduce msr_write_intercepted_l01
> 
> v5:
> - Use MSR_TYPE_W instead of MSR_TYPE_R for the MSR.
> - Always merge the bitmaps unconditionally.
> - Add PRED_CMD to direct_access_msrs.
> - Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
> - rewrite the commit message (from ashok.raj@)
> ---
>  arch/x86/kvm/cpuid.c | 11 +++++++-
>  arch/x86/kvm/svm.c   | 28 ++++++++++++++++++
>  arch/x86/kvm/vmx.c   | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 116 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index c0eb337..033004d 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
>  		0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
>  
> +	/* cpuid 0x80000008.ebx */
> +	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
> +		F(IBPB);
> +
>  	/* cpuid 0xC0000001.edx */
>  	const u32 kvm_cpuid_C000_0001_edx_x86_features =
>  		F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
> @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  		if (!g_phys_as)
>  			g_phys_as = phys_as;
>  		entry->eax = g_phys_as | (virt_as << 8);
> -		entry->ebx = entry->edx = 0;
> +		entry->edx = 0;
> +		/* IBPB isn't necessarily present in hardware cpuid */

It is with x86/pti nowadays. I think you can remove that comment.

..snip..
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d46a61b..263eb1f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -592,6 +592,7 @@ struct vcpu_vmx {
>  	u64 		      msr_host_kernel_gs_base;
>  	u64 		      msr_guest_kernel_gs_base;
>  #endif
> +

Spurious..

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 17:49   ` Konrad Rzeszutek Wilk
@ 2018-02-02 18:02     ` David Woodhouse
  2018-02-02 19:56       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 34+ messages in thread
From: David Woodhouse @ 2018-02-02 18:02 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Ashok Raj, Asit Mallick, Dave Hansen,
	Arjan Van De Ven, Tim Chen, Linus Torvalds, Andrea Arcangeli,
	Andi Kleen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Andy Lutomirski, Greg KH, Paolo Bonzini, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 1478 bytes --]

On Fri, 2018-02-02 at 12:49 -0500, Konrad Rzeszutek Wilk wrote:
> > @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct
> > kvm_cpuid_entry2 *entry, u32 function,
> >                 if (!g_phys_as)
> >                         g_phys_as = phys_as;
> >                 entry->eax = g_phys_as | (virt_as << 8);
> > -               entry->ebx = entry->edx = 0;
> > +               entry->edx = 0;
> > +               /* IBPB isn't necessarily present in hardware cpuid>  */
> > +               if (boot_cpu_has(X86_FEATURE_IBPB))
> > +                       entry->ebx |= F(IBPB);
> > +               entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> > +               cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
> 
> It is with x86/pti nowadays. I think you can remove that comment.

In this code we use the actual CPUID instruction, then filter stuff out
of it (with &= kvm_cpuid_XXX_x86_features and then cpuid_mask() to turn
off any bits which were otherwise present in the hardware and *would*
have been supported by KVM, but which the kernel has decided to pretend
are not present.

Nothing would *set* the IBPB bit though, since that's a "virtual" bit
on Intel hardware. The comment explains why we have that |= F(IBPB),
and if the comment wasn't true, we wouldn't need that code either.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 18:02     ` David Woodhouse
@ 2018-02-02 19:56       ` Konrad Rzeszutek Wilk
  2018-02-02 20:16         ` David Woodhouse
  0 siblings, 1 reply; 34+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 19:56 UTC (permalink / raw)
  To: David Woodhouse
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Ashok Raj, Asit Mallick,
	Dave Hansen, Arjan Van De Ven, Tim Chen, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Andy Lutomirski, Greg KH, Paolo Bonzini,
	Peter Zijlstra

On Fri, Feb 02, 2018 at 06:02:24PM +0000, David Woodhouse wrote:
> On Fri, 2018-02-02 at 12:49 -0500, Konrad Rzeszutek Wilk wrote:
> > > @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct
> > > kvm_cpuid_entry2 *entry, u32 function,
> > >                 if (!g_phys_as)
> > >                         g_phys_as = phys_as;
> > >                 entry->eax = g_phys_as | (virt_as << 8);
> > > -               entry->ebx = entry->edx = 0;
> > > +               entry->edx = 0;
> > > +               /* IBPB isn't necessarily present in hardware cpuid>  */
> > > +               if (boot_cpu_has(X86_FEATURE_IBPB))
> > > +                       entry->ebx |= F(IBPB);
> > > +               entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> > > +               cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
> > 
> > It is with x86/pti nowadays. I think you can remove that comment.
> 
> In this code we use the actual CPUID instruction, then filter stuff out
> of it (with &= kvm_cpuid_XXX_x86_features and then cpuid_mask() to turn
> off any bits which were otherwise present in the hardware and *would*
> have been supported by KVM, but which the kernel has decided to pretend
> are not present.
> 
> Nothing would *set* the IBPB bit though, since that's a "virtual" bit
> on Intel hardware. The comment explains why we have that |= F(IBPB),
> and if the comment wasn't true, we wouldn't need that code either.

But this seems wrong. That is on Intel CPUs we will advertise on
AMD leaf that the IBPB feature is available.

Shouldn't we just check to see if the machine is AMD before advertising
this bit?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 19:56       ` Konrad Rzeszutek Wilk
@ 2018-02-02 20:16         ` David Woodhouse
  2018-02-02 20:28           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 34+ messages in thread
From: David Woodhouse @ 2018-02-02 20:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Ashok Raj, Asit Mallick,
	Dave Hansen, Arjan Van De Ven, Tim Chen, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Andy Lutomirski, Greg KH, Paolo Bonzini,
	Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 2099 bytes --]



On Fri, 2018-02-02 at 14:56 -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Feb 02, 2018 at 06:02:24PM +0000, David Woodhouse wrote:
> > 
> > On Fri, 2018-02-02 at 12:49 -0500, Konrad Rzeszutek Wilk wrote:
> > > 
> > > > 
> > > > @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct
> > > > kvm_cpuid_entry2 *entry, u32 function,
> > > >                  if (!g_phys_as)
> > > >                          g_phys_as = phys_as;
> > > >                  entry->eax = g_phys_as | (virt_as << 8);
> > > > -               entry->ebx = entry->edx = 0;
> > > > +               entry->edx = 0;
> > > > +               /* IBPB isn't necessarily present in hardware cpuid>  */
> > > > +               if (boot_cpu_has(X86_FEATURE_IBPB))
> > > > +                       entry->ebx |= F(IBPB);
> > > > +               entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> > > > +               cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
> > > It is with x86/pti nowadays. I think you can remove that comment.
> > In this code we use the actual CPUID instruction, then filter stuff out
> > of it (with &= kvm_cpuid_XXX_x86_features and then cpuid_mask() to turn
> > off any bits which were otherwise present in the hardware and *would*
> > have been supported by KVM, but which the kernel has decided to pretend
> > are not present.
> > 
> > Nothing would *set* the IBPB bit though, since that's a "virtual" bit
> > on Intel hardware. The comment explains why we have that |= F(IBPB),
> > and if the comment wasn't true, we wouldn't need that code either.
>
> But this seems wrong. That is on Intel CPUs we will advertise on
> AMD leaf that the IBPB feature is available.
> 
> Shouldn't we just check to see if the machine is AMD before advertising
> this bit?

No. The AMD feature bits give us more fine-grained support for exposing
IBPB or IBRS alone, so we expose those bits on Intel too.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 20:16         ` David Woodhouse
@ 2018-02-02 20:28           ` Konrad Rzeszutek Wilk
  2018-02-02 20:31             ` David Woodhouse
                               ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 20:28 UTC (permalink / raw)
  To: David Woodhouse
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Ashok Raj, Asit Mallick,
	Dave Hansen, Arjan Van De Ven, Tim Chen, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Andy Lutomirski, Greg KH, Paolo Bonzini,
	Peter Zijlstra

On Fri, Feb 02, 2018 at 08:16:15PM +0000, David Woodhouse wrote:
> 
> 
> On Fri, 2018-02-02 at 14:56 -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Feb 02, 2018 at 06:02:24PM +0000, David Woodhouse wrote:
> > > 
> > > On Fri, 2018-02-02 at 12:49 -0500, Konrad Rzeszutek Wilk wrote:
> > > > 
> > > > > 
> > > > > @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct
> > > > > kvm_cpuid_entry2 *entry, u32 function,
> > > > >                  if (!g_phys_as)
> > > > >                          g_phys_as = phys_as;
> > > > >                  entry->eax = g_phys_as | (virt_as << 8);
> > > > > -               entry->ebx = entry->edx = 0;
> > > > > +               entry->edx = 0;
> > > > > +               /* IBPB isn't necessarily present in hardware cpuid>  */
> > > > > +               if (boot_cpu_has(X86_FEATURE_IBPB))
> > > > > +                       entry->ebx |= F(IBPB);
> > > > > +               entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> > > > > +               cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
> > > > It is with x86/pti nowadays. I think you can remove that comment.
> > > In this code we use the actual CPUID instruction, then filter stuff out
> > > of it (with &= kvm_cpuid_XXX_x86_features and then cpuid_mask() to turn
> > > off any bits which were otherwise present in the hardware and *would*
> > > have been supported by KVM, but which the kernel has decided to pretend
> > > are not present.
> > > 
> > > Nothing would *set* the IBPB bit though, since that's a "virtual" bit
> > > on Intel hardware. The comment explains why we have that |= F(IBPB),
> > > and if the comment wasn't true, we wouldn't need that code either.
> >
> > But this seems wrong. That is on Intel CPUs we will advertise on
> > AMD leaf that the IBPB feature is available.
> > 
> > Shouldn't we just check to see if the machine is AMD before advertising
> > this bit?
> 
> No. The AMD feature bits give us more fine-grained support for exposing
> IBPB or IBRS alone, so we expose those bits on Intel too.

But but.. that runs smack against the idea of exposing a platform that
is as close to emulating the real hardware as possible.

As in I would never expect an Intel CPU to expose the IBPB on the 0x8000_0008
leaf. Hence KVM (nor any hypervisor) should not do it either.

Unless Intel is doing it? Did I miss a new spec update?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 20:28           ` Konrad Rzeszutek Wilk
@ 2018-02-02 20:31             ` David Woodhouse
  2018-02-02 20:52               ` Konrad Rzeszutek Wilk
  2018-02-02 20:52             ` Alan Cox
  2018-02-05 19:24             ` Paolo Bonzini
  2 siblings, 1 reply; 34+ messages in thread
From: David Woodhouse @ 2018-02-02 20:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Ashok Raj, Asit Mallick,
	Dave Hansen, Arjan Van De Ven, Tim Chen, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Andy Lutomirski, Greg KH, Paolo Bonzini,
	Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 664 bytes --]

On Fri, 2018-02-02 at 15:28 -0500, Konrad Rzeszutek Wilk wrote:
> 
> > 
> > No. The AMD feature bits give us more fine-grained support for exposing
> > IBPB or IBRS alone, so we expose those bits on Intel too.
> 
> But but.. that runs smack against the idea of exposing a platform that
> is as close to emulating the real hardware as possible.
> 
> As in I would never expect an Intel CPU to expose the IBPB on the 0x8000_0008
> leaf. Hence KVM (nor any hypervisor) should not do it either.
> 
> Unless Intel is doing it? Did I miss a new spec update?

Are you telling me there's no way you can infer from CPUID that you're
running in a hypervisor?

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 20:31             ` David Woodhouse
@ 2018-02-02 20:52               ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 34+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 20:52 UTC (permalink / raw)
  To: David Woodhouse
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Ashok Raj, Asit Mallick,
	Dave Hansen, Arjan Van De Ven, Tim Chen, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Andy Lutomirski, Greg KH, Paolo Bonzini,
	Peter Zijlstra

On Fri, Feb 02, 2018 at 08:31:27PM +0000, David Woodhouse wrote:
> On Fri, 2018-02-02 at 15:28 -0500, Konrad Rzeszutek Wilk wrote:
> > 
> > > 
> > > No. The AMD feature bits give us more fine-grained support for exposing
> > > IBPB or IBRS alone, so we expose those bits on Intel too.
> > 
> > But but.. that runs smack against the idea of exposing a platform that
> > is as close to emulating the real hardware as possible.
> > 
> > As in I would never expect an Intel CPU to expose the IBPB on the 0x8000_0008
> > leaf. Hence KVM (nor any hypervisor) should not do it either.
> > 
> > Unless Intel is doing it? Did I miss a new spec update?
> 
> Are you telling me there's no way you can infer from CPUID that you're
> running in a hypervisor?

That is not what I am saying. The CPUIDs 0x40000000 ... 0x400000ff
are reserved for hypervisor usage. The SDM is pretty clear about it.

The Intel SDM and the AMD equivalant are pretty clear about what the
other leafs should have on its platform.

[5 minutes later]

And I am eating my words here. 

CPUID.80000008 shows how MAXPHYSADDR is used (on the Intel SDM).

Never mind the noise.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 20:28           ` Konrad Rzeszutek Wilk
  2018-02-02 20:31             ` David Woodhouse
@ 2018-02-02 20:52             ` Alan Cox
  2018-02-05 19:22               ` Paolo Bonzini
  2018-02-05 19:24             ` Paolo Bonzini
  2 siblings, 1 reply; 34+ messages in thread
From: Alan Cox @ 2018-02-02 20:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Woodhouse, KarimAllah Ahmed, kvm, linux-kernel, x86,
	Ashok Raj, Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Paolo Bonzini, Peter Zijlstra

> > No. The AMD feature bits give us more fine-grained support for exposing
> > IBPB or IBRS alone, so we expose those bits on Intel too.  
> 
> But but.. that runs smack against the idea of exposing a platform that
> is as close to emulating the real hardware as possible.

Agreed, and it's asking for problems in the future if for example Intel
or another non AMD vendor did ever use that leaf for something different.

Now whether there ought to be an MSR range every vendor agrees is never
implemented so software can use it is an interesting discussion.

Alan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 20:52             ` Alan Cox
@ 2018-02-05 19:22               ` Paolo Bonzini
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2018-02-05 19:22 UTC (permalink / raw)
  To: Alan Cox, Konrad Rzeszutek Wilk
  Cc: David Woodhouse, KarimAllah Ahmed, kvm, linux-kernel, x86,
	Ashok Raj, Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Peter Zijlstra

On 02/02/2018 21:52, Alan Cox wrote:
>>> No. The AMD feature bits give us more fine-grained support for exposing
>>> IBPB or IBRS alone, so we expose those bits on Intel too.  
>> But but.. that runs smack against the idea of exposing a platform that
>> is as close to emulating the real hardware as possible.
> Agreed, and it's asking for problems in the future if for example Intel
> or another non AMD vendor did ever use that leaf for something different.

Leaves starting at 0 are reserved to Intel; leaves starting at
0x80000000 are reserved to AMD.

0x40000000 to 0x400000FF (some will say 0x4FFFFFFF) are reserved to
hypervisors.

> Now whether there ought to be an MSR range every vendor agrees is never
> implemented so software can use it is an interesting discussion.

For MSRs there is no explicit indication, but traditionally Intel is
using numbers based at 0 and AMD is using numbers based at 0xC0000000.

Furthermore, the manuals for virtualization extensions tell you that
Intel isn't planning to go beyond 0x1FFF, and AMD is planning to use
only 0xC0000000-0xC0001FFF and 0xC0010000-0xC0011FFF.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 20:28           ` Konrad Rzeszutek Wilk
  2018-02-02 20:31             ` David Woodhouse
  2018-02-02 20:52             ` Alan Cox
@ 2018-02-05 19:24             ` Paolo Bonzini
  2 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2018-02-05 19:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, David Woodhouse
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Ashok Raj, Asit Mallick,
	Dave Hansen, Arjan Van De Ven, Tim Chen, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Andy Lutomirski, Greg KH, Peter Zijlstra

On 02/02/2018 21:28, Konrad Rzeszutek Wilk wrote:
>>>> Nothing would *set* the IBPB bit though, since that's a "virtual" bit
>>>> on Intel hardware. The comment explains why we have that |= F(IBPB),
>>>> and if the comment wasn't true, we wouldn't need that code either.
>>> But this seems wrong. That is on Intel CPUs we will advertise on
>>> AMD leaf that the IBPB feature is available.
>>>
>>> Shouldn't we just check to see if the machine is AMD before advertising
>>> this bit?
>> No. The AMD feature bits give us more fine-grained support for exposing
>> IBPB or IBRS alone, so we expose those bits on Intel too.
> But but.. that runs smack against the idea of exposing a platform that
> is as close to emulating the real hardware as possible.
> 
> As in I would never expect an Intel CPU to expose the IBPB on the 0x8000_0008
> leaf. Hence KVM (nor any hypervisor) should not do it either.

This is KVM_GET_*SUPPORTED*_CPUID.  The actual CPUID bits that are
exposed (and also which CPUID leafs are there, even though this one is
present in both Intel and AMD) are determined by userspace.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-01 21:59 ` [PATCH v6 2/5] KVM: x86: Add IBPB support KarimAllah Ahmed
  2018-02-02 17:49   ` Konrad Rzeszutek Wilk
@ 2018-02-16  3:44   ` Jim Mattson
  2018-02-16  4:22     ` Andi Kleen
  2018-05-03  1:27   ` Wanpeng Li
  2 siblings, 1 reply; 34+ messages in thread
From: Jim Mattson @ 2018-02-16  3:44 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm list, LKML, the arch/x86 maintainers, Ashok Raj, Asit Mallick,
	Dave Hansen, Arjan Van De Ven, Tim Chen, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Andy Lutomirski, Greg KH, Paolo Bonzini,
	Peter Zijlstra, David Woodhouse

On Thu, Feb 1, 2018 at 1:59 PM, KarimAllah Ahmed <karahmed@amazon.de> wrote:

> @@ -3684,6 +3696,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>         case MSR_IA32_TSC:
>                 kvm_write_tsc(vcpu, msr);
>                 break;
> +       case MSR_IA32_PRED_CMD:
> +               if (!msr->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
> +                       return 1;
> +
> +               if (data & ~PRED_CMD_IBPB)
> +                       return 1;
> +
> +               if (!data)
> +                       break;
> +
> +               wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);

Should this be wrmsrl_safe? I don't see where we've verified host
support of this MSR.

> @@ -3342,6 +3369,34 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>         case MSR_IA32_TSC:
>                 kvm_write_tsc(vcpu, msr_info);
>                 break;
> +       case MSR_IA32_PRED_CMD:
> +               if (!msr_info->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> +                       return 1;
> +
> +               if (data & ~PRED_CMD_IBPB)
> +                       return 1;
> +
> +               if (!data)
> +                       break;
> +
> +               wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);

And here too...wrmsrl_safe?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-16  3:44   ` Jim Mattson
@ 2018-02-16  4:22     ` Andi Kleen
  0 siblings, 0 replies; 34+ messages in thread
From: Andi Kleen @ 2018-02-16  4:22 UTC (permalink / raw)
  To: Jim Mattson
  Cc: KarimAllah Ahmed, kvm list, LKML, the arch/x86 maintainers,
	Ashok Raj, Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Andy Lutomirski, Greg KH, Paolo Bonzini,
	Peter Zijlstra, David Woodhouse

> > +               wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> 
> Should this be wrmsrl_safe? I don't see where we've verified host
> support of this MSR.

In mainline all wrmsr are wrmsrl_safe now.

-Andi

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-01 21:59 ` [PATCH v6 2/5] KVM: x86: Add IBPB support KarimAllah Ahmed
  2018-02-02 17:49   ` Konrad Rzeszutek Wilk
  2018-02-16  3:44   ` Jim Mattson
@ 2018-05-03  1:27   ` Wanpeng Li
  2018-05-03  9:19     ` Paolo Bonzini
  2 siblings, 1 reply; 34+ messages in thread
From: Wanpeng Li @ 2018-05-03  1:27 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, LKML, the arch/x86 maintainers, Ashok Raj, Asit Mallick,
	Dave Hansen, Arjan Van De Ven, Tim Chen, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Andy Lutomirski, Greg KH, Paolo Bonzini,
	Peter Zijlstra, David Woodhouse

Hi Ashok，
2018-02-02 5:59 GMT+08:00 KarimAllah Ahmed <karahmed@amazon.de>:
> From: Ashok Raj <ashok.raj@intel.com>
>
> The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
> control mechanism. It keeps earlier branches from influencing
> later ones.
>
> Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
> It's a command that ensures predicted branch targets aren't used after
> the barrier. Although IBRS and IBPB are enumerated by the same CPUID
> enumeration, IBPB is very different.
>
> IBPB helps mitigate against three potential attacks:
>
> * Mitigate guests from being attacked by other guests.
>   - This is addressed by issing IBPB when we do a guest switch.
>
> * Mitigate attacks from guest/ring3->host/ring3.
>   These would require a IBPB during context switch in host, or after
>   VMEXIT. The host process has two ways to mitigate
>   - Either it can be compiled with retpoline
>   - If its going through context switch, and has set !dumpable then
>     there is a IBPB in that path.
>     (Tim's patch: https://patchwork.kernel.org/patch/10192871)
>   - The case where after a VMEXIT you return back to Qemu might make
>     Qemu attackable from guest when Qemu isn't compiled with retpoline.
>   There are issues reported when doing IBPB on every VMEXIT that resulted
>   in some tsc calibration woes in guest.
>
> * Mitigate guest/ring0->host/ring0 attacks.
>   When host kernel is using retpoline it is safe against these attacks.
>   If host kernel isn't using retpoline we might need to do a IBPB flush on
>   every VMEXIT.
>

So for 1) guest->guest attacks 2) guest/ring3->host/ring3 attacks 3)
guest/ring0->host/ring0 attacks, if IBPB is enough to protect these
three scenarios and retpoline is not needed?

Regards,
Wanpeng Li

> Even when using retpoline for indirect calls, in certain conditions 'ret'
> can use the BTB on Skylake-era CPUs. There are other mitigations
> available like RSB stuffing/clearing.
>
> * IBPB is issued only for SVM during svm_free_vcpu().
>   VMX has a vmclear and SVM doesn't.  Follow discussion here:
>   https://lkml.org/lkml/2018/1/15/146
>
> Please refer to the following spec for more details on the enumeration
> and control.
>
> Refer here to get documentation about mitigations.
>
> https://software.intel.com/en-us/side-channel-security-support
>
> [peterz: rebase and changelog rewrite]
> [karahmed: - rebase
>            - vmx: expose PRED_CMD if guest has it in CPUID
>            - svm: only pass through IBPB if guest has it in CPUID
>            - vmx: support !cpu_has_vmx_msr_bitmap()]
>            - vmx: support nested]
> [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
>         PRED_CMD is a write-only MSR]
>
> Cc: Asit Mallick <asit.k.mallick@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> ---
> v6:
> - introduce msr_write_intercepted_l01
>
> v5:
> - Use MSR_TYPE_W instead of MSR_TYPE_R for the MSR.
> - Always merge the bitmaps unconditionally.
> - Add PRED_CMD to direct_access_msrs.
> - Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
> - rewrite the commit message (from ashok.raj@)
> ---
>  arch/x86/kvm/cpuid.c | 11 +++++++-
>  arch/x86/kvm/svm.c   | 28 ++++++++++++++++++
>  arch/x86/kvm/vmx.c   | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 116 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index c0eb337..033004d 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>                 F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
>                 0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
>
> +       /* cpuid 0x80000008.ebx */
> +       const u32 kvm_cpuid_8000_0008_ebx_x86_features =
> +               F(IBPB);
> +
>         /* cpuid 0xC0000001.edx */
>         const u32 kvm_cpuid_C000_0001_edx_x86_features =
>                 F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
> @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>                 if (!g_phys_as)
>                         g_phys_as = phys_as;
>                 entry->eax = g_phys_as | (virt_as << 8);
> -               entry->ebx = entry->edx = 0;
> +               entry->edx = 0;
> +               /* IBPB isn't necessarily present in hardware cpuid */
> +               if (boot_cpu_has(X86_FEATURE_IBPB))
> +                       entry->ebx |= F(IBPB);
> +               entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> +               cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
>                 break;
>         }
>         case 0x80000019:
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index f40d0da..254eefb 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -249,6 +249,7 @@ static const struct svm_direct_access_msrs {
>         { .index = MSR_CSTAR,                           .always = true  },
>         { .index = MSR_SYSCALL_MASK,                    .always = true  },
>  #endif
> +       { .index = MSR_IA32_PRED_CMD,                   .always = false },
>         { .index = MSR_IA32_LASTBRANCHFROMIP,           .always = false },
>         { .index = MSR_IA32_LASTBRANCHTOIP,             .always = false },
>         { .index = MSR_IA32_LASTINTFROMIP,              .always = false },
> @@ -529,6 +530,7 @@ struct svm_cpu_data {
>         struct kvm_ldttss_desc *tss_desc;
>
>         struct page *save_area;
> +       struct vmcb *current_vmcb;
>  };
>
>  static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
> @@ -1703,11 +1705,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>         __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
>         kvm_vcpu_uninit(vcpu);
>         kmem_cache_free(kvm_vcpu_cache, svm);
> +       /*
> +        * The vmcb page can be recycled, causing a false negative in
> +        * svm_vcpu_load(). So do a full IBPB now.
> +        */
> +       indirect_branch_prediction_barrier();
>  }
>
>  static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>         struct vcpu_svm *svm = to_svm(vcpu);
> +       struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
>         int i;
>
>         if (unlikely(cpu != vcpu->cpu)) {
> @@ -1736,6 +1744,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>         if (static_cpu_has(X86_FEATURE_RDTSCP))
>                 wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>
> +       if (sd->current_vmcb != svm->vmcb) {
> +               sd->current_vmcb = svm->vmcb;
> +               indirect_branch_prediction_barrier();
> +       }
>         avic_vcpu_load(vcpu, cpu);
>  }
>
> @@ -3684,6 +3696,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>         case MSR_IA32_TSC:
>                 kvm_write_tsc(vcpu, msr);
>                 break;
> +       case MSR_IA32_PRED_CMD:
> +               if (!msr->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
> +                       return 1;
> +
> +               if (data & ~PRED_CMD_IBPB)
> +                       return 1;
> +
> +               if (!data)
> +                       break;
> +
> +               wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> +               if (is_guest_mode(vcpu))
> +                       break;
> +               set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
> +               break;
>         case MSR_STAR:
>                 svm->vmcb->save.star = data;
>                 break;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d46a61b..263eb1f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -592,6 +592,7 @@ struct vcpu_vmx {
>         u64                   msr_host_kernel_gs_base;
>         u64                   msr_guest_kernel_gs_base;
>  #endif
> +
>         u32 vm_entry_controls_shadow;
>         u32 vm_exit_controls_shadow;
>         u32 secondary_exec_control;
> @@ -936,6 +937,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
>  static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
>                                             u16 error_code);
>  static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
> +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
> +                                                         u32 msr, int type);
>
>  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
>  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> @@ -1907,6 +1910,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
>         vmcs_write32(EXCEPTION_BITMAP, eb);
>  }
>
> +/*
> + * Check if MSR is intercepted for L01 MSR bitmap.
> + */
> +static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
> +{
> +       unsigned long *msr_bitmap;
> +       int f = sizeof(unsigned long);
> +
> +       if (!cpu_has_vmx_msr_bitmap())
> +               return true;
> +
> +       msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
> +
> +       if (msr <= 0x1fff) {
> +               return !!test_bit(msr, msr_bitmap + 0x800 / f);
> +       } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
> +               msr &= 0x1fff;
> +               return !!test_bit(msr, msr_bitmap + 0xc00 / f);
> +       }
> +
> +       return true;
> +}
> +
>  static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
>                 unsigned long entry, unsigned long exit)
>  {
> @@ -2285,6 +2311,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>         if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
>                 per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
>                 vmcs_load(vmx->loaded_vmcs->vmcs);
> +               indirect_branch_prediction_barrier();
>         }
>
>         if (!already_loaded) {
> @@ -3342,6 +3369,34 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>         case MSR_IA32_TSC:
>                 kvm_write_tsc(vcpu, msr_info);
>                 break;
> +       case MSR_IA32_PRED_CMD:
> +               if (!msr_info->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> +                       return 1;
> +
> +               if (data & ~PRED_CMD_IBPB)
> +                       return 1;
> +
> +               if (!data)
> +                       break;
> +
> +               wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> +
> +               /*
> +                * For non-nested:
> +                * When it's written (to non-zero) for the first time, pass
> +                * it through.
> +                *
> +                * For nested:
> +                * The handling of the MSR bitmap for L2 guests is done in
> +                * nested_vmx_merge_msr_bitmap. We should not touch the
> +                * vmcs02.msr_bitmap here since it gets completely overwritten
> +                * in the merging.
> +                */
> +               vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
> +                                             MSR_TYPE_W);
> +               break;
>         case MSR_IA32_CR_PAT:
>                 if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
>                         if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
> @@ -10044,9 +10099,23 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
>         struct page *page;
>         unsigned long *msr_bitmap_l1;
>         unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
> +       /*
> +        * pred_cmd is trying to verify two things:
> +        *
> +        * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
> +        *    ensures that we do not accidentally generate an L02 MSR bitmap
> +        *    from the L12 MSR bitmap that is too permissive.
> +        * 2. That L1 or L2s have actually used the MSR. This avoids
> +        *    unnecessarily merging of the bitmap if the MSR is unused. This
> +        *    works properly because we only update the L01 MSR bitmap lazily.
> +        *    So even if L0 should pass L1 these MSRs, the L01 bitmap is only
> +        *    updated to reflect this when L1 (or its L2s) actually write to
> +        *    the MSR.
> +        */
> +       bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
>
> -       /* This shortcut is ok because we support only x2APIC MSRs so far. */
> -       if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
> +       if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
> +           !pred_cmd)
>                 return false;
>
>         page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
> @@ -10079,6 +10148,13 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
>                                 MSR_TYPE_W);
>                 }
>         }
> +
> +       if (pred_cmd)
> +               nested_vmx_disable_intercept_for_msr(
> +                                       msr_bitmap_l1, msr_bitmap_l0,
> +                                       MSR_IA32_PRED_CMD,
> +                                       MSR_TYPE_W);
> +
>         kunmap(page);
>         kvm_release_page_clean(page);
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-05-03  1:27   ` Wanpeng Li
@ 2018-05-03  9:19     ` Paolo Bonzini
  2018-05-03 12:01       ` Wanpeng Li
  2018-05-03 12:46       ` Tian, Kevin
  0 siblings, 2 replies; 34+ messages in thread
From: Paolo Bonzini @ 2018-05-03  9:19 UTC (permalink / raw)
  To: Wanpeng Li, KarimAllah Ahmed
  Cc: kvm, LKML, the arch/x86 maintainers, Ashok Raj, Asit Mallick,
	Dave Hansen, Arjan Van De Ven, Tim Chen, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Andy Lutomirski, Greg KH, Peter Zijlstra,
	David Woodhouse

On 03/05/2018 03:27, Wanpeng Li wrote:
> So for 1) guest->guest attacks 2) guest/ring3->host/ring3 attacks 3)
> guest/ring0->host/ring0 attacks, if IBPB is enough to protect these
> three scenarios and retpoline is not needed?

In theory yes, in practice if you want to do that IBPB is much more
expensive than retpolines, because you'd need an IBPB on vmexit or a
cache flush on vmentry.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-05-03  9:19     ` Paolo Bonzini
@ 2018-05-03 12:01       ` Wanpeng Li
  2018-05-03 12:46       ` Tian, Kevin
  1 sibling, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2018-05-03 12:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: KarimAllah Ahmed, kvm, LKML, the arch/x86 maintainers, Ashok Raj,
	Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Peter Zijlstra, David Woodhouse

2018-05-03 17:19 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> On 03/05/2018 03:27, Wanpeng Li wrote:
>> So for 1) guest->guest attacks 2) guest/ring3->host/ring3 attacks 3)
>> guest/ring0->host/ring0 attacks, if IBPB is enough to protect these
>> three scenarios and retpoline is not needed?
>
> In theory yes, in practice if you want to do that IBPB is much more
> expensive than retpolines, because you'd need an IBPB on vmexit or a
> cache flush on vmentry.

https://lkml.org/lkml/2018/1/4/615 Retpoline is not recommended on
Skylake, so we need to pay the penalty for IBPB flush on each vmexit I
think.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-05-03  9:19     ` Paolo Bonzini
  2018-05-03 12:01       ` Wanpeng Li
@ 2018-05-03 12:46       ` Tian, Kevin
  1 sibling, 0 replies; 34+ messages in thread
From: Tian, Kevin @ 2018-05-03 12:46 UTC (permalink / raw)
  To: Paolo Bonzini, Wanpeng Li, KarimAllah Ahmed
  Cc: kvm, LKML, the arch/x86 maintainers, Raj, Ashok, Mallick, Asit K,
	Hansen, Dave, Van De Ven, Arjan, Tim Chen, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Thomas Gleixner, Williams, Dan J,
	Nakajima, Jun, Andy Lutomirski, Greg KH, Peter Zijlstra,
	David Woodhouse

> From: Paolo Bonzini
> Sent: Thursday, May 3, 2018 5:20 PM
> 
> On 03/05/2018 03:27, Wanpeng Li wrote:
> > So for 1) guest->guest attacks 2) guest/ring3->host/ring3 attacks 3)
> > guest/ring0->host/ring0 attacks, if IBPB is enough to protect these
> > three scenarios and retpoline is not needed?
> 
> In theory yes, in practice if you want to do that IBPB is much more
> expensive than retpolines, because you'd need an IBPB on vmexit or a
> cache flush on vmentry.
> 

yes if HT is disabled. otherwise IBPB alone is not sufficient since it's 
just one-time effect while poison from sibling thread can happen 
anytime. in latter case retpoline or IBRS is expected to use with
IBPB in conjunction as a full mitigation.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  2018-02-01 21:59 [PATCH v6 0/5] KVM: Expose speculation control feature to guests KarimAllah Ahmed
  2018-02-01 21:59 ` [PATCH v6 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX KarimAllah Ahmed
  2018-02-01 21:59 ` [PATCH v6 2/5] KVM: x86: Add IBPB support KarimAllah Ahmed
@ 2018-02-01 21:59 ` KarimAllah Ahmed
  2018-02-02 10:53   ` Darren Kenny
  2018-02-02 17:51   ` Konrad Rzeszutek Wilk
  2018-02-01 21:59 ` [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KarimAllah Ahmed
  2018-02-01 21:59 ` [PATCH v6 5/5] KVM: SVM: " KarimAllah Ahmed
  4 siblings, 2 replies; 34+ messages in thread
From: KarimAllah Ahmed @ 2018-02-01 21:59 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: KarimAllah Ahmed, Asit Mallick, Dave Hansen, Arjan Van De Ven,
	Tim Chen, Linus Torvalds, Andrea Arcangeli, Andi Kleen,
	Thomas Gleixner, Dan Williams, Jun Nakajima, Andy Lutomirski,
	Greg KH, Paolo Bonzini, Ashok Raj, David Woodhouse

Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
(bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
contents will come directly from the hardware, but user-space can still
override it.

[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]

Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/cpuid.c |  2 +-
 arch/x86/kvm/vmx.c   | 15 +++++++++++++++
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 033004d..1909635 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
-		F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
+		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 263eb1f..b13314a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -593,6 +593,8 @@ struct vcpu_vmx {
 	u64 		      msr_guest_kernel_gs_base;
 #endif
 
+	u64 		      arch_capabilities;
+
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
 	u32 secondary_exec_control;
@@ -3262,6 +3264,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		msr_info->data = guest_read_tsc(vcpu);
 		break;
+	case MSR_IA32_ARCH_CAPABILITIES:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+			return 1;
+		msr_info->data = to_vmx(vcpu)->arch_capabilities;
+		break;
 	case MSR_IA32_SYSENTER_CS:
 		msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
 		break;
@@ -3397,6 +3405,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
 					      MSR_TYPE_W);
 		break;
+	case MSR_IA32_ARCH_CAPABILITIES:
+		if (!msr_info->host_initiated)
+			return 1;
+		vmx->arch_capabilities = data;
+		break;
 	case MSR_IA32_CR_PAT:
 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
 			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -5659,6 +5672,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
 		++vmx->nmsrs;
 	}
 
+	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
 
 	vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c53298d..4ec142e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = {
 #endif
 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
 	MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+	MSR_IA32_ARCH_CAPABILITIES
 };
 
 static unsigned num_msrs_to_save;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  2018-02-01 21:59 ` [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KarimAllah Ahmed
@ 2018-02-02 10:53   ` Darren Kenny
  2018-02-02 17:35     ` Jim Mattson
  2018-02-02 17:51   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 34+ messages in thread
From: Darren Kenny @ 2018-02-02 10:53 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Asit Mallick, Dave Hansen,
	Arjan Van De Ven, Tim Chen, Linus Torvalds, Andrea Arcangeli,
	Andi Kleen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Andy Lutomirski, Greg KH, Paolo Bonzini, Ashok Raj,
	David Woodhouse

On Thu, Feb 01, 2018 at 10:59:44PM +0100, KarimAllah Ahmed wrote:
>Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
>(bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
>contents will come directly from the hardware, but user-space can still
>override it.
>
>[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]
>
>Cc: Asit Mallick <asit.k.mallick@intel.com>
>Cc: Dave Hansen <dave.hansen@intel.com>
>Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
>Cc: Tim Chen <tim.c.chen@linux.intel.com>
>Cc: Linus Torvalds <torvalds@linux-foundation.org>
>Cc: Andrea Arcangeli <aarcange@redhat.com>
>Cc: Andi Kleen <ak@linux.intel.com>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Dan Williams <dan.j.williams@intel.com>
>Cc: Jun Nakajima <jun.nakajima@intel.com>
>Cc: Andy Lutomirski <luto@kernel.org>
>Cc: Greg KH <gregkh@linuxfoundation.org>
>Cc: Paolo Bonzini <pbonzini@redhat.com>
>Cc: Ashok Raj <ashok.raj@intel.com>
>Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
>Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

>---
> arch/x86/kvm/cpuid.c |  2 +-
> arch/x86/kvm/vmx.c   | 15 +++++++++++++++
> arch/x86/kvm/x86.c   |  1 +
> 3 files changed, 17 insertions(+), 1 deletion(-)
>
>diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>index 033004d..1909635 100644
>--- a/arch/x86/kvm/cpuid.c
>+++ b/arch/x86/kvm/cpuid.c
>@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>
> 	/* cpuid 7.0.edx*/
> 	const u32 kvm_cpuid_7_0_edx_x86_features =
>-		F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
>+		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
>
> 	/* all calls to cpuid_count() should be made on the same cpu */
> 	get_cpu();
>diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>index 263eb1f..b13314a 100644
>--- a/arch/x86/kvm/vmx.c
>+++ b/arch/x86/kvm/vmx.c
>@@ -593,6 +593,8 @@ struct vcpu_vmx {
> 	u64 		      msr_guest_kernel_gs_base;
> #endif
>
>+	u64 		      arch_capabilities;
>+
> 	u32 vm_entry_controls_shadow;
> 	u32 vm_exit_controls_shadow;
> 	u32 secondary_exec_control;
>@@ -3262,6 +3264,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case MSR_IA32_TSC:
> 		msr_info->data = guest_read_tsc(vcpu);
> 		break;
>+	case MSR_IA32_ARCH_CAPABILITIES:
>+		if (!msr_info->host_initiated &&
>+		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
>+			return 1;
>+		msr_info->data = to_vmx(vcpu)->arch_capabilities;
>+		break;
> 	case MSR_IA32_SYSENTER_CS:
> 		msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
> 		break;
>@@ -3397,6 +3405,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
> 					      MSR_TYPE_W);
> 		break;
>+	case MSR_IA32_ARCH_CAPABILITIES:
>+		if (!msr_info->host_initiated)
>+			return 1;
>+		vmx->arch_capabilities = data;
>+		break;
> 	case MSR_IA32_CR_PAT:
> 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
> 			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
>@@ -5659,6 +5672,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
> 		++vmx->nmsrs;
> 	}
>
>+	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
>+		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
>
> 	vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
>
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index c53298d..4ec142e 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = {
> #endif
> 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
> 	MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
>+	MSR_IA32_ARCH_CAPABILITIES
> };
>
> static unsigned num_msrs_to_save;
>-- 
>2.7.4
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  2018-02-02 10:53   ` Darren Kenny
@ 2018-02-02 17:35     ` Jim Mattson
  0 siblings, 0 replies; 34+ messages in thread
From: Jim Mattson @ 2018-02-02 17:35 UTC (permalink / raw)
  To: KarimAllah Ahmed, kvm list, LKML, the arch/x86 maintainers,
	Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Paolo Bonzini, Ashok Raj, David Woodhouse

On Fri, Feb 2, 2018 at 2:53 AM, Darren Kenny <darren.kenny@oracle.com> wrote:
> On Thu, Feb 01, 2018 at 10:59:44PM +0100, KarimAllah Ahmed wrote:
>>
>> Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
>> (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
>> contents will come directly from the hardware, but user-space can still
>> override it.
>>
>> [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]
>>
>> Cc: Asit Mallick <asit.k.mallick@intel.com>
>> Cc: Dave Hansen <dave.hansen@intel.com>
>> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
>> Cc: Tim Chen <tim.c.chen@linux.intel.com>
>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Cc: Andi Kleen <ak@linux.intel.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Greg KH <gregkh@linuxfoundation.org>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Ashok Raj <ashok.raj@intel.com>
>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>
>
> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  2018-02-01 21:59 ` [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KarimAllah Ahmed
  2018-02-02 10:53   ` Darren Kenny
@ 2018-02-02 17:51   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 34+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 17:51 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Asit Mallick, Dave Hansen,
	Arjan Van De Ven, Tim Chen, Linus Torvalds, Andrea Arcangeli,
	Andi Kleen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Andy Lutomirski, Greg KH, Paolo Bonzini, Ashok Raj,
	David Woodhouse

On Thu, Feb 01, 2018 at 10:59:44PM +0100, KarimAllah Ahmed wrote:
> Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
> (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
> contents will come directly from the hardware, but user-space can still
> override it.
> 
> [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]
> 
> Cc: Asit Mallick <asit.k.mallick@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 [PATCH v6 0/5] KVM: Expose speculation control feature to guests KarimAllah Ahmed
                   ` (2 preceding siblings ...)
  2018-02-01 21:59 ` [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KarimAllah Ahmed
@ 2018-02-01 21:59 ` KarimAllah Ahmed
  2018-02-02 11:03   ` Darren Kenny
                     ` (3 more replies)
  2018-02-01 21:59 ` [PATCH v6 5/5] KVM: SVM: " KarimAllah Ahmed
  4 siblings, 4 replies; 34+ messages in thread
From: KarimAllah Ahmed @ 2018-02-01 21:59 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: KarimAllah Ahmed, Asit Mallick, Arjan Van De Ven, Dave Hansen,
	Andi Kleen, Andrea Arcangeli, Linus Torvalds, Tim Chen,
	Thomas Gleixner, Dan Williams, Jun Nakajima, Paolo Bonzini,
	David Woodhouse, Greg KH, Andy Lutomirski, Ashok Raj

[ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
guests that do not actually use the MSR, only start saving and restoring
when a non-zero is written to it.

No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.

[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]

Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
v6:
- got rid of save_spec_ctrl_on_exit
- introduce msr_write_intercepted
v5:
- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
v4:
- Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features
- Handling nested guests
v3:
- Save/restore manually
- Fix CPUID handling
- Fix a copy & paste error in the name of SPEC_CTRL MSR in
  disable_intercept.
- support !cpu_has_vmx_msr_bitmap()
v2:
- remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
- special case writing '0' in SPEC_CTRL to avoid confusing live-migration
  when the instance never used the MSR (dwmw@).
- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
- add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
---
 arch/x86/kvm/cpuid.c |   9 +++--
 arch/x86/kvm/vmx.c   | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c   |   2 +-
 3 files changed, 110 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1909635..13f5d42 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 0x80000008.ebx */
 	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
-		F(IBPB);
+		F(IBPB) | F(IBRS);
 
 	/* cpuid 0xC0000001.edx */
 	const u32 kvm_cpuid_C000_0001_edx_x86_features =
@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
-		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
+		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
+		F(ARCH_CAPABILITIES);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			g_phys_as = phys_as;
 		entry->eax = g_phys_as | (virt_as << 8);
 		entry->edx = 0;
-		/* IBPB isn't necessarily present in hardware cpuid */
+		/* IBRS and IBPB aren't necessarily present in hardware cpuid */
 		if (boot_cpu_has(X86_FEATURE_IBPB))
 			entry->ebx |= F(IBPB);
+		if (boot_cpu_has(X86_FEATURE_IBRS))
+			entry->ebx |= F(IBRS);
 		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
 		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
 		break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b13314a..5d8a6a91 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -594,6 +594,7 @@ struct vcpu_vmx {
 #endif
 
 	u64 		      arch_capabilities;
+	u64 		      spec_ctrl;
 
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
@@ -1913,6 +1914,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 }
 
 /*
+ * Check if MSR is intercepted for currently loaded MSR bitmap.
+ */
+static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
+{
+	unsigned long *msr_bitmap;
+	int f = sizeof(unsigned long);
+
+	if (!cpu_has_vmx_msr_bitmap())
+		return true;
+
+	msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
+
+	if (msr <= 0x1fff) {
+		return !!test_bit(msr, msr_bitmap + 0x800 / f);
+	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+		msr &= 0x1fff;
+		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
+	}
+
+	return true;
+}
+
+/*
  * Check if MSR is intercepted for L01 MSR bitmap.
  */
 static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
@@ -3264,6 +3288,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		msr_info->data = guest_read_tsc(vcpu);
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+			return 1;
+
+		msr_info->data = to_vmx(vcpu)->spec_ctrl;
+		break;
 	case MSR_IA32_ARCH_CAPABILITIES:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
@@ -3377,6 +3409,37 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr_info);
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+			return 1;
+
+		/* The STIBP bit doesn't fault even if it's not advertised */
+		if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+			return 1;
+
+		vmx->spec_ctrl = data;
+
+		if (!data)
+			break;
+
+		/*
+		 * For non-nested:
+		 * When it's written (to non-zero) for the first time, pass
+		 * it through.
+		 *
+		 * For nested:
+		 * The handling of the MSR bitmap for L2 guests is done in
+		 * nested_vmx_merge_msr_bitmap. We should not touch the
+		 * vmcs02.msr_bitmap here since it gets completely overwritten
+		 * in the merging. We update the vmcs01 here for L1 as well
+		 * since it will end up touching the MSR anyway now.
+		 */
+		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
+					      MSR_IA32_SPEC_CTRL,
+					      MSR_TYPE_RW);
+		break;
 	case MSR_IA32_PRED_CMD:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
@@ -5702,6 +5765,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	u64 cr0;
 
 	vmx->rmode.vm86_active = 0;
+	vmx->spec_ctrl = 0;
 
 	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
 	kvm_set_cr8(vcpu, 0);
@@ -9373,6 +9437,15 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	vmx_arm_hv_timer(vcpu);
 
+	/*
+	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
+	 * it's non-zero. Since vmentry is serialising on affected CPUs, there
+	 * is no need to worry about the conditional branch over the wrmsr
+	 * being speculatively taken.
+	 */
+	if (vmx->spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+
 	vmx->__launched = vmx->loaded_vmcs->launched;
 	asm(
 		/* Store host registers */
@@ -9491,6 +9564,27 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
 	      );
 
+	/*
+	 * We do not use IBRS in the kernel. If this vCPU has used the
+	 * SPEC_CTRL MSR it may have left it on; save the value and
+	 * turn it off. This is much more efficient than blindly adding
+	 * it to the atomic save/restore list. Especially as the former
+	 * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
+	 *
+	 * For non-nested case:
+	 * If the L01 MSR bitmap does not intercept the MSR, then we need to
+	 * save it.
+	 *
+	 * For nested case:
+	 * If the L02 MSR bitmap does not intercept the MSR, then we need to
+	 * save it.
+	 */
+	if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
+		rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+
+	if (vmx->spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+
 	/* Eliminate branch target predictions from guest mode */
 	vmexit_fill_RSB();
 
@@ -10115,7 +10209,7 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	unsigned long *msr_bitmap_l1;
 	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
 	/*
-	 * pred_cmd is trying to verify two things:
+	 * pred_cmd & spec_ctrl are trying to verify two things:
 	 *
 	 * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
 	 *    ensures that we do not accidentally generate an L02 MSR bitmap
@@ -10128,9 +10222,10 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	 *    the MSR.
 	 */
 	bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
+	bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
 
 	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
-	    !pred_cmd)
+	    !pred_cmd && !spec_ctrl)
 		return false;
 
 	page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
@@ -10164,6 +10259,12 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 		}
 	}
 
+	if (spec_ctrl)
+		nested_vmx_disable_intercept_for_msr(
+					msr_bitmap_l1, msr_bitmap_l0,
+					MSR_IA32_SPEC_CTRL,
+					MSR_TYPE_R | MSR_TYPE_W);
+
 	if (pred_cmd)
 		nested_vmx_disable_intercept_for_msr(
 					msr_bitmap_l1, msr_bitmap_l0,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4ec142e..ac38143 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,7 +1009,7 @@ static u32 msrs_to_save[] = {
 #endif
 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
 	MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
-	MSR_IA32_ARCH_CAPABILITIES
+	MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES
 };
 
 static unsigned num_msrs_to_save;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KarimAllah Ahmed
@ 2018-02-02 11:03   ` Darren Kenny
  2018-02-02 11:27   ` David Woodhouse
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: Darren Kenny @ 2018-02-02 11:03 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Asit Mallick, Arjan Van De Ven,
	Dave Hansen, Andi Kleen, Andrea Arcangeli, Linus Torvalds,
	Tim Chen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Paolo Bonzini, David Woodhouse, Greg KH, Andy Lutomirski,
	Ashok Raj

On Thu, Feb 01, 2018 at 10:59:45PM +0100, KarimAllah Ahmed wrote:
>[ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]
>
>Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
>guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
>be using a retpoline+IBPB based approach.
>
>To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
>guests that do not actually use the MSR, only start saving and restoring
>when a non-zero is written to it.
>
>No attempt is made to handle STIBP here, intentionally. Filtering STIBP
>may be added in a future patch, which may require trapping all writes
>if we don't want to pass it through directly to the guest.
>
>[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
>
>Cc: Asit Mallick <asit.k.mallick@intel.com>
>Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
>Cc: Dave Hansen <dave.hansen@intel.com>
>Cc: Andi Kleen <ak@linux.intel.com>
>Cc: Andrea Arcangeli <aarcange@redhat.com>
>Cc: Linus Torvalds <torvalds@linux-foundation.org>
>Cc: Tim Chen <tim.c.chen@linux.intel.com>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Dan Williams <dan.j.williams@intel.com>
>Cc: Jun Nakajima <jun.nakajima@intel.com>
>Cc: Paolo Bonzini <pbonzini@redhat.com>
>Cc: David Woodhouse <dwmw@amazon.co.uk>
>Cc: Greg KH <gregkh@linuxfoundation.org>
>Cc: Andy Lutomirski <luto@kernel.org>
>Cc: Ashok Raj <ashok.raj@intel.com>
>Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
>Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

>---
>v6:
>- got rid of save_spec_ctrl_on_exit
>- introduce msr_write_intercepted
>v5:
>- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
>v4:
>- Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features
>- Handling nested guests
>v3:
>- Save/restore manually
>- Fix CPUID handling
>- Fix a copy & paste error in the name of SPEC_CTRL MSR in
>  disable_intercept.
>- support !cpu_has_vmx_msr_bitmap()
>v2:
>- remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
>- special case writing '0' in SPEC_CTRL to avoid confusing live-migration
>  when the instance never used the MSR (dwmw@).
>- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
>- add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
>---
> arch/x86/kvm/cpuid.c |   9 +++--
> arch/x86/kvm/vmx.c   | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> arch/x86/kvm/x86.c   |   2 +-
> 3 files changed, 110 insertions(+), 6 deletions(-)
>
>diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>index 1909635..13f5d42 100644
>--- a/arch/x86/kvm/cpuid.c
>+++ b/arch/x86/kvm/cpuid.c
>@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>
> 	/* cpuid 0x80000008.ebx */
> 	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
>-		F(IBPB);
>+		F(IBPB) | F(IBRS);
>
> 	/* cpuid 0xC0000001.edx */
> 	const u32 kvm_cpuid_C000_0001_edx_x86_features =
>@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>
> 	/* cpuid 7.0.edx*/
> 	const u32 kvm_cpuid_7_0_edx_x86_features =
>-		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
>+		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
>+		F(ARCH_CAPABILITIES);
>
> 	/* all calls to cpuid_count() should be made on the same cpu */
> 	get_cpu();
>@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> 			g_phys_as = phys_as;
> 		entry->eax = g_phys_as | (virt_as << 8);
> 		entry->edx = 0;
>-		/* IBPB isn't necessarily present in hardware cpuid */
>+		/* IBRS and IBPB aren't necessarily present in hardware cpuid */
> 		if (boot_cpu_has(X86_FEATURE_IBPB))
> 			entry->ebx |= F(IBPB);
>+		if (boot_cpu_has(X86_FEATURE_IBRS))
>+			entry->ebx |= F(IBRS);
> 		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> 		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
> 		break;
>diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>index b13314a..5d8a6a91 100644
>--- a/arch/x86/kvm/vmx.c
>+++ b/arch/x86/kvm/vmx.c
>@@ -594,6 +594,7 @@ struct vcpu_vmx {
> #endif
>
> 	u64 		      arch_capabilities;
>+	u64 		      spec_ctrl;
>
> 	u32 vm_entry_controls_shadow;
> 	u32 vm_exit_controls_shadow;
>@@ -1913,6 +1914,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
> }
>
> /*
>+ * Check if MSR is intercepted for currently loaded MSR bitmap.
>+ */
>+static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
>+{
>+	unsigned long *msr_bitmap;
>+	int f = sizeof(unsigned long);
>+
>+	if (!cpu_has_vmx_msr_bitmap())
>+		return true;
>+
>+	msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
>+
>+	if (msr <= 0x1fff) {
>+		return !!test_bit(msr, msr_bitmap + 0x800 / f);
>+	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
>+		msr &= 0x1fff;
>+		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
>+	}
>+
>+	return true;
>+}
>+
>+/*
>  * Check if MSR is intercepted for L01 MSR bitmap.
>  */
> static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
>@@ -3264,6 +3288,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case MSR_IA32_TSC:
> 		msr_info->data = guest_read_tsc(vcpu);
> 		break;
>+	case MSR_IA32_SPEC_CTRL:
>+		if (!msr_info->host_initiated &&
>+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
>+		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
>+			return 1;
>+
>+		msr_info->data = to_vmx(vcpu)->spec_ctrl;
>+		break;
> 	case MSR_IA32_ARCH_CAPABILITIES:
> 		if (!msr_info->host_initiated &&
> 		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
>@@ -3377,6 +3409,37 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case MSR_IA32_TSC:
> 		kvm_write_tsc(vcpu, msr_info);
> 		break;
>+	case MSR_IA32_SPEC_CTRL:
>+		if (!msr_info->host_initiated &&
>+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
>+		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
>+			return 1;
>+
>+		/* The STIBP bit doesn't fault even if it's not advertised */
>+		if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
>+			return 1;
>+
>+		vmx->spec_ctrl = data;
>+
>+		if (!data)
>+			break;
>+
>+		/*
>+		 * For non-nested:
>+		 * When it's written (to non-zero) for the first time, pass
>+		 * it through.
>+		 *
>+		 * For nested:
>+		 * The handling of the MSR bitmap for L2 guests is done in
>+		 * nested_vmx_merge_msr_bitmap. We should not touch the
>+		 * vmcs02.msr_bitmap here since it gets completely overwritten
>+		 * in the merging. We update the vmcs01 here for L1 as well
>+		 * since it will end up touching the MSR anyway now.
>+		 */
>+		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
>+					      MSR_IA32_SPEC_CTRL,
>+					      MSR_TYPE_RW);
>+		break;
> 	case MSR_IA32_PRED_CMD:
> 		if (!msr_info->host_initiated &&
> 		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
>@@ -5702,6 +5765,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> 	u64 cr0;
>
> 	vmx->rmode.vm86_active = 0;
>+	vmx->spec_ctrl = 0;
>
> 	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
> 	kvm_set_cr8(vcpu, 0);
>@@ -9373,6 +9437,15 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>
> 	vmx_arm_hv_timer(vcpu);
>
>+	/*
>+	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
>+	 * it's non-zero. Since vmentry is serialising on affected CPUs, there
>+	 * is no need to worry about the conditional branch over the wrmsr
>+	 * being speculatively taken.
>+	 */
>+	if (vmx->spec_ctrl)
>+		wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
>+
> 	vmx->__launched = vmx->loaded_vmcs->launched;
> 	asm(
> 		/* Store host registers */
>@@ -9491,6 +9564,27 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
> #endif
> 	      );
>
>+	/*
>+	 * We do not use IBRS in the kernel. If this vCPU has used the
>+	 * SPEC_CTRL MSR it may have left it on; save the value and
>+	 * turn it off. This is much more efficient than blindly adding
>+	 * it to the atomic save/restore list. Especially as the former
>+	 * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
>+	 *
>+	 * For non-nested case:
>+	 * If the L01 MSR bitmap does not intercept the MSR, then we need to
>+	 * save it.
>+	 *
>+	 * For nested case:
>+	 * If the L02 MSR bitmap does not intercept the MSR, then we need to
>+	 * save it.
>+	 */
>+	if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
>+		rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
>+
>+	if (vmx->spec_ctrl)
>+		wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>+
> 	/* Eliminate branch target predictions from guest mode */
> 	vmexit_fill_RSB();
>
>@@ -10115,7 +10209,7 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
> 	unsigned long *msr_bitmap_l1;
> 	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
> 	/*
>-	 * pred_cmd is trying to verify two things:
>+	 * pred_cmd & spec_ctrl are trying to verify two things:
> 	 *
> 	 * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
> 	 *    ensures that we do not accidentally generate an L02 MSR bitmap
>@@ -10128,9 +10222,10 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
> 	 *    the MSR.
> 	 */
> 	bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
>+	bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
>
> 	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
>-	    !pred_cmd)
>+	    !pred_cmd && !spec_ctrl)
> 		return false;
>
> 	page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
>@@ -10164,6 +10259,12 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
> 		}
> 	}
>
>+	if (spec_ctrl)
>+		nested_vmx_disable_intercept_for_msr(
>+					msr_bitmap_l1, msr_bitmap_l0,
>+					MSR_IA32_SPEC_CTRL,
>+					MSR_TYPE_R | MSR_TYPE_W);
>+
> 	if (pred_cmd)
> 		nested_vmx_disable_intercept_for_msr(
> 					msr_bitmap_l1, msr_bitmap_l0,
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 4ec142e..ac38143 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -1009,7 +1009,7 @@ static u32 msrs_to_save[] = {
> #endif
> 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
> 	MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
>-	MSR_IA32_ARCH_CAPABILITIES
>+	MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES
> };
>
> static unsigned num_msrs_to_save;
>-- 
>2.7.4
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KarimAllah Ahmed
  2018-02-02 11:03   ` Darren Kenny
@ 2018-02-02 11:27   ` David Woodhouse
  2018-02-02 17:53   ` Konrad Rzeszutek Wilk
  2018-02-02 17:57   ` Jim Mattson
  3 siblings, 0 replies; 34+ messages in thread
From: David Woodhouse @ 2018-02-02 11:27 UTC (permalink / raw)
  To: KarimAllah Ahmed, kvm, linux-kernel, x86
  Cc: Asit Mallick, Arjan Van De Ven, Dave Hansen, Andi Kleen,
	Andrea Arcangeli, Linus Torvalds, Tim Chen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Paolo Bonzini, Greg KH,
	Andy Lutomirski, Ashok Raj

[-- Attachment #1: Type: text/plain, Size: 2634 bytes --]

On Thu, 2018-02-01 at 22:59 +0100, KarimAllah Ahmed wrote:

> [ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]
> 
> Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
> guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
> be using a retpoline+IBPB based approach.
> 
> To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
> guests that do not actually use the MSR, only start saving and restoring
> when a non-zero is written to it.
> 
> No attempt is made to handle STIBP here, intentionally. Filtering STIBP
> may be added in a future patch, which may require trapping all writes
> if we don't want to pass it through directly to the guest.
> 
> [dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
> 
> Cc: Asit Mallick <asit.k.mallick@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> v6:
> - got rid of save_spec_ctrl_on_exit
> - introduce msr_write_intercepted
> v5:
> - Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
> v4:
> - Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features
> - Handling nested guests
> v3:
> - Save/restore manually
> - Fix CPUID handling
> - Fix a copy & paste error in the name of SPEC_CTRL MSR in
>   disable_intercept.
> - support !cpu_has_vmx_msr_bitmap()
> v2:
> - remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
> - special case writing '0' in SPEC_CTRL to avoid confusing live-migration
>   when the instance never used the MSR (dwmw@).
> - depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
> - add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).

This looks very good to me now, and the comments are helpful. Thank you
for your persistence with getting the details right. If we make the SVM
one look like this, as you mentioned, I think we ought finally be ready
to merge it.

Good work ;)

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KarimAllah Ahmed
  2018-02-02 11:03   ` Darren Kenny
  2018-02-02 11:27   ` David Woodhouse
@ 2018-02-02 17:53   ` Konrad Rzeszutek Wilk
  2018-02-02 18:05     ` David Woodhouse
  2018-02-02 17:57   ` Jim Mattson
  3 siblings, 1 reply; 34+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 17:53 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Asit Mallick, Arjan Van De Ven,
	Dave Hansen, Andi Kleen, Andrea Arcangeli, Linus Torvalds,
	Tim Chen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Paolo Bonzini, David Woodhouse, Greg KH, Andy Lutomirski,
	Ashok Raj

.snip..
> @@ -1913,6 +1914,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
>  }
>  
>  /*
> + * Check if MSR is intercepted for currently loaded MSR bitmap.
> + */
> +static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
> +{
> +	unsigned long *msr_bitmap;
> +	int f = sizeof(unsigned long);

unsigned int
> +
> +	if (!cpu_has_vmx_msr_bitmap())
> +		return true;
> +
> +	msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
> +
> +	if (msr <= 0x1fff) {
> +		return !!test_bit(msr, msr_bitmap + 0x800 / f);
> +	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
> +		msr &= 0x1fff;
> +		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
> +	}
> +
> +	return true;
> +}

with that:

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-02 17:53   ` Konrad Rzeszutek Wilk
@ 2018-02-02 18:05     ` David Woodhouse
  2018-02-02 18:19       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 34+ messages in thread
From: David Woodhouse @ 2018-02-02 18:05 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Asit Mallick, Arjan Van De Ven,
	Dave Hansen, Andi Kleen, Andrea Arcangeli, Linus Torvalds,
	Tim Chen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Paolo Bonzini, Greg KH, Andy Lutomirski, Ashok Raj

[-- Attachment #1: Type: text/plain, Size: 837 bytes --]

On Fri, 2018-02-02 at 12:53 -0500, Konrad Rzeszutek Wilk wrote:
> .snip..
> > 
> > @@ -1913,6 +1914,29 @@ static void update_exception_bitmap(struct
> > kvm_vcpu *vcpu)
> >  }
> >  
> >  /*
> > + * Check if MSR is intercepted for currently loaded MSR bitmap.
> > + */
> > +static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
> > +{
> > +	unsigned long *msr_bitmap;
> > +	int f = sizeof(unsigned long);
>
> unsigned int

$ grep -n 'f = sizeof' vmx.c
1934:	int f = sizeof(unsigned long);
5013:	int f = sizeof(unsigned long);
5048:	int f = sizeof(unsigned long);
5097:	int f = sizeof(unsigned long);

It sucks enough that we're doing this stuff repeatedly, and it's a
prime candidate for cleaning up, but I wasn't going to send Karim off
to bikeshed that today. Let's at least keep it consistent.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-02 18:05     ` David Woodhouse
@ 2018-02-02 18:19       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 34+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 18:19 UTC (permalink / raw)
  To: David Woodhouse
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Asit Mallick,
	Arjan Van De Ven, Dave Hansen, Andi Kleen, Andrea Arcangeli,
	Linus Torvalds, Tim Chen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Paolo Bonzini, Greg KH, Andy Lutomirski, Ashok Raj

On Fri, Feb 02, 2018 at 06:05:54PM +0000, David Woodhouse wrote:
> On Fri, 2018-02-02 at 12:53 -0500, Konrad Rzeszutek Wilk wrote:
> > .snip..
> > > 
> > > @@ -1913,6 +1914,29 @@ static void update_exception_bitmap(struct
> > > kvm_vcpu *vcpu)
> > >  }
> > >  
> > >  /*
> > > + * Check if MSR is intercepted for currently loaded MSR bitmap.
> > > + */
> > > +static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
> > > +{
> > > +	unsigned long *msr_bitmap;
> > > +	int f = sizeof(unsigned long);
> >
> > unsigned int
> 
> $ grep -n 'f = sizeof' vmx.c
> 1934:	int f = sizeof(unsigned long);
> 5013:	int f = sizeof(unsigned long);
> 5048:	int f = sizeof(unsigned long);
> 5097:	int f = sizeof(unsigned long);
> 
> It sucks enough that we're doing this stuff repeatedly, and it's a
> prime candidate for cleaning up, but I wasn't going to send Karim off
> to bikeshed that today. Let's at least keep it consistent.

Sure.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KarimAllah Ahmed
                     ` (2 preceding siblings ...)
  2018-02-02 17:53   ` Konrad Rzeszutek Wilk
@ 2018-02-02 17:57   ` Jim Mattson
  3 siblings, 0 replies; 34+ messages in thread
From: Jim Mattson @ 2018-02-02 17:57 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm list, LKML, the arch/x86 maintainers, Asit Mallick,
	Arjan Van De Ven, Dave Hansen, Andi Kleen, Andrea Arcangeli,
	Linus Torvalds, Tim Chen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Paolo Bonzini, David Woodhouse, Greg KH,
	Andy Lutomirski, Ashok Raj

On Thu, Feb 1, 2018 at 1:59 PM, KarimAllah Ahmed <karahmed@amazon.de> wrote:
> [ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]
>
> Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
> guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
> be using a retpoline+IBPB based approach.
>
> To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
> guests that do not actually use the MSR, only start saving and restoring
> when a non-zero is written to it.
>
> No attempt is made to handle STIBP here, intentionally. Filtering STIBP
> may be added in a future patch, which may require trapping all writes
> if we don't want to pass it through directly to the guest.
>
> [dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
>
> Cc: Asit Mallick <asit.k.mallick@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v6 5/5] KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 [PATCH v6 0/5] KVM: Expose speculation control feature to guests KarimAllah Ahmed
                   ` (3 preceding siblings ...)
  2018-02-01 21:59 ` [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KarimAllah Ahmed
@ 2018-02-01 21:59 ` KarimAllah Ahmed
  2018-02-02 11:06   ` Darren Kenny
  2018-02-02 18:02   ` Konrad Rzeszutek Wilk
  4 siblings, 2 replies; 34+ messages in thread
From: KarimAllah Ahmed @ 2018-02-01 21:59 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: KarimAllah Ahmed, Asit Mallick, Arjan Van De Ven, Dave Hansen,
	Andi Kleen, Andrea Arcangeli, Linus Torvalds, Tim Chen,
	Thomas Gleixner, Dan Williams, Jun Nakajima, Paolo Bonzini,
	David Woodhouse, Greg KH, Andy Lutomirski, Ashok Raj

[ Based on a patch from Paolo Bonzini <pbonzini@redhat.com> ]

... basically doing exactly what we do for VMX:

- Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
  actually used it.

Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
v5:
- Add SPEC_CTRL to direct_access_msrs.
---
 arch/x86/kvm/svm.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 254eefb..c6ab343 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -184,6 +184,9 @@ struct vcpu_svm {
 		u64 gs_base;
 	} host;
 
+	u64 spec_ctrl;
+	bool save_spec_ctrl_on_exit;
+
 	u32 *msrpm;
 
 	ulong nmi_iret_rip;
@@ -249,6 +252,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_CSTAR,				.always = true  },
 	{ .index = MSR_SYSCALL_MASK,			.always = true  },
 #endif
+	{ .index = MSR_IA32_SPEC_CTRL,			.always = false },
 	{ .index = MSR_IA32_PRED_CMD,			.always = false },
 	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
@@ -1584,6 +1588,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	u32 dummy;
 	u32 eax = 1;
 
+	svm->spec_ctrl = 0;
+
 	if (!init_event) {
 		svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
 					   MSR_IA32_APICBASE_ENABLE;
@@ -3605,6 +3611,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_VM_CR:
 		msr_info->data = svm->nested.vm_cr_msr;
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+			return 1;
+
+		msr_info->data = svm->spec_ctrl;
+		break;
 	case MSR_IA32_UCODE_REV:
 		msr_info->data = 0x01000065;
 		break;
@@ -3696,6 +3709,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr);
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		if (!msr->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+			return 1;
+
+		/* The STIBP bit doesn't fault even if it's not advertised */
+		if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+			return 1;
+
+		svm->spec_ctrl = data;
+
+		/*
+		 * When it's written (to non-zero) for the first time, pass
+		 * it through. This means we don't have to take the perf
+		 * hit of saving it on vmexit for the common case of guests
+		 * that don't use it.
+		 */
+		if (data && !svm->save_spec_ctrl_on_exit) {
+			svm->save_spec_ctrl_on_exit = true;
+			if (is_guest_mode(vcpu))
+				break;
+			set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
+		}
+		break;
 	case MSR_IA32_PRED_CMD:
 		if (!msr->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
@@ -4964,6 +5001,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	local_irq_enable();
 
+	/*
+	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
+	 * it's non-zero. Since vmentry is serialising on affected CPUs, there
+	 * is no need to worry about the conditional branch over the wrmsr
+	 * being speculatively taken.
+	 */
+	if (svm->spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+
 	asm volatile (
 		"push %%" _ASM_BP "; \n\t"
 		"mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t"
@@ -5056,6 +5102,19 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
 		);
 
+	/*
+	 * We do not use IBRS in the kernel. If this vCPU has used the
+	 * SPEC_CTRL MSR it may have left it on; save the value and
+	 * turn it off. This is much more efficient than blindly adding
+	 * it to the atomic save/restore list. Especially as the former
+	 * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
+	 */
+	if (svm->save_spec_ctrl_on_exit)
+		rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+
+	if (svm->spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+
 	/* Eliminate branch target predictions from guest mode */
 	vmexit_fill_RSB();
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 5/5] KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` [PATCH v6 5/5] KVM: SVM: " KarimAllah Ahmed
@ 2018-02-02 11:06   ` Darren Kenny
  2018-02-02 18:02   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 34+ messages in thread
From: Darren Kenny @ 2018-02-02 11:06 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Asit Mallick, Arjan Van De Ven,
	Dave Hansen, Andi Kleen, Andrea Arcangeli, Linus Torvalds,
	Tim Chen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Paolo Bonzini, David Woodhouse, Greg KH, Andy Lutomirski,
	Ashok Raj

On Thu, Feb 01, 2018 at 10:59:46PM +0100, KarimAllah Ahmed wrote:
>[ Based on a patch from Paolo Bonzini <pbonzini@redhat.com> ]
>
>... basically doing exactly what we do for VMX:
>
>- Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
>- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
>  actually used it.
>
>Cc: Asit Mallick <asit.k.mallick@intel.com>
>Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
>Cc: Dave Hansen <dave.hansen@intel.com>
>Cc: Andi Kleen <ak@linux.intel.com>
>Cc: Andrea Arcangeli <aarcange@redhat.com>
>Cc: Linus Torvalds <torvalds@linux-foundation.org>
>Cc: Tim Chen <tim.c.chen@linux.intel.com>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Dan Williams <dan.j.williams@intel.com>
>Cc: Jun Nakajima <jun.nakajima@intel.com>
>Cc: Paolo Bonzini <pbonzini@redhat.com>
>Cc: David Woodhouse <dwmw@amazon.co.uk>
>Cc: Greg KH <gregkh@linuxfoundation.org>
>Cc: Andy Lutomirski <luto@kernel.org>
>Cc: Ashok Raj <ashok.raj@intel.com>
>Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
>Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

>---
>v5:
>- Add SPEC_CTRL to direct_access_msrs.
>---
> arch/x86/kvm/svm.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 59 insertions(+)
>
>diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>index 254eefb..c6ab343 100644
>--- a/arch/x86/kvm/svm.c
>+++ b/arch/x86/kvm/svm.c
>@@ -184,6 +184,9 @@ struct vcpu_svm {
> 		u64 gs_base;
> 	} host;
>
>+	u64 spec_ctrl;
>+	bool save_spec_ctrl_on_exit;
>+
> 	u32 *msrpm;
>
> 	ulong nmi_iret_rip;
>@@ -249,6 +252,7 @@ static const struct svm_direct_access_msrs {
> 	{ .index = MSR_CSTAR,				.always = true  },
> 	{ .index = MSR_SYSCALL_MASK,			.always = true  },
> #endif
>+	{ .index = MSR_IA32_SPEC_CTRL,			.always = false },
> 	{ .index = MSR_IA32_PRED_CMD,			.always = false },
> 	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
> 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
>@@ -1584,6 +1588,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> 	u32 dummy;
> 	u32 eax = 1;
>
>+	svm->spec_ctrl = 0;
>+
> 	if (!init_event) {
> 		svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
> 					   MSR_IA32_APICBASE_ENABLE;
>@@ -3605,6 +3611,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case MSR_VM_CR:
> 		msr_info->data = svm->nested.vm_cr_msr;
> 		break;
>+	case MSR_IA32_SPEC_CTRL:
>+		if (!msr_info->host_initiated &&
>+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
>+			return 1;
>+
>+		msr_info->data = svm->spec_ctrl;
>+		break;
> 	case MSR_IA32_UCODE_REV:
> 		msr_info->data = 0x01000065;
> 		break;
>@@ -3696,6 +3709,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> 	case MSR_IA32_TSC:
> 		kvm_write_tsc(vcpu, msr);
> 		break;
>+	case MSR_IA32_SPEC_CTRL:
>+		if (!msr->host_initiated &&
>+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
>+			return 1;
>+
>+		/* The STIBP bit doesn't fault even if it's not advertised */
>+		if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
>+			return 1;
>+
>+		svm->spec_ctrl = data;
>+
>+		/*
>+		 * When it's written (to non-zero) for the first time, pass
>+		 * it through. This means we don't have to take the perf
>+		 * hit of saving it on vmexit for the common case of guests
>+		 * that don't use it.
>+		 */
>+		if (data && !svm->save_spec_ctrl_on_exit) {
>+			svm->save_spec_ctrl_on_exit = true;
>+			if (is_guest_mode(vcpu))
>+				break;
>+			set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
>+		}
>+		break;
> 	case MSR_IA32_PRED_CMD:
> 		if (!msr->host_initiated &&
> 		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
>@@ -4964,6 +5001,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>
> 	local_irq_enable();
>
>+	/*
>+	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
>+	 * it's non-zero. Since vmentry is serialising on affected CPUs, there
>+	 * is no need to worry about the conditional branch over the wrmsr
>+	 * being speculatively taken.
>+	 */
>+	if (svm->spec_ctrl)
>+		wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
>+
> 	asm volatile (
> 		"push %%" _ASM_BP "; \n\t"
> 		"mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t"
>@@ -5056,6 +5102,19 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
> #endif
> 		);
>
>+	/*
>+	 * We do not use IBRS in the kernel. If this vCPU has used the
>+	 * SPEC_CTRL MSR it may have left it on; save the value and
>+	 * turn it off. This is much more efficient than blindly adding
>+	 * it to the atomic save/restore list. Especially as the former
>+	 * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
>+	 */
>+	if (svm->save_spec_ctrl_on_exit)
>+		rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
>+
>+	if (svm->spec_ctrl)
>+		wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>+
> 	/* Eliminate branch target predictions from guest mode */
> 	vmexit_fill_RSB();
>
>-- 
>2.7.4
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 5/5] KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` [PATCH v6 5/5] KVM: SVM: " KarimAllah Ahmed
  2018-02-02 11:06   ` Darren Kenny
@ 2018-02-02 18:02   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 34+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 18:02 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Asit Mallick, Arjan Van De Ven,
	Dave Hansen, Andi Kleen, Andrea Arcangeli, Linus Torvalds,
	Tim Chen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Paolo Bonzini, David Woodhouse, Greg KH, Andy Lutomirski,
	Ashok Raj

On Thu, Feb 01, 2018 at 10:59:46PM +0100, KarimAllah Ahmed wrote:
> [ Based on a patch from Paolo Bonzini <pbonzini@redhat.com> ]
> 
> ... basically doing exactly what we do for VMX:
> 
> - Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
> - Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
>   actually used it.
> 
> Cc: Asit Mallick <asit.k.mallick@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> +	{ .index = MSR_IA32_SPEC_CTRL,			.always = false },


This .always = [false|true] field keeps throwing me off.

So glad: https://www.spinics.net/lists/kvm/msg161606.html explains it better.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2018-05-03 12:46 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-02-01 21:59 [PATCH v6 0/5] KVM: Expose speculation control feature to guests KarimAllah Ahmed
2018-02-01 21:59 ` [PATCH v6 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX KarimAllah Ahmed
2018-02-02 17:37   ` Jim Mattson
2018-02-01 21:59 ` [PATCH v6 2/5] KVM: x86: Add IBPB support KarimAllah Ahmed
2018-02-02 17:49   ` Konrad Rzeszutek Wilk
2018-02-02 18:02     ` David Woodhouse
2018-02-02 19:56       ` Konrad Rzeszutek Wilk
2018-02-02 20:16         ` David Woodhouse
2018-02-02 20:28           ` Konrad Rzeszutek Wilk
2018-02-02 20:31             ` David Woodhouse
2018-02-02 20:52               ` Konrad Rzeszutek Wilk
2018-02-02 20:52             ` Alan Cox
2018-02-05 19:22               ` Paolo Bonzini
2018-02-05 19:24             ` Paolo Bonzini
2018-02-16  3:44   ` Jim Mattson
2018-02-16  4:22     ` Andi Kleen
2018-05-03  1:27   ` Wanpeng Li
2018-05-03  9:19     ` Paolo Bonzini
2018-05-03 12:01       ` Wanpeng Li
2018-05-03 12:46       ` Tian, Kevin
2018-02-01 21:59 ` [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KarimAllah Ahmed
2018-02-02 10:53   ` Darren Kenny
2018-02-02 17:35     ` Jim Mattson
2018-02-02 17:51   ` Konrad Rzeszutek Wilk
2018-02-01 21:59 ` [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KarimAllah Ahmed
2018-02-02 11:03   ` Darren Kenny
2018-02-02 11:27   ` David Woodhouse
2018-02-02 17:53   ` Konrad Rzeszutek Wilk
2018-02-02 18:05     ` David Woodhouse
2018-02-02 18:19       ` Konrad Rzeszutek Wilk
2018-02-02 17:57   ` Jim Mattson
2018-02-01 21:59 ` [PATCH v6 5/5] KVM: SVM: " KarimAllah Ahmed
2018-02-02 11:06   ` Darren Kenny
2018-02-02 18:02   ` Konrad Rzeszutek Wilk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox