From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tglx@linutronix.de>
Received: from mail.linutronix.de (146.0.238.70:993) by
  crypto-ml.lab.linutronix.de with IMAP4-SSL for <speck@linutronix.de>; 23 Feb
  2019 00:03:28 -0000
Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de)
	by Galois.linutronix.de with esmtp (Exim 4.80)
	(envelope-from <tglx@linutronix.de>)
	id 1gxKmt-0007Se-76
	for speck@linutronix.de; Sat, 23 Feb 2019 01:03:27 +0100
Message-Id: <20190222222418.405369026@linutronix.de>
Date: Fri, 22 Feb 2019 23:24:18 +0100
From: Thomas Gleixner <tglx@linutronix.de>
Subject: [patch V4 00/11] MDS basics
To: speck@linutronix.de
List-ID: <speck.linutronix.de>

Hi!

Another day, another update.

Changes since V3:

  - Add the #DF mitigation and document why I can't be bothered
    to sprinkle the buffer clear into #MC

  - Add a comment about the segment selector choice. It makes sense on its
    own but it won't prevent anyone from thinking that we're crazy.

  - Addressed the review feedback vs. documentation

  - Resurrected the admin documentation patch, tidied it up and filled the
    gaps.

Delta patch without the admin documentation parts below.

Git tree WIP.mds branch is updated as well.

If any of the people new to this need access to the git repo,
please send me a public SSH key so I can add it to the gitolite config.

There is one point left which I did not look into yet and I'm happy to
delegate that to the virtualization wizards:

  XEON PHI is not affected by L1TF, so it won't get the L1TF
  mitigations. But it is affected by MSBDS, so it needs separate
  mitigation, i.e. clearing CPU buffers on VMENTER.
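
A minimal sketch of that decision as a plain userspace C model, not kernel
code. The struct, the field names and the helper are hypothetical and exist
only for illustration. It also assumes that an active L1D flush on VMENTER
already clears the MDS-affected buffers, which is what makes a separate
clear unnecessary on L1TF-affected CPUs.

```c
#include <stdbool.h>

/* Hypothetical per-CPU bug flags, for illustration only. */
struct cpu_bugs {
	bool l1tf;	/* CPU is affected by L1TF */
	bool msbds;	/* CPU is affected by MSBDS */
};

/*
 * Decide whether a separate CPU buffer clear is needed on VMENTER.
 *
 * Assumption: when the L1TF mitigation flushes L1D on VMENTER, that
 * flush also clears the buffers, so no extra clear is required.
 */
static bool needs_vmenter_buffer_clear(const struct cpu_bugs *bugs,
				       bool l1d_flush_active)
{
	/* Not affected by MSBDS: nothing to clear. */
	if (!bugs->msbds)
		return false;

	/* L1TF-affected CPU with an active L1D flush is already covered. */
	if (bugs->l1tf && l1d_flush_active)
		return false;

	/* E.g. XEON PHI: MSBDS without L1TF, so no L1D flush happens. */
	return true;
}
```

With l1tf=false and msbds=true the helper asks for the separate clear no
matter how the L1D flush is configured, which is the XEON PHI case above.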


Thanks,

	Thomas

8<-------------------

 Documentation/ABI/testing/sysfs-devices-system-cpu |    1 
 Documentation/admin-guide/hw-vuln/index.rst        |   13 +
 Documentation/admin-guide/hw-vuln/l1tf.rst         |    1 
 Documentation/admin-guide/hw-vuln/mds.rst          |  258 +++++++++++++++++++++
 Documentation/admin-guide/index.rst                |    6 
 Documentation/admin-guide/kernel-parameters.txt    |   27 ++
 Documentation/index.rst                            |    1 
 Documentation/x86/conf.py                          |   10 
 Documentation/x86/index.rst                        |    8 
 Documentation/x86/mds.rst                          |  205 ++++++++++++++++
 arch/x86/entry/common.c                            |   10 
 arch/x86/include/asm/cpufeatures.h                 |    2 
 arch/x86/include/asm/irqflags.h                    |    4 
 arch/x86/include/asm/msr-index.h                   |   39 +--
 arch/x86/include/asm/mwait.h                       |    7 
 arch/x86/include/asm/nospec-branch.h               |   39 +++
 arch/x86/include/asm/processor.h                   |    7 
 arch/x86/kernel/cpu/bugs.c                         |  105 ++++++++
 arch/x86/kernel/cpu/common.c                       |   13 +
 arch/x86/kernel/nmi.c                              |    6 
 arch/x86/kernel/traps.c                            |    9 
 arch/x86/kvm/cpuid.c                               |    3 
 drivers/base/cpu.c                                 |    8 
 include/linux/cpu.h                                |    2 
 24 files changed, 762 insertions(+), 22 deletions(-)

diff --git a/Documentation/x86/mds.rst b/Documentation/x86/mds.rst
index 0c0d802367e6..ce3dbddbd3b8 100644
--- a/Documentation/x86/mds.rst
+++ b/Documentation/x86/mds.rst
@@ -1,7 +1,12 @@
 Microarchitecural Data Sampling (MDS) mitigation
 ================================================
 
-Microarchitectural Data Sampling (MDS) is a class of side channel attacks
+.. _mds:
+
+Overview
+--------
+
+Microarchitectural Data Sampling (MDS) is a family of side channel attacks
 on internal buffers in Intel CPUs. The variants are:
 
  - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
@@ -33,6 +38,7 @@ faulting or assisting loads under certain conditions, which again can be
 exploited eventually. Load ports are shared between Hyper-Threads so cross
 thread leakage is possible.
 
+
 Exposure assumptions
 --------------------
 
@@ -48,7 +54,7 @@ needed for exploiting MDS requires:
  - to control the pointer through which the disclosure gadget exposes the
    data
 
-The existance of such a construct cannot be excluded with 100% certainty,
+The existence of such a construct cannot be excluded with 100% certainty,
 but the complexity involved makes it extremly unlikely.
 
 There is one exception, which is untrusted BPF. The functionality of
@@ -91,13 +97,37 @@ the invocation can be enforced or conditional.
 As a special quirk to address virtualization scenarios where the host has
 the microcode updated, but the hypervisor does not (yet) expose the
 MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
-hope that it might work. The state is reflected accordingly.
+hope that it might actually clear the buffers. The state is reflected
+accordingly.
 
 According to current knowledge additional mitigations inside the kernel
 itself are not required because the necessary gadgets to expose the leaked
 data cannot be controlled in a way which allows exploitation from malicious
 user space or VM guests.
 
+
+Kernel internal mitigation modes
+--------------------------------
+
+ ======= ===========================================================
+ off     Mitigation is disabled. Either the CPU is not affected or
+         mds=off is supplied on the kernel command line.
+
+ full    Mitigation is enabled. CPU is affected and MD_CLEAR is
+         advertised in CPUID.
+
+ vmwerv  Mitigation is enabled. CPU is affected and MD_CLEAR is not
+         advertised in CPUID. That is mainly for virtualization
+         scenarios where the host has the updated microcode but the
+         hypervisor does not expose MD_CLEAR in CPUID. It's a best
+         effort approach without guarantee.
+ ======= ===========================================================
+
+If the CPU is affected and mds=off is not supplied on the kernel
+command line then the kernel selects the appropriate mitigation mode
+depending on the availability of the MD_CLEAR CPUID bit.
+
+
 Mitigation points
 -----------------
 
@@ -128,8 +158,16 @@ Mitigation points
    coverage.
 
    There is one non maskable exception which returns through paranoid exit
-   and is not mitigated: #DF. If user space is able to trigger a double
-   fault the possible MDS leakage is the least problem to worry about.
+   and is to some extent controllable from user space through
+   modify_ldt(2): #DF. So mitigation is required in the double fault
+   handler as well.
+
+   Another corner case is a #MC which hits between the buffer clear and the
+   actual return to user. As this still is in kernel space it takes the
+   paranoid exit path which does not clear the CPU buffers. So the #MC
+   handler repopulates the buffers to some extent. Machine checks are not
+   reliably controllable and the window is extremely small so mitigation
+   would just tick a checkbox that this theoretical corner case is covered.
 
 
 2. C-State transition
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 8be9158d848e..3e27ccd6d5c5 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -338,6 +338,8 @@ static inline void mds_clear_cpu_buffers(void)
 	 * Has to be the memory-operand variant because only that
 	 * guarantees the CPU buffer flush functionality according to
 	 * documentation. The register-operand variant does not.
+	 * Works with any segment selector, but a valid writable
+	 * data segment is the fastest variant.
 	 *
 	 * "cc" clobber is required because VERW modifies ZF.
 	 */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 0fb241a78de3..83b19bb54093 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -68,6 +68,7 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 DEFINE_STATIC_KEY_FALSE(mds_user_clear);
 /* Control MDS CPU buffer clear before idling (halt, mwait) */
 DEFINE_STATIC_KEY_FALSE(mds_idle_clear);
+EXPORT_SYMBOL_GPL(mds_idle_clear);
 
 void __init check_bugs(void)
 {
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9b7c4ca8f0a7..d2779f4730f5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -366,6 +366,15 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
 		regs->ip = (unsigned long)general_protection;
 		regs->sp = (unsigned long)&gpregs->orig_ax;
 
+		/*
+		 * This situation can be triggered by userspace via
+		 * modify_ldt(2) and the return does not take the regular
+		 * user space exit, so a CPU buffer clear is required when
+		 * MDS mitigation is enabled.
+		 */
+		if (static_branch_unlikely(&mds_user_clear))
+			mds_clear_cpu_buffers();
+
 		return;
 	}
 #endif