From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail.linutronix.de (146.0.238.70:993)
	by crypto-ml.lab.linutronix.de with IMAP4-SSL
	for ; 23 Feb 2019 00:03:28 -0000
Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de)
	by Galois.linutronix.de with esmtp (Exim 4.80)
	(envelope-from )
	id 1gxKmt-0007Se-76
	for speck@linutronix.de; Sat, 23 Feb 2019 01:03:27 +0100
Message-Id: <20190222222418.405369026@linutronix.de>
Date: Fri, 22 Feb 2019 23:24:18 +0100
From: Thomas Gleixner
Subject: [patch V4 00/11] MDS basics
To: speck@linutronix.de
List-ID:

Hi!

Another day, another update.

Changes since V3:

  - Add the #DF mitigation and document why I can't be bothered to
    sprinkle the buffer clear into #MC

  - Add a comment about the segment selector choice. It makes sense on
    its own but it won't prevent anyone from thinking that we're crazy.

  - Addressed the review feedback vs. documentation

  - Resurrected the admin documentation patch, tidied it up and filled
    the gaps.

Delta patch without the admin documentation parts below.

Git tree WIP.mds branch is updated as well.

If any of the people new to this needs access to the git repo, please
send me a public SSH key so I can add it to the gitolite config.

There is one point left which I did not look into yet and I'm happy to
delegate that to the virtualization wizards:

  XEON PHI is not affected by L1TF, so it won't get the L1TF
  mitigations. But it is affected by MSBDS, so it needs separate
  mitigation, i.e. clearing CPU buffers on VMENTER.

Thanks,

	Thomas

8<-------------------

 Documentation/ABI/testing/sysfs-devices-system-cpu |   1
 Documentation/admin-guide/hw-vuln/index.rst        |  13 +
 Documentation/admin-guide/hw-vuln/l1tf.rst         |   1
 Documentation/admin-guide/hw-vuln/mds.rst          | 258 +++++++++++++++++++++
 Documentation/admin-guide/index.rst                |   6
 Documentation/admin-guide/kernel-parameters.txt    |  27 ++
 Documentation/index.rst                            |   1
 Documentation/x86/conf.py                          |  10
 Documentation/x86/index.rst                        |   8
 Documentation/x86/mds.rst                          | 205 ++++++++++++++++
 arch/x86/entry/common.c                            |  10
 arch/x86/include/asm/cpufeatures.h                 |   2
 arch/x86/include/asm/irqflags.h                    |   4
 arch/x86/include/asm/msr-index.h                   |  39 +--
 arch/x86/include/asm/mwait.h                       |   7
 arch/x86/include/asm/nospec-branch.h               |  39 +++
 arch/x86/include/asm/processor.h                   |   7
 arch/x86/kernel/cpu/bugs.c                         | 105 ++++++++
 arch/x86/kernel/cpu/common.c                       |  13 +
 arch/x86/kernel/nmi.c                              |   6
 arch/x86/kernel/traps.c                            |   9
 arch/x86/kvm/cpuid.c                               |   3
 drivers/base/cpu.c                                 |   8
 include/linux/cpu.h                                |   2
 24 files changed, 762 insertions(+), 22 deletions(-)

diff --git a/Documentation/x86/mds.rst b/Documentation/x86/mds.rst
index 0c0d802367e6..ce3dbddbd3b8 100644
--- a/Documentation/x86/mds.rst
+++ b/Documentation/x86/mds.rst
@@ -1,7 +1,12 @@
 Microarchitecural Data Sampling (MDS) mitigation
 ================================================

-Microarchitectural Data Sampling (MDS) is a class of side channel attacks
+.. _mds:
+
+Overview
+--------
+
+Microarchitectural Data Sampling (MDS) is a family of side channel attacks
 on internal buffers in Intel CPUs. The variants are:

  - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
@@ -33,6 +38,7 @@ faulting or assisting loads under certain conditions, which again can be
 exploited eventually. Load ports are shared between Hyper-Threads so cross
 thread leakage is possible.

+
 Exposure assumptions
 --------------------

@@ -48,7 +54,7 @@ needed for exploiting MDS requires:
  - to control the pointer through which the disclosure gadget exposes the
    data

-The existance of such a construct cannot be excluded with 100% certainty,
+The existence of such a construct cannot be excluded with 100% certainty,
 but the complexity involved makes it extremly unlikely.

 There is one exception, which is untrusted BPF. The functionality of
@@ -91,13 +97,37 @@ the invocation can be enforced or conditional.
 As a special quirk to address virtualization scenarios where the host has
 the microcode updated, but the hypervisor does not (yet) expose the
 MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
-hope that it might work. The state is reflected accordingly.
+hope that it might actually clear the buffers. The state is reflected
+accordingly.

 According to current knowledge additional mitigations inside the kernel
 itself are not required because the necessary gadgets to expose the leaked
 data cannot be controlled in a way which allows exploitation from malicious
 user space or VM guests.

+
+Kernel internal mitigation modes
+--------------------------------
+
+ ======= ===========================================================
+ off     Mitigation is disabled. Either the CPU is not affected or
+         mds=off is supplied on the kernel command line
+
+ full    Mitigation is enabled. CPU is affected and MD_CLEAR is
+         advertised in CPUID.
+
+ vmwerv  Mitigation is enabled. CPU is affected and MD_CLEAR is not
+         advertised in CPUID. That is mainly for virtualization
+         scenarios where the host has the updated microcode but the
+         hypervisor does not expose MD_CLEAR in CPUID. It's a best
+         effort approach without guarantee.
+ ======= ===========================================================
+
+If the CPU is affected and mds=off is not supplied on the kernel
+command line then the kernel selects the appropriate mitigation mode
+depending on the availability of the MD_CLEAR CPUID bit.
+
+
 Mitigation points
 -----------------

@@ -128,8 +158,16 @@ Mitigation points
    coverage.

    There is one non maskable exception which returns through paranoid exit
-   and is not mitigated: #DF. If user space is able to trigger a double
-   fault the possible MDS leakage is the least problem to worry about.
+   and is to some extent controllable from user space through
+   modify_ldt(2): #DF. So mitigation is required in the double fault
+   handler as well.
+
+   Another corner case is a #MC which hits between the buffer clear and the
+   actual return to user. As this still is in kernel space it takes the
+   paranoid exit path which does not clear the CPU buffers. So the #MC
+   handler repopulates the buffers to some extent. Machine checks are not
+   reliably controllable and the window is extremely small so mitigation
+   would just tick a checkbox that this theoretical corner case is covered.


 2. C-State transition
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 8be9158d848e..3e27ccd6d5c5 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -338,6 +338,8 @@ static inline void mds_clear_cpu_buffers(void)
 	 * Has to be the memory-operand variant because only that
 	 * guarantees the CPU buffer flush functionality according to
 	 * documentation. The register-operand variant does not.
+	 * Works with any segment selector, but a valid writable
+	 * data segment is the fastest variant.
 	 *
 	 * "cc" clobber is required because VERW modifies ZF.
 	 */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 0fb241a78de3..83b19bb54093 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -68,6 +68,7 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 DEFINE_STATIC_KEY_FALSE(mds_user_clear);
 /* Control MDS CPU buffer clear before idling (halt, mwait) */
 DEFINE_STATIC_KEY_FALSE(mds_idle_clear);
+EXPORT_SYMBOL_GPL(mds_idle_clear);

 void __init check_bugs(void)
 {
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9b7c4ca8f0a7..d2779f4730f5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -366,6 +366,15 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
 		regs->ip = (unsigned long)general_protection;
 		regs->sp = (unsigned long)&gpregs->orig_ax;

+		/*
+		 * This situation can be triggered by userspace via
+		 * modify_ldt(2) and the return does not take the regular
+		 * user space exit, so a CPU buffer clear is required when
+		 * MDS mitigation is enabled.
+		 */
+		if (static_branch_unlikely(&mds_user_clear))
+			mds_clear_cpu_buffers();
+
 		return;
 	}
 #endif
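
For readers who want the mode selection from the "Kernel internal
mitigation modes" table in one place, here is a minimal user space
sketch of that decision. It is an illustration only: the enum, the
function name and the three boolean inputs are hypothetical and are not
the identifiers used in arch/x86/kernel/cpu/bugs.c. It assumes nothing
beyond what the table states.

```c
#include <stdbool.h>

/* Hypothetical names for illustration, not the kernel's identifiers. */
enum mds_mode {
	MDS_MODE_OFF,		/* "off":    CPU not affected or mds=off    */
	MDS_MODE_FULL,		/* "full":   affected, MD_CLEAR advertised  */
	MDS_MODE_VMWERV,	/* "vmwerv": affected, MD_CLEAR not visible */
};

/*
 * Pick the mitigation mode as described in the documentation table:
 * off wins if the CPU is not affected or mds=off was given, otherwise
 * the MD_CLEAR CPUID bit decides between full and best effort VERW.
 */
static enum mds_mode mds_select_mode(bool cpu_affected, bool cmdline_off,
				     bool has_md_clear)
{
	if (!cpu_affected || cmdline_off)
		return MDS_MODE_OFF;

	return has_md_clear ? MDS_MODE_FULL : MDS_MODE_VMWERV;
}
```

For example, an affected CPU in a guest whose hypervisor hides MD_CLEAR
would end up in the best effort mode: mds_select_mode(true, false,
false) yields MDS_MODE_VMWERV.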