From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail.linutronix.de (146.0.238.70:993) by crypto-ml.lab.linutronix.de with IMAP4-SSL for ; 20 Feb 2019 15:18:12 -0000
Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1gwTdE-000626-6b for speck@linutronix.de; Wed, 20 Feb 2019 16:17:56 +0100
Message-Id: <20190220151400.306266355@linutronix.de>
Date: Wed, 20 Feb 2019 16:07:57 +0100
From: Thomas Gleixner
References: <20190220150753.665964899@linutronix.de>
MIME-Version: 1.0
Subject: [patch V2 04/10] MDS basics+ 4
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
To: speck@linutronix.de
List-ID:

Subject: [patch V2 04/10] x86/speculation/mds: Clear CPU buffers on exit to user
From: Thomas Gleixner

Add a static key which controls the invocation of the CPU buffer clear
mechanism on exit to user space and add the call into
prepare_exit_to_usermode() right before actually returning.

Add documentation which kernel to user space transitions this covers and
explain in detail why those which are not mitigated do not need it.

Signed-off-by: Thomas Gleixner
---
 Documentation/x86/mds.rst            |   79 +++++++++++++++++++++++++++++++++++
 arch/x86/entry/common.c              |    9 +++
 arch/x86/include/asm/nospec-branch.h |    2 
 arch/x86/kernel/cpu/bugs.c           |    4 +
 4 files changed, 93 insertions(+), 1 deletion(-)

--- a/Documentation/x86/mds.rst
+++ b/Documentation/x86/mds.rst
@@ -64,3 +64,82 @@ itself are not required because the nece
 data cannot be controlled in a way which allows exploitation from
 malicious user space or VM guests.
 
+Mitigation points
+-----------------
+
+1. Return to user space
+^^^^^^^^^^^^^^^^^^^^^^^
+
+   When transitioning from kernel to user space the CPU buffers are
+   flushed on affected CPUs:
+
+   - always when the mitigation mode is full. In this case the
+     invocation depends on the static key mds_user_clear_always.
+
+   - depending on the functions executed between entering kernel space
+     and returning to user space. This is not yet implemented.
+
+   This covers transitions from kernel to user space through a return
+   to user space from a syscall and from an interrupt or a regular
+   exception.
+
+   There are other kernel to user space transitions which are not
+   covered by this: NMIs and all non maskable exceptions which go
+   through the paranoid exit, which means that they do not take the
+   regular prepare_exit_to_usermode() exit path which handles the CPU
+   buffer clearing.
+
+   The occasional non maskable exceptions which go through paranoid
+   exit are not controllable by user space in any way and most of these
+   exceptions cannot expose any valuable information either.
+
+   Neither can NMIs be reliably controlled by a non privileged attacker
+   and their exposure to sensitive data is very limited. NMIs originate
+   from:
+
+   - Performance monitoring.
+
+     Performance monitoring is restricted by various mechanisms, i.e. a
+     regular user on a properly secured system can - if at all - only
+     monitor its own user space processes. The performance monitoring
+     NMI surely executes privileged kernel code and accesses kernel
+     internal data structures, which might be exploitable to break the
+     kernel's address space layout randomization, which is a non-issue
+     on affected CPUs as there are simpler ways to achieve that.
+
+   - Watchdog
+
+     The kernel uses - if enabled - a performance monitoring event to
+     trigger NMIs periodically which allows detection of hard lockups
+     in kernel space due to deadlocks or other issues.
+
+     The watchdog period is a multiple of seconds and the code path
+     executed cannot expose any secret information other than kernel
+     address space layout. Due to the low frequency and the limited
+     control a potential attacker has to align on the watchdog period,
+     the attack surface is close to zero.
+
+   - Legacy oprofile NMI handler
+
+     Similar to performance monitoring, albeit potentially less
+     restricted, but it has been widely replaced by the performance
+     monitoring interface perf. State of the art systems will not
+     expose the oprofile interface and even if exposed the potentially
+     exploitable information is accessible by other and simpler means.
+
+   - KGDB
+
+     If the kernel debugger is accessible by an unprivileged attacker,
+     then the NMI handler is the least of the problems.
+
+   - ACPI/GHES
+
+     A firmware based error reporting mechanism which uses NMIs for
+     notification. Similar to Machine Check Exceptions there is no
+     known way for an attacker to reliably control and trigger errors
+     which would cause NMIs. Even if that were the case the potentially
+     exploitable data, e.g. kernel address space layout, would be
+     accessible by simpler means.
+
+   - IPMI, vendor specific NMIs, forced shutdown NMI
+
+     None of those are controllable by unprivileged attackers to form a
+     reliable exploit surface.

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include <asm/nospec-branch.h>
 
 #define CREATE_TRACE_POINTS
 #include
@@ -180,6 +181,12 @@ static void exit_to_usermode_loop(struct
 	}
 }
 
+static inline void mds_user_clear_cpu_buffers(void)
+{
+	if (static_branch_likely(&mds_user_clear_always))
+		mds_clear_cpu_buffers();
+}
+
 /* Called with IRQs disabled. */
 __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
 {
@@ -212,6 +219,8 @@ static void exit_to_usermode_loop(struct
 #endif
 
 	user_enter_irqoff();
+
+	mds_user_clear_cpu_buffers();
 }
 
 #define SYSCALL_EXIT_WORK_FLAGS \
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -318,6 +318,8 @@ DECLARE_STATIC_KEY_FALSE(switch_to_cond_
 DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
 DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 
+DECLARE_STATIC_KEY_FALSE(mds_user_clear_always);
+
 #include
 
 /**
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -63,10 +63,12 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_i
 /* Control unconditional IBPB in switch_mm() */
 DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 
+/* Control MDS CPU buffer clear before returning to user space */
+DEFINE_STATIC_KEY_FALSE(mds_user_clear_always);
+
 void __init check_bugs(void)
 {
 	identify_boot_cpu();
-
 	/*
 	 * identify_boot_cpu() initialized SMT support information, let the
 	 * core code know.