From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
To: Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Andy Lutomirski <luto@kernel.org>, Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Brian Gerst <brgerst@gmail.com>,
Chris Metcalf <cmetcalf@mellanox.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Liang Z Li <liang.z.li@intel.com>,
Masami Hiramatsu <mhiramat@kernel.org>,
Huang Rui <ray.huang@amd.com>, Jiri Slaby <jslaby@suse.cz>,
Jonathan Corbet <corbet@lwn.net>,
"Michael S. Tsirkin" <mst@redhat.com>,
Paul Gortmaker <paul.gortmaker@windriver.com>,
Vlastimil Babka <vbabka@suse.cz>, Chen Yucong <slaoub@gmail.com>,
Alexandre Julliard <julliard@winehq.org>,
Fenghua Yu <fenghua.yu@intel.com>, Stas Sergeev <stsp@list.ru>,
"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
Shuah Khan <shuah@kernel.org>,
linux-kernel@vger.kern
Subject: [v3 PATCH 07/10] x86: Add emulation code for UMIP instructions
Date: Wed, 25 Jan 2017 12:23:50 -0800 [thread overview]
Message-ID: <20170125202353.101059-8-ricardo.neri-calderon@linux.intel.com> (raw)
In-Reply-To: <20170125202353.101059-1-ricardo.neri-calderon@linux.intel.com>
The feature User-Mode Instruction Prevention present in recent Intel
processor prevents a group of instructions from being executed with
CPL > 0. Otherwise, a general protection fault is issued.
Rather than relaying this fault to the user space (in the form of a SIGSEGV
signal), the instructions protected by UMIP can be emulated to provide
dummy results. This allows to conserve the current kernel behavior and not
reveal the system resources that UMIP intends to protect (the global
descriptor and interrupt descriptor tables, the segment selectors of the
local descriptor table and the task state and the machine status word).
This emulation is needed because certain applications (e.g., WineHQ) rely
on this subset of instructions to function.
The instructions protected by UMIP can be split in two groups. Those who
return a kernel memory address (sgdt and sidt) and those who return a
value (sldt, str and smsw).
For the instructions that return a kernel memory address. This is needed as
applications such as WineHQ rely on the result being located in the kernel
memory space function. The result is emulated as a hard-coded value that,
in x86_64, lies in one of the memory holes in the kernel memory map. For
i386, the same values are used but truncated to 4 bytes. the location of a
dummy variable in the kernel memory space. The limit for the GDT and the
IDT are set to zero.
The instructions sldt and str return a segment selector relative to the
base address of the global descriptor table. Since the actual address of
such table is not revealed, it makes sense to emulate the result as zero.
The instruction smsw is emulated to return the value that the register CR0
has at boot time as set in the head_32/64.S files.
Given that these particular instructions are especially important for
virtual-8086 tasks, care is taken to appropriately emulate the results
in this particular case: utilize segment selectors to form addresses and
parse the contents of the general purpose results in 16-bit addressing
encodings.
Lastly, it might be possible that the memory address to which the results
must be written is not accessible. In such a case a SIGSEGV signal is
generated with error code equal to SEGV_MAPERR.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liang Z. Li <liang.z.li@intel.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: x86@kernel.org
Cc: linux-msdos@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
arch/x86/include/asm/umip.h | 15 +++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/umip.c | 251 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 267 insertions(+)
create mode 100644 arch/x86/include/asm/umip.h
create mode 100644 arch/x86/kernel/umip.c
diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h
new file mode 100644
index 0000000..077b236
--- /dev/null
+++ b/arch/x86/include/asm/umip.h
@@ -0,0 +1,15 @@
+#ifndef _ASM_X86_UMIP_H
+#define _ASM_X86_UMIP_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_INTEL_UMIP
+bool fixup_umip_exception(struct pt_regs *regs);
+#else
+static inline bool fixup_umip_exception(struct pt_regs *regs)
+{
+ return false;
+}
+#endif /* CONFIG_X86_INTEL_UMIP */
+#endif /* _ASM_X86_UMIP_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 581386c..c4aec02 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_EFI) += sysfb_efi.o
obj-$(CONFIG_PERF_EVENTS) += perf_regs.o
obj-$(CONFIG_TRACING) += tracepoint.o
obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o
+obj-$(CONFIG_X86_INTEL_UMIP) += umip.o
ifdef CONFIG_FRAME_POINTER
obj-y += unwind_frame.o
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
new file mode 100644
index 0000000..229f7a0
--- /dev/null
+++ b/arch/x86/kernel/umip.c
@@ -0,0 +1,251 @@
+/*
+ * umip.c Emulation for instruction protected by the Intel User-Mode
+ * Instruction Prevention. The instructions are:
+ * sgdt
+ * sldt
+ * sidt
+ * str
+ * smsw
+ *
+ * Copyright (c) 2016, Intel Corporation.
+ * Ricardo Neri <ricardo.neri@linux.intel.com>
+ */
+
+#include <linux/uaccess.h>
+#include <asm/umip.h>
+#include <asm/traps.h>
+#include <asm/insn.h>
+#include <asm/insn-kernel.h>
+#include <linux/ratelimit.h>
+
+/*
+ * == Base addresses of GDT and IDT
+ * Some applications to function rely finding the global descriptor table (GDT)
+ * and the interrupt descriptor table (IDT) in kernel memory. For x86_64 this
+ * matches a memory hole as detailed in Documentation/x86/x86_64/mm.txt.
+ * For x86_32, it does not match any particular hole, but it suffices
+ * to provide a memory location within kernel memory.
+ *
+ * == CRO flags for SMSW
+ * Use the flags given when booting, as found in head_64/32.S
+ */
+
+#define CR0_FLAGS (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | \
+ X86_CR0_WP | X86_CR0_AM)
+#ifdef CONFIG_X86_64
+#define UMIP_DUMMY_GDT_BASE 0xfffffffffffe0000
+#define UMIP_DUMMY_IDT_BASE 0xffffffffffff0000
+#define CR0_STATE (CR0_FLAGS | X86_CR0_PG)
+#else
+#define UMIP_DUMMY_GDT_BASE 0xfffe0000
+#define UMIP_DUMMY_IDT_BASE 0xffff0000
+#define CR0_STATE CR0_FLAGS
+#endif
+
+/*
+ * Definitions for x86 page fault error code bits. Only a simple
+ * pagefault during a write in user context is supported.
+ */
+#define UMIP_PF_USER BIT(2)
+#define UMIP_PF_WRITE BIT(1)
+
+enum umip_insn {
+ UMIP_SGDT = 0, /* opcode 0f 01 ModR/M reg 0 */
+ UMIP_SIDT, /* opcode 0f 01 ModR/M reg 1 */
+ UMIP_SLDT, /* opcode 0f 00 ModR/M reg 0 */
+ UMIP_SMSW, /* opcode 0f 01 ModR/M reg 4 */
+ UMIP_STR, /* opcode 0f 00 ModR/M reg 1 */
+};
+
+static int __identify_insn(struct insn *insn)
+{
+ /* By getting modrm we also get the opcode. */
+ insn_get_modrm(insn);
+
+ /* All the instructions of interest start with 0x0f. */
+ if (insn->opcode.bytes[0] != 0xf)
+ return -EINVAL;
+
+ if (insn->opcode.bytes[1] == 0x1) {
+ switch (X86_MODRM_REG(insn->modrm.value)) {
+ case 0:
+ return UMIP_SGDT;
+ case 1:
+ return UMIP_SIDT;
+ case 4:
+ return UMIP_SMSW;
+ default:
+ return -EINVAL;
+ }
+ } else if (insn->opcode.bytes[1] == 0x0) {
+ if (X86_MODRM_REG(insn->modrm.value) == 0)
+ return UMIP_SLDT;
+ else if (X86_MODRM_REG(insn->modrm.value) == 1)
+ return UMIP_STR;
+ else
+ return -EINVAL;
+ } else {
+ return -EINVAL;
+ }
+}
+
+static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
+ unsigned char *data, int *data_size)
+{
+ unsigned long dummy_base_addr;
+ unsigned short dummy_limit = 0;
+ unsigned int dummy_value = 0;
+
+ switch (umip_inst) {
+ /*
+ * These two instructions return the base address and limit of the
+ * global and interrupt descriptor table. The base address can be
+ * 32-bit or 64-bit. Limit is always 16-bit.
+ */
+ case UMIP_SGDT:
+ case UMIP_SIDT:
+ if (umip_inst == UMIP_SGDT)
+ dummy_base_addr = UMIP_DUMMY_GDT_BASE;
+ else
+ dummy_base_addr = UMIP_DUMMY_IDT_BASE;
+ if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+ /* SGDT and SIDT do not take register as argument. */
+ return -EINVAL;
+ }
+ /* Fill most significant byte with zeros if 16-bit addressing*/
+ if (insn->addr_bytes == 2)
+ dummy_base_addr &= 0xffffff;
+ memcpy(data + 2, &dummy_base_addr, sizeof(dummy_base_addr));
+ memcpy(data, &dummy_limit, sizeof(dummy_limit));
+ *data_size = sizeof(dummy_base_addr) + sizeof(dummy_limit);
+ break;
+ case UMIP_SMSW:
+ dummy_value = CR0_STATE;
+ /*
+ * These two instructions return a 16-bit value. We return
+ * all zeros. This is equivalent to a null descriptor for
+ * str and sldt.
+ */
+ case UMIP_SLDT:
+ case UMIP_STR:
+ /* if operand is a register, it is zero-extended*/
+ if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+ memset(data, 0, insn->opnd_bytes);
+ *data_size = insn->opnd_bytes;
+ /* if not, only the two least significant bytes are copied */
+ } else {
+ *data_size = 2;
+ }
+ memcpy(data, &dummy_value, sizeof(dummy_value));
+ break;
+ default:
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static void __force_sig_info_umip_fault(void __user *address,
+ struct pt_regs *regs)
+{
+ siginfo_t info;
+ struct task_struct *tsk = current;
+
+ if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)) {
+ printk_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%lx in %lx\n",
+ tsk->comm, task_pid_nr(tsk), regs->ip,
+ regs->sp, UMIP_PF_USER | UMIP_PF_WRITE, regs->ip);
+ }
+
+ tsk->thread.cr2 = (unsigned long)address;
+ tsk->thread.error_code = UMIP_PF_USER | UMIP_PF_WRITE;
+ tsk->thread.trap_nr = X86_TRAP_PF;
+
+ info.si_signo = SIGSEGV;
+ info.si_errno = 0;
+ info.si_code = SEGV_MAPERR;
+ info.si_addr = address;
+ force_sig_info(SIGSEGV, &info, tsk);
+}
+
+bool fixup_umip_exception(struct pt_regs *regs)
+{
+ struct insn insn;
+ unsigned char buf[MAX_INSN_SIZE];
+ /* 10 bytes is the maximum size of the result of UMIP instructions */
+ unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
+#ifdef CONFIG_X86_64
+ int x86_64 = user_64bit_mode(regs);
+#else
+ int x86_64 = 0;
+#endif
+ int not_copied, nr_copied, reg_offset, dummy_data_size;
+ void __user *uaddr;
+ unsigned long *reg_addr;
+ enum umip_insn umip_inst;
+
+ if (v8086_mode(regs))
+ /*
+ * In virtual-8086 mode the segment selector cs does not point
+ * to a segment descriptor but is used directly to compute the
+ * address. This is done after shifting it 4 bytes to the left.
+ * The result is added to the instruction pointer.
+ */
+ not_copied = copy_from_user(buf,
+ (void __user *)((regs->cs << 4) +
+ regs->ip),
+ sizeof(buf));
+ else
+ not_copied = copy_from_user(buf, (void __user *)regs->ip,
+ sizeof(buf));
+ nr_copied = sizeof(buf) - not_copied;
+ /*
+ * The copy_from_user above could have failed if user code is protected
+ * by a memory protection key. Give up on emulation in such a case.
+ * Should we issue a page fault?
+ */
+ if (!nr_copied)
+ return false;
+ insn_init(&insn, buf, nr_copied, x86_64);
+ /*
+ * Set address and operand sizes to the default of 16 bits. If
+ * overrides are found, sizes will be fixed when parsing the instruction
+ * prefixes.
+ */
+ if (v8086_mode(regs)) {
+ insn.addr_bytes = 2;
+ insn.opnd_bytes = 2;
+ }
+ insn_get_length(&insn);
+ if (nr_copied < insn.length)
+ return false;
+
+ umip_inst = __identify_insn(&insn);
+ /* Check if we found an instruction protected by UMIP */
+ if (umip_inst < 0)
+ return false;
+
+ if (__emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size))
+ return false;
+
+ /* If operand is a register, write directly to it */
+ if (X86_MODRM_MOD(insn.modrm.value) == 3) {
+ reg_offset = insn_get_reg_offset_rm(&insn, regs);
+ reg_addr = (unsigned long *)((unsigned long)regs + reg_offset);
+ memcpy(reg_addr, dummy_data, dummy_data_size);
+ } else {
+ uaddr = insn_get_addr_ref(&insn, regs);
+ nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
+ if (nr_copied > 0) {
+ /*
+ * If copy fails, send a signal and tell caller that
+ * fault was fixed up.
+ */
+ __force_sig_info_umip_fault(uaddr, regs);
+ return true;
+ }
+ }
+
+ /* increase IP to let the program keep going */
+ regs->ip += insn.length;
+ return true;
+}
--
2.9.3
next prev parent reply other threads:[~2017-01-25 20:23 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-25 20:23 [v3 PATCH 00/10] x86: Enable User-Mode Instruction Prevention Ricardo Neri
2017-01-25 20:23 ` [v3 PATCH 01/10] x86/mpx: Do not use SIB index if index points to R/ESP Ricardo Neri
2017-01-25 20:23 ` [v3 PATCH 02/10] x86/mpx: Fail decoding when SIB baseR/EBP is and no displacement is used Ricardo Neri
2017-01-25 20:23 ` [v3 PATCH 03/10] x86/mpx, x86/insn: Relocate insn util functions to a new insn-kernel Ricardo Neri
2017-01-26 2:23 ` Masami Hiramatsu
2017-01-25 20:23 ` [v3 PATCH 04/10] x86/insn-kernel: Add a function to obtain register offset in ModRM Ricardo Neri
2017-01-26 2:11 ` Masami Hiramatsu
2017-01-26 6:07 ` Ricardo Neri
2017-01-27 7:53 ` Masami Hiramatsu
2017-02-01 1:01 ` Ricardo Neri
2017-01-25 20:23 ` [v3 PATCH 05/10] x86/insn-kernel: Add support to resolve 16-bit addressing encodings Ricardo Neri
2017-01-25 21:58 ` Andy Lutomirski
2017-01-25 22:04 ` H. Peter Anvin
2017-01-26 5:50 ` Ricardo Neri
2017-01-26 17:05 ` Andy Lutomirski
2017-01-27 3:44 ` Ricardo Neri
2017-01-25 20:23 ` [v3 PATCH 06/10] x86/cpufeature: Add User-Mode Instruction Prevention definitions Ricardo Neri
2017-01-25 20:23 ` Ricardo Neri [this message]
2017-01-25 20:38 ` [v3 PATCH 07/10] x86: Add emulation code for UMIP instructions H. Peter Anvin
2017-01-26 5:54 ` Ricardo Neri
2017-01-25 20:23 ` [v3 PATCH 08/10] x86/traps: Fixup general protection faults caused by UMIP Ricardo Neri
2017-01-25 20:23 ` [v3 PATCH 09/10] x86: Enable User-Mode Instruction Prevention Ricardo Neri
2017-01-25 20:23 ` [v3 PATCH 10/10] selftests/x86: Add tests for " Ricardo Neri
2017-01-25 20:34 ` [v3 PATCH 00/10] x86: Enable " H. Peter Anvin
2017-01-26 5:51 ` Ricardo Neri
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170125202353.101059-8-ricardo.neri-calderon@linux.intel.com \
--to=ricardo.neri-calderon@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=bp@suse.de \
--cc=brgerst@gmail.com \
--cc=cmetcalf@mellanox.com \
--cc=corbet@lwn.net \
--cc=dave.hansen@linux.intel.com \
--cc=fenghua.yu@intel.com \
--cc=hpa@zytor.com \
--cc=jslaby@suse.cz \
--cc=julliard@winehq.org \
--cc=liang.z.li@intel.com \
--cc=linux-kernel@vger.kern \
--cc=luto@kernel.org \
--cc=mhiramat@kernel.org \
--cc=mingo@redhat.com \
--cc=mst@redhat.com \
--cc=paul.gortmaker@windriver.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=ravi.v.shankar@intel.com \
--cc=ray.huang@amd.com \
--cc=shuah@kernel.org \
--cc=slaoub@gmail.com \
--cc=stsp@list.ru \
--cc=tglx@linutronix.de \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox