From: Nadav Amit <namit@vmware.com>
To: Ingo Molnar <mingo@redhat.com>
Cc: <linux-kernel@vger.kernel.org>, <x86@kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Nadav Amit <namit@vmware.com>, Andy Lutomirski <luto@kernel.org>,
Kees Cook <keescook@chromium.org>,
Peter Zijlstra <peterz@infradead.org>,
Dave Hansen <dave.hansen@intel.com>,
Masami Hiramatsu <mhiramat@kernel.org>
Subject: [PATCH v4 08/10] x86: avoid W^X being broken during modules loading
Date: Sat, 10 Nov 2018 15:17:30 -0800 [thread overview]
Message-ID: <20181110231732.15060-9-namit@vmware.com> (raw)
In-Reply-To: <20181110231732.15060-1-namit@vmware.com>
When modules and BPF filters are loaded, there is a time window in
which some memory is both writable and executable. An attacker that has
already found another vulnerability (e.g., a dangling pointer) might be
able to exploit this behavior to overwrite kernel code. This patch
prevents having writable executable PTEs in this stage.
In addition, avoiding having R+X mappings can also slightly simplify the
patching of modules code on initialization (e.g., by alternatives and
static-key), as would be done in the next patch.
To avoid having W+X mappings, set them initially as RW (NX) and after
they are set as RO set them as X as well. Setting them as executable is
done as a separate step to avoid one core in which the old PTE is cached
(hence writable), and another which sees the updated PTE (executable),
which would break the W^X protection.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
arch/x86/kernel/alternative.c | 28 +++++++++++++++++++++-------
arch/x86/kernel/module.c | 2 +-
include/linux/filter.h | 6 ++++++
kernel/module.c | 10 ++++++++++
4 files changed, 38 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 96607ef285c3..70827332da0f 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -667,15 +667,29 @@ void __init alternative_instructions(void)
* handlers seeing an inconsistent instruction while you patch.
*/
void *__init_or_module text_poke_early(void *addr, const void *opcode,
- size_t len)
+ size_t len)
{
unsigned long flags;
- local_irq_save(flags);
- memcpy(addr, opcode, len);
- local_irq_restore(flags);
- sync_core();
- /* Could also do a CLFLUSH here to speed up CPU recovery; but
- that causes hangs on some VIA CPUs. */
+
+ if (static_cpu_has(X86_FEATURE_NX) &&
+ is_module_text_address((unsigned long)addr)) {
+ /*
+ * Modules text is marked initially as non-executable, so the
+ * code cannot be running and speculative code-fetches are
+ * prevented. We can just change the code.
+ */
+ memcpy(addr, opcode, len);
+ } else {
+ local_irq_save(flags);
+ memcpy(addr, opcode, len);
+ local_irq_restore(flags);
+ sync_core();
+
+ /*
+ * Could also do a CLFLUSH here to speed up CPU recovery; but
+ * that causes hangs on some VIA CPUs.
+ */
+ }
return addr;
}
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index b052e883dd8c..cfa3106faee4 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -87,7 +87,7 @@ void *module_alloc(unsigned long size)
p = __vmalloc_node_range(size, MODULE_ALIGN,
MODULES_VADDR + get_module_load_offset(),
MODULES_END, GFP_KERNEL,
- PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
+ PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
if (p && (kasan_module_alloc(p, size) < 0)) {
vfree(p);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index de629b706d1d..ee9ae03c5f56 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -704,7 +704,13 @@ static inline void bpf_prog_unlock_ro(struct bpf_prog *fp)
static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
{
+ /*
+ * Perform mapping changes in two stages to avoid opening a time-window
+ * in which a PTE is cached in any TLB as writable, but marked as
+ * executable in the memory-resident mappings (e.g., page-tables).
+ */
set_memory_ro((unsigned long)hdr, hdr->pages);
+ set_memory_x((unsigned long)hdr, hdr->pages);
}
static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr)
diff --git a/kernel/module.c b/kernel/module.c
index 49a405891587..7cb207249437 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1946,9 +1946,19 @@ void module_enable_ro(const struct module *mod, bool after_init)
if (!rodata_enabled)
return;
+ /*
+ * Perform mapping changes in two stages to avoid opening a time-window
+ * in which a PTE is cached in any TLB as writable, but marked as
+ * executable in the memory-resident mappings (e.g., page-tables).
+ */
frob_text(&mod->core_layout, set_memory_ro);
+ frob_text(&mod->core_layout, set_memory_x);
+
frob_rodata(&mod->core_layout, set_memory_ro);
+
frob_text(&mod->init_layout, set_memory_ro);
+ frob_text(&mod->init_layout, set_memory_x);
+
frob_rodata(&mod->init_layout, set_memory_ro);
if (after_init)
--
2.17.1
next prev parent reply other threads:[~2018-11-11 6:35 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-11-10 23:17 [PATCH v4 00/10] x86/alternative: text_poke() fixes Nadav Amit
2018-11-10 23:17 ` [PATCH v4 01/10] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()" Nadav Amit
2018-11-12 2:54 ` Masami Hiramatsu
2018-11-12 10:59 ` Jiri Kosina
2018-11-10 23:17 ` [PATCH v4 02/10] x86/jump_label: Use text_poke_early() during early init Nadav Amit
2018-11-12 20:12 ` Nadav Amit
2018-11-10 23:17 ` [PATCH v4 03/10] x86/mm: temporary mm struct Nadav Amit
2018-11-10 23:17 ` [PATCH v4 04/10] fork: provide a function for copying init_mm Nadav Amit
2018-11-10 23:17 ` [PATCH v4 05/10] x86/alternative: initializing temporary mm for patching Nadav Amit
2018-11-11 14:43 ` Peter Zijlstra
2018-11-11 20:38 ` Nadav Amit
2018-11-12 0:34 ` Peter Zijlstra
2018-11-10 23:17 ` [PATCH v4 06/10] x86/alternative: use temporary mm for text poking Nadav Amit
2018-11-11 14:59 ` Peter Zijlstra
2018-11-11 20:53 ` Nadav Amit
2018-11-11 23:52 ` Peter Zijlstra
2018-11-12 0:09 ` Nadav Amit
2018-11-12 0:41 ` Peter Zijlstra
2018-11-12 0:36 ` Peter Zijlstra
2018-11-12 3:46 ` Ingo Molnar
2018-11-12 8:50 ` Peter Zijlstra
2018-11-11 19:11 ` Damian Tometzki
2018-11-11 20:41 ` Nadav Amit
2018-11-10 23:17 ` [PATCH v4 07/10] x86/kgdb: avoid redundant comparison of code Nadav Amit
2018-11-10 23:17 ` Nadav Amit [this message]
2018-11-10 23:17 ` [PATCH v4 09/10] x86/jump-label: remove support for custom poker Nadav Amit
2018-11-11 15:05 ` Peter Zijlstra
2018-11-11 20:31 ` Nadav Amit
2018-11-10 23:17 ` [PATCH v4 10/10] x86/alternative: remove the return value of text_poke_*() Nadav Amit
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181110231732.15060-9-namit@vmware.com \
--to=namit@vmware.com \
--cc=bp@alien8.de \
--cc=dave.hansen@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mhiramat@kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.