From: Nadav Amit <namit@vmware.com>
To: Ingo Molnar <mingo@redhat.com>
Cc: <linux-kernel@vger.kernel.org>, <x86@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
	Nadav Amit <nadav.amit@gmail.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>, <linux_dti@icloud.com>,
	<linux-integrity@vger.kernel.org>,
	<linux-security-module@vger.kernel.org>,
	Nadav Amit <namit@vmware.com>, Kees Cook <keescook@chromium.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Masami Hiramatsu <mhiramat@kernel.org>
Subject: [PATCH v7 10/14] x86: avoid W^X being broken during modules loading
Date: Tue, 4 Dec 2018 17:34:04 -0800
Message-ID: <20181205013408.47725-11-namit@vmware.com>
In-Reply-To: <20181205013408.47725-1-namit@vmware.com>

When modules and BPF filters are loaded, there is a time window in
which some memory is both writable and executable. An attacker who has
already found another vulnerability (e.g., a dangling pointer) might be
able to exploit this behavior to overwrite kernel code. This patch
prevents having writable and executable PTEs at this stage.

In addition, avoiding W+X mappings can also slightly simplify the
patching of module code on initialization (e.g., by alternatives and
static keys), as will be done in the next patch. This was actually the
main motivation for this patch.

To avoid W+X mappings, map the memory initially as RW (NX), and only
after it has been set RO set it as X as well. Setting it executable is
done as a separate step to avoid a window in which one core still has the
old PTE cached (hence writable) while another core already sees the
updated PTE (executable), which would break the W^X protection.
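
As a rough illustration of the ordering described above, here is a
userspace analogy only, not kernel code. It assumes a POSIX system and
uses mprotect() as a stand-in for set_memory_ro()/set_memory_x(); the
helper name load_code_wx_safe() is hypothetical and does not appear in
this patch. The buffer is never writable and executable at the same
time: write permission is dropped before execute permission is added.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (userspace analogy): copy 'len' bytes of 'code'
 * into fresh memory and make it executable without ever having a
 * writable+executable mapping. Returns NULL on failure.
 */
static void *load_code_wx_safe(const void *code, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t size;
	void *buf;

	if (page <= 0 || len == 0)
		return NULL;

	/* Round the length up to a whole number of pages. */
	size = (len + (size_t)page - 1) / (size_t)page * (size_t)page;

	/* Stage 0: allocate as writable, non-executable (RW, NX). */
	buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return NULL;

	memcpy(buf, code, len);

	/* Stage 1: drop write permission first (RO, still NX). */
	if (mprotect(buf, size, PROT_READ) != 0)
		goto fail;

	/* Stage 2: only now add execute permission (RO + X). */
	if (mprotect(buf, size, PROT_READ | PROT_EXEC) != 0)
		goto fail;

	return buf;

fail:
	munmap(buf, size);
	return NULL;
}
```

In the kernel the two stages matter because of stale TLB entries on
other cores; the userspace sketch only mirrors the order of the
permission changes, not the TLB behavior.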

Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/kernel/alternative.c | 28 +++++++++++++++++++++-------
 arch/x86/kernel/module.c      |  2 +-
 include/linux/filter.h        |  6 ++++++
 kernel/module.c               | 10 ++++++++++
 4 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 8fc4685f3117..18415e3b6000 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -667,15 +667,29 @@ void __init alternative_instructions(void)
  * handlers seeing an inconsistent instruction while you patch.
  */
 void *__init_or_module text_poke_early(void *addr, const void *opcode,
-					      size_t len)
+				       size_t len)
 {
 	unsigned long flags;
-	local_irq_save(flags);
-	memcpy(addr, opcode, len);
-	local_irq_restore(flags);
-	sync_core();
-	/* Could also do a CLFLUSH here to speed up CPU recovery; but
-	   that causes hangs on some VIA CPUs. */
+
+	if (static_cpu_has(X86_FEATURE_NX) &&
+	    is_module_text_address((unsigned long)addr)) {
+		/*
+		 * Modules text is marked initially as non-executable, so the
+		 * code cannot be running and speculative code-fetches are
+		 * prevented. We can just change the code.
+		 */
+		memcpy(addr, opcode, len);
+	} else {
+		local_irq_save(flags);
+		memcpy(addr, opcode, len);
+		local_irq_restore(flags);
+		sync_core();
+
+		/*
+		 * Could also do a CLFLUSH here to speed up CPU recovery; but
+		 * that causes hangs on some VIA CPUs.
+		 */
+	}
 	return addr;
 }
 
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index b052e883dd8c..cfa3106faee4 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -87,7 +87,7 @@ void *module_alloc(unsigned long size)
 	p = __vmalloc_node_range(size, MODULE_ALIGN,
 				    MODULES_VADDR + get_module_load_offset(),
 				    MODULES_END, GFP_KERNEL,
-				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
+				    PAGE_KERNEL, 0, NUMA_NO_NODE,
 				    __builtin_return_address(0));
 	if (p && (kasan_module_alloc(p, size) < 0)) {
 		vfree(p);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index de629b706d1d..ee9ae03c5f56 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -704,7 +704,13 @@ static inline void bpf_prog_unlock_ro(struct bpf_prog *fp)
 
 static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
 {
+	/*
+	 * Perform mapping changes in two stages to avoid opening a time-window
+	 * in which a PTE is cached in any TLB as writable, but marked as
+	 * executable in the memory-resident mappings (e.g., page-tables).
+	 */
 	set_memory_ro((unsigned long)hdr, hdr->pages);
+	set_memory_x((unsigned long)hdr, hdr->pages);
 }
 
 static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr)
diff --git a/kernel/module.c b/kernel/module.c
index 49a405891587..7cb207249437 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1946,9 +1946,19 @@ void module_enable_ro(const struct module *mod, bool after_init)
 	if (!rodata_enabled)
 		return;
 
+	/*
+	 * Perform mapping changes in two stages to avoid opening a time-window
+	 * in which a PTE is cached in any TLB as writable, but marked as
+	 * executable in the memory-resident mappings (e.g., page-tables).
+	 */
 	frob_text(&mod->core_layout, set_memory_ro);
+	frob_text(&mod->core_layout, set_memory_x);
+
 	frob_rodata(&mod->core_layout, set_memory_ro);
+
 	frob_text(&mod->init_layout, set_memory_ro);
+	frob_text(&mod->init_layout, set_memory_x);
+
 	frob_rodata(&mod->init_layout, set_memory_ro);
 
 	if (after_init)
-- 
2.17.1

