From: Masami Hiramatsu <mhiramat@kernel.org>
To: Nadav Amit <namit@vmware.com>
Cc: Ingo Molnar <mingo@redhat.com>,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Damian Tometzki <linux_dti@icloud.com>,
linux-integrity <linux-integrity@vger.kernel.org>,
LSM List <linux-security-module@vger.kernel.org>,
Andy Lutomirski <luto@kernel.org>,
Kees Cook <keescook@chromium.org>,
Dave Hansen <dave.hansen@intel.com>,
Masami Hiramatsu <mhiramat@kernel.org>
Subject: Re: [PATCH v6 08/10] x86: avoid W^X being broken during modules loading
Date: Fri, 30 Nov 2018 11:32:01 +0900 [thread overview]
Message-ID: <20181130113201.c215e2a48c756230ddb48da3@kernel.org> (raw)
In-Reply-To: <5E938E6F-6BE6-4F86-AC9E-5C389B492682@vmware.com>
On Wed, 28 Nov 2018 18:59:30 +0000
Nadav Amit <namit@vmware.com> wrote:
> > On Nov 20, 2018, at 12:35 PM, Nadav Amit <namit@vmware.com> wrote:
> >
> > When modules and BPF filters are loaded, there is a time window in
> > which some memory is both writable and executable. An attacker that has
> > already found another vulnerability (e.g., a dangling pointer) might be
> > able to exploit this behavior to overwrite kernel code. This patch
> > prevents having writable executable PTEs in this stage.
> >
> > In addition, avoiding having R+X mappings can also slightly simplify the
> > patching of modules code on initialization (e.g., by alternatives and
> > static-key), as would be done in the next patch.
> >
> > To avoid having W+X mappings, set them initially as RW (NX) and after
> > they are set as RO set them as X as well. Setting them as executable is
> > done as a separate step to avoid one core in which the old PTE is cached
> > (hence writable), and another which sees the updated PTE (executable),
> > which would break the W^X protection.
> >
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Dave Hansen <dave.hansen@intel.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > Suggested-by: Andy Lutomirski <luto@amacapital.net>
> > Signed-off-by: Nadav Amit <namit@vmware.com>
> > ---
> > arch/x86/kernel/alternative.c | 28 +++++++++++++++++++++-------
> > arch/x86/kernel/module.c | 2 +-
> > include/linux/filter.h | 6 ++++++
> > kernel/module.c | 10 ++++++++++
> > 4 files changed, 38 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> > index 8fc4685f3117..18415e3b6000 100644
> > --- a/arch/x86/kernel/alternative.c
> > +++ b/arch/x86/kernel/alternative.c
> > @@ -667,15 +667,29 @@ void __init alternative_instructions(void)
> > * handlers seeing an inconsistent instruction while you patch.
> > */
> > void *__init_or_module text_poke_early(void *addr, const void *opcode,
> > - size_t len)
> > + size_t len)
> > {
> > unsigned long flags;
> > - local_irq_save(flags);
> > - memcpy(addr, opcode, len);
> > - local_irq_restore(flags);
> > - sync_core();
> > - /* Could also do a CLFLUSH here to speed up CPU recovery; but
> > - that causes hangs on some VIA CPUs. */
> > +
> > + if (static_cpu_has(X86_FEATURE_NX) &&
> > + is_module_text_address((unsigned long)addr)) {
> > + /*
> > + * Modules text is marked initially as non-executable, so the
> > + * code cannot be running and speculative code-fetches are
> > + * prevented. We can just change the code.
> > + */
> > + memcpy(addr, opcode, len);
> > + } else {
> > + local_irq_save(flags);
> > + memcpy(addr, opcode, len);
> > + local_irq_restore(flags);
> > + sync_core();
> > +
> > + /*
> > + * Could also do a CLFLUSH here to speed up CPU recovery; but
> > + * that causes hangs on some VIA CPUs.
> > + */
> > + }
> > return addr;
> > }
> >
> > diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> > index b052e883dd8c..cfa3106faee4 100644
> > --- a/arch/x86/kernel/module.c
> > +++ b/arch/x86/kernel/module.c
> > @@ -87,7 +87,7 @@ void *module_alloc(unsigned long size)
> > p = __vmalloc_node_range(size, MODULE_ALIGN,
> > MODULES_VADDR + get_module_load_offset(),
> > MODULES_END, GFP_KERNEL,
> > - PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> > + PAGE_KERNEL, 0, NUMA_NO_NODE,
> > __builtin_return_address(0));
> > if (p && (kasan_module_alloc(p, size) < 0)) {
> > vfree(p);
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index de629b706d1d..ee9ae03c5f56 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -704,7 +704,13 @@ static inline void bpf_prog_unlock_ro(struct bpf_prog *fp)
> >
> > static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
> > {
> > + /*
> > + * Perform mapping changes in two stages to avoid opening a time-window
> > + * in which a PTE is cached in any TLB as writable, but marked as
> > + * executable in the memory-resident mappings (e.g., page-tables).
> > + */
> > set_memory_ro((unsigned long)hdr, hdr->pages);
> > + set_memory_x((unsigned long)hdr, hdr->pages);
> > }
> >
> > static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr)
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 49a405891587..7cb207249437 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -1946,9 +1946,19 @@ void module_enable_ro(const struct module *mod, bool after_init)
> > if (!rodata_enabled)
> > return;
> >
> > + /*
> > + * Perform mapping changes in two stages to avoid opening a time-window
> > + * in which a PTE is cached in any TLB as writable, but marked as
> > + * executable in the memory-resident mappings (e.g., page-tables).
> > + */
> > frob_text(&mod->core_layout, set_memory_ro);
> > + frob_text(&mod->core_layout, set_memory_x);
> > +
> > frob_rodata(&mod->core_layout, set_memory_ro);
> > +
> > frob_text(&mod->init_layout, set_memory_ro);
> > + frob_text(&mod->init_layout, set_memory_x);
> > +
> > frob_rodata(&mod->init_layout, set_memory_ro);
> >
> > if (after_init)
> > --
> > 2.17.1
>
> Rick pointed out that I screwed up ftrace and kprobes.
>
> For kprobes, I think I need to add set_memory_x() to alloc_insn_page() and
> change arch_ftrace_update_trampoline().
Oops, right. It should be easy to fix for kprobe :) what you need is below.
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index c33b06f..51818f3 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -431,8 +431,10 @@ void *alloc_insn_page(void)
void *page;
page = module_alloc(PAGE_SIZE);
- if (page)
+ if (page) {
set_memory_ro((unsigned long)page & PAGE_MASK, 1);
+ set_memory_x((unsigned long)page & PAGE_MASK, 1);
+ }
return page;
}
>
> For arch_ftrace_update_trampoline(), I think I should remove not use
> set_memory_rw() when patching in __probe_kernel_write() should be done
> through text_poke().
right.
>
> I’ll give it another look and send another version later.
Thank you,
>
> Regards,
> Nadav
>
>
--
Masami Hiramatsu <mhiramat@kernel.org>
next prev parent reply other threads:[~2018-11-30 2:32 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-11-20 20:35 [PATCH v6 00/10] x86/alternative: text_poke() fixes Nadav Amit
2018-11-20 20:35 ` [PATCH v6 01/10] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()" Nadav Amit
2018-11-20 20:35 ` [PATCH v6 02/10] x86/jump_label: Use text_poke_early() during early init Nadav Amit
2018-11-20 20:35 ` [PATCH v6 03/10] x86/mm: temporary mm struct Nadav Amit
2018-11-20 20:35 ` [PATCH v6 04/10] fork: provide a function for copying init_mm Nadav Amit
2018-11-20 20:35 ` [PATCH v6 05/10] x86/alternative: initializing temporary mm for patching Nadav Amit
2018-11-20 20:35 ` [PATCH v6 06/10] x86/alternative: use temporary mm for text poking Nadav Amit
2018-11-20 20:35 ` [PATCH v6 07/10] x86/kgdb: avoid redundant comparison of patched code Nadav Amit
2018-11-20 20:35 ` [PATCH v6 08/10] x86: avoid W^X being broken during modules loading Nadav Amit
2018-11-28 18:59 ` Nadav Amit
2018-11-30 2:32 ` Masami Hiramatsu [this message]
2018-11-20 20:35 ` [PATCH v6 09/10] x86/jump-label: remove support for custom poker Nadav Amit
2018-11-20 20:35 ` [PATCH v6 10/10] x86/alternative: remove the return value of text_poke_*() Nadav Amit
2018-11-26 10:32 ` [PATCH v6 00/10] x86/alternative: text_poke() fixes Peter Zijlstra
2018-11-26 17:46 ` Nadav Amit
2018-11-29 12:58 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181130113201.c215e2a48c756230ddb48da3@kernel.org \
--to=mhiramat@kernel.org \
--cc=bp@alien8.de \
--cc=dave.hansen@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=keescook@chromium.org \
--cc=linux-integrity@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=linux_dti@icloud.com \
--cc=luto@kernel.org \
--cc=mingo@redhat.com \
--cc=namit@vmware.com \
--cc=peterz@infradead.org \
--cc=rick.p.edgecombe@intel.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.