* [PATCH v7 0/8] x86/module: use large ROX pages for text allocations
@ 2024-10-23 16:27 Mike Rapoport
From: Mike Rapoport @ 2024-10-23 16:27 UTC
To: Andrew Morton, Luis Chamberlain
Cc: Andreas Larsson, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
Borislav Petkov, Brian Cain, Catalin Marinas, Christoph Hellwig,
Christophe Leroy, Dave Hansen, Dinh Nguyen, Geert Uytterhoeven,
Guo Ren, Helge Deller, Huacai Chen, Ingo Molnar, Johannes Berg,
John Paul Adrian Glaubitz, Kent Overstreet, Liam R. Howlett,
Mark Rutland, Masami Hiramatsu, Matt Turner, Max Filippov,
Michael Ellerman, Michal Simek, Mike Rapoport, Oleg Nesterov,
Palmer Dabbelt, Peter Zijlstra, Richard Weinberger, Russell King,
Song Liu, Stafford Horne, Steven Rostedt, Suren Baghdasaryan,
Thomas Bogendoerfer, Thomas Gleixner, Uladzislau Rezki,
Vineet Gupta, Will Deacon, bpf, linux-alpha, linux-arch,
linux-arm-kernel, linux-csky, linux-hexagon, linux-kernel,
linux-m68k, linux-mips, linux-mm, linux-modules, linux-openrisc,
linux-parisc, linux-riscv, linux-sh, linux-snps-arc,
linux-trace-kernel, linux-um, linuxppc-dev, loongarch, sparclinux,
x86
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Hi,
This is an updated version of execmem ROX caches.
v6: https://lore.kernel.org/all/20241016122424.1655560-1-rppt@kernel.org
* Fixed handling of alternatives for fineibt (kbuild bot)
* Restored usage of text_poke_early for ftrace boot time initialization (Steve)
* Made !module path in module_writable_address inline
v5: https://lore.kernel.org/all/20241009180816.83591-1-rppt@kernel.org
* Dropped check for !area in mas_for_each() loop (Kees Bakker)
* Dropped externs in include/linux/vmalloc.h (Christoph)
* Fixed handling of alternatives for CFI-enabled configs (Nathan)
* Fixed interaction with kmemleak (Sergey).
It looks like execmem and kmemleak interaction should be improved
further, but it's out of scope of this series.
* Added ARCH_HAS_EXECMEM_ROX configuration option to arch/Kconfig. The
option serves two purposes:
- make sure architecture that uses ROX caches implements
execmem_fill_trapping_insns() callback (Christoph)
- make sure entire physical memory is mapped in the direct map (Dave)
v4: https://lore.kernel.org/all/20241007062858.44248-1-rppt@kernel.org
* Fixed copy/paste error in loongarch (Huacai)
v3: https://lore.kernel.org/all/20240909064730.3290724-1-rppt@kernel.org
* Drop ftrace_swap_func(). It is not needed because mcount array lives
in a data section (Peter)
* Update maple_tree usage (Liam)
* Set ->fill_trapping_insns pointer on init (Ard)
* Instead of using VM_FLUSH_RESET_PERMS for execmem cache, completely
remove it from the direct map
v2: https://lore.kernel.org/all/20240826065532.2618273-1-rppt@kernel.org
* add comment why ftrace_swap_func() is needed (Steve)
Since RFC: https://lore.kernel.org/all/20240411160526.2093408-1-rppt@kernel.org
* update changelog about HUGE_VMAP allocations (Christophe)
* move module_writable_address() from x86 to modules core (Ingo)
* rename execmem_invalidate() to execmem_fill_trapping_insns() (Peter)
* call alternatives_smp_unlock() after module text in-place is up to
date (Nadav)
= Original cover letter =
These patches add support for using large ROX pages for allocations of
executable memory on x86.
They address Andy's comments [1] about having executable mappings for code
that was not completely formed.
The approach taken is to allocate ROX memory along with writable but not
executable memory and use the writable copy to perform relocations and
alternatives patching. After the module text gets into its final shape, the
contents of the writable memory are copied into the actual ROX location
using text poking.
The allocations of the ROX memory use vmalloc() with VM_ALLOW_HUGE_VMAP to
allocate PMD aligned memory, fill that memory with invalid instructions and
in the end remap it as ROX. Portions of these large pages are handed out to
execmem_alloc() callers without any changes to the permissions. When the
memory is freed with execmem_free() it is invalidated again so that it
won't contain stale instructions.
The module memory allocation and the x86 code that deals with relocations
and alternatives patching take into account the existence of the two
copies: the writable memory and the ROX memory at the actual allocated
virtual address.
The patches are available at git:
https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/x86-rox/v6
[1] https://lore.kernel.org/all/a17c65c6-863f-4026-9c6f-a04b659e9ab4@app.fastmail.com
Mike Rapoport (Microsoft) (8):
mm: vmalloc: group declarations depending on CONFIG_MMU together
mm: vmalloc: don't account for number of nodes for HUGE_VMAP allocations
asm-generic: introduce text-patching.h
module: prepare to handle ROX allocations for text
arch: introduce set_direct_map_valid_noflush()
x86/module: prepare module loading for ROX allocations of text
execmem: add support for cache of large ROX pages
x86/module: enable ROX caches for module text on 64 bit
arch/Kconfig | 8 +
arch/alpha/include/asm/Kbuild | 1 +
arch/arc/include/asm/Kbuild | 1 +
.../include/asm/{patch.h => text-patching.h} | 0
arch/arm/kernel/ftrace.c | 2 +-
arch/arm/kernel/jump_label.c | 2 +-
arch/arm/kernel/kgdb.c | 2 +-
arch/arm/kernel/patch.c | 2 +-
arch/arm/probes/kprobes/core.c | 2 +-
arch/arm/probes/kprobes/opt-arm.c | 2 +-
arch/arm64/include/asm/set_memory.h | 1 +
.../asm/{patching.h => text-patching.h} | 0
arch/arm64/kernel/ftrace.c | 2 +-
arch/arm64/kernel/jump_label.c | 2 +-
arch/arm64/kernel/kgdb.c | 2 +-
arch/arm64/kernel/patching.c | 2 +-
arch/arm64/kernel/probes/kprobes.c | 2 +-
arch/arm64/kernel/traps.c | 2 +-
arch/arm64/mm/pageattr.c | 10 +
arch/arm64/net/bpf_jit_comp.c | 2 +-
arch/csky/include/asm/Kbuild | 1 +
arch/hexagon/include/asm/Kbuild | 1 +
arch/loongarch/include/asm/Kbuild | 1 +
arch/loongarch/include/asm/set_memory.h | 1 +
arch/loongarch/mm/pageattr.c | 19 +
arch/m68k/include/asm/Kbuild | 1 +
arch/microblaze/include/asm/Kbuild | 1 +
arch/mips/include/asm/Kbuild | 1 +
arch/nios2/include/asm/Kbuild | 1 +
arch/openrisc/include/asm/Kbuild | 1 +
.../include/asm/{patch.h => text-patching.h} | 0
arch/parisc/kernel/ftrace.c | 2 +-
arch/parisc/kernel/jump_label.c | 2 +-
arch/parisc/kernel/kgdb.c | 2 +-
arch/parisc/kernel/kprobes.c | 2 +-
arch/parisc/kernel/patch.c | 2 +-
arch/powerpc/include/asm/kprobes.h | 2 +-
.../asm/{code-patching.h => text-patching.h} | 0
arch/powerpc/kernel/crash_dump.c | 2 +-
arch/powerpc/kernel/epapr_paravirt.c | 2 +-
arch/powerpc/kernel/jump_label.c | 2 +-
arch/powerpc/kernel/kgdb.c | 2 +-
arch/powerpc/kernel/kprobes.c | 2 +-
arch/powerpc/kernel/module_32.c | 2 +-
arch/powerpc/kernel/module_64.c | 2 +-
arch/powerpc/kernel/optprobes.c | 2 +-
arch/powerpc/kernel/process.c | 2 +-
arch/powerpc/kernel/security.c | 2 +-
arch/powerpc/kernel/setup_32.c | 2 +-
arch/powerpc/kernel/setup_64.c | 2 +-
arch/powerpc/kernel/static_call.c | 2 +-
arch/powerpc/kernel/trace/ftrace.c | 2 +-
arch/powerpc/kernel/trace/ftrace_64_pg.c | 2 +-
arch/powerpc/lib/code-patching.c | 2 +-
arch/powerpc/lib/feature-fixups.c | 2 +-
arch/powerpc/lib/test-code-patching.c | 2 +-
arch/powerpc/lib/test_emulate_step.c | 2 +-
arch/powerpc/mm/book3s32/mmu.c | 2 +-
arch/powerpc/mm/book3s64/hash_utils.c | 2 +-
arch/powerpc/mm/book3s64/slb.c | 2 +-
arch/powerpc/mm/kasan/init_32.c | 2 +-
arch/powerpc/mm/mem.c | 2 +-
arch/powerpc/mm/nohash/44x.c | 2 +-
arch/powerpc/mm/nohash/book3e_pgtable.c | 2 +-
arch/powerpc/mm/nohash/tlb.c | 2 +-
arch/powerpc/mm/nohash/tlb_64e.c | 2 +-
arch/powerpc/net/bpf_jit_comp.c | 2 +-
arch/powerpc/perf/8xx-pmu.c | 2 +-
arch/powerpc/perf/core-book3s.c | 2 +-
arch/powerpc/platforms/85xx/smp.c | 2 +-
arch/powerpc/platforms/86xx/mpc86xx_smp.c | 2 +-
arch/powerpc/platforms/cell/smp.c | 2 +-
arch/powerpc/platforms/powermac/smp.c | 2 +-
arch/powerpc/platforms/powernv/idle.c | 2 +-
arch/powerpc/platforms/powernv/smp.c | 2 +-
arch/powerpc/platforms/pseries/smp.c | 2 +-
arch/powerpc/xmon/xmon.c | 2 +-
arch/riscv/errata/andes/errata.c | 2 +-
arch/riscv/errata/sifive/errata.c | 2 +-
arch/riscv/errata/thead/errata.c | 2 +-
arch/riscv/include/asm/set_memory.h | 1 +
.../include/asm/{patch.h => text-patching.h} | 0
arch/riscv/include/asm/uprobes.h | 2 +-
arch/riscv/kernel/alternative.c | 2 +-
arch/riscv/kernel/cpufeature.c | 3 +-
arch/riscv/kernel/ftrace.c | 2 +-
arch/riscv/kernel/jump_label.c | 2 +-
arch/riscv/kernel/patch.c | 2 +-
arch/riscv/kernel/probes/kprobes.c | 2 +-
arch/riscv/mm/pageattr.c | 15 +
arch/riscv/net/bpf_jit_comp64.c | 2 +-
arch/riscv/net/bpf_jit_core.c | 2 +-
arch/s390/include/asm/set_memory.h | 1 +
arch/s390/mm/pageattr.c | 11 +
arch/sh/include/asm/Kbuild | 1 +
arch/sparc/include/asm/Kbuild | 1 +
arch/um/kernel/um_arch.c | 16 +-
arch/x86/Kconfig | 1 +
arch/x86/entry/vdso/vma.c | 3 +-
arch/x86/include/asm/alternative.h | 14 +-
arch/x86/include/asm/set_memory.h | 1 +
arch/x86/include/asm/text-patching.h | 1 +
arch/x86/kernel/alternative.c | 181 ++++++----
arch/x86/kernel/ftrace.c | 30 +-
arch/x86/kernel/module.c | 45 ++-
arch/x86/mm/init.c | 37 +-
arch/x86/mm/pat/set_memory.c | 8 +
arch/xtensa/include/asm/Kbuild | 1 +
include/asm-generic/text-patching.h | 5 +
include/linux/execmem.h | 37 ++
include/linux/module.h | 16 +
include/linux/moduleloader.h | 4 +
include/linux/set_memory.h | 6 +
include/linux/text-patching.h | 15 +
include/linux/vmalloc.h | 60 ++--
kernel/module/debug_kmemleak.c | 3 +-
kernel/module/main.c | 74 +++-
kernel/module/strict_rwx.c | 3 +
mm/execmem.c | 336 +++++++++++++++++-
mm/internal.h | 1 +
mm/vmalloc.c | 14 +-
121 files changed, 885 insertions(+), 247 deletions(-)
rename arch/arm/include/asm/{patch.h => text-patching.h} (100%)
rename arch/arm64/include/asm/{patching.h => text-patching.h} (100%)
rename arch/parisc/include/asm/{patch.h => text-patching.h} (100%)
rename arch/powerpc/include/asm/{code-patching.h => text-patching.h} (100%)
rename arch/riscv/include/asm/{patch.h => text-patching.h} (100%)
create mode 100644 include/asm-generic/text-patching.h
create mode 100644 include/linux/text-patching.h
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
--
2.43.0
* [PATCH v7 1/8] mm: vmalloc: group declarations depending on CONFIG_MMU together
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Several declarations in include/linux/vmalloc.h depend on CONFIG_MMU but
are spread all over the file.
Group them all together to improve code readability.
No functional changes.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: kdevops <kdevops@lists.linux.dev>
---
include/linux/vmalloc.h | 60 +++++++++++++++++------------------------
1 file changed, 24 insertions(+), 36 deletions(-)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index ad2ce7a6ab7a..27408f21e501 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -134,12 +134,6 @@ extern void vm_unmap_ram(const void *mem, unsigned int count);
extern void *vm_map_ram(struct page **pages, unsigned int count, int node);
extern void vm_unmap_aliases(void);
-#ifdef CONFIG_MMU
-extern unsigned long vmalloc_nr_pages(void);
-#else
-static inline unsigned long vmalloc_nr_pages(void) { return 0; }
-#endif
-
extern void *vmalloc_noprof(unsigned long size) __alloc_size(1);
#define vmalloc(...) alloc_hooks(vmalloc_noprof(__VA_ARGS__))
@@ -266,12 +260,29 @@ static inline bool is_vm_area_hugepages(const void *addr)
#endif
}
+/* for /proc/kcore */
+long vread_iter(struct iov_iter *iter, const char *addr, size_t count);
+
+/*
+ * Internals. Don't use..
+ */
+__init void vm_area_add_early(struct vm_struct *vm);
+__init void vm_area_register_early(struct vm_struct *vm, size_t align);
+
+int register_vmap_purge_notifier(struct notifier_block *nb);
+int unregister_vmap_purge_notifier(struct notifier_block *nb);
+
#ifdef CONFIG_MMU
+#define VMALLOC_TOTAL (VMALLOC_END - VMALLOC_START)
+
+unsigned long vmalloc_nr_pages(void);
+
int vm_area_map_pages(struct vm_struct *area, unsigned long start,
unsigned long end, struct page **pages);
void vm_area_unmap_pages(struct vm_struct *area, unsigned long start,
unsigned long end);
void vunmap_range(unsigned long addr, unsigned long end);
+
static inline void set_vm_flush_reset_perms(void *addr)
{
struct vm_struct *vm = find_vm_area(addr);
@@ -279,24 +290,14 @@ static inline void set_vm_flush_reset_perms(void *addr)
if (vm)
vm->flags |= VM_FLUSH_RESET_PERMS;
}
+#else /* !CONFIG_MMU */
+#define VMALLOC_TOTAL 0UL
-#else
-static inline void set_vm_flush_reset_perms(void *addr)
-{
-}
-#endif
-
-/* for /proc/kcore */
-extern long vread_iter(struct iov_iter *iter, const char *addr, size_t count);
-
-/*
- * Internals. Don't use..
- */
-extern __init void vm_area_add_early(struct vm_struct *vm);
-extern __init void vm_area_register_early(struct vm_struct *vm, size_t align);
+static inline unsigned long vmalloc_nr_pages(void) { return 0; }
+static inline void set_vm_flush_reset_perms(void *addr) {}
+#endif /* CONFIG_MMU */
-#ifdef CONFIG_SMP
-# ifdef CONFIG_MMU
+#if defined(CONFIG_MMU) && defined(CONFIG_SMP)
struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
const size_t *sizes, int nr_vms,
size_t align);
@@ -311,22 +312,9 @@ pcpu_get_vm_areas(const unsigned long *offsets,
return NULL;
}
-static inline void
-pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
-{
-}
-# endif
-#endif
-
-#ifdef CONFIG_MMU
-#define VMALLOC_TOTAL (VMALLOC_END - VMALLOC_START)
-#else
-#define VMALLOC_TOTAL 0UL
+static inline void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms) {}
#endif
-int register_vmap_purge_notifier(struct notifier_block *nb);
-int unregister_vmap_purge_notifier(struct notifier_block *nb);
-
#if defined(CONFIG_MMU) && defined(CONFIG_PRINTK)
bool vmalloc_dump_obj(void *object);
#else
--
2.43.0
* [PATCH v7 2/8] mm: vmalloc: don't account for number of nodes for HUGE_VMAP allocations
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
vmalloc allocations with VM_ALLOW_HUGE_VMAP that do not explicitly
specify node ID will use huge pages only if size_per_node is larger than
a huge page.
Still, the allocated memory is not actually distributed between nodes, so
there is no advantage in such an approach.
On the contrary, BPF allocates SZ_2M * num_possible_nodes() for each
new bpf_prog_pack, while it could do with a single huge page per pack.
Don't account for number of nodes for VM_ALLOW_HUGE_VMAP with
NUMA_NO_NODE and use huge pages whenever the requested allocation size
is larger than a huge page.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: kdevops <kdevops@lists.linux.dev>
---
mm/vmalloc.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 634162271c00..86b2344d7461 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3763,8 +3763,6 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
}
if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
- unsigned long size_per_node;
-
/*
* Try huge pages. Only try for PAGE_KERNEL allocations,
* others like modules don't yet expect huge pages in
@@ -3772,13 +3770,10 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
* supporting them.
*/
- size_per_node = size;
- if (node == NUMA_NO_NODE)
- size_per_node /= num_online_nodes();
- if (arch_vmap_pmd_supported(prot) && size_per_node >= PMD_SIZE)
+ if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
shift = PMD_SHIFT;
else
- shift = arch_vmap_pte_supported_shift(size_per_node);
+ shift = arch_vmap_pte_supported_shift(size);
align = max(real_align, 1UL << shift);
size = ALIGN(real_size, 1UL << shift);
--
2.43.0
* [PATCH v7 3/8] asm-generic: introduce text-patching.h
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Several architectures support text patching, but they name the header
files that declare patching functions differently.
Make all such headers consistently named text-patching.h and add an empty
header in asm-generic for architectures that do not support text patching.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: kdevops <kdevops@lists.linux.dev>
---
arch/alpha/include/asm/Kbuild | 1 +
arch/arc/include/asm/Kbuild | 1 +
arch/arm/include/asm/{patch.h => text-patching.h} | 0
arch/arm/kernel/ftrace.c | 2 +-
arch/arm/kernel/jump_label.c | 2 +-
arch/arm/kernel/kgdb.c | 2 +-
arch/arm/kernel/patch.c | 2 +-
arch/arm/probes/kprobes/core.c | 2 +-
arch/arm/probes/kprobes/opt-arm.c | 2 +-
.../include/asm/{patching.h => text-patching.h} | 0
arch/arm64/kernel/ftrace.c | 2 +-
arch/arm64/kernel/jump_label.c | 2 +-
arch/arm64/kernel/kgdb.c | 2 +-
arch/arm64/kernel/patching.c | 2 +-
arch/arm64/kernel/probes/kprobes.c | 2 +-
arch/arm64/kernel/traps.c | 2 +-
arch/arm64/net/bpf_jit_comp.c | 2 +-
arch/csky/include/asm/Kbuild | 1 +
arch/hexagon/include/asm/Kbuild | 1 +
arch/loongarch/include/asm/Kbuild | 1 +
arch/m68k/include/asm/Kbuild | 1 +
arch/microblaze/include/asm/Kbuild | 1 +
arch/mips/include/asm/Kbuild | 1 +
arch/nios2/include/asm/Kbuild | 1 +
arch/openrisc/include/asm/Kbuild | 1 +
.../include/asm/{patch.h => text-patching.h} | 0
arch/parisc/kernel/ftrace.c | 2 +-
arch/parisc/kernel/jump_label.c | 2 +-
arch/parisc/kernel/kgdb.c | 2 +-
arch/parisc/kernel/kprobes.c | 2 +-
arch/parisc/kernel/patch.c | 2 +-
arch/powerpc/include/asm/kprobes.h | 2 +-
.../asm/{code-patching.h => text-patching.h} | 0
arch/powerpc/kernel/crash_dump.c | 2 +-
arch/powerpc/kernel/epapr_paravirt.c | 2 +-
arch/powerpc/kernel/jump_label.c | 2 +-
arch/powerpc/kernel/kgdb.c | 2 +-
arch/powerpc/kernel/kprobes.c | 2 +-
arch/powerpc/kernel/module_32.c | 2 +-
arch/powerpc/kernel/module_64.c | 2 +-
arch/powerpc/kernel/optprobes.c | 2 +-
arch/powerpc/kernel/process.c | 2 +-
arch/powerpc/kernel/security.c | 2 +-
arch/powerpc/kernel/setup_32.c | 2 +-
arch/powerpc/kernel/setup_64.c | 2 +-
arch/powerpc/kernel/static_call.c | 2 +-
arch/powerpc/kernel/trace/ftrace.c | 2 +-
arch/powerpc/kernel/trace/ftrace_64_pg.c | 2 +-
arch/powerpc/lib/code-patching.c | 2 +-
arch/powerpc/lib/feature-fixups.c | 2 +-
arch/powerpc/lib/test-code-patching.c | 2 +-
arch/powerpc/lib/test_emulate_step.c | 2 +-
arch/powerpc/mm/book3s32/mmu.c | 2 +-
arch/powerpc/mm/book3s64/hash_utils.c | 2 +-
arch/powerpc/mm/book3s64/slb.c | 2 +-
arch/powerpc/mm/kasan/init_32.c | 2 +-
arch/powerpc/mm/mem.c | 2 +-
arch/powerpc/mm/nohash/44x.c | 2 +-
arch/powerpc/mm/nohash/book3e_pgtable.c | 2 +-
arch/powerpc/mm/nohash/tlb.c | 2 +-
arch/powerpc/mm/nohash/tlb_64e.c | 2 +-
arch/powerpc/net/bpf_jit_comp.c | 2 +-
arch/powerpc/perf/8xx-pmu.c | 2 +-
arch/powerpc/perf/core-book3s.c | 2 +-
arch/powerpc/platforms/85xx/smp.c | 2 +-
arch/powerpc/platforms/86xx/mpc86xx_smp.c | 2 +-
arch/powerpc/platforms/cell/smp.c | 2 +-
arch/powerpc/platforms/powermac/smp.c | 2 +-
arch/powerpc/platforms/powernv/idle.c | 2 +-
arch/powerpc/platforms/powernv/smp.c | 2 +-
arch/powerpc/platforms/pseries/smp.c | 2 +-
arch/powerpc/xmon/xmon.c | 2 +-
arch/riscv/errata/andes/errata.c | 2 +-
arch/riscv/errata/sifive/errata.c | 2 +-
arch/riscv/errata/thead/errata.c | 2 +-
.../include/asm/{patch.h => text-patching.h} | 0
arch/riscv/include/asm/uprobes.h | 2 +-
arch/riscv/kernel/alternative.c | 2 +-
arch/riscv/kernel/cpufeature.c | 3 ++-
arch/riscv/kernel/ftrace.c | 2 +-
arch/riscv/kernel/jump_label.c | 2 +-
arch/riscv/kernel/patch.c | 2 +-
arch/riscv/kernel/probes/kprobes.c | 2 +-
arch/riscv/net/bpf_jit_comp64.c | 2 +-
arch/riscv/net/bpf_jit_core.c | 2 +-
arch/sh/include/asm/Kbuild | 1 +
arch/sparc/include/asm/Kbuild | 1 +
arch/um/kernel/um_arch.c | 5 +++++
arch/x86/include/asm/text-patching.h | 1 +
arch/xtensa/include/asm/Kbuild | 1 +
include/asm-generic/text-patching.h | 5 +++++
include/linux/text-patching.h | 15 +++++++++++++++
92 files changed, 110 insertions(+), 70 deletions(-)
rename arch/arm/include/asm/{patch.h => text-patching.h} (100%)
rename arch/arm64/include/asm/{patching.h => text-patching.h} (100%)
rename arch/parisc/include/asm/{patch.h => text-patching.h} (100%)
rename arch/powerpc/include/asm/{code-patching.h => text-patching.h} (100%)
rename arch/riscv/include/asm/{patch.h => text-patching.h} (100%)
create mode 100644 include/asm-generic/text-patching.h
create mode 100644 include/linux/text-patching.h
diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 396caece6d6d..483965c5a4de 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -5,3 +5,4 @@ generic-y += agp.h
generic-y += asm-offsets.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += text-patching.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 49285a3ce239..4c69522e0328 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -6,3 +6,4 @@ generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
generic-y += user.h
+generic-y += text-patching.h
diff --git a/arch/arm/include/asm/patch.h b/arch/arm/include/asm/text-patching.h
similarity index 100%
rename from arch/arm/include/asm/patch.h
rename to arch/arm/include/asm/text-patching.h
diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index e61591f33a6c..845acf9ce21e 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -23,7 +23,7 @@
#include <asm/insn.h>
#include <asm/set_memory.h>
#include <asm/stacktrace.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
/*
* The compiler emitted profiling hook consists of
diff --git a/arch/arm/kernel/jump_label.c b/arch/arm/kernel/jump_label.c
index eb9c24b6e8e2..a06a92d0f550 100644
--- a/arch/arm/kernel/jump_label.c
+++ b/arch/arm/kernel/jump_label.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/kernel.h>
#include <linux/jump_label.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/insn.h>
static void __arch_jump_label_transform(struct jump_entry *entry,
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index 22f937e6f3ff..ab76c55fd610 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -15,7 +15,7 @@
#include <linux/kgdb.h>
#include <linux/uaccess.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/traps.h>
struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] =
diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
index e9e828b6bb30..4d45e60cd46d 100644
--- a/arch/arm/kernel/patch.c
+++ b/arch/arm/kernel/patch.c
@@ -9,7 +9,7 @@
#include <asm/fixmap.h>
#include <asm/smp_plat.h>
#include <asm/opcodes.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
struct patch {
void *addr;
diff --git a/arch/arm/probes/kprobes/core.c b/arch/arm/probes/kprobes/core.c
index d8238da095df..9fd877c87a38 100644
--- a/arch/arm/probes/kprobes/core.c
+++ b/arch/arm/probes/kprobes/core.c
@@ -25,7 +25,7 @@
#include <asm/cacheflush.h>
#include <linux/percpu.h>
#include <linux/bug.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/sections.h>
#include "../decode-arm.h"
diff --git a/arch/arm/probes/kprobes/opt-arm.c b/arch/arm/probes/kprobes/opt-arm.c
index 7f65048380ca..966c6042c5ad 100644
--- a/arch/arm/probes/kprobes/opt-arm.c
+++ b/arch/arm/probes/kprobes/opt-arm.c
@@ -14,7 +14,7 @@
/* for arm_gen_branch */
#include <asm/insn.h>
/* for patch_text */
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include "core.h"
diff --git a/arch/arm64/include/asm/patching.h b/arch/arm64/include/asm/text-patching.h
similarity index 100%
rename from arch/arm64/include/asm/patching.h
rename to arch/arm64/include/asm/text-patching.h
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index a650f5e11fc5..3575d03d60af 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -15,7 +15,7 @@
#include <asm/debug-monitors.h>
#include <asm/ftrace.h>
#include <asm/insn.h>
-#include <asm/patching.h>
+#include <asm/text-patching.h>
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
struct fregs_offset {
diff --git a/arch/arm64/kernel/jump_label.c b/arch/arm64/kernel/jump_label.c
index f63ea915d6ad..b345425193d2 100644
--- a/arch/arm64/kernel/jump_label.c
+++ b/arch/arm64/kernel/jump_label.c
@@ -9,7 +9,7 @@
#include <linux/jump_label.h>
#include <linux/smp.h>
#include <asm/insn.h>
-#include <asm/patching.h>
+#include <asm/text-patching.h>
bool arch_jump_label_transform_queue(struct jump_entry *entry,
enum jump_label_type type)
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 4e1f983df3d1..f3c4d3a8a20f 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -17,7 +17,7 @@
#include <asm/debug-monitors.h>
#include <asm/insn.h>
-#include <asm/patching.h>
+#include <asm/text-patching.h>
#include <asm/traps.h>
struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
diff --git a/arch/arm64/kernel/patching.c b/arch/arm64/kernel/patching.c
index 945df74005c7..7f99723fbb8c 100644
--- a/arch/arm64/kernel/patching.c
+++ b/arch/arm64/kernel/patching.c
@@ -10,7 +10,7 @@
#include <asm/fixmap.h>
#include <asm/insn.h>
#include <asm/kprobes.h>
-#include <asm/patching.h>
+#include <asm/text-patching.h>
#include <asm/sections.h>
static DEFINE_RAW_SPINLOCK(patch_lock);
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index 4268678d0e86..01dbe9a56956 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -27,7 +27,7 @@
#include <asm/debug-monitors.h>
#include <asm/insn.h>
#include <asm/irq.h>
-#include <asm/patching.h>
+#include <asm/text-patching.h>
#include <asm/ptrace.h>
#include <asm/sections.h>
#include <asm/system_misc.h>
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 563cbce11126..7d8199804086 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -41,7 +41,7 @@
#include <asm/extable.h>
#include <asm/insn.h>
#include <asm/kprobes.h>
-#include <asm/patching.h>
+#include <asm/text-patching.h>
#include <asm/traps.h>
#include <asm/smp.h>
#include <asm/stack_pointer.h>
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 8bbd0b20136a..2da25086c4ed 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -19,7 +19,7 @@
#include <asm/cacheflush.h>
#include <asm/debug-monitors.h>
#include <asm/insn.h>
-#include <asm/patching.h>
+#include <asm/text-patching.h>
#include <asm/set_memory.h>
#include "bpf_jit.h"
diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 9a9bc65b57a9..3a5c7f6e5aac 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -11,3 +11,4 @@ generic-y += qspinlock.h
generic-y += parport.h
generic-y += user.h
generic-y += vmlinux.lds.h
+generic-y += text-patching.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 8c1a78c8f527..1efa1e993d4b 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -5,3 +5,4 @@ generic-y += extable.h
generic-y += iomap.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += text-patching.h
diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild
index 5b5a6c90e6e2..80ddb5edb845 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -11,3 +11,4 @@ generic-y += ioctl.h
generic-y += mmzone.h
generic-y += statfs.h
generic-y += param.h
+generic-y += text-patching.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index 0dbf9c5c6fae..b282e0dd8dc1 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -4,3 +4,4 @@ generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += spinlock.h
+generic-y += text-patching.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index a055f5dbe00a..7178f990e8b3 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -8,3 +8,4 @@ generic-y += parport.h
generic-y += syscalls.h
generic-y += tlb.h
generic-y += user.h
+generic-y += text-patching.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 7ba67a0d6c97..684569b2ecd6 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -13,3 +13,4 @@ generic-y += parport.h
generic-y += qrwlock.h
generic-y += qspinlock.h
generic-y += user.h
+generic-y += text-patching.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 0d09829ed144..28004301c236 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -7,3 +7,4 @@ generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += spinlock.h
generic-y += user.h
+generic-y += text-patching.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index cef49d60d74c..2b1a6b00cdac 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -9,3 +9,4 @@ generic-y += spinlock.h
generic-y += qrwlock_types.h
generic-y += qrwlock.h
generic-y += user.h
+generic-y += text-patching.h
diff --git a/arch/parisc/include/asm/patch.h b/arch/parisc/include/asm/text-patching.h
similarity index 100%
rename from arch/parisc/include/asm/patch.h
rename to arch/parisc/include/asm/text-patching.h
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index c91f9c2e61ed..3e34b4473d3a 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -20,7 +20,7 @@
#include <asm/assembly.h>
#include <asm/sections.h>
#include <asm/ftrace.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#define __hot __section(".text.hot")
diff --git a/arch/parisc/kernel/jump_label.c b/arch/parisc/kernel/jump_label.c
index e253b134500d..ea51f15bf0e6 100644
--- a/arch/parisc/kernel/jump_label.c
+++ b/arch/parisc/kernel/jump_label.c
@@ -8,7 +8,7 @@
#include <linux/jump_label.h>
#include <linux/bug.h>
#include <asm/alternative.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
static inline int reassemble_17(int as17)
{
diff --git a/arch/parisc/kernel/kgdb.c b/arch/parisc/kernel/kgdb.c
index b16fa9bac5f4..fee81f877525 100644
--- a/arch/parisc/kernel/kgdb.c
+++ b/arch/parisc/kernel/kgdb.c
@@ -16,7 +16,7 @@
#include <asm/ptrace.h>
#include <asm/traps.h>
#include <asm/processor.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/cacheflush.h>
const struct kgdb_arch arch_kgdb_ops = {
diff --git a/arch/parisc/kernel/kprobes.c b/arch/parisc/kernel/kprobes.c
index 6e0b86652f30..9255adba67a3 100644
--- a/arch/parisc/kernel/kprobes.c
+++ b/arch/parisc/kernel/kprobes.c
@@ -12,7 +12,7 @@
#include <linux/kprobes.h>
#include <linux/slab.h>
#include <asm/cacheflush.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
diff --git a/arch/parisc/kernel/patch.c b/arch/parisc/kernel/patch.c
index e59574f65e64..35dd764b871e 100644
--- a/arch/parisc/kernel/patch.c
+++ b/arch/parisc/kernel/patch.c
@@ -13,7 +13,7 @@
#include <asm/cacheflush.h>
#include <asm/fixmap.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
struct patch {
void *addr;
diff --git a/arch/powerpc/include/asm/kprobes.h b/arch/powerpc/include/asm/kprobes.h
index 4525a9c68260..dfe2e5ad3b21 100644
--- a/arch/powerpc/include/asm/kprobes.h
+++ b/arch/powerpc/include/asm/kprobes.h
@@ -21,7 +21,7 @@
#include <linux/percpu.h>
#include <linux/module.h>
#include <asm/probes.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#ifdef CONFIG_KPROBES
#define __ARCH_WANT_KPROBES_INSN_SLOT
diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/text-patching.h
similarity index 100%
rename from arch/powerpc/include/asm/code-patching.h
rename to arch/powerpc/include/asm/text-patching.h
diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c
index 2086fa6cdc25..103b6605dd68 100644
--- a/arch/powerpc/kernel/crash_dump.c
+++ b/arch/powerpc/kernel/crash_dump.c
@@ -13,7 +13,7 @@
#include <linux/io.h>
#include <linux/memblock.h>
#include <linux/of.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/kdump.h>
#include <asm/firmware.h>
#include <linux/uio.h>
diff --git a/arch/powerpc/kernel/epapr_paravirt.c b/arch/powerpc/kernel/epapr_paravirt.c
index d4b8aff20815..247ab2acaccc 100644
--- a/arch/powerpc/kernel/epapr_paravirt.c
+++ b/arch/powerpc/kernel/epapr_paravirt.c
@@ -9,7 +9,7 @@
#include <linux/of_fdt.h>
#include <asm/epapr_hcalls.h>
#include <asm/cacheflush.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/machdep.h>
#include <asm/inst.h>
diff --git a/arch/powerpc/kernel/jump_label.c b/arch/powerpc/kernel/jump_label.c
index 5277cf582c16..2659e1ac8604 100644
--- a/arch/powerpc/kernel/jump_label.c
+++ b/arch/powerpc/kernel/jump_label.c
@@ -5,7 +5,7 @@
#include <linux/kernel.h>
#include <linux/jump_label.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/inst.h>
void arch_jump_label_transform(struct jump_entry *entry,
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index 7a8bc03a00af..5081334b7bd2 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -21,7 +21,7 @@
#include <asm/processor.h>
#include <asm/machdep.h>
#include <asm/debug.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <linux/slab.h>
#include <asm/inst.h>
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index f8aa91bc3b17..9c85bbcc5201 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -21,7 +21,7 @@
#include <linux/slab.h>
#include <linux/set_memory.h>
#include <linux/execmem.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/cacheflush.h>
#include <asm/sstep.h>
#include <asm/sections.h>
diff --git a/arch/powerpc/kernel/module_32.c b/arch/powerpc/kernel/module_32.c
index 816a63fd71fb..f930e3395a7f 100644
--- a/arch/powerpc/kernel/module_32.c
+++ b/arch/powerpc/kernel/module_32.c
@@ -18,7 +18,7 @@
#include <linux/bug.h>
#include <linux/sort.h>
#include <asm/setup.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
/* Count how many different relocations (different symbol, different
addend) */
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index e9bab599d0c2..135960918d14 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -17,7 +17,7 @@
#include <linux/kernel.h>
#include <asm/module.h>
#include <asm/firmware.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <linux/sort.h>
#include <asm/setup.h>
#include <asm/sections.h>
diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
index c0b351d61058..2e83702bf9ba 100644
--- a/arch/powerpc/kernel/optprobes.c
+++ b/arch/powerpc/kernel/optprobes.c
@@ -13,7 +13,7 @@
#include <asm/kprobes.h>
#include <asm/ptrace.h>
#include <asm/cacheflush.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/sstep.h>
#include <asm/ppc-opcode.h>
#include <asm/inst.h>
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ff61a3e7984c..7b739b9a91ab 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -54,7 +54,7 @@
#include <asm/firmware.h>
#include <asm/hw_irq.h>
#endif
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/exec.h>
#include <asm/livepatch.h>
#include <asm/cpu_has_feature.h>
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 4856e1a5161c..fbb7ebd8aa08 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -14,7 +14,7 @@
#include <linux/debugfs.h>
#include <asm/asm-prototypes.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/security_features.h>
#include <asm/sections.h>
#include <asm/setup.h>
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index e515c1f7d8d3..75dbf3e0d9c4 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -40,7 +40,7 @@
#include <asm/time.h>
#include <asm/serial.h>
#include <asm/udbg.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/cpu_has_feature.h>
#include <asm/asm-prototypes.h>
#include <asm/kdump.h>
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 22f83fbbc762..3ebf5b9fbe98 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -60,7 +60,7 @@
#include <asm/xmon.h>
#include <asm/udbg.h>
#include <asm/kexec.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/ftrace.h>
#include <asm/opal.h>
#include <asm/cputhreads.h>
diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
index 1502b7e439ca..7cfd0710e757 100644
--- a/arch/powerpc/kernel/static_call.c
+++ b/arch/powerpc/kernel/static_call.c
@@ -2,7 +2,7 @@
#include <linux/memory.h>
#include <linux/static_call.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
{
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index d8d6b4fd9a14..be1a245241b3 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -23,7 +23,7 @@
#include <linux/list.h>
#include <asm/cacheflush.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/ftrace.h>
#include <asm/syscall.h>
#include <asm/inst.h>
diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c b/arch/powerpc/kernel/trace/ftrace_64_pg.c
index 12fab1803bcf..9e862ba55263 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
+++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
@@ -23,7 +23,7 @@
#include <linux/list.h>
#include <asm/cacheflush.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/ftrace.h>
#include <asm/syscall.h>
#include <asm/inst.h>
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index acdab294b340..af97fbb3c257 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -17,7 +17,7 @@
#include <asm/tlb.h>
#include <asm/tlbflush.h>
#include <asm/page.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/inst.h>
static int __patch_mem(void *exec_addr, unsigned long val, void *patch_addr, bool is_dword)
diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c
index b7201ba50b2e..587c8cf1230f 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -16,7 +16,7 @@
#include <linux/sched/mm.h>
#include <linux/stop_machine.h>
#include <asm/cputable.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/interrupt.h>
#include <asm/page.h>
#include <asm/sections.h>
diff --git a/arch/powerpc/lib/test-code-patching.c b/arch/powerpc/lib/test-code-patching.c
index 8cd3b32f805b..1440d99630b3 100644
--- a/arch/powerpc/lib/test-code-patching.c
+++ b/arch/powerpc/lib/test-code-patching.c
@@ -6,7 +6,7 @@
#include <linux/vmalloc.h>
#include <linux/init.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
static int __init instr_is_branch_to_addr(const u32 *instr, unsigned long addr)
{
diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index 23c7805fb7b3..66b5b4fa1686 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -11,7 +11,7 @@
#include <asm/cpu_has_feature.h>
#include <asm/sstep.h>
#include <asm/ppc-opcode.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/inst.h>
#define MAX_SUBTESTS 16
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 2db167f4233f..6978344edcb4 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -25,7 +25,7 @@
#include <asm/mmu.h>
#include <asm/machdep.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/sections.h>
#include <mm/mmu_decl.h>
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index e1eadd03f133..47b22282269c 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -57,7 +57,7 @@
#include <asm/sections.h>
#include <asm/copro.h>
#include <asm/udbg.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/fadump.h>
#include <asm/firmware.h>
#include <asm/tm.h>
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index f2708c8629a5..6b783552403c 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -24,7 +24,7 @@
#include <linux/pgtable.h>
#include <asm/udbg.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include "internal.h"
diff --git a/arch/powerpc/mm/kasan/init_32.c b/arch/powerpc/mm/kasan/init_32.c
index aa9aa11927b2..03666d790a53 100644
--- a/arch/powerpc/mm/kasan/init_32.c
+++ b/arch/powerpc/mm/kasan/init_32.c
@@ -7,7 +7,7 @@
#include <linux/memblock.h>
#include <linux/sched/task.h>
#include <asm/pgalloc.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <mm/mmu_decl.h>
static pgprot_t __init kasan_prot_ro(void)
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 1221c561b43a..c7708c8fad29 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -26,7 +26,7 @@
#include <asm/svm.h>
#include <asm/mmzone.h>
#include <asm/ftrace.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/setup.h>
#include <asm/fixmap.h>
diff --git a/arch/powerpc/mm/nohash/44x.c b/arch/powerpc/mm/nohash/44x.c
index 1beae802bb1c..6d10c6d8be71 100644
--- a/arch/powerpc/mm/nohash/44x.c
+++ b/arch/powerpc/mm/nohash/44x.c
@@ -24,7 +24,7 @@
#include <asm/mmu.h>
#include <asm/page.h>
#include <asm/cacheflush.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/smp.h>
#include <mm/mmu_decl.h>
diff --git a/arch/powerpc/mm/nohash/book3e_pgtable.c b/arch/powerpc/mm/nohash/book3e_pgtable.c
index ad2a7c26f2a0..062e8785c1bb 100644
--- a/arch/powerpc/mm/nohash/book3e_pgtable.c
+++ b/arch/powerpc/mm/nohash/book3e_pgtable.c
@@ -10,7 +10,7 @@
#include <asm/pgalloc.h>
#include <asm/tlb.h>
#include <asm/dma.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <mm/mmu_decl.h>
diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index b653a7be4cb1..0a650742f3a0 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -37,7 +37,7 @@
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include <asm/tlb.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/cputhreads.h>
#include <asm/hugetlb.h>
#include <asm/paca.h>
diff --git a/arch/powerpc/mm/nohash/tlb_64e.c b/arch/powerpc/mm/nohash/tlb_64e.c
index d26656b07b72..4f925adf2695 100644
--- a/arch/powerpc/mm/nohash/tlb_64e.c
+++ b/arch/powerpc/mm/nohash/tlb_64e.c
@@ -24,7 +24,7 @@
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include <asm/tlb.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/cputhreads.h>
#include <mm/mmu_decl.h>
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 2a36cc2e7e9e..68c6a13e6acb 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -18,7 +18,7 @@
#include <linux/bpf.h>
#include <asm/kprobes.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include "bpf_jit.h"
diff --git a/arch/powerpc/perf/8xx-pmu.c b/arch/powerpc/perf/8xx-pmu.c
index 308a2e40d7be..1d2972229e3a 100644
--- a/arch/powerpc/perf/8xx-pmu.c
+++ b/arch/powerpc/perf/8xx-pmu.c
@@ -14,7 +14,7 @@
#include <asm/machdep.h>
#include <asm/firmware.h>
#include <asm/ptrace.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/inst.h>
#define PERF_8xx_ID_CPU_CYCLES 1
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 42867469752d..a727cd111cac 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -16,7 +16,7 @@
#include <asm/machdep.h>
#include <asm/firmware.h>
#include <asm/ptrace.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/hw_irq.h>
#include <asm/interrupt.h>
diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index e52b848b64b7..32fa5fb557c0 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -23,7 +23,7 @@
#include <asm/mpic.h>
#include <asm/cacheflush.h>
#include <asm/dbell.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/cputhreads.h>
#include <asm/fsl_pm.h>
diff --git a/arch/powerpc/platforms/86xx/mpc86xx_smp.c b/arch/powerpc/platforms/86xx/mpc86xx_smp.c
index 8a7e55acf090..9be33e41af6d 100644
--- a/arch/powerpc/platforms/86xx/mpc86xx_smp.c
+++ b/arch/powerpc/platforms/86xx/mpc86xx_smp.c
@@ -12,7 +12,7 @@
#include <linux/delay.h>
#include <linux/pgtable.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/page.h>
#include <asm/pci-bridge.h>
#include <asm/mpic.h>
diff --git a/arch/powerpc/platforms/cell/smp.c b/arch/powerpc/platforms/cell/smp.c
index fee638fd8970..0e8f20ecca08 100644
--- a/arch/powerpc/platforms/cell/smp.c
+++ b/arch/powerpc/platforms/cell/smp.c
@@ -35,7 +35,7 @@
#include <asm/firmware.h>
#include <asm/rtas.h>
#include <asm/cputhreads.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include "interrupt.h"
#include <asm/udbg.h>
diff --git a/arch/powerpc/platforms/powermac/smp.c b/arch/powerpc/platforms/powermac/smp.c
index d21b681f52fb..09e7fe24fac1 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -35,7 +35,7 @@
#include <asm/ptrace.h>
#include <linux/atomic.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/irq.h>
#include <asm/page.h>
#include <asm/sections.h>
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index ad41dffe4d92..d98b933e4984 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -18,7 +18,7 @@
#include <asm/opal.h>
#include <asm/cputhreads.h>
#include <asm/cpuidle.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/smp.h>
#include <asm/runlatch.h>
#include <asm/dbell.h>
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 8f14f0581a21..6b746feeabe4 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -28,7 +28,7 @@
#include <asm/xive.h>
#include <asm/opal.h>
#include <asm/runlatch.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/dbell.h>
#include <asm/kvm_ppc.h>
#include <asm/ppc-opcode.h>
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index c597711ef20a..db99725e752b 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -39,7 +39,7 @@
#include <asm/xive.h>
#include <asm/dbell.h>
#include <asm/plpar_wrappers.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/svm.h>
#include <asm/kvm_guest.h>
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index e6cddbb2305f..e76e1d5d0611 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -50,7 +50,7 @@
#include <asm/xive.h>
#include <asm/opal.h>
#include <asm/firmware.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/sections.h>
#include <asm/inst.h>
#include <asm/interrupt.h>
diff --git a/arch/riscv/errata/andes/errata.c b/arch/riscv/errata/andes/errata.c
index fc1a34faa5f3..dcc9d1ee5ffd 100644
--- a/arch/riscv/errata/andes/errata.c
+++ b/arch/riscv/errata/andes/errata.c
@@ -13,7 +13,7 @@
#include <asm/alternative.h>
#include <asm/cacheflush.h>
#include <asm/errata_list.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/processor.h>
#include <asm/sbi.h>
#include <asm/vendorid_list.h>
diff --git a/arch/riscv/errata/sifive/errata.c b/arch/riscv/errata/sifive/errata.c
index cea3b96ade11..38aac2c47845 100644
--- a/arch/riscv/errata/sifive/errata.c
+++ b/arch/riscv/errata/sifive/errata.c
@@ -8,7 +8,7 @@
#include <linux/module.h>
#include <linux/string.h>
#include <linux/bug.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/alternative.h>
#include <asm/vendorid_list.h>
#include <asm/errata_list.h>
diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
index f5120e07c318..e24770a77932 100644
--- a/arch/riscv/errata/thead/errata.c
+++ b/arch/riscv/errata/thead/errata.c
@@ -16,7 +16,7 @@
#include <asm/errata_list.h>
#include <asm/hwprobe.h>
#include <asm/io.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/vendorid_list.h>
#include <asm/vendor_extensions.h>
diff --git a/arch/riscv/include/asm/patch.h b/arch/riscv/include/asm/text-patching.h
similarity index 100%
rename from arch/riscv/include/asm/patch.h
rename to arch/riscv/include/asm/text-patching.h
diff --git a/arch/riscv/include/asm/uprobes.h b/arch/riscv/include/asm/uprobes.h
index 3fc7deda9190..5008f76cdc27 100644
--- a/arch/riscv/include/asm/uprobes.h
+++ b/arch/riscv/include/asm/uprobes.h
@@ -4,7 +4,7 @@
#define _ASM_RISCV_UPROBES_H
#include <asm/probes.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/bug.h>
#define MAX_UINSN_BYTES 8
diff --git a/arch/riscv/kernel/alternative.c b/arch/riscv/kernel/alternative.c
index 0128b161bfda..7eb3cb1215c6 100644
--- a/arch/riscv/kernel/alternative.c
+++ b/arch/riscv/kernel/alternative.c
@@ -18,7 +18,7 @@
#include <asm/sbi.h>
#include <asm/csr.h>
#include <asm/insn.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
struct cpu_manufacturer_info_t {
unsigned long vendor_id;
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 3a8eeaa9310c..826f46b21f2e 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -20,7 +20,8 @@
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
#include <asm/hwcap.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
+#include <asm/hwprobe.h>
#include <asm/processor.h>
#include <asm/sbi.h>
#include <asm/vector.h>
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 4b95c574fd04..a7620ef93b6c 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -10,7 +10,7 @@
#include <linux/memory.h>
#include <linux/stop_machine.h>
#include <asm/cacheflush.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#ifdef CONFIG_DYNAMIC_FTRACE
void ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
diff --git a/arch/riscv/kernel/jump_label.c b/arch/riscv/kernel/jump_label.c
index 11ad789c60c6..6eee6f736f68 100644
--- a/arch/riscv/kernel/jump_label.c
+++ b/arch/riscv/kernel/jump_label.c
@@ -10,7 +10,7 @@
#include <linux/mutex.h>
#include <asm/bug.h>
#include <asm/cacheflush.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#define RISCV_INSN_NOP 0x00000013U
#define RISCV_INSN_JAL 0x0000006fU
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 34ef522f07a8..db13c9ddf9e3 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -13,7 +13,7 @@
#include <asm/cacheflush.h>
#include <asm/fixmap.h>
#include <asm/ftrace.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/sections.h>
struct patch_insn {
diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
index 474a65213657..380a0e8cecc0 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -12,7 +12,7 @@
#include <asm/sections.h>
#include <asm/cacheflush.h>
#include <asm/bug.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include "decode-insn.h"
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 99f34409fb60..c9f6c4d56012 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -10,7 +10,7 @@
#include <linux/filter.h>
#include <linux/memory.h>
#include <linux/stop_machine.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/cfi.h>
#include <asm/percpu.h>
#include "bpf_jit.h"
diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index 6de753c667f4..f8cd2f70a7fb 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -9,7 +9,7 @@
#include <linux/bpf.h>
#include <linux/filter.h>
#include <linux/memory.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/cfi.h>
#include "bpf_jit.h"
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index fc44d9c88b41..4d3f10ed8275 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -3,3 +3,4 @@ generated-y += syscall_table.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
+generic-y += text-patching.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 43b0ae4c2c21..17ee8a273aa6 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -4,3 +4,4 @@ generated-y += syscall_table_64.h
generic-y += agp.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += text-patching.h
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index 8e594cda6d77..f8de31a0c5d1 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -468,6 +468,11 @@ void *text_poke(void *addr, const void *opcode, size_t len)
return memcpy(addr, opcode, len);
}
+void *text_poke_copy(void *addr, const void *opcode, size_t len)
+{
+ return text_poke(addr, opcode, len);
+}
+
void text_poke_sync(void)
{
}
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index 6259f1937fe7..ab9e143ec9fe 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -35,6 +35,7 @@ extern void *text_poke(void *addr, const void *opcode, size_t len);
extern void text_poke_sync(void);
extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
extern void *text_poke_copy(void *addr, const void *opcode, size_t len);
+#define text_poke_copy text_poke_copy
extern void *text_poke_copy_locked(void *addr, const void *opcode, size_t len, bool core_ok);
extern void *text_poke_set(void *addr, int c, size_t len);
extern int poke_int3_handler(struct pt_regs *regs);
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index fa07c686cbcc..cc5dba738389 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -8,3 +8,4 @@ generic-y += parport.h
generic-y += qrwlock.h
generic-y += qspinlock.h
generic-y += user.h
+generic-y += text-patching.h
diff --git a/include/asm-generic/text-patching.h b/include/asm-generic/text-patching.h
new file mode 100644
index 000000000000..2245c641b741
--- /dev/null
+++ b/include/asm-generic/text-patching.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_TEXT_PATCHING_H
+#define _ASM_GENERIC_TEXT_PATCHING_H
+
+#endif /* _ASM_GENERIC_TEXT_PATCHING_H */
diff --git a/include/linux/text-patching.h b/include/linux/text-patching.h
new file mode 100644
index 000000000000..ad5877ab0855
--- /dev/null
+++ b/include/linux/text-patching.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_TEXT_PATCHING_H
+#define _LINUX_TEXT_PATCHING_H
+
+#include <asm/text-patching.h>
+
+#ifndef text_poke_copy
+static inline void *text_poke_copy(void *dst, const void *src, size_t len)
+{
+ return memcpy(dst, src, len);
+}
+#define text_poke_copy text_poke_copy
+#endif
+
+#endif /* _LINUX_TEXT_PATCHING_H */
--
2.43.0
* [PATCH v7 4/8] module: prepare to handle ROX allocations for text
2024-10-23 16:27 [PATCH v7 0/8] x86/module: use large ROX pages for text allocations Mike Rapoport
` (2 preceding siblings ...)
2024-10-23 16:27 ` [PATCH v7 3/8] asm-generic: introduce text-patching.h Mike Rapoport
@ 2024-10-23 16:27 ` Mike Rapoport
2024-10-23 16:27 ` [PATCH v7 5/8] arch: introduce set_direct_map_valid_noflush() Mike Rapoport
` (4 subsequent siblings)
8 siblings, 0 replies; 23+ messages in thread
From: Mike Rapoport @ 2024-10-23 16:27 UTC (permalink / raw)
To: Andrew Morton, Luis Chamberlain
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
In order to support ROX allocations for module text, it is necessary to
handle modifications to the code, such as relocations and alternatives
patching, without write access to that memory.
One option is to use text patching, but this would make module loading
extremely slow and would expose executable code before it is in its final form.
A better way is to have the memory allocated with ROX permissions contain
invalid instructions and to keep a writable, but not executable, copy of the
module text. The relocations and alternatives patching are done on the
writable copy using the addresses of the ROX memory.
Once the module is completely ready, the updated text is copied to the ROX
memory in one go using text patching, and the writable copy is freed.
Add support for that to module initialization code and provide necessary
interfaces in execmem.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: kdevops <kdevops@lists.linux.dev>
---
include/linux/execmem.h | 23 +++++++++++
include/linux/module.h | 16 ++++++++
include/linux/moduleloader.h | 4 ++
kernel/module/debug_kmemleak.c | 3 +-
kernel/module/main.c | 74 ++++++++++++++++++++++++++++++----
kernel/module/strict_rwx.c | 3 ++
mm/execmem.c | 11 +++++
7 files changed, 126 insertions(+), 8 deletions(-)
diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index 32cef1144117..dfdf19f8a5e8 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -46,9 +46,11 @@ enum execmem_type {
/**
* enum execmem_range_flags - options for executable memory allocations
* @EXECMEM_KASAN_SHADOW: allocate kasan shadow
+ * @EXECMEM_ROX_CACHE: allocations should use ROX cache of huge pages
*/
enum execmem_range_flags {
EXECMEM_KASAN_SHADOW = (1 << 0),
+ EXECMEM_ROX_CACHE = (1 << 1),
};
/**
@@ -123,6 +125,27 @@ void *execmem_alloc(enum execmem_type type, size_t size);
*/
void execmem_free(void *ptr);
+/**
+ * execmem_update_copy - copy an update to executable memory
+ * @dst: destination address to update
+ * @src: source address containing the data
+ * @size: how many bytes of memory should be copied
+ *
+ * Copy @size bytes from @src to @dst using text poking if the memory at
+ * @dst is read-only.
+ *
+ * Return: a pointer to @dst or NULL on error
+ */
+void *execmem_update_copy(void *dst, const void *src, size_t size);
+
+/**
+ * execmem_is_rox - check if execmem is read-only
+ * @type: the execmem type to check
+ *
+ * Return: %true if the @type is read-only, %false if it's writable
+ */
+bool execmem_is_rox(enum execmem_type type);
+
#if defined(CONFIG_EXECMEM) && !defined(CONFIG_ARCH_WANTS_EXECMEM_LATE)
void execmem_init(void);
#else
diff --git a/include/linux/module.h b/include/linux/module.h
index 88ecc5e9f523..2a9386cbdf85 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -367,6 +367,8 @@ enum mod_mem_type {
struct module_memory {
void *base;
+ void *rw_copy;
+ bool is_rox;
unsigned int size;
#ifdef CONFIG_MODULES_TREE_LOOKUP
@@ -767,6 +769,15 @@ static inline bool is_livepatch_module(struct module *mod)
void set_module_sig_enforced(void);
+void *__module_writable_address(struct module *mod, void *loc);
+
+static inline void *module_writable_address(struct module *mod, void *loc)
+{
+ if (!IS_ENABLED(CONFIG_ARCH_HAS_EXECMEM_ROX) || !mod)
+ return loc;
+ return __module_writable_address(mod, loc);
+}
+
#else /* !CONFIG_MODULES... */
static inline struct module *__module_address(unsigned long addr)
@@ -874,6 +885,11 @@ static inline bool module_is_coming(struct module *mod)
{
return false;
}
+
+static inline void *module_writable_address(struct module *mod, void *loc)
+{
+ return loc;
+}
#endif /* CONFIG_MODULES */
#ifdef CONFIG_SYSFS
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index e395461d59e5..1f5507ba5a12 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -108,6 +108,10 @@ int module_finalize(const Elf_Ehdr *hdr,
const Elf_Shdr *sechdrs,
struct module *mod);
+int module_post_finalize(const Elf_Ehdr *hdr,
+ const Elf_Shdr *sechdrs,
+ struct module *mod);
+
#ifdef CONFIG_MODULES
void flush_module_init_free_work(void);
#else
diff --git a/kernel/module/debug_kmemleak.c b/kernel/module/debug_kmemleak.c
index b4cc03842d70..df873dad049d 100644
--- a/kernel/module/debug_kmemleak.c
+++ b/kernel/module/debug_kmemleak.c
@@ -14,7 +14,8 @@ void kmemleak_load_module(const struct module *mod,
{
/* only scan writable, non-executable sections */
for_each_mod_mem_type(type) {
- if (type != MOD_DATA && type != MOD_INIT_DATA)
+ if (type != MOD_DATA && type != MOD_INIT_DATA &&
+ !mod->mem[type].is_rox)
kmemleak_no_scan(mod->mem[type].base);
}
}
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 49b9bca9de12..73b588fe98d4 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1189,6 +1189,18 @@ void __weak module_arch_freeing_init(struct module *mod)
{
}
+void *__module_writable_address(struct module *mod, void *loc)
+{
+ for_class_mod_mem_type(type, text) {
+ struct module_memory *mem = &mod->mem[type];
+
+ if (loc >= mem->base && loc < mem->base + mem->size)
+ return loc + (mem->rw_copy - mem->base);
+ }
+
+ return loc;
+}
+
static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
{
unsigned int size = PAGE_ALIGN(mod->mem[type].size);
@@ -1206,6 +1218,23 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
if (!ptr)
return -ENOMEM;
+ mod->mem[type].base = ptr;
+
+ if (execmem_is_rox(execmem_type)) {
+ ptr = vzalloc(size);
+
+ if (!ptr) {
+ execmem_free(mod->mem[type].base);
+ return -ENOMEM;
+ }
+
+ mod->mem[type].rw_copy = ptr;
+ mod->mem[type].is_rox = true;
+ } else {
+ mod->mem[type].rw_copy = mod->mem[type].base;
+ memset(mod->mem[type].base, 0, size);
+ }
+
/*
* The pointer to these blocks of memory are stored on the module
* structure and we keep that around so long as the module is
@@ -1219,16 +1248,17 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
*/
kmemleak_not_leak(ptr);
- memset(ptr, 0, size);
- mod->mem[type].base = ptr;
-
return 0;
}
static void module_memory_free(struct module *mod, enum mod_mem_type type,
bool unload_codetags)
{
- void *ptr = mod->mem[type].base;
+ struct module_memory *mem = &mod->mem[type];
+ void *ptr = mem->base;
+
+ if (mem->is_rox)
+ vfree(mem->rw_copy);
if (!unload_codetags && mod_mem_type_is_core_data(type))
return;
@@ -2251,6 +2281,7 @@ static int move_module(struct module *mod, struct load_info *info)
for_each_mod_mem_type(type) {
if (!mod->mem[type].size) {
mod->mem[type].base = NULL;
+ mod->mem[type].rw_copy = NULL;
continue;
}
@@ -2267,11 +2298,14 @@ static int move_module(struct module *mod, struct load_info *info)
void *dest;
Elf_Shdr *shdr = &info->sechdrs[i];
enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT;
+ unsigned long offset = shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK;
+ unsigned long addr;
if (!(shdr->sh_flags & SHF_ALLOC))
continue;
- dest = mod->mem[type].base + (shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK);
+ addr = (unsigned long)mod->mem[type].base + offset;
+ dest = mod->mem[type].rw_copy + offset;
if (shdr->sh_type != SHT_NOBITS) {
/*
@@ -2293,7 +2327,7 @@ static int move_module(struct module *mod, struct load_info *info)
* users of info can keep taking advantage and using the newly
* minted official memory area.
*/
- shdr->sh_addr = (unsigned long)dest;
+ shdr->sh_addr = addr;
pr_debug("\t0x%lx 0x%.8lx %s\n", (long)shdr->sh_addr,
(long)shdr->sh_size, info->secstrings + shdr->sh_name);
}
@@ -2441,8 +2475,17 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
return 0;
}
+int __weak module_post_finalize(const Elf_Ehdr *hdr,
+ const Elf_Shdr *sechdrs,
+ struct module *me)
+{
+ return 0;
+}
+
static int post_relocation(struct module *mod, const struct load_info *info)
{
+ int ret;
+
/* Sort exception table now relocations are done. */
sort_extable(mod->extable, mod->extable + mod->num_exentries);
@@ -2454,7 +2497,24 @@ static int post_relocation(struct module *mod, const struct load_info *info)
add_kallsyms(mod, info);
/* Arch-specific module finalizing. */
- return module_finalize(info->hdr, info->sechdrs, mod);
+ ret = module_finalize(info->hdr, info->sechdrs, mod);
+ if (ret)
+ return ret;
+
+ for_each_mod_mem_type(type) {
+ struct module_memory *mem = &mod->mem[type];
+
+ if (mem->is_rox) {
+ if (!execmem_update_copy(mem->base, mem->rw_copy,
+ mem->size))
+ return -ENOMEM;
+
+ vfree(mem->rw_copy);
+ mem->rw_copy = NULL;
+ }
+ }
+
+ return module_post_finalize(info->hdr, info->sechdrs, mod);
}
/* Call module constructors. */
diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c
index c45caa4690e5..239e5013359d 100644
--- a/kernel/module/strict_rwx.c
+++ b/kernel/module/strict_rwx.c
@@ -34,6 +34,9 @@ int module_enable_text_rox(const struct module *mod)
for_class_mod_mem_type(type, text) {
int ret;
+ if (mod->mem[type].is_rox)
+ continue;
+
if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
ret = module_set_memory(mod, type, set_memory_rox);
else
diff --git a/mm/execmem.c b/mm/execmem.c
index 0c4b36bc6d10..0f6691e9ffe6 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -10,6 +10,7 @@
#include <linux/vmalloc.h>
#include <linux/execmem.h>
#include <linux/moduleloader.h>
+#include <linux/text-patching.h>
static struct execmem_info *execmem_info __ro_after_init;
static struct execmem_info default_execmem_info __ro_after_init;
@@ -69,6 +70,16 @@ void execmem_free(void *ptr)
vfree(ptr);
}
+void *execmem_update_copy(void *dst, const void *src, size_t size)
+{
+ return text_poke_copy(dst, src, size);
+}
+
+bool execmem_is_rox(enum execmem_type type)
+{
+ return !!(execmem_info->ranges[type].flags & EXECMEM_ROX_CACHE);
+}
+
static bool execmem_validate(struct execmem_info *info)
{
struct execmem_range *r = &info->ranges[EXECMEM_DEFAULT];
--
2.43.0
* [PATCH v7 5/8] arch: introduce set_direct_map_valid_noflush()
2024-10-23 16:27 [PATCH v7 0/8] x86/module: use large ROX pages for text allocations Mike Rapoport
` (3 preceding siblings ...)
2024-10-23 16:27 ` [PATCH v7 4/8] module: prepare to handle ROX allocations for text Mike Rapoport
@ 2024-10-23 16:27 ` Mike Rapoport
2024-10-23 16:27 ` [PATCH v7 6/8] x86/module: prepare module loading for ROX allocations of text Mike Rapoport
` (3 subsequent siblings)
8 siblings, 0 replies; 23+ messages in thread
From: Mike Rapoport @ 2024-10-23 16:27 UTC (permalink / raw)
To: Andrew Morton, Luis Chamberlain
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Add an API that will allow updates of the direct/linear map for a set of
physically contiguous pages.
It will be used in the following patches.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: kdevops <kdevops@lists.linux.dev>
---
arch/arm64/include/asm/set_memory.h | 1 +
arch/arm64/mm/pageattr.c | 10 ++++++++++
arch/loongarch/include/asm/set_memory.h | 1 +
arch/loongarch/mm/pageattr.c | 19 +++++++++++++++++++
arch/riscv/include/asm/set_memory.h | 1 +
arch/riscv/mm/pageattr.c | 15 +++++++++++++++
arch/s390/include/asm/set_memory.h | 1 +
arch/s390/mm/pageattr.c | 11 +++++++++++
arch/x86/include/asm/set_memory.h | 1 +
arch/x86/mm/pat/set_memory.c | 8 ++++++++
include/linux/set_memory.h | 6 ++++++
11 files changed, 74 insertions(+)
diff --git a/arch/arm64/include/asm/set_memory.h b/arch/arm64/include/asm/set_memory.h
index 917761feeffd..98088c043606 100644
--- a/arch/arm64/include/asm/set_memory.h
+++ b/arch/arm64/include/asm/set_memory.h
@@ -13,6 +13,7 @@ int set_memory_valid(unsigned long addr, int numpages, int enable);
int set_direct_map_invalid_noflush(struct page *page);
int set_direct_map_default_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
bool kernel_page_present(struct page *page);
#endif /* _ASM_ARM64_SET_MEMORY_H */
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 0e270a1c51e6..01225900293a 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -192,6 +192,16 @@ int set_direct_map_default_noflush(struct page *page)
PAGE_SIZE, change_page_range, &data);
}
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+ unsigned long addr = (unsigned long)page_address(page);
+
+ if (!can_set_direct_map())
+ return 0;
+
+ return set_memory_valid(addr, nr, valid);
+}
+
#ifdef CONFIG_DEBUG_PAGEALLOC
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
diff --git a/arch/loongarch/include/asm/set_memory.h b/arch/loongarch/include/asm/set_memory.h
index d70505b6676c..55dfaefd02c8 100644
--- a/arch/loongarch/include/asm/set_memory.h
+++ b/arch/loongarch/include/asm/set_memory.h
@@ -17,5 +17,6 @@ int set_memory_rw(unsigned long addr, int numpages);
bool kernel_page_present(struct page *page);
int set_direct_map_default_noflush(struct page *page);
int set_direct_map_invalid_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
#endif /* _ASM_LOONGARCH_SET_MEMORY_H */
diff --git a/arch/loongarch/mm/pageattr.c b/arch/loongarch/mm/pageattr.c
index ffd8d76021d4..bf8678248444 100644
--- a/arch/loongarch/mm/pageattr.c
+++ b/arch/loongarch/mm/pageattr.c
@@ -216,3 +216,22 @@ int set_direct_map_invalid_noflush(struct page *page)
return __set_memory(addr, 1, __pgprot(0), __pgprot(_PAGE_PRESENT | _PAGE_VALID));
}
+
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+ unsigned long addr = (unsigned long)page_address(page);
+ pgprot_t set, clear;
+
+ if (addr < vm_map_base)
+ return 0;
+
+ if (valid) {
+ set = PAGE_KERNEL;
+ clear = __pgprot(0);
+ } else {
+ set = __pgprot(0);
+ clear = __pgprot(_PAGE_PRESENT | _PAGE_VALID);
+ }
+
+ return __set_memory(addr, 1, set, clear);
+}
diff --git a/arch/riscv/include/asm/set_memory.h b/arch/riscv/include/asm/set_memory.h
index ab92fc84e1fc..ea263d3683ef 100644
--- a/arch/riscv/include/asm/set_memory.h
+++ b/arch/riscv/include/asm/set_memory.h
@@ -42,6 +42,7 @@ static inline int set_kernel_memory(char *startp, char *endp,
int set_direct_map_invalid_noflush(struct page *page);
int set_direct_map_default_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
bool kernel_page_present(struct page *page);
#endif /* __ASSEMBLY__ */
diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
index 271d01a5ba4d..d815448758a1 100644
--- a/arch/riscv/mm/pageattr.c
+++ b/arch/riscv/mm/pageattr.c
@@ -386,6 +386,21 @@ int set_direct_map_default_noflush(struct page *page)
PAGE_KERNEL, __pgprot(_PAGE_EXEC));
}
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+ pgprot_t set, clear;
+
+ if (valid) {
+ set = PAGE_KERNEL;
+ clear = __pgprot(_PAGE_EXEC);
+ } else {
+ set = __pgprot(0);
+ clear = __pgprot(_PAGE_PRESENT);
+ }
+
+ return __set_memory((unsigned long)page_address(page), nr, set, clear);
+}
+
#ifdef CONFIG_DEBUG_PAGEALLOC
static int debug_pagealloc_set_page(pte_t *pte, unsigned long addr, void *data)
{
diff --git a/arch/s390/include/asm/set_memory.h b/arch/s390/include/asm/set_memory.h
index 06fbabe2f66c..240bcfbdcdce 100644
--- a/arch/s390/include/asm/set_memory.h
+++ b/arch/s390/include/asm/set_memory.h
@@ -62,5 +62,6 @@ __SET_MEMORY_FUNC(set_memory_4k, SET_MEMORY_4K)
int set_direct_map_invalid_noflush(struct page *page);
int set_direct_map_default_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
#endif
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index 5f805ad42d4c..4c7ee74aa130 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -406,6 +406,17 @@ int set_direct_map_default_noflush(struct page *page)
return __set_memory((unsigned long)page_to_virt(page), 1, SET_MEMORY_DEF);
}
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+ unsigned long flags;
+
+ if (valid)
+ flags = SET_MEMORY_DEF;
+ else
+ flags = SET_MEMORY_INV;
+
+ return __set_memory((unsigned long)page_to_virt(page), nr, flags);
+}
#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
static void ipte_range(pte_t *pte, unsigned long address, int nr)
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 4b2abce2e3e7..cc62ef70ccc0 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -89,6 +89,7 @@ int set_pages_rw(struct page *page, int numpages);
int set_direct_map_invalid_noflush(struct page *page);
int set_direct_map_default_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
bool kernel_page_present(struct page *page);
extern int kernel_set_to_readonly;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 44f7b2ea6a07..069e421c2247 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2444,6 +2444,14 @@ int set_direct_map_default_noflush(struct page *page)
return __set_pages_p(page, 1);
}
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+ if (valid)
+ return __set_pages_p(page, nr);
+
+ return __set_pages_np(page, nr);
+}
+
#ifdef CONFIG_DEBUG_PAGEALLOC
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index e7aec20fb44f..3030d9245f5a 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -34,6 +34,12 @@ static inline int set_direct_map_default_noflush(struct page *page)
return 0;
}
+static inline int set_direct_map_valid_noflush(struct page *page,
+ unsigned nr, bool valid)
+{
+ return 0;
+}
+
static inline bool kernel_page_present(struct page *page)
{
return true;
--
2.43.0
* [PATCH v7 6/8] x86/module: prepare module loading for ROX allocations of text
2024-10-23 16:27 [PATCH v7 0/8] x86/module: use large ROX pages for text allocations Mike Rapoport
` (4 preceding siblings ...)
2024-10-23 16:27 ` [PATCH v7 5/8] arch: introduce set_direct_map_valid_noflush() Mike Rapoport
@ 2024-10-23 16:27 ` Mike Rapoport
2024-11-04 23:27 ` Nathan Chancellor
2024-10-23 16:27 ` [PATCH v7 7/8] execmem: add support for cache of large ROX pages Mike Rapoport
` (2 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Mike Rapoport @ 2024-10-23 16:27 UTC (permalink / raw)
To: Andrew Morton, Luis Chamberlain
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
When module text memory is allocated with ROX permissions, the memory at
the address where the module will live contains invalid instructions,
while a writable copy holds the actual module code.
Update relocations and alternatives patching to deal with this.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: kdevops <kdevops@lists.linux.dev>
---
arch/um/kernel/um_arch.c | 11 +-
arch/x86/entry/vdso/vma.c | 3 +-
arch/x86/include/asm/alternative.h | 14 +--
arch/x86/kernel/alternative.c | 181 +++++++++++++++++------------
arch/x86/kernel/ftrace.c | 30 ++---
arch/x86/kernel/module.c | 45 ++++---
6 files changed, 167 insertions(+), 117 deletions(-)
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index f8de31a0c5d1..e8e8b54b3037 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -435,24 +435,25 @@ void __init arch_cpu_finalize_init(void)
os_check_bugs();
}
-void apply_seal_endbr(s32 *start, s32 *end)
+void apply_seal_endbr(s32 *start, s32 *end, struct module *mod)
{
}
-void apply_retpolines(s32 *start, s32 *end)
+void apply_retpolines(s32 *start, s32 *end, struct module *mod)
{
}
-void apply_returns(s32 *start, s32 *end)
+void apply_returns(s32 *start, s32 *end, struct module *mod)
{
}
void apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
- s32 *start_cfi, s32 *end_cfi)
+ s32 *start_cfi, s32 *end_cfi, struct module *mod)
{
}
-void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
+void apply_alternatives(struct alt_instr *start, struct alt_instr *end,
+ struct module *mod)
{
}
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index b8fed8b8b9cc..ed21151923c3 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -54,7 +54,8 @@ int __init init_vdso_image(const struct vdso_image *image)
apply_alternatives((struct alt_instr *)(image->data + image->alt),
(struct alt_instr *)(image->data + image->alt +
- image->alt_len));
+ image->alt_len),
+ NULL);
return 0;
}
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index ca9ae606aab9..dc03a647776d 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -96,16 +96,16 @@ extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
* instructions were patched in already:
*/
extern int alternatives_patched;
+struct module;
extern void alternative_instructions(void);
-extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
-extern void apply_retpolines(s32 *start, s32 *end);
-extern void apply_returns(s32 *start, s32 *end);
-extern void apply_seal_endbr(s32 *start, s32 *end);
+extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end,
+ struct module *mod);
+extern void apply_retpolines(s32 *start, s32 *end, struct module *mod);
+extern void apply_returns(s32 *start, s32 *end, struct module *mod);
+extern void apply_seal_endbr(s32 *start, s32 *end, struct module *mod);
extern void apply_fineibt(s32 *start_retpoline, s32 *end_retpoine,
- s32 *start_cfi, s32 *end_cfi);
-
-struct module;
+ s32 *start_cfi, s32 *end_cfi, struct module *mod);
struct callthunk_sites {
s32 *call_start, *call_end;
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index d17518ca19b8..3407efc26528 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -392,8 +392,10 @@ EXPORT_SYMBOL(BUG_func);
* Rewrite the "call BUG_func" replacement to point to the target of the
* indirect pv_ops call "call *disp(%ip)".
*/
-static int alt_replace_call(u8 *instr, u8 *insn_buff, struct alt_instr *a)
+static int alt_replace_call(u8 *instr, u8 *insn_buff, struct alt_instr *a,
+ struct module *mod)
{
+ u8 *wr_instr = module_writable_address(mod, instr);
void *target, *bug = &BUG_func;
s32 disp;
@@ -403,14 +405,14 @@ static int alt_replace_call(u8 *instr, u8 *insn_buff, struct alt_instr *a)
}
if (a->instrlen != 6 ||
- instr[0] != CALL_RIP_REL_OPCODE ||
- instr[1] != CALL_RIP_REL_MODRM) {
+ wr_instr[0] != CALL_RIP_REL_OPCODE ||
+ wr_instr[1] != CALL_RIP_REL_MODRM) {
pr_err("ALT_FLAG_DIRECT_CALL set for unrecognized indirect call\n");
BUG();
}
/* Skip CALL_RIP_REL_OPCODE and CALL_RIP_REL_MODRM */
- disp = *(s32 *)(instr + 2);
+ disp = *(s32 *)(wr_instr + 2);
#ifdef CONFIG_X86_64
/* ff 15 00 00 00 00 call *0x0(%rip) */
/* target address is stored at "next instruction + disp". */
@@ -448,7 +450,8 @@ static inline u8 * instr_va(struct alt_instr *i)
* to refetch changed I$ lines.
*/
void __init_or_module noinline apply_alternatives(struct alt_instr *start,
- struct alt_instr *end)
+ struct alt_instr *end,
+ struct module *mod)
{
u8 insn_buff[MAX_PATCH_LEN];
u8 *instr, *replacement;
@@ -477,6 +480,7 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
*/
for (a = start; a < end; a++) {
int insn_buff_sz = 0;
+ u8 *wr_instr, *wr_replacement;
/*
* In case of nested ALTERNATIVE()s the outer alternative might
@@ -490,7 +494,11 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
}
instr = instr_va(a);
+ wr_instr = module_writable_address(mod, instr);
+
replacement = (u8 *)&a->repl_offset + a->repl_offset;
+ wr_replacement = module_writable_address(mod, replacement);
+
BUG_ON(a->instrlen > sizeof(insn_buff));
BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);
@@ -501,9 +509,9 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
* patch if feature is *NOT* present.
*/
if (!boot_cpu_has(a->cpuid) == !(a->flags & ALT_FLAG_NOT)) {
- memcpy(insn_buff, instr, a->instrlen);
+ memcpy(insn_buff, wr_instr, a->instrlen);
optimize_nops(instr, insn_buff, a->instrlen);
- text_poke_early(instr, insn_buff, a->instrlen);
+ text_poke_early(wr_instr, insn_buff, a->instrlen);
continue;
}
@@ -513,11 +521,12 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
instr, instr, a->instrlen,
replacement, a->replacementlen, a->flags);
- memcpy(insn_buff, replacement, a->replacementlen);
+ memcpy(insn_buff, wr_replacement, a->replacementlen);
insn_buff_sz = a->replacementlen;
if (a->flags & ALT_FLAG_DIRECT_CALL) {
- insn_buff_sz = alt_replace_call(instr, insn_buff, a);
+ insn_buff_sz = alt_replace_call(instr, insn_buff, a,
+ mod);
if (insn_buff_sz < 0)
continue;
}
@@ -527,11 +536,11 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
apply_relocation(insn_buff, instr, a->instrlen, replacement, a->replacementlen);
- DUMP_BYTES(ALT, instr, a->instrlen, "%px: old_insn: ", instr);
+ DUMP_BYTES(ALT, wr_instr, a->instrlen, "%px: old_insn: ", instr);
DUMP_BYTES(ALT, replacement, a->replacementlen, "%px: rpl_insn: ", replacement);
DUMP_BYTES(ALT, insn_buff, insn_buff_sz, "%px: final_insn: ", instr);
- text_poke_early(instr, insn_buff, insn_buff_sz);
+ text_poke_early(wr_instr, insn_buff, insn_buff_sz);
}
kasan_enable_current();
@@ -722,18 +731,20 @@ static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
/*
* Generated by 'objtool --retpoline'.
*/
-void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
+void __init_or_module noinline apply_retpolines(s32 *start, s32 *end,
+ struct module *mod)
{
s32 *s;
for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr = module_writable_address(mod, addr);
struct insn insn;
int len, ret;
u8 bytes[16];
u8 op1, op2;
- ret = insn_decode_kernel(&insn, addr);
+ ret = insn_decode_kernel(&insn, wr_addr);
if (WARN_ON_ONCE(ret < 0))
continue;
@@ -761,9 +772,9 @@ void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
len = patch_retpoline(addr, &insn, bytes);
if (len == insn.length) {
optimize_nops(addr, bytes, len);
- DUMP_BYTES(RETPOLINE, ((u8*)addr), len, "%px: orig: ", addr);
+ DUMP_BYTES(RETPOLINE, ((u8*)wr_addr), len, "%px: orig: ", addr);
DUMP_BYTES(RETPOLINE, ((u8*)bytes), len, "%px: repl: ", addr);
- text_poke_early(addr, bytes, len);
+ text_poke_early(wr_addr, bytes, len);
}
}
}
@@ -799,7 +810,8 @@ static int patch_return(void *addr, struct insn *insn, u8 *bytes)
return i;
}
-void __init_or_module noinline apply_returns(s32 *start, s32 *end)
+void __init_or_module noinline apply_returns(s32 *start, s32 *end,
+ struct module *mod)
{
s32 *s;
@@ -808,12 +820,13 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end)
for (s = start; s < end; s++) {
void *dest = NULL, *addr = (void *)s + *s;
+ void *wr_addr = module_writable_address(mod, addr);
struct insn insn;
int len, ret;
u8 bytes[16];
u8 op;
- ret = insn_decode_kernel(&insn, addr);
+ ret = insn_decode_kernel(&insn, wr_addr);
if (WARN_ON_ONCE(ret < 0))
continue;
@@ -833,32 +846,35 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end)
len = patch_return(addr, &insn, bytes);
if (len == insn.length) {
- DUMP_BYTES(RET, ((u8*)addr), len, "%px: orig: ", addr);
+ DUMP_BYTES(RET, ((u8*)wr_addr), len, "%px: orig: ", addr);
DUMP_BYTES(RET, ((u8*)bytes), len, "%px: repl: ", addr);
- text_poke_early(addr, bytes, len);
+ text_poke_early(wr_addr, bytes, len);
}
}
}
#else
-void __init_or_module noinline apply_returns(s32 *start, s32 *end) { }
+void __init_or_module noinline apply_returns(s32 *start, s32 *end,
+ struct module *mod) { }
#endif /* CONFIG_MITIGATION_RETHUNK */
#else /* !CONFIG_MITIGATION_RETPOLINE || !CONFIG_OBJTOOL */
-void __init_or_module noinline apply_retpolines(s32 *start, s32 *end) { }
-void __init_or_module noinline apply_returns(s32 *start, s32 *end) { }
+void __init_or_module noinline apply_retpolines(s32 *start, s32 *end,
+ struct module *mod) { }
+void __init_or_module noinline apply_returns(s32 *start, s32 *end,
+ struct module *mod) { }
#endif /* CONFIG_MITIGATION_RETPOLINE && CONFIG_OBJTOOL */
#ifdef CONFIG_X86_KERNEL_IBT
-static void poison_cfi(void *addr);
+static void poison_cfi(void *addr, void *wr_addr);
-static void __init_or_module poison_endbr(void *addr, bool warn)
+static void __init_or_module poison_endbr(void *addr, void *wr_addr, bool warn)
{
u32 endbr, poison = gen_endbr_poison();
- if (WARN_ON_ONCE(get_kernel_nofault(endbr, addr)))
+ if (WARN_ON_ONCE(get_kernel_nofault(endbr, wr_addr)))
return;
if (!is_endbr(endbr)) {
@@ -873,7 +889,7 @@ static void __init_or_module poison_endbr(void *addr, bool warn)
*/
DUMP_BYTES(ENDBR, ((u8*)addr), 4, "%px: orig: ", addr);
DUMP_BYTES(ENDBR, ((u8*)&poison), 4, "%px: repl: ", addr);
- text_poke_early(addr, &poison, 4);
+ text_poke_early(wr_addr, &poison, 4);
}
/*
@@ -882,22 +898,23 @@ static void __init_or_module poison_endbr(void *addr, bool warn)
* Seal the functions for indirect calls by clobbering the ENDBR instructions
* and the kCFI hash value.
*/
-void __init_or_module noinline apply_seal_endbr(s32 *start, s32 *end)
+void __init_or_module noinline apply_seal_endbr(s32 *start, s32 *end, struct module *mod)
{
s32 *s;
for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr = module_writable_address(mod, addr);
- poison_endbr(addr, true);
+ poison_endbr(addr, wr_addr, true);
if (IS_ENABLED(CONFIG_FINEIBT))
- poison_cfi(addr - 16);
+ poison_cfi(addr - 16, wr_addr - 16);
}
}
#else
-void __init_or_module apply_seal_endbr(s32 *start, s32 *end) { }
+void __init_or_module apply_seal_endbr(s32 *start, s32 *end, struct module *mod) { }
#endif /* CONFIG_X86_KERNEL_IBT */
@@ -1119,7 +1136,7 @@ static u32 decode_caller_hash(void *addr)
}
/* .retpoline_sites */
-static int cfi_disable_callers(s32 *start, s32 *end)
+static int cfi_disable_callers(s32 *start, s32 *end, struct module *mod)
{
/*
* Disable kCFI by patching in a JMP.d8, this leaves the hash immediate
@@ -1131,20 +1148,23 @@ static int cfi_disable_callers(s32 *start, s32 *end)
for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr;
u32 hash;
addr -= fineibt_caller_size;
- hash = decode_caller_hash(addr);
+ wr_addr = module_writable_address(mod, addr);
+ hash = decode_caller_hash(wr_addr);
+
if (!hash) /* nocfi callers */
continue;
- text_poke_early(addr, jmp, 2);
+ text_poke_early(wr_addr, jmp, 2);
}
return 0;
}
-static int cfi_enable_callers(s32 *start, s32 *end)
+static int cfi_enable_callers(s32 *start, s32 *end, struct module *mod)
{
/*
* Re-enable kCFI, undo what cfi_disable_callers() did.
@@ -1154,106 +1174,115 @@ static int cfi_enable_callers(s32 *start, s32 *end)
for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr;
u32 hash;
addr -= fineibt_caller_size;
- hash = decode_caller_hash(addr);
+ wr_addr = module_writable_address(mod, addr);
+ hash = decode_caller_hash(wr_addr);
if (!hash) /* nocfi callers */
continue;
- text_poke_early(addr, mov, 2);
+ text_poke_early(wr_addr, mov, 2);
}
return 0;
}
/* .cfi_sites */
-static int cfi_rand_preamble(s32 *start, s32 *end)
+static int cfi_rand_preamble(s32 *start, s32 *end, struct module *mod)
{
s32 *s;
for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr = module_writable_address(mod, addr);
u32 hash;
- hash = decode_preamble_hash(addr);
+ hash = decode_preamble_hash(wr_addr);
if (WARN(!hash, "no CFI hash found at: %pS %px %*ph\n",
addr, addr, 5, addr))
return -EINVAL;
hash = cfi_rehash(hash);
- text_poke_early(addr + 1, &hash, 4);
+ text_poke_early(wr_addr + 1, &hash, 4);
}
return 0;
}
-static int cfi_rewrite_preamble(s32 *start, s32 *end)
+static int cfi_rewrite_preamble(s32 *start, s32 *end, struct module *mod)
{
s32 *s;
for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr = module_writable_address(mod, addr);
u32 hash;
- hash = decode_preamble_hash(addr);
+ hash = decode_preamble_hash(wr_addr);
if (WARN(!hash, "no CFI hash found at: %pS %px %*ph\n",
addr, addr, 5, addr))
return -EINVAL;
- text_poke_early(addr, fineibt_preamble_start, fineibt_preamble_size);
- WARN_ON(*(u32 *)(addr + fineibt_preamble_hash) != 0x12345678);
- text_poke_early(addr + fineibt_preamble_hash, &hash, 4);
+ text_poke_early(wr_addr, fineibt_preamble_start, fineibt_preamble_size);
+ WARN_ON(*(u32 *)(wr_addr + fineibt_preamble_hash) != 0x12345678);
+ text_poke_early(wr_addr + fineibt_preamble_hash, &hash, 4);
}
return 0;
}
-static void cfi_rewrite_endbr(s32 *start, s32 *end)
+static void cfi_rewrite_endbr(s32 *start, s32 *end, struct module *mod)
{
s32 *s;
for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr = module_writable_address(mod, addr);
- poison_endbr(addr+16, false);
+ poison_endbr(addr+16, wr_addr, false);
}
}
/* .retpoline_sites */
-static int cfi_rand_callers(s32 *start, s32 *end)
+static int cfi_rand_callers(s32 *start, s32 *end, struct module *mod)
{
s32 *s;
for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr;
u32 hash;
addr -= fineibt_caller_size;
- hash = decode_caller_hash(addr);
+ wr_addr = module_writable_address(mod, addr);
+ hash = decode_caller_hash(wr_addr);
if (hash) {
hash = -cfi_rehash(hash);
- text_poke_early(addr + 2, &hash, 4);
+ text_poke_early(wr_addr + 2, &hash, 4);
}
}
return 0;
}
-static int cfi_rewrite_callers(s32 *start, s32 *end)
+static int cfi_rewrite_callers(s32 *start, s32 *end, struct module *mod)
{
s32 *s;
for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr;
u32 hash;
addr -= fineibt_caller_size;
- hash = decode_caller_hash(addr);
+ wr_addr = module_writable_address(mod, addr);
+ hash = decode_caller_hash(wr_addr);
if (hash) {
- text_poke_early(addr, fineibt_caller_start, fineibt_caller_size);
- WARN_ON(*(u32 *)(addr + fineibt_caller_hash) != 0x12345678);
- text_poke_early(addr + fineibt_caller_hash, &hash, 4);
+ text_poke_early(wr_addr, fineibt_caller_start, fineibt_caller_size);
+ WARN_ON(*(u32 *)(wr_addr + fineibt_caller_hash) != 0x12345678);
+ text_poke_early(wr_addr + fineibt_caller_hash, &hash, 4);
}
/* rely on apply_retpolines() */
}
@@ -1262,8 +1291,9 @@ static int cfi_rewrite_callers(s32 *start, s32 *end)
}
static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
- s32 *start_cfi, s32 *end_cfi, bool builtin)
+ s32 *start_cfi, s32 *end_cfi, struct module *mod)
{
+ bool builtin = mod ? false : true;
int ret;
if (WARN_ONCE(fineibt_preamble_size != 16,
@@ -1281,7 +1311,7 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
* rewrite them. This disables all CFI. If this succeeds but any of the
* later stages fails, we're without CFI.
*/
- ret = cfi_disable_callers(start_retpoline, end_retpoline);
+ ret = cfi_disable_callers(start_retpoline, end_retpoline, mod);
if (ret)
goto err;
@@ -1292,11 +1322,11 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
cfi_bpf_subprog_hash = cfi_rehash(cfi_bpf_subprog_hash);
}
- ret = cfi_rand_preamble(start_cfi, end_cfi);
+ ret = cfi_rand_preamble(start_cfi, end_cfi, mod);
if (ret)
goto err;
- ret = cfi_rand_callers(start_retpoline, end_retpoline);
+ ret = cfi_rand_callers(start_retpoline, end_retpoline, mod);
if (ret)
goto err;
}
@@ -1308,7 +1338,7 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
return;
case CFI_KCFI:
- ret = cfi_enable_callers(start_retpoline, end_retpoline);
+ ret = cfi_enable_callers(start_retpoline, end_retpoline, mod);
if (ret)
goto err;
@@ -1318,17 +1348,17 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
case CFI_FINEIBT:
/* place the FineIBT preamble at func()-16 */
- ret = cfi_rewrite_preamble(start_cfi, end_cfi);
+ ret = cfi_rewrite_preamble(start_cfi, end_cfi, mod);
if (ret)
goto err;
/* rewrite the callers to target func()-16 */
- ret = cfi_rewrite_callers(start_retpoline, end_retpoline);
+ ret = cfi_rewrite_callers(start_retpoline, end_retpoline, mod);
if (ret)
goto err;
/* now that nobody targets func()+0, remove ENDBR there */
- cfi_rewrite_endbr(start_cfi, end_cfi);
+ cfi_rewrite_endbr(start_cfi, end_cfi, mod);
if (builtin)
pr_info("Using FineIBT CFI\n");
@@ -1347,7 +1377,7 @@ static inline void poison_hash(void *addr)
*(u32 *)addr = 0;
}
-static void poison_cfi(void *addr)
+static void poison_cfi(void *addr, void *wr_addr)
{
switch (cfi_mode) {
case CFI_FINEIBT:
@@ -1359,8 +1389,8 @@ static void poison_cfi(void *addr)
* ud2
* 1: nop
*/
- poison_endbr(addr, false);
- poison_hash(addr + fineibt_preamble_hash);
+ poison_endbr(addr, wr_addr, false);
+ poison_hash(wr_addr + fineibt_preamble_hash);
break;
case CFI_KCFI:
@@ -1369,7 +1399,7 @@ static void poison_cfi(void *addr)
* movl $0, %eax
* .skip 11, 0x90
*/
- poison_hash(addr + 1);
+ poison_hash(wr_addr + 1);
break;
default:
@@ -1380,22 +1410,21 @@ static void poison_cfi(void *addr)
#else
static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
- s32 *start_cfi, s32 *end_cfi, bool builtin)
+ s32 *start_cfi, s32 *end_cfi, struct module *mod)
{
}
#ifdef CONFIG_X86_KERNEL_IBT
-static void poison_cfi(void *addr) { }
+static void poison_cfi(void *addr, void *wr_addr) { }
#endif
#endif
void apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
- s32 *start_cfi, s32 *end_cfi)
+ s32 *start_cfi, s32 *end_cfi, struct module *mod)
{
return __apply_fineibt(start_retpoline, end_retpoline,
- start_cfi, end_cfi,
- /* .builtin = */ false);
+ start_cfi, end_cfi, mod);
}
#ifdef CONFIG_SMP
@@ -1692,16 +1721,16 @@ void __init alternative_instructions(void)
paravirt_set_cap();
__apply_fineibt(__retpoline_sites, __retpoline_sites_end,
- __cfi_sites, __cfi_sites_end, true);
+ __cfi_sites, __cfi_sites_end, NULL);
/*
* Rewrite the retpolines, must be done before alternatives since
* those can rewrite the retpoline thunks.
*/
- apply_retpolines(__retpoline_sites, __retpoline_sites_end);
- apply_returns(__return_sites, __return_sites_end);
+ apply_retpolines(__retpoline_sites, __retpoline_sites_end, NULL);
+ apply_returns(__return_sites, __return_sites_end, NULL);
- apply_alternatives(__alt_instructions, __alt_instructions_end);
+ apply_alternatives(__alt_instructions, __alt_instructions_end, NULL);
/*
* Now all calls are established. Apply the call thunks if
@@ -1712,7 +1741,7 @@ void __init alternative_instructions(void)
/*
* Seal all functions that do not have their address taken.
*/
- apply_seal_endbr(__ibt_endbr_seal, __ibt_endbr_seal_end);
+ apply_seal_endbr(__ibt_endbr_seal, __ibt_endbr_seal_end, NULL);
#ifdef CONFIG_SMP
/* Patch to UP if other cpus not imminent. */
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 8da0e66ca22d..b498897b213c 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -118,10 +118,13 @@ ftrace_modify_code_direct(unsigned long ip, const char *old_code,
return ret;
/* replace the text with the new text */
- if (ftrace_poke_late)
+ if (ftrace_poke_late) {
text_poke_queue((void *)ip, new_code, MCOUNT_INSN_SIZE, NULL);
- else
- text_poke_early((void *)ip, new_code, MCOUNT_INSN_SIZE);
+ } else {
+ mutex_lock(&text_mutex);
+ text_poke((void *)ip, new_code, MCOUNT_INSN_SIZE);
+ mutex_unlock(&text_mutex);
+ }
return 0;
}
@@ -318,7 +321,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
unsigned const char op_ref[] = { 0x48, 0x8b, 0x15 };
unsigned const char retq[] = { RET_INSN_OPCODE, INT3_INSN_OPCODE };
union ftrace_op_code_union op_ptr;
- int ret;
+ void *ret;
if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) {
start_offset = (unsigned long)ftrace_regs_caller;
@@ -349,15 +352,15 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
npages = DIV_ROUND_UP(*tramp_size, PAGE_SIZE);
/* Copy ftrace_caller onto the trampoline memory */
- ret = copy_from_kernel_nofault(trampoline, (void *)start_offset, size);
- if (WARN_ON(ret < 0))
+ ret = text_poke_copy(trampoline, (void *)start_offset, size);
+ if (WARN_ON(!ret))
goto fail;
ip = trampoline + size;
if (cpu_feature_enabled(X86_FEATURE_RETHUNK))
__text_gen_insn(ip, JMP32_INSN_OPCODE, ip, x86_return_thunk, JMP32_INSN_SIZE);
else
- memcpy(ip, retq, sizeof(retq));
+ text_poke_copy(ip, retq, sizeof(retq));
/* No need to test direct calls on created trampolines */
if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) {
@@ -365,8 +368,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
ip = trampoline + (jmp_offset - start_offset);
if (WARN_ON(*(char *)ip != 0x75))
goto fail;
- ret = copy_from_kernel_nofault(ip, x86_nops[2], 2);
- if (ret < 0)
+ if (!text_poke_copy(ip, x86_nops[2], 2))
goto fail;
}
@@ -379,7 +381,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
*/
ptr = (unsigned long *)(trampoline + size + RET_SIZE);
- *ptr = (unsigned long)ops;
+ text_poke_copy(ptr, &ops, sizeof(unsigned long));
op_offset -= start_offset;
memcpy(&op_ptr, trampoline + op_offset, OP_REF_SIZE);
@@ -395,7 +397,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
op_ptr.offset = offset;
/* put in the new offset to the ftrace_ops */
- memcpy(trampoline + op_offset, &op_ptr, OP_REF_SIZE);
+ text_poke_copy(trampoline + op_offset, &op_ptr, OP_REF_SIZE);
/* put in the call to the function */
mutex_lock(&text_mutex);
@@ -405,9 +407,9 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
* the depth accounting before the call already.
*/
dest = ftrace_ops_get_func(ops);
- memcpy(trampoline + call_offset,
- text_gen_insn(CALL_INSN_OPCODE, trampoline + call_offset, dest),
- CALL_INSN_SIZE);
+ text_poke_copy_locked(trampoline + call_offset,
+ text_gen_insn(CALL_INSN_OPCODE, trampoline + call_offset, dest),
+ CALL_INSN_SIZE, false);
mutex_unlock(&text_mutex);
/* ALLOC_TRAMP flags lets us know we created it */
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 837450b6e882..8984abd91c00 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -146,18 +146,21 @@ static int __write_relocate_add(Elf64_Shdr *sechdrs,
}
if (apply) {
- if (memcmp(loc, &zero, size)) {
+ void *wr_loc = module_writable_address(me, loc);
+
+ if (memcmp(wr_loc, &zero, size)) {
pr_err("x86/modules: Invalid relocation target, existing value is nonzero for type %d, loc %p, val %Lx\n",
(int)ELF64_R_TYPE(rel[i].r_info), loc, val);
return -ENOEXEC;
}
- write(loc, &val, size);
+ write(wr_loc, &val, size);
} else {
if (memcmp(loc, &val, size)) {
pr_warn("x86/modules: Invalid relocation target, existing value does not match expected value for type %d, loc %p, val %Lx\n",
(int)ELF64_R_TYPE(rel[i].r_info), loc, val);
return -ENOEXEC;
}
+ /* FIXME: needs care for ROX module allocations */
write(loc, &zero, size);
}
}
@@ -224,7 +227,7 @@ int module_finalize(const Elf_Ehdr *hdr,
const Elf_Shdr *sechdrs,
struct module *me)
{
- const Elf_Shdr *s, *alt = NULL, *locks = NULL,
+ const Elf_Shdr *s, *alt = NULL,
*orc = NULL, *orc_ip = NULL,
*retpolines = NULL, *returns = NULL, *ibt_endbr = NULL,
*calls = NULL, *cfi = NULL;
@@ -233,8 +236,6 @@ int module_finalize(const Elf_Ehdr *hdr,
for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
if (!strcmp(".altinstructions", secstrings + s->sh_name))
alt = s;
- if (!strcmp(".smp_locks", secstrings + s->sh_name))
- locks = s;
if (!strcmp(".orc_unwind", secstrings + s->sh_name))
orc = s;
if (!strcmp(".orc_unwind_ip", secstrings + s->sh_name))
@@ -265,20 +266,20 @@ int module_finalize(const Elf_Ehdr *hdr,
csize = cfi->sh_size;
}
- apply_fineibt(rseg, rseg + rsize, cseg, cseg + csize);
+ apply_fineibt(rseg, rseg + rsize, cseg, cseg + csize, me);
}
if (retpolines) {
void *rseg = (void *)retpolines->sh_addr;
- apply_retpolines(rseg, rseg + retpolines->sh_size);
+ apply_retpolines(rseg, rseg + retpolines->sh_size, me);
}
if (returns) {
void *rseg = (void *)returns->sh_addr;
- apply_returns(rseg, rseg + returns->sh_size);
+ apply_returns(rseg, rseg + returns->sh_size, me);
}
if (alt) {
/* patch .altinstructions */
void *aseg = (void *)alt->sh_addr;
- apply_alternatives(aseg, aseg + alt->sh_size);
+ apply_alternatives(aseg, aseg + alt->sh_size, me);
}
if (calls || alt) {
struct callthunk_sites cs = {};
@@ -297,8 +298,28 @@ int module_finalize(const Elf_Ehdr *hdr,
}
if (ibt_endbr) {
void *iseg = (void *)ibt_endbr->sh_addr;
- apply_seal_endbr(iseg, iseg + ibt_endbr->sh_size);
+ apply_seal_endbr(iseg, iseg + ibt_endbr->sh_size, me);
}
+
+ if (orc && orc_ip)
+ unwind_module_init(me, (void *)orc_ip->sh_addr, orc_ip->sh_size,
+ (void *)orc->sh_addr, orc->sh_size);
+
+ return 0;
+}
+
+int module_post_finalize(const Elf_Ehdr *hdr,
+ const Elf_Shdr *sechdrs,
+ struct module *me)
+{
+ const Elf_Shdr *s, *locks = NULL;
+ char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
+
+ for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
+ if (!strcmp(".smp_locks", secstrings + s->sh_name))
+ locks = s;
+ }
+
if (locks) {
void *lseg = (void *)locks->sh_addr;
void *text = me->mem[MOD_TEXT].base;
@@ -308,10 +329,6 @@ int module_finalize(const Elf_Ehdr *hdr,
text, text_end);
}
- if (orc && orc_ip)
- unwind_module_init(me, (void *)orc_ip->sh_addr, orc_ip->sh_size,
- (void *)orc->sh_addr, orc->sh_size);
-
return 0;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v7 7/8] execmem: add support for cache of large ROX pages
2024-10-23 16:27 [PATCH v7 0/8] x86/module: use large ROX pages for text allocations Mike Rapoport
` (5 preceding siblings ...)
2024-10-23 16:27 ` [PATCH v7 6/8] x86/module: prepare module loading for ROX allocations of text Mike Rapoport
@ 2024-10-23 16:27 ` Mike Rapoport
2025-02-27 11:13 ` Ryan Roberts
2024-10-23 16:27 ` [PATCH v7 8/8] x86/module: enable ROX caches for module text on 64 bit Mike Rapoport
2024-11-18 18:25 ` [PATCH v7 0/8] x86/module: use large ROX pages for text allocations Steven Rostedt
8 siblings, 1 reply; 23+ messages in thread
From: Mike Rapoport @ 2024-10-23 16:27 UTC (permalink / raw)
To: Andrew Morton, Luis Chamberlain
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Using large pages to map text areas reduces iTLB pressure and improves
performance.
Extend execmem_alloc() with the ability to use huge pages with ROX
permissions as a cache for smaller allocations.
To populate the cache, a writable large page is allocated from vmalloc with
VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
ROX.
The direct map alias of that large page is excluded from the direct map.
Portions of that large page are handed out to execmem_alloc() callers
without any changes to the permissions.
When the memory is freed with execmem_free() it is invalidated again so
that it won't contain stale instructions.
An architecture has to implement the execmem_fill_trapping_insns() callback
and select the ARCH_HAS_EXECMEM_ROX configuration option to be able to use
the ROX cache.
The cache is enabled on a per-range basis when an architecture sets the
EXECMEM_ROX_CACHE flag in the definition of an execmem_range.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: kdevops <kdevops@lists.linux.dev>
---
arch/Kconfig | 8 +
include/linux/execmem.h | 14 ++
mm/execmem.c | 325 +++++++++++++++++++++++++++++++++++++++-
mm/internal.h | 1 +
mm/vmalloc.c | 5 +
5 files changed, 345 insertions(+), 8 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 98157b38f5cf..f4f6e170eb7e 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1010,6 +1010,14 @@ config ARCH_WANTS_EXECMEM_LATE
enough entropy for module space randomization, for instance
arm64.
+config ARCH_HAS_EXECMEM_ROX
+ bool
+ depends on MMU && !HIGHMEM
+ help
+ For architectures that support allocations of executable memory
+ with read-only execute permissions. Architectures must implement
+ the execmem_fill_trapping_insns() callback to enable this.
+
config HAVE_IRQ_EXIT_ON_IRQ_STACK
bool
help
diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index dfdf19f8a5e8..1517fa196bf7 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -53,6 +53,20 @@ enum execmem_range_flags {
EXECMEM_ROX_CACHE = (1 << 1),
};
+#ifdef CONFIG_ARCH_HAS_EXECMEM_ROX
+/**
+ * execmem_fill_trapping_insns - set memory to contain instructions that
+ * will trap
+ * @ptr: pointer to memory to fill
+ * @size: size of the range to fill
+ * @writable: is the memory pointed to by @ptr writable or ROX
+ *
+ * A hook for architectures to fill execmem ranges with invalid instructions.
+ * Architectures that use EXECMEM_ROX_CACHE must implement this.
+ */
+void execmem_fill_trapping_insns(void *ptr, size_t size, bool writable);
+#endif
+
/**
* struct execmem_range - definition of an address space suitable for code and
* related data allocations
diff --git a/mm/execmem.c b/mm/execmem.c
index 0f6691e9ffe6..576a57e2161f 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -6,29 +6,41 @@
* Copyright (C) 2024 Mike Rapoport IBM.
*/
+#define pr_fmt(fmt) "execmem: " fmt
+
#include <linux/mm.h>
+#include <linux/mutex.h>
#include <linux/vmalloc.h>
#include <linux/execmem.h>
+#include <linux/maple_tree.h>
+#include <linux/set_memory.h>
#include <linux/moduleloader.h>
#include <linux/text-patching.h>
+#include <asm/tlbflush.h>
+
+#include "internal.h"
+
static struct execmem_info *execmem_info __ro_after_init;
static struct execmem_info default_execmem_info __ro_after_init;
-static void *__execmem_alloc(struct execmem_range *range, size_t size)
+#ifdef CONFIG_MMU
+static void *execmem_vmalloc(struct execmem_range *range, size_t size,
+ pgprot_t pgprot, unsigned long vm_flags)
{
bool kasan = range->flags & EXECMEM_KASAN_SHADOW;
- unsigned long vm_flags = VM_FLUSH_RESET_PERMS;
gfp_t gfp_flags = GFP_KERNEL | __GFP_NOWARN;
+ unsigned int align = range->alignment;
unsigned long start = range->start;
unsigned long end = range->end;
- unsigned int align = range->alignment;
- pgprot_t pgprot = range->pgprot;
void *p;
if (kasan)
vm_flags |= VM_DEFER_KMEMLEAK;
+ if (vm_flags & VM_ALLOW_HUGE_VMAP)
+ align = PMD_SIZE;
+
p = __vmalloc_node_range(size, align, start, end, gfp_flags,
pgprot, vm_flags, NUMA_NO_NODE,
__builtin_return_address(0));
@@ -41,7 +53,7 @@ static void *__execmem_alloc(struct execmem_range *range, size_t size)
}
if (!p) {
- pr_warn_ratelimited("execmem: unable to allocate memory\n");
+ pr_warn_ratelimited("unable to allocate memory\n");
return NULL;
}
@@ -50,14 +62,298 @@ static void *__execmem_alloc(struct execmem_range *range, size_t size)
return NULL;
}
- return kasan_reset_tag(p);
+ return p;
}
+#else
+static void *execmem_vmalloc(struct execmem_range *range, size_t size,
+ pgprot_t pgprot, unsigned long vm_flags)
+{
+ return vmalloc(size);
+}
+#endif /* CONFIG_MMU */
+
+#ifdef CONFIG_ARCH_HAS_EXECMEM_ROX
+struct execmem_cache {
+ struct mutex mutex;
+ struct maple_tree busy_areas;
+ struct maple_tree free_areas;
+};
+
+static struct execmem_cache execmem_cache = {
+ .mutex = __MUTEX_INITIALIZER(execmem_cache.mutex),
+ .busy_areas = MTREE_INIT_EXT(busy_areas, MT_FLAGS_LOCK_EXTERN,
+ execmem_cache.mutex),
+ .free_areas = MTREE_INIT_EXT(free_areas, MT_FLAGS_LOCK_EXTERN,
+ execmem_cache.mutex),
+};
+
+static inline unsigned long mas_range_len(struct ma_state *mas)
+{
+ return mas->last - mas->index + 1;
+}
+
+static int execmem_set_direct_map_valid(struct vm_struct *vm, bool valid)
+{
+ unsigned int nr = (1 << get_vm_area_page_order(vm));
+ unsigned int updated = 0;
+ int err = 0;
+
+ for (int i = 0; i < vm->nr_pages; i += nr) {
+ err = set_direct_map_valid_noflush(vm->pages[i], nr, valid);
+ if (err)
+ goto err_restore;
+ updated += nr;
+ }
+
+ return 0;
+
+err_restore:
+ for (int i = 0; i < updated; i += nr)
+ set_direct_map_valid_noflush(vm->pages[i], nr, !valid);
+
+ return err;
+}
+
+static void execmem_cache_clean(struct work_struct *work)
+{
+ struct maple_tree *free_areas = &execmem_cache.free_areas;
+ struct mutex *mutex = &execmem_cache.mutex;
+ MA_STATE(mas, free_areas, 0, ULONG_MAX);
+ void *area;
+
+ mutex_lock(mutex);
+ mas_for_each(&mas, area, ULONG_MAX) {
+ size_t size = mas_range_len(&mas);
+
+ if (IS_ALIGNED(size, PMD_SIZE) &&
+ IS_ALIGNED(mas.index, PMD_SIZE)) {
+ struct vm_struct *vm = find_vm_area(area);
+
+ execmem_set_direct_map_valid(vm, true);
+ mas_store_gfp(&mas, NULL, GFP_KERNEL);
+ vfree(area);
+ }
+ }
+ mutex_unlock(mutex);
+}
+
+static DECLARE_WORK(execmem_cache_clean_work, execmem_cache_clean);
+
+static int execmem_cache_add(void *ptr, size_t size)
+{
+ struct maple_tree *free_areas = &execmem_cache.free_areas;
+ struct mutex *mutex = &execmem_cache.mutex;
+ unsigned long addr = (unsigned long)ptr;
+ MA_STATE(mas, free_areas, addr - 1, addr + 1);
+ unsigned long lower, upper;
+ void *area = NULL;
+ int err;
+
+ lower = addr;
+ upper = addr + size - 1;
+
+ mutex_lock(mutex);
+ area = mas_walk(&mas);
+ if (area && mas.last == addr - 1)
+ lower = mas.index;
+
+ area = mas_next(&mas, ULONG_MAX);
+ if (area && mas.index == addr + size)
+ upper = mas.last;
+
+ mas_set_range(&mas, lower, upper);
+ err = mas_store_gfp(&mas, (void *)lower, GFP_KERNEL);
+ mutex_unlock(mutex);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static bool within_range(struct execmem_range *range, struct ma_state *mas,
+ size_t size)
+{
+ unsigned long addr = mas->index;
+
+ if (addr >= range->start && addr + size < range->end)
+ return true;
+
+ if (range->fallback_start &&
+ addr >= range->fallback_start && addr + size < range->fallback_end)
+ return true;
+
+ return false;
+}
+
+static void *__execmem_cache_alloc(struct execmem_range *range, size_t size)
+{
+ struct maple_tree *free_areas = &execmem_cache.free_areas;
+ struct maple_tree *busy_areas = &execmem_cache.busy_areas;
+ MA_STATE(mas_free, free_areas, 0, ULONG_MAX);
+ MA_STATE(mas_busy, busy_areas, 0, ULONG_MAX);
+ struct mutex *mutex = &execmem_cache.mutex;
+ unsigned long addr, last, area_size = 0;
+ void *area, *ptr = NULL;
+ int err;
+
+ mutex_lock(mutex);
+ mas_for_each(&mas_free, area, ULONG_MAX) {
+ area_size = mas_range_len(&mas_free);
+
+ if (area_size >= size && within_range(range, &mas_free, size))
+ break;
+ }
+
+ if (area_size < size)
+ goto out_unlock;
+
+ addr = mas_free.index;
+ last = mas_free.last;
+
+ /* insert allocated size to busy_areas at range [addr, addr + size) */
+ mas_set_range(&mas_busy, addr, addr + size - 1);
+ err = mas_store_gfp(&mas_busy, (void *)addr, GFP_KERNEL);
+ if (err)
+ goto out_unlock;
+
+ mas_store_gfp(&mas_free, NULL, GFP_KERNEL);
+ if (area_size > size) {
+ void *ptr = (void *)(addr + size);
+
+ /*
+ * re-insert remaining free size to free_areas at range
+ * [addr + size, last]
+ */
+ mas_set_range(&mas_free, addr + size, last);
+ err = mas_store_gfp(&mas_free, ptr, GFP_KERNEL);
+ if (err) {
+ mas_store_gfp(&mas_busy, NULL, GFP_KERNEL);
+ goto out_unlock;
+ }
+ }
+ ptr = (void *)addr;
+
+out_unlock:
+ mutex_unlock(mutex);
+ return ptr;
+}
+
+static int execmem_cache_populate(struct execmem_range *range, size_t size)
+{
+ unsigned long vm_flags = VM_ALLOW_HUGE_VMAP;
+ unsigned long start, end;
+ struct vm_struct *vm;
+ size_t alloc_size;
+ int err = -ENOMEM;
+ void *p;
+
+ alloc_size = round_up(size, PMD_SIZE);
+ p = execmem_vmalloc(range, alloc_size, PAGE_KERNEL, vm_flags);
+ if (!p)
+ return err;
+
+ vm = find_vm_area(p);
+ if (!vm)
+ goto err_free_mem;
+
+ /* fill memory with instructions that will trap */
+ execmem_fill_trapping_insns(p, alloc_size, /* writable = */ true);
+
+ start = (unsigned long)p;
+ end = start + alloc_size;
+
+ vunmap_range(start, end);
+
+ err = execmem_set_direct_map_valid(vm, false);
+ if (err)
+ goto err_free_mem;
+
+ err = vmap_pages_range_noflush(start, end, range->pgprot, vm->pages,
+ PMD_SHIFT);
+ if (err)
+ goto err_free_mem;
+
+ err = execmem_cache_add(p, alloc_size);
+ if (err)
+ goto err_free_mem;
+
+ return 0;
+
+err_free_mem:
+ vfree(p);
+ return err;
+}
+
+static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
+{
+ void *p;
+ int err;
+
+ p = __execmem_cache_alloc(range, size);
+ if (p)
+ return p;
+
+ err = execmem_cache_populate(range, size);
+ if (err)
+ return NULL;
+
+ return __execmem_cache_alloc(range, size);
+}
+
+static bool execmem_cache_free(void *ptr)
+{
+ struct maple_tree *busy_areas = &execmem_cache.busy_areas;
+ struct mutex *mutex = &execmem_cache.mutex;
+ unsigned long addr = (unsigned long)ptr;
+ MA_STATE(mas, busy_areas, addr, addr);
+ size_t size;
+ void *area;
+
+ mutex_lock(mutex);
+ area = mas_walk(&mas);
+ if (!area) {
+ mutex_unlock(mutex);
+ return false;
+ }
+ size = mas_range_len(&mas);
+
+ mas_store_gfp(&mas, NULL, GFP_KERNEL);
+ mutex_unlock(mutex);
+
+ execmem_fill_trapping_insns(ptr, size, /* writable = */ false);
+
+ execmem_cache_add(ptr, size);
+
+ schedule_work(&execmem_cache_clean_work);
+
+ return true;
+}
+#else /* CONFIG_ARCH_HAS_EXECMEM_ROX */
+static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
+{
+ return NULL;
+}
+
+static bool execmem_cache_free(void *ptr)
+{
+ return false;
+}
+#endif /* CONFIG_ARCH_HAS_EXECMEM_ROX */
void *execmem_alloc(enum execmem_type type, size_t size)
{
struct execmem_range *range = &execmem_info->ranges[type];
+ bool use_cache = range->flags & EXECMEM_ROX_CACHE;
+ unsigned long vm_flags = VM_FLUSH_RESET_PERMS;
+ pgprot_t pgprot = range->pgprot;
+ void *p;
- return __execmem_alloc(range, size);
+ if (use_cache)
+ p = execmem_cache_alloc(range, size);
+ else
+ p = execmem_vmalloc(range, size, pgprot, vm_flags);
+
+ return kasan_reset_tag(p);
}
void execmem_free(void *ptr)
@@ -67,7 +363,9 @@ void execmem_free(void *ptr)
* supported by vmalloc.
*/
WARN_ON(in_interrupt());
- vfree(ptr);
+
+ if (!execmem_cache_free(ptr))
+ vfree(ptr);
}
void *execmem_update_copy(void *dst, const void *src, size_t size)
@@ -89,6 +387,17 @@ static bool execmem_validate(struct execmem_info *info)
return false;
}
+ if (!IS_ENABLED(CONFIG_ARCH_HAS_EXECMEM_ROX)) {
+ for (int i = EXECMEM_DEFAULT; i < EXECMEM_TYPE_MAX; i++) {
+ r = &info->ranges[i];
+
+ if (r->flags & EXECMEM_ROX_CACHE) {
+ pr_warn_once("ROX cache is not supported\n");
+ r->flags &= ~EXECMEM_ROX_CACHE;
+ }
+ }
+ }
+
return true;
}
diff --git a/mm/internal.h b/mm/internal.h
index 93083bbeeefa..95befbc19852 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1189,6 +1189,7 @@ size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
void __init vmalloc_init(void);
int __must_check vmap_pages_range_noflush(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages, unsigned int page_shift);
+unsigned int get_vm_area_page_order(struct vm_struct *vm);
#else
static inline void vmalloc_init(void)
{
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 86b2344d7461..f340e38716c0 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3007,6 +3007,11 @@ static inline unsigned int vm_area_page_order(struct vm_struct *vm)
#endif
}
+unsigned int get_vm_area_page_order(struct vm_struct *vm)
+{
+ return vm_area_page_order(vm);
+}
+
static inline void set_vm_area_page_order(struct vm_struct *vm, unsigned int order)
{
#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
--
2.43.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v7 8/8] x86/module: enable ROX caches for module text on 64 bit
2024-10-23 16:27 [PATCH v7 0/8] x86/module: use large ROX pages for text allocations Mike Rapoport
` (6 preceding siblings ...)
2024-10-23 16:27 ` [PATCH v7 7/8] execmem: add support for cache of large ROX pages Mike Rapoport
@ 2024-10-23 16:27 ` Mike Rapoport
2025-01-12 18:42 ` [REGRESSION] " Ville Syrjälä
2024-11-18 18:25 ` [PATCH v7 0/8] x86/module: use large ROX pages for text allocations Steven Rostedt
8 siblings, 1 reply; 23+ messages in thread
From: Mike Rapoport @ 2024-10-23 16:27 UTC (permalink / raw)
To: Andrew Morton, Luis Chamberlain
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Enable execmem's cache of PMD_SIZE'ed pages mapped as ROX for module
text allocations on 64 bit.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: kdevops <kdevops@lists.linux.dev>
---
arch/x86/Kconfig | 1 +
arch/x86/mm/init.c | 37 ++++++++++++++++++++++++++++++++++++-
2 files changed, 37 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2852fcd82cbd..ff71d18253ba 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -83,6 +83,7 @@ config X86
select ARCH_HAS_DMA_OPS if GART_IOMMU || XEN
select ARCH_HAS_EARLY_DEBUG if KGDB
select ARCH_HAS_ELF_RANDOMIZE
+ select ARCH_HAS_EXECMEM_ROX if X86_64
select ARCH_HAS_FAST_MULTIPLIER
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index eb503f53c319..c2e4f389f47f 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -1053,18 +1053,53 @@ unsigned long arch_max_swapfile_size(void)
#ifdef CONFIG_EXECMEM
static struct execmem_info execmem_info __ro_after_init;
+#ifdef CONFIG_ARCH_HAS_EXECMEM_ROX
+void execmem_fill_trapping_insns(void *ptr, size_t size, bool writeable)
+{
+ /* fill memory with INT3 instructions */
+ if (writeable)
+ memset(ptr, INT3_INSN_OPCODE, size);
+ else
+ text_poke_set(ptr, INT3_INSN_OPCODE, size);
+}
+#endif
+
struct execmem_info __init *execmem_arch_setup(void)
{
unsigned long start, offset = 0;
+ enum execmem_range_flags flags;
+ pgprot_t pgprot;
if (kaslr_enabled())
offset = get_random_u32_inclusive(1, 1024) * PAGE_SIZE;
start = MODULES_VADDR + offset;
+ if (IS_ENABLED(CONFIG_ARCH_HAS_EXECMEM_ROX)) {
+ pgprot = PAGE_KERNEL_ROX;
+ flags = EXECMEM_KASAN_SHADOW | EXECMEM_ROX_CACHE;
+ } else {
+ pgprot = PAGE_KERNEL;
+ flags = EXECMEM_KASAN_SHADOW;
+ }
+
execmem_info = (struct execmem_info){
.ranges = {
- [EXECMEM_DEFAULT] = {
+ [EXECMEM_MODULE_TEXT] = {
+ .flags = flags,
+ .start = start,
+ .end = MODULES_END,
+ .pgprot = pgprot,
+ .alignment = MODULE_ALIGN,
+ },
+ [EXECMEM_KPROBES ... EXECMEM_BPF] = {
+ .flags = EXECMEM_KASAN_SHADOW,
+ .start = start,
+ .end = MODULES_END,
+ .pgprot = PAGE_KERNEL,
+ .alignment = MODULE_ALIGN,
+ },
+ [EXECMEM_MODULE_DATA] = {
.flags = EXECMEM_KASAN_SHADOW,
.start = start,
.end = MODULES_END,
--
2.43.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v7 6/8] x86/module: prepare module loading for ROX allocations of text
2024-10-23 16:27 ` [PATCH v7 6/8] x86/module: prepare module loading for ROX allocations of text Mike Rapoport
@ 2024-11-04 23:27 ` Nathan Chancellor
2024-11-05 7:02 ` Mike Rapoport
0 siblings, 1 reply; 23+ messages in thread
From: Nathan Chancellor @ 2024-11-04 23:27 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Luis Chamberlain, Andreas Larsson, Andy Lutomirski,
Ard Biesheuvel, Arnd Bergmann, Borislav Petkov, Brian Cain,
Catalin Marinas, Christoph Hellwig, Christophe Leroy, Dave Hansen,
Dinh Nguyen, Geert Uytterhoeven, Guo Ren, Helge Deller,
Huacai Chen, Ingo Molnar, Johannes Berg,
John Paul Adrian Glaubitz, Kent Overstreet, Liam R. Howlett,
Mark Rutland, Masami Hiramatsu, Matt Turner, Max Filippov,
Michael Ellerman, Michal Simek, Oleg Nesterov, Palmer Dabbelt,
Peter Zijlstra, Richard Weinberger, Russell King, Song Liu,
Stafford Horne, Steven Rostedt, Suren Baghdasaryan,
Thomas Bogendoerfer, Thomas Gleixner, Uladzislau Rezki,
Vineet Gupta, Will Deacon, bpf, linux-alpha, linux-arch,
linux-arm-kernel, linux-csky, linux-hexagon, linux-kernel,
linux-m68k, linux-mips, linux-mm, linux-modules, linux-openrisc,
linux-parisc, linux-riscv, linux-sh, linux-snps-arc,
linux-trace-kernel, linux-um, linuxppc-dev, loongarch, sparclinux,
x86
Hi Mike,
On Wed, Oct 23, 2024 at 07:27:09PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>
> When module text memory is allocated with ROX permissions, the
> memory at the actual address where the module will live contains
> invalid instructions, and a writable copy holds the actual module
> code.
>
> Update relocations and alternatives patching to deal with it.
>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Tested-by: kdevops <kdevops@lists.linux.dev>
Hopefully the last time you have to hear from me, as I am now
experiencing issues with only one of my test machines at this point and
it is my only machine that supports IBT, so it seems to point to
something specific with the IBT part of the FineIBT support. I notice
either a boot hang or an almost immediate reboot (triple fault?). I
guess this is how I missed reporting this earlier, as my machine was
falling back to the default distribution kernel after the restart and I
did not notice I was not actually testing a -next kernel.
Checking out the version of this change that is in next-20241104, commit
7ca6ed09db62 ("x86/module: prepare module loading for ROX allocations of
text"), it boots with either 'cfi=off' or 'cfi=kcfi' but it exhibits the
issues noted above with 'cfi=fineibt'. At the immediate parent, commit
b575d981092f ("arch: introduce set_direct_map_valid_noflush()"), all
three combinations boot fine.
$ uname -r; tr ' ' '\n' </proc/cmdline | grep cfi=
6.12.0-rc5-debug-00214-g7ca6ed09db62
cfi=kcfi
6.12.0-rc5-debug-00214-g7ca6ed09db62
cfi=off
6.12.0-rc5-debug-00213-gb575d981092f
cfi=fineibt
6.12.0-rc5-debug-00213-gb575d981092f
cfi=kcfi
6.12.0-rc5-debug-00213-gb575d981092f
cfi=off
I do not think this machine has an accessible serial port and I do not
think IBT virtualization is supported via either KVM or TCG in QEMU, so
I am not sure how to get more information about what is going on here. I
wanted to try reverting these changes on top of next-20241104 but there
was a non-trivial conflict in mm/execmem.c due to some changes on top,
so I just tested in the mm history.
If there is any other information I can provide or patches I can test, I
am more than happy to do so.
Cheers,
Nathan
* Re: [PATCH v7 6/8] x86/module: prepare module loading for ROX allocations of text
2024-11-04 23:27 ` Nathan Chancellor
@ 2024-11-05 7:02 ` Mike Rapoport
2024-11-05 19:04 ` Nathan Chancellor
0 siblings, 1 reply; 23+ messages in thread
From: Mike Rapoport @ 2024-11-05 7:02 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Andrew Morton, Luis Chamberlain, Andreas Larsson, Andy Lutomirski,
Ard Biesheuvel, Arnd Bergmann, Borislav Petkov, Brian Cain,
Catalin Marinas, Christoph Hellwig, Christophe Leroy, Dave Hansen,
Dinh Nguyen, Geert Uytterhoeven, Guo Ren, Helge Deller,
Huacai Chen, Ingo Molnar, Johannes Berg,
John Paul Adrian Glaubitz, Kent Overstreet, Liam R. Howlett,
Mark Rutland, Masami Hiramatsu, Matt Turner, Max Filippov,
Michael Ellerman, Michal Simek, Oleg Nesterov, Palmer Dabbelt,
Peter Zijlstra, Richard Weinberger, Russell King, Song Liu,
Stafford Horne, Steven Rostedt, Suren Baghdasaryan,
Thomas Bogendoerfer, Thomas Gleixner, Uladzislau Rezki,
Vineet Gupta, Will Deacon, bpf, linux-alpha, linux-arch,
linux-arm-kernel, linux-csky, linux-hexagon, linux-kernel,
linux-m68k, linux-mips, linux-mm, linux-modules, linux-openrisc,
linux-parisc, linux-riscv, linux-sh, linux-snps-arc,
linux-trace-kernel, linux-um, linuxppc-dev, loongarch, sparclinux,
x86
Hi Nathan,
On Mon, Nov 04, 2024 at 04:27:41PM -0700, Nathan Chancellor wrote:
> If there is any other information I can provide or patches I can test, I
> am more than happy to do so.
Yes, please :)
There's a silly mistake in cfi_rewrite_endbr() in that commit; the patch
below should fix it. Can you please test?
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 3407efc26528..243843e44e89 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1241,7 +1241,7 @@ static void cfi_rewrite_endbr(s32 *start, s32 *end, struct module *mod)
void *addr = (void *)s + *s;
void *wr_addr = module_writable_address(mod, addr);
- poison_endbr(addr+16, wr_addr, false);
+ poison_endbr(addr + 16, wr_addr + 16, false);
}
}
> Cheers,
> Nathan
--
Sincerely yours,
Mike.
* Re: [PATCH v7 6/8] x86/module: prepare module loading for ROX allocations of text
2024-11-05 7:02 ` Mike Rapoport
@ 2024-11-05 19:04 ` Nathan Chancellor
0 siblings, 0 replies; 23+ messages in thread
From: Nathan Chancellor @ 2024-11-05 19:04 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Luis Chamberlain, Andreas Larsson, Andy Lutomirski,
Ard Biesheuvel, Arnd Bergmann, Borislav Petkov, Brian Cain,
Catalin Marinas, Christoph Hellwig, Christophe Leroy, Dave Hansen,
Dinh Nguyen, Geert Uytterhoeven, Guo Ren, Helge Deller,
Huacai Chen, Ingo Molnar, Johannes Berg,
John Paul Adrian Glaubitz, Kent Overstreet, Liam R. Howlett,
Mark Rutland, Masami Hiramatsu, Matt Turner, Max Filippov,
Michael Ellerman, Michal Simek, Oleg Nesterov, Palmer Dabbelt,
Peter Zijlstra, Richard Weinberger, Russell King, Song Liu,
Stafford Horne, Steven Rostedt, Suren Baghdasaryan,
Thomas Bogendoerfer, Thomas Gleixner, Uladzislau Rezki,
Vineet Gupta, Will Deacon, bpf, linux-alpha, linux-arch,
linux-arm-kernel, linux-csky, linux-hexagon, linux-kernel,
linux-m68k, linux-mips, linux-mm, linux-modules, linux-openrisc,
linux-parisc, linux-riscv, linux-sh, linux-snps-arc,
linux-trace-kernel, linux-um, linuxppc-dev, loongarch, sparclinux,
x86
On Tue, Nov 05, 2024 at 09:02:26AM +0200, Mike Rapoport wrote:
> There's a silly mistake in cfi_rewrite_endbr() in that commit; the patch
> below should fix it. Can you please test?
Yup, that was it! All my machines boot with this diff applied on top of
next-20241105, so with that fixed, I think we are all good here.
Tested-by: Nathan Chancellor <nathan@kernel.org>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 3407efc26528..243843e44e89 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -1241,7 +1241,7 @@ static void cfi_rewrite_endbr(s32 *start, s32 *end, struct module *mod)
> void *addr = (void *)s + *s;
> void *wr_addr = module_writable_address(mod, addr);
>
> - poison_endbr(addr+16, wr_addr, false);
> + poison_endbr(addr + 16, wr_addr + 16, false);
> }
> }
Cheers,
Nathan
* Re: [PATCH v7 0/8] x86/module: use large ROX pages for text allocations
2024-10-23 16:27 [PATCH v7 0/8] x86/module: use large ROX pages for text allocations Mike Rapoport
` (7 preceding siblings ...)
2024-10-23 16:27 ` [PATCH v7 8/8] x86/module: enable ROX caches for module text on 64 bit Mike Rapoport
@ 2024-11-18 18:25 ` Steven Rostedt
2024-11-18 18:40 ` Mike Rapoport
8 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2024-11-18 18:25 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Luis Chamberlain, Andreas Larsson, Andy Lutomirski,
Ard Biesheuvel, Arnd Bergmann, Borislav Petkov, Brian Cain,
Catalin Marinas, Christoph Hellwig, Christophe Leroy, Dave Hansen,
Dinh Nguyen, Geert Uytterhoeven, Guo Ren, Helge Deller,
Huacai Chen, Ingo Molnar, Johannes Berg,
John Paul Adrian Glaubitz, Kent Overstreet, Liam R. Howlett,
Mark Rutland, Masami Hiramatsu, Matt Turner, Max Filippov,
Michael Ellerman, Michal Simek, Oleg Nesterov, Palmer Dabbelt,
Peter Zijlstra, Richard Weinberger, Russell King, Song Liu,
Stafford Horne, Suren Baghdasaryan, Thomas Bogendoerfer,
Thomas Gleixner, Uladzislau Rezki, Vineet Gupta, Will Deacon, bpf,
linux-alpha, linux-arch, linux-arm-kernel, linux-csky,
linux-hexagon, linux-kernel, linux-m68k, linux-mips, linux-mm,
linux-modules, linux-openrisc, linux-parisc, linux-riscv,
linux-sh, linux-snps-arc, linux-trace-kernel, linux-um,
linuxppc-dev, loongarch, sparclinux, x86
On Wed, 23 Oct 2024 19:27:03 +0300
Mike Rapoport <rppt@kernel.org> wrote:
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>
> Hi,
>
> This is an updated version of execmem ROX caches.
>
FYI, I booted a kernel before and after applying these patches with my
change:
https://lore.kernel.org/20241017113105.1edfa943@gandalf.local.home
Before these patches:
# cat /sys/kernel/tracing/dyn_ftrace_total_info
57695 pages:231 groups: 9
ftrace boot update time = 14733459 (ns)
ftrace module total update time = 449016 (ns)
After:
# cat /sys/kernel/tracing/dyn_ftrace_total_info
57708 pages:231 groups: 9
ftrace boot update time = 47195374 (ns)
ftrace module total update time = 592080 (ns)
Which caused boot time to slow down by over 30 ms. That may not seem like
much, but we are very concerned about boot time and are fighting every ms
we can get.
-- Steve
* Re: [PATCH v7 0/8] x86/module: use large ROX pages for text allocations
2024-11-18 18:25 ` [PATCH v7 0/8] x86/module: use large ROX pages for text allocations Steven Rostedt
@ 2024-11-18 18:40 ` Mike Rapoport
0 siblings, 0 replies; 23+ messages in thread
From: Mike Rapoport @ 2024-11-18 18:40 UTC (permalink / raw)
To: Steven Rostedt
Cc: Andrew Morton, Luis Chamberlain, Andreas Larsson, Andy Lutomirski,
Ard Biesheuvel, Arnd Bergmann, Borislav Petkov, Brian Cain,
Catalin Marinas, Christoph Hellwig, Christophe Leroy, Dave Hansen,
Dinh Nguyen, Geert Uytterhoeven, Guo Ren, Helge Deller,
Huacai Chen, Ingo Molnar, Johannes Berg,
John Paul Adrian Glaubitz, Kent Overstreet, Liam R. Howlett,
Mark Rutland, Masami Hiramatsu, Matt Turner, Max Filippov,
Michael Ellerman, Michal Simek, Oleg Nesterov, Palmer Dabbelt,
Peter Zijlstra, Richard Weinberger, Russell King, Song Liu,
Stafford Horne, Suren Baghdasaryan, Thomas Bogendoerfer,
Thomas Gleixner, Uladzislau Rezki, Vineet Gupta, Will Deacon, bpf,
linux-alpha, linux-arch, linux-arm-kernel, linux-csky,
linux-hexagon, linux-kernel, linux-m68k, linux-mips, linux-mm,
linux-modules, linux-openrisc, linux-parisc, linux-riscv,
linux-sh, linux-snps-arc, linux-trace-kernel, linux-um,
linuxppc-dev, loongarch, sparclinux, x86
On Mon, Nov 18, 2024 at 01:25:01PM -0500, Steven Rostedt wrote:
> FYI, I booted a kernel before and after applying these patches with my
> change:
>
> https://lore.kernel.org/20241017113105.1edfa943@gandalf.local.home
>
> Before these patches:
>
> # cat /sys/kernel/tracing/dyn_ftrace_total_info
> 57695 pages:231 groups: 9
> ftrace boot update time = 14733459 (ns)
> ftrace module total update time = 449016 (ns)
>
> After:
>
> # cat /sys/kernel/tracing/dyn_ftrace_total_info
> 57708 pages:231 groups: 9
> ftrace boot update time = 47195374 (ns)
> ftrace module total update time = 592080 (ns)
>
> Which caused boot time to slow down by over 30 ms. That may not seem like
> much, but we are very concerned about boot time and are fighting every ms
> we can get.
Hmm, looks like this change was lost in rebase :/
@Andrew, should I send it as a patch on top of mm-stable?
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 8da0e66ca22d..859902dd06fc 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -111,17 +111,22 @@ static int ftrace_verify_code(unsigned long ip, const char *old_code)
*/
static int __ref
ftrace_modify_code_direct(unsigned long ip, const char *old_code,
- const char *new_code)
+ const char *new_code, struct module *mod)
{
int ret = ftrace_verify_code(ip, old_code);
if (ret)
return ret;
/* replace the text with the new text */
- if (ftrace_poke_late)
+ if (ftrace_poke_late) {
text_poke_queue((void *)ip, new_code, MCOUNT_INSN_SIZE, NULL);
- else
+ } else if (!mod) {
text_poke_early((void *)ip, new_code, MCOUNT_INSN_SIZE);
+ } else {
+ mutex_lock(&text_mutex);
+ text_poke((void *)ip, new_code, MCOUNT_INSN_SIZE);
+ mutex_unlock(&text_mutex);
+ }
return 0;
}
@@ -142,7 +147,7 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long ad
* just modify the code directly.
*/
if (addr == MCOUNT_ADDR)
- return ftrace_modify_code_direct(ip, old, new);
+ return ftrace_modify_code_direct(ip, old, new, mod);
/*
* x86 overrides ftrace_replace_code -- this function will never be used
@@ -161,7 +166,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
new = ftrace_call_replace(ip, addr);
/* Should only be called when module is loaded */
- return ftrace_modify_code_direct(rec->ip, old, new);
+ return ftrace_modify_code_direct(rec->ip, old, new, NULL);
}
/*
> -- Steve
--
Sincerely yours,
Mike.
* [REGRESSION] Re: [PATCH v7 8/8] x86/module: enable ROX caches for module text on 64 bit
2024-10-23 16:27 ` [PATCH v7 8/8] x86/module: enable ROX caches for module text on 64 bit Mike Rapoport
@ 2025-01-12 18:42 ` Ville Syrjälä
2025-01-12 19:07 ` Borislav Petkov
0 siblings, 1 reply; 23+ messages in thread
From: Ville Syrjälä @ 2025-01-12 18:42 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, Luis Chamberlain, Andreas Larsson, Andy Lutomirski,
Ard Biesheuvel, Arnd Bergmann, Borislav Petkov, Brian Cain,
Catalin Marinas, Christoph Hellwig, Christophe Leroy, Dave Hansen,
Dinh Nguyen, Geert Uytterhoeven, Guo Ren, Helge Deller,
Huacai Chen, Ingo Molnar, Johannes Berg,
John Paul Adrian Glaubitz, Kent Overstreet, Liam R. Howlett,
Mark Rutland, Masami Hiramatsu, Matt Turner, Max Filippov,
Michael Ellerman, Michal Simek, Oleg Nesterov, Palmer Dabbelt,
Peter Zijlstra, Richard Weinberger, Russell King, Song Liu,
Stafford Horne, Steven Rostedt, Suren Baghdasaryan,
Thomas Bogendoerfer, Thomas Gleixner, Uladzislau Rezki,
Vineet Gupta, Will Deacon, bpf, linux-alpha, linux-arch,
linux-arm-kernel, linux-csky, linux-hexagon, linux-kernel,
linux-m68k, linux-mips, linux-mm, linux-modules, linux-openrisc,
linux-parisc
On Wed, Oct 23, 2024 at 07:27:11PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>
> Enable execmem's cache of PMD_SIZE'ed pages mapped as ROX for module
> text allocations on 64 bit.
Hi,
This breaks resume from hibernation on my Alderlake laptop.
Fortunately this still reverts cleanly.
pstore managed to catch the following oops (seems to blow up on
bringing up the second ht of the second pcore for whatever reason):
<6>[ 33.993727] PM: hibernation: hibernation entry
<6>[ 34.154410] Filesystems sync: 0.006 seconds
<6>[ 34.154418] Freezing user space processes
<6>[ 34.156019] Freezing user space processes completed (elapsed 0.001 seconds)
<6>[ 34.156026] OOM killer disabled.
<6>[ 34.160470] PM: hibernation: Preallocating image memory
<6>[ 34.330327] PM: hibernation: Allocated 387861 pages for snapshot
<6>[ 34.330328] PM: hibernation: Allocated 1551444 kbytes in 0.16 seconds (9696.52 MB/s)
<6>[ 34.330330] Freezing remaining freezable tasks
<6>[ 34.331499] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
<6>[ 34.351602] printk: Suspending console(s) (use no_console_suspend to debug)
<6>[ 34.836184] ACPI: EC: interrupt blocked
<6>[ 34.858903] ACPI: PM: Preparing to enter system sleep state S4
<6>[ 34.868434] ACPI: EC: event blocked
<6>[ 34.868434] ACPI: EC: EC stopped
<6>[ 34.868435] ACPI: PM: Saving platform NVS memory
<6>[ 34.872538] Disabling non-boot CPUs ...
<6>[ 34.874012] smpboot: CPU 15 is now offline
<6>[ 34.875511] smpboot: CPU 14 is now offline
<6>[ 34.877209] smpboot: CPU 13 is now offline
<6>[ 34.879173] smpboot: CPU 12 is now offline
<6>[ 34.881117] smpboot: CPU 11 is now offline
<6>[ 34.882952] smpboot: CPU 10 is now offline
<6>[ 34.885175] smpboot: CPU 9 is now offline
<6>[ 34.886844] smpboot: CPU 8 is now offline
<6>[ 34.889056] smpboot: CPU 7 is now offline
<6>[ 34.891866] smpboot: CPU 6 is now offline
<6>[ 34.893685] smpboot: CPU 5 is now offline
<6>[ 34.896856] smpboot: CPU 4 is now offline
<6>[ 34.900015] smpboot: CPU 3 is now offline
<6>[ 34.902355] smpboot: CPU 2 is now offline
<6>[ 34.906162] smpboot: CPU 1 is now offline
<6>[ 34.914835] PM: hibernation: Creating image:
<6>[ 35.074645] PM: hibernation: Need to copy 348072 pages
<6>[ 34.915076] ACPI: PM: Restoring platform NVS memory
<6>[ 34.916499] ACPI: EC: EC started
<6>[ 34.918228] Enabling non-boot CPUs ...
<6>[ 34.918248] smpboot: Booting Node 0 Processor 1 APIC 0x1
<6>[ 34.920121] CPU1 is up
<6>[ 34.920132] smpboot: Booting Node 0 Processor 2 APIC 0x8
<6>[ 34.923737] CPU2 is up
<6>[ 34.923746] smpboot: Booting Node 0 Processor 3 APIC 0x9
<1>[ 34.927019] BUG: kernel NULL pointer dereference, address: 0000000000000008
<1>[ 34.927023] #PF: supervisor write access in kernel mode
<1>[ 34.927024] #PF: error_code(0x0002) - not-present page
<6>[ 34.927026] PGD 0 P4D 0
<4>[ 34.927028] Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
<4>[ 34.927030] CPU: 3 UID: 0 PID: 93 Comm: cpuhp/3 Not tainted 6.13.0-rc5+ #406
<4>[ 34.927033] Hardware name: LENOVO 21AJS29M0Q/21AJS29M0Q, BIOS N3MET18W (1.17 ) 10/24/2023
<4>[ 34.927033] RIP: 0010:__rmqueue_pcplist+0x1c9/0x6f0
<4>[ 34.927039] Code: 00 00 48 39 f0 0f 84 7c 01 00 00 49 89 c6 49 83 ee 08 0f 84 6f 01 00 00 48 8b 78 08 41 b8 01 00 00 00 89 ce 4c 8b 08 41 d3 e0 <49> 89 79 08 4c 89 0f 48 bf 00 01 00 00 00 00 ad de 48 89 38 48 83
<4>[ 34.927041] RSP: 0000:ffffc90000457840 EFLAGS: 00010082
<4>[ 34.927042] RAX: ffffea000427af48 RBX: ffff88844e2f0720 RCX: 0000000000000000
<4>[ 34.927043] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff827f0f00
<4>[ 34.927043] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
<4>[ 34.927044] R10: ffff88844e2f0700 R11: 0000000000000001 R12: 0000000000000000
<4>[ 34.927044] R13: ffffffff827f0e40 R14: ffffea000427af40 R15: ffffffffffffffff
<4>[ 34.927045] FS: 0000000000000000(0000) GS:ffff88844e2c0000(0000) knlGS:0000000000000000
<4>[ 34.927046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 34.927046] CR2: 0000000000000008 CR3: 000000000402e001 CR4: 0000000000372ef0
<4>[ 34.927047] Call Trace:
<4>[ 34.927048] <TASK>
<4>[ 34.927049] ? __die+0x1a/0x60
<4>[ 34.927053] ? page_fault_oops+0x157/0x440
<4>[ 34.927055] ? exc_page_fault+0x444/0x710
<4>[ 34.927058] ? asm_exc_page_fault+0x22/0x30
<4>[ 34.927060] ? __rmqueue_pcplist+0x1c9/0x6f0
<4>[ 34.927061] get_page_from_freelist+0x20f/0xf00
<4>[ 34.927063] __alloc_pages_noprof+0x111/0x240
<4>[ 34.927065] allocate_slab+0x24f/0x3c0
<4>[ 34.927066] ? get_page_from_freelist+0x21a/0xf00
<4>[ 34.927067] ___slab_alloc+0x42a/0xb80
<4>[ 34.927069] ? __kernfs_new_node.isra.0+0x59/0x1e0
<4>[ 34.927071] ? get_nohz_timer_target+0x29/0x140
<4>[ 34.927073] ? timerqueue_add+0x62/0xb0
<4>[ 34.927074] kmem_cache_alloc_noprof+0x17a/0x220
<4>[ 34.927076] __kernfs_new_node.isra.0+0x59/0x1e0
<4>[ 34.927077] ? dl_server_stop+0x26/0x30
<4>[ 34.927079] ? dequeue_entities+0x33b/0x5f0
<4>[ 34.927080] kernfs_new_node+0x3b/0x60
<4>[ 34.927081] kernfs_create_dir_ns+0x22/0x80
<4>[ 34.927083] sysfs_create_dir_ns+0x6b/0xd0
<4>[ 34.927085] kobject_add_internal+0x9e/0x270
<4>[ 34.927087] kobject_add+0x79/0xe0
<4>[ 34.927088] ? __kmalloc_cache_noprof+0x105/0x240
<4>[ 34.927090] device_add+0xb7/0x7d0
<4>[ 34.927092] ? complete_all+0x1b/0x80
<4>[ 34.927095] cpu_device_create+0xea/0x100
<4>[ 34.927097] ? __pfx_smpboot_thread_fn+0x10/0x10
<4>[ 34.927099] ? __set_cpus_allowed_ptr+0x47/0x90
<4>[ 34.927100] ? detect_cache_attributes+0xd2/0x330
<4>[ 34.927102] ? __pfx_cacheinfo_cpu_online+0x10/0x10
<4>[ 34.927104] cacheinfo_cpu_online+0x94/0x270
<4>[ 34.927105] ? __pfx_cacheinfo_cpu_online+0x10/0x10
<4>[ 34.927107] ? __pfx_smpboot_thread_fn+0x10/0x10
<4>[ 34.927108] cpuhp_invoke_callback+0x10e/0x4b0
<4>[ 34.927110] ? __pfx_smpboot_thread_fn+0x10/0x10
<4>[ 34.927111] cpuhp_thread_fun+0xcf/0x150
<4>[ 34.927112] smpboot_thread_fn+0xd1/0x1c0
<4>[ 34.927113] kthread+0xc6/0x100
<4>[ 34.927115] ? __pfx_kthread+0x10/0x10
<4>[ 34.927115] ret_from_fork+0x28/0x40
<4>[ 34.927117] ? __pfx_kthread+0x10/0x10
<4>[ 34.927118] ret_from_fork_asm+0x1a/0x30
<4>[ 34.927120] </TASK>
<4>[ 34.927120] Modules linked in: fuse ctr ccm sch_fq_codel cmac xt_multiport xt_tcpudp xt_state iptable_filter xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 ip_tables x_tables bnep btusb uvcvideo btrtl btintel videobuf2_vmalloc btbcm videobuf2_memops btmtk uvc videobuf2_v4l2 bluetooth videodev videobuf2_common ecdh_generic mc ecc binfmt_misc snd_hda_codec_hdmi mousedev snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_ctl_led iwlmvm mac80211 libarc4 i915 snd_hda_intel snd_intel_dspcfg i2c_algo_bit snd_hda_codec drm_buddy xhci_pci intel_tcc_cooling mei_hdcp snd_hwdep ttm mei_pxp mei_wdt processor_thermal_device_pci snd_hda_core x86_pkg_temp_thermal processor_thermal_device xhci_hcd thinkpad_acpi intel_powerclamp 8250_pci processor_thermal_wt_hint drm_display_helper snd_pcm processor_thermal_rfim i2c_hid_acpi coretemp 8250 processor_thermal_rapl nvram i2c_hid iwlwifi kvm_intel drm_kms_helper usbcore intel_rapl_msr 8250_base iTCO_wdt intel_rapl_common platform_profile hid snd_timer
<4>[ 34.927180] intel_gtt mei_me serial_mctrl_gpio led_class processor_thermal_wt_req cfg80211 kvm thunderbolt e1000e snd efi_pstore intel_lpss_pci serial_base usb_common agpgart processor_thermal_power_floor spi_intel_pci i2c_i801 intel_lpss mei soundcore processor_thermal_mbox drm rfkill pstore spi_intel mfd_core tpm_tis i2c_smbus psmouse tpm_tis_core zlib_deflate pcspkr evdev tpm think_lmi video intel_pmc_core firmware_attributes_class intel_hid sparse_keymap int3403_thermal intel_vsec pinctrl_tigerlake int3400_thermal wmi int340x_thermal_zone pmt_telemetry i2c_core backlight pinctrl_intel pmt_class acpi_thermal_rel efivarfs
<4>[ 34.927209] CR2: 0000000000000008
<4>[ 34.927210] ---[ end trace 0000000000000000 ]---
--
Ville Syrjälä
Intel
* Re: [REGRESSION] Re: [PATCH v7 8/8] x86/module: enable ROX caches for module text on 64 bit
2025-01-12 18:42 ` [REGRESSION] " Ville Syrjälä
@ 2025-01-12 19:07 ` Borislav Petkov
2025-01-13 11:11 ` Peter Zijlstra
2025-01-13 15:45 ` [REGRESSION] Re: [PATCH v7 8/8] x86/module: enable ROX caches for module text on 64 bit Ville Syrjälä
0 siblings, 2 replies; 23+ messages in thread
From: Borislav Petkov @ 2025-01-12 19:07 UTC (permalink / raw)
To: Ville Syrjälä
Cc: Mike Rapoport, Andrew Morton, Luis Chamberlain, Andreas Larsson,
Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Brian Cain,
Catalin Marinas, Christoph Hellwig, Christophe Leroy, Dave Hansen,
Dinh Nguyen, Geert Uytterhoeven, Guo Ren, Helge Deller,
Huacai Chen, Ingo Molnar, Johannes Berg,
John Paul Adrian Glaubitz, Kent Overstreet, Liam R. Howlett,
Mark Rutland, Masami Hiramatsu, Matt Turner, Max Filippov,
Michael Ellerman, Michal Simek, Oleg Nesterov, Palmer Dabbelt,
Peter Zijlstra, Richard Weinberger, Russell King, Song Liu,
Stafford Horne, Steven Rostedt, Suren Baghdasaryan,
Thomas Bogendoerfer, Thomas Gleixner, Uladzislau Rezki,
Vineet Gupta, Will Deacon, bpf, linux-alpha, linux-arch,
linux-arm-kernel, linux-csky, linux-hexagon, linux-kernel,
linux-m68k, linux-mips, linux-mm, linux-modules, linux-openrisc,
linux-parisc
On Sun, Jan 12, 2025 at 08:42:05PM +0200, Ville Syrjälä wrote:
> On Wed, Oct 23, 2024 at 07:27:11PM +0300, Mike Rapoport wrote:
> > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> >
> > Enable execmem's cache of PMD_SIZE'ed pages mapped as ROX for module
> > text allocations on 64 bit.
>
> Hi,
>
> This breaks resume from hibernation on my Alderlake laptop.
>
> Fortunately this still reverts cleanly.
Does that hunk in the mail here fix it?
https://lore.kernel.org/all/Z4DwPkcYyZ-tDKwY@kernel.org/
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [REGRESSION] Re: [PATCH v7 8/8] x86/module: enable ROX caches for module text on 64 bit
2025-01-12 19:07 ` Borislav Petkov
@ 2025-01-13 11:11 ` Peter Zijlstra
2025-01-13 11:29 ` [PATCH] x86: Disable EXECMEM_ROX support Peter Zijlstra
2025-01-13 15:45 ` [REGRESSION] Re: [PATCH v7 8/8] x86/module: enable ROX caches for module text on 64 bit Ville Syrjälä
1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2025-01-13 11:11 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ville Syrjälä, Mike Rapoport, Andrew Morton,
Luis Chamberlain, Andreas Larsson, Andy Lutomirski,
Ard Biesheuvel, Arnd Bergmann, Brian Cain, Catalin Marinas,
Christoph Hellwig, Christophe Leroy, Dave Hansen, Dinh Nguyen,
Geert Uytterhoeven, Guo Ren, Helge Deller, Huacai Chen,
Ingo Molnar, Johannes Berg, John Paul Adrian Glaubitz,
Kent Overstreet, Liam R. Howlett, Mark Rutland, Masami Hiramatsu,
Matt Turner, Max Filippov, Michael Ellerman, Michal Simek,
Oleg Nesterov, Palmer Dabbelt, Richard Weinberger, Russell King,
Song Liu, Stafford Horne, Steven Rostedt, Suren Baghdasaryan,
Thomas Bogendoerfer, Thomas Gleixner, Uladzislau Rezki,
Vineet Gupta, Will Deacon, bpf, linux-alpha, linux-arch,
linux-arm-kernel, linux-csky, linux-hexagon, linux-kernel,
linux-m68k, linux-mips, linux-mm, linux-modules, linux-openrisc,
linux-parisc
On Sun, Jan 12, 2025 at 08:07:55PM +0100, Borislav Petkov wrote:
> On Sun, Jan 12, 2025 at 08:42:05PM +0200, Ville Syrjälä wrote:
> > On Wed, Oct 23, 2024 at 07:27:11PM +0300, Mike Rapoport wrote:
> > > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > >
> > > Enable execmem's cache of PMD_SIZE'ed pages mapped as ROX for module
> > > text allocations on 64 bit.
> >
> > Hi,
> >
> > This breaks resume from hibernation on my Alderlake laptop.
> >
> > Fortunately this still reverts cleanly.
>
> Does that hunk in the mail here fix it?
>
> https://lore.kernel.org/all/Z4DwPkcYyZ-tDKwY@kernel.org/
There's definitely breakage with that module_writable_address()
nonsense in alternative.c that will not be fixed by that patch.
The very simplest thing at this point is to remove:
select ARCH_HAS_EXECMEM_ROX if X86_64
and try again next cycle.
* [PATCH] x86: Disable EXECMEM_ROX support
2025-01-13 11:11 ` Peter Zijlstra
@ 2025-01-13 11:29 ` Peter Zijlstra
2025-01-13 11:51 ` Borislav Petkov
2025-01-13 15:47 ` Ville Syrjälä
0 siblings, 2 replies; 23+ messages in thread
From: Peter Zijlstra @ 2025-01-13 11:29 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ville Syrjälä, Mike Rapoport, Andrew Morton,
Luis Chamberlain, Andreas Larsson, Andy Lutomirski,
Ard Biesheuvel, Arnd Bergmann, Brian Cain, Catalin Marinas,
Christoph Hellwig, Christophe Leroy, Dave Hansen, Dinh Nguyen,
Geert Uytterhoeven, Guo Ren, Helge Deller, Huacai Chen,
Ingo Molnar, Johannes Berg, John Paul Adrian Glaubitz,
Kent Overstreet, Liam R. Howlett, Mark Rutland, Masami Hiramatsu,
Matt Turner, Max Filippov, Michael Ellerman, Michal Simek,
Oleg Nesterov, Palmer Dabbelt, Richard Weinberger, Russell King,
Song Liu, Stafford Horne, Steven Rostedt, Suren Baghdasaryan,
Thomas Bogendoerfer, Thomas Gleixner, Uladzislau Rezki,
Vineet Gupta, Will Deacon, bpf, linux-alpha, linux-arch,
linux-arm-kernel, linux-csky, linux-hexagon, linux-kernel,
linux-m68k, linux-mips, linux-mm, linux-modules, linux-openrisc,
linux-parisc
On Mon, Jan 13, 2025 at 12:11:16PM +0100, Peter Zijlstra wrote:
> There's definitely breakage with that module_writable_address()
> nonsense in alternative.c that will not be fixed by that patch.
>
> The very simplest thing at this point is to remove:
>
> select ARCH_HAS_EXECMEM_ROX if X86_64
>
> and try again next cycle.
Boris asked I send it as a proper patch, so here goes. Perhaps next time
let x86 merge x86 code :/
---
Subject: x86: Disable EXECMEM_ROX support
The whole module_writable_address() nonsense made a giant mess of
alternative.c, not to mention it still contains bugs -- notably some of the CFI
variants crash and burn.
Mike has been working on patches to clean all this up again, but given the
current state of things, this stuff just isn't ready.
Disable for now, let's try again next cycle.
Fixes: 5185e7f9f3bd ("x86/module: enable ROX caches for module text on 64 bit")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/Kconfig | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9d7bd0ae48c4..ef6cfea9df73 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -83,7 +83,6 @@ config X86
select ARCH_HAS_DMA_OPS if GART_IOMMU || XEN
select ARCH_HAS_EARLY_DEBUG if KGDB
select ARCH_HAS_ELF_RANDOMIZE
- select ARCH_HAS_EXECMEM_ROX if X86_64
select ARCH_HAS_FAST_MULTIPLIER
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
* Re: [PATCH] x86: Disable EXECMEM_ROX support
2025-01-13 11:29 ` [PATCH] x86: Disable EXECMEM_ROX support Peter Zijlstra
@ 2025-01-13 11:51 ` Borislav Petkov
2025-01-13 15:47 ` Ville Syrjälä
1 sibling, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2025-01-13 11:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ville Syrjälä, Mike Rapoport, Andrew Morton,
Luis Chamberlain, Andreas Larsson, Andy Lutomirski,
Ard Biesheuvel, Arnd Bergmann, Brian Cain, Catalin Marinas,
Christoph Hellwig, Christophe Leroy, Dave Hansen, Dinh Nguyen,
Geert Uytterhoeven, Guo Ren, Helge Deller, Huacai Chen,
Ingo Molnar, Johannes Berg, John Paul Adrian Glaubitz,
Kent Overstreet, Liam R. Howlett, Mark Rutland, Masami Hiramatsu,
Matt Turner, Max Filippov, Michael Ellerman, Michal Simek,
Oleg Nesterov, Palmer Dabbelt, Richard Weinberger, Russell King,
Song Liu, Stafford Horne, Steven Rostedt, Suren Baghdasaryan,
Thomas Bogendoerfer, Thomas Gleixner, Uladzislau Rezki,
Vineet Gupta, Will Deacon, bpf, linux-alpha, linux-arch,
linux-arm-kernel, linux-csky, linux-hexagon, linux-kernel,
linux-m68k, linux-mips, linux-mm, linux-modules, linux-openrisc,
linux-parisc
On Mon, Jan 13, 2025 at 12:29:34PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 13, 2025 at 12:11:16PM +0100, Peter Zijlstra wrote:
>
> > There's definitely breakage with that module_writable_address()
> > nonsense in alternative.c that will not be fixed by that patch.
> >
> > The very simplest thing at this point is to remove:
> >
> > select ARCH_HAS_EXECMEM_ROX if X86_64
> >
> > and try again next cycle.
>
> Boris asked I send it as a proper patch, so here goes. Perhaps next time
> let x86 merge x86 code :/
I just love how this went in without a single x86 maintainer Ack, it broke
a bunch of things and then it is still there instead of getting reverted.
Let's not do this again please.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [REGRESSION] Re: [PATCH v7 8/8] x86/module: enable ROX caches for module text on 64 bit
2025-01-12 19:07 ` Borislav Petkov
2025-01-13 11:11 ` Peter Zijlstra
@ 2025-01-13 15:45 ` Ville Syrjälä
1 sibling, 0 replies; 23+ messages in thread
From: Ville Syrjälä @ 2025-01-13 15:45 UTC (permalink / raw)
To: Borislav Petkov
Cc: Mike Rapoport, Andrew Morton, Luis Chamberlain, Andreas Larsson,
Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Brian Cain,
Catalin Marinas, Christoph Hellwig, Christophe Leroy, Dave Hansen,
Dinh Nguyen, Geert Uytterhoeven, Guo Ren, Helge Deller,
Huacai Chen, Ingo Molnar, Johannes Berg,
John Paul Adrian Glaubitz, Kent Overstreet, Liam R. Howlett,
Mark Rutland, Masami Hiramatsu, Matt Turner, Max Filippov,
Michael Ellerman, Michal Simek, Oleg Nesterov, Palmer Dabbelt,
Peter Zijlstra, Richard Weinberger, Russell King, Song Liu,
Stafford Horne, Steven Rostedt, Suren Baghdasaryan,
Thomas Bogendoerfer, Thomas Gleixner, Uladzislau Rezki,
Vineet Gupta, Will Deacon, bpf, linux-alpha, linux-arch,
linux-arm-kernel, linux-csky, linux-hexagon, linux-kernel,
linux-m68k, linux-mips, linux-mm, linux-modules, linux-openrisc,
linux-parisc
On Sun, Jan 12, 2025 at 08:07:55PM +0100, Borislav Petkov wrote:
> On Sun, Jan 12, 2025 at 08:42:05PM +0200, Ville Syrjälä wrote:
> > On Wed, Oct 23, 2024 at 07:27:11PM +0300, Mike Rapoport wrote:
> > > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > >
> > > Enable execmem's cache of PMD_SIZE'ed pages mapped as ROX for module
> > > text allocations on 64 bit.
> >
> > Hi,
> >
> > This breaks resume from hibernation on my Alderlake laptop.
> >
> > Fortunately this still reverts cleanly.
>
> Does that hunk in the mail here fix it?
>
> https://lore.kernel.org/all/Z4DwPkcYyZ-tDKwY@kernel.org/
Still blows up with that one.
--
Ville Syrjälä
Intel
* Re: [PATCH] x86: Disable EXECMEM_ROX support
2025-01-13 11:29 ` [PATCH] x86: Disable EXECMEM_ROX support Peter Zijlstra
2025-01-13 11:51 ` Borislav Petkov
@ 2025-01-13 15:47 ` Ville Syrjälä
1 sibling, 0 replies; 23+ messages in thread
From: Ville Syrjälä @ 2025-01-13 15:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Borislav Petkov, Mike Rapoport, Andrew Morton, Luis Chamberlain,
Andreas Larsson, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
Brian Cain, Catalin Marinas, Christoph Hellwig, Christophe Leroy,
Dave Hansen, Dinh Nguyen, Geert Uytterhoeven, Guo Ren,
Helge Deller, Huacai Chen, Ingo Molnar, Johannes Berg,
John Paul Adrian Glaubitz, Kent Overstreet, Liam R. Howlett,
Mark Rutland, Masami Hiramatsu, Matt Turner, Max Filippov,
Michael Ellerman, Michal Simek, Oleg Nesterov, Palmer Dabbelt,
Richard Weinberger, Russell King, Song Liu, Stafford Horne,
Steven Rostedt, Suren Baghdasaryan, Thomas Bogendoerfer,
Thomas Gleixner, Uladzislau Rezki, Vineet Gupta, Will Deacon, bpf,
linux-alpha, linux-arch, linux-arm-kernel, linux-csky,
linux-hexagon, linux-kernel, linux-m68k, linux-mips, linux-mm,
linux-modules, linux-openrisc, linux-parisc
On Mon, Jan 13, 2025 at 12:29:34PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 13, 2025 at 12:11:16PM +0100, Peter Zijlstra wrote:
>
> > There's definitely breakage with that module_writable_address()
> > nonsense in alternative.c that will not be fixed by that patch.
> >
> > The very simplest thing at this point is to remove:
> >
> > select ARCH_HAS_EXECMEM_ROX if X86_64
> >
> > and try again next cycle.
>
> Boris asked I send it as a proper patch, so here goes. Perhaps next time
> let x86 merge x86 code :/
>
> ---
> Subject: x86: Disable EXECMEM_ROX support
>
> The whole module_writable_address() nonsense made a giant mess of
> alternative.c, not to mention it still contains bugs -- notably some of the CFI
> variants crash and burn.
>
> Mike has been working on patches to clean all this up again, but given the
> current state of things, this stuff just isn't ready.
>
> Disable for now, let's try again next cycle.
>
> Fixes: 5185e7f9f3bd ("x86/module: enable ROX caches for module text on 64 bit")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> arch/x86/Kconfig | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 9d7bd0ae48c4..ef6cfea9df73 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -83,7 +83,6 @@ config X86
> select ARCH_HAS_DMA_OPS if GART_IOMMU || XEN
> select ARCH_HAS_EARLY_DEBUG if KGDB
> select ARCH_HAS_ELF_RANDOMIZE
> - select ARCH_HAS_EXECMEM_ROX if X86_64
> select ARCH_HAS_FAST_MULTIPLIER
> select ARCH_HAS_FORTIFY_SOURCE
> select ARCH_HAS_GCOV_PROFILE_ALL
This one works for my hibernate woes.
In case you want it:
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
--
Ville Syrjälä
Intel
* Re: [PATCH v7 7/8] execmem: add support for cache of large ROX pages
2024-10-23 16:27 ` [PATCH v7 7/8] execmem: add support for cache of large ROX pages Mike Rapoport
@ 2025-02-27 11:13 ` Ryan Roberts
2025-02-28 13:55 ` Mike Rapoport
0 siblings, 1 reply; 23+ messages in thread
From: Ryan Roberts @ 2025-02-27 11:13 UTC (permalink / raw)
To: Mike Rapoport, Andrew Morton, Luis Chamberlain, Dev Jain
Cc: Andreas Larsson, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
Borislav Petkov, Brian Cain, Catalin Marinas, Christoph Hellwig,
Christophe Leroy, Dave Hansen, Dinh Nguyen, Geert Uytterhoeven,
Guo Ren, Helge Deller, Huacai Chen, Ingo Molnar, Johannes Berg,
John Paul Adrian Glaubitz, Kent Overstreet, Liam R. Howlett,
Mark Rutland, Masami Hiramatsu, Matt Turner, Max Filippov,
Michael Ellerman, Michal Simek, Oleg Nesterov, Palmer Dabbelt,
Peter Zijlstra, Richard Weinberger, Russell King, Song Liu,
Stafford Horne, Steven Rostedt, Suren Baghdasaryan,
Thomas Bogendoerfer, Thomas Gleixner, Uladzislau Rezki,
Vineet Gupta, Will Deacon, bpf, linux-alpha, linux-arch,
linux-arm-kernel, linux-csky, linux-hexagon, linux-kernel,
linux-m68k, linux-mips, linux-mm, linux-modules, linux-openrisc,
linux-parisc, linux-riscv, linux-sh, linux-snps-arc,
linux-trace-kernel, linux-um, linuxppc-dev, loongarch, sparclinux,
x86
Hi Mike,
Drive by review comments below...
On 23/10/2024 17:27, Mike Rapoport wrote:
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>
> Using large pages to map text areas reduces iTLB pressure and improves
> performance.
>
> Extend execmem_alloc() with an ability to use huge pages with ROX
> permissions as a cache for smaller allocations.
>
> To populate the cache, a writable large page is allocated from vmalloc with
> VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
> ROX.
>
> The direct map alias of that large page is excluded from the direct map.
>
> Portions of that large page are handed out to execmem_alloc() callers
> without any changes to the permissions.
>
> When the memory is freed with execmem_free() it is invalidated again so
> that it won't contain stale instructions.
>
> An architecture has to implement the execmem_fill_trapping_insns()
> callback and select the ARCH_HAS_EXECMEM_ROX configuration option to be
> able to use the ROX cache.
>
> The cache is enabled on a per-range basis when an architecture sets the
> EXECMEM_ROX_CACHE flag in the definition of an execmem_range.
>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
> Tested-by: kdevops <kdevops@lists.linux.dev>
> ---
[...]
> +
> +static int execmem_cache_populate(struct execmem_range *range, size_t size)
> +{
> + unsigned long vm_flags = VM_ALLOW_HUGE_VMAP;
> + unsigned long start, end;
> + struct vm_struct *vm;
> + size_t alloc_size;
> + int err = -ENOMEM;
> + void *p;
> +
> + alloc_size = round_up(size, PMD_SIZE);
> + p = execmem_vmalloc(range, alloc_size, PAGE_KERNEL, vm_flags);
Shouldn't this be passing PAGE_KERNEL_ROX? Otherwise I don't see how the
allocated memory is ROX? I don't see any call below where you change the permission.
Given the range has the pgprot in it, you could just drop passing the pgprot
explicitly here and have execmem_vmalloc() use range->pgprot directly?
Thanks,
Ryan
> + if (!p)
> + return err;
> +
> + vm = find_vm_area(p);
> + if (!vm)
> + goto err_free_mem;
> +
> + /* fill memory with instructions that will trap */
> + execmem_fill_trapping_insns(p, alloc_size, /* writable = */ true);
> +
> + start = (unsigned long)p;
> + end = start + alloc_size;
> +
> + vunmap_range(start, end);
> +
> + err = execmem_set_direct_map_valid(vm, false);
> + if (err)
> + goto err_free_mem;
> +
> + err = vmap_pages_range_noflush(start, end, range->pgprot, vm->pages,
> + PMD_SHIFT);
> + if (err)
> + goto err_free_mem;
> +
> + err = execmem_cache_add(p, alloc_size);
> + if (err)
> + goto err_free_mem;
> +
> + return 0;
> +
> +err_free_mem:
> + vfree(p);
> + return err;
> +}
[...]
* Re: [PATCH v7 7/8] execmem: add support for cache of large ROX pages
2025-02-27 11:13 ` Ryan Roberts
@ 2025-02-28 13:55 ` Mike Rapoport
0 siblings, 0 replies; 23+ messages in thread
From: Mike Rapoport @ 2025-02-28 13:55 UTC (permalink / raw)
To: Ryan Roberts
Cc: Andrew Morton, Luis Chamberlain, Dev Jain, Andreas Larsson,
Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Borislav Petkov,
Brian Cain, Catalin Marinas, Christoph Hellwig, Christophe Leroy,
Dave Hansen, Dinh Nguyen, Geert Uytterhoeven, Guo Ren,
Helge Deller, Huacai Chen, Ingo Molnar, Johannes Berg,
John Paul Adrian Glaubitz, Kent Overstreet, Liam R. Howlett,
Mark Rutland, Masami Hiramatsu, Matt Turner, Max Filippov,
Michael Ellerman, Michal Simek, Oleg Nesterov, Palmer Dabbelt,
Peter Zijlstra, Richard Weinberger, Russell King, Song Liu,
Stafford Horne, Steven Rostedt, Suren Baghdasaryan,
Thomas Bogendoerfer, Thomas Gleixner, Uladzislau Rezki,
Vineet Gupta, Will Deacon, bpf, linux-alpha, linux-arch,
linux-arm-kernel, linux-csky, linux-hexagon, linux-kernel,
linux-m68k, linux-mips, linux-mm, linux-modules, linux-openrisc,
linux-parisc, linux-riscv, linux-sh, linux-snps-arc,
linux-trace-kernel, linux-um, linuxppc-dev, loongarch, sparclinux,
x86
Hi Ryan,
On Thu, Feb 27, 2025 at 11:13:29AM +0000, Ryan Roberts wrote:
> Hi Mike,
>
> Drive by review comments below...
>
>
> On 23/10/2024 17:27, Mike Rapoport wrote:
> > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> >
> > Using large pages to map text areas reduces iTLB pressure and improves
> > performance.
> >
> > Extend execmem_alloc() with an ability to use huge pages with ROX
> > permissions as a cache for smaller allocations.
> >
> > To populate the cache, a writable large page is allocated from vmalloc with
> > VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
> > ROX.
> >
> > The direct map alias of that large page is excluded from the direct map.
> >
> > Portions of that large page are handed out to execmem_alloc() callers
> > without any changes to the permissions.
> >
> > When the memory is freed with execmem_free() it is invalidated again so
> > that it won't contain stale instructions.
> >
> > An architecture has to implement the execmem_fill_trapping_insns()
> > callback and select the ARCH_HAS_EXECMEM_ROX configuration option to be
> > able to use the ROX cache.
> >
> > The cache is enabled on a per-range basis when an architecture sets the
> > EXECMEM_ROX_CACHE flag in the definition of an execmem_range.
> >
> > Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> > Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
> > Tested-by: kdevops <kdevops@lists.linux.dev>
> > ---
>
> [...]
>
> > +
> > +static int execmem_cache_populate(struct execmem_range *range, size_t size)
> > +{
> > + unsigned long vm_flags = VM_ALLOW_HUGE_VMAP;
> > + unsigned long start, end;
> > + struct vm_struct *vm;
> > + size_t alloc_size;
> > + int err = -ENOMEM;
> > + void *p;
> > +
> > + alloc_size = round_up(size, PMD_SIZE);
> > + p = execmem_vmalloc(range, alloc_size, PAGE_KERNEL, vm_flags);
>
> Shouldn't this be passing PAGE_KERNEL_ROX? Otherwise I don't see how the
> allocated memory is ROX? I don't see any call below where you change the permission.
The memory is allocated RW, filled with invalid instructions, unmapped in
vmalloc space, removed from the direct map and then mapped as ROX in
vmalloc address space.
> Given the range has the pgprot in it, you could just drop passing the pgprot
> explicitly here and have execmem_vmalloc() use range->pgprot directly?
Here range->pgprot and the prot passed to vmalloc are different.
> Thanks,
> Ryan
>
> > + if (!p)
> > + return err;
> > +
> > + vm = find_vm_area(p);
> > + if (!vm)
> > + goto err_free_mem;
> > +
> > + /* fill memory with instructions that will trap */
> > + execmem_fill_trapping_insns(p, alloc_size, /* writable = */ true);
> > +
> > + start = (unsigned long)p;
> > + end = start + alloc_size;
> > +
> > + vunmap_range(start, end);
> > +
> > + err = execmem_set_direct_map_valid(vm, false);
> > + if (err)
> > + goto err_free_mem;
> > +
> > + err = vmap_pages_range_noflush(start, end, range->pgprot, vm->pages,
> > + PMD_SHIFT);
> > + if (err)
> > + goto err_free_mem;
> > +
> > + err = execmem_cache_add(p, alloc_size);
> > + if (err)
> > + goto err_free_mem;
> > +
> > + return 0;
> > +
> > +err_free_mem:
> > + vfree(p);
> > + return err;
> > +}
>
> [...]
>
--
Sincerely yours,
Mike.
end of thread, other threads:[~2025-02-28 13:56 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-23 16:27 [PATCH v7 0/8] x86/module: use large ROX pages for text allocations Mike Rapoport
2024-10-23 16:27 ` [PATCH v7 1/8] mm: vmalloc: group declarations depending on CONFIG_MMU together Mike Rapoport
2024-10-23 16:27 ` [PATCH v7 2/8] mm: vmalloc: don't account for number of nodes for HUGE_VMAP allocations Mike Rapoport
2024-10-23 16:27 ` [PATCH v7 3/8] asm-generic: introduce text-patching.h Mike Rapoport
2024-10-23 16:27 ` [PATCH v7 4/8] module: prepare to handle ROX allocations for text Mike Rapoport
2024-10-23 16:27 ` [PATCH v7 5/8] arch: introduce set_direct_map_valid_noflush() Mike Rapoport
2024-10-23 16:27 ` [PATCH v7 6/8] x86/module: prepare module loading for ROX allocations of text Mike Rapoport
2024-11-04 23:27 ` Nathan Chancellor
2024-11-05 7:02 ` Mike Rapoport
2024-11-05 19:04 ` Nathan Chancellor
2024-10-23 16:27 ` [PATCH v7 7/8] execmem: add support for cache of large ROX pages Mike Rapoport
2025-02-27 11:13 ` Ryan Roberts
2025-02-28 13:55 ` Mike Rapoport
2024-10-23 16:27 ` [PATCH v7 8/8] x86/module: enable ROX caches for module text on 64 bit Mike Rapoport
2025-01-12 18:42 ` [REGRESSION] " Ville Syrjälä
2025-01-12 19:07 ` Borislav Petkov
2025-01-13 11:11 ` Peter Zijlstra
2025-01-13 11:29 ` [PATCH] x86: Disable EXECMEM_ROX support Peter Zijlstra
2025-01-13 11:51 ` Borislav Petkov
2025-01-13 15:47 ` Ville Syrjälä
2025-01-13 15:45 ` [REGRESSION] Re: [PATCH v7 8/8] x86/module: enable ROX caches for module text on 64 bit Ville Syrjälä
2024-11-18 18:25 ` [PATCH v7 0/8] x86/module: use large ROX pages for text allocations Steven Rostedt
2024-11-18 18:40 ` Mike Rapoport