* [PATCH v4 02/63] vmlinux.lds: Unify TEXT_MAIN, DATA_MAIN, and related macros
From: Josh Poimboeuf @ 2025-09-17 16:03 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Petr Mladek, Miroslav Benes, Joe Lawrence,
live-patching, Song Liu, laokz, Jiri Kosina,
Marcos Paulo de Souza, Weinan Liu, Fazla Mehrab, Chen Zhongjin,
Puranjay Mohan, Dylan Hatch, Peter Zijlstra, Heiko Carstens,
Vasily Gorbik, Alexander Gordeev
In-Reply-To: <cover.1758067942.git.jpoimboe@kernel.org>
TEXT_MAIN, DATA_MAIN and friends are defined differently depending on
whether certain config options enable -ffunction-sections and/or
-fdata-sections.
There's no technical reason for that beyond voodoo coding. Keeping the
separate implementations adds unnecessary complexity, fragments the
logic, and increases the risk of subtle bugs.
Unify the macros by using the same input section patterns across all
configs.
This is a prerequisite for the upcoming livepatch klp-build tooling
which will manually enable -ffunction-sections and -fdata-sections via
KCFLAGS.
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
include/asm-generic/vmlinux.lds.h | 40 ++++++++++---------------------
scripts/module.lds.S | 12 ++++------
2 files changed, 17 insertions(+), 35 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index ae2d2359b79e9..6b2311fa41393 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -87,39 +87,24 @@
#define ALIGN_FUNCTION() . = ALIGN(CONFIG_FUNCTION_ALIGNMENT)
/*
- * LD_DEAD_CODE_DATA_ELIMINATION option enables -fdata-sections, which
- * generates .data.identifier sections, which need to be pulled in with
- * .data. We don't want to pull in .data..other sections, which Linux
- * has defined. Same for text and bss.
+ * Support -ffunction-sections by matching .text and .text.*,
+ * but exclude '.text..*'.
*
- * With LTO_CLANG, the linker also splits sections by default, so we need
- * these macros to combine the sections during the final link.
- *
- * With AUTOFDO_CLANG and PROPELLER_CLANG, by default, the linker splits
- * text sections and regroups functions into subsections.
- *
- * RODATA_MAIN is not used because existing code already defines .rodata.x
- * sections to be brought in with rodata.
+ * Special .text.* sections that are typically grouped separately, such as
+ * .text.unlikely or .text.hot, must be matched explicitly before using
+ * TEXT_MAIN.
*/
-#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) || \
-defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
-#else
-#define TEXT_MAIN .text
-#endif
-#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG)
+
+/*
+ * Support -fdata-sections by matching .data, .data.*, and others,
+ * but exclude '.data..*'.
+ */
#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data.rel.* .data..L* .data..compoundliteral* .data.$__unnamed_* .data.$L*
#define SDATA_MAIN .sdata .sdata.[0-9a-zA-Z_]*
#define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]* .rodata..L*
#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]* .bss..L* .bss..compoundliteral*
#define SBSS_MAIN .sbss .sbss.[0-9a-zA-Z_]*
-#else
-#define DATA_MAIN .data .data.rel .data.rel.local
-#define SDATA_MAIN .sdata
-#define RODATA_MAIN .rodata
-#define BSS_MAIN .bss
-#define SBSS_MAIN .sbss
-#endif
/*
* GCC 4.5 and later have a 32 bytes section alignment for structures.
@@ -580,9 +565,8 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
* during second ld run in second ld pass when generating System.map
*
* TEXT_MAIN here will match symbols with a fixed pattern (for example,
- * .text.hot or .text.unlikely) if dead code elimination or
- * function-section is enabled. Match these symbols first before
- * TEXT_MAIN to ensure they are grouped together.
+ * .text.hot or .text.unlikely). Match those before TEXT_MAIN to ensure
+ * they get grouped together.
*
* Also placing .text.hot section at the beginning of a page, this
* would help the TLB performance.
diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index ee79c41059f3d..2632c6cb8ebe7 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -38,12 +38,10 @@ SECTIONS {
__kcfi_traps : { KEEP(*(.kcfi_traps)) }
#endif
-#ifdef CONFIG_LTO_CLANG
- /*
- * With CONFIG_LTO_CLANG, LLD always enables -fdata-sections and
- * -ffunction-sections, which increases the size of the final module.
- * Merge the split sections in the final binary.
- */
+ .text : {
+ *(.text .text.[0-9a-zA-Z_]*)
+ }
+
.bss : {
*(.bss .bss.[0-9a-zA-Z_]*)
*(.bss..L*)
@@ -58,7 +56,7 @@ SECTIONS {
*(.rodata .rodata.[0-9a-zA-Z_]*)
*(.rodata..L*)
}
-#endif
+
MOD_SEPARATE_CODETAG_SECTIONS()
}
--
2.50.0
^ permalink raw reply related
* [PATCH v4 01/63] s390/vmlinux.lds.S: Prevent thunk functions from getting placed with normal text
From: Josh Poimboeuf @ 2025-09-17 16:03 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Petr Mladek, Miroslav Benes, Joe Lawrence,
live-patching, Song Liu, laokz, Jiri Kosina,
Marcos Paulo de Souza, Weinan Liu, Fazla Mehrab, Chen Zhongjin,
Puranjay Mohan, Dylan Hatch, Peter Zijlstra, Vasily Gorbik,
Alexander Gordeev, Heiko Carstens
In-Reply-To: <cover.1758067942.git.jpoimboe@kernel.org>
The s390 indirect thunks are placed in the .text.__s390_indirect_jump_*
sections.
Certain config options which enable -ffunction-sections have a custom
version of the TEXT_TEXT macro:
.text.[0-9a-zA-Z_]*
That unintentionally matches the thunk sections, causing them to get
grouped with normal text rather than being handled by their intended
rule later in the script:
*(.text.*_indirect_*)
Fix that by adding another period to the thunk section names, following
the kernel's general convention for distinguishing code-generated text
sections from compiler-generated ones.
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
arch/s390/include/asm/nospec-insn.h | 2 +-
arch/s390/kernel/vmlinux.lds.S | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/s390/include/asm/nospec-insn.h b/arch/s390/include/asm/nospec-insn.h
index 6ce6b56e282b8..46f92bb4c9e50 100644
--- a/arch/s390/include/asm/nospec-insn.h
+++ b/arch/s390/include/asm/nospec-insn.h
@@ -19,7 +19,7 @@
#ifdef CONFIG_EXPOLINE_EXTERN
SYM_CODE_START(\name)
#else
- .pushsection .text.\name,"axG",@progbits,\name,comdat
+ .pushsection .text..\name,"axG",@progbits,\name,comdat
.globl \name
.hidden \name
.type \name,@function
diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index 1c606dfa595d8..79b514c72c5b4 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -51,7 +51,7 @@ SECTIONS
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
FTRACE_HOTPATCH_TRAMPOLINES_TEXT
- *(.text.*_indirect_*)
+ *(.text..*_indirect_*)
*(.gnu.warning)
. = ALIGN(PAGE_SIZE);
_etext = .; /* End of text section */
--
2.50.0
^ permalink raw reply related
* [PATCH v4 00/63] objtool,livepatch: klp-build livepatch module generation
From: Josh Poimboeuf @ 2025-09-17 16:03 UTC (permalink / raw)
To: x86
Cc: linux-kernel, Petr Mladek, Miroslav Benes, Joe Lawrence,
live-patching, Song Liu, laokz, Jiri Kosina,
Marcos Paulo de Souza, Weinan Liu, Fazla Mehrab, Chen Zhongjin,
Puranjay Mohan, Dylan Hatch, Peter Zijlstra
Changes since v3 (https://lore.kernel.org/cover.1750980516.git.jpoimboe@kernel.org):
- Get rid of the SHF_MERGE+SHF_WRITE toolchain shenanigans in favor of
simple .discard.annotate_data annotations
- Fix potential double free in elf_create_reloc()
- Sync interval_tree_generic.h (Peter)
- Refactor prefix symbol creation error handling
- Rebase on tip/master and fix new issue (--checksum getting added with --noabs)
(v3..v4 diff below)
----
This series introduces new objtool features and a klp-build script to
generate livepatch modules using a source .patch as input.
This builds on concepts from the longstanding out-of-tree kpatch [1]
project which began in 2012 and has been used for many years to generate
livepatch modules for production kernels. However, this is a complete
rewrite which incorporates hard-earned lessons from 12+ years of
maintaining kpatch.
Key improvements compared to kpatch-build:
- Integrated with objtool: Leverages objtool's existing control-flow
graph analysis to help detect changed functions.
- Works on vmlinux.o: Supports late-linked objects, making it
compatible with LTO, IBT, and similar.
- Simplified code base: ~3k fewer lines of code.
- Upstream: No more out-of-tree #ifdef hacks, far less cruft.
- Cleaner internals: Vastly simplified logic for symbol/section/reloc
inclusion and special section extraction.
- Robust __LINE__ macro handling: Avoids false positive binary diffs
caused by the __LINE__ macro by introducing a fix-patch-lines script
which injects #line directives into the source .patch to preserve
the original line numbers at compile time.
The primary user interface is the klp-build script which does the
following:
- Builds an original kernel with -function-sections and
-fdata-sections, plus objtool function checksumming.
- Applies the .patch file and rebuilds the kernel using the same
options.
- Runs 'objtool klp diff' to detect changed functions and generate
intermediate binary diff objects.
- Builds a kernel module which links the diff objects with some
livepatch module init code (scripts/livepatch/init.c).
- Finalizes the livepatch module (aka work around linker wreckage)
using 'objtool klp post-link'.
I've tested with a variety of patches on defconfig and Fedora-config
kernels with both GCC and Clang.
These patches can also be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git klp-build-v3
Please test!
[1] https://github.com/dynup/kpatch
Josh Poimboeuf (63):
s390/vmlinux.lds.S: Prevent thunk functions from getting placed with
normal text
vmlinux.lds: Unify TEXT_MAIN, DATA_MAIN, and related macros
x86/module: Improve relocation error messages
x86/kprobes: Remove STACK_FRAME_NON_STANDARD annotation
compiler: Tweak __UNIQUE_ID() naming
compiler.h: Make addressable symbols less of an eyesore
elfnote: Change ELFNOTE() to use __UNIQUE_ID()
kbuild: Remove 'kmod_' prefix from __KBUILD_MODNAME
modpost: Ignore unresolved section bounds symbols
x86/alternative: Refactor INT3 call emulation selftest
interval_tree: Sync interval_tree_generic.h with tools
interval_tree: Fix ITSTATIC usage for *_subtree_search()
objtool: Make find_symbol_containing() less arbitrary
objtool: Fix broken error handling in read_symbols()
objtool: Propagate elf_truncate_section() error in elf_write()
objtool: Remove error handling boilerplate
objtool: Add empty symbols to the symbol tree again
objtool: Fix interval tree insertion for zero-length symbols
objtool: Fix weak symbol detection
objtool: Fix x86 addend calculation
objtool: Fix __pa_symbol() relocation handling
objtool: Fix "unexpected end of section" warning for alternatives
objtool: Check for missing annotation entries in read_annotate()
objtool: Const string cleanup
objtool: Clean up compiler flag usage
objtool: Remove .parainstructions reference
objtool: Convert elf iterator macros to use 'struct elf'
objtool: Add section/symbol type helpers
objtool: Mark .cold subfunctions
objtool: Fix weak symbol hole detection for .cold functions
objtool: Mark prefix functions
objtool: Simplify reloc offset calculation in unwind_read_hints()
objtool: Avoid emptying lists for duplicate sections
objtool: Rename --Werror to --werror
objtool: Resurrect --backup option
objtool: Reindent check_options[]
objtool: Refactor add_jump_destinations()
objtool: Simplify special symbol handling in elf_update_symbol()
objtool: Generalize elf_create_symbol()
objtool: Generalize elf_create_section()
objtool: Add elf_create_data()
objtool: Add elf_create_reloc() and elf_init_reloc()
objtool: Add elf_create_file()
objtool: Add annotype() helper
objtool: Move ANNOTATE* macros to annotate.h
objtool: Add ANNOTATE_DATA_SPECIAL
x86/asm: Annotate special section entries
objtool: Unify STACK_FRAME_NON_STANDARD entry sizes
objtool/klp: Add --checksum option to generate per-function checksums
objtool/klp: Add --debug-checksum=<funcs> to show per-instruction
checksums
objtool/klp: Introduce klp diff subcommand for diffing object files
objtool/klp: Add --debug option to show cloning decisions
objtool/klp: Add post-link subcommand to finalize livepatch modules
objtool: Refactor prefix symbol creation code
objtool: Add base objtool support for livepatch modules
livepatch: Add CONFIG_KLP_BUILD
kbuild,objtool: Defer objtool validation step for CONFIG_KLP_BUILD
livepatch/klp-build: Introduce fix-patch-lines script to avoid
__LINE__ diff noise
livepatch/klp-build: Add stub init code for livepatch modules
livepatch/klp-build: Introduce klp-build script for generating
livepatch modules
livepatch/klp-build: Add --debug option to show cloning decisions
livepatch/klp-build: Add --show-first-changed option to show function
divergence
livepatch: Introduce source code helpers for livepatch modules
MAINTAINERS | 3 +-
arch/s390/include/asm/nospec-insn.h | 2 +-
arch/s390/kernel/vmlinux.lds.S | 2 +-
arch/x86/Kconfig | 1 +
arch/x86/include/asm/alternative.h | 4 +
arch/x86/include/asm/asm.h | 5 +
arch/x86/include/asm/bug.h | 1 +
arch/x86/include/asm/cpufeature.h | 1 +
arch/x86/include/asm/jump_label.h | 1 +
arch/x86/kernel/alternative.c | 51 +-
arch/x86/kernel/kprobes/opt.c | 4 -
arch/x86/kernel/module.c | 15 +-
include/asm-generic/vmlinux.lds.h | 40 +-
include/linux/annotate.h | 134 ++
include/linux/compiler.h | 8 +-
include/linux/elfnote.h | 13 +-
include/linux/init.h | 3 +-
include/linux/interval_tree.h | 4 +
include/linux/interval_tree_generic.h | 2 +-
include/linux/livepatch.h | 25 +-
include/linux/livepatch_external.h | 76 +
include/linux/livepatch_helpers.h | 77 +
include/linux/mm.h | 2 +
include/linux/objtool.h | 96 +-
include/linux/objtool_types.h | 2 +
kernel/livepatch/Kconfig | 12 +
kernel/livepatch/core.c | 8 +-
scripts/Makefile.lib | 7 +-
scripts/Makefile.vmlinux_o | 2 +-
scripts/link-vmlinux.sh | 3 +-
scripts/livepatch/fix-patch-lines | 79 +
scripts/livepatch/init.c | 108 ++
scripts/livepatch/klp-build | 827 ++++++++
scripts/mod/modpost.c | 5 +
scripts/module.lds.S | 22 +-
tools/include/linux/interval_tree_generic.h | 10 +-
tools/include/linux/livepatch_external.h | 76 +
tools/include/linux/objtool_types.h | 2 +
tools/include/linux/string.h | 14 +
tools/objtool/Build | 4 +-
tools/objtool/Makefile | 48 +-
tools/objtool/arch/loongarch/decode.c | 6 +-
tools/objtool/arch/loongarch/orc.c | 1 -
tools/objtool/arch/powerpc/decode.c | 7 +-
tools/objtool/arch/x86/decode.c | 63 +-
tools/objtool/arch/x86/orc.c | 1 -
tools/objtool/arch/x86/special.c | 2 +-
tools/objtool/builtin-check.c | 96 +-
tools/objtool/builtin-klp.c | 53 +
tools/objtool/check.c | 875 +++++----
tools/objtool/elf.c | 781 ++++++--
tools/objtool/include/objtool/arch.h | 5 +-
tools/objtool/include/objtool/builtin.h | 11 +-
tools/objtool/include/objtool/check.h | 6 +-
tools/objtool/include/objtool/checksum.h | 43 +
.../objtool/include/objtool/checksum_types.h | 25 +
tools/objtool/include/objtool/elf.h | 196 +-
tools/objtool/include/objtool/endianness.h | 9 +-
tools/objtool/include/objtool/klp.h | 35 +
tools/objtool/include/objtool/objtool.h | 4 +-
tools/objtool/include/objtool/util.h | 19 +
tools/objtool/include/objtool/warn.h | 40 +
tools/objtool/klp-diff.c | 1723 +++++++++++++++++
tools/objtool/klp-post-link.c | 168 ++
tools/objtool/objtool.c | 42 +-
tools/objtool/orc_dump.c | 1 -
tools/objtool/orc_gen.c | 9 +-
tools/objtool/special.c | 14 +-
tools/objtool/sync-check.sh | 2 +
tools/objtool/weak.c | 7 +
70 files changed, 5152 insertions(+), 891 deletions(-)
create mode 100644 include/linux/annotate.h
create mode 100644 include/linux/livepatch_external.h
create mode 100644 include/linux/livepatch_helpers.h
create mode 100755 scripts/livepatch/fix-patch-lines
create mode 100644 scripts/livepatch/init.c
create mode 100755 scripts/livepatch/klp-build
create mode 100644 tools/include/linux/livepatch_external.h
create mode 100644 tools/objtool/builtin-klp.c
create mode 100644 tools/objtool/include/objtool/checksum.h
create mode 100644 tools/objtool/include/objtool/checksum_types.h
create mode 100644 tools/objtool/include/objtool/klp.h
create mode 100644 tools/objtool/include/objtool/util.h
create mode 100644 tools/objtool/klp-diff.c
create mode 100644 tools/objtool/klp-post-link.c
--
2.50.0
diff --git a/arch/Kconfig b/arch/Kconfig
index 4fb74eade61af..b13e86ad23e2f 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1368,9 +1368,6 @@ config HAVE_NOINSTR_HACK
config HAVE_NOINSTR_VALIDATION
bool
-config NEED_MODULE_PERMISSIONS_FIX
- bool
-
config HAVE_UACCESS_VALIDATION
bool
select OBJTOOL
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 1b9b82bbe3220..b6810db24ca4d 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += device.h
generic-y += dma-mapping.h
generic-y += emergency-restart.h
generic-y += exec.h
+generic-y += extable.h
generic-y += ftrace.h
generic-y += hw_irq.h
generic-y += irq_regs.h
diff --git a/arch/um/include/shared/common-offsets.h b/arch/um/include/shared/common-offsets.h
index a6f77cb6aa7e1..8ca66a1918c3a 100644
--- a/arch/um/include/shared/common-offsets.h
+++ b/arch/um/include/shared/common-offsets.h
@@ -18,6 +18,3 @@ DEFINE(UM_NSEC_PER_USEC, NSEC_PER_USEC);
DEFINE(UM_KERN_GDT_ENTRY_TLS_ENTRIES, GDT_ENTRY_TLS_ENTRIES);
DEFINE(UM_SECCOMP_ARCH_NATIVE, SECCOMP_ARCH_NATIVE);
-
-DEFINE(ALT_INSTR_SIZE, sizeof(struct alt_instr));
-DEFINE(EXTABLE_SIZE, sizeof(struct exception_table_entry));
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 073274e35da5e..986d31587e999 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -305,7 +305,6 @@ config X86
select HOTPLUG_SPLIT_STARTUP if SMP && X86_32
select IRQ_FORCED_THREADING
select LOCK_MM_AND_FIND_VMA
- select NEED_MODULE_PERMISSIONS_FIX
select NEED_PER_CPU_EMBED_FIRST_CHUNK
select NEED_PER_CPU_PAGE_FIRST_CHUNK
select NEED_SG_DMA_LENGTH
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index eb24d9ba30d7f..b14c045679e16 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -197,8 +197,8 @@ static inline int alternatives_text_reserved(void *start, void *end)
"773:\n"
#define ALTINSTR_ENTRY(ft_flags) \
- ".pushsection .altinstructions, \"aM\", @progbits, " \
- __stringify(ALT_INSTR_SIZE) "\n" \
+ ".pushsection .altinstructions,\"a\"\n" \
+ ANNOTATE_DATA_SPECIAL \
" .long 771b - .\n" /* label */ \
" .long 774f - .\n" /* new instruction */ \
" .4byte " __stringify(ft_flags) "\n" /* feature + flags */ \
@@ -208,6 +208,7 @@ static inline int alternatives_text_reserved(void *start, void *end)
#define ALTINSTR_REPLACEMENT(newinstr) /* replacement */ \
".pushsection .altinstr_replacement, \"ax\"\n" \
+ ANNOTATE_DATA_SPECIAL \
"# ALT: replacement\n" \
"774:\n\t" newinstr "\n775:\n" \
".popsection\n"
@@ -338,6 +339,7 @@ void nop_func(void);
* instruction. See apply_alternatives().
*/
.macro altinstr_entry orig alt ft_flags orig_len alt_len
+ ANNOTATE_DATA_SPECIAL
.long \orig - .
.long \alt - .
.4byte \ft_flags
@@ -361,11 +363,12 @@ void nop_func(void);
741: \
.skip -(((744f-743f)-(741b-740b)) > 0) * ((744f-743f)-(741b-740b)),0x90 ;\
742: \
- .pushsection .altinstructions, "aM", @progbits, ALT_INSTR_SIZE ;\
+ .pushsection .altinstructions,"a" ; \
altinstr_entry 740b,743f,flag,742b-740b,744f-743f ; \
.popsection ; \
.pushsection .altinstr_replacement,"ax" ; \
743: \
+ ANNOTATE_DATA_SPECIAL ; \
newinst ; \
744: \
.popsection ;
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index ecb28d2bc6730..bd62bd87a841e 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -2,6 +2,8 @@
#ifndef _ASM_X86_ASM_H
#define _ASM_X86_ASM_H
+#include <linux/annotate.h>
+
#ifdef __ASSEMBLER__
# define __ASM_FORM(x, ...) x,## __VA_ARGS__
# define __ASM_FORM_RAW(x, ...) x,## __VA_ARGS__
@@ -124,21 +126,18 @@ static __always_inline __pure void *rip_rel_ptr(void *p)
#ifdef __KERNEL__
-#ifndef COMPILE_OFFSETS
-#include <asm/asm-offsets.h>
-#endif
-
# include <asm/extable_fixup_types.h>
/* Exception table entry */
#ifdef __ASSEMBLER__
-# define _ASM_EXTABLE_TYPE(from, to, type) \
- .pushsection "__ex_table", "aM", @progbits, EXTABLE_SIZE; \
- .balign 4 ; \
- .long (from) - . ; \
- .long (to) - . ; \
- .long type ; \
+# define _ASM_EXTABLE_TYPE(from, to, type) \
+ .pushsection "__ex_table","a" ; \
+ .balign 4 ; \
+ ANNOTATE_DATA_SPECIAL ; \
+ .long (from) - . ; \
+ .long (to) - . ; \
+ .long type ; \
.popsection
# ifdef CONFIG_KPROBES
@@ -181,18 +180,18 @@ static __always_inline __pure void *rip_rel_ptr(void *p)
".purgem extable_type_reg\n"
# define _ASM_EXTABLE_TYPE(from, to, type) \
- " .pushsection __ex_table, \"aM\", @progbits, " \
- __stringify(EXTABLE_SIZE) "\n" \
+ " .pushsection \"__ex_table\",\"a\"\n" \
" .balign 4\n" \
+ ANNOTATE_DATA_SPECIAL \
" .long (" #from ") - .\n" \
" .long (" #to ") - .\n" \
" .long " __stringify(type) " \n" \
" .popsection\n"
# define _ASM_EXTABLE_TYPE_REG(from, to, type, reg) \
- " .pushsection __ex_table, \"aM\", @progbits, " \
- __stringify(EXTABLE_SIZE) "\n" \
+ " .pushsection \"__ex_table\",\"a\"\n" \
" .balign 4\n" \
+ ANNOTATE_DATA_SPECIAL \
" .long (" #from ") - .\n" \
" .long (" #to ") - .\n" \
DEFINE_EXTABLE_TYPE_REG \
diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index db1522fdbd108..3910db28e2f5b 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -56,7 +56,8 @@
#define _BUG_FLAGS_ASM(ins, file, line, flags, size, extra) \
"1:\t" ins "\n" \
- ".pushsection __bug_table, \"aM\", @progbits, " size "\n" \
+ ".pushsection __bug_table,\"aw\"\n" \
+ ANNOTATE_DATA_SPECIAL \
__BUG_ENTRY(file, line, flags) \
"\t.org 2b + " size "\n" \
".popsection\n" \
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 893cbca37fe99..fc5f32d4da6e1 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -101,6 +101,7 @@ static __always_inline bool _static_cpu_has(u16 bit)
asm goto(ALTERNATIVE_TERNARY("jmp 6f", %c[feature], "", "jmp %l[t_no]")
".pushsection .altinstr_aux,\"ax\"\n"
"6:\n"
+ ANNOTATE_DATA_SPECIAL
" testb %[bitnum], %a[cap_byte]\n"
" jnz %l[t_yes]\n"
" jmp %l[t_no]\n"
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 7a6b0e5d85c19..e0a6930a4029a 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -12,17 +12,13 @@
#include <linux/stringify.h>
#include <linux/types.h>
-#ifndef COMPILE_OFFSETS
-#include <generated/bounds.h>
-#endif
-
-#define JUMP_TABLE_ENTRY(key, label) \
- ".pushsection __jump_table, \"aM\", @progbits, " \
- __stringify(JUMP_ENTRY_SIZE) "\n\t" \
- _ASM_ALIGN "\n\t" \
- ".long 1b - . \n\t" \
- ".long " label " - . \n\t" \
- _ASM_PTR " " key " - . \n\t" \
+#define JUMP_TABLE_ENTRY(key, label) \
+ ".pushsection __jump_table, \"aw\" \n\t" \
+ _ASM_ALIGN "\n\t" \
+ ANNOTATE_DATA_SPECIAL \
+ ".long 1b - . \n\t" \
+ ".long " label " - . \n\t" \
+ _ASM_PTR " " key " - . \n\t" \
".popsection \n\t"
/* This macro is also expanded on the Rust side. */
diff --git a/arch/x86/include/asm/static_call.h b/arch/x86/include/asm/static_call.h
index e03ad9bbbf59d..41502bd2afd64 100644
--- a/arch/x86/include/asm/static_call.h
+++ b/arch/x86/include/asm/static_call.h
@@ -58,8 +58,7 @@
ARCH_DEFINE_STATIC_CALL_TRAMP(name, __static_call_return0)
#define ARCH_ADD_TRAMP_KEY(name) \
- asm(".pushsection .static_call_tramp_key, \"aM\", @progbits, " \
- __stringify(STATIC_CALL_TRAMP_KEY_SIZE) "\n" \
+ asm(".pushsection .static_call_tramp_key, \"a\" \n" \
".long " STATIC_CALL_TRAMP_STR(name) " - . \n" \
".long " STATIC_CALL_KEY_STR(name) " - . \n" \
".popsection \n")
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 0586d237b8866..32ba599a51f88 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -124,7 +124,4 @@ static void __used common(void)
OFFSET(ARIA_CTX_rounds, aria_ctx, rounds);
#endif
- BLANK();
- DEFINE(ALT_INSTR_SIZE, sizeof(struct alt_instr));
- DEFINE(EXTABLE_SIZE, sizeof(struct exception_table_entry));
}
diff --git a/arch/x86/um/shared/sysdep/kernel-offsets.h b/arch/x86/um/shared/sysdep/kernel-offsets.h
index 8215a0200ddd9..6fd1ed4003992 100644
--- a/arch/x86/um/shared/sysdep/kernel-offsets.h
+++ b/arch/x86/um/shared/sysdep/kernel-offsets.h
@@ -1,5 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 */
-#define COMPILE_OFFSETS
#include <linux/stddef.h>
#include <linux/sched.h>
#include <linux/elf.h>
@@ -8,7 +7,6 @@
#include <linux/audit.h>
#include <asm/mman.h>
#include <asm/seccomp.h>
-#include <asm/extable.h>
/* workaround for a warning with -Wmissing-prototypes */
void foo(void);
diff --git a/include/linux/annotate.h b/include/linux/annotate.h
new file mode 100644
index 0000000000000..7c10d34d198cf
--- /dev/null
+++ b/include/linux/annotate.h
@@ -0,0 +1,134 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_ANNOTATE_H
+#define _LINUX_ANNOTATE_H
+
+#include <linux/objtool_types.h>
+
+#ifdef CONFIG_OBJTOOL
+
+#ifndef __ASSEMBLY__
+
+#define __ASM_ANNOTATE(section, label, type) \
+ ".pushsection " section ",\"M\", @progbits, 8\n\t" \
+ ".long " __stringify(label) " - .\n\t" \
+ ".long " __stringify(type) "\n\t" \
+ ".popsection\n\t"
+
+#define ASM_ANNOTATE_LABEL(label, type) \
+ __ASM_ANNOTATE(".discard.annotate_insn", label, type)
+
+#define ASM_ANNOTATE(type) \
+ "911:\n\t" \
+ ASM_ANNOTATE_LABEL(911b, type)
+
+#define ASM_ANNOTATE_DATA(type) \
+ "912:\n\t" \
+ __ASM_ANNOTATE(".discard.annotate_data", 912b, type)
+
+#else /* __ASSEMBLY__ */
+
+.macro __ANNOTATE section, type
+.Lhere_\@:
+ .pushsection \section, "M", @progbits, 8
+ .long .Lhere_\@ - .
+ .long \type
+ .popsection
+.endm
+
+.macro ANNOTATE type
+ __ANNOTATE ".discard.annotate_insn", \type
+.endm
+
+.macro ANNOTATE_DATA type
+ __ANNOTATE ".discard.annotate_data", \type
+.endm
+
+#endif /* __ASSEMBLY__ */
+
+#else /* !CONFIG_OBJTOOL */
+#ifndef __ASSEMBLY__
+#define ASM_ANNOTATE_LABEL(label, type) ""
+#define ASM_ANNOTATE(type)
+#define ASM_ANNOTATE_DATA(type)
+#else /* __ASSEMBLY__ */
+.macro ANNOTATE type
+.endm
+.macro ANNOTATE_DATA type
+.endm
+#endif /* __ASSEMBLY__ */
+#endif /* !CONFIG_OBJTOOL */
+
+#ifndef __ASSEMBLY__
+
+/*
+ * Annotate away the various 'relocation to !ENDBR` complaints; knowing that
+ * these relocations will never be used for indirect calls.
+ */
+#define ANNOTATE_NOENDBR ASM_ANNOTATE(ANNOTYPE_NOENDBR)
+#define ANNOTATE_NOENDBR_SYM(sym) asm(ASM_ANNOTATE_LABEL(sym, ANNOTYPE_NOENDBR))
+
+/*
+ * This should be used immediately before an indirect jump/call. It tells
+ * objtool the subsequent indirect jump/call is vouched safe for retpoline
+ * builds.
+ */
+#define ANNOTATE_RETPOLINE_SAFE ASM_ANNOTATE(ANNOTYPE_RETPOLINE_SAFE)
+/*
+ * See linux/instrumentation.h
+ */
+#define ANNOTATE_INSTR_BEGIN(label) ASM_ANNOTATE_LABEL(label, ANNOTYPE_INSTR_BEGIN)
+#define ANNOTATE_INSTR_END(label) ASM_ANNOTATE_LABEL(label, ANNOTYPE_INSTR_END)
+/*
+ * objtool annotation to ignore the alternatives and only consider the original
+ * instruction(s).
+ */
+#define ANNOTATE_IGNORE_ALTERNATIVE ASM_ANNOTATE(ANNOTYPE_IGNORE_ALTS)
+/*
+ * This macro indicates that the following intra-function call is valid.
+ * Any non-annotated intra-function call will cause objtool to issue a warning.
+ */
+#define ANNOTATE_INTRA_FUNCTION_CALL ASM_ANNOTATE(ANNOTYPE_INTRA_FUNCTION_CALL)
+/*
+ * Use objtool to validate the entry requirement that all code paths do
+ * VALIDATE_UNRET_END before RET.
+ *
+ * NOTE: The macro must be used at the beginning of a global symbol, otherwise
+ * it will be ignored.
+ */
+#define ANNOTATE_UNRET_BEGIN ASM_ANNOTATE(ANNOTYPE_UNRET_BEGIN)
+/*
+ * This should be used to refer to an instruction that is considered
+ * terminating, like a noreturn CALL or UD2 when we know they are not -- eg
+ * WARN using UD2.
+ */
+#define ANNOTATE_REACHABLE(label) ASM_ANNOTATE_LABEL(label, ANNOTYPE_REACHABLE)
+/*
+ * This should not be used; it annotates away CFI violations. There are a few
+ * valid use cases like kexec handover to the next kernel image, and there is
+ * no security concern there.
+ *
+ * There are also a few real issues annotated away, like EFI because we can't
+ * control the EFI code.
+ */
+#define ANNOTATE_NOCFI_SYM(sym) asm(ASM_ANNOTATE_LABEL(sym, ANNOTYPE_NOCFI))
+
+/*
+ * Annotate a special section entry. This emables livepatch module generation
+ * to find and extract individual special section entries as needed.
+ */
+#define ANNOTATE_DATA_SPECIAL ASM_ANNOTATE_DATA(ANNOTYPE_DATA_SPECIAL)
+
+#else /* __ASSEMBLY__ */
+#define ANNOTATE_NOENDBR ANNOTATE type=ANNOTYPE_NOENDBR
+#define ANNOTATE_RETPOLINE_SAFE ANNOTATE type=ANNOTYPE_RETPOLINE_SAFE
+/* ANNOTATE_INSTR_BEGIN ANNOTATE type=ANNOTYPE_INSTR_BEGIN */
+/* ANNOTATE_INSTR_END ANNOTATE type=ANNOTYPE_INSTR_END */
+#define ANNOTATE_IGNORE_ALTERNATIVE ANNOTATE type=ANNOTYPE_IGNORE_ALTS
+#define ANNOTATE_INTRA_FUNCTION_CALL ANNOTATE type=ANNOTYPE_INTRA_FUNCTION_CALL
+#define ANNOTATE_UNRET_BEGIN ANNOTATE type=ANNOTYPE_UNRET_BEGIN
+#define ANNOTATE_REACHABLE ANNOTATE type=ANNOTYPE_REACHABLE
+#define ANNOTATE_NOCFI_SYM ANNOTATE type=ANNOTYPE_NOCFI
+#define ANNOTATE_DATA_SPECIAL ANNOTATE_DATA type=ANNOTYPE_DATA_SPECIAL
+#endif /* __ASSEMBLY__ */
+
+#endif /* _LINUX_ANNOTATE_H */
diff --git a/include/linux/interval_tree.h b/include/linux/interval_tree.h
index 2b8026a399064..9d5791e9f737a 100644
--- a/include/linux/interval_tree.h
+++ b/include/linux/interval_tree.h
@@ -19,6 +19,10 @@ extern void
interval_tree_remove(struct interval_tree_node *node,
struct rb_root_cached *root);
+extern struct interval_tree_node *
+interval_tree_subtree_search(struct interval_tree_node *node,
+ unsigned long start, unsigned long last);
+
extern struct interval_tree_node *
interval_tree_iter_first(struct rb_root_cached *root,
unsigned long start, unsigned long last);
diff --git a/include/linux/interval_tree_generic.h b/include/linux/interval_tree_generic.h
index 1b400f26f63d6..c5a2fed49eb0d 100644
--- a/include/linux/interval_tree_generic.h
+++ b/include/linux/interval_tree_generic.h
@@ -77,7 +77,7 @@ ITSTATIC void ITPREFIX ## _remove(ITSTRUCT *node, \
* Cond2: start <= ITLAST(node) \
*/ \
\
-static ITSTRUCT * \
+ITSTATIC ITSTRUCT * \
ITPREFIX ## _subtree_search(ITSTRUCT *node, ITTYPE start, ITTYPE last) \
{ \
while (true) { \
diff --git a/include/linux/livepatch_helpers.h b/include/linux/livepatch_helpers.h
index 337bee91d7daf..99d68d0773fa8 100644
--- a/include/linux/livepatch_helpers.h
+++ b/include/linux/livepatch_helpers.h
@@ -36,8 +36,6 @@
__PASTE(__KLP_POST_UNPATCH_PREFIX, KLP_OBJNAME) = func
/*
- * KLP_STATIC_CALL
- *
* Replace static_call() usage with this macro when create-diff-object
* recommends it due to the original static call key living in a module.
*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ae97a0b8ec75..69baa9a1e2cb4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3265,6 +3265,8 @@ void vma_interval_tree_insert_after(struct vm_area_struct *node,
struct rb_root_cached *root);
void vma_interval_tree_remove(struct vm_area_struct *node,
struct rb_root_cached *root);
+struct vm_area_struct *vma_interval_tree_subtree_search(struct vm_area_struct *node,
+ unsigned long start, unsigned long last);
struct vm_area_struct *vma_interval_tree_iter_first(struct rb_root_cached *root,
unsigned long start, unsigned long last);
struct vm_area_struct *vma_interval_tree_iter_next(struct vm_area_struct *node,
diff --git a/include/linux/objtool.h b/include/linux/objtool.h
index b5081aea3b69d..b18ab53561c99 100644
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -2,22 +2,17 @@
#ifndef _LINUX_OBJTOOL_H
#define _LINUX_OBJTOOL_H
-#ifndef COMPILE_OFFSETS
-#include <generated/bounds.h>
-#endif
-
#include <linux/objtool_types.h>
+#include <linux/annotate.h>
#ifdef CONFIG_OBJTOOL
-#include <asm/asm.h>
-
#ifndef __ASSEMBLY__
#define UNWIND_HINT(type, sp_reg, sp_offset, signal) \
"987: \n\t" \
- ".pushsection .discard.unwind_hints, \"M\", @progbits, "\
- __stringify(UNWIND_HINT_SIZE) "\n\t" \
+ ".pushsection .discard.unwind_hints\n\t" \
+ ANNOTATE_DATA_SPECIAL \
/* struct unwind_hint */ \
".long 987b - .\n\t" \
".short " __stringify(sp_offset) "\n\t" \
@@ -58,16 +53,6 @@
#define __ASM_BREF(label) label ## b
-#define __ASM_ANNOTATE(label, type) \
- ".pushsection .discard.annotate_insn,\"M\",@progbits,8\n\t" \
- ".long " __stringify(label) " - .\n\t" \
- ".long " __stringify(type) "\n\t" \
- ".popsection\n\t"
-
-#define ASM_ANNOTATE(type) \
- "911:\n\t" \
- __ASM_ANNOTATE(911b, type)
-
#else /* __ASSEMBLY__ */
/*
@@ -93,7 +78,8 @@
*/
.macro UNWIND_HINT type:req sp_reg=0 sp_offset=0 signal=0
.Lhere_\@:
- .pushsection .discard.unwind_hints, "M", @progbits, UNWIND_HINT_SIZE
+ .pushsection .discard.unwind_hints
+ ANNOTATE_DATA_SPECIAL
/* struct unwind_hint */
.long .Lhere_\@ - .
.short \sp_offset
@@ -116,14 +102,6 @@
#endif
.endm
-.macro ANNOTATE type:req
-.Lhere_\@:
- .pushsection .discard.annotate_insn,"M",@progbits,8
- .long .Lhere_\@ - .
- .long \type
- .popsection
-.endm
-
#endif /* __ASSEMBLY__ */
#else /* !CONFIG_OBJTOOL */
@@ -133,84 +111,15 @@
#define UNWIND_HINT(type, sp_reg, sp_offset, signal) "\n\t"
#define STACK_FRAME_NON_STANDARD(func)
#define STACK_FRAME_NON_STANDARD_FP(func)
-#define __ASM_ANNOTATE(label, type) ""
-#define ASM_ANNOTATE(type)
#else
.macro UNWIND_HINT type:req sp_reg=0 sp_offset=0 signal=0
.endm
.macro STACK_FRAME_NON_STANDARD func:req
.endm
-.macro ANNOTATE type:req
-.endm
#endif
#endif /* CONFIG_OBJTOOL */
-#ifndef __ASSEMBLY__
-/*
- * Annotate away the various 'relocation to !ENDBR` complaints; knowing that
- * these relocations will never be used for indirect calls.
- */
-#define ANNOTATE_NOENDBR ASM_ANNOTATE(ANNOTYPE_NOENDBR)
-#define ANNOTATE_NOENDBR_SYM(sym) asm(__ASM_ANNOTATE(sym, ANNOTYPE_NOENDBR))
-
-/*
- * This should be used immediately before an indirect jump/call. It tells
- * objtool the subsequent indirect jump/call is vouched safe for retpoline
- * builds.
- */
-#define ANNOTATE_RETPOLINE_SAFE ASM_ANNOTATE(ANNOTYPE_RETPOLINE_SAFE)
-/*
- * See linux/instrumentation.h
- */
-#define ANNOTATE_INSTR_BEGIN(label) __ASM_ANNOTATE(label, ANNOTYPE_INSTR_BEGIN)
-#define ANNOTATE_INSTR_END(label) __ASM_ANNOTATE(label, ANNOTYPE_INSTR_END)
-/*
- * objtool annotation to ignore the alternatives and only consider the original
- * instruction(s).
- */
-#define ANNOTATE_IGNORE_ALTERNATIVE ASM_ANNOTATE(ANNOTYPE_IGNORE_ALTS)
-/*
- * This macro indicates that the following intra-function call is valid.
- * Any non-annotated intra-function call will cause objtool to issue a warning.
- */
-#define ANNOTATE_INTRA_FUNCTION_CALL ASM_ANNOTATE(ANNOTYPE_INTRA_FUNCTION_CALL)
-/*
- * Use objtool to validate the entry requirement that all code paths do
- * VALIDATE_UNRET_END before RET.
- *
- * NOTE: The macro must be used at the beginning of a global symbol, otherwise
- * it will be ignored.
- */
-#define ANNOTATE_UNRET_BEGIN ASM_ANNOTATE(ANNOTYPE_UNRET_BEGIN)
-/*
- * This should be used to refer to an instruction that is considered
- * terminating, like a noreturn CALL or UD2 when we know they are not -- eg
- * WARN using UD2.
- */
-#define ANNOTATE_REACHABLE(label) __ASM_ANNOTATE(label, ANNOTYPE_REACHABLE)
-/*
- * This should not be used; it annotates away CFI violations. There are a few
- * valid use cases like kexec handover to the next kernel image, and there is
- * no security concern there.
- *
- * There are also a few real issues annotated away, like EFI because we can't
- * control the EFI code.
- */
-#define ANNOTATE_NOCFI_SYM(sym) asm(__ASM_ANNOTATE(sym, ANNOTYPE_NOCFI))
-
-#else
-#define ANNOTATE_NOENDBR ANNOTATE type=ANNOTYPE_NOENDBR
-#define ANNOTATE_RETPOLINE_SAFE ANNOTATE type=ANNOTYPE_RETPOLINE_SAFE
-/* ANNOTATE_INSTR_BEGIN ANNOTATE type=ANNOTYPE_INSTR_BEGIN */
-/* ANNOTATE_INSTR_END ANNOTATE type=ANNOTYPE_INSTR_END */
-#define ANNOTATE_IGNORE_ALTERNATIVE ANNOTATE type=ANNOTYPE_IGNORE_ALTS
-#define ANNOTATE_INTRA_FUNCTION_CALL ANNOTATE type=ANNOTYPE_INTRA_FUNCTION_CALL
-#define ANNOTATE_UNRET_BEGIN ANNOTATE type=ANNOTYPE_UNRET_BEGIN
-#define ANNOTATE_REACHABLE ANNOTATE type=ANNOTYPE_REACHABLE
-#define ANNOTATE_NOCFI_SYM ANNOTATE type=ANNOTYPE_NOCFI
-#endif
-
#if defined(CONFIG_NOINSTR_VALIDATION) && \
(defined(CONFIG_MITIGATION_UNRET_ENTRY) || defined(CONFIG_MITIGATION_SRSO))
#define VALIDATE_UNRET_BEGIN ANNOTATE_UNRET_BEGIN
diff --git a/include/linux/objtool_types.h b/include/linux/objtool_types.h
index aceac94632c8a..c6def4049b1ae 100644
--- a/include/linux/objtool_types.h
+++ b/include/linux/objtool_types.h
@@ -67,4 +67,6 @@ struct unwind_hint {
#define ANNOTYPE_REACHABLE 8
#define ANNOTYPE_NOCFI 9
+#define ANNOTYPE_DATA_SPECIAL 1
+
#endif /* _LINUX_OBJTOOL_TYPES_H */
diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index 5210612817f2e..78a77a4ae0ea8 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -172,6 +172,12 @@ struct static_call_mod {
struct static_call_site *sites;
};
+/* For finding the key associated with a trampoline */
+struct static_call_tramp_key {
+ s32 tramp;
+ s32 key;
+};
+
extern void __static_call_update(struct static_call_key *key, void *tramp, void *func);
extern int static_call_mod_init(struct module *mod);
extern int static_call_text_reserved(void *start, void *end);
diff --git a/include/linux/static_call_types.h b/include/linux/static_call_types.h
index eb772df625d4e..5a00b8b2cf9fc 100644
--- a/include/linux/static_call_types.h
+++ b/include/linux/static_call_types.h
@@ -34,12 +34,6 @@ struct static_call_site {
s32 key;
};
-/* For finding the key associated with a trampoline */
-struct static_call_tramp_key {
- s32 tramp;
- s32 key;
-};
-
#define DECLARE_STATIC_CALL(name, func) \
extern struct static_call_key STATIC_CALL_KEY(name); \
extern typeof(func) STATIC_CALL_TRAMP(name);
diff --git a/kernel/bounds.c b/kernel/bounds.c
index f9bc13727721e..29b2cd00df2cc 100644
--- a/kernel/bounds.c
+++ b/kernel/bounds.c
@@ -6,16 +6,12 @@
*/
#define __GENERATING_BOUNDS_H
-#define COMPILE_OFFSETS
/* Include headers that define the enum constants of interest */
#include <linux/page-flags.h>
#include <linux/mmzone.h>
#include <linux/kbuild.h>
#include <linux/log2.h>
#include <linux/spinlock_types.h>
-#include <linux/jump_label.h>
-#include <linux/static_call_types.h>
-#include <linux/objtool_types.h>
int main(void)
{
@@ -32,15 +28,6 @@ int main(void)
#else
DEFINE(LRU_GEN_WIDTH, 0);
DEFINE(__LRU_REFS_WIDTH, 0);
-#endif
-#if defined(CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE) && defined(CONFIG_JUMP_LABEL)
- DEFINE(JUMP_ENTRY_SIZE, sizeof(struct jump_entry));
-#endif
-#ifdef CONFIG_HAVE_STATIC_CALL_INLINE
- DEFINE(STATIC_CALL_TRAMP_KEY_SIZE, sizeof(struct static_call_tramp_key));
-#endif
-#ifdef CONFIG_OBJTOOL
- DEFINE(UNWIND_HINT_SIZE, sizeof(struct unwind_hint));
#endif
/* End of constants */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 28a1c08e3b221..f4b33919ec371 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -173,6 +173,7 @@ ifdef CONFIG_OBJTOOL
objtool := $(objtree)/tools/objtool/objtool
+objtool-args-$(CONFIG_KLP_BUILD) += --checksum
objtool-args-$(CONFIG_HAVE_JUMP_LABEL_HACK) += --hacks=jump_label
objtool-args-$(CONFIG_HAVE_NOINSTR_HACK) += --hacks=noinstr
objtool-args-$(CONFIG_MITIGATION_CALL_DEPTH_TRACKING) += --hacks=skylake
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 7a888e1ff70f2..542ba462ed3ec 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -28,24 +28,12 @@ ccflags-remove-y := $(CC_FLAGS_CFI)
.module-common.o: $(srctree)/scripts/module-common.c FORCE
$(call if_changed_rule,cc_o_c)
-ifdef CONFIG_NEED_MODULE_PERMISSIONS_FIX
-cmd_fix_mod_permissions = \
- $(OBJCOPY) --set-section-flags __jump_table=alloc,data \
- --set-section-flags __bug_table=alloc,data $@ \
- --set-section-flags .static_call_sites=alloc,data $@
-endif
-
quiet_cmd_ld_ko_o = LD [M] $@
cmd_ld_ko_o = \
$(LD) -r $(KBUILD_LDFLAGS) \
$(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \
-T $(objtree)/scripts/module.lds -o $@ $(filter %.o, $^)
-define rule_ld_ko_o
- $(call cmd_and_savecmd,ld_ko_o)
- $(call cmd,fix_mod_permissions)
-endef
-
quiet_cmd_btf_ko = BTF [M] $@
cmd_btf_ko = \
if [ ! -f $(objtree)/vmlinux ]; then \
@@ -58,11 +46,14 @@ quiet_cmd_btf_ko = BTF [M] $@
# Same as newer-prereqs, but allows to exclude specified extra dependencies
newer_prereqs_except = $(filter-out $(PHONY) $(1),$?)
-if_changed_rule_except = $(if $(call newer_prereqs_except,$(2))$(cmd-check),$(rule_$(1)),@:)
+# Same as if_changed, but allows to exclude specified extra dependencies
+if_changed_except = $(if $(call newer_prereqs_except,$(2))$(cmd-check), \
+ $(cmd); \
+ printf '%s\n' 'savedcmd_$@ := $(make-cmd)' > $(dot-target).cmd, @:)
# Re-generate module BTFs if either module's .ko or vmlinux changed
%.ko: %.o %.mod.o .module-common.o $(objtree)/scripts/module.lds $(and $(CONFIG_DEBUG_INFO_BTF_MODULES),$(KBUILD_BUILTIN),$(objtree)/vmlinux) FORCE
- +$(call if_changed_rule_except,ld_ko_o,$(objtree)/vmlinux)
+ +$(call if_changed_except,ld_ko_o,$(objtree)/vmlinux)
ifdef CONFIG_DEBUG_INFO_BTF_MODULES
+$(if $(newer-prereqs),$(call cmd,btf_ko))
endif
diff --git a/scripts/livepatch/klp-build b/scripts/livepatch/klp-build
index e47056f75475e..881e052e7faef 100755
--- a/scripts/livepatch/klp-build
+++ b/scripts/livepatch/klp-build
@@ -489,11 +489,8 @@ clean_kernel() {
build_kernel() {
local log="$TMP_DIR/build.log"
- local objtool_args=()
local cmd=()
- objtool_args=("--checksum")
-
cmd=("make")
# When a patch to a kernel module references a newly created unexported
@@ -516,7 +513,6 @@ build_kernel() {
cmd+=("$VERBOSE")
cmd+=("-j$JOBS")
cmd+=("KCFLAGS=-ffunction-sections -fdata-sections")
- cmd+=("OBJTOOL_ARGS=${objtool_args[*]}")
cmd+=("vmlinux")
cmd+=("modules")
@@ -794,7 +790,7 @@ process_args "$@"
do_init
if (( SHORT_CIRCUIT <= 1 )); then
- status "Validating patches"
+ status "Validating patch(es)"
validate_patches
status "Building original kernel"
clean_kernel
@@ -804,7 +800,7 @@ if (( SHORT_CIRCUIT <= 1 )); then
fi
if (( SHORT_CIRCUIT <= 2 )); then
- status "Fixing patches"
+ status "Fixing patch(es)"
fix_patches
apply_patches
status "Building patched kernel"
diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
index ef2ffb68f69d1..d3d00e85edf73 100644
--- a/scripts/mod/devicetable-offsets.c
+++ b/scripts/mod/devicetable-offsets.c
@@ -1,5 +1,4 @@
// SPDX-License-Identifier: GPL-2.0
-#define COMPILE_OFFSETS
#include <linux/kbuild.h>
#include <linux/mod_devicetable.h>
diff --git a/tools/include/linux/interval_tree_generic.h b/tools/include/linux/interval_tree_generic.h
index c0ec9dbdfbaf2..c5a2fed49eb0d 100644
--- a/tools/include/linux/interval_tree_generic.h
+++ b/tools/include/linux/interval_tree_generic.h
@@ -104,12 +104,8 @@ ITPREFIX ## _subtree_search(ITSTRUCT *node, ITTYPE start, ITTYPE last) \
if (ITSTART(node) <= last) { /* Cond1 */ \
if (start <= ITLAST(node)) /* Cond2 */ \
return node; /* node is leftmost match */ \
- if (node->ITRB.rb_right) { \
- node = rb_entry(node->ITRB.rb_right, \
- ITSTRUCT, ITRB); \
- if (start <= node->ITSUBTREE) \
- continue; \
- } \
+ node = rb_entry(node->ITRB.rb_right, ITSTRUCT, ITRB); \
+ continue; \
} \
return NULL; /* No match */ \
} \
diff --git a/tools/include/linux/objtool_types.h b/tools/include/linux/objtool_types.h
index aceac94632c8a..c6def4049b1ae 100644
--- a/tools/include/linux/objtool_types.h
+++ b/tools/include/linux/objtool_types.h
@@ -67,4 +67,6 @@ struct unwind_hint {
#define ANNOTYPE_REACHABLE 8
#define ANNOTYPE_NOCFI 9
+#define ANNOTYPE_DATA_SPECIAL 1
+
#endif /* _LINUX_OBJTOOL_TYPES_H */
diff --git a/tools/include/linux/static_call_types.h b/tools/include/linux/static_call_types.h
index eb772df625d4e..5a00b8b2cf9fc 100644
--- a/tools/include/linux/static_call_types.h
+++ b/tools/include/linux/static_call_types.h
@@ -34,12 +34,6 @@ struct static_call_site {
s32 key;
};
-/* For finding the key associated with a trampoline */
-struct static_call_tramp_key {
- s32 tramp;
- s32 key;
-};
-
#define DECLARE_STATIC_CALL(name, func) \
extern struct static_call_key STATIC_CALL_KEY(name); \
extern typeof(func) STATIC_CALL_TRAMP(name);
diff --git a/tools/objtool/arch/loongarch/orc.c b/tools/objtool/arch/loongarch/orc.c
index b58c5ff443c92..ffd3a3c858ae7 100644
--- a/tools/objtool/arch/loongarch/orc.c
+++ b/tools/objtool/arch/loongarch/orc.c
@@ -5,7 +5,6 @@
#include <objtool/check.h>
#include <objtool/orc.h>
#include <objtool/warn.h>
-#include <objtool/endianness.h>
int init_orc_entry(struct orc_entry *orc, struct cfi_state *cfi, struct instruction *insn)
{
diff --git a/tools/objtool/arch/powerpc/decode.c b/tools/objtool/arch/powerpc/decode.c
index d4cb02120a6bd..3a9b748216edc 100644
--- a/tools/objtool/arch/powerpc/decode.c
+++ b/tools/objtool/arch/powerpc/decode.c
@@ -7,7 +7,6 @@
#include <objtool/arch.h>
#include <objtool/warn.h>
#include <objtool/builtin.h>
-#include <objtool/endianness.h>
int arch_ftrace_match(const char *name)
{
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 6f3aa117027a6..5c72beeaa3a71 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -19,7 +19,6 @@
#include <objtool/elf.h>
#include <objtool/arch.h>
#include <objtool/warn.h>
-#include <objtool/endianness.h>
#include <objtool/builtin.h>
#include <arch/elf.h>
diff --git a/tools/objtool/arch/x86/orc.c b/tools/objtool/arch/x86/orc.c
index 7176b9ec5b058..735e150ca6b73 100644
--- a/tools/objtool/arch/x86/orc.c
+++ b/tools/objtool/arch/x86/orc.c
@@ -5,7 +5,6 @@
#include <objtool/check.h>
#include <objtool/orc.h>
#include <objtool/warn.h>
-#include <objtool/endianness.h>
int init_orc_entry(struct orc_entry *orc, struct cfi_state *cfi, struct instruction *insn)
{
diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c
index c4e45236a561d..b20b0077449b2 100644
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -74,7 +74,7 @@ static int parse_hacks(const struct option *opt, const char *str, int unset)
static const struct option check_options[] = {
OPT_GROUP("Actions:"),
OPT_BOOLEAN(0, "checksum", &opts.checksum, "generate per-function checksums"),
- OPT_BOOLEAN(0 , "cfi", &opts.cfi, "annotate kernel control flow integrity (kCFI) function preambles"),
+ OPT_BOOLEAN(0, "cfi", &opts.cfi, "annotate kernel control flow integrity (kCFI) function preambles"),
OPT_CALLBACK_OPTARG('h', "hacks", NULL, NULL, "jump_label,noinstr,skylake", "patch toolchain bugs/limitations", parse_hacks),
OPT_BOOLEAN('i', "ibt", &opts.ibt, "validate and annotate IBT"),
OPT_BOOLEAN('m', "mcount", &opts.mcount, "annotate mcount/fentry calls for ftrace"),
@@ -162,7 +162,6 @@ static bool opts_valid(void)
return false;
}
-
#ifndef BUILD_KLP
if (opts.checksum) {
ERROR("--checksum not supported; install xxhash-devel and recompile");
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 2f0c86ba14a5c..ba591a325d52e 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -15,8 +15,8 @@
#include <objtool/check.h>
#include <objtool/special.h>
#include <objtool/warn.h>
-#include <objtool/endianness.h>
#include <objtool/checksum.h>
+#include <objtool/util.h>
#include <linux/objtool_types.h>
#include <linux/hashtable.h>
@@ -660,15 +660,8 @@ static int create_static_call_sections(struct objtool_file *file)
if (!sec)
return -1;
- /*
- * Set SHF_MERGE to prevent tooling from stripping entsize.
- *
- * SHF_WRITE would also get set here to allow modules to modify the low
- * bits of static_call_site::key, but the LLVM linker doesn't allow
- * SHF_MERGE+SHF_WRITE for whatever reason. That gets fixed up by the
- * makefiles with CONFIG_NEED_MODULE_PERMISSIONS_FIX.
- */
- sec->sh.sh_flags |= SHF_MERGE;
+ /* Allow modules to modify the low bits of static_call_site::key */
+ sec->sh.sh_flags |= SHF_WRITE;
idx = 0;
list_for_each_entry(insn, &file->static_call_list, call_node) {
@@ -1003,10 +996,10 @@ static int create_sym_checksum_section(struct objtool_file *file)
struct sym_checksum *checksum;
size_t entsize = sizeof(struct sym_checksum);
- sec = find_section_by_name(file->elf, SYM_CHECKSUM_SEC);
+ sec = find_section_by_name(file->elf, ".discard.sym_checksum");
if (sec) {
if (!opts.dryrun)
- WARN("file already has " SYM_CHECKSUM_SEC " section, skipping");
+ WARN("file already has .discard.sym_checksum section, skipping");
return 0;
}
@@ -2349,9 +2342,7 @@ static int read_annotate(struct objtool_file *file,
}
for_each_reloc(sec->rsec, reloc) {
- type = *(u32 *)(sec->data->d_buf + (reloc_idx(reloc) * sec->sh.sh_entsize) + 4);
- type = bswap_if_needed(file->elf, type);
-
+ type = annotype(file->elf, sec, reloc);
offset = reloc->sym->offset + reloc_addend(reloc);
insn = find_insn(file, reloc->sym->sec, offset);
@@ -4283,48 +4274,82 @@ static bool ignore_unreachable_insn(struct objtool_file *file, struct instructio
return false;
}
-static int add_prefix_symbol(struct objtool_file *file, struct symbol *func)
+/*
+ * For FineIBT or kCFI, a certain number of bytes preceding the function may be
+ * NOPs. Those NOPs may be rewritten at runtime and executed, so give them a
+ * proper function name: __pfx_<func>.
+ *
+ * The NOPs may not exist for the following cases:
+ *
+ * - compiler cloned functions (*.cold, *.part0, etc)
+ * - asm functions created with inline asm or without SYM_FUNC_START()
+ *
+ * Also, the function may already have a prefix from a previous objtool run
+ * (livepatch extracted functions, or manually running objtool multiple times).
+ *
+ * So return 0 if the NOPs are missing or the function already has a prefix
+ * symbol.
+ */
+static int create_prefix_symbol(struct objtool_file *file, struct symbol *func)
{
struct instruction *insn, *prev;
+ char name[SYM_NAME_LEN];
struct cfi_state *cfi;
- insn = find_insn(file, func->sec, func->offset);
- if (!insn)
+ if (!is_func_sym(func) || is_prefix_func(func) ||
+ func->cold || func->static_call_tramp)
+ return 0;
+
+ if ((strlen(func->name) + sizeof("__pfx_") > SYM_NAME_LEN)) {
+ WARN("%s: symbol name too long, can't create __pfx_ symbol",
+ func->name);
+ return 0;
+ }
+
+ if (snprintf_check(name, SYM_NAME_LEN, "__pfx_%s", func->name))
return -1;
+ if (file->klp) {
+ struct symbol *pfx;
+
+ pfx = find_symbol_by_offset(func->sec, func->offset - opts.prefix);
+ if (pfx && is_prefix_func(pfx) && !strcmp(pfx->name, name))
+ return 0;
+ }
+
+ insn = find_insn(file, func->sec, func->offset);
+ if (!insn) {
+ WARN("%s: can't find starting instruction", func->name);
+ return -1;
+ }
+
for (prev = prev_insn_same_sec(file, insn);
prev;
prev = prev_insn_same_sec(file, prev)) {
- struct symbol *sym_pfx;
u64 offset;
if (prev->type != INSN_NOP)
- return -1;
+ return 0;
offset = func->offset - prev->offset;
if (offset > opts.prefix)
- return -1;
+ return 0;
if (offset < opts.prefix)
continue;
- /*
- * Ignore attempts to make duplicate symbols in livepatch
- * modules. They've already extracted the prefix symbols
- * except for the newly compiled init.c.
- */
- sym_pfx = elf_create_prefix_symbol(file->elf, func, opts.prefix);
- if (!sym_pfx && !file->klp) {
- WARN("duplicate prefix symbol for %s\n", func->name);
+ if (!elf_create_symbol(file->elf, name, func->sec,
+ GELF_ST_BIND(func->sym.st_info),
+ GELF_ST_TYPE(func->sym.st_info),
+ prev->offset, opts.prefix))
return -1;
- }
break;
}
if (!prev)
- return -1;
+ return 0;
if (!insn->cfi) {
/*
@@ -4342,7 +4367,7 @@ static int add_prefix_symbol(struct objtool_file *file, struct symbol *func)
return 0;
}
-static int add_prefix_symbols(struct objtool_file *file)
+static int create_prefix_symbols(struct objtool_file *file)
{
struct section *sec;
struct symbol *func;
@@ -4352,14 +4377,8 @@ static int add_prefix_symbols(struct objtool_file *file)
continue;
sec_for_each_sym(sec, func) {
- if (!is_func_sym(func))
- continue;
-
- /*
- * Ignore this error on purpose, there are valid
- * reasons for this to fail.
- */
- add_prefix_symbol(file, func);
+ if (create_prefix_symbol(file, func))
+ return -1;
}
}
@@ -4987,7 +5006,7 @@ int check(struct objtool_file *file)
}
if (opts.prefix) {
- ret = add_prefix_symbols(file);
+ ret = create_prefix_symbols(file);
if (ret)
goto out;
}
diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index 2551a5727949f..5feeefc7fc8f8 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -945,32 +945,6 @@ struct symbol *elf_create_section_symbol(struct elf *elf, struct section *sec)
return sym;
}
-struct symbol *
-elf_create_prefix_symbol(struct elf *elf, struct symbol *orig, size_t size)
-{
- size_t namelen = strlen(orig->name) + sizeof("__pfx_");
- char name[SYM_NAME_LEN];
- unsigned long offset;
- struct symbol *sym;
-
- snprintf(name, namelen, "__pfx_%s", orig->name);
-
- sym = orig;
- offset = orig->sym.st_value - size;
-
- sec_for_each_sym_continue_reverse(orig->sec, sym) {
- if (sym->offset < offset)
- break;
- if (sym->offset == offset && !strcmp(sym->name, name))
- return NULL;
- }
-
- return elf_create_symbol(elf, name, orig->sec,
- GELF_ST_BIND(orig->sym.st_info),
- GELF_ST_TYPE(orig->sym.st_info),
- offset, size);
-}
-
struct reloc *elf_init_reloc(struct elf *elf, struct section *rsec,
unsigned int reloc_idx, unsigned long offset,
struct symbol *sym, s64 addend, unsigned int type)
@@ -1075,8 +1049,10 @@ static int read_relocs(struct elf *elf)
rsec->base->rsec = rsec;
- rsec->nr_alloc_relocs = sec_num_entries(rsec);
- rsec->relocs = calloc(rsec->nr_alloc_relocs, sizeof(*reloc));
+ /* nr_alloc_relocs=0: libelf owns d_buf */
+ rsec->nr_alloc_relocs = 0;
+
+ rsec->relocs = calloc(sec_num_entries(rsec), sizeof(*reloc));
if (!rsec->relocs) {
ERROR_GLIBC("calloc");
return -1;
@@ -1496,15 +1472,32 @@ static int elf_alloc_reloc(struct elf *elf, struct section *rsec)
nr_alloc = MAX(64, ALIGN_UP_POW2(nr_relocs_new));
if (nr_alloc <= rsec->nr_alloc_relocs)
return 0;
- rsec->nr_alloc_relocs = nr_alloc;
- rsec->data->d_buf = realloc(rsec->data->d_buf,
- nr_alloc * elf_rela_size(elf));
- if (!rsec->data->d_buf) {
- ERROR_GLIBC("realloc");
- return -1;
+ if (rsec->data->d_buf && !rsec->nr_alloc_relocs) {
+ void *orig_buf = rsec->data->d_buf;
+
+ /*
+ * The original d_buf is owned by libelf so it can't be
+ * realloced.
+ */
+ rsec->data->d_buf = malloc(nr_alloc * elf_rela_size(elf));
+ if (!rsec->data->d_buf) {
+ ERROR_GLIBC("malloc");
+ return -1;
+ }
+ memcpy(rsec->data->d_buf, orig_buf,
+ nr_relocs_old * elf_rela_size(elf));
+ } else {
+ rsec->data->d_buf = realloc(rsec->data->d_buf,
+ nr_alloc * elf_rela_size(elf));
+ if (!rsec->data->d_buf) {
+ ERROR_GLIBC("realloc");
+ return -1;
+ }
}
+ rsec->nr_alloc_relocs = nr_alloc;
+
old_relocs = rsec->relocs;
new_relocs = calloc(nr_alloc, sizeof(struct reloc));
if (!new_relocs) {
@@ -1623,6 +1616,8 @@ struct reloc *elf_create_reloc(struct elf *elf, struct section *sec,
if (elf_alloc_reloc(elf, rsec))
return NULL;
+ mark_sec_changed(elf, rsec, true);
+
return elf_init_reloc(elf, rsec, sec_num_entries(rsec) - 1, offset, sym,
addend, type);
}
diff --git a/tools/objtool/include/objtool/builtin.h b/tools/objtool/include/objtool/builtin.h
index 8a0d42aa4d858..bb0b25eb08ba4 100644
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -10,6 +10,7 @@
struct opts {
/* actions: */
bool cfi;
+ bool checksum;
bool dump_orc;
bool hack_jump_label;
bool hack_noinstr;
@@ -25,7 +26,6 @@ struct opts {
bool sls;
bool stackval;
bool static_call;
- bool checksum;
bool uaccess;
int prefix;
diff --git a/tools/objtool/include/objtool/elf.h b/tools/objtool/include/objtool/elf.h
index 64e75ade01c90..21d8b825fd8f0 100644
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -14,12 +14,15 @@
#include <linux/rbtree.h>
#include <linux/jhash.h>
+#include <objtool/endianness.h>
#include <objtool/checksum_types.h>
#include <arch/elf.h>
#define SEC_NAME_LEN 1024
#define SYM_NAME_LEN 512
+#define bswap_if_needed(elf, val) __bswap_if_needed(&elf->ehdr, val)
+
#ifdef LIBELF_USE_DEPRECATED
# define elf_getshdrnum elf_getshnum
# define elf_getshdrstrndx elf_getshstrndx
@@ -146,8 +149,6 @@ struct symbol *elf_create_symbol(struct elf *elf, const char *name,
unsigned int type, unsigned long offset,
size_t size);
struct symbol *elf_create_section_symbol(struct elf *elf, struct section *sec);
-struct symbol *elf_create_prefix_symbol(struct elf *elf, struct symbol *orig,
- size_t size);
void *elf_add_data(struct elf *elf, struct section *sec, const void *data,
size_t size);
@@ -274,7 +275,7 @@ static inline bool is_local_sym(struct symbol *sym)
static inline bool is_prefix_func(struct symbol *sym)
{
- return is_func_sym(sym) && sym->prefix;
+ return sym->prefix;
}
static inline bool is_reloc_sec(struct section *sec)
@@ -414,6 +415,15 @@ static inline void set_reloc_type(struct elf *elf, struct reloc *reloc, unsigned
mark_sec_changed(elf, reloc->sec, true);
}
+static inline unsigned int annotype(struct elf *elf, struct section *sec,
+ struct reloc *reloc)
+{
+ unsigned int type;
+
+ type = *(u32 *)(sec->data->d_buf + (reloc_idx(reloc) * 8) + 4);
+ return bswap_if_needed(elf, type);
+}
+
#define RELOC_JUMP_TABLE_BIT 1UL
/* Does reloc mark the beginning of a jump table? */
@@ -445,9 +455,6 @@ static inline void set_sym_next_reloc(struct reloc *reloc, struct reloc *next)
#define sec_for_each_sym(sec, sym) \
list_for_each_entry(sym, &sec->symbol_list, list)
-#define sec_for_each_sym_continue_reverse(sec, sym) \
- list_for_each_entry_continue_reverse(sym, &sec->symbol_list, list)
-
#define sec_prev_sym(sym) \
sym->sec && sym->list.prev != &sym->sec->symbol_list ? \
list_prev_entry(sym, list) : NULL
@@ -467,6 +474,10 @@ static inline void set_sym_next_reloc(struct reloc *reloc, struct reloc *next)
#define for_each_reloc_from(rsec, reloc) \
for (; reloc; reloc = rsec_next_reloc(rsec, reloc))
+#define for_each_reloc_continue(rsec, reloc) \
+ for (reloc = rsec_next_reloc(rsec, reloc); reloc; \
+ reloc = rsec_next_reloc(rsec, reloc))
+
#define sym_for_each_reloc(elf, sym, reloc) \
for (reloc = find_reloc_by_dest_range(elf, sym->sec, \
sym->offset, sym->len); \
diff --git a/tools/objtool/include/objtool/endianness.h b/tools/objtool/include/objtool/endianness.h
index 4d2aa9b0fe2fd..aebcd23386685 100644
--- a/tools/objtool/include/objtool/endianness.h
+++ b/tools/objtool/include/objtool/endianness.h
@@ -4,7 +4,6 @@
#include <linux/kernel.h>
#include <endian.h>
-#include <objtool/elf.h>
/*
* Does a byte swap if target file endianness doesn't match the host, i.e. cross
@@ -12,16 +11,16 @@
* To be used for multi-byte values conversion, which are read from / about
* to be written to a target native endianness ELF file.
*/
-static inline bool need_bswap(struct elf *elf)
+static inline bool need_bswap(GElf_Ehdr *ehdr)
{
return (__BYTE_ORDER == __LITTLE_ENDIAN) ^
- (elf->ehdr.e_ident[EI_DATA] == ELFDATA2LSB);
+ (ehdr->e_ident[EI_DATA] == ELFDATA2LSB);
}
-#define bswap_if_needed(elf, val) \
+#define __bswap_if_needed(ehdr, val) \
({ \
__typeof__(val) __ret; \
- bool __need_bswap = need_bswap(elf); \
+ bool __need_bswap = need_bswap(ehdr); \
switch (sizeof(val)) { \
case 8: \
__ret = __need_bswap ? bswap_64(val) : (val); break; \
diff --git a/tools/objtool/include/objtool/objtool.h b/tools/objtool/include/objtool/objtool.h
index 731965a742e99..f7051bbe0bcb2 100644
--- a/tools/objtool/include/objtool/objtool.h
+++ b/tools/objtool/include/objtool/objtool.h
@@ -14,8 +14,6 @@
#define __weak __attribute__((weak))
-#define SYM_CHECKSUM_SEC ".discard.sym_checksum"
-
struct pv_state {
bool clean;
struct list_head targets;
diff --git a/tools/objtool/klp-diff.c b/tools/objtool/klp-diff.c
index 15b554b53da63..4d1f9e9977eb9 100644
--- a/tools/objtool/klp-diff.c
+++ b/tools/objtool/klp-diff.c
@@ -14,6 +14,7 @@
#include <objtool/util.h>
#include <arch/special.h>
+#include <linux/objtool_types.h>
#include <linux/livepatch_external.h>
#include <linux/stringify.h>
#include <linux/string.h>
@@ -167,15 +168,15 @@ static int read_sym_checksums(struct elf *elf)
{
struct section *sec;
- sec = find_section_by_name(elf, SYM_CHECKSUM_SEC);
+ sec = find_section_by_name(elf, ".discard.sym_checksum");
if (!sec) {
- ERROR("'%s' missing " SYM_CHECKSUM_SEC " section, file not processed by 'objtool --checksum'?",
+ ERROR("'%s' missing .discard.sym_checksum section, file not processed by 'objtool --checksum'?",
elf->name);
return -1;
}
if (!sec->rsec) {
- ERROR("missing reloc section for " SYM_CHECKSUM_SEC);
+ ERROR("missing reloc section for .discard.sym_checksum");
return -1;
}
@@ -297,7 +298,7 @@ static bool is_special_section(struct section *sec)
static const char * const non_special_discards[] = {
".discard.addressable",
- SYM_CHECKSUM_SEC,
+ ".discard.sym_checksum",
};
if (is_text_sec(sec))
@@ -1150,6 +1151,135 @@ static int clone_sym_relocs(struct elfs *e, struct symbol *patched_sym)
}
+static int create_fake_symbol(struct elf *elf, struct section *sec,
+ unsigned long offset, size_t size)
+{
+ char name[SYM_NAME_LEN];
+ unsigned int type;
+ static int ctr;
+ char *c;
+
+ if (snprintf_check(name, SYM_NAME_LEN, "%s_%d", sec->name, ctr++))
+ return -1;
+
+ for (c = name; *c; c++)
+ if (*c == '.')
+ *c = '_';
+
+ /*
+ * STT_NOTYPE: Prevent objtool from validating .altinstr_replacement
+ * while still allowing objdump to disassemble it.
+ */
+ type = is_text_sec(sec) ? STT_NOTYPE : STT_OBJECT;
+ return elf_create_symbol(elf, name, sec, STB_LOCAL, type, offset, size) ? 0 : -1;
+}
+
+/*
+ * Special sections (alternatives, etc) are basically arrays of structs.
+ * For all the special sections, create a symbol for each struct entry. This
+ * is a bit cumbersome, but it makes the extracting of the individual entries
+ * much more straightforward.
+ *
+ * There are three ways to identify the entry sizes for a special section:
+ *
+ * 1) ELF section header sh_entsize: Ideally this would be used almost
+ * everywhere. But unfortunately the toolchains make it difficult. The
+ * assembler .[push]section directive syntax only takes entsize when
+ * combined with SHF_MERGE. But Clang disallows combining SHF_MERGE with
+ * SHF_WRITE. And some special sections do need to be writable.
+ *
+ * Another place this wouldn't work is .altinstr_replacement, whose entries
+ * don't have a fixed size.
+ *
+ * 2) ANNOTATE_DATA_SPECIAL: This is a lightweight objtool annotation which
+ * points to the beginning of each entry. The size of the entry is then
+ * inferred by the location of the subsequent annotation (or end of
+ * section).
+ *
+ * 3) Simple array of pointers: If the special section is just a basic array of
+ * pointers, the entry size can be inferred by the number of relocations.
+ * No annotations needed.
+ *
+ * Note I also tried to create per-entry symbols at the time of creation, in
+ * the original [inline] asm. Unfortunately, creating uniquely named symbols
+ * is trickier than one might think, especially with Clang inline asm. I
+ * eventually just gave up trying to make that work, in favor of using
+ * ANNOTATE_DATA_SPECIAL and creating the symbols here after the fact.
+ */
+static int create_fake_symbols(struct elf *elf)
+{
+ struct section *sec;
+ struct reloc *reloc;
+
+ /*
+ * 1) Make symbols for all the ANNOTATE_DATA_SPECIAL entries:
+ */
+
+ sec = find_section_by_name(elf, ".discard.annotate_data");
+ if (!sec || !sec->rsec)
+ return 0;
+
+ for_each_reloc(sec->rsec, reloc) {
+ unsigned long offset, size;
+ struct reloc *next_reloc;
+
+ if (annotype(elf, sec, reloc) != ANNOTYPE_DATA_SPECIAL)
+ continue;
+
+ offset = reloc_addend(reloc);
+
+ size = 0;
+ next_reloc = reloc;
+ for_each_reloc_continue(sec->rsec, next_reloc) {
+ if (annotype(elf, sec, next_reloc) != ANNOTYPE_DATA_SPECIAL ||
+ next_reloc->sym->sec != reloc->sym->sec)
+ continue;
+
+ size = reloc_addend(next_reloc) - offset;
+ break;
+ }
+
+ if (!size)
+ size = sec_size(reloc->sym->sec) - offset;
+
+ if (create_fake_symbol(elf, reloc->sym->sec, offset, size))
+ return -1;
+ }
+
+ /*
+ * 2) Make symbols for sh_entsize, and simple arrays of pointers:
+ */
+
+ for_each_sec(elf, sec) {
+ unsigned int entry_size;
+ unsigned long offset;
+
+ if (!is_special_section(sec) || find_symbol_by_offset(sec, 0))
+ continue;
+
+ if (!sec->rsec) {
+ ERROR("%s: missing special section relocations", sec->name);
+ return -1;
+ }
+
+ entry_size = sec->sh.sh_entsize;
+ if (!entry_size) {
+ entry_size = arch_reloc_size(sec->rsec->relocs);
+ if (sec_size(sec) != entry_size * sec_num_entries(sec->rsec)) {
+ ERROR("%s: missing special section entsize or annotations", sec->name);
+ return -1;
+ }
+ }
+
+ for (offset = 0; offset < sec_size(sec); offset += entry_size) {
+ if (create_fake_symbol(elf, sec, offset, entry_size))
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
/* Keep a special section entry if it references an included function */
static bool should_keep_special_sym(struct elf *elf, struct symbol *sym)
{
@@ -1260,99 +1390,12 @@ static int validate_special_section_klp_reloc(struct elfs *e, struct symbol *sym
return ret;
}
-static int special_section_entry_size(struct section *sec)
-{
- unsigned int reloc_size;
-
- if ((sec->sh.sh_flags & SHF_MERGE) && sec->sh.sh_entsize)
- return sec->sh.sh_entsize;
-
- if (!sec->rsec)
- return 0;
-
- /* Check for a simple array of pointers */
- reloc_size = arch_reloc_size(sec->rsec->relocs);
- if (sec_size(sec) == reloc_size * sec_num_entries(sec->rsec))
- return reloc_size;
-
- return 0;
-}
-
-static int create_fake_symbol(struct elf *elf, struct section *sec,
- unsigned long offset, size_t size)
-{
- char name[SYM_NAME_LEN];
- unsigned int type;
- static int ctr;
- char *c;
-
- if (snprintf_check(name, SYM_NAME_LEN, "__DISCARD_%s_%d", sec->name, ctr++))
- return -1;
-
- for (c = name; *c; c++)
- if (*c == '.')
- *c = '_';
-
- /*
- * STT_NOTYPE: Prevent objtool from validating .altinstr_replacement
- * while still allowing objdump to disassemble it.
- */
- type = is_text_sec(sec) ? STT_NOTYPE : STT_OBJECT;
- if (!elf_create_symbol(elf, name, sec, STB_LOCAL, type, offset, size))
- return -1;
-
- return 0;
-}
-
static int clone_special_section(struct elfs *e, struct section *patched_sec)
{
struct symbol *patched_sym;
- unsigned int entry_size;
- unsigned long offset;
-
- entry_size = special_section_entry_size(patched_sec);
- if (!entry_size) {
- /*
- * Any special section more complex than a simple array of
- * pointers must have its entry size specified in sh_entsize
- * (and the SHF_MERGE flag set so the linker preserves it).
- *
- * Clang older than version 20 doesn't properly preserve
- * sh_entsize and will error out here.
- */
- ERROR("%s: buggy linker and/or missing sh_entsize", patched_sec->name);
- return -1;
- }
/*
- * In the patched object, create a fake symbol for each special section
- * entry. This makes the below extracting of entries much easier.
- */
- for (offset = 0; offset < sec_size(patched_sec); offset += entry_size) {
- if (create_fake_symbol(e->patched, patched_sec, offset, entry_size))
- return -1;
-
- /* Symbolize alternative replacements: */
- if (!strcmp(patched_sec->name, ".altinstructions")) {
- struct reloc *reloc;
- unsigned char size;
-
- reloc = find_reloc_by_dest(e->patched, patched_sec, offset + ALT_NEW_OFFSET);
- if (!reloc) {
- ERROR_FUNC(patched_sec, offset + ALT_NEW_OFFSET, "can't find new reloc");
- return -1;
- }
-
- size = *(unsigned char *)(patched_sec->data->d_buf + offset + ALT_NEW_LEN_OFFSET);
-
- if (create_fake_symbol(e->patched, reloc->sym->sec,
- reloc->sym->offset + reloc_addend(reloc), size))
- return -1;
- }
- }
-
- /*
- * Extract all special section entries (and their dependencies) which
+ * Extract all special section symbols (and their dependencies) which
* reference included functions.
*/
sec_for_each_sym(patched_sec, patched_sym) {
@@ -1382,6 +1425,9 @@ static int clone_special_sections(struct elfs *e)
{
struct section *patched_sec;
+ if (create_fake_symbols(e->patched))
+ return -1;
+
for_each_sec(e->patched, patched_sec) {
if (is_special_section(patched_sec)) {
if (clone_special_section(e, patched_sec))
diff --git a/tools/objtool/orc_dump.c b/tools/objtool/orc_dump.c
index 1dd9fc18fe624..5a979f52425ab 100644
--- a/tools/objtool/orc_dump.c
+++ b/tools/objtool/orc_dump.c
@@ -8,7 +8,6 @@
#include <objtool/objtool.h>
#include <objtool/orc.h>
#include <objtool/warn.h>
-#include <objtool/endianness.h>
int orc_dump(const char *filename)
{
diff --git a/tools/objtool/orc_gen.c b/tools/objtool/orc_gen.c
index 9d380abc2ed35..1045e1380ffde 100644
--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/orc_gen.c
@@ -12,7 +12,6 @@
#include <objtool/check.h>
#include <objtool/orc.h>
#include <objtool/warn.h>
-#include <objtool/endianness.h>
struct orc_list_entry {
struct list_head list;
diff --git a/tools/objtool/special.c b/tools/objtool/special.c
index fc2cf8dba1c03..e262af9171436 100644
--- a/tools/objtool/special.c
+++ b/tools/objtool/special.c
@@ -15,7 +15,6 @@
#include <objtool/builtin.h>
#include <objtool/special.h>
#include <objtool/warn.h>
-#include <objtool/endianness.h>
struct special_entry {
const char *sec;
diff --git a/tools/objtool/sync-check.sh b/tools/objtool/sync-check.sh
index e1d98fb031575..e38167ca56a95 100755
--- a/tools/objtool/sync-check.sh
+++ b/tools/objtool/sync-check.sh
@@ -16,6 +16,7 @@ arch/x86/include/asm/orc_types.h
arch/x86/include/asm/emulate_prefix.h
arch/x86/lib/x86-opcode-map.txt
arch/x86/tools/gen-insn-attr-x86.awk
+include/linux/interval_tree_generic.h
include/linux/livepatch_external.h
include/linux/static_call_types.h
"
^ permalink raw reply related
* Re: [PATCH v3 42/64] kbuild,x86: Fix special section module permissions
From: Josh Poimboeuf @ 2025-09-16 23:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, linux-kernel, Petr Mladek, Miroslav Benes, Joe Lawrence,
live-patching, Song Liu, laokz, Jiri Kosina,
Marcos Paulo de Souza, Weinan Liu, Fazla Mehrab, Chen Zhongjin,
Puranjay Mohan, Dylan Hatch, Masahiro Yamada
In-Reply-To: <20250630073144.GE1613200@noisy.programming.kicks-ass.net>
On Mon, Jun 30, 2025 at 09:31:44AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 27, 2025 at 10:34:15AM -0700, Josh Poimboeuf wrote:
> > On Fri, Jun 27, 2025 at 12:53:28PM +0200, Peter Zijlstra wrote:
> > > On Thu, Jun 26, 2025 at 04:55:29PM -0700, Josh Poimboeuf wrote:
> > > > An upcoming patch will add the SHF_MERGE flag to x86 __jump_table and
> > > > __bug_table so their entry sizes can be defined in inline asm.
> > > >
> > > > However, those sections have SHF_WRITE, which the Clang linker (lld)
> > > > explicitly forbids combining with SHF_MERGE.
> > > >
> > > > Those sections are modified at runtime and must remain writable. While
> > > > SHF_WRITE is ignored by vmlinux, it's still needed for modules.
> > > >
> > > > To work around the linker interference, remove SHF_WRITE during
> > > > compilation and restore it after linking the module.
> > >
> > > This is vile... but I'm not sure I have a better solution.
> > >
> > > Eventually we should get the toolchains fixed, but we can't very well
> > > mandate clang-21+ to build x86 just yet.
> >
> > Yeah, I really hate this too. I really tried to find something better,
> > including mucking with the linker script, but this was unfortunately the
> > only thing that worked.
> >
> > Though, looking at it again, I realize we can localize the pain to Clang
> > (and the makefile) by leaving the code untouched and instead strip
> > SHF_WRITE before the link and re-add it afterwards. Then we can tie
> > this horrible hack to specific Clang versions when it gets fixed.
>
> Oh yeah, that might be nicer indeed!
I ended up giving up on this approach. It was just too painful trying
to get the toolchain to cooperate. It does kind of make sense that
SHF_WRITE and SHF_MERGE are incompatible.
What I really needed was a way to set entsize from assembly, but that
doesn't seem possible without SHF_MERGE.
For now I'm just going with marking the beginning of each special
section entry with an objtool annotation. Then objtool can deduce the
sizes from the distance between the annotations.
--
Josh
^ permalink raw reply
* [PATCH v2] LoongArch: Fix unreliable stack for live patching
From: Tiezhu Yang @ 2025-09-16 9:35 UTC (permalink / raw)
To: Huacai Chen; +Cc: Xi Zhang, live-patching, loongarch, linux-kernel
When testing the kernel live patching with "modprobe livepatch-sample",
there is a timeout over 15 seconds from "starting patching transition"
to "patching complete", dmesg shows "unreliable stack" for user tasks
in debug mode, here is one of the messages:
livepatch: klp_try_switch_task: bash:1193 has an unreliable stack
The "unreliable stack" is because it can not unwind from do_syscall()
to its previous frame handle_syscall(), it should use fp to find the
original stack top due to secondary stack in do_syscall(), but fp is
not used for some other functions, then fp can not be restored by the
next frame of do_syscall(), so it is necessary to save fp if task is
not current to get the stack top of do_syscall().
Here are the call chains:
klp_enable_patch()
klp_try_complete_transition()
klp_try_switch_task()
klp_check_and_switch_task()
klp_check_stack()
stack_trace_save_tsk_reliable()
arch_stack_walk_reliable()
When executing "rmmod livepatch-sample", there exists the similar issue.
With this patch, it takes a short time for patching and unpatching.
Before:
# modprobe livepatch-sample
# dmesg -T | tail -3
[Sat Sep 6 11:00:20 2025] livepatch: 'livepatch_sample': starting patching transition
[Sat Sep 6 11:00:35 2025] livepatch: signaling remaining tasks
[Sat Sep 6 11:00:36 2025] livepatch: 'livepatch_sample': patching complete
# echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
# rmmod livepatch_sample
rmmod: ERROR: Module livepatch_sample is in use
# rmmod livepatch_sample
# dmesg -T | tail -3
[Sat Sep 6 11:06:05 2025] livepatch: 'livepatch_sample': starting unpatching transition
[Sat Sep 6 11:06:20 2025] livepatch: signaling remaining tasks
[Sat Sep 6 11:06:21 2025] livepatch: 'livepatch_sample': unpatching complete
After:
# modprobe livepatch-sample
# dmesg -T | tail -2
[Tue Sep 16 16:19:30 2025] livepatch: 'livepatch_sample': starting patching transition
[Tue Sep 16 16:19:31 2025] livepatch: 'livepatch_sample': patching complete
# echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
# rmmod livepatch_sample
# dmesg -T | tail -2
[Tue Sep 16 16:19:36 2025] livepatch: 'livepatch_sample': starting unpatching transition
[Tue Sep 16 16:19:37 2025] livepatch: 'livepatch_sample': unpatching complete
Cc: stable@vger.kernel.org # v6.9+
Fixes: 199cc14cb4f1 ("LoongArch: Add kernel livepatching support")
Reported-by: Xi Zhang <zhangxi@kylinos.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
arch/loongarch/kernel/stacktrace.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/loongarch/kernel/stacktrace.c b/arch/loongarch/kernel/stacktrace.c
index 9a038d1070d7..387dc4d3c486 100644
--- a/arch/loongarch/kernel/stacktrace.c
+++ b/arch/loongarch/kernel/stacktrace.c
@@ -51,12 +51,13 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
if (task == current) {
regs->regs[3] = (unsigned long)__builtin_frame_address(0);
regs->csr_era = (unsigned long)__builtin_return_address(0);
+ regs->regs[22] = 0;
} else {
regs->regs[3] = thread_saved_fp(task);
regs->csr_era = thread_saved_ra(task);
+ regs->regs[22] = task->thread.reg22;
}
regs->regs[1] = 0;
- regs->regs[22] = 0;
for (unwind_start(&state, task, regs);
!unwind_done(&state) && !unwind_error(&state); unwind_next_frame(&state)) {
--
2.42.0
^ permalink raw reply related
* Re: [PATCH v1 2/2] LoongArch: Return 0 for user tasks in arch_stack_walk_reliable()
From: Tiezhu Yang @ 2025-09-15 8:54 UTC (permalink / raw)
To: Jinyang He
Cc: Josh Poimboeuf, Huacai Chen, Xi Zhang, live-patching, loongarch,
linux-kernel
In-Reply-To: <26036193-f570-3a17-e6d3-45ad70704198@loongson.cn>
On 2025/9/12 上午9:55, Jinyang He wrote:
> On 2025-09-11 19:49, Tiezhu Yang wrote:
>
>> On 2025/9/10 上午9:11, Jinyang He wrote:
>>> On 2025-09-09 19:31, Tiezhu Yang wrote:
>>>
>>>> When testing the kernel live patching with "modprobe livepatch-sample",
>>>> there is a timeout over 15 seconds from "starting patching transition"
>>>> to "patching complete", dmesg shows "unreliable stack" for user tasks
>>>> in debug mode. When executing "rmmod livepatch-sample", there exists
>>>> the similar issue.
...
>> for this case, get_stack_info() does not return 0 due to in_task_stack()
>> is not true, then goto error, state->stack_info.type = STACK_TYPE_UNKNOWN
>> and state->error = true. In arch_stack_walk_reliable(), the loop will be
>> break and it returns -EINVAL, thus causing unreliable stack.
> The stop position of a complete stack backtrace on LoongArch should be
> the top of the task stack or until the address is_entry_func.
> Otherwise, it is not a complete stack backtrace, and thus I think it
> is an "unreliable stack".
> I'm curious about what the ORC info at this PC.
The unwind process has problem, I have found the root cause and am
working to fix the "unreliable stack" issue, it should and can find
the last frame, and then the user_mode() check is not necessary.
Thanks,
Tiezhu
^ permalink raw reply
* Re: [PATCH v1 2/2] LoongArch: Return 0 for user tasks in arch_stack_walk_reliable()
From: Tiezhu Yang @ 2025-09-15 8:48 UTC (permalink / raw)
To: Miroslav Benes
Cc: Josh Poimboeuf, Huacai Chen, Xi Zhang, live-patching, loongarch,
linux-kernel
In-Reply-To: <alpine.LSU.2.21.2509111541590.29971@pobox.suse.cz>
On 2025/9/11 下午9:44, Miroslav Benes wrote:
> Hi,
>
> On Tue, 9 Sep 2025, Tiezhu Yang wrote:
>
>> When testing the kernel live patching with "modprobe livepatch-sample",
>> there is a timeout over 15 seconds from "starting patching transition"
>> to "patching complete", dmesg shows "unreliable stack" for user tasks
>> in debug mode. When executing "rmmod livepatch-sample", there exists
>> the similar issue.
>>
>> Like x86, arch_stack_walk_reliable() should return 0 for user tasks.
>> It is necessary to set regs->csr_prmd as task->thread.csr_prmd first,
>> then use user_mode() to check whether the task is in userspace.
>
> it is a nice optimization for sure, but "unreliable stack" messages point
> to a fact that the unwinding of these tasks is probably suboptimal and
> could be improved, no?
Yes, makes sense, I will fix "unreliable stack" in the next version.
> It would also be nice to include these messages (not for all tasks) to the
> commit message.
OK, will do it.
Thanks,
Tiezhu
^ permalink raw reply
* Re: [PATCH v2 0/3] powerpc/ftrace: Fix livepatch module OOL ftrace corruption
From: Naveen N Rao @ 2025-09-15 5:43 UTC (permalink / raw)
To: Joe Lawrence
Cc: linuxppc-dev, live-patching, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy
In-Reply-To: <20250912142740.3581368-1-joe.lawrence@redhat.com>
On Fri, Sep 12, 2025 at 10:27:37AM -0400, Joe Lawrence wrote:
> This patch series fixes a couple of bugs in the powerpc64 out-of-line
> (OOL) ftrace support for modules, and follows up with a patch to
> simplify the module .stubs allocation code. An analysis of the module
> stub area corruption that prompted this work can be found in the v1
> thread [1].
>
> The first two patches fix bugs introduced by commit eec37961a56a
> ("powerpc64/ftrace: Move ftrace sequence out of line"). The first,
> suggested by Naveen, ensures that a NOP'd ftrace call site has its
> ftrace_ops record updated correctly. The second patch corrects a loop in
> setup_ftrace_ool_stubs() to ensure all required stubs are reserved, not
> just the first. Together, these bugs lead to potential corruption of the
> OOL ftrace stubs area for livepatch modules.
>
> The final patch replaces the sentinel-based allocation in the module
> .stubs section with an explicit counter. This improves clarity and helps
> prevent similar problems in the future.
>
> Changes since v1: https://lore.kernel.org/live-patching/df7taxdxpbo4qfn7lniggj5o4ili6kweg4nytyb2fwwwgmnyo4@halp5gf244nn/T/
>
> - Split into parts: bug fix x2, code cleanup
> - Call ftrace_rec_set_nop_ops() from ftrace_init_nop() [Naveen]
> - Update commit msg on cleanup patch [Naveen]
>
> Joe Lawrence (3):
> powerpc/ftrace: ensure ftrace record ops are always set for NOPs
> powerpc64/modules: correctly iterate over stubs in
> setup_ftrace_ool_stubs
> powerpc64/modules: replace stub allocation sentinel with an explicit
> counter
>
> arch/powerpc/include/asm/module.h | 1 +
> arch/powerpc/kernel/module_64.c | 26 ++++++++------------------
> arch/powerpc/kernel/trace/ftrace.c | 10 ++++++++--
> 3 files changed, 17 insertions(+), 20 deletions(-)
Thanks for fixing this! For the series:
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
- Naveen
^ permalink raw reply
* [PATCH v2 3/3] powerpc64/modules: replace stub allocation sentinel with an explicit counter
From: Joe Lawrence @ 2025-09-12 14:27 UTC (permalink / raw)
To: linuxppc-dev, live-patching
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao
In-Reply-To: <20250912142740.3581368-1-joe.lawrence@redhat.com>
The logic for allocating ppc64_stub_entry trampolines in the .stubs
section relies on an inline sentinel, where a NULL .funcdata member
indicates an available slot.
While preceding commits fixed the initialization bugs that led to ftrace
stub corruption, the sentinel-based approach remains fragile: it depends
on an implicit convention between subsystems modifying different
struct types in the same memory area.
Replace the sentinel with an explicit counter, module->arch.num_stubs.
Instead of iterating through memory to find a NULL marker, the module
loader uses this counter as the boundary for the next free slot.
This simplifies the allocation code, hardens it against future changes
to stub structures, and removes the need for an extra relocation slot
previously reserved to terminate the sentinel search.
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
---
arch/powerpc/include/asm/module.h | 1 +
arch/powerpc/kernel/module_64.c | 26 ++++++++------------------
2 files changed, 9 insertions(+), 18 deletions(-)
diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h
index e1ee5026ac4a..864e22deaa2c 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -27,6 +27,7 @@ struct ppc_plt_entry {
struct mod_arch_specific {
#ifdef __powerpc64__
unsigned int stubs_section; /* Index of stubs section in module */
+ unsigned int stub_count; /* Number of stubs used */
#ifdef CONFIG_PPC_KERNEL_PCREL
unsigned int got_section; /* What section is the GOT? */
unsigned int pcpu_section; /* .data..percpu section */
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 0e45cac4de76..2a44bc8e2439 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -209,8 +209,7 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
char *secstrings,
struct module *me)
{
- /* One extra reloc so it's always 0-addr terminated */
- unsigned long relocs = 1;
+ unsigned long relocs = 0;
unsigned i;
/* Every relocated section... */
@@ -705,7 +704,7 @@ static unsigned long stub_for_addr(const Elf64_Shdr *sechdrs,
/* Find this stub, or if that fails, the next avail. entry */
stubs = (void *)sechdrs[me->arch.stubs_section].sh_addr;
- for (i = 0; stub_func_addr(stubs[i].funcdata); i++) {
+ for (i = 0; i < me->arch.stub_count; i++) {
if (WARN_ON(i >= num_stubs))
return 0;
@@ -716,6 +715,7 @@ static unsigned long stub_for_addr(const Elf64_Shdr *sechdrs,
if (!create_stub(sechdrs, &stubs[i], addr, me, name))
return 0;
+ me->arch.stub_count++;
return (unsigned long)&stubs[i];
}
@@ -1118,29 +1118,19 @@ int module_trampoline_target(struct module *mod, unsigned long addr,
static int setup_ftrace_ool_stubs(const Elf64_Shdr *sechdrs, unsigned long addr, struct module *me)
{
#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
- unsigned int i, total_stubs, num_stubs;
+ unsigned int total_stubs, num_stubs;
struct ppc64_stub_entry *stub;
total_stubs = sechdrs[me->arch.stubs_section].sh_size / sizeof(*stub);
num_stubs = roundup(me->arch.ool_stub_count * sizeof(struct ftrace_ool_stub),
sizeof(struct ppc64_stub_entry)) / sizeof(struct ppc64_stub_entry);
- /* Find the next available entry */
- stub = (void *)sechdrs[me->arch.stubs_section].sh_addr;
- for (i = 0; stub_func_addr(stub[i].funcdata); i++)
- if (WARN_ON(i >= total_stubs))
- return -1;
-
- if (WARN_ON(i + num_stubs > total_stubs))
+ if (WARN_ON(me->arch.stub_count + num_stubs > total_stubs))
return -1;
- stub += i;
- me->arch.ool_stubs = (struct ftrace_ool_stub *)stub;
-
- /* reserve stubs */
- for (i = 0; i < num_stubs; i++)
- if (patch_u32((void *)&stub[i].funcdata, PPC_RAW_NOP()))
- return -1;
+ stub = (void *)sechdrs[me->arch.stubs_section].sh_addr;
+ me->arch.ool_stubs = (struct ftrace_ool_stub *)(stub + me->arch.stub_count);
+ me->arch.stub_count += num_stubs;
#endif
return 0;
--
2.51.0
^ permalink raw reply related
* [PATCH v2 2/3] powerpc64/modules: correctly iterate over stubs in setup_ftrace_ool_stubs
From: Joe Lawrence @ 2025-09-12 14:27 UTC (permalink / raw)
To: linuxppc-dev, live-patching
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao
In-Reply-To: <20250912142740.3581368-1-joe.lawrence@redhat.com>
CONFIG_PPC_FTRACE_OUT_OF_LINE introduced setup_ftrace_ool_stubs() to
extend the ppc64le module .stubs section with an array of
ftrace_ool_stub structures for each patchable function.
Fix its ppc64_stub_entry stub reservation loop to properly write across
all of the num_stubs used and not just the first entry.
Fixes: eec37961a56a ("powerpc64/ftrace: Move ftrace sequence out of line")
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
---
arch/powerpc/kernel/module_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 126bf3b06ab7..0e45cac4de76 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -1139,7 +1139,7 @@ static int setup_ftrace_ool_stubs(const Elf64_Shdr *sechdrs, unsigned long addr,
/* reserve stubs */
for (i = 0; i < num_stubs; i++)
- if (patch_u32((void *)&stub->funcdata, PPC_RAW_NOP()))
+ if (patch_u32((void *)&stub[i].funcdata, PPC_RAW_NOP()))
return -1;
#endif
--
2.51.0
^ permalink raw reply related
* [PATCH v2 0/3] powerpc/ftrace: Fix livepatch module OOL ftrace corruption
From: Joe Lawrence @ 2025-09-12 14:27 UTC (permalink / raw)
To: linuxppc-dev, live-patching
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao
This patch series fixes a couple of bugs in the powerpc64 out-of-line
(OOL) ftrace support for modules, and follows up with a patch to
simplify the module .stubs allocation code. An analysis of the module
stub area corruption that prompted this work can be found in the v1
thread [1].
The first two patches fix bugs introduced by commit eec37961a56a
("powerpc64/ftrace: Move ftrace sequence out of line"). The first,
suggested by Naveen, ensures that a NOP'd ftrace call site has its
ftrace_ops record updated correctly. The second patch corrects a loop in
setup_ftrace_ool_stubs() to ensure all required stubs are reserved, not
just the first. Together, these bugs lead to potential corruption of the
OOL ftrace stubs area for livepatch modules.
The final patch replaces the sentinel-based allocation in the module
.stubs section with an explicit counter. This improves clarity and helps
prevent similar problems in the future.
Changes since v1: https://lore.kernel.org/live-patching/df7taxdxpbo4qfn7lniggj5o4ili6kweg4nytyb2fwwwgmnyo4@halp5gf244nn/T/
- Split into parts: bug fix x2, code cleanup
- Call ftrace_rec_set_nop_ops() from ftrace_init_nop() [Naveen]
- Update commit msg on cleanup patch [Naveen]
Joe Lawrence (3):
powerpc/ftrace: ensure ftrace record ops are always set for NOPs
powerpc64/modules: correctly iterate over stubs in
setup_ftrace_ool_stubs
powerpc64/modules: replace stub allocation sentinel with an explicit
counter
arch/powerpc/include/asm/module.h | 1 +
arch/powerpc/kernel/module_64.c | 26 ++++++++------------------
arch/powerpc/kernel/trace/ftrace.c | 10 ++++++++--
3 files changed, 17 insertions(+), 20 deletions(-)
--
2.51.0
^ permalink raw reply
* [PATCH v2 1/3] powerpc/ftrace: ensure ftrace record ops are always set for NOPs
From: Joe Lawrence @ 2025-09-12 14:27 UTC (permalink / raw)
To: linuxppc-dev, live-patching
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao
In-Reply-To: <20250912142740.3581368-1-joe.lawrence@redhat.com>
When an ftrace call site is converted to a NOP, its corresponding
dyn_ftrace record should have its ftrace_ops pointer set to
ftrace_nop_ops.
Correct the powerpc implementation to ensure the
ftrace_rec_set_nop_ops() helper is called on all successful NOP
initialization paths. This ensures all ftrace records are consistent
before being handled by the ftrace core.
Fixes: eec37961a56a ("powerpc64/ftrace: Move ftrace sequence out of line")
Suggested-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
---
arch/powerpc/kernel/trace/ftrace.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 6dca92d5a6e8..841d077e2825 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -488,8 +488,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
return ret;
/* Set up out-of-line stub */
- if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
- return ftrace_init_ool_stub(mod, rec);
+ if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE)) {
+ ret = ftrace_init_ool_stub(mod, rec);
+ goto out;
+ }
/* Nop-out the ftrace location */
new = ppc_inst(PPC_RAW_NOP());
@@ -520,6 +522,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
return -EINVAL;
}
+out:
+ if (!ret)
+ ret = ftrace_rec_set_nop_ops(rec);
+
return ret;
}
--
2.51.0
^ permalink raw reply related
* Re: [PATCH v1 1/2] livepatch: Add config LIVEPATCH_DEBUG to get debug information
From: Miroslav Benes @ 2025-09-12 7:14 UTC (permalink / raw)
To: Tiezhu Yang
Cc: Josh Poimboeuf, Huacai Chen, Xi Zhang, live-patching, loongarch,
linux-kernel
In-Reply-To: <8dcdba05-4d8c-83ff-f337-b6e71546e1a0@loongson.cn>
[-- Attachment #1: Type: text/plain, Size: 1151 bytes --]
On Fri, 12 Sep 2025, Tiezhu Yang wrote:
> On 2025/9/11 下午9:50, Miroslav Benes wrote:
> > Hi,
> >
> > On Tue, 9 Sep 2025, Tiezhu Yang wrote:
> >
> >> Add config LIVEPATCH_DEBUG and define DEBUG if CONFIG_LIVEPATCH_DEBUG
> >> is set, then pr_debug() can print a debug level message, it is a easy
> >> way to get debug information without dynamic debugging.
> >
> > I do not have a strong opinion but is it really worth it? Configuring
>
> This is an alternative way, there are some similar usages:
>
> drivers/iommu/exynos-iommu.c:
> #ifdef CONFIG_EXYNOS_IOMMU_DEBUG
> #define DEBUG
> #endif
>
> drivers/mtd/nand/raw/s3c2410.c:
> #ifdef CONFIG_MTD_NAND_S3C2410_DEBUG
> #define DEBUG
> #endif
>
> drivers/usb/storage/usb.c:
> #ifdef CONFIG_USB_STORAGE_DEBUG
> #define DEBUG
> #endif
>
> > dynamic debug is not difficult, it is more targetted (you can enable it
> > just for a subset of functions in livepatch subsystem) and it can also be
> > done on the command line.
>
> Yes, this is true. It is up to the maintainers to apply this patch
> or not.
Right and I do not see the point to have it in the tree for the reasons
above.
Miroslav
^ permalink raw reply
* Re: [PATCH v1 1/2] livepatch: Add config LIVEPATCH_DEBUG to get debug information
From: Tiezhu Yang @ 2025-09-12 2:26 UTC (permalink / raw)
To: Miroslav Benes
Cc: Josh Poimboeuf, Huacai Chen, Xi Zhang, live-patching, loongarch,
linux-kernel
In-Reply-To: <alpine.LSU.2.21.2509111549200.29971@pobox.suse.cz>
On 2025/9/11 下午9:50, Miroslav Benes wrote:
> Hi,
>
> On Tue, 9 Sep 2025, Tiezhu Yang wrote:
>
>> Add config LIVEPATCH_DEBUG and define DEBUG if CONFIG_LIVEPATCH_DEBUG
>> is set, then pr_debug() can print a debug level message, it is a easy
>> way to get debug information without dynamic debugging.
>
> I do not have a strong opinion but is it really worth it? Configuring
This is an alternative way, there are some similar usages:
drivers/iommu/exynos-iommu.c:
#ifdef CONFIG_EXYNOS_IOMMU_DEBUG
#define DEBUG
#endif
drivers/mtd/nand/raw/s3c2410.c:
#ifdef CONFIG_MTD_NAND_S3C2410_DEBUG
#define DEBUG
#endif
drivers/usb/storage/usb.c:
#ifdef CONFIG_USB_STORAGE_DEBUG
#define DEBUG
#endif
> dynamic debug is not difficult, it is more targetted (you can enable it
> just for a subset of functions in livepatch subsystem) and it can also be
> done on the command line.
Yes, this is true. It is up to the maintainers to apply this patch
or not.
Thanks,
Tiezhu
^ permalink raw reply
* Re: [PATCH v1 2/2] LoongArch: Return 0 for user tasks in arch_stack_walk_reliable()
From: Jinyang He @ 2025-09-12 1:55 UTC (permalink / raw)
To: Tiezhu Yang
Cc: Josh Poimboeuf, Huacai Chen, Xi Zhang, live-patching, loongarch,
linux-kernel
In-Reply-To: <c3431ce4-0026-3a05-fa50-281cd34aba4e@loongson.cn>
On 2025-09-11 19:49, Tiezhu Yang wrote:
> On 2025/9/10 上午9:11, Jinyang He wrote:
>> On 2025-09-09 19:31, Tiezhu Yang wrote:
>>
>>> When testing the kernel live patching with "modprobe livepatch-sample",
>>> there is a timeout over 15 seconds from "starting patching transition"
>>> to "patching complete", dmesg shows "unreliable stack" for user tasks
>>> in debug mode. When executing "rmmod livepatch-sample", there exists
>>> the similar issue.
>
> ...
>
>>> @@ -57,9 +62,14 @@ int
>>> arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
>>> }
>>> regs->regs[1] = 0;
>>> regs->regs[22] = 0;
>>> + regs->csr_prmd = task->thread.csr_prmd;
>>> for (unwind_start(&state, task, regs);
>>> !unwind_done(&state) && !unwind_error(&state);
>>> unwind_next_frame(&state)) {
>>> + /* Success path for user tasks */
>>> + if (user_mode(regs))
>>> + return 0;
>>> +
>>> addr = unwind_get_return_address(&state);
>>> /*
>> Hi, Tiezhu,
>>
>> We update stack info by get_stack_info when meet ORC_TYPE_REGS in
>> unwind_next_frame. And in arch_stack_walk(_reliable), we always
>> do unwind_done before unwind_next_frame. So is there anything
>> error in get_stack_info which causing regs is user_mode while
>> stack is not STACK_TYPE_UNKNOWN?
>
> When testing the kernel live patching, the error code path in
> unwind_next_frame() is:
>
> switch (orc->fp_reg) {
> case ORC_REG_PREV_SP:
> p = (unsigned long *)(state->sp + orc->fp_offset);
> if (!stack_access_ok(state, (unsigned long)p,
> sizeof(unsigned long)))
> goto err;
>
> for this case, get_stack_info() does not return 0 due to in_task_stack()
> is not true, then goto error, state->stack_info.type = STACK_TYPE_UNKNOWN
> and state->error = true. In arch_stack_walk_reliable(), the loop will be
> break and it returns -EINVAL, thus causing unreliable stack.
The stop position of a complete stack backtrace on LoongArch should be
the top of the task stack or until the address is_entry_func.
Otherwise, it is not a complete stack backtrace, and thus I think it
is an "unreliable stack".
I'm curious about what the ORC info at this PC.
^ permalink raw reply
* Re: [PATCH v1 1/2] livepatch: Add config LIVEPATCH_DEBUG to get debug information
From: Miroslav Benes @ 2025-09-11 13:50 UTC (permalink / raw)
To: Tiezhu Yang
Cc: Josh Poimboeuf, Huacai Chen, Xi Zhang, live-patching, loongarch,
linux-kernel
In-Reply-To: <20250909113106.22992-2-yangtiezhu@loongson.cn>
Hi,
On Tue, 9 Sep 2025, Tiezhu Yang wrote:
> Add config LIVEPATCH_DEBUG and define DEBUG if CONFIG_LIVEPATCH_DEBUG
> is set, then pr_debug() can print a debug level message, it is a easy
> way to get debug information without dynamic debugging.
I do not have a strong opinion but is it really worth it? Configuring
dynamic debug is not difficult, it is more targetted (you can enable it
just for a subset of functions in livepatch subsystem) and it can also be
done on the command line.
Miroslav
^ permalink raw reply
* Re: [PATCH v1 2/2] LoongArch: Return 0 for user tasks in arch_stack_walk_reliable()
From: Miroslav Benes @ 2025-09-11 13:44 UTC (permalink / raw)
To: Tiezhu Yang
Cc: Josh Poimboeuf, Huacai Chen, Xi Zhang, live-patching, loongarch,
linux-kernel
In-Reply-To: <20250909113106.22992-3-yangtiezhu@loongson.cn>
Hi,
On Tue, 9 Sep 2025, Tiezhu Yang wrote:
> When testing the kernel live patching with "modprobe livepatch-sample",
> there is a timeout over 15 seconds from "starting patching transition"
> to "patching complete", dmesg shows "unreliable stack" for user tasks
> in debug mode. When executing "rmmod livepatch-sample", there exists
> the similar issue.
>
> Like x86, arch_stack_walk_reliable() should return 0 for user tasks.
> It is necessary to set regs->csr_prmd as task->thread.csr_prmd first,
> then use user_mode() to check whether the task is in userspace.
it is a nice optimization for sure, but "unreliable stack" messages point
to a fact that the unwinding of these tasks is probably suboptimal and
could be improved, no?
It would also be nice to include these messages (not for all tasks) to the
commit message.
Regards
Miroslav
^ permalink raw reply
* Re: [PATCH v1 2/2] LoongArch: Return 0 for user tasks in arch_stack_walk_reliable()
From: Tiezhu Yang @ 2025-09-11 11:49 UTC (permalink / raw)
To: Jinyang He
Cc: Josh Poimboeuf, Huacai Chen, Xi Zhang, live-patching, loongarch,
linux-kernel
In-Reply-To: <5e45a1a9-4ac3-56ee-1415-0b2128b4ed9a@loongson.cn>
On 2025/9/10 上午9:11, Jinyang He wrote:
> On 2025-09-09 19:31, Tiezhu Yang wrote:
>
>> When testing the kernel live patching with "modprobe livepatch-sample",
>> there is a timeout over 15 seconds from "starting patching transition"
>> to "patching complete", dmesg shows "unreliable stack" for user tasks
>> in debug mode. When executing "rmmod livepatch-sample", there exists
>> the similar issue.
...
>> @@ -57,9 +62,14 @@ int arch_stack_walk_reliable(stack_trace_consume_fn
>> consume_entry,
>> }
>> regs->regs[1] = 0;
>> regs->regs[22] = 0;
>> + regs->csr_prmd = task->thread.csr_prmd;
>> for (unwind_start(&state, task, regs);
>> !unwind_done(&state) && !unwind_error(&state);
>> unwind_next_frame(&state)) {
>> + /* Success path for user tasks */
>> + if (user_mode(regs))
>> + return 0;
>> +
>> addr = unwind_get_return_address(&state);
>> /*
> Hi, Tiezhu,
>
> We update stack info by get_stack_info when meet ORC_TYPE_REGS in
> unwind_next_frame. And in arch_stack_walk(_reliable), we always
> do unwind_done before unwind_next_frame. So is there anything
> error in get_stack_info which causing regs is user_mode while
> stack is not STACK_TYPE_UNKNOWN?
When testing the kernel live patching, the error code path in
unwind_next_frame() is:
switch (orc->fp_reg) {
case ORC_REG_PREV_SP:
p = (unsigned long *)(state->sp + orc->fp_offset);
if (!stack_access_ok(state, (unsigned long)p,
sizeof(unsigned long)))
goto err;
for this case, get_stack_info() does not return 0 due to in_task_stack()
is not true, then goto error, state->stack_info.type = STACK_TYPE_UNKNOWN
and state->error = true. In arch_stack_walk_reliable(), the loop will be
break and it returns -EINVAL, thus causing unreliable stack.
Maybe it can check whether the task is in userspace and set
state->stack_info.type = STACK_TYPE_UNKNOWN in get_stack_info(),
but I think no need to do that because it has similar effect with
this patch.
Thanks,
Tiezhu
^ permalink raw reply
* Re: [PATCH] powerpc64/modules: fix ool-ftrace-stub vs. livepatch relocation corruption
From: Naveen N Rao @ 2025-09-11 6:25 UTC (permalink / raw)
To: Joe Lawrence
Cc: linuxppc-dev, live-patching, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy
In-Reply-To: <aMHKD_X97uu0tUyK@redhat.com>
On Wed, Sep 10, 2025 at 02:57:19PM -0400, Joe Lawrence wrote:
> On Mon, Sep 08, 2025 at 04:33:24PM +0530, Naveen N Rao wrote:
> > On Wed, Sep 03, 2025 at 10:37:39PM -0400, Joe Lawrence wrote:
> > > On Wed, Sep 03, 2025 at 10:29:50PM -0400, Joe Lawrence wrote:
> > > > The powerpc64 module .stubs section holds ppc64_stub_entry[] code
> > > > trampolines that are generated at module load time. These stubs are
> > > > necessary for function calls to external symbols that are too far away
> > > > for a simple relative branch.
> > > >
> > > > Logic for finding an available ppc64_stub_entry has relied on a sentinel
> > > > value in the funcdata member to indicate a used slot. Code iterates
> > > > through the array, inspecting .funcdata to find the first unused (zeroed)
> > > > entry:
> > > >
> > > > for (i = 0; stub_func_addr(stubs[i].funcdata); i++)
> > > >
> > > > To support CONFIG_PPC_FTRACE_OUT_OF_LINE, a new setup_ftrace_ool_stubs()
> > > > function extended the .stubs section by adding an array of
> > > > ftrace_ool_stub structures for each patchable function. A side effect
> > > > of writing these smaller structures is that the funcdata sentinel
> > > > convention is not maintained.
> >
> > There is clearly a bug in how we are reserving the stubs as you point
> > out further below, but once that is properly initialized, I don't think
> > the smaller structure size for ftrace_ool_stub matters (in so far as
> > stub->funcdata being non-NULL). We end up writing four valid powerpc
> > instructions, along with a ftrace_ops pointer before those instructions
> > which should also be non-zero (there is a bug here too, more on that
> > below). The whole function descriptor dance does complicate matters a
> > bit though.
> >
>
> Hi Naveen,
>
> Ah hah, I see now, given the other bug that you mention, we should have
> had seen non-NULL entries in either ftrace_ool_stub.insn[] or .ftrace_op
> fields such that when read as ppc64_stub_entry, .funcdata would indicate
> that it's in use:
>
> ppc64_stub_entry[] ftrace_ool_stub[]
> 0x00 [0].jump[0] [0].ftrace_op
> 0x04 [0].jump[1] [0].ftrace_op
> 0x08 [0].jump[2] [0].insn[0]
> 0x0C [0].jump[3] [0].insn[1]
> 0x10 [0].jump[4] [0].insn[2]
> 0x14 [0].jump[5] [0].insn[3]
> 0x18 [0].jump[6] [1].ftrace_op
> 0x1C [0].magic [1].ftrace_op
> 0x20 [0].funcdata [1].insn[0] <<
> 0x24 [0].funcdata [1].insn[1] <<
> 0x28 [1].jump[0] [1].insn[2]
> 0x2C [1].jump[1] [1].insn[3]
> 0x30 [1].jump[2] [2].ftrace_op
> 0x34 [1].jump[3] [2].ftrace_op
> 0x38 [1].jump[4] [2].insn[0]
> 0x3C [1].jump[5] [2].insn[1]
> 0x40 [1].jump[6] [2].insn[2]
> 0x44 [1].magic [2].insn[3]
> 0x48 [1].funcdata [3].ftrace_op <<
> 0x4C [1].funcdata [3].ftrace_op <<
>
> If the commit msg for this patch would be clearer by rewording anything,
> I'm happy to update. (My understanding at the time of writing was that
> the NULL funcdata vs. insn[]/ftrace_op was a valid sequence.)
>
Yes, please. But only just to point out the bug in how we are reserving
the stubs.
> > > > @@ -1118,29 +1118,19 @@ int module_trampoline_target(struct
> > > > module *mod, unsigned long addr,
> > > > static int setup_ftrace_ool_stubs(const Elf64_Shdr *sechdrs, unsigned long addr, struct module *me)
> > > > {
> > > > #ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> > > > - unsigned int i, total_stubs, num_stubs;
> > > > + unsigned int total_stubs, num_stubs;
> > > > struct ppc64_stub_entry *stub;
> > > >
> > > > total_stubs = sechdrs[me->arch.stubs_section].sh_size / sizeof(*stub);
> > > > num_stubs = roundup(me->arch.ool_stub_count * sizeof(struct ftrace_ool_stub),
> > > > sizeof(struct ppc64_stub_entry)) / sizeof(struct ppc64_stub_entry);
> > > >
> > > > - /* Find the next available entry */
> > > > - stub = (void *)sechdrs[me->arch.stubs_section].sh_addr;
> > > > - for (i = 0; stub_func_addr(stub[i].funcdata); i++)
> > > > - if (WARN_ON(i >= total_stubs))
> > > > - return -1;
> > > > -
> > > > - if (WARN_ON(i + num_stubs > total_stubs))
> > > > + if (WARN_ON(me->arch.stub_count + num_stubs > total_stubs))
> > > > return -1;
> > > >
> > > > - stub += i;
> > > > - me->arch.ool_stubs = (struct ftrace_ool_stub *)stub;
> > > > -
> > > > - /* reserve stubs */
> > > > - for (i = 0; i < num_stubs; i++)
> > > > - if (patch_u32((void *)&stub->funcdata, PPC_RAW_NOP()))
> > > > - return -1;
> > >
> > > At first I thought the bug was that this loop was re-writting the same
> > > PPC_RAW_NOP() to the same funcdata (i.e, should have been something
> > > like: patch_u32((void *)stub[i]->funcdata, PPC_RAW_NOP())), but that
> > > didn't work and seemed fragile anyway.
> >
> > D'uh - this path was clearly never tested. I suppose this should have
> > been something like this:
> > patch_ulong((void *)&stub[i]->funcdata, func_desc(local_paca))
> >
> > Still convoluted, but I think that should hopefully fix the problem you
> > are seeing.
> >
>
> I can still try something like this if you prefer that solution (though
> I think there may be some type massaging required in the second argument
> to patch_ulong().) LMK ...
That's alright -- it is better to rip this out and replace with the
changes in your patch.
>
> > >
> > > > + stub = (void *)sechdrs[me->arch.stubs_section].sh_addr;
> > > > + me->arch.ool_stubs = (struct ftrace_ool_stub *)(stub + me->arch.stub_count);
> > > > + me->arch.stub_count += num_stubs;
> > > > #endif
> >
> > Regardless of the above, your proposed change looks good to me and
> > simplifies the logic. So:
> > Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
> >
>
>
>
> > > crash> dis 0xc008000007d70dd0 42
> > > ppc64[ ] ftrace[0] <xfs_stats_format+0x558>: .long 0x0
> > > <xfs_stats_format+0x55c>: .long 0x0
> > > <xfs_stats_format+0x560>: mflr r0
> > > <xfs_stats_format+0x564>: bl 0xc008000007d70d80 <xfs_stats_format+0x508>
> > > <xfs_stats_format+0x568>: mtlr r0
> > > <xfs_stats_format+0x56c>: b 0xc008000007d70014 <patch_free_livepatch+0xc>
> > > ftrace[1] <xfs_stats_format+0x570>: .long 0x0
> > > <xfs_stats_format+0x574>: .long 0x0
> > > <xfs_stats_format+0x578>: mflr r0
> > > <xfs_stats_format+0x57c>: bl 0xc008000007d70d80 <xfs_stats_format+0x508>
> > > ppc64[ ] <xfs_stats_format+0x580>: addis r11,r2,4 << This looks like a full
> > > <xfs_stats_format+0x584>: addi r11,r11,-29448 << ppc64_stub_entry
> > > ftrace[2] <xfs_stats_format+0x588>: std r2,24(r1) << dropped in the middle
> > > <xfs_stats_format+0x58c>: ld r12,32(r11) << of the ool_stubs array
> > > <xfs_stats_format+0x590>: mtctr r12 << of ftrace_ool_stub[]
> > > <xfs_stats_format+0x594>: bctr <<
> > > <xfs_stats_format+0x598>: mtlr r0 <<
> > > <xfs_stats_format+0x59c>: andi. r20,r27,30050 <<
> > > ftrace[3] <xfs_stats_format+0x5a0>: .long 0x54e92b8 <<
> > > <xfs_stats_format+0x5a4>: lfs f0,0(r8) <<
> > > ppc64[ ] <xfs_stats_format+0x5a8>: mflr r0
> > > <xfs_stats_format+0x5ac>: bl 0xc008000007d70d80 <xfs_stats_format+0x508>
> > > <xfs_stats_format+0x5b0>: mtlr r0
> > > <xfs_stats_format+0x5b4>: b 0xc008000007d7037c <add_callbacks_to_patch_objects+0xc>
> > > ftrace[4] <xfs_stats_format+0x5b8>: .long 0x0
> > > <xfs_stats_format+0x5bc>: .long 0x0
> >
> > These NULL values are also problematic. I think those are NULL since we
> > are not "reserving" the stubs properly, but those should point to some
> > ftrace_op. I think we are missing a call to ftrace_rec_set_nop_ops() in
> > ftrace_init_nop(), which would be good to do separately.
> >
>
> Very lightly tested, but were you thinking of something like:
>
> diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
> index 6dca92d5a..687371c64 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -488,8 +488,12 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
> return ret;
>
> /* Set up out-of-line stub */
> - if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
> - return ftrace_init_ool_stub(mod, rec);
> + if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE)) {
> + ret = ftrace_init_ool_stub(mod, rec);
> + if (ret)
> + return ret;
> + return ftrace_rec_set_nop_ops(rec);
> + }
Minor nit: since ftrace_rec_set_nop_ops() has to be called regardless, I
would prefer to add a goto here. See below.
>
> /* Nop-out the ftrace location */
> new = ppc_inst(PPC_RAW_NOP());
> @@ -520,7 +524,7 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
> return -EINVAL;
> }
>
> - return ret;
> + return ftrace_rec_set_nop_ops(rec);
> }
We will need to still check for ret here, so something like this?
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 6dca92d5a6e8..841d077e2825 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -488,8 +488,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
return ret;
/* Set up out-of-line stub */
- if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
- return ftrace_init_ool_stub(mod, rec);
+ if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE)) {
+ ret = ftrace_init_ool_stub(mod, rec);
+ goto out;
+ }
/* Nop-out the ftrace location */
new = ppc_inst(PPC_RAW_NOP());
@@ -520,6 +522,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
return -EINVAL;
}
+out:
+ if (!ret)
+ ret = ftrace_rec_set_nop_ops(rec);
+
return ret;
}
>
> int ftrace_update_ftrace_func(ftrace_func_t func)
>
>
> In which case the ftrace-ool area looks like:
>
> crash> mod | grep livepatch_module
> c008000006350500 livepatch_module c008000009b90000 262144 (not loaded) [CONFIG_KALLSYMS]
> crash> struct module.arch.ool_stubs c008000006350500
> arch.ool_stubs = 0xc008000009b90dd0 <xfs_stats_format+1368>,
> crash> struct module.arch.ool_stub_count c008000006350500
> arch.ool_stub_count = 7,
>
> crash> struct ftrace_ool_stub 0xc008000009b90dd0 7
> struct ftrace_ool_stub {
> ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
> insn = {0x7c0802a6, 0x4bffffa5, 0x7c0803a6, 0x4bfff230}
> }
>
> struct ftrace_ool_stub {
> ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
> insn = {0x7c0802a6, 0x4bffff8d, 0x7c0803a6, 0x4bfff304}
> }
>
> struct ftrace_ool_stub {
> ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
> insn = {0x7c0802a6, 0x4bffff75, 0x7c0803a6, 0x4bfff430}
> }
>
> struct ftrace_ool_stub {
> ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
> insn = {0x7c0802a6, 0x4bffff5d, 0x7c0803a6, 0x4bfff550}
> }
>
> struct ftrace_ool_stub {
> ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
> insn = {0x7c0802a6, 0x4bffff45, 0x7c0803a6, 0x4bfff768}
> }
>
> struct ftrace_ool_stub {
> ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
> insn = {0x7c0802a6, 0x4bffff2d, 0x7c0803a6, 0x4bfffa08}
> }
>
> struct ftrace_ool_stub {
> ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
> insn = {0x7c0802a6, 0x4bffff15, 0x7c0803a6, 0x4bfffa10}
> }
LGTM.
Thanks,
- Naveen
^ permalink raw reply related
* Re: [PATCH] powerpc64/modules: fix ool-ftrace-stub vs. livepatch relocation corruption
From: Joe Lawrence @ 2025-09-10 18:57 UTC (permalink / raw)
To: Naveen N Rao
Cc: linuxppc-dev, live-patching, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy
In-Reply-To: <2tscft2yyndfbkl2a7ltndqfwx7phajkfma3m6o5phpm3xkme2@dcy6ohdbfhsk>
On Mon, Sep 08, 2025 at 04:33:24PM +0530, Naveen N Rao wrote:
> On Wed, Sep 03, 2025 at 10:37:39PM -0400, Joe Lawrence wrote:
> > On Wed, Sep 03, 2025 at 10:29:50PM -0400, Joe Lawrence wrote:
> > > The powerpc64 module .stubs section holds ppc64_stub_entry[] code
> > > trampolines that are generated at module load time. These stubs are
> > > necessary for function calls to external symbols that are too far away
> > > for a simple relative branch.
> > >
> > > Logic for finding an available ppc64_stub_entry has relied on a sentinel
> > > value in the funcdata member to indicate a used slot. Code iterates
> > > through the array, inspecting .funcdata to find the first unused (zeroed)
> > > entry:
> > >
> > > for (i = 0; stub_func_addr(stubs[i].funcdata); i++)
> > >
> > > To support CONFIG_PPC_FTRACE_OUT_OF_LINE, a new setup_ftrace_ool_stubs()
> > > function extended the .stubs section by adding an array of
> > > ftrace_ool_stub structures for each patchable function. A side effect
> > > of writing these smaller structures is that the funcdata sentinel
> > > convention is not maintained.
>
> There is clearly a bug in how we are reserving the stubs as you point
> out further below, but once that is properly initialized, I don't think
> the smaller structure size for ftrace_ool_stub matters (in so far as
> stub->funcdata being non-NULL). We end up writing four valid powerpc
> instructions, along with a ftrace_ops pointer before those instructions
> which should also be non-zero (there is a bug here too, more on that
> below). The whole function descriptor dance does complicate matters a
> bit though.
>
Hi Naveen,
Ah hah, I see now, given the other bug that you mention, we should have
had seen non-NULL entries in either ftrace_ool_stub.insn[] or .ftrace_op
fields such that when read as ppc64_stub_entry, .funcdata would indicate
that it's in use:
ppc64_stub_entry[] ftrace_ool_stub[]
0x00 [0].jump[0] [0].ftrace_op
0x04 [0].jump[1] [0].ftrace_op
0x08 [0].jump[2] [0].insn[0]
0x0C [0].jump[3] [0].insn[1]
0x10 [0].jump[4] [0].insn[2]
0x14 [0].jump[5] [0].insn[3]
0x18 [0].jump[6] [1].ftrace_op
0x1C [0].magic [1].ftrace_op
0x20 [0].funcdata [1].insn[0] <<
0x24 [0].funcdata [1].insn[1] <<
0x28 [1].jump[0] [1].insn[2]
0x2C [1].jump[1] [1].insn[3]
0x30 [1].jump[2] [2].ftrace_op
0x34 [1].jump[3] [2].ftrace_op
0x38 [1].jump[4] [2].insn[0]
0x3C [1].jump[5] [2].insn[1]
0x40 [1].jump[6] [2].insn[2]
0x44 [1].magic [2].insn[3]
0x48 [1].funcdata [3].ftrace_op <<
0x4C [1].funcdata [3].ftrace_op <<
If the commit msg for this patch would be clearer by rewording anything,
I'm happy to update. (My understanding at the time of writing was that
the NULL funcdata vs. insn[]/ftrace_op was a valid sequence.)
> > > This is not a problem for an ordinary
> > > kernel module, as this occurs in module_finalize(), after which no
> > > further .stubs updates are needed.
> > >
> > > However, when loading a livepatch module that contains klp-relocations,
> > > apply_relocate_add() is executed a second time, after the out-of-line
> > > ftrace stubs have been set up.
> > >
> > > When apply_relocate_add() then calls stub_for_addr() to handle a
> > > klp-relocation, its search for the next available ppc64_stub_entry[]
> > > slot may stop prematurely in the middle of the ftrace_ool_stub array. A
> > > full ppc64_stub_entry is then written, corrupting the ftrace stubs.
> > >
> > > Fix this by explicitly tracking the count of used ppc64_stub_entrys.
> > > Rather than relying on an inline funcdata sentinel value, a new
> > > stub_count is used as the explicit boundary for searching and allocating
> > > stubs. This simplifies the code, eliminates the "one extra reloc" that
> > > was required for the sentinel check, and resolves the memory corruption.
> > >
> >
> > Apologies if this is too wordy, I wrote it as a bit of a braindump to
> > summarize the longer analysis at the bottom of the reply ...
>
> This was a good read, thanks for all the details. It did help spot
> another issue.
>
> >
> > > Fixes: eec37961a56a ("powerpc64/ftrace: Move ftrace sequence out of line")
> > > Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
> > > ---
> > > arch/powerpc/include/asm/module.h | 1 +
> > > arch/powerpc/kernel/module_64.c | 26 ++++++++------------------
> > > 2 files changed, 9 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h
> > > index e1ee5026ac4a..864e22deaa2c 100644
> > > --- a/arch/powerpc/include/asm/module.h
> > > +++ b/arch/powerpc/include/asm/module.h
> > > @@ -27,6 +27,7 @@ struct ppc_plt_entry {
> > > struct mod_arch_specific {
> > > #ifdef __powerpc64__
> > > unsigned int stubs_section; /* Index of stubs section in module */
> > > + unsigned int stub_count; /* Number of stubs used */
> > > #ifdef CONFIG_PPC_KERNEL_PCREL
> > > unsigned int got_section; /* What section is the GOT? */
> > > unsigned int pcpu_section; /* .data..percpu section */
> > > diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> > > index 126bf3b06ab7..2a44bc8e2439 100644
> > > --- a/arch/powerpc/kernel/module_64.c
> > > +++ b/arch/powerpc/kernel/module_64.c
> > > @@ -209,8 +209,7 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
> > > char *secstrings,
> > > struct module *me)
> > > {
> > > - /* One extra reloc so it's always 0-addr terminated */
> > > - unsigned long relocs = 1;
> > > + unsigned long relocs = 0;
> > > unsigned i;
> > >
> > > /* Every relocated section... */
> > > @@ -705,7 +704,7 @@ static unsigned long stub_for_addr(const Elf64_Shdr *sechdrs,
> > >
> > > /* Find this stub, or if that fails, the next avail. entry */
> > > stubs = (void *)sechdrs[me->arch.stubs_section].sh_addr;
> > > - for (i = 0; stub_func_addr(stubs[i].funcdata); i++) {
> > > + for (i = 0; i < me->arch.stub_count; i++) {
> > > if (WARN_ON(i >= num_stubs))
> > > return 0;
> > >
> > > @@ -716,6 +715,7 @@ static unsigned long stub_for_addr(const Elf64_Shdr *sechdrs,
> > > if (!create_stub(sechdrs, &stubs[i], addr, me, name))
> > > return 0;
> > >
> > > + me->arch.stub_count++;
> > > return (unsigned long)&stubs[i];
> > > }
> > >
> > > @@ -1118,29 +1118,19 @@ int module_trampoline_target(struct module *mod, unsigned long addr,
> > > static int setup_ftrace_ool_stubs(const Elf64_Shdr *sechdrs, unsigned long addr, struct module *me)
> > > {
> > > #ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> > > - unsigned int i, total_stubs, num_stubs;
> > > + unsigned int total_stubs, num_stubs;
> > > struct ppc64_stub_entry *stub;
> > >
> > > total_stubs = sechdrs[me->arch.stubs_section].sh_size / sizeof(*stub);
> > > num_stubs = roundup(me->arch.ool_stub_count * sizeof(struct ftrace_ool_stub),
> > > sizeof(struct ppc64_stub_entry)) / sizeof(struct ppc64_stub_entry);
> > >
> > > - /* Find the next available entry */
> > > - stub = (void *)sechdrs[me->arch.stubs_section].sh_addr;
> > > - for (i = 0; stub_func_addr(stub[i].funcdata); i++)
> > > - if (WARN_ON(i >= total_stubs))
> > > - return -1;
> > > -
> > > - if (WARN_ON(i + num_stubs > total_stubs))
> > > + if (WARN_ON(me->arch.stub_count + num_stubs > total_stubs))
> > > return -1;
> > >
> > > - stub += i;
> > > - me->arch.ool_stubs = (struct ftrace_ool_stub *)stub;
> > > -
> > > - /* reserve stubs */
> > > - for (i = 0; i < num_stubs; i++)
> > > - if (patch_u32((void *)&stub->funcdata, PPC_RAW_NOP()))
> > > - return -1;
> >
> > At first I thought the bug was that this loop was re-writting the same
> > PPC_RAW_NOP() to the same funcdata (i.e, should have been something
> > like: patch_u32((void *)stub[i]->funcdata, PPC_RAW_NOP())), but that
> > didn't work and seemed fragile anyway.
>
> D'uh - this path was clearly never tested. I suppose this should have
> been something like this:
> patch_ulong((void *)&stub[i]->funcdata, func_desc(local_paca))
>
> Still convoluted, but I think that should hopefully fix the problem you
> are seeing.
>
I can still try something like this if you prefer that solution (though
I think there may be some type massaging required in the second argument
to patch_ulong().) LMK ...
> >
> > > + stub = (void *)sechdrs[me->arch.stubs_section].sh_addr;
> > > + me->arch.ool_stubs = (struct ftrace_ool_stub *)(stub + me->arch.stub_count);
> > > + me->arch.stub_count += num_stubs;
> > > #endif
>
> Regardless of the above, your proposed change looks good to me and
> simplifies the logic. So:
> Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
>
> > crash> dis 0xc008000007d70dd0 42
> > ppc64[ ] ftrace[0] <xfs_stats_format+0x558>: .long 0x0
> > <xfs_stats_format+0x55c>: .long 0x0
> > <xfs_stats_format+0x560>: mflr r0
> > <xfs_stats_format+0x564>: bl 0xc008000007d70d80 <xfs_stats_format+0x508>
> > <xfs_stats_format+0x568>: mtlr r0
> > <xfs_stats_format+0x56c>: b 0xc008000007d70014 <patch_free_livepatch+0xc>
> > ftrace[1] <xfs_stats_format+0x570>: .long 0x0
> > <xfs_stats_format+0x574>: .long 0x0
> > <xfs_stats_format+0x578>: mflr r0
> > <xfs_stats_format+0x57c>: bl 0xc008000007d70d80 <xfs_stats_format+0x508>
> > ppc64[ ] <xfs_stats_format+0x580>: addis r11,r2,4 << This looks like a full
> > <xfs_stats_format+0x584>: addi r11,r11,-29448 << ppc64_stub_entry
> > ftrace[2] <xfs_stats_format+0x588>: std r2,24(r1) << dropped in the middle
> > <xfs_stats_format+0x58c>: ld r12,32(r11) << of the ool_stubs array
> > <xfs_stats_format+0x590>: mtctr r12 << of ftrace_ool_stub[]
> > <xfs_stats_format+0x594>: bctr <<
> > <xfs_stats_format+0x598>: mtlr r0 <<
> > <xfs_stats_format+0x59c>: andi. r20,r27,30050 <<
> > ftrace[3] <xfs_stats_format+0x5a0>: .long 0x54e92b8 <<
> > <xfs_stats_format+0x5a4>: lfs f0,0(r8) <<
> > ppc64[ ] <xfs_stats_format+0x5a8>: mflr r0
> > <xfs_stats_format+0x5ac>: bl 0xc008000007d70d80 <xfs_stats_format+0x508>
> > <xfs_stats_format+0x5b0>: mtlr r0
> > <xfs_stats_format+0x5b4>: b 0xc008000007d7037c <add_callbacks_to_patch_objects+0xc>
> > ftrace[4] <xfs_stats_format+0x5b8>: .long 0x0
> > <xfs_stats_format+0x5bc>: .long 0x0
>
> These NULL values are also problematic. I think those are NULL since we
> are not "reserving" the stubs properly, but those should point to some
> ftrace_op. I think we are missing a call to ftrace_rec_set_nop_ops() in
> ftrace_init_nop(), which would be good to do separately.
>
Very lightly tested, but were you thinking of something like:
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 6dca92d5a..687371c64 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -488,8 +488,12 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
return ret;
/* Set up out-of-line stub */
- if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
- return ftrace_init_ool_stub(mod, rec);
+ if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE)) {
+ ret = ftrace_init_ool_stub(mod, rec);
+ if (ret)
+ return ret;
+ return ftrace_rec_set_nop_ops(rec);
+ }
/* Nop-out the ftrace location */
new = ppc_inst(PPC_RAW_NOP());
@@ -520,7 +524,7 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
return -EINVAL;
}
- return ret;
+ return ftrace_rec_set_nop_ops(rec);
}
int ftrace_update_ftrace_func(ftrace_func_t func)
In which case the ftrace-ool area looks like:
crash> mod | grep livepatch_module
c008000006350500 livepatch_module c008000009b90000 262144 (not loaded) [CONFIG_KALLSYMS]
crash> struct module.arch.ool_stubs c008000006350500
arch.ool_stubs = 0xc008000009b90dd0 <xfs_stats_format+1368>,
crash> struct module.arch.ool_stub_count c008000006350500
arch.ool_stub_count = 7,
crash> struct ftrace_ool_stub 0xc008000009b90dd0 7
struct ftrace_ool_stub {
ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
insn = {0x7c0802a6, 0x4bffffa5, 0x7c0803a6, 0x4bfff230}
}
struct ftrace_ool_stub {
ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
insn = {0x7c0802a6, 0x4bffff8d, 0x7c0803a6, 0x4bfff304}
}
struct ftrace_ool_stub {
ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
insn = {0x7c0802a6, 0x4bffff75, 0x7c0803a6, 0x4bfff430}
}
struct ftrace_ool_stub {
ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
insn = {0x7c0802a6, 0x4bffff5d, 0x7c0803a6, 0x4bfff550}
}
struct ftrace_ool_stub {
ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
insn = {0x7c0802a6, 0x4bffff45, 0x7c0803a6, 0x4bfff768}
}
struct ftrace_ool_stub {
ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
insn = {0x7c0802a6, 0x4bffff2d, 0x7c0803a6, 0x4bfffa08}
}
struct ftrace_ool_stub {
ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
insn = {0x7c0802a6, 0x4bffff15, 0x7c0803a6, 0x4bfffa10}
}
--
Joe
^ permalink raw reply related
* Re: [PATCH v1 2/2] LoongArch: Return 0 for user tasks in arch_stack_walk_reliable()
From: Jinyang He @ 2025-09-10 1:11 UTC (permalink / raw)
To: Tiezhu Yang
Cc: Josh Poimboeuf, Huacai Chen, Xi Zhang, live-patching, loongarch,
linux-kernel
In-Reply-To: <20250909113106.22992-3-yangtiezhu@loongson.cn>
On 2025-09-09 19:31, Tiezhu Yang wrote:
> When testing the kernel live patching with "modprobe livepatch-sample",
> there is a timeout over 15 seconds from "starting patching transition"
> to "patching complete", dmesg shows "unreliable stack" for user tasks
> in debug mode. When executing "rmmod livepatch-sample", there exists
> the similar issue.
>
> Like x86, arch_stack_walk_reliable() should return 0 for user tasks.
> It is necessary to set regs->csr_prmd as task->thread.csr_prmd first,
> then use user_mode() to check whether the task is in userspace.
>
> Here are the call chains:
>
> klp_enable_patch()
> klp_try_complete_transition()
> klp_try_switch_task()
> klp_check_and_switch_task()
> klp_check_stack()
> stack_trace_save_tsk_reliable()
> arch_stack_walk_reliable()
>
> With this patch, it takes a short time for patching and unpatching.
>
> Before:
>
> # modprobe livepatch-sample
> # dmesg -T | tail -3
> [Sat Sep 6 11:00:20 2025] livepatch: 'livepatch_sample': starting patching transition
> [Sat Sep 6 11:00:35 2025] livepatch: signaling remaining tasks
> [Sat Sep 6 11:00:36 2025] livepatch: 'livepatch_sample': patching complete
>
> # echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
> # rmmod livepatch_sample
> rmmod: ERROR: Module livepatch_sample is in use
> # rmmod livepatch_sample
> # dmesg -T | tail -3
> [Sat Sep 6 11:06:05 2025] livepatch: 'livepatch_sample': starting unpatching transition
> [Sat Sep 6 11:06:20 2025] livepatch: signaling remaining tasks
> [Sat Sep 6 11:06:21 2025] livepatch: 'livepatch_sample': unpatching complete
>
> After:
>
> # modprobe livepatch-sample
> # dmesg -T | tail -2
> [Sat Sep 6 11:19:00 2025] livepatch: 'livepatch_sample': starting patching transition
> [Sat Sep 6 11:19:01 2025] livepatch: 'livepatch_sample': patching complete
>
> # echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
> # rmmod livepatch_sample
> # dmesg -T | tail -2
> [Sat Sep 6 11:21:10 2025] livepatch: 'livepatch_sample': starting unpatching transition
> [Sat Sep 6 11:21:11 2025] livepatch: 'livepatch_sample': unpatching complete
>
> While at it, do the similar thing for arch_stack_walk() to avoid
> potential issues.
>
> Cc: stable@vger.kernel.org # v6.9+
> Fixes: 199cc14cb4f1 ("LoongArch: Add kernel livepatching support")
> Reported-by: Xi Zhang <zhangxi@kylinos.cn>
> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> ---
> arch/loongarch/kernel/stacktrace.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/loongarch/kernel/stacktrace.c b/arch/loongarch/kernel/stacktrace.c
> index 9a038d1070d7..0454cce3b667 100644
> --- a/arch/loongarch/kernel/stacktrace.c
> +++ b/arch/loongarch/kernel/stacktrace.c
> @@ -30,10 +30,15 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> }
> regs->regs[1] = 0;
> regs->regs[22] = 0;
> + regs->csr_prmd = task->thread.csr_prmd;
> }
>
> for (unwind_start(&state, task, regs);
> !unwind_done(&state); unwind_next_frame(&state)) {
> + /* Success path for user tasks */
> + if (user_mode(regs))
> + return;
> +
> addr = unwind_get_return_address(&state);
> if (!addr || !consume_entry(cookie, addr))
> break;
> @@ -57,9 +62,14 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
> }
> regs->regs[1] = 0;
> regs->regs[22] = 0;
> + regs->csr_prmd = task->thread.csr_prmd;
>
> for (unwind_start(&state, task, regs);
> !unwind_done(&state) && !unwind_error(&state); unwind_next_frame(&state)) {
> + /* Success path for user tasks */
> + if (user_mode(regs))
> + return 0;
> +
> addr = unwind_get_return_address(&state);
>
> /*
Hi, Tiezhu,
We update stack info by get_stack_info when meet ORC_TYPE_REGS in
unwind_next_frame. And in arch_stack_walk(_reliable), we always
do unwind_done before unwind_next_frame. So is there anything
error in get_stack_info which causing regs is user_mode while
stack is not STACK_TYPE_UNKNOWN?
^ permalink raw reply
* Re: [PATCH v2 4/6] unwind: Implement generic sframe unwinder library
From: Indu Bhagat @ 2025-09-09 18:39 UTC (permalink / raw)
To: Puranjay Mohan, Dylan Hatch, Josh Poimboeuf, Steven Rostedt,
Peter Zijlstra, Will Deacon, Catalin Marinas, Jiri Kosina
Cc: Roman Gushchin, Weinan Liu, Mark Rutland, Ian Rogers,
linux-toolchains, linux-kernel, live-patching, joe.lawrence,
Song Liu, Prasanna Kumar T S M, Jens Remus
In-Reply-To: <mb61p4itb4ltz.fsf@kernel.org>
On 9/9/25 9:44 AM, Puranjay Mohan wrote:
> Dylan Hatch <dylanbhatch@google.com> writes:
>
>> From: Weinan Liu <wnliu@google.com>
>>
>> This change introduces a kernel space unwinder using sframe table for
>> architectures without ORC unwinder support.
>>
>> The implementation is adapted from Josh's userspace sframe unwinder
>> proposal[1] according to the sframe v2 spec[2].
>>
>> [1] https://lore.kernel.org/lkml/42c0a99236af65c09c8182e260af7bcf5aa1e158.1730150953.git.jpoimboe@kernel.org/
>> [2] https://sourceware.org/binutils/docs/sframe-spec.html
>>
>> Signed-off-by: Weinan Liu <wnliu@google.com>
>> Signed-off-by: Dylan Hatch <dylanbhatch@google.com>
>> Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
>> ---
>> include/linux/sframe_lookup.h | 43 ++++++++
>> kernel/Makefile | 1 +
>> kernel/sframe_lookup.c | 196 ++++++++++++++++++++++++++++++++++
>> 3 files changed, 240 insertions(+)
>> create mode 100644 include/linux/sframe_lookup.h
>> create mode 100644 kernel/sframe_lookup.c
>>
>> diff --git a/include/linux/sframe_lookup.h b/include/linux/sframe_lookup.h
>> new file mode 100644
>> index 000000000000..1c26cf1f38d4
>> --- /dev/null
>> +++ b/include/linux/sframe_lookup.h
>> @@ -0,0 +1,43 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _LINUX_SFRAME_LOOKUP_H
>> +#define _LINUX_SFRAME_LOOKUP_H
>> +
>> +/**
>> + * struct sframe_ip_entry - sframe unwind info for given ip
>> + * @cfa_offset: Offset for the Canonical Frame Address(CFA) from Frame
>> + * Pointer(FP) or Stack Pointer(SP)
>> + * @ra_offset: Offset for the Return Address from CFA.
>> + * @fp_offset: Offset for the Frame Pointer (FP) from CFA.
>> + * @use_fp: Use FP to get next CFA or not
>> + */
>> +struct sframe_ip_entry {
>> + int32_t cfa_offset;
>> + int32_t ra_offset;
>> + int32_t fp_offset;
>> + bool use_fp;
>> +};
>> +
>> +/**
>> + * struct sframe_table - sframe struct of a table
>> + * @sfhdr_p: Pointer to sframe header
>> + * @fde_p: Pointer to the first of sframe frame description entry(FDE).
>> + * @fre_p: Pointer to the first of sframe frame row entry(FRE).
>> + */
>> +struct sframe_table {
>> + struct sframe_header *sfhdr_p;
>> + struct sframe_fde *fde_p;
>> + char *fre_p;
>> +};
>> +
>> +#ifdef CONFIG_SFRAME_UNWINDER
>> +void init_sframe_table(void);
>> +int sframe_find_pc(unsigned long pc, struct sframe_ip_entry *entry);
>> +#else
>> +static inline void init_sframe_table(void) {}
>> +static inline int sframe_find_pc(unsigned long pc, struct sframe_ip_entry *entry)
>> +{
>> + return -EINVAL;
>> +}
>> +#endif
>> +
>> +#endif /* _LINUX_SFRAME_LOOKUP_H */
>> diff --git a/kernel/Makefile b/kernel/Makefile
>> index c60623448235..17e9cfe09dc0 100644
>> --- a/kernel/Makefile
>> +++ b/kernel/Makefile
>> @@ -138,6 +138,7 @@ obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o
>>
>> obj-$(CONFIG_RESOURCE_KUNIT_TEST) += resource_kunit.o
>> obj-$(CONFIG_SYSCTL_KUNIT_TEST) += sysctl-test.o
>> +obj-$(CONFIG_SFRAME_UNWINDER) += sframe_lookup.o
>>
>> CFLAGS_kstack_erase.o += $(DISABLE_KSTACK_ERASE)
>> CFLAGS_kstack_erase.o += $(call cc-option,-mgeneral-regs-only)
>> diff --git a/kernel/sframe_lookup.c b/kernel/sframe_lookup.c
>> new file mode 100644
>> index 000000000000..51cd24a75956
>> --- /dev/null
>> +++ b/kernel/sframe_lookup.c
>> @@ -0,0 +1,196 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +#define pr_fmt(fmt) "sframe: " fmt
>> +
>> +#include <linux/module.h>
>> +#include <linux/sort.h>
>> +#include <linux/sframe_lookup.h>
>> +#include <linux/kallsyms.h>
>> +#include "sframe.h"
>> +
>> +extern char __start_sframe_header[];
>> +extern char __stop_sframe_header[];
>> +
>> +static bool sframe_init __ro_after_init;
>> +static struct sframe_table sftbl;
>> +
>> +#define SFRAME_READ_TYPE(in, out, type) \
>> +({ \
>> + type __tmp; \
>> + memcpy(&__tmp, in, sizeof(__tmp)); \
>> + in += sizeof(__tmp); \
>> + out = __tmp; \
>> +})
>> +
>> +#define SFRAME_READ_ROW_ADDR(in_addr, out_addr, type) \
>> +({ \
>> + switch (type) { \
>> + case SFRAME_FRE_TYPE_ADDR1: \
>> + SFRAME_READ_TYPE(in_addr, out_addr, u8); \
>> + break; \
>> + case SFRAME_FRE_TYPE_ADDR2: \
>> + SFRAME_READ_TYPE(in_addr, out_addr, u16); \
>> + break; \
>> + case SFRAME_FRE_TYPE_ADDR4: \
>> + SFRAME_READ_TYPE(in_addr, out_addr, u32); \
>> + break; \
>> + default: \
>> + break; \
>> + } \
>> +})
>> +
>> +#define SFRAME_READ_ROW_OFFSETS(in_addr, out_addr, size) \
>> +({ \
>> + switch (size) { \
>> + case 1: \
>> + SFRAME_READ_TYPE(in_addr, out_addr, s8); \
>> + break; \
>> + case 2: \
>> + SFRAME_READ_TYPE(in_addr, out_addr, s16); \
>> + break; \
>> + case 4: \
>> + SFRAME_READ_TYPE(in_addr, out_addr, s32); \
>> + break; \
>> + default: \
>> + break; \
>> + } \
>> +})
>> +
>> +static struct sframe_fde *find_fde(const struct sframe_table *tbl, unsigned long pc)
>> +{
>> + int l, r, m, f;
>> + int32_t ip;
>> + struct sframe_fde *fdep;
>> +
>> + if (!tbl || !tbl->sfhdr_p || !tbl->fde_p)
>> + return NULL;
>> +
>> + ip = (pc - (unsigned long)tbl->sfhdr_p);
>> +
>> + /* Do a binary range search to find the rightmost FDE start_addr < ip */
>> + l = m = f = 0;
>> + r = tbl->sfhdr_p->num_fdes;
>> + while (l < r) {
>> + m = l + ((r - l) / 2);
>> + fdep = tbl->fde_p + m;
>> + if (fdep->start_addr > ip)
>> + r = m;
>> + else
>> + l = m + 1;
>> + }
>
> The above logic doesn't correctly work for the new scheme with
> SFRAME_F_FDE_FUNC_START_PCREL, see [1]
>
> If SFRAME_F_FDE_FUNC_START_PCREL is set in flags then function start
> address in SFrame FDE is encoded as the distance from the location of
> the sfde_func_start_address to the start PC of the function.
>
> And for modules, sframes will only work if compiled with [1] with
> SFRAME_F_FDE_FUNC_START_PCREL flag set as ET_DYN, ET_EXEC, and ET_REL
> (relocatable links) generated by ld have sfde_func_start_address as
> offset from field itself. see [2] for more details.
>
Yes.
The SFrame reader patches need to be refreshed with changes to do the
right thing when SFRAME_F_FDE_FUNC_START_PCREL flag is set.
Jens had posted a patch sometime ago (patch to update the SFrame reader
routines to work with Binutils 2.45) to serve as a starting point.
+CC: Jens Remus
> So, for the in kernel sframe unwinder that should support both normal
> links (kernel) and relocatable links (modules), we need to reject the
> sframe section if this flag is not set in init_sframe_table() and in
> sframe_module_init().
>
> Then we can fix find_fde() like:
>
> use pc in place of ip directly.
>
> and the check will become
>
> if (fdep->start_addr > (s32)(pc - fdep))
>
> I hope I am not missing something,
>
> Indu,
> Do you agree with my comments above?
>
> Thanks,
> Puranjay
>
> [1] https://sourceware.org/pipermail/binutils/2025-July/142222.html
> [2] https://sourceware.org/bugzilla/show_bug.cgi?id=32666
>
^ permalink raw reply
* Re: [PATCH v2 4/6] unwind: Implement generic sframe unwinder library
From: Puranjay Mohan @ 2025-09-09 16:44 UTC (permalink / raw)
To: Dylan Hatch, Josh Poimboeuf, Steven Rostedt, Indu Bhagat,
Peter Zijlstra, Will Deacon, Catalin Marinas, Jiri Kosina
Cc: Dylan Hatch, Roman Gushchin, Weinan Liu, Mark Rutland, Ian Rogers,
linux-toolchains, linux-kernel, live-patching, joe.lawrence,
Song Liu, Prasanna Kumar T S M
In-Reply-To: <20250904223850.884188-5-dylanbhatch@google.com>
Dylan Hatch <dylanbhatch@google.com> writes:
> From: Weinan Liu <wnliu@google.com>
>
> This change introduces a kernel space unwinder using sframe table for
> architectures without ORC unwinder support.
>
> The implementation is adapted from Josh's userspace sframe unwinder
> proposal[1] according to the sframe v2 spec[2].
>
> [1] https://lore.kernel.org/lkml/42c0a99236af65c09c8182e260af7bcf5aa1e158.1730150953.git.jpoimboe@kernel.org/
> [2] https://sourceware.org/binutils/docs/sframe-spec.html
>
> Signed-off-by: Weinan Liu <wnliu@google.com>
> Signed-off-by: Dylan Hatch <dylanbhatch@google.com>
> Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
> ---
> include/linux/sframe_lookup.h | 43 ++++++++
> kernel/Makefile | 1 +
> kernel/sframe_lookup.c | 196 ++++++++++++++++++++++++++++++++++
> 3 files changed, 240 insertions(+)
> create mode 100644 include/linux/sframe_lookup.h
> create mode 100644 kernel/sframe_lookup.c
>
> diff --git a/include/linux/sframe_lookup.h b/include/linux/sframe_lookup.h
> new file mode 100644
> index 000000000000..1c26cf1f38d4
> --- /dev/null
> +++ b/include/linux/sframe_lookup.h
> @@ -0,0 +1,43 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_SFRAME_LOOKUP_H
> +#define _LINUX_SFRAME_LOOKUP_H
> +
> +/**
> + * struct sframe_ip_entry - sframe unwind info for given ip
> + * @cfa_offset: Offset for the Canonical Frame Address(CFA) from Frame
> + * Pointer(FP) or Stack Pointer(SP)
> + * @ra_offset: Offset for the Return Address from CFA.
> + * @fp_offset: Offset for the Frame Pointer (FP) from CFA.
> + * @use_fp: Use FP to get next CFA or not
> + */
> +struct sframe_ip_entry {
> + int32_t cfa_offset;
> + int32_t ra_offset;
> + int32_t fp_offset;
> + bool use_fp;
> +};
> +
> +/**
> + * struct sframe_table - sframe struct of a table
> + * @sfhdr_p: Pointer to sframe header
> + * @fde_p: Pointer to the first of sframe frame description entry(FDE).
> + * @fre_p: Pointer to the first of sframe frame row entry(FRE).
> + */
> +struct sframe_table {
> + struct sframe_header *sfhdr_p;
> + struct sframe_fde *fde_p;
> + char *fre_p;
> +};
> +
> +#ifdef CONFIG_SFRAME_UNWINDER
> +void init_sframe_table(void);
> +int sframe_find_pc(unsigned long pc, struct sframe_ip_entry *entry);
> +#else
> +static inline void init_sframe_table(void) {}
> +static inline int sframe_find_pc(unsigned long pc, struct sframe_ip_entry *entry)
> +{
> + return -EINVAL;
> +}
> +#endif
> +
> +#endif /* _LINUX_SFRAME_LOOKUP_H */
> diff --git a/kernel/Makefile b/kernel/Makefile
> index c60623448235..17e9cfe09dc0 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -138,6 +138,7 @@ obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o
>
> obj-$(CONFIG_RESOURCE_KUNIT_TEST) += resource_kunit.o
> obj-$(CONFIG_SYSCTL_KUNIT_TEST) += sysctl-test.o
> +obj-$(CONFIG_SFRAME_UNWINDER) += sframe_lookup.o
>
> CFLAGS_kstack_erase.o += $(DISABLE_KSTACK_ERASE)
> CFLAGS_kstack_erase.o += $(call cc-option,-mgeneral-regs-only)
> diff --git a/kernel/sframe_lookup.c b/kernel/sframe_lookup.c
> new file mode 100644
> index 000000000000..51cd24a75956
> --- /dev/null
> +++ b/kernel/sframe_lookup.c
> @@ -0,0 +1,196 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#define pr_fmt(fmt) "sframe: " fmt
> +
> +#include <linux/module.h>
> +#include <linux/sort.h>
> +#include <linux/sframe_lookup.h>
> +#include <linux/kallsyms.h>
> +#include "sframe.h"
> +
> +extern char __start_sframe_header[];
> +extern char __stop_sframe_header[];
> +
> +static bool sframe_init __ro_after_init;
> +static struct sframe_table sftbl;
> +
> +#define SFRAME_READ_TYPE(in, out, type) \
> +({ \
> + type __tmp; \
> + memcpy(&__tmp, in, sizeof(__tmp)); \
> + in += sizeof(__tmp); \
> + out = __tmp; \
> +})
> +
> +#define SFRAME_READ_ROW_ADDR(in_addr, out_addr, type) \
> +({ \
> + switch (type) { \
> + case SFRAME_FRE_TYPE_ADDR1: \
> + SFRAME_READ_TYPE(in_addr, out_addr, u8); \
> + break; \
> + case SFRAME_FRE_TYPE_ADDR2: \
> + SFRAME_READ_TYPE(in_addr, out_addr, u16); \
> + break; \
> + case SFRAME_FRE_TYPE_ADDR4: \
> + SFRAME_READ_TYPE(in_addr, out_addr, u32); \
> + break; \
> + default: \
> + break; \
> + } \
> +})
> +
> +#define SFRAME_READ_ROW_OFFSETS(in_addr, out_addr, size) \
> +({ \
> + switch (size) { \
> + case 1: \
> + SFRAME_READ_TYPE(in_addr, out_addr, s8); \
> + break; \
> + case 2: \
> + SFRAME_READ_TYPE(in_addr, out_addr, s16); \
> + break; \
> + case 4: \
> + SFRAME_READ_TYPE(in_addr, out_addr, s32); \
> + break; \
> + default: \
> + break; \
> + } \
> +})
> +
> +static struct sframe_fde *find_fde(const struct sframe_table *tbl, unsigned long pc)
> +{
> + int l, r, m, f;
> + int32_t ip;
> + struct sframe_fde *fdep;
> +
> + if (!tbl || !tbl->sfhdr_p || !tbl->fde_p)
> + return NULL;
> +
> + ip = (pc - (unsigned long)tbl->sfhdr_p);
> +
> + /* Do a binary range search to find the rightmost FDE start_addr < ip */
> + l = m = f = 0;
> + r = tbl->sfhdr_p->num_fdes;
> + while (l < r) {
> + m = l + ((r - l) / 2);
> + fdep = tbl->fde_p + m;
> + if (fdep->start_addr > ip)
> + r = m;
> + else
> + l = m + 1;
> + }
The above logic doesn't correctly work for the new scheme with
SFRAME_F_FDE_FUNC_START_PCREL, see [1]
If SFRAME_F_FDE_FUNC_START_PCREL is set in flags then function start
address in SFrame FDE is encoded as the distance from the location of
the sfde_func_start_address to the start PC of the function.
And for modules, sframes will only work if compiled with [1] with
SFRAME_F_FDE_FUNC_START_PCREL flag set as ET_DYN, ET_EXEC, and ET_REL
(relocatable links) generated by ld have sfde_func_start_address as
offset from field itself. see [2] for more details.
So, for the in kernel sframe unwinder that should support both normal
links (kernel) and relocatable links (modules), we need to reject the
sframe section if this flag is not set in init_sframe_table() and in
sframe_module_init().
Then we can fix find_fde() like:
use pc in place of ip directly.
and the check will become
if (fdep->start_addr > (s32)(pc - fdep))
I hope I am not missing something,
Indu,
Do you agree with my comments above?
Thanks,
Puranjay
[1] https://sourceware.org/pipermail/binutils/2025-July/142222.html
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=32666
> + /* use l - 1 because l will be the first item fdep->start_addr > ip */
> + f = l - 1;
> + if (f >= tbl->sfhdr_p->num_fdes || f < 0)
> + return NULL;
> + fdep = tbl->fde_p + f;
> + if (ip < fdep->start_addr || ip > fdep->start_addr + fdep->func_size)
> + return NULL;
> +
> + return fdep;
> +}
> +
> +static int find_fre(const struct sframe_table *tbl, unsigned long pc,
> + const struct sframe_fde *fdep, struct sframe_ip_entry *entry)
> +{
> + int i, offset_size, offset_count;
> + char *fres, *offsets_loc;
> + int32_t ip_off;
> + uint32_t next_row_ip_off;
> + uint8_t fre_info, fde_type = SFRAME_FUNC_FDE_TYPE(fdep->info),
> + fre_type = SFRAME_FUNC_FRE_TYPE(fdep->info);
> +
> + fres = tbl->fre_p + fdep->fres_off;
> +
> + /* Whether PCs in the FREs should be treated as masks or not */
> + if (fde_type == SFRAME_FDE_TYPE_PCMASK)
> + ip_off = pc % fdep->rep_size;
> + else
> + ip_off = (int32_t)(pc - (unsigned long)tbl->sfhdr_p) - fdep->start_addr;
> +
> + if (ip_off < 0 || ip_off > fdep->func_size)
> + return -EINVAL;
> +
> + /*
> + * FRE structure starts by address of the entry with variants length. Use
> + * two pointers to track current head(fres) and the address of last
> + * offset(offsets_loc)
> + */
> + for (i = 0; i < fdep->fres_num; i++) {
> + SFRAME_READ_ROW_ADDR(fres, next_row_ip_off, fre_type);
> + if (ip_off < next_row_ip_off)
> + break;
> + SFRAME_READ_TYPE(fres, fre_info, u8);
> + offsets_loc = fres;
> + /*
> + * jump to the start of next fre
> + * fres += fre_offets_cnt*offset_size
> + */
> + fres += SFRAME_FRE_OFFSET_COUNT(fre_info) << SFRAME_FRE_OFFSET_SIZE(fre_info);
> + }
> +
> + offset_size = 1 << SFRAME_FRE_OFFSET_SIZE(fre_info);
> + offset_count = SFRAME_FRE_OFFSET_COUNT(fre_info);
> +
> + if (offset_count > 0) {
> + SFRAME_READ_ROW_OFFSETS(offsets_loc, entry->cfa_offset, offset_size);
> + offset_count--;
> + }
> + if (offset_count > 0 && !entry->ra_offset) {
> + SFRAME_READ_ROW_OFFSETS(offsets_loc, entry->ra_offset, offset_size);
> + offset_count--;
> + }
> + if (offset_count > 0 && !entry->fp_offset) {
> + SFRAME_READ_ROW_OFFSETS(offsets_loc, entry->fp_offset, offset_size);
> + offset_count--;
> + }
> + if (offset_count)
> + return -EINVAL;
> +
> + entry->use_fp = SFRAME_FRE_CFA_BASE_REG_ID(fre_info) == SFRAME_BASE_REG_FP;
> +
> + return 0;
> +}
> +
> +int sframe_find_pc(unsigned long pc, struct sframe_ip_entry *entry)
> +{
> + struct sframe_fde *fdep;
> + struct sframe_table *sftbl_p = &sftbl;
> + int err;
> +
> + if (!sframe_init)
> + return -EINVAL;
> +
> + memset(entry, 0, sizeof(*entry));
> + entry->ra_offset = sftbl_p->sfhdr_p->cfa_fixed_ra_offset;
> + entry->fp_offset = sftbl_p->sfhdr_p->cfa_fixed_fp_offset;
> +
> + fdep = find_fde(sftbl_p, pc);
> + if (!fdep)
> + return -EINVAL;
> + err = find_fre(sftbl_p, pc, fdep, entry);
> + if (err)
> + return err;
> +
> + return 0;
> +}
> +
> +void __init init_sframe_table(void)
> +{
> + size_t sframe_size = (void *)__stop_sframe_header - (void *)__start_sframe_header;
> + void *sframe_buf = __start_sframe_header;
> +
> + if (sframe_size <= 0)
> + return;
> + sftbl.sfhdr_p = sframe_buf;
> + if (!sftbl.sfhdr_p || sftbl.sfhdr_p->preamble.magic != SFRAME_MAGIC ||
> + sftbl.sfhdr_p->preamble.version != SFRAME_VERSION_2 ||
> + !(sftbl.sfhdr_p->preamble.flags & SFRAME_F_FDE_SORTED)) {
> + pr_warn("WARNING: Unable to read sframe header. Disabling unwinder.\n");
> + return;
> + }
> +
> + sftbl.fde_p = (struct sframe_fde *)(__start_sframe_header + SFRAME_HEADER_SIZE(*sftbl.sfhdr_p)
> + + sftbl.sfhdr_p->fdes_off);
> + sftbl.fre_p = __start_sframe_header + SFRAME_HEADER_SIZE(*sftbl.sfhdr_p)
> + + sftbl.sfhdr_p->fres_off;
> + sframe_init = true;
> +}
> --
> 2.51.0.355.g5224444f11-goog
^ permalink raw reply
* [PATCH v1 2/2] LoongArch: Return 0 for user tasks in arch_stack_walk_reliable()
From: Tiezhu Yang @ 2025-09-09 11:31 UTC (permalink / raw)
To: Josh Poimboeuf, Huacai Chen
Cc: Xi Zhang, live-patching, loongarch, linux-kernel
In-Reply-To: <20250909113106.22992-1-yangtiezhu@loongson.cn>
When testing the kernel live patching with "modprobe livepatch-sample",
there is a timeout over 15 seconds from "starting patching transition"
to "patching complete", dmesg shows "unreliable stack" for user tasks
in debug mode. When executing "rmmod livepatch-sample", there exists
the similar issue.
Like x86, arch_stack_walk_reliable() should return 0 for user tasks.
It is necessary to set regs->csr_prmd as task->thread.csr_prmd first,
then use user_mode() to check whether the task is in userspace.
Here are the call chains:
klp_enable_patch()
klp_try_complete_transition()
klp_try_switch_task()
klp_check_and_switch_task()
klp_check_stack()
stack_trace_save_tsk_reliable()
arch_stack_walk_reliable()
With this patch, it takes a short time for patching and unpatching.
Before:
# modprobe livepatch-sample
# dmesg -T | tail -3
[Sat Sep 6 11:00:20 2025] livepatch: 'livepatch_sample': starting patching transition
[Sat Sep 6 11:00:35 2025] livepatch: signaling remaining tasks
[Sat Sep 6 11:00:36 2025] livepatch: 'livepatch_sample': patching complete
# echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
# rmmod livepatch_sample
rmmod: ERROR: Module livepatch_sample is in use
# rmmod livepatch_sample
# dmesg -T | tail -3
[Sat Sep 6 11:06:05 2025] livepatch: 'livepatch_sample': starting unpatching transition
[Sat Sep 6 11:06:20 2025] livepatch: signaling remaining tasks
[Sat Sep 6 11:06:21 2025] livepatch: 'livepatch_sample': unpatching complete
After:
# modprobe livepatch-sample
# dmesg -T | tail -2
[Sat Sep 6 11:19:00 2025] livepatch: 'livepatch_sample': starting patching transition
[Sat Sep 6 11:19:01 2025] livepatch: 'livepatch_sample': patching complete
# echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
# rmmod livepatch_sample
# dmesg -T | tail -2
[Sat Sep 6 11:21:10 2025] livepatch: 'livepatch_sample': starting unpatching transition
[Sat Sep 6 11:21:11 2025] livepatch: 'livepatch_sample': unpatching complete
While at it, do the similar thing for arch_stack_walk() to avoid
potential issues.
Cc: stable@vger.kernel.org # v6.9+
Fixes: 199cc14cb4f1 ("LoongArch: Add kernel livepatching support")
Reported-by: Xi Zhang <zhangxi@kylinos.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
arch/loongarch/kernel/stacktrace.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/loongarch/kernel/stacktrace.c b/arch/loongarch/kernel/stacktrace.c
index 9a038d1070d7..0454cce3b667 100644
--- a/arch/loongarch/kernel/stacktrace.c
+++ b/arch/loongarch/kernel/stacktrace.c
@@ -30,10 +30,15 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
}
regs->regs[1] = 0;
regs->regs[22] = 0;
+ regs->csr_prmd = task->thread.csr_prmd;
}
for (unwind_start(&state, task, regs);
!unwind_done(&state); unwind_next_frame(&state)) {
+ /* Success path for user tasks */
+ if (user_mode(regs))
+ return;
+
addr = unwind_get_return_address(&state);
if (!addr || !consume_entry(cookie, addr))
break;
@@ -57,9 +62,14 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
}
regs->regs[1] = 0;
regs->regs[22] = 0;
+ regs->csr_prmd = task->thread.csr_prmd;
for (unwind_start(&state, task, regs);
!unwind_done(&state) && !unwind_error(&state); unwind_next_frame(&state)) {
+ /* Success path for user tasks */
+ if (user_mode(regs))
+ return 0;
+
addr = unwind_get_return_address(&state);
/*
--
2.42.0
^ permalink raw reply related
* [PATCH v1 1/2] livepatch: Add config LIVEPATCH_DEBUG to get debug information
From: Tiezhu Yang @ 2025-09-09 11:31 UTC (permalink / raw)
To: Josh Poimboeuf, Huacai Chen
Cc: Xi Zhang, live-patching, loongarch, linux-kernel
In-Reply-To: <20250909113106.22992-1-yangtiezhu@loongson.cn>
Add config LIVEPATCH_DEBUG and define DEBUG if CONFIG_LIVEPATCH_DEBUG
is set, then pr_debug() can print a debug level message, it is a easy
way to get debug information without dynamic debugging.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
kernel/livepatch/Kconfig | 8 ++++++++
kernel/livepatch/transition.c | 4 ++++
2 files changed, 12 insertions(+)
diff --git a/kernel/livepatch/Kconfig b/kernel/livepatch/Kconfig
index 53d51ed619a3..4843665b1939 100644
--- a/kernel/livepatch/Kconfig
+++ b/kernel/livepatch/Kconfig
@@ -18,3 +18,11 @@ config LIVEPATCH
module uses the interface provided by this option to register
a patch, causing calls to patched functions to be redirected
to new function code contained in the patch module.
+
+config LIVEPATCH_DEBUG
+ bool "Kernel Live Patching debug"
+ depends on LIVEPATCH
+ help
+ Say Y here to print a debug level message with pr_debug() for
+ the Kernel Live Patching code, it is a easy way to get debug
+ information without dynamic debugging.
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 2351a19ac2a9..0ab3e5684680 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -5,6 +5,10 @@
* Copyright (C) 2015-2016 Josh Poimboeuf <jpoimboe@redhat.com>
*/
+#ifdef CONFIG_LIVEPATCH_DEBUG
+#define DEBUG
+#endif
+
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/cpu.h>
--
2.42.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox