linux-trace-kernel.vger.kernel.org archive mirror
* [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines
@ 2024-09-15 20:56 Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 01/17] powerpc/trace: Account for -fpatchable-function-entry support by toolchain Hari Bathini
                   ` (17 more replies)
  0 siblings, 18 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

This is v5 of the series posted here:
https://lore.kernel.org/all/cover.1720942106.git.naveen@kernel.org/

This series reworks core ftrace support on powerpc to move the function
profiling sequence out of line. This lets us have a single nop at kernel
function entry, virtually eliminating the effect of the function tracer
when it is not enabled. The out-of-line profiling sequence is allocated
at two separate places depending on a new config option.

For 64-bit powerpc, the function profiling sequence is also updated to
include an additional instruction 'mtlr r0' after the usual
two-instruction sequence to fix link stack imbalance (return address
predictor) when ftrace is enabled. This showed an improvement of ~10%
in the null_syscall benchmark (NR_LOOPS=10000000) on a Power 10 system
with ftrace enabled.

Finally, support for ftrace direct calls is added based on support for
DYNAMIC_FTRACE_WITH_CALL_OPS. BPF Trampoline support is added atop this.

Support for ftrace direct calls is added for 32-bit powerpc. There is
some code to enable bpf trampolines for 32-bit powerpc, but it is not
complete and will need to be pursued separately.

Patches 1 to 10 are independent of the rest of this series and can go in
separately. The remaining patches depend on the series from Benjamin
Gray adding support for patch_uint() and patch_ulong():
https://lore.kernel.org/all/172474280311.31690.1489687786264785049.b4-ty@ellerman.id.au/

Changelog v5:
* Intermediate files named .vmlinux.arch.* instead of .arch.vmlinux.*
* Fixed ftrace stack tracer failure due to inadvertent use of
  'add r7, r3, MCOUNT_INSN_SIZE' instruction instead of
  'addi r7, r3, MCOUNT_INSN_SIZE'
* Fixed build error for !CONFIG_MODULES case.
* .vmlinux.arch.* files compiled under arch/powerpc/tools
* Made sure .vmlinux.arch.* files are cleaned with `make clean`
* num_ool_stubs_text_end, used for setting up ftrace_ool_stub_text_end,
  is now set to zero when not required instead of being computed to some
  random negative value.
* Resolved checkpatch.pl warnings.
* Dropped RFC tag.

Changelog v4:
- Patches 1, 10 and 13 are new.
- Address review comments from Nick. Numerous changes throughout the
  patch series.
- Extend support for ftrace ool to vmlinux text up to 64MB (patch 13).
- Address remaining TODOs in support for BPF Trampolines.
- Update synchronization when patching instructions during trampoline
  attach/detach.


Naveen N Rao (17):
  powerpc/trace: Account for -fpatchable-function-entry support by
    toolchain
  powerpc/kprobes: Use ftrace to determine if a probe is at function
    entry
  powerpc64/ftrace: Nop out additional 'std' instruction emitted by gcc
    v5.x
  powerpc32/ftrace: Unify 32-bit and 64-bit ftrace entry code
  powerpc/module_64: Convert #ifdef to IS_ENABLED()
  powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace
  powerpc/ftrace: Skip instruction patching if the instructions are the
    same
  powerpc/ftrace: Move ftrace stub used for init text before _einittext
  powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into
    bpf_jit_emit_func_call_rel()
  powerpc/ftrace: Add a postlink script to validate function tracer
  kbuild: Add generic hook for architectures to use before the final
    vmlinux link
  powerpc64/ftrace: Move ftrace sequence out of line
  powerpc64/ftrace: Support .text larger than 32MB with out-of-line
    stubs
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
  samples/ftrace: Add support for ftrace direct samples on powerpc
  powerpc64/bpf: Add support for bpf trampolines

 arch/Kconfig                                |   6 +
 arch/powerpc/Kbuild                         |   2 +-
 arch/powerpc/Kconfig                        |  23 +-
 arch/powerpc/Makefile                       |   8 +
 arch/powerpc/Makefile.postlink              |   8 +
 arch/powerpc/include/asm/ftrace.h           |  33 +-
 arch/powerpc/include/asm/module.h           |   5 +
 arch/powerpc/include/asm/ppc-opcode.h       |  14 +
 arch/powerpc/kernel/asm-offsets.c           |  11 +
 arch/powerpc/kernel/kprobes.c               |  18 +-
 arch/powerpc/kernel/module_64.c             |  66 +-
 arch/powerpc/kernel/trace/Makefile          |  11 +-
 arch/powerpc/kernel/trace/ftrace.c          | 298 ++++++-
 arch/powerpc/kernel/trace/ftrace_64_pg.c    |  69 +-
 arch/powerpc/kernel/trace/ftrace_entry.S    | 244 ++++--
 arch/powerpc/kernel/vmlinux.lds.S           |   3 +-
 arch/powerpc/net/bpf_jit.h                  |  12 +
 arch/powerpc/net/bpf_jit_comp.c             | 847 +++++++++++++++++++-
 arch/powerpc/net/bpf_jit_comp32.c           |   7 +-
 arch/powerpc/net/bpf_jit_comp64.c           |  68 +-
 arch/powerpc/tools/Makefile                 |  12 +
 arch/powerpc/tools/ftrace-gen-ool-stubs.sh  |  52 ++
 arch/powerpc/tools/ftrace_check.sh          |  50 ++
 samples/ftrace/ftrace-direct-modify.c       |  85 +-
 samples/ftrace/ftrace-direct-multi-modify.c | 101 ++-
 samples/ftrace/ftrace-direct-multi.c        |  79 +-
 samples/ftrace/ftrace-direct-too.c          |  83 +-
 samples/ftrace/ftrace-direct.c              |  69 +-
 scripts/Makefile.vmlinux                    |   7 +
 scripts/link-vmlinux.sh                     |   7 +-
 30 files changed, 2098 insertions(+), 200 deletions(-)
 create mode 100644 arch/powerpc/tools/Makefile
 create mode 100755 arch/powerpc/tools/ftrace-gen-ool-stubs.sh
 create mode 100755 arch/powerpc/tools/ftrace_check.sh

-- 
2.46.0



* [PATCH v5 01/17] powerpc/trace: Account for -fpatchable-function-entry support by toolchain
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 02/17] powerpc/kprobes: Use ftrace to determine if a probe is at function entry Hari Bathini
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

So far, we have relied on the fact that gcc supports both
-mprofile-kernel, as well as -fpatchable-function-entry, and clang
supports neither. Our Makefile only checks for CONFIG_MPROFILE_KERNEL to
decide which files to build. Clang has a feature request out [*] to
implement -fpatchable-function-entry, and is unlikely to support
-mprofile-kernel.

Update our Makefile checks so that we pick up the correct files to build
once clang picks up support for -fpatchable-function-entry.

[*] https://github.com/llvm/llvm-project/issues/57031

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/trace/Makefile | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
index 125f4ca588b9..d6c3885453bd 100644
--- a/arch/powerpc/kernel/trace/Makefile
+++ b/arch/powerpc/kernel/trace/Makefile
@@ -9,12 +9,15 @@ CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_ftrace_64_pg.o = $(CC_FLAGS_FTRACE)
 endif
 
-obj32-$(CONFIG_FUNCTION_TRACER)		+= ftrace.o ftrace_entry.o
-ifdef CONFIG_MPROFILE_KERNEL
-obj64-$(CONFIG_FUNCTION_TRACER)		+= ftrace.o ftrace_entry.o
+ifdef CONFIG_FUNCTION_TRACER
+obj32-y					+= ftrace.o ftrace_entry.o
+ifeq ($(CONFIG_MPROFILE_KERNEL)$(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY),)
+obj64-y					+= ftrace_64_pg.o ftrace_64_pg_entry.o
 else
-obj64-$(CONFIG_FUNCTION_TRACER)		+= ftrace_64_pg.o ftrace_64_pg_entry.o
+obj64-y					+= ftrace.o ftrace_entry.o
+endif
 endif
+
 obj-$(CONFIG_TRACING)			+= trace_clock.o
 
 obj-$(CONFIG_PPC64)			+= $(obj64-y)
-- 
2.46.0



* [PATCH v5 02/17] powerpc/kprobes: Use ftrace to determine if a probe is at function entry
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 01/17] powerpc/trace: Account for -fpatchable-function-entry support by toolchain Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 03/17] powerpc64/ftrace: Nop out additional 'std' instruction emitted by gcc v5.x Hari Bathini
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

Rather than hard-coding the offset into a function to be used to
determine if a kprobe is at function entry, use ftrace_location() to
determine the ftrace location within the function and treat all
instructions up to that offset as function entry.

For functions that cannot be traced, we fall back to using a fixed
offset of 8 (two instructions) to categorize a probe as being at
function entry for 64-bit elfv2, unless we are using pcrel.
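
The decision described above can be sketched as a small standalone C
model. This is an illustration only, not the kernel code:
model_kprobe_on_func_entry() and its parameters are hypothetical names.
ftrace_ip stands in for the value returned by ftrace_location() (0 when
the function cannot be traced), and elfv2_no_pcrel stands in for the
CONFIG_PPC64_ELF_ABI_V2 && !CONFIG_PPC_KERNEL_PCREL check. The model
assumes the ftrace location, when present, is at or after the function
start.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the function-entry check (hypothetical names).
 * addr:           start address of the function
 * offset:         probe offset from addr
 * ftrace_ip:      stand-in for ftrace_location(addr); 0 if not traceable
 * elfv2_no_pcrel: stand-in for the ELFv2-without-pcrel config check
 */
static bool model_kprobe_on_func_entry(unsigned long addr, unsigned long offset,
				       unsigned long ftrace_ip, bool elfv2_no_pcrel)
{
	/* Assumption of this model: the ftrace location is not before addr. */
	assert(!ftrace_ip || ftrace_ip >= addr);

	/* Traceable function: everything up to the ftrace location is entry. */
	if (ftrace_ip)
		return offset <= (ftrace_ip - addr);
	/* Not traceable: fixed two-instruction window for ELFv2 without pcrel. */
	if (elfv2_no_pcrel)
		return offset <= 8;
	/* Otherwise only the very first instruction counts as entry. */
	return offset == 0;
}
```

For example, with a function at 0x1000 whose ftrace location is 0x1008,
a probe at offset 8 is treated as being at function entry while one at
offset 12 is not.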

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/kprobes.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index f8aa91bc3b17..bf382c459e1f 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -105,24 +105,22 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
 	return addr;
 }
 
-static bool arch_kprobe_on_func_entry(unsigned long offset)
+static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long offset)
 {
-#ifdef CONFIG_PPC64_ELF_ABI_V2
-#ifdef CONFIG_KPROBES_ON_FTRACE
-	return offset <= 16;
-#else
-	return offset <= 8;
-#endif
-#else
+	unsigned long ip = ftrace_location(addr);
+
+	if (ip)
+		return offset <= (ip - addr);
+	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+		return offset <= 8;
 	return !offset;
-#endif
 }
 
 /* XXX try and fold the magic of kprobe_lookup_name() in this */
 kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
 					 bool *on_func_entry)
 {
-	*on_func_entry = arch_kprobe_on_func_entry(offset);
+	*on_func_entry = arch_kprobe_on_func_entry(addr, offset);
 	return (kprobe_opcode_t *)(addr + offset);
 }
 
-- 
2.46.0



* [PATCH v5 03/17] powerpc64/ftrace: Nop out additional 'std' instruction emitted by gcc v5.x
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 01/17] powerpc/trace: Account for -fpatchable-function-entry support by toolchain Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 02/17] powerpc/kprobes: Use ftrace to determine if a probe is at function entry Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 04/17] powerpc32/ftrace: Unify 32-bit and 64-bit ftrace entry code Hari Bathini
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

Gcc v5.x emits a 3-instruction sequence for -mprofile-kernel:
	mflr	r0
	std	r0, 16(r1)
	bl	_mcount

Gcc v6.x moved to a simpler 2-instruction sequence by removing the 'std'
instruction. The store saved the return address in the LR save area in
the caller stack frame for stack unwinding. However, with dynamic
ftrace, we no longer have a call to _mcount on kernel boot when ftrace
is not enabled. When ftrace is enabled, that store is performed within
ftrace_caller(). As such, the additional 'std' instruction is redundant.
Nop it out on kernel boot.

With this change, we now use the same 2-instruction profiling sequence
with both -mprofile-kernel, as well as -fpatchable-function-entry on
64-bit powerpc.
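
As a rough illustration of the init-time fixup, the following userspace
sketch models the instruction stream as an array of 32-bit words. It is
not the kernel implementation: model_init_nop() is a hypothetical
helper, and unlike the kernel code it does not validate the 'bl' slot
itself before replacing it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define INSN_MFLR_R0	0x7c0802a6u	/* mflr r0 */
#define INSN_STD_R0_LR	0xf8010010u	/* std r0,16(r1) */
#define INSN_NOP	0x60000000u	/* nop (ori r0,r0,0) */

/*
 * Model of the init-time fixup for -mprofile-kernel (hypothetical helper).
 * text[bl_idx] is the 'bl _mcount' slot. Returns 0 on success, or
 * -EINVAL if the preceding instructions are not the expected ones.
 */
static int model_init_nop(uint32_t *text, size_t bl_idx)
{
	if (bl_idx < 1)
		return -EINVAL;

	if (text[bl_idx - 1] != INSN_MFLR_R0) {
		/* gcc v5.x layout: 'mflr r0', 'std r0,16(r1)', 'bl _mcount' */
		if (bl_idx < 2 || text[bl_idx - 2] != INSN_MFLR_R0 ||
		    text[bl_idx - 1] != INSN_STD_R0_LR)
			return -EINVAL;
		text[bl_idx - 1] = INSN_NOP;	/* drop the redundant store */
	}

	text[bl_idx] = INSN_NOP;		/* nop out the call itself */
	return 0;
}
```

Both the three-instruction (gcc v5.x) and two-instruction layouts end up
as 'mflr r0' followed only by nops in this model.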

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/trace/ftrace.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index d8d6b4fd9a14..2ef504700e8d 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -246,8 +246,12 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 		/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl _mcount' */
 		ret = ftrace_read_inst(ip - 4, &old);
 		if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0)))) {
+			/* Gcc v5.x emits the additional 'std' instruction, gcc v6.x doesn't */
 			ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
-			ret |= ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
+			if (ret)
+				return ret;
+			ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
+						 ppc_inst(PPC_RAW_NOP()));
 		}
 	} else {
 		return -EINVAL;
-- 
2.46.0



* [PATCH v5 04/17] powerpc32/ftrace: Unify 32-bit and 64-bit ftrace entry code
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (2 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 03/17] powerpc64/ftrace: Nop out additional 'std' instruction emitted by gcc v5.x Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 05/17] powerpc/module_64: Convert #ifdef to IS_ENABLED() Hari Bathini
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

On 32-bit powerpc, gcc generates a three instruction sequence for
function profiling:
	mflr	r0
	stw	r0, 4(r1)
	bl	_mcount

On kernel boot, the call to _mcount() is nop-ed out, to be patched back
in when ftrace is actually enabled. The 'stw' instruction therefore is
not necessary unless ftrace is enabled. Nop it out during ftrace init.

When ftrace is enabled, we want the 'stw' so that stack unwinding works
properly. Perform the store within the ftrace handler instead, similar
to 64-bit powerpc.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/trace/ftrace.c       | 6 ++++--
 arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 2ef504700e8d..8c3e523e4f96 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -240,8 +240,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 	} else if (IS_ENABLED(CONFIG_PPC32)) {
 		/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
 		ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
-		if (!ret)
-			ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
+		if (ret)
+			return ret;
+		ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
+					 ppc_inst(PPC_RAW_NOP()));
 	} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
 		/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl _mcount' */
 		ret = ftrace_read_inst(ip - 4, &old);
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 76dbe9fd2c0f..244a1c7bb1e8 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,6 +33,8 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro	ftrace_regs_entry allregs
+	/* Save the original return address in A's stack frame */
+	PPC_STL		r0, LRSAVE(r1)
 	/* Create a minimal stack frame for representing B */
 	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)
 
@@ -44,8 +46,6 @@
 	SAVE_GPRS(3, 10, r1)
 
 #ifdef CONFIG_PPC64
-	/* Save the original return address in A's stack frame */
-	std	r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
 	/* Ok to continue? */
 	lbz	r3, PACA_FTRACE_ENABLED(r13)
 	cmpdi	r3, 0
-- 
2.46.0



* [PATCH v5 05/17] powerpc/module_64: Convert #ifdef to IS_ENABLED()
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (3 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 04/17] powerpc32/ftrace: Unify 32-bit and 64-bit ftrace entry code Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 06/17] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace Hari Bathini
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

Minor refactor for converting #ifdef to IS_ENABLED().
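
To illustrate why the conversion preserves behaviour, here is a
simplified userspace model of the IS_ENABLED() idiom. SKETCH_IS_ENABLED()
and its helper macros are hypothetical names that only handle options
defined to 1 (the real macro in include/linux/kconfig.h also handles
modular options). Since CONFIG_DYNAMIC_FTRACE_WITH_REGS depends on
CONFIG_DYNAMIC_FTRACE, summing the two values yields the same count as
the nested #ifdef blocks.

```c
#include <assert.h>

/* Simplified model: evaluates to 1 if the option is defined to 1, else 0. */
#define SKETCH_PLACEHOLDER_1			0,
#define SKETCH_TAKE_SECOND(ignored, val, ...)	val
#define SKETCH_IS_DEFINED_3(arg1_or_junk) \
	SKETCH_TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define SKETCH_IS_DEFINED_2(val) \
	SKETCH_IS_DEFINED_3(SKETCH_PLACEHOLDER_##val)
#define SKETCH_IS_ENABLED(option)		SKETCH_IS_DEFINED_2(option)

/* Example configuration (an assumption made for this sketch). */
#define CONFIG_DYNAMIC_FTRACE 1
/* CONFIG_DYNAMIC_FTRACE_WITH_REGS is deliberately left undefined. */

/* Number of extra stubs needed for ftrace under the example config. */
static unsigned long model_ftrace_stub_relocs(void)
{
	return SKETCH_IS_ENABLED(CONFIG_DYNAMIC_FTRACE) +
	       SKETCH_IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS);
}
```

Because the macro expands to a constant in either case, both options
remain visible to the compiler, which is the usual motivation for
preferring IS_ENABLED() over #ifdef.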

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/module_64.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index e9bab599d0c2..1db88409bd95 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -241,14 +241,8 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
 		}
 	}
 
-#ifdef CONFIG_DYNAMIC_FTRACE
-	/* make the trampoline to the ftrace_caller */
-	relocs++;
-#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
-	/* an additional one for ftrace_regs_caller */
-	relocs++;
-#endif
-#endif
+	/* stubs for ftrace_caller and ftrace_regs_caller */
+	relocs += IS_ENABLED(CONFIG_DYNAMIC_FTRACE) + IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS);
 
 	pr_debug("Looks like a total of %lu stubs, max\n", relocs);
 	return relocs * sizeof(struct ppc64_stub_entry);
-- 
2.46.0



* [PATCH v5 06/17] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (4 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 05/17] powerpc/module_64: Convert #ifdef to IS_ENABLED() Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 07/17] powerpc/ftrace: Skip instruction patching if the instructions are the same Hari Bathini
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

The pointer to struct module is only relevant for ftrace records
belonging to kernel modules. Having this field in dyn_arch_ftrace wastes
memory for all ftrace records belonging to the kernel. Remove it in
favour of looking up the module from the ftrace record address, similar
to other architectures.
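
The idea of deriving the module from the record address, rather than
caching a pointer in every record, can be sketched in userspace as
below. This is a model under stated assumptions, not the kernel code:
struct model_module, model_module_text_address() and
model_lookup_module_stub() are hypothetical stand-ins, and returning 0
when no module covers the address is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a loaded module's text and stubs. */
struct model_module {
	unsigned long text_start;
	unsigned long text_end;		/* exclusive */
	unsigned long tramp;		/* stub for ftrace_caller */
	unsigned long tramp_regs;	/* stub for ftrace_regs_caller */
};

/* Find the module whose text contains ip, or NULL if there is none. */
static const struct model_module *
model_module_text_address(const struct model_module *mods, size_t n,
			  unsigned long ip)
{
	for (size_t i = 0; i < n; i++)
		if (ip >= mods[i].text_start && ip < mods[i].text_end)
			return &mods[i];
	return NULL;
}

/* Return the stub to branch to, or 0 when no module covers ip. */
static unsigned long
model_lookup_module_stub(const struct model_module *mods, size_t n,
			 unsigned long ip, int want_regs)
{
	const struct model_module *mod = model_module_text_address(mods, n, ip);

	if (!mod)
		return 0;
	return want_regs ? mod->tramp_regs : mod->tramp;
}
```

The lookup costs a search on each call, but no per-record storage is
needed for records that belong to the core kernel.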

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/include/asm/ftrace.h        |  1 -
 arch/powerpc/kernel/trace/ftrace.c       | 49 +++++++++--------
 arch/powerpc/kernel/trace/ftrace_64_pg.c | 69 ++++++++++--------------
 3 files changed, 56 insertions(+), 63 deletions(-)

diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 559560286e6d..278d4548e8f1 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -24,7 +24,6 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
 struct module;
 struct dyn_ftrace;
 struct dyn_arch_ftrace {
-	struct module *mod;
 };
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 8c3e523e4f96..fe0546fbac8e 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -106,28 +106,43 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
 	return 0;
 }
 
+#ifdef CONFIG_MODULES
+static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long addr)
+{
+	struct module *mod = NULL;
+
+	preempt_disable();
+	mod = __module_text_address(ip);
+	preempt_enable();
+
+	if (!mod)
+		pr_err("No module loaded at addr=%lx\n", ip);
+
+	return (addr == (unsigned long)ftrace_caller ? mod->arch.tramp : mod->arch.tramp_regs);
+}
+#else
+static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long addr)
+{
+	return 0;
+}
+#endif
+
 static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_inst_t *call_inst)
 {
 	unsigned long ip = rec->ip;
 	unsigned long stub;
 
-	if (is_offset_in_branch_range(addr - ip)) {
+	if (is_offset_in_branch_range(addr - ip))
 		/* Within range */
 		stub = addr;
-#ifdef CONFIG_MODULES
-	} else if (rec->arch.mod) {
-		/* Module code would be going to one of the module stubs */
-		stub = (addr == (unsigned long)ftrace_caller ? rec->arch.mod->arch.tramp :
-							       rec->arch.mod->arch.tramp_regs);
-#endif
-	} else if (core_kernel_text(ip)) {
+	else if (core_kernel_text(ip))
 		/* We would be branching to one of our ftrace stubs */
 		stub = find_ftrace_tramp(ip);
-		if (!stub) {
-			pr_err("0x%lx: No ftrace stubs reachable\n", ip);
-			return -EINVAL;
-		}
-	} else {
+	else
+		stub = ftrace_lookup_module_stub(ip, addr);
+
+	if (!stub) {
+		pr_err("0x%lx: No ftrace stubs reachable\n", ip);
 		return -EINVAL;
 	}
 
@@ -262,14 +277,6 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 	if (ret)
 		return ret;
 
-	if (!core_kernel_text(ip)) {
-		if (!mod) {
-			pr_err("0x%lx: No module provided for non-kernel address\n", ip);
-			return -EFAULT;
-		}
-		rec->arch.mod = mod;
-	}
-
 	/* Nop-out the ftrace location */
 	new = ppc_inst(PPC_RAW_NOP());
 	addr = MCOUNT_ADDR;
diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c b/arch/powerpc/kernel/trace/ftrace_64_pg.c
index 12fab1803bcf..8a551dfca3d0 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
+++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
@@ -116,6 +116,20 @@ static unsigned long find_bl_target(unsigned long ip, ppc_inst_t op)
 }
 
 #ifdef CONFIG_MODULES
+static struct module *ftrace_lookup_module(struct dyn_ftrace *rec)
+{
+	struct module *mod;
+
+	preempt_disable();
+	mod = __module_text_address(rec->ip);
+	preempt_enable();
+
+	if (!mod)
+		pr_err("No module loaded at addr=%lx\n", rec->ip);
+
+	return mod;
+}
+
 static int
 __ftrace_make_nop(struct module *mod,
 		  struct dyn_ftrace *rec, unsigned long addr)
@@ -124,6 +138,12 @@ __ftrace_make_nop(struct module *mod,
 	unsigned long ip = rec->ip;
 	ppc_inst_t op, pop;
 
+	if (!mod) {
+		mod = ftrace_lookup_module(rec);
+		if (!mod)
+			return -EINVAL;
+	}
+
 	/* read where this goes */
 	if (copy_inst_from_kernel_nofault(&op, (void *)ip)) {
 		pr_err("Fetching opcode failed.\n");
@@ -366,27 +386,6 @@ int ftrace_make_nop(struct module *mod,
 		return -EINVAL;
 	}
 
-	/*
-	 * Out of range jumps are called from modules.
-	 * We should either already have a pointer to the module
-	 * or it has been passed in.
-	 */
-	if (!rec->arch.mod) {
-		if (!mod) {
-			pr_err("No module loaded addr=%lx\n", addr);
-			return -EFAULT;
-		}
-		rec->arch.mod = mod;
-	} else if (mod) {
-		if (mod != rec->arch.mod) {
-			pr_err("Record mod %p not equal to passed in mod %p\n",
-			       rec->arch.mod, mod);
-			return -EINVAL;
-		}
-		/* nothing to do if mod == rec->arch.mod */
-	} else
-		mod = rec->arch.mod;
-
 	return __ftrace_make_nop(mod, rec, addr);
 }
 
@@ -411,7 +410,10 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	ppc_inst_t op[2];
 	void *ip = (void *)rec->ip;
 	unsigned long entry, ptr, tramp;
-	struct module *mod = rec->arch.mod;
+	struct module *mod = ftrace_lookup_module(rec);
+
+	if (!mod)
+		return -EINVAL;
 
 	/* read where this goes */
 	if (copy_inst_from_kernel_nofault(op, ip))
@@ -533,16 +535,6 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 		return -EINVAL;
 	}
 
-	/*
-	 * Out of range jumps are called from modules.
-	 * Being that we are converting from nop, it had better
-	 * already have a module defined.
-	 */
-	if (!rec->arch.mod) {
-		pr_err("No module loaded\n");
-		return -EINVAL;
-	}
-
 	return __ftrace_make_call(rec, addr);
 }
 
@@ -555,7 +547,10 @@ __ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 	ppc_inst_t op;
 	unsigned long ip = rec->ip;
 	unsigned long entry, ptr, tramp;
-	struct module *mod = rec->arch.mod;
+	struct module *mod = ftrace_lookup_module(rec);
+
+	if (!mod)
+		return -EINVAL;
 
 	/* If we never set up ftrace trampolines, then bail */
 	if (!mod->arch.tramp || !mod->arch.tramp_regs) {
@@ -668,14 +663,6 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 		return -EINVAL;
 	}
 
-	/*
-	 * Out of range jumps are called from modules.
-	 */
-	if (!rec->arch.mod) {
-		pr_err("No module loaded\n");
-		return -EINVAL;
-	}
-
 	return __ftrace_modify_call(rec, old_addr, addr);
 }
 #endif
-- 
2.46.0



* [PATCH v5 07/17] powerpc/ftrace: Skip instruction patching if the instructions are the same
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (5 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 06/17] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 08/17] powerpc/ftrace: Move ftrace stub used for init text before _einittext Hari Bathini
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

To simplify upcoming changes to ftrace, add a check to skip actual
instruction patching if the old and new instructions are the same. We
still validate that the instruction is what we expect, but don't
actually patch the same instruction again.
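
A minimal userspace sketch of this validate-then-maybe-patch behaviour
follows. The names model_modify_code() and model_patch_count are
hypothetical, and a plain memory write stands in for
patch_instruction().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

static unsigned int model_patch_count;	/* number of actual writes performed */

/*
 * Validate that *ip holds 'old', then write 'new_insn' only if it differs.
 * Returns 0 on success, -EINVAL if the current instruction is unexpected.
 */
static int model_modify_code(uint32_t *ip, uint32_t old, uint32_t new_insn)
{
	if (*ip != old)
		return -EINVAL;

	if (old != new_insn) {
		*ip = new_insn;		/* stand-in for patch_instruction() */
		model_patch_count++;
	}
	return 0;
}
```

Calling the helper with identical old and new instructions still fails
if the location holds something unexpected, but performs no write when
validation succeeds.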

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/trace/ftrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index fe0546fbac8e..719517265d39 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -82,7 +82,7 @@ static inline int ftrace_modify_code(unsigned long ip, ppc_inst_t old, ppc_inst_
 {
 	int ret = ftrace_validate_inst(ip, old);
 
-	if (!ret)
+	if (!ret && !ppc_inst_equal(old, new))
 		ret = patch_instruction((u32 *)ip, new);
 
 	return ret;
-- 
2.46.0



* [PATCH v5 08/17] powerpc/ftrace: Move ftrace stub used for init text before _einittext
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (6 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 07/17] powerpc/ftrace: Skip instruction patching if the instructions are the same Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 09/17] powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into bpf_jit_emit_func_call_rel() Hari Bathini
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

Move the ftrace stub used to cover inittext before _einittext so that it
is within kernel text, as seen through core_kernel_text(). This is
required for a subsequent change to ftrace.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/vmlinux.lds.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index 7ab4e2fb28b1..b4c9decc7a75 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -265,14 +265,13 @@ SECTIONS
 	.init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
 		_sinittext = .;
 		INIT_TEXT
-
+		*(.tramp.ftrace.init);
 		/*
 		 *.init.text might be RO so we must ensure this section ends on
 		 * a page boundary.
 		 */
 		. = ALIGN(PAGE_SIZE);
 		_einittext = .;
-		*(.tramp.ftrace.init);
 	} :text
 
 	/* .exit.text is discarded at runtime, not link time,
-- 
2.46.0



* [PATCH v5 09/17] powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into bpf_jit_emit_func_call_rel()
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (7 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 08/17] powerpc/ftrace: Move ftrace stub used for init text before _einittext Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 10/17] powerpc/ftrace: Add a postlink script to validate function tracer Hari Bathini
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

Commit 61688a82e047 ("powerpc/bpf: enable kfunc call") enhanced
bpf_jit_emit_func_call_hlp() to handle calls out to module region, where
bpf progs are generated. The only difference now between
bpf_jit_emit_func_call_hlp() and bpf_jit_emit_func_call_rel() is in
handling of the initial pass where target function address is not known.
Fold that logic into bpf_jit_emit_func_call_hlp() and rename it to
bpf_jit_emit_func_call_rel() to simplify bpf function call JIT code.
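
As an illustration of the initial-pass handling, here is a rough
user-space model of the fixed-size placeholder. This is only a sketch:
the buffer handling and the emit_call_placeholder() name are assumptions
made for this example and are not part of the patch. The encodings are
the standard ones for nop, mtctr r12 and bctrl.

```c
#include <stddef.h>
#include <stdint.h>

#define INSN_NOP	0x60000000u	/* ori r0,r0,0 */
#define INSN_MTCTR_R12	0x7d8903a6u	/* mtctr r12 */
#define INSN_BCTRL	0x4e800421u	/* bctrl */

/*
 * Hypothetical helper: write the call placeholder into buf and return
 * the number of instructions written. The target address is unknown in
 * the initial pass, so the address-load slots are nops: five for the
 * worst-case 64-bit immediate load, plus one more when the ABI needs an
 * extra load through a function descriptor (elfv1).
 */
static size_t emit_call_placeholder(uint32_t *buf, int elfv1)
{
	size_t n = 0;

	for (int i = 0; i < 5; i++)
		buf[n++] = INSN_NOP;
	if (elfv1)
		buf[n++] = INSN_NOP;
	buf[n++] = INSN_MTCTR_R12;
	buf[n++] = INSN_BCTRL;

	return n;
}
```

Because the placeholder always has the same length, later passes can
substitute the real address load without shifting any code that follows
the call.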

We don't actually need to load/restore TOC across a call out to a
different kernel helper or to a different bpf program since they all
work with the kernel TOC. We only need to do it if we have to call out
to a module function. So, guard TOC load/restore with appropriate
conditions.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/net/bpf_jit_comp64.c | 61 +++++++++----------------------
 1 file changed, 17 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 2cbcdf93cc19..f3be024fc685 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -202,14 +202,22 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
 	EMIT(PPC_RAW_BLR());
 }
 
-static int
-bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct codegen_context *ctx, u64 func)
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *ctx, u64 func)
 {
 	unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
 	long reladdr;
 
-	if (WARN_ON_ONCE(!kernel_text_address(func_addr)))
-		return -EINVAL;
+	/* bpf to bpf call, func is not known in the initial pass. Emit 5 nops as a placeholder */
+	if (!func) {
+		for (int i = 0; i < 5; i++)
+			EMIT(PPC_RAW_NOP());
+		/* elfv1 needs an additional instruction to load addr from descriptor */
+		if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V1))
+			EMIT(PPC_RAW_NOP());
+		EMIT(PPC_RAW_MTCTR(_R12));
+		EMIT(PPC_RAW_BCTRL());
+		return 0;
+	}
 
 #ifdef CONFIG_PPC_KERNEL_PCREL
 	reladdr = func_addr - local_paca->kernelbase;
@@ -266,7 +274,8 @@ bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct codegen_context *ctx,
 			 * We can clobber r2 since we get called through a
 			 * function pointer (so caller will save/restore r2).
 			 */
-			EMIT(PPC_RAW_LD(_R2, bpf_to_ppc(TMP_REG_2), 8));
+			if (is_module_text_address(func_addr))
+				EMIT(PPC_RAW_LD(_R2, bpf_to_ppc(TMP_REG_2), 8));
 		} else {
 			PPC_LI64(_R12, func);
 			EMIT(PPC_RAW_MTCTR(_R12));
@@ -276,46 +285,14 @@ bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct codegen_context *ctx,
 		 * Load r2 with kernel TOC as kernel TOC is used if function address falls
 		 * within core kernel text.
 		 */
-		EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, kernel_toc)));
+		if (is_module_text_address(func_addr))
+			EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, kernel_toc)));
 	}
 #endif
 
 	return 0;
 }
 
-int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *ctx, u64 func)
-{
-	unsigned int i, ctx_idx = ctx->idx;
-
-	if (WARN_ON_ONCE(func && is_module_text_address(func)))
-		return -EINVAL;
-
-	/* skip past descriptor if elf v1 */
-	func += FUNCTION_DESCR_SIZE;
-
-	/* Load function address into r12 */
-	PPC_LI64(_R12, func);
-
-	/* For bpf-to-bpf function calls, the callee's address is unknown
-	 * until the last extra pass. As seen above, we use PPC_LI64() to
-	 * load the callee's address, but this may optimize the number of
-	 * instructions required based on the nature of the address.
-	 *
-	 * Since we don't want the number of instructions emitted to increase,
-	 * we pad the optimized PPC_LI64() call with NOPs to guarantee that
-	 * we always have a five-instruction sequence, which is the maximum
-	 * that PPC_LI64() can emit.
-	 */
-	if (!image)
-		for (i = ctx->idx - ctx_idx; i < 5; i++)
-			EMIT(PPC_RAW_NOP());
-
-	EMIT(PPC_RAW_MTCTR(_R12));
-	EMIT(PPC_RAW_BCTRL());
-
-	return 0;
-}
-
 static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)
 {
 	/*
@@ -1102,11 +1079,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
 			if (ret < 0)
 				return ret;
 
-			if (func_addr_fixed)
-				ret = bpf_jit_emit_func_call_hlp(image, fimage, ctx, func_addr);
-			else
-				ret = bpf_jit_emit_func_call_rel(image, fimage, ctx, func_addr);
-
+			ret = bpf_jit_emit_func_call_rel(image, fimage, ctx, func_addr);
 			if (ret)
 				return ret;
 
-- 
2.46.0



* [PATCH v5 10/17] powerpc/ftrace: Add a postlink script to validate function tracer
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (8 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 09/17] powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into bpf_jit_emit_func_call_rel() Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 11/17] kbuild: Add generic hook for architectures to use before the final vmlinux link Hari Bathini
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

The function tracer on powerpc only works when vmlinux has a .text size
of up to ~64MB, because powerpc relative branch instructions have a
limited range of 32MB. Today, this is only detected at kernel boot, when
ftrace is initialized. Add a post-link script to check the size of .text
so that we can detect this at build time, and break the build if
necessary.
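
For reference, the two distance checks can be modelled in C roughly as
below. This is a sketch under assumptions, not the script itself: the
function names and return codes are made up for illustration, and the
addresses would come from nm output in practice.

```c
#include <stdint.h>

#define SZ_32M	0x2000000
#define SZ_64M	0x4000000

/* A relative branch reaches -32MiB .. +32MiB-4 and must be word aligned. */
static int in_branch_range(int64_t offset)
{
	return offset >= -SZ_32M && offset <= SZ_32M - 4 && !(offset & 0x3);
}

/*
 * Hypothetical mirror of the build-time checks. Returns 0 if the layout
 * is fine, 1 if ftrace_caller is 32MiB or more past _stext, and 2 if the
 * trampoline at the end of text is 64MiB or more past ftrace_caller.
 */
static int ftrace_layout_check(uint64_t stext, uint64_t ftrace_caller,
			       uint64_t ftrace_tramp_text)
{
	if (ftrace_caller - stext >= SZ_32M)
		return 1;
	if (ftrace_tramp_text - ftrace_caller >= SZ_64M)
		return 2;
	return 0;
}
```

The second limit is 64MiB rather than 32MiB because a function placed
between ftrace_caller and the trampoline only needs to be within branch
range of one of the two.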

We add a dependency on !COMPILE_TEST for CONFIG_HAVE_FUNCTION_TRACER so
that allyesconfig and other test builds can continue to work without
enabling ftrace.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/Kconfig               |  2 +-
 arch/powerpc/Makefile.postlink     |  8 +++++
 arch/powerpc/tools/ftrace_check.sh | 50 ++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100755 arch/powerpc/tools/ftrace_check.sh

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1f9d23b276b5..de18f3baff66 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -243,7 +243,7 @@ config PPC
 	select HAVE_FUNCTION_DESCRIPTORS	if PPC64_ELF_ABI_V1
 	select HAVE_FUNCTION_ERROR_INJECTION
 	select HAVE_FUNCTION_GRAPH_TRACER
-	select HAVE_FUNCTION_TRACER		if PPC64 || (PPC32 && CC_IS_GCC)
+	select HAVE_FUNCTION_TRACER		if !COMPILE_TEST && (PPC64 || (PPC32 && CC_IS_GCC))
 	select HAVE_GCC_PLUGINS			if GCC_VERSION >= 50200   # plugin support on gcc <= 5.1 is buggy on PPC
 	select HAVE_GENERIC_VDSO
 	select HAVE_HARDLOCKUP_DETECTOR_ARCH	if PPC_BOOK3S_64 && SMP
diff --git a/arch/powerpc/Makefile.postlink b/arch/powerpc/Makefile.postlink
index ae5a4256b03d..bb601be36173 100644
--- a/arch/powerpc/Makefile.postlink
+++ b/arch/powerpc/Makefile.postlink
@@ -24,6 +24,9 @@ else
 	$(CONFIG_SHELL) $(srctree)/arch/powerpc/tools/relocs_check.sh "$(OBJDUMP)" "$(NM)" "$@"
 endif
 
+quiet_cmd_ftrace_check = CHKFTRC $@
+      cmd_ftrace_check = $(CONFIG_SHELL) $(srctree)/arch/powerpc/tools/ftrace_check.sh "$(NM)" "$@"
+
 # `@true` prevents complaint when there is nothing to be done
 
 vmlinux: FORCE
@@ -34,6 +37,11 @@ endif
 ifdef CONFIG_RELOCATABLE
 	$(call if_changed,relocs_check)
 endif
+ifdef CONFIG_FUNCTION_TRACER
+ifndef CONFIG_PPC64_ELF_ABI_V1
+	$(call cmd,ftrace_check)
+endif
+endif
 
 clean:
 	rm -f .tmp_symbols.txt
diff --git a/arch/powerpc/tools/ftrace_check.sh b/arch/powerpc/tools/ftrace_check.sh
new file mode 100755
index 000000000000..f4310e736f1b
--- /dev/null
+++ b/arch/powerpc/tools/ftrace_check.sh
@@ -0,0 +1,50 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# This script checks vmlinux to ensure that all functions can call ftrace_caller() either directly,
+# or through the stub, ftrace_tramp_text, at the end of kernel text.
+
+# Error out if any command fails
+set -e
+
+# Allow for verbose output
+if [ "$V" = "1" ]; then
+	set -x
+fi
+
+if [ $# -lt 2 ]; then
+	echo "$0 [path to nm] [path to vmlinux]" 1>&2
+	exit 1
+fi
+
+# Have Kbuild supply the path to nm so we handle cross compilation.
+nm="$1"
+vmlinux="$2"
+
+stext_addr=$($nm "$vmlinux" | grep -e " [TA] _stext$" | \
+	cut -d' ' -f1 | tr '[[:lower:]]' '[[:upper:]]')
+ftrace_caller_addr=$($nm "$vmlinux" | grep -e " T ftrace_caller$" | \
+	cut -d' ' -f1 | tr '[[:lower:]]' '[[:upper:]]')
+ftrace_tramp_addr=$($nm "$vmlinux" | grep -e " T ftrace_tramp_text$" | \
+	cut -d' ' -f1 | tr '[[:lower:]]' '[[:upper:]]')
+
+ftrace_caller_offset=$(echo "ibase=16;$ftrace_caller_addr - $stext_addr" | bc)
+ftrace_tramp_offset=$(echo "ibase=16;$ftrace_tramp_addr - $ftrace_caller_addr" | bc)
+sz_32m=$(printf "%d" 0x2000000)
+sz_64m=$(printf "%d" 0x4000000)
+
+# ftrace_caller - _stext < 32M
+if [ $ftrace_caller_offset -ge $sz_32m ]; then
+	echo "ERROR: ftrace_caller (0x$ftrace_caller_addr) is beyond 32MiB of _stext" 1>&2
+	echo "ERROR: consider disabling CONFIG_FUNCTION_TRACER, or reducing the size \
+		of kernel text" 1>&2
+	exit 1
+fi
+
+# ftrace_tramp_text - ftrace_caller < 64M
+if [ $ftrace_tramp_offset -ge $sz_64m ]; then
+	echo "ERROR: kernel text extends beyond 64MiB from ftrace_caller" 1>&2
+	echo "ERROR: consider disabling CONFIG_FUNCTION_TRACER, or reducing the size \
+		of kernel text" 1>&2
+	exit 1
+fi
-- 
2.46.0



* [PATCH v5 11/17] kbuild: Add generic hook for architectures to use before the final vmlinux link
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (9 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 10/17] powerpc/ftrace: Add a postlink script to validate function tracer Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-10-09 15:23   ` Masahiro Yamada
  2024-09-15 20:56 ` [PATCH v5 12/17] powerpc64/ftrace: Move ftrace sequence out of line Hari Bathini
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

On powerpc, we would like to be able to make a pass on vmlinux.o and
generate a new object file to be linked into vmlinux. Add a generic pass
in Makefile.vmlinux that architectures can use for this purpose.

Architectures need to select CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX and must
provide arch/<arch>/tools/Makefile with a .vmlinux.arch.o target, which
will be invoked prior to the final vmlinux link step.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---

Changes in v5:
* Intermediate files named .vmlinux.arch.* instead of .arch.vmlinux.*


 arch/Kconfig             | 6 ++++++
 scripts/Makefile.vmlinux | 7 +++++++
 scripts/link-vmlinux.sh  | 7 ++++++-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 975dd22a2dbd..ef868ff8156a 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1643,4 +1643,10 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
 config ARCH_NEED_CMPXCHG_1_EMU
 	bool
 
+config ARCH_WANTS_PRE_LINK_VMLINUX
+	def_bool n
+	help
+	  An architecture can select this if it provides arch/<arch>/tools/Makefile
+	  with .vmlinux.arch.o target to be linked into vmlinux.
+
 endmenu
diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index 49946cb96844..edf6fae8d960 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -22,6 +22,13 @@ targets += .vmlinux.export.o
 vmlinux: .vmlinux.export.o
 endif
 
+ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
+vmlinux: arch/$(SRCARCH)/tools/.vmlinux.arch.o
+
+arch/$(SRCARCH)/tools/.vmlinux.arch.o: vmlinux.o
+	$(Q)$(MAKE) $(build)=arch/$(SRCARCH)/tools $@
+endif
+
 ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
 
 # Final link of vmlinux with optional arch pass after final link
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index f7b2503cdba9..b3a940c0e6c2 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -100,7 +100,7 @@ vmlinux_link()
 	${ld} ${ldflags} -o ${output}					\
 		${wl}--whole-archive ${objs} ${wl}--no-whole-archive	\
 		${wl}--start-group ${libs} ${wl}--end-group		\
-		${kallsymso} ${btf_vmlinux_bin_o} ${ldlibs}
+		${kallsymso} ${btf_vmlinux_bin_o} ${arch_vmlinux_o} ${ldlibs}
 }
 
 # generate .BTF typeinfo from DWARF debuginfo
@@ -214,6 +214,11 @@ fi
 
 ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init init/version-timestamp.o
 
+arch_vmlinux_o=""
+if is_enabled CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX; then
+	arch_vmlinux_o=arch/${SRCARCH}/tools/.vmlinux.arch.o
+fi
+
 btf_vmlinux_bin_o=
 kallsymso=
 strip_debug=
-- 
2.46.0



* [PATCH v5 12/17] powerpc64/ftrace: Move ftrace sequence out of line
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (10 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 11/17] kbuild: Add generic hook for architectures to use before the final vmlinux link Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-10-09 15:35   ` Masahiro Yamada
  2024-09-15 20:56 ` [PATCH v5 13/17] powerpc64/ftrace: Support .text larger than 32MB with out-of-line stubs Hari Bathini
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

Function profile sequence on powerpc includes two instructions at the
beginning of each function:
	mflr	r0
	bl	ftrace_caller

The call to ftrace_caller() gets nop'ed out during kernel boot and is
patched in when ftrace is enabled.

Given the sequence, we cannot return from ftrace_caller with 'blr' as we
need to keep LR and r0 intact. This results in link stack (return
address predictor) imbalance when ftrace is enabled. To address that, we
would like to use a three instruction sequence:
	mflr	r0
	bl	ftrace_caller
	mtlr	r0

Furthermore, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to
reserve two instruction slots before the function. This results in a
total of five instruction slots to be reserved for ftrace use on each
function that is traced.

Move the function profile sequence out-of-line to minimize its impact.
To do this, we reserve a single nop at function entry using
-fpatchable-function-entry=1 and add a pass on vmlinux.o to determine
the total number of functions that can be traced. This is then used to
generate a .S file reserving the appropriate amount of space for use as
ftrace stubs, which is built and linked into vmlinux.

On bootup, the stub space is split into separate stubs per function and
populated with the proper instruction sequence. A pointer to the
associated stub is maintained in dyn_arch_ftrace.

For modules, space for ftrace stubs is reserved from the generic module
stub space.
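
The amount of generic stub space to reserve for a module follows from a
simple round-up. A minimal sketch of that arithmetic is below; the
helper names are hypothetical, and the size of a generic stub entry is
left as a parameter since it depends on the target.

```c
#include <stddef.h>

/* Round x up to the next multiple of y (y > 0). */
static size_t roundup_to(size_t x, size_t y)
{
	return ((x + y - 1) / y) * y;
}

/*
 * Number of generic module stub entries to reserve so that nr_funcs
 * out-of-line ftrace stubs fit. An out-of-line stub is four 32-bit
 * instructions, so ool_stub_size is 16 in practice; entry_size is the
 * size of one generic stub entry on the target.
 */
static size_t ool_stub_entries(size_t nr_funcs, size_t ool_stub_size,
			       size_t entry_size)
{
	return roundup_to(nr_funcs * ool_stub_size, entry_size) / entry_size;
}
```

For example, with a 40-byte generic entry (a size assumed only for this
example), 11 traceable functions need 176 bytes, which rounds up to five
entries.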

This is restricted to and enabled by default only on 64-bit powerpc,
though there are some changes to accommodate 32-bit powerpc. This is
done so that 32-bit powerpc could choose to opt into this based on
further tests and benchmarks.

As an example, after this patch, kernel functions will have a single nop
at function entry:
<kernel_clone>:
	addis	r2,r12,467
	addi	r2,r2,-16028
	nop
	mfocrf	r11,8
	...

When ftrace is enabled, the nop is converted to an unconditional branch
to the stub associated with that function:
<kernel_clone>:
	addis	r2,r12,467
	addi	r2,r2,-16028
	b	ftrace_ool_stub_text_end+0x11b28
	mfocrf	r11,8
	...

The associated stub:
<ftrace_ool_stub_text_end+0x11b28>:
	mflr	r0
	bl	ftrace_caller
	mtlr	r0
	b	kernel_clone+0xc
	...
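
The stub layout above can be modelled in user space roughly as follows.
This is a sketch for illustration only: struct ool_stub, encode_branch()
and fill_ool_stub() are hypothetical names, range checks are omitted,
and the kernel patches live text through its own helpers rather than by
plain stores.

```c
#include <stdint.h>

struct ool_stub {
	uint32_t insn[4];
};

#define INSN_MFLR_R0	0x7c0802a6u	/* mflr r0 */
#define INSN_MTLR_R0	0x7c0803a6u	/* mtlr r0 */

/* Encode a relative branch from 'from' to 'to'; link selects bl over b. */
static uint32_t encode_branch(uint64_t from, uint64_t to, int link)
{
	int64_t offset = (int64_t)(to - from);

	return 0x48000000u | ((uint32_t)offset & 0x03fffffcu) | (link ? 1u : 0u);
}

/*
 * Fill one stub located at stub_addr for the function whose patchable
 * nop sits at func_ip. The final branch returns to the instruction
 * right after that nop.
 */
static void fill_ool_stub(struct ool_stub *stub, uint64_t stub_addr,
			  uint64_t func_ip, uint64_t ftrace_caller)
{
	stub->insn[0] = INSN_MFLR_R0;
	stub->insn[1] = encode_branch(stub_addr + 4, ftrace_caller, 1);
	stub->insn[2] = INSN_MTLR_R0;
	stub->insn[3] = encode_branch(stub_addr + 12, func_ip + 4, 0);
}
```

Each branch is encoded relative to the address of its own slot in the
stub, which is why the second and fourth instructions use stub_addr + 4
and stub_addr + 12 as their source addresses.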

This change showed an improvement of ~10% in null_syscall benchmark on a
Power 10 system with ftrace enabled.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---

Changes in v5:
* Fixed ftrace stack tracer failure due to inadvertent use of
  'add r7, r3, MCOUNT_INSN_SIZE' instruction instead of
  'addi r7, r3, MCOUNT_INSN_SIZE'
* Fixed build error for !CONFIG_MODULES case.
* .vmlinux.arch.* files compiled under arch/powerpc/tools
* Made sure .vmlinux.arch.* files are cleaned with `make clean`


 arch/powerpc/Kbuild                        |   2 +-
 arch/powerpc/Kconfig                       |   5 +
 arch/powerpc/Makefile                      |   4 +
 arch/powerpc/include/asm/ftrace.h          |  11 ++
 arch/powerpc/include/asm/module.h          |   5 +
 arch/powerpc/kernel/asm-offsets.c          |   4 +
 arch/powerpc/kernel/module_64.c            |  58 +++++++-
 arch/powerpc/kernel/trace/ftrace.c         | 162 +++++++++++++++++++--
 arch/powerpc/kernel/trace/ftrace_entry.S   | 116 +++++++++++----
 arch/powerpc/tools/Makefile                |  12 ++
 arch/powerpc/tools/ftrace-gen-ool-stubs.sh |  43 ++++++
 11 files changed, 384 insertions(+), 38 deletions(-)
 create mode 100644 arch/powerpc/tools/Makefile
 create mode 100755 arch/powerpc/tools/ftrace-gen-ool-stubs.sh

diff --git a/arch/powerpc/Kbuild b/arch/powerpc/Kbuild
index 571f260b0842..b010ccb071b6 100644
--- a/arch/powerpc/Kbuild
+++ b/arch/powerpc/Kbuild
@@ -19,4 +19,4 @@ obj-$(CONFIG_KEXEC_CORE)  += kexec/
 obj-$(CONFIG_KEXEC_FILE)  += purgatory/
 
 # for cleaning
-subdir- += boot
+subdir- += boot tools
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index de18f3baff66..bae96b65f295 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -568,6 +568,11 @@ config ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 	def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh $(CC) -mlittle-endian) if PPC64 && CPU_LITTLE_ENDIAN
 	def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh $(CC) -mbig-endian) if PPC64 && CPU_BIG_ENDIAN
 
+config PPC_FTRACE_OUT_OF_LINE
+	def_bool PPC64 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+	depends on PPC64
+	select ARCH_WANTS_PRE_LINK_VMLINUX
+
 config HOTPLUG_CPU
 	bool "Support for enabling/disabling CPUs"
 	depends on SMP && (PPC_PSERIES || \
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index bbfe4a1f06ef..c973e6cd1ae8 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -155,7 +155,11 @@ CC_FLAGS_NO_FPU		:= $(call cc-option,-msoft-float)
 ifdef CONFIG_FUNCTION_TRACER
 ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 KBUILD_CPPFLAGS	+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
+ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+CC_FLAGS_FTRACE := -fpatchable-function-entry=1
+else
 CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+endif
 else
 CC_FLAGS_FTRACE := -pg
 ifdef CONFIG_MPROFILE_KERNEL
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 278d4548e8f1..bdbafc668b20 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -24,6 +24,10 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
 struct module;
 struct dyn_ftrace;
 struct dyn_arch_ftrace {
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+	/* pointer to the associated out-of-line stub */
+	unsigned long ool_stub;
+#endif
 };
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
@@ -130,6 +134,13 @@ static inline u8 this_cpu_get_ftrace_enabled(void) { return 1; }
 
 #ifdef CONFIG_FUNCTION_TRACER
 extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+struct ftrace_ool_stub {
+	u32	insn[4];
+};
+extern struct ftrace_ool_stub ftrace_ool_stub_text_end[], ftrace_ool_stub_inittext[];
+extern unsigned int ftrace_ool_stub_text_end_count, ftrace_ool_stub_inittext_count;
+#endif
 void ftrace_free_init_tramp(void);
 unsigned long ftrace_call_adjust(unsigned long addr);
 #else
diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h
index 300c777cc307..9ee70a4a0fde 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -47,6 +47,11 @@ struct mod_arch_specific {
 #ifdef CONFIG_DYNAMIC_FTRACE
 	unsigned long tramp;
 	unsigned long tramp_regs;
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+	struct ftrace_ool_stub *ool_stubs;
+	unsigned int ool_stub_count;
+	unsigned int ool_stub_index;
+#endif
 #endif
 };
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 23733282de4d..6854547d3164 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -674,5 +674,9 @@ int main(void)
 	DEFINE(BPT_SIZE, BPT_SIZE);
 #endif
 
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+	DEFINE(FTRACE_OOL_STUB_SIZE, sizeof(struct ftrace_ool_stub));
+#endif
+
 	return 0;
 }
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 1db88409bd95..6816e9967cab 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -205,7 +205,9 @@ static int relacmp(const void *_x, const void *_y)
 
 /* Get size of potential trampolines required. */
 static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
-				    const Elf64_Shdr *sechdrs)
+				    const Elf64_Shdr *sechdrs,
+				    char *secstrings,
+				    struct module *me)
 {
 	/* One extra reloc so it's always 0-addr terminated */
 	unsigned long relocs = 1;
@@ -244,6 +246,24 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
 	/* stubs for ftrace_caller and ftrace_regs_caller */
 	relocs += IS_ENABLED(CONFIG_DYNAMIC_FTRACE) + IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS);
 
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+	/* stubs for the function tracer */
+	for (i = 1; i < hdr->e_shnum; i++) {
+		if (!strcmp(secstrings + sechdrs[i].sh_name, "__patchable_function_entries")) {
+			me->arch.ool_stub_count = sechdrs[i].sh_size / sizeof(unsigned long);
+			me->arch.ool_stub_index = 0;
+			relocs += roundup(me->arch.ool_stub_count * sizeof(struct ftrace_ool_stub),
+					  sizeof(struct ppc64_stub_entry)) /
+				  sizeof(struct ppc64_stub_entry);
+			break;
+		}
+	}
+	if (i == hdr->e_shnum) {
+		pr_err("%s: doesn't contain __patchable_function_entries.\n", me->name);
+		return -ENOEXEC;
+	}
+#endif
+
 	pr_debug("Looks like a total of %lu stubs, max\n", relocs);
 	return relocs * sizeof(struct ppc64_stub_entry);
 }
@@ -454,7 +474,7 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr,
 #endif
 
 	/* Override the stubs size */
-	sechdrs[me->arch.stubs_section].sh_size = get_stubs_size(hdr, sechdrs);
+	sechdrs[me->arch.stubs_section].sh_size = get_stubs_size(hdr, sechdrs, secstrings, me);
 
 	return 0;
 }
@@ -1079,6 +1099,37 @@ int module_trampoline_target(struct module *mod, unsigned long addr,
 	return 0;
 }
 
+static int setup_ftrace_ool_stubs(const Elf64_Shdr *sechdrs, unsigned long addr, struct module *me)
+{
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+	unsigned int i, total_stubs, num_stubs;
+	struct ppc64_stub_entry *stub;
+
+	total_stubs = sechdrs[me->arch.stubs_section].sh_size / sizeof(*stub);
+	num_stubs = roundup(me->arch.ool_stub_count * sizeof(struct ftrace_ool_stub),
+			    sizeof(struct ppc64_stub_entry)) / sizeof(struct ppc64_stub_entry);
+
+	/* Find the next available entry */
+	stub = (void *)sechdrs[me->arch.stubs_section].sh_addr;
+	for (i = 0; stub_func_addr(stub[i].funcdata); i++)
+		if (WARN_ON(i >= total_stubs))
+			return -1;
+
+	if (WARN_ON(i + num_stubs > total_stubs))
+		return -1;
+
+	stub += i;
+	me->arch.ool_stubs = (struct ftrace_ool_stub *)stub;
+
+	/* reserve stubs */
+	for (i = 0; i < num_stubs; i++)
+		if (patch_u32((void *)&stub[i].funcdata, PPC_RAW_NOP()))
+			return -1;
+#endif
+
+	return 0;
+}
+
 int module_finalize_ftrace(struct module *mod, const Elf_Shdr *sechdrs)
 {
 	mod->arch.tramp = stub_for_addr(sechdrs,
@@ -1097,6 +1148,9 @@ int module_finalize_ftrace(struct module *mod, const Elf_Shdr *sechdrs)
 	if (!mod->arch.tramp)
 		return -ENOENT;
 
+	if (setup_ftrace_ool_stubs(sechdrs, mod->arch.tramp, mod))
+		return -ENOENT;
+
 	return 0;
 }
 #endif
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 719517265d39..1fee074388cc 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -37,7 +37,8 @@ unsigned long ftrace_call_adjust(unsigned long addr)
 	if (addr >= (unsigned long)__exittext_begin && addr < (unsigned long)__exittext_end)
 		return 0;
 
-	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
+	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY) &&
+	    !IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
 		addr += MCOUNT_INSN_SIZE;
 
 	return addr;
@@ -127,11 +128,25 @@ static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long a
 }
 #endif
 
+static unsigned long ftrace_get_ool_stub(struct dyn_ftrace *rec)
+{
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+	return rec->arch.ool_stub;
+#else
+	BUILD_BUG();
+#endif
+}
+
 static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_inst_t *call_inst)
 {
-	unsigned long ip = rec->ip;
+	unsigned long ip;
 	unsigned long stub;
 
+	if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
+		ip = ftrace_get_ool_stub(rec) + MCOUNT_INSN_SIZE; /* second instruction in stub */
+	else
+		ip = rec->ip;
+
 	if (is_offset_in_branch_range(addr - ip))
 		/* Within range */
 		stub = addr;
@@ -142,7 +157,7 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
 		stub = ftrace_lookup_module_stub(ip, addr);
 
 	if (!stub) {
-		pr_err("0x%lx: No ftrace stubs reachable\n", ip);
+		pr_err("0x%lx (0x%lx): No ftrace stubs reachable\n", ip, rec->ip);
 		return -EINVAL;
 	}
 
@@ -150,6 +165,92 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
 	return 0;
 }
 
+static int ftrace_init_ool_stub(struct module *mod, struct dyn_ftrace *rec)
+{
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+	static int ool_stub_text_end_index, ool_stub_inittext_index;
+	int ret = 0, ool_stub_count, *ool_stub_index;
+	ppc_inst_t inst;
+	/*
+	 * See ftrace_entry.S if changing the below instruction sequence, as we rely on
+	 * decoding the last branch instruction here to recover the correct function ip.
+	 */
+	struct ftrace_ool_stub *ool_stub, ool_stub_template = {
+		.insn = {
+			PPC_RAW_MFLR(_R0),
+			PPC_RAW_NOP(),		/* bl ftrace_caller */
+			PPC_RAW_MTLR(_R0),
+			PPC_RAW_NOP()		/* b rec->ip + 4 */
+		}
+	};
+
+	WARN_ON(rec->arch.ool_stub);
+
+	if (is_kernel_inittext(rec->ip)) {
+		ool_stub = ftrace_ool_stub_inittext;
+		ool_stub_index = &ool_stub_inittext_index;
+		ool_stub_count = ftrace_ool_stub_inittext_count;
+	} else if (is_kernel_text(rec->ip)) {
+		ool_stub = ftrace_ool_stub_text_end;
+		ool_stub_index = &ool_stub_text_end_index;
+		ool_stub_count = ftrace_ool_stub_text_end_count;
+#ifdef CONFIG_MODULES
+	} else if (mod) {
+		ool_stub = mod->arch.ool_stubs;
+		ool_stub_index = &mod->arch.ool_stub_index;
+		ool_stub_count = mod->arch.ool_stub_count;
+#endif
+	} else {
+		return -EINVAL;
+	}
+
+	ool_stub += (*ool_stub_index)++;
+
+	if (WARN_ON(*ool_stub_index > ool_stub_count))
+		return -EINVAL;
+
+	if (!is_offset_in_branch_range((long)rec->ip - (long)&ool_stub->insn[0]) ||
+	    !is_offset_in_branch_range((long)(rec->ip + MCOUNT_INSN_SIZE) -
+				       (long)&ool_stub->insn[3])) {
+		pr_err("%s: ftrace ool stub out of range (%p -> %p).\n",
+					__func__, (void *)rec->ip, (void *)&ool_stub->insn[0]);
+		return -EINVAL;
+	}
+
+	rec->arch.ool_stub = (unsigned long)&ool_stub->insn[0];
+
+	/* bl ftrace_caller */
+	if (!mod)
+		ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &inst);
+#ifdef CONFIG_MODULES
+	else
+		/*
+		 * We can't use ftrace_get_call_inst() since that uses
+		 * __module_text_address(rec->ip) to look up the module.
+		 * But, since the module is not fully formed at this stage,
+		 * the lookup fails. We know the target though, so generate
+		 * the branch inst directly.
+		 */
+		inst = ftrace_create_branch_inst(ftrace_get_ool_stub(rec) + MCOUNT_INSN_SIZE,
+						 mod->arch.tramp, 1);
+#endif
+	ool_stub_template.insn[1] = ppc_inst_val(inst);
+
+	/* b rec->ip + 4 */
+	if (!ret && create_branch(&inst, &ool_stub->insn[3], rec->ip + MCOUNT_INSN_SIZE, 0))
+		return -EINVAL;
+	ool_stub_template.insn[3] = ppc_inst_val(inst);
+
+	if (!ret)
+		ret = patch_instructions((u32 *)ool_stub, (u32 *)&ool_stub_template,
+					 sizeof(ool_stub_template), false);
+
+	return ret;
+#else /* !CONFIG_PPC_FTRACE_OUT_OF_LINE */
+	BUILD_BUG();
+#endif
+}
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned long addr)
 {
@@ -162,18 +263,29 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned
 int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 {
 	ppc_inst_t old, new;
-	int ret;
+	unsigned long ip = rec->ip;
+	int ret = 0;
 
 	/* This can only ever be called during module load */
-	if (WARN_ON(!IS_ENABLED(CONFIG_MODULES) || core_kernel_text(rec->ip)))
+	if (WARN_ON(!IS_ENABLED(CONFIG_MODULES) || core_kernel_text(ip)))
 		return -EINVAL;
 
 	old = ppc_inst(PPC_RAW_NOP());
-	ret = ftrace_get_call_inst(rec, addr, &new);
-	if (ret)
-		return ret;
+	if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE)) {
+		ip = ftrace_get_ool_stub(rec) + MCOUNT_INSN_SIZE; /* second instruction in stub */
+		ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &old);
+	}
+
+	ret |= ftrace_get_call_inst(rec, addr, &new);
+
+	if (!ret)
+		ret = ftrace_modify_code(ip, old, new);
 
-	return ftrace_modify_code(rec->ip, old, new);
+	if (!ret && IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
+		ret = ftrace_modify_code(rec->ip, ppc_inst(PPC_RAW_NOP()),
+			 ppc_inst(PPC_RAW_BRANCH((long)ftrace_get_ool_stub(rec) - (long)rec->ip)));
+
+	return ret;
 }
 
 int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr)
@@ -206,6 +318,13 @@ void ftrace_replace_code(int enable)
 		new_addr = ftrace_get_addr_new(rec);
 		update = ftrace_update_record(rec, enable);
 
+		if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) && update != FTRACE_UPDATE_IGNORE) {
+			ip = ftrace_get_ool_stub(rec) + MCOUNT_INSN_SIZE;
+			ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &nop_inst);
+			if (ret)
+				goto out;
+		}
+
 		switch (update) {
 		case FTRACE_UPDATE_IGNORE:
 		default:
@@ -230,6 +349,24 @@ void ftrace_replace_code(int enable)
 
 		if (!ret)
 			ret = ftrace_modify_code(ip, old, new);
+
+		if (!ret && IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) &&
+		    (update == FTRACE_UPDATE_MAKE_NOP || update == FTRACE_UPDATE_MAKE_CALL)) {
+			/* Update the actual ftrace location */
+			call_inst = ppc_inst(PPC_RAW_BRANCH((long)ftrace_get_ool_stub(rec) -
+							    (long)rec->ip));
+			nop_inst = ppc_inst(PPC_RAW_NOP());
+			ip = rec->ip;
+
+			if (update == FTRACE_UPDATE_MAKE_NOP)
+				ret = ftrace_modify_code(ip, call_inst, nop_inst);
+			else
+				ret = ftrace_modify_code(ip, nop_inst, call_inst);
+
+			if (ret)
+				goto out;
+		}
+
 		if (ret)
 			goto out;
 	}
@@ -249,7 +386,8 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 	/* Verify instructions surrounding the ftrace location */
 	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY)) {
 		/* Expect nops */
-		ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_NOP()));
+		if (!IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
+			ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_NOP()));
 		if (!ret)
 			ret = ftrace_validate_inst(ip, ppc_inst(PPC_RAW_NOP()));
 	} else if (IS_ENABLED(CONFIG_PPC32)) {
@@ -277,6 +415,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 	if (ret)
 		return ret;
 
+	/* Set up out-of-line stub */
+	if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
+		return ftrace_init_ool_stub(mod, rec);
+
 	/* Nop-out the ftrace location */
 	new = ppc_inst(PPC_RAW_NOP());
 	addr = MCOUNT_ADDR;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 244a1c7bb1e8..5b2fc6483dce 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -56,7 +56,7 @@
 	SAVE_GPR(2, r1)
 	SAVE_GPRS(11, 31, r1)
 	.else
-#ifdef CONFIG_LIVEPATCH_64
+#if defined(CONFIG_LIVEPATCH_64) || defined(CONFIG_PPC_FTRACE_OUT_OF_LINE)
 	SAVE_GPR(14, r1)
 #endif
 	.endif
@@ -78,10 +78,6 @@
 
 	/* Get the _mcount() call site out of LR */
 	mflr	r7
-	/* Save it as pt_regs->nip */
-	PPC_STL	r7, _NIP(r1)
-	/* Also save it in B's stackframe header for proper unwind */
-	PPC_STL	r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
 	/* Save the read LR in pt_regs->link */
 	PPC_STL	r0, _LINK(r1)
 
@@ -96,16 +92,6 @@
 	lwz	r5,function_trace_op@l(r3)
 #endif
 
-#ifdef CONFIG_LIVEPATCH_64
-	mr	r14, r7		/* remember old NIP */
-#endif
-
-	/* Calculate ip from nip-4 into r3 for call below */
-	subi    r3, r7, MCOUNT_INSN_SIZE
-
-	/* Put the original return address in r4 as parent_ip */
-	mr	r4, r0
-
 	/* Save special regs */
 	PPC_STL	r8, _MSR(r1)
 	.if \allregs == 1
@@ -114,17 +100,69 @@
 	PPC_STL	r11, _CCR(r1)
 	.endif
 
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+	/* Save our real return address in nvr for return */
+	.if \allregs == 0
+	SAVE_GPR(15, r1)
+	.endif
+	mr	r15, r7
+	/*
+	 * We want the ftrace location in the function, but our lr (in r7)
+	 * points at the 'mtlr r0' instruction in the out of line stub.  To
+	 * recover the ftrace location, we read the branch instruction in the
+	 * stub, and adjust our lr by the branch offset.
+	 *
+	 * See ftrace_init_ool_stub() for the profile sequence.
+	 */
+	lwz	r8, MCOUNT_INSN_SIZE(r7)
+	slwi	r8, r8, 6
+	srawi	r8, r8, 6
+	add	r3, r7, r8
+	/*
+	 * Override our nip to point past the branch in the original function.
+	 * This allows reliable stack trace and the ftrace stack tracer to work as-is.
+	 */
+	addi	r7, r3, MCOUNT_INSN_SIZE
+#else
+	/* Calculate ip from nip-4 into r3 for call below */
+	subi    r3, r7, MCOUNT_INSN_SIZE
+#endif
+
+	/* Save NIP as pt_regs->nip */
+	PPC_STL	r7, _NIP(r1)
+	/* Also save it in B's stackframe header for proper unwind */
+	PPC_STL	r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
+#if defined(CONFIG_LIVEPATCH_64) || defined(CONFIG_PPC_FTRACE_OUT_OF_LINE)
+	mr	r14, r7		/* remember old NIP */
+#endif
+
+	/* Put the original return address in r4 as parent_ip */
+	mr	r4, r0
+
 	/* Load &pt_regs in r6 for call below */
 	addi    r6, r1, STACK_INT_FRAME_REGS
 .endm
 
 .macro	ftrace_regs_exit allregs
+#ifndef CONFIG_PPC_FTRACE_OUT_OF_LINE
 	/* Load ctr with the possibly modified NIP */
 	PPC_LL	r3, _NIP(r1)
 	mtctr	r3
 
 #ifdef CONFIG_LIVEPATCH_64
 	cmpd	r14, r3		/* has NIP been altered? */
+#endif
+#else /* CONFIG_PPC_FTRACE_OUT_OF_LINE */
+	/* Load LR with the possibly modified NIP */
+	PPC_LL	r3, _NIP(r1)
+	cmpd	r14, r3		/* has NIP been altered? */
+	bne-	1f
+
+	mr	r3, r15
+	.if \allregs == 0
+	REST_GPR(15, r1)
+	.endif
+1:	mtlr	r3
 #endif
 
 	/* Restore gprs */
@@ -132,14 +170,16 @@
 	REST_GPRS(2, 31, r1)
 	.else
 	REST_GPRS(3, 10, r1)
-#ifdef CONFIG_LIVEPATCH_64
+#if defined(CONFIG_LIVEPATCH_64) || defined(CONFIG_PPC_FTRACE_OUT_OF_LINE)
 	REST_GPR(14, r1)
 #endif
 	.endif
 
 	/* Restore possibly modified LR */
 	PPC_LL	r0, _LINK(r1)
+#ifndef CONFIG_PPC_FTRACE_OUT_OF_LINE
 	mtlr	r0
+#endif
 
 #ifdef CONFIG_PPC64
 	/* Restore callee's TOC */
@@ -153,7 +193,16 @@
         /* Based on the cmpd above, if the NIP was altered handle livepatch */
 	bne-	livepatch_handler
 #endif
-	bctr			/* jump after _mcount site */
+	/* jump after _mcount site */
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+	/*
+	 * Return with blr to keep the link stack balanced. The function profiling sequence
+	 * uses 'mtlr r0' to restore LR.
+	 */
+	blr
+#else
+	bctr
+#endif
 .endm
 
 _GLOBAL(ftrace_regs_caller)
@@ -177,6 +226,11 @@ _GLOBAL(ftrace_stub)
 
 #ifdef CONFIG_PPC64
 ftrace_no_trace:
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+	REST_GPR(3, r1)
+	addi	r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
+	blr
+#else
 	mflr	r3
 	mtctr	r3
 	REST_GPR(3, r1)
@@ -184,6 +238,7 @@ ftrace_no_trace:
 	mtlr	r0
 	bctr
 #endif
+#endif
 
 #ifdef CONFIG_LIVEPATCH_64
 	/*
@@ -194,11 +249,17 @@ ftrace_no_trace:
 	 * We get here when a function A, calls another function B, but B has
 	 * been live patched with a new function C.
 	 *
-	 * On entry:
-	 *  - we have no stack frame and can not allocate one
+	 * On entry, we have no stack frame and can not allocate one.
+	 *
+	 * With PPC_FTRACE_OUT_OF_LINE=n, on entry:
 	 *  - LR points back to the original caller (in A)
 	 *  - CTR holds the new NIP in C
 	 *  - r0, r11 & r12 are free
+	 *
+	 * With PPC_FTRACE_OUT_OF_LINE=y, on entry:
+	 *  - r0 points back to the original caller (in A)
+	 *  - LR holds the new NIP in C
+	 *  - r11 & r12 are free
 	 */
 livepatch_handler:
 	ld	r12, PACA_THREAD_INFO(r13)
@@ -208,18 +269,23 @@ livepatch_handler:
 	addi	r11, r11, 24
 	std	r11, TI_livepatch_sp(r12)
 
-	/* Save toc & real LR on livepatch stack */
-	std	r2,  -24(r11)
-	mflr	r12
-	std	r12, -16(r11)
-
 	/* Store stack end marker */
 	lis     r12, STACK_END_MAGIC@h
 	ori     r12, r12, STACK_END_MAGIC@l
 	std	r12, -8(r11)
 
-	/* Put ctr in r12 for global entry and branch there */
+	/* Save toc & real LR on livepatch stack */
+	std	r2,  -24(r11)
+#ifndef CONFIG_PPC_FTRACE_OUT_OF_LINE
+	mflr	r12
+	std	r12, -16(r11)
 	mfctr	r12
+#else
+	std	r0, -16(r11)
+	mflr	r12
+	/* Copy the new NIP (read from LR into r12) to ctr and branch there */
+	mtctr	r12
+#endif
 	bctrl
 
 	/*
diff --git a/arch/powerpc/tools/Makefile b/arch/powerpc/tools/Makefile
new file mode 100644
index 000000000000..3a389526498e
--- /dev/null
+++ b/arch/powerpc/tools/Makefile
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+quiet_cmd_gen_ftrace_ool_stubs = GEN     $@
+      cmd_gen_ftrace_ool_stubs = $< vmlinux.o $@
+
+$(obj)/.vmlinux.arch.S: $(src)/ftrace-gen-ool-stubs.sh vmlinux.o FORCE
+	$(call if_changed,gen_ftrace_ool_stubs)
+
+$(obj)/.vmlinux.arch.o: $(obj)/.vmlinux.arch.S FORCE
+	$(call if_changed_rule,as_o_S)
+
+clean-files += .vmlinux.arch.S .vmlinux.arch.o
diff --git a/arch/powerpc/tools/ftrace-gen-ool-stubs.sh b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
new file mode 100755
index 000000000000..8e0a6d4ea202
--- /dev/null
+++ b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+# Error out on error
+set -e
+
+is_enabled() {
+	grep -q "^$1=y" include/config/auto.conf
+}
+
+vmlinux_o=${1}
+arch_vmlinux_S=${2}
+
+RELOCATION=R_PPC64_ADDR64
+if is_enabled CONFIG_PPC32; then
+	RELOCATION=R_PPC_ADDR32
+fi
+
+num_ool_stubs_text=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |
+		     grep -v ".init.text" | grep "${RELOCATION}" | wc -l)
+num_ool_stubs_inittext=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |
+			 grep ".init.text" | grep "${RELOCATION}" | wc -l)
+
+cat > ${arch_vmlinux_S} <<EOF
+#include <asm/asm-offsets.h>
+#include <linux/linkage.h>
+
+.pushsection .tramp.ftrace.text,"aw"
+SYM_DATA(ftrace_ool_stub_text_end_count, .long ${num_ool_stubs_text})
+
+SYM_CODE_START(ftrace_ool_stub_text_end)
+	.space ${num_ool_stubs_text} * FTRACE_OOL_STUB_SIZE
+SYM_CODE_END(ftrace_ool_stub_text_end)
+.popsection
+
+.pushsection .tramp.ftrace.init,"aw"
+SYM_DATA(ftrace_ool_stub_inittext_count, .long ${num_ool_stubs_inittext})
+
+SYM_CODE_START(ftrace_ool_stub_inittext)
+	.space ${num_ool_stubs_inittext} * FTRACE_OOL_STUB_SIZE
+SYM_CODE_END(ftrace_ool_stub_inittext)
+.popsection
+EOF
-- 
2.46.0
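
A note on the ftrace_regs_entry change in the patch above: the
'slwi r8, r8, 6' / 'srawi r8, r8, 6' pair sign-extends the displacement
field of the final branch in the out-of-line stub, and adding that
displacement to lr yields the ftrace location. The C model below is only
an illustrative sketch. It assumes an I-form 'b' with AA=0 and LK=0, and
that the stub's final branch targets the instruction after the ftrace
location. The helper names are hypothetical and do not exist in the
kernel.

```c
#include <assert.h>
#include <stdint.h>

#define BRANCH_OPCODE	0x48000000u	/* I-form 'b', AA=0, LK=0 */
#define BRANCH_LI_MASK	0x03fffffcu	/* 24-bit LI field, already scaled by 4 */

/* Hypothetical helper: build 'b <disp>' for a word-aligned displacement. */
static uint32_t encode_branch(int32_t disp)
{
	assert((disp & 3) == 0);
	return BRANCH_OPCODE | ((uint32_t)disp & BRANCH_LI_MASK);
}

/*
 * Same result as 'slwi r8,r8,6; srawi r8,r8,6' when AA=LK=0:
 * drop the 6-bit opcode and sign-extend the remaining 26 bits.
 */
static int32_t branch_displacement(uint32_t insn)
{
	int32_t disp = (int32_t)(insn & BRANCH_LI_MASK);

	if (disp & 0x02000000)		/* sign bit of the 26-bit value */
		disp -= 0x04000000;
	return disp;
}

/*
 * lr points at 'mtlr r0' in the stub; 'branch' is the instruction that
 * follows it, assumed to jump back to (ftrace location + 4).
 * The branch sits at lr + 4, so lr + displacement is the ftrace location.
 */
static uint64_t ftrace_location_from_lr(uint64_t lr, uint32_t branch)
{
	return lr + (uint64_t)(int64_t)branch_displacement(branch);
}
```

For example, with the ftrace location at 0x1000 and a stub whose
'mtlr r0' sits at 0x100008, the branch at 0x10000c targets 0x1004 and
ftrace_location_from_lr() returns 0x1000.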


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v5 13/17] powerpc64/ftrace: Support .text larger than 32MB with out-of-line stubs
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (11 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 12/17] powerpc64/ftrace: Move ftrace sequence out of line Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-10-09 15:36   ` Masahiro Yamada
  2024-09-15 20:56 ` [PATCH v5 14/17] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS Hari Bathini
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

We are restricted to a .text size of ~32MB when using the out-of-line
function profile sequence. Allow this to be extended up to the previous
limit of ~64MB by reserving space in the middle of .text.

A new config option CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE is
introduced to specify the number of function stubs that are reserved in
.text. On boot, ftrace utilizes stubs from this area first before using
the stub area at the end of .text.
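
The preference order described above can be sketched in C as follows.
This is a simplified user-space model, not the kernel code: the enum and
function names are hypothetical, and the 32MB reach of a relative branch
is assumed.

```c
#include <assert.h>
#include <stdbool.h>

#define BRANCH_RANGE	0x2000000L	/* assumed reach of a relative branch: 32MB */

enum stub_area { STUB_AREA_TEXT, STUB_AREA_TEXT_END };

/* A relative branch reaches [-32MB, +32MB) and must be word aligned. */
static bool offset_in_branch_range(long offset)
{
	return offset >= -BRANCH_RANGE && offset < BRANCH_RANGE && !(offset & 3);
}

/*
 * Pick the area for the next stub: prefer the reserved area inside .text
 * while it still has free slots and its next free slot is reachable from
 * the ftrace location 'ip'. Otherwise fall back to the end of .text.
 */
static enum stub_area pick_stub_area(unsigned long ip, unsigned long next_text_stub,
				     unsigned int text_index, unsigned int text_count)
{
	assert(text_index <= text_count);

	if (text_index >= text_count ||
	    !offset_in_branch_range((long)ip - (long)next_text_stub))
		return STUB_AREA_TEXT_END;

	return STUB_AREA_TEXT;
}
```

Since ftrace records are sorted by address, a model like this uses up
the reserved area for the lower part of .text first and only then
spills over to the area at the end.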

A ppc64le defconfig has ~44k functions that can be traced. A more
conservative value of 32k functions is chosen as the default value of
PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE so that we do not allot more space
than necessary by default. If the kernel being built has no more than
32k traceable functions, no additional space is allotted at the end of
.text during the pass on vmlinux.o. Otherwise, only the remaining
functions get space for stubs at the end of .text. This default value
should help cover a .text size of ~48MB in total (including the space
reserved at the end of .text, which can cover up to 32MB), which should
be sufficient for most common builds. For a very small kernel build,
this can be set to 0. Alternatively, it can be bumped up to a larger
value to support a vmlinux .text size of up to ~64MB.
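
The split between the two stub areas amounts to the arithmetic below, a
minimal sketch with hypothetical helper names. The 16-byte stub size is
an assumption based on struct ftrace_ool_stub holding four 32-bit
instructions at this point in the series, and 44000 is only an
illustrative stand-in for "~44k".

```c
#include <assert.h>

#define STUB_SIZE	16u	/* assumed: four 32-bit instructions per stub */

/* Stubs placed at the end of .text: only what the reserve cannot hold. */
static unsigned int stubs_at_text_end(unsigned int traceable, unsigned int reserved)
{
	return traceable > reserved ? traceable - reserved : 0;
}

/* Bytes set aside within .text for 'reserved' stubs. */
static unsigned long reserved_bytes(unsigned int reserved)
{
	return (unsigned long)reserved * STUB_SIZE;
}
```

With the default reserve of 32768 and 44000 traceable functions, 11232
stubs would land at the end of .text, and the reserve itself would take
512KB under the stub size assumed here.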

Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---

Changes in v5:
* num_ool_stubs_text_end, which is used for setting up
  ftrace_ool_stub_text_end, is now set to zero when no stubs are needed
  at the end of .text, instead of being computed as a negative value.

 arch/powerpc/Kconfig                       | 12 ++++++++++++
 arch/powerpc/include/asm/ftrace.h          |  6 ++++--
 arch/powerpc/kernel/trace/ftrace.c         | 21 +++++++++++++++++----
 arch/powerpc/kernel/trace/ftrace_entry.S   |  8 ++++++++
 arch/powerpc/tools/Makefile                |  2 +-
 arch/powerpc/tools/ftrace-gen-ool-stubs.sh | 16 ++++++++++++----
 6 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index bae96b65f295..a0ce00368bab 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -573,6 +573,18 @@ config PPC_FTRACE_OUT_OF_LINE
 	depends on PPC64
 	select ARCH_WANTS_PRE_LINK_VMLINUX
 
+config PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE
+	int "Number of ftrace out-of-line stubs to reserve within .text"
+	default 32768 if PPC_FTRACE_OUT_OF_LINE
+	default 0
+	help
+	  Number of stubs to reserve for use by ftrace. This space is
+	  reserved within .text, and is distinct from any additional space
+	  added at the end of .text before the final vmlinux link. Set to
+	  zero to have stubs only be generated at the end of vmlinux (only
+	  if the size of vmlinux is less than 32MB). Set to a higher value
+	  if building vmlinux larger than 48MB.
+
 config HOTPLUG_CPU
 	bool "Support for enabling/disabling CPUs"
 	depends on SMP && (PPC_PSERIES || \
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index bdbafc668b20..28f3590ca780 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -138,8 +138,10 @@ extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
 struct ftrace_ool_stub {
 	u32	insn[4];
 };
-extern struct ftrace_ool_stub ftrace_ool_stub_text_end[], ftrace_ool_stub_inittext[];
-extern unsigned int ftrace_ool_stub_text_end_count, ftrace_ool_stub_inittext_count;
+extern struct ftrace_ool_stub ftrace_ool_stub_text_end[], ftrace_ool_stub_text[],
+			      ftrace_ool_stub_inittext[];
+extern unsigned int ftrace_ool_stub_text_end_count, ftrace_ool_stub_text_count,
+		    ftrace_ool_stub_inittext_count;
 #endif
 void ftrace_free_init_tramp(void);
 unsigned long ftrace_call_adjust(unsigned long addr);
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 1fee074388cc..bee2c54a8c04 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -168,7 +168,7 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
 static int ftrace_init_ool_stub(struct module *mod, struct dyn_ftrace *rec)
 {
 #ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
-	static int ool_stub_text_end_index, ool_stub_inittext_index;
+	static int ool_stub_text_index, ool_stub_text_end_index, ool_stub_inittext_index;
 	int ret = 0, ool_stub_count, *ool_stub_index;
 	ppc_inst_t inst;
 	/*
@@ -191,9 +191,22 @@ static int ftrace_init_ool_stub(struct module *mod, struct dyn_ftrace *rec)
 		ool_stub_index = &ool_stub_inittext_index;
 		ool_stub_count = ftrace_ool_stub_inittext_count;
 	} else if (is_kernel_text(rec->ip)) {
-		ool_stub = ftrace_ool_stub_text_end;
-		ool_stub_index = &ool_stub_text_end_index;
-		ool_stub_count = ftrace_ool_stub_text_end_count;
+		/*
+		 * ftrace records are sorted, so we first use up the stub area within .text
+		 * (ftrace_ool_stub_text) before using the area at the end of .text
+		 * (ftrace_ool_stub_text_end), unless the stub is out of range of the record.
+		 */
+		if (ool_stub_text_index >= ftrace_ool_stub_text_count ||
+		    !is_offset_in_branch_range((long)rec->ip -
+					       (long)&ftrace_ool_stub_text[ool_stub_text_index])) {
+			ool_stub = ftrace_ool_stub_text_end;
+			ool_stub_index = &ool_stub_text_end_index;
+			ool_stub_count = ftrace_ool_stub_text_end_count;
+		} else {
+			ool_stub = ftrace_ool_stub_text;
+			ool_stub_index = &ool_stub_text_index;
+			ool_stub_count = ftrace_ool_stub_text_count;
+		}
 #ifdef CONFIG_MODULES
 	} else if (mod) {
 		ool_stub = mod->arch.ool_stubs;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 5b2fc6483dce..a6bf7f841040 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -374,6 +374,14 @@ _GLOBAL(return_to_handler)
 	blr
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+SYM_DATA(ftrace_ool_stub_text_count, .long CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE)
+
+SYM_CODE_START(ftrace_ool_stub_text)
+	.space CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE * FTRACE_OOL_STUB_SIZE
+SYM_CODE_END(ftrace_ool_stub_text)
+#endif
+
 .pushsection ".tramp.ftrace.text","aw",@progbits;
 .globl ftrace_tramp_text
 ftrace_tramp_text:
diff --git a/arch/powerpc/tools/Makefile b/arch/powerpc/tools/Makefile
index 3a389526498e..9eeb6edf02fe 100644
--- a/arch/powerpc/tools/Makefile
+++ b/arch/powerpc/tools/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-or-later
 
 quiet_cmd_gen_ftrace_ool_stubs = GEN     $@
-      cmd_gen_ftrace_ool_stubs = $< vmlinux.o $@
+      cmd_gen_ftrace_ool_stubs = $< $(CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE) vmlinux.o $@
 
 $(obj)/.vmlinux.arch.S: $(src)/ftrace-gen-ool-stubs.sh vmlinux.o FORCE
 	$(call if_changed,gen_ftrace_ool_stubs)
diff --git a/arch/powerpc/tools/ftrace-gen-ool-stubs.sh b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
index 8e0a6d4ea202..d6bd834e0868 100755
--- a/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
+++ b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
@@ -8,8 +8,9 @@ is_enabled() {
 	grep -q "^$1=y" include/config/auto.conf
 }
 
-vmlinux_o=${1}
-arch_vmlinux_S=${2}
+vmlinux_o=${2}
+arch_vmlinux_S=${3}
+arch_vmlinux_o=$(dirname ${arch_vmlinux_S})/$(basename ${arch_vmlinux_S} .S).o
 
 RELOCATION=R_PPC64_ADDR64
 if is_enabled CONFIG_PPC32; then
@@ -21,15 +22,22 @@ num_ool_stubs_text=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries
 num_ool_stubs_inittext=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |
 			 grep ".init.text" | grep "${RELOCATION}" | wc -l)
 
+num_ool_stubs_text_builtin=${1}
+if [ ${num_ool_stubs_text} -gt ${num_ool_stubs_text_builtin} ]; then
+	num_ool_stubs_text_end=$(expr ${num_ool_stubs_text} - ${num_ool_stubs_text_builtin})
+else
+	num_ool_stubs_text_end=0
+fi
+
 cat > ${arch_vmlinux_S} <<EOF
 #include <asm/asm-offsets.h>
 #include <linux/linkage.h>
 
 .pushsection .tramp.ftrace.text,"aw"
-SYM_DATA(ftrace_ool_stub_text_end_count, .long ${num_ool_stubs_text})
+SYM_DATA(ftrace_ool_stub_text_end_count, .long ${num_ool_stubs_text_end})
 
 SYM_CODE_START(ftrace_ool_stub_text_end)
-	.space ${num_ool_stubs_text} * FTRACE_OOL_STUB_SIZE
+	.space ${num_ool_stubs_text_end} * FTRACE_OOL_STUB_SIZE
 SYM_CODE_END(ftrace_ool_stub_text_end)
 .popsection
 
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v5 14/17] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (12 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 13/17] powerpc64/ftrace: Support .text larger than 32MB with out-of-line stubs Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 15/17] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS Hari Bathini
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

Implement support for DYNAMIC_FTRACE_WITH_CALL_OPS similar to the
arm64 implementation.

This works by patching-in a pointer to an associated ftrace_ops
structure before each traceable function. If multiple ftrace_ops are
associated with a call site, then a special ftrace_list_ops is used to
enable iterating over all the registered ftrace_ops. If no ftrace_ops
are associated with a call site, then a special ftrace_nop_ops structure
is used to render the ftrace call as a no-op. The ftrace trampoline can
then read the associated ftrace_ops for a call site by loading from an
offset from the LR, and branch directly to the associated function.

The primary advantage with this approach is that we don't have to
iterate over all the registered ftrace_ops for call sites that have a
single ftrace_ops registered. This is the equivalent of implementing
support for dynamic ftrace trampolines, which set up a special ftrace
trampoline for each registered ftrace_ops and have individual call sites
branch into those directly.

A secondary advantage is that this gives us a way to add support for
direct ftrace callers without having to resort to using stubs. The
address of the direct call trampoline can be loaded from the ftrace_ops
structure.

To support this, we reserve a nop before each function on 32-bit
powerpc. For 64-bit powerpc, two nops are reserved before each
out-of-line stub. During ftrace activation, we update this location with
the associated ftrace_ops pointer. Then, on ftrace entry, we load from
this location and call into ftrace_ops->func().

For 64-bit powerpc, we ensure that the out-of-line stub area is
doubleword aligned so that ftrace_ops address can be updated atomically.
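
For the 64-bit case, the relation between LR and the ftrace_ops pointer
can be modelled as below. This is a sketch under assumptions, not the
kernel definition: the struct and helper names are hypothetical, SZL is
taken to be the pointer size, and the 'bl ftrace_caller' is assumed to
be the second instruction of the stub, so that LR points at the third.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCOUNT_INSN_SIZE	4
#define SZL			sizeof(void *)	/* size of a long; pointer-sized here */

struct ftrace_ops;	/* opaque in this model */

/* Model of a stub with CALL_OPS: the ops pointer, then four instructions. */
struct ool_stub_model {
	struct ftrace_ops *ftrace_op;
	uint32_t insn[4];
} __attribute__((aligned(sizeof(void *))));

/*
 * lr is the return address of the 'bl' in insn[1], i.e. &insn[2].
 * Stepping back over two instructions and one pointer gives the slot
 * holding the ftrace_ops pointer, matching the load
 * 'PPC_LL r5, -(MCOUNT_INSN_SIZE*2 + SZL)(r7)'.
 */
static struct ftrace_ops **ops_slot_from_lr(uintptr_t lr)
{
	assert(lr % MCOUNT_INSN_SIZE == 0);
	return (struct ftrace_ops **)(lr - 2 * MCOUNT_INSN_SIZE - SZL);
}
```

Keeping the slot naturally aligned is what allows the pointer to be
replaced with a single store while other CPUs may be loading it.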

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/Kconfig                       |  1 +
 arch/powerpc/Makefile                      |  4 ++
 arch/powerpc/include/asm/ftrace.h          |  5 +-
 arch/powerpc/kernel/asm-offsets.c          |  4 ++
 arch/powerpc/kernel/trace/ftrace.c         | 59 +++++++++++++++++++++-
 arch/powerpc/kernel/trace/ftrace_entry.S   | 36 ++++++++++---
 arch/powerpc/tools/ftrace-gen-ool-stubs.sh |  5 +-
 7 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a0ce00368bab..f1a0adedeb8e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -234,6 +234,7 @@ config PPC
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
+	select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if PPC_FTRACE_OUT_OF_LINE || (PPC32 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY)
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
 	select HAVE_EBPF_JIT
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index c973e6cd1ae8..7dede0ec0163 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -158,8 +158,12 @@ KBUILD_CPPFLAGS	+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
 ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
 CC_FLAGS_FTRACE := -fpatchable-function-entry=1
 else
+ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS # PPC32 only
+CC_FLAGS_FTRACE := -fpatchable-function-entry=3,1
+else
 CC_FLAGS_FTRACE := -fpatchable-function-entry=2
 endif
+endif
 else
 CC_FLAGS_FTRACE := -pg
 ifdef CONFIG_MPROFILE_KERNEL
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 28f3590ca780..1ad1328cf4e3 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -136,8 +136,11 @@ static inline u8 this_cpu_get_ftrace_enabled(void) { return 1; }
 extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
 #ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
 struct ftrace_ool_stub {
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	struct ftrace_ops *ftrace_op;
+#endif
 	u32	insn[4];
-};
+} __aligned(sizeof(unsigned long));
 extern struct ftrace_ool_stub ftrace_ool_stub_text_end[], ftrace_ool_stub_text[],
 			      ftrace_ool_stub_inittext[];
 extern unsigned int ftrace_ool_stub_text_end_count, ftrace_ool_stub_text_count,
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 6854547d3164..60d1e388c2ba 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -678,5 +678,9 @@ int main(void)
 	DEFINE(FTRACE_OOL_STUB_SIZE, sizeof(struct ftrace_ool_stub));
 #endif
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#endif
+
 	return 0;
 }
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index bee2c54a8c04..9090d1a21600 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -38,8 +38,11 @@ unsigned long ftrace_call_adjust(unsigned long addr)
 		return 0;
 
 	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY) &&
-	    !IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
+	    !IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE)) {
 		addr += MCOUNT_INSN_SIZE;
+		if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
+			addr += MCOUNT_INSN_SIZE;
+	}
 
 	return addr;
 }
@@ -264,6 +267,46 @@ static int ftrace_init_ool_stub(struct module *mod, struct dyn_ftrace *rec)
 #endif
 }
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+static const struct ftrace_ops *powerpc_rec_get_ops(struct dyn_ftrace *rec)
+{
+	const struct ftrace_ops *ops = NULL;
+
+	if (rec->flags & FTRACE_FL_CALL_OPS_EN) {
+		ops = ftrace_find_unique_ops(rec);
+		WARN_ON_ONCE(!ops);
+	}
+
+	if (!ops)
+		ops = &ftrace_list_ops;
+
+	return ops;
+}
+
+static int ftrace_rec_set_ops(struct dyn_ftrace *rec, const struct ftrace_ops *ops)
+{
+	if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
+		return patch_ulong((void *)(ftrace_get_ool_stub(rec) - sizeof(unsigned long)),
+				   (unsigned long)ops);
+	else
+		return patch_ulong((void *)(rec->ip - MCOUNT_INSN_SIZE - sizeof(unsigned long)),
+				   (unsigned long)ops);
+}
+
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec)
+{
+	return ftrace_rec_set_ops(rec, &ftrace_nop_ops);
+}
+
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec)
+{
+	return ftrace_rec_set_ops(rec, powerpc_rec_get_ops(rec));
+}
+#else
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec) { return 0; }
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec) { return 0; }
+#endif
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned long addr)
 {
@@ -294,6 +337,10 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	if (!ret)
 		ret = ftrace_modify_code(ip, old, new);
 
+	if (!ret)
+		ret = ftrace_rec_update_ops(rec);
+	if (ret)
+		return ret;
 	if (!ret && IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
 		ret = ftrace_modify_code(rec->ip, ppc_inst(PPC_RAW_NOP()),
 			 ppc_inst(PPC_RAW_BRANCH((long)ftrace_get_ool_stub(rec) - (long)rec->ip)));
@@ -345,16 +392,19 @@ void ftrace_replace_code(int enable)
 		case FTRACE_UPDATE_MODIFY_CALL:
 			ret = ftrace_get_call_inst(rec, new_addr, &new_call_inst);
 			ret |= ftrace_get_call_inst(rec, addr, &call_inst);
+			ret |= ftrace_rec_update_ops(rec);
 			old = call_inst;
 			new = new_call_inst;
 			break;
 		case FTRACE_UPDATE_MAKE_NOP:
 			ret = ftrace_get_call_inst(rec, addr, &call_inst);
+			ret |= ftrace_rec_set_nop_ops(rec);
 			old = call_inst;
 			new = nop_inst;
 			break;
 		case FTRACE_UPDATE_MAKE_CALL:
 			ret = ftrace_get_call_inst(rec, new_addr, &call_inst);
+			ret |= ftrace_rec_update_ops(rec);
 			old = nop_inst;
 			new = call_inst;
 			break;
@@ -470,6 +520,13 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
 	ppc_inst_t old, new;
 	int ret;
 
+	/*
+	 * When using CALL_OPS, the function to call is associated with the
+	 * call site, and we don't have a global function pointer to update.
+	 */
+	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
+		return 0;
+
 	old = ppc_inst_read((u32 *)&ftrace_call);
 	new = ftrace_create_branch_inst(ip, ppc_function_entry(func), 1);
 	ret = ftrace_modify_code(ip, old, new);
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index a6bf7f841040..ff376c990308 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -85,11 +85,21 @@
 	/* Save callee's TOC in the ABI compliant location */
 	std	r2, STK_GOT(r1)
 	LOAD_PACA_TOC()		/* get kernel TOC in r2 */
+#endif
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	/* r7 points to the instruction following the call to ftrace */
+	PPC_LL	r5, -(MCOUNT_INSN_SIZE*2 + SZL)(r7)
+	PPC_LL	r12, FTRACE_OPS_FUNC(r5)
+	mtctr	r12
+#else /* !CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS */
+#ifdef CONFIG_PPC64
 	LOAD_REG_ADDR(r3, function_trace_op)
 	ld	r5,0(r3)
 #else
 	lis	r3,function_trace_op@ha
 	lwz	r5,function_trace_op@l(r3)
+#endif
 #endif
 
 	/* Save special regs */
@@ -205,20 +215,30 @@
 #endif
 .endm
 
-_GLOBAL(ftrace_regs_caller)
-	ftrace_regs_entry 1
-	/* ftrace_call(r3, r4, r5, r6) */
+.macro ftrace_regs_func allregs
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	bctrl
+#else
+	.if \allregs == 1
 .globl ftrace_regs_call
 ftrace_regs_call:
+	.else
+.globl ftrace_call
+ftrace_call:
+	.endif
+	/* ftrace_call(r3, r4, r5, r6) */
 	bl	ftrace_stub
+#endif
+.endm
+
+_GLOBAL(ftrace_regs_caller)
+	ftrace_regs_entry 1
+	ftrace_regs_func 1
 	ftrace_regs_exit 1
 
 _GLOBAL(ftrace_caller)
 	ftrace_regs_entry 0
-	/* ftrace_call(r3, r4, r5, r6) */
-.globl ftrace_call
-ftrace_call:
-	bl	ftrace_stub
+	ftrace_regs_func 0
 	ftrace_regs_exit 0
 
 _GLOBAL(ftrace_stub)
@@ -377,7 +397,7 @@ _GLOBAL(return_to_handler)
 #ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
 SYM_DATA(ftrace_ool_stub_text_count, .long CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE)
 
-SYM_CODE_START(ftrace_ool_stub_text)
+SYM_START(ftrace_ool_stub_text, SYM_L_GLOBAL, .balign SZL)
 	.space CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE * FTRACE_OOL_STUB_SIZE
 SYM_CODE_END(ftrace_ool_stub_text)
 #endif
diff --git a/arch/powerpc/tools/ftrace-gen-ool-stubs.sh b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
index d6bd834e0868..33f5ae4bace5 100755
--- a/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
+++ b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
@@ -31,12 +31,13 @@ fi
 
 cat > ${arch_vmlinux_S} <<EOF
 #include <asm/asm-offsets.h>
+#include <asm/ppc_asm.h>
 #include <linux/linkage.h>
 
 .pushsection .tramp.ftrace.text,"aw"
 SYM_DATA(ftrace_ool_stub_text_end_count, .long ${num_ool_stubs_text_end})
 
-SYM_CODE_START(ftrace_ool_stub_text_end)
+SYM_START(ftrace_ool_stub_text_end, SYM_L_GLOBAL, .balign SZL)
 	.space ${num_ool_stubs_text_end} * FTRACE_OOL_STUB_SIZE
 SYM_CODE_END(ftrace_ool_stub_text_end)
 .popsection
@@ -44,7 +45,7 @@ SYM_CODE_END(ftrace_ool_stub_text_end)
 .pushsection .tramp.ftrace.init,"aw"
 SYM_DATA(ftrace_ool_stub_inittext_count, .long ${num_ool_stubs_inittext})
 
-SYM_CODE_START(ftrace_ool_stub_inittext)
+SYM_START(ftrace_ool_stub_inittext, SYM_L_GLOBAL, .balign SZL)
 	.space ${num_ool_stubs_inittext} * FTRACE_OOL_STUB_SIZE
 SYM_CODE_END(ftrace_ool_stub_inittext)
 .popsection
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v5 15/17] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (13 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 14/17] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 16/17] samples/ftrace: Add support for ftrace direct samples on powerpc Hari Bathini
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS similar to the arm64
implementation.

ftrace direct calls allow custom trampolines to be called into directly
from function ftrace call sites, bypassing the ftrace trampoline
completely. This functionality is currently utilized by BPF trampolines
to hook into kernel function entries.

Since we have limited relative branch range, we support ftrace direct
calls through support for DYNAMIC_FTRACE_WITH_CALL_OPS. In this
approach, the ftrace trampoline is not entirely bypassed. Rather, it is
re-purposed into a stub that reads the direct_call field from the
associated ftrace_ops structure and branches to it if it is not NULL.
For this, it is sufficient to ensure that the ftrace trampoline is
reachable from all traceable functions.
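
As a rough illustration of the range limit, here is a minimal C sketch
of the check added to ftrace_get_call_inst() in this patch. It assumes
the usual PowerPC I-form limit of a signed, word-aligned 26-bit
displacement (about +/-32MB). SKETCH_FTRACE_ADDR and
sketch_pick_branch_target() are hypothetical names used only here; the
real code also accepts FTRACE_REGS_ADDR and may then pick a stub, which
this sketch leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address standing in for the ftrace_caller trampoline. */
#define SKETCH_FTRACE_ADDR 0xc000000000100000ULL

/*
 * An I-form branch ('b'/'bl') carries a signed 26-bit, word-aligned byte
 * displacement, so only targets within roughly +/-32MB are reachable.
 */
static bool sketch_offset_in_branch_range(int64_t offset)
{
	return offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3);
}

/*
 * Pick the address the call site should branch to. A direct trampoline
 * that is out of range is not called directly: the call site goes to the
 * ftrace trampoline, which then reads direct_call from the ftrace_ops.
 */
static uint64_t sketch_pick_branch_target(uint64_t ip, uint64_t addr)
{
	if (!sketch_offset_in_branch_range((int64_t)(addr - ip)) &&
	    addr != SKETCH_FTRACE_ADDR)
		return SKETCH_FTRACE_ADDR;
	return addr;
}
```

A nearby target is returned unchanged, while a far-away direct
trampoline is replaced with the ftrace trampoline address.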

When multiple ftrace_ops are associated with a call site, we utilize a
callback to set pt_regs->orig_gpr3, which can then be tested on the
return path from the ftrace trampoline to branch into the direct caller.
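
The two cases above can be modelled in plain C. This is only a sketch
of the control flow, not the assembly in ftrace_entry.S: all sketch_*
names are hypothetical, addresses are plain integers, and the list
callback is assumed to be what ends up calling
arch_ftrace_set_direct_caller() when several ftrace_ops share a call
site.

```c
/* Simplified stand-ins for pt_regs and ftrace_ops (assumptions). */
struct sketch_regs {
	unsigned long orig_gpr3;
};

struct sketch_ops {
	void (*func)(struct sketch_regs *regs);
	unsigned long direct_call;	/* 0 when no direct trampoline is set */
};

enum sketch_path {
	SKETCH_DIRECT_EARLY,	/* branched before saving any state */
	SKETCH_DIRECT_ON_EXIT,	/* branched on the trampoline return path */
	SKETCH_RETURN,		/* plain return to the traced function */
};

#define SKETCH_DIRECT_TRAMP 0x2000UL	/* made-up trampoline address */

/* Mirrors arch_ftrace_set_direct_caller(): record the direct caller. */
static void sketch_set_direct_caller(struct sketch_regs *regs, unsigned long addr)
{
	regs->orig_gpr3 = addr;
}

/* Stand-in for a list callback that finds a direct caller among the ops. */
static void sketch_list_func(struct sketch_regs *regs)
{
	sketch_set_direct_caller(regs, SKETCH_DIRECT_TRAMP);
}

/* Stand-in for an ordinary tracer callback. */
static void sketch_noop_func(struct sketch_regs *regs)
{
	(void)regs;
}

static enum sketch_path sketch_ftrace_caller(const struct sketch_ops *ops,
					     struct sketch_regs *regs,
					     unsigned long *target)
{
	/* Entry: a non-NULL direct_call is branched to right away. */
	if (ops->direct_call) {
		*target = ops->direct_call;
		return SKETCH_DIRECT_EARLY;
	}

	/* Clear orig_gpr3 so the exit path can detect a direct caller. */
	regs->orig_gpr3 = 0;
	ops->func(regs);

	/* Exit: a callback may have recorded a direct caller. */
	if (regs->orig_gpr3) {
		*target = regs->orig_gpr3;
		return SKETCH_DIRECT_ON_EXIT;
	}

	*target = 0;
	return SKETCH_RETURN;
}
```

The early path corresponds to the .Lftrace_direct_call labels, and the
exit path to the orig_gpr3 test in ftrace_regs_exit.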

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/Kconfig                     |   1 +
 arch/powerpc/include/asm/ftrace.h        |  16 ++++
 arch/powerpc/kernel/asm-offsets.c        |   3 +
 arch/powerpc/kernel/trace/ftrace.c       |  11 +++
 arch/powerpc/kernel/trace/ftrace_entry.S | 114 +++++++++++++++++------
 5 files changed, 116 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index f1a0adedeb8e..ef845ea4dd27 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -235,6 +235,7 @@ config PPC
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
 	select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if PPC_FTRACE_OUT_OF_LINE || (PPC32 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY)
+	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS if HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
 	select HAVE_EBPF_JIT
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 1ad1328cf4e3..5eb7631355a1 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -148,6 +148,22 @@ extern unsigned int ftrace_ool_stub_text_end_count, ftrace_ool_stub_text_count,
 #endif
 void ftrace_free_init_tramp(void);
 unsigned long ftrace_call_adjust(unsigned long addr);
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+/*
+ * When an ftrace registered caller is tracing a function that is also set by a
+ * register_ftrace_direct() call, it needs to be differentiated in the
+ * ftrace_caller trampoline so that the direct call can be invoked after the
+ * other ftrace ops. To do this, place the direct caller in the orig_gpr3 field
+ * of pt_regs. This tells ftrace_caller that there's a direct caller.
+ */
+static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs, unsigned long addr)
+{
+	struct pt_regs *regs = &fregs->regs;
+
+	regs->orig_gpr3 = addr;
+}
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
 #else
 static inline void ftrace_free_init_tramp(void) { }
 static inline unsigned long ftrace_call_adjust(unsigned long addr) { return addr; }
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 60d1e388c2ba..dbd56264a8bc 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -680,6 +680,9 @@ int main(void)
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
 	OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	OFFSET(FTRACE_OPS_DIRECT_CALL, ftrace_ops, direct_call);
+#endif
 #endif
 
 	return 0;
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 9090d1a21600..051f3db14606 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -150,6 +150,17 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
 	else
 		ip = rec->ip;
 
+	if (!is_offset_in_branch_range(addr - ip) && addr != FTRACE_ADDR &&
+	    addr != FTRACE_REGS_ADDR) {
+		/* This can only happen with ftrace direct */
+		if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS)) {
+			pr_err("0x%lx (0x%lx): Unexpected target address 0x%lx\n",
+			       ip, rec->ip, addr);
+			return -EINVAL;
+		}
+		addr = FTRACE_ADDR;
+	}
+
 	if (is_offset_in_branch_range(addr - ip))
 		/* Within range */
 		stub = addr;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index ff376c990308..2c1b24100eca 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,14 +33,38 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro	ftrace_regs_entry allregs
-	/* Save the original return address in A's stack frame */
-	PPC_STL		r0, LRSAVE(r1)
 	/* Create a minimal stack frame for representing B */
 	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)
 
 	/* Create our stack frame + pt_regs */
 	PPC_STLU	r1,-SWITCH_FRAME_SIZE(r1)
 
+	.if \allregs == 1
+	SAVE_GPRS(11, 12, r1)
+	.endif
+
+	/* Get the _mcount() call site out of LR */
+	mflr	r11
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	/* Load the ftrace_op */
+	PPC_LL	r12, -(MCOUNT_INSN_SIZE*2 + SZL)(r11)
+
+	/* Load direct_call from the ftrace_op */
+	PPC_LL	r12, FTRACE_OPS_DIRECT_CALL(r12)
+	PPC_LCMPI r12, 0
+	.if \allregs == 1
+	bne	.Lftrace_direct_call_regs
+	.else
+	bne	.Lftrace_direct_call
+	.endif
+#endif
+
+	/* Save the previous LR in pt_regs->link */
+	PPC_STL	r0, _LINK(r1)
+	/* Also save it in A's stack frame */
+	PPC_STL	r0, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE+LRSAVE(r1)
+
 	/* Save all gprs to pt_regs */
 	SAVE_GPR(0, r1)
 	SAVE_GPRS(3, 10, r1)
@@ -54,7 +78,7 @@
 
 	.if \allregs == 1
 	SAVE_GPR(2, r1)
-	SAVE_GPRS(11, 31, r1)
+	SAVE_GPRS(13, 31, r1)
 	.else
 #if defined(CONFIG_LIVEPATCH_64) || defined(CONFIG_PPC_FTRACE_OUT_OF_LINE)
 	SAVE_GPR(14, r1)
@@ -67,20 +91,15 @@
 
 	.if \allregs == 1
 	/* Load special regs for save below */
+	mfcr	r7
 	mfmsr   r8
 	mfctr   r9
 	mfxer   r10
-	mfcr	r11
 	.else
 	/* Clear MSR to flag as ftrace_caller versus frace_regs_caller */
 	li	r8, 0
 	.endif
 
-	/* Get the _mcount() call site out of LR */
-	mflr	r7
-	/* Save the read LR in pt_regs->link */
-	PPC_STL	r0, _LINK(r1)
-
 #ifdef CONFIG_PPC64
 	/* Save callee's TOC in the ABI compliant location */
 	std	r2, STK_GOT(r1)
@@ -88,8 +107,8 @@
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
-	/* r7 points to the instruction following the call to ftrace */
-	PPC_LL	r5, -(MCOUNT_INSN_SIZE*2 + SZL)(r7)
+	/* r11 points to the instruction following the call to ftrace */
+	PPC_LL	r5, -(MCOUNT_INSN_SIZE*2 + SZL)(r11)
 	PPC_LL	r12, FTRACE_OPS_FUNC(r5)
 	mtctr	r12
 #else /* !CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS */
@@ -105,45 +124,51 @@
 	/* Save special regs */
 	PPC_STL	r8, _MSR(r1)
 	.if \allregs == 1
+	PPC_STL	r7, _CCR(r1)
 	PPC_STL	r9, _CTR(r1)
 	PPC_STL	r10, _XER(r1)
-	PPC_STL	r11, _CCR(r1)
 	.endif
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	/* Clear orig_gpr3 to later detect ftrace_direct call */
+	li	r7, 0
+	PPC_STL	r7, ORIG_GPR3(r1)
+#endif
+
 #ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
 	/* Save our real return address in nvr for return */
 	.if \allregs == 0
 	SAVE_GPR(15, r1)
 	.endif
-	mr	r15, r7
+	mr	r15, r11
 	/*
-	 * We want the ftrace location in the function, but our lr (in r7)
+	 * We want the ftrace location in the function, but our lr (in r11)
 	 * points at the 'mtlr r0' instruction in the out of line stub.  To
 	 * recover the ftrace location, we read the branch instruction in the
 	 * stub, and adjust our lr by the branch offset.
 	 *
 	 * See ftrace_init_ool_stub() for the profile sequence.
 	 */
-	lwz	r8, MCOUNT_INSN_SIZE(r7)
+	lwz	r8, MCOUNT_INSN_SIZE(r11)
 	slwi	r8, r8, 6
 	srawi	r8, r8, 6
-	add	r3, r7, r8
+	add	r3, r11, r8
 	/*
 	 * Override our nip to point past the branch in the original function.
 	 * This allows reliable stack trace and the ftrace stack tracer to work as-is.
 	 */
-	addi	r7, r3, MCOUNT_INSN_SIZE
+	addi	r11, r3, MCOUNT_INSN_SIZE
 #else
 	/* Calculate ip from nip-4 into r3 for call below */
-	subi    r3, r7, MCOUNT_INSN_SIZE
+	subi    r3, r11, MCOUNT_INSN_SIZE
 #endif
 
 	/* Save NIP as pt_regs->nip */
-	PPC_STL	r7, _NIP(r1)
+	PPC_STL	r11, _NIP(r1)
 	/* Also save it in B's stackframe header for proper unwind */
-	PPC_STL	r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
+	PPC_STL	r11, LRSAVE+SWITCH_FRAME_SIZE(r1)
 #if defined(CONFIG_LIVEPATCH_64) || defined(CONFIG_PPC_FTRACE_OUT_OF_LINE)
-	mr	r14, r7		/* remember old NIP */
+	mr	r14, r11	/* remember old NIP */
 #endif
 
 	/* Put the original return address in r4 as parent_ip */
@@ -154,14 +179,32 @@
 .endm
 
 .macro	ftrace_regs_exit allregs
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	/* Check orig_gpr3 to detect ftrace_direct call */
+	PPC_LL	r3, ORIG_GPR3(r1)
+	PPC_LCMPI cr1, r3, 0
+	mtctr	r3
+#endif
+
+	/* Restore possibly modified LR */
+	PPC_LL	r0, _LINK(r1)
+
 #ifndef CONFIG_PPC_FTRACE_OUT_OF_LINE
 	/* Load ctr with the possibly modified NIP */
 	PPC_LL	r3, _NIP(r1)
-	mtctr	r3
-
 #ifdef CONFIG_LIVEPATCH_64
 	cmpd	r14, r3		/* has NIP been altered? */
 #endif
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	beq	cr1,2f
+	mtlr	r3
+	b	3f
+#endif
+2:	mtctr	r3
+	mtlr	r0
+3:
+
 #else /* !CONFIG_PPC_FTRACE_OUT_OF_LINE */
 	/* Load LR with the possibly modified NIP */
 	PPC_LL	r3, _NIP(r1)
@@ -185,12 +228,6 @@
 #endif
 	.endif
 
-	/* Restore possibly modified LR */
-	PPC_LL	r0, _LINK(r1)
-#ifndef CONFIG_PPC_FTRACE_OUT_OF_LINE
-	mtlr	r0
-#endif
-
 #ifdef CONFIG_PPC64
 	/* Restore callee's TOC */
 	ld	r2, STK_GOT(r1)
@@ -203,8 +240,12 @@
         /* Based on the cmpd above, if the NIP was altered handle livepatch */
 	bne-	livepatch_handler
 #endif
+
 	/* jump after _mcount site */
 #ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	bnectr	cr1
+#endif
 	/*
 	 * Return with blr to keep the link stack balanced. The function profiling sequence
 	 * uses 'mtlr r0' to restore LR.
@@ -260,6 +301,21 @@ ftrace_no_trace:
 #endif
 #endif
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+.Lftrace_direct_call_regs:
+	mtctr	r12
+	REST_GPRS(11, 12, r1)
+	addi	r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
+	bctr
+.Lftrace_direct_call:
+	mtctr	r12
+	addi	r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
+	bctr
+SYM_FUNC_START(ftrace_stub_direct_tramp)
+	blr
+SYM_FUNC_END(ftrace_stub_direct_tramp)
+#endif
+
 #ifdef CONFIG_LIVEPATCH_64
 	/*
 	 * This function runs in the mcount context, between two functions. As
-- 
2.46.0



* [PATCH v5 16/17] samples/ftrace: Add support for ftrace direct samples on powerpc
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (14 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 15/17] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-09-15 20:56 ` [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines Hari Bathini
  2024-10-09 15:46 ` [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and " Masahiro Yamada
  17 siblings, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

Add powerpc 32-bit and 64-bit samples for ftrace direct. This serves to
show the sample instruction sequence to be used by ftrace direct calls
to adhere to the ftrace ABI.

On 64-bit powerpc, TOC setup requires some additional work.
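
With CONFIG_PPC_FTRACE_OUT_OF_LINE, the multi samples also recover an
address in the traced function from the branch in the out-of-line stub
(PPC_FTRACE_RECOVER_IP). The arithmetic can be sketched in C as below.
The sketch_* names are hypothetical, and the sketch assumes the word
after the 'mtlr r0' is a plain relative 'b' (AA = 0, LK = 0), so that
the low 26 bits are the byte displacement.

```c
#include <stdint.h>

#define SKETCH_INSN_SIZE 4

/*
 * Sign-extend the low 26 bits of a branch instruction word. This is what
 * the shift-left-by-6 / arithmetic-shift-right-by-6 pair does in the
 * assembly; it is written with a mask here to stay portable C.
 */
static int32_t sketch_branch_offset(uint32_t insn)
{
	int32_t off = (int32_t)(insn & 0x03ffffff);

	if (off & 0x02000000)
		off -= 0x04000000;
	return off;
}

/*
 * lr points at the 'mtlr r0' in the stub and the branch back into the
 * function is the next word, so lr + 4 + offset is the branch target.
 */
static uint64_t sketch_recover_ip(uint64_t lr, uint32_t branch_insn)
{
	return lr + (int64_t)sketch_branch_offset(branch_insn) + SKETCH_INSN_SIZE;
}
```

For example, with the stub's 'mtlr r0' at 0x10000100 and the next word
being 'b' with a displacement of -0x100 (0x4bffff00), the recovered
address is 0x10000004.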

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/Kconfig                        |   2 +
 samples/ftrace/ftrace-direct-modify.c       |  85 +++++++++++++++-
 samples/ftrace/ftrace-direct-multi-modify.c | 101 +++++++++++++++++++-
 samples/ftrace/ftrace-direct-multi.c        |  79 ++++++++++++++-
 samples/ftrace/ftrace-direct-too.c          |  83 +++++++++++++++-
 samples/ftrace/ftrace-direct.c              |  69 ++++++++++++-
 6 files changed, 414 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ef845ea4dd27..1e093ed287fe 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -275,6 +275,8 @@ config PPC
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE
 	select HAVE_RSEQ
+	select HAVE_SAMPLE_FTRACE_DIRECT	if HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	select HAVE_SAMPLE_FTRACE_DIRECT_MULTI	if HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 	select HAVE_SETUP_PER_CPU_AREA		if PPC64
 	select HAVE_SOFTIRQ_ON_OWN_STACK
 	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
index 81220390851a..cfea7a38befb 100644
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -2,7 +2,7 @@
 #include <linux/module.h>
 #include <linux/kthread.h>
 #include <linux/ftrace.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -199,6 +199,89 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC
+#include <asm/ppc_asm.h>
+
+#ifdef CONFIG_PPC64
+#define STACK_FRAME_SIZE 48
+#else
+#define STACK_FRAME_SIZE 24
+#endif
+
+#if defined(CONFIG_PPC64_ELF_ABI_V2) && !defined(CONFIG_PPC_KERNEL_PCREL)
+#define PPC64_TOC_SAVE_AND_UPDATE			\
+"	std		2, 24(1)\n"			\
+"	bcl		20, 31, 1f\n"			\
+"   1:	mflr		12\n"				\
+"	ld		2, (99f - 1b)(12)\n"
+#define PPC64_TOC_RESTORE				\
+"	ld		2, 24(1)\n"
+#define PPC64_TOC					\
+"   99:	.quad		.TOC.@tocbase\n"
+#else
+#define PPC64_TOC_SAVE_AND_UPDATE ""
+#define PPC64_TOC_RESTORE ""
+#define PPC64_TOC ""
+#endif
+
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtlr		0\n"
+#define PPC_FTRACE_RET					\
+"	blr\n"
+#else
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtctr		0\n"
+#define PPC_FTRACE_RET					\
+"	mtlr		0\n"				\
+"	bctr\n"
+#endif
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp1, @function\n"
+"	.globl		my_tramp1\n"
+"   my_tramp1:\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mflr		0\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+	PPC64_TOC_SAVE_AND_UPDATE
+"	bl		my_direct_func1\n"
+	PPC64_TOC_RESTORE
+"	addi		1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+	PPC_FTRACE_RESTORE_LR
+"	addi		1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_FTRACE_RET
+"	.size		my_tramp1, .-my_tramp1\n"
+
+"	.type		my_tramp2, @function\n"
+"	.globl		my_tramp2\n"
+"   my_tramp2:\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mflr		0\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+	PPC64_TOC_SAVE_AND_UPDATE
+"	bl		my_direct_func2\n"
+	PPC64_TOC_RESTORE
+"	addi		1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+	PPC_FTRACE_RESTORE_LR
+"	addi		1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_FTRACE_RET
+	PPC64_TOC
+"	.size		my_tramp2, .-my_tramp2\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC */
+
 static struct ftrace_ops direct;
 
 static unsigned long my_tramp = (unsigned long)my_tramp1;
diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
index f943e40d57fd..8f7986d698d8 100644
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@ -2,7 +2,7 @@
 #include <linux/module.h>
 #include <linux/kthread.h>
 #include <linux/ftrace.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -225,6 +225,105 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC
+#include <asm/ppc_asm.h>
+
+#ifdef CONFIG_PPC64
+#define STACK_FRAME_SIZE 48
+#else
+#define STACK_FRAME_SIZE 24
+#endif
+
+#if defined(CONFIG_PPC64_ELF_ABI_V2) && !defined(CONFIG_PPC_KERNEL_PCREL)
+#define PPC64_TOC_SAVE_AND_UPDATE			\
+"	std		2, 24(1)\n"			\
+"	bcl		20, 31, 1f\n"			\
+"   1:	mflr		12\n"				\
+"	ld		2, (99f - 1b)(12)\n"
+#define PPC64_TOC_RESTORE				\
+"	ld		2, 24(1)\n"
+#define PPC64_TOC					\
+"   99:	.quad		.TOC.@tocbase\n"
+#else
+#define PPC64_TOC_SAVE_AND_UPDATE ""
+#define PPC64_TOC_RESTORE ""
+#define PPC64_TOC ""
+#endif
+
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtlr		0\n"
+#define PPC_FTRACE_RET					\
+"	blr\n"
+#define PPC_FTRACE_RECOVER_IP				\
+"	lwz		8, 4(3)\n"			\
+"	li		9, 6\n"				\
+"	slw		8, 8, 9\n"			\
+"	sraw		8, 8, 9\n"			\
+"	add		3, 3, 8\n"			\
+"	addi		3, 3, 4\n"
+#else
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtctr		0\n"
+#define PPC_FTRACE_RET					\
+"	mtlr		0\n"				\
+"	bctr\n"
+#define PPC_FTRACE_RECOVER_IP ""
+#endif
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp1, @function\n"
+"	.globl		my_tramp1\n"
+"   my_tramp1:\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mflr		0\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+	PPC64_TOC_SAVE_AND_UPDATE
+	PPC_STL"	3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mr		3, 0\n"
+	PPC_FTRACE_RECOVER_IP
+"	bl		my_direct_func1\n"
+	PPC_LL"		3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+	PPC64_TOC_RESTORE
+"	addi		1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+	PPC_FTRACE_RESTORE_LR
+"	addi		1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_FTRACE_RET
+"	.size		my_tramp1, .-my_tramp1\n"
+
+"	.type		my_tramp2, @function\n"
+"	.globl		my_tramp2\n"
+"   my_tramp2:\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mflr		0\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+	PPC64_TOC_SAVE_AND_UPDATE
+	PPC_STL"	3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mr		3, 0\n"
+	PPC_FTRACE_RECOVER_IP
+"	bl		my_direct_func2\n"
+	PPC_LL"		3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+	PPC64_TOC_RESTORE
+"	addi		1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+	PPC_FTRACE_RESTORE_LR
+"	addi		1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_FTRACE_RET
+	PPC64_TOC
+	"	.size		my_tramp2, .-my_tramp2\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC */
+
 static unsigned long my_tramp = (unsigned long)my_tramp1;
 static unsigned long tramps[2] = {
 	(unsigned long)my_tramp1,
diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
index aed6df2927ce..db326c81a27d 100644
--- a/samples/ftrace/ftrace-direct-multi.c
+++ b/samples/ftrace/ftrace-direct-multi.c
@@ -4,7 +4,7 @@
 #include <linux/mm.h> /* for handle_mm_fault() */
 #include <linux/ftrace.h>
 #include <linux/sched/stat.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -141,6 +141,83 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC
+#include <asm/ppc_asm.h>
+
+#ifdef CONFIG_PPC64
+#define STACK_FRAME_SIZE 48
+#else
+#define STACK_FRAME_SIZE 24
+#endif
+
+#if defined(CONFIG_PPC64_ELF_ABI_V2) && !defined(CONFIG_PPC_KERNEL_PCREL)
+#define PPC64_TOC_SAVE_AND_UPDATE			\
+"	std		2, 24(1)\n"			\
+"	bcl		20, 31, 1f\n"			\
+"   1:	mflr		12\n"				\
+"	ld		2, (99f - 1b)(12)\n"
+#define PPC64_TOC_RESTORE				\
+"	ld		2, 24(1)\n"
+#define PPC64_TOC					\
+"   99:	.quad		.TOC.@tocbase\n"
+#else
+#define PPC64_TOC_SAVE_AND_UPDATE ""
+#define PPC64_TOC_RESTORE ""
+#define PPC64_TOC ""
+#endif
+
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtlr		0\n"
+#define PPC_FTRACE_RET					\
+"	blr\n"
+#define PPC_FTRACE_RECOVER_IP				\
+"	lwz		8, 4(3)\n"			\
+"	li		9, 6\n"				\
+"	slw		8, 8, 9\n"			\
+"	sraw		8, 8, 9\n"			\
+"	add		3, 3, 8\n"			\
+"	addi		3, 3, 4\n"
+#else
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtctr		0\n"
+#define PPC_FTRACE_RET					\
+"	mtlr		0\n"				\
+"	bctr\n"
+#define PPC_FTRACE_RECOVER_IP ""
+#endif
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp, @function\n"
+"	.globl		my_tramp\n"
+"   my_tramp:\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mflr		0\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+	PPC64_TOC_SAVE_AND_UPDATE
+	PPC_STL"	3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mr		3, 0\n"
+	PPC_FTRACE_RECOVER_IP
+"	bl		my_direct_func\n"
+	PPC_LL"		3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+	PPC64_TOC_RESTORE
+"	addi		1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+	PPC_FTRACE_RESTORE_LR
+"	addi		1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_FTRACE_RET
+	PPC64_TOC
+"	.size		my_tramp, .-my_tramp\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC */
+
 static struct ftrace_ops direct;
 
 static int __init ftrace_direct_multi_init(void)
diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
index 6ff546a5d7eb..3d0fa260332d 100644
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -3,7 +3,7 @@
 
 #include <linux/mm.h> /* for handle_mm_fault() */
 #include <linux/ftrace.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -153,6 +153,87 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC
+#include <asm/ppc_asm.h>
+
+#ifdef CONFIG_PPC64
+#define STACK_FRAME_SIZE 64
+#define STACK_FRAME_ARG1 32
+#define STACK_FRAME_ARG2 40
+#define STACK_FRAME_ARG3 48
+#define STACK_FRAME_ARG4 56
+#else
+#define STACK_FRAME_SIZE 32
+#define STACK_FRAME_ARG1 16
+#define STACK_FRAME_ARG2 20
+#define STACK_FRAME_ARG3 24
+#define STACK_FRAME_ARG4 28
+#endif
+
+#if defined(CONFIG_PPC64_ELF_ABI_V2) && !defined(CONFIG_PPC_KERNEL_PCREL)
+#define PPC64_TOC_SAVE_AND_UPDATE			\
+"	std		2, 24(1)\n"			\
+"	bcl		20, 31, 1f\n"			\
+"   1:	mflr		12\n"				\
+"	ld		2, (99f - 1b)(12)\n"
+#define PPC64_TOC_RESTORE				\
+"	ld		2, 24(1)\n"
+#define PPC64_TOC					\
+"   99:	.quad		.TOC.@tocbase\n"
+#else
+#define PPC64_TOC_SAVE_AND_UPDATE ""
+#define PPC64_TOC_RESTORE ""
+#define PPC64_TOC ""
+#endif
+
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtlr		0\n"
+#define PPC_FTRACE_RET					\
+"	blr\n"
+#else
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtctr		0\n"
+#define PPC_FTRACE_RET					\
+"	mtlr		0\n"				\
+"	bctr\n"
+#endif
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp, @function\n"
+"	.globl		my_tramp\n"
+"   my_tramp:\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mflr		0\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+	PPC64_TOC_SAVE_AND_UPDATE
+	PPC_STL"	3, "__stringify(STACK_FRAME_ARG1)"(1)\n"
+	PPC_STL"	4, "__stringify(STACK_FRAME_ARG2)"(1)\n"
+	PPC_STL"	5, "__stringify(STACK_FRAME_ARG3)"(1)\n"
+	PPC_STL"	6, "__stringify(STACK_FRAME_ARG4)"(1)\n"
+"	bl		my_direct_func\n"
+	PPC_LL"		6, "__stringify(STACK_FRAME_ARG4)"(1)\n"
+	PPC_LL"		5, "__stringify(STACK_FRAME_ARG3)"(1)\n"
+	PPC_LL"		4, "__stringify(STACK_FRAME_ARG2)"(1)\n"
+	PPC_LL"		3, "__stringify(STACK_FRAME_ARG1)"(1)\n"
+	PPC64_TOC_RESTORE
+"	addi		1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+	PPC_FTRACE_RESTORE_LR
+"	addi		1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_FTRACE_RET
+	PPC64_TOC
+"	.size		my_tramp, .-my_tramp\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC */
+
 static struct ftrace_ops direct;
 
 static int __init ftrace_direct_init(void)
diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
index ef0945670e1e..956834b0d19a 100644
--- a/samples/ftrace/ftrace-direct.c
+++ b/samples/ftrace/ftrace-direct.c
@@ -3,7 +3,7 @@
 
 #include <linux/sched.h> /* for wake_up_process() */
 #include <linux/ftrace.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -134,6 +134,73 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC
+#include <asm/ppc_asm.h>
+
+#ifdef CONFIG_PPC64
+#define STACK_FRAME_SIZE 48
+#else
+#define STACK_FRAME_SIZE 24
+#endif
+
+#if defined(CONFIG_PPC64_ELF_ABI_V2) && !defined(CONFIG_PPC_KERNEL_PCREL)
+#define PPC64_TOC_SAVE_AND_UPDATE			\
+"	std		2, 24(1)\n"			\
+"	bcl		20, 31, 1f\n"			\
+"   1:	mflr		12\n"				\
+"	ld		2, (99f - 1b)(12)\n"
+#define PPC64_TOC_RESTORE				\
+"	ld		2, 24(1)\n"
+#define PPC64_TOC					\
+"   99:	.quad		.TOC.@tocbase\n"
+#else
+#define PPC64_TOC_SAVE_AND_UPDATE ""
+#define PPC64_TOC_RESTORE ""
+#define PPC64_TOC ""
+#endif
+
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtlr		0\n"
+#define PPC_FTRACE_RET					\
+"	blr\n"
+#else
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtctr		0\n"
+#define PPC_FTRACE_RET					\
+"	mtlr		0\n"				\
+"	bctr\n"
+#endif
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp, @function\n"
+"	.globl		my_tramp\n"
+"   my_tramp:\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mflr		0\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+	PPC64_TOC_SAVE_AND_UPDATE
+	PPC_STL"	3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	bl		my_direct_func\n"
+	PPC_LL"		3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+	PPC64_TOC_RESTORE
+"	addi		1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+	PPC_FTRACE_RESTORE_LR
+"	addi		1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_FTRACE_RET
+	PPC64_TOC
+"	.size		my_tramp, .-my_tramp\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC */
+
 static struct ftrace_ops direct;
 
 static int __init ftrace_direct_init(void)
-- 
2.46.0



* [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (15 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 16/17] samples/ftrace: Add support for ftrace direct samples on powerpc Hari Bathini
@ 2024-09-15 20:56 ` Hari Bathini
  2024-09-16 21:41   ` kernel test robot
  2024-09-17  7:50   ` Alexei Starovoitov
  2024-10-09 15:46 ` [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and " Masahiro Yamada
  17 siblings, 2 replies; 36+ messages in thread
From: Hari Bathini @ 2024-09-15 20:56 UTC (permalink / raw)
  To: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel
  Cc: Naveen N. Rao, Mark Rutland, Daniel Borkmann, Masahiro Yamada,
	Nicholas Piggin, Alexei Starovoitov, Steven Rostedt,
	Andrii Nakryiko, Christophe Leroy, Vishal Chourasia,
	Mahesh J Salgaonkar, Masami Hiramatsu

From: Naveen N Rao <naveen@kernel.org>

Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline()
for 64-bit powerpc. While the code is generic, BPF trampolines are only
enabled on 64-bit powerpc. 32-bit powerpc will need testing and some
updates.

BPF Trampolines adhere to the existing ftrace ABI utilizing a
two-instruction profiling sequence, as well as the newer ABI utilizing a
three-instruction profiling sequence enabling return with a 'blr'. The
trampoline code itself closely follows the x86 implementation.

The BPF prog JIT is extended to mimic the 64-bit powerpc approach for
ftrace: a single nop at function entry, followed by the function
profiling sequence out-of-line and a separate long branch stub for calls
to trampolines that are out of range. A dummy_tramp is provided to
simplify synchronization, similar to arm64.

When attaching a bpf trampoline to a bpf prog, we can patch up to three
things:
- the nop at bpf prog entry to go to the out-of-line stub
- the instruction in the out-of-line stub to either call the bpf
  trampoline directly, or to branch to the long_branch stub.
- the trampoline address before the long_branch stub.

We do not need any synchronization here since we always have a valid
branch target regardless of the order in which the above stores are
seen. dummy_tramp ensures that the long_branch stub goes to a valid
destination on other cpus, even when the branch to the long_branch stub
is seen before the updated trampoline address.
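
The ordering argument can be checked with a small model. The sketch
below is an assumption-laden simplification: it only covers the
long_branch case, treats an unpatched slot in the out-of-line stub as
falling through back to the prog, and uses hypothetical sketch_* names.
It enumerates every subset of the three stores that another cpu might
observe and checks that the resulting branch target is valid.

```c
#include <stdbool.h>

enum sketch_dest {
	SKETCH_DEST_INVALID,
	SKETCH_DEST_PROG_BODY,
	SKETCH_DEST_DUMMY_TRAMP,
	SKETCH_DEST_REAL_TRAMP,
};

struct sketch_site {
	bool entry_branches_to_ool;		/* false: entry is still a nop */
	bool ool_branches_to_long_stub;		/* false: stub slot not patched */
	enum sketch_dest long_branch_slot;	/* address word before the stub */
};

/* Where does a cpu end up if it enters the prog in this state? */
static enum sketch_dest sketch_resolve(const struct sketch_site *s)
{
	if (!s->entry_branches_to_ool)
		return SKETCH_DEST_PROG_BODY;
	if (!s->ool_branches_to_long_stub)
		return SKETCH_DEST_PROG_BODY;	/* assumed fall-through */
	return s->long_branch_slot;
}

/* Try all 8 subsets of the three attach stores being visible. */
static bool sketch_attach_always_valid(enum sketch_dest initial_slot)
{
	for (unsigned int seen = 0; seen < 8; seen++) {
		struct sketch_site s = {
			.entry_branches_to_ool = (seen & 1) != 0,
			.ool_branches_to_long_stub = (seen & 2) != 0,
			.long_branch_slot = (seen & 4) ?
				SKETCH_DEST_REAL_TRAMP : initial_slot,
		};

		if (sketch_resolve(&s) == SKETCH_DEST_INVALID)
			return false;
	}
	return true;
}
```

With the slot initialised to dummy_tramp every combination resolves to a
valid destination. If the slot started out invalid, the combination
where both branches are visible but the address store is not would
fail, which is the case dummy_tramp exists to cover.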

However, when detaching a bpf trampoline from a bpf prog, or when
changing the bpf trampoline address, we need synchronization to ensure
that other cpus can no longer branch into the older trampoline so that
it can be safely freed. bpf_tramp_image_put() uses rcu_tasks to ensure
all cpus make forward progress, but we still need to ensure that other
cpus execute isync (or some other context synchronizing instruction) so
that they don't go back into the trampoline again.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/include/asm/ppc-opcode.h |  14 +
 arch/powerpc/net/bpf_jit.h            |  12 +
 arch/powerpc/net/bpf_jit_comp.c       | 847 +++++++++++++++++++++++++-
 arch/powerpc/net/bpf_jit_comp32.c     |   7 +-
 arch/powerpc/net/bpf_jit_comp64.c     |   7 +-
 5 files changed, 884 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index b98a9e982c03..4312bcb913a4 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -587,12 +587,26 @@
 #define PPC_RAW_MTSPR(spr, d)		(0x7c0003a6 | ___PPC_RS(d) | __PPC_SPR(spr))
 #define PPC_RAW_EIEIO()			(0x7c0006ac)
 
+/* bcl 20,31,$+4 */
+#define PPC_RAW_BCL4()			(0x429f0005)
 #define PPC_RAW_BRANCH(offset)		(0x48000000 | PPC_LI(offset))
 #define PPC_RAW_BL(offset)		(0x48000001 | PPC_LI(offset))
 #define PPC_RAW_TW(t0, a, b)		(0x7c000008 | ___PPC_RS(t0) | ___PPC_RA(a) | ___PPC_RB(b))
 #define PPC_RAW_TRAP()			PPC_RAW_TW(31, 0, 0)
 #define PPC_RAW_SETB(t, bfa)		(0x7c000100 | ___PPC_RT(t) | ___PPC_RA((bfa) << 2))
 
+#ifdef CONFIG_PPC32
+#define PPC_RAW_STL		PPC_RAW_STW
+#define PPC_RAW_STLU		PPC_RAW_STWU
+#define PPC_RAW_LL		PPC_RAW_LWZ
+#define PPC_RAW_CMPLI		PPC_RAW_CMPWI
+#else
+#define PPC_RAW_STL		PPC_RAW_STD
+#define PPC_RAW_STLU		PPC_RAW_STDU
+#define PPC_RAW_LL		PPC_RAW_LD
+#define PPC_RAW_CMPLI		PPC_RAW_CMPDI
+#endif
+
 /* Deal with instructions that older assemblers aren't aware of */
 #define	PPC_BCCTR_FLUSH		stringify_in_c(.long PPC_INST_BCCTR_FLUSH)
 #define	PPC_CP_ABORT		stringify_in_c(.long PPC_RAW_CP_ABORT)
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index cdea5dccaefe..2d04ce5a23da 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -12,6 +12,7 @@
 
 #include <asm/types.h>
 #include <asm/ppc-opcode.h>
+#include <linux/build_bug.h>
 
 #ifdef CONFIG_PPC64_ELF_ABI_V1
 #define FUNCTION_DESCR_SIZE	24
@@ -21,6 +22,9 @@
 
 #define CTX_NIA(ctx) ((unsigned long)ctx->idx * 4)
 
+#define SZL			sizeof(unsigned long)
+#define BPF_INSN_SAFETY		64
+
 #define PLANT_INSTR(d, idx, instr)					      \
 	do { if (d) { (d)[idx] = instr; } idx++; } while (0)
 #define EMIT(instr)		PLANT_INSTR(image, ctx->idx, instr)
@@ -81,6 +85,13 @@
 				EMIT(PPC_RAW_ORI(d, d, (uintptr_t)(i) &       \
 							0xffff));             \
 		} } while (0)
+#define PPC_LI_ADDR	PPC_LI64
+#define PPC64_LOAD_PACA()						      \
+	EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, kernel_toc)))
+#else
+#define PPC_LI64(d, i)	BUILD_BUG()
+#define PPC_LI_ADDR	PPC_LI32
+#define PPC64_LOAD_PACA() BUILD_BUG()
 #endif
 
 /*
@@ -165,6 +176,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
 		       u32 *addrs, int pass, bool extra_pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
+void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr);
 
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 2a36cc2e7e9e..79e85d595c82 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -22,11 +22,81 @@
 
 #include "bpf_jit.h"
 
+/* These offsets are from bpf prog end and stay the same across progs */
+static int bpf_jit_ool_stub, bpf_jit_long_branch_stub;
+
 static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
 {
 	memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
 }
 
+void dummy_tramp(void);
+
+asm (
+"	.pushsection .text, \"ax\", @progbits	;"
+"	.global dummy_tramp			;"
+"	.type dummy_tramp, @function		;"
+"dummy_tramp:					;"
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+"	blr					;"
+#else
+/* LR is always in r11, so we don't need a 'mflr r11' here */
+"	mtctr	11				;"
+"	mtlr	0				;"
+"	bctr					;"
+#endif
+"	.size dummy_tramp, .-dummy_tramp	;"
+"	.popsection				;"
+);
+
+void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
+{
+	int ool_stub_idx, long_branch_stub_idx;
+
+	/*
+	 * Out-of-line stub:
+	 *	mflr	r0
+	 *	[b|bl]	tramp
+	 *	mtlr	r0 // only with CONFIG_PPC_FTRACE_OUT_OF_LINE
+	 *	b	bpf_func + 4
+	 */
+	ool_stub_idx = ctx->idx;
+	EMIT(PPC_RAW_MFLR(_R0));
+	EMIT(PPC_RAW_NOP());
+	if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
+		EMIT(PPC_RAW_MTLR(_R0));
+	WARN_ON_ONCE(!is_offset_in_branch_range(4 - (long)ctx->idx * 4));
+	EMIT(PPC_RAW_BRANCH(4 - (long)ctx->idx * 4));
+
+	/*
+	 * Long branch stub:
+	 *	.long	<dummy_tramp_addr>
+	 *	mflr	r11
+	 *	bcl	20,31,$+4
+	 *	mflr	r12
+	 *	ld	r12, -8-SZL(r12)
+	 *	mtctr	r12
+	 *	mtlr	r11 // needed to retain ftrace ABI
+	 *	bctr
+	 */
+	if (image)
+		*((unsigned long *)&image[ctx->idx]) = (unsigned long)dummy_tramp;
+	ctx->idx += SZL / 4;
+	long_branch_stub_idx = ctx->idx;
+	EMIT(PPC_RAW_MFLR(_R11));
+	EMIT(PPC_RAW_BCL4());
+	EMIT(PPC_RAW_MFLR(_R12));
+	EMIT(PPC_RAW_LL(_R12, _R12, -8-SZL));
+	EMIT(PPC_RAW_MTCTR(_R12));
+	EMIT(PPC_RAW_MTLR(_R11));
+	EMIT(PPC_RAW_BCTR());
+
+	if (!bpf_jit_ool_stub) {
+		bpf_jit_ool_stub = (ctx->idx - ool_stub_idx) * 4;
+		bpf_jit_long_branch_stub = (ctx->idx - long_branch_stub_idx) * 4;
+	}
+}
+
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr)
 {
 	if (!exit_addr || is_offset_in_branch_range(exit_addr - (ctx->idx * 4))) {
@@ -222,7 +292,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 
 	fp->bpf_func = (void *)fimage;
 	fp->jited = 1;
-	fp->jited_len = proglen + FUNCTION_DESCR_SIZE;
+	fp->jited_len = cgctx.idx * 4 + FUNCTION_DESCR_SIZE;
 
 	if (!fp->is_func || extra_pass) {
 		if (bpf_jit_binary_pack_finalize(fhdr, hdr)) {
@@ -369,3 +439,778 @@ bool bpf_jit_supports_far_kfunc_call(void)
 {
 	return IS_ENABLED(CONFIG_PPC64);
 }
+
+void *arch_alloc_bpf_trampoline(unsigned int size)
+{
+	return bpf_prog_pack_alloc(size, bpf_jit_fill_ill_insns);
+}
+
+void arch_free_bpf_trampoline(void *image, unsigned int size)
+{
+	bpf_prog_pack_free(image, size);
+}
+
+int arch_protect_bpf_trampoline(void *image, unsigned int size)
+{
+	return 0;
+}
+
+static int invoke_bpf_prog(u32 *image, u32 *ro_image, struct codegen_context *ctx,
+			   struct bpf_tramp_link *l, int regs_off, int retval_off,
+			   int run_ctx_off, bool save_ret)
+{
+	struct bpf_prog *p = l->link.prog;
+	ppc_inst_t branch_insn;
+	u32 jmp_idx;
+	int ret = 0;
+
+	/* Save cookie */
+	if (IS_ENABLED(CONFIG_PPC64)) {
+		PPC_LI64(_R3, l->cookie);
+		EMIT(PPC_RAW_STD(_R3, _R1, run_ctx_off + offsetof(struct bpf_tramp_run_ctx,
+				 bpf_cookie)));
+	} else {
+		PPC_LI32(_R3, l->cookie >> 32);
+		PPC_LI32(_R4, l->cookie);
+		EMIT(PPC_RAW_STW(_R3, _R1,
+				 run_ctx_off + offsetof(struct bpf_tramp_run_ctx, bpf_cookie)));
+		EMIT(PPC_RAW_STW(_R4, _R1,
+				 run_ctx_off + offsetof(struct bpf_tramp_run_ctx, bpf_cookie) + 4));
+	}
+
+	/* __bpf_prog_enter(p, &bpf_tramp_run_ctx) */
+	PPC_LI_ADDR(_R3, p);
+	EMIT(PPC_RAW_MR(_R25, _R3));
+	EMIT(PPC_RAW_ADDI(_R4, _R1, run_ctx_off));
+	ret = bpf_jit_emit_func_call_rel(image, ro_image, ctx,
+					 (unsigned long)bpf_trampoline_enter(p));
+	if (ret)
+		return ret;
+
+	/* Remember prog start time returned by __bpf_prog_enter */
+	EMIT(PPC_RAW_MR(_R26, _R3));
+
+	/*
+	 * if (__bpf_prog_enter(p) == 0)
+	 *	goto skip_exec_of_prog;
+	 *
+	 * Emit a nop to be later patched with conditional branch, once offset is known
+	 */
+	EMIT(PPC_RAW_CMPLI(_R3, 0));
+	jmp_idx = ctx->idx;
+	EMIT(PPC_RAW_NOP());
+
+	/* p->bpf_func(ctx) */
+	EMIT(PPC_RAW_ADDI(_R3, _R1, regs_off));
+	if (!p->jited)
+		PPC_LI_ADDR(_R4, (unsigned long)p->insnsi);
+	if (!create_branch(&branch_insn, (u32 *)&ro_image[ctx->idx], (unsigned long)p->bpf_func,
+			   BRANCH_SET_LINK)) {
+		if (image)
+			image[ctx->idx] = ppc_inst_val(branch_insn);
+		ctx->idx++;
+	} else {
+		EMIT(PPC_RAW_LL(_R12, _R25, offsetof(struct bpf_prog, bpf_func)));
+		EMIT(PPC_RAW_MTCTR(_R12));
+		EMIT(PPC_RAW_BCTRL());
+	}
+
+	if (save_ret)
+		EMIT(PPC_RAW_STL(_R3, _R1, retval_off));
+
+	/* Fix up branch */
+	if (image) {
+		if (create_cond_branch(&branch_insn, &image[jmp_idx],
+				       (unsigned long)&image[ctx->idx], COND_EQ << 16))
+			return -EINVAL;
+		image[jmp_idx] = ppc_inst_val(branch_insn);
+	}
+
+	/* __bpf_prog_exit(p, start_time, &bpf_tramp_run_ctx) */
+	EMIT(PPC_RAW_MR(_R3, _R25));
+	EMIT(PPC_RAW_MR(_R4, _R26));
+	EMIT(PPC_RAW_ADDI(_R5, _R1, run_ctx_off));
+	ret = bpf_jit_emit_func_call_rel(image, ro_image, ctx,
+					 (unsigned long)bpf_trampoline_exit(p));
+
+	return ret;
+}
+
+static int invoke_bpf_mod_ret(u32 *image, u32 *ro_image, struct codegen_context *ctx,
+			      struct bpf_tramp_links *tl, int regs_off, int retval_off,
+			      int run_ctx_off, u32 *branches)
+{
+	int i;
+
+	/*
+	 * The first fmod_ret program will receive a garbage return value.
+	 * Set this to 0 to avoid confusing the program.
+	 */
+	EMIT(PPC_RAW_LI(_R3, 0));
+	EMIT(PPC_RAW_STL(_R3, _R1, retval_off));
+	for (i = 0; i < tl->nr_links; i++) {
+		if (invoke_bpf_prog(image, ro_image, ctx, tl->links[i], regs_off, retval_off,
+				    run_ctx_off, true))
+			return -EINVAL;
+
+		/*
+		 * mod_ret prog stored return value after prog ctx. Emit:
+		 * if (*(u64 *)(ret_val) !=  0)
+		 *	goto do_fexit;
+		 */
+		EMIT(PPC_RAW_LL(_R3, _R1, retval_off));
+		EMIT(PPC_RAW_CMPLI(_R3, 0));
+
+		/*
+		 * Save the location of the branch and generate a nop, which is
+		 * replaced with a conditional jump once do_fexit (i.e. the
+		 * start of the fexit invocation) is finalized.
+		 */
+		branches[i] = ctx->idx;
+		EMIT(PPC_RAW_NOP());
+	}
+
+	return 0;
+}
+
+static void bpf_trampoline_setup_tail_call_cnt(u32 *image, struct codegen_context *ctx,
+					       int func_frame_offset, int r4_off)
+{
+	if (IS_ENABLED(CONFIG_PPC64)) {
+		/* See bpf_jit_stack_tailcallcnt() */
+		int tailcallcnt_offset = 6 * 8;
+
+		EMIT(PPC_RAW_LL(_R3, _R1, func_frame_offset - tailcallcnt_offset));
+		EMIT(PPC_RAW_STL(_R3, _R1, -tailcallcnt_offset));
+	} else {
+		/* See bpf_jit_stack_offsetof() and BPF_PPC_TC */
+		EMIT(PPC_RAW_LL(_R4, _R1, r4_off));
+	}
+}
+
+static void bpf_trampoline_restore_tail_call_cnt(u32 *image, struct codegen_context *ctx,
+						 int func_frame_offset, int r4_off)
+{
+	if (IS_ENABLED(CONFIG_PPC64)) {
+		/* See bpf_jit_stack_tailcallcnt() */
+		int tailcallcnt_offset = 6 * 8;
+
+		EMIT(PPC_RAW_LL(_R3, _R1, -tailcallcnt_offset));
+		EMIT(PPC_RAW_STL(_R3, _R1, func_frame_offset - tailcallcnt_offset));
+	} else {
+		/* See bpf_jit_stack_offsetof() and BPF_PPC_TC */
+		EMIT(PPC_RAW_STL(_R4, _R1, r4_off));
+	}
+}
+
+static void bpf_trampoline_save_args(u32 *image, struct codegen_context *ctx, int func_frame_offset,
+				     int nr_regs, int regs_off)
+{
+	int param_save_area_offset;
+
+	param_save_area_offset = func_frame_offset; /* the two frames we allotted */
+	param_save_area_offset += STACK_FRAME_MIN_SIZE; /* param save area is past frame header */
+
+	for (int i = 0; i < nr_regs; i++) {
+		if (i < 8) {
+			EMIT(PPC_RAW_STL(_R3 + i, _R1, regs_off + i * SZL));
+		} else {
+			EMIT(PPC_RAW_LL(_R3, _R1, param_save_area_offset + i * SZL));
+			EMIT(PPC_RAW_STL(_R3, _R1, regs_off + i * SZL));
+		}
+	}
+}
+
+/* Used when restoring just the register parameters when returning back */
+static void bpf_trampoline_restore_args_regs(u32 *image, struct codegen_context *ctx,
+					     int nr_regs, int regs_off)
+{
+	for (int i = 0; i < nr_regs && i < 8; i++)
+		EMIT(PPC_RAW_LL(_R3 + i, _R1, regs_off + i * SZL));
+}
+
+/* Used when we call into the traced function. Replicate parameter save area */
+static void bpf_trampoline_restore_args_stack(u32 *image, struct codegen_context *ctx,
+					      int func_frame_offset, int nr_regs, int regs_off)
+{
+	int param_save_area_offset;
+
+	param_save_area_offset = func_frame_offset; /* the two frames we allotted */
+	param_save_area_offset += STACK_FRAME_MIN_SIZE; /* param save area is past frame header */
+
+	for (int i = 8; i < nr_regs; i++) {
+		EMIT(PPC_RAW_LL(_R3, _R1, param_save_area_offset + i * SZL));
+		EMIT(PPC_RAW_STL(_R3, _R1, STACK_FRAME_MIN_SIZE + i * SZL));
+	}
+	bpf_trampoline_restore_args_regs(image, ctx, nr_regs, regs_off);
+}
+
+static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_image,
+					 void *rw_image_end, void *ro_image,
+					 const struct btf_func_model *m, u32 flags,
+					 struct bpf_tramp_links *tlinks,
+					 void *func_addr)
+{
+	int regs_off, nregs_off, ip_off, run_ctx_off, retval_off, nvr_off, alt_lr_off, r4_off;
+	int i, ret, nr_regs, bpf_frame_size = 0, bpf_dummy_frame_size = 0, func_frame_offset;
+	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
+	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
+	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
+	struct codegen_context codegen_ctx, *ctx;
+	u32 *image = (u32 *)rw_image;
+	ppc_inst_t branch_insn;
+	u32 *branches = NULL;
+	bool save_ret;
+
+	if (IS_ENABLED(CONFIG_PPC32))
+		return -EOPNOTSUPP;
+
+	nr_regs = m->nr_args;
+	/* Extra registers for struct arguments */
+	for (i = 0; i < m->nr_args; i++)
+		if (m->arg_size[i] > SZL)
+			nr_regs += round_up(m->arg_size[i], SZL) / SZL - 1;
+
+	if (nr_regs > MAX_BPF_FUNC_ARGS)
+		return -EOPNOTSUPP;
+
+	ctx = &codegen_ctx;
+	memset(ctx, 0, sizeof(*ctx));
+
+	/*
+	 * Generated stack layout:
+	 *
+	 * func prev back chain         [ back chain        ]
+	 *                              [                   ]
+	 * bpf prog redzone/tailcallcnt [ ...               ] 64 bytes (64-bit powerpc)
+	 *                              [                   ] --
+	 * LR save area                 [ r0 save (64-bit)  ]   | header
+	 *                              [ r0 save (32-bit)  ]   |
+	 * dummy frame for unwind       [ back chain 1      ] --
+	 *                              [ padding           ] align stack frame
+	 *       r4_off                 [ r4 (tailcallcnt)  ] optional - 32-bit powerpc
+	 *       alt_lr_off             [ real lr (ool stub)] optional - actual lr
+	 *                              [ r26               ]
+	 *       nvr_off                [ r25               ] nvr save area
+	 *       retval_off             [ return value      ]
+	 *                              [ reg argN          ]
+	 *                              [ ...               ]
+	 *       regs_off               [ reg_arg1          ] prog ctx context
+	 *       nregs_off              [ args count        ]
+	 *       ip_off                 [ traced function   ]
+	 *                              [ ...               ]
+	 *       run_ctx_off            [ bpf_tramp_run_ctx ]
+	 *                              [ reg argN          ]
+	 *                              [ ...               ]
+	 *       param_save_area        [ reg_arg1          ] min 8 doublewords, per ABI
+	 *                              [ TOC save (64-bit) ] --
+	 *                              [ LR save (64-bit)  ]   | header
+	 *                              [ LR save (32-bit)  ]   |
+	 * bpf trampoline frame	        [ back chain 2      ] --
+	 *
+	 */
+
+	/* Minimum stack frame header */
+	bpf_frame_size = STACK_FRAME_MIN_SIZE;
+
+	/*
+	 * Room for parameter save area.
+	 *
+	 * As per the ABI, this is required if we call into the traced
+	 * function (BPF_TRAMP_F_CALL_ORIG):
+	 * - if the function takes more than 8 arguments for the rest to spill onto the stack
+	 * - or, if the function has variadic arguments
+	 * - or, if this function's prototype was not available to the caller
+	 *
+	 * Reserve space for at least 8 registers for now. This can be optimized later.
+	 */
+	bpf_frame_size += (nr_regs > 8 ? nr_regs : 8) * SZL;
+
+	/* Room for struct bpf_tramp_run_ctx */
+	run_ctx_off = bpf_frame_size;
+	bpf_frame_size += round_up(sizeof(struct bpf_tramp_run_ctx), SZL);
+
+	/* Room for IP address argument */
+	ip_off = bpf_frame_size;
+	if (flags & BPF_TRAMP_F_IP_ARG)
+		bpf_frame_size += SZL;
+
+	/* Room for args count */
+	nregs_off = bpf_frame_size;
+	bpf_frame_size += SZL;
+
+	/* Room for args */
+	regs_off = bpf_frame_size;
+	bpf_frame_size += nr_regs * SZL;
+
+	/* Room for return value of func_addr or fentry prog */
+	retval_off = bpf_frame_size;
+	save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
+	if (save_ret)
+		bpf_frame_size += SZL;
+
+	/* Room for nvr save area */
+	nvr_off = bpf_frame_size;
+	bpf_frame_size += 2 * SZL;
+
+	/* Optional save area for actual LR in case of ool ftrace */
+	if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE)) {
+		alt_lr_off = bpf_frame_size;
+		bpf_frame_size += SZL;
+	}
+
+	if (IS_ENABLED(CONFIG_PPC32)) {
+		if (nr_regs < 2) {
+			r4_off = bpf_frame_size;
+			bpf_frame_size += SZL;
+		} else {
+			r4_off = regs_off + SZL;
+		}
+	}
+
+	/* Padding to align stack frame, if any */
+	bpf_frame_size = round_up(bpf_frame_size, SZL * 2);
+
+	/* Dummy frame size for proper unwind - includes 64-byte red zone for 64-bit powerpc */
+	bpf_dummy_frame_size = STACK_FRAME_MIN_SIZE + 64;
+
+	/* Offset to the traced function's stack frame */
+	func_frame_offset = bpf_dummy_frame_size + bpf_frame_size;
+
+	/* Create dummy frame for unwind, store original return address */
+	EMIT(PPC_RAW_STL(_R0, _R1, PPC_LR_STKOFF));
+	/* Protect red zone where tail call count goes */
+	EMIT(PPC_RAW_STLU(_R1, _R1, -bpf_dummy_frame_size));
+
+	/* Create our stack frame */
+	EMIT(PPC_RAW_STLU(_R1, _R1, -bpf_frame_size));
+
+	/* 64-bit: Save TOC and load kernel TOC */
+	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
+		EMIT(PPC_RAW_STD(_R2, _R1, 24));
+		PPC64_LOAD_PACA();
+	}
+
+	/* 32-bit: save tail call count in r4 */
+	if (IS_ENABLED(CONFIG_PPC32) && nr_regs < 2)
+		EMIT(PPC_RAW_STL(_R4, _R1, r4_off));
+
+	bpf_trampoline_save_args(image, ctx, func_frame_offset, nr_regs, regs_off);
+
+	/* Save our return address */
+	EMIT(PPC_RAW_MFLR(_R3));
+	if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
+		EMIT(PPC_RAW_STL(_R3, _R1, alt_lr_off));
+	else
+		EMIT(PPC_RAW_STL(_R3, _R1, bpf_frame_size + PPC_LR_STKOFF));
+
+	/*
+	 * Save ip address of the traced function.
+	 * We could recover this from LR, but we would need to account for the OOL
+	 * trampoline and the optional GEP area.
+	 */
+	if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) || flags & BPF_TRAMP_F_IP_ARG) {
+		EMIT(PPC_RAW_LWZ(_R4, _R3, 4));
+		EMIT(PPC_RAW_SLWI(_R4, _R4, 6));
+		EMIT(PPC_RAW_SRAWI(_R4, _R4, 6));
+		EMIT(PPC_RAW_ADD(_R3, _R3, _R4));
+		EMIT(PPC_RAW_ADDI(_R3, _R3, 4));
+	}
+
+	if (flags & BPF_TRAMP_F_IP_ARG)
+		EMIT(PPC_RAW_STL(_R3, _R1, ip_off));
+
+	if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
+		/* Fake our LR for unwind */
+		EMIT(PPC_RAW_STL(_R3, _R1, bpf_frame_size + PPC_LR_STKOFF));
+
+	/* Save function arg count -- see bpf_get_func_arg_cnt() */
+	EMIT(PPC_RAW_LI(_R3, nr_regs));
+	EMIT(PPC_RAW_STL(_R3, _R1, nregs_off));
+
+	/* Save nv regs */
+	EMIT(PPC_RAW_STL(_R25, _R1, nvr_off));
+	EMIT(PPC_RAW_STL(_R26, _R1, nvr_off + SZL));
+
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		PPC_LI_ADDR(_R3, (unsigned long)im);
+		ret = bpf_jit_emit_func_call_rel(image, ro_image, ctx,
+						 (unsigned long)__bpf_tramp_enter);
+		if (ret)
+			return ret;
+	}
+
+	for (i = 0; i < fentry->nr_links; i++)
+		if (invoke_bpf_prog(image, ro_image, ctx, fentry->links[i], regs_off, retval_off,
+				    run_ctx_off, flags & BPF_TRAMP_F_RET_FENTRY_RET))
+			return -EINVAL;
+
+	if (fmod_ret->nr_links) {
+		branches = kcalloc(fmod_ret->nr_links, sizeof(u32), GFP_KERNEL);
+		if (!branches)
+			return -ENOMEM;
+
+		if (invoke_bpf_mod_ret(image, ro_image, ctx, fmod_ret, regs_off, retval_off,
+				       run_ctx_off, branches)) {
+			ret = -EINVAL;
+			goto cleanup;
+		}
+	}
+
+	/* Call the traced function */
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		/*
+		 * The address in LR save area points to the correct point in the original function
+		 * with both PPC_FTRACE_OUT_OF_LINE as well as with traditional ftrace instruction
+		 * sequence
+		 */
+		EMIT(PPC_RAW_LL(_R3, _R1, bpf_frame_size + PPC_LR_STKOFF));
+		EMIT(PPC_RAW_MTCTR(_R3));
+
+		/* Replicate tail_call_cnt before calling the original BPF prog */
+		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+			bpf_trampoline_setup_tail_call_cnt(image, ctx, func_frame_offset, r4_off);
+
+		/* Restore args */
+		bpf_trampoline_restore_args_stack(image, ctx, func_frame_offset, nr_regs, regs_off);
+
+		/* Restore TOC for 64-bit */
+		if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+			EMIT(PPC_RAW_LD(_R2, _R1, 24));
+		EMIT(PPC_RAW_BCTRL());
+		if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+			PPC64_LOAD_PACA();
+
+		/* Store return value for bpf prog to access */
+		EMIT(PPC_RAW_STL(_R3, _R1, retval_off));
+
+		/* Restore updated tail_call_cnt */
+		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+			bpf_trampoline_restore_tail_call_cnt(image, ctx, func_frame_offset, r4_off);
+
+		/* Reserve space to patch branch instruction to skip fexit progs */
+		im->ip_after_call = &((u32 *)ro_image)[ctx->idx];
+		EMIT(PPC_RAW_NOP());
+	}
+
+	/* Update branches saved in invoke_bpf_mod_ret with address of do_fexit */
+	for (i = 0; i < fmod_ret->nr_links && image; i++) {
+		if (create_cond_branch(&branch_insn, &image[branches[i]],
+				       (unsigned long)&image[ctx->idx], COND_NE << 16)) {
+			ret = -EINVAL;
+			goto cleanup;
+		}
+
+		image[branches[i]] = ppc_inst_val(branch_insn);
+	}
+
+	for (i = 0; i < fexit->nr_links; i++)
+		if (invoke_bpf_prog(image, ro_image, ctx, fexit->links[i], regs_off, retval_off,
+				    run_ctx_off, false)) {
+			ret = -EINVAL;
+			goto cleanup;
+		}
+
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		im->ip_epilogue = &((u32 *)ro_image)[ctx->idx];
+		PPC_LI_ADDR(_R3, im);
+		ret = bpf_jit_emit_func_call_rel(image, ro_image, ctx,
+						 (unsigned long)__bpf_tramp_exit);
+		if (ret)
+			goto cleanup;
+	}
+
+	if (flags & BPF_TRAMP_F_RESTORE_REGS)
+		bpf_trampoline_restore_args_regs(image, ctx, nr_regs, regs_off);
+
+	/* Restore return value of func_addr or fentry prog */
+	if (save_ret)
+		EMIT(PPC_RAW_LL(_R3, _R1, retval_off));
+
+	/* Restore nv regs */
+	EMIT(PPC_RAW_LL(_R26, _R1, nvr_off + SZL));
+	EMIT(PPC_RAW_LL(_R25, _R1, nvr_off));
+
+	/* Epilogue */
+	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+		EMIT(PPC_RAW_LD(_R2, _R1, 24));
+	if (flags & BPF_TRAMP_F_SKIP_FRAME) {
+		/* Skip the traced function and return to parent */
+		EMIT(PPC_RAW_ADDI(_R1, _R1, func_frame_offset));
+		EMIT(PPC_RAW_LL(_R0, _R1, PPC_LR_STKOFF));
+		EMIT(PPC_RAW_MTLR(_R0));
+		EMIT(PPC_RAW_BLR());
+	} else {
+		if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE)) {
+			EMIT(PPC_RAW_LL(_R0, _R1, alt_lr_off));
+			EMIT(PPC_RAW_MTLR(_R0));
+			EMIT(PPC_RAW_ADDI(_R1, _R1, func_frame_offset));
+			EMIT(PPC_RAW_LL(_R0, _R1, PPC_LR_STKOFF));
+			EMIT(PPC_RAW_BLR());
+		} else {
+			EMIT(PPC_RAW_LL(_R0, _R1, bpf_frame_size + PPC_LR_STKOFF));
+			EMIT(PPC_RAW_MTCTR(_R0));
+			EMIT(PPC_RAW_ADDI(_R1, _R1, func_frame_offset));
+			EMIT(PPC_RAW_LL(_R0, _R1, PPC_LR_STKOFF));
+			EMIT(PPC_RAW_MTLR(_R0));
+			EMIT(PPC_RAW_BCTR());
+		}
+	}
+
+	/* Make sure the trampoline generation logic doesn't overflow */
+	if (image && WARN_ON_ONCE(&image[ctx->idx] > (u32 *)rw_image_end - BPF_INSN_SAFETY)) {
+		ret = -EFAULT;
+		goto cleanup;
+	}
+	ret = ctx->idx * 4 + BPF_INSN_SAFETY * 4;
+
+cleanup:
+	kfree(branches);
+	return ret;
+}
+
+int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
+			     struct bpf_tramp_links *tlinks, void *func_addr)
+{
+	struct bpf_tramp_image im;
+	void *image;
+	int ret;
+
+	/*
+	 * Allocate a temporary buffer for __arch_prepare_bpf_trampoline().
+	 * This will NOT cause fragmentation in direct map, as we do not
+	 * call set_memory_*() on this buffer.
+	 *
+	 * We cannot use kvmalloc here, because we need image to be in
+	 * module memory range.
+	 */
+	image = bpf_jit_alloc_exec(PAGE_SIZE);
+	if (!image)
+		return -ENOMEM;
+
+	ret = __arch_prepare_bpf_trampoline(&im, image, image + PAGE_SIZE, image,
+					    m, flags, tlinks, func_addr);
+	bpf_jit_free_exec(image);
+
+	return ret;
+}
+
+int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end,
+				const struct btf_func_model *m, u32 flags,
+				struct bpf_tramp_links *tlinks,
+				void *func_addr)
+{
+	u32 size = image_end - image;
+	void *rw_image, *tmp;
+	int ret;
+
+	/*
+	 * rw_image doesn't need to be in module memory range, so we can
+	 * use kvmalloc.
+	 */
+	rw_image = kvmalloc(size, GFP_KERNEL);
+	if (!rw_image)
+		return -ENOMEM;
+
+	ret = __arch_prepare_bpf_trampoline(im, rw_image, rw_image + size, image, m,
+					    flags, tlinks, func_addr);
+	if (ret < 0)
+		goto out;
+
+	if (bpf_jit_enable > 1)
+		bpf_jit_dump(1, ret - BPF_INSN_SAFETY * 4, 1, rw_image);
+
+	tmp = bpf_arch_text_copy(image, rw_image, size);
+	if (IS_ERR(tmp))
+		ret = PTR_ERR(tmp);
+
+out:
+	kvfree(rw_image);
+	return ret;
+}
+
+static int bpf_modify_inst(void *ip, ppc_inst_t old_inst, ppc_inst_t new_inst)
+{
+	ppc_inst_t org_inst;
+
+	if (copy_inst_from_kernel_nofault(&org_inst, ip)) {
+		pr_err("0x%lx: fetching instruction failed\n", (unsigned long)ip);
+		return -EFAULT;
+	}
+
+	if (!ppc_inst_equal(org_inst, old_inst)) {
+		pr_err("0x%lx: expected (%08lx) != found (%08lx)\n",
+		       (unsigned long)ip, ppc_inst_as_ulong(old_inst), ppc_inst_as_ulong(org_inst));
+		return -EINVAL;
+	}
+
+	if (ppc_inst_equal(old_inst, new_inst))
+		return 0;
+
+	return patch_instruction(ip, new_inst);
+}
+
+static void do_isync(void *info __maybe_unused)
+{
+	isync();
+}
+
+/*
+ * A 3-step process for bpf prog entry:
+ * 1. At bpf prog entry, a single nop/b:
+ * bpf_func:
+ *	[nop|b]	ool_stub
+ * 2. Out-of-line stub:
+ * ool_stub:
+ *	mflr	r0
+ *	[b|bl]	<bpf_prog>/<long_branch_stub>
+ *	mtlr	r0 // CONFIG_PPC_FTRACE_OUT_OF_LINE only
+ *	b	bpf_func + 4
+ * 3. Long branch stub:
+ * long_branch_stub:
+ *	.long	<branch_addr>/<dummy_tramp>
+ *	mflr	r11
+ *	bcl	20,31,$+4
+ *	mflr	r12
+ *	ld	r12, -16(r12)
+ *	mtctr	r12
+ *	mtlr	r11 // needed to retain ftrace ABI
+ *	bctr
+ *
+ * dummy_tramp is used to reduce synchronization requirements.
+ *
+ * When attaching a bpf trampoline to a bpf prog, we do not need any
+ * synchronization here since we always have a valid branch target regardless
+ * of the order in which the above stores are seen. dummy_tramp ensures that
+ * the long_branch stub goes to a valid destination on other cpus, even when
+ * the branch to the long_branch stub is seen before the updated trampoline
+ * address.
+ *
+ * However, when detaching a bpf trampoline from a bpf prog, or if changing
+ * the bpf trampoline address, we need synchronization to ensure that other
+ * cpus can no longer branch into the older trampoline so that it can be
+ * safely freed. bpf_tramp_image_put() uses rcu_tasks to ensure all cpus
+ * make forward progress, but we still need to ensure that other cpus
+ * execute isync (or some CSI) so that they don't go back into the
+ * trampoline again.
+ */
+int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
+		       void *old_addr, void *new_addr)
+{
+	unsigned long bpf_func, bpf_func_end, size, offset;
+	ppc_inst_t old_inst, new_inst;
+	int ret = 0, branch_flags;
+	char name[KSYM_NAME_LEN];
+
+	if (IS_ENABLED(CONFIG_PPC32))
+		return -EOPNOTSUPP;
+
+	bpf_func = (unsigned long)ip;
+	branch_flags = poke_type == BPF_MOD_CALL ? BRANCH_SET_LINK : 0;
+
+	/* We currently only support poking bpf programs */
+	if (!__bpf_address_lookup(bpf_func, &size, &offset, name)) {
+		pr_err("%s (0x%lx): kernel/modules are not supported\n", __func__, bpf_func);
+		return -EOPNOTSUPP;
+	}
+
+	/*
+	 * If we are not poking at bpf prog entry, then we are simply patching in/out
+	 * an unconditional branch instruction at im->ip_after_call
+	 */
+	if (offset) {
+		if (poke_type != BPF_MOD_JUMP) {
+			pr_err("%s (0x%lx): calls are not supported in bpf prog body\n", __func__,
+			       bpf_func);
+			return -EOPNOTSUPP;
+		}
+		old_inst = ppc_inst(PPC_RAW_NOP());
+		if (old_addr)
+			if (create_branch(&old_inst, ip, (unsigned long)old_addr, 0))
+				return -ERANGE;
+		new_inst = ppc_inst(PPC_RAW_NOP());
+		if (new_addr)
+			if (create_branch(&new_inst, ip, (unsigned long)new_addr, 0))
+				return -ERANGE;
+		mutex_lock(&text_mutex);
+		ret = bpf_modify_inst(ip, old_inst, new_inst);
+		mutex_unlock(&text_mutex);
+
+		/* Make sure all cpus see the new instruction */
+		smp_call_function(do_isync, NULL, 1);
+		return ret;
+	}
+
+	bpf_func_end = bpf_func + size;
+
+	/* Address of the jmp/call instruction in the out-of-line stub */
+	ip = (void *)(bpf_func_end - bpf_jit_ool_stub + 4);
+
+	if (!is_offset_in_branch_range((long)ip - 4 - bpf_func)) {
+		pr_err("%s (0x%lx): bpf prog too large, ool stub out of branch range\n", __func__,
+		       bpf_func);
+		return -ERANGE;
+	}
+
+	old_inst = ppc_inst(PPC_RAW_NOP());
+	if (old_addr) {
+		if (is_offset_in_branch_range(ip - old_addr))
+			create_branch(&old_inst, ip, (unsigned long)old_addr, branch_flags);
+		else
+			create_branch(&old_inst, ip, bpf_func_end - bpf_jit_long_branch_stub,
+				      branch_flags);
+	}
+	new_inst = ppc_inst(PPC_RAW_NOP());
+	if (new_addr) {
+		if (is_offset_in_branch_range(ip - new_addr))
+			create_branch(&new_inst, ip, (unsigned long)new_addr, branch_flags);
+		else
+			create_branch(&new_inst, ip, bpf_func_end - bpf_jit_long_branch_stub,
+				      branch_flags);
+	}
+
+	mutex_lock(&text_mutex);
+
+	/*
+	 * 1. Update the address in the long branch stub:
+	 * If new_addr is out of range, we will have to use the long branch stub, so patch new_addr
+	 * here. Otherwise, revert to dummy_tramp, but only if we had patched old_addr here.
+	 */
+	if ((new_addr && !is_offset_in_branch_range(new_addr - ip)) ||
+	    (old_addr && !is_offset_in_branch_range(old_addr - ip)))
+		ret = patch_ulong((void *)(bpf_func_end - bpf_jit_long_branch_stub - SZL),
+				  (new_addr && !is_offset_in_branch_range(new_addr - ip)) ?
+				  (unsigned long)new_addr : (unsigned long)dummy_tramp);
+	if (ret)
+		goto out;
+
+	/* 2. Update the branch/call in the out-of-line stub */
+	ret = bpf_modify_inst(ip, old_inst, new_inst);
+	if (ret)
+		goto out;
+
+	/* 3. Update instruction at bpf prog entry */
+	ip = (void *)bpf_func;
+	if (!old_addr || !new_addr) {
+		if (!old_addr) {
+			old_inst = ppc_inst(PPC_RAW_NOP());
+			create_branch(&new_inst, ip, bpf_func_end - bpf_jit_ool_stub, 0);
+		} else {
+			new_inst = ppc_inst(PPC_RAW_NOP());
+			create_branch(&old_inst, ip, bpf_func_end - bpf_jit_ool_stub, 0);
+		}
+		ret = bpf_modify_inst(ip, old_inst, new_inst);
+	}
+
+out:
+	mutex_unlock(&text_mutex);
+
+	/*
+	 * Sync only if we are not attaching a trampoline to a bpf prog so the older
+	 * trampoline can be freed safely.
+	 */
+	if (old_addr)
+		smp_call_function(do_isync, NULL, 1);
+
+	return ret;
+}
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index a0c4f1bde83e..c4db278dae36 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -127,13 +127,16 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 {
 	int i;
 
+	/* Instruction for trampoline attach */
+	EMIT(PPC_RAW_NOP());
+
 	/* Initialize tail_call_cnt, to be skipped if we do tail calls. */
 	if (ctx->seen & SEEN_TAILCALL)
 		EMIT(PPC_RAW_LI(_R4, 0));
 	else
 		EMIT(PPC_RAW_NOP());
 
-#define BPF_TAILCALL_PROLOGUE_SIZE	4
+#define BPF_TAILCALL_PROLOGUE_SIZE	8
 
 	if (bpf_has_stack_frame(ctx))
 		EMIT(PPC_RAW_STWU(_R1, _R1, -BPF_PPC_STACKFRAME(ctx)));
@@ -198,6 +201,8 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
 	bpf_jit_emit_common_epilogue(image, ctx);
 
 	EMIT(PPC_RAW_BLR());
+
+	bpf_jit_build_fentry_stubs(image, ctx);
 }
 
 /* Relative offset needs to be calculated based on final image location */
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index f3be024fc685..dcf339788e58 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -126,6 +126,9 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 {
 	int i;
 
+	/* Instruction for trampoline attach */
+	EMIT(PPC_RAW_NOP());
+
 #ifndef CONFIG_PPC_KERNEL_PCREL
 	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
 		EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, kernel_toc)));
@@ -200,6 +203,8 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
 	EMIT(PPC_RAW_MR(_R3, bpf_to_ppc(BPF_REG_0)));
 
 	EMIT(PPC_RAW_BLR());
+
+	bpf_jit_build_fentry_stubs(image, ctx);
 }
 
 int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *ctx, u64 func)
@@ -303,7 +308,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
 	 */
 	int b2p_bpf_array = bpf_to_ppc(BPF_REG_2);
 	int b2p_index = bpf_to_ppc(BPF_REG_3);
-	int bpf_tailcall_prologue_size = 8;
+	int bpf_tailcall_prologue_size = 12;
 
 	if (!IS_ENABLED(CONFIG_PPC_KERNEL_PCREL) && IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
 		bpf_tailcall_prologue_size += 4; /* skip past the toc load */
-- 
2.46.0
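
A note on the "Save ip address of the traced function" sequence in
__arch_prepare_bpf_trampoline(): the emitted lwz/slwi/srawi/add/addi
instructions read the branch instruction at LR + 4, sign-extend its
displacement and add it to the address of that branch. The sketch below
models the same arithmetic in plain C. It is only an illustration under
stated assumptions: the helper names are hypothetical (not kernel APIs),
the range check is a stand-in written from the I-form branch encoding,
and the decode masks off the AA/LK bits, which the shift pair in the
patch leaves in place (they are zero for a plain 'b').

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a branch range check: an I-form branch
 * carries a signed, word-aligned 26-bit displacement.
 */
static inline bool offset_in_branch_range(long offset)
{
	return offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3);
}

/*
 * Encode 'b offset' the way PPC_RAW_BRANCH() does in the patch:
 * primary opcode 18 (0x48000000) plus the masked displacement.
 */
static inline uint32_t encode_branch(long offset)
{
	assert(offset_in_branch_range(offset));
	return 0x48000000u | ((uint32_t)offset & 0x03fffffcu);
}

/*
 * Recover the signed displacement from a branch instruction word.
 * This is the portable equivalent of 'slwi r4,r4,6; srawi r4,r4,6'
 * for an instruction whose AA and LK bits are clear.
 */
static inline int32_t decode_branch_offset(uint32_t insn)
{
	int32_t offset = (int32_t)(insn & 0x03fffffcu);

	if (offset & 0x02000000)	/* sign bit of the 26-bit field */
		offset -= 0x04000000;
	return offset;
}

/*
 * Model of the emitted sequence: the branch lives at lr + 4, so its
 * target is (lr + 4) + displacement.
 */
static inline uint64_t branch_target_after_lr(uint64_t lr, uint32_t insn_at_lr_plus_4)
{
	return lr + 4 + (int64_t)decode_branch_offset(insn_at_lr_plus_4);
}
```

For example, with lr = 0x1000 and the word at 0x1004 encoding a branch
of -0x100, branch_target_after_lr() yields 0xf04. The same range check
is what decides, in bpf_arch_text_poke(), whether the ool stub can
branch to the trampoline directly or has to go through the long branch
stub.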



* Re: [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines
  2024-09-15 20:56 ` [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines Hari Bathini
@ 2024-09-16 21:41   ` kernel test robot
  2024-09-17  7:50   ` Alexei Starovoitov
  1 sibling, 0 replies; 36+ messages in thread
From: kernel test robot @ 2024-09-16 21:41 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild,
	linux-kernel
  Cc: llvm, oe-kbuild-all, Naveen N. Rao, Mark Rutland, Daniel Borkmann,
	Masahiro Yamada, Nicholas Piggin, Alexei Starovoitov,
	Steven Rostedt, Andrii Nakryiko, Christophe Leroy,
	Vishal Chourasia, Mahesh J Salgaonkar, Masami Hiramatsu

Hi Hari,

kernel test robot noticed the following build warnings:

[auto build test WARNING on powerpc/next]
[also build test WARNING on powerpc/fixes masahiroy-kbuild/for-next masahiroy-kbuild/fixes linus/master v6.11 next-20240916]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Hari-Bathini/powerpc-trace-Account-for-fpatchable-function-entry-support-by-toolchain/20240916-050056
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
patch link:    https://lore.kernel.org/r/20240915205648.830121-18-hbathini%40linux.ibm.com
patch subject: [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines
config: powerpc-allyesconfig (https://download.01.org/0day-ci/archive/20240917/202409170544.6d1odaN2-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project bf684034844c660b778f0eba103582f582b710c9)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240917/202409170544.6d1odaN2-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409170544.6d1odaN2-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from arch/powerpc/net/bpf_jit_comp.c:11:
   In file included from arch/powerpc/include/asm/cacheflush.h:7:
   In file included from include/linux/mm.h:2228:
   include/linux/vmstat.h:500:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     500 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     501 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:507:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     507 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     508 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:514:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     514 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:519:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     519 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     520 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:528:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     528 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     529 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
>> arch/powerpc/net/bpf_jit_comp.c:872:70: warning: variable 'r4_off' is uninitialized when used here [-Wuninitialized]
     872 |                         bpf_trampoline_setup_tail_call_cnt(image, ctx, func_frame_offset, r4_off);
         |                                                                                           ^~~~~~
   arch/powerpc/net/bpf_jit_comp.c:654:87: note: initialize the variable 'r4_off' to silence this warning
     654 |         int regs_off, nregs_off, ip_off, run_ctx_off, retval_off, nvr_off, alt_lr_off, r4_off;
         |                                                                                              ^
         |                                                                                               = 0
   6 warnings generated.
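
The r4_off warning above has a common shape: the variable is assigned
only inside an IS_ENABLED(CONFIG_PPC32) block, yet it is passed to a
helper on a path that does not depend on that config. Below is a minimal
standalone sketch of that shape with one possible fix applied
(initializing at declaration, as the clang note suggests). The function
name compute_r4_off and the is_ppc32 parameter are hypothetical stand-ins
for illustration and are not kernel code; SZL is assumed to be 8 here.

```c
#include <stdbool.h>

#define SZL 8	/* assumed word size for a 64-bit build */

/*
 * Hypothetical helper mirroring the flagged pattern. is_ppc32 stands in
 * for IS_ENABLED(CONFIG_PPC32). Initializing r4_off at declaration gives
 * it a defined value on the path where the 32-bit block is skipped.
 */
static int compute_r4_off(bool is_ppc32, int bpf_frame_size,
			  int regs_off, int nr_regs)
{
	int r4_off = 0;

	if (is_ppc32) {
		if (nr_regs < 2)
			r4_off = bpf_frame_size;	/* dedicated save slot */
		else
			r4_off = regs_off + SZL;	/* reuse the second arg slot */
	}

	return r4_off;
}
```

Whether initializing to 0 or restructuring the callers is the better fix
for the actual patch is a call for the author; the sketch only shows why
the compiler complains.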


vim +/r4_off +872 arch/powerpc/net/bpf_jit_comp.c

   647	
   648	static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_image,
   649						 void *rw_image_end, void *ro_image,
   650						 const struct btf_func_model *m, u32 flags,
   651						 struct bpf_tramp_links *tlinks,
   652						 void *func_addr)
   653	{
   654		int regs_off, nregs_off, ip_off, run_ctx_off, retval_off, nvr_off, alt_lr_off, r4_off;
   655		int i, ret, nr_regs, bpf_frame_size = 0, bpf_dummy_frame_size = 0, func_frame_offset;
   656		struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
   657		struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
   658		struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
   659		struct codegen_context codegen_ctx, *ctx;
   660		u32 *image = (u32 *)rw_image;
   661		ppc_inst_t branch_insn;
   662		u32 *branches = NULL;
   663		bool save_ret;
   664	
   665		if (IS_ENABLED(CONFIG_PPC32))
   666			return -EOPNOTSUPP;
   667	
   668		nr_regs = m->nr_args;
   669		/* Extra registers for struct arguments */
   670		for (i = 0; i < m->nr_args; i++)
   671			if (m->arg_size[i] > SZL)
   672				nr_regs += round_up(m->arg_size[i], SZL) / SZL - 1;
   673	
   674		if (nr_regs > MAX_BPF_FUNC_ARGS)
   675			return -EOPNOTSUPP;
   676	
   677		ctx = &codegen_ctx;
   678		memset(ctx, 0, sizeof(*ctx));
   679	
   680		/*
   681		 * Generated stack layout:
   682		 *
   683		 * func prev back chain         [ back chain        ]
   684		 *                              [                   ]
   685		 * bpf prog redzone/tailcallcnt [ ...               ] 64 bytes (64-bit powerpc)
   686		 *                              [                   ] --
   687		 * LR save area                 [ r0 save (64-bit)  ]   | header
   688		 *                              [ r0 save (32-bit)  ]   |
   689		 * dummy frame for unwind       [ back chain 1      ] --
   690		 *                              [ padding           ] align stack frame
   691		 *       r4_off                 [ r4 (tailcallcnt)  ] optional - 32-bit powerpc
   692		 *       alt_lr_off             [ real lr (ool stub)] optional - actual lr
   693		 *                              [ r26               ]
   694		 *       nvr_off                [ r25               ] nvr save area
   695		 *       retval_off             [ return value      ]
   696		 *                              [ reg argN          ]
   697		 *                              [ ...               ]
   698		 *       regs_off               [ reg_arg1          ] prog ctx context
   699		 *       nregs_off              [ args count        ]
   700		 *       ip_off                 [ traced function   ]
   701		 *                              [ ...               ]
   702		 *       run_ctx_off            [ bpf_tramp_run_ctx ]
   703		 *                              [ reg argN          ]
   704		 *                              [ ...               ]
   705		 *       param_save_area        [ reg_arg1          ] min 8 doublewords, per ABI
   706		 *                              [ TOC save (64-bit) ] --
   707		 *                              [ LR save (64-bit)  ]   | header
   708		 *                              [ LR save (32-bit)  ]   |
   709		 * bpf trampoline frame	        [ back chain 2      ] --
   710		 *
   711		 */
   712	
   713		/* Minimum stack frame header */
   714		bpf_frame_size = STACK_FRAME_MIN_SIZE;
   715	
   716		/*
   717		 * Room for parameter save area.
   718		 *
   719		 * As per the ABI, this is required if we call into the traced
   720		 * function (BPF_TRAMP_F_CALL_ORIG):
   721		 * - if the function takes more than 8 arguments for the rest to spill onto the stack
   722		 * - or, if the function has variadic arguments
   723		 * - or, if this function's prototype was not available to the caller
   724		 *
   725		 * Reserve space for at least 8 registers for now. This can be optimized later.
   726		 */
   727		bpf_frame_size += (nr_regs > 8 ? nr_regs : 8) * SZL;
   728	
   729		/* Room for struct bpf_tramp_run_ctx */
   730		run_ctx_off = bpf_frame_size;
   731		bpf_frame_size += round_up(sizeof(struct bpf_tramp_run_ctx), SZL);
   732	
   733		/* Room for IP address argument */
   734		ip_off = bpf_frame_size;
   735		if (flags & BPF_TRAMP_F_IP_ARG)
   736			bpf_frame_size += SZL;
   737	
   738		/* Room for args count */
   739		nregs_off = bpf_frame_size;
   740		bpf_frame_size += SZL;
   741	
   742		/* Room for args */
   743		regs_off = bpf_frame_size;
   744		bpf_frame_size += nr_regs * SZL;
   745	
   746		/* Room for return value of func_addr or fentry prog */
   747		retval_off = bpf_frame_size;
   748		save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
   749		if (save_ret)
   750			bpf_frame_size += SZL;
   751	
   752		/* Room for nvr save area */
   753		nvr_off = bpf_frame_size;
   754		bpf_frame_size += 2 * SZL;
   755	
   756		/* Optional save area for actual LR in case of ool ftrace */
   757		if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE)) {
   758			alt_lr_off = bpf_frame_size;
   759			bpf_frame_size += SZL;
   760		}
   761	
   762		if (IS_ENABLED(CONFIG_PPC32)) {
   763			if (nr_regs < 2) {
   764				r4_off = bpf_frame_size;
   765				bpf_frame_size += SZL;
   766			} else {
   767				r4_off = regs_off + SZL;
   768			}
   769		}
   770	
   771		/* Padding to align stack frame, if any */
   772		bpf_frame_size = round_up(bpf_frame_size, SZL * 2);
   773	
   774		/* Dummy frame size for proper unwind - includes 64-bytes red zone for 64-bit powerpc */
   775		bpf_dummy_frame_size = STACK_FRAME_MIN_SIZE + 64;
   776	
   777		/* Offset to the traced function's stack frame */
   778		func_frame_offset = bpf_dummy_frame_size + bpf_frame_size;
   779	
   780		/* Create dummy frame for unwind, store original return value */
   781		EMIT(PPC_RAW_STL(_R0, _R1, PPC_LR_STKOFF));
   782		/* Protect red zone where tail call count goes */
   783		EMIT(PPC_RAW_STLU(_R1, _R1, -bpf_dummy_frame_size));
   784	
   785		/* Create our stack frame */
   786		EMIT(PPC_RAW_STLU(_R1, _R1, -bpf_frame_size));
   787	
   788		/* 64-bit: Save TOC and load kernel TOC */
   789		if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
   790			EMIT(PPC_RAW_STD(_R2, _R1, 24));
   791			PPC64_LOAD_PACA();
   792		}
   793	
   794		/* 32-bit: save tail call count in r4 */
   795		if (IS_ENABLED(CONFIG_PPC32) && nr_regs < 2)
   796			EMIT(PPC_RAW_STL(_R4, _R1, r4_off));
   797	
   798		bpf_trampoline_save_args(image, ctx, func_frame_offset, nr_regs, regs_off);
   799	
   800		/* Save our return address */
   801		EMIT(PPC_RAW_MFLR(_R3));
   802		if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
   803			EMIT(PPC_RAW_STL(_R3, _R1, alt_lr_off));
   804		else
   805			EMIT(PPC_RAW_STL(_R3, _R1, bpf_frame_size + PPC_LR_STKOFF));
   806	
   807		/*
   808		 * Save ip address of the traced function.
   809		 * We could recover this from LR, but we would need to account for the OOL trampoline
   810		 * and the optional GEP area.
   811		 */
   812		if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) || flags & BPF_TRAMP_F_IP_ARG) {
   813			EMIT(PPC_RAW_LWZ(_R4, _R3, 4));
   814			EMIT(PPC_RAW_SLWI(_R4, _R4, 6));
   815			EMIT(PPC_RAW_SRAWI(_R4, _R4, 6));
   816			EMIT(PPC_RAW_ADD(_R3, _R3, _R4));
   817			EMIT(PPC_RAW_ADDI(_R3, _R3, 4));
   818		}
   819	
   820		if (flags & BPF_TRAMP_F_IP_ARG)
   821			EMIT(PPC_RAW_STL(_R3, _R1, ip_off));
   822	
   823		if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
   824			/* Fake our LR for unwind */
   825			EMIT(PPC_RAW_STL(_R3, _R1, bpf_frame_size + PPC_LR_STKOFF));
   826	
   827		/* Save function arg count -- see bpf_get_func_arg_cnt() */
   828		EMIT(PPC_RAW_LI(_R3, nr_regs));
   829		EMIT(PPC_RAW_STL(_R3, _R1, nregs_off));
   830	
   831		/* Save nv regs */
   832		EMIT(PPC_RAW_STL(_R25, _R1, nvr_off));
   833		EMIT(PPC_RAW_STL(_R26, _R1, nvr_off + SZL));
   834	
   835		if (flags & BPF_TRAMP_F_CALL_ORIG) {
   836			PPC_LI_ADDR(_R3, (unsigned long)im);
   837			ret = bpf_jit_emit_func_call_rel(image, ro_image, ctx,
   838							 (unsigned long)__bpf_tramp_enter);
   839			if (ret)
   840				return ret;
   841		}
   842	
   843		for (i = 0; i < fentry->nr_links; i++)
   844			if (invoke_bpf_prog(image, ro_image, ctx, fentry->links[i], regs_off, retval_off,
   845					    run_ctx_off, flags & BPF_TRAMP_F_RET_FENTRY_RET))
   846				return -EINVAL;
   847	
   848		if (fmod_ret->nr_links) {
   849			branches = kcalloc(fmod_ret->nr_links, sizeof(u32), GFP_KERNEL);
   850			if (!branches)
   851				return -ENOMEM;
   852	
   853			if (invoke_bpf_mod_ret(image, ro_image, ctx, fmod_ret, regs_off, retval_off,
   854					       run_ctx_off, branches)) {
   855				ret = -EINVAL;
   856				goto cleanup;
   857			}
   858		}
   859	
   860		/* Call the traced function */
   861		if (flags & BPF_TRAMP_F_CALL_ORIG) {
   862			/*
   863			 * The address in LR save area points to the correct point in the original function
   864			 * with both PPC_FTRACE_OUT_OF_LINE as well as with traditional ftrace instruction
   865			 * sequence
   866			 */
   867			EMIT(PPC_RAW_LL(_R3, _R1, bpf_frame_size + PPC_LR_STKOFF));
   868			EMIT(PPC_RAW_MTCTR(_R3));
   869	
   870			/* Replicate tail_call_cnt before calling the original BPF prog */
   871			if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
 > 872				bpf_trampoline_setup_tail_call_cnt(image, ctx, func_frame_offset, r4_off);
   873	
   874			/* Restore args */
   875			bpf_trampoline_restore_args_stack(image, ctx, func_frame_offset, nr_regs, regs_off);
   876	
   877			/* Restore TOC for 64-bit */
   878			if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
   879				EMIT(PPC_RAW_LD(_R2, _R1, 24));
   880			EMIT(PPC_RAW_BCTRL());
   881			if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
   882				PPC64_LOAD_PACA();
   883	
   884			/* Store return value for bpf prog to access */
   885			EMIT(PPC_RAW_STL(_R3, _R1, retval_off));
   886	
   887			/* Restore updated tail_call_cnt */
   888			if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
   889				bpf_trampoline_restore_tail_call_cnt(image, ctx, func_frame_offset, r4_off);
   890	
   891			/* Reserve space to patch branch instruction to skip fexit progs */
   892			im->ip_after_call = &((u32 *)ro_image)[ctx->idx];
   893			EMIT(PPC_RAW_NOP());
   894		}
   895	
   896		/* Update branches saved in invoke_bpf_mod_ret with address of do_fexit */
   897		for (i = 0; i < fmod_ret->nr_links && image; i++) {
   898			if (create_cond_branch(&branch_insn, &image[branches[i]],
   899					       (unsigned long)&image[ctx->idx], COND_NE << 16)) {
   900				ret = -EINVAL;
   901				goto cleanup;
   902			}
   903	
   904			image[branches[i]] = ppc_inst_val(branch_insn);
   905		}
   906	
   907		for (i = 0; i < fexit->nr_links; i++)
   908			if (invoke_bpf_prog(image, ro_image, ctx, fexit->links[i], regs_off, retval_off,
   909					    run_ctx_off, false)) {
   910				ret = -EINVAL;
   911				goto cleanup;
   912			}
   913	
   914		if (flags & BPF_TRAMP_F_CALL_ORIG) {
   915			im->ip_epilogue = &((u32 *)ro_image)[ctx->idx];
   916			PPC_LI_ADDR(_R3, im);
   917			ret = bpf_jit_emit_func_call_rel(image, ro_image, ctx,
   918							 (unsigned long)__bpf_tramp_exit);
   919			if (ret)
   920				goto cleanup;
   921		}
   922	
   923		if (flags & BPF_TRAMP_F_RESTORE_REGS)
   924			bpf_trampoline_restore_args_regs(image, ctx, nr_regs, regs_off);
   925	
   926		/* Restore return value of func_addr or fentry prog */
   927		if (save_ret)
   928			EMIT(PPC_RAW_LL(_R3, _R1, retval_off));
   929	
   930		/* Restore nv regs */
   931		EMIT(PPC_RAW_LL(_R26, _R1, nvr_off + SZL));
   932		EMIT(PPC_RAW_LL(_R25, _R1, nvr_off));
   933	
   934		/* Epilogue */
   935		if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
   936			EMIT(PPC_RAW_LD(_R2, _R1, 24));
   937		if (flags & BPF_TRAMP_F_SKIP_FRAME) {
   938			/* Skip the traced function and return to parent */
   939			EMIT(PPC_RAW_ADDI(_R1, _R1, func_frame_offset));
   940			EMIT(PPC_RAW_LL(_R0, _R1, PPC_LR_STKOFF));
   941			EMIT(PPC_RAW_MTLR(_R0));
   942			EMIT(PPC_RAW_BLR());
   943		} else {
   944			if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE)) {
   945				EMIT(PPC_RAW_LL(_R0, _R1, alt_lr_off));
   946				EMIT(PPC_RAW_MTLR(_R0));
   947				EMIT(PPC_RAW_ADDI(_R1, _R1, func_frame_offset));
   948				EMIT(PPC_RAW_LL(_R0, _R1, PPC_LR_STKOFF));
   949				EMIT(PPC_RAW_BLR());
   950			} else {
   951				EMIT(PPC_RAW_LL(_R0, _R1, bpf_frame_size + PPC_LR_STKOFF));
   952				EMIT(PPC_RAW_MTCTR(_R0));
   953				EMIT(PPC_RAW_ADDI(_R1, _R1, func_frame_offset));
   954				EMIT(PPC_RAW_LL(_R0, _R1, PPC_LR_STKOFF));
   955				EMIT(PPC_RAW_MTLR(_R0));
   956				EMIT(PPC_RAW_BCTR());
   957			}
   958		}
   959	
   960		/* Make sure the trampoline generation logic doesn't overflow */
   961		if (image && WARN_ON_ONCE(&image[ctx->idx] > (u32 *)rw_image_end - BPF_INSN_SAFETY)) {
   962			ret = -EFAULT;
   963			goto cleanup;
   964		}
   965		ret = ctx->idx * 4 + BPF_INSN_SAFETY * 4;
   966	
   967	cleanup:
   968		kfree(branches);
   969		return ret;
   970	}
   971	
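
Lines 813-817 of the listing above load the instruction word at LR + 4,
sign-extend its low 26 bits (slwi by 6, then srawi by 6), and add the
result to LR + 4. As I read it, that recovers the target of a relative
branch located at LR + 4. A rough C sketch of the same arithmetic is
below. The helper names are made up for illustration, and the sketch
assumes the word is a plain relative branch with the AA and LK bits
clear.

```c
#include <stdint.h>

/*
 * Sign-extend the low 26 bits of an instruction word. This matches the
 * effect of "slwi rX, rX, 6" followed by "srawi rX, rX, 6", written
 * without relying on right-shifting a negative value.
 */
static int32_t sext26(uint32_t insn)
{
	return ((int32_t)(insn & 0x03ffffffu) ^ 0x02000000) - 0x02000000;
}

/*
 * Hypothetical helper: given the saved LR and the instruction word found
 * at LR + 4, compute the address that branch jumps to.
 */
static uint64_t branch_target(uint64_t lr, uint32_t insn_at_lr_plus_4)
{
	return lr + 4 + (int64_t)sext26(insn_at_lr_plus_4);
}
```

For example, with LR = 0x1000 and the word 0x4bffff00 (a backward branch
by 0x100 bytes) at 0x1004, the computed target is 0xf04.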

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines
  2024-09-15 20:56 ` [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines Hari Bathini
  2024-09-16 21:41   ` kernel test robot
@ 2024-09-17  7:50   ` Alexei Starovoitov
  2024-09-30  5:33     ` Hari Bathini
  1 sibling, 1 reply; 36+ messages in thread
From: Alexei Starovoitov @ 2024-09-17  7:50 UTC (permalink / raw)
  To: Hari Bathini
  Cc: linuxppc-dev, bpf, linux-trace-kernel, Linux Kbuild mailing list,
	LKML, Naveen N. Rao, Mark Rutland, Daniel Borkmann,
	Masahiro Yamada, Nicholas Piggin, Alexei Starovoitov,
	Steven Rostedt, Andrii Nakryiko, Christophe Leroy,
	Vishal Chourasia, Mahesh J Salgaonkar, Masami Hiramatsu

On Sun, Sep 15, 2024 at 10:58 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>
> +
> +       /*
> +        * Generated stack layout:
> +        *
> +        * func prev back chain         [ back chain        ]
> +        *                              [                   ]
> +        * bpf prog redzone/tailcallcnt [ ...               ] 64 bytes (64-bit powerpc)
> +        *                              [                   ] --
...
> +
> +       /* Dummy frame size for proper unwind - includes 64-bytes red zone for 64-bit powerpc */
> +       bpf_dummy_frame_size = STACK_FRAME_MIN_SIZE + 64;

What is the goal of such a large "red zone" ?
The kernel stack is a limited resource.
Why reserve 64 bytes ?
tail call cnt can probably be optional as well.


* Re: [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines
  2024-09-17  7:50   ` Alexei Starovoitov
@ 2024-09-30  5:33     ` Hari Bathini
  2024-09-30 12:55       ` Alexei Starovoitov
  0 siblings, 1 reply; 36+ messages in thread
From: Hari Bathini @ 2024-09-30  5:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linuxppc-dev, bpf, linux-trace-kernel, Linux Kbuild mailing list,
	LKML, Naveen N. Rao, Mark Rutland, Daniel Borkmann,
	Masahiro Yamada, Nicholas Piggin, Alexei Starovoitov,
	Steven Rostedt, Andrii Nakryiko, Christophe Leroy,
	Vishal Chourasia, Mahesh J Salgaonkar, Masami Hiramatsu



On 17/09/24 1:20 pm, Alexei Starovoitov wrote:
> On Sun, Sep 15, 2024 at 10:58 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>
>> +
>> +       /*
>> +        * Generated stack layout:
>> +        *
>> +        * func prev back chain         [ back chain        ]
>> +        *                              [                   ]
>> +        * bpf prog redzone/tailcallcnt [ ...               ] 64 bytes (64-bit powerpc)
>> +        *                              [                   ] --
> ...
>> +
>> +       /* Dummy frame size for proper unwind - includes 64-bytes red zone for 64-bit powerpc */
>> +       bpf_dummy_frame_size = STACK_FRAME_MIN_SIZE + 64;
> 
> What is the goal of such a large "red zone" ?
> The kernel stack is a limited resource.
> Why reserve 64 bytes ?
> tail call cnt can probably be optional as well.

Hi Alexei, thanks for reviewing.
FWIW, the red zone on ppc64 is 288 bytes. The BPF JIT for ppc64 had
been using 80 bytes of it since tail call support was introduced [1].
That came down to 64 bytes thanks to [2]. The red zone is used to
save NVRs and the tail call count when a stack frame is not set up.
I do agree that we should look at optimizing it further. Do you think
the optimization should go in as part of the PPC64 trampoline
enablement being done here, or should it be taken up as a separate
item?

[1] 
https://lore.kernel.org/all/40b65ab2bb3a48837ab047a70887de3ccd70c56b.1474661927.git.naveen.n.rao@linux.vnet.ibm.com/
[2] https://lore.kernel.org/all/20180503230824.3462-11-daniel@iogearbox.net/

Thanks
Hari


* Re: [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines
  2024-09-30  5:33     ` Hari Bathini
@ 2024-09-30 12:55       ` Alexei Starovoitov
  2024-10-01  7:18         ` Hari Bathini
  0 siblings, 1 reply; 36+ messages in thread
From: Alexei Starovoitov @ 2024-09-30 12:55 UTC (permalink / raw)
  To: Hari Bathini
  Cc: linuxppc-dev, bpf, linux-trace-kernel, Linux Kbuild mailing list,
	LKML, Naveen N. Rao, Mark Rutland, Daniel Borkmann,
	Masahiro Yamada, Nicholas Piggin, Alexei Starovoitov,
	Steven Rostedt, Andrii Nakryiko, Christophe Leroy,
	Vishal Chourasia, Mahesh J Salgaonkar, Masami Hiramatsu

On Sun, Sep 29, 2024 at 10:33 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>
>
>
> On 17/09/24 1:20 pm, Alexei Starovoitov wrote:
> > On Sun, Sep 15, 2024 at 10:58 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
> >>
> >> +
> >> +       /*
> >> +        * Generated stack layout:
> >> +        *
> >> +        * func prev back chain         [ back chain        ]
> >> +        *                              [                   ]
> >> +        * bpf prog redzone/tailcallcnt [ ...               ] 64 bytes (64-bit powerpc)
> >> +        *                              [                   ] --
> > ...
> >> +
> >> +       /* Dummy frame size for proper unwind - includes 64-bytes red zone for 64-bit powerpc */
> >> +       bpf_dummy_frame_size = STACK_FRAME_MIN_SIZE + 64;
> >
> > What is the goal of such a large "red zone" ?
> > The kernel stack is a limited resource.
> > Why reserve 64 bytes ?
> > tail call cnt can probably be optional as well.
>
> Hi Alexei, thanks for reviewing.
> FWIW, the redzone on ppc64 is 288 bytes. BPF JIT for ppc64 was using
> a redzone of 80 bytes since tailcall support was introduced [1].
> It came down to 64 bytes thanks to [2]. The red zone is being used
> to save NVRs and tail call count when a stack is not setup. I do
> agree that we should look at optimizing it further. Do you think
> the optimization should go as part of PPC64 trampoline enablement
> being done here or should that be taken up as a separate item, maybe?

The follow up is fine.
It's just odd to me that we currently have:

[   unused red zone ] 208 bytes protected

I simply don't understand why we need to waste this much stack space.
Why can't it be zero today ?

> [1]
> https://lore.kernel.org/all/40b65ab2bb3a48837ab047a70887de3ccd70c56b.1474661927.git.naveen.n.rao@linux.vnet.ibm.com/
> [2] https://lore.kernel.org/all/20180503230824.3462-11-daniel@iogearbox.net/
>
> Thanks
> Hari


* Re: [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines
  2024-09-30 12:55       ` Alexei Starovoitov
@ 2024-10-01  7:18         ` Hari Bathini
  2024-10-01 14:53           ` Alexei Starovoitov
  0 siblings, 1 reply; 36+ messages in thread
From: Hari Bathini @ 2024-10-01  7:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linuxppc-dev, bpf, linux-trace-kernel, Linux Kbuild mailing list,
	LKML, Naveen N. Rao, Mark Rutland, Daniel Borkmann,
	Masahiro Yamada, Nicholas Piggin, Alexei Starovoitov,
	Steven Rostedt, Andrii Nakryiko, Christophe Leroy,
	Vishal Chourasia, Mahesh J Salgaonkar, Masami Hiramatsu



On 30/09/24 6:25 pm, Alexei Starovoitov wrote:
> On Sun, Sep 29, 2024 at 10:33 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>
>>
>>
>> On 17/09/24 1:20 pm, Alexei Starovoitov wrote:
>>> On Sun, Sep 15, 2024 at 10:58 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>>>
>>>> +
>>>> +       /*
>>>> +        * Generated stack layout:
>>>> +        *
>>>> +        * func prev back chain         [ back chain        ]
>>>> +        *                              [                   ]
>>>> +        * bpf prog redzone/tailcallcnt [ ...               ] 64 bytes (64-bit powerpc)
>>>> +        *                              [                   ] --
>>> ...
>>>> +
>>>> +       /* Dummy frame size for proper unwind - includes 64-bytes red zone for 64-bit powerpc */
>>>> +       bpf_dummy_frame_size = STACK_FRAME_MIN_SIZE + 64;
>>>
>>> What is the goal of such a large "red zone" ?
>>> The kernel stack is a limited resource.
>>> Why reserve 64 bytes ?
>>> tail call cnt can probably be optional as well.
>>
>> Hi Alexei, thanks for reviewing.
>> FWIW, the redzone on ppc64 is 288 bytes. BPF JIT for ppc64 was using
>> a redzone of 80 bytes since tailcall support was introduced [1].
>> It came down to 64 bytes thanks to [2]. The red zone is being used
>> to save NVRs and tail call count when a stack is not setup. I do
>> agree that we should look at optimizing it further. Do you think
>> the optimization should go as part of PPC64 trampoline enablement
>> being done here or should that be taken up as a separate item, maybe?
> 
> The follow up is fine.
> It just odd to me that we currently have:
> 
> [   unused red zone ] 208 bytes protected
> 
> I simply don't understand why we need to waste this much stack space.
> Why can't it be zero today ?
> 

The ABI for ppc64 defines a red zone of 288 bytes below the current
stack pointer that can be used as a scratch area until a new
stack frame is created. So there is no wastage of stack space as
such; the red zone is simply usable before a new stack frame is
created. The comment there is only meant to show how the red zone
is used by the ppc64 BPF JIT. I think the confusion comes from the
mention of "208 bytes" as protected. Since not all of that scratch
area is used, the comment marks the remainder as unused. Essentially,
the 288 bytes below the current stack pointer are protected from
debuggers and interrupt code (red zone). Note that it should say
224 bytes of unused red zone instead of 208 bytes, as red zone usage
in the ppc64 BPF JIT came down from 80 bytes to 64 bytes with [2].
Hope that clears up the misunderstanding.
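
To spell out the arithmetic, here is a small sketch. The constant and
function names are made up for illustration and are not kernel macros;
the numbers are the ones quoted in this thread.

```c
/* Figures quoted in this thread; the names are hypothetical. */
enum {
	PPC64_ABI_RED_ZONE   = 288,	/* bytes below the stack pointer protected by the ABI */
	BPF_JIT_RED_ZONE_OLD = 80,	/* BPF JIT usage when tail calls were introduced [1] */
	BPF_JIT_RED_ZONE_NOW = 64,	/* BPF JIT usage since [2] */
};

/* Bytes of the ABI red zone left untouched for a given JIT usage. */
static int unused_red_zone(int used_bytes)
{
	return PPC64_ABI_RED_ZONE - used_bytes;
}
```

With 80 bytes in use this gives the 208 bytes mentioned in the comment,
and with 64 bytes in use it gives 224 bytes.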

>> [1]
>> https://lore.kernel.org/all/40b65ab2bb3a48837ab047a70887de3ccd70c56b.1474661927.git.naveen.n.rao@linux.vnet.ibm.com/
>> [2] https://lore.kernel.org/all/20180503230824.3462-11-daniel@iogearbox.net/
>>

Thanks
Hari


* Re: [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines
  2024-10-01  7:18         ` Hari Bathini
@ 2024-10-01 14:53           ` Alexei Starovoitov
  2024-10-03  5:33             ` Hari Bathini
  2024-10-10  0:18             ` Michael Ellerman
  0 siblings, 2 replies; 36+ messages in thread
From: Alexei Starovoitov @ 2024-10-01 14:53 UTC (permalink / raw)
  To: Hari Bathini
  Cc: linuxppc-dev, bpf, linux-trace-kernel, Linux Kbuild mailing list,
	LKML, Naveen N. Rao, Mark Rutland, Daniel Borkmann,
	Masahiro Yamada, Nicholas Piggin, Alexei Starovoitov,
	Steven Rostedt, Andrii Nakryiko, Christophe Leroy,
	Vishal Chourasia, Mahesh J Salgaonkar, Masami Hiramatsu

On Tue, Oct 1, 2024 at 12:18 AM Hari Bathini <hbathini@linux.ibm.com> wrote:
>
>
>
> On 30/09/24 6:25 pm, Alexei Starovoitov wrote:
> > On Sun, Sep 29, 2024 at 10:33 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
> >>
> >>
> >>
> >> On 17/09/24 1:20 pm, Alexei Starovoitov wrote:
> >>> On Sun, Sep 15, 2024 at 10:58 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
> >>>>
> >>>> +
> >>>> +       /*
> >>>> +        * Generated stack layout:
> >>>> +        *
> >>>> +        * func prev back chain         [ back chain        ]
> >>>> +        *                              [                   ]
> >>>> +        * bpf prog redzone/tailcallcnt [ ...               ] 64 bytes (64-bit powerpc)
> >>>> +        *                              [                   ] --
> >>> ...
> >>>> +
> >>>> +       /* Dummy frame size for proper unwind - includes 64-bytes red zone for 64-bit powerpc */
> >>>> +       bpf_dummy_frame_size = STACK_FRAME_MIN_SIZE + 64;
> >>>
> >>> What is the goal of such a large "red zone" ?
> >>> The kernel stack is a limited resource.
> >>> Why reserve 64 bytes ?
> >>> tail call cnt can probably be optional as well.
> >>
> >> Hi Alexei, thanks for reviewing.
> >> FWIW, the redzone on ppc64 is 288 bytes. BPF JIT for ppc64 was using
> >> a redzone of 80 bytes since tailcall support was introduced [1].
> >> It came down to 64 bytes thanks to [2]. The red zone is being used
> >> to save NVRs and tail call count when a stack is not setup. I do
> >> agree that we should look at optimizing it further. Do you think
> >> the optimization should go as part of PPC64 trampoline enablement
> >> being done here or should that be taken up as a separate item, maybe?
> >
> > The follow up is fine.
> > It just odd to me that we currently have:
> >
> > [   unused red zone ] 208 bytes protected
> >
> > I simply don't understand why we need to waste this much stack space.
> > Why can't it be zero today ?
> >
>
> The ABI for ppc64 has a redzone of 288 bytes below the current
> stack pointer that can be used as a scratch area until a new
> stack frame is created. So, no wastage of stack space as such.
> It is just red zone that can be used before a new stack frame
> is created. The comment there is only to show how redzone is
> being used in ppc64 BPF JIT. I think the confusion is with the
> mention of "208 bytes" as protected. As not all of that scratch
> area is used, it mentions the remaining as unused. Essentially
> 288 bytes below current stack pointer is protected from debuggers
> and interrupt code (red zone). Note that it should be 224 bytes
> of unused red zone instead of 208 bytes as red zone usage in
> ppc64 BPF JIT come down from 80 bytes to 64 bytes since [2].
> Hope that clears the misunderstanding..

I see. That makes sense. So it's similar to amd64 red zone,
but there we have an issue with irqs, hence the kernel is
compiled with -mno-red-zone.

I guess ppc always has a different interrupt stack and
it's not an issue?


* Re: [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines
  2024-10-01 14:53           ` Alexei Starovoitov
@ 2024-10-03  5:33             ` Hari Bathini
  2024-10-10  0:18             ` Michael Ellerman
  1 sibling, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-10-03  5:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linuxppc-dev, bpf, linux-trace-kernel, Linux Kbuild mailing list,
	LKML, Naveen N. Rao, Mark Rutland, Daniel Borkmann,
	Masahiro Yamada, Nicholas Piggin, Alexei Starovoitov,
	Steven Rostedt, Andrii Nakryiko, Christophe Leroy,
	Vishal Chourasia, Mahesh J Salgaonkar, Masami Hiramatsu



On 01/10/24 8:23 pm, Alexei Starovoitov wrote:
> On Tue, Oct 1, 2024 at 12:18 AM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>
>>
>>
>> On 30/09/24 6:25 pm, Alexei Starovoitov wrote:
>>> On Sun, Sep 29, 2024 at 10:33 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 17/09/24 1:20 pm, Alexei Starovoitov wrote:
>>>>> On Sun, Sep 15, 2024 at 10:58 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>>>>>
>>>>>> +
>>>>>> +       /*
>>>>>> +        * Generated stack layout:
>>>>>> +        *
>>>>>> +        * func prev back chain         [ back chain        ]
>>>>>> +        *                              [                   ]
>>>>>> +        * bpf prog redzone/tailcallcnt [ ...               ] 64 bytes (64-bit powerpc)
>>>>>> +        *                              [                   ] --
>>>>> ...
>>>>>> +
>>>>>> +       /* Dummy frame size for proper unwind - includes 64-bytes red zone for 64-bit powerpc */
>>>>>> +       bpf_dummy_frame_size = STACK_FRAME_MIN_SIZE + 64;
>>>>>
>>>>> What is the goal of such a large "red zone" ?
>>>>> The kernel stack is a limited resource.
>>>>> Why reserve 64 bytes ?
>>>>> tail call cnt can probably be optional as well.
>>>>
>>>> Hi Alexei, thanks for reviewing.
>>>> FWIW, the redzone on ppc64 is 288 bytes. BPF JIT for ppc64 was using
>>>> a redzone of 80 bytes since tailcall support was introduced [1].
>>>> It came down to 64 bytes thanks to [2]. The red zone is being used
>>>> to save NVRs and tail call count when a stack is not set up. I do
>>>> agree that we should look at optimizing it further. Do you think
>>>> the optimization should go as part of PPC64 trampoline enablement
>>>> being done here or should that be taken up as a separate item, maybe?
>>>
>>> The follow up is fine.
>>> It's just odd to me that we currently have:
>>>
>>> [   unused red zone ] 208 bytes protected
>>>
>>> I simply don't understand why we need to waste this much stack space.
>>> Why can't it be zero today ?
>>>
>>
>> The ABI for ppc64 has a redzone of 288 bytes below the current
>> stack pointer that can be used as a scratch area until a new
>> stack frame is created. So, no wastage of stack space as such.
>> It is just red zone that can be used before a new stack frame
>> is created. The comment there is only to show how redzone is
>> being used in ppc64 BPF JIT. I think the confusion is with the
>> mention of "208 bytes" as protected. As not all of that scratch
>> area is used, it mentions the remaining as unused. Essentially
>> 288 bytes below current stack pointer is protected from debuggers
>> and interrupt code (red zone). Note that it should be 224 bytes
>> of unused red zone instead of 208 bytes as red zone usage in
>> ppc64 BPF JIT came down from 80 bytes to 64 bytes since [2].
>> Hope that clears the misunderstanding.
> 
> I see. That makes sense. So it's similar to amd64 red zone,
> but there we have an issue with irqs, hence the kernel is
> compiled with -mno-red-zone.
> 
> I guess ppc always has a different interrupt stack and
> it's not an issue?

Yeah. On ppc64, the kernel also uses the red zone.
Interrupts use a different stack.
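
To spell out the red zone arithmetic discussed above, here is a minimal
C sketch. The enum and function names are made up for illustration and
are not kernel identifiers; only the sizes (288, 80 and 64 bytes) come
from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes quoted in this thread; the names are illustrative only. */
enum {
	PPC64_ABI_RED_ZONE   = 288, /* bytes below the stack pointer protected by the ppc64 ABI */
	BPF_JIT_RED_ZONE_USE = 64,  /* bytes the ppc64 BPF JIT uses (80 before [2]) */
};

/* Bytes of the red zone left untouched by a user that consumes 'used' bytes. */
static inline size_t red_zone_unused(size_t total, size_t used)
{
	assert(used <= total);
	return total - used;
}
```

With the current 64-byte usage this gives 224 unused bytes; the older
80-byte usage gives the 208 bytes still mentioned in the comment.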

Thanks
Hari


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 11/17] kbuild: Add generic hook for architectures to use before the final vmlinux link
  2024-09-15 20:56 ` [PATCH v5 11/17] kbuild: Add generic hook for architectures to use before the final vmlinux link Hari Bathini
@ 2024-10-09 15:23   ` Masahiro Yamada
  2024-10-10  9:56     ` Hari Bathini
  0 siblings, 1 reply; 36+ messages in thread
From: Masahiro Yamada @ 2024-10-09 15:23 UTC (permalink / raw)
  To: Hari Bathini
  Cc: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel,
	Naveen N. Rao, Mark Rutland, Daniel Borkmann, Nicholas Piggin,
	Alexei Starovoitov, Steven Rostedt, Andrii Nakryiko,
	Christophe Leroy, Vishal Chourasia, Mahesh J Salgaonkar,
	Masami Hiramatsu

On Mon, Sep 16, 2024 at 5:58 AM Hari Bathini <hbathini@linux.ibm.com> wrote:
>
> From: Naveen N Rao <naveen@kernel.org>
>
> On powerpc, we would like to be able to make a pass on vmlinux.o and
> generate a new object file to be linked into vmlinux. Add a generic pass
> in Makefile.vmlinux that architectures can use for this purpose.
>
> Architectures need to select CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX and must
> provide arch/<arch>/tools/Makefile with .arch.vmlinux.o target, which
> will be invoked prior to the final vmlinux link step.
>
> Signed-off-by: Naveen N Rao <naveen@kernel.org>
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>
> Changes in v5:
> * Intermediate files named .vmlinux.arch.* instead of .arch.vmlinux.*
>
>
>  arch/Kconfig             | 6 ++++++
>  scripts/Makefile.vmlinux | 7 +++++++
>  scripts/link-vmlinux.sh  | 7 ++++++-
>  3 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 975dd22a2dbd..ef868ff8156a 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1643,4 +1643,10 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
>  config ARCH_NEED_CMPXCHG_1_EMU
>         bool
>
> +config ARCH_WANTS_PRE_LINK_VMLINUX
> +       def_bool n


Redundant default. This line should be "bool".

> +       help
> +         An architecture can select this if it provides arch/<arch>/tools/Makefile
> +         with .arch.vmlinux.o target to be linked into vmlinux.
> +
>  endmenu
> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> index 49946cb96844..edf6fae8d960 100644
> --- a/scripts/Makefile.vmlinux
> +++ b/scripts/Makefile.vmlinux
> @@ -22,6 +22,13 @@ targets += .vmlinux.export.o
>  vmlinux: .vmlinux.export.o
>  endif
>
> +ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
> +vmlinux: arch/$(SRCARCH)/tools/.vmlinux.arch.o

If you move this to arch/*/tools/, there is no reason
to make it a hidden file.


vmlinux: arch/$(SRCARCH)/tools/vmlinux.arch.o




> +arch/$(SRCARCH)/tools/.vmlinux.arch.o: vmlinux.o

FORCE is missing.


arch/$(SRCARCH)/tools/vmlinux.arch.o: vmlinux.o FORCE



> +       $(Q)$(MAKE) $(build)=arch/$(SRCARCH)/tools $@
> +endif
> +
>  ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
>
>  # Final link of vmlinux with optional arch pass after final link
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index f7b2503cdba9..b3a940c0e6c2 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -100,7 +100,7 @@ vmlinux_link()
>         ${ld} ${ldflags} -o ${output}                                   \
>                 ${wl}--whole-archive ${objs} ${wl}--no-whole-archive    \
>                 ${wl}--start-group ${libs} ${wl}--end-group             \
> -               ${kallsymso} ${btf_vmlinux_bin_o} ${ldlibs}
> +               ${kallsymso} ${btf_vmlinux_bin_o} ${arch_vmlinux_o} ${ldlibs}
>  }
>
>  # generate .BTF typeinfo from DWARF debuginfo
> @@ -214,6 +214,11 @@ fi
>
>  ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init init/version-timestamp.o
>
> +arch_vmlinux_o=""
> +if is_enabled CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX; then
> +       arch_vmlinux_o=arch/${SRCARCH}/tools/.vmlinux.arch.o


arch_vmlinux_o=arch/${SRCARCH}/tools/vmlinux.arch.o



> +fi
> +
>  btf_vmlinux_bin_o=
>  kallsymso=
>  strip_debug=
> --
> 2.46.0
>


--
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 12/17] powerpc64/ftrace: Move ftrace sequence out of line
  2024-09-15 20:56 ` [PATCH v5 12/17] powerpc64/ftrace: Move ftrace sequence out of line Hari Bathini
@ 2024-10-09 15:35   ` Masahiro Yamada
  0 siblings, 0 replies; 36+ messages in thread
From: Masahiro Yamada @ 2024-10-09 15:35 UTC (permalink / raw)
  To: Hari Bathini
  Cc: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel,
	Naveen N. Rao, Mark Rutland, Daniel Borkmann, Nicholas Piggin,
	Alexei Starovoitov, Steven Rostedt, Andrii Nakryiko,
	Christophe Leroy, Vishal Chourasia, Mahesh J Salgaonkar,
	Masami Hiramatsu

On Mon, Sep 16, 2024 at 5:58 AM Hari Bathini <hbathini@linux.ibm.com> wrote:
>
> From: Naveen N Rao <naveen@kernel.org>
>
> Function profile sequence on powerpc includes two instructions at the
> beginning of each function:
>         mflr    r0
>         bl      ftrace_caller
>
> The call to ftrace_caller() gets nop'ed out during kernel boot and is
> patched in when ftrace is enabled.
>
> Given the sequence, we cannot return from ftrace_caller with 'blr' as we
> need to keep LR and r0 intact. This results in link stack (return
> address predictor) imbalance when ftrace is enabled. To address that, we
> would like to use a three instruction sequence:
>         mflr    r0
>         bl      ftrace_caller
>         mtlr    r0
>
> Furthermore, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to
> reserve two instruction slots before the function. This results in a
> total of five instruction slots to be reserved for ftrace use on each
> function that is traced.
>
> Move the function profile sequence out-of-line to minimize its impact.
> To do this, we reserve a single nop at function entry using
> -fpatchable-function-entry=1 and add a pass on vmlinux.o to determine
> the total number of functions that can be traced. This is then used to
> generate a .S file reserving the appropriate amount of space for use as
> ftrace stubs, which is built and linked into vmlinux.
>
> On bootup, the stub space is split into separate stubs per function and
> populated with the proper instruction sequence. A pointer to the
> associated stub is maintained in dyn_arch_ftrace.
>
> For modules, space for ftrace stubs is reserved from the generic module
> stub space.
>
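
As a rough illustration of how the module case sizes its reservation
(this mirrors the roundup() expression in get_stubs_size() further
down), here is a small C sketch. The helper names are hypothetical. The
16-byte out-of-line stub size follows from 'u32 insn[4]' in this patch;
the generic stub entry size is passed in as a parameter because it
depends on the ABI, and the value 40 used in the note below is only an
assumed example.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be non-zero). */
static inline size_t round_up_to(size_t n, size_t align)
{
	assert(align > 0);
	return ((n + align - 1) / align) * align;
}

/*
 * Number of generic module stub entries needed to hold 'ool_count'
 * out-of-line ftrace stubs of 'ool_stub_size' bytes each, when one
 * generic entry is 'entry_size' bytes.
 */
static inline size_t ool_stub_entries(size_t ool_count, size_t ool_stub_size,
				      size_t entry_size)
{
	return round_up_to(ool_count * ool_stub_size, entry_size) / entry_size;
}
```

For example, assuming 40-byte generic entries, 11 traced functions need
176 bytes of stubs, which rounds up to 200 bytes, i.e. 5 generic entries.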
> This is restricted to and enabled by default only on 64-bit powerpc,
> though there are some changes to accommodate 32-bit powerpc. This is
> done so that 32-bit powerpc could choose to opt into this based on
> further tests and benchmarks.
>
> As an example, after this patch, kernel functions will have a single nop
> at function entry:
> <kernel_clone>:
>         addis   r2,r12,467
>         addi    r2,r2,-16028
>         nop
>         mfocrf  r11,8
>         ...
>
> When ftrace is enabled, the nop is converted to an unconditional branch
> to the stub associated with that function:
> <kernel_clone>:
>         addis   r2,r12,467
>         addi    r2,r2,-16028
>         b       ftrace_ool_stub_text_end+0x11b28
>         mfocrf  r11,8
>         ...
>
> The associated stub:
> <ftrace_ool_stub_text_end+0x11b28>:
>         mflr    r0
>         bl      ftrace_caller
>         mtlr    r0
>         b       kernel_clone+0xc
>         ...
>
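
The last branch in the stub is also what lets ftrace_caller recover the
ftrace location in the function (see the lwz/slwi/srawi/add sequence in
ftrace_entry.S below). A minimal C sketch of that idea follows. The
function names are hypothetical and INSN_SIZE stands in for
MCOUNT_INSN_SIZE; the encoding is the plain I-form 'b' with AA=0 and
LK=0.

```c
#include <assert.h>
#include <stdint.h>

#define INSN_SIZE 4u /* stands in for MCOUNT_INSN_SIZE */

/* Encode 'b target' (I-form, opcode 18, AA=0, LK=0) located at 'from'. */
static inline uint32_t encode_b(uint64_t from, uint64_t target)
{
	int64_t off = (int64_t)(target - from);

	assert((off & 3) == 0);                        /* word aligned */
	assert(off >= -0x2000000 && off <= 0x1fffffc); /* fits in signed 26 bits */
	return 0x48000000u | ((uint32_t)off & 0x03fffffcu);
}

/* Sign-extend the low 26 bits, like 'slwi r8,r8,6; srawi r8,r8,6'. */
static inline int64_t decode_b_offset(uint32_t insn)
{
	uint32_t low26 = insn & 0x03ffffffu;

	return (int64_t)(int32_t)(low26 ^ 0x02000000u) - 0x02000000;
}

/*
 * 'lr' points at the 'mtlr r0' in the stub and 'branch_insn' is the
 * instruction right after it ('b rec->ip + 4'). The branch offset is
 * relative to lr + 4 and the target is rec->ip + 4, so lr + offset
 * yields rec->ip.
 */
static inline uint64_t ftrace_ip_from_lr(uint64_t lr, uint32_t branch_insn)
{
	return lr + (uint64_t)decode_b_offset(branch_insn);
}
```

In the kernel_clone example above, the result would be kernel_clone+0x8,
the address of the patched nop.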
> This change showed an improvement of ~10% in null_syscall benchmark on a
> Power 10 system with ftrace enabled.
>
> Signed-off-by: Naveen N Rao <naveen@kernel.org>
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>
> Changes in v5:
> * Fixed ftrace stack tracer failure due to inadvertent use of
>   'add r7, r3, MCOUNT_INSN_SIZE' instruction instead of
>   'addi r7, r3, MCOUNT_INSN_SIZE'
> * Fixed build error for !CONFIG_MODULES case.
> * .vmlinux.arch.* files compiled under arch/powerpc/tools
> * Made sure .vmlinux.arch.* files are cleaned with `make clean`
>
>
>  arch/powerpc/Kbuild                        |   2 +-
>  arch/powerpc/Kconfig                       |   5 +
>  arch/powerpc/Makefile                      |   4 +
>  arch/powerpc/include/asm/ftrace.h          |  11 ++
>  arch/powerpc/include/asm/module.h          |   5 +
>  arch/powerpc/kernel/asm-offsets.c          |   4 +
>  arch/powerpc/kernel/module_64.c            |  58 +++++++-
>  arch/powerpc/kernel/trace/ftrace.c         | 162 +++++++++++++++++++--
>  arch/powerpc/kernel/trace/ftrace_entry.S   | 116 +++++++++++----
>  arch/powerpc/tools/Makefile                |  12 ++
>  arch/powerpc/tools/ftrace-gen-ool-stubs.sh |  43 ++++++
>  11 files changed, 384 insertions(+), 38 deletions(-)
>  create mode 100644 arch/powerpc/tools/Makefile
>  create mode 100755 arch/powerpc/tools/ftrace-gen-ool-stubs.sh
>
> diff --git a/arch/powerpc/Kbuild b/arch/powerpc/Kbuild
> index 571f260b0842..b010ccb071b6 100644
> --- a/arch/powerpc/Kbuild
> +++ b/arch/powerpc/Kbuild
> @@ -19,4 +19,4 @@ obj-$(CONFIG_KEXEC_CORE)  += kexec/
>  obj-$(CONFIG_KEXEC_FILE)  += purgatory/
>
>  # for cleaning
> -subdir- += boot
> +subdir- += boot tools
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index de18f3baff66..bae96b65f295 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -568,6 +568,11 @@ config ARCH_USING_PATCHABLE_FUNCTION_ENTRY
>         def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh $(CC) -mlittle-endian) if PPC64 && CPU_LITTLE_ENDIAN
>         def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh $(CC) -mbig-endian) if PPC64 && CPU_BIG_ENDIAN
>
> +config PPC_FTRACE_OUT_OF_LINE
> +       def_bool PPC64 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY
> +       depends on PPC64

PPC64 appears twice here. It is redundant.

If this config entry is not user-configurable,
"def_bool PPC64 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY" is enough.

"depends on PPC64" is unneeded.

> +       select ARCH_WANTS_PRE_LINK_VMLINUX
> +
>  config HOTPLUG_CPU
>         bool "Support for enabling/disabling CPUs"
>         depends on SMP && (PPC_PSERIES || \
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index bbfe4a1f06ef..c973e6cd1ae8 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -155,7 +155,11 @@ CC_FLAGS_NO_FPU            := $(call cc-option,-msoft-float)
>  ifdef CONFIG_FUNCTION_TRACER
>  ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
>  KBUILD_CPPFLAGS        += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
> +ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> +CC_FLAGS_FTRACE := -fpatchable-function-entry=1
> +else
>  CC_FLAGS_FTRACE := -fpatchable-function-entry=2
> +endif
>  else
>  CC_FLAGS_FTRACE := -pg
>  ifdef CONFIG_MPROFILE_KERNEL
> diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
> index 278d4548e8f1..bdbafc668b20 100644
> --- a/arch/powerpc/include/asm/ftrace.h
> +++ b/arch/powerpc/include/asm/ftrace.h
> @@ -24,6 +24,10 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
>  struct module;
>  struct dyn_ftrace;
>  struct dyn_arch_ftrace {
> +#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> +       /* pointer to the associated out-of-line stub */
> +       unsigned long ool_stub;
> +#endif
>  };
>
>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
> @@ -130,6 +134,13 @@ static inline u8 this_cpu_get_ftrace_enabled(void) { return 1; }
>
>  #ifdef CONFIG_FUNCTION_TRACER
>  extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
> +#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> +struct ftrace_ool_stub {
> +       u32     insn[4];
> +};
> +extern struct ftrace_ool_stub ftrace_ool_stub_text_end[], ftrace_ool_stub_inittext[];
> +extern unsigned int ftrace_ool_stub_text_end_count, ftrace_ool_stub_inittext_count;
> +#endif
>  void ftrace_free_init_tramp(void);
>  unsigned long ftrace_call_adjust(unsigned long addr);
>  #else
> diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h
> index 300c777cc307..9ee70a4a0fde 100644
> --- a/arch/powerpc/include/asm/module.h
> +++ b/arch/powerpc/include/asm/module.h
> @@ -47,6 +47,11 @@ struct mod_arch_specific {
>  #ifdef CONFIG_DYNAMIC_FTRACE
>         unsigned long tramp;
>         unsigned long tramp_regs;
> +#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> +       struct ftrace_ool_stub *ool_stubs;
> +       unsigned int ool_stub_count;
> +       unsigned int ool_stub_index;
> +#endif
>  #endif
>  };
>
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 23733282de4d..6854547d3164 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -674,5 +674,9 @@ int main(void)
>         DEFINE(BPT_SIZE, BPT_SIZE);
>  #endif
>
> +#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> +       DEFINE(FTRACE_OOL_STUB_SIZE, sizeof(struct ftrace_ool_stub));
> +#endif
> +
>         return 0;
>  }
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index 1db88409bd95..6816e9967cab 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -205,7 +205,9 @@ static int relacmp(const void *_x, const void *_y)
>
>  /* Get size of potential trampolines required. */
>  static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
> -                                   const Elf64_Shdr *sechdrs)
> +                                   const Elf64_Shdr *sechdrs,
> +                                   char *secstrings,
> +                                   struct module *me)
>  {
>         /* One extra reloc so it's always 0-addr terminated */
>         unsigned long relocs = 1;
> @@ -244,6 +246,24 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
>         /* stubs for ftrace_caller and ftrace_regs_caller */
>         relocs += IS_ENABLED(CONFIG_DYNAMIC_FTRACE) + IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS);
>
> +#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> +       /* stubs for the function tracer */
> +       for (i = 1; i < hdr->e_shnum; i++) {
> +               if (!strcmp(secstrings + sechdrs[i].sh_name, "__patchable_function_entries")) {
> +                       me->arch.ool_stub_count = sechdrs[i].sh_size / sizeof(unsigned long);
> +                       me->arch.ool_stub_index = 0;
> +                       relocs += roundup(me->arch.ool_stub_count * sizeof(struct ftrace_ool_stub),
> +                                         sizeof(struct ppc64_stub_entry)) /
> +                                 sizeof(struct ppc64_stub_entry);
> +                       break;
> +               }
> +       }
> +       if (i == hdr->e_shnum) {
> +               pr_err("%s: doesn't contain __patchable_function_entries.\n", me->name);
> +               return -ENOEXEC;
> +       }
> +#endif
> +
>         pr_debug("Looks like a total of %lu stubs, max\n", relocs);
>         return relocs * sizeof(struct ppc64_stub_entry);
>  }
> @@ -454,7 +474,7 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr,
>  #endif
>
>         /* Override the stubs size */
> -       sechdrs[me->arch.stubs_section].sh_size = get_stubs_size(hdr, sechdrs);
> +       sechdrs[me->arch.stubs_section].sh_size = get_stubs_size(hdr, sechdrs, secstrings, me);
>
>         return 0;
>  }
> @@ -1079,6 +1099,37 @@ int module_trampoline_target(struct module *mod, unsigned long addr,
>         return 0;
>  }
>
> +static int setup_ftrace_ool_stubs(const Elf64_Shdr *sechdrs, unsigned long addr, struct module *me)
> +{
> +#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> +       unsigned int i, total_stubs, num_stubs;
> +       struct ppc64_stub_entry *stub;
> +
> +       total_stubs = sechdrs[me->arch.stubs_section].sh_size / sizeof(*stub);
> +       num_stubs = roundup(me->arch.ool_stub_count * sizeof(struct ftrace_ool_stub),
> +                           sizeof(struct ppc64_stub_entry)) / sizeof(struct ppc64_stub_entry);
> +
> +       /* Find the next available entry */
> +       stub = (void *)sechdrs[me->arch.stubs_section].sh_addr;
> +       for (i = 0; stub_func_addr(stub[i].funcdata); i++)
> +               if (WARN_ON(i >= total_stubs))
> +                       return -1;
> +
> +       if (WARN_ON(i + num_stubs > total_stubs))
> +               return -1;
> +
> +       stub += i;
> +       me->arch.ool_stubs = (struct ftrace_ool_stub *)stub;
> +
> +       /* reserve stubs */
> +       for (i = 0; i < num_stubs; i++)
> +               if (patch_u32((void *)&stub->funcdata, PPC_RAW_NOP()))
> +                       return -1;
> +#endif
> +
> +       return 0;
> +}
> +
>  int module_finalize_ftrace(struct module *mod, const Elf_Shdr *sechdrs)
>  {
>         mod->arch.tramp = stub_for_addr(sechdrs,
> @@ -1097,6 +1148,9 @@ int module_finalize_ftrace(struct module *mod, const Elf_Shdr *sechdrs)
>         if (!mod->arch.tramp)
>                 return -ENOENT;
>
> +       if (setup_ftrace_ool_stubs(sechdrs, mod->arch.tramp, mod))
> +               return -ENOENT;
> +
>         return 0;
>  }
>  #endif
> diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
> index 719517265d39..1fee074388cc 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -37,7 +37,8 @@ unsigned long ftrace_call_adjust(unsigned long addr)
>         if (addr >= (unsigned long)__exittext_begin && addr < (unsigned long)__exittext_end)
>                 return 0;
>
> -       if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
> +       if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY) &&
> +           !IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
>                 addr += MCOUNT_INSN_SIZE;
>
>         return addr;
> @@ -127,11 +128,25 @@ static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long a
>  }
>  #endif
>
> +static unsigned long ftrace_get_ool_stub(struct dyn_ftrace *rec)
> +{
> +#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> +       return rec->arch.ool_stub;
> +#else
> +       BUILD_BUG();
> +#endif
> +}
> +
>  static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_inst_t *call_inst)
>  {
> -       unsigned long ip = rec->ip;
> +       unsigned long ip;
>         unsigned long stub;
>
> +       if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
> +               ip = ftrace_get_ool_stub(rec) + MCOUNT_INSN_SIZE; /* second instruction in stub */
> +       else
> +               ip = rec->ip;
> +
>         if (is_offset_in_branch_range(addr - ip))
>                 /* Within range */
>                 stub = addr;
> @@ -142,7 +157,7 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
>                 stub = ftrace_lookup_module_stub(ip, addr);
>
>         if (!stub) {
> -               pr_err("0x%lx: No ftrace stubs reachable\n", ip);
> +               pr_err("0x%lx (0x%lx): No ftrace stubs reachable\n", ip, rec->ip);
>                 return -EINVAL;
>         }
>
> @@ -150,6 +165,92 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
>         return 0;
>  }
>
> +static int ftrace_init_ool_stub(struct module *mod, struct dyn_ftrace *rec)
> +{
> +#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> +       static int ool_stub_text_end_index, ool_stub_inittext_index;
> +       int ret = 0, ool_stub_count, *ool_stub_index;
> +       ppc_inst_t inst;
> +       /*
> +        * See ftrace_entry.S if changing the below instruction sequence, as we rely on
> +        * decoding the last branch instruction here to recover the correct function ip.
> +        */
> +       struct ftrace_ool_stub *ool_stub, ool_stub_template = {
> +               .insn = {
> +                       PPC_RAW_MFLR(_R0),
> +                       PPC_RAW_NOP(),          /* bl ftrace_caller */
> +                       PPC_RAW_MTLR(_R0),
> +                       PPC_RAW_NOP()           /* b rec->ip + 4 */
> +               }
> +       };
> +
> +       WARN_ON(rec->arch.ool_stub);
> +
> +       if (is_kernel_inittext(rec->ip)) {
> +               ool_stub = ftrace_ool_stub_inittext;
> +               ool_stub_index = &ool_stub_inittext_index;
> +               ool_stub_count = ftrace_ool_stub_inittext_count;
> +       } else if (is_kernel_text(rec->ip)) {
> +               ool_stub = ftrace_ool_stub_text_end;
> +               ool_stub_index = &ool_stub_text_end_index;
> +               ool_stub_count = ftrace_ool_stub_text_end_count;
> +#ifdef CONFIG_MODULES
> +       } else if (mod) {
> +               ool_stub = mod->arch.ool_stubs;
> +               ool_stub_index = &mod->arch.ool_stub_index;
> +               ool_stub_count = mod->arch.ool_stub_count;
> +#endif
> +       } else {
> +               return -EINVAL;
> +       }
> +
> +       ool_stub += (*ool_stub_index)++;
> +
> +       if (WARN_ON(*ool_stub_index > ool_stub_count))
> +               return -EINVAL;
> +
> +       if (!is_offset_in_branch_range((long)rec->ip - (long)&ool_stub->insn[0]) ||
> +           !is_offset_in_branch_range((long)(rec->ip + MCOUNT_INSN_SIZE) -
> +                                      (long)&ool_stub->insn[3])) {
> +               pr_err("%s: ftrace ool stub out of range (%p -> %p).\n",
> +                                       __func__, (void *)rec->ip, (void *)&ool_stub->insn[0]);
> +               return -EINVAL;
> +       }
> +
> +       rec->arch.ool_stub = (unsigned long)&ool_stub->insn[0];
> +
> +       /* bl ftrace_caller */
> +       if (!mod)
> +               ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &inst);
> +#ifdef CONFIG_MODULES
> +       else
> +               /*
> +                * We can't use ftrace_get_call_inst() since that uses
> +                * __module_text_address(rec->ip) to look up the module.
> +                * But, since the module is not fully formed at this stage,
> +                * the lookup fails. We know the target though, so generate
> +                * the branch inst directly.
> +                */
> +               inst = ftrace_create_branch_inst(ftrace_get_ool_stub(rec) + MCOUNT_INSN_SIZE,
> +                                                mod->arch.tramp, 1);
> +#endif
> +       ool_stub_template.insn[1] = ppc_inst_val(inst);
> +
> +       /* b rec->ip + 4 */
> +       if (!ret && create_branch(&inst, &ool_stub->insn[3], rec->ip + MCOUNT_INSN_SIZE, 0))
> +               return -EINVAL;
> +       ool_stub_template.insn[3] = ppc_inst_val(inst);
> +
> +       if (!ret)
> +               ret = patch_instructions((u32 *)ool_stub, (u32 *)&ool_stub_template,
> +                                        sizeof(ool_stub_template), false);
> +
> +       return ret;
> +#else /* !CONFIG_PPC_FTRACE_OUT_OF_LINE */
> +       BUILD_BUG();
> +#endif
> +}
> +
>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>  int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned long addr)
>  {
> @@ -162,18 +263,29 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned
>  int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>  {
>         ppc_inst_t old, new;
> -       int ret;
> +       unsigned long ip = rec->ip;
> +       int ret = 0;
>
>         /* This can only ever be called during module load */
> -       if (WARN_ON(!IS_ENABLED(CONFIG_MODULES) || core_kernel_text(rec->ip)))
> +       if (WARN_ON(!IS_ENABLED(CONFIG_MODULES) || core_kernel_text(ip)))
>                 return -EINVAL;
>
>         old = ppc_inst(PPC_RAW_NOP());
> -       ret = ftrace_get_call_inst(rec, addr, &new);
> -       if (ret)
> -               return ret;
> +       if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE)) {
> +               ip = ftrace_get_ool_stub(rec) + MCOUNT_INSN_SIZE; /* second instruction in stub */
> +               ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &old);
> +       }
> +
> +       ret |= ftrace_get_call_inst(rec, addr, &new);
> +
> +       if (!ret)
> +               ret = ftrace_modify_code(ip, old, new);
>
> -       return ftrace_modify_code(rec->ip, old, new);
> +       if (!ret && IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
> +               ret = ftrace_modify_code(rec->ip, ppc_inst(PPC_RAW_NOP()),
> +                        ppc_inst(PPC_RAW_BRANCH((long)ftrace_get_ool_stub(rec) - (long)rec->ip)));
> +
> +       return ret;
>  }
>
>  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr)
> @@ -206,6 +318,13 @@ void ftrace_replace_code(int enable)
>                 new_addr = ftrace_get_addr_new(rec);
>                 update = ftrace_update_record(rec, enable);
>
> +               if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) && update != FTRACE_UPDATE_IGNORE) {
> +                       ip = ftrace_get_ool_stub(rec) + MCOUNT_INSN_SIZE;
> +                       ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &nop_inst);
> +                       if (ret)
> +                               goto out;
> +               }
> +
>                 switch (update) {
>                 case FTRACE_UPDATE_IGNORE:
>                 default:
> @@ -230,6 +349,24 @@ void ftrace_replace_code(int enable)
>
>                 if (!ret)
>                         ret = ftrace_modify_code(ip, old, new);
> +
> +               if (!ret && IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) &&
> +                   (update == FTRACE_UPDATE_MAKE_NOP || update == FTRACE_UPDATE_MAKE_CALL)) {
> +                       /* Update the actual ftrace location */
> +                       call_inst = ppc_inst(PPC_RAW_BRANCH((long)ftrace_get_ool_stub(rec) -
> +                                                           (long)rec->ip));
> +                       nop_inst = ppc_inst(PPC_RAW_NOP());
> +                       ip = rec->ip;
> +
> +                       if (update == FTRACE_UPDATE_MAKE_NOP)
> +                               ret = ftrace_modify_code(ip, call_inst, nop_inst);
> +                       else
> +                               ret = ftrace_modify_code(ip, nop_inst, call_inst);
> +
> +                       if (ret)
> +                               goto out;
> +               }
> +
>                 if (ret)
>                         goto out;
>         }
> @@ -249,7 +386,8 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>         /* Verify instructions surrounding the ftrace location */
>         if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY)) {
>                 /* Expect nops */
> -               ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_NOP()));
> +               if (!IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
> +                       ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_NOP()));
>                 if (!ret)
>                         ret = ftrace_validate_inst(ip, ppc_inst(PPC_RAW_NOP()));
>         } else if (IS_ENABLED(CONFIG_PPC32)) {
> @@ -277,6 +415,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>         if (ret)
>                 return ret;
>
> +       /* Set up out-of-line stub */
> +       if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
> +               return ftrace_init_ool_stub(mod, rec);
> +
>         /* Nop-out the ftrace location */
>         new = ppc_inst(PPC_RAW_NOP());
>         addr = MCOUNT_ADDR;
> diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
> index 244a1c7bb1e8..5b2fc6483dce 100644
> --- a/arch/powerpc/kernel/trace/ftrace_entry.S
> +++ b/arch/powerpc/kernel/trace/ftrace_entry.S
> @@ -56,7 +56,7 @@
>         SAVE_GPR(2, r1)
>         SAVE_GPRS(11, 31, r1)
>         .else
> -#ifdef CONFIG_LIVEPATCH_64
> +#if defined(CONFIG_LIVEPATCH_64) || defined(CONFIG_PPC_FTRACE_OUT_OF_LINE)
>         SAVE_GPR(14, r1)
>  #endif
>         .endif
> @@ -78,10 +78,6 @@
>
>         /* Get the _mcount() call site out of LR */
>         mflr    r7
> -       /* Save it as pt_regs->nip */
> -       PPC_STL r7, _NIP(r1)
> -       /* Also save it in B's stackframe header for proper unwind */
> -       PPC_STL r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
>         /* Save the read LR in pt_regs->link */
>         PPC_STL r0, _LINK(r1)
>
> @@ -96,16 +92,6 @@
>         lwz     r5,function_trace_op@l(r3)
>  #endif
>
> -#ifdef CONFIG_LIVEPATCH_64
> -       mr      r14, r7         /* remember old NIP */
> -#endif
> -
> -       /* Calculate ip from nip-4 into r3 for call below */
> -       subi    r3, r7, MCOUNT_INSN_SIZE
> -
> -       /* Put the original return address in r4 as parent_ip */
> -       mr      r4, r0
> -
>         /* Save special regs */
>         PPC_STL r8, _MSR(r1)
>         .if \allregs == 1
> @@ -114,17 +100,69 @@
>         PPC_STL r11, _CCR(r1)
>         .endif
>
> +#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> +       /* Save our real return address in nvr for return */
> +       .if \allregs == 0
> +       SAVE_GPR(15, r1)
> +       .endif
> +       mr      r15, r7
> +       /*
> +        * We want the ftrace location in the function, but our lr (in r7)
> +        * points at the 'mtlr r0' instruction in the out of line stub.  To
> +        * recover the ftrace location, we read the branch instruction in the
> +        * stub, and adjust our lr by the branch offset.
> +        *
> +        * See ftrace_init_ool_stub() for the profile sequence.
> +        */
> +       lwz     r8, MCOUNT_INSN_SIZE(r7)
> +       slwi    r8, r8, 6
> +       srawi   r8, r8, 6
> +       add     r3, r7, r8
> +       /*
> +        * Override our nip to point past the branch in the original function.
> +        * This allows reliable stack trace and the ftrace stack tracer to work as-is.
> +        */
> +       addi    r7, r3, MCOUNT_INSN_SIZE
> +#else
> +       /* Calculate ip from nip-4 into r3 for call below */
> +       subi    r3, r7, MCOUNT_INSN_SIZE
> +#endif
> +
> +       /* Save NIP as pt_regs->nip */
> +       PPC_STL r7, _NIP(r1)
> +       /* Also save it in B's stackframe header for proper unwind */
> +       PPC_STL r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
> +#if defined(CONFIG_LIVEPATCH_64) || defined(CONFIG_PPC_FTRACE_OUT_OF_LINE)
> +       mr      r14, r7         /* remember old NIP */
> +#endif
> +
> +       /* Put the original return address in r4 as parent_ip */
> +       mr      r4, r0
> +
>         /* Load &pt_regs in r6 for call below */
>         addi    r6, r1, STACK_INT_FRAME_REGS
>  .endm
>
>  .macro ftrace_regs_exit allregs
> +#ifndef CONFIG_PPC_FTRACE_OUT_OF_LINE
>         /* Load ctr with the possibly modified NIP */
>         PPC_LL  r3, _NIP(r1)
>         mtctr   r3
>
>  #ifdef CONFIG_LIVEPATCH_64
>         cmpd    r14, r3         /* has NIP been altered? */
> +#endif
> +#else /* !CONFIG_PPC_FTRACE_OUT_OF_LINE */
> +       /* Load LR with the possibly modified NIP */
> +       PPC_LL  r3, _NIP(r1)
> +       cmpd    r14, r3         /* has NIP been altered? */
> +       bne-    1f
> +
> +       mr      r3, r15
> +       .if \allregs == 0
> +       REST_GPR(15, r1)
> +       .endif
> +1:     mtlr    r3
>  #endif
>
>         /* Restore gprs */
> @@ -132,14 +170,16 @@
>         REST_GPRS(2, 31, r1)
>         .else
>         REST_GPRS(3, 10, r1)
> -#ifdef CONFIG_LIVEPATCH_64
> +#if defined(CONFIG_LIVEPATCH_64) || defined(CONFIG_PPC_FTRACE_OUT_OF_LINE)
>         REST_GPR(14, r1)
>  #endif
>         .endif
>
>         /* Restore possibly modified LR */
>         PPC_LL  r0, _LINK(r1)
> +#ifndef CONFIG_PPC_FTRACE_OUT_OF_LINE
>         mtlr    r0
> +#endif
>
>  #ifdef CONFIG_PPC64
>         /* Restore callee's TOC */
> @@ -153,7 +193,16 @@
>          /* Based on the cmpd above, if the NIP was altered handle livepatch */
>         bne-    livepatch_handler
>  #endif
> -       bctr                    /* jump after _mcount site */
> +       /* jump after _mcount site */
> +#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> +       /*
> +        * Return with blr to keep the link stack balanced. The function profiling sequence
> +        * uses 'mtlr r0' to restore LR.
> +        */
> +       blr
> +#else
> +       bctr
> +#endif
>  .endm
>
>  _GLOBAL(ftrace_regs_caller)
> @@ -177,6 +226,11 @@ _GLOBAL(ftrace_stub)
>
>  #ifdef CONFIG_PPC64
>  ftrace_no_trace:
> +#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> +       REST_GPR(3, r1)
> +       addi    r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
> +       blr
> +#else
>         mflr    r3
>         mtctr   r3
>         REST_GPR(3, r1)
> @@ -184,6 +238,7 @@ ftrace_no_trace:
>         mtlr    r0
>         bctr
>  #endif
> +#endif
>
>  #ifdef CONFIG_LIVEPATCH_64
>         /*
> @@ -194,11 +249,17 @@ ftrace_no_trace:
>          * We get here when a function A, calls another function B, but B has
>          * been live patched with a new function C.
>          *
> -        * On entry:
> -        *  - we have no stack frame and can not allocate one
> +        * On entry, we have no stack frame and can not allocate one.
> +        *
> +        * With PPC_FTRACE_OUT_OF_LINE=n, on entry:
>          *  - LR points back to the original caller (in A)
>          *  - CTR holds the new NIP in C
>          *  - r0, r11 & r12 are free
> +        *
> +        * With PPC_FTRACE_OUT_OF_LINE=y, on entry:
> +        *  - r0 points back to the original caller (in A)
> +        *  - LR holds the new NIP in C
> +        *  - r11 & r12 are free
>          */
>  livepatch_handler:
>         ld      r12, PACA_THREAD_INFO(r13)
> @@ -208,18 +269,23 @@ livepatch_handler:
>         addi    r11, r11, 24
>         std     r11, TI_livepatch_sp(r12)
>
> -       /* Save toc & real LR on livepatch stack */
> -       std     r2,  -24(r11)
> -       mflr    r12
> -       std     r12, -16(r11)
> -
>         /* Store stack end marker */
>         lis     r12, STACK_END_MAGIC@h
>         ori     r12, r12, STACK_END_MAGIC@l
>         std     r12, -8(r11)
>
> -       /* Put ctr in r12 for global entry and branch there */
> +       /* Save toc & real LR on livepatch stack */
> +       std     r2,  -24(r11)
> +#ifndef CONFIG_PPC_FTRACE_OUT_OF_LINE
> +       mflr    r12
> +       std     r12, -16(r11)
>         mfctr   r12
> +#else
> +       std     r0, -16(r11)
> +       mflr    r12
> +       /* Put ctr in r12 for global entry and branch there */
> +       mtctr   r12
> +#endif
>         bctrl
>
>         /*
> diff --git a/arch/powerpc/tools/Makefile b/arch/powerpc/tools/Makefile
> new file mode 100644
> index 000000000000..3a389526498e
> --- /dev/null
> +++ b/arch/powerpc/tools/Makefile
> @@ -0,0 +1,12 @@
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +quiet_cmd_gen_ftrace_ool_stubs = GEN     $@
> +      cmd_gen_ftrace_ool_stubs = $< vmlinux.o $@
> +
> +$(obj)/.vmlinux.arch.S: $(src)/ftrace-gen-ool-stubs.sh vmlinux.o FORCE

$(obj)/vmlinux.arch.S: $(src)/ftrace-gen-ool-stubs.sh vmlinux.o FORCE


> +       $(call if_changed,gen_ftrace_ool_stubs)



> +
> +$(obj)/.vmlinux.arch.o: $(obj)/.vmlinux.arch.S FORCE
> +       $(call if_changed_rule,as_o_S)


This rule is unnecessary because the %.S -> %.o build rule
is already available in scripts/Makefile.build.


> +
> +clean-files += .vmlinux.arch.S .vmlinux.arch.o

The if_changed macro needs a 'targets' assignment.

This line should be replaced with:

targets += vmlinux.arch.S







> diff --git a/arch/powerpc/tools/ftrace-gen-ool-stubs.sh b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
> new file mode 100755
> index 000000000000..8e0a6d4ea202
> --- /dev/null
> +++ b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
> @@ -0,0 +1,43 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +# Error out on error
> +set -e
> +
> +is_enabled() {
> +       grep -q "^$1=y" include/config/auto.conf
> +}

Instead of checking the CONFIG option in this script,
I recommend passing the 64-bit flag as a command-line parameter.
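
To illustrate the suggestion, here is a minimal sketch, not the actual patch. It assumes the Makefile passes a hypothetical flag ("1" for 64-bit, anything else for 32-bit) as a positional argument; the helper name pick_relocation and the flag convention are made up for illustration.

```shell
#!/bin/sh
# Sketch only: choose the relocation type from a flag supplied by the
# caller instead of grepping include/config/auto.conf.
# pick_relocation and the "1 means 64-bit" convention are hypothetical.
pick_relocation() {
	if [ "$1" = "1" ]; then
		echo R_PPC64_ADDR64
	else
		echo R_PPC_ADDR32
	fi
}
```

With something like this, the script would no longer need is_enabled() at all, and the relocation choice would be visible in the command line that Kbuild records for if_changed.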


> +
> +vmlinux_o=${1}
> +arch_vmlinux_S=${2}
> +
> +RELOCATION=R_PPC64_ADDR64
> +if is_enabled CONFIG_PPC32; then
> +       RELOCATION=R_PPC_ADDR32
> +fi
> +
> +num_ool_stubs_text=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |


${CROSS_COMPILE}objdump  -> ${OBJDUMP}


> +                    grep -v ".init.text" | grep "${RELOCATION}" | wc -l)
> +num_ool_stubs_inittext=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |

${CROSS_COMPILE}objdump  -> ${OBJDUMP}

I also recommend passing ${OBJDUMP} as a command-line parameter.
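
As a rough sketch of how the counting could look once objdump is supplied by the caller: the helper below reads 'objdump -r' style output on stdin, so it does not care which objdump binary produced it. The name count_relocs and its arguments are hypothetical, and it uses fixed-string matching (grep -F) where the original script uses plain grep, which is a small change in behavior.

```shell
#!/bin/sh
# Sketch only: count relocation records of one type in 'objdump -r' output.
#   $1: relocation name, e.g. R_PPC64_ADDR64
#   $2: "init" to count only .init.text entries, anything else for the rest
# count_relocs is a hypothetical helper, not part of the posted patch.
count_relocs() {
	if [ "$2" = "init" ]; then
		# grep -c prints 0 but exits non-zero on no match; '|| true'
		# keeps that from aborting a script running under 'set -e'.
		grep -F ".init.text" | grep -c -F "$1" || true
	else
		grep -v -F ".init.text" | grep -c -F "$1" || true
	fi
}
```

The caller would then pipe the tool's output in, for example "$objdump" -r -j __patchable_function_entries "$vmlinux_o" | count_relocs "$RELOCATION" init, where $objdump is whichever positional parameter the Makefile fills with ${OBJDUMP}.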



> +                        grep ".init.text" | grep "${RELOCATION}" | wc -l)
> +
> +cat > ${arch_vmlinux_S} <<EOF
> +#include <asm/asm-offsets.h>
> +#include <linux/linkage.h>
> +
> +.pushsection .tramp.ftrace.text,"aw"
> +SYM_DATA(ftrace_ool_stub_text_end_count, .long ${num_ool_stubs_text})
> +
> +SYM_CODE_START(ftrace_ool_stub_text_end)
> +       .space ${num_ool_stubs_text} * FTRACE_OOL_STUB_SIZE
> +SYM_CODE_END(ftrace_ool_stub_text_end)
> +.popsection
> +
> +.pushsection .tramp.ftrace.init,"aw"
> +SYM_DATA(ftrace_ool_stub_inittext_count, .long ${num_ool_stubs_inittext})
> +
> +SYM_CODE_START(ftrace_ool_stub_inittext)
> +       .space ${num_ool_stubs_inittext} * FTRACE_OOL_STUB_SIZE
> +SYM_CODE_END(ftrace_ool_stub_inittext)
> +.popsection
> +EOF
> --
> 2.46.0
>


--
Best Regards
Masahiro Yamada


* Re: [PATCH v5 13/17] powerpc64/ftrace: Support .text larger than 32MB with out-of-line stubs
  2024-09-15 20:56 ` [PATCH v5 13/17] powerpc64/ftrace: Support .text larger than 32MB with out-of-line stubs Hari Bathini
@ 2024-10-09 15:36   ` Masahiro Yamada
  0 siblings, 0 replies; 36+ messages in thread
From: Masahiro Yamada @ 2024-10-09 15:36 UTC (permalink / raw)
  To: Hari Bathini
  Cc: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel,
	Naveen N. Rao, Mark Rutland, Daniel Borkmann, Nicholas Piggin,
	Alexei Starovoitov, Steven Rostedt, Andrii Nakryiko,
	Christophe Leroy, Vishal Chourasia, Mahesh J Salgaonkar,
	Masami Hiramatsu

On Mon, Sep 16, 2024 at 5:58 AM Hari Bathini <hbathini@linux.ibm.com> wrote:
>
> From: Naveen N Rao <naveen@kernel.org>
>
> We are restricted to a .text size of ~32MB when using out-of-line
> function profile sequence. Allow this to be extended up to the previous
> limit of ~64MB by reserving space in the middle of .text.
>
> A new config option CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE is
> introduced to specify the number of function stubs that are reserved in
> .text. On boot, ftrace utilizes stubs from this area first before using
> the stub area at the end of .text.
>
> A ppc64le defconfig has ~44k functions that can be traced. A more
> conservative value of 32k functions is chosen as the default value of
> PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE so that we do not allot more space
> than necessary by default. If building a kernel that only has 32k
> trace-able functions, we won't allot any more space at the end of .text
> during the pass on vmlinux.o. Otherwise, only the remaining functions
> get space for stubs at the end of .text. This default value should help
> cover a .text size of ~48MB in total (including space reserved at the
> end of .text which can cover up to 32MB), which should be sufficient for
> most common builds. For a very small kernel build, this can be set to 0.
> Or, this can be bumped up to a larger value to support vmlinux .text
> size up to ~64MB.
>
> Signed-off-by: Naveen N Rao <naveen@kernel.org>
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>
> Changes in v5:
> * num_ool_stubs_text_end used for setting up ftrace_ool_stub_text_end
>   set to zero instead of computing to some random negative value when
>   not required.
>
>  arch/powerpc/Kconfig                       | 12 ++++++++++++
>  arch/powerpc/include/asm/ftrace.h          |  6 ++++--
>  arch/powerpc/kernel/trace/ftrace.c         | 21 +++++++++++++++++----
>  arch/powerpc/kernel/trace/ftrace_entry.S   |  8 ++++++++
>  arch/powerpc/tools/Makefile                |  2 +-
>  arch/powerpc/tools/ftrace-gen-ool-stubs.sh | 16 ++++++++++++----
>  6 files changed, 54 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index bae96b65f295..a0ce00368bab 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -573,6 +573,18 @@ config PPC_FTRACE_OUT_OF_LINE
>         depends on PPC64
>         select ARCH_WANTS_PRE_LINK_VMLINUX
>
> +config PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE
> +       int "Number of ftrace out-of-line stubs to reserve within .text"
> +       default 32768 if PPC_FTRACE_OUT_OF_LINE
> +       default 0

This entry is meaningless when CONFIG_PPC_FTRACE_OUT_OF_LINE=n.

           depends on PPC_FTRACE_OUT_OF_LINE
           default 32768




> +       help
> +         Number of stubs to reserve for use by ftrace. This space is
> +         reserved within .text, and is distinct from any additional space
> +         added at the end of .text before the final vmlinux link. Set to
> +         zero to have stubs only be generated at the end of vmlinux (only
> +         if the size of vmlinux is less than 32MB). Set to a higher value
> +         if building vmlinux larger than 48MB.
> +
>  config HOTPLUG_CPU
>         bool "Support for enabling/disabling CPUs"
>         depends on SMP && (PPC_PSERIES || \
> diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
> index bdbafc668b20..28f3590ca780 100644
> --- a/arch/powerpc/include/asm/ftrace.h
> +++ b/arch/powerpc/include/asm/ftrace.h
> @@ -138,8 +138,10 @@ extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
>  struct ftrace_ool_stub {
>         u32     insn[4];
>  };
> -extern struct ftrace_ool_stub ftrace_ool_stub_text_end[], ftrace_ool_stub_inittext[];
> -extern unsigned int ftrace_ool_stub_text_end_count, ftrace_ool_stub_inittext_count;
> +extern struct ftrace_ool_stub ftrace_ool_stub_text_end[], ftrace_ool_stub_text[],
> +                             ftrace_ool_stub_inittext[];
> +extern unsigned int ftrace_ool_stub_text_end_count, ftrace_ool_stub_text_count,
> +                   ftrace_ool_stub_inittext_count;
>  #endif
>  void ftrace_free_init_tramp(void);
>  unsigned long ftrace_call_adjust(unsigned long addr);
> diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
> index 1fee074388cc..bee2c54a8c04 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -168,7 +168,7 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
>  static int ftrace_init_ool_stub(struct module *mod, struct dyn_ftrace *rec)
>  {
>  #ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> -       static int ool_stub_text_end_index, ool_stub_inittext_index;
> +       static int ool_stub_text_index, ool_stub_text_end_index, ool_stub_inittext_index;
>         int ret = 0, ool_stub_count, *ool_stub_index;
>         ppc_inst_t inst;
>         /*
> @@ -191,9 +191,22 @@ static int ftrace_init_ool_stub(struct module *mod, struct dyn_ftrace *rec)
>                 ool_stub_index = &ool_stub_inittext_index;
>                 ool_stub_count = ftrace_ool_stub_inittext_count;
>         } else if (is_kernel_text(rec->ip)) {
> -               ool_stub = ftrace_ool_stub_text_end;
> -               ool_stub_index = &ool_stub_text_end_index;
> -               ool_stub_count = ftrace_ool_stub_text_end_count;
> +               /*
> +                * ftrace records are sorted, so we first use up the stub area within .text
> +                * (ftrace_ool_stub_text) before using the area at the end of .text
> +                * (ftrace_ool_stub_text_end), unless the stub is out of range of the record.
> +                */
> +               if (ool_stub_text_index >= ftrace_ool_stub_text_count ||
> +                   !is_offset_in_branch_range((long)rec->ip -
> +                                              (long)&ftrace_ool_stub_text[ool_stub_text_index])) {
> +                       ool_stub = ftrace_ool_stub_text_end;
> +                       ool_stub_index = &ool_stub_text_end_index;
> +                       ool_stub_count = ftrace_ool_stub_text_end_count;
> +               } else {
> +                       ool_stub = ftrace_ool_stub_text;
> +                       ool_stub_index = &ool_stub_text_index;
> +                       ool_stub_count = ftrace_ool_stub_text_count;
> +               }
>  #ifdef CONFIG_MODULES
>         } else if (mod) {
>                 ool_stub = mod->arch.ool_stubs;
> diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
> index 5b2fc6483dce..a6bf7f841040 100644
> --- a/arch/powerpc/kernel/trace/ftrace_entry.S
> +++ b/arch/powerpc/kernel/trace/ftrace_entry.S
> @@ -374,6 +374,14 @@ _GLOBAL(return_to_handler)
>         blr
>  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
>
> +#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> +SYM_DATA(ftrace_ool_stub_text_count, .long CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE)
> +
> +SYM_CODE_START(ftrace_ool_stub_text)
> +       .space CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE * FTRACE_OOL_STUB_SIZE
> +SYM_CODE_END(ftrace_ool_stub_text)
> +#endif
> +
>  .pushsection ".tramp.ftrace.text","aw",@progbits;
>  .globl ftrace_tramp_text
>  ftrace_tramp_text:
> diff --git a/arch/powerpc/tools/Makefile b/arch/powerpc/tools/Makefile
> index 3a389526498e..9eeb6edf02fe 100644
> --- a/arch/powerpc/tools/Makefile
> +++ b/arch/powerpc/tools/Makefile
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0-or-later
>
>  quiet_cmd_gen_ftrace_ool_stubs = GEN     $@
> -      cmd_gen_ftrace_ool_stubs = $< vmlinux.o $@
> +      cmd_gen_ftrace_ool_stubs = $< $(CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE) vmlinux.o $@
>
>  $(obj)/.vmlinux.arch.S: $(src)/ftrace-gen-ool-stubs.sh vmlinux.o FORCE
>         $(call if_changed,gen_ftrace_ool_stubs)
> diff --git a/arch/powerpc/tools/ftrace-gen-ool-stubs.sh b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
> index 8e0a6d4ea202..d6bd834e0868 100755
> --- a/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
> +++ b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
> @@ -8,8 +8,9 @@ is_enabled() {
>         grep -q "^$1=y" include/config/auto.conf
>  }
>
> -vmlinux_o=${1}
> -arch_vmlinux_S=${2}
> +vmlinux_o=${2}
> +arch_vmlinux_S=${3}
> +arch_vmlinux_o=$(dirname ${arch_vmlinux_S})/$(basename ${arch_vmlinux_S} .S).o


arch_vmlinux_o is not used in this script. Delete it.






>
>  RELOCATION=R_PPC64_ADDR64
>  if is_enabled CONFIG_PPC32; then
> @@ -21,15 +22,22 @@ num_ool_stubs_text=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries
>  num_ool_stubs_inittext=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |
>                          grep ".init.text" | grep "${RELOCATION}" | wc -l)
>
> +num_ool_stubs_text_builtin=${1}
> +if [ ${num_ool_stubs_text} -gt ${num_ool_stubs_text_builtin} ]; then
> +       num_ool_stubs_text_end=$(expr ${num_ool_stubs_text} - ${num_ool_stubs_text_builtin})
> +else
> +       num_ool_stubs_text_end=0
> +fi
> +
>  cat > ${arch_vmlinux_S} <<EOF
>  #include <asm/asm-offsets.h>
>  #include <linux/linkage.h>
>
>  .pushsection .tramp.ftrace.text,"aw"
> -SYM_DATA(ftrace_ool_stub_text_end_count, .long ${num_ool_stubs_text})
> +SYM_DATA(ftrace_ool_stub_text_end_count, .long ${num_ool_stubs_text_end})
>
>  SYM_CODE_START(ftrace_ool_stub_text_end)
> -       .space ${num_ool_stubs_text} * FTRACE_OOL_STUB_SIZE
> +       .space ${num_ool_stubs_text_end} * FTRACE_OOL_STUB_SIZE
>  SYM_CODE_END(ftrace_ool_stub_text_end)
>  .popsection
>
> --
> 2.46.0
>


--
Best Regards
Masahiro Yamada


* Re: [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines
  2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
                   ` (16 preceding siblings ...)
  2024-09-15 20:56 ` [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines Hari Bathini
@ 2024-10-09 15:46 ` Masahiro Yamada
  17 siblings, 0 replies; 36+ messages in thread
From: Masahiro Yamada @ 2024-10-09 15:46 UTC (permalink / raw)
  To: Hari Bathini
  Cc: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel,
	Naveen N. Rao, Mark Rutland, Daniel Borkmann, Nicholas Piggin,
	Alexei Starovoitov, Steven Rostedt, Andrii Nakryiko,
	Christophe Leroy, Vishal Chourasia, Mahesh J Salgaonkar,
	Masami Hiramatsu

[-- Attachment #1: Type: text/plain, Size: 8950 bytes --]

On Mon, Sep 16, 2024 at 5:57 AM Hari Bathini <hbathini@linux.ibm.com> wrote:
>
> This is v5 of the series posted here:
> https://lore.kernel.org/all/cover.1720942106.git.naveen@kernel.org/
>
> This series reworks core ftrace support on powerpc to have the function
> profiling sequence moved out of line. This enables us to have a single
> nop at kernel function entry virtually eliminating effect of the
> function tracer when it is not enabled. The function profile sequence is
> moved out of line and is allocated at two separate places depending on a
> new config option.
>
> For 64-bit powerpc, the function profiling sequence is also updated to
> include an additional instruction 'mtlr r0' after the usual
> two-instruction sequence to fix link stack imbalance (return address
> predictor) when ftrace is enabled. This showed an improvement of ~10%
> in null_syscall benchmark (NR_LOOPS=10000000) on a Power 10 system
> with ftrace enabled.
>
> Finally, support for ftrace direct calls is added based on support for
> DYNAMIC_FTRACE_WITH_CALL_OPS. BPF Trampoline support is added atop this.
>
> Support for ftrace direct calls is added for 32-bit powerpc. There is
> some code to enable bpf trampolines for 32-bit powerpc, but it is not
> complete and will need to be pursued separately.
>
> Patches 1 to 10 are independent of this series and can go in separately
> though. Rest of the patches depend on the series from Benjamin Gray
> adding support for patch_uint() and patch_ulong():
> https://lore.kernel.org/all/172474280311.31690.1489687786264785049.b4-ty@ellerman.id.au/



It is getting better.

I attached a diff for improvements.



Also, please run 'shellcheck' and eliminate
as many warnings as you can.






$ shellcheck  arch/powerpc/tools/ftrace-gen-ool-stubs.sh

In arch/powerpc/tools/ftrace-gen-ool-stubs.sh line 19:
num_ool_stubs_text=$(${OBJDUMP} -r -j __patchable_function_entries ${vmlinux_o} |
                                                                   ^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
num_ool_stubs_text=$(${OBJDUMP} -r -j __patchable_function_entries "${vmlinux_o}" |


In arch/powerpc/tools/ftrace-gen-ool-stubs.sh line 20:
     grep -v ".init.text" | grep "${RELOCATION}" | wc -l)
                            ^------------------^ SC2126 (style): Consider using 'grep -c' instead of 'grep|wc -l'.


In arch/powerpc/tools/ftrace-gen-ool-stubs.sh line 21:
num_ool_stubs_inittext=$(${OBJDUMP} -r -j __patchable_function_entries ${vmlinux_o} |
                                                                       ^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
num_ool_stubs_inittext=$(${OBJDUMP} -r -j __patchable_function_entries "${vmlinux_o}" |


In arch/powerpc/tools/ftrace-gen-ool-stubs.sh line 22:
grep ".init.text" | grep "${RELOCATION}" | wc -l)
                    ^------------------^ SC2126 (style): Consider using 'grep -c' instead of 'grep|wc -l'.


In arch/powerpc/tools/ftrace-gen-ool-stubs.sh line 25:
if [ ${num_ool_stubs_text} -gt ${num_ool_stubs_text_builtin} ]; then
     ^-------------------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                               ^---------------------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
if [ "${num_ool_stubs_text}" -gt "${num_ool_stubs_text_builtin}" ]; then


In arch/powerpc/tools/ftrace-gen-ool-stubs.sh line 26:
num_ool_stubs_text_end=$(expr ${num_ool_stubs_text} - ${num_ool_stubs_text_builtin})
                         ^--^ SC2003 (style): expr is antiquated. Consider rewriting this using $((..)), ${} or [[ ]].
                              ^-------------------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                                                      ^---------------------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
num_ool_stubs_text_end=$(expr "${num_ool_stubs_text}" - "${num_ool_stubs_text_builtin}")


In arch/powerpc/tools/ftrace-gen-ool-stubs.sh line 31:
cat > ${arch_vmlinux_S} <<EOF
      ^---------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
cat > "${arch_vmlinux_S}" <<EOF

For more information:
  https://www.shellcheck.net/wiki/SC2086 -- Double quote to prevent globbing ...
  https://www.shellcheck.net/wiki/SC2003 -- expr is antiquated. Consider rewr...
  https://www.shellcheck.net/wiki/SC2126 -- Consider using 'grep -c' instead ...
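
For the SC2003 and SC2086 reports around the stub arithmetic, one possible shape is sketched below. It is an illustration rather than the required fix: the function name stubs_at_text_end is hypothetical, and it simply combines quoted expansions with POSIX $((..)) arithmetic in place of expr.

```shell
#!/bin/sh
# Sketch only: number of stubs still needed at the end of .text once the
# area reserved inside .text is used up; never negative.
#   $1: traceable functions found in .text
#   $2: stubs reserved inside .text (the NUM_RESERVE value)
# stubs_at_text_end is a hypothetical name, not part of the posted patch.
stubs_at_text_end() {
	if [ "$1" -gt "$2" ]; then
		echo "$(( $1 - $2 ))"
	else
		echo 0
	fi
}
```

With the figures from the patch 13 description (~44k traceable functions, 32768 reserved), stubs_at_text_end 44000 32768 leaves 11232 stubs for the end of .text, while a kernel with fewer traceable functions than the reserve gets 0.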











> Changelog v5:
> * Intermediate files named .vmlinux.arch.* instead of .arch.vmlinux.*
> * Fixed ftrace stack tracer failure due to inadvertent use of
>   'add r7, r3, MCOUNT_INSN_SIZE' instruction instead of
>   'addi r7, r3, MCOUNT_INSN_SIZE'
> * Fixed build error for !CONFIG_MODULES case.
> * .vmlinux.arch.* files compiled under arch/powerpc/tools
> * Made sure .vmlinux.arch.* files are cleaned with `make clean`
> * num_ool_stubs_text_end used for setting up ftrace_ool_stub_text_end
>   set to zero instead of computing to some random negative value when
>   not required.
> * Resolved checkpatch.pl warnings.
> * Dropped RFC tag.
>
> Changelog v4:
> - Patches 1, 10 and 13 are new.
> - Address review comments from Nick. Numerous changes throughout the
>   patch series.
> - Extend support for ftrace ool to vmlinux text up to 64MB (patch 13).
> - Address remaining TODOs in support for BPF Trampolines.
> - Update synchronization when patching instructions during trampoline
>   attach/detach.
>
>
> Naveen N Rao (17):
>   powerpc/trace: Account for -fpatchable-function-entry support by
>     toolchain
>   powerpc/kprobes: Use ftrace to determine if a probe is at function
>     entry
>   powerpc64/ftrace: Nop out additional 'std' instruction emitted by gcc
>     v5.x
>   powerpc32/ftrace: Unify 32-bit and 64-bit ftrace entry code
>   powerpc/module_64: Convert #ifdef to IS_ENABLED()
>   powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace
>   powerpc/ftrace: Skip instruction patching if the instructions are the
>     same
>   powerpc/ftrace: Move ftrace stub used for init text before _einittext
>   powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into
>     bpf_jit_emit_func_call_rel()
>   powerpc/ftrace: Add a postlink script to validate function tracer
>   kbuild: Add generic hook for architectures to use before the final
>     vmlinux link
>   powerpc64/ftrace: Move ftrace sequence out of line
>   powerpc64/ftrace: Support .text larger than 32MB with out-of-line
>     stubs
>   powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
>   powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>   samples/ftrace: Add support for ftrace direct samples on powerpc
>   powerpc64/bpf: Add support for bpf trampolines
>
>  arch/Kconfig                                |   6 +
>  arch/powerpc/Kbuild                         |   2 +-
>  arch/powerpc/Kconfig                        |  23 +-
>  arch/powerpc/Makefile                       |   8 +
>  arch/powerpc/Makefile.postlink              |   8 +
>  arch/powerpc/include/asm/ftrace.h           |  33 +-
>  arch/powerpc/include/asm/module.h           |   5 +
>  arch/powerpc/include/asm/ppc-opcode.h       |  14 +
>  arch/powerpc/kernel/asm-offsets.c           |  11 +
>  arch/powerpc/kernel/kprobes.c               |  18 +-
>  arch/powerpc/kernel/module_64.c             |  66 +-
>  arch/powerpc/kernel/trace/Makefile          |  11 +-
>  arch/powerpc/kernel/trace/ftrace.c          | 298 ++++++-
>  arch/powerpc/kernel/trace/ftrace_64_pg.c    |  69 +-
>  arch/powerpc/kernel/trace/ftrace_entry.S    | 244 ++++--
>  arch/powerpc/kernel/vmlinux.lds.S           |   3 +-
>  arch/powerpc/net/bpf_jit.h                  |  12 +
>  arch/powerpc/net/bpf_jit_comp.c             | 847 +++++++++++++++++++-
>  arch/powerpc/net/bpf_jit_comp32.c           |   7 +-
>  arch/powerpc/net/bpf_jit_comp64.c           |  68 +-
>  arch/powerpc/tools/Makefile                 |  12 +
>  arch/powerpc/tools/ftrace-gen-ool-stubs.sh  |  52 ++
>  arch/powerpc/tools/ftrace_check.sh          |  50 ++
>  samples/ftrace/ftrace-direct-modify.c       |  85 +-
>  samples/ftrace/ftrace-direct-multi-modify.c | 101 ++-
>  samples/ftrace/ftrace-direct-multi.c        |  79 +-
>  samples/ftrace/ftrace-direct-too.c          |  83 +-
>  samples/ftrace/ftrace-direct.c              |  69 +-
>  scripts/Makefile.vmlinux                    |   7 +
>  scripts/link-vmlinux.sh                     |   7 +-
>  30 files changed, 2098 insertions(+), 200 deletions(-)
>  create mode 100644 arch/powerpc/tools/Makefile
>  create mode 100755 arch/powerpc/tools/ftrace-gen-ool-stubs.sh
>  create mode 100755 arch/powerpc/tools/ftrace_check.sh
>
> --
> 2.46.0
>


-- 
Best Regards
Masahiro Yamada

[-- Attachment #2: 0001-fixup.patch --]
[-- Type: text/x-patch, Size: 4765 bytes --]

From 0e96689efc977542a47e815a78892833e0305d79 Mon Sep 17 00:00:00 2001
From: Masahiro Yamada <masahiroy@kernel.org>
Date: Wed, 9 Oct 2024 23:37:47 +0900
Subject: [PATCH] fixup

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---
 arch/Kconfig                               | 2 +-
 arch/powerpc/Kconfig                       | 5 ++---
 arch/powerpc/tools/.gitignore              | 2 ++
 arch/powerpc/tools/Makefile                | 7 ++-----
 arch/powerpc/tools/ftrace-gen-ool-stubs.sh | 5 ++---
 scripts/Makefile.vmlinux                   | 4 ++--
 scripts/link-vmlinux.sh                    | 2 +-
 7 files changed, 12 insertions(+), 15 deletions(-)
 create mode 100644 arch/powerpc/tools/.gitignore

diff --git a/arch/Kconfig b/arch/Kconfig
index 87806750cf4e..a1538927c8c1 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1685,7 +1685,7 @@ config ARCH_NEED_CMPXCHG_1_EMU
 	bool
 
 config ARCH_WANTS_PRE_LINK_VMLINUX
-	def_bool n
+	bool
 	help
 	  An architecture can select this if it provides arch/<arch>/tools/Makefile
 	  with .arch.vmlinux.o target to be linked into vmlinux.
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8a31f61f1b34..c85470b24118 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -575,13 +575,12 @@ config ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 
 config PPC_FTRACE_OUT_OF_LINE
 	def_bool PPC64 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY
-	depends on PPC64
 	select ARCH_WANTS_PRE_LINK_VMLINUX
 
 config PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE
 	int "Number of ftrace out-of-line stubs to reserve within .text"
-	default 32768 if PPC_FTRACE_OUT_OF_LINE
-	default 0
+	depends on PPC_FTRACE_OUT_OF_LINE
+	default 32768
 	help
 	  Number of stubs to reserve for use by ftrace. This space is
 	  reserved within .text, and is distinct from any additional space
diff --git a/arch/powerpc/tools/.gitignore b/arch/powerpc/tools/.gitignore
new file mode 100644
index 000000000000..ec380a14a09a
--- /dev/null
+++ b/arch/powerpc/tools/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+/vmlinux.arch.S
diff --git a/arch/powerpc/tools/Makefile b/arch/powerpc/tools/Makefile
index 9eeb6edf02fe..96dbbc4f3e66 100644
--- a/arch/powerpc/tools/Makefile
+++ b/arch/powerpc/tools/Makefile
@@ -3,10 +3,7 @@
 quiet_cmd_gen_ftrace_ool_stubs = GEN     $@
       cmd_gen_ftrace_ool_stubs = $< $(CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE) vmlinux.o $@
 
-$(obj)/.vmlinux.arch.S: $(src)/ftrace-gen-ool-stubs.sh vmlinux.o FORCE
+$(obj)/vmlinux.arch.S: $(src)/ftrace-gen-ool-stubs.sh vmlinux.o FORCE
 	$(call if_changed,gen_ftrace_ool_stubs)
 
-$(obj)/.vmlinux.arch.o: $(obj)/.vmlinux.arch.S FORCE
-	$(call if_changed_rule,as_o_S)
-
-clean-files += .vmlinux.arch.S .vmlinux.arch.o
+targets += vmlinux.arch.S
diff --git a/arch/powerpc/tools/ftrace-gen-ool-stubs.sh b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
index 33f5ae4bace5..c69b375309bc 100755
--- a/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
+++ b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh
@@ -10,16 +10,15 @@ is_enabled() {
 
 vmlinux_o=${2}
 arch_vmlinux_S=${3}
-arch_vmlinux_o=$(dirname ${arch_vmlinux_S})/$(basename ${arch_vmlinux_S} .S).o
 
 RELOCATION=R_PPC64_ADDR64
 if is_enabled CONFIG_PPC32; then
 	RELOCATION=R_PPC_ADDR32
 fi
 
-num_ool_stubs_text=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |
+num_ool_stubs_text=$(${OBJDUMP} -r -j __patchable_function_entries ${vmlinux_o} |
 		     grep -v ".init.text" | grep "${RELOCATION}" | wc -l)
-num_ool_stubs_inittext=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |
+num_ool_stubs_inittext=$(${OBJDUMP} -r -j __patchable_function_entries ${vmlinux_o} |
 			 grep ".init.text" | grep "${RELOCATION}" | wc -l)
 
 num_ool_stubs_text_builtin=${1}
diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index 8f08117f4a48..dddad554e912 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -23,9 +23,9 @@ vmlinux: .vmlinux.export.o
 endif
 
 ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
-vmlinux: arch/$(SRCARCH)/tools/.vmlinux.arch.o
+vmlinux: arch/$(SRCARCH)/tools/vmlinux.arch.o
 
-arch/$(SRCARCH)/tools/.vmlinux.arch.o: vmlinux.o
+arch/$(SRCARCH)/tools/vmlinux.arch.o: vmlinux.o FORCE
 	$(Q)$(MAKE) $(build)=arch/$(SRCARCH)/tools $@
 endif
 
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 33c1aa8dd468..7acf4e31e51c 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -200,7 +200,7 @@ ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init init/version-timestamp.o
 
 arch_vmlinux_o=""
 if is_enabled CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX; then
-	arch_vmlinux_o=arch/${SRCARCH}/tools/.vmlinux.arch.o
+	arch_vmlinux_o=arch/${SRCARCH}/tools/vmlinux.arch.o
 fi
 
 btf_vmlinux_bin_o=
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread
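
The stub-counting step in ftrace-gen-ool-stubs.sh above can be tried in isolation. This is a minimal sketch, not the script itself: count_relocs and sample_relocs are hypothetical names, and the sample lines only approximate what `objdump -r -j __patchable_function_entries` prints. It reads a relocation listing on stdin and counts entries of one relocation type, split by whether they refer to .init.text.

```shell
#!/bin/sh
# count_relocs RELOCATION MODE < relocation-listing
# MODE "init" counts entries that refer to .init.text; any other value
# counts the remaining entries. (Hypothetical helper for illustration.)
count_relocs() {
	reloc=$1
	mode=$2
	if [ "$mode" = init ]; then
		grep -F ".init.text" | grep -cF "$reloc"
	else
		grep -vF ".init.text" | grep -cF "$reloc"
	fi
}

# Illustrative input only; real objdump output may be laid out differently.
sample_relocs='0000000000000000 R_PPC64_ADDR64    .text+0x0000000000000008
0000000000000008 R_PPC64_ADDR64    .text+0x0000000000000048
0000000000000010 R_PPC64_ADDR64    .init.text+0x0000000000000008'

printf '%s\n' "$sample_relocs" | count_relocs R_PPC64_ADDR64 text   # prints 2
printf '%s\n' "$sample_relocs" | count_relocs R_PPC64_ADDR64 init   # prints 1
```

The sketch uses `grep -F` so that the dots in ".init.text" match literally, and `grep -c` in place of `grep | wc -l`; the script in the patch uses plain `grep` and `wc -l`.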

* Re: [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines
  2024-10-01 14:53           ` Alexei Starovoitov
  2024-10-03  5:33             ` Hari Bathini
@ 2024-10-10  0:18             ` Michael Ellerman
  2024-10-10  9:39               ` Hari Bathini
  1 sibling, 1 reply; 36+ messages in thread
From: Michael Ellerman @ 2024-10-10  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov, Hari Bathini
  Cc: linuxppc-dev, bpf, linux-trace-kernel, Linux Kbuild mailing list,
	LKML, Naveen N. Rao, Mark Rutland, Daniel Borkmann,
	Masahiro Yamada, Nicholas Piggin, Alexei Starovoitov,
	Steven Rostedt, Andrii Nakryiko, Christophe Leroy,
	Vishal Chourasia, Mahesh J Salgaonkar, Masami Hiramatsu

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Tue, Oct 1, 2024 at 12:18 AM Hari Bathini <hbathini@linux.ibm.com> wrote:
>> On 30/09/24 6:25 pm, Alexei Starovoitov wrote:
>> > On Sun, Sep 29, 2024 at 10:33 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>> >> On 17/09/24 1:20 pm, Alexei Starovoitov wrote:
>> >>> On Sun, Sep 15, 2024 at 10:58 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>> >>>>
>> >>>> +
>> >>>> +       /*
>> >>>> +        * Generated stack layout:
>> >>>> +        *
>> >>>> +        * func prev back chain         [ back chain        ]
>> >>>> +        *                              [                   ]
>> >>>> +        * bpf prog redzone/tailcallcnt [ ...               ] 64 bytes (64-bit powerpc)
>> >>>> +        *                              [                   ] --
>> >>> ...
>> >>>> +
>> >>>> +       /* Dummy frame size for proper unwind - includes 64-bytes red zone for 64-bit powerpc */
>> >>>> +       bpf_dummy_frame_size = STACK_FRAME_MIN_SIZE + 64;
>> >>>
>> >>> What is the goal of such a large "red zone" ?
>> >>> The kernel stack is a limited resource.
>> >>> Why reserve 64 bytes ?
>> >>> tail call cnt can probably be optional as well.
>> >>
>> >> Hi Alexei, thanks for reviewing.
>> >> FWIW, the redzone on ppc64 is 288 bytes. BPF JIT for ppc64 was using
>> >> a redzone of 80 bytes since tailcall support was introduced [1].
>> >> It came down to 64 bytes thanks to [2]. The red zone is being used
>> >> to save NVRs and tail call count when a stack is not setup. I do
>> >> agree that we should look at optimizing it further. Do you think
>> >> the optimization should go as part of PPC64 trampoline enablement
>> >> being done here or should that be taken up as a separate item, maybe?
>> >
>> > The follow up is fine.
>> > It just odd to me that we currently have:
>> >
>> > [   unused red zone ] 208 bytes protected
>> >
>> > I simply don't understand why we need to waste this much stack space.
>> > Why can't it be zero today ?
>>
>> The ABI for ppc64 has a redzone of 288 bytes below the current
>> stack pointer that can be used as a scratch area until a new
>> stack frame is created. So, no wastage of stack space as such.
>> It is just red zone that can be used before a new stack frame
>> is created. The comment there is only to show how redzone is
>> being used in ppc64 BPF JIT. I think the confusion is with the
>> mention of "208 bytes" as protected. As not all of that scratch
>> area is used, it mentions the remaining as unused. Essentially
>> 288 bytes below current stack pointer is protected from debuggers
>> and interrupt code (red zone). Note that it should be 224 bytes
>> of unused red zone instead of 208 bytes as red zone usage in
>> ppc64 BPF JIT come down from 80 bytes to 64 bytes since [2].
>> Hope that clears the misunderstanding..
>
> I see. That makes sense. So it's similar to amd64 red zone,
> but there we have an issue with irqs, hence the kernel is
> compiled with -mno-red-zone.

I assume that issue is that the interrupt entry unconditionally writes
some data below the stack pointer, disregarding the red zone?

> I guess ppc always has a different interrupt stack and
> it's not an issue?

No, the interrupt entry allocates a frame that is big enough to cover
the red zone as well as the space it needs to save registers.

See STACK_INT_FRAME_SIZE which includes KERNEL_REDZONE_SIZE:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/include/asm/ptrace.h?commit=8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b#n165

Which is renamed to INT_FRAME_SIZE in asm-offsets.c and then is used in
the interrupt entry here:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/kernel/exceptions-64s.S?commit=8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b#n497

cheers

^ permalink raw reply	[flat|nested] 36+ messages in thread
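
The red zone figures discussed in this sub-thread follow from simple subtraction. The sketch below only restates that arithmetic: the three sizes are the ones quoted in the thread, while the function names and the register save size passed to int_frame_size are hypothetical and not taken from the kernel headers.

```shell
#!/bin/sh
# Sizes in bytes, as quoted in the thread.
PPC64_ABI_RED_ZONE=288   # ABI red zone below the stack pointer
BPF_JIT_RED_ZONE_OLD=80  # BPF JIT usage since tail call support [1]
BPF_JIT_RED_ZONE_NEW=64  # BPF JIT usage after [2]

# unused_red_zone USED: bytes of the ABI red zone left untouched.
unused_red_zone() {
	echo $((PPC64_ABI_RED_ZONE - $1))
}

# int_frame_size REG_SAVE_BYTES: an interrupt frame that skips over the
# whole red zone before saving registers. REG_SAVE_BYTES is a placeholder;
# the real value comes from STACK_INT_FRAME_SIZE in the kernel sources.
int_frame_size() {
	echo $((PPC64_ABI_RED_ZONE + $1))
}

unused_red_zone "$BPF_JIT_RED_ZONE_OLD"   # prints 208
unused_red_zone "$BPF_JIT_RED_ZONE_NEW"   # prints 224
```

This matches the correction made in the thread: with 64 bytes in use, the comment should read 224 bytes of unused red zone rather than 208.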

* Re: [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines
  2024-10-10  0:18             ` Michael Ellerman
@ 2024-10-10  9:39               ` Hari Bathini
  2024-10-10  9:46                 ` Hari Bathini
  0 siblings, 1 reply; 36+ messages in thread
From: Hari Bathini @ 2024-10-10  9:39 UTC (permalink / raw)
  To: Michael Ellerman, Alexei Starovoitov
  Cc: linuxppc-dev, bpf, linux-trace-kernel, Linux Kbuild mailing list,
	LKML, Naveen N. Rao, Mark Rutland, Daniel Borkmann,
	Masahiro Yamada, Nicholas Piggin, Alexei Starovoitov,
	Steven Rostedt, Andrii Nakryiko, Christophe Leroy,
	Vishal Chourasia, Mahesh J Salgaonkar, Masami Hiramatsu



On 10/10/24 5:48 am, Michael Ellerman wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>> On Tue, Oct 1, 2024 at 12:18 AM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>> On 30/09/24 6:25 pm, Alexei Starovoitov wrote:
>>>> On Sun, Sep 29, 2024 at 10:33 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>>>> On 17/09/24 1:20 pm, Alexei Starovoitov wrote:
>>>>>> On Sun, Sep 15, 2024 at 10:58 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>>>>>>
>>>>>>> +
>>>>>>> +       /*
>>>>>>> +        * Generated stack layout:
>>>>>>> +        *
>>>>>>> +        * func prev back chain         [ back chain        ]
>>>>>>> +        *                              [                   ]
>>>>>>> +        * bpf prog redzone/tailcallcnt [ ...               ] 64 bytes (64-bit powerpc)
>>>>>>> +        *                              [                   ] --
>>>>>> ...
>>>>>>> +
>>>>>>> +       /* Dummy frame size for proper unwind - includes 64-bytes red zone for 64-bit powerpc */
>>>>>>> +       bpf_dummy_frame_size = STACK_FRAME_MIN_SIZE + 64;
>>>>>>
>>>>>> What is the goal of such a large "red zone" ?
>>>>>> The kernel stack is a limited resource.
>>>>>> Why reserve 64 bytes ?
>>>>>> tail call cnt can probably be optional as well.
>>>>>
>>>>> Hi Alexei, thanks for reviewing.
>>>>> FWIW, the redzone on ppc64 is 288 bytes. BPF JIT for ppc64 was using
>>>>> a redzone of 80 bytes since tailcall support was introduced [1].
>>>>> It came down to 64 bytes thanks to [2]. The red zone is being used
>>>>> to save NVRs and tail call count when a stack is not setup. I do
>>>>> agree that we should look at optimizing it further. Do you think
>>>>> the optimization should go as part of PPC64 trampoline enablement
>>>>> being done here or should that be taken up as a separate item, maybe?
>>>>
>>>> The follow up is fine.
>>>> It just odd to me that we currently have:
>>>>
>>>> [   unused red zone ] 208 bytes protected
>>>>
>>>> I simply don't understand why we need to waste this much stack space.
>>>> Why can't it be zero today ?
>>>
>>> The ABI for ppc64 has a redzone of 288 bytes below the current
>>> stack pointer that can be used as a scratch area until a new
>>> stack frame is created. So, no wastage of stack space as such.
>>> It is just red zone that can be used before a new stack frame
>>> is created. The comment there is only to show how redzone is
>>> being used in ppc64 BPF JIT. I think the confusion is with the
>>> mention of "208 bytes" as protected. As not all of that scratch
>>> area is used, it mentions the remaining as unused. Essentially
>>> 288 bytes below current stack pointer is protected from debuggers
>>> and interrupt code (red zone). Note that it should be 224 bytes
>>> of unused red zone instead of 208 bytes as red zone usage in
>>> ppc64 BPF JIT come down from 80 bytes to 64 bytes since [2].
>>> Hope that clears the misunderstanding..
>>
>> I see. That makes sense. So it's similar to amd64 red zone,
>> but there we have an issue with irqs, hence the kernel is
>> compiled with -mno-red-zone.
> 
> I assume that issue is that the interrupt entry unconditionally writes
> some data below the stack pointer, disregarding the red zone?
> 
>> I guess ppc always has a different interrupt stack and
>> it's not an issue?
> 
> No, the interrupt entry allocates a frame that is big enough to cover
> the red zone as well as the space it needs to save registers.
> 
> See STACK_INT_FRAME_SIZE which includes KERNEL_REDZONE_SIZE:
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/include/asm/ptrace.h?commit=8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b#n165
> 
> Which is renamed to INT_FRAME_SIZE in asm-offsets.c and then is used in
> the interrupt entry here:
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/kernel/exceptions-64s.S?commit=8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b#n497

Thanks for clarifying that, Michael.
Only async interrupt handlers use different interrupt stacks, right?

Thanks
Hari

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines
  2024-10-10  9:39               ` Hari Bathini
@ 2024-10-10  9:46                 ` Hari Bathini
  2024-10-28  5:46                   ` Michael Ellerman
  0 siblings, 1 reply; 36+ messages in thread
From: Hari Bathini @ 2024-10-10  9:46 UTC (permalink / raw)
  To: Michael Ellerman, Alexei Starovoitov
  Cc: linuxppc-dev, bpf, linux-trace-kernel, Linux Kbuild mailing list,
	LKML, Naveen N. Rao, Mark Rutland, Daniel Borkmann,
	Masahiro Yamada, Nicholas Piggin, Alexei Starovoitov,
	Steven Rostedt, Andrii Nakryiko, Christophe Leroy,
	Vishal Chourasia, Mahesh J Salgaonkar, Masami Hiramatsu



On 10/10/24 3:09 pm, Hari Bathini wrote:
> 
> 
> On 10/10/24 5:48 am, Michael Ellerman wrote:
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>> On Tue, Oct 1, 2024 at 12:18 AM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>>> On 30/09/24 6:25 pm, Alexei Starovoitov wrote:
>>>>> On Sun, Sep 29, 2024 at 10:33 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>>>>> On 17/09/24 1:20 pm, Alexei Starovoitov wrote:
>>>>>>> On Sun, Sep 15, 2024 at 10:58 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>>>>>>>
>>>>>>>> +
>>>>>>>> +       /*
>>>>>>>> +        * Generated stack layout:
>>>>>>>> +        *
>>>>>>>> +        * func prev back chain         [ back chain        ]
>>>>>>>> +        *                              [                   ]
>>>>>>>> +        * bpf prog redzone/tailcallcnt [ ...               ] 64 bytes (64-bit powerpc)
>>>>>>>> +        *                              [                   ] --
>>>>>>> ...
>>>>>>>> +
>>>>>>>> +       /* Dummy frame size for proper unwind - includes 64-bytes red zone for 64-bit powerpc */
>>>>>>>> +       bpf_dummy_frame_size = STACK_FRAME_MIN_SIZE + 64;
>>>>>>>
>>>>>>> What is the goal of such a large "red zone" ?
>>>>>>> The kernel stack is a limited resource.
>>>>>>> Why reserve 64 bytes ?
>>>>>>> tail call cnt can probably be optional as well.
>>>>>>
>>>>>> Hi Alexei, thanks for reviewing.
>>>>>> FWIW, the redzone on ppc64 is 288 bytes. BPF JIT for ppc64 was using
>>>>>> a redzone of 80 bytes since tailcall support was introduced [1].
>>>>>> It came down to 64 bytes thanks to [2]. The red zone is being used
>>>>>> to save NVRs and tail call count when a stack is not setup. I do
>>>>>> agree that we should look at optimizing it further. Do you think
>>>>>> the optimization should go as part of PPC64 trampoline enablement
>>>>>> being done here or should that be taken up as a separate item, maybe?
>>>>>
>>>>> The follow up is fine.
>>>>> It just odd to me that we currently have:
>>>>>
>>>>> [   unused red zone ] 208 bytes protected
>>>>>
>>>>> I simply don't understand why we need to waste this much stack space.
>>>>> Why can't it be zero today ?
>>>>
>>>> The ABI for ppc64 has a redzone of 288 bytes below the current
>>>> stack pointer that can be used as a scratch area until a new
>>>> stack frame is created. So, no wastage of stack space as such.
>>>> It is just red zone that can be used before a new stack frame
>>>> is created. The comment there is only to show how redzone is
>>>> being used in ppc64 BPF JIT. I think the confusion is with the
>>>> mention of "208 bytes" as protected. As not all of that scratch
>>>> area is used, it mentions the remaining as unused. Essentially
>>>> 288 bytes below current stack pointer is protected from debuggers
>>>> and interrupt code (red zone). Note that it should be 224 bytes
>>>> of unused red zone instead of 208 bytes as red zone usage in
>>>> ppc64 BPF JIT come down from 80 bytes to 64 bytes since [2].
>>>> Hope that clears the misunderstanding..
>>>
>>> I see. That makes sense. So it's similar to amd64 red zone,
>>> but there we have an issue with irqs, hence the kernel is
>>> compiled with -mno-red-zone.
>>
>> I assume that issue is that the interrupt entry unconditionally writes
>> some data below the stack pointer, disregarding the red zone?
>>
>>> I guess ppc always has a different interrupt stack and
>>> it's not an issue?
>>
>> No, the interrupt entry allocates a frame that is big enough to cover
>> the red zone as well as the space it needs to save registers.
>>
>> See STACK_INT_FRAME_SIZE which includes KERNEL_REDZONE_SIZE:
>>
>>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/include/asm/ptrace.h?commit=8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b#n165
>>
>> Which is renamed to INT_FRAME_SIZE in asm-offsets.c and then is used in
>> the interrupt entry here:
>>
>>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/kernel/exceptions-64s.S?commit=8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b#n497
> 
> Thanks for clarifying that, Michael.
> Only async interrupt handlers use different interrupt stacks, right?

... and separate emergency stack for some special cases...

Thanks
Hari

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 11/17] kbuild: Add generic hook for architectures to use before the final vmlinux link
  2024-10-09 15:23   ` Masahiro Yamada
@ 2024-10-10  9:56     ` Hari Bathini
  2024-10-10 11:37       ` Masahiro Yamada
  0 siblings, 1 reply; 36+ messages in thread
From: Hari Bathini @ 2024-10-10  9:56 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel,
	Naveen N. Rao, Mark Rutland, Daniel Borkmann, Nicholas Piggin,
	Alexei Starovoitov, Steven Rostedt, Andrii Nakryiko,
	Christophe Leroy, Vishal Chourasia, Mahesh J Salgaonkar,
	Masami Hiramatsu


On 09/10/24 8:53 pm, Masahiro Yamada wrote:
> On Mon, Sep 16, 2024 at 5:58 AM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>
>> From: Naveen N Rao <naveen@kernel.org>
>>
>> On powerpc, we would like to be able to make a pass on vmlinux.o and
>> generate a new object file to be linked into vmlinux. Add a generic pass
>> in Makefile.vmlinux that architectures can use for this purpose.
>>
>> Architectures need to select CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX and must
>> provide arch/<arch>/tools/Makefile with .arch.vmlinux.o target, which
>> will be invoked prior to the final vmlinux link step.
>>
>> Signed-off-by: Naveen N Rao <naveen@kernel.org>
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>> ---
>>
>> Changes in v5:
>> * Intermediate files named .vmlinux.arch.* instead of .arch.vmlinux.*
>>
>>
>>   arch/Kconfig             | 6 ++++++
>>   scripts/Makefile.vmlinux | 7 +++++++
>>   scripts/link-vmlinux.sh  | 7 ++++++-
>>   3 files changed, 19 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/Kconfig b/arch/Kconfig
>> index 975dd22a2dbd..ef868ff8156a 100644
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -1643,4 +1643,10 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
>>   config ARCH_NEED_CMPXCHG_1_EMU
>>          bool
>>
>> +config ARCH_WANTS_PRE_LINK_VMLINUX
>> +       def_bool n
> 
> 
> Redundant default. This line should be "bool".
> 
> 
> 
> 
> 
> 
>> +       help
>> +         An architecture can select this if it provides arch/<arch>/tools/Makefile
>> +         with .arch.vmlinux.o target to be linked into vmlinux.
>> +
>>   endmenu
>> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
>> index 49946cb96844..edf6fae8d960 100644
>> --- a/scripts/Makefile.vmlinux
>> +++ b/scripts/Makefile.vmlinux
>> @@ -22,6 +22,13 @@ targets += .vmlinux.export.o
>>   vmlinux: .vmlinux.export.o
>>   endif
>>
>> +ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
>> +vmlinux: arch/$(SRCARCH)/tools/.vmlinux.arch.o
> 
> If you move this to arch/*/tools/, there is no reason
> to make it a hidden file.

Thanks for reviewing and the detailed comments, Masahiro.

> 
> 
> vmlinux: arch/$(SRCARCH)/tools/vmlinux.arch.o
> 
> 
> 
> 
>> +arch/$(SRCARCH)/tools/.vmlinux.arch.o: vmlinux.o
> 
> FORCE is missing.


I dropped FORCE because it was rebuilding vmlinux on every invocation
of `make`, irrespective of whether vmlinux.o had changed.
Just curious: do the changes you suggested make FORCE necessary,
or was FORCE expected even without them?

Thanks
Hari

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 11/17] kbuild: Add generic hook for architectures to use before the final vmlinux link
  2024-10-10  9:56     ` Hari Bathini
@ 2024-10-10 11:37       ` Masahiro Yamada
  2024-10-24 17:20         ` Hari Bathini
  0 siblings, 1 reply; 36+ messages in thread
From: Masahiro Yamada @ 2024-10-10 11:37 UTC (permalink / raw)
  To: Hari Bathini
  Cc: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel,
	Naveen N. Rao, Mark Rutland, Daniel Borkmann, Nicholas Piggin,
	Alexei Starovoitov, Steven Rostedt, Andrii Nakryiko,
	Christophe Leroy, Vishal Chourasia, Mahesh J Salgaonkar,
	Masami Hiramatsu

On Thu, Oct 10, 2024 at 6:57 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>
>
> On 09/10/24 8:53 pm, Masahiro Yamada wrote:
> > On Mon, Sep 16, 2024 at 5:58 AM Hari Bathini <hbathini@linux.ibm.com> wrote:
> >>
> >> From: Naveen N Rao <naveen@kernel.org>
> >>
> >> On powerpc, we would like to be able to make a pass on vmlinux.o and
> >> generate a new object file to be linked into vmlinux. Add a generic pass
> >> in Makefile.vmlinux that architectures can use for this purpose.
> >>
> >> Architectures need to select CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX and must
> >> provide arch/<arch>/tools/Makefile with .arch.vmlinux.o target, which
> >> will be invoked prior to the final vmlinux link step.
> >>
> >> Signed-off-by: Naveen N Rao <naveen@kernel.org>
> >> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> >> ---
> >>
> >> Changes in v5:
> >> * Intermediate files named .vmlinux.arch.* instead of .arch.vmlinux.*
> >>
> >>
> >>   arch/Kconfig             | 6 ++++++
> >>   scripts/Makefile.vmlinux | 7 +++++++
> >>   scripts/link-vmlinux.sh  | 7 ++++++-
> >>   3 files changed, 19 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/Kconfig b/arch/Kconfig
> >> index 975dd22a2dbd..ef868ff8156a 100644
> >> --- a/arch/Kconfig
> >> +++ b/arch/Kconfig
> >> @@ -1643,4 +1643,10 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
> >>   config ARCH_NEED_CMPXCHG_1_EMU
> >>          bool
> >>
> >> +config ARCH_WANTS_PRE_LINK_VMLINUX
> >> +       def_bool n
> >
> >
> > Redundant default. This line should be "bool".
> >
> >
> >
> >
> >
> >
> >> +       help
> >> +         An architecture can select this if it provides arch/<arch>/tools/Makefile
> >> +         with .arch.vmlinux.o target to be linked into vmlinux.
> >> +
> >>   endmenu
> >> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> >> index 49946cb96844..edf6fae8d960 100644
> >> --- a/scripts/Makefile.vmlinux
> >> +++ b/scripts/Makefile.vmlinux
> >> @@ -22,6 +22,13 @@ targets += .vmlinux.export.o
> >>   vmlinux: .vmlinux.export.o
> >>   endif
> >>
> >> +ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
> >> +vmlinux: arch/$(SRCARCH)/tools/.vmlinux.arch.o
> >
> > If you move this to arch/*/tools/, there is no reason
> > to make it a hidden file.
>
> Thanks for reviewing and the detailed comments, Masahiro.
>
> >
> >
> > vmlinux: arch/$(SRCARCH)/tools/vmlinux.arch.o
> >
> >
> >
> >
> >> +arch/$(SRCARCH)/tools/.vmlinux.arch.o: vmlinux.o
> >
> > FORCE is missing.
>
>
> I dropped FORCE as it was rebuilding vmlinux on every invocation
> of `make` irrespective of whether vmlinux.o changed or not..


It is because you did not add vmlinux.arch.S to 'targets'.

See my comment in 12/17.

  targets += vmlinux.arch.S


> Just curious if the changes you suggested makes FORCE necessary
> or FORCE was expected even without the other changes you suggested?


FORCE is necessary.

arch/powerpc/tools/Makefile must be checked every time.


When arch/powerpc/tools/ftrace-gen-ool-stubs.sh is changed,
vmlinux must be relinked.





> Thanks
> Hari




--
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 36+ messages in thread
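
The interaction between FORCE, if_changed and 'targets' described above can be modelled in a few lines of shell. This is a simplified sketch, not Kbuild's implementation: needs_rebuild and its arguments are hypothetical. It assumes the documented Kbuild behaviour that if_changed re-runs a command when a prerequisite is newer or the command line differs from the saved one, and that the saved command line is only available for files listed in 'targets'.

```shell
#!/bin/sh
# needs_rebuild TARGET SAVED_CMD_FILE CMD PREREQ...
# Returns 0 (true) when TARGET should be regenerated.
# (Hypothetical model of an if_changed-style check.)
needs_rebuild() {
	target=$1
	saved=$2
	cmd=$3
	shift 3

	# A missing target is always built.
	[ -e "$target" ] || return 0

	# No saved command line: this models a file that is not listed in
	# 'targets'. The command cannot be shown to be unchanged, so the
	# target is rebuilt on every run.
	[ -f "$saved" ] || return 0

	# The command line changed since the last build.
	[ "$(cat "$saved")" = "$cmd" ] || return 0

	# Any prerequisite (for example the generator script) is newer.
	for prereq in "$@"; do
		[ "$prereq" -nt "$target" ] && return 0
	done

	return 1
}
```

In this model FORCE corresponds to calling needs_rebuild on every run, and adding the file to 'targets' corresponds to having a saved command file to compare against. With both in place, the expensive step runs only when the command or a prerequisite such as ftrace-gen-ool-stubs.sh has actually changed.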

* Re: [PATCH v5 11/17] kbuild: Add generic hook for architectures to use before the final vmlinux link
  2024-10-10 11:37       ` Masahiro Yamada
@ 2024-10-24 17:20         ` Hari Bathini
  0 siblings, 0 replies; 36+ messages in thread
From: Hari Bathini @ 2024-10-24 17:20 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: linuxppc-dev, bpf, linux-trace-kernel, linux-kbuild, linux-kernel,
	Naveen N. Rao, Mark Rutland, Daniel Borkmann, Nicholas Piggin,
	Alexei Starovoitov, Steven Rostedt, Andrii Nakryiko,
	Christophe Leroy, Vishal Chourasia, Mahesh J Salgaonkar,
	Masami Hiramatsu

Hello Masahiro,

On 10/10/24 5:07 pm, Masahiro Yamada wrote:
> On Thu, Oct 10, 2024 at 6:57 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>
>>
>> On 09/10/24 8:53 pm, Masahiro Yamada wrote:
>>> On Mon, Sep 16, 2024 at 5:58 AM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>>>
>>>> From: Naveen N Rao <naveen@kernel.org>
>>>>
>>>> On powerpc, we would like to be able to make a pass on vmlinux.o and
>>>> generate a new object file to be linked into vmlinux. Add a generic pass
>>>> in Makefile.vmlinux that architectures can use for this purpose.
>>>>
>>>> Architectures need to select CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX and must
>>>> provide arch/<arch>/tools/Makefile with .arch.vmlinux.o target, which
>>>> will be invoked prior to the final vmlinux link step.
>>>>
>>>> Signed-off-by: Naveen N Rao <naveen@kernel.org>
>>>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>>>> ---
>>>>
>>>> Changes in v5:
>>>> * Intermediate files named .vmlinux.arch.* instead of .arch.vmlinux.*
>>>>
>>>>
>>>>    arch/Kconfig             | 6 ++++++
>>>>    scripts/Makefile.vmlinux | 7 +++++++
>>>>    scripts/link-vmlinux.sh  | 7 ++++++-
>>>>    3 files changed, 19 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/Kconfig b/arch/Kconfig
>>>> index 975dd22a2dbd..ef868ff8156a 100644
>>>> --- a/arch/Kconfig
>>>> +++ b/arch/Kconfig
>>>> @@ -1643,4 +1643,10 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
>>>>    config ARCH_NEED_CMPXCHG_1_EMU
>>>>           bool
>>>>
>>>> +config ARCH_WANTS_PRE_LINK_VMLINUX
>>>> +       def_bool n
>>>
>>>
>>> Redundant default. This line should be "bool".
>>>
>>>
>>>
>>>
>>>
>>>
>>>> +       help
>>>> +         An architecture can select this if it provides arch/<arch>/tools/Makefile
>>>> +         with .arch.vmlinux.o target to be linked into vmlinux.
>>>> +
>>>>    endmenu
>>>> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
>>>> index 49946cb96844..edf6fae8d960 100644
>>>> --- a/scripts/Makefile.vmlinux
>>>> +++ b/scripts/Makefile.vmlinux
>>>> @@ -22,6 +22,13 @@ targets += .vmlinux.export.o
>>>>    vmlinux: .vmlinux.export.o
>>>>    endif
>>>>
>>>> +ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
>>>> +vmlinux: arch/$(SRCARCH)/tools/.vmlinux.arch.o
>>>
>>> If you move this to arch/*/tools/, there is no reason
>>> to make it a hidden file.
>>
>> Thanks for reviewing and the detailed comments, Masahiro.
>>
>>>
>>>
>>> vmlinux: arch/$(SRCARCH)/tools/vmlinux.arch.o
>>>
>>>
>>>
>>>
>>>> +arch/$(SRCARCH)/tools/.vmlinux.arch.o: vmlinux.o
>>>
>>> FORCE is missing.
>>
>>
>> I dropped FORCE as it was rebuilding vmlinux on every invocation
>> of `make` irrespective of whether vmlinux.o changed or not..
> 
> 
> It is because you did not add vmlinux.arch.S to 'targets'
> 
> See my comment in 12/17.
> 
>    targets += vmlinux.arch.S
> 
> 
>> Just curious if the changes you suggested makes FORCE necessary
>> or FORCE was expected even without the other changes you suggested?
> 
> 
> FORCE is necessary.
> 
> arch/powerpc/tools/Makefile must be checked every time.
> 
> 
> When arch/powerpc/tools/ftrace-gen-ool-stubs.sh is changed,
> vmlinux must be relinked.

Thanks for the review and clarifications!
Posted v6 with the changes. Please review:

  https://lore.kernel.org/all/20241018173632.277333-1-hbathini@linux.ibm.com/

- Hari

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines
  2024-10-10  9:46                 ` Hari Bathini
@ 2024-10-28  5:46                   ` Michael Ellerman
  0 siblings, 0 replies; 36+ messages in thread
From: Michael Ellerman @ 2024-10-28  5:46 UTC (permalink / raw)
  To: Hari Bathini, Alexei Starovoitov
  Cc: linuxppc-dev, bpf, linux-trace-kernel, Linux Kbuild mailing list,
	LKML, Naveen N. Rao, Mark Rutland, Daniel Borkmann,
	Masahiro Yamada, Nicholas Piggin, Alexei Starovoitov,
	Steven Rostedt, Andrii Nakryiko, Christophe Leroy,
	Vishal Chourasia, Mahesh J Salgaonkar, Masami Hiramatsu

Hari Bathini <hbathini@linux.ibm.com> writes:
> On 10/10/24 3:09 pm, Hari Bathini wrote:
>> On 10/10/24 5:48 am, Michael Ellerman wrote:
>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>>> On Tue, Oct 1, 2024 at 12:18 AM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>>>> On 30/09/24 6:25 pm, Alexei Starovoitov wrote:
>>>>>> On Sun, Sep 29, 2024 at 10:33 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>>>>>> On 17/09/24 1:20 pm, Alexei Starovoitov wrote:
>>>>>>>> On Sun, Sep 15, 2024 at 10:58 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>>>>>>>>
>>>>>>>>> +
>>>>>>>>> +       /*
>>>>>>>>> +        * Generated stack layout:
>>>>>>>>> +        *
>>>>>>>>> +        * func prev back chain         [ back chain        ]
>>>>>>>>> +        *                              [                   ]
>>>>>>>>> +        * bpf prog redzone/tailcallcnt [ ...               ] 64 bytes (64-bit powerpc)
>>>>>>>>> +        *                              [                   ] --
>>>>>>>> ...
>>>>>>>>> +
>>>>>>>>> +       /* Dummy frame size for proper unwind - includes 64-bytes red zone for 64-bit powerpc */
>>>>>>>>> +       bpf_dummy_frame_size = STACK_FRAME_MIN_SIZE + 64;
>>>>>>>>
>>>>>>>> What is the goal of such a large "red zone" ?
>>>>>>>> The kernel stack is a limited resource.
>>>>>>>> Why reserve 64 bytes ?
>>>>>>>> tail call cnt can probably be optional as well.
>>>>>>>
>>>>>>> Hi Alexei, thanks for reviewing.
>>>>>>> FWIW, the redzone on ppc64 is 288 bytes. BPF JIT for ppc64 was using
>>>>>>> a redzone of 80 bytes since tailcall support was introduced [1].
>>>>>>> It came down to 64 bytes thanks to [2]. The red zone is being used
>>>>>>> to save NVRs and tail call count when a stack is not setup. I do
>>>>>>> agree that we should look at optimizing it further. Do you think
>>>>>>> the optimization should go as part of PPC64 trampoline enablement
>>>>>>> being done here or should that be taken up as a separate item, maybe?
>>>>>>
>>>>>> The follow up is fine.
>>>>>> It just odd to me that we currently have:
>>>>>>
>>>>>> [   unused red zone ] 208 bytes protected
>>>>>>
>>>>>> I simply don't understand why we need to waste this much stack space.
>>>>>> Why can't it be zero today ?
>>>>>
>>>>> The ABI for ppc64 has a redzone of 288 bytes below the current
>>>>> stack pointer that can be used as a scratch area until a new
>>>>> stack frame is created. So, no wastage of stack space as such.
>>>>> It is just red zone that can be used before a new stack frame
>>>>> is created. The comment there is only to show how redzone is
>>>>> being used in ppc64 BPF JIT. I think the confusion is with the
>>>>> mention of "208 bytes" as protected. As not all of that scratch
>>>>> area is used, it mentions the remaining as unused. Essentially
>>>>> 288 bytes below current stack pointer is protected from debuggers
>>>>> and interrupt code (red zone). Note that it should be 224 bytes
>>>>> of unused red zone instead of 208 bytes as red zone usage in
>>>>> ppc64 BPF JIT come down from 80 bytes to 64 bytes since [2].
>>>>> Hope that clears the misunderstanding..
>>>>
>>>> I see. That makes sense. So it's similar to amd64 red zone,
>>>> but there we have an issue with irqs, hence the kernel is
>>>> compiled with -mno-red-zone.
>>>
>>> I assume that issue is that the interrupt entry unconditionally writes
>>> some data below the stack pointer, disregarding the red zone?
>>>
>>>> I guess ppc always has a different interrupt stack and
>>>> it's not an issue?
>>>
>>> No, the interrupt entry allocates a frame that is big enough to cover
>>> the red zone as well as the space it needs to save registers.
>>>
>>> See STACK_INT_FRAME_SIZE which includes KERNEL_REDZONE_SIZE:
>>>
>>>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/include/asm/ptrace.h?commit=8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b#n165
>>>
>>> Which is renamed to INT_FRAME_SIZE in asm-offsets.c and then is used in
>>> the interrupt entry here:
>>>
>>>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/kernel/exceptions-64s.S?commit=8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b#n497
>> 
>> Thanks for clarifying that, Michael.
>> Only async interrupt handlers use different interrupt stacks, right?
>
> ... and separate emergency stack for some special cases...

There isn't a neat rule like sync/async.

Most interrupts use the normal kernel stack, whether sync or async.

External interrupts switch to a separate hard interrupt stack
(hardirq_ctx) in call_do_irq(), but only after coming in on the kernel
stack first.

Some interrupts use the emergency stack (in some cases), e.g. HMI, soft
NMI (fake) and TM bad thing (program check), while others use their own
stack: system reset (nmi_emergency_sp) and machine check (mc_emergency_sp).

cheers

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2024-10-28  5:46 UTC | newest]

Thread overview: 36+ messages
2024-09-15 20:56 [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines Hari Bathini
2024-09-15 20:56 ` [PATCH v5 01/17] powerpc/trace: Account for -fpatchable-function-entry support by toolchain Hari Bathini
2024-09-15 20:56 ` [PATCH v5 02/17] powerpc/kprobes: Use ftrace to determine if a probe is at function entry Hari Bathini
2024-09-15 20:56 ` [PATCH v5 03/17] powerpc64/ftrace: Nop out additional 'std' instruction emitted by gcc v5.x Hari Bathini
2024-09-15 20:56 ` [PATCH v5 04/17] powerpc32/ftrace: Unify 32-bit and 64-bit ftrace entry code Hari Bathini
2024-09-15 20:56 ` [PATCH v5 05/17] powerpc/module_64: Convert #ifdef to IS_ENABLED() Hari Bathini
2024-09-15 20:56 ` [PATCH v5 06/17] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace Hari Bathini
2024-09-15 20:56 ` [PATCH v5 07/17] powerpc/ftrace: Skip instruction patching if the instructions are the same Hari Bathini
2024-09-15 20:56 ` [PATCH v5 08/17] powerpc/ftrace: Move ftrace stub used for init text before _einittext Hari Bathini
2024-09-15 20:56 ` [PATCH v5 09/17] powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into bpf_jit_emit_func_call_rel() Hari Bathini
2024-09-15 20:56 ` [PATCH v5 10/17] powerpc/ftrace: Add a postlink script to validate function tracer Hari Bathini
2024-09-15 20:56 ` [PATCH v5 11/17] kbuild: Add generic hook for architectures to use before the final vmlinux link Hari Bathini
2024-10-09 15:23   ` Masahiro Yamada
2024-10-10  9:56     ` Hari Bathini
2024-10-10 11:37       ` Masahiro Yamada
2024-10-24 17:20         ` Hari Bathini
2024-09-15 20:56 ` [PATCH v5 12/17] powerpc64/ftrace: Move ftrace sequence out of line Hari Bathini
2024-10-09 15:35   ` Masahiro Yamada
2024-09-15 20:56 ` [PATCH v5 13/17] powerpc64/ftrace: Support .text larger than 32MB with out-of-line stubs Hari Bathini
2024-10-09 15:36   ` Masahiro Yamada
2024-09-15 20:56 ` [PATCH v5 14/17] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS Hari Bathini
2024-09-15 20:56 ` [PATCH v5 15/17] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS Hari Bathini
2024-09-15 20:56 ` [PATCH v5 16/17] samples/ftrace: Add support for ftrace direct samples on powerpc Hari Bathini
2024-09-15 20:56 ` [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines Hari Bathini
2024-09-16 21:41   ` kernel test robot
2024-09-17  7:50   ` Alexei Starovoitov
2024-09-30  5:33     ` Hari Bathini
2024-09-30 12:55       ` Alexei Starovoitov
2024-10-01  7:18         ` Hari Bathini
2024-10-01 14:53           ` Alexei Starovoitov
2024-10-03  5:33             ` Hari Bathini
2024-10-10  0:18             ` Michael Ellerman
2024-10-10  9:39               ` Hari Bathini
2024-10-10  9:46                 ` Hari Bathini
2024-10-28  5:46                   ` Michael Ellerman
2024-10-09 15:46 ` [PATCH v5 00/17] powerpc: Core ftrace rework, support for ftrace direct and " Masahiro Yamada
