* [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines
@ 2024-06-20 18:54 Naveen N Rao
  2024-06-20 18:54 ` [RFC PATCH v3 01/11] powerpc/kprobes: Use ftrace to determine if a probe is at function entry Naveen N Rao
                   ` (12 more replies)
  0 siblings, 13 replies; 27+ messages in thread
From: Naveen N Rao @ 2024-06-20 18:54 UTC (permalink / raw)
  To: linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Nicholas Piggin, Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

This is v3 of the patches posted here:
http://lkml.kernel.org/r/cover.1718008093.git.naveen@kernel.org

Since v2, I have addressed review comments from Steven and Masahiro,
along with a few fixes. Patches 7-11 are new in this series and add
support for ftrace direct calls and BPF trampolines.

This series depends on the patch series from Benjamin Gray adding 
support for patch_ulong():
http://lkml.kernel.org/r/20240515024445.236364-1-bgray@linux.ibm.com


- Naveen


Naveen N Rao (11):
  powerpc/kprobes: Use ftrace to determine if a probe is at function
    entry
  powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code
  powerpc/module_64: Convert #ifdef to IS_ENABLED()
  powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace
  kbuild: Add generic hook for architectures to use before the final
    vmlinux link
  powerpc64/ftrace: Move ftrace sequence out of line
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
  samples/ftrace: Add support for ftrace direct samples on powerpc
  powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into
    bpf_jit_emit_func_call_rel()
  powerpc64/bpf: Add support for bpf trampolines

 arch/Kconfig                                |   3 +
 arch/powerpc/Kconfig                        |   9 +
 arch/powerpc/Makefile                       |   8 +
 arch/powerpc/include/asm/ftrace.h           |  29 +-
 arch/powerpc/include/asm/module.h           |   5 +
 arch/powerpc/include/asm/ppc-opcode.h       |  14 +
 arch/powerpc/kernel/asm-offsets.c           |  11 +
 arch/powerpc/kernel/kprobes.c               |  18 +-
 arch/powerpc/kernel/module_64.c             |  67 +-
 arch/powerpc/kernel/trace/ftrace.c          | 269 +++++++-
 arch/powerpc/kernel/trace/ftrace_64_pg.c    |  73 +-
 arch/powerpc/kernel/trace/ftrace_entry.S    | 210 ++++--
 arch/powerpc/kernel/vmlinux.lds.S           |   3 +-
 arch/powerpc/net/bpf_jit.h                  |  11 +
 arch/powerpc/net/bpf_jit_comp.c             | 702 +++++++++++++++++++-
 arch/powerpc/net/bpf_jit_comp32.c           |   7 +-
 arch/powerpc/net/bpf_jit_comp64.c           |  68 +-
 arch/powerpc/tools/Makefile                 |  10 +
 arch/powerpc/tools/gen-ftrace-pfe-stubs.sh  |  49 ++
 samples/ftrace/ftrace-direct-modify.c       |  85 ++-
 samples/ftrace/ftrace-direct-multi-modify.c | 101 ++-
 samples/ftrace/ftrace-direct-multi.c        |  79 ++-
 samples/ftrace/ftrace-direct-too.c          |  83 ++-
 samples/ftrace/ftrace-direct.c              |  69 +-
 scripts/Makefile.vmlinux                    |   8 +
 scripts/link-vmlinux.sh                     |  11 +-
 26 files changed, 1813 insertions(+), 189 deletions(-)
 create mode 100644 arch/powerpc/tools/Makefile
 create mode 100755 arch/powerpc/tools/gen-ftrace-pfe-stubs.sh


base-commit: e2b06d707dd067509cdc9ceba783c06fa6a551c2
prerequisite-patch-id: a1d50e589288239d5a8b1c1f354cd4737057c9d3
prerequisite-patch-id: da4142d56880861bd0ad7ad7087c9e2670a2ee54
prerequisite-patch-id: 609d292e054b2396b603890522a940fa0bdfb6d8
prerequisite-patch-id: 6f7213fb77b1260defbf43be0e47bff9c80054cc
prerequisite-patch-id: ad3b71bf071ae4ba1bee5b087e61a2055772a74f
-- 
2.45.2



* [RFC PATCH v3 01/11] powerpc/kprobes: Use ftrace to determine if a probe is at function entry
  2024-06-20 18:54 [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Naveen N Rao
@ 2024-06-20 18:54 ` Naveen N Rao
  2024-07-01  8:40   ` Nicholas Piggin
  2024-06-20 18:54 ` [RFC PATCH v3 02/11] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code Naveen N Rao
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Naveen N Rao @ 2024-06-20 18:54 UTC (permalink / raw)
  To: linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Nicholas Piggin, Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

Rather than hard-coding the offset into a function that is used to
determine if a kprobe is at function entry, use ftrace_location() to
determine the ftrace location within the function and categorize all
offsets up to that location as being at function entry.

For functions that cannot be traced, fall back to a fixed offset of 8
(two instructions) to categorize a probe as being at function entry for
64-bit ELFv2, unless the kernel is built for pcrel addressing.
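
For illustration only (not part of the patch), the resulting check can
be restated as a standalone helper, assuming ftrace_location() is passed
the function's entry address, as in the hunk below, and returns the
address of the ftrace call site within it (or 0 if the function is not
traceable):

	#include <linux/ftrace.h>
	#include <linux/kconfig.h>

	static bool probe_offset_is_func_entry(unsigned long func_addr,
					       unsigned long offset)
	{
		unsigned long ftrace_ip = ftrace_location(func_addr);

		/* Everything up to the ftrace call site counts as entry */
		if (ftrace_ip)
			return offset <= (ftrace_ip - func_addr);

		/* Untraceable function: fixed two-instruction heuristic,
		 * replacing the old hard-coded offsets of 8/16 */
		if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) &&
		    !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
			return offset <= 8;

		return !offset;
	}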

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/kprobes.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 14c5ddec3056..ca204f4f21c1 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -105,24 +105,22 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
 	return addr;
 }
 
-static bool arch_kprobe_on_func_entry(unsigned long offset)
+static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long offset)
 {
-#ifdef CONFIG_PPC64_ELF_ABI_V2
-#ifdef CONFIG_KPROBES_ON_FTRACE
-	return offset <= 16;
-#else
-	return offset <= 8;
-#endif
-#else
+	unsigned long ip = ftrace_location(addr);
+
+	if (ip)
+		return offset <= (ip - addr);
+	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+		return offset <= 8;
 	return !offset;
-#endif
 }
 
 /* XXX try and fold the magic of kprobe_lookup_name() in this */
 kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
 					 bool *on_func_entry)
 {
-	*on_func_entry = arch_kprobe_on_func_entry(offset);
+	*on_func_entry = arch_kprobe_on_func_entry(addr, offset);
 	return (kprobe_opcode_t *)(addr + offset);
 }
 
-- 
2.45.2



* [RFC PATCH v3 02/11] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code
  2024-06-20 18:54 [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Naveen N Rao
  2024-06-20 18:54 ` [RFC PATCH v3 01/11] powerpc/kprobes: Use ftrace to determine if a probe is at function entry Naveen N Rao
@ 2024-06-20 18:54 ` Naveen N Rao
  2024-07-01  8:57   ` Nicholas Piggin
  2024-06-20 18:54 ` [RFC PATCH v3 03/11] powerpc/module_64: Convert #ifdef to IS_ENABLED() Naveen N Rao
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Naveen N Rao @ 2024-06-20 18:54 UTC (permalink / raw)
  To: linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Nicholas Piggin, Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

On 32-bit powerpc, gcc generates a three instruction sequence for
function profiling:
	mflr	r0
	stw	r0, 4(r1)
	bl	_mcount

On kernel boot, the call to _mcount() is nop-ed out, to be patched back
in when ftrace is actually enabled. The 'stw' instruction therefore is
not necessary unless ftrace is enabled. Nop it out during ftrace init.

When ftrace is enabled, we want the 'stw' so that stack unwinding works
properly. Perform the same store within the ftrace handler itself,
similar to 64-bit powerpc.

For 64-bit powerpc, early versions of gcc used to emit a three
instruction sequence for function profiling (with -mprofile-kernel) with
a 'std' instruction to mimic the 'stw' above. Address that scenario also
by nop-ing out the 'std' instruction during ftrace init.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/trace/ftrace.c       | 6 ++++--
 arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index d8d6b4fd9a14..463bd7531dc8 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -241,13 +241,15 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 		/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
 		ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
 		if (!ret)
-			ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
+			ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
+						 ppc_inst(PPC_RAW_NOP()));
 	} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
 		/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl _mcount' */
 		ret = ftrace_read_inst(ip - 4, &old);
 		if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0)))) {
 			ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
-			ret |= ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
+			ret |= ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
+						  ppc_inst(PPC_RAW_NOP()));
 		}
 	} else {
 		return -EINVAL;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 76dbe9fd2c0f..244a1c7bb1e8 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,6 +33,8 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro	ftrace_regs_entry allregs
+	/* Save the original return address in A's stack frame */
+	PPC_STL		r0, LRSAVE(r1)
 	/* Create a minimal stack frame for representing B */
 	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)
 
@@ -44,8 +46,6 @@
 	SAVE_GPRS(3, 10, r1)
 
 #ifdef CONFIG_PPC64
-	/* Save the original return address in A's stack frame */
-	std	r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
 	/* Ok to continue? */
 	lbz	r3, PACA_FTRACE_ENABLED(r13)
 	cmpdi	r3, 0
-- 
2.45.2



* [RFC PATCH v3 03/11] powerpc/module_64: Convert #ifdef to IS_ENABLED()
  2024-06-20 18:54 [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Naveen N Rao
  2024-06-20 18:54 ` [RFC PATCH v3 01/11] powerpc/kprobes: Use ftrace to determine if a probe is at function entry Naveen N Rao
  2024-06-20 18:54 ` [RFC PATCH v3 02/11] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code Naveen N Rao
@ 2024-06-20 18:54 ` Naveen N Rao
  2024-07-01  9:01   ` Nicholas Piggin
  2024-06-20 18:54 ` [RFC PATCH v3 04/11] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace Naveen N Rao
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Naveen N Rao @ 2024-06-20 18:54 UTC (permalink / raw)
  To: linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Nicholas Piggin, Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

Minor refactor to convert #ifdef blocks to IS_ENABLED() checks.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/module_64.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index e9bab599d0c2..c202be11683b 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -241,14 +241,13 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
 		}
 	}
 
-#ifdef CONFIG_DYNAMIC_FTRACE
 	/* make the trampoline to the ftrace_caller */
-	relocs++;
-#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE))
+		relocs++;
+
 	/* an additional one for ftrace_regs_caller */
-	relocs++;
-#endif
-#endif
+	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS))
+		relocs++;
 
 	pr_debug("Looks like a total of %lu stubs, max\n", relocs);
 	return relocs * sizeof(struct ppc64_stub_entry);
-- 
2.45.2



* [RFC PATCH v3 04/11] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace
  2024-06-20 18:54 [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Naveen N Rao
                   ` (2 preceding siblings ...)
  2024-06-20 18:54 ` [RFC PATCH v3 03/11] powerpc/module_64: Convert #ifdef to IS_ENABLED() Naveen N Rao
@ 2024-06-20 18:54 ` Naveen N Rao
  2024-07-01  9:27   ` Nicholas Piggin
  2024-06-20 18:54 ` [RFC PATCH v3 05/11] kbuild: Add generic hook for architectures to use before the final vmlinux link Naveen N Rao
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Naveen N Rao @ 2024-06-20 18:54 UTC (permalink / raw)
  To: linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Nicholas Piggin, Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

The pointer to struct module is only relevant for ftrace records
belonging to kernel modules. Having this field in dyn_arch_ftrace wastes
memory for every ftrace record belonging to the core kernel. Remove it
in favour of looking up the module from the ftrace record address,
similar to other architectures.
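
As a rough, purely back-of-the-envelope figure for the saving (the
record count below is hypothetical; the real number depends on the
kernel config):

	/* Back-of-the-envelope only; the record count is hypothetical. */
	#define NR_FTRACE_RECS	50000UL	/* order of traceable kernel functions */

	/* One pointer per dyn_ftrace record: ~400 KB on a 64-bit kernel */
	static const unsigned long dyn_ftrace_bytes_saved =
		NR_FTRACE_RECS * sizeof(void *);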

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/include/asm/ftrace.h        |  1 -
 arch/powerpc/kernel/trace/ftrace.c       | 54 +++++++++++-------
 arch/powerpc/kernel/trace/ftrace_64_pg.c | 73 +++++++++++-------------
 3 files changed, 65 insertions(+), 63 deletions(-)

diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 107fc5a48456..201f9d15430a 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -26,7 +26,6 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
 struct module;
 struct dyn_ftrace;
 struct dyn_arch_ftrace {
-	struct module *mod;
 };
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 463bd7531dc8..2cff37b5fd2c 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -106,28 +106,48 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
 	return 0;
 }
 
+#ifdef CONFIG_MODULES
+static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long addr)
+{
+	struct module *mod = NULL;
+
+	/*
+	 * NOTE: __module_text_address() must be called with preemption
+	 * disabled, but we can rely on ftrace_lock to ensure that 'mod'
+	 * retains its validity throughout the remainder of this code.
+	 */
+	preempt_disable();
+	mod = __module_text_address(ip);
+	preempt_enable();
+
+	if (!mod)
+		pr_err("No module loaded at addr=%lx\n", ip);
+
+	return (addr == (unsigned long)ftrace_caller ? mod->arch.tramp : mod->arch.tramp_regs);
+}
+#else
+static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long addr)
+{
+	return 0;
+}
+#endif
+
 static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_inst_t *call_inst)
 {
 	unsigned long ip = rec->ip;
 	unsigned long stub;
 
-	if (is_offset_in_branch_range(addr - ip)) {
+	if (is_offset_in_branch_range(addr - ip))
 		/* Within range */
 		stub = addr;
-#ifdef CONFIG_MODULES
-	} else if (rec->arch.mod) {
-		/* Module code would be going to one of the module stubs */
-		stub = (addr == (unsigned long)ftrace_caller ? rec->arch.mod->arch.tramp :
-							       rec->arch.mod->arch.tramp_regs);
-#endif
-	} else if (core_kernel_text(ip)) {
+	else if (core_kernel_text(ip))
 		/* We would be branching to one of our ftrace stubs */
 		stub = find_ftrace_tramp(ip);
-		if (!stub) {
-			pr_err("0x%lx: No ftrace stubs reachable\n", ip);
-			return -EINVAL;
-		}
-	} else {
+	else
+		stub = ftrace_lookup_module_stub(ip, addr);
+
+	if (!stub) {
+		pr_err("0x%lx: No ftrace stubs reachable\n", ip);
 		return -EINVAL;
 	}
 
@@ -258,14 +278,6 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 	if (ret)
 		return ret;
 
-	if (!core_kernel_text(ip)) {
-		if (!mod) {
-			pr_err("0x%lx: No module provided for non-kernel address\n", ip);
-			return -EFAULT;
-		}
-		rec->arch.mod = mod;
-	}
-
 	/* Nop-out the ftrace location */
 	new = ppc_inst(PPC_RAW_NOP());
 	addr = MCOUNT_ADDR;
diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c b/arch/powerpc/kernel/trace/ftrace_64_pg.c
index 12fab1803bcf..a563b9ffcc2b 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
+++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
@@ -116,6 +116,24 @@ static unsigned long find_bl_target(unsigned long ip, ppc_inst_t op)
 }
 
 #ifdef CONFIG_MODULES
+static struct module *ftrace_lookup_module(struct dyn_ftrace *rec)
+{
+	struct module *mod;
+	/*
+	 * NOTE: __module_text_address() must be called with preemption
+	 * disabled, but we can rely on ftrace_lock to ensure that 'mod'
+	 * retains its validity throughout the remainder of this code.
+	 */
+	preempt_disable();
+	mod = __module_text_address(rec->ip);
+	preempt_enable();
+
+	if (!mod)
+		pr_err("No module loaded at addr=%lx\n", rec->ip);
+
+	return mod;
+}
+
 static int
 __ftrace_make_nop(struct module *mod,
 		  struct dyn_ftrace *rec, unsigned long addr)
@@ -124,6 +142,12 @@ __ftrace_make_nop(struct module *mod,
 	unsigned long ip = rec->ip;
 	ppc_inst_t op, pop;
 
+	if (!mod) {
+		mod = ftrace_lookup_module(rec);
+		if (!mod)
+			return -EINVAL;
+	}
+
 	/* read where this goes */
 	if (copy_inst_from_kernel_nofault(&op, (void *)ip)) {
 		pr_err("Fetching opcode failed.\n");
@@ -366,27 +390,6 @@ int ftrace_make_nop(struct module *mod,
 		return -EINVAL;
 	}
 
-	/*
-	 * Out of range jumps are called from modules.
-	 * We should either already have a pointer to the module
-	 * or it has been passed in.
-	 */
-	if (!rec->arch.mod) {
-		if (!mod) {
-			pr_err("No module loaded addr=%lx\n", addr);
-			return -EFAULT;
-		}
-		rec->arch.mod = mod;
-	} else if (mod) {
-		if (mod != rec->arch.mod) {
-			pr_err("Record mod %p not equal to passed in mod %p\n",
-			       rec->arch.mod, mod);
-			return -EINVAL;
-		}
-		/* nothing to do if mod == rec->arch.mod */
-	} else
-		mod = rec->arch.mod;
-
 	return __ftrace_make_nop(mod, rec, addr);
 }
 
@@ -411,7 +414,10 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	ppc_inst_t op[2];
 	void *ip = (void *)rec->ip;
 	unsigned long entry, ptr, tramp;
-	struct module *mod = rec->arch.mod;
+	struct module *mod = ftrace_lookup_module(rec);
+
+	if (!mod)
+		return -EINVAL;
 
 	/* read where this goes */
 	if (copy_inst_from_kernel_nofault(op, ip))
@@ -533,16 +539,6 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 		return -EINVAL;
 	}
 
-	/*
-	 * Out of range jumps are called from modules.
-	 * Being that we are converting from nop, it had better
-	 * already have a module defined.
-	 */
-	if (!rec->arch.mod) {
-		pr_err("No module loaded\n");
-		return -EINVAL;
-	}
-
 	return __ftrace_make_call(rec, addr);
 }
 
@@ -555,7 +551,10 @@ __ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 	ppc_inst_t op;
 	unsigned long ip = rec->ip;
 	unsigned long entry, ptr, tramp;
-	struct module *mod = rec->arch.mod;
+	struct module *mod = ftrace_lookup_module(rec);
+
+	if (!mod)
+		return -EINVAL;
 
 	/* If we never set up ftrace trampolines, then bail */
 	if (!mod->arch.tramp || !mod->arch.tramp_regs) {
@@ -668,14 +667,6 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 		return -EINVAL;
 	}
 
-	/*
-	 * Out of range jumps are called from modules.
-	 */
-	if (!rec->arch.mod) {
-		pr_err("No module loaded\n");
-		return -EINVAL;
-	}
-
 	return __ftrace_modify_call(rec, old_addr, addr);
 }
 #endif
-- 
2.45.2



* [RFC PATCH v3 05/11] kbuild: Add generic hook for architectures to use before the final vmlinux link
  2024-06-20 18:54 [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Naveen N Rao
                   ` (3 preceding siblings ...)
  2024-06-20 18:54 ` [RFC PATCH v3 04/11] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace Naveen N Rao
@ 2024-06-20 18:54 ` Naveen N Rao
  2024-07-01  9:30   ` Nicholas Piggin
  2024-06-20 18:54 ` [RFC PATCH v3 06/11] powerpc64/ftrace: Move ftrace sequence out of line Naveen N Rao
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Naveen N Rao @ 2024-06-20 18:54 UTC (permalink / raw)
  To: linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Nicholas Piggin, Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

On powerpc, we would like to be able to make a pass on vmlinux.o and
generate a new object file to be linked into vmlinux. Add a generic pass
in Makefile.vmlinux that architectures can use for this purpose.

Architectures need to select CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX and must
provide arch/<arch>/tools/Makefile with a .arch.vmlinux.o target, which
will be invoked prior to the final vmlinux link step.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/Kconfig             |  3 +++
 scripts/Makefile.vmlinux |  8 ++++++++
 scripts/link-vmlinux.sh  | 11 ++++++++---
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 975dd22a2dbd..649f0903e7ef 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1643,4 +1643,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
 config ARCH_NEED_CMPXCHG_1_EMU
 	bool
 
+config ARCH_WANTS_PRE_LINK_VMLINUX
+	def_bool n
+
 endmenu
diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index 49946cb96844..6410e0be7f52 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -22,6 +22,14 @@ targets += .vmlinux.export.o
 vmlinux: .vmlinux.export.o
 endif
 
+ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
+targets += .arch.vmlinux.o
+.arch.vmlinux.o: vmlinux.o FORCE
+	$(Q)$(MAKE) $(build)=arch/$(SRCARCH)/tools .arch.vmlinux.o
+
+vmlinux: .arch.vmlinux.o
+endif
+
 ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
 
 # Final link of vmlinux with optional arch pass after final link
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 518c70b8db50..aafaed1412ea 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -122,7 +122,7 @@ gen_btf()
 		return 1
 	fi
 
-	vmlinux_link ${1}
+	vmlinux_link ${1} ${arch_vmlinux_o}
 
 	info "BTF" ${2}
 	LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
@@ -178,7 +178,7 @@ kallsyms_step()
 	kallsymso=${kallsyms_vmlinux}.o
 	kallsyms_S=${kallsyms_vmlinux}.S
 
-	vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" ${btf_vmlinux_bin_o}
+	vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" ${btf_vmlinux_bin_o} ${arch_vmlinux_o}
 	mksysmap ${kallsyms_vmlinux} ${kallsyms_vmlinux}.syms
 	kallsyms ${kallsyms_vmlinux}.syms ${kallsyms_S}
 
@@ -223,6 +223,11 @@ fi
 
 ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init init/version-timestamp.o
 
+arch_vmlinux_o=""
+if is_enabled CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX; then
+	arch_vmlinux_o=.arch.vmlinux.o
+fi
+
 btf_vmlinux_bin_o=""
 if is_enabled CONFIG_DEBUG_INFO_BTF; then
 	btf_vmlinux_bin_o=.btf.vmlinux.bin.o
@@ -273,7 +278,7 @@ if is_enabled CONFIG_KALLSYMS; then
 	fi
 fi
 
-vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o}
+vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o} ${arch_vmlinux_o}
 
 # fill in BTF IDs
 if is_enabled CONFIG_DEBUG_INFO_BTF && is_enabled CONFIG_BPF; then
-- 
2.45.2



* [RFC PATCH v3 06/11] powerpc64/ftrace: Move ftrace sequence out of line
  2024-06-20 18:54 [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Naveen N Rao
                   ` (4 preceding siblings ...)
  2024-06-20 18:54 ` [RFC PATCH v3 05/11] kbuild: Add generic hook for architectures to use before the final vmlinux link Naveen N Rao
@ 2024-06-20 18:54 ` Naveen N Rao
  2024-07-01 10:39   ` Nicholas Piggin
  2024-06-20 18:54 ` [RFC PATCH v3 07/11] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS Naveen N Rao
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Naveen N Rao @ 2024-06-20 18:54 UTC (permalink / raw)
  To: linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Nicholas Piggin, Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

The function profiling sequence on powerpc consists of two instructions
at the beginning of each function:
	mflr	r0
	bl	ftrace_caller

The call to ftrace_caller() gets nop'ed out during kernel boot and is
patched in when ftrace is enabled.

Given this sequence, we cannot return from ftrace_caller with 'blr' as
we need to keep LR and r0 intact. This results in a link stack imbalance
when ftrace is enabled. To address that, we would like to use a
three-instruction sequence:
	mflr	r0
	bl	ftrace_caller
	mtlr	r0

Furthermore, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to reserve
two instruction slots before the function. This results in a total of
five instruction slots reserved for ftrace use in each function that is
traced.

Move the function profile sequence out-of-line to minimize its impact.
To do this, we reserve a single nop at function entry using
-fpatchable-function-entry=1 and add a pass on vmlinux.o to determine
the total number of functions that can be traced. This is then used to
generate a .S file reserving the appropriate amount of space for use as
ftrace stubs, which is built and linked into vmlinux.

On bootup, the stub space is split into separate stubs per function and
populated with the proper instruction sequence. A pointer to the
associated stub is maintained in dyn_arch_ftrace.

For modules, space for ftrace stubs is reserved from the generic module
stub space.

This is restricted to 64-bit powerpc, where it is enabled by default.
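
As a sketch of the resulting layout (illustration only; it assumes the
helpers introduced in this patch, and the real enable path in
ftrace_make_call() below additionally retargets the 'bl' in the stub):

	/*
	 * One traced function and its out-of-line stub:
	 *
	 *   <func>:                  // rec->ip points at this nop
	 *       nop                  // becomes 'b <stub>' when tracing is enabled
	 *       ...                  // the stub branches back to rec->ip + 4
	 *
	 *   <stub>:                  // in the reserved stub area
	 *       mflr  r0             // preserve LR, as the old inline sequence did
	 *       bl    ftrace_caller  // installed when the stub is initialized
	 *       mtlr  r0             // rebalance the link stack
	 *       b     <func> + 4     // resume the traced function
	 */
	static int __maybe_unused sketch_enable_tracing(struct dyn_ftrace *rec)
	{
		unsigned long stub = ftrace_get_pfe_stub(rec);

		/* Enabling tracing only requires the entry nop to become a
		 * branch to the (already populated) per-function stub. */
		return ftrace_modify_code(rec->ip, ppc_inst(PPC_RAW_NOP()),
					  ppc_inst(PPC_RAW_BRANCH((long)stub - (long)rec->ip)));
	}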

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/Kconfig                       |   5 +
 arch/powerpc/Makefile                      |   4 +
 arch/powerpc/include/asm/ftrace.h          |  10 ++
 arch/powerpc/include/asm/module.h          |   5 +
 arch/powerpc/kernel/asm-offsets.c          |   4 +
 arch/powerpc/kernel/module_64.c            |  58 +++++++-
 arch/powerpc/kernel/trace/ftrace.c         | 147 +++++++++++++++++++--
 arch/powerpc/kernel/trace/ftrace_entry.S   |  99 ++++++++++----
 arch/powerpc/kernel/vmlinux.lds.S          |   3 +-
 arch/powerpc/tools/Makefile                |  10 ++
 arch/powerpc/tools/gen-ftrace-pfe-stubs.sh |  48 +++++++
 11 files changed, 355 insertions(+), 38 deletions(-)
 create mode 100644 arch/powerpc/tools/Makefile
 create mode 100755 arch/powerpc/tools/gen-ftrace-pfe-stubs.sh

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c88c6d46a5bc..dd7efca2275a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -568,6 +568,11 @@ config ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 	def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh $(CC) -mlittle-endian) if PPC64 && CPU_LITTLE_ENDIAN
 	def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh $(CC) -mbig-endian) if PPC64 && CPU_BIG_ENDIAN
 
+config FTRACE_PFE_OUT_OF_LINE
+	def_bool PPC64 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+	depends on PPC64
+	select ARCH_WANTS_PRE_LINK_VMLINUX
+
 config HOTPLUG_CPU
 	bool "Support for enabling/disabling CPUs"
 	depends on SMP && (PPC_PSERIES || \
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index a8479c881cac..bb920d48ec6e 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -155,7 +155,11 @@ CC_FLAGS_NO_FPU		:= $(call cc-option,-msoft-float)
 ifdef CONFIG_FUNCTION_TRACER
 ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 KBUILD_CPPFLAGS	+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
+ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+CC_FLAGS_FTRACE := -fpatchable-function-entry=1
+else
 CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+endif
 else
 CC_FLAGS_FTRACE := -pg
 ifdef CONFIG_MPROFILE_KERNEL
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 201f9d15430a..9da1da0f87b4 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -26,6 +26,9 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
 struct module;
 struct dyn_ftrace;
 struct dyn_arch_ftrace {
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+	unsigned long pfe_stub;
+#endif
 };
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
@@ -132,6 +135,13 @@ static inline u8 this_cpu_get_ftrace_enabled(void) { return 1; }
 
 #ifdef CONFIG_FUNCTION_TRACER
 extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+struct ftrace_pfe_stub {
+	u32	insn[4];
+};
+extern struct ftrace_pfe_stub ftrace_pfe_stub_text[], ftrace_pfe_stub_inittext[];
+extern unsigned long ftrace_pfe_stub_text_count, ftrace_pfe_stub_inittext_count;
+#endif
 void ftrace_free_init_tramp(void);
 unsigned long ftrace_call_adjust(unsigned long addr);
 #else
diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h
index 300c777cc307..28dbd1ec5593 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -47,6 +47,11 @@ struct mod_arch_specific {
 #ifdef CONFIG_DYNAMIC_FTRACE
 	unsigned long tramp;
 	unsigned long tramp_regs;
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+	struct ftrace_pfe_stub *pfe_stubs;
+	unsigned int pfe_stub_count;
+	unsigned int pfe_stub_index;
+#endif
 #endif
 };
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index f029755f9e69..5f1a411d714c 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -674,5 +674,9 @@ int main(void)
 	DEFINE(BPT_SIZE, BPT_SIZE);
 #endif
 
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+	DEFINE(FTRACE_PFE_STUB_SIZE, sizeof(struct ftrace_pfe_stub));
+#endif
+
 	return 0;
 }
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index c202be11683b..3388de5a32de 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -205,7 +205,9 @@ static int relacmp(const void *_x, const void *_y)
 
 /* Get size of potential trampolines required. */
 static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
-				    const Elf64_Shdr *sechdrs)
+				    const Elf64_Shdr *sechdrs,
+				    char *secstrings,
+				    struct module *me)
 {
 	/* One extra reloc so it's always 0-addr terminated */
 	unsigned long relocs = 1;
@@ -249,6 +251,24 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
 	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS))
 		relocs++;
 
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+	/* stubs for the function tracer */
+	for (i = 1; i < hdr->e_shnum; i++) {
+		if (!strcmp(secstrings + sechdrs[i].sh_name, "__patchable_function_entries")) {
+			me->arch.pfe_stub_count = sechdrs[i].sh_size / sizeof(unsigned long);
+			me->arch.pfe_stub_index = 0;
+			relocs += roundup(me->arch.pfe_stub_count * sizeof(struct ftrace_pfe_stub),
+					  sizeof(struct ppc64_stub_entry)) /
+				  sizeof(struct ppc64_stub_entry);
+			break;
+		}
+	}
+	if (i == hdr->e_shnum) {
+		pr_err("%s: doesn't contain __patchable_function_entries.\n", me->name);
+		return -ENOEXEC;
+	}
+#endif
+
 	pr_debug("Looks like a total of %lu stubs, max\n", relocs);
 	return relocs * sizeof(struct ppc64_stub_entry);
 }
@@ -459,7 +479,7 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr,
 #endif
 
 	/* Override the stubs size */
-	sechdrs[me->arch.stubs_section].sh_size = get_stubs_size(hdr, sechdrs);
+	sechdrs[me->arch.stubs_section].sh_size = get_stubs_size(hdr, sechdrs, secstrings, me);
 
 	return 0;
 }
@@ -1084,6 +1104,37 @@ int module_trampoline_target(struct module *mod, unsigned long addr,
 	return 0;
 }
 
+static int setup_ftrace_pfe_stubs(const Elf64_Shdr *sechdrs, unsigned long addr, struct module *me)
+{
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+	unsigned int i, total_stubs, num_stubs;
+	struct ppc64_stub_entry *stub;
+
+	total_stubs = sechdrs[me->arch.stubs_section].sh_size / sizeof(*stub);
+	num_stubs = roundup(me->arch.pfe_stub_count * sizeof(struct ftrace_pfe_stub),
+			    sizeof(struct ppc64_stub_entry)) / sizeof(struct ppc64_stub_entry);
+
+	/* Find the next available entry */
+	stub = (void *)sechdrs[me->arch.stubs_section].sh_addr;
+	for (i = 0; stub_func_addr(stub[i].funcdata); i++)
+		if (WARN_ON(i >= total_stubs))
+			return -1;
+
+	if (WARN_ON(i + num_stubs > total_stubs))
+		return -1;
+
+	stub += i;
+	me->arch.pfe_stubs = (struct ftrace_pfe_stub *)stub;
+
+	/* reserve stubs */
+	for (i = 0; i < num_stubs; i++)
+		if (patch_u32((void *)&stub->funcdata, PPC_RAW_NOP()))
+			return -1;
+#endif
+
+	return 0;
+}
+
 int module_finalize_ftrace(struct module *mod, const Elf_Shdr *sechdrs)
 {
 	mod->arch.tramp = stub_for_addr(sechdrs,
@@ -1102,6 +1153,9 @@ int module_finalize_ftrace(struct module *mod, const Elf_Shdr *sechdrs)
 	if (!mod->arch.tramp)
 		return -ENOENT;
 
+	if (setup_ftrace_pfe_stubs(sechdrs, mod->arch.tramp, mod))
+		return -ENOENT;
+
 	return 0;
 }
 #endif
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 2cff37b5fd2c..9f3c10307331 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -37,7 +37,8 @@ unsigned long ftrace_call_adjust(unsigned long addr)
 	if (addr >= (unsigned long)__exittext_begin && addr < (unsigned long)__exittext_end)
 		return 0;
 
-	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
+	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY) &&
+	    !IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
 		addr += MCOUNT_INSN_SIZE;
 
 	return addr;
@@ -82,7 +83,7 @@ static inline int ftrace_modify_code(unsigned long ip, ppc_inst_t old, ppc_inst_
 {
 	int ret = ftrace_validate_inst(ip, old);
 
-	if (!ret)
+	if (!ret && !ppc_inst_equal(old, new))
 		ret = patch_instruction((u32 *)ip, new);
 
 	return ret;
@@ -132,11 +133,23 @@ static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long a
 }
 #endif
 
+static unsigned long ftrace_get_pfe_stub(struct dyn_ftrace *rec)
+{
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+	return rec->arch.pfe_stub;
+#else
+	BUILD_BUG();
+#endif
+}
+
 static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_inst_t *call_inst)
 {
 	unsigned long ip = rec->ip;
 	unsigned long stub;
 
+	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
+		ip = ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE; /* second instruction in stub */
+
 	if (is_offset_in_branch_range(addr - ip))
 		/* Within range */
 		stub = addr;
@@ -147,7 +160,7 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
 		stub = ftrace_lookup_module_stub(ip, addr);
 
 	if (!stub) {
-		pr_err("0x%lx: No ftrace stubs reachable\n", ip);
+		pr_err("0x%lx (0x%lx): No ftrace stubs reachable\n", ip, rec->ip);
 		return -EINVAL;
 	}
 
@@ -155,6 +168,79 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
 	return 0;
 }
 
+static int ftrace_init_pfe_stub(struct module *mod, struct dyn_ftrace *rec)
+{
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+	static int pfe_stub_text_index, pfe_stub_inittext_index;
+	int ret = 0, pfe_stub_count, *pfe_stub_index;
+	ppc_inst_t inst;
+	struct ftrace_pfe_stub *pfe_stub, pfe_stub_template = {
+		.insn = {
+			PPC_RAW_MFLR(_R0),
+			PPC_RAW_NOP(),		/* bl ftrace_caller */
+			PPC_RAW_MTLR(_R0),
+			PPC_RAW_NOP()		/* b rec->ip + 4 */
+		}
+	};
+
+	WARN_ON(rec->arch.pfe_stub);
+
+	if (is_kernel_inittext(rec->ip)) {
+		pfe_stub = ftrace_pfe_stub_inittext;
+		pfe_stub_index = &pfe_stub_inittext_index;
+		pfe_stub_count = ftrace_pfe_stub_inittext_count;
+	} else if (is_kernel_text(rec->ip)) {
+		pfe_stub = ftrace_pfe_stub_text;
+		pfe_stub_index = &pfe_stub_text_index;
+		pfe_stub_count = ftrace_pfe_stub_text_count;
+#ifdef CONFIG_MODULES
+	} else if (mod) {
+		pfe_stub = mod->arch.pfe_stubs;
+		pfe_stub_index = &mod->arch.pfe_stub_index;
+		pfe_stub_count = mod->arch.pfe_stub_count;
+#endif
+	} else {
+		return -EINVAL;
+	}
+
+	pfe_stub += (*pfe_stub_index)++;
+
+	if (WARN_ON(*pfe_stub_index > pfe_stub_count))
+		return -EINVAL;
+
+	if (!is_offset_in_branch_range((long)rec->ip - (long)&pfe_stub->insn[0]) ||
+	    !is_offset_in_branch_range((long)(rec->ip + MCOUNT_INSN_SIZE) - (long)&pfe_stub->insn[3])) {
+		pr_err("%s: ftrace pfe stub out of range (%p -> %p).\n",
+					__func__, (void *)rec->ip, (void *)&pfe_stub->insn[0]);
+		return -EINVAL;
+	}
+
+	rec->arch.pfe_stub = (unsigned long)&pfe_stub->insn[0];
+
+	/* bl ftrace_caller */
+	if (mod)
+		/* We can't lookup the module since it is not fully formed yet */
+		inst = ftrace_create_branch_inst(ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE,
+						 mod->arch.tramp, 1);
+	else
+		ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &inst);
+	pfe_stub_template.insn[1] = ppc_inst_val(inst);
+
+	/* b rec->ip + 4 */
+	if (!ret && create_branch(&inst, &pfe_stub->insn[3], rec->ip + MCOUNT_INSN_SIZE, 0))
+		return -EINVAL;
+	pfe_stub_template.insn[3] = ppc_inst_val(inst);
+
+	if (!ret)
+		ret = patch_instructions((u32 *)pfe_stub, (u32 *)&pfe_stub_template,
+					 sizeof(pfe_stub_template), false);
+
+	return ret;
+#else /* !CONFIG_FTRACE_PFE_OUT_OF_LINE */
+	BUILD_BUG();
+#endif
+}
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned long addr)
 {
@@ -167,18 +253,29 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned
 int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 {
 	ppc_inst_t old, new;
-	int ret;
+	unsigned long ip = rec->ip;
+	int ret = 0;
 
 	/* This can only ever be called during module load */
-	if (WARN_ON(!IS_ENABLED(CONFIG_MODULES) || core_kernel_text(rec->ip)))
+	if (WARN_ON(!IS_ENABLED(CONFIG_MODULES) || core_kernel_text(ip)))
 		return -EINVAL;
 
 	old = ppc_inst(PPC_RAW_NOP());
-	ret = ftrace_get_call_inst(rec, addr, &new);
-	if (ret)
-		return ret;
+	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE)) {
+		ip = ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE; /* second instruction in stub */
+		ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &old);
+	}
 
-	return ftrace_modify_code(rec->ip, old, new);
+	ret |= ftrace_get_call_inst(rec, addr, &new);
+
+	if (!ret)
+		ret = ftrace_modify_code(ip, old, new);
+
+	if (!ret && IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
+		ret = ftrace_modify_code(rec->ip, ppc_inst(PPC_RAW_NOP()),
+			 ppc_inst(PPC_RAW_BRANCH((long)ftrace_get_pfe_stub(rec) - (long)rec->ip)));
+
+	return ret;
 }
 
 int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr)
@@ -211,6 +308,13 @@ void ftrace_replace_code(int enable)
 		new_addr = ftrace_get_addr_new(rec);
 		update = ftrace_update_record(rec, enable);
 
+		if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE) && update != FTRACE_UPDATE_IGNORE) {
+			ip = ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE;
+			ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &nop_inst);
+			if (ret)
+				goto out;
+		}
+
 		switch (update) {
 		case FTRACE_UPDATE_IGNORE:
 		default:
@@ -235,6 +339,24 @@ void ftrace_replace_code(int enable)
 
 		if (!ret)
 			ret = ftrace_modify_code(ip, old, new);
+
+		if (!ret && IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE) &&
+		    (update == FTRACE_UPDATE_MAKE_NOP || update == FTRACE_UPDATE_MAKE_CALL)) {
+			/* Update the actual ftrace location */
+			call_inst = ppc_inst(PPC_RAW_BRANCH((long)ftrace_get_pfe_stub(rec) -
+							    (long)rec->ip));
+			nop_inst = ppc_inst(PPC_RAW_NOP());
+			ip = rec->ip;
+
+			if (update == FTRACE_UPDATE_MAKE_NOP)
+				ret = ftrace_modify_code(ip, call_inst, nop_inst);
+			else
+				ret = ftrace_modify_code(ip, nop_inst, call_inst);
+
+			if (ret)
+				goto out;
+		}
+
 		if (ret)
 			goto out;
 	}
@@ -254,7 +376,8 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 	/* Verify instructions surrounding the ftrace location */
 	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY)) {
 		/* Expect nops */
-		ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_NOP()));
+		if (!IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
+			ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_NOP()));
 		if (!ret)
 			ret = ftrace_validate_inst(ip, ppc_inst(PPC_RAW_NOP()));
 	} else if (IS_ENABLED(CONFIG_PPC32)) {
@@ -278,6 +401,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 	if (ret)
 		return ret;
 
+	/* Set up out-of-line stub */
+	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
+		return ftrace_init_pfe_stub(mod, rec);
+
 	/* Nop-out the ftrace location */
 	new = ppc_inst(PPC_RAW_NOP());
 	addr = MCOUNT_ADDR;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 244a1c7bb1e8..b1cbef24f18f 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -78,10 +78,6 @@
 
 	/* Get the _mcount() call site out of LR */
 	mflr	r7
-	/* Save it as pt_regs->nip */
-	PPC_STL	r7, _NIP(r1)
-	/* Also save it in B's stackframe header for proper unwind */
-	PPC_STL	r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
 	/* Save the read LR in pt_regs->link */
 	PPC_STL	r0, _LINK(r1)
 
@@ -96,16 +92,6 @@
 	lwz	r5,function_trace_op@l(r3)
 #endif
 
-#ifdef CONFIG_LIVEPATCH_64
-	mr	r14, r7		/* remember old NIP */
-#endif
-
-	/* Calculate ip from nip-4 into r3 for call below */
-	subi    r3, r7, MCOUNT_INSN_SIZE
-
-	/* Put the original return address in r4 as parent_ip */
-	mr	r4, r0
-
 	/* Save special regs */
 	PPC_STL	r8, _MSR(r1)
 	.if \allregs == 1
@@ -114,17 +100,64 @@
 	PPC_STL	r11, _CCR(r1)
 	.endif
 
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+	/* Save our real return address locally for return */
+	PPC_STL	r7, STACK_INT_FRAME_MARKER(r1)
+	/*
+	 * We want the ftrace location in the function, but our lr (in r7)
+	 * points at the 'mtlr r0' instruction in the out of line stub.  To
+	 * recover the ftrace location, we read the branch instruction in the
+	 * stub, and adjust our lr by the branch offset.
+	 */
+	lwz	r8, MCOUNT_INSN_SIZE(r7)
+	slwi	r8, r8, 6
+	srawi	r8, r8, 6
+	add	r3, r7, r8
+	/*
+	 * Override our nip to point past the branch in the original function.
+	 * This allows reliable stack trace and the ftrace stack tracer to work as-is.
+	 */
+	add	r7, r3, MCOUNT_INSN_SIZE
+#else
+	/* Calculate ip from nip-4 into r3 for call below */
+	subi    r3, r7, MCOUNT_INSN_SIZE
+#endif
+
+	/* Save NIP as pt_regs->nip */
+	PPC_STL	r7, _NIP(r1)
+	/* Also save it in B's stackframe header for proper unwind */
+	PPC_STL	r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
+#ifdef CONFIG_LIVEPATCH_64
+	mr	r14, r7		/* remember old NIP */
+#endif
+
+	/* Put the original return address in r4 as parent_ip */
+	mr	r4, r0
+
 	/* Load &pt_regs in r6 for call below */
 	addi    r6, r1, STACK_INT_FRAME_REGS
 .endm
 
 .macro	ftrace_regs_exit allregs
+#ifndef CONFIG_FTRACE_PFE_OUT_OF_LINE
 	/* Load ctr with the possibly modified NIP */
 	PPC_LL	r3, _NIP(r1)
 	mtctr	r3
 
 #ifdef CONFIG_LIVEPATCH_64
 	cmpd	r14, r3		/* has NIP been altered? */
+#endif
+#else /* !CONFIG_FTRACE_PFE_OUT_OF_LINE */
+#ifdef CONFIG_LIVEPATCH_64
+	/* Load ctr with the possibly modified NIP */
+	PPC_LL	r3, _NIP(r1)
+
+	cmpd	r14, r3		/* has NIP been altered? */
+	bne-	1f
+#endif
+
+	PPC_LL	r3, STACK_INT_FRAME_MARKER(r1)
+1:	mtlr	r3
 #endif
 
 	/* Restore gprs */
@@ -139,7 +172,9 @@
 
 	/* Restore possibly modified LR */
 	PPC_LL	r0, _LINK(r1)
+#ifndef CONFIG_FTRACE_PFE_OUT_OF_LINE
 	mtlr	r0
+#endif
 
 #ifdef CONFIG_PPC64
 	/* Restore callee's TOC */
@@ -153,7 +188,12 @@
         /* Based on the cmpd above, if the NIP was altered handle livepatch */
 	bne-	livepatch_handler
 #endif
-	bctr			/* jump after _mcount site */
+	/* jump after _mcount site */
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+	blr
+#else
+	bctr
+#endif
 .endm
 
 _GLOBAL(ftrace_regs_caller)
@@ -177,6 +217,11 @@ _GLOBAL(ftrace_stub)
 
 #ifdef CONFIG_PPC64
 ftrace_no_trace:
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+	REST_GPR(3, r1)
+	addi	r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
+	blr
+#else
 	mflr	r3
 	mtctr	r3
 	REST_GPR(3, r1)
@@ -184,6 +229,7 @@ ftrace_no_trace:
 	mtlr	r0
 	bctr
 #endif
+#endif
 
 #ifdef CONFIG_LIVEPATCH_64
 	/*
@@ -196,9 +242,9 @@ ftrace_no_trace:
 	 *
 	 * On entry:
 	 *  - we have no stack frame and can not allocate one
-	 *  - LR points back to the original caller (in A)
-	 *  - CTR holds the new NIP in C
-	 *  - r0, r11 & r12 are free
+	 *  - LR/r0 points back to the original caller (in A)
+	 *  - CTR/LR holds the new NIP in C
+	 *  - r11 & r12 are free
 	 */
 livepatch_handler:
 	ld	r12, PACA_THREAD_INFO(r13)
@@ -208,18 +254,23 @@ livepatch_handler:
 	addi	r11, r11, 24
 	std	r11, TI_livepatch_sp(r12)
 
-	/* Save toc & real LR on livepatch stack */
-	std	r2,  -24(r11)
-	mflr	r12
-	std	r12, -16(r11)
-
 	/* Store stack end marker */
 	lis     r12, STACK_END_MAGIC@h
 	ori     r12, r12, STACK_END_MAGIC@l
 	std	r12, -8(r11)
 
-	/* Put ctr in r12 for global entry and branch there */
+	/* Save toc & real LR on livepatch stack */
+	std	r2,  -24(r11)
+#ifndef CONFIG_FTRACE_PFE_OUT_OF_LINE
+	mflr	r12
+	std	r12, -16(r11)
 	mfctr	r12
+#else
+	std	r0, -16(r11)
+	mflr	r12
+	/* Put ctr in r12 for global entry and branch there */
+	mtctr	r12
+#endif
 	bctrl
 
 	/*
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index f420df7888a7..0aef9959f2cd 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -267,14 +267,13 @@ SECTIONS
 	.init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
 		_sinittext = .;
 		INIT_TEXT
-
+		*(.tramp.ftrace.init);
 		/*
 		 *.init.text might be RO so we must ensure this section ends on
 		 * a page boundary.
 		 */
 		. = ALIGN(PAGE_SIZE);
 		_einittext = .;
-		*(.tramp.ftrace.init);
 	} :text
 
 	/* .exit.text is discarded at runtime, not link time,
diff --git a/arch/powerpc/tools/Makefile b/arch/powerpc/tools/Makefile
new file mode 100644
index 000000000000..9e2ba9a85baa
--- /dev/null
+++ b/arch/powerpc/tools/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+quiet_cmd_gen_ftrace_pfe_stubs = FTRACE  $@
+      cmd_gen_ftrace_pfe_stubs = $< $(objtree)/vmlinux.o $@
+
+targets += .arch.vmlinux.o
+.arch.vmlinux.o: $(srctree)/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh $(objtree)/vmlinux.o FORCE
+	$(call if_changed,gen_ftrace_pfe_stubs)
+
+clean-files += $(objtree)/.arch.vmlinux.S $(objtree)/.arch.vmlinux.o
diff --git a/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh b/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh
new file mode 100755
index 000000000000..ec95e99aff14
--- /dev/null
+++ b/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh
@@ -0,0 +1,48 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+# Error out on error
+set -e
+
+is_enabled() {
+	grep -q "^$1=y" include/config/auto.conf
+}
+
+vmlinux_o=${1}
+arch_vmlinux_o=${2}
+arch_vmlinux_S=$(dirname ${arch_vmlinux_o})/$(basename ${arch_vmlinux_o} .o).S
+
+RELOCATION=R_PPC64_ADDR64
+if is_enabled CONFIG_PPC32; then
+	RELOCATION=R_PPC_ADDR32
+fi
+
+num_pfe_stubs_text=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |
+		     grep -v ".init.text" | grep "${RELOCATION}" | wc -l)
+num_pfe_stubs_inittext=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |
+			 grep ".init.text" | grep "${RELOCATION}" | wc -l)
+
+cat > ${arch_vmlinux_S} <<EOF
+#include <asm/asm-offsets.h>
+#include <linux/linkage.h>
+
+.pushsection .tramp.ftrace.text,"aw"
+SYM_DATA(ftrace_pfe_stub_text_count, .long ${num_pfe_stubs_text})
+
+SYM_CODE_START(ftrace_pfe_stub_text)
+	.space ${num_pfe_stubs_text} * FTRACE_PFE_STUB_SIZE
+SYM_CODE_END(ftrace_pfe_stub_text)
+.popsection
+
+.pushsection .tramp.ftrace.init,"aw"
+SYM_DATA(ftrace_pfe_stub_inittext_count, .long ${num_pfe_stubs_inittext})
+
+SYM_CODE_START(ftrace_pfe_stub_inittext)
+	.space ${num_pfe_stubs_inittext} * FTRACE_PFE_STUB_SIZE
+SYM_CODE_END(ftrace_pfe_stub_inittext)
+.popsection
+EOF
+
+${CC} ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS} \
+      ${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL} \
+      -c -o ${arch_vmlinux_o} ${arch_vmlinux_S}
-- 
2.45.2



* [RFC PATCH v3 07/11] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
  2024-06-20 18:54 [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Naveen N Rao
                   ` (5 preceding siblings ...)
  2024-06-20 18:54 ` [RFC PATCH v3 06/11] powerpc64/ftrace: Move ftrace sequence out of line Naveen N Rao
@ 2024-06-20 18:54 ` Naveen N Rao
  2024-06-20 18:54 ` [RFC PATCH v3 08/11] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS Naveen N Rao
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Naveen N Rao @ 2024-06-20 18:54 UTC (permalink / raw)
  To: linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Nicholas Piggin, Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

Implement support for DYNAMIC_FTRACE_WITH_CALL_OPS similar to the
arm64 implementation.

This works by patching in a pointer to an associated ftrace_ops
structure before each traceable function. If multiple ftrace_ops are
associated with a call site, then a special ftrace_list_ops is used to
enable iterating over all the registered ftrace_ops. If no ftrace_ops
are associated with a call site, then a special ftrace_nop_ops structure
is used to render the ftrace call as a no-op. The ftrace trampoline can
then read the ftrace_ops associated with a call site by loading it from
a fixed offset relative to the LR, and branch directly to the associated
function.

The primary advantage of this approach is that we don't have to iterate
over all the registered ftrace_ops for call sites that have a single
ftrace_ops registered.
single ftrace_ops registered. This is the equivalent of implementing
support for dynamic ftrace trampolines, which set up a special ftrace
trampoline for each registered ftrace_ops and have individual call sites
branch into those directly.

A secondary advantage is that this gives us a way to add support for
direct ftrace callers without having to resort to using stubs. The
address of the direct call trampoline can be loaded from the ftrace_ops
structure.

To support this, we reserve a nop before each function on 32-bit
powerpc. For 64-bit powerpc, two nops are reserved before each
out-of-line stub. During ftrace activation, we update this location with
the associated ftrace_ops pointer. Then, on ftrace entry, we load from
this location and call into ftrace_ops->func().

For 64-bit powerpc, we ensure that the out-of-line stub area is
doubleword aligned so that the ftrace_ops pointer can be updated
atomically.
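
A sketch of the resulting 64-bit layout and lookup (illustration only;
the offset matches the asm change below, with SZL being the size of a
long):

	/*
	 * Out-of-line stub with CALL_OPS (64-bit):
	 *
	 *   <stub - 8>:  struct ftrace_ops *   // updated with patch_ulong()
	 *   <stub + 0>:  mflr r0
	 *   <stub + 4>:  bl   ftrace_caller    // LR then points at <stub + 8>
	 *   <stub + 8>:  mtlr r0
	 *   <stub +12>:  b    <func> + 4
	 *
	 * The trampoline can therefore locate the ftrace_ops for the call
	 * site directly from the saved LR and call ops->func, instead of
	 * loading the global function_trace_op.
	 */
	static struct ftrace_ops *sketch_ops_for_site(unsigned long lr)
	{
		return *(struct ftrace_ops **)(lr - (2 * MCOUNT_INSN_SIZE + sizeof(unsigned long)));
	}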

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/Kconfig                       |  1 +
 arch/powerpc/Makefile                      |  4 ++
 arch/powerpc/include/asm/ftrace.h          |  5 +-
 arch/powerpc/kernel/asm-offsets.c          |  4 ++
 arch/powerpc/kernel/trace/ftrace.c         | 59 +++++++++++++++++++++-
 arch/powerpc/kernel/trace/ftrace_entry.S   | 34 ++++++++++---
 arch/powerpc/tools/gen-ftrace-pfe-stubs.sh |  5 +-
 7 files changed, 101 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index dd7efca2275a..fde64ad19de5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -235,6 +235,7 @@ config PPC
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
+	select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if FTRACE_PFE_OUT_OF_LINE || (PPC32 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY)
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
 	select HAVE_EBPF_JIT
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index bb920d48ec6e..c3e577dea137 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -158,8 +158,12 @@ KBUILD_CPPFLAGS	+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
 ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
 CC_FLAGS_FTRACE := -fpatchable-function-entry=1
 else
+ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS # PPC32 only
+CC_FLAGS_FTRACE := -fpatchable-function-entry=3,1
+else
 CC_FLAGS_FTRACE := -fpatchable-function-entry=2
 endif
+endif
 else
 CC_FLAGS_FTRACE := -pg
 ifdef CONFIG_MPROFILE_KERNEL
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 9da1da0f87b4..938cecf72eb1 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -137,8 +137,11 @@ static inline u8 this_cpu_get_ftrace_enabled(void) { return 1; }
 extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
 #ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
 struct ftrace_pfe_stub {
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	struct ftrace_ops *ftrace_op;
+#endif
 	u32	insn[4];
-};
+} __aligned(sizeof(unsigned long));
 extern struct ftrace_pfe_stub ftrace_pfe_stub_text[], ftrace_pfe_stub_inittext[];
 extern unsigned long ftrace_pfe_stub_text_count, ftrace_pfe_stub_inittext_count;
 #endif
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 5f1a411d714c..a11ea5f4d86a 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -678,5 +678,9 @@ int main(void)
 	DEFINE(FTRACE_PFE_STUB_SIZE, sizeof(struct ftrace_pfe_stub));
 #endif
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#endif
+
 	return 0;
 }
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 9f3c10307331..028548312c23 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -38,8 +38,11 @@ unsigned long ftrace_call_adjust(unsigned long addr)
 		return 0;
 
 	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY) &&
-	    !IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
+	    !IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE)) {
 		addr += MCOUNT_INSN_SIZE;
+		if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
+			addr += MCOUNT_INSN_SIZE;
+	}
 
 	return addr;
 }
@@ -241,6 +244,46 @@ static int ftrace_init_pfe_stub(struct module *mod, struct dyn_ftrace *rec)
 #endif
 }
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+static const struct ftrace_ops *powerpc_rec_get_ops(struct dyn_ftrace *rec)
+{
+	const struct ftrace_ops *ops = NULL;
+
+	if (rec->flags & FTRACE_FL_CALL_OPS_EN) {
+		ops = ftrace_find_unique_ops(rec);
+		WARN_ON_ONCE(!ops);
+	}
+
+	if (!ops)
+		ops = &ftrace_list_ops;
+
+	return ops;
+}
+
+static int ftrace_rec_set_ops(struct dyn_ftrace *rec, const struct ftrace_ops *ops)
+{
+	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
+		return patch_ulong((void *)(ftrace_get_pfe_stub(rec) - sizeof(unsigned long)),
+				   (unsigned long)ops);
+	else
+		return patch_ulong((void *)(rec->ip - MCOUNT_INSN_SIZE - sizeof(unsigned long)),
+				   (unsigned long)ops);
+}
+
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec)
+{
+	return ftrace_rec_set_ops(rec, &ftrace_nop_ops);
+}
+
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec)
+{
+	return ftrace_rec_set_ops(rec, powerpc_rec_get_ops(rec));
+}
+#else
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec) { return 0; }
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec) { return 0; }
+#endif
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned long addr)
 {
@@ -271,6 +314,10 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	if (!ret)
 		ret = ftrace_modify_code(ip, old, new);
 
+	ret = ftrace_rec_update_ops(rec);
+	if (ret)
+		return ret;
+
 	if (!ret && IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
 		ret = ftrace_modify_code(rec->ip, ppc_inst(PPC_RAW_NOP()),
 			 ppc_inst(PPC_RAW_BRANCH((long)ftrace_get_pfe_stub(rec) - (long)rec->ip)));
@@ -322,16 +369,19 @@ void ftrace_replace_code(int enable)
 		case FTRACE_UPDATE_MODIFY_CALL:
 			ret = ftrace_get_call_inst(rec, new_addr, &new_call_inst);
 			ret |= ftrace_get_call_inst(rec, addr, &call_inst);
+			ret |= ftrace_rec_update_ops(rec);
 			old = call_inst;
 			new = new_call_inst;
 			break;
 		case FTRACE_UPDATE_MAKE_NOP:
 			ret = ftrace_get_call_inst(rec, addr, &call_inst);
+			ret |= ftrace_rec_set_nop_ops(rec);
 			old = call_inst;
 			new = nop_inst;
 			break;
 		case FTRACE_UPDATE_MAKE_CALL:
 			ret = ftrace_get_call_inst(rec, new_addr, &call_inst);
+			ret |= ftrace_rec_update_ops(rec);
 			old = nop_inst;
 			new = call_inst;
 			break;
@@ -443,6 +493,13 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
 	ppc_inst_t old, new;
 	int ret;
 
+	/*
+	 * When using CALL_OPS, the function to call is associated with the
+	 * call site, and we don't have a global function pointer to update.
+	 */
+	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
+		return 0;
+
 	old = ppc_inst_read((u32 *)&ftrace_call);
 	new = ftrace_create_branch_inst(ip, ppc_function_entry(func), 1);
 	ret = ftrace_modify_code(ip, old, new);
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index b1cbef24f18f..a76aedd970a6 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -85,11 +85,21 @@
 	/* Save callee's TOC in the ABI compliant location */
 	std	r2, STK_GOT(r1)
 	LOAD_PACA_TOC()		/* get kernel TOC in r2 */
+#endif
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	/* r7 points to the instruction following the call to ftrace */
+	PPC_LL	r5, -(MCOUNT_INSN_SIZE*2 + SZL)(r7)
+	PPC_LL	r12, FTRACE_OPS_FUNC(r5)
+	mtctr	r12
+#else /* !CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS */
+#ifdef CONFIG_PPC64
 	LOAD_REG_ADDR(r3, function_trace_op)
 	ld	r5,0(r3)
 #else
 	lis	r3,function_trace_op@ha
 	lwz	r5,function_trace_op@l(r3)
+#endif
 #endif
 
 	/* Save special regs */
@@ -196,20 +206,30 @@
 #endif
 .endm
 
-_GLOBAL(ftrace_regs_caller)
-	ftrace_regs_entry 1
-	/* ftrace_call(r3, r4, r5, r6) */
+.macro ftrace_regs_func allregs
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	bctrl
+#else
+	.if \allregs == 1
 .globl ftrace_regs_call
 ftrace_regs_call:
+	.else
+.globl ftrace_call
+ftrace_call:
+	.endif
+	/* ftrace_call(r3, r4, r5, r6) */
 	bl	ftrace_stub
+#endif
+.endm
+
+_GLOBAL(ftrace_regs_caller)
+	ftrace_regs_entry 1
+	ftrace_regs_func 1
 	ftrace_regs_exit 1
 
 _GLOBAL(ftrace_caller)
 	ftrace_regs_entry 0
-	/* ftrace_call(r3, r4, r5, r6) */
-.globl ftrace_call
-ftrace_call:
-	bl	ftrace_stub
+	ftrace_regs_func 0
 	ftrace_regs_exit 0
 
 _GLOBAL(ftrace_stub)
diff --git a/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh b/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh
index ec95e99aff14..f23d3f74e029 100755
--- a/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh
+++ b/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh
@@ -24,12 +24,13 @@ num_pfe_stubs_inittext=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entr
 
 cat > ${arch_vmlinux_S} <<EOF
 #include <asm/asm-offsets.h>
+#include <asm/ppc_asm.h>
 #include <linux/linkage.h>
 
 .pushsection .tramp.ftrace.text,"aw"
 SYM_DATA(ftrace_pfe_stub_text_count, .long ${num_pfe_stubs_text})
 
-SYM_CODE_START(ftrace_pfe_stub_text)
+SYM_START(ftrace_pfe_stub_text, SYM_L_GLOBAL, .balign SZL)
 	.space ${num_pfe_stubs_text} * FTRACE_PFE_STUB_SIZE
 SYM_CODE_END(ftrace_pfe_stub_text)
 .popsection
@@ -37,7 +38,7 @@ SYM_CODE_END(ftrace_pfe_stub_text)
 .pushsection .tramp.ftrace.init,"aw"
 SYM_DATA(ftrace_pfe_stub_inittext_count, .long ${num_pfe_stubs_inittext})
 
-SYM_CODE_START(ftrace_pfe_stub_inittext)
+SYM_START(ftrace_pfe_stub_inittext, SYM_L_GLOBAL, .balign SZL)
 	.space ${num_pfe_stubs_inittext} * FTRACE_PFE_STUB_SIZE
 SYM_CODE_END(ftrace_pfe_stub_inittext)
 .popsection
-- 
2.45.2



* [RFC PATCH v3 08/11] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
  2024-06-20 18:54 [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Naveen N Rao
                   ` (6 preceding siblings ...)
  2024-06-20 18:54 ` [RFC PATCH v3 07/11] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS Naveen N Rao
@ 2024-06-20 18:54 ` Naveen N Rao
  2024-06-20 18:54 ` [RFC PATCH v3 09/11] samples/ftrace: Add support for ftrace direct samples on powerpc Naveen N Rao
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Naveen N Rao @ 2024-06-20 18:54 UTC (permalink / raw)
  To: linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Nicholas Piggin, Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS similar to the arm64
implementation.

ftrace direct calls allow custom trampolines to be invoked directly
from ftrace call sites at function entry, bypassing the ftrace
trampoline completely. This functionality is currently utilized by BPF
trampolines to hook into kernel function entries.

Since we have a limited relative branch range, we support ftrace direct
calls through DYNAMIC_FTRACE_WITH_CALL_OPS. In this approach, the
ftrace trampoline is not entirely bypassed. Rather, it is re-purposed
into a stub that reads the direct_call field from the associated
ftrace_ops structure and branches into it, if it is not NULL. For this,
it is sufficient to ensure that the ftrace trampoline is reachable from
all traceable functions.

When multiple ftrace_ops are associated with a call site, we use a
callback to set pt_regs->orig_gpr3, which is then tested on the return
path from the ftrace trampoline to branch into the direct caller.
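
To make that flow concrete, here is a loose C-style sketch of what the
updated ftrace_entry.S ends up doing. This is conceptual only: the real
logic is assembly that runs before/after pt_regs are saved, and
call_site_ops() and branch_to() below are illustrative placeholders,
not kernel functions.

	/* Conceptual sketch only -- not actual kernel code */
	void ftrace_caller_sketch(unsigned long ip, unsigned long parent_ip,
				  struct ftrace_regs *fregs)
	{
		struct pt_regs *regs = &fregs->regs;
		/* ftrace_ops pointer stored alongside the call site (CALL_OPS) */
		struct ftrace_ops *ops = call_site_ops(ip);

		/* Sole direct ops at this site: branch straight out */
		if (ops->direct_call)
			branch_to(ops->direct_call);

		regs->orig_gpr3 = 0;
		ops->func(ip, parent_ip, ops, fregs);
		/*
		 * With multiple ops attached, ftrace invokes
		 * arch_ftrace_set_direct_caller(), which stores the direct
		 * trampoline address in orig_gpr3; honour it on the way out.
		 */
		if (regs->orig_gpr3)
			branch_to(regs->orig_gpr3);
	}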

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/Kconfig                     |  1 +
 arch/powerpc/include/asm/ftrace.h        | 15 ++++
 arch/powerpc/kernel/asm-offsets.c        |  3 +
 arch/powerpc/kernel/trace/ftrace.c       |  9 +++
 arch/powerpc/kernel/trace/ftrace_entry.S | 99 ++++++++++++++++++------
 5 files changed, 105 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index fde64ad19de5..96ae653bdcde 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -236,6 +236,7 @@ config PPC
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
 	select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if FTRACE_PFE_OUT_OF_LINE || (PPC32 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY)
+	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS if HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
 	select HAVE_EBPF_JIT
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 938cecf72eb1..fc0f25b10e86 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -147,6 +147,21 @@ extern unsigned long ftrace_pfe_stub_text_count, ftrace_pfe_stub_inittext_count;
 #endif
 void ftrace_free_init_tramp(void);
 unsigned long ftrace_call_adjust(unsigned long addr);
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+/*
+ * When an ftrace registered caller is tracing a function that is also set by a
+ * register_ftrace_direct() call, it needs to be differentiated in the
+ * ftrace_caller trampoline so that the direct call can be invoked after the
+ * other ftrace ops. To do this, place the direct caller in the orig_gpr3 field
+ * of pt_regs. This tells ftrace_caller that there's a direct caller.
+ */
+static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs, unsigned long addr)
+{
+	struct pt_regs *regs = &fregs->regs;
+	regs->orig_gpr3 = addr;
+}
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
 #else
 static inline void ftrace_free_init_tramp(void) { }
 static inline unsigned long ftrace_call_adjust(unsigned long addr) { return addr; }
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index a11ea5f4d86a..0b955dddeb28 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -680,6 +680,9 @@ int main(void)
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
 	OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	OFFSET(FTRACE_OPS_DIRECT_CALL, ftrace_ops, direct_call);
+#endif
 #endif
 
 	return 0;
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 028548312c23..799612ee270f 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -153,6 +153,15 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
 	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
 		ip = ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE; /* second instruction in stub */
 
+	if (!is_offset_in_branch_range(addr - ip) && addr != FTRACE_ADDR && addr != FTRACE_REGS_ADDR) {
+		/* This can only happen with ftrace direct */
+		if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS)) {
+			pr_err("0x%lx (0x%lx): Unexpected target address 0x%lx\n", ip, rec->ip, addr);
+			return -EINVAL;
+		}
+		addr = FTRACE_ADDR;
+	}
+
 	if (is_offset_in_branch_range(addr - ip))
 		/* Within range */
 		stub = addr;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index a76aedd970a6..d5a63e60aafa 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,14 +33,38 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro	ftrace_regs_entry allregs
-	/* Save the original return address in A's stack frame */
-	PPC_STL		r0, LRSAVE(r1)
 	/* Create a minimal stack frame for representing B */
 	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)
 
 	/* Create our stack frame + pt_regs */
 	PPC_STLU	r1,-SWITCH_FRAME_SIZE(r1)
 
+	.if \allregs == 1
+	SAVE_GPRS(11, 12, r1)
+	.endif
+
+	/* Get the _mcount() call site out of LR */
+	mflr	r11
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	/* Load the ftrace_op */
+	PPC_LL	r12, -(MCOUNT_INSN_SIZE*2 + SZL)(r11)
+
+	/* Load direct_call from the ftrace_op */
+	PPC_LL	r12, FTRACE_OPS_DIRECT_CALL(r12)
+	PPC_LCMPI r12, 0
+	.if \allregs == 1
+	bne	.Lftrace_direct_call_regs
+	.else
+	bne	.Lftrace_direct_call
+	.endif
+#endif
+
+	/* Save the previous LR in pt_regs->link */
+	PPC_STL	r0, _LINK(r1)
+	/* Also save it in A's stack frame */
+	PPC_STL	r0, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE+LRSAVE(r1)
+
 	/* Save all gprs to pt_regs */
 	SAVE_GPR(0, r1)
 	SAVE_GPRS(3, 10, r1)
@@ -54,7 +78,7 @@
 
 	.if \allregs == 1
 	SAVE_GPR(2, r1)
-	SAVE_GPRS(11, 31, r1)
+	SAVE_GPRS(13, 31, r1)
 	.else
 #ifdef CONFIG_LIVEPATCH_64
 	SAVE_GPR(14, r1)
@@ -67,20 +91,15 @@
 
 	.if \allregs == 1
 	/* Load special regs for save below */
+	mfcr	r7
 	mfmsr   r8
 	mfctr   r9
 	mfxer   r10
-	mfcr	r11
 	.else
 	/* Clear MSR to flag as ftrace_caller versus frace_regs_caller */
 	li	r8, 0
 	.endif
 
-	/* Get the _mcount() call site out of LR */
-	mflr	r7
-	/* Save the read LR in pt_regs->link */
-	PPC_STL	r0, _LINK(r1)
-
 #ifdef CONFIG_PPC64
 	/* Save callee's TOC in the ABI compliant location */
 	std	r2, STK_GOT(r1)
@@ -88,8 +107,8 @@
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
-	/* r7 points to the instruction following the call to ftrace */
-	PPC_LL	r5, -(MCOUNT_INSN_SIZE*2 + SZL)(r7)
+	/* r11 points to the instruction following the call to ftrace */
+	PPC_LL	r5, -(MCOUNT_INSN_SIZE*2 + SZL)(r11)
 	PPC_LL	r12, FTRACE_OPS_FUNC(r5)
 	mtctr	r12
 #else /* !CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS */
@@ -105,40 +124,47 @@
 	/* Save special regs */
 	PPC_STL	r8, _MSR(r1)
 	.if \allregs == 1
+	PPC_STL	r7, _CCR(r1)
 	PPC_STL	r9, _CTR(r1)
 	PPC_STL	r10, _XER(r1)
-	PPC_STL	r11, _CCR(r1)
 	.endif
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	/* Clear orig_gpr3 to later detect ftrace_direct call */
+	li	r7, 0
+	PPC_STL	r7, ORIG_GPR3(r1)
+#endif
+
 #ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
 	/* Save our real return address locally for return */
-	PPC_STL	r7, STACK_INT_FRAME_MARKER(r1)
+	PPC_STL	r11, STACK_INT_FRAME_MARKER(r1)
 	/*
-	 * We want the ftrace location in the function, but our lr (in r7)
+	 * We want the ftrace location in the function, but our lr (in r11)
 	 * points at the 'mtlr r0' instruction in the out of line stub.  To
 	 * recover the ftrace location, we read the branch instruction in the
 	 * stub, and adjust our lr by the branch offset.
 	 */
-	lwz	r8, MCOUNT_INSN_SIZE(r7)
+	lwz	r8, MCOUNT_INSN_SIZE(r11)
 	slwi	r8, r8, 6
 	srawi	r8, r8, 6
-	add	r3, r7, r8
+	add	r3, r11, r8
 	/*
 	 * Override our nip to point past the branch in the original function.
 	 * This allows reliable stack trace and the ftrace stack tracer to work as-is.
 	 */
-	add	r7, r3, MCOUNT_INSN_SIZE
+	add	r11, r3, MCOUNT_INSN_SIZE
 #else
 	/* Calculate ip from nip-4 into r3 for call below */
-	subi    r3, r7, MCOUNT_INSN_SIZE
+	subi    r3, r11, MCOUNT_INSN_SIZE
 #endif
 
 	/* Save NIP as pt_regs->nip */
-	PPC_STL	r7, _NIP(r1)
+	PPC_STL	r11, _NIP(r1)
 	/* Also save it in B's stackframe header for proper unwind */
-	PPC_STL	r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
+	PPC_STL	r11, LRSAVE+SWITCH_FRAME_SIZE(r1)
+
 #ifdef CONFIG_LIVEPATCH_64
-	mr	r14, r7		/* remember old NIP */
+	mr	r14, r11		/* remember old NIP */
 #endif
 
 	/* Put the original return address in r4 as parent_ip */
@@ -149,11 +175,21 @@
 .endm
 
 .macro	ftrace_regs_exit allregs
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	/* Check orig_gpr3 to detect ftrace_direct call */
+	PPC_LL	r7, ORIG_GPR3(r1)
+	PPC_LCMPI cr1, r7, 0
+	mtctr	r7
+#endif
+
 #ifndef CONFIG_FTRACE_PFE_OUT_OF_LINE
 	/* Load ctr with the possibly modified NIP */
 	PPC_LL	r3, _NIP(r1)
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	bne	cr1,2f
+#endif
 	mtctr	r3
-
+2:
 #ifdef CONFIG_LIVEPATCH_64
 	cmpd	r14, r3		/* has NIP been altered? */
 #endif
@@ -198,7 +234,11 @@
         /* Based on the cmpd above, if the NIP was altered handle livepatch */
 	bne-	livepatch_handler
 #endif
+
 	/* jump after _mcount site */
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	bnectr	cr1
+#endif
 #ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
 	blr
 #else
@@ -251,6 +291,21 @@ ftrace_no_trace:
 #endif
 #endif
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+.Lftrace_direct_call_regs:
+	mtctr	r12
+	REST_GPRS(11, 12, r1)
+	addi	r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
+	bctr
+.Lftrace_direct_call:
+	mtctr	r12
+	addi	r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
+	bctr
+SYM_FUNC_START(ftrace_stub_direct_tramp)
+	blr
+SYM_FUNC_END(ftrace_stub_direct_tramp)
+#endif
+
 #ifdef CONFIG_LIVEPATCH_64
 	/*
 	 * This function runs in the mcount context, between two functions. As
-- 
2.45.2



* [RFC PATCH v3 09/11] samples/ftrace: Add support for ftrace direct samples on powerpc
  2024-06-20 18:54 [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Naveen N Rao
                   ` (7 preceding siblings ...)
  2024-06-20 18:54 ` [RFC PATCH v3 08/11] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS Naveen N Rao
@ 2024-06-20 18:54 ` Naveen N Rao
  2024-06-20 19:09 ` [RFC PATCH v3 10/11] powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into bpf_jit_emit_func_call_rel() Naveen N Rao
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Naveen N Rao @ 2024-06-20 18:54 UTC (permalink / raw)
  To: linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Nicholas Piggin, Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

Add 32-bit and 64-bit powerpc samples for ftrace direct. These
demonstrate the instruction sequences that ftrace direct callers must
use to adhere to the ftrace ABI.

On 64-bit powerpc, TOC setup requires some additional work.
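
For reference, registering one of these trampolines from a module
follows the usual ftrace direct pattern. The snippet below is a trimmed
sketch of what samples/ftrace/ftrace-direct.c does; my_tramp is the
architecture-specific asm trampoline added by this patch, and
wake_up_process() is just the example attach point used by that sample.

	#include <linux/module.h>
	#include <linux/ftrace.h>
	#include <linux/sched.h>	/* for wake_up_process() */

	extern void my_tramp(void);	/* asm trampoline following the ftrace ABI */

	static struct ftrace_ops direct;

	static int __init ftrace_direct_sample_init(void)
	{
		ftrace_set_filter_ip(&direct, (unsigned long)wake_up_process, 0, 0);
		return register_ftrace_direct(&direct, (unsigned long)my_tramp);
	}

	static void __exit ftrace_direct_sample_exit(void)
	{
		unregister_ftrace_direct(&direct, (unsigned long)my_tramp, true);
	}

	module_init(ftrace_direct_sample_init);
	module_exit(ftrace_direct_sample_exit);
	MODULE_LICENSE("GPL");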

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/Kconfig                        |   2 +
 samples/ftrace/ftrace-direct-modify.c       |  85 +++++++++++++++-
 samples/ftrace/ftrace-direct-multi-modify.c | 101 +++++++++++++++++++-
 samples/ftrace/ftrace-direct-multi.c        |  79 ++++++++++++++-
 samples/ftrace/ftrace-direct-too.c          |  83 +++++++++++++++-
 samples/ftrace/ftrace-direct.c              |  69 ++++++++++++-
 6 files changed, 414 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 96ae653bdcde..cf5780d6f7bf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -275,6 +275,8 @@ config PPC
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE
 	select HAVE_RSEQ
+	select HAVE_SAMPLE_FTRACE_DIRECT	if HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	select HAVE_SAMPLE_FTRACE_DIRECT_MULTI	if HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 	select HAVE_SETUP_PER_CPU_AREA		if PPC64
 	select HAVE_SOFTIRQ_ON_OWN_STACK
 	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
index 81220390851a..fa8996e251c8 100644
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -2,7 +2,7 @@
 #include <linux/module.h>
 #include <linux/kthread.h>
 #include <linux/ftrace.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -199,6 +199,89 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC
+#include <asm/ppc_asm.h>
+
+#ifdef CONFIG_PPC64
+#define STACK_FRAME_SIZE 48
+#else
+#define STACK_FRAME_SIZE 24
+#endif
+
+#if defined(CONFIG_PPC64_ELF_ABI_V2) && !defined(CONFIG_PPC_KERNEL_PCREL)
+#define PPC64_TOC_SAVE_AND_UPDATE			\
+"	std		2, 24(1)\n"			\
+"	bcl		20, 31, 1f\n"			\
+"   1:	mflr		12\n"				\
+"	ld		2, (99f - 1b)(12)\n"
+#define PPC64_TOC_RESTORE				\
+"	ld		2, 24(1)\n"
+#define PPC64_TOC					\
+"   99:	.quad		.TOC.@tocbase\n"
+#else
+#define PPC64_TOC_SAVE_AND_UPDATE ""
+#define PPC64_TOC_RESTORE ""
+#define PPC64_TOC ""
+#endif
+
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtlr		0\n"
+#define PPC_FTRACE_RET					\
+"	blr\n"
+#else
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtctr		0\n"
+#define PPC_FTRACE_RET					\
+"	mtlr		0\n"				\
+"	bctr\n"
+#endif
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp1, @function\n"
+"	.globl		my_tramp1\n"
+"   my_tramp1:\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mflr		0\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+	PPC64_TOC_SAVE_AND_UPDATE
+"	bl		my_direct_func1\n"
+	PPC64_TOC_RESTORE
+"	addi		1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+	PPC_FTRACE_RESTORE_LR
+"	addi		1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_FTRACE_RET
+"	.size		my_tramp1, .-my_tramp1\n"
+
+"	.type		my_tramp2, @function\n"
+"	.globl		my_tramp2\n"
+"   my_tramp2:\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mflr		0\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+	PPC64_TOC_SAVE_AND_UPDATE
+"	bl		my_direct_func2\n"
+	PPC64_TOC_RESTORE
+"	addi		1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+	PPC_FTRACE_RESTORE_LR
+"	addi		1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_FTRACE_RET
+	PPC64_TOC
+"	.size		my_tramp2, .-my_tramp2\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC */
+
 static struct ftrace_ops direct;
 
 static unsigned long my_tramp = (unsigned long)my_tramp1;
diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
index f943e40d57fd..f86ee2a89000 100644
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@ -2,7 +2,7 @@
 #include <linux/module.h>
 #include <linux/kthread.h>
 #include <linux/ftrace.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -225,6 +225,105 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC
+#include <asm/ppc_asm.h>
+
+#ifdef CONFIG_PPC64
+#define STACK_FRAME_SIZE 48
+#else
+#define STACK_FRAME_SIZE 24
+#endif
+
+#if defined(CONFIG_PPC64_ELF_ABI_V2) && !defined(CONFIG_PPC_KERNEL_PCREL)
+#define PPC64_TOC_SAVE_AND_UPDATE			\
+"	std		2, 24(1)\n"			\
+"	bcl		20, 31, 1f\n"			\
+"   1:	mflr		12\n"				\
+"	ld		2, (99f - 1b)(12)\n"
+#define PPC64_TOC_RESTORE				\
+"	ld		2, 24(1)\n"
+#define PPC64_TOC					\
+"   99:	.quad		.TOC.@tocbase\n"
+#else
+#define PPC64_TOC_SAVE_AND_UPDATE ""
+#define PPC64_TOC_RESTORE ""
+#define PPC64_TOC ""
+#endif
+
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtlr		0\n"
+#define PPC_FTRACE_RET					\
+"	blr\n"
+#define PPC_FTRACE_RECOVER_IP				\
+"	lwz		8, 4(3)\n"			\
+"	li		9, 6\n"				\
+"	slw		8, 8, 9\n"			\
+"	sraw		8, 8, 9\n"			\
+"	add		3, 3, 8\n"			\
+"	addi		3, 3, 4\n"
+#else
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtctr		0\n"
+#define PPC_FTRACE_RET					\
+"	mtlr		0\n"				\
+"	bctr\n"
+#define PPC_FTRACE_RECOVER_IP ""
+#endif
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp1, @function\n"
+"	.globl		my_tramp1\n"
+"   my_tramp1:\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mflr		0\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+	PPC64_TOC_SAVE_AND_UPDATE
+	PPC_STL"	3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mr		3, 0\n"
+	PPC_FTRACE_RECOVER_IP
+"	bl		my_direct_func1\n"
+	PPC_LL"		3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+	PPC64_TOC_RESTORE
+"	addi		1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+	PPC_FTRACE_RESTORE_LR
+"	addi		1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_FTRACE_RET
+"	.size		my_tramp1, .-my_tramp1\n"
+
+"	.type		my_tramp2, @function\n"
+"	.globl		my_tramp2\n"
+"   my_tramp2:\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mflr		0\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+	PPC64_TOC_SAVE_AND_UPDATE
+	PPC_STL"	3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mr		3, 0\n"
+	PPC_FTRACE_RECOVER_IP
+"	bl		my_direct_func2\n"
+	PPC_LL"		3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+	PPC64_TOC_RESTORE
+"	addi		1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+	PPC_FTRACE_RESTORE_LR
+"	addi		1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_FTRACE_RET
+	PPC64_TOC
+	"	.size		my_tramp2, .-my_tramp2\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC */
+
 static unsigned long my_tramp = (unsigned long)my_tramp1;
 static unsigned long tramps[2] = {
 	(unsigned long)my_tramp1,
diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
index aed6df2927ce..e47513f128c4 100644
--- a/samples/ftrace/ftrace-direct-multi.c
+++ b/samples/ftrace/ftrace-direct-multi.c
@@ -4,7 +4,7 @@
 #include <linux/mm.h> /* for handle_mm_fault() */
 #include <linux/ftrace.h>
 #include <linux/sched/stat.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -141,6 +141,83 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC
+#include <asm/ppc_asm.h>
+
+#ifdef CONFIG_PPC64
+#define STACK_FRAME_SIZE 48
+#else
+#define STACK_FRAME_SIZE 24
+#endif
+
+#if defined(CONFIG_PPC64_ELF_ABI_V2) && !defined(CONFIG_PPC_KERNEL_PCREL)
+#define PPC64_TOC_SAVE_AND_UPDATE			\
+"	std		2, 24(1)\n"			\
+"	bcl		20, 31, 1f\n"			\
+"   1:	mflr		12\n"				\
+"	ld		2, (99f - 1b)(12)\n"
+#define PPC64_TOC_RESTORE				\
+"	ld		2, 24(1)\n"
+#define PPC64_TOC					\
+"   99:	.quad		.TOC.@tocbase\n"
+#else
+#define PPC64_TOC_SAVE_AND_UPDATE ""
+#define PPC64_TOC_RESTORE ""
+#define PPC64_TOC ""
+#endif
+
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtlr		0\n"
+#define PPC_FTRACE_RET					\
+"	blr\n"
+#define PPC_FTRACE_RECOVER_IP				\
+"	lwz		8, 4(3)\n"			\
+"	li		9, 6\n"				\
+"	slw		8, 8, 9\n"			\
+"	sraw		8, 8, 9\n"			\
+"	add		3, 3, 8\n"			\
+"	addi		3, 3, 4\n"
+#else
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtctr		0\n"
+#define PPC_FTRACE_RET					\
+"	mtlr		0\n"				\
+"	bctr\n"
+#define PPC_FTRACE_RECOVER_IP ""
+#endif
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp, @function\n"
+"	.globl		my_tramp\n"
+"   my_tramp:\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mflr		0\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+	PPC64_TOC_SAVE_AND_UPDATE
+	PPC_STL"	3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mr		3, 0\n"
+	PPC_FTRACE_RECOVER_IP
+"	bl		my_direct_func\n"
+	PPC_LL"		3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+	PPC64_TOC_RESTORE
+"	addi		1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+	PPC_FTRACE_RESTORE_LR
+"	addi		1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_FTRACE_RET
+	PPC64_TOC
+"	.size		my_tramp, .-my_tramp\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC */
+
 static struct ftrace_ops direct;
 
 static int __init ftrace_direct_multi_init(void)
diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
index 6ff546a5d7eb..10d2f54c8b39 100644
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -3,7 +3,7 @@
 
 #include <linux/mm.h> /* for handle_mm_fault() */
 #include <linux/ftrace.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -153,6 +153,87 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC
+#include <asm/ppc_asm.h>
+
+#ifdef CONFIG_PPC64
+#define STACK_FRAME_SIZE 64
+#define STACK_FRAME_ARG1 32
+#define STACK_FRAME_ARG2 40
+#define STACK_FRAME_ARG3 48
+#define STACK_FRAME_ARG4 56
+#else
+#define STACK_FRAME_SIZE 32
+#define STACK_FRAME_ARG1 16
+#define STACK_FRAME_ARG2 20
+#define STACK_FRAME_ARG3 24
+#define STACK_FRAME_ARG4 28
+#endif
+
+#if defined(CONFIG_PPC64_ELF_ABI_V2) && !defined(CONFIG_PPC_KERNEL_PCREL)
+#define PPC64_TOC_SAVE_AND_UPDATE			\
+"	std		2, 24(1)\n"			\
+"	bcl		20, 31, 1f\n"			\
+"   1:	mflr		12\n"				\
+"	ld		2, (99f - 1b)(12)\n"
+#define PPC64_TOC_RESTORE				\
+"	ld		2, 24(1)\n"
+#define PPC64_TOC					\
+"   99:	.quad		.TOC.@tocbase\n"
+#else
+#define PPC64_TOC_SAVE_AND_UPDATE ""
+#define PPC64_TOC_RESTORE ""
+#define PPC64_TOC ""
+#endif
+
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtlr		0\n"
+#define PPC_FTRACE_RET					\
+"	blr\n"
+#else
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtctr		0\n"
+#define PPC_FTRACE_RET					\
+"	mtlr		0\n"				\
+"	bctr\n"
+#endif
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp, @function\n"
+"	.globl		my_tramp\n"
+"   my_tramp:\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mflr		0\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+	PPC64_TOC_SAVE_AND_UPDATE
+	PPC_STL"	3, "__stringify(STACK_FRAME_ARG1)"(1)\n"
+	PPC_STL"	4, "__stringify(STACK_FRAME_ARG2)"(1)\n"
+	PPC_STL"	5, "__stringify(STACK_FRAME_ARG3)"(1)\n"
+	PPC_STL"	6, "__stringify(STACK_FRAME_ARG4)"(1)\n"
+"	bl		my_direct_func\n"
+	PPC_LL"		6, "__stringify(STACK_FRAME_ARG4)"(1)\n"
+	PPC_LL"		5, "__stringify(STACK_FRAME_ARG3)"(1)\n"
+	PPC_LL"		4, "__stringify(STACK_FRAME_ARG2)"(1)\n"
+	PPC_LL"		3, "__stringify(STACK_FRAME_ARG1)"(1)\n"
+	PPC64_TOC_RESTORE
+"	addi		1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+	PPC_FTRACE_RESTORE_LR
+"	addi		1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_FTRACE_RET
+	PPC64_TOC
+"	.size		my_tramp, .-my_tramp\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC */
+
 static struct ftrace_ops direct;
 
 static int __init ftrace_direct_init(void)
diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
index ef0945670e1e..671d11dcbde8 100644
--- a/samples/ftrace/ftrace-direct.c
+++ b/samples/ftrace/ftrace-direct.c
@@ -3,7 +3,7 @@
 
 #include <linux/sched.h> /* for wake_up_process() */
 #include <linux/ftrace.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -134,6 +134,73 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC
+#include <asm/ppc_asm.h>
+
+#ifdef CONFIG_PPC64
+#define STACK_FRAME_SIZE 48
+#else
+#define STACK_FRAME_SIZE 24
+#endif
+
+#if defined(CONFIG_PPC64_ELF_ABI_V2) && !defined(CONFIG_PPC_KERNEL_PCREL)
+#define PPC64_TOC_SAVE_AND_UPDATE			\
+"	std		2, 24(1)\n"			\
+"	bcl		20, 31, 1f\n"			\
+"   1:	mflr		12\n"				\
+"	ld		2, (99f - 1b)(12)\n"
+#define PPC64_TOC_RESTORE				\
+"	ld		2, 24(1)\n"
+#define PPC64_TOC					\
+"   99:	.quad		.TOC.@tocbase\n"
+#else
+#define PPC64_TOC_SAVE_AND_UPDATE ""
+#define PPC64_TOC_RESTORE ""
+#define PPC64_TOC ""
+#endif
+
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtlr		0\n"
+#define PPC_FTRACE_RET					\
+"	blr\n"
+#else
+#define PPC_FTRACE_RESTORE_LR				\
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"	\
+"	mtctr		0\n"
+#define PPC_FTRACE_RET					\
+"	mtlr		0\n"				\
+"	bctr\n"
+#endif
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp, @function\n"
+"	.globl		my_tramp\n"
+"   my_tramp:\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	mflr		0\n"
+	PPC_STL"	0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_STLU"	1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+	PPC64_TOC_SAVE_AND_UPDATE
+	PPC_STL"	3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"	bl		my_direct_func\n"
+	PPC_LL"		3, "__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+	PPC64_TOC_RESTORE
+"	addi		1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+	PPC_FTRACE_RESTORE_LR
+"	addi		1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+	PPC_LL"		0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+	PPC_FTRACE_RET
+	PPC64_TOC
+"	.size		my_tramp, .-my_tramp\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC */
+
 static struct ftrace_ops direct;
 
 static int __init ftrace_direct_init(void)
-- 
2.45.2



* [RFC PATCH v3 10/11] powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into bpf_jit_emit_func_call_rel()
  2024-06-20 18:54 [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Naveen N Rao
                   ` (8 preceding siblings ...)
  2024-06-20 18:54 ` [RFC PATCH v3 09/11] samples/ftrace: Add support for ftrace direct samples on powerpc Naveen N Rao
@ 2024-06-20 19:09 ` Naveen N Rao
  2024-06-20 19:09 ` [RFC PATCH v3 11/11] powerpc64/bpf: Add support for bpf trampolines Naveen N Rao
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Naveen N Rao @ 2024-06-20 19:09 UTC (permalink / raw)
  To: linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Nicholas Piggin, Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

Commit 61688a82e047 ("powerpc/bpf: enable kfunc call") enhanced
bpf_jit_emit_func_call_hlp() to handle calls out to the module region,
where bpf progs are generated. The only difference now between
bpf_jit_emit_func_call_hlp() and bpf_jit_emit_func_call_rel() is in the
handling of the initial pass, where the target function address is not
known. Fold that logic into bpf_jit_emit_func_call_hlp() and rename it
to bpf_jit_emit_func_call_rel() to simplify the bpf function call JIT
code.

We don't actually need to load/restore the TOC across a call out to a
different kernel helper or to a different bpf program, since they all
work with the kernel TOC. We only need to do so when calling out to a
module function. So, guard the TOC load/restore with appropriate
conditions.
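
In effect, the TOC save/reload emitted around the call becomes
conditional on the call target. A hypothetical helper expressing that
check could look like the following (illustrative only -- the patch
open-codes it inside bpf_jit_emit_func_call_rel(), with the
ABI-specific details visible in the hunks below):

	static bool bpf_call_needs_toc_reload(unsigned long func_addr)
	{
		/*
		 * Kernel helpers and other bpf progs run with the kernel TOC;
		 * only calls into module text need the TOC switched back.
		 */
		return is_module_text_address(func_addr);
	}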

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/net/bpf_jit_comp64.c | 61 +++++++++----------------------
 1 file changed, 17 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 7703dcf48be8..288ff32d676f 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -202,14 +202,22 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
 	EMIT(PPC_RAW_BLR());
 }
 
-static int
-bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct codegen_context *ctx, u64 func)
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *ctx, u64 func)
 {
 	unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
 	long reladdr;
 
-	if (WARN_ON_ONCE(!kernel_text_address(func_addr)))
-		return -EINVAL;
+	/* bpf to bpf call, func is not known in the initial pass. Emit 5 nops as a placeholder */
+	if (!func) {
+		for (int i = 0; i < 5; i++)
+			EMIT(PPC_RAW_NOP());
+		/* elfv1 needs an additional instruction to load addr from descriptor */
+		if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V1))
+			EMIT(PPC_RAW_NOP());
+		EMIT(PPC_RAW_MTCTR(_R12));
+		EMIT(PPC_RAW_BCTRL());
+		return 0;
+	}
 
 #ifdef CONFIG_PPC_KERNEL_PCREL
 	reladdr = func_addr - local_paca->kernelbase;
@@ -266,7 +274,8 @@ bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct codegen_context *ctx,
 			 * We can clobber r2 since we get called through a
 			 * function pointer (so caller will save/restore r2).
 			 */
-			EMIT(PPC_RAW_LD(_R2, bpf_to_ppc(TMP_REG_2), 8));
+			if (is_module_text_address(func_addr))
+				EMIT(PPC_RAW_LD(_R2, bpf_to_ppc(TMP_REG_2), 8));
 		} else {
 			PPC_LI64(_R12, func);
 			EMIT(PPC_RAW_MTCTR(_R12));
@@ -276,46 +285,14 @@ bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct codegen_context *ctx,
 		 * Load r2 with kernel TOC as kernel TOC is used if function address falls
 		 * within core kernel text.
 		 */
-		EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, kernel_toc)));
+		if (is_module_text_address(func_addr))
+			EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, kernel_toc)));
 	}
 #endif
 
 	return 0;
 }
 
-int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *ctx, u64 func)
-{
-	unsigned int i, ctx_idx = ctx->idx;
-
-	if (WARN_ON_ONCE(func && is_module_text_address(func)))
-		return -EINVAL;
-
-	/* skip past descriptor if elf v1 */
-	func += FUNCTION_DESCR_SIZE;
-
-	/* Load function address into r12 */
-	PPC_LI64(_R12, func);
-
-	/* For bpf-to-bpf function calls, the callee's address is unknown
-	 * until the last extra pass. As seen above, we use PPC_LI64() to
-	 * load the callee's address, but this may optimize the number of
-	 * instructions required based on the nature of the address.
-	 *
-	 * Since we don't want the number of instructions emitted to increase,
-	 * we pad the optimized PPC_LI64() call with NOPs to guarantee that
-	 * we always have a five-instruction sequence, which is the maximum
-	 * that PPC_LI64() can emit.
-	 */
-	if (!image)
-		for (i = ctx->idx - ctx_idx; i < 5; i++)
-			EMIT(PPC_RAW_NOP());
-
-	EMIT(PPC_RAW_MTCTR(_R12));
-	EMIT(PPC_RAW_BCTRL());
-
-	return 0;
-}
-
 static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)
 {
 	/*
@@ -1047,11 +1024,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
 			if (ret < 0)
 				return ret;
 
-			if (func_addr_fixed)
-				ret = bpf_jit_emit_func_call_hlp(image, fimage, ctx, func_addr);
-			else
-				ret = bpf_jit_emit_func_call_rel(image, fimage, ctx, func_addr);
-
+			ret = bpf_jit_emit_func_call_rel(image, fimage, ctx, func_addr);
 			if (ret)
 				return ret;
 
-- 
2.45.2



* [RFC PATCH v3 11/11] powerpc64/bpf: Add support for bpf trampolines
  2024-06-20 18:54 [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Naveen N Rao
                   ` (9 preceding siblings ...)
  2024-06-20 19:09 ` [RFC PATCH v3 10/11] powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into bpf_jit_emit_func_call_rel() Naveen N Rao
@ 2024-06-20 19:09 ` Naveen N Rao
  2024-07-01 11:03   ` Nicholas Piggin
  2024-06-24 11:59 ` [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Vishal Chourasia
  2024-07-03 11:11 ` Vishal Chourasia
  12 siblings, 1 reply; 27+ messages in thread
From: Naveen N Rao @ 2024-06-20 19:09 UTC (permalink / raw)
  To: linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Nicholas Piggin, Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline()
for 64-bit powerpc.

The BPF prog JIT is extended to mimic the 64-bit powerpc ftrace
approach: a single nop at function entry, followed by the function
profiling sequence out of line, and a separate long branch stub for
calls to trampolines that are out of range. A dummy_tramp is provided
to simplify synchronization, similar to arm64.

BPF trampolines adhere to the existing ftrace ABI using a
two-instruction profiling sequence, as well as the newer ABI using a
three-instruction profiling sequence that enables return with a 'blr'.
The trampoline code itself closely follows the x86 implementation.

While the code is generic, BPF trampolines are only enabled on 64-bit
powerpc. 32-bit powerpc will need testing and some updates.
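
For context, BPF trampolines are what back fentry/fexit programs, so a
minimal libbpf-style program such as the one below would exercise this
support once the series is applied. This assumes the usual vmlinux.h /
bpf_tracing.h setup used by the BPF selftests; do_unlinkat is only an
example attach point.

	// SPDX-License-Identifier: GPL-2.0
	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	/* Runs via a BPF trampoline attached at the traced function's entry */
	SEC("fentry/do_unlinkat")
	int BPF_PROG(trace_unlinkat, int dfd, struct filename *name)
	{
		bpf_printk("do_unlinkat() entered\n");
		return 0;
	}

	char LICENSE[] SEC("license") = "GPL";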

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/include/asm/ppc-opcode.h |  14 +
 arch/powerpc/net/bpf_jit.h            |  11 +
 arch/powerpc/net/bpf_jit_comp.c       | 702 +++++++++++++++++++++++++-
 arch/powerpc/net/bpf_jit_comp32.c     |   7 +-
 arch/powerpc/net/bpf_jit_comp64.c     |   7 +-
 5 files changed, 738 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 076ae60b4a55..9eaa2c5d9b73 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -585,12 +585,26 @@
 #define PPC_RAW_MTSPR(spr, d)		(0x7c0003a6 | ___PPC_RS(d) | __PPC_SPR(spr))
 #define PPC_RAW_EIEIO()			(0x7c0006ac)
 
+/* bcl 20,31,$+4 */
+#define PPC_RAW_BCL()			(0x429f0005)
 #define PPC_RAW_BRANCH(offset)		(0x48000000 | PPC_LI(offset))
 #define PPC_RAW_BL(offset)		(0x48000001 | PPC_LI(offset))
 #define PPC_RAW_TW(t0, a, b)		(0x7c000008 | ___PPC_RS(t0) | ___PPC_RA(a) | ___PPC_RB(b))
 #define PPC_RAW_TRAP()			PPC_RAW_TW(31, 0, 0)
 #define PPC_RAW_SETB(t, bfa)		(0x7c000100 | ___PPC_RT(t) | ___PPC_RA((bfa) << 2))
 
+#ifdef CONFIG_PPC32
+#define PPC_RAW_STL		PPC_RAW_STW
+#define PPC_RAW_STLU		PPC_RAW_STWU
+#define PPC_RAW_LL		PPC_RAW_LWZ
+#define PPC_RAW_CMPLI		PPC_RAW_CMPWI
+#else
+#define PPC_RAW_STL		PPC_RAW_STD
+#define PPC_RAW_STLU		PPC_RAW_STDU
+#define PPC_RAW_LL		PPC_RAW_LD
+#define PPC_RAW_CMPLI		PPC_RAW_CMPDI
+#endif
+
 /* Deal with instructions that older assemblers aren't aware of */
 #define	PPC_BCCTR_FLUSH		stringify_in_c(.long PPC_INST_BCCTR_FLUSH)
 #define	PPC_CP_ABORT		stringify_in_c(.long PPC_RAW_CP_ABORT)
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index cdea5dccaefe..58cdfbfbef94 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -21,6 +21,9 @@
 
 #define CTX_NIA(ctx) ((unsigned long)ctx->idx * 4)
 
+#define SZL			sizeof(unsigned long)
+#define BPF_INSN_SAFETY		64
+
 #define PLANT_INSTR(d, idx, instr)					      \
 	do { if (d) { (d)[idx] = instr; } idx++; } while (0)
 #define EMIT(instr)		PLANT_INSTR(image, ctx->idx, instr)
@@ -81,6 +84,13 @@
 				EMIT(PPC_RAW_ORI(d, d, (uintptr_t)(i) &       \
 							0xffff));             \
 		} } while (0)
+#define PPC_LI_ADDR	PPC_LI64
+#define PPC64_LOAD_PACA()						      \
+	EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, kernel_toc)))
+#else
+#define PPC_LI64	BUILD_BUG
+#define PPC_LI_ADDR	PPC_LI32
+#define PPC64_LOAD_PACA() BUILD_BUG()
 #endif
 
 /*
@@ -165,6 +175,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
 		       u32 *addrs, int pass, bool extra_pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
+void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr);
 
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 984655419da5..54df51ce54c8 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -22,11 +22,81 @@
 
 #include "bpf_jit.h"
 
+/* These offsets are from bpf prog end and stay the same across progs */
+static int bpf_jit_ool_stub, bpf_jit_long_branch_stub;
+
 static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
 {
 	memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
 }
 
+void dummy_tramp(void);
+
+asm (
+"	.pushsection .text, \"ax\", @progbits	;"
+"	.global dummy_tramp			;"
+"	.type dummy_tramp, @function		;"
+"dummy_tramp:					;"
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+"	blr					;"
+#else
+"	mflr	11				;"
+"	mtctr	11				;"
+"	mtlr	0				;"
+"	bctr					;"
+#endif
+"	.size dummy_tramp, .-dummy_tramp	;"
+"	.popsection				;"
+);
+
+void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
+{
+	int ool_stub_idx, long_branch_stub_idx;
+
+	/*
+	 * Out-of-line stub:
+	 *	mflr	r0
+	 *	[b|bl]	tramp
+	 *	mtlr	r0 // only with CONFIG_FTRACE_PFE_OUT_OF_LINE
+	 *	b	bpf_func + 4
+	 */
+	ool_stub_idx = ctx->idx;
+	EMIT(PPC_RAW_MFLR(_R0));
+	EMIT(PPC_RAW_NOP());
+	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
+		EMIT(PPC_RAW_MTLR(_R0));
+	WARN_ON_ONCE(!is_offset_in_branch_range(4 - (long)ctx->idx * 4)); /* TODO */
+	EMIT(PPC_RAW_BRANCH(4 - (long)ctx->idx * 4));
+
+	/*
+	 * Long branch stub:
+	 *	.long	<dummy_tramp_addr>
+	 *	mflr	r11
+	 *	bcl	20,31,$+4
+	 *	mflr	r12
+	 *	ld	r12, -8-SZL(r12)
+	 *	mtctr	r12
+	 *	mtlr	r11 // needed to retain ftrace ABI
+	 *	bctr
+	 */
+	if (image)
+		*((unsigned long *)&image[ctx->idx]) = (unsigned long)dummy_tramp;
+	ctx->idx += SZL / 4;
+	long_branch_stub_idx = ctx->idx;
+	EMIT(PPC_RAW_MFLR(_R11));
+	EMIT(PPC_RAW_BCL());
+	EMIT(PPC_RAW_MFLR(_R12));
+	EMIT(PPC_RAW_LL(_R12, _R12, -8-SZL));
+	EMIT(PPC_RAW_MTCTR(_R12));
+	EMIT(PPC_RAW_MTLR(_R11));
+	EMIT(PPC_RAW_BCTR());
+
+	if (!bpf_jit_ool_stub) {
+		bpf_jit_ool_stub = (ctx->idx - ool_stub_idx) * 4;
+		bpf_jit_long_branch_stub = (ctx->idx - long_branch_stub_idx) * 4;
+	}
+}
+
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr)
 {
 	if (!exit_addr || is_offset_in_branch_range(exit_addr - (ctx->idx * 4))) {
@@ -222,7 +292,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 
 	fp->bpf_func = (void *)fimage;
 	fp->jited = 1;
-	fp->jited_len = proglen + FUNCTION_DESCR_SIZE;
+	fp->jited_len = cgctx.idx * 4 + FUNCTION_DESCR_SIZE;
 
 	if (!fp->is_func || extra_pass) {
 		if (bpf_jit_binary_pack_finalize(fp, fhdr, hdr)) {
@@ -369,3 +439,633 @@ bool bpf_jit_supports_far_kfunc_call(void)
 {
 	return IS_ENABLED(CONFIG_PPC64);
 }
+
+void *arch_alloc_bpf_trampoline(unsigned int size)
+{
+	return bpf_prog_pack_alloc(size, bpf_jit_fill_ill_insns);
+}
+
+void arch_free_bpf_trampoline(void *image, unsigned int size)
+{
+	bpf_prog_pack_free(image, size);
+}
+
+int arch_protect_bpf_trampoline(void *image, unsigned int size)
+{
+	return 0;
+}
+
+static int invoke_bpf_prog(u32 *image, u32 *ro_image, struct codegen_context *ctx, struct bpf_tramp_link *l,
+			   int regs_off, int retval_off, int run_ctx_off, bool save_ret)
+{
+	struct bpf_prog *p = l->link.prog;
+	ppc_inst_t branch_insn;
+	u32 jmp_idx;
+	int ret = 0;
+
+	/* Save cookie */
+	if (IS_ENABLED(CONFIG_PPC64)) {
+		PPC_LI64(_R3, l->cookie);
+		EMIT(PPC_RAW_STD(_R3, _R1, run_ctx_off + offsetof(struct bpf_tramp_run_ctx, bpf_cookie)));
+	} else {
+		PPC_LI32(_R3, l->cookie >> 32);
+		PPC_LI32(_R4, l->cookie);
+		EMIT(PPC_RAW_STW(_R3, _R1, run_ctx_off + offsetof(struct bpf_tramp_run_ctx, bpf_cookie)));
+		EMIT(PPC_RAW_STW(_R4, _R1, run_ctx_off + offsetof(struct bpf_tramp_run_ctx, bpf_cookie) + 4));
+	}
+
+	/* __bpf_prog_enter(p, &bpf_tramp_run_ctx) */
+	PPC_LI_ADDR(_R3, p);
+	EMIT(PPC_RAW_MR(_R25, _R3));
+	EMIT(PPC_RAW_ADDI(_R4, _R1, run_ctx_off));
+	ret = bpf_jit_emit_func_call_rel(image, ro_image, ctx, (unsigned long)bpf_trampoline_enter(p));
+	if (ret)
+		return ret;
+
+	/* Remember prog start time returned by __bpf_prog_enter */
+	EMIT(PPC_RAW_MR(_R26, _R3));
+
+	/*
+	 * if (__bpf_prog_enter(p) == 0)
+	 *	goto skip_exec_of_prog;
+	 *
+	 * Emit a nop to be later patched with conditional branch, once offset is known
+	 */
+	EMIT(PPC_RAW_CMPLI(_R3, 0));
+	jmp_idx = ctx->idx;
+	EMIT(PPC_RAW_NOP());
+
+	/* p->bpf_func(ctx) */
+	EMIT(PPC_RAW_ADDI(_R3, _R1, regs_off));
+	if (!p->jited)
+		PPC_LI_ADDR(_R4, (unsigned long)p->insnsi);
+	if (!create_branch(&branch_insn, (u32 *)&ro_image[ctx->idx], (unsigned long)p->bpf_func, BRANCH_SET_LINK)) {
+		if (image)
+			image[ctx->idx] = ppc_inst_val(branch_insn);
+		ctx->idx++;
+	} else {
+		EMIT(PPC_RAW_LL(_R12, _R25, offsetof(struct bpf_prog, bpf_func)));
+		EMIT(PPC_RAW_MTCTR(_R12));
+		EMIT(PPC_RAW_BCTRL());
+	}
+
+	if (save_ret)
+		EMIT(PPC_RAW_STL(_R3, _R1, retval_off));
+
+	/* Fix up branch */
+	if (image) {
+		if (create_cond_branch(&branch_insn, &image[jmp_idx],
+				       (unsigned long)&image[ctx->idx], COND_EQ << 16))
+			return -EINVAL;
+		image[jmp_idx] = ppc_inst_val(branch_insn);
+	}
+
+	/* __bpf_prog_exit(p, start_time, &bpf_tramp_run_ctx) */
+	EMIT(PPC_RAW_MR(_R3, _R25));
+	EMIT(PPC_RAW_MR(_R4, _R26));
+	EMIT(PPC_RAW_ADDI(_R5, _R1, run_ctx_off));
+	ret = bpf_jit_emit_func_call_rel(image, ro_image, ctx, (unsigned long)bpf_trampoline_exit(p));
+
+	return ret;
+}
+
+static int invoke_bpf_mod_ret(u32 *image, u32 *ro_image, struct codegen_context *ctx, struct bpf_tramp_links *tl,
+			      int regs_off, int retval_off, int run_ctx_off, u32 *branches)
+{
+	int i;
+
+	/*
+	 * The first fmod_ret program will receive a garbage return value.
+	 * Set this to 0 to avoid confusing the program.
+	 */
+	EMIT(PPC_RAW_LI(_R3, 0));
+	EMIT(PPC_RAW_STL(_R3, _R1, retval_off));
+	for (i = 0; i < tl->nr_links; i++) {
+		if (invoke_bpf_prog(image, ro_image, ctx, tl->links[i], regs_off, retval_off, run_ctx_off, true))
+			return -EINVAL;
+
+		/*
+		 * mod_ret prog stored return value after prog ctx. Emit:
+		 * if (*(u64 *)(ret_val) !=  0)
+		 *	goto do_fexit;
+		 */
+		EMIT(PPC_RAW_LL(_R3, _R1, retval_off));
+		EMIT(PPC_RAW_CMPLI(_R3, 0));
+
+		/*
+		 * Save the location of the branch and generate a nop, which is
+		 * replaced with a conditional jump once do_fexit (i.e. the
+		 * start of the fexit invocation) is finalized.
+		 */
+		branches[i] = ctx->idx;
+		EMIT(PPC_RAW_NOP());
+	}
+
+	return 0;
+}
+
+static void bpf_trampoline_save_args(u32 *image, struct codegen_context *ctx, int nr_regs, int regs_off)
+{
+	for (int i = 0; i < nr_regs; i++)
+		EMIT(PPC_RAW_STL(_R3 + i, _R1, regs_off + i * SZL));
+}
+
+static void bpf_trampoline_restore_args(u32 *image, struct codegen_context *ctx, int nr_regs, int regs_off)
+{
+	for (int i = 0; i < nr_regs; i++)
+		EMIT(PPC_RAW_LL(_R3 + i, _R1, regs_off + i * SZL));
+}
+
+static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_image,
+					 void *rw_image_end, void *ro_image,
+					 const struct btf_func_model *m, u32 flags,
+					 struct bpf_tramp_links *tlinks,
+					 void *func_addr)
+{
+	int regs_off, nregs_off, ip_off, run_ctx_off, retval_off, nvr_off, alt_lr_off;
+	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
+	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
+	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
+	int i, ret, nr_regs = m->nr_args, stack_size = 0;
+	struct codegen_context codegen_ctx, *ctx;
+	u32 *image = (u32 *)rw_image;
+	ppc_inst_t branch_insn;
+	u32 *branches = NULL;
+	bool save_ret;
+
+	/* TODO: 32-bit enablement */
+	if (IS_ENABLED(CONFIG_PPC32))
+		return -ENOTSUPP;
+
+	/* Extra registers for struct arguments */
+	for (i = 0; i < m->nr_args; i++)
+		if (m->arg_size[i] > sizeof(unsigned long))
+			nr_regs += round_up(m->arg_size[i], sizeof(unsigned long)) / sizeof(unsigned long) - 1;
+
+	/* TODO: We only support functions with in-register arguments - up to 8 per powerpc ABI */
+	if (nr_regs > 8)
+		return -ENOTSUPP;
+
+	ctx = &codegen_ctx;
+	memset(ctx, 0, sizeof(*ctx));
+
+	/*
+	 * Generated stack layout:
+	 *
+	 * func prev back chain         [ back chain        ]
+	 *                              [                   ] --
+	 * LR save area                 [ r0 save (64-bit)  ]   | header
+	 *                              [ r0 save (32-bit)  ]   |
+	 * dummy frame for unwind       [ back chain 1      ] --
+	 *                              [ padding           ] align stack frame
+	 *       alt_lr_off             [ real lr (pfe ool) ] optional - actual lr
+	 *                              [ r26?              ]
+	 *       nvr_off                [ r25?              ] nvr save area
+	 *       retval_off             [ return value      ]
+	 *                              [ reg argN          ]
+	 *                              [ ...               ]
+	 *       regs_off               [ reg_arg1          ] prog ctx context
+	 *       nregs_off              [ args count        ]
+	 *       ip_off                 [ traced function   ]
+	 *                              [ ...               ]
+	 *       run_ctx_off            [ bpf_tramp_run_ctx ]
+	 *                              [ TOC save (64-bit) ] --
+	 *                              [ LR save (64-bit)  ]   | header
+	 *                              [ LR save (32-bit)  ]   |
+	 * bpf trampoline frame	        [ back chain 2      ] --
+	 *
+	 * TODO: tail call cnt?
+	 */
+
+	/* Minimum stack frame header */
+	stack_size = STACK_FRAME_MIN_SIZE;
+
+	/* Room for struct bpf_tramp_run_ctx */
+	run_ctx_off = stack_size;
+	stack_size += round_up(sizeof(struct bpf_tramp_run_ctx), SZL);
+
+	/* Room for IP address argument */
+	ip_off = stack_size;
+	if (flags & BPF_TRAMP_F_IP_ARG)
+		stack_size += SZL;
+
+	/* Room for args count */
+	nregs_off = stack_size;
+	stack_size += SZL;
+
+	/* Room for args */
+	regs_off = stack_size;
+	stack_size += nr_regs * SZL;
+
+	/* Room for return value of func_addr or fentry prog */
+	retval_off = stack_size;
+	save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
+	if (save_ret)
+		stack_size += SZL;
+
+	/* Room for nvr save area */
+	nvr_off = stack_size;
+	stack_size += 2 * SZL;
+
+	/* Optional save area for actual LR in case of ool ftrace */
+	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE)) {
+		alt_lr_off = stack_size;
+		stack_size += SZL;
+	}
+
+	/* Padding to align stack frame, if any */
+	stack_size = round_up(stack_size, SZL * 2);
+
+
+	/* Create dummy frame for unwind, store original return value */
+	EMIT(PPC_RAW_STL(_R0, _R1, PPC_LR_STKOFF));
+	EMIT(PPC_RAW_STLU(_R1, _R1, -STACK_FRAME_MIN_SIZE));
+
+	/* Create our stack frame */
+	EMIT(PPC_RAW_STLU(_R1, _R1, -stack_size));
+
+	/* 64-bit: Save TOC and load kernel TOC */
+	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
+		EMIT(PPC_RAW_STD(_R2, _R1, 24));
+		PPC64_LOAD_PACA();
+	}
+
+	bpf_trampoline_save_args(image, ctx, nr_regs, regs_off);
+
+	/* Save our return address */
+	EMIT(PPC_RAW_MFLR(_R3));
+	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
+		EMIT(PPC_RAW_STL(_R3, _R1, alt_lr_off));
+	else
+		EMIT(PPC_RAW_STL(_R3, _R1, stack_size + PPC_LR_STKOFF));
+
+	/*
+	 * Save ip address of the traced function.
+	 * We could recover this from LR, but we will need to address for OOL trampoline,
+	 * and optional GEP area.
+	 */
+	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE) || flags & BPF_TRAMP_F_IP_ARG) {
+		EMIT(PPC_RAW_LWZ(_R4, _R3, 4));
+		EMIT(PPC_RAW_SLWI(_R4, _R4, 6));
+		EMIT(PPC_RAW_SRAWI(_R4, _R4, 6));
+		EMIT(PPC_RAW_ADD(_R3, _R3, _R4));
+		EMIT(PPC_RAW_ADDI(_R3, _R3, 4));
+	}
+
+	if (flags & BPF_TRAMP_F_IP_ARG)
+		EMIT(PPC_RAW_STL(_R3, _R1, ip_off));
+
+	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
+		/* Fake our LR for unwind */
+		EMIT(PPC_RAW_STL(_R3, _R1, stack_size + PPC_LR_STKOFF));
+
+	/* Save function arg count -- see bpf_get_func_arg_cnt() */
+	/* TODO: should this be nr_args? */
+	EMIT(PPC_RAW_LI(_R3, nr_regs));
+	EMIT(PPC_RAW_STL(_R3, _R1, nregs_off));
+
+	/* Save nv regs */
+	EMIT(PPC_RAW_STL(_R25, _R1, nvr_off));
+	EMIT(PPC_RAW_STL(_R26, _R1, nvr_off + SZL));
+
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		PPC_LI_ADDR(_R3, (unsigned long)im);
+		ret = bpf_jit_emit_func_call_rel(image, ro_image, ctx, (unsigned long)__bpf_tramp_enter);
+		if (ret)
+			return ret;
+	}
+
+	for (i = 0; i < fentry->nr_links; i++)
+		if (invoke_bpf_prog(image, ro_image, ctx, fentry->links[i], regs_off, retval_off, run_ctx_off,
+				    flags & BPF_TRAMP_F_RET_FENTRY_RET))
+			return -EINVAL;
+
+	if (fmod_ret->nr_links) {
+		branches = kcalloc(fmod_ret->nr_links, sizeof(u32), GFP_KERNEL);
+		if (!branches)
+			return -ENOMEM;
+
+		if (invoke_bpf_mod_ret(image, ro_image, ctx, fmod_ret, regs_off, retval_off, run_ctx_off, branches)) {
+			ret = -EINVAL;
+			goto cleanup;
+		}
+	}
+
+	/* Call the traced function */
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		/*
+		 * The address in LR save area points to the correct point in the original function
+		 * with both FTRACE_PFE_OUT_OF_LINE as well as with traditional ftrace instruction
+		 * sequence
+		 */
+		EMIT(PPC_RAW_LL(_R3, _R1, stack_size + PPC_LR_STKOFF));
+		EMIT(PPC_RAW_MTCTR(_R3));
+
+		/* Restore args */
+		bpf_trampoline_restore_args(image, ctx, nr_regs, regs_off);
+
+		/* Restore TOC for 64-bit */
+		if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+			EMIT(PPC_RAW_LD(_R2, _R1, 24));
+		EMIT(PPC_RAW_BCTRL());
+		if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+			PPC64_LOAD_PACA();
+
+		/* Store return value for bpf prog to access */
+		EMIT(PPC_RAW_STL(_R3, _R1, retval_off));
+
+		/* Reserve space to patch branch instruction to skip fexit progs */
+		im->ip_after_call = &((u32 *)ro_image)[ctx->idx];
+		EMIT(PPC_RAW_NOP());
+	}
+
+	/* Update branches saved in invoke_bpf_mod_ret with address of do_fexit */
+	for (i = 0; i < fmod_ret->nr_links && image; i++) {
+		if (create_cond_branch(&branch_insn, &image[branches[i]],
+				       (unsigned long)&image[ctx->idx], COND_NE << 16)) {
+			ret = -EINVAL;
+			goto cleanup;
+		}
+
+		image[branches[i]] = ppc_inst_val(branch_insn);
+	}
+
+	for (i = 0; i < fexit->nr_links; i++)
+		if (invoke_bpf_prog(image, ro_image, ctx, fexit->links[i], regs_off, retval_off, run_ctx_off, false)) {
+			ret = -EINVAL;
+			goto cleanup;
+		}
+
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		im->ip_epilogue = &((u32 *)ro_image)[ctx->idx];
+		PPC_LI_ADDR(_R3, im);
+		ret = bpf_jit_emit_func_call_rel(image, ro_image, ctx, (unsigned long)__bpf_tramp_exit);
+		if (ret)
+			goto cleanup;
+	}
+
+	if (flags & BPF_TRAMP_F_RESTORE_REGS)
+		bpf_trampoline_restore_args(image, ctx, nr_regs, regs_off);
+
+	/* Restore return value of func_addr or fentry prog */
+	if (save_ret)
+		EMIT(PPC_RAW_LL(_R3, _R1, retval_off));
+
+	/* Restore nv regs */
+	EMIT(PPC_RAW_LL(_R26, _R1, nvr_off + SZL));
+	EMIT(PPC_RAW_LL(_R25, _R1, nvr_off));
+
+	/* Epilogue */
+	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+		EMIT(PPC_RAW_LD(_R2, _R1, 24));
+	if (flags & BPF_TRAMP_F_SKIP_FRAME) {
+		/* Skip the traced function and return to parent */
+		EMIT(PPC_RAW_ADDI(_R1, _R1, stack_size + STACK_FRAME_MIN_SIZE));
+		EMIT(PPC_RAW_LL(_R0, _R1, PPC_LR_STKOFF));
+		EMIT(PPC_RAW_MTLR(_R0));
+		EMIT(PPC_RAW_BLR());
+	} else {
+		if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE)) {
+			EMIT(PPC_RAW_LL(_R0, _R1, alt_lr_off));
+			EMIT(PPC_RAW_MTLR(_R0));
+			EMIT(PPC_RAW_ADDI(_R1, _R1, stack_size + STACK_FRAME_MIN_SIZE));
+			EMIT(PPC_RAW_LL(_R0, _R1, PPC_LR_STKOFF));
+			EMIT(PPC_RAW_BLR());
+		} else {
+			EMIT(PPC_RAW_LL(_R0, _R1, stack_size + PPC_LR_STKOFF));
+			EMIT(PPC_RAW_MTCTR(_R0));
+			EMIT(PPC_RAW_ADDI(_R1, _R1, stack_size + STACK_FRAME_MIN_SIZE));
+			EMIT(PPC_RAW_LL(_R0, _R1, PPC_LR_STKOFF));
+			EMIT(PPC_RAW_MTLR(_R0));
+			EMIT(PPC_RAW_BCTR());
+		}
+	}
+
+	/* Make sure the trampoline generation logic doesn't overflow */
+	if (image && WARN_ON_ONCE(&image[ctx->idx] > (u32 *)rw_image_end - BPF_INSN_SAFETY)) {
+		ret = -EFAULT;
+		goto cleanup;
+	}
+	ret = ctx->idx * 4 + BPF_INSN_SAFETY * 4;
+
+cleanup:
+	kfree(branches);
+	return ret;
+}
+
+int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
+			     struct bpf_tramp_links *tlinks, void *func_addr)
+{
+	struct bpf_tramp_image im;
+	void *image;
+	int ret;
+
+	/*
+	 * Allocate a temporary buffer for __arch_prepare_bpf_trampoline().
+	 * This will NOT cause fragmentation in direct map, as we do not
+	 * call set_memory_*() on this buffer.
+	 *
+	 * We cannot use kvmalloc here, because we need image to be in
+	 * module memory range.
+	 */
+	image = bpf_jit_alloc_exec(PAGE_SIZE);
+	if (!image)
+		return -ENOMEM;
+
+	ret = __arch_prepare_bpf_trampoline(&im, image, image + PAGE_SIZE, image,
+					    m, flags, tlinks, func_addr);
+	bpf_jit_free_exec(image);
+
+	return ret;
+}
+
+int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end,
+				const struct btf_func_model *m, u32 flags,
+				struct bpf_tramp_links *tlinks,
+				void *func_addr)
+{
+	u32 size = image_end - image;
+	void *rw_image, *tmp;
+	int ret;
+
+	/*
+	 * rw_image doesn't need to be in module memory range, so we can
+	 * use kvmalloc.
+	 */
+	rw_image = kvmalloc(size, GFP_KERNEL);
+	if (!rw_image)
+		return -ENOMEM;
+
+	ret = __arch_prepare_bpf_trampoline(im, rw_image, rw_image + size, image, m,
+					    flags, tlinks, func_addr);
+	if (ret < 0)
+		goto out;
+
+	if (bpf_jit_enable > 1)
+		bpf_jit_dump(1, ret - BPF_INSN_SAFETY * 4, 1, rw_image);
+
+	tmp = bpf_arch_text_copy(image, rw_image, size);
+	if (IS_ERR(tmp))
+		ret = PTR_ERR(tmp);
+
+out:
+	kvfree(rw_image);
+	return ret;
+}
+
+static int bpf_modify_inst(void *ip, ppc_inst_t old_inst, ppc_inst_t new_inst)
+{
+	ppc_inst_t org_inst;
+
+	if (copy_inst_from_kernel_nofault(&org_inst, ip)) {
+		pr_err("0x%lx: fetching instruction failed\n", (unsigned long)ip);
+		return -EFAULT;
+	}
+
+	if (!ppc_inst_equal(org_inst, old_inst)) {
+		pr_err("0x%lx: expected (%08lx) != found (%08lx)\n",
+		       (unsigned long)ip, ppc_inst_as_ulong(old_inst), ppc_inst_as_ulong(org_inst));
+		return -EINVAL;
+	}
+
+	if (ppc_inst_equal(old_inst, new_inst))
+		return 0;
+
+	return patch_instruction(ip, new_inst);
+}
+
+/*
+ * A 3-step process for bpf prog entry:
+ * 1. At bpf prog entry, a single nop/b:
+ * bpf_func:
+ *	[nop|b]	ool_stub
+ * 2. Out-of-line stub:
+ * ool_stub:
+ *	mflr	r0
+ *	[b|bl]	<bpf_prog>/<long_branch_stub>
+ *	mtlr	r0 // CONFIG_FTRACE_PFE_OUT_OF_LINE only
+ *	b	bpf_func + 4
+ * 3. Long branch stub:
+ * long_branch_stub:
+ *	.long	<branch_addr>/<dummy_tramp>
+ *	mflr	r11
+ *	bcl	20,31,$+4
+ *	mflr	r12
+ *	ld	r12, -16(r12)
+ *	mtctr	r12
+ *	mtlr	r11 // needed to retain ftrace ABI
+ *	bctr
+ *
+ * dummy_tramp is used to reduce synchronization requirements. Old trampoline is only freed
+ * through bpf_tramp_image_put() which uses rcu_tasks, so we don't need to do the same here.
+ */
+int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
+		       void *old_addr, void *new_addr)
+{
+	unsigned long bpf_func, bpf_func_end, size, offset;
+	ppc_inst_t old_inst, new_inst;
+	int ret = 0, branch_flags;
+	char name[KSYM_NAME_LEN];
+
+	/* TODO: 32-bit enablement */
+	if (IS_ENABLED(CONFIG_PPC32))
+		return -ENOTSUPP;
+
+	bpf_func = (unsigned long)ip;
+	branch_flags = poke_type == BPF_MOD_CALL ? BRANCH_SET_LINK : 0;
+
+	/* We currently only support poking bpf programs */
+	if (!__bpf_address_lookup(bpf_func, &size, &offset, name)) {
+		pr_err("%s (0x%lx): kernel/modules are not supported\n", __func__, bpf_func);
+		return -ENOTSUPP;
+	}
+
+	/*
+	 * If we are not poking at bpf prog entry, then we are simply patching in/out
+	 * an unconditional branch instruction at im->ip_after_call
+	 */
+	if (offset) {
+		if (poke_type != BPF_MOD_JUMP) {
+			pr_err("%s (0x%lx): calls are not supported in bpf prog body\n", __func__,
+			       bpf_func);
+			return -ENOTSUPP;
+		}
+		old_inst = ppc_inst(PPC_RAW_NOP());
+		if (old_addr)
+			if (create_branch(&old_inst, ip, (unsigned long)old_addr, 0))
+				return -ERANGE;
+		new_inst = ppc_inst(PPC_RAW_NOP());
+		if (new_addr)
+			if (create_branch(&new_inst, ip, (unsigned long)new_addr, 0))
+				return -ERANGE;
+		mutex_lock(&text_mutex);
+		ret = bpf_modify_inst(ip, old_inst, new_inst);
+		mutex_unlock(&text_mutex);
+		return ret;
+	}
+
+	bpf_func_end = bpf_func + size;
+
+	/* Address of the jmp/call instruction in the out-of-line stub */
+	ip = (void *)(bpf_func_end - bpf_jit_ool_stub + 4);
+
+	if (!is_offset_in_branch_range((long)ip - 4 - bpf_func)) {
+		pr_err("%s (0x%lx): bpf prog too large, ool stub out of branch range\n", __func__,
+		       bpf_func);
+		return -ERANGE;
+	}
+
+	old_inst = ppc_inst(PPC_RAW_NOP());
+	if (old_addr) {
+		if (is_offset_in_branch_range(ip - old_addr))
+			create_branch(&old_inst, ip, (unsigned long)old_addr, branch_flags);
+		else
+			create_branch(&old_inst, ip, bpf_func_end - bpf_jit_long_branch_stub,
+				      branch_flags);
+	}
+	new_inst = ppc_inst(PPC_RAW_NOP());
+	if (new_addr) {
+		if (is_offset_in_branch_range(ip - new_addr))
+			create_branch(&new_inst, ip, (unsigned long)new_addr, branch_flags);
+		else
+			create_branch(&new_inst, ip, bpf_func_end - bpf_jit_long_branch_stub,
+				      branch_flags);
+	}
+
+	mutex_lock(&text_mutex);
+
+	/*
+	 * 1. Update the address in the long branch stub:
+	 * If new_addr is out of range, we will have to use the long branch stub, so patch new_addr
+	 * here. Otherwise, revert to dummy_tramp, but only if we had patched old_addr here.
+	 */
+	if ((new_addr && !is_offset_in_branch_range(new_addr - ip)) ||
+	    (old_addr && !is_offset_in_branch_range(old_addr - ip)))
+		ret = patch_ulong((void *)(bpf_func_end - bpf_jit_long_branch_stub - SZL),
+				  (new_addr && !is_offset_in_branch_range(new_addr - ip)) ?
+				  (unsigned long)new_addr : (unsigned long)dummy_tramp);
+	if (ret)
+		goto out;
+
+	/* 2. Update the branch/call in the out-of-line stub */
+	ret = bpf_modify_inst(ip, old_inst, new_inst);
+	if (ret)
+		goto out;
+
+	/* 3. Update instruction at bpf prog entry */
+	ip = (void *)bpf_func;
+	if (!old_addr || !new_addr) {
+		if (!old_addr) {
+			old_inst = ppc_inst(PPC_RAW_NOP());
+			create_branch(&new_inst, ip, bpf_func_end - bpf_jit_ool_stub, 0);
+		} else {
+			new_inst = ppc_inst(PPC_RAW_NOP());
+			create_branch(&old_inst, ip, bpf_func_end - bpf_jit_ool_stub, 0);
+		}
+		ret = bpf_modify_inst(ip, old_inst, new_inst);
+	}
+
+out:
+	mutex_unlock(&text_mutex);
+	return ret;
+}
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index a0c4f1bde83e..c4db278dae36 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -127,13 +127,16 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 {
 	int i;
 
+	/* Instruction for trampoline attach */
+	EMIT(PPC_RAW_NOP());
+
 	/* Initialize tail_call_cnt, to be skipped if we do tail calls. */
 	if (ctx->seen & SEEN_TAILCALL)
 		EMIT(PPC_RAW_LI(_R4, 0));
 	else
 		EMIT(PPC_RAW_NOP());
 
-#define BPF_TAILCALL_PROLOGUE_SIZE	4
+#define BPF_TAILCALL_PROLOGUE_SIZE	8
 
 	if (bpf_has_stack_frame(ctx))
 		EMIT(PPC_RAW_STWU(_R1, _R1, -BPF_PPC_STACKFRAME(ctx)));
@@ -198,6 +201,8 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
 	bpf_jit_emit_common_epilogue(image, ctx);
 
 	EMIT(PPC_RAW_BLR());
+
+	bpf_jit_build_fentry_stubs(image, ctx);
 }
 
 /* Relative offset needs to be calculated based on final image location */
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 288ff32d676f..b58d1be63ea4 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -126,6 +126,9 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 {
 	int i;
 
+	/* Instruction for trampoline attach */
+	EMIT(PPC_RAW_NOP());
+
 #ifndef CONFIG_PPC_KERNEL_PCREL
 	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
 		EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, kernel_toc)));
@@ -200,6 +203,8 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
 	EMIT(PPC_RAW_MR(_R3, bpf_to_ppc(BPF_REG_0)));
 
 	EMIT(PPC_RAW_BLR());
+
+	bpf_jit_build_fentry_stubs(image, ctx);
 }
 
 int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *ctx, u64 func)
@@ -303,7 +308,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
 	 */
 	int b2p_bpf_array = bpf_to_ppc(BPF_REG_2);
 	int b2p_index = bpf_to_ppc(BPF_REG_3);
-	int bpf_tailcall_prologue_size = 8;
+	int bpf_tailcall_prologue_size = 12;
 
 	if (!IS_ENABLED(CONFIG_PPC_KERNEL_PCREL) && IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
 		bpf_tailcall_prologue_size += 4; /* skip past the toc load */
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines
  2024-06-20 18:54 [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Naveen N Rao
                   ` (10 preceding siblings ...)
  2024-06-20 19:09 ` [RFC PATCH v3 11/11] powerpc64/bpf: Add support for bpf trampolines Naveen N Rao
@ 2024-06-24 11:59 ` Vishal Chourasia
  2024-07-03 11:11 ` Vishal Chourasia
  12 siblings, 0 replies; 27+ messages in thread
From: Vishal Chourasia @ 2024-06-24 11:59 UTC (permalink / raw)
  To: Naveen N Rao
  Cc: linuxppc-dev, linux-trace-kernel, bpf, Michael Ellerman,
	Steven Rostedt, Masami Hiramatsu, Nicholas Piggin,
	Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa, Tejun Heo, David Vernet

On Fri, Jun 21, 2024 at 12:24:03AM +0530, Naveen N Rao wrote:
> This is v3 of the patches posted here:
> http://lkml.kernel.org/r/cover.1718008093.git.naveen@kernel.org
> 
> Since v2, I have addressed review comments from Steven and Masahiro 
> along with a few fixes. Patches 7-11 are new in this series and add 
> support for ftrace direct and bpf trampolines. 
> 
> This series depends on the patch series from Benjamin Gray adding 
> support for patch_ulong():
> http://lkml.kernel.org/r/20240515024445.236364-1-bgray@linux.ibm.com

This patchset, along with the prerequisite patchset [2] it depends on,
enables support for BPF schedulers introduced by the sched_ext
patchset [1] on Power Architecture.

Before this patchset, BPF schedulers were failing to load due to the 
lack of trampoline support.

GitHub repo for sched_ext: https://github.com/sched-ext/sched_ext
Note: This repo has been archived. Please refer to the description 
provided on the repo homepage for more details.

[1] v6 https://lore.kernel.org/lkml/20240501151312.635565-1-tj@kernel.org
[2] http://lkml.kernel.org/r/20240515024445.236364-1-bgray@linux.ibm.com

Thank you for the patch!
> 
> 
> - Naveen
> 
> 
> Naveen N Rao (11):
>   powerpc/kprobes: Use ftrace to determine if a probe is at function
>     entry
>   powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code
>   powerpc/module_64: Convert #ifdef to IS_ENABLED()
>   powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace
>   kbuild: Add generic hook for architectures to use before the final
>     vmlinux link
>   powerpc64/ftrace: Move ftrace sequence out of line
>   powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
>   powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>   samples/ftrace: Add support for ftrace direct samples on powerpc
>   powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into
>     bpf_jit_emit_func_call_rel()
>   powerpc64/bpf: Add support for bpf trampolines
> 
>  arch/Kconfig                                |   3 +
>  arch/powerpc/Kconfig                        |   9 +
>  arch/powerpc/Makefile                       |   8 +
>  arch/powerpc/include/asm/ftrace.h           |  29 +-
>  arch/powerpc/include/asm/module.h           |   5 +
>  arch/powerpc/include/asm/ppc-opcode.h       |  14 +
>  arch/powerpc/kernel/asm-offsets.c           |  11 +
>  arch/powerpc/kernel/kprobes.c               |  18 +-
>  arch/powerpc/kernel/module_64.c             |  67 +-
>  arch/powerpc/kernel/trace/ftrace.c          | 269 +++++++-
>  arch/powerpc/kernel/trace/ftrace_64_pg.c    |  73 +-
>  arch/powerpc/kernel/trace/ftrace_entry.S    | 210 ++++--
>  arch/powerpc/kernel/vmlinux.lds.S           |   3 +-
>  arch/powerpc/net/bpf_jit.h                  |  11 +
>  arch/powerpc/net/bpf_jit_comp.c             | 702 +++++++++++++++++++-
>  arch/powerpc/net/bpf_jit_comp32.c           |   7 +-
>  arch/powerpc/net/bpf_jit_comp64.c           |  68 +-
>  arch/powerpc/tools/Makefile                 |  10 +
>  arch/powerpc/tools/gen-ftrace-pfe-stubs.sh  |  49 ++
>  samples/ftrace/ftrace-direct-modify.c       |  85 ++-
>  samples/ftrace/ftrace-direct-multi-modify.c | 101 ++-
>  samples/ftrace/ftrace-direct-multi.c        |  79 ++-
>  samples/ftrace/ftrace-direct-too.c          |  83 ++-
>  samples/ftrace/ftrace-direct.c              |  69 +-
>  scripts/Makefile.vmlinux                    |   8 +
>  scripts/link-vmlinux.sh                     |  11 +-
>  26 files changed, 1813 insertions(+), 189 deletions(-)
>  create mode 100644 arch/powerpc/tools/Makefile
>  create mode 100755 arch/powerpc/tools/gen-ftrace-pfe-stubs.sh
> 
> 
> base-commit: e2b06d707dd067509cdc9ceba783c06fa6a551c2
> prerequisite-patch-id: a1d50e589288239d5a8b1c1f354cd4737057c9d3
> prerequisite-patch-id: da4142d56880861bd0ad7ad7087c9e2670a2ee54
> prerequisite-patch-id: 609d292e054b2396b603890522a940fa0bdfb6d8
> prerequisite-patch-id: 6f7213fb77b1260defbf43be0e47bff9c80054cc
> prerequisite-patch-id: ad3b71bf071ae4ba1bee5b087e61a2055772a74f
> -- 
> 2.45.2
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v3 01/11] powerpc/kprobes: Use ftrace to determine if a probe is at function entry
  2024-06-20 18:54 ` [RFC PATCH v3 01/11] powerpc/kprobes: Use ftrace to determine if a probe is at function entry Naveen N Rao
@ 2024-07-01  8:40   ` Nicholas Piggin
  2024-07-01 18:18     ` Naveen N Rao
  0 siblings, 1 reply; 27+ messages in thread
From: Nicholas Piggin @ 2024-07-01  8:40 UTC (permalink / raw)
  To: Naveen N Rao, linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> Rather than hard-coding the offset into a function to be used to
> determine if a kprobe is at function entry, use ftrace_location() to
> determine the ftrace location within the function and categorize all
> instructions till that offset to be function entry.
>
> For functions that cannot be traced, we fall back to using a fixed
> offset of 8 (two instructions) to categorize a probe as being at
> function entry for 64-bit elfv2, unless we are using pcrel.
>
> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Signed-off-by: Naveen N Rao <naveen@kernel.org>
> ---
>  arch/powerpc/kernel/kprobes.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index 14c5ddec3056..ca204f4f21c1 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -105,24 +105,22 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
>  	return addr;
>  }
>  
> -static bool arch_kprobe_on_func_entry(unsigned long offset)
> +static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long offset)
>  {
> -#ifdef CONFIG_PPC64_ELF_ABI_V2
> -#ifdef CONFIG_KPROBES_ON_FTRACE
> -	return offset <= 16;
> -#else
> -	return offset <= 8;
> -#endif
> -#else
> +	unsigned long ip = ftrace_location(addr);
> +
> +	if (ip)
> +		return offset <= (ip - addr);
> +	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
> +		return offset <= 8;

If it is PCREL, why not offset == 0 as well?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v3 02/11] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code
  2024-06-20 18:54 ` [RFC PATCH v3 02/11] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code Naveen N Rao
@ 2024-07-01  8:57   ` Nicholas Piggin
  2024-07-01 18:34     ` Naveen N Rao
  0 siblings, 1 reply; 27+ messages in thread
From: Nicholas Piggin @ 2024-07-01  8:57 UTC (permalink / raw)
  To: Naveen N Rao, linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> On 32-bit powerpc, gcc generates a three instruction sequence for
> function profiling:
> 	mflr	r0
> 	stw	r0, 4(r1)
> 	bl	_mcount
>
> On kernel boot, the call to _mcount() is nop-ed out, to be patched back
> in when ftrace is actually enabled. The 'stw' instruction therefore is
> not necessary unless ftrace is enabled. Nop it out during ftrace init.
>
> When ftrace is enabled, we want the 'stw' so that stack unwinding works
> properly. Perform the same within the ftrace handler, similar to 64-bit
> powerpc.
>
> For 64-bit powerpc, early versions of gcc used to emit a three
> instruction sequence for function profiling (with -mprofile-kernel) with
> a 'std' instruction to mimic the 'stw' above. Address that scenario also
> by nop-ing out the 'std' instruction during ftrace init.

Cool! Could 32-bit use the 2-insn sequence as well if it had
-mprofile-kernel, out of curiosity?

>
> Signed-off-by: Naveen N Rao <naveen@kernel.org>
> ---
>  arch/powerpc/kernel/trace/ftrace.c       | 6 ++++--
>  arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
>  2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
> index d8d6b4fd9a14..463bd7531dc8 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -241,13 +241,15 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>  		/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
>  		ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
>  		if (!ret)
> -			ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
> +			ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
> +						 ppc_inst(PPC_RAW_NOP()));
>  	} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
>  		/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl _mcount' */
>  		ret = ftrace_read_inst(ip - 4, &old);
>  		if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0)))) {
>  			ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
> -			ret |= ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
> +			ret |= ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
> +						  ppc_inst(PPC_RAW_NOP()));

So this is the old-style path... Should you check the mflr validate
result first? Also, do you know which GCC version this was, roughly?
Maybe we could have a comment here and eventually deprecate it.
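
i.e. something like the below -- just a sketch using the helpers
already in this file, untested:

	ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
	if (!ret)
		ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
					 ppc_inst(PPC_RAW_NOP()));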

You could split this change into its own patch.

>  		}
>  	} else {
>  		return -EINVAL;
> diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
> index 76dbe9fd2c0f..244a1c7bb1e8 100644
> --- a/arch/powerpc/kernel/trace/ftrace_entry.S
> +++ b/arch/powerpc/kernel/trace/ftrace_entry.S
> @@ -33,6 +33,8 @@
>   * and then arrange for the ftrace function to be called.
>   */
>  .macro	ftrace_regs_entry allregs
> +	/* Save the original return address in A's stack frame */
> +	PPC_STL		r0, LRSAVE(r1)
>  	/* Create a minimal stack frame for representing B */
>  	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)
>  
> @@ -44,8 +46,6 @@
>  	SAVE_GPRS(3, 10, r1)
>  
>  #ifdef CONFIG_PPC64
> -	/* Save the original return address in A's stack frame */
> -	std	r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
>  	/* Ok to continue? */
>  	lbz	r3, PACA_FTRACE_ENABLED(r13)
>  	cmpdi	r3, 0

That seems right to me.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v3 03/11] powerpc/module_64: Convert #ifdef to IS_ENABLED()
  2024-06-20 18:54 ` [RFC PATCH v3 03/11] powerpc/module_64: Convert #ifdef to IS_ENABLED() Naveen N Rao
@ 2024-07-01  9:01   ` Nicholas Piggin
  0 siblings, 0 replies; 27+ messages in thread
From: Nicholas Piggin @ 2024-07-01  9:01 UTC (permalink / raw)
  To: Naveen N Rao, linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> Minor refactor for converting #ifdef to IS_ENABLED().
>
> Signed-off-by: Naveen N Rao <naveen@kernel.org>
> ---
>  arch/powerpc/kernel/module_64.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index e9bab599d0c2..c202be11683b 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -241,14 +241,13 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
>  		}
>  	}
>  
> -#ifdef CONFIG_DYNAMIC_FTRACE
>  	/* make the trampoline to the ftrace_caller */
> -	relocs++;
> -#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE))
> +		relocs++;
> +
>  	/* an additional one for ftrace_regs_caller */
> -	relocs++;
> -#endif
> -#endif
> +	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS))
> +		relocs++;
>  
>  	pr_debug("Looks like a total of %lu stubs, max\n", relocs);
>  	return relocs * sizeof(struct ppc64_stub_entry);

LGTM. Hmm, you could get even cleverer

    // make the trampoline to the ftrace_caller and ftrace_regs_caller
    relocs += IS_ENABLED(CONFIG_DYNAMIC_FTRACE) +
              IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS);

But either way

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Thanks,
Nick


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v3 04/11] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace
  2024-06-20 18:54 ` [RFC PATCH v3 04/11] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace Naveen N Rao
@ 2024-07-01  9:27   ` Nicholas Piggin
  2024-07-01 18:51     ` Naveen N Rao
  0 siblings, 1 reply; 27+ messages in thread
From: Nicholas Piggin @ 2024-07-01  9:27 UTC (permalink / raw)
  To: Naveen N Rao, linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> Pointer to struct module is only relevant for ftrace records belonging
> to kernel modules. Having this field in dyn_arch_ftrace wastes memory
> for all ftrace records belonging to the kernel. Remove the same in
> favour of looking up the module from the ftrace record address, similar
> to other architectures.

arm is the only one left that requires dyn_arch_ftrace after this.

>
> Signed-off-by: Naveen N Rao <naveen@kernel.org>
> ---
>  arch/powerpc/include/asm/ftrace.h        |  1 -
>  arch/powerpc/kernel/trace/ftrace.c       | 54 +++++++++++-------
>  arch/powerpc/kernel/trace/ftrace_64_pg.c | 73 +++++++++++-------------
>  3 files changed, 65 insertions(+), 63 deletions(-)
>

[snip]

> @@ -106,28 +106,48 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_MODULES
> +static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long addr)
> +{
> +	struct module *mod = NULL;
> +
> +	/*
> +	 * NOTE: __module_text_address() must be called with preemption
> +	 * disabled, but we can rely on ftrace_lock to ensure that 'mod'
> +	 * retains its validity throughout the remainder of this code.
> +	 */
> +	preempt_disable();
> +	mod = __module_text_address(ip);
> +	preempt_enable();

If 'mod' was guaranteed to exist before your patch, then it
should do afterward too. But is it always ftrace_lock that
protects it, or do dyn_ftrace entries pin a module in some
cases?

> @@ -555,7 +551,10 @@ __ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>  	ppc_inst_t op;
>  	unsigned long ip = rec->ip;
>  	unsigned long entry, ptr, tramp;
> -	struct module *mod = rec->arch.mod;
> +	struct module *mod = ftrace_lookup_module(rec);
> +
> +	if (!mod)
> +		return -EINVAL;
>  
>  	/* If we never set up ftrace trampolines, then bail */
>  	if (!mod->arch.tramp || !mod->arch.tramp_regs) {
> @@ -668,14 +667,6 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>  		return -EINVAL;
>  	}
>  
> -	/*
> -	 * Out of range jumps are called from modules.
> -	 */
> -	if (!rec->arch.mod) {
> -		pr_err("No module loaded\n");
> -		return -EINVAL;
> -	}
> -

A couple of these conversions are not _exactly_ the same (the pr_err
is lost here); maybe that's deliberate because the messages don't look
too useful.

Looks okay though

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v3 05/11] kbuild: Add generic hook for architectures to use before the final vmlinux link
  2024-06-20 18:54 ` [RFC PATCH v3 05/11] kbuild: Add generic hook for architectures to use before the final vmlinux link Naveen N Rao
@ 2024-07-01  9:30   ` Nicholas Piggin
  0 siblings, 0 replies; 27+ messages in thread
From: Nicholas Piggin @ 2024-07-01  9:30 UTC (permalink / raw)
  To: Naveen N Rao, linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> On powerpc, we would like to be able to make a pass on vmlinux.o and
> generate a new object file to be linked into vmlinux. Add a generic pass
> in Makefile.vmlinux that architectures can use for this purpose.
>
> Architectures need to select CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX and must
> provide arch/<arch>/tools/Makefile with .arch.vmlinux.o target, which
> will be invoked prior to the final vmlinux link step.
>
> Signed-off-by: Naveen N Rao <naveen@kernel.org>
> ---
>  arch/Kconfig             |  3 +++
>  scripts/Makefile.vmlinux |  8 ++++++++
>  scripts/link-vmlinux.sh  | 11 ++++++++---
>  3 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 975dd22a2dbd..649f0903e7ef 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1643,4 +1643,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
>  config ARCH_NEED_CMPXCHG_1_EMU
>  	bool
>  
> +config ARCH_WANTS_PRE_LINK_VMLINUX
> +	def_bool n

Could you add a comment above this (that contains basically
the 2nd paragraph of your changelog)?

Thanks,
Nick

> +
>  endmenu
> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> index 49946cb96844..6410e0be7f52 100644
> --- a/scripts/Makefile.vmlinux
> +++ b/scripts/Makefile.vmlinux
> @@ -22,6 +22,14 @@ targets += .vmlinux.export.o
>  vmlinux: .vmlinux.export.o
>  endif
>  
> +ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
> +targets += .arch.vmlinux.o
> +.arch.vmlinux.o: vmlinux.o FORCE
> +	$(Q)$(MAKE) $(build)=arch/$(SRCARCH)/tools .arch.vmlinux.o
> +
> +vmlinux: .arch.vmlinux.o
> +endif
> +
>  ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
>  
>  # Final link of vmlinux with optional arch pass after final link
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 518c70b8db50..aafaed1412ea 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -122,7 +122,7 @@ gen_btf()
>  		return 1
>  	fi
>  
> -	vmlinux_link ${1}
> +	vmlinux_link ${1} ${arch_vmlinux_o}
>  
>  	info "BTF" ${2}
>  	LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
> @@ -178,7 +178,7 @@ kallsyms_step()
>  	kallsymso=${kallsyms_vmlinux}.o
>  	kallsyms_S=${kallsyms_vmlinux}.S
>  
> -	vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" ${btf_vmlinux_bin_o}
> +	vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" ${btf_vmlinux_bin_o} ${arch_vmlinux_o}
>  	mksysmap ${kallsyms_vmlinux} ${kallsyms_vmlinux}.syms
>  	kallsyms ${kallsyms_vmlinux}.syms ${kallsyms_S}
>  
> @@ -223,6 +223,11 @@ fi
>  
>  ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init init/version-timestamp.o
>  
> +arch_vmlinux_o=""
> +if is_enabled CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX; then
> +	arch_vmlinux_o=.arch.vmlinux.o
> +fi
> +
>  btf_vmlinux_bin_o=""
>  if is_enabled CONFIG_DEBUG_INFO_BTF; then
>  	btf_vmlinux_bin_o=.btf.vmlinux.bin.o
> @@ -273,7 +278,7 @@ if is_enabled CONFIG_KALLSYMS; then
>  	fi
>  fi
>  
> -vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o}
> +vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o} ${arch_vmlinux_o}
>  
>  # fill in BTF IDs
>  if is_enabled CONFIG_DEBUG_INFO_BTF && is_enabled CONFIG_BPF; then


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v3 06/11] powerpc64/ftrace: Move ftrace sequence out of line
  2024-06-20 18:54 ` [RFC PATCH v3 06/11] powerpc64/ftrace: Move ftrace sequence out of line Naveen N Rao
@ 2024-07-01 10:39   ` Nicholas Piggin
  2024-07-01 19:44     ` Naveen N Rao
  0 siblings, 1 reply; 27+ messages in thread
From: Nicholas Piggin @ 2024-07-01 10:39 UTC (permalink / raw)
  To: Naveen N Rao, linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> Function profile sequence on powerpc includes two instructions at the
> beginning of each function:
> 	mflr	r0
> 	bl	ftrace_caller
>
> The call to ftrace_caller() gets nop'ed out during kernel boot and is
> patched in when ftrace is enabled.
>
> Given the sequence, we cannot return from ftrace_caller with 'blr' as we
> need to keep LR and r0 intact. This results in link stack imbalance when

(link stack is IBMese for "return address predictor", if that wasn't
obvious)

> ftrace is enabled. To address that, we would like to use a three
> instruction sequence:
> 	mflr	r0
> 	bl	ftrace_caller
> 	mtlr	r0
>
> Further more, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to
> reserve two instruction slots before the function. This results in a
> total of five instruction slots to be reserved for ftrace use on each
> function that is traced.
>
> Move the function profile sequence out-of-line to minimize its impact.
> To do this, we reserve a single nop at function entry using
> -fpatchable-function-entry=1 and add a pass on vmlinux.o to determine

What's the need to do this on vmlinux.o rather than vmlinux? We have
all function syms?

> the total number of functions that can be traced. This is then used to
> generate a .S file reserving the appropriate amount of space for use as
> ftrace stubs, which is built and linked into vmlinux.

An example instruction listing for the "after" case would be nice too.
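
Something like this, if I'm piecing the stub template later in the
patch together correctly (names approximate):

	func:
		[nop | b ftrace_pfe_stub_for_func]	<- lone patchable insn at entry
		...

	ftrace_pfe_stub_for_func:
		mflr	r0
		bl	ftrace_caller		<- nop when tracing is disabled
		mtlr	r0
		b	func + 4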

Is this all ftrace stubs in the one place? And how do you deal with
kernel size exceeding the limit, if so?

>
> On bootup, the stub space is split into separate stubs per function and
> populated with the proper instruction sequence. A pointer to the
> associated stub is maintained in dyn_arch_ftrace.
>
> For modules, space for ftrace stubs is reserved from the generic module
> stub space.
>
> This is restricted to and enabled by default only on 64-bit powerpc.

This is cool.

[...]

> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -568,6 +568,11 @@ config ARCH_USING_PATCHABLE_FUNCTION_ENTRY
>  	def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh $(CC) -mlittle-endian) if PPC64 && CPU_LITTLE_ENDIAN
>  	def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh $(CC) -mbig-endian) if PPC64 && CPU_BIG_ENDIAN
>  
> +config FTRACE_PFE_OUT_OF_LINE
> +	def_bool PPC64 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY
> +	depends on PPC64
> +	select ARCH_WANTS_PRE_LINK_VMLINUX

This remains powerpc specific? Maybe add a PPC_ prefix to the config
option?

Bikeshed - should PFE be expanded to be consistent with the ARCH_
option?

[...]

> diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
> index 201f9d15430a..9da1da0f87b4 100644
> --- a/arch/powerpc/include/asm/ftrace.h
> +++ b/arch/powerpc/include/asm/ftrace.h
> @@ -26,6 +26,9 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
>  struct module;
>  struct dyn_ftrace;
>  struct dyn_arch_ftrace {
> +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> +	unsigned long pfe_stub;
> +#endif
>  };

Ah, we put something else in here. This is the offset to the
stub? Maybe call it pfe_stub_offset?

[...]

> diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
> index 2cff37b5fd2c..9f3c10307331 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -37,7 +37,8 @@ unsigned long ftrace_call_adjust(unsigned long addr)
>  	if (addr >= (unsigned long)__exittext_begin && addr < (unsigned long)__exittext_end)
>  		return 0;
>  
> -	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
> +	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY) &&
> +	    !IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
>  		addr += MCOUNT_INSN_SIZE;
>  
>  	return addr;

I don't understand what this is actually doing (even before this
patch). We still emit one/two patchable instructions at entry, so
why do we only need to adjust by zero/one instruction here?


> @@ -82,7 +83,7 @@ static inline int ftrace_modify_code(unsigned long ip, ppc_inst_t old, ppc_inst_
>  {
>  	int ret = ftrace_validate_inst(ip, old);
>  
> -	if (!ret)
> +	if (!ret && !ppc_inst_equal(old, new))
>  		ret = patch_instruction((u32 *)ip, new);
>  
>  	return ret;

Is this leftover debugging stuff or should it be in a different patch?

> @@ -132,11 +133,23 @@ static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long a
>  }
>  #endif
>  
> +static unsigned long ftrace_get_pfe_stub(struct dyn_ftrace *rec)
> +{
> +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> +	return rec->arch.pfe_stub;
> +#else
> +	BUILD_BUG();
> +#endif
> +}
> +
>  static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_inst_t *call_inst)
>  {
>  	unsigned long ip = rec->ip;
>  	unsigned long stub;
>  
> +	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> +		ip = ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE; /* second instruction in stub */

Maybe put the ip = rec->ip; into an else case here.

[...]

> @@ -155,6 +168,79 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
>  	return 0;
>  }
>  
> +static int ftrace_init_pfe_stub(struct module *mod, struct dyn_ftrace *rec)
> +{
> +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> +	static int pfe_stub_text_index, pfe_stub_inittext_index;
> +	int ret = 0, pfe_stub_count, *pfe_stub_index;
> +	ppc_inst_t inst;
> +	struct ftrace_pfe_stub *pfe_stub, pfe_stub_template = {
> +		.insn = {
> +			PPC_RAW_MFLR(_R0),
> +			PPC_RAW_NOP(),		/* bl ftrace_caller */
> +			PPC_RAW_MTLR(_R0),
> +			PPC_RAW_NOP()		/* b rec->ip + 4 */
> +		}
> +	};
> +
> +	WARN_ON(rec->arch.pfe_stub);
> +
> +	if (is_kernel_inittext(rec->ip)) {
> +		pfe_stub = ftrace_pfe_stub_inittext;
> +		pfe_stub_index = &pfe_stub_inittext_index;
> +		pfe_stub_count = ftrace_pfe_stub_inittext_count;
> +	} else if (is_kernel_text(rec->ip)) {
> +		pfe_stub = ftrace_pfe_stub_text;
> +		pfe_stub_index = &pfe_stub_text_index;
> +		pfe_stub_count = ftrace_pfe_stub_text_count;
> +#ifdef CONFIG_MODULES
> +	} else if (mod) {
> +		pfe_stub = mod->arch.pfe_stubs;
> +		pfe_stub_index = &mod->arch.pfe_stub_index;
> +		pfe_stub_count = mod->arch.pfe_stub_count;
> +#endif
> +	} else {
> +		return -EINVAL;
> +	}
> +
> +	pfe_stub += (*pfe_stub_index)++;
> +
> +	if (WARN_ON(*pfe_stub_index > pfe_stub_count))
> +		return -EINVAL;
> +
> +	if (!is_offset_in_branch_range((long)rec->ip - (long)&pfe_stub->insn[0]) ||
> +	    !is_offset_in_branch_range((long)(rec->ip + MCOUNT_INSN_SIZE) - (long)&pfe_stub->insn[3])) {
> +		pr_err("%s: ftrace pfe stub out of range (%p -> %p).\n",
> +					__func__, (void *)rec->ip, (void *)&pfe_stub->insn[0]);
> +		return -EINVAL;
> +	}
> +
> +	rec->arch.pfe_stub = (unsigned long)&pfe_stub->insn[0];
> +
> +	/* bl ftrace_caller */
> +	if (mod)
> +		/* We can't lookup the module since it is not fully formed yet */

What do you mean here? What would a lookup look like if we could do it?
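
Something like the __module_text_address() lookup from the earlier
patch in this series, I assume:

	preempt_disable();
	mod = __module_text_address(rec->ip);
	preempt_enable();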

> +		inst = ftrace_create_branch_inst(ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE,
> +						 mod->arch.tramp, 1);
> +	else
> +		ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &inst);
> +	pfe_stub_template.insn[1] = ppc_inst_val(inst);
> +
> +	/* b rec->ip + 4 */
> +	if (!ret && create_branch(&inst, &pfe_stub->insn[3], rec->ip + MCOUNT_INSN_SIZE, 0))
> +		return -EINVAL;
> +	pfe_stub_template.insn[3] = ppc_inst_val(inst);
> +
> +	if (!ret)
> +		ret = patch_instructions((u32 *)pfe_stub, (u32 *)&pfe_stub_template,
> +					 sizeof(pfe_stub_template), false);
> +
> +	return ret;
> +#else /* !CONFIG_FTRACE_PFE_OUT_OF_LINE */
> +	BUILD_BUG();
> +#endif
> +}
> +
>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>  int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned long addr)
>  {
> @@ -167,18 +253,29 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned
>  int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>  {
>  	ppc_inst_t old, new;
> -	int ret;
> +	unsigned long ip = rec->ip;
> +	int ret = 0;
>  
>  	/* This can only ever be called during module load */
> -	if (WARN_ON(!IS_ENABLED(CONFIG_MODULES) || core_kernel_text(rec->ip)))
> +	if (WARN_ON(!IS_ENABLED(CONFIG_MODULES) || core_kernel_text(ip)))
>  		return -EINVAL;
>  
>  	old = ppc_inst(PPC_RAW_NOP());
> -	ret = ftrace_get_call_inst(rec, addr, &new);
> -	if (ret)
> -		return ret;
> +	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE)) {
> +		ip = ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE; /* second instruction in stub */
> +		ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &old);
> +	}
>  
> -	return ftrace_modify_code(rec->ip, old, new);
> +	ret |= ftrace_get_call_inst(rec, addr, &new);
> +
> +	if (!ret)
> +		ret = ftrace_modify_code(ip, old, new);
> +
> +	if (!ret && IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> +		ret = ftrace_modify_code(rec->ip, ppc_inst(PPC_RAW_NOP()),
> +			 ppc_inst(PPC_RAW_BRANCH((long)ftrace_get_pfe_stub(rec) - (long)rec->ip)));
> +
> +	return ret;
>  }
>  
>  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr)
> @@ -211,6 +308,13 @@ void ftrace_replace_code(int enable)
>  		new_addr = ftrace_get_addr_new(rec);
>  		update = ftrace_update_record(rec, enable);
>  
> +		if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE) && update != FTRACE_UPDATE_IGNORE) {
> +			ip = ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE;
> +			ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &nop_inst);
> +			if (ret)
> +				goto out;
> +		}
> +
>  		switch (update) {
>  		case FTRACE_UPDATE_IGNORE:
>  		default:
> @@ -235,6 +339,24 @@ void ftrace_replace_code(int enable)
>  
>  		if (!ret)
>  			ret = ftrace_modify_code(ip, old, new);
> +
> +		if (!ret && IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE) &&
> +		    (update == FTRACE_UPDATE_MAKE_NOP || update == FTRACE_UPDATE_MAKE_CALL)) {
> +			/* Update the actual ftrace location */
> +			call_inst = ppc_inst(PPC_RAW_BRANCH((long)ftrace_get_pfe_stub(rec) -
> +							    (long)rec->ip));
> +			nop_inst = ppc_inst(PPC_RAW_NOP());
> +			ip = rec->ip;
> +
> +			if (update == FTRACE_UPDATE_MAKE_NOP)
> +				ret = ftrace_modify_code(ip, call_inst, nop_inst);
> +			else
> +				ret = ftrace_modify_code(ip, nop_inst, call_inst);
> +
> +			if (ret)
> +				goto out;
> +		}
> +
>  		if (ret)
>  			goto out;
>  	}
> @@ -254,7 +376,8 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>  	/* Verify instructions surrounding the ftrace location */
>  	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY)) {
>  		/* Expect nops */
> -		ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_NOP()));
> +		if (!IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> +			ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_NOP()));
>  		if (!ret)
>  			ret = ftrace_validate_inst(ip, ppc_inst(PPC_RAW_NOP()));
>  	} else if (IS_ENABLED(CONFIG_PPC32)) {
> @@ -278,6 +401,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>  	if (ret)
>  		return ret;
>  
> +	/* Set up out-of-line stub */
> +	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> +		return ftrace_init_pfe_stub(mod, rec);
> +
>  	/* Nop-out the ftrace location */
>  	new = ppc_inst(PPC_RAW_NOP());
>  	addr = MCOUNT_ADDR;
> diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
> index 244a1c7bb1e8..b1cbef24f18f 100644
> --- a/arch/powerpc/kernel/trace/ftrace_entry.S
> +++ b/arch/powerpc/kernel/trace/ftrace_entry.S
> @@ -78,10 +78,6 @@
>  
>  	/* Get the _mcount() call site out of LR */
>  	mflr	r7
> -	/* Save it as pt_regs->nip */
> -	PPC_STL	r7, _NIP(r1)
> -	/* Also save it in B's stackframe header for proper unwind */
> -	PPC_STL	r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
>  	/* Save the read LR in pt_regs->link */
>  	PPC_STL	r0, _LINK(r1)
>  
> @@ -96,16 +92,6 @@
>  	lwz	r5,function_trace_op@l(r3)
>  #endif
>  
> -#ifdef CONFIG_LIVEPATCH_64
> -	mr	r14, r7		/* remember old NIP */
> -#endif
> -
> -	/* Calculate ip from nip-4 into r3 for call below */
> -	subi    r3, r7, MCOUNT_INSN_SIZE
> -
> -	/* Put the original return address in r4 as parent_ip */
> -	mr	r4, r0
> -
>  	/* Save special regs */
>  	PPC_STL	r8, _MSR(r1)
>  	.if \allregs == 1
> @@ -114,17 +100,64 @@
>  	PPC_STL	r11, _CCR(r1)
>  	.endif
>  
> +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> +	/* Save our real return address locally for return */
> +	PPC_STL	r7, STACK_INT_FRAME_MARKER(r1)

Hmm, should you be using STACK_INT_FRAME_MARKER in a
non-INT_FRAME? I actually wanted to turn the int marker
into a 4 byte word and move it into a reserved space in
the frame too. Could it go in pt_regs somewhere?

> +	/*
> +	 * We want the ftrace location in the function, but our lr (in r7)
> +	 * points at the 'mtlr r0' instruction in the out of line stub.  To
> +	 * recover the ftrace location, we read the branch instruction in the
> +	 * stub, and adjust our lr by the branch offset.
> +	 */
> +	lwz	r8, MCOUNT_INSN_SIZE(r7)
> +	slwi	r8, r8, 6
> +	srawi	r8, r8, 6
> +	add	r3, r7, r8

Clever. Maybe a comment in ftrace_init_pfe_stub() that says to
keep that last instruction in synch with this?
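
For anyone else reading along: the slwi/srawi pair sign-extends the
low 26 bits of the 'b' encoding, i.e. roughly this C (illustrative
only, assuming a plain relative branch with AA=LK=0):

	static int b_offset(unsigned int insn)
	{
		/* drop the 6-bit opcode, then sign-extend the 26-bit LI||AA||LK field */
		return (int)(insn << 6) >> 6;
	}

The branch target is the branch's own address plus that offset; since
r7 points one instruction before the branch, the add lands r3 on the
ftrace location itself (if I'm reading the stub layout right).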

> +	/*
> +	 * Override our nip to point past the branch in the original function.
> +	 * This allows reliable stack trace and the ftrace stack tracer to work as-is.
> +	 */
> +	add	r7, r3, MCOUNT_INSN_SIZE
> +#else
> +	/* Calculate ip from nip-4 into r3 for call below */
> +	subi    r3, r7, MCOUNT_INSN_SIZE
> +#endif
> +
> +	/* Save NIP as pt_regs->nip */
> +	PPC_STL	r7, _NIP(r1)
> +	/* Also save it in B's stackframe header for proper unwind */
> +	PPC_STL	r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
> +#ifdef CONFIG_LIVEPATCH_64
> +	mr	r14, r7		/* remember old NIP */
> +#endif
> +
> +	/* Put the original return address in r4 as parent_ip */
> +	mr	r4, r0
> +
>  	/* Load &pt_regs in r6 for call below */
>  	addi    r6, r1, STACK_INT_FRAME_REGS
>  .endm
>  
>  .macro	ftrace_regs_exit allregs
> +#ifndef CONFIG_FTRACE_PFE_OUT_OF_LINE
>  	/* Load ctr with the possibly modified NIP */
>  	PPC_LL	r3, _NIP(r1)
>  	mtctr	r3
>  
>  #ifdef CONFIG_LIVEPATCH_64
>  	cmpd	r14, r3		/* has NIP been altered? */
> +#endif
> +#else /* !CONFIG_FTRACE_PFE_OUT_OF_LINE */
> +#ifdef CONFIG_LIVEPATCH_64
> +	/* Load ctr with the possibly modified NIP */

Comment doesn't apply to this leg of the ifdef AFAIKS.
We load the original NIP into LR, and set CR0 for
livepatch branch.

> +	PPC_LL	r3, _NIP(r1)
> +
> +	cmpd	r14, r3		/* has NIP been altered? */
> +	bne-	1f
> +#endif
> +
> +	PPC_LL	r3, STACK_INT_FRAME_MARKER(r1)
> +1:	mtlr	r3
>  #endif
>  
>  	/* Restore gprs */
> @@ -139,7 +172,9 @@
>  
>  	/* Restore possibly modified LR */
>  	PPC_LL	r0, _LINK(r1)
> +#ifndef CONFIG_FTRACE_PFE_OUT_OF_LINE
>  	mtlr	r0
> +#endif
>  
>  #ifdef CONFIG_PPC64
>  	/* Restore callee's TOC */
> @@ -153,7 +188,12 @@
>          /* Based on the cmpd above, if the NIP was altered handle livepatch */
>  	bne-	livepatch_handler
>  #endif
> -	bctr			/* jump after _mcount site */
> +	/* jump after _mcount site */
> +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> +	blr
> +#else
> +	bctr
> +#endif

Here is the crux of it all: we return here with a blr that matches
the return address of the bl it was called with, so the CPU can
predict it.
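
Roughly, with the out-of-line stub from the earlier patch:

	stub:	mflr	r0
		bl	ftrace_caller	<- pushes stub+8 onto the link stack
		mtlr	r0
		b	func + 4
	...
	ftrace_caller:
		...
		blr			<- pops the matching entry, so the return is predicted

whereas returning via mtctr/bctr leaves that entry unbalanced.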

I think it would be worth a comment here to explain why you go to
so much effort to add the case that uses LR here. Because the out
of line stub itself could pretty well have the same calling convention
as the original mcount.

Actually, that's a thought too. Could you split this patch in two?
First, just the patch to add the out-of-line call but using the same
calling convention as mprofile-kernel; second, one which changes it to
use the balanced call/return. Would that be a lot of extra work?

>  .endm
>  
>  _GLOBAL(ftrace_regs_caller)
> @@ -177,6 +217,11 @@ _GLOBAL(ftrace_stub)
>  
>  #ifdef CONFIG_PPC64
>  ftrace_no_trace:
> +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> +	REST_GPR(3, r1)
> +	addi	r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
> +	blr
> +#else
>  	mflr	r3
>  	mtctr	r3
>  	REST_GPR(3, r1)
> @@ -184,6 +229,7 @@ ftrace_no_trace:
>  	mtlr	r0
>  	bctr
>  #endif
> +#endif
>  
>  #ifdef CONFIG_LIVEPATCH_64
>  	/*
> @@ -196,9 +242,9 @@ ftrace_no_trace:
>  	 *
>  	 * On entry:
>  	 *  - we have no stack frame and can not allocate one
> -	 *  - LR points back to the original caller (in A)
> -	 *  - CTR holds the new NIP in C
> -	 *  - r0, r11 & r12 are free
> +	 *  - LR/r0 points back to the original caller (in A)
> +	 *  - CTR/LR holds the new NIP in C
> +	 *  - r11 & r12 are free

Could you explain the added case here, e.g.,

On entry, depending on CONFIG_FTRACE_PFE_OUT_OF_LINE (=n/=y)

>  	 */
>  livepatch_handler:
>  	ld	r12, PACA_THREAD_INFO(r13)
> @@ -208,18 +254,23 @@ livepatch_handler:
>  	addi	r11, r11, 24
>  	std	r11, TI_livepatch_sp(r12)
>  
> -	/* Save toc & real LR on livepatch stack */
> -	std	r2,  -24(r11)
> -	mflr	r12
> -	std	r12, -16(r11)
> -
>  	/* Store stack end marker */
>  	lis     r12, STACK_END_MAGIC@h
>  	ori     r12, r12, STACK_END_MAGIC@l
>  	std	r12, -8(r11)
>  
> -	/* Put ctr in r12 for global entry and branch there */
> +	/* Save toc & real LR on livepatch stack */
> +	std	r2,  -24(r11)
> +#ifndef CONFIG_FTRACE_PFE_OUT_OF_LINE
> +	mflr	r12
> +	std	r12, -16(r11)
>  	mfctr	r12
> +#else
> +	std	r0, -16(r11)
> +	mflr	r12
> +	/* Put ctr in r12 for global entry and branch there */
> +	mtctr	r12
> +#endif
>  	bctrl
>  
>  	/*
> diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
> index f420df7888a7..0aef9959f2cd 100644
> --- a/arch/powerpc/kernel/vmlinux.lds.S
> +++ b/arch/powerpc/kernel/vmlinux.lds.S
> @@ -267,14 +267,13 @@ SECTIONS
>  	.init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
>  		_sinittext = .;
>  		INIT_TEXT
> -
> +		*(.tramp.ftrace.init);
>  		/*
>  		 *.init.text might be RO so we must ensure this section ends on
>  		 * a page boundary.
>  		 */
>  		. = ALIGN(PAGE_SIZE);
>  		_einittext = .;
> -		*(.tramp.ftrace.init);
>  	} :text
>  
>  	/* .exit.text is discarded at runtime, not link time,

Why this change?

> diff --git a/arch/powerpc/tools/Makefile b/arch/powerpc/tools/Makefile
> new file mode 100644
> index 000000000000..9e2ba9a85baa
> --- /dev/null
> +++ b/arch/powerpc/tools/Makefile
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +quiet_cmd_gen_ftrace_pfe_stubs = FTRACE  $@
> +      cmd_gen_ftrace_pfe_stubs = $< $(objtree)/vmlinux.o $@
> +
> +targets += .arch.vmlinux.o
> +.arch.vmlinux.o: $(srctree)/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh $(objtree)/vmlinux.o FORCE
> +	$(call if_changed,gen_ftrace_pfe_stubs)
> +
> +clean-files += $(objtree)/.arch.vmlinux.S $(objtree)/.arch.vmlinux.o
> diff --git a/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh b/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh
> new file mode 100755
> index 000000000000..ec95e99aff14
> --- /dev/null
> +++ b/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh
> @@ -0,0 +1,48 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +# Error out on error
> +set -e
> +
> +is_enabled() {
> +	grep -q "^$1=y" include/config/auto.conf
> +}
> +
> +vmlinux_o=${1}
> +arch_vmlinux_o=${2}
> +arch_vmlinux_S=$(dirname ${arch_vmlinux_o})/$(basename ${arch_vmlinux_o} .o).S
> +
> +RELOCATION=R_PPC64_ADDR64
> +if is_enabled CONFIG_PPC32; then
> +	RELOCATION=R_PPC_ADDR32
> +fi

Started PPC32 support?

> +
> +num_pfe_stubs_text=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |
> +		     grep -v ".init.text" | grep "${RELOCATION}" | wc -l)
> +num_pfe_stubs_inittext=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |
> +			 grep ".init.text" | grep "${RELOCATION}" | wc -l)
> +
> +cat > ${arch_vmlinux_S} <<EOF
> +#include <asm/asm-offsets.h>
> +#include <linux/linkage.h>
> +
> +.pushsection .tramp.ftrace.text,"aw"
> +SYM_DATA(ftrace_pfe_stub_text_count, .long ${num_pfe_stubs_text})
> +
> +SYM_CODE_START(ftrace_pfe_stub_text)
> +	.space ${num_pfe_stubs_text} * FTRACE_PFE_STUB_SIZE
> +SYM_CODE_END(ftrace_pfe_stub_text)
> +.popsection
> +
> +.pushsection .tramp.ftrace.init,"aw"
> +SYM_DATA(ftrace_pfe_stub_inittext_count, .long ${num_pfe_stubs_inittext})
> +
> +SYM_CODE_START(ftrace_pfe_stub_inittext)
> +	.space ${num_pfe_stubs_inittext} * FTRACE_PFE_STUB_SIZE
> +SYM_CODE_END(ftrace_pfe_stub_inittext)
> +.popsection
> +EOF
> +
> +${CC} ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS} \
> +      ${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL} \
> +      -c -o ${arch_vmlinux_o} ${arch_vmlinux_S}

Looking pretty good. I don't know the livepatch stuff well though.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v3 11/11] powerpc64/bpf: Add support for bpf trampolines
  2024-06-20 19:09 ` [RFC PATCH v3 11/11] powerpc64/bpf: Add support for bpf trampolines Naveen N Rao
@ 2024-07-01 11:03   ` Nicholas Piggin
  2024-07-01 19:58     ` Naveen N Rao
  0 siblings, 1 reply; 27+ messages in thread
From: Nicholas Piggin @ 2024-07-01 11:03 UTC (permalink / raw)
  To: Naveen N Rao, linuxppc-dev, linux-trace-kernel, bpf
  Cc: Michael Ellerman, Steven Rostedt, Masami Hiramatsu,
	Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa

On Fri Jun 21, 2024 at 5:09 AM AEST, Naveen N Rao wrote:
> Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline()
> for 64-bit powerpc.

What do BPF trampolines give you?
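
fentry/fexit/fmod_ret and struct_ops attachment, I gather -- e.g. a
program of this shape needs arch trampoline support to attach at all.
A minimal sketch, assuming the usual vmlinux.h/libbpf scaffolding:

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	char LICENSE[] SEC("license") = "GPL";

	SEC("fentry/kernel_clone")
	int BPF_PROG(trace_clone)
	{
		bpf_printk("kernel_clone entered");
		return 0;
	}

Might be worth spelling that out in the changelog.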

> BPF prog JIT is extended to mimic 64-bit powerpc approach for ftrace
> having a single nop at function entry, followed by the function
> profiling sequence out-of-line and a separate long branch stub for calls
> to trampolines that are out of range. A dummy_tramp is provided to
> simplify synchronization similar to arm64.

Synchronization - between the BPF and ftrace interfaces?

> BPF Trampolines adhere to the existing ftrace ABI utilizing a
> two-instruction profiling sequence, as well as the newer ABI utilizing a
> three-instruction profiling sequence enabling return with a 'blr'. The
> trampoline code itself closely follows x86 implementation.
>
> While the code is generic, BPF trampolines are only enabled on 64-bit
> powerpc. 32-bit powerpc will need testing and some updates.
>
> Signed-off-by: Naveen N Rao <naveen@kernel.org>

Just a quick glance for now, and I don't know BPF code much.

> ---
>  arch/powerpc/include/asm/ppc-opcode.h |  14 +
>  arch/powerpc/net/bpf_jit.h            |  11 +
>  arch/powerpc/net/bpf_jit_comp.c       | 702 +++++++++++++++++++++++++-
>  arch/powerpc/net/bpf_jit_comp32.c     |   7 +-
>  arch/powerpc/net/bpf_jit_comp64.c     |   7 +-
>  5 files changed, 738 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
> index 076ae60b4a55..9eaa2c5d9b73 100644
> --- a/arch/powerpc/include/asm/ppc-opcode.h
> +++ b/arch/powerpc/include/asm/ppc-opcode.h
> @@ -585,12 +585,26 @@
>  #define PPC_RAW_MTSPR(spr, d)		(0x7c0003a6 | ___PPC_RS(d) | __PPC_SPR(spr))
>  #define PPC_RAW_EIEIO()			(0x7c0006ac)
>  
> +/* bcl 20,31,$+4 */
> +#define PPC_RAW_BCL()			(0x429f0005)

This is the special bcl form that gives the current address.
Maybe call it PPC_RAW_BCL4()

>  
> +void dummy_tramp(void);
> +
> +asm (
> +"	.pushsection .text, \"ax\", @progbits	;"
> +"	.global dummy_tramp			;"
> +"	.type dummy_tramp, @function		;"
> +"dummy_tramp:					;"
> +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> +"	blr					;"
> +#else
> +"	mflr	11				;"

Can you just drop this instruction? The caller will always
have it in r11?

> +"	mtctr	11				;"
> +"	mtlr	0				;"
> +"	bctr					;"
> +#endif
> +"	.size dummy_tramp, .-dummy_tramp	;"
> +"	.popsection				;"
> +);
> +
> +void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
> +{
> +	int ool_stub_idx, long_branch_stub_idx;
> +
> +	/*
> +	 * Out-of-line stub:
> +	 *	mflr	r0
> +	 *	[b|bl]	tramp
> +	 *	mtlr	r0 // only with CONFIG_FTRACE_PFE_OUT_OF_LINE
> +	 *	b	bpf_func + 4
> +	 */
> +	ool_stub_idx = ctx->idx;
> +	EMIT(PPC_RAW_MFLR(_R0));
> +	EMIT(PPC_RAW_NOP());
> +	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> +		EMIT(PPC_RAW_MTLR(_R0));
> +	WARN_ON_ONCE(!is_offset_in_branch_range(4 - (long)ctx->idx * 4)); /* TODO */
> +	EMIT(PPC_RAW_BRANCH(4 - (long)ctx->idx * 4));
> +
> +	/*
> +	 * Long branch stub:
> +	 *	.long	<dummy_tramp_addr>
> +	 *	mflr	r11
> +	 *	bcl	20,31,$+4
> +	 *	mflr	r12
> +	 *	ld	r12, -8-SZL(r12)
> +	 *	mtctr	r12
> +	 *	mtlr	r11 // needed to retain ftrace ABI
> +	 *	bctr
> +	 */

You could avoid clobbering LR on >= POWER9 with the addpcis
instruction, or even use a pcrel load where pcrel is available. I
guess that's something to do later.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v3 01/11] powerpc/kprobes: Use ftrace to determine if a probe is at function entry
  2024-07-01  8:40   ` Nicholas Piggin
@ 2024-07-01 18:18     ` Naveen N Rao
  0 siblings, 0 replies; 27+ messages in thread
From: Naveen N Rao @ 2024-07-01 18:18 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, linux-trace-kernel, bpf, Michael Ellerman,
	Steven Rostedt, Masami Hiramatsu, Christophe Leroy,
	Masahiro Yamada, Mark Rutland, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Andrii Nakryiko, Song Liu,
	Jiri Olsa

Hi Nick,
Thanks for the reviews!

On Mon, Jul 01, 2024 at 06:40:50PM GMT, Nicholas Piggin wrote:
> On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> > Rather than hard-coding the offset into a function to be used to
> > determine if a kprobe is at function entry, use ftrace_location() to
> > determine the ftrace location within the function and categorize all
> > instructions till that offset to be function entry.
> >
> > For functions that cannot be traced, we fall back to using a fixed
> > offset of 8 (two instructions) to categorize a probe as being at
> > function entry for 64-bit elfv2, unless we are using pcrel.
> >
> > Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > Signed-off-by: Naveen N Rao <naveen@kernel.org>
> > ---
> >  arch/powerpc/kernel/kprobes.c | 18 ++++++++----------
> >  1 file changed, 8 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> > index 14c5ddec3056..ca204f4f21c1 100644
> > --- a/arch/powerpc/kernel/kprobes.c
> > +++ b/arch/powerpc/kernel/kprobes.c
> > @@ -105,24 +105,22 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
> >  	return addr;
> >  }
> >  
> > -static bool arch_kprobe_on_func_entry(unsigned long offset)
> > +static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long offset)
> >  {
> > -#ifdef CONFIG_PPC64_ELF_ABI_V2
> > -#ifdef CONFIG_KPROBES_ON_FTRACE
> > -	return offset <= 16;
> > -#else
> > -	return offset <= 8;
> > -#endif
> > -#else
> > +	unsigned long ip = ftrace_location(addr);
> > +
> > +	if (ip)
> > +		return offset <= (ip - addr);
> > +	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
> > +		return offset <= 8;
> 
> If it is PCREL, why not offset == 0 as well?

That's handled by the fallback code that is after the above line:
	return !offset;

That addresses both pcrel and 32-bit powerpc.
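
For reference, a minimal sketch of how the complete helper then reads
(pieced together from the hunk above plus that fallback; the comments
are mine):

static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long offset)
{
	unsigned long ip = ftrace_location(addr);

	/* Anything up to the ftrace location counts as function entry */
	if (ip)
		return offset <= (ip - addr);

	/* Untraceable functions: allow the two-instruction elfv2 prologue */
	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
		return offset <= 8;

	/* pcrel and 32-bit powerpc: only offset 0 is at function entry */
	return !offset;
}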

Thanks,
Naveen



* Re: [RFC PATCH v3 02/11] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code
  2024-07-01  8:57   ` Nicholas Piggin
@ 2024-07-01 18:34     ` Naveen N Rao
  0 siblings, 0 replies; 27+ messages in thread
From: Naveen N Rao @ 2024-07-01 18:34 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, linux-trace-kernel, bpf, Michael Ellerman,
	Steven Rostedt, Masami Hiramatsu, Christophe Leroy,
	Masahiro Yamada, Mark Rutland, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Andrii Nakryiko, Song Liu,
	Jiri Olsa

On Mon, Jul 01, 2024 at 06:57:12PM GMT, Nicholas Piggin wrote:
> On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> > On 32-bit powerpc, gcc generates a three instruction sequence for
> > function profiling:
> > 	mflr	r0
> > 	stw	r0, 4(r1)
> > 	bl	_mcount
> >
> > On kernel boot, the call to _mcount() is nop-ed out, to be patched back
> > in when ftrace is actually enabled. The 'stw' instruction therefore is
> > not necessary unless ftrace is enabled. Nop it out during ftrace init.
> >
> > When ftrace is enabled, we want the 'stw' so that stack unwinding works
> > properly. Perform the same within the ftrace handler, similar to 64-bit
> > powerpc.
> >
> > For 64-bit powerpc, early versions of gcc used to emit a three
> > instruction sequence for function profiling (with -mprofile-kernel) with
> > a 'std' instruction to mimic the 'stw' above. Address that scenario also
> > by nop-ing out the 'std' instruction during ftrace init.
> 
> Cool! Could 32-bit use the 2-insn sequence as well if it had
> -mprofile-kernel, out of curiosity?

Yes! It actually already does with the previous change to add support 
for -fpatchable-function-entry. Commit 0f71dcfb4aef ("powerpc/ftrace: 
Add support for -fpatchable-function-entry") changelog describes this:

    This changes the profiling instructions used on ppc32. The default -pg
    option emits an additional 'stw' instruction after 'mflr r0' and before
    the branch to _mcount 'bl _mcount'. This is very similar to the original
    -mprofile-kernel implementation on ppc64le, where an additional 'std'
    instruction was used to save LR to its save location in the caller's
    stackframe. Subsequently, this additional store was removed in later
    compiler versions for performance reasons. The same reasons apply for
    ppc32 so we only patch in a 'mflr r0'.


> 
> >
> > Signed-off-by: Naveen N Rao <naveen@kernel.org>
> > ---
> >  arch/powerpc/kernel/trace/ftrace.c       | 6 ++++--
> >  arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
> >  2 files changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
> > index d8d6b4fd9a14..463bd7531dc8 100644
> > --- a/arch/powerpc/kernel/trace/ftrace.c
> > +++ b/arch/powerpc/kernel/trace/ftrace.c
> > @@ -241,13 +241,15 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
> >  		/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
> >  		ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
> >  		if (!ret)
> > -			ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
> > +			ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
> > +						 ppc_inst(PPC_RAW_NOP()));
> >  	} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
> >  		/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl _mcount' */
> >  		ret = ftrace_read_inst(ip - 4, &old);
> >  		if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0)))) {
> >  			ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
> > -			ret |= ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
> > +			ret |= ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
> > +						  ppc_inst(PPC_RAW_NOP()));
> 
> So this is the old style path... Should you check the mflr validate
> result first? Also do you know what GCC version, roughly? Maybe we
> could have a comment here and eventually deprecate it.

Sure. This is gcc v5.5, for certain; gcc v6.3 doesn't seem to emit the 
additional 'std' instruction.

> 
> You could split this change into its own patch.

Indeed. I will do that.

> 
> >  		}
> >  	} else {
> >  		return -EINVAL;
> > diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
> > index 76dbe9fd2c0f..244a1c7bb1e8 100644
> > --- a/arch/powerpc/kernel/trace/ftrace_entry.S
> > +++ b/arch/powerpc/kernel/trace/ftrace_entry.S
> > @@ -33,6 +33,8 @@
> >   * and then arrange for the ftrace function to be called.
> >   */
> >  .macro	ftrace_regs_entry allregs
> > +	/* Save the original return address in A's stack frame */
> > +	PPC_STL		r0, LRSAVE(r1)
> >  	/* Create a minimal stack frame for representing B */
> >  	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)
> >  
> > @@ -44,8 +46,6 @@
> >  	SAVE_GPRS(3, 10, r1)
> >  
> >  #ifdef CONFIG_PPC64
> > -	/* Save the original return address in A's stack frame */
> > -	std	r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
> >  	/* Ok to continue? */
> >  	lbz	r3, PACA_FTRACE_ENABLED(r13)
> >  	cmpdi	r3, 0
> 
> That seems right to me.
> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Thanks,
Naveen


* Re: [RFC PATCH v3 04/11] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace
  2024-07-01  9:27   ` Nicholas Piggin
@ 2024-07-01 18:51     ` Naveen N Rao
  0 siblings, 0 replies; 27+ messages in thread
From: Naveen N Rao @ 2024-07-01 18:51 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, linux-trace-kernel, bpf, Michael Ellerman,
	Steven Rostedt, Masami Hiramatsu, Christophe Leroy,
	Masahiro Yamada, Mark Rutland, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Andrii Nakryiko, Song Liu,
	Jiri Olsa

On Mon, Jul 01, 2024 at 07:27:55PM GMT, Nicholas Piggin wrote:
> On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> > Pointer to struct module is only relevant for ftrace records belonging
> > to kernel modules. Having this field in dyn_arch_ftrace wastes memory
> > for all ftrace records belonging to the kernel. Remove the same in
> > favour of looking up the module from the ftrace record address, similar
> > to other architectures.
> 
> arm is the only one left that requires dyn_arch_ftrace after this.

Yes, but as you noticed, we add a different field in a subsequent patch 
in this series.

> 
> >
> > Signed-off-by: Naveen N Rao <naveen@kernel.org>
> > ---
> >  arch/powerpc/include/asm/ftrace.h        |  1 -
> >  arch/powerpc/kernel/trace/ftrace.c       | 54 +++++++++++-------
> >  arch/powerpc/kernel/trace/ftrace_64_pg.c | 73 +++++++++++-------------
> >  3 files changed, 65 insertions(+), 63 deletions(-)
> >
> 
> [snip]
> 
> > @@ -106,28 +106,48 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
> >  	return 0;
> >  }
> >  
> > +#ifdef CONFIG_MODULES
> > +static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long addr)
> > +{
> > +	struct module *mod = NULL;
> > +
> > +	/*
> > +	 * NOTE: __module_text_address() must be called with preemption
> > +	 * disabled, but we can rely on ftrace_lock to ensure that 'mod'
> > +	 * retains its validity throughout the remainder of this code.
> > +	 */
> > +	preempt_disable();
> > +	mod = __module_text_address(ip);
> > +	preempt_enable();
> 
> If 'mod' was guaranteed to exist before your patch, then it
> should do afterward too. But is it always ftrace_lock that
> protects it, or do dyn_ftrace entries pin a module in some
> cases?

We don't pin a module. It is the ftrace_lock acquired in 
ftrace_release_mod(), called during delete_module(), that protects it.

You're right though. That comment is probably not necessary since there 
are no new users of this new function.

> 
> > @@ -555,7 +551,10 @@ __ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> >  	ppc_inst_t op;
> >  	unsigned long ip = rec->ip;
> >  	unsigned long entry, ptr, tramp;
> > -	struct module *mod = rec->arch.mod;
> > +	struct module *mod = ftrace_lookup_module(rec);
> > +
> > +	if (!mod)
> > +		return -EINVAL;
> >  
> >  	/* If we never set up ftrace trampolines, then bail */
> >  	if (!mod->arch.tramp || !mod->arch.tramp_regs) {
> > @@ -668,14 +667,6 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> >  		return -EINVAL;
> >  	}
> >  
> > -	/*
> > -	 * Out of range jumps are called from modules.
> > -	 */
> > -	if (!rec->arch.mod) {
> > -		pr_err("No module loaded\n");
> > -		return -EINVAL;
> > -	}
> > -
> 
> A couple of these conversions are not _exactly_ the same (lost
> the pr_err here), maybe that's deliberate because the messages
> don't look too useful.

Indeed. Most of the earlier ones being eliminated are in 
ftrace_init_nop(). The other ones get covered by the pr_err in 
ftrace_lookup_module()/ftrace_lookup_module_stub().

> 
> Looks okay though
> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>


Thanks,
Naveen


* Re: [RFC PATCH v3 06/11] powerpc64/ftrace: Move ftrace sequence out of line
  2024-07-01 10:39   ` Nicholas Piggin
@ 2024-07-01 19:44     ` Naveen N Rao
  0 siblings, 0 replies; 27+ messages in thread
From: Naveen N Rao @ 2024-07-01 19:44 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, linux-trace-kernel, bpf, Michael Ellerman,
	Steven Rostedt, Masami Hiramatsu, Christophe Leroy,
	Masahiro Yamada, Mark Rutland, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Andrii Nakryiko, Song Liu,
	Jiri Olsa

On Mon, Jul 01, 2024 at 08:39:03PM GMT, Nicholas Piggin wrote:
> On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> > Function profile sequence on powerpc includes two instructions at the
> > beginning of each function:
> > 	mflr	r0
> > 	bl	ftrace_caller
> >
> > The call to ftrace_caller() gets nop'ed out during kernel boot and is
> > patched in when ftrace is enabled.
> >
> > Given the sequence, we cannot return from ftrace_caller with 'blr' as we
> > need to keep LR and r0 intact. This results in link stack imbalance when
> 
> (link stack is IBMese for "return address predictor", if that wasn't
> obvious)
> 
> > ftrace is enabled. To address that, we would like to use a three
> > instruction sequence:
> > 	mflr	r0
> > 	bl	ftrace_caller
> > 	mtlr	r0
> >
> > Further more, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to
> > reserve two instruction slots before the function. This results in a
> > total of five instruction slots to be reserved for ftrace use on each
> > function that is traced.
> >
> > Move the function profile sequence out-of-line to minimize its impact.
> > To do this, we reserve a single nop at function entry using
> > -fpatchable-function-entry=1 and add a pass on vmlinux.o to determine
> 
> What's the need to do this on vmlinux.o rather than vmlinux? We have
> all function syms?

We want to be able to build and include another .o file to be linked 
into vmlinux. That file contains symbols (pfe_stub_text, et al) used by 
vmlinux.o

> 
> > the total number of functions that can be traced. This is then used to
> > generate a .S file reserving the appropriate amount of space for use as
> > ftrace stubs, which is built and linked into vmlinux.
> 
> An example instruction listing for the "after" case would be nice too.

Sure.

> 
> Is this all ftrace stubs in the one place? And how do you deal with
> kernel size exceeding the limit, if so?

Yes, all at the end. Ftrace init fails on bootup if text size exceeds 
branch range. I should really be putting in a post-link script to detect 
and break the build in that case.

> 
> >
> > On bootup, the stub space is split into separate stubs per function and
> > populated with the proper instruction sequence. A pointer to the
> > associated stub is maintained in dyn_arch_ftrace.
> >
> > For modules, space for ftrace stubs is reserved from the generic module
> > stub space.
> >
> > This is restricted to and enabled by default only on 64-bit powerpc.
> 
> This is cool.
> 
> [...]
> 
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -568,6 +568,11 @@ config ARCH_USING_PATCHABLE_FUNCTION_ENTRY
> >  	def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh $(CC) -mlittle-endian) if PPC64 && CPU_LITTLE_ENDIAN
> >  	def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh $(CC) -mbig-endian) if PPC64 && CPU_BIG_ENDIAN
> >  
> > +config FTRACE_PFE_OUT_OF_LINE
> > +	def_bool PPC64 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY
> > +	depends on PPC64
> > +	select ARCH_WANTS_PRE_LINK_VMLINUX
> 
> This remains powerpc specific? Maybe add a PPC_ prefix to the config
> option?
> 
> Bikeshed - should PFE be expanded to be consistent with the ARCH_
> option?

I agree. PFE isn't immediately obvious. Now that I think about it, not 
sure it really matters that this uses -fpatchable-function-entry. I'll 
call this PPC_FTRACE_SEQUENCE_OUT_OF_LINE. Suggestions welcome :)

> 
> [...]
> 
> > diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
> > index 201f9d15430a..9da1da0f87b4 100644
> > --- a/arch/powerpc/include/asm/ftrace.h
> > +++ b/arch/powerpc/include/asm/ftrace.h
> > @@ -26,6 +26,9 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
> >  struct module;
> >  struct dyn_ftrace;
> >  struct dyn_arch_ftrace {
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	unsigned long pfe_stub;
> > +#endif
> >  };
> 
> Ah, we put something else in here. This is the offset to the
> stub? Maybe call it pfe_stub_offset?

Ack.

> 
> [...]
> 
> > diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
> > index 2cff37b5fd2c..9f3c10307331 100644
> > --- a/arch/powerpc/kernel/trace/ftrace.c
> > +++ b/arch/powerpc/kernel/trace/ftrace.c
> > @@ -37,7 +37,8 @@ unsigned long ftrace_call_adjust(unsigned long addr)
> >  	if (addr >= (unsigned long)__exittext_begin && addr < (unsigned long)__exittext_end)
> >  		return 0;
> >  
> > -	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
> > +	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY) &&
> > +	    !IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> >  		addr += MCOUNT_INSN_SIZE;
> >  
> >  	return addr;
> 
> I don't understand what this is doing actually (before this patch
> even). We still emit one/two patchable instructions at entry, so
> why do we only need to adjust by zero/one instruction here?

ftrace wants the address of the instruction that calls into 
ftrace_caller. We emit 2 nops with -fpatchable-function-entry, so we 
"adjust" the ftrace location to be the second nop.

> 
> 
> > @@ -82,7 +83,7 @@ static inline int ftrace_modify_code(unsigned long ip, ppc_inst_t old, ppc_inst_
> >  {
> >  	int ret = ftrace_validate_inst(ip, old);
> >  
> > -	if (!ret)
> > +	if (!ret && !ppc_inst_equal(old, new))
> >  		ret = patch_instruction((u32 *)ip, new);
> >  
> >  	return ret;
> 
> Is this leftover debugging stuff or should it be in a different patch?

It is intentional to simplify further patching and checks. I will move 
this to a separate patch.

> 
> > @@ -132,11 +133,23 @@ static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long a
> >  }
> >  #endif
> >  
> > +static unsigned long ftrace_get_pfe_stub(struct dyn_ftrace *rec)
> > +{
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	return rec->arch.pfe_stub;
> > +#else
> > +	BUILD_BUG();
> > +#endif
> > +}
> > +
> >  static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_inst_t *call_inst)
> >  {
> >  	unsigned long ip = rec->ip;
> >  	unsigned long stub;
> >  
> > +	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> > +		ip = ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE; /* second instruction in stub */
> 
> Maybe put the ip = rec->ip; into an else case here.

Ok.

> 
> [...]
> 
> > @@ -155,6 +168,79 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
> >  	return 0;
> >  }
> >  
> > +static int ftrace_init_pfe_stub(struct module *mod, struct dyn_ftrace *rec)
> > +{
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	static int pfe_stub_text_index, pfe_stub_inittext_index;
> > +	int ret = 0, pfe_stub_count, *pfe_stub_index;
> > +	ppc_inst_t inst;
> > +	struct ftrace_pfe_stub *pfe_stub, pfe_stub_template = {
> > +		.insn = {
> > +			PPC_RAW_MFLR(_R0),
> > +			PPC_RAW_NOP(),		/* bl ftrace_caller */
> > +			PPC_RAW_MTLR(_R0),
> > +			PPC_RAW_NOP()		/* b rec->ip + 4 */
> > +		}
> > +	};
> > +
> > +	WARN_ON(rec->arch.pfe_stub);
> > +
> > +	if (is_kernel_inittext(rec->ip)) {
> > +		pfe_stub = ftrace_pfe_stub_inittext;
> > +		pfe_stub_index = &pfe_stub_inittext_index;
> > +		pfe_stub_count = ftrace_pfe_stub_inittext_count;
> > +	} else if (is_kernel_text(rec->ip)) {
> > +		pfe_stub = ftrace_pfe_stub_text;
> > +		pfe_stub_index = &pfe_stub_text_index;
> > +		pfe_stub_count = ftrace_pfe_stub_text_count;
> > +#ifdef CONFIG_MODULES
> > +	} else if (mod) {
> > +		pfe_stub = mod->arch.pfe_stubs;
> > +		pfe_stub_index = &mod->arch.pfe_stub_index;
> > +		pfe_stub_count = mod->arch.pfe_stub_count;
> > +#endif
> > +	} else {
> > +		return -EINVAL;
> > +	}
> > +
> > +	pfe_stub += (*pfe_stub_index)++;
> > +
> > +	if (WARN_ON(*pfe_stub_index > pfe_stub_count))
> > +		return -EINVAL;
> > +
> > +	if (!is_offset_in_branch_range((long)rec->ip - (long)&pfe_stub->insn[0]) ||
> > +	    !is_offset_in_branch_range((long)(rec->ip + MCOUNT_INSN_SIZE) - (long)&pfe_stub->insn[3])) {
> > +		pr_err("%s: ftrace pfe stub out of range (%p -> %p).\n",
> > +					__func__, (void *)rec->ip, (void *)&pfe_stub->insn[0]);
> > +		return -EINVAL;
> > +	}
> > +
> > +	rec->arch.pfe_stub = (unsigned long)&pfe_stub->insn[0];
> > +
> > +	/* bl ftrace_caller */
> > +	if (mod)
> > +		/* We can't lookup the module since it is not fully formed yet */
> 
> What do you mean here, what would a lookup look like if we could do it?

I originally had ftrace_get_call_inst() here, which does a 
__module_text_address() lookup. It didn't work since the module is not 
yet fully loaded when this is called. I can expand the comment.
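
Something along these lines, perhaps (the wording is mine, based on the
explanation above):

	/*
	 * We can't use ftrace_get_call_inst() here: it looks up the module
	 * via __module_text_address(), but the module is not fully formed
	 * at this point during module load, so build the branch to
	 * mod->arch.tramp directly.
	 */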

> 
> > +		inst = ftrace_create_branch_inst(ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE,
> > +						 mod->arch.tramp, 1);
> > +	else
> > +		ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &inst);
> > +	pfe_stub_template.insn[1] = ppc_inst_val(inst);
> > +
> > +	/* b rec->ip + 4 */
> > +	if (!ret && create_branch(&inst, &pfe_stub->insn[3], rec->ip + MCOUNT_INSN_SIZE, 0))
> > +		return -EINVAL;
> > +	pfe_stub_template.insn[3] = ppc_inst_val(inst);
> > +
> > +	if (!ret)
> > +		ret = patch_instructions((u32 *)pfe_stub, (u32 *)&pfe_stub_template,
> > +					 sizeof(pfe_stub_template), false);
> > +
> > +	return ret;
> > +#else /* !CONFIG_FTRACE_PFE_OUT_OF_LINE */
> > +	BUILD_BUG();
> > +#endif
> > +}
> > +
> >  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> >  int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned long addr)
> >  {
> > @@ -167,18 +253,29 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned
> >  int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> >  {
> >  	ppc_inst_t old, new;
> > -	int ret;
> > +	unsigned long ip = rec->ip;
> > +	int ret = 0;
> >  
> >  	/* This can only ever be called during module load */
> > -	if (WARN_ON(!IS_ENABLED(CONFIG_MODULES) || core_kernel_text(rec->ip)))
> > +	if (WARN_ON(!IS_ENABLED(CONFIG_MODULES) || core_kernel_text(ip)))
> >  		return -EINVAL;
> >  
> >  	old = ppc_inst(PPC_RAW_NOP());
> > -	ret = ftrace_get_call_inst(rec, addr, &new);
> > -	if (ret)
> > -		return ret;
> > +	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE)) {
> > +		ip = ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE; /* second instruction in stub */
> > +		ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &old);
> > +	}
> >  
> > -	return ftrace_modify_code(rec->ip, old, new);
> > +	ret |= ftrace_get_call_inst(rec, addr, &new);
> > +
> > +	if (!ret)
> > +		ret = ftrace_modify_code(ip, old, new);
> > +
> > +	if (!ret && IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> > +		ret = ftrace_modify_code(rec->ip, ppc_inst(PPC_RAW_NOP()),
> > +			 ppc_inst(PPC_RAW_BRANCH((long)ftrace_get_pfe_stub(rec) - (long)rec->ip)));
> > +
> > +	return ret;
> >  }
> >  
> >  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr)
> > @@ -211,6 +308,13 @@ void ftrace_replace_code(int enable)
> >  		new_addr = ftrace_get_addr_new(rec);
> >  		update = ftrace_update_record(rec, enable);
> >  
> > +		if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE) && update != FTRACE_UPDATE_IGNORE) {
> > +			ip = ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE;
> > +			ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &nop_inst);
> > +			if (ret)
> > +				goto out;
> > +		}
> > +
> >  		switch (update) {
> >  		case FTRACE_UPDATE_IGNORE:
> >  		default:
> > @@ -235,6 +339,24 @@ void ftrace_replace_code(int enable)
> >  
> >  		if (!ret)
> >  			ret = ftrace_modify_code(ip, old, new);
> > +
> > +		if (!ret && IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE) &&
> > +		    (update == FTRACE_UPDATE_MAKE_NOP || update == FTRACE_UPDATE_MAKE_CALL)) {
> > +			/* Update the actual ftrace location */
> > +			call_inst = ppc_inst(PPC_RAW_BRANCH((long)ftrace_get_pfe_stub(rec) -
> > +							    (long)rec->ip));
> > +			nop_inst = ppc_inst(PPC_RAW_NOP());
> > +			ip = rec->ip;
> > +
> > +			if (update == FTRACE_UPDATE_MAKE_NOP)
> > +				ret = ftrace_modify_code(ip, call_inst, nop_inst);
> > +			else
> > +				ret = ftrace_modify_code(ip, nop_inst, call_inst);
> > +
> > +			if (ret)
> > +				goto out;
> > +		}
> > +
> >  		if (ret)
> >  			goto out;
> >  	}
> > @@ -254,7 +376,8 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
> >  	/* Verify instructions surrounding the ftrace location */
> >  	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY)) {
> >  		/* Expect nops */
> > -		ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_NOP()));
> > +		if (!IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> > +			ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_NOP()));
> >  		if (!ret)
> >  			ret = ftrace_validate_inst(ip, ppc_inst(PPC_RAW_NOP()));
> >  	} else if (IS_ENABLED(CONFIG_PPC32)) {
> > @@ -278,6 +401,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
> >  	if (ret)
> >  		return ret;
> >  
> > +	/* Set up out-of-line stub */
> > +	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> > +		return ftrace_init_pfe_stub(mod, rec);
> > +
> >  	/* Nop-out the ftrace location */
> >  	new = ppc_inst(PPC_RAW_NOP());
> >  	addr = MCOUNT_ADDR;
> > diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
> > index 244a1c7bb1e8..b1cbef24f18f 100644
> > --- a/arch/powerpc/kernel/trace/ftrace_entry.S
> > +++ b/arch/powerpc/kernel/trace/ftrace_entry.S
> > @@ -78,10 +78,6 @@
> >  
> >  	/* Get the _mcount() call site out of LR */
> >  	mflr	r7
> > -	/* Save it as pt_regs->nip */
> > -	PPC_STL	r7, _NIP(r1)
> > -	/* Also save it in B's stackframe header for proper unwind */
> > -	PPC_STL	r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
> >  	/* Save the read LR in pt_regs->link */
> >  	PPC_STL	r0, _LINK(r1)
> >  
> > @@ -96,16 +92,6 @@
> >  	lwz	r5,function_trace_op@l(r3)
> >  #endif
> >  
> > -#ifdef CONFIG_LIVEPATCH_64
> > -	mr	r14, r7		/* remember old NIP */
> > -#endif
> > -
> > -	/* Calculate ip from nip-4 into r3 for call below */
> > -	subi    r3, r7, MCOUNT_INSN_SIZE
> > -
> > -	/* Put the original return address in r4 as parent_ip */
> > -	mr	r4, r0
> > -
> >  	/* Save special regs */
> >  	PPC_STL	r8, _MSR(r1)
> >  	.if \allregs == 1
> > @@ -114,17 +100,64 @@
> >  	PPC_STL	r11, _CCR(r1)
> >  	.endif
> >  
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	/* Save our real return address locally for return */
> > +	PPC_STL	r7, STACK_INT_FRAME_MARKER(r1)
> 
> Hmm, should you be using STACK_INT_FRAME_MARKER in a
> non-INT_FRAME? I actually wanted to turn the int marker
> into a 4 byte word and move it into a reserved space in
> the frame too. Could it go in pt_regs somewhere?

I can use an nvr (non-volatile register).

> 
> > +	/*
> > +	 * We want the ftrace location in the function, but our lr (in r7)
> > +	 * points at the 'mtlr r0' instruction in the out of line stub.  To
> > +	 * recover the ftrace location, we read the branch instruction in the
> > +	 * stub, and adjust our lr by the branch offset.
> > +	 */
> > +	lwz	r8, MCOUNT_INSN_SIZE(r7)
> > +	slwi	r8, r8, 6
> > +	srawi	r8, r8, 6
> > +	add	r3, r7, r8
> 
> Clever. Maybe a comment in ftrace_init_pfe_stub() that says to
> keep that last instruction in synch with this?

Sure.
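
As an aside, a worked example of what that shift pair computes (my own
illustration in C, not from the patch; the arithmetic right shift mimics
srawi):

	/* insn[3] of the stub, e.g. the encoding of "b -0x30" (I-form, AA=LK=0) */
	u32 insn = 0x4bffffd0;
	/* slwi 6; srawi 6: keep the low 26 bits, sign-extended */
	s32 offset = (s32)(insn << 6) >> 6;	/* == -0x30 */
	/*
	 * lr points at the 'mtlr r0' one instruction before that branch, and
	 * the branch targets rec->ip + 4, so lr + offset == rec->ip, i.e. the
	 * ftrace location within the traced function.
	 */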

> 
> > +	/*
> > +	 * Override our nip to point past the branch in the original function.
> > +	 * This allows reliable stack trace and the ftrace stack tracer to work as-is.
> > +	 */
> > +	add	r7, r3, MCOUNT_INSN_SIZE
> > +#else
> > +	/* Calculate ip from nip-4 into r3 for call below */
> > +	subi    r3, r7, MCOUNT_INSN_SIZE
> > +#endif
> > +
> > +	/* Save NIP as pt_regs->nip */
> > +	PPC_STL	r7, _NIP(r1)
> > +	/* Also save it in B's stackframe header for proper unwind */
> > +	PPC_STL	r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
> > +#ifdef CONFIG_LIVEPATCH_64
> > +	mr	r14, r7		/* remember old NIP */
> > +#endif
> > +
> > +	/* Put the original return address in r4 as parent_ip */
> > +	mr	r4, r0
> > +
> >  	/* Load &pt_regs in r6 for call below */
> >  	addi    r6, r1, STACK_INT_FRAME_REGS
> >  .endm
> >  
> >  .macro	ftrace_regs_exit allregs
> > +#ifndef CONFIG_FTRACE_PFE_OUT_OF_LINE
> >  	/* Load ctr with the possibly modified NIP */
> >  	PPC_LL	r3, _NIP(r1)
> >  	mtctr	r3
> >  
> >  #ifdef CONFIG_LIVEPATCH_64
> >  	cmpd	r14, r3		/* has NIP been altered? */
> > +#endif
> > +#else /* !CONFIG_FTRACE_PFE_OUT_OF_LINE */
> > +#ifdef CONFIG_LIVEPATCH_64
> > +	/* Load ctr with the possibly modified NIP */
> 
> Comment doesn't apply to this leg of the ifdef AFAIKS.
> We load the original NIP into LR, and set CR0 for
> livepatch branch.

Indeed. Will update.

> 
> > +	PPC_LL	r3, _NIP(r1)
> > +
> > +	cmpd	r14, r3		/* has NIP been altered? */
> > +	bne-	1f
> > +#endif
> > +
> > +	PPC_LL	r3, STACK_INT_FRAME_MARKER(r1)
> > +1:	mtlr	r3
> >  #endif
> >  
> >  	/* Restore gprs */
> > @@ -139,7 +172,9 @@
> >  
> >  	/* Restore possibly modified LR */
> >  	PPC_LL	r0, _LINK(r1)
> > +#ifndef CONFIG_FTRACE_PFE_OUT_OF_LINE
> >  	mtlr	r0
> > +#endif
> >  
> >  #ifdef CONFIG_PPC64
> >  	/* Restore callee's TOC */
> > @@ -153,7 +188,12 @@
> >          /* Based on the cmpd above, if the NIP was altered handle livepatch */
> >  	bne-	livepatch_handler
> >  #endif
> > -	bctr			/* jump after _mcount site */
> > +	/* jump after _mcount site */
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	blr
> > +#else
> > +	bctr
> > +#endif
> 
> Here is the crux of it all: we return here with a blr that matches
> the return address of the bl it was called with, so the CPU
> can predict it.
> 
> I think it would be worth a comment here to explain why you go to
> so much effort to add the case that uses LR here. Because the out
> of line stub itself could pretty well have the same calling convention
> as the original mcount.

Sure. FWIW, with ftrace enabled, null_syscall showed an improvement of 
~22% with this patch, going down from ~520 cycles to ~400 cycles.

> 
> Actually that's a thought too. Could you split this patch in two?
> First just the patch to add the out of line call but use the same
> calling convention as mprofile-kernel. Second which changes it to
> use the balanced call/return. Would that be a lot of extra work?

I'll check. My primary motive was to ensure there would only ever be two 
options:
- the existing -mprofile-kernel sequence, primarily for ppc32
- the new ool sequence with a third instruction to balance the link 
  stack.

Though I agree splitting this makes the code easier to follow.

> 
> >  .endm
> >  
> >  _GLOBAL(ftrace_regs_caller)
> > @@ -177,6 +217,11 @@ _GLOBAL(ftrace_stub)
> >  
> >  #ifdef CONFIG_PPC64
> >  ftrace_no_trace:
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	REST_GPR(3, r1)
> > +	addi	r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
> > +	blr
> > +#else
> >  	mflr	r3
> >  	mtctr	r3
> >  	REST_GPR(3, r1)
> > @@ -184,6 +229,7 @@ ftrace_no_trace:
> >  	mtlr	r0
> >  	bctr
> >  #endif
> > +#endif
> >  
> >  #ifdef CONFIG_LIVEPATCH_64
> >  	/*
> > @@ -196,9 +242,9 @@ ftrace_no_trace:
> >  	 *
> >  	 * On entry:
> >  	 *  - we have no stack frame and can not allocate one
> > -	 *  - LR points back to the original caller (in A)
> > -	 *  - CTR holds the new NIP in C
> > -	 *  - r0, r11 & r12 are free
> > +	 *  - LR/r0 points back to the original caller (in A)
> > +	 *  - CTR/LR holds the new NIP in C
> > +	 *  - r11 & r12 are free
> 
> Could you explain the added case here, e.g.,
> 
> On entry, depending on CONFIG_FTRACE_PFE_OUT_OF_LINE (=n/=y)

Sure.

> 
> >  	 */
> >  livepatch_handler:
> >  	ld	r12, PACA_THREAD_INFO(r13)
> > @@ -208,18 +254,23 @@ livepatch_handler:
> >  	addi	r11, r11, 24
> >  	std	r11, TI_livepatch_sp(r12)
> >  
> > -	/* Save toc & real LR on livepatch stack */
> > -	std	r2,  -24(r11)
> > -	mflr	r12
> > -	std	r12, -16(r11)
> > -
> >  	/* Store stack end marker */
> >  	lis     r12, STACK_END_MAGIC@h
> >  	ori     r12, r12, STACK_END_MAGIC@l
> >  	std	r12, -8(r11)
> >  
> > -	/* Put ctr in r12 for global entry and branch there */
> > +	/* Save toc & real LR on livepatch stack */
> > +	std	r2,  -24(r11)
> > +#ifndef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	mflr	r12
> > +	std	r12, -16(r11)
> >  	mfctr	r12
> > +#else
> > +	std	r0, -16(r11)
> > +	mflr	r12
> > +	/* Put ctr in r12 for global entry and branch there */
> > +	mtctr	r12
> > +#endif
> >  	bctrl
> >  
> >  	/*
> > diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
> > index f420df7888a7..0aef9959f2cd 100644
> > --- a/arch/powerpc/kernel/vmlinux.lds.S
> > +++ b/arch/powerpc/kernel/vmlinux.lds.S
> > @@ -267,14 +267,13 @@ SECTIONS
> >  	.init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
> >  		_sinittext = .;
> >  		INIT_TEXT
> > -
> > +		*(.tramp.ftrace.init);
> >  		/*
> >  		 *.init.text might be RO so we must ensure this section ends on
> >  		 * a page boundary.
> >  		 */
> >  		. = ALIGN(PAGE_SIZE);
> >  		_einittext = .;
> > -		*(.tramp.ftrace.init);
> >  	} :text
> >  
> >  	/* .exit.text is discarded at runtime, not link time,
> 
> Why this change?

I should have explained this in the commit log. Will add.
Without this change, I think the core_kernel_text() check was failing in 
ftrace_init_pfe_stub().

> 
> > diff --git a/arch/powerpc/tools/Makefile b/arch/powerpc/tools/Makefile
> > new file mode 100644
> > index 000000000000..9e2ba9a85baa
> > --- /dev/null
> > +++ b/arch/powerpc/tools/Makefile
> > @@ -0,0 +1,10 @@
> > +# SPDX-License-Identifier: GPL-2.0-or-later
> > +
> > +quiet_cmd_gen_ftrace_pfe_stubs = FTRACE  $@
> > +      cmd_gen_ftrace_pfe_stubs = $< $(objtree)/vmlinux.o $@
> > +
> > +targets += .arch.vmlinux.o
> > +.arch.vmlinux.o: $(srctree)/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh $(objtree)/vmlinux.o FORCE
> > +	$(call if_changed,gen_ftrace_pfe_stubs)
> > +
> > +clean-files += $(objtree)/.arch.vmlinux.S $(objtree)/.arch.vmlinux.o
> > diff --git a/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh b/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh
> > new file mode 100755
> > index 000000000000..ec95e99aff14
> > --- /dev/null
> > +++ b/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh
> > @@ -0,0 +1,48 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: GPL-2.0-or-later
> > +
> > +# Error out on error
> > +set -e
> > +
> > +is_enabled() {
> > +	grep -q "^$1=y" include/config/auto.conf
> > +}
> > +
> > +vmlinux_o=${1}
> > +arch_vmlinux_o=${2}
> > +arch_vmlinux_S=$(dirname ${arch_vmlinux_o})/$(basename ${arch_vmlinux_o} .o).S
> > +
> > +RELOCATION=R_PPC64_ADDR64
> > +if is_enabled CONFIG_PPC32; then
> > +	RELOCATION=R_PPC_ADDR32
> > +fi
> 
> Started PPC32 support?

Yes, except perhaps the module code. The intent was to enable as much of 
it as I could so that Christophe could try and see if this would be 
useful on 32-bit. The config option is intentionally neutral.

> 
> > +
> > +num_pfe_stubs_text=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |
> > +		     grep -v ".init.text" | grep "${RELOCATION}" | wc -l)
> > +num_pfe_stubs_inittext=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |
> > +			 grep ".init.text" | grep "${RELOCATION}" | wc -l)
> > +
> > +cat > ${arch_vmlinux_S} <<EOF
> > +#include <asm/asm-offsets.h>
> > +#include <linux/linkage.h>
> > +
> > +.pushsection .tramp.ftrace.text,"aw"
> > +SYM_DATA(ftrace_pfe_stub_text_count, .long ${num_pfe_stubs_text})
> > +
> > +SYM_CODE_START(ftrace_pfe_stub_text)
> > +	.space ${num_pfe_stubs_text} * FTRACE_PFE_STUB_SIZE
> > +SYM_CODE_END(ftrace_pfe_stub_text)
> > +.popsection
> > +
> > +.pushsection .tramp.ftrace.init,"aw"
> > +SYM_DATA(ftrace_pfe_stub_inittext_count, .long ${num_pfe_stubs_inittext})
> > +
> > +SYM_CODE_START(ftrace_pfe_stub_inittext)
> > +	.space ${num_pfe_stubs_inittext} * FTRACE_PFE_STUB_SIZE
> > +SYM_CODE_END(ftrace_pfe_stub_inittext)
> > +.popsection
> > +EOF
> > +
> > +${CC} ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS} \
> > +      ${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL} \
> > +      -c -o ${arch_vmlinux_o} ${arch_vmlinux_S}
> 
> Looking pretty good. I don't know the livepatch stuff well though.

Thanks for the detailed review!

- Naveen


* Re: [RFC PATCH v3 11/11] powerpc64/bpf: Add support for bpf trampolines
  2024-07-01 11:03   ` Nicholas Piggin
@ 2024-07-01 19:58     ` Naveen N Rao
  0 siblings, 0 replies; 27+ messages in thread
From: Naveen N Rao @ 2024-07-01 19:58 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, linux-trace-kernel, bpf, Michael Ellerman,
	Steven Rostedt, Masami Hiramatsu, Christophe Leroy,
	Masahiro Yamada, Mark Rutland, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Andrii Nakryiko, Song Liu,
	Jiri Olsa

On Mon, Jul 01, 2024 at 09:03:52PM GMT, Nicholas Piggin wrote:
> On Fri Jun 21, 2024 at 5:09 AM AEST, Naveen N Rao wrote:
> > Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline()
> > for 64-bit powerpc.
> 
> What do BPF trampolines give you?

At a very basic level, they provide a way to attach bpf programs at 
function entry/exit, as a more efficient alternative to ftrace/kprobes. 
Commit fec56f5890d9 ("bpf: Introduce BPF trampoline") 
has more details.
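
As an example, this is the sort of program that gets attached through a
BPF trampoline (a minimal libbpf-style sketch, not from this series; the
traced function name is arbitrary):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* fentry programs run from a BPF trampoline at entry to the traced function */
SEC("fentry/build_sched_domains")
int BPF_PROG(on_build_sched_domains)
{
	bpf_printk("build_sched_domains() entered");
	return 0;
}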

> 
> > BPF prog JIT is extended to mimic 64-bit powerpc approach for ftrace
> > having a single nop at function entry, followed by the function
> > profiling sequence out-of-line and a separate long branch stub for calls
> > to trampolines that are out of range. A dummy_tramp is provided to
> > simplify synchronization similar to arm64.
> 
> Synchronization - between BPF and ftrace interfaces?
> 
> > BPF Trampolines adhere to the existing ftrace ABI utilizing a
> > two-instruction profiling sequence, as well as the newer ABI utilizing a
> > three-instruction profiling sequence enabling return with a 'blr'. The
> > trampoline code itself closely follows x86 implementation.
> >
> > While the code is generic, BPF trampolines are only enabled on 64-bit
> > powerpc. 32-bit powerpc will need testing and some updates.
> >
> > Signed-off-by: Naveen N Rao <naveen@kernel.org>
> 
> Just a quick glance for now, and I don't know BPF code much.
> 
> > ---
> >  arch/powerpc/include/asm/ppc-opcode.h |  14 +
> >  arch/powerpc/net/bpf_jit.h            |  11 +
> >  arch/powerpc/net/bpf_jit_comp.c       | 702 +++++++++++++++++++++++++-
> >  arch/powerpc/net/bpf_jit_comp32.c     |   7 +-
> >  arch/powerpc/net/bpf_jit_comp64.c     |   7 +-
> >  5 files changed, 738 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
> > index 076ae60b4a55..9eaa2c5d9b73 100644
> > --- a/arch/powerpc/include/asm/ppc-opcode.h
> > +++ b/arch/powerpc/include/asm/ppc-opcode.h
> > @@ -585,12 +585,26 @@
> >  #define PPC_RAW_MTSPR(spr, d)		(0x7c0003a6 | ___PPC_RS(d) | __PPC_SPR(spr))
> >  #define PPC_RAW_EIEIO()			(0x7c0006ac)
> >  
> > +/* bcl 20,31,$+4 */
> > +#define PPC_RAW_BCL()			(0x429f0005)
> 
> This is the special bcl form that gives the current address.
> Maybe call it PPC_RAW_BCL4()

Sure.

> 
> >  
> > +void dummy_tramp(void);
> > +
> > +asm (
> > +"	.pushsection .text, \"ax\", @progbits	;"
> > +"	.global dummy_tramp			;"
> > +"	.type dummy_tramp, @function		;"
> > +"dummy_tramp:					;"
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +"	blr					;"
> > +#else
> > +"	mflr	11				;"
> 
> Can you just drop this instruction? The caller will always
> have it in r11?

Indeed. Will add a comment and remove the instruction.
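
i.e., something along these lines (a sketch of what the non-out-of-line
leg might become; the comment wording is mine):

asm (
"	.pushsection .text, \"ax\", @progbits	;"
"	.global dummy_tramp			;"
"	.type dummy_tramp, @function		;"
"dummy_tramp:					;"
#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
"	blr					;"
#else
/* LR was saved in r11 by the long branch stub, so no mflr here */
"	mtctr	11				;"
"	mtlr	0				;"
"	bctr					;"
#endif
"	.size dummy_tramp, .-dummy_tramp	;"
"	.popsection				;"
);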

> 
> > +"	mtctr	11				;"
> > +"	mtlr	0				;"
> > +"	bctr					;"
> > +#endif
> > +"	.size dummy_tramp, .-dummy_tramp	;"
> > +"	.popsection				;"
> > +);
> > +
> > +void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
> > +{
> > +	int ool_stub_idx, long_branch_stub_idx;
> > +
> > +	/*
> > +	 * Out-of-line stub:
> > +	 *	mflr	r0
> > +	 *	[b|bl]	tramp
> > +	 *	mtlr	r0 // only with CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	 *	b	bpf_func + 4
> > +	 */
> > +	ool_stub_idx = ctx->idx;
> > +	EMIT(PPC_RAW_MFLR(_R0));
> > +	EMIT(PPC_RAW_NOP());
> > +	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> > +		EMIT(PPC_RAW_MTLR(_R0));
> > +	WARN_ON_ONCE(!is_offset_in_branch_range(4 - (long)ctx->idx * 4)); /* TODO */
> > +	EMIT(PPC_RAW_BRANCH(4 - (long)ctx->idx * 4));
> > +
> > +	/*
> > +	 * Long branch stub:
> > +	 *	.long	<dummy_tramp_addr>
> > +	 *	mflr	r11
> > +	 *	bcl	20,31,$+4
> > +	 *	mflr	r12
> > +	 *	ld	r12, -8-SZL(r12)
> > +	 *	mtctr	r12
> > +	 *	mtlr	r11 // needed to retain ftrace ABI
> > +	 *	bctr
> > +	 */
> 
> You could avoid clobbering LR on >= POWER9 with addpcis instruction. Or
> use a pcrel load with pcrel even. I guess that's something to do later.

Yes, much of the BPF JIT could use a re-look to consider opportunities 
to emit prefixed instructions.


Thanks,
Naveen



* Re: [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines
  2024-06-20 18:54 [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Naveen N Rao
                   ` (11 preceding siblings ...)
  2024-06-24 11:59 ` [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Vishal Chourasia
@ 2024-07-03 11:11 ` Vishal Chourasia
  2024-07-14  7:52   ` Naveen N Rao
  12 siblings, 1 reply; 27+ messages in thread
From: Vishal Chourasia @ 2024-07-03 11:11 UTC (permalink / raw)
  To: Naveen N Rao
  Cc: linuxppc-dev, linux-trace-kernel, bpf, Michael Ellerman,
	Steven Rostedt, Masami Hiramatsu, Nicholas Piggin,
	Christophe Leroy, Masahiro Yamada, Mark Rutland,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Jiri Olsa, Vishal Chourasia

On Fri, Jun 21, 2024 at 12:24:03AM +0530, Naveen N Rao wrote:
> This is v3 of the patches posted here:
> http://lkml.kernel.org/r/cover.1718008093.git.naveen@kernel.org
> 
> Since v2, I have addressed review comments from Steven and Masahiro 
> along with a few fixes. Patches 7-11 are new in this series and add 
> support for ftrace direct and bpf trampolines. 
> 
> This series depends on the patch series from Benjamin Gray adding 
> support for patch_ulong():
> http://lkml.kernel.org/r/20240515024445.236364-1-bgray@linux.ibm.com
> 
> 
> - Naveen

Hello Naveen,

I've noticed an issue with `kstack()` in bpftrace [1] when using `kfunc` 
compared to `kprobe`. Despite trying all three modes specified in the 
documentation (bpftrace, perf, or raw), the stack isn't unwinding 
properly with `kfunc`. 

[1] https://github.com/bpftrace/bpftrace/blob/master/man/adoc/bpftrace.adoc#kstack


for mode in modes; do
	run bpftrace with kfunc
	disable cpu
	kill bpftrace
	run bpftrace with kprobe
	enable cpu
	kill bpftrace

# ./kprobe_vs_kfunc.sh
+ bpftrace -e 'kfunc:vmlinux:build_sched_domains {@[kstack(bpftrace), comm, tid]=count();}'
Attaching 1 probe...
+ chcpu -d 2-3
CPU 2 disabled
CPU 3 disabled
+ kill 35214

@[
    bpf_prog_cfd8d6c8bb4898ce+972
, cpuhp/2, 33]: 1
@[
    bpf_prog_cfd8d6c8bb4898ce+972
, cpuhp/3, 38]: 1

+ bpftrace -e 'kprobe:build_sched_domains {@[kstack(bpftrace), comm, tid]=count();}'
Attaching 1 probe...
+ chcpu -e 2-3
CPU 2 enabled
CPU 3 enabled
+ kill 35221

@[
    0x80000007642bdfb4
    partition_sched_domains_locked+1304
    rebuild_sched_domains_locked+216
    cpuset_handle_hotplug+1148
    sched_cpu_activate+664
    cpuhp_invoke_callback+480
    cpuhp_thread_fun+244
    smpboot_thread_fn+460
    kthread+308
    start_kernel_thread+20
, cpuhp/3, 38]: 1
@[
    0x80000007524b34a4
    partition_sched_domains_locked+1304
    rebuild_sched_domains_locked+216
    cpuset_handle_hotplug+1148
    sched_cpu_activate+664
    cpuhp_invoke_callback+480
    cpuhp_thread_fun+244
    smpboot_thread_fn+460
    kthread+308
    start_kernel_thread+20
, cpuhp/2, 33]: 1

+ bpftrace -e 'kfunc:vmlinux:build_sched_domains {@[kstack(perf), comm, tid]=count();}'
Attaching 1 probe...
+ chcpu -d 2-3
CPU 2 disabled
CPU 3 disabled
+ kill 35229

@[
        c008000003433454 bpf_prog_cfd8d6c8bb4898ce+960
, cpuhp/3, 38]: 1
@[
        c008000003433454 bpf_prog_cfd8d6c8bb4898ce+960
, cpuhp/2, 33]: 1


+ bpftrace -e 'kprobe:build_sched_domains {@[kstack(perf), comm, tid]=count();}'
Attaching 1 probe...
+ chcpu -e 2-3
CPU 2 enabled
CPU 3 enabled
+ kill 35235

@[
        80000007524b379c 0x80000007524b379c
        c000000000206268 partition_sched_domains_locked+1304
        c0000000002cbf08 rebuild_sched_domains_locked+216
        c0000000002cfd5c cpuset_handle_hotplug+1148
        c0000000001b3178 sched_cpu_activate+664
        c000000000156fc0 cpuhp_invoke_callback+480
        c000000000157974 cpuhp_thread_fun+244
        c00000000019ddec smpboot_thread_fn+460
        c0000000001932c4 kthread+308
        c00000000000dd58 start_kernel_thread+20
, cpuhp/2, 33]: 1
@[
        80000007642b9b6c 0x80000007642b9b6c
        c000000000206268 partition_sched_domains_locked+1304
        c0000000002cbf08 rebuild_sched_domains_locked+216
        c0000000002cfd5c cpuset_handle_hotplug+1148
        c0000000001b3178 sched_cpu_activate+664
        c000000000156fc0 cpuhp_invoke_callback+480
        c000000000157974 cpuhp_thread_fun+244
        c00000000019ddec smpboot_thread_fn+460
        c0000000001932c4 kthread+308
        c00000000000dd58 start_kernel_thread+20
, cpuhp/3, 38]: 1

+ bpftrace -e 'kfunc:vmlinux:build_sched_domains {@[kstack(raw), comm, tid]=count();}'
Attaching 1 probe...
+ chcpu -d 2-3
CPU 2 disabled
CPU 3 disabled
+ kill 35243

@[
c00800000343346c
, cpuhp/3, 38]: 1
@[
c00800000343346c
, cpuhp/2, 33]: 1


+ bpftrace -e 'kprobe:build_sched_domains {@[kstack(raw), comm, tid]=count();}'
Attaching 1 probe...
+ chcpu -e 2-3
CPU 2 enabled
CPU 3 enabled
+ kill 35249

@[
80000007642befac
c000000000206268
c0000000002cbf08
c0000000002cfd5c
c0000000001b3178
c000000000156fc0
c000000000157974
c00000000019ddec
c0000000001932c4
c00000000000dd58
, cpuhp/3, 38]: 1
@[
80000007524b425c
c000000000206268
c0000000002cbf08
c0000000002cfd5c
c0000000001b3178
c000000000156fc0
c000000000157974
c00000000019ddec
c0000000001932c4
c00000000000dd58
, cpuhp/2, 33]: 1

> 
> 
> Naveen N Rao (11):
>   powerpc/kprobes: Use ftrace to determine if a probe is at function
>     entry
>   powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code
>   powerpc/module_64: Convert #ifdef to IS_ENABLED()
>   powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace
>   kbuild: Add generic hook for architectures to use before the final
>     vmlinux link
>   powerpc64/ftrace: Move ftrace sequence out of line
>   powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
>   powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>   samples/ftrace: Add support for ftrace direct samples on powerpc
>   powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into
>     bpf_jit_emit_func_call_rel()
>   powerpc64/bpf: Add support for bpf trampolines
> 
>  arch/Kconfig                                |   3 +
>  arch/powerpc/Kconfig                        |   9 +
>  arch/powerpc/Makefile                       |   8 +
>  arch/powerpc/include/asm/ftrace.h           |  29 +-
>  arch/powerpc/include/asm/module.h           |   5 +
>  arch/powerpc/include/asm/ppc-opcode.h       |  14 +
>  arch/powerpc/kernel/asm-offsets.c           |  11 +
>  arch/powerpc/kernel/kprobes.c               |  18 +-
>  arch/powerpc/kernel/module_64.c             |  67 +-
>  arch/powerpc/kernel/trace/ftrace.c          | 269 +++++++-
>  arch/powerpc/kernel/trace/ftrace_64_pg.c    |  73 +-
>  arch/powerpc/kernel/trace/ftrace_entry.S    | 210 ++++--
>  arch/powerpc/kernel/vmlinux.lds.S           |   3 +-
>  arch/powerpc/net/bpf_jit.h                  |  11 +
>  arch/powerpc/net/bpf_jit_comp.c             | 702 +++++++++++++++++++-
>  arch/powerpc/net/bpf_jit_comp32.c           |   7 +-
>  arch/powerpc/net/bpf_jit_comp64.c           |  68 +-
>  arch/powerpc/tools/Makefile                 |  10 +
>  arch/powerpc/tools/gen-ftrace-pfe-stubs.sh  |  49 ++
>  samples/ftrace/ftrace-direct-modify.c       |  85 ++-
>  samples/ftrace/ftrace-direct-multi-modify.c | 101 ++-
>  samples/ftrace/ftrace-direct-multi.c        |  79 ++-
>  samples/ftrace/ftrace-direct-too.c          |  83 ++-
>  samples/ftrace/ftrace-direct.c              |  69 +-
>  scripts/Makefile.vmlinux                    |   8 +
>  scripts/link-vmlinux.sh                     |  11 +-
>  26 files changed, 1813 insertions(+), 189 deletions(-)
>  create mode 100644 arch/powerpc/tools/Makefile
>  create mode 100755 arch/powerpc/tools/gen-ftrace-pfe-stubs.sh
> 
> 
> base-commit: e2b06d707dd067509cdc9ceba783c06fa6a551c2
> prerequisite-patch-id: a1d50e589288239d5a8b1c1f354cd4737057c9d3
> prerequisite-patch-id: da4142d56880861bd0ad7ad7087c9e2670a2ee54
> prerequisite-patch-id: 609d292e054b2396b603890522a940fa0bdfb6d8
> prerequisite-patch-id: 6f7213fb77b1260defbf43be0e47bff9c80054cc
> prerequisite-patch-id: ad3b71bf071ae4ba1bee5b087e61a2055772a74f
> -- 
> 2.45.2
> 


* Re: [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines
  2024-07-03 11:11 ` Vishal Chourasia
@ 2024-07-14  7:52   ` Naveen N Rao
  0 siblings, 0 replies; 27+ messages in thread
From: Naveen N Rao @ 2024-07-14  7:52 UTC (permalink / raw)
  To: Vishal Chourasia
  Cc: Andrii Nakryiko, Alexei Starovoitov, bpf, Christophe Leroy,
	Daniel Borkmann, John Fastabend, Jiri Olsa, linuxppc-dev,
	linux-trace-kernel, Mark Rutland, Masahiro Yamada,
	Masami Hiramatsu, Nicholas Piggin, Steven Rostedt, Song Liu

Hi Vishal,

Vishal Chourasia wrote:
> On Fri, Jun 21, 2024 at 12:24:03AM +0530, Naveen N Rao wrote:
>> This is v3 of the patches posted here:
>> http://lkml.kernel.org/r/cover.1718008093.git.naveen@kernel.org
>> 
>> Since v2, I have addressed review comments from Steven and Masahiro 
>> along with a few fixes. Patches 7-11 are new in this series and add 
>> support for ftrace direct and bpf trampolines. 
>> 
>> This series depends on the patch series from Benjamin Gray adding 
>> support for patch_ulong():
>> http://lkml.kernel.org/r/20240515024445.236364-1-bgray@linux.ibm.com
>> 
>> 
>> - Naveen
> 
> Hello Naveen,
> 
> I've noticed an issue with `kstack()` in bpftrace [1] when using `kfunc` 
> compared to `kprobe`. Despite trying all three modes specified in the 
> documentation (bpftrace, perf, or raw), the stack isn't unwinding 
> properly with `kfunc`. 
> 
> [1] https://github.com/bpftrace/bpftrace/blob/master/man/adoc/bpftrace.adoc#kstack
> 
> 
> for mode in modes; do
> 	run bpftrace with kfunc
> 	disable cpu
> 	kill bpftrace
> 	run bpftrace with kprobe
> 	enable cpu
> 	kill bpftrace
> 
> # ./kprobe_vs_kfunc.sh
> + bpftrace -e 'kfunc:vmlinux:build_sched_domains {@[kstack(bpftrace), comm, tid]=count();}'
> Attaching 1 probe...
> + chcpu -d 2-3
> CPU 2 disabled
> CPU 3 disabled
> + kill 35214
> 
> @[
>     bpf_prog_cfd8d6c8bb4898ce+972
> , cpuhp/2, 33]: 1
> @[
>     bpf_prog_cfd8d6c8bb4898ce+972
> , cpuhp/3, 38]: 1

Yeah, this is because we don't capture the full register state with bpf 
trampolines, unlike with kprobes. BPF stackmap relies on 
perf_arch_fetch_caller_regs() to create a dummy pt_regs for use by 
get_perf_callchain(). We end up with a NULL LR, and bpftrace (and most 
other userspace tools) stop showing the backtrace when they encounter a 
NULL entry. I recall fixing some tools to continue showing the backtrace 
in spite of a NULL entry, but I may be mis-remembering.

Perhaps we should fix/change how the perf callchain is captured in the 
kernel. We filter out invalid entries, and capture an additional entry 
for perf since we can't be sure of our return address. We should revisit 
this and see if we can align with the usual expectations of a callchain 
not having a NULL entry. Something like this may help, but it needs 
more testing, especially on the perf side:

diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 6b4434dd0ff3..9f67b764da92 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -83,12 +83,12 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
                         * We can't tell which of the first two addresses
                         * we get are valid, but we can filter out the
                         * obviously bogus ones here.  We replace them
-                        * with 0 rather than removing them entirely so
+                        * with -1 rather than removing them entirely so
                         * that userspace can tell which is which.
                         */
                        if ((level == 1 && next_ip == lr) ||
                            (level <= 1 && !kernel_text_address(next_ip)))
-                               next_ip = 0;
+                               next_ip = -1;
 
                        ++level;
		}


- Naveen



end of thread, other threads:[~2024-07-14  8:00 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-20 18:54 [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Naveen N Rao
2024-06-20 18:54 ` [RFC PATCH v3 01/11] powerpc/kprobes: Use ftrace to determine if a probe is at function entry Naveen N Rao
2024-07-01  8:40   ` Nicholas Piggin
2024-07-01 18:18     ` Naveen N Rao
2024-06-20 18:54 ` [RFC PATCH v3 02/11] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code Naveen N Rao
2024-07-01  8:57   ` Nicholas Piggin
2024-07-01 18:34     ` Naveen N Rao
2024-06-20 18:54 ` [RFC PATCH v3 03/11] powerpc/module_64: Convert #ifdef to IS_ENABLED() Naveen N Rao
2024-07-01  9:01   ` Nicholas Piggin
2024-06-20 18:54 ` [RFC PATCH v3 04/11] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace Naveen N Rao
2024-07-01  9:27   ` Nicholas Piggin
2024-07-01 18:51     ` Naveen N Rao
2024-06-20 18:54 ` [RFC PATCH v3 05/11] kbuild: Add generic hook for architectures to use before the final vmlinux link Naveen N Rao
2024-07-01  9:30   ` Nicholas Piggin
2024-06-20 18:54 ` [RFC PATCH v3 06/11] powerpc64/ftrace: Move ftrace sequence out of line Naveen N Rao
2024-07-01 10:39   ` Nicholas Piggin
2024-07-01 19:44     ` Naveen N Rao
2024-06-20 18:54 ` [RFC PATCH v3 07/11] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS Naveen N Rao
2024-06-20 18:54 ` [RFC PATCH v3 08/11] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS Naveen N Rao
2024-06-20 18:54 ` [RFC PATCH v3 09/11] samples/ftrace: Add support for ftrace direct samples on powerpc Naveen N Rao
2024-06-20 19:09 ` [RFC PATCH v3 10/11] powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into bpf_jit_emit_func_call_rel() Naveen N Rao
2024-06-20 19:09 ` [RFC PATCH v3 11/11] powerpc64/bpf: Add support for bpf trampolines Naveen N Rao
2024-07-01 11:03   ` Nicholas Piggin
2024-07-01 19:58     ` Naveen N Rao
2024-06-24 11:59 ` [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines Vishal Chourasia
2024-07-03 11:11 ` Vishal Chourasia
2024-07-14  7:52   ` Naveen N Rao
