public inbox for bpf@vger.kernel.org
* [PATCH bpf-next v3 0/4] per-function storage support
@ 2025-03-03  6:53 Menglong Dong
  2025-03-03  6:53 ` [PATCH bpf-next v3 1/4] x86/ibt: factor out cfi and fineibt offset Menglong Dong
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Menglong Dong @ 2025-03-03  6:53 UTC (permalink / raw)
  To: peterz, rostedt, mark.rutland, alexei.starovoitov
  Cc: catalin.marinas, will, mhiramat, tglx, mingo, bp, dave.hansen,
	x86, hpa, ast, daniel, andrii, martin.lau, eddyz87, yonghong.song,
	john.fastabend, kpsingh, sdf, jolsa, davem, dsahern,
	mathieu.desnoyers, nathan, nick.desaulniers+lkml, morbo,
	samitolvanen, kees, dongml2, akpm, riel, rppt, linux-arm-kernel,
	linux-kernel, linux-trace-kernel, bpf, netdev, llvm

For now, there isn't a low-overhead way to set and get per-function
metadata, which is inconvenient in some situations. Take the BPF
trampoline for example: we need to create a trampoline for each kernel
function, as we have to store some information about the function in
the trampoline, such as the BPF progs, the function arg count, etc.
Creating all these trampolines costs both performance overhead and
memory. With per-function metadata storage support, we can store this
information in the metadata and create a single global BPF trampoline
for all kernel functions. In the global trampoline, we get the
information we need from the function metadata through the ip
(function address) with almost no overhead.

Another beneficiary can be ftrace. For now, all kernel functions that
are enabled by dynamic ftrace are added to a filter hash if there is
more than one callback, and a hash lookup happens whenever the traced
functions are called, which has an impact on performance; see
__ftrace_ops_list_func() -> ftrace_ops_test(). With per-function
metadata support, we can instead store in the metadata whether the
callback is enabled for a given kernel function.
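As a user-space sketch (every name below is illustrative, nothing here
exists in the kernel), the ftrace win is replacing a per-call hash
lookup with a direct flag read from a metadata entry reached by index:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch only: a per-function metadata entry carrying an
 * "ftrace callback enabled" flag. Instead of testing the traced ip
 * against a filter hash on every call, the flag is read directly from
 * the entry whose index was recovered from the function padding.
 */
struct fmd {
	uint8_t cb_enabled;	/* is the ftrace callback enabled? */
};

static struct fmd fmds[8];	/* stand-in for the global metadata array */

/* O(1) check, standing in for the per-call hash lookup done in
 * __ftrace_ops_list_func() -> ftrace_ops_test() */
static int fmd_cb_enabled(uint32_t index)
{
	return fmds[index].cb_enabled;
}

static void fmd_set_enabled(uint32_t index, int on)
{
	fmds[index].cb_enabled = (uint8_t)on;
}
```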

In the 1st patch, we factor out FINEIBT_INSN_OFFSET and CFI_INSN_OFFSET
to make fineibt work when the kernel functions are 32-byte aligned.

In the 2nd patch, we implement the per-function metadata storage by
storing the index of the metadata in the function padding space.

In the 3rd and 4th patches, we implement the per-function metadata
storage for x86 and arm64. In the future, more architectures can be
supported.

Changes since V2:
- split the patch into a series.
- consider the effect on cfi and fineibt, and introduce the 1st patch.

Changes since V1:
- add support for arm64
- split out arch relevant code
- refactor the commit log

Menglong Dong (4):
  x86/ibt: factor out cfi and fineibt offset
  add per-function metadata storage support
  x86: implement per-function metadata storage for x86
  arm64: implement per-function metadata storage for arm64

 arch/arm64/Kconfig              |  15 ++
 arch/arm64/Makefile             |  23 ++-
 arch/arm64/include/asm/ftrace.h |  34 +++++
 arch/arm64/kernel/ftrace.c      |  13 +-
 arch/x86/Kconfig                |  16 +++
 arch/x86/include/asm/cfi.h      |  12 +-
 arch/x86/include/asm/ftrace.h   |  54 ++++++++
 arch/x86/kernel/alternative.c   |  27 +++-
 arch/x86/net/bpf_jit_comp.c     |  22 +--
 include/linux/kfunc_md.h        |  25 ++++
 kernel/Makefile                 |   1 +
 kernel/trace/Makefile           |   1 +
 kernel/trace/kfunc_md.c         | 239 ++++++++++++++++++++++++++++++++
 13 files changed, 456 insertions(+), 26 deletions(-)
 create mode 100644 include/linux/kfunc_md.h
 create mode 100644 kernel/trace/kfunc_md.c

-- 
2.39.5


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v3 1/4] x86/ibt: factor out cfi and fineibt offset
  2025-03-03  6:53 [PATCH bpf-next v3 0/4] per-function storage support Menglong Dong
@ 2025-03-03  6:53 ` Menglong Dong
  2025-03-03  9:18   ` Peter Zijlstra
  2025-03-03  6:53 ` [PATCH bpf-next v3 2/4] add per-function metadata storage support Menglong Dong
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Menglong Dong @ 2025-03-03  6:53 UTC (permalink / raw)
  To: peterz, rostedt, mark.rutland, alexei.starovoitov
  Cc: catalin.marinas, will, mhiramat, tglx, mingo, bp, dave.hansen,
	x86, hpa, ast, daniel, andrii, martin.lau, eddyz87, yonghong.song,
	john.fastabend, kpsingh, sdf, jolsa, davem, dsahern,
	mathieu.desnoyers, nathan, nick.desaulniers+lkml, morbo,
	samitolvanen, kees, dongml2, akpm, riel, rppt, linux-arm-kernel,
	linux-kernel, linux-trace-kernel, bpf, netdev, llvm

For now, the layout of cfi and fineibt is hard coded, and the padding
is fixed at 16 bytes.

Factor out FINEIBT_INSN_OFFSET and CFI_INSN_OFFSET. CFI_INSN_OFFSET is
the offset of the cfi preamble, which is the same as FUNCTION_ALIGNMENT
when CALL_PADDING is enabled. FINEIBT_INSN_OFFSET is the offset at
which we put the fineibt preamble, which is 16 for now.

When FUNCTION_ALIGNMENT is bigger than 16, we place the fineibt
preamble in the last 16 bytes of the padding for better performance,
which means the fineibt preamble doesn't use the space that cfi uses.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 arch/x86/include/asm/cfi.h    | 12 ++++++++----
 arch/x86/kernel/alternative.c | 27 ++++++++++++++++++++-------
 arch/x86/net/bpf_jit_comp.c   | 22 +++++++++++-----------
 3 files changed, 39 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/cfi.h b/arch/x86/include/asm/cfi.h
index 31d19c815f99..ab51fa0ef6af 100644
--- a/arch/x86/include/asm/cfi.h
+++ b/arch/x86/include/asm/cfi.h
@@ -109,15 +109,19 @@ enum bug_trap_type handle_cfi_failure(struct pt_regs *regs);
 extern u32 cfi_bpf_hash;
 extern u32 cfi_bpf_subprog_hash;
 
+#ifdef CONFIG_CALL_PADDING
+#define FINEIBT_INSN_OFFSET	16
+#define CFI_INSN_OFFSET		CONFIG_FUNCTION_ALIGNMENT
+#else
+#define CFI_INSN_OFFSET		5
+#endif
+
 static inline int cfi_get_offset(void)
 {
 	switch (cfi_mode) {
 	case CFI_FINEIBT:
-		return 16;
 	case CFI_KCFI:
-		if (IS_ENABLED(CONFIG_CALL_PADDING))
-			return 16;
-		return 5;
+		return CFI_INSN_OFFSET;
 	default:
 		return 0;
 	}
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index c71b575bf229..ad050d09cb2b 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -908,7 +908,7 @@ void __init_or_module noinline apply_seal_endbr(s32 *start, s32 *end, struct mod
 
 		poison_endbr(addr, wr_addr, true);
 		if (IS_ENABLED(CONFIG_FINEIBT))
-			poison_cfi(addr - 16, wr_addr - 16);
+			poison_cfi(addr, wr_addr);
 	}
 }
 
@@ -974,12 +974,15 @@ u32 cfi_get_func_hash(void *func)
 {
 	u32 hash;
 
-	func -= cfi_get_offset();
 	switch (cfi_mode) {
+#ifdef CONFIG_FINEIBT
 	case CFI_FINEIBT:
+		func -= FINEIBT_INSN_OFFSET;
 		func += 7;
 		break;
+#endif
 	case CFI_KCFI:
+		func -= CFI_INSN_OFFSET;
 		func += 1;
 		break;
 	default:
@@ -1068,7 +1071,7 @@ early_param("cfi", cfi_parse_cmdline);
  *
  * caller:					caller:
  *	movl	$(-0x12345678),%r10d	 // 6	     movl   $0x12345678,%r10d	// 6
- *	addl	$-15(%r11),%r10d	 // 4	     sub    $16,%r11		// 4
+ *	addl	$-15(%r11),%r10d	 // 4	     sub    $FINEIBT_INSN_OFFSET,%r11 // 4
  *	je	1f			 // 2	     nop4			// 4
  *	ud2				 // 2
  * 1:	call	__x86_indirect_thunk_r11 // 5	     call   *%r11; nop2;	// 5
@@ -1092,10 +1095,14 @@ extern u8 fineibt_preamble_end[];
 #define fineibt_preamble_size (fineibt_preamble_end - fineibt_preamble_start)
 #define fineibt_preamble_hash 7
 
+#define ___OFFSET_STR(x)	#x
+#define __OFFSET_STR(x)		___OFFSET_STR(x)
+#define OFFSET_STR		__OFFSET_STR(FINEIBT_INSN_OFFSET)
+
 asm(	".pushsection .rodata			\n"
 	"fineibt_caller_start:			\n"
 	"	movl	$0x12345678, %r10d	\n"
-	"	sub	$16, %r11		\n"
+	"	sub	$"OFFSET_STR", %r11	\n"
 	ASM_NOP4
 	"fineibt_caller_end:			\n"
 	".popsection				\n"
@@ -1225,6 +1232,7 @@ static int cfi_rewrite_preamble(s32 *start, s32 *end, struct module *mod)
 			 addr, addr, 5, addr))
 			return -EINVAL;
 
+		wr_addr += (CFI_INSN_OFFSET - FINEIBT_INSN_OFFSET);
 		text_poke_early(wr_addr, fineibt_preamble_start, fineibt_preamble_size);
 		WARN_ON(*(u32 *)(wr_addr + fineibt_preamble_hash) != 0x12345678);
 		text_poke_early(wr_addr + fineibt_preamble_hash, &hash, 4);
@@ -1241,7 +1249,8 @@ static void cfi_rewrite_endbr(s32 *start, s32 *end, struct module *mod)
 		void *addr = (void *)s + *s;
 		void *wr_addr = module_writable_address(mod, addr);
 
-		poison_endbr(addr + 16, wr_addr + 16, false);
+		poison_endbr(addr + CFI_INSN_OFFSET, wr_addr + CFI_INSN_OFFSET,
+			     false);
 	}
 }
 
@@ -1347,12 +1356,12 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
 		return;
 
 	case CFI_FINEIBT:
-		/* place the FineIBT preamble at func()-16 */
+		/* place the FineIBT preamble at func()-FINEIBT_INSN_OFFSET */
 		ret = cfi_rewrite_preamble(start_cfi, end_cfi, mod);
 		if (ret)
 			goto err;
 
-		/* rewrite the callers to target func()-16 */
+		/* rewrite the callers to target func()-FINEIBT_INSN_OFFSET */
 		ret = cfi_rewrite_callers(start_retpoline, end_retpoline, mod);
 		if (ret)
 			goto err;
@@ -1381,6 +1390,8 @@ static void poison_cfi(void *addr, void *wr_addr)
 {
 	switch (cfi_mode) {
 	case CFI_FINEIBT:
+		addr -= FINEIBT_INSN_OFFSET;
+		wr_addr -= FINEIBT_INSN_OFFSET;
 		/*
 		 * __cfi_\func:
 		 *	osp nopl (%rax)
@@ -1394,6 +1405,8 @@ static void poison_cfi(void *addr, void *wr_addr)
 		break;
 
 	case CFI_KCFI:
+		addr -= CFI_INSN_OFFSET;
+		wr_addr -= CFI_INSN_OFFSET;
 		/*
 		 * __cfi_\func:
 		 *	movl	$0, %eax
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a43fc5af973d..e0ddb0fd28e2 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -414,6 +414,12 @@ static void emit_nops(u8 **pprog, int len)
 static void emit_fineibt(u8 **pprog, u32 hash)
 {
 	u8 *prog = *pprog;
+#ifdef CONFIG_CALL_PADDING
+	int i;
+
+	for (i = 0; i < CFI_INSN_OFFSET - 16; i++)
+		EMIT1(0x90);
+#endif
 
 	EMIT_ENDBR();
 	EMIT3_off32(0x41, 0x81, 0xea, hash);		/* subl $hash, %r10d	*/
@@ -428,20 +434,14 @@ static void emit_fineibt(u8 **pprog, u32 hash)
 static void emit_kcfi(u8 **pprog, u32 hash)
 {
 	u8 *prog = *pprog;
+#ifdef CONFIG_CALL_PADDING
+	int i;
+#endif
 
 	EMIT1_off32(0xb8, hash);			/* movl $hash, %eax	*/
 #ifdef CONFIG_CALL_PADDING
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
+	for (i = 0; i < CFI_INSN_OFFSET - 5; i++)
+		EMIT1(0x90);
 #endif
 	EMIT_ENDBR();
 
-- 
2.39.5



* [PATCH bpf-next v3 2/4] add per-function metadata storage support
  2025-03-03  6:53 [PATCH bpf-next v3 0/4] per-function storage support Menglong Dong
  2025-03-03  6:53 ` [PATCH bpf-next v3 1/4] x86/ibt: factor out cfi and fineibt offset Menglong Dong
@ 2025-03-03  6:53 ` Menglong Dong
  2025-03-03  6:53 ` [PATCH bpf-next v3 3/4] x86: implement per-function metadata storage for x86 Menglong Dong
  2025-03-03  6:53 ` [PATCH bpf-next v3 4/4] arm64: implement per-function metadata storage for arm64 Menglong Dong
  3 siblings, 0 replies; 11+ messages in thread
From: Menglong Dong @ 2025-03-03  6:53 UTC (permalink / raw)
  To: peterz, rostedt, mark.rutland, alexei.starovoitov
  Cc: catalin.marinas, will, mhiramat, tglx, mingo, bp, dave.hansen,
	x86, hpa, ast, daniel, andrii, martin.lau, eddyz87, yonghong.song,
	john.fastabend, kpsingh, sdf, jolsa, davem, dsahern,
	mathieu.desnoyers, nathan, nick.desaulniers+lkml, morbo,
	samitolvanen, kees, dongml2, akpm, riel, rppt, linux-arm-kernel,
	linux-kernel, linux-trace-kernel, bpf, netdev, llvm

For now, there isn't a low-overhead way to set and get per-function
metadata, which is inconvenient in some situations. Take the BPF
trampoline for example: we need to create a trampoline for each kernel
function, as we have to store some information about the function in
the trampoline, such as the BPF progs, the function arg count, etc.
Creating all these trampolines costs both performance overhead and
memory. With per-function metadata storage support, we can store this
information in the metadata and create a single global BPF trampoline
for all kernel functions. In the global trampoline, we get the
information we need from the function metadata through the ip
(function address) with almost no overhead.

Another beneficiary can be ftrace. For now, all kernel functions that
are enabled by dynamic ftrace are added to a filter hash if there is
more than one callback, and a hash lookup happens whenever the traced
functions are called, which has an impact on performance; see
__ftrace_ops_list_func() -> ftrace_ops_test(). With per-function
metadata support, we can instead store in the metadata whether the
callback is enabled for a given kernel function.

Support per-function metadata storage in the function padding; previous
discussion can be found in [1]. Generally speaking, we have two ways to
implement this feature:

1. Create a function metadata array, and prepend an insn which can hold
the index of the function metadata in the array. Store the insn in the
function padding.

2. Allocate the function metadata with kmalloc(), and prepend an insn
which holds the pointer to the metadata. Store the insn in the function
padding.

Compared with way 2, way 1 consumes less space, but we need to do more
work to manage the global function metadata array. We implement this
feature using way 1.
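A minimal user-space sketch of way 1 (the 0xB8 opcode is the real x86
"mov $imm32, %eax" encoding used later in the series; the helper names
are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of way 1: a 5-byte "mov $index, %eax" insn (opcode 0xB8 plus
 * a 4-byte immediate) is written into the padding just before a
 * function, and lookup reads the immediate back to index the global
 * metadata array. Helper names are illustrative, not the kernel's.
 */
#define MD_INSN_SIZE 5

static void md_pretend(uint8_t *insn, uint32_t index)
{
	insn[0] = 0xB8;			/* mov-to-%eax opcode */
	memcpy(insn + 1, &index, 4);	/* 4-byte metadata index */
}

static int md_exists(const uint8_t *insn)
{
	return insn[0] == 0xB8;		/* padding nops never start with 0xB8 */
}

static uint32_t md_index(const uint8_t *insn)
{
	uint32_t idx;

	memcpy(&idx, insn + 1, 4);
	return idx;
}
```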

Link: https://lore.kernel.org/bpf/CADxym3anLzM6cAkn_z71GDd_VeKiqqk1ts=xuiP7pr4PO6USPA@mail.gmail.com/ [1]
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
v2:
- add supporting for arm64
- split out arch relevant code
- refactor the commit log
---
 include/linux/kfunc_md.h |  25 ++++
 kernel/Makefile          |   1 +
 kernel/trace/Makefile    |   1 +
 kernel/trace/kfunc_md.c  | 239 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 266 insertions(+)
 create mode 100644 include/linux/kfunc_md.h
 create mode 100644 kernel/trace/kfunc_md.c

diff --git a/include/linux/kfunc_md.h b/include/linux/kfunc_md.h
new file mode 100644
index 000000000000..df616f0fcb36
--- /dev/null
+++ b/include/linux/kfunc_md.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_KFUNC_MD_H
+#define _LINUX_KFUNC_MD_H
+
+#include <linux/kernel.h>
+
+struct kfunc_md {
+	int users;
+	/* we can use this field later, make sure it is 8-bytes aligned
+	 * for now.
+	 */
+	int pad0;
+	void *func;
+};
+
+extern struct kfunc_md *kfunc_mds;
+
+struct kfunc_md *kfunc_md_find(void *ip);
+struct kfunc_md *kfunc_md_get(void *ip);
+void kfunc_md_put(struct kfunc_md *meta);
+void kfunc_md_put_by_ip(void *ip);
+void kfunc_md_lock(void);
+void kfunc_md_unlock(void);
+
+#endif
diff --git a/kernel/Makefile b/kernel/Makefile
index 87866b037fbe..7435674d5da3 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_TRACE_CLOCK) += trace/
 obj-$(CONFIG_RING_BUFFER) += trace/
 obj-$(CONFIG_TRACEPOINTS) += trace/
 obj-$(CONFIG_RETHOOK) += trace/
+obj-$(CONFIG_FUNCTION_METADATA) += trace/
 obj-$(CONFIG_IRQ_WORK) += irq_work.o
 obj-$(CONFIG_CPU_PM) += cpu_pm.o
 obj-$(CONFIG_BPF) += bpf/
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 057cd975d014..9780ee3f8d8d 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
 obj-$(CONFIG_FPROBE) += fprobe.o
 obj-$(CONFIG_RETHOOK) += rethook.o
 obj-$(CONFIG_FPROBE_EVENTS) += trace_fprobe.o
+obj-$(CONFIG_FUNCTION_METADATA) += kfunc_md.o
 
 obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o
 obj-$(CONFIG_RV) += rv/
diff --git a/kernel/trace/kfunc_md.c b/kernel/trace/kfunc_md.c
new file mode 100644
index 000000000000..7ec25bcf778d
--- /dev/null
+++ b/kernel/trace/kfunc_md.c
@@ -0,0 +1,239 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/slab.h>
+#include <linux/memory.h>
+#include <linux/rcupdate.h>
+#include <linux/ftrace.h>
+#include <linux/kfunc_md.h>
+
+#define ENTRIES_PER_PAGE (PAGE_SIZE / sizeof(struct kfunc_md))
+
+static u32 kfunc_md_count = ENTRIES_PER_PAGE, kfunc_md_used;
+struct kfunc_md __rcu *kfunc_mds;
+EXPORT_SYMBOL_GPL(kfunc_mds);
+
+static DEFINE_MUTEX(kfunc_md_mutex);
+
+
+void kfunc_md_unlock(void)
+{
+	mutex_unlock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_unlock);
+
+void kfunc_md_lock(void)
+{
+	mutex_lock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_lock);
+
+static u32 kfunc_md_get_index(void *ip)
+{
+	return *(u32 *)(ip - KFUNC_MD_DATA_OFFSET);
+}
+
+static void kfunc_md_init(struct kfunc_md *mds, u32 start, u32 end)
+{
+	u32 i;
+
+	for (i = start; i < end; i++)
+		mds[i].users = 0;
+}
+
+static int kfunc_md_page_order(void)
+{
+	return fls(DIV_ROUND_UP(kfunc_md_count, ENTRIES_PER_PAGE)) - 1;
+}
+
+/* Get next usable function metadata. On success, return the usable
+ * kfunc_md and store the index of it to *index. If no usable kfunc_md is
+ * found in kfunc_mds, a larger array will be allocated.
+ */
+static struct kfunc_md *kfunc_md_get_next(u32 *index)
+{
+	struct kfunc_md *new_mds, *mds;
+	u32 i, order;
+
+	mds = rcu_dereference(kfunc_mds);
+	if (mds == NULL) {
+		order = kfunc_md_page_order();
+		new_mds = (void *)__get_free_pages(GFP_KERNEL, order);
+		if (!new_mds)
+			return NULL;
+		kfunc_md_init(new_mds, 0, kfunc_md_count);
+		/* The first time to initialize kfunc_mds, so it is not
+		 * used anywhere yet, and we can update it directly.
+		 */
+		rcu_assign_pointer(kfunc_mds, new_mds);
+		mds = new_mds;
+	}
+
+	if (likely(kfunc_md_used < kfunc_md_count)) {
+		/* maybe we can manage the used function metadata entry
+		 * with a bit map ?
+		 */
+		for (i = 0; i < kfunc_md_count; i++) {
+			if (!mds[i].users) {
+				kfunc_md_used++;
+				*index = i;
+				mds[i].users++;
+				return mds + i;
+			}
+		}
+	}
+
+	order = kfunc_md_page_order();
+	/* no available function metadata, so allocate a bigger function
+	 * metadata array.
+	 */
+	new_mds = (void *)__get_free_pages(GFP_KERNEL, order + 1);
+	if (!new_mds)
+		return NULL;
+
+	memcpy(new_mds, mds, kfunc_md_count * sizeof(*new_mds));
+	kfunc_md_init(new_mds, kfunc_md_count, kfunc_md_count * 2);
+
+	rcu_assign_pointer(kfunc_mds, new_mds);
+	synchronize_rcu();
+	free_pages((u64)mds, order);
+
+	mds = new_mds + kfunc_md_count;
+	*index = kfunc_md_count;
+	kfunc_md_count <<= 1;
+	kfunc_md_used++;
+	mds->users++;
+
+	return mds;
+}
+
+static int kfunc_md_text_poke(void *ip, void *insn, void *nop)
+{
+	void *target;
+	int ret = 0;
+	u8 *prog;
+
+	target = ip - KFUNC_MD_INSN_OFFSET;
+	mutex_lock(&text_mutex);
+	if (insn) {
+		if (!memcmp(target, insn, KFUNC_MD_INSN_SIZE))
+			goto out;
+
+		if (memcmp(target, nop, KFUNC_MD_INSN_SIZE)) {
+			ret = -EBUSY;
+			goto out;
+		}
+		prog = insn;
+	} else {
+		if (!memcmp(target, nop, KFUNC_MD_INSN_SIZE))
+			goto out;
+		prog = nop;
+	}
+
+	ret = kfunc_md_arch_poke(target, prog);
+out:
+	mutex_unlock(&text_mutex);
+	return ret;
+}
+
+static bool __kfunc_md_put(struct kfunc_md *md)
+{
+	u8 nop_insn[KFUNC_MD_INSN_SIZE];
+
+	if (WARN_ON_ONCE(md->users <= 0))
+		return false;
+
+	md->users--;
+	if (md->users > 0)
+		return false;
+
+	if (!kfunc_md_arch_exist(md->func))
+		return false;
+
+	kfunc_md_arch_nops(nop_insn);
+	/* release the metadata by recovering the function padding to NOPS */
+	kfunc_md_text_poke(md->func, NULL, nop_insn);
+	/* TODO: we need a way to shrink the array "kfunc_mds" */
+	kfunc_md_used--;
+
+	return true;
+}
+
+/* Decrease the reference of the md, release it if "md->users <= 0" */
+void kfunc_md_put(struct kfunc_md *md)
+{
+	mutex_lock(&kfunc_md_mutex);
+	__kfunc_md_put(md);
+	mutex_unlock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_put);
+
+/* Get an existing metadata entry by the function address; NULL is
+ * returned if none exists.
+ *
+ * NOTE: rcu lock should be held during reading the metadata, and
+ * kfunc_md_lock should be held if writing happens.
+ */
+struct kfunc_md *kfunc_md_find(void *ip)
+{
+	struct kfunc_md *md;
+	u32 index;
+
+	if (kfunc_md_arch_exist(ip)) {
+		index = kfunc_md_get_index(ip);
+		if (WARN_ON_ONCE(index >= kfunc_md_count))
+			return NULL;
+
+		md = rcu_dereference(kfunc_mds) + index;
+		return md;
+	}
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(kfunc_md_find);
+
+void kfunc_md_put_by_ip(void *ip)
+{
+	struct kfunc_md *md;
+
+	mutex_lock(&kfunc_md_mutex);
+	md = kfunc_md_find(ip);
+	if (md)
+		__kfunc_md_put(md);
+	mutex_unlock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_put_by_ip);
+
+/* Get an existing metadata entry by the function address, creating one
+ * if none exists. The metadata's reference count is increased by 1.
+ *
+ * NOTE: always call this function with kfunc_md_lock held, and all
+ * updating to metadata should also hold the kfunc_md_lock.
+ */
+struct kfunc_md *kfunc_md_get(void *ip)
+{
+	u8 nop_insn[KFUNC_MD_INSN_SIZE], insn[KFUNC_MD_INSN_SIZE];
+	struct kfunc_md *md;
+	u32 index;
+
+	md = kfunc_md_find(ip);
+	if (md) {
+		md->users++;
+		return md;
+	}
+
+	md = kfunc_md_get_next(&index);
+	if (!md)
+		return NULL;
+
+	kfunc_md_arch_pretend(insn, index);
+	kfunc_md_arch_nops(nop_insn);
+
+	if (kfunc_md_text_poke(ip, insn, nop_insn)) {
+		kfunc_md_used--;
+		md->users = 0;
+		return NULL;
+	}
+	md->func = ip;
+
+	return md;
+}
+EXPORT_SYMBOL_GPL(kfunc_md_get);
-- 
2.39.5



* [PATCH bpf-next v3 3/4] x86: implement per-function metadata storage for x86
  2025-03-03  6:53 [PATCH bpf-next v3 0/4] per-function storage support Menglong Dong
  2025-03-03  6:53 ` [PATCH bpf-next v3 1/4] x86/ibt: factor out cfi and fineibt offset Menglong Dong
  2025-03-03  6:53 ` [PATCH bpf-next v3 2/4] add per-function metadata storage support Menglong Dong
@ 2025-03-03  6:53 ` Menglong Dong
  2025-03-03 16:05   ` Steven Rostedt
  2025-03-03  6:53 ` [PATCH bpf-next v3 4/4] arm64: implement per-function metadata storage for arm64 Menglong Dong
  3 siblings, 1 reply; 11+ messages in thread
From: Menglong Dong @ 2025-03-03  6:53 UTC (permalink / raw)
  To: peterz, rostedt, mark.rutland, alexei.starovoitov
  Cc: catalin.marinas, will, mhiramat, tglx, mingo, bp, dave.hansen,
	x86, hpa, ast, daniel, andrii, martin.lau, eddyz87, yonghong.song,
	john.fastabend, kpsingh, sdf, jolsa, davem, dsahern,
	mathieu.desnoyers, nathan, nick.desaulniers+lkml, morbo,
	samitolvanen, kees, dongml2, akpm, riel, rppt, linux-arm-kernel,
	linux-kernel, linux-trace-kernel, bpf, netdev, llvm

With CONFIG_CALL_PADDING enabled, there is a 16-byte padding space
before every kernel function, and some kernel features can use it, such
as MITIGATION_CALL_DEPTH_TRACKING, CFI_CLANG, FINEIBT, etc.

In my research, MITIGATION_CALL_DEPTH_TRACKING consumes the tail
9 bytes of the function padding, CFI_CLANG consumes the head 5 bytes,
and FINEIBT consumes all 16 bytes if it is enabled. So there is no
space left for us if MITIGATION_CALL_DEPTH_TRACKING and CFI_CLANG are
both enabled, or if FINEIBT is enabled.

On x86, we need 5 bytes to prepend a "mov $xxx, %eax" insn, which can
hold a 4-byte index. So we have the following logic:

1. use the head 5 bytes if CFI_CLANG is not enabled
2. use the tail 5 bytes if MITIGATION_CALL_DEPTH_TRACKING and FINEIBT
   are not enabled
3. compile the kernel with FUNCTION_ALIGNMENT_32B otherwise

In the third case, we make the kernel functions 32-byte aligned, and
there will be 32 bytes of padding before the functions. According to my
testing, the text size didn't increase in this case, which is weird.

With 16-bytes padding:

-rwxr-xr-x 1 401190688  x86-dev/vmlinux*
-rw-r--r-- 1    251068  x86-dev/vmlinux.a
-rw-r--r-- 1 851892992  x86-dev/vmlinux.o
-rw-r--r-- 1  12395008  x86-dev/arch/x86/boot/bzImage

With 32-bytes padding:

-rwxr-xr-x 1 401318128 x86-dev/vmlinux*
-rw-r--r-- 1    251154 x86-dev/vmlinux.a
-rw-r--r-- 1 853636704 x86-dev/vmlinux.o
-rw-r--r-- 1  12509696 x86-dev/arch/x86/boot/bzImage

The way I tested should be right, and this is good news for us. In the
third case, the layout of the padding space will be like this if
fineibt is enabled:

__cfi_func:
	mov	--	5	-- cfi, not used anymore
	nop
	nop
	nop
	mov	--	5	-- function metadata
	nop
	nop
	nop
	fineibt	--	16	-- fineibt
func:
	nopw	--	4
	......
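The non-overlap of the regions in this layout can be sanity-checked
with a small sketch (assuming the 32-byte padding of case 3; the
offset arithmetic mirrors KFUNC_MD_INSN_OFFSET =
CONFIG_FUNCTION_PADDING_BYTES - 8 from the patch, with made-up macro
names):

```c
#include <assert.h>

/* Sketch: with 32 bytes of padding before func, the 5-byte metadata
 * insn sits at func - (PADDING - 8), i.e. padding bytes [8, 13),
 * between the kCFI mov (head 5 bytes) and the FineIBT preamble
 * (last 16 bytes). Values taken from the patch, not derived here.
 */
#define PADDING		32	/* CONFIG_FUNCTION_PADDING_BYTES */
#define MD_INSN_OFFSET	(PADDING - 8)
#define MD_INSN_SIZE	5
#define CFI_MOV_SIZE	5	/* head: the __cfi_\func mov */
#define FINEIBT_SIZE	16	/* tail: the fineibt preamble */

/* byte index of the metadata insn within the padding */
#define MD_START	(PADDING - MD_INSN_OFFSET)

static void check_layout(void)
{
	/* the metadata insn starts after the cfi mov ... */
	assert(MD_START >= CFI_MOV_SIZE);
	/* ... and ends before the fineibt preamble */
	assert(MD_START + MD_INSN_SIZE <= PADDING - FINEIBT_SIZE);
}
```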

I tested fineibt with the "cfi=fineibt" cmdline, and it works well
together with FUNCTION_METADATA enabled. I also tested the performance
of this feature by setting metadata for all kernel functions, and it
consumes 0.7s for 70k+ functions, not bad :/

I can't find a machine that supports IBT, so I didn't test IBT. I'd
appreciate it if someone could do this testing for me :/

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
v3:
- select FUNCTION_ALIGNMENT_32B on case3, instead of extra 5-bytes
---
 arch/x86/Kconfig              | 18 ++++++++++++
 arch/x86/include/asm/ftrace.h | 54 +++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index be2c311f5118..fe5a98401135 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2509,6 +2509,24 @@ config PREFIX_SYMBOLS
 	def_bool y
 	depends on CALL_PADDING && !CFI_CLANG
 
+config FUNCTION_METADATA
+	bool "Per-function metadata storage support"
+	default y
+	depends on CC_HAS_ENTRY_PADDING && OBJTOOL
+	select CALL_PADDING
+	select FUNCTION_ALIGNMENT_32B if ((CFI_CLANG && CALL_THUNKS) || FINEIBT)
+	help
+	  Support per-function metadata storage for kernel functions, and
+	  get the metadata of the function by its address with almost no
+	  overhead.
+
+	  The index of the metadata will be stored in the function padding
+	  and consumes 5-bytes. FUNCTION_ALIGNMENT_32B will be selected if
+	  "(CFI_CLANG && CALL_THUNKS) || FINEIBT" to make sure there is
+	  enough available padding space for this function. However, it
+	  seems that the text size almost don't change, compare with
+	  FUNCTION_ALIGNMENT_16B.
+
 menuconfig CPU_MITIGATIONS
 	bool "Mitigations for CPU vulnerabilities"
 	default y
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index f9cb4d07df58..d5cbb8e18fd7 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -4,6 +4,28 @@
 
 #include <asm/ptrace.h>
 
+#ifdef CONFIG_FUNCTION_METADATA
+#if (defined(CONFIG_CFI_CLANG) && defined(CONFIG_CALL_THUNKS)) || (defined(CONFIG_FINEIBT))
+  /* the CONFIG_FUNCTION_PADDING_BYTES is 32 in this case, use the
+   * range: [align + 8, align + 13].
+   */
+  #define KFUNC_MD_INSN_OFFSET		(CONFIG_FUNCTION_PADDING_BYTES - 8)
+  #define KFUNC_MD_DATA_OFFSET		(CONFIG_FUNCTION_PADDING_BYTES - 9)
+#else
+  #ifdef CONFIG_CFI_CLANG
+    /* use the space that CALL_THUNKS suppose to use */
+    #define KFUNC_MD_INSN_OFFSET	(5)
+    #define KFUNC_MD_DATA_OFFSET	(4)
+  #else
+    /* use the space that CFI_CLANG suppose to use */
+    #define KFUNC_MD_INSN_OFFSET	(CONFIG_FUNCTION_PADDING_BYTES)
+    #define KFUNC_MD_DATA_OFFSET	(CONFIG_FUNCTION_PADDING_BYTES - 1)
+  #endif
+#endif
+
+#define KFUNC_MD_INSN_SIZE		(5)
+#endif
+
 #ifdef CONFIG_FUNCTION_TRACER
 #ifndef CC_USING_FENTRY
 # error Compiler does not support fentry?
@@ -168,4 +190,36 @@ static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
 #endif /* !COMPILE_OFFSETS */
 #endif /* !__ASSEMBLY__ */
 
+#if !defined(__ASSEMBLY__) && defined(CONFIG_FUNCTION_METADATA)
+#include <asm/text-patching.h>
+
+static inline bool kfunc_md_arch_exist(void *ip)
+{
+	return *(u8 *)(ip - KFUNC_MD_INSN_OFFSET) == 0xB8;
+}
+
+static inline void kfunc_md_arch_pretend(u8 *insn, u32 index)
+{
+	*insn = 0xB8;
+	*(u32 *)(insn + 1) = index;
+}
+
+static inline void kfunc_md_arch_nops(u8 *insn)
+{
+	*(insn++) = BYTES_NOP1;
+	*(insn++) = BYTES_NOP1;
+	*(insn++) = BYTES_NOP1;
+	*(insn++) = BYTES_NOP1;
+	*(insn++) = BYTES_NOP1;
+}
+
+static inline int kfunc_md_arch_poke(void *ip, u8 *insn)
+{
+	text_poke(ip, insn, KFUNC_MD_INSN_SIZE);
+	text_poke_sync();
+	return 0;
+}
+
+#endif
+
 #endif /* _ASM_X86_FTRACE_H */
-- 
2.39.5



* [PATCH bpf-next v3 4/4] arm64: implement per-function metadata storage for arm64
  2025-03-03  6:53 [PATCH bpf-next v3 0/4] per-function storage support Menglong Dong
                   ` (2 preceding siblings ...)
  2025-03-03  6:53 ` [PATCH bpf-next v3 3/4] x86: implement per-function metadata storage for x86 Menglong Dong
@ 2025-03-03  6:53 ` Menglong Dong
  3 siblings, 0 replies; 11+ messages in thread
From: Menglong Dong @ 2025-03-03  6:53 UTC (permalink / raw)
  To: peterz, rostedt, mark.rutland, alexei.starovoitov
  Cc: catalin.marinas, will, mhiramat, tglx, mingo, bp, dave.hansen,
	x86, hpa, ast, daniel, andrii, martin.lau, eddyz87, yonghong.song,
	john.fastabend, kpsingh, sdf, jolsa, davem, dsahern,
	mathieu.desnoyers, nathan, nick.desaulniers+lkml, morbo,
	samitolvanen, kees, dongml2, akpm, riel, rppt, linux-arm-kernel,
	linux-kernel, linux-trace-kernel, bpf, netdev, llvm

The per-function metadata storage is already used by ftrace if
CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS is enabled, and it stores the
pointer of the callback directly in the function padding, which
consumes 8 bytes, since commit
baaf553d3bc3 ("arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS").
So we can directly store the index in the function padding too, without
prepending an insn. With CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS enabled,
functions are 8-byte aligned, and we compile the kernel with 8 extra
bytes (2 NOPs) of padding space. Otherwise, functions are 4-byte
aligned, and only 4 extra bytes (1 NOP) are needed.
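Sketched in user space (0xd503201f is the real AArch64 NOP encoding;
the helper names are made up), the "is there metadata?" test on arm64
reduces to "is the padding word still the NOP it was compiled with?":

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch: on arm64 the 4-byte index is stored raw in a
 * padding slot, with no mov insn prepended. A slot that still holds
 * the AArch64 NOP encoding (0xd503201f) means "no metadata"; any
 * other value is the index itself. This works because valid indexes
 * stay far below the NOP encoding's value.
 */
#define AARCH64_NOP	0xd503201fu

static int slot_has_md(uint32_t slot)
{
	return slot != AARCH64_NOP;
}

static uint32_t slot_index(uint32_t slot)
{
	/* the slot holds the index directly, no insn decoding needed */
	return slot;
}
```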

However, we have the same problem that Mark hit in the commit above: we
can't use the function padding together with CFI_CLANG, which can make
clang compute a wrong offset to the pre-function type hash. He said he
was working with others on this problem 2 years ago. Hi Mark, is there
any progress on this problem?

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 arch/arm64/Kconfig              | 15 +++++++++++++++
 arch/arm64/Makefile             | 23 ++++++++++++++++++++--
 arch/arm64/include/asm/ftrace.h | 34 +++++++++++++++++++++++++++++++++
 arch/arm64/kernel/ftrace.c      | 13 +++++++++++--
 4 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 940343beb3d4..7ed80f5eb267 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1536,6 +1536,21 @@ config NODES_SHIFT
 	  Specify the maximum number of NUMA Nodes available on the target
 	  system.  Increases memory reserved to accommodate various tables.
 
+config FUNCTION_METADATA
+	bool "Per-function metadata storage support"
+	default y
+	select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE if !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
+	depends on !CFI_CLANG
+	help
+	  Support per-function metadata storage for kernel functions, and
+	  get the metadata of the function by its address with almost no
+	  overhead.
+
+	  The index of the metadata will be stored in the function padding,
+	  which will consume 4-bytes. If FUNCTION_ALIGNMENT_8B is enabled,
+	  extra 8-bytes function padding will be reserved during compiling.
+	  Otherwise, only extra 4-bytes function padding is needed.
+
 source "kernel/Kconfig.hz"
 
 config ARCH_SPARSEMEM_ENABLE
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 2b25d671365f..2df2b0f4dd90 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -144,12 +144,31 @@ endif
 
 CHECKFLAGS	+= -D__aarch64__
 
+ifeq ($(CONFIG_FUNCTION_METADATA),y)
+  ifeq ($(CONFIG_FUNCTION_ALIGNMENT_8B),y)
+  __padding_nops := 2
+  else
+  __padding_nops := 1
+  endif
+else
+  __padding_nops := 0
+endif
+
 ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS),y)
+  __padding_nops  := $(shell echo $(__padding_nops) + 2 | bc)
   KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
-  CC_FLAGS_FTRACE := -fpatchable-function-entry=4,2
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=$(shell echo $(__padding_nops) + 2 | bc),$(__padding_nops)
 else ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_ARGS),y)
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=$(shell echo $(__padding_nops) + 2 | bc),$(__padding_nops)
   KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
-  CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+else ifeq ($(CONFIG_FUNCTION_METADATA),y)
+  CC_FLAGS_FTRACE += -fpatchable-function-entry=$(__padding_nops),$(__padding_nops)
+  ifneq ($(CONFIG_FUNCTION_TRACER),y)
+    KBUILD_CFLAGS += $(CC_FLAGS_FTRACE)
+    # some files need to remove this cflag when CONFIG_FUNCTION_TRACER
+    # is not enabled, so export it here
+    export CC_FLAGS_FTRACE
+  endif
 endif
 
 ifeq ($(CONFIG_KASAN_SW_TAGS), y)
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index bfe3ce9df197..aa3eaa91bf82 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -24,6 +24,16 @@
 #define FTRACE_PLT_IDX		0
 #define NR_FTRACE_PLTS		1
 
+#ifdef CONFIG_FUNCTION_METADATA
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+#define KFUNC_MD_DATA_OFFSET	(AARCH64_INSN_SIZE * 3)
+#else
+#define KFUNC_MD_DATA_OFFSET	AARCH64_INSN_SIZE
+#endif
+#define KFUNC_MD_INSN_SIZE	AARCH64_INSN_SIZE
+#define KFUNC_MD_INSN_OFFSET	KFUNC_MD_DATA_OFFSET
+#endif
+
 /*
  * Currently, gcc tends to save the link register after the local variables
  * on the stack. This causes the max stack tracer to report the function
@@ -216,6 +226,30 @@ static inline bool arch_syscall_match_sym_name(const char *sym,
 	 */
 	return !strcmp(sym + 8, name);
 }
+
+#ifdef CONFIG_FUNCTION_METADATA
+#include <asm/text-patching.h>
+
+static inline bool kfunc_md_arch_exist(void *ip)
+{
+	return !aarch64_insn_is_nop(*(u32 *)(ip - KFUNC_MD_INSN_OFFSET));
+}
+
+static inline void kfunc_md_arch_pretend(u8 *insn, u32 index)
+{
+	*(u32 *)insn = index;
+}
+
+static inline void kfunc_md_arch_nops(u8 *insn)
+{
+	*(u32 *)insn = aarch64_insn_gen_nop();
+}
+
+static inline int kfunc_md_arch_poke(void *ip, u8 *insn)
+{
+	return aarch64_insn_patch_text_nosync(ip, *(u32 *)insn);
+}
+#endif
 #endif /* ifndef __ASSEMBLY__ */
 
 #ifndef __ASSEMBLY__
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index d7c0d023dfe5..4191ff0037f5 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -88,8 +88,10 @@ unsigned long ftrace_call_adjust(unsigned long addr)
 	 * to `BL <caller>`, which is at `addr + 4` bytes in either case.
 	 *
 	 */
-	if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
-		return addr + AARCH64_INSN_SIZE;
+	if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS)) {
+		addr += AARCH64_INSN_SIZE;
+		goto out;
+	}
 
 	/*
 	 * When using patchable-function-entry with pre-function NOPs, addr is
@@ -139,6 +141,13 @@ unsigned long ftrace_call_adjust(unsigned long addr)
 
 	/* Skip the first NOP after function entry */
 	addr += AARCH64_INSN_SIZE;
+out:
+	if (IS_ENABLED(CONFIG_FUNCTION_METADATA)) {
+		if (IS_ENABLED(CONFIG_FUNCTION_ALIGNMENT_8B))
+			addr += 2 * AARCH64_INSN_SIZE;
+		else
+			addr += AARCH64_INSN_SIZE;
+	}
 
 	return addr;
 }
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v3 1/4] x86/ibt: factor out cfi and fineibt offset
  2025-03-03  6:53 ` [PATCH bpf-next v3 1/4] x86/ibt: factor out cfi and fineibt offset Menglong Dong
@ 2025-03-03  9:18   ` Peter Zijlstra
  2025-03-03 10:51     ` Menglong Dong
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2025-03-03  9:18 UTC (permalink / raw)
  To: Menglong Dong
  Cc: rostedt, mark.rutland, alexei.starovoitov, catalin.marinas, will,
	mhiramat, tglx, mingo, bp, dave.hansen, x86, hpa, ast, daniel,
	andrii, martin.lau, eddyz87, yonghong.song, john.fastabend,
	kpsingh, sdf, jolsa, davem, dsahern, mathieu.desnoyers, nathan,
	nick.desaulniers+lkml, morbo, samitolvanen, kees, dongml2, akpm,
	riel, rppt, linux-arm-kernel, linux-kernel, linux-trace-kernel,
	bpf, netdev, llvm

On Mon, Mar 03, 2025 at 02:53:42PM +0800, Menglong Dong wrote:
> index c71b575bf229..ad050d09cb2b 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -908,7 +908,7 @@ void __init_or_module noinline apply_seal_endbr(s32 *start, s32 *end, struct mod
>  
>  		poison_endbr(addr, wr_addr, true);
>  		if (IS_ENABLED(CONFIG_FINEIBT))
> -			poison_cfi(addr - 16, wr_addr - 16);
> +			poison_cfi(addr, wr_addr);
>  	}
>  }

If you're touching this code, please use tip/x86/core or tip/master.


* Re: [PATCH bpf-next v3 1/4] x86/ibt: factor out cfi and fineibt offset
  2025-03-03  9:18   ` Peter Zijlstra
@ 2025-03-03 10:51     ` Menglong Dong
  2025-03-03 11:24       ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Menglong Dong @ 2025-03-03 10:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rostedt, mark.rutland, alexei.starovoitov, catalin.marinas, will,
	mhiramat, tglx, mingo, bp, dave.hansen, x86, hpa, ast, daniel,
	andrii, martin.lau, eddyz87, yonghong.song, john.fastabend,
	kpsingh, sdf, jolsa, davem, dsahern, mathieu.desnoyers, nathan,
	nick.desaulniers+lkml, morbo, samitolvanen, kees, dongml2, akpm,
	riel, rppt, linux-arm-kernel, linux-kernel, linux-trace-kernel,
	bpf, netdev, llvm

On Mon, Mar 3, 2025 at 5:18 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Mar 03, 2025 at 02:53:42PM +0800, Menglong Dong wrote:
> > index c71b575bf229..ad050d09cb2b 100644
> > --- a/arch/x86/kernel/alternative.c
> > +++ b/arch/x86/kernel/alternative.c
> > @@ -908,7 +908,7 @@ void __init_or_module noinline apply_seal_endbr(s32 *start, s32 *end, struct mod
> >
> >               poison_endbr(addr, wr_addr, true);
> >               if (IS_ENABLED(CONFIG_FINEIBT))
> > -                     poison_cfi(addr - 16, wr_addr - 16);
> > +                     poison_cfi(addr, wr_addr);
> >       }
> >  }
>
> If you're touching this code, please use tip/x86/core or tip/master.

Thank you for reminding me of that. I was using linux-next, and
I noticed that you just did some optimizations to FineIBT :/

I'll send a V4 later, based on the tip/x86/core.

Thanks!
Menglong Dong


* Re: [PATCH bpf-next v3 1/4] x86/ibt: factor out cfi and fineibt offset
  2025-03-03 10:51     ` Menglong Dong
@ 2025-03-03 11:24       ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2025-03-03 11:24 UTC (permalink / raw)
  To: Menglong Dong
  Cc: rostedt, mark.rutland, alexei.starovoitov, catalin.marinas, will,
	mhiramat, tglx, mingo, bp, dave.hansen, x86, hpa, ast, daniel,
	andrii, martin.lau, eddyz87, yonghong.song, john.fastabend,
	kpsingh, sdf, jolsa, davem, dsahern, mathieu.desnoyers, nathan,
	nick.desaulniers+lkml, morbo, samitolvanen, kees, dongml2, akpm,
	riel, rppt, linux-arm-kernel, linux-kernel, linux-trace-kernel,
	bpf, netdev, llvm

On Mon, Mar 03, 2025 at 06:51:41PM +0800, Menglong Dong wrote:
> On Mon, Mar 3, 2025 at 5:18 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Mar 03, 2025 at 02:53:42PM +0800, Menglong Dong wrote:
> > > index c71b575bf229..ad050d09cb2b 100644
> > > --- a/arch/x86/kernel/alternative.c
> > > +++ b/arch/x86/kernel/alternative.c
> > > @@ -908,7 +908,7 @@ void __init_or_module noinline apply_seal_endbr(s32 *start, s32 *end, struct mod
> > >
> > >               poison_endbr(addr, wr_addr, true);
> > >               if (IS_ENABLED(CONFIG_FINEIBT))
> > > -                     poison_cfi(addr - 16, wr_addr - 16);
> > > +                     poison_cfi(addr, wr_addr);
> > >       }
> > >  }
> >
> > If you're touching this code, please use tip/x86/core or tip/master.
> 
> Thank you for reminding me of that. I was using linux-next, and

That must've been a very old -next, because that wr_addr crap has been
gone for a while now.

Anyway, thanks for moving to a newer tree!


* Re: [PATCH bpf-next v3 3/4] x86: implement per-function metadata storage for x86
  2025-03-03  6:53 ` [PATCH bpf-next v3 3/4] x86: implement per-function metadata storage for x86 Menglong Dong
@ 2025-03-03 16:05   ` Steven Rostedt
  2025-03-04  2:07     ` Menglong Dong
  0 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2025-03-03 16:05 UTC (permalink / raw)
  To: Menglong Dong
  Cc: peterz, mark.rutland, alexei.starovoitov, catalin.marinas, will,
	mhiramat, tglx, mingo, bp, dave.hansen, x86, hpa, ast, daniel,
	andrii, martin.lau, eddyz87, yonghong.song, john.fastabend,
	kpsingh, sdf, jolsa, davem, dsahern, mathieu.desnoyers, nathan,
	nick.desaulniers+lkml, morbo, samitolvanen, kees, dongml2, akpm,
	riel, rppt, linux-arm-kernel, linux-kernel, linux-trace-kernel,
	bpf, netdev, llvm

On Mon,  3 Mar 2025 14:53:44 +0800
Menglong Dong <menglong8.dong@gmail.com> wrote:

> In the third case, we make the kernel functions 32-byte aligned, and there
> will be 32 bytes of padding before each function. According to my testing,
> the text size didn't increase in this case, which is weird.
> 
> With 16-bytes padding:
> 
> -rwxr-xr-x 1 401190688  x86-dev/vmlinux*
> -rw-r--r-- 1    251068  x86-dev/vmlinux.a
> -rw-r--r-- 1 851892992  x86-dev/vmlinux.o
> -rw-r--r-- 1  12395008  x86-dev/arch/x86/boot/bzImage
> 
> With 32-bytes padding:
> 
> -rwxr-xr-x 1 401318128 x86-dev/vmlinux*
> -rw-r--r-- 1    251154 x86-dev/vmlinux.a
> -rw-r--r-- 1 853636704 x86-dev/vmlinux.o
> -rw-r--r-- 1  12509696 x86-dev/arch/x86/boot/bzImage

Use the "size" command to see the differences in sizes and not the file size.

$ size vmlinux
   text    data     bss     dec     hex filename
36892658        9798658 16982016        63673332        3cb93f4 vmlinux

-- Steve


* Re: [PATCH bpf-next v3 3/4] x86: implement per-function metadata storage for x86
  2025-03-03 16:05   ` Steven Rostedt
@ 2025-03-04  2:07     ` Menglong Dong
  2025-03-04  3:06       ` Menglong Dong
  0 siblings, 1 reply; 11+ messages in thread
From: Menglong Dong @ 2025-03-04  2:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: peterz, mark.rutland, alexei.starovoitov, catalin.marinas, will,
	mhiramat, tglx, mingo, bp, dave.hansen, x86, hpa, ast, daniel,
	andrii, martin.lau, eddyz87, yonghong.song, john.fastabend,
	kpsingh, sdf, jolsa, davem, dsahern, mathieu.desnoyers, nathan,
	nick.desaulniers+lkml, morbo, samitolvanen, kees, dongml2, akpm,
	riel, rppt, linux-arm-kernel, linux-kernel, linux-trace-kernel,
	bpf, netdev, llvm

On Tue, Mar 4, 2025 at 12:05 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon,  3 Mar 2025 14:53:44 +0800
> Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> > In the third case, we make the kernel functions 32-byte aligned, and there
> > will be 32 bytes of padding before each function. According to my testing,
> > the text size didn't increase in this case, which is weird.
> >
> > With 16-bytes padding:
> >
> > -rwxr-xr-x 1 401190688  x86-dev/vmlinux*
> > -rw-r--r-- 1    251068  x86-dev/vmlinux.a
> > -rw-r--r-- 1 851892992  x86-dev/vmlinux.o
> > -rw-r--r-- 1  12395008  x86-dev/arch/x86/boot/bzImage
> >
> > With 32-bytes padding:
> >
> > -rwxr-xr-x 1 401318128 x86-dev/vmlinux*
> > -rw-r--r-- 1    251154 x86-dev/vmlinux.a
> > -rw-r--r-- 1 853636704 x86-dev/vmlinux.o
> > -rw-r--r-- 1  12509696 x86-dev/arch/x86/boot/bzImage
>
> Use the "size" command to see the differences in sizes and not the file size.
>
> $ size vmlinux
>    text    data     bss     dec     hex filename
> 36892658        9798658 16982016        63673332        3cb93f4 vmlinux

Great! It seems that something was wrong with the way I tested. I'll
compare the text sizes with the "size" command later.

Thanks!
Menglong Dong

>
> -- Steve


* Re: [PATCH bpf-next v3 3/4] x86: implement per-function metadata storage for x86
  2025-03-04  2:07     ` Menglong Dong
@ 2025-03-04  3:06       ` Menglong Dong
  0 siblings, 0 replies; 11+ messages in thread
From: Menglong Dong @ 2025-03-04  3:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: peterz, mark.rutland, alexei.starovoitov, catalin.marinas, will,
	mhiramat, tglx, mingo, bp, dave.hansen, x86, hpa, ast, daniel,
	andrii, martin.lau, eddyz87, yonghong.song, john.fastabend,
	kpsingh, sdf, jolsa, davem, dsahern, mathieu.desnoyers, nathan,
	nick.desaulniers+lkml, morbo, samitolvanen, kees, dongml2, akpm,
	riel, rppt, linux-arm-kernel, linux-kernel, linux-trace-kernel,
	bpf, netdev, llvm

On Tue, Mar 4, 2025 at 10:07 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> On Tue, Mar 4, 2025 at 12:05 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Mon,  3 Mar 2025 14:53:44 +0800
> > Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > > In the third case, we make the kernel functions 32-byte aligned, and there
> > > will be 32 bytes of padding before each function. According to my testing,
> > > the text size didn't increase in this case, which is weird.
> > >
> > > With 16-bytes padding:
> > >
> > > -rwxr-xr-x 1 401190688  x86-dev/vmlinux*
> > > -rw-r--r-- 1    251068  x86-dev/vmlinux.a
> > > -rw-r--r-- 1 851892992  x86-dev/vmlinux.o
> > > -rw-r--r-- 1  12395008  x86-dev/arch/x86/boot/bzImage
> > >
> > > With 32-bytes padding:
> > >
> > > -rwxr-xr-x 1 401318128 x86-dev/vmlinux*
> > > -rw-r--r-- 1    251154 x86-dev/vmlinux.a
> > > -rw-r--r-- 1 853636704 x86-dev/vmlinux.o
> > > -rw-r--r-- 1  12509696 x86-dev/arch/x86/boot/bzImage
> >
> > Use the "size" command to see the differences in sizes and not the file size.
> >
> > $ size vmlinux
> >    text    data     bss     dec     hex filename
> > 36892658        9798658 16982016        63673332        3cb93f4 vmlinux
>
> Great! It seems that something was wrong with the way I tested. I'll
> compare the text sizes with the "size" command later.

With the size command, the text size with 32-bytes padding is:

  text    data     bss     dec     hex filename
48299471        14776173        18345936        81421580         4da650c x86-dev/vmlinux

And with 16-bytes padding is:

  text    data     bss     dec     hex filename
46620640        14772017        18458396        79851053         4c26e2d x86-dev/vmlinux

It increases the text size by about 3.6%, which I think is acceptable in this case.

I'll include these numbers in the commit log of the next version.

Thanks!
Menglong Dong

>
> Thanks!
> Menglong Dong
>
> >
> > -- Steve


end of thread, other threads:[~2025-03-04  3:08 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz, follow: Atom feed)
-- links below jump to the message on this page --
2025-03-03  6:53 [PATCH bpf-next v3 0/4] per-function storage support Menglong Dong
2025-03-03  6:53 ` [PATCH bpf-next v3 1/4] x86/ibt: factor out cfi and fineibt offset Menglong Dong
2025-03-03  9:18   ` Peter Zijlstra
2025-03-03 10:51     ` Menglong Dong
2025-03-03 11:24       ` Peter Zijlstra
2025-03-03  6:53 ` [PATCH bpf-next v3 2/4] add per-function metadata storage support Menglong Dong
2025-03-03  6:53 ` [PATCH bpf-next v3 3/4] x86: implement per-function metadata storage for x86 Menglong Dong
2025-03-03 16:05   ` Steven Rostedt
2025-03-04  2:07     ` Menglong Dong
2025-03-04  3:06       ` Menglong Dong
2025-03-03  6:53 ` [PATCH bpf-next v3 4/4] arm64: implement per-function metadata storage for arm64 Menglong Dong
