BPF List
* [PATCH bpf-next v3 0/3] bpf: inline bpf_kptr_xchg()
@ 2024-01-05 10:48 Hou Tao
  2024-01-05 10:48 ` [PATCH bpf-next v3 1/3] bpf: Support inlining bpf_kptr_xchg() helper Hou Tao
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Hou Tao @ 2024-01-05 10:48 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Eduard Zingerman,
	houtao1

From: Hou Tao <houtao1@huawei.com>

Hi,

The motivation for inlining bpf_kptr_xchg() comes from the performance
profiling of the bpf memory allocator benchmark [1]. The benchmark uses
bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
objects for free. After inlining bpf_kptr_xchg(), the performance of
object free on an 8-CPU VM increases by about 2%~10%. However, the
performance gain comes at a cost: both the kasan and kcsan checks on the
pointer become unavailable. Initially the inlining was implemented in
do_jit() for x86-64 directly, but it is more portable to implement the
inlining in the verifier.

Patch #1 supports inlining the bpf_kptr_xchg() helper and enables it on
x86-64. Patch #2 factors out a helper for the newly-added test in patch
#3. Patch #3 tests whether the inlining of bpf_kptr_xchg() works as
expected. Please see the individual patches for more details. Comments
are always welcome.
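
For reference, the expected effect of the inlining on the xlated program
is sketched below (based on the comment in patch #3; the map id and
helper id are elided):

  /* before */
  r1 = map[id:xxx][0]+0
  r2 = 0
  call bpf_kptr_xchg#yyy

  /* after inlining the call */
  r1 = map[id:xxx][0]+0
  r2 = 0
  r0 = r2
  r0 = atomic64_xchg((u64 *)(r1 +0), r0)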

Change Log:
v3:
  * rebased on bpf-next tree
  * patch 1 & 2: add Reviewed-by and Acked-by tags from Eduard
  * patch 3: use inline assembly and a naked function instead of C code
             (suggested by Eduard)

v2: https://lore.kernel.org/bpf/20231223104042.1432300-1-houtao@huaweicloud.com/
  * rebased on bpf-next tree
  * drop patch #1 in v1 due to discussion in [2]
  * patch #1: add the motivation in the commit message, merge patch #1
              and #3 into the new patch in v2. (Daniel)
  * patch #2/#3: newly-added patch to test the inlining of
                 bpf_kptr_xchg() (Eduard)

v1: https://lore.kernel.org/bpf/95b8c2cd-44d5-5fe1-60b5-7e8218779566@huaweicloud.com/

[1]: https://lore.kernel.org/bpf/20231221141501.3588586-1-houtao@huaweicloud.com/
[2]: https://lore.kernel.org/bpf/fd94efb9-4a56-c982-dc2e-c66be5202cb7@huaweicloud.com/

Hou Tao (3):
  bpf: Support inlining bpf_kptr_xchg() helper
  selftests/bpf: Factor out get_xlated_program() helper
  selftests/bpf: Test the inlining of bpf_kptr_xchg()

 arch/x86/net/bpf_jit_comp.c                   |  5 ++
 include/linux/filter.h                        |  1 +
 kernel/bpf/core.c                             | 10 ++++
 kernel/bpf/helpers.c                          |  1 +
 kernel/bpf/verifier.c                         | 17 +++++++
 .../selftests/bpf/prog_tests/ctx_rewrite.c    | 44 ----------------
 .../bpf/prog_tests/kptr_xchg_inline.c         | 51 +++++++++++++++++++
 .../selftests/bpf/progs/kptr_xchg_inline.c    | 48 +++++++++++++++++
 tools/testing/selftests/bpf/test_verifier.c   | 47 +----------------
 tools/testing/selftests/bpf/testing_helpers.c | 42 +++++++++++++++
 tools/testing/selftests/bpf/testing_helpers.h |  6 +++
 11 files changed, 183 insertions(+), 89 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/kptr_xchg_inline.c
 create mode 100644 tools/testing/selftests/bpf/progs/kptr_xchg_inline.c

-- 
2.29.2



* [PATCH bpf-next v3 1/3] bpf: Support inlining bpf_kptr_xchg() helper
  2024-01-05 10:48 [PATCH bpf-next v3 0/3] bpf: inline bpf_kptr_xchg() Hou Tao
@ 2024-01-05 10:48 ` Hou Tao
  2024-01-05 10:48 ` [PATCH bpf-next v3 2/3] selftests/bpf: Factor out get_xlated_program() helper Hou Tao
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Hou Tao @ 2024-01-05 10:48 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Eduard Zingerman,
	houtao1

From: Hou Tao <houtao1@huawei.com>

The motivation for inlining bpf_kptr_xchg() comes from the performance
profiling of the bpf memory allocator benchmark. The benchmark uses
bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
objects for free. After inlining bpf_kptr_xchg(), the performance of
object free on an 8-CPU VM increases by about 2%~10%. The inlining also
has a downside: both the kasan and kcsan checks on the pointer become
unavailable.

bpf_kptr_xchg() can be inlined by converting the call to bpf_kptr_xchg()
into an atomic exchange instruction (BPF_XCHG). The conversion depends
on two conditions:
1) The JIT backend supports atomic_xchg() on pointer-sized words.
2) On the specific arch, the implementation of xchg() is the same as
   atomic_xchg() on pointer-sized words.

It seems most 64-bit JIT backends satisfy these two conditions. But as a
precaution, define a weak function bpf_jit_supports_ptr_xchg() to state
whether such a conversion is safe, and only support the inlining on
64-bit hosts.

x86-64 supports the BPF_XCHG atomic operation, and both xchg() and
atomic_xchg() use arch_xchg() to implement the exchange, so enable the
inlining of bpf_kptr_xchg() on x86-64 first.
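
For illustration, the x86-64 JIT then turns the two patched BPF
instructions into roughly the following (a sketch; R0/R1/R2 map to
rax/rdi/rsi in the x86-64 JIT, and the exact encoding is decided by
do_jit()):

	mov   rax, rsi          /* r0 = r2 */
	xchg  [rdi + 0], rax    /* r0 = atomic64_xchg((u64 *)(r1 + 0), r0) */

so the out-of-line helper call is replaced by two inlined instructions,
and the implicit lock of x86's xchg keeps the exchange atomic.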

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 arch/x86/net/bpf_jit_comp.c |  5 +++++
 include/linux/filter.h      |  1 +
 kernel/bpf/core.c           | 10 ++++++++++
 kernel/bpf/helpers.c        |  1 +
 kernel/bpf/verifier.c       | 17 +++++++++++++++++
 5 files changed, 34 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index bdacbb84456d9..9fd275b5827ac 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3245,3 +3245,8 @@ void bpf_arch_poke_desc_update(struct bpf_jit_poke_descriptor *poke,
 		BUG_ON(ret < 0);
 	}
 }
+
+bool bpf_jit_supports_ptr_xchg(void)
+{
+	return true;
+}
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 68fb6c8142fec..35f067fd3840a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -955,6 +955,7 @@ bool bpf_jit_supports_subprog_tailcalls(void);
 bool bpf_jit_supports_kfunc_call(void);
 bool bpf_jit_supports_far_kfunc_call(void);
 bool bpf_jit_supports_exceptions(void);
+bool bpf_jit_supports_ptr_xchg(void);
 void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
 bool bpf_helper_changes_pkt_data(void *func);
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index ea6843be2616c..fbb1d95a9b446 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2925,6 +2925,16 @@ bool __weak bpf_jit_supports_far_kfunc_call(void)
 	return false;
 }
 
+/* Return TRUE if the JIT backend satisfies the following two conditions:
+ * 1) JIT backend supports atomic_xchg() on pointer-sized words.
+ * 2) Under the specific arch, the implementation of xchg() is the same
+ *    as atomic_xchg() on pointer-sized words.
+ */
+bool __weak bpf_jit_supports_ptr_xchg(void)
+{
+	return false;
+}
+
 /* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
  * skb_copy_bits(), so provide a weak definition of it for NET-less config.
  */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index be72824f32b2c..e04ca1af89272 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1414,6 +1414,7 @@ BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
 {
 	unsigned long *kptr = map_value;
 
+	/* This helper may be inlined by the verifier. */
 	return xchg(kptr, (unsigned long)ptr);
 }
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d5f4ff1eb235d..b962357acad41 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19805,6 +19805,23 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			continue;
 		}
 
+		/* Implement bpf_kptr_xchg inline */
+		if (prog->jit_requested && BITS_PER_LONG == 64 &&
+		    insn->imm == BPF_FUNC_kptr_xchg &&
+		    bpf_jit_supports_ptr_xchg()) {
+			insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_2);
+			insn_buf[1] = BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_1, BPF_REG_0, 0);
+			cnt = 2;
+
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta    += cnt - 1;
+			env->prog = prog = new_prog;
+			insn      = new_prog->insnsi + i + delta;
+			continue;
+		}
 patch_call_imm:
 		fn = env->ops->get_func_proto(insn->imm, env->prog);
 		/* all functions that have prototype and verifier allowed
-- 
2.29.2



* [PATCH bpf-next v3 2/3] selftests/bpf: Factor out get_xlated_program() helper
  2024-01-05 10:48 [PATCH bpf-next v3 0/3] bpf: inline bpf_kptr_xchg() Hou Tao
  2024-01-05 10:48 ` [PATCH bpf-next v3 1/3] bpf: Support inlining bpf_kptr_xchg() helper Hou Tao
@ 2024-01-05 10:48 ` Hou Tao
  2024-01-05 10:48 ` [PATCH bpf-next v3 3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg() Hou Tao
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Hou Tao @ 2024-01-05 10:48 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Eduard Zingerman,
	houtao1

From: Hou Tao <houtao1@huawei.com>

Both test_verifier and test_progs use get_xlated_program(), so move the
helper into testing_helpers.c and declare it in testing_helpers.h so
both can reuse it.
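
A typical call site of the factored-out helper looks like this (a
minimal sketch; prog_fd is a placeholder for any loaded program fd):

	struct bpf_insn *buf = NULL;
	__u32 cnt = 0;

	/* returns 0 on success and fills buf/cnt with the xlated insns */
	if (get_xlated_program(prog_fd, &buf, &cnt))
		return; /* the helper already printed an error */
	/* ... inspect buf[0 .. cnt - 1] ... */
	free(buf);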

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 .../selftests/bpf/prog_tests/ctx_rewrite.c    | 44 -----------------
 tools/testing/selftests/bpf/test_verifier.c   | 47 +------------------
 tools/testing/selftests/bpf/testing_helpers.c | 42 +++++++++++++++++
 tools/testing/selftests/bpf/testing_helpers.h |  6 +++
 4 files changed, 50 insertions(+), 89 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
index 4951aa978f332..3b7c57fe55a59 100644
--- a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
+++ b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
@@ -626,50 +626,6 @@ static bool match_pattern(struct btf *btf, char *pattern, char *text, char *reg_
 	return false;
 }
 
-/* Request BPF program instructions after all rewrites are applied,
- * e.g. verifier.c:convert_ctx_access() is done.
- */
-static int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt)
-{
-	struct bpf_prog_info info = {};
-	__u32 info_len = sizeof(info);
-	__u32 xlated_prog_len;
-	__u32 buf_element_size = sizeof(struct bpf_insn);
-
-	if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
-		perror("bpf_prog_get_info_by_fd failed");
-		return -1;
-	}
-
-	xlated_prog_len = info.xlated_prog_len;
-	if (xlated_prog_len % buf_element_size) {
-		printf("Program length %d is not multiple of %d\n",
-		       xlated_prog_len, buf_element_size);
-		return -1;
-	}
-
-	*cnt = xlated_prog_len / buf_element_size;
-	*buf = calloc(*cnt, buf_element_size);
-	if (!buf) {
-		perror("can't allocate xlated program buffer");
-		return -ENOMEM;
-	}
-
-	bzero(&info, sizeof(info));
-	info.xlated_prog_len = xlated_prog_len;
-	info.xlated_prog_insns = (__u64)(unsigned long)*buf;
-	if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
-		perror("second bpf_prog_get_info_by_fd failed");
-		goto out_free_buf;
-	}
-
-	return 0;
-
-out_free_buf:
-	free(*buf);
-	return -1;
-}
-
 static void print_insn(void *private_data, const char *fmt, ...)
 {
 	va_list args;
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index f36e41435be79..87519d7fe4c62 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -1341,48 +1341,6 @@ static bool cmp_str_seq(const char *log, const char *exp)
 	return true;
 }
 
-static struct bpf_insn *get_xlated_program(int fd_prog, int *cnt)
-{
-	__u32 buf_element_size = sizeof(struct bpf_insn);
-	struct bpf_prog_info info = {};
-	__u32 info_len = sizeof(info);
-	__u32 xlated_prog_len;
-	struct bpf_insn *buf;
-
-	if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
-		perror("bpf_prog_get_info_by_fd failed");
-		return NULL;
-	}
-
-	xlated_prog_len = info.xlated_prog_len;
-	if (xlated_prog_len % buf_element_size) {
-		printf("Program length %d is not multiple of %d\n",
-		       xlated_prog_len, buf_element_size);
-		return NULL;
-	}
-
-	*cnt = xlated_prog_len / buf_element_size;
-	buf = calloc(*cnt, buf_element_size);
-	if (!buf) {
-		perror("can't allocate xlated program buffer");
-		return NULL;
-	}
-
-	bzero(&info, sizeof(info));
-	info.xlated_prog_len = xlated_prog_len;
-	info.xlated_prog_insns = (__u64)(unsigned long)buf;
-	if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
-		perror("second bpf_prog_get_info_by_fd failed");
-		goto out_free_buf;
-	}
-
-	return buf;
-
-out_free_buf:
-	free(buf);
-	return NULL;
-}
-
 static bool is_null_insn(struct bpf_insn *insn)
 {
 	struct bpf_insn null_insn = {};
@@ -1505,7 +1463,7 @@ static void print_insn(struct bpf_insn *buf, int cnt)
 static bool check_xlated_program(struct bpf_test *test, int fd_prog)
 {
 	struct bpf_insn *buf;
-	int cnt;
+	unsigned int cnt;
 	bool result = true;
 	bool check_expected = !is_null_insn(test->expected_insns);
 	bool check_unexpected = !is_null_insn(test->unexpected_insns);
@@ -1513,8 +1471,7 @@ static bool check_xlated_program(struct bpf_test *test, int fd_prog)
 	if (!check_expected && !check_unexpected)
 		goto out;
 
-	buf = get_xlated_program(fd_prog, &cnt);
-	if (!buf) {
+	if (get_xlated_program(fd_prog, &buf, &cnt)) {
 		printf("FAIL: can't get xlated program\n");
 		result = false;
 		goto out;
diff --git a/tools/testing/selftests/bpf/testing_helpers.c b/tools/testing/selftests/bpf/testing_helpers.c
index d2458c1b16719..53c40f62fdcb8 100644
--- a/tools/testing/selftests/bpf/testing_helpers.c
+++ b/tools/testing/selftests/bpf/testing_helpers.c
@@ -387,3 +387,45 @@ int kern_sync_rcu(void)
 {
 	return syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0, 0);
 }
+
+int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt)
+{
+	__u32 buf_element_size = sizeof(struct bpf_insn);
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
+	__u32 xlated_prog_len;
+
+	if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
+		perror("bpf_prog_get_info_by_fd failed");
+		return -1;
+	}
+
+	xlated_prog_len = info.xlated_prog_len;
+	if (xlated_prog_len % buf_element_size) {
+		printf("Program length %u is not multiple of %u\n",
+		       xlated_prog_len, buf_element_size);
+		return -1;
+	}
+
+	*cnt = xlated_prog_len / buf_element_size;
+	*buf = calloc(*cnt, buf_element_size);
+	if (!*buf) {
+		perror("can't allocate xlated program buffer");
+		return -ENOMEM;
+	}
+
+	bzero(&info, sizeof(info));
+	info.xlated_prog_len = xlated_prog_len;
+	info.xlated_prog_insns = (__u64)(unsigned long)*buf;
+	if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
+		perror("second bpf_prog_get_info_by_fd failed");
+		goto out_free_buf;
+	}
+
+	return 0;
+
+out_free_buf:
+	free(*buf);
+	*buf = NULL;
+	return -1;
+}
diff --git a/tools/testing/selftests/bpf/testing_helpers.h b/tools/testing/selftests/bpf/testing_helpers.h
index 35284faff4f29..1ed5b9200d661 100644
--- a/tools/testing/selftests/bpf/testing_helpers.h
+++ b/tools/testing/selftests/bpf/testing_helpers.h
@@ -46,4 +46,10 @@ static inline __u64 get_time_ns(void)
 	return (u64)t.tv_sec * 1000000000 + t.tv_nsec;
 }
 
+struct bpf_insn;
+/* Request BPF program instructions after all rewrites are applied,
+ * e.g. verifier.c:convert_ctx_access() is done.
+ */
+int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt);
+
 #endif /* __TESTING_HELPERS_H */
-- 
2.29.2



* [PATCH bpf-next v3 3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg()
  2024-01-05 10:48 [PATCH bpf-next v3 0/3] bpf: inline bpf_kptr_xchg() Hou Tao
  2024-01-05 10:48 ` [PATCH bpf-next v3 1/3] bpf: Support inlining bpf_kptr_xchg() helper Hou Tao
  2024-01-05 10:48 ` [PATCH bpf-next v3 2/3] selftests/bpf: Factor out get_xlated_program() helper Hou Tao
@ 2024-01-05 10:48 ` Hou Tao
  2024-01-05 13:26   ` Eduard Zingerman
  2024-01-05 22:53 ` [PATCH bpf-next v3 0/3] bpf: inline bpf_kptr_xchg() Song Liu
  2024-01-12  2:30 ` patchwork-bot+netdevbpf
  4 siblings, 1 reply; 10+ messages in thread
From: Hou Tao @ 2024-01-05 10:48 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Eduard Zingerman,
	houtao1

From: Hou Tao <houtao1@huawei.com>

The test first uses bpf_prog_get_info_by_fd() to obtain the xlated
instructions of the program. Since these instructions have already been
rewritten by the verifier, the test then checks whether the rewritten
instructions are as expected. To ensure LLVM generates code exactly as
expected, inline assembly and a naked function are used.

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 .../bpf/prog_tests/kptr_xchg_inline.c         | 51 +++++++++++++++++++
 .../selftests/bpf/progs/kptr_xchg_inline.c    | 48 +++++++++++++++++
 2 files changed, 99 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/kptr_xchg_inline.c
 create mode 100644 tools/testing/selftests/bpf/progs/kptr_xchg_inline.c

diff --git a/tools/testing/selftests/bpf/prog_tests/kptr_xchg_inline.c b/tools/testing/selftests/bpf/prog_tests/kptr_xchg_inline.c
new file mode 100644
index 0000000000000..5a4bee1cf9707
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/kptr_xchg_inline.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023. Huawei Technologies Co., Ltd */
+#include <test_progs.h>
+
+#include "linux/filter.h"
+#include "kptr_xchg_inline.skel.h"
+
+void test_kptr_xchg_inline(void)
+{
+	struct kptr_xchg_inline *skel;
+	struct bpf_insn *insn = NULL;
+	struct bpf_insn exp;
+	unsigned int cnt;
+	int err;
+
+#if !defined(__x86_64__)
+	test__skip();
+	return;
+#endif
+
+	skel = kptr_xchg_inline__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_load"))
+		return;
+
+	err = get_xlated_program(bpf_program__fd(skel->progs.kptr_xchg_inline), &insn, &cnt);
+	if (!ASSERT_OK(err, "prog insn"))
+		goto out;
+
+	/* The original instructions are:
+	 * r1 = map[id:xxx][0]+0
+	 * r2 = 0
+	 * call bpf_kptr_xchg#yyy
+	 *
+	 * call bpf_kptr_xchg#yyy will be inlined as:
+	 * r0 = r2
+	 * r0 = atomic64_xchg((u64 *)(r1 +0), r0)
+	 */
+	if (!ASSERT_GT(cnt, 5, "insn cnt"))
+		goto out;
+
+	exp = BPF_MOV64_REG(BPF_REG_0, BPF_REG_2);
+	if (!ASSERT_OK(memcmp(&insn[3], &exp, sizeof(exp)), "mov"))
+		goto out;
+
+	exp = BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_1, BPF_REG_0, 0);
+	if (!ASSERT_OK(memcmp(&insn[4], &exp, sizeof(exp)), "xchg"))
+		goto out;
+out:
+	free(insn);
+	kptr_xchg_inline__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/kptr_xchg_inline.c b/tools/testing/selftests/bpf/progs/kptr_xchg_inline.c
new file mode 100644
index 0000000000000..2414ac20b6d50
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/kptr_xchg_inline.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023. Huawei Technologies Co., Ltd */
+#include <linux/types.h>
+#include <bpf/bpf_helpers.h>
+
+#include "bpf_experimental.h"
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct bin_data {
+	char blob[32];
+};
+
+#define private(name) SEC(".bss." #name) __hidden __attribute__((aligned(8)))
+private(kptr) struct bin_data __kptr * ptr;
+
+SEC("tc")
+__naked int kptr_xchg_inline(void)
+{
+	asm volatile (
+		"r1 = %[ptr] ll;"
+		"r2 = 0;"
+		"call %[bpf_kptr_xchg];"
+		"if r0 == 0 goto 1f;"
+		"r1 = r0;"
+		"r2 = 0;"
+		"call %[bpf_obj_drop_impl];"
+	"1:"
+		"r0 = 0;"
+		"exit;"
+		:
+		: __imm_addr(ptr),
+		  __imm(bpf_kptr_xchg),
+		  __imm(bpf_obj_drop_impl)
+		: __clobber_all
+	);
+}
+
+/* BTF FUNC records are not generated for kfuncs referenced
+ * from inline assembly. These records are necessary for
+ * libbpf to link the program. The function below is a hack
+ * to ensure that BTF FUNC records are generated.
+ */
+void __btf_root(void)
+{
+	bpf_obj_drop(NULL);
+}
-- 
2.29.2



* Re: [PATCH bpf-next v3 3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg()
  2024-01-05 10:48 ` [PATCH bpf-next v3 3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg() Hou Tao
@ 2024-01-05 13:26   ` Eduard Zingerman
  2024-01-06  2:32     ` Hou Tao
  0 siblings, 1 reply; 10+ messages in thread
From: Eduard Zingerman @ 2024-01-05 13:26 UTC (permalink / raw)
  To: Hou Tao, bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

On Fri, 2024-01-05 at 18:48 +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> The test first uses bpf_prog_get_info_by_fd() to obtain the xlated
> instructions of the program. Since these instructions have already been
> rewritten by the verifier, the test then checks whether the rewritten
> instructions are as expected. To ensure LLVM generates code exactly as
> expected, inline assembly and a naked function are used.
> 
> Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

Thank you for this adjustment.


* Re: [PATCH bpf-next v3 0/3] bpf: inline bpf_kptr_xchg()
  2024-01-05 10:48 [PATCH bpf-next v3 0/3] bpf: inline bpf_kptr_xchg() Hou Tao
                   ` (2 preceding siblings ...)
  2024-01-05 10:48 ` [PATCH bpf-next v3 3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg() Hou Tao
@ 2024-01-05 22:53 ` Song Liu
  2024-01-06  2:34   ` Hou Tao
  2024-01-12  2:30 ` patchwork-bot+netdevbpf
  4 siblings, 1 reply; 10+ messages in thread
From: Song Liu @ 2024-01-05 22:53 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Eduard Zingerman,
	houtao1

On Fri, Jan 5, 2024 at 2:47 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> Hi,
>
> The motivation for inlining bpf_kptr_xchg() comes from the performance
> profiling of the bpf memory allocator benchmark [1]. The benchmark uses
> bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
> objects for free. After inlining bpf_kptr_xchg(), the performance of
> object free on an 8-CPU VM increases by about 2%~10%. However, the
> performance gain comes at a cost: both the kasan and kcsan checks on the
> pointer become unavailable. Initially the inlining was implemented in
> do_jit() for x86-64 directly, but it is more portable to implement the
> inlining in the verifier.

How much work would it take to enable this on other major architectures?
AFAICT, most JIT compilers already handle BPF_XCHG, so it should be
relatively simple?

Other than this, for the set

Acked-by: Song Liu <song@kernel.org>

Thanks,
Song


* Re: [PATCH bpf-next v3 3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg()
  2024-01-05 13:26   ` Eduard Zingerman
@ 2024-01-06  2:32     ` Hou Tao
  0 siblings, 0 replies; 10+ messages in thread
From: Hou Tao @ 2024-01-06  2:32 UTC (permalink / raw)
  To: Eduard Zingerman, bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1



On 1/5/2024 9:26 PM, Eduard Zingerman wrote:
> On Fri, 2024-01-05 at 18:48 +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> The test first uses bpf_prog_get_info_by_fd() to obtain the xlated
>> instructions of the program. Since these instructions have already been
>> rewritten by the verifier, the test then checks whether the rewritten
>> instructions are as expected. To ensure LLVM generates code exactly as
>> expected, inline assembly and a naked function are used.
>>
>> Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> Thank you for this adjustment.

And thanks for the suggestions. They taught me how to write BPF assembly
code.



* Re: [PATCH bpf-next v3 0/3] bpf: inline bpf_kptr_xchg()
  2024-01-05 22:53 ` [PATCH bpf-next v3 0/3] bpf: inline bpf_kptr_xchg() Song Liu
@ 2024-01-06  2:34   ` Hou Tao
  2024-01-06  8:51     ` Song Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Hou Tao @ 2024-01-06  2:34 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Eduard Zingerman,
	houtao1



On 1/6/2024 6:53 AM, Song Liu wrote:
> On Fri, Jan 5, 2024 at 2:47 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Hi,
>>
>> The motivation for inlining bpf_kptr_xchg() comes from the performance
>> profiling of the bpf memory allocator benchmark [1]. The benchmark uses
>> bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
>> objects for free. After inlining bpf_kptr_xchg(), the performance of
>> object free on an 8-CPU VM increases by about 2%~10%. However, the
>> performance gain comes at a cost: both the kasan and kcsan checks on the
>> pointer become unavailable. Initially the inlining was implemented in
>> do_jit() for x86-64 directly, but it is more portable to implement the
>> inlining in the verifier.
> How much work would it take to enable this on other major architectures?
> AFAICT, most JIT compilers already handle BPF_XCHG, so it should be
> relatively simple?

Yes. I think enabling this inlining will be relatively simple. As said
in patch #1, the inlining depends on two conditions:
1) atomic_xchg() is supported on pointer-sized words.
2) The implementation of xchg() is the same as atomic_xchg() on
   pointer-sized words.
For condition 1), I think most major architecture JIT backends already
support it. So the remaining work is to check the implementation of
xchg() and atomic_xchg() for each arch, to enable the inlining, and to
do more testing.

I will try to enable the inlining on arm64 first. Will x86-64 + arm64 be
enough for the definition of "major architectures", or should it include
riscv, s390 and powerpc as well?
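
For reference, once both conditions are verified for an arch, the
enablement itself is just a one-line override of the weak helper in that
arch's JIT, mirroring the x86-64 hunk in patch #1 (a hypothetical sketch
for arm64; the real work is auditing xchg()/atomic_xchg() and testing):

	/* arch/arm64/net/bpf_jit_comp.c (hypothetical) */
	bool bpf_jit_supports_ptr_xchg(void)
	{
		return true;
	}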

> Other than this, for the set
>
> Acked-by: Song Liu <song@kernel.org>

Thanks for the ack.
>
> Thanks,
> Song
> .



* Re: [PATCH bpf-next v3 0/3] bpf: inline bpf_kptr_xchg()
  2024-01-06  2:34   ` Hou Tao
@ 2024-01-06  8:51     ` Song Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Song Liu @ 2024-01-06  8:51 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Eduard Zingerman,
	houtao1

On Fri, Jan 5, 2024 at 6:34 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
>
>
> On 1/6/2024 6:53 AM, Song Liu wrote:
> > On Fri, Jan 5, 2024 at 2:47 AM Hou Tao <houtao@huaweicloud.com> wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> Hi,
> >>
> >> The motivation for inlining bpf_kptr_xchg() comes from the performance
> >> profiling of the bpf memory allocator benchmark [1]. The benchmark uses
> >> bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
> >> objects for free. After inlining bpf_kptr_xchg(), the performance of
> >> object free on an 8-CPU VM increases by about 2%~10%. However, the
> >> performance gain comes at a cost: both the kasan and kcsan checks on the
> >> pointer become unavailable. Initially the inlining was implemented in
> >> do_jit() for x86-64 directly, but it is more portable to implement the
> >> inlining in the verifier.
> > How much work would it take to enable this on other major architectures?
> > AFAICT, most JIT compilers already handle BPF_XCHG, so it should be
> > relatively simple?
>
> Yes. I think enabling this inlining will be relatively simple. As said
> in patch #1, the inlining depends on two conditions:
> 1) atomic_xchg() is supported on pointer-sized words.
> 2) The implementation of xchg() is the same as atomic_xchg() on
>    pointer-sized words.
> For condition 1), I think most major architecture JIT backends already
> support it. So the remaining work is to check the implementation of
> xchg() and atomic_xchg() for each arch, to enable the inlining, and to
> do more testing.

Thanks for the clarification.

> I will try to enable the inlining on arm64 first. Will x86-64 + arm64
> be enough for the definition of "major architectures", or should it
> include riscv, s390 and powerpc as well?

x86_64 + arm64 is "major" enough. :) Maintainers of other JIT engines
can help with other archs.

Thanks,
Song


* Re: [PATCH bpf-next v3 0/3] bpf: inline bpf_kptr_xchg()
  2024-01-05 10:48 [PATCH bpf-next v3 0/3] bpf: inline bpf_kptr_xchg() Hou Tao
                   ` (3 preceding siblings ...)
  2024-01-05 22:53 ` [PATCH bpf-next v3 0/3] bpf: inline bpf_kptr_xchg() Song Liu
@ 2024-01-12  2:30 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 10+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-01-12  2:30 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, martin.lau, alexei.starovoitov, andrii, song, haoluo,
	yonghong.song, daniel, kpsingh, sdf, jolsa, john.fastabend,
	eddyz87, houtao1

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Fri,  5 Jan 2024 18:48:16 +0800 you wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Hi,
> 
> The motivation for inlining bpf_kptr_xchg() comes from the performance
> profiling of the bpf memory allocator benchmark [1]. The benchmark uses
> bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
> objects for free. After inlining bpf_kptr_xchg(), the performance of
> object free on an 8-CPU VM increases by about 2%~10%. However, the
> performance gain comes at a cost: both the kasan and kcsan checks on the
> pointer become unavailable. Initially the inlining was implemented in
> do_jit() for x86-64 directly, but it is more portable to implement the
> inlining in the verifier.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v3,1/3] bpf: Support inlining bpf_kptr_xchg() helper
    https://git.kernel.org/bpf/bpf-next/c/ac780beba187
  - [bpf-next,v3,2/3] selftests/bpf: Factor out get_xlated_program() helper
    https://git.kernel.org/bpf/bpf-next/c/10cdab919df6
  - [bpf-next,v3,3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg()
    https://git.kernel.org/bpf/bpf-next/c/ca8cf57c7754

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



