BPF List
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/3] bpf: inline bpf_kptr_xchg()
@ 2023-12-23 10:40 Hou Tao
  2023-12-23 10:40 ` [PATCH bpf-next v2 1/3] bpf: Support inlining bpf_kptr_xchg() helper Hou Tao
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Hou Tao @ 2023-12-23 10:40 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Eduard Zingerman,
	houtao1

From: Hou Tao <houtao1@huawei.com>

Hi,

The motivation of inlining bpf_kptr_xchg() comes from the performance
profiling of bpf memory allocator benchmark [1]. The benchmark uses
bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
objects for free. After inling bpf_kptr_xchg(), the performance for
object free on 8-CPUs VM increases about 2%~10%. However the performance
gain comes with costs: both the kasan and kcsan checks on the pointer
will be unavailable. Initially the inline is implemented in do_jit() for
x86-64 directly, but I think it will more portable to implement the
inline in verifier.

Patch #1 supports inlining bpf_kptr_xchg() helper and enables it on
x86-4. Patch #2 factors out a helper for newly-added test in patch #3.
Patch #3 tests whether the inlining of bpf_kptr_xchg() is expected.
Please see individual patches for more details. And comments are always
welcome.

Change Log:
v2:
  * rebased on bpf-next tree
  * drop patch #1 in v1 due to discussion in [2]
  * patch #1: add the motivation in the commit message, merge patch #1
              and #3 into the new patch in v2. (Daniel)
  * patch #2/#3: newly-added patch to test the inlining of
                 bpf_kptr_xchg() (Eduard)

v1: https://lore.kernel.org/bpf/95b8c2cd-44d5-5fe1-60b5-7e8218779566@huaweicloud.com/

[1]: https://lore.kernel.org/bpf/20231221141501.3588586-1-houtao@huaweicloud.com/
[2]: https://lore.kernel.org/bpf/fd94efb9-4a56-c982-dc2e-c66be5202cb7@huaweicloud.com/

Hou Tao (3):
  bpf: Support inlining bpf_kptr_xchg() helper
  selftests/bpf: Factor out get_xlated_program() helper
  selftests/bpf: Test the inlining of bpf_kptr_xchg()

 arch/x86/net/bpf_jit_comp.c                   |  5 ++
 include/linux/filter.h                        |  1 +
 kernel/bpf/core.c                             | 10 ++++
 kernel/bpf/helpers.c                          |  1 +
 kernel/bpf/verifier.c                         | 17 +++++++
 .../selftests/bpf/prog_tests/ctx_rewrite.c    | 44 ----------------
 .../bpf/prog_tests/kptr_xchg_inline.c         | 51 +++++++++++++++++++
 .../selftests/bpf/progs/kptr_xchg_inline.c    | 28 ++++++++++
 tools/testing/selftests/bpf/test_verifier.c   | 47 +----------------
 tools/testing/selftests/bpf/testing_helpers.c | 42 +++++++++++++++
 tools/testing/selftests/bpf/testing_helpers.h |  6 +++
 11 files changed, 163 insertions(+), 89 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/kptr_xchg_inline.c
 create mode 100644 tools/testing/selftests/bpf/progs/kptr_xchg_inline.c

-- 
2.29.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH bpf-next v2 1/3] bpf: Support inlining bpf_kptr_xchg() helper
  2023-12-23 10:40 [PATCH bpf-next v2 0/3] bpf: inline bpf_kptr_xchg() Hou Tao
@ 2023-12-23 10:40 ` Hou Tao
  2024-01-02 18:15   ` Eduard Zingerman
  2023-12-23 10:40 ` [PATCH bpf-next v2 2/3] selftests/bpf: Factor out get_xlated_program() helper Hou Tao
  2023-12-23 10:40 ` [PATCH bpf-next v2 3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg() Hou Tao
  2 siblings, 1 reply; 8+ messages in thread
From: Hou Tao @ 2023-12-23 10:40 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Eduard Zingerman,
	houtao1

From: Hou Tao <houtao1@huawei.com>

The motivation of inlining bpf_kptr_xchg() comes from the performance
profiling of bpf memory allocator benchmark. The benchmark uses
bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
objects for free. After inling bpf_kptr_xchg(), the performance for
object free on 8-CPUs VM increases about 2%~10%. The inline also has
downside: both the kasan and kcsan checks on the pointer will be
unavailable.

bpf_kptr_xchg() can be inlined by converting the calling of
bpf_kptr_xchg() into an atomic_xchg() instruction. But the conversion
depends on two conditions:
1) JIT backend supports atomic_xchg() on pointer-sized word
2) For the specific arch, the implementation of xchg is the same as
   atomic_xchg() on pointer-sized words.

It seems most 64-bit JIT backends satisfies these two conditions. But
as a precaution, defining a weak function bpf_jit_supports_ptr_xchg()
to state whether such conversion is safe and only supporting inline for
64-bit host.

For x86-64, it supports BPF_XCHG atomic operation and both xchg() and
atomic_xchg() use arch_xchg() to implement the exchange, so enabling the
inline of bpf_kptr_xchg() on x86-64 first.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 arch/x86/net/bpf_jit_comp.c |  5 +++++
 include/linux/filter.h      |  1 +
 kernel/bpf/core.c           | 10 ++++++++++
 kernel/bpf/helpers.c        |  1 +
 kernel/bpf/verifier.c       | 17 +++++++++++++++++
 5 files changed, 34 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 8b7f4de69917..f798e2c22cd6 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3247,3 +3247,8 @@ void bpf_arch_poke_desc_update(struct bpf_jit_poke_descriptor *poke,
 		BUG_ON(ret < 0);
 	}
 }
+
+bool bpf_jit_supports_ptr_xchg(void)
+{
+	return true;
+}
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 68fb6c8142fe..35f067fd3840 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -955,6 +955,7 @@ bool bpf_jit_supports_subprog_tailcalls(void);
 bool bpf_jit_supports_kfunc_call(void);
 bool bpf_jit_supports_far_kfunc_call(void);
 bool bpf_jit_supports_exceptions(void);
+bool bpf_jit_supports_ptr_xchg(void);
 void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
 bool bpf_helper_changes_pkt_data(void *func);
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index ea6843be2616..fbb1d95a9b44 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2925,6 +2925,16 @@ bool __weak bpf_jit_supports_far_kfunc_call(void)
 	return false;
 }
 
+/* Return TRUE if the JIT backend satisfies the following two conditions:
+ * 1) JIT backend supports atomic_xchg() on pointer-sized words.
+ * 2) Under the specific arch, the implementation of xchg() is the same
+ *    as atomic_xchg() on pointer-sized words.
+ */
+bool __weak bpf_jit_supports_ptr_xchg(void)
+{
+	return false;
+}
+
 /* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
  * skb_copy_bits(), so provide a weak definition of it for NET-less config.
  */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index be72824f32b2..e04ca1af8927 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1414,6 +1414,7 @@ BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
 {
 	unsigned long *kptr = map_value;
 
+	/* This helper may be inlined by verifier. */
 	return xchg(kptr, (unsigned long)ptr);
 }
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a376eb609c41..69895194b3c2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19790,6 +19790,23 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			continue;
 		}
 
+		/* Implement bpf_kptr_xchg inline */
+		if (prog->jit_requested && BITS_PER_LONG == 64 &&
+		    insn->imm == BPF_FUNC_kptr_xchg &&
+		    bpf_jit_supports_ptr_xchg()) {
+			insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_2);
+			insn_buf[1] = BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_1, BPF_REG_0, 0);
+			cnt = 2;
+
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta    += cnt - 1;
+			env->prog = prog = new_prog;
+			insn      = new_prog->insnsi + i + delta;
+			continue;
+		}
 patch_call_imm:
 		fn = env->ops->get_func_proto(insn->imm, env->prog);
 		/* all functions that have prototype and verifier allowed
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH bpf-next v2 2/3] selftests/bpf: Factor out get_xlated_program() helper
  2023-12-23 10:40 [PATCH bpf-next v2 0/3] bpf: inline bpf_kptr_xchg() Hou Tao
  2023-12-23 10:40 ` [PATCH bpf-next v2 1/3] bpf: Support inlining bpf_kptr_xchg() helper Hou Tao
@ 2023-12-23 10:40 ` Hou Tao
  2024-01-02 18:16   ` Eduard Zingerman
  2023-12-23 10:40 ` [PATCH bpf-next v2 3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg() Hou Tao
  2 siblings, 1 reply; 8+ messages in thread
From: Hou Tao @ 2023-12-23 10:40 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Eduard Zingerman,
	houtao1

From: Hou Tao <houtao1@huawei.com>

Both test_verifier and test_progs use get_xlated_program(), so moving
the helper into testing_helpers.h to reuse it.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 .../selftests/bpf/prog_tests/ctx_rewrite.c    | 44 -----------------
 tools/testing/selftests/bpf/test_verifier.c   | 47 +------------------
 tools/testing/selftests/bpf/testing_helpers.c | 42 +++++++++++++++++
 tools/testing/selftests/bpf/testing_helpers.h |  6 +++
 4 files changed, 50 insertions(+), 89 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
index 4951aa978f33..3b7c57fe55a5 100644
--- a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
+++ b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
@@ -626,50 +626,6 @@ static bool match_pattern(struct btf *btf, char *pattern, char *text, char *reg_
 	return false;
 }
 
-/* Request BPF program instructions after all rewrites are applied,
- * e.g. verifier.c:convert_ctx_access() is done.
- */
-static int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt)
-{
-	struct bpf_prog_info info = {};
-	__u32 info_len = sizeof(info);
-	__u32 xlated_prog_len;
-	__u32 buf_element_size = sizeof(struct bpf_insn);
-
-	if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
-		perror("bpf_prog_get_info_by_fd failed");
-		return -1;
-	}
-
-	xlated_prog_len = info.xlated_prog_len;
-	if (xlated_prog_len % buf_element_size) {
-		printf("Program length %d is not multiple of %d\n",
-		       xlated_prog_len, buf_element_size);
-		return -1;
-	}
-
-	*cnt = xlated_prog_len / buf_element_size;
-	*buf = calloc(*cnt, buf_element_size);
-	if (!buf) {
-		perror("can't allocate xlated program buffer");
-		return -ENOMEM;
-	}
-
-	bzero(&info, sizeof(info));
-	info.xlated_prog_len = xlated_prog_len;
-	info.xlated_prog_insns = (__u64)(unsigned long)*buf;
-	if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
-		perror("second bpf_prog_get_info_by_fd failed");
-		goto out_free_buf;
-	}
-
-	return 0;
-
-out_free_buf:
-	free(*buf);
-	return -1;
-}
-
 static void print_insn(void *private_data, const char *fmt, ...)
 {
 	va_list args;
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index f36e41435be7..87519d7fe4c6 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -1341,48 +1341,6 @@ static bool cmp_str_seq(const char *log, const char *exp)
 	return true;
 }
 
-static struct bpf_insn *get_xlated_program(int fd_prog, int *cnt)
-{
-	__u32 buf_element_size = sizeof(struct bpf_insn);
-	struct bpf_prog_info info = {};
-	__u32 info_len = sizeof(info);
-	__u32 xlated_prog_len;
-	struct bpf_insn *buf;
-
-	if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
-		perror("bpf_prog_get_info_by_fd failed");
-		return NULL;
-	}
-
-	xlated_prog_len = info.xlated_prog_len;
-	if (xlated_prog_len % buf_element_size) {
-		printf("Program length %d is not multiple of %d\n",
-		       xlated_prog_len, buf_element_size);
-		return NULL;
-	}
-
-	*cnt = xlated_prog_len / buf_element_size;
-	buf = calloc(*cnt, buf_element_size);
-	if (!buf) {
-		perror("can't allocate xlated program buffer");
-		return NULL;
-	}
-
-	bzero(&info, sizeof(info));
-	info.xlated_prog_len = xlated_prog_len;
-	info.xlated_prog_insns = (__u64)(unsigned long)buf;
-	if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
-		perror("second bpf_prog_get_info_by_fd failed");
-		goto out_free_buf;
-	}
-
-	return buf;
-
-out_free_buf:
-	free(buf);
-	return NULL;
-}
-
 static bool is_null_insn(struct bpf_insn *insn)
 {
 	struct bpf_insn null_insn = {};
@@ -1505,7 +1463,7 @@ static void print_insn(struct bpf_insn *buf, int cnt)
 static bool check_xlated_program(struct bpf_test *test, int fd_prog)
 {
 	struct bpf_insn *buf;
-	int cnt;
+	unsigned int cnt;
 	bool result = true;
 	bool check_expected = !is_null_insn(test->expected_insns);
 	bool check_unexpected = !is_null_insn(test->unexpected_insns);
@@ -1513,8 +1471,7 @@ static bool check_xlated_program(struct bpf_test *test, int fd_prog)
 	if (!check_expected && !check_unexpected)
 		goto out;
 
-	buf = get_xlated_program(fd_prog, &cnt);
-	if (!buf) {
+	if (get_xlated_program(fd_prog, &buf, &cnt)) {
 		printf("FAIL: can't get xlated program\n");
 		result = false;
 		goto out;
diff --git a/tools/testing/selftests/bpf/testing_helpers.c b/tools/testing/selftests/bpf/testing_helpers.c
index d2458c1b1671..53c40f62fdcb 100644
--- a/tools/testing/selftests/bpf/testing_helpers.c
+++ b/tools/testing/selftests/bpf/testing_helpers.c
@@ -387,3 +387,45 @@ int kern_sync_rcu(void)
 {
 	return syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0, 0);
 }
+
+int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt)
+{
+	__u32 buf_element_size = sizeof(struct bpf_insn);
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
+	__u32 xlated_prog_len;
+
+	if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
+		perror("bpf_prog_get_info_by_fd failed");
+		return -1;
+	}
+
+	xlated_prog_len = info.xlated_prog_len;
+	if (xlated_prog_len % buf_element_size) {
+		printf("Program length %u is not multiple of %u\n",
+		       xlated_prog_len, buf_element_size);
+		return -1;
+	}
+
+	*cnt = xlated_prog_len / buf_element_size;
+	*buf = calloc(*cnt, buf_element_size);
+	if (!buf) {
+		perror("can't allocate xlated program buffer");
+		return -ENOMEM;
+	}
+
+	bzero(&info, sizeof(info));
+	info.xlated_prog_len = xlated_prog_len;
+	info.xlated_prog_insns = (__u64)(unsigned long)*buf;
+	if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
+		perror("second bpf_prog_get_info_by_fd failed");
+		goto out_free_buf;
+	}
+
+	return 0;
+
+out_free_buf:
+	free(*buf);
+	*buf = NULL;
+	return -1;
+}
diff --git a/tools/testing/selftests/bpf/testing_helpers.h b/tools/testing/selftests/bpf/testing_helpers.h
index 35284faff4f2..1ed5b9200d66 100644
--- a/tools/testing/selftests/bpf/testing_helpers.h
+++ b/tools/testing/selftests/bpf/testing_helpers.h
@@ -46,4 +46,10 @@ static inline __u64 get_time_ns(void)
 	return (u64)t.tv_sec * 1000000000 + t.tv_nsec;
 }
 
+struct bpf_insn;
+/* Request BPF program instructions after all rewrites are applied,
+ * e.g. verifier.c:convert_ctx_access() is done.
+ */
+int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt);
+
 #endif /* __TESTING_HELPERS_H */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH bpf-next v2 3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg()
  2023-12-23 10:40 [PATCH bpf-next v2 0/3] bpf: inline bpf_kptr_xchg() Hou Tao
  2023-12-23 10:40 ` [PATCH bpf-next v2 1/3] bpf: Support inlining bpf_kptr_xchg() helper Hou Tao
  2023-12-23 10:40 ` [PATCH bpf-next v2 2/3] selftests/bpf: Factor out get_xlated_program() helper Hou Tao
@ 2023-12-23 10:40 ` Hou Tao
  2024-01-02 18:41   ` Eduard Zingerman
  2 siblings, 1 reply; 8+ messages in thread
From: Hou Tao @ 2023-12-23 10:40 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Eduard Zingerman,
	houtao1

From: Hou Tao <houtao1@huawei.com>

The test uses bpf_prog_get_info_by_fd() to obtain the xlated
instructions of the program first. Since these instructions have
already been rewritten by the verifier, the tests then checks whether
the rewritten instructions are as expected.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 .../bpf/prog_tests/kptr_xchg_inline.c         | 51 +++++++++++++++++++
 .../selftests/bpf/progs/kptr_xchg_inline.c    | 28 ++++++++++
 2 files changed, 79 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/kptr_xchg_inline.c
 create mode 100644 tools/testing/selftests/bpf/progs/kptr_xchg_inline.c

diff --git a/tools/testing/selftests/bpf/prog_tests/kptr_xchg_inline.c b/tools/testing/selftests/bpf/prog_tests/kptr_xchg_inline.c
new file mode 100644
index 000000000000..5a4bee1cf970
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/kptr_xchg_inline.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023. Huawei Technologies Co., Ltd */
+#include <test_progs.h>
+
+#include "linux/filter.h"
+#include "kptr_xchg_inline.skel.h"
+
+void test_kptr_xchg_inline(void)
+{
+	struct kptr_xchg_inline *skel;
+	struct bpf_insn *insn = NULL;
+	struct bpf_insn exp;
+	unsigned int cnt;
+	int err;
+
+#if !defined(__x86_64__)
+	test__skip();
+	return;
+#endif
+
+	skel = kptr_xchg_inline__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_load"))
+		return;
+
+	err = get_xlated_program(bpf_program__fd(skel->progs.kptr_xchg_inline), &insn, &cnt);
+	if (!ASSERT_OK(err, "prog insn"))
+		goto out;
+
+	/* The original instructions are:
+	 * r1 = map[id:xxx][0]+0
+	 * r2 = 0
+	 * call bpf_kptr_xchg#yyy
+	 *
+	 * call bpf_kptr_xchg#yyy will be inlined as:
+	 * r0 = r2
+	 * r0 = atomic64_xchg((u64 *)(r1 +0), r0)
+	 */
+	if (!ASSERT_GT(cnt, 5, "insn cnt"))
+		goto out;
+
+	exp = BPF_MOV64_REG(BPF_REG_0, BPF_REG_2);
+	if (!ASSERT_OK(memcmp(&insn[3], &exp, sizeof(exp)), "mov"))
+		goto out;
+
+	exp = BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_1, BPF_REG_0, 0);
+	if (!ASSERT_OK(memcmp(&insn[4], &exp, sizeof(exp)), "xchg"))
+		goto out;
+out:
+	free(insn);
+	kptr_xchg_inline__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/kptr_xchg_inline.c b/tools/testing/selftests/bpf/progs/kptr_xchg_inline.c
new file mode 100644
index 000000000000..1db7909828d1
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/kptr_xchg_inline.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023. Huawei Technologies Co., Ltd */
+#include <linux/types.h>
+#include <bpf/bpf_helpers.h>
+
+#include "bpf_experimental.h"
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct bin_data {
+	char blob[32];
+};
+
+#define private(name) SEC(".bss." #name) __hidden __attribute__((aligned(8)))
+private(kptr) struct bin_data __kptr * ptr;
+
+SEC("tc")
+int kptr_xchg_inline(void *ctx)
+{
+	void *old;
+
+	old = bpf_kptr_xchg(&ptr, NULL);
+	if (old)
+		bpf_obj_drop(old);
+
+	return 0;
+}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf-next v2 1/3] bpf: Support inlining bpf_kptr_xchg() helper
  2023-12-23 10:40 ` [PATCH bpf-next v2 1/3] bpf: Support inlining bpf_kptr_xchg() helper Hou Tao
@ 2024-01-02 18:15   ` Eduard Zingerman
  0 siblings, 0 replies; 8+ messages in thread
From: Eduard Zingerman @ 2024-01-02 18:15 UTC (permalink / raw)
  To: Hou Tao, bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

On Sat, 2023-12-23 at 18:40 +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> The motivation of inlining bpf_kptr_xchg() comes from the performance
> profiling of bpf memory allocator benchmark. The benchmark uses
> bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
> objects for free. After inling bpf_kptr_xchg(), the performance for
> object free on 8-CPUs VM increases about 2%~10%. The inline also has
> downside: both the kasan and kcsan checks on the pointer will be
> unavailable.
> 
> bpf_kptr_xchg() can be inlined by converting the calling of
> bpf_kptr_xchg() into an atomic_xchg() instruction. But the conversion
> depends on two conditions:
> 1) JIT backend supports atomic_xchg() on pointer-sized word
> 2) For the specific arch, the implementation of xchg is the same as
>    atomic_xchg() on pointer-sized words.
> 
> It seems most 64-bit JIT backends satisfies these two conditions. But
> as a precaution, defining a weak function bpf_jit_supports_ptr_xchg()
> to state whether such conversion is safe and only supporting inline for
> 64-bit host.
> 
> For x86-64, it supports BPF_XCHG atomic operation and both xchg() and
> atomic_xchg() use arch_xchg() to implement the exchange, so enabling the
> inline of bpf_kptr_xchg() on x86-64 first.
> 
> Signed-off-by: Hou Tao <houtao1@huawei.com>

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf-next v2 2/3] selftests/bpf: Factor out get_xlated_program() helper
  2023-12-23 10:40 ` [PATCH bpf-next v2 2/3] selftests/bpf: Factor out get_xlated_program() helper Hou Tao
@ 2024-01-02 18:16   ` Eduard Zingerman
  0 siblings, 0 replies; 8+ messages in thread
From: Eduard Zingerman @ 2024-01-02 18:16 UTC (permalink / raw)
  To: Hou Tao, bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

On Sat, 2023-12-23 at 18:40 +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Both test_verifier and test_progs use get_xlated_program(), so moving
> the helper into testing_helpers.h to reuse it.
> 
> Signed-off-by: Hou Tao <houtao1@huawei.com>

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf-next v2 3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg()
  2023-12-23 10:40 ` [PATCH bpf-next v2 3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg() Hou Tao
@ 2024-01-02 18:41   ` Eduard Zingerman
  2024-01-03  1:20     ` Hou Tao
  0 siblings, 1 reply; 8+ messages in thread
From: Eduard Zingerman @ 2024-01-02 18:41 UTC (permalink / raw)
  To: Hou Tao, bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

On Sat, 2023-12-23 at 18:40 +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> The test uses bpf_prog_get_info_by_fd() to obtain the xlated
> instructions of the program first. Since these instructions have
> already been rewritten by the verifier, the tests then checks whether
> the rewritten instructions are as expected.
> 
> Signed-off-by: Hou Tao <houtao1@huawei.com>

Thank you for adding this test, one nitpick below.

[...]

> +#define private(name) SEC(".bss." #name) __hidden __attribute__((aligned(8)))
> +private(kptr) struct bin_data __kptr * ptr;
> +
> +SEC("tc")
> +int kptr_xchg_inline(void *ctx)
> +{
> +	void *old;
> +
> +	old = bpf_kptr_xchg(&ptr, NULL);
> +	if (old)
> +		bpf_obj_drop(old);
> +
> +	return 0;
> +}

This is highly unlikely, but in theory nothing guarantees that LLVM
would generate code exactly as expected by pattern in test_kptr_xchg_inline().
It would be more fail-proof to use inline assembly and a naked
function instead of C code, e.g.:

SEC("tc")
__naked int kptr_xchg_inline(void)
{
	asm volatile (
		"r1 = %[ptr] ll;"
		"r2 = 0;"
		"call %[bpf_kptr_xchg];"
		"if r0 == 0 goto 1f;"
		"r1 = r0;"
		"r2 = 0;"
		"call %[bpf_obj_drop_impl];"
"1:"
		"r0 = 0;"
		"exit;"
		:
		: __imm_addr(ptr),
		  __imm(bpf_kptr_xchg),
		  __imm(bpf_obj_drop_impl)
		: __clobber_all
	);
}

/* BTF FUNC records are not generated for kfuncs referenced
 * from inline assembly. These records are necessary for
 * libbpf to link the program. The function below is a hack
 * to ensure that BTF FUNC records are generated.
 */
void __btf_root(void)
{
	bpf_obj_drop(NULL);
}

wdyt?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf-next v2 3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg()
  2024-01-02 18:41   ` Eduard Zingerman
@ 2024-01-03  1:20     ` Hou Tao
  0 siblings, 0 replies; 8+ messages in thread
From: Hou Tao @ 2024-01-03  1:20 UTC (permalink / raw)
  To: Eduard Zingerman, bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

Hi,

On 1/3/2024 2:41 AM, Eduard Zingerman wrote:
> On Sat, 2023-12-23 at 18:40 +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> The test uses bpf_prog_get_info_by_fd() to obtain the xlated
>> instructions of the program first. Since these instructions have
>> already been rewritten by the verifier, the tests then checks whether
>> the rewritten instructions are as expected.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
> Thank you for adding this test, one nitpick below.
>
> [...]
>
>> +#define private(name) SEC(".bss." #name) __hidden __attribute__((aligned(8)))
>> +private(kptr) struct bin_data __kptr * ptr;
>> +
>> +SEC("tc")
>> +int kptr_xchg_inline(void *ctx)
>> +{
>> +	void *old;
>> +
>> +	old = bpf_kptr_xchg(&ptr, NULL);
>> +	if (old)
>> +		bpf_obj_drop(old);
>> +
>> +	return 0;
>> +}
> This is highly unlikely, but in theory nothing guarantees that LLVM
> would generate code exactly as expected by pattern in test_kptr_xchg_inline().
> It would be more fail-proof to use inline assembly and a naked
> function instead of C code, e.g.:
>
> SEC("tc")
> __naked int kptr_xchg_inline(void)
> {
> 	asm volatile (
> 		"r1 = %[ptr] ll;"
> 		"r2 = 0;"
> 		"call %[bpf_kptr_xchg];"
> 		"if r0 == 0 goto 1f;"
> 		"r1 = r0;"
> 		"r2 = 0;"
> 		"call %[bpf_obj_drop_impl];"
> "1:"
> 		"r0 = 0;"
> 		"exit;"
> 		:
> 		: __imm_addr(ptr),
> 		  __imm(bpf_kptr_xchg),
> 		  __imm(bpf_obj_drop_impl)
> 		: __clobber_all
> 	);
> }
>
> /* BTF FUNC records are not generated for kfuncs referenced
>  * from inline assembly. These records are necessary for
>  * libbpf to link the program. The function below is a hack
>  * to ensure that BTF FUNC records are generated.
>  */
> void __btf_root(void)
> {
> 	bpf_obj_drop(NULL);
> }
>
> wdyt?

Sure, will update the C code to the inline assembly in v3. And thanks
for the review and the suggestions.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-01-03  1:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-23 10:40 [PATCH bpf-next v2 0/3] bpf: inline bpf_kptr_xchg() Hou Tao
2023-12-23 10:40 ` [PATCH bpf-next v2 1/3] bpf: Support inlining bpf_kptr_xchg() helper Hou Tao
2024-01-02 18:15   ` Eduard Zingerman
2023-12-23 10:40 ` [PATCH bpf-next v2 2/3] selftests/bpf: Factor out get_xlated_program() helper Hou Tao
2024-01-02 18:16   ` Eduard Zingerman
2023-12-23 10:40 ` [PATCH bpf-next v2 3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg() Hou Tao
2024-01-02 18:41   ` Eduard Zingerman
2024-01-03  1:20     ` Hou Tao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox