* [PATCH v2 bpf-next 0/8] bpf: Add gen_epilogue and allow kfunc call in pro/epilogue
@ 2024-08-21 23:34 Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 1/8] bpf: Add gen_epilogue to bpf_verifier_ops Martin KaFai Lau
` (7 more replies)
0 siblings, 8 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2024-08-21 23:34 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, kernel-team
From: Martin KaFai Lau <martin.lau@kernel.org>
This set allows a subsystem to patch code just before BPF_EXIT.
A new verifier ops, .gen_epilogue, is added for this purpose.
One use case is in the bpf qdisc: the bpf qdisc
subsystem can ensure that skb->dev has the correct value.
The bpf qdisc subsystem can either inline the fix in the
epilogue or call another kfunc to handle (e.g. drop) the skb in
the epilogue. Another use case could be in bpf_tcp_ca.c to
enforce that snd_cwnd has a sane value (e.g. non-zero).
The existing .gen_prologue can call bpf helpers.
This set also allows the existing .gen_prologue and the new
.gen_epilogue to call kfuncs. Other than the skb drop
mentioned above, the bpf qdisc subsystem can call
kfuncs to enforce initializing / releasing resources (e.g.
qdisc_watchdog_init/cancel) in some of the qdisc_ops.
v2:
* Remove the RFC tag. Keep the point where .gen_epilogue is
called in the verifier relative to check_max_stack_depth().
This is consistent with the other extra stack_depth
usage like optimize_bpf_loop().
* Use __xlated check provided by the test_loader to
check the patched instructions after gen_pro/epilogue (Eduard).
* Added Patch 3 by Eduard (Thanks!).
Eduard Zingerman (1):
selftests/bpf: attach struct_ops maps before test prog runs
Martin KaFai Lau (7):
bpf: Add gen_epilogue to bpf_verifier_ops
bpf: Export bpf_base_func_proto
selftests/bpf: Test gen_prologue and gen_epilogue
selftests/bpf: Add tailcall epilogue test
bpf: Add module parameter to gen_prologue and gen_epilogue
bpf: Allow pro/epilogue to call kfunc
selftests/bpf: Add kfunc call test in gen_prologue and gen_epilogue
include/linux/bpf.h | 4 +-
include/linux/btf.h | 1 +
kernel/bpf/btf.c | 2 +-
kernel/bpf/cgroup.c | 3 +-
kernel/bpf/helpers.c | 1 +
kernel/bpf/verifier.c | 145 ++++++++++-
net/core/filter.c | 6 +-
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 228 ++++++++++++++++++
.../selftests/bpf/bpf_testmod/bpf_testmod.h | 11 +
.../bpf/bpf_testmod/bpf_testmod_kfunc.h | 6 +
.../selftests/bpf/prog_tests/pro_epilogue.c | 50 ++++
.../selftests/bpf/progs/epilogue_tailcall.c | 58 +++++
.../selftests/bpf/progs/pro_epilogue_kfunc.c | 174 +++++++++++++
.../bpf/progs/pro_epilogue_subprog.c | 143 +++++++++++
tools/testing/selftests/bpf/test_loader.c | 27 +++
15 files changed, 850 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/pro_epilogue.c
create mode 100644 tools/testing/selftests/bpf/progs/epilogue_tailcall.c
create mode 100644 tools/testing/selftests/bpf/progs/pro_epilogue_kfunc.c
create mode 100644 tools/testing/selftests/bpf/progs/pro_epilogue_subprog.c
--
2.43.5
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v2 bpf-next 1/8] bpf: Add gen_epilogue to bpf_verifier_ops
2024-08-21 23:34 [PATCH v2 bpf-next 0/8] bpf: Add gen_epilogue and allow kfunc call in pro/epilogue Martin KaFai Lau
@ 2024-08-21 23:34 ` Martin KaFai Lau
2024-08-22 0:22 ` Alexei Starovoitov
2024-08-21 23:34 ` [PATCH v2 bpf-next 2/8] bpf: Export bpf_base_func_proto Martin KaFai Lau
` (6 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Martin KaFai Lau @ 2024-08-21 23:34 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, kernel-team
From: Martin KaFai Lau <martin.lau@kernel.org>
This patch adds a .gen_epilogue to the bpf_verifier_ops. It is similar
to the existing .gen_prologue. Instead of allowing a subsystem
to run code at the beginning of a bpf prog, it allows the subsystem
to run code just before the bpf prog exits.
One use case is to allow the upcoming bpf qdisc to ensure that
skb->dev is the same as qdisc->dev_queue->dev. The bpf qdisc
struct_ops implementation could either fix it up or drop the skb.
Another use case could be in bpf_tcp_ca.c to enforce that snd_cwnd
has a sane value (e.g. non-zero).
The epilogue can do useful things (like checking skb->dev) only if it
can access the bpf prog's ctx. Unlike the prologue, r1 may not hold the
ctx pointer at exit time. This patch saves r1 on the stack if the
.gen_epilogue has returned some instructions in the "epilogue_buf".
The existing .gen_prologue is done in convert_ctx_accesses().
The new .gen_epilogue is also done in convert_ctx_accesses().
When it sees a (BPF_JMP | BPF_EXIT) instruction, the instruction is
patched with the earlier generated "epilogue_buf". The epilogue
patching is only done for the main prog.
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
include/linux/bpf.h | 2 ++
kernel/bpf/verifier.c | 34 ++++++++++++++++++++++++++++++++--
2 files changed, 34 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f0192c173ed8..8ee9d87c332a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -974,6 +974,8 @@ struct bpf_verifier_ops {
struct bpf_insn_access_aux *info);
int (*gen_prologue)(struct bpf_insn *insn, bool direct_write,
const struct bpf_prog *prog);
+ int (*gen_epilogue)(struct bpf_insn *insn, const struct bpf_prog *prog,
+ s16 ctx_stack_off);
int (*gen_ld_abs)(const struct bpf_insn *orig,
struct bpf_insn *insn_buf);
u32 (*convert_ctx_access)(enum bpf_access_type type,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index df3be12096cf..bbb655f0c7b5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19610,15 +19610,37 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
*/
static int convert_ctx_accesses(struct bpf_verifier_env *env)
{
+ struct bpf_subprog_info *subprogs = env->subprog_info;
const struct bpf_verifier_ops *ops = env->ops;
- int i, cnt, size, ctx_field_size, delta = 0;
+ int i, cnt, size, ctx_field_size, delta = 0, epilogue_cnt = 0;
const int insn_cnt = env->prog->len;
- struct bpf_insn insn_buf[16], *insn;
+ struct bpf_insn insn_buf[16], epilogue_buf[16], *insn;
u32 target_size, size_default, off;
struct bpf_prog *new_prog;
enum bpf_access_type type;
bool is_narrower_load;
+ if (ops->gen_epilogue) {
+ epilogue_cnt = ops->gen_epilogue(epilogue_buf, env->prog,
+ -(subprogs[0].stack_depth + 8));
+ if (epilogue_cnt >= ARRAY_SIZE(epilogue_buf)) {
+ verbose(env, "bpf verifier is misconfigured\n");
+ return -EINVAL;
+ } else if (epilogue_cnt) {
+ /* Save the ARG_PTR_TO_CTX for the epilogue to use */
+ cnt = 0;
+ subprogs[0].stack_depth += 8;
+ insn_buf[cnt++] = BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_1,
+ -subprogs[0].stack_depth);
+ insn_buf[cnt++] = env->prog->insnsi[0];
+ new_prog = bpf_patch_insn_data(env, 0, insn_buf, cnt);
+ if (!new_prog)
+ return -ENOMEM;
+ env->prog = new_prog;
+ delta += cnt - 1;
+ }
+ }
+
if (ops->gen_prologue || env->seen_direct_write) {
if (!ops->gen_prologue) {
verbose(env, "bpf verifier is misconfigured\n");
@@ -19671,6 +19693,13 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
insn->code = BPF_STX | BPF_PROBE_ATOMIC | BPF_SIZE(insn->code);
env->prog->aux->num_exentries++;
continue;
+ } else if (insn->code == (BPF_JMP | BPF_EXIT) &&
+ epilogue_cnt &&
+ i + delta < subprogs[1].start) {
+ /* Generate epilogue for the main prog */
+ memcpy(insn_buf, epilogue_buf, sizeof(epilogue_buf));
+ cnt = epilogue_cnt;
+ goto patch_insn_buf;
} else {
continue;
}
@@ -19807,6 +19836,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
insn->dst_reg, insn->dst_reg,
size * 8, 0);
+patch_insn_buf:
new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
if (!new_prog)
return -ENOMEM;
--
2.43.5
* [PATCH v2 bpf-next 2/8] bpf: Export bpf_base_func_proto
2024-08-21 23:34 [PATCH v2 bpf-next 0/8] bpf: Add gen_epilogue and allow kfunc call in pro/epilogue Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 1/8] bpf: Add gen_epilogue to bpf_verifier_ops Martin KaFai Lau
@ 2024-08-21 23:34 ` Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 3/8] selftests/bpf: attach struct_ops maps before test prog runs Martin KaFai Lau
` (5 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2024-08-21 23:34 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, kernel-team
From: Martin KaFai Lau <martin.lau@kernel.org>
The bpf_testmod needs to use the bpf_tail_call helper in
a later selftest patch. This patch adds EXPORT_SYMBOL_GPL
to the bpf_base_func_proto.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
kernel/bpf/helpers.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 12e3aa40b180..84c2034a2f53 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2034,6 +2034,7 @@ bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return NULL;
}
}
+EXPORT_SYMBOL_GPL(bpf_base_func_proto);
void bpf_list_head_free(const struct btf_field *field, void *list_head,
struct bpf_spin_lock *spin_lock)
--
2.43.5
* [PATCH v2 bpf-next 3/8] selftests/bpf: attach struct_ops maps before test prog runs
2024-08-21 23:34 [PATCH v2 bpf-next 0/8] bpf: Add gen_epilogue and allow kfunc call in pro/epilogue Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 1/8] bpf: Add gen_epilogue to bpf_verifier_ops Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 2/8] bpf: Export bpf_base_func_proto Martin KaFai Lau
@ 2024-08-21 23:34 ` Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 4/8] selftests/bpf: Test gen_prologue and gen_epilogue Martin KaFai Lau
` (4 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2024-08-21 23:34 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, kernel-team
From: Eduard Zingerman <eddyz87@gmail.com>
In test_loader based tests, do bpf_map__attach_struct_ops()
before the call to bpf_prog_test_run_opts() in order to trigger
bpf_struct_ops->reg() callbacks on the kernel side.
This allows using the __retval macro for struct_ops tests.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
tools/testing/selftests/bpf/test_loader.c | 27 +++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
index 12b0c41e8d64..01feba554a41 100644
--- a/tools/testing/selftests/bpf/test_loader.c
+++ b/tools/testing/selftests/bpf/test_loader.c
@@ -729,11 +729,13 @@ void run_subtest(struct test_loader *tester,
{
struct test_subspec *subspec = unpriv ? &spec->unpriv : &spec->priv;
struct bpf_program *tprog = NULL, *tprog_iter;
+ struct bpf_link *link, *links[32] = {};
struct test_spec *spec_iter;
struct cap_state caps = {};
struct bpf_object *tobj;
struct bpf_map *map;
int retval, err, i;
+ int links_cnt = 0;
bool should_load;
if (!test__start_subtest(subspec->name))
@@ -823,6 +825,26 @@ void run_subtest(struct test_loader *tester,
if (restore_capabilities(&caps))
goto tobj_cleanup;
+ /* Do bpf_map__attach_struct_ops() for each struct_ops map.
+ * This should trigger bpf_struct_ops->reg callback on kernel side.
+ */
+ bpf_object__for_each_map(map, tobj) {
+ if (!bpf_map__autocreate(map) ||
+ bpf_map__type(map) != BPF_MAP_TYPE_STRUCT_OPS)
+ continue;
+ if (links_cnt >= ARRAY_SIZE(links)) {
+ PRINT_FAIL("too many struct_ops maps");
+ goto tobj_cleanup;
+ }
+ link = bpf_map__attach_struct_ops(map);
+ if (!link) {
+ PRINT_FAIL("bpf_map__attach_struct_ops failed for map %s: err=%d\n",
+ bpf_map__name(map), -errno);
+ goto tobj_cleanup;
+ }
+ links[links_cnt++] = link;
+ }
+
if (tester->pre_execution_cb) {
err = tester->pre_execution_cb(tobj);
if (err) {
@@ -837,9 +859,14 @@ void run_subtest(struct test_loader *tester,
PRINT_FAIL("Unexpected retval: %d != %d\n", retval, subspec->retval);
goto tobj_cleanup;
}
+ /* redo bpf_map__attach_struct_ops for each test */
+ while (links_cnt > 0)
+ bpf_link__destroy(links[--links_cnt]);
}
tobj_cleanup:
+ while (links_cnt > 0)
+ bpf_link__destroy(links[--links_cnt]);
bpf_object__close(tobj);
subtest_cleanup:
test__end_subtest();
--
2.43.5
* [PATCH v2 bpf-next 4/8] selftests/bpf: Test gen_prologue and gen_epilogue
2024-08-21 23:34 [PATCH v2 bpf-next 0/8] bpf: Add gen_epilogue and allow kfunc call in pro/epilogue Martin KaFai Lau
` (2 preceding siblings ...)
2024-08-21 23:34 ` [PATCH v2 bpf-next 3/8] selftests/bpf: attach struct_ops maps before test prog runs Martin KaFai Lau
@ 2024-08-21 23:34 ` Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 5/8] selftests/bpf: Add tailcall epilogue test Martin KaFai Lau
` (3 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2024-08-21 23:34 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, kernel-team
From: Martin KaFai Lau <martin.lau@kernel.org>
This patch adds a new struct_ops "bpf_testmod_st_ops" in bpf_testmod.
The ops of the bpf_testmod_st_ops are triggered by the new kfuncs
"bpf_kfunc_st_ops_test_*logue". These new kfuncs are
primarily used by the SEC("syscall") program. The test triggering
sequence is like:
SEC("syscall")
syscall_prologue_subprog(struct st_ops_args *args)
bpf_kfunc_st_op_test_prologue(args)
st_ops->test_prologue(args)
.gen_prologue adds 1000 to args->a
.gen_epilogue adds 10000 to args->a
.gen_epilogue will also set the r0 to 2 * args->a.
The .gen_prologue and .gen_epilogue of the bpf_testmod_st_ops
check the prog->aux->attach_func_name to decide whether they
need to generate code.
The main programs of the pro_epilogue_subprog.c will call a subprog()
which does "args->a += 1".
The main programs of the pro_epilogue_kfunc.c will call a
new kfunc bpf_kfunc_st_ops_inc10 which does "args->a += 10".
This patch uses the test_loader infra to check the __xlated
instructions patched after gen_prologue and/or gen_epilogue.
The __xlated check is based on Eduard's example (Thanks!) in v1.
args->a is returned by the struct_ops prog (either the main prog
or the epilogue). Thus, the __retval of the SEC("syscall") prog
is checked. For example, when triggering the op in
'SEC("struct_ops/test_epilogue_subprog") int test_epilogue_subprog',
the expected args->a is 1 (subprog call) + 10000 (.gen_epilogue) = 10001,
and the expected return value is 2 * 10001 (.gen_epilogue) = 20002.
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 190 ++++++++++++++++++
.../selftests/bpf/bpf_testmod/bpf_testmod.h | 11 +
.../bpf/bpf_testmod/bpf_testmod_kfunc.h | 6 +
.../selftests/bpf/prog_tests/pro_epilogue.c | 12 ++
.../selftests/bpf/progs/pro_epilogue_kfunc.c | 156 ++++++++++++++
.../bpf/progs/pro_epilogue_subprog.c | 125 ++++++++++++
6 files changed, 500 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/pro_epilogue.c
create mode 100644 tools/testing/selftests/bpf/progs/pro_epilogue_kfunc.c
create mode 100644 tools/testing/selftests/bpf/progs/pro_epilogue_subprog.c
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 3687a40b61c6..7194330bdefc 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -17,6 +17,7 @@
#include <linux/in.h>
#include <linux/in6.h>
#include <linux/un.h>
+#include <linux/filter.h>
#include <net/sock.h>
#include <linux/namei.h>
#include "bpf_testmod.h"
@@ -920,6 +921,51 @@ __bpf_kfunc int bpf_kfunc_call_kernel_getpeername(struct addr_args *args)
return err;
}
+static DEFINE_MUTEX(st_ops_mutex);
+static struct bpf_testmod_st_ops *st_ops;
+
+__bpf_kfunc int bpf_kfunc_st_ops_test_prologue(struct st_ops_args *args)
+{
+ int ret = -1;
+
+ mutex_lock(&st_ops_mutex);
+ if (st_ops && st_ops->test_prologue)
+ ret = st_ops->test_prologue(args);
+ mutex_unlock(&st_ops_mutex);
+
+ return ret;
+}
+
+__bpf_kfunc int bpf_kfunc_st_ops_test_epilogue(struct st_ops_args *args)
+{
+ int ret = -1;
+
+ mutex_lock(&st_ops_mutex);
+ if (st_ops && st_ops->test_epilogue)
+ ret = st_ops->test_epilogue(args);
+ mutex_unlock(&st_ops_mutex);
+
+ return ret;
+}
+
+__bpf_kfunc int bpf_kfunc_st_ops_test_pro_epilogue(struct st_ops_args *args)
+{
+ int ret = -1;
+
+ mutex_lock(&st_ops_mutex);
+ if (st_ops && st_ops->test_pro_epilogue)
+ ret = st_ops->test_pro_epilogue(args);
+ mutex_unlock(&st_ops_mutex);
+
+ return ret;
+}
+
+__bpf_kfunc int bpf_kfunc_st_ops_inc10(struct st_ops_args *args)
+{
+ args->a += 10;
+ return args->a;
+}
+
BTF_KFUNCS_START(bpf_testmod_check_kfunc_ids)
BTF_ID_FLAGS(func, bpf_testmod_test_mod_kfunc)
BTF_ID_FLAGS(func, bpf_kfunc_call_test1)
@@ -956,6 +1002,10 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_kernel_sendmsg, KF_SLEEPABLE)
BTF_ID_FLAGS(func, bpf_kfunc_call_sock_sendmsg, KF_SLEEPABLE)
BTF_ID_FLAGS(func, bpf_kfunc_call_kernel_getsockname, KF_SLEEPABLE)
BTF_ID_FLAGS(func, bpf_kfunc_call_kernel_getpeername, KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_kfunc_st_ops_test_prologue, KF_TRUSTED_ARGS | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_kfunc_st_ops_test_epilogue, KF_TRUSTED_ARGS | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_kfunc_st_ops_test_pro_epilogue, KF_TRUSTED_ARGS | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_kfunc_st_ops_inc10, KF_TRUSTED_ARGS)
BTF_KFUNCS_END(bpf_testmod_check_kfunc_ids)
static int bpf_testmod_ops_init(struct btf *btf)
@@ -1075,6 +1125,144 @@ struct bpf_struct_ops bpf_testmod_ops2 = {
.owner = THIS_MODULE,
};
+static int bpf_test_mod_st_ops__test_prologue(struct st_ops_args *args)
+{
+ return 0;
+}
+
+static int bpf_test_mod_st_ops__test_epilogue(struct st_ops_args *args)
+{
+ return 0;
+}
+
+static int bpf_test_mod_st_ops__test_pro_epilogue(struct st_ops_args *args)
+{
+ return 0;
+}
+
+static int st_ops_gen_prologue(struct bpf_insn *insn_buf, bool direct_write,
+ const struct bpf_prog *prog)
+{
+ struct bpf_insn *insn = insn_buf;
+
+ if (strcmp(prog->aux->attach_func_name, "test_prologue") &&
+ strcmp(prog->aux->attach_func_name, "test_pro_epilogue"))
+ return 0;
+
+ /* r6 = r1[0]; // r6 will be "struct st_ops *args". r1 is "u64 *ctx".
+ * r7 = r6->a;
+ * r7 += 1000;
+ * r6->a = r7;
+ */
+ *insn++ = BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0);
+ *insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6, offsetof(struct st_ops_args, a));
+ *insn++ = BPF_ALU32_IMM(BPF_ADD, BPF_REG_7, 1000);
+ *insn++ = BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7, offsetof(struct st_ops_args, a));
+ *insn++ = prog->insnsi[0];
+
+ return insn - insn_buf;
+}
+
+static int st_ops_gen_epilogue(struct bpf_insn *insn_buf, const struct bpf_prog *prog,
+ s16 ctx_stack_off)
+{
+ struct bpf_insn *insn = insn_buf;
+
+ if (strcmp(prog->aux->attach_func_name, "test_epilogue") &&
+ strcmp(prog->aux->attach_func_name, "test_pro_epilogue"))
+ return 0;
+
+ /* r1 = stack[ctx_stack_off]; // r1 will be "u64 *ctx"
+ * r1 = r1[0]; // r1 will be "struct st_ops *args"
+ * r6 = r1->a;
+ * r6 += 10000;
+ * r1->a = r6;
+ * r0 = r6;
+ * r0 *= 2;
+ * BPF_EXIT;
+ */
+ *insn++ = BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_FP, ctx_stack_off);
+ *insn++ = BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0);
+ *insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1, offsetof(struct st_ops_args, a));
+ *insn++ = BPF_ALU32_IMM(BPF_ADD, BPF_REG_6, 10000);
+ *insn++ = BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6, offsetof(struct st_ops_args, a));
+ *insn++ = BPF_MOV32_REG(BPF_REG_0, BPF_REG_6);
+ *insn++ = BPF_ALU32_IMM(BPF_MUL, BPF_REG_0, 2);
+ *insn++ = BPF_EXIT_INSN();
+
+ return insn - insn_buf;
+}
+
+static int st_ops_btf_struct_access(struct bpf_verifier_log *log,
+ const struct bpf_reg_state *reg,
+ int off, int size)
+{
+ if (off < 0 || off + size > sizeof(struct st_ops_args))
+ return -EACCES;
+ return 0;
+}
+
+static const struct bpf_verifier_ops st_ops_verifier_ops = {
+ .is_valid_access = bpf_testmod_ops_is_valid_access,
+ .btf_struct_access = st_ops_btf_struct_access,
+ .gen_prologue = st_ops_gen_prologue,
+ .gen_epilogue = st_ops_gen_epilogue,
+ .get_func_proto = bpf_base_func_proto,
+};
+
+static struct bpf_testmod_st_ops st_ops_cfi_stubs = {
+ .test_prologue = bpf_test_mod_st_ops__test_prologue,
+ .test_epilogue = bpf_test_mod_st_ops__test_epilogue,
+ .test_pro_epilogue = bpf_test_mod_st_ops__test_pro_epilogue,
+};
+
+static int st_ops_reg(void *kdata, struct bpf_link *link)
+{
+ int err = 0;
+
+ mutex_lock(&st_ops_mutex);
+ if (st_ops) {
+ pr_err("st_ops has already been registered\n");
+ err = -EEXIST;
+ goto unlock;
+ }
+ st_ops = kdata;
+
+unlock:
+ mutex_unlock(&st_ops_mutex);
+ return err;
+}
+
+static void st_ops_unreg(void *kdata, struct bpf_link *link)
+{
+ mutex_lock(&st_ops_mutex);
+ st_ops = NULL;
+ mutex_unlock(&st_ops_mutex);
+}
+
+static int st_ops_init(struct btf *btf)
+{
+ return 0;
+}
+
+static int st_ops_init_member(const struct btf_type *t,
+ const struct btf_member *member,
+ void *kdata, const void *udata)
+{
+ return 0;
+}
+
+static struct bpf_struct_ops testmod_st_ops = {
+ .verifier_ops = &st_ops_verifier_ops,
+ .init = st_ops_init,
+ .init_member = st_ops_init_member,
+ .reg = st_ops_reg,
+ .unreg = st_ops_unreg,
+ .cfi_stubs = &st_ops_cfi_stubs,
+ .name = "bpf_testmod_st_ops",
+ .owner = THIS_MODULE,
+};
+
extern int bpf_fentry_test1(int a);
static int bpf_testmod_init(void)
@@ -1092,8 +1280,10 @@ static int bpf_testmod_init(void)
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_testmod_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_testmod_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_testmod_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_testmod_kfunc_set);
ret = ret ?: register_bpf_struct_ops(&bpf_bpf_testmod_ops, bpf_testmod_ops);
ret = ret ?: register_bpf_struct_ops(&bpf_testmod_ops2, bpf_testmod_ops2);
+ ret = ret ?: register_bpf_struct_ops(&testmod_st_ops, bpf_testmod_st_ops);
ret = ret ?: register_btf_id_dtor_kfuncs(bpf_testmod_dtors,
ARRAY_SIZE(bpf_testmod_dtors),
THIS_MODULE);
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
index fe0d402b0d65..3241a9d796ed 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
@@ -94,4 +94,15 @@ struct bpf_testmod_ops2 {
int (*test_1)(void);
};
+struct st_ops_args {
+ int a;
+};
+
+struct bpf_testmod_st_ops {
+ int (*test_prologue)(struct st_ops_args *args);
+ int (*test_epilogue)(struct st_ops_args *args);
+ int (*test_pro_epilogue)(struct st_ops_args *args);
+ struct module *owner;
+};
+
#endif /* _BPF_TESTMOD_H */
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h
index e587a79f2239..0df429a0edaa 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h
@@ -144,4 +144,10 @@ void bpf_kfunc_dynptr_test(struct bpf_dynptr *ptr, struct bpf_dynptr *ptr__nulla
struct bpf_testmod_ctx *bpf_testmod_ctx_create(int *err) __ksym;
void bpf_testmod_ctx_release(struct bpf_testmod_ctx *ctx) __ksym;
+struct st_ops_args;
+int bpf_kfunc_st_ops_test_prologue(struct st_ops_args *args) __ksym;
+int bpf_kfunc_st_ops_test_epilogue(struct st_ops_args *args) __ksym;
+int bpf_kfunc_st_ops_test_pro_epilogue(struct st_ops_args *args) __ksym;
+int bpf_kfunc_st_ops_inc10(struct st_ops_args *args) __ksym;
+
#endif /* _BPF_TESTMOD_KFUNC_H */
diff --git a/tools/testing/selftests/bpf/prog_tests/pro_epilogue.c b/tools/testing/selftests/bpf/prog_tests/pro_epilogue.c
new file mode 100644
index 000000000000..69e4a5a1756d
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/pro_epilogue.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#include <test_progs.h>
+#include "pro_epilogue_subprog.skel.h"
+#include "pro_epilogue_kfunc.skel.h"
+
+void test_pro_epilogue(void)
+{
+ RUN_TESTS(pro_epilogue_subprog);
+ RUN_TESTS(pro_epilogue_kfunc);
+}
diff --git a/tools/testing/selftests/bpf/progs/pro_epilogue_kfunc.c b/tools/testing/selftests/bpf/progs/pro_epilogue_kfunc.c
new file mode 100644
index 000000000000..7d1124cf4942
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/pro_epilogue_kfunc.c
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "../bpf_testmod/bpf_testmod.h"
+#include "../bpf_testmod/bpf_testmod_kfunc.h"
+
+char _license[] SEC("license") = "GPL";
+
+void __kfunc_btf_root(void)
+{
+ struct st_ops_args args = {};
+
+ bpf_kfunc_st_ops_inc10(&args);
+}
+
+static __noinline __used int subprog(struct st_ops_args *args)
+{
+ args->a += 1;
+ return args->a;
+}
+
+__success
+/* prologue */
+__xlated("0: r6 = *(u64 *)(r1 +0)")
+__xlated("1: r7 = *(u32 *)(r6 +0)")
+__xlated("2: w7 += 1000")
+__xlated("3: *(u32 *)(r6 +0) = r7")
+/* main prog */
+__xlated("4: r1 = *(u64 *)(r1 +0)")
+__xlated("5: r6 = r1")
+__xlated("6: call kernel-function")
+__xlated("7: r1 = r6")
+__xlated("8: call pc+1")
+__xlated("9: exit")
+SEC("struct_ops/test_prologue_kfunc")
+__naked int test_prologue_kfunc(void)
+{
+ asm volatile (
+ "r1 = *(u64 *)(r1 +0);"
+ "r6 = r1;"
+ "call %[bpf_kfunc_st_ops_inc10];"
+ "r1 = r6;"
+ "call subprog;"
+ "exit;"
+ :
+ : __imm(bpf_kfunc_st_ops_inc10)
+ : __clobber_all);
+}
+
+__success
+/* save __u64 *ctx to stack */
+__xlated("0: *(u64 *)(r10 -8) = r1")
+/* main prog */
+__xlated("1: r1 = *(u64 *)(r1 +0)")
+__xlated("2: r6 = r1")
+__xlated("3: call kernel-function")
+__xlated("4: r1 = r6")
+__xlated("5: call pc+")
+/* epilogue */
+__xlated("6: r1 = *(u64 *)(r10 -8)")
+__xlated("7: r1 = *(u64 *)(r1 +0)")
+__xlated("8: r6 = *(u32 *)(r1 +0)")
+__xlated("9: w6 += 10000")
+__xlated("10: *(u32 *)(r1 +0) = r6")
+__xlated("11: w0 = w6")
+__xlated("12: w0 *= 2")
+__xlated("13: exit")
+SEC("struct_ops/test_epilogue_kfunc")
+__naked int test_epilogue_kfunc(void)
+{
+ asm volatile (
+ "r1 = *(u64 *)(r1 +0);"
+ "r6 = r1;"
+ "call %[bpf_kfunc_st_ops_inc10];"
+ "r1 = r6;"
+ "call subprog;"
+ "exit;"
+ :
+ : __imm(bpf_kfunc_st_ops_inc10)
+ : __clobber_all);
+}
+
+__success
+/* prologue */
+__xlated("0: r6 = *(u64 *)(r1 +0)")
+__xlated("1: r7 = *(u32 *)(r6 +0)")
+__xlated("2: w7 += 1000")
+__xlated("3: *(u32 *)(r6 +0) = r7")
+/* save __u64 *ctx to stack */
+__xlated("4: *(u64 *)(r10 -8) = r1")
+/* main prog */
+__xlated("5: r1 = *(u64 *)(r1 +0)")
+__xlated("6: r6 = r1")
+__xlated("7: call kernel-function")
+__xlated("8: r1 = r6")
+__xlated("9: call pc+")
+/* epilogue */
+__xlated("10: r1 = *(u64 *)(r10 -8)")
+__xlated("11: r1 = *(u64 *)(r1 +0)")
+__xlated("12: r6 = *(u32 *)(r1 +0)")
+__xlated("13: w6 += 10000")
+__xlated("14: *(u32 *)(r1 +0) = r6")
+__xlated("15: w0 = w6")
+__xlated("16: w0 *= 2")
+__xlated("17: exit")
+SEC("struct_ops/test_pro_epilogue_kfunc")
+__naked int test_pro_epilogue_kfunc(void)
+{
+ asm volatile (
+ "r1 = *(u64 *)(r1 +0);"
+ "r6 = r1;"
+ "call %[bpf_kfunc_st_ops_inc10];"
+ "r1 = r6;"
+ "call subprog;"
+ "exit;"
+ :
+ : __imm(bpf_kfunc_st_ops_inc10)
+ : __clobber_all);
+}
+
+SEC("syscall")
+__retval(1011) /* PROLOGUE_A [1000] + KFUNC_INC10 + SUBPROG_A [1] */
+int syscall_prologue_kfunc(void *ctx)
+{
+ struct st_ops_args args = {};
+
+ return bpf_kfunc_st_ops_test_prologue(&args);
+}
+
+SEC("syscall")
+__retval(20022) /* (KFUNC_INC10 + SUBPROG_A [1] + EPILOGUE_A [10000]) * 2 */
+int syscall_epilogue_kfunc(void *ctx)
+{
+ struct st_ops_args args = {};
+
+ return bpf_kfunc_st_ops_test_epilogue(&args);
+}
+
+SEC("syscall")
+__retval(22022) /* (PROLOGUE_A [1000] + KFUNC_INC10 + SUBPROG_A [1] + EPILOGUE_A [10000]) * 2 */
+int syscall_pro_epilogue_kfunc(void *ctx)
+{
+ struct st_ops_args args = {};
+
+ return bpf_kfunc_st_ops_test_pro_epilogue(&args);
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_st_ops pro_epilogue_kfunc = {
+ .test_prologue = (void *)test_prologue_kfunc,
+ .test_epilogue = (void *)test_epilogue_kfunc,
+ .test_pro_epilogue = (void *)test_pro_epilogue_kfunc,
+};
diff --git a/tools/testing/selftests/bpf/progs/pro_epilogue_subprog.c b/tools/testing/selftests/bpf/progs/pro_epilogue_subprog.c
new file mode 100644
index 000000000000..c91b1bf30e37
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/pro_epilogue_subprog.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "../bpf_testmod/bpf_testmod.h"
+#include "../bpf_testmod/bpf_testmod_kfunc.h"
+
+char _license[] SEC("license") = "GPL";
+
+static __noinline __used int subprog(struct st_ops_args *args)
+{
+ args->a += 1;
+ return args->a;
+}
+
+__success
+/* prologue */
+__xlated("0: r6 = *(u64 *)(r1 +0)")
+__xlated("1: r7 = *(u32 *)(r6 +0)")
+__xlated("2: w7 += 1000")
+__xlated("3: *(u32 *)(r6 +0) = r7")
+/* main prog */
+__xlated("4: r1 = *(u64 *)(r1 +0)")
+__xlated("5: call pc+1")
+__xlated("6: exit")
+SEC("struct_ops/test_prologue_subprog")
+__naked int test_prologue_subprog(void)
+{
+ asm volatile (
+ "r1 = *(u64 *)(r1 +0);"
+ "call subprog;"
+ "exit;"
+ ::: __clobber_all);
+}
+
+__success
+/* save __u64 *ctx to stack */
+__xlated("0: *(u64 *)(r10 -8) = r1")
+/* main prog */
+__xlated("1: r1 = *(u64 *)(r1 +0)")
+__xlated("2: call pc+")
+/* epilogue */
+__xlated("3: r1 = *(u64 *)(r10 -8)")
+__xlated("4: r1 = *(u64 *)(r1 +0)")
+__xlated("5: r6 = *(u32 *)(r1 +0)")
+__xlated("6: w6 += 10000")
+__xlated("7: *(u32 *)(r1 +0) = r6")
+__xlated("8: w0 = w6")
+__xlated("9: w0 *= 2")
+__xlated("10: exit")
+SEC("struct_ops/test_epilogue_subprog")
+__naked int test_epilogue_subprog(void)
+{
+ asm volatile (
+ "r1 = *(u64 *)(r1 +0);"
+ "call subprog;"
+ "exit;"
+ ::: __clobber_all);
+}
+
+__success
+/* prologue */
+__xlated("0: r6 = *(u64 *)(r1 +0)")
+__xlated("1: r7 = *(u32 *)(r6 +0)")
+__xlated("2: w7 += 1000")
+__xlated("3: *(u32 *)(r6 +0) = r7")
+/* save __u64 *ctx to stack */
+__xlated("4: *(u64 *)(r10 -8) = r1")
+/* main prog */
+__xlated("5: r1 = *(u64 *)(r1 +0)")
+__xlated("6: call pc+")
+/* epilogue */
+__xlated("7: r1 = *(u64 *)(r10 -8)")
+__xlated("8: r1 = *(u64 *)(r1 +0)")
+__xlated("9: r6 = *(u32 *)(r1 +0)")
+__xlated("10: w6 += 10000")
+__xlated("11: *(u32 *)(r1 +0) = r6")
+__xlated("12: w0 = w6")
+__xlated("13: w0 *= 2")
+__xlated("14: exit")
+SEC("struct_ops/test_pro_epilogue_subprog")
+__naked int test_pro_epilogue_subprog(void)
+{
+ asm volatile (
+ "r1 = *(u64 *)(r1 +0);"
+ "call subprog;"
+ "exit;"
+ ::: __clobber_all);
+}
+
+SEC("syscall")
+__retval(1001) /* PROLOGUE_A [1000] + SUBPROG_A [1] */
+int syscall_prologue_subprog(void *ctx)
+{
+ struct st_ops_args args = {};
+
+ return bpf_kfunc_st_ops_test_prologue(&args);
+}
+
+SEC("syscall")
+__retval(20002) /* (SUBPROG_A [1] + EPILOGUE_A [10000]) * 2 */
+int syscall_epilogue_subprog(void *ctx)
+{
+ struct st_ops_args args = {};
+
+ return bpf_kfunc_st_ops_test_epilogue(&args);
+}
+
+SEC("syscall")
+__retval(22002) /* (PROLOGUE_A [1000] + SUBPROG_A [1] + EPILOGUE_A [10000]) * 2 */
+int syscall_pro_epilogue_subprog(void *ctx)
+{
+ struct st_ops_args args = {};
+
+ return bpf_kfunc_st_ops_test_pro_epilogue(&args);
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_st_ops pro_epilogue_subprog = {
+ .test_prologue = (void *)test_prologue_subprog,
+ .test_epilogue = (void *)test_epilogue_subprog,
+ .test_pro_epilogue = (void *)test_pro_epilogue_subprog,
+};
--
2.43.5
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 bpf-next 5/8] selftests/bpf: Add tailcall epilogue test
2024-08-21 23:34 [PATCH v2 bpf-next 0/8] bpf: Add gen_epilogue and allow kfunc call in pro/epilogue Martin KaFai Lau
` (3 preceding siblings ...)
2024-08-21 23:34 ` [PATCH v2 bpf-next 4/8] selftests/bpf: Test gen_prologue and gen_epilogue Martin KaFai Lau
@ 2024-08-21 23:34 ` Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 6/8] bpf: Add module parameter to gen_prologue and gen_epilogue Martin KaFai Lau
` (2 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2024-08-21 23:34 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, kernel-team
From: Martin KaFai Lau <martin.lau@kernel.org>
This patch adds a gen_epilogue test for a main prog that
uses a bpf_tail_call.
A non-test_loader test is used. The tailcall target program,
"test_epilogue_subprog", needs to be used in a struct_ops map
before it can be loaded. Another struct_ops map is also needed
to host the actual "test_epilogue_tailcall" struct_ops program
that does the bpf_tail_call. The earlier test_loader patch
will attach all struct_ops maps, but bpf_testmod.c does
not support more than one attached struct_ops.
The earlier patch used the test_loader, which already covers
checking the patched pro/epilogue instructions. This is done
by the __xlated tag.
This patch instead uses a regular skel load and a syscall test
run to exercise the tailcall, which also allows passing the
"struct st_ops_args *args" directly as ctx_in to the
SEC("syscall") program.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
.../selftests/bpf/prog_tests/pro_epilogue.c | 38 ++++++++++++
.../selftests/bpf/progs/epilogue_tailcall.c | 58 +++++++++++++++++++
2 files changed, 96 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/epilogue_tailcall.c
diff --git a/tools/testing/selftests/bpf/prog_tests/pro_epilogue.c b/tools/testing/selftests/bpf/prog_tests/pro_epilogue.c
index 69e4a5a1756d..98de677c55a9 100644
--- a/tools/testing/selftests/bpf/prog_tests/pro_epilogue.c
+++ b/tools/testing/selftests/bpf/prog_tests/pro_epilogue.c
@@ -4,9 +4,47 @@
#include <test_progs.h>
#include "pro_epilogue_subprog.skel.h"
#include "pro_epilogue_kfunc.skel.h"
+#include "epilogue_tailcall.skel.h"
+
+struct st_ops_args {
+ int a;
+};
+
+static void test_tailcall(void)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+ struct epilogue_tailcall *skel;
+ struct st_ops_args args;
+ int err, prog_fd;
+
+ skel = epilogue_tailcall__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "epilogue_tailcall__open_and_load"))
+ return;
+
+ topts.ctx_in = &args;
+ topts.ctx_size_in = sizeof(args);
+
+ skel->links.epilogue_tailcall =
+ bpf_map__attach_struct_ops(skel->maps.epilogue_tailcall);
+ if (!ASSERT_OK_PTR(skel->links.epilogue_tailcall, "attach_struct_ops"))
+ goto done;
+
+ /* tailcall prog + gen_epilogue */
+ memset(&args, 0, sizeof(args));
+ prog_fd = bpf_program__fd(skel->progs.syscall_epilogue_tailcall);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "bpf_prog_test_run_opts");
+ ASSERT_EQ(args.a, 10001, "args.a");
+ ASSERT_EQ(topts.retval, 10001 * 2, "topts.retval");
+
+done:
+ epilogue_tailcall__destroy(skel);
+}
void test_pro_epilogue(void)
{
RUN_TESTS(pro_epilogue_subprog);
RUN_TESTS(pro_epilogue_kfunc);
+ if (test__start_subtest("tailcall"))
+ test_tailcall();
}
diff --git a/tools/testing/selftests/bpf/progs/epilogue_tailcall.c b/tools/testing/selftests/bpf/progs/epilogue_tailcall.c
new file mode 100644
index 000000000000..7275dd594de0
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/epilogue_tailcall.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "../bpf_testmod/bpf_testmod.h"
+#include "../bpf_testmod/bpf_testmod_kfunc.h"
+
+char _license[] SEC("license") = "GPL";
+
+static __noinline __used int subprog(struct st_ops_args *args)
+{
+ args->a += 1;
+ return args->a;
+}
+
+SEC("struct_ops/test_epilogue_subprog")
+int BPF_PROG(test_epilogue_subprog, struct st_ops_args *args)
+{
+ subprog(args);
+ return args->a;
+}
+
+struct {
+ __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+ __uint(max_entries, 1);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(__u32));
+ __array(values, void (void));
+} epilogue_map SEC(".maps") = {
+ .values = {
+ [0] = (void *)&test_epilogue_subprog,
+ }
+};
+
+SEC("struct_ops/test_epilogue_tailcall")
+int test_epilogue_tailcall(unsigned long long *ctx)
+{
+ bpf_tail_call(ctx, &epilogue_map, 0);
+ return 0;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_st_ops epilogue_tailcall = {
+ .test_epilogue = (void *)test_epilogue_tailcall,
+};
+
+SEC(".struct_ops.link")
+struct bpf_testmod_st_ops epilogue_subprog = {
+ .test_epilogue = (void *)test_epilogue_subprog,
+};
+
+SEC("syscall")
+int syscall_epilogue_tailcall(struct st_ops_args *args)
+{
+ return bpf_kfunc_st_ops_test_epilogue(args);
+}
--
2.43.5
* [PATCH v2 bpf-next 6/8] bpf: Add module parameter to gen_prologue and gen_epilogue
2024-08-21 23:34 [PATCH v2 bpf-next 0/8] bpf: Add gen_epilogue and allow kfunc call in pro/epilogue Martin KaFai Lau
` (4 preceding siblings ...)
2024-08-21 23:34 ` [PATCH v2 bpf-next 5/8] selftests/bpf: Add tailcall epilogue test Martin KaFai Lau
@ 2024-08-21 23:34 ` Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 7/8] bpf: Allow pro/epilogue to call kfunc Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 8/8] selftests/bpf: Add kfunc call test in gen_prologue and gen_epilogue Martin KaFai Lau
7 siblings, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2024-08-21 23:34 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, kernel-team
From: Martin KaFai Lau <martin.lau@kernel.org>
This patch adds a "struct module **module" arg to the .gen_prologue
and .gen_epilogue. This will allow the .gen_pro/epilogue to
make kfunc calls because the verifier needs to know the kfunc's BTF.
The next patch will figure out the kfunc's BTF from the module.
It also exposes the "btf_get_module_btf" function to help
figure out the btf of a module.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
include/linux/bpf.h | 4 ++--
include/linux/btf.h | 1 +
kernel/bpf/btf.c | 2 +-
kernel/bpf/cgroup.c | 3 ++-
kernel/bpf/verifier.c | 4 ++--
net/core/filter.c | 6 +++---
tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c | 4 ++--
7 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8ee9d87c332a..6d97e57c7801 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -973,9 +973,9 @@ struct bpf_verifier_ops {
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info);
int (*gen_prologue)(struct bpf_insn *insn, bool direct_write,
- const struct bpf_prog *prog);
+ const struct bpf_prog *prog, struct module **module);
int (*gen_epilogue)(struct bpf_insn *insn, const struct bpf_prog *prog,
- s16 ctx_stack_off);
+ s16 ctx_stack_off, struct module **module);
int (*gen_ld_abs)(const struct bpf_insn *orig,
struct bpf_insn *insn_buf);
u32 (*convert_ctx_access)(enum bpf_access_type type,
diff --git a/include/linux/btf.h b/include/linux/btf.h
index cffb43133c68..177187fa3819 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -580,6 +580,7 @@ bool btf_is_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_type);
bool btf_types_are_same(const struct btf *btf1, u32 id1,
const struct btf *btf2, u32 id2);
+struct btf *btf_get_module_btf(const struct module *module);
#else
static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
u32 type_id)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index b12db397303e..6911f8cdb736 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -7965,7 +7965,7 @@ struct module *btf_try_get_module(const struct btf *btf)
/* Returns struct btf corresponding to the struct module.
* This function can return NULL or ERR_PTR.
*/
-static struct btf *btf_get_module_btf(const struct module *module)
+struct btf *btf_get_module_btf(const struct module *module)
{
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
struct btf_module *btf_mod, *tmp;
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index e7113d700b87..cbc7df8dfbad 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -2503,7 +2503,8 @@ static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
static int cg_sockopt_get_prologue(struct bpf_insn *insn_buf,
bool direct_write,
- const struct bpf_prog *prog)
+ const struct bpf_prog *prog,
+ struct module **module)
{
/* Nothing to do for sockopt argument. The data is kzalloc'ated.
*/
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bbb655f0c7b5..5e995b7884fb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19622,7 +19622,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
if (ops->gen_epilogue) {
epilogue_cnt = ops->gen_epilogue(epilogue_buf, env->prog,
- -(subprogs[0].stack_depth + 8));
+ -(subprogs[0].stack_depth + 8), NULL);
if (epilogue_cnt >= ARRAY_SIZE(epilogue_buf)) {
verbose(env, "bpf verifier is misconfigured\n");
return -EINVAL;
@@ -19647,7 +19647,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
return -EINVAL;
}
cnt = ops->gen_prologue(insn_buf, env->seen_direct_write,
- env->prog);
+ env->prog, NULL);
if (cnt >= ARRAY_SIZE(insn_buf)) {
verbose(env, "bpf verifier is misconfigured\n");
return -EINVAL;
diff --git a/net/core/filter.c b/net/core/filter.c
index 78a6f746ea0b..65d219e71ae8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8844,7 +8844,7 @@ static bool sock_filter_is_valid_access(int off, int size,
}
static int bpf_noop_prologue(struct bpf_insn *insn_buf, bool direct_write,
- const struct bpf_prog *prog)
+ const struct bpf_prog *prog, struct module **module)
{
/* Neither direct read nor direct write requires any preliminary
* action.
@@ -8927,7 +8927,7 @@ static int bpf_gen_ld_abs(const struct bpf_insn *orig,
}
static int tc_cls_act_prologue(struct bpf_insn *insn_buf, bool direct_write,
- const struct bpf_prog *prog)
+ const struct bpf_prog *prog, struct module **module)
{
return bpf_unclone_prologue(insn_buf, direct_write, prog, TC_ACT_SHOT);
}
@@ -9263,7 +9263,7 @@ static bool sock_ops_is_valid_access(int off, int size,
}
static int sk_skb_prologue(struct bpf_insn *insn_buf, bool direct_write,
- const struct bpf_prog *prog)
+ const struct bpf_prog *prog, struct module **module)
{
return bpf_unclone_prologue(insn_buf, direct_write, prog, SK_DROP);
}
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 7194330bdefc..4c75346376d9 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -1141,7 +1141,7 @@ static int bpf_test_mod_st_ops__test_pro_epilogue(struct st_ops_args *args)
}
static int st_ops_gen_prologue(struct bpf_insn *insn_buf, bool direct_write,
- const struct bpf_prog *prog)
+ const struct bpf_prog *prog, struct module **module)
{
struct bpf_insn *insn = insn_buf;
@@ -1164,7 +1164,7 @@ static int st_ops_gen_prologue(struct bpf_insn *insn_buf, bool direct_write,
}
static int st_ops_gen_epilogue(struct bpf_insn *insn_buf, const struct bpf_prog *prog,
- s16 ctx_stack_off)
+ s16 ctx_stack_off, struct module **module)
{
struct bpf_insn *insn = insn_buf;
--
2.43.5
* [PATCH v2 bpf-next 7/8] bpf: Allow pro/epilogue to call kfunc
2024-08-21 23:34 [PATCH v2 bpf-next 0/8] bpf: Add gen_epilogue and allow kfunc call in pro/epilogue Martin KaFai Lau
` (5 preceding siblings ...)
2024-08-21 23:34 ` [PATCH v2 bpf-next 6/8] bpf: Add module parameter to gen_prologue and gen_epilogue Martin KaFai Lau
@ 2024-08-21 23:34 ` Martin KaFai Lau
2024-08-22 1:32 ` Alexei Starovoitov
2024-08-21 23:34 ` [PATCH v2 bpf-next 8/8] selftests/bpf: Add kfunc call test in gen_prologue and gen_epilogue Martin KaFai Lau
7 siblings, 1 reply; 19+ messages in thread
From: Martin KaFai Lau @ 2024-08-21 23:34 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, kernel-team
From: Martin KaFai Lau <martin.lau@kernel.org>
The existing prologue has been able to call bpf helpers but not kfuncs.
This patch allows the prologue/epilogue to call kfuncs.
The subsystem that implements the .gen_prologue and .gen_epilogue
can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm
set to the btf func_id of the kfunc call. This part is the same
as the bpf prog loaded from the sys_bpf.
Another piece is to have a way for the subsystem to tell the verifier
the btf object of the kfunc's func_id. This patch uses the
"struct module **module" argument added to the .gen_prologue and
.gen_epilogue in the previous patch. The verifier will use
btf_get_module_btf(module) to find the btf object.
The .gen_epi/prologue will usually set "*module = THIS_MODULE".
Only kfunc(s) from one module (or vmlinux) can be used in the
.gen_epi/prologue for now. In the future, the .gen_epi/prologue
could return an array of modules and use the insn->off as an index
into the array.
When the returned module is NULL, the btf is btf_vmlinux. Then the
insn->off stays at 0. This is the same as the sys_bpf.
When the btf is from a module, the btf needs an entry in
prog->aux->kfunc_btf_tab. The kfunc_btf_tab is currently
sorted by insn->off, which is the offset into the attr->fd_array.
This module btf may or may not already be in the kfunc_btf_tab.
A new function, "find_kfunc_desc_btf_offset", is added to search
for an existing entry that has the same btf. If one is found, its
offset will be used as the insn->off. If it is not found, an offset
value that is not yet used in the kfunc_btf_tab is picked, a new
entry is added to the kfunc_btf_tab, and the insn->off is set to
this new offset.
Once the insn->off is determined (either an existing one is reused
or an unused one is found), the existing add_kfunc_call() is called
and everything else falls through.
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
kernel/bpf/verifier.c | 115 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 112 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5e995b7884fb..f4ac254a7661 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2787,6 +2787,60 @@ static struct btf *find_kfunc_desc_btf(struct bpf_verifier_env *env, s16 offset)
return btf_vmlinux ?: ERR_PTR(-ENOENT);
}
+static int find_kfunc_desc_btf_offset(struct bpf_verifier_env *env, struct btf *btf,
+ struct module *module, s16 *offset)
+{
+ struct bpf_kfunc_btf_tab *tab;
+ struct bpf_kfunc_btf *b;
+ s16 new_offset = 1; /* 0 is reserved for btf_vmlinux */
+ u32 i;
+
+ if (btf_is_vmlinux(btf)) {
+ *offset = 0;
+ return 0;
+ }
+
+ tab = env->prog->aux->kfunc_btf_tab;
+ if (!tab) {
+ tab = kzalloc(sizeof(*tab), GFP_KERNEL);
+ if (!tab)
+ return -ENOMEM;
+ env->prog->aux->kfunc_btf_tab = tab;
+ }
+
+ b = tab->descs;
+ for (i = 0; i < tab->nr_descs; i++) {
+ if (b[i].btf == btf) {
+ *offset = b[i].offset;
+ return 0;
+ }
+ /* tab->nr_descs (from the sys_bpf) maxes out at MAX_KFUNC_BTFS,
+ * which is smaller than S16_MAX, so it will always be able to
+ * find a new_offset to use.
+ */
+ if (new_offset == b[i].offset)
+ new_offset++;
+ }
+
+ if (tab->nr_descs == MAX_KFUNC_BTFS) {
+ verbose(env, "too many different module BTFs\n");
+ return -E2BIG;
+ }
+
+ if (!try_module_get(module))
+ return -ENXIO;
+
+ b = &tab->descs[tab->nr_descs++];
+ btf_get(btf);
+ b->btf = btf;
+ b->module = module;
+ b->offset = new_offset;
+ *offset = new_offset;
+ sort(tab->descs, tab->nr_descs, sizeof(tab->descs[0]),
+ kfunc_btf_cmp_by_off, NULL);
+ return 0;
+}
+
static int add_kfunc_call(struct bpf_verifier_env *env, u32 func_id, s16 offset)
{
const struct btf_type *func, *func_proto;
@@ -19603,6 +19657,50 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
return 0;
}
+static int fixup_pro_epilogue_kfunc(struct bpf_verifier_env *env, struct bpf_insn *insns,
+ int cnt, struct module *module)
+{
+ struct btf *btf;
+ u32 func_id;
+ int i, err;
+ s16 offset;
+
+ for (i = 0; i < cnt; i++) {
+ if (!bpf_pseudo_kfunc_call(&insns[i]))
+ continue;
+
+ /* The kernel may not have BTF available, so only
+ * try to get a btf if the pro/epilogue calls a kfunc.
+ */
+ btf = btf_get_module_btf(module);
+ if (IS_ERR_OR_NULL(btf)) {
+ verbose(env, "cannot find BTF from %s for kfunc used in pro/epilogue\n",
+ module_name(module));
+ return -EINVAL;
+ }
+
+ func_id = insns[i].imm;
+ if (btf_is_vmlinux(btf) &&
+ btf_id_set_contains(&special_kfunc_set, func_id)) {
+ verbose(env, "pro/epilogue cannot use special kfunc\n");
+ btf_put(btf);
+ return -EINVAL;
+ }
+
+ err = find_kfunc_desc_btf_offset(env, btf, module, &offset);
+ btf_put(btf);
+ if (err)
+ return err;
+
+ insns[i].off = offset;
+ err = add_kfunc_call(env, func_id, offset);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
/* convert load instructions that access fields of a context type into a
* sequence of instructions that access fields of the underlying structure:
* struct __sk_buff -> struct sk_buff
@@ -19612,21 +19710,27 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
{
struct bpf_subprog_info *subprogs = env->subprog_info;
const struct bpf_verifier_ops *ops = env->ops;
- int i, cnt, size, ctx_field_size, delta = 0, epilogue_cnt = 0;
+ int err, i, cnt, size, ctx_field_size, delta = 0, epilogue_cnt = 0;
const int insn_cnt = env->prog->len;
struct bpf_insn insn_buf[16], epilogue_buf[16], *insn;
u32 target_size, size_default, off;
struct bpf_prog *new_prog;
enum bpf_access_type type;
bool is_narrower_load;
+ struct module *module;
if (ops->gen_epilogue) {
+ module = NULL;
epilogue_cnt = ops->gen_epilogue(epilogue_buf, env->prog,
- -(subprogs[0].stack_depth + 8), NULL);
+ -(subprogs[0].stack_depth + 8), &module);
if (epilogue_cnt >= ARRAY_SIZE(epilogue_buf)) {
verbose(env, "bpf verifier is misconfigured\n");
return -EINVAL;
} else if (epilogue_cnt) {
+ err = fixup_pro_epilogue_kfunc(env, epilogue_buf, epilogue_cnt, module);
+ if (err)
+ return err;
+
/* Save the ARG_PTR_TO_CTX for the epilogue to use */
cnt = 0;
subprogs[0].stack_depth += 8;
@@ -19646,12 +19750,17 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
verbose(env, "bpf verifier is misconfigured\n");
return -EINVAL;
}
+ module = NULL;
cnt = ops->gen_prologue(insn_buf, env->seen_direct_write,
- env->prog, NULL);
+ env->prog, &module);
if (cnt >= ARRAY_SIZE(insn_buf)) {
verbose(env, "bpf verifier is misconfigured\n");
return -EINVAL;
} else if (cnt) {
+ err = fixup_pro_epilogue_kfunc(env, insn_buf, cnt, module);
+ if (err)
+ return err;
+
new_prog = bpf_patch_insn_data(env, 0, insn_buf, cnt);
if (!new_prog)
return -ENOMEM;
--
2.43.5
* [PATCH v2 bpf-next 8/8] selftests/bpf: Add kfunc call test in gen_prologue and gen_epilogue
2024-08-21 23:34 [PATCH v2 bpf-next 0/8] bpf: Add gen_epilogue and allow kfunc call in pro/epilogue Martin KaFai Lau
` (6 preceding siblings ...)
2024-08-21 23:34 ` [PATCH v2 bpf-next 7/8] bpf: Allow pro/epilogue to call kfunc Martin KaFai Lau
@ 2024-08-21 23:34 ` Martin KaFai Lau
7 siblings, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2024-08-21 23:34 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, kernel-team
From: Martin KaFai Lau <martin.lau@kernel.org>
This patch changes the .gen_pro/epilogue of the bpf_testmod_st_ops
to call kfuncs. It calls the inc10 and inc100 kfuncs.
The __xlated instructions have been adjusted accordingly.
The same goes for the __retval.
The inc100 kfunc is newly added in this patch and does
args->a += 100. Note that it is not in the register_btf_kfunc_id_set(),
so there is no need to declare it in bpf_testmod_kfunc.h.
It is enclosed in __bpf_kfunc_{start,end}_defs to avoid a
compiler warning.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 42 +++++++++++-
.../selftests/bpf/prog_tests/pro_epilogue.c | 4 +-
.../selftests/bpf/progs/pro_epilogue_kfunc.c | 68 ++++++++++++-------
.../bpf/progs/pro_epilogue_subprog.c | 58 ++++++++++------
4 files changed, 123 insertions(+), 49 deletions(-)
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 4c75346376d9..6f745d29e124 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -966,6 +966,16 @@ __bpf_kfunc int bpf_kfunc_st_ops_inc10(struct st_ops_args *args)
return args->a;
}
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc int bpf_kfunc_st_ops_inc100(struct st_ops_args *args)
+{
+ args->a += 100;
+ return args->a;
+}
+
+__bpf_kfunc_end_defs();
+
BTF_KFUNCS_START(bpf_testmod_check_kfunc_ids)
BTF_ID_FLAGS(func, bpf_testmod_test_mod_kfunc)
BTF_ID_FLAGS(func, bpf_kfunc_call_test1)
@@ -1140,6 +1150,10 @@ static int bpf_test_mod_st_ops__test_pro_epilogue(struct st_ops_args *args)
return 0;
}
+BTF_ID_LIST(st_ops_epilogue_kfunc_list)
+BTF_ID(func, bpf_kfunc_st_ops_inc10)
+BTF_ID(func, bpf_kfunc_st_ops_inc100)
+
static int st_ops_gen_prologue(struct bpf_insn *insn_buf, bool direct_write,
const struct bpf_prog *prog, struct module **module)
{
@@ -1153,13 +1167,28 @@ static int st_ops_gen_prologue(struct bpf_insn *insn_buf, bool direct_write,
* r7 = r6->a;
* r7 += 1000;
* r6->a = r7;
+ * r7 = r1;
+ * r1 = r6;
+ * bpf_kfunc_st_ops_inc10(r1)
+ * r1 = r6;
+ * bpf_kfunc_st_ops_inc100(r1)
+ * r1 = r7;
*/
*insn++ = BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0);
*insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6, offsetof(struct st_ops_args, a));
*insn++ = BPF_ALU32_IMM(BPF_ADD, BPF_REG_7, 1000);
*insn++ = BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7, offsetof(struct st_ops_args, a));
+ *insn++ = BPF_MOV64_REG(BPF_REG_7, BPF_REG_1);
+ *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_6);
+ *insn++ = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
+ st_ops_epilogue_kfunc_list[0]);
+ *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_6);
+ *insn++ = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
+ st_ops_epilogue_kfunc_list[1]);
+ *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_7);
*insn++ = prog->insnsi[0];
+ *module = THIS_MODULE;
return insn - insn_buf;
}
@@ -1177,7 +1206,10 @@ static int st_ops_gen_epilogue(struct bpf_insn *insn_buf, const struct bpf_prog
* r6 = r1->a;
* r6 += 10000;
* r1->a = r6;
- * r0 = r6;
+ * r6 = r1;
+ * bpf_kfunc_st_ops_inc10(r1)
+ * r1 = r6;
+ * bpf_kfunc_st_ops_inc100(r1)
* r0 *= 2;
* BPF_EXIT;
*/
@@ -1186,10 +1218,16 @@ static int st_ops_gen_epilogue(struct bpf_insn *insn_buf, const struct bpf_prog
*insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1, offsetof(struct st_ops_args, a));
*insn++ = BPF_ALU32_IMM(BPF_ADD, BPF_REG_6, 10000);
*insn++ = BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6, offsetof(struct st_ops_args, a));
- *insn++ = BPF_MOV32_REG(BPF_REG_0, BPF_REG_6);
+ *insn++ = BPF_MOV64_REG(BPF_REG_6, BPF_REG_1);
+ *insn++ = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
+ st_ops_epilogue_kfunc_list[0]);
+ *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_6);
+ *insn++ = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
+ st_ops_epilogue_kfunc_list[1]);
*insn++ = BPF_ALU32_IMM(BPF_MUL, BPF_REG_0, 2);
*insn++ = BPF_EXIT_INSN();
+ *module = THIS_MODULE;
return insn - insn_buf;
}
diff --git a/tools/testing/selftests/bpf/prog_tests/pro_epilogue.c b/tools/testing/selftests/bpf/prog_tests/pro_epilogue.c
index 98de677c55a9..db6e98096335 100644
--- a/tools/testing/selftests/bpf/prog_tests/pro_epilogue.c
+++ b/tools/testing/selftests/bpf/prog_tests/pro_epilogue.c
@@ -34,8 +34,8 @@ static void test_tailcall(void)
prog_fd = bpf_program__fd(skel->progs.syscall_epilogue_tailcall);
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "bpf_prog_test_run_opts");
- ASSERT_EQ(args.a, 10001, "args.a");
- ASSERT_EQ(topts.retval, 10001 * 2, "topts.retval");
+ ASSERT_EQ(args.a, 10111, "args.a");
+ ASSERT_EQ(topts.retval, 10111 * 2, "topts.retval");
done:
epilogue_tailcall__destroy(skel);
diff --git a/tools/testing/selftests/bpf/progs/pro_epilogue_kfunc.c b/tools/testing/selftests/bpf/progs/pro_epilogue_kfunc.c
index 7d1124cf4942..2bd306f16610 100644
--- a/tools/testing/selftests/bpf/progs/pro_epilogue_kfunc.c
+++ b/tools/testing/selftests/bpf/progs/pro_epilogue_kfunc.c
@@ -28,13 +28,19 @@ __xlated("0: r6 = *(u64 *)(r1 +0)")
__xlated("1: r7 = *(u32 *)(r6 +0)")
__xlated("2: w7 += 1000")
__xlated("3: *(u32 *)(r6 +0) = r7")
-/* main prog */
-__xlated("4: r1 = *(u64 *)(r1 +0)")
-__xlated("5: r6 = r1")
+__xlated("4: r7 = r1")
+__xlated("5: r1 = r6")
__xlated("6: call kernel-function")
__xlated("7: r1 = r6")
-__xlated("8: call pc+1")
-__xlated("9: exit")
+__xlated("8: call kernel-function")
+__xlated("9: r1 = r7")
+/* main prog */
+__xlated("10: r1 = *(u64 *)(r1 +0)")
+__xlated("11: r6 = r1")
+__xlated("12: call kernel-function")
+__xlated("13: r1 = r6")
+__xlated("14: call pc+1")
+__xlated("15: exit")
SEC("struct_ops/test_prologue_kfunc")
__naked int test_prologue_kfunc(void)
{
@@ -65,9 +71,12 @@ __xlated("7: r1 = *(u64 *)(r1 +0)")
__xlated("8: r6 = *(u32 *)(r1 +0)")
__xlated("9: w6 += 10000")
__xlated("10: *(u32 *)(r1 +0) = r6")
-__xlated("11: w0 = w6")
-__xlated("12: w0 *= 2")
-__xlated("13: exit")
+__xlated("11: r6 = r1")
+__xlated("12: call kernel-function")
+__xlated("13: r1 = r6")
+__xlated("14: call kernel-function")
+__xlated("15: w0 *= 2")
+__xlated("16: exit")
SEC("struct_ops/test_epilogue_kfunc")
__naked int test_epilogue_kfunc(void)
{
@@ -89,23 +98,32 @@ __xlated("0: r6 = *(u64 *)(r1 +0)")
__xlated("1: r7 = *(u32 *)(r6 +0)")
__xlated("2: w7 += 1000")
__xlated("3: *(u32 *)(r6 +0) = r7")
+__xlated("4: r7 = r1")
+__xlated("5: r1 = r6")
+__xlated("6: call kernel-function")
+__xlated("7: r1 = r6")
+__xlated("8: call kernel-function")
+__xlated("9: r1 = r7")
/* save __u64 *ctx to stack */
-__xlated("4: *(u64 *)(r10 -8) = r1")
+__xlated("10: *(u64 *)(r10 -8) = r1")
/* main prog */
-__xlated("5: r1 = *(u64 *)(r1 +0)")
-__xlated("6: r6 = r1")
-__xlated("7: call kernel-function")
-__xlated("8: r1 = r6")
-__xlated("9: call pc+")
-/* epilogue */
-__xlated("10: r1 = *(u64 *)(r10 -8)")
__xlated("11: r1 = *(u64 *)(r1 +0)")
-__xlated("12: r6 = *(u32 *)(r1 +0)")
-__xlated("13: w6 += 10000")
-__xlated("14: *(u32 *)(r1 +0) = r6")
-__xlated("15: w0 = w6")
-__xlated("16: w0 *= 2")
-__xlated("17: exit")
+__xlated("12: r6 = r1")
+__xlated("13: call kernel-function")
+__xlated("14: r1 = r6")
+__xlated("15: call pc+")
+/* epilogue */
+__xlated("16: r1 = *(u64 *)(r10 -8)")
+__xlated("17: r1 = *(u64 *)(r1 +0)")
+__xlated("18: r6 = *(u32 *)(r1 +0)")
+__xlated("19: w6 += 10000")
+__xlated("20: *(u32 *)(r1 +0) = r6")
+__xlated("21: r6 = r1")
+__xlated("22: call kernel-function")
+__xlated("23: r1 = r6")
+__xlated("24: call kernel-function")
+__xlated("25: w0 *= 2")
+__xlated("26: exit")
SEC("struct_ops/test_pro_epilogue_kfunc")
__naked int test_pro_epilogue_kfunc(void)
{
@@ -122,7 +140,7 @@ __naked int test_pro_epilogue_kfunc(void)
}
SEC("syscall")
-__retval(1011) /* PROLOGUE_A [1000] + KFUNC_INC10 + SUBPROG_A [1] */
+__retval(1121) /* PROLOGUE_A [1110] + KFUNC_INC10 + SUBPROG_A [1] */
int syscall_prologue_kfunc(void *ctx)
{
struct st_ops_args args = {};
@@ -131,7 +149,7 @@ int syscall_prologue_kfunc(void *ctx)
}
SEC("syscall")
-__retval(20022) /* (KFUNC_INC10 + SUBPROG_A [1] + EPILOGUE_A [10000]) * 2 */
+__retval(20242) /* (KFUNC_INC10 + SUBPROG_A [1] + EPILOGUE_A [10110]) * 2 */
int syscall_epilogue_kfunc(void *ctx)
{
struct st_ops_args args = {};
@@ -140,7 +158,7 @@ int syscall_epilogue_kfunc(void *ctx)
}
SEC("syscall")
-__retval(22022) /* (PROLOGUE_A [1000] + KFUNC_INC10 + SUBPROG_A [1] + EPILOGUE_A [10000]) * 2 */
+__retval(22462) /* (PROLOGUE_A [1110] + KFUNC_INC10 + SUBPROG_A [1] + EPILOGUE_A [10110]) * 2 */
int syscall_pro_epilogue_kfunc(void *ctx)
{
struct st_ops_args args = {};
diff --git a/tools/testing/selftests/bpf/progs/pro_epilogue_subprog.c b/tools/testing/selftests/bpf/progs/pro_epilogue_subprog.c
index c91b1bf30e37..3d9cc25c024b 100644
--- a/tools/testing/selftests/bpf/progs/pro_epilogue_subprog.c
+++ b/tools/testing/selftests/bpf/progs/pro_epilogue_subprog.c
@@ -21,10 +21,16 @@ __xlated("0: r6 = *(u64 *)(r1 +0)")
__xlated("1: r7 = *(u32 *)(r6 +0)")
__xlated("2: w7 += 1000")
__xlated("3: *(u32 *)(r6 +0) = r7")
+__xlated("4: r7 = r1")
+__xlated("5: r1 = r6")
+__xlated("6: call kernel-function")
+__xlated("7: r1 = r6")
+__xlated("8: call kernel-function")
+__xlated("9: r1 = r7")
/* main prog */
-__xlated("4: r1 = *(u64 *)(r1 +0)")
-__xlated("5: call pc+1")
-__xlated("6: exit")
+__xlated("10: r1 = *(u64 *)(r1 +0)")
+__xlated("11: call pc+1")
+__xlated("12: exit")
SEC("struct_ops/test_prologue_subprog")
__naked int test_prologue_subprog(void)
{
@@ -47,9 +53,12 @@ __xlated("4: r1 = *(u64 *)(r1 +0)")
__xlated("5: r6 = *(u32 *)(r1 +0)")
__xlated("6: w6 += 10000")
__xlated("7: *(u32 *)(r1 +0) = r6")
-__xlated("8: w0 = w6")
-__xlated("9: w0 *= 2")
-__xlated("10: exit")
+__xlated("8: r6 = r1")
+__xlated("9: call kernel-function")
+__xlated("10: r1 = r6")
+__xlated("11: call kernel-function")
+__xlated("12: w0 *= 2")
+__xlated("13: exit")
SEC("struct_ops/test_epilogue_subprog")
__naked int test_epilogue_subprog(void)
{
@@ -66,20 +75,29 @@ __xlated("0: r6 = *(u64 *)(r1 +0)")
__xlated("1: r7 = *(u32 *)(r6 +0)")
__xlated("2: w7 += 1000")
__xlated("3: *(u32 *)(r6 +0) = r7")
+__xlated("4: r7 = r1")
+__xlated("5: r1 = r6")
+__xlated("6: call kernel-function")
+__xlated("7: r1 = r6")
+__xlated("8: call kernel-function")
+__xlated("9: r1 = r7")
/* save __u64 *ctx to stack */
-__xlated("4: *(u64 *)(r10 -8) = r1")
+__xlated("10: *(u64 *)(r10 -8) = r1")
/* main prog */
-__xlated("5: r1 = *(u64 *)(r1 +0)")
-__xlated("6: call pc+")
+__xlated("11: r1 = *(u64 *)(r1 +0)")
+__xlated("12: call pc+")
/* epilogue */
-__xlated("7: r1 = *(u64 *)(r10 -8)")
-__xlated("8: r1 = *(u64 *)(r1 +0)")
-__xlated("9: r6 = *(u32 *)(r1 +0)")
-__xlated("10: w6 += 10000")
-__xlated("11: *(u32 *)(r1 +0) = r6")
-__xlated("12: w0 = w6")
-__xlated("13: w0 *= 2")
-__xlated("14: exit")
+__xlated("13: r1 = *(u64 *)(r10 -8)")
+__xlated("14: r1 = *(u64 *)(r1 +0)")
+__xlated("15: r6 = *(u32 *)(r1 +0)")
+__xlated("16: w6 += 10000")
+__xlated("17: *(u32 *)(r1 +0) = r6")
+__xlated("18: r6 = r1")
+__xlated("19: call kernel-function")
+__xlated("20: r1 = r6")
+__xlated("21: call kernel-function")
+__xlated("22: w0 *= 2")
+__xlated("23: exit")
SEC("struct_ops/test_pro_epilogue_subprog")
__naked int test_pro_epilogue_subprog(void)
{
@@ -91,7 +109,7 @@ __naked int test_pro_epilogue_subprog(void)
}
SEC("syscall")
-__retval(1001) /* PROLOGUE_A [1000] + SUBPROG_A [1] */
+__retval(1111) /* PROLOGUE_A [1110] + SUBPROG_A [1] */
int syscall_prologue_subprog(void *ctx)
{
struct st_ops_args args = {};
@@ -100,7 +118,7 @@ int syscall_prologue_subprog(void *ctx)
}
SEC("syscall")
-__retval(20002) /* (SUBPROG_A [1] + EPILOGUE_A [10000]) * 2 */
+__retval(20222) /* (SUBPROG_A [1] + EPILOGUE_A [10110]) * 2 */
int syscall_epilogue_subprog(void *ctx)
{
struct st_ops_args args = {};
@@ -109,7 +127,7 @@ int syscall_epilogue_subprog(void *ctx)
}
SEC("syscall")
-__retval(22002) /* (PROLOGUE_A [1000] + SUBPROG_A [1] + EPILOGUE_A [10000]) * 2 */
+__retval(22442) /* (PROLOGUE_A [1110] + SUBPROG_A [1] + EPILOGUE_A [10110]) * 2 */
int syscall_pro_epilogue_subprog(void *ctx)
{
struct st_ops_args args = {};
--
2.43.5
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v2 bpf-next 1/8] bpf: Add gen_epilogue to bpf_verifier_ops
2024-08-21 23:34 ` [PATCH v2 bpf-next 1/8] bpf: Add gen_epilogue to bpf_verifier_ops Martin KaFai Lau
@ 2024-08-22 0:22 ` Alexei Starovoitov
2024-08-22 0:30 ` Eduard Zingerman
0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2024-08-22 0:22 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, Kernel Team
On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> From: Martin KaFai Lau <martin.lau@kernel.org>
>
> This patch adds a .gen_epilogue to the bpf_verifier_ops. It is similar
> to the existing .gen_prologue. Instead of allowing a subsystem
> to run code at the beginning of a bpf prog, it allows the subsystem
> to run code just before the bpf prog exit.
>
> One of the use case is to allow the upcoming bpf qdisc to ensure that
> the skb->dev is the same as the qdisc->dev_queue->dev. The bpf qdisc
> struct_ops implementation could either fix it up or drop the skb.
> Another use case could be in bpf_tcp_ca.c to enforce snd_cwnd
> has sane value (e.g. non zero).
>
> The epilogue can do the useful thing (like checking skb->dev) if it
> can access the bpf prog's ctx. Unlike prologue, r1 may not hold the
> ctx pointer. This patch saves the r1 in the stack if the .gen_epilogue
> has returned some instructions in the "epilogue_buf".
>
> The existing .gen_prologue is done in convert_ctx_accesses().
> The new .gen_epilogue is done in the convert_ctx_accesses() also.
> When it sees the (BPF_JMP | BPF_EXIT) instruction, it will be patched
> with the earlier generated "epilogue_buf". The epilogue patching is
> only done for the main prog.
>
> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
> ---
> include/linux/bpf.h | 2 ++
> kernel/bpf/verifier.c | 34 ++++++++++++++++++++++++++++++++--
> 2 files changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f0192c173ed8..8ee9d87c332a 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -974,6 +974,8 @@ struct bpf_verifier_ops {
> struct bpf_insn_access_aux *info);
> int (*gen_prologue)(struct bpf_insn *insn, bool direct_write,
> const struct bpf_prog *prog);
> + int (*gen_epilogue)(struct bpf_insn *insn, const struct bpf_prog *prog,
> + s16 ctx_stack_off);
> int (*gen_ld_abs)(const struct bpf_insn *orig,
> struct bpf_insn *insn_buf);
> u32 (*convert_ctx_access)(enum bpf_access_type type,
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index df3be12096cf..bbb655f0c7b5 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -19610,15 +19610,37 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
> */
> static int convert_ctx_accesses(struct bpf_verifier_env *env)
> {
> + struct bpf_subprog_info *subprogs = env->subprog_info;
> const struct bpf_verifier_ops *ops = env->ops;
> - int i, cnt, size, ctx_field_size, delta = 0;
> + int i, cnt, size, ctx_field_size, delta = 0, epilogue_cnt = 0;
> const int insn_cnt = env->prog->len;
> - struct bpf_insn insn_buf[16], *insn;
> + struct bpf_insn insn_buf[16], epilogue_buf[16], *insn;
This is a noticeable stack increase.
Maybe let's move both to env?
> u32 target_size, size_default, off;
> struct bpf_prog *new_prog;
> enum bpf_access_type type;
> bool is_narrower_load;
>
> + if (ops->gen_epilogue) {
> + epilogue_cnt = ops->gen_epilogue(epilogue_buf, env->prog,
> + -(subprogs[0].stack_depth + 8));
> + if (epilogue_cnt >= ARRAY_SIZE(epilogue_buf)) {
> + verbose(env, "bpf verifier is misconfigured\n");
> + return -EINVAL;
> + } else if (epilogue_cnt) {
> + /* Save the ARG_PTR_TO_CTX for the epilogue to use */
> + cnt = 0;
> + subprogs[0].stack_depth += 8;
> + insn_buf[cnt++] = BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_1,
> + -subprogs[0].stack_depth);
> + insn_buf[cnt++] = env->prog->insnsi[0];
> + new_prog = bpf_patch_insn_data(env, 0, insn_buf, cnt);
> + if (!new_prog)
> + return -ENOMEM;
> + env->prog = new_prog;
> + delta += cnt - 1;
I suspect this is buggy.
See commit 5337ac4c9b80 ("bpf: Fix the corner case with may_goto and
jump to the 1st insn.")
> + }
> + }
> +
> if (ops->gen_prologue || env->seen_direct_write) {
> if (!ops->gen_prologue) {
> verbose(env, "bpf verifier is misconfigured\n");
> @@ -19671,6 +19693,13 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> insn->code = BPF_STX | BPF_PROBE_ATOMIC | BPF_SIZE(insn->code);
> env->prog->aux->num_exentries++;
> continue;
> + } else if (insn->code == (BPF_JMP | BPF_EXIT) &&
> + epilogue_cnt &&
> + i + delta < subprogs[1].start) {
> + /* Generate epilogue for the main prog */
> + memcpy(insn_buf, epilogue_buf, sizeof(epilogue_buf));
> + cnt = epilogue_cnt;
> + goto patch_insn_buf;
That's quite a bit of copy-paste of the epilogue if the program contains
multiple bpf_exit insns.
I think llvm generates code in such a way that often there is only
one bpf_exit, but gcc might use bpf_exit liberally.
So let's patch it only once and replace the other bpf_exit insns with
a jmp to the one patched place?
That's pretty much what the x86 JIT does when it converts bpf_exit.
> } else {
> continue;
> }
> @@ -19807,6 +19836,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> insn->dst_reg, insn->dst_reg,
> size * 8, 0);
>
> +patch_insn_buf:
> new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
> if (!new_prog)
> return -ENOMEM;
> --
> 2.43.5
>
* Re: [PATCH v2 bpf-next 1/8] bpf: Add gen_epilogue to bpf_verifier_ops
2024-08-22 0:22 ` Alexei Starovoitov
@ 2024-08-22 0:30 ` Eduard Zingerman
2024-08-22 0:34 ` Alexei Starovoitov
0 siblings, 1 reply; 19+ messages in thread
From: Eduard Zingerman @ 2024-08-22 0:30 UTC (permalink / raw)
To: Alexei Starovoitov, Martin KaFai Lau
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Yonghong Song, Amery Hung, Kernel Team
On Wed, 2024-08-21 at 17:22 -0700, Alexei Starovoitov wrote:
[...]
> > + if (ops->gen_epilogue) {
> > + epilogue_cnt = ops->gen_epilogue(epilogue_buf, env->prog,
> > + -(subprogs[0].stack_depth + 8));
> > + if (epilogue_cnt >= ARRAY_SIZE(epilogue_buf)) {
> > + verbose(env, "bpf verifier is misconfigured\n");
> > + return -EINVAL;
> > + } else if (epilogue_cnt) {
> > + /* Save the ARG_PTR_TO_CTX for the epilogue to use */
> > + cnt = 0;
> > + subprogs[0].stack_depth += 8;
> > + insn_buf[cnt++] = BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_1,
> > + -subprogs[0].stack_depth);
> > + insn_buf[cnt++] = env->prog->insnsi[0];
> > + new_prog = bpf_patch_insn_data(env, 0, insn_buf, cnt);
> > + if (!new_prog)
> > + return -ENOMEM;
> > + env->prog = new_prog;
> > + delta += cnt - 1;
>
> I suspect this is buggy.
> See commit 5337ac4c9b80 ("bpf: Fix the corner case with may_goto and
> jump to the 1st insn.")
Actually, I was unable to figure out a counterexample for this case;
the patching math seems to be correct, and jump targets are just moved down.
But let's see, maybe Martin can come up with something.
[...]
* Re: [PATCH v2 bpf-next 1/8] bpf: Add gen_epilogue to bpf_verifier_ops
2024-08-22 0:30 ` Eduard Zingerman
@ 2024-08-22 0:34 ` Alexei Starovoitov
2024-08-22 0:38 ` Eduard Zingerman
0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2024-08-22 0:34 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Martin KaFai Lau, bpf, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Yonghong Song, Amery Hung, Kernel Team
On Wed, Aug 21, 2024 at 5:30 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2024-08-21 at 17:22 -0700, Alexei Starovoitov wrote:
>
> [...]
>
> > > + if (ops->gen_epilogue) {
> > > + epilogue_cnt = ops->gen_epilogue(epilogue_buf, env->prog,
> > > + -(subprogs[0].stack_depth + 8));
> > > + if (epilogue_cnt >= ARRAY_SIZE(epilogue_buf)) {
> > > + verbose(env, "bpf verifier is misconfigured\n");
> > > + return -EINVAL;
> > > + } else if (epilogue_cnt) {
> > > + /* Save the ARG_PTR_TO_CTX for the epilogue to use */
> > > + cnt = 0;
> > > + subprogs[0].stack_depth += 8;
> > > + insn_buf[cnt++] = BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_1,
> > > + -subprogs[0].stack_depth);
> > > + insn_buf[cnt++] = env->prog->insnsi[0];
> > > + new_prog = bpf_patch_insn_data(env, 0, insn_buf, cnt);
> > > + if (!new_prog)
> > > + return -ENOMEM;
> > > + env->prog = new_prog;
> > > + delta += cnt - 1;
> >
> > I suspect this is buggy.
> > See commit 5337ac4c9b80 ("bpf: Fix the corner case with may_goto and
> > jump to the 1st insn.")
>
> Actually, I was unable to figure out a counter example for this case,
> patching math seems to be correct, jump targets are just moved down.
> But let's see, maybe Martin can come up with something.
Something like
insn_0
...
r1 = 0
if rX == .. goto insn_0
this jmp will be rewritten to point to the newly added *(u64 *)(r10 - ..) = r1,
so at run time it will overwrite that slot with zero and
the epilogue will read zero from it instead of the ctx.
* Re: [PATCH v2 bpf-next 1/8] bpf: Add gen_epilogue to bpf_verifier_ops
2024-08-22 0:34 ` Alexei Starovoitov
@ 2024-08-22 0:38 ` Eduard Zingerman
2024-08-22 0:52 ` Martin KaFai Lau
0 siblings, 1 reply; 19+ messages in thread
From: Eduard Zingerman @ 2024-08-22 0:38 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Martin KaFai Lau, bpf, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Yonghong Song, Amery Hung, Kernel Team
On Wed, 2024-08-21 at 17:34 -0700, Alexei Starovoitov wrote:
[...]
> Something like
> insn_0
> ...
> r1 = 0
> if rX == .. goto insn_0
>
> this jmp will be rewritten to point to newly added *(u64*)(r10 - ..) = r1
>
> so at run time it will overwrite that slot with zero and
> epilogue will read zero from it instead of ctx.
That's exactly what I tried on paper, and the jmp target just moved down.
* Re: [PATCH v2 bpf-next 1/8] bpf: Add gen_epilogue to bpf_verifier_ops
2024-08-22 0:38 ` Eduard Zingerman
@ 2024-08-22 0:52 ` Martin KaFai Lau
0 siblings, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2024-08-22 0:52 UTC (permalink / raw)
To: Eduard Zingerman, Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Yonghong Song, Amery Hung, Kernel Team
On 8/21/24 5:38 PM, Eduard Zingerman wrote:
> On Wed, 2024-08-21 at 17:34 -0700, Alexei Starovoitov wrote:
>
> [...]
>
>> Something like
>> insn_0
>> ...
>> r1 = 0
>> if rX == .. goto insn_0
>>
>> this jmp will be rewritten to point to newly added *(u64*)(r10 - ..) = r1
>>
>> so at run time it will overwrite that slot with zero and
>> epilogue will read zero from it instead of ctx.
>
> That's exactly what I tried on paper and jmp target was just moving down.
Thanks for bringing this up. I will try a test case to confirm.
* Re: [PATCH v2 bpf-next 7/8] bpf: Allow pro/epilogue to call kfunc
2024-08-21 23:34 ` [PATCH v2 bpf-next 7/8] bpf: Allow pro/epilogue to call kfunc Martin KaFai Lau
@ 2024-08-22 1:32 ` Alexei Starovoitov
2024-08-22 6:09 ` Martin KaFai Lau
0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2024-08-22 1:32 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, Kernel Team
On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> From: Martin KaFai Lau <martin.lau@kernel.org>
>
> The existing prologue has been able to call bpf helper but not a kfunc.
> This patch allows the prologue/epilogue to call the kfunc.
>
> The subsystem that implements the .gen_prologue and .gen_epilogue
> can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm
> set to the btf func_id of the kfunc call. This part is the same
> as the bpf prog loaded from the sys_bpf.
I don't understand the value of this feature, since it seems
pretty hard to use.
The module (qdisc-bpf or else) would need to do something
like patch 8/8:
+BTF_ID_LIST(st_ops_epilogue_kfunc_list)
+BTF_ID(func, bpf_kfunc_st_ops_inc10)
+BTF_ID(func, bpf_kfunc_st_ops_inc100)
just to be able to:
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
st_ops_epilogue_kfunc_list[0]);
So a bunch of extra work on the module side and
a bunch of work in this patch to enable such a pattern,
but what is the value?
gen_epilogue() can call arbitrary kernel function.
It doesn't have to be a helper.
kfunc-s provide calling convention conversion from bpf to native,
but the same thing is achieved by BPF_CALL_N macro.
The module can use that macro without adding an actual bpf helper
to uapi bpf.h.
Then in gen_epilogue() the extra bpf insn can use:
BPF_EMIT_CALL(module_provided_helper_that_is_not_helper)
which will use
BPF_CALL_IMM(x) ((void *)(x) - (void *)__bpf_call_base)
to populate imm.
And JITs will emit jump to that wrapper code provided by
BPF_CALL_N.
And no need for this extra complexity in the verifier and
its consumers that have to figure out (module_fd, btf_id) for
kfunc just to fit into kfunc pattern with btf_distill_func_proto().
I guess one can argue that if such a kfunc is already available
to bpf progs then an extra BPF_CALL_N wrapper for the same thing
is a waste of kernel text, but this patch also adds quite a bit of
kernel text. So the cost of BPF_CALL_N (which is zero on x86)
is acceptable.
* Re: [PATCH v2 bpf-next 7/8] bpf: Allow pro/epilogue to call kfunc
2024-08-22 1:32 ` Alexei Starovoitov
@ 2024-08-22 6:09 ` Martin KaFai Lau
2024-08-22 13:47 ` Alexei Starovoitov
0 siblings, 1 reply; 19+ messages in thread
From: Martin KaFai Lau @ 2024-08-22 6:09 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, Kernel Team
On 8/21/24 6:32 PM, Alexei Starovoitov wrote:
> On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> From: Martin KaFai Lau <martin.lau@kernel.org>
>>
>> The existing prologue has been able to call bpf helper but not a kfunc.
>> This patch allows the prologue/epilogue to call the kfunc.
>>
>> The subsystem that implements the .gen_prologue and .gen_epilogue
>> can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm
>> set to the btf func_id of the kfunc call. This part is the same
>> as the bpf prog loaded from the sys_bpf.
>
> I don't understand the value of this feature, since it seems
> pretty hard to use.
> The module (qdisc-bpf or else) would need to do something
> like patch 8/8:
> +BTF_ID_LIST(st_ops_epilogue_kfunc_list)
> +BTF_ID(func, bpf_kfunc_st_ops_inc10)
> +BTF_ID(func, bpf_kfunc_st_ops_inc100)
>
> just to be able to:
> BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
> st_ops_epilogue_kfunc_list[0]);
>
> So a bunch of extra work on the module side and
> a bunch of work in this patch to enable such a pattern,
> but what is the value?
>
> gen_epilogue() can call arbitrary kernel function.
> It doesn't have to be a helper.
> kfunc-s provide calling convention conversion from bpf to native,
> but the same thing is achieved by BPF_CALL_N macro.
> The module can use that macro without adding an actual bpf helper
> to uapi bpf.h.
> Then in gen_epilogue() the extra bpf insn can use:
> BPF_EMIT_CALL(module_provided_helper_that_is_not_helper)
> which will use
> BPF_CALL_IMM(x) ((void *)(x) - (void *)__bpf_call_base)
BPF_EMIT_CALL() was my earlier thought. I switched to the kfunc in this patch
because of the bpf_jit_supports_far_kfunc_call() support for kernel modules.
Using a kfunc call makes that support work the same way.
I think the future bpf-qdisc can enforce being built-in. bpf-tcp-cc is already
built-in only, and I think hid_bpf is built-in only as well.
Another consideration is holding the module refcnt when an attachable bpf prog
calls a kernel func implemented in a kernel module. iiuc, this is the reason
why aux->kfunc_btf_tab holds a reference to the kernel module. This should not
be a problem for struct_ops though, because the struct_ops map is the
attachable object instead of the struct_ops prog, and the struct_ops map
already holds a refcnt on the module.
> to populate imm.
> And JITs will emit jump to that wrapper code provided by
> BPF_CALL_N.
>
> And no need for this extra complexity in the verifier and
> its consumers that have to figure out (module_fd, btf_id) for
> kfunc just to fit into kfunc pattern with btf_distill_func_proto().
>
> I guess one can argue that if such kfunc is already available
> to bpf prog then extra BPF_CALL_N wrapper for the same thing
> is a waste of kernel text, but this patch also adds quite a bit of
> kernel text. So the cost of BPF_CALL_N (which is a zero on x86)
> is acceptable.
* Re: [PATCH v2 bpf-next 7/8] bpf: Allow pro/epilogue to call kfunc
2024-08-22 6:09 ` Martin KaFai Lau
@ 2024-08-22 13:47 ` Alexei Starovoitov
2024-08-22 17:38 ` Martin KaFai Lau
0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2024-08-22 13:47 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, Kernel Team
On Wed, Aug 21, 2024 at 11:10 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
[...]
> BPF_EMIT_CALL() was my earlier thought. I switched to the kfunc in this patch
> because of the bpf_jit_supports_far_kfunc_call() support for kernel modules.
> Using a kfunc call makes that support work the same way.
I believe far calls are typically slower,
so they may be a foot gun.
If something like qdisc-bpf adds a function call at bpf_exit,
it will be called every time the program runs, so
it needs to be really fast.
Allowing such callable funcs in modules may be a performance issue
that we'd need to fix.
So imo making it a design requirement that such funcs for gen_epilogue()
need to be in kernel text is a good thing.
> I think the future bpf-qdisc can enforce built-in. bpf-tcp-cc has already been
> built-in only also. I think the hid_bpf is built-in only also.
I don't think hid_bpf has any need for such a gen_epilogue() adjustment.
tcp-bpf-cc probably doesn't need it either.
It's cleaner to fix up on the kernel side, no?
qdisc-bpf and the ->dev stuff is probably the only upcoming user.
And that's a separate discussion. I'm not sure such a gen_epilogue()
concept is really that great,
especially considering all the complexity involved.
* Re: [PATCH v2 bpf-next 7/8] bpf: Allow pro/epilogue to call kfunc
2024-08-22 13:47 ` Alexei Starovoitov
@ 2024-08-22 17:38 ` Martin KaFai Lau
2024-08-22 17:58 ` Alexei Starovoitov
0 siblings, 1 reply; 19+ messages in thread
From: Martin KaFai Lau @ 2024-08-22 17:38 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, Kernel Team
On 8/22/24 6:47 AM, Alexei Starovoitov wrote:
> On Wed, Aug 21, 2024 at 11:10 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
[...]
>> BPF_EMIT_CALL() was my earlier thought. I switched to the kfunc in this patch
>> because of the bpf_jit_supports_far_kfunc_call() support for kernel modules.
>> Using a kfunc call makes that support work the same way.
>
> I believe far calls are typically slower,
> so they may be a foot gun.
> If something like qdisc-bpf adds a function call at bpf_exit,
> it will be called every time the program runs, so
> it needs to be really fast.
> Allowing such callable funcs in modules may be a performance issue
> that we'd need to fix.
> So imo making it a design requirement that such funcs for gen_epilogue()
> need to be in kernel text is a good thing.
Agreed. Makes sense.
>
>> I think the future bpf-qdisc can enforce being built-in. bpf-tcp-cc is already
>> built-in only, and I think hid_bpf is built-in only as well.
>
> I don't think hid_bpf has any need for such a gen_epilogue() adjustment.
> tcp-bpf-cc probably doesn't need it either.
> It's cleaner to fix up on the kernel side, no?
tcp-bpf-cc can use it to fix snd_cwnd. We have seen a mistake where snd_cwnd
was set to 0 (or negative, I can't remember which). More than one op of
tcp_congestion_ops may update snd_cwnd, so there would be multiple places
where the kernel needs to do an extra check/fix. It is usually not the fast
path, so that may be ok.
It is not as catastrophic as skb->dev. kfuncs did not exist at that time
either; otherwise, having a kfunc to set snd_cwnd could have been an option.
> qdisc-bpf and the ->dev stuff is probably the only upcoming user.
For skb->dev, maybe having dedicated kfuncs for skb->dev manipulation is the
way to go? An example would be operations that need to touch the
skb->rbnode/dev sharing pointer.
For fixing ->dev in the kernel, there are multiple places doing ->dequeue, and
I am not sure if we need to include the child ->dequeue as well. This fixup
could be refactored into a kernel function, and probably needs a static key in
this fast-path case.
> And that's a separate discussion. I'm not sure such a gen_epilogue()
> concept is really that great,
> especially considering all the complexity involved.
Regardless, I am curious about the problem you pointed out in patch 1, so I am
going to give it a try and remove the kfunc call. I kept the kfunc call
separate in patches 7 and 8 :)
If it still looks too complex or there is no value in gen_epilogue, I am fine
with tabling this set.
* Re: [PATCH v2 bpf-next 7/8] bpf: Allow pro/epilogue to call kfunc
2024-08-22 17:38 ` Martin KaFai Lau
@ 2024-08-22 17:58 ` Alexei Starovoitov
0 siblings, 0 replies; 19+ messages in thread
From: Alexei Starovoitov @ 2024-08-22 17:58 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Yonghong Song, Amery Hung, Kernel Team
On Thu, Aug 22, 2024 at 10:38 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
[...]
> Regardless, I am curious about the problem you pointed out in patch 1, so I am
> going to give it a try and remove the kfunc call. I kept the kfunc call
> separate in patches 7 and 8 :)
>
> If it still looks too complex or there is no value in gen_epilogue, I am fine
> with tabling this set.
I think patches 1-6 are fine and good to go,
mainly because they simplify landing qdisc-bpf.
Once all the pieces are there we may revisit the need for gen_epilogue()
and whether there is an alternative.
I would only drop 7 and 8 for now until they are absolutely needed.
Thread overview: 19+ messages (newest: 2024-08-22 17:58 UTC)
2024-08-21 23:34 [PATCH v2 bpf-next 0/8] bpf: Add gen_epilogue and allow kfunc call in pro/epilogue Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 1/8] bpf: Add gen_epilogue to bpf_verifier_ops Martin KaFai Lau
2024-08-22 0:22 ` Alexei Starovoitov
2024-08-22 0:30 ` Eduard Zingerman
2024-08-22 0:34 ` Alexei Starovoitov
2024-08-22 0:38 ` Eduard Zingerman
2024-08-22 0:52 ` Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 2/8] bpf: Export bpf_base_func_proto Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 3/8] selftests/bpf: attach struct_ops maps before test prog runs Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 4/8] selftests/bpf: Test gen_prologue and gen_epilogue Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 5/8] selftests/bpf: Add tailcall epilogue test Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 6/8] bpf: Add module parameter to gen_prologue and gen_epilogue Martin KaFai Lau
2024-08-21 23:34 ` [PATCH v2 bpf-next 7/8] bpf: Allow pro/epilogue to call kfunc Martin KaFai Lau
2024-08-22 1:32 ` Alexei Starovoitov
2024-08-22 6:09 ` Martin KaFai Lau
2024-08-22 13:47 ` Alexei Starovoitov
2024-08-22 17:38 ` Martin KaFai Lau
2024-08-22 17:58 ` Alexei Starovoitov
2024-08-21 23:34 ` [PATCH v2 bpf-next 8/8] selftests/bpf: Add kfunc call test in gen_prologue and gen_epilogue Martin KaFai Lau