public inbox for bpf@vger.kernel.org
* [RFC PATCH bpf-next v3 0/2] bpf, x64: Fix tailcall infinite loop
@ 2023-08-25 14:52 Leon Hwang
  2023-08-25 14:52 ` [RFC PATCH bpf-next v3 1/2] " Leon Hwang
  2023-08-25 14:52 ` [RFC PATCH bpf-next v3 2/2] selftests/bpf: Add testcases for tailcall infinite loop fixing Leon Hwang
  0 siblings, 2 replies; 8+ messages in thread
From: Leon Hwang @ 2023-08-25 14:52 UTC (permalink / raw)
  To: ast, daniel, andrii, maciej.fijalkowski; +Cc: song, hffilwlqm, bpf

This patch series fixes a tailcall infinite loop.

Since commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
handling in JIT"), tail calls on x64 have worked better than before.

Since commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
for x64 JIT"), tail calls can run in BPF subprograms on x64.

Since commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
to other BPF programs"), a BPF program can trace other BPF programs.

What happens when we combine all three?

1. FENTRY/FEXIT on a BPF subprogram.
2. A tailcall runs in the BPF subprogram.
3. The tailcall calls itself.

As a result, a tailcall infinite loop comes up, and the loop halts the
machine.

As we know, in a tail call context, the tail_call_cnt propagates between
BPF subprograms via the stack and the rax register. Trampolines have to do
the same.
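
Here is a minimal sketch of the looping setup, in BPF C. The names are
illustrative, not the actual selftests (those are in patch 2); userspace
populates jmp_table[0] with entry(), which closes the loop:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

static __noinline int subprog_tail(struct __sk_buff *skb)
{
	/* An fentry/fexit prog attaches here. Without this fix, its
	 * trampoline does not preserve rax, so the tail_call_cnt passed
	 * in by the caller is lost and the MAX_TAIL_CALL_CNT check
	 * never fires.
	 */
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;
}

SEC("tc")
int entry(struct __sk_buff *skb)
{
	return subprog_tail(skb);
}

char _license[] SEC("license") = "GPL";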

How did I discover the bug?

Since commit 7f6e4312e15a5c37 ("bpf: Limit caller's stack depth 256 for
subprogs with tailcalls"), the total stack size is limited to around 8KiB.
So I wrote some BPF progs to validate the stack consumption: tailcalls
running in bpf2bpf calls, with FENTRY/FEXIT tracing the bpf2bpf calls[1].

While doing that, I accidentally created a tailcall loop, and the loop
halted my VM. Without the loop, the BPF progs would consume over 8KiB of
stack size, yet that _stack overflow_ did not halt my VM.

With bpf_printk(), I confirmed that the tail call count limit did not work
as expected. So I read the code and fixed it.

Finally, and unfortunately, I have only fixed it on x64, not on the other
arches. As a result, CI tests failed because this bug has not been fixed
on s390x.

Help with the other arches is requested.

v2 -> v3:
  * Suggestions from Alexei:
    * Fix comment style.
    * Remove FIXME comment.
    * Remove arch_prepare_bpf_trampoline() function comment update.
    * Remove the unnecessary 'tgt_prog->aux->func[subprog]->is_func' check.

[1]: https://github.com/Asphaltt/learn-by-example/tree/main/ebpf/tailcall-stackoverflow

Leon Hwang (2):
  bpf, x64: Fix tailcall infinite loop
  selftests/bpf: Add testcases for tailcall infinite loop fixing

 arch/x86/net/bpf_jit_comp.c                   |  32 ++-
 include/linux/bpf.h                           |   5 +
 kernel/bpf/trampoline.c                       |   4 +-
 kernel/bpf/verifier.c                         |  30 ++-
 .../selftests/bpf/prog_tests/tailcalls.c      | 194 +++++++++++++++++-
 .../bpf/progs/tailcall_bpf2bpf_fentry.c       |  18 ++
 .../bpf/progs/tailcall_bpf2bpf_fexit.c        |  18 ++
 7 files changed, 285 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fentry.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fexit.c


base-commit: 9930e4af4b509bcf6f060b09b16884f26102d110
-- 
2.41.0



* [RFC PATCH bpf-next v3 1/2] bpf, x64: Fix tailcall infinite loop
  2023-08-25 14:52 [RFC PATCH bpf-next v3 0/2] bpf, x64: Fix tailcall infinite loop Leon Hwang
@ 2023-08-25 14:52 ` Leon Hwang
  2023-08-25 17:58   ` Maciej Fijalkowski
  2023-08-25 14:52 ` [RFC PATCH bpf-next v3 2/2] selftests/bpf: Add testcases for tailcall infinite loop fixing Leon Hwang
  1 sibling, 1 reply; 8+ messages in thread
From: Leon Hwang @ 2023-08-25 14:52 UTC (permalink / raw)
  To: ast, daniel, andrii, maciej.fijalkowski; +Cc: song, hffilwlqm, bpf

Since commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
handling in JIT"), tail calls on x64 have worked better than before.

Since commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
for x64 JIT"), tail calls can run in BPF subprograms on x64.

Since commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
to other BPF programs"), a BPF program can trace other BPF programs.

What happens when we combine all three?

1. FENTRY/FEXIT on a BPF subprogram.
2. A tailcall runs in the BPF subprogram.
3. The tailcall calls itself.

As a result, a tailcall infinite loop comes up, and the loop halts the
machine.

As we know, in a tail call context, the tail_call_cnt propagates between
BPF subprograms via the stack and the rax register. Trampolines have to do
the same.

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c | 32 ++++++++++++++++++++++++++------
 include/linux/bpf.h         |  5 +++++
 kernel/bpf/trampoline.c     |  4 ++--
 kernel/bpf/verifier.c       | 30 +++++++++++++++++++++++-------
 4 files changed, 56 insertions(+), 15 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a5930042139d3..2846c21d75bfa 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -303,8 +303,12 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
 	prog += X86_PATCH_SIZE;
 	if (!ebpf_from_cbpf) {
 		if (tail_call_reachable && !is_subprog)
+			/* When it's the entry of the whole tailcall context,
+			 * zeroing rax means initialising tail_call_cnt.
+			 */
 			EMIT2(0x31, 0xC0); /* xor eax, eax */
 		else
+			/* Keep the same instruction layout. */
 			EMIT2(0x66, 0x90); /* nop2 */
 	}
 	EMIT1(0x55);             /* push rbp */
@@ -1018,6 +1022,10 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
 
 #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
 
+/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
+#define RESTORE_TAIL_CALL_CNT(stack)				\
+	EMIT3_off32(0x48, 0x8B, 0x85, -round_up(stack, 8) - 8)
+
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
 		  int oldproglen, struct jit_context *ctx, bool jmp_padding)
 {
@@ -1623,9 +1631,7 @@ st:			if (is_imm8(insn->off))
 
 			func = (u8 *) __bpf_call_base + imm32;
 			if (tail_call_reachable) {
-				/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
-				EMIT3_off32(0x48, 0x8B, 0x85,
-					    -round_up(bpf_prog->aux->stack_depth, 8) - 8);
+				RESTORE_TAIL_CALL_CNT(bpf_prog->aux->stack_depth);
 				if (!imm32)
 					return -EINVAL;
 				offs = 7 + x86_call_depth_emit_accounting(&prog, func);
@@ -2400,6 +2406,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 	 *                     [ ...        ]
 	 *                     [ stack_arg2 ]
 	 * RBP - arg_stack_off [ stack_arg1 ]
+	 * RSP                 [ tail_call_cnt ] BPF_TRAMP_F_TAIL_CALL_CTX
 	 */
 
 	/* room for return value of orig_call or fentry prog */
@@ -2464,6 +2471,8 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 	else
 		/* sub rsp, stack_size */
 		EMIT4(0x48, 0x83, 0xEC, stack_size);
+	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+		EMIT1(0x50);		/* push rax */
 	/* mov QWORD PTR [rbp - rbx_off], rbx */
 	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);
 
@@ -2516,9 +2525,15 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 		restore_regs(m, &prog, regs_off);
 		save_args(m, &prog, arg_stack_off, true);
 
+		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+			/* Before calling the original function, restore the
+			 * tail_call_cnt from stack to rax.
+			 */
+			RESTORE_TAIL_CALL_CNT(stack_size);
+
 		if (flags & BPF_TRAMP_F_ORIG_STACK) {
-			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
-			EMIT2(0xff, 0xd0); /* call *rax */
+			emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, 8);
+			EMIT2(0xff, 0xd3); /* call *rbx */
 		} else {
 			/* call original function */
 			if (emit_rsb_call(&prog, orig_call, prog)) {
@@ -2569,7 +2584,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 			ret = -EINVAL;
 			goto cleanup;
 		}
-	}
+	} else if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+		/* Before running the original function, restore the
+		 * tail_call_cnt from stack to rax.
+		 */
+		RESTORE_TAIL_CALL_CNT(stack_size);
+
 	/* restore return value of orig_call or fentry prog back into RAX */
 	if (save_ret)
 		emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cfabbcf47bdb8..c8df257ea435d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1028,6 +1028,11 @@ struct btf_func_model {
  */
 #define BPF_TRAMP_F_SHARE_IPMODIFY	BIT(6)
 
+/* Indicate that current trampoline is in a tail call context. Then, it has to
+ * cache and restore tail_call_cnt to avoid infinite tail call loop.
+ */
+#define BPF_TRAMP_F_TAIL_CALL_CTX	BIT(7)
+
 /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
  * bytes on x86.
  */
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 78acf28d48732..16ab5da7161f2 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -415,8 +415,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
 		goto out;
 	}
 
-	/* clear all bits except SHARE_IPMODIFY */
-	tr->flags &= BPF_TRAMP_F_SHARE_IPMODIFY;
+	/* clear all bits except SHARE_IPMODIFY and TAIL_CALL_CTX */
+	tr->flags &= (BPF_TRAMP_F_SHARE_IPMODIFY | BPF_TRAMP_F_TAIL_CALL_CTX);
 
 	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
 	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4ccca1f6c9981..6f290bc6f5f19 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19246,6 +19246,21 @@ static int check_non_sleepable_error_inject(u32 btf_id)
 	return btf_id_set_contains(&btf_non_sleepable_error_inject, btf_id);
 }
 
+static inline int find_subprog_index(const struct bpf_prog *prog,
+				     u32 btf_id)
+{
+	struct bpf_prog_aux *aux = prog->aux;
+	int i, subprog = -1;
+
+	for (i = 0; i < aux->func_info_cnt; i++)
+		if (aux->func_info[i].type_id == btf_id) {
+			subprog = i;
+			break;
+		}
+
+	return subprog;
+}
+
 int bpf_check_attach_target(struct bpf_verifier_log *log,
 			    const struct bpf_prog *prog,
 			    const struct bpf_prog *tgt_prog,
@@ -19254,9 +19269,9 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 {
 	bool prog_extension = prog->type == BPF_PROG_TYPE_EXT;
 	const char prefix[] = "btf_trace_";
-	int ret = 0, subprog = -1, i;
 	const struct btf_type *t;
 	bool conservative = true;
+	int ret = 0, subprog;
 	const char *tname;
 	struct btf *btf;
 	long addr = 0;
@@ -19291,11 +19306,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 			return -EINVAL;
 		}
 
-		for (i = 0; i < aux->func_info_cnt; i++)
-			if (aux->func_info[i].type_id == btf_id) {
-				subprog = i;
-				break;
-			}
+		subprog = find_subprog_index(tgt_prog, btf_id);
 		if (subprog == -1) {
 			bpf_log(log, "Subprog %s doesn't exist\n", tname);
 			return -EINVAL;
@@ -19559,7 +19570,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	struct bpf_attach_target_info tgt_info = {};
 	u32 btf_id = prog->aux->attach_btf_id;
 	struct bpf_trampoline *tr;
-	int ret;
+	int ret, subprog;
 	u64 key;
 
 	if (prog->type == BPF_PROG_TYPE_SYSCALL) {
@@ -19629,6 +19640,11 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	if (!tr)
 		return -ENOMEM;
 
+	if (tgt_prog && tgt_prog->aux->tail_call_reachable) {
+		subprog = find_subprog_index(tgt_prog, btf_id);
+		tr->flags = subprog > 0 ? BPF_TRAMP_F_TAIL_CALL_CTX : 0;
+	}
+
 	prog->aux->dst_trampoline = tr;
 	return 0;
 }
-- 
2.41.0



* [RFC PATCH bpf-next v3 2/2] selftests/bpf: Add testcases for tailcall infinite loop fixing
  2023-08-25 14:52 [RFC PATCH bpf-next v3 0/2] bpf, x64: Fix tailcall infinite loop Leon Hwang
  2023-08-25 14:52 ` [RFC PATCH bpf-next v3 1/2] " Leon Hwang
@ 2023-08-25 14:52 ` Leon Hwang
  1 sibling, 0 replies; 8+ messages in thread
From: Leon Hwang @ 2023-08-25 14:52 UTC (permalink / raw)
  To: ast, daniel, andrii, maciej.fijalkowski; +Cc: song, hffilwlqm, bpf

Add 3 test cases to confirm that the tailcall infinite loop bug has been
fixed.

Like the tailcall_bpf2bpf cases, they attach fentry/fexit progs to the
bpf2bpf call and then check the final count result.
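
Condensed, the attachment flow in each case looks like this (a sketch of
what the test below does, with error handling omitted; prog_fd is the fd
of the target prog's classifier_0, obtained as in the test):

	struct bpf_object *fentry_obj;
	struct bpf_program *prog;
	struct bpf_link *link;

	fentry_obj = bpf_object__open_file("tailcall_bpf2bpf_fentry.bpf.o", NULL);
	prog = bpf_object__find_program_by_name(fentry_obj, "fentry");
	/* attach to the subprog of the target prog, not to the prog itself */
	bpf_program__set_attach_target(prog, prog_fd, "subprog_tail");
	bpf_object__load(fentry_obj);
	link = bpf_program__attach_trace(prog);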

tools/testing/selftests/bpf/test_progs -t tailcalls
226/13  tailcalls/tailcall_bpf2bpf_fentry:OK
226/14  tailcalls/tailcall_bpf2bpf_fexit:OK
226/15  tailcalls/tailcall_bpf2bpf_fentry_fexit:OK
226     tailcalls:OK
Summary: 1/15 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
---
 .../selftests/bpf/prog_tests/tailcalls.c      | 194 +++++++++++++++++-
 .../bpf/progs/tailcall_bpf2bpf_fentry.c       |  18 ++
 .../bpf/progs/tailcall_bpf2bpf_fexit.c        |  18 ++
 3 files changed, 229 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fentry.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fexit.c

diff --git a/tools/testing/selftests/bpf/prog_tests/tailcalls.c b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
index 58fe2c586ed76..a47c2fd6b8d37 100644
--- a/tools/testing/selftests/bpf/prog_tests/tailcalls.c
+++ b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
@@ -634,7 +634,7 @@ static void test_tailcall_bpf2bpf_2(void)
 		return;
 
 	data_fd = bpf_map__fd(data_map);
-	if (CHECK_FAIL(map_fd < 0))
+	if (CHECK_FAIL(data_fd < 0))
 		return;
 
 	i = 0;
@@ -884,6 +884,191 @@ static void test_tailcall_bpf2bpf_6(void)
 	tailcall_bpf2bpf6__destroy(obj);
 }
 
+static void __tailcall_bpf2bpf_fentry_fexit(bool test_fentry, bool test_fexit)
+{
+	struct bpf_object *tgt_obj = NULL, *fentry_obj = NULL, *fexit_obj = NULL;
+	struct bpf_link *fentry_link = NULL, *fexit_link = NULL;
+	int err, map_fd, prog_fd, main_fd, data_fd, i, val;
+	struct bpf_map *prog_array, *data_map;
+	struct bpf_program *prog;
+	char buff[128] = {};
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = buff,
+		.data_size_in = sizeof(buff),
+		.repeat = 1,
+	);
+
+	err = bpf_prog_test_load("tailcall_bpf2bpf2.bpf.o",
+				 BPF_PROG_TYPE_SCHED_CLS,
+				 &tgt_obj, &prog_fd);
+	if (!ASSERT_OK(err, "load tgt_obj"))
+		return;
+
+	prog = bpf_object__find_program_by_name(tgt_obj, "entry");
+	if (!ASSERT_OK_PTR(prog, "find entry prog"))
+		goto out;
+
+	main_fd = bpf_program__fd(prog);
+	if (!ASSERT_FALSE(main_fd < 0, "find entry prog fd"))
+		goto out;
+
+	prog_array = bpf_object__find_map_by_name(tgt_obj, "jmp_table");
+	if (!ASSERT_OK_PTR(prog_array, "find jmp_table map"))
+		goto out;
+
+	map_fd = bpf_map__fd(prog_array);
+	if (!ASSERT_FALSE(map_fd < 0, "find jmp_table map fd"))
+		goto out;
+
+	prog = bpf_object__find_program_by_name(tgt_obj, "classifier_0");
+	if (!ASSERT_OK_PTR(prog, "find classifier_0 prog"))
+		goto out;
+
+	prog_fd = bpf_program__fd(prog);
+	if (!ASSERT_FALSE(prog_fd < 0, "find classifier_0 prog fd"))
+		goto out;
+
+	i = 0;
+	err = bpf_map_update_elem(map_fd, &i, &prog_fd, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table"))
+		goto out;
+
+	if (test_fentry) {
+		fentry_obj = bpf_object__open_file("tailcall_bpf2bpf_fentry.bpf.o",
+						   NULL);
+		if (!ASSERT_OK_PTR(fentry_obj, "open fentry_obj file"))
+			goto out;
+
+		prog = bpf_object__find_program_by_name(fentry_obj, "fentry");
+		if (!ASSERT_OK_PTR(prog, "find fentry prog"))
+			goto out;
+
+		err = bpf_program__set_attach_target(prog, prog_fd,
+						     "subprog_tail");
+		if (!ASSERT_OK(err, "set_attach_target subprog_tail"))
+			goto out;
+
+		err = bpf_object__load(fentry_obj);
+		if (!ASSERT_OK(err, "load fentry_obj"))
+			goto out;
+
+		fentry_link = bpf_program__attach_trace(prog);
+		if (!ASSERT_OK_PTR(fentry_link, "attach_trace"))
+			goto out;
+	}
+
+	if (test_fexit) {
+		fexit_obj = bpf_object__open_file("tailcall_bpf2bpf_fexit.bpf.o",
+						  NULL);
+		if (!ASSERT_OK_PTR(fexit_obj, "open fexit_obj file"))
+			goto out;
+
+		prog = bpf_object__find_program_by_name(fexit_obj, "fexit");
+		if (!ASSERT_OK_PTR(prog, "find fexit prog"))
+			goto out;
+
+		err = bpf_program__set_attach_target(prog, prog_fd,
+						     "subprog_tail");
+		if (!ASSERT_OK(err, "set_attach_target subprog_tail"))
+			goto out;
+
+		err = bpf_object__load(fexit_obj);
+		if (!ASSERT_OK(err, "load fexit_obj"))
+			goto out;
+
+		fexit_link = bpf_program__attach_trace(prog);
+		if (!ASSERT_OK_PTR(fexit_link, "attach_trace"))
+			goto out;
+	}
+
+	err = bpf_prog_test_run_opts(main_fd, &topts);
+	ASSERT_OK(err, "tailcall");
+	ASSERT_EQ(topts.retval, 1, "tailcall retval");
+
+	data_map = bpf_object__find_map_by_name(tgt_obj, "tailcall.bss");
+	if (!ASSERT_FALSE(!data_map || !bpf_map__is_internal(data_map),
+			  "find tailcall.bss map"))
+		goto out;
+
+	data_fd = bpf_map__fd(data_map);
+	if (!ASSERT_FALSE(data_fd < 0, "find tailcall.bss map fd"))
+		goto out;
+
+	i = 0;
+	err = bpf_map_lookup_elem(data_fd, &i, &val);
+	ASSERT_OK(err, "tailcall count");
+	ASSERT_EQ(val, 33, "tailcall count");
+
+	if (test_fentry) {
+		data_map = bpf_object__find_map_by_name(fentry_obj, ".bss");
+		if (!ASSERT_FALSE(!data_map || !bpf_map__is_internal(data_map),
+				  "find tailcall_bpf2bpf_fentry.bss map"))
+			goto out;
+
+		data_fd = bpf_map__fd(data_map);
+		if (!ASSERT_FALSE(data_fd < 0,
+				  "find tailcall_bpf2bpf_fentry.bss map fd"))
+			goto out;
+
+		i = 0;
+		err = bpf_map_lookup_elem(data_fd, &i, &val);
+		ASSERT_OK(err, "fentry count");
+		ASSERT_EQ(val, 33, "fentry count");
+	}
+
+	if (test_fexit) {
+		data_map = bpf_object__find_map_by_name(fexit_obj, ".bss");
+		if (!ASSERT_FALSE(!data_map || !bpf_map__is_internal(data_map),
+				  "find tailcall_bpf2bpf_fexit.bss map"))
+			goto out;
+
+		data_fd = bpf_map__fd(data_map);
+		if (!ASSERT_FALSE(data_fd < 0,
+				  "find tailcall_bpf2bpf_fexit.bss map fd"))
+			goto out;
+
+		i = 0;
+		err = bpf_map_lookup_elem(data_fd, &i, &val);
+		ASSERT_OK(err, "fexit count");
+		ASSERT_EQ(val, 33, "fexit count");
+	}
+
+out:
+	bpf_link__destroy(fentry_link);
+	bpf_link__destroy(fexit_link);
+	bpf_object__close(fentry_obj);
+	bpf_object__close(fexit_obj);
+	bpf_object__close(tgt_obj);
+}
+
+/* test_tailcall_bpf2bpf_fentry checks that the count value of the tail call
+ * limit enforcement matches with expectations when tailcall is preceded with
+ * bpf2bpf call, and the bpf2bpf call is traced by fentry.
+ */
+static void test_tailcall_bpf2bpf_fentry(void)
+{
+	__tailcall_bpf2bpf_fentry_fexit(true, false);
+}
+
+/* test_tailcall_bpf2bpf_fexit checks that the count value of the tail call
+ * limit enforcement matches with expectations when tailcall is preceded with
+ * bpf2bpf call, and the bpf2bpf call is traced by fexit.
+ */
+static void test_tailcall_bpf2bpf_fexit(void)
+{
+	__tailcall_bpf2bpf_fentry_fexit(false, true);
+}
+
+/* test_tailcall_bpf2bpf_fentry_fexit checks that the count value of the tail call
+ * limit enforcement matches with expectations when tailcall is preceded with
+ * bpf2bpf call, and the bpf2bpf call is traced by both fentry and fexit.
+ */
+static void test_tailcall_bpf2bpf_fentry_fexit(void)
+{
+	__tailcall_bpf2bpf_fentry_fexit(true, true);
+}
+
 void test_tailcalls(void)
 {
 	if (test__start_subtest("tailcall_1"))
@@ -910,4 +1095,11 @@ void test_tailcalls(void)
 		test_tailcall_bpf2bpf_4(true);
 	if (test__start_subtest("tailcall_bpf2bpf_6"))
 		test_tailcall_bpf2bpf_6();
+	if (test__start_subtest("tailcall_bpf2bpf_fentry"))
+		test_tailcall_bpf2bpf_fentry();
+	if (test__start_subtest("tailcall_bpf2bpf_fexit"))
+		test_tailcall_bpf2bpf_fexit();
+	if (test__start_subtest("tailcall_bpf2bpf_fentry_fexit"))
+		test_tailcall_bpf2bpf_fentry_fexit();
 }
+
diff --git a/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fentry.c b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fentry.c
new file mode 100644
index 0000000000000..8436c6729167c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fentry.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Leon Hwang */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+int count = 0;
+
+SEC("fentry/subprog_tail")
+int BPF_PROG(fentry, struct sk_buff *skb)
+{
+	count++;
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fexit.c b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fexit.c
new file mode 100644
index 0000000000000..fe16412c6e6e9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fexit.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Leon Hwang */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+int count = 0;
+
+SEC("fexit/subprog_tail")
+int BPF_PROG(fexit, struct sk_buff *skb)
+{
+	count++;
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.41.0



* Re: [RFC PATCH bpf-next v3 1/2] bpf, x64: Fix tailcall infinite loop
  2023-08-25 14:52 ` [RFC PATCH bpf-next v3 1/2] " Leon Hwang
@ 2023-08-25 17:58   ` Maciej Fijalkowski
  2023-08-26  4:03     ` Leon Hwang
  0 siblings, 1 reply; 8+ messages in thread
From: Maciej Fijalkowski @ 2023-08-25 17:58 UTC (permalink / raw)
  To: Leon Hwang; +Cc: ast, daniel, andrii, song, bpf

On Fri, Aug 25, 2023 at 10:52:15PM +0800, Leon Hwang wrote:
> From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
> handling in JIT"), the tailcall on x64 works better than before.
> 
> From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
> for x64 JIT"), tailcall is able to run in BPF subprograms on x64.
> 
> From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
> to other BPF programs"), BPF program is able to trace other BPF programs.
> 
> How about combining them all together?
> 
> 1. FENTRY/FEXIT on a BPF subprogram.
> 2. A tailcall runs in the BPF subprogram.
> 3. The tailcall calls itself.

I would be interested in seeing broken asm code TBH :)

> 
> As a result, a tailcall infinite loop comes up. And the loop would halt
> the machine.
> 
> As we know, in tail call context, the tail_call_cnt propagates by stack
> and rax register between BPF subprograms. So do it in trampolines.
> 
> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
> ---
>  arch/x86/net/bpf_jit_comp.c | 32 ++++++++++++++++++++++++++------
>  include/linux/bpf.h         |  5 +++++
>  kernel/bpf/trampoline.c     |  4 ++--
>  kernel/bpf/verifier.c       | 30 +++++++++++++++++++++++-------
>  4 files changed, 56 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index a5930042139d3..2846c21d75bfa 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -303,8 +303,12 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
>  	prog += X86_PATCH_SIZE;
>  	if (!ebpf_from_cbpf) {
>  		if (tail_call_reachable && !is_subprog)
> +			/* When it's the entry of the whole tailcall context,
> +			 * zeroing rax means initialising tail_call_cnt.
> +			 */
>  			EMIT2(0x31, 0xC0); /* xor eax, eax */
>  		else
> +			/* Keep the same instruction layout. */

While these comments are helpful, I have mixed feelings about them residing
in this patch - my rule of thumb is to keep fixes as small as possible.

>  			EMIT2(0x66, 0x90); /* nop2 */
>  	}
>  	EMIT1(0x55);             /* push rbp */
> @@ -1018,6 +1022,10 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
>  
>  #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
>  
> +/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
> +#define RESTORE_TAIL_CALL_CNT(stack)				\
> +	EMIT3_off32(0x48, 0x8B, 0x85, -round_up(stack, 8) - 8)
> +
>  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
>  		  int oldproglen, struct jit_context *ctx, bool jmp_padding)
>  {
> @@ -1623,9 +1631,7 @@ st:			if (is_imm8(insn->off))
>  
>  			func = (u8 *) __bpf_call_base + imm32;
>  			if (tail_call_reachable) {
> -				/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
> -				EMIT3_off32(0x48, 0x8B, 0x85,
> -					    -round_up(bpf_prog->aux->stack_depth, 8) - 8);
> +				RESTORE_TAIL_CALL_CNT(bpf_prog->aux->stack_depth);
>  				if (!imm32)
>  					return -EINVAL;
>  				offs = 7 + x86_call_depth_emit_accounting(&prog, func);
> @@ -2400,6 +2406,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  	 *                     [ ...        ]
>  	 *                     [ stack_arg2 ]
>  	 * RBP - arg_stack_off [ stack_arg1 ]
> +	 * RSP                 [ tail_call_cnt ] BPF_TRAMP_F_TAIL_CALL_CTX
>  	 */
>  
>  	/* room for return value of orig_call or fentry prog */
> @@ -2464,6 +2471,8 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  	else
>  		/* sub rsp, stack_size */
>  		EMIT4(0x48, 0x83, 0xEC, stack_size);
> +	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> +		EMIT1(0x50);		/* push rax */
>  	/* mov QWORD PTR [rbp - rbx_off], rbx */
>  	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);
>  
> @@ -2516,9 +2525,15 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  		restore_regs(m, &prog, regs_off);
>  		save_args(m, &prog, arg_stack_off, true);
>  
> +		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> +			/* Before calling the original function, restore the
> +			 * tail_call_cnt from stack to rax.
> +			 */
> +			RESTORE_TAIL_CALL_CNT(stack_size);
> +
>  		if (flags & BPF_TRAMP_F_ORIG_STACK) {
> -			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
> -			EMIT2(0xff, 0xd0); /* call *rax */
> +			emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, 8);
> +			EMIT2(0xff, 0xd3); /* call *rbx */
>  		} else {
>  			/* call original function */
>  			if (emit_rsb_call(&prog, orig_call, prog)) {
> @@ -2569,7 +2584,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  			ret = -EINVAL;
>  			goto cleanup;
>  		}
> -	}
> +	} else if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> +		/* Before running the original function, restore the
> +		 * tail_call_cnt from stack to rax.
> +		 */
> +		RESTORE_TAIL_CALL_CNT(stack_size);
> +
>  	/* restore return value of orig_call or fentry prog back into RAX */
>  	if (save_ret)
>  		emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index cfabbcf47bdb8..c8df257ea435d 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1028,6 +1028,11 @@ struct btf_func_model {
>   */
>  #define BPF_TRAMP_F_SHARE_IPMODIFY	BIT(6)
>  
> +/* Indicate that current trampoline is in a tail call context. Then, it has to
> + * cache and restore tail_call_cnt to avoid infinite tail call loop.
> + */
> +#define BPF_TRAMP_F_TAIL_CALL_CTX	BIT(7)
> +
>  /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
>   * bytes on x86.
>   */
> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> index 78acf28d48732..16ab5da7161f2 100644
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -415,8 +415,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
>  		goto out;
>  	}
>  
> -	/* clear all bits except SHARE_IPMODIFY */
> -	tr->flags &= BPF_TRAMP_F_SHARE_IPMODIFY;
> +	/* clear all bits except SHARE_IPMODIFY and TAIL_CALL_CTX */
> +	tr->flags &= (BPF_TRAMP_F_SHARE_IPMODIFY | BPF_TRAMP_F_TAIL_CALL_CTX);
>  
>  	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
>  	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 4ccca1f6c9981..6f290bc6f5f19 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -19246,6 +19246,21 @@ static int check_non_sleepable_error_inject(u32 btf_id)
>  	return btf_id_set_contains(&btf_non_sleepable_error_inject, btf_id);
>  }
>  
> +static inline int find_subprog_index(const struct bpf_prog *prog,

FWIW please no inlines in source files, but I don't currently follow the
need for that routine.

> +				     u32 btf_id)
> +{
> +	struct bpf_prog_aux *aux = prog->aux;
> +	int i, subprog = -1;
> +
> +	for (i = 0; i < aux->func_info_cnt; i++)
> +		if (aux->func_info[i].type_id == btf_id) {
> +			subprog = i;
> +			break;
> +		}
> +
> +	return subprog;
> +}
> +
>  int bpf_check_attach_target(struct bpf_verifier_log *log,
>  			    const struct bpf_prog *prog,
>  			    const struct bpf_prog *tgt_prog,
> @@ -19254,9 +19269,9 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
>  {
>  	bool prog_extension = prog->type == BPF_PROG_TYPE_EXT;
>  	const char prefix[] = "btf_trace_";
> -	int ret = 0, subprog = -1, i;
>  	const struct btf_type *t;
>  	bool conservative = true;
> +	int ret = 0, subprog;
>  	const char *tname;
>  	struct btf *btf;
>  	long addr = 0;
> @@ -19291,11 +19306,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
>  			return -EINVAL;
>  		}
>  
> -		for (i = 0; i < aux->func_info_cnt; i++)
> -			if (aux->func_info[i].type_id == btf_id) {
> -				subprog = i;
> -				break;
> -			}
> +		subprog = find_subprog_index(tgt_prog, btf_id);
>  		if (subprog == -1) {
>  			bpf_log(log, "Subprog %s doesn't exist\n", tname);
>  			return -EINVAL;
> @@ -19559,7 +19570,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
>  	struct bpf_attach_target_info tgt_info = {};
>  	u32 btf_id = prog->aux->attach_btf_id;
>  	struct bpf_trampoline *tr;
> -	int ret;
> +	int ret, subprog;
>  	u64 key;
>  
>  	if (prog->type == BPF_PROG_TYPE_SYSCALL) {
> @@ -19629,6 +19640,11 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
>  	if (!tr)
>  		return -ENOMEM;
>  
> +	if (tgt_prog && tgt_prog->aux->tail_call_reachable) {
> +		subprog = find_subprog_index(tgt_prog, btf_id);
> +		tr->flags = subprog > 0 ? BPF_TRAMP_F_TAIL_CALL_CTX : 0;
> +	}

I kinda forgot trampoline internals, so please bear with me.
What exactly are you checking here? That the current program is a subprog
of the tgt prog? My knee-jerk reaction would be to propagate
BPF_TRAMP_F_TAIL_CALL_CTX based on just tail_call_reachable, but I need
some more time to get my head around it again, sorry :<

> +
>  	prog->aux->dst_trampoline = tr;
>  	return 0;
>  }
> -- 
> 2.41.0
> 


* Re: [RFC PATCH bpf-next v3 1/2] bpf, x64: Fix tailcall infinite loop
  2023-08-25 17:58   ` Maciej Fijalkowski
@ 2023-08-26  4:03     ` Leon Hwang
  2023-08-30 22:49       ` Maciej Fijalkowski
  0 siblings, 1 reply; 8+ messages in thread
From: Leon Hwang @ 2023-08-26  4:03 UTC (permalink / raw)
  To: Maciej Fijalkowski; +Cc: ast, daniel, andrii, song, bpf



On 2023/8/26 01:58, Maciej Fijalkowski wrote:
> On Fri, Aug 25, 2023 at 10:52:15PM +0800, Leon Hwang wrote:
>> From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
>> handling in JIT"), the tailcall on x64 works better than before.
>>
>> From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
>> for x64 JIT"), tailcall is able to run in BPF subprograms on x64.
>>
>> From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
>> to other BPF programs"), BPF program is able to trace other BPF programs.
>>
>> How about combining them all together?
>>
>> 1. FENTRY/FEXIT on a BPF subprogram.
>> 2. A tailcall runs in the BPF subprogram.
>> 3. The tailcall calls itself.
> 
> I would be interested in seeing broken asm code TBH :)
> 
>>
>> As a result, a tailcall infinite loop comes up. And the loop would halt
>> the machine.
>>
>> As we know, in tail call context, the tail_call_cnt propagates by stack
>> and rax register between BPF subprograms. So do it in trampolines.
>>
>> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
>> ---
>>  arch/x86/net/bpf_jit_comp.c | 32 ++++++++++++++++++++++++++------
>>  include/linux/bpf.h         |  5 +++++
>>  kernel/bpf/trampoline.c     |  4 ++--
>>  kernel/bpf/verifier.c       | 30 +++++++++++++++++++++++-------
>>  4 files changed, 56 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index a5930042139d3..2846c21d75bfa 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -303,8 +303,12 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
>>  	prog += X86_PATCH_SIZE;
>>  	if (!ebpf_from_cbpf) {
>>  		if (tail_call_reachable && !is_subprog)
>> +			/* When it's the entry of the whole tailcall context,
>> +			 * zeroing rax means initialising tail_call_cnt.
>> +			 */
>>  			EMIT2(0x31, 0xC0); /* xor eax, eax */
>>  		else
>> +			/* Keep the same instruction layout. */
> 
> While these comments are helpful I have mixed feelings about them residing
> in this patch - rule of thumb to me is to keep the fixes as small as
> possible.

Got it. I'll separate them into another patch.

Thanks for your rule of thumb.

> 
>>  			EMIT2(0x66, 0x90); /* nop2 */
>>  	}
>>  	EMIT1(0x55);             /* push rbp */
>> @@ -1018,6 +1022,10 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
>>  
>>  #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
>>  
>> +/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
>> +#define RESTORE_TAIL_CALL_CNT(stack)				\
>> +	EMIT3_off32(0x48, 0x8B, 0x85, -round_up(stack, 8) - 8)
>> +
>>  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
>>  		  int oldproglen, struct jit_context *ctx, bool jmp_padding)
>>  {
>> @@ -1623,9 +1631,7 @@ st:			if (is_imm8(insn->off))
>>  
>>  			func = (u8 *) __bpf_call_base + imm32;
>>  			if (tail_call_reachable) {
>> -				/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
>> -				EMIT3_off32(0x48, 0x8B, 0x85,
>> -					    -round_up(bpf_prog->aux->stack_depth, 8) - 8);
>> +				RESTORE_TAIL_CALL_CNT(bpf_prog->aux->stack_depth);
>>  				if (!imm32)
>>  					return -EINVAL;
>>  				offs = 7 + x86_call_depth_emit_accounting(&prog, func);
>> @@ -2400,6 +2406,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>>  	 *                     [ ...        ]
>>  	 *                     [ stack_arg2 ]
>>  	 * RBP - arg_stack_off [ stack_arg1 ]
>> +	 * RSP                 [ tail_call_cnt ] BPF_TRAMP_F_TAIL_CALL_CTX
>>  	 */
>>  
>>  	/* room for return value of orig_call or fentry prog */
>> @@ -2464,6 +2471,8 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>>  	else
>>  		/* sub rsp, stack_size */
>>  		EMIT4(0x48, 0x83, 0xEC, stack_size);
>> +	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
>> +		EMIT1(0x50);		/* push rax */
>>  	/* mov QWORD PTR [rbp - rbx_off], rbx */
>>  	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);
>>  
>> @@ -2516,9 +2525,15 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>>  		restore_regs(m, &prog, regs_off);
>>  		save_args(m, &prog, arg_stack_off, true);
>>  
>> +		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
>> +			/* Before calling the original function, restore the
>> +			 * tail_call_cnt from stack to rax.
>> +			 */
>> +			RESTORE_TAIL_CALL_CNT(stack_size);
>> +
>>  		if (flags & BPF_TRAMP_F_ORIG_STACK) {
>> -			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
>> -			EMIT2(0xff, 0xd0); /* call *rax */
>> +			emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, 8);
>> +			EMIT2(0xff, 0xd3); /* call *rbx */
>>  		} else {
>>  			/* call original function */
>>  			if (emit_rsb_call(&prog, orig_call, prog)) {
>> @@ -2569,7 +2584,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>>  			ret = -EINVAL;
>>  			goto cleanup;
>>  		}
>> -	}
>> +	} else if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
>> +		/* Before running the original function, restore the
>> +		 * tail_call_cnt from stack to rax.
>> +		 */
>> +		RESTORE_TAIL_CALL_CNT(stack_size);
>> +
>>  	/* restore return value of orig_call or fentry prog back into RAX */
>>  	if (save_ret)
>>  		emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index cfabbcf47bdb8..c8df257ea435d 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -1028,6 +1028,11 @@ struct btf_func_model {
>>   */
>>  #define BPF_TRAMP_F_SHARE_IPMODIFY	BIT(6)
>>  
>> +/* Indicate that current trampoline is in a tail call context. Then, it has to
>> + * cache and restore tail_call_cnt to avoid infinite tail call loop.
>> + */
>> +#define BPF_TRAMP_F_TAIL_CALL_CTX	BIT(7)
>> +
>>  /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
>>   * bytes on x86.
>>   */
>> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
>> index 78acf28d48732..16ab5da7161f2 100644
>> --- a/kernel/bpf/trampoline.c
>> +++ b/kernel/bpf/trampoline.c
>> @@ -415,8 +415,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
>>  		goto out;
>>  	}
>>  
>> -	/* clear all bits except SHARE_IPMODIFY */
>> -	tr->flags &= BPF_TRAMP_F_SHARE_IPMODIFY;
>> +	/* clear all bits except SHARE_IPMODIFY and TAIL_CALL_CTX */
>> +	tr->flags &= (BPF_TRAMP_F_SHARE_IPMODIFY | BPF_TRAMP_F_TAIL_CALL_CTX);
>>  
>>  	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
>>  	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 4ccca1f6c9981..6f290bc6f5f19 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -19246,6 +19246,21 @@ static int check_non_sleepable_error_inject(u32 btf_id)
>>  	return btf_id_set_contains(&btf_non_sleepable_error_inject, btf_id);
>>  }
>>  
>> +static inline int find_subprog_index(const struct bpf_prog *prog,
> 
> FWIW please no inlines in source files, but I don't currently follow the
> need for that routine.

Got it. It's unnecessary to inline it.

> 
>> +				     u32 btf_id)
>> +{
>> +	struct bpf_prog_aux *aux = prog->aux;
>> +	int i, subprog = -1;
>> +
>> +	for (i = 0; i < aux->func_info_cnt; i++)
>> +		if (aux->func_info[i].type_id == btf_id) {
>> +			subprog = i;
>> +			break;
>> +		}
>> +
>> +	return subprog;
>> +}
>> +
>>  int bpf_check_attach_target(struct bpf_verifier_log *log,
>>  			    const struct bpf_prog *prog,
>>  			    const struct bpf_prog *tgt_prog,
>> @@ -19254,9 +19269,9 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
>>  {
>>  	bool prog_extension = prog->type == BPF_PROG_TYPE_EXT;
>>  	const char prefix[] = "btf_trace_";
>> -	int ret = 0, subprog = -1, i;
>>  	const struct btf_type *t;
>>  	bool conservative = true;
>> +	int ret = 0, subprog;
>>  	const char *tname;
>>  	struct btf *btf;
>>  	long addr = 0;
>> @@ -19291,11 +19306,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
>>  			return -EINVAL;
>>  		}
>>  
>> -		for (i = 0; i < aux->func_info_cnt; i++)
>> -			if (aux->func_info[i].type_id == btf_id) {
>> -				subprog = i;
>> -				break;
>> -			}
>> +		subprog = find_subprog_index(tgt_prog, btf_id);
>>  		if (subprog == -1) {
>>  			bpf_log(log, "Subprog %s doesn't exist\n", tname);
>>  			return -EINVAL;
>> @@ -19559,7 +19570,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
>>  	struct bpf_attach_target_info tgt_info = {};
>>  	u32 btf_id = prog->aux->attach_btf_id;
>>  	struct bpf_trampoline *tr;
>> -	int ret;
>> +	int ret, subprog;
>>  	u64 key;
>>  
>>  	if (prog->type == BPF_PROG_TYPE_SYSCALL) {
>> @@ -19629,6 +19640,11 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
>>  	if (!tr)
>>  		return -ENOMEM;
>>  
>> +	if (tgt_prog && tgt_prog->aux->tail_call_reachable) {
>> +		subprog = find_subprog_index(tgt_prog, btf_id);
>> +		tr->flags = subprog > 0 ? BPF_TRAMP_F_TAIL_CALL_CTX : 0;
>> +	}
> 
> I kinda forgot trampoline internals so please bear with me.
> Here you are checking actually...what? That current program is a subprog
> of tgt prog? My knee jerk reaction would be to propagate the
> BPF_TRAMP_F_TAIL_CALL_CTX based on just tail_call_reachable, but I need
> some more time to get my head around it again, sorry :<

Yeah, the current program must be a subprog of the tgt prog.

For example:

tailcall_subprog() {
  bpf_tail_call_static(&jmp_table, 0);
}

tailcall_prog() {
  tailcall_subprog();
}

prog() {
  bpf_tail_call_static(&jmp_table, 0);
}

jmp_table populates with tailcall_prog().

When fentry is done on prog(), there's no tail_call_cnt for fentry to
propagate. As we can see in emit_prologue(), fentry runs before
tail_call_cnt is initialised.

What about fentry on tailcall_prog()? It's impossible to do fentry on
tailcall_prog(), because the tailcall 'jmp' skips the fentry on
tailcall_prog().

And when fentry is done on tailcall_subprog(), fentry has to propagate
tail_call_cnt properly.

In conclusion, the current program must be a subprog of the tgt prog.

Thanks,
Leon


* Re: [RFC PATCH bpf-next v3 1/2] bpf, x64: Fix tailcall infinite loop
  2023-08-26  4:03     ` Leon Hwang
@ 2023-08-30 22:49       ` Maciej Fijalkowski
  2023-08-31 13:12         ` Leon Hwang
  0 siblings, 1 reply; 8+ messages in thread
From: Maciej Fijalkowski @ 2023-08-30 22:49 UTC (permalink / raw)
  To: Leon Hwang; +Cc: ast, daniel, andrii, song, bpf

On Sat, Aug 26, 2023 at 12:03:12PM +0800, Leon Hwang wrote:
> 
> 
> On 2023/8/26 01:58, Maciej Fijalkowski wrote:
> > On Fri, Aug 25, 2023 at 10:52:15PM +0800, Leon Hwang wrote:
> >> From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
> >> handling in JIT"), the tailcall on x64 works better than before.
> >>
> >> From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
> >> for x64 JIT"), tailcall is able to run in BPF subprograms on x64.
> >>
> >> From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
> >> to other BPF programs"), BPF program is able to trace other BPF programs.
> >>
> >> How about combining them all together?
> >>
> >> 1. FENTRY/FEXIT on a BPF subprogram.
> >> 2. A tailcall runs in the BPF subprogram.
> >> 3. The tailcall calls itself.
> > 
> > I would be interested in seeing broken asm code TBH :)
> > 
> >>
> >> As a result, a tailcall infinite loop comes up. And the loop would halt
> >> the machine.
> >>
> >> As we know, in tail call context, the tail_call_cnt propagates by stack
> >> and rax register between BPF subprograms. So do it in trampolines.
> >>
> >> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
> >> ---
> >>  arch/x86/net/bpf_jit_comp.c | 32 ++++++++++++++++++++++++++------
> >>  include/linux/bpf.h         |  5 +++++
> >>  kernel/bpf/trampoline.c     |  4 ++--
> >>  kernel/bpf/verifier.c       | 30 +++++++++++++++++++++++-------
> >>  4 files changed, 56 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> >> index a5930042139d3..2846c21d75bfa 100644
> >> --- a/arch/x86/net/bpf_jit_comp.c
> >> +++ b/arch/x86/net/bpf_jit_comp.c
> >> @@ -303,8 +303,12 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
> >>  	prog += X86_PATCH_SIZE;
> >>  	if (!ebpf_from_cbpf) {
> >>  		if (tail_call_reachable && !is_subprog)
> >> +			/* When it's the entry of the whole tailcall context,
> >> +			 * zeroing rax means initialising tail_call_cnt.
> >> +			 */
> >>  			EMIT2(0x31, 0xC0); /* xor eax, eax */
> >>  		else
> >> +			/* Keep the same instruction layout. */
> > 
> > While these comments are helpful I have mixed feelings about them residing
> > in this patch - rule of thumb to me is to keep the fixes as small as
> > possible.
> 
> Got it. I'll separate them into another patch.
> 
> Thanks for your rule of thumb.
> 
> > 
> >>  			EMIT2(0x66, 0x90); /* nop2 */
> >>  	}
> >>  	EMIT1(0x55);             /* push rbp */
> >> @@ -1018,6 +1022,10 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
> >>  
> >>  #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
> >>  
> >> +/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
> >> +#define RESTORE_TAIL_CALL_CNT(stack)				\
> >> +	EMIT3_off32(0x48, 0x8B, 0x85, -round_up(stack, 8) - 8)
> >> +
> >>  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
> >>  		  int oldproglen, struct jit_context *ctx, bool jmp_padding)
> >>  {
> >> @@ -1623,9 +1631,7 @@ st:			if (is_imm8(insn->off))
> >>  
> >>  			func = (u8 *) __bpf_call_base + imm32;
> >>  			if (tail_call_reachable) {
> >> -				/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
> >> -				EMIT3_off32(0x48, 0x8B, 0x85,
> >> -					    -round_up(bpf_prog->aux->stack_depth, 8) - 8);
> >> +				RESTORE_TAIL_CALL_CNT(bpf_prog->aux->stack_depth);
> >>  				if (!imm32)
> >>  					return -EINVAL;
> >>  				offs = 7 + x86_call_depth_emit_accounting(&prog, func);
> >> @@ -2400,6 +2406,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
> >>  	 *                     [ ...        ]
> >>  	 *                     [ stack_arg2 ]
> >>  	 * RBP - arg_stack_off [ stack_arg1 ]
> >> +	 * RSP                 [ tail_call_cnt ] BPF_TRAMP_F_TAIL_CALL_CTX
> >>  	 */
> >>  
> >>  	/* room for return value of orig_call or fentry prog */
> >> @@ -2464,6 +2471,8 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
> >>  	else
> >>  		/* sub rsp, stack_size */
> >>  		EMIT4(0x48, 0x83, 0xEC, stack_size);
> >> +	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> >> +		EMIT1(0x50);		/* push rax */
> >>  	/* mov QWORD PTR [rbp - rbx_off], rbx */
> >>  	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);
> >>  
> >> @@ -2516,9 +2525,15 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
> >>  		restore_regs(m, &prog, regs_off);
> >>  		save_args(m, &prog, arg_stack_off, true);
> >>  
> >> +		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> >> +			/* Before calling the original function, restore the
> >> +			 * tail_call_cnt from stack to rax.
> >> +			 */
> >> +			RESTORE_TAIL_CALL_CNT(stack_size);
> >> +
> >>  		if (flags & BPF_TRAMP_F_ORIG_STACK) {
> >> -			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
> >> -			EMIT2(0xff, 0xd0); /* call *rax */
> >> +			emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, 8);
> >> +			EMIT2(0xff, 0xd3); /* call *rbx */
> >>  		} else {
> >>  			/* call original function */
> >>  			if (emit_rsb_call(&prog, orig_call, prog)) {
> >> @@ -2569,7 +2584,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
> >>  			ret = -EINVAL;
> >>  			goto cleanup;
> >>  		}
> >> -	}
> >> +	} else if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> >> +		/* Before running the original function, restore the
> >> +		 * tail_call_cnt from stack to rax.
> >> +		 */
> >> +		RESTORE_TAIL_CALL_CNT(stack_size);
> >> +
> >>  	/* restore return value of orig_call or fentry prog back into RAX */
> >>  	if (save_ret)
> >>  		emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> >> index cfabbcf47bdb8..c8df257ea435d 100644
> >> --- a/include/linux/bpf.h
> >> +++ b/include/linux/bpf.h
> >> @@ -1028,6 +1028,11 @@ struct btf_func_model {
> >>   */
> >>  #define BPF_TRAMP_F_SHARE_IPMODIFY	BIT(6)
> >>  
> >> +/* Indicate that current trampoline is in a tail call context. Then, it has to
> >> + * cache and restore tail_call_cnt to avoid infinite tail call loop.
> >> + */
> >> +#define BPF_TRAMP_F_TAIL_CALL_CTX	BIT(7)
> >> +
> >>  /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
> >>   * bytes on x86.
> >>   */
> >> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> >> index 78acf28d48732..16ab5da7161f2 100644
> >> --- a/kernel/bpf/trampoline.c
> >> +++ b/kernel/bpf/trampoline.c
> >> @@ -415,8 +415,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
> >>  		goto out;
> >>  	}
> >>  
> >> -	/* clear all bits except SHARE_IPMODIFY */
> >> -	tr->flags &= BPF_TRAMP_F_SHARE_IPMODIFY;
> >> +	/* clear all bits except SHARE_IPMODIFY and TAIL_CALL_CTX */
> >> +	tr->flags &= (BPF_TRAMP_F_SHARE_IPMODIFY | BPF_TRAMP_F_TAIL_CALL_CTX);
> >>  
> >>  	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
> >>  	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >> index 4ccca1f6c9981..6f290bc6f5f19 100644
> >> --- a/kernel/bpf/verifier.c
> >> +++ b/kernel/bpf/verifier.c
> >> @@ -19246,6 +19246,21 @@ static int check_non_sleepable_error_inject(u32 btf_id)
> >>  	return btf_id_set_contains(&btf_non_sleepable_error_inject, btf_id);
> >>  }
> >>  
> >> +static inline int find_subprog_index(const struct bpf_prog *prog,
> > 
> > FWIW please no inlines in source files, but I don't currently follow the
> > need for that routine.
> 
> Got it. It's unnecessary to inline it.
> 
> > 
> >> +				     u32 btf_id)
> >> +{
> >> +	struct bpf_prog_aux *aux = prog->aux;
> >> +	int i, subprog = -1;
> >> +
> >> +	for (i = 0; i < aux->func_info_cnt; i++)
> >> +		if (aux->func_info[i].type_id == btf_id) {
> >> +			subprog = i;
> >> +			break;
> >> +		}
> >> +
> >> +	return subprog;
> >> +}
> >> +
> >>  int bpf_check_attach_target(struct bpf_verifier_log *log,
> >>  			    const struct bpf_prog *prog,
> >>  			    const struct bpf_prog *tgt_prog,
> >> @@ -19254,9 +19269,9 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
> >>  {
> >>  	bool prog_extension = prog->type == BPF_PROG_TYPE_EXT;
> >>  	const char prefix[] = "btf_trace_";
> >> -	int ret = 0, subprog = -1, i;
> >>  	const struct btf_type *t;
> >>  	bool conservative = true;
> >> +	int ret = 0, subprog;
> >>  	const char *tname;
> >>  	struct btf *btf;
> >>  	long addr = 0;
> >> @@ -19291,11 +19306,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
> >>  			return -EINVAL;
> >>  		}
> >>  
> >> -		for (i = 0; i < aux->func_info_cnt; i++)
> >> -			if (aux->func_info[i].type_id == btf_id) {
> >> -				subprog = i;
> >> -				break;
> >> -			}
> >> +		subprog = find_subprog_index(tgt_prog, btf_id);
> >>  		if (subprog == -1) {
> >>  			bpf_log(log, "Subprog %s doesn't exist\n", tname);
> >>  			return -EINVAL;
> >> @@ -19559,7 +19570,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
> >>  	struct bpf_attach_target_info tgt_info = {};
> >>  	u32 btf_id = prog->aux->attach_btf_id;
> >>  	struct bpf_trampoline *tr;
> >> -	int ret;
> >> +	int ret, subprog;
> >>  	u64 key;
> >>  
> >>  	if (prog->type == BPF_PROG_TYPE_SYSCALL) {
> >> @@ -19629,6 +19640,11 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
> >>  	if (!tr)
> >>  		return -ENOMEM;
> >>  
> >> +	if (tgt_prog && tgt_prog->aux->tail_call_reachable) {
> >> +		subprog = find_subprog_index(tgt_prog, btf_id);
> >> +		tr->flags = subprog > 0 ? BPF_TRAMP_F_TAIL_CALL_CTX : 0;
> >> +	}
> > 
> > I kinda forgot trampoline internals so please bear with me.
> > Here you are checking actually...what? That current program is a subprog
> > of tgt prog? My knee jerk reaction would be to propagate the
> > BPF_TRAMP_F_TAIL_CALL_CTX based on just tail_call_reachable, but I need
> > some more time to get my head around it again, sorry :<
> 
> Yeah, that current program must be a subprog of tgt prog.
> 
> For example:
> 
> tailcall_subprog() {
>   bpf_tail_call_static(&jmp_table, 0);
> }
> 
> tailcall_prog() {
>   tailcall_subprog();
> }
> 
> prog() {
>   bpf_tail_call_static(&jmp_table, 0);
> }
> 
> jmp_table populates with tailcall_prog().
> 
> When do fentry on prog(), there's no tail_call_cnt for fentry to
> propagate. As we can see in emit_prologue(), fentry runs before
> initialising tail_call_cnt.
> 
> When do fentry on tailcall_prog()? NO, it's impossible to do fentry on
> tailcall_prog(). Because the tailcall 'jmp' skips the fentry on
> tailcall_prog().
> 
> And, when do fentry on tailcall_subprog(), fentry has to propagate
> tail_call_cnt properly.
> 
> In conclusion, that current program must be a subprog of tgt prog.

The verifier propagates the info about tail call usage through the whole
call chain of a given prog, so it doesn't really matter to me where we
attach fentry progs. All I'm saying is:

	if (tgt_prog && tgt_prog->aux->tail_call_reachable)
		tr->flags = BPF_TRAMP_F_TAIL_CALL_CTX;

should be just fine. I might be missing something, but with the above your
selftest does not hang my system.

> 
> Thanks,
> Leon
> 


* Re: [RFC PATCH bpf-next v3 1/2] bpf, x64: Fix tailcall infinite loop
  2023-08-30 22:49       ` Maciej Fijalkowski
@ 2023-08-31 13:12         ` Leon Hwang
  2023-08-31 14:44           ` Leon Hwang
  0 siblings, 1 reply; 8+ messages in thread
From: Leon Hwang @ 2023-08-31 13:12 UTC (permalink / raw)
  To: Maciej Fijalkowski; +Cc: ast, daniel, andrii, song, bpf



On 2023/8/31 06:49, Maciej Fijalkowski wrote:
> On Sat, Aug 26, 2023 at 12:03:12PM +0800, Leon Hwang wrote:
>>
>>
>> On 2023/8/26 01:58, Maciej Fijalkowski wrote:
>>> On Fri, Aug 25, 2023 at 10:52:15PM +0800, Leon Hwang wrote:
>>>> From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
>>>> handling in JIT"), the tailcall on x64 works better than before.
>>>>
>>>> From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
>>>> for x64 JIT"), tailcall is able to run in BPF subprograms on x64.
>>>>
>>>> From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
>>>> to other BPF programs"), BPF program is able to trace other BPF programs.
>>>>
>>>> How about combining them all together?
>>>>
>>>> 1. FENTRY/FEXIT on a BPF subprogram.
>>>> 2. A tailcall runs in the BPF subprogram.
>>>> 3. The tailcall calls itself.
>>>
>>> I would be interested in seeing broken asm code TBH :)
>>>
>>>>
>>>> As a result, a tailcall infinite loop comes up. And the loop would halt
>>>> the machine.
>>>>
>>>> As we know, in tail call context, the tail_call_cnt propagates by stack
>>>> and rax register between BPF subprograms. So do it in trampolines.
>>>>
>>>> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
>>>> ---
>>>>  arch/x86/net/bpf_jit_comp.c | 32 ++++++++++++++++++++++++++------
>>>>  include/linux/bpf.h         |  5 +++++
>>>>  kernel/bpf/trampoline.c     |  4 ++--
>>>>  kernel/bpf/verifier.c       | 30 +++++++++++++++++++++++-------
>>>>  4 files changed, 56 insertions(+), 15 deletions(-)
>>>>

[SNIP]

>>>>  
>>>> +	if (tgt_prog && tgt_prog->aux->tail_call_reachable) {
>>>> +		subprog = find_subprog_index(tgt_prog, btf_id);
>>>> +		tr->flags = subprog > 0 ? BPF_TRAMP_F_TAIL_CALL_CTX : 0;
>>>> +	}
>>>
>>> I kinda forgot trampoline internals so please bear with me.
>>> Here you are checking actually...what? That current program is a subprog
>>> of tgt prog? My knee jerk reaction would be to propagate the
>>> BPF_TRAMP_F_TAIL_CALL_CTX based on just tail_call_reachable, but I need
>>> some more time to get my head around it again, sorry :<
>>
>> Yeah, that current program must be a subprog of tgt prog.
>>
>> For example:
>>
>> tailcall_subprog() {
>>   bpf_tail_call_static(&jmp_table, 0);
>> }
>>
>> tailcall_prog() {
>>   tailcall_subprog();
>> }
>>
>> prog() {
>>   bpf_tail_call_static(&jmp_table, 0);
>> }
>>
>> jmp_table populates with tailcall_prog().
>>
>> When do fentry on prog(), there's no tail_call_cnt for fentry to
>> propagate. As we can see in emit_prologue(), fentry runs before
>> initialising tail_call_cnt.
>>
>> When do fentry on tailcall_prog()? NO, it's impossible to do fentry on
>> tailcall_prog(). Because the tailcall 'jmp' skips the fentry on
>> tailcall_prog().
>>
>> And, when do fentry on tailcall_subprog(), fentry has to propagate
>> tail_call_cnt properly.
>>
>> In conclusion, that current program must be a subprog of tgt prog.
> 
> Verifier propagates the info about tail call usage through whole call
> chain on a given prog so this doesn't really matter to me where do we
> attach fentry progs. All I'm saying is:
> 
> 	if (tgt_prog && tgt_prog->aux->tail_call_reachable)
> 		tr->flags = BPF_TRAMP_F_TAIL_CALL_CTX;
> 
> should be just fine. I might be missing something but with above your
> selftest does not hang my system.

I think it's unnecessary to propagate the tail call usage info when fentry
is done on prog(), which is the entry of the whole tail call context. If we
do propagate in that case, two extra instructions get executed for nothing.
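
For reference, the two instructions in question, as emitted by this patch
(a sketch of the relevant JIT snippets):

	/* on trampoline entry: save tail_call_cnt from rax */
	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
		EMIT1(0x50);			/* push rax */

	/* before calling the original function: restore it into rax */
	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
		RESTORE_TAIL_CALL_CNT(stack_size);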

I confirm that the above selftest is able to hang a VM. I copied test_progs
along with tailcall*.bpf.o to another VM, which runs Ubuntu 22.04.3 with
kernel 5.15.0-82-generic, and ran ./test_progs -t tailcalls. The VM then
hung.

Here's the Ubuntu 22.04.3 VM info:
# uname -a
Linux hwang 5.15.0-82-generic #91-Ubuntu SMP Mon Aug 14 14:14:14 UTC
2023 x86_64 x86_64 x86_64 GNU/Linux

Thanks,
Leon


* Re: [RFC PATCH bpf-next v3 1/2] bpf, x64: Fix tailcall infinite loop
  2023-08-31 13:12         ` Leon Hwang
@ 2023-08-31 14:44           ` Leon Hwang
  0 siblings, 0 replies; 8+ messages in thread
From: Leon Hwang @ 2023-08-31 14:44 UTC (permalink / raw)
  To: Maciej Fijalkowski; +Cc: ast, daniel, andrii, song, bpf



On 2023/8/31 21:12, Leon Hwang wrote:
> 
> 
> On 2023/8/31 06:49, Maciej Fijalkowski wrote:
>> On Sat, Aug 26, 2023 at 12:03:12PM +0800, Leon Hwang wrote:
>>>
>>>
>>> On 2023/8/26 01:58, Maciej Fijalkowski wrote:
>>>> On Fri, Aug 25, 2023 at 10:52:15PM +0800, Leon Hwang wrote:
>>>>> From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
>>>>> handling in JIT"), the tailcall on x64 works better than before.
>>>>>
>>>>> From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
>>>>> for x64 JIT"), tailcall is able to run in BPF subprograms on x64.
>>>>>
>>>>> From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
>>>>> to other BPF programs"), BPF program is able to trace other BPF programs.
>>>>>
>>>>> How about combining them all together?
>>>>>
>>>>> 1. FENTRY/FEXIT on a BPF subprogram.
>>>>> 2. A tailcall runs in the BPF subprogram.
>>>>> 3. The tailcall calls itself.
>>>>
>>>> I would be interested in seeing broken asm code TBH :)
>>>>
>>>>>
>>>>> As a result, a tailcall infinite loop comes up. And the loop would halt
>>>>> the machine.
>>>>>
>>>>> As we know, in tail call context, the tail_call_cnt propagates by stack
>>>>> and rax register between BPF subprograms. So do it in trampolines.
>>>>>
>>>>> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
>>>>> ---
>>>>>  arch/x86/net/bpf_jit_comp.c | 32 ++++++++++++++++++++++++++------
>>>>>  include/linux/bpf.h         |  5 +++++
>>>>>  kernel/bpf/trampoline.c     |  4 ++--
>>>>>  kernel/bpf/verifier.c       | 30 +++++++++++++++++++++++-------
>>>>>  4 files changed, 56 insertions(+), 15 deletions(-)
>>>>>
> 
> [SNIP]
> 
>>>>>  
>>>>> +	if (tgt_prog && tgt_prog->aux->tail_call_reachable) {
>>>>> +		subprog = find_subprog_index(tgt_prog, btf_id);
>>>>> +		tr->flags = subprog > 0 ? BPF_TRAMP_F_TAIL_CALL_CTX : 0;
>>>>> +	}
>>>>
>>>> I kinda forgot trampoline internals so please bear with me.
>>>> Here you are checking actually...what? That current program is a subprog
>>>> of tgt prog? My knee jerk reaction would be to propagate the
>>>> BPF_TRAMP_F_TAIL_CALL_CTX based on just tail_call_reachable, but I need
>>>> some more time to get my head around it again, sorry :<
>>>
>>> Yeah, that current program must be a subprog of tgt prog.
>>>
>>> For example:
>>>
>>> tailcall_subprog() {
>>>   bpf_tail_call_static(&jmp_table, 0);
>>> }
>>>
>>> tailcall_prog() {
>>>   tailcall_subprog();
>>> }
>>>
>>> prog() {
>>>   bpf_tail_call_static(&jmp_table, 0);
>>> }
>>>
>>> jmp_table populates with tailcall_prog().
>>>
>>> When do fentry on prog(), there's no tail_call_cnt for fentry to
>>> propagate. As we can see in emit_prologue(), fentry runs before
>>> initialising tail_call_cnt.
>>>
>>> When do fentry on tailcall_prog()? NO, it's impossible to do fentry on
>>> tailcall_prog(). Because the tailcall 'jmp' skips the fentry on
>>> tailcall_prog().
>>>
>>> And, when do fentry on tailcall_subprog(), fentry has to propagate
>>> tail_call_cnt properly.
>>>
>>> In conclusion, that current program must be a subprog of tgt prog.
>>
>> Verifier propagates the info about tail call usage through whole call
>> chain on a given prog so this doesn't really matter to me where do we
>> attach fentry progs. All I'm saying is:
>>
>> 	if (tgt_prog && tgt_prog->aux->tail_call_reachable)
>> 		tr->flags = BPF_TRAMP_F_TAIL_CALL_CTX;
>>
>> should be just fine. I might be missing something but with above your
>> selftest does not hang my system.
> 
> I think it's unnecessary to propagate tail call usage info when do
> fentry on prog(), which is the entry of the whole tail call context. If
> do propagate in this case, it's meaningless to execute two extra
> instructions.

Since it's harmless, I agree with you. I'll change it to:

 	if (tgt_prog && tgt_prog->aux->tail_call_reachable)
 		tr->flags = BPF_TRAMP_F_TAIL_CALL_CTX;

With this update, it's easier to understand BPF_TRAMP_F_TAIL_CALL_CTX.

> 
> I confirm that the above selftest is able to hang VM. I copy test_progs
> along with tailcall*.bpf.o to another VM, which is Ubuntu 22.04.3 with
> kernel 5.15.0-82-generic, then run ./test_progs -t tailcalls. And then
> the VM hangs.
> 
> Here's the Ubuntu 22.04.3 VM info:
> # uname -a
> Linux hwang 5.15.0-82-generic #91-Ubuntu SMP Mon Aug 14 14:14:14 UTC
> 2023 x86_64 x86_64 x86_64 GNU/Linux

What I suggest here is to run the selftest from the second patch, not the
above example.

Thanks,
Leon
