BPF List
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 0/4] bpf: Fix tailcall infinite loop caused by freplace
@ 2024-09-23 13:40 Leon Hwang
  2024-09-23 13:40 ` [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map Leon Hwang
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Leon Hwang @ 2024-09-23 13:40 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

Previously, I fixed a tailcall infinite loop issue caused by trampoline[0].

At this time, I fix a tailcall infinite loop issue caused by tailcall and
freplace combination by preventing updating extended prog to prog_array map
and preventing extending tail callee prog with freplace.

v2 -> v3:
  * Address comments from Alexei:
    * Stop hacking JIT.
    * Prevent the specific use case at attach/update time.

v1 -> v2:
  * Address comment from Eduard:
    * Explain why nop5 and xor/nop3 are swapped at prologue.
  * Address comment from Alexei:
    * Disallow attaching tail_call_reachable freplace prog to
      not-tail_call_reachable target in verifier.
  * Update "bpf, arm64: Fix tailcall infinite loop caused by freplace" with
    latest arm64 JIT code.

Links:
[0] https://lore.kernel.org/bpf/20230912150442.2009-1-hffilwlqm@gmail.com/

Leon Hwang (4):
  bpf: Prevent updating extended prog to prog_array map
  bpf: Prevent extending tail callee prog with freplace
  selftests/bpf: Add a test case to confirm a tailcall infinite loop
    issue has been prevented
  selftests/bpf: Add cases to test tailcall in freplace

 include/linux/bpf.h                           |   2 +
 kernel/bpf/arraymap.c                         |  13 +-
 kernel/bpf/syscall.c                          |  19 +-
 .../selftests/bpf/prog_tests/tailcalls.c      | 196 +++++++++++++++++-
 .../tailcall_bpf2bpf_hierarchy_freplace.c     |  30 +++
 .../testing/selftests/bpf/progs/tc_bpf2bpf.c  |  37 +++-
 6 files changed, 290 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c

-- 
2.44.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map
  2024-09-23 13:40 [PATCH bpf-next v3 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
@ 2024-09-23 13:40 ` Leon Hwang
  2024-09-25  1:24   ` Eduard Zingerman
  2024-09-23 13:40 ` [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace Leon Hwang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Leon Hwang @ 2024-09-23 13:40 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

This patch prevents a potential infinite loop issue caused by combination
of tailcal and freplace partially.

For example:

tc_bpf2bpf.c:

// SPDX-License-Identifier: GPL-2.0
\#include <linux/bpf.h>
\#include <bpf/bpf_helpers.h>

__noinline
int subprog_tc(struct __sk_buff *skb)
{
	return skb->len * 2;
}

SEC("tc")
int entry_tc(struct __sk_buff *skb)
{
	return subprog_tc(skb);
}

char __license[] SEC("license") = "GPL";

tailcall_bpf2bpf_hierarchy_freplace.c:

// SPDX-License-Identifier: GPL-2.0
\#include <linux/bpf.h>
\#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

int count = 0;

static __noinline
int subprog_tail(struct __sk_buff *skb)
{
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;
}

SEC("freplace")
int entry_freplace(struct __sk_buff *skb)
{
	count++;
	subprog_tail(skb);
	subprog_tail(skb);
	return count;
}

char __license[] SEC("license") = "GPL";

The attach target of entry_freplace is subprog_tc, and the tail callee
in subprog_tail is entry_tc.

Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace
-> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in
entry_freplace will count from zero for every time of entry_freplace
execution. Kernel will panic:

[   15.310490] BUG: TASK stack guard page was hit at (____ptrval____)
(stack is (____ptrval____)..(____ptrval____))
[   15.310490] Oops: stack guard page: 0000 [#1] PREEMPT SMP NOPTI
[   15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G           OE
   6.10.0-rc6-g026dcdae8d3e-dirty #72
[   15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX,
1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53
[   15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e
fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c
[   15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202
[   15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX:
0000000000008cb5
[   15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI:
ffff9c98808b7e00
[   15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09:
0000000000000000
[   15.310490] R10: 0000000000000001 R11: 0000000000000000 R12:
ffffb500c01af000
[   15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15:
0000000000000000
[   15.310490] FS:  00007f133b665140(0000) GS:ffff9c98bbd00000(0000)
knlGS:0000000000000000
[   15.310490] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4:
00000000000006f0
[   15.310490] Call Trace:
[   15.310490]  <#DF>
[   15.310490]  ? die+0x36/0x90
[   15.310490]  ? handle_stack_overflow+0x4d/0x60
[   15.310490]  ? exc_double_fault+0x117/0x1a0
[   15.310490]  ? asm_exc_double_fault+0x23/0x30
[   15.310490]  ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53
[   15.310490]  </#DF>
[   15.310490]  <TASK>
[   15.310490]  bpf_prog_85781a698094722f_entry+0x4c/0x64
[   15.310490]  bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b
[   15.310490]  ...
[   15.310490]  bpf_prog_85781a698094722f_entry+0x4c/0x64
[   15.310490]  bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b
[   15.310490]  bpf_test_run+0x210/0x370
[   15.310490]  ? bpf_test_run+0x128/0x370
[   15.310490]  bpf_prog_test_run_skb+0x388/0x7a0
[   15.310490]  __sys_bpf+0xdbf/0x2c40
[   15.310490]  ? clockevents_program_event+0x52/0xf0
[   15.310490]  ? lock_release+0xbf/0x290
[   15.310490]  __x64_sys_bpf+0x1e/0x30
[   15.310490]  do_syscall_64+0x68/0x140
[   15.310490]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   15.310490] RIP: 0033:0x7f133b52725d
[   15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48
[   15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX:
0000000000000141
[   15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX:
00007f133b52725d
[   15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI:
000000000000000a
[   15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09:
00007ffddbc102a0
[   15.310490] R10: 0000000000000064 R11: 0000000000000206 R12:
0000000000000004
[   15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15:
00007f133b6ed000
[   15.310490]  </TASK>
[   15.310490] Modules linked in: bpf_testmod(OE)
[   15.310490] ---[ end trace 0000000000000000 ]---
[   15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53
[   15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e
fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c
[   15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202
[   15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX:
0000000000008cb5
[   15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI:
ffff9c98808b7e00
[   15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09:
0000000000000000
[   15.310490] R10: 0000000000000001 R11: 0000000000000000 R12:
ffffb500c01af000
[   15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15:
0000000000000000
[   15.310490] FS:  00007f133b665140(0000) GS:ffff9c98bbd00000(0000)
knlGS:0000000000000000
[   15.310490] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4:
00000000000006f0
[   15.310490] Kernel panic - not syncing: Fatal exception in interrupt
[   15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)

This patch partially prevents this panic by preventing updating extended
prog to prog_array map.

If a prog or its subprog has been extended by freplace prog, the prog
can not be updated to prog_array map.

Alongside next patch, the panic will be prevented completely.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/bpf.h   | 1 +
 kernel/bpf/arraymap.c | 7 ++++++-
 kernel/bpf/syscall.c  | 7 ++++++-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0c3893c471711..048aa2625cbef 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1483,6 +1483,7 @@ struct bpf_prog_aux {
 	bool xdp_has_frags;
 	bool exception_cb;
 	bool exception_boundary;
+	bool is_extended; /* true if extended by freplace program */
 	struct bpf_arena *arena;
 	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
 	const struct btf_type *attach_func_proto;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 79660e3fca4c1..8d97bae98fa70 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -951,7 +951,12 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
 	if (IS_ERR(prog))
 		return prog;
 
-	if (!bpf_prog_map_compatible(map, prog)) {
+	if (!bpf_prog_map_compatible(map, prog) || prog->aux->is_extended) {
+		/* Extended prog can not be tail callee. It's to prevent a
+		 * potential infinite loop like:
+		 * tail callee prog entry -> tail callee prog subprog ->
+		 * freplace prog entry --tailcall-> tail callee prog entry.
+		 */
 		bpf_prog_put(prog);
 		return ERR_PTR(-EINVAL);
 	}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8a4117f6d7610..18b3f9216b050 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3292,8 +3292,11 @@ static void bpf_tracing_link_release(struct bpf_link *link)
 	bpf_trampoline_put(tr_link->trampoline);
 
 	/* tgt_prog is NULL if target is a kernel function */
-	if (tr_link->tgt_prog)
+	if (tr_link->tgt_prog) {
+		if (link->prog->type == BPF_PROG_TYPE_EXT)
+			tr_link->tgt_prog->aux->is_extended = false;
 		bpf_prog_put(tr_link->tgt_prog);
+	}
 }
 
 static void bpf_tracing_link_dealloc(struct bpf_link *link)
@@ -3523,6 +3526,8 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 	if (prog->aux->dst_trampoline && tr != prog->aux->dst_trampoline)
 		/* we allocated a new trampoline, so free the old one */
 		bpf_trampoline_put(prog->aux->dst_trampoline);
+	if (prog->type == BPF_PROG_TYPE_EXT)
+		tgt_prog->aux->is_extended = true;
 
 	prog->aux->dst_prog = NULL;
 	prog->aux->dst_trampoline = NULL;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace
  2024-09-23 13:40 [PATCH bpf-next v3 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
  2024-09-23 13:40 ` [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map Leon Hwang
@ 2024-09-23 13:40 ` Leon Hwang
  2024-09-25  5:32   ` Eduard Zingerman
  2024-09-23 13:40 ` [PATCH bpf-next v3 3/4] selftests/bpf: Add a test case to confirm a tailcall infinite loop issue has been prevented Leon Hwang
  2024-09-23 13:40 ` [PATCH bpf-next v3 4/4] selftests/bpf: Add cases to test tailcall in freplace Leon Hwang
  3 siblings, 1 reply; 11+ messages in thread
From: Leon Hwang @ 2024-09-23 13:40 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

Alongside previous patch, the infinite loop issue caused by combination of
tailcal and freplace can be prevented completely.

The previous patch can not prevent the use case that updates a prog to
prog_array map and then extends the prog with freplace prog.

This patch fixes the case by preventing extending a prog, which has been
updated to prog_array map, with freplace prog.

If a prog has been updated to prog_array map, it or its subprog can not
be extended by freplace prog.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/bpf.h   |  1 +
 kernel/bpf/arraymap.c |  6 +++++-
 kernel/bpf/syscall.c  | 12 ++++++++++++
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 048aa2625cbef..b864b37e67c17 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1484,6 +1484,7 @@ struct bpf_prog_aux {
 	bool exception_cb;
 	bool exception_boundary;
 	bool is_extended; /* true if extended by freplace program */
+	atomic_t tail_callee_cnt;
 	struct bpf_arena *arena;
 	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
 	const struct btf_type *attach_func_proto;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 8d97bae98fa70..c12e0e3bf6ad0 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -961,13 +961,17 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
 		return ERR_PTR(-EINVAL);
 	}
 
+	atomic_inc(&prog->aux->tail_callee_cnt);
 	return prog;
 }
 
 static void prog_fd_array_put_ptr(struct bpf_map *map, void *ptr, bool need_defer)
 {
+	struct bpf_prog *prog = ptr;
+
 	/* bpf_prog is freed after one RCU or tasks trace grace period */
-	bpf_prog_put(ptr);
+	atomic_dec(&prog->aux->tail_callee_cnt);
+	bpf_prog_put(prog);
 }
 
 static u32 prog_fd_array_sys_lookup_elem(void *ptr)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 18b3f9216b050..be829016d8182 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3501,6 +3501,18 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 		tgt_prog = prog->aux->dst_prog;
 	}
 
+	if (prog->type == BPF_PROG_TYPE_EXT &&
+	    atomic_read(&tgt_prog->aux->tail_callee_cnt)) {
+		/* Program extensions can not extend target prog when the target
+		 * prog has been updated to any prog_array map as tail callee.
+		 * It's to prevent a potential infinite loop like:
+		 * tgt prog entry -> tgt prog subprog -> freplace prog entry
+		 * --tailcall-> tgt prog entry.
+		 */
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
 	err = bpf_link_prime(&link->link.link, &link_primer);
 	if (err)
 		goto out_unlock;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v3 3/4] selftests/bpf: Add a test case to confirm a tailcall infinite loop issue has been prevented
  2024-09-23 13:40 [PATCH bpf-next v3 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
  2024-09-23 13:40 ` [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map Leon Hwang
  2024-09-23 13:40 ` [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace Leon Hwang
@ 2024-09-23 13:40 ` Leon Hwang
  2024-09-23 13:40 ` [PATCH bpf-next v3 4/4] selftests/bpf: Add cases to test tailcall in freplace Leon Hwang
  3 siblings, 0 replies; 11+ messages in thread
From: Leon Hwang @ 2024-09-23 13:40 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

cd tools/testing/selftests/bpf; ./test_progs -t tailcalls
335/26  tailcalls/tailcall_bpf2bpf_freplace:OK
335     tailcalls:OK
Summary: 1/26 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 .../selftests/bpf/prog_tests/tailcalls.c      | 99 ++++++++++++++++++-
 .../testing/selftests/bpf/progs/tc_bpf2bpf.c  | 24 ++++-
 2 files changed, 119 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/tailcalls.c b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
index 21c5a37846ade..fa3f3bb11b098 100644
--- a/tools/testing/selftests/bpf/prog_tests/tailcalls.c
+++ b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
@@ -1525,7 +1525,8 @@ static void test_tailcall_freplace(void)
 
 	prog_fd = bpf_program__fd(tc_skel->progs.entry_tc);
 	freplace_prog = freplace_skel->progs.entry_freplace;
-	err = bpf_program__set_attach_target(freplace_prog, prog_fd, "subprog");
+	err = bpf_program__set_attach_target(freplace_prog, prog_fd,
+					     "subprog_tailcall_tc");
 	if (!ASSERT_OK(err, "set_attach_target"))
 		goto out;
 
@@ -1534,7 +1535,7 @@ static void test_tailcall_freplace(void)
 		goto out;
 
 	freplace_link = bpf_program__attach_freplace(freplace_prog, prog_fd,
-						     "subprog");
+						     "subprog_tailcall_tc");
 	if (!ASSERT_OK_PTR(freplace_link, "attach_freplace"))
 		goto out;
 
@@ -1556,6 +1557,98 @@ static void test_tailcall_freplace(void)
 	tailcall_freplace__destroy(freplace_skel);
 }
 
+/* test_tailcall_bpf2bpf_freplace checks the failure that fails to attach a tail
+ * callee prog with freplace prog or fails to update an extended prog to
+ * prog_array map.
+ */
+static void test_tailcall_bpf2bpf_freplace(void)
+{
+	struct tailcall_freplace *freplace_skel = NULL;
+	struct bpf_link *freplace_link = NULL;
+	struct tc_bpf2bpf *tc_skel = NULL;
+	char buff[128] = {};
+	int prog_fd, map_fd;
+	int err, key;
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .data_in = buff,
+		    .data_size_in = sizeof(buff),
+		    .repeat = 1,
+	);
+
+	tc_skel = tc_bpf2bpf__open_and_load();
+	if (!ASSERT_OK_PTR(tc_skel, "tc_bpf2bpf__open_and_load"))
+		goto out;
+
+	prog_fd = bpf_program__fd(tc_skel->progs.entry_tc);
+	freplace_skel = tailcall_freplace__open();
+	if (!ASSERT_OK_PTR(freplace_skel, "tailcall_freplace__open"))
+		goto out;
+
+	err = bpf_program__set_attach_target(freplace_skel->progs.entry_freplace,
+					     prog_fd, "subprog_tc");
+	if (!ASSERT_OK(err, "set_attach_target"))
+		goto out;
+
+	err = tailcall_freplace__load(freplace_skel);
+	if (!ASSERT_OK(err, "tailcall_freplace__load"))
+		goto out;
+
+	/* OK to attach then detach freplace prog. */
+
+	freplace_link = bpf_program__attach_freplace(freplace_skel->progs.entry_freplace,
+						     prog_fd, "subprog_tc");
+	if (!ASSERT_OK_PTR(freplace_link, "attach_freplace"))
+		goto out;
+
+	err = bpf_link__destroy(freplace_link);
+	if (!ASSERT_OK(err, "destroy link"))
+		goto out;
+
+	/* OK to update prog_array map then delete element from the map. */
+
+	key = 0;
+	map_fd = bpf_map__fd(freplace_skel->maps.jmp_table);
+	err = bpf_map_update_elem(map_fd, &key, &prog_fd, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table"))
+		goto out;
+
+	err = bpf_map_delete_elem(map_fd, &key);
+	if (!ASSERT_OK(err, "delete_elem from jmp_table"))
+		goto out;
+
+	/* Fail to attach a tail callee prog with freplace prog. */
+
+	err = bpf_map_update_elem(map_fd, &key, &prog_fd, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table"))
+		goto out;
+
+	freplace_link = bpf_program__attach_freplace(freplace_skel->progs.entry_freplace,
+						     prog_fd, "subprog_tc");
+	if (!ASSERT_ERR_PTR(freplace_link, "attach_freplace failure"))
+		goto out;
+
+	err = bpf_map_delete_elem(map_fd, &key);
+	if (!ASSERT_OK(err, "delete_elem from jmp_table"))
+		goto out;
+
+	/* Fail to update an extended prog to prog_array map. */
+
+	freplace_link = bpf_program__attach_freplace(freplace_skel->progs.entry_freplace,
+						     prog_fd, "subprog_tc");
+	if (!ASSERT_OK_PTR(freplace_link, "attach_freplace"))
+		goto out;
+
+	err = bpf_map_update_elem(map_fd, &key, &prog_fd, BPF_ANY);
+	if (!ASSERT_ERR(err, "update jmp_table failure"))
+		goto out;
+
+out:
+	bpf_link__destroy(freplace_link);
+	tailcall_freplace__destroy(freplace_skel);
+	tc_bpf2bpf__destroy(tc_skel);
+}
+
 void test_tailcalls(void)
 {
 	if (test__start_subtest("tailcall_1"))
@@ -1606,4 +1699,6 @@ void test_tailcalls(void)
 	test_tailcall_bpf2bpf_hierarchy_3();
 	if (test__start_subtest("tailcall_freplace"))
 		test_tailcall_freplace();
+	if (test__start_subtest("tailcall_bpf2bpf_freplace"))
+		test_tailcall_bpf2bpf_freplace();
 }
diff --git a/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c b/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
index 8a0632c37839a..34f3c780194e4 100644
--- a/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
@@ -4,11 +4,30 @@
 #include <bpf/bpf_helpers.h>
 #include "bpf_misc.h"
 
+struct {
+	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+	__uint(max_entries, 1);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+} jmp_table SEC(".maps");
+
+__noinline
+int subprog_tailcall_tc(struct __sk_buff *skb)
+{
+	int ret = 1;
+
+	bpf_tail_call_static(skb, &jmp_table, 0);
+	__sink(skb);
+	__sink(ret);
+	return ret;
+}
+
 __noinline
-int subprog(struct __sk_buff *skb)
+int subprog_tc(struct __sk_buff *skb)
 {
 	int ret = 1;
 
+	__sink(skb);
 	__sink(ret);
 	return ret;
 }
@@ -16,7 +35,8 @@ int subprog(struct __sk_buff *skb)
 SEC("tc")
 int entry_tc(struct __sk_buff *skb)
 {
-	return subprog(skb);
+	subprog_tc(skb);
+	return subprog_tailcall_tc(skb);
 }
 
 char __license[] SEC("license") = "GPL";
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v3 4/4] selftests/bpf: Add cases to test tailcall in freplace
  2024-09-23 13:40 [PATCH bpf-next v3 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
                   ` (2 preceding siblings ...)
  2024-09-23 13:40 ` [PATCH bpf-next v3 3/4] selftests/bpf: Add a test case to confirm a tailcall infinite loop issue has been prevented Leon Hwang
@ 2024-09-23 13:40 ` Leon Hwang
  3 siblings, 0 replies; 11+ messages in thread
From: Leon Hwang @ 2024-09-23 13:40 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

cd tools/testing/selftests/bpf; ./test_progs -t tailcalls
335/27  tailcalls/tailcall_bpf2bpf_hierarchy_freplace_1:OK
335/28  tailcalls/tailcall_bpf2bpf_hierarchy_freplace_2:OK
335     tailcalls:OK
Summary: 1/28 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 .../selftests/bpf/prog_tests/tailcalls.c      | 97 +++++++++++++++++++
 .../tailcall_bpf2bpf_hierarchy_freplace.c     | 30 ++++++
 .../testing/selftests/bpf/progs/tc_bpf2bpf.c  | 13 +++
 3 files changed, 140 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c

diff --git a/tools/testing/selftests/bpf/prog_tests/tailcalls.c b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
index fa3f3bb11b098..0564ad6c9b288 100644
--- a/tools/testing/selftests/bpf/prog_tests/tailcalls.c
+++ b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
@@ -5,6 +5,7 @@
 #include "tailcall_poke.skel.h"
 #include "tailcall_bpf2bpf_hierarchy2.skel.h"
 #include "tailcall_bpf2bpf_hierarchy3.skel.h"
+#include "tailcall_bpf2bpf_hierarchy_freplace.skel.h"
 #include "tailcall_freplace.skel.h"
 #include "tc_bpf2bpf.skel.h"
 
@@ -1649,6 +1650,98 @@ static void test_tailcall_bpf2bpf_freplace(void)
 	tc_bpf2bpf__destroy(tc_skel);
 }
 
+static void test_tailcall_bpf2bpf_hierarchy_freplace(bool freplace_subprog,
+						     bool target_entry2,
+						     int count1, int count2)
+{
+	struct tailcall_bpf2bpf_hierarchy_freplace *freplace_skel = NULL;
+	struct bpf_link *freplace_link = NULL;
+	struct tc_bpf2bpf *tc_skel = NULL;
+	int prog_fd, map_fd;
+	char buff[128] = {};
+	int err, key, val;
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .data_in = buff,
+		    .data_size_in = sizeof(buff),
+		    .repeat = 1,
+	);
+
+	tc_skel = tc_bpf2bpf__open_and_load();
+	if (!ASSERT_OK_PTR(tc_skel, "tc_bpf2bpf__open_and_load"))
+		goto out;
+
+	prog_fd = bpf_program__fd(target_entry2 ? tc_skel->progs.entry_tc_2 :
+				  tc_skel->progs.entry_tc);
+	freplace_skel = tailcall_bpf2bpf_hierarchy_freplace__open();
+	if (!ASSERT_OK_PTR(freplace_skel, "tailcall_bpf2bpf_hierarchy_freplace__open"))
+		goto out;
+
+	err = bpf_program__set_attach_target(freplace_skel->progs.entry_freplace,
+					     prog_fd, freplace_subprog ?
+					     "subprog_tailcall_tc" : "entry_tc");
+	if (!ASSERT_OK(err, "set_attach_target"))
+		goto out;
+
+	err = tailcall_bpf2bpf_hierarchy_freplace__load(freplace_skel);
+	if (!ASSERT_OK(err, "tailcall_bpf2bpf_hierarchy_freplace__load"))
+		goto out;
+
+	val = bpf_program__fd(freplace_skel->progs.entry_freplace);
+	map_fd = bpf_map__fd(freplace_skel->maps.jmp_table);
+	key = 0;
+	err = bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table"))
+		goto out;
+
+	freplace_link = bpf_program__attach_freplace(freplace_skel->progs.entry_freplace,
+						     prog_fd, freplace_subprog ?
+						     "subprog_tailcall_tc" : "entry_tc");
+	if (!ASSERT_OK_PTR(freplace_link, "attach_freplace"))
+		goto out;
+
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run_opts");
+	ASSERT_EQ(topts.retval, count1, "test_run_opts retval");
+
+	err = bpf_map_delete_elem(map_fd, &key);
+	if (!ASSERT_OK(err, "delete_elem from jmp_table"))
+		goto out;
+
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run_opts again");
+	ASSERT_EQ(topts.retval, count2, "test_run_opts retval again");
+
+out:
+	bpf_link__destroy(freplace_link);
+	tailcall_bpf2bpf_hierarchy_freplace__destroy(freplace_skel);
+	tc_bpf2bpf__destroy(tc_skel);
+}
+
+/* test_tailcall_bpf2bpf_hierarchy_freplace_1 checks the count value of tail
+ * call limit enforcement matches with expectation for the case:
+ *
+ *                                    subprog_tail --tailcall-> entry_freplace
+ * entry_tc --jump-> entry_freplace <
+ *                                    subprog_tail --tailcall-> entry_freplace
+ */
+static void test_tailcall_bpf2bpf_hierarchy_freplace_1(void)
+{
+	test_tailcall_bpf2bpf_hierarchy_freplace(false, false, 34, 35);
+}
+
+/* test_tailcall_bpf2bpf_hierarchy_freplace_2 checks the count value of tail
+ * call limit enforcement matches with expectation for the case:
+ *
+ *                                                                              subprog_tail --tailcall-> entry_freplace
+ * entry_tc_2 --> subprog_tailcall_tc (call 10 times) --jump-> entry_freplace <
+ *                                                                              subprog_tail --tailcall-> entry_freplace
+ */
+static void test_tailcall_bpf2bpf_hierarchy_freplace_2(void)
+{
+	test_tailcall_bpf2bpf_hierarchy_freplace(true, true, 340, 350);
+}
+
 void test_tailcalls(void)
 {
 	if (test__start_subtest("tailcall_1"))
@@ -1701,4 +1794,8 @@ void test_tailcalls(void)
 		test_tailcall_freplace();
 	if (test__start_subtest("tailcall_bpf2bpf_freplace"))
 		test_tailcall_bpf2bpf_freplace();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_freplace_1"))
+		test_tailcall_bpf2bpf_hierarchy_freplace_1();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_freplace_2"))
+		test_tailcall_bpf2bpf_hierarchy_freplace_2();
 }
diff --git a/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c
new file mode 100644
index 0000000000000..6f7c1fac9ddb7
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+	__uint(max_entries, 1);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+} jmp_table SEC(".maps");
+
+int count = 0;
+
+static __noinline
+int subprog_tail(struct __sk_buff *skb)
+{
+	bpf_tail_call_static(skb, &jmp_table, 0);
+	return 0;
+}
+
+SEC("freplace")
+int entry_freplace(struct __sk_buff *skb)
+{
+	count++;
+	subprog_tail(skb);
+	subprog_tail(skb);
+	return count;
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c b/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
index 34f3c780194e4..beacf60a52677 100644
--- a/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
@@ -39,4 +39,17 @@ int entry_tc(struct __sk_buff *skb)
 	return subprog_tailcall_tc(skb);
 }
 
+SEC("tc")
+int entry_tc_2(struct __sk_buff *skb)
+{
+	int ret, i;
+
+	for (i = 0; i < 10; i++) {
+		ret = subprog_tailcall_tc(skb);
+		__sink(ret);
+	}
+
+	return ret;
+}
+
 char __license[] SEC("license") = "GPL";
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map
  2024-09-23 13:40 ` [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map Leon Hwang
@ 2024-09-25  1:24   ` Eduard Zingerman
  2024-09-26  7:16     ` Leon Hwang
  0 siblings, 1 reply; 11+ messages in thread
From: Eduard Zingerman @ 2024-09-25  1:24 UTC (permalink / raw)
  To: Leon Hwang, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, iii, kernel-patches-bot

On Mon, 2024-09-23 at 21:40 +0800, Leon Hwang wrote:

[...]

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 8a4117f6d7610..18b3f9216b050 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3292,8 +3292,11 @@ static void bpf_tracing_link_release(struct bpf_link *link)
>  	bpf_trampoline_put(tr_link->trampoline);
>  
>  	/* tgt_prog is NULL if target is a kernel function */
> -	if (tr_link->tgt_prog)
> +	if (tr_link->tgt_prog) {
> +		if (link->prog->type == BPF_PROG_TYPE_EXT)
> +			tr_link->tgt_prog->aux->is_extended = false;
>  		bpf_prog_put(tr_link->tgt_prog);
> +	}
>  }
>  
>  static void bpf_tracing_link_dealloc(struct bpf_link *link)
> @@ -3523,6 +3526,8 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>  	if (prog->aux->dst_trampoline && tr != prog->aux->dst_trampoline)
>  		/* we allocated a new trampoline, so free the old one */
>  		bpf_trampoline_put(prog->aux->dst_trampoline);
> +	if (prog->type == BPF_PROG_TYPE_EXT)
> +		tgt_prog->aux->is_extended = true;
>  
>  	prog->aux->dst_prog = NULL;
>  	prog->aux->dst_trampoline = NULL;

Sorry, this might be a silly question, I do not fully understand how
programs and trampolines are protected against concurrent update.

Sequence of actions in bpf_tracing_prog_attach():
a. call bpf_trampoline_link_prog(&link->link, tr)
   this returns success if `tr->extension_prog` is NULL,
   meaning trampoline is "free";
b. update tgt_prog->aux->is_extended = true.

Sequence of actions in bpf_tracing_link_release():
c. call bpf_trampoline_unlink_prog(&tr_link->link, tr_link->trampoline)
   this sets `tr->extension_prog` to NULL;
d. update tr_link->tgt_prog->aux->is_extended = false.

In a concurrent environment, is it possible to have actions ordered as:
- thread #1: does bpf_tracing_link_release(link pointing to tgt_prog)
- thread #2: does bpf_tracing_prog_attach(some_prog, tgt_prog)
- thread #1: (c) tr->extension_prog is set to NULL
- thread #2: (a) tr->extension_prog is set to some_prog
- thread #2: (b) tgt_prog->aux->is_extended = true
- thread #1: (d) tgt_prog->aux->is_extended = false

Thus, loosing the is_extended mark?

(As far as I understand bpf_trampoline_compute_key() call in
 bpf_tracing_prog_attach() it is possible for threads #1 and #2 to
 operate on a same trampoline object).


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace
  2024-09-23 13:40 ` [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace Leon Hwang
@ 2024-09-25  5:32   ` Eduard Zingerman
  2024-09-26  7:19     ` Leon Hwang
  0 siblings, 1 reply; 11+ messages in thread
From: Eduard Zingerman @ 2024-09-25  5:32 UTC (permalink / raw)
  To: Leon Hwang, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, iii, kernel-patches-bot

On Mon, 2024-09-23 at 21:40 +0800, Leon Hwang wrote:

[...]

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 048aa2625cbef..b864b37e67c17 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1484,6 +1484,7 @@ struct bpf_prog_aux {
>  	bool exception_cb;
>  	bool exception_boundary;
>  	bool is_extended; /* true if extended by freplace program */
> +	atomic_t tail_callee_cnt;

Nit: the name is a bit misleading, this counts how many times the
     program resides it prog maps. Confusing w/o additional comments.
     Maybe something like 'member_of_prog_array_cnt'?

>  	struct bpf_arena *arena;
>  	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
>  	const struct btf_type *attach_func_proto;
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 8d97bae98fa70..c12e0e3bf6ad0 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -961,13 +961,17 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
>  		return ERR_PTR(-EINVAL);
>  	}
>  
> +	atomic_inc(&prog->aux->tail_callee_cnt);
>  	return prog;
>  }

[...]

>  static u32 prog_fd_array_sys_lookup_elem(void *ptr)
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 18b3f9216b050..be829016d8182 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3501,6 +3501,18 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>  		tgt_prog = prog->aux->dst_prog;
>  	}
>  
> +	if (prog->type == BPF_PROG_TYPE_EXT &&
> +	    atomic_read(&tgt_prog->aux->tail_callee_cnt)) {
> +		/* Program extensions can not extend target prog when the target
> +		 * prog has been updated to any prog_array map as tail callee.
> +		 * It's to prevent a potential infinite loop like:
> +		 * tgt prog entry -> tgt prog subprog -> freplace prog entry
> +		 * --tailcall-> tgt prog entry.
> +		 */
> +		err = -EINVAL;
> +		goto out_unlock;
> +	}
> +
>  	err = bpf_link_prime(&link->link.link, &link_primer);
>  	if (err)
>  		goto out_unlock;

Is it possible there is a race between map update and prog attach?
E.g. suppose the following sequence of events:
- thread #1 enters prog_fd_array_get_ptr()
- thread #1 successfully completes prog->aux->is_extended check (not extended)
- thread #2 enters bpf_tracing_prog_attach()
- thread #2 does atomic_read() for tgt_prog and it returns 0
- thread #2 proceeds attaching freplace to tgt_prog
- thread #1 does atomic_inc(&prog->aux->tail_callee_cnt)

Thus arriving to a state when tgt_prog is both a member of a map and
is freplaced. Is this a valid scenario?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map
  2024-09-25  1:24   ` Eduard Zingerman
@ 2024-09-26  7:16     ` Leon Hwang
  2024-09-27 12:23       ` Eduard Zingerman
  0 siblings, 1 reply; 11+ messages in thread
From: Leon Hwang @ 2024-09-26  7:16 UTC (permalink / raw)
  To: Eduard Zingerman, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, iii, kernel-patches-bot

Hi Eduard,

Thank you for your review.

On 25/9/24 09:24, Eduard Zingerman wrote:
> On Mon, 2024-09-23 at 21:40 +0800, Leon Hwang wrote:
> 
> [...]
> 
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 8a4117f6d7610..18b3f9216b050 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -3292,8 +3292,11 @@ static void bpf_tracing_link_release(struct bpf_link *link)
>>  	bpf_trampoline_put(tr_link->trampoline);
>>  
>>  	/* tgt_prog is NULL if target is a kernel function */
>> -	if (tr_link->tgt_prog)
>> +	if (tr_link->tgt_prog) {
>> +		if (link->prog->type == BPF_PROG_TYPE_EXT)
>> +			tr_link->tgt_prog->aux->is_extended = false;
>>  		bpf_prog_put(tr_link->tgt_prog);
>> +	}
>>  }
>>  
>>  static void bpf_tracing_link_dealloc(struct bpf_link *link)
>> @@ -3523,6 +3526,8 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>>  	if (prog->aux->dst_trampoline && tr != prog->aux->dst_trampoline)
>>  		/* we allocated a new trampoline, so free the old one */
>>  		bpf_trampoline_put(prog->aux->dst_trampoline);
>> +	if (prog->type == BPF_PROG_TYPE_EXT)
>> +		tgt_prog->aux->is_extended = true;
>>  
>>  	prog->aux->dst_prog = NULL;
>>  	prog->aux->dst_trampoline = NULL;
> 
> Sorry, this might be a silly question, I do not fully understand how
> programs and trampolines are protected against concurrent update.
> 

There's no protection against concurrent update.

> Sequence of actions in bpf_tracing_prog_attach():
> a. call bpf_trampoline_link_prog(&link->link, tr)
>    this returns success if `tr->extension_prog` is NULL,
>    meaning trampoline is "free";
> b. update tgt_prog->aux->is_extended = true.
> 
> Sequence of actions in bpf_tracing_link_release():
> c. call bpf_trampoline_unlink_prog(&tr_link->link, tr_link->trampoline)
>    this sets `tr->extension_prog` to NULL;
> d. update tr_link->tgt_prog->aux->is_extended = false.
> 
> In a concurrent environment, is it possible to have actions ordered as:
> - thread #1: does bpf_tracing_link_release(link pointing to tgt_prog)
> - thread #2: does bpf_tracing_prog_attach(some_prog, tgt_prog)
> - thread #1: (c) tr->extension_prog is set to NULL
> - thread #2: (a) tr->extension_prog is set to some_prog
> - thread #2: (b) tgt_prog->aux->is_extended = true
> - thread #1: (d) tgt_prog->aux->is_extended = false
> 
> Thus, loosing the is_extended mark?

Yes, you are correct.

> 
> (As far as I understand bpf_trampoline_compute_key() call in
>  bpf_tracing_prog_attach() it is possible for threads #1 and #2 to
>  operate on a same trampoline object).
> 

In order to avoid the above case:

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6988e432fc3d..1f19b754623c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3281,6 +3281,9 @@ static void bpf_tracing_link_release(struct
bpf_link *link)
        struct bpf_tracing_link *tr_link =
                container_of(link, struct bpf_tracing_link, link.link);

+       if (link->prog->type == BPF_PROG_TYPE_EXT)
+               tr_link->tgt_prog->aux->is_extended = false;
+
        WARN_ON_ONCE(bpf_trampoline_unlink_prog(&tr_link->link,
                                                tr_link->trampoline));

@@ -3518,6 +3521,8 @@ static int bpf_tracing_prog_attach(struct bpf_prog
*prog,
        if (prog->aux->dst_trampoline && tr != prog->aux->dst_trampoline)
                /* we allocated a new trampoline, so free the old one */
                bpf_trampoline_put(prog->aux->dst_trampoline);
+       if (prog->type == BPF_PROG_TYPE_EXT)
+               tgt_prog->aux->is_extended = true;

        prog->aux->dst_prog = NULL;
        prog->aux->dst_trampoline = NULL;

In bpf_tracing_link_release():
1. update tr_link->tgt_prog->aux->is_extended = false.
2. bpf_trampoline_unlink_prog().

In bpf_tracing_prog_attach():
1. bpf_trampoline_link_prog().
2. update tgt_prog->aux->is_extended = true.

Then, it is able to avoid losing the is_extended mark.

Thanks,
Leon


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace
  2024-09-25  5:32   ` Eduard Zingerman
@ 2024-09-26  7:19     ` Leon Hwang
  2024-09-27 10:58       ` Eduard Zingerman
  0 siblings, 1 reply; 11+ messages in thread
From: Leon Hwang @ 2024-09-26  7:19 UTC (permalink / raw)
  To: Eduard Zingerman, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, iii, kernel-patches-bot



On 25/9/24 13:32, Eduard Zingerman wrote:
> On Mon, 2024-09-23 at 21:40 +0800, Leon Hwang wrote:
> 
> [...]
> 
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 048aa2625cbef..b864b37e67c17 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -1484,6 +1484,7 @@ struct bpf_prog_aux {
>>  	bool exception_cb;
>>  	bool exception_boundary;
>>  	bool is_extended; /* true if extended by freplace program */
>> +	atomic_t tail_callee_cnt;
> 
> Nit: the name is a bit misleading, this counts how many times the
>      program resides it prog maps. Confusing w/o additional comments.
>      Maybe something like 'member_of_prog_array_cnt'?
> 

'member_of_prog_array_cnt' is not accurate enough.

'prog_array_member_cnt' is better, and should alongside comment /*
counts how many times as member of prog_array */.

>>  	struct bpf_arena *arena;
>>  	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
>>  	const struct btf_type *attach_func_proto;
>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
>> index 8d97bae98fa70..c12e0e3bf6ad0 100644
>> --- a/kernel/bpf/arraymap.c
>> +++ b/kernel/bpf/arraymap.c
>> @@ -961,13 +961,17 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
>>  		return ERR_PTR(-EINVAL);
>>  	}
>>  
>> +	atomic_inc(&prog->aux->tail_callee_cnt);
>>  	return prog;
>>  }
> 
> [...]
> 
>>  static u32 prog_fd_array_sys_lookup_elem(void *ptr)
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 18b3f9216b050..be829016d8182 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -3501,6 +3501,18 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>>  		tgt_prog = prog->aux->dst_prog;
>>  	}
>>  
>> +	if (prog->type == BPF_PROG_TYPE_EXT &&
>> +	    atomic_read(&tgt_prog->aux->tail_callee_cnt)) {
>> +		/* Program extensions can not extend target prog when the target
>> +		 * prog has been updated to any prog_array map as tail callee.
>> +		 * It's to prevent a potential infinite loop like:
>> +		 * tgt prog entry -> tgt prog subprog -> freplace prog entry
>> +		 * --tailcall-> tgt prog entry.
>> +		 */
>> +		err = -EINVAL;
>> +		goto out_unlock;
>> +	}
>> +
>>  	err = bpf_link_prime(&link->link.link, &link_primer);
>>  	if (err)
>>  		goto out_unlock;
> 
> Is it possible there is a race between map update and prog attach?

Yes, it is possible.

> E.g. suppose the following sequence of events:
> - thread #1 enters prog_fd_array_get_ptr()
> - thread #1 successfully completes prog->aux->is_extended check (not extended)
> - thread #2 enters bpf_tracing_prog_attach()
> - thread #2 does atomic_read() for tgt_prog and it returns 0
> - thread #2 proceeds attaching freplace to tgt_prog
> - thread #1 does atomic_inc(&prog->aux->tail_callee_cnt)
> 
> Thus arriving to a state when tgt_prog is both a member of a map and
> is freplaced. Is this a valid scenario?
> 

This patch series aims to prevent such case that tgt_prog is a member of
prog_array and is freplaced at the same time.

Without this patch series, a prog can be extended by freplace prog and then
be updated to prog_array, or can be updated to prog_array and then be
extended by freplace prog, in order to construct such case.

This patch aims to prevent "be updated to prog_array and then be extended
by freplace prog".
The previous patch aims to prevent "be extended by freplace prog and then
be updated to prog_array".

So, in order to avoid the above case:

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index a43e62e2a8bb..da4e26029a33 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -948,7 +948,9 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
        if (IS_ERR(prog))
                return prog;

-       if (!bpf_prog_map_compatible(map, prog)) {
+       atomic_inc(&prog->aux->tail_callee_cnt);
+       if (!bpf_prog_map_compatible(map, prog) || prog->aux->is_extended) {
+               atomic_dec(&prog->aux->tail_callee_cnt);
                bpf_prog_put(prog);
                return ERR_PTR(-EINVAL);
        }

1. Increment tail_callee_cnt.
2. Decrement tail_callee_cnt, if prog->aux->is_extended.

Then, thread #2 does atomic_read() for tgt_prog, and it won't return 0.

Thanks,
Leon

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace
  2024-09-26  7:19     ` Leon Hwang
@ 2024-09-27 10:58       ` Eduard Zingerman
  0 siblings, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2024-09-27 10:58 UTC (permalink / raw)
  To: Leon Hwang, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, iii, kernel-patches-bot

On Thu, 2024-09-26 at 15:19 +0800, Leon Hwang wrote:

[...]

> > E.g. suppose the following sequence of events:
> > - thread #1 enters prog_fd_array_get_ptr()
> > - thread #1 successfully completes prog->aux->is_extended check (not extended)
> > - thread #2 enters bpf_tracing_prog_attach()
> > - thread #2 does atomic_read() for tgt_prog and it returns 0
> > - thread #2 proceeds attaching freplace to tgt_prog
> > - thread #1 does atomic_inc(&prog->aux->tail_callee_cnt)
> > 
> > Thus arriving to a state when tgt_prog is both a member of a map and
> > is freplaced. Is this a valid scenario?
> > 
> 
> This patch series aims to prevent such case that tgt_prog is a member of
> prog_array and is freplaced at the same time.
> 
> Without this patch series, a prog can be extended by freplace prog and then
> be updated to prog_array, or can be updated to prog_array and then be
> extended by freplace prog, in order to construct such case.
> 
> This patch aims to prevent "be updated to prog_array and then be extended
> by freplace prog".
> The previous patch aims to prevent "be extended by freplace prog and then
> be updated to prog_array".
> 
> So, in order to avoid the above case:
> 
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index a43e62e2a8bb..da4e26029a33 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -948,7 +948,9 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
>         if (IS_ERR(prog))
>                 return prog;
> 
> -       if (!bpf_prog_map_compatible(map, prog)) {
> +       atomic_inc(&prog->aux->tail_callee_cnt);
> +       if (!bpf_prog_map_compatible(map, prog) || prog->aux->is_extended) {
> +               atomic_dec(&prog->aux->tail_callee_cnt);
>                 bpf_prog_put(prog);
>                 return ERR_PTR(-EINVAL);
>         }

I'm not sure this really solves the issue.
Documentation for both 'atomic_inc' and 'atomic_read'
(used in bpf_tracing_prog_attach()) says that these are operations with
relaxed memory ordering. Meaning that e.g. 'atomic_inc' executed
inside prog_fd_array_get_ptr() is not necessarily immediately visible
for other thread executing 'atomic_read' in bpf_tracing_prog_attach().
I think that some memory barrier is needed (non-relaxed func variant).

But all this gets unnecessarily complicated, neither
prog_fd_array_get_ptr() nor bpf_tracing_prog_attach() are executed
often, I think that 'tail_callee_cnt' and 'is_extended' should be
protected by a mutex.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map
  2024-09-26  7:16     ` Leon Hwang
@ 2024-09-27 12:23       ` Eduard Zingerman
  0 siblings, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2024-09-27 12:23 UTC (permalink / raw)
  To: Leon Hwang, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, iii, kernel-patches-bot

On Thu, 2024-09-26 at 15:16 +0800, Leon Hwang wrote:

[...]

> There's no protection against concurrent update.
> 
> > Sequence of actions in bpf_tracing_prog_attach():
> > a. call bpf_trampoline_link_prog(&link->link, tr)
> >    this returns success if `tr->extension_prog` is NULL,
> >    meaning trampoline is "free";
> > b. update tgt_prog->aux->is_extended = true.
> > 
> > Sequence of actions in bpf_tracing_link_release():
> > c. call bpf_trampoline_unlink_prog(&tr_link->link, tr_link->trampoline)
> >    this sets `tr->extension_prog` to NULL;
> > d. update tr_link->tgt_prog->aux->is_extended = false.
> > 
> > In a concurrent environment, is it possible to have actions ordered as:
> > - thread #1: does bpf_tracing_link_release(link pointing to tgt_prog)
> > - thread #2: does bpf_tracing_prog_attach(some_prog, tgt_prog)
> > - thread #1: (c) tr->extension_prog is set to NULL
> > - thread #2: (a) tr->extension_prog is set to some_prog
> > - thread #2: (b) tgt_prog->aux->is_extended = true
> > - thread #1: (d) tgt_prog->aux->is_extended = false
> > 
> > Thus, loosing the is_extended mark?
> 
> Yes, you are correct.
> 
> > 
> > (As far as I understand bpf_trampoline_compute_key() call in
> >  bpf_tracing_prog_attach() it is possible for threads #1 and #2 to
> >  operate on a same trampoline object).
> > 
> 
> In order to avoid the above case:
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 6988e432fc3d..1f19b754623c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3281,6 +3281,9 @@ static void bpf_tracing_link_release(struct
> bpf_link *link)
>         struct bpf_tracing_link *tr_link =
>                 container_of(link, struct bpf_tracing_link, link.link);
> 
> +       if (link->prog->type == BPF_PROG_TYPE_EXT)
> +               tr_link->tgt_prog->aux->is_extended = false;
> +

Isn't this too early to reset 'is_extended'?
E.g. consider scenario:
- thread #1 enters bpf_tracing_link_release() and sets is_extended == false
- thread #2 enters prog_fd_array_get_ptr(), is_extended is false,
            and it proceeds putting tgt_prog to an array;
- execution of tgt_prog is triggered and freplace patch is still in effect,
  because thread #1 had not finished bpf_tracing_link_release() yet.
  Here same bug we are trying to protect against (circular tailcall)
  is still potentially visible, isn't it?

>         WARN_ON_ONCE(bpf_trampoline_unlink_prog(&tr_link->link,
>                                                 tr_link->trampoline));
> 
> @@ -3518,6 +3521,8 @@ static int bpf_tracing_prog_attach(struct bpf_prog
> *prog,
>         if (prog->aux->dst_trampoline && tr != prog->aux->dst_trampoline)
>                 /* we allocated a new trampoline, so free the old one */
>                 bpf_trampoline_put(prog->aux->dst_trampoline);
> +       if (prog->type == BPF_PROG_TYPE_EXT)
> +               tgt_prog->aux->is_extended = true;
>
>         prog->aux->dst_prog = NULL;
>         prog->aux->dst_trampoline = NULL;
> 
> In bpf_tracing_link_release():
> 1. update tr_link->tgt_prog->aux->is_extended = false.
> 2. bpf_trampoline_unlink_prog().
> 
> In bpf_tracing_prog_attach():
> 1. bpf_trampoline_link_prog().
> 2. update tgt_prog->aux->is_extended = true.
> 
> Then, it is able to avoid losing the is_extended mark.
> 
> Thanks,
> Leon
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-09-27 12:23 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-23 13:40 [PATCH bpf-next v3 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
2024-09-23 13:40 ` [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map Leon Hwang
2024-09-25  1:24   ` Eduard Zingerman
2024-09-26  7:16     ` Leon Hwang
2024-09-27 12:23       ` Eduard Zingerman
2024-09-23 13:40 ` [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace Leon Hwang
2024-09-25  5:32   ` Eduard Zingerman
2024-09-26  7:19     ` Leon Hwang
2024-09-27 10:58       ` Eduard Zingerman
2024-09-23 13:40 ` [PATCH bpf-next v3 3/4] selftests/bpf: Add a test case to confirm a tailcall infinite loop issue has been prevented Leon Hwang
2024-09-23 13:40 ` [PATCH bpf-next v3 4/4] selftests/bpf: Add cases to test tailcall in freplace Leon Hwang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox