All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 0/4] bpf: Fix tailcall infinite loop caused by freplace
@ 2024-09-23 13:40 Leon Hwang
  2024-09-23 13:40 ` [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map Leon Hwang
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Leon Hwang @ 2024-09-23 13:40 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

Previously, I fixed a tailcall infinite loop issue caused by trampoline[0].

At this time, I fix a tailcall infinite loop issue caused by tailcall and
freplace combination by preventing updating extended prog to prog_array map
and preventing extending tail callee prog with freplace.

v2 -> v3:
  * Address comments from Alexei:
    * Stop hacking JIT.
    * Prevent the specific use case at attach/update time.

v1 -> v2:
  * Address comment from Eduard:
    * Explain why nop5 and xor/nop3 are swapped at prologue.
  * Address comment from Alexei:
    * Disallow attaching tail_call_reachable freplace prog to
      not-tail_call_reachable target in verifier.
  * Update "bpf, arm64: Fix tailcall infinite loop caused by freplace" with
    latest arm64 JIT code.

Links:
[0] https://lore.kernel.org/bpf/20230912150442.2009-1-hffilwlqm@gmail.com/

Leon Hwang (4):
  bpf: Prevent updating extended prog to prog_array map
  bpf: Prevent extending tail callee prog with freplace
  selftests/bpf: Add a test case to confirm a tailcall infinite loop
    issue has been prevented
  selftests/bpf: Add cases to test tailcall in freplace

 include/linux/bpf.h                           |   2 +
 kernel/bpf/arraymap.c                         |  13 +-
 kernel/bpf/syscall.c                          |  19 +-
 .../selftests/bpf/prog_tests/tailcalls.c      | 196 +++++++++++++++++-
 .../tailcall_bpf2bpf_hierarchy_freplace.c     |  30 +++
 .../testing/selftests/bpf/progs/tc_bpf2bpf.c  |  37 +++-
 6 files changed, 290 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c

-- 
2.44.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map
  2024-09-23 13:40 [PATCH bpf-next v3 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
@ 2024-09-23 13:40 ` Leon Hwang
  2024-09-25  1:24   ` Eduard Zingerman
  2024-09-23 13:40 ` [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace Leon Hwang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Leon Hwang @ 2024-09-23 13:40 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

This patch prevents a potential infinite loop issue caused by combination
of tailcal and freplace partially.

For example:

tc_bpf2bpf.c:

// SPDX-License-Identifier: GPL-2.0
\#include <linux/bpf.h>
\#include <bpf/bpf_helpers.h>

__noinline
int subprog_tc(struct __sk_buff *skb)
{
	return skb->len * 2;
}

SEC("tc")
int entry_tc(struct __sk_buff *skb)
{
	return subprog_tc(skb);
}

char __license[] SEC("license") = "GPL";

tailcall_bpf2bpf_hierarchy_freplace.c:

// SPDX-License-Identifier: GPL-2.0
\#include <linux/bpf.h>
\#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

int count = 0;

static __noinline
int subprog_tail(struct __sk_buff *skb)
{
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;
}

SEC("freplace")
int entry_freplace(struct __sk_buff *skb)
{
	count++;
	subprog_tail(skb);
	subprog_tail(skb);
	return count;
}

char __license[] SEC("license") = "GPL";

The attach target of entry_freplace is subprog_tc, and the tail callee
in subprog_tail is entry_tc.

Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace
-> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in
entry_freplace will count from zero for every time of entry_freplace
execution. Kernel will panic:

[   15.310490] BUG: TASK stack guard page was hit at (____ptrval____)
(stack is (____ptrval____)..(____ptrval____))
[   15.310490] Oops: stack guard page: 0000 [#1] PREEMPT SMP NOPTI
[   15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G           OE
   6.10.0-rc6-g026dcdae8d3e-dirty #72
[   15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX,
1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53
[   15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e
fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c
[   15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202
[   15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX:
0000000000008cb5
[   15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI:
ffff9c98808b7e00
[   15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09:
0000000000000000
[   15.310490] R10: 0000000000000001 R11: 0000000000000000 R12:
ffffb500c01af000
[   15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15:
0000000000000000
[   15.310490] FS:  00007f133b665140(0000) GS:ffff9c98bbd00000(0000)
knlGS:0000000000000000
[   15.310490] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4:
00000000000006f0
[   15.310490] Call Trace:
[   15.310490]  <#DF>
[   15.310490]  ? die+0x36/0x90
[   15.310490]  ? handle_stack_overflow+0x4d/0x60
[   15.310490]  ? exc_double_fault+0x117/0x1a0
[   15.310490]  ? asm_exc_double_fault+0x23/0x30
[   15.310490]  ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53
[   15.310490]  </#DF>
[   15.310490]  <TASK>
[   15.310490]  bpf_prog_85781a698094722f_entry+0x4c/0x64
[   15.310490]  bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b
[   15.310490]  ...
[   15.310490]  bpf_prog_85781a698094722f_entry+0x4c/0x64
[   15.310490]  bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b
[   15.310490]  bpf_test_run+0x210/0x370
[   15.310490]  ? bpf_test_run+0x128/0x370
[   15.310490]  bpf_prog_test_run_skb+0x388/0x7a0
[   15.310490]  __sys_bpf+0xdbf/0x2c40
[   15.310490]  ? clockevents_program_event+0x52/0xf0
[   15.310490]  ? lock_release+0xbf/0x290
[   15.310490]  __x64_sys_bpf+0x1e/0x30
[   15.310490]  do_syscall_64+0x68/0x140
[   15.310490]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   15.310490] RIP: 0033:0x7f133b52725d
[   15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48
[   15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX:
0000000000000141
[   15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX:
00007f133b52725d
[   15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI:
000000000000000a
[   15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09:
00007ffddbc102a0
[   15.310490] R10: 0000000000000064 R11: 0000000000000206 R12:
0000000000000004
[   15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15:
00007f133b6ed000
[   15.310490]  </TASK>
[   15.310490] Modules linked in: bpf_testmod(OE)
[   15.310490] ---[ end trace 0000000000000000 ]---
[   15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53
[   15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e
fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c
[   15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202
[   15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX:
0000000000008cb5
[   15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI:
ffff9c98808b7e00
[   15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09:
0000000000000000
[   15.310490] R10: 0000000000000001 R11: 0000000000000000 R12:
ffffb500c01af000
[   15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15:
0000000000000000
[   15.310490] FS:  00007f133b665140(0000) GS:ffff9c98bbd00000(0000)
knlGS:0000000000000000
[   15.310490] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4:
00000000000006f0
[   15.310490] Kernel panic - not syncing: Fatal exception in interrupt
[   15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)

This patch partially prevents this panic by preventing updating extended
prog to prog_array map.

If a prog or its subprog has been extended by freplace prog, the prog
can not be updated to prog_array map.

Alongside next patch, the panic will be prevented completely.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/bpf.h   | 1 +
 kernel/bpf/arraymap.c | 7 ++++++-
 kernel/bpf/syscall.c  | 7 ++++++-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0c3893c471711..048aa2625cbef 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1483,6 +1483,7 @@ struct bpf_prog_aux {
 	bool xdp_has_frags;
 	bool exception_cb;
 	bool exception_boundary;
+	bool is_extended; /* true if extended by freplace program */
 	struct bpf_arena *arena;
 	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
 	const struct btf_type *attach_func_proto;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 79660e3fca4c1..8d97bae98fa70 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -951,7 +951,12 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
 	if (IS_ERR(prog))
 		return prog;
 
-	if (!bpf_prog_map_compatible(map, prog)) {
+	if (!bpf_prog_map_compatible(map, prog) || prog->aux->is_extended) {
+		/* Extended prog can not be tail callee. It's to prevent a
+		 * potential infinite loop like:
+		 * tail callee prog entry -> tail callee prog subprog ->
+		 * freplace prog entry --tailcall-> tail callee prog entry.
+		 */
 		bpf_prog_put(prog);
 		return ERR_PTR(-EINVAL);
 	}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8a4117f6d7610..18b3f9216b050 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3292,8 +3292,11 @@ static void bpf_tracing_link_release(struct bpf_link *link)
 	bpf_trampoline_put(tr_link->trampoline);
 
 	/* tgt_prog is NULL if target is a kernel function */
-	if (tr_link->tgt_prog)
+	if (tr_link->tgt_prog) {
+		if (link->prog->type == BPF_PROG_TYPE_EXT)
+			tr_link->tgt_prog->aux->is_extended = false;
 		bpf_prog_put(tr_link->tgt_prog);
+	}
 }
 
 static void bpf_tracing_link_dealloc(struct bpf_link *link)
@@ -3523,6 +3526,8 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 	if (prog->aux->dst_trampoline && tr != prog->aux->dst_trampoline)
 		/* we allocated a new trampoline, so free the old one */
 		bpf_trampoline_put(prog->aux->dst_trampoline);
+	if (prog->type == BPF_PROG_TYPE_EXT)
+		tgt_prog->aux->is_extended = true;
 
 	prog->aux->dst_prog = NULL;
 	prog->aux->dst_trampoline = NULL;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace
  2024-09-23 13:40 [PATCH bpf-next v3 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
  2024-09-23 13:40 ` [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map Leon Hwang
@ 2024-09-23 13:40 ` Leon Hwang
  2024-09-25  5:32   ` Eduard Zingerman
  2024-09-23 13:40 ` [PATCH bpf-next v3 3/4] selftests/bpf: Add a test case to confirm a tailcall infinite loop issue has been prevented Leon Hwang
  2024-09-23 13:40 ` [PATCH bpf-next v3 4/4] selftests/bpf: Add cases to test tailcall in freplace Leon Hwang
  3 siblings, 1 reply; 11+ messages in thread
From: Leon Hwang @ 2024-09-23 13:40 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

Alongside previous patch, the infinite loop issue caused by combination of
tailcal and freplace can be prevented completely.

The previous patch can not prevent the use case that updates a prog to
prog_array map and then extends the prog with freplace prog.

This patch fixes the case by preventing extending a prog, which has been
updated to prog_array map, with freplace prog.

If a prog has been updated to prog_array map, it or its subprog can not
be extended by freplace prog.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/bpf.h   |  1 +
 kernel/bpf/arraymap.c |  6 +++++-
 kernel/bpf/syscall.c  | 12 ++++++++++++
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 048aa2625cbef..b864b37e67c17 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1484,6 +1484,7 @@ struct bpf_prog_aux {
 	bool exception_cb;
 	bool exception_boundary;
 	bool is_extended; /* true if extended by freplace program */
+	atomic_t tail_callee_cnt;
 	struct bpf_arena *arena;
 	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
 	const struct btf_type *attach_func_proto;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 8d97bae98fa70..c12e0e3bf6ad0 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -961,13 +961,17 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
 		return ERR_PTR(-EINVAL);
 	}
 
+	atomic_inc(&prog->aux->tail_callee_cnt);
 	return prog;
 }
 
 static void prog_fd_array_put_ptr(struct bpf_map *map, void *ptr, bool need_defer)
 {
+	struct bpf_prog *prog = ptr;
+
 	/* bpf_prog is freed after one RCU or tasks trace grace period */
-	bpf_prog_put(ptr);
+	atomic_dec(&prog->aux->tail_callee_cnt);
+	bpf_prog_put(prog);
 }
 
 static u32 prog_fd_array_sys_lookup_elem(void *ptr)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 18b3f9216b050..be829016d8182 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3501,6 +3501,18 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 		tgt_prog = prog->aux->dst_prog;
 	}
 
+	if (prog->type == BPF_PROG_TYPE_EXT &&
+	    atomic_read(&tgt_prog->aux->tail_callee_cnt)) {
+		/* Program extensions can not extend target prog when the target
+		 * prog has been updated to any prog_array map as tail callee.
+		 * It's to prevent a potential infinite loop like:
+		 * tgt prog entry -> tgt prog subprog -> freplace prog entry
+		 * --tailcall-> tgt prog entry.
+		 */
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
 	err = bpf_link_prime(&link->link.link, &link_primer);
 	if (err)
 		goto out_unlock;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v3 3/4] selftests/bpf: Add a test case to confirm a tailcall infinite loop issue has been prevented
  2024-09-23 13:40 [PATCH bpf-next v3 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
  2024-09-23 13:40 ` [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map Leon Hwang
  2024-09-23 13:40 ` [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace Leon Hwang
@ 2024-09-23 13:40 ` Leon Hwang
  2024-09-23 13:40 ` [PATCH bpf-next v3 4/4] selftests/bpf: Add cases to test tailcall in freplace Leon Hwang
  3 siblings, 0 replies; 11+ messages in thread
From: Leon Hwang @ 2024-09-23 13:40 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

cd tools/testing/selftests/bpf; ./test_progs -t tailcalls
335/26  tailcalls/tailcall_bpf2bpf_freplace:OK
335     tailcalls:OK
Summary: 1/26 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 .../selftests/bpf/prog_tests/tailcalls.c      | 99 ++++++++++++++++++-
 .../testing/selftests/bpf/progs/tc_bpf2bpf.c  | 24 ++++-
 2 files changed, 119 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/tailcalls.c b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
index 21c5a37846ade..fa3f3bb11b098 100644
--- a/tools/testing/selftests/bpf/prog_tests/tailcalls.c
+++ b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
@@ -1525,7 +1525,8 @@ static void test_tailcall_freplace(void)
 
 	prog_fd = bpf_program__fd(tc_skel->progs.entry_tc);
 	freplace_prog = freplace_skel->progs.entry_freplace;
-	err = bpf_program__set_attach_target(freplace_prog, prog_fd, "subprog");
+	err = bpf_program__set_attach_target(freplace_prog, prog_fd,
+					     "subprog_tailcall_tc");
 	if (!ASSERT_OK(err, "set_attach_target"))
 		goto out;
 
@@ -1534,7 +1535,7 @@ static void test_tailcall_freplace(void)
 		goto out;
 
 	freplace_link = bpf_program__attach_freplace(freplace_prog, prog_fd,
-						     "subprog");
+						     "subprog_tailcall_tc");
 	if (!ASSERT_OK_PTR(freplace_link, "attach_freplace"))
 		goto out;
 
@@ -1556,6 +1557,98 @@ static void test_tailcall_freplace(void)
 	tailcall_freplace__destroy(freplace_skel);
 }
 
+/* test_tailcall_bpf2bpf_freplace checks the failure that fails to attach a tail
+ * callee prog with freplace prog or fails to update an extended prog to
+ * prog_array map.
+ */
+static void test_tailcall_bpf2bpf_freplace(void)
+{
+	struct tailcall_freplace *freplace_skel = NULL;
+	struct bpf_link *freplace_link = NULL;
+	struct tc_bpf2bpf *tc_skel = NULL;
+	char buff[128] = {};
+	int prog_fd, map_fd;
+	int err, key;
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .data_in = buff,
+		    .data_size_in = sizeof(buff),
+		    .repeat = 1,
+	);
+
+	tc_skel = tc_bpf2bpf__open_and_load();
+	if (!ASSERT_OK_PTR(tc_skel, "tc_bpf2bpf__open_and_load"))
+		goto out;
+
+	prog_fd = bpf_program__fd(tc_skel->progs.entry_tc);
+	freplace_skel = tailcall_freplace__open();
+	if (!ASSERT_OK_PTR(freplace_skel, "tailcall_freplace__open"))
+		goto out;
+
+	err = bpf_program__set_attach_target(freplace_skel->progs.entry_freplace,
+					     prog_fd, "subprog_tc");
+	if (!ASSERT_OK(err, "set_attach_target"))
+		goto out;
+
+	err = tailcall_freplace__load(freplace_skel);
+	if (!ASSERT_OK(err, "tailcall_freplace__load"))
+		goto out;
+
+	/* OK to attach then detach freplace prog. */
+
+	freplace_link = bpf_program__attach_freplace(freplace_skel->progs.entry_freplace,
+						     prog_fd, "subprog_tc");
+	if (!ASSERT_OK_PTR(freplace_link, "attach_freplace"))
+		goto out;
+
+	err = bpf_link__destroy(freplace_link);
+	if (!ASSERT_OK(err, "destroy link"))
+		goto out;
+
+	/* OK to update prog_array map then delete element from the map. */
+
+	key = 0;
+	map_fd = bpf_map__fd(freplace_skel->maps.jmp_table);
+	err = bpf_map_update_elem(map_fd, &key, &prog_fd, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table"))
+		goto out;
+
+	err = bpf_map_delete_elem(map_fd, &key);
+	if (!ASSERT_OK(err, "delete_elem from jmp_table"))
+		goto out;
+
+	/* Fail to attach a tail callee prog with freplace prog. */
+
+	err = bpf_map_update_elem(map_fd, &key, &prog_fd, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table"))
+		goto out;
+
+	freplace_link = bpf_program__attach_freplace(freplace_skel->progs.entry_freplace,
+						     prog_fd, "subprog_tc");
+	if (!ASSERT_ERR_PTR(freplace_link, "attach_freplace failure"))
+		goto out;
+
+	err = bpf_map_delete_elem(map_fd, &key);
+	if (!ASSERT_OK(err, "delete_elem from jmp_table"))
+		goto out;
+
+	/* Fail to update an extended prog to prog_array map. */
+
+	freplace_link = bpf_program__attach_freplace(freplace_skel->progs.entry_freplace,
+						     prog_fd, "subprog_tc");
+	if (!ASSERT_OK_PTR(freplace_link, "attach_freplace"))
+		goto out;
+
+	err = bpf_map_update_elem(map_fd, &key, &prog_fd, BPF_ANY);
+	if (!ASSERT_ERR(err, "update jmp_table failure"))
+		goto out;
+
+out:
+	bpf_link__destroy(freplace_link);
+	tailcall_freplace__destroy(freplace_skel);
+	tc_bpf2bpf__destroy(tc_skel);
+}
+
 void test_tailcalls(void)
 {
 	if (test__start_subtest("tailcall_1"))
@@ -1606,4 +1699,6 @@ void test_tailcalls(void)
 	test_tailcall_bpf2bpf_hierarchy_3();
 	if (test__start_subtest("tailcall_freplace"))
 		test_tailcall_freplace();
+	if (test__start_subtest("tailcall_bpf2bpf_freplace"))
+		test_tailcall_bpf2bpf_freplace();
 }
diff --git a/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c b/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
index 8a0632c37839a..34f3c780194e4 100644
--- a/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
@@ -4,11 +4,30 @@
 #include <bpf/bpf_helpers.h>
 #include "bpf_misc.h"
 
+struct {
+	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+	__uint(max_entries, 1);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+} jmp_table SEC(".maps");
+
+__noinline
+int subprog_tailcall_tc(struct __sk_buff *skb)
+{
+	int ret = 1;
+
+	bpf_tail_call_static(skb, &jmp_table, 0);
+	__sink(skb);
+	__sink(ret);
+	return ret;
+}
+
 __noinline
-int subprog(struct __sk_buff *skb)
+int subprog_tc(struct __sk_buff *skb)
 {
 	int ret = 1;
 
+	__sink(skb);
 	__sink(ret);
 	return ret;
 }
@@ -16,7 +35,8 @@ int subprog(struct __sk_buff *skb)
 SEC("tc")
 int entry_tc(struct __sk_buff *skb)
 {
-	return subprog(skb);
+	subprog_tc(skb);
+	return subprog_tailcall_tc(skb);
 }
 
 char __license[] SEC("license") = "GPL";
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v3 4/4] selftests/bpf: Add cases to test tailcall in freplace
  2024-09-23 13:40 [PATCH bpf-next v3 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
                   ` (2 preceding siblings ...)
  2024-09-23 13:40 ` [PATCH bpf-next v3 3/4] selftests/bpf: Add a test case to confirm a tailcall infinite loop issue has been prevented Leon Hwang
@ 2024-09-23 13:40 ` Leon Hwang
  3 siblings, 0 replies; 11+ messages in thread
From: Leon Hwang @ 2024-09-23 13:40 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

cd tools/testing/selftests/bpf; ./test_progs -t tailcalls
335/27  tailcalls/tailcall_bpf2bpf_hierarchy_freplace_1:OK
335/28  tailcalls/tailcall_bpf2bpf_hierarchy_freplace_2:OK
335     tailcalls:OK
Summary: 1/28 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 .../selftests/bpf/prog_tests/tailcalls.c      | 97 +++++++++++++++++++
 .../tailcall_bpf2bpf_hierarchy_freplace.c     | 30 ++++++
 .../testing/selftests/bpf/progs/tc_bpf2bpf.c  | 13 +++
 3 files changed, 140 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c

diff --git a/tools/testing/selftests/bpf/prog_tests/tailcalls.c b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
index fa3f3bb11b098..0564ad6c9b288 100644
--- a/tools/testing/selftests/bpf/prog_tests/tailcalls.c
+++ b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
@@ -5,6 +5,7 @@
 #include "tailcall_poke.skel.h"
 #include "tailcall_bpf2bpf_hierarchy2.skel.h"
 #include "tailcall_bpf2bpf_hierarchy3.skel.h"
+#include "tailcall_bpf2bpf_hierarchy_freplace.skel.h"
 #include "tailcall_freplace.skel.h"
 #include "tc_bpf2bpf.skel.h"
 
@@ -1649,6 +1650,98 @@ static void test_tailcall_bpf2bpf_freplace(void)
 	tc_bpf2bpf__destroy(tc_skel);
 }
 
+static void test_tailcall_bpf2bpf_hierarchy_freplace(bool freplace_subprog,
+						     bool target_entry2,
+						     int count1, int count2)
+{
+	struct tailcall_bpf2bpf_hierarchy_freplace *freplace_skel = NULL;
+	struct bpf_link *freplace_link = NULL;
+	struct tc_bpf2bpf *tc_skel = NULL;
+	int prog_fd, map_fd;
+	char buff[128] = {};
+	int err, key, val;
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .data_in = buff,
+		    .data_size_in = sizeof(buff),
+		    .repeat = 1,
+	);
+
+	tc_skel = tc_bpf2bpf__open_and_load();
+	if (!ASSERT_OK_PTR(tc_skel, "tc_bpf2bpf__open_and_load"))
+		goto out;
+
+	prog_fd = bpf_program__fd(target_entry2 ? tc_skel->progs.entry_tc_2 :
+				  tc_skel->progs.entry_tc);
+	freplace_skel = tailcall_bpf2bpf_hierarchy_freplace__open();
+	if (!ASSERT_OK_PTR(freplace_skel, "tailcall_bpf2bpf_hierarchy_freplace__open"))
+		goto out;
+
+	err = bpf_program__set_attach_target(freplace_skel->progs.entry_freplace,
+					     prog_fd, freplace_subprog ?
+					     "subprog_tailcall_tc" : "entry_tc");
+	if (!ASSERT_OK(err, "set_attach_target"))
+		goto out;
+
+	err = tailcall_bpf2bpf_hierarchy_freplace__load(freplace_skel);
+	if (!ASSERT_OK(err, "tailcall_bpf2bpf_hierarchy_freplace__load"))
+		goto out;
+
+	val = bpf_program__fd(freplace_skel->progs.entry_freplace);
+	map_fd = bpf_map__fd(freplace_skel->maps.jmp_table);
+	key = 0;
+	err = bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table"))
+		goto out;
+
+	freplace_link = bpf_program__attach_freplace(freplace_skel->progs.entry_freplace,
+						     prog_fd, freplace_subprog ?
+						     "subprog_tailcall_tc" : "entry_tc");
+	if (!ASSERT_OK_PTR(freplace_link, "attach_freplace"))
+		goto out;
+
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run_opts");
+	ASSERT_EQ(topts.retval, count1, "test_run_opts retval");
+
+	err = bpf_map_delete_elem(map_fd, &key);
+	if (!ASSERT_OK(err, "delete_elem from jmp_table"))
+		goto out;
+
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run_opts again");
+	ASSERT_EQ(topts.retval, count2, "test_run_opts retval again");
+
+out:
+	bpf_link__destroy(freplace_link);
+	tailcall_bpf2bpf_hierarchy_freplace__destroy(freplace_skel);
+	tc_bpf2bpf__destroy(tc_skel);
+}
+
+/* test_tailcall_bpf2bpf_hierarchy_freplace_1 checks the count value of tail
+ * call limit enforcement matches with expectation for the case:
+ *
+ *                                    subprog_tail --tailcall-> entry_freplace
+ * entry_tc --jump-> entry_freplace <
+ *                                    subprog_tail --tailcall-> entry_freplace
+ */
+static void test_tailcall_bpf2bpf_hierarchy_freplace_1(void)
+{
+	test_tailcall_bpf2bpf_hierarchy_freplace(false, false, 34, 35);
+}
+
+/* test_tailcall_bpf2bpf_hierarchy_freplace_2 checks the count value of tail
+ * call limit enforcement matches with expectation for the case:
+ *
+ *                                                                              subprog_tail --tailcall-> entry_freplace
+ * entry_tc_2 --> subprog_tailcall_tc (call 10 times) --jump-> entry_freplace <
+ *                                                                              subprog_tail --tailcall-> entry_freplace
+ */
+static void test_tailcall_bpf2bpf_hierarchy_freplace_2(void)
+{
+	test_tailcall_bpf2bpf_hierarchy_freplace(true, true, 340, 350);
+}
+
 void test_tailcalls(void)
 {
 	if (test__start_subtest("tailcall_1"))
@@ -1701,4 +1794,8 @@ void test_tailcalls(void)
 		test_tailcall_freplace();
 	if (test__start_subtest("tailcall_bpf2bpf_freplace"))
 		test_tailcall_bpf2bpf_freplace();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_freplace_1"))
+		test_tailcall_bpf2bpf_hierarchy_freplace_1();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_freplace_2"))
+		test_tailcall_bpf2bpf_hierarchy_freplace_2();
 }
diff --git a/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c
new file mode 100644
index 0000000000000..6f7c1fac9ddb7
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+	__uint(max_entries, 1);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+} jmp_table SEC(".maps");
+
+int count = 0;
+
+static __noinline
+int subprog_tail(struct __sk_buff *skb)
+{
+	bpf_tail_call_static(skb, &jmp_table, 0);
+	return 0;
+}
+
+SEC("freplace")
+int entry_freplace(struct __sk_buff *skb)
+{
+	count++;
+	subprog_tail(skb);
+	subprog_tail(skb);
+	return count;
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c b/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
index 34f3c780194e4..beacf60a52677 100644
--- a/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
@@ -39,4 +39,17 @@ int entry_tc(struct __sk_buff *skb)
 	return subprog_tailcall_tc(skb);
 }
 
+SEC("tc")
+int entry_tc_2(struct __sk_buff *skb)
+{
+	int ret, i;
+
+	for (i = 0; i < 10; i++) {
+		ret = subprog_tailcall_tc(skb);
+		__sink(ret);
+	}
+
+	return ret;
+}
+
 char __license[] SEC("license") = "GPL";
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map
  2024-09-23 13:40 ` [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map Leon Hwang
@ 2024-09-25  1:24   ` Eduard Zingerman
  2024-09-26  7:16     ` Leon Hwang
  0 siblings, 1 reply; 11+ messages in thread
From: Eduard Zingerman @ 2024-09-25  1:24 UTC (permalink / raw)
  To: Leon Hwang, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, iii, kernel-patches-bot

On Mon, 2024-09-23 at 21:40 +0800, Leon Hwang wrote:

[...]

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 8a4117f6d7610..18b3f9216b050 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3292,8 +3292,11 @@ static void bpf_tracing_link_release(struct bpf_link *link)
>  	bpf_trampoline_put(tr_link->trampoline);
>  
>  	/* tgt_prog is NULL if target is a kernel function */
> -	if (tr_link->tgt_prog)
> +	if (tr_link->tgt_prog) {
> +		if (link->prog->type == BPF_PROG_TYPE_EXT)
> +			tr_link->tgt_prog->aux->is_extended = false;
>  		bpf_prog_put(tr_link->tgt_prog);
> +	}
>  }
>  
>  static void bpf_tracing_link_dealloc(struct bpf_link *link)
> @@ -3523,6 +3526,8 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>  	if (prog->aux->dst_trampoline && tr != prog->aux->dst_trampoline)
>  		/* we allocated a new trampoline, so free the old one */
>  		bpf_trampoline_put(prog->aux->dst_trampoline);
> +	if (prog->type == BPF_PROG_TYPE_EXT)
> +		tgt_prog->aux->is_extended = true;
>  
>  	prog->aux->dst_prog = NULL;
>  	prog->aux->dst_trampoline = NULL;

Sorry, this might be a silly question, I do not fully understand how
programs and trampolines are protected against concurrent update.

Sequence of actions in bpf_tracing_prog_attach():
a. call bpf_trampoline_link_prog(&link->link, tr)
   this returns success if `tr->extension_prog` is NULL,
   meaning trampoline is "free";
b. update tgt_prog->aux->is_extended = true.

Sequence of actions in bpf_tracing_link_release():
c. call bpf_trampoline_unlink_prog(&tr_link->link, tr_link->trampoline)
   this sets `tr->extension_prog` to NULL;
d. update tr_link->tgt_prog->aux->is_extended = false.

In a concurrent environment, is it possible to have actions ordered as:
- thread #1: does bpf_tracing_link_release(link pointing to tgt_prog)
- thread #2: does bpf_tracing_prog_attach(some_prog, tgt_prog)
- thread #1: (c) tr->extension_prog is set to NULL
- thread #2: (a) tr->extension_prog is set to some_prog
- thread #2: (b) tgt_prog->aux->is_extended = true
- thread #1: (d) tgt_prog->aux->is_extended = false

Thus, loosing the is_extended mark?

(As far as I understand bpf_trampoline_compute_key() call in
 bpf_tracing_prog_attach() it is possible for threads #1 and #2 to
 operate on a same trampoline object).


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace
  2024-09-23 13:40 ` [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace Leon Hwang
@ 2024-09-25  5:32   ` Eduard Zingerman
  2024-09-26  7:19     ` Leon Hwang
  0 siblings, 1 reply; 11+ messages in thread
From: Eduard Zingerman @ 2024-09-25  5:32 UTC (permalink / raw)
  To: Leon Hwang, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, iii, kernel-patches-bot

On Mon, 2024-09-23 at 21:40 +0800, Leon Hwang wrote:

[...]

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 048aa2625cbef..b864b37e67c17 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1484,6 +1484,7 @@ struct bpf_prog_aux {
>  	bool exception_cb;
>  	bool exception_boundary;
>  	bool is_extended; /* true if extended by freplace program */
> +	atomic_t tail_callee_cnt;

Nit: the name is a bit misleading, this counts how many times the
     program resides it prog maps. Confusing w/o additional comments.
     Maybe something like 'member_of_prog_array_cnt'?

>  	struct bpf_arena *arena;
>  	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
>  	const struct btf_type *attach_func_proto;
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 8d97bae98fa70..c12e0e3bf6ad0 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -961,13 +961,17 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
>  		return ERR_PTR(-EINVAL);
>  	}
>  
> +	atomic_inc(&prog->aux->tail_callee_cnt);
>  	return prog;
>  }

[...]

>  static u32 prog_fd_array_sys_lookup_elem(void *ptr)
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 18b3f9216b050..be829016d8182 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3501,6 +3501,18 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>  		tgt_prog = prog->aux->dst_prog;
>  	}
>  
> +	if (prog->type == BPF_PROG_TYPE_EXT &&
> +	    atomic_read(&tgt_prog->aux->tail_callee_cnt)) {
> +		/* Program extensions can not extend target prog when the target
> +		 * prog has been updated to any prog_array map as tail callee.
> +		 * It's to prevent a potential infinite loop like:
> +		 * tgt prog entry -> tgt prog subprog -> freplace prog entry
> +		 * --tailcall-> tgt prog entry.
> +		 */
> +		err = -EINVAL;
> +		goto out_unlock;
> +	}
> +
>  	err = bpf_link_prime(&link->link.link, &link_primer);
>  	if (err)
>  		goto out_unlock;

Is it possible there is a race between map update and prog attach?
E.g. suppose the following sequence of events:
- thread #1 enters prog_fd_array_get_ptr()
- thread #1 successfully completes prog->aux->is_extended check (not extended)
- thread #2 enters bpf_tracing_prog_attach()
- thread #2 does atomic_read() for tgt_prog and it returns 0
- thread #2 proceeds attaching freplace to tgt_prog
- thread #1 does atomic_inc(&prog->aux->tail_callee_cnt)

Thus arriving to a state when tgt_prog is both a member of a map and
is freplaced. Is this a valid scenario?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map
  2024-09-25  1:24   ` Eduard Zingerman
@ 2024-09-26  7:16     ` Leon Hwang
  2024-09-27 12:23       ` Eduard Zingerman
  0 siblings, 1 reply; 11+ messages in thread
From: Leon Hwang @ 2024-09-26  7:16 UTC (permalink / raw)
  To: Eduard Zingerman, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, iii, kernel-patches-bot

Hi Eduard,

Thank you for your review.

On 25/9/24 09:24, Eduard Zingerman wrote:
> On Mon, 2024-09-23 at 21:40 +0800, Leon Hwang wrote:
> 
> [...]
> 
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 8a4117f6d7610..18b3f9216b050 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -3292,8 +3292,11 @@ static void bpf_tracing_link_release(struct bpf_link *link)
>>  	bpf_trampoline_put(tr_link->trampoline);
>>  
>>  	/* tgt_prog is NULL if target is a kernel function */
>> -	if (tr_link->tgt_prog)
>> +	if (tr_link->tgt_prog) {
>> +		if (link->prog->type == BPF_PROG_TYPE_EXT)
>> +			tr_link->tgt_prog->aux->is_extended = false;
>>  		bpf_prog_put(tr_link->tgt_prog);
>> +	}
>>  }
>>  
>>  static void bpf_tracing_link_dealloc(struct bpf_link *link)
>> @@ -3523,6 +3526,8 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>>  	if (prog->aux->dst_trampoline && tr != prog->aux->dst_trampoline)
>>  		/* we allocated a new trampoline, so free the old one */
>>  		bpf_trampoline_put(prog->aux->dst_trampoline);
>> +	if (prog->type == BPF_PROG_TYPE_EXT)
>> +		tgt_prog->aux->is_extended = true;
>>  
>>  	prog->aux->dst_prog = NULL;
>>  	prog->aux->dst_trampoline = NULL;
> 
> Sorry, this might be a silly question, I do not fully understand how
> programs and trampolines are protected against concurrent update.
> 

There's no protection against concurrent update.

> Sequence of actions in bpf_tracing_prog_attach():
> a. call bpf_trampoline_link_prog(&link->link, tr)
>    this returns success if `tr->extension_prog` is NULL,
>    meaning trampoline is "free";
> b. update tgt_prog->aux->is_extended = true.
> 
> Sequence of actions in bpf_tracing_link_release():
> c. call bpf_trampoline_unlink_prog(&tr_link->link, tr_link->trampoline)
>    this sets `tr->extension_prog` to NULL;
> d. update tr_link->tgt_prog->aux->is_extended = false.
> 
> In a concurrent environment, is it possible to have actions ordered as:
> - thread #1: does bpf_tracing_link_release(link pointing to tgt_prog)
> - thread #2: does bpf_tracing_prog_attach(some_prog, tgt_prog)
> - thread #1: (c) tr->extension_prog is set to NULL
> - thread #2: (a) tr->extension_prog is set to some_prog
> - thread #2: (b) tgt_prog->aux->is_extended = true
> - thread #1: (d) tgt_prog->aux->is_extended = false
> 
> Thus, loosing the is_extended mark?

Yes, you are correct.

> 
> (As far as I understand bpf_trampoline_compute_key() call in
>  bpf_tracing_prog_attach() it is possible for threads #1 and #2 to
>  operate on a same trampoline object).
> 

In order to avoid the above case:

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6988e432fc3d..1f19b754623c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3281,6 +3281,9 @@ static void bpf_tracing_link_release(struct
bpf_link *link)
        struct bpf_tracing_link *tr_link =
                container_of(link, struct bpf_tracing_link, link.link);

+       if (link->prog->type == BPF_PROG_TYPE_EXT)
+               tr_link->tgt_prog->aux->is_extended = false;
+
        WARN_ON_ONCE(bpf_trampoline_unlink_prog(&tr_link->link,
                                                tr_link->trampoline));

@@ -3518,6 +3521,8 @@ static int bpf_tracing_prog_attach(struct bpf_prog
*prog,
        if (prog->aux->dst_trampoline && tr != prog->aux->dst_trampoline)
                /* we allocated a new trampoline, so free the old one */
                bpf_trampoline_put(prog->aux->dst_trampoline);
+       if (prog->type == BPF_PROG_TYPE_EXT)
+               tgt_prog->aux->is_extended = true;

        prog->aux->dst_prog = NULL;
        prog->aux->dst_trampoline = NULL;

In bpf_tracing_link_release():
1. update tr_link->tgt_prog->aux->is_extended = false.
2. bpf_trampoline_unlink_prog().

In bpf_tracing_prog_attach():
1. bpf_trampoline_link_prog().
2. update tgt_prog->aux->is_extended = true.

Then, it is able to avoid losing the is_extended mark.

Thanks,
Leon


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace
  2024-09-25  5:32   ` Eduard Zingerman
@ 2024-09-26  7:19     ` Leon Hwang
  2024-09-27 10:58       ` Eduard Zingerman
  0 siblings, 1 reply; 11+ messages in thread
From: Leon Hwang @ 2024-09-26  7:19 UTC (permalink / raw)
  To: Eduard Zingerman, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, iii, kernel-patches-bot



On 25/9/24 13:32, Eduard Zingerman wrote:
> On Mon, 2024-09-23 at 21:40 +0800, Leon Hwang wrote:
> 
> [...]
> 
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 048aa2625cbef..b864b37e67c17 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -1484,6 +1484,7 @@ struct bpf_prog_aux {
>>  	bool exception_cb;
>>  	bool exception_boundary;
>>  	bool is_extended; /* true if extended by freplace program */
>> +	atomic_t tail_callee_cnt;
> 
> Nit: the name is a bit misleading, this counts how many times the
>      program resides it prog maps. Confusing w/o additional comments.
>      Maybe something like 'member_of_prog_array_cnt'?
> 

'member_of_prog_array_cnt' is not accurate enough.

'prog_array_member_cnt' is better, and should alongside comment /*
counts how many times as member of prog_array */.

>>  	struct bpf_arena *arena;
>>  	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
>>  	const struct btf_type *attach_func_proto;
>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
>> index 8d97bae98fa70..c12e0e3bf6ad0 100644
>> --- a/kernel/bpf/arraymap.c
>> +++ b/kernel/bpf/arraymap.c
>> @@ -961,13 +961,17 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
>>  		return ERR_PTR(-EINVAL);
>>  	}
>>  
>> +	atomic_inc(&prog->aux->tail_callee_cnt);
>>  	return prog;
>>  }
> 
> [...]
> 
>>  static u32 prog_fd_array_sys_lookup_elem(void *ptr)
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 18b3f9216b050..be829016d8182 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -3501,6 +3501,18 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>>  		tgt_prog = prog->aux->dst_prog;
>>  	}
>>  
>> +	if (prog->type == BPF_PROG_TYPE_EXT &&
>> +	    atomic_read(&tgt_prog->aux->tail_callee_cnt)) {
>> +		/* Program extensions can not extend target prog when the target
>> +		 * prog has been updated to any prog_array map as tail callee.
>> +		 * It's to prevent a potential infinite loop like:
>> +		 * tgt prog entry -> tgt prog subprog -> freplace prog entry
>> +		 * --tailcall-> tgt prog entry.
>> +		 */
>> +		err = -EINVAL;
>> +		goto out_unlock;
>> +	}
>> +
>>  	err = bpf_link_prime(&link->link.link, &link_primer);
>>  	if (err)
>>  		goto out_unlock;
> 
> Is it possible there is a race between map update and prog attach?

Yes, it is possible.

> E.g. suppose the following sequence of events:
> - thread #1 enters prog_fd_array_get_ptr()
> - thread #1 successfully completes prog->aux->is_extended check (not extended)
> - thread #2 enters bpf_tracing_prog_attach()
> - thread #2 does atomic_read() for tgt_prog and it returns 0
> - thread #2 proceeds attaching freplace to tgt_prog
> - thread #1 does atomic_inc(&prog->aux->tail_callee_cnt)
> 
> Thus arriving to a state when tgt_prog is both a member of a map and
> is freplaced. Is this a valid scenario?
> 

This patch series aims to prevent such case that tgt_prog is a member of
prog_array and is freplaced at the same time.

Without this patch series, a prog can be extended by freplace prog and then
be updated to prog_array, or can be updated to prog_array and then be
extended by freplace prog, in order to construct such case.

This patch aims to prevent "be updated to prog_array and then be extended
by freplace prog".
The previous patch aims to prevent "be extended by freplace prog and then
be updated to prog_array".

So, in order to avoid the above case:

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index a43e62e2a8bb..da4e26029a33 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -948,7 +948,9 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
        if (IS_ERR(prog))
                return prog;

-       if (!bpf_prog_map_compatible(map, prog)) {
+       atomic_inc(&prog->aux->tail_callee_cnt);
+       if (!bpf_prog_map_compatible(map, prog) || prog->aux->is_extended) {
+               atomic_dec(&prog->aux->tail_callee_cnt);
                bpf_prog_put(prog);
                return ERR_PTR(-EINVAL);
        }

1. Increment tail_callee_cnt.
2. Decrement tail_callee_cnt, if prog->aux->is_extended.

Then, thread #2 does atomic_read() for tgt_prog, and it won't return 0.

Thanks,
Leon

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace
  2024-09-26  7:19     ` Leon Hwang
@ 2024-09-27 10:58       ` Eduard Zingerman
  0 siblings, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2024-09-27 10:58 UTC (permalink / raw)
  To: Leon Hwang, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, iii, kernel-patches-bot

On Thu, 2024-09-26 at 15:19 +0800, Leon Hwang wrote:

[...]

> > E.g. suppose the following sequence of events:
> > - thread #1 enters prog_fd_array_get_ptr()
> > - thread #1 successfully completes prog->aux->is_extended check (not extended)
> > - thread #2 enters bpf_tracing_prog_attach()
> > - thread #2 does atomic_read() for tgt_prog and it returns 0
> > - thread #2 proceeds attaching freplace to tgt_prog
> > - thread #1 does atomic_inc(&prog->aux->tail_callee_cnt)
> > 
> > Thus arriving to a state when tgt_prog is both a member of a map and
> > is freplaced. Is this a valid scenario?
> > 
> 
> This patch series aims to prevent such case that tgt_prog is a member of
> prog_array and is freplaced at the same time.
> 
> Without this patch series, a prog can be extended by freplace prog and then
> be updated to prog_array, or can be updated to prog_array and then be
> extended by freplace prog, in order to construct such case.
> 
> This patch aims to prevent "be updated to prog_array and then be extended
> by freplace prog".
> The previous patch aims to prevent "be extended by freplace prog and then
> be updated to prog_array".
> 
> So, in order to avoid the above case:
> 
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index a43e62e2a8bb..da4e26029a33 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -948,7 +948,9 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
>         if (IS_ERR(prog))
>                 return prog;
> 
> -       if (!bpf_prog_map_compatible(map, prog)) {
> +       atomic_inc(&prog->aux->tail_callee_cnt);
> +       if (!bpf_prog_map_compatible(map, prog) || prog->aux->is_extended) {
> +               atomic_dec(&prog->aux->tail_callee_cnt);
>                 bpf_prog_put(prog);
>                 return ERR_PTR(-EINVAL);
>         }

I'm not sure this really solves the issue.
Documentation for both 'atomic_inc' and 'atomic_read'
(used in bpf_tracing_prog_attach()) says that these are operations with
relaxed memory ordering. Meaning that e.g. 'atomic_inc' executed
inside prog_fd_array_get_ptr() is not necessarily immediately visible
for other thread executing 'atomic_read' in bpf_tracing_prog_attach().
I think that some memory barrier is needed (non-relaxed func variant).

But all this gets unnecessarily complicated, neither
prog_fd_array_get_ptr() nor bpf_tracing_prog_attach() are executed
often, I think that 'tail_callee_cnt' and 'is_extended' should be
protected by a mutex.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map
  2024-09-26  7:16     ` Leon Hwang
@ 2024-09-27 12:23       ` Eduard Zingerman
  0 siblings, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2024-09-27 12:23 UTC (permalink / raw)
  To: Leon Hwang, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, iii, kernel-patches-bot

On Thu, 2024-09-26 at 15:16 +0800, Leon Hwang wrote:

[...]

> There's no protection against concurrent update.
> 
> > Sequence of actions in bpf_tracing_prog_attach():
> > a. call bpf_trampoline_link_prog(&link->link, tr)
> >    this returns success if `tr->extension_prog` is NULL,
> >    meaning trampoline is "free";
> > b. update tgt_prog->aux->is_extended = true.
> > 
> > Sequence of actions in bpf_tracing_link_release():
> > c. call bpf_trampoline_unlink_prog(&tr_link->link, tr_link->trampoline)
> >    this sets `tr->extension_prog` to NULL;
> > d. update tr_link->tgt_prog->aux->is_extended = false.
> > 
> > In a concurrent environment, is it possible to have actions ordered as:
> > - thread #1: does bpf_tracing_link_release(link pointing to tgt_prog)
> > - thread #2: does bpf_tracing_prog_attach(some_prog, tgt_prog)
> > - thread #1: (c) tr->extension_prog is set to NULL
> > - thread #2: (a) tr->extension_prog is set to some_prog
> > - thread #2: (b) tgt_prog->aux->is_extended = true
> > - thread #1: (d) tgt_prog->aux->is_extended = false
> > 
> > Thus, loosing the is_extended mark?
> 
> Yes, you are correct.
> 
> > 
> > (As far as I understand bpf_trampoline_compute_key() call in
> >  bpf_tracing_prog_attach() it is possible for threads #1 and #2 to
> >  operate on a same trampoline object).
> > 
> 
> In order to avoid the above case:
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 6988e432fc3d..1f19b754623c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3281,6 +3281,9 @@ static void bpf_tracing_link_release(struct
> bpf_link *link)
>         struct bpf_tracing_link *tr_link =
>                 container_of(link, struct bpf_tracing_link, link.link);
> 
> +       if (link->prog->type == BPF_PROG_TYPE_EXT)
> +               tr_link->tgt_prog->aux->is_extended = false;
> +

Isn't this too early to reset 'is_extended'?
E.g. consider scenario:
- thread #1 enters bpf_tracing_link_release() and sets is_extended == false
- thread #2 enters prog_fd_array_get_ptr(), is_extended is false,
            and it proceeds putting tgt_prog to an array;
- execution of tgt_prog is triggered and freplace patch is still in effect,
  because thread #1 had not finished bpf_tracing_link_release() yet.
  Here same bug we are trying to protect against (circular tailcall)
  is still potentially visible, isn't it?

>         WARN_ON_ONCE(bpf_trampoline_unlink_prog(&tr_link->link,
>                                                 tr_link->trampoline));
> 
> @@ -3518,6 +3521,8 @@ static int bpf_tracing_prog_attach(struct bpf_prog
> *prog,
>         if (prog->aux->dst_trampoline && tr != prog->aux->dst_trampoline)
>                 /* we allocated a new trampoline, so free the old one */
>                 bpf_trampoline_put(prog->aux->dst_trampoline);
> +       if (prog->type == BPF_PROG_TYPE_EXT)
> +               tgt_prog->aux->is_extended = true;
>
>         prog->aux->dst_prog = NULL;
>         prog->aux->dst_trampoline = NULL;
> 
> In bpf_tracing_link_release():
> 1. update tr_link->tgt_prog->aux->is_extended = false.
> 2. bpf_trampoline_unlink_prog().
> 
> In bpf_tracing_prog_attach():
> 1. bpf_trampoline_link_prog().
> 2. update tgt_prog->aux->is_extended = true.
> 
> Then, it is able to avoid losing the is_extended mark.
> 
> Thanks,
> Leon
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-09-27 12:23 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-23 13:40 [PATCH bpf-next v3 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
2024-09-23 13:40 ` [PATCH bpf-next v3 1/4] bpf: Prevent updating extended prog to prog_array map Leon Hwang
2024-09-25  1:24   ` Eduard Zingerman
2024-09-26  7:16     ` Leon Hwang
2024-09-27 12:23       ` Eduard Zingerman
2024-09-23 13:40 ` [PATCH bpf-next v3 2/4] bpf: Prevent extending tail callee prog with freplace Leon Hwang
2024-09-25  5:32   ` Eduard Zingerman
2024-09-26  7:19     ` Leon Hwang
2024-09-27 10:58       ` Eduard Zingerman
2024-09-23 13:40 ` [PATCH bpf-next v3 3/4] selftests/bpf: Add a test case to confirm a tailcall infinite loop issue has been prevented Leon Hwang
2024-09-23 13:40 ` [PATCH bpf-next v3 4/4] selftests/bpf: Add cases to test tailcall in freplace Leon Hwang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.