bpf.vger.kernel.org archive mirror
* [PATCH bpf-next v2 0/4] bpf: Fix tailcall infinite loop caused by freplace
@ 2024-09-01 13:38 Leon Hwang
  2024-09-01 13:38 ` [PATCH bpf-next v2 1/4] bpf, x64: " Leon Hwang
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Leon Hwang @ 2024-09-01 13:38 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

Previously, I fixed a tailcall infinite loop issue caused by trampoline[0].

This time, I fix a tailcall infinite loop issue caused by freplace.

Since commit 1c123c567fb1 ("bpf: Resolve fext program type when checking map compatibility"),
freplace prog is able to tail call its target prog.

What happens when freplace prog attaches to its target prog's subprog and
tail calls its target prog?

The kernel will panic because the TASK stack guard page is hit.

The panic is fixed on both x64 and arm64[1]. Please check the corresponding
patch to see the details.

v1 -> v2:
  * Address comment from Eduard:
    * Explain why nop5 and xor/nop3 are swapped at prologue.
  * Address comment from Alexei:
    * Disallow attaching tail_call_reachable freplace prog to
      not-tail_call_reachable target in verifier.
  * Update "bpf, arm64: Fix tailcall infinite loop caused by freplace" with
    latest arm64 JIT code.

Links:
[0] https://lore.kernel.org/bpf/20230912150442.2009-1-hffilwlqm@gmail.com/
[1] https://github.com/kernel-patches/bpf/pull/7638

Leon Hwang (4):
  bpf, x64: Fix tailcall infinite loop caused by freplace
  bpf, arm64: Fix tailcall infinite loop caused by freplace
  selftests/bpf: Add testcases for another tailcall infinite loop fixing
  selftests/bpf: Fix verifier tailcall jit selftest

 arch/arm64/net/bpf_jit_comp.c                 |  44 +++-
 arch/x86/net/bpf_jit_comp.c                   |  26 ++-
 kernel/bpf/verifier.c                         |   6 +
 .../selftests/bpf/prog_tests/tailcalls.c      | 216 +++++++++++++++++-
 .../tailcall_bpf2bpf_hierarchy_freplace.c     |  30 +++
 .../testing/selftests/bpf/progs/tc_bpf2bpf.c  |  37 ++-
 .../bpf/progs/verifier_tailcall_jit.c         |   4 +-
 7 files changed, 344 insertions(+), 19 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c

-- 
2.44.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH bpf-next v2 1/4] bpf, x64: Fix tailcall infinite loop caused by freplace
  2024-09-01 13:38 [PATCH bpf-next v2 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
@ 2024-09-01 13:38 ` Leon Hwang
  2024-09-13 19:28   ` Alexei Starovoitov
  2024-09-01 13:38 ` [PATCH bpf-next v2 2/4] bpf, arm64: " Leon Hwang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Leon Hwang @ 2024-09-01 13:38 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

This patch fixes a tailcall infinite loop issue caused by freplace.

Since commit 1c123c567fb1 ("bpf: Resolve fext program type when checking map compatibility"),
freplace prog is allowed to tail call its target prog. Then, when a
freplace prog attaches to its target prog's subprog and tail calls its
target prog, the kernel will panic.

For example:

tc_bpf2bpf.c:

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

__noinline
int subprog_tc(struct __sk_buff *skb)
{
	return skb->len * 2;
}

SEC("tc")
int entry_tc(struct __sk_buff *skb)
{
	return subprog_tc(skb);
}

char __license[] SEC("license") = "GPL";

tailcall_bpf2bpf_hierarchy_freplace.c:

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

int count = 0;

static __noinline
int subprog_tail(struct __sk_buff *skb)
{
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;
}

SEC("freplace")
int entry_freplace(struct __sk_buff *skb)
{
	count++;
	subprog_tail(skb);
	subprog_tail(skb);
	return count;
}

char __license[] SEC("license") = "GPL";

The attach target of entry_freplace is subprog_tc, and the tail callee
in subprog_tail is entry_tc.

Then, the infinite loop is entry_tc -> subprog_tc -> entry_freplace
-> subprog_tail --tailcall-> entry_tc, because the tail_call_cnt in
entry_freplace counts from zero on every entry_freplace execution. The
kernel will panic:

[   15.310490] BUG: TASK stack guard page was hit at (____ptrval____)
(stack is (____ptrval____)..(____ptrval____))
[   15.310490] Oops: stack guard page: 0000 [#1] PREEMPT SMP NOPTI
[   15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G           OE
   6.10.0-rc6-g026dcdae8d3e-dirty #72
[   15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX,
1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53
[   15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e
fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c
[   15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202
[   15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX:
0000000000008cb5
[   15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI:
ffff9c98808b7e00
[   15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09:
0000000000000000
[   15.310490] R10: 0000000000000001 R11: 0000000000000000 R12:
ffffb500c01af000
[   15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15:
0000000000000000
[   15.310490] FS:  00007f133b665140(0000) GS:ffff9c98bbd00000(0000)
knlGS:0000000000000000
[   15.310490] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4:
00000000000006f0
[   15.310490] Call Trace:
[   15.310490]  <#DF>
[   15.310490]  ? die+0x36/0x90
[   15.310490]  ? handle_stack_overflow+0x4d/0x60
[   15.310490]  ? exc_double_fault+0x117/0x1a0
[   15.310490]  ? asm_exc_double_fault+0x23/0x30
[   15.310490]  ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53
[   15.310490]  </#DF>
[   15.310490]  <TASK>
[   15.310490]  bpf_prog_85781a698094722f_entry+0x4c/0x64
[   15.310490]  bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b
[   15.310490]  ...
[   15.310490]  bpf_prog_85781a698094722f_entry+0x4c/0x64
[   15.310490]  bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b
[   15.310490]  bpf_test_run+0x210/0x370
[   15.310490]  ? bpf_test_run+0x128/0x370
[   15.310490]  bpf_prog_test_run_skb+0x388/0x7a0
[   15.310490]  __sys_bpf+0xdbf/0x2c40
[   15.310490]  ? clockevents_program_event+0x52/0xf0
[   15.310490]  ? lock_release+0xbf/0x290
[   15.310490]  __x64_sys_bpf+0x1e/0x30
[   15.310490]  do_syscall_64+0x68/0x140
[   15.310490]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   15.310490] RIP: 0033:0x7f133b52725d
[   15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48
[   15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX:
0000000000000141
[   15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX:
00007f133b52725d
[   15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI:
000000000000000a
[   15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09:
00007ffddbc102a0
[   15.310490] R10: 0000000000000064 R11: 0000000000000206 R12:
0000000000000004
[   15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15:
00007f133b6ed000
[   15.310490]  </TASK>
[   15.310490] Modules linked in: bpf_testmod(OE)
[   15.310490] ---[ end trace 0000000000000000 ]---
[   15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53
[   15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e
fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c
[   15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202
[   15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX:
0000000000008cb5
[   15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI:
ffff9c98808b7e00
[   15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09:
0000000000000000
[   15.310490] R10: 0000000000000001 R11: 0000000000000000 R12:
ffffb500c01af000
[   15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15:
0000000000000000
[   15.310490] FS:  00007f133b665140(0000) GS:ffff9c98bbd00000(0000)
knlGS:0000000000000000
[   15.310490] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4:
00000000000006f0
[   15.310490] Kernel panic - not syncing: Fatal exception in interrupt
[   15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)

This patch fixes the issue by initializing tail_call_cnt at the prologue
of entry_tc. Then:

* when entry_tc calls subprog_tc, the tail_call_cnt_ptr is propagated
  to subprog_tc in rax;
* when subprog_tc jumps to entry_freplace, the tail_call_cnt_ptr is
  reused to count tailcalls in the freplace prog;
* when entry_freplace calls subprog_tail, the tail_call_cnt_ptr is
  propagated to subprog_tail in rax;
* when subprog_tail tail calls entry_tc, the tail_call_cnt on the stack
  of entry_tc is incremented via the tail_call_cnt_ptr.

The whole procedure is shown in the following JITed prog dumps.

bpftool p d j n entry_tc:

int entry_tc(struct __sk_buff * skb):
bpf_prog_1c515f389a9059b4_entry_tc:
; return subprog_tc(skb);
   0:	endbr64
   4:	xorq	%rax, %rax		;; rax = 0 (tail_call_cnt)
   7:	nopl	(%rax,%rax)
   c:	pushq	%rbp
   d:	movq	%rsp, %rbp
  10:	endbr64
  14:	cmpq	$33, %rax		;; if rax > 33, rax = tcc_ptr
  18:	ja	0x20			;; if rax > 33 goto 0x20 ---+
  1a:	pushq	%rax			;; [rbp - 8] = rax = 0      |
  1b:	movq	%rsp, %rax		;; rax = rbp - 8            |
  1e:	jmp	0x21			;; ---------+               |
  20:	pushq	%rax			;; <--------|---------------+
  21:	pushq	%rax			;; <--------+ [rbp - 16] = rax
  22:	movq	-16(%rbp), %rax		;; rax = tcc_ptr
  29:	callq	0x70			;; call subprog_tc()
; return subprog_tc(skb);
  2e:	leave
  2f:	retq

int subprog_tc(struct __sk_buff * skb):
bpf_prog_1e8f76e2374a0607_subprog_tc:
; return skb->len * 2;
   0:	endbr64
   4:	nopl	(%rax)			;; do not touch tail_call_cnt
   7:	jmp	0x108			;; jump to entry_freplace()
   c:	pushq	%rbp
   d:	movq	%rsp, %rbp
  10:	endbr64
  14:	pushq	%rax
  15:	pushq	%rax
  16:	movl	112(%rdi), %eax
; return skb->len * 2;
  19:	shll	%eax
; return skb->len * 2;
  1b:	leave
  1c:	retq

bpftool p d j n entry_freplace:

int entry_freplace(struct __sk_buff * skb):
bpf_prog_85781a698094722f_entry_freplace:
; int entry_freplace(struct __sk_buff *skb)
   0:	endbr64
   4:	nopl	(%rax)			;; do not touch tail_call_cnt
   7:	nopl	(%rax,%rax)
   c:	pushq	%rbp
   d:	movq	%rsp, %rbp
  10:	endbr64
  14:	cmpq	$33, %rax		;; if rax > 33, rax = tcc_ptr
  18:	ja	0x20			;; if rax > 33 goto 0x20 ---+
  1a:	pushq	%rax			;; [rbp - 8] = rax = 0      |
  1b:	movq	%rsp, %rax		;; rax = rbp - 8            |
  1e:	jmp	0x21			;; ---------+               |
  20:	pushq	%rax			;; <--------|---------------+
  21:	pushq	%rax			;; <--------+ [rbp - 16] = rax
  22:	pushq	%rbx			;; callee saved
  23:	pushq	%r13			;; callee saved
  25:	movq	%rdi, %rbx		;; rbx = skb (callee saved)
; count++;
  28:	movabsq	$-123406219759616, %r13
  32:	movl	(%r13), %edi
  36:	addl	$1, %edi
  39:	movl	%edi, (%r13)
; subprog_tail(skb);
  3d:	movq	%rbx, %rdi		;; rdi = skb
  40:	movq	-16(%rbp), %rax		;; rax = tcc_ptr
  47:	callq	0x80			;; call subprog_tail()
; subprog_tail(skb);
  4c:	movq	%rbx, %rdi		;; rdi = skb
  4f:	movq	-16(%rbp), %rax		;; rax = tcc_ptr
  56:	callq	0x80			;; call subprog_tail()
; return count;
  5b:	movl	(%r13), %eax
; return count;
  5f:	popq	%r13
  61:	popq	%rbx
  62:	leave
  63:	retq

int subprog_tail(struct __sk_buff * skb):
bpf_prog_3a140cef239a4b4f_subprog_tail:
; int subprog_tail(struct __sk_buff *skb)
   0:	endbr64
   4:	nopl	(%rax)			;; do not touch tail_call_cnt
   7:	nopl	(%rax,%rax)
   c:	pushq	%rbp
   d:	movq	%rsp, %rbp
  10:	endbr64
  14:	pushq	%rax			;; [rbp - 8]  = rax (tcc_ptr)
  15:	pushq	%rax			;; [rbp - 16] = rax (tcc_ptr)
  16:	pushq	%rbx			;; callee saved
  17:	pushq	%r13			;; callee saved
  19:	movq	%rdi, %rbx		;; rbx = skb
; asm volatile("r1 = %[ctx]\n\t"
  1c:	movabsq	$-128494642337280, %r13	;; r13 = jmp_table
  26:	movq	%rbx, %rdi		;; 1st arg, skb
  29:	movq	%r13, %rsi		;; 2nd arg, jmp_table
  2c:	xorl	%edx, %edx		;; 3rd arg, index = 0
  2e:	movq	-16(%rbp), %rax		;; rax = [rbp - 16] (tcc_ptr)
  35:	cmpq	$33, (%rax)
  39:	jae	0x4e			;; if *tcc_ptr >= 33 goto 0x4e --------+
  3b:	nopl	(%rax,%rax)		;; jmp bypass, toggled by poking       |
  40:	addq	$1, (%rax)		;; (*tcc_ptr)++                        |
  44:	popq	%r13			;; callee saved                        |
  46:	popq	%rbx			;; callee saved                        |
  47:	popq	%rax			;; undo rbp-16 push                    |
  48:	popq	%rax			;; undo rbp-8  push                    |
  49:	jmp	0xfffffffffffffe18	;; tail call target, toggled by poking |
; return 0;				;;                                     |
  4e:	popq	%r13			;; restore callee saved <--------------+
  50:	popq	%rbx			;; restore callee saved
  51:	leave
  52:	retq

As a result, the tail_call_cnt is stored on the stack of entry_tc, and
the tail_call_cnt_ptr is propagated among subprog_tc, entry_freplace,
subprog_tail and entry_tc.

But what if entry_freplace has a tailcall while entry_tc has none? The
verifier now disallows attaching such an entry_freplace to such an
entry_tc.

And what if both entry_freplace and entry_tc have tailcalls, and
entry_freplace attaches to entry_tc? In this patch, the tailcall info of
entry_freplace is inherited from its target. Therefore, the positions of
nop5 and xor/nop3 are swapped in the prologue, in order to initialize
tail_call_cnt at the prologue of entry_tc and then propagate the
tail_call_cnt_ptr to entry_freplace.

Fixes: 1c123c567fb1 ("bpf: Resolve fext program type when checking map compatibility")
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 arch/x86/net/bpf_jit_comp.c | 26 ++++++++++++++++++--------
 kernel/bpf/verifier.c       |  6 ++++++
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 074b41fafbe3f..143974b3e953b 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -274,6 +274,8 @@ struct jit_context {
 #define X86_PATCH_SIZE		5
 /* Number of bytes that will be skipped on tailcall */
 #define X86_TAIL_CALL_OFFSET	(12 + ENDBR_INSN_SIZE)
+/* Number of extra bytes that will be skipped on poke */
+#define X86_POKE_EXTRA		3
 
 static void push_r12(u8 **pprog)
 {
@@ -441,17 +443,13 @@ static void emit_prologue_tail_call(u8 **pprog, bool is_subprog)
  */
 static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
 			  bool tail_call_reachable, bool is_subprog,
-			  bool is_exception_cb)
+			  bool is_exception_cb, bool is_extension)
 {
 	u8 *prog = *pprog;
 
 	emit_cfi(&prog, is_subprog ? cfi_bpf_subprog_hash : cfi_bpf_hash);
-	/* BPF trampoline can be made to work without these nops,
-	 * but let's waste 5 bytes for now and optimize later
-	 */
-	emit_nops(&prog, X86_PATCH_SIZE);
 	if (!ebpf_from_cbpf) {
-		if (tail_call_reachable && !is_subprog)
+		if (tail_call_reachable && !is_extension && !is_subprog)
 			/* When it's the entry of the whole tailcall context,
 			 * zeroing rax means initialising tail_call_cnt.
 			 */
@@ -460,6 +458,10 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
 			/* Keep the same instruction layout. */
 			emit_nops(&prog, 3);     /* nop3 */
 	}
+	/* BPF trampoline can be made to work without these nops,
+	 * but let's waste 5 bytes for now and optimize later
+	 */
+	emit_nops(&prog, X86_PATCH_SIZE);
 	/* Exception callback receives FP as third parameter */
 	if (is_exception_cb) {
 		EMIT3(0x48, 0x89, 0xF4); /* mov rsp, rsi */
@@ -573,10 +575,13 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 
 	/*
 	 * See emit_prologue(), for IBT builds the trampoline hook is preceded
-	 * with an ENDBR instruction.
+	 * with an ENDBR instruction and 3 bytes tail_call_cnt initialization
+	 * instruction.
 	 */
 	if (is_endbr(*(u32 *)ip))
 		ip += ENDBR_INSN_SIZE;
+	if (is_bpf_text_address((long)ip))
+		ip += X86_POKE_EXTRA;
 
 	return __bpf_arch_text_poke(ip, t, old_addr, new_addr);
 }
@@ -1366,6 +1371,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 		  int oldproglen, struct jit_context *ctx, bool jmp_padding)
 {
 	bool tail_call_reachable = bpf_prog->aux->tail_call_reachable;
+	bool is_extension = bpf_prog->type == BPF_PROG_TYPE_EXT;
 	struct bpf_insn *insn = bpf_prog->insnsi;
 	bool callee_regs_used[4] = {};
 	int insn_cnt = bpf_prog->len;
@@ -1384,7 +1390,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 
 	emit_prologue(&prog, bpf_prog->aux->stack_depth,
 		      bpf_prog_was_classic(bpf_prog), tail_call_reachable,
-		      bpf_is_subprog(bpf_prog), bpf_prog->aux->exception_cb);
+		      bpf_is_subprog(bpf_prog), bpf_prog->aux->exception_cb,
+		      is_extension);
 	/* Exception callback will clobber callee regs for its own use, and
 	 * restore the original callee regs from main prog's stack frame.
 	 */
@@ -2923,6 +2930,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
 		 */
 		if (is_endbr(*(u32 *)orig_call))
 			orig_call += ENDBR_INSN_SIZE;
+		if (is_bpf_text_address((long)orig_call))
+			orig_call += X86_POKE_EXTRA;
 		orig_call += X86_PATCH_SIZE;
 	}
 
@@ -3025,6 +3034,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
 		/* remember return value in a stack for bpf prog to access */
 		emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
 		im->ip_after_call = image + (prog - (u8 *)rw_image);
+		emit_nops(&prog, X86_POKE_EXTRA);
 		emit_nops(&prog, X86_PATCH_SIZE);
 	}
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 217eb0eafa2a6..0a92fd4e0401e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -21898,6 +21898,12 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 			bpf_log(log, "Cannot extend fentry/fexit\n");
 			return -EINVAL;
 		}
+		if (prog_extension && prog->aux->tail_call_reachable &&
+		    !(aux->func ? aux->func[subprog]->aux->tail_call_reachable :
+		      aux->tail_call_reachable)) {
+			bpf_log(log, "Cannot extend no-tailcall target with tailcall ext prog\n");
+			return -EINVAL;
+		}
 	} else {
 		if (prog_extension) {
 			bpf_log(log, "Cannot replace kernel functions\n");
-- 
2.44.0



* [PATCH bpf-next v2 2/4] bpf, arm64: Fix tailcall infinite loop caused by freplace
  2024-09-01 13:38 [PATCH bpf-next v2 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
  2024-09-01 13:38 ` [PATCH bpf-next v2 1/4] bpf, x64: " Leon Hwang
@ 2024-09-01 13:38 ` Leon Hwang
  2024-09-08 13:01   ` Leon Hwang
  2024-09-01 13:38 ` [PATCH bpf-next v2 3/4] selftests/bpf: Add testcases for another tailcall infinite loop fixing Leon Hwang
  2024-09-01 13:38 ` [PATCH bpf-next v2 4/4] selftests/bpf: Fix verifier tailcall jit selftest Leon Hwang
  3 siblings, 1 reply; 14+ messages in thread
From: Leon Hwang @ 2024-09-01 13:38 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

Like "bpf, x64: Fix tailcall infinite loop caused by freplace", the same
issue happens on arm64, too.

For example:

tc_bpf2bpf.c:

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

__noinline
int subprog_tc(struct __sk_buff *skb)
{
	return skb->len * 2;
}

SEC("tc")
int entry_tc(struct __sk_buff *skb)
{
	return subprog_tc(skb);
}

char __license[] SEC("license") = "GPL";

tailcall_bpf2bpf_hierarchy_freplace.c:

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

int count = 0;

static __noinline
int subprog_tail(struct __sk_buff *skb)
{
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;
}

SEC("freplace")
int entry_freplace(struct __sk_buff *skb)
{
	count++;
	subprog_tail(skb);
	subprog_tail(skb);
	return count;
}

char __license[] SEC("license") = "GPL";

The attach target of entry_freplace is subprog_tc, and the tail callee
in subprog_tail is entry_tc.

Then, the infinite loop is entry_tc -> subprog_tc -> entry_freplace ->
subprog_tail --tailcall-> entry_tc, because the tail_call_cnt in
entry_freplace counts from zero on every entry_freplace execution.

This patch fixes the issue by not touching tail_call_cnt in the prologue
when the prog is a subprog or an freplace prog.

Then, when an freplace prog attaches to entry_tc, it has to initialize
tail_call_cnt and tail_call_cnt_ptr itself, because its target is a main
prog and the target's prologue has not initialized them before the
attach hook.

So, this patch uses the x7 register to tell the freplace prog whether
its attach target is a main prog.

Meanwhile, when tail calling to an freplace prog, the x7 register must
be reset to prevent re-initializing tail_call_cnt at the freplace
prog's prologue.

Fixes: 1c123c567fb1 ("bpf: Resolve fext program type when checking map compatibility")
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 arch/arm64/net/bpf_jit_comp.c | 44 +++++++++++++++++++++++++++++++----
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 8aa32cb140b9e..cdc12dd83c4be 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -277,6 +277,7 @@ static bool is_lsi_offset(int offset, int scale)
 /* generated main prog prologue:
  *      bti c // if CONFIG_ARM64_BTI_KERNEL
  *      mov x9, lr
+ *      mov x7, 1 // if not-freplace main prog
  *      nop  // POKE_OFFSET
  *      paciasp // if CONFIG_ARM64_PTR_AUTH_KERNEL
  *      stp x29, lr, [sp, #-16]!
@@ -288,15 +289,19 @@ static bool is_lsi_offset(int offset, int scale)
  */
 static void prepare_bpf_tail_call_cnt(struct jit_ctx *ctx)
 {
+	const bool is_ext = ctx->prog->type == BPF_PROG_TYPE_EXT;
 	const bool is_main_prog = !bpf_is_subprog(ctx->prog);
 	const u8 ptr = bpf2a64[TCCNT_PTR];
 
-	if (is_main_prog) {
+	if (is_main_prog && !is_ext) {
 		/* Initialize tail_call_cnt. */
 		emit(A64_PUSH(A64_ZR, ptr, A64_SP), ctx);
 		emit(A64_MOV(1, ptr, A64_SP), ctx);
-	} else
+	} else {
+		/* Keep the same insn layout for freplace main prog. */
 		emit(A64_PUSH(ptr, ptr, A64_SP), ctx);
+		emit(A64_NOP, ctx);
+	}
 }
 
 static void find_used_callee_regs(struct jit_ctx *ctx)
@@ -416,16 +421,20 @@ static void pop_callee_regs(struct jit_ctx *ctx)
 #define PAC_INSNS (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) ? 1 : 0)
 
 /* Offset of nop instruction in bpf prog entry to be poked */
-#define POKE_OFFSET (BTI_INSNS + 1)
+#define POKE_OFFSET (BTI_INSNS + 2)
 
 /* Tail call offset to jump into */
-#define PROLOGUE_OFFSET (BTI_INSNS + 2 + PAC_INSNS + 4)
+#define PROLOGUE_OFFSET (BTI_INSNS + 3 + PAC_INSNS + 4)
 
 static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
 {
 	const struct bpf_prog *prog = ctx->prog;
+	const bool is_ext = prog->type == BPF_PROG_TYPE_EXT;
 	const bool is_main_prog = !bpf_is_subprog(prog);
+	const u8 r0 = bpf2a64[BPF_REG_0];
 	const u8 fp = bpf2a64[BPF_REG_FP];
+	const u8 ptr = bpf2a64[TCCNT_PTR];
+	const u8 tmp = bpf2a64[TMP_REG_1];
 	const u8 arena_vm_base = bpf2a64[ARENA_VM_START];
 	const int idx0 = ctx->idx;
 	int cur_offset;
@@ -462,6 +471,10 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
 	emit_bti(A64_BTI_JC, ctx);
 
 	emit(A64_MOV(1, A64_R(9), A64_LR), ctx);
+	if (!is_ext)
+		emit(A64_MOVZ(1, r0, is_main_prog, 0), ctx);
+	else
+		emit(A64_NOP, ctx);
 	emit(A64_NOP, ctx);
 
 	if (!prog->aux->exception_cb) {
@@ -485,6 +498,18 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
 			/* BTI landing pad for the tail call, done with a BR */
 			emit_bti(A64_BTI_J, ctx);
 		}
+		/* If freplace's target prog is main prog, it has to make x26 as
+		 * tail_call_cnt_ptr, and then initialize tail_call_cnt via the
+		 * tail_call_cnt_ptr.
+		 */
+		if (is_main_prog && is_ext) {
+			emit(A64_MOVZ(1, tmp, 1, 0), ctx);
+			emit(A64_CMP(1, r0, tmp), ctx);
+			emit(A64_B_(A64_COND_NE, 4), ctx);
+			emit(A64_MOV(1, ptr, A64_SP), ctx);
+			emit(A64_MOVZ(1, r0, 0, 0), ctx);
+			emit(A64_STR64I(r0, ptr, 0), ctx);
+		}
 		push_callee_regs(ctx);
 	} else {
 		/*
@@ -521,6 +546,7 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
 
 static int emit_bpf_tail_call(struct jit_ctx *ctx)
 {
+	const u8 r0 = bpf2a64[BPF_REG_0];
 	/* bpf_tail_call(void *prog_ctx, struct bpf_array *array, u64 index) */
 	const u8 r2 = bpf2a64[BPF_REG_2];
 	const u8 r3 = bpf2a64[BPF_REG_3];
@@ -579,6 +605,12 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
 
 	pop_callee_regs(ctx);
 
+	/* When freplace prog tail calls freplace prog, setting r0 as 0 is to
+	 * prevent re-initializing tail_call_cnt at the prologue of target
+	 * freplace prog.
+	 */
+	emit(A64_MOVZ(1, r0, 0, 0), ctx);
+
 	/* goto *(prog->bpf_func + prologue_offset); */
 	off = offsetof(struct bpf_prog, bpf_func);
 	emit_a64_mov_i64(tmp, off, ctx);
@@ -2189,9 +2221,10 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
 		emit(A64_RET(A64_R(10)), ctx);
 		/* store return value */
 		emit(A64_STR64I(A64_R(0), A64_SP, retval_off), ctx);
-		/* reserve a nop for bpf_tramp_image_put */
+		/* reserve two nops for bpf_tramp_image_put */
 		im->ip_after_call = ctx->ro_image + ctx->idx;
 		emit(A64_NOP, ctx);
+		emit(A64_NOP, ctx);
 	}
 
 	/* update the branches saved in invoke_bpf_mod_ret with cbnz */
@@ -2474,6 +2507,7 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
 		/* skip to the nop instruction in bpf prog entry:
 		 * bti c // if BTI enabled
 		 * mov x9, x30
+		 * mov x7, 1 // if not-freplace main prog
 		 * nop
 		 */
 		ip = image + POKE_OFFSET * AARCH64_INSN_SIZE;
-- 
2.44.0



* [PATCH bpf-next v2 3/4] selftests/bpf: Add testcases for another tailcall infinite loop fixing
  2024-09-01 13:38 [PATCH bpf-next v2 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
  2024-09-01 13:38 ` [PATCH bpf-next v2 1/4] bpf, x64: " Leon Hwang
  2024-09-01 13:38 ` [PATCH bpf-next v2 2/4] bpf, arm64: " Leon Hwang
@ 2024-09-01 13:38 ` Leon Hwang
  2024-09-01 13:38 ` [PATCH bpf-next v2 4/4] selftests/bpf: Fix verifier tailcall jit selftest Leon Hwang
  3 siblings, 0 replies; 14+ messages in thread
From: Leon Hwang @ 2024-09-01 13:38 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

The issue has been fixed on both x64 and arm64.

On x64:

cd tools/testing/selftests/bpf; ./test_progs -t tailcalls
334/26  tailcalls/tailcall_bpf2bpf_freplace:OK
334/27  tailcalls/tailcall_bpf2bpf_hierarchy_freplace_1:OK
334/28  tailcalls/tailcall_bpf2bpf_hierarchy_freplace_2:OK
334/29  tailcalls/tailcall_bpf2bpf_hierarchy_freplace_3:OK
334/30  tailcalls/tailcall_bpf2bpf_hierarchy_freplace_4:OK
334/31  tailcalls/tailcall_bpf2bpf_hierarchy_freplace_5:OK
334     tailcalls:OK
Summary: 1/31 PASSED, 0 SKIPPED, 0 FAILED

On arm64:

cd tools/testing/selftests/bpf; ./test_progs -t tailcalls
334/26  tailcalls/tailcall_bpf2bpf_freplace:OK
334/27  tailcalls/tailcall_bpf2bpf_hierarchy_freplace_1:OK
334/28  tailcalls/tailcall_bpf2bpf_hierarchy_freplace_2:OK
334/29  tailcalls/tailcall_bpf2bpf_hierarchy_freplace_3:OK
334/30  tailcalls/tailcall_bpf2bpf_hierarchy_freplace_4:OK
334/31  tailcalls/tailcall_bpf2bpf_hierarchy_freplace_5:OK
334     tailcalls:OK
Summary: 1/31 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 .../selftests/bpf/prog_tests/tailcalls.c      | 216 +++++++++++++++++-
 .../tailcall_bpf2bpf_hierarchy_freplace.c     |  30 +++
 .../testing/selftests/bpf/progs/tc_bpf2bpf.c  |  37 ++-
 3 files changed, 279 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c

diff --git a/tools/testing/selftests/bpf/prog_tests/tailcalls.c b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
index 21c5a37846ade..0faa0f57f71d4 100644
--- a/tools/testing/selftests/bpf/prog_tests/tailcalls.c
+++ b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
@@ -5,6 +5,7 @@
 #include "tailcall_poke.skel.h"
 #include "tailcall_bpf2bpf_hierarchy2.skel.h"
 #include "tailcall_bpf2bpf_hierarchy3.skel.h"
+#include "tailcall_bpf2bpf_hierarchy_freplace.skel.h"
 #include "tailcall_freplace.skel.h"
 #include "tc_bpf2bpf.skel.h"
 
@@ -1525,7 +1526,8 @@ static void test_tailcall_freplace(void)
 
 	prog_fd = bpf_program__fd(tc_skel->progs.entry_tc);
 	freplace_prog = freplace_skel->progs.entry_freplace;
-	err = bpf_program__set_attach_target(freplace_prog, prog_fd, "subprog");
+	err = bpf_program__set_attach_target(freplace_prog, prog_fd,
+					     "subprog_tailcall_tc");
 	if (!ASSERT_OK(err, "set_attach_target"))
 		goto out;
 
@@ -1534,7 +1536,7 @@ static void test_tailcall_freplace(void)
 		goto out;
 
 	freplace_link = bpf_program__attach_freplace(freplace_prog, prog_fd,
-						     "subprog");
+						     "subprog_tailcall_tc");
 	if (!ASSERT_OK_PTR(freplace_link, "attach_freplace"))
 		goto out;
 
@@ -1556,6 +1558,204 @@ static void test_tailcall_freplace(void)
 	tailcall_freplace__destroy(freplace_skel);
 }
 
+/* test_tailcall_bpf2bpf_freplace checks that the count value of the tail
+ * call limit enforcement matches with expectation for this case:
+ *
+ * entry_tc --> subprog_tailcall_tc --jump-> entry_freplace --tailcall-> entry_tc
+ *
+ * Meanwhile, check the failure that fails to attach tailcall-reachable freplace
+ * prog to not-tailcall-reachable subprog.
+ */
+static void test_tailcall_bpf2bpf_freplace(void)
+{
+	struct tailcall_freplace *freplace_skel = NULL;
+	struct bpf_link *freplace_link = NULL;
+	struct tc_bpf2bpf *tc_skel = NULL;
+	char buff[128] = {};
+	int prog_fd, map_fd;
+	int err, i;
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .data_in = buff,
+		    .data_size_in = sizeof(buff),
+		    .repeat = 1,
+	);
+
+	tc_skel = tc_bpf2bpf__open_and_load();
+	if (!ASSERT_OK_PTR(tc_skel, "tc_bpf2bpf__open_and_load"))
+		goto out;
+
+	prog_fd = bpf_program__fd(tc_skel->progs.entry_tc);
+	freplace_skel = tailcall_freplace__open();
+	if (!ASSERT_OK_PTR(freplace_skel, "tailcall_freplace__open"))
+		goto out;
+
+	err = bpf_program__set_attach_target(freplace_skel->progs.entry_freplace,
+					     prog_fd, "subprog_tailcall_tc");
+	if (!ASSERT_OK(err, "set_attach_target"))
+		goto out;
+
+	err = tailcall_freplace__load(freplace_skel);
+	if (!ASSERT_OK(err, "tailcall_freplace__load"))
+		goto out;
+
+	i = 0;
+	map_fd = bpf_map__fd(freplace_skel->maps.jmp_table);
+	err = bpf_map_update_elem(map_fd, &i, &prog_fd, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table"))
+		goto out;
+
+	freplace_link = bpf_program__attach_freplace(freplace_skel->progs.entry_freplace,
+						     prog_fd, "subprog_tc");
+	if (!ASSERT_ERR_PTR(freplace_link, "attach_freplace fail"))
+		goto out;
+
+	freplace_link = bpf_program__attach_freplace(freplace_skel->progs.entry_freplace,
+						     prog_fd, "subprog_tailcall_tc");
+	if (!ASSERT_OK_PTR(freplace_link, "attach_freplace"))
+		goto out;
+
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run_opts");
+	ASSERT_EQ(topts.retval, 34, "test_run_opts retval");
+
+out:
+	bpf_link__destroy(freplace_link);
+	tailcall_freplace__destroy(freplace_skel);
+	tc_bpf2bpf__destroy(tc_skel);
+}
+
+static void test_tailcall_bpf2bpf_hierarchy_freplace(bool freplace_subprog,
+						     bool tailcall_tc,
+						     bool target_entry2,
+						     int count1, int count2)
+{
+	struct tailcall_bpf2bpf_hierarchy_freplace *freplace_skel = NULL;
+	struct bpf_link *freplace_link = NULL;
+	int freplace_prog_fd, prog_fd, map_fd;
+	struct tc_bpf2bpf *tc_skel = NULL;
+	char buff[128] = {};
+	int err, i, val;
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .data_in = buff,
+		    .data_size_in = sizeof(buff),
+		    .repeat = 1,
+	);
+
+	tc_skel = tc_bpf2bpf__open_and_load();
+	if (!ASSERT_OK_PTR(tc_skel, "tc_bpf2bpf__open_and_load"))
+		goto out;
+
+	prog_fd = bpf_program__fd(target_entry2 ? tc_skel->progs.entry_tc_2 :
+				  tc_skel->progs.entry_tc);
+	freplace_skel = tailcall_bpf2bpf_hierarchy_freplace__open();
+	if (!ASSERT_OK_PTR(freplace_skel, "tailcall_bpf2bpf_hierarchy_freplace__open"))
+		goto out;
+
+	err = bpf_program__set_attach_target(freplace_skel->progs.entry_freplace,
+					     prog_fd, freplace_subprog ?
+					     "subprog_tailcall_tc" : "entry_tc");
+	if (!ASSERT_OK(err, "set_attach_target"))
+		goto out;
+
+	err = tailcall_bpf2bpf_hierarchy_freplace__load(freplace_skel);
+	if (!ASSERT_OK(err, "tailcall_bpf2bpf_hierarchy_freplace__load"))
+		goto out;
+
+	freplace_prog_fd = bpf_program__fd(freplace_skel->progs.entry_freplace);
+	map_fd = bpf_map__fd(freplace_skel->maps.jmp_table);
+	val = tailcall_tc ? prog_fd : freplace_prog_fd;
+	i = 0;
+	err = bpf_map_update_elem(map_fd, &i, &val, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table"))
+		goto out;
+
+	freplace_link = bpf_program__attach_freplace(freplace_skel->progs.entry_freplace,
+						     prog_fd, freplace_subprog ?
+						     "subprog_tailcall_tc" : "entry_tc");
+	if (!ASSERT_OK_PTR(freplace_link, "attach_freplace"))
+		goto out;
+
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run_opts");
+	ASSERT_EQ(topts.retval, count1, "test_run_opts retval");
+
+	i = 0;
+	err = bpf_map_delete_elem(map_fd, &i);
+	if (!ASSERT_OK(err, "delete_elem from jmp_table"))
+		goto out;
+
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run_opts again");
+	ASSERT_EQ(topts.retval, count2, "test_run_opts retval again");
+
+out:
+	bpf_link__destroy(freplace_link);
+	tailcall_bpf2bpf_hierarchy_freplace__destroy(freplace_skel);
+	tc_bpf2bpf__destroy(tc_skel);
+}
+
+/* test_tailcall_bpf2bpf_hierarchy_freplace_1 checks that the tail call
+ * limit enforcement count matches the expected value for this case:
+ *
+ *                                    subprog_tail --tailcall-> entry_freplace
+ * entry_tc --jump-> entry_freplace <
+ *                                    subprog_tail --tailcall-> entry_freplace
+ */
+static void test_tailcall_bpf2bpf_hierarchy_freplace_1(void)
+{
+	test_tailcall_bpf2bpf_hierarchy_freplace(false, false, false, 34, 35);
+}
+
+/* test_tailcall_bpf2bpf_hierarchy_freplace_2 checks that the tail call
+ * limit enforcement count matches the expected value for this case:
+ *
+ *                                                            subprog_tail --tailcall-> entry_freplace
+ * entry_tc --> subprog_tailcall_tc --jump-> entry_freplace <
+ *                                                            subprog_tail --tailcall-> entry_freplace
+ */
+static void test_tailcall_bpf2bpf_hierarchy_freplace_2(void)
+{
+	test_tailcall_bpf2bpf_hierarchy_freplace(true, false, false, 34, 35);
+}
+
+/* test_tailcall_bpf2bpf_hierarchy_freplace_3 checks that the tail call
+ * limit enforcement count matches the expected value for this case:
+ *
+ *                                                            subprog_tail --tailcall-> entry_tc
+ * entry_tc --> subprog_tailcall_tc --jump-> entry_freplace <
+ *                                                            subprog_tail --tailcall-> entry_tc
+ */
+static void test_tailcall_bpf2bpf_hierarchy_freplace_3(void)
+{
+	test_tailcall_bpf2bpf_hierarchy_freplace(true, true, false, 34, 35);
+}
+
+/* test_tailcall_bpf2bpf_hierarchy_freplace_4 checks that the tail call
+ * limit enforcement count matches the expected value for this case:
+ *
+ *                                                                              subprog_tail --tailcall-> entry_freplace
+ * entry_tc_2 --> subprog_tailcall_tc (call 10 times) --jump-> entry_freplace <
+ *                                                                              subprog_tail --tailcall-> entry_freplace
+ */
+static void test_tailcall_bpf2bpf_hierarchy_freplace_4(void)
+{
+	test_tailcall_bpf2bpf_hierarchy_freplace(true, false, true, 43, 53);
+}
+
+/* test_tailcall_bpf2bpf_hierarchy_freplace_5 checks that the tail call
+ * limit enforcement count matches the expected value for this case:
+ *
+ *                                                                              subprog_tail --tailcall-> entry_tc_2
+ * entry_tc_2 --> subprog_tailcall_tc (call 10 times) --jump-> entry_freplace <
+ *                                                                              subprog_tail --tailcall-> entry_tc_2
+ */
+static void test_tailcall_bpf2bpf_hierarchy_freplace_5(void)
+{
+	test_tailcall_bpf2bpf_hierarchy_freplace(true, true, true, 340, 350);
+}
+
 void test_tailcalls(void)
 {
 	if (test__start_subtest("tailcall_1"))
@@ -1606,4 +1806,16 @@ void test_tailcalls(void)
 	test_tailcall_bpf2bpf_hierarchy_3();
 	if (test__start_subtest("tailcall_freplace"))
 		test_tailcall_freplace();
+	if (test__start_subtest("tailcall_bpf2bpf_freplace"))
+		test_tailcall_bpf2bpf_freplace();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_freplace_1"))
+		test_tailcall_bpf2bpf_hierarchy_freplace_1();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_freplace_2"))
+		test_tailcall_bpf2bpf_hierarchy_freplace_2();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_freplace_3"))
+		test_tailcall_bpf2bpf_hierarchy_freplace_3();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_freplace_4"))
+		test_tailcall_bpf2bpf_hierarchy_freplace_4();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_freplace_5"))
+		test_tailcall_bpf2bpf_hierarchy_freplace_5();
 }
diff --git a/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c
new file mode 100644
index 0000000000000..6f7c1fac9ddb7
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_freplace.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+	__uint(max_entries, 1);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+} jmp_table SEC(".maps");
+
+int count = 0;
+
+static __noinline
+int subprog_tail(struct __sk_buff *skb)
+{
+	bpf_tail_call_static(skb, &jmp_table, 0);
+	return 0;
+}
+
+SEC("freplace")
+int entry_freplace(struct __sk_buff *skb)
+{
+	count++;
+	subprog_tail(skb);
+	subprog_tail(skb);
+	return count;
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c b/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
index 8a0632c37839a..beacf60a52677 100644
--- a/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/progs/tc_bpf2bpf.c
@@ -4,11 +4,30 @@
 #include <bpf/bpf_helpers.h>
 #include "bpf_misc.h"
 
+struct {
+	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+	__uint(max_entries, 1);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+} jmp_table SEC(".maps");
+
+__noinline
+int subprog_tailcall_tc(struct __sk_buff *skb)
+{
+	int ret = 1;
+
+	bpf_tail_call_static(skb, &jmp_table, 0);
+	__sink(skb);
+	__sink(ret);
+	return ret;
+}
+
 __noinline
-int subprog(struct __sk_buff *skb)
+int subprog_tc(struct __sk_buff *skb)
 {
 	int ret = 1;
 
+	__sink(skb);
 	__sink(ret);
 	return ret;
 }
@@ -16,7 +35,21 @@ int subprog(struct __sk_buff *skb)
 SEC("tc")
 int entry_tc(struct __sk_buff *skb)
 {
-	return subprog(skb);
+	subprog_tc(skb);
+	return subprog_tailcall_tc(skb);
+}
+
+SEC("tc")
+int entry_tc_2(struct __sk_buff *skb)
+{
+	int ret, i;
+
+	for (i = 0; i < 10; i++) {
+		ret = subprog_tailcall_tc(skb);
+		__sink(ret);
+	}
+
+	return ret;
 }
 
 char __license[] SEC("license") = "GPL";
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH bpf-next v2 4/4] selftests/bpf: Fix verifier tailcall jit selftest
  2024-09-01 13:38 [PATCH bpf-next v2 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
                   ` (2 preceding siblings ...)
  2024-09-01 13:38 ` [PATCH bpf-next v2 3/4] selftests/bpf: Add testcases for another tailcall infinite loop fixing Leon Hwang
@ 2024-09-01 13:38 ` Leon Hwang
  3 siblings, 0 replies; 14+ messages in thread
From: Leon Hwang @ 2024-09-01 13:38 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, leon.hwang, kernel-patches-bot

The verifier tailcall jit selftest is broken because the x86 bpf prog
prologue has been updated to fix the tailcall infinite loop issue caused
by freplace.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c b/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c
index 8d60c634a114f..60a6045e2d9ae 100644
--- a/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c
+++ b/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c
@@ -33,8 +33,8 @@ __success
 __arch_x86_64
 /* program entry for main(), regular function prologue */
 __jited("	endbr64")
-__jited("	nopl	(%rax,%rax)")
 __jited("	xorq	%rax, %rax")
+__jited("	nopl	(%rax,%rax)")
 __jited("	pushq	%rbp")
 __jited("	movq	%rsp, %rbp")
 /* tail call prologue for program:
@@ -63,8 +63,8 @@ __jited("	{{(retq|jmp	0x)}}")		/* return or jump to rethunk */
 __jited("...")
 /* subprogram entry for sub(), regular function prologue */
 __jited("	endbr64")
-__jited("	nopl	(%rax,%rax)")
 __jited("	nopl	(%rax)")
+__jited("	nopl	(%rax,%rax)")
 __jited("	pushq	%rbp")
 __jited("	movq	%rsp, %rbp")
 /* tail call prologue for subprogram address of tail call counter
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v2 2/4] bpf, arm64: Fix tailcall infinite loop caused by freplace
  2024-09-01 13:38 ` [PATCH bpf-next v2 2/4] bpf, arm64: " Leon Hwang
@ 2024-09-08 13:01   ` Leon Hwang
  2024-09-09  9:02     ` Xu Kuohai
  0 siblings, 1 reply; 14+ messages in thread
From: Leon Hwang @ 2024-09-08 13:01 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, kernel-patches-bot



On 1/9/24 21:38, Leon Hwang wrote:
> Like "bpf, x64: Fix tailcall infinite loop caused by freplace", the same
> issue happens on arm64, too.
> 
> For example:
> 
> tc_bpf2bpf.c:
> 
> // SPDX-License-Identifier: GPL-2.0
> \#include <linux/bpf.h>
> \#include <bpf/bpf_helpers.h>
> 
> __noinline
> int subprog_tc(struct __sk_buff *skb)
> {
> 	return skb->len * 2;
> }
> 
> SEC("tc")
> int entry_tc(struct __sk_buff *skb)
> {
> 	return subprog_tc(skb);
> }
> 
> char __license[] SEC("license") = "GPL";
> 
> tailcall_bpf2bpf_hierarchy_freplace.c:
> 
> // SPDX-License-Identifier: GPL-2.0
> \#include <linux/bpf.h>
> \#include <bpf/bpf_helpers.h>
> 
> struct {
> 	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
> 	__uint(max_entries, 1);
> 	__uint(key_size, sizeof(__u32));
> 	__uint(value_size, sizeof(__u32));
> } jmp_table SEC(".maps");
> 
> int count = 0;
> 
> static __noinline
> int subprog_tail(struct __sk_buff *skb)
> {
> 	bpf_tail_call_static(skb, &jmp_table, 0);
> 	return 0;
> }
> 
> SEC("freplace")
> int entry_freplace(struct __sk_buff *skb)
> {
> 	count++;
> 	subprog_tail(skb);
> 	subprog_tail(skb);
> 	return count;
> }
> 
> char __license[] SEC("license") = "GPL";
> 
> The attach target of entry_freplace is subprog_tc, and the tail callee
> in subprog_tail is entry_tc.
> 
> Then, the infinite loop will be entry_tc -> subprog_tc --jump-> entry_freplace ->
> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in
> entry_freplace will count from zero on every execution of entry_freplace.
> 
> This patch fixes the issue by not touching tail_call_cnt in the
> prologue when the prog is a subprog or freplace prog.
> 
> Then, when a freplace prog attaches to entry_tc, it has to initialize
> tail_call_cnt and tail_call_cnt_ptr, because its target is a main prog
> and its target's prologue hasn't initialized them before the attach hook.
> 
> So, this patch uses the x7 register to tell the freplace prog whether
> its target prog is a main prog or not.
> 
> Meanwhile, when tail calling to a freplace prog, the x7 register must
> be reset to prevent re-initializing tail_call_cnt in the freplace
> prog's prologue.
> 
> Fixes: 1c123c567fb1 ("bpf: Resolve fext program type when checking map compatibility")
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  arch/arm64/net/bpf_jit_comp.c | 44 +++++++++++++++++++++++++++++++----
>  1 file changed, 39 insertions(+), 5 deletions(-)
> 
Hi Puranjay and Kuohai,

As it's not recommended to introduce arch_bpf_run(), this is my approach
to fix the niche case on arm64.

Do you have any better idea to fix it?

Thanks,
Leon


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v2 2/4] bpf, arm64: Fix tailcall infinite loop caused by freplace
  2024-09-08 13:01   ` Leon Hwang
@ 2024-09-09  9:02     ` Xu Kuohai
  2024-09-09 10:38       ` Leon Hwang
  0 siblings, 1 reply; 14+ messages in thread
From: Xu Kuohai @ 2024-09-09  9:02 UTC (permalink / raw)
  To: Leon Hwang, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	xukuohai, eddyz87, iii, kernel-patches-bot

On 9/8/2024 9:01 PM, Leon Hwang wrote:
> 
> 
> On 1/9/24 21:38, Leon Hwang wrote:
>> Like "bpf, x64: Fix tailcall infinite loop caused by freplace", the same
>> issue happens on arm64, too.
>>
>> [...]
>>
> Hi Puranjay and Kuohai,
> 
> As it's not recommended to introduce arch_bpf_run(), this is my approach
> to fix the niche case on arm64.
> 
> Do you have any better idea to fix it?
>

IIUC, the recommended approach is to teach the verifier to reject the
freplace + tailcall combination. If this combination is allowed, we
will face more than just this issue. For example, what happens if
a freplace prog is attached to a tail callee? The freplace prog is not
reachable through the tail call, right?

> Thanks,
> Leon
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v2 2/4] bpf, arm64: Fix tailcall infinite loop caused by freplace
  2024-09-09  9:02     ` Xu Kuohai
@ 2024-09-09 10:38       ` Leon Hwang
  2024-09-09 12:08         ` Xu Kuohai
  0 siblings, 1 reply; 14+ messages in thread
From: Leon Hwang @ 2024-09-09 10:38 UTC (permalink / raw)
  To: Xu Kuohai, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	eddyz87, iii, kernel-patches-bot



On 9/9/24 17:02, Xu Kuohai wrote:
> On 9/8/2024 9:01 PM, Leon Hwang wrote:
>>
>>
>> On 1/9/24 21:38, Leon Hwang wrote:
>>> Like "bpf, x64: Fix tailcall infinite loop caused by freplace", the same
>>> issue happens on arm64, too.
>>>
>>> [...]
>>>
>> Hi Puranjay and Kuohai,
>>
>> As it's not recommended to introduce arch_bpf_run(), this is my approach
>> to fix the niche case on arm64.
>>
>> Do you have any better idea to fix it?
>>
> 
> IIUC, the recommended appraoch is to teach verifier to reject the
> freplace + tailcall combination. If this combiation is allowed, we
> will face more than just this issue. For example, what happens if
> a freplace prog is attached to tail callee? The freplace prog is not
> reachable through the tail call, right?
> 

The fix rejects the freplace + tailcall combination only partially; see "bpf,
x64: Fix tailcall infinite loop caused by freplace". (Oh, I should
separate the rejection into a standalone patch.)
It rejects the case where the freplace prog has a tailcall and its attach
target has no tailcall.

As for your example, it depends on:

                freplace       target    reject?
Has tailcall?     YES            NO        YES
Has tailcall?     YES            YES       NO
Has tailcall?     NO             YES       NO
Has tailcall?     NO             NO        NO

Then, a freplace prog can always be a tail callee. I haven't seen any bad
case where a freplace prog is the tail callee.

I'm not planning to disable freplace + tailcall combination totally,
because I use this combination in an in-house XDP project of my company.

Thanks,
Leon

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v2 2/4] bpf, arm64: Fix tailcall infinite loop caused by freplace
  2024-09-09 10:38       ` Leon Hwang
@ 2024-09-09 12:08         ` Xu Kuohai
  2024-09-09 14:42           ` Leon Hwang
  0 siblings, 1 reply; 14+ messages in thread
From: Xu Kuohai @ 2024-09-09 12:08 UTC (permalink / raw)
  To: Leon Hwang, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	eddyz87, iii, kernel-patches-bot

On 9/9/2024 6:38 PM, Leon Hwang wrote:
> 
> 
> On 9/9/24 17:02, Xu Kuohai wrote:
>> On 9/8/2024 9:01 PM, Leon Hwang wrote:
>>>
>>>
>>> On 1/9/24 21:38, Leon Hwang wrote:
>>>> Like "bpf, x64: Fix tailcall infinite loop caused by freplace", the same
>>>> issue happens on arm64, too.
>>>>
>>>> [...]
>>>>
>>> Hi Puranjay and Kuohai,
>>>
>>> As it's not recommended to introduce arch_bpf_run(), this is my approach
>>> to fix the niche case on arm64.
>>>
>>> Do you have any better idea to fix it?
>>>
>>
>> IIUC, the recommended approach is to teach verifier to reject the
>> freplace + tailcall combination. If this combination is allowed, we
>> will face more than just this issue. For example, what happens if
>> a freplace prog is attached to tail callee? The freplace prog is not
>> reachable through the tail call, right?
>>
> 
> It's to reject the freplace + tailcall combination partially, see "bpf,
> x64: Fix tailcall infinite loop caused by freplace". (Oh, I should
> separate the rejection to a standalone patch.)
> It rejects the case that freplace prog has tailcall and its attach
> target has no tailcall.
> 
> As for your example, it depends on:
> 
>                  freplace       target    reject?
> Has tailcall?     YES            NO        YES
> Has tailcall?     YES            YES       NO
> Has tailcall?     NO             YES       NO
> Has tailcall?     NO             NO        NO
> 
> Then, freplace prog can be tail callee always. I haven't seen any bad
> case when freplace prog is tail callee.
>

Here is a concrete case. prog1 tail calls prog2, and prog2_new is
attached to prog2 via freplace.

SEC("tc")
int prog1(struct __sk_buff *skb)
{
         bpf_tail_call_static(skb, &progs, 0); // tail call prog2
         return 0;
}

SEC("tc")
int prog2(struct __sk_buff *skb)
{
         return 0;
}

SEC("freplace")
int prog2_new(struct __sk_buff *skb) // target is prog2
{
         return 0;
}

In this case, prog2_new is not reachable, since the tail call
target for prog2 is the start address of prog2 + TAIL_CALL_OFFSET,
which is located after the freplace/fentry call site of prog2.

> I'm not planning to disable freplace + tailcall combination totally,
> because I use this combination in an in-house XDP project of my company.
> 
> Thanks,
> Leon


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v2 2/4] bpf, arm64: Fix tailcall infinite loop caused by freplace
  2024-09-09 12:08         ` Xu Kuohai
@ 2024-09-09 14:42           ` Leon Hwang
  2024-09-13 17:47             ` Alexei Starovoitov
  0 siblings, 1 reply; 14+ messages in thread
From: Leon Hwang @ 2024-09-09 14:42 UTC (permalink / raw)
  To: Xu Kuohai, bpf
  Cc: ast, daniel, andrii, toke, martin.lau, yonghong.song, puranjay,
	eddyz87, iii, kernel-patches-bot



On 9/9/24 20:08, Xu Kuohai wrote:
> On 9/9/2024 6:38 PM, Leon Hwang wrote:
>>
>>
>> On 9/9/24 17:02, Xu Kuohai wrote:
>>> On 9/8/2024 9:01 PM, Leon Hwang wrote:
>>>>
>>>>
>>>> On 1/9/24 21:38, Leon Hwang wrote:
>>>>> Like "bpf, x64: Fix tailcall infinite loop caused by freplace", the
>>>>> same
>>>>> issue happens on arm64, too.
>>>>>

[...]

>>>>>
>>>> Hi Puranjay and Kuohai,
>>>>
>>>> As it's not recommended to introduce arch_bpf_run(), this is my
>>>> approach
>>>> to fix the niche case on arm64.
>>>>
>>>> Do you have any better idea to fix it?
>>>>
>>>
>>> IIUC, the recommended approach is to teach verifier to reject the
>>> freplace + tailcall combination. If this combination is allowed, we
>>> will face more than just this issue. For example, what happens if
>>> a freplace prog is attached to tail callee? The freplace prog is not
>>> reachable through the tail call, right?
>>>
>>
>> It's to reject the freplace + tailcall combination partially, see "bpf,
>> x64: Fix tailcall infinite loop caused by freplace". (Oh, I should
>> separate the rejection to a standalone patch.)
>> It rejects the case that freplace prog has tailcall and its attach
>> target has no tailcall.
>>
>> As for your example, it depends on:
>>
>>                  freplace       target    reject?
>> Has tailcall?     YES            NO        YES
>> Has tailcall?     YES            YES       NO
>> Has tailcall?     NO             YES       NO
>> Has tailcall?     NO             NO        NO
>>
>> Then, freplace prog can be tail callee always. I haven't seen any bad
>> case when freplace prog is tail callee.
>>
> 
> Here is a concrete case. prog1 tail calls prog2, and prog2_new is
> attached to prog2 via freplace.
> 
> SEC("tc")
> int prog1(struct __sk_buff *skb)
> {
>         bpf_tail_call_static(skb, &progs, 0); // tail call prog2
>         return 0;
> }
> 
> SEC("tc")
> int prog2(struct __sk_buff *skb)
> {
>         return 0;
> }
> 
> SEC("freplace")
> int prog2_new(struct __sk_buff *skb) // target is prog2
> {
>         return 0;
> }
> 
> In this case, prog2_new is not reachable, since the tail call
> target in prog2 is start address of prog2  + TAIL_CALL_OFFSET,
> which locates behind freplace/fentry callsite of prog2.
> 

This is an abnormal use case. We can do nothing with it, e.g. we're
unable to notify the user that prog2_new is not reachable in this case.

Thanks,
Leon


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v2 2/4] bpf, arm64: Fix tailcall infinite loop caused by freplace
  2024-09-09 14:42           ` Leon Hwang
@ 2024-09-13 17:47             ` Alexei Starovoitov
  2024-09-14  9:14               ` Xu Kuohai
  0 siblings, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2024-09-13 17:47 UTC (permalink / raw)
  To: Leon Hwang
  Cc: Xu Kuohai, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Toke Høiland-Jørgensen,
	Martin KaFai Lau, Yonghong Song, Puranjay Mohan, Eddy Z,
	Ilya Leoshkevich, kernel-patches-bot

On Mon, Sep 9, 2024 at 7:42 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 9/9/24 20:08, Xu Kuohai wrote:
> > On 9/9/2024 6:38 PM, Leon Hwang wrote:
> >>
> >>
> >> On 9/9/24 17:02, Xu Kuohai wrote:
> >>> On 9/8/2024 9:01 PM, Leon Hwang wrote:
> >>>>
> >>>>
> >>>> On 1/9/24 21:38, Leon Hwang wrote:
> >>>>> Like "bpf, x64: Fix tailcall infinite loop caused by freplace", the
> >>>>> same
> >>>>> issue happens on arm64, too.
> >>>>>
>
> [...]
>
> >>>>>
> >>>> Hi Puranjay and Kuohai,
> >>>>
> >>>> As it's not recommended to introduce arch_bpf_run(), this is my
> >>>> approach
> >>>> to fix the niche case on arm64.
> >>>>
> >>>> Do you have any better idea to fix it?
> >>>>
> >>>
> >>> IIUC, the recommended approach is to teach verifier to reject the
> >>> freplace + tailcall combination. If this combination is allowed, we
> >>> will face more than just this issue. For example, what happens if
> >>> a freplace prog is attached to tail callee? The freplace prog is not
> >>> reachable through the tail call, right?
> >>>
> >>
> >> It's to reject the freplace + tailcall combination partially, see "bpf,
> >> x64: Fix tailcall infinite loop caused by freplace". (Oh, I should
> >> separate the rejection to a standalone patch.)
> >> It rejects the case that freplace prog has tailcall and its attach
> >> target has no tailcall.
> >>
> >> As for your example, it depends on:
> >>
> >>                  freplace       target    reject?
> >> Has tailcall?     YES            NO        YES
> >> Has tailcall?     YES            YES       NO
> >> Has tailcall?     NO             YES       NO
> >> Has tailcall?     NO             NO        NO
> >>
> >> Then, a freplace prog can always be a tail callee. I haven't seen
> >> any bad case where a freplace prog is the tail callee.
> >>
> >
> > Here is a concrete case. prog1 tail calls prog2, and prog2_new is
> > attached to prog2 via freplace.
> >
> > SEC("tc")
> > int prog1(struct __sk_buff *skb)
> > {
> >         bpf_tail_call_static(skb, &progs, 0); // tail call prog2
> >         return 0;
> > }
> >
> > SEC("tc")
> > int prog2(struct __sk_buff *skb)
> > {
> >         return 0;
> > }
> >
> > SEC("freplace")
> > int prog2_new(struct __sk_buff *skb) // target is prog2
> > {
> >         return 0;
> > }
> >
> > In this case, prog2_new is not reachable via the tail call, since the
> > tail call target in prog2 is the start address of prog2 +
> > TAIL_CALL_OFFSET, which is located after the freplace/fentry call
> > site of prog2.
> >
>
> This is an abnormal use case. We can do nothing about it; e.g., we're
> unable to notify the user that prog2_new is not reachable in this case.

Since it doesn't behave as the user would expect, I think it's better
to disallow such combinations in the verifier.
There are two cases. Either the freplace prog is trying to patch a prog
that is already in a prog_array, in which case loading the freplace
prog can report a meaningful error into the verifier log, or an
already-freplaced prog is being added to a prog_array.
In the latter case, I think the main prog tail calling into this
freplace prog will actually succeed, but it's better to reject the
sys_bpf update command in bpf_fd_array_map_update_elem(): the
being-freplaced prog is not in the prog_array and won't be called,
while the freplace prog in the prog_array can be called, which is
inconsistent.
A freplace prog should act and be called just like its being-freplaced
target prog.

I don't think this will break any actual use cases where freplace and
tail call are used together.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v2 1/4] bpf, x64: Fix tailcall infinite loop caused by freplace
  2024-09-01 13:38 ` [PATCH bpf-next v2 1/4] bpf, x64: " Leon Hwang
@ 2024-09-13 19:28   ` Alexei Starovoitov
  2024-09-15 13:00     ` Leon Hwang
  0 siblings, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2024-09-13 19:28 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Toke Høiland-Jørgensen, Martin KaFai Lau, Yonghong Song,
	Puranjay Mohan, Xu Kuohai, Eddy Z, Ilya Leoshkevich,
	kernel-patches-bot

On Sun, Sep 1, 2024 at 6:41 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> @@ -573,10 +575,13 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
>
>         /*
>          * See emit_prologue(), for IBT builds the trampoline hook is preceded
> -        * with an ENDBR instruction.
> +        * with an ENDBR instruction and 3 bytes tail_call_cnt initialization
> +        * instruction.
>          */
>         if (is_endbr(*(u32 *)ip))
>                 ip += ENDBR_INSN_SIZE;
> +       if (is_bpf_text_address((long)ip))
> +               ip += X86_POKE_EXTRA;

This is a foot gun.
bpf_arch_text_poke() is used not only at the beginning of the function.
So unconditional ip += 3 is not just puzzling with 'what is this for',
but dangerous and wasteful...

> @@ -2923,6 +2930,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
>                  */
>                 if (is_endbr(*(u32 *)orig_call))
>                         orig_call += ENDBR_INSN_SIZE;
> +               if (is_bpf_text_address((long)orig_call))
> +                       orig_call += X86_POKE_EXTRA;
>                 orig_call += X86_PATCH_SIZE;
>         }

..this bit needs to be hacked too...

> @@ -3025,6 +3034,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
>                 /* remember return value in a stack for bpf prog to access */
>                 emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
>                 im->ip_after_call = image + (prog - (u8 *)rw_image);
> +               emit_nops(&prog, X86_POKE_EXTRA);
>                 emit_nops(&prog, X86_PATCH_SIZE);

And this is just pure waste of kernel code and cpu run-time.

You're adding 3 byte nop for no reason at all.

See commit e21aa341785c ("bpf: Fix fexit trampoline.")
that added:
                int err = bpf_arch_text_poke(im->ip_after_call, BPF_MOD_JUMP,
                                             NULL, im->ip_epilogue);
logic that is patching bpf trampoline in the middle of it.
(not at the start).

Because of unconditional +=3 in bpf_arch_text_poke() every trampoline
will have to waste nop3 ?
No.

Please fix freplace and tail call combination without
this kind of unacceptable shortcuts.

I very much prefer to stop hacking into JITs and trampolines because
tailcalls and freplace don't work well together.

We cannot completely disable that combination because libxdp
is using freplace to populate the chain of progs the main prog is
calling, and these freplace progs might be doing tailcalls,
but we can still prevent the infinite loop that you describe in the
commit log:
entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc
in the verifier without resorting to JIT hacks.

pw-bot: cr

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v2 2/4] bpf, arm64: Fix tailcall infinite loop caused by freplace
  2024-09-13 17:47             ` Alexei Starovoitov
@ 2024-09-14  9:14               ` Xu Kuohai
  0 siblings, 0 replies; 14+ messages in thread
From: Xu Kuohai @ 2024-09-14  9:14 UTC (permalink / raw)
  To: Alexei Starovoitov, Leon Hwang
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Toke Høiland-Jørgensen, Martin KaFai Lau, Yonghong Song,
	Puranjay Mohan, Eddy Z, Ilya Leoshkevich, kernel-patches-bot

On 9/14/2024 1:47 AM, Alexei Starovoitov wrote:
> On Mon, Sep 9, 2024 at 7:42 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>>
>> On 9/9/24 20:08, Xu Kuohai wrote:
>>> On 9/9/2024 6:38 PM, Leon Hwang wrote:
>>>>
>>>>
>>>> On 9/9/24 17:02, Xu Kuohai wrote:
>>>>> On 9/8/2024 9:01 PM, Leon Hwang wrote:
>>>>>>
>>>>>>
>>>>>> On 1/9/24 21:38, Leon Hwang wrote:
>>>>>>> Like "bpf, x64: Fix tailcall infinite loop caused by freplace", the
>>>>>>> same
>>>>>>> issue happens on arm64, too.
>>>>>>>
>>
>> [...]
>>
>>>>>>>
>>>>>> Hi Puranjay and Kuohai,
>>>>>>
>>>>>> As it's not recommended to introduce arch_bpf_run(), this is my
>>>>>> approach
>>>>>> to fix the niche case on arm64.
>>>>>>
>>>>>> Do you have any better idea to fix it?
>>>>>>
>>>>>
>>>>> IIUC, the recommended approach is to teach the verifier to reject the
>>>>> freplace + tailcall combination. If this combination is allowed, we
>>>>> will face more than just this issue. For example, what happens if
>>>>> a freplace prog is attached to tail callee? The freplace prog is not
>>>>> reachable through the tail call, right?
>>>>>
>>>>
>>>> The patch rejects the freplace + tailcall combination partially, see
>>>> "bpf, x64: Fix tailcall infinite loop caused by freplace". (Oh, I
>>>> should split the rejection into a standalone patch.)
>>>> It rejects the case where the freplace prog has a tailcall but its
>>>> attach target has no tailcall.
>>>>
>>>> As for your example, it depends on:
>>>>
>>>>                   freplace       target    reject?
>>>> Has tailcall?     YES            NO        YES
>>>> Has tailcall?     YES            YES       NO
>>>> Has tailcall?     NO             YES       NO
>>>> Has tailcall?     NO             NO        NO
>>>>
>>>> Then, a freplace prog can always be a tail callee. I haven't seen
>>>> any bad case where a freplace prog is the tail callee.
>>>>
>>>
>>> Here is a concrete case. prog1 tail calls prog2, and prog2_new is
>>> attached to prog2 via freplace.
>>>
>>> SEC("tc")
>>> int prog1(struct __sk_buff *skb)
>>> {
>>>          bpf_tail_call_static(skb, &progs, 0); // tail call prog2
>>>          return 0;
>>> }
>>>
>>> SEC("tc")
>>> int prog2(struct __sk_buff *skb)
>>> {
>>>          return 0;
>>> }
>>>
>>> SEC("freplace")
>>> int prog2_new(struct __sk_buff *skb) // target is prog2
>>> {
>>>          return 0;
>>> }
>>>
>>> In this case, prog2_new is not reachable via the tail call, since the
>>> tail call target in prog2 is the start address of prog2 +
>>> TAIL_CALL_OFFSET, which is located after the freplace/fentry call
>>> site of prog2.
>>>
>>
>> This is an abnormal use case. We can do nothing about it; e.g., we're
>> unable to notify the user that prog2_new is not reachable in this case.
> 
> Since it doesn't behave as the user would expect, I think it's better
> to disallow such combinations in the verifier.
> There are two cases. Either the freplace prog is trying to patch a prog
> that is already in a prog_array, in which case loading the freplace
> prog can report a meaningful error into the verifier log, or an
> already-freplaced prog is being added to a prog_array.
> In the latter case, I think the main prog tail calling into this
> freplace prog will actually succeed, but it's better to reject the
> sys_bpf update command in bpf_fd_array_map_update_elem(): the
> being-freplaced prog is not in the prog_array and won't be called,
> while the freplace prog in the prog_array can be called, which is
> inconsistent.
> A freplace prog should act and be called just like its being-freplaced
> target prog.
> 
> I don't think this will break any actual use cases where freplace and
> tail call are used together.

+1, we should not ignore it silently. If the freplace + tailcall
combination cannot be disallowed completely, we should disallow
this case separately (or maybe we could find an acceptable way to
make it work as expected?). Let me give it a try.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v2 1/4] bpf, x64: Fix tailcall infinite loop caused by freplace
  2024-09-13 19:28   ` Alexei Starovoitov
@ 2024-09-15 13:00     ` Leon Hwang
  0 siblings, 0 replies; 14+ messages in thread
From: Leon Hwang @ 2024-09-15 13:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Toke Høiland-Jørgensen, Martin KaFai Lau, Yonghong Song,
	Puranjay Mohan, Xu Kuohai, Eddy Z, Ilya Leoshkevich,
	kernel-patches-bot



On 2024/9/14 03:28, Alexei Starovoitov wrote:
> On Sun, Sep 1, 2024 at 6:41 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> @@ -573,10 +575,13 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
>>
>>         /*
>>          * See emit_prologue(), for IBT builds the trampoline hook is preceded
>> -        * with an ENDBR instruction.
>> +        * with an ENDBR instruction and 3 bytes tail_call_cnt initialization
>> +        * instruction.
>>          */
>>         if (is_endbr(*(u32 *)ip))
>>                 ip += ENDBR_INSN_SIZE;
>> +       if (is_bpf_text_address((long)ip))
>> +               ip += X86_POKE_EXTRA;
> 
> This is a foot gun.
> bpf_arch_text_poke() is used not only at the beginning of the function.
> So unconditional ip += 3 is not just puzzling with 'what is this for',
> but dangerous and wasteful...
> 
>> @@ -2923,6 +2930,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
>>                  */
>>                 if (is_endbr(*(u32 *)orig_call))
>>                         orig_call += ENDBR_INSN_SIZE;
>> +               if (is_bpf_text_address((long)orig_call))
>> +                       orig_call += X86_POKE_EXTRA;
>>                 orig_call += X86_PATCH_SIZE;
>>         }
> 
> ..this bit needs to be hacked too...
> 
>> @@ -3025,6 +3034,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
>>                 /* remember return value in a stack for bpf prog to access */
>>                 emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
>>                 im->ip_after_call = image + (prog - (u8 *)rw_image);
>> +               emit_nops(&prog, X86_POKE_EXTRA);
>>                 emit_nops(&prog, X86_PATCH_SIZE);
> 
> And this is just pure waste of kernel code and cpu run-time.
> 
> You're adding 3 byte nop for no reason at all.
> 
> See commit e21aa341785c ("bpf: Fix fexit trampoline.")
> that added:
>                 int err = bpf_arch_text_poke(im->ip_after_call, BPF_MOD_JUMP,
>                                              NULL, im->ip_epilogue);
> logic that is patching bpf trampoline in the middle of it.
> (not at the start).
> 
> Because of unconditional +=3 in bpf_arch_text_poke() every trampoline
> will have to waste nop3 ?
> No.
> 
> Please fix freplace and tail call combination without
> this kind of unacceptable shortcuts.
> 
> I very much prefer to stop hacking into JITs and trampolines because
> tailcalls and freplace don't work well together.
> 
> We cannot completely disable that combination because libxdp
> is using freplace to populate the chain of progs the main prog is
> calling, and these freplace progs might be doing tailcalls,
> but we can still prevent the infinite loop that you describe in the
> commit log:
> entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc
> in the verifier without resorting to JIT hacks.
> 
> pw-bot: cr

IIUC, it's going to prevent this niche case at update/attach time: a
prog cannot be added to a prog_array map and be extended by a freplace
prog at the same time.

More specifically:

1. If a prog has been added to a prog_array map, neither it nor its
   subprogs can be extended by a freplace prog.
2. If a prog or its subprog has been extended by a freplace prog, it
   cannot be added to a prog_array map.

Then, I did a POC[0] to prevent this niche case, and the POC does not
break any selftest.

[0] https://github.com/kernel-patches/bpf/pull/7732

Here's the diff of the POC.

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0c3893c471711..b864b37e67c17 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1483,6 +1483,8 @@ struct bpf_prog_aux {
 	bool xdp_has_frags;
 	bool exception_cb;
 	bool exception_boundary;
+	bool is_extended; /* true if extended by freplace program */
+	atomic_t tail_callee_cnt;
 	struct bpf_arena *arena;
 	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
 	const struct btf_type *attach_func_proto;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 79660e3fca4c1..be41240d4fb3a 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -951,18 +951,27 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
 	if (IS_ERR(prog))
 		return prog;

-	if (!bpf_prog_map_compatible(map, prog)) {
+	if (!bpf_prog_map_compatible(map, prog) || prog->aux->is_extended) {
+		/* An extended prog cannot be a tail callee. This prevents a
+		 * potential infinite loop like:
+		 * tail callee prog entry -> tail callee prog subprog ->
+		 * freplace prog entry --tailcall-> tail callee prog entry.
+		 */
 		bpf_prog_put(prog);
 		return ERR_PTR(-EINVAL);
 	}

+	atomic_inc(&prog->aux->tail_callee_cnt);
 	return prog;
 }

static void prog_fd_array_put_ptr(struct bpf_map *map, void *ptr, bool need_defer)
 {
+	struct bpf_prog *prog = ptr;
+
 	/* bpf_prog is freed after one RCU or tasks trace grace period */
-	bpf_prog_put(ptr);
+	atomic_dec(&prog->aux->tail_callee_cnt);
+	bpf_prog_put(prog);
 }

 static u32 prog_fd_array_sys_lookup_elem(void *ptr)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8a4117f6d7610..be829016d8182 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3292,8 +3292,11 @@ static void bpf_tracing_link_release(struct bpf_link *link)
 	bpf_trampoline_put(tr_link->trampoline);

 	/* tgt_prog is NULL if target is a kernel function */
-	if (tr_link->tgt_prog)
+	if (tr_link->tgt_prog) {
+		if (link->prog->type == BPF_PROG_TYPE_EXT)
+			tr_link->tgt_prog->aux->is_extended = false;
 		bpf_prog_put(tr_link->tgt_prog);
+	}
 }

 static void bpf_tracing_link_dealloc(struct bpf_link *link)
@@ -3498,6 +3501,18 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 		tgt_prog = prog->aux->dst_prog;
 	}

+	if (prog->type == BPF_PROG_TYPE_EXT &&
+	    atomic_read(&tgt_prog->aux->tail_callee_cnt)) {
+		/* Program extensions cannot extend the target prog when the
+		 * target prog has been added to any prog_array map as a tail
+		 * callee. This prevents a potential infinite loop like:
+		 * tgt prog entry -> tgt prog subprog -> freplace prog entry
+		 * --tailcall-> tgt prog entry.
+		 */
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
 	err = bpf_link_prime(&link->link.link, &link_primer);
 	if (err)
 		goto out_unlock;
@@ -3523,6 +3538,8 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 	if (prog->aux->dst_trampoline && tr != prog->aux->dst_trampoline)
 		/* we allocated a new trampoline, so free the old one */
 		bpf_trampoline_put(prog->aux->dst_trampoline);
+	if (prog->type == BPF_PROG_TYPE_EXT)
+		tgt_prog->aux->is_extended = true;

 	prog->aux->dst_prog = NULL;
 	prog->aux->dst_trampoline = NULL;

Thanks,
Leon

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-09-15 13:00 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-09-01 13:38 [PATCH bpf-next v2 0/4] bpf: Fix tailcall infinite loop caused by freplace Leon Hwang
2024-09-01 13:38 ` [PATCH bpf-next v2 1/4] bpf, x64: " Leon Hwang
2024-09-13 19:28   ` Alexei Starovoitov
2024-09-15 13:00     ` Leon Hwang
2024-09-01 13:38 ` [PATCH bpf-next v2 2/4] bpf, arm64: " Leon Hwang
2024-09-08 13:01   ` Leon Hwang
2024-09-09  9:02     ` Xu Kuohai
2024-09-09 10:38       ` Leon Hwang
2024-09-09 12:08         ` Xu Kuohai
2024-09-09 14:42           ` Leon Hwang
2024-09-13 17:47             ` Alexei Starovoitov
2024-09-14  9:14               ` Xu Kuohai
2024-09-01 13:38 ` [PATCH bpf-next v2 3/4] selftests/bpf: Add testcases for another tailcall infinite loop fixing Leon Hwang
2024-09-01 13:38 ` [PATCH bpf-next v2 4/4] selftests/bpf: Fix verifier tailcall jit selftest Leon Hwang
