* [PATCH bpf-next 0/4] bpf: tailcall: Eliminate max_entries and bpf_func access at runtime
@ 2026-01-02 15:00 Leon Hwang
2026-01-02 15:00 ` [PATCH bpf-next 1/4] bpf: tailcall: Introduce bpf_arch_tail_call_prologue_offset Leon Hwang
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Leon Hwang @ 2026-01-02 15:00 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
David S . Miller, David Ahern, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Andrew Morton,
linux-arm-kernel, linux-kernel, netdev, kernel-patches-bot,
Leon Hwang
This patch series optimizes BPF tail calls on x86_64 and arm64 by
eliminating runtime memory accesses for max_entries and 'prog->bpf_func'
when the prog array map is known at verification time.
Currently, every tail call requires:
1. Loading max_entries from the prog array map
2. Dereferencing 'prog->bpf_func' to get the target address
This series introduces a mechanism to precompute and cache the tail call
target addresses (bpf_func + prologue_offset) in the prog array itself:
array->ptrs[max_entries + index] = prog->bpf_func + prologue_offset
When a program is added to or removed from the prog array, the cached
target is atomically updated via xchg().
The verifier now encodes additional information in the tail call
instruction's imm field (a decoding sketch follows the list):
- bits 0-7: map index in used_maps[]
- bits 8-15: dynamic array flag (1 if map pointer is poisoned)
- bits 16-31: poke table index + 1 for direct tail calls
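For illustration, the three fields could be unpacked like this (the helper
names here are illustrative, not part of the series; the actual decode is
open-coded in the x86 and arm64 JITs in patches 2 and 3):

  /* sketch only */
  static inline u32  tail_call_map_index(s32 imm)  { return imm & 0xFF; }
  static inline bool tail_call_dyn_array(s32 imm)  { return (imm >> 8) & 0xFF; }
  static inline s32  tail_call_poke_index(s32 imm) { return imm >> 16; } /* 0 == no poke entry */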
For static tail calls (map known at verification time):
- max_entries is embedded as an immediate in the comparison instruction
- The cached target from array->ptrs[max_entries + index] is used
directly, avoiding the 'prog->bpf_func' dereference
For dynamic tail calls (map pointer poisoned):
- Fall back to runtime lookup of max_entries and prog->bpf_func
This reduces cache misses and improves tail call performance for the
common case where the prog array is statically known.
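Expressed as pseudocode, the emitted fast path for the static case becomes
roughly the following (a sketch, not literal kernel source; 'target' and
'tcc_ptr' are illustrative names):

  if (index >= MAX_ENTRIES_IMM)                  /* immediate, no memory load */
          goto out;
  if ((*tcc_ptr)++ >= MAX_TAIL_CALL_CNT)
          goto out;
  target = array->ptrs[MAX_ENTRIES_IMM + index]; /* cached bpf_func + prologue_offset */
  if (!target)
          goto out;
  goto *target;                                  /* no prog->bpf_func dereference */

The dynamic case keeps the existing sequence: load max_entries from the map,
look up prog = array->ptrs[index], and jump to prog->bpf_func plus the
prologue offset.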
Leon Hwang (4):
bpf: tailcall: Introduce bpf_arch_tail_call_prologue_offset
bpf, x64: tailcall: Eliminate max_entries and bpf_func access at
runtime
bpf, arm64: tailcall: Eliminate max_entries and bpf_func access at
runtime
bpf, lib/test_bpf: Fix broken tailcall tests
arch/arm64/net/bpf_jit_comp.c | 71 +++++++++++++++++++++++++----------
arch/x86/net/bpf_jit_comp.c | 51 ++++++++++++++++++-------
include/linux/bpf.h | 1 +
kernel/bpf/arraymap.c | 27 ++++++++++++-
kernel/bpf/verifier.c | 30 ++++++++++++++-
lib/test_bpf.c | 39 ++++++++++++++++---
6 files changed, 178 insertions(+), 41 deletions(-)
--
2.52.0
* [PATCH bpf-next 1/4] bpf: tailcall: Introduce bpf_arch_tail_call_prologue_offset
2026-01-02 15:00 [PATCH bpf-next 0/4] bpf: tailcall: Eliminate max_entries and bpf_func access at runtime Leon Hwang
@ 2026-01-02 15:00 ` Leon Hwang
2026-01-02 15:21 ` bot+bpf-ci
2026-01-02 15:00 ` [PATCH bpf-next 2/4] bpf, x64: tailcall: Eliminate max_entries and bpf_func access at runtime Leon Hwang
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Leon Hwang @ 2026-01-02 15:00 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
David S . Miller, David Ahern, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Andrew Morton,
linux-arm-kernel, linux-kernel, netdev, kernel-patches-bot,
Leon Hwang
Introduce bpf_arch_tail_call_prologue_offset() to allow architectures
to specify the offset from bpf_func to the actual program entry point
for tail calls. This offset accounts for prologue instructions that
should be skipped (e.g., fentry NOPs, TCC initialization).
When an architecture provides a non-zero prologue offset, prog arrays
allocate additional space to cache precomputed tail call targets:
array->ptrs[max_entries + index] = prog->bpf_func + prologue_offset
This cached target is updated atomically via xchg() when programs are
added to or removed from the prog array, eliminating the need to compute
the target address at runtime during tail calls.
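Schematically, a prog array on such an architecture is laid out as follows
(illustration only):

  ptrs[0] .. ptrs[max_entries - 1]                struct bpf_prog * slots (as before)
  ptrs[max_entries] .. ptrs[2 * max_entries - 1]  cached tail call targets:
                                                  prog->bpf_func + prologue_offset,
                                                  or NULL for empty slots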
The function is exported for use by the test_bpf module.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
include/linux/bpf.h | 1 +
kernel/bpf/arraymap.c | 27 ++++++++++++++++++++++++++-
2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4e7d72dfbcd4..acd85c239af9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3792,6 +3792,7 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type old_t,
void bpf_arch_poke_desc_update(struct bpf_jit_poke_descriptor *poke,
struct bpf_prog *new, struct bpf_prog *old);
+int bpf_arch_tail_call_prologue_offset(void);
void *bpf_arch_text_copy(void *dst, void *src, size_t len);
int bpf_arch_text_invalidate(void *dst, size_t len);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 1eeb31c5b317..beedd1281c22 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -127,6 +127,9 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
array_size += (u64) max_entries * elem_size;
}
}
+ if (attr->map_type == BPF_MAP_TYPE_PROG_ARRAY && bpf_arch_tail_call_prologue_offset())
+ /* Store tailcall targets */
+ array_size += (u64) max_entries * sizeof(void *);
/* allocate all map elements and zero-initialize them */
if (attr->map_flags & BPF_F_MMAPABLE) {
@@ -1087,16 +1090,38 @@ void __weak bpf_arch_poke_desc_update(struct bpf_jit_poke_descriptor *poke,
WARN_ON_ONCE(1);
}
+int __weak bpf_arch_tail_call_prologue_offset(void)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(bpf_arch_tail_call_prologue_offset);
+
+static void bpf_tail_call_target_update(struct bpf_array *array, u32 key, struct bpf_prog *new)
+{
+ int offset = bpf_arch_tail_call_prologue_offset();
+ void *target;
+
+ if (!offset)
+ return;
+
+ target = new ? (void *) new->bpf_func + offset : 0;
+ xchg(array->ptrs + array->map.max_entries + key, target);
+}
+
static void prog_array_map_poke_run(struct bpf_map *map, u32 key,
struct bpf_prog *old,
struct bpf_prog *new)
{
struct prog_poke_elem *elem;
struct bpf_array_aux *aux;
+ struct bpf_array *array;
- aux = container_of(map, struct bpf_array, map)->aux;
+ array = container_of(map, struct bpf_array, map);
+ aux = array->aux;
WARN_ON_ONCE(!mutex_is_locked(&aux->poke_mutex));
+ bpf_tail_call_target_update(array, key, new);
+
list_for_each_entry(elem, &aux->poke_progs, list) {
struct bpf_jit_poke_descriptor *poke;
int i;
--
2.52.0
* [PATCH bpf-next 2/4] bpf, x64: tailcall: Eliminate max_entries and bpf_func access at runtime
2026-01-02 15:00 [PATCH bpf-next 0/4] bpf: tailcall: Eliminate max_entries and bpf_func access at runtime Leon Hwang
2026-01-02 15:00 ` [PATCH bpf-next 1/4] bpf: tailcall: Introduce bpf_arch_tail_call_prologue_offset Leon Hwang
@ 2026-01-02 15:00 ` Leon Hwang
2026-01-02 15:00 ` [PATCH bpf-next 3/4] bpf, arm64: " Leon Hwang
2026-01-02 15:00 ` [PATCH bpf-next 4/4] bpf, lib/test_bpf: Fix broken tailcall tests Leon Hwang
3 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-01-02 15:00 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
David S . Miller, David Ahern, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Andrew Morton,
linux-arm-kernel, linux-kernel, netdev, kernel-patches-bot,
Leon Hwang
Optimize BPF tail calls on x86_64 by eliminating runtime memory accesses
for max_entries and prog->bpf_func when the prog array map is known at
verification time.
The verifier now encodes three fields in the tail call instruction's imm:
- bits 0-7: map index in used_maps[] (max 63)
- bits 8-15: dynamic array flag (1 if map pointer is poisoned)
- bits 16-31: poke table index + 1 for direct tail calls (max 1023)
For static tail calls (map known at verification time):
- max_entries is embedded as an immediate in the comparison instruction
- The cached target from array->ptrs[max_entries + index] is used
directly, avoiding the prog->bpf_func dereference
For dynamic tail calls (map pointer poisoned):
- Fall back to runtime lookup of max_entries and prog->bpf_func
This reduces cache misses and improves tail call performance for the
common case where the prog array is statically known.
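For the static case, the emitted sequence changes roughly as follows
(simplified sketch based on the instruction comments in the diff below;
rsi holds the array pointer and rdx the index, as in the existing JIT):

  Before:
        cmp     dword ptr [rsi + <max_entries off>], edx   /* load max_entries */
        jbe     out
        ...
        mov     rcx, [rsi + rdx*8 + <ptrs off>]            /* prog = array->ptrs[index] */
        ...
        mov     rcx, [rcx + <bpf_func off>]                /* dereference prog->bpf_func */
        add     rcx, X86_TAIL_CALL_OFFSET
        jmp     rcx

  After (static map):
        cmp     edx, <max_entries imm32>                   /* immediate, no load */
        jae     out
        ...
        mov     rcx, [rsi + rdx*8 + <ptrs off> + max_entries*8]  /* cached target */
        ...
        jmp     rcx                                        /* jump straight to it */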
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
arch/x86/net/bpf_jit_comp.c | 51 +++++++++++++++++++++++++++----------
kernel/bpf/verifier.c | 30 ++++++++++++++++++++--
2 files changed, 66 insertions(+), 15 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index e3b1c4b1d550..9fd707612da5 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -733,11 +733,13 @@ static void emit_return(u8 **pprog, u8 *ip)
* out:
*/
static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
+ u32 map_index, bool dyn_array,
u8 **pprog, bool *callee_regs_used,
u32 stack_depth, u8 *ip,
struct jit_context *ctx)
{
int tcc_ptr_off = BPF_TAIL_CALL_CNT_PTR_STACK_OFF(stack_depth);
+ struct bpf_map *map = bpf_prog->aux->used_maps[map_index];
u8 *prog = *pprog, *start = *pprog;
int offset;
@@ -752,11 +754,14 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
* goto out;
*/
EMIT2(0x89, 0xD2); /* mov edx, edx */
- EMIT3(0x39, 0x56, /* cmp dword ptr [rsi + 16], edx */
- offsetof(struct bpf_array, map.max_entries));
+ if (dyn_array)
+ EMIT3(0x3B, 0x56, /* cmp edx, dword ptr [rsi + 16] */
+ offsetof(struct bpf_array, map.max_entries));
+ else
+ EMIT2_off32(0x81, 0xFA, map->max_entries); /* cmp edx, imm32 (map->max_entries) */
offset = ctx->tail_call_indirect_label - (prog + 2 - start);
- EMIT2(X86_JBE, offset); /* jbe out */
+ EMIT2(X86_JAE, offset); /* jae out */
/*
* if ((*tcc_ptr)++ >= MAX_TAIL_CALL_CNT)
@@ -768,9 +773,15 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
offset = ctx->tail_call_indirect_label - (prog + 2 - start);
EMIT2(X86_JAE, offset); /* jae out */
- /* prog = array->ptrs[index]; */
- EMIT4_off32(0x48, 0x8B, 0x8C, 0xD6, /* mov rcx, [rsi + rdx * 8 + offsetof(...)] */
- offsetof(struct bpf_array, ptrs));
+ /*
+ * if (dyn_array)
+ * prog = array->ptrs[index];
+ * else
+ * tgt = array->ptrs[max_entries + index];
+ */
+ offset = offsetof(struct bpf_array, ptrs);
+ offset += dyn_array ? 0 : map->max_entries * sizeof(void *);
+ EMIT4_off32(0x48, 0x8B, 0x8C, 0xD6, offset); /* mov rcx, [rsi + rdx * 8 + offset] */
/*
* if (prog == NULL)
@@ -803,11 +814,14 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
EMIT3_off32(0x48, 0x81, 0xC4, /* add rsp, sd */
round_up(stack_depth, 8));
- /* goto *(prog->bpf_func + X86_TAIL_CALL_OFFSET); */
- EMIT4(0x48, 0x8B, 0x49, /* mov rcx, qword ptr [rcx + 32] */
- offsetof(struct bpf_prog, bpf_func));
- EMIT4(0x48, 0x83, 0xC1, /* add rcx, X86_TAIL_CALL_OFFSET */
- X86_TAIL_CALL_OFFSET);
+ if (dyn_array) {
+ /* goto *(prog->bpf_func + X86_TAIL_CALL_OFFSET); */
+ EMIT4(0x48, 0x8B, 0x49, /* mov rcx, qword ptr [rcx + 32] */
+ offsetof(struct bpf_prog, bpf_func));
+ EMIT4(0x48, 0x83, 0xC1, /* add rcx, X86_TAIL_CALL_OFFSET */
+ X86_TAIL_CALL_OFFSET);
+ }
+
/*
* Now we're ready to jump into next BPF program
* rdi == ctx (1st arg)
@@ -2461,15 +2475,21 @@ st: if (is_imm8(insn->off))
}
case BPF_JMP | BPF_TAIL_CALL:
- if (imm32)
+ bool dynamic_array = (imm32 >> 8) & 0xFF;
+ u32 map_index = imm32 & 0xFF;
+ s32 imm16 = imm32 >> 16;
+
+ if (imm16)
emit_bpf_tail_call_direct(bpf_prog,
- &bpf_prog->aux->poke_tab[imm32 - 1],
+ &bpf_prog->aux->poke_tab[imm16 - 1],
&prog, image + addrs[i - 1],
callee_regs_used,
stack_depth,
ctx);
else
emit_bpf_tail_call_indirect(bpf_prog,
+ map_index,
+ dynamic_array,
&prog,
callee_regs_used,
stack_depth,
@@ -4047,6 +4067,11 @@ void bpf_arch_poke_desc_update(struct bpf_jit_poke_descriptor *poke,
}
}
+int bpf_arch_tail_call_prologue_offset(void)
+{
+ return X86_TAIL_CALL_OFFSET;
+}
+
bool bpf_jit_supports_arena(void)
{
return true;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3d44c5d06623..ab9c84e76a62 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -22602,6 +22602,18 @@ static int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *pat
return 0;
}
+static int tail_call_find_map_index(struct bpf_verifier_env *env, struct bpf_map *map)
+{
+ int i;
+
+ for (i = 0; i < env->used_map_cnt; i++) {
+ if (env->used_maps[i] == map)
+ return i;
+ }
+
+ return -ENOENT;
+}
+
/* Do various post-verification rewrites in a single program pass.
* These rewrites simplify JIT and interpreter implementations.
*/
@@ -22993,10 +23005,24 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
* call and to prevent accidental JITing by JIT compiler
* that doesn't support bpf_tail_call yet
*/
- insn->imm = 0;
insn->code = BPF_JMP | BPF_TAIL_CALL;
+ /*
+ * insn->imm contains 3 fields:
+ * map index(8 bits): 6 bits are enough, 63 max
+ * poisoned(8 bits): 1 bit is enough
+ * poke index(16 bits): 1023 max
+ */
+
aux = &env->insn_aux_data[i + delta];
+ insn->imm = tail_call_find_map_index(env, aux->map_ptr_state.map_ptr);
+ if (insn->imm < 0) {
+ verifier_bug(env, "index not found for prog array map\n");
+ return -EINVAL;
+ }
+
+ insn->imm |= bpf_map_ptr_poisoned(aux) << 8;
+
if (env->bpf_capable && !prog->blinding_requested &&
prog->jit_requested &&
!bpf_map_key_poisoned(aux) &&
@@ -23015,7 +23041,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
return ret;
}
- insn->imm = ret + 1;
+ insn->imm |= (ret + 1) << 16;
goto next_insn;
}
--
2.52.0
* [PATCH bpf-next 3/4] bpf, arm64: tailcall: Eliminate max_entries and bpf_func access at runtime
2026-01-02 15:00 [PATCH bpf-next 0/4] bpf: tailcall: Eliminate max_entries and bpf_func access at runtime Leon Hwang
2026-01-02 15:00 ` [PATCH bpf-next 1/4] bpf: tailcall: Introduce bpf_arch_tail_call_prologue_offset Leon Hwang
2026-01-02 15:00 ` [PATCH bpf-next 2/4] bpf, x64: tailcall: Eliminate max_entries and bpf_func access at runtime Leon Hwang
@ 2026-01-02 15:00 ` Leon Hwang
2026-01-02 15:00 ` [PATCH bpf-next 4/4] bpf, lib/test_bpf: Fix broken tailcall tests Leon Hwang
3 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-01-02 15:00 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
David S . Miller, David Ahern, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Andrew Morton,
linux-arm-kernel, linux-kernel, netdev, kernel-patches-bot,
Leon Hwang
Apply the same tail call optimization to arm64 as done for x86_64.
When the prog array map is known at verification time (dyn_array=false):
- Embed max_entries as an immediate value instead of loading from memory
- Use the precomputed target from array->ptrs[max_entries + index]
- Jump directly to the cached target without dereferencing prog->bpf_func
When the map is dynamically determined (dyn_array=true):
- Load max_entries from the array at runtime
- Look up prog from array->ptrs[index] and compute the target address
Implement bpf_arch_tail_call_prologue_offset() returning
"PROLOGUE_OFFSET * 4" to convert the instruction count to bytes.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
arch/arm64/net/bpf_jit_comp.c | 71 +++++++++++++++++++++++++----------
1 file changed, 51 insertions(+), 20 deletions(-)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 0c4d44bcfbf4..bcd890bff36a 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -620,8 +620,10 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
return 0;
}
-static int emit_bpf_tail_call(struct jit_ctx *ctx)
+static int emit_bpf_tail_call(struct jit_ctx *ctx, u32 map_index, bool dyn_array)
{
+ struct bpf_map *map = ctx->prog->aux->used_maps[map_index];
+
/* bpf_tail_call(void *prog_ctx, struct bpf_array *array, u64 index) */
const u8 r2 = bpf2a64[BPF_REG_2];
const u8 r3 = bpf2a64[BPF_REG_3];
@@ -638,9 +640,13 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
/* if (index >= array->map.max_entries)
* goto out;
*/
- off = offsetof(struct bpf_array, map.max_entries);
- emit_a64_mov_i64(tmp, off, ctx);
- emit(A64_LDR32(tmp, r2, tmp), ctx);
+ if (dyn_array) {
+ off = offsetof(struct bpf_array, map.max_entries);
+ emit_a64_mov_i64(tmp, off, ctx);
+ emit(A64_LDR32(tmp, r2, tmp), ctx);
+ } else {
+ emit_a64_mov_i64(tmp, map->max_entries, ctx);
+ }
emit(A64_MOV(0, r3, r3), ctx);
emit(A64_CMP(0, r3, tmp), ctx);
branch1 = ctx->image + ctx->idx;
@@ -659,15 +665,26 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
/* (*tail_call_cnt_ptr)++; */
emit(A64_ADD_I(1, tcc, tcc, 1), ctx);
- /* prog = array->ptrs[index];
- * if (prog == NULL)
- * goto out;
- */
- off = offsetof(struct bpf_array, ptrs);
- emit_a64_mov_i64(tmp, off, ctx);
- emit(A64_ADD(1, tmp, r2, tmp), ctx);
- emit(A64_LSL(1, prg, r3, 3), ctx);
- emit(A64_LDR64(prg, tmp, prg), ctx);
+ if (dyn_array) {
+ /* prog = array->ptrs[index];
+ * if (prog == NULL)
+ * goto out;
+ */
+ off = offsetof(struct bpf_array, ptrs);
+ emit_a64_mov_i64(tmp, off, ctx);
+ emit(A64_ADD(1, tmp, r2, tmp), ctx);
+ emit(A64_LSL(1, prg, r3, 3), ctx);
+ emit(A64_LDR64(prg, tmp, prg), ctx);
+ } else {
+ /* tgt = array->ptrs[max_entries + index];
+ * if (tgt == 0)
+ * goto out;
+ */
+ emit(A64_LSL(1, prg, r3, 3), ctx);
+ off = offsetof(struct bpf_array, ptrs) + map->max_entries * sizeof(void *);
+ emit_a64_add_i(1, prg, prg, tmp, off, ctx);
+ emit(A64_LDR64(prg, r2, prg), ctx);
+ }
branch3 = ctx->image + ctx->idx;
emit(A64_NOP, ctx);
@@ -680,12 +697,17 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
pop_callee_regs(ctx);
- /* goto *(prog->bpf_func + prologue_offset); */
- off = offsetof(struct bpf_prog, bpf_func);
- emit_a64_mov_i64(tmp, off, ctx);
- emit(A64_LDR64(tmp, prg, tmp), ctx);
- emit(A64_ADD_I(1, tmp, tmp, sizeof(u32) * PROLOGUE_OFFSET), ctx);
- emit(A64_BR(tmp), ctx);
+ if (dyn_array) {
+ /* goto *(prog->bpf_func + prologue_offset); */
+ off = offsetof(struct bpf_prog, bpf_func);
+ emit_a64_mov_i64(tmp, off, ctx);
+ emit(A64_LDR64(tmp, prg, tmp), ctx);
+ emit(A64_ADD_I(1, tmp, tmp, sizeof(u32) * PROLOGUE_OFFSET), ctx);
+ emit(A64_BR(tmp), ctx);
+ } else {
+ /* goto *tgt; */
+ emit(A64_BR(prg), ctx);
+ }
if (ctx->image) {
off = &ctx->image[ctx->idx] - branch1;
@@ -701,6 +723,12 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
return 0;
}
+int bpf_arch_tail_call_prologue_offset(void)
+{
+ /* offset is in instructions, convert to bytes */
+ return PROLOGUE_OFFSET * 4;
+}
+
static int emit_atomic_ld_st(const struct bpf_insn *insn, struct jit_ctx *ctx)
{
const s32 imm = insn->imm;
@@ -1617,7 +1645,10 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
}
/* tail call */
case BPF_JMP | BPF_TAIL_CALL:
- if (emit_bpf_tail_call(ctx))
+ bool dynamic_array = (insn->imm >> 8) & 0xFF;
+ u32 map_index = insn->imm & 0xFF;
+
+ if (emit_bpf_tail_call(ctx, map_index, dynamic_array))
return -EFAULT;
break;
/* function return */
--
2.52.0
* [PATCH bpf-next 4/4] bpf, lib/test_bpf: Fix broken tailcall tests
2026-01-02 15:00 [PATCH bpf-next 0/4] bpf: tailcall: Eliminate max_entries and bpf_func access at runtime Leon Hwang
` (2 preceding siblings ...)
2026-01-02 15:00 ` [PATCH bpf-next 3/4] bpf, arm64: " Leon Hwang
@ 2026-01-02 15:00 ` Leon Hwang
3 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-01-02 15:00 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
David S . Miller, David Ahern, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Andrew Morton,
linux-arm-kernel, linux-kernel, netdev, kernel-patches-bot,
Leon Hwang
Update the tail call tests in test_bpf to work with the new tail call
optimization, which requires (a condensed setup sketch follows the list):
1. A valid used_maps array pointing to the prog array
2. Precomputed tail call targets in array->ptrs[max_entries + index]
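Condensed, the per-test setup now looks like this (simplified from the diff
below; allocation failure handling omitted):

  progs = kzalloc(struct_size(progs, ptrs, max_entries * 2), GFP_KERNEL);
  progs->map.max_entries = max_entries;            /* read by the JIT */

  fp->aux->used_maps = kmalloc_array(1, sizeof(struct bpf_map *), GFP_KERNEL);
  fp->aux->used_maps[0] = &progs->map;             /* imm bits 0-7 index this */
  fp->aux->used_map_cnt = 1;

  /* after the program is JITed (bpf_prog_select_runtime()): */
  progs->ptrs[max_entries + which] = (void *)fp->bpf_func + prologue_offset;
  progs->ptrs[which] = fp;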
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
lib/test_bpf.c | 39 ++++++++++++++++++++++++++++++++++-----
1 file changed, 34 insertions(+), 5 deletions(-)
diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index af0041df2b72..680d34d46f19 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -15448,26 +15448,45 @@ static void __init destroy_tail_call_tests(struct bpf_array *progs)
{
int i;
- for (i = 0; i < ARRAY_SIZE(tail_call_tests); i++)
- if (progs->ptrs[i])
- bpf_prog_free(progs->ptrs[i]);
+ for (i = 0; i < ARRAY_SIZE(tail_call_tests); i++) {
+ struct bpf_prog *fp = progs->ptrs[i];
+
+ if (!fp)
+ continue;
+
+ /*
+ * The used_maps points to fake maps that don't have
+ * proper ops, so clear it before bpf_prog_free to avoid
+ * bpf_free_used_maps trying to process it.
+ */
+ kfree(fp->aux->used_maps);
+ fp->aux->used_maps = NULL;
+ fp->aux->used_map_cnt = 0;
+ bpf_prog_free(fp);
+ }
kfree(progs);
}
static __init int prepare_tail_call_tests(struct bpf_array **pprogs)
{
+ int prologue_offset = bpf_arch_tail_call_prologue_offset();
int ntests = ARRAY_SIZE(tail_call_tests);
+ u32 max_entries = ntests + 1;
struct bpf_array *progs;
int which, err;
/* Allocate the table of programs to be used for tail calls */
- progs = kzalloc(struct_size(progs, ptrs, ntests + 1), GFP_KERNEL);
+ progs = kzalloc(struct_size(progs, ptrs, max_entries * 2), GFP_KERNEL);
if (!progs)
goto out_nomem;
+ /* Set max_entries before JIT, as it's used in JIT */
+ progs->map.max_entries = max_entries;
+
/* Create all eBPF programs and populate the table */
for (which = 0; which < ntests; which++) {
struct tail_call_test *test = &tail_call_tests[which];
+ struct bpf_map *map = &progs->map;
struct bpf_prog *fp;
int len, i;
@@ -15487,10 +15506,16 @@ static __init int prepare_tail_call_tests(struct bpf_array **pprogs)
if (!fp)
goto out_nomem;
+ fp->aux->used_maps = kmalloc_array(1, sizeof(map), GFP_KERNEL);
+ if (!fp->aux->used_maps)
+ goto out_nomem;
+
fp->len = len;
fp->type = BPF_PROG_TYPE_SOCKET_FILTER;
fp->aux->stack_depth = test->stack_depth;
fp->aux->tail_call_reachable = test->has_tail_call;
+ fp->aux->used_maps[0] = map;
+ fp->aux->used_map_cnt = 1;
memcpy(fp->insnsi, test->insns, len * sizeof(struct bpf_insn));
/* Relocate runtime tail call offsets and addresses */
@@ -15548,6 +15573,10 @@ static __init int prepare_tail_call_tests(struct bpf_array **pprogs)
if ((long)__bpf_call_base + insn->imm != addr)
*insn = BPF_JMP_A(0); /* Skip: NOP */
break;
+
+ case BPF_JMP | BPF_TAIL_CALL:
+ insn->imm = 0;
+ break;
}
}
@@ -15555,11 +15584,11 @@ static __init int prepare_tail_call_tests(struct bpf_array **pprogs)
if (err)
goto out_err;
+ progs->ptrs[max_entries + which] = (void *) fp->bpf_func + prologue_offset;
progs->ptrs[which] = fp;
}
/* The last entry contains a NULL program pointer */
- progs->map.max_entries = ntests + 1;
*pprogs = progs;
return 0;
--
2.52.0
* Re: [PATCH bpf-next 1/4] bpf: tailcall: Introduce bpf_arch_tail_call_prologue_offset
2026-01-02 15:00 ` [PATCH bpf-next 1/4] bpf: tailcall: Introduce bpf_arch_tail_call_prologue_offset Leon Hwang
@ 2026-01-02 15:21 ` bot+bpf-ci
2026-01-02 15:38 ` Leon Hwang
0 siblings, 1 reply; 7+ messages in thread
From: bot+bpf-ci @ 2026-01-02 15:21 UTC (permalink / raw)
To: leon.hwang, bpf
Cc: ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, puranjay, xukuohai,
catalin.marinas, will, davem, dsahern, tglx, mingo, bp,
dave.hansen, x86, hpa, akpm, linux-arm-kernel, linux-kernel,
netdev, kernel-patches-bot, leon.hwang, ast, andrii, daniel,
martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 1eeb31c5b317..beedd1281c22 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -127,6 +127,9 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
> array_size += (u64) max_entries * elem_size;
> }
> }
> + if (attr->map_type == BPF_MAP_TYPE_PROG_ARRAY && bpf_arch_tail_call_prologue_offset())
> + /* Store tailcall targets */
> + array_size += (u64) max_entries * sizeof(void *);
Should array_map_mem_usage() also account for this additional space?
Currently array_map_mem_usage() handles special cases for percpu arrays
and mmapable arrays, but has no corresponding check for PROG_ARRAY maps
when bpf_arch_tail_call_prologue_offset() is non-zero. This would cause
the reported memory usage to be lower than the actual allocation when
an architecture implements this function (e.g., x86_64 and arm64 in the
subsequent patches of this series).
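A possible shape of the fix, mirroring the allocation-side check (sketch
only; exact placement and locals in array_map_mem_usage() may differ):

  if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY &&
      bpf_arch_tail_call_prologue_offset())
          usage += (u64)map->max_entries * sizeof(void *);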
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/20660615603
* Re: [PATCH bpf-next 1/4] bpf: tailcall: Introduce bpf_arch_tail_call_prologue_offset
2026-01-02 15:21 ` bot+bpf-ci
@ 2026-01-02 15:38 ` Leon Hwang
0 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-01-02 15:38 UTC (permalink / raw)
To: bot+bpf-ci, bpf
Cc: ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, puranjay, xukuohai,
catalin.marinas, will, davem, dsahern, tglx, mingo, bp,
dave.hansen, x86, hpa, akpm, linux-arm-kernel, linux-kernel,
netdev, kernel-patches-bot, martin.lau, clm, ihor.solodrai
On 2026/1/2 23:21, bot+bpf-ci@kernel.org wrote:
>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
>> index 1eeb31c5b317..beedd1281c22 100644
>> --- a/kernel/bpf/arraymap.c
>> +++ b/kernel/bpf/arraymap.c
>> @@ -127,6 +127,9 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
>> array_size += (u64) max_entries * elem_size;
>> }
>> }
>> + if (attr->map_type == BPF_MAP_TYPE_PROG_ARRAY && bpf_arch_tail_call_prologue_offset())
>> + /* Store tailcall targets */
>> + array_size += (u64) max_entries * sizeof(void *);
>
> Should array_map_mem_usage() also account for this additional space?
> Currently array_map_mem_usage() handles special cases for percpu arrays
> and mmapable arrays, but has no corresponding check for PROG_ARRAY maps
> when bpf_arch_tail_call_prologue_offset() is non-zero. This would cause
> the reported memory usage to be lower than the actual allocation when
> an architecture implements this function (e.g., x86_64 and arm64 in the
> subsequent patches of this series).
>
> [ ... ]
>
You are right, array_map_mem_usage() needs to stay in sync with the
allocation logic to avoid under-reporting memory.
I will include this fix in the next revision after I’ve collected more
feedback.
Thanks,
Leon