[PATCH bpf-next v3 0/2] bpf: Relax 8 frame limitation for global subprogs

public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH bpf-next v3 0/2] bpf: Relax 8 frame limitation for global subprogs
@ 2026-03-03  4:31 Emil Tsalapatis
  2026-03-03  4:31 ` [PATCH bpf-next v3 1/2] bpf: Only enforce 8 frame call stack limit for all-static stacks Emil Tsalapatis
  2026-03-03  4:31 ` [PATCH bpf-next v3 2/2] bpf: Add deep call stack selftests Emil Tsalapatis
  0 siblings, 2 replies; 12+ messages in thread
From: Emil Tsalapatis @ 2026-03-03  4:31 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, eddyz87, martin.lau, memxor, song,
	yonghong.song, Emil Tsalapatis

The BPF verifier currently limits the maximum runtime call stack to
8 frames. Larger BPF programs like sched-ext schedulers routinely
fail verification because they exceed this limit, even as they use
very little actual stack space for each frame.

Relax the verifier to permit call stacks > 8 frames deep when the
call stacks include global subprogs. The old 8 stack frame limit now
only applies to call stacks composed entirely of static function calls.
This works because global functions are each verified in isolation, so
the verifier does not need to cross-reference verification state across
the function call boundary, which has been the reason for limiting the
call stack size in the first place.

This patch does not change the verification time limit of 8 stack
frames. Static functions that are inlined for verification purposes
still only go 8 frames deep to avoid changing the verifier's internal
data structures used for verification. These data structures only
support holding information on up to 8 stack frames.

This patch also does not adjust the actual maximum stack size of 512.

CHANGELOG
=========
v2 -> v3 (https://lore.kernel.org/bpf/20260210213606.475415-1-emil@etsalapatis.com/)
- Change logic to remove arbitrary limit on call depth (Eduard)
- Add additional selftests (Eduard)

v1 -> v2 (https://lore.kernel.org/bpf/20260202233716.835638-1-emil@etsalapatis.com)
- Adjust patch to only increase the runtime stack depth, leaving the
verification-time stack depth unchanged (Alexei)

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>

Emil Tsalapatis (2):
  bpf: Only enforce 8 frame call stack limit for all-static stacks
  bpf: Add deep call stack selftests

 include/linux/bpf_verifier.h                  |  6 ++
 kernel/bpf/verifier.c                         | 43 ++++++----
 .../bpf/prog_tests/test_global_funcs.c        |  2 +
 .../selftests/bpf/progs/test_global_func3.c   | 18 ++---
 .../bpf/progs/test_global_func_deep_stack.c   | 79 +++++++++++++++++++
 5 files changed, 123 insertions(+), 25 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_global_func_deep_stack.c

-- 
2.49.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH bpf-next v3 1/2] bpf: Only enforce 8 frame call stack limit for all-static stacks
  2026-03-03  4:31 [PATCH bpf-next v3 0/2] bpf: Relax 8 frame limitation for global subprogs Emil Tsalapatis
@ 2026-03-03  4:31 ` Emil Tsalapatis
  2026-03-03  5:13   ` bot+bpf-ci
                     ` (2 more replies)
  2026-03-03  4:31 ` [PATCH bpf-next v3 2/2] bpf: Add deep call stack selftests Emil Tsalapatis
  1 sibling, 3 replies; 12+ messages in thread
From: Emil Tsalapatis @ 2026-03-03  4:31 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, eddyz87, martin.lau, memxor, song,
	yonghong.song, Emil Tsalapatis

The BPF verifier currently enforces a call stack depth of 8 frames,
regardless of the actual stack space consumption of those frames. The
limit is necessary for static call stacks, because the bookkeeping data
structures used by the verifier when stepping into static functions
during verification only support 8 stack frames. However, this
limitation only matters for static stack frames: Global subprogs are
verified by themselves and do not require limiting the call depth.

Relax this limitation to only apply to static stack frames. Verification
now only fails when there is a sequence of 8 calls to non-global
subprogs. Calling into a global subprog resets the counter. This allows
deeper call stacks, provided all frames still fit in the stack.

The change does not increase the maximum size of the call stack, only
the maximum number of frames we can place in it.

Also change the progs/test_global_func3.c selftest to use static
functions, since with the new patch it would otherwise unexpectedly
pass verification.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
---
 include/linux/bpf_verifier.h                  |  6 +++
 kernel/bpf/verifier.c                         | 43 ++++++++++++-------
 .../selftests/bpf/progs/test_global_func3.c   | 18 ++++----
 3 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index c1e30096ea7b..39a54e631bcd 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -650,6 +650,8 @@ enum priv_stack_mode {
 	PRIV_STACK_ADAPTIVE,
 };
 
+struct bpf_subprog_info;
+
 struct bpf_subprog_info {
 	/* 'start' has to be the first field otherwise find_subprog() won't work */
 	u32 start; /* insn idx of function entry point */
@@ -677,6 +679,10 @@ struct bpf_subprog_info {
 
 	enum priv_stack_mode priv_stack_mode;
 	struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
+
+	int ret_insn;
+	int frame;
+	int cidx;
 };
 
 struct bpf_verifier_env;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1153a828ce8d..d362ddd47d71 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6652,9 +6652,11 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
 	struct bpf_insn *insn = env->prog->insnsi;
 	int depth = 0, frame = 0, i, subprog_end, subprog_depth;
 	bool tail_call_reachable = false;
-	int ret_insn[MAX_CALL_FRAMES];
-	int ret_prog[MAX_CALL_FRAMES];
-	int j;
+	int total;
+	int tmp;
+
+	/* no caller idx */
+	subprog[idx].cidx = -1;
 
 	i = subprog[idx].start;
 	if (!priv_stack_supported)
@@ -6706,8 +6708,11 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
 	} else {
 		depth += subprog_depth;
 		if (depth > MAX_BPF_STACK) {
+			for (total = 1; subprog[idx].cidx >= 0 ; total++)
+				idx = subprog[idx].cidx;
+
 			verbose(env, "combined stack size of %d calls is %d. Too large\n",
-				frame + 1, depth);
+				total, depth);
 			return -EACCES;
 		}
 	}
@@ -6723,8 +6728,8 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
 				continue;
 			if (subprog[idx].is_cb)
 				err = true;
-			for (int c = 0; c < frame && !err; c++) {
-				if (subprog[ret_prog[c]].is_cb) {
+			for (tmp = idx; tmp >= 0 && !err; tmp = subprog[tmp].cidx) {
+				if (subprog[tmp].is_cb) {
 					err = true;
 					break;
 				}
@@ -6740,8 +6745,9 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
 		if (!bpf_pseudo_call(insn + i) && !bpf_pseudo_func(insn + i))
 			continue;
 		/* remember insn and function to return to */
-		ret_insn[frame] = i + 1;
-		ret_prog[frame] = idx;
+
+		subprog[idx].frame = frame;
+		subprog[idx].ret_insn = i + 1;
 
 		/* find the callee */
 		next_insn = i + insn[i].imm + 1;
@@ -6762,6 +6768,9 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
 			}
 		}
 		i = next_insn;
+
+		/* caller idx */
+		subprog[sidx].cidx = idx;
 		idx = sidx;
 		if (!priv_stack_supported)
 			subprog[idx].priv_stack_mode = NO_PRIV_STACK;
@@ -6769,7 +6778,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
 		if (subprog[idx].has_tail_call)
 			tail_call_reachable = true;
 
-		frame++;
+		frame = subprog_is_global(env, idx) ? 0 : frame + 1;
 		if (frame >= MAX_CALL_FRAMES) {
 			verbose(env, "the call stack of %d frames is too deep !\n",
 				frame);
@@ -6783,12 +6792,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
 	 * tail call counter throughout bpf2bpf calls combined with tailcalls
 	 */
 	if (tail_call_reachable)
-		for (j = 0; j < frame; j++) {
-			if (subprog[ret_prog[j]].is_exception_cb) {
+		for (tmp = idx; tmp >= 0; tmp = subprog[tmp].cidx) {
+			if (subprog[tmp].is_exception_cb) {
 				verbose(env, "cannot tail call within exception cb\n");
 				return -EINVAL;
 			}
-			subprog[ret_prog[j]].tail_call_reachable = true;
+			subprog[tmp].tail_call_reachable = true;
 		}
 	if (subprog[0].tail_call_reachable)
 		env->prog->aux->tail_call_reachable = true;
@@ -6796,13 +6805,15 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
 	/* end of for() loop means the last insn of the 'subprog'
 	 * was reached. Doesn't matter whether it was JA or EXIT
 	 */
-	if (frame == 0)
+	if (frame == 0 && subprog[idx].cidx < 0)
 		return 0;
 	if (subprog[idx].priv_stack_mode != PRIV_STACK_ADAPTIVE)
 		depth -= round_up_stack_depth(env, subprog[idx].stack_depth);
-	frame--;
-	i = ret_insn[frame];
-	idx = ret_prog[frame];
+
+	idx = subprog[idx].cidx;
+	frame = subprog[idx].frame;
+	i = subprog[idx].ret_insn;
+
 	goto continue_func;
 }
 
diff --git a/tools/testing/selftests/bpf/progs/test_global_func3.c b/tools/testing/selftests/bpf/progs/test_global_func3.c
index 142b682d3c2f..974fd8c19561 100644
--- a/tools/testing/selftests/bpf/progs/test_global_func3.c
+++ b/tools/testing/selftests/bpf/progs/test_global_func3.c
@@ -5,56 +5,56 @@
 #include <bpf/bpf_helpers.h>
 #include "bpf_misc.h"
 
-__attribute__ ((noinline))
+static __attribute__ ((noinline))
 int f1(struct __sk_buff *skb)
 {
 	return skb->len;
 }
 
-__attribute__ ((noinline))
+static __attribute__ ((noinline))
 int f2(int val, struct __sk_buff *skb)
 {
 	return f1(skb) + val;
 }
 
-__attribute__ ((noinline))
+static __attribute__ ((noinline))
 int f3(int val, struct __sk_buff *skb, int var)
 {
 	return f2(var, skb) + val;
 }
 
-__attribute__ ((noinline))
+static __attribute__ ((noinline))
 int f4(struct __sk_buff *skb)
 {
 	return f3(1, skb, 2);
 }
 
-__attribute__ ((noinline))
+static __attribute__ ((noinline))
 int f5(struct __sk_buff *skb)
 {
 	return f4(skb);
 }
 
-__attribute__ ((noinline))
+static __attribute__ ((noinline))
 int f6(struct __sk_buff *skb)
 {
 	return f5(skb);
 }
 
-__attribute__ ((noinline))
+static __attribute__ ((noinline))
 int f7(struct __sk_buff *skb)
 {
 	return f6(skb);
 }
 
-__attribute__ ((noinline))
+static __attribute__ ((noinline))
 int f8(struct __sk_buff *skb)
 {
 	return f7(skb);
 }
 
 SEC("tc")
-__failure __msg("the call stack of 8 frames")
+__failure __msg("the call stack of 9 frames")
 int global_func3(struct __sk_buff *skb)
 {
 	return f8(skb);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH bpf-next v3 2/2] bpf: Add deep call stack selftests
  2026-03-03  4:31 [PATCH bpf-next v3 0/2] bpf: Relax 8 frame limitation for global subprogs Emil Tsalapatis
  2026-03-03  4:31 ` [PATCH bpf-next v3 1/2] bpf: Only enforce 8 frame call stack limit for all-static stacks Emil Tsalapatis
@ 2026-03-03  4:31 ` Emil Tsalapatis
  2026-03-04  1:15   ` Eduard Zingerman
  1 sibling, 1 reply; 12+ messages in thread
From: Emil Tsalapatis @ 2026-03-03  4:31 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, eddyz87, martin.lau, memxor, song,
	yonghong.song, Emil Tsalapatis

Add tests that demonstrate the verifier support for deep call stacks
while still enforcing maximum stack size limits.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
---
 .../bpf/prog_tests/test_global_funcs.c        |  2 +
 .../bpf/progs/test_global_func_deep_stack.c   | 79 +++++++++++++++++++
 2 files changed, 81 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/test_global_func_deep_stack.c

diff --git a/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c b/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c
index e905cbaf6b3d..500446808908 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c
@@ -18,6 +18,7 @@
 #include "test_global_func15.skel.h"
 #include "test_global_func16.skel.h"
 #include "test_global_func17.skel.h"
+#include "test_global_func_deep_stack.skel.h"
 #include "test_global_func_ctx_args.skel.h"
 
 #include "bpf/libbpf_internal.h"
@@ -155,6 +156,7 @@ void test_test_global_funcs(void)
 	RUN_TESTS(test_global_func15);
 	RUN_TESTS(test_global_func16);
 	RUN_TESTS(test_global_func17);
+	RUN_TESTS(test_global_func_deep_stack);
 	RUN_TESTS(test_global_func_ctx_args);
 
 	if (test__start_subtest("ctx_arg_rewrite"))
diff --git a/tools/testing/selftests/bpf/progs/test_global_func_deep_stack.c b/tools/testing/selftests/bpf/progs/test_global_func_deep_stack.c
new file mode 100644
index 000000000000..02b1a0f9d6ac
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_global_func_deep_stack.c
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2026 Meta Platforms, Inc and affiliates. */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+/*
+ * Macro tricks to tersely define for long non-recursive call chains. Add
+ * computation to the functions prevent tail recursion from reducing the
+ * stack size to 0.
+ */
+
+#define CAT(a, b) a ## b
+#define XCAT(a, b) CAT(a, b)
+
+#define F_0 \
+__attribute__((noinline)) \
+int f0(unsigned long a) { volatile long b = a + 16; if (a == 0) return 0; return b; }
+
+#define FN(n, prev) \
+__attribute__((noinline)) \
+int XCAT(f, n)(unsigned long a) { volatile long b = XCAT(f, prev)(a - 1); if (!b) return 0; return b + 1; }
+
+/* Call chain 33 levels deep. */
+#define F_1 F_0		FN(1, 0)
+#define F_2 F_1   	FN(2, 1)
+#define F_3 F_2   	FN(3, 2)
+#define F_4 F_3   	FN(4, 3)
+#define F_5 F_4   	FN(5, 4)
+#define F_6 F_5   	FN(6, 5)
+#define F_7 F_6   	FN(7, 6)
+#define F_8 F_7   	FN(8, 7)
+#define F_9 F_8   	FN(9, 8)
+#define F_10 F_9	FN(10, 9)
+#define F_11 F_10	FN(11, 10)
+#define F_12 F_11   	FN(12, 11)
+#define F_13 F_12   	FN(13, 12)
+#define F_14 F_13   	FN(14, 13)
+#define F_15 F_14   	FN(15, 14)
+#define F_16 F_15   	FN(16, 15)
+#define F_17 F_16   	FN(17, 16)
+#define F_18 F_17   	FN(18, 17)
+#define F_19 F_18   	FN(19, 18)
+#define F_20 F_19	FN(20, 19)
+#define F_21 F_20	FN(21, 20)
+#define F_22 F_21   	FN(22, 21)
+#define F_23 F_22   	FN(23, 22)
+#define F_24 F_23   	FN(24, 23)
+#define F_25 F_24   	FN(25, 24)
+#define F_26 F_25   	FN(26, 25)
+#define F_27 F_26   	FN(27, 26)
+#define F_28 F_27   	FN(28, 27)
+#define F_29 F_28   	FN(29, 28)
+#define F_30 F_29	FN(30, 29)
+#define F_31 F_30	FN(31, 30)
+#define F_32 F_31   	FN(32, 31)
+
+#define CAT2(a, b) a ## b
+#define XCAT2(a, b) CAT2(a, b)
+
+#define F(n) XCAT2(F_, n)
+
+F(32)
+
+/* Ensure that even 32 levels deep, the function verifies. */
+SEC("syscall")
+__success
+int global_func_deep_stack_success(struct __sk_buff *skb)
+{
+	return f31(55);
+}
+
+/* Check we actually honor stack limits (33 * 16 = 528 > 512 = MAX_STACK_DEPTH). */
+SEC("syscall")
+__failure __msg("combined stack size of 34 calls")
+int global_func_deep_stack_fail(struct __sk_buff *skb)
+{
+	return f32(123);
+}
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Only enforce 8 frame call stack limit for all-static stacks
  2026-03-03  4:31 ` [PATCH bpf-next v3 1/2] bpf: Only enforce 8 frame call stack limit for all-static stacks Emil Tsalapatis
@ 2026-03-03  5:13   ` bot+bpf-ci
  2026-03-03 17:55     ` Emil Tsalapatis
  2026-03-04  1:01   ` Eduard Zingerman
  2026-03-04 16:51   ` Mykyta Yatsenko
  2 siblings, 1 reply; 12+ messages in thread
From: bot+bpf-ci @ 2026-03-03  5:13 UTC (permalink / raw)
  To: emil, bpf
  Cc: andrii, ast, daniel, eddyz87, martin.lau, memxor, song,
	yonghong.song, emil, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 3048 bytes --]

> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -650,6 +650,8 @@ enum priv_stack_mode {
>  	PRIV_STACK_ADAPTIVE,
>  };
>
> +struct bpf_subprog_info;
> +
>  struct bpf_subprog_info {

This isn't a bug, but the forward declaration above the struct
definition appears unnecessary.  The new fields (ret_insn, frame,
cidx) are all plain int types, not self-referential pointers that
would require forward declaration.  Is there a reason for it?

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c

[ ... ]

> @@ -6799,7 +6808,7 @@
>  		if (subprog[idx].has_tail_call)
>  			tail_call_reachable = true;
>
> -		frame++;
> +		frame = subprog_is_global(env, idx) ? 0 : frame + 1;
>  		if (frame >= MAX_CALL_FRAMES) {
>  			verbose(env, "the call stack of %d frames is too deep !\n",
>  				frame);

In the selftest below, the expected error message was changed to
"the call stack of 9 frames", but this verbose() call still prints
the raw frame counter.

With 9 static functions in the chain (global_func3 + f1 through f8),
frame reaches 8 when entering f1, and the verifier outputs "the call
stack of 8 frames is too deep !" -- which does not contain "9 frames".

The MAX_BPF_STACK error path was updated from frame + 1 to a cidx
chain walk for total count:

    for (total = 1; subprog[idx].cidx >= 0 ; total++)
        idx = subprog[idx].cidx;
    verbose(env, "combined stack size of %d calls ...\n", total, depth);

Was a similar update intended here?  Changing frame to frame + 1 in the
verbose() call would match the selftest expectation and be consistent
with how the stack overflow message now reports total count.

> diff --git a/tools/testing/selftests/bpf/progs/test_global_func3.c b/tools/testing/selftests/bpf/progs/test_global_func3.c
> --- a/tools/testing/selftests/bpf/progs/test_global_func3.c
> +++ b/tools/testing/selftests/bpf/progs/test_global_func3.c

[ ... ]

> @@ -5,56 +5,56 @@
>  SEC("tc")
> -__failure __msg("the call stack of 8 frames")
> +__failure __msg("the call stack of 9 frames")
>  int global_func3(struct __sk_buff *skb)
>  {
>  	return f8(skb);

This expected message does not match the verifier output as described
above.

Also, the function header comment for check_max_stack_depth_subprog()
currently reads:

    Since recursion is prevented by check_cfg() this algorithm
    only needs a local stack of MAX_CALL_FRAMES to remember callsites

The local arrays ret_insn[] and ret_prog[] have been removed in favor
of struct fields (subprog[].cidx, subprog[].frame, subprog[].ret_insn).
Should this comment be updated to reflect the new approach?  The v2
patch did update this comment for its design.

---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/22608683847

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Only enforce 8 frame call stack limit for all-static stacks
  2026-03-03  5:13   ` bot+bpf-ci
@ 2026-03-03 17:55     ` Emil Tsalapatis
  0 siblings, 0 replies; 12+ messages in thread
From: Emil Tsalapatis @ 2026-03-03 17:55 UTC (permalink / raw)
  To: bot+bpf-ci, bpf
  Cc: andrii, ast, daniel, eddyz87, martin.lau, memxor, song,
	yonghong.song, clm, ihor.solodrai

On Tue Mar 3, 2026 at 12:13 AM EST, bot+bpf-ci wrote:
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -650,6 +650,8 @@ enum priv_stack_mode {
>>  	PRIV_STACK_ADAPTIVE,
>>  };
>>
>> +struct bpf_subprog_info;
>> +
>>  struct bpf_subprog_info {
>
> This isn't a bug, but the forward declaration above the struct
> definition appears unnecessary.  The new fields (ret_insn, frame,
> cidx) are all plain int types, not self-referential pointers that
> would require forward declaration.  Is there a reason for it?
>

Ack, it _used_ to be self referential and I forgot to remove it.
Will fix.

>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>
> [ ... ]
>
>> @@ -6799,7 +6808,7 @@
>>  		if (subprog[idx].has_tail_call)
>>  			tail_call_reachable = true;
>>
>> -		frame++;
>> +		frame = subprog_is_global(env, idx) ? 0 : frame + 1;
>>  		if (frame >= MAX_CALL_FRAMES) {
>>  			verbose(env, "the call stack of %d frames is too deep !\n",
>>  				frame);
>
> In the selftest below, the expected error message was changed to
> "the call stack of 9 frames", but this verbose() call still prints
> the raw frame counter.
>
> With 9 static functions in the chain (global_func3 + f1 through f8),
> frame reaches 8 when entering f1, and the verifier outputs "the call
> stack of 8 frames is too deep !" -- which does not contain "9 frames".
>
> The MAX_BPF_STACK error path was updated from frame + 1 to a cidx
> chain walk for total count:
>
>     for (total = 1; subprog[idx].cidx >= 0 ; total++)
>         idx = subprog[idx].cidx;
>     verbose(env, "combined stack size of %d calls ...\n", total, depth);
>
> Was a similar update intended here?  Changing frame to frame + 1 in the
> verbose() call would match the selftest expectation and be consistent
> with how the stack overflow message now reports total count.
>

No, that one should still be either 8 or 9 (see below) because it's the 
actual limit - including the global frames in the tally would be confusing. 
Maybe the right thing is to adjust the commit message.

>> diff --git a/tools/testing/selftests/bpf/progs/test_global_func3.c b/tools/testing/selftests/bpf/progs/test_global_func3.c
>> --- a/tools/testing/selftests/bpf/progs/test_global_func3.c
>> +++ b/tools/testing/selftests/bpf/progs/test_global_func3.c
>
> [ ... ]
>
>> @@ -5,56 +5,56 @@
>>  SEC("tc")
>> -__failure __msg("the call stack of 8 frames")
>> +__failure __msg("the call stack of 9 frames")
>>  int global_func3(struct __sk_buff *skb)
>>  {
>>  	return f8(skb);
>
> This expected message does not match the verifier output as described
> above.
>
> Also, the function header comment for check_max_stack_depth_subprog()
> currently reads:
>
>     Since recursion is prevented by check_cfg() this algorithm
>     only needs a local stack of MAX_CALL_FRAMES to remember callsites
>
> The local arrays ret_insn[] and ret_prog[] have been removed in favor
> of struct fields (subprog[].cidx, subprog[].frame, subprog[].ret_insn).
> Should this comment be updated to reflect the new approach?  The v2
> patch did update this comment for its design.
>

It does though, all tests are passing: There are two almost identical messages 
in the verifier that report off-by-one different numbers: 

"the call stack of %d frames is too deep",  (setup_func_entry)
"the call stack of %d frames is too deep !" (check_max_stack_depth_subprog)

The one triggering now is in setup_func_entry inside do_check_{main,
subprogs}, so the number of frames reported has changed. Expecting 9
frames is consistent with other selftests.

>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/22608683847


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Only enforce 8 frame call stack limit for all-static stacks
  2026-03-03  4:31 ` [PATCH bpf-next v3 1/2] bpf: Only enforce 8 frame call stack limit for all-static stacks Emil Tsalapatis
  2026-03-03  5:13   ` bot+bpf-ci
@ 2026-03-04  1:01   ` Eduard Zingerman
  2026-03-05 19:38     ` Emil Tsalapatis
  2026-03-04 16:51   ` Mykyta Yatsenko
  2 siblings, 1 reply; 12+ messages in thread
From: Eduard Zingerman @ 2026-03-04  1:01 UTC (permalink / raw)
  To: Emil Tsalapatis, bpf
  Cc: andrii, ast, daniel, martin.lau, memxor, song, yonghong.song

On Mon, 2026-03-02 at 23:31 -0500, Emil Tsalapatis wrote:
> The BPF verifier currently enforces a call stack depth of 8 frames,
> regardless of the actual stack space consumption of those frames. The
> limit is necessary for static call stacks, because the bookkeeping data
> structures used by the verifier when stepping into static functions
> during verification only support 8 stack frames. However, this
> limitation only matters for static stack frames: Global subprogs are
> verified by themselves and do not require limiting the call depth.
> 
> Relax this limitation to only apply to static stack frames. Verification
> now only fails when there is a sequence of 8 calls to non-global
> subprogs. Calling into a global subprog resets the counter. This allows
> deeper call stacks, provided all frames still fit in the stack.
> 
> The change does not increase the maximum size of the call stack, only
> the maximum number of frames we can place in it.
> 
> Also change the progs/test_global_func3.c selftest to use static
> functions, since with the new patch it would otherwise unexpectedly
> pass verification.
> 
> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
> ---
>  include/linux/bpf_verifier.h                  |  6 +++
>  kernel/bpf/verifier.c                         | 43 ++++++++++++-------
>  .../selftests/bpf/progs/test_global_func3.c   | 18 ++++----
>  3 files changed, 42 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index c1e30096ea7b..39a54e631bcd 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -650,6 +650,8 @@ enum priv_stack_mode {
>  	PRIV_STACK_ADAPTIVE,
>  };
>  
> +struct bpf_subprog_info;
> +
>  struct bpf_subprog_info {
>  	/* 'start' has to be the first field otherwise find_subprog() won't work */
>  	u32 start; /* insn idx of function entry point */
> @@ -677,6 +679,10 @@ struct bpf_subprog_info {
>  
>  	enum priv_stack_mode priv_stack_mode;
>  	struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
> +

Nit: please add a comment here saying the below are temporary values
     used in check_max_stack_depth_subprog() (tbh, I'd move those to a
     separate array in env but maybe that's an overkill).

> +	int ret_insn;
> +	int frame;
> +	int cidx;

Nit: please rename to `caller` and add a comment.

>  };
>  
>  struct bpf_verifier_env;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 1153a828ce8d..d362ddd47d71 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -6652,9 +6652,11 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  	struct bpf_insn *insn = env->prog->insnsi;
>  	int depth = 0, frame = 0, i, subprog_end, subprog_depth;
>  	bool tail_call_reachable = false;
> -	int ret_insn[MAX_CALL_FRAMES];
> -	int ret_prog[MAX_CALL_FRAMES];
> -	int j;
> +	int total;
> +	int tmp;
> +
> +	/* no caller idx */
> +	subprog[idx].cidx = -1;
>  
>  	i = subprog[idx].start;
>  	if (!priv_stack_supported)
> @@ -6706,8 +6708,11 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  	} else {
>  		depth += subprog_depth;
>  		if (depth > MAX_BPF_STACK) {
> +			for (total = 1; subprog[idx].cidx >= 0 ; total++)
> +				idx = subprog[idx].cidx;
> +
>  			verbose(env, "combined stack size of %d calls is %d. Too large\n",
> -				frame + 1, depth);
> +				total, depth);
>  			return -EACCES;
>  		}
>  	}
> @@ -6723,8 +6728,8 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  				continue;
>  			if (subprog[idx].is_cb)
>  				err = true;

Below the old version of the loop does not include current frame, but
the new version *does* include current frame. Looks like the above
check can be removed.

> -			for (int c = 0; c < frame && !err; c++) {
> -				if (subprog[ret_prog[c]].is_cb) {
> +			for (tmp = idx; tmp >= 0 && !err; tmp = subprog[tmp].cidx) {
> +				if (subprog[tmp].is_cb) {
>  					err = true;
>  					break;
>  				}

This code checks that bpf_throw() cannot be called from callbacks.
The `is_cb` is set in push_callback_call() both for sync and async
callbacks, it is also set for exception callback separately.
Meaning that check_return_code() can be simplified further.

> @@ -6740,8 +6745,9 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  		if (!bpf_pseudo_call(insn + i) && !bpf_pseudo_func(insn + i))
>  			continue;
>  		/* remember insn and function to return to */
> -		ret_insn[frame] = i + 1;
> -		ret_prog[frame] = idx;
> +
> +		subprog[idx].frame = frame;
> +		subprog[idx].ret_insn = i + 1;

Nit: move this down to `subprog[sidx].cidx = idx;`, so that the whole
     "frame" setup is done in one place?

>  
>  		/* find the callee */
>  		next_insn = i + insn[i].imm + 1;
> @@ -6762,6 +6768,9 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  			}
>  		}
>  		i = next_insn;
> +
> +		/* caller idx */
> +		subprog[sidx].cidx = idx;
>  		idx = sidx;
>  		if (!priv_stack_supported)
>  			subprog[idx].priv_stack_mode = NO_PRIV_STACK;
> @@ -6769,7 +6778,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  		if (subprog[idx].has_tail_call)
>  			tail_call_reachable = true;
>  
> -		frame++;
> +		frame = subprog_is_global(env, idx) ? 0 : frame + 1;
>  		if (frame >= MAX_CALL_FRAMES) {
>  			verbose(env, "the call stack of %d frames is too deep !\n",
>  				frame);
> @@ -6783,12 +6792,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  	 * tail call counter throughout bpf2bpf calls combined with tailcalls
>  	 */
>  	if (tail_call_reachable)

Unrelated to current patch, but looks like an issue.
`tail_call_reachable` is not reset anywhere in
check_max_stack_depth_subprog(). Once it flips to true all calls
visited afterwards trigger spine to be marked.

> -		for (j = 0; j < frame; j++) {
> -			if (subprog[ret_prog[j]].is_exception_cb) {
> +		for (tmp = idx; tmp >= 0; tmp = subprog[tmp].cidx) {
> +			if (subprog[tmp].is_exception_cb) {

As with previous such loop, the new code seem to include current frame.
Does this change anything?

>  				verbose(env, "cannot tail call within exception cb\n");
>  				return -EINVAL;
>  			}
> -			subprog[ret_prog[j]].tail_call_reachable = true;
> +			subprog[tmp].tail_call_reachable = true;
>  		}
>  	if (subprog[0].tail_call_reachable)
>  		env->prog->aux->tail_call_reachable = true;
> @@ -6796,13 +6805,15 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  	/* end of for() loop means the last insn of the 'subprog'
>  	 * was reached. Doesn't matter whether it was JA or EXIT
>  	 */
> -	if (frame == 0)
> +	if (frame == 0 && subprog[idx].cidx < 0)
>  		return 0;
>  	if (subprog[idx].priv_stack_mode != PRIV_STACK_ADAPTIVE)
>  		depth -= round_up_stack_depth(env, subprog[idx].stack_depth);
> -	frame--;
> -	i = ret_insn[frame];
> -	idx = ret_prog[frame];
> +
> +	idx = subprog[idx].cidx;
> +	frame = subprog[idx].frame;
> +	i = subprog[idx].ret_insn;
> +
>  	goto continue_func;
>  }
>  

[...]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 2/2] bpf: Add deep call stack selftests
  2026-03-03  4:31 ` [PATCH bpf-next v3 2/2] bpf: Add deep call stack selftests Emil Tsalapatis
@ 2026-03-04  1:15   ` Eduard Zingerman
  2026-03-05 17:37     ` Emil Tsalapatis
  0 siblings, 1 reply; 12+ messages in thread
From: Eduard Zingerman @ 2026-03-04  1:15 UTC (permalink / raw)
  To: Emil Tsalapatis, bpf
  Cc: andrii, ast, daniel, martin.lau, memxor, song, yonghong.song

On Mon, 2026-03-02 at 23:31 -0500, Emil Tsalapatis wrote:
> Add tests that demonstrate the verifier support for deep call stacks
> while still enforcing maximum stack size limits.
> 
> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> +/* Check we actually honor stack limits (33 * 16 = 528 > 512 = MAX_STACK_DEPTH). */

Nit: I was wondering why stack usage per func is 16, as each one only
     consumes one slot. Turns out round_up_stack_depth() rounds things
     up to 16. Could you please either add a comment here, or just add
     a second long in each function?

> +SEC("syscall")
> +__failure __msg("combined stack size of 34 calls")
> +int global_func_deep_stack_fail(struct __sk_buff *skb)
> +{
> +	return f32(123);
> +}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Only enforce 8 frame call stack limit for all-static stacks
  2026-03-03  4:31 ` [PATCH bpf-next v3 1/2] bpf: Only enforce 8 frame call stack limit for all-static stacks Emil Tsalapatis
  2026-03-03  5:13   ` bot+bpf-ci
  2026-03-04  1:01   ` Eduard Zingerman
@ 2026-03-04 16:51   ` Mykyta Yatsenko
  2026-03-05 17:36     ` Emil Tsalapatis
  2 siblings, 1 reply; 12+ messages in thread
From: Mykyta Yatsenko @ 2026-03-04 16:51 UTC (permalink / raw)
  To: Emil Tsalapatis, bpf
  Cc: andrii, ast, daniel, eddyz87, martin.lau, memxor, song,
	yonghong.song, Emil Tsalapatis

Emil Tsalapatis <emil@etsalapatis.com> writes:

> The BPF verifier currently enforces a call stack depth of 8 frames,
> regardless of the actual stack space consumption of those frames. The
> limit is necessary for static call stacks, because the bookkeeping data
> structures used by the verifier when stepping into static functions
> during verification only support 8 stack frames. However, this
> limitation only matters for static stack frames: Global subprogs are
> verified by themselves and do not require limiting the call depth.
>
> Relax this limitation to only apply to static stack frames. Verification
> now only fails when there is a sequence of 8 calls to non-global
> subprogs. Calling into a global subprog resets the counter. This allows
> deeper call stacks, provided all frames still fit in the stack.
>
> The change does not increase the maximum size of the call stack, only
> the maximum number of frames we can place in it.
>
> Also change the progs/test_global_func3.c selftest to use static
> functions, since with the new patch it would otherwise unexpectedly
> pass verification.
>
> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
> ---
>  include/linux/bpf_verifier.h                  |  6 +++
>  kernel/bpf/verifier.c                         | 43 ++++++++++++-------
>  .../selftests/bpf/progs/test_global_func3.c   | 18 ++++----
>  3 files changed, 42 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index c1e30096ea7b..39a54e631bcd 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -650,6 +650,8 @@ enum priv_stack_mode {
>  	PRIV_STACK_ADAPTIVE,
>  };
>  
> +struct bpf_subprog_info;
> +
>  struct bpf_subprog_info {
>  	/* 'start' has to be the first field otherwise find_subprog() won't work */
>  	u32 start; /* insn idx of function entry point */
> @@ -677,6 +679,10 @@ struct bpf_subprog_info {
>  
>  	enum priv_stack_mode priv_stack_mode;
>  	struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
> +
> +	int ret_insn;
> +	int frame;
> +	int cidx;
nit: can you add comment that cidx means caller index, i saw it below
and it clarifies this code a lot.
>  };
>  
>  struct bpf_verifier_env;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 1153a828ce8d..d362ddd47d71 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -6652,9 +6652,11 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  	struct bpf_insn *insn = env->prog->insnsi;
>  	int depth = 0, frame = 0, i, subprog_end, subprog_depth;
>  	bool tail_call_reachable = false;
> -	int ret_insn[MAX_CALL_FRAMES];
> -	int ret_prog[MAX_CALL_FRAMES];
> -	int j;
> +	int total;
> +	int tmp;
> +
> +	/* no caller idx */
> +	subprog[idx].cidx = -1;
>  
>  	i = subprog[idx].start;
>  	if (!priv_stack_supported)
> @@ -6706,8 +6708,11 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  	} else {
>  		depth += subprog_depth;
>  		if (depth > MAX_BPF_STACK) {
> +			for (total = 1; subprog[idx].cidx >= 0 ; total++)
nit: I think the way this loop is implemented below is more readable:
for (tmp = idx; tmp >= 0; tmp = subprog[tmp].cidx)
    total++;
it is confusing now, because for loop initializes and increments total,
but tracks cidx for exit check. 
> +				idx = subprog[idx].cidx;
> +
>  			verbose(env, "combined stack size of %d calls is %d. Too large\n",
> -				frame + 1, depth);
> +				total, depth);
>  			return -EACCES;
>  		}
>  	}
> @@ -6723,8 +6728,8 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  				continue;
>  			if (subprog[idx].is_cb)
>  				err = true;
> -			for (int c = 0; c < frame && !err; c++) {
> -				if (subprog[ret_prog[c]].is_cb) {
> +			for (tmp = idx; tmp >= 0 && !err; tmp = subprog[tmp].cidx) {
> +				if (subprog[tmp].is_cb) {
>  					err = true;
>  					break;
>  				}
> @@ -6740,8 +6745,9 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  		if (!bpf_pseudo_call(insn + i) && !bpf_pseudo_func(insn + i))
>  			continue;
>  		/* remember insn and function to return to */
> -		ret_insn[frame] = i + 1;
> -		ret_prog[frame] = idx;
> +
> +		subprog[idx].frame = frame;
> +		subprog[idx].ret_insn = i + 1;
>  
>  		/* find the callee */
>  		next_insn = i + insn[i].imm + 1;
> @@ -6762,6 +6768,9 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  			}
>  		}
>  		i = next_insn;
> +
> +		/* caller idx */
> +		subprog[sidx].cidx = idx;
>  		idx = sidx;
>  		if (!priv_stack_supported)
>  			subprog[idx].priv_stack_mode = NO_PRIV_STACK;
> @@ -6769,7 +6778,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  		if (subprog[idx].has_tail_call)
>  			tail_call_reachable = true;
>  
> -		frame++;
> +		frame = subprog_is_global(env, idx) ? 0 : frame + 1;
>  		if (frame >= MAX_CALL_FRAMES) {
>  			verbose(env, "the call stack of %d frames is too deep !\n",
>  				frame);
> @@ -6783,12 +6792,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  	 * tail call counter throughout bpf2bpf calls combined with tailcalls
>  	 */
>  	if (tail_call_reachable)
> -		for (j = 0; j < frame; j++) {
> -			if (subprog[ret_prog[j]].is_exception_cb) {
> +		for (tmp = idx; tmp >= 0; tmp = subprog[tmp].cidx) {
> +			if (subprog[tmp].is_exception_cb) {
>  				verbose(env, "cannot tail call within exception cb\n");
>  				return -EINVAL;
>  			}
> -			subprog[ret_prog[j]].tail_call_reachable = true;
> +			subprog[tmp].tail_call_reachable = true;
>  		}
>  	if (subprog[0].tail_call_reachable)
>  		env->prog->aux->tail_call_reachable = true;
> @@ -6796,13 +6805,15 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>  	/* end of for() loop means the last insn of the 'subprog'
>  	 * was reached. Doesn't matter whether it was JA or EXIT
>  	 */
> -	if (frame == 0)
> +	if (frame == 0 && subprog[idx].cidx < 0)
>  		return 0;
>  	if (subprog[idx].priv_stack_mode != PRIV_STACK_ADAPTIVE)
>  		depth -= round_up_stack_depth(env, subprog[idx].stack_depth);
> -	frame--;
> -	i = ret_insn[frame];
> -	idx = ret_prog[frame];
> +
> +	idx = subprog[idx].cidx;
> +	frame = subprog[idx].frame;
> +	i = subprog[idx].ret_insn;
> +
>  	goto continue_func;
>  }
>  
> diff --git a/tools/testing/selftests/bpf/progs/test_global_func3.c b/tools/testing/selftests/bpf/progs/test_global_func3.c
> index 142b682d3c2f..974fd8c19561 100644
> --- a/tools/testing/selftests/bpf/progs/test_global_func3.c
> +++ b/tools/testing/selftests/bpf/progs/test_global_func3.c
> @@ -5,56 +5,56 @@
>  #include <bpf/bpf_helpers.h>
>  #include "bpf_misc.h"
>  
> -__attribute__ ((noinline))
> +static __attribute__ ((noinline))
>  int f1(struct __sk_buff *skb)
>  {
>  	return skb->len;
>  }
>  
> -__attribute__ ((noinline))
> +static __attribute__ ((noinline))
>  int f2(int val, struct __sk_buff *skb)
>  {
>  	return f1(skb) + val;
>  }
>  
> -__attribute__ ((noinline))
> +static __attribute__ ((noinline))
>  int f3(int val, struct __sk_buff *skb, int var)
>  {
>  	return f2(var, skb) + val;
>  }
>  
> -__attribute__ ((noinline))
> +static __attribute__ ((noinline))
>  int f4(struct __sk_buff *skb)
>  {
>  	return f3(1, skb, 2);
>  }
>  
> -__attribute__ ((noinline))
> +static __attribute__ ((noinline))
>  int f5(struct __sk_buff *skb)
>  {
>  	return f4(skb);
>  }
>  
> -__attribute__ ((noinline))
> +static __attribute__ ((noinline))
>  int f6(struct __sk_buff *skb)
>  {
>  	return f5(skb);
>  }
>  
> -__attribute__ ((noinline))
> +static __attribute__ ((noinline))
>  int f7(struct __sk_buff *skb)
>  {
>  	return f6(skb);
>  }
>  
> -__attribute__ ((noinline))
> +static __attribute__ ((noinline))
>  int f8(struct __sk_buff *skb)
>  {
>  	return f7(skb);
>  }
>  
>  SEC("tc")
> -__failure __msg("the call stack of 8 frames")
> +__failure __msg("the call stack of 9 frames")
>  int global_func3(struct __sk_buff *skb)
>  {
>  	return f8(skb);
> -- 
> 2.49.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Only enforce 8 frame call stack limit for all-static stacks
  2026-03-04 16:51   ` Mykyta Yatsenko
@ 2026-03-05 17:36     ` Emil Tsalapatis
  0 siblings, 0 replies; 12+ messages in thread
From: Emil Tsalapatis @ 2026-03-05 17:36 UTC (permalink / raw)
  To: Mykyta Yatsenko, bpf
  Cc: andrii, ast, daniel, eddyz87, martin.lau, memxor, song,
	yonghong.song

On Wed Mar 4, 2026 at 11:51 AM EST, Mykyta Yatsenko wrote:
> Emil Tsalapatis <emil@etsalapatis.com> writes:
>
>> The BPF verifier currently enforces a call stack depth of 8 frames,
>> regardless of the actual stack space consumption of those frames. The
>> limit is necessary for static call stacks, because the bookkeeping data
>> structures used by the verifier when stepping into static functions
>> during verification only support 8 stack frames. However, this
>> limitation only matters for static stack frames: Global subprogs are
>> verified by themselves and do not require limiting the call depth.
>>
>> Relax this limitation to only apply to static stack frames. Verification
>> now only fails when there is a sequence of 8 calls to non-global
>> subprogs. Calling into a global subprog resets the counter. This allows
>> deeper call stacks, provided all frames still fit in the stack.
>>
>> The change does not increase the maximum size of the call stack, only
>> the maximum number of frames we can place in it.
>>
>> Also change the progs/test_global_func3.c selftest to use static
>> functions, since with the new patch it would otherwise unexpectedly
>> pass verification.
>>
>> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
>> ---
>>  include/linux/bpf_verifier.h                  |  6 +++
>>  kernel/bpf/verifier.c                         | 43 ++++++++++++-------
>>  .../selftests/bpf/progs/test_global_func3.c   | 18 ++++----
>>  3 files changed, 42 insertions(+), 25 deletions(-)
>>
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index c1e30096ea7b..39a54e631bcd 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -650,6 +650,8 @@ enum priv_stack_mode {
>>  	PRIV_STACK_ADAPTIVE,
>>  };
>>  
>> +struct bpf_subprog_info;
>> +
>>  struct bpf_subprog_info {
>>  	/* 'start' has to be the first field otherwise find_subprog() won't work */
>>  	u32 start; /* insn idx of function entry point */
>> @@ -677,6 +679,10 @@ struct bpf_subprog_info {
>>  
>>  	enum priv_stack_mode priv_stack_mode;
>>  	struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
>> +
>> +	int ret_insn;
>> +	int frame;
>> +	int cidx;
> nit: can you add comment that cidx means caller index, i saw it below
> and it clarifies this code a lot.
>>  };

Ack, I will add a comment on this.

>>  
>>  struct bpf_verifier_env;
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 1153a828ce8d..d362ddd47d71 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -6652,9 +6652,11 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  	struct bpf_insn *insn = env->prog->insnsi;
>>  	int depth = 0, frame = 0, i, subprog_end, subprog_depth;
>>  	bool tail_call_reachable = false;
>> -	int ret_insn[MAX_CALL_FRAMES];
>> -	int ret_prog[MAX_CALL_FRAMES];
>> -	int j;
>> +	int total;
>> +	int tmp;
>> +
>> +	/* no caller idx */
>> +	subprog[idx].cidx = -1;
>>  
>>  	i = subprog[idx].start;
>>  	if (!priv_stack_supported)
>> @@ -6706,8 +6708,11 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  	} else {
>>  		depth += subprog_depth;
>>  		if (depth > MAX_BPF_STACK) {
>> +			for (total = 1; subprog[idx].cidx >= 0 ; total++)
> nit: I think the way this loop is implemented below is more readable:
> for (tmp = idx; tmp >= 0; tmp = subprog[tmp].cidx)
>     total++;
> it is confusing now, because for loop initializes and increments total,
> but tracks cidx for exit check. 

Ack, I'll update it in the respin to be more like the others.

>> +				idx = subprog[idx].cidx;
>> +
>>  			verbose(env, "combined stack size of %d calls is %d. Too large\n",
>> -				frame + 1, depth);
>> +				total, depth);
>>  			return -EACCES;
>>  		}
>>  	}
>> @@ -6723,8 +6728,8 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  				continue;
>>  			if (subprog[idx].is_cb)
>>  				err = true;
>> -			for (int c = 0; c < frame && !err; c++) {
>> -				if (subprog[ret_prog[c]].is_cb) {
>> +			for (tmp = idx; tmp >= 0 && !err; tmp = subprog[tmp].cidx) {
>> +				if (subprog[tmp].is_cb) {
>>  					err = true;
>>  					break;
>>  				}
>> @@ -6740,8 +6745,9 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  		if (!bpf_pseudo_call(insn + i) && !bpf_pseudo_func(insn + i))
>>  			continue;
>>  		/* remember insn and function to return to */
>> -		ret_insn[frame] = i + 1;
>> -		ret_prog[frame] = idx;
>> +
>> +		subprog[idx].frame = frame;
>> +		subprog[idx].ret_insn = i + 1;
>>  
>>  		/* find the callee */
>>  		next_insn = i + insn[i].imm + 1;
>> @@ -6762,6 +6768,9 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  			}
>>  		}
>>  		i = next_insn;
>> +
>> +		/* caller idx */
>> +		subprog[sidx].cidx = idx;
>>  		idx = sidx;
>>  		if (!priv_stack_supported)
>>  			subprog[idx].priv_stack_mode = NO_PRIV_STACK;
>> @@ -6769,7 +6778,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  		if (subprog[idx].has_tail_call)
>>  			tail_call_reachable = true;
>>  
>> -		frame++;
>> +		frame = subprog_is_global(env, idx) ? 0 : frame + 1;
>>  		if (frame >= MAX_CALL_FRAMES) {
>>  			verbose(env, "the call stack of %d frames is too deep !\n",
>>  				frame);
>> @@ -6783,12 +6792,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  	 * tail call counter throughout bpf2bpf calls combined with tailcalls
>>  	 */
>>  	if (tail_call_reachable)
>> -		for (j = 0; j < frame; j++) {
>> -			if (subprog[ret_prog[j]].is_exception_cb) {
>> +		for (tmp = idx; tmp >= 0; tmp = subprog[tmp].cidx) {
>> +			if (subprog[tmp].is_exception_cb) {
>>  				verbose(env, "cannot tail call within exception cb\n");
>>  				return -EINVAL;
>>  			}
>> -			subprog[ret_prog[j]].tail_call_reachable = true;
>> +			subprog[tmp].tail_call_reachable = true;
>>  		}
>>  	if (subprog[0].tail_call_reachable)
>>  		env->prog->aux->tail_call_reachable = true;
>> @@ -6796,13 +6805,15 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  	/* end of for() loop means the last insn of the 'subprog'
>>  	 * was reached. Doesn't matter whether it was JA or EXIT
>>  	 */
>> -	if (frame == 0)
>> +	if (frame == 0 && subprog[idx].cidx < 0)
>>  		return 0;
>>  	if (subprog[idx].priv_stack_mode != PRIV_STACK_ADAPTIVE)
>>  		depth -= round_up_stack_depth(env, subprog[idx].stack_depth);
>> -	frame--;
>> -	i = ret_insn[frame];
>> -	idx = ret_prog[frame];
>> +
>> +	idx = subprog[idx].cidx;
>> +	frame = subprog[idx].frame;
>> +	i = subprog[idx].ret_insn;
>> +
>>  	goto continue_func;
>>  }
>>  
>> diff --git a/tools/testing/selftests/bpf/progs/test_global_func3.c b/tools/testing/selftests/bpf/progs/test_global_func3.c
>> index 142b682d3c2f..974fd8c19561 100644
>> --- a/tools/testing/selftests/bpf/progs/test_global_func3.c
>> +++ b/tools/testing/selftests/bpf/progs/test_global_func3.c
>> @@ -5,56 +5,56 @@
>>  #include <bpf/bpf_helpers.h>
>>  #include "bpf_misc.h"
>>  
>> -__attribute__ ((noinline))
>> +static __attribute__ ((noinline))
>>  int f1(struct __sk_buff *skb)
>>  {
>>  	return skb->len;
>>  }
>>  
>> -__attribute__ ((noinline))
>> +static __attribute__ ((noinline))
>>  int f2(int val, struct __sk_buff *skb)
>>  {
>>  	return f1(skb) + val;
>>  }
>>  
>> -__attribute__ ((noinline))
>> +static __attribute__ ((noinline))
>>  int f3(int val, struct __sk_buff *skb, int var)
>>  {
>>  	return f2(var, skb) + val;
>>  }
>>  
>> -__attribute__ ((noinline))
>> +static __attribute__ ((noinline))
>>  int f4(struct __sk_buff *skb)
>>  {
>>  	return f3(1, skb, 2);
>>  }
>>  
>> -__attribute__ ((noinline))
>> +static __attribute__ ((noinline))
>>  int f5(struct __sk_buff *skb)
>>  {
>>  	return f4(skb);
>>  }
>>  
>> -__attribute__ ((noinline))
>> +static __attribute__ ((noinline))
>>  int f6(struct __sk_buff *skb)
>>  {
>>  	return f5(skb);
>>  }
>>  
>> -__attribute__ ((noinline))
>> +static __attribute__ ((noinline))
>>  int f7(struct __sk_buff *skb)
>>  {
>>  	return f6(skb);
>>  }
>>  
>> -__attribute__ ((noinline))
>> +static __attribute__ ((noinline))
>>  int f8(struct __sk_buff *skb)
>>  {
>>  	return f7(skb);
>>  }
>>  
>>  SEC("tc")
>> -__failure __msg("the call stack of 8 frames")
>> +__failure __msg("the call stack of 9 frames")
>>  int global_func3(struct __sk_buff *skb)
>>  {
>>  	return f8(skb);
>> -- 
>> 2.49.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 2/2] bpf: Add deep call stack selftests
  2026-03-04  1:15   ` Eduard Zingerman
@ 2026-03-05 17:37     ` Emil Tsalapatis
  0 siblings, 0 replies; 12+ messages in thread
From: Emil Tsalapatis @ 2026-03-05 17:37 UTC (permalink / raw)
  To: Eduard Zingerman, bpf
  Cc: andrii, ast, daniel, martin.lau, memxor, song, yonghong.song

On Tue Mar 3, 2026 at 8:15 PM EST, Eduard Zingerman wrote:
> On Mon, 2026-03-02 at 23:31 -0500, Emil Tsalapatis wrote:
>> Add tests that demonstrate the verifier support for deep call stacks
>> while still enforcing maximum stack size limits.
>> 
>> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
>> ---
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
>> +/* Check we actually honor stack limits (33 * 16 = 528 > 512 = MAX_STACK_DEPTH). */
>
> Nit: I was wondering why stack usage per func is 16, as each one only
>      consumes one slot. Turns out round_up_stack_depth() rounds things
>      up to 16. Could you please either add a comment here, or just add
>      a second long in each function?
>

Ack, I will clarify why this is the case in a comment.

>> +SEC("syscall")
>> +__failure __msg("combined stack size of 34 calls")
>> +int global_func_deep_stack_fail(struct __sk_buff *skb)
>> +{
>> +	return f32(123);
>> +}


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Only enforce 8 frame call stack limit for all-static stacks
  2026-03-04  1:01   ` Eduard Zingerman
@ 2026-03-05 19:38     ` Emil Tsalapatis
  2026-03-05 20:46       ` Eduard Zingerman
  0 siblings, 1 reply; 12+ messages in thread
From: Emil Tsalapatis @ 2026-03-05 19:38 UTC (permalink / raw)
  To: Eduard Zingerman, bpf
  Cc: andrii, ast, daniel, martin.lau, memxor, song, yonghong.song

On Tue Mar 3, 2026 at 8:01 PM EST, Eduard Zingerman wrote:
> On Mon, 2026-03-02 at 23:31 -0500, Emil Tsalapatis wrote:
>> The BPF verifier currently enforces a call stack depth of 8 frames,
>> regardless of the actual stack space consumption of those frames. The
>> limit is necessary for static call stacks, because the bookkeeping data
>> structures used by the verifier when stepping into static functions
>> during verification only support 8 stack frames. However, this
>> limitation only matters for static stack frames: Global subprogs are
>> verified by themselves and do not require limiting the call depth.
>> 
>> Relax this limitation to only apply to static stack frames. Verification
>> now only fails when there is a sequence of 8 calls to non-global
>> subprogs. Calling into a global subprog resets the counter. This allows
>> deeper call stacks, provided all frames still fit in the stack.
>> 
>> The change does not increase the maximum size of the call stack, only
>> the maximum number of frames we can place in it.
>> 
>> Also change the progs/test_global_func3.c selftest to use static
>> functions, since with the new patch it would otherwise unexpectedly
>> pass verification.
>> 
>> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
>> ---
>>  include/linux/bpf_verifier.h                  |  6 +++
>>  kernel/bpf/verifier.c                         | 43 ++++++++++++-------
>>  .../selftests/bpf/progs/test_global_func3.c   | 18 ++++----
>>  3 files changed, 42 insertions(+), 25 deletions(-)
>> 
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index c1e30096ea7b..39a54e631bcd 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -650,6 +650,8 @@ enum priv_stack_mode {
>>  	PRIV_STACK_ADAPTIVE,
>>  };
>>  
>> +struct bpf_subprog_info;
>> +
>>  struct bpf_subprog_info {
>>  	/* 'start' has to be the first field otherwise find_subprog() won't work */
>>  	u32 start; /* insn idx of function entry point */
>> @@ -677,6 +679,10 @@ struct bpf_subprog_info {
>>  
>>  	enum priv_stack_mode priv_stack_mode;
>>  	struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
>> +
>
> Nit: please add a comment here saying the below are temporary values
>      used in check_max_stack_depth_subprog() (tbh, I'd move those to a
>      separate array in env but maybe that's an overkill).
>
>> +	int ret_insn;
>> +	int frame;
>> +	int cidx;
>
> Nit: please rename to `caller` and add a comment.
>
>>  };

Ack (and to all other nits).

>>  
>>  struct bpf_verifier_env;
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 1153a828ce8d..d362ddd47d71 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -6652,9 +6652,11 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  	struct bpf_insn *insn = env->prog->insnsi;
>>  	int depth = 0, frame = 0, i, subprog_end, subprog_depth;
>>  	bool tail_call_reachable = false;
>> -	int ret_insn[MAX_CALL_FRAMES];
>> -	int ret_prog[MAX_CALL_FRAMES];
>> -	int j;
>> +	int total;
>> +	int tmp;
>> +
>> +	/* no caller idx */
>> +	subprog[idx].cidx = -1;
>>  
>>  	i = subprog[idx].start;
>>  	if (!priv_stack_supported)
>> @@ -6706,8 +6708,11 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  	} else {
>>  		depth += subprog_depth;
>>  		if (depth > MAX_BPF_STACK) {
>> +			for (total = 1; subprog[idx].cidx >= 0 ; total++)
>> +				idx = subprog[idx].cidx;
>> +
>>  			verbose(env, "combined stack size of %d calls is %d. Too large\n",
>> -				frame + 1, depth);
>> +				total, depth);
>>  			return -EACCES;
>>  		}
>>  	}
>> @@ -6723,8 +6728,8 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  				continue;
>>  			if (subprog[idx].is_cb)
>>  				err = true;
>
> Below the old version of the loop does not include current frame, but
> the new version *does* include current frame. Looks like the above
> check can be removed.
>

Makes sense.

>> -			for (int c = 0; c < frame && !err; c++) {
>> -				if (subprog[ret_prog[c]].is_cb) {
>> +			for (tmp = idx; tmp >= 0 && !err; tmp = subprog[tmp].cidx) {
>> +				if (subprog[tmp].is_cb) {
>>  					err = true;
>>  					break;
>>  				}
>
> This code checks that bpf_throw() cannot be called from callbacks.
> The `is_cb` is set in push_callback_call() both for sync and async
> callbacks, it is also set for exception callback separately.
> Meaning that check_return_code() can be simplified further.
>

Do you mean we can simplify the check for choosing between
check_return_code and check_global_subprog_return_code to just check
is_cb instead of checking both is_async_cb and is_exception_cb?

>> @@ -6740,8 +6745,9 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  		if (!bpf_pseudo_call(insn + i) && !bpf_pseudo_func(insn + i))
>>  			continue;
>>  		/* remember insn and function to return to */
>> -		ret_insn[frame] = i + 1;
>> -		ret_prog[frame] = idx;
>> +
>> +		subprog[idx].frame = frame;
>> +		subprog[idx].ret_insn = i + 1;
>
> Nit: move this down to `subprog[sidx].cidx = idx;`, so that the whole
>      "frame" setup is done in one place?
>
>>  
>>  		/* find the callee */
>>  		next_insn = i + insn[i].imm + 1;
>> @@ -6762,6 +6768,9 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  			}
>>  		}
>>  		i = next_insn;
>> +
>> +		/* caller idx */
>> +		subprog[sidx].cidx = idx;
>>  		idx = sidx;
>>  		if (!priv_stack_supported)
>>  			subprog[idx].priv_stack_mode = NO_PRIV_STACK;
>> @@ -6769,7 +6778,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  		if (subprog[idx].has_tail_call)
>>  			tail_call_reachable = true;
>>  
>> -		frame++;
>> +		frame = subprog_is_global(env, idx) ? 0 : frame + 1;
>>  		if (frame >= MAX_CALL_FRAMES) {
>>  			verbose(env, "the call stack of %d frames is too deep !\n",
>>  				frame);
>> @@ -6783,12 +6792,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  	 * tail call counter throughout bpf2bpf calls combined with tailcalls
>>  	 */
>>  	if (tail_call_reachable)
>
> Unrelated to current patch, but looks like an issue.
> `tail_call_reachable` is not reset anywhere in
> check_max_stack_depth_subprog(). Once it flips to true all calls
> visited afterwards trigger spine to be marked.
>

You're right, this is never unset. I can send a followup patchset for
this along with the followups for the check_return_code patch. I believe
we can change this to eagerly set the subprog's tail_call_reachable flag
directly instead of using a stack variable. 

>> -		for (j = 0; j < frame; j++) {
>> -			if (subprog[ret_prog[j]].is_exception_cb) {
>> +		for (tmp = idx; tmp >= 0; tmp = subprog[tmp].cidx) {
>> +			if (subprog[tmp].is_exception_cb) {
>
> As with previous such loop, the new code seem to include current frame.
> Does this change anything?
>

I think it is ok, though since as you pointed out tail_call_reachable doesn't 
currently seem to work correctly I can only assume: We set tail_call_reachable 
if any of the subprogs called by the current one make a tail call. If
that's the case, the current frame should also count as tail_call_reachable.

It all depends on where tail_call_reachable is supposed to be set and unset. I 
think it's supposed to flow from callee to caller, and if that's the
case we should hoist up the whole loop to where we initially set the
stack variable.

>>  				verbose(env, "cannot tail call within exception cb\n");
>>  				return -EINVAL;
>>  			}
>> -			subprog[ret_prog[j]].tail_call_reachable = true;
>> +			subprog[tmp].tail_call_reachable = true;
>>  		}
>>  	if (subprog[0].tail_call_reachable)
>>  		env->prog->aux->tail_call_reachable = true;
>> @@ -6796,13 +6805,15 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>  	/* end of for() loop means the last insn of the 'subprog'
>>  	 * was reached. Doesn't matter whether it was JA or EXIT
>>  	 */
>> -	if (frame == 0)
>> +	if (frame == 0 && subprog[idx].cidx < 0)
>>  		return 0;
>>  	if (subprog[idx].priv_stack_mode != PRIV_STACK_ADAPTIVE)
>>  		depth -= round_up_stack_depth(env, subprog[idx].stack_depth);
>> -	frame--;
>> -	i = ret_insn[frame];
>> -	idx = ret_prog[frame];
>> +
>> +	idx = subprog[idx].cidx;
>> +	frame = subprog[idx].frame;
>> +	i = subprog[idx].ret_insn;
>> +
>>  	goto continue_func;
>>  }
>>  
>
> [...]


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Only enforce 8 frame call stack limit for all-static stacks
  2026-03-05 19:38     ` Emil Tsalapatis
@ 2026-03-05 20:46       ` Eduard Zingerman
  0 siblings, 0 replies; 12+ messages in thread
From: Eduard Zingerman @ 2026-03-05 20:46 UTC (permalink / raw)
  To: Emil Tsalapatis, bpf
  Cc: andrii, ast, daniel, martin.lau, memxor, song, yonghong.song

On Thu, 2026-03-05 at 14:38 -0500, Emil Tsalapatis wrote:

[...]

> > > @@ -6723,8 +6728,8 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
> > >  				continue;
> > >  			if (subprog[idx].is_cb)
> > >  				err = true;
> > 
> > Below the old version of the loop does not include current frame, but
> > the new version *does* include current frame. Looks like the above
> > check can be removed.
> > 
> 
> Makes sense.
> 
> > > -			for (int c = 0; c < frame && !err; c++) {
> > > -				if (subprog[ret_prog[c]].is_cb) {
> > > +			for (tmp = idx; tmp >= 0 && !err; tmp = subprog[tmp].cidx) {
> > > +				if (subprog[tmp].is_cb) {
> > >  					err = true;
> > >  					break;
> > >  				}
> > 
> > This code checks that bpf_throw() cannot be called from callbacks.
> > The `is_cb` is set in push_callback_call() both for sync and async
> > callbacks, it is also set for exception callback separately.
> > Meaning that check_return_code() can be simplified further.
> > 
> 
> Do you mean we can simplify the check for choosing between
> check_return_code and check_global_subprog_return_code to just check
> is_cb instead of checking both is_async_cb and is_exception_cb?

Actually, I was thinking about `frame->in_async_callback_fn` checks in
check_return_code(). But on a second thought these are unrelated.
Please skip my comment above.

[...]

> > > @@ -6783,12 +6792,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
> > >  	 * tail call counter throughout bpf2bpf calls combined with tailcalls
> > >  	 */
> > >  	if (tail_call_reachable)
> > 
> > Unrelated to current patch, but looks like an issue.
> > `tail_call_reachable` is not reset anywhere in
> > check_max_stack_depth_subprog(). Once it flips to true all calls
> > visited afterwards trigger spine to be marked.
> > 
> 
> You're right, this is never unset. I can send a followup patchset for
> this along with the followups for the check_return_code patch. I believe
> we can change this to eagerly set the subprog's tail_call_reachable flag
> directly instead of using a stack variable. 
> 
> > > -		for (j = 0; j < frame; j++) {
> > > -			if (subprog[ret_prog[j]].is_exception_cb) {
> > > +		for (tmp = idx; tmp >= 0; tmp = subprog[tmp].cidx) {
> > > +			if (subprog[tmp].is_exception_cb) {
> > 
> > As with previous such loop, the new code seem to include current frame.
> > Does this change anything?
> > 
> 
> I think it is ok, though since as you pointed out tail_call_reachable doesn't 
> currently seem to work correctly I can only assume: We set tail_call_reachable 
> if any of the subprogs called by the current one make a tail call. If
> that's the case, the current frame should also count as tail_call_reachable.
> 
> It all depends on where tail_call_reachable is supposed to be set and unset. I 
> think it's supposed to flow from callee to caller, and if that's the
> case we should hoist up the whole loop to where we initially set the
> stack variable.

So, the tail_call_reachable flag is transferred to subprog[*].tail_call_reachable
property. In check_subprogs() there is direct `subprog[cur_subprog].tail_call_reachable = true`
if tail call exists in the subprogram. Looks like behavior does not really change
with this patch.

> > >  				verbose(env, "cannot tail call within exception cb\n");
> > >  				return -EINVAL;
> > >  			}
> > > -			subprog[ret_prog[j]].tail_call_reachable = true;
> > > +			subprog[tmp].tail_call_reachable = true;
> > >  		}
> > >  	if (subprog[0].tail_call_reachable)
> > >  		env->prog->aux->tail_call_reachable = true;

[...]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-03-05 20:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-03  4:31 [PATCH bpf-next v3 0/2] bpf: Relax 8 frame limitation for global subprogs Emil Tsalapatis
2026-03-03  4:31 ` [PATCH bpf-next v3 1/2] bpf: Only enforce 8 frame call stack limit for all-static stacks Emil Tsalapatis
2026-03-03  5:13   ` bot+bpf-ci
2026-03-03 17:55     ` Emil Tsalapatis
2026-03-04  1:01   ` Eduard Zingerman
2026-03-05 19:38     ` Emil Tsalapatis
2026-03-05 20:46       ` Eduard Zingerman
2026-03-04 16:51   ` Mykyta Yatsenko
2026-03-05 17:36     ` Emil Tsalapatis
2026-03-03  4:31 ` [PATCH bpf-next v3 2/2] bpf: Add deep call stack selftests Emil Tsalapatis
2026-03-04  1:15   ` Eduard Zingerman
2026-03-05 17:37     ` Emil Tsalapatis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox