public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next 00/10] bpf: Support stack arguments for BPF functions and kfuncs
@ 2026-04-02  1:27 Yonghong Song
  2026-04-02  1:27 ` [PATCH bpf-next 01/10] bpf: Introduce bpf register BPF_REG_STACK_ARG_BASE Yonghong Song
                   ` (9 more replies)
  0 siblings, 10 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02  1:27 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

Currently, bpf function calls and kfunc calls are limited to 5
register-based parameters. For bpf function calls with more than 5
parameters, developers can force inlining or pack the extra
parameters into a struct and pass a pointer to that struct. But there
is no such workaround for kfuncs when more than 5 parameters are needed.

This patch set lifts the 5-argument limit by introducing stack-based
argument passing for BPF functions and kfunc's, coordinated with
compiler support in LLVM [1]. The compiler emits stores/loads through
a new bpf register r12 (BPF_REG_STACK_ARG_BASE) to pass arguments beyond
the 5th, keeping the stack arg area separate from the r10-based program
stack. The maximum number of arguments is capped at MAX_BPF_FUNC_ARGS
(12), which is sufficient for the vast majority of use cases.
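As a quick plain-C model of the layout (offsets taken from the
illustrations in patch 3; the helper name is made up for illustration),
an incoming stack argument n (n > 5) lives at a fixed negative offset
from r12:

```c
#include <assert.h>

#define MAX_BPF_FUNC_REG_ARGS 5
#define BPF_REG_SIZE 8

/* Hypothetical helper: r12-relative offset of incoming stack arg n
 * (1-based, n > 5). Arg 6 lives at r12-8, arg 7 at r12-16, etc. */
static int incoming_stack_arg_off(int n)
{
	return -(n - MAX_BPF_FUNC_REG_ARGS) * BPF_REG_SIZE;
}
```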

The x86_64 JIT translates r12-relative accesses to RBP-relative
native instructions. Each function's stack allocation is extended
by (incoming + max_outgoing) bytes to hold the stack arg area below
the program stack. This makes implementation easier as the r10 can
be reused for stack argument access. At BPF-to-BPF call sites, outgoing
args are pushed onto the native stack before CALL and popped after
return. For kfunc calls, args are marshaled per the x86_64 C calling
convention (arg 6 in R9, args 7+ on the native stack).
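A sketch of the marshaling rule as a plain-C lookup (register
placement per the paragraph above; the exact native-stack offsets for
args 7+ are an assumption for illustration, not taken from the patches):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of where a kfunc argument n (1-based) ends
 * up on x86_64: args 1-5 in the usual argument registers, arg 6 in R9,
 * args 7+ on the native stack (assumed 8 bytes apart). */
static const char *kfunc_arg_location(int n, char *buf, size_t len)
{
	if (n <= 5)
		snprintf(buf, len, "reg arg %d", n);
	else if (n == 6)
		snprintf(buf, len, "R9");
	else
		snprintf(buf, len, "native stack +%d", (n - 7) * 8);
	return buf;
}
```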

Global subprogs with >5 args are not yet supported. Only x86_64
is supported for now.

As for the patch breakdown, patches 1-5 add verifier support for
stack arguments for bpf-to-bpf functions and kfunc's. Patch 6
enables stack arguments on x86_64. Patch 7 implements the JIT
support for x86_64. Patches 8-10 add selftests.

  [1] https://github.com/llvm/llvm-project/pull/189060 

Yonghong Song (10):
  bpf: Introduce bpf register BPF_REG_STACK_ARG_BASE
  bpf: Reuse MAX_BPF_FUNC_ARGS for maximum number of arguments
  bpf: Support stack arguments for bpf functions
  bpf: Support stack arguments for kfunc calls
  bpf: Reject stack arguments in non-JITed programs
  bpf: Enable stack argument support for x86_64
  bpf,x86: Implement JIT support for stack arguments
  selftests/bpf: Add tests for BPF function stack arguments
  selftests/bpf: Add negative test for oversized kfunc stack argument
  selftests/bpf: Add verifier tests for stack argument validation

 arch/x86/net/bpf_jit_comp.c                   | 150 +++++++-
 include/linux/bpf.h                           |   6 +
 include/linux/bpf_verifier.h                  |  15 +-
 include/linux/filter.h                        |   4 +-
 kernel/bpf/btf.c                              |  21 +-
 kernel/bpf/core.c                             |  12 +-
 kernel/bpf/verifier.c                         | 351 ++++++++++++++++--
 .../selftests/bpf/prog_tests/stack_arg.c      | 143 +++++++
 .../selftests/bpf/prog_tests/stack_arg_fail.c |  24 ++
 .../selftests/bpf/prog_tests/verifier.c       |   2 +
 tools/testing/selftests/bpf/progs/stack_arg.c | 111 ++++++
 .../selftests/bpf/progs/stack_arg_fail.c      |  32 ++
 .../selftests/bpf/progs/stack_arg_kfunc.c     |  59 +++
 .../selftests/bpf/progs/verifier_stack_arg.c  | 122 ++++++
 .../selftests/bpf/test_kmods/bpf_testmod.c    |  29 ++
 .../bpf/test_kmods/bpf_testmod_kfunc.h        |  14 +
 16 files changed, 1045 insertions(+), 50 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/stack_arg.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/stack_arg_fail.c
 create mode 100644 tools/testing/selftests/bpf/progs/stack_arg.c
 create mode 100644 tools/testing/selftests/bpf/progs/stack_arg_fail.c
 create mode 100644 tools/testing/selftests/bpf/progs/stack_arg_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_stack_arg.c

-- 
2.52.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH bpf-next 01/10] bpf: Introduce bpf register BPF_REG_STACK_ARG_BASE
  2026-04-02  1:27 [PATCH bpf-next 00/10] bpf: Support stack arguments for BPF functions and kfuncs Yonghong Song
@ 2026-04-02  1:27 ` Yonghong Song
  2026-04-02  1:27 ` [PATCH bpf-next 02/10] bpf: Reuse MAX_BPF_FUNC_ARGS for maximum number of arguments Yonghong Song
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02  1:27 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

The newly-added register BPF_REG_STACK_ARG_BASE corresponds to bpf
register R12 introduced by the LLVM change [1]. R12 is used as the base
for stack arguments so the stack arg area does not interfere with the
R10-based program stack.

  [1] https://github.com/llvm/llvm-project/pull/189060
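The numbering can be mirrored in plain C to check the invariants the
BUILD_BUG_ON()s below assert (MAX_BPF_REG is 11 in the kernel, i.e.
r0-r10 plus one):

```c
#include <assert.h>

/* Mirror of the register numbering after this patch. */
#define MAX_BPF_REG		11	/* r0..r10 */
#define BPF_REG_AX		MAX_BPF_REG
#define BPF_REG_STACK_ARG_BASE	(MAX_BPF_REG + 1)	/* r12 */
#define MAX_BPF_EXT_REG		(MAX_BPF_REG + 2)
#define MAX_BPF_JIT_REG		MAX_BPF_EXT_REG
```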

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 include/linux/filter.h | 3 ++-
 kernel/bpf/core.c      | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index e40d4071a345..68f018dd4b9c 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -59,7 +59,8 @@ struct ctl_table_header;
 
 /* Kernel hidden auxiliary/helper register. */
 #define BPF_REG_AX		MAX_BPF_REG
-#define MAX_BPF_EXT_REG		(MAX_BPF_REG + 1)
+#define BPF_REG_STACK_ARG_BASE	(MAX_BPF_REG + 1)
+#define MAX_BPF_EXT_REG		(MAX_BPF_REG + 2)
 #define MAX_BPF_JIT_REG		MAX_BPF_EXT_REG
 
 /* unused opcode to mark special call to bpf_tail_call() helper */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1af5fb3f21d9..3520337a1c0e 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1299,8 +1299,8 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
 	u32 imm_rnd = get_random_u32();
 	s16 off;
 
-	BUILD_BUG_ON(BPF_REG_AX  + 1 != MAX_BPF_JIT_REG);
-	BUILD_BUG_ON(MAX_BPF_REG + 1 != MAX_BPF_JIT_REG);
+	BUILD_BUG_ON(BPF_REG_AX + 2 != MAX_BPF_JIT_REG);
+	BUILD_BUG_ON(BPF_REG_STACK_ARG_BASE + 1 != MAX_BPF_JIT_REG);
 
 	/* Constraints on AX register:
 	 *
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next 02/10] bpf: Reuse MAX_BPF_FUNC_ARGS for maximum number of arguments
  2026-04-02  1:27 [PATCH bpf-next 00/10] bpf: Support stack arguments for BPF functions and kfuncs Yonghong Song
  2026-04-02  1:27 ` [PATCH bpf-next 01/10] bpf: Introduce bpf register BPF_REG_STACK_ARG_BASE Yonghong Song
@ 2026-04-02  1:27 ` Yonghong Song
  2026-04-02  1:27 ` [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions Yonghong Song
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02  1:27 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

Currently, MAX_BPF_FUNC_ARGS is used for tracepoint-related progs,
where the number of parameters cannot exceed MAX_BPF_FUNC_ARGS.

Here, MAX_BPF_FUNC_ARGS is reused to limit the number of arguments
for bpf functions and kfunc's. Its current value of 12 should be
sufficient for the vast majority of bpf functions and kfunc's.
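A minimal sketch of the resulting limit and depth computation
(mirroring the btf_prepare_func_args() change in patch 3; the function
name here is made up):

```c
#include <assert.h>

#define MAX_BPF_FUNC_ARGS	12
#define MAX_BPF_FUNC_REG_ARGS	5
#define BPF_REG_SIZE		8

/* Bytes of incoming stack arg area a function with nargs args needs:
 * args 1-5 travel in registers, each extra arg takes one 8-byte slot.
 * Returns -1 when the function has too many args to be accepted. */
static int incoming_stack_arg_depth(int nargs)
{
	if (nargs > MAX_BPF_FUNC_ARGS)
		return -1;
	if (nargs <= MAX_BPF_FUNC_REG_ARGS)
		return 0;
	return (nargs - MAX_BPF_FUNC_REG_ARGS) * BPF_REG_SIZE;
}
```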

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 include/linux/bpf.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 05b34a6355b0..e24c4a2e95f7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1151,6 +1151,10 @@ struct bpf_prog_offload {
 
 /* The longest tracepoint has 12 args.
  * See include/trace/bpf_probe.h
+ *
+ * Also reuse this macro for maximum number of arguments a BPF function
+ * or a kfunc can have. Args 1-5 are passed in registers, args 6-12 via
+ * stack arg slots.
  */
 #define MAX_BPF_FUNC_ARGS 12
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-02  1:27 [PATCH bpf-next 00/10] bpf: Support stack arguments for BPF functions and kfuncs Yonghong Song
  2026-04-02  1:27 ` [PATCH bpf-next 01/10] bpf: Introduce bpf register BPF_REG_STACK_ARG_BASE Yonghong Song
  2026-04-02  1:27 ` [PATCH bpf-next 02/10] bpf: Reuse MAX_BPF_FUNC_ARGS for maximum number of arguments Yonghong Song
@ 2026-04-02  1:27 ` Yonghong Song
  2026-04-02  3:18   ` bot+bpf-ci
                     ` (3 more replies)
  2026-04-02  1:27 ` [PATCH bpf-next 04/10] bpf: Support stack arguments for kfunc calls Yonghong Song
                   ` (6 subsequent siblings)
  9 siblings, 4 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02  1:27 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

Currently BPF functions (subprogs) are limited to 5 register arguments.
With [1], the compiler can emit code that passes additional arguments
via a dedicated stack area addressed through bpf register
BPF_REG_STACK_ARG_BASE (r12), introduced in the previous patch.

The following is an example to show how stack arguments are saved
and transferred between caller and callee:

  int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
    int a8 = ...;
    ...
    bar(a1, a2, a3, a4, a5, a6, a7, a8);
    ...
  }

The following is an illustration of stack allocation:

   Caller (foo)                           Callee (bar)
   ============                           ============
   r12-relative stack arg area:           r12-relative stack arg area:

   r12-8:  [incoming arg 6]          +--> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
   r12-16: [incoming arg 7]          |+-> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
                                     ||+> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
   ---- incoming/outgoing boundary   |||  ---- incoming/outgoing boundary
   r12-24: [outgoing arg 6 to callee]+||   ...
   r12-32: [outgoing arg 7 to callee]-+|
   r12-40: [outgoing arg 8 to callee]--+

  The caller writes outgoing args past its own incoming area.
  At the call site, the verifier transfers the caller's outgoing
  slots into the callee's incoming slots.
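The slot-index mapping the transfer uses can be sketched in plain C
(helper name hypothetical; the arithmetic follows set_callee_state()
below, where caller_spi = i + caller_incoming_slots):

```c
#include <assert.h>

#define BPF_REG_SIZE 8

/* Caller-side slot index holding callee incoming stack arg slot i:
 * outgoing slots start right past the caller's own incoming area. */
static int caller_outgoing_spi(int callee_slot, int caller_incoming_depth)
{
	return callee_slot + caller_incoming_depth / BPF_REG_SIZE;
}
```

For foo above (incoming depth 16 for args 6 and 7), bar's incoming
slot 0 comes from caller slot 2, i.e. r12-24 in the figure.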

The verifier tracks stack arg slots separately from the regular r10
stack. A new 'bpf_stack_arg_state' structure mirrors the existing stack
slot tracking (spilled_ptr + slot_type[]) but lives in a dedicated
'stack_arg_slots' array in bpf_func_state. This separation keeps the
stack arg area from interfering with the normal stack and frame pointer
(r10) bookkeeping.

If the bpf function makes more than one call, e.g.,

  int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
    int a8 = ..., a9 = ...;
    ...
    bar1(a1, a2, a3, a4, a5, a6, a7, a8);
    ...
    bar2(a1, a2, a3, a4, a5, a6, a7, a8, a9);
    ...
  }

The following is an illustration:

   Caller (foo)                           Callee (bar1)
   ============                           =============
   r12-relative stack arg area:           r12-relative stack arg area:

   r12-8:  [incoming arg 6]          +--> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
   r12-16: [incoming arg 7]          |+-> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
                                     ||+> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
   ---- incoming/outgoing boundary   |||  ---- incoming/outgoing boundary
   r12-24: [outgoing arg 6 to callee]+||  ...
   r12-32: [outgoing arg 7 to callee]-+|
   r12-40: [outgoing arg 8 to callee]--+
   ...
   Back from bar1
   ...                                     Callee (bar2)
   ===                                     =============
                                     +---> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
                                     |+--> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
                                     ||+-> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
                                     |||+> r12-32: [incoming arg 9] (from caller's outgoing r12-48)
   ---- incoming/outgoing boundary   ||||  ---- incoming/outgoing boundary
   r12-24: [outgoing arg 6 to callee]+|||  ...
   r12-32: [outgoing arg 7 to callee]-+||
   r12-40: [outgoing arg 8 to callee]--+|
   r12-48: [outgoing arg 9 to callee]---+
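The verifier-side bookkeeping for this case reduces to a running
maximum over all call sites (sketch; function names made up):

```c
#include <assert.h>

/* Outgoing depth is the max incoming depth over all callees; the total
 * stack arg area is the function's own incoming area plus that max. */
static unsigned int update_outgoing_depth(unsigned int cur, unsigned int callee_incoming)
{
	return callee_incoming > cur ? callee_incoming : cur;
}

static unsigned int total_stack_arg_depth(unsigned int incoming, unsigned int outgoing)
{
	return incoming + outgoing;
}
```

foo's callees bar1 and bar2 need 24 and 32 incoming bytes respectively,
so foo ends up with 16 + 32 = 48 bytes of stack arg area.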

Global subprogs with >5 args are not yet supported.

  [1] https://github.com/llvm/llvm-project/pull/189060

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 include/linux/bpf.h          |   2 +
 include/linux/bpf_verifier.h |  15 ++-
 kernel/bpf/btf.c             |  14 +-
 kernel/bpf/verifier.c        | 248 ++++++++++++++++++++++++++++++++---
 4 files changed, 257 insertions(+), 22 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e24c4a2e95f7..a0a1e14e4394 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1666,6 +1666,8 @@ struct bpf_prog_aux {
 	u32 max_pkt_offset;
 	u32 max_tp_access;
 	u32 stack_depth;
+	u16 incoming_stack_arg_depth;
+	u16 stack_arg_depth; /* both incoming and max outgoing of stack arguments */
 	u32 id;
 	u32 func_cnt; /* used by non-func prog as the number of func progs */
 	u32 real_func_cnt; /* includes hidden progs, only used for JIT and freeing progs */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 090aa26d1c98..a260610cd1c1 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -268,6 +268,11 @@ struct bpf_retval_range {
 	bool return_32bit;
 };
 
+struct bpf_stack_arg_state {
+	struct bpf_reg_state spilled_ptr; /* for spilled scalar/pointer semantics */
+	u8 slot_type[BPF_REG_SIZE];
+};
+
 /* state of the program:
  * type of all registers and stack info
  */
@@ -319,6 +324,10 @@ struct bpf_func_state {
 	 * `stack`. allocated_stack is always a multiple of BPF_REG_SIZE.
 	 */
 	int allocated_stack;
+
+	u16 stack_arg_depth; /* Size of incoming + max outgoing stack args in bytes. */
+	u16 incoming_stack_arg_depth; /* Size of incoming stack args in bytes. */
+	struct bpf_stack_arg_state *stack_arg_slots;
 };
 
 #define MAX_CALL_FRAMES 8
@@ -674,10 +683,12 @@ struct bpf_subprog_info {
 	bool keep_fastcall_stack: 1;
 	bool changes_pkt_data: 1;
 	bool might_sleep: 1;
-	u8 arg_cnt:3;
+	u8 arg_cnt:4;
 
 	enum priv_stack_mode priv_stack_mode;
-	struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
+	struct bpf_subprog_arg_info args[MAX_BPF_FUNC_ARGS];
+	u16 incoming_stack_arg_depth;
+	u16 outgoing_stack_arg_depth;
 };
 
 struct bpf_verifier_env;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index a62d78581207..c5f3aa05d5a3 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -7887,13 +7887,19 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog)
 	}
 	args = (const struct btf_param *)(t + 1);
 	nargs = btf_type_vlen(t);
-	if (nargs > MAX_BPF_FUNC_REG_ARGS) {
-		if (!is_global)
-			return -EINVAL;
-		bpf_log(log, "Global function %s() with %d > %d args. Buggy compiler.\n",
+	if (nargs > MAX_BPF_FUNC_ARGS) {
+		bpf_log(log, "Function %s() with %d > %d args not supported.\n",
+			tname, nargs, MAX_BPF_FUNC_ARGS);
+		return -EINVAL;
+	}
+	if (is_global && nargs > MAX_BPF_FUNC_REG_ARGS) {
+		bpf_log(log, "Global function %s() with %d > %d args not supported.\n",
 			tname, nargs, MAX_BPF_FUNC_REG_ARGS);
 		return -EINVAL;
 	}
+	if (nargs > MAX_BPF_FUNC_REG_ARGS)
+		sub->incoming_stack_arg_depth = (nargs - MAX_BPF_FUNC_REG_ARGS) * BPF_REG_SIZE;
+
 	/* check that function is void or returns int, exception cb also requires this */
 	t = btf_type_by_id(btf, t->type);
 	while (btf_type_is_modifier(t))
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8c1cf2eb6cbb..d424fe611ef8 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1488,6 +1488,19 @@ static int copy_stack_state(struct bpf_func_state *dst, const struct bpf_func_st
 		return -ENOMEM;
 
 	dst->allocated_stack = src->allocated_stack;
+
+	/* copy stack_arg_slots state */
+	n = src->stack_arg_depth / BPF_REG_SIZE;
+	if (n) {
+		dst->stack_arg_slots = copy_array(dst->stack_arg_slots, src->stack_arg_slots, n,
+						  sizeof(struct bpf_stack_arg_state),
+						  GFP_KERNEL_ACCOUNT);
+		if (!dst->stack_arg_slots)
+			return -ENOMEM;
+
+		dst->stack_arg_depth = src->stack_arg_depth;
+		dst->incoming_stack_arg_depth = src->incoming_stack_arg_depth;
+	}
 	return 0;
 }
 
@@ -1529,6 +1542,25 @@ static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state
 	return 0;
 }
 
+static int grow_stack_arg_slots(struct bpf_verifier_env *env,
+				struct bpf_func_state *state, int size)
+{
+	size_t old_n = state->stack_arg_depth / BPF_REG_SIZE, n;
+
+	size = round_up(size, BPF_REG_SIZE);
+	n = size / BPF_REG_SIZE;
+	if (old_n >= n)
+		return 0;
+
+	state->stack_arg_slots = realloc_array(state->stack_arg_slots, old_n, n,
+					       sizeof(struct bpf_stack_arg_state));
+	if (!state->stack_arg_slots)
+		return -ENOMEM;
+
+	state->stack_arg_depth = size;
+	return 0;
+}
+
 /* Acquire a pointer id from the env and update the state->refs to include
  * this new pointer reference.
  * On success, returns a valid pointer id to associate with the register
@@ -1699,6 +1731,7 @@ static void free_func_state(struct bpf_func_state *state)
 {
 	if (!state)
 		return;
+	kfree(state->stack_arg_slots);
 	kfree(state->stack);
 	kfree(state);
 }
@@ -5848,6 +5881,101 @@ static int check_stack_write(struct bpf_verifier_env *env,
 	return err;
 }
 
+/* Validate that a stack arg access is 8-byte sized and aligned. */
+static int check_stack_arg_access(struct bpf_verifier_env *env,
+				  struct bpf_insn *insn, const char *op)
+{
+	int size = bpf_size_to_bytes(BPF_SIZE(insn->code));
+
+	if (size != BPF_REG_SIZE) {
+		verbose(env, "stack arg %s must be %d bytes, got %d\n",
+			op, BPF_REG_SIZE, size);
+		return -EINVAL;
+	}
+	if (insn->off % BPF_REG_SIZE) {
+		verbose(env, "stack arg %s offset %d not aligned to %d\n",
+			op, insn->off, BPF_REG_SIZE);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/* Check that a stack arg slot has been properly initialized. */
+static bool is_stack_arg_slot_initialized(struct bpf_func_state *state, int spi)
+{
+	u8 type;
+
+	if (spi >= (int)(state->stack_arg_depth / BPF_REG_SIZE))
+		return false;
+	type = state->stack_arg_slots[spi].slot_type[BPF_REG_SIZE - 1];
+	return type == STACK_SPILL || type == STACK_MISC;
+}
+
+/*
+ * Write a value to the stack arg area.
+ * off is the negative offset from the stack arg frame pointer.
+ * Callers ensure off is 8-byte aligned and size is BPF_REG_SIZE.
+ */
+static int check_stack_arg_write(struct bpf_verifier_env *env, struct bpf_func_state *state,
+				 int off, int value_regno)
+{
+	int spi = (-off - 1) / BPF_REG_SIZE;
+	struct bpf_func_state *cur;
+	struct bpf_reg_state *reg;
+	int i, err;
+	u8 type;
+
+	err = grow_stack_arg_slots(env, state, -off);
+	if (err)
+		return err;
+
+	cur = env->cur_state->frame[env->cur_state->curframe];
+	if (value_regno >= 0) {
+		reg = &cur->regs[value_regno];
+		state->stack_arg_slots[spi].spilled_ptr = *reg;
+		type = is_spillable_regtype(reg->type) ? STACK_SPILL : STACK_MISC;
+		for (i = 0; i < BPF_REG_SIZE; i++)
+			state->stack_arg_slots[spi].slot_type[i] = type;
+	} else {
+		/* BPF_ST: store immediate, treat as scalar */
+		reg = &state->stack_arg_slots[spi].spilled_ptr;
+		reg->type = SCALAR_VALUE;
+		__mark_reg_known(reg, (u32)env->prog->insnsi[env->insn_idx].imm);
+		for (i = 0; i < BPF_REG_SIZE; i++)
+			state->stack_arg_slots[spi].slot_type[i] = STACK_MISC;
+	}
+	return 0;
+}
+
+/*
+ * Read a value from the stack arg area.
+ * off is the negative offset from the stack arg frame pointer.
+ * Callers ensure off is 8-byte aligned and size is BPF_REG_SIZE.
+ */
+static int check_stack_arg_read(struct bpf_verifier_env *env, struct bpf_func_state *state,
+				int off, int dst_regno)
+{
+	int spi = (-off - 1) / BPF_REG_SIZE;
+	struct bpf_func_state *cur;
+	u8 *stype;
+
+	if (-off > state->stack_arg_depth) {
+		verbose(env, "invalid read from stack arg off %d depth %d\n",
+			off, state->stack_arg_depth);
+		return -EACCES;
+	}
+
+	stype = state->stack_arg_slots[spi].slot_type;
+	cur = env->cur_state->frame[env->cur_state->curframe];
+
+	if (stype[BPF_REG_SIZE - 1] == STACK_SPILL)
+		copy_register_state(&cur->regs[dst_regno],
+				    &state->stack_arg_slots[spi].spilled_ptr);
+	else
+		mark_reg_unknown(env, cur->regs, dst_regno);
+	return 0;
+}
+
 static int check_map_access_type(struct bpf_verifier_env *env, u32 regno,
 				 int off, int size, enum bpf_access_type type)
 {
@@ -8022,10 +8150,23 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			  bool strict_alignment_once, bool is_ldsx,
 			  bool allow_trust_mismatch, const char *ctx)
 {
+	struct bpf_verifier_state *vstate = env->cur_state;
+	struct bpf_func_state *state = vstate->frame[vstate->curframe];
 	struct bpf_reg_state *regs = cur_regs(env);
 	enum bpf_reg_type src_reg_type;
 	int err;
 
+	/* Handle stack arg access */
+	if (insn->src_reg == BPF_REG_STACK_ARG_BASE) {
+		err = check_reg_arg(env, insn->dst_reg, DST_OP_NO_MARK);
+		if (err)
+			return err;
+		err = check_stack_arg_access(env, insn, "read");
+		if (err)
+			return err;
+		return check_stack_arg_read(env, state, insn->off, insn->dst_reg);
+	}
+
 	/* check src operand */
 	err = check_reg_arg(env, insn->src_reg, SRC_OP);
 	if (err)
@@ -8054,10 +8195,23 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
 static int check_store_reg(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			   bool strict_alignment_once)
 {
+	struct bpf_verifier_state *vstate = env->cur_state;
+	struct bpf_func_state *state = vstate->frame[vstate->curframe];
 	struct bpf_reg_state *regs = cur_regs(env);
 	enum bpf_reg_type dst_reg_type;
 	int err;
 
+	/* Handle stack arg write */
+	if (insn->dst_reg == BPF_REG_STACK_ARG_BASE) {
+		err = check_reg_arg(env, insn->src_reg, SRC_OP);
+		if (err)
+			return err;
+		err = check_stack_arg_access(env, insn, "write");
+		if (err)
+			return err;
+		return check_stack_arg_write(env, state, insn->off, insn->src_reg);
+	}
+
 	/* check src1 operand */
 	err = check_reg_arg(env, insn->src_reg, SRC_OP);
 	if (err)
@@ -10940,8 +11094,10 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			   int *insn_idx)
 {
 	struct bpf_verifier_state *state = env->cur_state;
+	struct bpf_subprog_info *caller_info;
 	struct bpf_func_state *caller;
 	int err, subprog, target_insn;
+	u16 callee_incoming;
 
 	target_insn = *insn_idx + insn->imm + 1;
 	subprog = find_subprog(env, target_insn);
@@ -10993,6 +11149,15 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		return 0;
 	}
 
+	/*
+	 * Track caller's outgoing stack arg depth (max across all callees).
+	 * This is needed so the JIT knows how much stack arg space to allocate.
+	 */
+	caller_info = &env->subprog_info[caller->subprogno];
+	callee_incoming = env->subprog_info[subprog].incoming_stack_arg_depth;
+	if (callee_incoming > caller_info->outgoing_stack_arg_depth)
+		caller_info->outgoing_stack_arg_depth = callee_incoming;
+
 	/* for regular function entry setup new frame and continue
 	 * from that frame.
 	 */
@@ -11048,13 +11213,41 @@ static int set_callee_state(struct bpf_verifier_env *env,
 			    struct bpf_func_state *caller,
 			    struct bpf_func_state *callee, int insn_idx)
 {
-	int i;
+	struct bpf_subprog_info *callee_info;
+	int i, err;
 
 	/* copy r1 - r5 args that callee can access.  The copy includes parent
 	 * pointers, which connects us up to the liveness chain
 	 */
 	for (i = BPF_REG_1; i <= BPF_REG_5; i++)
 		callee->regs[i] = caller->regs[i];
+
+	/*
+	 * Transfer stack args from caller's outgoing area to callee's incoming area.
+	 * Caller wrote outgoing args at offsets '-(incoming + 8)', '-(incoming + 16)', ...
+	 * These outgoing args will go to callee's incoming area.
+	 */
+	callee_info = &env->subprog_info[callee->subprogno];
+	if (callee_info->incoming_stack_arg_depth) {
+		int caller_incoming_slots = caller->incoming_stack_arg_depth / BPF_REG_SIZE;
+		int callee_incoming_slots = callee_info->incoming_stack_arg_depth / BPF_REG_SIZE;
+
+		callee->incoming_stack_arg_depth = callee_info->incoming_stack_arg_depth;
+		err = grow_stack_arg_slots(env, callee, callee_info->incoming_stack_arg_depth);
+		if (err)
+			return err;
+
+		for (i = 0; i < callee_incoming_slots; i++) {
+			int caller_spi = i + caller_incoming_slots;
+
+			if (!is_stack_arg_slot_initialized(caller, caller_spi)) {
+				verbose(env, "stack arg#%d not properly initialized\n",
+					i + 1 + MAX_BPF_FUNC_REG_ARGS);
+				return -EINVAL;
+			}
+			callee->stack_arg_slots[i] = caller->stack_arg_slots[caller_spi];
+		}
+	}
 	return 0;
 }
 
@@ -21262,23 +21455,37 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
 			verbose(env, "BPF_ST uses reserved fields\n");
 			return -EINVAL;
 		}
-		/* check src operand */
-		err = check_reg_arg(env, insn->dst_reg, SRC_OP);
-		if (err)
-			return err;
 
-		dst_reg_type = cur_regs(env)[insn->dst_reg].type;
+		/* Handle stack arg write (store immediate) */
+		if (insn->dst_reg == BPF_REG_STACK_ARG_BASE) {
+			struct bpf_verifier_state *vstate = env->cur_state;
+			struct bpf_func_state *state = vstate->frame[vstate->curframe];
 
-		/* check that memory (dst_reg + off) is writeable */
-		err = check_mem_access(env, env->insn_idx, insn->dst_reg,
-				       insn->off, BPF_SIZE(insn->code),
-				       BPF_WRITE, -1, false, false);
-		if (err)
-			return err;
+			err = check_stack_arg_access(env, insn, "write");
+			if (err)
+				return err;
+			err = check_stack_arg_write(env, state, insn->off, -1);
+			if (err)
+				return err;
+		} else {
+			/* check src operand */
+			err = check_reg_arg(env, insn->dst_reg, SRC_OP);
+			if (err)
+				return err;
 
-		err = save_aux_ptr_type(env, dst_reg_type, false);
-		if (err)
-			return err;
+			dst_reg_type = cur_regs(env)[insn->dst_reg].type;
+
+			/* check that memory (dst_reg + off) is writeable */
+			err = check_mem_access(env, env->insn_idx, insn->dst_reg,
+					       insn->off, BPF_SIZE(insn->code),
+					       BPF_WRITE, -1, false, false);
+			if (err)
+				return err;
+
+			err = save_aux_ptr_type(env, dst_reg_type, false);
+			if (err)
+				return err;
+		}
 	} else if (class == BPF_JMP || class == BPF_JMP32) {
 		u8 opcode = BPF_OP(insn->code);
 
@@ -22974,8 +23181,14 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 	int err, num_exentries;
 	int old_len, subprog_start_adjustment = 0;
 
-	if (env->subprog_cnt <= 1)
+	if (env->subprog_cnt <= 1) {
+		/*
+		 * Even without subprogs, kfunc calls with >5 args need stack arg space
+		 * allocated by the root program.
+		 */
+		prog->aux->stack_arg_depth = env->subprog_info[0].outgoing_stack_arg_depth;
 		return 0;
+	}
 
 	for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
 		if (!bpf_pseudo_func(insn) && !bpf_pseudo_call(insn))
@@ -23065,6 +23278,9 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 
 		func[i]->aux->name[0] = 'F';
 		func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
+		func[i]->aux->incoming_stack_arg_depth = env->subprog_info[i].incoming_stack_arg_depth;
+		func[i]->aux->stack_arg_depth = env->subprog_info[i].incoming_stack_arg_depth +
+						env->subprog_info[i].outgoing_stack_arg_depth;
 		if (env->subprog_info[i].priv_stack_mode == PRIV_STACK_ADAPTIVE)
 			func[i]->aux->jits_use_priv_stack = true;
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next 04/10] bpf: Support stack arguments for kfunc calls
  2026-04-02  1:27 [PATCH bpf-next 00/10] bpf: Support stack arguments for BPF functions and kfuncs Yonghong Song
                   ` (2 preceding siblings ...)
  2026-04-02  1:27 ` [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions Yonghong Song
@ 2026-04-02  1:27 ` Yonghong Song
  2026-04-02  3:18   ` bot+bpf-ci
  2026-04-02 21:02   ` Amery Hung
  2026-04-02  1:27 ` [PATCH bpf-next 05/10] bpf: Reject stack arguments in non-JITed programs Yonghong Song
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02  1:27 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

Extend the stack argument mechanism to kfunc calls, allowing kfuncs
with more than 5 parameters to receive additional arguments via the
r12-based stack arg area.

For kfuncs, the caller is a BPF program and the callee is a kernel
function. The BPF program writes outgoing args at r12-relative offsets
past its own incoming area.

The following is an example to show how stack arguments are saved:

   int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
     int a8 = ..., a9 = ...;
     ...
     kfunc1(a1, a2, a3, a4, a5, a6, a7, a8);
     ...
     kfunc2(a1, a2, a3, a4, a5, a6, a7, a8, a9);
     ...
   }

The following is an illustration:

   Caller (foo)
   ============
       r12-relative stack arg area:

       r12-8:  [incoming arg 6]
       r12-16: [incoming arg 7]

       ---- incoming/outgoing boundary (kfunc1)
       r12-24: [outgoing arg 6 to callee]
       r12-32: [outgoing arg 7 to callee]
       r12-40: [outgoing arg 8 to callee]
       ...
       Back from kfunc1
       ...

       ---- incoming/outgoing boundary
       r12-24: [outgoing arg 6 to callee]
       r12-32: [outgoing arg 7 to callee]
       r12-40: [outgoing arg 8 to callee]
       r12-48: [outgoing arg 9 to callee]

The JIT will later marshal the outgoing arguments to the native calling
convention for kfunc1() and kfunc2().

In check_kfunc_args(), for args beyond the 5th, retrieve the spilled
register state from the caller's stack arg slots. Temporarily copy
it into regs[BPF_REG_1] to reuse the existing type checking
infrastructure, then restore after checking. Also in fixup_kfunc_call(),
repurpose insn->off (no longer needed after kfunc address resolution)
to store the number of stack args, so the JIT knows how many args to marshal.
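The slot lookup described above boils down to the following (sketch in
plain C, mirroring the spi computation in the check_kfunc_args() hunk
below; the helper name is made up):

```c
#include <assert.h>

#define MAX_BPF_FUNC_REG_ARGS 5
#define BPF_REG_SIZE 8

/* Caller slot holding the kfunc argument with 0-based index i (i >= 5):
 * outgoing kfunc args sit just past the caller's incoming slots. */
static int kfunc_arg_spi(int i, int caller_incoming_depth)
{
	return caller_incoming_depth / BPF_REG_SIZE + (i - MAX_BPF_FUNC_REG_ARGS);
}
```

For foo above (two incoming stack args, depth 16), kfunc1's 6th
argument (index 5) is found in slot 2, which the figure shows at r12-24.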

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 kernel/bpf/verifier.c | 97 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 80 insertions(+), 17 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d424fe611ef8..6579156486b8 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3502,7 +3502,7 @@ static int add_kfunc_call(struct bpf_verifier_env *env, u32 func_id, s16 offset)
 	struct bpf_kfunc_meta kfunc;
 	struct bpf_kfunc_desc *desc;
 	unsigned long addr;
-	int err;
+	int i, err;
 
 	prog_aux = env->prog->aux;
 	tab = prog_aux->kfunc_tab;
@@ -3578,6 +3578,14 @@ static int add_kfunc_call(struct bpf_verifier_env *env, u32 func_id, s16 offset)
 	if (err)
 		return err;
 
+	for (i = MAX_BPF_FUNC_REG_ARGS; i < func_model.nr_args; i++) {
+		if (func_model.arg_size[i] > sizeof(u64)) {
+			verbose(env, "kfunc %s arg#%d size %d > %zu not supported for stack args\n",
+				kfunc.name, i, func_model.arg_size[i], sizeof(u64));
+			return -EINVAL;
+		}
+	}
+
 	desc = &tab->descs[tab->nr_descs++];
 	desc->func_id = func_id;
 	desc->offset = offset;
@@ -12995,9 +13003,8 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 		       struct bpf_kfunc_call_arg_meta *meta,
 		       const struct btf_type *t, const struct btf_type *ref_t,
 		       const char *ref_tname, const struct btf_param *args,
-		       int argno, int nargs)
+		       int argno, int nargs, u32 regno)
 {
-	u32 regno = argno + 1;
 	struct bpf_reg_state *regs = cur_regs(env);
 	struct bpf_reg_state *reg = &regs[regno];
 	bool arg_mem_size = false;
@@ -13677,9 +13684,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 
 	args = (const struct btf_param *)(meta->func_proto + 1);
 	nargs = btf_type_vlen(meta->func_proto);
-	if (nargs > MAX_BPF_FUNC_REG_ARGS) {
+	if (nargs > MAX_BPF_FUNC_ARGS) {
 		verbose(env, "Function %s has %d > %d args\n", func_name, nargs,
-			MAX_BPF_FUNC_REG_ARGS);
+			MAX_BPF_FUNC_ARGS);
 		return -EINVAL;
 	}
 
@@ -13687,13 +13694,41 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 	 * verifier sees.
 	 */
 	for (i = 0; i < nargs; i++) {
-		struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[i + 1];
+		struct bpf_reg_state *regs = cur_regs(env), *reg;
+		struct bpf_reg_state saved_reg;
 		const struct btf_type *t, *ref_t, *resolve_ret;
 		enum bpf_arg_type arg_type = ARG_DONTCARE;
 		u32 regno = i + 1, ref_id, type_size;
 		bool is_ret_buf_sz = false;
+		bool is_stack_arg = false;
 		int kf_arg_type;
 
+		if (i < MAX_BPF_FUNC_REG_ARGS) {
+			reg = &regs[i + 1];
+		} else {
+			/*
+			 * Retrieve the spilled reg state from the stack arg slot.
+			 * Reuse the existing type checking infrastructure which
+			 * reads from cur_regs(env)[regno], temporarily copy the
+			 * stack arg reg state into regs[BPF_REG_1] and restore
+			 * it after checking.
+			 */
+			struct bpf_func_state *caller = cur_func(env);
+			int spi = caller->incoming_stack_arg_depth / BPF_REG_SIZE +
+				  (i - MAX_BPF_FUNC_REG_ARGS);
+
+			if (!is_stack_arg_slot_initialized(caller, spi)) {
+				verbose(env, "stack arg#%d not properly initialized\n", i);
+				return -EINVAL;
+			}
+
+			is_stack_arg = true;
+			regno = BPF_REG_1;
+			saved_reg = regs[BPF_REG_1];
+			regs[BPF_REG_1] = caller->stack_arg_slots[spi].spilled_ptr;
+			reg = &regs[BPF_REG_1];
+		}
+
 		if (is_kfunc_arg_prog_aux(btf, &args[i])) {
 			/* Reject repeated use bpf_prog_aux */
 			if (meta->arg_prog) {
@@ -13702,7 +13737,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			}
 			meta->arg_prog = true;
 			cur_aux(env)->arg_prog = regno;
-			continue;
+			goto next_arg;
 		}
 
 		if (is_kfunc_arg_ignore(btf, &args[i]) || is_kfunc_arg_implicit(meta, i))
@@ -13725,9 +13760,11 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 					verbose(env, "R%d must be a known constant\n", regno);
 					return -EINVAL;
 				}
-				ret = mark_chain_precision(env, regno);
-				if (ret < 0)
-					return ret;
+				if (i < MAX_BPF_FUNC_REG_ARGS) {
+					ret = mark_chain_precision(env, regno);
+					if (ret < 0)
+						return ret;
+				}
 				meta->arg_constant.found = true;
 				meta->arg_constant.value = reg->var_off.value;
 			} else if (is_kfunc_arg_scalar_with_name(btf, &args[i], "rdonly_buf_size")) {
@@ -13749,11 +13786,13 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				}
 
 				meta->r0_size = reg->var_off.value;
-				ret = mark_chain_precision(env, regno);
-				if (ret)
-					return ret;
+				if (i < MAX_BPF_FUNC_REG_ARGS) {
+					ret = mark_chain_precision(env, regno);
+					if (ret)
+						return ret;
+				}
 			}
-			continue;
+			goto next_arg;
 		}
 
 		if (!btf_type_is_ptr(t)) {
@@ -13782,13 +13821,14 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
 		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
 
-		kf_arg_type = get_kfunc_ptr_arg_type(env, meta, t, ref_t, ref_tname, args, i, nargs);
+		kf_arg_type = get_kfunc_ptr_arg_type(env, meta, t, ref_t, ref_tname, args, i, nargs,
+						     regno);
 		if (kf_arg_type < 0)
 			return kf_arg_type;
 
 		switch (kf_arg_type) {
 		case KF_ARG_PTR_TO_NULL:
-			continue;
+			goto next_arg;
 		case KF_ARG_PTR_TO_MAP:
 			if (!reg->map_ptr) {
 				verbose(env, "pointer in R%d isn't map pointer\n", regno);
@@ -14201,6 +14241,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			break;
 		}
 		}
+next_arg:
+		if (is_stack_arg)
+			regs[BPF_REG_1] = saved_reg;
 	}
 
 	if (is_kfunc_release(meta) && !meta->release_regno) {
@@ -14778,7 +14821,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 
 	nargs = btf_type_vlen(meta.func_proto);
 	args = (const struct btf_param *)(meta.func_proto + 1);
-	for (i = 0; i < nargs; i++) {
+	for (i = 0; i < nargs && i < MAX_BPF_FUNC_REG_ARGS; i++) {
 		u32 regno = i + 1;
 
 		t = btf_type_skip_modifiers(desc_btf, args[i].type, NULL);
@@ -14789,6 +14832,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			mark_btf_func_reg_size(env, regno, t->size);
 	}
 
+	/* Track outgoing stack arg depth for kfuncs with >5 args */
+	if (nargs > MAX_BPF_FUNC_REG_ARGS) {
+		struct bpf_func_state *caller = cur_func(env);
+		struct bpf_subprog_info *caller_info = &env->subprog_info[caller->subprogno];
+		u16 kfunc_stack_arg_depth = (nargs - MAX_BPF_FUNC_REG_ARGS) * BPF_REG_SIZE;
+
+		if (kfunc_stack_arg_depth > caller_info->outgoing_stack_arg_depth)
+			caller_info->outgoing_stack_arg_depth = kfunc_stack_arg_depth;
+	}
+
 	if (is_iter_next_kfunc(&meta)) {
 		err = process_iter_next_call(env, insn_idx, &meta);
 		if (err)
@@ -23615,6 +23668,16 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	if (!bpf_jit_supports_far_kfunc_call())
 		insn->imm = BPF_CALL_IMM(desc->addr);
 
+	/*
+	 * After resolving the kfunc address, insn->off is no longer needed
+	 * for BTF fd index. Repurpose it to store the number of stack args
+	 * so the JIT can marshal them.
+	 */
+	if (desc->func_model.nr_args > MAX_BPF_FUNC_REG_ARGS)
+		insn->off = desc->func_model.nr_args - MAX_BPF_FUNC_REG_ARGS;
+	else
+		insn->off = 0;
+
 	if (is_bpf_obj_new_kfunc(desc->func_id) || is_bpf_percpu_obj_new_kfunc(desc->func_id)) {
 		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
 		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next 05/10] bpf: Reject stack arguments in non-JITed programs
  2026-04-02  1:27 [PATCH bpf-next 00/10] bpf: Support stack arguments for BPF functions and kfuncs Yonghong Song
                   ` (3 preceding siblings ...)
  2026-04-02  1:27 ` [PATCH bpf-next 04/10] bpf: Support stack arguments for kfunc calls Yonghong Song
@ 2026-04-02  1:27 ` Yonghong Song
  2026-04-02  1:27 ` [PATCH bpf-next 06/10] bpf: Enable stack argument support for x86_64 Yonghong Song
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02  1:27 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

The interpreter does not understand the bpf register r12
(BPF_REG_STACK_ARG_BASE) used for stack argument addressing,
so reject interpreter execution if stack arguments are used
in either the main program or any subprogram.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 kernel/bpf/core.c     | 3 ++-
 kernel/bpf/verifier.c | 6 ++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 3520337a1c0e..a04b31eb4c49 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2553,7 +2553,8 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
 		goto finalize;
 
 	if (IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) ||
-	    bpf_prog_has_kfunc_call(fp))
+	    bpf_prog_has_kfunc_call(fp) ||
+	    fp->aux->stack_arg_depth)
 		jit_needed = true;
 
 	if (!bpf_prog_select_interpreter(fp))
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6579156486b8..6d5a764c3460 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -23542,6 +23542,12 @@ static int fixup_call_args(struct bpf_verifier_env *env)
 		verbose(env, "calling kernel functions are not allowed in non-JITed programs\n");
 		return -EINVAL;
 	}
+	for (i = 0; i < env->subprog_cnt; i++) {
+		if (env->subprog_info[i].incoming_stack_arg_depth) {
+			verbose(env, "stack args are not supported in non-JITed programs\n");
+			return -EINVAL;
+		}
+	}
 	if (env->subprog_cnt > 1 && env->prog->aux->tail_call_reachable) {
 		/* When JIT fails the progs with bpf2bpf calls and tail_calls
 		 * have to be rejected, since interpreter doesn't support them yet.
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next 06/10] bpf: Enable stack argument support for x86_64
  2026-04-02  1:27 [PATCH bpf-next 00/10] bpf: Support stack arguments for BPF functions and kfuncs Yonghong Song
                   ` (4 preceding siblings ...)
  2026-04-02  1:27 ` [PATCH bpf-next 05/10] bpf: Reject stack arguments in non-JITed programs Yonghong Song
@ 2026-04-02  1:27 ` Yonghong Song
  2026-04-02  1:28 ` [PATCH bpf-next 07/10] bpf,x86: Implement JIT support for stack arguments Yonghong Song
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02  1:27 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

Enable stack argument support for x86_64 by implementing
bpf_jit_supports_stack_args() in the x86_64 JIT, with a __weak
default that returns false. Also make btf_prepare_func_args()
reject functions with more than 5 arguments when the JIT does
not support stack args.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 arch/x86/net/bpf_jit_comp.c | 5 +++++
 include/linux/filter.h      | 1 +
 kernel/bpf/btf.c            | 9 ++++++++-
 kernel/bpf/core.c           | 5 +++++
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index e9b78040d703..32864dbc2c4e 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3937,6 +3937,11 @@ bool bpf_jit_supports_kfunc_call(void)
 	return true;
 }
 
+bool bpf_jit_supports_stack_args(void)
+{
+	return true;
+}
+
 void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 {
 	if (text_poke_copy(dst, src, len) == NULL)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 68f018dd4b9c..a5035fb80a6b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1160,6 +1160,7 @@ bool bpf_jit_inlines_helper_call(s32 imm);
 bool bpf_jit_supports_subprog_tailcalls(void);
 bool bpf_jit_supports_percpu_insn(void);
 bool bpf_jit_supports_kfunc_call(void);
+bool bpf_jit_supports_stack_args(void);
 bool bpf_jit_supports_far_kfunc_call(void);
 bool bpf_jit_supports_exceptions(void);
 bool bpf_jit_supports_ptr_xchg(void);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index c5f3aa05d5a3..1cbe0f2b0e41 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -20,6 +20,7 @@
 #include <linux/btf.h>
 #include <linux/btf_ids.h>
 #include <linux/bpf.h>
+#include <linux/filter.h>
 #include <linux/bpf_lsm.h>
 #include <linux/skmsg.h>
 #include <linux/perf_event.h>
@@ -7897,8 +7898,14 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog)
 			tname, nargs, MAX_BPF_FUNC_REG_ARGS);
 		return -EINVAL;
 	}
-	if (nargs > MAX_BPF_FUNC_REG_ARGS)
+	if (nargs > MAX_BPF_FUNC_REG_ARGS) {
+		if (!bpf_jit_supports_stack_args()) {
+			bpf_log(log, "JIT does not support function %s() with %d args\n",
+				tname, nargs);
+			return -ENOTSUPP;
+		}
 		sub->incoming_stack_arg_depth = (nargs - MAX_BPF_FUNC_REG_ARGS) * BPF_REG_SIZE;
+	}
 
 	/* check that function is void or returns int, exception cb also requires this */
 	t = btf_type_by_id(btf, t->type);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index a04b31eb4c49..01de4f7f3a82 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3156,6 +3156,11 @@ bool __weak bpf_jit_supports_kfunc_call(void)
 	return false;
 }
 
+bool __weak bpf_jit_supports_stack_args(void)
+{
+	return false;
+}
+
 bool __weak bpf_jit_supports_far_kfunc_call(void)
 {
 	return false;
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next 07/10] bpf,x86: Implement JIT support for stack arguments
  2026-04-02  1:27 [PATCH bpf-next 00/10] bpf: Support stack arguments for BPF functions and kfuncs Yonghong Song
                   ` (5 preceding siblings ...)
  2026-04-02  1:27 ` [PATCH bpf-next 06/10] bpf: Enable stack argument support for x86_64 Yonghong Song
@ 2026-04-02  1:28 ` Yonghong Song
  2026-04-02 22:26   ` Amery Hung
  2026-04-02 23:51   ` Alexei Starovoitov
  2026-04-02  1:28 ` [PATCH bpf-next 08/10] selftests/bpf: Add tests for BPF function " Yonghong Song
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02  1:28 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

Add x86_64 JIT support for BPF functions and kfuncs with more than
5 arguments. The extra arguments are passed through a stack area
addressed by register r12 (BPF_REG_STACK_ARG_BASE) in BPF bytecode,
which the JIT translates to RBP-relative accesses in native code.

There are two possible approaches to allocate the stack arg area:

  Option 1: Allocate a single combined region (incoming + max_outgoing)
    below the program stack in the function prologue. All r12-relative
    accesses become [rbp - prog_stack_depth - offset], where 'offset'
    is the offset within the (incoming + max_outgoing) region. This is
    simple because the area is always at a fixed offset from RBP.
    The tradeoff is slightly higher stack usage when multiple callees
    have different stack arg counts, since the area is sized for the
    maximum.

  Option 2: Allocate each outgoing area individually at the call
    site, sized exactly to the callee's needs. This minimizes
    stack usage but significantly complicates the JIT: each call
    site must dynamically adjust RSP, and addresses of stack args
    would shift depending on context, making the offset
    calculations harder.

This patch uses Option 1 for simplicity.
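A standalone sketch of the Option 1 sizing (hypothetical helper names,
not the JIT's actual code): the function's total allocation is its
rounded-up program stack plus one slot per incoming and per
max-outgoing stack arg.

```c
#include <assert.h>

#define BPF_REG_SIZE 8   /* each stack arg slot is 8 bytes */

static unsigned int round_up8(unsigned int v)
{
	return (v + 7) & ~7u;
}

/*
 * Option 1: one fixed region below the program stack holds both the
 * incoming args and the largest outgoing arg set among all callees,
 * so every r12-relative access is a fixed RBP offset.
 */
static unsigned int total_stack(unsigned int prog_stack_depth,
				unsigned int incoming_args,
				unsigned int max_outgoing_args)
{
	return round_up8(prog_stack_depth) +
	       (incoming_args + max_outgoing_args) * BPF_REG_SIZE;
}
```

E.g. a function with a 20-byte program stack, 2 incoming and at most 3
outgoing stack args allocates 24 + 40 = 64 bytes.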

The native x86_64 stack layout for a function with incoming and
outgoing stack args:

  high address
  ┌─────────────────────────┐
  │ incoming stack arg N    │  [rbp + 16 + (N - 1) * 8]  (pushed by caller)
  │ ...                     │
  │ incoming stack arg 1    │  [rbp + 16]
  ├─────────────────────────┤
  │ return address          │  [rbp + 8]
  │ saved rbp               │  [rbp]
  ├─────────────────────────┤
  │ callee-saved regs       │
  │ BPF program stack       │  (stack_depth bytes)
  ├─────────────────────────┤
  │ incoming stack arg 1    │  [rbp - prog_stack_depth - 8]
  │ ...   (copied from      │   (copied in prologue)
  │        caller's push)   │
  │ incoming stack arg N    │  [rbp - prog_stack_depth - N * 8]
  ├─────────────────────────┤
  │ outgoing stack arg 1    │  (written via r12-relative STX/ST,
  │ ...                     │   JIT translates to RBP-relative)
  │ outgoing stack arg M    │
  └─────────────────────────┘
    ...                        Other stack usage
  ┌─────────────────────────┐
  │ incoming stack arg M    │ (callee's frame: the caller's outgoing
  │ ...                     │  stack args, pushed before CALL, become
  │ incoming stack arg 1    │  the callee's incoming stack args)
  ├─────────────────────────┤
  │ return address          │
  │ saved rbp               │
  ├─────────────────────────┤
  │ ...                     │
  └─────────────────────────┘
  low address

In the prologue, the incoming stack arguments pushed by the caller are
copied into the callee's own incoming area below the program stack,
from which later load insns fetch them. Outgoing stack arguments are
written by JIT-emitted RBP-relative STX or ST insns.

For each bpf-to-bpf call, push the outgoing stack args onto the native
stack before CALL and pop them after return, so the same 'outgoing
stack arg' area is reused by all bpf-to-bpf call sites.

For kfunc calls, push stack args (arg 7+) onto the native stack
and load arg 6 into R9 per the x86_64 calling convention,
then clean up RSP after return.
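The offset rewrite performed at the LDX/STX/ST sites can be modeled as
a standalone sketch (it mirrors the `off -= prog_stack_depth`
adjustment in the patch; the example offsets below are illustrative,
assuming r12-relative offsets address the stack arg area below the
program stack as in the layout above):

```c
#include <assert.h>

/*
 * Model of the JIT's r12 rewrite: an r12-relative access at insn->off
 * becomes an RBP-relative access shifted down past the BPF program
 * stack, landing in the (incoming + outgoing) stack arg region.
 */
static int r12_to_rbp_off(int insn_off, int prog_stack_depth)
{
	return insn_off - prog_stack_depth;
}
```

So with a 24-byte program stack, an access at r12-relative offset -8
is emitted as [rbp - 32].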

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 arch/x86/net/bpf_jit_comp.c | 145 ++++++++++++++++++++++++++++++++++--
 1 file changed, 138 insertions(+), 7 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 32864dbc2c4e..807493f109e5 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -367,6 +367,27 @@ static void push_callee_regs(u8 **pprog, bool *callee_regs_used)
 	*pprog = prog;
 }
 
+static int push_stack_args(u8 **pprog, s32 base_off, int from, int to)
+{
+	u8 *prog = *pprog;
+	int j, off, cnt = 0;
+
+	for (j = from; j >= to; j--) {
+		off = base_off - j * 8;
+
+		/* push qword [rbp + off] */
+		if (is_imm8(off)) {
+			EMIT3(0xFF, 0x75, off);
+			cnt += 3;
+		} else {
+			EMIT2_off32(0xFF, 0xB5, off);
+			cnt += 6;
+		}
+	}
+	*pprog = prog;
+	return cnt;
+}
+
 static void pop_r12(u8 **pprog)
 {
 	u8 *prog = *pprog;
@@ -1664,19 +1685,35 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 	int i, excnt = 0;
 	int ilen, proglen = 0;
 	u8 *prog = temp;
-	u32 stack_depth;
+	u16 stack_arg_depth, incoming_stack_arg_depth;
+	u32 prog_stack_depth, stack_depth;
+	bool has_stack_args;
 	int err;
 
 	stack_depth = bpf_prog->aux->stack_depth;
+	stack_arg_depth = bpf_prog->aux->stack_arg_depth;
+	incoming_stack_arg_depth = bpf_prog->aux->incoming_stack_arg_depth;
 	priv_stack_ptr = bpf_prog->aux->priv_stack_ptr;
 	if (priv_stack_ptr) {
 		priv_frame_ptr = priv_stack_ptr + PRIV_STACK_GUARD_SZ + round_up(stack_depth, 8);
 		stack_depth = 0;
 	}
 
+	/*
+	 * Save program stack depth before adding stack arg space.
+	 * Each function allocates its own stack arg space
+	 * (incoming + outgoing) below its BPF stack.
+	 * Stack args are accessed via RBP-based addressing.
+	 */
+	prog_stack_depth = round_up(stack_depth, 8);
+	if (stack_arg_depth)
+		stack_depth += stack_arg_depth;
+	has_stack_args = stack_arg_depth > 0;
+
 	arena_vm_start = bpf_arena_get_kern_vm_start(bpf_prog->aux->arena);
 	user_vm_start = bpf_arena_get_user_vm_start(bpf_prog->aux->arena);
 
+
 	detect_reg_usage(insn, insn_cnt, callee_regs_used);
 
 	emit_prologue(&prog, image, stack_depth,
@@ -1704,6 +1741,38 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 		emit_mov_imm64(&prog, X86_REG_R12,
 			       arena_vm_start >> 32, (u32) arena_vm_start);
 
+	if (incoming_stack_arg_depth && bpf_is_subprog(bpf_prog)) {
+		int n = incoming_stack_arg_depth / 8;
+
+		/*
+		 * Caller pushed stack args before CALL, so after prologue
+		 * (CALL saves ret addr, then PUSH saves old RBP) they sit
+		 * above RBP:
+		 *
+		 *   [rbp + 16 + (n - 1) * 8]  stack_arg n
+		 *   ...
+		 *   [rbp + 24]                stack_arg 2
+		 *   [rbp + 16]                stack_arg 1
+		 *   [rbp +  8]                return address
+		 *   [rbp +  0]                saved rbp
+		 *
+		 * Copy each into callee's own region below the program stack:
+		 *   [rbp - prog_stack_depth - i * 8]
+		 */
+		for (i = 0; i < n; i++) {
+			s32 src = 16 + i * 8;
+			s32 dst = -prog_stack_depth - (i + 1) * 8;
+
+			/* mov rax, [rbp + src] */
+			EMIT4(0x48, 0x8B, 0x45, src);
+			/* mov [rbp + dst], rax */
+			if (is_imm8(dst))
+				EMIT4(0x48, 0x89, 0x45, dst);
+			else
+				EMIT3_off32(0x48, 0x89, 0x85, dst);
+		}
+	}
+
 	if (priv_frame_ptr)
 		emit_priv_frame_ptr(&prog, priv_frame_ptr);
 
@@ -1715,13 +1784,14 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 	prog = temp;
 
 	for (i = 1; i <= insn_cnt; i++, insn++) {
+		bool adjust_stack_arg_off = false;
 		const s32 imm32 = insn->imm;
 		u32 dst_reg = insn->dst_reg;
 		u32 src_reg = insn->src_reg;
 		u8 b2 = 0, b3 = 0;
 		u8 *start_of_ldx;
 		s64 jmp_offset;
-		s16 insn_off;
+		s32 insn_off;
 		u8 jmp_cond;
 		u8 *func;
 		int nops;
@@ -1734,6 +1804,21 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 				dst_reg = X86_REG_R9;
 		}
 
+		if (has_stack_args) {
+			u8 class = BPF_CLASS(insn->code);
+
+			if (class == BPF_LDX &&
+			    src_reg == BPF_REG_STACK_ARG_BASE) {
+				src_reg = BPF_REG_FP;
+				adjust_stack_arg_off = true;
+			}
+			if ((class == BPF_STX || class == BPF_ST) &&
+			    dst_reg == BPF_REG_STACK_ARG_BASE) {
+				dst_reg = BPF_REG_FP;
+				adjust_stack_arg_off = true;
+			}
+		}
+
 		switch (insn->code) {
 			/* ALU */
 		case BPF_ALU | BPF_ADD | BPF_X:
@@ -2131,10 +2216,16 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 		case BPF_ST | BPF_MEM | BPF_DW:
 			EMIT2(add_1mod(0x48, dst_reg), 0xC7);
 
-st:			if (is_imm8(insn->off))
-				EMIT2(add_1reg(0x40, dst_reg), insn->off);
+st:		{
+			int off = insn->off;
+
+			if (adjust_stack_arg_off)
+				off -= prog_stack_depth;
+			if (is_imm8(off))
+				EMIT2(add_1reg(0x40, dst_reg), off);
 			else
-				EMIT1_off32(add_1reg(0x80, dst_reg), insn->off);
+				EMIT1_off32(add_1reg(0x80, dst_reg), off);
+		}
 
 			EMIT(imm32, bpf_size_to_x86_bytes(BPF_SIZE(insn->code)));
 			break;
@@ -2143,9 +2234,14 @@ st:			if (is_imm8(insn->off))
 		case BPF_STX | BPF_MEM | BPF_B:
 		case BPF_STX | BPF_MEM | BPF_H:
 		case BPF_STX | BPF_MEM | BPF_W:
-		case BPF_STX | BPF_MEM | BPF_DW:
-			emit_stx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
+		case BPF_STX | BPF_MEM | BPF_DW: {
+			int off = insn->off;
+
+			if (adjust_stack_arg_off)
+				off -= prog_stack_depth;
+			emit_stx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, off);
 			break;
+		}
 
 		case BPF_ST | BPF_PROBE_MEM32 | BPF_B:
 		case BPF_ST | BPF_PROBE_MEM32 | BPF_H:
@@ -2243,6 +2339,8 @@ st:			if (is_imm8(insn->off))
 		case BPF_LDX | BPF_PROBE_MEMSX | BPF_H:
 		case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
 			insn_off = insn->off;
+			if (adjust_stack_arg_off)
+				insn_off -= prog_stack_depth;
 
 			if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
 			    BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
@@ -2440,6 +2538,8 @@ st:			if (is_imm8(insn->off))
 
 			/* call */
 		case BPF_JMP | BPF_CALL: {
+			int off, base_off, n_stack_args, kfunc_stack_args = 0, stack_args = 0;
+			u16 outgoing_stack_args = stack_arg_depth - incoming_stack_arg_depth;
 			u8 *ip = image + addrs[i - 1];
 
 			func = (u8 *) __bpf_call_base + imm32;
@@ -2449,6 +2549,29 @@ st:			if (is_imm8(insn->off))
 			}
 			if (!imm32)
 				return -EINVAL;
+
+			if (src_reg == BPF_PSEUDO_CALL && outgoing_stack_args > 0) {
+				n_stack_args = outgoing_stack_args / 8;
+				base_off = -(prog_stack_depth + incoming_stack_arg_depth);
+				ip += push_stack_args(&prog, base_off, n_stack_args, 1);
+			}
+
+			if (src_reg != BPF_PSEUDO_CALL && insn->off > 0) {
+				kfunc_stack_args = insn->off;
+				stack_args = kfunc_stack_args > 1 ? kfunc_stack_args - 1 : 0;
+				base_off = -(prog_stack_depth + incoming_stack_arg_depth);
+				ip += push_stack_args(&prog, base_off, kfunc_stack_args, 2);
+
+				/* mov r9, [rbp + base_off - 8] */
+				off = base_off - 8;
+				if (is_imm8(off)) {
+					EMIT4(0x4C, 0x8B, 0x4D, off);
+					ip += 4;
+				} else {
+					EMIT3_off32(0x4C, 0x8B, 0x8D, off);
+					ip += 7;
+				}
+			}
 			if (priv_frame_ptr) {
 				push_r9(&prog);
 				ip += 2;
@@ -2458,6 +2581,14 @@ st:			if (is_imm8(insn->off))
 				return -EINVAL;
 			if (priv_frame_ptr)
 				pop_r9(&prog);
+			if (stack_args > 0) {
+				/* add rsp, stack_args * 8 */
+				EMIT4(0x48, 0x83, 0xC4, stack_args * 8);
+			}
+			if (src_reg == BPF_PSEUDO_CALL && outgoing_stack_args > 0) {
+				/* add rsp, outgoing_stack_args */
+				EMIT4(0x48, 0x83, 0xC4, outgoing_stack_args);
+			}
 			break;
 		}
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next 08/10] selftests/bpf: Add tests for BPF function stack arguments
  2026-04-02  1:27 [PATCH bpf-next 00/10] bpf: Support stack arguments for BPF functions and kfuncs Yonghong Song
                   ` (6 preceding siblings ...)
  2026-04-02  1:28 ` [PATCH bpf-next 07/10] bpf,x86: Implement JIT support for stack arguments Yonghong Song
@ 2026-04-02  1:28 ` Yonghong Song
  2026-04-02  1:28 ` [PATCH bpf-next 09/10] selftests/bpf: Add negative test for oversized kfunc stack argument Yonghong Song
  2026-04-02  1:28 ` [PATCH bpf-next 10/10] selftests/bpf: Add verifier tests for stack argument validation Yonghong Song
  9 siblings, 0 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02  1:28 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

Add selftests covering stack argument passing for both BPF-to-BPF
subprog calls and kfunc calls with more than 5 arguments. All tests
are guarded by __BPF_FEATURE_STACK_ARGUMENT and __TARGET_ARCH_x86.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 .../selftests/bpf/prog_tests/stack_arg.c      | 143 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/stack_arg.c | 111 ++++++++++++++
 .../selftests/bpf/progs/stack_arg_kfunc.c     |  59 ++++++++
 .../selftests/bpf/test_kmods/bpf_testmod.c    |  22 +++
 .../bpf/test_kmods/bpf_testmod_kfunc.h        |   7 +
 5 files changed, 342 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/stack_arg.c
 create mode 100644 tools/testing/selftests/bpf/progs/stack_arg.c
 create mode 100644 tools/testing/selftests/bpf/progs/stack_arg_kfunc.c

diff --git a/tools/testing/selftests/bpf/prog_tests/stack_arg.c b/tools/testing/selftests/bpf/prog_tests/stack_arg.c
new file mode 100644
index 000000000000..86b8025f5541
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/stack_arg.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+
+#include <test_progs.h>
+#include <network_helpers.h>
+#include "stack_arg.skel.h"
+#include "stack_arg_kfunc.skel.h"
+
+static void test_nesting(void)
+{
+	struct stack_arg *skel;
+	int err, prog_fd;
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+
+	skel = stack_arg__open();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		return;
+
+	if (!skel->rodata->has_stack_arg) {
+		test__skip();
+		goto out;
+	}
+
+	err = stack_arg__load(skel);
+	if (!ASSERT_OK(err, "load"))
+		goto out;
+
+	skel->bss->a = 0;
+	skel->bss->b = 0;
+	skel->bss->c = 0;
+	skel->bss->d = 0;
+	skel->bss->e = 0;
+	skel->bss->f = 6;
+	skel->bss->g = 7;
+	skel->bss->i = 8;
+
+	prog_fd = bpf_program__fd(skel->progs.test);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run");
+	ASSERT_EQ(topts.retval, 29, "retval");
+
+out:
+	stack_arg__destroy(skel);
+}
+
+static void run_subtest(struct bpf_program *prog, int expected)
+{
+	int err, prog_fd;
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+
+	prog_fd = bpf_program__fd(prog);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run");
+	ASSERT_EQ(topts.retval, expected, "retval");
+}
+
+static void test_global_many(void)
+{
+	struct stack_arg *skel;
+
+	skel = stack_arg__open();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		return;
+
+	if (!skel->rodata->has_stack_arg) {
+		test__skip();
+		goto out;
+	}
+
+	if (!ASSERT_OK(stack_arg__load(skel), "load"))
+		goto out;
+
+	run_subtest(skel->progs.test_global_many_args, 36);
+
+out:
+	stack_arg__destroy(skel);
+}
+
+static void test_async_cb_many(void)
+{
+	struct stack_arg *skel;
+
+	skel = stack_arg__open();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		return;
+
+	if (!skel->rodata->has_stack_arg) {
+		test__skip();
+		goto out;
+	}
+
+	if (!ASSERT_OK(stack_arg__load(skel), "load"))
+		goto out;
+
+	run_subtest(skel->progs.test_async_cb_many_args, 0);
+
+out:
+	stack_arg__destroy(skel);
+}
+
+static void test_kfunc(void)
+{
+	struct stack_arg_kfunc *skel;
+
+	skel = stack_arg_kfunc__open();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		return;
+
+	if (!skel->rodata->has_stack_arg) {
+		test__skip();
+		goto out;
+	}
+
+	if (!ASSERT_OK(stack_arg_kfunc__load(skel), "load"))
+		goto out;
+
+	run_subtest(skel->progs.test_stack_arg_scalar, 36);
+	run_subtest(skel->progs.test_stack_arg_ptr, 45);
+	run_subtest(skel->progs.test_stack_arg_mix, 51);
+
+out:
+	stack_arg_kfunc__destroy(skel);
+}
+
+void test_stack_arg(void)
+{
+	if (test__start_subtest("nesting"))
+		test_nesting();
+	if (test__start_subtest("global_many_args"))
+		test_global_many();
+	if (test__start_subtest("async_cb_many_args"))
+		test_async_cb_many();
+	if (test__start_subtest("kfunc"))
+		test_kfunc();
+}
diff --git a/tools/testing/selftests/bpf/progs/stack_arg.c b/tools/testing/selftests/bpf/progs/stack_arg.c
new file mode 100644
index 000000000000..c139a699fc3c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/stack_arg.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <stdbool.h>
+#include <bpf/bpf_helpers.h>
+
+#define CLOCK_MONOTONIC 1
+
+long a, b, c, d, e, f, g, i;
+
+struct timer_elem {
+	struct bpf_timer timer;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, int);
+	__type(value, struct timer_elem);
+} timer_map SEC(".maps");
+
+int timer_result;
+
+#if defined(__TARGET_ARCH_x86) && defined(__BPF_FEATURE_STACK_ARGUMENT)
+
+const volatile bool has_stack_arg = true;
+
+__noinline static long func_b(long a, long b, long c, long d,
+			      long e, long f, long g, long h)
+{
+	return a + b + c + d + e + f + g + h;
+}
+
+__noinline static long func_a(long a, long b, long c, long d,
+			      long e, long f, long g, long h)
+{
+	return func_b(a + 1, b + 1, c + 1, d + 1,
+		      e + 1, f + 1, g + 1, h + 1);
+}
+
+SEC("tc")
+int test(void)
+{
+	return func_a(a, b, c, d, e, f, g, i);
+}
+
+__noinline static int static_func_many_args(int a, int b, int c, int d,
+					    int e, int f, int g, int h)
+{
+	return a + b + c + d + e + f + g + h;
+}
+
+__noinline int global_calls_many_args(int a, int b, int c)
+{
+	return static_func_many_args(a, b, c, 4, 5, 6, 7, 8);
+}
+
+SEC("tc")
+int test_global_many_args(void)
+{
+	return global_calls_many_args(1, 2, 3);
+}
+
+static int timer_cb_many_args(void *map, int *key, struct bpf_timer *timer)
+{
+	timer_result = static_func_many_args(10, 20, 30, 40, 50, 60, 70, 80);
+	return 0;
+}
+
+SEC("tc")
+int test_async_cb_many_args(void)
+{
+	struct timer_elem *elem;
+	int key = 0;
+
+	elem = bpf_map_lookup_elem(&timer_map, &key);
+	if (!elem)
+		return -1;
+
+	bpf_timer_init(&elem->timer, &timer_map, CLOCK_MONOTONIC);
+	bpf_timer_set_callback(&elem->timer, timer_cb_many_args);
+	bpf_timer_start(&elem->timer, 1, 0);
+	return 0;
+}
+
+#else
+
+const volatile bool has_stack_arg = false;
+
+SEC("tc")
+int test(void)
+{
+	return 0;
+}
+
+SEC("tc")
+int test_global_many_args(void)
+{
+	return 0;
+}
+
+SEC("tc")
+int test_async_cb_many_args(void)
+{
+	return 0;
+}
+
+#endif
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/stack_arg_kfunc.c b/tools/testing/selftests/bpf/progs/stack_arg_kfunc.c
new file mode 100644
index 000000000000..a440e9b42a4a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/stack_arg_kfunc.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include "../test_kmods/bpf_testmod_kfunc.h"
+
+#if defined(__TARGET_ARCH_x86) && defined(__BPF_FEATURE_STACK_ARGUMENT)
+
+const volatile bool has_stack_arg = true;
+
+SEC("tc")
+int test_stack_arg_scalar(struct __sk_buff *skb)
+{
+	return bpf_kfunc_call_stack_arg(1, 2, 3, 4, 5, 6, 7, 8);
+}
+
+SEC("tc")
+int test_stack_arg_ptr(struct __sk_buff *skb)
+{
+	struct prog_test_pass1 p = { .x0 = 10, .x1 = 20 };
+
+	return bpf_kfunc_call_stack_arg_ptr(1, 2, 3, 4, 5, &p);
+}
+
+SEC("tc")
+int test_stack_arg_mix(struct __sk_buff *skb)
+{
+	struct prog_test_pass1 p = { .x0 = 10 };
+	struct prog_test_pass1 q = { .x1 = 20 };
+
+	return bpf_kfunc_call_stack_arg_mix(1, 2, 3, 4, 5, &p, 6, &q);
+}
+
+#else
+
+const volatile bool has_stack_arg = false;
+
+SEC("tc")
+int test_stack_arg_scalar(struct __sk_buff *skb)
+{
+	return 0;
+}
+
+SEC("tc")
+int test_stack_arg_ptr(struct __sk_buff *skb)
+{
+	return 0;
+}
+
+SEC("tc")
+int test_stack_arg_mix(struct __sk_buff *skb)
+{
+	return 0;
+}
+
+#endif
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
index 061356f10093..d88ab1dc5106 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
@@ -824,6 +824,25 @@ __bpf_kfunc int bpf_kfunc_call_test5(u8 a, u16 b, u32 c)
 	return 0;
 }
 
+__bpf_kfunc u64 bpf_kfunc_call_stack_arg(u64 a, u64 b, u64 c, u64 d,
+					 u64 e, u64 f, u64 g, u64 h)
+{
+	return a + b + c + d + e + f + g + h;
+}
+
+__bpf_kfunc u64 bpf_kfunc_call_stack_arg_ptr(u64 a, u64 b, u64 c, u64 d, u64 e,
+					     struct prog_test_pass1 *p)
+{
+	return a + b + c + d + e + p->x0 + p->x1;
+}
+
+__bpf_kfunc u64 bpf_kfunc_call_stack_arg_mix(u64 a, u64 b, u64 c, u64 d, u64 e,
+					     struct prog_test_pass1 *p, u64 f,
+					     struct prog_test_pass1 *q)
+{
+	return a + b + c + d + e + p->x0 + f + q->x1;
+}
+
 static struct prog_test_ref_kfunc prog_test_struct = {
 	.a = 42,
 	.b = 108,
@@ -1287,6 +1306,9 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test2)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test3)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test4)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test5)
+BTF_ID_FLAGS(func, bpf_kfunc_call_stack_arg)
+BTF_ID_FLAGS(func, bpf_kfunc_call_stack_arg_ptr)
+BTF_ID_FLAGS(func, bpf_kfunc_call_stack_arg_mix)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_pass1)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail1)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail2)
diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod_kfunc.h b/tools/testing/selftests/bpf/test_kmods/bpf_testmod_kfunc.h
index aa0b8d41e71b..f93cf2db12b2 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod_kfunc.h
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod_kfunc.h
@@ -111,6 +111,13 @@ int bpf_kfunc_call_test2(struct sock *sk, __u32 a, __u32 b) __ksym;
 struct sock *bpf_kfunc_call_test3(struct sock *sk) __ksym;
 long bpf_kfunc_call_test4(signed char a, short b, int c, long d) __ksym;
 int bpf_kfunc_call_test5(__u8 a, __u16 b, __u32 c) __ksym;
+__u64 bpf_kfunc_call_stack_arg(__u64 a, __u64 b, __u64 c, __u64 d,
+			       __u64 e, __u64 f, __u64 g, __u64 h) __ksym;
+__u64 bpf_kfunc_call_stack_arg_ptr(__u64 a, __u64 b, __u64 c, __u64 d, __u64 e,
+				   struct prog_test_pass1 *p) __ksym;
+__u64 bpf_kfunc_call_stack_arg_mix(__u64 a, __u64 b, __u64 c, __u64 d, __u64 e,
+				   struct prog_test_pass1 *p, __u64 f,
+				   struct prog_test_pass1 *q) __ksym;
 
 void bpf_kfunc_call_test_pass_ctx(struct __sk_buff *skb) __ksym;
 void bpf_kfunc_call_test_pass1(struct prog_test_pass1 *p) __ksym;
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next 09/10] selftests/bpf: Add negative test for oversized kfunc stack argument
  2026-04-02  1:27 [PATCH bpf-next 00/10] bpf: Support stack arguments for BPF functions and kfuncs Yonghong Song
                   ` (7 preceding siblings ...)
  2026-04-02  1:28 ` [PATCH bpf-next 08/10] selftests/bpf: Add tests for BPF function " Yonghong Song
@ 2026-04-02  1:28 ` Yonghong Song
  2026-04-02  1:28 ` [PATCH bpf-next 10/10] selftests/bpf: Add verifier tests for stack argument validation Yonghong Song
  9 siblings, 0 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02  1:28 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

Add a test that the verifier rejects kfunc calls where a stack argument
exceeds 8 bytes (the register-sized slot limit).

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 .../selftests/bpf/prog_tests/stack_arg_fail.c | 24 ++++++++++++++
 .../selftests/bpf/progs/stack_arg_fail.c      | 32 +++++++++++++++++++
 .../selftests/bpf/test_kmods/bpf_testmod.c    |  7 ++++
 .../bpf/test_kmods/bpf_testmod_kfunc.h        |  7 ++++
 4 files changed, 70 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/stack_arg_fail.c
 create mode 100644 tools/testing/selftests/bpf/progs/stack_arg_fail.c

diff --git a/tools/testing/selftests/bpf/prog_tests/stack_arg_fail.c b/tools/testing/selftests/bpf/prog_tests/stack_arg_fail.c
new file mode 100644
index 000000000000..328a79edee45
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/stack_arg_fail.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+
+#include <test_progs.h>
+#include "stack_arg_fail.skel.h"
+
+void test_stack_arg_fail(void)
+{
+	struct stack_arg_fail *skel;
+
+	skel = stack_arg_fail__open();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		return;
+
+	if (!skel->rodata->has_stack_arg) {
+		test__skip();
+		goto out;
+	}
+
+	ASSERT_ERR(stack_arg_fail__load(skel), "load_should_fail");
+
+out:
+	stack_arg_fail__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/stack_arg_fail.c b/tools/testing/selftests/bpf/progs/stack_arg_fail.c
new file mode 100644
index 000000000000..caa63b6f6a80
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/stack_arg_fail.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include "../test_kmods/bpf_testmod_kfunc.h"
+
+#if defined(__BPF_FEATURE_STACK_ARGUMENT)
+
+const volatile bool has_stack_arg = true;
+
+SEC("tc")
+int test_stack_arg_big(struct __sk_buff *skb)
+{
+	struct prog_test_big_arg s = { .a = 1, .b = 2 };
+
+	return bpf_kfunc_call_stack_arg_big(1, 2, 3, 4, 5, s);
+}
+
+#else
+
+const volatile bool has_stack_arg = false;
+
+SEC("tc")
+int test_stack_arg_big(struct __sk_buff *skb)
+{
+	return 0;
+}
+
+#endif
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
index d88ab1dc5106..00d0b238219d 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
@@ -843,6 +843,12 @@ __bpf_kfunc u64 bpf_kfunc_call_stack_arg_mix(u64 a, u64 b, u64 c, u64 d, u64 e,
 	return a + b + c + d + e + p->x0 + f + q->x1;
 }
 
+__bpf_kfunc u64 bpf_kfunc_call_stack_arg_big(u64 a, u64 b, u64 c, u64 d, u64 e,
+					     struct prog_test_big_arg s)
+{
+	return a + b + c + d + e + s.a + s.b;
+}
+
 static struct prog_test_ref_kfunc prog_test_struct = {
 	.a = 42,
 	.b = 108,
@@ -1309,6 +1315,7 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test5)
 BTF_ID_FLAGS(func, bpf_kfunc_call_stack_arg)
 BTF_ID_FLAGS(func, bpf_kfunc_call_stack_arg_ptr)
 BTF_ID_FLAGS(func, bpf_kfunc_call_stack_arg_mix)
+BTF_ID_FLAGS(func, bpf_kfunc_call_stack_arg_big)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_pass1)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail1)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail2)
diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod_kfunc.h b/tools/testing/selftests/bpf/test_kmods/bpf_testmod_kfunc.h
index f93cf2db12b2..b142b87e9f60 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod_kfunc.h
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod_kfunc.h
@@ -48,6 +48,11 @@ struct prog_test_pass2 {
 	} x;
 };
 
+struct prog_test_big_arg {
+	long a;
+	long b;
+};
+
 struct prog_test_fail1 {
 	void *p;
 	int x;
@@ -118,6 +123,8 @@ __u64 bpf_kfunc_call_stack_arg_ptr(__u64 a, __u64 b, __u64 c, __u64 d, __u64 e,
 __u64 bpf_kfunc_call_stack_arg_mix(__u64 a, __u64 b, __u64 c, __u64 d, __u64 e,
 				   struct prog_test_pass1 *p, __u64 f,
 				   struct prog_test_pass1 *q) __ksym;
+__u64 bpf_kfunc_call_stack_arg_big(__u64 a, __u64 b, __u64 c, __u64 d, __u64 e,
+				   struct prog_test_big_arg s) __ksym;
 
 void bpf_kfunc_call_test_pass_ctx(struct __sk_buff *skb) __ksym;
 void bpf_kfunc_call_test_pass1(struct prog_test_pass1 *p) __ksym;
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next 10/10] selftests/bpf: Add verifier tests for stack argument validation
  2026-04-02  1:27 [PATCH bpf-next 00/10] bpf: Support stack arguments for BPF functions and kfuncs Yonghong Song
                   ` (8 preceding siblings ...)
  2026-04-02  1:28 ` [PATCH bpf-next 09/10] selftests/bpf: Add negative test for oversized kfunc stack argument Yonghong Song
@ 2026-04-02  1:28 ` Yonghong Song
  9 siblings, 0 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02  1:28 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

Add inline-asm-based verifier tests that exercise the stack argument
validation logic directly.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 .../selftests/bpf/prog_tests/verifier.c       |   2 +
 .../selftests/bpf/progs/verifier_stack_arg.c  | 122 ++++++++++++++++++
 2 files changed, 124 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_stack_arg.c

diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
index bcf01cb4cfe4..cbdbcb913bae 100644
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c
+++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
@@ -90,6 +90,7 @@
 #include "verifier_sockmap_mutate.skel.h"
 #include "verifier_spill_fill.skel.h"
 #include "verifier_spin_lock.skel.h"
+#include "verifier_stack_arg.skel.h"
 #include "verifier_stack_ptr.skel.h"
 #include "verifier_store_release.skel.h"
 #include "verifier_subprog_precision.skel.h"
@@ -235,6 +236,7 @@ void test_verifier_sock_addr(void)            { RUN(verifier_sock_addr); }
 void test_verifier_sockmap_mutate(void)       { RUN(verifier_sockmap_mutate); }
 void test_verifier_spill_fill(void)           { RUN(verifier_spill_fill); }
 void test_verifier_spin_lock(void)            { RUN(verifier_spin_lock); }
+void test_verifier_stack_arg(void)            { RUN(verifier_stack_arg); }
 void test_verifier_stack_ptr(void)            { RUN(verifier_stack_ptr); }
 void test_verifier_store_release(void)        { RUN(verifier_store_release); }
 void test_verifier_subprog_precision(void)    { RUN(verifier_subprog_precision); }
diff --git a/tools/testing/selftests/bpf/progs/verifier_stack_arg.c b/tools/testing/selftests/bpf/progs/verifier_stack_arg.c
new file mode 100644
index 000000000000..eb1005c771f7
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_stack_arg.c
@@ -0,0 +1,122 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+#if defined(__TARGET_ARCH_x86) && defined(__BPF_FEATURE_STACK_ARGUMENT)
+
+__noinline __used
+static int subprog_6args(int a, int b, int c, int d, int e, int f)
+{
+	return a + b + c + d + e + f;
+}
+
+__noinline __used
+static int subprog_7args(int a, int b, int c, int d, int e, int f, int g)
+{
+	return a + b + c + d + e + f + g;
+}
+
+SEC("tc")
+__description("stack_arg: subprog with 6 args")
+__success
+__arch_x86_64
+__naked void stack_arg_6args(void)
+{
+	asm volatile (
+		"r1 = 1;"
+		"r2 = 2;"
+		"r3 = 3;"
+		"r4 = 4;"
+		"r5 = 5;"
+		"*(u64 *)(r12 - 8) = 6;"
+		"call subprog_6args;"
+		"exit;"
+		::: __clobber_all
+	);
+}
+
+SEC("tc")
+__description("stack_arg: two subprogs with >5 args")
+__success
+__arch_x86_64
+__naked void stack_arg_two_subprogs(void)
+{
+	asm volatile (
+		"r1 = 1;"
+		"r2 = 2;"
+		"r3 = 3;"
+		"r4 = 4;"
+		"r5 = 5;"
+		"*(u64 *)(r12 - 8) = 10;"
+		"call subprog_6args;"
+		"r6 = r0;"
+		"r1 = 1;"
+		"r2 = 2;"
+		"r3 = 3;"
+		"r4 = 4;"
+		"r5 = 5;"
+		"*(u64 *)(r12 - 16) = 30;"
+		"*(u64 *)(r12 - 8) = 20;"
+		"call subprog_7args;"
+		"r0 += r6;"
+		"exit;"
+		::: __clobber_all
+	);
+}
+
+SEC("tc")
+__description("stack_arg: gap at offset -8, only wrote -16")
+__failure
+__msg("stack arg#6 not properly initialized")
+__arch_x86_64
+__naked void stack_arg_gap_at_minus8(void)
+{
+	asm volatile (
+		"r1 = 1;"
+		"r2 = 2;"
+		"r3 = 3;"
+		"r4 = 4;"
+		"r5 = 5;"
+		"*(u64 *)(r12 - 16) = 30;"
+		"call subprog_7args;"
+		"exit;"
+		::: __clobber_all
+	);
+}
+
+SEC("tc")
+__description("stack_arg: misaligned stack arg write")
+__failure
+__msg("stack arg write must be 8 bytes, got 4")
+__arch_x86_64
+__naked void stack_arg_misaligned_write(void)
+{
+	asm volatile (
+		"r1 = 1;"
+		"r2 = 2;"
+		"r3 = 3;"
+		"r4 = 4;"
+		"r5 = 5;"
+		"*(u32 *)(r12 - 4) = 30;"
+		"call subprog_6args;"
+		"exit;"
+		::: __clobber_all
+	);
+}
+
+#else
+
+SEC("socket")
+__description("stack_arg is not supported by compiler or jit, use a dummy test")
+__success
+int dummy_test(void)
+{
+	return 0;
+}
+
+#endif
+
+char _license[] SEC("license") = "GPL";
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-02  1:27 ` [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions Yonghong Song
@ 2026-04-02  3:18   ` bot+bpf-ci
  2026-04-02 14:42     ` Yonghong Song
  2026-04-02 18:55   ` Amery Hung
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 33+ messages in thread
From: bot+bpf-ci @ 2026-04-02  3:18 UTC (permalink / raw)
  To: yonghong.song, bpf
  Cc: ast, andrii, daniel, jose.marchesi, kernel-team, martin.lau, ast,
	andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 2565 bytes --]

> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 090aa26d1c98..a260610cd1c1 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h

[ ... ]

> @@ -319,6 +324,10 @@ struct bpf_func_state {
>  	 * `stack`. allocated_stack is always a multiple of BPF_REG_SIZE.
>  	 */
>  	int allocated_stack;
> +
> +	u16 stack_arg_depth; /* Size of incoming + max outgoing stack args in bytes. */
> +	u16 incoming_stack_arg_depth; /* Size of incoming stack args in bytes. */
> +	struct bpf_stack_arg_state *stack_arg_slots;
>  };

The new stack_arg_slots state is part of bpf_func_state and is
properly copied in copy_stack_state() and freed in free_func_state(),
but stacksafe() does not compare it during state pruning.

stacksafe() only iterates the regular stack:

kernel/bpf/verifier.c:stacksafe() {
    for (i = 0; i < old->allocated_stack; i++) {
        ...
        old->stack[spi].slot_type[...]
        ...
    }
    return true;
}

It never examines old->stack_arg_slots or cur->stack_arg_slots. This
means func_states_equal() -> stacksafe() treats two states as
equivalent even when they differ in their stack arg slot contents.

Could this allow the verifier to incorrectly prune a path where
outgoing stack arg slots are uninitialized? For example, if a BPF
program has two paths to a call instruction -- one that writes the
outgoing stack args and one that does not -- the verifier could
explore the initialized path first, save the state at a checkpoint,
then prune the uninitialized path as equivalent. The initialization
check in set_callee_state():

kernel/bpf/verifier.c:set_callee_state() {
    for (i = 0; i < callee_incoming_slots; i++) {
        int caller_spi = i + caller_incoming_slots;

        if (!is_stack_arg_slot_initialized(caller, caller_spi)) {
            verbose(env, "stack arg#%d not properly initialized\n",
                i + 1 + MAX_BPF_FUNC_REG_ARGS);
            return -EINVAL;
        }
        callee->stack_arg_slots[i] = caller->stack_arg_slots[caller_spi];
    }
}

would never run on the pruned path, potentially allowing the callee
to receive uninitialized data.

It seems like stacksafe() needs a companion loop for stack_arg_slots,
similar to how it walks the regular stack, to maintain verifier
soundness for this new state.


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/23879588767

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 04/10] bpf: Support stack arguments for kfunc calls
  2026-04-02  1:27 ` [PATCH bpf-next 04/10] bpf: Support stack arguments for kfunc calls Yonghong Song
@ 2026-04-02  3:18   ` bot+bpf-ci
  2026-04-02 14:45     ` Yonghong Song
  2026-04-02 21:02   ` Amery Hung
  1 sibling, 1 reply; 33+ messages in thread
From: bot+bpf-ci @ 2026-04-02  3:18 UTC (permalink / raw)
  To: yonghong.song, bpf
  Cc: ast, andrii, daniel, jose.marchesi, kernel-team, martin.lau, ast,
	andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 2501 bytes --]

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d424fe611ef8..6579156486b8 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c

[ ... ]

> @@ -13687,13 +13694,41 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  	 * verifier sees.
>  	 */
>  	for (i = 0; i < nargs; i++) {
> -		struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[i + 1];
> +		struct bpf_reg_state *regs = cur_regs(env), *reg;
> +		struct bpf_reg_state saved_reg;

[ ... ]

> +		if (i < MAX_BPF_FUNC_REG_ARGS) {
> +			reg = &regs[i + 1];
> +		} else {
> +			/*
> +			 * Retrieve the spilled reg state from the stack arg slot.
> +			 * Reuse the existing type checking infrastructure which
> +			 * reads from cur_regs(env)[regno], temporarily copy the
> +			 * stack arg reg state into regs[BPF_REG_1] and restore
> +			 * it after checking.
> +			 */

[ ... ]

> +			is_stack_arg = true;
> +			regno = BPF_REG_1;
> +			saved_reg = regs[BPF_REG_1];
> +			regs[BPF_REG_1] = caller->stack_arg_slots[spi].spilled_ptr;
> +			reg = &regs[BPF_REG_1];
> +		}
> +
>  		if (is_kfunc_arg_prog_aux(btf, &args[i])) {

[ ... ]

> -			continue;
> +			goto next_arg;
>  		}
>
>  		if (is_kfunc_arg_ignore(btf, &args[i]) || is_kfunc_arg_implicit(meta, i))
>  			continue;

The is_kfunc_arg_prog_aux path above was converted from continue to
goto next_arg so that regs[BPF_REG_1] gets restored via the next_arg
label. The scalar path and the KF_ARG_PTR_TO_NULL path were also
converted the same way.

Should this continue for is_kfunc_arg_ignore/is_kfunc_arg_implicit
also be goto next_arg? When a stack arg (i >= MAX_BPF_FUNC_REG_ARGS)
hits this continue, is_stack_arg is true but the next_arg restore is
skipped, so regs[BPF_REG_1] keeps the spilled pointer from the
current iteration. On the next iteration, saved_reg then captures that
stale value instead of the original register state.

The comment above says "restore it after checking", which this continue
path does not do. The impact is currently masked because each iteration
independently overwrites regs[BPF_REG_1] and the caller_saved clearing
in check_kfunc_call() resets R1 after the loop, but it breaks the
save/restore invariant the rest of this patch establishes.


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/23879588767

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-02  3:18   ` bot+bpf-ci
@ 2026-04-02 14:42     ` Yonghong Song
  0 siblings, 0 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02 14:42 UTC (permalink / raw)
  To: bot+bpf-ci, bpf
  Cc: ast, andrii, daniel, jose.marchesi, kernel-team, martin.lau,
	eddyz87, clm, ihor.solodrai



On 4/1/26 8:18 PM, bot+bpf-ci@kernel.org wrote:
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 090aa26d1c98..a260610cd1c1 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
> [ ... ]
>
>> @@ -319,6 +324,10 @@ struct bpf_func_state {
>>   	 * `stack`. allocated_stack is always a multiple of BPF_REG_SIZE.
>>   	 */
>>   	int allocated_stack;
>> +
>> +	u16 stack_arg_depth; /* Size of incoming + max outgoing stack args in bytes. */
>> +	u16 incoming_stack_arg_depth; /* Size of incoming stack args in bytes. */
>> +	struct bpf_stack_arg_state *stack_arg_slots;
>>   };
> The new stack_arg_slots state is part of bpf_func_state and is
> properly copied in copy_stack_state() and freed in free_func_state(),
> but stacksafe() does not compare it during state pruning.
>
> stacksafe() only iterates the regular stack:
>
> kernel/bpf/verifier.c:stacksafe() {
>      for (i = 0; i < old->allocated_stack; i++) {
>          ...
>          old->stack[spi].slot_type[...]
>          ...
>      }
>      return true;
> }
>
> It never examines old->stack_arg_slots or cur->stack_arg_slots. This
> means func_states_equal() -> stacksafe() treats two states as
> equivalent even when they differ in their stack arg slot contents.
>
> Could this allow the verifier to incorrectly prune a path where
> outgoing stack arg slots are uninitialized? For example, if a BPF
> program has two paths to a call instruction -- one that writes the
> outgoing stack args and one that does not -- the verifier could
> explore the initialized path first, save the state at a checkpoint,
> then prune the uninitialized path as equivalent. The initialization
> check in set_callee_state():
>
> kernel/bpf/verifier.c:set_callee_state() {
>      for (i = 0; i < callee_incoming_slots; i++) {
>          int caller_spi = i + caller_incoming_slots;
>
>          if (!is_stack_arg_slot_initialized(caller, caller_spi)) {
>              verbose(env, "stack arg#%d not properly initialized\n",
>                  i + 1 + MAX_BPF_FUNC_REG_ARGS);
>              return -EINVAL;
>          }
>          callee->stack_arg_slots[i] = caller->stack_arg_slots[caller_spi];
>      }
> }
>
> would never run on the pruned path, potentially allowing the callee
> to receive uninitialized data.
>
> It seems like stacksafe() needs a companion loop for stack_arg_slots,
> similar to how it walks the regular stack, to maintain verifier
> soundness for this new state.

You are absolutely correct. Indeed, we should do some checking for
stack_arg_slots alongside stacksafe(). I will fix it in the next
revision.

>
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/23879588767


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 04/10] bpf: Support stack arguments for kfunc calls
  2026-04-02  3:18   ` bot+bpf-ci
@ 2026-04-02 14:45     ` Yonghong Song
  0 siblings, 0 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02 14:45 UTC (permalink / raw)
  To: bot+bpf-ci, bpf
  Cc: ast, andrii, daniel, jose.marchesi, kernel-team, martin.lau,
	eddyz87, clm, ihor.solodrai



On 4/1/26 8:18 PM, bot+bpf-ci@kernel.org wrote:
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index d424fe611ef8..6579156486b8 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
> [ ... ]
>
>> @@ -13687,13 +13694,41 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>>   	 * verifier sees.
>>   	 */
>>   	for (i = 0; i < nargs; i++) {
>> -		struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[i + 1];
>> +		struct bpf_reg_state *regs = cur_regs(env), *reg;
>> +		struct bpf_reg_state saved_reg;
> [ ... ]
>
>> +		if (i < MAX_BPF_FUNC_REG_ARGS) {
>> +			reg = &regs[i + 1];
>> +		} else {
>> +			/*
>> +			 * Retrieve the spilled reg state from the stack arg slot.
>> +			 * Reuse the existing type checking infrastructure which
>> +			 * reads from cur_regs(env)[regno], temporarily copy the
>> +			 * stack arg reg state into regs[BPF_REG_1] and restore
>> +			 * it after checking.
>> +			 */
> [ ... ]
>
>> +			is_stack_arg = true;
>> +			regno = BPF_REG_1;
>> +			saved_reg = regs[BPF_REG_1];
>> +			regs[BPF_REG_1] = caller->stack_arg_slots[spi].spilled_ptr;
>> +			reg = &regs[BPF_REG_1];
>> +		}
>> +
>>   		if (is_kfunc_arg_prog_aux(btf, &args[i])) {
> [ ... ]
>
>> -			continue;
>> +			goto next_arg;
>>   		}
>>
>>   		if (is_kfunc_arg_ignore(btf, &args[i]) || is_kfunc_arg_implicit(meta, i))
>>   			continue;
> The is_kfunc_arg_prog_aux path above was converted from continue to
> goto next_arg so that regs[BPF_REG_1] gets restored via the next_arg
> label. The scalar path and the KF_ARG_PTR_TO_NULL path were also
> converted the same way.
>
> Should this continue for is_kfunc_arg_ignore/is_kfunc_arg_implicit
> also be goto next_arg? When a stack arg (i >= MAX_BPF_FUNC_REG_ARGS)

You are right. My initial version against an early bpf-next was okay.
But after rebasing on top of master, I missed the above as it is
newly introduced. Will fix in the next revision.

> hits this continue, is_stack_arg is true but the next_arg restore is
> skipped, so regs[BPF_REG_1] keeps the spilled pointer from the
> current iteration. On the next iteration, saved_reg then captures that
> stale value instead of the original register state.
>
> The comment above says "restore it after checking", which this continue
> path does not do. The impact is currently masked because each iteration
> independently overwrites regs[BPF_REG_1] and the caller_saved clearing
> in check_kfunc_call() resets R1 after the loop, but it breaks the
> save/restore invariant the rest of this patch establishes.
>
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/23879588767


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-02  1:27 ` [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions Yonghong Song
  2026-04-02  3:18   ` bot+bpf-ci
@ 2026-04-02 18:55   ` Amery Hung
  2026-04-02 20:45     ` Yonghong Song
  2026-04-02 23:38   ` Amery Hung
  2026-04-02 23:38   ` Alexei Starovoitov
  3 siblings, 1 reply; 33+ messages in thread
From: Amery Hung @ 2026-04-02 18:55 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

On Wed, Apr 1, 2026 at 6:27 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> Currently BPF functions (subprogs) are limited to 5 register arguments.
> With [1], the compiler can emit code that passes additional arguments
> via a dedicated stack area through bpf register
> BPF_REG_STACK_ARG_BASE (r12), introduced in the previous patch.
>
> The following is an example to show how stack arguments are saved
> and transferred between caller and callee:
>
>   int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
>     int a8 = ...;
>     ...
>     bar(a1, a2, a3, a4, a5, a6, a7, a8);
>     ...
>   }
>
> The following is an illustration of stack allocation:
>
>    Caller (foo)                           Callee (bar)
>    ============                           ============
>    r12-relative stack arg area:           r12-relative stack arg area:
>
>    r12-8:  [incoming arg 6]          +--> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>    r12-16: [incoming arg 7]          |+-> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>                                      ||+> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>    ---- incoming/outgoing boundary   |||  ---- incoming/outgoing boundary
>    r12-24: [outgoing arg 6 to callee]+||   ...
>    r12-32: [outgoing arg 7 to callee]-+|
>    r12-40: [outgoing arg 8 to callee]--+
>
>   The caller writes outgoing args past its own incoming area.
>   At the call site, the verifier transfers the caller's outgoing
>   slots into the callee's incoming slots.
>
> The verifier tracks stack arg slots separately from the regular r10
> stack. A new 'bpf_stack_arg_state' structure mirrors the existing stack
> slot tracking (spilled_ptr + slot_type[]) but lives in a dedicated
> 'stack_arg_slots' array in bpf_func_state. This separation keeps the
> stack arg area from interfering with the normal stack and frame pointer
> (r10) bookkeeping.
>
> If the bpf function has more than one call, e.g.,
>
>   int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
>     int a8 = ..., a9 = ...;
>     ...
>     bar1(a1, a2, a3, a4, a5, a6, a7, a8);
>     ...
>     bar2(a1, a2, a3, a4, a5, a6, a7, a8, a9);
>     ...
>   }
>
> The following is an illustration:
>
>    Caller (foo)                           Callee (bar1)
>    ============                           =============
>    r12-relative stack arg area:           r12-relative stack arg area:
>
>    r12-8:  [incoming arg 6]          +--> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>    r12-16: [incoming arg 7]          |+-> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>                                      ||+> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>    ---- incoming/outgoing boundary   |||  ---- incoming/outgoing boundary
>    r12-24: [outgoing arg 6 to callee]+||  ...
>    r12-32: [outgoing arg 7 to callee]-+|
>    r12-40: [outgoing arg 8 to callee]--+
>    ...
>    Back from bar1
>    ...                                     Callee (bar2)
>    ===                                     =============
>                                      +---> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>                                      |+--> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>                                      ||+-> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>                                      |||+> r12-32: [incoming arg 9] (from caller's outgoing r12-48)
>    ---- incoming/outgoing boundary   ||||  ---- incoming/outgoing boundary
>    r12-24: [outgoing arg 6 to callee]+|||  ...
>    r12-32: [outgoing arg 7 to callee]-+||
>    r12-40: [outgoing arg 8 to callee]--+|
>    r12-48: [outgoing arg 9 to callee]---+
>
> Global subprogs with >5 args are not yet supported.
>
>   [1] https://github.com/llvm/llvm-project/pull/189060
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  include/linux/bpf.h          |   2 +
>  include/linux/bpf_verifier.h |  15 ++-
>  kernel/bpf/btf.c             |  14 +-
>  kernel/bpf/verifier.c        | 248 ++++++++++++++++++++++++++++++++---
>  4 files changed, 257 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index e24c4a2e95f7..a0a1e14e4394 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1666,6 +1666,8 @@ struct bpf_prog_aux {
>         u32 max_pkt_offset;
>         u32 max_tp_access;
>         u32 stack_depth;
> +       u16 incoming_stack_arg_depth;
> +       u16 stack_arg_depth; /* both incoming and max outgoing of stack arguments */
>         u32 id;
>         u32 func_cnt; /* used by non-func prog as the number of func progs */
>         u32 real_func_cnt; /* includes hidden progs, only used for JIT and freeing progs */
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 090aa26d1c98..a260610cd1c1 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -268,6 +268,11 @@ struct bpf_retval_range {
>         bool return_32bit;
>  };
>
> +struct bpf_stack_arg_state {
> +       struct bpf_reg_state spilled_ptr; /* for spilled scalar/pointer semantics */
> +       u8 slot_type[BPF_REG_SIZE];
> +};
> +
>  /* state of the program:
>   * type of all registers and stack info
>   */
> @@ -319,6 +324,10 @@ struct bpf_func_state {
>          * `stack`. allocated_stack is always a multiple of BPF_REG_SIZE.
>          */
>         int allocated_stack;
> +
> +       u16 stack_arg_depth; /* Size of incoming + max outgoing stack args in bytes. */
> +       u16 incoming_stack_arg_depth; /* Size of incoming stack args in bytes. */
> +       struct bpf_stack_arg_state *stack_arg_slots;
>  };
>
>  #define MAX_CALL_FRAMES 8
> @@ -674,10 +683,12 @@ struct bpf_subprog_info {
>         bool keep_fastcall_stack: 1;
>         bool changes_pkt_data: 1;
>         bool might_sleep: 1;
> -       u8 arg_cnt:3;
> +       u8 arg_cnt:4;
>
>         enum priv_stack_mode priv_stack_mode;
> -       struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
> +       struct bpf_subprog_arg_info args[MAX_BPF_FUNC_ARGS];
> +       u16 incoming_stack_arg_depth;
> +       u16 outgoing_stack_arg_depth;
>  };
>
>  struct bpf_verifier_env;
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index a62d78581207..c5f3aa05d5a3 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -7887,13 +7887,19 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog)
>         }
>         args = (const struct btf_param *)(t + 1);
>         nargs = btf_type_vlen(t);
> -       if (nargs > MAX_BPF_FUNC_REG_ARGS) {
> -               if (!is_global)
> -                       return -EINVAL;
> -               bpf_log(log, "Global function %s() with %d > %d args. Buggy compiler.\n",
> +       if (nargs > MAX_BPF_FUNC_ARGS) {
> +               bpf_log(log, "Function %s() with %d > %d args not supported.\n",
> +                       tname, nargs, MAX_BPF_FUNC_ARGS);
> +               return -EINVAL;
> +       }
> +       if (is_global && nargs > MAX_BPF_FUNC_REG_ARGS) {
> +               bpf_log(log, "Global function %s() with %d > %d args not supported.\n",
>                         tname, nargs, MAX_BPF_FUNC_REG_ARGS);
>                 return -EINVAL;
>         }
> +       if (nargs > MAX_BPF_FUNC_REG_ARGS)
> +               sub->incoming_stack_arg_depth = (nargs - MAX_BPF_FUNC_REG_ARGS) * BPF_REG_SIZE;
> +
>         /* check that function is void or returns int, exception cb also requires this */
>         t = btf_type_by_id(btf, t->type);
>         while (btf_type_is_modifier(t))
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 8c1cf2eb6cbb..d424fe611ef8 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1488,6 +1488,19 @@ static int copy_stack_state(struct bpf_func_state *dst, const struct bpf_func_st
>                 return -ENOMEM;
>
>         dst->allocated_stack = src->allocated_stack;
> +
> +       /* copy stack_arg_slots state */
> +       n = src->stack_arg_depth / BPF_REG_SIZE;
> +       if (n) {
> +               dst->stack_arg_slots = copy_array(dst->stack_arg_slots, src->stack_arg_slots, n,
> +                                                 sizeof(struct bpf_stack_arg_state),
> +                                                 GFP_KERNEL_ACCOUNT);
> +               if (!dst->stack_arg_slots)
> +                       return -ENOMEM;
> +
> +               dst->stack_arg_depth = src->stack_arg_depth;
> +               dst->incoming_stack_arg_depth = src->incoming_stack_arg_depth;
> +       }
>         return 0;
>  }
>
> @@ -1529,6 +1542,25 @@ static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state
>         return 0;
>  }
>
> +static int grow_stack_arg_slots(struct bpf_verifier_env *env,
> +                               struct bpf_func_state *state, int size)
> +{
> +       size_t old_n = state->stack_arg_depth / BPF_REG_SIZE, n;
> +
> +       size = round_up(size, BPF_REG_SIZE);
> +       n = size / BPF_REG_SIZE;
> +       if (old_n >= n)
> +               return 0;
> +
> +       state->stack_arg_slots = realloc_array(state->stack_arg_slots, old_n, n,
> +                                              sizeof(struct bpf_stack_arg_state));
> +       if (!state->stack_arg_slots)
> +               return -ENOMEM;
> +
> +       state->stack_arg_depth = size;
> +       return 0;
> +}
> +
>  /* Acquire a pointer id from the env and update the state->refs to include
>   * this new pointer reference.
>   * On success, returns a valid pointer id to associate with the register
> @@ -1699,6 +1731,7 @@ static void free_func_state(struct bpf_func_state *state)
>  {
>         if (!state)
>                 return;
> +       kfree(state->stack_arg_slots);
>         kfree(state->stack);
>         kfree(state);
>  }
> @@ -5848,6 +5881,101 @@ static int check_stack_write(struct bpf_verifier_env *env,
>         return err;
>  }
>
> +/* Validate that a stack arg access is 8-byte sized and aligned. */
> +static int check_stack_arg_access(struct bpf_verifier_env *env,
> +                                 struct bpf_insn *insn, const char *op)
> +{
> +       int size = bpf_size_to_bytes(BPF_SIZE(insn->code));
> +
> +       if (size != BPF_REG_SIZE) {
> +               verbose(env, "stack arg %s must be %d bytes, got %d\n",
> +                       op, BPF_REG_SIZE, size);
> +               return -EINVAL;
> +       }
> +       if (insn->off % BPF_REG_SIZE) {
> +               verbose(env, "stack arg %s offset %d not aligned to %d\n",
> +                       op, insn->off, BPF_REG_SIZE);
> +               return -EINVAL;
> +       }
> +       return 0;
> +}
> +
> +/* Check that a stack arg slot has been properly initialized. */
> +static bool is_stack_arg_slot_initialized(struct bpf_func_state *state, int spi)
> +{
> +       u8 type;
> +
> +       if (spi >= (int)(state->stack_arg_depth / BPF_REG_SIZE))
> +               return false;
> +       type = state->stack_arg_slots[spi].slot_type[BPF_REG_SIZE - 1];
> +       return type == STACK_SPILL || type == STACK_MISC;
> +}
> +
> +/*
> + * Write a value to the stack arg area.
> + * off is the negative offset from the stack arg frame pointer.
> + * Caller ensures off is 8-byte aligned and size is BPF_REG_SIZE.
> + */
> +static int check_stack_arg_write(struct bpf_verifier_env *env, struct bpf_func_state *state,
> +                                int off, int value_regno)
> +{
> +       int spi = (-off - 1) / BPF_REG_SIZE;
> +       struct bpf_func_state *cur;
> +       struct bpf_reg_state *reg;
> +       int i, err;
> +       u8 type;
> +
> +       err = grow_stack_arg_slots(env, state, -off);
> +       if (err)
> +               return err;
> +
> +       cur = env->cur_state->frame[env->cur_state->curframe];
> +       if (value_regno >= 0) {
> +               reg = &cur->regs[value_regno];
> +               state->stack_arg_slots[spi].spilled_ptr = *reg;
> +               type = is_spillable_regtype(reg->type) ? STACK_SPILL : STACK_MISC;

It seems any spillable register can be passed to the callee, so a reg
containing a ref_obj_id can be spilled to stack_arg_slots. However,
release_reference() does not invalidate the ref_obj_id in stack_arg_slots.
Can this cause a UAF like below?

a6 = bpf_task_acquire(t);
if (!a6)
        goto err;

// a6 now has a valid ref_obj_id
// foo1 calls bpf_task_release(a6);
foo1(a1, a2, a3, a4, a5, a6);

// a6 still has a valid ref_obj_id
// foo2 dereferences a6 -> UAF
foo2(a1, a2, a3, a4, a5, a6);

Since stack_arg_slots is separated from the normal stack slots, other
types of stale registers may exist in the outgoing stack slots. For
example:
- stale pkt pointer after calling clear_all_pkt_pointers() in callee
- register with imprecise nullness after calling
mark_ptr_or_null_regs() in callee


> +               for (i = 0; i < BPF_REG_SIZE; i++)
> +                       state->stack_arg_slots[spi].slot_type[i] = type;
> +       } else {
> +               /* BPF_ST: store immediate, treat as scalar */

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-02 18:55   ` Amery Hung
@ 2026-04-02 20:45     ` Yonghong Song
  0 siblings, 0 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02 20:45 UTC (permalink / raw)
  To: Amery Hung
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau



On 4/2/26 11:55 AM, Amery Hung wrote:
> On Wed, Apr 1, 2026 at 6:27 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> Currently BPF functions (subprogs) are limited to 5 register arguments.
>> With [1], the compiler can emit code that passes additional arguments
>> via a dedicated stack area through bpf register
>> BPF_REG_STACK_ARG_BASE (r12), introduced in the previous patch.
>>
>> The following is an example to show how stack arguments are saved
>> and transferred between caller and callee:
>>
>>    int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
>>      int a8 = ...;
>>      bar(a1, a2, a3, a4, a5, a6, a7, a8);
>>      ...
>>    }
>>
>> The following is an illustration of stack allocation:
>>
>>     Caller (foo)                           Callee (bar)
>>     ============                           ============
>>     r12-relative stack arg area:           r12-relative stack arg area:
>>
>>     r12-8:  [incoming arg 6]          +--> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>>     r12-16: [incoming arg 7]          |+-> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>>                                       ||+> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>>     ---- incoming/outgoing boundary   |||  ---- incoming/outgoing boundary
>>     r12-24: [outgoing arg 6 to callee]+||   ...
>>     r12-32: [outgoing arg 7 to callee]-+|
>>     r12-40: [outgoing arg 8 to callee]--+
>>
>>    The caller writes outgoing args past its own incoming area.
>>    At the call site, the verifier transfers the caller's outgoing
>>    slots into the callee's incoming slots.
>>
>> The verifier tracks stack arg slots separately from the regular r10
>> stack. A new 'bpf_stack_arg_state' structure mirrors the existing stack
>> slot tracking (spilled_ptr + slot_type[]) but lives in a dedicated
>> 'stack_arg_slots' array in bpf_func_state. This separation keeps the
>> stack arg area from interfering with the normal stack and frame pointer
>> (r10) bookkeeping.
>>
>> If the bpf function has more than one call, e.g.,
>>
>>    int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
>>      int a8 = ..., a9 = ...;
>>      ...
>>      bar1(a1, a2, a3, a4, a5, a6, a7, a8);
>>      ...
>>      bar2(a1, a2, a3, a4, a5, a6, a7, a8, a9);
>>      ...
>>    }
>>
>> The following is an illustration:
>>
>>     Caller (foo)                           Callee (bar1)
>>     ============                           =============
>>     r12-relative stack arg area:           r12-relative stack arg area:
>>
>>     r12-8:  [incoming arg 6]          +--> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>>     r12-16: [incoming arg 7]          |+-> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>>                                       ||+> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>>     ---- incoming/outgoing boundary   |||  ---- incoming/outgoing boundary
>>     r12-24: [outgoing arg 6 to callee]+||  ...
>>     r12-32: [outgoing arg 7 to callee]-+|
>>     r12-40: [outgoing arg 8 to callee]--+
>>     ...
>>     Back from bar1
>>     ...                                     Callee (bar2)
>>     ===                                     =============
>>                                       +---> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>>                                       |+--> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>>                                       ||+-> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>>                                       |||+> r12-32: [incoming arg 9] (from caller's outgoing r12-48)
>>     ---- incoming/outgoing boundary   ||||  ---- incoming/outgoing boundary
>>     r12-24: [outgoing arg 6 to callee]+|||  ...
>>     r12-32: [outgoing arg 7 to callee]-+||
>>     r12-40: [outgoing arg 8 to callee]--+|
>>     r12-48: [outgoing arg 9 to callee]---+
>>
>> Global subprogs with >5 args are not yet supported.
>>
>>    [1] https://github.com/llvm/llvm-project/pull/189060
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   include/linux/bpf.h          |   2 +
>>   include/linux/bpf_verifier.h |  15 ++-
>>   kernel/bpf/btf.c             |  14 +-
>>   kernel/bpf/verifier.c        | 248 ++++++++++++++++++++++++++++++++---
>>   4 files changed, 257 insertions(+), 22 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index e24c4a2e95f7..a0a1e14e4394 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -1666,6 +1666,8 @@ struct bpf_prog_aux {
>>          u32 max_pkt_offset;
>>          u32 max_tp_access;
>>          u32 stack_depth;
>> +       u16 incoming_stack_arg_depth;
>> +       u16 stack_arg_depth; /* both incoming and max outgoing of stack arguments */
>>          u32 id;
>>          u32 func_cnt; /* used by non-func prog as the number of func progs */
>>          u32 real_func_cnt; /* includes hidden progs, only used for JIT and freeing progs */
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 090aa26d1c98..a260610cd1c1 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -268,6 +268,11 @@ struct bpf_retval_range {
>>          bool return_32bit;
>>   };

...

>> +
>> +/*
>> + * Write a value to the stack arg area.
>> + * off is the negative offset from the stack arg frame pointer.
>> + * Caller ensures off is 8-byte aligned and size is BPF_REG_SIZE.
>> + */
>> +static int check_stack_arg_write(struct bpf_verifier_env *env, struct bpf_func_state *state,
>> +                                int off, int value_regno)
>> +{
>> +       int spi = (-off - 1) / BPF_REG_SIZE;
>> +       struct bpf_func_state *cur;
>> +       struct bpf_reg_state *reg;
>> +       int i, err;
>> +       u8 type;
>> +
>> +       err = grow_stack_arg_slots(env, state, -off);
>> +       if (err)
>> +               return err;
>> +
>> +       cur = env->cur_state->frame[env->cur_state->curframe];
>> +       if (value_regno >= 0) {
>> +               reg = &cur->regs[value_regno];
>> +               state->stack_arg_slots[spi].spilled_ptr = *reg;
>> +               type = is_spillable_regtype(reg->type) ? STACK_SPILL : STACK_MISC;
> It seems any spillable register can be passed to the callee, so a reg
> containing a ref_obj_id can be spilled to stack_arg_slots. However,
> release_reference() does not invalidate the ref_obj_id in stack_arg_slots.
> Can this cause a UAF like below?
>
> a6 = bpf_task_acquire(t);
> if (!a6)
>          goto err;
>
> // a6 now has a valid ref_obj_id
> // foo1 calls bpf_task_release(a6);
> foo1(a1, a2, a3, a4, a5, a6);
>
> // a6 still has a valid ref_obj_id
> // foo2 dereferences a6 -> UAF
> foo2(a1, a2, a3, a4, a5, a6);
>
> Since stack_arg_slots is separated from the normal stack slots, other
> types of stale registers may exist in the outgoing stack slots. For
> example:
> - stale pkt pointer after calling clear_all_pkt_pointers() in callee
> - register with imprecise nullness after calling
> mark_ptr_or_null_regs() in callee

Thanks for pointing this out. Indeed, a lot of checks are needed
for stack arguments. Looks like I need to add something like
	bpf_for_each_spilled_stack_arg
to bpf_for_each_reg_in_vstate_mask for full coverage.

Will fix in the next revision.

>
>
>> +               for (i = 0; i < BPF_REG_SIZE; i++)
>> +                       state->stack_arg_slots[spi].slot_type[i] = type;
>> +       } else {
>> +               /* BPF_ST: store immediate, treat as scalar */


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 04/10] bpf: Support stack arguments for kfunc calls
  2026-04-02  1:27 ` [PATCH bpf-next 04/10] bpf: Support stack arguments for kfunc calls Yonghong Song
  2026-04-02  3:18   ` bot+bpf-ci
@ 2026-04-02 21:02   ` Amery Hung
  1 sibling, 0 replies; 33+ messages in thread
From: Amery Hung @ 2026-04-02 21:02 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

On Wed, Apr 1, 2026 at 6:28 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> Extend the stack argument mechanism to kfunc calls, allowing kfuncs
> with more than 5 parameters to receive additional arguments via the
> r12-based stack arg area.
>
> For kfuncs, the caller is a BPF program and the callee is a kernel
> function. The BPF program writes outgoing args at r12-relative offsets
> past its own incoming area.
>
> The following is an example to show how stack arguments are saved:
>
>    int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
>      int a8 = ..., a9 = ...;
>      ...
>      kfunc1(a1, a2, a3, a4, a5, a6, a7, a8);
>      ...
>      kfunc2(a1, a2, a3, a4, a5, a6, a7, a8, a9);
>      ...
>    }
>
> The following is an illustration:
>
>    Caller (foo)
>    ============
>        r12-relative stack arg area:
>
>        r12-8:  [incoming arg 6]
>        r12-16: [incoming arg 7]
>
>        ---- incoming/outgoing boundary (kfunc1)
>        r12-24: [outgoing arg 6 to callee]
>        r12-32: [outgoing arg 7 to callee]
>        r12-40: [outgoing arg 8 to callee]
>        ...
>        Back from kfunc1
>        ...
>
>        ---- incoming/outgoing boundary
>        r12-24: [outgoing arg 6 to callee]
>        r12-32: [outgoing arg 7 to callee]
>        r12-40: [outgoing arg 8 to callee]
>        r12-48: [outgoing arg 9 to callee]
>
> Later JIT will marshal outgoing arguments to the native calling convention
> for kfunc1() and kfunc2().
>
> In check_kfunc_args(), for args beyond the 5th, retrieve the spilled
> register state from the caller's stack arg slots. Temporarily copy
> it into regs[BPF_REG_1] to reuse the existing type checking
> infrastructure, then restore after checking. Also in fixup_kfunc_call(),
> repurpose insn->off (no longer needed after kfunc address resolution)
> to store the number of stack args, so the JIT knows how many args to marshal.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  kernel/bpf/verifier.c | 97 +++++++++++++++++++++++++++++++++++--------
>  1 file changed, 80 insertions(+), 17 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d424fe611ef8..6579156486b8 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3502,7 +3502,7 @@ static int add_kfunc_call(struct bpf_verifier_env *env, u32 func_id, s16 offset)
>         struct bpf_kfunc_meta kfunc;
>         struct bpf_kfunc_desc *desc;
>         unsigned long addr;
> -       int err;
> +       int i, err;
>
>         prog_aux = env->prog->aux;
>         tab = prog_aux->kfunc_tab;
> @@ -3578,6 +3578,14 @@ static int add_kfunc_call(struct bpf_verifier_env *env, u32 func_id, s16 offset)
>         if (err)
>                 return err;
>
> +       for (i = MAX_BPF_FUNC_REG_ARGS; i < func_model.nr_args; i++) {
> +               if (func_model.arg_size[i] > sizeof(u64)) {
> +                       verbose(env, "kfunc %s arg#%d size %d > %zu not supported for stack args\n",
> +                               kfunc.name, i, func_model.arg_size[i], sizeof(u64));
> +                       return -EINVAL;
> +               }
> +       }
> +
>         desc = &tab->descs[tab->nr_descs++];
>         desc->func_id = func_id;
>         desc->offset = offset;
> @@ -12995,9 +13003,8 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
>                        struct bpf_kfunc_call_arg_meta *meta,
>                        const struct btf_type *t, const struct btf_type *ref_t,
>                        const char *ref_tname, const struct btf_param *args,
> -                      int argno, int nargs)
> +                      int argno, int nargs, u32 regno)
>  {
> -       u32 regno = argno + 1;
>         struct bpf_reg_state *regs = cur_regs(env);
>         struct bpf_reg_state *reg = &regs[regno];
>         bool arg_mem_size = false;

[...]

> @@ -13677,9 +13684,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>
>         args = (const struct btf_param *)(meta->func_proto + 1);
>         nargs = btf_type_vlen(meta->func_proto);
> -       if (nargs > MAX_BPF_FUNC_REG_ARGS) {
> +       if (nargs > MAX_BPF_FUNC_ARGS) {
>                 verbose(env, "Function %s has %d > %d args\n", func_name, nargs,
> -                       MAX_BPF_FUNC_REG_ARGS);
> +                       MAX_BPF_FUNC_ARGS);
>                 return -EINVAL;
>         }
>
> @@ -13687,13 +13694,41 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>          * verifier sees.
>          */
>         for (i = 0; i < nargs; i++) {
> -               struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[i + 1];
> +               struct bpf_reg_state *regs = cur_regs(env), *reg;
> +               struct bpf_reg_state saved_reg;
>                 const struct btf_type *t, *ref_t, *resolve_ret;
>                 enum bpf_arg_type arg_type = ARG_DONTCARE;
>                 u32 regno = i + 1, ref_id, type_size;
>                 bool is_ret_buf_sz = false;
> +               bool is_stack_arg = false;
>                 int kf_arg_type;
>
> +               if (i < MAX_BPF_FUNC_REG_ARGS) {
> +                       reg = &regs[i + 1];
> +               } else {
> +                       /*
> +                        * Retrieve the spilled reg state from the stack arg slot.
> +                        * Reuse the existing type checking infrastructure which
> +                        * reads from cur_regs(env)[regno], temporarily copy the
> +                        * stack arg reg state into regs[BPF_REG_1] and restore
> +                        * it after checking.
> +                        */
> +                       struct bpf_func_state *caller = cur_func(env);
> +                       int spi = caller->incoming_stack_arg_depth / BPF_REG_SIZE +
> +                                 (i - MAX_BPF_FUNC_REG_ARGS);
> +
> +                       if (!is_stack_arg_slot_initialized(caller, spi)) {
> +                               verbose(env, "stack arg#%d not properly initialized\n", i);
> +                               return -EINVAL;
> +                       }
> +
> +                       is_stack_arg = true;
> +                       regno = BPF_REG_1;

is_kfunc_arg_mem_size() and is_kfunc_arg_const_mem_size() will not
work properly in get_kfunc_ptr_arg_type(). Since they assume the
register at regno + 1 holds the size, R2 would accidentally be
treated as the size argument.

Maybe consider creating a helper to retrieve bpf_reg_state of a
specific arg and use it in get_kfunc_ptr_arg_type() instead of copying
and restoring?

> +                       saved_reg = regs[BPF_REG_1];
> +                       regs[BPF_REG_1] = caller->stack_arg_slots[spi].spilled_ptr;
> +                       reg = &regs[BPF_REG_1];
> +               }
> +
>                 if (is_kfunc_arg_prog_aux(btf, &args[i])) {
>                         /* Reject repeated use bpf_prog_aux */
>                         if (meta->arg_prog) {
> @@ -13702,7 +13737,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>                         }
>                         meta->arg_prog = true;
>                         cur_aux(env)->arg_prog = regno;
> -                       continue;
> +                       goto next_arg;
>                 }
>
>                 if (is_kfunc_arg_ignore(btf, &args[i]) || is_kfunc_arg_implicit(meta, i))
> @@ -13725,9 +13760,11 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>                                         verbose(env, "R%d must be a known constant\n", regno);
>                                         return -EINVAL;
>                                 }
> -                               ret = mark_chain_precision(env, regno);
> -                               if (ret < 0)
> -                                       return ret;
> +                               if (i < MAX_BPF_FUNC_REG_ARGS) {
> +                                       ret = mark_chain_precision(env, regno);
> +                                       if (ret < 0)
> +                                               return ret;
> +                               }
>                                 meta->arg_constant.found = true;
>                                 meta->arg_constant.value = reg->var_off.value;
>                         } else if (is_kfunc_arg_scalar_with_name(btf, &args[i], "rdonly_buf_size")) {
> @@ -13749,11 +13786,13 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>                                 }
>
>                                 meta->r0_size = reg->var_off.value;
> -                               ret = mark_chain_precision(env, regno);
> -                               if (ret)
> -                                       return ret;
> +                               if (i < MAX_BPF_FUNC_REG_ARGS) {
> +                                       ret = mark_chain_precision(env, regno);
> +                                       if (ret)
> +                                               return ret;
> +                               }
>                         }
> -                       continue;
> +                       goto next_arg;
>                 }
>
>                 if (!btf_type_is_ptr(t)) {
> @@ -13782,13 +13821,14 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>                 ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
>                 ref_tname = btf_name_by_offset(btf, ref_t->name_off);
>
> -               kf_arg_type = get_kfunc_ptr_arg_type(env, meta, t, ref_t, ref_tname, args, i, nargs);
> +               kf_arg_type = get_kfunc_ptr_arg_type(env, meta, t, ref_t, ref_tname, args, i, nargs,
> +                                                    regno);
>                 if (kf_arg_type < 0)
>                         return kf_arg_type;
>
>                 switch (kf_arg_type) {
>                 case KF_ARG_PTR_TO_NULL:
> -                       continue;
> +                       goto next_arg;
>                 case KF_ARG_PTR_TO_MAP:
>                         if (!reg->map_ptr) {
>                                 verbose(env, "pointer in R%d isn't map pointer\n", regno);
> @@ -14201,6 +14241,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>                         break;
>                 }
>                 }
> +next_arg:
> +               if (is_stack_arg)
> +                       regs[BPF_REG_1] = saved_reg;
>         }
>
>         if (is_kfunc_release(meta) && !meta->release_regno) {
> @@ -14778,7 +14821,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>
>         nargs = btf_type_vlen(meta.func_proto);
>         args = (const struct btf_param *)(meta.func_proto + 1);
> -       for (i = 0; i < nargs; i++) {
> +       for (i = 0; i < nargs && i < MAX_BPF_FUNC_REG_ARGS; i++) {
>                 u32 regno = i + 1;
>
>                 t = btf_type_skip_modifiers(desc_btf, args[i].type, NULL);
> @@ -14789,6 +14832,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>                         mark_btf_func_reg_size(env, regno, t->size);
>         }
>
> +       /* Track outgoing stack arg depth for kfuncs with >5 args */
> +       if (nargs > MAX_BPF_FUNC_REG_ARGS) {
> +               struct bpf_func_state *caller = cur_func(env);
> +               struct bpf_subprog_info *caller_info = &env->subprog_info[caller->subprogno];
> +               u16 kfunc_stack_arg_depth = (nargs - MAX_BPF_FUNC_REG_ARGS) * BPF_REG_SIZE;
> +
> +               if (kfunc_stack_arg_depth > caller_info->outgoing_stack_arg_depth)
> +                       caller_info->outgoing_stack_arg_depth = kfunc_stack_arg_depth;
> +       }
> +
>         if (is_iter_next_kfunc(&meta)) {
>                 err = process_iter_next_call(env, insn_idx, &meta);
>                 if (err)
> @@ -23615,6 +23668,16 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>         if (!bpf_jit_supports_far_kfunc_call())
>                 insn->imm = BPF_CALL_IMM(desc->addr);
>
> +       /*
> +        * After resolving the kfunc address, insn->off is no longer needed
> +        * for BTF fd index. Repurpose it to store the number of stack args
> +        * so the JIT can marshal them.
> +        */
> +       if (desc->func_model.nr_args > MAX_BPF_FUNC_REG_ARGS)
> +               insn->off = desc->func_model.nr_args - MAX_BPF_FUNC_REG_ARGS;
> +       else
> +               insn->off = 0;
> +
>         if (is_bpf_obj_new_kfunc(desc->func_id) || is_bpf_percpu_obj_new_kfunc(desc->func_id)) {
>                 struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>                 struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
> --
> 2.52.0
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 07/10] bpf,x86: Implement JIT support for stack arguments
  2026-04-02  1:28 ` [PATCH bpf-next 07/10] bpf,x86: Implement JIT support for stack arguments Yonghong Song
@ 2026-04-02 22:26   ` Amery Hung
  2026-04-02 23:26     ` Yonghong Song
  2026-04-02 23:51   ` Alexei Starovoitov
  1 sibling, 1 reply; 33+ messages in thread
From: Amery Hung @ 2026-04-02 22:26 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

On Wed, Apr 1, 2026 at 6:28 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>

[...]

>                 case BPF_JMP | BPF_CALL: {
> +                       int off, base_off, n_stack_args, kfunc_stack_args = 0, stack_args = 0;
> +                       u16 outgoing_stack_args = stack_arg_depth - incoming_stack_arg_depth;
>                         u8 *ip = image + addrs[i - 1];
>
>                         func = (u8 *) __bpf_call_base + imm32;
> @@ -2449,6 +2549,29 @@ st:                      if (is_imm8(insn->off))
>                         }
>                         if (!imm32)
>                                 return -EINVAL;
> +
> +                       if (src_reg == BPF_PSEUDO_CALL && outgoing_stack_args > 0) {
> +                               n_stack_args = outgoing_stack_args / 8;
> +                               base_off = -(prog_stack_depth + incoming_stack_arg_depth);
> +                               ip += push_stack_args(&prog, base_off, n_stack_args, 1);
> +                       }
> +
> +                       if (src_reg != BPF_PSEUDO_CALL && insn->off > 0) {
> +                               kfunc_stack_args = insn->off;
> +                               stack_args = kfunc_stack_args > 1 ? kfunc_stack_args - 1 : 0;
> +                               base_off = -(prog_stack_depth + incoming_stack_arg_depth);
> +                               ip += push_stack_args(&prog, base_off, kfunc_stack_args, 2);
> +
> +                               /* mov r9, [rbp + base_off - 8] */
> +                               off = base_off - 8;
> +                               if (is_imm8(off)) {
> +                                       EMIT4(0x4C, 0x8B, 0x4D, off);
> +                                       ip += 4;
> +                               } else {
> +                                       EMIT3_off32(0x4C, 0x8B, 0x8D, off);
> +                                       ip += 7;
> +                               }
> +                       }

Do we need to make sure RSP is 16-byte aligned before passing args
through the stack, per the x86-64 ABI?

>                         if (priv_frame_ptr) {
>                                 push_r9(&prog);
>                                 ip += 2;
> @@ -2458,6 +2581,14 @@ st:                      if (is_imm8(insn->off))
>                                 return -EINVAL;
>                         if (priv_frame_ptr)
>                                 pop_r9(&prog);
> +                       if (stack_args > 0) {
> +                               /* add rsp, stack_args * 8 */
> +                               EMIT4(0x48, 0x83, 0xC4, stack_args * 8);
> +                       }
> +                       if (src_reg == BPF_PSEUDO_CALL && outgoing_stack_args > 0) {
> +                               /* add rsp, outgoing_stack_args */
> +                               EMIT4(0x48, 0x83, 0xC4, outgoing_stack_args);
> +                       }
>                         break;
>                 }
>
> --
> 2.52.0
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 07/10] bpf,x86: Implement JIT support for stack arguments
  2026-04-02 22:26   ` Amery Hung
@ 2026-04-02 23:26     ` Yonghong Song
  0 siblings, 0 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-02 23:26 UTC (permalink / raw)
  To: Amery Hung
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau



On 4/2/26 3:26 PM, Amery Hung wrote:
> On Wed, Apr 1, 2026 at 6:28 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> [...]
>
>>                  case BPF_JMP | BPF_CALL: {
>> +                       int off, base_off, n_stack_args, kfunc_stack_args = 0, stack_args = 0;
>> +                       u16 outgoing_stack_args = stack_arg_depth - incoming_stack_arg_depth;
>>                          u8 *ip = image + addrs[i - 1];
>>
>>                          func = (u8 *) __bpf_call_base + imm32;
>> @@ -2449,6 +2549,29 @@ st:                      if (is_imm8(insn->off))
>>                          }
>>                          if (!imm32)
>>                                  return -EINVAL;
>> +
>> +                       if (src_reg == BPF_PSEUDO_CALL && outgoing_stack_args > 0) {
>> +                               n_stack_args = outgoing_stack_args / 8;
>> +                               base_off = -(prog_stack_depth + incoming_stack_arg_depth);
>> +                               ip += push_stack_args(&prog, base_off, n_stack_args, 1);
>> +                       }
>> +
>> +                       if (src_reg != BPF_PSEUDO_CALL && insn->off > 0) {
>> +                               kfunc_stack_args = insn->off;
>> +                               stack_args = kfunc_stack_args > 1 ? kfunc_stack_args - 1 : 0;
>> +                               base_off = -(prog_stack_depth + incoming_stack_arg_depth);
>> +                               ip += push_stack_args(&prog, base_off, kfunc_stack_args, 2);
>> +
>> +                               /* mov r9, [rbp + base_off - 8] */
>> +                               off = base_off - 8;
>> +                               if (is_imm8(off)) {
>> +                                       EMIT4(0x4C, 0x8B, 0x4D, off);
>> +                                       ip += 4;
>> +                               } else {
>> +                                       EMIT3_off32(0x4C, 0x8B, 0x8D, off);
>> +                                       ip += 7;
>> +                               }
>> +                       }
> Do we need to make sure RSP is 16-byte aligned before passing arg
> through stack per x86-64 ABI?

Good question. Even without this patch set, it looks like the bpf jit here does not
enforce 16-byte alignment. IIUC, the 16-byte alignment requirement only matters for
instructions that need aligned operands (SSE/AVX/128-bit XMM etc.). The bpf jit never
emits those, so I guess it is okay?

>
>>                          if (priv_frame_ptr) {
>>                                  push_r9(&prog);
>>                                  ip += 2;
>> @@ -2458,6 +2581,14 @@ st:                      if (is_imm8(insn->off))
>>                                  return -EINVAL;
>>                          if (priv_frame_ptr)
>>                                  pop_r9(&prog);
>> +                       if (stack_args > 0) {
>> +                               /* add rsp, stack_args * 8 */
>> +                               EMIT4(0x48, 0x83, 0xC4, stack_args * 8);
>> +                       }
>> +                       if (src_reg == BPF_PSEUDO_CALL && outgoing_stack_args > 0) {
>> +                               /* add rsp, outgoing_stack_args */
>> +                               EMIT4(0x48, 0x83, 0xC4, outgoing_stack_args);
>> +                       }
>>                          break;
>>                  }
>>
>> --
>> 2.52.0
>>
>>



* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-02  1:27 ` [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions Yonghong Song
  2026-04-02  3:18   ` bot+bpf-ci
  2026-04-02 18:55   ` Amery Hung
@ 2026-04-02 23:38   ` Amery Hung
  2026-04-03  4:05     ` Yonghong Song
  2026-04-02 23:38   ` Alexei Starovoitov
  3 siblings, 1 reply; 33+ messages in thread
From: Amery Hung @ 2026-04-02 23:38 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau

On Wed, Apr 1, 2026 at 6:28 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> Currently BPF functions (subprogs) are limited to 5 register arguments.
> With [1], the compiler can emit code that passes additional arguments
> via a dedicated stack area through bpf register
> BPF_REG_STACK_ARG_BASE (r12), introduced in the previous patch.
>
> The following is an example to show how stack arguments are saved
> and transferred between caller and callee:
>
>   int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
>     int a8 = ...;
>     ...
>     bar(a1, a2, a3, a4, a5, a6, a7, a8);
>     ...
>   }
>
> The following is an illustration of stack allocation:
>
>    Caller (foo)                           Callee (bar)
>    ============                           ============
>    r12-relative stack arg area:           r12-relative stack arg area:
>
>    r12-8:  [incoming arg 6]          +--> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>    r12-16: [incoming arg 7]          |+-> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>                                      ||+> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>    ---- incoming/outgoing boundary   |||  ---- incoming/outgoing boundary
>    r12-24: [outgoing arg 6 to callee]+||   ...
>    r12-32: [outgoing arg 7 to callee]-+|
>    r12-40: [outgoing arg 8 to callee]--+
>
>   The caller writes outgoing args past its own incoming area.
>   At the call site, the verifier transfers the caller's outgoing
>   slots into the callee's incoming slots.
>
> The verifier tracks stack arg slots separately from the regular r10
> stack. A new 'bpf_stack_arg_state' structure mirrors the existing stack
> slot tracking (spilled_ptr + slot_type[]) but lives in a dedicated
> 'stack_arg_slots' array in bpf_func_state. This separation keeps the
> stack arg area from interfering with the normal stack and frame pointer
> (r10) bookkeeping.
>
> If the bpf function has more than one call, e.g.,
>
>   int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
>     int a8 = ..., a9 = ...;
>     ...
>     bar1(a1, a2, a3, a4, a5, a6, a7, a8);
>     ...
>     bar2(a1, a2, a3, a4, a5, a6, a7, a8, a9);
>     ...
>   }
>
> The following is an illustration:
>
>    Caller (foo)                           Callee (bar1)
>    ============                           =============
>    r12-relative stack arg area:           r12-relative stack arg area:
>
>    r12-8:  [incoming arg 6]          +--> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>    r12-16: [incoming arg 7]          |+-> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>                                      ||+> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>    ---- incoming/outgoing boundary   |||  ---- incoming/outgoing boundary
>    r12-24: [outgoing arg 6 to callee]+||  ...
>    r12-32: [outgoing arg 7 to callee]-+|
>    r12-40: [outgoing arg 8 to callee]--+
>    ...
>    Back from bar1
>    ...                                     Callee (bar2)
>    ===                                     =============
>                                      +---> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>                                      |+--> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>                                      ||+-> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>                                      |||+> r12-32: [incoming arg 9] (from caller's outgoing r12-48)
>    ---- incoming/outgoing boundary   ||||  ---- incoming/outgoing boundary
>    r12-24: [outgoing arg 6 to callee]+|||  ...
>    r12-32: [outgoing arg 7 to callee]-+||
>    r12-40: [outgoing arg 8 to callee]--+|
>    r12-48: [outgoing arg 9 to callee]---+
>
> Global subprogs with >5 args are not yet supported.
>
>   [1] https://github.com/llvm/llvm-project/pull/189060
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  include/linux/bpf.h          |   2 +
>  include/linux/bpf_verifier.h |  15 ++-
>  kernel/bpf/btf.c             |  14 +-
>  kernel/bpf/verifier.c        | 248 ++++++++++++++++++++++++++++++++---
>  4 files changed, 257 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index e24c4a2e95f7..a0a1e14e4394 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1666,6 +1666,8 @@ struct bpf_prog_aux {
>         u32 max_pkt_offset;
>         u32 max_tp_access;
>         u32 stack_depth;
> +       u16 incoming_stack_arg_depth;
> +       u16 stack_arg_depth; /* both incoming and max outgoing of stack arguments */
>         u32 id;
>         u32 func_cnt; /* used by non-func prog as the number of func progs */
>         u32 real_func_cnt; /* includes hidden progs, only used for JIT and freeing progs */
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 090aa26d1c98..a260610cd1c1 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -268,6 +268,11 @@ struct bpf_retval_range {
>         bool return_32bit;
>  };
>
> +struct bpf_stack_arg_state {
> +       struct bpf_reg_state spilled_ptr; /* for spilled scalar/pointer semantics */
> +       u8 slot_type[BPF_REG_SIZE];
> +};
> +
>  /* state of the program:
>   * type of all registers and stack info
>   */
> @@ -319,6 +324,10 @@ struct bpf_func_state {
>          * `stack`. allocated_stack is always a multiple of BPF_REG_SIZE.
>          */
>         int allocated_stack;
> +
> +       u16 stack_arg_depth; /* Size of incoming + max outgoing stack args in bytes. */
> +       u16 incoming_stack_arg_depth; /* Size of incoming stack args in bytes. */
> +       struct bpf_stack_arg_state *stack_arg_slots;
>  };
>
>  #define MAX_CALL_FRAMES 8
> @@ -674,10 +683,12 @@ struct bpf_subprog_info {
>         bool keep_fastcall_stack: 1;
>         bool changes_pkt_data: 1;
>         bool might_sleep: 1;
> -       u8 arg_cnt:3;
> +       u8 arg_cnt:4;
>
>         enum priv_stack_mode priv_stack_mode;
> -       struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
> +       struct bpf_subprog_arg_info args[MAX_BPF_FUNC_ARGS];
> +       u16 incoming_stack_arg_depth;
> +       u16 outgoing_stack_arg_depth;
>  };
>
>  struct bpf_verifier_env;
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index a62d78581207..c5f3aa05d5a3 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -7887,13 +7887,19 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog)
>         }
>         args = (const struct btf_param *)(t + 1);
>         nargs = btf_type_vlen(t);
> -       if (nargs > MAX_BPF_FUNC_REG_ARGS) {
> -               if (!is_global)
> -                       return -EINVAL;
> -               bpf_log(log, "Global function %s() with %d > %d args. Buggy compiler.\n",
> +       if (nargs > MAX_BPF_FUNC_ARGS) {
> +               bpf_log(log, "Function %s() with %d > %d args not supported.\n",
> +                       tname, nargs, MAX_BPF_FUNC_ARGS);
> +               return -EINVAL;
> +       }
> +       if (is_global && nargs > MAX_BPF_FUNC_REG_ARGS) {
> +               bpf_log(log, "Global function %s() with %d > %d args not supported.\n",
>                         tname, nargs, MAX_BPF_FUNC_REG_ARGS);
>                 return -EINVAL;
>         }
> +       if (nargs > MAX_BPF_FUNC_REG_ARGS)
> +               sub->incoming_stack_arg_depth = (nargs - MAX_BPF_FUNC_REG_ARGS) * BPF_REG_SIZE;
> +
>         /* check that function is void or returns int, exception cb also requires this */
>         t = btf_type_by_id(btf, t->type);
>         while (btf_type_is_modifier(t))
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 8c1cf2eb6cbb..d424fe611ef8 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1488,6 +1488,19 @@ static int copy_stack_state(struct bpf_func_state *dst, const struct bpf_func_st
>                 return -ENOMEM;
>
>         dst->allocated_stack = src->allocated_stack;
> +
> +       /* copy stack_arg_slots state */
> +       n = src->stack_arg_depth / BPF_REG_SIZE;
> +       if (n) {
> +               dst->stack_arg_slots = copy_array(dst->stack_arg_slots, src->stack_arg_slots, n,
> +                                                 sizeof(struct bpf_stack_arg_state),
> +                                                 GFP_KERNEL_ACCOUNT);
> +               if (!dst->stack_arg_slots)
> +                       return -ENOMEM;
> +
> +               dst->stack_arg_depth = src->stack_arg_depth;
> +               dst->incoming_stack_arg_depth = src->incoming_stack_arg_depth;
> +       }
>         return 0;
>  }
>
> @@ -1529,6 +1542,25 @@ static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state
>         return 0;
>  }
>
> +static int grow_stack_arg_slots(struct bpf_verifier_env *env,
> +                               struct bpf_func_state *state, int size)
> +{
> +       size_t old_n = state->stack_arg_depth / BPF_REG_SIZE, n;
> +
> +       size = round_up(size, BPF_REG_SIZE);
> +       n = size / BPF_REG_SIZE;
> +       if (old_n >= n)
> +               return 0;
> +
> +       state->stack_arg_slots = realloc_array(state->stack_arg_slots, old_n, n,
> +                                              sizeof(struct bpf_stack_arg_state));
> +       if (!state->stack_arg_slots)
> +               return -ENOMEM;
> +
> +       state->stack_arg_depth = size;
> +       return 0;
> +}
> +
>  /* Acquire a pointer id from the env and update the state->refs to include
>   * this new pointer reference.
>   * On success, returns a valid pointer id to associate with the register
> @@ -1699,6 +1731,7 @@ static void free_func_state(struct bpf_func_state *state)
>  {
>         if (!state)
>                 return;
> +       kfree(state->stack_arg_slots);
>         kfree(state->stack);
>         kfree(state);
>  }
> @@ -5848,6 +5881,101 @@ static int check_stack_write(struct bpf_verifier_env *env,
>         return err;
>  }
>
> +/* Validate that a stack arg access is 8-byte sized and aligned. */
> +static int check_stack_arg_access(struct bpf_verifier_env *env,
> +                                 struct bpf_insn *insn, const char *op)
> +{
> +       int size = bpf_size_to_bytes(BPF_SIZE(insn->code));
> +
> +       if (size != BPF_REG_SIZE) {
> +               verbose(env, "stack arg %s must be %d bytes, got %d\n",
> +                       op, BPF_REG_SIZE, size);
> +               return -EINVAL;
> +       }
> +       if (insn->off % BPF_REG_SIZE) {
> +               verbose(env, "stack arg %s offset %d not aligned to %d\n",
> +                       op, insn->off, BPF_REG_SIZE);
> +               return -EINVAL;
> +       }
> +       return 0;
> +}
> +
> +/* Check that a stack arg slot has been properly initialized. */
> +static bool is_stack_arg_slot_initialized(struct bpf_func_state *state, int spi)
> +{
> +       u8 type;
> +
> +       if (spi >= (int)(state->stack_arg_depth / BPF_REG_SIZE))
> +               return false;
> +       type = state->stack_arg_slots[spi].slot_type[BPF_REG_SIZE - 1];
> +       return type == STACK_SPILL || type == STACK_MISC;
> +}
> +
> +/*
> + * Write a value to the stack arg area.
> + * off is the negative offset from the stack arg frame pointer.
> + * Callers ensures off is 8-byte aligned and size is BPF_REG_SIZE.
> + */
> +static int check_stack_arg_write(struct bpf_verifier_env *env, struct bpf_func_state *state,
> +                                int off, int value_regno)
> +{
> +       int spi = (-off - 1) / BPF_REG_SIZE;
> +       struct bpf_func_state *cur;
> +       struct bpf_reg_state *reg;
> +       int i, err;
> +       u8 type;
> +
> +       err = grow_stack_arg_slots(env, state, -off);
> +       if (err)
> +               return err;
> +
> +       cur = env->cur_state->frame[env->cur_state->curframe];
> +       if (value_regno >= 0) {
> +               reg = &cur->regs[value_regno];
> +               state->stack_arg_slots[spi].spilled_ptr = *reg;
> +               type = is_spillable_regtype(reg->type) ? STACK_SPILL : STACK_MISC;
> +               for (i = 0; i < BPF_REG_SIZE; i++)
> +                       state->stack_arg_slots[spi].slot_type[i] = type;
> +       } else {
> +               /* BPF_ST: store immediate, treat as scalar */
> +               reg = &state->stack_arg_slots[spi].spilled_ptr;
> +               reg->type = SCALAR_VALUE;
> +               __mark_reg_known(reg, (u32)env->prog->insnsi[env->insn_idx].imm);
> +               for (i = 0; i < BPF_REG_SIZE; i++)
> +                       state->stack_arg_slots[spi].slot_type[i] = STACK_MISC;
> +       }
> +       return 0;
> +}
> +
> +/*
> + * Read a value from the stack arg area.
> + * off is the negative offset from the stack arg frame pointer.
> + * Callers ensures off is 8-byte aligned and size is BPF_REG_SIZE.
> + */
> +static int check_stack_arg_read(struct bpf_verifier_env *env, struct bpf_func_state *state,
> +                               int off, int dst_regno)
> +{
> +       int spi = (-off - 1) / BPF_REG_SIZE;
> +       struct bpf_func_state *cur;
> +       u8 *stype;
> +
> +       if (-off > state->stack_arg_depth) {
> +               verbose(env, "invalid read from stack arg off %d depth %d\n",
> +                       off, state->stack_arg_depth);
> +               return -EACCES;
> +       }
> +
> +       stype = state->stack_arg_slots[spi].slot_type;
> +       cur = env->cur_state->frame[env->cur_state->curframe];
> +
> +       if (stype[BPF_REG_SIZE - 1] == STACK_SPILL)
> +               copy_register_state(&cur->regs[dst_regno],
> +                                   &state->stack_arg_slots[spi].spilled_ptr);
> +       else
> +               mark_reg_unknown(env, cur->regs, dst_regno);
> +       return 0;
> +}
> +
>  static int check_map_access_type(struct bpf_verifier_env *env, u32 regno,
>                                  int off, int size, enum bpf_access_type type)
>  {
> @@ -8022,10 +8150,23 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
>                           bool strict_alignment_once, bool is_ldsx,
>                           bool allow_trust_mismatch, const char *ctx)
>  {
> +       struct bpf_verifier_state *vstate = env->cur_state;
> +       struct bpf_func_state *state = vstate->frame[vstate->curframe];
>         struct bpf_reg_state *regs = cur_regs(env);
>         enum bpf_reg_type src_reg_type;
>         int err;
>
> +       /* Handle stack arg access */
> +       if (insn->src_reg == BPF_REG_STACK_ARG_BASE) {
> +               err = check_reg_arg(env, insn->dst_reg, DST_OP_NO_MARK);
> +               if (err)
> +                       return err;
> +               err = check_stack_arg_access(env, insn, "read");
> +               if (err)
> +                       return err;
> +               return check_stack_arg_read(env, state, insn->off, insn->dst_reg);
> +       }
> +
>         /* check src operand */
>         err = check_reg_arg(env, insn->src_reg, SRC_OP);
>         if (err)
> @@ -8054,10 +8195,23 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  static int check_store_reg(struct bpf_verifier_env *env, struct bpf_insn *insn,
>                            bool strict_alignment_once)
>  {
> +       struct bpf_verifier_state *vstate = env->cur_state;
> +       struct bpf_func_state *state = vstate->frame[vstate->curframe];
>         struct bpf_reg_state *regs = cur_regs(env);
>         enum bpf_reg_type dst_reg_type;
>         int err;
>
> +       /* Handle stack arg write */
> +       if (insn->dst_reg == BPF_REG_STACK_ARG_BASE) {
> +               err = check_reg_arg(env, insn->src_reg, SRC_OP);
> +               if (err)
> +                       return err;
> +               err = check_stack_arg_access(env, insn, "write");
> +               if (err)
> +                       return err;
> +               return check_stack_arg_write(env, state, insn->off, insn->src_reg);
> +       }
> +
>         /* check src1 operand */
>         err = check_reg_arg(env, insn->src_reg, SRC_OP);
>         if (err)
> @@ -10940,8 +11094,10 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>                            int *insn_idx)
>  {
>         struct bpf_verifier_state *state = env->cur_state;
> +       struct bpf_subprog_info *caller_info;
>         struct bpf_func_state *caller;
>         int err, subprog, target_insn;
> +       u16 callee_incoming;
>
>         target_insn = *insn_idx + insn->imm + 1;
>         subprog = find_subprog(env, target_insn);
> @@ -10993,6 +11149,15 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>                 return 0;
>         }
>
> +       /*
> +        * Track caller's outgoing stack arg depth (max across all callees).
> +        * This is needed so the JIT knows how much stack arg space to allocate.
> +        */
> +       caller_info = &env->subprog_info[caller->subprogno];
> +       callee_incoming = env->subprog_info[subprog].incoming_stack_arg_depth;
> +       if (callee_incoming > caller_info->outgoing_stack_arg_depth)
> +               caller_info->outgoing_stack_arg_depth = callee_incoming;
> +
>         /* for regular function entry setup new frame and continue
>          * from that frame.
>          */
> @@ -11048,13 +11213,41 @@ static int set_callee_state(struct bpf_verifier_env *env,
>                             struct bpf_func_state *caller,
>                             struct bpf_func_state *callee, int insn_idx)
>  {

Taking note when reading the change to set_callee_state():

The function is not called when handling callback functions, which use
push_callback_call() -> setup_func_entry() -> a callback-specific
set_callee_state_cb. So caller stack arguments will not be transferred.

This should be fine: the callee's stack_arg_depth will remain zero, and
when the callee tries to do an r12-based load, check_stack_arg_read()
should reject the program. Not sure if this needs a selftest, since
callbacks' set_callee_state_cb also transfers register state very
intentionally.

> -       int i;
> +       struct bpf_subprog_info *callee_info;
> +       int i, err;
>
>         /* copy r1 - r5 args that callee can access.  The copy includes parent
>          * pointers, which connects us up to the liveness chain
>          */
>         for (i = BPF_REG_1; i <= BPF_REG_5; i++)
>                 callee->regs[i] = caller->regs[i];
> +
> +       /*
> +        * Transfer stack args from caller's outgoing area to callee's incoming area.
> +        * Caller wrote outgoing args at offsets '-(incoming + 8)', '-(incoming + 16)', ...
> +        * These outgoing args will go to callee's incoming area.
> +        */
> +       callee_info = &env->subprog_info[callee->subprogno];
> +       if (callee_info->incoming_stack_arg_depth) {
> +               int caller_incoming_slots = caller->incoming_stack_arg_depth / BPF_REG_SIZE;
> +               int callee_incoming_slots = callee_info->incoming_stack_arg_depth / BPF_REG_SIZE;
> +
> +               callee->incoming_stack_arg_depth = callee_info->incoming_stack_arg_depth;
> +               err = grow_stack_arg_slots(env, callee, callee_info->incoming_stack_arg_depth);
> +               if (err)
> +                       return err;
> +
> +               for (i = 0; i < callee_incoming_slots; i++) {
> +                       int caller_spi = i + caller_incoming_slots;
> +
> +                       if (!is_stack_arg_slot_initialized(caller, caller_spi)) {
> +                               verbose(env, "stack arg#%d not properly initialized\n",
> +                                       i + 1 + MAX_BPF_FUNC_REG_ARGS);
> +                               return -EINVAL;
> +                       }
> +                       callee->stack_arg_slots[i] = caller->stack_arg_slots[caller_spi];
> +               }
> +       }
>         return 0;
>  }
>
> @@ -21262,23 +21455,37 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
>                         verbose(env, "BPF_ST uses reserved fields\n");
>                         return -EINVAL;
>                 }
> -               /* check src operand */
> -               err = check_reg_arg(env, insn->dst_reg, SRC_OP);
> -               if (err)
> -                       return err;
>
> -               dst_reg_type = cur_regs(env)[insn->dst_reg].type;
> +               /* Handle stack arg write (store immediate) */
> +               if (insn->dst_reg == BPF_REG_STACK_ARG_BASE) {
> +                       struct bpf_verifier_state *vstate = env->cur_state;
> +                       struct bpf_func_state *state = vstate->frame[vstate->curframe];
>
> -               /* check that memory (dst_reg + off) is writeable */
> -               err = check_mem_access(env, env->insn_idx, insn->dst_reg,
> -                                      insn->off, BPF_SIZE(insn->code),
> -                                      BPF_WRITE, -1, false, false);
> -               if (err)
> -                       return err;
> +                       err = check_stack_arg_access(env, insn, "write");
> +                       if (err)
> +                               return err;
> +                       err = check_stack_arg_write(env, state, insn->off, -1);
> +                       if (err)
> +                               return err;
> +               } else {
> +                       /* check src operand */
> +                       err = check_reg_arg(env, insn->dst_reg, SRC_OP);
> +                       if (err)
> +                               return err;
>
> -               err = save_aux_ptr_type(env, dst_reg_type, false);
> -               if (err)
> -                       return err;
> +                       dst_reg_type = cur_regs(env)[insn->dst_reg].type;
> +
> +                       /* check that memory (dst_reg + off) is writeable */
> +                       err = check_mem_access(env, env->insn_idx, insn->dst_reg,
> +                                              insn->off, BPF_SIZE(insn->code),
> +                                              BPF_WRITE, -1, false, false);
> +                       if (err)
> +                               return err;
> +
> +                       err = save_aux_ptr_type(env, dst_reg_type, false);
> +                       if (err)
> +                               return err;
> +               }
>         } else if (class == BPF_JMP || class == BPF_JMP32) {
>                 u8 opcode = BPF_OP(insn->code);
>
> @@ -22974,8 +23181,14 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>         int err, num_exentries;
>         int old_len, subprog_start_adjustment = 0;
>
> -       if (env->subprog_cnt <= 1)
> +       if (env->subprog_cnt <= 1) {
> +               /*
> +                * Even without subprogs, kfunc calls with >5 args need stack arg space
> +                * allocated by the root program.
> +                */
> +               prog->aux->stack_arg_depth = env->subprog_info[0].outgoing_stack_arg_depth;
>                 return 0;
> +       }
>
>         for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
>                 if (!bpf_pseudo_func(insn) && !bpf_pseudo_call(insn))
> @@ -23065,6 +23278,9 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>
>                 func[i]->aux->name[0] = 'F';
>                 func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
> +               func[i]->aux->incoming_stack_arg_depth = env->subprog_info[i].incoming_stack_arg_depth;
> +               func[i]->aux->stack_arg_depth = env->subprog_info[i].incoming_stack_arg_depth +
> +                                               env->subprog_info[i].outgoing_stack_arg_depth;
>                 if (env->subprog_info[i].priv_stack_mode == PRIV_STACK_ADAPTIVE)
>                         func[i]->aux->jits_use_priv_stack = true;
>
> --
> 2.52.0
>
>


* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-02  1:27 ` [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions Yonghong Song
                     ` (2 preceding siblings ...)
  2026-04-02 23:38   ` Amery Hung
@ 2026-04-02 23:38   ` Alexei Starovoitov
  2026-04-03  4:10     ` Yonghong Song
  3 siblings, 1 reply; 33+ messages in thread
From: Alexei Starovoitov @ 2026-04-02 23:38 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, Kernel Team, Martin KaFai Lau

On Wed, Apr 1, 2026 at 6:27 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> Currently BPF functions (subprogs) are limited to 5 register arguments.
> With [1], the compiler can emit code that passes additional arguments
> via a dedicated stack area through bpf register
> BPF_REG_STACK_ARG_BASE (r12), introduced in the previous patch.
>
> The following is an example to show how stack arguments are saved
> and transferred between caller and callee:
>
>   int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
>     int a8 = ...;
>     ...
>     bar(a1, a2, a3, a4, a5, a6, a7, a8);
>     ...
>   }
>
> The following is an illustration of stack allocation:
>
>    Caller (foo)                           Callee (bar)
>    ============                           ============
>    r12-relative stack arg area:           r12-relative stack arg area:
>
>    r12-8:  [incoming arg 6]          +--> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>    r12-16: [incoming arg 7]          |+-> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>                                      ||+> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>    ---- incoming/outgoing boundary   |||  ---- incoming/outgoing boundary
>    r12-24: [outgoing arg 6 to callee]+||   ...
>    r12-32: [outgoing arg 7 to callee]-+|
>    r12-40: [outgoing arg 8 to callee]--+

I haven't looked at the patch itself, only at this diagram.
How is this supposed to map to the x86 calling convention?
The shift is unusual.
x86 uses fp-N for outgoing and fp+N for incoming.
Why can't we use the same?


* Re: [PATCH bpf-next 07/10] bpf,x86: Implement JIT support for stack arguments
  2026-04-02  1:28 ` [PATCH bpf-next 07/10] bpf,x86: Implement JIT support for stack arguments Yonghong Song
  2026-04-02 22:26   ` Amery Hung
@ 2026-04-02 23:51   ` Alexei Starovoitov
  2026-04-03  4:13     ` Yonghong Song
  1 sibling, 1 reply; 33+ messages in thread
From: Alexei Starovoitov @ 2026-04-02 23:51 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, Kernel Team, Martin KaFai Lau

On Wed, Apr 1, 2026 at 6:28 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> Add x86_64 JIT support for BPF functions and kfuncs with more than
> 5 arguments. The extra arguments are passed through a stack area
> addressed by register r12 (BPF_REG_STACK_ARG_BASE) in BPF bytecode,
> which the JIT translates to RBP-relative accesses in native code.
>
> There are two possible approaches to allocate the stack arg area:
>
>   Option 1: Allocate a single combined region (incoming + max_outgoing)
>     below the program stack in the function prologue. All r12-relative
>     accesses become [rbp - prog_stack_depth - offset] where the 'offset'
>     is the offset value in (incoming + max_outgoing) region. This is
>     simple because the area is always at a fixed offset from RBP.
>     The tradeoff is slightly higher stack usage when multiple callees
>     have different stack arg counts — the area is sized to the maximum.
>
>   Option 2: Allocate each outgoing area individually at the call
>     site, sized exactly to the callee's needs. This minimizes
>     stack usage but significantly complicates the JIT: each call
>     site must dynamically adjust RSP, and addresses of stack args
>     would shift depending on context, making the offset
>     calculations harder.
>
> This patch uses Option 1 for simplicity.
>
> The native x86_64 stack layout for a function with incoming and
> outgoing stack args:
>
>   high address
>   ┌─────────────────────────┐
>   │ incoming stack arg N    │  [rbp + 16 + (N - 1) * 8]  (pushed by caller)
>   │ ...                     │
>   │ incoming stack arg 1    │  [rbp + 16]
>   ├─────────────────────────┤
>   │ return address          │  [rbp + 8]
>   │ saved rbp               │  [rbp]
>   ├─────────────────────────┤
>   │ callee-saved regs       │
>   │ BPF program stack       │  (stack_depth bytes)
>   ├─────────────────────────┤
>   │ incoming stack arg 1    │  [rbp - prog_stack_depth - 8]
>   │ ...   (copied from      │   (copied in prologue)
>   │        caller's push)   │
>   │ incoming stack arg N    │  [rbp - prog_stack_depth - N * 8]
>   ├─────────────────────────┤
>   │ outgoing stack arg 1    │  (written via r12-relative STX/ST,
>   │ ...                     │   JIT translates to RBP-relative)
>   │ outgoing stack arg M    │
>   └─────────────────────────┘
>     ...                        Other stack usage
>   ┌─────────────────────────┐
>   │ incoming stack arg M    │ (copy from outgoing stack arg to
>   │ ...                     │  incoming stack arg)
>   │ incoming stack arg 1    │
>   ├─────────────────────────┤
>   │ return address          │
>   │ saved rbp               │
>   ├─────────────────────────┤
>   │ ...                     │
>   └─────────────────────────┘
>   low address
>
> In the prologue, the caller's incoming stack arguments are copied to the
> callee's incoming stack arguments, which will be fetched by later load insns.
> The outgoing stack arguments are written by JIT-emitted RBP-relative STX or ST.
>
> For each bpf-to-bpf call, push outgoing stack args onto the native
> stack before CALL, pop them after return. So the same 'outgoing stack arg'
> area is used by all bpf-to-bpf functions.
>
> For kfunc calls, push stack args (arg 7+) onto the native stack
> and load arg 6 into R9 per the x86_64 calling convention,
> then clean up RSP after return.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  arch/x86/net/bpf_jit_comp.c | 145 ++++++++++++++++++++++++++++++++++--
>  1 file changed, 138 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 32864dbc2c4e..807493f109e5 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -367,6 +367,27 @@ static void push_callee_regs(u8 **pprog, bool *callee_regs_used)
>         *pprog = prog;
>  }
>
> +static int push_stack_args(u8 **pprog, s32 base_off, int from, int to)
> +{
> +       u8 *prog = *pprog;
> +       int j, off, cnt = 0;
> +
> +       for (j = from; j >= to; j--) {
> +               off = base_off - j * 8;
> +
> +               /* push qword [rbp + off] */
> +               if (is_imm8(off)) {
> +                       EMIT3(0xFF, 0x75, off);
> +                       cnt += 3;
> +               } else {
> +                       EMIT2_off32(0xFF, 0xB5, off);
> +                       cnt += 6;
> +               }
> +       }
> +       *pprog = prog;
> +       return cnt;
> +}
> +
>  static void pop_r12(u8 **pprog)
>  {
>         u8 *prog = *pprog;
> @@ -1664,19 +1685,35 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
>         int i, excnt = 0;
>         int ilen, proglen = 0;
>         u8 *prog = temp;
> -       u32 stack_depth;
> +       u16 stack_arg_depth, incoming_stack_arg_depth;
> +       u32 prog_stack_depth, stack_depth;
> +       bool has_stack_args;
>         int err;
>
>         stack_depth = bpf_prog->aux->stack_depth;
> +       stack_arg_depth = bpf_prog->aux->stack_arg_depth;
> +       incoming_stack_arg_depth = bpf_prog->aux->incoming_stack_arg_depth;
>         priv_stack_ptr = bpf_prog->aux->priv_stack_ptr;
>         if (priv_stack_ptr) {
>                 priv_frame_ptr = priv_stack_ptr + PRIV_STACK_GUARD_SZ + round_up(stack_depth, 8);
>                 stack_depth = 0;
>         }
>
> +       /*
> +        * Save program stack depth before adding stack arg space.
> +        * Each function allocates its own stack arg space
> +        * (incoming + outgoing) below its BPF stack.
> +        * Stack args are accessed via RBP-based addressing.
> +        */
> +       prog_stack_depth = round_up(stack_depth, 8);
> +       if (stack_arg_depth)
> +               stack_depth += stack_arg_depth;
> +       has_stack_args = stack_arg_depth > 0;
> +
>         arena_vm_start = bpf_arena_get_kern_vm_start(bpf_prog->aux->arena);
>         user_vm_start = bpf_arena_get_user_vm_start(bpf_prog->aux->arena);
>
> +
>         detect_reg_usage(insn, insn_cnt, callee_regs_used);
>
>         emit_prologue(&prog, image, stack_depth,
> @@ -1704,6 +1741,38 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
>                 emit_mov_imm64(&prog, X86_REG_R12,
>                                arena_vm_start >> 32, (u32) arena_vm_start);
>
> +       if (incoming_stack_arg_depth && bpf_is_subprog(bpf_prog)) {
> +               int n = incoming_stack_arg_depth / 8;
> +
> +               /*
> +                * Caller pushed stack args before CALL, so after prologue
> +                * (CALL saves ret addr, then PUSH saves old RBP) they sit
> +                * above RBP:
> +                *
> +                *   [rbp + 16 + (n - 1) * 8]  stack_arg n
> +                *   ...
> +                *   [rbp + 24]                stack_arg 2
> +                *   [rbp + 16]                stack_arg 1
> +                *   [rbp +  8]                return address
> +                *   [rbp +  0]                saved rbp
> +                *
> +                * Copy each into callee's own region below the program stack:
> +                *   [rbp - prog_stack_depth - i * 8]
> +                */
> +               for (i = 0; i < n; i++) {
> +                       s32 src = 16 + i * 8;
> +                       s32 dst = -prog_stack_depth - (i + 1) * 8;
> +
> +                       /* mov rax, [rbp + src] */
> +                       EMIT4(0x48, 0x8B, 0x45, src);
> +                       /* mov [rbp + dst], rax */
> +                       if (is_imm8(dst))
> +                               EMIT4(0x48, 0x89, 0x45, dst);
> +                       else
> +                               EMIT3_off32(0x48, 0x89, 0x85, dst);
> +               }

This is really suboptimal.
The bpf calling convention for 6+ args needs to match x86,
with the exception of the 6th arg.
All bpf insns need to remain as-is when calling another bpf prog
or kfunc. There should be no additional moves.
The JIT should only special case the 6th arg and convert bpf's STX [r12-N], src_reg
into 'mov r9, src_reg', since r9 is used to pass the 6th argument on x86.
The rest of the STX insns need to be jitted pretty much as-is,
with a twist that bpf's r12 becomes %rbp on x86.
And similar things in the callee:
instead of LDX [r12+N] it will be a 'mov dst_reg, r9' where r9 is x86's r9.
Other LDX from [r12+M] will remain as-is, but r12->%rbp.
On arm64 more of the STX/LDX insns become native 'mov'-s
because arm64 has more registers for arguments.

pw-bot: cr

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-02 23:38   ` Amery Hung
@ 2026-04-03  4:05     ` Yonghong Song
  0 siblings, 0 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-03  4:05 UTC (permalink / raw)
  To: Amery Hung
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, kernel-team, Martin KaFai Lau



On 4/2/26 4:38 PM, Amery Hung wrote:
> On Wed, Apr 1, 2026 at 6:28 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> Currently BPF functions (subprogs) are limited to 5 register arguments.
>> With [1], the compiler can emit code that passes additional arguments
>> via a dedicated stack area through bpf register
>> BPF_REG_STACK_ARG_BASE (r12), introduced in the previous patch.
>>
>> The following is an example to show how stack arguments are saved
>> and transferred between caller and callee:
>>
>>    int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
>>      ...
>>      bar(a1, a2, a3, a4, a5, a6, a7, a8);
>>      ...
>>    }
>>
>> The following is an illustration of stack allocation:
>>
>>     Caller (foo)                           Callee (bar)
>>     ============                           ============
>>     r12-relative stack arg area:           r12-relative stack arg area:
>>
>>     r12-8:  [incoming arg 6]          +--> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>>     r12-16: [incoming arg 7]          |+-> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>>                                       ||+> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>>     ---- incoming/outgoing boundary   |||  ---- incoming/outgoing boundary
>>     r12-24: [outgoing arg 6 to callee]+||   ...
>>     r12-32: [outgoing arg 7 to callee]-+|
>>     r12-40: [outgoing arg 8 to callee]--+
>>
>>    The caller writes outgoing args past its own incoming area.
>>    At the call site, the verifier transfers the caller's outgoing
>>    slots into the callee's incoming slots.
>>
>> The verifier tracks stack arg slots separately from the regular r10
>> stack. A new 'bpf_stack_arg_state' structure mirrors the existing stack
>> slot tracking (spilled_ptr + slot_type[]) but lives in a dedicated
>> 'stack_arg_slots' array in bpf_func_state. This separation keeps the
>> stack arg area from interfering with the normal stack and frame pointer
>> (r10) bookkeeping.
>>
>> If the bpf function has more than one call, e.g.,
>>
>>    int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
>>      ...
>>      bar1(a1, a2, a3, a4, a5, a6, a7, a8);
>>      ...
>>      bar2(a1, a2, a3, a4, a5, a6, a7, a8, a9);
>>      ...
>>    }
>>
>> The following is an illustration:
>>
>>     Caller (foo)                           Callee (bar1)
>>     ============                           =============
>>     r12-relative stack arg area:           r12-relative stack arg area:
>>
>>     r12-8:  [incoming arg 6]          +--> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>>     r12-16: [incoming arg 7]          |+-> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>>                                       ||+> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>>     ---- incoming/outgoing boundary   |||  ---- incoming/outgoing boundary
>>     r12-24: [outgoing arg 6 to callee]+||  ...
>>     r12-32: [outgoing arg 7 to callee]-+|
>>     r12-40: [outgoing arg 8 to callee]--+
>>     ...
>>     Back from bar1
>>     ...                                     Callee (bar2)
>>     ===                                     =============
>>                                       +---> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>>                                       |+--> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>>                                       ||+-> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>>                                       |||+> r12-32: [incoming arg 9] (from caller's outgoing r12-48)
>>     ---- incoming/outgoing boundary   ||||  ---- incoming/outgoing boundary
>>     r12-24: [outgoing arg 6 to callee]+|||  ...
>>     r12-32: [outgoing arg 7 to callee]-+||
>>     r12-40: [outgoing arg 8 to callee]--+|
>>     r12-48: [outgoing arg 9 to callee]---+
>>
>> Global subprogs with >5 args are not yet supported.
>>
>>    [1] https://github.com/llvm/llvm-project/pull/189060
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   include/linux/bpf.h          |   2 +
>>   include/linux/bpf_verifier.h |  15 ++-
>>   kernel/bpf/btf.c             |  14 +-
>>   kernel/bpf/verifier.c        | 248 ++++++++++++++++++++++++++++++++---
>>   4 files changed, 257 insertions(+), 22 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index e24c4a2e95f7..a0a1e14e4394 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -1666,6 +1666,8 @@ struct bpf_prog_aux {
>>          u32 max_pkt_offset;
>>          u32 max_tp_access;
>>          u32 stack_depth;
>> +       u16 incoming_stack_arg_depth;
>> +       u16 stack_arg_depth; /* both incoming and max outgoing of stack arguments */
>>          u32 id;
>>          u32 func_cnt; /* used by non-func prog as the number of func progs */
>>          u32 real_func_cnt; /* includes hidden progs, only used for JIT and freeing progs */
[...]
>> @@ -8054,10 +8195,23 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>   static int check_store_reg(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>                             bool strict_alignment_once)
>>   {
>> +       struct bpf_verifier_state *vstate = env->cur_state;
>> +       struct bpf_func_state *state = vstate->frame[vstate->curframe];
>>          struct bpf_reg_state *regs = cur_regs(env);
>>          enum bpf_reg_type dst_reg_type;
>>          int err;
>>
>> +       /* Handle stack arg write */
>> +       if (insn->dst_reg == BPF_REG_STACK_ARG_BASE) {
>> +               err = check_reg_arg(env, insn->src_reg, SRC_OP);
>> +               if (err)
>> +                       return err;
>> +               err = check_stack_arg_access(env, insn, "write");
>> +               if (err)
>> +                       return err;
>> +               return check_stack_arg_write(env, state, insn->off, insn->src_reg);
>> +       }
>> +
>>          /* check src1 operand */
>>          err = check_reg_arg(env, insn->src_reg, SRC_OP);
>>          if (err)
>> @@ -10940,8 +11094,10 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>                             int *insn_idx)
>>   {
>>          struct bpf_verifier_state *state = env->cur_state;
>> +       struct bpf_subprog_info *caller_info;
>>          struct bpf_func_state *caller;
>>          int err, subprog, target_insn;
>> +       u16 callee_incoming;
>>
>>          target_insn = *insn_idx + insn->imm + 1;
>>          subprog = find_subprog(env, target_insn);
>> @@ -10993,6 +11149,15 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>                  return 0;
>>          }
>>
>> +       /*
>> +        * Track caller's outgoing stack arg depth (max across all callees).
>> +        * This is needed so the JIT knows how much stack arg space to allocate.
>> +        */
>> +       caller_info = &env->subprog_info[caller->subprogno];
>> +       callee_incoming = env->subprog_info[subprog].incoming_stack_arg_depth;
>> +       if (callee_incoming > caller_info->outgoing_stack_arg_depth)
>> +               caller_info->outgoing_stack_arg_depth = callee_incoming;
>> +
>>          /* for regular function entry setup new frame and continue
>>           * from that frame.
>>           */
>> @@ -11048,13 +11213,41 @@ static int set_callee_state(struct bpf_verifier_env *env,
>>                              struct bpf_func_state *caller,
>>                              struct bpf_func_state *callee, int insn_idx)
>>   {
> Taking note when reading the change to set_callee_state():
>
> The function is not called when handling callback function, which uses
> push_callback_call() -> setup_func_entry() -> callback specific
> set_callee_state_cb. So caller stack argument will not be transferred.
>
> This should be fine as callee's stack_arg_depth will remain zero and
> then when callee tries to do r12 based load, check_stack_arg_read()
> should reject the program. Not sure if this needs a selftest since
> callbacks' set_callee_state_cb will also transfer register state very
> intentionally.

All callback functions are carefully designed in the kernel. So far, all
callback functions stay within 5 register parameters, so I ignore them
for now. If callback functions with more than 5 arguments are ever
needed, we can deal with them at that time.

>
>> -       int i;
>> +       struct bpf_subprog_info *callee_info;
>> +       int i, err;
>>
>>          /* copy r1 - r5 args that callee can access.  The copy includes parent
>>           * pointers, which connects us up to the liveness chain
>>           */
>>          for (i = BPF_REG_1; i <= BPF_REG_5; i++)
>>                  callee->regs[i] = caller->regs[i];

[...]


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-02 23:38   ` Alexei Starovoitov
@ 2026-04-03  4:10     ` Yonghong Song
  2026-04-05 21:07       ` Alexei Starovoitov
  0 siblings, 1 reply; 33+ messages in thread
From: Yonghong Song @ 2026-04-03  4:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, Kernel Team, Martin KaFai Lau



On 4/2/26 4:38 PM, Alexei Starovoitov wrote:
> On Wed, Apr 1, 2026 at 6:27 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> Currently BPF functions (subprogs) are limited to 5 register arguments.
>> With [1], the compiler can emit code that passes additional arguments
>> via a dedicated stack area through bpf register
>> BPF_REG_STACK_ARG_BASE (r12), introduced in the previous patch.
>>
>> The following is an example to show how stack arguments are saved
>> and transferred between caller and callee:
>>
>>    int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
>>      ...
>>      bar(a1, a2, a3, a4, a5, a6, a7, a8);
>>      ...
>>    }
>>
>> The following is an illustration of stack allocation:
>>
>>     Caller (foo)                           Callee (bar)
>>     ============                           ============
>>     r12-relative stack arg area:           r12-relative stack arg area:
>>
>>     r12-8:  [incoming arg 6]          +--> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>>     r12-16: [incoming arg 7]          |+-> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>>                                       ||+> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>>     ---- incoming/outgoing boundary   |||  ---- incoming/outgoing boundary
>>     r12-24: [outgoing arg 6 to callee]+||   ...
>>     r12-32: [outgoing arg 7 to callee]-+|
>>     r12-40: [outgoing arg 8 to callee]--+
> I haven't looked at the patch itself, only at this diagram.
> How is this supposed to map to the x86 calling convention?
> The shift is unusual.
> x86 is using fp-N for outgoing and fp+N for incoming.
> Why can't we use the same?
>
This is not for the JIT. The above transfer is for verification purposes.
For example, in the callee, a load 'rX = *(u64 *)(r12 - 8)' can easily
get the value into rX since the value was copied from caller to callee.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 07/10] bpf,x86: Implement JIT support for stack arguments
  2026-04-02 23:51   ` Alexei Starovoitov
@ 2026-04-03  4:13     ` Yonghong Song
  0 siblings, 0 replies; 33+ messages in thread
From: Yonghong Song @ 2026-04-03  4:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, Kernel Team, Martin KaFai Lau



On 4/2/26 4:51 PM, Alexei Starovoitov wrote:
> On Wed, Apr 1, 2026 at 6:28 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> Add x86_64 JIT support for BPF functions and kfuncs with more than
>> 5 arguments. The extra arguments are passed through a stack area
>> addressed by register r12 (BPF_REG_STACK_ARG_BASE) in BPF bytecode,
>> which the JIT translates to RBP-relative accesses in native code.
>>
>> There are two possible approaches to allocate the stack arg area:
>>
>>    Option 1: Allocate a single combined region (incoming + max_outgoing)
>>      below the program stack in the function prologue. All r12-relative
>>      accesses become [rbp - prog_stack_depth - offset] where the 'offset'
>>      is the offset value in (incoming + max_outgoing) region. This is
>>      simple because the area is always at a fixed offset from RBP.
>>      The tradeoff is slightly higher stack usage when multiple callees
>>      have different stack arg counts — the area is sized to the maximum.
>>
>>    Option 2: Allocate each outgoing area individually at the call
>>      site, sized exactly to the callee's needs. This minimizes
>>      stack usage but significantly complicates the JIT: each call
>>      site must dynamically adjust RSP, and addresses of stack args
>>      would shift depending on context, making the offset
>>      calculations harder.
>>
>> This patch uses Option 1 for simplicity.
>>
>> The native x86_64 stack layout for a function with incoming and
>> outgoing stack args:
>>
>>    high address
>>    ┌─────────────────────────┐
>>    │ incoming stack arg N    │  [rbp + 16 + (N - 1) * 8]  (pushed by caller)
>>    │ ...                     │
>>    │ incoming stack arg 1    │  [rbp + 16]
>>    ├─────────────────────────┤
>>    │ return address          │  [rbp + 8]
>>    │ saved rbp               │  [rbp]
>>    ├─────────────────────────┤
>>    │ callee-saved regs       │
>>    │ BPF program stack       │  (stack_depth bytes)
>>    ├─────────────────────────┤
>>    │ incoming stack arg 1    │  [rbp - prog_stack_depth - 8]
>>    │ ...   (copied from      │   (copied in prologue)
>>    │        caller's push)   │
>>    │ incoming stack arg N    │  [rbp - prog_stack_depth - N * 8]
>>    ├─────────────────────────┤
>>    │ outgoing stack arg 1    │  (written via r12-relative STX/ST,
>>    │ ...                     │   JIT translates to RBP-relative)
>>    │ outgoing stack arg M    │
>>    └─────────────────────────┘
>>      ...                        Other stack usage
>>    ┌─────────────────────────┐
>>    │ incoming stack arg M    │ (copy from outgoing stack arg to
>>    │ ...                     │  incoming stack arg)
>>    │ incoming stack arg 1    │
>>    ├─────────────────────────┤
>>    │ return address          │
>>    │ saved rbp               │
>>    ├─────────────────────────┤
>>    │ ...                     │
>>    └─────────────────────────┘
>>    low address
>>
>> In the prologue, the caller's incoming stack arguments are copied to the
>> callee's incoming stack arguments, which will be fetched by later load insns.
>> The outgoing stack arguments are written by JIT-emitted RBP-relative STX or ST.
>>
>> For each bpf-to-bpf call, push outgoing stack args onto the native
>> stack before CALL, pop them after return. So the same 'outgoing stack arg'
>> area is used by all bpf-to-bpf functions.
>>
>> For kfunc calls, push stack args (arg 7+) onto the native stack
>> and load arg 6 into R9 per the x86_64 calling convention,
>> then clean up RSP after return.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   arch/x86/net/bpf_jit_comp.c | 145 ++++++++++++++++++++++++++++++++++--
>>   1 file changed, 138 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index 32864dbc2c4e..807493f109e5 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -367,6 +367,27 @@ static void push_callee_regs(u8 **pprog, bool *callee_regs_used)
>>          *pprog = prog;
>>   }
>>
>> +static int push_stack_args(u8 **pprog, s32 base_off, int from, int to)
>> +{
>> +       u8 *prog = *pprog;
>> +       int j, off, cnt = 0;
>> +
>> +       for (j = from; j >= to; j--) {
>> +               off = base_off - j * 8;
>> +
>> +               /* push qword [rbp + off] */
>> +               if (is_imm8(off)) {
>> +                       EMIT3(0xFF, 0x75, off);
>> +                       cnt += 3;
>> +               } else {
>> +                       EMIT2_off32(0xFF, 0xB5, off);
>> +                       cnt += 6;
>> +               }
>> +       }
>> +       *pprog = prog;
>> +       return cnt;
>> +}
>> +
>>   static void pop_r12(u8 **pprog)
>>   {
>>          u8 *prog = *pprog;
>> @@ -1664,19 +1685,35 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
>>          int i, excnt = 0;
>>          int ilen, proglen = 0;
>>          u8 *prog = temp;
>> -       u32 stack_depth;
>> +       u16 stack_arg_depth, incoming_stack_arg_depth;
>> +       u32 prog_stack_depth, stack_depth;
>> +       bool has_stack_args;
>>          int err;
>>
>>          stack_depth = bpf_prog->aux->stack_depth;
>> +       stack_arg_depth = bpf_prog->aux->stack_arg_depth;
>> +       incoming_stack_arg_depth = bpf_prog->aux->incoming_stack_arg_depth;
>>          priv_stack_ptr = bpf_prog->aux->priv_stack_ptr;
>>          if (priv_stack_ptr) {
>>                  priv_frame_ptr = priv_stack_ptr + PRIV_STACK_GUARD_SZ + round_up(stack_depth, 8);
>>                  stack_depth = 0;
>>          }
>>
>> +       /*
>> +        * Save program stack depth before adding stack arg space.
>> +        * Each function allocates its own stack arg space
>> +        * (incoming + outgoing) below its BPF stack.
>> +        * Stack args are accessed via RBP-based addressing.
>> +        */
>> +       prog_stack_depth = round_up(stack_depth, 8);
>> +       if (stack_arg_depth)
>> +               stack_depth += stack_arg_depth;
>> +       has_stack_args = stack_arg_depth > 0;
>> +
>>          arena_vm_start = bpf_arena_get_kern_vm_start(bpf_prog->aux->arena);
>>          user_vm_start = bpf_arena_get_user_vm_start(bpf_prog->aux->arena);
>>
>> +
>>          detect_reg_usage(insn, insn_cnt, callee_regs_used);
>>
>>          emit_prologue(&prog, image, stack_depth,
>> @@ -1704,6 +1741,38 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
>>                  emit_mov_imm64(&prog, X86_REG_R12,
>>                                 arena_vm_start >> 32, (u32) arena_vm_start);
>>
>> +       if (incoming_stack_arg_depth && bpf_is_subprog(bpf_prog)) {
>> +               int n = incoming_stack_arg_depth / 8;
>> +
>> +               /*
>> +                * Caller pushed stack args before CALL, so after prologue
>> +                * (CALL saves ret addr, then PUSH saves old RBP) they sit
>> +                * above RBP:
>> +                *
>> +                *   [rbp + 16 + (n - 1) * 8]  stack_arg n
>> +                *   ...
>> +                *   [rbp + 24]                stack_arg 2
>> +                *   [rbp + 16]                stack_arg 1
>> +                *   [rbp +  8]                return address
>> +                *   [rbp +  0]                saved rbp
>> +                *
>> +                * Copy each into callee's own region below the program stack:
>> +                *   [rbp - prog_stack_depth - i * 8]
>> +                */
>> +               for (i = 0; i < n; i++) {
>> +                       s32 src = 16 + i * 8;
>> +                       s32 dst = -prog_stack_depth - (i + 1) * 8;
>> +
>> +                       /* mov rax, [rbp + src] */
>> +                       EMIT4(0x48, 0x8B, 0x45, src);
>> +                       /* mov [rbp + dst], rax */
>> +                       if (is_imm8(dst))
>> +                               EMIT4(0x48, 0x89, 0x45, dst);
>> +                       else
>> +                               EMIT3_off32(0x48, 0x89, 0x85, dst);
>> +               }
> This is really suboptimal.
> The bpf calling convention for 6+ args needs to match x86,
> with the exception of the 6th arg.
> All bpf insns need to remain as-is when calling another bpf prog
> or kfunc. There should be no additional moves.
> The JIT should only special case the 6th arg and convert bpf's STX [r12-N], src_reg
> into 'mov r9, src_reg', since r9 is used to pass the 6th argument on x86.
> The rest of the STX insns need to be jitted pretty much as-is,
> with a twist that bpf's r12 becomes %rbp on x86.
> And similar things in the callee:
> instead of LDX [r12+N] it will be a 'mov dst_reg, r9' where r9 is x86's r9.
> Other LDX from [r12+M] will remain as-is, but r12->%rbp.
> On arm64 more of the STX/LDX insns become native 'mov'-s
> because arm64 has more registers for arguments.

Good point. I will try to simplify the JIT by following the x86_64
calling convention.

>
> pw-bot: cr


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-03  4:10     ` Yonghong Song
@ 2026-04-05 21:07       ` Alexei Starovoitov
  2026-04-06  4:29         ` Yonghong Song
  0 siblings, 1 reply; 33+ messages in thread
From: Alexei Starovoitov @ 2026-04-05 21:07 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, Kernel Team, Martin KaFai Lau

On Thu, Apr 2, 2026 at 9:11 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> On 4/2/26 4:38 PM, Alexei Starovoitov wrote:
> > On Wed, Apr 1, 2026 at 6:27 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >> Currently BPF functions (subprogs) are limited to 5 register arguments.
> >> With [1], the compiler can emit code that passes additional arguments
> >> via a dedicated stack area through bpf register
> >> BPF_REG_STACK_ARG_BASE (r12), introduced in the previous patch.
> >>
> >> The following is an example to show how stack arguments are saved
> >> and transferred between caller and callee:
> >>
> >>    int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
> >>      ...
> >>      bar(a1, a2, a3, a4, a5, a6, a7, a8);
> >>      ...
> >>    }
> >>
> >> The following is an illustration of stack allocation:
> >>
> >>     Caller (foo)                           Callee (bar)
> >>     ============                           ============
> >>     r12-relative stack arg area:           r12-relative stack arg area:
> >>
> >>     r12-8:  [incoming arg 6]          +--> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
> >>     r12-16: [incoming arg 7]          |+-> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
> >>                                       ||+> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
> >>     ---- incoming/outgoing boundary   |||  ---- incoming/outgoing boundary
> >>     r12-24: [outgoing arg 6 to callee]+||   ...
> >>     r12-32: [outgoing arg 7 to callee]-+|
> >>     r12-40: [outgoing arg 8 to callee]--+
> > I haven't looked at the patch itself, only at this diagram.
> > How is this supposed to map to the x86 calling convention?
> > The shift is unusual.
> > x86 is using fp-N for outgoing and fp+N for incoming.
> > Why can't we use the same?
> >
> This is not for jit. The above transfer is for verification purposes.
> For example, for callee, a load 'rX = *(u64 *)(r12 - 8)' can easily
> get the value rX in callee since the value is copied from caller to callee.

There shouldn't be any extra copy.
For 7th and higher argument:
the caller does 'stx [r12 - N]' and the callee does 'ldx [r12 + M]'
and JIT emits them as-is only adjusting N and M constants.

For 6th argument JIT emits these two stx/ldx as moves to/from x86's r9.
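To make the mapping above concrete, here is a minimal sketch of the per-argument classification such a JIT could apply (the helper and all names are hypothetical illustrations of the rule just described, not the actual kernel code):

```c
#include <assert.h>

/* Hypothetical sketch: how a stack-arg access could be classified when
 * JITing. Args 1-5 travel in r1-r5 as today; the 6th becomes a move
 * to/from x86's %r9; the 7th and higher keep their memory accesses,
 * with only the constant offset rewritten (slot 0 = arg 7, slot 1 =
 * arg 8, ...). */
enum emit_kind { EMIT_REG_R9, EMIT_STACK_SLOT };

static enum emit_kind classify_stack_arg(int arg_no, int *slot_off)
{
	assert(arg_no >= 6);            /* args 1-5 never reach here */
	if (arg_no == 6) {
		*slot_off = 0;          /* no memory slot: stx/ldx -> mov %r9 */
		return EMIT_REG_R9;
	}
	*slot_off = (arg_no - 7) * 8;   /* native stack slot, 8 bytes each */
	return EMIT_STACK_SLOT;
}
```

Under this reading, e.g. an stx of the 8th argument keeps its store form but targets native-stack slot offset 8 instead of an r12-relative BPF offset.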

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-05 21:07       ` Alexei Starovoitov
@ 2026-04-06  4:29         ` Yonghong Song
  2026-04-06  4:51           ` Alexei Starovoitov
  0 siblings, 1 reply; 33+ messages in thread
From: Yonghong Song @ 2026-04-06  4:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, Kernel Team, Martin KaFai Lau



On 4/5/26 2:07 PM, Alexei Starovoitov wrote:
> On Thu, Apr 2, 2026 at 9:11 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>
>> On 4/2/26 4:38 PM, Alexei Starovoitov wrote:
>>> On Wed, Apr 1, 2026 at 6:27 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>> Currently BPF functions (subprogs) are limited to 5 register arguments.
>>>> With [1], the compiler can emit code that passes additional arguments
>>>> via a dedicated stack area through bpf register
>>>> BPF_REG_STACK_ARG_BASE (r12), introduced in the previous patch.
>>>>
>>>> The following is an example to show how stack arguments are saved
>>>> and transferred between caller and callee:
>>>>
>>>>     int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
>>>>       ...
>>>>       bar(a1, a2, a3, a4, a5, a6, a7, a8);
>>>>       ...
>>>>     }
>>>>
>>>> The following is an illustration of stack allocation:
>>>>
>>>>      Caller (foo)                           Callee (bar)
>>>>      ============                           ============
>>>>      r12-relative stack arg area:           r12-relative stack arg area:
>>>>
>>>>      r12-8:  [incoming arg 6]          +--> r12-8:  [incoming arg 6] (from caller's outgoing r12-24)
>>>>      r12-16: [incoming arg 7]          |+-> r12-16: [incoming arg 7] (from caller's outgoing r12-32)
>>>>                                        ||+> r12-24: [incoming arg 8] (from caller's outgoing r12-40)
>>>>      ---- incoming/outgoing boundary   |||  ---- incoming/outgoing boundary
>>>>      r12-24: [outgoing arg 6 to callee]+||   ...
>>>>      r12-32: [outgoing arg 7 to callee]-+|
>>>>      r12-40: [outgoing arg 8 to callee]--+
>>> I haven't looked at the patch itself only at this diagram.
>>> How is it supposed to map to the x86 calling convention?
>>> The shift is unusual.
>>> x86 is using fp-N for outgoing and fp+N for incoming.
>>> Why can't we use the same?
>>>
>> This is not for jit. The above transfer is for verification purposes.
>> For example, for callee, a load 'rX = *(u64 *)(r12 - 8)' can easily
>> get the value rX in callee since the value is copied from caller to callee.
> There shouldn't be any extra copy.
> For 7th and higher argument:
> the caller does 'stx [r12 - N]' and the callee does 'ldx [r12 + M]'
> and JIT emits them as-is only adjusting N and M constants.

Indeed, for the 7th and higher arguments, ldx and stx can directly load
or store values in the expected calling-convention places.

>
> For 6th argument JIT emits these two stx/ldx as moves to/from x86's r9.

For the stx case, we should move the bottom stack value (the 6th argument)
to r9 and pop the bottom stack slot (8 bytes).

For the ldx case, this is a bpf callee accessing a bpf caller's frame.
In such cases, r9 is not involved.




* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-06  4:29         ` Yonghong Song
@ 2026-04-06  4:51           ` Alexei Starovoitov
  2026-04-06  6:03             ` Yonghong Song
  0 siblings, 1 reply; 33+ messages in thread
From: Alexei Starovoitov @ 2026-04-06  4:51 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, Kernel Team, Martin KaFai Lau

On Sun, Apr 5, 2026 at 9:29 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> >
> > For 6th argument JIT emits these two stx/ldx as moves to/from x86's r9.
>
> For stx case, we should move the bottom stack value (6th argument)
> to r9 and pop the bottom stack slot (8 bytes).

That doesn't sound right.
Passing 6th argument should not involve stack manipulation at all.
No push and no pop.


* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-06  4:51           ` Alexei Starovoitov
@ 2026-04-06  6:03             ` Yonghong Song
  2026-04-06 15:17               ` Alexei Starovoitov
  0 siblings, 1 reply; 33+ messages in thread
From: Yonghong Song @ 2026-04-06  6:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, Kernel Team, Martin KaFai Lau



On 4/5/26 9:51 PM, Alexei Starovoitov wrote:
> On Sun, Apr 5, 2026 at 9:29 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>> For 6th argument JIT emits these two stx/ldx as moves to/from x86's r9.
>> For stx case, we should move the bottom stack value (6th argument)
>> to r9 and pop the bottom stack slot (8 bytes).
> That doesn't sound right.
> Passing 6th argument should not involve stack manipulation at all.
> No push and no pop.

The following stack layout can avoid the above push/pop issue:

    incoming stack arg N -> 1
    return address
    saved rbp
    BPF program stack
    tail call cnt <== if tail call reachable
    callee-saved regs
    r9 <== if priv_frame_ptr is not null
    outgoing stack arg 1
    outgoing stack arg M -> 2

After pushing the above (outgoing stack args M down to 2):
if bpf-to-bpf, also push outgoing stack arg 1;
if kfunc, move outgoing stack arg 1 to r9.
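For illustration only, one plausible offset assignment for this layout can be sketched as below. It assumes 8-byte slots, the return address at rbp+8, the saved rbp at rbp+0, and that the outgoing area sits directly below a fixed-size BPF program stack; the tail-call slot and callee-saved spills are ignored, and none of this is the kernel's actual code:

```c
#include <assert.h>

/* Hypothetical sketch of the layout above. "Stack arg k" is the k-th
 * spilled argument (overall argument 5+k); stack arg 1 is the one a
 * kfunc call would instead take in %r9. */

/* Incoming stack args were pushed by the caller in reverse order, so
 * stack arg 1 lands just above the return address (rbp+8). */
static int incoming_stack_arg_off(int k)
{
	return 16 + (k - 1) * 8;
}

/* The outgoing area grows downward, below the BPF program stack. */
static int outgoing_stack_arg_off(int bpf_stack_depth, int k)
{
	return -(bpf_stack_depth + k * 8);
}
```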



* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-06  6:03             ` Yonghong Song
@ 2026-04-06 15:17               ` Alexei Starovoitov
  2026-04-06 16:19                 ` Yonghong Song
  0 siblings, 1 reply; 33+ messages in thread
From: Alexei Starovoitov @ 2026-04-06 15:17 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, Kernel Team, Martin KaFai Lau

On Sun, Apr 5, 2026 at 11:03 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> On 4/5/26 9:51 PM, Alexei Starovoitov wrote:
> > On Sun, Apr 5, 2026 at 9:29 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>
> >>> For 6th argument JIT emits these two stx/ldx as moves to/from x86's r9.
> >> For stx case, we should move the bottom stack value (6th argument)
> >> to r9 and pop the bottom stack slot (8 bytes).
> > That doesn't sound right.
> > Passing 6th argument should not involve stack manipulation at all.
> > No push and no pop.
>
> The following stack layout can avoid the above push/pop issue:
>
>     incoming stack arg N -> 1
>     return address
>     saved rbp
>     BPF program stack
>     tail call cnt <== if tail call reachable

let's disallow mixing 6+ args and tailcalls.

>     callee-saved regs
>     r9 <== if priv_frame_ptr is not null

I guess we can also disallow mixing private stack and 6+ args.
Looks like it's a bit in the way.
The compilers will emit
stx [r12 - N], src_reg // store of outgoing 6th arg
call

For priv stack the JIT will emit push_r9 while JITing the call insn.
But it should push_r9 before the stx that converts to 'mov %r9 <- %src_reg'.
Maybe we can reload r9 after the call instead of push/pop.

>     outgoing stack arg 1
>     outgoing stack arg M -> 2
>
> After the above pushing (outgoing stack arg M -> 2),
> if bpf-to-bpf, push the outgoing stack arg 1.
> If kfunc, move outgoing stack arg 1 to r9.

Ideally JIT treats subprog calls and kfunc calls the same way.
Why should they be different ?
In the callee, in both cases, the first insn in the prologue
can be 'mov [rbp + ] <- %r9',

so that later 'ldx [r12 + ' can be JITed as-is. Here priv stack
is not in the way.
Without priv stack we can avoid this first 'mov [rbp + ] <- %r9'
in the prologue and instead JIT 'ldx [r12 + ' as 'mov %dst_reg <- %r9'
since nothing will be overwriting %r9.

It feels to me that it's easier to disallow priv stack for now.
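To tie the pieces together, here is an illustrative pseudo-assembly sketch of what the emitted code might look like for an 8-argument bpf-to-bpf call under this scheme, without private stack and with made-up offsets (this is one reading of the proposal, not actual JIT output):

```asm
caller:
    mov  %r9, <arg6>          ; stx [r12 - 8]  -> 6th arg goes in %r9
    mov  [%rbp - o7], <arg7>  ; stx of 7th arg, emitted as-is, offset adjusted
    mov  [%rbp - o8], <arg8>  ; stx of 8th arg, emitted as-is, offset adjusted
    ; assumes %rsp was adjusted so the outgoing slots sit at the top of
    ; the caller's stack when CALL executes
    call bar

callee (bar):
    mov  <dst>, %r9           ; ldx of the 6th arg -> plain reg move,
                              ;   nothing overwrites %r9 without priv stack
    mov  <dst>, [%rbp + i7]   ; ldx of the 7th arg -> read caller's slot
```

With private stack, the callee would instead need the prologue spill 'mov [rbp + ...] <- %r9' mentioned above, which is why disallowing priv stack simplifies things.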


* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-06 15:17               ` Alexei Starovoitov
@ 2026-04-06 16:19                 ` Yonghong Song
  2026-04-06 17:24                   ` Alexei Starovoitov
  0 siblings, 1 reply; 33+ messages in thread
From: Yonghong Song @ 2026-04-06 16:19 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, Kernel Team, Martin KaFai Lau



On 4/6/26 8:17 AM, Alexei Starovoitov wrote:
> On Sun, Apr 5, 2026 at 11:03 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>
>> On 4/5/26 9:51 PM, Alexei Starovoitov wrote:
>>> On Sun, Apr 5, 2026 at 9:29 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>>> For 6th argument JIT emits these two stx/ldx as moves to/from x86's r9.
>>>> For stx case, we should move the bottom stack value (6th argument)
>>>> to r9 and pop the bottom stack slot (8 bytes).
>>> That doesn't sound right.
>>> Passing 6th argument should not involve stack manipulation at all.
>>> No push and no pop.
>> The following stack layout can avoid the above push/pop issue:
>>
>>      incoming stack arg N -> 1
>>      return address
>>      saved rbp
>>      BPF program stack
>>      tail call cnt <== if tail call reachable
> let's disallow mixing 6+ args and tailcalls.

Okay.

>
>>      callee-saved regs
>>      r9 <== if priv_frame_ptr is not null
> I guess we can also disable private stack and 6+ args.
> Looks like it's a bit in the way.
> The compilers will emit
> stx [r12 - N], src_reg // store of outgoing 6th arg
> call
>
> For priv stack JIT will emit push_r9 while JITing call insn.
> But it should push_r9 before stx converts to 'mov %r9 <- %src_reg'.

I guess this is not enough. We can push_r9 in the prologue. But for
priv stack, r9 is still used in the normal bpf load/store code.
This will interfere with JITing 'mov %r9 <- %src_reg'.

As you mentioned later, disabling priv_stack would work.

> Maybe we can reload r9 after the call instead of push/pop.
>
>>      outgoing stack arg 1
>>      outgoing stack arg M -> 2
>>
>> After the above pushing (outgoing stack arg M -> 2),
>> if bpf-to-bpf, push the outgoing stack arg 1.
>> If kfunc, move outgoing stack arg 1 to r9.
> Ideally JIT treats subprog calls and kfunc calls the same way.
> Why should they be different ?
> In the callee, in both cases, the first insn in the prologue
> can be 'mov [rbp + ] <- %r9',
>
> so that later 'ldx [r12 + ' can be JITed as-is. Here priv stack
> is not in the way.
> Without priv stack we can avoid this first 'mov [rbp + ] <- %r9'
> in the prologue and instead JIT 'ldx [r12 + ' as 'mov %dst_reg <- %r9'
> since nothing will be overwriting %r9.
>
> It feels to me that it's easier to disallow priv stack for now.

So if we disallow priv_stack, we can do
    stx [r12 - first_arg_off] = val/reg => mov %r9, val/reg
and the other arg offsets will be pushed to the stack in reverse order.

So for kfunc, the 'mov %r9, val' is already there, so we should be okay.
For bpf-to-bpf, do 'push %r9'.

Is this correct?




* Re: [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions
  2026-04-06 16:19                 ` Yonghong Song
@ 2026-04-06 17:24                   ` Alexei Starovoitov
  0 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2026-04-06 17:24 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jose E . Marchesi, Kernel Team, Martin KaFai Lau

On Mon, Apr 6, 2026 at 9:19 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> On 4/6/26 8:17 AM, Alexei Starovoitov wrote:
> > On Sun, Apr 5, 2026 at 11:03 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>
> >>
> >> On 4/5/26 9:51 PM, Alexei Starovoitov wrote:
> >>> On Sun, Apr 5, 2026 at 9:29 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>>>> For 6th argument JIT emits these two stx/ldx as moves to/from x86's r9.
> >>>> For stx case, we should move the bottom stack value (6th argument)
> >>>> to r9 and pop the bottom stack slot (8 bytes).
> >>> That doesn't sound right.
> >>> Passing 6th argument should not involve stack manipulation at all.
> >>> No push and no pop.
> >> The following stack layout can avoid the above push/pop issue:
> >>
> >>      incoming stack arg N -> 1
> >>      return address
> >>      saved rbp
> >>      BPF program stack
> >>      tail call cnt <== if tail call reachable
> > let's disallow mixing 6+ args and tailcalls.
>
> Okay.
>
> >
> >>      callee-saved regs
> >>      r9 <== if priv_frame_ptr is not null
> > I guess we can also disable private stack and 6+ args.
> > Looks like it's a bit in the way.
> > The compilers will emit
> > stx [r12 - N], src_reg // store of outgoing 6th arg
> > call
> >
> > For priv stack JIT will emit push_r9 while JITing call insn.
> > But it should push_r9 before stx converts to 'mov %r9 <- %src_reg'.
>
> I guess this is not enough. we can push_r9 in prologue. But for
> priv stack, r9 is still used in the normal bpf load/store codes.
> This will interfere with jitting 'mov %r9 <- %src_reg'.
>
> As you mentioned later, disable priv_stack would work.
>
> > Maybe we can reload r9 after the call instead of push/pop.
> >
> >>      outgoing stack arg 1
> >>      outgoing stack arg M -> 2
> >>
> >> After the above pushing (outgoing stack arg M -> 2),
> >> if bpf-to-bpf, push the outgoing stack arg 1.
> >> If kfunc, move outgoing stack arg 1 to r9.
> > Ideally JIT treats subprog calls and kfunc calls the same way.
> > Why should they be different ?
> > In the callee, in both cases, the first insn in the prologue
> > can be 'mov [rbp + ] <- %r9',
> >
> > so that later 'ldx [r12 + ' can be JITed as-is. Here priv stack
> > is not in the way.
> > Without priv stack we can avoid this first 'mov [rbp + ] <- %r9'
> > in the prologue and instead JIT 'ldx [r12 + ' as 'mov %dst_reg <- %r9'
> > since nothing will be overwriting %r9.
> >
> > It feels to me that it's easier to disallow priv stack for now.
>
> So if we disallow priv_stack, we can do
>     stx[r12 - first_arg_off] = val/reg => mov %r9, val/reg

yes.

> other arg_off will be pushed to stack in reverse orders.

No, not pushed. They will be JITed in place: stx [r12 - off] -> mov [%rbp], val.
Probably needs an %rsp adjustment somewhere to grow the stack.

> So for kfunc, 'mov %r9, val' already there, so we should be okay.
> For bpf-to-bpf, do 'push %r9'.

Don't follow. Why should they be different?


end of thread, other threads:[~2026-04-06 17:24 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-02  1:27 [PATCH bpf-next 00/10] bpf: Support stack arguments for BPF functions and kfuncs Yonghong Song
2026-04-02  1:27 ` [PATCH bpf-next 01/10] bpf: Introduce bpf register BPF_REG_STACK_ARG_BASE Yonghong Song
2026-04-02  1:27 ` [PATCH bpf-next 02/10] bpf: Reuse MAX_BPF_FUNC_ARGS for maximum number of arguments Yonghong Song
2026-04-02  1:27 ` [PATCH bpf-next 03/10] bpf: Support stack arguments for bpf functions Yonghong Song
2026-04-02  3:18   ` bot+bpf-ci
2026-04-02 14:42     ` Yonghong Song
2026-04-02 18:55   ` Amery Hung
2026-04-02 20:45     ` Yonghong Song
2026-04-02 23:38   ` Amery Hung
2026-04-03  4:05     ` Yonghong Song
2026-04-02 23:38   ` Alexei Starovoitov
2026-04-03  4:10     ` Yonghong Song
2026-04-05 21:07       ` Alexei Starovoitov
2026-04-06  4:29         ` Yonghong Song
2026-04-06  4:51           ` Alexei Starovoitov
2026-04-06  6:03             ` Yonghong Song
2026-04-06 15:17               ` Alexei Starovoitov
2026-04-06 16:19                 ` Yonghong Song
2026-04-06 17:24                   ` Alexei Starovoitov
2026-04-02  1:27 ` [PATCH bpf-next 04/10] bpf: Support stack arguments for kfunc calls Yonghong Song
2026-04-02  3:18   ` bot+bpf-ci
2026-04-02 14:45     ` Yonghong Song
2026-04-02 21:02   ` Amery Hung
2026-04-02  1:27 ` [PATCH bpf-next 05/10] bpf: Reject stack arguments in non-JITed programs Yonghong Song
2026-04-02  1:27 ` [PATCH bpf-next 06/10] bpf: Enable stack argument support for x86_64 Yonghong Song
2026-04-02  1:28 ` [PATCH bpf-next 07/10] bpf,x86: Implement JIT support for stack arguments Yonghong Song
2026-04-02 22:26   ` Amery Hung
2026-04-02 23:26     ` Yonghong Song
2026-04-02 23:51   ` Alexei Starovoitov
2026-04-03  4:13     ` Yonghong Song
2026-04-02  1:28 ` [PATCH bpf-next 08/10] selftests/bpf: Add tests for BPF function " Yonghong Song
2026-04-02  1:28 ` [PATCH bpf-next 09/10] selftests/bpf: Add negative test for oversized kfunc stack argument Yonghong Song
2026-04-02  1:28 ` [PATCH bpf-next 10/10] selftests/bpf: Add verifier tests for stack argument validation Yonghong Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox