public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/3] bpf, arm64: Support stack arguments
@ 2026-04-20 15:35 Puranjay Mohan
  2026-04-20 15:35 ` [PATCH bpf-next 1/3] bpf, arm64: Map BPF_REG_0 to x8 instead of x7 Puranjay Mohan
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Puranjay Mohan @ 2026-04-20 15:35 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Xu Kuohai,
	Catalin Marinas, Will Deacon, linux-arm-kernel

This set adds support for stack arguments to the arm64 JIT, based on the
preparatory work and x86 support in [1]. The arm64 calling convention
(AAPCS64) passes the first 8 arguments in registers x0-x7 and the
remaining ones on the stack. Currently, BPF supports passing 5
arguments, all of which map to x0-x4.

BPF passes arguments in R1-R5, which map to x0-x4 on arm64. BPF reg R0
is currently mapped to arm64 reg x7, but as x7 is now needed for
argument passing, the first patch changes the mapping of R0 to x8. This
frees x0-x7 for passing up to 8 arguments in registers; arguments 9+ are
passed on the stack.

Note: This set needs to be applied on top of [1]

All selftests pass:

  ./test_progs -t stack_arg,stack_arg_fail,stack_arg_kfunc,verifier_stack_arg
  #431/1   stack_arg/global_many_args:OK
  #431/2   stack_arg/async_cb_many_args:OK
  #431/3   stack_arg/bpf2bpf:OK
  #431/4   stack_arg/kfunc:OK
  #431     stack_arg:OK
  #432/1   stack_arg_fail/test_stack_arg_big:OK
  #432/2   stack_arg_fail/r11 in ALU instruction:OK
  #432/3   stack_arg_fail/r11 store with non-DW size:OK
  #432/4   stack_arg_fail/r11 store with unaligned offset:OK
  #432/5   stack_arg_fail/r11 store with positive offset:OK
  #432/6   stack_arg_fail/r11 load with negative offset:OK
  #432/7   stack_arg_fail/r11 load with non-DW size:OK
  #432/8   stack_arg_fail/r11 store with zero offset:OK
  #432     stack_arg_fail:OK
  #631/1   verifier_stack_arg/stack_arg: subprog with 6 args:OK
  #631/2   verifier_stack_arg/stack_arg: two subprogs with >5 args:OK
  #631/3   verifier_stack_arg/stack_arg: read from uninitialized stack arg slot:OK
  #631/4   verifier_stack_arg/stack_arg: gap at offset -8, only wrote -16:OK
  #631/5   verifier_stack_arg/stack_arg: pruning with different stack arg types:OK
  #631/6   verifier_stack_arg/stack_arg: release_reference invalidates stack arg slot:OK
  #631/7   verifier_stack_arg/stack_arg: pkt pointer in stack arg slot invalidated after pull_data:OK
  #631/8   verifier_stack_arg/stack_arg: null propagation rejects deref on null branch:OK
  #631/9   verifier_stack_arg/stack_arg: missing store on one branch:OK
  #631/10  verifier_stack_arg/stack_arg: share a store for both branches:OK
  #631/11  verifier_stack_arg/stack_arg: write beyond max outgoing depth:OK
  #631/12  verifier_stack_arg/stack_arg: sequential calls reuse slots:OK
  #631     verifier_stack_arg:OK
  Summary: 3/24 PASSED, 0 SKIPPED, 0 FAILED

[1] https://lore.kernel.org/all/20260419163316.731019-1-yonghong.song@linux.dev/

Puranjay Mohan (3):
  bpf, arm64: Map BPF_REG_0 to x8 instead of x7
  bpf, arm64: Add JIT support for stack arguments
  selftests/bpf: Enable stack argument tests for arm64

 arch/arm64/net/bpf_jit_comp.c                 | 91 +++++++++++++++++--
 arch/arm64/net/bpf_timed_may_goto.S           |  8 +-
 tools/testing/selftests/bpf/progs/stack_arg.c |  3 +-
 .../selftests/bpf/progs/stack_arg_kfunc.c     |  3 +-
 .../selftests/bpf/progs/verifier_jit_inline.c |  2 +-
 .../selftests/bpf/progs/verifier_ldsx.c       |  6 +-
 .../bpf/progs/verifier_private_stack.c        | 10 +-
 .../selftests/bpf/progs/verifier_stack_arg.c  | 15 ++-
 8 files changed, 116 insertions(+), 22 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH bpf-next 1/3] bpf, arm64: Map BPF_REG_0 to x8 instead of x7
  2026-04-20 15:35 [PATCH bpf-next 0/3] bpf, arm64: Support stack arguments Puranjay Mohan
@ 2026-04-20 15:35 ` Puranjay Mohan
  2026-04-20 15:36 ` [PATCH bpf-next 2/3] bpf, arm64: Add JIT support for stack arguments Puranjay Mohan
  2026-04-20 15:36 ` [PATCH bpf-next 3/3] selftests/bpf: Enable stack argument tests for arm64 Puranjay Mohan
  2 siblings, 0 replies; 7+ messages in thread
From: Puranjay Mohan @ 2026-04-20 15:35 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Xu Kuohai,
	Catalin Marinas, Will Deacon, linux-arm-kernel

Move the BPF return value register from x7 to x8, freeing x7 for use
as an argument register. AAPCS64 designates x8 as the indirect result
location register; it is caller-saved and not used for argument
passing, making it a suitable home for BPF_REG_0.

This is a prerequisite for stack argument support, which needs x5-x7
to pass arguments 6-8 to native kfuncs following the AAPCS64 calling
convention.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 arch/arm64/net/bpf_jit_comp.c                          |  4 ++--
 arch/arm64/net/bpf_timed_may_goto.S                    |  8 ++++----
 .../testing/selftests/bpf/progs/verifier_jit_inline.c  |  2 +-
 tools/testing/selftests/bpf/progs/verifier_ldsx.c      |  6 +++---
 .../selftests/bpf/progs/verifier_private_stack.c       | 10 +++++-----
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 0816c40fc7af..085e650662e3 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -47,7 +47,7 @@
 /* Map BPF registers to A64 registers */
 static const int bpf2a64[] = {
 	/* return value from in-kernel function, and exit value from eBPF */
-	[BPF_REG_0] = A64_R(7),
+	[BPF_REG_0] = A64_R(8),
 	/* arguments from eBPF program to in-kernel function */
 	[BPF_REG_1] = A64_R(0),
 	[BPF_REG_2] = A64_R(1),
@@ -1048,7 +1048,7 @@ static void build_epilogue(struct jit_ctx *ctx, bool was_classic)
 	/* Restore FP/LR registers */
 	emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
 
-	/* Move the return value from bpf:r0 (aka x7) to x0 */
+	/* Move the return value from bpf:r0 (aka x8) to x0 */
 	emit(A64_MOV(1, A64_R(0), r0), ctx);
 
 	/* Authenticate lr */
diff --git a/arch/arm64/net/bpf_timed_may_goto.S b/arch/arm64/net/bpf_timed_may_goto.S
index 894cfcd7b241..a9a802711a7f 100644
--- a/arch/arm64/net/bpf_timed_may_goto.S
+++ b/arch/arm64/net/bpf_timed_may_goto.S
@@ -8,8 +8,8 @@ SYM_FUNC_START(arch_bpf_timed_may_goto)
 	stp     x29, x30, [sp, #-64]!
 	mov     x29, sp
 
-	/* Save BPF registers R0 - R5 (x7, x0-x4)*/
-	stp	x7, x0, [sp, #16]
+	/* Save BPF registers R0 - R5 (x8, x0-x4)*/
+	stp	x8, x0, [sp, #16]
 	stp	x1, x2, [sp, #32]
 	stp	x3, x4, [sp, #48]
 
@@ -28,8 +28,8 @@ SYM_FUNC_START(arch_bpf_timed_may_goto)
 	/* BPF_REG_AX(x9) will be stored into count, so move return value to it. */
 	mov	x9, x0
 
-	/* Restore BPF registers R0 - R5 (x7, x0-x4) */
-	ldp	x7, x0, [sp, #16]
+	/* Restore BPF registers R0 - R5 (x8, x0-x4) */
+	ldp	x8, x0, [sp, #16]
 	ldp	x1, x2, [sp, #32]
 	ldp	x3, x4, [sp, #48]
 
diff --git a/tools/testing/selftests/bpf/progs/verifier_jit_inline.c b/tools/testing/selftests/bpf/progs/verifier_jit_inline.c
index 4ea254063646..885ff69a3a62 100644
--- a/tools/testing/selftests/bpf/progs/verifier_jit_inline.c
+++ b/tools/testing/selftests/bpf/progs/verifier_jit_inline.c
@@ -9,7 +9,7 @@ __success __retval(0)
 __arch_x86_64
 __jited("	addq	%gs:{{.*}}, %rax")
 __arch_arm64
-__jited("	mrs	x7, SP_EL0")
+__jited("	mrs	x8, SP_EL0")
 int inline_bpf_get_current_task(void)
 {
 	bpf_get_current_task();
diff --git a/tools/testing/selftests/bpf/progs/verifier_ldsx.c b/tools/testing/selftests/bpf/progs/verifier_ldsx.c
index c8494b682c31..c814e82a7242 100644
--- a/tools/testing/selftests/bpf/progs/verifier_ldsx.c
+++ b/tools/testing/selftests/bpf/progs/verifier_ldsx.c
@@ -274,11 +274,11 @@ __jited("movslq	0x10(%rdi,%r12), %r15")
 __jited("movswq	0x18(%rdi,%r12), %r15")
 __jited("movsbq	0x20(%rdi,%r12), %r15")
 __arch_arm64
-__jited("add	x11, x7, x28")
+__jited("add	x11, x8, x28")
 __jited("ldrsw	x21, [x11, #0x10]")
-__jited("add	x11, x7, x28")
+__jited("add	x11, x8, x28")
 __jited("ldrsh	x21, [x11, #0x18]")
-__jited("add	x11, x7, x28")
+__jited("add	x11, x8, x28")
 __jited("ldrsb	x21, [x11, #0x20]")
 __jited("add	x11, x0, x28")
 __jited("ldrsw	x22, [x11, #0x10]")
diff --git a/tools/testing/selftests/bpf/progs/verifier_private_stack.c b/tools/testing/selftests/bpf/progs/verifier_private_stack.c
index 646e8ef82051..c5078face38d 100644
--- a/tools/testing/selftests/bpf/progs/verifier_private_stack.c
+++ b/tools/testing/selftests/bpf/progs/verifier_private_stack.c
@@ -170,12 +170,12 @@ __jited("	mrs	x10, TPIDR_EL{{[0-1]}}")
 __jited("	add	x27, x27, x10")
 __jited("	add	x25, x27, {{.*}}")
 __jited("	bl	0x{{.*}}")
-__jited("	mov	x7, x0")
+__jited("	mov	x8, x0")
 __jited("	mov	x0, #0x2a")
 __jited("	str	x0, [x27]")
 __jited("	bl	0x{{.*}}")
-__jited("	mov	x7, x0")
-__jited("	mov	x7, #0x0")
+__jited("	mov	x8, x0")
+__jited("	mov	x8, #0x0")
 __jited("	ldp	x25, x27, [sp], {{.*}}")
 __naked void private_stack_callback(void)
 {
@@ -220,7 +220,7 @@ __jited("	mov	x0, #0x2a")
 __jited("	str	x0, [x27]")
 __jited("	mov	x0, #0x0")
 __jited("	bl	0x{{.*}}")
-__jited("	mov	x7, x0")
+__jited("	mov	x8, x0")
 __jited("	ldp	x27, x28, [sp], #0x10")
 int private_stack_exception_main_prog(void)
 {
@@ -258,7 +258,7 @@ __jited("	add	x25, x27, {{.*}}")
 __jited("	mov	x0, #0x2a")
 __jited("	str	x0, [x27]")
 __jited("	bl	0x{{.*}}")
-__jited("	mov	x7, x0")
+__jited("	mov	x8, x0")
 __jited("	ldp	x27, x28, [sp], #0x10")
 int private_stack_exception_sub_prog(void)
 {
-- 
2.52.0



* [PATCH bpf-next 2/3] bpf, arm64: Add JIT support for stack arguments
  2026-04-20 15:35 [PATCH bpf-next 0/3] bpf, arm64: Support stack arguments Puranjay Mohan
  2026-04-20 15:35 ` [PATCH bpf-next 1/3] bpf, arm64: Map BPF_REG_0 to x8 instead of x7 Puranjay Mohan
@ 2026-04-20 15:36 ` Puranjay Mohan
  2026-04-21  2:58   ` Alexei Starovoitov
  2026-04-20 15:36 ` [PATCH bpf-next 3/3] selftests/bpf: Enable stack argument tests for arm64 Puranjay Mohan
  2 siblings, 1 reply; 7+ messages in thread
From: Puranjay Mohan @ 2026-04-20 15:36 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Xu Kuohai,
	Catalin Marinas, Will Deacon, linux-arm-kernel

Implement stack argument passing for BPF-to-BPF and kfunc calls with
more than 5 parameters on arm64, following the AAPCS64 calling
convention.

BPF R1-R5 already map to x0-x4. With BPF_REG_0 moved to x8 by the
previous commit, x5-x7 are free for arguments 6-8. Arguments 9-12
spill onto the stack at [SP+0], [SP+8], ... and the callee reads
them from [FP+16], [FP+24], ... (above the saved FP/LR pair).

The BPF convention uses fixed offsets from BPF_REG_PARAMS (r11): off=-8
is always arg 6, off=-16 is arg 7, and so on. The verifier invalidates
all outgoing stack arg slots after each call, so the program must store
them again before every call. This means x5-x7 never need to be saved on
the stack across calls.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 arch/arm64/net/bpf_jit_comp.c | 87 +++++++++++++++++++++++++++++++++--
 1 file changed, 83 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 085e650662e3..7adf2b0f4610 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -86,6 +86,7 @@ struct jit_ctx {
 	__le32 *image;
 	__le32 *ro_image;
 	u32 stack_size;
+	u16 stack_arg_size;
 	u64 user_vm_start;
 	u64 arena_vm_start;
 	bool fp_used;
@@ -533,13 +534,19 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
 	 *                        |     |
 	 *                        +-----+ <= (BPF_FP - prog->aux->stack_depth)
 	 *                        |RSVD | padding
-	 * current A64_SP =>      +-----+ <= (BPF_FP - ctx->stack_size)
+	 *                        +-----+ <= (BPF_FP - ctx->stack_size)
+	 *                        |     |
+	 *                        | ... | outgoing stack args (9+, if any)
+	 *                        |     |
+	 * current A64_SP =>      +-----+
 	 *                        |     |
 	 *                        | ... | Function call stack
 	 *                        |     |
 	 *                        +-----+
 	 *                          low
 	 *
+	 * Stack args 6-8 are passed in x5-x7, args 9+ at [SP].
+	 * Incoming args 9+ are at [FP + 16], [FP + 24], ...
 	 */
 
 	emit_kcfi(is_main_prog ? cfi_bpf_hash : cfi_bpf_subprog_hash, ctx);
@@ -613,6 +620,9 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
 	if (ctx->stack_size && !ctx->priv_sp_used)
 		emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);
 
+	if (ctx->stack_arg_size)
+		emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_arg_size), ctx);
+
 	if (ctx->arena_vm_start)
 		emit_a64_mov_i64(arena_vm_base, ctx->arena_vm_start, ctx);
 
@@ -673,6 +683,9 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
 	/* Update tail_call_cnt if the slot is populated. */
 	emit(A64_STR64I(tcc, ptr, 0), ctx);
 
+	if (ctx->stack_arg_size)
+		emit(A64_ADD_I(1, A64_SP, A64_SP, ctx->stack_arg_size), ctx);
+
 	/* restore SP */
 	if (ctx->stack_size && !ctx->priv_sp_used)
 		emit(A64_ADD_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);
@@ -1034,6 +1047,9 @@ static void build_epilogue(struct jit_ctx *ctx, bool was_classic)
 	const u8 r0 = bpf2a64[BPF_REG_0];
 	const u8 ptr = bpf2a64[TCCNT_PTR];
 
+	if (ctx->stack_arg_size)
+		emit(A64_ADD_I(1, A64_SP, A64_SP, ctx->stack_arg_size), ctx);
+
 	/* We're done with BPF stack */
 	if (ctx->stack_size && !ctx->priv_sp_used)
 		emit(A64_ADD_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);
@@ -1191,6 +1207,41 @@ static int add_exception_handler(const struct bpf_insn *insn,
 	return 0;
 }
 
+static const u8 stack_arg_reg[] = { A64_R(5), A64_R(6), A64_R(7) };
+
+#define NR_STACK_ARG_REGS	ARRAY_SIZE(stack_arg_reg)
+
+static void emit_stack_arg_load(u8 dst, s16 bpf_off, struct jit_ctx *ctx)
+{
+	int idx = bpf_off / sizeof(u64) - 1;
+
+	if (idx < NR_STACK_ARG_REGS)
+		emit(A64_MOV(1, dst, stack_arg_reg[idx]), ctx);
+	else
+		emit(A64_LDR64I(dst, A64_FP, (idx - NR_STACK_ARG_REGS) * sizeof(u64) + 16), ctx);
+}
+
+static void emit_stack_arg_store(u8 src_a64, s16 bpf_off, struct jit_ctx *ctx)
+{
+	int idx = -bpf_off / sizeof(u64) - 1;
+
+	if (idx < NR_STACK_ARG_REGS)
+		emit(A64_MOV(1, stack_arg_reg[idx], src_a64), ctx);
+	else
+		emit(A64_STR64I(src_a64, A64_SP, (idx - NR_STACK_ARG_REGS) * sizeof(u64)), ctx);
+}
+
+static void emit_stack_arg_store_imm(s32 imm, s16 bpf_off, const u8 tmp, struct jit_ctx *ctx)
+{
+	int idx = -bpf_off / sizeof(u64) - 1;
+
+	emit_a64_mov_i(1, tmp, imm, ctx);
+	if (idx < NR_STACK_ARG_REGS)
+		emit(A64_MOV(1, stack_arg_reg[idx], tmp), ctx);
+	else
+		emit(A64_STR64I(tmp, A64_SP, (idx - NR_STACK_ARG_REGS) * sizeof(u64)), ctx);
+}
+
 /* JITs an eBPF instruction.
  * Returns:
  * 0  - successfully JITed an 8-byte eBPF instruction.
@@ -1646,6 +1697,11 @@ static int build_insn(const struct bpf_verifier_env *env, const struct bpf_insn
 	case BPF_LDX | BPF_MEM | BPF_H:
 	case BPF_LDX | BPF_MEM | BPF_B:
 	case BPF_LDX | BPF_MEM | BPF_DW:
+		if (insn->src_reg == BPF_REG_PARAMS) {
+			emit_stack_arg_load(dst, off, ctx);
+			break;
+		}
+		fallthrough;
 	case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
 	case BPF_LDX | BPF_PROBE_MEM | BPF_W:
 	case BPF_LDX | BPF_PROBE_MEM | BPF_H:
@@ -1671,7 +1727,7 @@ static int build_insn(const struct bpf_verifier_env *env, const struct bpf_insn
 		}
 		if (src == fp) {
 			src_adj = ctx->priv_sp_used ? priv_sp : A64_SP;
-			off_adj = off + ctx->stack_size;
+			off_adj = off + ctx->stack_size + ctx->stack_arg_size;
 		} else {
 			src_adj = src;
 			off_adj = off;
@@ -1752,6 +1808,11 @@ static int build_insn(const struct bpf_verifier_env *env, const struct bpf_insn
 	case BPF_ST | BPF_MEM | BPF_H:
 	case BPF_ST | BPF_MEM | BPF_B:
 	case BPF_ST | BPF_MEM | BPF_DW:
+		if (insn->dst_reg == BPF_REG_PARAMS) {
+			emit_stack_arg_store_imm(imm, off, tmp, ctx);
+			break;
+		}
+		fallthrough;
 	case BPF_ST | BPF_PROBE_MEM32 | BPF_B:
 	case BPF_ST | BPF_PROBE_MEM32 | BPF_H:
 	case BPF_ST | BPF_PROBE_MEM32 | BPF_W:
@@ -1762,7 +1823,7 @@ static int build_insn(const struct bpf_verifier_env *env, const struct bpf_insn
 		}
 		if (dst == fp) {
 			dst_adj = ctx->priv_sp_used ? priv_sp : A64_SP;
-			off_adj = off + ctx->stack_size;
+			off_adj = off + ctx->stack_size + ctx->stack_arg_size;
 		} else {
 			dst_adj = dst;
 			off_adj = off;
@@ -1814,6 +1875,11 @@ static int build_insn(const struct bpf_verifier_env *env, const struct bpf_insn
 	case BPF_STX | BPF_MEM | BPF_H:
 	case BPF_STX | BPF_MEM | BPF_B:
 	case BPF_STX | BPF_MEM | BPF_DW:
+		if (insn->dst_reg == BPF_REG_PARAMS) {
+			emit_stack_arg_store(src, off, ctx);
+			break;
+		}
+		fallthrough;
 	case BPF_STX | BPF_PROBE_MEM32 | BPF_B:
 	case BPF_STX | BPF_PROBE_MEM32 | BPF_H:
 	case BPF_STX | BPF_PROBE_MEM32 | BPF_W:
@@ -1824,7 +1890,7 @@ static int build_insn(const struct bpf_verifier_env *env, const struct bpf_insn
 		}
 		if (dst == fp) {
 			dst_adj = ctx->priv_sp_used ? priv_sp : A64_SP;
-			off_adj = off + ctx->stack_size;
+			off_adj = off + ctx->stack_size + ctx->stack_arg_size;
 		} else {
 			dst_adj = dst;
 			off_adj = off;
@@ -2065,6 +2131,14 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
 	ctx.user_vm_start = bpf_arena_get_user_vm_start(prog->aux->arena);
 	ctx.arena_vm_start = bpf_arena_get_kern_vm_start(prog->aux->arena);
 
+	if (prog->aux->stack_arg_depth > prog->aux->incoming_stack_arg_depth) {
+		u16 outgoing = prog->aux->stack_arg_depth - prog->aux->incoming_stack_arg_depth;
+		int nr_on_stack = outgoing / sizeof(u64) - NR_STACK_ARG_REGS;
+
+		if (nr_on_stack > 0)
+			ctx.stack_arg_size = round_up(nr_on_stack * sizeof(u64), 16);
+	}
+
 	if (priv_stack_ptr)
 		ctx.priv_sp_used = true;
 
@@ -2229,6 +2303,11 @@ bool bpf_jit_supports_kfunc_call(void)
 	return true;
 }
 
+bool bpf_jit_supports_stack_args(void)
+{
+	return true;
+}
+
 void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 {
 	if (!aarch64_insn_copy(dst, src, len))
-- 
2.52.0



* [PATCH bpf-next 3/3] selftests/bpf: Enable stack argument tests for arm64
  2026-04-20 15:35 [PATCH bpf-next 0/3] bpf, arm64: Support stack arguments Puranjay Mohan
  2026-04-20 15:35 ` [PATCH bpf-next 1/3] bpf, arm64: Map BPF_REG_0 to x8 instead of x7 Puranjay Mohan
  2026-04-20 15:36 ` [PATCH bpf-next 2/3] bpf, arm64: Add JIT support for stack arguments Puranjay Mohan
@ 2026-04-20 15:36 ` Puranjay Mohan
  2 siblings, 0 replies; 7+ messages in thread
From: Puranjay Mohan @ 2026-04-20 15:36 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Xu Kuohai,
	Catalin Marinas, Will Deacon, linux-arm-kernel

Now that arm64 supports stack arguments, enable the existing stack_arg,
stack_arg_kfunc and verifier_stack_arg tests for __TARGET_ARCH_arm64.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 tools/testing/selftests/bpf/progs/stack_arg.c     |  3 ++-
 .../testing/selftests/bpf/progs/stack_arg_kfunc.c |  3 ++-
 .../selftests/bpf/progs/verifier_stack_arg.c      | 15 ++++++++++++++-
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/stack_arg.c b/tools/testing/selftests/bpf/progs/stack_arg.c
index 8c198ee952ff..b1276009fd30 100644
--- a/tools/testing/selftests/bpf/progs/stack_arg.c
+++ b/tools/testing/selftests/bpf/progs/stack_arg.c
@@ -23,7 +23,8 @@ struct {
 
 int timer_result;
 
-#if defined(__TARGET_ARCH_x86) && defined(__BPF_FEATURE_STACK_ARGUMENT)
+#if (defined(__TARGET_ARCH_x86) || defined(__TARGET_ARCH_arm64)) && \
+	defined(__BPF_FEATURE_STACK_ARGUMENT)
 
 const volatile bool has_stack_arg = true;
 
diff --git a/tools/testing/selftests/bpf/progs/stack_arg_kfunc.c b/tools/testing/selftests/bpf/progs/stack_arg_kfunc.c
index 6cc404d57863..3818cd0cb67b 100644
--- a/tools/testing/selftests/bpf/progs/stack_arg_kfunc.c
+++ b/tools/testing/selftests/bpf/progs/stack_arg_kfunc.c
@@ -6,7 +6,8 @@
 #include "bpf_kfuncs.h"
 #include "../test_kmods/bpf_testmod_kfunc.h"
 
-#if defined(__TARGET_ARCH_x86) && defined(__BPF_FEATURE_STACK_ARGUMENT)
+#if (defined(__TARGET_ARCH_x86) || defined(__TARGET_ARCH_arm64)) && \
+	defined(__BPF_FEATURE_STACK_ARGUMENT)
 
 const volatile bool has_stack_arg = true;
 
diff --git a/tools/testing/selftests/bpf/progs/verifier_stack_arg.c b/tools/testing/selftests/bpf/progs/verifier_stack_arg.c
index 66dd11840a63..8f1eef911f70 100644
--- a/tools/testing/selftests/bpf/progs/verifier_stack_arg.c
+++ b/tools/testing/selftests/bpf/progs/verifier_stack_arg.c
@@ -12,7 +12,8 @@ struct {
 	__type(value, long long);
 } map_hash_8b SEC(".maps");
 
-#if defined(__TARGET_ARCH_x86) && defined(__BPF_FEATURE_STACK_ARGUMENT)
+#if (defined(__TARGET_ARCH_x86) || defined(__TARGET_ARCH_arm64)) && \
+	defined(__BPF_FEATURE_STACK_ARGUMENT)
 
 __noinline __used
 static int subprog_6args(int a, int b, int c, int d, int e, int f)
@@ -36,6 +37,7 @@ SEC("tc")
 __description("stack_arg: subprog with 6 args")
 __success
 __arch_x86_64
+__arch_arm64
 __naked void stack_arg_6args(void)
 {
 	asm volatile (
@@ -55,6 +57,7 @@ SEC("tc")
 __description("stack_arg: two subprogs with >5 args")
 __success
 __arch_x86_64
+__arch_arm64
 __naked void stack_arg_two_subprogs(void)
 {
 	asm volatile (
@@ -84,6 +87,7 @@ SEC("tc")
 __description("stack_arg: read from uninitialized stack arg slot")
 __failure
 __arch_x86_64
+__arch_arm64
 __msg("invalid read from stack arg off 8 depth 0")
 __naked void stack_arg_read_uninitialized(void)
 {
@@ -99,6 +103,7 @@ SEC("tc")
 __description("stack_arg: gap at offset -8, only wrote -16")
 __failure
 __arch_x86_64
+__arch_arm64
 __msg("stack arg#6 not properly initialized")
 __naked void stack_arg_gap_at_minus8(void)
 {
@@ -120,6 +125,7 @@ __description("stack_arg: pruning with different stack arg types")
 __failure
 __flag(BPF_F_TEST_STATE_FREQ)
 __arch_x86_64
+__arch_arm64
 __msg("R1 invalid mem access 'scalar'")
 __naked void stack_arg_pruning_type_mismatch(void)
 {
@@ -157,6 +163,7 @@ SEC("tc")
 __description("stack_arg: release_reference invalidates stack arg slot")
 __failure
 __arch_x86_64
+__arch_arm64
 __msg("R1 invalid mem access 'scalar'")
 __naked void stack_arg_release_ref(void)
 {
@@ -207,6 +214,7 @@ SEC("tc")
 __description("stack_arg: pkt pointer in stack arg slot invalidated after pull_data")
 __failure
 __arch_x86_64
+__arch_arm64
 __msg("R1 invalid mem access 'scalar'")
 __naked void stack_arg_stale_pkt_ptr(void)
 {
@@ -246,6 +254,7 @@ SEC("tc")
 __description("stack_arg: null propagation rejects deref on null branch")
 __failure
 __arch_x86_64
+__arch_arm64
 __msg("R1 invalid mem access 'scalar'")
 __naked void stack_arg_null_propagation_fail(void)
 {
@@ -285,6 +294,7 @@ SEC("tc")
 __description("stack_arg: missing store on one branch")
 __failure
 __arch_x86_64
+__arch_arm64
 __msg("stack arg#6 not properly initialized")
 __naked void stack_arg_missing_store_one_branch(void)
 {
@@ -327,6 +337,7 @@ SEC("tc")
 __description("stack_arg: share a store for both branches")
 __success __retval(0)
 __arch_x86_64
+__arch_arm64
 __naked void stack_arg_shared_store(void)
 {
 	asm volatile (
@@ -369,6 +380,7 @@ SEC("tc")
 __description("stack_arg: write beyond max outgoing depth")
 __failure
 __arch_x86_64
+__arch_arm64
 __msg("stack arg write offset -80 exceeds max 7 stack args")
 __naked void stack_arg_write_beyond_max(void)
 {
@@ -393,6 +405,7 @@ SEC("tc")
 __description("stack_arg: sequential calls reuse slots")
 __failure
 __arch_x86_64
+__arch_arm64
 __msg("stack arg#6 not properly initialized")
 __naked void stack_arg_sequential_calls(void)
 {
-- 
2.52.0



* Re: [PATCH bpf-next 2/3] bpf, arm64: Add JIT support for stack arguments
  2026-04-20 15:36 ` [PATCH bpf-next 2/3] bpf, arm64: Add JIT support for stack arguments Puranjay Mohan
@ 2026-04-21  2:58   ` Alexei Starovoitov
  2026-04-21 11:53     ` Puranjay Mohan
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2026-04-21  2:58 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Xu Kuohai, Catalin Marinas, Will Deacon,
	linux-arm-kernel

On Mon, Apr 20, 2026 at 8:36 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>

nice and clean. I like how it maps to arm64 calling convention.

> +       if (prog->aux->stack_arg_depth > prog->aux->incoming_stack_arg_depth) {
> +               u16 outgoing = prog->aux->stack_arg_depth - prog->aux->incoming_stack_arg_depth;
> +               int nr_on_stack = outgoing / sizeof(u64) - NR_STACK_ARG_REGS;
> +
> +               if (nr_on_stack > 0)
> +                       ctx.stack_arg_size = round_up(nr_on_stack * sizeof(u64), 16);
> +       }

I'm struggling to understand this part.
Why do this when this func calls more than what callee passed in?
Looks fishy. I'd like to see selftests with more than 6,7,8 args.
Because only then this logic will kick in?


* Re: [PATCH bpf-next 2/3] bpf, arm64: Add JIT support for stack arguments
  2026-04-21  2:58   ` Alexei Starovoitov
@ 2026-04-21 11:53     ` Puranjay Mohan
  2026-04-21 13:53       ` Alexei Starovoitov
  0 siblings, 1 reply; 7+ messages in thread
From: Puranjay Mohan @ 2026-04-21 11:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Xu Kuohai, Catalin Marinas, Will Deacon,
	linux-arm-kernel

On Tue, Apr 21, 2026 at 3:58 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Apr 20, 2026 at 8:36 AM Puranjay Mohan <puranjay@kernel.org> wrote:
> >
>
> nice and clean. I like how it maps to arm64 calling convention.
>
> > +       if (prog->aux->stack_arg_depth > prog->aux->incoming_stack_arg_depth) {
> > +               u16 outgoing = prog->aux->stack_arg_depth - prog->aux->incoming_stack_arg_depth;
> > +               int nr_on_stack = outgoing / sizeof(u64) - NR_STACK_ARG_REGS;
> > +
> > +               if (nr_on_stack > 0)
> > +                       ctx.stack_arg_size = round_up(nr_on_stack * sizeof(u64), 16);
> > +       }
>
> I'm struggling to understand this part.
> Why do this when this func calls more than what callee passed in?
> Looks fishy. I'd like to see selftests with more than 6,7,8 args.
> Because only then this logic will kick in?

Your confusion stems from the naming of "incoming_stack_arg_depth" and
"stack_arg_depth" (the latter should be called total_stack_arg_depth,
in my opinion).

So, if you see fixups.c:

	func[i]->aux->incoming_stack_arg_depth =
		env->subprog_info[i].incoming_stack_arg_depth;
	func[i]->aux->stack_arg_depth =
		env->subprog_info[i].incoming_stack_arg_depth +
		env->subprog_info[i].outgoing_stack_arg_depth;

prog->aux->stack_arg_depth doesn't store only the outgoing stack depth;
it holds the sum of both incoming and outgoing. That means if a func
receives more than five arguments but doesn't call any function with
more than 5 arguments, incoming_stack_arg_depth will be equal to
stack_arg_depth.

if (prog->aux->stack_arg_depth > prog->aux->incoming_stack_arg_depth)

This check asks: "Does this function call any function with more than 5
arguments?" If yes, and any of those calls passes more than 8, we
allocate stack space; otherwise no stack space is needed because
arguments 6, 7 and 8 can live in arm64 registers.

I hope this clears the confusion.

Thanks,
Puranjay


* Re: [PATCH bpf-next 2/3] bpf, arm64: Add JIT support for stack arguments
  2026-04-21 11:53     ` Puranjay Mohan
@ 2026-04-21 13:53       ` Alexei Starovoitov
  0 siblings, 0 replies; 7+ messages in thread
From: Alexei Starovoitov @ 2026-04-21 13:53 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Xu Kuohai, Catalin Marinas, Will Deacon,
	linux-arm-kernel

On Tue, Apr 21, 2026 at 4:53 AM Puranjay Mohan <puranjay12@gmail.com> wrote:
>
> On Tue, Apr 21, 2026 at 3:58 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Apr 20, 2026 at 8:36 AM Puranjay Mohan <puranjay@kernel.org> wrote:
> > >
> >
> > nice and clean. I like how it maps to arm64 calling convention.
> >
> > > +       if (prog->aux->stack_arg_depth > prog->aux->incoming_stack_arg_depth) {
> > > +               u16 outgoing = prog->aux->stack_arg_depth - prog->aux->incoming_stack_arg_depth;
> > > +               int nr_on_stack = outgoing / sizeof(u64) - NR_STACK_ARG_REGS;
> > > +
> > > +               if (nr_on_stack > 0)
> > > +                       ctx.stack_arg_size = round_up(nr_on_stack * sizeof(u64), 16);
> > > +       }
> >
> > I'm struggling to understand this part.
> > Why do this when this func calls more than what callee passed in?
> > Looks fishy. I'd like to see selftests with more than 6,7,8 args.
> > Because only then this logic will kick in?
>
> Your confusion stems from the naming of "incoming_stack_arg_depth" and
> "stack_arg_depth" (this should be called total_stack_arg_depth in my
> opinion)
>
> So, if you see fixups.c
>
>                 func[i]->aux->incoming_stack_arg_depth =
> env->subprog_info[i].incoming_stack_arg_depth;
>                 func[i]->aux->stack_arg_depth =
> env->subprog_info[i].incoming_stack_arg_depth +
>
> env->subprog_info[i].outgoing_stack_arg_depth;
>
> prog->aux->stack_arg_depth doesn't store outgoing stack depth, rather
> it has the sum of both incoming and outgoing, that means if a func
> doesn't call any function with more than 5 arguments but receives more
> than five arguments, incoming_stack_arg_depth will be equal to
> stack_arg_depth.

Ohh. That's indeed all too confusing.
See my response to Yonghong.
I think stack_arg_depth should mean outgoing
and incoming_stack_arg_depth should mean incoming only and
it shouldn't be even used by JIT.
That memory was allocated by caller, so to JIT this callee
the conversion of r11+const is straightforward and no checks necessary.

> if (prog->aux->stack_arg_depth > prog->aux->incoming_stack_arg_depth)
>
> This check is for - "Does this function call any function with more
> than 5 arguments", if yes, is it more than 8? if yes allocate stack
> space, otherwise stack space is not needed because argument 6,7,8 can
> live in arm64 registers.

I think it should really be one check based on stack_arg_depth.

