linux-kselftest.vger.kernel.org archive mirror
* [PATCH bpf-next v2 0/2] bpf, arm64: support up to 12 arguments
@ 2025-05-22 10:14 Alexis Lothoré
  2025-05-22 10:14 ` [PATCH bpf-next v2 1/2] bpf, arm64: Support up to 12 function arguments Alexis Lothoré
  2025-05-22 10:14 ` [PATCH bpf-next v2 2/2] selftests/bpf: enable many-args tests for arm64 Alexis Lothoré (eBPF Foundation)
  0 siblings, 2 replies; 7+ messages in thread
From: Alexis Lothoré @ 2025-05-22 10:14 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
	Mykola Lysenko, Shuah Khan, Maxime Coquelin, Alexandre Torgue,
	Florent Revest
  Cc: Bastien Curutchet, ebpf, Thomas Petazzoni, bpf, linux-kernel,
	linux-arm-kernel, linux-kselftest, linux-stm32,
	Alexis Lothoré (eBPF Foundation), Xu Kuohai

Hello,

this is the v2 of the many-args series for arm64, itself a revival of
Xu Kuohai's work to enable a larger argument count for BPF programs on
ARM64 ([1]).

The discussions in v1 shed some light on issues around specific cases,
for example functions passing structs on the stack with custom
packing/alignment attributes: those cases cannot be properly detected
with the current BTF info. So this new revision narrows the scope with a
simpler implementation: additional arguments are accepted on the stack
only when their alignment constraints can be guaranteed, and attachment
to functions passing structs on the stack is refused. I then checked
whether the specific alignment corner cases could be triggered with
larger scalar types rather than structs, but it appears that this use
case is rejected at the verifier level (see a9b59159d338 ("bpf:
Do not allow btf_ctx_access with __int128 types")). So in the end the
specific alignment corner cases raised in [1] cannot really happen in
the kernel in its current state. This new revision still brings support
for the standard cases as a first step; it will then be possible to
iterate on top of it to add the more specific cases, such as structs
passed on the stack and larger scalar types.

[1] https://lore.kernel.org/all/20230917150752.69612-1-xukuohai@huaweicloud.com/#t

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
---
Changes in v2:
- remove alignment computation from btf.c
- deduce alignment constraints directly in jit compiler for simple types
- deny attachment to functions with "corner-case" arguments (i.e.
  structs on stack)
- remove custom tests, as the corresponding use cases are rejected
  either by the JIT compiler or the verifier
- drop RFC
- Link to v1: https://lore.kernel.org/r/20250411-many_args_arm64-v1-0-0a32fe72339e@bootlin.com

---
Alexis Lothoré (eBPF Foundation) (1):
      selftests/bpf: enable many-args tests for arm64

Xu Kuohai (1):
      bpf, arm64: Support up to 12 function arguments

 arch/arm64/net/bpf_jit_comp.c                | 234 ++++++++++++++++++++-------
 tools/testing/selftests/bpf/DENYLIST.aarch64 |   2 -
 2 files changed, 180 insertions(+), 56 deletions(-)
---
base-commit: 9435138c069117cd59a4912b5ea2ae44cc2c5ffa
change-id: 20250220-many_args_arm64-8bd3747e6948

Best regards,
-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH bpf-next v2 1/2] bpf, arm64: Support up to 12 function arguments
  2025-05-22 10:14 [PATCH bpf-next v2 0/2] bpf, arm64: support up to 12 arguments Alexis Lothoré
@ 2025-05-22 10:14 ` Alexis Lothoré
  2025-05-27  8:11   ` Xu Kuohai
  2025-05-22 10:14 ` [PATCH bpf-next v2 2/2] selftests/bpf: enable many-args tests for arm64 Alexis Lothoré (eBPF Foundation)
  1 sibling, 1 reply; 7+ messages in thread
From: Alexis Lothoré @ 2025-05-22 10:14 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
	Mykola Lysenko, Shuah Khan, Maxime Coquelin, Alexandre Torgue,
	Florent Revest
  Cc: Bastien Curutchet, ebpf, Thomas Petazzoni, bpf, linux-kernel,
	linux-arm-kernel, linux-kselftest, linux-stm32,
	Alexis Lothoré (eBPF Foundation), Xu Kuohai

From: Xu Kuohai <xukuohai@huawei.com>

Currently the ARM64 bpf trampoline supports up to 8 function arguments.
According to the statistics from commit
473e3150e30a ("bpf, x86: allow function arguments up to 12 for TRACING"),
about 200 functions accept 9 to 12 arguments, so add support for up to
12 function arguments.

Since bpf only supports function arguments up to 16 bytes, following
AAPCS64, starting from the first argument, each argument is first
assigned to the smallest set of 1 or 2 registers from x0-x7; if there
are not enough registers left to hold the entire argument, then that
argument and all the remaining ones are passed on the stack.
There are some non-trivial cases for which it is not possible to
correctly read arguments from/write arguments to the stack: for example,
struct variables may have custom packing/alignment attributes that are
invisible in BTF info. Such cases are denied for now to avoid reading
incorrect values.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Co-developed-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
---
Changes in v2:
- refuse attachment to functions passing structs on stack
- use simpler alignment rules for args passed on stack, assuming that
  exotic types are denied by either the verifier or the trampoline
  generation code
---
 arch/arm64/net/bpf_jit_comp.c | 234 ++++++++++++++++++++++++++++++++----------
 1 file changed, 180 insertions(+), 54 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 70d7c89d3ac907798e86e0051e7b472c252c1412..8c735bc522e439a4b2e3111fc28b9575a64cdb3a 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -2064,7 +2064,7 @@ bool bpf_jit_supports_subprog_tailcalls(void)
 }
 
 static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
-			    int args_off, int retval_off, int run_ctx_off,
+			    int bargs_off, int retval_off, int run_ctx_off,
 			    bool save_ret)
 {
 	__le32 *branch;
@@ -2106,7 +2106,7 @@ static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
 	branch = ctx->image + ctx->idx;
 	emit(A64_NOP, ctx);
 
-	emit(A64_ADD_I(1, A64_R(0), A64_SP, args_off), ctx);
+	emit(A64_ADD_I(1, A64_R(0), A64_SP, bargs_off), ctx);
 	if (!p->jited)
 		emit_addr_mov_i64(A64_R(1), (const u64)p->insnsi, ctx);
 
@@ -2131,7 +2131,7 @@ static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
 }
 
 static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
-			       int args_off, int retval_off, int run_ctx_off,
+			       int bargs_off, int retval_off, int run_ctx_off,
 			       __le32 **branches)
 {
 	int i;
@@ -2141,7 +2141,7 @@ static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
 	 */
 	emit(A64_STR64I(A64_ZR, A64_SP, retval_off), ctx);
 	for (i = 0; i < tl->nr_links; i++) {
-		invoke_bpf_prog(ctx, tl->links[i], args_off, retval_off,
+		invoke_bpf_prog(ctx, tl->links[i], bargs_off, retval_off,
 				run_ctx_off, true);
 		/* if (*(u64 *)(sp + retval_off) !=  0)
 		 *	goto do_fexit;
@@ -2155,23 +2155,134 @@ static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
 	}
 }
 
-static void save_args(struct jit_ctx *ctx, int args_off, int nregs)
+struct arg_aux {
+	/* how many args are passed through registers, the rest of the args are
+	 * passed through stack
+	 */
+	int args_in_regs;
+	/* how many registers are used to pass arguments */
+	int regs_for_args;
+	/* how much stack is used for additional args passed to bpf program
+	 * that did not fit in original function registers
+	 **/
+	int bstack_for_args;
+	/* home much stack is used for additional args passed to the
+	 * original function when called from trampoline (this one needs
+	 * arguments to be properly aligned)
+	 */
+	int ostack_for_args;
+};
+
+static int calc_arg_aux(const struct btf_func_model *m,
+			 struct arg_aux *a)
 {
-	int i;
+	int stack_slots, nregs, slots, i;
+
+	/* verifier ensures m->nr_args <= MAX_BPF_FUNC_ARGS */
+	for (i = 0, nregs = 0; i < m->nr_args; i++) {
+		slots = (m->arg_size[i] + 7) / 8;
+		if (nregs + slots <= 8) /* passed through register ? */
+			nregs += slots;
+		else
+			break;
+	}
+
+	a->args_in_regs = i;
+	a->regs_for_args = nregs;
+	a->ostack_for_args = 0;
+
+	/* the rest arguments are passed through stack */
+	for (a->ostack_for_args = 0, a->bstack_for_args = 0;
+	     i < m->nr_args; i++) {
+		/* We can not know for sure about exact alignment needs for
+		 * struct passed on stack, so deny those
+		 */
+		if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
+			return -EOPNOTSUPP;
+		stack_slots = (m->arg_size[i] + 7) / 8;
+		/* AAPCS 64 C.14: arguments passed on stack must be aligned to
+		 * max(8, arg_natural_alignment)
+		 */
+		a->bstack_for_args += stack_slots * 8;
+		a->ostack_for_args = round_up(a->ostack_for_args + stack_slots * 8, 8);
+	}
 
-	for (i = 0; i < nregs; i++) {
-		emit(A64_STR64I(i, A64_SP, args_off), ctx);
-		args_off += 8;
+	return 0;
+}
+
+static void clear_garbage(struct jit_ctx *ctx, int reg, int effective_bytes)
+{
+	if (effective_bytes) {
+		int garbage_bits = 64 - 8 * effective_bytes;
+#ifdef CONFIG_CPU_BIG_ENDIAN
+		/* garbage bits are at the right end */
+		emit(A64_LSR(1, reg, reg, garbage_bits), ctx);
+		emit(A64_LSL(1, reg, reg, garbage_bits), ctx);
+#else
+		/* garbage bits are at the left end */
+		emit(A64_LSL(1, reg, reg, garbage_bits), ctx);
+		emit(A64_LSR(1, reg, reg, garbage_bits), ctx);
+#endif
 	}
 }
 
-static void restore_args(struct jit_ctx *ctx, int args_off, int nregs)
+static void save_args(struct jit_ctx *ctx, int bargs_off, int oargs_off,
+		      const struct btf_func_model *m,
+		      const struct arg_aux *a,
+		      bool for_call_origin)
 {
 	int i;
+	int reg;
+	int doff;
+	int soff;
+	int slots;
+	u8 tmp = bpf2a64[TMP_REG_1];
+
+	/* store arguments to the stack for the bpf program, or restore
+	 * arguments from stack for the original function
+	 */
+	for (reg = 0; reg < a->regs_for_args; reg++) {
+		emit(for_call_origin ?
+		     A64_LDR64I(reg, A64_SP, bargs_off) :
+		     A64_STR64I(reg, A64_SP, bargs_off),
+		     ctx);
+		bargs_off += 8;
+	}
+
+	soff = 32; /* on stack arguments start from FP + 32 */
+	doff = (for_call_origin ? oargs_off : bargs_off);
+
+	/* save on stack arguments */
+	for (i = a->args_in_regs; i < m->nr_args; i++) {
+		slots = (m->arg_size[i] + 7) / 8;
+		/* AAPCS C.14: additional arguments on stack must be
+		 * aligned on max(8, arg_natural_alignment)
+		 */
+		soff = round_up(soff, 8);
+		if (for_call_origin)
+			doff =  round_up(doff, 8);
+		/* verifier ensures arg_size <= 16, so slots equals 1 or 2 */
+		while (slots-- > 0) {
+			emit(A64_LDR64I(tmp, A64_FP, soff), ctx);
+			/* if there is unused space in the last slot, clear
+			 * the garbage contained in the space.
+			 */
+			if (slots == 0 && !for_call_origin)
+				clear_garbage(ctx, tmp, m->arg_size[i] % 8);
+			emit(A64_STR64I(tmp, A64_SP, doff), ctx);
+			soff += 8;
+			doff += 8;
+		}
+	}
+}
+
+static void restore_args(struct jit_ctx *ctx, int bargs_off, int nregs)
+{
+	int reg;
 
-	for (i = 0; i < nregs; i++) {
-		emit(A64_LDR64I(i, A64_SP, args_off), ctx);
-		args_off += 8;
+	for (reg = 0; reg < nregs; reg++) {
+		emit(A64_LDR64I(reg, A64_SP, bargs_off), ctx);
+		bargs_off += 8;
 	}
 }
 
@@ -2194,17 +2305,21 @@ static bool is_struct_ops_tramp(const struct bpf_tramp_links *fentry_links)
  */
 static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
 			      struct bpf_tramp_links *tlinks, void *func_addr,
-			      int nregs, u32 flags)
+			      const struct btf_func_model *m,
+			      const struct arg_aux *a,
+			      u32 flags)
 {
 	int i;
 	int stack_size;
 	int retaddr_off;
 	int regs_off;
 	int retval_off;
-	int args_off;
-	int nregs_off;
+	int bargs_off;
+	int nfuncargs_off;
 	int ip_off;
 	int run_ctx_off;
+	int oargs_off;
+	int nfuncargs;
 	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
 	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
 	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
@@ -2213,31 +2328,38 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
 	bool is_struct_ops = is_struct_ops_tramp(fentry);
 
 	/* trampoline stack layout:
-	 *                  [ parent ip         ]
-	 *                  [ FP                ]
-	 * SP + retaddr_off [ self ip           ]
-	 *                  [ FP                ]
+	 *                    [ parent ip         ]
+	 *                    [ FP                ]
+	 * SP + retaddr_off   [ self ip           ]
+	 *                    [ FP                ]
 	 *
-	 *                  [ padding           ] align SP to multiples of 16
+	 *                    [ padding           ] align SP to multiples of 16
 	 *
-	 *                  [ x20               ] callee saved reg x20
-	 * SP + regs_off    [ x19               ] callee saved reg x19
+	 *                    [ x20               ] callee saved reg x20
+	 * SP + regs_off      [ x19               ] callee saved reg x19
 	 *
-	 * SP + retval_off  [ return value      ] BPF_TRAMP_F_CALL_ORIG or
-	 *                                        BPF_TRAMP_F_RET_FENTRY_RET
+	 * SP + retval_off    [ return value      ] BPF_TRAMP_F_CALL_ORIG or
+	 *                                          BPF_TRAMP_F_RET_FENTRY_RET
+	 *                    [ arg reg N         ]
+	 *                    [ ...               ]
+	 * SP + bargs_off     [ arg reg 1         ] for bpf
 	 *
-	 *                  [ arg reg N         ]
-	 *                  [ ...               ]
-	 * SP + args_off    [ arg reg 1         ]
+	 * SP + nfuncargs_off [ arg regs count    ]
 	 *
-	 * SP + nregs_off   [ arg regs count    ]
+	 * SP + ip_off        [ traced function   ] BPF_TRAMP_F_IP_ARG flag
 	 *
-	 * SP + ip_off      [ traced function   ] BPF_TRAMP_F_IP_ARG flag
+	 * SP + run_ctx_off   [ bpf_tramp_run_ctx ]
 	 *
-	 * SP + run_ctx_off [ bpf_tramp_run_ctx ]
+	 *                    [ stack arg N       ]
+	 *                    [ ...               ]
+	 * SP + oargs_off     [ stack arg 1       ] for original func
 	 */
 
 	stack_size = 0;
+	oargs_off = stack_size;
+	if (flags & BPF_TRAMP_F_CALL_ORIG)
+		stack_size +=  a->ostack_for_args;
+
 	run_ctx_off = stack_size;
 	/* room for bpf_tramp_run_ctx */
 	stack_size += round_up(sizeof(struct bpf_tramp_run_ctx), 8);
@@ -2247,13 +2369,14 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
 	if (flags & BPF_TRAMP_F_IP_ARG)
 		stack_size += 8;
 
-	nregs_off = stack_size;
+	nfuncargs_off = stack_size;
 	/* room for args count */
 	stack_size += 8;
 
-	args_off = stack_size;
+	bargs_off = stack_size;
 	/* room for args */
-	stack_size += nregs * 8;
+	nfuncargs = a->regs_for_args + a->bstack_for_args / 8;
+	stack_size += 8 * nfuncargs;
 
 	/* room for return value */
 	retval_off = stack_size;
@@ -2300,11 +2423,11 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
 	}
 
 	/* save arg regs count*/
-	emit(A64_MOVZ(1, A64_R(10), nregs, 0), ctx);
-	emit(A64_STR64I(A64_R(10), A64_SP, nregs_off), ctx);
+	emit(A64_MOVZ(1, A64_R(10), nfuncargs, 0), ctx);
+	emit(A64_STR64I(A64_R(10), A64_SP, nfuncargs_off), ctx);
 
-	/* save arg regs */
-	save_args(ctx, args_off, nregs);
+	/* save args for bpf */
+	save_args(ctx, bargs_off, oargs_off, m, a, false);
 
 	/* save callee saved registers */
 	emit(A64_STR64I(A64_R(19), A64_SP, regs_off), ctx);
@@ -2320,7 +2443,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
 	}
 
 	for (i = 0; i < fentry->nr_links; i++)
-		invoke_bpf_prog(ctx, fentry->links[i], args_off,
+		invoke_bpf_prog(ctx, fentry->links[i], bargs_off,
 				retval_off, run_ctx_off,
 				flags & BPF_TRAMP_F_RET_FENTRY_RET);
 
@@ -2330,12 +2453,13 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
 		if (!branches)
 			return -ENOMEM;
 
-		invoke_bpf_mod_ret(ctx, fmod_ret, args_off, retval_off,
+		invoke_bpf_mod_ret(ctx, fmod_ret, bargs_off, retval_off,
 				   run_ctx_off, branches);
 	}
 
 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
-		restore_args(ctx, args_off, nregs);
+		/* save args for original func */
+		save_args(ctx, bargs_off, oargs_off, m, a, true);
 		/* call original func */
 		emit(A64_LDR64I(A64_R(10), A64_SP, retaddr_off), ctx);
 		emit(A64_ADR(A64_LR, AARCH64_INSN_SIZE * 2), ctx);
@@ -2354,7 +2478,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
 	}
 
 	for (i = 0; i < fexit->nr_links; i++)
-		invoke_bpf_prog(ctx, fexit->links[i], args_off, retval_off,
+		invoke_bpf_prog(ctx, fexit->links[i], bargs_off, retval_off,
 				run_ctx_off, false);
 
 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
@@ -2368,7 +2492,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
 	}
 
 	if (flags & BPF_TRAMP_F_RESTORE_REGS)
-		restore_args(ctx, args_off, nregs);
+		restore_args(ctx, bargs_off, a->regs_for_args);
 
 	/* restore callee saved register x19 and x20 */
 	emit(A64_LDR64I(A64_R(19), A64_SP, regs_off), ctx);
@@ -2428,14 +2552,16 @@ int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
 		.idx = 0,
 	};
 	struct bpf_tramp_image im;
+	struct arg_aux  aaux;
 	int nregs, ret;
 
 	nregs = btf_func_model_nregs(m);
-	/* the first 8 registers are used for arguments */
-	if (nregs > 8)
-		return -ENOTSUPP;
 
-	ret = prepare_trampoline(&ctx, &im, tlinks, func_addr, nregs, flags);
+	ret = calc_arg_aux(m, &aaux);
+	if (ret < 0)
+		return ret;
+
+	ret = prepare_trampoline(&ctx, &im, tlinks, func_addr, m, &aaux, flags);
 	if (ret < 0)
 		return ret;
 
@@ -2462,9 +2588,10 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
 				u32 flags, struct bpf_tramp_links *tlinks,
 				void *func_addr)
 {
-	int ret, nregs;
-	void *image, *tmp;
 	u32 size = ro_image_end - ro_image;
+	struct arg_aux aaux;
+	void *image, *tmp;
+	int ret;
 
 	/* image doesn't need to be in module memory range, so we can
 	 * use kvmalloc.
@@ -2480,13 +2607,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
 		.write = true,
 	};
 
-	nregs = btf_func_model_nregs(m);
-	/* the first 8 registers are used for arguments */
-	if (nregs > 8)
-		return -ENOTSUPP;
 
 	jit_fill_hole(image, (unsigned int)(ro_image_end - ro_image));
-	ret = prepare_trampoline(&ctx, im, tlinks, func_addr, nregs, flags);
+	ret = calc_arg_aux(m, &aaux);
+	if (ret)
+		goto out;
+	ret = prepare_trampoline(&ctx, im, tlinks, func_addr, m, &aaux, flags);
 
 	if (ret > 0 && validate_code(&ctx) < 0) {
 		ret = -EINVAL;

-- 
2.49.0



* [PATCH bpf-next v2 2/2] selftests/bpf: enable many-args tests for arm64
  2025-05-22 10:14 [PATCH bpf-next v2 0/2] bpf, arm64: support up to 12 arguments Alexis Lothoré
  2025-05-22 10:14 ` [PATCH bpf-next v2 1/2] bpf, arm64: Support up to 12 function arguments Alexis Lothoré
@ 2025-05-22 10:14 ` Alexis Lothoré (eBPF Foundation)
  1 sibling, 0 replies; 7+ messages in thread
From: Alexis Lothoré (eBPF Foundation) @ 2025-05-22 10:14 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
	Mykola Lysenko, Shuah Khan, Maxime Coquelin, Alexandre Torgue,
	Florent Revest
  Cc: Bastien Curutchet, ebpf, Thomas Petazzoni, bpf, linux-kernel,
	linux-arm-kernel, linux-kselftest, linux-stm32,
	Alexis Lothoré (eBPF Foundation)

Now that support for up to 12 args is enabled for tracing programs on
ARM64, enable the existing tests for this feature on this architecture.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
---
Changes in v2:
- keep tracing struct tests disabled, as structs passed on stack are not
  handled by the new revision
---
 tools/testing/selftests/bpf/DENYLIST.aarch64 | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
index 6d8feda27ce9de07d77d6e384666082923e3dc76..12e99c0277a8cbf9e63e8f6d3a108c8a1208407b 100644
--- a/tools/testing/selftests/bpf/DENYLIST.aarch64
+++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
@@ -1,3 +1 @@
-fentry_test/fentry_many_args                     # fentry_many_args:FAIL:fentry_many_args_attach unexpected error: -524
-fexit_test/fexit_many_args                       # fexit_many_args:FAIL:fexit_many_args_attach unexpected error: -524
 tracing_struct/struct_many_args                  # struct_many_args:FAIL:tracing_struct_many_args__attach unexpected error: -524

-- 
2.49.0



* Re: [PATCH bpf-next v2 1/2] bpf, arm64: Support up to 12 function arguments
  2025-05-22 10:14 ` [PATCH bpf-next v2 1/2] bpf, arm64: Support up to 12 function arguments Alexis Lothoré
@ 2025-05-27  8:11   ` Xu Kuohai
  2025-05-27  8:45     ` Alexis Lothoré
  0 siblings, 1 reply; 7+ messages in thread
From: Xu Kuohai @ 2025-05-27  8:11 UTC (permalink / raw)
  To: Alexis Lothoré, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Puranjay Mohan,
	Catalin Marinas, Will Deacon, Mykola Lysenko, Shuah Khan,
	Maxime Coquelin, Alexandre Torgue, Florent Revest
  Cc: Bastien Curutchet, ebpf, Thomas Petazzoni, bpf, linux-kernel,
	linux-arm-kernel, linux-kselftest, linux-stm32, Xu Kuohai

On 5/22/2025 6:14 PM, Alexis Lothoré wrote:

[...]

> -static void save_args(struct jit_ctx *ctx, int args_off, int nregs)
> +struct arg_aux {
> +	/* how many args are passed through registers, the rest of the args are
> +	 * passed through stack
> +	 */
> +	int args_in_regs;
> +	/* how many registers are used to pass arguments */
> +	int regs_for_args;
> +	/* how much stack is used for additional args passed to bpf program
> +	 * that did not fit in original function registers
> +	 **/

nit: "**/" should be "*/"

> +	int bstack_for_args;
> +	/* home much stack is used for additional args passed to the
> +	 * original function when called from trampoline (this one needs
> +	 * arguments to be properly aligned)
> +	 */
> +	int ostack_for_args;
> +};
> +
> +static int calc_arg_aux(const struct btf_func_model *m,
> +			 struct arg_aux *a)
>   {
> -	int i;
> +	int stack_slots, nregs, slots, i;
> +
> +	/* verifier ensures m->nr_args <= MAX_BPF_FUNC_ARGS */
> +	for (i = 0, nregs = 0; i < m->nr_args; i++) {
> +		slots = (m->arg_size[i] + 7) / 8;
> +		if (nregs + slots <= 8) /* passed through register ? */
> +			nregs += slots;
> +		else
> +			break;
> +	}
> +
> +	a->args_in_regs = i;
> +	a->regs_for_args = nregs;
> +	a->ostack_for_args = 0;
> +
> +	/* the rest arguments are passed through stack */
> +	for (a->ostack_for_args = 0, a->bstack_for_args = 0;
> +	     i < m->nr_args; i++) {

a->ostack_for_args is initialized twice.

move all initializations before the loop?

> +		/* We can not know for sure about exact alignment needs for
> +		 * struct passed on stack, so deny those
> +		 */
> +		if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
> +			return -EOPNOTSUPP;

leave the error code as is, namely, return -ENOTSUPP?

> +		stack_slots = (m->arg_size[i] + 7) / 8;
> +		/* AAPCS 64 C.14: arguments passed on stack must be aligned to
> +		 * max(8, arg_natural_alignment)
> +		 */
> +		a->bstack_for_args += stack_slots * 8;
> +		a->ostack_for_args = round_up(a->ostack_for_args + stack_slots * 8, 8);

since a->ostack_for_args starts from 0 and is always incremented
by multiples of 8, round_up() to 8 is not needed.

> +	}
>   
> -	for (i = 0; i < nregs; i++) {
> -		emit(A64_STR64I(i, A64_SP, args_off), ctx);
> -		args_off += 8;
> +	return 0;
> +}
> +
> +static void clear_garbage(struct jit_ctx *ctx, int reg, int effective_bytes)
> +{
> +	if (effective_bytes) {
> +		int garbage_bits = 64 - 8 * effective_bytes;
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> +		/* garbage bits are at the right end */
> +		emit(A64_LSR(1, reg, reg, garbage_bits), ctx);
> +		emit(A64_LSL(1, reg, reg, garbage_bits), ctx);
> +#else
> +		/* garbage bits are at the left end */
> +		emit(A64_LSL(1, reg, reg, garbage_bits), ctx);
> +		emit(A64_LSR(1, reg, reg, garbage_bits), ctx);
> +#endif
>   	}
>   }
>   
> -static void restore_args(struct jit_ctx *ctx, int args_off, int nregs)
> +static void save_args(struct jit_ctx *ctx, int bargs_off, int oargs_off,
> +		      const struct btf_func_model *m,
> +		      const struct arg_aux *a,
> +		      bool for_call_origin)
>   {
>   	int i;
> +	int reg;
> +	int doff;
> +	int soff;
> +	int slots;
> +	u8 tmp = bpf2a64[TMP_REG_1];
> +
> +	/* store arguments to the stack for the bpf program, or restore
> +	 * arguments from stack for the original function
> +	 */
> +	for (reg = 0; reg < a->regs_for_args; reg++) {
> +		emit(for_call_origin ?
> +		     A64_LDR64I(reg, A64_SP, bargs_off) :
> +		     A64_STR64I(reg, A64_SP, bargs_off),
> +		     ctx);
> +		bargs_off += 8;
> +	}
> +
> +	soff = 32; /* on stack arguments start from FP + 32 */
> +	doff = (for_call_origin ? oargs_off : bargs_off);
> +
> +	/* save on stack arguments */
> +	for (i = a->args_in_regs; i < m->nr_args; i++) {
> +		slots = (m->arg_size[i] + 7) / 8;
> +		/* AAPCS C.14: additional arguments on stack must be
> +		 * aligned on max(8, arg_natural_alignment)
> +		 */
> +		soff = round_up(soff, 8);
> +		if (for_call_origin)
> +			doff =  round_up(doff, 8);

since both soff and doff start from multiples of 8 and are
incremented by 8 each time, the two round_up()s are also
not needed.

> +		/* verifier ensures arg_size <= 16, so slots equals 1 or 2 */
> +		while (slots-- > 0) {
> +			emit(A64_LDR64I(tmp, A64_FP, soff), ctx);
> +			/* if there is unused space in the last slot, clear
> +			 * the garbage contained in the space.
> +			 */
> +			if (slots == 0 && !for_call_origin)
> +				clear_garbage(ctx, tmp, m->arg_size[i] % 8);
> +			emit(A64_STR64I(tmp, A64_SP, doff), ctx);
> +			soff += 8;
> +			doff += 8;
> +		}
> +	}
> +}

[...]



* Re: [PATCH bpf-next v2 1/2] bpf, arm64: Support up to 12 function arguments
  2025-05-27  8:11   ` Xu Kuohai
@ 2025-05-27  8:45     ` Alexis Lothoré
  2025-05-27  9:09       ` Xu Kuohai
  0 siblings, 1 reply; 7+ messages in thread
From: Alexis Lothoré @ 2025-05-27  8:45 UTC (permalink / raw)
  To: Xu Kuohai, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Catalin Marinas, Will Deacon, Mykola Lysenko,
	Shuah Khan, Maxime Coquelin, Alexandre Torgue, Florent Revest
  Cc: Bastien Curutchet, ebpf, Thomas Petazzoni, bpf, linux-kernel,
	linux-arm-kernel, linux-kselftest, linux-stm32, Xu Kuohai

Hi Xu, thanks for the review

On Tue May 27, 2025 at 10:11 AM CEST, Xu Kuohai wrote:
> On 5/22/2025 6:14 PM, Alexis Lothoré wrote:
>
> [...]
>
>> -static void save_args(struct jit_ctx *ctx, int args_off, int nregs)
>> +struct arg_aux {
>> +	/* how many args are passed through registers, the rest of the args are
>> +	 * passed through stack
>> +	 */
>> +	int args_in_regs;
>> +	/* how many registers are used to pass arguments */
>> +	int regs_for_args;
>> +	/* how much stack is used for additional args passed to bpf program
>> +	 * that did not fit in original function registers
>> +	 **/
>
> nit: "**/" should be "*/"

ACK

[...]

>> +	a->ostack_for_args = 0;
>> +
>> +	/* the rest arguments are passed through stack */
>> +	for (a->ostack_for_args = 0, a->bstack_for_args = 0;
>> +	     i < m->nr_args; i++) {
>
> a->ostack_for_args is initialized twice.
>
> move all initializations before the loop?

ACK

>> +		/* We can not know for sure about exact alignment needs for
>> +		 * struct passed on stack, so deny those
>> +		 */
>> +		if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
>> +			return -EOPNOTSUPP;
>
> leave the error code as is, namely, return -ENOTSUPP?

Actually this change follows a complaint from checkpatch:

"WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP"

>> +		stack_slots = (m->arg_size[i] + 7) / 8;
>> +		/* AAPCS 64 C.14: arguments passed on stack must be aligned to
>> +		 * max(8, arg_natural_alignment)
>> +		 */
>> +		a->bstack_for_args += stack_slots * 8;
>> +		a->ostack_for_args = round_up(a->ostack_for_args + stack_slots * 8, 8);
>
> since a->ostack_for_args starts from 0 and is always incremented
> by multiples of 8, round_up() to 8 is not needed.

True. This is a (partial) remnant from the first attempt to handle more
exotic alignments like large structs or __int128, but that's indeed not
needed for this current version. I'll clean it up.

[...]

>> +	for (i = a->args_in_regs; i < m->nr_args; i++) {
>> +		slots = (m->arg_size[i] + 7) / 8;
>> +		/* AAPCS C.14: additional arguments on stack must be
>> +		 * aligned on max(8, arg_natural_alignment)
>> +		 */
>> +		soff = round_up(soff, 8);
>> +		if (for_call_origin)
>> +			doff =  round_up(doff, 8);
>
> since both soff and doff start from multiples of 8 and are
> incremented by 8 each time, the two round_up()s are also
> not needed.

ACK. I guess the small AAPCS mention can go too then.

>
>> +		/* verifier ensures arg_size <= 16, so slots equals 1 or 2 */
>> +		while (slots-- > 0) {
>> +			emit(A64_LDR64I(tmp, A64_FP, soff), ctx);
>> +			/* if there is unused space in the last slot, clear
>> +			 * the garbage contained in the space.
>> +			 */
>> +			if (slots == 0 && !for_call_origin)
>> +				clear_garbage(ctx, tmp, m->arg_size[i] % 8);
>> +			emit(A64_STR64I(tmp, A64_SP, doff), ctx);
>> +			soff += 8;
>> +			doff += 8;
>> +		}
>> +	}
>> +}
>
> [...]




-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



* Re: [PATCH bpf-next v2 1/2] bpf, arm64: Support up to 12 function arguments
  2025-05-27  8:45     ` Alexis Lothoré
@ 2025-05-27  9:09       ` Xu Kuohai
  2025-05-27  9:13         ` Alexis Lothoré
  0 siblings, 1 reply; 7+ messages in thread
From: Xu Kuohai @ 2025-05-27  9:09 UTC (permalink / raw)
  To: Alexis Lothoré, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Puranjay Mohan,
	Catalin Marinas, Will Deacon, Mykola Lysenko, Shuah Khan,
	Maxime Coquelin, Alexandre Torgue, Florent Revest
  Cc: Bastien Curutchet, ebpf, Thomas Petazzoni, bpf, linux-kernel,
	linux-arm-kernel, linux-kselftest, linux-stm32, Xu Kuohai

On 5/27/2025 4:45 PM, Alexis Lothoré wrote:

[...]

>>> +		/* We can not know for sure about exact alignment needs for
>>> +		 * struct passed on stack, so deny those
>>> +		 */
>>> +		if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
>>> +			return -EOPNOTSUPP;
>> leave the error code as is, namely, return -ENOTSUPP?
> Actually this change follows a complaint from checkpatch:
> 
> "WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP"

Seems we can just ignore this warning, as ENOTSUPP is already used
throughout bpf, and the actual value -524 is well recognized.

[...]



* Re: [PATCH bpf-next v2 1/2] bpf, arm64: Support up to 12 function arguments
  2025-05-27  9:09       ` Xu Kuohai
@ 2025-05-27  9:13         ` Alexis Lothoré
  0 siblings, 0 replies; 7+ messages in thread
From: Alexis Lothoré @ 2025-05-27  9:13 UTC (permalink / raw)
  To: Xu Kuohai, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Catalin Marinas, Will Deacon, Mykola Lysenko,
	Shuah Khan, Maxime Coquelin, Alexandre Torgue, Florent Revest
  Cc: Bastien Curutchet, ebpf, Thomas Petazzoni, bpf, linux-kernel,
	linux-arm-kernel, linux-kselftest, linux-stm32, Xu Kuohai

On Tue May 27, 2025 at 11:09 AM CEST, Xu Kuohai wrote:
> On 5/27/2025 4:45 PM, Alexis Lothoré wrote:
>
> [...]
>
>>>> +		/* We can not know for sure about exact alignment needs for
>>>> +		 * struct passed on stack, so deny those
>>>> +		 */
>>>> +		if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
>>>> +			return -EOPNOTSUPP;
>>> leave the error code as is, namely, return -ENOTSUPP?
>> Actually this change follows a complaint from checkpatch:
>> 
>> "WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP"
>
> Seems we can just ignore this warning, as ENOTSUPP is already used
> throughout bpf, and the actual value -524 is well recognized.

Ok, then I'll switch it back to ENOTSUPP


-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


