public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yonghong.song@linux.dev>
To: Eduard Zingerman <eddyz87@gmail.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"Jose E . Marchesi" <jose.marchesi@oracle.com>,
	kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next 01/18] bpf: Support stack arguments for bpf functions
Date: Tue, 28 Apr 2026 17:47:00 +0100	[thread overview]
Message-ID: <29308729-2a9c-4a4e-9b4f-a92bd185ee22@linux.dev> (raw)
In-Reply-To: <7a031b0dcbf54e34d6a6571256b1bb65b5617bcc.camel@gmail.com>



On 4/28/26 7:29 AM, Eduard Zingerman wrote:
> On Fri, 2026-04-24 at 10:14 -0700, Yonghong Song wrote:
>
> [...]
>
> I didn't see this in the patch, hence the question: should or should
> not this feature be privileged bpf only?

It is priviledged only. See add_subprog_and_kfunc().
both bpf-to-bpf call and kfunc requires bpf_capable.

>
> [...]
>
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index d5b4303315dd..2cc349d7fc17 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
> [...]
>
>> @@ -508,6 +512,17 @@ struct bpf_verifier_state {
>>   	     iter < frame->allocated_stack / BPF_REG_SIZE;		\
>>   	     iter++, reg = bpf_get_spilled_reg(iter, frame, mask))
>>   
>> +#define bpf_get_spilled_stack_arg(slot, frame, mask)                   \
>> +	((((slot) < frame->out_stack_arg_depth / BPF_REG_SIZE) &&           \
>> +	  (frame->stack_arg_regs[slot].type != NOT_INIT))               \
>> +	 ? &frame->stack_arg_regs[slot] : NULL)
> can this be a static inline function?

We could but we have

#define bpf_get_spilled_reg(slot, frame, mask)                          \
         (((slot < frame->allocated_stack / BPF_REG_SIZE) &&             \
           ((1 << frame->stack[slot].slot_type[BPF_REG_SIZE - 1]) & (mask))) \
          ? &frame->stack[slot].spilled_ptr : NULL)

Should we do the same (as static inline function)?

>
>> +
>> +/* Iterate over 'frame', setting 'reg' to either NULL or a spilled stack arg. */
>> +#define bpf_for_each_spilled_stack_arg(iter, frame, reg, mask)         \
>> +	for (iter = 0, reg = bpf_get_spilled_stack_arg(iter, frame, mask); \
>> +	     iter < frame->out_stack_arg_depth / BPF_REG_SIZE;              \
>> +	     iter++, reg = bpf_get_spilled_stack_arg(iter, frame, mask))
>> +
>>   #define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __mask, __expr)   \
>>   	({                                                               \
>>   		struct bpf_verifier_state *___vstate = __vst;            \
>> @@ -525,6 +540,11 @@ struct bpf_verifier_state {
>>   					continue;                        \
>>   				(void)(__expr);                          \
>>   			}                                                \
>> +			bpf_for_each_spilled_stack_arg(___j, __state, __reg, __mask) { \
>> +				if (!__reg)                              \
>> +					continue;                        \
>> +				(void)(__expr);                          \
>> +			}						 \
>>   		}                                                        \
>>   	})
> Tangential nit: I think this macro is getting a bit too complicated,
> we might want to introduce some proper reg_state iterator at some
> point, e.g.:
>
>    struct ret_iter it = new_reg_iter(state);
>    while ((reg = next_reg(&it))) { ... }

You mean have a static function with proper arguments and do the above?
I guess can do a followup later to simplify it.

>
>
>>   
>> @@ -739,10 +759,13 @@ struct bpf_subprog_info {
>>   	bool keep_fastcall_stack: 1;
>>   	bool changes_pkt_data: 1;
>>   	bool might_sleep: 1;
>> -	u8 arg_cnt:3;
>> +	u8 arg_cnt:4;
>>   
>>   	enum priv_stack_mode priv_stack_mode;
>> -	struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
>> +	struct bpf_subprog_arg_info args[MAX_BPF_FUNC_ARGS];
>> +	u16 incoming_stack_arg_depth;
> Can this be inferred from arg_cnt?
> Also, the verifier keeps doing '/ BPF_REG_SIZE' on this number,
> would it be more convenient to keep it as count?

This should work. Let me try.

>
>> +	u16 stack_arg_depth; /* incoming + max outgoing */
>> +	u16 max_out_stack_arg_depth;
>>   };
>>   
>>   struct bpf_verifier_env;
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index 77af44d8a3ad..cfb35a2decf6 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>> @@ -7880,13 +7880,19 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog)
>>   	}
>>   	args = (const struct btf_param *)(t + 1);
>>   	nargs = btf_type_vlen(t);
>> -	if (nargs > MAX_BPF_FUNC_REG_ARGS) {
>> -		if (!is_global)
>> -			return -EINVAL;
>> -		bpf_log(log, "Global function %s() with %d > %d args. Buggy compiler.\n",
>> +	if (nargs > MAX_BPF_FUNC_ARGS) {
>> +		bpf_log(log, "Function %s() with %d > %d args not supported.\n",
>> +			tname, nargs, MAX_BPF_FUNC_ARGS);
> Nit: I'd report it as "kernel supports at-most %d parameters for regular functions, while function %s is declared to accept %d parameters"
>       just to make the rules a bit more explicit.

Okay.

>
>> +		return -EINVAL;
>> +	}
> [...]
>
>> @@ -1378,9 +1382,21 @@ int bpf_fixup_call_args(struct bpf_verifier_env *env)
>>   	struct bpf_prog *prog = env->prog;
>>   	struct bpf_insn *insn = prog->insnsi;
>>   	bool has_kfunc_call = bpf_prog_has_kfunc_call(prog);
>> -	int i, depth;
>> +	int depth;
>>   #endif
>> -	int err = 0;
>> +	int i, err = 0;
>> +
>> +	for (i = 0; i < env->subprog_cnt; i++) {
>> +		struct bpf_subprog_info *subprog = &env->subprog_info[i];
>> +		u16 outgoing = subprog->stack_arg_depth - subprog->incoming_stack_arg_depth;
>> +
>> +		if (subprog->max_out_stack_arg_depth > outgoing) {
>> +			verbose(env,
>> +				"func#%d writes stack arg slot at depth %u, but calls only require %u bytes\n",
>> +				i, subprog->max_out_stack_arg_depth, outgoing);
>> +			return -EINVAL;
> Is this an internal error condition?
> If it is, maybe use verifier_bug()?

It is not. For example,

SEC("tc")
__description("stack_arg: write unused stack arg slot")
__failure
__msg("func#0 writes stack arg slot at depth 40, but calls only require 16 bytes")
__naked void stack_arg_write_unused_slot(void)
{
         asm volatile (
                 "r1 = 1;"
                 "r2 = 2;"
                 "r3 = 3;"
                 "r4 = 4;"
                 "r5 = 5;"
                 /* Write to offset -40, unused for the callee */
                 "*(u64 *)(r11 - 40) = 99;"
                 "*(u64 *)(r11 - 16) = 20;"
                 "*(u64 *)(r11 - 8) = 10;"
                 "call subprog_7args;"
                 "r0 = 0;"
                 "exit;"
                 ::: __clobber_all
         );
}

>
>> +		}
>> +	}
>>   
>>   	if (env->prog->jit_requested &&
>>   	    !bpf_prog_is_offloaded(env->prog->aux)) {
>> diff --git a/kernel/bpf/states.c b/kernel/bpf/states.c
>> index 8478d2c6ed5b..3e59d1c3a726 100644
>> --- a/kernel/bpf/states.c
>> +++ b/kernel/bpf/states.c
>> @@ -838,6 +838,34 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
>>   	return true;
>>   }
>>   
>> +/*
>> + * Compare stack arg slots between old and current states.
>> + * Outgoing stack args are path-local state and must agree for pruning.
>> + */
>> +static bool stack_arg_safe(struct bpf_verifier_env *env, struct bpf_func_state *old,
>> +			   struct bpf_func_state *cur, struct bpf_idmap *idmap,
>> +			   enum exact_level exact)
>> +{
>> +	int i, nslots;
>> +
>> +	nslots = min(old->out_stack_arg_depth, cur->out_stack_arg_depth) / BPF_REG_SIZE;
> this is not safe, e.g. it will accept cur with one argument as
> equivalent for old with two arguments.

Good catch! Will fix.

>
>> +	for (i = 0; i < nslots; i++) {
>> +		struct bpf_reg_state *old_arg = &old->stack_arg_regs[i];
>> +		struct bpf_reg_state *cur_arg = &cur->stack_arg_regs[i];
>> +
>> +		if (old_arg->type == NOT_INIT && cur_arg->type == NOT_INIT)
>> +			continue;
>> +
>> +		if (exact == EXACT && old_arg->type != cur_arg->type)
>> +			return false;
>> +
>> +		if (!regsafe(env, old_arg, cur_arg, idmap, exact))
>> +			return false;
>> +	}
> regsafe() seem handles NOT_INIT and EXACT in the same way,
> I don't think there is a necessity to do the handling explicitly here.

I agree that NOT_INIT and EXACT tracks are redundant. I keep them there
for performance reason. But I guess the return is minimum, so I will just
do regsafe() then.

>
>> +
>> +	return true;
>> +}
>> +
> [...]
>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index ff6ff1c27517..bcf81692a22b 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -1361,6 +1361,18 @@ static int copy_stack_state(struct bpf_func_state *dst, const struct bpf_func_st
>>   		return -ENOMEM;
>>   
>>   	dst->allocated_stack = src->allocated_stack;
>> +
>> +	/* copy stack args state */
>> +	n = src->out_stack_arg_depth / BPF_REG_SIZE;
>> +	if (n) {
>> +		dst->stack_arg_regs = copy_array(dst->stack_arg_regs, src->stack_arg_regs, n,
>> +						 sizeof(struct bpf_reg_state),
>> +						 GFP_KERNEL_ACCOUNT);
>> +		if (!dst->stack_arg_regs)
>> +			return -ENOMEM;
>> +	}
>> +
>> +	dst->out_stack_arg_depth = src->out_stack_arg_depth;
> Given that this is capped by 12, does it make sense to maintain the counter?
> It might be simpler to always allocate an array of 12 elements.

The number of stack arguments is most 7. So yes, we can do it.

>
>>   	return 0;
>>   }
> [...]
>
>> @@ -4417,6 +4446,109 @@ static int check_stack_write(struct bpf_verifier_env *env,
>>   	return err;
>>   }
>>   
>> +/*
>> + * Write a value to the outgoing stack arg area.
>> + * off is a negative offset from r11 (e.g. -8 for arg6, -16 for arg7).
>> + */
>> +static int check_stack_arg_write(struct bpf_verifier_env *env, struct bpf_func_state *state,
>> +				 int off, int value_regno)
>> +{
> Nit: Maybe replace value_regno with pointer to a register state?
>       Just for consistency.

yes, we can do it.

>
>> +	int max_stack_arg_regs = MAX_BPF_FUNC_ARGS - MAX_BPF_FUNC_REG_ARGS;
>> +	struct bpf_subprog_info *subprog = &env->subprog_info[state->subprogno];
>> +	int spi = -off / BPF_REG_SIZE - 1;
>> +	struct bpf_func_state *cur;
>> +	struct bpf_reg_state *arg;
>> +	int err;
>> +
>> +	if (spi >= max_stack_arg_regs) {
>> +		verbose(env, "stack arg write offset %d exceeds max %d stack args\n",
>> +			off, max_stack_arg_regs);
>> +		return -EINVAL;
>> +	}
>> +
>> +	err = grow_stack_arg_slots(env, state, -off);
>> +	if (err)
>> +		return err;
>> +
>> +	/* Track the max outgoing stack arg access depth. */
>> +	if (-off > subprog->max_out_stack_arg_depth)
>> +		subprog->max_out_stack_arg_depth = -off;
>> +
>> +	cur = env->cur_state->frame[env->cur_state->curframe];
>> +	if (value_regno >= 0) {
>> +		state->stack_arg_regs[spi] = cur->regs[value_regno];
> Nit: there is copy_register_state(), we should either use it here or
> drop it and replace with direct assignments everywhere.

Will use copy_register_state() to be consistant with our examples.

>
>> +	} else {
>> +		/* BPF_ST: store immediate, treat as scalar */
>> +		arg = &state->stack_arg_regs[spi];
>> +		arg->type = SCALAR_VALUE;
>> +		__mark_reg_known(arg, env->prog->insnsi[env->insn_idx].imm);
>> +	}
>> +	state->no_stack_arg_load = true;
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Read a value from the incoming stack arg area.
>> + * off is a positive offset from r11 (e.g. +8 for arg6, +16 for arg7).
>> + */
>> +static int check_stack_arg_read(struct bpf_verifier_env *env, struct bpf_func_state *state,
>> +				int off, int dst_regno)
>> +{
>> +	struct bpf_subprog_info *subprog = &env->subprog_info[state->subprogno];
>> +	struct bpf_verifier_state *vstate = env->cur_state;
>> +	int spi = off / BPF_REG_SIZE - 1;
>> +	struct bpf_func_state *caller, *cur;
>> +	struct bpf_reg_state *arg;
>> +
>> +	if (state->no_stack_arg_load) {
>> +		verbose(env, "r11 load must be before any r11 store or call insn\n");
>> +		return -EINVAL;
>> +	}
> I think the error message should be inverted, store should precede the load.
> But tbh, I'd drop it altogether, the check right below should be sufficient.

This is necessary. See

SEC("tc")
__description("stack_arg: r11 load after r11 store")
__failure
__msg("r11 load must be before any r11 store or call insn")
__naked void stack_arg_load_after_store(void)
{
         asm volatile (
                 "r1 = 1;"
                 "r2 = 2;"
                 "r3 = 3;"
                 "r4 = 4;"
                 "r5 = 5;"
                 "*(u64 *)(r11 - 8) = 6;"
                 "r0 = *(u64 *)(r11 + 8);"
                 "call subprog_6args;"
                 "exit;"
                 ::: __clobber_all
         );
}
         
SEC("tc")
__description("stack_arg: r11 load after a call")
__failure
__msg("r11 load must be before any r11 store or call insn")
__naked void stack_arg_load_after_call(void)
{
         asm volatile (
                 "call %[bpf_get_prandom_u32];"
                 "r0 = *(u64 *)(r11 + 8);"
                 "exit;"
                 :: __imm(bpf_get_prandom_u32)
                 : __clobber_all
         );
}

>
>> +
>> +	if (off > subprog->incoming_stack_arg_depth) {
>> +		verbose(env, "invalid read from stack arg off %d depth %d\n",
>> +			off, subprog->incoming_stack_arg_depth);
>> +		return -EACCES;
>> +	}

This is for this kind of failure:

SEC("tc")
__description("stack_arg: read from uninitialized stack arg slot")
__failure
__msg("invalid read from stack arg off 8 depth 0")
__naked void stack_arg_read_uninitialized(void)
{
         asm volatile (
                 "r0 = *(u64 *)(r11 + 8);"
                 "r0 = 0;"
                 "exit;"
                 ::: __clobber_all
         );
}

>> +
>> +	caller = vstate->frame[vstate->curframe - 1];
>> +	arg = &caller->stack_arg_regs[spi];
>> +	cur = vstate->frame[vstate->curframe];
>> +
>> +	if (is_spillable_regtype(arg->type))
>> +		copy_register_state(&cur->regs[dst_regno], arg);
>> +	else
>> +		mark_reg_unknown(env, cur->regs, dst_regno);
> For stack writes we report error in such situations,
> should the same be done here?

We should be fine here.

>
>> +	return 0;
>> +}
>> +
>> +static int check_outgoing_stack_args(struct bpf_verifier_env *env, struct bpf_func_state *caller,
>> +				     int nargs)
>> +{
>> +	int i, spi;
>> +
>> +	for (i = MAX_BPF_FUNC_REG_ARGS; i < nargs; i++) {
>> +		spi = i - MAX_BPF_FUNC_REG_ARGS;
>> +		if (spi >= (caller->out_stack_arg_depth / BPF_REG_SIZE) ||
>> +		    caller->stack_arg_regs[spi].type == NOT_INIT) {
>> +			verbose(env, "stack %s not properly initialized\n",
>> +				reg_arg_name(env, argno_from_arg(i + 1)));
> Nit: error message is a bit confusing, I'd change it to better reflect the rules, e.g.:
>       "function %s expects %d arguments, stack argument %d is not initialized".

Okay.

>
>> +			return -EFAULT;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
> [...]


  reply	other threads:[~2026-04-28 16:47 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-24 17:14 [PATCH bpf-next 00/18] bpf: Support stack arguments for BPF functions and kfuncs Yonghong Song
2026-04-24 17:14 ` [PATCH bpf-next 01/18] bpf: Support stack arguments for bpf functions Yonghong Song
2026-04-24 18:13   ` bot+bpf-ci
2026-04-25  5:09     ` Yonghong Song
2026-04-27 20:40       ` Yonghong Song
2026-04-28 14:29   ` Eduard Zingerman
2026-04-28 16:47     ` Yonghong Song [this message]
2026-04-24 17:14 ` [PATCH bpf-next 02/18] bpf: Add precision marking and backtracking for stack argument slots Yonghong Song
2026-04-24 18:00   ` bot+bpf-ci
2026-04-25  5:10     ` Yonghong Song
2026-04-28 16:46   ` Eduard Zingerman
2026-04-24 17:14 ` [PATCH bpf-next 03/18] bpf: Refactor record_call_access() to extract per-arg logic Yonghong Song
2026-04-24 17:14 ` [PATCH bpf-next 04/18] bpf: Extend liveness analysis to track stack argument slots Yonghong Song
2026-04-24 18:00   ` bot+bpf-ci
2026-04-25  5:11     ` Yonghong Song
2026-04-24 17:14 ` [PATCH bpf-next 05/18] bpf: Reject stack arguments in non-JITed programs Yonghong Song
2026-04-24 18:00   ` bot+bpf-ci
2026-04-24 17:15 ` [PATCH bpf-next 06/18] bpf: Prepare architecture JIT support for stack arguments Yonghong Song
2026-04-24 17:48   ` bot+bpf-ci
2026-04-25  5:17     ` Yonghong Song
2026-04-24 17:15 ` [PATCH bpf-next 07/18] bpf: Enable r11 based insns Yonghong Song
2026-04-24 17:15 ` [PATCH bpf-next 08/18] bpf: Support stack arguments for kfunc calls Yonghong Song
2026-04-24 18:00   ` bot+bpf-ci
2026-04-25  5:19     ` Yonghong Song
2026-04-24 17:15 ` [PATCH bpf-next 09/18] bpf: Reject stack arguments if tail call reachable Yonghong Song
2026-04-24 18:00   ` bot+bpf-ci
2026-04-24 17:15 ` [PATCH bpf-next 10/18] bpf,x86: Implement JIT support for stack arguments Yonghong Song
2026-04-24 18:00   ` bot+bpf-ci
2026-04-25  5:29     ` Yonghong Song
2026-04-24 17:16 ` [PATCH bpf-next 11/18] selftests/bpf: Add tests for BPF function " Yonghong Song
2026-04-24 17:16 ` [PATCH bpf-next 12/18] selftests/bpf: Add tests for stack argument validation Yonghong Song
2026-04-24 17:17 ` [PATCH bpf-next 13/18] selftests/bpf: Add verifier " Yonghong Song
2026-04-24 17:48   ` bot+bpf-ci
2026-04-25  5:33     ` Yonghong Song
2026-04-24 17:17 ` [PATCH bpf-next 14/18] selftests/bpf: Add BTF fixup for __naked subprog parameter names Yonghong Song
2026-04-24 17:17 ` [PATCH bpf-next 15/18] selftests/bpf: Add precision backtracking test for stack arguments Yonghong Song
2026-04-24 17:17 ` [PATCH bpf-next 16/18] bpf, arm64: Map BPF_REG_0 to x8 instead of x7 Yonghong Song
2026-04-24 17:17 ` [PATCH bpf-next 17/18] bpf, arm64: Add JIT support for stack arguments Yonghong Song
2026-04-24 18:00   ` bot+bpf-ci
2026-04-27  9:06     ` Puranjay Mohan
2026-04-27 20:42       ` Yonghong Song
2026-04-24 17:17 ` [PATCH bpf-next 18/18] selftests/bpf: Enable stack argument tests for arm64 Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=29308729-2a9c-4a4e-9b4f-a92bd185ee22@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=jose.marchesi@oracle.com \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox