From: Yonghong Song <yonghong.song@linux.dev>
To: Eduard Zingerman <eddyz87@gmail.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
"Jose E . Marchesi" <jose.marchesi@oracle.com>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next 01/18] bpf: Support stack arguments for bpf functions
Date: Tue, 28 Apr 2026 17:47:00 +0100 [thread overview]
Message-ID: <29308729-2a9c-4a4e-9b4f-a92bd185ee22@linux.dev> (raw)
In-Reply-To: <7a031b0dcbf54e34d6a6571256b1bb65b5617bcc.camel@gmail.com>
On 4/28/26 7:29 AM, Eduard Zingerman wrote:
> On Fri, 2026-04-24 at 10:14 -0700, Yonghong Song wrote:
>
> [...]
>
> I didn't see this in the patch, hence the question: should or should
> not this feature be privileged bpf only?
It is priviledged only. See add_subprog_and_kfunc().
both bpf-to-bpf call and kfunc requires bpf_capable.
>
> [...]
>
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index d5b4303315dd..2cc349d7fc17 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
> [...]
>
>> @@ -508,6 +512,17 @@ struct bpf_verifier_state {
>> iter < frame->allocated_stack / BPF_REG_SIZE; \
>> iter++, reg = bpf_get_spilled_reg(iter, frame, mask))
>>
>> +#define bpf_get_spilled_stack_arg(slot, frame, mask) \
>> + ((((slot) < frame->out_stack_arg_depth / BPF_REG_SIZE) && \
>> + (frame->stack_arg_regs[slot].type != NOT_INIT)) \
>> + ? &frame->stack_arg_regs[slot] : NULL)
> can this be a static inline function?
We could but we have
#define bpf_get_spilled_reg(slot, frame, mask) \
(((slot < frame->allocated_stack / BPF_REG_SIZE) && \
((1 << frame->stack[slot].slot_type[BPF_REG_SIZE - 1]) & (mask))) \
? &frame->stack[slot].spilled_ptr : NULL)
Should we do the same (as static inline function)?
>
>> +
>> +/* Iterate over 'frame', setting 'reg' to either NULL or a spilled stack arg. */
>> +#define bpf_for_each_spilled_stack_arg(iter, frame, reg, mask) \
>> + for (iter = 0, reg = bpf_get_spilled_stack_arg(iter, frame, mask); \
>> + iter < frame->out_stack_arg_depth / BPF_REG_SIZE; \
>> + iter++, reg = bpf_get_spilled_stack_arg(iter, frame, mask))
>> +
>> #define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __mask, __expr) \
>> ({ \
>> struct bpf_verifier_state *___vstate = __vst; \
>> @@ -525,6 +540,11 @@ struct bpf_verifier_state {
>> continue; \
>> (void)(__expr); \
>> } \
>> + bpf_for_each_spilled_stack_arg(___j, __state, __reg, __mask) { \
>> + if (!__reg) \
>> + continue; \
>> + (void)(__expr); \
>> + } \
>> } \
>> })
> Tangential nit: I think this macro is getting a bit too complicated,
> we might want to introduce some proper reg_state iterator at some
> point, e.g.:
>
> struct ret_iter it = new_reg_iter(state);
> while ((reg = next_reg(&it))) { ... }
You mean have a static function with proper arguments and do the above?
I guess can do a followup later to simplify it.
>
>
>>
>> @@ -739,10 +759,13 @@ struct bpf_subprog_info {
>> bool keep_fastcall_stack: 1;
>> bool changes_pkt_data: 1;
>> bool might_sleep: 1;
>> - u8 arg_cnt:3;
>> + u8 arg_cnt:4;
>>
>> enum priv_stack_mode priv_stack_mode;
>> - struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
>> + struct bpf_subprog_arg_info args[MAX_BPF_FUNC_ARGS];
>> + u16 incoming_stack_arg_depth;
> Can this be inferred from arg_cnt?
> Also, the verifier keeps doing '/ BPF_REG_SIZE' on this number,
> would it be more convenient to keep it as count?
This should work. Let me try.
>
>> + u16 stack_arg_depth; /* incoming + max outgoing */
>> + u16 max_out_stack_arg_depth;
>> };
>>
>> struct bpf_verifier_env;
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index 77af44d8a3ad..cfb35a2decf6 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>> @@ -7880,13 +7880,19 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog)
>> }
>> args = (const struct btf_param *)(t + 1);
>> nargs = btf_type_vlen(t);
>> - if (nargs > MAX_BPF_FUNC_REG_ARGS) {
>> - if (!is_global)
>> - return -EINVAL;
>> - bpf_log(log, "Global function %s() with %d > %d args. Buggy compiler.\n",
>> + if (nargs > MAX_BPF_FUNC_ARGS) {
>> + bpf_log(log, "Function %s() with %d > %d args not supported.\n",
>> + tname, nargs, MAX_BPF_FUNC_ARGS);
> Nit: I'd report it as "kernel supports at-most %d parameters for regular functions, while function %s is declared to accept %d parameters"
> just to make the rules a bit more explicit.
Okay.
>
>> + return -EINVAL;
>> + }
> [...]
>
>> @@ -1378,9 +1382,21 @@ int bpf_fixup_call_args(struct bpf_verifier_env *env)
>> struct bpf_prog *prog = env->prog;
>> struct bpf_insn *insn = prog->insnsi;
>> bool has_kfunc_call = bpf_prog_has_kfunc_call(prog);
>> - int i, depth;
>> + int depth;
>> #endif
>> - int err = 0;
>> + int i, err = 0;
>> +
>> + for (i = 0; i < env->subprog_cnt; i++) {
>> + struct bpf_subprog_info *subprog = &env->subprog_info[i];
>> + u16 outgoing = subprog->stack_arg_depth - subprog->incoming_stack_arg_depth;
>> +
>> + if (subprog->max_out_stack_arg_depth > outgoing) {
>> + verbose(env,
>> + "func#%d writes stack arg slot at depth %u, but calls only require %u bytes\n",
>> + i, subprog->max_out_stack_arg_depth, outgoing);
>> + return -EINVAL;
> Is this an internal error condition?
> If it is, maybe use verifier_bug()?
It is not. For example,
SEC("tc")
__description("stack_arg: write unused stack arg slot")
__failure
__msg("func#0 writes stack arg slot at depth 40, but calls only require 16 bytes")
__naked void stack_arg_write_unused_slot(void)
{
asm volatile (
"r1 = 1;"
"r2 = 2;"
"r3 = 3;"
"r4 = 4;"
"r5 = 5;"
/* Write to offset -40, unused for the callee */
"*(u64 *)(r11 - 40) = 99;"
"*(u64 *)(r11 - 16) = 20;"
"*(u64 *)(r11 - 8) = 10;"
"call subprog_7args;"
"r0 = 0;"
"exit;"
::: __clobber_all
);
}
>
>> + }
>> + }
>>
>> if (env->prog->jit_requested &&
>> !bpf_prog_is_offloaded(env->prog->aux)) {
>> diff --git a/kernel/bpf/states.c b/kernel/bpf/states.c
>> index 8478d2c6ed5b..3e59d1c3a726 100644
>> --- a/kernel/bpf/states.c
>> +++ b/kernel/bpf/states.c
>> @@ -838,6 +838,34 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
>> return true;
>> }
>>
>> +/*
>> + * Compare stack arg slots between old and current states.
>> + * Outgoing stack args are path-local state and must agree for pruning.
>> + */
>> +static bool stack_arg_safe(struct bpf_verifier_env *env, struct bpf_func_state *old,
>> + struct bpf_func_state *cur, struct bpf_idmap *idmap,
>> + enum exact_level exact)
>> +{
>> + int i, nslots;
>> +
>> + nslots = min(old->out_stack_arg_depth, cur->out_stack_arg_depth) / BPF_REG_SIZE;
> this is not safe, e.g. it will accept cur with one argument as
> equivalent for old with two arguments.
Good catch! Will fix.
>
>> + for (i = 0; i < nslots; i++) {
>> + struct bpf_reg_state *old_arg = &old->stack_arg_regs[i];
>> + struct bpf_reg_state *cur_arg = &cur->stack_arg_regs[i];
>> +
>> + if (old_arg->type == NOT_INIT && cur_arg->type == NOT_INIT)
>> + continue;
>> +
>> + if (exact == EXACT && old_arg->type != cur_arg->type)
>> + return false;
>> +
>> + if (!regsafe(env, old_arg, cur_arg, idmap, exact))
>> + return false;
>> + }
> regsafe() seem handles NOT_INIT and EXACT in the same way,
> I don't think there is a necessity to do the handling explicitly here.
I agree that NOT_INIT and EXACT tracks are redundant. I keep them there
for performance reason. But I guess the return is minimum, so I will just
do regsafe() then.
>
>> +
>> + return true;
>> +}
>> +
> [...]
>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index ff6ff1c27517..bcf81692a22b 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -1361,6 +1361,18 @@ static int copy_stack_state(struct bpf_func_state *dst, const struct bpf_func_st
>> return -ENOMEM;
>>
>> dst->allocated_stack = src->allocated_stack;
>> +
>> + /* copy stack args state */
>> + n = src->out_stack_arg_depth / BPF_REG_SIZE;
>> + if (n) {
>> + dst->stack_arg_regs = copy_array(dst->stack_arg_regs, src->stack_arg_regs, n,
>> + sizeof(struct bpf_reg_state),
>> + GFP_KERNEL_ACCOUNT);
>> + if (!dst->stack_arg_regs)
>> + return -ENOMEM;
>> + }
>> +
>> + dst->out_stack_arg_depth = src->out_stack_arg_depth;
> Given that this is capped by 12, does it make sense to maintain the counter?
> It might be simpler to always allocate an array of 12 elements.
The number of stack arguments is most 7. So yes, we can do it.
>
>> return 0;
>> }
> [...]
>
>> @@ -4417,6 +4446,109 @@ static int check_stack_write(struct bpf_verifier_env *env,
>> return err;
>> }
>>
>> +/*
>> + * Write a value to the outgoing stack arg area.
>> + * off is a negative offset from r11 (e.g. -8 for arg6, -16 for arg7).
>> + */
>> +static int check_stack_arg_write(struct bpf_verifier_env *env, struct bpf_func_state *state,
>> + int off, int value_regno)
>> +{
> Nit: Maybe replace value_regno with pointer to a register state?
> Just for consistency.
yes, we can do it.
>
>> + int max_stack_arg_regs = MAX_BPF_FUNC_ARGS - MAX_BPF_FUNC_REG_ARGS;
>> + struct bpf_subprog_info *subprog = &env->subprog_info[state->subprogno];
>> + int spi = -off / BPF_REG_SIZE - 1;
>> + struct bpf_func_state *cur;
>> + struct bpf_reg_state *arg;
>> + int err;
>> +
>> + if (spi >= max_stack_arg_regs) {
>> + verbose(env, "stack arg write offset %d exceeds max %d stack args\n",
>> + off, max_stack_arg_regs);
>> + return -EINVAL;
>> + }
>> +
>> + err = grow_stack_arg_slots(env, state, -off);
>> + if (err)
>> + return err;
>> +
>> + /* Track the max outgoing stack arg access depth. */
>> + if (-off > subprog->max_out_stack_arg_depth)
>> + subprog->max_out_stack_arg_depth = -off;
>> +
>> + cur = env->cur_state->frame[env->cur_state->curframe];
>> + if (value_regno >= 0) {
>> + state->stack_arg_regs[spi] = cur->regs[value_regno];
> Nit: there is copy_register_state(), we should either use it here or
> drop it and replace with direct assignments everywhere.
Will use copy_register_state() to be consistant with our examples.
>
>> + } else {
>> + /* BPF_ST: store immediate, treat as scalar */
>> + arg = &state->stack_arg_regs[spi];
>> + arg->type = SCALAR_VALUE;
>> + __mark_reg_known(arg, env->prog->insnsi[env->insn_idx].imm);
>> + }
>> + state->no_stack_arg_load = true;
>> + return 0;
>> +}
>> +
>> +/*
>> + * Read a value from the incoming stack arg area.
>> + * off is a positive offset from r11 (e.g. +8 for arg6, +16 for arg7).
>> + */
>> +static int check_stack_arg_read(struct bpf_verifier_env *env, struct bpf_func_state *state,
>> + int off, int dst_regno)
>> +{
>> + struct bpf_subprog_info *subprog = &env->subprog_info[state->subprogno];
>> + struct bpf_verifier_state *vstate = env->cur_state;
>> + int spi = off / BPF_REG_SIZE - 1;
>> + struct bpf_func_state *caller, *cur;
>> + struct bpf_reg_state *arg;
>> +
>> + if (state->no_stack_arg_load) {
>> + verbose(env, "r11 load must be before any r11 store or call insn\n");
>> + return -EINVAL;
>> + }
> I think the error message should be inverted, store should precede the load.
> But tbh, I'd drop it altogether, the check right below should be sufficient.
This is necessary. See
SEC("tc")
__description("stack_arg: r11 load after r11 store")
__failure
__msg("r11 load must be before any r11 store or call insn")
__naked void stack_arg_load_after_store(void)
{
asm volatile (
"r1 = 1;"
"r2 = 2;"
"r3 = 3;"
"r4 = 4;"
"r5 = 5;"
"*(u64 *)(r11 - 8) = 6;"
"r0 = *(u64 *)(r11 + 8);"
"call subprog_6args;"
"exit;"
::: __clobber_all
);
}
SEC("tc")
__description("stack_arg: r11 load after a call")
__failure
__msg("r11 load must be before any r11 store or call insn")
__naked void stack_arg_load_after_call(void)
{
asm volatile (
"call %[bpf_get_prandom_u32];"
"r0 = *(u64 *)(r11 + 8);"
"exit;"
:: __imm(bpf_get_prandom_u32)
: __clobber_all
);
}
>
>> +
>> + if (off > subprog->incoming_stack_arg_depth) {
>> + verbose(env, "invalid read from stack arg off %d depth %d\n",
>> + off, subprog->incoming_stack_arg_depth);
>> + return -EACCES;
>> + }
This is for this kind of failure:
SEC("tc")
__description("stack_arg: read from uninitialized stack arg slot")
__failure
__msg("invalid read from stack arg off 8 depth 0")
__naked void stack_arg_read_uninitialized(void)
{
asm volatile (
"r0 = *(u64 *)(r11 + 8);"
"r0 = 0;"
"exit;"
::: __clobber_all
);
}
>> +
>> + caller = vstate->frame[vstate->curframe - 1];
>> + arg = &caller->stack_arg_regs[spi];
>> + cur = vstate->frame[vstate->curframe];
>> +
>> + if (is_spillable_regtype(arg->type))
>> + copy_register_state(&cur->regs[dst_regno], arg);
>> + else
>> + mark_reg_unknown(env, cur->regs, dst_regno);
> For stack writes we report error in such situations,
> should the same be done here?
We should be fine here.
>
>> + return 0;
>> +}
>> +
>> +static int check_outgoing_stack_args(struct bpf_verifier_env *env, struct bpf_func_state *caller,
>> + int nargs)
>> +{
>> + int i, spi;
>> +
>> + for (i = MAX_BPF_FUNC_REG_ARGS; i < nargs; i++) {
>> + spi = i - MAX_BPF_FUNC_REG_ARGS;
>> + if (spi >= (caller->out_stack_arg_depth / BPF_REG_SIZE) ||
>> + caller->stack_arg_regs[spi].type == NOT_INIT) {
>> + verbose(env, "stack %s not properly initialized\n",
>> + reg_arg_name(env, argno_from_arg(i + 1)));
> Nit: error message is a bit confusing, I'd change it to better reflect the rules, e.g.:
> "function %s expects %d arguments, stack argument %d is not initialized".
Okay.
>
>> + return -EFAULT;
>> + }
>> + }
>> +
>> + return 0;
>> +}
> [...]
next prev parent reply other threads:[~2026-04-28 16:47 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-24 17:14 [PATCH bpf-next 00/18] bpf: Support stack arguments for BPF functions and kfuncs Yonghong Song
2026-04-24 17:14 ` [PATCH bpf-next 01/18] bpf: Support stack arguments for bpf functions Yonghong Song
2026-04-24 18:13 ` bot+bpf-ci
2026-04-25 5:09 ` Yonghong Song
2026-04-27 20:40 ` Yonghong Song
2026-04-28 14:29 ` Eduard Zingerman
2026-04-28 16:47 ` Yonghong Song [this message]
2026-04-24 17:14 ` [PATCH bpf-next 02/18] bpf: Add precision marking and backtracking for stack argument slots Yonghong Song
2026-04-24 18:00 ` bot+bpf-ci
2026-04-25 5:10 ` Yonghong Song
2026-04-28 16:46 ` Eduard Zingerman
2026-04-24 17:14 ` [PATCH bpf-next 03/18] bpf: Refactor record_call_access() to extract per-arg logic Yonghong Song
2026-04-24 17:14 ` [PATCH bpf-next 04/18] bpf: Extend liveness analysis to track stack argument slots Yonghong Song
2026-04-24 18:00 ` bot+bpf-ci
2026-04-25 5:11 ` Yonghong Song
2026-04-24 17:14 ` [PATCH bpf-next 05/18] bpf: Reject stack arguments in non-JITed programs Yonghong Song
2026-04-24 18:00 ` bot+bpf-ci
2026-04-24 17:15 ` [PATCH bpf-next 06/18] bpf: Prepare architecture JIT support for stack arguments Yonghong Song
2026-04-24 17:48 ` bot+bpf-ci
2026-04-25 5:17 ` Yonghong Song
2026-04-24 17:15 ` [PATCH bpf-next 07/18] bpf: Enable r11 based insns Yonghong Song
2026-04-24 17:15 ` [PATCH bpf-next 08/18] bpf: Support stack arguments for kfunc calls Yonghong Song
2026-04-24 18:00 ` bot+bpf-ci
2026-04-25 5:19 ` Yonghong Song
2026-04-24 17:15 ` [PATCH bpf-next 09/18] bpf: Reject stack arguments if tail call reachable Yonghong Song
2026-04-24 18:00 ` bot+bpf-ci
2026-04-24 17:15 ` [PATCH bpf-next 10/18] bpf,x86: Implement JIT support for stack arguments Yonghong Song
2026-04-24 18:00 ` bot+bpf-ci
2026-04-25 5:29 ` Yonghong Song
2026-04-24 17:16 ` [PATCH bpf-next 11/18] selftests/bpf: Add tests for BPF function " Yonghong Song
2026-04-24 17:16 ` [PATCH bpf-next 12/18] selftests/bpf: Add tests for stack argument validation Yonghong Song
2026-04-24 17:17 ` [PATCH bpf-next 13/18] selftests/bpf: Add verifier " Yonghong Song
2026-04-24 17:48 ` bot+bpf-ci
2026-04-25 5:33 ` Yonghong Song
2026-04-24 17:17 ` [PATCH bpf-next 14/18] selftests/bpf: Add BTF fixup for __naked subprog parameter names Yonghong Song
2026-04-24 17:17 ` [PATCH bpf-next 15/18] selftests/bpf: Add precision backtracking test for stack arguments Yonghong Song
2026-04-24 17:17 ` [PATCH bpf-next 16/18] bpf, arm64: Map BPF_REG_0 to x8 instead of x7 Yonghong Song
2026-04-24 17:17 ` [PATCH bpf-next 17/18] bpf, arm64: Add JIT support for stack arguments Yonghong Song
2026-04-24 18:00 ` bot+bpf-ci
2026-04-27 9:06 ` Puranjay Mohan
2026-04-27 20:42 ` Yonghong Song
2026-04-24 17:17 ` [PATCH bpf-next 18/18] selftests/bpf: Enable stack argument tests for arm64 Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=29308729-2a9c-4a4e-9b4f-a92bd185ee22@linux.dev \
--to=yonghong.song@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=jose.marchesi@oracle.com \
--cc=kernel-team@fb.com \
--cc=martin.lau@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox