public inbox for bpf@vger.kernel.org
From: Eduard Zingerman <eddyz87@gmail.com>
To: Yonghong Song <yonghong.song@linux.dev>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	 Daniel Borkmann <daniel@iogearbox.net>,
	"Jose E . Marchesi" <jose.marchesi@oracle.com>,
	kernel-team@fb.com,  Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next 01/18] bpf: Support stack arguments for bpf functions
Date: Tue, 28 Apr 2026 17:28:33 -0700
Message-ID: <abe6e8e7a0ac5d4c1fbbf35643577d53db81e891.camel@gmail.com>
In-Reply-To: <29308729-2a9c-4a4e-9b4f-a92bd185ee22@linux.dev>

On Tue, 2026-04-28 at 17:47 +0100, Yonghong Song wrote:
> 
> On 4/28/26 7:29 AM, Eduard Zingerman wrote:
> > On Fri, 2026-04-24 at 10:14 -0700, Yonghong Song wrote:
> > 
> > [...]
> > 
> > I didn't see this in the patch, hence the question: should or should
> > not this feature be privileged bpf only?
> 
> It is privileged only. See add_subprog_and_kfunc().
> Both bpf-to-bpf calls and kfuncs require bpf_capable.

I see, thank you.

> > [...]
> > 
> > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > index d5b4303315dd..2cc349d7fc17 100644
> > > --- a/include/linux/bpf_verifier.h
> > > +++ b/include/linux/bpf_verifier.h
> > [...]
> > 
> > > @@ -508,6 +512,17 @@ struct bpf_verifier_state {
> > >   	     iter < frame->allocated_stack / BPF_REG_SIZE;		\
> > >   	     iter++, reg = bpf_get_spilled_reg(iter, frame, mask))
> > >   
> > > +#define bpf_get_spilled_stack_arg(slot, frame, mask)                   \
> > > +	((((slot) < frame->out_stack_arg_depth / BPF_REG_SIZE) &&           \
> > > +	  (frame->stack_arg_regs[slot].type != NOT_INIT))               \
> > > +	 ? &frame->stack_arg_regs[slot] : NULL)
> > can this be a static inline function?
> 
> We could but we have
> 
> #define bpf_get_spilled_reg(slot, frame, mask)                          \
>          (((slot < frame->allocated_stack / BPF_REG_SIZE) &&             \
>            ((1 << frame->stack[slot].slot_type[BPF_REG_SIZE - 1]) & (mask))) \
>           ? &frame->stack[slot].spilled_ptr : NULL)
> 
> Should we do the same (as static inline function)?

I think so, yes.
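
For illustration, a rough sketch of what the inline version of the
stack arg accessor might look like. The type definitions below are
simplified stand-ins for the kernel ones, only so that the snippet
compiles on its own. I dropped the 'mask' parameter because the macro
in the patch never uses it; that is my assumption, not something the
patch does.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel definitions, illustration only. */
#define BPF_REG_SIZE 8

enum bpf_reg_type { NOT_INIT = 0, SCALAR_VALUE };

struct bpf_reg_state {
	enum bpf_reg_type type;
};

struct bpf_func_state {
	struct bpf_reg_state *stack_arg_regs;
	int out_stack_arg_depth;	/* in bytes */
};

/*
 * Return the tracked state for outgoing stack arg slot 'slot',
 * or NULL if the slot is out of range or was never written.
 * Same conditions as the macro, but with typed arguments.
 */
static inline struct bpf_reg_state *
bpf_get_spilled_stack_arg(int slot, struct bpf_func_state *frame)
{
	if (slot >= frame->out_stack_arg_depth / BPF_REG_SIZE)
		return NULL;
	if (frame->stack_arg_regs[slot].type == NOT_INIT)
		return NULL;
	return &frame->stack_arg_regs[slot];
}
```

The function form gives type checking on 'slot' and 'frame' and
evaluates each argument exactly once, which the macro does not.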

> > > +/* Iterate over 'frame', setting 'reg' to either NULL or a spilled stack arg. */
> > > +#define bpf_for_each_spilled_stack_arg(iter, frame, reg, mask)         \
> > > +	for (iter = 0, reg = bpf_get_spilled_stack_arg(iter, frame, mask); \
> > > +	     iter < frame->out_stack_arg_depth / BPF_REG_SIZE;              \
> > > +	     iter++, reg = bpf_get_spilled_stack_arg(iter, frame, mask))
> > > +
> > >   #define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __mask, __expr)   \
> > >   	({                                                               \
> > >   		struct bpf_verifier_state *___vstate = __vst;            \
> > > @@ -525,6 +540,11 @@ struct bpf_verifier_state {
> > >   					continue;                        \
> > >   				(void)(__expr);                          \
> > >   			}                                                \
> > > +			bpf_for_each_spilled_stack_arg(___j, __state, __reg, __mask) { \
> > > +				if (!__reg)                              \
> > > +					continue;                        \
> > > +				(void)(__expr);                          \
> > > +			}						 \
> > >   		}                                                        \
> > >   	})
> > Tangential nit: I think this macro is getting a bit too complicated,
> > we might want to introduce some proper reg_state iterator at some
> > point, e.g.:
> > 
> >    struct reg_iter it = new_reg_iter(state);
> >    while ((reg = next_reg(&it))) { ... }
> 
> You mean have a static function with proper arguments and do the above?
> I guess can do a followup later to simplify it.

Yes, a structure describing an iterator over all
registers/spills/stack-based arguments and two functions:
one for initialization and one for advancing the iterator.
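
A minimal sketch of such an iterator, under heavy simplifications:
it walks a single frame (the real macro walks every frame of a
verifier state), the spill area is modeled as a plain array of
register states, and the 'mask' filtering is reduced to skipping
NOT_INIT slots. All type layouts and the names reg_iter,
reg_iter_phase, new_reg_iter() and next_reg() are hypothetical.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel definitions, illustration only. */
#define MAX_BPF_REG  11
#define BPF_REG_SIZE 8

enum bpf_reg_type { NOT_INIT = 0, SCALAR_VALUE };

struct bpf_reg_state {
	enum bpf_reg_type type;
};

struct bpf_func_state {
	struct bpf_reg_state regs[MAX_BPF_REG];
	struct bpf_reg_state *spilled;		/* one entry per stack slot */
	int allocated_stack;			/* in bytes */
	struct bpf_reg_state *stack_arg_regs;
	int out_stack_arg_depth;		/* in bytes */
};

enum reg_iter_phase { ITER_REGS, ITER_SPILLS, ITER_STACK_ARGS, ITER_DONE };

struct reg_iter {
	struct bpf_func_state *frame;
	enum reg_iter_phase phase;
	int idx;
};

static inline struct reg_iter new_reg_iter(struct bpf_func_state *frame)
{
	struct reg_iter it = { .frame = frame, .phase = ITER_REGS, .idx = 0 };

	return it;
}

/* Return the next register state, or NULL once everything was visited. */
static inline struct bpf_reg_state *next_reg(struct reg_iter *it)
{
	struct bpf_func_state *f = it->frame;

	while (it->phase != ITER_DONE) {
		struct bpf_reg_state *base;
		struct bpf_reg_state *reg;
		int limit;

		switch (it->phase) {
		case ITER_REGS:
			base = f->regs;
			limit = MAX_BPF_REG;
			break;
		case ITER_SPILLS:
			base = f->spilled;
			limit = f->allocated_stack / BPF_REG_SIZE;
			break;
		case ITER_STACK_ARGS:
			base = f->stack_arg_regs;
			limit = f->out_stack_arg_depth / BPF_REG_SIZE;
			break;
		default:
			return NULL;
		}
		if (it->idx >= limit) {
			/* Current area exhausted, move on to the next one. */
			it->phase++;
			it->idx = 0;
			continue;
		}
		reg = &base[it->idx++];
		/* Registers are always visited, empty slots are skipped. */
		if (it->phase == ITER_REGS || reg->type != NOT_INIT)
			return reg;
	}
	return NULL;
}
```

With something like this the body of bpf_for_each_reg_in_vstate_mask()
could shrink to a single 'while ((reg = next_reg(&it)))' loop per
frame.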

[...]

> > > @@ -1378,9 +1382,21 @@ int bpf_fixup_call_args(struct bpf_verifier_env *env)
> > >   	struct bpf_prog *prog = env->prog;
> > >   	struct bpf_insn *insn = prog->insnsi;
> > >   	bool has_kfunc_call = bpf_prog_has_kfunc_call(prog);
> > > -	int i, depth;
> > > +	int depth;
> > >   #endif
> > > -	int err = 0;
> > > +	int i, err = 0;
> > > +
> > > +	for (i = 0; i < env->subprog_cnt; i++) {
> > > +		struct bpf_subprog_info *subprog = &env->subprog_info[i];
> > > +		u16 outgoing = subprog->stack_arg_depth - subprog->incoming_stack_arg_depth;
> > > +
> > > +		if (subprog->max_out_stack_arg_depth > outgoing) {
> > > +			verbose(env,
> > > +				"func#%d writes stack arg slot at depth %u, but calls only require %u bytes\n",
> > > +				i, subprog->max_out_stack_arg_depth, outgoing);
> > > +			return -EINVAL;
> > Is this an internal error condition?
> > If it is, maybe use verifier_bug()?
> 
> It is not. For example,
> 
> SEC("tc")
> __description("stack_arg: write unused stack arg slot")
> __failure
> __msg("func#0 writes stack arg slot at depth 40, but calls only require 16 bytes")
> __naked void stack_arg_write_unused_slot(void)
> {
>          asm volatile (
>                  "r1 = 1;"
>                  "r2 = 2;"
>                  "r3 = 3;"
>                  "r4 = 4;"
>                  "r5 = 5;"
>                  /* Write to offset -40, unused for the callee */
>                  "*(u64 *)(r11 - 40) = 99;"
>                  "*(u64 *)(r11 - 16) = 20;"
>                  "*(u64 *)(r11 - 8) = 10;"
>                  "call subprog_7args;"
>                  "r0 = 0;"
>                  "exit;"
>                  ::: __clobber_all
>          );
> }

But this is a very partial check: max_out_stack_arg_depth is
computed per-subprogram, not per-call. As far as I understand the
design, it can't be computed per-call at all. Meaning that if there
are, say, two calls:
- foo(1,2,3,4,5,6,7)   // where foo expects only 6 parameters
- bar(1,2,3,4,5,6,7,8) // where bar expects only 7 parameters

In this case:
- Verifier won't know which of the two calls is bogus, so it won't be
  able to point the user to the instruction where the error occurs.
- This is not a safety condition, meaning that kernel state is not
  broken if more arguments are pushed onto the stack (and if it *is* a
  safety condition, then we need to figure out something to check
  both calls above).
  
Thus, I'd suggest not to check this property at all.
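
To make the "partial check" point concrete, here is a toy model of
the aggregated comparison. It is not the verifier code: struct
call_site and subprog_check_rejects() are hypothetical, and I assume
that 'outgoing' ends up as the largest stack arg area required by any
callee of the subprogram.

```c
#include <stdbool.h>

/* Hypothetical per-call summary, both values in bytes. */
struct call_site {
	int written_bytes;	/* deepest r11-relative store before the call */
	int required_bytes;	/* stack arg area the callee actually expects */
};

/*
 * Model of the per-subprogram check: only the two maxima are compared,
 * so the result carries no information about individual call sites.
 */
static bool subprog_check_rejects(const struct call_site *calls, int n)
{
	int max_written = 0, max_required = 0;

	for (int i = 0; i < n; i++) {
		if (calls[i].written_bytes > max_written)
			max_written = calls[i].written_bytes;
		if (calls[i].required_bytes > max_required)
			max_required = calls[i].required_bytes;
	}
	return max_written > max_required;
}
```

Under these assumptions, the foo/bar pair above, { {16, 8}, {24, 16} },
is rejected without naming a call site, while { {16, 8}, {16, 16} }
is accepted even though the first call writes one slot more than its
callee needs.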

[...]

> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -1361,6 +1361,18 @@ static int copy_stack_state(struct bpf_func_state *dst, const struct bpf_func_st
> > >   		return -ENOMEM;
> > >   
> > >   	dst->allocated_stack = src->allocated_stack;
> > > +
> > > +	/* copy stack args state */
> > > +	n = src->out_stack_arg_depth / BPF_REG_SIZE;
> > > +	if (n) {
> > > +		dst->stack_arg_regs = copy_array(dst->stack_arg_regs, src->stack_arg_regs, n,
> > > +						 sizeof(struct bpf_reg_state),
> > > +						 GFP_KERNEL_ACCOUNT);
> > > +		if (!dst->stack_arg_regs)
> > > +			return -ENOMEM;
> > > +	}
> > > +
> > > +	dst->out_stack_arg_depth = src->out_stack_arg_depth;
> > Given that this is capped by 12, does it make sense to maintain the counter?
> > It might be simpler to always allocate an array of 12 elements.
> 
> The number of stack arguments is at most 7. So yes, we can do it.

Note from a short discussion with Alexei today:
he does not think this is a big deal and also thinks that saving some
space by allocating this array only when necessary would be a plus.
I, on the other hand, still think that growing this dynamically is an
over-complication.

[...]

> > > @@ -4417,6 +4446,109 @@ static int check_stack_write(struct bpf_verifier_env *env,
> > >   	return err;
> > >   }
> > >   
> > > +/*
> > > + * Write a value to the outgoing stack arg area.
> > > + * off is a negative offset from r11 (e.g. -8 for arg6, -16 for arg7).
> > > + */
> > > +static int check_stack_arg_write(struct bpf_verifier_env *env, struct bpf_func_state *state,

[...]

> > > +	/* Track the max outgoing stack arg access depth. */
> > > +	if (-off > subprog->max_out_stack_arg_depth)
> > > +		subprog->max_out_stack_arg_depth = -off;
> > > +
> > > +	cur = env->cur_state->frame[env->cur_state->curframe];
> > > +	if (value_regno >= 0) {
> > > +		state->stack_arg_regs[spi] = cur->regs[value_regno];
> > Nit: there is copy_register_state(), we should either use it here or
> > drop it and replace with direct assignments everywhere.
> 
> Will use copy_register_state() to be consistent with our examples.

This is the second time this issue has been raised on the mailing list,
so it might be worth having a small preparatory patch removing
this function. It had a non-empty body once, but now it is truly
useless. Wdyt?

[...]

> > > +/*
> > > + * Read a value from the incoming stack arg area.
> > > + * off is a positive offset from r11 (e.g. +8 for arg6, +16 for arg7).
> > > + */
> > > +static int check_stack_arg_read(struct bpf_verifier_env *env, struct bpf_func_state *state,
> > > +				int off, int dst_regno)
> > > +{
> > > +	struct bpf_subprog_info *subprog = &env->subprog_info[state->subprogno];
> > > +	struct bpf_verifier_state *vstate = env->cur_state;
> > > +	int spi = off / BPF_REG_SIZE - 1;
> > > +	struct bpf_func_state *caller, *cur;
> > > +	struct bpf_reg_state *arg;
> > > +
> > > +	if (state->no_stack_arg_load) {
> > > +		verbose(env, "r11 load must be before any r11 store or call insn\n");
> > > +		return -EINVAL;
> > > +	}
> > I think the error message should be inverted, store should precede the load.
> > But tbh, I'd drop it altogether, the check right below should be sufficient.
> 
> This is necessary. See
> 
> SEC("tc")
> __description("stack_arg: r11 load after r11 store")
> __failure
> __msg("r11 load must be before any r11 store or call insn")
> __naked void stack_arg_load_after_store(void)
> {
>          asm volatile (
>                  "r1 = 1;"
>                  "r2 = 2;"
>                  "r3 = 3;"
>                  "r4 = 4;"
>                  "r5 = 5;"
>                  "*(u64 *)(r11 - 8) = 6;"
>                  "r0 = *(u64 *)(r11 + 8);"
>                  "call subprog_6args;"
>                  "exit;"
>                  ::: __clobber_all
>          );
> }
>          
> SEC("tc")
> __description("stack_arg: r11 load after a call")
> __failure
> __msg("r11 load must be before any r11 store or call insn")
> __naked void stack_arg_load_after_call(void)
> {
>          asm volatile (
>                  "call %[bpf_get_prandom_u32];"
>                  "r0 = *(u64 *)(r11 + 8);"
>                  "exit;"
>                  :: __imm(bpf_get_prandom_u32)
>                  : __clobber_all
>          );
> }
> 
> > 
> > > +
> > > +	if (off > subprog->incoming_stack_arg_depth) {
> > > +		verbose(env, "invalid read from stack arg off %d depth %d\n",
> > > +			off, subprog->incoming_stack_arg_depth);
> > > +		return -EACCES;
> > > +	}
> 
> This is for this kind of failure:
> 
> SEC("tc")
> __description("stack_arg: read from uninitialized stack arg slot")
> __failure
> __msg("invalid read from stack arg off 8 depth 0")
> __naked void stack_arg_read_uninitialized(void)
> {
>          asm volatile (
>                  "r0 = *(u64 *)(r11 + 8);"
>                  "r0 = 0;"
>                  "exit;"
>                  ::: __clobber_all
>          );
> }

Consider your first example:

    > __naked void stack_arg_load_after_store(void)
    > {
    >          asm volatile (
    >                  "r1 = 1;"
    >                  "r2 = 2;"
    >                  "r3 = 3;"
    >                  "r4 = 4;"
    >                  "r5 = 5;"
    >                  "*(u64 *)(r11 - 8) = 6;"
    >                  "r0 = *(u64 *)(r11 + 8);"
                                     ^^^^^^^^^
wouldn't the second check 'if (off > subprog->incoming_stack_arg_depth)...'
be triggered here?

    >                  "call subprog_6args;"
    >                  "exit;"
    >                  ::: __clobber_all
    >          );
    > }

> > > +	caller = vstate->frame[vstate->curframe - 1];
> > > +	arg = &caller->stack_arg_regs[spi];
> > > +	cur = vstate->frame[vstate->curframe];
> > > +
> > > +	if (is_spillable_regtype(arg->type))
> > > +		copy_register_state(&cur->regs[dst_regno], arg);
> > > +	else
> > > +		mark_reg_unknown(env, cur->regs, dst_regno);
> > For stack writes we report error in such situations,
> > should the same be done here?
> 
> We should be fine here.

This is not a bug, sure, but it would be nice to have consistent
behavior for similar situations.

[...]
