From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-186.mta1.migadu.com (out-186.mta1.migadu.com [95.215.58.186]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BF0B52989B0 for ; Tue, 21 Apr 2026 04:15:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.186 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776744925; cv=none; b=lV1/ocu3tFvYsIhj0XVPTOneoJoxVMyg79sYQRGmiWiyzj1P+hMPbKazxsFKJxCi+EMtLUMV9ONo3UzREhGqw3BbbtaHFiSgJJMZXR3MnGStuGnH4qx1sO2hDQm1W/ljx625nawfOby5oDIiunt2uc9QgkdEmZ0G+v+AQ2VckoI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776744925; c=relaxed/simple; bh=HlZcS+7drJ5Y0ZOwVmaR05uNc4S2Tce84y+aZ2QQNwM=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=HTasXuKBtwJd7LVGn3dNeR5wCMTzScnn5QJJOTWfF4VqFz/V4wjVBnryg/JrWTEeGjeeZHeMQ5Ae3FbY1iK8D6QP3uKJbc9UidlSrIUlwVuxB0yifeA5Q6mVay2hqrZzQ8Brs7neNiyx/et3G6KHozYaeqwIas2UFrgf/Wwm8EU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=h4KWC3Or; arc=none smtp.client-ip=95.215.58.186 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="h4KWC3Or" Message-ID: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1776744921; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MPXA1KymVxU4NwMQzNBTyzNjhWYYKFTKUkUyRMeLkRI=; b=h4KWC3OrmD75fe5SlJ4Sb04VTj+hend+kw2cWLj3Gt9qZoSj7Ue2PATad1eZ4jc5YyzJx9 tWwM44Dnf97PU8p+4smoG336kGpOl8PaKj83XLPkGrjzZXe+XogbqH3rGBye3mPU9hP52B a9ImdRcAYDzJ0GBZnnCSikhQZttiigk= Date: Mon, 20 Apr 2026 21:15:13 -0700 Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: [PATCH bpf-next v6 07/17] bpf: Support stack arguments for bpf functions Content-Language: en-GB To: Alexei Starovoitov , bpf@vger.kernel.org Cc: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , "Jose E . Marchesi" , kernel-team@fb.com, Martin KaFai Lau References: <20260419163316.731019-1-yonghong.song@linux.dev> <20260419163352.734115-1-yonghong.song@linux.dev> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Yonghong Song In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Migadu-Flow: FLOW_OUT On 4/20/26 5:37 PM, Alexei Starovoitov wrote: > On Sun Apr 19, 2026 at 9:33 AM PDT, Yonghong Song wrote: >> Currently BPF functions (subprogs) are limited to 5 register arguments. >> With [1], the compiler can emit code that passes additional arguments >> via a dedicated stack area through bpf register BPF_REG_PARAMS (r11), >> introduced in the previous patch. >> >> The compiler uses positive r11 offsets for incoming (callee-side) args >> and negative r11 offsets for outgoing (caller-side) args, following the >> x86_64/arm64 calling convention direction. There is an 8-byte gap at >> offset 0 separating the two regions: >> Incoming (callee reads): r11+8 (arg6), r11+16 (arg7), ... >> Outgoing (caller writes): r11-8 (arg6), r11-16 (arg7), ... > This part looks clean now. > >> A per-state bitmask out_stack_arg_mask tracks which outgoing stack arg >> slots have been written on the current path. Each bit corresponds to >> an outgoing slot index (bit 0 = r11-8 = arg6, bit 1 = r11-16 = arg7, >> etc.). At a call site, the verifier checks that all slots required by >> the callee have their corresponding mask bits set. This enables >> precise per-path tracking: if one branch of a conditional writes arg6 >> but another does not, the mask correctly reflects the difference and >> the verifier rejects the uninitialized path. The mask is included in >> stack_arg_safe() so that states with different sets of initialized >> slots are not incorrectly pruned together. > But this part I don't understand. > why do you need this bitmask? > Even when they're written out of order stack_arg_depth is all you need. > Then compare old->stack_arg_regs vs cur->stack_arg_regs. > If one is not written its state will be NOT_INIT. > so > *(u64 *)(r11 - 16) = r7; > // without *(u64 *)(r11 - 8) = r6; > call bar1; // arg6 = r6, arg7 = r7 > > will fail the verification. I added bitmask to make it easy for early comparison for pruning since maintenance of bitmask is light. But I think this probably not necessary. Will remove. > >> @@ -1669,6 +1669,8 @@ struct bpf_prog_aux { >> u32 max_pkt_offset; >> u32 max_tp_access; >> u32 stack_depth; >> + u16 incoming_stack_arg_depth; >> + u16 stack_arg_depth; /* both incoming and max outgoing of stack arguments */ > these two are ok. I'm assuming you don't want JIT to recompute them. > >> u32 id; >> u32 func_cnt; /* used by non-func prog as the number of func progs */ >> u32 real_func_cnt; /* includes hidden progs, only used for JIT and freeing progs */ >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h >> index 9fbbddc40d21..bb6d8cab3a35 100644 >> --- a/include/linux/bpf_verifier.h >> +++ b/include/linux/bpf_verifier.h >> @@ -372,6 +372,11 @@ struct bpf_func_state { >> * `stack`. allocated_stack is always a multiple of BPF_REG_SIZE. >> */ >> int allocated_stack; >> + >> + u16 stack_arg_depth; /* Size of incoming + max outgoing stack args in bytes. */ >> + u16 incoming_stack_arg_depth; /* Size of incoming stack args in bytes. */ > but incoming_stack_arg_depth looks odd. > > Callee should be accessing caller's stack_arg_depth and > caller's stack_arg_regs. Okay. Will reduce local caching and get the value from the caller. > >> + u16 out_stack_arg_mask; /* Bitmask of outgoing stack arg slots that have been written. */ >> + struct bpf_reg_state *stack_arg_regs; /* On-stack arguments */ >> }; >> >> #define MAX_CALL_FRAMES 8 >> @@ -508,6 +513,17 @@ struct bpf_verifier_state { >> iter < frame->allocated_stack / BPF_REG_SIZE; \ >> iter++, reg = bpf_get_spilled_reg(iter, frame, mask)) >> >> +#define bpf_get_spilled_stack_arg(slot, frame, mask) \ >> + ((((slot) < frame->stack_arg_depth / BPF_REG_SIZE) && \ >> + (frame->stack_arg_regs[slot].type != NOT_INIT)) \ >> + ? &frame->stack_arg_regs[slot] : NULL) >> + >> +/* Iterate over 'frame', setting 'reg' to either NULL or a spilled stack arg. */ >> +#define bpf_for_each_spilled_stack_arg(iter, frame, reg, mask) \ >> + for (iter = 0, reg = bpf_get_spilled_stack_arg(iter, frame, mask); \ >> + iter < frame->stack_arg_depth / BPF_REG_SIZE; \ >> + iter++, reg = bpf_get_spilled_stack_arg(iter, frame, mask)) >> + >> #define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __mask, __expr) \ >> ({ \ >> struct bpf_verifier_state *___vstate = __vst; \ >> @@ -525,6 +541,11 @@ struct bpf_verifier_state { >> continue; \ >> (void)(__expr); \ >> } \ >> + bpf_for_each_spilled_stack_arg(___j, __state, __reg, __mask) { \ >> + if (!__reg) \ >> + continue; \ >> + (void)(__expr); \ >> + } \ >> } \ >> }) >> >> @@ -739,10 +760,13 @@ struct bpf_subprog_info { >> bool keep_fastcall_stack: 1; >> bool changes_pkt_data: 1; >> bool might_sleep: 1; >> - u8 arg_cnt:3; >> + u8 arg_cnt:4; >> >> enum priv_stack_mode priv_stack_mode; >> - struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS]; >> + struct bpf_subprog_arg_info args[MAX_BPF_FUNC_ARGS]; >> + u16 incoming_stack_arg_depth; >> + u16 outgoing_stack_arg_depth; >> + u16 max_out_stack_arg_depth; > but you already have them in prog_aux?! another copy in bpf_subprog_info?! > Remove one of them. JIT only need one set. Okay. > >> }; >> >> struct bpf_verifier_env; >> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c >> index a62d78581207..c5f3aa05d5a3 100644 >> --- a/kernel/bpf/btf.c >> +++ b/kernel/bpf/btf.c >> @@ -7887,13 +7887,19 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog) >> } >> args = (const struct btf_param *)(t + 1); >> nargs = btf_type_vlen(t); >> - if (nargs > MAX_BPF_FUNC_REG_ARGS) { >> - if (!is_global) >> - return -EINVAL; >> - bpf_log(log, "Global function %s() with %d > %d args. Buggy compiler.\n", >> + if (nargs > MAX_BPF_FUNC_ARGS) { >> + bpf_log(log, "Function %s() with %d > %d args not supported.\n", >> + tname, nargs, MAX_BPF_FUNC_ARGS); >> + return -EINVAL; >> + } >> + if (is_global && nargs > MAX_BPF_FUNC_REG_ARGS) { >> + bpf_log(log, "Global function %s() with %d > %d args not supported.\n", >> tname, nargs, MAX_BPF_FUNC_REG_ARGS); >> return -EINVAL; >> } >> + if (nargs > MAX_BPF_FUNC_REG_ARGS) >> + sub->incoming_stack_arg_depth = (nargs - MAX_BPF_FUNC_REG_ARGS) * BPF_REG_SIZE; >> + >> /* check that function is void or returns int, exception cb also requires this */ >> t = btf_type_by_id(btf, t->type); >> while (btf_type_is_modifier(t)) >> diff --git a/kernel/bpf/fixups.c b/kernel/bpf/fixups.c >> index fba9e8c00878..c4e0224ad2f2 100644 >> --- a/kernel/bpf/fixups.c >> +++ b/kernel/bpf/fixups.c >> @@ -1123,6 +1123,9 @@ static int jit_subprogs(struct bpf_verifier_env *env) >> >> func[i]->aux->name[0] = 'F'; >> func[i]->aux->stack_depth = env->subprog_info[i].stack_depth; >> + func[i]->aux->incoming_stack_arg_depth = env->subprog_info[i].incoming_stack_arg_depth; >> + func[i]->aux->stack_arg_depth = env->subprog_info[i].incoming_stack_arg_depth + >> + env->subprog_info[i].outgoing_stack_arg_depth; >> if (env->subprog_info[i].priv_stack_mode == PRIV_STACK_ADAPTIVE) >> func[i]->aux->jits_use_priv_stack = true; >> >> @@ -1301,8 +1304,10 @@ int bpf_jit_subprogs(struct bpf_verifier_env *env) >> struct bpf_insn_aux_data *orig_insn_aux; >> u32 *orig_subprog_starts; >> >> - if (env->subprog_cnt <= 1) >> + if (env->subprog_cnt <= 1) { >> + env->prog->aux->stack_arg_depth = env->subprog_info[0].outgoing_stack_arg_depth; >> return 0; >> + } >> >> prog = orig_prog = env->prog; >> if (bpf_prog_need_blind(prog)) { >> @@ -1378,9 +1383,20 @@ int bpf_fixup_call_args(struct bpf_verifier_env *env) >> struct bpf_prog *prog = env->prog; >> struct bpf_insn *insn = prog->insnsi; >> bool has_kfunc_call = bpf_prog_has_kfunc_call(prog); >> - int i, depth; >> + int depth; >> #endif >> - int err = 0; >> + int i, err = 0; >> + >> + for (i = 0; i < env->subprog_cnt; i++) { >> + struct bpf_subprog_info *subprog = &env->subprog_info[i]; >> + >> + if (subprog->max_out_stack_arg_depth > subprog->outgoing_stack_arg_depth) { >> + verbose(env, >> + "func#%d writes stack arg slot at depth %u, but calls only require %u bytes\n", >> + i, subprog->max_out_stack_arg_depth, subprog->outgoing_stack_arg_depth); >> + return -EINVAL; >> + } >> + } >> >> if (env->prog->jit_requested && >> !bpf_prog_is_offloaded(env->prog->aux)) { >> diff --git a/kernel/bpf/states.c b/kernel/bpf/states.c >> index 8478d2c6ed5b..235841d23fe3 100644 >> --- a/kernel/bpf/states.c >> +++ b/kernel/bpf/states.c >> @@ -838,6 +838,44 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, >> return true; >> } >> >> +/* >> + * Compare stack arg slots between old and current states. >> + * Outgoing stack args are path-local state and must agree for pruning. >> + */ >> +static bool stack_arg_safe(struct bpf_verifier_env *env, struct bpf_func_state *old, >> + struct bpf_func_state *cur, struct bpf_idmap *idmap, >> + enum exact_level exact) >> +{ >> + int i, nslots; >> + >> + if (old->incoming_stack_arg_depth != cur->incoming_stack_arg_depth) >> + return false; >> + >> + /* Compare both incoming and outgoing stack arg slots. */ >> + if (old->stack_arg_depth != cur->stack_arg_depth) >> + return false; >> + >> + if (old->out_stack_arg_mask != cur->out_stack_arg_mask) >> + return false; > shouldn't be neccessary. Okay. > >> + >> + nslots = old->stack_arg_depth / BPF_REG_SIZE; >> + for (i = 0; i < nslots; i++) { >> + struct bpf_reg_state *old_arg = &old->stack_arg_regs[i]; >> + struct bpf_reg_state *cur_arg = &cur->stack_arg_regs[i]; >> + >> + if (old_arg->type == NOT_INIT && cur_arg->type == NOT_INIT) >> + continue; >> + >> + if (exact == EXACT && old_arg->type != cur_arg->type) >> + return false; >> + >> + if (!regsafe(env, old_arg, cur_arg, idmap, exact)) >> + return false; >> + } >> + >> + return true; >> +} >> + >> static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *cur, >> struct bpf_idmap *idmap) >> { >> @@ -929,6 +967,9 @@ static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_stat >> if (!stacksafe(env, old, cur, &env->idmap_scratch, exact)) >> return false; >> >> + if (!stack_arg_safe(env, old, cur, &env->idmap_scratch, exact)) >> + return false; >> + >> return true; >> } >> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >> index 6aa4dc161a56..78c9322870a5 100644 >> --- a/kernel/bpf/verifier.c >> +++ b/kernel/bpf/verifier.c >> @@ -1340,6 +1340,20 @@ static int copy_stack_state(struct bpf_func_state *dst, const struct bpf_func_st >> return -ENOMEM; >> >> dst->allocated_stack = src->allocated_stack; >> + >> + /* copy stack args state */ >> + n = src->stack_arg_depth / BPF_REG_SIZE; >> + if (n) { >> + dst->stack_arg_regs = copy_array(dst->stack_arg_regs, src->stack_arg_regs, n, >> + sizeof(struct bpf_reg_state), >> + GFP_KERNEL_ACCOUNT); > copy is unnecessary. Okay. > >> @@ -4220,6 +4254,13 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env, >> } >> if (type == STACK_INVALID && env->allow_uninit_stack) >> continue; >> + /* >> + * Cross-frame reads may hit slots poisoned by dead code elimination. >> + * Static liveness can't track indirect references through pointers, >> + * so allow the read conservatively. >> + */ >> + if (type == STACK_POISON && reg_state != state) >> + continue; > wait what? Is this a real issue? Are you saying you have a test prog > that passes FP derived pointers in arg 6, 7 and arg_tracking cannot detect it? > Then it should be added properly in arg_tracking. > This hack to allow reading poisoned slots is not ok. > This is a serious issue. Okay. The reason is that callee is not getting the proper state from caller. I will do proper implementation here. > > pw-bot: cr