From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-oa1-f50.google.com (mail-oa1-f50.google.com [209.85.160.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F0D376FC5 for ; Tue, 21 Apr 2026 00:37:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.50 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776731855; cv=none; b=OCE9402yBvyVDoiuLGBDC3TlGu9Wf+PbyXuKtjtUtyIygGezCnPyWaXD7fORy1CnIbkAaLGsPiZhv4C71V7XDXkC8D+iL048AIl6YjtSyinwhzMQn4EWjN8AV8Ji8+uw2StfMFgKQVXsEhd6VLZKR9mJPtrLHwT2xB0H1bJVUuU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776731855; c=relaxed/simple; bh=uIq8D8tx8NXe5dX69S5xbsWjNnLwxWzSn2rjSSnlVI4=; h=Mime-Version:Content-Type:Date:Message-Id:To:Cc:Subject:From: References:In-Reply-To; b=uv/GSvYv+JSNGzfGudnlQzYRGAtebf6CWSngYjU+QhWRzD6EvBFcAl69jEL9gXr7aRV+ldwQRA9Hx20QcimqTAL46CA60RGhriCAaEABMwdqeMyYERW8WO/dvZUTf7UkcgAFmCShQqZzQ1IfT67BEVJ/TH3RtQ/fKjzY0cIeGdI= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=KpWFqaoI; arc=none smtp.client-ip=209.85.160.50 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="KpWFqaoI" Received: by mail-oa1-f50.google.com with SMTP id 586e51a60fabf-4232323a7daso1694444fac.1 for ; Mon, 20 Apr 2026 17:37:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1776731853; x=1777336653; darn=vger.kernel.org; h=in-reply-to:references:from:subject:cc:to:message-id:date :content-transfer-encoding:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=azN1gj0qSctBQVr+7rpCulKYfD9ubmnLg2J1Uf3oy+M=; b=KpWFqaoII/aL0R8BYZE7reSUabH+hzpg4mZ1pFMDlFEyOTyUEHO4PNAITfUjwqbXrM mf+JU3tusyo1ekn1X3/L2z2lAXzXJboh1AMZowWEfmZulOTT4moiEqlC6F3CpfvXyuxE ApA0cE9Xj9wlFuIOdSx2NOtIKmlq71mKS+rSrNKhw23Kj+QQWUR6Rvt9oju+VvIXuqTJ IWFCvAfe3UC1Wfx7fBgGrdKRvKPPNPP482FFWw0k+E6u5jtRxt/5vZQAZ5Ro/feHTtv1 t4+uGCShOrwY/Ok0udVAlBpf9EtFlE8IguC3dMzcpP5bjM/j1JBCnyuvEagie7ZDUGAa gcMA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776731853; x=1777336653; h=in-reply-to:references:from:subject:cc:to:message-id:date :content-transfer-encoding:mime-version:x-gm-gg:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=azN1gj0qSctBQVr+7rpCulKYfD9ubmnLg2J1Uf3oy+M=; b=bdj6APSQYdD+ZY5plR6qLdXHr/XxGgk7/idp+emMqqiQXC0RgIrIH8J7aac9CmiJDV 55e1NgfAcQT5+R9apdjdkwjz1RaXFYuYtX0+Gh9dfC2xPayHyXwfOp3KYjMQPY88TdU3 8V/tnPHIyp4s5e++0BOSn21H+6Lfj7IQkyHkoORcLuYM/El5SnUy+swqwXkq8nmBQaEa us9vClS6iBBBr4wE8VlWXCQV2nsrzS1RMVGxAsLQdi99S5IoX//GKA4lnXuaWQuuCYKl ILHCueZRkja8xAWbhDKVjg3QPLBciTCAU7h4F24z8ILKR45yRBLxScsTHYqDfRfX8sJ3 uweQ== X-Forwarded-Encrypted: i=1; AFNElJ/7H3JcxDPmJfXJRJjZBMK5nv67rw1O6WAPnDee5gZkHCB5m8lmqwtJ594gzliP/9J55cc=@vger.kernel.org X-Gm-Message-State: AOJu0Yygt+ZgtzZN/0jbjSK7VgytE3v5cEV3Rr5g4UzKO9YWNeY6E/34 I52DYtJyyuDML3MxuuKAhzjQGNOCLQjFmmMNRGFyQtWXCZV+GX7XiIjH X-Gm-Gg: AeBDieverCcF1fbZih563FvHMyIxuykzTGI1jVMq740LwdSnJgMMvXxH+hWcWLnVYe5 WERoK7PKnTjmb6T+iX90DsT6sHL0S5ygFaN3b8GL1NLFQCCYqQfPL0lw/+WS6dvfbPdv8LJ9m37 9i9b3vKBO6T/Qp/lufutKIEIgI6Gj4prdKJfbI2Iew7rQSk5Kxv+kLi1ixiABtC2MXfVdjCCb2i +qz7/RDboQ5DxwvrXkewy4Mpn7XzOa/J7OHswVZruIglJ0mGra/OxUfGT6eji2w8ho8apZNVaym zhkhP96LJ25OVYaSb6P+/784tt3RZW+INUyKQEE0I8f578W0rXSe3B0j82x3IMawHtaS+17nkoN 93iKJzH6oOz0ItuqZLz44LvCM1smfrV6aHg5n8l8oH0OCrNYLwmxobrXuL4j/ClqZzJdvC1cEHZ pcXiIebhyEYjo3ZEr81PzWZKJdU4nyevddRtyLNF7yYTH40h4JeRKbnHnLYgz0fqAL4Eon9WDIE IZCcJNJkfpG4k9DuEae4wNsAxk= X-Received: by 2002:a05:6870:d154:b0:417:392e:4e67 with SMTP id 586e51a60fabf-42aded189acmr9042061fac.20.1776731852591; Mon, 20 Apr 2026 17:37:32 -0700 (PDT) Received: from localhost ([2a03:2880:10ff:4::]) by smtp.gmail.com with ESMTPSA id 586e51a60fabf-42b934a1a13sm10903748fac.12.2026.04.20.17.37.30 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 20 Apr 2026 17:37:31 -0700 (PDT) Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=UTF-8 Date: Mon, 20 Apr 2026 17:37:30 -0700 Message-Id: To: "Yonghong Song" , Cc: "Alexei Starovoitov" , "Andrii Nakryiko" , "Daniel Borkmann" , "Jose E . Marchesi" , , "Martin KaFai Lau" Subject: Re: [PATCH bpf-next v6 07/17] bpf: Support stack arguments for bpf functions From: "Alexei Starovoitov" X-Mailer: aerc References: <20260419163316.731019-1-yonghong.song@linux.dev> <20260419163352.734115-1-yonghong.song@linux.dev> In-Reply-To: <20260419163352.734115-1-yonghong.song@linux.dev> On Sun Apr 19, 2026 at 9:33 AM PDT, Yonghong Song wrote: > Currently BPF functions (subprogs) are limited to 5 register arguments. > With [1], the compiler can emit code that passes additional arguments > via a dedicated stack area through bpf register BPF_REG_PARAMS (r11), > introduced in the previous patch. > > The compiler uses positive r11 offsets for incoming (callee-side) args > and negative r11 offsets for outgoing (caller-side) args, following the > x86_64/arm64 calling convention direction. There is an 8-byte gap at > offset 0 separating the two regions: > Incoming (callee reads): r11+8 (arg6), r11+16 (arg7), ... > Outgoing (caller writes): r11-8 (arg6), r11-16 (arg7), ... This part looks clean now. > A per-state bitmask out_stack_arg_mask tracks which outgoing stack arg > slots have been written on the current path. Each bit corresponds to > an outgoing slot index (bit 0 =3D r11-8 =3D arg6, bit 1 =3D r11-16 =3D ar= g7, > etc.). At a call site, the verifier checks that all slots required by > the callee have their corresponding mask bits set. This enables > precise per-path tracking: if one branch of a conditional writes arg6 > but another does not, the mask correctly reflects the difference and > the verifier rejects the uninitialized path. The mask is included in > stack_arg_safe() so that states with different sets of initialized > slots are not incorrectly pruned together. But this part I don't understand. why do you need this bitmask? Even when they're written out of order stack_arg_depth is all you need. Then compare old->stack_arg_regs vs cur->stack_arg_regs. If one is not written its state will be NOT_INIT. so *(u64 *)(r11 - 16) =3D r7; // without *(u64 *)(r11 - 8) =3D r6; call bar1; // arg6 =3D r6, arg7 =3D r7 will fail the verification. > @@ -1669,6 +1669,8 @@ struct bpf_prog_aux { > u32 max_pkt_offset; > u32 max_tp_access; > u32 stack_depth; > + u16 incoming_stack_arg_depth; > + u16 stack_arg_depth; /* both incoming and max outgoing of stack argumen= ts */ these two are ok. I'm assuming you don't want JIT to recompute them. > u32 id; > u32 func_cnt; /* used by non-func prog as the number of func progs */ > u32 real_func_cnt; /* includes hidden progs, only used for JIT and free= ing progs */ > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h > index 9fbbddc40d21..bb6d8cab3a35 100644 > --- a/include/linux/bpf_verifier.h > +++ b/include/linux/bpf_verifier.h > @@ -372,6 +372,11 @@ struct bpf_func_state { > * `stack`. allocated_stack is always a multiple of BPF_REG_SIZE. > */ > int allocated_stack; > + > + u16 stack_arg_depth; /* Size of incoming + max outgoing stack args in b= ytes. */ > + u16 incoming_stack_arg_depth; /* Size of incoming stack args in bytes. = */ but incoming_stack_arg_depth looks odd. Callee should be accessing caller's stack_arg_depth and caller's stack_arg_regs. > + u16 out_stack_arg_mask; /* Bitmask of outgoing stack arg slots that hav= e been written. */ > + struct bpf_reg_state *stack_arg_regs; /* On-stack arguments */ > }; > =20 > #define MAX_CALL_FRAMES 8 > @@ -508,6 +513,17 @@ struct bpf_verifier_state { > iter < frame->allocated_stack / BPF_REG_SIZE; \ > iter++, reg =3D bpf_get_spilled_reg(iter, frame, mask)) > =20 > +#define bpf_get_spilled_stack_arg(slot, frame, mask) \ > + ((((slot) < frame->stack_arg_depth / BPF_REG_SIZE) && \ > + (frame->stack_arg_regs[slot].type !=3D NOT_INIT)) \ > + ? &frame->stack_arg_regs[slot] : NULL) > + > +/* Iterate over 'frame', setting 'reg' to either NULL or a spilled stack= arg. */ > +#define bpf_for_each_spilled_stack_arg(iter, frame, reg, mask) \ > + for (iter =3D 0, reg =3D bpf_get_spilled_stack_arg(iter, frame, mask); = \ > + iter < frame->stack_arg_depth / BPF_REG_SIZE; \ > + iter++, reg =3D bpf_get_spilled_stack_arg(iter, frame, mask)) > + > #define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __mask, _= _expr) \ > ({ \ > struct bpf_verifier_state *___vstate =3D __vst; \ > @@ -525,6 +541,11 @@ struct bpf_verifier_state { > continue; \ > (void)(__expr); \ > } \ > + bpf_for_each_spilled_stack_arg(___j, __state, __reg, __mask) { \ > + if (!__reg) \ > + continue; \ > + (void)(__expr); \ > + } \ > } \ > }) > =20 > @@ -739,10 +760,13 @@ struct bpf_subprog_info { > bool keep_fastcall_stack: 1; > bool changes_pkt_data: 1; > bool might_sleep: 1; > - u8 arg_cnt:3; > + u8 arg_cnt:4; > =20 > enum priv_stack_mode priv_stack_mode; > - struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS]; > + struct bpf_subprog_arg_info args[MAX_BPF_FUNC_ARGS]; > + u16 incoming_stack_arg_depth; > + u16 outgoing_stack_arg_depth; > + u16 max_out_stack_arg_depth; but you already have them in prog_aux?! another copy in bpf_subprog_info?! Remove one of them. JIT only need one set. > }; > =20 > struct bpf_verifier_env; > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c > index a62d78581207..c5f3aa05d5a3 100644 > --- a/kernel/bpf/btf.c > +++ b/kernel/bpf/btf.c > @@ -7887,13 +7887,19 @@ int btf_prepare_func_args(struct bpf_verifier_env= *env, int subprog) > } > args =3D (const struct btf_param *)(t + 1); > nargs =3D btf_type_vlen(t); > - if (nargs > MAX_BPF_FUNC_REG_ARGS) { > - if (!is_global) > - return -EINVAL; > - bpf_log(log, "Global function %s() with %d > %d args. Buggy compiler.\= n", > + if (nargs > MAX_BPF_FUNC_ARGS) { > + bpf_log(log, "Function %s() with %d > %d args not supported.\n", > + tname, nargs, MAX_BPF_FUNC_ARGS); > + return -EINVAL; > + } > + if (is_global && nargs > MAX_BPF_FUNC_REG_ARGS) { > + bpf_log(log, "Global function %s() with %d > %d args not supported.\n"= , > tname, nargs, MAX_BPF_FUNC_REG_ARGS); > return -EINVAL; > } > + if (nargs > MAX_BPF_FUNC_REG_ARGS) > + sub->incoming_stack_arg_depth =3D (nargs - MAX_BPF_FUNC_REG_ARGS) * BP= F_REG_SIZE; > + > /* check that function is void or returns int, exception cb also requir= es this */ > t =3D btf_type_by_id(btf, t->type); > while (btf_type_is_modifier(t)) > diff --git a/kernel/bpf/fixups.c b/kernel/bpf/fixups.c > index fba9e8c00878..c4e0224ad2f2 100644 > --- a/kernel/bpf/fixups.c > +++ b/kernel/bpf/fixups.c > @@ -1123,6 +1123,9 @@ static int jit_subprogs(struct bpf_verifier_env *en= v) > =20 > func[i]->aux->name[0] =3D 'F'; > func[i]->aux->stack_depth =3D env->subprog_info[i].stack_depth; > + func[i]->aux->incoming_stack_arg_depth =3D env->subprog_info[i].incomi= ng_stack_arg_depth; > + func[i]->aux->stack_arg_depth =3D env->subprog_info[i].incoming_stack_= arg_depth + > + env->subprog_info[i].outgoing_stack_arg_depth; > if (env->subprog_info[i].priv_stack_mode =3D=3D PRIV_STACK_ADAPTIVE) > func[i]->aux->jits_use_priv_stack =3D true; > =20 > @@ -1301,8 +1304,10 @@ int bpf_jit_subprogs(struct bpf_verifier_env *env) > struct bpf_insn_aux_data *orig_insn_aux; > u32 *orig_subprog_starts; > =20 > - if (env->subprog_cnt <=3D 1) > + if (env->subprog_cnt <=3D 1) { > + env->prog->aux->stack_arg_depth =3D env->subprog_info[0].outgoing_stac= k_arg_depth; > return 0; > + } > =20 > prog =3D orig_prog =3D env->prog; > if (bpf_prog_need_blind(prog)) { > @@ -1378,9 +1383,20 @@ int bpf_fixup_call_args(struct bpf_verifier_env *e= nv) > struct bpf_prog *prog =3D env->prog; > struct bpf_insn *insn =3D prog->insnsi; > bool has_kfunc_call =3D bpf_prog_has_kfunc_call(prog); > - int i, depth; > + int depth; > #endif > - int err =3D 0; > + int i, err =3D 0; > + > + for (i =3D 0; i < env->subprog_cnt; i++) { > + struct bpf_subprog_info *subprog =3D &env->subprog_info[i]; > + > + if (subprog->max_out_stack_arg_depth > subprog->outgoing_stack_arg_dep= th) { > + verbose(env, > + "func#%d writes stack arg slot at depth %u, but calls only require %= u bytes\n", > + i, subprog->max_out_stack_arg_depth, subprog->outgoing_stack_arg_dep= th); > + return -EINVAL; > + } > + } > =20 > if (env->prog->jit_requested && > !bpf_prog_is_offloaded(env->prog->aux)) { > diff --git a/kernel/bpf/states.c b/kernel/bpf/states.c > index 8478d2c6ed5b..235841d23fe3 100644 > --- a/kernel/bpf/states.c > +++ b/kernel/bpf/states.c > @@ -838,6 +838,44 @@ static bool stacksafe(struct bpf_verifier_env *env, = struct bpf_func_state *old, > return true; > } > =20 > +/* > + * Compare stack arg slots between old and current states. > + * Outgoing stack args are path-local state and must agree for pruning. > + */ > +static bool stack_arg_safe(struct bpf_verifier_env *env, struct bpf_func= _state *old, > + struct bpf_func_state *cur, struct bpf_idmap *idmap, > + enum exact_level exact) > +{ > + int i, nslots; > + > + if (old->incoming_stack_arg_depth !=3D cur->incoming_stack_arg_depth) > + return false; > + > + /* Compare both incoming and outgoing stack arg slots. */ > + if (old->stack_arg_depth !=3D cur->stack_arg_depth) > + return false; > + > + if (old->out_stack_arg_mask !=3D cur->out_stack_arg_mask) > + return false; shouldn't be neccessary. > + > + nslots =3D old->stack_arg_depth / BPF_REG_SIZE; > + for (i =3D 0; i < nslots; i++) { > + struct bpf_reg_state *old_arg =3D &old->stack_arg_regs[i]; > + struct bpf_reg_state *cur_arg =3D &cur->stack_arg_regs[i]; > + > + if (old_arg->type =3D=3D NOT_INIT && cur_arg->type =3D=3D NOT_INIT) > + continue; > + > + if (exact =3D=3D EXACT && old_arg->type !=3D cur_arg->type) > + return false; > + > + if (!regsafe(env, old_arg, cur_arg, idmap, exact)) > + return false; > + } > + > + return true; > +} > + > static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_= state *cur, > struct bpf_idmap *idmap) > { > @@ -929,6 +967,9 @@ static bool func_states_equal(struct bpf_verifier_env= *env, struct bpf_func_stat > if (!stacksafe(env, old, cur, &env->idmap_scratch, exact)) > return false; > =20 > + if (!stack_arg_safe(env, old, cur, &env->idmap_scratch, exact)) > + return false; > + > return true; > } > =20 > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 6aa4dc161a56..78c9322870a5 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -1340,6 +1340,20 @@ static int copy_stack_state(struct bpf_func_state = *dst, const struct bpf_func_st > return -ENOMEM; > =20 > dst->allocated_stack =3D src->allocated_stack; > + > + /* copy stack args state */ > + n =3D src->stack_arg_depth / BPF_REG_SIZE; > + if (n) { > + dst->stack_arg_regs =3D copy_array(dst->stack_arg_regs, src->stack_arg= _regs, n, > + sizeof(struct bpf_reg_state), > + GFP_KERNEL_ACCOUNT); copy is unnecessary. > @@ -4220,6 +4254,13 @@ static int check_stack_read_fixed_off(struct bpf_v= erifier_env *env, > } > if (type =3D=3D STACK_INVALID && env->allow_uninit_stack) > continue; > + /* > + * Cross-frame reads may hit slots poisoned by dead code eliminatio= n. > + * Static liveness can't track indirect references through pointers= , > + * so allow the read conservatively. > + */ > + if (type =3D=3D STACK_POISON && reg_state !=3D state) > + continue; wait what? Is this a real issue? Are you saying you have a test prog that passes FP derived pointers in arg 6, 7 and arg_tracking cannot detect = it? Then it should be added properly in arg_tracking. This hack to allow reading poisoned slots is not ok. This is a serious issue. pw-bot: cr