From: Anton Protopopov <a.s.protopopov@gmail.com>
To: Eduard Zingerman <eddyz87@gmail.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Anton Protopopov <aspsk@isovalent.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Quentin Monnet <qmo@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>
Subject: Re: [PATCH v3 bpf-next 08/13] bpf, x86: add support for indirect jumps
Date: Thu, 25 Sep 2025 18:07:27 +0000
Message-ID: <aNWE3x7SwgyTglAN@mail.gmail.com>
In-Reply-To: <61861bfd86d150b86c674ef7bea2b23e3482e1f2.camel@gmail.com>
On 25/09/19 05:28PM, Eduard Zingerman wrote:
> On Thu, 2025-09-18 at 09:38 +0000, Anton Protopopov wrote:
> > Add support for a new instruction
> >
> > BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=0
> >
> > which does an indirect jump to a location stored in Rx. The register
> > Rx should have type PTR_TO_INSN. This new type assures that the Rx
> > register contains a value (or a range of values) loaded from a
> > correct jump table – map of type instruction array.
> >
> > For example, for a C switch LLVM will generate the following code:
> >
> > 0: r3 = r1 # "switch (r3)"
> > 1: if r3 > 0x13 goto +0x666 # check r3 boundaries
> > 2: r3 <<= 0x3 # adjust to an index in array of addresses
> > 3: r1 = 0xbeef ll # r1 is PTR_TO_MAP_VALUE, r1->map_ptr=M
> > 5: r1 += r3 # r1 inherits boundaries from r3
> > 6: r1 = *(u64 *)(r1 + 0x0) # r1 now has type PTR_TO_INSN
> > 7: gotox r1[,imm=fd(M)] # jit will generate proper code
> ^^^^^^^^^^^^
> Nit: this part is not needed atm.
Thanks, removed.
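For readers following the example, here is a minimal userspace sketch of what insns 1..7 compute, written in plain C. It is only an illustration: jump_table, JT_ENTRIES, NO_TARGET and gotox_target() are made-up names, and the table holds plain numbers standing in for jump targets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define JT_ENTRIES 4
#define NO_TARGET  UINT64_MAX	/* hypothetical marker for the "default" branch */

/* Hypothetical jump table: entry i holds the target for case i. */
static const uint64_t jump_table[JT_ENTRIES] = { 10, 20, 30, 40 };

/* Returns the target the indirect jump would take for switch value r3. */
static uint64_t gotox_target(uint64_t r3)
{
	const unsigned char *r1;
	uint64_t target;

	if (r3 > JT_ENTRIES - 1)		/* insn 1: check r3 boundaries */
		return NO_TARGET;
	r3 <<= 3;				/* insn 2: index -> byte offset */
	r1 = (const unsigned char *)jump_table;	/* insn 3: table base */
	r1 += r3;				/* insn 5: offset bounded by r3 */

	/* The bounds check above keeps the 8-byte load inside the table. */
	assert(r3 + sizeof(target) <= sizeof(jump_table));
	memcpy(&target, r1, sizeof(target));	/* insn 6: u64 load */
	return target;				/* insn 7: gotox r1 */
}
```

With this table, gotox_target(3) yields 40, and any value above 3 falls through to the default marker.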
> >
> > Here the gotox instruction corresponds to one particular map. This is
> > possible however to have a gotox instruction which can be loaded from
> > different maps, e.g.
>
> [...]
>
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index aca43c284203..607a684642e5 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
>
> [...]
>
> > @@ -586,6 +597,9 @@ struct bpf_insn_aux_data {
> > u8 fastcall_spills_num:3;
> > u8 arg_prog:4;
> >
> > + /* true if jt->off was allocated */
> > + bool jt_allocated;
> > +
>
> Nit: in clear_insn_aux_data() maybe just check if instruction is a gotox?
Yes, this should work, thanks
>
> > /* below fields are initialized once */
> > unsigned int orig_idx; /* original instruction index */
> > bool jmp_point;
>
> [...]
>
> > static inline struct bpf_func_info_aux *subprog_aux(struct bpf_verifier_env *env, int subprog)
> > diff --git a/kernel/bpf/bpf_insn_array.c b/kernel/bpf/bpf_insn_array.c
> > index 0c8dac62f457..4b945b7e31b8 100644
> > --- a/kernel/bpf/bpf_insn_array.c
> > +++ b/kernel/bpf/bpf_insn_array.c
> > @@ -1,7 +1,6 @@
> > // SPDX-License-Identifier: GPL-2.0-only
> >
> > #include <linux/bpf.h>
> > -#include <linux/sort.h>
>
> Nit: remove this include from patch #3?
sure, thanks!
> >
> > #define MAX_INSN_ARRAY_ENTRIES 256
> >
>
> [...]
>
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 5c1e4e37d1f8..839260e62fa9 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
>
> [...]
>
> > @@ -7620,6 +7644,19 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
> >
> > regs[value_regno].type = SCALAR_VALUE;
> > __mark_reg_known(&regs[value_regno], val);
> > + } else if (map->map_type == BPF_MAP_TYPE_INSN_ARRAY) {
> > + regs[value_regno].type = PTR_TO_INSN;
> > + regs[value_regno].map_ptr = map;
> > + regs[value_regno].off = reg->off;
> > + regs[value_regno].umin_value = reg->umin_value;
> > + regs[value_regno].umax_value = reg->umax_value;
> > + regs[value_regno].smin_value = reg->smin_value;
> > + regs[value_regno].smax_value = reg->smax_value;
> > + regs[value_regno].s32_min_value = reg->s32_min_value;
> > + regs[value_regno].s32_max_value = reg->s32_max_value;
> > + regs[value_regno].u32_min_value = reg->u32_min_value;
> > + regs[value_regno].u32_max_value = reg->u32_max_value;
> > + regs[value_regno].var_off = reg->var_off;
>
> This can be shortened to:
>
> copy_register_state(regs + value_regno, reg);
> regs[value_regno].type = PTR_TO_INSN;
>
> I think that a check that read is u64 wide is necessary here.
> Otherwise e.g. for u8 load you'd need to truncate the bounds set above.
> This is also necessary for alignment check at the beginning of this
> function (check_ptr_alignment() call).
will fix, thanks!
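Roughly, the shape of the fix could look like the standalone model below. This is only a sketch under assumptions: struct reg is a cut-down stand-in for bpf_reg_state, the struct assignment stands in for copy_register_state(), and the -EACCES error code is my guess, not something decided in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum reg_type { SCALAR_VALUE, PTR_TO_MAP_VALUE, PTR_TO_INSN };

/* Cut-down stand-in for struct bpf_reg_state. */
struct reg {
	enum reg_type type;
	uint64_t umin, umax;
};

/*
 * Models a load of `size` bytes from an instruction array map value.
 * Only a full 8-byte load may carry the bounds over unchanged; a
 * narrower load would need them truncated, so it is rejected here.
 */
static int load_from_insn_array(const struct reg *src, int size,
				struct reg *dst)
{
	assert(src && dst);

	if (size != (int)sizeof(uint64_t))
		return -EACCES;		/* assumed error code */

	*dst = *src;			/* stands in for copy_register_state() */
	dst->type = PTR_TO_INSN;
	return 0;
}
```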
> > } else {
> > mark_reg_unknown(env, regs, value_regno);
> > }
>
> [...]
>
> > @@ -14628,6 +14672,11 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
> > }
> > break;
> > case BPF_SUB:
> > + if (ptr_to_insn_array) {
> > + verbose(env, "Operation %s on ptr to instruction set map is prohibited\n",
> > + bpf_alu_string[opcode >> 4]);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Nit: Just "subtraction", no need for lookup?
> Also, maybe put this near the same check for PTR_TO_STACK?
ok
>
> > + return -EACCES;
> > + }
> > if (dst_reg == off_reg) {
> > /* scalar -= pointer. Creates an unknown scalar */
> > verbose(env, "R%d tried to subtract pointer from scalar\n",
>
> [...]
>
> > @@ -17733,6 +17783,234 @@ static int mark_fastcall_patterns(struct bpf_verifier_env *env)
> > return 0;
> > }
> >
> > +#define SET_HIGH(STATE, LAST) STATE = (STATE & 0xffffU) | ((LAST) << 16)
> > +#define GET_HIGH(STATE) ((u16)((STATE) >> 16))
> > +
> > +static int push_gotox_edge(int t, struct bpf_verifier_env *env, struct bpf_iarray *jt)
> > +{
> > + int *insn_stack = env->cfg.insn_stack;
> > + int *insn_state = env->cfg.insn_state;
> > + u16 prev;
> > + int w;
> > +
> > + for (prev = GET_HIGH(insn_state[t]); prev < jt->off_cnt; prev++) {
> > + w = jt->off[prev];
> > +
> > + /* EXPLORED || DISCOVERED */
> > + if (insn_state[w])
> > + continue;
>
> Suppose there is some other way to reach `w` beside gotox.
> Also suppose that `w` had been visited already.
> In such case `mark_jmp_point(env, w)` might get omitted for `w`.
thanks
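One possible way to address this is to mark the jump point before the visited check, so an already-visited target is not skipped. The toy model below shows only that ordering; struct cfg, MAX_INSNS and next_unvisited() are hypothetical names, not verifier code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INSNS 16

/* Toy stand-in for the relevant parts of env->cfg and insn_aux_data. */
struct cfg {
	int insn_state[MAX_INSNS];	/* nonzero: DISCOVERED or EXPLORED */
	bool jmp_point[MAX_INSNS];
};

/*
 * Scan jump table targets off[start..cnt) and return the position of the
 * first target that has not been visited yet, or cnt if there is none.
 * Every target looked at is marked as a jump point, visited or not.
 */
static unsigned int next_unvisited(struct cfg *cfg, const unsigned int *off,
				   unsigned int cnt, unsigned int start)
{
	unsigned int i;

	for (i = start; i < cnt; i++) {
		unsigned int w = off[i];

		assert(w < MAX_INSNS);
		cfg->jmp_point[w] = true;	/* mark first ... */
		if (cfg->insn_state[w])		/* ... then skip if visited */
			continue;
		break;
	}
	return i;
}
```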
> > +
> > + break;
> > + }
> > +
> > + if (prev == jt->off_cnt)
> > + return DONE_EXPLORING;
> > +
> > + mark_prune_point(env, t);
>
> Nit: do this from visit_gotox_insn() ?
yes, ok
> > +
> > + if (env->cfg.cur_stack >= env->prog->len)
> > + return -E2BIG;
> > + insn_stack[env->cfg.cur_stack++] = w;
> > +
> > + mark_jmp_point(env, w);
> > +
> > + SET_HIGH(insn_state[t], prev + 1);
> > + return KEEP_EXPLORING;
> > +}
>
> [...]
>
> > +/*
> > + * Find and collect all maps which fit in the subprog. Return the result as one
> > + * combined jump table in jt->off (allocated with kvcalloc
> ^^^
> nit: missing ')'
>
> > + */
> > +static struct bpf_iarray *jt_from_subprog(struct bpf_verifier_env *env,
> > + int subprog_start, int subprog_end)
>
> [...]
>
> > +static struct bpf_iarray *
> > +create_jt(int t, struct bpf_verifier_env *env, int fd)
> ^^^^^^
> fd is unused, same for visit_gotox_insn()
>
> [...]
>
> > @@ -18716,6 +19001,10 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
> > return regs_exact(rold, rcur, idmap) && rold->frameno == rcur->frameno;
> > case PTR_TO_ARENA:
> > return true;
> > + case PTR_TO_INSN:
> > + /* is rcur a subset of rold? */
> > + return (rcur->umin_value >= rold->umin_value &&
> > + rcur->umax_value <= rold->umax_value);
>
> I think this should be:
>
> if (rold->off != rcur->off)
> return false;
> return range_within(old: rold, cur: rcur) &&
> tnum_in(a: rold->var_off, b: rcur->var_off);
ok, makes sense
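To see why the umin/umax comparison alone is not enough, here is a simplified standalone model. Assumptions: struct reg keeps only the fields needed for the example, and the range check covers only the unsigned 64-bit bounds, while the real range_within() compares more bounds than that. tnum_in() follows the usual tnum semantics: b is accepted if every value b can take is also a value a can take.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tnum { uint64_t value, mask; };	/* mask bits are unknown */

struct reg {
	int32_t off;
	uint64_t umin, umax;
	struct tnum var_off;
};

/* True if every value represented by b is also represented by a. */
static bool tnum_in(struct tnum a, struct tnum b)
{
	if (b.mask & ~a.mask)
		return false;
	b.value &= ~a.mask;
	return a.value == b.value;
}

/* Is rcur a subset of rold? Simplified model of the suggested check. */
static bool ptr_to_insn_safe(const struct reg *rold, const struct reg *rcur)
{
	assert(rold && rcur);

	if (rold->off != rcur->off)
		return false;
	return rold->umin <= rcur->umin && rold->umax >= rcur->umax &&
	       tnum_in(rold->var_off, rcur->var_off);
}
```

For example, if the old state covered offsets {0, 8, 16, 24} (var_off value 0, mask 0x18), a current offset of 4 lies inside [0, 24] but was never verified, and only the tnum check rejects it.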
> > default:
> > return regs_exact(rold, rcur, idmap);
> > }
> > @@ -19862,6 +20151,102 @@ static int process_bpf_exit_full(struct bpf_verifier_env *env,
> > return PROCESS_BPF_EXIT;
> > }
> >
> > +static int indirect_jump_min_max_index(struct bpf_verifier_env *env,
> > + int regno,
> > + struct bpf_map *map,
> > + u32 *pmin_index, u32 *pmax_index)
> > +{
> > + struct bpf_reg_state *reg = reg_state(env, regno);
> > + u64 min_index, max_index;
> > +
> > + if (check_add_overflow(reg->umin_value, reg->off, &min_index) ||
> > + (min_index > (u64) U32_MAX * sizeof(long))) {
> > + verbose(env, "the sum of R%u umin_value %llu and off %u is too big\n",
> > + regno, reg->umin_value, reg->off);
> > + return -ERANGE;
> > + }
> > + if (check_add_overflow(reg->umax_value, reg->off, &max_index) ||
> > + (max_index > (u64) U32_MAX * sizeof(long))) {
> > + verbose(env, "the sum of R%u umax_value %llu and off %u is too big\n",
> > + regno, reg->umax_value, reg->off);
> > + return -ERANGE;
> > + }
> > +
> > + min_index /= sizeof(long);
> > + max_index /= sizeof(long);
>
> Nit: `long` is 32-bit long on x86 (w/o -64), I understand that x86 jit
> would just reject gotox, but could you please use `sizeof(u64)` here?
Haven't checked, really, but will the jump table contain 8-byte records
for x86_32? I thought the entries are pointer-sized, which is why I used long.
I can still replace it with 8, yes.
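For reference, a standalone sketch of the index computation with a fixed 8-byte entry size. This assumes the entries are always u64, which matches the `r3 <<= 0x3` and `*(u64 *)` load in the commit message example; jt_index_range() is a made-up name and __builtin_add_overflow() (GCC/Clang) stands in for check_add_overflow().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Convert the byte-offset range [umin + off, umax + off] into a range of
 * jump table indices, assuming 8-byte entries on every architecture.
 */
static int jt_index_range(uint64_t umin, uint64_t umax, uint32_t off,
			  uint32_t max_entries,
			  uint32_t *pmin, uint32_t *pmax)
{
	const uint64_t limit = (uint64_t)UINT32_MAX * sizeof(uint64_t);
	uint64_t lo, hi;

	assert(pmin && pmax);

	if (__builtin_add_overflow(umin, (uint64_t)off, &lo) || lo > limit)
		return -ERANGE;
	if (__builtin_add_overflow(umax, (uint64_t)off, &hi) || hi > limit)
		return -ERANGE;

	lo /= sizeof(uint64_t);
	hi /= sizeof(uint64_t);

	if (lo >= max_entries || hi >= max_entries)
		return -EINVAL;

	*pmin = (uint32_t)lo;
	*pmax = (uint32_t)hi;
	return 0;
}
```

With 20 entries, a byte range of [0, 152] maps to indices [0, 19], while [0, 160] is rejected as pointing outside the table.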
> > +
> > + if (min_index >= map->max_entries || max_index >= map->max_entries) {
> > + verbose(env, "R%u points to outside of jump table: [%llu,%llu] max_entries %u\n",
> > + regno, min_index, max_index, map->max_entries);
> > + return -EINVAL;
> > + }
> > +
> > + *pmin_index = min_index;
> > + *pmax_index = max_index;
> > + return 0;
> > +}
> > +
> > +/* gotox *dst_reg */
> > +static int check_indirect_jump(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > +{
> > + struct bpf_verifier_state *other_branch;
> > + struct bpf_reg_state *dst_reg;
> > + struct bpf_map *map;
> > + u32 min_index, max_index;
> > + int err = 0;
> > + u32 *xoff;
> > + int n;
> > + int i;
> > +
> > + dst_reg = reg_state(env, insn->dst_reg);
> > + if (dst_reg->type != PTR_TO_INSN) {
> > + verbose(env, "R%d has type %d, expected PTR_TO_INSN\n",
> > + insn->dst_reg, dst_reg->type);
> > + return -EINVAL;
> > + }
> > +
> > + map = dst_reg->map_ptr;
> > + if (verifier_bug_if(!map, env, "R%d has an empty map pointer", insn->dst_reg))
> > + return -EFAULT;
> > +
> > + if (verifier_bug_if(map->map_type != BPF_MAP_TYPE_INSN_ARRAY, env,
> > + "R%d has incorrect map type %d", insn->dst_reg, map->map_type))
> > + return -EFAULT;
> > +
> > + err = indirect_jump_min_max_index(env, insn->dst_reg, map, &min_index, &max_index);
> > + if (err)
> > + return err;
> > +
> > + xoff = kvcalloc(max_index - min_index + 1, sizeof(u32), GFP_KERNEL_ACCOUNT);
> > + if (!xoff)
> > + return -ENOMEM;
>
> Let's keep a buffer for this allocation in `env` and realloc it when needed.
> Would be good to avoid allocating memory each time this gotox is visited.
Ok (I'll put it in bpf_subprog_info, as suggested in your next letter).
Though it probably still needs to grow (= realloc).
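What I have in mind is a grow-only scratch buffer, something like the userspace sketch below. struct scratch and scratch_get() are hypothetical names; the kernel version would use a kernel allocator rather than realloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Grow-only scratch buffer, reused across visits of gotox instructions. */
struct scratch {
	uint32_t *buf;
	size_t cap;	/* capacity in elements */
};

/*
 * Return a buffer with room for at least n elements. Memory is
 * reallocated only when the current capacity is too small. On failure
 * NULL is returned and the old buffer stays valid.
 */
static uint32_t *scratch_get(struct scratch *s, size_t n)
{
	uint32_t *p;

	assert(s);

	if (n <= s->cap)
		return s->buf;
	if (n > SIZE_MAX / sizeof(*p))
		return NULL;

	p = realloc(s->buf, n * sizeof(*p));
	if (!p)
		return NULL;

	s->buf = p;
	s->cap = n;
	return p;
}
```

The buffer would then be freed once, together with its owner, instead of on every visit.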
> > +
> > + n = copy_insn_array_uniq(map, min_index, max_index, xoff);
> > + if (n < 0) {
> > + err = n;
> > + goto free_off;
> > + }
> > + if (n == 0) {
> > + verbose(env, "register R%d doesn't point to any offset in map id=%d\n",
> > + insn->dst_reg, map->id);
> > + err = -EINVAL;
> > + goto free_off;
> > + }
> > +
> > + for (i = 0; i < n - 1; i++) {
> > + other_branch = push_stack(env, xoff[i], env->insn_idx, false);
> ^^^^^
> `is_speculative` has to be inherited from env->cur_state
Ah, yes, thanks
> > + if (IS_ERR(other_branch)) {
> > + err = PTR_ERR(other_branch);
> > + goto free_off;
> > + }
> > + }
> > + env->insn_idx = xoff[n-1];
> > +
> > +free_off:
> > + kvfree(xoff);
> > + return err;
> > +}
> > +
> > static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
> > {
> > int err;
> > @@ -19964,6 +20349,9 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
> >
> > mark_reg_scratched(env, BPF_REG_0);
> > } else if (opcode == BPF_JA) {
> > + if (BPF_SRC(insn->code) == BPF_X)
> > + return check_indirect_jump(env, insn);
> > +
>
> check_indirect_jump() does not check reserved fields (like offset or dst_reg).
Ok, thanks, will fix. Though maybe this could be checked in visit_gotox_insn(); why wait until here?
(Just in case: it should be s/dst_reg/src_reg/ in your comment.)
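A sketch of the check, wherever it ends up, based on the encoding from the commit message (SRC=0, off=0, imm=0). struct insn is a simplified stand-in for struct bpf_insn without bit-fields, check_gotox_reserved() is a made-up name, and -EINVAL is an assumed error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define BPF_JMP	0x05
#define BPF_JA	0x00
#define BPF_X	0x08

/* Simplified stand-in for struct bpf_insn (no bit-fields). */
struct insn {
	uint8_t code;
	uint8_t dst_reg;
	uint8_t src_reg;
	int16_t off;
	int32_t imm;
};

/* gotox uses only dst_reg; all other fields are reserved and must be 0. */
static int check_gotox_reserved(const struct insn *insn)
{
	assert(insn && insn->code == (BPF_JMP | BPF_JA | BPF_X));

	if (insn->src_reg != 0 || insn->off != 0 || insn->imm != 0)
		return -EINVAL;		/* assumed error code */
	return 0;
}
```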
>
> > if (BPF_SRC(insn->code) != BPF_K ||
> > insn->src_reg != BPF_REG_0 ||
> > insn->dst_reg != BPF_REG_0 ||
>
> [...]
>
> > @@ -24215,23 +24625,41 @@ static bool can_jump(struct bpf_insn *insn)
> > return false;
> > }
> >
> > -static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
> > +/*
> > + * Returns an array of instructions succ, with succ->off[0], ...,
> > + * succ->off[n-1] with successor instructions, where n=succ->off_cnt
> > + */
> > +static struct bpf_iarray *
> > +insn_successors(struct bpf_verifier_env *env, u32 insn_idx)
>
> Nit: maybe put insn_successors refactoring to a separate patch?
Yes, makes sense, will do. (In any case this piece needs to be
carefully rebased after your recent changes.)
> > {
> > - struct bpf_insn *insn = &prog->insnsi[idx];
> > - int i = 0, insn_sz;
> > + struct bpf_prog *prog = env->prog;
> > + struct bpf_insn *insn = &prog->insnsi[insn_idx];
> > + struct bpf_iarray *succ;
> > + int insn_sz;
> > u32 dst;
> >
> > - insn_sz = bpf_is_ldimm64(insn) ? 2 : 1;
> > - if (can_fallthrough(insn) && idx + 1 < prog->len)
> > - succ[i++] = idx + insn_sz;
> > + if (unlikely(insn_is_gotox(insn))) {
> > + succ = env->insn_aux_data[insn_idx].jt;
> > + if (verifier_bug_if(!succ, env,
> > + "aux data for insn %u doesn't contain a jump table\n",
> > + insn_idx))
> > + return ERR_PTR(-EFAULT);
>
> Requiring each callsite to check error code for this function is very inconvenient.
> Moreover, insn_successors() is hot in liveness.c:update_instance().
> Let's just assume that NULL here cannot happen.
Hmm, ok. I will check and fix.
> > + } else {
> > + /* pre-allocated array of size up to 2; reset cnt, as it may be used already */
> > + succ = env->succ;
> > + succ->off_cnt = 0;
> >
> > - if (can_jump(insn)) {
> > - dst = idx + jmp_offset(insn) + 1;
> > - if (i == 0 || succ[0] != dst)
> > - succ[i++] = dst;
> > - }
> > + insn_sz = bpf_is_ldimm64(insn) ? 2 : 1;
> > + if (can_fallthrough(insn) && insn_idx + 1 < prog->len)
> > + succ->off[succ->off_cnt++] = insn_idx + insn_sz;
> >
> > - return i;
> > + if (can_jump(insn)) {
> > + dst = insn_idx + jmp_offset(insn) + 1;
> > + if (succ->off_cnt == 0 || succ->off[0] != dst)
> > + succ->off[succ->off_cnt++] = dst;
> > + }
> > + }
> > + return succ;
> > }
> >
>
> [...]
>
> > @@ -24489,11 +24921,10 @@ static int compute_scc(struct bpf_verifier_env *env)
> > const u32 insn_cnt = env->prog->len;
> > int stack_sz, dfs_sz, err = 0;
> > u32 *stack, *pre, *low, *dfs;
> > - u32 succ_cnt, i, j, t, w;
> > + u32 i, j, t, w;
> > u32 next_preorder_num;
> > u32 next_scc_id;
> > bool assign_scc;
> > - u32 succ[2];
> >
> > next_preorder_num = 1;
> > next_scc_id = 1;
> > @@ -24592,6 +25023,8 @@ static int compute_scc(struct bpf_verifier_env *env)
> > dfs[0] = i;
> > dfs_continue:
> > while (dfs_sz) {
> > + struct bpf_iarray *succ;
> > +
>
> Nit: please move this declaration up, just to be consistent with other variables.
Sure
> > w = dfs[dfs_sz - 1];
> > if (pre[w] == 0) {
> > low[w] = next_preorder_num;
> > @@ -24600,12 +25033,17 @@ static int compute_scc(struct bpf_verifier_env *env)
> > stack[stack_sz++] = w;
> > }
> > /* Visit 'w' successors */
> > - succ_cnt = insn_successors(env->prog, w, succ);
> > - for (j = 0; j < succ_cnt; ++j) {
> > - if (pre[succ[j]]) {
> > - low[w] = min(low[w], low[succ[j]]);
> > + succ = insn_successors(env, w);
> > + if (IS_ERR(succ)) {
> > + err = PTR_ERR(succ);
> > + goto exit;
> > +
> > + }
> > + for (j = 0; j < succ->off_cnt; ++j) {
> > + if (pre[succ->off[j]]) {
> > + low[w] = min(low[w], low[succ->off[j]]);
> > } else {
> > - dfs[dfs_sz++] = succ[j];
> > + dfs[dfs_sz++] = succ->off[j];
> > goto dfs_continue;
> > }
> > }
>
> [...]