* [PATCH v1 bpf-next 00/11] BPF indirect jumps
@ 2025-08-16 18:06 Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 01/11] bpf: fix the return value of push_stack Anton Protopopov
` (10 more replies)
0 siblings, 11 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-16 18:06 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
This patchset implements a new map type, an instruction array, and uses
it to build support for indirect branches in BPF (on x86). (The same
map will later be used to provide support for indirect calls and static
keys.) See [1], [2] for more context.
This patch set is a follow-up to the initial RFC [3], now converted to a
normal version to trigger CI. Note that GCC and non-x86 archs are not
expected to work yet.
Short table of contents:
* Patches 1-6 implement the new map of type
BPF_MAP_TYPE_INSN_ARRAY and corresponding selftests. This map can
be used to track the "original -> xlated -> jitted" mapping for
a given program. Patches 5,6 add support for the "blinded" variant.
* Patches 7,8,9 implement support for indirect jumps
* Patches 10,11 add support for LLVM-compiled programs containing
indirect jumps.
A special LLVM build should be used for that, see [4] for the details
and some related discussions. Because of this, selftests for indirect
jumps which directly use `goto *rX` are commented out (so that CI can
run); a sketch of such code is shown below.
There is a list of TBDs (mostly more selftests, plus some limitations
such as the maximal map size); however, all the selftests which compile
to contain an indirect jump work with this patchset.
See individual patches for implementation details.
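To give an idea of the source code involved, below is a rough sketch
(not taken from this series; everything in it is illustrative) of the
kind of BPF C code that the patched LLVM [4] may lower into a jump
table in the ".jumptables" section plus an indirect gotox jump,
depending on its switch-lowering heuristics. The programs actually used
for testing are added by patch 11.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int handle(struct xdp_md *ctx)
    {
            /* a dense switch is a candidate for a jump table + gotox */
            switch (ctx->ingress_ifindex & 7) {
            case 0: return XDP_PASS;
            case 1: return XDP_DROP;
            case 2: return XDP_TX;
            case 3: return XDP_ABORTED;
            case 4: return XDP_DROP;
            case 5: return XDP_TX;
            case 6: return XDP_ABORTED;
            default: return XDP_PASS;
            }
    }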
Changes since RFC:
* I've tried to address all the comments provided by Alexei and
Eduard on the RFC. The most important ones are listed below.
* One big change: move from the older LLVM version [5] to the newer one [4].
Now LLVM generates jump tables as symbols in the new special
section ".jumptables". Another part of this change is that
libbpf no longer tries to link a map load with the goto *rX, as
1) this is not reliable, and 2) for some use cases it is
impossible (namely, when more than one jump table can be used
in the same gotox instruction).
* Added insn_successors() support (Alexei, Eduard). This includes
getting rid of the ugly bpf_insn_set_iter_xlated_offset()
interface (Eduard).
* Removed the hack for the unreachable instruction, as the new LLVM,
thanks to Eduard, doesn't generate it.
* Set mem_size for direct map access properly instead of hacking
around it; removed the off > 0 check (Alexei)
* Do not allocate new memory for min_index/max_index (Alexei, Eduard)
* Information required during check_cfg is now cached to be reused
later (Alexei + general logic for supporting multiple JT per jump)
* Properly compare registers in regsafe (Alexei, Eduard)
* Remove support for JMP32 (Eduard)
* Better checks in adjust_ptr_min_max_vals (Eduard)
* More selftests which directly use gotox were added (but there's
still room for more) (Alexei)
* More checks and verbose messages added
* "unique pointers" are no more in the map
Links:
1. https://lpc.events/event/18/contributions/1941/
2. https://lwn.net/Articles/1017439/
3. https://lore.kernel.org/bpf/20250615085943.3871208-1-a.s.protopopov@gmail.com/
4. https://github.com/llvm/llvm-project/pull/149715
Anton Protopopov (11):
bpf: fix the return value of push_stack
bpf: save the start of functions in bpf_prog_aux
bpf, x86: add new map type: instructions array
selftests/bpf: add selftests for new insn_array map
bpf: support instructions arrays with constants blinding
selftests/bpf: test instructions arrays with blinding
bpf, x86: allow indirect jumps to r8...r15
bpf, x86: add support for indirect jumps
bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X
libbpf: support llvm-generated indirect jumps
selftests/bpf: add selftests for indirect jumps
arch/x86/net/bpf_jit_comp.c | 39 +-
include/linux/bpf.h | 30 +
include/linux/bpf_types.h | 1 +
include/linux/bpf_verifier.h | 20 +-
include/uapi/linux/bpf.h | 11 +
kernel/bpf/Makefile | 2 +-
kernel/bpf/bpf_insn_array.c | 350 ++++++++++
kernel/bpf/core.c | 20 +
kernel/bpf/disasm.c | 9 +
kernel/bpf/syscall.c | 22 +
kernel/bpf/verifier.c | 603 ++++++++++++++++--
.../bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
tools/bpf/bpftool/map.c | 2 +-
tools/include/uapi/linux/bpf.h | 11 +
tools/lib/bpf/libbpf.c | 159 ++++-
tools/lib/bpf/libbpf_probes.c | 4 +
tools/lib/bpf/linker.c | 12 +-
tools/testing/selftests/bpf/Makefile | 4 +-
.../selftests/bpf/prog_tests/bpf_goto_x.c | 132 ++++
.../selftests/bpf/prog_tests/bpf_insn_array.c | 498 +++++++++++++++
.../testing/selftests/bpf/progs/bpf_goto_x.c | 384 +++++++++++
21 files changed, 2230 insertions(+), 85 deletions(-)
create mode 100644 kernel/bpf/bpf_insn_array.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_goto_x.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
create mode 100644 tools/testing/selftests/bpf/progs/bpf_goto_x.c
--
2.34.1
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH v1 bpf-next 01/11] bpf: fix the return value of push_stack
2025-08-16 18:06 [PATCH v1 bpf-next 00/11] BPF indirect jumps Anton Protopopov
@ 2025-08-16 18:06 ` Anton Protopopov
2025-08-25 18:12 ` Eduard Zingerman
2025-08-16 18:06 ` [PATCH v1 bpf-next 02/11] bpf: save the start of functions in bpf_prog_aux Anton Protopopov
` (9 subsequent siblings)
10 siblings, 1 reply; 38+ messages in thread
From: Anton Protopopov @ 2025-08-16 18:06 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
In [1] Eduard mentioned that on push_stack() failure the verifier
should return -ENOMEM instead of -EFAULT. After checking the other
call sites I found that the code inconsistently returns either -ENOMEM
or -EFAULT. This patch unifies the return values of the push_stack()
(and the similar push_async_cb()) function so that error codes are
always propagated properly.
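After this patch the two functions return ERR_PTR-encoded errors, and
the call sites follow the usual kernel convention, roughly (a sketch
mirroring the hunks below):

    branch = push_stack(env, next_idx, curr_idx, false);
    if (IS_ERR(branch))
            return PTR_ERR(branch);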
[1] https://lore.kernel.org/bpf/20250615085943.3871208-1-a.s.protopopov@gmail.com
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
kernel/bpf/verifier.c | 59 ++++++++++++++++++++-----------------------
1 file changed, 28 insertions(+), 31 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3a3982fe20d4..d8a65726cff2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2101,7 +2101,7 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
elem = kzalloc(sizeof(struct bpf_verifier_stack_elem), GFP_KERNEL_ACCOUNT);
if (!elem)
- return NULL;
+ return ERR_PTR(-ENOMEM);
elem->insn_idx = insn_idx;
elem->prev_insn_idx = prev_insn_idx;
@@ -2111,12 +2111,12 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
env->stack_size++;
err = copy_verifier_state(&elem->st, cur);
if (err)
- return NULL;
+ return ERR_PTR(-ENOMEM);
elem->st.speculative |= speculative;
if (env->stack_size > BPF_COMPLEXITY_LIMIT_JMP_SEQ) {
verbose(env, "The sequence of %d jumps is too complex.\n",
env->stack_size);
- return NULL;
+ return ERR_PTR(-EFAULT);
}
if (elem->st.parent) {
++elem->st.parent->branches;
@@ -2912,7 +2912,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
elem = kzalloc(sizeof(struct bpf_verifier_stack_elem), GFP_KERNEL_ACCOUNT);
if (!elem)
- return NULL;
+ return ERR_PTR(-ENOMEM);
elem->insn_idx = insn_idx;
elem->prev_insn_idx = prev_insn_idx;
@@ -2924,7 +2924,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
verbose(env,
"The sequence of %d jumps is too complex for async cb.\n",
env->stack_size);
- return NULL;
+ return ERR_PTR(-EFAULT);
}
/* Unlike push_stack() do not copy_verifier_state().
* The caller state doesn't matter.
@@ -2935,7 +2935,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
elem->st.in_sleepable = is_sleepable;
frame = kzalloc(sizeof(*frame), GFP_KERNEL_ACCOUNT);
if (!frame)
- return NULL;
+ return ERR_PTR(-ENOMEM);
init_func_state(env, frame,
BPF_MAIN_FUNC /* callsite */,
0 /* frameno within this callchain */,
@@ -9046,8 +9046,8 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
prev_st = find_prev_entry(env, cur_st->parent, insn_idx);
/* branch out active iter state */
queued_st = push_stack(env, insn_idx + 1, insn_idx, false);
- if (!queued_st)
- return -ENOMEM;
+ if (IS_ERR(queued_st))
+ return PTR_ERR(queued_st);
queued_iter = get_iter_from_state(queued_st, meta);
queued_iter->iter.state = BPF_ITER_STATE_ACTIVE;
@@ -10617,8 +10617,8 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
async_cb = push_async_cb(env, env->subprog_info[subprog].start,
insn_idx, subprog,
is_bpf_wq_set_callback_impl_kfunc(insn->imm));
- if (!async_cb)
- return -EFAULT;
+ if (IS_ERR(async_cb))
+ return PTR_ERR(async_cb);
callee = async_cb->frame[0];
callee->async_entry_cnt = caller->async_entry_cnt + 1;
@@ -10634,8 +10634,8 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
* proceed with next instruction within current frame.
*/
callback_state = push_stack(env, env->subprog_info[subprog].start, insn_idx, false);
- if (!callback_state)
- return -ENOMEM;
+ if (IS_ERR(callback_state))
+ return PTR_ERR(callback_state);
err = setup_func_entry(env, subprog, insn_idx, set_callee_state_cb,
callback_state);
@@ -13778,9 +13778,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
struct bpf_reg_state *regs;
branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false);
- if (!branch) {
+ if (IS_ERR(branch)) {
verbose(env, "failed to push state for failed lock acquisition\n");
- return -ENOMEM;
+ return PTR_ERR(branch);
}
regs = branch->frame[branch->curframe]->regs;
@@ -14217,7 +14217,7 @@ sanitize_speculative_path(struct bpf_verifier_env *env,
struct bpf_reg_state *regs;
branch = push_stack(env, next_idx, curr_idx, true);
- if (branch && insn) {
+ if (!IS_ERR(branch) && insn) {
regs = branch->frame[branch->curframe]->regs;
if (BPF_SRC(insn->code) == BPF_K) {
mark_reg_unknown(env, regs, insn->dst_reg);
@@ -14245,7 +14245,6 @@ static int sanitize_ptr_alu(struct bpf_verifier_env *env,
u8 opcode = BPF_OP(insn->code);
u32 alu_state, alu_limit;
struct bpf_reg_state tmp;
- bool ret;
int err;
if (can_skip_alu_sanitation(env, insn))
@@ -14318,11 +14317,11 @@ static int sanitize_ptr_alu(struct bpf_verifier_env *env,
tmp = *dst_reg;
copy_register_state(dst_reg, ptr_reg);
}
- ret = sanitize_speculative_path(env, NULL, env->insn_idx + 1,
- env->insn_idx);
- if (!ptr_is_dst_reg && ret)
+ if (IS_ERR(sanitize_speculative_path(env, NULL, env->insn_idx + 1, env->insn_idx)))
+ return REASON_STACK;
+ if (!ptr_is_dst_reg)
*dst_reg = tmp;
- return !ret ? REASON_STACK : 0;
+ return 0;
}
static void sanitize_mark_insn_seen(struct bpf_verifier_env *env)
@@ -16641,8 +16640,8 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
/* branch out 'fallthrough' insn as a new state to explore */
queued_st = push_stack(env, idx + 1, idx, false);
- if (!queued_st)
- return -ENOMEM;
+ if (IS_ERR(queued_st))
+ return PTR_ERR(queued_st);
queued_st->may_goto_depth++;
if (prev_st)
@@ -16721,8 +16720,7 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
* execution.
*/
if (!env->bypass_spec_v1 &&
- !sanitize_speculative_path(env, insn, *insn_idx + 1,
- *insn_idx))
+ IS_ERR(sanitize_speculative_path(env, insn, *insn_idx + 1, *insn_idx)))
return -EFAULT;
if (env->log.level & BPF_LOG_LEVEL)
print_insn_state(env, this_branch, this_branch->curframe);
@@ -16734,9 +16732,9 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
* simulation under speculative execution.
*/
if (!env->bypass_spec_v1 &&
- !sanitize_speculative_path(env, insn,
- *insn_idx + insn->off + 1,
- *insn_idx))
+ IS_ERR(sanitize_speculative_path(env, insn,
+ *insn_idx + insn->off + 1,
+ *insn_idx)))
return -EFAULT;
if (env->log.level & BPF_LOG_LEVEL)
print_insn_state(env, this_branch, this_branch->curframe);
@@ -16758,10 +16756,9 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
return err;
}
- other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx,
- false);
- if (!other_branch)
- return -EFAULT;
+ other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx, false);
+ if (IS_ERR(other_branch))
+ return PTR_ERR(other_branch);
other_branch_regs = other_branch->frame[other_branch->curframe]->regs;
if (BPF_SRC(insn->code) == BPF_X) {
--
2.34.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH v1 bpf-next 02/11] bpf: save the start of functions in bpf_prog_aux
2025-08-16 18:06 [PATCH v1 bpf-next 00/11] BPF indirect jumps Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 01/11] bpf: fix the return value of push_stack Anton Protopopov
@ 2025-08-16 18:06 ` Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 03/11] bpf, x86: add new map type: instructions array Anton Protopopov
` (8 subsequent siblings)
10 siblings, 0 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-16 18:06 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Introduce a new subprog_start field in bpf_prog_aux. This field can
be used by JIT compilers that need to know the real absolute xlated
offset of the function being jitted. func_info[func_id] could have
served this purpose, but func_info may be NULL, so JIT compilers
can't rely on it.
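For illustration, with this field a JIT can compute the absolute
xlated offset of the instruction it is currently emitting roughly as
follows (this is what the x86 JIT does in a later patch of this
series; i is the 1-based index inside do_jit()'s instruction loop):

    u32 abs_xlated_off = bpf_prog->aux->subprog_start + i - 1;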
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
include/linux/bpf.h | 1 +
kernel/bpf/verifier.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e7ee089e8a31..baca497163d7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1596,6 +1596,7 @@ struct bpf_prog_aux {
u32 ctx_arg_info_size;
u32 max_rdonly_access;
u32 max_rdwr_access;
+ u32 subprog_start;
struct btf *attach_btf;
struct bpf_ctx_arg_aux *ctx_arg_info;
void __percpu *priv_stack_ptr;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d8a65726cff2..b034f88d72a0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -21572,6 +21572,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
func[i]->aux->func_idx = i;
/* Below members will be freed only at prog->aux */
func[i]->aux->btf = prog->aux->btf;
+ func[i]->aux->subprog_start = subprog_start;
func[i]->aux->func_info = prog->aux->func_info;
func[i]->aux->func_info_cnt = prog->aux->func_info_cnt;
func[i]->aux->poke_tab = prog->aux->poke_tab;
--
2.34.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH v1 bpf-next 03/11] bpf, x86: add new map type: instructions array
2025-08-16 18:06 [PATCH v1 bpf-next 00/11] BPF indirect jumps Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 01/11] bpf: fix the return value of push_stack Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 02/11] bpf: save the start of functions in bpf_prog_aux Anton Protopopov
@ 2025-08-16 18:06 ` Anton Protopopov
2025-08-25 21:05 ` Eduard Zingerman
2025-08-16 18:06 ` [PATCH v1 bpf-next 04/11] selftests/bpf: add selftests for new insn_array map Anton Protopopov
` (7 subsequent siblings)
10 siblings, 1 reply; 38+ messages in thread
From: Anton Protopopov @ 2025-08-16 18:06 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
During the bpf(BPF_PROG_LOAD) syscall user-supplied BPF programs are
translated by the verifier into "xlated" BPF programs. During this
process the original instruction offsets might be adjusted and/or
individual instructions might be replaced by new sets of instructions,
or deleted.
Add a new BPF map type which keeps track of how, for a given program,
the original instructions were relocated during verification. Also,
besides keeping track of the original -> xlated mapping, make the x86
JIT build the xlated -> jitted mapping for every instruction listed in
an instruction array. This is required for every future application of
instruction arrays: static keys, indirect jumps and indirect calls.
A map of the BPF_MAP_TYPE_INSN_ARRAY type must be created with u32
keys and values of size 8. The values have different semantics for
userspace and for the BPF side. For userspace a value consists of two
u32 values: the xlated and jitted offsets. For the BPF side the value
is a real pointer to a jitted instruction.
On map creation/initialization, before loading the program, each
element of the map should be initialized to point to an instruction
offset within the program. Before the program is loaded such maps
should be frozen. After the program verification the xlated and jitted
offsets can be read via the bpf(2) syscall.
If a tracked instruction is removed by the verifier, then its xlated
offset is set to (u32)-1, which is too big to be a valid BPF program
offset.
One such map can, obviously, be used to track one and only one BPF
program. If the verification process was unsuccessful, then the same
map can be re-used to verify the program with a different log level.
However, if the program was loaded successfully, then such a map, being
frozen in any case, can't be reused by other programs even after the
program is released.
Example. Consider the following original and xlated programs:
Original prog: Xlated prog:
0: r1 = 0x0 0: r1 = 0
1: *(u32 *)(r10 - 0x4) = r1 1: *(u32 *)(r10 -4) = r1
2: r2 = r10 2: r2 = r10
3: r2 += -0x4 3: r2 += -4
4: r1 = 0x0 ll 4: r1 = map[id:88]
6: call 0x1 6: r1 += 272
7: r0 = *(u32 *)(r2 +0)
8: if r0 >= 0x1 goto pc+3
9: r0 <<= 3
10: r0 += r1
11: goto pc+1
12: r0 = 0
7: r6 = r0 13: r6 = r0
8: if r6 == 0x0 goto +0x2 14: if r6 == 0x0 goto pc+4
9: call 0x76 15: r0 = 0xffffffff8d2079c0
17: r0 = *(u64 *)(r0 +0)
10: *(u64 *)(r6 + 0x0) = r0 18: *(u64 *)(r6 +0) = r0
11: r0 = 0x0 19: r0 = 0x0
12: exit 20: exit
An instruction array map containing, e.g., instructions [0,4,7,12]
will be translated by the verifier to [0,4,13,20]. A map with
offset 5 (the middle of the 16-byte instruction) or offsets greater
than 12 (outside the program boundaries) would be rejected.
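For illustration, a minimal userspace usage sketch (modelled on the
selftests added in the next patch; error handling is omitted, and
insns/insn_cnt stand for the example program above):

    struct bpf_insn_array_value val = {};
    LIBBPF_OPTS(bpf_prog_load_opts, opts);
    __u32 xlated[] = {0, 4, 7, 12};     /* original instruction offsets */
    int map_fd, prog_fd;
    __u32 key;

    /* u32 keys, 8-byte values, one entry per tracked instruction */
    map_fd = bpf_map_create(BPF_MAP_TYPE_INSN_ARRAY, "insn_array",
                            sizeof(__u32), sizeof(val), 4, NULL);

    for (key = 0; key < 4; key++) {
            val.xlated_off = xlated[key];
            bpf_map_update_elem(map_fd, &key, &val, 0);
    }

    /* make the map visible to the program being loaded via fd_array */
    opts.fd_array = &map_fd;
    opts.fd_array_cnt = 1;
    prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL",
                            insns, insn_cnt, &opts);

    /* after a successful load the map holds the new offsets; for the
     * example above the xlated offsets become 0, 4, 13 and 20
     */
    key = 3;
    bpf_map_lookup_elem(map_fd, &key, &val);    /* val.xlated_off == 20 */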
The functionality provided by this patch will be extended in subsequent
patches to implement BPF Static Keys, indirect jumps, and indirect calls.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
arch/x86/net/bpf_jit_comp.c | 8 +
include/linux/bpf.h | 28 +++
include/linux/bpf_types.h | 1 +
include/linux/bpf_verifier.h | 2 +
include/uapi/linux/bpf.h | 11 ++
kernel/bpf/Makefile | 2 +-
kernel/bpf/bpf_insn_array.c | 336 +++++++++++++++++++++++++++++++++
kernel/bpf/syscall.c | 22 +++
kernel/bpf/verifier.c | 43 +++++
tools/include/uapi/linux/bpf.h | 11 ++
10 files changed, 463 insertions(+), 1 deletion(-)
create mode 100644 kernel/bpf/bpf_insn_array.c
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 7e3fca164620..589c3d5119f9 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1612,6 +1612,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
prog = temp;
for (i = 1; i <= insn_cnt; i++, insn++) {
+ u32 abs_xlated_off = bpf_prog->aux->subprog_start + i - 1;
const s32 imm32 = insn->imm;
u32 dst_reg = insn->dst_reg;
u32 src_reg = insn->src_reg;
@@ -2642,6 +2643,13 @@ st: if (is_imm8(insn->off))
return -EFAULT;
}
memcpy(rw_image + proglen, temp, ilen);
+
+ /*
+ * Instruction arrays need to know how xlated code
+ * maps to jitted code
+ */
+ bpf_prog_update_insn_ptr(bpf_prog, abs_xlated_off, proglen,
+ image + proglen);
}
proglen += ilen;
addrs[i] = proglen;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index baca497163d7..534ce7733277 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3705,4 +3705,32 @@ int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char *
const char **linep, int *nump);
struct bpf_prog *bpf_prog_find_from_stack(void);
+int bpf_insn_array_init(struct bpf_map *map, const struct bpf_prog *prog);
+int bpf_insn_array_ready(struct bpf_map *map);
+void bpf_insn_array_release(struct bpf_map *map);
+void bpf_insn_array_adjust(struct bpf_map *map, u32 off, u32 len);
+void bpf_insn_array_adjust_after_remove(struct bpf_map *map, u32 off, u32 len);
+
+/*
+ * The struct bpf_insn_ptr structure describes a pointer to a
+ * particular instruction in a loaded BPF program. Initially
+ * it is initialised from userspace via user_value.xlated_off.
+ * During the program verification all other fields are populated
+ * accordingly:
+ *
+ * jitted_ip: address of the instruction in the jitted image
+ * user_value: user-visible xlated and jitted offsets
+ * orig_xlated_off: original offset of the instruction
+ */
+struct bpf_insn_ptr {
+ void *jitted_ip;
+ struct bpf_insn_array_value user_value;
+ u32 orig_xlated_off;
+};
+
+void bpf_prog_update_insn_ptr(struct bpf_prog *prog,
+ u32 xlated_off,
+ u32 jitted_off,
+ void *jitted_ip);
+
#endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index fa78f49d4a9a..b13de31e163f 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -133,6 +133,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_ARENA, arena_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_INSN_ARRAY, insn_array_map_ops)
BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 020de62bd09c..aca43c284203 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -766,8 +766,10 @@ struct bpf_verifier_env {
struct list_head free_list; /* list of struct bpf_verifier_state_list */
struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
struct btf_mod_pair used_btfs[MAX_USED_BTFS]; /* array of BTF's used by BPF program */
+ struct bpf_map *insn_array_maps[MAX_USED_MAPS]; /* array of INSN_ARRAY map's to be relocated */
u32 used_map_cnt; /* number of used maps */
u32 used_btf_cnt; /* number of used BTF objects */
+ u32 insn_array_map_cnt; /* number of used maps of type BPF_MAP_TYPE_INSN_ARRAY */
u32 id_gen; /* used to generate unique reg IDs */
u32 hidden_subprog_cnt; /* number of hidden subprogs */
int exception_callback_subprog;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 233de8677382..021c27ee5591 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1026,6 +1026,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_USER_RINGBUF,
BPF_MAP_TYPE_CGRP_STORAGE,
BPF_MAP_TYPE_ARENA,
+ BPF_MAP_TYPE_INSN_ARRAY,
__MAX_BPF_MAP_TYPE
};
@@ -7623,4 +7624,14 @@ enum bpf_kfunc_flags {
BPF_F_PAD_ZEROS = (1ULL << 0),
};
+/*
+ * Values of a BPF_MAP_TYPE_INSN_ARRAY entry must be of this type.
+ * On updates jitted_off must be equal to 0.
+ */
+struct bpf_insn_array_value {
+ __u32 jitted_off;
+ __u32 xlated_off;
+};
+
+
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 269c04a24664..1cd5de523b5d 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -9,7 +9,7 @@ CFLAGS_core.o += -Wno-override-init $(cflags-nogcse-yy)
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
-obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
+obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o bpf_insn_array.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
diff --git a/kernel/bpf/bpf_insn_array.c b/kernel/bpf/bpf_insn_array.c
new file mode 100644
index 000000000000..0c8dac62f457
--- /dev/null
+++ b/kernel/bpf/bpf_insn_array.c
@@ -0,0 +1,336 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/bpf.h>
+#include <linux/sort.h>
+
+#define MAX_INSN_ARRAY_ENTRIES 256
+
+struct bpf_insn_array {
+ struct bpf_map map;
+ struct mutex state_mutex;
+ int state;
+ long *ips;
+ DECLARE_FLEX_ARRAY(struct bpf_insn_ptr, ptrs);
+};
+
+enum {
+ INSN_ARRAY_STATE_FREE = 0,
+ INSN_ARRAY_STATE_INIT,
+ INSN_ARRAY_STATE_READY,
+};
+
+#define cast_insn_array(MAP_PTR) \
+ container_of(MAP_PTR, struct bpf_insn_array, map)
+
+#define INSN_DELETED ((u32)-1)
+
+static inline u32 insn_array_alloc_size(u32 max_entries)
+{
+ const u32 base_size = sizeof(struct bpf_insn_array);
+ const u32 entry_size = sizeof(struct bpf_insn_ptr);
+
+ return base_size + entry_size * max_entries;
+}
+
+static int insn_array_alloc_check(union bpf_attr *attr)
+{
+ if (attr->max_entries == 0 ||
+ attr->key_size != 4 ||
+ attr->value_size != 8 ||
+ attr->map_flags != 0)
+ return -EINVAL;
+
+ if (attr->max_entries > MAX_INSN_ARRAY_ENTRIES)
+ return -E2BIG;
+
+ return 0;
+}
+
+static void insn_array_free(struct bpf_map *map)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+
+ kfree(insn_array->ips);
+ bpf_map_area_free(insn_array);
+}
+
+static struct bpf_map *insn_array_alloc(union bpf_attr *attr)
+{
+ u64 size = insn_array_alloc_size(attr->max_entries);
+ struct bpf_insn_array *insn_array;
+
+ insn_array = bpf_map_area_alloc(size, NUMA_NO_NODE);
+ if (!insn_array)
+ return ERR_PTR(-ENOMEM);
+
+ insn_array->ips = kcalloc(attr->max_entries, sizeof(long), GFP_KERNEL);
+ if (!insn_array->ips) {
+ insn_array_free(&insn_array->map);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ bpf_map_init_from_attr(&insn_array->map, attr);
+
+ mutex_init(&insn_array->state_mutex);
+ insn_array->state = INSN_ARRAY_STATE_FREE;
+
+ return &insn_array->map;
+}
+
+static int insn_array_get_next_key(struct bpf_map *map, void *key, void *next_key)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ u32 index = key ? *(u32 *)key : U32_MAX;
+ u32 *next = (u32 *)next_key;
+
+ if (index >= insn_array->map.max_entries) {
+ *next = 0;
+ return 0;
+ }
+
+ if (index == insn_array->map.max_entries - 1)
+ return -ENOENT;
+
+ *next = index + 1;
+ return 0;
+}
+
+static void *insn_array_lookup_elem(struct bpf_map *map, void *key)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ u32 index = *(u32 *)key;
+
+ if (unlikely(index >= insn_array->map.max_entries))
+ return NULL;
+
+ return &insn_array->ptrs[index].user_value;
+}
+
+static long insn_array_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ u32 index = *(u32 *)key;
+ struct bpf_insn_array_value val = {};
+ int err = 0;
+
+ if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
+ return -EINVAL;
+
+ if (unlikely(index >= insn_array->map.max_entries))
+ return -E2BIG;
+
+ if (unlikely(map_flags & BPF_NOEXIST))
+ return -EEXIST;
+
+ /* No updates for maps in use */
+ if (!mutex_trylock(&insn_array->state_mutex))
+ return -EBUSY;
+
+ if (insn_array->state != INSN_ARRAY_STATE_FREE) {
+ err = -EBUSY;
+ goto unlock;
+ }
+
+ copy_map_value(map, &val, value);
+ if (val.jitted_off || val.xlated_off == INSN_DELETED) {
+ err = -EINVAL;
+ goto unlock;
+ }
+
+ insn_array->ptrs[index].orig_xlated_off = val.xlated_off;
+ insn_array->ptrs[index].user_value.xlated_off = val.xlated_off;
+
+unlock:
+ mutex_unlock(&insn_array->state_mutex);
+ return err;
+}
+
+static long insn_array_delete_elem(struct bpf_map *map, void *key)
+{
+ return -EINVAL;
+}
+
+static int insn_array_check_btf(const struct bpf_map *map,
+ const struct btf *btf,
+ const struct btf_type *key_type,
+ const struct btf_type *value_type)
+{
+ if (!btf_type_is_i32(key_type))
+ return -EINVAL;
+
+ if (!btf_type_is_i64(value_type))
+ return -EINVAL;
+
+ return 0;
+}
+
+static u64 insn_array_mem_usage(const struct bpf_map *map)
+{
+ u64 extra_size = 0;
+
+ extra_size += sizeof(long) * map->max_entries; /* insn_array->ips */
+
+ return insn_array_alloc_size(map->max_entries) + extra_size;
+}
+
+BTF_ID_LIST_SINGLE(insn_array_btf_ids, struct, bpf_insn_array)
+
+const struct bpf_map_ops insn_array_map_ops = {
+ .map_alloc_check = insn_array_alloc_check,
+ .map_alloc = insn_array_alloc,
+ .map_free = insn_array_free,
+ .map_get_next_key = insn_array_get_next_key,
+ .map_lookup_elem = insn_array_lookup_elem,
+ .map_update_elem = insn_array_update_elem,
+ .map_delete_elem = insn_array_delete_elem,
+ .map_check_btf = insn_array_check_btf,
+ .map_mem_usage = insn_array_mem_usage,
+ .map_btf_id = &insn_array_btf_ids[0],
+};
+
+static bool is_insn_array(const struct bpf_map *map)
+{
+ return map->map_type == BPF_MAP_TYPE_INSN_ARRAY;
+}
+
+static inline bool valid_offsets(const struct bpf_insn_array *insn_array,
+ const struct bpf_prog *prog)
+{
+ u32 off;
+ int i;
+
+ for (i = 0; i < insn_array->map.max_entries; i++) {
+ off = insn_array->ptrs[i].orig_xlated_off;
+
+ if (off >= prog->len)
+ return false;
+
+ if (off > 0) {
+ if (prog->insnsi[off-1].code == (BPF_LD | BPF_DW | BPF_IMM))
+ return false;
+ }
+ }
+
+ return true;
+}
+
+int bpf_insn_array_init(struct bpf_map *map, const struct bpf_prog *prog)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ int i;
+
+ if (!valid_offsets(insn_array, prog))
+ return -EINVAL;
+
+ /*
+ * There can be only one program using the map
+ */
+ mutex_lock(&insn_array->state_mutex);
+ if (insn_array->state != INSN_ARRAY_STATE_FREE) {
+ mutex_unlock(&insn_array->state_mutex);
+ return -EBUSY;
+ }
+ insn_array->state = INSN_ARRAY_STATE_INIT;
+ mutex_unlock(&insn_array->state_mutex);
+
+ /*
+ * Reset all the map indexes to the original values. This is needed,
+ * e.g., when a replay of verification with different log level should
+ * be performed.
+ */
+ for (i = 0; i < map->max_entries; i++)
+ insn_array->ptrs[i].user_value.xlated_off = insn_array->ptrs[i].orig_xlated_off;
+
+ return 0;
+}
+
+int bpf_insn_array_ready(struct bpf_map *map)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ guard(mutex)(&insn_array->state_mutex);
+ int i;
+
+ for (i = 0; i < map->max_entries; i++) {
+ if (insn_array->ptrs[i].user_value.xlated_off == INSN_DELETED)
+ continue;
+ if (!insn_array->ips[i]) {
+ /*
+ * Set the map free on failure; the program owning it
+ * might be re-loaded with different log level
+ */
+ insn_array->state = INSN_ARRAY_STATE_FREE;
+ return -EFAULT;
+ }
+ }
+
+ insn_array->state = INSN_ARRAY_STATE_READY;
+ return 0;
+}
+
+void bpf_insn_array_release(struct bpf_map *map)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ guard(mutex)(&insn_array->state_mutex);
+
+ insn_array->state = INSN_ARRAY_STATE_FREE;
+}
+
+void bpf_insn_array_adjust(struct bpf_map *map, u32 off, u32 len)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ int i;
+
+ if (len <= 1)
+ return;
+
+ for (i = 0; i < map->max_entries; i++) {
+ if (insn_array->ptrs[i].user_value.xlated_off <= off)
+ continue;
+ if (insn_array->ptrs[i].user_value.xlated_off == INSN_DELETED)
+ continue;
+ insn_array->ptrs[i].user_value.xlated_off += len - 1;
+ }
+}
+
+void bpf_insn_array_adjust_after_remove(struct bpf_map *map, u32 off, u32 len)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ int i;
+
+ for (i = 0; i < map->max_entries; i++) {
+ if (insn_array->ptrs[i].user_value.xlated_off < off)
+ continue;
+ if (insn_array->ptrs[i].user_value.xlated_off == INSN_DELETED)
+ continue;
+ if (insn_array->ptrs[i].user_value.xlated_off >= off &&
+ insn_array->ptrs[i].user_value.xlated_off < off + len)
+ insn_array->ptrs[i].user_value.xlated_off = INSN_DELETED;
+ else
+ insn_array->ptrs[i].user_value.xlated_off -= len;
+ }
+}
+
+void bpf_prog_update_insn_ptr(struct bpf_prog *prog,
+ u32 xlated_off,
+ u32 jitted_off,
+ void *jitted_ip)
+{
+ struct bpf_insn_array *insn_array;
+ struct bpf_map *map;
+ int i, j;
+
+ for (i = 0; i < prog->aux->used_map_cnt; i++) {
+ map = prog->aux->used_maps[i];
+ if (!is_insn_array(map))
+ continue;
+
+ insn_array = cast_insn_array(map);
+ for (j = 0; j < map->max_entries; j++) {
+ if (insn_array->ptrs[j].user_value.xlated_off == xlated_off) {
+ insn_array->ips[j] = (long)jitted_ip;
+ insn_array->ptrs[j].jitted_ip = jitted_ip;
+ insn_array->ptrs[j].user_value.jitted_off = jitted_off;
+ }
+ }
+ }
+}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0fbfa8532c39..2f312527d684 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1461,6 +1461,7 @@ static int map_create(union bpf_attr *attr, bool kernel)
case BPF_MAP_TYPE_STRUCT_OPS:
case BPF_MAP_TYPE_CPUMAP:
case BPF_MAP_TYPE_ARENA:
+ case BPF_MAP_TYPE_INSN_ARRAY:
if (!bpf_token_capable(token, CAP_BPF))
goto put_token;
break;
@@ -2761,6 +2762,23 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
}
}
+static int bpf_prog_mark_insn_arrays_ready(struct bpf_prog *prog)
+{
+ int err;
+ int i;
+
+ for (i = 0; i < prog->aux->used_map_cnt; i++) {
+ if (prog->aux->used_maps[i]->map_type != BPF_MAP_TYPE_INSN_ARRAY)
+ continue;
+
+ err = bpf_insn_array_ready(prog->aux->used_maps[i]);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
/* last field in 'union bpf_attr' used by this command */
#define BPF_PROG_LOAD_LAST_FIELD fd_array_cnt
@@ -2984,6 +3002,10 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
if (err < 0)
goto free_used_maps;
+ err = bpf_prog_mark_insn_arrays_ready(prog);
+ if (err < 0)
+ goto free_used_maps;
+
err = bpf_prog_alloc_id(prog);
if (err)
goto free_used_maps;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b034f88d72a0..e1f7744e132b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10084,6 +10084,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
func_id != BPF_FUNC_map_push_elem)
goto error;
break;
+ case BPF_MAP_TYPE_INSN_ARRAY:
+ goto error;
default:
break;
}
@@ -20492,6 +20494,15 @@ static int __add_used_map(struct bpf_verifier_env *env, struct bpf_map *map)
env->used_maps[env->used_map_cnt++] = map;
+ if (map->map_type == BPF_MAP_TYPE_INSN_ARRAY) {
+ err = bpf_insn_array_init(map, env->prog);
+ if (err) {
+ verbose(env, "Failed to properly initialize insn array\n");
+ return err;
+ }
+ env->insn_array_maps[env->insn_array_map_cnt++] = map;
+ }
+
return env->used_map_cnt - 1;
}
@@ -20738,6 +20749,33 @@ static void adjust_subprog_starts(struct bpf_verifier_env *env, u32 off, u32 len
}
}
+static void release_insn_arrays(struct bpf_verifier_env *env)
+{
+ int i;
+
+ for (i = 0; i < env->insn_array_map_cnt; i++)
+ bpf_insn_array_release(env->insn_array_maps[i]);
+}
+
+static void adjust_insn_arrays(struct bpf_verifier_env *env, u32 off, u32 len)
+{
+ int i;
+
+ if (len == 1)
+ return;
+
+ for (i = 0; i < env->insn_array_map_cnt; i++)
+ bpf_insn_array_adjust(env->insn_array_maps[i], off, len);
+}
+
+static void adjust_insn_arrays_after_remove(struct bpf_verifier_env *env, u32 off, u32 len)
+{
+ int i;
+
+ for (i = 0; i < env->insn_array_map_cnt; i++)
+ bpf_insn_array_adjust_after_remove(env->insn_array_maps[i], off, len);
+}
+
static void adjust_poke_descs(struct bpf_prog *prog, u32 off, u32 len)
{
struct bpf_jit_poke_descriptor *tab = prog->aux->poke_tab;
@@ -20780,6 +20818,7 @@ static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 of
}
adjust_insn_aux_data(env, new_prog, off, len);
adjust_subprog_starts(env, off, len);
+ adjust_insn_arrays(env, off, len);
adjust_poke_descs(new_prog, off, len);
return new_prog;
}
@@ -20963,6 +21002,8 @@ static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
if (err)
return err;
+ adjust_insn_arrays_after_remove(env, off, cnt);
+
memmove(aux_data + off, aux_data + off + cnt,
sizeof(*aux_data) * (orig_prog_len - off - cnt));
@@ -24810,6 +24851,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
adjust_btf_func(env);
err_release_maps:
+ if (ret)
+ release_insn_arrays(env);
if (!env->prog->aux->used_maps)
/* if we didn't copy map pointers into bpf_prog_info, release
* them now. Otherwise free_used_maps() will release them.
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 233de8677382..021c27ee5591 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1026,6 +1026,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_USER_RINGBUF,
BPF_MAP_TYPE_CGRP_STORAGE,
BPF_MAP_TYPE_ARENA,
+ BPF_MAP_TYPE_INSN_ARRAY,
__MAX_BPF_MAP_TYPE
};
@@ -7623,4 +7624,14 @@ enum bpf_kfunc_flags {
BPF_F_PAD_ZEROS = (1ULL << 0),
};
+/*
+ * Values of a BPF_MAP_TYPE_INSN_ARRAY entry must be of this type.
+ * On updates jitted_off must be equal to 0.
+ */
+struct bpf_insn_array_value {
+ __u32 jitted_off;
+ __u32 xlated_off;
+};
+
+
#endif /* _UAPI__LINUX_BPF_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH v1 bpf-next 04/11] selftests/bpf: add selftests for new insn_array map
2025-08-16 18:06 [PATCH v1 bpf-next 00/11] BPF indirect jumps Anton Protopopov
` (2 preceding siblings ...)
2025-08-16 18:06 ` [PATCH v1 bpf-next 03/11] bpf, x86: add new map type: instructions array Anton Protopopov
@ 2025-08-16 18:06 ` Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 05/11] bpf: support instructions arrays with constants blinding Anton Protopopov
` (6 subsequent siblings)
10 siblings, 0 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-16 18:06 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Tests are split in two parts.
The `bpf_insn_array_ops` test checks that the map is managed properly:
* Incorrect instruction indexes are rejected
* Non-sorted and non-unique indexes are rejected
* Unfrozen maps are not accepted
* Two programs can't use the same map
* BPF programs can't operate on the map
The `bpf_insn_array` part validates, as well as it can from user
space, that instruction offsets were adjusted properly:
* no changes to code => map is the same
* expected changes when instructions are added
* expected changes when instructions are deleted
* expected changes when multiple functions are present
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
.../selftests/bpf/prog_tests/bpf_insn_array.c | 406 ++++++++++++++++++
1 file changed, 406 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c b/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
new file mode 100644
index 000000000000..da329c2e85f2
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
@@ -0,0 +1,406 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <bpf/bpf.h>
+#include <test_progs.h>
+
+static int map_create(__u32 map_type, __u32 max_entries)
+{
+ const char *map_name = "insn_array";
+ __u32 key_size = 4;
+ __u32 value_size = sizeof(struct bpf_insn_array_value);
+
+ return bpf_map_create(map_type, map_name, key_size, value_size, max_entries, NULL);
+}
+
+static int prog_load(struct bpf_insn *insns, __u32 insn_cnt, int *fd_array, __u32 fd_array_cnt)
+{
+ LIBBPF_OPTS(bpf_prog_load_opts, opts);
+
+ opts.fd_array = fd_array;
+ opts.fd_array_cnt = fd_array_cnt;
+
+ return bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL", insns, insn_cnt, &opts);
+}
+
+/*
+ * Load a program which will not be mangled in any way by the verifier. Add an
+ * insn_array map pointing to every instruction. Check that it hasn't changed
+ * after the program load.
+ */
+static void check_one_to_one_mapping(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_MOV64_IMM(BPF_REG_0, 4),
+ BPF_MOV64_IMM(BPF_REG_0, 3),
+ BPF_MOV64_IMM(BPF_REG_0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd = -1, map_fd;
+ struct bpf_insn_array_value val = {};
+ int i;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, ARRAY_SIZE(insns));
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ val.xlated_off = i;
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0, "bpf_map_update_elem"))
+ goto cleanup;
+ }
+
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+ goto cleanup;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+ goto cleanup;
+
+ ASSERT_EQ(val.xlated_off, i, "val should be equal i");
+ }
+
+cleanup:
+ close(prog_fd);
+ close(map_fd);
+}
+
+/*
+ * Try to load a program with a map which points to outside of the program
+ */
+static void check_out_of_bounds_index(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_MOV64_IMM(BPF_REG_0, 4),
+ BPF_MOV64_IMM(BPF_REG_0, 3),
+ BPF_MOV64_IMM(BPF_REG_0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd, map_fd;
+ struct bpf_insn_array_value val = {};
+ int key;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, 1);
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ key = 0;
+ val.xlated_off = ARRAY_SIZE(insns); /* too big */
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &key, &val, 0), 0, "bpf_map_update_elem"))
+ goto cleanup;
+
+ errno = 0;
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_EQ(prog_fd, -EINVAL, "program should have been rejected (prog_fd != -EINVAL)")) {
+ close(prog_fd);
+ goto cleanup;
+ }
+
+cleanup:
+ close(map_fd);
+}
+
+/*
+ * Try to load a program with a map which points to the middle of a 16-byte insn
+ */
+static void check_mid_insn_index(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_LD_IMM64(BPF_REG_0, 0), /* 2 x 8 */
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd, map_fd;
+ struct bpf_insn_array_value val = {};
+ int key;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, 1);
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ key = 0;
+ val.xlated_off = 1; /* middle of 16-byte instruction */
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &key, &val, 0), 0, "bpf_map_update_elem"))
+ goto cleanup;
+
+ errno = 0;
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_EQ(prog_fd, -EINVAL, "program should have been rejected (prog_fd != -EINVAL)")) {
+ close(prog_fd);
+ goto cleanup;
+ }
+
+cleanup:
+ close(map_fd);
+}
+
+static void check_incorrect_index(void)
+{
+ check_out_of_bounds_index();
+ check_mid_insn_index();
+}
+
+/*
+ * Load a program with two patches (get jiffies, for simplicity). Add an
+ * insn_array map pointing to every instruction. Check how it was changed
+ * after the program load.
+ */
+static void check_simple(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_MOV64_IMM(BPF_REG_0, 2),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd = -1, map_fd;
+ __u32 map_in[] = {0, 1, 2, 3, 4, 5};
+ __u32 map_out[] = {0, 1, 4, 5, 8, 9};
+ struct bpf_insn_array_value val = {};
+ int i;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, ARRAY_SIZE(insns));
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ val.xlated_off = map_in[i];
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0,
+ "bpf_map_update_elem"))
+ goto cleanup;
+ }
+
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+ goto cleanup;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+ goto cleanup;
+
+ ASSERT_EQ(val.xlated_off, map_out[i], "val should be equal map_out[i]");
+ }
+
+cleanup:
+ close(prog_fd);
+ close(map_fd);
+}
+
+/*
+ * Verifier can delete code in two cases: nops & dead code. From insn
+ * array's point of view, the two cases are the same, so test using
+ * the simplest method: by loading some nops
+ */
+static void check_deletions(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_MOV64_IMM(BPF_REG_0, 2),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd = -1, map_fd;
+ __u32 map_in[] = {0, 1, 2, 3, 4, 5};
+ __u32 map_out[] = {0, -1, 1, -1, 2, 3};
+ struct bpf_insn_array_value val = {};
+ int i;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, ARRAY_SIZE(insns));
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ val.xlated_off = map_in[i];
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0,
+ "bpf_map_update_elem"))
+ goto cleanup;
+ }
+
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+ goto cleanup;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+ goto cleanup;
+
+ ASSERT_EQ(val.xlated_off, map_out[i], "val should be equal map_out[i]");
+ }
+
+cleanup:
+ close(prog_fd);
+ close(map_fd);
+}
+
+static void check_with_functions(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_EXIT_INSN(),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+ BPF_MOV64_IMM(BPF_REG_0, 2),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd = -1, map_fd;
+ __u32 map_in[] = { 0, 1, 2, 3, 4, 5, /* func */ 6, 7, 8, 9, 10};
+ __u32 map_out[] = {-1, 0, -1, 3, 4, 5, /* func */ -1, 6, -1, 9, 10};
+ struct bpf_insn_array_value val = {};
+ int i;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, ARRAY_SIZE(insns));
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ val.xlated_off = map_in[i];
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0,
+ "bpf_map_update_elem"))
+ goto cleanup;
+ }
+
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+ goto cleanup;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+ goto cleanup;
+
+ ASSERT_EQ(val.xlated_off, map_out[i], "val should be equal map_out[i]");
+ }
+
+cleanup:
+ close(prog_fd);
+ close(map_fd);
+}
+
+/* Map can be used only by one BPF program */
+static void check_no_map_reuse(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd = -1, map_fd, extra_fd = -1;
+ struct bpf_insn_array_value val = {};
+ int i;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, ARRAY_SIZE(insns));
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ val.xlated_off = i;
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0, "bpf_map_update_elem"))
+ goto cleanup;
+ }
+
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+ goto cleanup;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+ goto cleanup;
+
+ ASSERT_EQ(val.xlated_off, i, "val should be equal i");
+ }
+
+ errno = 0;
+ extra_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_EQ(extra_fd, -EBUSY, "program should have been rejected (extra_fd != -EBUSY)"))
+ goto cleanup;
+
+ /* correctness: check that prog is still loadable without fd_array */
+ extra_fd = prog_load(insns, ARRAY_SIZE(insns), NULL, 0);
+ if (!ASSERT_GE(extra_fd, 0, "bpf(BPF_PROG_LOAD): expected no error"))
+ goto cleanup;
+
+cleanup:
+ close(extra_fd);
+ close(prog_fd);
+ close(map_fd);
+}
+
+static void check_bpf_no_lookup(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd = -1, map_fd;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, 1);
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ insns[0].imm = map_fd;
+
+ errno = 0;
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), NULL, 0);
+ if (!ASSERT_EQ(prog_fd, -EINVAL, "program should have been rejected (prog_fd != -EINVAL)"))
+ goto cleanup;
+
+ /* correctness: check that prog is still loadable with normal map */
+ close(map_fd);
+ map_fd = map_create(BPF_MAP_TYPE_ARRAY, 1);
+ insns[0].imm = map_fd;
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), NULL, 0);
+ if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+ goto cleanup;
+
+cleanup:
+ close(prog_fd);
+ close(map_fd);
+}
+
+static void check_bpf_side(void)
+{
+ check_bpf_no_lookup();
+}
+
+/* Test if offsets are adjusted properly */
+void test_bpf_insn_array(void)
+{
+ if (test__start_subtest("one2one"))
+ check_one_to_one_mapping();
+
+ if (test__start_subtest("simple"))
+ check_simple();
+
+ if (test__start_subtest("deletions"))
+ check_deletions();
+
+ if (test__start_subtest("multiple-functions"))
+ check_with_functions();
+}
+
+/* Check all kinds of operations and related restrictions */
+void test_bpf_insn_array_ops(void)
+{
+ if (test__start_subtest("incorrect-index"))
+ check_incorrect_index();
+
+ if (test__start_subtest("no-map-reuse"))
+ check_no_map_reuse();
+
+ if (test__start_subtest("bpf-side-ops"))
+ check_bpf_side();
+}
--
2.34.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH v1 bpf-next 05/11] bpf: support instructions arrays with constants blinding
2025-08-16 18:06 [PATCH v1 bpf-next 00/11] BPF indirect jumps Anton Protopopov
` (3 preceding siblings ...)
2025-08-16 18:06 ` [PATCH v1 bpf-next 04/11] selftests/bpf: add selftests for new insn_array map Anton Protopopov
@ 2025-08-16 18:06 ` Anton Protopopov
2025-08-17 5:50 ` kernel test robot
2025-08-25 23:29 ` Eduard Zingerman
2025-08-16 18:06 ` [PATCH v1 bpf-next 06/11] selftests/bpf: test instructions arrays with blinding Anton Protopopov
` (5 subsequent siblings)
10 siblings, 2 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-16 18:06 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
When bpf_jit_harden is enabled, all constants in the BPF code are
blinded to prevent JIT spraying attacks. This happens during the JIT
phase. Adjust all the related instruction arrays accordingly.
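For example (a sketch based on the selftest added in the next patch,
where every instruction of a small test program is blinded into three
instructions): an instruction array entry pointing at original xlated
offset k must end up pointing at offset 3*k after blinding:

    xlated offset before blinding:  0  1  2  3  4
    xlated offset after blinding:   0  3  6  9  12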
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
kernel/bpf/core.c | 19 +++++++++++++++++++
kernel/bpf/verifier.c | 11 ++++++++++-
2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 5d1650af899d..27e9c30ad6dc 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1482,6 +1482,21 @@ void bpf_jit_prog_release_other(struct bpf_prog *fp, struct bpf_prog *fp_other)
bpf_prog_clone_free(fp_other);
}
+static void adjust_insn_arrays(struct bpf_prog *prog, u32 off, u32 len)
+{
+ struct bpf_map *map;
+ int i;
+
+ if (len <= 1)
+ return;
+
+ for (i = 0; i < prog->aux->used_map_cnt; i++) {
+ map = prog->aux->used_maps[i];
+ if (map->map_type == BPF_MAP_TYPE_INSN_ARRAY)
+ bpf_insn_array_adjust(map, off, len);
+ }
+}
+
struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
{
struct bpf_insn insn_buff[16], aux[2];
@@ -1537,6 +1552,9 @@ struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
clone = tmp;
insn_delta = rewritten - 1;
+ /* Instructions arrays must be updated using absolute xlated offsets */
+ adjust_insn_arrays(clone, prog->aux->subprog_start + i, rewritten);
+
/* Walk new program and skip insns we just inserted. */
insn = clone->insnsi + i + insn_delta;
insn_cnt += insn_delta;
@@ -1544,6 +1562,7 @@ struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
}
clone->blinded = 1;
+ clone->len = insn_cnt;
return clone;
}
#endif /* CONFIG_BPF_JIT */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e1f7744e132b..863b7114866b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -21539,6 +21539,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
struct bpf_insn *insn;
void *old_bpf_func;
int err, num_exentries;
+ int instructions_added = 0;
if (env->subprog_cnt <= 1)
return 0;
@@ -21613,7 +21614,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
func[i]->aux->func_idx = i;
/* Below members will be freed only at prog->aux */
func[i]->aux->btf = prog->aux->btf;
- func[i]->aux->subprog_start = subprog_start;
+ func[i]->aux->subprog_start = subprog_start + instructions_added;
func[i]->aux->func_info = prog->aux->func_info;
func[i]->aux->func_info_cnt = prog->aux->func_info_cnt;
func[i]->aux->poke_tab = prog->aux->poke_tab;
@@ -21665,7 +21666,15 @@ static int jit_subprogs(struct bpf_verifier_env *env)
func[i]->aux->might_sleep = env->subprog_info[i].might_sleep;
if (!i)
func[i]->aux->exception_boundary = env->seen_exception;
+
+ /*
+ * To properly pass the absolute subprog start to jit
+ * all instruction adjustments should be accumulated
+ */
+ instructions_added -= func[i]->len;
func[i] = bpf_int_jit_compile(func[i]);
+ instructions_added += func[i]->len;
+
if (!func[i]->jited) {
err = -ENOTSUPP;
goto out_free;
--
2.34.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH v1 bpf-next 06/11] selftests/bpf: test instructions arrays with blinding
2025-08-16 18:06 [PATCH v1 bpf-next 00/11] BPF indirect jumps Anton Protopopov
` (4 preceding siblings ...)
2025-08-16 18:06 ` [PATCH v1 bpf-next 05/11] bpf: support instructions arrays with constants blinding Anton Protopopov
@ 2025-08-16 18:06 ` Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 07/11] bpf, x86: allow indirect jumps to r8...r15 Anton Protopopov
` (4 subsequent siblings)
10 siblings, 0 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-16 18:06 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Add a specific test for instructions arrays with blinding enabled.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
.../selftests/bpf/prog_tests/bpf_insn_array.c | 92 +++++++++++++++++++
1 file changed, 92 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c b/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
index da329c2e85f2..4fa09c05c272 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
@@ -287,6 +287,95 @@ static void check_with_functions(void)
close(map_fd);
}
+static int set_bpf_jit_harden(char *level)
+{
+ char old_level;
+ int err = -1;
+ int fd = -1;
+
+ fd = open("/proc/sys/net/core/bpf_jit_harden", O_RDWR | O_NONBLOCK);
+ if (fd < 0) {
+ ASSERT_FAIL("open .../bpf_jit_harden returned %d (errno=%d)", fd, errno);
+ return -1;
+ }
+
+ err = read(fd, &old_level, 1);
+ if (err != 1) {
+ ASSERT_FAIL("read from .../bpf_jit_harden returned %d (errno=%d)", err, errno);
+ err = -1;
+ goto end;
+ }
+
+ lseek(fd, 0, SEEK_SET);
+
+ err = write(fd, level, 1);
+ if (err != 1) {
+ ASSERT_FAIL("write to .../bpf_jit_harden returned %d (errno=%d)", err, errno);
+ err = -1;
+ goto end;
+ }
+
+ err = 0;
+ *level = old_level;
+end:
+ if (fd >= 0)
+ close(fd);
+ return err;
+}
+
+static void check_blindness(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_MOV64_IMM(BPF_REG_0, 4),
+ BPF_MOV64_IMM(BPF_REG_0, 3),
+ BPF_MOV64_IMM(BPF_REG_0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd = -1, map_fd;
+ struct bpf_insn_array_value val = {};
+ char bpf_jit_harden = '@'; /* non-existing value */
+ int i;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, ARRAY_SIZE(insns));
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ val.xlated_off = i;
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0, "bpf_map_update_elem"))
+ goto cleanup;
+ }
+
+ bpf_jit_harden = '2';
+ if (set_bpf_jit_harden(&bpf_jit_harden)) {
+ bpf_jit_harden = '@'; /* open, read or write failed => no write was done */
+ goto cleanup;
+ }
+
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+ goto cleanup;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ char fmt[32];
+
+ if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+ goto cleanup;
+
+ snprintf(fmt, sizeof(fmt), "val should be equal to 3*%d", i);
+ ASSERT_EQ(val.xlated_off, i * 3, fmt);
+ }
+
+cleanup:
+ /* restore the old one */
+ if (bpf_jit_harden != '@')
+ set_bpf_jit_harden(&bpf_jit_harden);
+
+ close(prog_fd);
+ close(map_fd);
+}
+
/* Map can be used only by one BPF program */
static void check_no_map_reuse(void)
{
@@ -390,6 +479,9 @@ void test_bpf_insn_array(void)
if (test__start_subtest("multiple-functions"))
check_with_functions();
+
+ if (test__start_subtest("blindness"))
+ check_blindness();
}
/* Check all kinds of operations and related restrictions */
--
2.34.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH v1 bpf-next 07/11] bpf, x86: allow indirect jumps to r8...r15
2025-08-16 18:06 [PATCH v1 bpf-next 00/11] BPF indirect jumps Anton Protopopov
` (5 preceding siblings ...)
2025-08-16 18:06 ` [PATCH v1 bpf-next 06/11] selftests/bpf: test instructions arrays with blinding Anton Protopopov
@ 2025-08-16 18:06 ` Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps Anton Protopopov
` (3 subsequent siblings)
10 siblings, 0 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-16 18:06 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Currently, the emit_indirect_jump() function only accepts one of the
RAX, RCX, ..., RBP registers as the destination. Prepare it to accept
R8, R9, ..., R15 as well. This is necessary to enable indirect jump
support in eBPF.
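For reference, the only encoding difference for the extended registers
is the REX.B prefix (0x41) in front of the usual FF /4 indirect jump;
this is what the new __emit_indirect_jump() helper emits:

  jmp *%rax  ->     ff e0
  jmp *%rcx  ->     ff e1
  jmp *%r8   ->  41 ff e0
  jmp *%r15  ->  41 ff e7

For the retpoline/ITS thunk paths the extended registers simply select
the thunk at index reg + 8.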
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
arch/x86/net/bpf_jit_comp.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 589c3d5119f9..4bfb4faab4d7 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -659,7 +659,19 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
#define EMIT_LFENCE() EMIT3(0x0F, 0xAE, 0xE8)
-static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
+static void __emit_indirect_jump(u8 **pprog, int reg, bool ereg)
+{
+ u8 *prog = *pprog;
+
+ if (ereg)
+ EMIT1(0x41);
+
+ EMIT2(0xFF, 0xE0 + reg);
+
+ *pprog = prog;
+}
+
+static void emit_indirect_jump(u8 **pprog, int reg, bool ereg, u8 *ip)
{
u8 *prog = *pprog;
@@ -668,15 +680,15 @@ static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
emit_jump(&prog, its_static_thunk(reg), ip);
} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_LFENCE)) {
EMIT_LFENCE();
- EMIT2(0xFF, 0xE0 + reg);
+ __emit_indirect_jump(pprog, reg, ereg);
} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
OPTIMIZER_HIDE_VAR(reg);
if (cpu_feature_enabled(X86_FEATURE_CALL_DEPTH))
- emit_jump(&prog, &__x86_indirect_jump_thunk_array[reg], ip);
+ emit_jump(&prog, &__x86_indirect_jump_thunk_array[reg + 8*ereg], ip);
else
- emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
+ emit_jump(&prog, &__x86_indirect_thunk_array[reg + 8*ereg], ip);
} else {
- EMIT2(0xFF, 0xE0 + reg); /* jmp *%\reg */
+ __emit_indirect_jump(pprog, reg, ereg);
if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) || IS_ENABLED(CONFIG_MITIGATION_SLS))
EMIT1(0xCC); /* int3 */
}
@@ -796,7 +808,7 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
* rdi == ctx (1st arg)
* rcx == prog->bpf_func + X86_TAIL_CALL_OFFSET
*/
- emit_indirect_jump(&prog, 1 /* rcx */, ip + (prog - start));
+ emit_indirect_jump(&prog, 1 /* rcx */, false, ip + (prog - start));
/* out: */
ctx->tail_call_indirect_label = prog - start;
@@ -3442,7 +3454,7 @@ static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs, u8 *image,
if (err)
return err;
- emit_indirect_jump(&prog, 2 /* rdx */, image + (prog - buf));
+ emit_indirect_jump(&prog, 2 /* rdx */, false, image + (prog - buf));
*pprog = prog;
return 0;
--
2.34.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps
2025-08-16 18:06 [PATCH v1 bpf-next 00/11] BPF indirect jumps Anton Protopopov
` (6 preceding siblings ...)
2025-08-16 18:06 ` [PATCH v1 bpf-next 07/11] bpf, x86: allow indirect jumps to r8...r15 Anton Protopopov
@ 2025-08-16 18:06 ` Anton Protopopov
2025-08-18 7:57 ` Dan Carpenter
2025-08-25 23:15 ` Eduard Zingerman
2025-08-16 18:06 ` [PATCH v1 bpf-next 09/11] bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X Anton Protopopov
` (2 subsequent siblings)
10 siblings, 2 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-16 18:06 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Add support for a new instruction
BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0[, imm=fd(M)]
which does an indirect jump to a location stored in Rx. The register
Rx should have type PTR_TO_INSN. This new type ensures that the Rx
register contains a value (or a range of values) loaded from a valid
jump table, i.e., a map of type instruction array. The optional map
M can be used to associate the jump with a particular map. If it is
not given, then the verifier will collect all possible maps whose
targets lie within the current subprogram.
Example: for a C switch LLVM will generate the following code:
0: r3 = r1 # "switch (r3)"
1: if r3 > 0x13 goto +0x666 # check r3 boundaries
2: r3 <<= 0x3 # adjust to an index in array of addresses
3: r1 = 0xbeef ll # r1 is PTR_TO_MAP_VALUE, r1->map_ptr=M
5: r1 += r3 # r1 inherits boundaries from r3
6: r1 = *(u64 *)(r1 + 0x0) # r1 now has type PTR_TO_INSN
7: gotox r1[,imm=fd(M)] # jit will generate proper code
Here the gotox instruction corresponds to one particular map. It is,
however, possible to have a gotox instruction whose target can be
loaded from different maps, e.g.
0: r1 &= 0x1
1: r2 <<= 0x3
2: r3 = 0x0 ll # load from map M_1
4: r3 += r2
5: if r1 == 0x0 goto +0x4
6: r1 <<= 0x3
7: r3 = 0x0 ll # load from map M_2
9: r3 += r1
A: r1 = *(u64 *)(r3 + 0x0)
B: gotox r1 # jump to target loaded from M_1 or M_2
During the check_cfg stage, if map M is not given inside the gotox
instruction, the verifier will collect all the maps which point inside
the subprog being verified. When building the CFG, the high 16 bits of
insn_state are used, so this patch (theoretically) supports jump tables
of up to 2^16 slots.
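(As a worked example of that encoding: for a gotox whose jump table has
three distinct targets, push_goto_x_edge() advances the high half of
insn_state[t] from 0 to 1, 2 and finally 3 as the targets are pushed
onto the DFS stack one by one, analogous to how a conditional jump
explores its two edges.)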
During the verification stage, in check_indirect_jump, it is checked
that the register Rx was loaded from a particular instruction array.
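For illustration, the kind of source that LLVM lowers into the first
listing above is an ordinary dense switch; a minimal sketch (names are
made up, and the 0x13 bound in the listing implies roughly 20 cases):

  int dispatch(struct ctx *c)
  {
          switch (c->x) {         /* lowered to: bounds check, shift, load, gotox */
          case 0:  return f0(c);  /* each case body is one slot of map M */
          case 1:  return f1(c);
          /* ... */
          case 19: return f19(c);
          default: return -1;     /* taken when c->x > 0x13 */
          }
  }

Following the listing for, say, r3 == 2: the shift turns the index into
byte offset 16, r1 ends up pointing at the third 8-byte slot of M, and
the load fetches the address of the "case 2" body, which gotox then
jumps to.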
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
arch/x86/net/bpf_jit_comp.c | 11 +-
include/linux/bpf.h | 1 +
include/linux/bpf_verifier.h | 18 +-
kernel/bpf/bpf_insn_array.c | 16 +-
kernel/bpf/core.c | 1 +
kernel/bpf/verifier.c | 491 +++++++++++++++++++++++++++++++++--
6 files changed, 515 insertions(+), 23 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 4bfb4faab4d7..f419a89b0147 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -671,9 +671,11 @@ static void __emit_indirect_jump(u8 **pprog, int reg, bool ereg)
*pprog = prog;
}
-static void emit_indirect_jump(u8 **pprog, int reg, bool ereg, u8 *ip)
+static void emit_indirect_jump(u8 **pprog, int bpf_reg, u8 *ip)
{
u8 *prog = *pprog;
+ int reg = reg2hex[bpf_reg];
+ bool ereg = is_ereg(bpf_reg);
if (cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) {
OPTIMIZER_HIDE_VAR(reg);
@@ -808,7 +810,7 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
* rdi == ctx (1st arg)
* rcx == prog->bpf_func + X86_TAIL_CALL_OFFSET
*/
- emit_indirect_jump(&prog, 1 /* rcx */, false, ip + (prog - start));
+ emit_indirect_jump(&prog, BPF_REG_4 /* R4 -> rcx */, ip + (prog - start));
/* out: */
ctx->tail_call_indirect_label = prog - start;
@@ -2518,6 +2520,9 @@ st: if (is_imm8(insn->off))
break;
+ case BPF_JMP | BPF_JA | BPF_X:
+ emit_indirect_jump(&prog, insn->dst_reg, image + addrs[i - 1]);
+ break;
case BPF_JMP | BPF_JA:
case BPF_JMP32 | BPF_JA:
if (BPF_CLASS(insn->code) == BPF_JMP) {
@@ -3454,7 +3459,7 @@ static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs, u8 *image,
if (err)
return err;
- emit_indirect_jump(&prog, 2 /* rdx */, false, image + (prog - buf));
+ emit_indirect_jump(&prog, BPF_REG_3 /* R3 -> rdx */, image + (prog - buf));
*pprog = prog;
return 0;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 534ce7733277..77d7f78315ea 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -970,6 +970,7 @@ enum bpf_reg_type {
PTR_TO_ARENA,
PTR_TO_BUF, /* reg points to a read/write buffer */
PTR_TO_FUNC, /* reg points to a bpf program function */
+ PTR_TO_INSN, /* reg points to a bpf program instruction */
CONST_PTR_TO_DYNPTR, /* reg points to a const struct bpf_dynptr */
__BPF_REG_TYPE_MAX,
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index aca43c284203..6e68e0082c81 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -77,7 +77,15 @@ struct bpf_reg_state {
* the map_uid is non-zero for registers
* pointing to inner maps.
*/
- u32 map_uid;
+ union {
+ u32 map_uid;
+
+ /* Used to track boundaries of a PTR_TO_INSN */
+ struct {
+ u32 min_index;
+ u32 max_index;
+ };
+ };
};
/* for PTR_TO_BTF_ID */
@@ -542,6 +550,11 @@ struct bpf_insn_aux_data {
struct {
u32 map_index; /* index into used_maps[] */
u32 map_off; /* offset from value base address */
+
+ struct jt { /* jump table for gotox instruction */
+ u32 *off;
+ int off_cnt;
+ } jt;
};
struct {
enum bpf_reg_type reg_type; /* type of pseudo_btf_id */
@@ -586,6 +599,9 @@ struct bpf_insn_aux_data {
u8 fastcall_spills_num:3;
u8 arg_prog:4;
+ /* true if jt->off was allocated */
+ bool jt_allocated;
+
/* below fields are initialized once */
unsigned int orig_idx; /* original instruction index */
bool jmp_point;
diff --git a/kernel/bpf/bpf_insn_array.c b/kernel/bpf/bpf_insn_array.c
index 0c8dac62f457..d077a5aa2c7c 100644
--- a/kernel/bpf/bpf_insn_array.c
+++ b/kernel/bpf/bpf_insn_array.c
@@ -1,7 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/bpf.h>
-#include <linux/sort.h>
#define MAX_INSN_ARRAY_ENTRIES 256
@@ -173,6 +172,20 @@ static u64 insn_array_mem_usage(const struct bpf_map *map)
return insn_array_alloc_size(map->max_entries) + extra_size;
}
+static int insn_array_map_direct_value_addr(const struct bpf_map *map, u64 *imm, u32 off)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+
+ if ((off % sizeof(long)) != 0 ||
+ (off / sizeof(long)) >= map->max_entries)
+ return -EINVAL;
+
+ /* from BPF's point of view, this map is a jump table */
+ *imm = (unsigned long)insn_array->ips + off / sizeof(long);
+
+ return 0;
+}
+
BTF_ID_LIST_SINGLE(insn_array_btf_ids, struct, bpf_insn_array)
const struct bpf_map_ops insn_array_map_ops = {
@@ -185,6 +198,7 @@ const struct bpf_map_ops insn_array_map_ops = {
.map_delete_elem = insn_array_delete_elem,
.map_check_btf = insn_array_check_btf,
.map_mem_usage = insn_array_mem_usage,
+ .map_direct_value_addr = insn_array_map_direct_value_addr,
.map_btf_id = &insn_array_btf_ids[0],
};
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 27e9c30ad6dc..1ecd2362f4ce 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1739,6 +1739,7 @@ bool bpf_opcode_in_insntable(u8 code)
[BPF_LD | BPF_IND | BPF_B] = true,
[BPF_LD | BPF_IND | BPF_H] = true,
[BPF_LD | BPF_IND | BPF_W] = true,
+ [BPF_JMP | BPF_JA | BPF_X] = true,
[BPF_JMP | BPF_JCOND] = true,
};
#undef BPF_INSN_3_TBL
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 863b7114866b..c2cfa55913f8 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -212,6 +212,7 @@ static int ref_set_non_owning(struct bpf_verifier_env *env,
static void specialize_kfunc(struct bpf_verifier_env *env,
u32 func_id, u16 offset, unsigned long *addr);
static bool is_trusted_reg(const struct bpf_reg_state *reg);
+static int add_used_map(struct bpf_verifier_env *env, int fd);
static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
{
@@ -2957,14 +2958,13 @@ static int cmp_subprogs(const void *a, const void *b)
((struct bpf_subprog_info *)b)->start;
}
-/* Find subprogram that contains instruction at 'off' */
-static struct bpf_subprog_info *find_containing_subprog(struct bpf_verifier_env *env, int off)
+static int find_containing_subprog_idx(struct bpf_verifier_env *env, int off)
{
struct bpf_subprog_info *vals = env->subprog_info;
int l, r, m;
if (off >= env->prog->len || off < 0 || env->subprog_cnt == 0)
- return NULL;
+ return -1;
l = 0;
r = env->subprog_cnt - 1;
@@ -2975,7 +2975,19 @@ static struct bpf_subprog_info *find_containing_subprog(struct bpf_verifier_env
else
r = m - 1;
}
- return &vals[l];
+ return l;
+}
+
+/* Find subprogram that contains instruction at 'off' */
+static struct bpf_subprog_info *find_containing_subprog(struct bpf_verifier_env *env, int off)
+{
+ int subprog_idx;
+
+ subprog_idx = find_containing_subprog_idx(env, off);
+ if (subprog_idx < 0)
+ return NULL;
+
+ return &env->subprog_info[subprog_idx];
}
/* Find subprogram that starts exactly at 'off' */
@@ -6072,6 +6084,14 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
return 0;
}
+static u32 map_mem_size(const struct bpf_map *map)
+{
+ if (map->map_type == BPF_MAP_TYPE_INSN_ARRAY)
+ return map->max_entries * sizeof(long);
+
+ return map->value_size;
+}
+
/* check read/write into a map element with possible variable offset */
static int check_map_access(struct bpf_verifier_env *env, u32 regno,
int off, int size, bool zero_size_allowed,
@@ -6081,11 +6101,11 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
struct bpf_func_state *state = vstate->frame[vstate->curframe];
struct bpf_reg_state *reg = &state->regs[regno];
struct bpf_map *map = reg->map_ptr;
+ u32 mem_size = map_mem_size(map);
struct btf_record *rec;
int err, i;
- err = check_mem_region_access(env, regno, off, size, map->value_size,
- zero_size_allowed);
+ err = check_mem_region_access(env, regno, off, size, mem_size, zero_size_allowed);
if (err)
return err;
@@ -7790,12 +7810,18 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type,
bool allow_trust_mismatch);
+static bool map_is_insn_array(struct bpf_map *map)
+{
+ return map && map->map_type == BPF_MAP_TYPE_INSN_ARRAY;
+}
+
static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
bool strict_alignment_once, bool is_ldsx,
bool allow_trust_mismatch, const char *ctx)
{
struct bpf_reg_state *regs = cur_regs(env);
enum bpf_reg_type src_reg_type;
+ struct bpf_map *map_ptr_copy = NULL;
int err;
/* check src operand */
@@ -7810,6 +7836,9 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
src_reg_type = regs[insn->src_reg].type;
+ if (src_reg_type == PTR_TO_MAP_VALUE && map_is_insn_array(regs[insn->src_reg].map_ptr))
+ map_ptr_copy = regs[insn->src_reg].map_ptr;
+
/* Check if (src_reg + off) is readable. The state of dst_reg will be
* updated by this call.
*/
@@ -7820,6 +7849,13 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
allow_trust_mismatch);
err = err ?: reg_bounds_sanity_check(env, ®s[insn->dst_reg], ctx);
+ if (map_ptr_copy) {
+ regs[insn->dst_reg].type = PTR_TO_INSN;
+ regs[insn->dst_reg].map_ptr = map_ptr_copy;
+ regs[insn->dst_reg].min_index = regs[insn->src_reg].min_index;
+ regs[insn->dst_reg].max_index = regs[insn->src_reg].max_index;
+ }
+
return err;
}
@@ -14457,6 +14493,8 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
struct bpf_func_state *state = vstate->frame[vstate->curframe];
struct bpf_reg_state *regs = state->regs, *dst_reg;
bool known = tnum_is_const(off_reg->var_off);
+ bool ptr_to_insn_array = base_type(ptr_reg->type) == PTR_TO_MAP_VALUE &&
+ map_is_insn_array(ptr_reg->map_ptr);
s64 smin_val = off_reg->smin_value, smax_val = off_reg->smax_value,
smin_ptr = ptr_reg->smin_value, smax_ptr = ptr_reg->smax_value;
u64 umin_val = off_reg->umin_value, umax_val = off_reg->umax_value,
@@ -14554,6 +14592,36 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
switch (opcode) {
case BPF_ADD:
+ if (ptr_to_insn_array) {
+ u32 min_index = dst_reg->min_index;
+ u32 max_index = dst_reg->max_index;
+
+ if ((umin_val + ptr_reg->off) > (u64) U32_MAX * sizeof(long)) {
+ verbose(env, "umin_value %llu + offset %u is too big to convert to index\n",
+ umin_val, ptr_reg->off);
+ return -EACCES;
+ }
+ if ((umax_val + ptr_reg->off) > (u64) U32_MAX * sizeof(long)) {
+ verbose(env, "umax_value %llu + offset %u is too big to convert to index\n",
+ umax_val, ptr_reg->off);
+ return -EACCES;
+ }
+
+ min_index += (umin_val + ptr_reg->off) / sizeof(long);
+ max_index += (umax_val + ptr_reg->off) / sizeof(long);
+
+ if (min_index >= ptr_reg->map_ptr->max_entries) {
+ verbose(env, "min_index %u points to outside of map\n", min_index);
+ return -EACCES;
+ }
+ if (max_index >= ptr_reg->map_ptr->max_entries) {
+ verbose(env, "max_index %u points to outside of map\n", max_index);
+ return -EACCES;
+ }
+
+ dst_reg->min_index = min_index;
+ dst_reg->max_index = max_index;
+ }
/* We can take a fixed offset as long as it doesn't overflow
* the s32 'off' field
*/
@@ -14598,6 +14666,11 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
}
break;
case BPF_SUB:
+ if (ptr_to_insn_array) {
+ verbose(env, "Operation %s on ptr to instruction set map is prohibited\n",
+ bpf_alu_string[opcode >> 4]);
+ return -EACCES;
+ }
if (dst_reg == off_reg) {
/* scalar -= pointer. Creates an unknown scalar */
verbose(env, "R%d tried to subtract pointer from scalar\n",
@@ -16943,7 +17016,8 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
}
dst_reg->type = PTR_TO_MAP_VALUE;
dst_reg->off = aux->map_off;
- WARN_ON_ONCE(map->max_entries != 1);
+ WARN_ON_ONCE(map->map_type != BPF_MAP_TYPE_INSN_ARRAY &&
+ map->max_entries != 1);
/* We want reg->id to be same (0) as map_value is not distinct */
} else if (insn->src_reg == BPF_PSEUDO_MAP_FD ||
insn->src_reg == BPF_PSEUDO_MAP_IDX) {
@@ -17696,6 +17770,246 @@ static int mark_fastcall_patterns(struct bpf_verifier_env *env)
return 0;
}
+#define SET_HIGH(STATE, LAST) STATE = (STATE & 0xffffU) | ((LAST) << 16)
+#define GET_HIGH(STATE) ((u16)((STATE) >> 16))
+
+static int push_goto_x_edge(int t, struct bpf_verifier_env *env, struct jt *jt)
+{
+ int *insn_stack = env->cfg.insn_stack;
+ int *insn_state = env->cfg.insn_state;
+ u16 prev;
+ int w;
+
+ for (prev = GET_HIGH(insn_state[t]); prev < jt->off_cnt; prev++) {
+ w = jt->off[prev];
+
+ /* EXPLORED || DISCOVERED */
+ if (insn_state[w])
+ continue;
+
+ break;
+ }
+
+ if (prev == jt->off_cnt)
+ return DONE_EXPLORING;
+
+ mark_prune_point(env, t);
+
+ if (env->cfg.cur_stack >= env->prog->len)
+ return -E2BIG;
+ insn_stack[env->cfg.cur_stack++] = w;
+
+ mark_jmp_point(env, w);
+
+ SET_HIGH(insn_state[t], prev + 1);
+ return KEEP_EXPLORING;
+}
+
+static int copy_insn_array(struct bpf_map *map, u32 start, u32 end, u32 *off)
+{
+ struct bpf_insn_array_value *value;
+ u32 i;
+
+ for (i = start; i <= end; i++) {
+ value = map->ops->map_lookup_elem(map, &i);
+ if (!value)
+ return -EINVAL;
+ off[i - start] = value->xlated_off;
+ }
+ return 0;
+}
+
+static int cmp_ptr_to_u32(const void *a, const void *b)
+{
+ return *(u32 *)a - *(u32 *)b;
+}
+
+static int sort_insn_array_uniq(u32 *off, int off_cnt)
+{
+ int unique = 1;
+ int i;
+
+ sort(off, off_cnt, sizeof(off[0]), cmp_ptr_to_u32, NULL);
+
+ for (i = 1; i < off_cnt; i++)
+ if (off[i] != off[unique - 1])
+ off[unique++] = off[i];
+
+ return unique;
+}
+
+/*
+ * sort_unique({map[start], ..., map[end]}) into off
+ */
+static int copy_insn_array_uniq(struct bpf_map *map, u32 start, u32 end, u32 *off)
+{
+ u32 n = end - start + 1;
+ int err;
+
+ err = copy_insn_array(map, start, end, off);
+ if (err)
+ return err;
+
+ return sort_insn_array_uniq(off, n);
+}
+
+/*
+ * Copy all unique offsets from the map
+ */
+static int jt_from_map(struct bpf_map *map, struct jt *jt)
+{
+ u32 *off;
+ int n;
+
+ off = kvcalloc(map->max_entries, sizeof(u32), GFP_KERNEL_ACCOUNT);
+ if (!off)
+ return -ENOMEM;
+
+ n = copy_insn_array_uniq(map, 0, map->max_entries - 1, off);
+ if (n < 0) {
+ kvfree(off);
+ return n;
+ }
+
+ jt->off = off;
+ jt->off_cnt = n;
+ return 0;
+}
+
+/*
+ * Find and collect all maps which fit in the subprog. Return the result as one
+ * combined jump table in jt->off (allocated with kvcalloc
+ * combined jump table in jt->off (allocated with kvcalloc).
+static int jt_from_subprog(struct bpf_verifier_env *env,
+ int subprog_start,
+ int subprog_end,
+ struct jt *jt)
+{
+ struct bpf_map *map;
+ struct jt jt_cur;
+ u32 *off;
+ int err;
+ int i;
+
+ jt->off = NULL;
+ jt->off_cnt = 0;
+
+ for (i = 0; i < env->insn_array_map_cnt; i++) {
+ /*
+ * TODO (when needed): collect only jump tables, not static keys
+ * or maps for indirect calls
+ */
+ map = env->insn_array_maps[i];
+
+ err = jt_from_map(map, &jt_cur);
+ if (err) {
+ kvfree(jt->off);
+ return err;
+ }
+
+ /*
+ * This is enough to check one element. The full table is
+ * checked to fit inside the subprog later in create_jt()
+ */
+ if (jt_cur.off[0] >= subprog_start && jt_cur.off[0] < subprog_end) {
+ off = kvrealloc(jt->off, (jt->off_cnt + jt_cur.off_cnt) << 2, GFP_KERNEL_ACCOUNT);
+ if (!off) {
+ kvfree(jt_cur.off);
+ kvfree(jt->off);
+ return -ENOMEM;
+ }
+ memcpy(off + jt->off_cnt, jt_cur.off, jt_cur.off_cnt << 2);
+ jt->off = off;
+ jt->off_cnt += jt_cur.off_cnt;
+ }
+
+ kvfree(jt_cur.off);
+ }
+
+ if (jt->off == NULL) {
+ verbose(env, "no jump tables found for subprog starting at %u\n", subprog_start);
+ return -EINVAL;
+ }
+
+ jt->off_cnt = sort_insn_array_uniq(jt->off, jt->off_cnt);
+ return 0;
+}
+
+static int create_jt(int t, struct bpf_verifier_env *env, int fd, struct jt *jt)
+{
+ struct bpf_subprog_info *subprog;
+ int subprog_idx, subprog_start, subprog_end;
+ struct bpf_map *map;
+ int map_idx;
+ int ret;
+ int i;
+
+ if (env->subprog_cnt == 0)
+ return -EFAULT;
+
+ subprog_idx = find_containing_subprog_idx(env, t);
+ if (subprog_idx < 0) {
+ verbose(env, "can't find subprog containing instruction %d\n", t);
+ return -EFAULT;
+ }
+ subprog = &env->subprog_info[subprog_idx];
+ subprog_start = subprog->start;
+ subprog_end = (subprog + 1)->start;
+
+ map_idx = add_used_map(env, fd);
+ if (map_idx >= 0) {
+ map = env->used_maps[map_idx];
+ if (map->map_type != BPF_MAP_TYPE_INSN_ARRAY) {
+ verbose(env, "map type %d in the gotox insn %d is incorrect\n",
+ map->map_type, t);
+ return -EINVAL;
+ }
+
+ env->insn_aux_data[t].map_index = map_idx;
+
+ ret = jt_from_map(map, jt);
+ if (ret)
+ return ret;
+ } else {
+ ret = jt_from_subprog(env, subprog_start, subprog_end, jt);
+ if (ret)
+ return ret;
+ }
+
+ /* Check that every element of the jump table fits within the given subprogram */
+ for (i = 0; i < jt->off_cnt; i++) {
+ if (jt->off[i] < subprog_start || jt->off[i] >= subprog_end) {
+ verbose(env, "jump table for insn %d points outside of the subprog [%u,%u]",
+ t, subprog_start, subprog_end);
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+/* "conditional jump with N edges" */
+static int visit_goto_x_insn(int t, struct bpf_verifier_env *env, int fd)
+{
+ struct jt *jt = &env->insn_aux_data[t].jt;
+ int ret;
+
+ if (jt->off == NULL) {
+ ret = create_jt(t, env, fd, jt);
+ if (ret)
+ return ret;
+ }
+
+ /*
+ * Mark jt as allocated. Otherwise it is not possible to check whether
+ * it was allocated in the code which frees the memory (jt is part of
+ * a union).
+ */
+ env->insn_aux_data[t].jt_allocated = true;
+
+ return push_goto_x_edge(t, env, jt);
+}
+
/* Visits the instruction at index t and returns one of the following:
* < 0 - an error occurred
* DONE_EXPLORING - the instruction was fully explored
@@ -17786,8 +18100,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
return visit_func_call_insn(t, insns, env, insn->src_reg == BPF_PSEUDO_CALL);
case BPF_JA:
- if (BPF_SRC(insn->code) != BPF_K)
- return -EINVAL;
+ if (BPF_SRC(insn->code) == BPF_X)
+ return visit_goto_x_insn(t, env, insn->imm);
if (BPF_CLASS(insn->code) == BPF_JMP)
off = insn->off;
@@ -17818,6 +18132,13 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
}
}
+static bool insn_is_gotox(struct bpf_insn *insn)
+{
+ return BPF_CLASS(insn->code) == BPF_JMP &&
+ BPF_OP(insn->code) == BPF_JA &&
+ BPF_SRC(insn->code) == BPF_X;
+}
+
/* non-recursive depth-first-search to detect loops in BPF program
* loop == back-edge in directed graph
*/
@@ -18679,6 +19000,10 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
return regs_exact(rold, rcur, idmap) && rold->frameno == rcur->frameno;
case PTR_TO_ARENA:
return true;
+ case PTR_TO_INSN:
+ /* cur ⊆ old */
+ return (rcur->min_index >= rold->min_index &&
+ rcur->max_index <= rold->max_index);
default:
return regs_exact(rold, rcur, idmap);
}
@@ -19825,6 +20150,67 @@ static int process_bpf_exit_full(struct bpf_verifier_env *env,
return PROCESS_BPF_EXIT;
}
+/* gotox *dst_reg */
+static int check_indirect_jump(struct bpf_verifier_env *env, struct bpf_insn *insn)
+{
+ struct bpf_verifier_state *other_branch;
+ struct bpf_reg_state *dst_reg;
+ struct bpf_map *map;
+ int err = 0;
+ u32 *xoff;
+ int n;
+ int i;
+
+ dst_reg = reg_state(env, insn->dst_reg);
+ if (dst_reg->type != PTR_TO_INSN) {
+ verbose(env, "BPF_JA|BPF_X R%d has type %d, expected PTR_TO_INSN\n",
+ insn->dst_reg, dst_reg->type);
+ return -EINVAL;
+ }
+
+ map = dst_reg->map_ptr;
+ if (!map)
+ return -EINVAL;
+
+ if (map->map_type != BPF_MAP_TYPE_INSN_ARRAY)
+ return -EINVAL;
+
+ if (dst_reg->max_index >= map->max_entries) {
+ verbose(env, "BPF_JA|BPF_X R%d is out of map boundaries: index=%u, max_index=%u\n",
+ insn->dst_reg, dst_reg->max_index, map->max_entries-1);
+ return -EINVAL;
+ }
+
+ xoff = kvcalloc(dst_reg->max_index - dst_reg->min_index + 1, sizeof(u32), GFP_KERNEL_ACCOUNT);
+ if (!xoff)
+ return -ENOMEM;
+
+ n = copy_insn_array_uniq(map, dst_reg->min_index, dst_reg->max_index, xoff);
+ if (n < 0) {
+ err = n;
+ goto free_off;
+ }
+ if (n == 0) {
+ verbose(env, "register R%d doesn't point to any offset in map id=%d\n",
+ insn->dst_reg, map->id);
+ err = -EINVAL;
+ goto free_off;
+ }
+
+ for (i = 0; i < n - 1; i++) {
+ other_branch = push_stack(env, xoff[i], env->insn_idx, false);
+ if (IS_ERR(other_branch)) {
+ err = PTR_ERR(other_branch);
+ goto free_off;
+ }
+ }
+ env->insn_idx = xoff[n-1];
+
+free_off:
+ kvfree(xoff);
+ return err;
+}
+
static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
{
int err;
@@ -19927,6 +20313,9 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
mark_reg_scratched(env, BPF_REG_0);
} else if (opcode == BPF_JA) {
+ if (BPF_SRC(insn->code) == BPF_X)
+ return check_indirect_jump(env, insn);
+
if (BPF_SRC(insn->code) != BPF_K ||
insn->src_reg != BPF_REG_0 ||
insn->dst_reg != BPF_REG_0 ||
@@ -20423,6 +20812,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
case BPF_MAP_TYPE_QUEUE:
case BPF_MAP_TYPE_STACK:
case BPF_MAP_TYPE_ARENA:
+ case BPF_MAP_TYPE_INSN_ARRAY:
break;
default:
verbose(env,
@@ -20981,6 +21371,23 @@ static int bpf_adj_linfo_after_remove(struct bpf_verifier_env *env, u32 off,
return 0;
}
+/*
+ * Clean up dynamically allocated fields of aux data for instructions [start, ..., end]
+ */
+static void clear_insn_aux_data(struct bpf_insn_aux_data *aux_data, int start, int end)
+{
+ int i;
+
+ for (i = start; i <= end; i++) {
+ if (aux_data[i].jt_allocated) {
+ kvfree(aux_data[i].jt.off);
+ aux_data[i].jt.off = NULL;
+ aux_data[i].jt.off_cnt = 0;
+ aux_data[i].jt_allocated = false;
+ }
+ }
+}
+
static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
{
struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
@@ -21004,6 +21411,8 @@ static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
adjust_insn_arrays_after_remove(env, off, cnt);
+ clear_insn_aux_data(aux_data, off, off + cnt - 1);
+
memmove(aux_data + off, aux_data + off + cnt,
sizeof(*aux_data) * (orig_prog_len - off - cnt));
@@ -21643,6 +22052,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
func[i]->aux->jited_linfo = prog->aux->jited_linfo;
func[i]->aux->linfo_idx = env->subprog_info[i].linfo_idx;
func[i]->aux->arena = prog->aux->arena;
+ func[i]->aux->used_maps = env->used_maps;
+ func[i]->aux->used_map_cnt = env->used_map_cnt;
num_exentries = 0;
insn = func[i]->insnsi;
for (j = 0; j < func[i]->len; j++, insn++) {
@@ -24175,18 +24586,18 @@ static bool can_jump(struct bpf_insn *insn)
return false;
}
-static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
+static int insn_successors_regular(struct bpf_prog *prog, u32 insn_idx, u32 *succ)
{
- struct bpf_insn *insn = &prog->insnsi[idx];
+ struct bpf_insn *insn = &prog->insnsi[insn_idx];
int i = 0, insn_sz;
u32 dst;
insn_sz = bpf_is_ldimm64(insn) ? 2 : 1;
- if (can_fallthrough(insn) && idx + 1 < prog->len)
- succ[i++] = idx + insn_sz;
+ if (can_fallthrough(insn) && insn_idx + 1 < prog->len)
+ succ[i++] = insn_idx + insn_sz;
if (can_jump(insn)) {
- dst = idx + jmp_offset(insn) + 1;
+ dst = insn_idx + jmp_offset(insn) + 1;
if (i == 0 || succ[0] != dst)
succ[i++] = dst;
}
@@ -24194,6 +24605,36 @@ static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
return i;
}
+static int insn_successors_gotox(struct bpf_verifier_env *env,
+ struct bpf_prog *prog,
+ u32 insn_idx, u32 **succ)
+{
+ struct jt *jt = &env->insn_aux_data[insn_idx].jt;
+
+ if (WARN_ON_ONCE(!jt->off || !jt->off_cnt))
+ return -EFAULT;
+
+ *succ = jt->off;
+ return jt->off_cnt;
+}
+
+/*
+ * Fill in *succ[0],...,*succ[n-1] with successors. The default *succ
+ * pointer (of size 2) may be replaced with a custom one if more
+ * elements are required (e.g., for an indirect jump).
+ */
+static int insn_successors(struct bpf_verifier_env *env,
+ struct bpf_prog *prog,
+ u32 insn_idx, u32 **succ)
+{
+ struct bpf_insn *insn = &prog->insnsi[insn_idx];
+
+ if (unlikely(insn_is_gotox(insn)))
+ return insn_successors_gotox(env, prog, insn_idx, succ);
+
+ return insn_successors_regular(prog, insn_idx, *succ);
+}
+
/* Each field is a register bitmask */
struct insn_live_regs {
u16 use; /* registers read by instruction */
@@ -24387,11 +24828,17 @@ static int compute_live_registers(struct bpf_verifier_env *env)
int insn_idx = env->cfg.insn_postorder[i];
struct insn_live_regs *live = &state[insn_idx];
int succ_num;
- u32 succ[2];
+ u32 _succ[2];
+ u32 *succ = &_succ[0];
u16 new_out = 0;
u16 new_in = 0;
- succ_num = insn_successors(env->prog, insn_idx, succ);
+ succ_num = insn_successors(env, env->prog, insn_idx, &succ);
+ if (succ_num < 0) {
+ err = succ_num;
+ goto out;
+
+ }
for (int s = 0; s < succ_num; ++s)
new_out |= state[succ[s]].in;
new_in = (new_out & ~live->def) | live->use;
@@ -24453,7 +24900,6 @@ static int compute_scc(struct bpf_verifier_env *env)
u32 next_preorder_num;
u32 next_scc_id;
bool assign_scc;
- u32 succ[2];
next_preorder_num = 1;
next_scc_id = 1;
@@ -24552,6 +24998,9 @@ static int compute_scc(struct bpf_verifier_env *env)
dfs[0] = i;
dfs_continue:
while (dfs_sz) {
+ u32 _succ[2];
+ u32 *succ = &_succ[0];
+
w = dfs[dfs_sz - 1];
if (pre[w] == 0) {
low[w] = next_preorder_num;
@@ -24560,7 +25009,12 @@ static int compute_scc(struct bpf_verifier_env *env)
stack[stack_sz++] = w;
}
/* Visit 'w' successors */
- succ_cnt = insn_successors(env->prog, w, succ);
+ succ_cnt = insn_successors(env, env->prog, w, &succ);
+ if (succ_cnt < 0) {
+ err = succ_cnt;
+ goto exit;
+
+ }
for (j = 0; j < succ_cnt; ++j) {
if (pre[succ[j]]) {
low[w] = min(low[w], low[succ[j]]);
@@ -24882,6 +25336,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
err_unlock:
if (!is_priv)
mutex_unlock(&bpf_verifier_lock);
+ clear_insn_aux_data(env->insn_aux_data, 0, env->prog->len - 1);
vfree(env->insn_aux_data);
err_free_env:
kvfree(env->cfg.insn_postorder);
--
2.34.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH v1 bpf-next 09/11] bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X
2025-08-16 18:06 [PATCH v1 bpf-next 00/11] BPF indirect jumps Anton Protopopov
` (7 preceding siblings ...)
2025-08-16 18:06 ` [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps Anton Protopopov
@ 2025-08-16 18:06 ` Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 11/11] selftests/bpf: add selftests for " Anton Protopopov
10 siblings, 0 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-16 18:06 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Add support for the indirect jump instruction.
Example output from bpftool:
0: (79) r3 = *(u64 *)(r1 +0)
1: (25) if r3 > 0x4 goto pc+666
2: (67) r3 <<= 3
3: (18) r1 = 0xffffbeefspameggs
5: (0f) r1 += r3
6: (79) r1 = *(u64 *)(r1 +0)
7: (0d) gotox r1
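For reference, the 0x0d opcode in the last line is simply
BPF_JMP (0x05) | BPF_JA (0x00) | BPF_X (0x08), i.e. exactly the
class/op/source combination that print_bpf_insn() now renders as
"gotox".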
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
kernel/bpf/disasm.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
index 20883c6b1546..4a1ecc6f7582 100644
--- a/kernel/bpf/disasm.c
+++ b/kernel/bpf/disasm.c
@@ -183,6 +183,13 @@ static inline bool is_mov_percpu_addr(const struct bpf_insn *insn)
return insn->code == (BPF_ALU64 | BPF_MOV | BPF_X) && insn->off == BPF_ADDR_PERCPU;
}
+static void print_bpf_ja_indirect(bpf_insn_print_t verbose,
+ void *private_data,
+ const struct bpf_insn *insn)
+{
+ verbose(private_data, "(%02x) gotox r%d\n", insn->code, insn->dst_reg);
+}
+
void print_bpf_insn(const struct bpf_insn_cbs *cbs,
const struct bpf_insn *insn,
bool allow_ptr_leaks)
@@ -358,6 +365,8 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
} else if (insn->code == (BPF_JMP | BPF_JA)) {
verbose(cbs->private_data, "(%02x) goto pc%+d\n",
insn->code, insn->off);
+ } else if (insn->code == (BPF_JMP | BPF_JA | BPF_X)) {
+ print_bpf_ja_indirect(verbose, cbs->private_data, insn);
} else if (insn->code == (BPF_JMP | BPF_JCOND) &&
insn->src_reg == BPF_MAY_GOTO) {
verbose(cbs->private_data, "(%02x) may_goto pc%+d\n",
--
2.34.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps
2025-08-16 18:06 [PATCH v1 bpf-next 00/11] BPF indirect jumps Anton Protopopov
` (8 preceding siblings ...)
2025-08-16 18:06 ` [PATCH v1 bpf-next 09/11] bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X Anton Protopopov
@ 2025-08-16 18:06 ` Anton Protopopov
2025-08-21 0:20 ` Andrii Nakryiko
2025-08-26 0:06 ` Eduard Zingerman
2025-08-16 18:06 ` [PATCH v1 bpf-next 11/11] selftests/bpf: add selftests for " Anton Protopopov
10 siblings, 2 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-16 18:06 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
For the v5 instruction set, LLVM is now allowed to generate indirect
jumps for switch statements and for 'goto *rX' assembly. Every such
jump will be accompanied by the necessary metadata, e.g. (`llvm-objdump
-Sr ...`):
0: r2 = 0x0 ll
0000000000000030: R_BPF_64_64 BPF.JT.0.0
Here BPF.JT.0.0 is a symbol residing in the .jumptables section:
Symbol table:
4: 0000000000000000 240 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0
The -bpf-min-jump-table-entries LLVM option may be used to control
the minimal size of a switch which will be converted to an indirect
jump.
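A build invocation could look roughly like the following (a sketch
only: it assumes an LLVM build that carries the BPF jump table support,
-mllvm is the usual way to pass backend options through clang, and the
-mcpu level / exact flag spelling may still change):

  clang --target=bpf -O2 -mcpu=v5 \
        -mllvm -bpf-min-jump-table-entries=4 \
        -c prog.bpf.c -o prog.bpf.o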
The code generated by LLVM for a switch will look, approximately,
like this:
0: rI <<= 3                 # case index -> byte offset
1: rY = jump_table_x ll     # ld_imm64 relocated against BPF.JT.x.y
3: rY += rI
4: rX = *(u64 *)(rY + 0)
5: gotox rX
Right now there is no robust way to associate the jump with the
corresponding map, so libbpf doesn't insert a map file descriptor
into the gotox instruction.
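Instead, libbpf turns each referenced .jumptables symbol into its own
BPF_MAP_TYPE_INSN_ARRAY map while relocating: every 8-byte slot of the
symbol holds the byte offset of a target instruction, so the slot value
is divided by 8 and adjusted for the subprog's placement in the main
program before being written into the map, and the ld_imm64 is then
rewritten into a BPF_PSEUDO_MAP_VALUE load of that map's fd. As a
worked example (with a zero adjustment), a slot value of 0x90 becomes
xlated_off 18.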
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
.../bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
tools/bpf/bpftool/map.c | 2 +-
tools/lib/bpf/libbpf.c | 159 +++++++++++++++---
tools/lib/bpf/libbpf_probes.c | 4 +
tools/lib/bpf/linker.c | 12 +-
5 files changed, 153 insertions(+), 26 deletions(-)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 252e4c538edb..3377d4a01c62 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -55,7 +55,7 @@ MAP COMMANDS
| | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
| | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
| | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
-| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** }
+| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** | **insn_array** }
DESCRIPTION
===========
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index c9de44a45778..79b90f274bef 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -1477,7 +1477,7 @@ static int do_help(int argc, char **argv)
" devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
" cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
" queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
- " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena }\n"
+ " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena | insn_array }\n"
" " HELP_SPEC_OPTIONS " |\n"
" {-f|--bpffs} | {-n|--nomount} }\n"
"",
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index fe4fc5438678..a5f04544c09c 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -191,6 +191,7 @@ static const char * const map_type_name[] = {
[BPF_MAP_TYPE_USER_RINGBUF] = "user_ringbuf",
[BPF_MAP_TYPE_CGRP_STORAGE] = "cgrp_storage",
[BPF_MAP_TYPE_ARENA] = "arena",
+ [BPF_MAP_TYPE_INSN_ARRAY] = "insn_array",
};
static const char * const prog_type_name[] = {
@@ -372,6 +373,7 @@ enum reloc_type {
RELO_EXTERN_CALL,
RELO_SUBPROG_ADDR,
RELO_CORE,
+ RELO_INSN_ARRAY,
};
struct reloc_desc {
@@ -383,6 +385,7 @@ struct reloc_desc {
int map_idx;
int sym_off;
int ext_idx;
+ int sym_size;
};
};
};
@@ -496,6 +499,10 @@ struct bpf_program {
__u32 line_info_rec_size;
__u32 line_info_cnt;
__u32 prog_flags;
+
+ __u32 subprog_offset[256];
+ __u32 subprog_sec_offst[256];
+ __u32 subprog_cnt;
};
struct bpf_struct_ops {
@@ -525,6 +532,7 @@ struct bpf_struct_ops {
#define STRUCT_OPS_SEC ".struct_ops"
#define STRUCT_OPS_LINK_SEC ".struct_ops.link"
#define ARENA_SEC ".addr_space.1"
+#define JUMPTABLES_SEC ".jumptables"
enum libbpf_map_type {
LIBBPF_MAP_UNSPEC,
@@ -658,6 +666,7 @@ struct elf_state {
Elf64_Ehdr *ehdr;
Elf_Data *symbols;
Elf_Data *arena_data;
+ Elf_Data jumptables_data;
size_t shstrndx; /* section index for section name strings */
size_t strtabidx;
struct elf_sec_desc *secs;
@@ -668,6 +677,7 @@ struct elf_state {
int symbols_shndx;
bool has_st_ops;
int arena_data_shndx;
+ int jumptables_data_shndx;
};
struct usdt_manager;
@@ -3945,6 +3955,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
} else if (strcmp(name, ARENA_SEC) == 0) {
obj->efile.arena_data = data;
obj->efile.arena_data_shndx = idx;
+ } else if (strcmp(name, JUMPTABLES_SEC) == 0) {
+ memcpy(&obj->efile.jumptables_data, data, sizeof(*data));
+ obj->efile.jumptables_data_shndx = idx;
} else {
pr_info("elf: skipping unrecognized data section(%d) %s\n",
idx, name);
@@ -4599,6 +4612,16 @@ static int bpf_program__record_reloc(struct bpf_program *prog,
return 0;
}
+ /* jump table data relocation */
+ if (shdr_idx == obj->efile.jumptables_data_shndx) {
+ reloc_desc->type = RELO_INSN_ARRAY;
+ reloc_desc->insn_idx = insn_idx;
+ reloc_desc->map_idx = -1;
+ reloc_desc->sym_off = sym->st_value;
+ reloc_desc->sym_size = sym->st_size;
+ return 0;
+ }
+
/* generic map reference relocation */
if (type == LIBBPF_MAP_UNSPEC) {
if (!bpf_object__shndx_is_maps(obj, shdr_idx)) {
@@ -6101,6 +6124,60 @@ static void poison_kfunc_call(struct bpf_program *prog, int relo_idx,
insn->imm = POISON_CALL_KFUNC_BASE + ext_idx;
}
+static int create_jt_map(struct bpf_object *obj, int off, int size, int adjust_off)
+{
+ static union bpf_attr attr = {
+ .map_type = BPF_MAP_TYPE_INSN_ARRAY,
+ .key_size = 4,
+ .value_size = sizeof(struct bpf_insn_array_value),
+ .max_entries = 0,
+ };
+ struct bpf_insn_array_value val = {};
+ int map_fd;
+ int err;
+ __u32 i;
+ __u32 *jt;
+
+ attr.max_entries = size / 8;
+
+ map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
+ if (map_fd < 0)
+ return map_fd;
+
+ jt = (__u32 *)(obj->efile.jumptables_data.d_buf + off);
+ if (!jt)
+ return -EINVAL;
+
+ for (i = 0; i < attr.max_entries; i++) {
+ val.xlated_off = jt[2*i]/8 + adjust_off;
+ err = bpf_map_update_elem(map_fd, &i, &val, 0);
+ if (err) {
+ close(map_fd);
+ return err;
+ }
+ }
+
+ err = bpf_map_freeze(map_fd);
+ if (err) {
+ close(map_fd);
+ return err;
+ }
+
+ return map_fd;
+}
+
+static int subprog_insn_off(struct bpf_program *prog, int insn_idx)
+{
+ int i;
+
+ for (i = prog->subprog_cnt - 1; i >= 0; i--)
+ if (insn_idx >= prog->subprog_offset[i])
+ return prog->subprog_offset[i] - prog->subprog_sec_offst[i];
+
+ return -prog->sec_insn_off;
+}
+
+
/* Relocate data references within program code:
* - map references;
* - global variable references;
@@ -6192,6 +6269,21 @@ bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog)
case RELO_CORE:
/* will be handled by bpf_program_record_relos() */
break;
+ case RELO_INSN_ARRAY: {
+ int map_fd;
+
+ map_fd = create_jt_map(obj, relo->sym_off, relo->sym_size,
+ subprog_insn_off(prog, relo->insn_idx));
+ if (map_fd < 0) {
+ pr_warn("prog '%s': relo #%d: failed to create a jt map for sym_off=%u\n",
+ prog->name, i, relo->sym_off);
+ return map_fd;
+ }
+ insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;
+ insn->imm = map_fd;
+ insn->off = 0;
+ }
+ break;
default:
pr_warn("prog '%s': relo #%d: bad relo type %d\n",
prog->name, i, relo->type);
@@ -6389,36 +6481,58 @@ static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_progra
return 0;
}
+static int
+bpf_prog__append_subprog_offsets(struct bpf_program *prog, __u32 sec_insn_off, __u32 sub_insn_off)
+{
+ if (prog->subprog_cnt == ARRAY_SIZE(prog->subprog_sec_offst)) {
+ pr_warn("prog '%s': number of subprogs exceeds %zu\n",
+ prog->name, ARRAY_SIZE(prog->subprog_sec_offst));
+ return -E2BIG;
+ }
+
+ prog->subprog_sec_offst[prog->subprog_cnt] = sec_insn_off;
+ prog->subprog_offset[prog->subprog_cnt] = sub_insn_off;
+
+ prog->subprog_cnt += 1;
+ return 0;
+}
+
static int
bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main_prog,
- struct bpf_program *subprog)
+ struct bpf_program *subprog)
{
- struct bpf_insn *insns;
- size_t new_cnt;
- int err;
+ struct bpf_insn *insns;
+ size_t new_cnt;
+ int err;
- subprog->sub_insn_off = main_prog->insns_cnt;
+ subprog->sub_insn_off = main_prog->insns_cnt;
- new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
- insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns));
- if (!insns) {
- pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name);
- return -ENOMEM;
- }
- main_prog->insns = insns;
- main_prog->insns_cnt = new_cnt;
+ new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
+ insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns));
+ if (!insns) {
+ pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name);
+ return -ENOMEM;
+ }
+ main_prog->insns = insns;
+ main_prog->insns_cnt = new_cnt;
- memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns,
- subprog->insns_cnt * sizeof(*insns));
+ memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns,
+ subprog->insns_cnt * sizeof(*insns));
- pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n",
- main_prog->name, subprog->insns_cnt, subprog->name);
+ pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n",
+ main_prog->name, subprog->insns_cnt, subprog->name);
- /* The subprog insns are now appended. Append its relos too. */
- err = append_subprog_relos(main_prog, subprog);
- if (err)
- return err;
- return 0;
+ /* The subprog insns are now appended. Append its relos too. */
+ err = append_subprog_relos(main_prog, subprog);
+ if (err)
+ return err;
+
+ err = bpf_prog__append_subprog_offsets(main_prog, subprog->sec_insn_off,
+ subprog->sub_insn_off);
+ if (err)
+ return err;
+
+ return 0;
}
static int
@@ -7954,6 +8068,7 @@ static int bpf_object_prepare_progs(struct bpf_object *obj)
if (err)
return err;
}
+
return 0;
}
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 9dfbe7750f56..bccf4bb747e1 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -364,6 +364,10 @@ static int probe_map_create(enum bpf_map_type map_type)
case BPF_MAP_TYPE_SOCKHASH:
case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
break;
+ case BPF_MAP_TYPE_INSN_ARRAY:
+ key_size = sizeof(__u32);
+ value_size = sizeof(struct bpf_insn_array_value);
+ break;
case BPF_MAP_TYPE_UNSPEC:
default:
return -EOPNOTSUPP;
diff --git a/tools/lib/bpf/linker.c b/tools/lib/bpf/linker.c
index a469e5d4fee7..827867f8bba3 100644
--- a/tools/lib/bpf/linker.c
+++ b/tools/lib/bpf/linker.c
@@ -28,6 +28,9 @@
#include "str_error.h"
#define BTF_EXTERN_SEC ".extern"
+#define RODATA_REL_SEC ".rel.rodata"
+#define JUMPTABLES_SEC ".jumptables"
+#define JUMPTABLES_REL_SEC ".rel.jumptables"
struct src_sec {
const char *sec_name;
@@ -2026,6 +2029,9 @@ static int linker_append_elf_sym(struct bpf_linker *linker, struct src_obj *obj,
obj->sym_map[src_sym_idx] = dst_sec->sec_sym_idx;
return 0;
}
+
+ if (!strcmp(src_sec->sec_name, JUMPTABLES_SEC))
+ goto add_sym;
}
if (sym_bind == STB_LOCAL)
@@ -2272,8 +2278,10 @@ static int linker_append_elf_relos(struct bpf_linker *linker, struct src_obj *ob
insn->imm += sec->dst_off / sizeof(struct bpf_insn);
else
insn->imm += sec->dst_off;
- } else {
- pr_warn("relocation against STT_SECTION in non-exec section is not supported!\n");
+ } else if (strcmp(src_sec->sec_name, JUMPTABLES_REL_SEC) &&
+ strcmp(src_sec->sec_name, RODATA_REL_SEC)) {
+ pr_warn("relocation against STT_SECTION in section %s is not supported!\n",
+ src_sec->sec_name);
return -EINVAL;
}
}
--
2.34.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH v1 bpf-next 11/11] selftests/bpf: add selftests for indirect jumps
2025-08-16 18:06 [PATCH v1 bpf-next 00/11] BPF indirect jumps Anton Protopopov
` (9 preceding siblings ...)
2025-08-16 18:06 ` [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps Anton Protopopov
@ 2025-08-16 18:06 ` Anton Protopopov
10 siblings, 0 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-16 18:06 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Add selftests for indirect jumps. All the indirect jumps are
generated from C switch statements, so, even if compiled by a compiler
which doesn't support indirect jumps, the tests should pass as well.
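Once built, the new tests can be run in isolation with the usual
test_progs filter:

  ./test_progs -t bpf_goto_x

The subtests guarded by `if (0)` / `#if 0` below use `goto *rX`
directly and are left disabled, consistent with the note above about
compilers without indirect jump support.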
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
tools/testing/selftests/bpf/Makefile | 4 +-
.../selftests/bpf/prog_tests/bpf_goto_x.c | 132 ++++++
.../testing/selftests/bpf/progs/bpf_goto_x.c | 384 ++++++++++++++++++
3 files changed, 519 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_goto_x.c
create mode 100644 tools/testing/selftests/bpf/progs/bpf_goto_x.c
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 77794efc020e..c0d8d2ba50b5 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -453,7 +453,9 @@ BPF_CFLAGS = -g -Wall -Werror -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN) \
-I$(abspath $(OUTPUT)/../usr/include) \
-std=gnu11 \
-fno-strict-aliasing \
- -Wno-compare-distinct-pointer-types
+ -Wno-compare-distinct-pointer-types \
+ -Wno-initializer-overrides \
+ #
# TODO: enable me -Wsign-compare
CLANG_CFLAGS = $(CLANG_SYS_INCLUDES)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_goto_x.c b/tools/testing/selftests/bpf/prog_tests/bpf_goto_x.c
new file mode 100644
index 000000000000..7b7cbbed2a62
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_goto_x.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/in6.h>
+#include <linux/udp.h>
+#include <linux/tcp.h>
+
+#include <sys/syscall.h>
+#include <bpf/bpf.h>
+
+#include "bpf_goto_x.skel.h"
+
+static void __test_run(struct bpf_program *prog, void *ctx_in, size_t ctx_size_in)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, topts,
+ .ctx_in = ctx_in,
+ .ctx_size_in = ctx_size_in,
+ );
+ int err, prog_fd;
+
+ prog_fd = bpf_program__fd(prog);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "test_run_opts err");
+}
+
+static void check_simple(struct bpf_goto_x *skel,
+ struct bpf_program *prog,
+ __u64 ctx_in,
+ __u64 expected)
+{
+ skel->bss->ret_user = 0;
+
+ __test_run(prog, &ctx_in, sizeof(ctx_in));
+
+ if (!ASSERT_EQ(skel->bss->ret_user, expected, "skel->bss->ret_user"))
+ return;
+}
+
+static void check_simple_fentry(struct bpf_goto_x *skel,
+ struct bpf_program *prog,
+ __u64 ctx_in,
+ __u64 expected)
+{
+ skel->bss->in_user = ctx_in;
+ skel->bss->ret_user = 0;
+
+ /* trigger */
+ usleep(1);
+
+ if (!ASSERT_EQ(skel->bss->ret_user, expected, "skel->bss->ret_user"))
+ return;
+}
+
+static void check_goto_x_skel(struct bpf_goto_x *skel)
+{
+ int i;
+ __u64 in[] = {0, 1, 2, 3, 4, 5, 77};
+ __u64 out[] = {2, 3, 4, 5, 7, 19, 19};
+ __u64 out2[] = {103, 104, 107, 205, 115, 1019, 1019};
+ __u64 in3[] = {0, 11, 27, 31, 22, 45, 99};
+ __u64 out3[] = {2, 3, 4, 5, 19, 19, 19};
+ __u64 in4[] = {0, 1, 2, 3, 4, 5, 77};
+ __u64 out4[] = {12, 15, 7, 15, 12, 15, 15};
+
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.simple_test, in[i], out[i]);
+
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.simple_test2, in[i], out[i]);
+
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.two_switches, in[i], out2[i]);
+
+ if (0) for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.big_jump_table, in3[i], out3[i]);
+
+ if (0) for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.one_jump_two_maps, in4[i], out4[i]);
+
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.use_static_global1, in[i], out[i]);
+
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.use_static_global2, in[i], out[i]);
+
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.use_nonstatic_global1, in[i], out[i]);
+
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.use_nonstatic_global2, in[i], out[i]);
+
+ bpf_program__attach(skel->progs.simple_test_other_sec);
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple_fentry(skel, skel->progs.simple_test_other_sec, in[i], out[i]);
+
+ bpf_program__attach(skel->progs.use_static_global_other_sec);
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple_fentry(skel, skel->progs.use_static_global_other_sec, in[i], out[i]);
+
+ bpf_program__attach(skel->progs.use_nonstatic_global_other_sec);
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple_fentry(skel, skel->progs.use_nonstatic_global_other_sec, in[i], out[i]);
+}
+
+void goto_x_skel(void)
+{
+ struct bpf_goto_x *skel;
+ int ret;
+
+ skel = bpf_goto_x__open();
+ if (!ASSERT_NEQ(skel, NULL, "bpf_goto_x__open"))
+ return;
+
+ ret = bpf_goto_x__load(skel);
+ if (!ASSERT_OK(ret, "bpf_goto_x__load"))
+ return;
+
+ check_goto_x_skel(skel);
+
+ bpf_goto_x__destroy(skel);
+}
+
+void test_bpf_goto_x(void)
+{
+ if (test__start_subtest("goto_x_skel"))
+ goto_x_skel();
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_goto_x.c b/tools/testing/selftests/bpf/progs/bpf_goto_x.c
new file mode 100644
index 000000000000..b6ce7cba52e8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_goto_x.c
@@ -0,0 +1,384 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_misc.h"
+
+__u64 in_user;
+__u64 ret_user;
+
+struct simple_ctx {
+ __u64 x;
+};
+
+__u64 some_var;
+
+/*
+ * This function adds code which will be replaced by a different
+ * number of instructions by the verifier. This adds additional
+ * stress on testing the insn_array maps corresponding to indirect jumps.
+ */
+static __always_inline void adjust_insns(__u64 x)
+{
+ some_var ^= x + bpf_jiffies64();
+}
+
+SEC("syscall")
+int simple_test(struct simple_ctx *ctx)
+{
+ switch (ctx->x) {
+ case 0:
+ adjust_insns(ctx->x + 1);
+ ret_user = 2;
+ break;
+ case 1:
+ adjust_insns(ctx->x + 7);
+ ret_user = 3;
+ break;
+ case 2:
+ adjust_insns(ctx->x + 9);
+ ret_user = 4;
+ break;
+ case 3:
+ adjust_insns(ctx->x + 11);
+ ret_user = 5;
+ break;
+ case 4:
+ adjust_insns(ctx->x + 17);
+ ret_user = 7;
+ break;
+ default:
+ adjust_insns(ctx->x + 177);
+ ret_user = 19;
+ break;
+ }
+
+ return 0;
+}
+
+SEC("syscall")
+int simple_test2(struct simple_ctx *ctx)
+{
+ switch (ctx->x) {
+ case 0:
+ adjust_insns(ctx->x + 1);
+ ret_user = 2;
+ break;
+ case 1:
+ adjust_insns(ctx->x + 7);
+ ret_user = 3;
+ break;
+ case 2:
+ adjust_insns(ctx->x + 9);
+ ret_user = 4;
+ break;
+ case 3:
+ adjust_insns(ctx->x + 11);
+ ret_user = 5;
+ break;
+ case 4:
+ adjust_insns(ctx->x + 17);
+ ret_user = 7;
+ break;
+ default:
+ adjust_insns(ctx->x + 177);
+ ret_user = 19;
+ break;
+ }
+
+ return 0;
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int simple_test_other_sec(struct pt_regs *ctx)
+{
+ __u64 x = in_user;
+
+ switch (x) {
+ case 0:
+ adjust_insns(x + 1);
+ ret_user = 2;
+ break;
+ case 1:
+ adjust_insns(x + 7);
+ ret_user = 3;
+ break;
+ case 2:
+ adjust_insns(x + 9);
+ ret_user = 4;
+ break;
+ case 3:
+ adjust_insns(x + 11);
+ ret_user = 5;
+ break;
+ case 4:
+ adjust_insns(x + 17);
+ ret_user = 7;
+ break;
+ default:
+ adjust_insns(x + 177);
+ ret_user = 19;
+ break;
+ }
+
+ return 0;
+}
+
+SEC("syscall")
+int two_switches(struct simple_ctx *ctx)
+{
+ switch (ctx->x) {
+ case 0:
+ adjust_insns(ctx->x + 1);
+ ret_user = 2;
+ break;
+ case 1:
+ adjust_insns(ctx->x + 7);
+ ret_user = 3;
+ break;
+ case 2:
+ adjust_insns(ctx->x + 9);
+ ret_user = 4;
+ break;
+ case 3:
+ adjust_insns(ctx->x + 11);
+ ret_user = 5;
+ break;
+ case 4:
+ adjust_insns(ctx->x + 17);
+ ret_user = 7;
+ break;
+ default:
+ adjust_insns(ctx->x + 177);
+ ret_user = 19;
+ break;
+ }
+
+ switch (ctx->x + !!ret_user) {
+ case 1:
+ adjust_insns(ctx->x + 7);
+ ret_user = 103;
+ break;
+ case 2:
+ adjust_insns(ctx->x + 9);
+ ret_user = 104;
+ break;
+ case 3:
+ adjust_insns(ctx->x + 11);
+ ret_user = 107;
+ break;
+ case 4:
+ adjust_insns(ctx->x + 11);
+ ret_user = 205;
+ break;
+ case 5:
+ adjust_insns(ctx->x + 11);
+ ret_user = 115;
+ break;
+ default:
+ adjust_insns(ctx->x + 177);
+ ret_user = 1019;
+ break;
+ }
+
+ return 0;
+}
+
+SEC("syscall")
+int big_jump_table(struct simple_ctx *ctx __attribute__((unused)))
+{
+#if 0
+ const void *const jt[256] = {
+ [0 ... 255] = &&default_label,
+ [0] = &&l0,
+ [11] = &&l11,
+ [27] = &&l27,
+ [31] = &&l31,
+ };
+
+ goto *jt[ctx->x & 0xff];
+
+l0:
+ adjust_insns(ctx->x + 1);
+ ret_user = 2;
+ return 0;
+
+l11:
+ adjust_insns(ctx->x + 7);
+ ret_user = 3;
+ return 0;
+
+l27:
+ adjust_insns(ctx->x + 9);
+ ret_user = 4;
+ return 0;
+
+l31:
+ adjust_insns(ctx->x + 11);
+ ret_user = 5;
+ return 0;
+
+default_label:
+ adjust_insns(ctx->x + 177);
+ ret_user = 19;
+ return 0;
+#else
+ return 0;
+#endif
+}
+
+SEC("syscall")
+int one_jump_two_maps(struct simple_ctx *ctx __attribute__((unused)))
+{
+#if 0
+ __label__ l1, l2, l3, l4;
+ void *jt1[2] = { &&l1, &&l2 };
+ void *jt2[2] = { &&l3, &&l4 };
+ unsigned int a = ctx->x % 2;
+ unsigned int b = (ctx->x / 2) % 2;
+ volatile int ret = 0;
+
+ if (!(a < 2 && b < 2))
+ return 19;
+
+ if (ctx->x % 2)
+ goto *jt1[a];
+ else
+ goto *jt2[b];
+
+ l1: ret += 1;
+ l2: ret += 3;
+ l3: ret += 5;
+ l4: ret += 7;
+
+ ret_user = ret;
+ return ret;
+#else
+ return 0;
+#endif
+}
+
+/* Just to introduce some non-zero offsets in .text */
+static __noinline int f0(volatile struct simple_ctx *ctx __arg_ctx)
+{
+ if (ctx)
+ return 1;
+ else
+ return 13;
+}
+
+SEC("syscall") int f1(struct simple_ctx *ctx)
+{
+ ret_user = 0;
+ return f0(ctx);
+}
+
+static __noinline int __static_global(__u64 x)
+{
+ switch (x) {
+ case 0:
+ adjust_insns(x + 1);
+ ret_user = 2;
+ break;
+ case 1:
+ adjust_insns(x + 7);
+ ret_user = 3;
+ break;
+ case 2:
+ adjust_insns(x + 9);
+ ret_user = 4;
+ break;
+ case 3:
+ adjust_insns(x + 11);
+ ret_user = 5;
+ break;
+ case 4:
+ adjust_insns(x + 17);
+ ret_user = 7;
+ break;
+ default:
+ adjust_insns(x + 177);
+ ret_user = 19;
+ break;
+ }
+
+ return 0;
+}
+
+SEC("syscall")
+int use_static_global1(struct simple_ctx *ctx)
+{
+ ret_user = 0;
+ return __static_global(ctx->x);
+}
+
+SEC("syscall")
+int use_static_global2(struct simple_ctx *ctx)
+{
+ ret_user = 0;
+ adjust_insns(ctx->x + 1);
+ return __static_global(ctx->x);
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int use_static_global_other_sec(void *ctx)
+{
+ return __static_global(in_user);
+}
+
+__noinline int __nonstatic_global(__u64 x)
+{
+ switch (x) {
+ case 0:
+ adjust_insns(x + 1);
+ ret_user = 2;
+ break;
+ case 1:
+ adjust_insns(x + 7);
+ ret_user = 3;
+ break;
+ case 2:
+ adjust_insns(x + 9);
+ ret_user = 4;
+ break;
+ case 3:
+ adjust_insns(x + 11);
+ ret_user = 5;
+ break;
+ case 4:
+ adjust_insns(x + 17);
+ ret_user = 7;
+ break;
+ default:
+ adjust_insns(x + 177);
+ ret_user = 19;
+ break;
+ }
+
+ return 0;
+}
+
+SEC("syscall")
+int use_nonstatic_global1(struct simple_ctx *ctx)
+{
+ ret_user = 0;
+ return __nonstatic_global(ctx->x);
+}
+
+SEC("syscall")
+int use_nonstatic_global2(struct simple_ctx *ctx)
+{
+ ret_user = 0;
+ adjust_insns(ctx->x + 1);
+ return __nonstatic_global(ctx->x);
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int use_nonstatic_global_other_sec(void *ctx)
+{
+ return __nonstatic_global(in_user);
+}
+
+char _license[] SEC("license") = "GPL";
--
2.34.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 05/11] bpf: support instructions arrays with constants blinding
2025-08-16 18:06 ` [PATCH v1 bpf-next 05/11] bpf: support instructions arrays with constants blinding Anton Protopopov
@ 2025-08-17 5:50 ` kernel test robot
2025-08-18 8:24 ` Anton Protopopov
2025-08-25 23:29 ` Eduard Zingerman
1 sibling, 1 reply; 38+ messages in thread
From: kernel test robot @ 2025-08-17 5:50 UTC (permalink / raw)
To: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
Anton Protopopov, Daniel Borkmann, Eduard Zingerman,
Quentin Monnet, Yonghong Song
Cc: oe-kbuild-all
Hi Anton,
kernel test robot noticed the following build errors:
[auto build test ERROR on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Anton-Protopopov/bpf-fix-the-return-value-of-push_stack/20250817-020411
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20250816180631.952085-6-a.s.protopopov%40gmail.com
patch subject: [PATCH v1 bpf-next 05/11] bpf: support instructions arrays with constants blinding
config: sparc-randconfig-001-20250817 (https://download.01.org/0day-ci/archive/20250817/202508171315.5y3oPyC2-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250817/202508171315.5y3oPyC2-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508171315.5y3oPyC2-lkp@intel.com/
All errors (new ones prefixed by >>):
sparc64-linux-ld: kernel/bpf/core.o: in function `bpf_jit_blind_constants':
>> core.c:(.text+0x8064): undefined reference to `bpf_insn_array_adjust'
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps
2025-08-16 18:06 ` [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps Anton Protopopov
@ 2025-08-18 7:57 ` Dan Carpenter
2025-08-18 8:22 ` Anton Protopopov
2025-08-25 23:15 ` Eduard Zingerman
1 sibling, 1 reply; 38+ messages in thread
From: Dan Carpenter @ 2025-08-18 7:57 UTC (permalink / raw)
To: oe-kbuild, Anton Protopopov, bpf, Alexei Starovoitov,
Andrii Nakryiko, Anton Protopopov, Daniel Borkmann,
Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: lkp, oe-kbuild-all
Hi Anton,
kernel test robot noticed the following build warnings:
url: https://github.com/intel-lab-lkp/linux/commits/Anton-Protopopov/bpf-fix-the-return-value-of-push_stack/20250817-020411
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20250816180631.952085-9-a.s.protopopov%40gmail.com
patch subject: [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps
config: x86_64-randconfig-161-20250818 (https://download.01.org/0day-ci/archive/20250818/202508180805.aUCPtTuQ-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14+deb12u1) 12.2.0
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202508180805.aUCPtTuQ-lkp@intel.com/
smatch warnings:
kernel/bpf/verifier.c:25013 compute_scc() warn: unsigned 'succ_cnt' is never less than zero.
vim +/w +25013 kernel/bpf/verifier.c
24891 static int compute_scc(struct bpf_verifier_env *env)
24892 {
24893 const u32 NOT_ON_STACK = U32_MAX;
24894
24895 struct bpf_insn_aux_data *aux = env->insn_aux_data;
24896 const u32 insn_cnt = env->prog->len;
24897 int stack_sz, dfs_sz, err = 0;
24898 u32 *stack, *pre, *low, *dfs;
24899 u32 succ_cnt, i, j, t, w;
^^^^^^^^^^^^
24900 u32 next_preorder_num;
24901 u32 next_scc_id;
24902 bool assign_scc;
24903
24904 next_preorder_num = 1;
24905 next_scc_id = 1;
24906 /*
[ snip ]
25008 next_preorder_num++;
25009 stack[stack_sz++] = w;
25010 }
25011 /* Visit 'w' successors */
25012 succ_cnt = insn_successors(env, env->prog, w, &succ);
25013 if (succ_cnt < 0) {
^^^^^^^^^^^^
unsigned can't be negative.
25014 err = succ_cnt;
25015 goto exit;
25016
25017 }
25018 for (j = 0; j < succ_cnt; ++j) {
25019 if (pre[succ[j]]) {
25020 low[w] = min(low[w], low[succ[j]]);
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps
2025-08-18 7:57 ` Dan Carpenter
@ 2025-08-18 8:22 ` Anton Protopopov
0 siblings, 0 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-18 8:22 UTC (permalink / raw)
To: Dan Carpenter
Cc: oe-kbuild, bpf, Alexei Starovoitov, Andrii Nakryiko,
Anton Protopopov, Daniel Borkmann, Eduard Zingerman,
Quentin Monnet, Yonghong Song, lkp, oe-kbuild-all
On 25/08/18 10:57AM, Dan Carpenter wrote:
> Hi Anton,
>
> kernel test robot noticed the following build warnings:
>
> url: https://github.com/intel-lab-lkp/linux/commits/Anton-Protopopov/bpf-fix-the-return-value-of-push_stack/20250817-020411
> base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> patch link: https://lore.kernel.org/r/20250816180631.952085-9-a.s.protopopov%40gmail.com
> patch subject: [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps
> config: x86_64-randconfig-161-20250818 (https://download.01.org/0day-ci/archive/20250818/202508180805.aUCPtTuQ-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14+deb12u1) 12.2.0
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> | Closes: https://lore.kernel.org/r/202508180805.aUCPtTuQ-lkp@intel.com/
>
> smatch warnings:
> kernel/bpf/verifier.c:25013 compute_scc() warn: unsigned 'succ_cnt' is never less than zero.
>
> vim +/w +25013 kernel/bpf/verifier.c
> 24891 static int compute_scc(struct bpf_verifier_env *env)
> 24892 {
> 24893 const u32 NOT_ON_STACK = U32_MAX;
> 24894
> 24895 struct bpf_insn_aux_data *aux = env->insn_aux_data;
> 24896 const u32 insn_cnt = env->prog->len;
> 24897 int stack_sz, dfs_sz, err = 0;
> 24898 u32 *stack, *pre, *low, *dfs;
> 24899 u32 succ_cnt, i, j, t, w;
> ^^^^^^^^^^^^
>
> 24900 u32 next_preorder_num;
> 24901 u32 next_scc_id;
> 24902 bool assign_scc;
> 24903
> 24904 next_preorder_num = 1;
> 24905 next_scc_id = 1;
> 24906 /*
>
> [ snip ]
>
> 25008 next_preorder_num++;
> 25009 stack[stack_sz++] = w;
> 25010 }
> 25011 /* Visit 'w' successors */
> 25012 succ_cnt = insn_successors(env, env->prog, w, &succ);
> 25013 if (succ_cnt < 0) {
> ^^^^^^^^^^^^
> unsigned can't be negative.
Thanks! Fixed, will squash into v2.
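(For reference, the straightforward way to address it is to keep the
count in a signed variable, e.g. roughly:

	-	u32 succ_cnt, i, j, t, w;
	+	u32 i, j, t, w;
	+	int succ_cnt;

so the error returned by insn_successors() is actually observable; the
exact change I squash may end up looking slightly different.)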
> 25014 err = succ_cnt;
> 25015 goto exit;
> 25016
> 25017 }
> 25018 for (j = 0; j < succ_cnt; ++j) {
> 25019 if (pre[succ[j]]) {
> 25020 low[w] = min(low[w], low[succ[j]]);
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 05/11] bpf: support instructions arrays with constants blinding
2025-08-17 5:50 ` kernel test robot
@ 2025-08-18 8:24 ` Anton Protopopov
0 siblings, 0 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-18 8:24 UTC (permalink / raw)
To: kernel test robot
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song,
oe-kbuild-all
On 25/08/17 01:50PM, kernel test robot wrote:
> Hi Anton,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on bpf-next/master]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Anton-Protopopov/bpf-fix-the-return-value-of-push_stack/20250817-020411
> base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> patch link: https://lore.kernel.org/r/20250816180631.952085-6-a.s.protopopov%40gmail.com
> patch subject: [PATCH v1 bpf-next 05/11] bpf: support instructions arrays with constants blinding
> config: sparc-randconfig-001-20250817 (https://download.01.org/0day-ci/archive/20250817/202508171315.5y3oPyC2-lkp@intel.com/config)
> compiler: sparc64-linux-gcc (GCC) 8.5.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250817/202508171315.5y3oPyC2-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202508171315.5y3oPyC2-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> sparc64-linux-ld: kernel/bpf/core.o: in function `bpf_jit_blind_constants':
> >> core.c:(.text+0x8064): undefined reference to `bpf_insn_array_adjust'
This is because bpf_insn_array_adjust() is only compiled in when
CONFIG_BPF_SYSCALL is enabled. Will add the corresponding check in v2.
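For illustration only, the usual shape of such a check is a stub for the
!CONFIG_BPF_SYSCALL case in the header (the prototype below is a
placeholder, not the real signature from the patch):

	#ifdef CONFIG_BPF_SYSCALL
	int bpf_insn_array_adjust(struct bpf_map *map, u32 off, u32 len);
	#else
	/* placeholder prototype, illustration only */
	static inline int bpf_insn_array_adjust(struct bpf_map *map, u32 off, u32 len)
	{
		return 0;
	}
	#endif

Another option is to guard the caller in core.c with
IS_ENABLED(CONFIG_BPF_SYSCALL).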
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps
2025-08-16 18:06 ` [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps Anton Protopopov
@ 2025-08-21 0:20 ` Andrii Nakryiko
2025-08-21 13:05 ` Anton Protopopov
2025-08-26 0:06 ` Eduard Zingerman
1 sibling, 1 reply; 38+ messages in thread
From: Andrii Nakryiko @ 2025-08-21 0:20 UTC (permalink / raw)
To: Anton Protopopov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
On Sat, Aug 16, 2025 at 11:02 AM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
>
> For the v5 instruction set, LLVM is now allowed to generate indirect
> jumps for switch statements and for 'goto *rX' assembly. Every such
> jump will be accompanied by the necessary metadata, e.g. (`llvm-objdump
> -Sr ...`):
>
> 0: r2 = 0x0 ll
> 0000000000000030: R_BPF_64_64 BPF.JT.0.0
>
> Here BPF.JT.0.0 is a symbol residing in the .jumptables section:
>
> Symbol table:
> 4: 0000000000000000 240 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0
>
> The -bpf-min-jump-table-entries llvm option may be used to control
> the minimal size of a switch which will be converted to indirect
> jumps.
>
> The code generated by LLVM for a switch will look, approximately,
> like this:
>
> 0: rX <- jump_table_x[i]
> 2: rX <<= 3
> 3: gotox *rX
>
> Right now there is no robust way to associate the jump with the
> corresponding map, so libbpf doesn't insert map file descriptor
> inside the gotox instruction.
Just from the commit description it's not clear whether that's
something that needs fixing or is OK? If it's OK, why call it out?..
>
> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> ---
> .../bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
> tools/bpf/bpftool/map.c | 2 +-
> tools/lib/bpf/libbpf.c | 159 +++++++++++++++---
> tools/lib/bpf/libbpf_probes.c | 4 +
> tools/lib/bpf/linker.c | 12 +-
> 5 files changed, 153 insertions(+), 26 deletions(-)
>
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> index 252e4c538edb..3377d4a01c62 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> @@ -55,7 +55,7 @@ MAP COMMANDS
> | | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
> | | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
> | | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
> -| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** }
> +| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** | **insn_array** }
>
> DESCRIPTION
> ===========
> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> index c9de44a45778..79b90f274bef 100644
> --- a/tools/bpf/bpftool/map.c
> +++ b/tools/bpf/bpftool/map.c
> @@ -1477,7 +1477,7 @@ static int do_help(int argc, char **argv)
> " devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
> " cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
> " queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
> - " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena }\n"
> + " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena | insn_array }\n"
> " " HELP_SPEC_OPTIONS " |\n"
> " {-f|--bpffs} | {-n|--nomount} }\n"
> "",
bpftool changes sifted through into libbpf patch?
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index fe4fc5438678..a5f04544c09c 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -191,6 +191,7 @@ static const char * const map_type_name[] = {
> [BPF_MAP_TYPE_USER_RINGBUF] = "user_ringbuf",
> [BPF_MAP_TYPE_CGRP_STORAGE] = "cgrp_storage",
> [BPF_MAP_TYPE_ARENA] = "arena",
> + [BPF_MAP_TYPE_INSN_ARRAY] = "insn_array",
> };
>
> static const char * const prog_type_name[] = {
> @@ -372,6 +373,7 @@ enum reloc_type {
> RELO_EXTERN_CALL,
> RELO_SUBPROG_ADDR,
> RELO_CORE,
> + RELO_INSN_ARRAY,
> };
>
> struct reloc_desc {
> @@ -383,6 +385,7 @@ struct reloc_desc {
> int map_idx;
> int sym_off;
> int ext_idx;
> + int sym_size;
make it a union with ext_idx? ext_idx isn't used for jump table
relocation, right?
> };
> };
> };
> @@ -496,6 +499,10 @@ struct bpf_program {
> __u32 line_info_rec_size;
> __u32 line_info_cnt;
> __u32 prog_flags;
> +
> + __u32 subprog_offset[256];
> + __u32 subprog_sec_offst[256];
> + __u32 subprog_cnt;
um... allocate dynamically, if necessary? (but also see above, might
not be necessary at all)
> };
>
> struct bpf_struct_ops {
> @@ -525,6 +532,7 @@ struct bpf_struct_ops {
> #define STRUCT_OPS_SEC ".struct_ops"
> #define STRUCT_OPS_LINK_SEC ".struct_ops.link"
> #define ARENA_SEC ".addr_space.1"
> +#define JUMPTABLES_SEC ".jumptables"
>
> enum libbpf_map_type {
> LIBBPF_MAP_UNSPEC,
> @@ -658,6 +666,7 @@ struct elf_state {
> Elf64_Ehdr *ehdr;
> Elf_Data *symbols;
> Elf_Data *arena_data;
> + Elf_Data jumptables_data;
> size_t shstrndx; /* section index for section name strings */
> size_t strtabidx;
> struct elf_sec_desc *secs;
> @@ -668,6 +677,7 @@ struct elf_state {
> int symbols_shndx;
> bool has_st_ops;
> int arena_data_shndx;
> + int jumptables_data_shndx;
> };
>
> struct usdt_manager;
> @@ -3945,6 +3955,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
> } else if (strcmp(name, ARENA_SEC) == 0) {
> obj->efile.arena_data = data;
> obj->efile.arena_data_shndx = idx;
> + } else if (strcmp(name, JUMPTABLES_SEC) == 0) {
> + memcpy(&obj->efile.jumptables_data, data, sizeof(*data));
you need to preserve the contents of the jump tables until the preparation
stage, right? Just memcpy'ing Elf_Data doesn't preserve d_buf's contents, no?
So you need to allocate memory for the contents and keep it until the
preparation phase.
pw-bot: cr
> + obj->efile.jumptables_data_shndx = idx;
> } else {
> pr_info("elf: skipping unrecognized data section(%d) %s\n",
> idx, name);
> @@ -4599,6 +4612,16 @@ static int bpf_program__record_reloc(struct bpf_program *prog,
> return 0;
> }
>
> + /* jump table data relocation */
> + if (shdr_idx == obj->efile.jumptables_data_shndx) {
> + reloc_desc->type = RELO_INSN_ARRAY;
> + reloc_desc->insn_idx = insn_idx;
> + reloc_desc->map_idx = -1;
> + reloc_desc->sym_off = sym->st_value;
> + reloc_desc->sym_size = sym->st_size;
> + return 0;
> + }
> +
> /* generic map reference relocation */
> if (type == LIBBPF_MAP_UNSPEC) {
> if (!bpf_object__shndx_is_maps(obj, shdr_idx)) {
> @@ -6101,6 +6124,60 @@ static void poison_kfunc_call(struct bpf_program *prog, int relo_idx,
> insn->imm = POISON_CALL_KFUNC_BASE + ext_idx;
> }
>
> +static int create_jt_map(struct bpf_object *obj, int off, int size, int adjust_off)
> +{
> + static union bpf_attr attr = {
> + .map_type = BPF_MAP_TYPE_INSN_ARRAY,
> + .key_size = 4,
> + .value_size = sizeof(struct bpf_insn_array_value),
> + .max_entries = 0,
> + };
> + struct bpf_insn_array_value val = {};
> + int map_fd;
> + int err;
> + __u32 i;
> + __u32 *jt;
nit: combine same-typed variable declarations?
> +
> + attr.max_entries = size / 8;
8 is sizeof(struct bpf_insn_array_value)? make it obvious?
> +
> + map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
> + if (map_fd < 0)
> + return map_fd;
is the bpf_map_create() API not usable here? what's the reason to
open-code (incorrectly, not doing memset(0)) bpf_attr?
> +
> + jt = (__u32 *)(obj->efile.jumptables_data.d_buf + off);
> + if (!jt)
if off is not zero, this will never be true... this check looks wrong.
Check this once at the point where you record jumptables_data?
> + return -EINVAL;
> +
> + for (i = 0; i < attr.max_entries; i++) {
> + val.xlated_off = jt[2*i]/8 + adjust_off;
nit: code style: `jt[2 * i] / 8`
and this 8 is basically sizeof(struct bpf_insn), right? Can you use
that to have a bit more semantic meaning here?
> + err = bpf_map_update_elem(map_fd, &i, &val, 0);
> + if (err) {
> + close(map_fd);
> + return err;
> + }
> + }
> +
> + err = bpf_map_freeze(map_fd);
> + if (err) {
> + close(map_fd);
> + return err;
> + }
> +
> + return map_fd;
> +}
> +
> +static int subprog_insn_off(struct bpf_program *prog, int insn_idx)
> +{
> + int i;
> +
> + for (i = prog->subprog_cnt - 1; i >= 0; i--)
> + if (insn_idx >= prog->subprog_offset[i])
> + return prog->subprog_offset[i] - prog->subprog_sec_offst[i];
I feel like this whole subprog_offset and subprog_sec_offst shouldn't
be even necessary.
Check bpf_object__relocate(). I'm not sure why this was done this way
that we go across all programs in phases, doing code relocation first,
then data relocation later (across all programs again). I might be
forgetting some details, but if we change this to do all the
relocation for each program one at a time, then all this information
that you explicitly record is already recorded in
subprog->sub_insn_off and you can use it until we start relocating
another entry-point program. Can you give it a try?
So basically the structure will be:
for (i = 0; i < obj->nr_programs; i++) {
prog = ...
if (prog_is_subprog(...))
continue;
if (!prog->autoload)
continue;
bpf_object__relocate_calls()
/* that exception callback handling */
bpf_object__relocate_data()
bpf_program_fixup_func_info()
}
It feels like this should work because there cannot be
interdependencies between entry programs.
> +
> + return -prog->sec_insn_off;
why this return value?... can you elaborate?
> +}
> +
> +
> /* Relocate data references within program code:
> * - map references;
> * - global variable references;
> @@ -6192,6 +6269,21 @@ bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog)
> case RELO_CORE:
> /* will be handled by bpf_program_record_relos() */
> break;
> + case RELO_INSN_ARRAY: {
> + int map_fd;
> +
> + map_fd = create_jt_map(obj, relo->sym_off, relo->sym_size,
> + subprog_insn_off(prog, relo->insn_idx));
> + if (map_fd < 0) {
> + pr_warn("prog '%s': relo #%d: failed to create a jt map for sym_off=%u\n",
jt -> jump table, this is supposed to be at least somewhat
human-readable ;) also we seem to be not using blah=%d approach, so
just "sym_off %d" (and note that sym_off is int, not unsigned)
> + prog->name, i, relo->sym_off);
> + return map_fd;
> + }
> + insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;
> + insn->imm = map_fd;
> + insn->off = 0;
> + }
> + break;
> default:
> pr_warn("prog '%s': relo #%d: bad relo type %d\n",
> prog->name, i, relo->type);
> @@ -6389,36 +6481,58 @@ static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_progra
> return 0;
> }
>
> +static int
> +bpf_prog__append_subprog_offsets(struct bpf_program *prog, __u32 sec_insn_off, __u32 sub_insn_off)
please don't use double underscore for non-API functions, just
prog_append_subprog_offs()
but actually I'd just inline it into bpf_object__append_subprog_code,
it doesn't seem complicated enough to warrant its own function
> +{
> + if (prog->subprog_cnt == ARRAY_SIZE(prog->subprog_sec_offst)) {
please use libbpf_reallocarray()
> + pr_warn("prog '%s': number of subprogs exceeds %zu\n",
> + prog->name, ARRAY_SIZE(prog->subprog_sec_offst));
> + return -E2BIG;
> + }
> +
> + prog->subprog_sec_offst[prog->subprog_cnt] = sec_insn_off;
typo: offst, but also here and below prefer sticking to "off", it's
used pretty universally in libbpf code
> + prog->subprog_offset[prog->subprog_cnt] = sub_insn_off;
> +
> + prog->subprog_cnt += 1;
> + return 0;
> +}
> +
> static int
> bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main_prog,
> - struct bpf_program *subprog)
> + struct bpf_program *subprog)
> {
> - struct bpf_insn *insns;
> - size_t new_cnt;
> - int err;
> + struct bpf_insn *insns;
> + size_t new_cnt;
> + int err;
>
> - subprog->sub_insn_off = main_prog->insns_cnt;
> + subprog->sub_insn_off = main_prog->insns_cnt;
>
> - new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
> - insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns));
> - if (!insns) {
> - pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name);
> - return -ENOMEM;
> - }
> - main_prog->insns = insns;
> - main_prog->insns_cnt = new_cnt;
> + new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
> + insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns));
> + if (!insns) {
> + pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name);
> + return -ENOMEM;
> + }
> + main_prog->insns = insns;
> + main_prog->insns_cnt = new_cnt;
>
> - memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns,
> - subprog->insns_cnt * sizeof(*insns));
> + memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns,
> + subprog->insns_cnt * sizeof(*insns));
>
> - pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n",
> - main_prog->name, subprog->insns_cnt, subprog->name);
> + pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n",
> + main_prog->name, subprog->insns_cnt, subprog->name);
>
> - /* The subprog insns are now appended. Append its relos too. */
> - err = append_subprog_relos(main_prog, subprog);
> - if (err)
> - return err;
> - return 0;
> + /* The subprog insns are now appended. Append its relos too. */
> + err = append_subprog_relos(main_prog, subprog);
> + if (err)
> + return err;
> +
> + err = bpf_prog__append_subprog_offsets(main_prog, subprog->sec_insn_off,
> + subprog->sub_insn_off);
> + if (err)
> + return err;
> +
> + return 0;
> }
>
> static int
> @@ -7954,6 +8068,7 @@ static int bpf_object_prepare_progs(struct bpf_object *obj)
> if (err)
> return err;
> }
> +
?
> return 0;
> }
>
> diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
> index 9dfbe7750f56..bccf4bb747e1 100644
> --- a/tools/lib/bpf/libbpf_probes.c
> +++ b/tools/lib/bpf/libbpf_probes.c
> @@ -364,6 +364,10 @@ static int probe_map_create(enum bpf_map_type map_type)
> case BPF_MAP_TYPE_SOCKHASH:
> case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
> break;
> + case BPF_MAP_TYPE_INSN_ARRAY:
> + key_size = sizeof(__u32);
> + value_size = sizeof(struct bpf_insn_array_value);
> + break;
> case BPF_MAP_TYPE_UNSPEC:
> default:
> return -EOPNOTSUPP;
> diff --git a/tools/lib/bpf/linker.c b/tools/lib/bpf/linker.c
> index a469e5d4fee7..827867f8bba3 100644
> --- a/tools/lib/bpf/linker.c
> +++ b/tools/lib/bpf/linker.c
> @@ -28,6 +28,9 @@
> #include "str_error.h"
>
> #define BTF_EXTERN_SEC ".extern"
> +#define RODATA_REL_SEC ".rel.rodata"
> +#define JUMPTABLES_SEC ".jumptables"
> +#define JUMPTABLES_REL_SEC ".rel.jumptables"
>
> struct src_sec {
> const char *sec_name;
> @@ -2026,6 +2029,9 @@ static int linker_append_elf_sym(struct bpf_linker *linker, struct src_obj *obj,
> obj->sym_map[src_sym_idx] = dst_sec->sec_sym_idx;
> return 0;
> }
> +
> + if (!strcmp(src_sec->sec_name, JUMPTABLES_SEC))
If you look around in this file (and most of libbpf source code), you
won't see !strcmp() in it. Let's be consistent and explicit with == 0
and != 0 here and below.
> + goto add_sym;
> }
>
> if (sym_bind == STB_LOCAL)
> @@ -2272,8 +2278,10 @@ static int linker_append_elf_relos(struct bpf_linker *linker, struct src_obj *ob
> insn->imm += sec->dst_off / sizeof(struct bpf_insn);
> else
> insn->imm += sec->dst_off;
> - } else {
> - pr_warn("relocation against STT_SECTION in non-exec section is not supported!\n");
> + } else if (strcmp(src_sec->sec_name, JUMPTABLES_REL_SEC) &&
> + strcmp(src_sec->sec_name, RODATA_REL_SEC)) {
where does .rel.rodata come from?
and we don't need to adjust the contents of any of those sections, right?...
can you please add some tests validating that two object files with
jumptables can be linked together and end up with proper combined
.jumptables section?
and in terms of code, can we do
} else if (strcmp(..., JUMPTABLES_REL_SEC) == 0) {
/* nothing to do for .rel.jumptables */
} else {
pr_warn(...);
}
It makes it more apparent what is supported and what's not.
> + pr_warn("relocation against STT_SECTION in section %s is not supported!\n",
> + src_sec->sec_name);
> return -EINVAL;
> }
> }
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps
2025-08-21 0:20 ` Andrii Nakryiko
@ 2025-08-21 13:05 ` Anton Protopopov
2025-08-21 18:14 ` Andrii Nakryiko
0 siblings, 1 reply; 38+ messages in thread
From: Anton Protopopov @ 2025-08-21 13:05 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
On 25/08/20 05:20PM, Andrii Nakryiko wrote:
> On Sat, Aug 16, 2025 at 11:02 AM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> >
> > For the v5 instruction set, LLVM is now allowed to generate indirect
> > jumps for switch statements and for 'goto *rX' assembly. Every such
> > jump will be accompanied by the necessary metadata, e.g. (`llvm-objdump
> > -Sr ...`):
> >
> > 0: r2 = 0x0 ll
> > 0000000000000030: R_BPF_64_64 BPF.JT.0.0
> >
> > Here BPF.JT.0.0 is a symbol residing in the .jumptables section:
> >
> > Symbol table:
> > 4: 0000000000000000 240 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0
> >
> > The -bpf-min-jump-table-entries llvm option may be used to control
> > the minimal size of a switch which will be converted to indirect
> > jumps.
> >
> > The code generated by LLVM for a switch will look, approximately,
> > like this:
> >
> > 0: rX <- jump_table_x[i]
> > 2: rX <<= 3
> > 3: gotox *rX
> >
> > Right now there is no robust way to associate the jump with the
> > corresponding map, so libbpf doesn't insert map file descriptor
> > inside the gotox instruction.
>
> Just from the commit description it's not clear whether that's
> something that needs fixing or is OK? If it's OK, why call it out?..
Right, will rephrase.
The idea here is that if you have, say, a switch, then most
probably it is compiled into one jump table and one gotox. And, if
the compiler can provide enough metadata, then it makes sense for
libbpf to also associate the JT with the gotox by inserting the same
map descriptor inside both instructions. However, right now this
doesn't work, and there are also cases when one gotox can be
associated with multiple JTs.
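(The one_jump_two_maps() selftest in this series presumably exercises
exactly the latter case; schematically:

	void *jt1[2] = { &&l1, &&l2 };
	void *jt2[2] = { &&l3, &&l4 };

	if (cond)
		goto *jt1[a];	/* the compiler may end up emitting a single */
	else			/* gotox reachable from both jump tables     */
		goto *jt2[b];

so a single gotox may need to be checked against more than one map.)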
> >
> > Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> > ---
> > .../bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
> > tools/bpf/bpftool/map.c | 2 +-
> > tools/lib/bpf/libbpf.c | 159 +++++++++++++++---
> > tools/lib/bpf/libbpf_probes.c | 4 +
> > tools/lib/bpf/linker.c | 12 +-
> > 5 files changed, 153 insertions(+), 26 deletions(-)
> >
> > diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> > index 252e4c538edb..3377d4a01c62 100644
> > --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
> > +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> > @@ -55,7 +55,7 @@ MAP COMMANDS
> > | | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
> > | | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
> > | | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
> > -| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** }
> > +| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** | **insn_array** }
> >
> > DESCRIPTION
> > ===========
> > diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> > index c9de44a45778..79b90f274bef 100644
> > --- a/tools/bpf/bpftool/map.c
> > +++ b/tools/bpf/bpftool/map.c
> > @@ -1477,7 +1477,7 @@ static int do_help(int argc, char **argv)
> > " devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
> > " cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
> > " queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
> > - " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena }\n"
> > + " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena | insn_array }\n"
> > " " HELP_SPEC_OPTIONS " |\n"
> > " {-f|--bpffs} | {-n|--nomount} }\n"
> > "",
>
> bpftool changes sifted through into libbpf patch?
Yes, thanks. I think I've squashed the fix here, because it broke
the `test_progs -a libbpf_str` test.
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index fe4fc5438678..a5f04544c09c 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -191,6 +191,7 @@ static const char * const map_type_name[] = {
> > [BPF_MAP_TYPE_USER_RINGBUF] = "user_ringbuf",
> > [BPF_MAP_TYPE_CGRP_STORAGE] = "cgrp_storage",
> > [BPF_MAP_TYPE_ARENA] = "arena",
> > + [BPF_MAP_TYPE_INSN_ARRAY] = "insn_array",
> > };
> >
> > static const char * const prog_type_name[] = {
> > @@ -372,6 +373,7 @@ enum reloc_type {
> > RELO_EXTERN_CALL,
> > RELO_SUBPROG_ADDR,
> > RELO_CORE,
> > + RELO_INSN_ARRAY,
> > };
> >
> > struct reloc_desc {
> > @@ -383,6 +385,7 @@ struct reloc_desc {
> > int map_idx;
> > int sym_off;
> > int ext_idx;
> > + int sym_size;
>
> make it a union with ext_idx? ext_idx isn't used for jump table
> relocation, right?
Done.
>
> > };
> > };
> > };
> > @@ -496,6 +499,10 @@ struct bpf_program {
> > __u32 line_info_rec_size;
> > __u32 line_info_cnt;
> > __u32 prog_flags;
> > +
> > + __u32 subprog_offset[256];
> > + __u32 subprog_sec_offst[256];
> > + __u32 subprog_cnt;
>
> um... allocate dynamically, if necessary? (but also see above, might
> not be necessary at all)
>
> > };
> >
> > struct bpf_struct_ops {
> > @@ -525,6 +532,7 @@ struct bpf_struct_ops {
> > #define STRUCT_OPS_SEC ".struct_ops"
> > #define STRUCT_OPS_LINK_SEC ".struct_ops.link"
> > #define ARENA_SEC ".addr_space.1"
> > +#define JUMPTABLES_SEC ".jumptables"
> >
> > enum libbpf_map_type {
> > LIBBPF_MAP_UNSPEC,
> > @@ -658,6 +666,7 @@ struct elf_state {
> > Elf64_Ehdr *ehdr;
> > Elf_Data *symbols;
> > Elf_Data *arena_data;
> > + Elf_Data jumptables_data;
> > size_t shstrndx; /* section index for section name strings */
> > size_t strtabidx;
> > struct elf_sec_desc *secs;
> > @@ -668,6 +677,7 @@ struct elf_state {
> > int symbols_shndx;
> > bool has_st_ops;
> > int arena_data_shndx;
> > + int jumptables_data_shndx;
> > };
> >
> > struct usdt_manager;
> > @@ -3945,6 +3955,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
> > } else if (strcmp(name, ARENA_SEC) == 0) {
> > obj->efile.arena_data = data;
> > obj->efile.arena_data_shndx = idx;
> > + } else if (strcmp(name, JUMPTABLES_SEC) == 0) {
> > + memcpy(&obj->efile.jumptables_data, data, sizeof(*data));
>
> you need to preserve the contents of the jump tables until the preparation
> stage, right? Just memcpy'ing Elf_Data doesn't preserve d_buf's contents, no?
> So you need to allocate memory for the contents and keep it until the
> preparation phase.
Ah, yes, right.
> pw-bot: cr
>
>
> > + obj->efile.jumptables_data_shndx = idx;
> > } else {
> > pr_info("elf: skipping unrecognized data section(%d) %s\n",
> > idx, name);
> > @@ -4599,6 +4612,16 @@ static int bpf_program__record_reloc(struct bpf_program *prog,
> > return 0;
> > }
> >
> > + /* jump table data relocation */
> > + if (shdr_idx == obj->efile.jumptables_data_shndx) {
> > + reloc_desc->type = RELO_INSN_ARRAY;
> > + reloc_desc->insn_idx = insn_idx;
> > + reloc_desc->map_idx = -1;
> > + reloc_desc->sym_off = sym->st_value;
> > + reloc_desc->sym_size = sym->st_size;
> > + return 0;
> > + }
> > +
> > /* generic map reference relocation */
> > if (type == LIBBPF_MAP_UNSPEC) {
> > if (!bpf_object__shndx_is_maps(obj, shdr_idx)) {
> > @@ -6101,6 +6124,60 @@ static void poison_kfunc_call(struct bpf_program *prog, int relo_idx,
> > insn->imm = POISON_CALL_KFUNC_BASE + ext_idx;
> > }
> >
> > +static int create_jt_map(struct bpf_object *obj, int off, int size, int adjust_off)
> > +{
> > + static union bpf_attr attr = {
> > + .map_type = BPF_MAP_TYPE_INSN_ARRAY,
> > + .key_size = 4,
> > + .value_size = sizeof(struct bpf_insn_array_value),
> > + .max_entries = 0,
> > + };
> > + struct bpf_insn_array_value val = {};
> > + int map_fd;
> > + int err;
> > + __u32 i;
> > + __u32 *jt;
>
> nit: combine same-typed variable declarations?
ok
> > +
> > + attr.max_entries = size / 8;
>
> 8 is sizeof(struct bpf_insn_array_value)? make it obvious?
ok
> > +
> > + map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
> > + if (map_fd < 0)
> > + return map_fd;
>
> is the bpf_map_create() API not usable here? what's the reason to
> open-code (incorrectly, not doing memset(0)) bpf_attr?
Yeah, I am always forgetting that bpf_map_create() exists...
Need to put "no syscall(__NR_bpf)" in the checklist.
> > +
> > + jt = (__u32 *)(obj->efile.jumptables_data.d_buf + off);
> > + if (!jt)
>
> if off is not zero, this will never be true... this check looks wrong.
> Check this once at the point where you record jumptables_data?
Thanks, missed this.
> > + return -EINVAL;
> > +
> > + for (i = 0; i < attr.max_entries; i++) {
> > + val.xlated_off = jt[2*i]/8 + adjust_off;
>
> nit: code style: `jt[2 * i] / 8`
>
> and this 8 is basically sizeof(struct bpf_insn), right? Can you use
> that to have a bit more semantic meaning here?
Sure, thanks.
> > + err = bpf_map_update_elem(map_fd, &i, &val, 0);
> > + if (err) {
> > + close(map_fd);
> > + return err;
> > + }
> > + }
> > +
> > + err = bpf_map_freeze(map_fd);
> > + if (err) {
> > + close(map_fd);
> > + return err;
> > + }
> > +
> > + return map_fd;
> > +}
> > +
> > +static int subprog_insn_off(struct bpf_program *prog, int insn_idx)
> > +{
> > + int i;
> > +
> > + for (i = prog->subprog_cnt - 1; i >= 0; i--)
> > + if (insn_idx >= prog->subprog_offset[i])
> > + return prog->subprog_offset[i] - prog->subprog_sec_offst[i];
>
> I feel like this whole subprog_offset and subprog_sec_offst shouldn't
> be even necessary.
>
> Check bpf_object__relocate(). I'm not sure why this was done this way
> that we go across all programs in phases, doing code relocation first,
> then data relocation later (across all programs again). I might be
> forgetting some details, but if we change this to do all the
> relocation for each program one at a time, then all this information
> that you explicitly record is already recorded in
> subprog->sub_insn_off and you can use it until we start relocating
> another entry-point program. Can you give it a try?
>
> So basically the structure will be:
>
> for (i = 0; i < obj->nr_programs; i++) {
> prog = ...
> if (prog_is_subprog(...))
> continue;
> if (!prog->autoload)
> continue;
> bpf_object__relocate_calls()
> /* that exception callback handling */
> bpf_object__relocate_data()
> bpf_program_fixup_func_info()
> }
>
> It feels like this should work because there cannot be
> interdependencies between entry programs.
Ok, I will take a look at this before v2.
>
> > +
> > + return -prog->sec_insn_off;
>
> why this return value?... can you elaborate?
Jump tables generated by LLVM contain offsets relative to the
beginning of a section. The offsets inside a BPF_INSN_ARRAY
are absolute (for a "load unit", i.e., insns in bpf_prog_load).
So if, say, a section A contains two progs, f1 and f2, then
f1 starts at 0 and f2 at F2_START. So when f2 is loaded, its
jump tables need to be adjusted by -F2_START such that the offsets
are correct.
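To illustrate with made-up numbers: if f1 occupies insns 0..49 of the
section and f2 starts at insn 50, then a jump-table entry for f2 that
holds byte offset 58 * 8 (insn 58 counted from the section start) has
to become 58 - 50 = 8 once f2 is loaded on its own, which is exactly
the -F2_START (i.e. -sec_insn_off) adjustment above.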
> > +}
> > +
> > +
> > /* Relocate data references within program code:
> > * - map references;
> > * - global variable references;
> > @@ -6192,6 +6269,21 @@ bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog)
> > case RELO_CORE:
> > /* will be handled by bpf_program_record_relos() */
> > break;
> > + case RELO_INSN_ARRAY: {
> > + int map_fd;
> > +
> > + map_fd = create_jt_map(obj, relo->sym_off, relo->sym_size,
> > + subprog_insn_off(prog, relo->insn_idx));
> > + if (map_fd < 0) {
> > + pr_warn("prog '%s': relo #%d: failed to create a jt map for sym_off=%u\n",
>
> jt -> jump table, this is supposed to be at least somewhat
> human-readable ;) also we seem to be not using blah=%d approach, so
> just "sym_off %d" (and note that sym_off is int, not unsigned)
thanks, will fix
> > + prog->name, i, relo->sym_off);
> > + return map_fd;
> > + }
> > + insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;
> > + insn->imm = map_fd;
> > + insn->off = 0;
> > + }
> > + break;
> > default:
> > pr_warn("prog '%s': relo #%d: bad relo type %d\n",
> > prog->name, i, relo->type);
> > @@ -6389,36 +6481,58 @@ static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_progra
> > return 0;
> > }
> >
> > +static int
> > +bpf_prog__append_subprog_offsets(struct bpf_program *prog, __u32 sec_insn_off, __u32 sub_insn_off)
>
> please don't use double underscore for non-API functions, just
> prog_append_subprog_offs()
>
> but actually I'd just inline it into bpf_object__append_subprog_code,
> it doesn't seem complicated enough to warrant its own function
ok, makes sense, will inline
> > +{
> > + if (prog->subprog_cnt == ARRAY_SIZE(prog->subprog_sec_offst)) {
>
> please use libbpf_reallocarray()
ok
>
> > + pr_warn("prog '%s': number of subprogs exceeds %zu\n",
> > + prog->name, ARRAY_SIZE(prog->subprog_sec_offst));
> > + return -E2BIG;
> > + }
> > +
> > + prog->subprog_sec_offst[prog->subprog_cnt] = sec_insn_off;
>
> typo: offst, but also here and below prefer sticking to "off", it's
> used pretty universally in libbpf code
ok
> > + prog->subprog_offset[prog->subprog_cnt] = sub_insn_off;
> > +
> > + prog->subprog_cnt += 1;
> > + return 0;
> > +}
> > +
> > static int
> > bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main_prog,
> > - struct bpf_program *subprog)
> > + struct bpf_program *subprog)
> > {
> > - struct bpf_insn *insns;
> > - size_t new_cnt;
> > - int err;
> > + struct bpf_insn *insns;
> > + size_t new_cnt;
> > + int err;
> >
> > - subprog->sub_insn_off = main_prog->insns_cnt;
> > + subprog->sub_insn_off = main_prog->insns_cnt;
> >
> > - new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
> > - insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns));
> > - if (!insns) {
> > - pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name);
> > - return -ENOMEM;
> > - }
> > - main_prog->insns = insns;
> > - main_prog->insns_cnt = new_cnt;
> > + new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
> > + insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns));
> > + if (!insns) {
> > + pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name);
> > + return -ENOMEM;
> > + }
> > + main_prog->insns = insns;
> > + main_prog->insns_cnt = new_cnt;
> >
> > - memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns,
> > - subprog->insns_cnt * sizeof(*insns));
> > + memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns,
> > + subprog->insns_cnt * sizeof(*insns));
> >
> > - pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n",
> > - main_prog->name, subprog->insns_cnt, subprog->name);
> > + pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n",
> > + main_prog->name, subprog->insns_cnt, subprog->name);
> >
> > - /* The subprog insns are now appended. Append its relos too. */
> > - err = append_subprog_relos(main_prog, subprog);
> > - if (err)
> > - return err;
> > - return 0;
> > + /* The subprog insns are now appended. Append its relos too. */
> > + err = append_subprog_relos(main_prog, subprog);
> > + if (err)
> > + return err;
> > +
> > + err = bpf_prog__append_subprog_offsets(main_prog, subprog->sec_insn_off,
> > + subprog->sub_insn_off);
> > + if (err)
> > + return err;
> > +
> > + return 0;
> > }
> >
> > static int
> > @@ -7954,6 +8068,7 @@ static int bpf_object_prepare_progs(struct bpf_object *obj)
> > if (err)
> > return err;
> > }
> > +
>
> ?
Thanks, fixing + squashing artefacts.
>
> > return 0;
> > }
> >
> > diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
> > index 9dfbe7750f56..bccf4bb747e1 100644
> > --- a/tools/lib/bpf/libbpf_probes.c
> > +++ b/tools/lib/bpf/libbpf_probes.c
> > @@ -364,6 +364,10 @@ static int probe_map_create(enum bpf_map_type map_type)
> > case BPF_MAP_TYPE_SOCKHASH:
> > case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
> > break;
> > + case BPF_MAP_TYPE_INSN_ARRAY:
> > + key_size = sizeof(__u32);
> > + value_size = sizeof(struct bpf_insn_array_value);
> > + break;
> > case BPF_MAP_TYPE_UNSPEC:
> > default:
> > return -EOPNOTSUPP;
> > diff --git a/tools/lib/bpf/linker.c b/tools/lib/bpf/linker.c
> > index a469e5d4fee7..827867f8bba3 100644
> > --- a/tools/lib/bpf/linker.c
> > +++ b/tools/lib/bpf/linker.c
> > @@ -28,6 +28,9 @@
> > #include "str_error.h"
> >
> > #define BTF_EXTERN_SEC ".extern"
> > +#define RODATA_REL_SEC ".rel.rodata"
> > +#define JUMPTABLES_SEC ".jumptables"
> > +#define JUMPTABLES_REL_SEC ".rel.jumptables"
> >
> > struct src_sec {
> > const char *sec_name;
> > @@ -2026,6 +2029,9 @@ static int linker_append_elf_sym(struct bpf_linker *linker, struct src_obj *obj,
> > obj->sym_map[src_sym_idx] = dst_sec->sec_sym_idx;
> > return 0;
> > }
> > +
> > + if (!strcmp(src_sec->sec_name, JUMPTABLES_SEC))
>
> If you look around in this file (and most of libbpf source code), you
> won't see !strcmp() in it. Let's be consistent and explicit with == 0
> and != 0 here and below.
ok
>
> > + goto add_sym;
> > }
> >
> > if (sym_bind == STB_LOCAL)
> > @@ -2272,8 +2278,10 @@ static int linker_append_elf_relos(struct bpf_linker *linker, struct src_obj *ob
> > insn->imm += sec->dst_off / sizeof(struct bpf_insn);
> > else
> > insn->imm += sec->dst_off;
> > - } else {
> > - pr_warn("relocation against STT_SECTION in non-exec section is not supported!\n");
> > + } else if (strcmp(src_sec->sec_name, JUMPTABLES_REL_SEC) &&
> > + strcmp(src_sec->sec_name, RODATA_REL_SEC)) {
>
> where does .rel.rodata come from?
>
> and we don't need to adjust the contents of any of those sections, right?...
>
> can you please add some tests validating that two object files with
> jumptables can be linked together and end up with proper combined
> .jumptables section?
>
>
> and in terms of code, can we do
>
> } else if (strcmp(..., JUMPTABLES_REL_SEC) == 0) {
> /* nothing to do for .rel.jumptables */
> } else {
> pr_warn(...);
> }
>
> It makes it more apparent what is supported and what's not.
Yes, sure. The rodata might be obsolete, I will check, and
.rel.jumptables is actually not used. This should be cleaned up
once the LLVM patch stabilizes. Thanks for noticing this,
this way it is for sure on my checklist now :-)
>
> > + pr_warn("relocation against STT_SECTION in section %s is not supported!\n",
> > + src_sec->sec_name);
> > return -EINVAL;
> > }
> > }
> > --
> > 2.34.1
> >
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps
2025-08-21 13:05 ` Anton Protopopov
@ 2025-08-21 18:14 ` Andrii Nakryiko
2025-08-21 19:12 ` Anton Protopopov
0 siblings, 1 reply; 38+ messages in thread
From: Andrii Nakryiko @ 2025-08-21 18:14 UTC (permalink / raw)
To: Anton Protopopov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
On Thu, Aug 21, 2025 at 6:00 AM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
>
> On 25/08/20 05:20PM, Andrii Nakryiko wrote:
> > On Sat, Aug 16, 2025 at 11:02 AM Anton Protopopov
> > <a.s.protopopov@gmail.com> wrote:
> > >
> > > For the v5 instruction set, LLVM is now allowed to generate indirect
> > > jumps for switch statements and for 'goto *rX' assembly. Every such
> > > jump will be accompanied by the necessary metadata, e.g. (`llvm-objdump
> > > -Sr ...`):
> > >
> > > 0: r2 = 0x0 ll
> > > 0000000000000030: R_BPF_64_64 BPF.JT.0.0
> > >
> > > Here BPF.JT.0.0 is a symbol residing in the .jumptables section:
> > >
> > > Symbol table:
> > > 4: 0000000000000000 240 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0
> > >
> > > The -bpf-min-jump-table-entries llvm option may be used to control
> > > the minimal size of a switch which will be converted to indirect
> > > jumps.
> > >
> > > The code generated by LLVM for a switch will look, approximately,
> > > like this:
> > >
> > > 0: rX <- jump_table_x[i]
> > > 2: rX <<= 3
> > > 3: gotox *rX
> > >
> > > Right now there is no robust way to associate the jump with the
> > > corresponding map, so libbpf doesn't insert map file descriptor
> > > inside the gotox instruction.
> >
> > Just from the commit description it's not clear whether that's
> > something that needs fixing or is OK? If it's OK, why call it out?..
>
> Right, will rephrase.
>
> The idea here is that if you have, say, a switch, then most
> probably it is compiled into one jump table and one gotox. And, if
> the compiler can provide enough metadata, then it makes sense for
> libbpf to also associate the JT with the gotox by inserting the same
> map descriptor inside both instructions. However, right now this
> doesn't work, and there are also cases when one gotox can be
> associated with multiple JTs.
Ok, and right now we'll basically generate two identical BPF maps? If
we wanted to optimize this, wouldn't it be sufficient to just reuse
maps if relocation points to the same symbol?
>
> > >
> > > Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> > > ---
> > > .../bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
> > > tools/bpf/bpftool/map.c | 2 +-
> > > tools/lib/bpf/libbpf.c | 159 +++++++++++++++---
> > > tools/lib/bpf/libbpf_probes.c | 4 +
> > > tools/lib/bpf/linker.c | 12 +-
> > > 5 files changed, 153 insertions(+), 26 deletions(-)
> > >
> > > diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> > > index 252e4c538edb..3377d4a01c62 100644
> > > --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
> > > +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> > > @@ -55,7 +55,7 @@ MAP COMMANDS
> > > | | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
> > > | | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
> > > | | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
> > > -| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** }
> > > +| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** | **insn_array** }
> > >
> > > DESCRIPTION
> > > ===========
> > > diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> > > index c9de44a45778..79b90f274bef 100644
> > > --- a/tools/bpf/bpftool/map.c
> > > +++ b/tools/bpf/bpftool/map.c
> > > @@ -1477,7 +1477,7 @@ static int do_help(int argc, char **argv)
> > > " devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
> > > " cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
> > > " queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
> > > - " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena }\n"
> > > + " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena | insn_array }\n"
> > > " " HELP_SPEC_OPTIONS " |\n"
> > > " {-f|--bpffs} | {-n|--nomount} }\n"
> > > "",
> >
> > bpftool changes sifted through into libbpf patch?
>
> > Yes, thanks. I think I've squashed the fix here, because it broke
> > the `test_progs -a libbpf_str` test.
>
libbpf_str test doesn't rely on bpftool, so fixing up the selftest in the
same patch makes sense (to not break bisection), but the bpftool changes
still have nothing to do with it and should be done separately
[...]
> >
> > > +
> > > + return -prog->sec_insn_off;
> >
> > why this return value?... can you elaborate?
>
> Jump tables generated by LLVM contain offsets relative to the
> beginning of a section. The offsets inside a BPF_INSN_ARRAY
> are absolute (for a "load unit", i.e., insns in bpf_prog_load).
> So if, say, a section A contains two progs, f1 and f2, then
> f1 starts at 0 and f2 at F2_START. So when f2 is loaded, its
> jump tables need to be adjusted by -F2_START such that the offsets
> are correct.
the thing I missed is that this isn't some sort of error condition,
it's just the case when the offset falls into the main program function
the naming is also a bit misleading, IMO, because it doesn't just return
an instruction offset, but rather an *adjustment* to an offset in the
jump table
[...]
> > where does .rel.rodata come from?
> >
> > and we don't need to adjust the contents of any of those sections, right?...
> >
> > can you please add some tests validating that two object files with
> > jumptables can be linked together and end up with proper combined
> > .jumptables section?
> >
> >
> > and in terms of code, can we do
> >
> > } else if (strcmp(..., JUMPTABLES_REL_SEC) == 0) {
> > /* nothing to do for .rel.jumptables */
> > } else {
> > pr_warn(...);
> > }
> >
> > It makes it more apparent what is supported and what's not.
>
> Yes, sure. The rodata might be obsolete, I will check, and
> .rel.jumptables is actually not used. This should be cleaned up
> once LLVM patch stabilizes. Thanks for noticing this,
> this way it is for sure added to my checklist :-)
>
ok, thanks
> >
> > > + pr_warn("relocation against STT_SECTION in section %s is not supported!\n",
> > > + src_sec->sec_name);
> > > return -EINVAL;
> > > }
> > > }
> > > --
> > > 2.34.1
> > >
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps
2025-08-21 18:14 ` Andrii Nakryiko
@ 2025-08-21 19:12 ` Anton Protopopov
0 siblings, 0 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-21 19:12 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
On 25/08/21 11:14AM, Andrii Nakryiko wrote:
> On Thu, Aug 21, 2025 at 6:00 AM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> >
> > On 25/08/20 05:20PM, Andrii Nakryiko wrote:
> > > On Sat, Aug 16, 2025 at 11:02 AM Anton Protopopov
> > > <a.s.protopopov@gmail.com> wrote:
> > > >
> > > > For the v5 instruction set, LLVM is now allowed to generate indirect
> > > > jumps for switch statements and for 'goto *rX' assembly. Every such
> > > > jump will be accompanied by the necessary metadata, e.g. (`llvm-objdump
> > > > -Sr ...`):
> > > >
> > > > 0: r2 = 0x0 ll
> > > > 0000000000000030: R_BPF_64_64 BPF.JT.0.0
> > > >
> > > > Here BPF.JT.0.0 is a symbol residing in the .jumptables section:
> > > >
> > > > Symbol table:
> > > > 4: 0000000000000000 240 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0
> > > >
> > > > The -bpf-min-jump-table-entries llvm option may be used to control
> > > > the minimal size of a switch which will be converted to indirect
> > > > jumps.
> > > >
> > > > The code generated by LLVM for a switch will look, approximately,
> > > > like this:
> > > >
> > > > 0: rX <- jump_table_x[i]
> > > > 2: rX <<= 3
> > > > 3: gotox *rX
> > > >
> > > > Right now there is no robust way to associate the jump with the
> > > > corresponding map, so libbpf doesn't insert map file descriptor
> > > > inside the gotox instruction.
> > >
> > > Just from the commit description it's not clear whether that's
> > > something that needs fixing or is OK? If it's OK, why call it out?..
> >
> > Right, will rephrase.
> >
> > The idea here is that if you have, say, a switch, then most
> > probably it is compiled into one jump table and one gotox. And, if
> > the compiler can provide enough metadata, then it makes sense for
> > libbpf to also associate the JT with the gotox by inserting the same
> > map descriptor inside both instructions. However, right now this
> > doesn't work, and there are also cases when one gotox can be
> > associated with multiple JTs.
>
> Ok, and right now we'll basically generate two identical BPF maps? If
> we wanted to optimize this, wouldn't it be sufficient to just reuse
> maps if relocation points to the same symbol?
No, right now the gotox doesn't contain a map, only the ldimm64 does. In
check_cfg, when the verifier encounters a gotox instruction, it finds
all the potential jump tables for that subprog. At a later stage, for
a `gotox Rx`, the verifier knows the exact map from which Rx was
loaded, and can verify it precisely.
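Schematically, the pattern the verifier has to deal with is
(pseudo-asm):

	r1 = <BPF.JT.x.y> ll	; ldimm64; libbpf rewrites it to
				; BPF_PSEUDO_MAP_VALUE + the insn_array map fd
	...
	gotox rX		; no map fd here; check_cfg collects all candidate
				; jump tables of the subprog, and later the exact
				; map is recovered from the tracked origin of rX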
> > > > Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> > > > ---
> > > > .../bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
> > > > tools/bpf/bpftool/map.c | 2 +-
> > > > tools/lib/bpf/libbpf.c | 159 +++++++++++++++---
> > > > tools/lib/bpf/libbpf_probes.c | 4 +
> > > > tools/lib/bpf/linker.c | 12 +-
> > > > 5 files changed, 153 insertions(+), 26 deletions(-)
> > > >
> > > > diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> > > > index 252e4c538edb..3377d4a01c62 100644
> > > > --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
> > > > +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> > > > @@ -55,7 +55,7 @@ MAP COMMANDS
> > > > | | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
> > > > | | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
> > > > | | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
> > > > -| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** }
> > > > +| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** | **insn_array** }
> > > >
> > > > DESCRIPTION
> > > > ===========
> > > > diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> > > > index c9de44a45778..79b90f274bef 100644
> > > > --- a/tools/bpf/bpftool/map.c
> > > > +++ b/tools/bpf/bpftool/map.c
> > > > @@ -1477,7 +1477,7 @@ static int do_help(int argc, char **argv)
> > > > " devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
> > > > " cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
> > > > " queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
> > > > - " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena }\n"
> > > > + " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena | insn_array }\n"
> > > > " " HELP_SPEC_OPTIONS " |\n"
> > > > " {-f|--bpffs} | {-n|--nomount} }\n"
> > > > "",
> > >
> > > bpftool changes slipped through into the libbpf patch?
> >
> > Yes, thanks. I think I've squashed the fix here, because it broke
> > the `test_progs -a libbpf_str` test.
> >
>
> The libbpf_str test doesn't rely on bpftool, so fixing up the selftest in
> the same patch makes sense (to not break bisection), but bpftool changes
> still make no sense here and should be done separately.
Yes, it seems you're right. I think I was also fixing
./test_bpftool.py and squashed similar changes into the libbpf
commit. I will check and split them out before resending.
> [...]
>
> > >
> > > > +
> > > > + return -prog->sec_insn_off;
> > >
> > > why this return value?... can you elaborate?
> >
> > Jump tables generated by LLVM contain offsets relative to the
> > beginning of a section. The offsets inside a BPF_INSN_ARRAY
> > are absolute (for a "load unit", i.e., the insns passed to bpf_prog_load).
> > So if, say, a section A contains two progs, f1 and f2, then
> > f1 starts at 0 and f2 at F2_START. So when f2 is loaded, its
> > jump tables need to be adjusted by -F2_START so that the offsets
> > are correct.
>
> the thing I missed is that this isn't some sort of error condition,
> it's just when offset falls into main program function
>
> naming is also a bit misleading, IMO because it doesn't just return
> instruction offset, but rather an *adjustment* to an offset in jump
> table
Yeah, and I think it is even named appropriately at the call site.
I will check how to make this more transparent for the reader.
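To spell out the adjustment discussed above in code form (an illustrative
helper, not the actual libbpf code; the name is made up):
    /* Convert one section-relative jump table entry, as emitted by LLVM
     * into .jumptables, to an offset relative to the "load unit" of a
     * program that starts at sec_insn_off within its ELF section. */
    static __u32 jt_entry_to_prog_off(__u64 jt_entry, __u32 sec_insn_off)
    {
            return (__u32)jt_entry - sec_insn_off;
    }
I.e. for the f1/f2 example above the adjustment is 0 for f1 and -F2_START
for f2.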
> [...]
>
> > > where does .rel.rodata come from?
> > >
> > > and we don't need to adjust the contents of any of those sections, right?...
> > >
> > > can you please add some tests validating that two object files with
> > > jumptables can be linked together and end up with proper combined
> > > .jumptables section?
> > >
> > >
> > > and in terms of code, can we do
> > >
> > > } else if (strcmp(..., JUMPTABLES_REL_SEC) == 0) {
> > > /* nothing to do for .rel.jumptables */
> > > } else {
> > > pr_warn(...);
> > > }
> > >
> > > It makes it more apparent what is supported and what's not.
> >
> > Yes, sure. The rodata might be obsolete, I will check, and
> > .rel.jumptables is actually not used. This should be cleaned up
> > once the LLVM patch stabilizes. Thanks for noticing this,
> > this way it is for sure added to my checklist :-)
> >
>
> ok, thanks
>
> > >
> > > > + pr_warn("relocation against STT_SECTION in section %s is not supported!\n",
> > > > + src_sec->sec_name);
> > > > return -EINVAL;
> > > > }
> > > > }
> > > > --
> > > > 2.34.1
> > > >
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 01/11] bpf: fix the return value of push_stack
2025-08-16 18:06 ` [PATCH v1 bpf-next 01/11] bpf: fix the return value of push_stack Anton Protopopov
@ 2025-08-25 18:12 ` Eduard Zingerman
2025-08-26 15:00 ` Anton Protopopov
0 siblings, 1 reply; 38+ messages in thread
From: Eduard Zingerman @ 2025-08-25 18:12 UTC (permalink / raw)
To: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song
On Sat, 2025-08-16 at 18:06 +0000, Anton Protopopov wrote:
The change makes sense to me, please see a few comments below.
[...]
> @@ -2111,12 +2111,12 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
> env->stack_size++;
> err = copy_verifier_state(&elem->st, cur);
> if (err)
> - return NULL;
> + return ERR_PTR(-ENOMEM);
> elem->st.speculative |= speculative;
> if (env->stack_size > BPF_COMPLEXITY_LIMIT_JMP_SEQ) {
> verbose(env, "The sequence of %d jumps is too complex.\n",
> env->stack_size);
> - return NULL;
> + return ERR_PTR(-EFAULT);
Nit: this should be -E2BIG, I think.
> }
> if (elem->st.parent) {
> ++elem->st.parent->branches;
> @@ -2912,7 +2912,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
>
> elem = kzalloc(sizeof(struct bpf_verifier_stack_elem), GFP_KERNEL_ACCOUNT);
> if (!elem)
> - return NULL;
> + return ERR_PTR(-ENOMEM);
>
> elem->insn_idx = insn_idx;
> elem->prev_insn_idx = prev_insn_idx;
> @@ -2924,7 +2924,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
> verbose(env,
> "The sequence of %d jumps is too complex for async cb.\n",
> env->stack_size);
> - return NULL;
> + return ERR_PTR(-EFAULT);
(and here too)
> }
> /* Unlike push_stack() do not copy_verifier_state().
> * The caller state doesn't matter.
[...]
> @@ -14217,7 +14217,7 @@ sanitize_speculative_path(struct bpf_verifier_env *env,
> struct bpf_reg_state *regs;
>
> branch = push_stack(env, next_idx, curr_idx, true);
> - if (branch && insn) {
> + if (!IS_ERR(branch) && insn) {
Note: branch returned by `sanitize_speculative_path` is never used.
Maybe change the function to return `int` and do the regular
err = sanitize_speculative_path()
if (err)
return err;
thing here?
> regs = branch->frame[branch->curframe]->regs;
> if (BPF_SRC(insn->code) == BPF_K) {
> mark_reg_unknown(env, regs, insn->dst_reg);
[...]
> @@ -16721,8 +16720,7 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> * execution.
> */
> if (!env->bypass_spec_v1 &&
> - !sanitize_speculative_path(env, insn, *insn_idx + 1,
> - *insn_idx))
> + IS_ERR(sanitize_speculative_path(env, insn, *insn_idx + 1, *insn_idx)))
> return -EFAULT;
I think the error code should be taken from the return value of the
sanitize_speculative_path().
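I.e. the call sites could then look roughly like this (just a sketch,
assuming the function is converted to return an int as suggested above):
    if (!env->bypass_spec_v1) {
            err = sanitize_speculative_path(env, insn, *insn_idx + 1, *insn_idx);
            if (err)
                    return err;
    }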
> if (env->log.level & BPF_LOG_LEVEL)
> print_insn_state(env, this_branch, this_branch->curframe);
> @@ -16734,9 +16732,9 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> * simulation under speculative execution.
> */
> if (!env->bypass_spec_v1 &&
> - !sanitize_speculative_path(env, insn,
> - *insn_idx + insn->off + 1,
> - *insn_idx))
> + IS_ERR(sanitize_speculative_path(env, insn,
> + *insn_idx + insn->off + 1,
> + *insn_idx)))
Same here.
> return -EFAULT;
> if (env->log.level & BPF_LOG_LEVEL)
> print_insn_state(env, this_branch, this_branch->curframe);
[...]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 03/11] bpf, x86: add new map type: instructions array
2025-08-16 18:06 ` [PATCH v1 bpf-next 03/11] bpf, x86: add new map type: instructions array Anton Protopopov
@ 2025-08-25 21:05 ` Eduard Zingerman
2025-08-26 15:52 ` Anton Protopopov
0 siblings, 1 reply; 38+ messages in thread
From: Eduard Zingerman @ 2025-08-25 21:05 UTC (permalink / raw)
To: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song
On Sat, 2025-08-16 at 18:06 +0000, Anton Protopopov wrote:
[...]
> --- /dev/null
> +++ b/kernel/bpf/bpf_insn_array.c
[...]
> +int bpf_insn_array_ready(struct bpf_map *map)
> +{
> + struct bpf_insn_array *insn_array = cast_insn_array(map);
> + guard(mutex)(&insn_array->state_mutex);
> + int i;
> +
> + for (i = 0; i < map->max_entries; i++) {
> + if (insn_array->ptrs[i].user_value.xlated_off == INSN_DELETED)
> + continue;
> + if (!insn_array->ips[i]) {
> + /*
> + * Set the map free on failure; the program owning it
> + * might be re-loaded with different log level
> + */
> + insn_array->state = INSN_ARRAY_STATE_FREE;
> + return -EFAULT;
This shouldn't happen, right?
If so, maybe use verifier_bug here with some description?
(and move bpf_insn_array_ready() call to verifier.c:bpf_check(),
so that the log pointer is available).
> + }
> + }
> +
> + insn_array->state = INSN_ARRAY_STATE_READY;
> + return 0;
> +}
[...]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps
2025-08-16 18:06 ` [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps Anton Protopopov
2025-08-18 7:57 ` Dan Carpenter
@ 2025-08-25 23:15 ` Eduard Zingerman
2025-08-27 15:34 ` Anton Protopopov
2025-08-28 9:58 ` Anton Protopopov
1 sibling, 2 replies; 38+ messages in thread
From: Eduard Zingerman @ 2025-08-25 23:15 UTC (permalink / raw)
To: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song
On Sat, 2025-08-16 at 18:06 +0000, Anton Protopopov wrote:
> Add support for a new instruction
>
> BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0[, imm=fd(M)]
^^^^^^^^^^^^^
Do we really need to support this now?
[...]
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 4bfb4faab4d7..f419a89b0147 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -671,9 +671,11 @@ static void __emit_indirect_jump(u8 **pprog, int reg, bool ereg)
> *pprog = prog;
> }
>
> -static void emit_indirect_jump(u8 **pprog, int reg, bool ereg, u8 *ip)
> +static void emit_indirect_jump(u8 **pprog, int bpf_reg, u8 *ip)
Nit: maybe make this change a part of the previous patch?
> {
> u8 *prog = *pprog;
> + int reg = reg2hex[bpf_reg];
> + bool ereg = is_ereg(bpf_reg);
>
> if (cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) {
> OPTIMIZER_HIDE_VAR(reg);
[...]
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index aca43c284203..6e68e0082c81 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -77,7 +77,15 @@ struct bpf_reg_state {
> * the map_uid is non-zero for registers
> * pointing to inner maps.
> */
> - u32 map_uid;
> + union {
> + u32 map_uid;
> +
> + /* Used to track boundaries of a PTR_TO_INSN */
> + struct {
> + u32 min_index;
> + u32 max_index;
Could you please elaborate why these fields are necessary?
It appears that .var_off/.{s,u}{32_,}{min,max}_value fields can be
used to track current index bounds (min/max fields for bounds,
.var_off field to check 8-byte alignment).
> + };
> + };
> };
>
> /* for PTR_TO_BTF_ID */
> @@ -542,6 +550,11 @@ struct bpf_insn_aux_data {
> struct {
> u32 map_index; /* index into used_maps[] */
> u32 map_off; /* offset from value base address */
> +
> + struct jt { /* jump table for gotox instruction */
^^
should this be anonymous or have a `bpf_` prefix?
> + u32 *off;
> + int off_cnt;
> + } jt;
> };
> struct {
> enum bpf_reg_type reg_type; /* type of pseudo_btf_id */
[...]
> diff --git a/kernel/bpf/bpf_insn_array.c b/kernel/bpf/bpf_insn_array.c
> index 0c8dac62f457..d077a5aa2c7c 100644
> --- a/kernel/bpf/bpf_insn_array.c
> +++ b/kernel/bpf/bpf_insn_array.c
> @@ -1,7 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0-only
>
> #include <linux/bpf.h>
> -#include <linux/sort.h>
>
> #define MAX_INSN_ARRAY_ENTRIES 256
>
> @@ -173,6 +172,20 @@ static u64 insn_array_mem_usage(const struct bpf_map *map)
> return insn_array_alloc_size(map->max_entries) + extra_size;
> }
>
> +static int insn_array_map_direct_value_addr(const struct bpf_map *map, u64 *imm, u32 off)
> +{
> + struct bpf_insn_array *insn_array = cast_insn_array(map);
> +
> + if ((off % sizeof(long)) != 0 ||
> + (off / sizeof(long)) >= map->max_entries)
> + return -EINVAL;
> +
> + /* from BPF's point of view, this map is a jump table */
> + *imm = (unsigned long)insn_array->ips + off / sizeof(long);
> +
> + return 0;
> +}
> +
This function is called during main verification pass by
verifier.c:check_mem_access() -> verifier.c:bpf_map_direct_read().
However, insn_array->ips is filled by bpf_jit_comp.c:do_jit() ->
bpf_insn_array.c:bpf_prog_update_insn_ptr(), which is called *after* the
main verification pass. Am I missing something, or can this not work?
> BTF_ID_LIST_SINGLE(insn_array_btf_ids, struct, bpf_insn_array)
>
> const struct bpf_map_ops insn_array_map_ops = {
[...]
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 863b7114866b..c2cfa55913f8 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
[...]
> @@ -6072,6 +6084,14 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> return 0;
> }
>
> +static u32 map_mem_size(const struct bpf_map *map)
Nit: It is a bit non-obvious why this function returns the size of a
single value for all map types except insn array. Maybe add a
comment here, something like:
Return the size of the memory region accessible from a pointer
to map value. For INSN_ARRAY maps whole bpf_insn_array->ips
array is accessible.
> +{
> + if (map->map_type == BPF_MAP_TYPE_INSN_ARRAY)
> + return map->max_entries * sizeof(long);
^^^^^^^^^^^^
Nit: sizeof_field(struct bpf_insn_array, ips) ?
> +
> + return map->value_size;
> +}
> +
> /* check read/write into a map element with possible variable offset */
> static int check_map_access(struct bpf_verifier_env *env, u32 regno,
> int off, int size, bool zero_size_allowed,
[...]
> @@ -7820,6 +7849,13 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
> allow_trust_mismatch);
> err = err ?: reg_bounds_sanity_check(env, ®s[insn->dst_reg], ctx);
>
> + if (map_ptr_copy) {
> + regs[insn->dst_reg].type = PTR_TO_INSN;
> + regs[insn->dst_reg].map_ptr = map_ptr_copy;
> + regs[insn->dst_reg].min_index = regs[insn->src_reg].min_index;
> + regs[insn->dst_reg].max_index = regs[insn->src_reg].max_index;
> + }
> +
I think this should be handled inside check_mem_access(), see case for
reg->type == PTR_TO_MAP_VALUE.
> return err;
> }
>
[...]
> @@ -14554,6 +14592,36 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
>
> switch (opcode) {
> case BPF_ADD:
> + if (ptr_to_insn_array) {
> + u32 min_index = dst_reg->min_index;
> + u32 max_index = dst_reg->max_index;
> +
> + if ((umin_val + ptr_reg->off) > (u64) U32_MAX * sizeof(long)) {
> + verbose(env, "umin_value %llu + offset %u is too big to convert to index\n",
> + umin_val, ptr_reg->off);
> + return -EACCES;
> + }
> + if ((umax_val + ptr_reg->off) > (u64) U32_MAX * sizeof(long)) {
> + verbose(env, "umax_value %llu + offset %u is too big to convert to index\n",
> + umax_val, ptr_reg->off);
> + return -EACCES;
> + }
> +
> + min_index += (umin_val + ptr_reg->off) / sizeof(long);
> + max_index += (umax_val + ptr_reg->off) / sizeof(long);
> +
> + if (min_index >= ptr_reg->map_ptr->max_entries) {
> + verbose(env, "min_index %u points to outside of map\n", min_index);
> + return -EACCES;
> + }
> + if (max_index >= ptr_reg->map_ptr->max_entries) {
> + verbose(env, "max_index %u points to outside of map\n", max_index);
> + return -EACCES;
> + }
> +
> + dst_reg->min_index = min_index;
> + dst_reg->max_index = max_index;
> + }
I think this and the following hunk would disappear if {min,max}_index
are replaced by regular offset tracking mechanics.
> /* We can take a fixed offset as long as it doesn't overflow
> * the s32 'off' field
> */
> @@ -14598,6 +14666,11 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
> }
> break;
> case BPF_SUB:
> + if (ptr_to_insn_array) {
> + verbose(env, "Operation %s on ptr to instruction set map is prohibited\n",
> + bpf_alu_string[opcode >> 4]);
> + return -EACCES;
> + }
> if (dst_reg == off_reg) {
> /* scalar -= pointer. Creates an unknown scalar */
> verbose(env, "R%d tried to subtract pointer from scalar\n",
> @@ -16943,7 +17016,8 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
> }
> dst_reg->type = PTR_TO_MAP_VALUE;
> dst_reg->off = aux->map_off;
> - WARN_ON_ONCE(map->max_entries != 1);
> + WARN_ON_ONCE(map->map_type != BPF_MAP_TYPE_INSN_ARRAY &&
> + map->max_entries != 1);
Q: when is this necessary?
> /* We want reg->id to be same (0) as map_value is not distinct */
> } else if (insn->src_reg == BPF_PSEUDO_MAP_FD ||
> insn->src_reg == BPF_PSEUDO_MAP_IDX) {
> @@ -17696,6 +17770,246 @@ static int mark_fastcall_patterns(struct bpf_verifier_env *env)
> return 0;
> }
>
> +#define SET_HIGH(STATE, LAST) STATE = (STATE & 0xffffU) | ((LAST) << 16)
> +#define GET_HIGH(STATE) ((u16)((STATE) >> 16))
> +
> +static int push_goto_x_edge(int t, struct bpf_verifier_env *env, struct jt *jt)
I think check_cfg() can be refactored to use insn_successors().
In such a case it won't be necessary to special-case gotox processing
(apart from insn_aux->jt allocation).
> +{
> + int *insn_stack = env->cfg.insn_stack;
> + int *insn_state = env->cfg.insn_state;
> + u16 prev;
> + int w;
> +
> + for (prev = GET_HIGH(insn_state[t]); prev < jt->off_cnt; prev++) {
> + w = jt->off[prev];
> +
> + /* EXPLORED || DISCOVERED */
> + if (insn_state[w])
> + continue;
> +
> + break;
> + }
> +
> + if (prev == jt->off_cnt)
> + return DONE_EXPLORING;
> +
> + mark_prune_point(env, t);
> +
> + if (env->cfg.cur_stack >= env->prog->len)
> + return -E2BIG;
> + insn_stack[env->cfg.cur_stack++] = w;
> +
> + mark_jmp_point(env, w);
> +
> + SET_HIGH(insn_state[t], prev + 1);
> + return KEEP_EXPLORING;
> +}
> +
> +static int copy_insn_array(struct bpf_map *map, u32 start, u32 end, u32 *off)
> +{
> + struct bpf_insn_array_value *value;
> + u32 i;
> +
> + for (i = start; i <= end; i++) {
> + value = map->ops->map_lookup_elem(map, &i);
> + if (!value)
> + return -EINVAL;
> + off[i - start] = value->xlated_off;
> + }
> + return 0;
> +}
> +
> +static int cmp_ptr_to_u32(const void *a, const void *b)
> +{
> + return *(u32 *)a - *(u32 *)b;
> +}
This will overflow for e.g. `0 - 8`.
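E.g. an overflow-safe comparator could look like this (sketch):
    static int cmp_u32(const void *a, const void *b)
    {
            u32 x = *(const u32 *)a, y = *(const u32 *)b;
            if (x != y)
                    return x < y ? -1 : 1;
            return 0;
    }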
> +
> +static int sort_insn_array_uniq(u32 *off, int off_cnt)
> +{
> + int unique = 1;
> + int i;
> +
> + sort(off, off_cnt, sizeof(off[0]), cmp_ptr_to_u32, NULL);
> +
> + for (i = 1; i < off_cnt; i++)
> + if (off[i] != off[unique - 1])
> + off[unique++] = off[i];
> +
> + return unique;
> +}
> +
> +/*
> + * sort_unique({map[start], ..., map[end]}) into off
> + */
> +static int copy_insn_array_uniq(struct bpf_map *map, u32 start, u32 end, u32 *off)
> +{
> + u32 n = end - start + 1;
> + int err;
> +
> + err = copy_insn_array(map, start, end, off);
> + if (err)
> + return err;
> +
> + return sort_insn_array_uniq(off, n);
> +}
> +
> +/*
> + * Copy all unique offsets from the map
> + */
> +static int jt_from_map(struct bpf_map *map, struct jt *jt)
> +{
> + u32 *off;
> + int n;
> +
> + off = kvcalloc(map->max_entries, sizeof(u32), GFP_KERNEL_ACCOUNT);
> + if (!off)
> + return -ENOMEM;
> +
> + n = copy_insn_array_uniq(map, 0, map->max_entries - 1, off);
> + if (n < 0) {
> + kvfree(off);
> + return n;
> + }
> +
> + jt->off = off;
> + jt->off_cnt = n;
> + return 0;
> +}
> +
> +/*
> + * Find and collect all maps which fit in the subprog. Return the result as one
> + * combined jump table in jt->off (allocated with kvcalloc
> + */
> +static int jt_from_subprog(struct bpf_verifier_env *env,
> + int subprog_start,
> + int subprog_end,
> + struct jt *jt)
> +{
> + struct bpf_map *map;
> + struct jt jt_cur;
> + u32 *off;
> + int err;
> + int i;
> +
> + jt->off = NULL;
> + jt->off_cnt = 0;
> +
> + for (i = 0; i < env->insn_array_map_cnt; i++) {
> + /*
> + * TODO (when needed): collect only jump tables, not static keys
> + * or maps for indirect calls
> + */
> + map = env->insn_array_maps[i];
> +
> + err = jt_from_map(map, &jt_cur);
> + if (err) {
> + kvfree(jt->off);
> + return err;
> + }
> +
> + /*
> + * This is enough to check one element. The full table is
> + * checked to fit inside the subprog later in create_jt()
> + */
> + if (jt_cur.off[0] >= subprog_start && jt_cur.off[0] < subprog_end) {
This won't always catch cases when the insn array references offsets from
several subprograms. Also, is the one-subprogram limitation really necessary?
> + off = kvrealloc(jt->off, (jt->off_cnt + jt_cur.off_cnt) << 2, GFP_KERNEL_ACCOUNT);
> + if (!off) {
> + kvfree(jt_cur.off);
> + kvfree(jt->off);
> + return -ENOMEM;
> + }
> + memcpy(off + jt->off_cnt, jt_cur.off, jt_cur.off_cnt << 2);
> + jt->off = off;
> + jt->off_cnt += jt_cur.off_cnt;
> + }
> +
> + kvfree(jt_cur.off);
> + }
> +
> + if (jt->off == NULL) {
> + verbose(env, "no jump tables found for subprog starting at %u\n", subprog_start);
> + return -EINVAL;
> + }
> +
> + jt->off_cnt = sort_insn_array_uniq(jt->off, jt->off_cnt);
> + return 0;
> +}
> +
> +static int create_jt(int t, struct bpf_verifier_env *env, int fd, struct jt *jt)
> +{
> + static struct bpf_subprog_info *subprog;
> + int subprog_idx, subprog_start, subprog_end;
> + struct bpf_map *map;
> + int map_idx;
> + int ret;
> + int i;
> +
> + if (env->subprog_cnt == 0)
> + return -EFAULT;
> +
> + subprog_idx = find_containing_subprog_idx(env, t);
> + if (subprog_idx < 0) {
> + verbose(env, "can't find subprog containing instruction %d\n", t);
> + return -EFAULT;
> + }
> + subprog = &env->subprog_info[subprog_idx];
> + subprog_start = subprog->start;
> + subprog_end = (subprog + 1)->start;
> +
> + map_idx = add_used_map(env, fd);
Will this spam the log with bogus
"fd %d is not pointing to valid bpf_map\n" messages if gotox does not
specify fd?
> + if (map_idx >= 0) {
> + map = env->used_maps[map_idx];
> + if (map->map_type != BPF_MAP_TYPE_INSN_ARRAY) {
> + verbose(env, "map type %d in the gotox insn %d is incorrect\n",
> + map->map_type, t);
> + return -EINVAL;
> + }
> +
> + env->insn_aux_data[t].map_index = map_idx;
> +
> + ret = jt_from_map(map, jt);
> + if (ret)
> + return ret;
> + } else {
> + ret = jt_from_subprog(env, subprog_start, subprog_end, jt);
> + if (ret)
> + return ret;
> + }
> +
> + /* Check that the every element of the jump table fits within the given subprogram */
> + for (i = 0; i < jt->off_cnt; i++) {
> + if (jt->off[i] < subprog_start || jt->off[i] >= subprog_end) {
> + verbose(env, "jump table for insn %d points outside of the subprog [%u,%u]",
> + t, subprog_start, subprog_end);
> + return -EINVAL;
> + }
> + }
> +
> + return 0;
> +}
> +
> +/* "conditional jump with N edges" */
> +static int visit_goto_x_insn(int t, struct bpf_verifier_env *env, int fd)
> +{
> + struct jt *jt = &env->insn_aux_data[t].jt;
> + int ret;
> +
> + if (jt->off == NULL) {
> + ret = create_jt(t, env, fd, jt);
> + if (ret)
> + return ret;
> + }
> +
> + /*
> + * Mark jt as allocated. Otherwise, this is not possible to check if it
> + * was allocated or not in the code which frees memory (jt is a part of
> + * union)
> + */
> + env->insn_aux_data[t].jt_allocated = true;
> +
> + return push_goto_x_edge(t, env, jt);
> +}
> +
> /* Visits the instruction at index t and returns one of the following:
> * < 0 - an error occurred
> * DONE_EXPLORING - the instruction was fully explored
> @@ -17786,8 +18100,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
> return visit_func_call_insn(t, insns, env, insn->src_reg == BPF_PSEUDO_CALL);
>
> case BPF_JA:
> - if (BPF_SRC(insn->code) != BPF_K)
> - return -EINVAL;
> + if (BPF_SRC(insn->code) == BPF_X)
> + return visit_goto_x_insn(t, env, insn->imm);
>
> if (BPF_CLASS(insn->code) == BPF_JMP)
> off = insn->off;
[...]
> @@ -18679,6 +19000,10 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
> return regs_exact(rold, rcur, idmap) && rold->frameno == rcur->frameno;
> case PTR_TO_ARENA:
> return true;
> + case PTR_TO_INSN:
> + /* cur ⊆ old */
Out of curiosity: are unicode symbols allowed in kernel source code?
> + return (rcur->min_index >= rold->min_index &&
> + rcur->max_index <= rold->max_index);
> default:
> return regs_exact(rold, rcur, idmap);
> }
> @@ -19825,6 +20150,67 @@ static int process_bpf_exit_full(struct bpf_verifier_env *env,
> return PROCESS_BPF_EXIT;
> }
>
> +/* gotox *dst_reg */
> +static int check_indirect_jump(struct bpf_verifier_env *env, struct bpf_insn *insn)
> +{
> + struct bpf_verifier_state *other_branch;
> + struct bpf_reg_state *dst_reg;
> + struct bpf_map *map;
> + int err = 0;
> + u32 *xoff;
> + int n;
> + int i;
> +
> + dst_reg = reg_state(env, insn->dst_reg);
> + if (dst_reg->type != PTR_TO_INSN) {
> + verbose(env, "BPF_JA|BPF_X R%d has type %d, expected PTR_TO_INSN\n",
> + insn->dst_reg, dst_reg->type);
> + return -EINVAL;
> + }
> +
> + map = dst_reg->map_ptr;
> + if (!map)
> + return -EINVAL;
Is this a verifier bug or legit situation?
If it is a bug, maybe add a verifier_bug() here and return -EFAULT?
> +
> + if (map->map_type != BPF_MAP_TYPE_INSN_ARRAY)
> + return -EINVAL;
Same question here, ->type is already `PTR_TO_INSN`.
> +
> + if (dst_reg->max_index >= map->max_entries) {
> + verbose(env, "BPF_JA|BPF_X R%d is out of map boundaries: index=%u, max_index=%u\n",
> + insn->dst_reg, dst_reg->max_index, map->max_entries-1);
> + return -EINVAL;
> + }
> +
> + xoff = kvcalloc(dst_reg->max_index - dst_reg->min_index + 1, sizeof(u32), GFP_KERNEL_ACCOUNT);
> + if (!xoff)
> + return -ENOMEM;
> +
> + n = copy_insn_array_uniq(map, dst_reg->min_index, dst_reg->max_index, xoff);
Nit: I'd avoid this allocation and do a loop for(i = min_index; i <= max_index; i++),
with map->ops->map_lookup_elem(map, &i) (or a wrapper) inside it.
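Roughly like this (only sketching the shape, untested):
    u32 i;
    for (i = dst_reg->min_index; i <= dst_reg->max_index; i++) {
            struct bpf_insn_array_value *val = map->ops->map_lookup_elem(map, &i);
            if (!val)
                    return -EINVAL;
            /* push one branch per (deduplicated) val->xlated_off and
             * continue at the last one, as the current code does */
    }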
> + if (n < 0) {
> + err = n;
> + goto free_off;
> + }
> + if (n == 0) {
> + verbose(env, "register R%d doesn't point to any offset in map id=%d\n",
> + insn->dst_reg, map->id);
> + err = -EINVAL;
> + goto free_off;
> + }
> +
> + for (i = 0; i < n - 1; i++) {
> + other_branch = push_stack(env, xoff[i], env->insn_idx, false);
> + if (IS_ERR(other_branch)) {
> + err = PTR_ERR(other_branch);
> + goto free_off;
> + }
> + }
> + env->insn_idx = xoff[n-1];
> +
> +free_off:
> + kvfree(xoff);
> + return err;
> +}
> +
> static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
> {
> int err;
[...]
> @@ -20981,6 +21371,23 @@ static int bpf_adj_linfo_after_remove(struct bpf_verifier_env *env, u32 off,
> return 0;
> }
>
> +/*
> + * Clean up dynamically allocated fields of aux data for instructions [start, ..., end]
> + */
> +static void clear_insn_aux_data(struct bpf_insn_aux_data *aux_data, int start, int end)
Nit: switching this to (..., int start, int len) would simplify arithmetic at call sites.
> +{
> + int i;
> +
> + for (i = start; i <= end; i++) {
> + if (aux_data[i].jt_allocated) {
> + kvfree(aux_data[i].jt.off);
> + aux_data[i].jt.off = NULL;
> + aux_data[i].jt.off_cnt = 0;
> + aux_data[i].jt_allocated = false;
> + }
> + }
> +}
> +
> static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
> {
> struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
[...]
> @@ -24175,18 +24586,18 @@ static bool can_jump(struct bpf_insn *insn)
> return false;
> }
>
> -static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
> +static int insn_successors_regular(struct bpf_prog *prog, u32 insn_idx, u32 *succ)
> {
> - struct bpf_insn *insn = &prog->insnsi[idx];
> + struct bpf_insn *insn = &prog->insnsi[insn_idx];
> int i = 0, insn_sz;
> u32 dst;
>
> insn_sz = bpf_is_ldimm64(insn) ? 2 : 1;
> - if (can_fallthrough(insn) && idx + 1 < prog->len)
> - succ[i++] = idx + insn_sz;
> + if (can_fallthrough(insn) && insn_idx + 1 < prog->len)
> + succ[i++] = insn_idx + insn_sz;
>
> if (can_jump(insn)) {
> - dst = idx + jmp_offset(insn) + 1;
> + dst = insn_idx + jmp_offset(insn) + 1;
> if (i == 0 || succ[0] != dst)
> succ[i++] = dst;
> }
> @@ -24194,6 +24605,36 @@ static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
> return i;
> }
>
> +static int insn_successors_gotox(struct bpf_verifier_env *env,
> + struct bpf_prog *prog,
> + u32 insn_idx, u32 **succ)
> +{
> + struct jt *jt = &env->insn_aux_data[insn_idx].jt;
> +
> + if (WARN_ON_ONCE(!jt->off || !jt->off_cnt))
> + return -EFAULT;
> +
> + *succ = jt->off;
> + return jt->off_cnt;
> +}
> +
> +/*
> + * Fill in *succ[0],...,*succ[n-1] with successors. The default *succ
> + * pointer (of size 2) may be replaced with a custom one if more
> + * elements are required (i.e., an indirect jump).
> + */
> +static int insn_successors(struct bpf_verifier_env *env,
> + struct bpf_prog *prog,
> + u32 insn_idx, u32 **succ)
> +{
> + struct bpf_insn *insn = &prog->insnsi[insn_idx];
> +
> + if (unlikely(insn_is_gotox(insn)))
> + return insn_successors_gotox(env, prog, insn_idx, succ);
> +
> + return insn_successors_regular(prog, insn_idx, *succ);
> +}
> +
The `prog` parameter can be dropped, as it is accessible from `env`.
I don't like the `u32 **succ` part of this interface.
What about one of the following alternatives:
- u32 *insn_successors(struct bpf_verifier_env *env, u32 insn_idx)
and `u32 succ_buf[2]` added to bpf_verifier_env?
- int insn_successor(struct bpf_verifier_env *env, u32 insn_idx, u32 succ_num):
bool fallthrough = can_fallthrough(insn);
bool jump = can_jump(insn);
if (succ_num == 0) {
if (fallthrough)
return <next insn>
if (jump)
return <jump tgt>
} else if (succ_num == 1) {
if (fallthrough && jump)
return <jmp tgt>
} else if (is_gotox) {
return <lookup>
}
return -1;
?
> /* Each field is a register bitmask */
> struct insn_live_regs {
> u16 use; /* registers read by instruction */
> @@ -24387,11 +24828,17 @@ static int compute_live_registers(struct bpf_verifier_env *env)
Could you please extend `tools/testing/selftests/bpf/progs/compute_live_registers.c`
with test cases for gotox?
> int insn_idx = env->cfg.insn_postorder[i];
> struct insn_live_regs *live = &state[insn_idx];
> int succ_num;
> - u32 succ[2];
> + u32 _succ[2];
> + u32 *succ = &_succ[0];
> u16 new_out = 0;
> u16 new_in = 0;
>
> - succ_num = insn_successors(env->prog, insn_idx, succ);
> + succ_num = insn_successors(env, env->prog, insn_idx, &succ);
> + if (succ_num < 0) {
> + err = succ_num;
> + goto out;
> +
> + }
> for (int s = 0; s < succ_num; ++s)
> new_out |= state[succ[s]].in;
> new_in = (new_out & ~live->def) | live->use;
[...]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 05/11] bpf: support instructions arrays with constants blinding
2025-08-16 18:06 ` [PATCH v1 bpf-next 05/11] bpf: support instructions arrays with constants blinding Anton Protopopov
2025-08-17 5:50 ` kernel test robot
@ 2025-08-25 23:29 ` Eduard Zingerman
2025-08-27 9:20 ` Anton Protopopov
1 sibling, 1 reply; 38+ messages in thread
From: Eduard Zingerman @ 2025-08-25 23:29 UTC (permalink / raw)
To: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song
On Sat, 2025-08-16 at 18:06 +0000, Anton Protopopov wrote:
[...]
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 5d1650af899d..27e9c30ad6dc 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
[...]
> @@ -1544,6 +1562,7 @@ struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
> }
>
> clone->blinded = 1;
> + clone->len = insn_cnt;
Is this an old bug? Does it require a separate commit and a fixes tag?
> return clone;
> }
> #endif /* CONFIG_BPF_JIT */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index e1f7744e132b..863b7114866b 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
[...]
> @@ -21665,7 +21666,15 @@ static int jit_subprogs(struct bpf_verifier_env *env)
> func[i]->aux->might_sleep = env->subprog_info[i].might_sleep;
> if (!i)
> func[i]->aux->exception_boundary = env->seen_exception;
> +
> + /*
> + * To properly pass the absolute subprog start to jit
> + * all instruction adjustments should be accumulated
> + */
> + instructions_added -= func[i]->len;
> func[i] = bpf_int_jit_compile(func[i]);
> + instructions_added += func[i]->len;
> +
Nit: This -= / += pair is a bit hackish, maybe add a separate variable
to compute current delta?
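E.g. something like this (untested sketch):
    u32 len_before = func[i]->len;
    func[i] = bpf_int_jit_compile(func[i]);
    instructions_added += func[i]->len - len_before;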
> if (!func[i]->jited) {
> err = -ENOTSUPP;
> goto out_free;
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps
2025-08-16 18:06 ` [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps Anton Protopopov
2025-08-21 0:20 ` Andrii Nakryiko
@ 2025-08-26 0:06 ` Eduard Zingerman
2025-08-26 16:15 ` Anton Protopopov
1 sibling, 1 reply; 38+ messages in thread
From: Eduard Zingerman @ 2025-08-26 0:06 UTC (permalink / raw)
To: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song
On Sat, 2025-08-16 at 18:06 +0000, Anton Protopopov wrote:
[...]
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index fe4fc5438678..a5f04544c09c 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
[...]
> @@ -6101,6 +6124,60 @@ static void poison_kfunc_call(struct bpf_program *prog, int relo_idx,
> insn->imm = POISON_CALL_KFUNC_BASE + ext_idx;
> }
>
> +static int create_jt_map(struct bpf_object *obj, int off, int size, int adjust_off)
> +{
> + static union bpf_attr attr = {
> + .map_type = BPF_MAP_TYPE_INSN_ARRAY,
> + .key_size = 4,
> + .value_size = sizeof(struct bpf_insn_array_value),
> + .max_entries = 0,
> + };
> + struct bpf_insn_array_value val = {};
> + int map_fd;
> + int err;
> + __u32 i;
> + __u32 *jt;
> +
> + attr.max_entries = size / 8;
> +
> + map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
> + if (map_fd < 0)
> + return map_fd;
> +
> + jt = (__u32 *)(obj->efile.jumptables_data.d_buf + off);
^^^^^^^^^
Jump table entries are u64 now, e.g. see test case:
https://github.com/llvm/llvm-project/blob/39dc3c41e459e6c847c1e45e7e93c53aaf74c1de/llvm/test/CodeGen/BPF/jump_table_swith_stmt.ll#L68
[...]
> @@ -6389,36 +6481,58 @@ static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_progra
[...]
> static int
> bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main_prog,
> - struct bpf_program *subprog)
> + struct bpf_program *subprog)
> {
> - struct bpf_insn *insns;
> - size_t new_cnt;
> - int err;
> + struct bpf_insn *insns;
> + size_t new_cnt;
> + int err;
Could you please extract spaces vs tabs fix for this function as a separate commit?
Just to make diff easier to read.
[...]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 01/11] bpf: fix the return value of push_stack
2025-08-25 18:12 ` Eduard Zingerman
@ 2025-08-26 15:00 ` Anton Protopopov
0 siblings, 0 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-26 15:00 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Quentin Monnet, Yonghong Song
On 25/08/25 11:12AM, Eduard Zingerman wrote:
> On Sat, 2025-08-16 at 18:06 +0000, Anton Protopopov wrote:
>
> The change makes sense to me, please see a few comments below.
>
> [...]
>
> > @@ -2111,12 +2111,12 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
> > env->stack_size++;
> > err = copy_verifier_state(&elem->st, cur);
> > if (err)
> > - return NULL;
> > + return ERR_PTR(-ENOMEM);
> > elem->st.speculative |= speculative;
> > if (env->stack_size > BPF_COMPLEXITY_LIMIT_JMP_SEQ) {
> > verbose(env, "The sequence of %d jumps is too complex.\n",
> > env->stack_size);
> > - return NULL;
> > + return ERR_PTR(-EFAULT);
>
> Nit: this should be -E2BIG, I think.
I didn't want to change the set of return values. Agree that the
-E2BIG error looks better here, plus there's a corresponding verifier
message, so I will resend with -E2BIG.
Also agree with all your comments below, will address them in v2.
> > }
> > if (elem->st.parent) {
> > ++elem->st.parent->branches;
> > @@ -2912,7 +2912,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
> >
> > elem = kzalloc(sizeof(struct bpf_verifier_stack_elem), GFP_KERNEL_ACCOUNT);
> > if (!elem)
> > - return NULL;
> > + return ERR_PTR(-ENOMEM);
> >
> > elem->insn_idx = insn_idx;
> > elem->prev_insn_idx = prev_insn_idx;
> > @@ -2924,7 +2924,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
> > verbose(env,
> > "The sequence of %d jumps is too complex for async cb.\n",
> > env->stack_size);
> > - return NULL;
> > + return ERR_PTR(-EFAULT);
>
> (and here too)
>
> > }
> > /* Unlike push_stack() do not copy_verifier_state().
> > * The caller state doesn't matter.
>
> [...]
>
> > @@ -14217,7 +14217,7 @@ sanitize_speculative_path(struct bpf_verifier_env *env,
> > struct bpf_reg_state *regs;
> >
> > branch = push_stack(env, next_idx, curr_idx, true);
> > - if (branch && insn) {
> > + if (!IS_ERR(branch) && insn) {
>
> Note: branch returned by `sanitize_speculative_path` is never used.
> Maybe change the function to return `int` and do the regular
>
> err = sanitize_speculative_path()
> if (err)
> return err;
>
> thing here?
>
> > regs = branch->frame[branch->curframe]->regs;
> > if (BPF_SRC(insn->code) == BPF_K) {
> > mark_reg_unknown(env, regs, insn->dst_reg);
>
> [...]
>
> > @@ -16721,8 +16720,7 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> > * execution.
> > */
> > if (!env->bypass_spec_v1 &&
> > - !sanitize_speculative_path(env, insn, *insn_idx + 1,
> > - *insn_idx))
> > + IS_ERR(sanitize_speculative_path(env, insn, *insn_idx + 1, *insn_idx)))
> > return -EFAULT;
>
> I think the error code should be taken from the return value of the
> sanitize_speculative_path().
>
> > if (env->log.level & BPF_LOG_LEVEL)
> > print_insn_state(env, this_branch, this_branch->curframe);
> > @@ -16734,9 +16732,9 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> > * simulation under speculative execution.
> > */
> > if (!env->bypass_spec_v1 &&
> > - !sanitize_speculative_path(env, insn,
> > - *insn_idx + insn->off + 1,
> > - *insn_idx))
> > + IS_ERR(sanitize_speculative_path(env, insn,
> > + *insn_idx + insn->off + 1,
> > + *insn_idx)))
>
> Same here.
>
> > return -EFAULT;
> > if (env->log.level & BPF_LOG_LEVEL)
> > print_insn_state(env, this_branch, this_branch->curframe);
>
> [...]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 03/11] bpf, x86: add new map type: instructions array
2025-08-25 21:05 ` Eduard Zingerman
@ 2025-08-26 15:52 ` Anton Protopopov
2025-08-26 16:04 ` Eduard Zingerman
0 siblings, 1 reply; 38+ messages in thread
From: Anton Protopopov @ 2025-08-26 15:52 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Quentin Monnet, Yonghong Song
On 25/08/25 02:05PM, Eduard Zingerman wrote:
> On Sat, 2025-08-16 at 18:06 +0000, Anton Protopopov wrote:
>
> [...]
>
> > --- /dev/null
> > +++ b/kernel/bpf/bpf_insn_array.c
>
> [...]
>
> > +int bpf_insn_array_ready(struct bpf_map *map)
> > +{
> > + struct bpf_insn_array *insn_array = cast_insn_array(map);
> > + guard(mutex)(&insn_array->state_mutex);
> > + int i;
> > +
> > + for (i = 0; i < map->max_entries; i++) {
> > + if (insn_array->ptrs[i].user_value.xlated_off == INSN_DELETED)
> > + continue;
> > + if (!insn_array->ips[i]) {
> > + /*
> > + * Set the map free on failure; the program owning it
> > + * might be re-loaded with different log level
> > + */
> > + insn_array->state = INSN_ARRAY_STATE_FREE;
> > + return -EFAULT;
>
> This shouldn't happen, right?
> If so, maybe use verifier_bug here with some description?
> (and move bpf_insn_array_ready() call to verifier.c:bpf_check(),
> so that the log pointer is available).
Shouldn't happen. But, unfortunately, this can only be checked after
bpf_prog_select_runtime(), which is executed after bpf_check(). It might
be nice to allow the JIT/bpf_prog_select_runtime() to use the verifier
environment.
(Not 100% similar, but related to the blinding part of this series:
blinding happens as the very first step of every JIT (it was initially
implemented for x86, and then copy/pasted everywhere else). It might be
nice to move it to be one of the last stages of the verifier; then the
code is shared and env is available as well. For this series I had to add
a bit of custom code to support instruction arrays at this blinding stage.)
> > + }
> > + }
> > +
> > + insn_array->state = INSN_ARRAY_STATE_READY;
> > + return 0;
> > +}
>
> [...]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 03/11] bpf, x86: add new map type: instructions array
2025-08-26 15:52 ` Anton Protopopov
@ 2025-08-26 16:04 ` Eduard Zingerman
0 siblings, 0 replies; 38+ messages in thread
From: Eduard Zingerman @ 2025-08-26 16:04 UTC (permalink / raw)
To: Anton Protopopov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Quentin Monnet, Yonghong Song
On Tue, 2025-08-26 at 15:52 +0000, Anton Protopopov wrote:
> On 25/08/25 02:05PM, Eduard Zingerman wrote:
> > On Sat, 2025-08-16 at 18:06 +0000, Anton Protopopov wrote:
> >
> > [...]
> >
> > > --- /dev/null
> > > +++ b/kernel/bpf/bpf_insn_array.c
> >
> > [...]
> >
> > > +int bpf_insn_array_ready(struct bpf_map *map)
> > > +{
> > > + struct bpf_insn_array *insn_array = cast_insn_array(map);
> > > + guard(mutex)(&insn_array->state_mutex);
> > > + int i;
> > > +
> > > + for (i = 0; i < map->max_entries; i++) {
> > > + if (insn_array->ptrs[i].user_value.xlated_off == INSN_DELETED)
> > > + continue;
> > > + if (!insn_array->ips[i]) {
> > > + /*
> > > + * Set the map free on failure; the program owning it
> > > + * might be re-loaded with different log level
> > > + */
> > > + insn_array->state = INSN_ARRAY_STATE_FREE;
> > > + return -EFAULT;
> >
> > This shouldn't happen, right?
> > If so, maybe use verifier_bug here with some description?
> > (and move bpf_insn_array_ready() call to verifier.c:bpf_check(),
> > so that the log pointer is available).
>
> Shouldn't happen. But, unfortunately, this can only be checked after
> bpf_prog_select_runtime(), which is executed after bpf_check(). It might
> be nice to allow the JIT/bpf_prog_select_runtime() to use the verifier
> environment.
The insn_array->ips array is filled by bpf_jit_comp.c:do_jit() ->
bpf_insn_array.c:bpf_prog_update_insn_ptr().
My initial thinking was that do_jit() is invoked by the following chain:
verifier.c:bpf_check() -> fixup_call_args() -> jit_subprogs().
Hence it appeared possible to move the above check (the call to
bpf_insn_array_ready()) to bpf_check() itself.
However, looking at jit_subprogs() now I see:
...
if (env->subprog_cnt <= 1)
return 0;
...
<proceeds JITing all subprogs, including main>
So, it looks like the case when only the main subprogram is present is
special. Oh, well.
> (Not 100% similar, but related to the blinding part of this series:
> blinding happens as the very first step of every JIT (it was initially
> implemented for x86, and then copy/pasted everywhere else). It might be
> nice to move it to be one of the last stages of the verifier; then the
> code is shared and env is available as well. For this series I had to add
> a bit of custom code to support instruction arrays at this blinding stage.)
>
> > > + }
> > > + }
> > > +
> > > + insn_array->state = INSN_ARRAY_STATE_READY;
> > > + return 0;
> > > +}
> >
> > [...]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps
2025-08-26 0:06 ` Eduard Zingerman
@ 2025-08-26 16:15 ` Anton Protopopov
2025-08-26 16:51 ` Anton Protopopov
0 siblings, 1 reply; 38+ messages in thread
From: Anton Protopopov @ 2025-08-26 16:15 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Quentin Monnet, Yonghong Song
On 25/08/25 05:06PM, Eduard Zingerman wrote:
> On Sat, 2025-08-16 at 18:06 +0000, Anton Protopopov wrote:
>
> [...]
>
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index fe4fc5438678..a5f04544c09c 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
>
> [...]
>
> > @@ -6101,6 +6124,60 @@ static void poison_kfunc_call(struct bpf_program *prog, int relo_idx,
> > insn->imm = POISON_CALL_KFUNC_BASE + ext_idx;
> > }
> >
> > +static int create_jt_map(struct bpf_object *obj, int off, int size, int adjust_off)
> > +{
> > + static union bpf_attr attr = {
> > + .map_type = BPF_MAP_TYPE_INSN_ARRAY,
> > + .key_size = 4,
> > + .value_size = sizeof(struct bpf_insn_array_value),
> > + .max_entries = 0,
> > + };
> > + struct bpf_insn_array_value val = {};
> > + int map_fd;
> > + int err;
> > + __u32 i;
> > + __u32 *jt;
> > +
> > + attr.max_entries = size / 8;
> > +
> > + map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
> > + if (map_fd < 0)
> > + return map_fd;
> > +
> > + jt = (__u32 *)(obj->efile.jumptables_data.d_buf + off);
> ^^^^^^^^^
> Jump table entries are u64 now, e.g. see test case:
> https://github.com/llvm/llvm-project/blob/39dc3c41e459e6c847c1e45e7e93c53aaf74c1de/llvm/test/CodeGen/BPF/jump_table_swith_stmt.ll#L68
>
> [...]
Yes, thanks, I will change it to u64. (Just in case, it works now
because the code happens to work properly on little-endian: it uses
jt[2*i] for the i-th element.)
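I.e. roughly this is what v2 should do instead (a sketch; the point is
that the current __u32 indexing only works by accident on little-endian,
where jt[2*i] happens to read the low half of the i-th 8-byte entry):
    __u64 *jt64 = (__u64 *)(obj->efile.jumptables_data.d_buf + off);
    __u32 insn_off = (__u32)jt64[i];   /* instead of jt[2 * i] */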
> > @@ -6389,36 +6481,58 @@ static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_progra
>
> [...]
>
> > static int
> > bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main_prog,
> > - struct bpf_program *subprog)
> > + struct bpf_program *subprog)
> > {
> > - struct bpf_insn *insns;
> > - size_t new_cnt;
> > - int err;
> > + struct bpf_insn *insns;
> > + size_t new_cnt;
> > + int err;
>
> Could you please extract spaces vs tabs fix for this function as a separate commit?
> Just to make diff easier to read.
>
> [...]
Sure, sorry, I hadn't noticed it.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps
2025-08-26 16:51 ` Anton Protopopov
@ 2025-08-26 16:47 ` Eduard Zingerman
0 siblings, 0 replies; 38+ messages in thread
From: Eduard Zingerman @ 2025-08-26 16:47 UTC (permalink / raw)
To: Anton Protopopov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Quentin Monnet, Yonghong Song
On Tue, 2025-08-26 at 16:51 +0000, Anton Protopopov wrote:
[...]
> Just in case, this chunk is
>
> @@ -6418,6 +6524,17 @@ bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main
> err = append_subprog_relos(main_prog, subprog);
> if (err)
> return err;
> +
> + /* Save subprogram offsets */
> + if (main_prog->subprog_cnt == ARRAY_SIZE(main_prog->subprog_sec_off)) {
> + pr_warn("prog '%s': number of subprogs exceeds %zu\n",
> + main_prog->name, ARRAY_SIZE(main_prog->subprog_sec_off));
> + return -E2BIG;
> + }
> + main_prog->subprog_sec_off[main_prog->subprog_cnt] = subprog->sec_insn_off;
> + main_prog->subprog_off[main_prog->subprog_cnt] = subprog->sub_insn_off;
> + main_prog->subprog_cnt += 1;
> +
> return 0;
> }
>
> (In v2 it will either change to realloc vs. static allocation, or disappear.)
Thank you, I had filtered out the whitespace changes when reading the patch yesterday.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps
2025-08-26 16:15 ` Anton Protopopov
@ 2025-08-26 16:51 ` Anton Protopopov
2025-08-26 16:47 ` Eduard Zingerman
0 siblings, 1 reply; 38+ messages in thread
From: Anton Protopopov @ 2025-08-26 16:51 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Quentin Monnet, Yonghong Song
On 25/08/26 04:15PM, Anton Protopopov wrote:
> On 25/08/25 05:06PM, Eduard Zingerman wrote:
> > On Sat, 2025-08-16 at 18:06 +0000, Anton Protopopov wrote:
> >
> > [...]
> >
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index fe4fc5438678..a5f04544c09c 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> >
> > [...]
> >
> > > @@ -6101,6 +6124,60 @@ static void poison_kfunc_call(struct bpf_program *prog, int relo_idx,
> > > insn->imm = POISON_CALL_KFUNC_BASE + ext_idx;
> > > }
> > >
> > > +static int create_jt_map(struct bpf_object *obj, int off, int size, int adjust_off)
> > > +{
> > > + static union bpf_attr attr = {
> > > + .map_type = BPF_MAP_TYPE_INSN_ARRAY,
> > > + .key_size = 4,
> > > + .value_size = sizeof(struct bpf_insn_array_value),
> > > + .max_entries = 0,
> > > + };
> > > + struct bpf_insn_array_value val = {};
> > > + int map_fd;
> > > + int err;
> > > + __u32 i;
> > > + __u32 *jt;
> > > +
> > > + attr.max_entries = size / 8;
> > > +
> > > + map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
> > > + if (map_fd < 0)
> > > + return map_fd;
> > > +
> > > + jt = (__u32 *)(obj->efile.jumptables_data.d_buf + off);
> > ^^^^^^^^^
> > Jump table entries are u64 now, e.g. see test case:
> > https://github.com/llvm/llvm-project/blob/39dc3c41e459e6c847c1e45e7e93c53aaf74c1de/llvm/test/CodeGen/BPF/jump_table_swith_stmt.ll#L68
> >
> > [...]
>
> Yes, thanks, I will change it to u64. (Just in case, it works now
> because the code happens to work properly on little-endian: it uses
> jt[2*i] for the i-th element.)
>
> > > @@ -6389,36 +6481,58 @@ static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_progra
> >
> > [...]
> >
> > > static int
> > > bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main_prog,
> > > - struct bpf_program *subprog)
> > > + struct bpf_program *subprog)
> > > {
> > > - struct bpf_insn *insns;
> > > - size_t new_cnt;
> > > - int err;
> > > + struct bpf_insn *insns;
> > > + size_t new_cnt;
> > > + int err;
> >
> > Could you please extract spaces vs tabs fix for this function as a separate commit?
> > Just to make diff easier to read.
> >
> > [...]
>
> Sure, sorry, I hadn't noticed it.
Just in case, this chunk is
@@ -6418,6 +6524,17 @@ bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main
err = append_subprog_relos(main_prog, subprog);
if (err)
return err;
+
+ /* Save subprogram offsets */
+ if (main_prog->subprog_cnt == ARRAY_SIZE(main_prog->subprog_sec_off)) {
+ pr_warn("prog '%s': number of subprogs exceeds %zu\n",
+ main_prog->name, ARRAY_SIZE(main_prog->subprog_sec_off));
+ return -E2BIG;
+ }
+ main_prog->subprog_sec_off[main_prog->subprog_cnt] = subprog->sec_insn_off;
+ main_prog->subprog_off[main_prog->subprog_cnt] = subprog->sub_insn_off;
+ main_prog->subprog_cnt += 1;
+
return 0;
}
(In v2 it will either change to realloc vs. static allocation, or disappear.)
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 05/11] bpf: support instructions arrays with constants blinding
2025-08-25 23:29 ` Eduard Zingerman
@ 2025-08-27 9:20 ` Anton Protopopov
0 siblings, 0 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-27 9:20 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Quentin Monnet, Yonghong Song
On 25/08/25 04:29PM, Eduard Zingerman wrote:
> On Sat, 2025-08-16 at 18:06 +0000, Anton Protopopov wrote:
>
> [...]
>
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 5d1650af899d..27e9c30ad6dc 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
>
> [...]
>
> > @@ -1544,6 +1562,7 @@ struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
> > }
> >
> > clone->blinded = 1;
> > + clone->len = insn_cnt;
>
> Is this an old bug? Does it require a separate commit and a fixes tag?
Turns out this change is actually not needed, as the
bpf_patch_insn_single() function sets the len properly.
> > return clone;
> > }
> > #endif /* CONFIG_BPF_JIT */
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index e1f7744e132b..863b7114866b 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
>
> [...]
>
> > @@ -21665,7 +21666,15 @@ static int jit_subprogs(struct bpf_verifier_env *env)
> > func[i]->aux->might_sleep = env->subprog_info[i].might_sleep;
> > if (!i)
> > func[i]->aux->exception_boundary = env->seen_exception;
> > +
> > + /*
> > + * To properly pass the absolute subprog start to jit
> > + * all instruction adjustments should be accumulated
> > + */
> > + instructions_added -= func[i]->len;
> > func[i] = bpf_int_jit_compile(func[i]);
> > + instructions_added += func[i]->len;
> > +
>
> Nit: This -= / += pair is a bit hackish, maybe add a separate variable
> to compute current delta?
Sure, I've rewritten this piece.
> > if (!func[i]->jited) {
> > err = -ENOTSUPP;
> > goto out_free;
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps
2025-08-25 23:15 ` Eduard Zingerman
@ 2025-08-27 15:34 ` Anton Protopopov
2025-08-27 18:58 ` Eduard Zingerman
2025-08-28 9:58 ` Anton Protopopov
1 sibling, 1 reply; 38+ messages in thread
From: Anton Protopopov @ 2025-08-27 15:34 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Quentin Monnet, Yonghong Song
On 25/08/25 04:15PM, Eduard Zingerman wrote:
> On Sat, 2025-08-16 at 18:06 +0000, Anton Protopopov wrote:
>
> > Add support for a new instruction
> >
> > BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0[, imm=fd(M)]
> ^^^^^^^^^^^^^
> Do we really need to support this now?
>
> [...]
Maybe not, as libbpf always sets it to 0 in any case. I will remove it for now.
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 4bfb4faab4d7..f419a89b0147 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -671,9 +671,11 @@ static void __emit_indirect_jump(u8 **pprog, int reg, bool ereg)
> > *pprog = prog;
> > }
> >
> > -static void emit_indirect_jump(u8 **pprog, int reg, bool ereg, u8 *ip)
> > +static void emit_indirect_jump(u8 **pprog, int bpf_reg, u8 *ip)
>
> Nit: maybe make this change a part of the previous patch?
Done.
> > {
> > u8 *prog = *pprog;
> > + int reg = reg2hex[bpf_reg];
> > + bool ereg = is_ereg(bpf_reg);
> >
> > if (cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) {
> > OPTIMIZER_HIDE_VAR(reg);
>
> [...]
>
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index aca43c284203..6e68e0082c81 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -77,7 +77,15 @@ struct bpf_reg_state {
> > * the map_uid is non-zero for registers
> > * pointing to inner maps.
> > */
> > - u32 map_uid;
> > + union {
> > + u32 map_uid;
> > +
> > + /* Used to track boundaries of a PTR_TO_INSN */
> > + struct {
> > + u32 min_index;
> > + u32 max_index;
>
> Could you please elaborate why these fields are necessary?
> It appears that .var_off/.{s,u}{32_,}{min,max}_value fields can be
> used to track current index bounds (min/max fields for bounds,
> .var_off field to check 8-byte alignment).
I thought it is more readable (and it doesn't waste memory anymore).
The fields clearly say "pointer X was loaded from an instruction pointer
map M and can point to any of {M[min_index], ..., M[max_index]}".
Those indexes come from off_reg, not ptr_reg. In order to use
ptr_reg->u_min/u_max instead, more checks would have to be added (like
those in BPF_ADD for min/max_index) to check that the register doesn't
point outside of M->ips. I am not sure this would be easier to read.
Also, PTR_TO_INSN is created by dereferencing the address, and right
now it looks easier to just copy min/max_index. As I understand it,
normally such a register is set to ips[var_off] and marked as unknown,
so there would be additional code to use u_min/u_max to keep track of
the boundaries.
Or do you think that approach would still be clearer?
I will try to look into this again in the morning.
> > + };
> > + };
> > };
> >
> > /* for PTR_TO_BTF_ID */
> > @@ -542,6 +550,11 @@ struct bpf_insn_aux_data {
> > struct {
> > u32 map_index; /* index into used_maps[] */
> > u32 map_off; /* offset from value base address */
> > +
> > + struct jt { /* jump table for gotox instruction */
> ^^
> should this be anonymous or have a `bpf_` prefix?
I will add the bpf_ prefix; it is used as a parameter in a few functions.
>
> > + u32 *off;
> > + int off_cnt;
> > + } jt;
> > };
> > struct {
> > enum bpf_reg_type reg_type; /* type of pseudo_btf_id */
>
> [...]
>
> > diff --git a/kernel/bpf/bpf_insn_array.c b/kernel/bpf/bpf_insn_array.c
> > index 0c8dac62f457..d077a5aa2c7c 100644
> > --- a/kernel/bpf/bpf_insn_array.c
> > +++ b/kernel/bpf/bpf_insn_array.c
> > @@ -1,7 +1,6 @@
> > // SPDX-License-Identifier: GPL-2.0-only
> >
> > #include <linux/bpf.h>
> > -#include <linux/sort.h>
> >
> > #define MAX_INSN_ARRAY_ENTRIES 256
> >
> > @@ -173,6 +172,20 @@ static u64 insn_array_mem_usage(const struct bpf_map *map)
> > return insn_array_alloc_size(map->max_entries) + extra_size;
> > }
> >
> > +static int insn_array_map_direct_value_addr(const struct bpf_map *map, u64 *imm, u32 off)
> > +{
> > + struct bpf_insn_array *insn_array = cast_insn_array(map);
> > +
> > + if ((off % sizeof(long)) != 0 ||
> > + (off / sizeof(long)) >= map->max_entries)
> > + return -EINVAL;
> > +
> > + /* from BPF's point of view, this map is a jump table */
> > + *imm = (unsigned long)insn_array->ips + off / sizeof(long);
> > +
> > + return 0;
> > +}
> > +
>
> This function is called during main verification pass by
> verifier.c:check_mem_access() -> verifier.c:bpf_map_direct_read().
> However, insn_array->ips is filled by bpf_jit_comp.c:do_jit()
> bpf_insn_array.c:bpf_prog_update_insn_ptr(), which is called *after*
> main verification pass. Do I miss something, or this can't work?
This gets an address &ips[off], not the address of the bpf program.
At this moment ips[off] contains garbage. Later, when
bpf_prog_update_insn_ptr() is executed, ips[off] is populated with
the real IP. The running program then reads it by dereferencing the
[correct at this time] address, i.e., *(&ips[off]).
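To make the ordering concrete, here is a tiny userspace model of the same
idea (purely illustrative, not the kernel code path; the array name just
mirrors insn_array->ips):

    #include <stdio.h>

    static long ips[4];                 /* stands in for insn_array->ips */

    int main(void)
    {
        long *slot = &ips[2];           /* "verification time": only &ips[off] is recorded */

        ips[2] = 0x401000;              /* "JIT time": bpf_prog_update_insn_ptr() fills the slot */

        printf("runtime loads %#lx\n", *slot);  /* "run time": *(&ips[off]) now yields the real IP */
        return 0;
    }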
> > BTF_ID_LIST_SINGLE(insn_array_btf_ids, struct, bpf_insn_array)
> >
> > const struct bpf_map_ops insn_array_map_ops = {
>
> [...]
>
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 863b7114866b..c2cfa55913f8 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
>
> [...]
>
> > @@ -6072,6 +6084,14 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> > return 0;
> > }
> >
> > +static u32 map_mem_size(const struct bpf_map *map)
>
> Nit: It is a bit non-obvious why this function returns the size of a
> single value for all map types except insn array. Maybe add a
> comment here, something like:
>
> Return the size of the memory region accessible from a pointer
> to map value. For INSN_ARRAY maps whole bpf_insn_array->ips
> array is accessible.
Thanks, will add.
> > +{
> > + if (map->map_type == BPF_MAP_TYPE_INSN_ARRAY)
> > + return map->max_entries * sizeof(long);
> ^^^^^^^^^^^^
> Nit: sizeof_field(struct bpf_insn_array, ips) ?
ok, ack
[No comments below, I will reply to those tomorrow.
And thanks a lot for your reviews!]
> > +
> > + return map->value_size;
> > +}
> > +
> > /* check read/write into a map element with possible variable offset */
> > static int check_map_access(struct bpf_verifier_env *env, u32 regno,
> > int off, int size, bool zero_size_allowed,
>
> [...]
>
> > @@ -7820,6 +7849,13 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
> > allow_trust_mismatch);
> > err = err ?: reg_bounds_sanity_check(env, ®s[insn->dst_reg], ctx);
> >
> > + if (map_ptr_copy) {
> > + regs[insn->dst_reg].type = PTR_TO_INSN;
> > + regs[insn->dst_reg].map_ptr = map_ptr_copy;
> > + regs[insn->dst_reg].min_index = regs[insn->src_reg].min_index;
> > + regs[insn->dst_reg].max_index = regs[insn->src_reg].max_index;
> > + }
> > +
>
> I think this should be handled inside check_mem_access(), see case for
> reg->type == PTR_TO_MAP_VALUE.
>
> > return err;
> > }
> >
>
> [...]
>
> > @@ -14554,6 +14592,36 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
> >
> > switch (opcode) {
> > case BPF_ADD:
> > + if (ptr_to_insn_array) {
> > + u32 min_index = dst_reg->min_index;
> > + u32 max_index = dst_reg->max_index;
> > +
> > + if ((umin_val + ptr_reg->off) > (u64) U32_MAX * sizeof(long)) {
> > + verbose(env, "umin_value %llu + offset %u is too big to convert to index\n",
> > + umin_val, ptr_reg->off);
> > + return -EACCES;
> > + }
> > + if ((umax_val + ptr_reg->off) > (u64) U32_MAX * sizeof(long)) {
> > + verbose(env, "umax_value %llu + offset %u is too big to convert to index\n",
> > + umax_val, ptr_reg->off);
> > + return -EACCES;
> > + }
> > +
> > + min_index += (umin_val + ptr_reg->off) / sizeof(long);
> > + max_index += (umax_val + ptr_reg->off) / sizeof(long);
> > +
> > + if (min_index >= ptr_reg->map_ptr->max_entries) {
> > + verbose(env, "min_index %u points to outside of map\n", min_index);
> > + return -EACCES;
> > + }
> > + if (max_index >= ptr_reg->map_ptr->max_entries) {
> > + verbose(env, "max_index %u points to outside of map\n", max_index);
> > + return -EACCES;
> > + }
> > +
> > + dst_reg->min_index = min_index;
> > + dst_reg->max_index = max_index;
> > + }
>
> I think this and the following hunk would disappear if {min,max}_index
> are replaced by regular offset tracking mechanics.
>
> > /* We can take a fixed offset as long as it doesn't overflow
> > * the s32 'off' field
> > */
> > @@ -14598,6 +14666,11 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
> > }
> > break;
> > case BPF_SUB:
> > + if (ptr_to_insn_array) {
> > + verbose(env, "Operation %s on ptr to instruction set map is prohibited\n",
> > + bpf_alu_string[opcode >> 4]);
> > + return -EACCES;
> > + }
> > if (dst_reg == off_reg) {
> > /* scalar -= pointer. Creates an unknown scalar */
> > verbose(env, "R%d tried to subtract pointer from scalar\n",
> > @@ -16943,7 +17016,8 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > }
> > dst_reg->type = PTR_TO_MAP_VALUE;
> > dst_reg->off = aux->map_off;
> > - WARN_ON_ONCE(map->max_entries != 1);
> > + WARN_ON_ONCE(map->map_type != BPF_MAP_TYPE_INSN_ARRAY &&
> > + map->max_entries != 1);
>
> Q: when is this necessary?
>
> > /* We want reg->id to be same (0) as map_value is not distinct */
> > } else if (insn->src_reg == BPF_PSEUDO_MAP_FD ||
> > insn->src_reg == BPF_PSEUDO_MAP_IDX) {
> > @@ -17696,6 +17770,246 @@ static int mark_fastcall_patterns(struct bpf_verifier_env *env)
> > return 0;
> > }
> >
> > +#define SET_HIGH(STATE, LAST) STATE = (STATE & 0xffffU) | ((LAST) << 16)
> > +#define GET_HIGH(STATE) ((u16)((STATE) >> 16))
> > +
> > +static int push_goto_x_edge(int t, struct bpf_verifier_env *env, struct jt *jt)
>
> I think check_cfg() can be refactored to use insn_successors().
> In such a case it won't be necessary to special case gotox processing
> (appart from insn_aux->jt allocation).
>
> > +{
> > + int *insn_stack = env->cfg.insn_stack;
> > + int *insn_state = env->cfg.insn_state;
> > + u16 prev;
> > + int w;
> > +
> > + for (prev = GET_HIGH(insn_state[t]); prev < jt->off_cnt; prev++) {
> > + w = jt->off[prev];
> > +
> > + /* EXPLORED || DISCOVERED */
> > + if (insn_state[w])
> > + continue;
> > +
> > + break;
> > + }
> > +
> > + if (prev == jt->off_cnt)
> > + return DONE_EXPLORING;
> > +
> > + mark_prune_point(env, t);
> > +
> > + if (env->cfg.cur_stack >= env->prog->len)
> > + return -E2BIG;
> > + insn_stack[env->cfg.cur_stack++] = w;
> > +
> > + mark_jmp_point(env, w);
> > +
> > + SET_HIGH(insn_state[t], prev + 1);
> > + return KEEP_EXPLORING;
> > +}
> > +
> > +static int copy_insn_array(struct bpf_map *map, u32 start, u32 end, u32 *off)
> > +{
> > + struct bpf_insn_array_value *value;
> > + u32 i;
> > +
> > + for (i = start; i <= end; i++) {
> > + value = map->ops->map_lookup_elem(map, &i);
> > + if (!value)
> > + return -EINVAL;
> > + off[i - start] = value->xlated_off;
> > + }
> > + return 0;
> > +}
> > +
> > +static int cmp_ptr_to_u32(const void *a, const void *b)
> > +{
> > + return *(u32 *)a - *(u32 *)b;
> > +}
>
> This will overflow for e.g. `0 - 8`.
>
> > +
> > +static int sort_insn_array_uniq(u32 *off, int off_cnt)
> > +{
> > + int unique = 1;
> > + int i;
> > +
> > + sort(off, off_cnt, sizeof(off[0]), cmp_ptr_to_u32, NULL);
> > +
> > + for (i = 1; i < off_cnt; i++)
> > + if (off[i] != off[unique - 1])
> > + off[unique++] = off[i];
> > +
> > + return unique;
> > +}
> > +
> > +/*
> > + * sort_unique({map[start], ..., map[end]}) into off
> > + */
> > +static int copy_insn_array_uniq(struct bpf_map *map, u32 start, u32 end, u32 *off)
> > +{
> > + u32 n = end - start + 1;
> > + int err;
> > +
> > + err = copy_insn_array(map, start, end, off);
> > + if (err)
> > + return err;
> > +
> > + return sort_insn_array_uniq(off, n);
> > +}
> > +
> > +/*
> > + * Copy all unique offsets from the map
> > + */
> > +static int jt_from_map(struct bpf_map *map, struct jt *jt)
> > +{
> > + u32 *off;
> > + int n;
> > +
> > + off = kvcalloc(map->max_entries, sizeof(u32), GFP_KERNEL_ACCOUNT);
> > + if (!off)
> > + return -ENOMEM;
> > +
> > + n = copy_insn_array_uniq(map, 0, map->max_entries - 1, off);
> > + if (n < 0) {
> > + kvfree(off);
> > + return n;
> > + }
> > +
> > + jt->off = off;
> > + jt->off_cnt = n;
> > + return 0;
> > +}
> > +
> > +/*
> > + * Find and collect all maps which fit in the subprog. Return the result as one
> > + * combined jump table in jt->off (allocated with kvcalloc
> > + */
> > +static int jt_from_subprog(struct bpf_verifier_env *env,
> > + int subprog_start,
> > + int subprog_end,
> > + struct jt *jt)
> > +{
> > + struct bpf_map *map;
> > + struct jt jt_cur;
> > + u32 *off;
> > + int err;
> > + int i;
> > +
> > + jt->off = NULL;
> > + jt->off_cnt = 0;
> > +
> > + for (i = 0; i < env->insn_array_map_cnt; i++) {
> > + /*
> > + * TODO (when needed): collect only jump tables, not static keys
> > + * or maps for indirect calls
> > + */
> > + map = env->insn_array_maps[i];
> > +
> > + err = jt_from_map(map, &jt_cur);
> > + if (err) {
> > + kvfree(jt->off);
> > + return err;
> > + }
> > +
> > + /*
> > + * This is enough to check one element. The full table is
> > + * checked to fit inside the subprog later in create_jt()
> > + */
> > + if (jt_cur.off[0] >= subprog_start && jt_cur.off[0] < subprog_end) {
>
> This won't always catch cases when insn array references offsets from
> several subprograms. Also is one subprogram limitation really necessary?
>
> > + off = kvrealloc(jt->off, (jt->off_cnt + jt_cur.off_cnt) << 2, GFP_KERNEL_ACCOUNT);
> > + if (!off) {
> > + kvfree(jt_cur.off);
> > + kvfree(jt->off);
> > + return -ENOMEM;
> > + }
> > + memcpy(off + jt->off_cnt, jt_cur.off, jt_cur.off_cnt << 2);
> > + jt->off = off;
> > + jt->off_cnt += jt_cur.off_cnt;
> > + }
> > +
> > + kvfree(jt_cur.off);
> > + }
> > +
> > + if (jt->off == NULL) {
> > + verbose(env, "no jump tables found for subprog starting at %u\n", subprog_start);
> > + return -EINVAL;
> > + }
> > +
> > + jt->off_cnt = sort_insn_array_uniq(jt->off, jt->off_cnt);
> > + return 0;
> > +}
> > +
> > +static int create_jt(int t, struct bpf_verifier_env *env, int fd, struct jt *jt)
> > +{
> > + static struct bpf_subprog_info *subprog;
> > + int subprog_idx, subprog_start, subprog_end;
> > + struct bpf_map *map;
> > + int map_idx;
> > + int ret;
> > + int i;
> > +
> > + if (env->subprog_cnt == 0)
> > + return -EFAULT;
> > +
> > + subprog_idx = find_containing_subprog_idx(env, t);
> > + if (subprog_idx < 0) {
> > + verbose(env, "can't find subprog containing instruction %d\n", t);
> > + return -EFAULT;
> > + }
> > + subprog = &env->subprog_info[subprog_idx];
> > + subprog_start = subprog->start;
> > + subprog_end = (subprog + 1)->start;
> > +
> > + map_idx = add_used_map(env, fd);
>
> Will this spam the log with bogus
> "fd %d is not pointing to valid bpf_map\n" messages if gotox does not
> specify fd?
>
> > + if (map_idx >= 0) {
> > + map = env->used_maps[map_idx];
> > + if (map->map_type != BPF_MAP_TYPE_INSN_ARRAY) {
> > + verbose(env, "map type %d in the gotox insn %d is incorrect\n",
> > + map->map_type, t);
> > + return -EINVAL;
> > + }
> > +
> > + env->insn_aux_data[t].map_index = map_idx;
> > +
> > + ret = jt_from_map(map, jt);
> > + if (ret)
> > + return ret;
> > + } else {
> > + ret = jt_from_subprog(env, subprog_start, subprog_end, jt);
> > + if (ret)
> > + return ret;
> > + }
> > +
> > + /* Check that the every element of the jump table fits within the given subprogram */
> > + for (i = 0; i < jt->off_cnt; i++) {
> > + if (jt->off[i] < subprog_start || jt->off[i] >= subprog_end) {
> > + verbose(env, "jump table for insn %d points outside of the subprog [%u,%u]",
> > + t, subprog_start, subprog_end);
> > + return -EINVAL;
> > + }
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +/* "conditional jump with N edges" */
> > +static int visit_goto_x_insn(int t, struct bpf_verifier_env *env, int fd)
> > +{
> > + struct jt *jt = &env->insn_aux_data[t].jt;
> > + int ret;
> > +
> > + if (jt->off == NULL) {
> > + ret = create_jt(t, env, fd, jt);
> > + if (ret)
> > + return ret;
> > + }
> > +
> > + /*
> > + * Mark jt as allocated. Otherwise, this is not possible to check if it
> > + * was allocated or not in the code which frees memory (jt is a part of
> > + * union)
> > + */
> > + env->insn_aux_data[t].jt_allocated = true;
> > +
> > + return push_goto_x_edge(t, env, jt);
> > +}
> > +
> > /* Visits the instruction at index t and returns one of the following:
> > * < 0 - an error occurred
> > * DONE_EXPLORING - the instruction was fully explored
> > @@ -17786,8 +18100,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
> > return visit_func_call_insn(t, insns, env, insn->src_reg == BPF_PSEUDO_CALL);
> >
> > case BPF_JA:
> > - if (BPF_SRC(insn->code) != BPF_K)
> > - return -EINVAL;
> > + if (BPF_SRC(insn->code) == BPF_X)
> > + return visit_goto_x_insn(t, env, insn->imm);
> >
> > if (BPF_CLASS(insn->code) == BPF_JMP)
> > off = insn->off;
>
> [...]
>
> > @@ -18679,6 +19000,10 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
> > return regs_exact(rold, rcur, idmap) && rold->frameno == rcur->frameno;
> > case PTR_TO_ARENA:
> > return true;
> > + case PTR_TO_INSN:
> > + /* cur ⊆ old */
>
> Out of curiosity: are unicode symbols allowed in kernel source code?
>
> > + return (rcur->min_index >= rold->min_index &&
> > + rcur->max_index <= rold->max_index);
> > default:
> > return regs_exact(rold, rcur, idmap);
> > }
> > @@ -19825,6 +20150,67 @@ static int process_bpf_exit_full(struct bpf_verifier_env *env,
> > return PROCESS_BPF_EXIT;
> > }
> >
> > +/* gotox *dst_reg */
> > +static int check_indirect_jump(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > +{
> > + struct bpf_verifier_state *other_branch;
> > + struct bpf_reg_state *dst_reg;
> > + struct bpf_map *map;
> > + int err = 0;
> > + u32 *xoff;
> > + int n;
> > + int i;
> > +
> > + dst_reg = reg_state(env, insn->dst_reg);
> > + if (dst_reg->type != PTR_TO_INSN) {
> > + verbose(env, "BPF_JA|BPF_X R%d has type %d, expected PTR_TO_INSN\n",
> > + insn->dst_reg, dst_reg->type);
> > + return -EINVAL;
> > + }
> > +
> > + map = dst_reg->map_ptr;
> > + if (!map)
> > + return -EINVAL;
>
> Is this a verifier bug or legit situation?
> If it is a bug, maybe add a verifier_bug() here and return -EFAULT?
>
> > +
> > + if (map->map_type != BPF_MAP_TYPE_INSN_ARRAY)
> > + return -EINVAL;
>
> Same question here, ->type is already `PTR_TO_INSN`.
>
> > +
> > + if (dst_reg->max_index >= map->max_entries) {
> > + verbose(env, "BPF_JA|BPF_X R%d is out of map boundaries: index=%u, max_index=%u\n",
> > + insn->dst_reg, dst_reg->max_index, map->max_entries-1);
> > + return -EINVAL;
> > + }
> > +
> > + xoff = kvcalloc(dst_reg->max_index - dst_reg->min_index + 1, sizeof(u32), GFP_KERNEL_ACCOUNT);
> > + if (!xoff)
> > + return -ENOMEM;
> > +
> > + n = copy_insn_array_uniq(map, dst_reg->min_index, dst_reg->max_index, xoff);
>
> Nit: I'd avoid this allocation and do a loop for(i = min_index; i <= max_index; i++),
> with map->ops->map_lookup_elem(map, &i) (or a wrapper) inside it.
>
> > + if (n < 0) {
> > + err = n;
> > + goto free_off;
> > + }
> > + if (n == 0) {
> > + verbose(env, "register R%d doesn't point to any offset in map id=%d\n",
> > + insn->dst_reg, map->id);
> > + err = -EINVAL;
> > + goto free_off;
> > + }
> > +
> > + for (i = 0; i < n - 1; i++) {
> > + other_branch = push_stack(env, xoff[i], env->insn_idx, false);
> > + if (IS_ERR(other_branch)) {
> > + err = PTR_ERR(other_branch);
> > + goto free_off;
> > + }
> > + }
> > + env->insn_idx = xoff[n-1];
> > +
> > +free_off:
> > + kvfree(xoff);
> > + return err;
> > +}
> > +
> > static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
> > {
> > int err;
>
> [...]
>
> > @@ -20981,6 +21371,23 @@ static int bpf_adj_linfo_after_remove(struct bpf_verifier_env *env, u32 off,
> > return 0;
> > }
> >
> > +/*
> > + * Clean up dynamically allocated fields of aux data for instructions [start, ..., end]
> > + */
> > +static void clear_insn_aux_data(struct bpf_insn_aux_data *aux_data, int start, int end)
>
> Nit: switching this to (..., int start, int len) would simplify arithmetic at call sites.
>
> > +{
> > + int i;
> > +
> > + for (i = start; i <= end; i++) {
> > + if (aux_data[i].jt_allocated) {
> > + kvfree(aux_data[i].jt.off);
> > + aux_data[i].jt.off = NULL;
> > + aux_data[i].jt.off_cnt = 0;
> > + aux_data[i].jt_allocated = false;
> > + }
> > + }
> > +}
> > +
> > static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
> > {
> > struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
>
> [...]
>
> > @@ -24175,18 +24586,18 @@ static bool can_jump(struct bpf_insn *insn)
> > return false;
> > }
> >
> > -static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
> > +static int insn_successors_regular(struct bpf_prog *prog, u32 insn_idx, u32 *succ)
> > {
> > - struct bpf_insn *insn = &prog->insnsi[idx];
> > + struct bpf_insn *insn = &prog->insnsi[insn_idx];
> > int i = 0, insn_sz;
> > u32 dst;
> >
> > insn_sz = bpf_is_ldimm64(insn) ? 2 : 1;
> > - if (can_fallthrough(insn) && idx + 1 < prog->len)
> > - succ[i++] = idx + insn_sz;
> > + if (can_fallthrough(insn) && insn_idx + 1 < prog->len)
> > + succ[i++] = insn_idx + insn_sz;
> >
> > if (can_jump(insn)) {
> > - dst = idx + jmp_offset(insn) + 1;
> > + dst = insn_idx + jmp_offset(insn) + 1;
> > if (i == 0 || succ[0] != dst)
> > succ[i++] = dst;
> > }
> > @@ -24194,6 +24605,36 @@ static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
> > return i;
> > }
> >
> > +static int insn_successors_gotox(struct bpf_verifier_env *env,
> > + struct bpf_prog *prog,
> > + u32 insn_idx, u32 **succ)
> > +{
> > + struct jt *jt = &env->insn_aux_data[insn_idx].jt;
> > +
> > + if (WARN_ON_ONCE(!jt->off || !jt->off_cnt))
> > + return -EFAULT;
> > +
> > + *succ = jt->off;
> > + return jt->off_cnt;
> > +}
> > +
> > +/*
> > + * Fill in *succ[0],...,*succ[n-1] with successors. The default *succ
> > + * pointer (of size 2) may be replaced with a custom one if more
> > + * elements are required (i.e., an indirect jump).
> > + */
> > +static int insn_successors(struct bpf_verifier_env *env,
> > + struct bpf_prog *prog,
> > + u32 insn_idx, u32 **succ)
> > +{
> > + struct bpf_insn *insn = &prog->insnsi[insn_idx];
> > +
> > + if (unlikely(insn_is_gotox(insn)))
> > + return insn_successors_gotox(env, prog, insn_idx, succ);
> > +
> > + return insn_successors_regular(prog, insn_idx, *succ);
> > +}
> > +
>
> The `prog` parameter can be dropped, as it is accessible from `env`.
> I don't like the `u32 **succ` part of this interface.
> What about one of the following alternatives:
>
> - u32 *insn_successors(struct bpf_verifier_env *env, u32 insn_idx)
> and `u32 succ_buf[2]` added to bpf_verifier_env?
>
> - int insn_successor(struct bpf_verifier_env *env, u32 insn_idx, u32 succ_num):
> bool fallthrough = can_fallthrough(insn);
> bool jump = can_jump(insn);
> if (succ_num == 0) {
> if (fallthrough)
> return <next insn>
> if (jump)
> return <jump tgt>
> } else if (succ_num == 1) {
> if (fallthrough && jump)
> return <jmp tgt>
> } else if (is_gotox) {
> return <lookup>
> }
> return -1;
>
> ?
>
> > /* Each field is a register bitmask */
> > struct insn_live_regs {
> > u16 use; /* registers read by instruction */
> > @@ -24387,11 +24828,17 @@ static int compute_live_registers(struct bpf_verifier_env *env)
>
> Could you please extend `tools/testing/selftests/bpf/progs/compute_live_registers.c`
> with test cases for gotox?
>
> > int insn_idx = env->cfg.insn_postorder[i];
> > struct insn_live_regs *live = &state[insn_idx];
> > int succ_num;
> > - u32 succ[2];
> > + u32 _succ[2];
> > + u32 *succ = &_succ[0];
> > u16 new_out = 0;
> > u16 new_in = 0;
> >
> > - succ_num = insn_successors(env->prog, insn_idx, succ);
> > + succ_num = insn_successors(env, env->prog, insn_idx, &succ);
> > + if (succ_num < 0) {
> > + err = succ_num;
> > + goto out;
> > +
> > + }
> > for (int s = 0; s < succ_num; ++s)
> > new_out |= state[succ[s]].in;
> > new_in = (new_out & ~live->def) | live->use;
>
> [...]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps
2025-08-27 15:34 ` Anton Protopopov
@ 2025-08-27 18:58 ` Eduard Zingerman
0 siblings, 0 replies; 38+ messages in thread
From: Eduard Zingerman @ 2025-08-27 18:58 UTC (permalink / raw)
To: Anton Protopopov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Quentin Monnet, Yonghong Song
On Wed, 2025-08-27 at 15:34 +0000, Anton Protopopov wrote:
[...]
> > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > index aca43c284203..6e68e0082c81 100644
> > > --- a/include/linux/bpf_verifier.h
> > > +++ b/include/linux/bpf_verifier.h
> > > @@ -77,7 +77,15 @@ struct bpf_reg_state {
> > > * the map_uid is non-zero for registers
> > > * pointing to inner maps.
> > > */
> > > - u32 map_uid;
> > > + union {
> > > + u32 map_uid;
> > > +
> > > + /* Used to track boundaries of a PTR_TO_INSN */
> > > + struct {
> > > + u32 min_index;
> > > + u32 max_index;
> >
> > Could you please elaborate why these fields are necessary?
> > It appears that .var_off/.{s,u}{32_,}{min,max}_value fields can be
> > used to track current index bounds (min/max fields for bounds,
> > .var_off field to check 8-byte alignment).
>
> I thought it is more readable (and it no longer wastes memory).
> They clearly say "pointer X was loaded from an instruction pointer
> map M and can point to any of {M[min_index], ..., M[max_index]}".
> Those indexes come from off_reg, not ptr_reg. In order to use
> ptr_reg->u_min/u_max instead, more checks would have to be added (like those
> in BPF_ADD for min/max_index) to check that the register doesn't point
> outside of M->ips. I am not sure this will be easier to read.
>
> Also, PTR_TO_INSN is created by dereferencing the address, and right
> now it looks easier just to copy min/max_index. As I understand,
> normally this register is set to ips[var_off] and marked as unknown,
> so there will be additional code to use u_min/u_max to keep track of
> boundaries.
>
> Or do you think this is still more clear?
>
> I will try to look into this again in the morning.
The main point is uniformity. For other pointer types the current
boundaries are tracked via .var_off/.{s,u}{32_,}{min,max}_value, and
out-of-range access is reported at the point of the actual access.
Imo, preserving this uniformity simplifies reasoning about the code.
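Roughly, the uniform variant would just do the range check at the point of
access using the existing bounds, e.g. (a simplified, untested sketch; the
real code would additionally use .var_off for the 8-byte alignment check):

    u64 lo = reg->umin_value + reg->off;
    u64 hi = reg->umax_value + reg->off;

    if (hi + sizeof(long) > (u64)map->max_entries * sizeof(long)) {
        verbose(env, "insn array access out of range: [%llu, %llu]\n", lo, hi);
        return -EACCES;
    }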
[...]
> > > @@ -173,6 +172,20 @@ static u64 insn_array_mem_usage(const struct bpf_map *map)
> > > return insn_array_alloc_size(map->max_entries) + extra_size;
> > > }
> > >
> > > +static int insn_array_map_direct_value_addr(const struct bpf_map *map, u64 *imm, u32 off)
> > > +{
> > > + struct bpf_insn_array *insn_array = cast_insn_array(map);
> > > +
> > > + if ((off % sizeof(long)) != 0 ||
> > > + (off / sizeof(long)) >= map->max_entries)
> > > + return -EINVAL;
> > > +
> > > + /* from BPF's point of view, this map is a jump table */
> > > + *imm = (unsigned long)insn_array->ips + off / sizeof(long);
> > > +
> > > + return 0;
> > > +}
> > > +
> >
> > This function is called during main verification pass by
> > verifier.c:check_mem_access() -> verifier.c:bpf_map_direct_read().
> > However, insn_array->ips is filled by bpf_jit_comp.c:do_jit()
> > bpf_insn_array.c:bpf_prog_update_insn_ptr(), which is called *after*
> > main verification pass. Do I miss something, or this can't work?
>
> This gets an address &ips[off], not the address of the bpf program.
> At this moment ips[off] contains garbage. Later, when
> bpf_prog_update_insn_ptr() is executed, ips[off] is populated with
> the real IP. The running program then reads it by dereferencing the
> [correct at this time] address, i.e., *(&ips[off]).
Ack, missed the address part, thank you for explaining
[...]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps
2025-08-25 23:15 ` Eduard Zingerman
2025-08-27 15:34 ` Anton Protopopov
@ 2025-08-28 9:58 ` Anton Protopopov
2025-08-28 14:15 ` Anton Protopopov
2025-08-28 16:30 ` Eduard Zingerman
1 sibling, 2 replies; 38+ messages in thread
From: Anton Protopopov @ 2025-08-28 9:58 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Quentin Monnet, Yonghong Song
On 25/08/25 04:15PM, Eduard Zingerman wrote:
> On Sat, 2025-08-16 at 18:06 +0000, Anton Protopopov wrote:
>
[...] (see the previous reply)
> > +{
> > + if (map->map_type == BPF_MAP_TYPE_INSN_ARRAY)
> > + return map->max_entries * sizeof(long);
> ^^^^^^^^^^^^
> Nit: sizeof_field(struct bpf_insn_array, ips) ?
I think sizeof(long) is ok, as this will always be the size of a
pointer. (To use sizeof_field(), struct bpf_insn_array would have to be
shared in a header; is it worth it?)
> > +
> > + return map->value_size;
> > +}
> > +
> > /* check read/write into a map element with possible variable offset */
> > static int check_map_access(struct bpf_verifier_env *env, u32 regno,
> > int off, int size, bool zero_size_allowed,
>
> [...]
>
> > @@ -7820,6 +7849,13 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
> > allow_trust_mismatch);
> > err = err ?: reg_bounds_sanity_check(env, ®s[insn->dst_reg], ctx);
> >
> > + if (map_ptr_copy) {
> > + regs[insn->dst_reg].type = PTR_TO_INSN;
> > + regs[insn->dst_reg].map_ptr = map_ptr_copy;
> > + regs[insn->dst_reg].min_index = regs[insn->src_reg].min_index;
> > + regs[insn->dst_reg].max_index = regs[insn->src_reg].max_index;
> > + }
> > +
>
> I think this should be handled inside check_mem_access(), see case for
> reg->type == PTR_TO_MAP_VALUE.
yes, ok
>
> > return err;
> > }
> >
>
> [...]
>
> > @@ -14554,6 +14592,36 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
> >
> > switch (opcode) {
> > case BPF_ADD:
> > + if (ptr_to_insn_array) {
> > + u32 min_index = dst_reg->min_index;
> > + u32 max_index = dst_reg->max_index;
> > +
> > + if ((umin_val + ptr_reg->off) > (u64) U32_MAX * sizeof(long)) {
> > + verbose(env, "umin_value %llu + offset %u is too big to convert to index\n",
> > + umin_val, ptr_reg->off);
> > + return -EACCES;
> > + }
> > + if ((umax_val + ptr_reg->off) > (u64) U32_MAX * sizeof(long)) {
> > + verbose(env, "umax_value %llu + offset %u is too big to convert to index\n",
> > + umax_val, ptr_reg->off);
> > + return -EACCES;
> > + }
> > +
> > + min_index += (umin_val + ptr_reg->off) / sizeof(long);
> > + max_index += (umax_val + ptr_reg->off) / sizeof(long);
> > +
> > + if (min_index >= ptr_reg->map_ptr->max_entries) {
> > + verbose(env, "min_index %u points to outside of map\n", min_index);
> > + return -EACCES;
> > + }
> > + if (max_index >= ptr_reg->map_ptr->max_entries) {
> > + verbose(env, "max_index %u points to outside of map\n", max_index);
> > + return -EACCES;
> > + }
> > +
> > + dst_reg->min_index = min_index;
> > + dst_reg->max_index = max_index;
> > + }
>
> I think this and the following hunk would disappear if {min,max}_index
> are replaced by regular offset tracking mechanics.
>
> > /* We can take a fixed offset as long as it doesn't overflow
> > * the s32 'off' field
> > */
> > @@ -14598,6 +14666,11 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
> > }
> > break;
> > case BPF_SUB:
> > + if (ptr_to_insn_array) {
> > + verbose(env, "Operation %s on ptr to instruction set map is prohibited\n",
> > + bpf_alu_string[opcode >> 4]);
> > + return -EACCES;
> > + }
> > if (dst_reg == off_reg) {
> > /* scalar -= pointer. Creates an unknown scalar */
> > verbose(env, "R%d tried to subtract pointer from scalar\n",
> > @@ -16943,7 +17016,8 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > }
> > dst_reg->type = PTR_TO_MAP_VALUE;
> > dst_reg->off = aux->map_off;
> > - WARN_ON_ONCE(map->max_entries != 1);
> > + WARN_ON_ONCE(map->map_type != BPF_MAP_TYPE_INSN_ARRAY &&
> > + map->max_entries != 1);
>
> Q: when is this necessary?
For all maps except INSN_ARRAY, only (map->max_entries == 1) is
allowed. This change adds an exception for INSN_ARRAY.
>
> > /* We want reg->id to be same (0) as map_value is not distinct */
> > } else if (insn->src_reg == BPF_PSEUDO_MAP_FD ||
> > insn->src_reg == BPF_PSEUDO_MAP_IDX) {
> > @@ -17696,6 +17770,246 @@ static int mark_fastcall_patterns(struct bpf_verifier_env *env)
> > return 0;
> > }
> >
> > +#define SET_HIGH(STATE, LAST) STATE = (STATE & 0xffffU) | ((LAST) << 16)
> > +#define GET_HIGH(STATE) ((u16)((STATE) >> 16))
> > +
> > +static int push_goto_x_edge(int t, struct bpf_verifier_env *env, struct jt *jt)
>
> I think check_cfg() can be refactored to use insn_successors().
> In such a case it won't be necessary to special case gotox processing
> (appart from insn_aux->jt allocation).
Yes, this sounds right, theoretically. I will take a look.
> > +{
> > + int *insn_stack = env->cfg.insn_stack;
> > + int *insn_state = env->cfg.insn_state;
> > + u16 prev;
> > + int w;
> > +
> > + for (prev = GET_HIGH(insn_state[t]); prev < jt->off_cnt; prev++) {
> > + w = jt->off[prev];
> > +
> > + /* EXPLORED || DISCOVERED */
> > + if (insn_state[w])
> > + continue;
> > +
> > + break;
> > + }
> > +
> > + if (prev == jt->off_cnt)
> > + return DONE_EXPLORING;
> > +
> > + mark_prune_point(env, t);
> > +
> > + if (env->cfg.cur_stack >= env->prog->len)
> > + return -E2BIG;
> > + insn_stack[env->cfg.cur_stack++] = w;
> > +
> > + mark_jmp_point(env, w);
> > +
> > + SET_HIGH(insn_state[t], prev + 1);
> > + return KEEP_EXPLORING;
> > +}
> > +
> > +static int copy_insn_array(struct bpf_map *map, u32 start, u32 end, u32 *off)
> > +{
> > + struct bpf_insn_array_value *value;
> > + u32 i;
> > +
> > + for (i = start; i <= end; i++) {
> > + value = map->ops->map_lookup_elem(map, &i);
> > + if (!value)
> > + return -EINVAL;
> > + off[i - start] = value->xlated_off;
> > + }
> > + return 0;
> > +}
> > +
> > +static int cmp_ptr_to_u32(const void *a, const void *b)
> > +{
> > + return *(u32 *)a - *(u32 *)b;
> > +}
>
> This will overflow for e.g. `0 - 8`.
Why? 0U - 8U = 0xfffffff8U (it's not UB because the values are
unsigned). Then it's cast to int on return, which is -8.
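For illustration, a standalone version of that arithmetic (the offsets
stored in these maps are xlated instruction indexes, so they are far below
INT_MAX; this prints -8 on the usual two's-complement targets):

    #include <stdio.h>

    static int cmp_u32(const void *a, const void *b)
    {
        /* same shape as cmp_ptr_to_u32() in the patch */
        return *(const unsigned int *)a - *(const unsigned int *)b;
    }

    int main(void)
    {
        unsigned int x = 0, y = 8;

        /* 0U - 8U wraps to 0xfffffff8U; converting that to int gives -8 */
        printf("%d\n", cmp_u32(&x, &y));
        return 0;
    }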
> > +
> > +static int sort_insn_array_uniq(u32 *off, int off_cnt)
> > +{
> > + int unique = 1;
> > + int i;
> > +
> > + sort(off, off_cnt, sizeof(off[0]), cmp_ptr_to_u32, NULL);
> > +
> > + for (i = 1; i < off_cnt; i++)
> > + if (off[i] != off[unique - 1])
> > + off[unique++] = off[i];
> > +
> > + return unique;
> > +}
> > +
> > +/*
> > + * sort_unique({map[start], ..., map[end]}) into off
> > + */
> > +static int copy_insn_array_uniq(struct bpf_map *map, u32 start, u32 end, u32 *off)
> > +{
> > + u32 n = end - start + 1;
> > + int err;
> > +
> > + err = copy_insn_array(map, start, end, off);
> > + if (err)
> > + return err;
> > +
> > + return sort_insn_array_uniq(off, n);
> > +}
> > +
> > +/*
> > + * Copy all unique offsets from the map
> > + */
> > +static int jt_from_map(struct bpf_map *map, struct jt *jt)
> > +{
> > + u32 *off;
> > + int n;
> > +
> > + off = kvcalloc(map->max_entries, sizeof(u32), GFP_KERNEL_ACCOUNT);
> > + if (!off)
> > + return -ENOMEM;
> > +
> > + n = copy_insn_array_uniq(map, 0, map->max_entries - 1, off);
> > + if (n < 0) {
> > + kvfree(off);
> > + return n;
> > + }
> > +
> > + jt->off = off;
> > + jt->off_cnt = n;
> > + return 0;
> > +}
> > +
> > +/*
> > + * Find and collect all maps which fit in the subprog. Return the result as one
> > + * combined jump table in jt->off (allocated with kvcalloc
> > + */
> > +static int jt_from_subprog(struct bpf_verifier_env *env,
> > + int subprog_start,
> > + int subprog_end,
> > + struct jt *jt)
> > +{
> > + struct bpf_map *map;
> > + struct jt jt_cur;
> > + u32 *off;
> > + int err;
> > + int i;
> > +
> > + jt->off = NULL;
> > + jt->off_cnt = 0;
> > +
> > + for (i = 0; i < env->insn_array_map_cnt; i++) {
> > + /*
> > + * TODO (when needed): collect only jump tables, not static keys
> > + * or maps for indirect calls
> > + */
> > + map = env->insn_array_maps[i];
> > +
> > + err = jt_from_map(map, &jt_cur);
> > + if (err) {
> > + kvfree(jt->off);
> > + return err;
> > + }
> > +
> > + /*
> > + * This is enough to check one element. The full table is
> > + * checked to fit inside the subprog later in create_jt()
> > + */
> > + if (jt_cur.off[0] >= subprog_start && jt_cur.off[0] < subprog_end) {
>
> This won't always catch cases when insn array references offsets from
> several subprograms. Also is one subprogram limitation really necessary?
This was intentional. If you have a switch or a jump table
defined in C, then corresponding jump tables belong to one function.
Also, what if you have a jt which can jump from function f() to g(),
but then g() is livepatched by another function?
> > + off = kvrealloc(jt->off, (jt->off_cnt + jt_cur.off_cnt) << 2, GFP_KERNEL_ACCOUNT);
> > + if (!off) {
> > + kvfree(jt_cur.off);
> > + kvfree(jt->off);
> > + return -ENOMEM;
> > + }
> > + memcpy(off + jt->off_cnt, jt_cur.off, jt_cur.off_cnt << 2);
> > + jt->off = off;
> > + jt->off_cnt += jt_cur.off_cnt;
> > + }
> > +
> > + kvfree(jt_cur.off);
> > + }
> > +
> > + if (jt->off == NULL) {
> > + verbose(env, "no jump tables found for subprog starting at %u\n", subprog_start);
> > + return -EINVAL;
> > + }
> > +
> > + jt->off_cnt = sort_insn_array_uniq(jt->off, jt->off_cnt);
> > + return 0;
> > +}
> > +
> > +static int create_jt(int t, struct bpf_verifier_env *env, int fd, struct jt *jt)
> > +{
> > + static struct bpf_subprog_info *subprog;
> > + int subprog_idx, subprog_start, subprog_end;
> > + struct bpf_map *map;
> > + int map_idx;
> > + int ret;
> > + int i;
> > +
> > + if (env->subprog_cnt == 0)
> > + return -EFAULT;
> > +
> > + subprog_idx = find_containing_subprog_idx(env, t);
> > + if (subprog_idx < 0) {
> > + verbose(env, "can't find subprog containing instruction %d\n", t);
> > + return -EFAULT;
> > + }
> > + subprog = &env->subprog_info[subprog_idx];
> > + subprog_start = subprog->start;
> > + subprog_end = (subprog + 1)->start;
> > +
> > + map_idx = add_used_map(env, fd);
>
> Will this spam the log with bogus
> "fd %d is not pointing to valid bpf_map\n" messages if gotox does not
> specify fd?
Yes, thanks, good catch! (This code will be removed in v2, as
gotox[imm=map_fd] will be gone for now, as you've suggested.)
> > + if (map_idx >= 0) {
> > + map = env->used_maps[map_idx];
> > + if (map->map_type != BPF_MAP_TYPE_INSN_ARRAY) {
> > + verbose(env, "map type %d in the gotox insn %d is incorrect\n",
> > + map->map_type, t);
> > + return -EINVAL;
> > + }
> > +
> > + env->insn_aux_data[t].map_index = map_idx;
> > +
> > + ret = jt_from_map(map, jt);
> > + if (ret)
> > + return ret;
> > + } else {
> > + ret = jt_from_subprog(env, subprog_start, subprog_end, jt);
> > + if (ret)
> > + return ret;
> > + }
> > +
> > + /* Check that the every element of the jump table fits within the given subprogram */
> > + for (i = 0; i < jt->off_cnt; i++) {
> > + if (jt->off[i] < subprog_start || jt->off[i] >= subprog_end) {
> > + verbose(env, "jump table for insn %d points outside of the subprog [%u,%u]",
> > + t, subprog_start, subprog_end);
> > + return -EINVAL;
> > + }
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +/* "conditional jump with N edges" */
> > +static int visit_goto_x_insn(int t, struct bpf_verifier_env *env, int fd)
> > +{
> > + struct jt *jt = &env->insn_aux_data[t].jt;
> > + int ret;
> > +
> > + if (jt->off == NULL) {
> > + ret = create_jt(t, env, fd, jt);
> > + if (ret)
> > + return ret;
> > + }
> > +
> > + /*
> > + * Mark jt as allocated. Otherwise, this is not possible to check if it
> > + * was allocated or not in the code which frees memory (jt is a part of
> > + * union)
> > + */
> > + env->insn_aux_data[t].jt_allocated = true;
> > +
> > + return push_goto_x_edge(t, env, jt);
> > +}
> > +
> > /* Visits the instruction at index t and returns one of the following:
> > * < 0 - an error occurred
> > * DONE_EXPLORING - the instruction was fully explored
> > @@ -17786,8 +18100,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
> > return visit_func_call_insn(t, insns, env, insn->src_reg == BPF_PSEUDO_CALL);
> >
> > case BPF_JA:
> > - if (BPF_SRC(insn->code) != BPF_K)
> > - return -EINVAL;
> > + if (BPF_SRC(insn->code) == BPF_X)
> > + return visit_goto_x_insn(t, env, insn->imm);
> >
> > if (BPF_CLASS(insn->code) == BPF_JMP)
> > off = insn->off;
>
> [...]
>
> > @@ -18679,6 +19000,10 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
> > return regs_exact(rold, rcur, idmap) && rold->frameno == rcur->frameno;
> > case PTR_TO_ARENA:
> > return true;
> > + case PTR_TO_INSN:
> > + /* cur ⊆ old */
>
> Out of curiosity: are unicode symbols allowed in kernel source code?
I've replaced it with words; I don't see other examples of unicode around
(but I also can't find "don't use unicode" in coding-style.rst).
> > + return (rcur->min_index >= rold->min_index &&
> > + rcur->max_index <= rold->max_index);
> > default:
> > return regs_exact(rold, rcur, idmap);
> > }
> > @@ -19825,6 +20150,67 @@ static int process_bpf_exit_full(struct bpf_verifier_env *env,
> > return PROCESS_BPF_EXIT;
> > }
> >
> > +/* gotox *dst_reg */
> > +static int check_indirect_jump(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > +{
> > + struct bpf_verifier_state *other_branch;
> > + struct bpf_reg_state *dst_reg;
> > + struct bpf_map *map;
> > + int err = 0;
> > + u32 *xoff;
> > + int n;
> > + int i;
> > +
> > + dst_reg = reg_state(env, insn->dst_reg);
> > + if (dst_reg->type != PTR_TO_INSN) {
> > + verbose(env, "BPF_JA|BPF_X R%d has type %d, expected PTR_TO_INSN\n",
> > + insn->dst_reg, dst_reg->type);
> > + return -EINVAL;
> > + }
> > +
> > + map = dst_reg->map_ptr;
> > + if (!map)
> > + return -EINVAL;
>
> Is this a verifier bug or legit situation?
> If it is a bug, maybe add a verifier_bug() here and return -EFAULT?
Yes, thanks, this would be a bug.
> > +
> > + if (map->map_type != BPF_MAP_TYPE_INSN_ARRAY)
> > + return -EINVAL;
>
> Same question here, ->type is already `PTR_TO_INSN`.
Right, thanks, I can add a bug check here. (I think this check is here
for historical reasons, from before PTR_TO_INSN appeared.)
> > +
> > + if (dst_reg->max_index >= map->max_entries) {
> > + verbose(env, "BPF_JA|BPF_X R%d is out of map boundaries: index=%u, max_index=%u\n",
> > + insn->dst_reg, dst_reg->max_index, map->max_entries-1);
> > + return -EINVAL;
> > + }
> > +
> > + xoff = kvcalloc(dst_reg->max_index - dst_reg->min_index + 1, sizeof(u32), GFP_KERNEL_ACCOUNT);
> > + if (!xoff)
> > + return -ENOMEM;
> > +
> > + n = copy_insn_array_uniq(map, dst_reg->min_index, dst_reg->max_index, xoff);
>
> Nit: I'd avoid this allocation and do a loop for(i = min_index; i <= max_index; i++),
> with map->ops->map_lookup_elem(map, &i) (or a wrapper) inside it.
But it should be a list of unique values; how would you sort it
without allocating memory (in a reasonable time)?
> > + if (n < 0) {
> > + err = n;
> > + goto free_off;
> > + }
> > + if (n == 0) {
> > + verbose(env, "register R%d doesn't point to any offset in map id=%d\n",
> > + insn->dst_reg, map->id);
> > + err = -EINVAL;
> > + goto free_off;
> > + }
> > +
> > + for (i = 0; i < n - 1; i++) {
> > + other_branch = push_stack(env, xoff[i], env->insn_idx, false);
> > + if (IS_ERR(other_branch)) {
> > + err = PTR_ERR(other_branch);
> > + goto free_off;
> > + }
> > + }
> > + env->insn_idx = xoff[n-1];
> > +
> > +free_off:
> > + kvfree(xoff);
> > + return err;
> > +}
> > +
> > static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
> > {
> > int err;
>
> [...]
>
> > @@ -20981,6 +21371,23 @@ static int bpf_adj_linfo_after_remove(struct bpf_verifier_env *env, u32 off,
> > return 0;
> > }
> >
> > +/*
> > + * Clean up dynamically allocated fields of aux data for instructions [start, ..., end]
> > + */
> > +static void clear_insn_aux_data(struct bpf_insn_aux_data *aux_data, int start, int end)
>
> Nit: switching this to (..., int start, int len) would simplify arithmetic at call sites.
Yes, thanks.
>
> > +{
> > + int i;
> > +
> > + for (i = start; i <= end; i++) {
> > + if (aux_data[i].jt_allocated) {
> > + kvfree(aux_data[i].jt.off);
> > + aux_data[i].jt.off = NULL;
> > + aux_data[i].jt.off_cnt = 0;
> > + aux_data[i].jt_allocated = false;
> > + }
> > + }
> > +}
> > +
> > static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
> > {
> > struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
>
> [...]
>
> > @@ -24175,18 +24586,18 @@ static bool can_jump(struct bpf_insn *insn)
> > return false;
> > }
> >
> > -static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
> > +static int insn_successors_regular(struct bpf_prog *prog, u32 insn_idx, u32 *succ)
> > {
> > - struct bpf_insn *insn = &prog->insnsi[idx];
> > + struct bpf_insn *insn = &prog->insnsi[insn_idx];
> > int i = 0, insn_sz;
> > u32 dst;
> >
> > insn_sz = bpf_is_ldimm64(insn) ? 2 : 1;
> > - if (can_fallthrough(insn) && idx + 1 < prog->len)
> > - succ[i++] = idx + insn_sz;
> > + if (can_fallthrough(insn) && insn_idx + 1 < prog->len)
> > + succ[i++] = insn_idx + insn_sz;
> >
> > if (can_jump(insn)) {
> > - dst = idx + jmp_offset(insn) + 1;
> > + dst = insn_idx + jmp_offset(insn) + 1;
> > if (i == 0 || succ[0] != dst)
> > succ[i++] = dst;
> > }
> > @@ -24194,6 +24605,36 @@ static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
> > return i;
> > }
> >
> > +static int insn_successors_gotox(struct bpf_verifier_env *env,
> > + struct bpf_prog *prog,
> > + u32 insn_idx, u32 **succ)
> > +{
> > + struct jt *jt = &env->insn_aux_data[insn_idx].jt;
> > +
> > + if (WARN_ON_ONCE(!jt->off || !jt->off_cnt))
> > + return -EFAULT;
> > +
> > + *succ = jt->off;
> > + return jt->off_cnt;
> > +}
> > +
> > +/*
> > + * Fill in *succ[0],...,*succ[n-1] with successors. The default *succ
> > + * pointer (of size 2) may be replaced with a custom one if more
> > + * elements are required (i.e., an indirect jump).
> > + */
> > +static int insn_successors(struct bpf_verifier_env *env,
> > + struct bpf_prog *prog,
> > + u32 insn_idx, u32 **succ)
> > +{
> > + struct bpf_insn *insn = &prog->insnsi[insn_idx];
> > +
> > + if (unlikely(insn_is_gotox(insn)))
> > + return insn_successors_gotox(env, prog, insn_idx, succ);
> > +
> > + return insn_successors_regular(prog, insn_idx, *succ);
> > +}
> > +
>
> The `prog` parameter can be dropped, as it is accessible from `env`.
> I don't like the `u32 **succ` part of this interface.
> What about one of the following alternatives:
>
> - u32 *insn_successors(struct bpf_verifier_env *env, u32 insn_idx)
> and `u32 succ_buf[2]` added to bpf_verifier_env?
I like this variant of yours more than the second one.
A small correction: this would be
u32 *insn_successors(struct bpf_verifier_env *env, u32 insn_idx, int *succ_num)
so that it also returns the number of successor instructions.
> - int insn_successor(struct bpf_verifier_env *env, u32 insn_idx, u32 succ_num):
> bool fallthrough = can_fallthrough(insn);
> bool jump = can_jump(insn);
> if (succ_num == 0) {
> if (fallthrough)
> return <next insn>
> if (jump)
> return <jump tgt>
> } else if (succ_num == 1) {
> if (fallthrough && jump)
> return <jmp tgt>
> } else if (is_gotox) {
> return <lookup>
> }
> return -1;
>
> ?
>
> > /* Each field is a register bitmask */
> > struct insn_live_regs {
> > u16 use; /* registers read by instruction */
> > @@ -24387,11 +24828,17 @@ static int compute_live_registers(struct bpf_verifier_env *env)
>
> Could you please extend `tools/testing/selftests/bpf/progs/compute_live_registers.c`
> with test cases for gotox?
Yes, thanks for pointing to it, will do.
> > int insn_idx = env->cfg.insn_postorder[i];
> > struct insn_live_regs *live = &state[insn_idx];
> > int succ_num;
> > - u32 succ[2];
> > + u32 _succ[2];
> > + u32 *succ = &_succ[0];
> > u16 new_out = 0;
> > u16 new_in = 0;
> >
> > - succ_num = insn_successors(env->prog, insn_idx, succ);
> > + succ_num = insn_successors(env, env->prog, insn_idx, &succ);
> > + if (succ_num < 0) {
> > + err = succ_num;
> > + goto out;
> > +
> > + }
> > for (int s = 0; s < succ_num; ++s)
> > new_out |= state[succ[s]].in;
> > new_in = (new_out & ~live->def) | live->use;
>
> [...]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps
2025-08-28 9:58 ` Anton Protopopov
@ 2025-08-28 14:15 ` Anton Protopopov
2025-08-28 16:10 ` Eduard Zingerman
2025-08-28 16:30 ` Eduard Zingerman
1 sibling, 1 reply; 38+ messages in thread
From: Anton Protopopov @ 2025-08-28 14:15 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Quentin Monnet, Yonghong Song
On 25/08/28 09:58AM, Anton Protopopov wrote:
> On 25/08/25 04:15PM, Eduard Zingerman wrote:
> > On Sat, 2025-08-16 at 18:06 +0000, Anton Protopopov wrote:
> >
>
[...]
> > >
> > > -static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
> > > +static int insn_successors_regular(struct bpf_prog *prog, u32 insn_idx, u32 *succ)
> > > {
> > > - struct bpf_insn *insn = &prog->insnsi[idx];
> > > + struct bpf_insn *insn = &prog->insnsi[insn_idx];
> > > int i = 0, insn_sz;
> > > u32 dst;
> > >
> > > insn_sz = bpf_is_ldimm64(insn) ? 2 : 1;
> > > - if (can_fallthrough(insn) && idx + 1 < prog->len)
> > > - succ[i++] = idx + insn_sz;
> > > + if (can_fallthrough(insn) && insn_idx + 1 < prog->len)
> > > + succ[i++] = insn_idx + insn_sz;
> > >
> > > if (can_jump(insn)) {
> > > - dst = idx + jmp_offset(insn) + 1;
> > > + dst = insn_idx + jmp_offset(insn) + 1;
> > > if (i == 0 || succ[0] != dst)
> > > succ[i++] = dst;
> > > }
> > > @@ -24194,6 +24605,36 @@ static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
> > > return i;
> > > }
> > >
> > > +static int insn_successors_gotox(struct bpf_verifier_env *env,
> > > + struct bpf_prog *prog,
> > > + u32 insn_idx, u32 **succ)
> > > +{
> > > + struct jt *jt = &env->insn_aux_data[insn_idx].jt;
> > > +
> > > + if (WARN_ON_ONCE(!jt->off || !jt->off_cnt))
> > > + return -EFAULT;
> > > +
> > > + *succ = jt->off;
> > > + return jt->off_cnt;
> > > +}
> > > +
> > > +/*
> > > + * Fill in *succ[0],...,*succ[n-1] with successors. The default *succ
> > > + * pointer (of size 2) may be replaced with a custom one if more
> > > + * elements are required (i.e., an indirect jump).
> > > + */
> > > +static int insn_successors(struct bpf_verifier_env *env,
> > > + struct bpf_prog *prog,
> > > + u32 insn_idx, u32 **succ)
> > > +{
> > > + struct bpf_insn *insn = &prog->insnsi[insn_idx];
> > > +
> > > + if (unlikely(insn_is_gotox(insn)))
> > > + return insn_successors_gotox(env, prog, insn_idx, succ);
> > > +
> > > + return insn_successors_regular(prog, insn_idx, *succ);
> > > +}
> > > +
> >
> > The `prog` parameter can be dropped, as it is accessible from `env`.
> > I don't like the `u32 **succ` part of this interface.
> > What about one of the following alternatives:
> >
> > - u32 *insn_successors(struct bpf_verifier_env *env, u32 insn_idx)
> > and `u32 succ_buf[2]` added to bpf_verifier_env?
>
> I like this variant of yours more than the second one.
>
> A small correction: this would be
>
> u32 *insn_successors(struct bpf_verifier_env *env, u32 insn_idx, int *succ_num)
>
> so that it also returns the number of successor instructions.
>
> > - int insn_successor(struct bpf_verifier_env *env, u32 insn_idx, u32 succ_num):
> > bool fallthrough = can_fallthrough(insn);
> > bool jump = can_jump(insn);
> > if (succ_num == 0) {
> > if (fallthrough)
> > return <next insn>
> > if (jump)
> > return <jump tgt>
> > } else if (succ_num == 1) {
> > if (fallthrough && jump)
> > return <jmp tgt>
> > } else if (is_gotox) {
> > return <lookup>
> > }
> > return -1;
> >
> > ?
So, insn_successors() actually returns two values: a pointer and a
number of elements. This is the same pair as "struct bpf_jt" (struct jt
in the sent patch). Wdyt about
struct bpf_jt *insn_successors(struct bpf_verifier_env *env, u32 insn_idx)
? (Maybe bpf_jt is not the right name here, "insn_array" is already used
by the map, maybe smth with "successors"?)
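To make it a bit more concrete, the shape I have in mind is roughly the
following (only a sketch; names are not final and the kernel plumbing is
omitted):

    struct bpf_jt {
        u32 *off;       /* successor instruction indexes */
        int off_cnt;
    };

    struct bpf_jt *insn_successors(struct bpf_verifier_env *env, u32 insn_idx);

    /* caller side, e.g. in compute_live_registers() */
    struct bpf_jt *jt = insn_successors(env, insn_idx);

    for (int s = 0; s < jt->off_cnt; s++)
        new_out |= state[jt->off[s]].in;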
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps
2025-08-28 14:15 ` Anton Protopopov
@ 2025-08-28 16:10 ` Eduard Zingerman
0 siblings, 0 replies; 38+ messages in thread
From: Eduard Zingerman @ 2025-08-28 16:10 UTC (permalink / raw)
To: Anton Protopopov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Quentin Monnet, Yonghong Song
On Thu, 2025-08-28 at 14:15 +0000, Anton Protopopov wrote:
[...]
> So, insn_successors() actually returns two values: a pointer and a
> number of elements. This is the same pair as "struct bpf_jt" (struct jt
> in the sent patch). Wdyt about
>
> struct bpf_jt *insn_successors(struct bpf_verifier_env *env, u32 insn_idx)
>
> ? (Maybe bpf_jt is not the right name here, "insn_array" is already used
> by the map, maybe smth with "successors"?)
Yes, that's an option.
It can also be returned by value, if that would be convenient.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps
2025-08-28 9:58 ` Anton Protopopov
2025-08-28 14:15 ` Anton Protopopov
@ 2025-08-28 16:30 ` Eduard Zingerman
1 sibling, 0 replies; 38+ messages in thread
From: Eduard Zingerman @ 2025-08-28 16:30 UTC (permalink / raw)
To: Anton Protopopov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Quentin Monnet, Yonghong Song
On Thu, 2025-08-28 at 09:58 +0000, Anton Protopopov wrote:
[...]
> > > @@ -16943,7 +17016,8 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > > }
> > > dst_reg->type = PTR_TO_MAP_VALUE;
> > > dst_reg->off = aux->map_off;
> > > - WARN_ON_ONCE(map->max_entries != 1);
> > > + WARN_ON_ONCE(map->map_type != BPF_MAP_TYPE_INSN_ARRAY &&
> > > + map->max_entries != 1);
> >
> > Q: when is this necessary?
>
> For all maps except INSN_ARRAY, only (map->max_entries == 1) is
> allowed. This change adds an exception for INSN_ARRAY.
I see, thank you for explaining.
[...]
> > > +static int cmp_ptr_to_u32(const void *a, const void *b)
> > > +{
> > > + return *(u32 *)a - *(u32 *)b;
> > > +}
> >
> > This will overflow for e.g. `0 - 8`.
>
> Why? 0U - 8U = 0xfffffff8U (it's not UB because the values are
> unsigned). Then it's cast to int on return, which is -8.
Uh-oh. Ok, looks like this works.
[...]
> > > +static int jt_from_subprog(struct bpf_verifier_env *env,
> > > + int subprog_start,
> > > + int subprog_end,
> > > + struct jt *jt)
> > > +{
> > > + struct bpf_map *map;
> > > + struct jt jt_cur;
> > > + u32 *off;
> > > + int err;
> > > + int i;
> > > +
> > > + jt->off = NULL;
> > > + jt->off_cnt = 0;
> > > +
> > > + for (i = 0; i < env->insn_array_map_cnt; i++) {
> > > + /*
> > > + * TODO (when needed): collect only jump tables, not static keys
> > > + * or maps for indirect calls
> > > + */
> > > + map = env->insn_array_maps[i];
> > > +
> > > + err = jt_from_map(map, &jt_cur);
> > > + if (err) {
> > > + kvfree(jt->off);
> > > + return err;
> > > + }
> > > +
> > > + /*
> > > + * This is enough to check one element. The full table is
> > > + * checked to fit inside the subprog later in create_jt()
> > > + */
> > > + if (jt_cur.off[0] >= subprog_start && jt_cur.off[0] < subprog_end) {
> >
> > This won't always catch cases when insn array references offsets from
> > several subprograms. Also is one subprogram limitation really necessary?
>
> This was intentional. If you have a switch or a jump table
> defined in C, then corresponding jump tables belong to one function.
> Also, what if you have a jt which can jump from function f() to g(),
> but then g() is livepatched by another function?
Ok, yes, for gotox such a limitation makes sense.
[...]
> > > @@ -18679,6 +19000,10 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
> > > return regs_exact(rold, rcur, idmap) && rold->frameno == rcur->frameno;
> > > case PTR_TO_ARENA:
> > > return true;
> > > + case PTR_TO_INSN:
> > > + /* cur ⊆ old */
> >
> > Out of curiosity: are unicode symbols allowed in kernel source code?
>
> I've replaced it with words; I don't see other examples of unicode around
> (but I also can't find "don't use unicode" in coding-style.rst).
Personally, I like unicode symbols :)
> > > + return (rcur->min_index >= rold->min_index &&
> > > + rcur->max_index <= rold->max_index);
> > > default:
> > > return regs_exact(rold, rcur, idmap);
> > > }
> > > @@ -19825,6 +20150,67 @@ static int process_bpf_exit_full(struct bpf_verifier_env *env,
> > > return PROCESS_BPF_EXIT;
> > > }
> > >
> > > +/* gotox *dst_reg */
> > > +static int check_indirect_jump(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > > +{
[...]
> > > + if (dst_reg->max_index >= map->max_entries) {
> > > + verbose(env, "BPF_JA|BPF_X R%d is out of map boundaries: index=%u, max_index=%u\n",
> > > + insn->dst_reg, dst_reg->max_index, map->max_entries-1);
> > > + return -EINVAL;
> > > + }
> > > +
> > > + xoff = kvcalloc(dst_reg->max_index - dst_reg->min_index + 1, sizeof(u32), GFP_KERNEL_ACCOUNT);
> > > + if (!xoff)
> > > + return -ENOMEM;
> > > +
> > > + n = copy_insn_array_uniq(map, dst_reg->min_index, dst_reg->max_index, xoff);
> >
> > Nit: I'd avoid this allocation and do a loop for(i = min_index; i <= max_index; i++),
> > with map->ops->map_lookup_elem(map, &i) (or a wrapper) inside it.
>
> But it should be a list of unique values; how would you sort it
> without allocating memory (in a reasonable time)?
Because of the push_stack() loop below, right?
Makes sense.
> > > + if (n < 0) {
> > > + err = n;
> > > + goto free_off;
> > > + }
> > > + if (n == 0) {
> > > + verbose(env, "register R%d doesn't point to any offset in map id=%d\n",
> > > + insn->dst_reg, map->id);
> > > + err = -EINVAL;
> > > + goto free_off;
> > > + }
> > > +
> > > + for (i = 0; i < n - 1; i++) {
> > > + other_branch = push_stack(env, xoff[i], env->insn_idx, false);
> > > + if (IS_ERR(other_branch)) {
> > > + err = PTR_ERR(other_branch);
> > > + goto free_off;
> > > + }
> > > + }
> > > + env->insn_idx = xoff[n-1];
> > > +
> > > +free_off:
> > > + kvfree(xoff);
> > > + return err;
> > > +}
> > > +
[...]
^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2025-08-28 16:30 UTC | newest]
Thread overview: 38+ messages
2025-08-16 18:06 [PATCH v1 bpf-next 00/11] BPF indirect jumps Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 01/11] bpf: fix the return value of push_stack Anton Protopopov
2025-08-25 18:12 ` Eduard Zingerman
2025-08-26 15:00 ` Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 02/11] bpf: save the start of functions in bpf_prog_aux Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 03/11] bpf, x86: add new map type: instructions array Anton Protopopov
2025-08-25 21:05 ` Eduard Zingerman
2025-08-26 15:52 ` Anton Protopopov
2025-08-26 16:04 ` Eduard Zingerman
2025-08-16 18:06 ` [PATCH v1 bpf-next 04/11] selftests/bpf: add selftests for new insn_array map Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 05/11] bpf: support instructions arrays with constants blinding Anton Protopopov
2025-08-17 5:50 ` kernel test robot
2025-08-18 8:24 ` Anton Protopopov
2025-08-25 23:29 ` Eduard Zingerman
2025-08-27 9:20 ` Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 06/11] selftests/bpf: test instructions arrays with blinding Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 07/11] bpf, x86: allow indirect jumps to r8...r15 Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps Anton Protopopov
2025-08-18 7:57 ` Dan Carpenter
2025-08-18 8:22 ` Anton Protopopov
2025-08-25 23:15 ` Eduard Zingerman
2025-08-27 15:34 ` Anton Protopopov
2025-08-27 18:58 ` Eduard Zingerman
2025-08-28 9:58 ` Anton Protopopov
2025-08-28 14:15 ` Anton Protopopov
2025-08-28 16:10 ` Eduard Zingerman
2025-08-28 16:30 ` Eduard Zingerman
2025-08-16 18:06 ` [PATCH v1 bpf-next 09/11] bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps Anton Protopopov
2025-08-21 0:20 ` Andrii Nakryiko
2025-08-21 13:05 ` Anton Protopopov
2025-08-21 18:14 ` Andrii Nakryiko
2025-08-21 19:12 ` Anton Protopopov
2025-08-26 0:06 ` Eduard Zingerman
2025-08-26 16:15 ` Anton Protopopov
2025-08-26 16:51 ` Anton Protopopov
2025-08-26 16:47 ` Eduard Zingerman
2025-08-16 18:06 ` [PATCH v1 bpf-next 11/11] selftests/bpf: add selftests for " Anton Protopopov