* [PATCH v2 bpf-next 00/13] BPF indirect jumps
@ 2025-09-13 19:39 Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 01/13] bpf: fix the return value of push_stack Anton Protopopov
` (12 more replies)
0 siblings, 13 replies; 26+ messages in thread
From: Anton Protopopov @ 2025-09-13 19:39 UTC
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
This patchset implements a new type of map, the instruction array, and
uses it to build support for indirect branches in BPF (on x86). (The same
map will later be used to provide support for indirect calls and static
keys.) See [1], [2] for more context.
Short table of contents:
* Patches 1-6 implement the new map of type
BPF_MAP_TYPE_INSN_ARRAY and corresponding selftests. This map can
be used to track the "original -> xlated -> jitted mapping" for
a given program. Patches 5,6 add support for the "blinded" variant.
* Patches 7,8,9 implement support for indirect jumps.
* Patches 10-13 add support for LLVM-compiled programs containing
indirect jumps.
A special LLVM build is required for that; see [3] for the details and
some related discussions. Because of this, selftests for indirect
jumps which directly use `goto *rX` are commented out (so that
CI can run). Instead, I've run test_progs compiled with
indirect jumps as described in [4] (in brief, all tests which
normally pass on my setup also pass with indirect jumps).
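As a rough illustration of where indirect jumps come from (an assumption about typical compiler behaviour, not something taken from the patches): a dense switch inside a dispatch loop is the kind of C code that a compiler may lower to a jump table plus a single indirect jump, which is what `goto *rX` expresses at the BPF level. The names `toy_dispatch` and `toy_run` and the case values are hypothetical, and whether a jump table is actually emitted depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical example: a dense switch over small consecutive values.
 * A compiler is free to implement this with a jump table and one
 * indirect jump, or with a chain of conditional branches.
 */
static int toy_dispatch(unsigned int op, int x)
{
	switch (op) {
	case 0: return x + 1;
	case 1: return x - 1;
	case 2: return x * 2;
	case 3: return x / 2;
	case 4: return -x;
	default: return 0;
	}
}

/* Apply a sequence of opcodes to x, dispatching on each one in turn. */
static int toy_run(const unsigned int *ops, int n, int x)
{
	assert(ops != NULL || n == 0);

	for (int i = 0; i < n; i++)
		x = toy_dispatch(ops[i], x);
	return x;
}
```

Calling `toy_run()` with the opcode sequence {0, 2, 4} and a starting value of 3 applies "+1", "*2" and negation in that order.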
There is a list of TBDs (mostly, more selftests), but the list of
changes looks big enough to send the v2.
See the individual patches for implementation details.
v1 -> v2:
* push_stack changes:
* sanitize_speculative_path should just return int (Eduard)
* return code from sanitize_speculative_path, not EFAULT (Eduard)
* when BPF_COMPLEXITY_LIMIT_JMP_SEQ is reached, return E2BIG (Eduard)
* indirect jumps:
* omit support for .imm=fd in gotox, as we're not using it for now (Eduard)
* struct jt -> struct bpf_iarray (Eduard)
* insn_successors: rewrite the interface to just return a pointer (Eduard)
* remove min_index/max_index, use umin_value/umax_value instead (Alexei, Eduard)
* move emit_indirect_jump args change to the previous patch (Eduard)
* add a comment to map_mem_size() (Eduard)
* use verifier_bug for some error cases in check_indirect_jump (Eduard)
* clear_insn_aux_data: use start,len instead of start,end (Eduard)
* make regs[insn->dst_reg].type = PTR_TO_INSN part of check_mem_access (Eduard)
* constant blinding changes:
* make subprog_start adjustment better readable (Eduard)
* do not set subprog len, it is already set (Eduard)
* libbpf:
* remove check that relocations from .rodata are ok (Anton)
* do not freeze the map, it is not necessary anymore (Anton)
* rename the goto_x -> gotox everywhere (Anton)
* use u64 when parsing LLVM jump tables (Eduard)
* split patch in two due to spaces->tabs change (Eduard)
* split bpftool changes to bpftool patch (Andrii)
* make sym_size a union with ext_idx (Andrii)
* properly copy/free the jumptables_data section from elf (Andrii)
* a few cosmetic changes around create_jt_map (Andrii)
* fix some comments + rewrite patch description (Andrii)
* inline bpf_prog__append_subprog_offsets (Andrii)
* subprog_sec_offst -> subprog_sec_off (Andrii)
* !strcmp -> strcmp() == 0 (Andrii)
* make some function names more readable (Andrii)
* allocate table of subfunc offsets via libbpf_reallocarray (Andrii)
* selftests:
* squash insn_array* tests together (Anton)
* fixed build warnings (kernel test robot)
RFC -> v1:
* I've tried to address all the comments provided by Alexei and
Eduard in RFC. Will try to list the most important of them below.
* One big change: move from older LLVM version [5] to newer [4].
Now LLVM generates jump tables as symbols in the new special
section ".jumptables". Another part of this change is that
libbpf now doesn't try to link map load and goto *rX, as
1) this is absolutely not reliable 2) for some use cases this
is impossible (namely, when more than one jump table can be used
in the same gotox instruction).
* Added insn_successors() support (Alexei, Eduard). This includes
getting rid of the ugly bpf_insn_set_iter_xlated_offset()
interface (Eduard).
* Removed hack for the unreachable instruction, as new LLVM, thanks to
Eduard, doesn't generate it.
* Set mem_size for direct map access properly instead of hacking.
Remove off>0 check. (Alexei)
* Do not allocate new memory for min_index/max_index (Alexei, Eduard)
* Information required during check_cfg is now cached to be reused
later (Alexei + general logic for supporting multiple JT per jump)
* Properly compare registers in regsafe (Alexei, Eduard)
* Remove support for JMP32 (Eduard)
* Better checks in adjust_ptr_min_max_vals (Eduard)
* More selftests were added (but still there's room for more) which
directly use gotox (Alexei)
* More checks and verbose messages added
* "unique pointers" are no longer in the map
Links:
1. https://lpc.events/event/18/contributions/1941/
2. https://lwn.net/Articles/1017439/
3. https://github.com/llvm/llvm-project/pull/149715
4. https://github.com/llvm/llvm-project/pull/149715#issuecomment-3274833753
5. v1: https://lore.kernel.org/bpf/20250816180631.952085-1-a.s.protopopov@gmail.com/
6. rfc: https://lore.kernel.org/bpf/20250615085943.3871208-1-a.s.protopopov@gmail.com/
Anton Protopopov (13):
bpf: fix the return value of push_stack
bpf: save the start of functions in bpf_prog_aux
bpf, x86: add new map type: instructions array
selftests/bpf: add selftests for new insn_array map
bpf: support instructions arrays with constants blinding
selftests/bpf: test instructions arrays with blinding
bpf, x86: allow indirect jumps to r8...r15
bpf, x86: add support for indirect jumps
bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X
libbpf: fix formatting of bpf_object__append_subprog_code
libbpf: support llvm-generated indirect jumps
bpftool: Recognize insn_array map type
selftests/bpf: add selftests for indirect jumps
arch/x86/net/bpf_jit_comp.c | 39 +-
include/linux/bpf.h | 30 +
include/linux/bpf_types.h | 1 +
include/linux/bpf_verifier.h | 17 +
include/uapi/linux/bpf.h | 11 +
kernel/bpf/Makefile | 2 +-
kernel/bpf/bpf_insn_array.c | 350 ++++++++++
kernel/bpf/core.c | 21 +
kernel/bpf/disasm.c | 9 +
kernel/bpf/log.c | 1 +
kernel/bpf/syscall.c | 22 +
kernel/bpf/verifier.c | 646 ++++++++++++++++--
.../bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
tools/bpf/bpftool/map.c | 2 +-
tools/include/uapi/linux/bpf.h | 11 +
tools/lib/bpf/libbpf.c | 192 +++++-
tools/lib/bpf/libbpf_probes.c | 4 +
tools/lib/bpf/linker.c | 10 +-
tools/testing/selftests/bpf/Makefile | 4 +-
.../selftests/bpf/prog_tests/bpf_gotox.c | 132 ++++
.../selftests/bpf/prog_tests/bpf_insn_array.c | 497 ++++++++++++++
tools/testing/selftests/bpf/progs/bpf_gotox.c | 384 +++++++++++
22 files changed, 2277 insertions(+), 110 deletions(-)
create mode 100644 kernel/bpf/bpf_insn_array.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_gotox.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
create mode 100644 tools/testing/selftests/bpf/progs/bpf_gotox.c
--
2.34.1
* [PATCH v2 bpf-next 01/13] bpf: fix the return value of push_stack
From: Anton Protopopov @ 2025-09-13 19:39 UTC
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
In [1] Eduard mentioned that on push_stack failure the verifier code
should return -ENOMEM instead of -EFAULT. After checking the other
call sites I found that the code inconsistently returns either -ENOMEM
or -EFAULT. This patch unifies the return values for the push_stack
(and similar push_async_cb) functions: they now return an ERR_PTR()
carrying the actual error code, and callers propagate it via PTR_ERR().
[1] https://lore.kernel.org/bpf/20250615085943.3871208-1-a.s.protopopov@gmail.com
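The error-pointer convention used by this patch can be sketched in userspace as follows. This is a minimal illustration only: the `toy_*` helpers are hypothetical stand-ins that imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() under the assumption that the top 4095 addresses are never valid pointers, and `toy_push_stack()` is not the verifier's push_stack().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (imitates ERR_PTR()). */
static void *toy_err_ptr(long error)
{
	assert(error < 0 && error >= -TOY_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* True if the pointer carries an encoded errno (imitates IS_ERR()). */
static int toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Decode the errno stored in an error pointer (imitates PTR_ERR()). */
static long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

struct toy_env { int stack_size; int limit; int fail_alloc; };
struct toy_state { int insn_idx; };

/* Each failure mode reports its own code instead of a bare NULL. */
static struct toy_state *toy_push_stack(struct toy_env *env, int insn_idx)
{
	struct toy_state *st;

	if (env->fail_alloc)		/* simulated allocation failure */
		return toy_err_ptr(-ENOMEM);
	st = calloc(1, sizeof(*st));
	if (!st)
		return toy_err_ptr(-ENOMEM);
	st->insn_idx = insn_idx;
	if (++env->stack_size > env->limit) {
		free(st);
		return toy_err_ptr(-E2BIG);
	}
	return st;
}

/* The caller forwards whatever code the callee chose. */
static int toy_caller(struct toy_env *env, int insn_idx)
{
	struct toy_state *st = toy_push_stack(env, insn_idx);

	if (toy_is_err(st))
		return (int)toy_ptr_err(st);
	free(st);	/* the toy drops the state immediately */
	return 0;
}
```

With a limit of 1, the first `toy_caller()` call succeeds and the second returns -E2BIG, so a caller can tell a complexity limit apart from an allocation failure.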
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
kernel/bpf/verifier.c | 80 +++++++++++++++++++++----------------------
1 file changed, 40 insertions(+), 40 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 17fe623400a5..5b4d28048b19 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2105,7 +2105,7 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
elem = kzalloc(sizeof(struct bpf_verifier_stack_elem), GFP_KERNEL_ACCOUNT);
if (!elem)
- return NULL;
+ return ERR_PTR(-ENOMEM);
elem->insn_idx = insn_idx;
elem->prev_insn_idx = prev_insn_idx;
@@ -2115,12 +2115,12 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
env->stack_size++;
err = copy_verifier_state(&elem->st, cur);
if (err)
- return NULL;
+ return ERR_PTR(-ENOMEM);
elem->st.speculative |= speculative;
if (env->stack_size > BPF_COMPLEXITY_LIMIT_JMP_SEQ) {
verbose(env, "The sequence of %d jumps is too complex.\n",
env->stack_size);
- return NULL;
+ return ERR_PTR(-E2BIG);
}
if (elem->st.parent) {
++elem->st.parent->branches;
@@ -2917,7 +2917,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
elem = kzalloc(sizeof(struct bpf_verifier_stack_elem), GFP_KERNEL_ACCOUNT);
if (!elem)
- return NULL;
+ return ERR_PTR(-ENOMEM);
elem->insn_idx = insn_idx;
elem->prev_insn_idx = prev_insn_idx;
@@ -2929,7 +2929,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
verbose(env,
"The sequence of %d jumps is too complex for async cb.\n",
env->stack_size);
- return NULL;
+ return ERR_PTR(-E2BIG);
}
/* Unlike push_stack() do not copy_verifier_state().
* The caller state doesn't matter.
@@ -2940,7 +2940,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
elem->st.in_sleepable = is_sleepable;
frame = kzalloc(sizeof(*frame), GFP_KERNEL_ACCOUNT);
if (!frame)
- return NULL;
+ return ERR_PTR(-ENOMEM);
init_func_state(env, frame,
BPF_MAIN_FUNC /* callsite */,
0 /* frameno within this callchain */,
@@ -9055,8 +9055,8 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
prev_st = find_prev_entry(env, cur_st->parent, insn_idx);
/* branch out active iter state */
queued_st = push_stack(env, insn_idx + 1, insn_idx, false);
- if (!queued_st)
- return -ENOMEM;
+ if (IS_ERR(queued_st))
+ return PTR_ERR(queued_st);
queued_iter = get_iter_from_state(queued_st, meta);
queued_iter->iter.state = BPF_ITER_STATE_ACTIVE;
@@ -10626,8 +10626,8 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
async_cb = push_async_cb(env, env->subprog_info[subprog].start,
insn_idx, subprog,
is_bpf_wq_set_callback_impl_kfunc(insn->imm));
- if (!async_cb)
- return -EFAULT;
+ if (IS_ERR(async_cb))
+ return PTR_ERR(async_cb);
callee = async_cb->frame[0];
callee->async_entry_cnt = caller->async_entry_cnt + 1;
@@ -10643,8 +10643,8 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
* proceed with next instruction within current frame.
*/
callback_state = push_stack(env, env->subprog_info[subprog].start, insn_idx, false);
- if (!callback_state)
- return -ENOMEM;
+ if (IS_ERR(callback_state))
+ return PTR_ERR(callback_state);
err = setup_func_entry(env, subprog, insn_idx, set_callee_state_cb,
callback_state);
@@ -13793,9 +13793,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
struct bpf_reg_state *regs;
branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false);
- if (!branch) {
+ if (IS_ERR(branch)) {
verbose(env, "failed to push state for failed lock acquisition\n");
- return -ENOMEM;
+ return PTR_ERR(branch);
}
regs = branch->frame[branch->curframe]->regs;
@@ -14223,16 +14223,15 @@ struct bpf_sanitize_info {
bool mask_to_left;
};
-static struct bpf_verifier_state *
-sanitize_speculative_path(struct bpf_verifier_env *env,
- const struct bpf_insn *insn,
- u32 next_idx, u32 curr_idx)
+static int sanitize_speculative_path(struct bpf_verifier_env *env,
+ const struct bpf_insn *insn,
+ u32 next_idx, u32 curr_idx)
{
struct bpf_verifier_state *branch;
struct bpf_reg_state *regs;
branch = push_stack(env, next_idx, curr_idx, true);
- if (branch && insn) {
+ if (!IS_ERR(branch) && insn) {
regs = branch->frame[branch->curframe]->regs;
if (BPF_SRC(insn->code) == BPF_K) {
mark_reg_unknown(env, regs, insn->dst_reg);
@@ -14241,7 +14240,7 @@ sanitize_speculative_path(struct bpf_verifier_env *env,
mark_reg_unknown(env, regs, insn->src_reg);
}
}
- return branch;
+ return IS_ERR(branch) ? PTR_ERR(branch) : 0;
}
static int sanitize_ptr_alu(struct bpf_verifier_env *env,
@@ -14260,7 +14259,6 @@ static int sanitize_ptr_alu(struct bpf_verifier_env *env,
u8 opcode = BPF_OP(insn->code);
u32 alu_state, alu_limit;
struct bpf_reg_state tmp;
- bool ret;
int err;
if (can_skip_alu_sanitation(env, insn))
@@ -14333,11 +14331,12 @@ static int sanitize_ptr_alu(struct bpf_verifier_env *env,
tmp = *dst_reg;
copy_register_state(dst_reg, ptr_reg);
}
- ret = sanitize_speculative_path(env, NULL, env->insn_idx + 1,
- env->insn_idx);
- if (!ptr_is_dst_reg && ret)
+ err = sanitize_speculative_path(env, NULL, env->insn_idx + 1, env->insn_idx);
+ if (err < 0)
+ return REASON_STACK;
+ if (!ptr_is_dst_reg)
*dst_reg = tmp;
- return !ret ? REASON_STACK : 0;
+ return 0;
}
static void sanitize_mark_insn_seen(struct bpf_verifier_env *env)
@@ -16660,8 +16659,8 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
/* branch out 'fallthrough' insn as a new state to explore */
queued_st = push_stack(env, idx + 1, idx, false);
- if (!queued_st)
- return -ENOMEM;
+ if (IS_ERR(queued_st))
+ return PTR_ERR(queued_st);
queued_st->may_goto_depth++;
if (prev_st)
@@ -16739,10 +16738,11 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
* the fall-through branch for simulation under speculative
* execution.
*/
- if (!env->bypass_spec_v1 &&
- !sanitize_speculative_path(env, insn, *insn_idx + 1,
- *insn_idx))
- return -EFAULT;
+ if (!env->bypass_spec_v1) {
+ err = sanitize_speculative_path(env, insn, *insn_idx + 1, *insn_idx);
+ if (err < 0)
+ return err;
+ }
if (env->log.level & BPF_LOG_LEVEL)
print_insn_state(env, this_branch, this_branch->curframe);
*insn_idx += insn->off;
@@ -16752,11 +16752,12 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
* program will go. If needed, push the goto branch for
* simulation under speculative execution.
*/
- if (!env->bypass_spec_v1 &&
- !sanitize_speculative_path(env, insn,
- *insn_idx + insn->off + 1,
- *insn_idx))
- return -EFAULT;
+ if (!env->bypass_spec_v1) {
+ err = sanitize_speculative_path(env, insn, *insn_idx + insn->off + 1,
+ *insn_idx);
+ if (err < 0)
+ return err;
+ }
if (env->log.level & BPF_LOG_LEVEL)
print_insn_state(env, this_branch, this_branch->curframe);
return 0;
@@ -16777,10 +16778,9 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
return err;
}
- other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx,
- false);
- if (!other_branch)
- return -EFAULT;
+ other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx, false);
+ if (IS_ERR(other_branch))
+ return PTR_ERR(other_branch);
other_branch_regs = other_branch->frame[other_branch->curframe]->regs;
if (BPF_SRC(insn->code) == BPF_X) {
--
2.34.1
* [PATCH v2 bpf-next 02/13] bpf: save the start of functions in bpf_prog_aux
From: Anton Protopopov @ 2025-09-13 19:39 UTC
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Introduce a new subprog_start field in bpf_prog_aux. This field may
be used by JIT compilers that need to know the real absolute xlated
offset of the function being jitted. func_info[func_id] could have
served this purpose, but func_info may be NULL, so JIT compilers
can't rely on it.
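A later patch in this series uses the field in the x86 JIT as `bpf_prog->aux->subprog_start + i - 1`, where `i` is the 1-based loop counter over the subprogram's instructions. The arithmetic can be sketched as below; `toy_abs_xlated_off` is a hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Absolute xlated offset of the i-th instruction (1-based) of a
 * subprogram that starts at xlated offset subprog_start.
 */
static uint32_t toy_abs_xlated_off(uint32_t subprog_start, uint32_t i)
{
	assert(i >= 1);
	return subprog_start + i - 1;
}
```

For a subprogram starting at xlated offset 40, its third instruction sits at absolute offset 42.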
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
include/linux/bpf.h | 1 +
kernel/bpf/verifier.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 41f776071ff5..1056fd0d54d3 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1601,6 +1601,7 @@ struct bpf_prog_aux {
u32 ctx_arg_info_size;
u32 max_rdonly_access;
u32 max_rdwr_access;
+ u32 subprog_start;
struct btf *attach_btf;
struct bpf_ctx_arg_aux *ctx_arg_info;
void __percpu *priv_stack_ptr;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5b4d28048b19..14c0c6fe9416 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -21597,6 +21597,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
func[i]->aux->func_idx = i;
/* Below members will be freed only at prog->aux */
func[i]->aux->btf = prog->aux->btf;
+ func[i]->aux->subprog_start = subprog_start;
func[i]->aux->func_info = prog->aux->func_info;
func[i]->aux->func_info_cnt = prog->aux->func_info_cnt;
func[i]->aux->poke_tab = prog->aux->poke_tab;
--
2.34.1
* [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array
2025-09-13 19:39 [PATCH v2 bpf-next 00/13] BPF indirect jumps Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 01/13] bpf: fix the return value of push_stack Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 02/13] bpf: save the start of functions in bpf_prog_aux Anton Protopopov
@ 2025-09-13 19:39 ` Anton Protopopov
2025-09-15 4:09 ` kernel test robot
2025-09-20 0:30 ` Alexei Starovoitov
2025-09-13 19:39 ` [PATCH v2 bpf-next 04/13] selftests/bpf: add selftests for new insn_array map Anton Protopopov
` (9 subsequent siblings)
12 siblings, 2 replies; 26+ messages in thread
From: Anton Protopopov @ 2025-09-13 19:39 UTC
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
On the bpf(BPF_PROG_LOAD) syscall, user-supplied BPF programs are
translated by the verifier into "xlated" BPF programs. During this
process the original instruction offsets might be adjusted and/or
individual instructions might be replaced by new sets of instructions,
or deleted.
Add a new BPF map type which keeps track of how, for a given program,
the original instructions were relocated during verification. Besides
keeping track of the original -> xlated mapping, make the x86 JIT
build the xlated -> jitted mapping for every instruction listed in an
instruction array. This is required for every future application of
instruction arrays: static keys, indirect jumps and indirect calls.
A map of the BPF_MAP_TYPE_INSN_ARRAY type must be created with u32
keys and 8-byte values. The values have different semantics for
userspace and for BPF space. For userspace a value consists of two
u32 values: the xlated and jitted offsets. For the BPF side the value
is a real pointer to a jitted instruction.
On map creation/initialization, before loading the program, each
element of the map should be initialized to point to an instruction
offset within the program. Such maps should be frozen before the
program is loaded. After program verification, the xlated and jitted
offsets can be read via the bpf(2) syscall.
If a tracked instruction is removed by the verifier, then the xlated
offset is set to (u32)-1 which is considered to be too big for a valid
BPF program offset.
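The userspace view of a map value described above can be sketched like this. The struct is a local mirror of struct bpf_insn_array_value as added to the UAPI header by this patch; the `toy_*` names are hypothetical, and the helpers only illustrate how a value might be prepared before load and interpreted afterwards. They do not call bpf(2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct bpf_insn_array_value from this patch. */
struct toy_insn_array_value {
	uint32_t jitted_off;
	uint32_t xlated_off;
};

/* Marker the kernel stores in xlated_off for a removed instruction. */
#define TOY_INSN_DELETED ((uint32_t)-1)

_Static_assert(sizeof(struct toy_insn_array_value) == 8,
	       "map value must be 8 bytes");

/* Value written before load: only xlated_off is set, jitted_off is 0. */
static struct toy_insn_array_value toy_make_value(uint32_t insn_off)
{
	struct toy_insn_array_value v = {
		.jitted_off = 0,
		.xlated_off = insn_off,
	};

	assert(insn_off != TOY_INSN_DELETED);
	return v;
}

/*
 * Interpret 8 raw bytes read back after load. Returns false if the
 * tracked instruction was removed by the verifier.
 */
static bool toy_insn_is_tracked(const void *raw,
				struct toy_insn_array_value *out)
{
	memcpy(out, raw, sizeof(*out));
	return out->xlated_off != TOY_INSN_DELETED;
}
```

Checking for the deleted marker before using `xlated_off` or `jitted_off` avoids treating (u32)-1 as a real offset.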
One such map can be used to track one and only one BPF program. If
verification fails, the same map can be re-used to verify the program
with a different log level. However, if the program was loaded
successfully, then such a map, being frozen in any case, can't be
reused by other programs even after the program is released.
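The ownership rules above correspond to the FREE/INIT/READY states in bpf_insn_array.c. A simplified userspace model of those transitions, without locking and with hypothetical `toy_*` names, might look like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_state {
	TOY_STATE_FREE = 0,	/* no program owns the map */
	TOY_STATE_INIT,		/* a program load is in progress */
	TOY_STATE_READY,	/* a program was loaded successfully */
};

struct toy_insn_array {
	enum toy_state state;
};

/* Start of a program load: only a free map can be claimed. */
static int toy_array_init(struct toy_insn_array *a)
{
	assert(a != NULL);
	if (a->state != TOY_STATE_FREE)
		return -EBUSY;
	a->state = TOY_STATE_INIT;
	return 0;
}

/*
 * End of a program load. On failure the map becomes free again so
 * the load can be retried, e.g. with a different log level.
 */
static int toy_array_ready(struct toy_insn_array *a, bool all_ips_known)
{
	assert(a != NULL);
	if (!all_ips_known) {
		a->state = TOY_STATE_FREE;
		return -EFAULT;
	}
	a->state = TOY_STATE_READY;
	return 0;
}

/* Element updates from userspace are refused once the map is in use. */
static int toy_array_update(const struct toy_insn_array *a)
{
	assert(a != NULL);
	return a->state == TOY_STATE_FREE ? 0 : -EBUSY;
}
```

A failed load returns the map to the free state, while a successful one leaves it in the ready state, where both a second `toy_array_init()` and `toy_array_update()` return -EBUSY.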
Example. Consider the following original and xlated programs:
Original prog:                  Xlated prog:
 0: r1 = 0x0                     0: r1 = 0
 1: *(u32 *)(r10 - 0x4) = r1     1: *(u32 *)(r10 -4) = r1
 2: r2 = r10                     2: r2 = r10
 3: r2 += -0x4                   3: r2 += -4
 4: r1 = 0x0 ll                  4: r1 = map[id:88]
 6: call 0x1                     6: r1 += 272
                                 7: r0 = *(u32 *)(r2 +0)
                                 8: if r0 >= 0x1 goto pc+3
                                 9: r0 <<= 3
                                10: r0 += r1
                                11: goto pc+1
                                12: r0 = 0
 7: r6 = r0                     13: r6 = r0
 8: if r6 == 0x0 goto +0x2      14: if r6 == 0x0 goto pc+4
 9: call 0x76                   15: r0 = 0xffffffff8d2079c0
                                17: r0 = *(u64 *)(r0 +0)
10: *(u64 *)(r6 + 0x0) = r0     18: *(u64 *)(r6 +0) = r0
11: r0 = 0x0                    19: r0 = 0x0
12: exit                        20: exit
An instruction array map containing, e.g., instructions [0,4,7,12]
will be translated by the verifier to [0,4,13,20]. A map with
index 5 (the middle of a 16-byte instruction) or indexes greater than 12
(outside the program boundaries) would be rejected.
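The two rejection cases mirror the checks done by valid_offsets() in this patch: an offset must lie inside the program and must not point at the second half of a 16-byte (BPF_LD | BPF_DW | BPF_IMM) instruction. A userspace sketch of that rule, applied to the original program from the example, is shown below. The `toy_*` names are hypothetical and the opcode bytes are my own encoding of the listing, so treat them as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LD_IMM64 0x18	/* BPF_LD | BPF_DW | BPF_IMM */

/* Opcode byte of each 8-byte slot of the original program above. */
static const uint8_t toy_prog_codes[13] = {
	0xb7,	/*  0: r1 = 0x0 */
	0x63,	/*  1: *(u32 *)(r10 - 0x4) = r1 */
	0xbf,	/*  2: r2 = r10 */
	0x07,	/*  3: r2 += -0x4 */
	0x18,	/*  4: r1 = 0x0 ll (first half) */
	0x00,	/*  5: second half of the 16-byte instruction */
	0x85,	/*  6: call 0x1 */
	0xbf,	/*  7: r6 = r0 */
	0x15,	/*  8: if r6 == 0x0 goto +0x2 */
	0x85,	/*  9: call 0x76 */
	0x7b,	/* 10: *(u64 *)(r6 + 0x0) = r0 */
	0xb7,	/* 11: r0 = 0x0 */
	0x95,	/* 12: exit */
};

/* An offset is trackable if it is in range and starts an instruction. */
static bool toy_offset_is_valid(const uint8_t *codes, uint32_t len,
				uint32_t off)
{
	assert(codes != NULL);
	if (off >= len)
		return false;
	if (off > 0 && codes[off - 1] == TOY_LD_IMM64)
		return false;
	return true;
}
```

Offsets 0, 4, 7 and 12 pass this check for the example program, while 5 and 13 do not.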
The functionality provided by this patch will be extended in subsequent
patches to implement BPF Static Keys, indirect jumps, and indirect calls.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
arch/x86/net/bpf_jit_comp.c | 8 +
include/linux/bpf.h | 28 +++
include/linux/bpf_types.h | 1 +
include/linux/bpf_verifier.h | 2 +
include/uapi/linux/bpf.h | 11 ++
kernel/bpf/Makefile | 2 +-
kernel/bpf/bpf_insn_array.c | 336 +++++++++++++++++++++++++++++++++
kernel/bpf/syscall.c | 22 +++
kernel/bpf/verifier.c | 43 +++++
tools/include/uapi/linux/bpf.h | 11 ++
10 files changed, 463 insertions(+), 1 deletion(-)
create mode 100644 kernel/bpf/bpf_insn_array.c
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 8d34a9400a5e..8792d7f371d3 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1664,6 +1664,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
prog = temp;
for (i = 1; i <= insn_cnt; i++, insn++) {
+ u32 abs_xlated_off = bpf_prog->aux->subprog_start + i - 1;
const s32 imm32 = insn->imm;
u32 dst_reg = insn->dst_reg;
u32 src_reg = insn->src_reg;
@@ -2717,6 +2718,13 @@ st: if (is_imm8(insn->off))
return -EFAULT;
}
memcpy(rw_image + proglen, temp, ilen);
+
+ /*
+ * Instruction arrays need to know how xlated code
+ * maps to jitted code
+ */
+ bpf_prog_update_insn_ptr(bpf_prog, abs_xlated_off, proglen,
+ image + proglen);
}
proglen += ilen;
addrs[i] = proglen;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1056fd0d54d3..77fcb508d6ae 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3717,4 +3717,32 @@ int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char *
const char **linep, int *nump);
struct bpf_prog *bpf_prog_find_from_stack(void);
+int bpf_insn_array_init(struct bpf_map *map, const struct bpf_prog *prog);
+int bpf_insn_array_ready(struct bpf_map *map);
+void bpf_insn_array_release(struct bpf_map *map);
+void bpf_insn_array_adjust(struct bpf_map *map, u32 off, u32 len);
+void bpf_insn_array_adjust_after_remove(struct bpf_map *map, u32 off, u32 len);
+
+/*
+ * The struct bpf_insn_ptr structure describes a pointer to a
+ * particular instruction in a loaded BPF program. Initially
+ * it is initialised from userspace via user_value.xlated_off.
+ * During the program verification all other fields are populated
+ * accordingly:
+ *
+ * jitted_ip: address of the instruction in the jitted image
+ * user_value: user-visible xlated and jitted offsets
+ * orig_xlated_off: original offset of the instruction
+ */
+struct bpf_insn_ptr {
+ void *jitted_ip;
+ struct bpf_insn_array_value user_value;
+ u32 orig_xlated_off;
+};
+
+void bpf_prog_update_insn_ptr(struct bpf_prog *prog,
+ u32 xlated_off,
+ u32 jitted_off,
+ void *jitted_ip);
+
#endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index fa78f49d4a9a..b13de31e163f 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -133,6 +133,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_ARENA, arena_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_INSN_ARRAY, insn_array_map_ops)
BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 020de62bd09c..aca43c284203 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -766,8 +766,10 @@ struct bpf_verifier_env {
struct list_head free_list; /* list of struct bpf_verifier_state_list */
struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
struct btf_mod_pair used_btfs[MAX_USED_BTFS]; /* array of BTF's used by BPF program */
+ struct bpf_map *insn_array_maps[MAX_USED_MAPS]; /* array of INSN_ARRAY map's to be relocated */
u32 used_map_cnt; /* number of used maps */
u32 used_btf_cnt; /* number of used BTF objects */
+ u32 insn_array_map_cnt; /* number of used maps of type BPF_MAP_TYPE_INSN_ARRAY */
u32 id_gen; /* used to generate unique reg IDs */
u32 hidden_subprog_cnt; /* number of hidden subprogs */
int exception_callback_subprog;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 233de8677382..021c27ee5591 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1026,6 +1026,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_USER_RINGBUF,
BPF_MAP_TYPE_CGRP_STORAGE,
BPF_MAP_TYPE_ARENA,
+ BPF_MAP_TYPE_INSN_ARRAY,
__MAX_BPF_MAP_TYPE
};
@@ -7623,4 +7624,14 @@ enum bpf_kfunc_flags {
BPF_F_PAD_ZEROS = (1ULL << 0),
};
+/*
+ * Values of a BPF_MAP_TYPE_INSN_ARRAY entry must be of this type.
+ * On updates jitted_off must be equal to 0.
+ */
+struct bpf_insn_array_value {
+ __u32 jitted_off;
+ __u32 xlated_off;
+};
+
+
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index f6cf8c2af5f7..e596b66a48e6 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -9,7 +9,7 @@ CFLAGS_core.o += -Wno-override-init $(cflags-nogcse-yy)
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
-obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
+obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o bpf_insn_array.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
diff --git a/kernel/bpf/bpf_insn_array.c b/kernel/bpf/bpf_insn_array.c
new file mode 100644
index 000000000000..0c8dac62f457
--- /dev/null
+++ b/kernel/bpf/bpf_insn_array.c
@@ -0,0 +1,336 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/bpf.h>
+#include <linux/sort.h>
+
+#define MAX_INSN_ARRAY_ENTRIES 256
+
+struct bpf_insn_array {
+ struct bpf_map map;
+ struct mutex state_mutex;
+ int state;
+ long *ips;
+ DECLARE_FLEX_ARRAY(struct bpf_insn_ptr, ptrs);
+};
+
+enum {
+ INSN_ARRAY_STATE_FREE = 0,
+ INSN_ARRAY_STATE_INIT,
+ INSN_ARRAY_STATE_READY,
+};
+
+#define cast_insn_array(MAP_PTR) \
+ container_of(MAP_PTR, struct bpf_insn_array, map)
+
+#define INSN_DELETED ((u32)-1)
+
+static inline u32 insn_array_alloc_size(u32 max_entries)
+{
+ const u32 base_size = sizeof(struct bpf_insn_array);
+ const u32 entry_size = sizeof(struct bpf_insn_ptr);
+
+ return base_size + entry_size * max_entries;
+}
+
+static int insn_array_alloc_check(union bpf_attr *attr)
+{
+ if (attr->max_entries == 0 ||
+ attr->key_size != 4 ||
+ attr->value_size != 8 ||
+ attr->map_flags != 0)
+ return -EINVAL;
+
+ if (attr->max_entries > MAX_INSN_ARRAY_ENTRIES)
+ return -E2BIG;
+
+ return 0;
+}
+
+static void insn_array_free(struct bpf_map *map)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+
+ kfree(insn_array->ips);
+ bpf_map_area_free(insn_array);
+}
+
+static struct bpf_map *insn_array_alloc(union bpf_attr *attr)
+{
+ u64 size = insn_array_alloc_size(attr->max_entries);
+ struct bpf_insn_array *insn_array;
+
+ insn_array = bpf_map_area_alloc(size, NUMA_NO_NODE);
+ if (!insn_array)
+ return ERR_PTR(-ENOMEM);
+
+ insn_array->ips = kcalloc(attr->max_entries, sizeof(long), GFP_KERNEL);
+ if (!insn_array->ips) {
+ insn_array_free(&insn_array->map);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ bpf_map_init_from_attr(&insn_array->map, attr);
+
+ mutex_init(&insn_array->state_mutex);
+ insn_array->state = INSN_ARRAY_STATE_FREE;
+
+ return &insn_array->map;
+}
+
+static int insn_array_get_next_key(struct bpf_map *map, void *key, void *next_key)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ u32 index = key ? *(u32 *)key : U32_MAX;
+ u32 *next = (u32 *)next_key;
+
+ if (index >= insn_array->map.max_entries) {
+ *next = 0;
+ return 0;
+ }
+
+ if (index == insn_array->map.max_entries - 1)
+ return -ENOENT;
+
+ *next = index + 1;
+ return 0;
+}
+
+static void *insn_array_lookup_elem(struct bpf_map *map, void *key)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ u32 index = *(u32 *)key;
+
+ if (unlikely(index >= insn_array->map.max_entries))
+ return NULL;
+
+ return &insn_array->ptrs[index].user_value;
+}
+
+static long insn_array_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ u32 index = *(u32 *)key;
+ struct bpf_insn_array_value val = {};
+ int err = 0;
+
+ if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
+ return -EINVAL;
+
+ if (unlikely(index >= insn_array->map.max_entries))
+ return -E2BIG;
+
+ if (unlikely(map_flags & BPF_NOEXIST))
+ return -EEXIST;
+
+ /* No updates for maps in use */
+ if (!mutex_trylock(&insn_array->state_mutex))
+ return -EBUSY;
+
+ if (insn_array->state != INSN_ARRAY_STATE_FREE) {
+ err = -EBUSY;
+ goto unlock;
+ }
+
+ copy_map_value(map, &val, value);
+ if (val.jitted_off || val.xlated_off == INSN_DELETED) {
+ err = -EINVAL;
+ goto unlock;
+ }
+
+ insn_array->ptrs[index].orig_xlated_off = val.xlated_off;
+ insn_array->ptrs[index].user_value.xlated_off = val.xlated_off;
+
+unlock:
+ mutex_unlock(&insn_array->state_mutex);
+ return err;
+}
+
+static long insn_array_delete_elem(struct bpf_map *map, void *key)
+{
+ return -EINVAL;
+}
+
+static int insn_array_check_btf(const struct bpf_map *map,
+ const struct btf *btf,
+ const struct btf_type *key_type,
+ const struct btf_type *value_type)
+{
+ if (!btf_type_is_i32(key_type))
+ return -EINVAL;
+
+ if (!btf_type_is_i64(value_type))
+ return -EINVAL;
+
+ return 0;
+}
+
+static u64 insn_array_mem_usage(const struct bpf_map *map)
+{
+ u64 extra_size = 0;
+
+ extra_size += sizeof(long) * map->max_entries; /* insn_array->ips */
+
+ return insn_array_alloc_size(map->max_entries) + extra_size;
+}
+
+BTF_ID_LIST_SINGLE(insn_array_btf_ids, struct, bpf_insn_array)
+
+const struct bpf_map_ops insn_array_map_ops = {
+ .map_alloc_check = insn_array_alloc_check,
+ .map_alloc = insn_array_alloc,
+ .map_free = insn_array_free,
+ .map_get_next_key = insn_array_get_next_key,
+ .map_lookup_elem = insn_array_lookup_elem,
+ .map_update_elem = insn_array_update_elem,
+ .map_delete_elem = insn_array_delete_elem,
+ .map_check_btf = insn_array_check_btf,
+ .map_mem_usage = insn_array_mem_usage,
+ .map_btf_id = &insn_array_btf_ids[0],
+};
+
+static bool is_insn_array(const struct bpf_map *map)
+{
+ return map->map_type == BPF_MAP_TYPE_INSN_ARRAY;
+}
+
+static inline bool valid_offsets(const struct bpf_insn_array *insn_array,
+ const struct bpf_prog *prog)
+{
+ u32 off;
+ int i;
+
+ for (i = 0; i < insn_array->map.max_entries; i++) {
+ off = insn_array->ptrs[i].orig_xlated_off;
+
+ if (off >= prog->len)
+ return false;
+
+ if (off > 0) {
+ if (prog->insnsi[off-1].code == (BPF_LD | BPF_DW | BPF_IMM))
+ return false;
+ }
+ }
+
+ return true;
+}
+
+int bpf_insn_array_init(struct bpf_map *map, const struct bpf_prog *prog)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ int i;
+
+ if (!valid_offsets(insn_array, prog))
+ return -EINVAL;
+
+ /*
+ * There can be only one program using the map
+ */
+ mutex_lock(&insn_array->state_mutex);
+ if (insn_array->state != INSN_ARRAY_STATE_FREE) {
+ mutex_unlock(&insn_array->state_mutex);
+ return -EBUSY;
+ }
+ insn_array->state = INSN_ARRAY_STATE_INIT;
+ mutex_unlock(&insn_array->state_mutex);
+
+ /*
+ * Reset all the map indexes to the original values. This is needed,
+ * e.g., when a replay of verification with different log level should
+ * be performed.
+ */
+ for (i = 0; i < map->max_entries; i++)
+ insn_array->ptrs[i].user_value.xlated_off = insn_array->ptrs[i].orig_xlated_off;
+
+ return 0;
+}
+
+int bpf_insn_array_ready(struct bpf_map *map)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ guard(mutex)(&insn_array->state_mutex);
+ int i;
+
+ for (i = 0; i < map->max_entries; i++) {
+ if (insn_array->ptrs[i].user_value.xlated_off == INSN_DELETED)
+ continue;
+ if (!insn_array->ips[i]) {
+ /*
+ * Set the map free on failure; the program owning it
+ * might be re-loaded with different log level
+ */
+ insn_array->state = INSN_ARRAY_STATE_FREE;
+ return -EFAULT;
+ }
+ }
+
+ insn_array->state = INSN_ARRAY_STATE_READY;
+ return 0;
+}
+
+void bpf_insn_array_release(struct bpf_map *map)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ guard(mutex)(&insn_array->state_mutex);
+
+ insn_array->state = INSN_ARRAY_STATE_FREE;
+}
+
+void bpf_insn_array_adjust(struct bpf_map *map, u32 off, u32 len)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ int i;
+
+ if (len <= 1)
+ return;
+
+ for (i = 0; i < map->max_entries; i++) {
+ if (insn_array->ptrs[i].user_value.xlated_off <= off)
+ continue;
+ if (insn_array->ptrs[i].user_value.xlated_off == INSN_DELETED)
+ continue;
+ insn_array->ptrs[i].user_value.xlated_off += len - 1;
+ }
+}
+
+void bpf_insn_array_adjust_after_remove(struct bpf_map *map, u32 off, u32 len)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+ int i;
+
+ for (i = 0; i < map->max_entries; i++) {
+ if (insn_array->ptrs[i].user_value.xlated_off < off)
+ continue;
+ if (insn_array->ptrs[i].user_value.xlated_off == INSN_DELETED)
+ continue;
+ if (insn_array->ptrs[i].user_value.xlated_off >= off &&
+ insn_array->ptrs[i].user_value.xlated_off < off + len)
+ insn_array->ptrs[i].user_value.xlated_off = INSN_DELETED;
+ else
+ insn_array->ptrs[i].user_value.xlated_off -= len;
+ }
+}
+
+void bpf_prog_update_insn_ptr(struct bpf_prog *prog,
+ u32 xlated_off,
+ u32 jitted_off,
+ void *jitted_ip)
+{
+ struct bpf_insn_array *insn_array;
+ struct bpf_map *map;
+ int i, j;
+
+ for (i = 0; i < prog->aux->used_map_cnt; i++) {
+ map = prog->aux->used_maps[i];
+ if (!is_insn_array(map))
+ continue;
+
+ insn_array = cast_insn_array(map);
+ for (j = 0; j < map->max_entries; j++) {
+ if (insn_array->ptrs[j].user_value.xlated_off == xlated_off) {
+ insn_array->ips[j] = (long)jitted_ip;
+ insn_array->ptrs[j].jitted_ip = jitted_ip;
+ insn_array->ptrs[j].user_value.jitted_off = jitted_off;
+ }
+ }
+ }
+}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3f178a0f8eb1..7b4e7a053aa0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1461,6 +1461,7 @@ static int map_create(union bpf_attr *attr, bool kernel)
case BPF_MAP_TYPE_STRUCT_OPS:
case BPF_MAP_TYPE_CPUMAP:
case BPF_MAP_TYPE_ARENA:
+ case BPF_MAP_TYPE_INSN_ARRAY:
if (!bpf_token_capable(token, CAP_BPF))
goto put_token;
break;
@@ -2761,6 +2762,23 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
}
}
+static int bpf_prog_mark_insn_arrays_ready(struct bpf_prog *prog)
+{
+ int err;
+ int i;
+
+ for (i = 0; i < prog->aux->used_map_cnt; i++) {
+ if (prog->aux->used_maps[i]->map_type != BPF_MAP_TYPE_INSN_ARRAY)
+ continue;
+
+ err = bpf_insn_array_ready(prog->aux->used_maps[i]);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
/* last field in 'union bpf_attr' used by this command */
#define BPF_PROG_LOAD_LAST_FIELD fd_array_cnt
@@ -2984,6 +3002,10 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
if (err < 0)
goto free_used_maps;
+ err = bpf_prog_mark_insn_arrays_ready(prog);
+ if (err < 0)
+ goto free_used_maps;
+
err = bpf_prog_alloc_id(prog);
if (err)
goto free_used_maps;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 14c0c6fe9416..1f1708fd76c4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10093,6 +10093,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
func_id != BPF_FUNC_map_push_elem)
goto error;
break;
+ case BPF_MAP_TYPE_INSN_ARRAY:
+ goto error;
default:
break;
}
@@ -20517,6 +20519,15 @@ static int __add_used_map(struct bpf_verifier_env *env, struct bpf_map *map)
env->used_maps[env->used_map_cnt++] = map;
+ if (map->map_type == BPF_MAP_TYPE_INSN_ARRAY) {
+ err = bpf_insn_array_init(map, env->prog);
+ if (err) {
+ verbose(env, "Failed to properly initialize insn array\n");
+ return err;
+ }
+ env->insn_array_maps[env->insn_array_map_cnt++] = map;
+ }
+
return env->used_map_cnt - 1;
}
@@ -20763,6 +20774,33 @@ static void adjust_subprog_starts(struct bpf_verifier_env *env, u32 off, u32 len
}
}
+static void release_insn_arrays(struct bpf_verifier_env *env)
+{
+ int i;
+
+ for (i = 0; i < env->insn_array_map_cnt; i++)
+ bpf_insn_array_release(env->insn_array_maps[i]);
+}
+
+static void adjust_insn_arrays(struct bpf_verifier_env *env, u32 off, u32 len)
+{
+ int i;
+
+ if (len == 1)
+ return;
+
+ for (i = 0; i < env->insn_array_map_cnt; i++)
+ bpf_insn_array_adjust(env->insn_array_maps[i], off, len);
+}
+
+static void adjust_insn_arrays_after_remove(struct bpf_verifier_env *env, u32 off, u32 len)
+{
+ int i;
+
+ for (i = 0; i < env->insn_array_map_cnt; i++)
+ bpf_insn_array_adjust_after_remove(env->insn_array_maps[i], off, len);
+}
+
static void adjust_poke_descs(struct bpf_prog *prog, u32 off, u32 len)
{
struct bpf_jit_poke_descriptor *tab = prog->aux->poke_tab;
@@ -20805,6 +20843,7 @@ static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 of
}
adjust_insn_aux_data(env, new_prog, off, len);
adjust_subprog_starts(env, off, len);
+ adjust_insn_arrays(env, off, len);
adjust_poke_descs(new_prog, off, len);
return new_prog;
}
@@ -20988,6 +21027,8 @@ static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
if (err)
return err;
+ adjust_insn_arrays_after_remove(env, off, cnt);
+
memmove(aux_data + off, aux_data + off + cnt,
sizeof(*aux_data) * (orig_prog_len - off - cnt));
@@ -24836,6 +24877,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
adjust_btf_func(env);
err_release_maps:
+ if (ret)
+ release_insn_arrays(env);
if (!env->prog->aux->used_maps)
/* if we didn't copy map pointers into bpf_prog_info, release
* them now. Otherwise free_used_maps() will release them.
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 233de8677382..021c27ee5591 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1026,6 +1026,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_USER_RINGBUF,
BPF_MAP_TYPE_CGRP_STORAGE,
BPF_MAP_TYPE_ARENA,
+ BPF_MAP_TYPE_INSN_ARRAY,
__MAX_BPF_MAP_TYPE
};
@@ -7623,4 +7624,14 @@ enum bpf_kfunc_flags {
BPF_F_PAD_ZEROS = (1ULL << 0),
};
+/*
+ * Values of a BPF_MAP_TYPE_INSN_ARRAY entry must be of this type.
+ * On updates jitted_off must be equal to 0.
+ */
+struct bpf_insn_array_value {
+ __u32 jitted_off;
+ __u32 xlated_off;
+};
+
+
#endif /* _UAPI__LINUX_BPF_H__ */
--
2.34.1
* [PATCH v2 bpf-next 04/13] selftests/bpf: add selftests for new insn_array map
From: Anton Protopopov @ 2025-09-13 19:39 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Add the following selftests for the new insn_array map:
* Incorrect instruction indexes are rejected
* Two programs can't use the same map
* BPF progs can't operate on the map
* No changes to the code => the map is unchanged
* Expected changes when instructions are added
* Expected changes when instructions are deleted
* Expected changes when multiple functions are present
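The "instructions are added" and "instructions are deleted" cases can be reasoned about with a small user-space model of the two adjustment helpers from the previous patch. This is only a sketch: the sim_* names are hypothetical, and INSN_DELETED is assumed to be the all-ones u32, which is what the -1 entries in the expected map_out arrays suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: INSN_DELETED is (u32)-1, matching the -1 values the tests expect */
#define INSN_DELETED ((uint32_t)-1)

/*
 * Model of bpf_insn_array_adjust(): the instruction at 'off' was replaced
 * by 'len' instructions, so every tracked offset after 'off' moves
 * forward by len - 1.
 */
static void sim_adjust(uint32_t *xlated, size_t n, uint32_t off, uint32_t len)
{
	size_t i;

	assert(xlated != NULL);

	if (len <= 1)
		return;

	for (i = 0; i < n; i++) {
		if (xlated[i] <= off || xlated[i] == INSN_DELETED)
			continue;
		xlated[i] += len - 1;
	}
}

/*
 * Model of bpf_insn_array_adjust_after_remove(): 'len' instructions
 * starting at 'off' were removed. Offsets inside the removed range are
 * marked deleted, offsets after it move back by 'len'.
 */
static void sim_adjust_after_remove(uint32_t *xlated, size_t n,
				    uint32_t off, uint32_t len)
{
	size_t i;

	assert(xlated != NULL);

	for (i = 0; i < n; i++) {
		if (xlated[i] < off || xlated[i] == INSN_DELETED)
			continue;
		if (xlated[i] < off + len)
			xlated[i] = INSN_DELETED;
		else
			xlated[i] -= len;
	}
}
```

Starting from offsets {0, 1, 2, 3, 4, 5}, two three-instruction expansions (at offset 1, then at offset 5) give {0, 1, 4, 5, 8, 9}, and two single-instruction removals (at offset 1, then at offset 2) give {0, -1, 1, -1, 2, 3}. These are the values check_simple() and check_deletions() expect.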
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
.../selftests/bpf/prog_tests/bpf_insn_array.c | 405 ++++++++++++++++++
1 file changed, 405 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c b/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
new file mode 100644
index 000000000000..f785132497d6
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
@@ -0,0 +1,405 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <bpf/bpf.h>
+#include <test_progs.h>
+
+static int map_create(__u32 map_type, __u32 max_entries)
+{
+ const char *map_name = "insn_array";
+ __u32 key_size = 4;
+ __u32 value_size = sizeof(struct bpf_insn_array_value);
+
+ return bpf_map_create(map_type, map_name, key_size, value_size, max_entries, NULL);
+}
+
+static int prog_load(struct bpf_insn *insns, __u32 insn_cnt, int *fd_array, __u32 fd_array_cnt)
+{
+ LIBBPF_OPTS(bpf_prog_load_opts, opts);
+
+ opts.fd_array = fd_array;
+ opts.fd_array_cnt = fd_array_cnt;
+
+ return bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL", insns, insn_cnt, &opts);
+}
+
+/*
+ * Load a program which the verifier will not mangle in any way. Add an
+ * insn_array map pointing to every instruction. Check that it hasn't changed
+ * after the program load.
+ */
+static void check_one_to_one_mapping(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_MOV64_IMM(BPF_REG_0, 4),
+ BPF_MOV64_IMM(BPF_REG_0, 3),
+ BPF_MOV64_IMM(BPF_REG_0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd = -1, map_fd;
+ struct bpf_insn_array_value val = {};
+ int i;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, ARRAY_SIZE(insns));
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ val.xlated_off = i;
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0, "bpf_map_update_elem"))
+ goto cleanup;
+ }
+
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+ goto cleanup;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+ goto cleanup;
+
+ ASSERT_EQ(val.xlated_off, i, "val should be equal i");
+ }
+
+cleanup:
+ close(prog_fd);
+ close(map_fd);
+}
+
+/*
+ * Try to load a program with a map which points to outside of the program
+ */
+static void check_out_of_bounds_index(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_MOV64_IMM(BPF_REG_0, 4),
+ BPF_MOV64_IMM(BPF_REG_0, 3),
+ BPF_MOV64_IMM(BPF_REG_0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd, map_fd;
+ struct bpf_insn_array_value val = {};
+ int key;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, 1);
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ key = 0;
+ val.xlated_off = ARRAY_SIZE(insns); /* too big */
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &key, &val, 0), 0, "bpf_map_update_elem"))
+ goto cleanup;
+
+ errno = 0;
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_EQ(prog_fd, -EINVAL, "program should have been rejected (prog_fd != -EINVAL)")) {
+ close(prog_fd);
+ goto cleanup;
+ }
+
+cleanup:
+ close(map_fd);
+}
+
+/*
+ * Try to load a program with a map which points to the middle of 16-byte insn
+ */
+static void check_mid_insn_index(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_LD_IMM64(BPF_REG_0, 0), /* 2 x 8 */
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd, map_fd;
+ struct bpf_insn_array_value val = {};
+ int key;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, 1);
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ key = 0;
+ val.xlated_off = 1; /* middle of 16-byte instruction */
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &key, &val, 0), 0, "bpf_map_update_elem"))
+ goto cleanup;
+
+ errno = 0;
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_EQ(prog_fd, -EINVAL, "program should have been rejected (prog_fd != -EINVAL)")) {
+ close(prog_fd);
+ goto cleanup;
+ }
+
+cleanup:
+ close(map_fd);
+}
+
+static void check_incorrect_index(void)
+{
+ check_out_of_bounds_index();
+ check_mid_insn_index();
+}
+
+/*
+ * Load a program with two patches (get jiffies, for simplicity). Add an
+ * insn_array map pointing to every instruction. Check how it was changed
+ * after the program load.
+ */
+static void check_simple(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_MOV64_IMM(BPF_REG_0, 2),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd = -1, map_fd;
+ __u32 map_in[] = {0, 1, 2, 3, 4, 5};
+ __u32 map_out[] = {0, 1, 4, 5, 8, 9};
+ struct bpf_insn_array_value val = {};
+ int i;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, ARRAY_SIZE(insns));
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ val.xlated_off = map_in[i];
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0,
+ "bpf_map_update_elem"))
+ goto cleanup;
+ }
+
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+ goto cleanup;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+ goto cleanup;
+
+ ASSERT_EQ(val.xlated_off, map_out[i], "val should be equal map_out[i]");
+ }
+
+cleanup:
+ close(prog_fd);
+ close(map_fd);
+}
+
+/*
+ * Verifier can delete code in two cases: nops & dead code. From insn
+ * array's point of view, the two cases are the same, so test using
+ * the simplest method: by loading some nops
+ */
+static void check_deletions(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_MOV64_IMM(BPF_REG_0, 2),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd = -1, map_fd;
+ __u32 map_in[] = {0, 1, 2, 3, 4, 5};
+ __u32 map_out[] = {0, -1, 1, -1, 2, 3};
+ struct bpf_insn_array_value val = {};
+ int i;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, ARRAY_SIZE(insns));
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ val.xlated_off = map_in[i];
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0,
+ "bpf_map_update_elem"))
+ goto cleanup;
+ }
+
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+ goto cleanup;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+ goto cleanup;
+
+ ASSERT_EQ(val.xlated_off, map_out[i], "val should be equal map_out[i]");
+ }
+
+cleanup:
+ close(prog_fd);
+ close(map_fd);
+}
+
+static void check_with_functions(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_EXIT_INSN(),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+ BPF_MOV64_IMM(BPF_REG_0, 2),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd = -1, map_fd;
+ __u32 map_in[] = { 0, 1, 2, 3, 4, 5, /* func */ 6, 7, 8, 9, 10};
+ __u32 map_out[] = {-1, 0, -1, 3, 4, 5, /* func */ -1, 6, -1, 9, 10};
+ struct bpf_insn_array_value val = {};
+ int i;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, ARRAY_SIZE(insns));
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ val.xlated_off = map_in[i];
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0,
+ "bpf_map_update_elem"))
+ goto cleanup;
+ }
+
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+ goto cleanup;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+ goto cleanup;
+
+ ASSERT_EQ(val.xlated_off, map_out[i], "val should be equal map_out[i]");
+ }
+
+cleanup:
+ close(prog_fd);
+ close(map_fd);
+}
+
+/* Map can be used only by one BPF program */
+static void check_no_map_reuse(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd = -1, map_fd, extra_fd = -1;
+ struct bpf_insn_array_value val = {};
+ int i;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, ARRAY_SIZE(insns));
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ val.xlated_off = i;
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0, "bpf_map_update_elem"))
+ goto cleanup;
+ }
+
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+ goto cleanup;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+ goto cleanup;
+
+ ASSERT_EQ(val.xlated_off, i, "val should be equal i");
+ }
+
+ errno = 0;
+ extra_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_EQ(extra_fd, -EBUSY, "program should have been rejected (extra_fd != -EBUSY)"))
+ goto cleanup;
+
+ /* correctness: check that prog is still loadable without fd_array */
+ extra_fd = prog_load(insns, ARRAY_SIZE(insns), NULL, 0);
+ if (!ASSERT_GE(extra_fd, 0, "bpf(BPF_PROG_LOAD): expected no error"))
+ goto cleanup;
+
+cleanup:
+ close(extra_fd);
+ close(prog_fd);
+ close(map_fd);
+}
+
+static void check_bpf_no_lookup(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd = -1, map_fd;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, 1);
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ insns[0].imm = map_fd;
+
+ errno = 0;
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), NULL, 0);
+ if (!ASSERT_EQ(prog_fd, -EINVAL, "program should have been rejected (prog_fd != -EINVAL)"))
+ goto cleanup;
+
+ /* correctness: check that prog is still loadable with normal map */
+ close(map_fd);
+ map_fd = map_create(BPF_MAP_TYPE_ARRAY, 1);
+ insns[0].imm = map_fd;
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), NULL, 0);
+ if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+ goto cleanup;
+
+cleanup:
+ close(prog_fd);
+ close(map_fd);
+}
+
+static void check_bpf_side(void)
+{
+ check_bpf_no_lookup();
+}
+
+void test_bpf_insn_array(void)
+{
+ /* Test if offsets are adjusted properly */
+
+ if (test__start_subtest("one2one"))
+ check_one_to_one_mapping();
+
+ if (test__start_subtest("simple"))
+ check_simple();
+
+ if (test__start_subtest("deletions"))
+ check_deletions();
+
+ if (test__start_subtest("multiple-functions"))
+ check_with_functions();
+
+ /* Check all kinds of operations and related restrictions */
+
+ if (test__start_subtest("incorrect-index"))
+ check_incorrect_index();
+
+ if (test__start_subtest("no-map-reuse"))
+ check_no_map_reuse();
+
+ if (test__start_subtest("bpf-side-ops"))
+ check_bpf_side();
+}
--
2.34.1
* [PATCH v2 bpf-next 05/13] bpf: support instructions arrays with constants blinding
From: Anton Protopopov @ 2025-09-13 19:39 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
When bpf_jit_harden is enabled, all constants in the BPF code are
blinded to prevent JIT spraying attacks. This happens during the JIT
phase. Adjust the offsets in all related instruction arrays accordingly.
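As a rough illustration of what blinding does to offsets, the sketch below models two things: the XOR round trip that hides an immediate, and the new offset of an instruction as the sum of the expanded lengths of everything before it. The helper names are hypothetical, and the "three instructions per MOV-immediate" figure is taken from what the blinding selftest in this series expects, not derived here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The immediate is stored XORed with a random value ... */
static uint32_t blind_imm(uint32_t imm, uint32_t rnd)
{
	return imm ^ rnd;
}

/* ... and recovered at run time by XORing with the same value. */
static uint32_t unblind_imm(uint32_t blinded, uint32_t rnd)
{
	return blinded ^ rnd;
}

/*
 * New offset of original instruction 'idx', where expanded_len[i] is the
 * number of instructions that original instruction i occupies after
 * blinding (1 if it was left untouched).
 */
static uint32_t offset_after_blinding(const uint32_t *expanded_len, size_t idx)
{
	uint32_t off = 0;
	size_t i;

	assert(expanded_len != NULL);

	for (i = 0; i < idx; i++)
		off += expanded_len[i];

	return off;
}
```

For four MOV-immediate instructions followed by an exit, i.e. expanded lengths {3, 3, 3, 3, 1}, original instruction i ends up at offset 3 * i.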
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
kernel/bpf/core.c | 20 ++++++++++++++++++++
kernel/bpf/verifier.c | 11 ++++++++++-
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1cda2589d4b3..90f201a6f51d 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1451,6 +1451,23 @@ void bpf_jit_prog_release_other(struct bpf_prog *fp, struct bpf_prog *fp_other)
bpf_prog_clone_free(fp_other);
}
+static void adjust_insn_arrays(struct bpf_prog *prog, u32 off, u32 len)
+{
+#ifdef CONFIG_BPF_SYSCALL
+ struct bpf_map *map;
+ int i;
+
+ if (len <= 1)
+ return;
+
+ for (i = 0; i < prog->aux->used_map_cnt; i++) {
+ map = prog->aux->used_maps[i];
+ if (map->map_type == BPF_MAP_TYPE_INSN_ARRAY)
+ bpf_insn_array_adjust(map, off, len);
+ }
+#endif
+}
+
struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
{
struct bpf_insn insn_buff[16], aux[2];
@@ -1506,6 +1523,9 @@ struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
clone = tmp;
insn_delta = rewritten - 1;
+ /* Instructions arrays must be updated using absolute xlated offsets */
+ adjust_insn_arrays(clone, prog->aux->subprog_start + i, rewritten);
+
/* Walk new program and skip insns we just inserted. */
insn = clone->insnsi + i + insn_delta;
insn_cnt += insn_delta;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1f1708fd76c4..4261486981a3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -21564,6 +21564,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
struct bpf_insn *insn;
void *old_bpf_func;
int err, num_exentries;
+ int old_len, subprog_start_adjustment = 0;
if (env->subprog_cnt <= 1)
return 0;
@@ -21638,7 +21639,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
func[i]->aux->func_idx = i;
/* Below members will be freed only at prog->aux */
func[i]->aux->btf = prog->aux->btf;
- func[i]->aux->subprog_start = subprog_start;
+ func[i]->aux->subprog_start = subprog_start + subprog_start_adjustment;
func[i]->aux->func_info = prog->aux->func_info;
func[i]->aux->func_info_cnt = prog->aux->func_info_cnt;
func[i]->aux->poke_tab = prog->aux->poke_tab;
@@ -21691,7 +21692,15 @@ static int jit_subprogs(struct bpf_verifier_env *env)
func[i]->aux->might_sleep = env->subprog_info[i].might_sleep;
if (!i)
func[i]->aux->exception_boundary = env->seen_exception;
+
+ /*
+ * To properly pass the absolute subprog start to jit
+ * all instruction adjustments should be accumulated
+ */
+ old_len = func[i]->len;
func[i] = bpf_int_jit_compile(func[i]);
+ subprog_start_adjustment += func[i]->len - old_len;
+
if (!func[i]->jited) {
err = -ENOTSUPP;
goto out_free;
--
2.34.1
* [PATCH v2 bpf-next 06/13] selftests/bpf: test instructions arrays with blinding
From: Anton Protopopov @ 2025-09-13 19:39 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Add a specific test for instructions arrays with blinding enabled.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
.../selftests/bpf/prog_tests/bpf_insn_array.c | 92 +++++++++++++++++++
1 file changed, 92 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c b/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
index f785132497d6..489badc17a2d 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
@@ -287,6 +287,95 @@ static void check_with_functions(void)
close(map_fd);
}
+static int set_bpf_jit_harden(char *level)
+{
+ char old_level;
+ int err = -1;
+ int fd = -1;
+
+ fd = open("/proc/sys/net/core/bpf_jit_harden", O_RDWR | O_NONBLOCK);
+ if (fd < 0) {
+ ASSERT_FAIL("open .../bpf_jit_harden returned %d (errno=%d)", fd, errno);
+ return -1;
+ }
+
+ err = read(fd, &old_level, 1);
+ if (err != 1) {
+ ASSERT_FAIL("read from .../bpf_jit_harden returned %d (errno=%d)", err, errno);
+ err = -1;
+ goto end;
+ }
+
+ lseek(fd, 0, SEEK_SET);
+
+ err = write(fd, level, 1);
+ if (err != 1) {
+ ASSERT_FAIL("write to .../bpf_jit_harden returned %d (errno=%d)", err, errno);
+ err = -1;
+ goto end;
+ }
+
+ err = 0;
+ *level = old_level;
+end:
+ if (fd >= 0)
+ close(fd);
+ return err;
+}
+
+static void check_blindness(void)
+{
+ struct bpf_insn insns[] = {
+ BPF_MOV64_IMM(BPF_REG_0, 4),
+ BPF_MOV64_IMM(BPF_REG_0, 3),
+ BPF_MOV64_IMM(BPF_REG_0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_EXIT_INSN(),
+ };
+ int prog_fd = -1, map_fd;
+ struct bpf_insn_array_value val = {};
+ char bpf_jit_harden = '@'; /* non-existing value */
+ int i;
+
+ map_fd = map_create(BPF_MAP_TYPE_INSN_ARRAY, ARRAY_SIZE(insns));
+ if (!ASSERT_GE(map_fd, 0, "map_create"))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ val.xlated_off = i;
+ if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0, "bpf_map_update_elem"))
+ goto cleanup;
+ }
+
+ bpf_jit_harden = '2';
+ if (set_bpf_jit_harden(&bpf_jit_harden)) {
+ bpf_jit_harden = '@'; /* open, read or write failed => no write was done */
+ goto cleanup;
+ }
+
+ prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+ if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+ goto cleanup;
+
+ for (i = 0; i < ARRAY_SIZE(insns); i++) {
+ char fmt[32];
+
+ if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+ goto cleanup;
+
+ snprintf(fmt, sizeof(fmt), "val should be equal 3*%d", i);
+ ASSERT_EQ(val.xlated_off, i * 3, fmt);
+ }
+
+cleanup:
+ /* restore the old one */
+ if (bpf_jit_harden != '@')
+ set_bpf_jit_harden(&bpf_jit_harden);
+
+ close(prog_fd);
+ close(map_fd);
+}
+
/* Map can be used only by one BPF program */
static void check_no_map_reuse(void)
{
@@ -392,6 +481,9 @@ void test_bpf_insn_array(void)
if (test__start_subtest("multiple-functions"))
check_with_functions();
+ if (test__start_subtest("blindness"))
+ check_blindness();
+
/* Check all kinds of operations and related restrictions */
if (test__start_subtest("incorrect-index"))
--
2.34.1
* [PATCH v2 bpf-next 07/13] bpf, x86: allow indirect jumps to r8...r15
From: Anton Protopopov @ 2025-09-13 19:39 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Currently the emit_indirect_jump() function only accepts one of the
RAX, RCX, ..., RBP registers as the destination. Make it accept
R8, R9, ..., R15 as well, and make callers pass BPF registers, not
native registers. This is required to enable indirect jump support
in eBPF.
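For reference, the plain (no retpoline, no thunk) encoding can be sketched as below. This is a stand-alone illustration, not the JIT code: encode_indirect_jmp() is a hypothetical helper and, unlike emit_indirect_jump(), it takes a native x86-64 register number (0 = rax, 1 = rcx, ..., 8 = r8, ..., 15 = r15) instead of a BPF register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode `jmp *%reg` into buf and return the number of bytes written:
 * two bytes for rax..rdi, three bytes (with a REX.B prefix) for r8..r15.
 */
static size_t encode_indirect_jmp(uint8_t *buf, unsigned int regno)
{
	size_t n = 0;

	assert(buf != NULL);
	assert(regno < 16);

	if (regno >= 8)
		buf[n++] = 0x41;		/* REX.B selects r8..r15 */

	buf[n++] = 0xFF;			/* opcode group 5 */
	buf[n++] = 0xE0 + (regno & 7);		/* ModRM: mod=11, /4 (jmp), rm=reg */

	return n;
}
```

With this model, `jmp *%rcx` is FF E1 and `jmp *%r9` is 41 FF E1: the low three bits of the register number go into ModRM and the fourth bit goes into the prefix, which is the split the patch gets from reg2hex[] and is_ereg().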
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
arch/x86/net/bpf_jit_comp.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 8792d7f371d3..fcebb48742ae 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -660,24 +660,38 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
#define EMIT_LFENCE() EMIT3(0x0F, 0xAE, 0xE8)
-static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
+static void __emit_indirect_jump(u8 **pprog, int reg, bool ereg)
{
u8 *prog = *pprog;
+ if (ereg)
+ EMIT1(0x41);
+
+ EMIT2(0xFF, 0xE0 + reg);
+
+ *pprog = prog;
+}
+
+static void emit_indirect_jump(u8 **pprog, int bpf_reg, u8 *ip)
+{
+ u8 *prog = *pprog;
+ int reg = reg2hex[bpf_reg];
+ bool ereg = is_ereg(bpf_reg);
+
if (cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) {
OPTIMIZER_HIDE_VAR(reg);
emit_jump(&prog, its_static_thunk(reg), ip);
} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_LFENCE)) {
EMIT_LFENCE();
- EMIT2(0xFF, 0xE0 + reg);
+ __emit_indirect_jump(&prog, reg, ereg);
} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
OPTIMIZER_HIDE_VAR(reg);
if (cpu_feature_enabled(X86_FEATURE_CALL_DEPTH))
- emit_jump(&prog, &__x86_indirect_jump_thunk_array[reg], ip);
+ emit_jump(&prog, &__x86_indirect_jump_thunk_array[reg + 8*ereg], ip);
else
- emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
+ emit_jump(&prog, &__x86_indirect_thunk_array[reg + 8*ereg], ip);
} else {
- EMIT2(0xFF, 0xE0 + reg); /* jmp *%\reg */
+ __emit_indirect_jump(&prog, reg, ereg);
if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) || IS_ENABLED(CONFIG_MITIGATION_SLS))
EMIT1(0xCC); /* int3 */
}
@@ -797,7 +811,7 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
* rdi == ctx (1st arg)
* rcx == prog->bpf_func + X86_TAIL_CALL_OFFSET
*/
- emit_indirect_jump(&prog, 1 /* rcx */, ip + (prog - start));
+ emit_indirect_jump(&prog, BPF_REG_4 /* R4 -> rcx */, ip + (prog - start));
/* out: */
ctx->tail_call_indirect_label = prog - start;
@@ -3517,7 +3531,7 @@ static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs, u8 *image,
if (err)
return err;
- emit_indirect_jump(&prog, 2 /* rdx */, image + (prog - buf));
+ emit_indirect_jump(&prog, BPF_REG_3 /* R3 -> rdx */, image + (prog - buf));
*pprog = prog;
return 0;
--
2.34.1
* [PATCH v2 bpf-next 08/13] bpf, x86: add support for indirect jumps
From: Anton Protopopov @ 2025-09-13 19:39 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Add support for a new instruction
BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=0
which does an indirect jump to a location stored in Rx. The register
Rx must have type PTR_TO_INSN. This new type ensures that the Rx
register contains a value (or a range of values) loaded from a
valid jump table, i.e., a map of type instruction array.
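As a rough illustration of the encoding described above, here is a minimal user-space sketch. The opcode constants are the standard BPF UAPI values; `struct insn`, `make_gotox()` and `is_gotox()` are hypothetical names that only mirror `struct bpf_insn` and the `insn_is_gotox()` helper added by this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode field values as defined in the BPF UAPI headers. */
#define BPF_JMP 0x05	/* instruction class */
#define BPF_JA  0x00	/* jump operation */
#define BPF_X   0x08	/* source is a register */

/* Local stand-in with the same field layout as struct bpf_insn. */
struct insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* Hypothetical helper: build "gotox rX"; all other fields must be zero. */
static struct insn make_gotox(uint8_t dst_reg)
{
	struct insn insn = {
		.code = BPF_JMP | BPF_JA | BPF_X,
		.dst_reg = dst_reg,
		.src_reg = 0,
		.off = 0,
		.imm = 0,
	};

	assert(dst_reg <= 10);	/* BPF registers are R0..R10 */
	return insn;
}

/* Same class/op/source test as insn_is_gotox(), on the local struct. */
static int is_gotox(const struct insn *insn)
{
	return (insn->code & 0x07) == BPF_JMP &&
	       (insn->code & 0xf0) == BPF_JA &&
	       (insn->code & 0x08) == BPF_X;
}
```

With these constants the opcode byte comes out as 0x0d.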
For example, for a C switch LLVM will generate the following code:
0: r3 = r1 # "switch (r3)"
1: if r3 > 0x13 goto +0x666 # check r3 boundaries
2: r3 <<= 0x3 # adjust to an index in array of addresses
3: r1 = 0xbeef ll # r1 is PTR_TO_MAP_VALUE, r1->map_ptr=M
5: r1 += r3 # r1 inherits boundaries from r3
6: r1 = *(u64 *)(r1 + 0x0) # r1 now has type PTR_TO_INSN
7: gotox r1 # jit will generate proper code
Here the gotox instruction corresponds to one particular map. It is,
however, possible to have a gotox instruction whose target can be loaded
from different maps, e.g.
0: r1 &= 0x1
1: r2 <<= 0x3
2: r3 = 0x0 ll # load from map M_1
4: r3 += r2
5: if r1 == 0x0 goto +0x4
6: r1 <<= 0x3
7: r3 = 0x0 ll # load from map M_2
9: r3 += r1
A: r1 = *(u64 *)(r3 + 0x0)
B: gotox r1 # jump to target loaded from M_1 or M_2
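The first listing (bounds check, scale by 8, load from the table, jump) can be modelled in user space roughly as follows. This is only a sketch: `gotox_target()` and its parameters are hypothetical, `table` stands in for the instruction array M, and `fallback` stands in for the target of the out-of-range branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the switch lowering: return the jump target
 * stored in slot `index` of the table, or `fallback` when the index
 * fails the bounds check.
 */
static uint64_t gotox_target(const uint64_t *table, size_t nr_entries,
			     uint64_t index, uint64_t fallback)
{
	const uint8_t *base = (const uint8_t *)table;
	uint64_t byte_off;

	assert(table && nr_entries > 0);

	/* insn 1: "if r3 > max goto default" */
	if (index > nr_entries - 1)
		return fallback;

	/* insn 2: "r3 <<= 3", turn the index into a byte offset */
	byte_off = index << 3;

	/* insns 3-6: add the offset to the table base and load 8 bytes */
	return *(const uint64_t *)(base + byte_off);
}
```

The point of the model is that the bounds check is what lets the verifier know which slots of the table the final load can reach.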
During the check_cfg stage the verifier collects all the maps which
point inside the subprog being verified. When building the CFG,
the high 16 bits of the insn_state are used, so this patch
(theoretically) supports jump tables of up to 2^16 slots.
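The use of the upper half of insn_state can be sketched as below. This is a user-space approximation of the SET_HIGH/GET_HIGH macros from this patch; the function names and the use of `uint32_t` (the verifier stores insn_state as `int`) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Low 16 bits: the usual DFS state flags.
 * High 16 bits: index of the next jump-table slot to explore.
 */
static uint32_t set_high(uint32_t state, uint32_t next_slot)
{
	assert(next_slot <= 0xffff);	/* hence the 2^16 limit */
	return (state & 0xffffU) | (next_slot << 16);
}

static uint16_t get_high(uint32_t state)
{
	return (uint16_t)(state >> 16);
}
```

Each visit to a gotox instruction resumes the scan of its jump table from `get_high()` and stores the next position back with `set_high()`, so no separate per-instruction cursor is needed.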
At a later stage, in check_indirect_jump, the verifier checks that
the register Rx was loaded from a particular instruction array.
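In the diff below, check_indirect_jump copies the table slots that Rx may refer to, sorts them and drops duplicates, then pushes one verifier branch per remaining target. The sort-and-deduplicate step can be sketched in user space like this, with `qsort()` standing in for the kernel's `sort()` and with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison that cannot overflow, unlike a plain subtraction. */
static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* Sort off[0..cnt-1] in place, drop duplicates, return the unique count. */
static int sort_uniq(uint32_t *off, int cnt)
{
	int unique = 1;
	int i;

	assert(off && cnt > 0);

	qsort(off, cnt, sizeof(off[0]), cmp_u32);

	/* keep an element only if it differs from the last one kept */
	for (i = 1; i < cnt; i++)
		if (off[i] != off[unique - 1])
			off[unique++] = off[i];

	return unique;
}
```

Deduplicating first means that several table slots pointing at the same instruction cost only one verifier branch.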
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
arch/x86/net/bpf_jit_comp.c | 3 +
include/linux/bpf.h | 1 +
include/linux/bpf_verifier.h | 15 +
kernel/bpf/bpf_insn_array.c | 16 +-
kernel/bpf/core.c | 1 +
kernel/bpf/log.c | 1 +
kernel/bpf/verifier.c | 513 ++++++++++++++++++++++++++++++++---
7 files changed, 514 insertions(+), 36 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index fcebb48742ae..095d249eb235 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -2595,6 +2595,9 @@ st: if (is_imm8(insn->off))
break;
+ case BPF_JMP | BPF_JA | BPF_X:
+ emit_indirect_jump(&prog, insn->dst_reg, image + addrs[i - 1]);
+ break;
case BPF_JMP | BPF_JA:
case BPF_JMP32 | BPF_JA:
if (BPF_CLASS(insn->code) == BPF_JMP) {
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 77fcb508d6ae..2c12edfdf63c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -973,6 +973,7 @@ enum bpf_reg_type {
PTR_TO_ARENA,
PTR_TO_BUF, /* reg points to a read/write buffer */
PTR_TO_FUNC, /* reg points to a bpf program function */
+ PTR_TO_INSN, /* reg points to a bpf program instruction */
CONST_PTR_TO_DYNPTR, /* reg points to a const struct bpf_dynptr */
__BPF_REG_TYPE_MAX,
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index aca43c284203..607a684642e5 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -533,6 +533,16 @@ struct bpf_map_ptr_state {
#define BPF_ALU_SANITIZE (BPF_ALU_SANITIZE_SRC | \
BPF_ALU_SANITIZE_DST)
+/*
+ * A structure defining an array of BPF instructions. Can be used,
+ * for example, as a return value of the insn_successors() function
+ * and in the struct bpf_insn_aux_data below.
+ */
+struct bpf_iarray {
+ int off_cnt;
+ u32 off[];
+};
+
struct bpf_insn_aux_data {
union {
enum bpf_reg_type ptr_type; /* pointer type for load/store insns */
@@ -542,6 +552,7 @@ struct bpf_insn_aux_data {
struct {
u32 map_index; /* index into used_maps[] */
u32 map_off; /* offset from value base address */
+ struct bpf_iarray *jt; /* jump table for gotox instruction */
};
struct {
enum bpf_reg_type reg_type; /* type of pseudo_btf_id */
@@ -586,6 +597,9 @@ struct bpf_insn_aux_data {
u8 fastcall_spills_num:3;
u8 arg_prog:4;
+ /* true if jt->off was allocated */
+ bool jt_allocated;
+
/* below fields are initialized once */
unsigned int orig_idx; /* original instruction index */
bool jmp_point;
@@ -847,6 +861,7 @@ struct bpf_verifier_env {
/* array of pointers to bpf_scc_info indexed by SCC id */
struct bpf_scc_info **scc_info;
u32 scc_cnt;
+ struct bpf_iarray *succ;
};
static inline struct bpf_func_info_aux *subprog_aux(struct bpf_verifier_env *env, int subprog)
diff --git a/kernel/bpf/bpf_insn_array.c b/kernel/bpf/bpf_insn_array.c
index 0c8dac62f457..4b945b7e31b8 100644
--- a/kernel/bpf/bpf_insn_array.c
+++ b/kernel/bpf/bpf_insn_array.c
@@ -1,7 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/bpf.h>
-#include <linux/sort.h>
#define MAX_INSN_ARRAY_ENTRIES 256
@@ -173,6 +172,20 @@ static u64 insn_array_mem_usage(const struct bpf_map *map)
return insn_array_alloc_size(map->max_entries) + extra_size;
}
+static int insn_array_map_direct_value_addr(const struct bpf_map *map, u64 *imm, u32 off)
+{
+ struct bpf_insn_array *insn_array = cast_insn_array(map);
+
+ if ((off % sizeof(long)) != 0 ||
+ (off / sizeof(long)) >= map->max_entries)
+ return -EINVAL;
+
+ /* from BPF's point of view, this map is a jump table */
+ *imm = (unsigned long)insn_array->ips + off;
+
+ return 0;
+}
+
BTF_ID_LIST_SINGLE(insn_array_btf_ids, struct, bpf_insn_array)
const struct bpf_map_ops insn_array_map_ops = {
@@ -185,6 +198,7 @@ const struct bpf_map_ops insn_array_map_ops = {
.map_delete_elem = insn_array_delete_elem,
.map_check_btf = insn_array_check_btf,
.map_mem_usage = insn_array_mem_usage,
+ .map_direct_value_addr = insn_array_map_direct_value_addr,
.map_btf_id = &insn_array_btf_ids[0],
};
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 90f201a6f51d..1f933857ca1d 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1709,6 +1709,7 @@ bool bpf_opcode_in_insntable(u8 code)
[BPF_LD | BPF_IND | BPF_B] = true,
[BPF_LD | BPF_IND | BPF_H] = true,
[BPF_LD | BPF_IND | BPF_W] = true,
+ [BPF_JMP | BPF_JA | BPF_X] = true,
[BPF_JMP | BPF_JCOND] = true,
};
#undef BPF_INSN_3_TBL
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index e4983c1303e7..75adfe7914f2 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -461,6 +461,7 @@ const char *reg_type_str(struct bpf_verifier_env *env, enum bpf_reg_type type)
[PTR_TO_ARENA] = "arena",
[PTR_TO_BUF] = "buf",
[PTR_TO_FUNC] = "func",
+ [PTR_TO_INSN] = "insn",
[PTR_TO_MAP_KEY] = "map_key",
[CONST_PTR_TO_DYNPTR] = "dynptr_ptr",
};
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4261486981a3..5985ad4761ba 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -212,6 +212,7 @@ static int ref_set_non_owning(struct bpf_verifier_env *env,
static void specialize_kfunc(struct bpf_verifier_env *env,
u32 func_id, u16 offset, unsigned long *addr);
static bool is_trusted_reg(const struct bpf_reg_state *reg);
+static int add_used_map(struct bpf_verifier_env *env, int fd);
static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
{
@@ -2962,14 +2963,13 @@ static int cmp_subprogs(const void *a, const void *b)
((struct bpf_subprog_info *)b)->start;
}
-/* Find subprogram that contains instruction at 'off' */
-static struct bpf_subprog_info *find_containing_subprog(struct bpf_verifier_env *env, int off)
+static int find_containing_subprog_idx(struct bpf_verifier_env *env, int off)
{
struct bpf_subprog_info *vals = env->subprog_info;
int l, r, m;
if (off >= env->prog->len || off < 0 || env->subprog_cnt == 0)
- return NULL;
+ return -1;
l = 0;
r = env->subprog_cnt - 1;
@@ -2980,7 +2980,19 @@ static struct bpf_subprog_info *find_containing_subprog(struct bpf_verifier_env
else
r = m - 1;
}
- return &vals[l];
+ return l;
+}
+
+/* Find subprogram that contains instruction at 'off' */
+static struct bpf_subprog_info *find_containing_subprog(struct bpf_verifier_env *env, int off)
+{
+ int subprog_idx;
+
+ subprog_idx = find_containing_subprog_idx(env, off);
+ if (subprog_idx < 0)
+ return NULL;
+
+ return &env->subprog_info[subprog_idx];
}
/* Find subprogram that starts exactly at 'off' */
@@ -6077,6 +6089,18 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
return 0;
}
+/*
+ * Return the size of the memory region accessible from a pointer to map value.
+ * For INSN_ARRAY maps whole bpf_insn_array->ips array is accessible.
+ */
+static u32 map_mem_size(const struct bpf_map *map)
+{
+ if (map->map_type == BPF_MAP_TYPE_INSN_ARRAY)
+ return map->max_entries * sizeof(long);
+
+ return map->value_size;
+}
+
/* check read/write into a map element with possible variable offset */
static int check_map_access(struct bpf_verifier_env *env, u32 regno,
int off, int size, bool zero_size_allowed,
@@ -6086,11 +6110,11 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
struct bpf_func_state *state = vstate->frame[vstate->curframe];
struct bpf_reg_state *reg = &state->regs[regno];
struct bpf_map *map = reg->map_ptr;
+ u32 mem_size = map_mem_size(map);
struct btf_record *rec;
int err, i;
- err = check_mem_region_access(env, regno, off, size, map->value_size,
- zero_size_allowed);
+ err = check_mem_region_access(env, regno, off, size, mem_size, zero_size_allowed);
if (err)
return err;
@@ -7605,6 +7629,19 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
regs[value_regno].type = SCALAR_VALUE;
 __mark_reg_known(&regs[value_regno], val);
+ } else if (map->map_type == BPF_MAP_TYPE_INSN_ARRAY) {
+ regs[value_regno].type = PTR_TO_INSN;
+ regs[value_regno].map_ptr = map;
+ regs[value_regno].off = reg->off;
+ regs[value_regno].umin_value = reg->umin_value;
+ regs[value_regno].umax_value = reg->umax_value;
+ regs[value_regno].smin_value = reg->smin_value;
+ regs[value_regno].smax_value = reg->smax_value;
+ regs[value_regno].s32_min_value = reg->s32_min_value;
+ regs[value_regno].s32_max_value = reg->s32_max_value;
+ regs[value_regno].u32_min_value = reg->u32_min_value;
+ regs[value_regno].u32_max_value = reg->u32_max_value;
+ regs[value_regno].var_off = reg->var_off;
} else {
mark_reg_unknown(env, regs, value_regno);
}
@@ -7795,6 +7832,11 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type,
bool allow_trust_mismatch);
+static bool map_is_insn_array(struct bpf_map *map)
+{
+ return map && map->map_type == BPF_MAP_TYPE_INSN_ARRAY;
+}
+
static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
bool strict_alignment_once, bool is_ldsx,
bool allow_trust_mismatch, const char *ctx)
@@ -14472,6 +14514,8 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
struct bpf_func_state *state = vstate->frame[vstate->curframe];
struct bpf_reg_state *regs = state->regs, *dst_reg;
bool known = tnum_is_const(off_reg->var_off);
+ bool ptr_to_insn_array = base_type(ptr_reg->type) == PTR_TO_MAP_VALUE &&
+ map_is_insn_array(ptr_reg->map_ptr);
s64 smin_val = off_reg->smin_value, smax_val = off_reg->smax_value,
smin_ptr = ptr_reg->smin_value, smax_ptr = ptr_reg->smax_value;
u64 umin_val = off_reg->umin_value, umax_val = off_reg->umax_value,
@@ -14613,6 +14657,11 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
}
break;
case BPF_SUB:
+ if (ptr_to_insn_array) {
+ verbose(env, "Operation %s on ptr to instruction set map is prohibited\n",
+ bpf_alu_string[opcode >> 4]);
+ return -EACCES;
+ }
if (dst_reg == off_reg) {
/* scalar -= pointer. Creates an unknown scalar */
verbose(env, "R%d tried to subtract pointer from scalar\n",
@@ -16965,7 +17014,8 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
}
dst_reg->type = PTR_TO_MAP_VALUE;
dst_reg->off = aux->map_off;
- WARN_ON_ONCE(map->max_entries != 1);
+ WARN_ON_ONCE(map->map_type != BPF_MAP_TYPE_INSN_ARRAY &&
+ map->max_entries != 1);
/* We want reg->id to be same (0) as map_value is not distinct */
} else if (insn->src_reg == BPF_PSEUDO_MAP_FD ||
insn->src_reg == BPF_PSEUDO_MAP_IDX) {
@@ -17718,6 +17768,234 @@ static int mark_fastcall_patterns(struct bpf_verifier_env *env)
return 0;
}
+#define SET_HIGH(STATE, LAST) STATE = (STATE & 0xffffU) | ((LAST) << 16)
+#define GET_HIGH(STATE) ((u16)((STATE) >> 16))
+
+static int push_gotox_edge(int t, struct bpf_verifier_env *env, struct bpf_iarray *jt)
+{
+ int *insn_stack = env->cfg.insn_stack;
+ int *insn_state = env->cfg.insn_state;
+ u16 prev;
+ int w;
+
+ for (prev = GET_HIGH(insn_state[t]); prev < jt->off_cnt; prev++) {
+ w = jt->off[prev];
+
+ /* EXPLORED || DISCOVERED */
+ if (insn_state[w])
+ continue;
+
+ break;
+ }
+
+ if (prev == jt->off_cnt)
+ return DONE_EXPLORING;
+
+ mark_prune_point(env, t);
+
+ if (env->cfg.cur_stack >= env->prog->len)
+ return -E2BIG;
+ insn_stack[env->cfg.cur_stack++] = w;
+
+ mark_jmp_point(env, w);
+
+ SET_HIGH(insn_state[t], prev + 1);
+ return KEEP_EXPLORING;
+}
+
+static int copy_insn_array(struct bpf_map *map, u32 start, u32 end, u32 *off)
+{
+ struct bpf_insn_array_value *value;
+ u32 i;
+
+ for (i = start; i <= end; i++) {
+ value = map->ops->map_lookup_elem(map, &i);
+ if (!value)
+ return -EINVAL;
+ off[i - start] = value->xlated_off;
+ }
+ return 0;
+}
+
+static int cmp_ptr_to_u32(const void *a, const void *b)
+{
+ return *(u32 *)a - *(u32 *)b;
+}
+
+static int sort_insn_array_uniq(u32 *off, int off_cnt)
+{
+ int unique = 1;
+ int i;
+
+ sort(off, off_cnt, sizeof(off[0]), cmp_ptr_to_u32, NULL);
+
+ for (i = 1; i < off_cnt; i++)
+ if (off[i] != off[unique - 1])
+ off[unique++] = off[i];
+
+ return unique;
+}
+
+/*
+ * sort_unique({map[start], ..., map[end]}) into off
+ */
+static int copy_insn_array_uniq(struct bpf_map *map, u32 start, u32 end, u32 *off)
+{
+ u32 n = end - start + 1;
+ int err;
+
+ err = copy_insn_array(map, start, end, off);
+ if (err)
+ return err;
+
+ return sort_insn_array_uniq(off, n);
+}
+
+static struct bpf_iarray *iarray_realloc(struct bpf_iarray *old, size_t n_elem)
+{
+ size_t new_size = sizeof(struct bpf_iarray) + n_elem * 4;
+ struct bpf_iarray *new;
+
+ new = kvrealloc(old, new_size, GFP_KERNEL_ACCOUNT);
+ if (!new) {
+ /* this is what callers always want, so simplify the call site */
+ kvfree(old);
+ return NULL;
+ }
+
+ new->off_cnt = n_elem;
+ return new;
+}
+
+/*
+ * Copy all unique offsets from the map
+ */
+static struct bpf_iarray *jt_from_map(struct bpf_map *map)
+{
+ struct bpf_iarray *jt;
+ int n;
+
+ jt = iarray_realloc(NULL, map->max_entries);
+ if (!jt)
+ return ERR_PTR(-ENOMEM);
+
+ n = copy_insn_array_uniq(map, 0, map->max_entries - 1, jt->off);
+ if (n < 0) {
+ kvfree(jt);
+ return ERR_PTR(n);
+ }
+
+ return jt;
+}
+
+/*
+ * Find and collect all maps which fit in the subprog. Return the result as one
+ * combined jump table in jt->off (allocated with kvrealloc)
+ */
+static struct bpf_iarray *jt_from_subprog(struct bpf_verifier_env *env,
+ int subprog_start, int subprog_end)
+{
+ struct bpf_iarray *jt = NULL;
+ struct bpf_map *map;
+ struct bpf_iarray *jt_cur;
+ int i;
+
+ for (i = 0; i < env->insn_array_map_cnt; i++) {
+ /*
+ * TODO (when needed): collect only jump tables, not static keys
+ * or maps for indirect calls
+ */
+ map = env->insn_array_maps[i];
+
+ jt_cur = jt_from_map(map);
+ if (IS_ERR(jt_cur)) {
+ kvfree(jt);
+ return jt_cur;
+ }
+
+ /*
+ * This is enough to check one element. The full table is
+ * checked to fit inside the subprog later in create_jt()
+ */
+ if (jt_cur->off[0] >= subprog_start && jt_cur->off[0] < subprog_end) {
+ u32 old_cnt = jt ? jt->off_cnt : 0;
+ jt = iarray_realloc(jt, old_cnt + jt_cur->off_cnt);
+ if (!jt) {
+ kvfree(jt_cur);
+ return ERR_PTR(-ENOMEM);
+ }
+ memcpy(jt->off + old_cnt, jt_cur->off, jt_cur->off_cnt << 2);
+ }
+
+ kvfree(jt_cur);
+ }
+
+ if (!jt) {
+ verbose(env, "no jump tables found for subprog starting at %u\n", subprog_start);
+ return ERR_PTR(-EINVAL);
+ }
+
+ jt->off_cnt = sort_insn_array_uniq(jt->off, jt->off_cnt);
+ return jt;
+}
+
+static struct bpf_iarray *
+create_jt(int t, struct bpf_verifier_env *env, int fd)
+{
+ struct bpf_subprog_info *subprog;
+ int subprog_idx, subprog_start, subprog_end;
+ struct bpf_iarray *jt;
+ int i;
+
+ if (env->subprog_cnt == 0)
+ return ERR_PTR(-EFAULT);
+
+ subprog_idx = find_containing_subprog_idx(env, t);
+ if (subprog_idx < 0) {
+ verbose(env, "can't find subprog containing instruction %d\n", t);
+ return ERR_PTR(-EFAULT);
+ }
+ subprog = &env->subprog_info[subprog_idx];
+ subprog_start = subprog->start;
+ subprog_end = (subprog + 1)->start;
+ jt = jt_from_subprog(env, subprog_start, subprog_end);
+ if (IS_ERR(jt))
+ return jt;
+
+ /* Check that every element of the jump table fits within the given subprogram */
+ for (i = 0; i < jt->off_cnt; i++) {
+ if (jt->off[i] < subprog_start || jt->off[i] >= subprog_end) {
+ verbose(env, "jump table for insn %d points outside of the subprog [%u,%u]\n",
+ t, subprog_start, subprog_end);
+ return ERR_PTR(-EINVAL);
+ }
+ }
+
+ return jt;
+}
+
+/* "conditional jump with N edges" */
+static int visit_gotox_insn(int t, struct bpf_verifier_env *env, int fd)
+{
+ struct bpf_iarray *jt = env->insn_aux_data[t].jt;
+
+ if (!jt) {
+ jt = create_jt(t, env, fd);
+ if (IS_ERR(jt))
+ return PTR_ERR(jt);
+ }
+
+ /*
+ * Mark jt as allocated. Otherwise, it is not possible to check if it
+ * was allocated or not in the code which frees memory (jt is a part of
+ * union)
+ */
+ env->insn_aux_data[t].jt_allocated = true;
+ env->insn_aux_data[t].jt = jt;
+
+ return push_gotox_edge(t, env, jt);
+}
+
/* Visits the instruction at index t and returns one of the following:
* < 0 - an error occurred
* DONE_EXPLORING - the instruction was fully explored
@@ -17808,8 +18086,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
return visit_func_call_insn(t, insns, env, insn->src_reg == BPF_PSEUDO_CALL);
case BPF_JA:
- if (BPF_SRC(insn->code) != BPF_K)
- return -EINVAL;
+ if (BPF_SRC(insn->code) == BPF_X)
+ return visit_gotox_insn(t, env, insn->imm);
if (BPF_CLASS(insn->code) == BPF_JMP)
off = insn->off;
@@ -17840,6 +18118,13 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
}
}
+static bool insn_is_gotox(struct bpf_insn *insn)
+{
+ return BPF_CLASS(insn->code) == BPF_JMP &&
+ BPF_OP(insn->code) == BPF_JA &&
+ BPF_SRC(insn->code) == BPF_X;
+}
+
/* non-recursive depth-first-search to detect loops in BPF program
* loop == back-edge in directed graph
*/
@@ -18701,6 +18986,10 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
return regs_exact(rold, rcur, idmap) && rold->frameno == rcur->frameno;
case PTR_TO_ARENA:
return true;
+ case PTR_TO_INSN:
+ /* is rcur a subset of rold? */
+ return (rcur->umin_value >= rold->umin_value &&
+ rcur->umax_value <= rold->umax_value);
default:
return regs_exact(rold, rcur, idmap);
}
@@ -19847,6 +20136,102 @@ static int process_bpf_exit_full(struct bpf_verifier_env *env,
return PROCESS_BPF_EXIT;
}
+static int indirect_jump_min_max_index(struct bpf_verifier_env *env,
+ int regno,
+ struct bpf_map *map,
+ u32 *pmin_index, u32 *pmax_index)
+{
+ struct bpf_reg_state *reg = reg_state(env, regno);
+ u64 min_index, max_index;
+
+ if (check_add_overflow(reg->umin_value, reg->off, &min_index) ||
+ (min_index > (u64) U32_MAX * sizeof(long))) {
+ verbose(env, "the sum of R%u umin_value %llu and off %u is too big\n",
+ regno, reg->umin_value, reg->off);
+ return -ERANGE;
+ }
+ if (check_add_overflow(reg->umax_value, reg->off, &max_index) ||
+ (max_index > (u64) U32_MAX * sizeof(long))) {
+ verbose(env, "the sum of R%u umax_value %llu and off %u is too big\n",
+ regno, reg->umax_value, reg->off);
+ return -ERANGE;
+ }
+
+ min_index /= sizeof(long);
+ max_index /= sizeof(long);
+
+ if (min_index >= map->max_entries || max_index >= map->max_entries) {
+ verbose(env, "R%u points to outside of jump table: [%llu,%llu] max_entries %u\n",
+ regno, min_index, max_index, map->max_entries);
+ return -EINVAL;
+ }
+
+ *pmin_index = min_index;
+ *pmax_index = max_index;
+ return 0;
+}
+
+/* gotox *dst_reg */
+static int check_indirect_jump(struct bpf_verifier_env *env, struct bpf_insn *insn)
+{
+ struct bpf_verifier_state *other_branch;
+ struct bpf_reg_state *dst_reg;
+ struct bpf_map *map;
+ u32 min_index, max_index;
+ int err = 0;
+ u32 *xoff;
+ int n;
+ int i;
+
+ dst_reg = reg_state(env, insn->dst_reg);
+ if (dst_reg->type != PTR_TO_INSN) {
+ verbose(env, "R%d has type %d, expected PTR_TO_INSN\n",
+ insn->dst_reg, dst_reg->type);
+ return -EINVAL;
+ }
+
+ map = dst_reg->map_ptr;
+ if (verifier_bug_if(!map, env, "R%d has an empty map pointer", insn->dst_reg))
+ return -EFAULT;
+
+ if (verifier_bug_if(map->map_type != BPF_MAP_TYPE_INSN_ARRAY, env,
+ "R%d has incorrect map type %d", insn->dst_reg, map->map_type))
+ return -EFAULT;
+
+ err = indirect_jump_min_max_index(env, insn->dst_reg, map, &min_index, &max_index);
+ if (err)
+ return err;
+
+ xoff = kvcalloc(max_index - min_index + 1, sizeof(u32), GFP_KERNEL_ACCOUNT);
+ if (!xoff)
+ return -ENOMEM;
+
+ n = copy_insn_array_uniq(map, min_index, max_index, xoff);
+ if (n < 0) {
+ err = n;
+ goto free_off;
+ }
+ if (n == 0) {
+ verbose(env, "register R%d doesn't point to any offset in map id=%d\n",
+ insn->dst_reg, map->id);
+ err = -EINVAL;
+ goto free_off;
+ }
+
+ for (i = 0; i < n - 1; i++) {
+ other_branch = push_stack(env, xoff[i], env->insn_idx, false);
+ if (IS_ERR(other_branch)) {
+ err = PTR_ERR(other_branch);
+ goto free_off;
+ }
+ }
+ env->insn_idx = xoff[n-1];
+
+free_off:
+ kvfree(xoff);
+ return err;
+}
+
static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
{
int err;
@@ -19949,6 +20334,9 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
mark_reg_scratched(env, BPF_REG_0);
} else if (opcode == BPF_JA) {
+ if (BPF_SRC(insn->code) == BPF_X)
+ return check_indirect_jump(env, insn);
+
if (BPF_SRC(insn->code) != BPF_K ||
insn->src_reg != BPF_REG_0 ||
insn->dst_reg != BPF_REG_0 ||
@@ -20448,6 +20836,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
case BPF_MAP_TYPE_QUEUE:
case BPF_MAP_TYPE_STACK:
case BPF_MAP_TYPE_ARENA:
+ case BPF_MAP_TYPE_INSN_ARRAY:
break;
default:
verbose(env,
@@ -21006,6 +21395,23 @@ static int bpf_adj_linfo_after_remove(struct bpf_verifier_env *env, u32 off,
return 0;
}
+/*
+ * Clean up dynamically allocated fields of aux data for instructions [start, ...]
+ */
+static void clear_insn_aux_data(struct bpf_insn_aux_data *aux_data, int start, int len)
+{
+ int end = start + len;
+ int i;
+
+ for (i = start; i < end; i++) {
+ if (aux_data[i].jt_allocated) {
+ kvfree(aux_data[i].jt);
+ aux_data[i].jt = NULL;
+ aux_data[i].jt_allocated = false;
+ }
+ }
+}
+
static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
{
struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
@@ -21029,6 +21435,8 @@ static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
adjust_insn_arrays_after_remove(env, off, cnt);
+ clear_insn_aux_data(aux_data, off, cnt);
+
memmove(aux_data + off, aux_data + off + cnt,
sizeof(*aux_data) * (orig_prog_len - off - cnt));
@@ -21669,6 +22077,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
func[i]->aux->jited_linfo = prog->aux->jited_linfo;
func[i]->aux->linfo_idx = env->subprog_info[i].linfo_idx;
func[i]->aux->arena = prog->aux->arena;
+ func[i]->aux->used_maps = env->used_maps;
+ func[i]->aux->used_map_cnt = env->used_map_cnt;
num_exentries = 0;
insn = func[i]->insnsi;
for (j = 0; j < func[i]->len; j++, insn++) {
@@ -24201,23 +24611,41 @@ static bool can_jump(struct bpf_insn *insn)
return false;
}
-static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
+/*
+ * Returns an array of instructions succ, with succ->off[0], ...,
+ * succ->off[n-1] with successor instructions, where n=succ->off_cnt
+ */
+static struct bpf_iarray *
+insn_successors(struct bpf_verifier_env *env, u32 insn_idx)
{
- struct bpf_insn *insn = &prog->insnsi[idx];
- int i = 0, insn_sz;
+ struct bpf_prog *prog = env->prog;
+ struct bpf_insn *insn = &prog->insnsi[insn_idx];
+ struct bpf_iarray *succ;
+ int insn_sz;
u32 dst;
- insn_sz = bpf_is_ldimm64(insn) ? 2 : 1;
- if (can_fallthrough(insn) && idx + 1 < prog->len)
- succ[i++] = idx + insn_sz;
+ if (unlikely(insn_is_gotox(insn))) {
+ succ = env->insn_aux_data[insn_idx].jt;
+ if (verifier_bug_if(!succ, env,
+ "aux data for insn %u doesn't contain a jump table\n",
+ insn_idx))
+ return ERR_PTR(-EFAULT);
+ } else {
+ /* pre-allocated array of size up to 2; reset cnt, as it may be used already */
+ succ = env->succ;
+ succ->off_cnt = 0;
- if (can_jump(insn)) {
- dst = idx + jmp_offset(insn) + 1;
- if (i == 0 || succ[0] != dst)
- succ[i++] = dst;
- }
+ insn_sz = bpf_is_ldimm64(insn) ? 2 : 1;
+ if (can_fallthrough(insn) && insn_idx + 1 < prog->len)
+ succ->off[succ->off_cnt++] = insn_idx + insn_sz;
- return i;
+ if (can_jump(insn)) {
+ dst = insn_idx + jmp_offset(insn) + 1;
+ if (succ->off_cnt == 0 || succ->off[0] != dst)
+ succ->off[succ->off_cnt++] = dst;
+ }
+ }
+ return succ;
}
/* Each field is a register bitmask */
@@ -24412,14 +24840,18 @@ static int compute_live_registers(struct bpf_verifier_env *env)
for (i = 0; i < env->cfg.cur_postorder; ++i) {
int insn_idx = env->cfg.insn_postorder[i];
struct insn_live_regs *live = &state[insn_idx];
- int succ_num;
- u32 succ[2];
+ struct bpf_iarray *succ;
u16 new_out = 0;
u16 new_in = 0;
- succ_num = insn_successors(env->prog, insn_idx, succ);
- for (int s = 0; s < succ_num; ++s)
- new_out |= state[succ[s]].in;
+ succ = insn_successors(env, insn_idx);
+ if (IS_ERR(succ)) {
+ err = PTR_ERR(succ);
+ goto out;
+
+ }
+ for (int s = 0; s < succ->off_cnt; ++s)
+ new_out |= state[succ->off[s]].in;
new_in = (new_out & ~live->def) | live->use;
if (new_out != live->out || new_in != live->in) {
live->in = new_in;
@@ -24475,11 +24907,10 @@ static int compute_scc(struct bpf_verifier_env *env)
const u32 insn_cnt = env->prog->len;
int stack_sz, dfs_sz, err = 0;
u32 *stack, *pre, *low, *dfs;
- u32 succ_cnt, i, j, t, w;
+ u32 i, j, t, w;
u32 next_preorder_num;
u32 next_scc_id;
bool assign_scc;
- u32 succ[2];
next_preorder_num = 1;
next_scc_id = 1;
@@ -24578,6 +25009,8 @@ static int compute_scc(struct bpf_verifier_env *env)
dfs[0] = i;
dfs_continue:
while (dfs_sz) {
+ struct bpf_iarray *succ;
+
w = dfs[dfs_sz - 1];
if (pre[w] == 0) {
low[w] = next_preorder_num;
@@ -24586,12 +25019,17 @@ static int compute_scc(struct bpf_verifier_env *env)
stack[stack_sz++] = w;
}
/* Visit 'w' successors */
- succ_cnt = insn_successors(env->prog, w, succ);
- for (j = 0; j < succ_cnt; ++j) {
- if (pre[succ[j]]) {
- low[w] = min(low[w], low[succ[j]]);
+ succ = insn_successors(env, w);
+ if (IS_ERR(succ)) {
+ err = PTR_ERR(succ);
+ goto exit;
+
+ }
+ for (j = 0; j < succ->off_cnt; ++j) {
+ if (pre[succ->off[j]]) {
+ low[w] = min(low[w], low[succ->off[j]]);
} else {
- dfs[dfs_sz++] = succ[j];
+ dfs[dfs_sz++] = succ->off[j];
goto dfs_continue;
}
}
@@ -24608,8 +25046,8 @@ static int compute_scc(struct bpf_verifier_env *env)
* or if component has a self reference.
*/
assign_scc = stack[stack_sz - 1] != w;
- for (j = 0; j < succ_cnt; ++j) {
- if (succ[j] == w) {
+ for (j = 0; j < succ->off_cnt; ++j) {
+ if (succ->off[j] == w) {
assign_scc = true;
break;
}
@@ -24669,6 +25107,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
ret = -ENOMEM;
if (!env->insn_aux_data)
goto err_free_env;
+ env->succ = iarray_realloc(NULL, 2);
+ if (!env->succ)
+ goto err_free_env;
for (i = 0; i < len; i++)
env->insn_aux_data[i].orig_idx = i;
env->prog = *prog;
@@ -24908,10 +25349,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
err_unlock:
if (!is_priv)
mutex_unlock(&bpf_verifier_lock);
+ clear_insn_aux_data(env->insn_aux_data, 0, env->prog->len);
vfree(env->insn_aux_data);
err_free_env:
kvfree(env->cfg.insn_postorder);
kvfree(env->scc_info);
+ kvfree(env->succ);
kvfree(env);
return ret;
}
--
2.34.1
* [PATCH v2 bpf-next 09/13] bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X
2025-09-13 19:39 [PATCH v2 bpf-next 00/13] BPF indirect jumps Anton Protopopov
` (7 preceding siblings ...)
2025-09-13 19:39 ` [PATCH v2 bpf-next 08/13] bpf, x86: add support for indirect jumps Anton Protopopov
@ 2025-09-13 19:39 ` Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 10/13] libbpf: fix formatting of bpf_object__append_subprog_code Anton Protopopov
` (3 subsequent siblings)
12 siblings, 0 replies; 26+ messages in thread
From: Anton Protopopov @ 2025-09-13 19:39 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Add support for the indirect jump instruction.
Example output from bpftool:
0: (79) r3 = *(u64 *)(r1 +0)
1: (25) if r3 > 0x4 goto pc+666
2: (67) r3 <<= 3
3: (18) r1 = 0xffffbeefspameggs
5: (0f) r1 += r3
6: (79) r1 = *(u64 *)(r1 +0)
7: (0d) gotox r1
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
kernel/bpf/disasm.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
index 20883c6b1546..4a1ecc6f7582 100644
--- a/kernel/bpf/disasm.c
+++ b/kernel/bpf/disasm.c
@@ -183,6 +183,13 @@ static inline bool is_mov_percpu_addr(const struct bpf_insn *insn)
return insn->code == (BPF_ALU64 | BPF_MOV | BPF_X) && insn->off == BPF_ADDR_PERCPU;
}
+static void print_bpf_ja_indirect(bpf_insn_print_t verbose,
+ void *private_data,
+ const struct bpf_insn *insn)
+{
+ verbose(private_data, "(%02x) gotox r%d\n", insn->code, insn->dst_reg);
+}
+
void print_bpf_insn(const struct bpf_insn_cbs *cbs,
const struct bpf_insn *insn,
bool allow_ptr_leaks)
@@ -358,6 +365,8 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
} else if (insn->code == (BPF_JMP | BPF_JA)) {
verbose(cbs->private_data, "(%02x) goto pc%+d\n",
insn->code, insn->off);
+ } else if (insn->code == (BPF_JMP | BPF_JA | BPF_X)) {
+ print_bpf_ja_indirect(verbose, cbs->private_data, insn);
} else if (insn->code == (BPF_JMP | BPF_JCOND) &&
insn->src_reg == BPF_MAY_GOTO) {
verbose(cbs->private_data, "(%02x) may_goto pc%+d\n",
--
2.34.1
* [PATCH v2 bpf-next 10/13] libbpf: fix formatting of bpf_object__append_subprog_code
2025-09-13 19:39 [PATCH v2 bpf-next 00/13] BPF indirect jumps Anton Protopopov
` (8 preceding siblings ...)
2025-09-13 19:39 ` [PATCH v2 bpf-next 09/13] bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X Anton Protopopov
@ 2025-09-13 19:39 ` Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 11/13] libbpf: support llvm-generated indirect jumps Anton Protopopov
` (2 subsequent siblings)
12 siblings, 0 replies; 26+ messages in thread
From: Anton Protopopov @ 2025-09-13 19:39 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Commit 6c918709bd30 ("libbpf: Refactor bpf_object__reloc_code")
added bpf_object__append_subprog_code() with incorrect indentation.
Use tabs instead. (This also makes a subsequent commit more readable.)
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
tools/lib/bpf/libbpf.c | 52 +++++++++++++++++++++---------------------
1 file changed, 26 insertions(+), 26 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index fe4fc5438678..2c1f48f77680 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6393,32 +6393,32 @@ static int
bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main_prog,
struct bpf_program *subprog)
{
- struct bpf_insn *insns;
- size_t new_cnt;
- int err;
-
- subprog->sub_insn_off = main_prog->insns_cnt;
-
- new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
- insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns));
- if (!insns) {
- pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name);
- return -ENOMEM;
- }
- main_prog->insns = insns;
- main_prog->insns_cnt = new_cnt;
-
- memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns,
- subprog->insns_cnt * sizeof(*insns));
-
- pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n",
- main_prog->name, subprog->insns_cnt, subprog->name);
-
- /* The subprog insns are now appended. Append its relos too. */
- err = append_subprog_relos(main_prog, subprog);
- if (err)
- return err;
- return 0;
+ struct bpf_insn *insns;
+ size_t new_cnt;
+ int err;
+
+ subprog->sub_insn_off = main_prog->insns_cnt;
+
+ new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
+ insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns));
+ if (!insns) {
+ pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name);
+ return -ENOMEM;
+ }
+ main_prog->insns = insns;
+ main_prog->insns_cnt = new_cnt;
+
+ memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns,
+ subprog->insns_cnt * sizeof(*insns));
+
+ pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n",
+ main_prog->name, subprog->insns_cnt, subprog->name);
+
+ /* The subprog insns are now appended. Append its relos too. */
+ err = append_subprog_relos(main_prog, subprog);
+ if (err)
+ return err;
+ return 0;
}
static int
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v2 bpf-next 11/13] libbpf: support llvm-generated indirect jumps
2025-09-13 19:39 [PATCH v2 bpf-next 00/13] BPF indirect jumps Anton Protopopov
` (9 preceding siblings ...)
2025-09-13 19:39 ` [PATCH v2 bpf-next 10/13] libbpf: fix formatting of bpf_object__append_subprog_code Anton Protopopov
@ 2025-09-13 19:39 ` Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 12/13] bpftool: Recognize insn_array map type Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 13/13] selftests/bpf: add selftests for indirect jumps Anton Protopopov
12 siblings, 0 replies; 26+ messages in thread
From: Anton Protopopov @ 2025-09-13 19:39 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
For the v5 instruction set, LLVM is allowed to generate indirect jumps
for switch statements and for 'goto *rX' assembly. Every such jump is
accompanied by the necessary metadata, e.g. (`llvm-objdump -Sr ...`):
0: r2 = 0x0 ll
0000000000000030: R_BPF_64_64 BPF.JT.0.0
Here BPF.JT.0.0 is a symbol residing in the .jumptables section:
Symbol table:
4: 0000000000000000 240 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0
The -bpf-min-jump-table-entries LLVM option may be used to control the
minimum size of a switch that will be converted to an indirect jump.
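A rough sketch of what libbpf has to do with such a table, not the libbpf code itself: each u64 record in .jumptables is treated as a byte offset, divided by the 8-byte instruction size, and shifted by a per-program delta so that it is relative to the start of the loaded program. The names subprog_off, adjust_off and jt_entry_to_xlated_off are hypothetical, and the offsets in the usage note are made-up numbers.

```c
#include <assert.h>
#include <stdint.h>

#define INSN_SIZE 8 /* assumed sizeof(struct bpf_insn) */

/* Hypothetical stand-in for the per-subprog offsets libbpf records. */
struct subprog_off {
	uint32_t sec_insn_off; /* insn index inside its ELF section */
	uint32_t sub_insn_off; /* insn index after appending to the main prog */
};

/*
 * Delta that turns a section-relative insn index into a
 * program-relative one for the jump located at insn_idx.
 */
static int adjust_off(const struct subprog_off *sp, int cnt,
		      uint32_t main_sec_insn_off, uint32_t insn_idx)
{
	assert(cnt >= 0);

	/* Subprogs are appended in order, so scan from the last one. */
	for (int i = cnt - 1; i >= 0; i--)
		if (insn_idx >= sp[i].sub_insn_off)
			return (int)sp[i].sub_insn_off - (int)sp[i].sec_insn_off;

	/* The jump lives in the main program itself. */
	return -(int)main_sec_insn_off;
}

/*
 * Convert one u64 jump table record (a byte offset) into an insn
 * index. Returns -1 if the result does not fit in u32.
 */
static int64_t jt_entry_to_xlated_off(uint64_t byte_off, int adj)
{
	int64_t off = (int64_t)(byte_off / INSN_SIZE) + adj;

	if (off < 0 || off > UINT32_MAX)
		return -1;
	return off;
}
```

For instance, with a main program at section offset 10 and a single subprog that sits at section offset 50 and was appended at index 20, a jump at index 25 gets a delta of -30, so a record of 440 bytes (section insn 55) maps to index 25.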
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
tools/lib/bpf/libbpf.c | 150 +++++++++++++++++++++++++++++++++-
tools/lib/bpf/libbpf_probes.c | 4 +
tools/lib/bpf/linker.c | 10 ++-
3 files changed, 161 insertions(+), 3 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2c1f48f77680..57cac0810d2e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -191,6 +191,7 @@ static const char * const map_type_name[] = {
[BPF_MAP_TYPE_USER_RINGBUF] = "user_ringbuf",
[BPF_MAP_TYPE_CGRP_STORAGE] = "cgrp_storage",
[BPF_MAP_TYPE_ARENA] = "arena",
+ [BPF_MAP_TYPE_INSN_ARRAY] = "insn_array",
};
static const char * const prog_type_name[] = {
@@ -372,6 +373,7 @@ enum reloc_type {
RELO_EXTERN_CALL,
RELO_SUBPROG_ADDR,
RELO_CORE,
+ RELO_INSN_ARRAY,
};
struct reloc_desc {
@@ -382,7 +384,10 @@ struct reloc_desc {
struct {
int map_idx;
int sym_off;
- int ext_idx;
+ union {
+ int ext_idx;
+ int sym_size;
+ };
};
};
};
@@ -424,6 +429,11 @@ struct bpf_sec_def {
libbpf_prog_attach_fn_t prog_attach_fn;
};
+struct bpf_light_subprog {
+ __u32 sec_insn_off;
+ __u32 sub_insn_off;
+};
+
/*
* bpf_prog should be a better name but it has been used in
* linux/filter.h.
@@ -496,6 +506,9 @@ struct bpf_program {
__u32 line_info_rec_size;
__u32 line_info_cnt;
__u32 prog_flags;
+
+ struct bpf_light_subprog *subprog;
+ __u32 subprog_cnt;
};
struct bpf_struct_ops {
@@ -525,6 +538,7 @@ struct bpf_struct_ops {
#define STRUCT_OPS_SEC ".struct_ops"
#define STRUCT_OPS_LINK_SEC ".struct_ops.link"
#define ARENA_SEC ".addr_space.1"
+#define JUMPTABLES_SEC ".jumptables"
enum libbpf_map_type {
LIBBPF_MAP_UNSPEC,
@@ -668,6 +682,7 @@ struct elf_state {
int symbols_shndx;
bool has_st_ops;
int arena_data_shndx;
+ int jumptables_data_shndx;
};
struct usdt_manager;
@@ -739,6 +754,9 @@ struct bpf_object {
void *arena_data;
size_t arena_data_sz;
+ void *jumptables_data;
+ size_t jumptables_data_sz;
+
struct kern_feature_cache *feat_cache;
char *token_path;
int token_fd;
@@ -765,6 +783,7 @@ void bpf_program__unload(struct bpf_program *prog)
zfree(&prog->func_info);
zfree(&prog->line_info);
+ zfree(&prog->subprog);
}
static void bpf_program__exit(struct bpf_program *prog)
@@ -3945,6 +3964,13 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
} else if (strcmp(name, ARENA_SEC) == 0) {
obj->efile.arena_data = data;
obj->efile.arena_data_shndx = idx;
+ } else if (strcmp(name, JUMPTABLES_SEC) == 0) {
+ obj->jumptables_data = malloc(data->d_size);
+ if (!obj->jumptables_data)
+ return -ENOMEM;
+ memcpy(obj->jumptables_data, data->d_buf, data->d_size);
+ obj->jumptables_data_sz = data->d_size;
+ obj->efile.jumptables_data_shndx = idx;
} else {
pr_info("elf: skipping unrecognized data section(%d) %s\n",
idx, name);
@@ -4599,6 +4625,16 @@ static int bpf_program__record_reloc(struct bpf_program *prog,
return 0;
}
+ /* jump table data relocation */
+ if (shdr_idx == obj->efile.jumptables_data_shndx) {
+ reloc_desc->type = RELO_INSN_ARRAY;
+ reloc_desc->insn_idx = insn_idx;
+ reloc_desc->map_idx = -1;
+ reloc_desc->sym_off = sym->st_value;
+ reloc_desc->sym_size = sym->st_size;
+ return 0;
+ }
+
/* generic map reference relocation */
if (type == LIBBPF_MAP_UNSPEC) {
if (!bpf_object__shndx_is_maps(obj, shdr_idx)) {
@@ -6101,6 +6137,74 @@ static void poison_kfunc_call(struct bpf_program *prog, int relo_idx,
insn->imm = POISON_CALL_KFUNC_BASE + ext_idx;
}
+static int create_jt_map(struct bpf_object *obj, int off, int size, int adjust_off)
+{
+ const __u32 value_size = sizeof(struct bpf_insn_array_value);
+ const __u32 max_entries = size / value_size;
+ struct bpf_insn_array_value val = {};
+ int map_fd, err;
+ __u64 xlated_off;
+ __u64 *jt;
+ __u32 i;
+
+ map_fd = bpf_map_create(BPF_MAP_TYPE_INSN_ARRAY, "jt",
+ 4, value_size, max_entries, NULL);
+ if (map_fd < 0)
+ return map_fd;
+
+ if (!obj->jumptables_data) {
+ pr_warn("object contains no jumptables_data\n");
+ return -EINVAL;
+ }
+ if ((off + size) > obj->jumptables_data_sz) {
+ pr_warn("jumptables_data size is %zu, trying to access %d\n",
+ obj->jumptables_data_sz, off + size);
+ return -EINVAL;
+ }
+
+ jt = (__u64 *)(obj->jumptables_data + off);
+ for (i = 0; i < max_entries; i++) {
+ /*
+ * LLVM-generated jump tables contain u64 records, but the
+ * values are expected to fit in u32.
+ * The adjust_off provided by the caller adjusts the offset to
+ * be relative to the beginning of the main function.
+ */
+ xlated_off = jt[i]/sizeof(struct bpf_insn) + adjust_off;
+ if (xlated_off > UINT32_MAX) {
+ pr_warn("invalid jump table value %llx at offset %d (adjust_off %d)\n",
+ jt[i], off + i, adjust_off);
+ return -EINVAL;
+ }
+
+ val.xlated_off = xlated_off;
+ err = bpf_map_update_elem(map_fd, &i, &val, 0);
+ if (err) {
+ close(map_fd);
+ return err;
+ }
+ }
+ return map_fd;
+}
+
+/*
+ * In LLVM the .jumptables section contains jump tables entries relative to the
+ * section start. The BPF kernel-side code expects jump table offsets relative
+ * to the beginning of the program (passed in bpf(BPF_PROG_LOAD)). This helper
+ * computes a delta to be added when creating a map.
+ */
+static int jt_adjust_off(struct bpf_program *prog, int insn_idx)
+{
+ int i;
+
+ for (i = prog->subprog_cnt - 1; i >= 0; i--)
+ if (insn_idx >= prog->subprog[i].sub_insn_off)
+ return prog->subprog[i].sub_insn_off - prog->subprog[i].sec_insn_off;
+
+ return -prog->sec_insn_off;
+}
+
+
/* Relocate data references within program code:
* - map references;
* - global variable references;
@@ -6192,6 +6296,21 @@ bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog)
case RELO_CORE:
/* will be handled by bpf_program_record_relos() */
break;
+ case RELO_INSN_ARRAY: {
+ int map_fd;
+
+ map_fd = create_jt_map(obj, relo->sym_off, relo->sym_size,
+ jt_adjust_off(prog, relo->insn_idx));
+ if (map_fd < 0) {
+ pr_warn("prog '%s': relo #%d: can't create jump table: sym_off %u\n",
+ prog->name, i, relo->sym_off);
+ return map_fd;
+ }
+ insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;
+ insn->imm = map_fd;
+ insn->off = 0;
+ }
+ break;
default:
pr_warn("prog '%s': relo #%d: bad relo type %d\n",
prog->name, i, relo->type);
@@ -6389,6 +6508,24 @@ static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_progra
return 0;
}
+static int save_subprog_offsets(struct bpf_program *main_prog, struct bpf_program *subprog)
+{
+ size_t size = sizeof(main_prog->subprog[0]);
+ int new_cnt = main_prog->subprog_cnt + 1;
+ void *tmp;
+
+ tmp = libbpf_reallocarray(main_prog->subprog, new_cnt, size);
+ if (!tmp)
+ return -ENOMEM;
+
+ main_prog->subprog = tmp;
+ main_prog->subprog[new_cnt - 1].sec_insn_off = subprog->sec_insn_off;
+ main_prog->subprog[new_cnt - 1].sub_insn_off = subprog->sub_insn_off;
+ main_prog->subprog_cnt = new_cnt;
+
+ return 0;
+}
+
static int
bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main_prog,
struct bpf_program *subprog)
@@ -6418,6 +6555,14 @@ bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main
err = append_subprog_relos(main_prog, subprog);
if (err)
return err;
+
+ /* Save subprogram offsets */
+ err = save_subprog_offsets(main_prog, subprog);
+ if (err) {
+ pr_warn("prog '%s': failed to add subprog offsets\n", main_prog->name);
+ return err;
+ }
+
return 0;
}
@@ -9185,6 +9330,9 @@ void bpf_object__close(struct bpf_object *obj)
zfree(&obj->arena_data);
+ zfree(&obj->jumptables_data);
+ obj->jumptables_data_sz = 0;
+
free(obj);
}
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 9dfbe7750f56..bccf4bb747e1 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -364,6 +364,10 @@ static int probe_map_create(enum bpf_map_type map_type)
case BPF_MAP_TYPE_SOCKHASH:
case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
break;
+ case BPF_MAP_TYPE_INSN_ARRAY:
+ key_size = sizeof(__u32);
+ value_size = sizeof(struct bpf_insn_array_value);
+ break;
case BPF_MAP_TYPE_UNSPEC:
default:
return -EOPNOTSUPP;
diff --git a/tools/lib/bpf/linker.c b/tools/lib/bpf/linker.c
index a469e5d4fee7..d1585baa9f14 100644
--- a/tools/lib/bpf/linker.c
+++ b/tools/lib/bpf/linker.c
@@ -28,6 +28,8 @@
#include "str_error.h"
#define BTF_EXTERN_SEC ".extern"
+#define JUMPTABLES_SEC ".jumptables"
+#define JUMPTABLES_REL_SEC ".rel.jumptables"
struct src_sec {
const char *sec_name;
@@ -2026,6 +2028,9 @@ static int linker_append_elf_sym(struct bpf_linker *linker, struct src_obj *obj,
obj->sym_map[src_sym_idx] = dst_sec->sec_sym_idx;
return 0;
}
+
+ if (strcmp(src_sec->sec_name, JUMPTABLES_SEC) == 0)
+ goto add_sym;
}
if (sym_bind == STB_LOCAL)
@@ -2272,8 +2277,9 @@ static int linker_append_elf_relos(struct bpf_linker *linker, struct src_obj *ob
insn->imm += sec->dst_off / sizeof(struct bpf_insn);
else
insn->imm += sec->dst_off;
- } else {
- pr_warn("relocation against STT_SECTION in non-exec section is not supported!\n");
+ } else if (strcmp(src_sec->sec_name, JUMPTABLES_REL_SEC)) {
+ pr_warn("relocation against STT_SECTION in section %s is not supported!\n",
+ src_sec->sec_name);
return -EINVAL;
}
}
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v2 bpf-next 12/13] bpftool: Recognize insn_array map type
2025-09-13 19:39 [PATCH v2 bpf-next 00/13] BPF indirect jumps Anton Protopopov
` (10 preceding siblings ...)
2025-09-13 19:39 ` [PATCH v2 bpf-next 11/13] libbpf: support llvm-generated indirect jumps Anton Protopopov
@ 2025-09-13 19:39 ` Anton Protopopov
2025-09-16 20:33 ` Quentin Monnet
2025-09-13 19:39 ` [PATCH v2 bpf-next 13/13] selftests/bpf: add selftests for indirect jumps Anton Protopopov
12 siblings, 1 reply; 26+ messages in thread
From: Anton Protopopov @ 2025-09-13 19:39 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Teach bpftool to recognize the instruction array map type.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
tools/bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
tools/bpf/bpftool/map.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 252e4c538edb..3377d4a01c62 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -55,7 +55,7 @@ MAP COMMANDS
| | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
| | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
| | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
-| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** }
+| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** | **insn_array** }
DESCRIPTION
===========
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index c9de44a45778..79b90f274bef 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -1477,7 +1477,7 @@ static int do_help(int argc, char **argv)
" devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
" cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
" queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
- " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena }\n"
+ " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena | insn_array }\n"
" " HELP_SPEC_OPTIONS " |\n"
" {-f|--bpffs} | {-n|--nomount} }\n"
"",
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v2 bpf-next 13/13] selftests/bpf: add selftests for indirect jumps
2025-09-13 19:39 [PATCH v2 bpf-next 00/13] BPF indirect jumps Anton Protopopov
` (11 preceding siblings ...)
2025-09-13 19:39 ` [PATCH v2 bpf-next 12/13] bpftool: Recognize insn_array map type Anton Protopopov
@ 2025-09-13 19:39 ` Anton Protopopov
12 siblings, 0 replies; 26+ messages in thread
From: Anton Protopopov @ 2025-09-13 19:39 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
Cc: Anton Protopopov
Add selftests for indirect jumps. All the indirect jumps are
generated from C switch statements, so the tests should also pass
when built with a compiler which doesn't support indirect jumps.
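As a sanity aid, the expected values used by the test runner can be reproduced in plain C, independent of how the compiler lowers the switch. This is only a sketch: first_switch, second_switch and check_tables are hypothetical names that mirror the switches in simple_test() and two_switches() and the in[]/out[]/out2[] tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the switch in simple_test(): the value stored in ret_user. */
static uint64_t first_switch(uint64_t x)
{
	switch (x) {
	case 0: return 2;
	case 1: return 3;
	case 2: return 4;
	case 3: return 5;
	case 4: return 7;
	default: return 19;
	}
}

/* Mirror of the second switch in two_switches(), keyed on x + !!ret_user. */
static uint64_t second_switch(uint64_t x)
{
	/* The first switch always leaves a non-zero ret_user, so the key is x + 1. */
	uint64_t key = x + !!first_switch(x);

	switch (key) {
	case 1: return 103;
	case 2: return 104;
	case 3: return 107;
	case 4: return 205;
	case 5: return 115;
	default: return 1019;
	}
}

/* Compare the mirrors against the tables used by the test runner. */
static void check_tables(void)
{
	const uint64_t in[] = {0, 1, 2, 3, 4, 5, 77};
	const uint64_t out[] = {2, 3, 4, 5, 7, 19, 19};
	const uint64_t out2[] = {103, 104, 107, 205, 115, 1019, 1019};

	for (size_t i = 0; i < sizeof(in) / sizeof(in[0]); i++) {
		assert(first_switch(in[i]) == out[i]);
		assert(second_switch(in[i]) == out2[i]);
	}
}
```

The results depend only on the C semantics of the switch, which is why the same expectations hold whether or not a jump table is emitted.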
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
tools/testing/selftests/bpf/Makefile | 4 +-
.../selftests/bpf/prog_tests/bpf_gotox.c | 132 ++++++
tools/testing/selftests/bpf/progs/bpf_gotox.c | 384 ++++++++++++++++++
3 files changed, 519 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_gotox.c
create mode 100644 tools/testing/selftests/bpf/progs/bpf_gotox.c
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 11d2a368db3e..606d7d5a48a7 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -453,7 +453,9 @@ BPF_CFLAGS = -g -Wall -Werror -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN) \
-I$(abspath $(OUTPUT)/../usr/include) \
-std=gnu11 \
-fno-strict-aliasing \
- -Wno-compare-distinct-pointer-types
+ -Wno-compare-distinct-pointer-types \
+ -Wno-initializer-overrides \
+ #
# TODO: enable me -Wsign-compare
CLANG_CFLAGS = $(CLANG_SYS_INCLUDES)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_gotox.c b/tools/testing/selftests/bpf/prog_tests/bpf_gotox.c
new file mode 100644
index 000000000000..90647c080579
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_gotox.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/in6.h>
+#include <linux/udp.h>
+#include <linux/tcp.h>
+
+#include <sys/syscall.h>
+#include <bpf/bpf.h>
+
+#include "bpf_gotox.skel.h"
+
+static void __test_run(struct bpf_program *prog, void *ctx_in, size_t ctx_size_in)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, topts,
+ .ctx_in = ctx_in,
+ .ctx_size_in = ctx_size_in,
+ );
+ int err, prog_fd;
+
+ prog_fd = bpf_program__fd(prog);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "test_run_opts err");
+}
+
+static void check_simple(struct bpf_gotox *skel,
+ struct bpf_program *prog,
+ __u64 ctx_in,
+ __u64 expected)
+{
+ skel->bss->ret_user = 0;
+
+ __test_run(prog, &ctx_in, sizeof(ctx_in));
+
+ if (!ASSERT_EQ(skel->bss->ret_user, expected, "skel->bss->ret_user"))
+ return;
+}
+
+static void check_simple_fentry(struct bpf_gotox *skel,
+ struct bpf_program *prog,
+ __u64 ctx_in,
+ __u64 expected)
+{
+ skel->bss->in_user = ctx_in;
+ skel->bss->ret_user = 0;
+
+ /* trigger */
+ usleep(1);
+
+ if (!ASSERT_EQ(skel->bss->ret_user, expected, "skel->bss->ret_user"))
+ return;
+}
+
+static void check_gotox_skel(struct bpf_gotox *skel)
+{
+ int i;
+ __u64 in[] = {0, 1, 2, 3, 4, 5, 77};
+ __u64 out[] = {2, 3, 4, 5, 7, 19, 19};
+ __u64 out2[] = {103, 104, 107, 205, 115, 1019, 1019};
+ __u64 in3[] = {0, 11, 27, 31, 22, 45, 99};
+ __u64 out3[] = {2, 3, 4, 5, 19, 19, 19};
+ __u64 in4[] = {0, 1, 2, 3, 4, 5, 77};
+ __u64 out4[] = {12, 15, 7 , 15, 12, 15, 15};
+
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.simple_test, in[i], out[i]);
+
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.simple_test2, in[i], out[i]);
+
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.two_switches, in[i], out2[i]);
+
+ if (0) for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.big_jump_table, in3[i], out3[i]);
+
+ if (0) for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.one_jump_two_maps, in4[i], out4[i]);
+
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.use_static_global1, in[i], out[i]);
+
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.use_static_global2, in[i], out[i]);
+
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.use_nonstatic_global1, in[i], out[i]);
+
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple(skel, skel->progs.use_nonstatic_global2, in[i], out[i]);
+
+ bpf_program__attach(skel->progs.simple_test_other_sec);
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple_fentry(skel, skel->progs.simple_test_other_sec, in[i], out[i]);
+
+ bpf_program__attach(skel->progs.use_static_global_other_sec);
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple_fentry(skel, skel->progs.use_static_global_other_sec, in[i], out[i]);
+
+ bpf_program__attach(skel->progs.use_nonstatic_global_other_sec);
+ for (i = 0; i < ARRAY_SIZE(in); i++)
+ check_simple_fentry(skel, skel->progs.use_nonstatic_global_other_sec, in[i], out[i]);
+}
+
+void gotox_skel(void)
+{
+ struct bpf_gotox *skel;
+ int ret;
+
+ skel = bpf_gotox__open();
+ if (!ASSERT_NEQ(skel, NULL, "bpf_gotox__open"))
+ return;
+
+ ret = bpf_gotox__load(skel);
+ if (!ASSERT_OK(ret, "bpf_gotox__load"))
+ return;
+
+ check_gotox_skel(skel);
+
+ bpf_gotox__destroy(skel);
+}
+
+void test_bpf_gotox(void)
+{
+ if (test__start_subtest("gotox_skel"))
+ gotox_skel();
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_gotox.c b/tools/testing/selftests/bpf/progs/bpf_gotox.c
new file mode 100644
index 000000000000..72917f34315c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_gotox.c
@@ -0,0 +1,384 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_misc.h"
+
+__u64 in_user;
+__u64 ret_user;
+
+struct simple_ctx {
+ __u64 x;
+};
+
+__u64 some_var;
+
+/*
+ * This function adds code which will be replaced by a different
+ * number of instructions by the verifier. This adds additional
+ * stress on testing the insn_array maps corresponding to indirect jumps.
+ */
+static __always_inline void adjust_insns(__u64 x)
+{
+ some_var ^= x + bpf_jiffies64();
+}
+
+SEC("syscall")
+int simple_test(struct simple_ctx *ctx)
+{
+ switch (ctx->x) {
+ case 0:
+ adjust_insns(ctx->x + 1);
+ ret_user = 2;
+ break;
+ case 1:
+ adjust_insns(ctx->x + 7);
+ ret_user = 3;
+ break;
+ case 2:
+ adjust_insns(ctx->x + 9);
+ ret_user = 4;
+ break;
+ case 3:
+ adjust_insns(ctx->x + 11);
+ ret_user = 5;
+ break;
+ case 4:
+ adjust_insns(ctx->x + 17);
+ ret_user = 7;
+ break;
+ default:
+ adjust_insns(ctx->x + 177);
+ ret_user = 19;
+ break;
+ }
+
+ return 0;
+}
+
+SEC("syscall")
+int simple_test2(struct simple_ctx *ctx)
+{
+ switch (ctx->x) {
+ case 0:
+ adjust_insns(ctx->x + 1);
+ ret_user = 2;
+ break;
+ case 1:
+ adjust_insns(ctx->x + 7);
+ ret_user = 3;
+ break;
+ case 2:
+ adjust_insns(ctx->x + 9);
+ ret_user = 4;
+ break;
+ case 3:
+ adjust_insns(ctx->x + 11);
+ ret_user = 5;
+ break;
+ case 4:
+ adjust_insns(ctx->x + 17);
+ ret_user = 7;
+ break;
+ default:
+ adjust_insns(ctx->x + 177);
+ ret_user = 19;
+ break;
+ }
+
+ return 0;
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int simple_test_other_sec(struct pt_regs *ctx)
+{
+ __u64 x = in_user;
+
+ switch (x) {
+ case 0:
+ adjust_insns(x + 1);
+ ret_user = 2;
+ break;
+ case 1:
+ adjust_insns(x + 7);
+ ret_user = 3;
+ break;
+ case 2:
+ adjust_insns(x + 9);
+ ret_user = 4;
+ break;
+ case 3:
+ adjust_insns(x + 11);
+ ret_user = 5;
+ break;
+ case 4:
+ adjust_insns(x + 17);
+ ret_user = 7;
+ break;
+ default:
+ adjust_insns(x + 177);
+ ret_user = 19;
+ break;
+ }
+
+ return 0;
+}
+
+SEC("syscall")
+int two_switches(struct simple_ctx *ctx)
+{
+ switch (ctx->x) {
+ case 0:
+ adjust_insns(ctx->x + 1);
+ ret_user = 2;
+ break;
+ case 1:
+ adjust_insns(ctx->x + 7);
+ ret_user = 3;
+ break;
+ case 2:
+ adjust_insns(ctx->x + 9);
+ ret_user = 4;
+ break;
+ case 3:
+ adjust_insns(ctx->x + 11);
+ ret_user = 5;
+ break;
+ case 4:
+ adjust_insns(ctx->x + 17);
+ ret_user = 7;
+ break;
+ default:
+ adjust_insns(ctx->x + 177);
+ ret_user = 19;
+ break;
+ }
+
+ switch (ctx->x + !!ret_user) {
+ case 1:
+ adjust_insns(ctx->x + 7);
+ ret_user = 103;
+ break;
+ case 2:
+ adjust_insns(ctx->x + 9);
+ ret_user = 104;
+ break;
+ case 3:
+ adjust_insns(ctx->x + 11);
+ ret_user = 107;
+ break;
+ case 4:
+ adjust_insns(ctx->x + 11);
+ ret_user = 205;
+ break;
+ case 5:
+ adjust_insns(ctx->x + 11);
+ ret_user = 115;
+ break;
+ default:
+ adjust_insns(ctx->x + 177);
+ ret_user = 1019;
+ break;
+ }
+
+ return 0;
+}
+
+SEC("syscall")
+int big_jump_table(struct simple_ctx *ctx __attribute__((unused)))
+{
+#if 0
+ const void *const jt[256] = {
+ [0 ... 255] = &&default_label,
+ [0] = &&l0,
+ [11] = &&l11,
+ [27] = &&l27,
+ [31] = &&l31,
+ };
+
+ goto *jt[ctx->x & 0xff];
+
+l0:
+ adjust_insns(ctx->x + 1);
+ ret_user = 2;
+ return 0;
+
+l11:
+ adjust_insns(ctx->x + 7);
+ ret_user = 3;
+ return 0;
+
+l27:
+ adjust_insns(ctx->x + 9);
+ ret_user = 4;
+ return 0;
+
+l31:
+ adjust_insns(ctx->x + 11);
+ ret_user = 5;
+ return 0;
+
+default_label:
+ adjust_insns(ctx->x + 177);
+ ret_user = 19;
+ return 0;
+#else
+ return 0;
+#endif
+}
+
+SEC("syscall")
+int one_jump_two_maps(struct simple_ctx *ctx __attribute__((unused)))
+{
+#if 0
+ __label__ l1, l2, l3, l4;
+ void *jt1[2] = { &&l1, &&l2 };
+ void *jt2[2] = { &&l3, &&l4 };
+ unsigned int a = ctx->x % 2;
+ unsigned int b = (ctx->x / 2) % 2;
+ volatile int ret = 0;
+
+ if (!(a < 2 && b < 2))
+ return 19;
+
+ if (ctx->x % 2)
+ goto *jt1[a];
+ else
+ goto *jt2[b];
+
+ l1: ret += 1;
+ l2: ret += 3;
+ l3: ret += 5;
+ l4: ret += 7;
+
+ ret_user = ret;
+ return ret;
+#else
+ return 0;
+#endif
+}
+
+/* Just to introduce some non-zero offsets in .text */
+static __noinline int f0(volatile struct simple_ctx *ctx __arg_ctx)
+{
+ if (ctx)
+ return 1;
+ else
+ return 13;
+}
+
+SEC("syscall") int f1(struct simple_ctx *ctx)
+{
+ ret_user = 0;
+ return f0(ctx);
+}
+
+static __noinline int __static_global(__u64 x)
+{
+ switch (x) {
+ case 0:
+ adjust_insns(x + 1);
+ ret_user = 2;
+ break;
+ case 1:
+ adjust_insns(x + 7);
+ ret_user = 3;
+ break;
+ case 2:
+ adjust_insns(x + 9);
+ ret_user = 4;
+ break;
+ case 3:
+ adjust_insns(x + 11);
+ ret_user = 5;
+ break;
+ case 4:
+ adjust_insns(x + 17);
+ ret_user = 7;
+ break;
+ default:
+ adjust_insns(x + 177);
+ ret_user = 19;
+ break;
+ }
+
+ return 0;
+}
+
+SEC("syscall")
+int use_static_global1(struct simple_ctx *ctx)
+{
+ ret_user = 0;
+ return __static_global(ctx->x);
+}
+
+SEC("syscall")
+int use_static_global2(struct simple_ctx *ctx)
+{
+ ret_user = 0;
+ adjust_insns(ctx->x + 1);
+ return __static_global(ctx->x);
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int use_static_global_other_sec(void *ctx)
+{
+ return __static_global(in_user);
+}
+
+__noinline int __nonstatic_global(__u64 x)
+{
+ switch (x) {
+ case 0:
+ adjust_insns(x + 1);
+ ret_user = 2;
+ break;
+ case 1:
+ adjust_insns(x + 7);
+ ret_user = 3;
+ break;
+ case 2:
+ adjust_insns(x + 9);
+ ret_user = 4;
+ break;
+ case 3:
+ adjust_insns(x + 11);
+ ret_user = 5;
+ break;
+ case 4:
+ adjust_insns(x + 17);
+ ret_user = 7;
+ break;
+ default:
+ adjust_insns(x + 177);
+ ret_user = 19;
+ break;
+ }
+
+ return 0;
+}
+
+SEC("syscall")
+int use_nonstatic_global1(struct simple_ctx *ctx)
+{
+ ret_user = 0;
+ return __nonstatic_global(ctx->x);
+}
+
+SEC("syscall")
+int use_nonstatic_global2(struct simple_ctx *ctx)
+{
+ ret_user = 0;
+ adjust_insns(ctx->x + 1);
+ return __nonstatic_global(ctx->x);
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int use_nonstatic_global_other_sec(void *ctx)
+{
+ return __nonstatic_global(in_user);
+}
+
+char _license[] SEC("license") = "GPL";
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array
2025-09-13 19:39 ` [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array Anton Protopopov
@ 2025-09-15 4:09 ` kernel test robot
2025-09-20 0:30 ` Alexei Starovoitov
1 sibling, 0 replies; 26+ messages in thread
From: kernel test robot @ 2025-09-15 4:09 UTC (permalink / raw)
To: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
Anton Protopopov, Daniel Borkmann, Eduard Zingerman,
Quentin Monnet, Yonghong Song
Cc: oe-kbuild-all
Hi Anton,
kernel test robot noticed the following build errors:
[auto build test ERROR on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Anton-Protopopov/bpf-fix-the-return-value-of-push_stack/20250914-033453
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20250913193922.1910480-4-a.s.protopopov%40gmail.com
patch subject: [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array
config: x86_64-randconfig-078-20250914 (https://download.01.org/0day-ci/archive/20250915/202509151152.1FcyFoR8-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250915/202509151152.1FcyFoR8-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509151152.1FcyFoR8-lkp@intel.com/
All errors (new ones prefixed by >>):
ld: arch/x86/net/bpf_jit_comp.o: in function `do_jit':
>> arch/x86/net/bpf_jit_comp.c:2726:(.text+0xbb78): undefined reference to `bpf_prog_update_insn_ptr'
vim +2726 arch/x86/net/bpf_jit_comp.c
1603
1604 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
1605 int oldproglen, struct jit_context *ctx, bool jmp_padding)
1606 {
1607 bool tail_call_reachable = bpf_prog->aux->tail_call_reachable;
1608 struct bpf_insn *insn = bpf_prog->insnsi;
1609 bool callee_regs_used[4] = {};
1610 int insn_cnt = bpf_prog->len;
1611 bool seen_exit = false;
1612 u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
1613 void __percpu *priv_frame_ptr = NULL;
1614 u64 arena_vm_start, user_vm_start;
1615 void __percpu *priv_stack_ptr;
1616 int i, excnt = 0;
1617 int ilen, proglen = 0;
1618 u8 *prog = temp;
1619 u32 stack_depth;
1620 int err;
1621
1622 stack_depth = bpf_prog->aux->stack_depth;
1623 priv_stack_ptr = bpf_prog->aux->priv_stack_ptr;
1624 if (priv_stack_ptr) {
1625 priv_frame_ptr = priv_stack_ptr + PRIV_STACK_GUARD_SZ + round_up(stack_depth, 8);
1626 stack_depth = 0;
1627 }
1628
1629 arena_vm_start = bpf_arena_get_kern_vm_start(bpf_prog->aux->arena);
1630 user_vm_start = bpf_arena_get_user_vm_start(bpf_prog->aux->arena);
1631
1632 detect_reg_usage(insn, insn_cnt, callee_regs_used);
1633
1634 emit_prologue(&prog, image, stack_depth,
1635 bpf_prog_was_classic(bpf_prog), tail_call_reachable,
1636 bpf_is_subprog(bpf_prog), bpf_prog->aux->exception_cb);
1637 /* Exception callback will clobber callee regs for its own use, and
1638 * restore the original callee regs from main prog's stack frame.
1639 */
1640 if (bpf_prog->aux->exception_boundary) {
1641 /* We also need to save r12, which is not mapped to any BPF
1642 * register, as we throw after entry into the kernel, which may
1643 * overwrite r12.
1644 */
1645 push_r12(&prog);
1646 push_callee_regs(&prog, all_callee_regs_used);
1647 } else {
1648 if (arena_vm_start)
1649 push_r12(&prog);
1650 push_callee_regs(&prog, callee_regs_used);
1651 }
1652 if (arena_vm_start)
1653 emit_mov_imm64(&prog, X86_REG_R12,
1654 arena_vm_start >> 32, (u32) arena_vm_start);
1655
1656 if (priv_frame_ptr)
1657 emit_priv_frame_ptr(&prog, priv_frame_ptr);
1658
1659 ilen = prog - temp;
1660 if (rw_image)
1661 memcpy(rw_image + proglen, temp, ilen);
1662 proglen += ilen;
1663 addrs[0] = proglen;
1664 prog = temp;
1665
1666 for (i = 1; i <= insn_cnt; i++, insn++) {
1667 u32 abs_xlated_off = bpf_prog->aux->subprog_start + i - 1;
1668 const s32 imm32 = insn->imm;
1669 u32 dst_reg = insn->dst_reg;
1670 u32 src_reg = insn->src_reg;
1671 u8 b2 = 0, b3 = 0;
1672 u8 *start_of_ldx;
1673 s64 jmp_offset;
1674 s16 insn_off;
1675 u8 jmp_cond;
1676 u8 *func;
1677 int nops;
1678
1679 if (priv_frame_ptr) {
1680 if (src_reg == BPF_REG_FP)
1681 src_reg = X86_REG_R9;
1682
1683 if (dst_reg == BPF_REG_FP)
1684 dst_reg = X86_REG_R9;
1685 }
1686
1687 switch (insn->code) {
1688 /* ALU */
1689 case BPF_ALU | BPF_ADD | BPF_X:
1690 case BPF_ALU | BPF_SUB | BPF_X:
1691 case BPF_ALU | BPF_AND | BPF_X:
1692 case BPF_ALU | BPF_OR | BPF_X:
1693 case BPF_ALU | BPF_XOR | BPF_X:
1694 case BPF_ALU64 | BPF_ADD | BPF_X:
1695 case BPF_ALU64 | BPF_SUB | BPF_X:
1696 case BPF_ALU64 | BPF_AND | BPF_X:
1697 case BPF_ALU64 | BPF_OR | BPF_X:
1698 case BPF_ALU64 | BPF_XOR | BPF_X:
1699 maybe_emit_mod(&prog, dst_reg, src_reg,
1700 BPF_CLASS(insn->code) == BPF_ALU64);
1701 b2 = simple_alu_opcodes[BPF_OP(insn->code)];
1702 EMIT2(b2, add_2reg(0xC0, dst_reg, src_reg));
1703 break;
1704
1705 case BPF_ALU64 | BPF_MOV | BPF_X:
1706 if (insn_is_cast_user(insn)) {
1707 if (dst_reg != src_reg)
1708 /* 32-bit mov */
1709 emit_mov_reg(&prog, false, dst_reg, src_reg);
1710 /* shl dst_reg, 32 */
1711 maybe_emit_1mod(&prog, dst_reg, true);
1712 EMIT3(0xC1, add_1reg(0xE0, dst_reg), 32);
1713
1714 /* or dst_reg, user_vm_start */
1715 maybe_emit_1mod(&prog, dst_reg, true);
1716 if (is_axreg(dst_reg))
1717 EMIT1_off32(0x0D, user_vm_start >> 32);
1718 else
1719 EMIT2_off32(0x81, add_1reg(0xC8, dst_reg), user_vm_start >> 32);
1720
1721 /* rol dst_reg, 32 */
1722 maybe_emit_1mod(&prog, dst_reg, true);
1723 EMIT3(0xC1, add_1reg(0xC0, dst_reg), 32);
1724
1725 /* xor r11, r11 */
1726 EMIT3(0x4D, 0x31, 0xDB);
1727
1728 /* test dst_reg32, dst_reg32; check if lower 32-bit are zero */
1729 maybe_emit_mod(&prog, dst_reg, dst_reg, false);
1730 EMIT2(0x85, add_2reg(0xC0, dst_reg, dst_reg));
1731
1732 /* cmove r11, dst_reg; if so, set dst_reg to zero */
1733 /* WARNING: Intel swapped src/dst register encoding in CMOVcc !!! */
1734 maybe_emit_mod(&prog, AUX_REG, dst_reg, true);
1735 EMIT3(0x0F, 0x44, add_2reg(0xC0, AUX_REG, dst_reg));
1736 break;
1737 } else if (insn_is_mov_percpu_addr(insn)) {
1738 /* mov <dst>, <src> (if necessary) */
1739 EMIT_mov(dst_reg, src_reg);
1740 #ifdef CONFIG_SMP
1741 /* add <dst>, gs:[<off>] */
1742 EMIT2(0x65, add_1mod(0x48, dst_reg));
1743 EMIT3(0x03, add_2reg(0x04, 0, dst_reg), 0x25);
1744 EMIT((u32)(unsigned long)&this_cpu_off, 4);
1745 #endif
1746 break;
1747 }
1748 fallthrough;
1749 case BPF_ALU | BPF_MOV | BPF_X:
1750 if (insn->off == 0)
1751 emit_mov_reg(&prog,
1752 BPF_CLASS(insn->code) == BPF_ALU64,
1753 dst_reg, src_reg);
1754 else
1755 emit_movsx_reg(&prog, insn->off,
1756 BPF_CLASS(insn->code) == BPF_ALU64,
1757 dst_reg, src_reg);
1758 break;
1759
1760 /* neg dst */
1761 case BPF_ALU | BPF_NEG:
1762 case BPF_ALU64 | BPF_NEG:
1763 maybe_emit_1mod(&prog, dst_reg,
1764 BPF_CLASS(insn->code) == BPF_ALU64);
1765 EMIT2(0xF7, add_1reg(0xD8, dst_reg));
1766 break;
1767
1768 case BPF_ALU | BPF_ADD | BPF_K:
1769 case BPF_ALU | BPF_SUB | BPF_K:
1770 case BPF_ALU | BPF_AND | BPF_K:
1771 case BPF_ALU | BPF_OR | BPF_K:
1772 case BPF_ALU | BPF_XOR | BPF_K:
1773 case BPF_ALU64 | BPF_ADD | BPF_K:
1774 case BPF_ALU64 | BPF_SUB | BPF_K:
1775 case BPF_ALU64 | BPF_AND | BPF_K:
1776 case BPF_ALU64 | BPF_OR | BPF_K:
1777 case BPF_ALU64 | BPF_XOR | BPF_K:
1778 maybe_emit_1mod(&prog, dst_reg,
1779 BPF_CLASS(insn->code) == BPF_ALU64);
1780
1781 /*
1782 * b3 holds 'normal' opcode, b2 short form only valid
1783 * in case dst is eax/rax.
1784 */
1785 switch (BPF_OP(insn->code)) {
1786 case BPF_ADD:
1787 b3 = 0xC0;
1788 b2 = 0x05;
1789 break;
1790 case BPF_SUB:
1791 b3 = 0xE8;
1792 b2 = 0x2D;
1793 break;
1794 case BPF_AND:
1795 b3 = 0xE0;
1796 b2 = 0x25;
1797 break;
1798 case BPF_OR:
1799 b3 = 0xC8;
1800 b2 = 0x0D;
1801 break;
1802 case BPF_XOR:
1803 b3 = 0xF0;
1804 b2 = 0x35;
1805 break;
1806 }
1807
1808 if (is_imm8(imm32))
1809 EMIT3(0x83, add_1reg(b3, dst_reg), imm32);
1810 else if (is_axreg(dst_reg))
1811 EMIT1_off32(b2, imm32);
1812 else
1813 EMIT2_off32(0x81, add_1reg(b3, dst_reg), imm32);
1814 break;
1815
1816 case BPF_ALU64 | BPF_MOV | BPF_K:
1817 case BPF_ALU | BPF_MOV | BPF_K:
1818 emit_mov_imm32(&prog, BPF_CLASS(insn->code) == BPF_ALU64,
1819 dst_reg, imm32);
1820 break;
1821
1822 case BPF_LD | BPF_IMM | BPF_DW:
1823 emit_mov_imm64(&prog, dst_reg, insn[1].imm, insn[0].imm);
1824 insn++;
1825 i++;
1826 break;
1827
1828 /* dst %= src, dst /= src, dst %= imm32, dst /= imm32 */
1829 case BPF_ALU | BPF_MOD | BPF_X:
1830 case BPF_ALU | BPF_DIV | BPF_X:
1831 case BPF_ALU | BPF_MOD | BPF_K:
1832 case BPF_ALU | BPF_DIV | BPF_K:
1833 case BPF_ALU64 | BPF_MOD | BPF_X:
1834 case BPF_ALU64 | BPF_DIV | BPF_X:
1835 case BPF_ALU64 | BPF_MOD | BPF_K:
1836 case BPF_ALU64 | BPF_DIV | BPF_K: {
1837 bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
1838
1839 if (dst_reg != BPF_REG_0)
1840 EMIT1(0x50); /* push rax */
1841 if (dst_reg != BPF_REG_3)
1842 EMIT1(0x52); /* push rdx */
1843
1844 if (BPF_SRC(insn->code) == BPF_X) {
1845 if (src_reg == BPF_REG_0 ||
1846 src_reg == BPF_REG_3) {
1847 /* mov r11, src_reg */
1848 EMIT_mov(AUX_REG, src_reg);
1849 src_reg = AUX_REG;
1850 }
1851 } else {
1852 /* mov r11, imm32 */
1853 EMIT3_off32(0x49, 0xC7, 0xC3, imm32);
1854 src_reg = AUX_REG;
1855 }
1856
1857 if (dst_reg != BPF_REG_0)
1858 /* mov rax, dst_reg */
1859 emit_mov_reg(&prog, is64, BPF_REG_0, dst_reg);
1860
1861 if (insn->off == 0) {
1862 /*
1863 * xor edx, edx
1864 * equivalent to 'xor rdx, rdx', but one byte less
1865 */
1866 EMIT2(0x31, 0xd2);
1867
1868 /* div src_reg */
1869 maybe_emit_1mod(&prog, src_reg, is64);
1870 EMIT2(0xF7, add_1reg(0xF0, src_reg));
1871 } else {
1872 if (BPF_CLASS(insn->code) == BPF_ALU)
1873 EMIT1(0x99); /* cdq */
1874 else
1875 EMIT2(0x48, 0x99); /* cqo */
1876
1877 /* idiv src_reg */
1878 maybe_emit_1mod(&prog, src_reg, is64);
1879 EMIT2(0xF7, add_1reg(0xF8, src_reg));
1880 }
1881
1882 if (BPF_OP(insn->code) == BPF_MOD &&
1883 dst_reg != BPF_REG_3)
1884 /* mov dst_reg, rdx */
1885 emit_mov_reg(&prog, is64, dst_reg, BPF_REG_3);
1886 else if (BPF_OP(insn->code) == BPF_DIV &&
1887 dst_reg != BPF_REG_0)
1888 /* mov dst_reg, rax */
1889 emit_mov_reg(&prog, is64, dst_reg, BPF_REG_0);
1890
1891 if (dst_reg != BPF_REG_3)
1892 EMIT1(0x5A); /* pop rdx */
1893 if (dst_reg != BPF_REG_0)
1894 EMIT1(0x58); /* pop rax */
1895 break;
1896 }
1897
1898 case BPF_ALU | BPF_MUL | BPF_K:
1899 case BPF_ALU64 | BPF_MUL | BPF_K:
1900 maybe_emit_mod(&prog, dst_reg, dst_reg,
1901 BPF_CLASS(insn->code) == BPF_ALU64);
1902
1903 if (is_imm8(imm32))
1904 /* imul dst_reg, dst_reg, imm8 */
1905 EMIT3(0x6B, add_2reg(0xC0, dst_reg, dst_reg),
1906 imm32);
1907 else
1908 /* imul dst_reg, dst_reg, imm32 */
1909 EMIT2_off32(0x69,
1910 add_2reg(0xC0, dst_reg, dst_reg),
1911 imm32);
1912 break;
1913
1914 case BPF_ALU | BPF_MUL | BPF_X:
1915 case BPF_ALU64 | BPF_MUL | BPF_X:
1916 maybe_emit_mod(&prog, src_reg, dst_reg,
1917 BPF_CLASS(insn->code) == BPF_ALU64);
1918
1919 /* imul dst_reg, src_reg */
1920 EMIT3(0x0F, 0xAF, add_2reg(0xC0, src_reg, dst_reg));
1921 break;
1922
1923 /* Shifts */
1924 case BPF_ALU | BPF_LSH | BPF_K:
1925 case BPF_ALU | BPF_RSH | BPF_K:
1926 case BPF_ALU | BPF_ARSH | BPF_K:
1927 case BPF_ALU64 | BPF_LSH | BPF_K:
1928 case BPF_ALU64 | BPF_RSH | BPF_K:
1929 case BPF_ALU64 | BPF_ARSH | BPF_K:
1930 maybe_emit_1mod(&prog, dst_reg,
1931 BPF_CLASS(insn->code) == BPF_ALU64);
1932
1933 b3 = simple_alu_opcodes[BPF_OP(insn->code)];
1934 if (imm32 == 1)
1935 EMIT2(0xD1, add_1reg(b3, dst_reg));
1936 else
1937 EMIT3(0xC1, add_1reg(b3, dst_reg), imm32);
1938 break;
1939
1940 case BPF_ALU | BPF_LSH | BPF_X:
1941 case BPF_ALU | BPF_RSH | BPF_X:
1942 case BPF_ALU | BPF_ARSH | BPF_X:
1943 case BPF_ALU64 | BPF_LSH | BPF_X:
1944 case BPF_ALU64 | BPF_RSH | BPF_X:
1945 case BPF_ALU64 | BPF_ARSH | BPF_X:
1946 /* BMI2 shifts aren't better when shift count is already in rcx */
1947 if (boot_cpu_has(X86_FEATURE_BMI2) && src_reg != BPF_REG_4) {
1948 /* shrx/sarx/shlx dst_reg, dst_reg, src_reg */
1949 bool w = (BPF_CLASS(insn->code) == BPF_ALU64);
1950 u8 op;
1951
1952 switch (BPF_OP(insn->code)) {
1953 case BPF_LSH:
1954 op = 1; /* prefix 0x66 */
1955 break;
1956 case BPF_RSH:
1957 op = 3; /* prefix 0xf2 */
1958 break;
1959 case BPF_ARSH:
1960 op = 2; /* prefix 0xf3 */
1961 break;
1962 }
1963
1964 emit_shiftx(&prog, dst_reg, src_reg, w, op);
1965
1966 break;
1967 }
1968
1969 if (src_reg != BPF_REG_4) { /* common case */
1970 /* Check for bad case when dst_reg == rcx */
1971 if (dst_reg == BPF_REG_4) {
1972 /* mov r11, dst_reg */
1973 EMIT_mov(AUX_REG, dst_reg);
1974 dst_reg = AUX_REG;
1975 } else {
1976 EMIT1(0x51); /* push rcx */
1977 }
1978 /* mov rcx, src_reg */
1979 EMIT_mov(BPF_REG_4, src_reg);
1980 }
1981
1982 /* shl %rax, %cl | shr %rax, %cl | sar %rax, %cl */
1983 maybe_emit_1mod(&prog, dst_reg,
1984 BPF_CLASS(insn->code) == BPF_ALU64);
1985
1986 b3 = simple_alu_opcodes[BPF_OP(insn->code)];
1987 EMIT2(0xD3, add_1reg(b3, dst_reg));
1988
1989 if (src_reg != BPF_REG_4) {
1990 if (insn->dst_reg == BPF_REG_4)
1991 /* mov dst_reg, r11 */
1992 EMIT_mov(insn->dst_reg, AUX_REG);
1993 else
1994 EMIT1(0x59); /* pop rcx */
1995 }
1996
1997 break;
1998
1999 case BPF_ALU | BPF_END | BPF_FROM_BE:
2000 case BPF_ALU64 | BPF_END | BPF_FROM_LE:
2001 switch (imm32) {
2002 case 16:
2003 /* Emit 'ror %ax, 8' to swap lower 2 bytes */
2004 EMIT1(0x66);
2005 if (is_ereg(dst_reg))
2006 EMIT1(0x41);
2007 EMIT3(0xC1, add_1reg(0xC8, dst_reg), 8);
2008
2009 /* Emit 'movzwl eax, ax' */
2010 if (is_ereg(dst_reg))
2011 EMIT3(0x45, 0x0F, 0xB7);
2012 else
2013 EMIT2(0x0F, 0xB7);
2014 EMIT1(add_2reg(0xC0, dst_reg, dst_reg));
2015 break;
2016 case 32:
2017 /* Emit 'bswap eax' to swap lower 4 bytes */
2018 if (is_ereg(dst_reg))
2019 EMIT2(0x41, 0x0F);
2020 else
2021 EMIT1(0x0F);
2022 EMIT1(add_1reg(0xC8, dst_reg));
2023 break;
2024 case 64:
2025 /* Emit 'bswap rax' to swap 8 bytes */
2026 EMIT3(add_1mod(0x48, dst_reg), 0x0F,
2027 add_1reg(0xC8, dst_reg));
2028 break;
2029 }
2030 break;
2031
2032 case BPF_ALU | BPF_END | BPF_FROM_LE:
2033 switch (imm32) {
2034 case 16:
2035 /*
2036 * Emit 'movzwl eax, ax' to zero extend 16-bit
2037 * into 64 bit
2038 */
2039 if (is_ereg(dst_reg))
2040 EMIT3(0x45, 0x0F, 0xB7);
2041 else
2042 EMIT2(0x0F, 0xB7);
2043 EMIT1(add_2reg(0xC0, dst_reg, dst_reg));
2044 break;
2045 case 32:
2046 /* Emit 'mov eax, eax' to clear upper 32-bits */
2047 if (is_ereg(dst_reg))
2048 EMIT1(0x45);
2049 EMIT2(0x89, add_2reg(0xC0, dst_reg, dst_reg));
2050 break;
2051 case 64:
2052 /* nop */
2053 break;
2054 }
2055 break;
2056
2057 /* speculation barrier */
2058 case BPF_ST | BPF_NOSPEC:
2059 EMIT_LFENCE();
2060 break;
2061
2062 /* ST: *(u8*)(dst_reg + off) = imm */
2063 case BPF_ST | BPF_MEM | BPF_B:
2064 if (is_ereg(dst_reg))
2065 EMIT2(0x41, 0xC6);
2066 else
2067 EMIT1(0xC6);
2068 goto st;
2069 case BPF_ST | BPF_MEM | BPF_H:
2070 if (is_ereg(dst_reg))
2071 EMIT3(0x66, 0x41, 0xC7);
2072 else
2073 EMIT2(0x66, 0xC7);
2074 goto st;
2075 case BPF_ST | BPF_MEM | BPF_W:
2076 if (is_ereg(dst_reg))
2077 EMIT2(0x41, 0xC7);
2078 else
2079 EMIT1(0xC7);
2080 goto st;
2081 case BPF_ST | BPF_MEM | BPF_DW:
2082 EMIT2(add_1mod(0x48, dst_reg), 0xC7);
2083
2084 st: if (is_imm8(insn->off))
2085 EMIT2(add_1reg(0x40, dst_reg), insn->off);
2086 else
2087 EMIT1_off32(add_1reg(0x80, dst_reg), insn->off);
2088
2089 EMIT(imm32, bpf_size_to_x86_bytes(BPF_SIZE(insn->code)));
2090 break;
2091
2092 /* STX: *(u8*)(dst_reg + off) = src_reg */
2093 case BPF_STX | BPF_MEM | BPF_B:
2094 case BPF_STX | BPF_MEM | BPF_H:
2095 case BPF_STX | BPF_MEM | BPF_W:
2096 case BPF_STX | BPF_MEM | BPF_DW:
2097 emit_stx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
2098 break;
2099
2100 case BPF_ST | BPF_PROBE_MEM32 | BPF_B:
2101 case BPF_ST | BPF_PROBE_MEM32 | BPF_H:
2102 case BPF_ST | BPF_PROBE_MEM32 | BPF_W:
2103 case BPF_ST | BPF_PROBE_MEM32 | BPF_DW:
2104 start_of_ldx = prog;
2105 emit_st_r12(&prog, BPF_SIZE(insn->code), dst_reg, insn->off, insn->imm);
2106 goto populate_extable;
2107
2108 /* LDX: dst_reg = *(u8*)(src_reg + r12 + off) */
2109 case BPF_LDX | BPF_PROBE_MEM32 | BPF_B:
2110 case BPF_LDX | BPF_PROBE_MEM32 | BPF_H:
2111 case BPF_LDX | BPF_PROBE_MEM32 | BPF_W:
2112 case BPF_LDX | BPF_PROBE_MEM32 | BPF_DW:
2113 case BPF_STX | BPF_PROBE_MEM32 | BPF_B:
2114 case BPF_STX | BPF_PROBE_MEM32 | BPF_H:
2115 case BPF_STX | BPF_PROBE_MEM32 | BPF_W:
2116 case BPF_STX | BPF_PROBE_MEM32 | BPF_DW:
2117 start_of_ldx = prog;
2118 if (BPF_CLASS(insn->code) == BPF_LDX)
2119 emit_ldx_r12(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
2120 else
2121 emit_stx_r12(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
2122 populate_extable:
2123 {
2124 struct exception_table_entry *ex;
2125 u8 *_insn = image + proglen + (start_of_ldx - temp);
2126 u32 arena_reg, fixup_reg;
2127 s64 delta;
2128
2129 if (!bpf_prog->aux->extable)
2130 break;
2131
2132 if (excnt >= bpf_prog->aux->num_exentries) {
2133 pr_err("mem32 extable bug\n");
2134 return -EFAULT;
2135 }
2136 ex = &bpf_prog->aux->extable[excnt++];
2137
2138 delta = _insn - (u8 *)&ex->insn;
2139 /* switch ex to rw buffer for writes */
2140 ex = (void *)rw_image + ((void *)ex - (void *)image);
2141
2142 ex->insn = delta;
2143
2144 ex->data = EX_TYPE_BPF;
2145
2146 /*
2147 * src_reg/dst_reg holds the address in the arena region with upper
2148 * 32-bits being zero because of a preceding addr_space_cast(r<n>,
2149 * 0x0, 0x1) instruction. This address is adjusted with the addition
2150 * of arena_vm_start (see the implementation of BPF_PROBE_MEM32 and
2151 * BPF_PROBE_ATOMIC) before being used for the memory access. Pass
2152 * the reg holding the unmodified 32-bit address to
2153 * ex_handler_bpf().
2154 */
2155 if (BPF_CLASS(insn->code) == BPF_LDX) {
2156 arena_reg = reg2pt_regs[src_reg];
2157 fixup_reg = reg2pt_regs[dst_reg];
2158 } else {
2159 arena_reg = reg2pt_regs[dst_reg];
2160 fixup_reg = DONT_CLEAR;
2161 }
2162
2163 ex->fixup = FIELD_PREP(FIXUP_INSN_LEN_MASK, prog - start_of_ldx) |
2164 FIELD_PREP(FIXUP_ARENA_REG_MASK, arena_reg) |
2165 FIELD_PREP(FIXUP_REG_MASK, fixup_reg);
2166 ex->fixup |= FIXUP_ARENA_ACCESS;
2167
2168 ex->data |= FIELD_PREP(DATA_ARENA_OFFSET_MASK, insn->off);
2169 }
2170 break;
2171
2172 /* LDX: dst_reg = *(u8*)(src_reg + off) */
2173 case BPF_LDX | BPF_MEM | BPF_B:
2174 case BPF_LDX | BPF_PROBE_MEM | BPF_B:
2175 case BPF_LDX | BPF_MEM | BPF_H:
2176 case BPF_LDX | BPF_PROBE_MEM | BPF_H:
2177 case BPF_LDX | BPF_MEM | BPF_W:
2178 case BPF_LDX | BPF_PROBE_MEM | BPF_W:
2179 case BPF_LDX | BPF_MEM | BPF_DW:
2180 case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
2181 /* LDXS: dst_reg = *(s8*)(src_reg + off) */
2182 case BPF_LDX | BPF_MEMSX | BPF_B:
2183 case BPF_LDX | BPF_MEMSX | BPF_H:
2184 case BPF_LDX | BPF_MEMSX | BPF_W:
2185 case BPF_LDX | BPF_PROBE_MEMSX | BPF_B:
2186 case BPF_LDX | BPF_PROBE_MEMSX | BPF_H:
2187 case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
2188 insn_off = insn->off;
2189
2190 if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
2191 BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
2192 /* Conservatively check that src_reg + insn->off is a kernel address:
2193 * src_reg + insn->off > TASK_SIZE_MAX + PAGE_SIZE
2194 * and
2195 * src_reg + insn->off < VSYSCALL_ADDR
2196 */
2197
2198 u64 limit = TASK_SIZE_MAX + PAGE_SIZE - VSYSCALL_ADDR;
2199 u8 *end_of_jmp;
2200
2201 /* movabsq r10, VSYSCALL_ADDR */
2202 emit_mov_imm64(&prog, BPF_REG_AX, (long)VSYSCALL_ADDR >> 32,
2203 (u32)(long)VSYSCALL_ADDR);
2204
2205 /* mov r11, src_reg */
2206 EMIT_mov(AUX_REG, src_reg);
2207
2208 if (insn->off) {
2209 /* add r11, insn->off */
2210 maybe_emit_1mod(&prog, AUX_REG, true);
2211 EMIT2_off32(0x81, add_1reg(0xC0, AUX_REG), insn->off);
2212 }
2213
2214 /* sub r11, r10 */
2215 maybe_emit_mod(&prog, AUX_REG, BPF_REG_AX, true);
2216 EMIT2(0x29, add_2reg(0xC0, AUX_REG, BPF_REG_AX));
2217
2218 /* movabsq r10, limit */
2219 emit_mov_imm64(&prog, BPF_REG_AX, (long)limit >> 32,
2220 (u32)(long)limit);
2221
2222 /* cmp r11, r10 */
2223 maybe_emit_mod(&prog, AUX_REG, BPF_REG_AX, true);
2224 EMIT2(0x39, add_2reg(0xC0, AUX_REG, BPF_REG_AX));
2225
2226 /* if unsigned '>', goto load */
2227 EMIT2(X86_JA, 0);
2228 end_of_jmp = prog;
2229
2230 /* xor dst_reg, dst_reg */
2231 emit_mov_imm32(&prog, false, dst_reg, 0);
2232 /* jmp byte_after_ldx */
2233 EMIT2(0xEB, 0);
2234
2235 /* populate jmp_offset for JA above to jump to start_of_ldx */
2236 start_of_ldx = prog;
2237 end_of_jmp[-1] = start_of_ldx - end_of_jmp;
2238 }
2239 if (BPF_MODE(insn->code) == BPF_PROBE_MEMSX ||
2240 BPF_MODE(insn->code) == BPF_MEMSX)
2241 emit_ldsx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn_off);
2242 else
2243 emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn_off);
2244 if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
2245 BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
2246 struct exception_table_entry *ex;
2247 u8 *_insn = image + proglen + (start_of_ldx - temp);
2248 s64 delta;
2249
2250 /* populate jmp_offset for JMP above */
2251 start_of_ldx[-1] = prog - start_of_ldx;
2252
2253 if (!bpf_prog->aux->extable)
2254 break;
2255
2256 if (excnt >= bpf_prog->aux->num_exentries) {
2257 pr_err("ex gen bug\n");
2258 return -EFAULT;
2259 }
2260 ex = &bpf_prog->aux->extable[excnt++];
2261
2262 delta = _insn - (u8 *)&ex->insn;
2263 if (!is_simm32(delta)) {
2264 pr_err("extable->insn doesn't fit into 32-bit\n");
2265 return -EFAULT;
2266 }
2267 /* switch ex to rw buffer for writes */
2268 ex = (void *)rw_image + ((void *)ex - (void *)image);
2269
2270 ex->insn = delta;
2271
2272 ex->data = EX_TYPE_BPF;
2273
2274 if (dst_reg > BPF_REG_9) {
2275 pr_err("verifier error\n");
2276 return -EFAULT;
2277 }
2278 /*
2279 * Compute size of x86 insn and its target dest x86 register.
2280 * ex_handler_bpf() will use lower 8 bits to adjust
2281 * pt_regs->ip to jump over this x86 instruction
2282 * and upper bits to figure out which pt_regs to zero out.
2283 * End result: x86 insn "mov rbx, qword ptr [rax+0x14]"
2284 * of 4 bytes will be ignored and rbx will be zero inited.
2285 */
2286 ex->fixup = FIELD_PREP(FIXUP_INSN_LEN_MASK, prog - start_of_ldx) |
2287 FIELD_PREP(FIXUP_REG_MASK, reg2pt_regs[dst_reg]);
2288 }
2289 break;
2290
2291 case BPF_STX | BPF_ATOMIC | BPF_B:
2292 case BPF_STX | BPF_ATOMIC | BPF_H:
2293 if (!bpf_atomic_is_load_store(insn)) {
2294 pr_err("bpf_jit: 1- and 2-byte RMW atomics are not supported\n");
2295 return -EFAULT;
2296 }
2297 fallthrough;
2298 case BPF_STX | BPF_ATOMIC | BPF_W:
2299 case BPF_STX | BPF_ATOMIC | BPF_DW:
2300 if (insn->imm == (BPF_AND | BPF_FETCH) ||
2301 insn->imm == (BPF_OR | BPF_FETCH) ||
2302 insn->imm == (BPF_XOR | BPF_FETCH)) {
2303 bool is64 = BPF_SIZE(insn->code) == BPF_DW;
2304 u32 real_src_reg = src_reg;
2305 u32 real_dst_reg = dst_reg;
2306 u8 *branch_target;
2307
2308 /*
2309 * Can't be implemented with a single x86 insn.
2310 * Need to do a CMPXCHG loop.
2311 */
2312
2313 /* Will need RAX as a CMPXCHG operand so save R0 */
2314 emit_mov_reg(&prog, true, BPF_REG_AX, BPF_REG_0);
2315 if (src_reg == BPF_REG_0)
2316 real_src_reg = BPF_REG_AX;
2317 if (dst_reg == BPF_REG_0)
2318 real_dst_reg = BPF_REG_AX;
2319
2320 branch_target = prog;
2321 /* Load old value */
2322 emit_ldx(&prog, BPF_SIZE(insn->code),
2323 BPF_REG_0, real_dst_reg, insn->off);
2324 /*
2325 * Perform the (commutative) operation locally,
2326 * put the result in the AUX_REG.
2327 */
2328 emit_mov_reg(&prog, is64, AUX_REG, BPF_REG_0);
2329 maybe_emit_mod(&prog, AUX_REG, real_src_reg, is64);
2330 EMIT2(simple_alu_opcodes[BPF_OP(insn->imm)],
2331 add_2reg(0xC0, AUX_REG, real_src_reg));
2332 /* Attempt to swap in new value */
2333 err = emit_atomic_rmw(&prog, BPF_CMPXCHG,
2334 real_dst_reg, AUX_REG,
2335 insn->off,
2336 BPF_SIZE(insn->code));
2337 if (WARN_ON(err))
2338 return err;
2339 /*
2340 * ZF tells us whether we won the race. If it's
2341 * cleared we need to try again.
2342 */
2343 EMIT2(X86_JNE, -(prog - branch_target) - 2);
2344 /* Return the pre-modification value */
2345 emit_mov_reg(&prog, is64, real_src_reg, BPF_REG_0);
2346 /* Restore R0 after clobbering RAX */
2347 emit_mov_reg(&prog, true, BPF_REG_0, BPF_REG_AX);
2348 break;
2349 }
2350
2351 if (bpf_atomic_is_load_store(insn))
2352 err = emit_atomic_ld_st(&prog, insn->imm, dst_reg, src_reg,
2353 insn->off, BPF_SIZE(insn->code));
2354 else
2355 err = emit_atomic_rmw(&prog, insn->imm, dst_reg, src_reg,
2356 insn->off, BPF_SIZE(insn->code));
2357 if (err)
2358 return err;
2359 break;
2360
2361 case BPF_STX | BPF_PROBE_ATOMIC | BPF_B:
2362 case BPF_STX | BPF_PROBE_ATOMIC | BPF_H:
2363 if (!bpf_atomic_is_load_store(insn)) {
2364 pr_err("bpf_jit: 1- and 2-byte RMW atomics are not supported\n");
2365 return -EFAULT;
2366 }
2367 fallthrough;
2368 case BPF_STX | BPF_PROBE_ATOMIC | BPF_W:
2369 case BPF_STX | BPF_PROBE_ATOMIC | BPF_DW:
2370 start_of_ldx = prog;
2371
2372 if (bpf_atomic_is_load_store(insn))
2373 err = emit_atomic_ld_st_index(&prog, insn->imm,
2374 BPF_SIZE(insn->code), dst_reg,
2375 src_reg, X86_REG_R12, insn->off);
2376 else
2377 err = emit_atomic_rmw_index(&prog, insn->imm, BPF_SIZE(insn->code),
2378 dst_reg, src_reg, X86_REG_R12,
2379 insn->off);
2380 if (err)
2381 return err;
2382 goto populate_extable;
2383
2384 /* call */
2385 case BPF_JMP | BPF_CALL: {
2386 u8 *ip = image + addrs[i - 1];
2387
2388 func = (u8 *) __bpf_call_base + imm32;
2389 if (src_reg == BPF_PSEUDO_CALL && tail_call_reachable) {
2390 LOAD_TAIL_CALL_CNT_PTR(stack_depth);
2391 ip += 7;
2392 }
2393 if (!imm32)
2394 return -EINVAL;
2395 if (priv_frame_ptr) {
2396 push_r9(&prog);
2397 ip += 2;
2398 }
2399 ip += x86_call_depth_emit_accounting(&prog, func, ip);
2400 if (emit_call(&prog, func, ip))
2401 return -EINVAL;
2402 if (priv_frame_ptr)
2403 pop_r9(&prog);
2404 break;
2405 }
2406
2407 case BPF_JMP | BPF_TAIL_CALL:
2408 if (imm32)
2409 emit_bpf_tail_call_direct(bpf_prog,
2410 &bpf_prog->aux->poke_tab[imm32 - 1],
2411 &prog, image + addrs[i - 1],
2412 callee_regs_used,
2413 stack_depth,
2414 ctx);
2415 else
2416 emit_bpf_tail_call_indirect(bpf_prog,
2417 &prog,
2418 callee_regs_used,
2419 stack_depth,
2420 image + addrs[i - 1],
2421 ctx);
2422 break;
2423
2424 /* cond jump */
2425 case BPF_JMP | BPF_JEQ | BPF_X:
2426 case BPF_JMP | BPF_JNE | BPF_X:
2427 case BPF_JMP | BPF_JGT | BPF_X:
2428 case BPF_JMP | BPF_JLT | BPF_X:
2429 case BPF_JMP | BPF_JGE | BPF_X:
2430 case BPF_JMP | BPF_JLE | BPF_X:
2431 case BPF_JMP | BPF_JSGT | BPF_X:
2432 case BPF_JMP | BPF_JSLT | BPF_X:
2433 case BPF_JMP | BPF_JSGE | BPF_X:
2434 case BPF_JMP | BPF_JSLE | BPF_X:
2435 case BPF_JMP32 | BPF_JEQ | BPF_X:
2436 case BPF_JMP32 | BPF_JNE | BPF_X:
2437 case BPF_JMP32 | BPF_JGT | BPF_X:
2438 case BPF_JMP32 | BPF_JLT | BPF_X:
2439 case BPF_JMP32 | BPF_JGE | BPF_X:
2440 case BPF_JMP32 | BPF_JLE | BPF_X:
2441 case BPF_JMP32 | BPF_JSGT | BPF_X:
2442 case BPF_JMP32 | BPF_JSLT | BPF_X:
2443 case BPF_JMP32 | BPF_JSGE | BPF_X:
2444 case BPF_JMP32 | BPF_JSLE | BPF_X:
2445 /* cmp dst_reg, src_reg */
2446 maybe_emit_mod(&prog, dst_reg, src_reg,
2447 BPF_CLASS(insn->code) == BPF_JMP);
2448 EMIT2(0x39, add_2reg(0xC0, dst_reg, src_reg));
2449 goto emit_cond_jmp;
2450
2451 case BPF_JMP | BPF_JSET | BPF_X:
2452 case BPF_JMP32 | BPF_JSET | BPF_X:
2453 /* test dst_reg, src_reg */
2454 maybe_emit_mod(&prog, dst_reg, src_reg,
2455 BPF_CLASS(insn->code) == BPF_JMP);
2456 EMIT2(0x85, add_2reg(0xC0, dst_reg, src_reg));
2457 goto emit_cond_jmp;
2458
2459 case BPF_JMP | BPF_JSET | BPF_K:
2460 case BPF_JMP32 | BPF_JSET | BPF_K:
2461 /* test dst_reg, imm32 */
2462 maybe_emit_1mod(&prog, dst_reg,
2463 BPF_CLASS(insn->code) == BPF_JMP);
2464 EMIT2_off32(0xF7, add_1reg(0xC0, dst_reg), imm32);
2465 goto emit_cond_jmp;
2466
2467 case BPF_JMP | BPF_JEQ | BPF_K:
2468 case BPF_JMP | BPF_JNE | BPF_K:
2469 case BPF_JMP | BPF_JGT | BPF_K:
2470 case BPF_JMP | BPF_JLT | BPF_K:
2471 case BPF_JMP | BPF_JGE | BPF_K:
2472 case BPF_JMP | BPF_JLE | BPF_K:
2473 case BPF_JMP | BPF_JSGT | BPF_K:
2474 case BPF_JMP | BPF_JSLT | BPF_K:
2475 case BPF_JMP | BPF_JSGE | BPF_K:
2476 case BPF_JMP | BPF_JSLE | BPF_K:
2477 case BPF_JMP32 | BPF_JEQ | BPF_K:
2478 case BPF_JMP32 | BPF_JNE | BPF_K:
2479 case BPF_JMP32 | BPF_JGT | BPF_K:
2480 case BPF_JMP32 | BPF_JLT | BPF_K:
2481 case BPF_JMP32 | BPF_JGE | BPF_K:
2482 case BPF_JMP32 | BPF_JLE | BPF_K:
2483 case BPF_JMP32 | BPF_JSGT | BPF_K:
2484 case BPF_JMP32 | BPF_JSLT | BPF_K:
2485 case BPF_JMP32 | BPF_JSGE | BPF_K:
2486 case BPF_JMP32 | BPF_JSLE | BPF_K:
2487 /* test dst_reg, dst_reg to save one extra byte */
2488 if (imm32 == 0) {
2489 maybe_emit_mod(&prog, dst_reg, dst_reg,
2490 BPF_CLASS(insn->code) == BPF_JMP);
2491 EMIT2(0x85, add_2reg(0xC0, dst_reg, dst_reg));
2492 goto emit_cond_jmp;
2493 }
2494
2495 /* cmp dst_reg, imm8/32 */
2496 maybe_emit_1mod(&prog, dst_reg,
2497 BPF_CLASS(insn->code) == BPF_JMP);
2498
2499 if (is_imm8(imm32))
2500 EMIT3(0x83, add_1reg(0xF8, dst_reg), imm32);
2501 else
2502 EMIT2_off32(0x81, add_1reg(0xF8, dst_reg), imm32);
2503
2504 emit_cond_jmp: /* Convert BPF opcode to x86 */
2505 switch (BPF_OP(insn->code)) {
2506 case BPF_JEQ:
2507 jmp_cond = X86_JE;
2508 break;
2509 case BPF_JSET:
2510 case BPF_JNE:
2511 jmp_cond = X86_JNE;
2512 break;
2513 case BPF_JGT:
2514 /* GT is unsigned '>', JA in x86 */
2515 jmp_cond = X86_JA;
2516 break;
2517 case BPF_JLT:
2518 /* LT is unsigned '<', JB in x86 */
2519 jmp_cond = X86_JB;
2520 break;
2521 case BPF_JGE:
2522 /* GE is unsigned '>=', JAE in x86 */
2523 jmp_cond = X86_JAE;
2524 break;
2525 case BPF_JLE:
2526 /* LE is unsigned '<=', JBE in x86 */
2527 jmp_cond = X86_JBE;
2528 break;
2529 case BPF_JSGT:
2530 /* Signed '>', GT in x86 */
2531 jmp_cond = X86_JG;
2532 break;
2533 case BPF_JSLT:
2534 /* Signed '<', LT in x86 */
2535 jmp_cond = X86_JL;
2536 break;
2537 case BPF_JSGE:
2538 /* Signed '>=', GE in x86 */
2539 jmp_cond = X86_JGE;
2540 break;
2541 case BPF_JSLE:
2542 /* Signed '<=', LE in x86 */
2543 jmp_cond = X86_JLE;
2544 break;
2545 default: /* to silence GCC warning */
2546 return -EFAULT;
2547 }
2548 jmp_offset = addrs[i + insn->off] - addrs[i];
2549 if (is_imm8_jmp_offset(jmp_offset)) {
2550 if (jmp_padding) {
2551 /* To keep the jmp_offset valid, the extra bytes are
2552 * padded before the jump insn, so we subtract the
2553 * 2 bytes of jmp_cond insn from INSN_SZ_DIFF.
2554 *
2555 * If the previous pass already emits an imm8
2556 * jmp_cond, then this BPF insn won't shrink, so
2557 * "nops" is 0.
2558 *
2559 * On the other hand, if the previous pass emits an
2560 * imm32 jmp_cond, the extra 4 bytes(*) is padded to
2561 * keep the image from shrinking further.
2562 *
2563 * (*) imm32 jmp_cond is 6 bytes, and imm8 jmp_cond
2564 * is 2 bytes, so the size difference is 4 bytes.
2565 */
2566 nops = INSN_SZ_DIFF - 2;
2567 if (nops != 0 && nops != 4) {
2568 pr_err("unexpected jmp_cond padding: %d bytes\n",
2569 nops);
2570 return -EFAULT;
2571 }
2572 emit_nops(&prog, nops);
2573 }
2574 EMIT2(jmp_cond, jmp_offset);
2575 } else if (is_simm32(jmp_offset)) {
2576 EMIT2_off32(0x0F, jmp_cond + 0x10, jmp_offset);
2577 } else {
2578 pr_err("cond_jmp gen bug %llx\n", jmp_offset);
2579 return -EFAULT;
2580 }
2581
2582 break;
2583
2584 case BPF_JMP | BPF_JA:
2585 case BPF_JMP32 | BPF_JA:
2586 if (BPF_CLASS(insn->code) == BPF_JMP) {
2587 if (insn->off == -1)
2588 /* -1 jmp instructions will always jump
2589 * backwards two bytes. Explicitly handling
2590 * this case avoids wasting too many passes
2591 * when there are long sequences of replaced
2592 * dead code.
2593 */
2594 jmp_offset = -2;
2595 else
2596 jmp_offset = addrs[i + insn->off] - addrs[i];
2597 } else {
2598 if (insn->imm == -1)
2599 jmp_offset = -2;
2600 else
2601 jmp_offset = addrs[i + insn->imm] - addrs[i];
2602 }
2603
2604 if (!jmp_offset) {
2605 /*
2606 * If jmp_padding is enabled, the extra nops will
2607 * be inserted. Otherwise, optimize out nop jumps.
2608 */
2609 if (jmp_padding) {
2610 /* There are 3 possible conditions.
2611 * (1) This BPF_JA is already optimized out in
2612 * the previous run, so there is no need
2613 * to pad any extra byte (0 byte).
2614 * (2) The previous pass emits an imm8 jmp,
2615 * so we pad 2 bytes to match the previous
2616 * insn size.
2617 * (3) Similarly, the previous pass emits an
2618 * imm32 jmp, and 5 bytes is padded.
2619 */
2620 nops = INSN_SZ_DIFF;
2621 if (nops != 0 && nops != 2 && nops != 5) {
2622 pr_err("unexpected nop jump padding: %d bytes\n",
2623 nops);
2624 return -EFAULT;
2625 }
2626 emit_nops(&prog, nops);
2627 }
2628 break;
2629 }
2630 emit_jmp:
2631 if (is_imm8_jmp_offset(jmp_offset)) {
2632 if (jmp_padding) {
2633 /* To avoid breaking jmp_offset, the extra bytes
2634 * are padded before the actual jmp insn, so
2635 * 2 bytes is subtracted from INSN_SZ_DIFF.
2636 *
2637 * If the previous pass already emits an imm8
2638 * jmp, there is nothing to pad (0 byte).
2639 *
2640 * If it emits an imm32 jmp (5 bytes) previously
2641 * and now an imm8 jmp (2 bytes), then we pad
2642 * (5 - 2 = 3) bytes to stop the image from
2643 * shrinking further.
2644 */
2645 nops = INSN_SZ_DIFF - 2;
2646 if (nops != 0 && nops != 3) {
2647 pr_err("unexpected jump padding: %d bytes\n",
2648 nops);
2649 return -EFAULT;
2650 }
2651 emit_nops(&prog, INSN_SZ_DIFF - 2);
2652 }
2653 EMIT2(0xEB, jmp_offset);
2654 } else if (is_simm32(jmp_offset)) {
2655 EMIT1_off32(0xE9, jmp_offset);
2656 } else {
2657 pr_err("jmp gen bug %llx\n", jmp_offset);
2658 return -EFAULT;
2659 }
2660 break;
2661
2662 case BPF_JMP | BPF_EXIT:
2663 if (seen_exit) {
2664 jmp_offset = ctx->cleanup_addr - addrs[i];
2665 goto emit_jmp;
2666 }
2667 seen_exit = true;
2668 /* Update cleanup_addr */
2669 ctx->cleanup_addr = proglen;
2670 if (bpf_prog_was_classic(bpf_prog) &&
2671 !capable(CAP_SYS_ADMIN)) {
2672 u8 *ip = image + addrs[i - 1];
2673
2674 if (emit_spectre_bhb_barrier(&prog, ip, bpf_prog))
2675 return -EINVAL;
2676 }
2677 if (bpf_prog->aux->exception_boundary) {
2678 pop_callee_regs(&prog, all_callee_regs_used);
2679 pop_r12(&prog);
2680 } else {
2681 pop_callee_regs(&prog, callee_regs_used);
2682 if (arena_vm_start)
2683 pop_r12(&prog);
2684 }
2685 EMIT1(0xC9); /* leave */
2686 emit_return(&prog, image + addrs[i - 1] + (prog - temp));
2687 break;
2688
2689 default:
2690 /*
2691 * By design x86-64 JIT should support all BPF instructions.
2692 * This error will be seen if new instruction was added
2693 * to the interpreter, but not to the JIT, or if there is
2694 * junk in bpf_prog.
2695 */
2696 pr_err("bpf_jit: unknown opcode %02x\n", insn->code);
2697 return -EINVAL;
2698 }
2699
2700 ilen = prog - temp;
2701 if (ilen > BPF_MAX_INSN_SIZE) {
2702 pr_err("bpf_jit: fatal insn size error\n");
2703 return -EFAULT;
2704 }
2705
2706 if (image) {
2707 /*
2708 * When populating the image, assert that:
2709 *
2710 * i) We do not write beyond the allocated space, and
2711 * ii) addrs[i] did not change from the prior run, in order
2712 * to validate assumptions made for computing branch
2713 * displacements.
2714 */
2715 if (unlikely(proglen + ilen > oldproglen ||
2716 proglen + ilen != addrs[i])) {
2717 pr_err("bpf_jit: fatal error\n");
2718 return -EFAULT;
2719 }
2720 memcpy(rw_image + proglen, temp, ilen);
2721
2722 /*
2723 * Instruction arrays need to know how xlated code
2724 * maps to jitted code
2725 */
> 2726 bpf_prog_update_insn_ptr(bpf_prog, abs_xlated_off, proglen,
2727 image + proglen);
2728 }
2729 proglen += ilen;
2730 addrs[i] = proglen;
2731 prog = temp;
2732 }
2733
2734 if (image && excnt != bpf_prog->aux->num_exentries) {
2735 pr_err("extable is not populated\n");
2736 return -EFAULT;
2737 }
2738 return proglen;
2739 }
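
A note on the populate_extable path in the listing above: the entry stores the faulting instruction as a signed 32-bit offset relative to the address of the ex->insn field itself, which is why the BPF_PROBE_MEM path checks is_simm32(delta) before storing it. Below is a minimal sketch of that self-relative encoding, assuming a simplified one-field entry. struct rel_entry, rel_entry_set() and rel_entry_insn() are hypothetical names for illustration, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an exception table entry. */
struct rel_entry {
	int32_t insn;	/* offset of the faulting insn, relative to &insn */
};

/*
 * Store insn_addr as an offset relative to the entry field itself.
 * Returns false when the distance does not fit in a signed 32-bit value.
 */
static bool rel_entry_set(struct rel_entry *e, const uint8_t *insn_addr)
{
	int64_t delta;

	assert(e != NULL && insn_addr != NULL);

	delta = (int64_t)(intptr_t)insn_addr - (int64_t)(intptr_t)&e->insn;
	if (delta < INT32_MIN || delta > INT32_MAX)
		return false;

	e->insn = (int32_t)delta;
	return true;
}

/* Recover the absolute instruction address from the stored offset. */
static const uint8_t *rel_entry_insn(const struct rel_entry *e)
{
	return (const uint8_t *)((intptr_t)&e->insn + e->insn);
}
```

In the listing the offset is computed against the entry's address in the final image, while the write itself goes through the rw_image alias of the same entry.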
2740
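
The jump cases in the listing pick between a short (rel8) and a long (rel32) encoding, and the jmp_padding comments rely on the size difference between the two. The sketch below shows that selection in isolation, under the assumption that the displacement is already relative to the end of the jump instruction. fits_imm8(), fits_simm32(), encode_jmp(), encode_jcc() and jmp_pad_bytes() are hypothetical helpers written for this illustration; they are not the JIT's own functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool fits_imm8(int64_t v)   { return v >= INT8_MIN && v <= INT8_MAX; }
static bool fits_simm32(int64_t v) { return v >= INT32_MIN && v <= INT32_MAX; }

/* Write a 32-bit displacement in little-endian byte order. */
static void put_le32(uint8_t *p, int32_t v)
{
	uint32_t u = (uint32_t)v;

	p[0] = (uint8_t)(u & 0xff);
	p[1] = (uint8_t)((u >> 8) & 0xff);
	p[2] = (uint8_t)((u >> 16) & 0xff);
	p[3] = (uint8_t)((u >> 24) & 0xff);
}

/*
 * Encode "jmp disp": EB rel8 (2 bytes) or E9 rel32 (5 bytes).
 * Returns the encoded length, or 0 if disp does not fit in 32 bits.
 */
static size_t encode_jmp(uint8_t *buf, int64_t disp)
{
	assert(buf != NULL);

	if (fits_imm8(disp)) {
		buf[0] = 0xEB;
		buf[1] = (uint8_t)disp;
		return 2;
	}
	if (fits_simm32(disp)) {
		buf[0] = 0xE9;
		put_le32(&buf[1], (int32_t)disp);
		return 5;
	}
	return 0;
}

/*
 * Encode a conditional jump. short_op is the rel8 opcode (0x74 for JE);
 * the rel32 form is 0F (short_op + 0x10), 6 bytes in total.
 */
static size_t encode_jcc(uint8_t *buf, uint8_t short_op, int64_t disp)
{
	assert(buf != NULL);

	if (fits_imm8(disp)) {
		buf[0] = short_op;
		buf[1] = (uint8_t)disp;
		return 2;
	}
	if (fits_simm32(disp)) {
		buf[0] = 0x0F;
		buf[1] = (uint8_t)(short_op + 0x10);
		put_le32(&buf[2], (int32_t)disp);
		return 6;
	}
	return 0;
}

/* NOP bytes needed so a shorter re-encoding keeps the previous pass's size. */
static size_t jmp_pad_bytes(size_t prev_len, size_t new_len)
{
	assert(prev_len >= new_len);
	return prev_len - new_len;
}
```

With these lengths, jmp_pad_bytes(6, 2) and jmp_pad_bytes(5, 2) give the 4 and 3 NOP bytes that the jmp_padding comments in the listing describe for conditional and unconditional jumps.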
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 bpf-next 12/13] bpftool: Recognize insn_array map type
2025-09-13 19:39 ` [PATCH v2 bpf-next 12/13] bpftool: Recognize insn_array map type Anton Protopopov
@ 2025-09-16 20:33 ` Quentin Monnet
2025-09-18 8:11 ` Anton Protopopov
0 siblings, 1 reply; 26+ messages in thread
From: Quentin Monnet @ 2025-09-16 20:33 UTC (permalink / raw)
To: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
Anton Protopopov, Daniel Borkmann, Eduard Zingerman,
Yonghong Song
2025-09-13 19:39 UTC+0000 ~ Anton Protopopov <a.s.protopopov@gmail.com>
> Teach bpftool to recognize instruction array map type.
>
> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> ---
> tools/bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
> tools/bpf/bpftool/map.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> index 252e4c538edb..3377d4a01c62 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> @@ -55,7 +55,7 @@ MAP COMMANDS
> | | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
> | | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
> | | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
> -| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** }
> +| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** | **insn_array** }
Thanks Anton!
That's a long line. As you'll likely respin your series, could you wrap
and start a new line, please?
>
> DESCRIPTION
> ===========
> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> index c9de44a45778..79b90f274bef 100644
> --- a/tools/bpf/bpftool/map.c
> +++ b/tools/bpf/bpftool/map.c
> @@ -1477,7 +1477,7 @@ static int do_help(int argc, char **argv)
> " devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
> " cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
> " queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
> - " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena }\n"
> + " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena | insn_array }\n"
Same here. Other than these:
Acked-by: Quentin Monnet <qmo@kernel.org>
* Re: [PATCH v2 bpf-next 12/13] bpftool: Recognize insn_array map type
2025-09-16 20:33 ` Quentin Monnet
@ 2025-09-18 8:11 ` Anton Protopopov
0 siblings, 0 replies; 26+ messages in thread
From: Anton Protopopov @ 2025-09-18 8:11 UTC (permalink / raw)
To: Quentin Monnet
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Yonghong Song
On 25/09/16 09:33PM, Quentin Monnet wrote:
> 2025-09-13 19:39 UTC+0000 ~ Anton Protopopov <a.s.protopopov@gmail.com>
> > Teach bpftool to recognize instruction array map type.
> >
> > Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> > ---
> > tools/bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
> > tools/bpf/bpftool/map.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> > index 252e4c538edb..3377d4a01c62 100644
> > --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
> > +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> > @@ -55,7 +55,7 @@ MAP COMMANDS
> > | | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
> > | | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
> > | | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
> > -| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** }
> > +| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** | **insn_array** }
>
>
> Thanks Anton!
> That's a long line. As you'll likely respin your series, could you wrap
> and start a new line, please?
Thanks, fixed! (I will resend the series as v3 now due to a kbuild-bot issue.)
>
> >
> > DESCRIPTION
> > ===========
> > diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> > index c9de44a45778..79b90f274bef 100644
> > --- a/tools/bpf/bpftool/map.c
> > +++ b/tools/bpf/bpftool/map.c
> > @@ -1477,7 +1477,7 @@ static int do_help(int argc, char **argv)
> > " devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
> > " cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
> > " queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
> > - " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena }\n"
> > + " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena | insn_array }\n"
>
>
> Same here. Other than these:
>
> Acked-by: Quentin Monnet <qmo@kernel.org>
* Re: [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array
2025-09-13 19:39 ` [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array Anton Protopopov
2025-09-15 4:09 ` kernel test robot
@ 2025-09-20 0:30 ` Alexei Starovoitov
2025-09-22 10:38 ` Anton Protopopov
1 sibling, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2025-09-20 0:30 UTC (permalink / raw)
To: Anton Protopopov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
On Sat, Sep 13, 2025 at 12:33 PM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
> --- /dev/null
> +++ b/kernel/bpf/bpf_insn_array.c
> @@ -0,0 +1,336 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
add copyright?
> +#include <linux/bpf.h>
> +#include <linux/sort.h>
> +
> +#define MAX_INSN_ARRAY_ENTRIES 256
> +
> +struct bpf_insn_array {
> + struct bpf_map map;
> + struct mutex state_mutex;
> + int state;
> + long *ips;
> + DECLARE_FLEX_ARRAY(struct bpf_insn_ptr, ptrs);
> +};
> +
> +enum {
> + INSN_ARRAY_STATE_FREE = 0,
> + INSN_ARRAY_STATE_INIT,
> + INSN_ARRAY_STATE_READY,
> +};
> +
> +#define cast_insn_array(MAP_PTR) \
> + container_of(MAP_PTR, struct bpf_insn_array, map)
Use container_of((MAP_PTR), ...) here;
checkpatch will be happier.
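As a minimal user-space sketch of the parenthesized macro (the container_of() definition and the two reduced structures below are simplified stand-ins assumed for illustration, not the kernel's definitions):

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(); no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced stand-ins for the kernel structures, for illustration only. */
struct bpf_map {
	unsigned int max_entries;
};

struct bpf_insn_array {
	int state;
	struct bpf_map map;
};

/* The macro argument is wrapped in parentheses, as requested in review. */
#define cast_insn_array(MAP_PTR) \
	container_of((MAP_PTR), struct bpf_insn_array, map)
```

Wrapping the argument keeps the macro safe when the caller passes an expression rather than a plain identifier.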
> +
> +#define INSN_DELETED ((u32)-1)
> +
> +static inline u32 insn_array_alloc_size(u32 max_entries)
> +{
> + const u32 base_size = sizeof(struct bpf_insn_array);
> + const u32 entry_size = sizeof(struct bpf_insn_ptr);
> +
> + return base_size + entry_size * max_entries;
> +}
> +
> +static int insn_array_alloc_check(union bpf_attr *attr)
> +{
> + if (attr->max_entries == 0 ||
> + attr->key_size != 4 ||
> + attr->value_size != 8 ||
> + attr->map_flags != 0)
> + return -EINVAL;
Use single line or two, instead of 4.
> +
> + if (attr->max_entries > MAX_INSN_ARRAY_ENTRIES)
> + return -E2BIG;
> +
> + return 0;
> +}
> +
> +static void insn_array_free(struct bpf_map *map)
> +{
> + struct bpf_insn_array *insn_array = cast_insn_array(map);
> +
> + kfree(insn_array->ips);
> + bpf_map_area_free(insn_array);
> +}
> +
> +static struct bpf_map *insn_array_alloc(union bpf_attr *attr)
> +{
> + u64 size = insn_array_alloc_size(attr->max_entries);
> + struct bpf_insn_array *insn_array;
> +
> + insn_array = bpf_map_area_alloc(size, NUMA_NO_NODE);
> + if (!insn_array)
> + return ERR_PTR(-ENOMEM);
> +
> + insn_array->ips = kcalloc(attr->max_entries, sizeof(long), GFP_KERNEL);
> + if (!insn_array->ips) {
> + insn_array_free(&insn_array->map);
> + return ERR_PTR(-ENOMEM);
> + }
> +
> + bpf_map_init_from_attr(&insn_array->map, attr);
> +
> + mutex_init(&insn_array->state_mutex);
> + insn_array->state = INSN_ARRAY_STATE_FREE;
> +
> + return &insn_array->map;
> +}
> +
> +static int insn_array_get_next_key(struct bpf_map *map, void *key, void *next_key)
> +{
> + struct bpf_insn_array *insn_array = cast_insn_array(map);
> + u32 index = key ? *(u32 *)key : U32_MAX;
> + u32 *next = (u32 *)next_key;
> +
> + if (index >= insn_array->map.max_entries) {
> + *next = 0;
> + return 0;
> + }
> +
> + if (index == insn_array->map.max_entries - 1)
> + return -ENOENT;
> +
> + *next = index + 1;
> + return 0;
> +}
A full copy-paste of array_map_get_next_key() is a bit too much.
Please refactor array_map_get_next_key() to avoid casting
to struct bpf_array; then the same helper can work for both maps.
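One possible shape for such a shared helper, as a user-space sketch only: the name array_like_get_next_key() and the choice to pass max_entries directly are assumptions, not something taken from the series. The logic mirrors the function quoted above but depends on nothing beyond max_entries, so neither map type needs a cast.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical shared helper: it only needs max_entries, so it can serve
 * any array-like map without casting to a map-specific structure.
 */
static int array_like_get_next_key(uint32_t max_entries, const void *key,
				   void *next_key)
{
	/* A NULL key means "start iterating from the beginning". */
	uint32_t index = key ? *(const uint32_t *)key : UINT32_MAX;
	uint32_t *next = next_key;

	/* Out-of-range (or missing) key: restart at index 0. */
	if (index >= max_entries) {
		*next = 0;
		return 0;
	}

	/* The last valid index has no successor. */
	if (index == max_entries - 1)
		return -ENOENT;

	*next = index + 1;
	return 0;
}
```

In the kernel, both map types could then call one function with map->max_entries, since that field lives in the common struct bpf_map.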
> +
> +static void *insn_array_lookup_elem(struct bpf_map *map, void *key)
> +{
> + struct bpf_insn_array *insn_array = cast_insn_array(map);
> + u32 index = *(u32 *)key;
> +
> + if (unlikely(index >= insn_array->map.max_entries))
> + return NULL;
> +
> + return &insn_array->ptrs[index].user_value;
> +}
> +
> +static long insn_array_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
> +{
> + struct bpf_insn_array *insn_array = cast_insn_array(map);
> + u32 index = *(u32 *)key;
> + struct bpf_insn_array_value val = {};
> + int err = 0;
> +
> + if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
> + return -EINVAL;
copy paste gone wrong. BPF_F_LOCK is not supported here.
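A sketch of the flag check without the BPF_F_LOCK mask, assuming the rest of the function stays as posted. The helper name insn_array_check_update_flags() is hypothetical; the flag values mirror include/uapi/linux/bpf.h.

```c
#include <errno.h>
#include <stdint.h>

/* Values mirror include/uapi/linux/bpf.h. */
enum {
	BPF_ANY     = 0,
	BPF_NOEXIST = 1,
	BPF_EXIST   = 2,
	BPF_F_LOCK  = 4,
};

/* Hypothetical helper: validate update flags for a map without BPF_F_LOCK. */
static int insn_array_check_update_flags(uint64_t map_flags)
{
	/* Anything above BPF_EXIST, including BPF_F_LOCK, is rejected. */
	if (map_flags > BPF_EXIST)
		return -EINVAL;

	/* All slots of an array-like map always exist. */
	if (map_flags & BPF_NOEXIST)
		return -EEXIST;

	return 0;
}
```

With the mask removed, a caller passing BPF_F_LOCK gets -EINVAL instead of being silently accepted.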
> +
> + if (unlikely(index >= insn_array->map.max_entries))
> + return -E2BIG;
> +
> + if (unlikely(map_flags & BPF_NOEXIST))
> + return -EEXIST;
> +
> + /* No updates for maps in use */
> + if (!mutex_trylock(&insn_array->state_mutex))
> + return -EBUSY;
trylock ?!
If I'm reading it correctly
check_map_func_compatibility() prevents usage of this helper
from the prog, so this is syscall only,
but trylock?!
> +
> + if (insn_array->state != INSN_ARRAY_STATE_FREE) {
> + err = -EBUSY;
> + goto unlock;
> + }
> +
> + copy_map_value(map, &val, value);
> + if (val.jitted_off || val.xlated_off == INSN_DELETED) {
> + err = -EINVAL;
> + goto unlock;
> + }
> +
> + insn_array->ptrs[index].orig_xlated_off = val.xlated_off;
> + insn_array->ptrs[index].user_value.xlated_off = val.xlated_off;
> +
> +unlock:
> + mutex_unlock(&insn_array->state_mutex);
> + return err;
> +}
> +
> +static long insn_array_delete_elem(struct bpf_map *map, void *key)
> +{
> + return -EINVAL;
> +}
> +
> +static int insn_array_check_btf(const struct bpf_map *map,
> + const struct btf *btf,
> + const struct btf_type *key_type,
> + const struct btf_type *value_type)
> +{
> + if (!btf_type_is_i32(key_type))
> + return -EINVAL;
> +
> + if (!btf_type_is_i64(value_type))
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +static u64 insn_array_mem_usage(const struct bpf_map *map)
> +{
> + u64 extra_size = 0;
> +
> + extra_size += sizeof(long) * map->max_entries; /* insn_array->ips */
> +
> + return insn_array_alloc_size(map->max_entries) + extra_size;
> +}
> +
> +BTF_ID_LIST_SINGLE(insn_array_btf_ids, struct, bpf_insn_array)
> +
> +const struct bpf_map_ops insn_array_map_ops = {
> + .map_alloc_check = insn_array_alloc_check,
> + .map_alloc = insn_array_alloc,
> + .map_free = insn_array_free,
> + .map_get_next_key = insn_array_get_next_key,
> + .map_lookup_elem = insn_array_lookup_elem,
> + .map_update_elem = insn_array_update_elem,
> + .map_delete_elem = insn_array_delete_elem,
> + .map_check_btf = insn_array_check_btf,
> + .map_mem_usage = insn_array_mem_usage,
> + .map_btf_id = &insn_array_btf_ids[0],
> +};
> +
> +static bool is_insn_array(const struct bpf_map *map)
> +{
> + return map->map_type == BPF_MAP_TYPE_INSN_ARRAY;
> +}
> +
> +static inline bool valid_offsets(const struct bpf_insn_array *insn_array,
> + const struct bpf_prog *prog)
> +{
> + u32 off;
> + int i;
> +
> + for (i = 0; i < insn_array->map.max_entries; i++) {
> + off = insn_array->ptrs[i].orig_xlated_off;
> +
> + if (off >= prog->len)
> + return false;
> +
> + if (off > 0) {
> + if (prog->insnsi[off-1].code == (BPF_LD | BPF_DW | BPF_IMM))
> + return false;
> + }
> + }
> +
> + return true;
> +}
> +
> +int bpf_insn_array_init(struct bpf_map *map, const struct bpf_prog *prog)
> +{
> + struct bpf_insn_array *insn_array = cast_insn_array(map);
> + int i;
> +
> + if (!valid_offsets(insn_array, prog))
> + return -EINVAL;
> +
> + /*
> + * There can be only one program using the map
> + */
> + mutex_lock(&insn_array->state_mutex);
> + if (insn_array->state != INSN_ARRAY_STATE_FREE) {
> + mutex_unlock(&insn_array->state_mutex);
> + return -EBUSY;
> + }
> + insn_array->state = INSN_ARRAY_STATE_INIT;
> + mutex_unlock(&insn_array->state_mutex);
Only the verifier calls these helpers, no?
Why all the mutexes here and below?
All these mutexes are a big red flag to me.
I will stop any further comments here.
* Re: [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array
2025-09-20 0:30 ` Alexei Starovoitov
@ 2025-09-22 10:38 ` Anton Protopopov
2025-09-22 16:16 ` Alexei Starovoitov
0 siblings, 1 reply; 26+ messages in thread
From: Anton Protopopov @ 2025-09-22 10:38 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
On 25/09/19 05:30PM, Alexei Starovoitov wrote:
> On Sat, Sep 13, 2025 at 12:33 PM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> > --- /dev/null
> > +++ b/kernel/bpf/bpf_insn_array.c
> > @@ -0,0 +1,336 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +
>
> add copyright?
Yes, thanks!
> > +#include <linux/bpf.h>
> > +#include <linux/sort.h>
> > +
> > +#define MAX_INSN_ARRAY_ENTRIES 256
> > +
> > +struct bpf_insn_array {
> > + struct bpf_map map;
> > + struct mutex state_mutex;
> > + int state;
> > + long *ips;
> > + DECLARE_FLEX_ARRAY(struct bpf_insn_ptr, ptrs);
> > +};
> > +
> > +enum {
> > + INSN_ARRAY_STATE_FREE = 0,
> > + INSN_ARRAY_STATE_INIT,
> > + INSN_ARRAY_STATE_READY,
> > +};
> > +
> > +#define cast_insn_array(MAP_PTR) \
> > + container_of(MAP_PTR, struct bpf_insn_array, map)
>
> container_of((MAP_PTR)
> checkpatch will be happier.
Thanks, fixed
> > +
> > +#define INSN_DELETED ((u32)-1)
> > +
> > +static inline u32 insn_array_alloc_size(u32 max_entries)
> > +{
> > + const u32 base_size = sizeof(struct bpf_insn_array);
> > + const u32 entry_size = sizeof(struct bpf_insn_ptr);
> > +
> > + return base_size + entry_size * max_entries;
> > +}
> > +
> > +static int insn_array_alloc_check(union bpf_attr *attr)
> > +{
> > + if (attr->max_entries == 0 ||
> > + attr->key_size != 4 ||
> > + attr->value_size != 8 ||
> > + attr->map_flags != 0)
> > + return -EINVAL;
>
> Use single line or two, instead of 4.
Done
> > +
> > + if (attr->max_entries > MAX_INSN_ARRAY_ENTRIES)
> > + return -E2BIG;
> > +
> > + return 0;
> > +}
> > +
> > +static void insn_array_free(struct bpf_map *map)
> > +{
> > + struct bpf_insn_array *insn_array = cast_insn_array(map);
> > +
> > + kfree(insn_array->ips);
> > + bpf_map_area_free(insn_array);
> > +}
> > +
> > +static struct bpf_map *insn_array_alloc(union bpf_attr *attr)
> > +{
> > + u64 size = insn_array_alloc_size(attr->max_entries);
> > + struct bpf_insn_array *insn_array;
> > +
> > + insn_array = bpf_map_area_alloc(size, NUMA_NO_NODE);
> > + if (!insn_array)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + insn_array->ips = kcalloc(attr->max_entries, sizeof(long), GFP_KERNEL);
> > + if (!insn_array->ips) {
> > + insn_array_free(&insn_array->map);
> > + return ERR_PTR(-ENOMEM);
> > + }
> > +
> > + bpf_map_init_from_attr(&insn_array->map, attr);
> > +
> > + mutex_init(&insn_array->state_mutex);
> > + insn_array->state = INSN_ARRAY_STATE_FREE;
> > +
> > + return &insn_array->map;
> > +}
> > +
> > +static int insn_array_get_next_key(struct bpf_map *map, void *key, void *next_key)
> > +{
> > + struct bpf_insn_array *insn_array = cast_insn_array(map);
> > + u32 index = key ? *(u32 *)key : U32_MAX;
> > + u32 *next = (u32 *)next_key;
> > +
> > + if (index >= insn_array->map.max_entries) {
> > + *next = 0;
> > + return 0;
> > + }
> > +
> > + if (index == insn_array->map.max_entries - 1)
> > + return -ENOENT;
> > +
> > + *next = index + 1;
> > + return 0;
> > +}
>
> Full copy paste of array_map_get_next_key() is a bit too much.
> Pls refactor array_map_get_next_key() to avoid casting
> to struct bpf_array, then such a helper can work for both maps.
OK, thanks, will do.
> > +
> > +static void *insn_array_lookup_elem(struct bpf_map *map, void *key)
> > +{
> > + struct bpf_insn_array *insn_array = cast_insn_array(map);
> > + u32 index = *(u32 *)key;
> > +
> > + if (unlikely(index >= insn_array->map.max_entries))
> > + return NULL;
> > +
> > + return &insn_array->ptrs[index].user_value;
> > +}
> > +
> > +static long insn_array_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
> > +{
> > + struct bpf_insn_array *insn_array = cast_insn_array(map);
> > + u32 index = *(u32 *)key;
> > + struct bpf_insn_array_value val = {};
> > + int err = 0;
> > +
> > + if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
> > + return -EINVAL;
>
> copy paste gone wrong. BPF_F_LOCK is not supported here.
thanks, removed
> > +
> > + if (unlikely(index >= insn_array->map.max_entries))
> > + return -E2BIG;
> > +
> > + if (unlikely(map_flags & BPF_NOEXIST))
> > + return -EEXIST;
> > +
> > + /* No updates for maps in use */
> > + if (!mutex_trylock(&insn_array->state_mutex))
> > + return -EBUSY;
>
> trylock ?!
>
> If I'm reading it correctly
> check_map_func_compatibility() prevents usage of this helper
> from the prog, so this is syscall only,
> but trylock?!
See the comment below.
> > +
> > + if (insn_array->state != INSN_ARRAY_STATE_FREE) {
> > + err = -EBUSY;
> > + goto unlock;
> > + }
> > +
> > + copy_map_value(map, &val, value);
> > + if (val.jitted_off || val.xlated_off == INSN_DELETED) {
> > + err = -EINVAL;
> > + goto unlock;
> > + }
> > +
> > + insn_array->ptrs[index].orig_xlated_off = val.xlated_off;
> > + insn_array->ptrs[index].user_value.xlated_off = val.xlated_off;
> > +
> > +unlock:
> > + mutex_unlock(&insn_array->state_mutex);
> > + return err;
> > +}
> > +
> > +static long insn_array_delete_elem(struct bpf_map *map, void *key)
> > +{
> > + return -EINVAL;
> > +}
> > +
> > +static int insn_array_check_btf(const struct bpf_map *map,
> > + const struct btf *btf,
> > + const struct btf_type *key_type,
> > + const struct btf_type *value_type)
> > +{
> > + if (!btf_type_is_i32(key_type))
> > + return -EINVAL;
> > +
> > + if (!btf_type_is_i64(value_type))
> > + return -EINVAL;
> > +
> > + return 0;
> > +}
> > +
> > +static u64 insn_array_mem_usage(const struct bpf_map *map)
> > +{
> > + u64 extra_size = 0;
> > +
> > + extra_size += sizeof(long) * map->max_entries; /* insn_array->ips */
> > +
> > + return insn_array_alloc_size(map->max_entries) + extra_size;
> > +}
> > +
> > +BTF_ID_LIST_SINGLE(insn_array_btf_ids, struct, bpf_insn_array)
> > +
> > +const struct bpf_map_ops insn_array_map_ops = {
> > + .map_alloc_check = insn_array_alloc_check,
> > + .map_alloc = insn_array_alloc,
> > + .map_free = insn_array_free,
> > + .map_get_next_key = insn_array_get_next_key,
> > + .map_lookup_elem = insn_array_lookup_elem,
> > + .map_update_elem = insn_array_update_elem,
> > + .map_delete_elem = insn_array_delete_elem,
> > + .map_check_btf = insn_array_check_btf,
> > + .map_mem_usage = insn_array_mem_usage,
> > + .map_btf_id = &insn_array_btf_ids[0],
> > +};
> > +
> > +static bool is_insn_array(const struct bpf_map *map)
> > +{
> > + return map->map_type == BPF_MAP_TYPE_INSN_ARRAY;
> > +}
> > +
> > +static inline bool valid_offsets(const struct bpf_insn_array *insn_array,
> > + const struct bpf_prog *prog)
> > +{
> > + u32 off;
> > + int i;
> > +
> > + for (i = 0; i < insn_array->map.max_entries; i++) {
> > + off = insn_array->ptrs[i].orig_xlated_off;
> > +
> > + if (off >= prog->len)
> > + return false;
> > +
> > + if (off > 0) {
> > + if (prog->insnsi[off-1].code == (BPF_LD | BPF_DW | BPF_IMM))
> > + return false;
> > + }
> > + }
> > +
> > + return true;
> > +}
> > +
> > +int bpf_insn_array_init(struct bpf_map *map, const struct bpf_prog *prog)
> > +{
> > + struct bpf_insn_array *insn_array = cast_insn_array(map);
> > + int i;
> > +
> > + if (!valid_offsets(insn_array, prog))
> > + return -EINVAL;
> > +
> > + /*
> > + * There can be only one program using the map
> > + */
> > + mutex_lock(&insn_array->state_mutex);
> > + if (insn_array->state != INSN_ARRAY_STATE_FREE) {
> > + mutex_unlock(&insn_array->state_mutex);
> > + return -EBUSY;
> > + }
> > + insn_array->state = INSN_ARRAY_STATE_INIT;
> > + mutex_unlock(&insn_array->state_mutex);
>
> only verifier calls this helpers, no?
> Why all the mutexes here and below ?
> All the mutexes is a big red flag to me.
> Will stop any further comments here.
The mutex came here from a future patch for static keys.
I will see how to rewrite this with just an atomic state.
(The trylock came from fixing a robot report which I am struggling to find now...)
* Re: [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array
2025-09-22 10:38 ` Anton Protopopov
@ 2025-09-22 16:16 ` Alexei Starovoitov
2025-09-22 17:37 ` Anton Protopopov
0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2025-09-22 16:16 UTC (permalink / raw)
To: Anton Protopopov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
On Mon, Sep 22, 2025 at 3:32 AM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
> > > +int bpf_insn_array_init(struct bpf_map *map, const struct bpf_prog *prog)
> > > +{
> > > + struct bpf_insn_array *insn_array = cast_insn_array(map);
> > > + int i;
> > > +
> > > + if (!valid_offsets(insn_array, prog))
> > > + return -EINVAL;
> > > +
> > > + /*
> > > + * There can be only one program using the map
> > > + */
> > > + mutex_lock(&insn_array->state_mutex);
> > > + if (insn_array->state != INSN_ARRAY_STATE_FREE) {
> > > + mutex_unlock(&insn_array->state_mutex);
> > > + return -EBUSY;
> > > + }
> > > + insn_array->state = INSN_ARRAY_STATE_INIT;
> > > + mutex_unlock(&insn_array->state_mutex);
> >
> > only verifier calls this helpers, no?
> > Why all the mutexes here and below ?
> > All the mutexes is a big red flag to me.
> > Will stop any further comments here.
>
> Mutex came here from the future patch for static keys.
> I will see how to rewrite this with just an atomic state.
I don't follow. Who will be calling them other than the verifier?
Some kfunc? I couldn't find that in the patch set.
If so, add the synchronization logic in the patch set that
actually needs it. This one does not, so don't add
any mutexes or atomics here.
* Re: [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array
2025-09-22 16:16 ` Alexei Starovoitov
@ 2025-09-22 17:37 ` Anton Protopopov
2025-09-22 17:57 ` Alexei Starovoitov
0 siblings, 1 reply; 26+ messages in thread
From: Anton Protopopov @ 2025-09-22 17:37 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
On 25/09/22 09:16AM, Alexei Starovoitov wrote:
> On Mon, Sep 22, 2025 at 3:32 AM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> > > > +int bpf_insn_array_init(struct bpf_map *map, const struct bpf_prog *prog)
> > > > +{
> > > > + struct bpf_insn_array *insn_array = cast_insn_array(map);
> > > > + int i;
> > > > +
> > > > + if (!valid_offsets(insn_array, prog))
> > > > + return -EINVAL;
> > > > +
> > > > + /*
> > > > + * There can be only one program using the map
> > > > + */
> > > > + mutex_lock(&insn_array->state_mutex);
> > > > + if (insn_array->state != INSN_ARRAY_STATE_FREE) {
> > > > + mutex_unlock(&insn_array->state_mutex);
> > > > + return -EBUSY;
> > > > + }
> > > > + insn_array->state = INSN_ARRAY_STATE_INIT;
> > > > + mutex_unlock(&insn_array->state_mutex);
> > >
> > > only verifier calls this helpers, no?
> > > Why all the mutexes here and below ?
> > > All the mutexes is a big red flag to me.
> > > Will stop any further comments here.
> >
> > Mutex came here from the future patch for static keys.
> > I will see how to rewrite this with just an atomic state.
>
> I don't follow. Who will be calling them other than the verifier?
> Some kfunc? I couldn't find that in the patch set.
> If so, add synchronization logic in the patch set that
> actually needs it. This one doesn't not. So don't add
> any mutex or atomics here.
The usage of this map is as follows:
1. A user creates it and fills in the values using the map_update_element (syscall)
2. Then the program is loaded
The map <-> program relation is 1:1, so I want to prevent users from
1. Updating the map after the program has started loading
2. Using the same map for two programs (while, say, loading simultaneously)
At the same time I want the map to be reusable for the same program in the case
when the program failed to load and is reloaded with the log buffer.
So there should be some synchronisation mechanism.
(In a future patchset, the bpf(STATIC_KEY_UPDATE) syscall needs to execute. It
needs to be sure that the map was successfully loaded with the program. But
you're right that it doesn't make sense to leak part of that patch into this
patchset.)
Does this make sense?
* Re: [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array
2025-09-22 17:37 ` Anton Protopopov
@ 2025-09-22 17:57 ` Alexei Starovoitov
2025-09-22 19:23 ` Anton Protopopov
2025-09-23 9:55 ` Anton Protopopov
0 siblings, 2 replies; 26+ messages in thread
From: Alexei Starovoitov @ 2025-09-22 17:57 UTC (permalink / raw)
To: Anton Protopopov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
On Mon, Sep 22, 2025 at 10:31 AM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
>
> On 25/09/22 09:16AM, Alexei Starovoitov wrote:
> > On Mon, Sep 22, 2025 at 3:32 AM Anton Protopopov
> > <a.s.protopopov@gmail.com> wrote:
> > > > > +int bpf_insn_array_init(struct bpf_map *map, const struct bpf_prog *prog)
> > > > > +{
> > > > > + struct bpf_insn_array *insn_array = cast_insn_array(map);
> > > > > + int i;
> > > > > +
> > > > > + if (!valid_offsets(insn_array, prog))
> > > > > + return -EINVAL;
> > > > > +
> > > > > + /*
> > > > > + * There can be only one program using the map
> > > > > + */
> > > > > + mutex_lock(&insn_array->state_mutex);
> > > > > + if (insn_array->state != INSN_ARRAY_STATE_FREE) {
> > > > > + mutex_unlock(&insn_array->state_mutex);
> > > > > + return -EBUSY;
> > > > > + }
> > > > > + insn_array->state = INSN_ARRAY_STATE_INIT;
> > > > > + mutex_unlock(&insn_array->state_mutex);
> > > >
> > > > only verifier calls this helpers, no?
> > > > Why all the mutexes here and below ?
> > > > All the mutexes is a big red flag to me.
> > > > Will stop any further comments here.
> > >
> > > Mutex came here from the future patch for static keys.
> > > I will see how to rewrite this with just an atomic state.
> >
> > I don't follow. Who will be calling them other than the verifier?
> > Some kfunc? I couldn't find that in the patch set.
> > If so, add synchronization logic in the patch set that
> > actually needs it. This one doesn't not. So don't add
> > any mutex or atomics here.
>
> The usage of this map is as follows:
>
> 1. A user creates it and fills in the values using the map_update_element (syscall)
> 2. Then the program is loaded
>
> The map <-> program is 1:1 relation, so I want to prevent users from
>
> 1. Updating the map after the program started loading
> 2. Allowing two programs to use the same map (while, say, loading simultaneously)
Then user space should freeze the map after updating it and
before loading the program.
As for the 1-1 relation, we just landed exclusive map support
that ties a map to one specific program.
That mechanism can be used, or the 1-1 relation can be established
by the kernel internally.
> At the same time I want map to be reusable for the same program for the case
> when the program failed to load and is reloaded with the log buffer.
> So there should be some synchronisation mechanism.
>
> (In future patchset, the bpf(STATIC_KEY_UPDATE) syscall needs to execute. It
> needs to be sure that the map was successfully loaded with the program. But
> you're right that this doesn't make sense to leak part of this patch into this
> patchset.)
Even when that bit is available, it won't be modifying the map.
At most it will flip a flag or bit that says whether the branch is a nop or a jmp.
I still don't see a need for mutexes.
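The update-then-freeze ordering suggested here can be modelled in user space as follows. This is a toy model under stated assumptions, not kernel code: struct model_map and the model_*() functions are hypothetical, and rejecting a load of an unfrozen map with -EINVAL is an assumption about how the kernel side might enforce the rule. The -EPERM and -EBUSY results follow what the existing map freeze logic returns for writes to a frozen map and for a second freeze.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ENTRIES 256

/* Hypothetical toy model of an instruction array map. */
struct model_map {
	bool frozen;
	uint32_t max_entries;
	uint32_t xlated_off[MODEL_MAX_ENTRIES];
};

/* Syscall-side update: only allowed while the map is not frozen. */
static int model_update(struct model_map *m, uint32_t idx, uint32_t off)
{
	if (m->frozen)
		return -EPERM;
	if (idx >= m->max_entries || idx >= MODEL_MAX_ENTRIES)
		return -E2BIG;
	m->xlated_off[idx] = off;
	return 0;
}

/* Freezing is one-way; a second freeze reports -EBUSY. */
static int model_freeze(struct model_map *m)
{
	if (m->frozen)
		return -EBUSY;
	m->frozen = true;
	return 0;
}

/* Assumed rule: a program may only be loaded against a frozen map. */
static int model_prog_load(const struct model_map *m)
{
	return m->frozen ? 0 : -EINVAL;
}
```

Because freezing is one-way and happens before the load, the contents cannot change while the program is being verified, so no lock is needed on the map state in this model.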
* Re: [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array
2025-09-22 17:57 ` Alexei Starovoitov
@ 2025-09-22 19:23 ` Anton Protopopov
2025-09-22 20:24 ` Alexei Starovoitov
2025-09-23 9:55 ` Anton Protopopov
1 sibling, 1 reply; 26+ messages in thread
From: Anton Protopopov @ 2025-09-22 19:23 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
On 25/09/22 10:57AM, Alexei Starovoitov wrote:
> On Mon, Sep 22, 2025 at 10:31 AM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> >
> > On 25/09/22 09:16AM, Alexei Starovoitov wrote:
> > > On Mon, Sep 22, 2025 at 3:32 AM Anton Protopopov
> > > <a.s.protopopov@gmail.com> wrote:
> > > > > > +int bpf_insn_array_init(struct bpf_map *map, const struct bpf_prog *prog)
> > > > > > +{
> > > > > > + struct bpf_insn_array *insn_array = cast_insn_array(map);
> > > > > > + int i;
> > > > > > +
> > > > > > + if (!valid_offsets(insn_array, prog))
> > > > > > + return -EINVAL;
> > > > > > +
> > > > > > + /*
> > > > > > + * There can be only one program using the map
> > > > > > + */
> > > > > > + mutex_lock(&insn_array->state_mutex);
> > > > > > + if (insn_array->state != INSN_ARRAY_STATE_FREE) {
> > > > > > + mutex_unlock(&insn_array->state_mutex);
> > > > > > + return -EBUSY;
> > > > > > + }
> > > > > > + insn_array->state = INSN_ARRAY_STATE_INIT;
> > > > > > + mutex_unlock(&insn_array->state_mutex);
> > > > >
> > > > > only verifier calls this helpers, no?
> > > > > Why all the mutexes here and below ?
> > > > > All the mutexes is a big red flag to me.
> > > > > Will stop any further comments here.
> > > >
> > > > Mutex came here from the future patch for static keys.
> > > > I will see how to rewrite this with just an atomic state.
> > >
> > > I don't follow. Who will be calling them other than the verifier?
> > > Some kfunc? I couldn't find that in the patch set.
> > > If so, add synchronization logic in the patch set that
> > > actually needs it. This one doesn't not. So don't add
> > > any mutex or atomics here.
> >
> > The usage of this map is as follows:
> >
> > 1. A user creates it and fills in the values using the map_update_element (syscall)
> > 2. Then the program is loaded
> >
> > The map <-> program is 1:1 relation, so I want to prevent users from
> >
> > 1. Updating the map after the program started loading
> > 2. Allowing two programs to use the same map (while, say, loading simultaneously)
>
> Then the user space should freeze the map after updating and
> before loading.
> As far as 1-1 relation, we just landed exclusive map support
> that ties a map to one specific program.
> This mechanism can be used or 1-1 can be established by the kernel
> internally.
I actually first did it via frozen, and then removed that after Andrii's
comments. I will bring it back and remove all the mutexes.
> > At the same time I want map to be reusable for the same program for the case
> > when the program failed to load and is reloaded with the log buffer.
> > So there should be some synchronisation mechanism.
> >
> > (In future patchset, the bpf(STATIC_KEY_UPDATE) syscall needs to execute. It
> > needs to be sure that the map was successfully loaded with the program. But
> > you're right that this doesn't make sense to leak part of this patch into this
> > patchset.)
>
> Even when that bit will be available it won't be modifying the map.
> At best it will flip flag or bit whether the branch is nop or jmp.
> I still don't see a need for mutexes.
ok
* Re: [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array
2025-09-22 19:23 ` Anton Protopopov
@ 2025-09-22 20:24 ` Alexei Starovoitov
0 siblings, 0 replies; 26+ messages in thread
From: Alexei Starovoitov @ 2025-09-22 20:24 UTC (permalink / raw)
To: Anton Protopopov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
On Mon, Sep 22, 2025 at 12:17 PM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
>
> On 25/09/22 10:57AM, Alexei Starovoitov wrote:
> > On Mon, Sep 22, 2025 at 10:31 AM Anton Protopopov
> > <a.s.protopopov@gmail.com> wrote:
> > >
> > > On 25/09/22 09:16AM, Alexei Starovoitov wrote:
> > > > On Mon, Sep 22, 2025 at 3:32 AM Anton Protopopov
> > > > <a.s.protopopov@gmail.com> wrote:
> > > > > > > +int bpf_insn_array_init(struct bpf_map *map, const struct bpf_prog *prog)
> > > > > > > +{
> > > > > > > > +	struct bpf_insn_array *insn_array = cast_insn_array(map);
> > > > > > > > +	int i;
> > > > > > > > +
> > > > > > > > +	if (!valid_offsets(insn_array, prog))
> > > > > > > > +		return -EINVAL;
> > > > > > > > +
> > > > > > > > +	/*
> > > > > > > > +	 * There can be only one program using the map
> > > > > > > > +	 */
> > > > > > > > +	mutex_lock(&insn_array->state_mutex);
> > > > > > > > +	if (insn_array->state != INSN_ARRAY_STATE_FREE) {
> > > > > > > > +		mutex_unlock(&insn_array->state_mutex);
> > > > > > > > +		return -EBUSY;
> > > > > > > > +	}
> > > > > > > > +	insn_array->state = INSN_ARRAY_STATE_INIT;
> > > > > > > > +	mutex_unlock(&insn_array->state_mutex);
> > > > > >
> > > > > > > Only the verifier calls these helpers, no?
> > > > > > > Why all the mutexes here and below?
> > > > > > > All the mutexes are a big red flag to me.
> > > > > > Will stop any further comments here.
> > > > >
> > > > > Mutex came here from the future patch for static keys.
> > > > > I will see how to rewrite this with just an atomic state.
> > > >
> > > > I don't follow. Who will be calling them other than the verifier?
> > > > Some kfunc? I couldn't find that in the patch set.
> > > > If so, add synchronization logic in the patch set that
> > > > > actually needs it. This one does not. So don't add
> > > > any mutex or atomics here.
> > >
> > > The usage of this map is as follows:
> > >
> > > 1. A user creates it and fills in the values using the map_update_element (syscall)
> > > 2. Then the program is loaded
> > >
> > > The map <-> program is 1:1 relation, so I want to prevent users from
> > >
> > > 1. Updating the map after the program started loading
> > > 2. Allowing two programs to use the same map (while, say, loading simultaneously)
> >
> > Then the user space should freeze the map after updating and
> > before loading.
> > As far as 1-1 relation, we just landed exclusive map support
> > that ties a map to one specific program.
> > This mechanism can be used or 1-1 can be established by the kernel
> > internally.
>
> I actually did it via frozen first, and then removed it after Andrii's
> comments. I will bring it back and remove all the other mutexes.
What was Andrii's concern with freeze?
It seems like a good fit to me. User space updates and freezes,
because it shouldn't be updating it anymore. Normal jmp tables
in ELF are readonly too.
> > > At the same time I want the map to be reusable for the same program for the case
> > > when the program failed to load and is reloaded with the log buffer.
> > > So there should be some synchronisation mechanism.
> > >
> > > (In a future patchset, the bpf(STATIC_KEY_UPDATE) syscall needs to execute. It
> > > needs to be sure that the map was successfully loaded with the program. But
> > > you're right that it doesn't make sense to leak part of this patch into this
> > > patchset.)
> >
> > Even when that bit is available it won't be modifying the map.
> > At best it will flip a flag or bit for whether the branch is a nop or a jmp.
> > I still don't see a need for mutexes.
>
> ok
* Re: [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array
2025-09-22 17:57 ` Alexei Starovoitov
2025-09-22 19:23 ` Anton Protopopov
@ 2025-09-23 9:55 ` Anton Protopopov
2025-09-23 15:14 ` Alexei Starovoitov
1 sibling, 1 reply; 26+ messages in thread
From: Anton Protopopov @ 2025-09-23 9:55 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
On 25/09/22 10:57AM, Alexei Starovoitov wrote:
> On Mon, Sep 22, 2025 at 10:31 AM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> >
> > On 25/09/22 09:16AM, Alexei Starovoitov wrote:
> > > On Mon, Sep 22, 2025 at 3:32 AM Anton Protopopov
> > > <a.s.protopopov@gmail.com> wrote:
> > > > > > +int bpf_insn_array_init(struct bpf_map *map, const struct bpf_prog *prog)
> > > > > > +{
> > > > > > > +	struct bpf_insn_array *insn_array = cast_insn_array(map);
> > > > > > > +	int i;
> > > > > > > +
> > > > > > > +	if (!valid_offsets(insn_array, prog))
> > > > > > > +		return -EINVAL;
> > > > > > > +
> > > > > > > +	/*
> > > > > > > +	 * There can be only one program using the map
> > > > > > > +	 */
> > > > > > > +	mutex_lock(&insn_array->state_mutex);
> > > > > > > +	if (insn_array->state != INSN_ARRAY_STATE_FREE) {
> > > > > > > +		mutex_unlock(&insn_array->state_mutex);
> > > > > > > +		return -EBUSY;
> > > > > > > +	}
> > > > > > > +	insn_array->state = INSN_ARRAY_STATE_INIT;
> > > > > > > +	mutex_unlock(&insn_array->state_mutex);
> > > > >
> > > > > > Only the verifier calls these helpers, no?
> > > > > > Why all the mutexes here and below?
> > > > > > All the mutexes are a big red flag to me.
> > > > > Will stop any further comments here.
> > > >
> > > > Mutex came here from the future patch for static keys.
> > > > I will see how to rewrite this with just an atomic state.
> > >
> > > I don't follow. Who will be calling them other than the verifier?
> > > Some kfunc? I couldn't find that in the patch set.
> > > If so, add synchronization logic in the patch set that
> > > > actually needs it. This one does not. So don't add
> > > any mutex or atomics here.
> >
> > The usage of this map is as follows:
> >
> > 1. A user creates it and fills in the values using the map_update_element (syscall)
> > 2. Then the program is loaded
> >
> > The map <-> program is 1:1 relation, so I want to prevent users from
> >
> > 1. Updating the map after the program started loading
> > 2. Allowing two programs to use the same map (while, say, loading simultaneously)
>
> Then the user space should freeze the map after updating and
> before loading.
> As far as 1-1 relation, we just landed exclusive map support
> that ties a map to one specific program.
AFAICS, this API is not applicable here, as it says "this map can
only be used with the program with sha256 hash X", but nothing
prevents users from loading, say, two identical programs with the same map.
Are you ok with just this for 1:1 correspondence:
	if (atomic64_fetch_add_unless(&insn_array->used, 1, 1))
		return -EBUSY;
> This mechanism can be used or 1-1 can be established by the kernel
> internally.
>
> > At the same time I want the map to be reusable for the same program for the case
> > when the program failed to load and is reloaded with the log buffer.
> > So there should be some synchronisation mechanism.
> >
> > (In a future patchset, the bpf(STATIC_KEY_UPDATE) syscall needs to execute. It
> > needs to be sure that the map was successfully loaded with the program. But
> > you're right that it doesn't make sense to leak part of this patch into this
> > patchset.)
>
> Even when that bit is available it won't be modifying the map.
> At best it will flip a flag or bit for whether the branch is a nop or a jmp.
> I still don't see a need for mutexes.
* Re: [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array
2025-09-23 9:55 ` Anton Protopopov
@ 2025-09-23 15:14 ` Alexei Starovoitov
0 siblings, 0 replies; 26+ messages in thread
From: Alexei Starovoitov @ 2025-09-23 15:14 UTC (permalink / raw)
To: Anton Protopopov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
On Tue, Sep 23, 2025 at 2:49 AM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
>
> Are you ok with just this for 1:1 correspondence:
>
>	if (atomic64_fetch_add_unless(&insn_array->used, 1, 1))
>		return -EBUSY;
Like that, but in a more canonical form:
	if (atomic_xchg(&insn_array->used, 1))
		return -EBUSY;
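
For readers following along, here is a minimal userspace sketch of the two claim forms discussed in this exchange. It is only an illustration under stated assumptions: C11 <stdatomic.h> stands in for the kernel's atomic_t API, and struct insn_array_sketch, insn_array_claim(), insn_array_release() and fetch_add_unless() are hypothetical names, not code from the patch set. The release helper is also an assumption; it models the wish stated earlier in the thread that the map stay reusable after a failed load.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the map; only the 'used' flag matters here. */
struct insn_array_sketch {
	atomic_int used;	/* 0 = free, 1 = claimed by one program */
};

/* xchg form: unconditionally store 1 and look at the previous value. */
static int insn_array_claim(struct insn_array_sketch *m)
{
	if (atomic_exchange(&m->used, 1))
		return -EBUSY;	/* someone else already claimed the map */
	return 0;
}

/* Assumed release path, e.g. after a failed program load. */
static void insn_array_release(struct insn_array_sketch *m)
{
	atomic_store(&m->used, 0);
}

/*
 * Emulation of the fetch_add_unless semantics: add 'a' to '*v' unless
 * '*v' equals 'u', and return the value observed before the update.
 */
static int fetch_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	/* On failure the CAS reloads 'old', so the loop re-checks it. */
	while (old != u && !atomic_compare_exchange_weak(v, &old, old + a))
		;
	return old;
}

/* Walk through claim, reject, release, and the equivalent 0/1 behaviour. */
static int demo_claim_release(void)
{
	struct insn_array_sketch m = { .used = 0 };
	atomic_int v = 0;

	assert(insn_array_claim(&m) == 0);	/* first program wins */
	assert(insn_array_claim(&m) == -EBUSY);	/* second is rejected */
	insn_array_release(&m);			/* failed load: free again */
	assert(insn_array_claim(&m) == 0);

	/* For a flag that is only ever 0 or 1, both forms agree. */
	assert(fetch_add_unless(&v, 1, 1) == 0 && atomic_load(&v) == 1);
	assert(fetch_add_unless(&v, 1, 1) == 1 && atomic_load(&v) == 1);
	return 0;
}
```

For a flag that only takes the values 0 and 1, both forms return the old value and leave the flag at 1, so the exchange is the shorter way to express "claim once".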
Thread overview: 26+ messages
2025-09-13 19:39 [PATCH v2 bpf-next 00/13] BPF indirect jumps Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 01/13] bpf: fix the return value of push_stack Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 02/13] bpf: save the start of functions in bpf_prog_aux Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 03/13] bpf, x86: add new map type: instructions array Anton Protopopov
2025-09-15 4:09 ` kernel test robot
2025-09-20 0:30 ` Alexei Starovoitov
2025-09-22 10:38 ` Anton Protopopov
2025-09-22 16:16 ` Alexei Starovoitov
2025-09-22 17:37 ` Anton Protopopov
2025-09-22 17:57 ` Alexei Starovoitov
2025-09-22 19:23 ` Anton Protopopov
2025-09-22 20:24 ` Alexei Starovoitov
2025-09-23 9:55 ` Anton Protopopov
2025-09-23 15:14 ` Alexei Starovoitov
2025-09-13 19:39 ` [PATCH v2 bpf-next 04/13] selftests/bpf: add selftests for new insn_array map Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 05/13] bpf: support instructions arrays with constants blinding Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 06/13] selftests/bpf: test instructions arrays with blinding Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 07/13] bpf, x86: allow indirect jumps to r8...r15 Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 08/13] bpf, x86: add support for indirect jumps Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 09/13] bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 10/13] libbpf: fix formatting of bpf_object__append_subprog_code Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 11/13] libbpf: support llvm-generated indirect jumps Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 12/13] bpftool: Recognize insn_array map type Anton Protopopov
2025-09-16 20:33 ` Quentin Monnet
2025-09-18 8:11 ` Anton Protopopov
2025-09-13 19:39 ` [PATCH v2 bpf-next 13/13] selftests/bpf: add selftests for indirect jumps Anton Protopopov