From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Anton Protopopov <a.s.protopopov@gmail.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Anton Protopopov <aspsk@isovalent.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Eduard Zingerman <eddyz87@gmail.com>,
Quentin Monnet <qmo@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>
Subject: Re: [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps
Date: Wed, 20 Aug 2025 17:20:08 -0700 [thread overview]
Message-ID: <CAEf4BzaZxoz+=_uycH=6rO3U548TF7K8v5zKukDSJjWUgEXSSw@mail.gmail.com> (raw)
In-Reply-To: <20250816180631.952085-11-a.s.protopopov@gmail.com>
On Sat, Aug 16, 2025 at 11:02 AM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
>
> For v5 instruction set, LLVM now is allowed to generate indirect
> jumps for switch statements and for 'goto *rX' assembly. Every such a
> jump will be accompanied by necessary metadata, e.g. (`llvm-objdump
> -Sr ...`):
>
> 0: r2 = 0x0 ll
> 0000000000000030: R_BPF_64_64 BPF.JT.0.0
>
> Here BPF.JT.1.0 is a symbol residing in the .jumptables section:
>
> Symbol table:
> 4: 0000000000000000 240 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0
>
> The -bpf-min-jump-table-entries llvm option may be used to control
> the minimal size of a switch which will be converted to an indirect
> jumps.
>
> The code generated by LLVM for a switch will look, approximately,
> like this:
>
> 0: rX <- jump_table_x[i]
> 2: rX <<= 3
> 3: gotox *rX
>
> Right now there is no robust way to associate the jump with the
> corresponding map, so libbpf doesn't insert map file descriptor
> inside the gotox instruction.
Just from the commit description it's not clear whether that's
something that needs fixing or is OK? If it's OK, why call it out?..
>
> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> ---
> .../bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
> tools/bpf/bpftool/map.c | 2 +-
> tools/lib/bpf/libbpf.c | 159 +++++++++++++++---
> tools/lib/bpf/libbpf_probes.c | 4 +
> tools/lib/bpf/linker.c | 12 +-
> 5 files changed, 153 insertions(+), 26 deletions(-)
>
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> index 252e4c538edb..3377d4a01c62 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> @@ -55,7 +55,7 @@ MAP COMMANDS
> | | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
> | | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
> | | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
> -| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** }
> +| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** | **insn_array** }
>
> DESCRIPTION
> ===========
> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> index c9de44a45778..79b90f274bef 100644
> --- a/tools/bpf/bpftool/map.c
> +++ b/tools/bpf/bpftool/map.c
> @@ -1477,7 +1477,7 @@ static int do_help(int argc, char **argv)
> " devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
> " cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
> " queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
> - " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena }\n"
> + " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena | insn_array }\n"
> " " HELP_SPEC_OPTIONS " |\n"
> " {-f|--bpffs} | {-n|--nomount} }\n"
> "",
bpftool changes sifted through into libbpf patch?
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index fe4fc5438678..a5f04544c09c 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -191,6 +191,7 @@ static const char * const map_type_name[] = {
> [BPF_MAP_TYPE_USER_RINGBUF] = "user_ringbuf",
> [BPF_MAP_TYPE_CGRP_STORAGE] = "cgrp_storage",
> [BPF_MAP_TYPE_ARENA] = "arena",
> + [BPF_MAP_TYPE_INSN_ARRAY] = "insn_array",
> };
>
> static const char * const prog_type_name[] = {
> @@ -372,6 +373,7 @@ enum reloc_type {
> RELO_EXTERN_CALL,
> RELO_SUBPROG_ADDR,
> RELO_CORE,
> + RELO_INSN_ARRAY,
> };
>
> struct reloc_desc {
> @@ -383,6 +385,7 @@ struct reloc_desc {
> int map_idx;
> int sym_off;
> int ext_idx;
> + int sym_size;
make it a union with ext_idx? ext_idx isn't used for jump table
relocation, right?
> };
> };
> };
> @@ -496,6 +499,10 @@ struct bpf_program {
> __u32 line_info_rec_size;
> __u32 line_info_cnt;
> __u32 prog_flags;
> +
> + __u32 subprog_offset[256];
> + __u32 subprog_sec_offst[256];
> + __u32 subprog_cnt;
um... allocate dynamically, if necessary? (but also see above, might
not be necessary at all)
> };
>
> struct bpf_struct_ops {
> @@ -525,6 +532,7 @@ struct bpf_struct_ops {
> #define STRUCT_OPS_SEC ".struct_ops"
> #define STRUCT_OPS_LINK_SEC ".struct_ops.link"
> #define ARENA_SEC ".addr_space.1"
> +#define JUMPTABLES_SEC ".jumptables"
>
> enum libbpf_map_type {
> LIBBPF_MAP_UNSPEC,
> @@ -658,6 +666,7 @@ struct elf_state {
> Elf64_Ehdr *ehdr;
> Elf_Data *symbols;
> Elf_Data *arena_data;
> + Elf_Data jumptables_data;
> size_t shstrndx; /* section index for section name strings */
> size_t strtabidx;
> struct elf_sec_desc *secs;
> @@ -668,6 +677,7 @@ struct elf_state {
> int symbols_shndx;
> bool has_st_ops;
> int arena_data_shndx;
> + int jumptables_data_shndx;
> };
>
> struct usdt_manager;
> @@ -3945,6 +3955,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
> } else if (strcmp(name, ARENA_SEC) == 0) {
> obj->efile.arena_data = data;
> obj->efile.arena_data_shndx = idx;
> + } else if (strcmp(name, JUMPTABLES_SEC) == 0) {
> + memcpy(&obj->efile.jumptables_data, data, sizeof(*data));
you need to preserve the contents of jump tables to preparation stage,
right? Just memcpy'ing Elf_Data doesn't preserve d_buf's contents, no?
So you need to allocate memory for the contents and keep it until
preparation phase.
pw-bot: cr
> + obj->efile.jumptables_data_shndx = idx;
> } else {
> pr_info("elf: skipping unrecognized data section(%d) %s\n",
> idx, name);
> @@ -4599,6 +4612,16 @@ static int bpf_program__record_reloc(struct bpf_program *prog,
> return 0;
> }
>
> + /* jump table data relocation */
> + if (shdr_idx == obj->efile.jumptables_data_shndx) {
> + reloc_desc->type = RELO_INSN_ARRAY;
> + reloc_desc->insn_idx = insn_idx;
> + reloc_desc->map_idx = -1;
> + reloc_desc->sym_off = sym->st_value;
> + reloc_desc->sym_size = sym->st_size;
> + return 0;
> + }
> +
> /* generic map reference relocation */
> if (type == LIBBPF_MAP_UNSPEC) {
> if (!bpf_object__shndx_is_maps(obj, shdr_idx)) {
> @@ -6101,6 +6124,60 @@ static void poison_kfunc_call(struct bpf_program *prog, int relo_idx,
> insn->imm = POISON_CALL_KFUNC_BASE + ext_idx;
> }
>
> +static int create_jt_map(struct bpf_object *obj, int off, int size, int adjust_off)
> +{
> + static union bpf_attr attr = {
> + .map_type = BPF_MAP_TYPE_INSN_ARRAY,
> + .key_size = 4,
> + .value_size = sizeof(struct bpf_insn_array_value),
> + .max_entries = 0,
> + };
> + struct bpf_insn_array_value val = {};
> + int map_fd;
> + int err;
> + __u32 i;
> + __u32 *jt;
nit: combine same-typed variable declarations?
> +
> + attr.max_entries = size / 8;
8 is sizeof(struct bpf_insns_array_value)? make it obvious?
> +
> + map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
> + if (map_fd < 0)
> + return map_fd;
is the bpf_map_create() API not usable here? what's the reason to
open-code (incorrectly, not doing memset(0)) bpf_attr?
> +
> + jt = (__u32 *)(obj->efile.jumptables_data.d_buf + off);
> + if (!jt)
if off is not zero, this will never be true... this check looks wrong.
Check this once at the point where you record jumptables_data?
> + return -EINVAL;
> +
> + for (i = 0; i < attr.max_entries; i++) {
> + val.xlated_off = jt[2*i]/8 + adjust_off;
nit: code style: `jt[2 * i] / 8`
and this 8 is basically sizeof(struct bpf_insn), right? Can you use
that to have a bit more semantic meaning here?
> + err = bpf_map_update_elem(map_fd, &i, &val, 0);
> + if (err) {
> + close(map_fd);
> + return err;
> + }
> + }
> +
> + err = bpf_map_freeze(map_fd);
> + if (err) {
> + close(map_fd);
> + return err;
> + }
> +
> + return map_fd;
> +}
> +
> +static int subprog_insn_off(struct bpf_program *prog, int insn_idx)
> +{
> + int i;
> +
> + for (i = prog->subprog_cnt - 1; i >= 0; i--)
> + if (insn_idx >= prog->subprog_offset[i])
> + return prog->subprog_offset[i] - prog->subprog_sec_offst[i];
I feel like this whole subprog_offset and subprog_sec_offst shouldn't
be even necessary.
Check bpf_object__relocate(). I'm not sure why this was done this way
that we go across all programs in phases, doing code relocation first,
then data relocation later (across all programs again). I might be
forgetting some details, but if we change this to do all the
relocation for each program one at a time, then all this information
that you explicitly record is already recorded in
subprog->sub_insn_off and you can use it until we start relocating
another entry-point program. Can you give it a try?
So basically the structure will be:
for (i = 0; i < obj->nr_programs; i++) {
prog = ...
if (prog_is_subprog(...))
continue;
if (!prog->autoload)
continue;
bpf_object__relocate_calls()
/* that exception callback handling */
bpf_object__relocate_data()
bpf_program_fixup_func_info()
}
It feels like this should work because there cannot be
interdependencies between entry programs.
> +
> + return -prog->sec_insn_off;
why this return value?... can you elaborate?
> +}
> +
> +
> /* Relocate data references within program code:
> * - map references;
> * - global variable references;
> @@ -6192,6 +6269,21 @@ bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog)
> case RELO_CORE:
> /* will be handled by bpf_program_record_relos() */
> break;
> + case RELO_INSN_ARRAY: {
> + int map_fd;
> +
> + map_fd = create_jt_map(obj, relo->sym_off, relo->sym_size,
> + subprog_insn_off(prog, relo->insn_idx));
> + if (map_fd < 0) {
> + pr_warn("prog '%s': relo #%d: failed to create a jt map for sym_off=%u\n",
jt -> jump table, this is supposed to be at least somewhat
human-readable ;) also we seem to be not using blah=%d approach, so
just "sym_off %d" (and note that sym_off is int, not unsigned)
> + prog->name, i, relo->sym_off);
> + return map_fd;
> + }
> + insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;
> + insn->imm = map_fd;
> + insn->off = 0;
> + }
> + break;
> default:
> pr_warn("prog '%s': relo #%d: bad relo type %d\n",
> prog->name, i, relo->type);
> @@ -6389,36 +6481,58 @@ static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_progra
> return 0;
> }
>
> +static int
> +bpf_prog__append_subprog_offsets(struct bpf_program *prog, __u32 sec_insn_off, __u32 sub_insn_off)
please don't use double underscore for non-API functions, just
prog_append_subprog_offs()
but actually I'd just inline it into bpf_object__append_subprog_code,
it doesn't seem complicated enough to warrant its own function
> +{
> + if (prog->subprog_cnt == ARRAY_SIZE(prog->subprog_sec_offst)) {
please use libbpf_reallocarray()
> + pr_warn("prog '%s': number of subprogs exceeds %zu\n",
> + prog->name, ARRAY_SIZE(prog->subprog_sec_offst));
> + return -E2BIG;
> + }
> +
> + prog->subprog_sec_offst[prog->subprog_cnt] = sec_insn_off;
typo: offst, but also here and below prefer sticking to "off", it's
used pretty universally in libbpf code
> + prog->subprog_offset[prog->subprog_cnt] = sub_insn_off;
> +
> + prog->subprog_cnt += 1;
> + return 0;
> +}
> +
> static int
> bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main_prog,
> - struct bpf_program *subprog)
> + struct bpf_program *subprog)
> {
> - struct bpf_insn *insns;
> - size_t new_cnt;
> - int err;
> + struct bpf_insn *insns;
> + size_t new_cnt;
> + int err;
>
> - subprog->sub_insn_off = main_prog->insns_cnt;
> + subprog->sub_insn_off = main_prog->insns_cnt;
>
> - new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
> - insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns));
> - if (!insns) {
> - pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name);
> - return -ENOMEM;
> - }
> - main_prog->insns = insns;
> - main_prog->insns_cnt = new_cnt;
> + new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
> + insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns));
> + if (!insns) {
> + pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name);
> + return -ENOMEM;
> + }
> + main_prog->insns = insns;
> + main_prog->insns_cnt = new_cnt;
>
> - memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns,
> - subprog->insns_cnt * sizeof(*insns));
> + memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns,
> + subprog->insns_cnt * sizeof(*insns));
>
> - pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n",
> - main_prog->name, subprog->insns_cnt, subprog->name);
> + pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n",
> + main_prog->name, subprog->insns_cnt, subprog->name);
>
> - /* The subprog insns are now appended. Append its relos too. */
> - err = append_subprog_relos(main_prog, subprog);
> - if (err)
> - return err;
> - return 0;
> + /* The subprog insns are now appended. Append its relos too. */
> + err = append_subprog_relos(main_prog, subprog);
> + if (err)
> + return err;
> +
> + err = bpf_prog__append_subprog_offsets(main_prog, subprog->sec_insn_off,
> + subprog->sub_insn_off);
> + if (err)
> + return err;
> +
> + return 0;
> }
>
> static int
> @@ -7954,6 +8068,7 @@ static int bpf_object_prepare_progs(struct bpf_object *obj)
> if (err)
> return err;
> }
> +
?
> return 0;
> }
>
> diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
> index 9dfbe7750f56..bccf4bb747e1 100644
> --- a/tools/lib/bpf/libbpf_probes.c
> +++ b/tools/lib/bpf/libbpf_probes.c
> @@ -364,6 +364,10 @@ static int probe_map_create(enum bpf_map_type map_type)
> case BPF_MAP_TYPE_SOCKHASH:
> case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
> break;
> + case BPF_MAP_TYPE_INSN_ARRAY:
> + key_size = sizeof(__u32);
> + value_size = sizeof(struct bpf_insn_array_value);
> + break;
> case BPF_MAP_TYPE_UNSPEC:
> default:
> return -EOPNOTSUPP;
> diff --git a/tools/lib/bpf/linker.c b/tools/lib/bpf/linker.c
> index a469e5d4fee7..827867f8bba3 100644
> --- a/tools/lib/bpf/linker.c
> +++ b/tools/lib/bpf/linker.c
> @@ -28,6 +28,9 @@
> #include "str_error.h"
>
> #define BTF_EXTERN_SEC ".extern"
> +#define RODATA_REL_SEC ".rel.rodata"
> +#define JUMPTABLES_SEC ".jumptables"
> +#define JUMPTABLES_REL_SEC ".rel.jumptables"
>
> struct src_sec {
> const char *sec_name;
> @@ -2026,6 +2029,9 @@ static int linker_append_elf_sym(struct bpf_linker *linker, struct src_obj *obj,
> obj->sym_map[src_sym_idx] = dst_sec->sec_sym_idx;
> return 0;
> }
> +
> + if (!strcmp(src_sec->sec_name, JUMPTABLES_SEC))
If you look around in this file (and most of libbpf source code), you
won't see !strcmp() in it. Let's be consistent and explicit with == 0
and != 0 here and below.
> + goto add_sym;
> }
>
> if (sym_bind == STB_LOCAL)
> @@ -2272,8 +2278,10 @@ static int linker_append_elf_relos(struct bpf_linker *linker, struct src_obj *ob
> insn->imm += sec->dst_off / sizeof(struct bpf_insn);
> else
> insn->imm += sec->dst_off;
> - } else {
> - pr_warn("relocation against STT_SECTION in non-exec section is not supported!\n");
> + } else if (strcmp(src_sec->sec_name, JUMPTABLES_REL_SEC) &&
> + strcmp(src_sec->sec_name, RODATA_REL_SEC)) {
where does .rel.rodata come from?
and we don't need to adjust the contents of any of those sections, right?...
can you please add some tests validating that two object files with
jumptables can be linked together and end up with proper combined
.jumptables section?
and in terms of code, can we do
} else if (strcmp(..., JUMPTABLES_REL_SEC) == 0) {
/* nothing to do for .rel.jumptables */
} else {
pr_warn(...);
}
It makes it more apparent what is supported and what's not.
> + pr_warn("relocation against STT_SECTION in section %s is not supported!\n",
> + src_sec->sec_name);
> return -EINVAL;
> }
> }
> --
> 2.34.1
>
next prev parent reply other threads:[~2025-08-21 0:20 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-16 18:06 [PATCH v1 bpf-next 00/11] BPF indirect jumps Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 01/11] bpf: fix the return value of push_stack Anton Protopopov
2025-08-25 18:12 ` Eduard Zingerman
2025-08-26 15:00 ` Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 02/11] bpf: save the start of functions in bpf_prog_aux Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 03/11] bpf, x86: add new map type: instructions array Anton Protopopov
2025-08-25 21:05 ` Eduard Zingerman
2025-08-26 15:52 ` Anton Protopopov
2025-08-26 16:04 ` Eduard Zingerman
2025-08-16 18:06 ` [PATCH v1 bpf-next 04/11] selftests/bpf: add selftests for new insn_array map Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 05/11] bpf: support instructions arrays with constants blinding Anton Protopopov
2025-08-17 5:50 ` kernel test robot
2025-08-18 8:24 ` Anton Protopopov
2025-08-25 23:29 ` Eduard Zingerman
2025-08-27 9:20 ` Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 06/11] selftests/bpf: test instructions arrays with blinding Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 07/11] bpf, x86: allow indirect jumps to r8...r15 Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 08/11] bpf, x86: add support for indirect jumps Anton Protopopov
2025-08-18 7:57 ` Dan Carpenter
2025-08-18 8:22 ` Anton Protopopov
2025-08-25 23:15 ` Eduard Zingerman
2025-08-27 15:34 ` Anton Protopopov
2025-08-27 18:58 ` Eduard Zingerman
2025-08-28 9:58 ` Anton Protopopov
2025-08-28 14:15 ` Anton Protopopov
2025-08-28 16:10 ` Eduard Zingerman
2025-08-28 16:30 ` Eduard Zingerman
2025-08-16 18:06 ` [PATCH v1 bpf-next 09/11] bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X Anton Protopopov
2025-08-16 18:06 ` [PATCH v1 bpf-next 10/11] libbpf: support llvm-generated indirect jumps Anton Protopopov
2025-08-21 0:20 ` Andrii Nakryiko [this message]
2025-08-21 13:05 ` Anton Protopopov
2025-08-21 18:14 ` Andrii Nakryiko
2025-08-21 19:12 ` Anton Protopopov
2025-08-26 0:06 ` Eduard Zingerman
2025-08-26 16:15 ` Anton Protopopov
2025-08-26 16:51 ` Anton Protopopov
2025-08-26 16:47 ` Eduard Zingerman
2025-08-16 18:06 ` [PATCH v1 bpf-next 11/11] selftests/bpf: add selftests for " Anton Protopopov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAEf4BzaZxoz+=_uycH=6rO3U548TF7K8v5zKukDSJjWUgEXSSw@mail.gmail.com' \
--to=andrii.nakryiko@gmail.com \
--cc=a.s.protopopov@gmail.com \
--cc=andrii@kernel.org \
--cc=aspsk@isovalent.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=qmo@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).