* [RFC bpf-next 0/9] BPF indirect jumps
@ 2025-06-15  8:59 Anton Protopopov
  2025-06-15  8:59 ` [RFC bpf-next 1/9] bpf: save the start of functions in bpf_prog_aux Anton Protopopov
                   ` (8 more replies)
  0 siblings, 9 replies; 63+ messages in thread
From: Anton Protopopov @ 2025-06-15  8:59 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
  Cc: Anton Protopopov

This patchset implements a new type of map, the instruction set, and
uses it to build support for indirect branches in BPF on x86. (The
same map will later be used to provide support for indirect calls and
static keys.) See [1], [2] for more context.

Short table of contents:

  * Patches 1,2,3 implement the new map of type
    BPF_MAP_TYPE_INSN_SET. This map can be used to track the
    "original -> xlated -> jitted" mapping for a given program.

  * Patches 4,5 implement the support for indirect jumps.

  * Patches 6,7,8,9 add support for LLVM-compiled programs containing
    indirect jumps. A special LLVM build should be used for that, see
    [3] for the details and some related discussions.

There is a list of TBDs (mostly more checks & selftests, faster
lookups, etc.), and the tests can only be compiled by a custom LLVM,
thus this is an RFC. However, all the selftests which compile to
contain an indirect jump work with this patchset, so it looks worth
sending it as is already. Namely, the following selftests will
contain an indirect jump:

    * bpf_goto_x, cgroup_tcp_skb, cls_redirect, bpf_tcp_ca,
    * bpf_iter_setsockopt, tc_change_tail, net_timestamping,
    * user_ringbuf, tcp_hdr_options, tunnel, exceptions,
    * tcpbpf_user, tcp_custom_syncookie

See individual patches for more implementation details.

Links:
  1. https://lpc.events/event/18/contributions/1941/
  2. https://lwn.net/Articles/1017439/
  3. https://github.com/llvm/llvm-project/pull/133856

Anton Protopopov (9):
  bpf: save the start of functions in bpf_prog_aux
  bpf, x86: add new map type: instructions set
  selftests/bpf: add selftests for new insn_set map
  bpf, x86: allow indirect jumps to r8...r15
  bpf, x86: add support for indirect jumps
  bpf: workaround llvm behaviour with indirect jumps
  bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X
  libbpf: support llvm-generated indirect jumps
  selftests/bpf: add selftests for indirect jumps

 arch/x86/net/bpf_jit_comp.c                   |  44 +-
 include/linux/bpf.h                           |  24 +
 include/linux/bpf_types.h                     |   1 +
 include/linux/bpf_verifier.h                  |   6 +
 include/uapi/linux/bpf.h                      |  11 +
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/bpf_insn_set.c                     | 407 +++++++++++++++
 kernel/bpf/core.c                             |   2 +
 kernel/bpf/disasm.c                           |  10 +
 kernel/bpf/syscall.c                          |  22 +
 kernel/bpf/verifier.c                         | 266 +++++++++-
 tools/include/uapi/linux/bpf.h                |  11 +
 tools/lib/bpf/libbpf.c                        | 333 +++++++++++-
 tools/lib/bpf/libbpf_internal.h               |   4 +
 tools/lib/bpf/linker.c                        |  66 ++-
 tools/testing/selftests/bpf/Makefile          |   4 +-
 .../selftests/bpf/prog_tests/bpf_goto_x.c     | 127 +++++
 .../selftests/bpf/prog_tests/bpf_insn_set.c   | 481 ++++++++++++++++++
 .../testing/selftests/bpf/progs/bpf_goto_x.c  | 336 ++++++++++++
 19 files changed, 2116 insertions(+), 41 deletions(-)
 create mode 100644 kernel/bpf/bpf_insn_set.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_goto_x.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_insn_set.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_goto_x.c

-- 
2.34.1



* [RFC bpf-next 1/9] bpf: save the start of functions in bpf_prog_aux
  2025-06-15  8:59 [RFC bpf-next 0/9] BPF indirect jumps Anton Protopopov
@ 2025-06-15  8:59 ` Anton Protopopov
  2025-06-15  8:59 ` [RFC bpf-next 2/9] bpf, x86: add new map type: instructions set Anton Protopopov
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 63+ messages in thread
From: Anton Protopopov @ 2025-06-15  8:59 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
  Cc: Anton Protopopov

Introduce a new subprog_start field in bpf_prog_aux. This field may
be used by JIT compilers which want to know the real absolute xlated
offset of the function being jitted. The func_info[func_idx] entry
may have served this purpose, but func_info may be NULL, so JIT
compilers can't rely on it.
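
As a sketch of the intended use (the x86 JIT change later in this
series does essentially this, with an extra adjustment for 16-byte
instructions), a JIT can compute the absolute xlated offset of an
instruction of the subprog being jitted as follows:

    /*
     * Minimal sketch, not the actual JIT code: 'i' is assumed to be a
     * 0-based instruction index local to the subprog.
     */
    static u32 abs_xlated_off(const struct bpf_prog *subprog, u32 i)
    {
    	return subprog->aux->subprog_start + i;
    }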

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
 include/linux/bpf.h   | 1 +
 kernel/bpf/verifier.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5dd556e89cce..8189f49e43d6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1555,6 +1555,7 @@ struct bpf_prog_aux {
 	u32 ctx_arg_info_size;
 	u32 max_rdonly_access;
 	u32 max_rdwr_access;
+	u32 subprog_start;
 	struct btf *attach_btf;
 	struct bpf_ctx_arg_aux *ctx_arg_info;
 	void __percpu *priv_stack_ptr;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 279a64933262..98c51f824956 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -21389,6 +21389,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		func[i]->aux->func_idx = i;
 		/* Below members will be freed only at prog->aux */
 		func[i]->aux->btf = prog->aux->btf;
+		func[i]->aux->subprog_start = subprog_start;
 		func[i]->aux->func_info = prog->aux->func_info;
 		func[i]->aux->func_info_cnt = prog->aux->func_info_cnt;
 		func[i]->aux->poke_tab = prog->aux->poke_tab;
-- 
2.34.1



* [RFC bpf-next 2/9] bpf, x86: add new map type: instructions set
  2025-06-15  8:59 [RFC bpf-next 0/9] BPF indirect jumps Anton Protopopov
  2025-06-15  8:59 ` [RFC bpf-next 1/9] bpf: save the start of functions in bpf_prog_aux Anton Protopopov
@ 2025-06-15  8:59 ` Anton Protopopov
  2025-06-18  0:57   ` Eduard Zingerman
  2025-06-15  8:59 ` [RFC bpf-next 3/9] selftests/bpf: add selftests for new insn_set map Anton Protopopov
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 63+ messages in thread
From: Anton Protopopov @ 2025-06-15  8:59 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
  Cc: Anton Protopopov

On the bpf(BPF_PROG_LOAD) syscall, user-supplied BPF programs are
translated by the verifier into "xlated" BPF programs. During this
process the original instruction offsets might be adjusted and/or
individual instructions might be replaced by new sets of
instructions, or deleted.

Add a new BPF map type which is aimed at keeping track of how, for a
given program, the original instructions were relocated during
verification. Besides keeping track of the original -> xlated
mapping, make the x86 JIT build the xlated -> jitted mapping for
every instruction listed in an instruction set. This is required for
every future application of instruction sets: static keys, indirect
jumps and indirect calls.

A map of the BPF_MAP_TYPE_INSN_SET type must be created with a key
size of 4 (a u32 index) and a value size of 8. The values have
different semantics for userspace and for the BPF side. For userspace
a value consists of two u32 fields: the xlated and jitted offsets. On
the BPF side the value is a real pointer to a jitted instruction.

On map creation/initialization, before loading the program, each
element of the map should be initialized to point to an instruction
offset within the program. Before the program load such maps must be
frozen. After the program is verified, the xlated and jitted offsets
can be read via the bpf(2) syscall.
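
For illustration, a minimal userspace sketch of this workflow (error
handling omitted; it uses the same libbpf calls as the selftests added
later in the series, including the fd_array/fd_array_cnt load options,
and assumes insns/insn_cnt describe the program being loaded):

    #include <bpf/bpf.h>	/* plus the updated UAPI header providing
    			 * BPF_MAP_TYPE_INSN_SET and
    			 * struct bpf_insn_set_value */

    static int load_with_insn_set(const struct bpf_insn *insns, size_t insn_cnt)
    {
    	struct bpf_insn_set_value val = {};
    	__u32 keys[2] = { 0, 1 };
    	__u32 offs[2] = { 0, 5 };	/* track instructions 0 and 5 */
    	int map_fd, prog_fd, i;

    	map_fd = bpf_map_create(BPF_MAP_TYPE_INSN_SET, "insn_set",
    				sizeof(__u32), sizeof(val), 2, NULL);

    	for (i = 0; i < 2; i++) {
    		val.xlated_off = offs[i];	/* jitted_off must be 0 */
    		bpf_map_update_elem(map_fd, &keys[i], &val, 0);
    	}
    	bpf_map_freeze(map_fd);			/* mandatory before the load */

    	LIBBPF_OPTS(bpf_prog_load_opts, opts,
    		.fd_array = &map_fd,
    		.fd_array_cnt = 1,
    	);
    	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL",
    				insns, insn_cnt, &opts);

    	for (i = 0; i < 2; i++) {
    		bpf_map_lookup_elem(map_fd, &keys[i], &val);
    		/* val.xlated_off/val.jitted_off now hold the new offsets,
    		 * or xlated_off == (u32)-1 if the instruction was removed
    		 */
    	}

    	return prog_fd;
    }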

If a tracked instruction is removed by the verifier, then the xlated
offset is set to (u32)-1, which is considered too big to be a valid
BPF program offset.

One such map can, obviously, be used to track one and only one BPF
program. If the verification process was unsuccessful, then the same
map can be re-used to verify the program with a different log level.
However, if the program was loaded fine, then such a map, being
frozen in any case, can't be reused by other programs even after the
program is released.

Example. Consider the following original and xlated programs:

    Original prog:                      Xlated prog:

     0:  r1 = 0x0                        0: r1 = 0
     1:  *(u32 *)(r10 - 0x4) = r1        1: *(u32 *)(r10 -4) = r1
     2:  r2 = r10                        2: r2 = r10
     3:  r2 += -0x4                      3: r2 += -4
     4:  r1 = 0x0 ll                     4: r1 = map[id:88]
     6:  call 0x1                        6: r1 += 272
                                         7: r0 = *(u32 *)(r2 +0)
                                         8: if r0 >= 0x1 goto pc+3
                                         9: r0 <<= 3
                                        10: r0 += r1
                                        11: goto pc+1
                                        12: r0 = 0
     7:  r6 = r0                        13: r6 = r0
     8:  if r6 == 0x0 goto +0x2         14: if r6 == 0x0 goto pc+4
     9:  call 0x76                      15: r0 = 0xffffffff8d2079c0
                                        17: r0 = *(u64 *)(r0 +0)
    10:  *(u64 *)(r6 + 0x0) = r0        18: *(u64 *)(r6 +0) = r0
    11:  r0 = 0x0                       19: r0 = 0x0
    12:  exit                           20: exit

An instruction set map containing, e.g., indexes [0,4,7,12] will be
translated by the verifier to [0,4,13,20]. A map with index 5 (the
middle of a 16-byte instruction) or indexes greater than 12 (outside
the program boundaries) would be rejected.

The functionality provided by this patch will be extended in
subsequent patches to implement BPF Static Keys, indirect jumps, and
indirect calls.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c    |  11 ++
 include/linux/bpf.h            |  21 ++
 include/linux/bpf_types.h      |   1 +
 include/linux/bpf_verifier.h   |   2 +
 include/uapi/linux/bpf.h       |  11 ++
 kernel/bpf/Makefile            |   2 +-
 kernel/bpf/bpf_insn_set.c      | 338 +++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c           |  22 +++
 kernel/bpf/verifier.c          |  43 +++++
 tools/include/uapi/linux/bpf.h |  11 ++
 10 files changed, 461 insertions(+), 1 deletion(-)
 create mode 100644 kernel/bpf/bpf_insn_set.c

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 15672cb926fc..923c38f212dc 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1615,6 +1615,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 		const s32 imm32 = insn->imm;
 		u32 dst_reg = insn->dst_reg;
 		u32 src_reg = insn->src_reg;
+		int adjust_off = 0;
+		int abs_xlated_off;
 		u8 b2 = 0, b3 = 0;
 		u8 *start_of_ldx;
 		s64 jmp_offset;
@@ -1770,6 +1772,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 			emit_mov_imm64(&prog, dst_reg, insn[1].imm, insn[0].imm);
 			insn++;
 			i++;
+			adjust_off = 1;
 			break;
 
 			/* dst %= src, dst /= src, dst %= imm32, dst /= imm32 */
@@ -2642,6 +2645,14 @@ st:			if (is_imm8(insn->off))
 				return -EFAULT;
 			}
 			memcpy(rw_image + proglen, temp, ilen);
+
+			/*
+			 * Instruction sets need to know how xlated code
+			 * maps to jited code
+			 */
+			abs_xlated_off = bpf_prog->aux->subprog_start + i - 1 - adjust_off;
+			bpf_prog_update_insn_ptr(bpf_prog, abs_xlated_off, proglen, ilen,
+						 jmp_offset, image + proglen);
 		}
 		proglen += ilen;
 		addrs[i] = proglen;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8189f49e43d6..008bcd44c60e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3596,4 +3596,25 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
 	return prog->aux->func_idx != 0;
 }
 
+int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog);
+int bpf_insn_set_ready(struct bpf_map *map);
+void bpf_insn_set_release(struct bpf_map *map);
+void bpf_insn_set_adjust(struct bpf_map *map, u32 off, u32 len);
+void bpf_insn_set_adjust_after_remove(struct bpf_map *map, u32 off, u32 len);
+
+struct bpf_insn_ptr {
+	void *jitted_ip;
+	u32 jitted_len;
+	int jitted_jump_offset;
+	struct bpf_insn_set_value user_value; /* userspace-visible value */
+	u32 orig_xlated_off;
+};
+
+void bpf_prog_update_insn_ptr(struct bpf_prog *prog,
+			      u32 xlated_off,
+			      u32 jitted_off,
+			      u32 jitted_len,
+			      int jitted_jump_offset,
+			      void *jitted_ip);
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index fa78f49d4a9a..01df0e47a3f7 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -133,6 +133,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARENA, arena_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_INSN_SET, insn_set_map_ops)
 
 BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
 BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 7e459e839f8b..84b5e6b25c52 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -766,8 +766,10 @@ struct bpf_verifier_env {
 	struct list_head free_list;	/* list of struct bpf_verifier_state_list */
 	struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
 	struct btf_mod_pair used_btfs[MAX_USED_BTFS]; /* array of BTF's used by BPF program */
+	struct bpf_map *insn_set_maps[MAX_USED_MAPS]; /* array of INSN_SET map's to be relocated */
 	u32 used_map_cnt;		/* number of used maps */
 	u32 used_btf_cnt;		/* number of used BTF objects */
+	u32 insn_set_map_cnt;		/* number of used maps of type BPF_MAP_TYPE_INSN_SET */
 	u32 id_gen;			/* used to generate unique reg IDs */
 	u32 hidden_subprog_cnt;		/* number of hidden subprogs */
 	int exception_callback_subprog;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 39e7818cca80..a833c3b4dd75 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1013,6 +1013,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_USER_RINGBUF,
 	BPF_MAP_TYPE_CGRP_STORAGE,
 	BPF_MAP_TYPE_ARENA,
+	BPF_MAP_TYPE_INSN_SET,
 	__MAX_BPF_MAP_TYPE
 };
 
@@ -7589,4 +7590,14 @@ enum bpf_kfunc_flags {
 	BPF_F_PAD_ZEROS = (1ULL << 0),
 };
 
+/*
+ * Values of a BPF_MAP_TYPE_INSN_SET entry must be of this type.
+ * On updates jitted_off must be equal to 0.
+ */
+struct bpf_insn_set_value {
+	__u32 jitted_off;
+	__u32 xlated_off;
+};
+
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 3a335c50e6e3..18dfbc30184f 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -9,7 +9,7 @@ CFLAGS_core.o += -Wno-override-init $(cflags-nogcse-yy)
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
-obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
+obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o bpf_insn_set.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM}	  += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
diff --git a/kernel/bpf/bpf_insn_set.c b/kernel/bpf/bpf_insn_set.c
new file mode 100644
index 000000000000..c20e99327118
--- /dev/null
+++ b/kernel/bpf/bpf_insn_set.c
@@ -0,0 +1,338 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/bpf.h>
+#include <linux/sort.h>
+
+#define MAX_ISET_ENTRIES 256
+
+struct bpf_insn_set {
+	struct bpf_map map;
+	struct mutex state_mutex;
+	int state;
+	long *ips;
+	DECLARE_FLEX_ARRAY(struct bpf_insn_ptr, ptrs);
+};
+
+enum {
+	INSN_SET_STATE_FREE = 0,
+	INSN_SET_STATE_INIT,
+	INSN_SET_STATE_READY,
+};
+
+#define cast_insn_set(MAP_PTR) \
+	container_of(MAP_PTR, struct bpf_insn_set, map)
+
+#define INSN_DELETED ((u32)-1)
+
+static inline u32 insn_set_alloc_size(u32 max_entries)
+{
+	const u32 base_size = sizeof(struct bpf_insn_set);
+	const u32 entry_size = sizeof(struct bpf_insn_ptr);
+
+	return base_size + entry_size * max_entries;
+}
+
+static int insn_set_alloc_check(union bpf_attr *attr)
+{
+	if (attr->max_entries == 0 ||
+	    attr->key_size != 4 ||
+	    attr->value_size != 8 ||
+	    attr->map_flags != 0)
+		return -EINVAL;
+
+	if (attr->max_entries > MAX_ISET_ENTRIES)
+		return -E2BIG;
+
+	return 0;
+}
+
+static void insn_set_free(struct bpf_map *map)
+{
+	struct bpf_insn_set *insn_set = cast_insn_set(map);
+
+	kfree(insn_set->ips);
+	bpf_map_area_free(insn_set);
+}
+
+static struct bpf_map *insn_set_alloc(union bpf_attr *attr)
+{
+	u64 size = insn_set_alloc_size(attr->max_entries);
+	struct bpf_insn_set *insn_set;
+
+	insn_set = bpf_map_area_alloc(size, NUMA_NO_NODE);
+	if (!insn_set)
+		return ERR_PTR(-ENOMEM);
+
+	insn_set->ips = kcalloc(attr->max_entries, sizeof(long), GFP_KERNEL);
+	if (!insn_set->ips) {
+		insn_set_free(&insn_set->map);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	bpf_map_init_from_attr(&insn_set->map, attr);
+
+	mutex_init(&insn_set->state_mutex);
+	insn_set->state = INSN_SET_STATE_FREE;
+
+	return &insn_set->map;
+}
+
+static int insn_set_get_next_key(struct bpf_map *map, void *key, void *next_key)
+{
+	struct bpf_insn_set *insn_set = cast_insn_set(map);
+	u32 index = key ? *(u32 *)key : U32_MAX;
+	u32 *next = (u32 *)next_key;
+
+	if (index >= insn_set->map.max_entries) {
+		*next = 0;
+		return 0;
+	}
+
+	if (index == insn_set->map.max_entries - 1)
+		return -ENOENT;
+
+	*next = index + 1;
+	return 0;
+}
+
+static void *insn_set_lookup_elem(struct bpf_map *map, void *key)
+{
+	struct bpf_insn_set *insn_set = cast_insn_set(map);
+	u32 index = *(u32 *)key;
+
+	if (unlikely(index >= insn_set->map.max_entries))
+		return NULL;
+
+	return &insn_set->ptrs[index].user_value;
+}
+
+static long insn_set_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
+{
+	struct bpf_insn_set *insn_set = cast_insn_set(map);
+	u32 index = *(u32 *)key;
+	struct bpf_insn_set_value val = {};
+
+	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
+		return -EINVAL;
+
+	if (unlikely(index >= insn_set->map.max_entries))
+		return -E2BIG;
+
+	if (unlikely(map_flags & BPF_NOEXIST))
+		return -EEXIST;
+
+	copy_map_value(map, &val, value);
+	if (val.jitted_off || val.xlated_off == INSN_DELETED)
+		return -EINVAL;
+
+	insn_set->ptrs[index].orig_xlated_off = val.xlated_off;
+	insn_set->ptrs[index].user_value.xlated_off = val.xlated_off;
+
+	return 0;
+}
+
+static long insn_set_delete_elem(struct bpf_map *map, void *key)
+{
+	return -EINVAL;
+}
+
+static int insn_set_check_btf(const struct bpf_map *map,
+			      const struct btf *btf,
+			      const struct btf_type *key_type,
+			      const struct btf_type *value_type)
+{
+	u32 int_data;
+
+	if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
+		return -EINVAL;
+
+	if (BTF_INFO_KIND(value_type->info) != BTF_KIND_INT)
+		return -EINVAL;
+
+	int_data = *(u32 *)(key_type + 1);
+	if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
+		return -EINVAL;
+
+	int_data = *(u32 *)(value_type + 1);
+	if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
+		return -EINVAL;
+
+	return 0;
+}
+
+static u64 insn_set_mem_usage(const struct bpf_map *map)
+{
+	u64 extra_size = 0;
+
+	extra_size += sizeof(long) * map->max_entries; /* insn_set->ips */
+
+	return insn_set_alloc_size(map->max_entries) + extra_size;
+}
+
+BTF_ID_LIST_SINGLE(insn_set_btf_ids, struct, bpf_insn_set)
+
+const struct bpf_map_ops insn_set_map_ops = {
+	.map_alloc_check = insn_set_alloc_check,
+	.map_alloc = insn_set_alloc,
+	.map_free = insn_set_free,
+	.map_get_next_key = insn_set_get_next_key,
+	.map_lookup_elem = insn_set_lookup_elem,
+	.map_update_elem = insn_set_update_elem,
+	.map_delete_elem = insn_set_delete_elem,
+	.map_check_btf = insn_set_check_btf,
+	.map_mem_usage = insn_set_mem_usage,
+	.map_btf_id = &insn_set_btf_ids[0],
+};
+
+static inline bool is_frozen(struct bpf_map *map)
+{
+	guard(mutex)(&map->freeze_mutex);
+
+	return map->frozen;
+}
+
+static bool is_insn_set(const struct bpf_map *map)
+{
+	return map->map_type == BPF_MAP_TYPE_INSN_SET;
+}
+
+static inline bool valid_offsets(const struct bpf_insn_set *insn_set,
+				 const struct bpf_prog *prog)
+{
+	u32 off;
+	int i;
+
+	for (i = 0; i < insn_set->map.max_entries; i++) {
+		off = insn_set->ptrs[i].orig_xlated_off;
+
+		if (off >= prog->len)
+			return false;
+
+		if (off > 0) {
+			if (prog->insnsi[off-1].code == (BPF_LD | BPF_DW | BPF_IMM))
+				return false;
+		}
+	}
+
+	return true;
+}
+
+int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog)
+{
+	struct bpf_insn_set *insn_set = cast_insn_set(map);
+	int i;
+
+	if (!is_frozen(map))
+		return -EINVAL;
+
+	if (!valid_offsets(insn_set, prog))
+		return -EINVAL;
+
+	/*
+	 * There can be only one program using the map
+	 */
+	mutex_lock(&insn_set->state_mutex);
+	if (insn_set->state != INSN_SET_STATE_FREE) {
+		mutex_unlock(&insn_set->state_mutex);
+		return -EBUSY;
+	}
+	insn_set->state = INSN_SET_STATE_INIT;
+	mutex_unlock(&insn_set->state_mutex);
+
+	/*
+	 * Reset all the map indexes to the original values.  This is needed,
+	 * e.g., when a replay of verification with different log level should
+	 * be performed.
+	 */
+	for (i = 0; i < map->max_entries; i++)
+		insn_set->ptrs[i].user_value.xlated_off = insn_set->ptrs[i].orig_xlated_off;
+
+	return 0;
+}
+
+int bpf_insn_set_ready(struct bpf_map *map)
+{
+	struct bpf_insn_set *insn_set = cast_insn_set(map);
+	int i;
+
+	for (i = 0; i < map->max_entries; i++) {
+		if (insn_set->ptrs[i].user_value.xlated_off == INSN_DELETED)
+			continue;
+		if (!insn_set->ips[i])
+			return -EFAULT;
+	}
+
+	insn_set->state = INSN_SET_STATE_READY;
+	return 0;
+}
+
+void bpf_insn_set_release(struct bpf_map *map)
+{
+	struct bpf_insn_set *insn_set = cast_insn_set(map);
+
+	insn_set->state = INSN_SET_STATE_FREE;
+}
+
+void bpf_insn_set_adjust(struct bpf_map *map, u32 off, u32 len)
+{
+	struct bpf_insn_set *insn_set = cast_insn_set(map);
+	int i;
+
+	if (len <= 1)
+		return;
+
+	for (i = 0; i < map->max_entries; i++) {
+		if (insn_set->ptrs[i].user_value.xlated_off <= off)
+			continue;
+		if (insn_set->ptrs[i].user_value.xlated_off == INSN_DELETED)
+			continue;
+		insn_set->ptrs[i].user_value.xlated_off += len - 1;
+	}
+}
+
+void bpf_insn_set_adjust_after_remove(struct bpf_map *map, u32 off, u32 len)
+{
+	struct bpf_insn_set *insn_set = cast_insn_set(map);
+	int i;
+
+	for (i = 0; i < map->max_entries; i++) {
+		if (insn_set->ptrs[i].user_value.xlated_off < off)
+			continue;
+		if (insn_set->ptrs[i].user_value.xlated_off == INSN_DELETED)
+			continue;
+		if (insn_set->ptrs[i].user_value.xlated_off >= off &&
+		    insn_set->ptrs[i].user_value.xlated_off < off + len)
+			insn_set->ptrs[i].user_value.xlated_off = INSN_DELETED;
+		else
+			insn_set->ptrs[i].user_value.xlated_off -= len;
+	}
+}
+
+void bpf_prog_update_insn_ptr(struct bpf_prog *prog,
+			      u32 xlated_off,
+			      u32 jitted_off,
+			      u32 jitted_len,
+			      int jitted_jump_offset,
+			      void *jitted_ip)
+{
+	struct bpf_insn_set *insn_set;
+	struct bpf_map *map;
+	int i, j;
+
+	for (i = 0; i < prog->aux->used_map_cnt; i++) {
+		map = prog->aux->used_maps[i];
+		if (!is_insn_set(map))
+			continue;
+
+		insn_set = cast_insn_set(map);
+		for (j = 0; j < map->max_entries; j++) {
+			if (insn_set->ptrs[j].user_value.xlated_off == xlated_off) {
+				insn_set->ips[j] = (long)jitted_ip;
+				insn_set->ptrs[j].jitted_ip = jitted_ip;
+				insn_set->ptrs[j].jitted_len = jitted_len;
+				insn_set->ptrs[j].jitted_jump_offset = jitted_jump_offset;
+				insn_set->ptrs[j].user_value.jitted_off = jitted_off;
+			}
+		}
+	}
+}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 56500381c28a..b9123fe0e872 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1458,6 +1458,7 @@ static int map_create(union bpf_attr *attr, bool kernel)
 	case BPF_MAP_TYPE_STRUCT_OPS:
 	case BPF_MAP_TYPE_CPUMAP:
 	case BPF_MAP_TYPE_ARENA:
+	case BPF_MAP_TYPE_INSN_SET:
 		if (!bpf_token_capable(token, CAP_BPF))
 			goto put_token;
 		break;
@@ -2754,6 +2755,23 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
 	}
 }
 
+static int bpf_prog_mark_insn_sets_ready(struct bpf_prog *prog)
+{
+	int err;
+	int i;
+
+	for (i = 0; i < prog->aux->used_map_cnt; i++) {
+		if (prog->aux->used_maps[i]->map_type != BPF_MAP_TYPE_INSN_SET)
+			continue;
+
+		err = bpf_insn_set_ready(prog->aux->used_maps[i]);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 /* last field in 'union bpf_attr' used by this command */
 #define BPF_PROG_LOAD_LAST_FIELD fd_array_cnt
 
@@ -2977,6 +2995,10 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	if (err < 0)
 		goto free_used_maps;
 
+	err = bpf_prog_mark_insn_sets_ready(prog);
+	if (err < 0)
+		goto free_used_maps;
+
 	err = bpf_prog_alloc_id(prog);
 	if (err)
 		goto free_used_maps;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 98c51f824956..8ac9a0b5af53 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10007,6 +10007,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		    func_id != BPF_FUNC_map_push_elem)
 			goto error;
 		break;
+	case BPF_MAP_TYPE_INSN_SET:
+		goto error;
 	default:
 		break;
 	}
@@ -20312,6 +20314,15 @@ static int __add_used_map(struct bpf_verifier_env *env, struct bpf_map *map)
 
 	env->used_maps[env->used_map_cnt++] = map;
 
+	if (map->map_type == BPF_MAP_TYPE_INSN_SET) {
+		err = bpf_insn_set_init(map, env->prog);
+		if (err) {
+			verbose(env, "Failed to properly initialize insn set\n");
+			return err;
+		}
+		env->insn_set_maps[env->insn_set_map_cnt++] = map;
+	}
+
 	return env->used_map_cnt - 1;
 }
 
@@ -20561,6 +20572,33 @@ static void adjust_subprog_starts(struct bpf_verifier_env *env, u32 off, u32 len
 	}
 }
 
+static void release_insn_sets(struct bpf_verifier_env *env)
+{
+	int i;
+
+	for (i = 0; i < env->insn_set_map_cnt; i++)
+		bpf_insn_set_release(env->insn_set_maps[i]);
+}
+
+static void adjust_insn_sets(struct bpf_verifier_env *env, u32 off, u32 len)
+{
+	int i;
+
+	if (len == 1)
+		return;
+
+	for (i = 0; i < env->insn_set_map_cnt; i++)
+		bpf_insn_set_adjust(env->insn_set_maps[i], off, len);
+}
+
+static void adjust_insn_sets_after_remove(struct bpf_verifier_env *env, u32 off, u32 len)
+{
+	int i;
+
+	for (i = 0; i < env->insn_set_map_cnt; i++)
+		bpf_insn_set_adjust_after_remove(env->insn_set_maps[i], off, len);
+}
+
 static void adjust_poke_descs(struct bpf_prog *prog, u32 off, u32 len)
 {
 	struct bpf_jit_poke_descriptor *tab = prog->aux->poke_tab;
@@ -20599,6 +20637,7 @@ static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 of
 	}
 	adjust_insn_aux_data(env, new_data, new_prog, off, len);
 	adjust_subprog_starts(env, off, len);
+	adjust_insn_sets(env, off, len);
 	adjust_poke_descs(new_prog, off, len);
 	return new_prog;
 }
@@ -20782,6 +20821,8 @@ static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
 	if (err)
 		return err;
 
+	adjust_insn_sets_after_remove(env, off, cnt);
+
 	memmove(aux_data + off,	aux_data + off + cnt,
 		sizeof(*aux_data) * (orig_prog_len - off - cnt));
 
@@ -24625,6 +24666,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	adjust_btf_func(env);
 
 err_release_maps:
+	if (ret)
+		release_insn_sets(env);
 	if (!env->prog->aux->used_maps)
 		/* if we didn't copy map pointers into bpf_prog_info, release
 		 * them now. Otherwise free_used_maps() will release them.
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 39e7818cca80..a833c3b4dd75 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1013,6 +1013,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_USER_RINGBUF,
 	BPF_MAP_TYPE_CGRP_STORAGE,
 	BPF_MAP_TYPE_ARENA,
+	BPF_MAP_TYPE_INSN_SET,
 	__MAX_BPF_MAP_TYPE
 };
 
@@ -7589,4 +7590,14 @@ enum bpf_kfunc_flags {
 	BPF_F_PAD_ZEROS = (1ULL << 0),
 };
 
+/*
+ * Values of a BPF_MAP_TYPE_INSN_SET entry must be of this type.
+ * On updates jitted_off must be equal to 0.
+ */
+struct bpf_insn_set_value {
+	__u32 jitted_off;
+	__u32 xlated_off;
+};
+
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.34.1



* [RFC bpf-next 3/9] selftests/bpf: add selftests for new insn_set map
  2025-06-15  8:59 [RFC bpf-next 0/9] BPF indirect jumps Anton Protopopov
  2025-06-15  8:59 ` [RFC bpf-next 1/9] bpf: save the start of functions in bpf_prog_aux Anton Protopopov
  2025-06-15  8:59 ` [RFC bpf-next 2/9] bpf, x86: add new map type: instructions set Anton Protopopov
@ 2025-06-15  8:59 ` Anton Protopopov
  2025-06-18 11:04   ` Eduard Zingerman
  2025-06-15  8:59 ` [RFC bpf-next 4/9] bpf, x86: allow indirect jumps to r8...r15 Anton Protopopov
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 63+ messages in thread
From: Anton Protopopov @ 2025-06-15  8:59 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
  Cc: Anton Protopopov

The tests are split into two parts.

The `bpf_insn_set_ops` test checks that the map is managed properly:

  * Incorrect instruction indexes are rejected
  * Non-sorted and non-unique indexes are rejected
  * Unfrozen maps are not accepted
  * Two programs can't use the same map
  * BPF programs can't access the map

The `bpf_insn_set_reloc` part validates, as well as this can be done
from user space, that instructions are relocated properly:

  * no relocations => map is the same
  * expected relocations when instructions are added
  * expected relocations when instructions are deleted
  * expected relocations when multiple functions are present

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
 .../selftests/bpf/prog_tests/bpf_insn_set.c   | 481 ++++++++++++++++++
 1 file changed, 481 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_insn_set.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_insn_set.c b/tools/testing/selftests/bpf/prog_tests/bpf_insn_set.c
new file mode 100644
index 000000000000..5138e54d522a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_insn_set.c
@@ -0,0 +1,481 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <bpf/bpf.h>
+#include <test_progs.h>
+
+static int map_create(__u32 map_type, __u32 max_entries)
+{
+	const char *map_name = "insn_set";
+	__u32 key_size = 4;
+	__u32 value_size = sizeof(struct bpf_insn_set_value);
+
+	return bpf_map_create(map_type, map_name, key_size, value_size, max_entries, NULL);
+}
+
+static int prog_load(struct bpf_insn *insns, __u32 insn_cnt, int *fd_array, __u32 fd_array_cnt)
+{
+	LIBBPF_OPTS(bpf_prog_load_opts, opts);
+
+	opts.fd_array = fd_array;
+	opts.fd_array_cnt = fd_array_cnt;
+
+	return bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL", insns, insn_cnt, &opts);
+}
+
+/*
+ * Load a program which will not be mangled in any way by the verifier.  Add an
+ * insn_set map pointing to every instruction. Check that it hasn't changed
+ * after the program load.
+ */
+static void check_one_to_one_mapping(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 4),
+		BPF_MOV64_IMM(BPF_REG_0, 3),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int prog_fd = -1, map_fd;
+	struct bpf_insn_set_value val = {};
+	int i;
+
+	map_fd = map_create(BPF_MAP_TYPE_INSN_SET, ARRAY_SIZE(insns));
+	if (!ASSERT_GE(map_fd, 0, "map_create"))
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(insns); i++) {
+		val.xlated_off = i;
+		if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0, "bpf_map_update_elem"))
+			goto cleanup;
+	}
+
+	if (!ASSERT_EQ(bpf_map_freeze(map_fd), 0, "bpf_map_freeze"))
+		goto cleanup;
+
+	prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+	if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+		goto cleanup;
+
+	for (i = 0; i < ARRAY_SIZE(insns); i++) {
+		if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+			goto cleanup;
+
+		ASSERT_EQ(val.xlated_off, i, "val should be equal i");
+	}
+
+cleanup:
+	close(prog_fd);
+	close(map_fd);
+}
+
+/*
+ * Try to load a program with a map which points to outside of the program
+ */
+static void check_out_of_bounds_index(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 4),
+		BPF_MOV64_IMM(BPF_REG_0, 3),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int prog_fd, map_fd;
+	struct bpf_insn_set_value val = {};
+	int key;
+
+	map_fd = map_create(BPF_MAP_TYPE_INSN_SET, 1);
+	if (!ASSERT_GE(map_fd, 0, "map_create"))
+		return;
+
+	key = 0;
+	val.xlated_off = ARRAY_SIZE(insns); /* too big */
+	if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &key, &val, 0), 0, "bpf_map_update_elem"))
+		goto cleanup;
+
+	if (!ASSERT_EQ(bpf_map_freeze(map_fd), 0, "bpf_map_freeze"))
+		goto cleanup;
+
+	errno = 0;
+	prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+	if (!ASSERT_EQ(prog_fd, -EINVAL, "program should have been rejected (prog_fd != -EINVAL)")) {
+		close(prog_fd);
+		goto cleanup;
+	}
+
+cleanup:
+	close(map_fd);
+}
+
+/*
+ * Try to load a program with a map which points to the middle of a 16-byte insn
+ */
+static void check_mid_insn_index(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_LD_IMM64(BPF_REG_0, 0), /* 2 x 8 */
+		BPF_EXIT_INSN(),
+	};
+	int prog_fd, map_fd;
+	struct bpf_insn_set_value val = {};
+	int key;
+
+	map_fd = map_create(BPF_MAP_TYPE_INSN_SET, 1);
+	if (!ASSERT_GE(map_fd, 0, "map_create"))
+		return;
+
+	key = 0;
+	val.xlated_off = 1; /* middle of 16-byte instruction */
+	if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &key, &val, 0), 0, "bpf_map_update_elem"))
+		goto cleanup;
+
+	if (!ASSERT_EQ(bpf_map_freeze(map_fd), 0, "bpf_map_freeze"))
+		goto cleanup;
+
+	errno = 0;
+	prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+	if (!ASSERT_EQ(prog_fd, -EINVAL, "program should have been rejected (prog_fd != -EINVAL)")) {
+		close(prog_fd);
+		goto cleanup;
+	}
+
+cleanup:
+	close(map_fd);
+}
+
+static void check_incorrect_index(void)
+{
+	check_out_of_bounds_index();
+	check_mid_insn_index();
+}
+
+/*
+ * Load a program with two patches (get jiffies, for simplicity). Add an
+ * insn_set map pointing to every instruction. Check how it was relocated
+ * after the program load.
+ */
+static void check_relocate_simple(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int prog_fd = -1, map_fd;
+	__u32 map_in[] = {0, 1, 2, 3, 4, 5};
+	__u32 map_out[] = {0, 1, 4, 5, 8, 9};
+	struct bpf_insn_set_value val = {};
+	int i;
+
+	map_fd = map_create(BPF_MAP_TYPE_INSN_SET, ARRAY_SIZE(insns));
+	if (!ASSERT_GE(map_fd, 0, "map_create"))
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(insns); i++) {
+		val.xlated_off = map_in[i];
+		if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0,
+			       "bpf_map_update_elem"))
+			goto cleanup;
+	}
+
+	if (!ASSERT_EQ(bpf_map_freeze(map_fd), 0, "bpf_map_freeze"))
+		goto cleanup;
+
+	prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+	if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+		goto cleanup;
+
+	for (i = 0; i < ARRAY_SIZE(insns); i++) {
+		if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+			goto cleanup;
+
+		ASSERT_EQ(val.xlated_off, map_out[i], "val should be equal map_out[i]");
+	}
+
+cleanup:
+	close(prog_fd);
+	close(map_fd);
+}
+
+/*
+ * Verifier can delete code in two cases: nops & dead code. From the relocation
+ * point of view, the two cases look the same, so test using the simplest
+ * method: by loading some nops
+ */
+static void check_relocate_deletions(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int prog_fd = -1, map_fd;
+	__u32 map_in[] = {0, 1, 2, 3, 4, 5};
+	__u32 map_out[] = {0, -1, 1, -1, 2, 3};
+	struct bpf_insn_set_value val = {};
+	int i;
+
+	map_fd = map_create(BPF_MAP_TYPE_INSN_SET, ARRAY_SIZE(insns));
+	if (!ASSERT_GE(map_fd, 0, "map_create"))
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(insns); i++) {
+		val.xlated_off = map_in[i];
+		if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0,
+			       "bpf_map_update_elem"))
+			goto cleanup;
+	}
+
+	if (!ASSERT_EQ(bpf_map_freeze(map_fd), 0, "bpf_map_freeze"))
+		goto cleanup;
+
+	prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+	if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+		goto cleanup;
+
+	for (i = 0; i < ARRAY_SIZE(insns); i++) {
+		if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+			goto cleanup;
+
+		ASSERT_EQ(val.xlated_off, map_out[i], "val should be equal map_out[i]");
+	}
+
+cleanup:
+	close(prog_fd);
+	close(map_fd);
+}
+
+static void check_relocate_with_functions(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+		BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+		BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+		BPF_JMP_IMM(BPF_JA, 0, 0, 0), /* nop */
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+	};
+	int prog_fd = -1, map_fd;
+	__u32 map_in[] =  { 0, 1,  2, 3, 4, 5, /* func */  6, 7,  8, 9, 10};
+	__u32 map_out[] = {-1, 0, -1, 3, 4, 5, /* func */ -1, 6, -1, 9, 10};
+	struct bpf_insn_set_value val = {};
+	int i;
+
+	map_fd = map_create(BPF_MAP_TYPE_INSN_SET, ARRAY_SIZE(insns));
+	if (!ASSERT_GE(map_fd, 0, "map_create"))
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(insns); i++) {
+		val.xlated_off = map_in[i];
+		if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0,
+			       "bpf_map_update_elem"))
+			goto cleanup;
+	}
+
+	if (!ASSERT_EQ(bpf_map_freeze(map_fd), 0, "bpf_map_freeze"))
+		goto cleanup;
+
+	prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+	if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+		goto cleanup;
+
+	for (i = 0; i < ARRAY_SIZE(insns); i++) {
+		if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+			goto cleanup;
+
+		ASSERT_EQ(val.xlated_off, map_out[i], "val should be equal map_out[i]");
+	}
+
+cleanup:
+	close(prog_fd);
+	close(map_fd);
+}
+
+/* Once map was initialized, it should be frozen */
+static void check_load_unfrozen_map(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int prog_fd = -1, map_fd;
+	struct bpf_insn_set_value val = {};
+	int i;
+
+	map_fd = map_create(BPF_MAP_TYPE_INSN_SET, ARRAY_SIZE(insns));
+	if (!ASSERT_GE(map_fd, 0, "map_create"))
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(insns); i++) {
+		val.xlated_off = i;
+		if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0, "bpf_map_update_elem"))
+			goto cleanup;
+	}
+
+	errno = 0;
+	prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+	if (!ASSERT_EQ(prog_fd, -EINVAL, "program should have been rejected (prog_fd != -EINVAL)"))
+		goto cleanup;
+
+	/* correctness: now freeze the map, the program should load fine */
+
+	if (!ASSERT_EQ(bpf_map_freeze(map_fd), 0, "bpf_map_freeze"))
+		goto cleanup;
+
+	prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+	if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+		goto cleanup;
+
+	for (i = 0; i < ARRAY_SIZE(insns); i++) {
+		if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+			goto cleanup;
+
+		ASSERT_EQ(val.xlated_off, i, "val should be equal i");
+	}
+
+cleanup:
+	close(prog_fd);
+	close(map_fd);
+}
+
+/* Map can be used only by one BPF program */
+static void check_no_map_reuse(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int prog_fd = -1, map_fd, extra_fd = -1;
+	struct bpf_insn_set_value val = {};
+	int i;
+
+	map_fd = map_create(BPF_MAP_TYPE_INSN_SET, ARRAY_SIZE(insns));
+	if (!ASSERT_GE(map_fd, 0, "map_create"))
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(insns); i++) {
+		val.xlated_off = i;
+		if (!ASSERT_EQ(bpf_map_update_elem(map_fd, &i, &val, 0), 0, "bpf_map_update_elem"))
+			goto cleanup;
+	}
+
+	if (!ASSERT_EQ(bpf_map_freeze(map_fd), 0, "bpf_map_freeze"))
+		goto cleanup;
+
+	prog_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+	if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+		goto cleanup;
+
+	for (i = 0; i < ARRAY_SIZE(insns); i++) {
+		if (!ASSERT_EQ(bpf_map_lookup_elem(map_fd, &i, &val), 0, "bpf_map_lookup_elem"))
+			goto cleanup;
+
+		ASSERT_EQ(val.xlated_off, i, "val should be equal i");
+	}
+
+	errno = 0;
+	extra_fd = prog_load(insns, ARRAY_SIZE(insns), &map_fd, 1);
+	if (!ASSERT_EQ(extra_fd, -EBUSY, "program should have been rejected (extra_fd != -EBUSY)"))
+		goto cleanup;
+
+	/* correctness: check that prog is still loadable without fd_array */
+	extra_fd = prog_load(insns, ARRAY_SIZE(insns), NULL, 0);
+	if (!ASSERT_GE(extra_fd, 0, "bpf(BPF_PROG_LOAD): expected no error"))
+		goto cleanup;
+
+cleanup:
+	close(extra_fd);
+	close(prog_fd);
+	close(map_fd);
+}
+
+static void check_bpf_no_lookup(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_LD_MAP_FD(BPF_REG_1, 0),
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+		BPF_EXIT_INSN(),
+	};
+	int prog_fd = -1, map_fd;
+
+	map_fd = map_create(BPF_MAP_TYPE_INSN_SET, 1);
+	if (!ASSERT_GE(map_fd, 0, "map_create"))
+		return;
+
+	/* otherwise will be rejected as unfrozen */
+	if (!ASSERT_EQ(bpf_map_freeze(map_fd), 0, "bpf_map_freeze"))
+		goto cleanup;
+
+	insns[0].imm = map_fd;
+
+	errno = 0;
+	prog_fd = prog_load(insns, ARRAY_SIZE(insns), NULL, 0);
+	if (!ASSERT_EQ(prog_fd, -EINVAL, "program should have been rejected (prog_fd != -EINVAL)"))
+		goto cleanup;
+
+	/* correctness: check that prog is still loadable with normal map */
+	close(map_fd);
+	map_fd = map_create(BPF_MAP_TYPE_ARRAY, 1);
+	insns[0].imm = map_fd;
+	prog_fd = prog_load(insns, ARRAY_SIZE(insns), NULL, 0);
+	if (!ASSERT_GE(prog_fd, 0, "bpf(BPF_PROG_LOAD)"))
+		goto cleanup;
+
+cleanup:
+	close(prog_fd);
+	close(map_fd);
+}
+
+static void check_bpf_side(void)
+{
+	check_bpf_no_lookup();
+}
+
+/* Test how relocations work */
+void test_bpf_insn_set_reloc(void)
+{
+	if (test__start_subtest("one2one"))
+		check_one_to_one_mapping();
+
+	if (test__start_subtest("relocate-simple"))
+		check_relocate_simple();
+
+	if (test__start_subtest("relocate-deletions"))
+		check_relocate_deletions();
+
+	if (test__start_subtest("relocate-multiple-functions"))
+		check_relocate_with_functions();
+}
+
+/* Check all kinds of operations and related restrictions */
+void test_bpf_insn_set_ops(void)
+{
+	if (test__start_subtest("incorrect-index"))
+		check_incorrect_index();
+
+	if (test__start_subtest("load-unfrozen-map"))
+		check_load_unfrozen_map();
+
+	if (test__start_subtest("no-map-reuse"))
+		check_no_map_reuse();
+
+	if (test__start_subtest("bpf-side-ops"))
+		check_bpf_side();
+}
-- 
2.34.1



* [RFC bpf-next 4/9] bpf, x86: allow indirect jumps to r8...r15
  2025-06-15  8:59 [RFC bpf-next 0/9] BPF indirect jumps Anton Protopopov
                   ` (2 preceding siblings ...)
  2025-06-15  8:59 ` [RFC bpf-next 3/9] selftests/bpf: add selftests for new insn_set map Anton Protopopov
@ 2025-06-15  8:59 ` Anton Protopopov
  2025-06-17 19:41   ` Alexei Starovoitov
  2025-06-15  8:59 ` [RFC bpf-next 5/9] bpf, x86: add support for indirect jumps Anton Protopopov
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 63+ messages in thread
From: Anton Protopopov @ 2025-06-15  8:59 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
  Cc: Anton Protopopov

Currently, the emit_indirect_jump() function only accepts one of the
RAX, RCX, ..., RBP registers as the destination. Prepare it to accept
R8, R9, ..., R15 as well. This is necessary to enable indirect jump
support in eBPF.
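
For reference, the only difference in the emitted bytes for the
extended registers is a REX.B prefix (and a different index into the
retpoline thunk arrays): "jmp *%rax" is encoded as FF E0, while
"jmp *%r8" is 41 FF E0. A minimal standalone sketch of that encoding
(assumptions: 'reg' holds the low three bits of the register number,
as produced by reg2hex[], and no mitigation thunk is in use):

    static void emit_jmp_reg(u8 **pprog, int reg, bool ereg)
    {
    	u8 *prog = *pprog;

    	if (ereg)
    		*prog++ = 0x41;		/* REX.B: select r8..r15 */
    	*prog++ = 0xFF;			/* opcode: jmp r/m64 (/4) */
    	*prog++ = 0xE0 + reg;		/* ModRM: mod=11, reg=/4, rm=reg */

    	*pprog = prog;
    }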

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 923c38f212dc..37dc83d91832 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -659,7 +659,19 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 
 #define EMIT_LFENCE()	EMIT3(0x0F, 0xAE, 0xE8)
 
-static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
+static void __emit_indirect_jump(u8 **pprog, int reg, bool ereg)
+{
+	u8 *prog = *pprog;
+
+	if (ereg)
+		EMIT1(0x41);
+
+	EMIT2(0xFF, 0xE0 + reg);
+
+	*pprog = prog;
+}
+
+static void emit_indirect_jump(u8 **pprog, int reg, bool ereg, u8 *ip)
 {
 	u8 *prog = *pprog;
 
@@ -668,15 +680,15 @@ static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
 		emit_jump(&prog, its_static_thunk(reg), ip);
 	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_LFENCE)) {
 		EMIT_LFENCE();
-		EMIT2(0xFF, 0xE0 + reg);
+		__emit_indirect_jump(&prog, reg, ereg);
 	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
 		OPTIMIZER_HIDE_VAR(reg);
 		if (cpu_feature_enabled(X86_FEATURE_CALL_DEPTH))
-			emit_jump(&prog, &__x86_indirect_jump_thunk_array[reg], ip);
+			emit_jump(&prog, &__x86_indirect_jump_thunk_array[reg + 8*ereg], ip);
 		else
-			emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
+			emit_jump(&prog, &__x86_indirect_thunk_array[reg + 8*ereg], ip);
 	} else {
-		EMIT2(0xFF, 0xE0 + reg);	/* jmp *%\reg */
+		__emit_indirect_jump(&prog, reg, ereg);
 		if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) || IS_ENABLED(CONFIG_MITIGATION_SLS))
 			EMIT1(0xCC);		/* int3 */
 	}
@@ -796,7 +808,7 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
 	 * rdi == ctx (1st arg)
 	 * rcx == prog->bpf_func + X86_TAIL_CALL_OFFSET
 	 */
-	emit_indirect_jump(&prog, 1 /* rcx */, ip + (prog - start));
+	emit_indirect_jump(&prog, 1 /* rcx */, false, ip + (prog - start));
 
 	/* out: */
 	ctx->tail_call_indirect_label = prog - start;
@@ -3445,7 +3457,7 @@ static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs, u8 *image,
 		if (err)
 			return err;
 
-		emit_indirect_jump(&prog, 2 /* rdx */, image + (prog - buf));
+		emit_indirect_jump(&prog, 2 /* rdx */, false, image + (prog - buf));
 
 		*pprog = prog;
 		return 0;
-- 
2.34.1



* [RFC bpf-next 5/9] bpf, x86: add support for indirect jumps
  2025-06-15  8:59 [RFC bpf-next 0/9] BPF indirect jumps Anton Protopopov
                   ` (3 preceding siblings ...)
  2025-06-15  8:59 ` [RFC bpf-next 4/9] bpf, x86: allow indirect jumps to r8...r15 Anton Protopopov
@ 2025-06-15  8:59 ` Anton Protopopov
  2025-06-18  3:06   ` Alexei Starovoitov
  2025-06-18 11:03   ` Eduard Zingerman
  2025-06-15  8:59 ` [RFC bpf-next 6/9] bpf: workaround llvm behaviour with " Anton Protopopov
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 63+ messages in thread
From: Anton Protopopov @ 2025-06-15  8:59 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
  Cc: Anton Protopopov

Add support for a new instruction

    BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=fd(M)

which does an indirect jump to a location stored in Rx. The map M
is an instruction set map containing all possible targets for this
particular jump.

At the time of the jump the register Rx must have the type
PTR_TO_INSN. This new type assures that the Rx register contains a
value (or a range of values) loaded from the map M. Typically, the
code will look like this (e.g., this could be a switch statement
compiled with LLVM):

    0:   r3 = r1                    # "switch (r3)"
    1:   if r3 > 0x13 goto +0x666   # check r3 boundaries
    2:   r3 <<= 0x3                 # r3 is void*, point to an address
    3:   r1 = 0xbeef ll             # r1 is PTR_TO_MAP_VALUE, r1->map_ptr=M
    5:   r1 += r3                   # r1 inherits boundaries from r3
    6:   r1 = *(u64 *)(r1 + 0x0)    # r1 now has type PTR_TO_INSN
    7:   gotox r1[,imm=fd(M)]       # verifier checks that M == r1->map_ptr

When building the jump graph, and during the static analysis, a new
function of the INSN_SET map is used:
bpf_insn_set_iter_xlated_offset(map, n). It allows iterating over the
unique slots of an instruction set (equal items can be generated,
e.g., for a sparse jump table of a switch, where not all possible
branches are taken).

Instruction (3) above loads the address of the first element of the
map. From the BPF point of view, the map is a jump table in the
native architecture, i.e., an array of jump targets. This patch
allows grabbing such an address and later adjusting an offset, like
in instruction (5). A value of such a type can be dereferenced once
to create a PTR_TO_INSN, see instruction (6).

When building the CFG, the high 16 bits of the insn_state are used,
so this patch (theoretically) supports jump tables of up to 2^16
slots.
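
For context, below is a sketch of the kind of C source which the
modified LLVM referenced in the cover letter is expected to lower into
the pattern above (the helper choice and program type are arbitrary,
and whether a jump table is actually emitted depends on LLVM
heuristics; the "r3 > 0x13" check in the listing above corresponds to
a 20-case switch):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int dispatch(struct xdp_md *ctx)
    {
    	__u64 v;

    	/* a dense switch may become a bounds check plus a gotox through
    	 * a jump table stored in a BPF_MAP_TYPE_INSN_SET map
    	 */
    	switch (ctx->ingress_ifindex & 0xf) {
    	case 0:  v = bpf_ktime_get_ns(); break;
    	case 1:  v = bpf_get_prandom_u32(); break;
    	case 2:  v = bpf_get_smp_processor_id(); break;
    	case 3:  v = bpf_jiffies64(); break;
    	default: v = 0;
    	}

    	return v & 1 ? XDP_DROP : XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";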

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c  |   7 ++
 include/linux/bpf.h          |   2 +
 include/linux/bpf_verifier.h |   4 +
 kernel/bpf/bpf_insn_set.c    |  71 ++++++++++++-
 kernel/bpf/core.c            |   2 +
 kernel/bpf/verifier.c        | 198 ++++++++++++++++++++++++++++++++++-
 6 files changed, 278 insertions(+), 6 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 37dc83d91832..d20f6775605d 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -2520,6 +2520,13 @@ st:			if (is_imm8(insn->off))
 
 			break;
 
+		case BPF_JMP | BPF_JA | BPF_X:
+		case BPF_JMP32 | BPF_JA | BPF_X:
+			emit_indirect_jump(&prog,
+					   reg2hex[insn->dst_reg],
+					   is_ereg(insn->dst_reg),
+					   image + addrs[i - 1]);
+			break;
 		case BPF_JMP | BPF_JA:
 		case BPF_JMP32 | BPF_JA:
 			if (BPF_CLASS(insn->code) == BPF_JMP) {
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 008bcd44c60e..3c5eaea2b476 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -952,6 +952,7 @@ enum bpf_reg_type {
 	PTR_TO_ARENA,
 	PTR_TO_BUF,		 /* reg points to a read/write buffer */
 	PTR_TO_FUNC,		 /* reg points to a bpf program function */
+	PTR_TO_INSN,		 /* reg points to a bpf program instruction */
 	CONST_PTR_TO_DYNPTR,	 /* reg points to a const struct bpf_dynptr */
 	__BPF_REG_TYPE_MAX,
 
@@ -3601,6 +3602,7 @@ int bpf_insn_set_ready(struct bpf_map *map);
 void bpf_insn_set_release(struct bpf_map *map);
 void bpf_insn_set_adjust(struct bpf_map *map, u32 off, u32 len);
 void bpf_insn_set_adjust_after_remove(struct bpf_map *map, u32 off, u32 len);
+int bpf_insn_set_iter_xlated_offset(struct bpf_map *map, u32 iter_no);
 
 struct bpf_insn_ptr {
 	void *jitted_ip;
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 84b5e6b25c52..80d9afcca488 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -229,6 +229,10 @@ struct bpf_reg_state {
 	enum bpf_reg_liveness live;
 	/* if (!precise && SCALAR_VALUE) min/max/tnum don't affect safety */
 	bool precise;
+
+	/* Used to track boundaries of a PTR_TO_INSN */
+	u32 min_index;
+	u32 max_index;
 };
 
 enum bpf_stack_slot_type {
diff --git a/kernel/bpf/bpf_insn_set.c b/kernel/bpf/bpf_insn_set.c
index c20e99327118..316cecad60a9 100644
--- a/kernel/bpf/bpf_insn_set.c
+++ b/kernel/bpf/bpf_insn_set.c
@@ -9,6 +9,8 @@ struct bpf_insn_set {
 	struct bpf_map map;
 	struct mutex state_mutex;
 	int state;
+	u32 **unique_offsets;
+	u32 unique_offsets_cnt;
 	long *ips;
 	DECLARE_FLEX_ARRAY(struct bpf_insn_ptr, ptrs);
 };
@@ -50,6 +52,7 @@ static void insn_set_free(struct bpf_map *map)
 {
 	struct bpf_insn_set *insn_set = cast_insn_set(map);
 
+	kfree(insn_set->unique_offsets);
 	kfree(insn_set->ips);
 	bpf_map_area_free(insn_set);
 }
@@ -69,6 +72,12 @@ static struct bpf_map *insn_set_alloc(union bpf_attr *attr)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	insn_set->unique_offsets = kzalloc(sizeof(long) * attr->max_entries, GFP_KERNEL);
+	if (!insn_set->unique_offsets) {
+		insn_set_free(&insn_set->map);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	bpf_map_init_from_attr(&insn_set->map, attr);
 
 	mutex_init(&insn_set->state_mutex);
@@ -165,10 +174,25 @@ static u64 insn_set_mem_usage(const struct bpf_map *map)
 	u64 extra_size = 0;
 
 	extra_size += sizeof(long) * map->max_entries; /* insn_set->ips */
+	extra_size += 4 * map->max_entries; /* insn_set->unique_offsets */
 
 	return insn_set_alloc_size(map->max_entries) + extra_size;
 }
 
+static int insn_set_map_direct_value_addr(const struct bpf_map *map, u64 *imm, u32 off)
+{
+	struct bpf_insn_set *insn_set = cast_insn_set(map);
+
+	/* for now, just reject all such loads */
+	if (off > 0)
+		return -EINVAL;
+
+	/* from BPF's point of view, this map is a jump table */
+	*imm = (unsigned long)insn_set->ips;
+
+	return 0;
+}
+
 BTF_ID_LIST_SINGLE(insn_set_btf_ids, struct, bpf_insn_set)
 
 const struct bpf_map_ops insn_set_map_ops = {
@@ -181,6 +205,7 @@ const struct bpf_map_ops insn_set_map_ops = {
 	.map_delete_elem = insn_set_delete_elem,
 	.map_check_btf = insn_set_check_btf,
 	.map_mem_usage = insn_set_mem_usage,
+	.map_direct_value_addr = insn_set_map_direct_value_addr,
 	.map_btf_id = &insn_set_btf_ids[0],
 };
 
@@ -217,6 +242,37 @@ static inline bool valid_offsets(const struct bpf_insn_set *insn_set,
 	return true;
 }
 
+static int cmp_unique_offsets(const void *a, const void *b)
+{
+	return *(u32 *)a - *(u32 *)b;
+}
+
+static int bpf_insn_set_init_unique_offsets(struct bpf_insn_set *insn_set)
+{
+	u32 cnt = insn_set->map.max_entries, ucnt = 1;
+	u32 **off = insn_set->unique_offsets;
+	int i;
+
+	/* [0,3,2,4,6,5,5,5,1,1,0,0] */
+	for (i = 0; i < cnt; i++)
+		off[i] = &insn_set->ptrs[i].user_value.xlated_off;
+
+	/* [0,0,0,1,1,2,3,4,5,5,5,6] */
+	sort(off, cnt, sizeof(off[0]), cmp_unique_offsets, NULL);
+
+	/*
+	 * [0,1,2,3,4,5,6,x,x,x,x,x]
+	 *  \.........../
+	 *    unique_offsets_cnt
+	 */
+	for (i = 1; i < cnt; i++)
+		if (*off[i] != *off[ucnt-1])
+			off[ucnt++] = off[i];
+
+	insn_set->unique_offsets_cnt = ucnt;
+	return 0;
+}
+
 int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog)
 {
 	struct bpf_insn_set *insn_set = cast_insn_set(map);
@@ -247,7 +303,10 @@ int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog)
 	for (i = 0; i < map->max_entries; i++)
 		insn_set->ptrs[i].user_value.xlated_off = insn_set->ptrs[i].orig_xlated_off;
 
-	return 0;
+	/*
+	 * Prepare a set of unique offsets
+	 */
+	return bpf_insn_set_init_unique_offsets(insn_set);
 }
 
 int bpf_insn_set_ready(struct bpf_map *map)
@@ -336,3 +395,13 @@ void bpf_prog_update_insn_ptr(struct bpf_prog *prog,
 		}
 	}
 }
+
+int bpf_insn_set_iter_xlated_offset(struct bpf_map *map, u32 iter_no)
+{
+	struct bpf_insn_set *insn_set = cast_insn_set(map);
+
+	if (iter_no >= insn_set->unique_offsets_cnt)
+		return -ENOENT;
+
+	return *insn_set->unique_offsets[iter_no];
+}
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index e536a34a32c8..058f5f463b74 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1706,6 +1706,8 @@ bool bpf_opcode_in_insntable(u8 code)
 		[BPF_LD | BPF_IND | BPF_B] = true,
 		[BPF_LD | BPF_IND | BPF_H] = true,
 		[BPF_LD | BPF_IND | BPF_W] = true,
+		[BPF_JMP | BPF_JA | BPF_X] = true,
+		[BPF_JMP32 | BPF_JA | BPF_X] = true,
 		[BPF_JMP | BPF_JCOND] = true,
 	};
 #undef BPF_INSN_3_TBL
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8ac9a0b5af53..fba553f844f1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -206,6 +206,7 @@ static int ref_set_non_owning(struct bpf_verifier_env *env,
 static void specialize_kfunc(struct bpf_verifier_env *env,
 			     u32 func_id, u16 offset, unsigned long *addr);
 static bool is_trusted_reg(const struct bpf_reg_state *reg);
+static int add_used_map(struct bpf_verifier_env *env, int fd, struct bpf_map **map_ptr);
 
 static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
 {
@@ -5648,6 +5649,19 @@ static int check_map_access_type(struct bpf_verifier_env *env, u32 regno,
 	return 0;
 }
 
+static int check_insn_set_mem_access(struct bpf_verifier_env *env,
+				     const struct bpf_map *map,
+				     int off, int size, u32 mem_size)
+{
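+	/* only allow aligned, full 8-byte loads from within the ips array */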
+	if ((off < 0) || (off % sizeof(long)) || (off/sizeof(long) >= map->max_entries))
+		return -EACCES;
+
+	if (mem_size != 8 || size != 8)
+		return -EACCES;
+
+	return 0;
+}
+
 /* check read/write into memory region (e.g., map value, ringbuf sample, etc) */
 static int __check_mem_access(struct bpf_verifier_env *env, int regno,
 			      int off, int size, u32 mem_size,
@@ -5666,6 +5680,10 @@ static int __check_mem_access(struct bpf_verifier_env *env, int regno,
 			mem_size, off, size);
 		break;
 	case PTR_TO_MAP_VALUE:
+		if (reg->map_ptr->map_type == BPF_MAP_TYPE_INSN_SET &&
+		    check_insn_set_mem_access(env, reg->map_ptr, off, size, mem_size) == 0)
+			return 0;
+
 		verbose(env, "invalid access to map value, value_size=%d off=%d size=%d\n",
 			mem_size, off, size);
 		break;
@@ -7713,12 +7731,18 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type,
 			     bool allow_trust_mismatch);
 
+static bool map_is_insn_set(struct bpf_map *map)
+{
+	return map && map->map_type == BPF_MAP_TYPE_INSN_SET;
+}
+
 static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			  bool strict_alignment_once, bool is_ldsx,
 			  bool allow_trust_mismatch, const char *ctx)
 {
 	struct bpf_reg_state *regs = cur_regs(env);
 	enum bpf_reg_type src_reg_type;
+	struct bpf_map *map_ptr_copy = NULL;
 	int err;
 
 	/* check src operand */
@@ -7733,6 +7757,9 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
 
 	src_reg_type = regs[insn->src_reg].type;
 
+	if (src_reg_type == PTR_TO_MAP_VALUE && map_is_insn_set(regs[insn->src_reg].map_ptr))
+		map_ptr_copy = regs[insn->src_reg].map_ptr;
+
 	/* Check if (src_reg + off) is readable. The state of dst_reg will be
 	 * updated by this call.
 	 */
@@ -7743,6 +7770,13 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
 				       allow_trust_mismatch);
 	err = err ?: reg_bounds_sanity_check(env, &regs[insn->dst_reg], ctx);
 
+	if (map_ptr_copy) {
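+		/* the value loaded from an instruction set map is a pointer
+		 * to an instruction: mark it as PTR_TO_INSN and carry over
+		 * the index range computed for the map value pointer
+		 */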
+		regs[insn->dst_reg].type = PTR_TO_INSN;
+		regs[insn->dst_reg].map_ptr = map_ptr_copy;
+		regs[insn->dst_reg].min_index = regs[insn->src_reg].min_index;
+		regs[insn->dst_reg].max_index = regs[insn->src_reg].max_index;
+	}
+
 	return err;
 }
 
@@ -15296,6 +15330,22 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
 		return 0;
 	}
 
+	if (dst_reg->type == PTR_TO_MAP_VALUE && map_is_insn_set(dst_reg->map_ptr)) {
+		if (opcode != BPF_ADD) {
+			verbose(env, "Operation %s on ptr to instruction set map is prohibited\n",
+				bpf_alu_string[opcode >> 4]);
+			return -EACCES;
+		}
+		src_reg = &regs[insn->src_reg];
+		if (src_reg->type != SCALAR_VALUE) {
+			verbose(env, "Adding non-scalar R%d to an instruction ptr is prohibited\n",
+				insn->src_reg);
+			return -EACCES;
+		}
+		dst_reg->min_index = src_reg->umin_value / sizeof(long);
+		dst_reg->max_index = src_reg->umax_value / sizeof(long);
+	}
+
 	if (dst_reg->type != SCALAR_VALUE)
 		ptr_reg = dst_reg;
 
@@ -16797,6 +16847,11 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 			__mark_reg_unknown(env, dst_reg);
 			return 0;
 		}
+		if (map->map_type == BPF_MAP_TYPE_INSN_SET) {
+			dst_reg->type = PTR_TO_MAP_VALUE;
+			dst_reg->off = aux->map_off;
+			return 0;
+		}
 		dst_reg->type = PTR_TO_MAP_VALUE;
 		dst_reg->off = aux->map_off;
 		WARN_ON_ONCE(map->max_entries != 1);
@@ -17552,6 +17607,62 @@ static int mark_fastcall_patterns(struct bpf_verifier_env *env)
 	return 0;
 }
 
+#define SET_HIGH(STATE, LAST)	STATE = (STATE & 0xffffU) | ((LAST) << 16)
+#define GET_HIGH(STATE)		((u16)((STATE) >> 16))
+
+static int gotox_sanity_check(struct bpf_verifier_env *env, int from, int to)
+{
+	/* TBD: check that to belongs to the same BPF function && whatever else */
+
+	return 0;
+}
+
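+/*
+ * The upper 16 bits of insn_state[t] remember how many edges of this
+ * gotox have already been pushed, so that each call explores the next
+ * target provided by the instruction set map.
+ */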
+static int push_goto_x_edge(int t, struct bpf_verifier_env *env, struct bpf_map *map)
+{
+	int *insn_stack = env->cfg.insn_stack;
+	int *insn_state = env->cfg.insn_state;
+	u16 prev_edge = GET_HIGH(insn_state[t]);
+	int err;
+	int w;
+
+	w = bpf_insn_set_iter_xlated_offset(map, prev_edge);
+	if (w == -ENOENT)
+		return DONE_EXPLORING;
+	else if (w < 0)
+		return w;
+
+	err = gotox_sanity_check(env, t, w);
+	if (err)
+		return err;
+
+	mark_prune_point(env, t);
+
+	if (env->cfg.cur_stack >= env->prog->len)
+		return -E2BIG;
+	insn_stack[env->cfg.cur_stack++] = w;
+
+	mark_jmp_point(env, w);
+
+	SET_HIGH(insn_state[t], prev_edge + 1);
+	return KEEP_EXPLORING;
+}
+
+/* "conditional jump with N edges" */
+static int visit_goto_x_insn(int t, struct bpf_verifier_env *env, int fd)
+{
+	struct bpf_map *map;
+	int ret;
+
+	ret = add_used_map(env, fd, &map);
+	if (ret < 0)
+		return ret;
+
+	if (map->map_type != BPF_MAP_TYPE_INSN_SET)
+		return -EINVAL;
+
+	return push_goto_x_edge(t, env, map);
+}
+
 /* Visits the instruction at index t and returns one of the following:
  *  < 0 - an error occurred
  *  DONE_EXPLORING - the instruction was fully explored
@@ -17642,8 +17753,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
 		return visit_func_call_insn(t, insns, env, insn->src_reg == BPF_PSEUDO_CALL);
 
 	case BPF_JA:
-		if (BPF_SRC(insn->code) != BPF_K)
-			return -EINVAL;
+		if (BPF_SRC(insn->code) == BPF_X)
+			return visit_goto_x_insn(t, env, insn->imm);
 
 		if (BPF_CLASS(insn->code) == BPF_JMP)
 			off = insn->off;
@@ -17674,6 +17785,13 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
 	}
 }
 
+static bool insn_is_gotox(struct bpf_insn *insn)
+{
+	return BPF_CLASS(insn->code) == BPF_JMP &&
+	       BPF_OP(insn->code) == BPF_JA &&
+	       BPF_SRC(insn->code) == BPF_X;
+}
+
 /* non-recursive depth-first-search to detect loops in BPF program
  * loop == back-edge in directed graph
  */
@@ -18786,11 +18904,22 @@ static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_stat
 			      struct bpf_func_state *cur, u32 insn_idx, enum exact_level exact)
 {
 	u16 live_regs = env->insn_aux_data[insn_idx].live_regs_before;
+	struct bpf_insn *insn;
 	u16 i;
 
 	if (old->callback_depth > cur->callback_depth)
 		return false;
 
+	insn = &env->prog->insnsi[insn_idx];
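+	/* for an indirect jump, the range of feasible targets is part of
+	 * the state: states with different index ranges are not equivalent
+	 */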
+	if (insn_is_gotox(insn)) {
+		struct bpf_reg_state *old_dst = &old->regs[insn->dst_reg];
+		struct bpf_reg_state *cur_dst = &cur->regs[insn->dst_reg];
+
+		if (old_dst->min_index != cur_dst->min_index ||
+		    old_dst->max_index != cur_dst->max_index)
+			return false;
+	}
+
 	for (i = 0; i < MAX_BPF_REG; i++)
 		if (((1 << i) & live_regs) &&
 		    !regsafe(env, &old->regs[i], &cur->regs[i],
@@ -19654,6 +19783,55 @@ static int process_bpf_exit_full(struct bpf_verifier_env *env,
 	return PROCESS_BPF_EXIT;
 }
 
+static int check_indirect_jump(struct bpf_verifier_env *env, struct bpf_insn *insn)
+{
+	struct bpf_verifier_state *other_branch;
+	struct bpf_reg_state *dst_reg;
+	struct bpf_map *map;
+	int xoff;
+	int err;
+	u32 i;
+
+	/* this map should already have been added */
+	err = add_used_map(env, insn->imm, &map);
+	if (err < 0)
+		return err;
+
+	dst_reg = reg_state(env, insn->dst_reg);
+	if (dst_reg->type != PTR_TO_INSN) {
+		verbose(env, "BPF_JA|BPF_X R%d has type %d, expected PTR_TO_INSN\n",
+				insn->dst_reg, dst_reg->type);
+		return -EINVAL;
+	}
+
+	if (dst_reg->map_ptr != map) {
+		verbose(env, "BPF_JA|BPF_X R%d was loaded from map id=%u, expected id=%u\n",
+				insn->dst_reg, dst_reg->map_ptr->id, map->id);
+		return -EINVAL;
+	}
+
+	if (dst_reg->max_index >= map->max_entries)
+		return -EINVAL;
+
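+	/* schedule every feasible target for verification; the first one
+	 * (min_index) is continued in place below
+	 */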
+	for (i = dst_reg->min_index + 1; i <= dst_reg->max_index; i++) {
+		xoff = bpf_insn_set_iter_xlated_offset(map, i);
+		if (xoff == -ENOENT)
+			break;
+		if (xoff < 0)
+			return xoff;
+
+		other_branch = push_stack(env, xoff, env->insn_idx, false);
+		if (!other_branch)
+			return -EFAULT;
+	}
+
+	xoff = bpf_insn_set_iter_xlated_offset(map, dst_reg->min_index);
+	if (xoff < 0)
+		return xoff;
+	env->insn_idx = xoff;
+
+	return 0;
+}
+
 static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
 {
 	int err;
@@ -19756,6 +19934,9 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
 
 			mark_reg_scratched(env, BPF_REG_0);
 		} else if (opcode == BPF_JA) {
+			if (BPF_SRC(insn->code) == BPF_X)
+				return check_indirect_jump(env, insn);
+
 			if (BPF_SRC(insn->code) != BPF_K ||
 			    insn->src_reg != BPF_REG_0 ||
 			    insn->dst_reg != BPF_REG_0 ||
@@ -20243,6 +20424,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 		case BPF_MAP_TYPE_QUEUE:
 		case BPF_MAP_TYPE_STACK:
 		case BPF_MAP_TYPE_ARENA:
+		case BPF_MAP_TYPE_INSN_SET:
 			break;
 		default:
 			verbose(env,
@@ -20330,10 +20512,11 @@ static int __add_used_map(struct bpf_verifier_env *env, struct bpf_map *map)
  * its index.
  * Returns <0 on error, or >= 0 index, on success.
  */
-static int add_used_map(struct bpf_verifier_env *env, int fd)
+static int add_used_map(struct bpf_verifier_env *env, int fd, struct bpf_map **map_ptr)
 {
 	struct bpf_map *map;
 	CLASS(fd, f)(fd);
+	int ret;
 
 	map = __bpf_map_get(f);
 	if (IS_ERR(map)) {
@@ -20341,7 +20524,10 @@ static int add_used_map(struct bpf_verifier_env *env, int fd)
 		return PTR_ERR(map);
 	}
 
-	return __add_used_map(env, map);
+	ret = __add_used_map(env, map);
+	if (ret >= 0 && map_ptr)
+		*map_ptr = map;
+	return ret;
 }
 
 /* find and rewrite pseudo imm in ld_imm64 instructions:
@@ -20435,7 +20621,7 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
 				break;
 			}
 
-			map_idx = add_used_map(env, fd);
+			map_idx = add_used_map(env, fd, NULL);
 			if (map_idx < 0)
 				return map_idx;
 			map = env->used_maps[map_idx];
@@ -21459,6 +21645,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		func[i]->aux->jited_linfo = prog->aux->jited_linfo;
 		func[i]->aux->linfo_idx = env->subprog_info[i].linfo_idx;
 		func[i]->aux->arena = prog->aux->arena;
+		func[i]->aux->used_maps = env->used_maps;
+		func[i]->aux->used_map_cnt = env->used_map_cnt;
 		num_exentries = 0;
 		insn = func[i]->insnsi;
 		for (j = 0; j < func[i]->len; j++, insn++) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC bpf-next 6/9] bpf: workaround llvm behaviour with indirect jumps
  2025-06-15  8:59 [RFC bpf-next 0/9] BPF indirect jumps Anton Protopopov
                   ` (4 preceding siblings ...)
  2025-06-15  8:59 ` [RFC bpf-next 5/9] bpf, x86: add support for indirect jumps Anton Protopopov
@ 2025-06-15  8:59 ` Anton Protopopov
  2025-06-18 11:04   ` Eduard Zingerman
  2025-06-15  8:59 ` [RFC bpf-next 7/9] bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X Anton Protopopov
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 63+ messages in thread
From: Anton Protopopov @ 2025-06-15  8:59 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
  Cc: Anton Protopopov

When indirect jumps are enabled in LLVM, it might generate
unreachable instructions. For example, the following code

    SEC("syscall") int foo(struct simple_ctx *ctx)
    {
            switch (ctx->x) {
            case 0:
                    ret_user = 2;
                    break;
            case 11:
                    ret_user = 3;
                    break;
            case 27:
                    ret_user = 4;
                    break;
            case 31:
                    ret_user = 5;
                    break;
            default:
                    ret_user = 19;
                    break;
            }

            return 0;
    }

compiles into

    <foo>:
    ;       switch (ctx->x) {
         224:       79 11 00 00 00 00 00 00 r1 = *(u64 *)(r1 + 0x0)
         225:       25 01 0f 00 1f 00 00 00 if r1 > 0x1f goto +0xf <foo+0x88>
         226:       67 01 00 00 03 00 00 00 r1 <<= 0x3
         227:       18 02 00 00 a8 00 00 00 00 00 00 00 00 00 00 00 r2 = 0xa8 ll
                    0000000000000718:  R_BPF_64_64  .rodata
         229:       0f 12 00 00 00 00 00 00 r2 += r1
         230:       79 21 00 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x0)
         231:       0d 01 00 00 00 00 00 00 gotox r1
         232:       05 00 08 00 00 00 00 00 goto +0x8 <foo+0x88>
         233:       b7 01 00 00 02 00 00 00 r1 = 0x2
    ;       switch (ctx->x) {
         234:       05 00 07 00 00 00 00 00 goto +0x7 <foo+0x90>
         235:       b7 01 00 00 04 00 00 00 r1 = 0x4
    ;               break;
         236:       05 00 05 00 00 00 00 00 goto +0x5 <foo+0x90>
         237:       b7 01 00 00 03 00 00 00 r1 = 0x3
    ;               break;
         238:       05 00 03 00 00 00 00 00 goto +0x3 <foo+0x90>
         239:       b7 01 00 00 05 00 00 00 r1 = 0x5
    ;               break;
         240:       05 00 01 00 00 00 00 00 goto +0x1 <foo+0x90>
         241:       b7 01 00 00 13 00 00 00 r1 = 0x13
         242:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
                    0000000000000790:  R_BPF_64_64  ret_user
         244:       7b 12 00 00 00 00 00 00 *(u64 *)(r2 + 0x0) = r1
    ;       return 0;
         245:       b4 00 00 00 00 00 00 00 w0 = 0x0
         246:       95 00 00 00 00 00 00 00 exit

The jump table is

    242, 241, 241, 241, 241, 241, 241, 241,
    241, 241, 241, 237, 241, 241, 241, 241,
    241, 241, 241, 241, 241, 241, 241, 241,
    241, 241, 241, 235, 241, 241, 241, 239

The check

    225:       25 01 0f 00 1f 00 00 00 if r1 > 0x1f goto +0xf <foo+0x88>

makes sure that the r1 register is always loaded from the jump table.
This makes the instruction

    232:       05 00 08 00 00 00 00 00 goto +0x8 <foo+0x88>

unreachable.

Patch the verifier to ignore such unreachable JA instructions.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
 kernel/bpf/verifier.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index fba553f844f1..2e4116c71f4b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17792,6 +17792,27 @@ static bool insn_is_gotox(struct bpf_insn *insn)
 	       BPF_SRC(insn->code) == BPF_X;
 }
 
+static bool insn_is_ja(struct bpf_insn *insn)
+{
+	return BPF_CLASS(insn->code) == BPF_JMP &&
+	       BPF_OP(insn->code) == BPF_JA &&
+	       BPF_SRC(insn->code) == BPF_K;
+}
+
+/*
+ * This is a workaround to overcome an LLVM "bug". The problem is that
+ * sometimes LLVM would generate code like
+ *
+ *     gotox rX
+ *     goto +offset
+ *
+ * even though rX never points to the goto +offset instruction.
+ */
+static inline bool magic_dead_ja(struct bpf_insn *insn, bool have_prev)
+{
+	return have_prev && insn_is_gotox(insn - 1) && insn_is_ja(insn);
+}
+
 /* non-recursive depth-first-search to detect loops in BPF program
  * loop == back-edge in directed graph
  */
@@ -17866,6 +17887,9 @@ static int check_cfg(struct bpf_verifier_env *env)
 		struct bpf_insn *insn = &env->prog->insnsi[i];
 
 		if (insn_state[i] != EXPLORED) {
+			if (magic_dead_ja(insn, i > 0))
+				continue;
+
 			verbose(env, "unreachable insn %d\n", i);
 			ret = -EINVAL;
 			goto err_free;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC bpf-next 7/9] bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X
  2025-06-15  8:59 [RFC bpf-next 0/9] BPF indirect jumps Anton Protopopov
                   ` (5 preceding siblings ...)
  2025-06-15  8:59 ` [RFC bpf-next 6/9] bpf: workaround llvm behaviour with " Anton Protopopov
@ 2025-06-15  8:59 ` Anton Protopopov
  2025-06-15  8:59 ` [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps Anton Protopopov
  2025-06-15  8:59 ` [RFC bpf-next 9/9] selftests/bpf: add selftests for " Anton Protopopov
  8 siblings, 0 replies; 63+ messages in thread
From: Anton Protopopov @ 2025-06-15  8:59 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
  Cc: Anton Protopopov

Add support for the indirect jump instruction.

Example output from bpftool:

   0: (79) r3 = *(u64 *)(r1 +0)
   1: (25) if r3 > 0x4 goto pc+666
   2: (67) r3 <<= 3
   3: (18) r1 = 0xffffbeefspameggs
   5: (0f) r1 += r3
   6: (79) r1 = *(u64 *)(r1 +0)
   7: (0d) gotox r1

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
 kernel/bpf/disasm.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
index 20883c6b1546..202b39864de4 100644
--- a/kernel/bpf/disasm.c
+++ b/kernel/bpf/disasm.c
@@ -183,6 +183,13 @@ static inline bool is_mov_percpu_addr(const struct bpf_insn *insn)
 	return insn->code == (BPF_ALU64 | BPF_MOV | BPF_X) && insn->off == BPF_ADDR_PERCPU;
 }
 
+static void print_bpf_ja_indirect(bpf_insn_print_t verbose,
+				  void *private_data,
+				  const struct bpf_insn *insn)
+{
+	verbose(private_data, "(%02x) gotox r%d\n", insn->code, insn->dst_reg);
+}
+
 void print_bpf_insn(const struct bpf_insn_cbs *cbs,
 		    const struct bpf_insn *insn,
 		    bool allow_ptr_leaks)
@@ -358,6 +365,9 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
 		} else if (insn->code == (BPF_JMP | BPF_JA)) {
 			verbose(cbs->private_data, "(%02x) goto pc%+d\n",
 				insn->code, insn->off);
+		} else if (insn->code == (BPF_JMP | BPF_JA | BPF_X) ||
+			   insn->code == (BPF_JMP32 | BPF_JA | BPF_X)) {
+			print_bpf_ja_indirect(verbose, cbs->private_data, insn);
 		} else if (insn->code == (BPF_JMP | BPF_JCOND) &&
 			   insn->src_reg == BPF_MAY_GOTO) {
 			verbose(cbs->private_data, "(%02x) may_goto pc%+d\n",
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-06-15  8:59 [RFC bpf-next 0/9] BPF indirect jumps Anton Protopopov
                   ` (6 preceding siblings ...)
  2025-06-15  8:59 ` [RFC bpf-next 7/9] bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X Anton Protopopov
@ 2025-06-15  8:59 ` Anton Protopopov
  2025-06-18  3:22   ` Alexei Starovoitov
  2025-06-18 19:49   ` Eduard Zingerman
  2025-06-15  8:59 ` [RFC bpf-next 9/9] selftests/bpf: add selftests for " Anton Protopopov
  8 siblings, 2 replies; 63+ messages in thread
From: Anton Protopopov @ 2025-06-15  8:59 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
  Cc: Anton Protopopov

For the v5 instruction set, LLVM is now allowed to generate indirect
jumps for switch statements. Every such jump is accompanied by the
necessary metadata (enabled by -emit-jump-table-sizes-section).
The -bpf-min-jump-table-entries llvm option may be used to control
the minimal size of a jump table which will be converted to an
indirect jump.

For a given switch, LLVM generates the following data: .rodata
contains the actual jump target addresses, while the
.llvm_jump_table_sizes and .rel.llvm_jump_table_sizes sections
provide the metadata necessary to find and relocate the offsets.
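
For reference, libbpf consumes each .llvm_jump_table_sizes entry as a
pair of u64 words; roughly (this struct is only an illustration, the
name is made up and it is not part of any uapi):

    struct llvm_jt_size_entry {
            __u64 rodata_off; /* location of the jump table in .rodata */
            __u64 entry_cnt;  /* number of 8-byte jump targets in it */
    };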

The code generated by LLVM for a switch will look, approximately,
like this:

    0: rX <- &jump_table_x[i]
    2: rX <- *rX
    3: gotox rX

This code will be rejected by the verifier as is. The first
transformation required here is that jump_table_x at insn(0)
actually points to the `.rodata` section (map). For such loads
libbpf should create a proper map of type BPF_MAP_TYPE_INSN_SET,
using the aforementioned metadata.

Then, in insn(2), the address in rX gets dereferenced to point to
an actual instruction address. (From the verifier's point of view,
rX changes type from PTR_TO_MAP_VALUE to PTR_TO_INSN.)

The final line generates an indirect jump. The format of the
indirect jump instruction supported by BPF is

    BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=fd(M)

and, obviously, the map M must be the same map which was used to
initialize the register rX. This patch implements this in a hacky
way which is, so far, suitable for all existing use-cases: on
encountering a `gotox` instruction, libbpf tracks back to the
previous direct load from a map and stores that map's file
descriptor in the gotox instruction.
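
As an illustration, such an instruction could be constructed manually
like this (just a sketch; BPF_REG_1 and map_fd are placeholders, the
field names are those of struct bpf_insn from the uapi headers):

    struct bpf_insn gotox = {
            .code    = BPF_JMP | BPF_JA | BPF_X,
            .dst_reg = BPF_REG_1, /* Rx holding the target address */
            .src_reg = 0,
            .off     = 0,
            .imm     = map_fd,    /* fd of the BPF_MAP_TYPE_INSN_SET map */
    };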

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
 tools/lib/bpf/libbpf.c          | 333 +++++++++++++++++++++++++++++---
 tools/lib/bpf/libbpf_internal.h |   4 +
 tools/lib/bpf/linker.c          |  66 ++++++-
 3 files changed, 376 insertions(+), 27 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 6445165a24f2..a4cc15c8a3c0 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -496,6 +496,10 @@ struct bpf_program {
 	__u32 line_info_rec_size;
 	__u32 line_info_cnt;
 	__u32 prog_flags;
+
+	__u32 subprog_offset[256];
+	__u32 subprog_sec_offst[256];
+	__u32 subprog_cnt;
 };
 
 struct bpf_struct_ops {
@@ -525,6 +529,7 @@ struct bpf_struct_ops {
 #define STRUCT_OPS_SEC ".struct_ops"
 #define STRUCT_OPS_LINK_SEC ".struct_ops.link"
 #define ARENA_SEC ".addr_space.1"
+#define LLVM_JT_SIZES_SEC ".llvm_jump_table_sizes"
 
 enum libbpf_map_type {
 	LIBBPF_MAP_UNSPEC,
@@ -658,6 +663,7 @@ struct elf_state {
 	Elf64_Ehdr *ehdr;
 	Elf_Data *symbols;
 	Elf_Data *arena_data;
+	Elf_Data *jt_sizes_data;
 	size_t shstrndx; /* section index for section name strings */
 	size_t strtabidx;
 	struct elf_sec_desc *secs;
@@ -668,6 +674,7 @@ struct elf_state {
 	int symbols_shndx;
 	bool has_st_ops;
 	int arena_data_shndx;
+	int jt_sizes_data_shndx;
 };
 
 struct usdt_manager;
@@ -678,6 +685,13 @@ enum bpf_object_state {
 	OBJ_LOADED,
 };
 
+struct jt {
+	__u64 insn_off; /* unique offset within .rodata */
+
+	size_t jump_target_cnt;
+	__u32 jump_target[];
+};
+
 struct bpf_object {
 	char name[BPF_OBJ_NAME_LEN];
 	char license[64];
@@ -698,6 +712,14 @@ struct bpf_object {
 	bool has_subcalls;
 	bool has_rodata;
 
+	const void *rodata;
+	size_t rodata_size;
+	int rodata_map_fd;
+
+	/* Jump Tables */
+	struct jt **jt;
+	size_t jt_cnt;
+
 	struct bpf_gen *gen_loader;
 
 	/* Information when doing ELF related work. Only valid if efile.elf is not NULL */
@@ -1888,6 +1910,98 @@ static char *internal_map_name(struct bpf_object *obj, const char *real_name)
 	return strdup(map_name);
 }
 
+static const struct jt *bpf_object__find_jt(struct bpf_object *obj, __u64 insn_off)
+{
+	size_t i;
+
+	for (i = 0; i < obj->jt_cnt; i++)
+		if (obj->jt[i]->insn_off == insn_off)
+			return obj->jt[i];
+
+	return ERR_PTR(-ENOENT);
+}
+
+static int bpf_object__alloc_jt(struct bpf_object *obj, __u64 insn_off, __u64 size)
+{
+	__u64 i, jump_target;
+	struct jt *jt;
+	int err = 0;
+	void *x;
+
+	jt = calloc(1, sizeof(struct jt) + sizeof(jt->jump_target[0])*size);
+	if (!jt)
+		return -ENOMEM;
+
+	jt->insn_off = insn_off;
+	jt->jump_target_cnt = size;
+
+	for (i = 0; i < size; i++) {
+		if (i + insn_off >= obj->rodata_size / 8) {
+			pr_warn("can't resolve a pointer to .rodata[%llu]: rodata size is %zu!\n",
+				(i + insn_off) * 8, obj->rodata_size);
+			err = -EINVAL;
+			goto ret;
+		}
+
+		jump_target = ((__u64 *)obj->rodata)[insn_off + i] / 8;
+		if (jump_target > UINT32_MAX) {
+			pr_warn("jump target is too big: 0x%016llx!\n", jump_target);
+			err = -EINVAL;
+			goto ret;
+		}
+		jt->jump_target[i] = jump_target;
+	}
+
+	x = realloc(obj->jt, (obj->jt_cnt + 1) * sizeof(*obj->jt));
+	if (!x) {
+		err = -ENOMEM;
+		goto ret;
+	}
+	obj->jt = x;
+	obj->jt[obj->jt_cnt++] = jt;
+
+ret:
+	if (err)
+		free(jt);
+	return err;
+}
+
+static int bpf_object__add_jt(struct bpf_object *obj, __u64 insn_off, __u64 size)
+{
+	if (!obj->rodata) {
+		pr_warn("attempt to add a jump table, but no .rodata present!\n");
+		return -EINVAL;
+	}
+
+	if (!IS_ERR(bpf_object__find_jt(obj, insn_off)))
+		return -EINVAL;
+
+	return bpf_object__alloc_jt(obj, insn_off, size);
+}
+
+static int bpf_object__collect_jt(struct bpf_object *obj)
+{
+	Elf_Data *data = obj->efile.jt_sizes_data;
+	__u64 *buf;
+	size_t i;
+	int err;
+
+	if (!data)
+		return 0;
+
+	buf = (__u64 *)data->d_buf;
+	for (i = 0; i < data->d_size / 16; i++) {
+		__u64 off = buf[2*i];
+		__u64 size = buf[2*i+1];
+
+		err = bpf_object__add_jt(obj, off / 8, size);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int
 map_fill_btf_type_info(struct bpf_object *obj, struct bpf_map *map);
 
@@ -1978,6 +2092,10 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
 	if (data)
 		memcpy(map->mmaped, data, data_sz);
 
+	/* Save this file descriptor */
+	if (type == LIBBPF_MAP_RODATA)
+		obj->rodata_map_fd = map->fd;
+
 	pr_debug("map %td is \"%s\"\n", map - obj->maps, map->name);
 	return 0;
 }
@@ -2008,6 +2126,8 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
 			break;
 		case SEC_RODATA:
 			obj->has_rodata = true;
+			obj->rodata = sec_desc->data->d_buf;
+			obj->rodata_size = sec_desc->data->d_size;
 			sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx));
 			err = bpf_object__init_internal_map(obj, LIBBPF_MAP_RODATA,
 							    sec_name, sec_idx,
@@ -3961,7 +4081,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 			    strcmp(name, ".rel" STRUCT_OPS_LINK_SEC) &&
 			    strcmp(name, ".rel?" STRUCT_OPS_SEC) &&
 			    strcmp(name, ".rel?" STRUCT_OPS_LINK_SEC) &&
-			    strcmp(name, ".rel" MAPS_ELF_SEC)) {
+			    strcmp(name, ".rel" MAPS_ELF_SEC) &&
+			    strcmp(name, ".rel" LLVM_JT_SIZES_SEC) &&
+			    strcmp(name, ".rel" RODATA_SEC)) {
 				pr_info("elf: skipping relo section(%d) %s for section(%d) %s\n",
 					idx, name, targ_sec_idx,
 					elf_sec_name(obj, elf_sec_by_idx(obj, targ_sec_idx)) ?: "<?>");
@@ -3976,6 +4098,10 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 			sec_desc->sec_type = SEC_BSS;
 			sec_desc->shdr = sh;
 			sec_desc->data = data;
+
+		} else if (sh->sh_type == SHT_LLVM_JT_SIZES) {
+			obj->efile.jt_sizes_data = data;
+			obj->efile.jt_sizes_data_shndx = idx;
 		} else {
 			pr_info("elf: skipping section(%d) %s (size %zu)\n", idx, name,
 				(size_t)sh->sh_size);
@@ -6078,6 +6204,59 @@ static void poison_kfunc_call(struct bpf_program *prog, int relo_idx,
 	insn->imm = POISON_CALL_KFUNC_BASE + ext_idx;
 }
 
+static bool map_fd_is_rodata(struct bpf_object *obj, int map_fd)
+{
+	return map_fd == obj->rodata_map_fd;
+}
+
+static int create_jt_map(const struct jt *jt, int adjust_off)
+{
+	static union bpf_attr attr = {
+		.map_type = BPF_MAP_TYPE_INSN_SET,
+		.key_size = 4,
+		.value_size = sizeof(struct bpf_insn_set_value),
+		.max_entries = 0,
+	};
+	struct bpf_insn_set_value val = {};
+	int map_fd;
+	int err;
+	__u32 i;
+
+	attr.max_entries = jt->jump_target_cnt;
+
+	map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
+	if (map_fd < 0)
+		return map_fd;
+
+	for (i = 0; i < jt->jump_target_cnt; i++) {
+		val.xlated_off = jt->jump_target[i] + adjust_off;
+		err = bpf_map_update_elem(map_fd, &i, &val, 0);
+		if (err) {
+			close(map_fd);
+			return err;
+		}
+	}
+
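+	/* instruction set maps must be frozen before the program is loaded */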
+	err = bpf_map_freeze(map_fd);
+	if (err) {
+		close(map_fd);
+		return err;
+	}
+
+	return map_fd;
+}
+
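+/*
+ * Offset to add to a section-relative jump target so that it becomes an
+ * index into the main program, for the (sub)program containing insn_idx.
+ */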
+static int subprog_insn_off(struct bpf_program *prog, int insn_idx)
+{
+	int i;
+
+	for (i = prog->subprog_cnt - 1; i >= 0; i--)
+		if (insn_idx >= prog->subprog_offset[i])
+			return prog->subprog_offset[i] - prog->subprog_sec_offst[i];
+
+	return -prog->sec_insn_off;
+}
+
 /* Relocate data references within program code:
  *  - map references;
  *  - global variable references;
@@ -6115,8 +6294,31 @@ bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog)
 				insn[0].src_reg = BPF_PSEUDO_MAP_IDX_VALUE;
 				insn[0].imm = relo->map_idx;
 			} else if (map->autocreate) {
+				const struct jt *jt;
+				int adjust_insn_off;
+				int map_fd = map->fd;
+
+				/*
+				 * Set imm to proper map file descriptor. In normal case,
+				 * it is just map->fd. However, in case of a jump table,
+				 * a new map file descriptor should be created
+				 */
+				jt = bpf_object__find_jt(obj, insn[1].imm / 8);
+				if (map_fd_is_rodata(obj, map_fd) && !IS_ERR(jt)) {
+					adjust_insn_off = subprog_insn_off(prog, relo->insn_idx);
+					map_fd = create_jt_map(jt, adjust_insn_off);
+					if (map_fd < 0) {
+						pr_warn("prog '%s': relo #%d: failed to create a jt map for .rodata offset %u\n",
+								prog->name, i, insn[1].imm / 8);
+						return map_fd;
+					}
+
+					/* a new map is created, so offset should be 0 */
+					insn[1].imm = 0;
+				}
+
 				insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;
-				insn[0].imm = map->fd;
+				insn[0].imm = map_fd;
 			} else {
 				poison_map_ldimm64(prog, i, relo->insn_idx, insn,
 						   relo->map_idx, map);
@@ -6366,36 +6568,58 @@ static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_progra
 	return 0;
 }
 
+static int
+bpf_prog__append_subprog_offsets(struct bpf_program *prog, __u32 sec_insn_off, __u32 sub_insn_off)
+{
+	if (prog->subprog_cnt == ARRAY_SIZE(prog->subprog_sec_offst)) {
+		pr_warn("prog '%s': number of subprogs exceeds %zu\n",
+			prog->name, ARRAY_SIZE(prog->subprog_sec_offst));
+		return -E2BIG;
+	}
+
+	prog->subprog_sec_offst[prog->subprog_cnt] = sec_insn_off;
+	prog->subprog_offset[prog->subprog_cnt] = sub_insn_off;
+
+	prog->subprog_cnt += 1;
+	return 0;
+}
+
 static int
 bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main_prog,
-				struct bpf_program *subprog)
+		struct bpf_program *subprog)
 {
-       struct bpf_insn *insns;
-       size_t new_cnt;
-       int err;
+	struct bpf_insn *insns;
+	size_t new_cnt;
+	int err;
 
-       subprog->sub_insn_off = main_prog->insns_cnt;
+	subprog->sub_insn_off = main_prog->insns_cnt;
 
-       new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
-       insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns));
-       if (!insns) {
-               pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name);
-               return -ENOMEM;
-       }
-       main_prog->insns = insns;
-       main_prog->insns_cnt = new_cnt;
+	new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
+	insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns));
+	if (!insns) {
+		pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name);
+		return -ENOMEM;
+	}
+	main_prog->insns = insns;
+	main_prog->insns_cnt = new_cnt;
 
-       memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns,
-              subprog->insns_cnt * sizeof(*insns));
+	memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns,
+			subprog->insns_cnt * sizeof(*insns));
 
-       pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n",
-                main_prog->name, subprog->insns_cnt, subprog->name);
+	pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n",
+			main_prog->name, subprog->insns_cnt, subprog->name);
 
-       /* The subprog insns are now appended. Append its relos too. */
-       err = append_subprog_relos(main_prog, subprog);
-       if (err)
-               return err;
-       return 0;
+	/* The subprog insns are now appended. Append its relos too. */
+	err = append_subprog_relos(main_prog, subprog);
+	if (err)
+		return err;
+
+	err = bpf_prog__append_subprog_offsets(main_prog, subprog->sec_insn_off,
+					       subprog->sub_insn_off);
+	if (err)
+		return err;
+
+	return 0;
 }
 
 static int
@@ -7388,6 +7612,58 @@ static int bpf_object__sanitize_prog(struct bpf_object *obj, struct bpf_program
 	return 0;
 }
 
+static bool insn_is_gotox(struct bpf_insn *insn)
+{
+	return BPF_CLASS(insn->code) == BPF_JMP &&
+	       BPF_OP(insn->code) == BPF_JA &&
+	       BPF_SRC(insn->code) == BPF_X;
+}
+
+/*
+ * This one is too dumb, of course. TBD to make it smarter.
+ */
+static int find_jt_map_fd(struct bpf_program *prog, int insn_idx)
+{
+	struct bpf_insn *insn = &prog->insns[insn_idx];
+	__u8 dst_reg = insn->dst_reg;
+
+	/* TBD: this function is so smart for now that it even ignores this
+	 * register. Instead, it should backtrack the load more carefully.
+	 * (So far even this dumb version works with all selftests.)
+	 */
+	pr_debug("searching for a load instruction which populated dst_reg=r%u\n", dst_reg);
+
+	while (--insn >= prog->insns) {
+		if (insn->code == (BPF_LD|BPF_DW|BPF_IMM))
+			return insn[0].imm;
+	}
+
+	return -ENOENT;
+}
+
+static int bpf_object__patch_gotox(struct bpf_object *obj, struct bpf_program *prog)
+{
+	struct bpf_insn *insn = prog->insns;
+	int map_fd;
+	int i;
+
+	for (i = 0; i < prog->insns_cnt; i++, insn++) {
+		if (!insn_is_gotox(insn))
+			continue;
+
+		if (obj->gen_loader)
+			return -EFAULT;
+
+		map_fd = find_jt_map_fd(prog, i);
+		if (map_fd < 0)
+			return map_fd;
+
+		insn->imm = map_fd;
+	}
+
+	return 0;
+}
+
 static int libbpf_find_attach_btf_id(struct bpf_program *prog, const char *attach_name,
 				     int *btf_obj_fd, int *btf_type_id);
 
@@ -7931,6 +8207,14 @@ static int bpf_object_prepare_progs(struct bpf_object *obj)
 		if (err)
 			return err;
 	}
+
+	for (i = 0; i < obj->nr_programs; i++) {
+		prog = &obj->programs[i];
+		err = bpf_object__patch_gotox(obj, prog);
+		if (err)
+			return err;
+	}
+
 	return 0;
 }
 
@@ -8063,6 +8347,7 @@ static struct bpf_object *bpf_object_open(const char *path, const void *obj_buf,
 	err = err ? : bpf_object__init_maps(obj, opts);
 	err = err ? : bpf_object_init_progs(obj, opts);
 	err = err ? : bpf_object__collect_relos(obj);
+	err = err ? : bpf_object__collect_jt(obj);
 	if (err)
 		goto out;
 
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 477a3b3389a0..0632d2d812d7 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -64,6 +64,10 @@
 #define SHT_LLVM_ADDRSIG 0x6FFF4C03
 #endif
 
+#ifndef SHT_LLVM_JT_SIZES
+#define SHT_LLVM_JT_SIZES 0x6FFF4C0D
+#endif
+
 /* if libelf is old and doesn't support mmap(), fall back to read() */
 #ifndef ELF_C_READ_MMAP
 #define ELF_C_READ_MMAP ELF_C_READ
diff --git a/tools/lib/bpf/linker.c b/tools/lib/bpf/linker.c
index a469e5d4fee7..cbd57ece8594 100644
--- a/tools/lib/bpf/linker.c
+++ b/tools/lib/bpf/linker.c
@@ -28,6 +28,8 @@
 #include "str_error.h"
 
 #define BTF_EXTERN_SEC ".extern"
+#define RODATA_REL_SEC ".rel.rodata"
+#define LLVM_JT_SIZES_REL_SEC ".rel.llvm_jump_table_sizes"
 
 struct src_sec {
 	const char *sec_name;
@@ -178,6 +180,7 @@ static int linker_sanity_check_btf(struct src_obj *obj);
 static int linker_sanity_check_btf_ext(struct src_obj *obj);
 static int linker_fixup_btf(struct src_obj *obj);
 static int linker_append_sec_data(struct bpf_linker *linker, struct src_obj *obj);
+static int linker_append_sec_jt(struct bpf_linker *linker, struct src_obj *obj);
 static int linker_append_elf_syms(struct bpf_linker *linker, struct src_obj *obj);
 static int linker_append_elf_sym(struct bpf_linker *linker, struct src_obj *obj,
 				 Elf64_Sym *sym, const char *sym_name, int src_sym_idx);
@@ -499,6 +502,7 @@ static int bpf_linker_add_file(struct bpf_linker *linker, int fd,
 
 	err = err ?: linker_load_obj_file(linker, &obj);
 	err = err ?: linker_append_sec_data(linker, &obj);
+	err = err ?: linker_append_sec_jt(linker, &obj);
 	err = err ?: linker_append_elf_syms(linker, &obj);
 	err = err ?: linker_append_elf_relos(linker, &obj);
 	err = err ?: linker_append_btf(linker, &obj);
@@ -811,6 +815,9 @@ static int linker_load_obj_file(struct bpf_linker *linker,
 		case SHT_REL:
 			/* relocations */
 			break;
+		case SHT_LLVM_JT_SIZES:
+			/* LLVM jump tables sizes */
+			break;
 		default:
 			pr_warn("unrecognized section #%zu (%s) in %s\n",
 				sec_idx, sec_name, obj->filename);
@@ -899,6 +906,9 @@ static int linker_sanity_check_elf(struct src_obj *obj)
 			break;
 		case SHT_LLVM_ADDRSIG:
 			break;
+		case SHT_LLVM_JT_SIZES:
+			/* LLVM jump tables sizes */
+			break;
 		default:
 			pr_warn("ELF section #%zu (%s) has unrecognized type %zu in %s\n",
 				sec->sec_idx, sec->sec_name, (size_t)sec->shdr->sh_type, obj->filename);
@@ -1022,7 +1032,10 @@ static int linker_sanity_check_elf_relos(struct src_obj *obj, struct src_sec *se
 		return 0;
 
 	/* relocatable section is data or instructions */
-	if (link_sec->shdr->sh_type != SHT_PROGBITS && link_sec->shdr->sh_type != SHT_NOBITS) {
+	if (link_sec->shdr->sh_type != SHT_PROGBITS &&
+	    link_sec->shdr->sh_type != SHT_NOBITS &&
+	    link_sec->shdr->sh_type != SHT_LLVM_JT_SIZES) {
 		pr_warn("ELF relo section #%zu points to invalid section #%zu in %s\n",
 			sec->sec_idx, (size_t)sec->shdr->sh_info, obj->filename);
 		return -EINVAL;
@@ -1351,6 +1364,13 @@ static bool is_relo_sec(struct src_sec *sec)
 	return sec->shdr->sh_type == SHT_REL;
 }
 
+static bool is_jt_sec(struct src_sec *sec)
+{
+	if (!sec || sec->skipped || !sec->shdr)
+		return false;
+	return sec->shdr->sh_type == SHT_LLVM_JT_SIZES;
+}
+
 static int linker_append_sec_data(struct bpf_linker *linker, struct src_obj *obj)
 {
 	int i, err;
@@ -1403,6 +1423,44 @@ static int linker_append_sec_data(struct bpf_linker *linker, struct src_obj *obj
 	return 0;
 }
 
+static int linker_append_sec_jt(struct bpf_linker *linker, struct src_obj *obj)
+{
+	int i, err;
+
+	for (i = 1; i < obj->sec_cnt; i++) {
+		struct src_sec *src_sec;
+		struct dst_sec *dst_sec;
+
+		src_sec = &obj->secs[i];
+		if (!is_jt_sec(src_sec))
+			continue;
+
+		dst_sec = find_dst_sec_by_name(linker, src_sec->sec_name);
+		if (!dst_sec) {
+			dst_sec = add_dst_sec(linker, src_sec->sec_name);
+			if (!dst_sec)
+				return -ENOMEM;
+			err = init_sec(linker, dst_sec, src_sec);
+			if (err) {
+				pr_warn("failed to init section '%s'\n", src_sec->sec_name);
+				return err;
+			}
+		} else if (!secs_match(dst_sec, src_sec)) {
+			pr_warn("ELF sections %s are incompatible\n", src_sec->sec_name);
+			return -EINVAL;
+		}
+
+		/* record mapped section index */
+		src_sec->dst_id = dst_sec->id;
+
+		err = extend_sec(linker, dst_sec, src_sec);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int linker_append_elf_syms(struct bpf_linker *linker, struct src_obj *obj)
 {
 	struct src_sec *symtab = &obj->secs[obj->symtab_sec_idx];
@@ -2272,8 +2330,10 @@ static int linker_append_elf_relos(struct bpf_linker *linker, struct src_obj *ob
 						insn->imm += sec->dst_off / sizeof(struct bpf_insn);
 					else
 						insn->imm += sec->dst_off;
-				} else {
-					pr_warn("relocation against STT_SECTION in non-exec section is not supported!\n");
+				} else if (strcmp(src_sec->sec_name, LLVM_JT_SIZES_REL_SEC) &&
+					   strcmp(src_sec->sec_name, RODATA_REL_SEC)) {
+					pr_warn("relocation against STT_SECTION in section %s is not supported!\n",
+						src_sec->sec_name);
 					return -EINVAL;
 				}
 			}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC bpf-next 9/9] selftests/bpf: add selftests for indirect jumps
  2025-06-15  8:59 [RFC bpf-next 0/9] BPF indirect jumps Anton Protopopov
                   ` (7 preceding siblings ...)
  2025-06-15  8:59 ` [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps Anton Protopopov
@ 2025-06-15  8:59 ` Anton Protopopov
  2025-06-18  3:24   ` Alexei Starovoitov
  8 siblings, 1 reply; 63+ messages in thread
From: Anton Protopopov @ 2025-06-15  8:59 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song
  Cc: Anton Protopopov

Add selftests for indirect jumps. All the indirect jumps are
generated from C switch statements, so, if compiled by a compiler
which doesn't support indirect jumps, the tests should pass as well.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
 tools/testing/selftests/bpf/Makefile          |   4 +-
 .../selftests/bpf/prog_tests/bpf_goto_x.c     | 127 +++++++
 .../testing/selftests/bpf/progs/bpf_goto_x.c  | 336 ++++++++++++++++++
 3 files changed, 466 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_goto_x.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_goto_x.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 97013c49920b..53ec703ba713 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -453,7 +453,9 @@ BPF_CFLAGS = -g -Wall -Werror -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN)	\
 	     -I$(abspath $(OUTPUT)/../usr/include)			\
 	     -std=gnu11		 					\
 	     -fno-strict-aliasing 					\
-	     -Wno-compare-distinct-pointer-types
+	     -Wno-compare-distinct-pointer-types			\
+	     -Wno-initializer-overrides					\
+	     #
 # TODO: enable me -Wsign-compare
 
 CLANG_CFLAGS = $(CLANG_SYS_INCLUDES)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_goto_x.c b/tools/testing/selftests/bpf/prog_tests/bpf_goto_x.c
new file mode 100644
index 000000000000..15781b6f8249
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_goto_x.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/in6.h>
+#include <linux/udp.h>
+#include <linux/tcp.h>
+
+#include <sys/syscall.h>
+#include <bpf/bpf.h>
+
+#include "bpf_goto_x.skel.h"
+
+static void __test_run(struct bpf_program *prog, void *ctx_in, size_t ctx_size_in)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+			    .ctx_in = ctx_in,
+			    .ctx_size_in = ctx_size_in,
+		   );
+	int err, prog_fd;
+
+	prog_fd = bpf_program__fd(prog);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run_opts err");
+}
+
+static void check_simple(struct bpf_goto_x *skel,
+			 struct bpf_program *prog,
+			 __u64 ctx_in,
+			 __u64 expected)
+{
+	skel->bss->ret_user = 0;
+
+	__test_run(prog, &ctx_in, sizeof(ctx_in));
+
+	if (!ASSERT_EQ(skel->bss->ret_user, expected, "skel->bss->ret_user"))
+		return;
+}
+
+static void check_simple_fentry(struct bpf_goto_x *skel,
+				struct bpf_program *prog,
+				__u64 ctx_in,
+				__u64 expected)
+{
+	skel->bss->in_user = ctx_in;
+	skel->bss->ret_user = 0;
+
+	/* trigger */
+	usleep(1);
+
+	if (!ASSERT_EQ(skel->bss->ret_user, expected, "skel->bss->ret_user"))
+		return;
+}
+
+static void check_goto_x_skel(struct bpf_goto_x *skel)
+{
+	int i;
+	__u64 in[]   = {0, 1, 2, 3, 4,  5, 77};
+	__u64 out[]  = {2, 3, 4, 5, 7, 19, 19};
+	__u64 out2[] = {103, 104, 107, 205, 115, 1019, 1019};
+	__u64 in3[]  = {0, 11, 27, 31, 447, 22, 45, 999};
+	__u64 out3[] = {2,  3,  4,  5,   7, 19, 19,  19};
+
+	for (i = 0; i < ARRAY_SIZE(in); i++)
+		check_simple(skel, skel->progs.simple_test, in[i], out[i]);
+
+	for (i = 0; i < ARRAY_SIZE(in); i++)
+		check_simple(skel, skel->progs.simple_test2, in[i], out[i]);
+
+	for (i = 0; i < ARRAY_SIZE(in); i++)
+		check_simple(skel, skel->progs.two_towers, in[i], out2[i]);
+
+	for (i = 0; i < ARRAY_SIZE(in3); i++)
+		check_simple(skel, skel->progs.the_return_of_the_king, in3[i], out3[i]);
+
+	for (i = 0; i < ARRAY_SIZE(in); i++)
+		check_simple(skel, skel->progs.use_static_global1, in[i], out[i]);
+
+	for (i = 0; i < ARRAY_SIZE(in); i++)
+		check_simple(skel, skel->progs.use_static_global2, in[i], out[i]);
+
+	for (i = 0; i < ARRAY_SIZE(in); i++)
+		check_simple(skel, skel->progs.use_nonstatic_global1, in[i], out[i]);
+
+	for (i = 0; i < ARRAY_SIZE(in); i++)
+		check_simple(skel, skel->progs.use_nonstatic_global2, in[i], out[i]);
+
+	bpf_program__attach(skel->progs.simple_test_other_sec);
+	for (i = 0; i < ARRAY_SIZE(in); i++)
+		check_simple_fentry(skel, skel->progs.simple_test_other_sec, in[i], out[i]);
+
+	bpf_program__attach(skel->progs.use_static_global_other_sec);
+	for (i = 0; i < ARRAY_SIZE(in); i++)
+		check_simple_fentry(skel, skel->progs.use_static_global_other_sec, in[i], out[i]);
+
+	bpf_program__attach(skel->progs.use_nonstatic_global_other_sec);
+	for (i = 0; i < ARRAY_SIZE(in); i++)
+		check_simple_fentry(skel, skel->progs.use_nonstatic_global_other_sec, in[i], out[i]);
+}
+
+void goto_x_skel(void)
+{
+	struct bpf_goto_x *skel;
+	int ret;
+
+	skel = bpf_goto_x__open();
+	if (!ASSERT_NEQ(skel, NULL, "bpf_goto_x__open"))
+		return;
+
+	ret = bpf_goto_x__load(skel);
+	if (!ASSERT_OK(ret, "bpf_goto_x__load"))
+		return;
+
+	check_goto_x_skel(skel);
+
+	bpf_goto_x__destroy(skel);
+}
+
+void test_bpf_goto_x(void)
+{
+	if (test__start_subtest("goto_x_skel"))
+		goto_x_skel();
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_goto_x.c b/tools/testing/selftests/bpf/progs/bpf_goto_x.c
new file mode 100644
index 000000000000..ebe4239cfd24
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_goto_x.c
@@ -0,0 +1,336 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_misc.h"
+
+__u64 in_user;
+__u64 ret_user;
+
+struct simple_ctx {
+	__u64 x;
+};
+
+SEC("syscall")
+int simple_test(struct simple_ctx *ctx)
+{
+	switch (ctx->x) {
+	case 0:
+		bpf_printk("%lu\n", ctx->x + 1);
+		ret_user = 2;
+		break;
+	case 1:
+		bpf_printk("%lu\n", ctx->x + 7);
+		ret_user = 3;
+		break;
+	case 2:
+		bpf_printk("%lu\n", ctx->x + 9);
+		ret_user = 4;
+		break;
+	case 3:
+		bpf_printk("%lu\n", ctx->x + 11);
+		ret_user = 5;
+		break;
+	case 4:
+		bpf_printk("%lu\n", ctx->x + 17);
+		ret_user = 7;
+		break;
+	default:
+		bpf_printk("%lu\n", ctx->x + 177);
+		ret_user = 19;
+		break;
+	}
+
+	return 0;
+}
+
+SEC("syscall")
+int simple_test2(struct simple_ctx *ctx)
+{
+	switch (ctx->x) {
+	case 0:
+		bpf_printk("%lu\n", ctx->x + 1);
+		ret_user = 2;
+		break;
+	case 1:
+		bpf_printk("%lu\n", ctx->x + 7);
+		ret_user = 3;
+		break;
+	case 2:
+		bpf_printk("%lu\n", ctx->x + 9);
+		ret_user = 4;
+		break;
+	case 3:
+		bpf_printk("%lu\n", ctx->x + 11);
+		ret_user = 5;
+		break;
+	case 4:
+		bpf_printk("%lu\n", ctx->x + 17);
+		ret_user = 7;
+		break;
+	default:
+		bpf_printk("%lu\n", ctx->x + 177);
+		ret_user = 19;
+		break;
+	}
+
+	return 0;
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int simple_test_other_sec(struct pt_regs *ctx)
+{
+	__u64 x = in_user;
+
+	switch (x) {
+	case 0:
+		bpf_printk("%lu\n", x + 1);
+		ret_user = 2;
+		break;
+	case 1:
+		bpf_printk("%lu\n", x + 7);
+		ret_user = 3;
+		break;
+	case 2:
+		bpf_printk("%lu\n", x + 9);
+		ret_user = 4;
+		break;
+	case 3:
+		bpf_printk("%lu\n", x + 11);
+		ret_user = 5;
+		break;
+	case 4:
+		bpf_printk("%lu\n", x + 17);
+		ret_user = 7;
+		break;
+	default:
+		bpf_printk("%lu\n", x + 177);
+		ret_user = 19;
+		break;
+	}
+
+	return 0;
+}
+
+SEC("syscall")
+int two_towers(struct simple_ctx *ctx)
+{
+	switch (ctx->x) {
+	case 0:
+		bpf_printk("%lu\n", ctx->x + 1);
+		ret_user = 2;
+		break;
+	case 1:
+		bpf_printk("%lu\n", ctx->x + 7);
+		ret_user = 3;
+		break;
+	case 2:
+		bpf_printk("%lu\n", ctx->x + 9);
+		ret_user = 4;
+		break;
+	case 3:
+		bpf_printk("%lu\n", ctx->x + 11);
+		ret_user = 5;
+		break;
+	case 4:
+		bpf_printk("%lu\n", ctx->x + 17);
+		ret_user = 7;
+		break;
+	default:
+		bpf_printk("%lu\n", ctx->x + 177);
+		ret_user = 19;
+		break;
+	}
+
+	switch (ctx->x + !!ret_user) {
+	case 0: /* never happens */
+		bpf_printk("%lu\n", ctx->x + 1);
+		ret_user = 102;
+		break;
+	case 1:
+		bpf_printk("%lu\n", ctx->x + 7);
+		ret_user = 103;
+		break;
+	case 2:
+		bpf_printk("%lu\n", ctx->x + 9);
+		ret_user = 104;
+		break;
+	case 3:
+		bpf_printk("%lu\n", ctx->x + 11);
+		ret_user = 107;
+		break;
+	case 4:
+		bpf_printk("%lu\n", ctx->x + 11);
+		ret_user = 205;
+		break;
+	case 5:
+		bpf_printk("%lu\n", ctx->x + 11);
+		ret_user = 115;
+		break;
+	default:
+		bpf_printk("%lu\n", ctx->x + 177);
+		ret_user = 1019;
+		break;
+	}
+
+	return 0;
+}
+
+/* this actually creates a big insn_set map */
+SEC("syscall")
+int the_return_of_the_king(struct simple_ctx *ctx)
+{
+	switch (ctx->x) {
+	case 0:
+		bpf_printk("%lu\n", ctx->x + 1);
+		ret_user = 2;
+		break;
+	case 11:
+		bpf_printk("%lu\n", ctx->x + 7);
+		ret_user = 3;
+		break;
+	case 27:
+		bpf_printk("%lu\n", ctx->x + 9);
+		ret_user = 4;
+		break;
+	case 31:
+		bpf_printk("%lu\n", ctx->x + 11);
+		ret_user = 5;
+		break;
+	case 447:
+		bpf_printk("%lu\n", ctx->x + 17);
+		ret_user = 7;
+		break;
+	default:
+		bpf_printk("%lu\n", ctx->x + 177);
+		ret_user = 19;
+		break;
+	}
+
+	return 0;
+}
+
+/* Just to introduce some non-zero offsets in .text */
+static __noinline int i_am_a_little_tiny_foo(volatile struct simple_ctx *ctx __arg_ctx)
+{
+	if (ctx)
+		return 1;
+	else
+		return 13;
+}
+
+SEC("syscall") int just_me(struct simple_ctx *ctx)
+{
+	ret_user = 0;
+	return i_am_a_little_tiny_foo(ctx);
+}
+
+static __noinline int __static_global(__u64 x)
+{
+	switch (x) {
+	case 0:
+		bpf_printk("%lu\n", x + 1);
+		ret_user = 2;
+		break;
+	case 1:
+		bpf_printk("%lu\n", x + 7);
+		ret_user = 3;
+		break;
+	case 2:
+		bpf_printk("%lu\n", x + 9);
+		ret_user = 4;
+		break;
+	case 3:
+		bpf_printk("%lu\n", x + 11);
+		ret_user = 5;
+		break;
+	case 4:
+		bpf_printk("%lu\n", x + 17);
+		ret_user = 7;
+		break;
+	default:
+		bpf_printk("%lu\n", x + 177);
+		ret_user = 19;
+		break;
+	}
+
+	return 0;
+}
+
+SEC("syscall")
+int use_static_global1(struct simple_ctx *ctx)
+{
+	ret_user = 0;
+	return __static_global(ctx->x);
+}
+
+SEC("syscall")
+int use_static_global2(struct simple_ctx *ctx)
+{
+	ret_user = 0;
+	bpf_printk("%lu\n", ctx->x + 1);
+	return __static_global(ctx->x);
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int use_static_global_other_sec(void *ctx)
+{
+	return __static_global(in_user);
+}
+
+__noinline int __gobble_till_you_global(__u64 x)
+{
+	switch (x) {
+	case 0:
+		bpf_printk("%lu\n", x + 1);
+		ret_user = 2;
+		break;
+	case 1:
+		bpf_printk("%lu\n", x + 7);
+		ret_user = 3;
+		break;
+	case 2:
+		bpf_printk("%lu\n", x + 9);
+		ret_user = 4;
+		break;
+	case 3:
+		bpf_printk("%lu\n", x + 11);
+		ret_user = 5;
+		break;
+	case 4:
+		bpf_printk("%lu\n", x + 17);
+		ret_user = 7;
+		break;
+	default:
+		bpf_printk("%lu\n", x + 177);
+		ret_user = 19;
+		break;
+	}
+
+	return 0;
+}
+
+SEC("syscall")
+int use_nonstatic_global1(struct simple_ctx *ctx)
+{
+	ret_user = 0;
+	return __gobble_till_you_global(ctx->x);
+}
+
+SEC("syscall")
+int use_nonstatic_global2(struct simple_ctx *ctx)
+{
+	ret_user = 0;
+	bpf_printk("%lu\n", ctx->x + 1);
+	return __gobble_till_you_global(ctx->x);
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int use_nonstatic_global_other_sec(void *ctx)
+{
+	return __gobble_till_you_global(in_user);
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 4/9] bpf, x86: allow indirect jumps to r8...r15
  2025-06-15  8:59 ` [RFC bpf-next 4/9] bpf, x86: allow indirect jumps to r8...r15 Anton Protopopov
@ 2025-06-17 19:41   ` Alexei Starovoitov
  2025-06-18 14:28     ` Anton Protopopov
  0 siblings, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-06-17 19:41 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
>
> Currently, the emit_indirect_jump() function only accepts one of the
> RAX, RCX, ..., RBP registers as the destination. Prepare it to accept
> R8, R9, ..., R15 as well. This is necessary to enable indirect jumps
> support in eBPF.
>
> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> ---
>  arch/x86/net/bpf_jit_comp.c | 26 +++++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 923c38f212dc..37dc83d91832 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -659,7 +659,19 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
>
>  #define EMIT_LFENCE()  EMIT3(0x0F, 0xAE, 0xE8)
>
> -static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
> +static void __emit_indirect_jump(u8 **pprog, int reg, bool ereg)

Instead of adding a bool flag, make reg a BPF reg
instead of an x86 reg, tweak the signature to
emit_indirect_jump(..., u32 reg, ..),
and add is_ereg(reg) inside.
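
Something along these lines (just a sketch of the non-retpoline path,
reusing the existing reg2hex[]/is_ereg() helpers):

	static void emit_indirect_jump(u8 **pprog, u32 bpf_reg, u8 *ip)
	{
		u8 *prog = *pprog;

		if (is_ereg(bpf_reg))
			EMIT1(0x41);				/* REX.B for r8..r15 */
		EMIT2(0xFF, 0xE0 + reg2hex[bpf_reg]);		/* jmp *%reg */

		*pprog = prog;
	}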

Also drop RFC tag next time. Let CI do the work.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 2/9] bpf, x86: add new map type: instructions set
  2025-06-15  8:59 ` [RFC bpf-next 2/9] bpf, x86: add new map type: instructions set Anton Protopopov
@ 2025-06-18  0:57   ` Eduard Zingerman
  2025-06-18  2:16     ` Alexei Starovoitov
  2025-06-19 18:55     ` Anton Protopopov
  0 siblings, 2 replies; 63+ messages in thread
From: Eduard Zingerman @ 2025-06-18  0:57 UTC (permalink / raw)
  To: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:

Meta: "instruction set" is a super confusing name, at-least for me the
      first thought is about actual set of instructions supported by
      some h/w. instruction_info? instruction_offset? just
      "iset"/"ioffset"?

[...]

> On map creation/initialization, before loading the program, each
> element of the map should be initialized to point to an instruction
> offset within the program. Before the program load such maps should
> be made frozen. After the program verification xlated and jitted
> offsets can be read via the bpf(2) syscall.

I think such maps would be a bit more ergonomic if the original
instruction index were saved as well, e.g.:

  (original_offset, xlated_offset, jitted_offset)

Otherwise the user would have to recover the original offset from
some external mapping. This information is stored in orig_xlated_off
anyway.
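
A possible value layout then would be (just a sketch, the struct and
field names are made up):

	struct insn_set_entry {
		__u32 orig_off;   /* instruction index as supplied by user */
		__u32 xlated_off; /* offset in the xlated program */
		__u32 jitted_off; /* offset in the jitted image */
	};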

[...]

> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 15672cb926fc..923c38f212dc 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -1615,6 +1615,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image

[...]

> @@ -2642,6 +2645,14 @@ st:			if (is_imm8(insn->off))
>  				return -EFAULT;
>  			}
>  			memcpy(rw_image + proglen, temp, ilen);
> +
> +			/*
> +			 * Instruction sets need to know how xlated code
> +			 * maps to jited code
> +			 */
> +			abs_xlated_off = bpf_prog->aux->subprog_start + i - 1 - adjust_off;

Nit: `adjust_off` is a bit hard to follow, maybe move the following:

	abs_xlated_off = bpf_prog->aux->subprog_start + i - 1;

     to the beginning of the loop?

> +			bpf_prog_update_insn_ptr(bpf_prog, abs_xlated_off, proglen, ilen,
> +						 jmp_offset, image + proglen);

Nit: initialize `jmp_offset` at each loop iteration to 0?
     otherwise it would denote jump offset of the last processed
     jump instruction for all following non-jump instructions.

>  		}
>  		proglen += ilen;
>  		addrs[i] = proglen;
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 8189f49e43d6..008bcd44c60e 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -3596,4 +3596,25 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
>  	return prog->aux->func_idx != 0;
>  }
>
> +int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog);
> +int bpf_insn_set_ready(struct bpf_map *map);
> +void bpf_insn_set_release(struct bpf_map *map);
> +void bpf_insn_set_adjust(struct bpf_map *map, u32 off, u32 len);
> +void bpf_insn_set_adjust_after_remove(struct bpf_map *map, u32 off, u32 len);
> +
> +struct bpf_insn_ptr {

Could you please add comments describing each field?
E.g.: "address of the instruction in the jitted image",
      "for jump instructions, the relative offset of the jump target",
      "index of the original instruction",
      "original value of the corresponding bpf_insn_set_value.xlated_off".

> +	void *jitted_ip;
> +	u32 jitted_len;
> +	int jitted_jump_offset;
> +	struct bpf_insn_set_value user_value; /* userspace-visible value */
> +	u32 orig_xlated_off;
> +};
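
I.e., something along these lines (exact wording is up to you, and the
jitted_len comment is my guess):

  struct bpf_insn_ptr {
	void *jitted_ip;        /* address of the instruction in the jitted image */
	u32 jitted_len;         /* size of the jitted instruction, in bytes */
	int jitted_jump_offset; /* for jumps, the relative offset of the jump target */
	struct bpf_insn_set_value user_value; /* userspace-visible value */
	u32 orig_xlated_off;    /* original value of user_value.xlated_off */
  };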

[...]

> diff --git a/kernel/bpf/bpf_insn_set.c b/kernel/bpf/bpf_insn_set.c
> new file mode 100644

[...]

> +static int insn_set_check_btf(const struct bpf_map *map,
> +			      const struct btf *btf,
> +			      const struct btf_type *key_type,
> +			      const struct btf_type *value_type)
> +{
> +	u32 int_data;
> +
> +	if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
> +		return -EINVAL;
> +
> +	if (BTF_INFO_KIND(value_type->info) != BTF_KIND_INT)
> +		return -EINVAL;
> +
> +	int_data = *(u32 *)(key_type + 1);

Nit: use btf_type_int() accessor?

> +	if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
> +		return -EINVAL;
> +
> +	int_data = *(u32 *)(value_type + 1);
> +	if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))

Should this check for `BTF_INT_BITS(int_data) != 64`?

> +		return -EINVAL;
> +
> +	return 0;
> +}

[...]

> +int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog)
> +{
> +	struct bpf_insn_set *insn_set = cast_insn_set(map);
> +	int i;
> +
> +	if (!is_frozen(map))
> +		return -EINVAL;
> +
> +	if (!valid_offsets(insn_set, prog))
> +		return -EINVAL;
> +
> +	/*
> +	 * There can be only one program using the map
> +	 */
> +	mutex_lock(&insn_set->state_mutex);
> +	if (insn_set->state != INSN_SET_STATE_FREE) {
> +		mutex_unlock(&insn_set->state_mutex);
> +		return -EBUSY;
> +	}
> +	insn_set->state = INSN_SET_STATE_INIT;
> +	mutex_unlock(&insn_set->state_mutex);
> +
> +	/*
> +	 * Reset all the map indexes to the original values.  This is needed,
> +	 * e.g., when a replay of verification with different log level should
> +	 * be performed.
> +	 */
> +	for (i = 0; i < map->max_entries; i++)
> +		insn_set->ptrs[i].user_value.xlated_off = insn_set->ptrs[i].orig_xlated_off;
> +
> +	return 0;
> +}
> +
> +int bpf_insn_set_ready(struct bpf_map *map)

What is the reasoning for not needing to take the mutex here and in
the bpf_insn_set_release?

> +{
> +	struct bpf_insn_set *insn_set = cast_insn_set(map);
> +	int i;
> +
> +	for (i = 0; i < map->max_entries; i++) {
> +		if (insn_set->ptrs[i].user_value.xlated_off == INSN_DELETED)
> +			continue;
> +		if (!insn_set->ips[i])
> +			return -EFAULT;
> +	}
> +
> +	insn_set->state = INSN_SET_STATE_READY;
> +	return 0;
> +}
> +
> +void bpf_insn_set_release(struct bpf_map *map)
> +{
> +	struct bpf_insn_set *insn_set = cast_insn_set(map);
> +
> +	insn_set->state = INSN_SET_STATE_FREE;
> +}

[...]

(... I'll continue reading through patch-set a bit later ...)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 2/9] bpf, x86: add new map type: instructions set
  2025-06-18  0:57   ` Eduard Zingerman
@ 2025-06-18  2:16     ` Alexei Starovoitov
  2025-06-19 18:57       ` Anton Protopopov
  2025-06-19 18:55     ` Anton Protopopov
  1 sibling, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-06-18  2:16 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Tue, Jun 17, 2025 at 5:57 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:
>
> Meta: "instruction set" is a super confusing name, at-least for me the
>       first thought is about actual set of instructions supported by
>       some h/w. instruction_info? instruction_offset? just
>       "iset"/"ioffset"?

BPF_MAP_TYPE_INSN_ARRAY ?

and in the code use either insn_array or iarray

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 5/9] bpf, x86: add support for indirect jumps
  2025-06-15  8:59 ` [RFC bpf-next 5/9] bpf, x86: add support for indirect jumps Anton Protopopov
@ 2025-06-18  3:06   ` Alexei Starovoitov
  2025-06-19 19:57     ` Anton Protopopov
  2025-06-19 19:58     ` Anton Protopopov
  2025-06-18 11:03   ` Eduard Zingerman
  1 sibling, 2 replies; 63+ messages in thread
From: Alexei Starovoitov @ 2025-06-18  3:06 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
>
> Add support for a new instruction
>
>     BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=fd(M)
>
> which does an indirect jump to a location stored in Rx. The map M
> is an instruction set map containing all possible targets for this
> particular jump.
>
> On the jump the register Rx should have type PTR_TO_INSN. This new
> type assures that the Rx register contains a value (or a range of
> values) loaded from the map M. Typically, this will be done like
> this (e.g., the code below could have been generated for a switch
> statement compiled with LLVM):
>
>     0:   r3 = r1                    # "switch (r3)"
>     1:   if r3 > 0x13 goto +0x666   # check r3 boundaries
>     2:   r3 <<= 0x3                 # r3 is void*, point to an address
>     3:   r1 = 0xbeef ll             # r1 is PTR_TO_MAP_VALUE, r1->map_ptr=M

Something doesn't add up.
Since you made libbpf to tag this ld_imm64 as BPF_PSEUDO_MAP_VALUE
which insn (map key) does it point to ?
In case of global data it's key==0.
Here it's 1st element of insn_array ?

>     5:   r1 += r3                   # r1 inherits boundaries from r3
>     6:   r1 = *(u64 *)(r1 + 0x0)    # r1 now has type INSN_TO_PTR
>     7:   gotox r1[,imm=fd(M)]       # verifier checks that M == r1->map_ptr
>
> When building the jump graph, and during the static analysis, a new
> function of the INSN_SET is used: bpf_insn_set_iter_xlated_offset(map, n).
> It allows iterating over the unique slots of an instruction set (equal
> items can be generated, e.g., for a sparse jump table of a switch,
> where not all possible branches are taken).
>
> Instruction (3) above loads the address of the first element of the
> map. From the BPF point of view, the map is a jump table in the
> native architecture, i.e., an array of jump targets. This patch
> allows grabbing such an address and then later adjusting the offset,
> as in instruction (5). A value of this type can be dereferenced once
> to create a PTR_TO_INSN, see instruction (6).
>
> When building the CFG, the upper 16 bits of insn_state are used, so
> this patch (theoretically) supports jump tables of up to 2^16 slots.
>
> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> ---
>  arch/x86/net/bpf_jit_comp.c  |   7 ++
>  include/linux/bpf.h          |   2 +
>  include/linux/bpf_verifier.h |   4 +
>  kernel/bpf/bpf_insn_set.c    |  71 ++++++++++++-
>  kernel/bpf/core.c            |   2 +
>  kernel/bpf/verifier.c        | 198 ++++++++++++++++++++++++++++++++++-
>  6 files changed, 278 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 37dc83d91832..d20f6775605d 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -2520,6 +2520,13 @@ st:                      if (is_imm8(insn->off))
>
>                         break;
>
> +               case BPF_JMP | BPF_JA | BPF_X:
> +               case BPF_JMP32 | BPF_JA | BPF_X:
> +                       emit_indirect_jump(&prog,
> +                                          reg2hex[insn->dst_reg],
> +                                          is_ereg(insn->dst_reg),
> +                                          image + addrs[i - 1]);
> +                       break;
>                 case BPF_JMP | BPF_JA:
>                 case BPF_JMP32 | BPF_JA:
>                         if (BPF_CLASS(insn->code) == BPF_JMP) {
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 008bcd44c60e..3c5eaea2b476 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -952,6 +952,7 @@ enum bpf_reg_type {
>         PTR_TO_ARENA,
>         PTR_TO_BUF,              /* reg points to a read/write buffer */
>         PTR_TO_FUNC,             /* reg points to a bpf program function */
> +       PTR_TO_INSN,             /* reg points to a bpf program instruction */
>         CONST_PTR_TO_DYNPTR,     /* reg points to a const struct bpf_dynptr */
>         __BPF_REG_TYPE_MAX,
>
> @@ -3601,6 +3602,7 @@ int bpf_insn_set_ready(struct bpf_map *map);
>  void bpf_insn_set_release(struct bpf_map *map);
>  void bpf_insn_set_adjust(struct bpf_map *map, u32 off, u32 len);
>  void bpf_insn_set_adjust_after_remove(struct bpf_map *map, u32 off, u32 len);
> +int bpf_insn_set_iter_xlated_offset(struct bpf_map *map, u32 iter_no);
>
>  struct bpf_insn_ptr {
>         void *jitted_ip;
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 84b5e6b25c52..80d9afcca488 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -229,6 +229,10 @@ struct bpf_reg_state {
>         enum bpf_reg_liveness live;
>         /* if (!precise && SCALAR_VALUE) min/max/tnum don't affect safety */
>         bool precise;
> +
> +       /* Used to track boundaries of a PTR_TO_INSN */
> +       u32 min_index;
> +       u32 max_index;

This is no go. We cannot grow bpf_reg_state.
Find a way to reuse fields without increasing the size.

>  };
>
>  enum bpf_stack_slot_type {
> diff --git a/kernel/bpf/bpf_insn_set.c b/kernel/bpf/bpf_insn_set.c
> index c20e99327118..316cecad60a9 100644
> --- a/kernel/bpf/bpf_insn_set.c
> +++ b/kernel/bpf/bpf_insn_set.c
> @@ -9,6 +9,8 @@ struct bpf_insn_set {
>         struct bpf_map map;
>         struct mutex state_mutex;
>         int state;
> +       u32 **unique_offsets;
> +       u32 unique_offsets_cnt;
>         long *ips;
>         DECLARE_FLEX_ARRAY(struct bpf_insn_ptr, ptrs);
>  };
> @@ -50,6 +52,7 @@ static void insn_set_free(struct bpf_map *map)
>  {
>         struct bpf_insn_set *insn_set = cast_insn_set(map);
>
> +       kfree(insn_set->unique_offsets);
>         kfree(insn_set->ips);
>         bpf_map_area_free(insn_set);
>  }
> @@ -69,6 +72,12 @@ static struct bpf_map *insn_set_alloc(union bpf_attr *attr)
>                 return ERR_PTR(-ENOMEM);
>         }
>
> +       insn_set->unique_offsets = kzalloc(sizeof(long) * attr->max_entries, GFP_KERNEL);
> +       if (!insn_set->unique_offsets) {
> +               insn_set_free(&insn_set->map);
> +               return ERR_PTR(-ENOMEM);
> +       }
> +
>         bpf_map_init_from_attr(&insn_set->map, attr);
>
>         mutex_init(&insn_set->state_mutex);
> @@ -165,10 +174,25 @@ static u64 insn_set_mem_usage(const struct bpf_map *map)
>         u64 extra_size = 0;
>
>         extra_size += sizeof(long) * map->max_entries; /* insn_set->ips */
> +       extra_size += 4 * map->max_entries; /* insn_set->unique_offsets */
>
>         return insn_set_alloc_size(map->max_entries) + extra_size;
>  }
>
> +static int insn_set_map_direct_value_addr(const struct bpf_map *map, u64 *imm, u32 off)
> +{
> +       struct bpf_insn_set *insn_set = cast_insn_set(map);
> +
> +       /* for now, just reject all such loads */
> +       if (off > 0)
> +               return -EINVAL;

I bet it's easy enough to make llvm generate such code,
so this needs to be supported sooner than later.

> +
> +       /* from BPF's point of view, this map is a jump table */
> +       *imm = (unsigned long)insn_set->ips;
> +
> +       return 0;
> +}
> +
>  BTF_ID_LIST_SINGLE(insn_set_btf_ids, struct, bpf_insn_set)
>
>  const struct bpf_map_ops insn_set_map_ops = {
> @@ -181,6 +205,7 @@ const struct bpf_map_ops insn_set_map_ops = {
>         .map_delete_elem = insn_set_delete_elem,
>         .map_check_btf = insn_set_check_btf,
>         .map_mem_usage = insn_set_mem_usage,
> +       .map_direct_value_addr = insn_set_map_direct_value_addr,
>         .map_btf_id = &insn_set_btf_ids[0],
>  };
>
> @@ -217,6 +242,37 @@ static inline bool valid_offsets(const struct bpf_insn_set *insn_set,
>         return true;
>  }
>
> +static int cmp_unique_offsets(const void *a, const void *b)
> +{
> +       return *(u32 *)a - *(u32 *)b;
> +}
> +
> +static int bpf_insn_set_init_unique_offsets(struct bpf_insn_set *insn_set)
> +{
> +       u32 cnt = insn_set->map.max_entries, ucnt = 1;
> +       u32 **off = insn_set->unique_offsets;
> +       int i;
> +
> +       /* [0,3,2,4,6,5,5,5,1,1,0,0] */
> +       for (i = 0; i < cnt; i++)
> +               off[i] = &insn_set->ptrs[i].user_value.xlated_off;
> +
> +       /* [0,0,0,1,1,2,3,4,5,5,5,6] */
> +       sort(off, cnt, sizeof(off[0]), cmp_unique_offsets, NULL);
> +
> +       /*
> +        * [0,1,2,3,4,5,6,x,x,x,x,x]
> +        *  \.........../
> +        *    unique_offsets_cnt
> +        */
> +       for (i = 1; i < cnt; i++)
> +               if (*off[i] != *off[ucnt-1])
> +                       off[ucnt++] = off[i];
> +
> +       insn_set->unique_offsets_cnt = ucnt;
> +       return 0;
> +}


Why bother with this optimization in the kernel?
Shouldn't libbpf give unique already?
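
If libbpf were to do it, the helper is trivial, something like
(untested sketch, helper names made up):

  static int cmp_u32(const void *a, const void *b)
  {
	__u32 x = *(const __u32 *)a, y = *(const __u32 *)b;

	return x < y ? -1 : x > y;
  }

  /* sort the jump-table offsets and drop duplicates, returns new count */
  static __u32 uniq_offsets(__u32 *off, __u32 cnt)
  {
	__u32 i, ucnt = cnt ? 1 : 0;

	qsort(off, cnt, sizeof(off[0]), cmp_u32);
	for (i = 1; i < cnt; i++)
		if (off[i] != off[ucnt - 1])
			off[ucnt++] = off[i];
	return ucnt;
  }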

> +
>  int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog)
>  {
>         struct bpf_insn_set *insn_set = cast_insn_set(map);
> @@ -247,7 +303,10 @@ int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog)
>         for (i = 0; i < map->max_entries; i++)
>                 insn_set->ptrs[i].user_value.xlated_off = insn_set->ptrs[i].orig_xlated_off;
>
> -       return 0;
> +       /*
> +        * Prepare a set of unique offsets
> +        */
> +       return bpf_insn_set_init_unique_offsets(insn_set);
>  }
>
>  int bpf_insn_set_ready(struct bpf_map *map)
> @@ -336,3 +395,13 @@ void bpf_prog_update_insn_ptr(struct bpf_prog *prog,
>                 }
>         }
>  }
> +
> +int bpf_insn_set_iter_xlated_offset(struct bpf_map *map, u32 iter_no)
> +{
> +       struct bpf_insn_set *insn_set = cast_insn_set(map);
> +
> +       if (iter_no >= insn_set->unique_offsets_cnt)
> +               return -ENOENT;
> +
> +       return *insn_set->unique_offsets[iter_no];
> +}
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index e536a34a32c8..058f5f463b74 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -1706,6 +1706,8 @@ bool bpf_opcode_in_insntable(u8 code)
>                 [BPF_LD | BPF_IND | BPF_B] = true,
>                 [BPF_LD | BPF_IND | BPF_H] = true,
>                 [BPF_LD | BPF_IND | BPF_W] = true,
> +               [BPF_JMP | BPF_JA | BPF_X] = true,
> +               [BPF_JMP32 | BPF_JA | BPF_X] = true,
>                 [BPF_JMP | BPF_JCOND] = true,
>         };
>  #undef BPF_INSN_3_TBL
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 8ac9a0b5af53..fba553f844f1 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -206,6 +206,7 @@ static int ref_set_non_owning(struct bpf_verifier_env *env,
>  static void specialize_kfunc(struct bpf_verifier_env *env,
>                              u32 func_id, u16 offset, unsigned long *addr);
>  static bool is_trusted_reg(const struct bpf_reg_state *reg);
> +static int add_used_map(struct bpf_verifier_env *env, int fd, struct bpf_map **map_ptr);
>
>  static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
>  {
> @@ -5648,6 +5649,19 @@ static int check_map_access_type(struct bpf_verifier_env *env, u32 regno,
>         return 0;
>  }
>
> +static int check_insn_set_mem_access(struct bpf_verifier_env *env,
> +                                    const struct bpf_map *map,
> +                                    int off, int size, u32 mem_size)
> +{
> +       if ((off < 0) || (off % sizeof(long)) || (off/sizeof(long) >= map->max_entries))
> +               return -EACCES;
> +
> +       if (mem_size != 8 || size != 8)
> +               return -EACCES;
> +
> +       return 0;
> +}
> +
>  /* check read/write into memory region (e.g., map value, ringbuf sample, etc) */
>  static int __check_mem_access(struct bpf_verifier_env *env, int regno,
>                               int off, int size, u32 mem_size,
> @@ -5666,6 +5680,10 @@ static int __check_mem_access(struct bpf_verifier_env *env, int regno,
>                         mem_size, off, size);
>                 break;
>         case PTR_TO_MAP_VALUE:
> +               if (reg->map_ptr->map_type == BPF_MAP_TYPE_INSN_SET &&
> +                   check_insn_set_mem_access(env, reg->map_ptr, off, size, mem_size) == 0)
> +                       return 0;

Don't hack it like this.
If you're reusing PTR_TO_MAP_VALUE for this then set mem_size correctly
early on.

>                 verbose(env, "invalid access to map value, value_size=%d off=%d size=%d\n",
>                         mem_size, off, size);
>                 break;
> @@ -7713,12 +7731,18 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>  static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type,
>                              bool allow_trust_mismatch);
>
> +static bool map_is_insn_set(struct bpf_map *map)
> +{
> +       return map && map->map_type == BPF_MAP_TYPE_INSN_SET;
> +}
> +
>  static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
>                           bool strict_alignment_once, bool is_ldsx,
>                           bool allow_trust_mismatch, const char *ctx)
>  {
>         struct bpf_reg_state *regs = cur_regs(env);
>         enum bpf_reg_type src_reg_type;
> +       struct bpf_map *map_ptr_copy = NULL;
>         int err;
>
>         /* check src operand */
> @@ -7733,6 +7757,9 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
>
>         src_reg_type = regs[insn->src_reg].type;
>
> +       if (src_reg_type == PTR_TO_MAP_VALUE && map_is_insn_set(regs[insn->src_reg].map_ptr))
> +               map_ptr_copy = regs[insn->src_reg].map_ptr;
> +
>         /* Check if (src_reg + off) is readable. The state of dst_reg will be
>          * updated by this call.
>          */
> @@ -7743,6 +7770,13 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
>                                        allow_trust_mismatch);
>         err = err ?: reg_bounds_sanity_check(env, &regs[insn->dst_reg], ctx);
>
> +       if (map_ptr_copy) {
> +               regs[insn->dst_reg].type = PTR_TO_INSN;
> +               regs[insn->dst_reg].map_ptr = map_ptr_copy;
> +               regs[insn->dst_reg].min_index = regs[insn->src_reg].min_index;
> +               regs[insn->dst_reg].max_index = regs[insn->src_reg].max_index;
> +       }

Not pretty. Let's add another argument to map_direct_value_addr()
and pass regs[value_regno] to it,
so that callback can set the reg.type correctly instead
of defaulting to SCALAR_VALUE like it does today.

Then the callback for insn_array will set it to PTR_TO_INSN.
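
Roughly (just a sketch of the idea, not the exact signature):

  /* in struct bpf_map_ops: pass the destination register to the callback */
  int (*map_direct_value_addr)(const struct bpf_map *map, u64 *imm, u32 off,
			       struct bpf_reg_state *reg);

  static int insn_set_map_direct_value_addr(const struct bpf_map *map, u64 *imm,
					    u32 off, struct bpf_reg_state *reg)
  {
	struct bpf_insn_set *insn_set = cast_insn_set(map);

	if (off > 0)
		return -EINVAL;

	*imm = (unsigned long)insn_set->ips;
	if (reg)
		reg->type = PTR_TO_INSN;  /* instead of the SCALAR_VALUE default */
	return 0;
  }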

> +
>         return err;
>  }
>
> @@ -15296,6 +15330,22 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
>                 return 0;
>         }
>
> +       if (dst_reg->type == PTR_TO_MAP_VALUE && map_is_insn_set(dst_reg->map_ptr)) {
> +               if (opcode != BPF_ADD) {
> +                       verbose(env, "Operation %s on ptr to instruction set map is prohibited\n",
> +                               bpf_alu_string[opcode >> 4]);
> +                       return -EACCES;
> +               }
> +               src_reg = &regs[insn->src_reg];
> +               if (src_reg->type != SCALAR_VALUE) {
> +                       verbose(env, "Adding non-scalar R%d to an instruction ptr is prohibited\n",
> +                               insn->src_reg);
> +                       return -EACCES;
> +               }

Here you need to check src_reg tnum to make sure it 8-byte aligned
or I'm missing where it's done.
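
E.g. something along these lines, using the existing tnum helper:

	if (!tnum_is_aligned(src_reg->var_off, sizeof(long))) {
		verbose(env, "R%d must be 8-byte aligned to be added to a jump table ptr\n",
			insn->src_reg);
		return -EACCES;
	}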

> +               dst_reg->min_index = src_reg->umin_value / sizeof(long);
> +               dst_reg->max_index = src_reg->umax_value / sizeof(long);

Why bother consuming memory with these two fields if they are derivative ?

> +       }
> +
>         if (dst_reg->type != SCALAR_VALUE)
>                 ptr_reg = dst_reg;
>
> @@ -16797,6 +16847,11 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>                         __mark_reg_unknown(env, dst_reg);
>                         return 0;
>                 }
> +               if (map->map_type == BPF_MAP_TYPE_INSN_SET) {
> +                       dst_reg->type = PTR_TO_MAP_VALUE;
> +                       dst_reg->off = aux->map_off;
> +                       return 0;
> +               }
>                 dst_reg->type = PTR_TO_MAP_VALUE;
>                 dst_reg->off = aux->map_off;
>                 WARN_ON_ONCE(map->max_entries != 1);

Instead of copy pasting two lines, make WARN conditional.
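
I.e., roughly:

	dst_reg->type = PTR_TO_MAP_VALUE;
	dst_reg->off = aux->map_off;
	WARN_ON_ONCE(map->map_type != BPF_MAP_TYPE_INSN_SET &&
		     map->max_entries != 1);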

> @@ -17552,6 +17607,62 @@ static int mark_fastcall_patterns(struct bpf_verifier_env *env)
>         return 0;
>  }
>
> +#define SET_HIGH(STATE, LAST)  STATE = (STATE & 0xffffU) | ((LAST) << 16)
> +#define GET_HIGH(STATE)                ((u16)((STATE) >> 16))
> +
> +static int gotox_sanity_check(struct bpf_verifier_env *env, int from, int to)
> +{
> +       /* TBD: check that to belongs to the same BPF function && whatever else */
> +
> +       return 0;
> +}
> +
> +static int push_goto_x_edge(int t, struct bpf_verifier_env *env, struct bpf_map *map)
> +{
> +       int *insn_stack = env->cfg.insn_stack;
> +       int *insn_state = env->cfg.insn_state;
> +       u16 prev_edge = GET_HIGH(insn_state[t]);
> +       int err;
> +       int w;
> +
> +       w = bpf_insn_set_iter_xlated_offset(map, prev_edge);

I don't quite understand the algorithm.
Pls expand the comment.

Also insn_successors() needs to support gotox as well.
It's used by liveness and by scc.

> +       if (w == -ENOENT)
> +               return DONE_EXPLORING;
> +       else if (w < 0)
> +               return w;
> +
> +       err = gotox_sanity_check(env, t, w);
> +       if (err)
> +               return err;
> +
> +       mark_prune_point(env, t);
> +
> +       if (env->cfg.cur_stack >= env->prog->len)
> +               return -E2BIG;
> +       insn_stack[env->cfg.cur_stack++] = w;
> +
> +       mark_jmp_point(env, w);
> +
> +       SET_HIGH(insn_state[t], prev_edge + 1);
> +       return KEEP_EXPLORING;
> +}
> +
> +/* "conditional jump with N edges" */
> +static int visit_goto_x_insn(int t, struct bpf_verifier_env *env, int fd)
> +{
> +       struct bpf_map *map;
> +       int ret;
> +
> +       ret = add_used_map(env, fd, &map);
> +       if (ret < 0)
> +               return ret;
> +
> +       if (map->map_type != BPF_MAP_TYPE_INSN_SET)
> +               return -EINVAL;
> +
> +       return push_goto_x_edge(t, env, map);
> +}
> +
>  /* Visits the instruction at index t and returns one of the following:
>   *  < 0 - an error occurred
>   *  DONE_EXPLORING - the instruction was fully explored
> @@ -17642,8 +17753,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
>                 return visit_func_call_insn(t, insns, env, insn->src_reg == BPF_PSEUDO_CALL);
>
>         case BPF_JA:
> -               if (BPF_SRC(insn->code) != BPF_K)
> -                       return -EINVAL;
> +               if (BPF_SRC(insn->code) == BPF_X)
> +                       return visit_goto_x_insn(t, env, insn->imm);

There should be a check somewhere during the main pass of the
verifier that insn->imm refers to the same insn_array map.

>
>                 if (BPF_CLASS(insn->code) == BPF_JMP)
>                         off = insn->off;
> @@ -17674,6 +17785,13 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
>         }
>  }
>
> +static bool insn_is_gotox(struct bpf_insn *insn)
> +{
> +       return BPF_CLASS(insn->code) == BPF_JMP &&
> +              BPF_OP(insn->code) == BPF_JA &&
> +              BPF_SRC(insn->code) == BPF_X;
> +}
> +
>  /* non-recursive depth-first-search to detect loops in BPF program
>   * loop == back-edge in directed graph
>   */
> @@ -18786,11 +18904,22 @@ static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_stat
>                               struct bpf_func_state *cur, u32 insn_idx, enum exact_level exact)
>  {
>         u16 live_regs = env->insn_aux_data[insn_idx].live_regs_before;
> +       struct bpf_insn *insn;
>         u16 i;
>
>         if (old->callback_depth > cur->callback_depth)
>                 return false;
>
> +       insn = &env->prog->insnsi[insn_idx];
> +       if (insn_is_gotox(insn)) {

func_states_equal() shouldn't look back into insn_idx.
It should use what's in bpf_func_state.

> +               struct bpf_reg_state *old_dst = &old->regs[insn->dst_reg];
> +               struct bpf_reg_state *cur_dst = &cur->regs[insn->dst_reg];
> +
> +               if (old_dst->min_index != cur_dst->min_index ||
> +                   old_dst->max_index != cur_dst->max_index)
> +                       return false;

Doesn't look right. It should properly compare two PTR_TO_INSN.

> +       }
> +
>         for (i = 0; i < MAX_BPF_REG; i++)
>                 if (((1 << i) & live_regs) &&
>                     !regsafe(env, &old->regs[i], &cur->regs[i],
> @@ -19654,6 +19783,55 @@ static int process_bpf_exit_full(struct bpf_verifier_env *env,
>         return PROCESS_BPF_EXIT;
>  }
>
> +static int check_indirect_jump(struct bpf_verifier_env *env, struct bpf_insn *insn)
> +{
> +       struct bpf_verifier_state *other_branch;
> +       struct bpf_reg_state *dst_reg;
> +       struct bpf_map *map;
> +       int xoff;
> +       int err;
> +       u32 i;
> +
> +       /* this map should already have been added */
> +       err = add_used_map(env, insn->imm, &map);

Found that check.
Let's not abuse add_used_map() for that.
Remember the map pointer during resolve_pseudo_ldimm64()
in insn_aux_data for the gotox insn.
No need to call add_used_map() so late.
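
Roughly (jt_map_index is a made-up field; maybe the existing
map-related aux state can be reused instead):

	/* in resolve_pseudo_ldimm64(), when a gotox insn's map fd is resolved: */
	env->insn_aux_data[i].jt_map_index = map_idx;	/* jt_map_index: new (hypothetical) field */

	/* and here, instead of add_used_map(): */
	map = env->used_maps[env->insn_aux_data[env->insn_idx].jt_map_index];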

> +       if (err < 0)
> +               return err;
> +
> +       dst_reg = reg_state(env, insn->dst_reg);
> +       if (dst_reg->type != PTR_TO_INSN) {
> +               verbose(env, "BPF_JA|BPF_X R%d has type %d, expected PTR_TO_INSN\n",
> +                               insn->dst_reg, dst_reg->type);
> +               return -EINVAL;
> +       }
> +
> +       if (dst_reg->map_ptr != map) {

and here it would compare dst_reg->map_ptr with env->used_maps[aux->map_index]

> +               verbose(env, "BPF_JA|BPF_X R%d was loaded from map id=%u, expected id=%u\n",
> +                               insn->dst_reg, dst_reg->map_ptr->id, map->id);
> +               return -EINVAL;
> +       }
> +
> +       if (dst_reg->max_index >= map->max_entries)
> +               return -EINVAL;
> +
> +       for (i = dst_reg->min_index + 1; i <= dst_reg->max_index; i++) {
> +               xoff = bpf_insn_set_iter_xlated_offset(map, i);
> +               if (xoff == -ENOENT)
> +                       break;
> +               if (xoff < 0)
> +                       return xoff;
> +
> +               other_branch = push_stack(env, xoff, env->insn_idx, false);
> +               if (!other_branch)
> +                       return -EFAULT;
> +       }
> +
> +       env->insn_idx = bpf_insn_set_iter_xlated_offset(map, dst_reg->min_index);
> +       if (env->insn_idx < 0)
> +               return env->insn_idx;
> +
> +       return 0;
> +}
> +
>  static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
>  {
>         int err;
> @@ -19756,6 +19934,9 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
>
>                         mark_reg_scratched(env, BPF_REG_0);
>                 } else if (opcode == BPF_JA) {
> +                       if (BPF_SRC(insn->code) == BPF_X)
> +                               return check_indirect_jump(env, insn);
> +
>                         if (BPF_SRC(insn->code) != BPF_K ||
>                             insn->src_reg != BPF_REG_0 ||
>                             insn->dst_reg != BPF_REG_0 ||
> @@ -20243,6 +20424,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
>                 case BPF_MAP_TYPE_QUEUE:
>                 case BPF_MAP_TYPE_STACK:
>                 case BPF_MAP_TYPE_ARENA:
> +               case BPF_MAP_TYPE_INSN_SET:
>                         break;
>                 default:
>                         verbose(env,
> @@ -20330,10 +20512,11 @@ static int __add_used_map(struct bpf_verifier_env *env, struct bpf_map *map)
>   * its index.
>   * Returns <0 on error, or >= 0 index, on success.
>   */
> -static int add_used_map(struct bpf_verifier_env *env, int fd)
> +static int add_used_map(struct bpf_verifier_env *env, int fd, struct bpf_map **map_ptr)

no need.

>  {
>         struct bpf_map *map;
>         CLASS(fd, f)(fd);
> +       int ret;
>
>         map = __bpf_map_get(f);
>         if (IS_ERR(map)) {
> @@ -20341,7 +20524,10 @@ static int add_used_map(struct bpf_verifier_env *env, int fd)
>                 return PTR_ERR(map);
>         }
>
> -       return __add_used_map(env, map);
> +       ret = __add_used_map(env, map);
> +       if (ret >= 0 && map_ptr)
> +               *map_ptr = map;
> +       return ret;
>  }
>
>  /* find and rewrite pseudo imm in ld_imm64 instructions:
> @@ -20435,7 +20621,7 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
>                                 break;
>                         }
>
> -                       map_idx = add_used_map(env, fd);
> +                       map_idx = add_used_map(env, fd, NULL);
>                         if (map_idx < 0)
>                                 return map_idx;
>                         map = env->used_maps[map_idx];
> @@ -21459,6 +21645,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>                 func[i]->aux->jited_linfo = prog->aux->jited_linfo;
>                 func[i]->aux->linfo_idx = env->subprog_info[i].linfo_idx;
>                 func[i]->aux->arena = prog->aux->arena;
> +               func[i]->aux->used_maps = env->used_maps;
> +               func[i]->aux->used_map_cnt = env->used_map_cnt;
>                 num_exentries = 0;
>                 insn = func[i]->insnsi;
>                 for (j = 0; j < func[i]->len; j++, insn++) {
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-06-15  8:59 ` [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps Anton Protopopov
@ 2025-06-18  3:22   ` Alexei Starovoitov
  2025-06-18 15:08     ` Anton Protopopov
  2025-07-08 20:59     ` Eduard Zingerman
  2025-06-18 19:49   ` Eduard Zingerman
  1 sibling, 2 replies; 63+ messages in thread
From: Alexei Starovoitov @ 2025-06-18  3:22 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
>
> The final line generates an indirect jump. The
> format of the indirect jump instruction supported by BPF is
>
>     BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=fd(M)
>
> and, obviously, the map M must be the same map which was used to
> init the register rX. This patch implements this in the following
> way, which is hacky but so far suitable for all existing use cases:
> on encountering a `gotox` instruction, libbpf tracks back to the
> previous direct load from a map and stores that map's file
> descriptor in the gotox instruction.

...

> +/*
> + * This one is too dumb, of course. TBD to make it smarter.
> + */
> +static int find_jt_map_fd(struct bpf_program *prog, int insn_idx)
> +{
> +       struct bpf_insn *insn = &prog->insns[insn_idx];
> +       __u8 dst_reg = insn->dst_reg;
> +
> +       /* TBD: this function is such smart for now that it even ignores this
> +        * register. Instead, it should backtrack the load more carefully.
> +        * (So far even this dumb version works with all selftests.)
> +        */
> +       pr_debug("searching for a load instruction which populated dst_reg=r%u\n", dst_reg);
> +
> +       while (--insn >= prog->insns) {
> +               if (insn->code == (BPF_LD|BPF_DW|BPF_IMM))
> +                       return insn[0].imm;
> +       }
> +
> +       return -ENOENT;
> +}
> +
> +static int bpf_object__patch_gotox(struct bpf_object *obj, struct bpf_program *prog)
> +{
> +       struct bpf_insn *insn = prog->insns;
> +       int map_fd;
> +       int i;
> +
> +       for (i = 0; i < prog->insns_cnt; i++, insn++) {
> +               if (!insn_is_gotox(insn))
> +                       continue;
> +
> +               if (obj->gen_loader)
> +                       return -EFAULT;
> +
> +               map_fd = find_jt_map_fd(prog, i);
> +               if (map_fd < 0)
> +                       return map_fd;
> +
> +               insn->imm = map_fd;
> +       }

This is obviously broken and cannot be made smarter in libbpf.
It won't be doing data flow analysis.

The only option I see is to teach llvm to tag jmp_table in gotox.
Probably the simplest way is to add the same relo to gotox insn
as for ld_imm64. Then libbpf has a direct way to assign
the same map_fd into both ld_imm64 and gotox.

An uglier alternative is to redesign the gotox encoding and
drop the ld_imm64 and *=8 altogether.
Then gotox jmp_table[R5] would be like a jumbo insn that
does the *=8 and the load inside, and the JIT emits all that.
But it's ugly and likely has other downsides.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 9/9] selftests/bpf: add selftests for indirect jumps
  2025-06-15  8:59 ` [RFC bpf-next 9/9] selftests/bpf: add selftests for " Anton Protopopov
@ 2025-06-18  3:24   ` Alexei Starovoitov
  2025-06-18 14:49     ` Anton Protopopov
  0 siblings, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-06-18  3:24 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
> +SEC("syscall")
> +int two_towers(struct simple_ctx *ctx)
> +{
> +       switch (ctx->x) {
>

Not sure why you went with switch() statements everywhere.
Please add a few tests with an explicit indirect goto,
like the interpreter does: goto *jumptable[insn->code];
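
Something like this, for example (untested sketch; assumes the custom
LLVM handles the &&label construct, and reuses struct simple_ctx from
the existing tests):

  SEC("syscall")
  int computed_goto(struct simple_ctx *ctx)
  {
	static const void *jumptable[] = { &&do_a, &&do_b, &&do_c };
	__u64 i = ctx->x;

	if (i >= sizeof(jumptable) / sizeof(jumptable[0]))
		return -1;
	goto *jumptable[i];
  do_a:
	return 1;
  do_b:
	return 2;
  do_c:
	return 3;
  }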

Remove all bpf_printk() too and go easy on the names.
i_am_a_little_tiny_foo() sounds funny today, but
it won't be funny at all tomorrow.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 5/9] bpf, x86: add support for indirect jumps
  2025-06-15  8:59 ` [RFC bpf-next 5/9] bpf, x86: add support for indirect jumps Anton Protopopov
  2025-06-18  3:06   ` Alexei Starovoitov
@ 2025-06-18 11:03   ` Eduard Zingerman
  2025-06-19 20:13     ` Anton Protopopov
  1 sibling, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-06-18 11:03 UTC (permalink / raw)
  To: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:

[...]

>     0:   r3 = r1                    # "switch (r3)"
>     1:   if r3 > 0x13 goto +0x666   # check r3 boundaries
>     2:   r3 <<= 0x3                 # r3 is void*, point to an address
>     3:   r1 = 0xbeef ll             # r1 is PTR_TO_MAP_VALUE, r1->map_ptr=M
>     5:   r1 += r3                   # r1 inherits boundaries from r3
>     6:   r1 = *(u64 *)(r1 + 0x0)    # r1 now has type INSN_TO_PTR
                                                        ^^^^^^^^^^^
                                                        PTR_TO_INSN?

>     7:   gotox r1[,imm=fd(M)]       # verifier checks that M == r1->map_ptr

[...]

> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 37dc83d91832..d20f6775605d 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -2520,6 +2520,13 @@ st:			if (is_imm8(insn->off))
>  
>  			break;
>  
> +		case BPF_JMP | BPF_JA | BPF_X:
> +		case BPF_JMP32 | BPF_JA | BPF_X:

Is it necessary to add both JMP and JMP32 versions?
Do we need to extend e.g. bpf_jit_supports_insn() and report an error
in verifier.c, or should we rely on individual JITs to report an
unknown instruction?

> +			emit_indirect_jump(&prog,
> +					   reg2hex[insn->dst_reg],
> +					   is_ereg(insn->dst_reg),
> +					   image + addrs[i - 1]);
> +			break;
>  		case BPF_JMP | BPF_JA:
>  		case BPF_JMP32 | BPF_JA:
>  			if (BPF_CLASS(insn->code) == BPF_JMP) {
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 008bcd44c60e..3c5eaea2b476 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -952,6 +952,7 @@ enum bpf_reg_type {
>  	PTR_TO_ARENA,
>  	PTR_TO_BUF,		 /* reg points to a read/write buffer */
>  	PTR_TO_FUNC,		 /* reg points to a bpf program function */
> +	PTR_TO_INSN,		 /* reg points to a bpf program instruction */
>  	CONST_PTR_TO_DYNPTR,	 /* reg points to a const struct bpf_dynptr */
>  	__BPF_REG_TYPE_MAX,
>  
> @@ -3601,6 +3602,7 @@ int bpf_insn_set_ready(struct bpf_map *map);
>  void bpf_insn_set_release(struct bpf_map *map);
>  void bpf_insn_set_adjust(struct bpf_map *map, u32 off, u32 len);
>  void bpf_insn_set_adjust_after_remove(struct bpf_map *map, u32 off, u32 len);
> +int bpf_insn_set_iter_xlated_offset(struct bpf_map *map, u32 iter_no);

This is a horrible name:
- this function is not an iterator;
- it is way too long.

Maybe make it a bit more complex but convenient to use, e.g.:

  struct bpf_iarray_iter {
	struct bpf_map *map;
	u32 idx;
  };

  struct bpf_iarray_iter bpf_iarray_make_iter(struct bpf_map *map, u32 lo, u32 hi);
  bool bpf_iarray_iter_next(struct bpf_iarray_iter *it, u32 *offset); // still a horrible name

This would hide the manipulation of the unique indices from verifier.c.

?
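
On the verifier side the loop in check_indirect_jump() would then look
roughly like this (sketch, using the names above):

  struct bpf_iarray_iter it;
  u32 xoff;

  it = bpf_iarray_make_iter(map, dst_reg->min_index, dst_reg->max_index);
  while (bpf_iarray_iter_next(&it, &xoff)) {
	if (!push_stack(env, xoff, env->insn_idx, false))
		return -ENOMEM;
  }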

>  
>  struct bpf_insn_ptr {
>  	void *jitted_ip;
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 84b5e6b25c52..80d9afcca488 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -229,6 +229,10 @@ struct bpf_reg_state {
>  	enum bpf_reg_liveness live;
>  	/* if (!precise && SCALAR_VALUE) min/max/tnum don't affect safety */
>  	bool precise;
> +
> +	/* Used to track boundaries of a PTR_TO_INSN */
> +	u32 min_index;
> +	u32 max_index;

Use {umin,umax}_value instead?

>  };
>  
>  enum bpf_stack_slot_type {
> diff --git a/kernel/bpf/bpf_insn_set.c b/kernel/bpf/bpf_insn_set.c
> index c20e99327118..316cecad60a9 100644
> --- a/kernel/bpf/bpf_insn_set.c
> +++ b/kernel/bpf/bpf_insn_set.c
> @@ -9,6 +9,8 @@ struct bpf_insn_set {
>  	struct bpf_map map;
>  	struct mutex state_mutex;
>  	int state;
> +	u32 **unique_offsets;

Why is this a pointer to pointer?
bpf_insn_set_iter_xlated_offset() is only used during check_cfg() and
the main verification pass. At that point no instruction movement has
occurred yet, so there is no need to track
`&insn_set->ptrs[i].user_value.xlated_off`?

> +	u32 unique_offsets_cnt;
>  	long *ips;
>  	DECLARE_FLEX_ARRAY(struct bpf_insn_ptr, ptrs);
>  };

[...]

> @@ -15296,6 +15330,22 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
>  		return 0;
>  	}
>  
> +	if (dst_reg->type == PTR_TO_MAP_VALUE && map_is_insn_set(dst_reg->map_ptr)) {
> +		if (opcode != BPF_ADD) {
> +			verbose(env, "Operation %s on ptr to instruction set map is prohibited\n",
> +				bpf_alu_string[opcode >> 4]);
> +			return -EACCES;
> +		}
> +		src_reg = &regs[insn->src_reg];
> +		if (src_reg->type != SCALAR_VALUE) {
> +			verbose(env, "Adding non-scalar R%d to an instruction ptr is prohibited\n",
> +				insn->src_reg);
> +			return -EACCES;
> +		}
> +		dst_reg->min_index = src_reg->umin_value / sizeof(long);
> +		dst_reg->max_index = src_reg->umax_value / sizeof(long);
> +	}
> +

What if there are several BPF_ADD on the same PTR_TO_MAP_VALUE in a row?
Shouldn't the {min,max}_index be accumulated in that case?

Nit: this should be handled inside adjust_ptr_min_max_vals().

>  	if (dst_reg->type != SCALAR_VALUE)
>  		ptr_reg = dst_reg;
>  

[...]

> @@ -17552,6 +17607,62 @@ static int mark_fastcall_patterns(struct bpf_verifier_env *env)

[...]

> +/* "conditional jump with N edges" */
> +static int visit_goto_x_insn(int t, struct bpf_verifier_env *env, int fd)
> +{
> +	struct bpf_map *map;
> +	int ret;
> +
> +	ret = add_used_map(env, fd, &map);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (map->map_type != BPF_MAP_TYPE_INSN_SET)
> +		return -EINVAL;

Nit: print something in the log?

> +
> +	return push_goto_x_edge(t, env, map);
> +}
> +

[...]

> @@ -18786,11 +18904,22 @@ static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_stat
>  			      struct bpf_func_state *cur, u32 insn_idx, enum exact_level exact)
>  {
>  	u16 live_regs = env->insn_aux_data[insn_idx].live_regs_before;
> +	struct bpf_insn *insn;
>  	u16 i;
>  
>  	if (old->callback_depth > cur->callback_depth)
>  		return false;
>  
> +	insn = &env->prog->insnsi[insn_idx];
> +	if (insn_is_gotox(insn)) {
> +		struct bpf_reg_state *old_dst = &old->regs[insn->dst_reg];
> +		struct bpf_reg_state *cur_dst = &cur->regs[insn->dst_reg];
> +
> +		if (old_dst->min_index != cur_dst->min_index ||
> +		    old_dst->max_index != cur_dst->max_index)
> +			return false;
> +	}
> +

Concur with Alexei, this should be handled by regsafe().
Also, having cur_dst as a subset of old_dst should be fine.

>  	for (i = 0; i < MAX_BPF_REG; i++)
>  		if (((1 << i) & live_regs) &&
>  		    !regsafe(env, &old->regs[i], &cur->regs[i],
> @@ -19654,6 +19783,55 @@ static int process_bpf_exit_full(struct bpf_verifier_env *env,
>  	return PROCESS_BPF_EXIT;
>  }
>  
> +static int check_indirect_jump(struct bpf_verifier_env *env, struct bpf_insn *insn)
> +{
> +	struct bpf_verifier_state *other_branch;
> +	struct bpf_reg_state *dst_reg;
> +	struct bpf_map *map;
> +	int xoff;
> +	int err;
> +	u32 i;
> +
> +	/* this map should already have been added */
> +	err = add_used_map(env, insn->imm, &map);
> +	if (err < 0)
> +		return err;
> +
> +	dst_reg = reg_state(env, insn->dst_reg);
> +	if (dst_reg->type != PTR_TO_INSN) {
> +		verbose(env, "BPF_JA|BPF_X R%d has type %d, expected PTR_TO_INSN\n",
> +				insn->dst_reg, dst_reg->type);
> +		return -EINVAL;
> +	}
> +
> +	if (dst_reg->map_ptr != map) {
> +		verbose(env, "BPF_JA|BPF_X R%d was loaded from map id=%u, expected id=%u\n",
> +				insn->dst_reg, dst_reg->map_ptr->id, map->id);
> +		return -EINVAL;
> +	}
> +
> +	if (dst_reg->max_index >= map->max_entries)
> +		return -EINVAL;
> +
> +	for (i = dst_reg->min_index + 1; i <= dst_reg->max_index; i++) {

Why is the +1 needed in `i = dst_reg->min_index + 1`?

> +		xoff = bpf_insn_set_iter_xlated_offset(map, i);
> +		if (xoff == -ENOENT)
> +			break;
> +		if (xoff < 0)
> +			return xoff;
> +
> +		other_branch = push_stack(env, xoff, env->insn_idx, false);
> +		if (!other_branch)
> +			return -EFAULT;

Nit: `return -ENOMEM`.

> +	}
> +
> +	env->insn_idx = bpf_insn_set_iter_xlated_offset(map, dst_reg->min_index);
> +	if (env->insn_idx < 0)
> +		return env->insn_idx;
> +
> +	return 0;
> +}
> +
>  static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
>  {
>  	int err;

[...]


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 6/9] bpf: workaround llvm behaviour with indirect jumps
  2025-06-15  8:59 ` [RFC bpf-next 6/9] bpf: workaround llvm behaviour with " Anton Protopopov
@ 2025-06-18 11:04   ` Eduard Zingerman
  2025-06-18 13:59     ` Alexei Starovoitov
  0 siblings, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-06-18 11:04 UTC (permalink / raw)
  To: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:
> When indirect jumps are enabled in LLVM, it might generate
> unreachable instructions. For example, the following code
> 
>     SEC("syscall") int foo(struct simple_ctx *ctx)
>     {
>             switch (ctx->x) {
>             case 0:
>                     ret_user = 2;
>                     break;
>             case 11:
>                     ret_user = 3;
>                     break;
>             case 27:
>                     ret_user = 4;
>                     break;
>             case 31:
>                     ret_user = 5;
>                     break;
>             default:
>                     ret_user = 19;
>                     break;
>             }
> 
>             return 0;
>     }
> 
> compiles into
> 
>     <foo>:
>     ;       switch (ctx->x) {
>          224:       79 11 00 00 00 00 00 00 r1 = *(u64 *)(r1 + 0x0)
>          225:       25 01 0f 00 1f 00 00 00 if r1 > 0x1f goto +0xf <foo+0x88>
>          226:       67 01 00 00 03 00 00 00 r1 <<= 0x3
>          227:       18 02 00 00 a8 00 00 00 00 00 00 00 00 00 00 00 r2 = 0xa8 ll
>                     0000000000000718:  R_BPF_64_64  .rodata
>          229:       0f 12 00 00 00 00 00 00 r2 += r1
>          230:       79 21 00 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x0)
>          231:       0d 01 00 00 00 00 00 00 gotox r1
>          232:       05 00 08 00 00 00 00 00 goto +0x8 <foo+0x88>
>          233:       b7 01 00 00 02 00 00 00 r1 = 0x2
>     ;       switch (ctx->x) {
>          234:       05 00 07 00 00 00 00 00 goto +0x7 <foo+0x90>
>          235:       b7 01 00 00 04 00 00 00 r1 = 0x4
>     ;               break;
>          236:       05 00 05 00 00 00 00 00 goto +0x5 <foo+0x90>
>          237:       b7 01 00 00 03 00 00 00 r1 = 0x3
>     ;               break;
>          238:       05 00 03 00 00 00 00 00 goto +0x3 <foo+0x90>
>          239:       b7 01 00 00 05 00 00 00 r1 = 0x5
>     ;               break;
>          240:       05 00 01 00 00 00 00 00 goto +0x1 <foo+0x90>
>          241:       b7 01 00 00 13 00 00 00 r1 = 0x13
>          242:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
>                     0000000000000790:  R_BPF_64_64  ret_user
>          244:       7b 12 00 00 00 00 00 00 *(u64 *)(r2 + 0x0) = r1
>     ;       return 0;
>          245:       b4 00 00 00 00 00 00 00 w0 = 0x0
>          246:       95 00 00 00 00 00 00 00 exit
> 
> The jump table is
> 
>     242, 241, 241, 241, 241, 241, 241, 241,
>     241, 241, 241, 237, 241, 241, 241, 241,
>     241, 241, 241, 241, 241, 241, 241, 241,
>     241, 241, 241, 235, 241, 241, 241, 239
> 
> The check
> 
>     225:       25 01 0f 00 1f 00 00 00 if r1 > 0x1f goto +0xf <foo+0x88>
> 
> makes sure that the r1 register is always loaded from the jump table.
> This makes the instruction
> 
>     232:       05 00 08 00 00 00 00 00 goto +0x8 <foo+0x88>
> 
> unreachable.
> 
> Patch verifier to ignore such unreachable JA instructions.
> 
> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> ---

This should be possible to handle on LLVM side, no need to deal with
it in the kernel.

[...]



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 3/9] selftests/bpf: add selftests for new insn_set map
  2025-06-15  8:59 ` [RFC bpf-next 3/9] selftests/bpf: add selftests for new insn_set map Anton Protopopov
@ 2025-06-18 11:04   ` Eduard Zingerman
  2025-06-18 15:16     ` Anton Protopopov
  0 siblings, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-06-18 11:04 UTC (permalink / raw)
  To: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:
> Tests are split in two parts.
> 
> The `bpf_insn_set_ops` test checks that the map is managed properly:
> 
>   * Incorrect instruction indexes are rejected
>   * Non-sorted and non-unique indexes are rejected
>   * Unfrozen maps are not accepted
>   * Two programs can't use the same map
>   * BPF progs can't operate the map
> 
> The `bpf_insn_set_reloc` part validates, as best as it can do it from user
> space, that instructions are relocated properly:
> 
>   * no relocations => map is the same
>   * expected relocations when instructions are added
>   * expected relocations when instructions are deleted
>   * expected relocations when multiple functions are present
> 
> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> ---

Nit: term "relocation" is ambiguous, in BPF context first thing that
     comes to mind are ELF relocations that allow CO-RE to work.

[...]



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 6/9] bpf: workaround llvm behaviour with indirect jumps
  2025-06-18 11:04   ` Eduard Zingerman
@ 2025-06-18 13:59     ` Alexei Starovoitov
  0 siblings, 0 replies; 63+ messages in thread
From: Alexei Starovoitov @ 2025-06-18 13:59 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Wed, Jun 18, 2025 at 4:04 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:
> > When indirect jumps are enabled in LLVM, it might generate
> > unreachable instructions. For example, the following code
> >
> >     SEC("syscall") int foo(struct simple_ctx *ctx)
> >     {
> >             switch (ctx->x) {
> >             case 0:
> >                     ret_user = 2;
> >                     break;
> >             case 11:
> >                     ret_user = 3;
> >                     break;
> >             case 27:
> >                     ret_user = 4;
> >                     break;
> >             case 31:
> >                     ret_user = 5;
> >                     break;
> >             default:
> >                     ret_user = 19;
> >                     break;
> >             }
> >
> >             return 0;
> >     }
> >
> > compiles into
> >
> >     <foo>:
> >     ;       switch (ctx->x) {
> >          224:       79 11 00 00 00 00 00 00 r1 = *(u64 *)(r1 + 0x0)
> >          225:       25 01 0f 00 1f 00 00 00 if r1 > 0x1f goto +0xf <foo+0x88>
> >          226:       67 01 00 00 03 00 00 00 r1 <<= 0x3
> >          227:       18 02 00 00 a8 00 00 00 00 00 00 00 00 00 00 00 r2 = 0xa8 ll
> >                     0000000000000718:  R_BPF_64_64  .rodata
> >          229:       0f 12 00 00 00 00 00 00 r2 += r1
> >          230:       79 21 00 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x0)
> >          231:       0d 01 00 00 00 00 00 00 gotox r1
> >          232:       05 00 08 00 00 00 00 00 goto +0x8 <foo+0x88>
> >          233:       b7 01 00 00 02 00 00 00 r1 = 0x2
> >     ;       switch (ctx->x) {
> >          234:       05 00 07 00 00 00 00 00 goto +0x7 <foo+0x90>
> >          235:       b7 01 00 00 04 00 00 00 r1 = 0x4
> >     ;               break;
> >          236:       05 00 05 00 00 00 00 00 goto +0x5 <foo+0x90>
> >          237:       b7 01 00 00 03 00 00 00 r1 = 0x3
> >     ;               break;
> >          238:       05 00 03 00 00 00 00 00 goto +0x3 <foo+0x90>
> >          239:       b7 01 00 00 05 00 00 00 r1 = 0x5
> >     ;               break;
> >          240:       05 00 01 00 00 00 00 00 goto +0x1 <foo+0x90>
> >          241:       b7 01 00 00 13 00 00 00 r1 = 0x13
> >          242:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
> >                     0000000000000790:  R_BPF_64_64  ret_user
> >          244:       7b 12 00 00 00 00 00 00 *(u64 *)(r2 + 0x0) = r1
> >     ;       return 0;
> >          245:       b4 00 00 00 00 00 00 00 w0 = 0x0
> >          246:       95 00 00 00 00 00 00 00 exit
> >
> > The jump table is
> >
> >     242, 241, 241, 241, 241, 241, 241, 241,
> >     241, 241, 241, 237, 241, 241, 241, 241,
> >     241, 241, 241, 241, 241, 241, 241, 241,
> >     241, 241, 241, 235, 241, 241, 241, 239
> >
> > The check
> >
> >     225:       25 01 0f 00 1f 00 00 00 if r1 > 0x1f goto +0xf <foo+0x88>
> >
> > makes sure that the r1 register is always loaded from the jump table.
> > This makes the instruction
> >
> >     232:       05 00 08 00 00 00 00 00 goto +0x8 <foo+0x88>
> >
> > unreachable.
> >
> > Patch verifier to ignore such unreachable JA instructions.
> >
> > Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> > ---
>
> This should be possible to handle on LLVM side, no need to deal with
> it in the kernel.

I think Yonghong already looked at it and it wasn't trivial.
I feel like I've seen this pattern with x86 code too, so it may be deep.
The kernel-side workaround looks trivial enough,
but if llvm can actually be fixed then it's certainly better.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 4/9] bpf, x86: allow indirect jumps to r8...r15
  2025-06-17 19:41   ` Alexei Starovoitov
@ 2025-06-18 14:28     ` Anton Protopopov
  0 siblings, 0 replies; 63+ messages in thread
From: Anton Protopopov @ 2025-06-18 14:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On 25/06/17 12:41PM, Alexei Starovoitov wrote:
> On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> >
> > Currently, the emit_indirect_jump() function only accepts one of the
> > RAX, RCX, ..., RBP registers as the destination. Prepare it to accept
> > R8, R9, ..., R15 as well. This is necessary to enable indirect jumps
> > support in eBPF.
> >
> > Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> > ---
> >  arch/x86/net/bpf_jit_comp.c | 26 +++++++++++++++++++-------
> >  1 file changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 923c38f212dc..37dc83d91832 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -659,7 +659,19 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
> >
> >  #define EMIT_LFENCE()  EMIT3(0x0F, 0xAE, 0xE8)
> >
> > -static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
> > +static void __emit_indirect_jump(u8 **pprog, int reg, bool ereg)
> 
> Instead of adding bool flag make reg to be bpf reg
> instead of x86 reg, tweak the signature to
> emit_indirect_jump(..., u32 reg, ..),
> and add is_ereg(reg) inside.

Ok, will do. I didn't do it initially because it assumes this change
(and another similar one):

-       emit_indirect_jump(&prog, 1 /* rcx */, ip + (prog - start));
+       emit_indirect_jump(&prog, BPF_REG_4 /* R4 -> rcx */, ip + (prog - start));

but with the "R4 -> rcx" actually looks ok to me.

> Also drop RFC tag next time. Let CI do the work.

Ok

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 9/9] selftests/bpf: add selftests for indirect jumps
  2025-06-18  3:24   ` Alexei Starovoitov
@ 2025-06-18 14:49     ` Anton Protopopov
  2025-06-18 16:01       ` Alexei Starovoitov
  0 siblings, 1 reply; 63+ messages in thread
From: Anton Protopopov @ 2025-06-18 14:49 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On 25/06/17 08:24PM, Alexei Starovoitov wrote:
> On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> > +SEC("syscall")
> > +int two_towers(struct simple_ctx *ctx)
> > +{
> > +       switch (ctx->x) {
> >
> 
> Not sure why you went with switch() statements everywhere.
> Please add a few tests with an explicit indirect goto,
> like the interpreter does: goto *jumptable[insn->code];

This requires patching libbpf a bit more, as some meta-info
accompanying this instruction has to be emitted, like LLVM does with
jump_table_sizes. And this should probably go in a different section,
so that it doesn't conflict with LLVM/GCC. I thought I'd add this
later, but will try to include it in the next version.

> Remove all bpf_printk() too and go easy on the names.

The `bpf_printk` is there to emit some instructions which the verifier
will later replace with more instructions; this additionally tests the
basic "instruction set" functionality (the orig->xlated mapping). Do
you think this selftest shouldn't have this?

> i_am_a_little_tiny_foo() sounds funny today, but
> it won't be funny at all tomorrow.

Yeah, thanks, will rename it.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-06-18  3:22   ` Alexei Starovoitov
@ 2025-06-18 15:08     ` Anton Protopopov
  2025-07-07 23:45       ` Eduard Zingerman
  2025-07-08 20:59     ` Eduard Zingerman
  1 sibling, 1 reply; 63+ messages in thread
From: Anton Protopopov @ 2025-06-18 15:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On 25/06/17 08:22PM, Alexei Starovoitov wrote:
> On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> >
> > The final line generates an indirect jump. The
> > format of the indirect jump instruction supported by BPF is
> >
> >     BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=fd(M)
> >
> > and, obviously, the map M must be the same map which was used to
> > init the register rX. This patch implements this in the following,
> > hacky, but so far suitable for all existing use-cases, way. On
> > encountering a `gotox` instruction libbpf tracks back to the
> > previous direct load from map and stores this map file descriptor
> > in the gotox instruction.
> 
> ...
> 
> > +/*
> > + * This one is too dumb, of course. TBD to make it smarter.
> > + */
> > +static int find_jt_map_fd(struct bpf_program *prog, int insn_idx)
> > +{
> > +       struct bpf_insn *insn = &prog->insns[insn_idx];
> > +       __u8 dst_reg = insn->dst_reg;
> > +
> > +       /* TBD: this function is such smart for now that it even ignores this
> > +        * register. Instead, it should backtrack the load more carefully.
> > +        * (So far even this dumb version works with all selftests.)
> > +        */
> > +       pr_debug("searching for a load instruction which populated dst_reg=r%u\n", dst_reg);
> > +
> > +       while (--insn >= prog->insns) {
> > +               if (insn->code == (BPF_LD|BPF_DW|BPF_IMM))
> > +                       return insn[0].imm;
> > +       }
> > +
> > +       return -ENOENT;
> > +}
> > +
> > +static int bpf_object__patch_gotox(struct bpf_object *obj, struct bpf_program *prog)
> > +{
> > +       struct bpf_insn *insn = prog->insns;
> > +       int map_fd;
> > +       int i;
> > +
> > +       for (i = 0; i < prog->insns_cnt; i++, insn++) {
> > +               if (!insn_is_gotox(insn))
> > +                       continue;
> > +
> > +               if (obj->gen_loader)
> > +                       return -EFAULT;
> > +
> > +               map_fd = find_jt_map_fd(prog, i);
> > +               if (map_fd < 0)
> > +                       return map_fd;
> > +
> > +               insn->imm = map_fd;
> > +       }
> 
> This is obviously broken and cannot be made smarter in libbpf.
> It won't be doing data flow analysis.
> 
> The only option I see is to teach llvm to tag jmp_table in gotox.
> Probably the simplest way is to add the same relo to gotox insn
> as for ld_imm64. Then libbpf has a direct way to assign
> the same map_fd into both ld_imm64 and gotox.

This would be nice.
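
If I read the suggestion right, the object would then look roughly like
this (the jump table symbol name is illustrative, and the relocation
attached to gotox is the proposed new piece, reusing the same machinery
as the ld_imm64 relocation):

    r1 = .LJTI0_0 ll           # R_BPF_64_64-style reloc -> jump table symbol
    r1 += r3
    r1 = *(u64 *)(r1 + 0)
    gotox r1                   # proposed: the same reloc attached here, so
                               # libbpf patches the same map fd into both insns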

> Uglier alternatives is to redesign the gotox encoding and
> drop ld_imm64 and *=8 altogether.
> Then gotox jmp_table[R5] will be like jumbo insn that
> does *=8 and load inside and JIT emits all that.
> But it's ugly and likely has other downsides.

I did this in my initial draft for LLVM (and supporting different
kinds of instructions was done using bits in SRC). But the "native"
approach looks better to me now, especially if compilers can be
taught to link load&gotox.


* Re: [RFC bpf-next 3/9] selftests/bpf: add selftests for new insn_set map
  2025-06-18 11:04   ` Eduard Zingerman
@ 2025-06-18 15:16     ` Anton Protopopov
  0 siblings, 0 replies; 63+ messages in thread
From: Anton Protopopov @ 2025-06-18 15:16 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Quentin Monnet, Yonghong Song

On 25/06/18 04:04AM, Eduard Zingerman wrote:
> On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:
> > Tests are split in two parts.
> > 
> > The `bpf_insn_set_ops` test checks that the map is managed properly:
> > 
> >   * Incorrect instruction indexes are rejected
> >   * Non-sorted and non-unique indexes are rejected
> >   * Unfrozen maps are not accepted
> >   * Two programs can't use the same map
> >   * BPF progs can't operate the map
> > 
> > The `bpf_insn_set_reloc` part validates, as best as it can do it from user
> > space, that instructions are relocated properly:
> > 
> >   * no relocations => map is the same
> >   * expected relocations when instructions are added
> >   * expected relocations when instructions are deleted
> >   * expected relocations when multiple functions are present
> > 
> > Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> > ---
> 
> Nit: term "relocation" is ambiguous, in BPF context first thing that
>      comes to mind are ELF relocations that allow CO-RE to work.

Thanks, agree. I will try to find other words for the descriptions.

> [...]
> 
> 



* Re: [RFC bpf-next 9/9] selftests/bpf: add selftests for indirect jumps
  2025-06-18 14:49     ` Anton Protopopov
@ 2025-06-18 16:01       ` Alexei Starovoitov
  2025-06-18 16:36         ` Anton Protopopov
  0 siblings, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-06-18 16:01 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On Wed, Jun 18, 2025 at 7:43 AM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
>
> On 25/06/17 08:24PM, Alexei Starovoitov wrote:
> > On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> > <a.s.protopopov@gmail.com> wrote:
> > > +SEC("syscall")
> > > +int two_towers(struct simple_ctx *ctx)
> > > +{
> > > +       switch (ctx->x) {
> > >
> >
> > Not sure why you went with switch() statements everywhere.
> > Please add few tests with explicit indirect goto
> > like interpreter does: goto *jumptable[insn->code];
>
> This requires to patch libbpf a bit more, as some meta-info
> accompanying this instruction should be emitted, like LLVM does with
> jump_table_sizes. And this probably should be a different section,
> such that it doesn't conflict with LLVM/GCC. I thought to add this
> later, but will try to add to the next version.

Hmm. I'm not sure why llvm should handle explicit indirect goto
any different than the one generated from switch.
The generated bpf.o should be the same.

> > Remove all bpf_printk() too and get easy on names.
>
> The `bpf_printk` is there to emit some instructions which later will
> be replaced by the verifier with more instructions; this is to
> additionally test "instruction set" basic functionality
> (orig->xlated mapping). Do you think this selftest shouldn't have
> this?

None of the runnable tests should have bpf_printk() since
it spams the global trace pipe.
There are a few tests that have printks, but they shouldn't be runnable.
They're load only.

> > i_am_a_little_tiny_foo() sounds funny today, but
> > it won't be funny at all tomorrow.
>
> Yeah, thanks, will rename it.


* Re: [RFC bpf-next 9/9] selftests/bpf: add selftests for indirect jumps
  2025-06-18 16:01       ` Alexei Starovoitov
@ 2025-06-18 16:36         ` Anton Protopopov
  2025-06-18 16:43           ` Alexei Starovoitov
  0 siblings, 1 reply; 63+ messages in thread
From: Anton Protopopov @ 2025-06-18 16:36 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On 25/06/18 09:01AM, Alexei Starovoitov wrote:
> On Wed, Jun 18, 2025 at 7:43 AM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> >
> > On 25/06/17 08:24PM, Alexei Starovoitov wrote:
> > > On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> > > <a.s.protopopov@gmail.com> wrote:
> > > > +SEC("syscall")
> > > > +int two_towers(struct simple_ctx *ctx)
> > > > +{
> > > > +       switch (ctx->x) {
> > > >
> > >
> > > Not sure why you went with switch() statements everywhere.
> > > Please add few tests with explicit indirect goto
> > > like interpreter does: goto *jumptable[insn->code];
> >
> > This requires to patch libbpf a bit more, as some meta-info
> > accompanying this instruction should be emitted, like LLVM does with
> > jump_table_sizes. And this probably should be a different section,
> > such that it doesn't conflict with LLVM/GCC. I thought to add this
> > later, but will try to add to the next version.
> 
> Hmm. I'm not sure why llvm should handle explicit indirect goto
> any different than the one generated from switch.
> The generated bpf.o should be the same.

For a switch statement LLVM will create a jump table
and emit the {,.rel}.llvm_jump_table_sizes sections.

For an explicit goto *, say

    static const void *table[] = {
            &&l1, &&l2, &&l3, &&l4, &&l5, 
    };
    if (index > ARRAY_SIZE(table))
            return 0;
    goto *table[index];

it will not generate {,.rel}.llvm_jump_table_sizes. I wonder, does
LLVM emit the size of `table` anywhere? (If not, then some assembly is
needed to emit it.) In any case it should be easy to add this case,
but it is still a bit of coding, thus a slightly different case.

> > > Remove all bpf_printk() too and get easy on names.
> >
> > The `bpf_printk` is there to emit some instructions which later will
> > be replaced by the verifier with more instructions; this is to
> > additionally test "instruction set" basic functionality
> > (orig->xlated mapping). Do you think this selftest shouldn't have
> > this?
> 
> None of the runnable tests should have bpf_printk() since
> it spams the global trace pipe.
> There are few tests that have printks, but they shouldn't be runnable.
> It's load only.

Ok, thanks, makes total sense now

> > > i_am_a_little_tiny_foo() sounds funny today, but
> > > it won't be funny at all tomorrow.
> >
> > Yeah, thanks, will rename it.


* Re: [RFC bpf-next 9/9] selftests/bpf: add selftests for indirect jumps
  2025-06-18 16:36         ` Anton Protopopov
@ 2025-06-18 16:43           ` Alexei Starovoitov
  2025-06-18 20:25             ` Anton Protopopov
  0 siblings, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-06-18 16:43 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On Wed, Jun 18, 2025 at 9:30 AM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
>
> On 25/06/18 09:01AM, Alexei Starovoitov wrote:
> > On Wed, Jun 18, 2025 at 7:43 AM Anton Protopopov
> > <a.s.protopopov@gmail.com> wrote:
> > >
> > > On 25/06/17 08:24PM, Alexei Starovoitov wrote:
> > > > On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> > > > <a.s.protopopov@gmail.com> wrote:
> > > > > +SEC("syscall")
> > > > > +int two_towers(struct simple_ctx *ctx)
> > > > > +{
> > > > > +       switch (ctx->x) {
> > > > >
> > > >
> > > > Not sure why you went with switch() statements everywhere.
> > > > Please add few tests with explicit indirect goto
> > > > like interpreter does: goto *jumptable[insn->code];
> > >
> > > This requires to patch libbpf a bit more, as some meta-info
> > > accompanying this instruction should be emitted, like LLVM does with
> > > jump_table_sizes. And this probably should be a different section,
> > > such that it doesn't conflict with LLVM/GCC. I thought to add this
> > > later, but will try to add to the next version.
> >
> > Hmm. I'm not sure why llvm should handle explicit indirect goto
> > any different than the one generated from switch.
> > The generated bpf.o should be the same.
>
> For a switch statement LLVM will create a jump table
> and create the {,.rel}.llvm_jump_table_sizes tables.
>
> For a direct goto *, say
>
>     static const void *table[] = {
>             &&l1, &&l2, &&l3, &&l4, &&l5,
>     };
>     if (index > ARRAY_SIZE(table))
>             return 0;
>     goto *table[index];
>
> it will not generate {,.rel}.llvm_jump_table_sizes. I wonder, does
> LLVM emit the size of `table`? (If no, then some assembly needed to
> emit it.) In any case it should be easy to add this case, but still
> it is a bit of coding, thus a bit different case.)

It's controlled by -emit-jump-table-sizes-section flag.
I haven't looked at pending llvm/bpf diff, but it should be possible
to standardize. Emit it for both or for none.
My preference would be for _none_.

Not sure why you made libbpf rely on that section name.
Relocations against text can be in other rodata sections.
Normal behavior for x86 and other backends.


* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-06-15  8:59 ` [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps Anton Protopopov
  2025-06-18  3:22   ` Alexei Starovoitov
@ 2025-06-18 19:49   ` Eduard Zingerman
  2025-06-27  2:28     ` Eduard Zingerman
  1 sibling, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-06-18 19:49 UTC (permalink / raw)
  To: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:

[...]

> @@ -698,6 +712,14 @@ struct bpf_object {
>  	bool has_subcalls;
>  	bool has_rodata;
>  
> +	const void *rodata;
> +	size_t rodata_size;
> +	int rodata_map_fd;

This is sort-of strange, that the jump table metadata resides in one
section, while the jump table itself is in .rodata. Wouldn't it be
simpler to make LLVM emit all jump table info in one section?
Also note that Elf_Sym has name, section index, value and size,
hence symbols defined for the jump table section can encode jump tables.
E.g. the following implementation seems more intuitive:

  .jumptables
    <subprog-rel-off-0>
    <subprog-rel-off-1> | <--- jump table #1 symbol:
    <subprog-rel-off-2> |        .size = 2   // number of entries in the jump table
    ...                          .value = 1  // offset within .jumptables
    <subprog-rel-off-N>                          ^
                                                 |
  .text                                          |
    ...                                          |
    <insn-N>     <------ relocation referencing -'
    ...                  jump table #1 symbol

> +
> +	/* Jump Tables */
> +	struct jt **jt;
> +	size_t jt_cnt;
> +
>  	struct bpf_gen *gen_loader;
>  
>  	/* Information when doing ELF related work. Only valid if efile.elf is not NULL */

[...]



* Re: [RFC bpf-next 9/9] selftests/bpf: add selftests for indirect jumps
  2025-06-18 16:43           ` Alexei Starovoitov
@ 2025-06-18 20:25             ` Anton Protopopov
  2025-06-18 21:59               ` Alexei Starovoitov
  0 siblings, 1 reply; 63+ messages in thread
From: Anton Protopopov @ 2025-06-18 20:25 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On 25/06/18 09:43AM, Alexei Starovoitov wrote:
> On Wed, Jun 18, 2025 at 9:30 AM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> >
> > On 25/06/18 09:01AM, Alexei Starovoitov wrote:
> > > On Wed, Jun 18, 2025 at 7:43 AM Anton Protopopov
> > > <a.s.protopopov@gmail.com> wrote:
> > > >
> > > > On 25/06/17 08:24PM, Alexei Starovoitov wrote:
> > > > > On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> > > > > <a.s.protopopov@gmail.com> wrote:
> > > > > > +SEC("syscall")
> > > > > > +int two_towers(struct simple_ctx *ctx)
> > > > > > +{
> > > > > > +       switch (ctx->x) {
> > > > > >
> > > > >
> > > > > Not sure why you went with switch() statements everywhere.
> > > > > Please add few tests with explicit indirect goto
> > > > > like interpreter does: goto *jumptable[insn->code];
> > > >
> > > > This requires to patch libbpf a bit more, as some meta-info
> > > > accompanying this instruction should be emitted, like LLVM does with
> > > > jump_table_sizes. And this probably should be a different section,
> > > > such that it doesn't conflict with LLVM/GCC. I thought to add this
> > > > later, but will try to add to the next version.
> > >
> > > Hmm. I'm not sure why llvm should handle explicit indirect goto
> > > any different than the one generated from switch.
> > > The generated bpf.o should be the same.
> >
> > For a switch statement LLVM will create a jump table
> > and create the {,.rel}.llvm_jump_table_sizes tables.
> >
> > For a direct goto *, say
> >
> >     static const void *table[] = {
> >             &&l1, &&l2, &&l3, &&l4, &&l5,
> >     };
> >     if (index > ARRAY_SIZE(table))
> >             return 0;
> >     goto *table[index];
> >
> > it will not generate {,.rel}.llvm_jump_table_sizes. I wonder, does
> > LLVM emit the size of `table`? (If no, then some assembly needed to
> > emit it.) In any case it should be easy to add this case, but still
> > it is a bit of coding, thus a bit different case.)
> 
> It's controlled by -emit-jump-table-sizes-section flag.
> I haven't looked at pending llvm/bpf diff, but it should be possible
> to standardize. Emit it for both or for none.
> My preference would be for _none_.
> 
> Not sure why you made libbpf rely on that section name.
> Relocations against text can be in other rodata sections.
> Normal behavior for x86 and other backends.

So, those sections are just an easier way to find jump table sizes.
The other way is the one described by Yonghong in [1] (parse
.rel.rodata, follow each symbol to its section, find the offset, then
find each gotox instruction, map it to a load, then one can find that
the load is from a jump table, etc.). Just to be sure, is the latter,
in your opinion, the better way (because it doesn't depend on emitting
those tables)?

Those tables are _not_ generated for the code I've listed above.
However, in this case I can get the size of the table directly from
the symtab.

  [1] https://github.com/llvm/llvm-project/pull/133856#issuecomment-2769970882


* Re: [RFC bpf-next 9/9] selftests/bpf: add selftests for indirect jumps
  2025-06-18 20:25             ` Anton Protopopov
@ 2025-06-18 21:59               ` Alexei Starovoitov
  2025-06-19  5:05                 ` Anton Protopopov
  0 siblings, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-06-18 21:59 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On Wed, Jun 18, 2025 at 1:19 PM Anton Protopopov
<a.s.protopopov@gmail.com> wrote:
>
> On 25/06/18 09:43AM, Alexei Starovoitov wrote:
> > On Wed, Jun 18, 2025 at 9:30 AM Anton Protopopov
> > <a.s.protopopov@gmail.com> wrote:
> > >
> > > On 25/06/18 09:01AM, Alexei Starovoitov wrote:
> > > > On Wed, Jun 18, 2025 at 7:43 AM Anton Protopopov
> > > > <a.s.protopopov@gmail.com> wrote:
> > > > >
> > > > > On 25/06/17 08:24PM, Alexei Starovoitov wrote:
> > > > > > On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> > > > > > <a.s.protopopov@gmail.com> wrote:
> > > > > > > +SEC("syscall")
> > > > > > > +int two_towers(struct simple_ctx *ctx)
> > > > > > > +{
> > > > > > > +       switch (ctx->x) {
> > > > > > >
> > > > > >
> > > > > > Not sure why you went with switch() statements everywhere.
> > > > > > Please add few tests with explicit indirect goto
> > > > > > like interpreter does: goto *jumptable[insn->code];
> > > > >
> > > > > This requires to patch libbpf a bit more, as some meta-info
> > > > > accompanying this instruction should be emitted, like LLVM does with
> > > > > jump_table_sizes. And this probably should be a different section,
> > > > > such that it doesn't conflict with LLVM/GCC. I thought to add this
> > > > > later, but will try to add to the next version.
> > > >
> > > > Hmm. I'm not sure why llvm should handle explicit indirect goto
> > > > any different than the one generated from switch.
> > > > The generated bpf.o should be the same.
> > >
> > > For a switch statement LLVM will create a jump table
> > > and create the {,.rel}.llvm_jump_table_sizes tables.
> > >
> > > For a direct goto *, say
> > >
> > >     static const void *table[] = {
> > >             &&l1, &&l2, &&l3, &&l4, &&l5,
> > >     };
> > >     if (index > ARRAY_SIZE(table))
> > >             return 0;
> > >     goto *table[index];
> > >
> > > it will not generate {,.rel}.llvm_jump_table_sizes. I wonder, does
> > > LLVM emit the size of `table`? (If no, then some assembly needed to
> > > emit it.) In any case it should be easy to add this case, but still
> > > it is a bit of coding, thus a bit different case.)
> >
> > It's controlled by -emit-jump-table-sizes-section flag.
> > I haven't looked at pending llvm/bpf diff, but it should be possible
> > to standardize. Emit it for both or for none.
> > My preference would be for _none_.
> >
> > Not sure why you made libbpf rely on that section name.
> > Relocations against text can be in other rodata sections.
> > Normal behavior for x86 and other backends.
>
> So, those sections are just an easier way to find jump table sizes.
> The other way is as was described by Yonghong in [1] (parse
> .rel.rodata, follow each symbol to its section, find offset, then
> find each gotox instruction, map it to a load, then one can find that
> the load is from a jump table, etc.). Just to be sure, is the latter by
> your opinion the better way (because it doesn't depend on emitting
> tables?)?
>
> Those tables are _not_ generated for the code I've listed above.
> However, in this case I can get the size of the table directly from
> the symtab.

Since Yonghong's diff did

    bool BPFAsmPrinter::doInitialization(Module &M) {
        ...
        EmitJumpTableSizesSection = true;

and llvm did not emit the jump table sizes section for an explicit
'goto *table[index]', I suspect it will be hard to fix.
Meaning libbpf cannot rely on a special section name.
So it makes sense not to force this mode in llvm
(especially since no other backend does it) and do generic
detection in libbpf. It will work for both the explicit gotox and the
switch-generated jump tables in the end.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 9/9] selftests/bpf: add selftests for indirect jumps
  2025-06-18 21:59               ` Alexei Starovoitov
@ 2025-06-19  5:05                 ` Anton Protopopov
  0 siblings, 0 replies; 63+ messages in thread
From: Anton Protopopov @ 2025-06-19  5:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On 25/06/18 02:59PM, Alexei Starovoitov wrote:
> On Wed, Jun 18, 2025 at 1:19 PM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> >
> > On 25/06/18 09:43AM, Alexei Starovoitov wrote:
> > > On Wed, Jun 18, 2025 at 9:30 AM Anton Protopopov
> > > <a.s.protopopov@gmail.com> wrote:
> > > >
> > > > On 25/06/18 09:01AM, Alexei Starovoitov wrote:
> > > > > On Wed, Jun 18, 2025 at 7:43 AM Anton Protopopov
> > > > > <a.s.protopopov@gmail.com> wrote:
> > > > > >
> > > > > > On 25/06/17 08:24PM, Alexei Starovoitov wrote:
> > > > > > > On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> > > > > > > <a.s.protopopov@gmail.com> wrote:
> > > > > > > > +SEC("syscall")
> > > > > > > > +int two_towers(struct simple_ctx *ctx)
> > > > > > > > +{
> > > > > > > > +       switch (ctx->x) {
> > > > > > > >
> > > > > > >
> > > > > > > Not sure why you went with switch() statements everywhere.
> > > > > > > Please add few tests with explicit indirect goto
> > > > > > > like interpreter does: goto *jumptable[insn->code];
> > > > > >
> > > > > > This requires to patch libbpf a bit more, as some meta-info
> > > > > > accompanying this instruction should be emitted, like LLVM does with
> > > > > > jump_table_sizes. And this probably should be a different section,
> > > > > > such that it doesn't conflict with LLVM/GCC. I thought to add this
> > > > > > later, but will try to add to the next version.
> > > > >
> > > > > Hmm. I'm not sure why llvm should handle explicit indirect goto
> > > > > any different than the one generated from switch.
> > > > > The generated bpf.o should be the same.
> > > >
> > > > For a switch statement LLVM will create a jump table
> > > > and create the {,.rel}.llvm_jump_table_sizes tables.
> > > >
> > > > For a direct goto *, say
> > > >
> > > >     static const void *table[] = {
> > > >             &&l1, &&l2, &&l3, &&l4, &&l5,
> > > >     };
> > > >     if (index > ARRAY_SIZE(table))
> > > >             return 0;
> > > >     goto *table[index];
> > > >
> > > > it will not generate {,.rel}.llvm_jump_table_sizes. I wonder, does
> > > > LLVM emit the size of `table`? (If no, then some assembly needed to
> > > > emit it.) In any case it should be easy to add this case, but still
> > > > it is a bit of coding, thus a bit different case.)
> > >
> > > It's controlled by -emit-jump-table-sizes-section flag.
> > > I haven't looked at pending llvm/bpf diff, but it should be possible
> > > to standardize. Emit it for both or for none.
> > > My preference would be for _none_.
> > >
> > > Not sure why you made libbpf rely on that section name.
> > > Relocations against text can be in other rodata sections.
> > > Normal behavior for x86 and other backends.
> >
> > So, those sections are just an easier way to find jump table sizes.
> > The other way is as was described by Yonghong in [1] (parse
> > .rel.rodata, follow each symbol to its section, find offset, then
> > find each gotox instruction, map it to a load, then one can find that
> > the load is from a jump table, etc.). Just to be sure, is the latter by
> > your opinion the better way (because it doesn't depend on emitting
> > tables?)?
> >
> > Those tables are _not_ generated for the code I've listed above.
> > However, in this case I can get the size of the table directly from
> > the symtab.
> 
> Since Yonghong's diff did:
> bool BPFAsmPrinter::doInitialization(Module &M) {
> 
> EmitJumpTableSizesSection = true;
> 
> and llvm did not emit jump table for explicit 'goto *table[index]'
> I suspect it will be hard to fix.
> Meaning libbpf cannot rely on a special section name.
> So it makes sense not to force this mode in llvm
> (especially since no other backend does it) and do generic
> detection in libbpf. It will work for both explicit gotox and
> switch generated at the end.

Ok, got it, thanks for the explanation.


* Re: [RFC bpf-next 2/9] bpf, x86: add new map type: instructions set
  2025-06-18  0:57   ` Eduard Zingerman
  2025-06-18  2:16     ` Alexei Starovoitov
@ 2025-06-19 18:55     ` Anton Protopopov
  2025-06-19 18:55       ` Eduard Zingerman
  1 sibling, 1 reply; 63+ messages in thread
From: Anton Protopopov @ 2025-06-19 18:55 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Quentin Monnet, Yonghong Song

On 25/06/17 05:57PM, Eduard Zingerman wrote:
> On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:
> 
> Meta: "instruction set" is a super confusing name, at-least for me the
>       first thought is about actual set of instructions supported by
>       some h/w. instruction_info? instruction_offset? just
>       "iset"/"ioffset"?
> 
> [...]
> 
> > On map creation/initialization, before loading the program, each
> > element of the map should be initialized to point to an instruction
> > offset within the program. Before the program load such maps should
> > be made frozen. After the program verification xlated and jitted
> > offsets can be read via the bpf(2) syscall.
> 
> I think such maps would be a bit more ergonomic it original
> instruction index would be saved as well, e.g:
> 
>   (original_offset, xlated_offset, jitted_offset)
> 
> Otherwise user would have to recover original offset from some
> external mapping. This information is stored in orig_xlated_off
> anyway.

I do agree that it might be convenient to have the original_offset.
But the only use case I see here is "BPF debuggers", and such programs
will be able to build this mapping themselves.

I would add it as is; the only obstacle I see now is the map key size.
Right now, from both the BPF and the userspace point of view, it is 8.
Userspace sees (u32 xlated, u32 jitted), and BPF sees *ip. I haven't
looked at how much work it is to have different key sizes for
userspace and BPF, and whether this breaks things too much.
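
For reference, the 8-byte element as the two sides see it (xlated_off is
the field from the patch; the name of the second field is my reading and
may differ in the actual layout):

    /* userspace view: a pair of 32-bit offsets per element */
    struct bpf_insn_set_value {
            __u32 xlated_off;    /* offset in the xlated program */
            __u32 jitted_off;    /* offset in the jitted image   */
    };

    /* BPF-side view of the same 8-byte slot: one jump target (*ip) */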

> [...]
> 
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 15672cb926fc..923c38f212dc 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -1615,6 +1615,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
> 
> [...]
> 
> > @@ -2642,6 +2645,14 @@ st:			if (is_imm8(insn->off))
> >  				return -EFAULT;
> >  			}
> >  			memcpy(rw_image + proglen, temp, ilen);
> > +
> > +			/*
> > +			 * Instruction sets need to know how xlated code
> > +			 * maps to jited code
> > +			 */
> > +			abs_xlated_off = bpf_prog->aux->subprog_start + i - 1 - adjust_off;
> 
> Nit: `adjust_off` is a bit hard to follow, maybe move the following:
> 
> 	abs_xlated_off = bpf_prog->aux->subprog_start + i - 1;
> 
>      to the beginning of the loop?

Thanks, this isn't transparent indeed. I will move things around to be
more readable.

> 
> > +			bpf_prog_update_insn_ptr(bpf_prog, abs_xlated_off, proglen, ilen,
> > +						 jmp_offset, image + proglen);
> 
> Nit: initialize `jmp_offset` at each loop iteration to 0?
>      otherwise it would denote jump offset of the last processed
>      jump instruction for all following non-jump instructions.

Yes, thanks.

> >  		}
> >  		proglen += ilen;
> >  		addrs[i] = proglen;
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 8189f49e43d6..008bcd44c60e 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -3596,4 +3596,25 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
> >  	return prog->aux->func_idx != 0;
> >  }
> >
> > +int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog);
> > +int bpf_insn_set_ready(struct bpf_map *map);
> > +void bpf_insn_set_release(struct bpf_map *map);
> > +void bpf_insn_set_adjust(struct bpf_map *map, u32 off, u32 len);
> > +void bpf_insn_set_adjust_after_remove(struct bpf_map *map, u32 off, u32 len);
> > +
> > +struct bpf_insn_ptr {
> 
> Could you please add comments describing each field?
> E.g.: "address of the instruction in the jitted image",
>       "for jump instructions, the relative offset of the jump target",
>       "index of the original instruction",
>       "original value of the corresponding bpf_insn_set_value.xlated_off".

Sure, will add.

(Also, not to repeat "yes" many times: all your comments below make
sense, I will address them. Thanks.)

> > +	void *jitted_ip;
> > +	u32 jitted_len;
> > +	int jitted_jump_offset;
> > +	struct bpf_insn_set_value user_value; /* userspace-visible value */
> > +	u32 orig_xlated_off;
> > +};
> 
> [...]
> 
> > diff --git a/kernel/bpf/bpf_insn_set.c b/kernel/bpf/bpf_insn_set.c
> > new file mode 100644
> 
> [...]
> 
> > +static int insn_set_check_btf(const struct bpf_map *map,
> > +			      const struct btf *btf,
> > +			      const struct btf_type *key_type,
> > +			      const struct btf_type *value_type)
> > +{
> > +	u32 int_data;
> > +
> > +	if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
> > +		return -EINVAL;
> > +
> > +	if (BTF_INFO_KIND(value_type->info) != BTF_KIND_INT)
> > +		return -EINVAL;
> > +
> > +	int_data = *(u32 *)(key_type + 1);
> 
> Nit: use btf_type_int() accessor?
> 
> > +	if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
> > +		return -EINVAL;
> > +
> > +	int_data = *(u32 *)(value_type + 1);
> > +	if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
> 
> Should this check for `BTF_INT_BITS(int_data) != 64`?
> 
> > +		return -EINVAL;
> > +
> > +	return 0;
> > +}
> 
> [...]
> 
> > +int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog)
> > +{
> > +	struct bpf_insn_set *insn_set = cast_insn_set(map);
> > +	int i;
> > +
> > +	if (!is_frozen(map))
> > +		return -EINVAL;
> > +
> > +	if (!valid_offsets(insn_set, prog))
> > +		return -EINVAL;
> > +
> > +	/*
> > +	 * There can be only one program using the map
> > +	 */
> > +	mutex_lock(&insn_set->state_mutex);
> > +	if (insn_set->state != INSN_SET_STATE_FREE) {
> > +		mutex_unlock(&insn_set->state_mutex);
> > +		return -EBUSY;
> > +	}
> > +	insn_set->state = INSN_SET_STATE_INIT;
> > +	mutex_unlock(&insn_set->state_mutex);
> > +
> > +	/*
> > +	 * Reset all the map indexes to the original values.  This is needed,
> > +	 * e.g., when a replay of verification with different log level should
> > +	 * be performed.
> > +	 */
> > +	for (i = 0; i < map->max_entries; i++)
> > +		insn_set->ptrs[i].user_value.xlated_off = insn_set->ptrs[i].orig_xlated_off;
> > +
> > +	return 0;
> > +}
> > +
> > +int bpf_insn_set_ready(struct bpf_map *map)
> 
> What is the reasoning for not needing to take the mutex here and in
> the bpf_insn_set_release?
> 
> > +{
> > +	struct bpf_insn_set *insn_set = cast_insn_set(map);
> > +	int i;
> > +
> > +	for (i = 0; i < map->max_entries; i++) {
> > +		if (insn_set->ptrs[i].user_value.xlated_off == INSN_DELETED)
> > +			continue;
> > +		if (!insn_set->ips[i])
> > +			return -EFAULT;
> > +	}
> > +
> > +	insn_set->state = INSN_SET_STATE_READY;
> > +	return 0;
> > +}
> > +
> > +void bpf_insn_set_release(struct bpf_map *map)
> > +{
> > +	struct bpf_insn_set *insn_set = cast_insn_set(map);
> > +
> > +	insn_set->state = INSN_SET_STATE_FREE;
> > +}
> 
> [...]
> 
> (... I'll continue reading through patch-set a bit later ...)


* Re: [RFC bpf-next 2/9] bpf, x86: add new map type: instructions set
  2025-06-19 18:55     ` Anton Protopopov
@ 2025-06-19 18:55       ` Eduard Zingerman
  0 siblings, 0 replies; 63+ messages in thread
From: Eduard Zingerman @ 2025-06-19 18:55 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Quentin Monnet, Yonghong Song

On Thu, 2025-06-19 at 18:55 +0000, Anton Protopopov wrote:

[...]

> > I think such maps would be a bit more ergonomic it original
> > instruction index would be saved as well, e.g:
> > 
> >   (original_offset, xlated_offset, jitted_offset)
> > 
> > Otherwise user would have to recover original offset from some
> > external mapping. This information is stored in orig_xlated_off
> > anyway.
> 
> I do agree that this might be convenient to have the original_offset.
> But the only use case I see here is "BPF debuggers". Such programs
> will be able to build this mapping themselves.
> 
> I would add it as is, the only obstacle I see now is map key size.
> Now from BPF point of view and from userspace point of view it is 8.
> Userspace sees (u32 xlated, u32 jitted), and BPF sees *ip. I haven't
> looked at how much work it is to have different key sizes for
> userspace and BPF, and if this breaks things too much.

Uh-oh, I haven't thought about key size being different for kernel/user,
might be a conundrum indeed.

[...]



* Re: [RFC bpf-next 2/9] bpf, x86: add new map type: instructions set
  2025-06-18  2:16     ` Alexei Starovoitov
@ 2025-06-19 18:57       ` Anton Protopopov
  0 siblings, 0 replies; 63+ messages in thread
From: Anton Protopopov @ 2025-06-19 18:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eduard Zingerman, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On 25/06/17 07:16PM, Alexei Starovoitov wrote:
> On Tue, Jun 17, 2025 at 5:57 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:
> >
> > Meta: "instruction set" is a super confusing name, at-least for me the
> >       first thought is about actual set of instructions supported by
> >       some h/w. instruction_info? instruction_offset? just
> >       "iset"/"ioffset"?
> 
> BPF_MAP_TYPE_INSN_ARRAY ?
> 
> and in the code use either insn_array or iarray

Yes, thanks, "array" sounds better. Will reply here later if I have
any different ideas...


* Re: [RFC bpf-next 5/9] bpf, x86: add support for indirect jumps
  2025-06-18  3:06   ` Alexei Starovoitov
@ 2025-06-19 19:57     ` Anton Protopopov
  2025-06-19 19:58     ` Anton Protopopov
  1 sibling, 0 replies; 63+ messages in thread
From: Anton Protopopov @ 2025-06-19 19:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On 25/06/17 08:06PM, Alexei Starovoitov wrote:
> On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> >
> > Add support for a new instruction
> >
> >     BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=fd(M)
> >
> > which does an indirect jump to a location stored in Rx. The map M
> > is an instruction set map containing all possible targets for this
> > particular jump.
> >
> > On the jump the register Rx should have type PTR_TO_INSN. This new
> > type assures that the Rx register contains a value (or a range of
> > values) loaded from the map M. Typically, this will be done like the
> > code below, which could have been generated for a switch statement
> > (e.g., compiled with LLVM):
> >
> >     0:   r3 = r1                    # "switch (r3)"
> >     1:   if r3 > 0x13 goto +0x666   # check r3 boundaries
> >     2:   r3 <<= 0x3                 # r3 is void*, point to an address
> >     3:   r1 = 0xbeef ll             # r1 is PTR_TO_MAP_VALUE, r1->map_ptr=M
> 
> Something doesn't add up.
> Since you made libbpf to tag this ld_imm64 as BPF_PSEUDO_MAP_VALUE
> which insn (map key) does it point to ?
> In case of global data it's key==0.
> Here it's 1st element of insn_array ?
> 
> >     5:   r1 += r3                   # r1 inherits boundaries from r3
> >     6:   r1 = *(u64 *)(r1 + 0x0)    # r1 now has type PTR_TO_INSN
> >     7:   gotox r1[,imm=fd(M)]       # verifier checks that M == r1->map_ptr
> >
> > On building the jump graph, and the static analysis, a new function
> > of the INSN_SET is used: bpf_insn_set_iter_xlated_offset(map, n).
> > It lets to iterate over unique slots in an instruction set (equal
> > items can be generated, e.g., for a sparse jump table for a switch,
> > where not all possible branches are taken).
> >
> > Instruction (3) above loads an address of the first element of the
> > map. From BPF point of view, the map is a jump table in native
> > architecture, e.g., an array of jump targets. This patch allows
> > to grab such an address and then later to adjust an offset, like in
> > instruction (5). A value of such type can be dereferenced once to
> > create a PTR_TO_INSN, see instruction (6).
> >
> > When building the config, the high 16 bytes of the insn_state are
> > used, so this patch (theoretically) supports jump tables of up to
> > 2^16 slots.
> >
> > Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> > ---
> >  arch/x86/net/bpf_jit_comp.c  |   7 ++
> >  include/linux/bpf.h          |   2 +
> >  include/linux/bpf_verifier.h |   4 +
> >  kernel/bpf/bpf_insn_set.c    |  71 ++++++++++++-
> >  kernel/bpf/core.c            |   2 +
> >  kernel/bpf/verifier.c        | 198 ++++++++++++++++++++++++++++++++++-
> >  6 files changed, 278 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 37dc83d91832..d20f6775605d 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -2520,6 +2520,13 @@ st:                      if (is_imm8(insn->off))
> >
> >                         break;
> >
> > +               case BPF_JMP | BPF_JA | BPF_X:
> > +               case BPF_JMP32 | BPF_JA | BPF_X:
> > +                       emit_indirect_jump(&prog,
> > +                                          reg2hex[insn->dst_reg],
> > +                                          is_ereg(insn->dst_reg),
> > +                                          image + addrs[i - 1]);
> > +                       break;
> >                 case BPF_JMP | BPF_JA:
> >                 case BPF_JMP32 | BPF_JA:
> >                         if (BPF_CLASS(insn->code) == BPF_JMP) {
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 008bcd44c60e..3c5eaea2b476 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -952,6 +952,7 @@ enum bpf_reg_type {
> >         PTR_TO_ARENA,
> >         PTR_TO_BUF,              /* reg points to a read/write buffer */
> >         PTR_TO_FUNC,             /* reg points to a bpf program function */
> > +       PTR_TO_INSN,             /* reg points to a bpf program instruction */
> >         CONST_PTR_TO_DYNPTR,     /* reg points to a const struct bpf_dynptr */
> >         __BPF_REG_TYPE_MAX,
> >
> > @@ -3601,6 +3602,7 @@ int bpf_insn_set_ready(struct bpf_map *map);
> >  void bpf_insn_set_release(struct bpf_map *map);
> >  void bpf_insn_set_adjust(struct bpf_map *map, u32 off, u32 len);
> >  void bpf_insn_set_adjust_after_remove(struct bpf_map *map, u32 off, u32 len);
> > +int bpf_insn_set_iter_xlated_offset(struct bpf_map *map, u32 iter_no);
> >
> >  struct bpf_insn_ptr {
> >         void *jitted_ip;
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 84b5e6b25c52..80d9afcca488 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -229,6 +229,10 @@ struct bpf_reg_state {
> >         enum bpf_reg_liveness live;
> >         /* if (!precise && SCALAR_VALUE) min/max/tnum don't affect safety */
> >         bool precise;
> > +
> > +       /* Used to track boundaries of a PTR_TO_INSN */
> > +       u32 min_index;
> > +       u32 max_index;
> 
> This is no go. We cannot grow bpf_reg_state.
> Find a way to reuse fields without increasing the size.

See my comment below, next to "Why bother consuming memory".

> >  };
> >
> >  enum bpf_stack_slot_type {
> > diff --git a/kernel/bpf/bpf_insn_set.c b/kernel/bpf/bpf_insn_set.c
> > index c20e99327118..316cecad60a9 100644
> > --- a/kernel/bpf/bpf_insn_set.c
> > +++ b/kernel/bpf/bpf_insn_set.c
> > @@ -9,6 +9,8 @@ struct bpf_insn_set {
> >         struct bpf_map map;
> >         struct mutex state_mutex;
> >         int state;
> > +       u32 **unique_offsets;
> > +       u32 unique_offsets_cnt;
> >         long *ips;
> >         DECLARE_FLEX_ARRAY(struct bpf_insn_ptr, ptrs);
> >  };
> > @@ -50,6 +52,7 @@ static void insn_set_free(struct bpf_map *map)
> >  {
> >         struct bpf_insn_set *insn_set = cast_insn_set(map);
> >
> > +       kfree(insn_set->unique_offsets);
> >         kfree(insn_set->ips);
> >         bpf_map_area_free(insn_set);
> >  }
> > @@ -69,6 +72,12 @@ static struct bpf_map *insn_set_alloc(union bpf_attr *attr)
> >                 return ERR_PTR(-ENOMEM);
> >         }
> >
> > +       insn_set->unique_offsets = kzalloc(sizeof(long) * attr->max_entries, GFP_KERNEL);
> > +       if (!insn_set->unique_offsets) {
> > +               insn_set_free(&insn_set->map);
> > +               return ERR_PTR(-ENOMEM);
> > +       }
> > +
> >         bpf_map_init_from_attr(&insn_set->map, attr);
> >
> >         mutex_init(&insn_set->state_mutex);
> > @@ -165,10 +174,25 @@ static u64 insn_set_mem_usage(const struct bpf_map *map)
> >         u64 extra_size = 0;
> >
> >         extra_size += sizeof(long) * map->max_entries; /* insn_set->ips */
> > +       extra_size += 4 * map->max_entries; /* insn_set->unique_offsets */
> >
> >         return insn_set_alloc_size(map->max_entries) + extra_size;
> >  }
> >
> > +static int insn_set_map_direct_value_addr(const struct bpf_map *map, u64 *imm, u32 off)
> > +{
> > +       struct bpf_insn_set *insn_set = cast_insn_set(map);
> > +
> > +       /* for now, just reject all such loads */
> > +       if (off > 0)
> > +               return -EINVAL;
> 
> I bet it's easy enough to make llvm generate such code,
> so this needs to be supported sooner than later.

Ok, makes sense, will add to the list for the next version.
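
E.g., a possible shape of it (sketch only, not part of this series):

    static int insn_set_map_direct_value_addr(const struct bpf_map *map, u64 *imm, u32 off)
    {
            struct bpf_insn_set *insn_set = cast_insn_set(map);

            /* accept any 8-byte aligned offset into the jump table */
            if (off % sizeof(long) || off >= map->max_entries * sizeof(long))
                    return -EINVAL;

            *imm = (unsigned long)insn_set->ips;
            return 0;
    }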

> > +
> > +       /* from BPF's point of view, this map is a jump table */
> > +       *imm = (unsigned long)insn_set->ips;
> > +
> > +       return 0;
> > +}
> > +
> >  BTF_ID_LIST_SINGLE(insn_set_btf_ids, struct, bpf_insn_set)
> >
> >  const struct bpf_map_ops insn_set_map_ops = {
> > @@ -181,6 +205,7 @@ const struct bpf_map_ops insn_set_map_ops = {
> >         .map_delete_elem = insn_set_delete_elem,
> >         .map_check_btf = insn_set_check_btf,
> >         .map_mem_usage = insn_set_mem_usage,
> > +       .map_direct_value_addr = insn_set_map_direct_value_addr,
> >         .map_btf_id = &insn_set_btf_ids[0],
> >  };
> >
> > @@ -217,6 +242,37 @@ static inline bool valid_offsets(const struct bpf_insn_set *insn_set,
> >         return true;
> >  }
> >
> > +static int cmp_unique_offsets(const void *a, const void *b)
> > +{
> > +       return *(u32 *)a - *(u32 *)b;
> > +}
> > +
> > +static int bpf_insn_set_init_unique_offsets(struct bpf_insn_set *insn_set)
> > +{
> > +       u32 cnt = insn_set->map.max_entries, ucnt = 1;
> > +       u32 **off = insn_set->unique_offsets;
> > +       int i;
> > +
> > +       /* [0,3,2,4,6,5,5,5,1,1,0,0] */
> > +       for (i = 0; i < cnt; i++)
> > +               off[i] = &insn_set->ptrs[i].user_value.xlated_off;
> > +
> > +       /* [0,0,0,1,1,2,3,4,5,5,5,6] */
> > +       sort(off, cnt, sizeof(off[0]), cmp_unique_offsets, NULL);
> > +
> > +       /*
> > +        * [0,1,2,3,4,5,6,x,x,x,x,x]
> > +        *  \.........../
> > +        *    unique_offsets_cnt
> > +        */
> > +       for (i = 1; i < cnt; i++)
> > +               if (*off[i] != *off[ucnt-1])
> > +                       off[ucnt++] = off[i];
> > +
> > +       insn_set->unique_offsets_cnt = ucnt;
> > +       return 0;
> > +}
> 
> 
> Why bother with this optimization in the kernel?
> Shouldn't libbpf give unique already?

So, in a _running_ program, an array may contain non-unique elements.
Example:

  switch(i) {
  case 0:
   ...
  case 2:
   ...
  case 4:
   ...
  default:
   ...
  }

LLVM will generate a jump table of size 6, check that i <= 5, and
point slots 1 and 3 (the missing cases) to "default".

But during verification there is no need to take all the duplicate
branches, hence the "unique" array in the kernel.
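
Illustratively (not exact LLVM output), something like:

    static const void *jt[] = {
            &&case_0, &&lbl_default, &&case_2,
            &&lbl_default, &&case_4, /* ... */
    };

so during verification it is enough to walk each unique target once
instead of every slot.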

> > +
> >  int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog)
> >  {
> >         struct bpf_insn_set *insn_set = cast_insn_set(map);
> > @@ -247,7 +303,10 @@ int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog)
> >         for (i = 0; i < map->max_entries; i++)
> >                 insn_set->ptrs[i].user_value.xlated_off = insn_set->ptrs[i].orig_xlated_off;
> >
> > -       return 0;
> > +       /*
> > +        * Prepare a set of unique offsets
> > +        */
> > +       return bpf_insn_set_init_unique_offsets(insn_set);
> >  }
> >
> >  int bpf_insn_set_ready(struct bpf_map *map)
> > @@ -336,3 +395,13 @@ void bpf_prog_update_insn_ptr(struct bpf_prog *prog,
> >                 }
> >         }
> >  }
> > +
> > +int bpf_insn_set_iter_xlated_offset(struct bpf_map *map, u32 iter_no)
> > +{
> > +       struct bpf_insn_set *insn_set = cast_insn_set(map);
> > +
> > +       if (iter_no >= insn_set->unique_offsets_cnt)
> > +               return -ENOENT;
> > +
> > +       return *insn_set->unique_offsets[iter_no];
> > +}
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index e536a34a32c8..058f5f463b74 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -1706,6 +1706,8 @@ bool bpf_opcode_in_insntable(u8 code)
> >                 [BPF_LD | BPF_IND | BPF_B] = true,
> >                 [BPF_LD | BPF_IND | BPF_H] = true,
> >                 [BPF_LD | BPF_IND | BPF_W] = true,
> > +               [BPF_JMP | BPF_JA | BPF_X] = true,
> > +               [BPF_JMP32 | BPF_JA | BPF_X] = true,
> >                 [BPF_JMP | BPF_JCOND] = true,
> >         };
> >  #undef BPF_INSN_3_TBL
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 8ac9a0b5af53..fba553f844f1 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -206,6 +206,7 @@ static int ref_set_non_owning(struct bpf_verifier_env *env,
> >  static void specialize_kfunc(struct bpf_verifier_env *env,
> >                              u32 func_id, u16 offset, unsigned long *addr);
> >  static bool is_trusted_reg(const struct bpf_reg_state *reg);
> > +static int add_used_map(struct bpf_verifier_env *env, int fd, struct bpf_map **map_ptr);
> >
> >  static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
> >  {
> > @@ -5648,6 +5649,19 @@ static int check_map_access_type(struct bpf_verifier_env *env, u32 regno,
> >         return 0;
> >  }
> >
> > +static int check_insn_set_mem_access(struct bpf_verifier_env *env,
> > +                                    const struct bpf_map *map,
> > +                                    int off, int size, u32 mem_size)
> > +{
> > +       if ((off < 0) || (off % sizeof(long)) || (off/sizeof(long) >= map->max_entries))
> > +               return -EACCES;
> > +
> > +       if (mem_size != 8 || size != 8)
> > +               return -EACCES;
> > +
> > +       return 0;
> > +}
> > +
> >  /* check read/write into memory region (e.g., map value, ringbuf sample, etc) */
> >  static int __check_mem_access(struct bpf_verifier_env *env, int regno,
> >                               int off, int size, u32 mem_size,
> > @@ -5666,6 +5680,10 @@ static int __check_mem_access(struct bpf_verifier_env *env, int regno,
> >                         mem_size, off, size);
> >                 break;
> >         case PTR_TO_MAP_VALUE:
> > +               if (reg->map_ptr->map_type == BPF_MAP_TYPE_INSN_SET &&
> > +                   check_insn_set_mem_access(env, reg->map_ptr, off, size, mem_size) == 0)
> > +                       return 0;
> 
> Don't hack it like this.
> If you're reusing PTR_TO_MAP_VALUE for this then set mem_size correctly
> early on.

Ok, I will see how to make this less hacky. I believe I added the
exception not because of the mem_size, but because of the "off"
(currently, direct memory access is only allowed for maps of size 1).

> >                 verbose(env, "invalid access to map value, value_size=%d off=%d size=%d\n",
> >                         mem_size, off, size);
> >                 break;
> > @@ -7713,12 +7731,18 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
> >  static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type,
> >                              bool allow_trust_mismatch);
> >
> > +static bool map_is_insn_set(struct bpf_map *map)
> > +{
> > +       return map && map->map_type == BPF_MAP_TYPE_INSN_SET;
> > +}
> > +
> >  static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >                           bool strict_alignment_once, bool is_ldsx,
> >                           bool allow_trust_mismatch, const char *ctx)
> >  {
> >         struct bpf_reg_state *regs = cur_regs(env);
> >         enum bpf_reg_type src_reg_type;
> > +       struct bpf_map *map_ptr_copy = NULL;
> >         int err;
> >
> >         /* check src operand */
> > @@ -7733,6 +7757,9 @@ static int /(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >
> >         src_reg_type = regs[insn->src_reg].type;
> >
> > +       if (src_reg_type == PTR_TO_MAP_VALUE && map_is_insn_set(regs[insn->src_reg].map_ptr))
> > +               map_ptr_copy = regs[insn->src_reg].map_ptr;
> > +
> >         /* Check if (src_reg + off) is readable. The state of dst_reg will be
> >          * updated by this call.
> >          */
> > @@ -7743,6 +7770,13 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >                                        allow_trust_mismatch);
> >         err = err ?: reg_bounds_sanity_check(env, &regs[insn->dst_reg], ctx);
> >
> > +       if (map_ptr_copy) {
> > +               regs[insn->dst_reg].type = PTR_TO_INSN;
> > +               regs[insn->dst_reg].map_ptr = map_ptr_copy;
> > +               regs[insn->dst_reg].min_index = regs[insn->src_reg].min_index;
> > +               regs[insn->dst_reg].max_index = regs[insn->src_reg].max_index;
> > +       }
> 
> Not pretty. Let's add another argument to map_direct_value_addr()
> and pass regs[value_regno] to it,
> so that callback can set the reg.type correctly instead
> of defaulting to SCALAR_VALUE like it does today.
> 
> Then the callback for insn_array will set it to PTR_TO_INSN.

But here we're dereferencing it. We need to have different types for

    rx = ldimm64 map          # PTR_TO_MAP_VALUE
    rx = *(u64 *)(rx + 0)     # PTR_TO_INSN

This is required to 1) make sure that it actually was dereferenced
and 2) make sure that it was dereferenced only once.

Or is this a different comment?

> 
> > +
> >         return err;
> >  }
> >
> > @@ -15296,6 +15330,22 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
> >                 return 0;
> >         }
> >
> > +       if (dst_reg->type == PTR_TO_MAP_VALUE && map_is_insn_set(dst_reg->map_ptr)) {
> > +               if (opcode != BPF_ADD) {
> > +                       verbose(env, "Operation %s on ptr to instruction set map is prohibited\n",
> > +                               bpf_alu_string[opcode >> 4]);
> > +                       return -EACCES;
> > +               }
> > +               src_reg = &regs[insn->src_reg];
> > +               if (src_reg->type != SCALAR_VALUE) {
> > +                       verbose(env, "Adding non-scalar R%d to an instruction ptr is prohibited\n",
> > +                               insn->src_reg);
> > +                       return -EACCES;
> > +               }
> 
> Here you need to check src_reg tnum to make sure it 8-byte aligned
> or I'm missing where it's done.
> 
> > +               dst_reg->min_index = src_reg->umin_value / sizeof(long);
> > +               dst_reg->max_index = src_reg->umax_value / sizeof(long);
> 
> Why bother consuming memory with these two fields if they are derivative ?

I've added these fields because when we do

    gotox rx

rx will actually have src_reg->umin_value=0 and src_reg->umax_value=~0,
because rx points to an instruction and thus can hold an arbitrary
address. The proper umin/umax were only correct before rx was
dereferenced from the PTR_TO_MAP_VALUE.
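
To restate with the listing from the commit message (sketch):

    3:   r1 = 0xbeef ll             # PTR_TO_MAP_VALUE
    5:   r1 += r3                   # r3 in [0, 0x13*8] -> min_index/max_index = [0, 0x13]
    6:   r1 = *(u64 *)(r1 + 0x0)    # r1 is now a jitted address: umin=0, umax=~0
    7:   gotox r1                   # only min_index/max_index still describe the targets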

As this patch is to be refactored anyway, I will try to see whether
wasting extra memory here can be avoided.

> > +       }
> > +
> >         if (dst_reg->type != SCALAR_VALUE)
> >                 ptr_reg = dst_reg;
> >
> > @@ -16797,6 +16847,11 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
> >                         __mark_reg_unknown(env, dst_reg);
> >                         return 0;
> >                 }
> > +               if (map->map_type == BPF_MAP_TYPE_INSN_SET) {
> > +                       dst_reg->type = PTR_TO_MAP_VALUE;
> > +                       dst_reg->off = aux->map_off;
> > +                       return 0;
> > +               }
> >                 dst_reg->type = PTR_TO_MAP_VALUE;
> >                 dst_reg->off = aux->map_off;
> >                 WARN_ON_ONCE(map->max_entries != 1);
> 
> Instead of copy pasting two lines, make WARN conditional.

Ok.

> > @@ -17552,6 +17607,62 @@ static int mark_fastcall_patterns(struct bpf_verifier_env *env)
> >         return 0;
> >  }
> >
> > +#define SET_HIGH(STATE, LAST)  STATE = (STATE & 0xffffU) | ((LAST) << 16)
> > +#define GET_HIGH(STATE)                ((u16)((STATE) >> 16))
> > +
> > +static int gotox_sanity_check(struct bpf_verifier_env *env, int from, int to)
> > +{
> > +       /* TBD: check that to belongs to the same BPF function && whatever else */
> > +
> > +       return 0;
> > +}
> > +
> > +static int push_goto_x_edge(int t, struct bpf_verifier_env *env, struct bpf_map *map)
> > +{
> > +       int *insn_stack = env->cfg.insn_stack;
> > +       int *insn_state = env->cfg.insn_state;
> > +       u16 prev_edge = GET_HIGH(insn_state[t]);
> > +       int err;
> > +       int w;
> > +
> > +       w = bpf_insn_set_iter_xlated_offset(map, prev_edge);
> 
> I don't quite understand the algorithm.
> Pls expand the comment.

When we reach a `gotox rx`, rx can contain a pointer to an instruction
loaded from a map M. The verifier needs to try to jump to every
possible location in the map. The bpf_insn_set_iter_xlated_offset()
helper starts from M[0] and iterates over the unique elements of that
set, as described above.
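
Conceptually it boils down to the following (condensed from
push_goto_x_edge() above; the real code pushes one target per visit and
keeps the counter in the high bits of insn_state[t]):

    u32 k;
    int w;

    for (k = 0; ; k++) {
            w = bpf_insn_set_iter_xlated_offset(map, k);
            if (w == -ENOENT)
                    break;          /* all unique targets explored */
            if (w < 0)
                    return w;
            /* push w on insn_stack and explore it as a successor of t */
    }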

> Also insn_successors() needs to support gotox as well.
> It's used by liveness and by scc.

Oh, sure, thanks.

> > +       if (w == -ENOENT)
> > +               return DONE_EXPLORING;
> > +       else if (w < 0)
> > +               return w;
> > +
> > +       err = gotox_sanity_check(env, t, w);
> > +       if (err)
> > +               return err;
> > +
> > +       mark_prune_point(env, t);
> > +
> > +       if (env->cfg.cur_stack >= env->prog->len)
> > +               return -E2BIG;
> > +       insn_stack[env->cfg.cur_stack++] = w;
> > +
> > +       mark_jmp_point(env, w);
> > +
> > +       SET_HIGH(insn_state[t], prev_edge + 1);
> > +       return KEEP_EXPLORING;
> > +}
> > +
> > +/* "conditional jump with N edges" */
> > +static int visit_goto_x_insn(int t, struct bpf_verifier_env *env, int fd)
> > +{
> > +       struct bpf_map *map;
> > +       int ret;
> > +
> > +       ret = add_used_map(env, fd, &map);
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       if (map->map_type != BPF_MAP_TYPE_INSN_SET)
> > +               return -EINVAL;
> > +
> > +       return push_goto_x_edge(t, env, map);
> > +}
> > +
> >  /* Visits the instruction at index t and returns one of the following:
> >   *  < 0 - an error occurred
> >   *  DONE_EXPLORING - the instruction was fully explored
> > @@ -17642,8 +17753,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
> >                 return visit_func_call_insn(t, insns, env, insn->src_reg == BPF_PSEUDO_CALL);
> >
> >         case BPF_JA:
> > -               if (BPF_SRC(insn->code) != BPF_K)
> > -                       return -EINVAL;
> > +               if (BPF_SRC(insn->code) == BPF_X)
> > +                       return visit_goto_x_insn(t, env, insn->imm);
> 
> There should be a check somewhere that checks that insn->imm ==
> insn_array_map_fd is the same map during the main pass of the
> verifier.

The check is below.

> >
> >                 if (BPF_CLASS(insn->code) == BPF_JMP)
> >                         off = insn->off;
> > @@ -17674,6 +17785,13 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
> >         }
> >  }
> >
> > +static bool insn_is_gotox(struct bpf_insn *insn)
> > +{
> > +       return BPF_CLASS(insn->code) == BPF_JMP &&
> > +              BPF_OP(insn->code) == BPF_JA &&
> > +              BPF_SRC(insn->code) == BPF_X;
> > +}
> > +
> >  /* non-recursive depth-first-search to detect loops in BPF program
> >   * loop == back-edge in directed graph
> >   */
> > @@ -18786,11 +18904,22 @@ static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_stat
> >                               struct bpf_func_state *cur, u32 insn_idx, enum exact_level exact)
> >  {
> >         u16 live_regs = env->insn_aux_data[insn_idx].live_regs_before;
> > +       struct bpf_insn *insn;
> >         u16 i;
> >
> >         if (old->callback_depth > cur->callback_depth)
> >                 return false;
> >
> > +       insn = &env->prog->insnsi[insn_idx];
> > +       if (insn_is_gotox(insn)) {
> 
> func_states_equal() shouldn't look back into insn_idx.
> It should use what's in bpf_func_state.

Ok, thanks.

> 
> > +               struct bpf_reg_state *old_dst = &old->regs[insn->dst_reg];
> > +               struct bpf_reg_state *cur_dst = &cur->regs[insn->dst_reg];
> > +
> > +               if (old_dst->min_index != cur_dst->min_index ||
> > +                   old_dst->max_index != cur_dst->max_index)
> > +                       return false;
> 
> Doesn't look right. It should properly compare two PTR_TO_INSN.

Ok, will fix.

> > +       }
> > +
> >         for (i = 0; i < MAX_BPF_REG; i++)
> >                 if (((1 << i) & live_regs) &&
> >                     !regsafe(env, &old->regs[i], &cur->regs[i],
> > @@ -19654,6 +19783,55 @@ static int process_bpf_exit_full(struct bpf_verifier_env *env,
> >         return PROCESS_BPF_EXIT;
> >  }
> >
> > +static int check_indirect_jump(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > +{
> > +       struct bpf_verifier_state *other_branch;
> > +       struct bpf_reg_state *dst_reg;
> > +       struct bpf_map *map;
> > +       int xoff;
> > +       int err;
> > +       u32 i;
> > +
> > +       /* this map should already have been added */
> > +       err = add_used_map(env, insn->imm, &map);
> 
> Found that check.
> Let's not abuse add_used_map() for that.
> Remember map pointer during resolve_pseudo_ldimm64()
> in insn_aux_data for gotox insn.
> No need to call add_used_map() so late.

Yes, thanks. I initially wanted to add something like find_used_map()
(thus the comment above), but saving it in aux is better.
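
I.e. the late check would boil down to something like this (just a
sketch of that direction, assuming the map index gets stashed in
insn_aux_data for the gotox while maps are resolved):

    static int check_gotox_map(struct bpf_verifier_env *env,
                               struct bpf_insn *insn,
                               struct bpf_reg_state *dst_reg)
    {
            struct bpf_insn_aux_data *aux = &env->insn_aux_data[env->insn_idx];
            struct bpf_map *map = env->used_maps[aux->map_index];

            /* the map was remembered earlier, only compare pointers here */
            if (dst_reg->map_ptr != map) {
                    verbose(env, "BPF_JA|BPF_X R%d was loaded from map id=%u, expected id=%u\n",
                            insn->dst_reg, dst_reg->map_ptr->id, map->id);
                    return -EINVAL;
            }
            return 0;
    }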

> > +       if (err < 0)
> > +               return err;
> > +
> > +       dst_reg = reg_state(env, insn->dst_reg);
> > +       if (dst_reg->type != PTR_TO_INSN) {
> > +               verbose(env, "BPF_JA|BPF_X R%d has type %d, expected PTR_TO_INSN\n",
> > +                               insn->dst_reg, dst_reg->type);
> > +               return -EINVAL;
> > +       }
> > +
> > +       if (dst_reg->map_ptr != map) {
> 
> and here it would compare dst_reg->map_ptr with env->used_maps[aux->map_index]

Yes, thanks.

> > +               verbose(env, "BPF_JA|BPF_X R%d was loaded from map id=%u, expected id=%u\n",
> > +                               insn->dst_reg, dst_reg->map_ptr->id, map->id);
> > +               return -EINVAL;
> > +       }
> > +
> > +       if (dst_reg->max_index >= map->max_entries)
> > +               return -EINVAL;
> > +
> > +       for (i = dst_reg->min_index + 1; i <= dst_reg->max_index; i++) {
> > +               xoff = bpf_insn_set_iter_xlated_offset(map, i);
> > +               if (xoff == -ENOENT)
> > +                       break;
> > +               if (xoff < 0)
> > +                       return xoff;
> > +
> > +               other_branch = push_stack(env, xoff, env->insn_idx, false);
> > +               if (!other_branch)
> > +                       return -EFAULT;
> > +       }
> > +
> > +       env->insn_idx = bpf_insn_set_iter_xlated_offset(map, dst_reg->min_index);
> > +       if (env->insn_idx < 0)
> > +               return env->insn_idx;
> > +
> > +       return 0;
> > +}
> > +
> >  static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
> >  {
> >         int err;
> > @@ -19756,6 +19934,9 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
> >
> >                         mark_reg_scratched(env, BPF_REG_0);
> >                 } else if (opcode == BPF_JA) {
> > +                       if (BPF_SRC(insn->code) == BPF_X)
> > +                               return check_indirect_jump(env, insn);
> > +
> >                         if (BPF_SRC(insn->code) != BPF_K ||
> >                             insn->src_reg != BPF_REG_0 ||
> >                             insn->dst_reg != BPF_REG_0 ||
> > @@ -20243,6 +20424,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
> >                 case BPF_MAP_TYPE_QUEUE:
> >                 case BPF_MAP_TYPE_STACK:
> >                 case BPF_MAP_TYPE_ARENA:
> > +               case BPF_MAP_TYPE_INSN_SET:
> >                         break;
> >                 default:
> >                         verbose(env,
> > @@ -20330,10 +20512,11 @@ static int __add_used_map(struct bpf_verifier_env *env, struct bpf_map *map)
> >   * its index.
> >   * Returns <0 on error, or >= 0 index, on success.
> >   */
> > -static int add_used_map(struct bpf_verifier_env *env, int fd)
> > +static int add_used_map(struct bpf_verifier_env *env, int fd, struct bpf_map **map_ptr)
> 
> no need.

Thanks, will revert.

> >  {
> >         struct bpf_map *map;
> >         CLASS(fd, f)(fd);
> > +       int ret;
> >
> >         map = __bpf_map_get(f);
> >         if (IS_ERR(map)) {
> > @@ -20341,7 +20524,10 @@ static int add_used_map(struct bpf_verifier_env *env, int fd)
> >                 return PTR_ERR(map);
> >         }
> >
> > -       return __add_used_map(env, map);
> > +       ret = __add_used_map(env, map);
> > +       if (ret >= 0 && map_ptr)
> > +               *map_ptr = map;
> > +       return ret;
> >  }
> >
> >  /* find and rewrite pseudo imm in ld_imm64 instructions:
> > @@ -20435,7 +20621,7 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
> >                                 break;
> >                         }
> >
> > -                       map_idx = add_used_map(env, fd);
> > +                       map_idx = add_used_map(env, fd, NULL);
> >                         if (map_idx < 0)
> >                                 return map_idx;
> >                         map = env->used_maps[map_idx];
> > @@ -21459,6 +21645,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
> >                 func[i]->aux->jited_linfo = prog->aux->jited_linfo;
> >                 func[i]->aux->linfo_idx = env->subprog_info[i].linfo_idx;
> >                 func[i]->aux->arena = prog->aux->arena;
> > +               func[i]->aux->used_maps = env->used_maps;
> > +               func[i]->aux->used_map_cnt = env->used_map_cnt;
> >                 num_exentries = 0;
> >                 insn = func[i]->insnsi;
> >                 for (j = 0; j < func[i]->len; j++, insn++) {
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 5/9] bpf, x86: add support for indirect jumps
  2025-06-18  3:06   ` Alexei Starovoitov
  2025-06-19 19:57     ` Anton Protopopov
@ 2025-06-19 19:58     ` Anton Protopopov
  1 sibling, 0 replies; 63+ messages in thread
From: Anton Protopopov @ 2025-06-19 19:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Eduard Zingerman, Quentin Monnet, Yonghong Song

On 25/06/17 08:06PM, Alexei Starovoitov wrote:
> On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> >
> > Add support for a new instruction
> >
> >     BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=fd(M)
> >
> > which does an indirect jump to a location stored in Rx. The map M
> > is an instruction set map containing all possible targets for this
> > particular jump.
> >
> > On the jump the register Rx should have type PTR_TO_INSN. This new
> > type assures that the Rx register contains a value (or a range of
> > values) loaded from the map M. Typically, this will be done like in
> > the code below, which could have been, e.g., generated by LLVM for a
> > switch statement:
> >
> >     0:   r3 = r1                    # "switch (r3)"
> >     1:   if r3 > 0x13 goto +0x666   # check r3 boundaries
> >     2:   r3 <<= 0x3                 # r3 is void*, point to an address
> >     3:   r1 = 0xbeef ll             # r1 is PTR_TO_MAP_VALUE, r1->map_ptr=M
> 
> Something doesn't add up.
> Since you made libbpf to tag this ld_imm64 as BPF_PSEUDO_MAP_VALUE
> which insn (map key) does it point to ?
> In case of global data it's key==0.
> Here it's 1st element of insn_array ?

Sorry, could you please rephrase the question here? I do not get it.

> >     5:   r1 += r3                   # r1 inherits boundaries from r3
> >     6:   r1 = *(u64 *)(r1 + 0x0)    # r1 now has type INSN_TO_PTR
> >     7:   gotox r1[,imm=fd(M)]       # verifier checks that M == r1->map_ptr
> >
> > On building the jump graph, and the static analysis, a new function
> > of the INSN_SET is used: bpf_insn_set_iter_xlated_offset(map, n).
> > It lets to iterate over unique slots in an instruction set (equal
> > items can be generated, e.g., for a sparse jump table for a switch,
> > where not all possible branches are taken).
> >
> > Instruction (3) above loads an address of the first element of the
> > map. From BPF point of view, the map is a jump table in native
> > architecture, e.g., an array of jump targets. This patch allows
> > to grab such an address and then later to adjust an offset, like in
> > instruction (5). A value of such type can be dereferenced once to
> > create a PTR_TO_INSN, see instruction (6).
> >
> > When building the CFG, the high 16 bits of the insn_state are
> > used, so this patch (theoretically) supports jump tables of up to
> > 2^16 slots.
> >
> > Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> > ---
> >  arch/x86/net/bpf_jit_comp.c  |   7 ++
> >  include/linux/bpf.h          |   2 +
> >  include/linux/bpf_verifier.h |   4 +
> >  kernel/bpf/bpf_insn_set.c    |  71 ++++++++++++-
> >  kernel/bpf/core.c            |   2 +
> >  kernel/bpf/verifier.c        | 198 ++++++++++++++++++++++++++++++++++-
> >  6 files changed, 278 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 37dc83d91832..d20f6775605d 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -2520,6 +2520,13 @@ st:                      if (is_imm8(insn->off))
> >
> >                         break;
> >
> > +               case BPF_JMP | BPF_JA | BPF_X:
> > +               case BPF_JMP32 | BPF_JA | BPF_X:
> > +                       emit_indirect_jump(&prog,
> > +                                          reg2hex[insn->dst_reg],
> > +                                          is_ereg(insn->dst_reg),
> > +                                          image + addrs[i - 1]);
> > +                       break;
> >                 case BPF_JMP | BPF_JA:
> >                 case BPF_JMP32 | BPF_JA:
> >                         if (BPF_CLASS(insn->code) == BPF_JMP) {
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 008bcd44c60e..3c5eaea2b476 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -952,6 +952,7 @@ enum bpf_reg_type {
> >         PTR_TO_ARENA,
> >         PTR_TO_BUF,              /* reg points to a read/write buffer */
> >         PTR_TO_FUNC,             /* reg points to a bpf program function */
> > +       PTR_TO_INSN,             /* reg points to a bpf program instruction */
> >         CONST_PTR_TO_DYNPTR,     /* reg points to a const struct bpf_dynptr */
> >         __BPF_REG_TYPE_MAX,
> >
> > @@ -3601,6 +3602,7 @@ int bpf_insn_set_ready(struct bpf_map *map);
> >  void bpf_insn_set_release(struct bpf_map *map);
> >  void bpf_insn_set_adjust(struct bpf_map *map, u32 off, u32 len);
> >  void bpf_insn_set_adjust_after_remove(struct bpf_map *map, u32 off, u32 len);
> > +int bpf_insn_set_iter_xlated_offset(struct bpf_map *map, u32 iter_no);
> >
> >  struct bpf_insn_ptr {
> >         void *jitted_ip;
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 84b5e6b25c52..80d9afcca488 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -229,6 +229,10 @@ struct bpf_reg_state {
> >         enum bpf_reg_liveness live;
> >         /* if (!precise && SCALAR_VALUE) min/max/tnum don't affect safety */
> >         bool precise;
> > +
> > +       /* Used to track boundaries of a PTR_TO_INSN */
> > +       u32 min_index;
> > +       u32 max_index;
> 
> This is no go. We cannot grow bpf_reg_state.
> Find a way to reuse fields without increasing the size.
> 
> >  };
> >
> >  enum bpf_stack_slot_type {
> > diff --git a/kernel/bpf/bpf_insn_set.c b/kernel/bpf/bpf_insn_set.c
> > index c20e99327118..316cecad60a9 100644
> > --- a/kernel/bpf/bpf_insn_set.c
> > +++ b/kernel/bpf/bpf_insn_set.c
> > @@ -9,6 +9,8 @@ struct bpf_insn_set {
> >         struct bpf_map map;
> >         struct mutex state_mutex;
> >         int state;
> > +       u32 **unique_offsets;
> > +       u32 unique_offsets_cnt;
> >         long *ips;
> >         DECLARE_FLEX_ARRAY(struct bpf_insn_ptr, ptrs);
> >  };
> > @@ -50,6 +52,7 @@ static void insn_set_free(struct bpf_map *map)
> >  {
> >         struct bpf_insn_set *insn_set = cast_insn_set(map);
> >
> > +       kfree(insn_set->unique_offsets);
> >         kfree(insn_set->ips);
> >         bpf_map_area_free(insn_set);
> >  }
> > @@ -69,6 +72,12 @@ static struct bpf_map *insn_set_alloc(union bpf_attr *attr)
> >                 return ERR_PTR(-ENOMEM);
> >         }
> >
> > +       insn_set->unique_offsets = kzalloc(sizeof(long) * attr->max_entries, GFP_KERNEL);
> > +       if (!insn_set->unique_offsets) {
> > +               insn_set_free(&insn_set->map);
> > +               return ERR_PTR(-ENOMEM);
> > +       }
> > +
> >         bpf_map_init_from_attr(&insn_set->map, attr);
> >
> >         mutex_init(&insn_set->state_mutex);
> > @@ -165,10 +174,25 @@ static u64 insn_set_mem_usage(const struct bpf_map *map)
> >         u64 extra_size = 0;
> >
> >         extra_size += sizeof(long) * map->max_entries; /* insn_set->ips */
> > +       extra_size += 4 * map->max_entries; /* insn_set->unique_offsets */
> >
> >         return insn_set_alloc_size(map->max_entries) + extra_size;
> >  }
> >
> > +static int insn_set_map_direct_value_addr(const struct bpf_map *map, u64 *imm, u32 off)
> > +{
> > +       struct bpf_insn_set *insn_set = cast_insn_set(map);
> > +
> > +       /* for now, just reject all such loads */
> > +       if (off > 0)
> > +               return -EINVAL;
> 
> I bet it's easy enough to make llvm generate such code,
> so this needs to be supported sooner than later.
> 
> > +
> > +       /* from BPF's point of view, this map is a jump table */
> > +       *imm = (unsigned long)insn_set->ips;
> > +
> > +       return 0;
> > +}
> > +
> >  BTF_ID_LIST_SINGLE(insn_set_btf_ids, struct, bpf_insn_set)
> >
> >  const struct bpf_map_ops insn_set_map_ops = {
> > @@ -181,6 +205,7 @@ const struct bpf_map_ops insn_set_map_ops = {
> >         .map_delete_elem = insn_set_delete_elem,
> >         .map_check_btf = insn_set_check_btf,
> >         .map_mem_usage = insn_set_mem_usage,
> > +       .map_direct_value_addr = insn_set_map_direct_value_addr,
> >         .map_btf_id = &insn_set_btf_ids[0],
> >  };
> >
> > @@ -217,6 +242,37 @@ static inline bool valid_offsets(const struct bpf_insn_set *insn_set,
> >         return true;
> >  }
> >
> > +static int cmp_unique_offsets(const void *a, const void *b)
> > +{
> > +       return *(u32 *)a - *(u32 *)b;
> > +}
> > +
> > +static int bpf_insn_set_init_unique_offsets(struct bpf_insn_set *insn_set)
> > +{
> > +       u32 cnt = insn_set->map.max_entries, ucnt = 1;
> > +       u32 **off = insn_set->unique_offsets;
> > +       int i;
> > +
> > +       /* [0,3,2,4,6,5,5,5,1,1,0,0] */
> > +       for (i = 0; i < cnt; i++)
> > +               off[i] = &insn_set->ptrs[i].user_value.xlated_off;
> > +
> > +       /* [0,0,0,1,1,2,3,4,5,5,5,6] */
> > +       sort(off, cnt, sizeof(off[0]), cmp_unique_offsets, NULL);
> > +
> > +       /*
> > +        * [0,1,2,3,4,5,6,x,x,x,x,x]
> > +        *  \.........../
> > +        *    unique_offsets_cnt
> > +        */
> > +       for (i = 1; i < cnt; i++)
> > +               if (*off[i] != *off[ucnt-1])
> > +                       off[ucnt++] = off[i];
> > +
> > +       insn_set->unique_offsets_cnt = ucnt;
> > +       return 0;
> > +}
> 
> 
> Why bother with this optimization in the kernel?
> Shouldn't libbpf give unique already?
> 
> > +
> >  int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog)
> >  {
> >         struct bpf_insn_set *insn_set = cast_insn_set(map);
> > @@ -247,7 +303,10 @@ int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog)
> >         for (i = 0; i < map->max_entries; i++)
> >                 insn_set->ptrs[i].user_value.xlated_off = insn_set->ptrs[i].orig_xlated_off;
> >
> > -       return 0;
> > +       /*
> > +        * Prepare a set of unique offsets
> > +        */
> > +       return bpf_insn_set_init_unique_offsets(insn_set);
> >  }
> >
> >  int bpf_insn_set_ready(struct bpf_map *map)
> > @@ -336,3 +395,13 @@ void bpf_prog_update_insn_ptr(struct bpf_prog *prog,
> >                 }
> >         }
> >  }
> > +
> > +int bpf_insn_set_iter_xlated_offset(struct bpf_map *map, u32 iter_no)
> > +{
> > +       struct bpf_insn_set *insn_set = cast_insn_set(map);
> > +
> > +       if (iter_no >= insn_set->unique_offsets_cnt)
> > +               return -ENOENT;
> > +
> > +       return *insn_set->unique_offsets[iter_no];
> > +}
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index e536a34a32c8..058f5f463b74 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -1706,6 +1706,8 @@ bool bpf_opcode_in_insntable(u8 code)
> >                 [BPF_LD | BPF_IND | BPF_B] = true,
> >                 [BPF_LD | BPF_IND | BPF_H] = true,
> >                 [BPF_LD | BPF_IND | BPF_W] = true,
> > +               [BPF_JMP | BPF_JA | BPF_X] = true,
> > +               [BPF_JMP32 | BPF_JA | BPF_X] = true,
> >                 [BPF_JMP | BPF_JCOND] = true,
> >         };
> >  #undef BPF_INSN_3_TBL
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 8ac9a0b5af53..fba553f844f1 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -206,6 +206,7 @@ static int ref_set_non_owning(struct bpf_verifier_env *env,
> >  static void specialize_kfunc(struct bpf_verifier_env *env,
> >                              u32 func_id, u16 offset, unsigned long *addr);
> >  static bool is_trusted_reg(const struct bpf_reg_state *reg);
> > +static int add_used_map(struct bpf_verifier_env *env, int fd, struct bpf_map **map_ptr);
> >
> >  static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
> >  {
> > @@ -5648,6 +5649,19 @@ static int check_map_access_type(struct bpf_verifier_env *env, u32 regno,
> >         return 0;
> >  }
> >
> > +static int check_insn_set_mem_access(struct bpf_verifier_env *env,
> > +                                    const struct bpf_map *map,
> > +                                    int off, int size, u32 mem_size)
> > +{
> > +       if ((off < 0) || (off % sizeof(long)) || (off/sizeof(long) >= map->max_entries))
> > +               return -EACCES;
> > +
> > +       if (mem_size != 8 || size != 8)
> > +               return -EACCES;
> > +
> > +       return 0;
> > +}
> > +
> >  /* check read/write into memory region (e.g., map value, ringbuf sample, etc) */
> >  static int __check_mem_access(struct bpf_verifier_env *env, int regno,
> >                               int off, int size, u32 mem_size,
> > @@ -5666,6 +5680,10 @@ static int __check_mem_access(struct bpf_verifier_env *env, int regno,
> >                         mem_size, off, size);
> >                 break;
> >         case PTR_TO_MAP_VALUE:
> > +               if (reg->map_ptr->map_type == BPF_MAP_TYPE_INSN_SET &&
> > +                   check_insn_set_mem_access(env, reg->map_ptr, off, size, mem_size) == 0)
> > +                       return 0;
> 
> Don't hack it like this.
> If you're reusing PTR_TO_MAP_VALUE for this then set mem_size correctly
> early on.
> 
> >                 verbose(env, "invalid access to map value, value_size=%d off=%d size=%d\n",
> >                         mem_size, off, size);
> >                 break;
> > @@ -7713,12 +7731,18 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
> >  static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type,
> >                              bool allow_trust_mismatch);
> >
> > +static bool map_is_insn_set(struct bpf_map *map)
> > +{
> > +       return map && map->map_type == BPF_MAP_TYPE_INSN_SET;
> > +}
> > +
> >  static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >                           bool strict_alignment_once, bool is_ldsx,
> >                           bool allow_trust_mismatch, const char *ctx)
> >  {
> >         struct bpf_reg_state *regs = cur_regs(env);
> >         enum bpf_reg_type src_reg_type;
> > +       struct bpf_map *map_ptr_copy = NULL;
> >         int err;
> >
> >         /* check src operand */
> > @@ -7733,6 +7757,9 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >
> >         src_reg_type = regs[insn->src_reg].type;
> >
> > +       if (src_reg_type == PTR_TO_MAP_VALUE && map_is_insn_set(regs[insn->src_reg].map_ptr))
> > +               map_ptr_copy = regs[insn->src_reg].map_ptr;
> > +
> >         /* Check if (src_reg + off) is readable. The state of dst_reg will be
> >          * updated by this call.
> >          */
> > @@ -7743,6 +7770,13 @@ static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >                                        allow_trust_mismatch);
> >         err = err ?: reg_bounds_sanity_check(env, &regs[insn->dst_reg], ctx);
> >
> > +       if (map_ptr_copy) {
> > +               regs[insn->dst_reg].type = PTR_TO_INSN;
> > +               regs[insn->dst_reg].map_ptr = map_ptr_copy;
> > +               regs[insn->dst_reg].min_index = regs[insn->src_reg].min_index;
> > +               regs[insn->dst_reg].max_index = regs[insn->src_reg].max_index;
> > +       }
> 
> Not pretty. Let's add another argument to map_direct_value_addr()
> and pass regs[value_regno] to it,
> so that callback can set the reg.type correctly instead
> of defaulting to SCALAR_VALUE like it does today.
> 
> Then the callback for insn_array will set it to PTR_TO_INSN.
> 
> > +
> >         return err;
> >  }
> >
> > @@ -15296,6 +15330,22 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
> >                 return 0;
> >         }
> >
> > +       if (dst_reg->type == PTR_TO_MAP_VALUE && map_is_insn_set(dst_reg->map_ptr)) {
> > +               if (opcode != BPF_ADD) {
> > +                       verbose(env, "Operation %s on ptr to instruction set map is prohibited\n",
> > +                               bpf_alu_string[opcode >> 4]);
> > +                       return -EACCES;
> > +               }
> > +               src_reg = &regs[insn->src_reg];
> > +               if (src_reg->type != SCALAR_VALUE) {
> > +                       verbose(env, "Adding non-scalar R%d to an instruction ptr is prohibited\n",
> > +                               insn->src_reg);
> > +                       return -EACCES;
> > +               }
> 
> Here you need to check src_reg tnum to make sure it 8-byte aligned
> or I'm missing where it's done.
> 
> > +               dst_reg->min_index = src_reg->umin_value / sizeof(long);
> > +               dst_reg->max_index = src_reg->umax_value / sizeof(long);
> 
> Why bother consuming memory with these two fields if they are derivative ?
> 
> > +       }
> > +
> >         if (dst_reg->type != SCALAR_VALUE)
> >                 ptr_reg = dst_reg;
> >
> > @@ -16797,6 +16847,11 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
> >                         __mark_reg_unknown(env, dst_reg);
> >                         return 0;
> >                 }
> > +               if (map->map_type == BPF_MAP_TYPE_INSN_SET) {
> > +                       dst_reg->type = PTR_TO_MAP_VALUE;
> > +                       dst_reg->off = aux->map_off;
> > +                       return 0;
> > +               }
> >                 dst_reg->type = PTR_TO_MAP_VALUE;
> >                 dst_reg->off = aux->map_off;
> >                 WARN_ON_ONCE(map->max_entries != 1);
> 
> Instead of copy pasting two lines, make WARN conditional.
> 
> > @@ -17552,6 +17607,62 @@ static int mark_fastcall_patterns(struct bpf_verifier_env *env)
> >         return 0;
> >  }
> >
> > +#define SET_HIGH(STATE, LAST)  STATE = (STATE & 0xffffU) | ((LAST) << 16)
> > +#define GET_HIGH(STATE)                ((u16)((STATE) >> 16))
> > +
> > +static int gotox_sanity_check(struct bpf_verifier_env *env, int from, int to)
> > +{
> > +       /* TBD: check that to belongs to the same BPF function && whatever else */
> > +
> > +       return 0;
> > +}
> > +
> > +static int push_goto_x_edge(int t, struct bpf_verifier_env *env, struct bpf_map *map)
> > +{
> > +       int *insn_stack = env->cfg.insn_stack;
> > +       int *insn_state = env->cfg.insn_state;
> > +       u16 prev_edge = GET_HIGH(insn_state[t]);
> > +       int err;
> > +       int w;
> > +
> > +       w = bpf_insn_set_iter_xlated_offset(map, prev_edge);
> 
> I don't quite understand the algorithm.
> Pls expand the comment.
> 
> Also insn_successors() needs to support gotox as well.
> It's used by liveness and by scc.
> 
> > +       if (w == -ENOENT)
> > +               return DONE_EXPLORING;
> > +       else if (w < 0)
> > +               return w;
> > +
> > +       err = gotox_sanity_check(env, t, w);
> > +       if (err)
> > +               return err;
> > +
> > +       mark_prune_point(env, t);
> > +
> > +       if (env->cfg.cur_stack >= env->prog->len)
> > +               return -E2BIG;
> > +       insn_stack[env->cfg.cur_stack++] = w;
> > +
> > +       mark_jmp_point(env, w);
> > +
> > +       SET_HIGH(insn_state[t], prev_edge + 1);
> > +       return KEEP_EXPLORING;
> > +}
> > +
> > +/* "conditional jump with N edges" */
> > +static int visit_goto_x_insn(int t, struct bpf_verifier_env *env, int fd)
> > +{
> > +       struct bpf_map *map;
> > +       int ret;
> > +
> > +       ret = add_used_map(env, fd, &map);
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       if (map->map_type != BPF_MAP_TYPE_INSN_SET)
> > +               return -EINVAL;
> > +
> > +       return push_goto_x_edge(t, env, map);
> > +}
> > +
> >  /* Visits the instruction at index t and returns one of the following:
> >   *  < 0 - an error occurred
> >   *  DONE_EXPLORING - the instruction was fully explored
> > @@ -17642,8 +17753,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
> >                 return visit_func_call_insn(t, insns, env, insn->src_reg == BPF_PSEUDO_CALL);
> >
> >         case BPF_JA:
> > -               if (BPF_SRC(insn->code) != BPF_K)
> > -                       return -EINVAL;
> > +               if (BPF_SRC(insn->code) == BPF_X)
> > +                       return visit_goto_x_insn(t, env, insn->imm);
> 
> There should be a check somewhere that checks that insn->imm ==
> insn_array_map_fd is the same map during the main pass of the
> verifier.
> 
> >
> >                 if (BPF_CLASS(insn->code) == BPF_JMP)
> >                         off = insn->off;
> > @@ -17674,6 +17785,13 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
> >         }
> >  }
> >
> > +static bool insn_is_gotox(struct bpf_insn *insn)
> > +{
> > +       return BPF_CLASS(insn->code) == BPF_JMP &&
> > +              BPF_OP(insn->code) == BPF_JA &&
> > +              BPF_SRC(insn->code) == BPF_X;
> > +}
> > +
> >  /* non-recursive depth-first-search to detect loops in BPF program
> >   * loop == back-edge in directed graph
> >   */
> > @@ -18786,11 +18904,22 @@ static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_stat
> >                               struct bpf_func_state *cur, u32 insn_idx, enum exact_level exact)
> >  {
> >         u16 live_regs = env->insn_aux_data[insn_idx].live_regs_before;
> > +       struct bpf_insn *insn;
> >         u16 i;
> >
> >         if (old->callback_depth > cur->callback_depth)
> >                 return false;
> >
> > +       insn = &env->prog->insnsi[insn_idx];
> > +       if (insn_is_gotox(insn)) {
> 
> func_states_equal() shouldn't look back into insn_idx.
> It should use what's in bpf_func_state.
> 
> > +               struct bpf_reg_state *old_dst = &old->regs[insn->dst_reg];
> > +               struct bpf_reg_state *cur_dst = &cur->regs[insn->dst_reg];
> > +
> > +               if (old_dst->min_index != cur_dst->min_index ||
> > +                   old_dst->max_index != cur_dst->max_index)
> > +                       return false;
> 
> Doesn't look right. It should properly compare two PTR_TO_INSN.
> 
> > +       }
> > +
> >         for (i = 0; i < MAX_BPF_REG; i++)
> >                 if (((1 << i) & live_regs) &&
> >                     !regsafe(env, &old->regs[i], &cur->regs[i],
> > @@ -19654,6 +19783,55 @@ static int process_bpf_exit_full(struct bpf_verifier_env *env,
> >         return PROCESS_BPF_EXIT;
> >  }
> >
> > +static int check_indirect_jump(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > +{
> > +       struct bpf_verifier_state *other_branch;
> > +       struct bpf_reg_state *dst_reg;
> > +       struct bpf_map *map;
> > +       int xoff;
> > +       int err;
> > +       u32 i;
> > +
> > +       /* this map should already have been added */
> > +       err = add_used_map(env, insn->imm, &map);
> 
> Found that check.
> Let's not abuse add_used_map() for that.
> Remember map pointer during resolve_pseudo_ldimm64()
> in insn_aux_data for gotox insn.
> No need to call add_used_map() so late.
> 
> > +       if (err < 0)
> > +               return err;
> > +
> > +       dst_reg = reg_state(env, insn->dst_reg);
> > +       if (dst_reg->type != PTR_TO_INSN) {
> > +               verbose(env, "BPF_JA|BPF_X R%d has type %d, expected PTR_TO_INSN\n",
> > +                               insn->dst_reg, dst_reg->type);
> > +               return -EINVAL;
> > +       }
> > +
> > +       if (dst_reg->map_ptr != map) {
> 
> and here it would compare dst_reg->map_ptr with env->used_maps[aux->map_index]
> 
> > +               verbose(env, "BPF_JA|BPF_X R%d was loaded from map id=%u, expected id=%u\n",
> > +                               insn->dst_reg, dst_reg->map_ptr->id, map->id);
> > +               return -EINVAL;
> > +       }
> > +
> > +       if (dst_reg->max_index >= map->max_entries)
> > +               return -EINVAL;
> > +
> > +       for (i = dst_reg->min_index + 1; i <= dst_reg->max_index; i++) {
> > +               xoff = bpf_insn_set_iter_xlated_offset(map, i);
> > +               if (xoff == -ENOENT)
> > +                       break;
> > +               if (xoff < 0)
> > +                       return xoff;
> > +
> > +               other_branch = push_stack(env, xoff, env->insn_idx, false);
> > +               if (!other_branch)
> > +                       return -EFAULT;
> > +       }
> > +
> > +       env->insn_idx = bpf_insn_set_iter_xlated_offset(map, dst_reg->min_index);
> > +       if (env->insn_idx < 0)
> > +               return env->insn_idx;
> > +
> > +       return 0;
> > +}
> > +
> >  static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
> >  {
> >         int err;
> > @@ -19756,6 +19934,9 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
> >
> >                         mark_reg_scratched(env, BPF_REG_0);
> >                 } else if (opcode == BPF_JA) {
> > +                       if (BPF_SRC(insn->code) == BPF_X)
> > +                               return check_indirect_jump(env, insn);
> > +
> >                         if (BPF_SRC(insn->code) != BPF_K ||
> >                             insn->src_reg != BPF_REG_0 ||
> >                             insn->dst_reg != BPF_REG_0 ||
> > @@ -20243,6 +20424,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
> >                 case BPF_MAP_TYPE_QUEUE:
> >                 case BPF_MAP_TYPE_STACK:
> >                 case BPF_MAP_TYPE_ARENA:
> > +               case BPF_MAP_TYPE_INSN_SET:
> >                         break;
> >                 default:
> >                         verbose(env,
> > @@ -20330,10 +20512,11 @@ static int __add_used_map(struct bpf_verifier_env *env, struct bpf_map *map)
> >   * its index.
> >   * Returns <0 on error, or >= 0 index, on success.
> >   */
> > -static int add_used_map(struct bpf_verifier_env *env, int fd)
> > +static int add_used_map(struct bpf_verifier_env *env, int fd, struct bpf_map **map_ptr)
> 
> no need.
> 
> >  {
> >         struct bpf_map *map;
> >         CLASS(fd, f)(fd);
> > +       int ret;
> >
> >         map = __bpf_map_get(f);
> >         if (IS_ERR(map)) {
> > @@ -20341,7 +20524,10 @@ static int add_used_map(struct bpf_verifier_env *env, int fd)
> >                 return PTR_ERR(map);
> >         }
> >
> > -       return __add_used_map(env, map);
> > +       ret = __add_used_map(env, map);
> > +       if (ret >= 0 && map_ptr)
> > +               *map_ptr = map;
> > +       return ret;
> >  }
> >
> >  /* find and rewrite pseudo imm in ld_imm64 instructions:
> > @@ -20435,7 +20621,7 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
> >                                 break;
> >                         }
> >
> > -                       map_idx = add_used_map(env, fd);
> > +                       map_idx = add_used_map(env, fd, NULL);
> >                         if (map_idx < 0)
> >                                 return map_idx;
> >                         map = env->used_maps[map_idx];
> > @@ -21459,6 +21645,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
> >                 func[i]->aux->jited_linfo = prog->aux->jited_linfo;
> >                 func[i]->aux->linfo_idx = env->subprog_info[i].linfo_idx;
> >                 func[i]->aux->arena = prog->aux->arena;
> > +               func[i]->aux->used_maps = env->used_maps;
> > +               func[i]->aux->used_map_cnt = env->used_map_cnt;
> >                 num_exentries = 0;
> >                 insn = func[i]->insnsi;
> >                 for (j = 0; j < func[i]->len; j++, insn++) {
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 5/9] bpf, x86: add support for indirect jumps
  2025-06-18 11:03   ` Eduard Zingerman
@ 2025-06-19 20:13     ` Anton Protopopov
  0 siblings, 0 replies; 63+ messages in thread
From: Anton Protopopov @ 2025-06-19 20:13 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Quentin Monnet, Yonghong Song

On 25/06/18 04:03AM, Eduard Zingerman wrote:
> On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:
> 
> [...]
> 
> >     0:   r3 = r1                    # "switch (r3)"
> >     1:   if r3 > 0x13 goto +0x666   # check r3 boundaries
> >     2:   r3 <<= 0x3                 # r3 is void*, point to an address
> >     3:   r1 = 0xbeef ll             # r1 is PTR_TO_MAP_VALUE, r1->map_ptr=M
> >     5:   r1 += r3                   # r1 inherits boundaries from r3
> >     6:   r1 = *(u64 *)(r1 + 0x0)    # r1 now has type INSN_TO_PTR
>                                                         ^^^^^^^^^^^
>                                                         PTR_TO_INSN?

Heh, thanks, will fix. [It's C, so a[1] and 1[a] mean the same :-)]

> >     7:   gotox r1[,imm=fd(M)]       # verifier checks that M == r1->map_ptr
> 
> [...]
> 
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 37dc83d91832..d20f6775605d 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -2520,6 +2520,13 @@ st:			if (is_imm8(insn->off))
> >  
> >  			break;
> >  
> > +		case BPF_JMP | BPF_JA | BPF_X:
> > +		case BPF_JMP32 | BPF_JA | BPF_X:
> 
> Is it necessary to add both JMP and JMP32 versions?
> Do we need to extend e.g. bpf_jit_supports_insn() and report an error
> in verifier.c or should we rely on individual jits to report unknown
> instruction?

Hmm, should I keep only the BPF_JMP variant? Or leave it as is and not distinguish between the two?

> 
> > +			emit_indirect_jump(&prog,
> > +					   reg2hex[insn->dst_reg],
> > +					   is_ereg(insn->dst_reg),
> > +					   image + addrs[i - 1]);
> > +			break;
> >  		case BPF_JMP | BPF_JA:
> >  		case BPF_JMP32 | BPF_JA:
> >  			if (BPF_CLASS(insn->code) == BPF_JMP) {
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 008bcd44c60e..3c5eaea2b476 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -952,6 +952,7 @@ enum bpf_reg_type {
> >  	PTR_TO_ARENA,
> >  	PTR_TO_BUF,		 /* reg points to a read/write buffer */
> >  	PTR_TO_FUNC,		 /* reg points to a bpf program function */
> > +	PTR_TO_INSN,		 /* reg points to a bpf program instruction */
> >  	CONST_PTR_TO_DYNPTR,	 /* reg points to a const struct bpf_dynptr */
> >  	__BPF_REG_TYPE_MAX,
> >  
> > @@ -3601,6 +3602,7 @@ int bpf_insn_set_ready(struct bpf_map *map);
> >  void bpf_insn_set_release(struct bpf_map *map);
> >  void bpf_insn_set_adjust(struct bpf_map *map, u32 off, u32 len);
> >  void bpf_insn_set_adjust_after_remove(struct bpf_map *map, u32 off, u32 len);
> > +int bpf_insn_set_iter_xlated_offset(struct bpf_map *map, u32 iter_no);
> 
> This is a horrible name:
> - this function is not an iterator;
> - it is way too long.
> 
> Maybe make it a bit more complex but convenient to use, e.g.:
> 
>   struct bpf_iarray_iter {
> 	struct bpf_map *map;
> 	u32 idx;
>   };
> 
>   struct bpf_iset_iter bpf_iset_make_iter(struct bpf_map *map, u32 lo, u32 hi);
>   bool bpf_iset_iter_next(struct bpf_iarray_iter *it, u32 *offset); // still a horrible name
> 
> This would hide the manipulation with unique indices from verifier.c.
> 
> ?

How about just bpf_insn_set_next[_unique]_offset()?

> 
> >  
> >  struct bpf_insn_ptr {
> >  	void *jitted_ip;
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 84b5e6b25c52..80d9afcca488 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -229,6 +229,10 @@ struct bpf_reg_state {
> >  	enum bpf_reg_liveness live;
> >  	/* if (!precise && SCALAR_VALUE) min/max/tnum don't affect safety */
> >  	bool precise;
> > +
> > +	/* Used to track boundaries of a PTR_TO_INSN */
> > +	u32 min_index;
> > +	u32 max_index;
> 
> Use {umin,umax}_value instead?

Please see my comments in the reply to Alexei.

> >  };
> >  
> >  enum bpf_stack_slot_type {
> > diff --git a/kernel/bpf/bpf_insn_set.c b/kernel/bpf/bpf_insn_set.c
> > index c20e99327118..316cecad60a9 100644
> > --- a/kernel/bpf/bpf_insn_set.c
> > +++ b/kernel/bpf/bpf_insn_set.c
> > @@ -9,6 +9,8 @@ struct bpf_insn_set {
> >  	struct bpf_map map;
> >  	struct mutex state_mutex;
> >  	int state;
> > +	u32 **unique_offsets;
> 
> Why is this a pointer to pointer?
> bpf_insn_set_iter_xlated_offset() is only used during check_cfg() and
> main verification. At that point no instruction movement occurred yet,
> so no need to track `&insn_set->ptrs[i].user_value.xlated_off`?
> 
> > +	u32 unique_offsets_cnt;
> >  	long *ips;
> >  	DECLARE_FLEX_ARRAY(struct bpf_insn_ptr, ptrs);
> >  };
> 
> [...]
> 
> > @@ -15296,6 +15330,22 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
> >  		return 0;
> >  	}
> >  
> > +	if (dst_reg->type == PTR_TO_MAP_VALUE && map_is_insn_set(dst_reg->map_ptr)) {
> > +		if (opcode != BPF_ADD) {
> > +			verbose(env, "Operation %s on ptr to instruction set map is prohibited\n",
> > +				bpf_alu_string[opcode >> 4]);
> > +			return -EACCES;
> > +		}
> > +		src_reg = &regs[insn->src_reg];
> > +		if (src_reg->type != SCALAR_VALUE) {
> > +			verbose(env, "Adding non-scalar R%d to an instruction ptr is prohibited\n",
> > +				insn->src_reg);
> > +			return -EACCES;
> > +		}
> > +		dst_reg->min_index = src_reg->umin_value / sizeof(long);
> > +		dst_reg->max_index = src_reg->umax_value / sizeof(long);
> > +	}
> > +
> 
> What if there are several BPF_ADD on the same PTR_TO_MAP_VALUE in a row?
> Shouldn't the {min,max}_index be accumulated in that case?
> 
> Nit: this should be handled inside adjust_ptr_min_max_vals().

Yes, thanks, I already have this on my TBD list for the next version.
(All "legal" cases generated by LLVM do just one add, so I skipped it
for now.)

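The accumulating variant would be roughly this (a sketch, to be
folded into adjust_ptr_min_max_vals(); field names as in this RFC):

    /* every further BPF_ADD shifts the already known slot range
     * instead of overwriting it
     */
    static void insn_ptr_add_index(struct bpf_reg_state *dst_reg,
                                   const struct bpf_reg_state *src_reg)
    {
            dst_reg->min_index += src_reg->umin_value / sizeof(long);
            dst_reg->max_index += src_reg->umax_value / sizeof(long);
    }
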
> >  	if (dst_reg->type != SCALAR_VALUE)
> >  		ptr_reg = dst_reg;
> >  
> 
> [...]
> 
> > @@ -17552,6 +17607,62 @@ static int mark_fastcall_patterns(struct bpf_verifier_env *env)
> 
> [...]
> 
> > +/* "conditional jump with N edges" */
> > +static int visit_goto_x_insn(int t, struct bpf_verifier_env *env, int fd)
> > +{
> > +	struct bpf_map *map;
> > +	int ret;
> > +
> > +	ret = add_used_map(env, fd, &map);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	if (map->map_type != BPF_MAP_TYPE_INSN_SET)
> > +		return -EINVAL;
> 
> Nit: print something in the log?

Yes, thanks.

> 
> > +
> > +	return push_goto_x_edge(t, env, map);
> > +}
> > +
> 
> [...]
> 
> > @@ -18786,11 +18904,22 @@ static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_stat
> >  			      struct bpf_func_state *cur, u32 insn_idx, enum exact_level exact)
> >  {
> >  	u16 live_regs = env->insn_aux_data[insn_idx].live_regs_before;
> > +	struct bpf_insn *insn;
> >  	u16 i;
> >  
> >  	if (old->callback_depth > cur->callback_depth)
> >  		return false;
> >  
> > +	insn = &env->prog->insnsi[insn_idx];
> > +	if (insn_is_gotox(insn)) {
> > +		struct bpf_reg_state *old_dst = &old->regs[insn->dst_reg];
> > +		struct bpf_reg_state *cur_dst = &cur->regs[insn->dst_reg];
> > +
> > +		if (old_dst->min_index != cur_dst->min_index ||
> > +		    old_dst->max_index != cur_dst->max_index)
> > +			return false;
> > +	}
> > +
> 
> Concur with Alexei, this should be handled by regsafe().
> Also, having cur_dst as a subset of old_dst should be fine.

Thanks, yes to both.

> >  	for (i = 0; i < MAX_BPF_REG; i++)
> >  		if (((1 << i) & live_regs) &&
> >  		    !regsafe(env, &old->regs[i], &cur->regs[i],
> > @@ -19654,6 +19783,55 @@ static int process_bpf_exit_full(struct bpf_verifier_env *env,
> >  	return PROCESS_BPF_EXIT;
> >  }
> >  
> > +static int check_indirect_jump(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > +{
> > +	struct bpf_verifier_state *other_branch;
> > +	struct bpf_reg_state *dst_reg;
> > +	struct bpf_map *map;
> > +	int xoff;
> > +	int err;
> > +	u32 i;
> > +
> > +	/* this map should already have been added */
> > +	err = add_used_map(env, insn->imm, &map);
> > +	if (err < 0)
> > +		return err;
> > +
> > +	dst_reg = reg_state(env, insn->dst_reg);
> > +	if (dst_reg->type != PTR_TO_INSN) {
> > +		verbose(env, "BPF_JA|BPF_X R%d has type %d, expected PTR_TO_INSN\n",
> > +				insn->dst_reg, dst_reg->type);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (dst_reg->map_ptr != map) {
> > +		verbose(env, "BPF_JA|BPF_X R%d was loaded from map id=%u, expected id=%u\n",
> > +				insn->dst_reg, dst_reg->map_ptr->id, map->id);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (dst_reg->max_index >= map->max_entries)
> > +		return -EINVAL;
> > +
> > +	for (i = dst_reg->min_index + 1; i <= dst_reg->max_index; i++) {
> 
> Why +1 is needed in `i = dst_reg->min_index + 1`?

We want to "jump" to targets {min,min+1,...,max}, so we push states
for {min+1,...,max} and continue the current state with a jump to
M[min]:

    env->insn_idx = bpf_insn_set_iter_xlated_offset(map, dst_reg->min_index);
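
Or, with the error handling elided, the order is just (sketch):

    /* M[min+1..max] are pushed and explored later, like the "other
     * branch" of a conditional jump; M[min] continues right away.
     */
    for (i = dst_reg->min_index + 1; i <= dst_reg->max_index; i++)
            if (!push_stack(env, bpf_insn_set_iter_xlated_offset(map, i),
                            env->insn_idx, false))
                    return -ENOMEM;

    /* explored now */
    env->insn_idx = bpf_insn_set_iter_xlated_offset(map, dst_reg->min_index);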

> > +		xoff = bpf_insn_set_iter_xlated_offset(map, i);
> > +		if (xoff == -ENOENT)
> > +			break;
> > +		if (xoff < 0)
> > +			return xoff;
> > +
> > +		other_branch = push_stack(env, xoff, env->insn_idx, false);
> > +		if (!other_branch)
> > +			return -EFAULT;
> 
> Nit: `return -ENOMEM`.

Ok, thanks.

> 
> > +	}
> > +
> > +	env->insn_idx = bpf_insn_set_iter_xlated_offset(map, dst_reg->min_index);
> > +	if (env->insn_idx < 0)
> > +		return env->insn_idx;
> > +
> > +	return 0;
> > +}
> > +
> >  static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
> >  {
> >  	int err;
> 
> [...]
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-06-18 19:49   ` Eduard Zingerman
@ 2025-06-27  2:28     ` Eduard Zingerman
  2025-06-27 10:18       ` Anton Protopopov
  0 siblings, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-06-27  2:28 UTC (permalink / raw)
  To: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Wed, 2025-06-18 at 12:49 -0700, Eduard Zingerman wrote:
> On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:
> 
> [...]
> 
> > @@ -698,6 +712,14 @@ struct bpf_object {
> >  	bool has_subcalls;
> >  	bool has_rodata;
> >  
> > +	const void *rodata;
> > +	size_t rodata_size;
> > +	int rodata_map_fd;
> 
> This is sort-of strange, that jump table metadata resides in one
> section, while jump section itself is in .rodata. Wouldn't it be
> simpler make LLVM emit all jump tables info in one section?
> Also note that Elf_Sym has name, section index, value and size,
> hence symbols defined for jump table section can encode jump tables.
> E.g. the following implementation seems more intuitive:
> 
>   .jumptables
>     <subprog-rel-off-0>
>     <subprog-rel-off-1> | <--- jump table #1 symbol:
>     <subprog-rel-off-2> |        .size = 2   // number of entries in the jump table
>     ...                          .value = 1  // offset within .jumptables
>     <subprog-rel-off-N>                          ^
>                                                  |
>   .text                                          |
>     ...                                          |
>     <insn-N>     <------ relocation referencing -'
>     ...                  jump table #1 symbol

Anton, Yonghong,

I talked to Alexei about this yesterday and we agreed that the above
arrangement (separate jump tables section, separate symbols for each
individual jump table) makes sense on two counts:
- there is no need for the jump table to occupy space in .rodata at
  runtime, actual offsets are read from the map object;
- it simplifies processing on the libbpf side, as there is no need to
  visit both .rodata and the jump table sizes section (see the sketch
  below).
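
On the libbpf side I'd expect something as simple as this to be
enough (a sketch with made-up names; it only assumes that each jump
table is a symbol in the .jumptables section, with .st_value the
offset of the table and .st_size the number of entries, as drawn
above):

    #include <gelf.h>

    struct jt_desc {
            GElf_Word name;     /* symbol name index                      */
            size_t off;         /* offset of the table within .jumptables */
            size_t nr;          /* number of entries                      */
    };

    /* collect one descriptor per jump table symbol, errors elided */
    static int collect_jump_tables(Elf_Scn *symtab, size_t sym_cnt,
                                   size_t jt_shndx, struct jt_desc *out)
    {
            Elf_Data *data = elf_getdata(symtab, NULL);
            size_t i, n = 0;
            GElf_Sym sym;

            for (i = 0; i < sym_cnt; i++) {
                    if (!gelf_getsym(data, i, &sym))
                            return -1;
                    if (sym.st_shndx != jt_shndx)  /* not in .jumptables */
                            continue;
                    out[n].name = sym.st_name;
                    out[n].off  = sym.st_value;
                    out[n].nr   = sym.st_size;
                    n++;
            }
            return n;
    }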

Wdyt?

> > +
> > +	/* Jump Tables */
> > +	struct jt **jt;
> > +	size_t jt_cnt;
> > +
> >  	struct bpf_gen *gen_loader;
> >  
> >  	/* Information when doing ELF related work. Only valid if efile.elf is not NULL */
> 
> [...]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-06-27  2:28     ` Eduard Zingerman
@ 2025-06-27 10:18       ` Anton Protopopov
  2025-07-03 18:21         ` Eduard Zingerman
  0 siblings, 1 reply; 63+ messages in thread
From: Anton Protopopov @ 2025-06-27 10:18 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Quentin Monnet, Yonghong Song

On 25/06/26 07:28PM, Eduard Zingerman wrote:
> On Wed, 2025-06-18 at 12:49 -0700, Eduard Zingerman wrote:
> > On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:
> > 
> > [...]
> > 
> > > @@ -698,6 +712,14 @@ struct bpf_object {
> > >  	bool has_subcalls;
> > >  	bool has_rodata;
> > >  
> > > +	const void *rodata;
> > > +	size_t rodata_size;
> > > +	int rodata_map_fd;
> > 
> > This is sort-of strange, that jump table metadata resides in one
> > section, while jump section itself is in .rodata. Wouldn't it be
> > simpler make LLVM emit all jump tables info in one section?
> > Also note that Elf_Sym has name, section index, value and size,
> > hence symbols defined for jump table section can encode jump tables.
> > E.g. the following implementation seems more intuitive:
> > 
> >   .jumptables
> >     <subprog-rel-off-0>
> >     <subprog-rel-off-1> | <--- jump table #1 symbol:
> >     <subprog-rel-off-2> |        .size = 2   // number of entries in the jump table
> >     ...                          .value = 1  // offset within .jumptables
> >     <subprog-rel-off-N>                          ^
> >                                                  |
> >   .text                                          |
> >     ...                                          |
> >     <insn-N>     <------ relocation referencing -'
> >     ...                  jump table #1 symbol
> 
> Anton, Yonghong,
> 
> I talked to Alexei about this yesterday and we agreed that the above
> arrangement (separate jump tables section, separate symbols for each
> individual jump table) makes sense on two counts:
> - there is no need for jump table to occupy space in .rodata at
>   runtime, actual offsets are read from map object;
> - it simplifies processing on libbpf side, as there is no need to
>   visit both .rodata and jump table size sections.
> 
> Wdyt?

Yes, this seems more straightforward. Also this will look ~ the same
for user-defined (= non-llvm-generated) jump tables.

Yonghong, what do you think, are there any problems with this?
Also, how complex would it be to directly link a gotox instruction
to a particular jump table? (For "user-defined" jump tables, as
opposed to a switch, this is obviously easy to do.)

> > > +
> > > +	/* Jump Tables */
> > > +	struct jt **jt;
> > > +	size_t jt_cnt;
> > > +
> > >  	struct bpf_gen *gen_loader;
> > >  
> > >  	/* Information when doing ELF related work. Only valid if efile.elf is not NULL */
> > 
> > [...]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-06-27 10:18       ` Anton Protopopov
@ 2025-07-03 18:21         ` Eduard Zingerman
  2025-07-03 19:03           ` Anton Protopopov
  2025-07-07 19:07           ` Eduard Zingerman
  0 siblings, 2 replies; 63+ messages in thread
From: Eduard Zingerman @ 2025-07-03 18:21 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Quentin Monnet, Yonghong Song

On Fri, 2025-06-27 at 10:18 +0000, Anton Protopopov wrote:
> On 25/06/26 07:28PM, Eduard Zingerman wrote:
> > On Wed, 2025-06-18 at 12:49 -0700, Eduard Zingerman wrote:
> > > On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:
> > > 
> > > [...]
> > > 
> > > > @@ -698,6 +712,14 @@ struct bpf_object {
> > > >  	bool has_subcalls;
> > > >  	bool has_rodata;
> > > >  
> > > > +	const void *rodata;
> > > > +	size_t rodata_size;
> > > > +	int rodata_map_fd;
> > > 
> > > This is sort-of strange, that jump table metadata resides in one
> > > section, while jump section itself is in .rodata. Wouldn't it be
> > > simpler make LLVM emit all jump tables info in one section?
> > > Also note that Elf_Sym has name, section index, value and size,
> > > hence symbols defined for jump table section can encode jump tables.
> > > E.g. the following implementation seems more intuitive:
> > > 
> > >   .jumptables
> > >     <subprog-rel-off-0>
> > >     <subprog-rel-off-1> | <--- jump table #1 symbol:
> > >     <subprog-rel-off-2> |        .size = 2   // number of entries in the jump table
> > >     ...                          .value = 1  // offset within .jumptables
> > >     <subprog-rel-off-N>                          ^
> > >                                                  |
> > >   .text                                          |
> > >     ...                                          |
> > >     <insn-N>     <------ relocation referencing -'
> > >     ...                  jump table #1 symbol
> > 
> > Anton, Yonghong,
> > 
> > I talked to Alexei about this yesterday and we agreed that the above
> > arrangement (separate jump tables section, separate symbols for each
> > individual jump table) makes sense on two counts:
> > - there is no need for jump table to occupy space in .rodata at
> >   runtime, actual offsets are read from map object;
> > - it simplifies processing on libbpf side, as there is no need to
> >   visit both .rodata and jump table size sections.
> > 
> > Wdyt?
> 
> Yes, this seems more straightforward. Also this will look ~ the same
> for used-defined (= non-llvm-generated) jump tables.
> 
> Yonghong, what do you think, are there any problems with this?
> Also, how complex this would be to directly link a gotox instruction
> to a particular jump table? (For a switch, for "user-defined" jump
> tables this is obviously easy to do.)

I think I know how to hack this:
- in BPFAsmPrinter add a function generating a global symbol for jump
  table (same as MachineFunction::getJTISymbol(), but that one always
  produces a private symbol (one starting with "L"));
- override TargetLowering::getPICJumpTableRelocBaseExpr to use the
  above function;
- modify BPFMCInstLower::Lower to use the above function;
- override AsmPrinter::emitJumpTableInfo, a simplified version of the
  original one:
  - a loop over all jump tables:
	- before each jump table emit start global symbol
	- after each jump table emit temporary symbol to mark jt end
	- set jump table symbol size to
		OutStreamer->emitELFSize(StartSym,
		                         MCBinaryExpr::createSub(MCSymbolRefExpr::create(EndSym, OutContext),
								 MCSymbolRefExpr::create(StartSym, OutContext),
								 OutContext)
	- use AsmPrinter::emitJumpTableEntry to emit individual jump table
      entries;
- plus the code to create jump tables section.

I should be able to share the code for this tomorrow or on the weekend.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-03 18:21         ` Eduard Zingerman
@ 2025-07-03 19:03           ` Anton Protopopov
  2025-07-07 19:07           ` Eduard Zingerman
  1 sibling, 0 replies; 63+ messages in thread
From: Anton Protopopov @ 2025-07-03 19:03 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Quentin Monnet, Yonghong Song

On 25/07/03 11:21AM, Eduard Zingerman wrote:
> On Fri, 2025-06-27 at 10:18 +0000, Anton Protopopov wrote:
> > On 25/06/26 07:28PM, Eduard Zingerman wrote:
> > > On Wed, 2025-06-18 at 12:49 -0700, Eduard Zingerman wrote:
> > > > On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > @@ -698,6 +712,14 @@ struct bpf_object {
> > > > >  	bool has_subcalls;
> > > > >  	bool has_rodata;
> > > > >  
> > > > > +	const void *rodata;
> > > > > +	size_t rodata_size;
> > > > > +	int rodata_map_fd;
> > > > 
> > > > It is sort of strange that the jump table metadata resides in one
> > > > section, while the jump tables themselves are in .rodata. Wouldn't it be
> > > > simpler to make LLVM emit all jump table info in one section?
> > > > Also note that Elf_Sym has name, section index, value and size,
> > > > hence symbols defined for jump table section can encode jump tables.
> > > > E.g. the following implementation seems more intuitive:
> > > > 
> > > >   .jumptables
> > > >     <subprog-rel-off-0>
> > > >     <subprog-rel-off-1> | <--- jump table #1 symbol:
> > > >     <subprog-rel-off-2> |        .size = 2   // number of entries in the jump table
> > > >     ...                          .value = 1  // offset within .jumptables
> > > >     <subprog-rel-off-N>                          ^
> > > >                                                  |
> > > >   .text                                          |
> > > >     ...                                          |
> > > >     <insn-N>     <------ relocation referencing -'
> > > >     ...                  jump table #1 symbol
> > > 
> > > Anton, Yonghong,
> > > 
> > > I talked to Alexei about this yesterday and we agreed that the above
> > > arrangement (separate jump tables section, separate symbols for each
> > > individual jump table) makes sense on two counts:
> > > - there is no need for jump table to occupy space in .rodata at
> > >   runtime, actual offsets are read from map object;
> > > - it simplifies processing on libbpf side, as there is no need to
> > >   visit both .rodata and jump table size sections.
> > > 
> > > Wdyt?
> > 
> > Yes, this seems more straightforward. Also this will look ~ the same
> > for user-defined (= non-llvm-generated) jump tables.
> > 
> > Yonghong, what do you think, are there any problems with this?
> > Also, how complex would it be to directly link a gotox instruction
> > to a particular jump table? (For a switch, for "user-defined" jump
> > tables this is obviously easy to do.)
> 
> I think I know how to hack this:
> - in BPFAsmPrinter add a function generating a global symbol for jump
>   table (same as MachineFunction::getJTISymbol(), but that one always
>   produces a private symbol (one starting with "L"));
> - override TargetLowering::getPICJumpTableRelocBaseExpr to use the
>   above function;
> - modify BPFMCInstLower::Lower to use the above function;
> - override AsmPrinter::emitJumpTableInfo, a simplified version of the
>   original one:
>   - a loop over all jump tables:
> 	- before each jump table emit start global symbol
> 	- after each jump table emit temporary symbol to mark jt end
> 	- set jump table symbol size to
> 		OutStreamer->emitELFSize(StartSym,
> 		                         MCBinaryExpr::createSub(MCSymbolRefExpr::create(EndSym, OutContext),
> 								 MCSymbolRefExpr::create(StartSym, OutContext),
> 								 OutContext)
> 	- use AsmPrinter::emitJumpTableEntry to emit individual jump table
>       entries;
> - plus the code to create jump tables section.
> 
> I should be able to share the code for this tomorrow or on the weekend.

Would be great, thanks a lot for looking into this! I will
try to address the other comments by about the same time. (I am
roughly in the middle of the list now.)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-03 18:21         ` Eduard Zingerman
  2025-07-03 19:03           ` Anton Protopopov
@ 2025-07-07 19:07           ` Eduard Zingerman
  2025-07-07 19:34             ` Anton Protopopov
                               ` (2 more replies)
  1 sibling, 3 replies; 63+ messages in thread
From: Eduard Zingerman @ 2025-07-07 19:07 UTC (permalink / raw)
  To: Anton Protopopov, Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Quentin Monnet

[-- Attachment #1: Type: text/plain, Size: 2377 bytes --]

On Thu, 2025-07-03 at 11:21 -0700, Eduard Zingerman wrote:

[...]

> > > >   .jumptables
> > > >     <subprog-rel-off-0>
> > > >     <subprog-rel-off-1> | <--- jump table #1 symbol:
> > > >     <subprog-rel-off-2> |        .size = 2   // number of entries in the jump table
> > > >     ...                          .value = 1  // offset within .jumptables
> > > >     <subprog-rel-off-N>                          ^
> > > >                                                  |
> > > >   .text                                          |
> > > >     ...                                          |
> > > >     <insn-N>     <------ relocation referencing -'
> > > >     ...                  jump table #1 symbol

[...]

I think I got it working in:
https://github.com/eddyz87/llvm-project/tree/separate-jumptables-section

Changes on top of Yonghong's work.
An example is in the attachment; the gist is:

-------------------------------

$ clang --target=bpf -c -o jump-table-test.o jump-table-test.c
There are 8 section headers, starting at offset 0xaa0:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  ...
  [ 4] .jumptables       PROGBITS        0000000000000000 000220 000260 00      0   0  1
  ...

Symbol table '.symtab' contains 8 entries:
   Num:    Value          Size Type    Bind   Vis       Ndx Name
     ...
     3: 0000000000000000   256 NOTYPE  LOCAL  DEFAULT     4 .BPF.JT.0.0
     4: 0000000000000100   352 NOTYPE  LOCAL  DEFAULT     4 .BPF.JT.0.1
     ...

$ llvm-objdump --no-show-raw-insn -Sdr jump-table-test.o
jump-table-test.o:      file format elf64-bpf

Disassembly of section .text:

0000000000000000 <foo>:
       ...
       6:       r2 <<= 0x3
       7:       r1 = 0x0 ll
                0000000000000038:  R_BPF_64_64  .jumptables
       9:       r1 += r2
      10:       r1 = *(u64 *)(r1 + 0x0)
      11:       gotox r1
      ...
      34:       r2 <<= 0x3
      35:       r1 = 0x100 ll
                0000000000000118:  R_BPF_64_64  .jumptables
      37:       r1 += r2
      38:       r1 = *(u64 *)(r1 + 0x0)
      39:       gotox r1
      ...

-------------------------------

The changes only touch BPF backend. Can be simplified a bit if I move
MachineFunction::getJTISymbol to TargetLowering in the shared LLVM
parts.
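
As a side note for reading the listing: each table is fully described by
its symbol alone, st_value being the offset within .jumptables and st_size
the size in bytes (8 bytes per entry, since the code shifts the index with
"<<= 0x3" and loads a u64). A throwaway sanity check of the numbers above,
not part of any proposed tooling:

#include <cstdint>
#include <cstdio>

// Decode the two jump table symbols from the readelf output above into
// (offset within .jumptables, number of 8-byte entries).
struct JtSym {
  const char *name;
  uint64_t value; // st_value: offset of the table within .jumptables
  uint64_t size;  // st_size: table size in bytes
};

int main() {
  const JtSym syms[] = {
      // 32 entries, matching the "if w1 > 0x1f" guard in the attached disasm
      {".BPF.JT.0.0", 0x000, 256},
      // 44 entries, matching the "if w0 > 0x2b" guard in the attached disasm
      {".BPF.JT.0.1", 0x100, 352},
  };
  for (const JtSym &s : syms)
    std::printf("%s: offset=%#llx entries=%llu\n", s.name,
                (unsigned long long)s.value,
                (unsigned long long)(s.size / 8));
  return 0;
}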

[-- Attachment #2: session.log --]
[-- Type: text/x-log, Size: 4797 bytes --]

$ cat jump-table-test.c
struct simple_ctx { int x; };

int bar(int v);

int foo(struct simple_ctx *ctx)
{
	int ret_user;

        switch (ctx->x) {
        case 0:
                ret_user = 2;
                break;
        case 11:
                ret_user = 3;
                break;
        case 27:
                ret_user = 4;
                break;
        case 31:
                ret_user = 5;
                break;
        default:
                ret_user = 19;
                break;
        }

        switch (bar(ret_user)) {
        case 1:
                ret_user = 5;
                break;
        case 12:
                ret_user = 7;
                break;
        case 27:
                ret_user = 23;
                break;
        case 32:
                ret_user = 37;
                break;
        case 44:
                ret_user = 77;
                break;
        default:
                ret_user = 11;
                break;
        }

        return ret_user;
}

$ clang --target=bpf -c -o jump-table-test.o jump-table-test.c
There are 8 section headers, starting at offset 0xaa0:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .strtab           STRTAB          0000000000000000 000a31 00006b 00      0   0  1
  [ 2] .text             PROGBITS        0000000000000000 000040 0001e0 00  AX  0   0  8
  [ 3] .rel.text         REL             0000000000000000 000540 000030 10   I  7   2  8
  [ 4] .jumptables       PROGBITS        0000000000000000 000220 000260 00      0   0  1
  [ 5] .rel.jumptables   REL             0000000000000000 000570 0004c0 10   I  7   4  8
  [ 6] .llvm_addrsig     LLVM_ADDRSIG    0000000000000000 000a30 000001 00   E  7   0  1
  [ 7] .symtab           SYMTAB          0000000000000000 000480 0000c0 18      1   6  8
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  R (retain), p (processor specific)

Symbol table '.symtab' contains 8 entries:
   Num:    Value          Size Type    Bind   Vis       Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT   ABS jump-table-test.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT     2 .text
     3: 0000000000000000   256 NOTYPE  LOCAL  DEFAULT     4 .BPF.JT.0.0
     4: 0000000000000100   352 NOTYPE  LOCAL  DEFAULT     4 .BPF.JT.0.1
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT     4 .jumptables
     6: 0000000000000000   480 FUNC    GLOBAL DEFAULT     2 foo
     7: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND bar

$ llvm-objdump --no-show-raw-insn -Sdr jump-table-test.o
jump-table-test.o:	file format elf64-bpf

Disassembly of section .text:

0000000000000000 <foo>:
       0:	*(u64 *)(r10 - 0x8) = r1
       1:	r1 = *(u64 *)(r10 - 0x8)
       2:	w1 = *(u32 *)(r1 + 0x0)
       3:	*(u64 *)(r10 - 0x18) = r1
       4:	if w1 > 0x1f goto +0x13 <foo+0xc0>
       5:	r2 = *(u64 *)(r10 - 0x18)
       6:	r2 <<= 0x3
       7:	r1 = 0x0 ll
		0000000000000038:  R_BPF_64_64	.jumptables
       9:	r1 += r2
      10:	r1 = *(u64 *)(r1 + 0x0)
      11:	gotox r1
      12:	w1 = 0x2
      13:	*(u32 *)(r10 - 0xc) = w1
      14:	goto +0xc <foo+0xd8>
      15:	w1 = 0x3
      16:	*(u32 *)(r10 - 0xc) = w1
      17:	goto +0x9 <foo+0xd8>
      18:	w1 = 0x4
      19:	*(u32 *)(r10 - 0xc) = w1
      20:	goto +0x6 <foo+0xd8>
      21:	w1 = 0x5
      22:	*(u32 *)(r10 - 0xc) = w1
      23:	goto +0x3 <foo+0xd8>
      24:	w1 = 0x13
      25:	*(u32 *)(r10 - 0xc) = w1
      26:	goto +0x0 <foo+0xd8>
      27:	w1 = *(u32 *)(r10 - 0xc)
      28:	call -0x1
		00000000000000e0:  R_BPF_64_32	bar
      29:	w0 += -0x1
      30:	w1 = w0
      31:	*(u64 *)(r10 - 0x20) = r1
      32:	if w0 > 0x2b goto +0x16 <foo+0x1b8>
      33:	r2 = *(u64 *)(r10 - 0x20)
      34:	r2 <<= 0x3
      35:	r1 = 0x100 ll
		0000000000000118:  R_BPF_64_64	.jumptables
      37:	r1 += r2
      38:	r1 = *(u64 *)(r1 + 0x0)
      39:	gotox r1
      40:	w1 = 0x5
      41:	*(u32 *)(r10 - 0xc) = w1
      42:	goto +0xf <foo+0x1d0>
      43:	w1 = 0x7
      44:	*(u32 *)(r10 - 0xc) = w1
      45:	goto +0xc <foo+0x1d0>
      46:	w1 = 0x17
      47:	*(u32 *)(r10 - 0xc) = w1
      48:	goto +0x9 <foo+0x1d0>
      49:	w1 = 0x25
      50:	*(u32 *)(r10 - 0xc) = w1
      51:	goto +0x6 <foo+0x1d0>
      52:	w1 = 0x4d
      53:	*(u32 *)(r10 - 0xc) = w1
      54:	goto +0x3 <foo+0x1d0>
      55:	w1 = 0xb
      56:	*(u32 *)(r10 - 0xc) = w1
      57:	goto +0x0 <foo+0x1d0>
      58:	w0 = *(u32 *)(r10 - 0xc)
      59:	exit


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-07 19:07           ` Eduard Zingerman
@ 2025-07-07 19:34             ` Anton Protopopov
  2025-07-07 21:44             ` Yonghong Song
  2025-07-08  8:30             ` Eduard Zingerman
  2 siblings, 0 replies; 63+ messages in thread
From: Anton Protopopov @ 2025-07-07 19:34 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Yonghong Song, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet

On 25/07/07 12:07PM, Eduard Zingerman wrote:
> On Thu, 2025-07-03 at 11:21 -0700, Eduard Zingerman wrote:
> 
> [...]
> 
> > > > >   .jumptables
> > > > >     <subprog-rel-off-0>
> > > > >     <subprog-rel-off-1> | <--- jump table #1 symbol:
> > > > >     <subprog-rel-off-2> |        .size = 2   // number of entries in the jump table
> > > > >     ...                          .value = 1  // offset within .jumptables
> > > > >     <subprog-rel-off-N>                          ^
> > > > >                                                  |
> > > > >   .text                                          |
> > > > >     ...                                          |
> > > > >     <insn-N>     <------ relocation referencing -'
> > > > >     ...                  jump table #1 symbol
> 
> [...]
> 
> I think I got it working in:
> https://github.com/eddyz87/llvm-project/tree/separate-jumptables-section

Awesome! I will try to use it tomorrow.

> Changes on top of Yonghong's work.
> An example is in the attachment; the gist is:
> 
> -------------------------------
> 
> $ clang --target=bpf -c -o jump-table-test.o jump-table-test.c
> There are 8 section headers, starting at offset 0xaa0:
> 
> Section Headers:
>   [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
>   ...
>   [ 4] .jumptables       PROGBITS        0000000000000000 000220 000260 00      0   0  1
>   ...
> 
> Symbol table '.symtab' contains 8 entries:
>    Num:    Value          Size Type    Bind   Vis       Ndx Name
>      ...
>      3: 0000000000000000   256 NOTYPE  LOCAL  DEFAULT     4 .BPF.JT.0.0
>      4: 0000000000000100   352 NOTYPE  LOCAL  DEFAULT     4 .BPF.JT.0.1
>      ...
> 
> $ llvm-objdump --no-show-raw-insn -Sdr jump-table-test.o
> jump-table-test.o:      file format elf64-bpf
> 
> Disassembly of section .text:
> 
> 0000000000000000 <foo>:
>        ...
>        6:       r2 <<= 0x3
>        7:       r1 = 0x0 ll
>                 0000000000000038:  R_BPF_64_64  .jumptables
>        9:       r1 += r2
>       10:       r1 = *(u64 *)(r1 + 0x0)
>       11:       gotox r1
>       ...
>       34:       r2 <<= 0x3
>       35:       r1 = 0x100 ll
>                 0000000000000118:  R_BPF_64_64  .jumptables
>       37:       r1 += r2
>       38:       r1 = *(u64 *)(r1 + 0x0)
>       39:       gotox r1
>       ...
> 
> -------------------------------
> 
> The changes only touch BPF backend. Can be simplified a bit if I move
> MachineFunction::getJTISymbol to TargetLowering in the shared LLVM
> parts.

> $ cat jump-table-test.c
> struct simple_ctx { int x; };
> 
> int bar(int v);
> 
> int foo(struct simple_ctx *ctx)
> {
> 	int ret_user;
> 
>         switch (ctx->x) {
>         case 0:
>                 ret_user = 2;
>                 break;
>         case 11:
>                 ret_user = 3;
>                 break;
>         case 27:
>                 ret_user = 4;
>                 break;
>         case 31:
>                 ret_user = 5;
>                 break;
>         default:
>                 ret_user = 19;
>                 break;
>         }
> 
>         switch (bar(ret_user)) {
>         case 1:
>                 ret_user = 5;
>                 break;
>         case 12:
>                 ret_user = 7;
>                 break;
>         case 27:
>                 ret_user = 23;
>                 break;
>         case 32:
>                 ret_user = 37;
>                 break;
>         case 44:
>                 ret_user = 77;
>                 break;
>         default:
>                 ret_user = 11;
>                 break;
>         }
> 
>         return ret_user;
> }
> 
> $ clang --target=bpf -c -o jump-table-test.o jump-table-test.c
> There are 8 section headers, starting at offset 0xaa0:
> 
> Section Headers:
>   [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
>   [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
>   [ 1] .strtab           STRTAB          0000000000000000 000a31 00006b 00      0   0  1
>   [ 2] .text             PROGBITS        0000000000000000 000040 0001e0 00  AX  0   0  8
>   [ 3] .rel.text         REL             0000000000000000 000540 000030 10   I  7   2  8
>   [ 4] .jumptables       PROGBITS        0000000000000000 000220 000260 00      0   0  1
>   [ 5] .rel.jumptables   REL             0000000000000000 000570 0004c0 10   I  7   4  8
>   [ 6] .llvm_addrsig     LLVM_ADDRSIG    0000000000000000 000a30 000001 00   E  7   0  1
>   [ 7] .symtab           SYMTAB          0000000000000000 000480 0000c0 18      1   6  8
> Key to Flags:
>   W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
>   L (link order), O (extra OS processing required), G (group), T (TLS),
>   C (compressed), x (unknown), o (OS specific), E (exclude),
>   R (retain), p (processor specific)
> 
> Symbol table '.symtab' contains 8 entries:
>    Num:    Value          Size Type    Bind   Vis       Ndx Name
>      0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND 
>      1: 0000000000000000     0 FILE    LOCAL  DEFAULT   ABS jump-table-test.c
>      2: 0000000000000000     0 SECTION LOCAL  DEFAULT     2 .text
>      3: 0000000000000000   256 NOTYPE  LOCAL  DEFAULT     4 .BPF.JT.0.0
>      4: 0000000000000100   352 NOTYPE  LOCAL  DEFAULT     4 .BPF.JT.0.1
>      5: 0000000000000000     0 SECTION LOCAL  DEFAULT     4 .jumptables
>      6: 0000000000000000   480 FUNC    GLOBAL DEFAULT     2 foo
>      7: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND bar
> 
> $ llvm-objdump --no-show-raw-insn -Sdr jump-table-test.o
> jump-table-test.o:	file format elf64-bpf
> 
> Disassembly of section .text:
> 
> 0000000000000000 <foo>:
>        0:	*(u64 *)(r10 - 0x8) = r1
>        1:	r1 = *(u64 *)(r10 - 0x8)
>        2:	w1 = *(u32 *)(r1 + 0x0)
>        3:	*(u64 *)(r10 - 0x18) = r1
>        4:	if w1 > 0x1f goto +0x13 <foo+0xc0>
>        5:	r2 = *(u64 *)(r10 - 0x18)
>        6:	r2 <<= 0x3
>        7:	r1 = 0x0 ll
> 		0000000000000038:  R_BPF_64_64	.jumptables
>        9:	r1 += r2
>       10:	r1 = *(u64 *)(r1 + 0x0)
>       11:	gotox r1
>       12:	w1 = 0x2
>       13:	*(u32 *)(r10 - 0xc) = w1
>       14:	goto +0xc <foo+0xd8>
>       15:	w1 = 0x3
>       16:	*(u32 *)(r10 - 0xc) = w1
>       17:	goto +0x9 <foo+0xd8>
>       18:	w1 = 0x4
>       19:	*(u32 *)(r10 - 0xc) = w1
>       20:	goto +0x6 <foo+0xd8>
>       21:	w1 = 0x5
>       22:	*(u32 *)(r10 - 0xc) = w1
>       23:	goto +0x3 <foo+0xd8>
>       24:	w1 = 0x13
>       25:	*(u32 *)(r10 - 0xc) = w1
>       26:	goto +0x0 <foo+0xd8>
>       27:	w1 = *(u32 *)(r10 - 0xc)
>       28:	call -0x1
> 		00000000000000e0:  R_BPF_64_32	bar
>       29:	w0 += -0x1
>       30:	w1 = w0
>       31:	*(u64 *)(r10 - 0x20) = r1
>       32:	if w0 > 0x2b goto +0x16 <foo+0x1b8>
>       33:	r2 = *(u64 *)(r10 - 0x20)
>       34:	r2 <<= 0x3
>       35:	r1 = 0x100 ll
> 		0000000000000118:  R_BPF_64_64	.jumptables
>       37:	r1 += r2
>       38:	r1 = *(u64 *)(r1 + 0x0)
>       39:	gotox r1
>       40:	w1 = 0x5
>       41:	*(u32 *)(r10 - 0xc) = w1
>       42:	goto +0xf <foo+0x1d0>
>       43:	w1 = 0x7
>       44:	*(u32 *)(r10 - 0xc) = w1
>       45:	goto +0xc <foo+0x1d0>
>       46:	w1 = 0x17
>       47:	*(u32 *)(r10 - 0xc) = w1
>       48:	goto +0x9 <foo+0x1d0>
>       49:	w1 = 0x25
>       50:	*(u32 *)(r10 - 0xc) = w1
>       51:	goto +0x6 <foo+0x1d0>
>       52:	w1 = 0x4d
>       53:	*(u32 *)(r10 - 0xc) = w1
>       54:	goto +0x3 <foo+0x1d0>
>       55:	w1 = 0xb
>       56:	*(u32 *)(r10 - 0xc) = w1
>       57:	goto +0x0 <foo+0x1d0>
>       58:	w0 = *(u32 *)(r10 - 0xc)
>       59:	exit
> 


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-07 19:07           ` Eduard Zingerman
  2025-07-07 19:34             ` Anton Protopopov
@ 2025-07-07 21:44             ` Yonghong Song
  2025-07-08  5:58               ` Yonghong Song
  2025-07-08  8:30             ` Eduard Zingerman
  2 siblings, 1 reply; 63+ messages in thread
From: Yonghong Song @ 2025-07-07 21:44 UTC (permalink / raw)
  To: Eduard Zingerman, Anton Protopopov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Quentin Monnet



On 7/7/25 12:07 PM, Eduard Zingerman wrote:
> On Thu, 2025-07-03 at 11:21 -0700, Eduard Zingerman wrote:
>
> [...]
>
>>>>>    .jumptables
>>>>>      <subprog-rel-off-0>
>>>>>      <subprog-rel-off-1> | <--- jump table #1 symbol:
>>>>>      <subprog-rel-off-2> |        .size = 2   // number of entries in the jump table
>>>>>      ...                          .value = 1  // offset within .jumptables
>>>>>      <subprog-rel-off-N>                          ^
>>>>>                                                   |
>>>>>    .text                                          |
>>>>>      ...                                          |
>>>>>      <insn-N>     <------ relocation referencing -'
>>>>>      ...                  jump table #1 symbol
> [...]
>
> I think I got it working in:
> https://github.com/eddyz87/llvm-project/tree/separate-jumptables-section
>
> Changes on top of Yonghong's work.
> An example is in the attachment; the gist is:
>
> -------------------------------
>
> $ clang --target=bpf -c -o jump-table-test.o jump-table-test.c
> There are 8 section headers, starting at offset 0xaa0:
>
> Section Headers:
>    [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
>    ...
>    [ 4] .jumptables       PROGBITS        0000000000000000 000220 000260 00      0   0  1
>    ...
>
> Symbol table '.symtab' contains 8 entries:
>     Num:    Value          Size Type    Bind   Vis       Ndx Name
>       ...
>       3: 0000000000000000   256 NOTYPE  LOCAL  DEFAULT     4 .BPF.JT.0.0
>       4: 0000000000000100   352 NOTYPE  LOCAL  DEFAULT     4 .BPF.JT.0.1
>       ...
>
> $ llvm-objdump --no-show-raw-insn -Sdr jump-table-test.o
> jump-table-test.o:      file format elf64-bpf
>
> Disassembly of section .text:
>
> 0000000000000000 <foo>:
>         ...
>         6:       r2 <<= 0x3
>         7:       r1 = 0x0 ll
>                  0000000000000038:  R_BPF_64_64  .jumptables
>         9:       r1 += r2
>        10:       r1 = *(u64 *)(r1 + 0x0)
>        11:       gotox r1
>        ...
>        34:       r2 <<= 0x3
>        35:       r1 = 0x100 ll
>                  0000000000000118:  R_BPF_64_64  .jumptables
>        37:       r1 += r2
>        38:       r1 = *(u64 *)(r1 + 0x0)
>        39:       gotox r1
>        ...
>
> -------------------------------
>
> The changes only touch BPF backend. Can be simplified a bit if I move
> MachineFunction::getJTISymbol to TargetLowering in the shared LLVM
> parts.

Thanks, Eduard. I actually also explored a little bit and came up with
the below patch:
   https://github.com/yonghong-song/llvm-project/tree/br-jt-v6-seperate-jmptable
the top commit is the addition on top of https://github.com/llvm/llvm-project/pull/133856.
I tried to leverage existing llvm infrastructure and it will support ELF/XCOFF/COFF
and all backends.

Anton, besides Eduard's patch, please also take a look at the above patch.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-06-18 15:08     ` Anton Protopopov
@ 2025-07-07 23:45       ` Eduard Zingerman
  2025-07-07 23:49         ` Alexei Starovoitov
  0 siblings, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-07-07 23:45 UTC (permalink / raw)
  To: Anton Protopopov, Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Quentin Monnet, Yonghong Song

On Wed, 2025-06-18 at 15:08 +0000, Anton Protopopov wrote:
> On 25/06/17 08:22PM, Alexei Starovoitov wrote:
> > On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> > <a.s.protopopov@gmail.com> wrote:
> > > 
> > > The final line generates an indirect jump. The
> > > format of the indirect jump instruction supported by BPF is
> > > 
> > >     BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=fd(M)
> > > 
> > > and, obviously, the map M must be the same map which was used to
> > > init the register rX. This patch implements this in the following,
> > > hacky, but so far suitable for all existing use-cases, way. On
> > > encountering a `gotox` instruction libbpf tracks back to the
> > > previous direct load from map and stores this map file descriptor
> > > in the gotox instruction.
> > 
> > ...
> > 
> > > +/*
> > > + * This one is too dumb, of course. TBD to make it smarter.
> > > + */
> > > +static int find_jt_map_fd(struct bpf_program *prog, int insn_idx)
> > > +{
> > > +       struct bpf_insn *insn = &prog->insns[insn_idx];
> > > +       __u8 dst_reg = insn->dst_reg;
> > > +
> > > +       /* TBD: this function is such smart for now that it even ignores this
> > > +        * register. Instead, it should backtrack the load more carefully.
> > > +        * (So far even this dumb version works with all selftests.)
> > > +        */
> > > +       pr_debug("searching for a load instruction which populated dst_reg=r%u\n", dst_reg);
> > > +
> > > +       while (--insn >= prog->insns) {
> > > +               if (insn->code == (BPF_LD|BPF_DW|BPF_IMM))
> > > +                       return insn[0].imm;
> > > +       }
> > > +
> > > +       return -ENOENT;
> > > +}
> > > +
> > > +static int bpf_object__patch_gotox(struct bpf_object *obj, struct bpf_program *prog)
> > > +{
> > > +       struct bpf_insn *insn = prog->insns;
> > > +       int map_fd;
> > > +       int i;
> > > +
> > > +       for (i = 0; i < prog->insns_cnt; i++, insn++) {
> > > +               if (!insn_is_gotox(insn))
> > > +                       continue;
> > > +
> > > +               if (obj->gen_loader)
> > > +                       return -EFAULT;
> > > +
> > > +               map_fd = find_jt_map_fd(prog, i);
> > > +               if (map_fd < 0)
> > > +                       return map_fd;
> > > +
> > > +               insn->imm = map_fd;
> > > +       }
> > 
> > This is obviously broken and cannot be made smarter in libbpf.
> > It won't be doing data flow analysis.
> > 
> > The only option I see is to teach llvm to tag jmp_table in gotox.
> > Probably the simplest way is to add the same relo to gotox insn
> > as for ld_imm64. Then libbpf has a direct way to assign
> > the same map_fd into both ld_imm64 and gotox.
> 
> This would be nice.

I did not implement this in the change for the jt section + jt symbols.
It can be added, but thinking about it again, are you sure it is
necessary to have the map fd in the gotox?

Verifier should be smart enough already to track what map the rX in
the `gotox rX` is a derivative of. It can make use of
bpf_insn_aux_data->map_index to enforce that only one map is used with
a particular gotox instruction.

[...]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-07 23:45       ` Eduard Zingerman
@ 2025-07-07 23:49         ` Alexei Starovoitov
  2025-07-08  0:01           ` Eduard Zingerman
  0 siblings, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-07-07 23:49 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Mon, Jul 7, 2025 at 4:45 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2025-06-18 at 15:08 +0000, Anton Protopopov wrote:
> > On 25/06/17 08:22PM, Alexei Starovoitov wrote:
> > > On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> > > <a.s.protopopov@gmail.com> wrote:
> > > >
> > > > The final line generates an indirect jump. The
> > > > format of the indirect jump instruction supported by BPF is
> > > >
> > > >     BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=fd(M)
> > > >
> > > > and, obviously, the map M must be the same map which was used to
> > > > init the register rX. This patch implements this in the following,
> > > > hacky, but so far suitable for all existing use-cases, way. On
> > > > encountering a `gotox` instruction libbpf tracks back to the
> > > > previous direct load from map and stores this map file descriptor
> > > > in the gotox instruction.
> > >
> > > ...
> > >
> > > > +/*
> > > > + * This one is too dumb, of course. TBD to make it smarter.
> > > > + */
> > > > +static int find_jt_map_fd(struct bpf_program *prog, int insn_idx)
> > > > +{
> > > > +       struct bpf_insn *insn = &prog->insns[insn_idx];
> > > > +       __u8 dst_reg = insn->dst_reg;
> > > > +
> > > > +       /* TBD: this function is such smart for now that it even ignores this
> > > > +        * register. Instead, it should backtrack the load more carefully.
> > > > +        * (So far even this dumb version works with all selftests.)
> > > > +        */
> > > > +       pr_debug("searching for a load instruction which populated dst_reg=r%u\n", dst_reg);
> > > > +
> > > > +       while (--insn >= prog->insns) {
> > > > +               if (insn->code == (BPF_LD|BPF_DW|BPF_IMM))
> > > > +                       return insn[0].imm;
> > > > +       }
> > > > +
> > > > +       return -ENOENT;
> > > > +}
> > > > +
> > > > +static int bpf_object__patch_gotox(struct bpf_object *obj, struct bpf_program *prog)
> > > > +{
> > > > +       struct bpf_insn *insn = prog->insns;
> > > > +       int map_fd;
> > > > +       int i;
> > > > +
> > > > +       for (i = 0; i < prog->insns_cnt; i++, insn++) {
> > > > +               if (!insn_is_gotox(insn))
> > > > +                       continue;
> > > > +
> > > > +               if (obj->gen_loader)
> > > > +                       return -EFAULT;
> > > > +
> > > > +               map_fd = find_jt_map_fd(prog, i);
> > > > +               if (map_fd < 0)
> > > > +                       return map_fd;
> > > > +
> > > > +               insn->imm = map_fd;
> > > > +       }
> > >
> > > This is obviously broken and cannot be made smarter in libbpf.
> > > It won't be doing data flow analysis.
> > >
> > > The only option I see is to teach llvm to tag jmp_table in gotox.
> > > Probably the simplest way is to add the same relo to gotox insn
> > > as for ld_imm64. Then libbpf has a direct way to assign
> > > the same map_fd into both ld_imm64 and gotox.
> >
> > This would be nice.
>
> I did not implement this in the change for the jt section + jt symbols.
> It can be added, but thinking about it again, are you sure it is
> necessary to have the map fd in the gotox?
>
> Verifier should be smart enough already to track what map the rX in
> the `gotox rX` is a derivative of. It can make use of
> bpf_insn_aux_data->map_index to enforce that only one map is used with
> a particular gotox instruction.

How would it associate gotox with map (set of IPs) at check_cfg() stage?
llvm needs to help.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-07 23:49         ` Alexei Starovoitov
@ 2025-07-08  0:01           ` Eduard Zingerman
  2025-07-08  0:12             ` Alexei Starovoitov
  0 siblings, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-07-08  0:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Mon, 2025-07-07 at 16:49 -0700, Alexei Starovoitov wrote:
> On Mon, Jul 7, 2025 at 4:45 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Wed, 2025-06-18 at 15:08 +0000, Anton Protopopov wrote:
> > > On 25/06/17 08:22PM, Alexei Starovoitov wrote:
> > > > On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> > > > <a.s.protopopov@gmail.com> wrote:
> > > > > 
> > > > > The final line generates an indirect jump. The
> > > > > format of the indirect jump instruction supported by BPF is
> > > > > 
> > > > >     BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=fd(M)
> > > > > 
> > > > > and, obviously, the map M must be the same map which was used to
> > > > > init the register rX. This patch implements this in the following,
> > > > > hacky, but so far suitable for all existing use-cases, way. On
> > > > > encountering a `gotox` instruction libbpf tracks back to the
> > > > > previous direct load from map and stores this map file descriptor
> > > > > in the gotox instruction.
> > > > 
> > > > ...

[...]

> > > > 
> > > > This is obviously broken and cannot be made smarter in libbpf.
> > > > It won't be doing data flow analysis.
> > > > 
> > > > The only option I see is to teach llvm to tag jmp_table in gotox.
> > > > Probably the simplest way is to add the same relo to gotox insn
> > > > as for ld_imm64. Then libbpf has a direct way to assign
> > > > the same map_fd into both ld_imm64 and gotox.
> > > 
> > > This would be nice.
> > 
> > I did not implement this in the change for the jt section + jt symbols.
> > It can be added, but thinking about it again, are you sure it is
> > necessary to have the map fd in the gotox?
> > 
> > Verifier should be smart enough already to track what map the rX in
> > the `gotox rX` is a derivative of. It can make use of
> > bpf_insn_aux_data->map_index to enforce that only one map is used with
> > a particular gotox instruction.
> 
> How would it associate gotox with map (set of IPs) at check_cfg() stage?
> llvm needs to help.

check_cfg(), right, thank you.
But still, this feels like an artificial limitation.
Just because we have a check_cfg() pass as a separate thing we need
this hint.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-08  0:01           ` Eduard Zingerman
@ 2025-07-08  0:12             ` Alexei Starovoitov
  2025-07-08  0:18               ` Eduard Zingerman
  0 siblings, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-07-08  0:12 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Mon, Jul 7, 2025 at 5:01 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2025-07-07 at 16:49 -0700, Alexei Starovoitov wrote:
> > On Mon, Jul 7, 2025 at 4:45 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Wed, 2025-06-18 at 15:08 +0000, Anton Protopopov wrote:
> > > > On 25/06/17 08:22PM, Alexei Starovoitov wrote:
> > > > > On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> > > > > <a.s.protopopov@gmail.com> wrote:
> > > > > >
> > > > > > The final line generates an indirect jump. The
> > > > > > format of the indirect jump instruction supported by BPF is
> > > > > >
> > > > > >     BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=fd(M)
> > > > > >
> > > > > > and, obviously, the map M must be the same map which was used to
> > > > > > init the register rX. This patch implements this in the following,
> > > > > > hacky, but so far suitable for all existing use-cases, way. On
> > > > > > encountering a `gotox` instruction libbpf tracks back to the
> > > > > > previous direct load from map and stores this map file descriptor
> > > > > > in the gotox instruction.
> > > > >
> > > > > ...
>
> [...]
>
> > > > >
> > > > > This is obviously broken and cannot be made smarter in libbpf.
> > > > > It won't be doing data flow analysis.
> > > > >
> > > > > The only option I see is to teach llvm to tag jmp_table in gotox.
> > > > > Probably the simplest way is to add the same relo to gotox insn
> > > > > as for ld_imm64. Then libbpf has a direct way to assign
> > > > > the same map_fd into both ld_imm64 and gotox.
> > > >
> > > > This would be nice.
> > >
> > > I did not implement this in the change for the jt section + jt symbols.
> > > It can be added, but thinking about it again, are you sure it is
> > > necessary to have the map fd in the gotox?
> > >
> > > Verifier should be smart enough already to track what map the rX in
> > > the `gotox rX` is a derivative of. It can make use of
> > > bpf_insn_aux_data->map_index to enforce that only one map is used with
> > > a particular gotox instruction.
> >
> > How would it associate gotox with map (set of IPs) at check_cfg() stage?
> > llvm needs to help.
>
> check_cfg(), right, thank you.
> But still, this feels like an artificial limitation.
> Just because we have a check_cfg() pass as a separate thing we need
> this hint.

and insn_successors().
All of them have to work before the main verifier analysis.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-08  0:12             ` Alexei Starovoitov
@ 2025-07-08  0:18               ` Eduard Zingerman
  2025-07-08  0:49                 ` Alexei Starovoitov
  0 siblings, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-07-08  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Mon, 2025-07-07 at 17:12 -0700, Alexei Starovoitov wrote:

[...]

> > check_cfg(), right, thank you.
> > But still, this feels like an artificial limitation.
> > Just because we have a check_cfg() pass as a separate thing we need
> > this hint.
> 
> and insn_successors().
> All of them have to work before the main verifier analysis.

Yeah, I see.
In theory, it shouldn't be hard to write a reaching definitions
analysis and make it do an additional pass once a connection between
gotox and a map is established.  And have this run before main
verification pass.

I'll modify llvm branch to emit the label.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-08  0:18               ` Eduard Zingerman
@ 2025-07-08  0:49                 ` Alexei Starovoitov
  2025-07-08  0:51                   ` Eduard Zingerman
  0 siblings, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-07-08  0:49 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Mon, Jul 7, 2025 at 5:18 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2025-07-07 at 17:12 -0700, Alexei Starovoitov wrote:
>
> [...]
>
> > > check_cfg(), right, thank you.
> > > But still, this feels like an artificial limitation.
> > > Just because we have a check_cfg() pass as a separate thing we need
> > > this hint.
> >
> > and insn_successors().
> > All of them have to work before the main verifier analysis.
>
> Yeah, I see.
> In theory, it shouldn't be hard to write a reaching definitions
> analysis and make it do an additional pass once a connection between
> gotox and a map is established.  And have this run before main
> verification pass.

Yes. In theory :) But we don't have it today.
Hence I don't understand the pushback to llvm-aid.
If/when such dataflow analysis is available, we can drop llvm-aid.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-08  0:49                 ` Alexei Starovoitov
@ 2025-07-08  0:51                   ` Eduard Zingerman
  0 siblings, 0 replies; 63+ messages in thread
From: Eduard Zingerman @ 2025-07-08  0:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Mon, 2025-07-07 at 17:49 -0700, Alexei Starovoitov wrote:
> On Mon, Jul 7, 2025 at 5:18 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Mon, 2025-07-07 at 17:12 -0700, Alexei Starovoitov wrote:
> > 
> > [...]
> > 
> > > > check_cfg(), right, thank you.
> > > > But still, this feels like an artificial limitation.
> > > > Just because we have a check_cfg() pass as a separate thing we need
> > > > this hint.
> > > 
> > > and insn_successors().
> > > All of them have to work before the main verifier analysis.
> > 
> > Yeah, I see.
> > In theory, it shouldn't be hard to write a reaching definitions
> > analysis and make it do an additional pass once a connection between
> > gotox and a map is established.  And have this run before main
> > verification pass.
> 
> Yes. In theory :) But we don't have it today.
> Hence I don't understand the pushback to llvm-aid.
> If/when such dataflow analysis is available, we can drop llvm-aid.

No pushback, I forgot about changes needed in check_cfg() + I need to
rant a bit.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-07 21:44             ` Yonghong Song
@ 2025-07-08  5:58               ` Yonghong Song
  0 siblings, 0 replies; 63+ messages in thread
From: Yonghong Song @ 2025-07-08  5:58 UTC (permalink / raw)
  To: Eduard Zingerman, Anton Protopopov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Quentin Monnet


On 7/7/25 2:44 PM, Yonghong Song wrote:
>
>
> On 7/7/25 12:07 PM, Eduard Zingerman wrote:
>> On Thu, 2025-07-03 at 11:21 -0700, Eduard Zingerman wrote:
>>
>> [...]
>>
>>>>>>    .jumptables
>>>>>>      <subprog-rel-off-0>
>>>>>>      <subprog-rel-off-1> | <--- jump table #1 symbol:
>>>>>>      <subprog-rel-off-2> |        .size = 2   // number of entries in the jump table
>>>>>>      ...                          .value = 1  // offset within .jumptables
>>>>>>      <subprog-rel-off-N>                          ^
>>>>>>                                                   |
>>>>>>    .text                                          |
>>>>>>      ...                                          |
>>>>>>      <insn-N>     <------ relocation referencing -'
>>>>>>      ...                  jump table #1 symbol
>> [...]
>>
>> I think I got it working in:
>> https://github.com/eddyz87/llvm-project/tree/separate-jumptables-section
>>
>> Changes on top of Yonghong's work.
>> An example is in the attachment; the gist is:
>>
>> -------------------------------
>>
>> $ clang --target=bpf -c -o jump-table-test.o jump-table-test.c
>> There are 8 section headers, starting at offset 0xaa0:
>>
>> Section Headers:
>>    [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
>>    ...
>>    [ 4] .jumptables       PROGBITS        0000000000000000 000220 000260 00      0   0  1
>>    ...
>>
>> Symbol table '.symtab' contains 8 entries:
>>     Num:    Value          Size Type    Bind   Vis       Ndx Name
>>       ...
>>       3: 0000000000000000   256 NOTYPE  LOCAL  DEFAULT     4 .BPF.JT.0.0
>>       4: 0000000000000100   352 NOTYPE  LOCAL  DEFAULT     4 .BPF.JT.0.1
>>       ...
>>
>> $ llvm-objdump --no-show-raw-insn -Sdr jump-table-test.o
>> jump-table-test.o:      file format elf64-bpf
>>
>> Disassembly of section .text:
>>
>> 0000000000000000 <foo>:
>>         ...
>>         6:       r2 <<= 0x3
>>         7:       r1 = 0x0 ll
>>                  0000000000000038:  R_BPF_64_64  .jumptables
>>         9:       r1 += r2
>>        10:       r1 = *(u64 *)(r1 + 0x0)
>>        11:       gotox r1
>>        ...
>>        34:       r2 <<= 0x3
>>        35:       r1 = 0x100 ll
>>                  0000000000000118:  R_BPF_64_64  .jumptables
>>        37:       r1 += r2
>>        38:       r1 = *(u64 *)(r1 + 0x0)
>>        39:       gotox r1
>>        ...
>>
>> -------------------------------
>>
>> The changes only touch BPF backend. Can be simplified a bit if I move
>> MachineFunction::getJTISymbol to TargetLowering in the shared LLVM
>> parts.
>
> Thanks, Eduard. I actually also explored a little bit and came up with
> the below patch:
> https://github.com/yonghong-song/llvm-project/tree/br-jt-v6-seperate-jmptable
> the top commit is the addition on top of 
> https://github.com/llvm/llvm-project/pull/133856.
> I tried to leverage existing llvm infrastructure and it will support 
> ELF/XCOFF/COFF
> and all backends.
>
> Anton, besides Eduard's patch, please also take a look at the above 
> patch.

Sorry, I have not looked at this patch for a while.

I briefly went through the discussions; the github patch Eduard provided seems
to be the right choice, so please ignore my patch above. Also, as discussed
with Alexei and Eduard, a little bit more work is needed in llvm to help the verifier.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-07 19:07           ` Eduard Zingerman
  2025-07-07 19:34             ` Anton Protopopov
  2025-07-07 21:44             ` Yonghong Song
@ 2025-07-08  8:30             ` Eduard Zingerman
  2025-07-08 10:42               ` Eduard Zingerman
  2 siblings, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-07-08  8:30 UTC (permalink / raw)
  To: Anton Protopopov, Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Quentin Monnet

On Mon, 2025-07-07 at 12:07 -0700, Eduard Zingerman wrote:
> On Thu, 2025-07-03 at 11:21 -0700, Eduard Zingerman wrote:
> 
> [...]
> 
> > > > >   .jumptables
> > > > >     <subprog-rel-off-0>
> > > > >     <subprog-rel-off-1> | <--- jump table #1 symbol:
> > > > >     <subprog-rel-off-2> |        .size = 2   // number of entries in the jump table
> > > > >     ...                          .value = 1  // offset within .jumptables
> > > > >     <subprog-rel-off-N>                          ^
> > > > >                                                  |
> > > > >   .text                                          |
> > > > >     ...                                          |
> > > > >     <insn-N>     <------ relocation referencing -'
> > > > >     ...                  jump table #1 symbol
> 
> [...]
> 
> I think I got it working in:
> https://github.com/eddyz87/llvm-project/tree/separate-jumptables-section
> 
> Changes on top of Yonghong's work.
> An example is in the attachment; the gist is:
> 
> -------------------------------
> 
> $ clang --target=bpf -c -o jump-table-test.o jump-table-test.c
> There are 8 section headers, starting at offset 0xaa0:
> 
> Section Headers:
>   [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
>   ...
>   [ 4] .jumptables       PROGBITS        0000000000000000 000220 000260 00      0   0  1
>   ...
> 
> Symbol table '.symtab' contains 8 entries:
>    Num:    Value          Size Type    Bind   Vis       Ndx Name
>      ...
>      3: 0000000000000000   256 NOTYPE  LOCAL  DEFAULT     4 .BPF.JT.0.0
>      4: 0000000000000100   352 NOTYPE  LOCAL  DEFAULT     4 .BPF.JT.0.1
>      ...
> 
> $ llvm-objdump --no-show-raw-insn -Sdr jump-table-test.o
> jump-table-test.o:      file format elf64-bpf
> 
> Disassembly of section .text:
> 
> 0000000000000000 <foo>:
>        ...
>        6:       r2 <<= 0x3
>        7:       r1 = 0x0 ll
>                 0000000000000038:  R_BPF_64_64  .jumptables

I just realized that this relocation references the wrong symbol.
Instead of .BPF.JT.0.0 it references the .jumptables section itself.
Need more time to investigate.

>        9:       r1 += r2
>       10:       r1 = *(u64 *)(r1 + 0x0)
>       11:       gotox r1

Adding a relocation here requires bending over backwards a little bit.
Need more time to figure this out.

>       ...
>       34:       r2 <<= 0x3
>       35:       r1 = 0x100 ll
>                 0000000000000118:  R_BPF_64_64  .jumptables
>       37:       r1 += r2
>       38:       r1 = *(u64 *)(r1 + 0x0)
>       39:       gotox r1
>       ...
> 
> -------------------------------
> 
> The changes only touch BPF backend. Can be simplified a bit if I move
> MachineFunction::getJTISymbol to TargetLowering in the shared LLVM
> parts.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-08  8:30             ` Eduard Zingerman
@ 2025-07-08 10:42               ` Eduard Zingerman
  0 siblings, 0 replies; 63+ messages in thread
From: Eduard Zingerman @ 2025-07-08 10:42 UTC (permalink / raw)
  To: Anton Protopopov, Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Quentin Monnet

On Tue, 2025-07-08 at 01:30 -0700, Eduard Zingerman wrote:
> On Mon, 2025-07-07 at 12:07 -0700, Eduard Zingerman wrote:
> > On Thu, 2025-07-03 at 11:21 -0700, Eduard Zingerman wrote:
> > 
> > [...]
> > 
> > > > > >   .jumptables
> > > > > >     <subprog-rel-off-0>
> > > > > >     <subprog-rel-off-1> | <--- jump table #1 symbol:
> > > > > >     <subprog-rel-off-2> |        .size = 2   // number of entries in the jump table
> > > > > >     ...                          .value = 1  // offset within .jumptables
> > > > > >     <subprog-rel-off-N>                          ^
> > > > > >                                                  |
> > > > > >   .text                                          |
> > > > > >     ...                                          |
> > > > > >     <insn-N>     <------ relocation referencing -'
> > > > > >     ...                  jump table #1 symbol
> > 
> > [...]
> > 
> > I think I got it working in:
> > https://github.com/eddyz87/llvm-project/tree/separate-jumptables-section

Pushed fixes. Relocations are now emitted for gotox and reference the
correct symbol:

0000000000000000 <foo>:
       0:       if w1 > 0x1f goto +0x10 <foo+0x88>
       1:       w1 = w1
       2:       r1 <<= 0x3
       3:       r2 = 0x0 ll
                0000000000000018:  R_BPF_64_64  .BPF.JT.0.0
       5:       r2 += r1
       6:       r1 = *(u64 *)(r2 + 0x0)
       7:       gotox r1
                0000000000000038:  R_BPF_64_64  .BPF.JT.0.0
       8:       goto +0x8 <foo+0x88>

Two llvm BPF unit tests are failing:

 Failed Tests (2):
  LLVM :: CodeGen/BPF/CORE/offset-reloc-fieldinfo-2-bpfeb.ll
  LLVM :: CodeGen/BPF/CORE/offset-reloc-fieldinfo-2.ll

But I think current state should be sufficient for basic testing.
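
On the loader side this should remove the need for the backtracking
heuristic discussed earlier in the thread: with a relocation on the gotox
itself, associating the instruction with its map becomes a direct lookup.
A rough sketch of that step (hypothetical names, deliberately not actual
libbpf internals):

#include <linux/bpf.h>
#include <cerrno>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical pre-parsed view of a .rel.text entry that targets a gotox
// instruction and references a jump table symbol such as ".BPF.JT.0.0".
struct GotoxReloc {
	std::size_t insn_idx; // index of the gotox within the program
	std::string jt_sym;   // name of the referenced jump table symbol
};

static bool insn_is_gotox(const bpf_insn &insn)
{
	return insn.code == (BPF_JMP | BPF_JA | BPF_X);
}

// jt_map_fds maps a jump table symbol to the fd of the insn_set map created
// for it; returns 0 on success or a negative errno-style value.
static int patch_gotox(std::vector<bpf_insn> &insns,
		       const std::vector<GotoxReloc> &relocs,
		       const std::unordered_map<std::string, int> &jt_map_fds)
{
	for (const GotoxReloc &r : relocs) {
		if (r.insn_idx >= insns.size() || !insn_is_gotox(insns[r.insn_idx]))
			return -EINVAL;

		auto it = jt_map_fds.find(r.jt_sym);
		if (it == jt_map_fds.end())
			return -ENOENT;

		/* same fd that the matching ld_imm64, if any, resolves to */
		insns[r.insn_idx].imm = it->second;
	}
	return 0;
}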

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-06-18  3:22   ` Alexei Starovoitov
  2025-06-18 15:08     ` Anton Protopopov
@ 2025-07-08 20:59     ` Eduard Zingerman
  2025-07-08 21:25       ` Alexei Starovoitov
  2025-07-09  5:33       ` Anton Protopopov
  1 sibling, 2 replies; 63+ messages in thread
From: Eduard Zingerman @ 2025-07-08 20:59 UTC (permalink / raw)
  To: Alexei Starovoitov, Anton Protopopov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Anton Protopopov,
	Daniel Borkmann, Quentin Monnet, Yonghong Song

On Tue, 2025-06-17 at 20:22 -0700, Alexei Starovoitov wrote:
> On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> > 
> > The final line generates an indirect jump. The
> > format of the indirect jump instruction supported by BPF is
> > 
> >     BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=fd(M)
> > 

[...]

> Uglier alternatives is to redesign the gotox encoding and
> drop ld_imm64 and *=8 altogether.
> Then gotox jmp_table[R5] will be like jumbo insn that
> does *=8 and load inside and JIT emits all that.
> But it's ugly and likely has other downsides.

I talked to Alexei and Yonghong off-list, and we seem to be in
agreement that having a single gotox capturing both the map and the
offset looks more elegant. E.g.:

  gotox imm32[dst_reg];

Where imm32 is an fd of the map corresponding to the jump table,
and dst-reg is an offset inside the table (it could also be an index).

So, instead of a current codegen:

  0000000000000000 <foo>:
       ...
       1:       w1 = w1
       2:       r1 <<= 0x3
       3:       r2 = 0x0 ll
                0000000000000018:  R_BPF_64_64  .BPF.JT.0.0
       5:       r2 += r1
       6:       r1 = *(u64 *)(r2 + 0x0)
       7:       gotox r1
                0000000000000038:  R_BPF_64_64  .BPF.JT.0.0

LLVM would produce:

  0000000000000000 <foo>:
       ...
       1:       w1 = w1
       2:       r1 <<= 0x3
       3:       gotox r1
                0000000000000038:  R_BPF_64_64  .BPF.JT.0.0

This sequence leaks a bit less implementation detail and avoids a
check for correspondence between the load and gotox instructions.
It will require using REG_AX on the jit side.
LLVM side implementation is not hard, as it directly maps to `br_jt`
selection DAG instruction.
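
If the encoding settles in this form (still open for discussion at this
point), building such an instruction from userspace would look roughly like
the sketch below. This is only an illustration of the proposed format, not
an agreed-upon ABI; dst_reg carries the offset into the table (or an index)
and imm carries the fd of the insn_set map backing the jump table:

#include <linux/bpf.h>

static struct bpf_insn make_gotox(__u8 index_reg, int jt_map_fd)
{
	struct bpf_insn insn = {};

	insn.code = BPF_JMP | BPF_JA | BPF_X; /* gotox */
	insn.dst_reg = index_reg;             /* offset or index into the table */
	insn.src_reg = 0;
	insn.off = 0;
	insn.imm = jt_map_fd;                 /* fd(M), as in the RFC encoding */
	return insn;
}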

Anton, wdyt?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-08 20:59     ` Eduard Zingerman
@ 2025-07-08 21:25       ` Alexei Starovoitov
  2025-07-08 21:29         ` Eduard Zingerman
  2025-07-09  5:33       ` Anton Protopopov
  1 sibling, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-07-08 21:25 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Tue, Jul 8, 2025 at 1:59 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2025-06-17 at 20:22 -0700, Alexei Starovoitov wrote:
> > On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> > <a.s.protopopov@gmail.com> wrote:
> > >
> > > The final line generates an indirect jump. The
> > > format of the indirect jump instruction supported by BPF is
> > >
> > >     BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=fd(M)
> > >
>
> [...]
>
> > Uglier alternatives is to redesign the gotox encoding and
> > drop ld_imm64 and *=8 altogether.
> > Then gotox jmp_table[R5] will be like jumbo insn that
> > does *=8 and load inside and JIT emits all that.
> > But it's ugly and likely has other downsides.
>
> I talked to Alexei and Yonghong off-list, and we seem to be in
> agreement that having a single gotox capturing both the map and the
> offset looks more elegant. E.g.:
>
>   gotox imm32[dst_reg];
>
> Where imm32 is an fd of the map corresponding to the jump table,
> and dst_reg is an offset inside the table (it could also be an index).
>
> So, instead of a current codegen:
>
>   0000000000000000 <foo>:
>        ...
>        1:       w1 = w1
>        2:       r1 <<= 0x3
>        3:       r2 = 0x0 ll
>                 0000000000000018:  R_BPF_64_64  .BPF.JT.0.0
>        5:       r2 += r1
>        6:       r1 = *(u64 *)(r2 + 0x0)
>        7:       gotox r1
>                 0000000000000038:  R_BPF_64_64  .BPF.JT.0.0
>
> LLVM would produce:
>
>   0000000000000000 <foo>:
>        ...
>        1:       w1 = w1
>        2:       r1 <<= 0x3

If we go this route, let's drop this *8 and make it an index?
Fewer checks in the verifier...

>        3:       gotox r1
>                 0000000000000038:  R_BPF_64_64  .BPF.JT.0.0
>
> This sequence leaks a bit less implementation detail and avoids a
> check for correspondence between the load and gotox instructions.
> It will require using REG_AX on the jit side.
> LLVM side implementation is not hard, as it directly maps to `br_jt`
> selection DAG instruction.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-08 21:25       ` Alexei Starovoitov
@ 2025-07-08 21:29         ` Eduard Zingerman
  0 siblings, 0 replies; 63+ messages in thread
From: Eduard Zingerman @ 2025-07-08 21:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Anton Protopopov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Tue, 2025-07-08 at 14:25 -0700, Alexei Starovoitov wrote:

[...]

> > LLVM would produce:
> > 
> >   0000000000000000 <foo>:
> >        ...
> >        1:       w1 = w1
> >        2:       r1 <<= 0x3
> 
> If we go this route, let's drop this *8 and make it an index?
> Fewer checks in the verifier...
> 
> >        3:       gotox r1
> >                 0000000000000038:  R_BPF_64_64  .BPF.JT.0.0

Makes sense to me.

[...]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-08 20:59     ` Eduard Zingerman
  2025-07-08 21:25       ` Alexei Starovoitov
@ 2025-07-09  5:33       ` Anton Protopopov
  2025-07-09  5:58         ` Eduard Zingerman
  1 sibling, 1 reply; 63+ messages in thread
From: Anton Protopopov @ 2025-07-09  5:33 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On 25/07/08 01:59PM, Eduard Zingerman wrote:
> On Tue, 2025-06-17 at 20:22 -0700, Alexei Starovoitov wrote:
> > On Sun, Jun 15, 2025 at 1:55 AM Anton Protopopov
> > <a.s.protopopov@gmail.com> wrote:
> > > 
> > > The final line generates an indirect jump. The
> > > format of the indirect jump instruction supported by BPF is
> > > 
> > >     BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=fd(M)
> > > 
> 
> [...]
> 
> > An uglier alternative is to redesign the gotox encoding and
> > drop ld_imm64 and *=8 altogether.
> > Then gotox jmp_table[R5] would be like a jumbo insn that
> > does the *=8 and the load inside, and the JIT emits all of that.
> > But it's ugly and likely has other downsides.
> 
> I talked to Alexei and Yonghong off-list, and we seem to be in
> agreement that having a single gotox capturing both the map and the
> offset looks more elegant. E.g.:
> 
>   gotox imm32[dst_reg];
> 
> Where imm32 is an fd of the map corresponding to the jump table,
> and dst_reg is an offset inside the table (it could also be an index).
> 
> So, instead of a current codegen:
> 
>   0000000000000000 <foo>:
>        ...
>        1:       w1 = w1
>        2:       r1 <<= 0x3
>        3:       r2 = 0x0 ll
>                 0000000000000018:  R_BPF_64_64  .BPF.JT.0.0
>        5:       r2 += r1
>        6:       r1 = *(u64 *)(r2 + 0x0)
>        7:       gotox r1
>                 0000000000000038:  R_BPF_64_64  .BPF.JT.0.0
> 
> LLVM would produce:
> 
>   0000000000000000 <foo>:
>        ...
>        1:       w1 = w1
>        2:       r1 <<= 0x3
>        3:       gotox r1
>                 0000000000000038:  R_BPF_64_64  .BPF.JT.0.0
> 
> This sequence leaks fewer implementation details and avoids a
> check for correspondence between the load and gotox instructions.
> It will require using REG_AX on the JIT side.
> The LLVM-side implementation is not hard, as it maps directly to the
> `br_jt` selection DAG node.
> 
> Anton, wdyt?

I think that this is exactly what I had proposed originally in [1],
so yes, IMO this looks more elegant indeed. (Back then the feedback was
that this is too esoteric, and instead the verifier should be taught
to eat what LLVM generates (<<3 and load).) The instruction can be
extended (SRC and OFF are unused) to support more formats later.

>        3:       gotox r1
>                 0000000000000038:  R_BPF_64_64  .BPF.JT.0.0

How hard is it to teach LLVM to generate this?

  [1] https://lpc.events/event/18/contributions/1941/

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-09  5:33       ` Anton Protopopov
@ 2025-07-09  5:58         ` Eduard Zingerman
  2025-07-09  8:38           ` Eduard Zingerman
  0 siblings, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-07-09  5:58 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Wed, 2025-07-09 at 05:33 +0000, Anton Protopopov wrote:

[...]

> I think that this is exactly what I had proposed originally in [1],
> so yes, IMO this looks more elegant indeed. (Back then the feedback was
> that this is too esoteric, and instead the verifier should be taught
> to eat what LLVM generates (<<3 and load).) The instruction can be
> extended (SRC and OFF are unused) to support more formats later.

Well, we came full circle. At least everybody is on the same page now :)

> 
> >        3:       gotox r1
> >                 0000000000000038:  R_BPF_64_64  .BPF.JT.0.0
> 
> How hard is it to teach LLVM to generate this?
> 
>   [1] https://lpc.events/event/18/contributions/1941/

This seems to work:
https://github.com/eddyz87/llvm-project/tree/separate-jumptables-section.1

Needs some tests and probably can be simplified a bit.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-09  5:58         ` Eduard Zingerman
@ 2025-07-09  8:38           ` Eduard Zingerman
  2025-07-10  5:11             ` Eduard Zingerman
  0 siblings, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-07-09  8:38 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

[-- Attachment #1: Type: text/plain, Size: 1145 bytes --]

On Tue, 2025-07-08 at 22:58 -0700, Eduard Zingerman wrote:

[...]

> This seems to work:
> https://github.com/eddyz87/llvm-project/tree/separate-jumptables-section.1

Pushed an update:
- correct offsets computation;
- avoid relocations in the .jumptables section.

Here is how it looks now (updated session.log attached):

  foo:                                    # @foo
  # %bb.0:                                # %entry
  	w1 = *(u32 *)(r1 + 0)
  	if w1 > 31 goto LBB0_2
  # %bb.1:                                # %entry
  .LJTI0_0:
  	.reloc 0, FK_SecRel_8, .BPF.JT.0.0
  	gotox r1
  	goto LBB0_2
  LBB0_7:
  	w1 = 2
  	goto LBB0_3
        ...
  .Lfunc_end0:
  	.size	foo, .Lfunc_end0-foo

  	.section	.jumptables,"",@progbits
  .L0_0_set_7 = LBB0_7-.LJTI0_0
  .L0_0_set_2 = LBB0_2-.LJTI0_0
  .L0_0_set_8 = LBB0_8-.LJTI0_0
  .L0_0_set_9 = LBB0_9-.LJTI0_0
  .L0_0_set_10 = LBB0_10-.LJTI0_0
  .BPF.JT.0.0:
  	.long	.L0_0_set_7
  	.long	.L0_0_set_2
  	.long	.L0_0_set_2
  	.long	.L0_0_set_2
  	.long	.L0_0_set_2
        ...

I think this is a correct form, further changes should be LLVM
internal.

[-- Attachment #2: session.log --]
[-- Type: text/x-log, Size: 7532 bytes --]

$ cat jump-table-test.c

struct simple_ctx {
	unsigned int x;	/* definition assumed; only this field is used below */
};

int bar(int v);

int foo(struct simple_ctx *ctx)
{
	int ret_user;

        switch (ctx->x) {
        case 0:
                ret_user = 2;
                break;
        case 11:
                ret_user = 3;
                break;
        case 27:
                ret_user = 4;
                break;
        case 31:
                ret_user = 5;
                break;
        default:
                ret_user = 19;
                break;
        }

        switch (bar(ret_user)) {
        case 1:
                ret_user = 5;
                break;
        case 12:
                ret_user = 7;
                break;
        case 27:
                ret_user = 23;
                break;
        case 32:
                ret_user = 37;
                break;
        case 44:
                ret_user = 77;
                break;
        default:
                ret_user = 11;
                break;
        }

        return ret_user;
}

$ clang --target=bpf -O2 -S -o - jump-table-test.c
	.text
	.globl	foo                             # -- Begin function foo
	.p2align	3
	.type	foo,@function
foo:                                    # @foo
# %bb.0:                                # %entry
	w1 = *(u32 *)(r1 + 0)
	if w1 > 31 goto LBB0_2
# %bb.1:                                # %entry
.LJTI0_0:
	.reloc 0, FK_SecRel_8, .BPF.JT.0.0
	gotox r1
	goto LBB0_2
LBB0_7:
	w1 = 2
	goto LBB0_3
LBB0_9:                                 # %sw.bb2
	w1 = 4
	goto LBB0_3
LBB0_8:                                 # %sw.bb1
	w1 = 3
	goto LBB0_3
LBB0_10:                                # %sw.bb3
	w1 = 5
	goto LBB0_3
LBB0_2:                                 # %sw.default
	w1 = 19
LBB0_3:                                 # %sw.epilog
	call bar
                                        # kill: def $w0 killed $w0 def $r0
	w0 += -1
	if w0 > 43 goto LBB0_5
# %bb.4:                                # %sw.epilog
.LJTI0_1:
	.reloc 0, FK_SecRel_8, .BPF.JT.0.1
	gotox r0
	goto LBB0_5
LBB0_11:
	w0 = 5
	goto LBB0_6
LBB0_5:                                 # %sw.default9
	w0 = 11
	goto LBB0_6
LBB0_13:                                # %sw.bb6
	w0 = 23
	goto LBB0_6
LBB0_12:                                # %sw.bb5
	w0 = 7
	goto LBB0_6
LBB0_14:                                # %sw.bb7
	w0 = 37
	goto LBB0_6
LBB0_15:                                # %sw.bb8
	w0 = 77
LBB0_6:                                 # %sw.epilog10
	exit
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
	.section	.jumptables,"",@progbits
.L0_0_set_7 = LBB0_7-.LJTI0_0
.L0_0_set_2 = LBB0_2-.LJTI0_0
.L0_0_set_8 = LBB0_8-.LJTI0_0
.L0_0_set_9 = LBB0_9-.LJTI0_0
.L0_0_set_10 = LBB0_10-.LJTI0_0
.BPF.JT.0.0:
	.long	.L0_0_set_7
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_8
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_9
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_2
	.long	.L0_0_set_10
	.size	.BPF.JT.0.0, 128
.L0_1_set_11 = LBB0_11-.LJTI0_1
.L0_1_set_5 = LBB0_5-.LJTI0_1
.L0_1_set_12 = LBB0_12-.LJTI0_1
.L0_1_set_13 = LBB0_13-.LJTI0_1
.L0_1_set_14 = LBB0_14-.LJTI0_1
.L0_1_set_15 = LBB0_15-.LJTI0_1
.BPF.JT.0.1:
	.long	.L0_1_set_11
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_12
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_13
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_14
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_5
	.long	.L0_1_set_15
	.size	.BPF.JT.0.1, 176
                                        # -- End function
	.addrsig

$ clang --target=bpf -O2 -c -o jump-table-test.o jump-table-test.c
$ llvm-readelf -r --symbols --sections jump-table-test.o
Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .strtab           STRTAB          0000000000000000 000320 000067 00      0   0  1
  [ 2] .text             PROGBITS        0000000000000000 000040 0000f0 00  AX  0   0  8
  [ 3] .rel.text         REL             0000000000000000 0002f0 000030 10   I  6   2  8
  [ 4] .jumptables       PROGBITS        0000000000000000 000130 000130 00      0   0  1
  [ 5] .llvm_addrsig     LLVM_ADDRSIG    0000000000000000 000320 000000 00   E  6   0  1
  [ 6] .symtab           SYMTAB          0000000000000000 000260 000090 18      1   2  8
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  R (retain), p (processor specific)

Relocation section '.rel.text' at offset 0x2f0 contains 3 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name
0000000000000010  0000000300000001 R_BPF_64_64            0000000000000000 .BPF.JT.0.0
0000000000000068  000000040000000a R_BPF_64_32            0000000000000000 bar
0000000000000080  0000000500000001 R_BPF_64_64            0000000000000080 .BPF.JT.0.1

Symbol table '.symtab' contains 6 entries:
   Num:    Value          Size Type    Bind   Vis       Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT   ABS jump-table-test.c
     2: 0000000000000000   240 FUNC    GLOBAL DEFAULT     2 foo
     3: 0000000000000000   128 NOTYPE  GLOBAL DEFAULT     4 .BPF.JT.0.0
     4: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND bar
     5: 0000000000000080   176 NOTYPE  GLOBAL DEFAULT     4 .BPF.JT.0.1

$ llvm-objdump --no-show-raw-insn -Sdr jump-table-test.o
jump-table-test.o:	file format elf64-bpf

Disassembly of section .text:

0000000000000000 <foo>:
       0:	w1 = *(u32 *)(r1 + 0x0)
       1:	if w1 > 0x1f goto +0xa <foo+0x60>
       2:	gotox r1
		0000000000000010:  R_BPF_64_64	.BPF.JT.0.0
       3:	goto +0x8 <foo+0x60>
       4:	w1 = 0x2
       5:	goto +0x7 <foo+0x68>
       6:	w1 = 0x4
       7:	goto +0x5 <foo+0x68>
       8:	w1 = 0x3
       9:	goto +0x3 <foo+0x68>
      10:	w1 = 0x5
      11:	goto +0x1 <foo+0x68>
      12:	w1 = 0x13
      13:	call -0x1
		0000000000000068:  R_BPF_64_32	bar
      14:	w0 += -0x1
      15:	if w0 > 0x2b goto +0x4 <foo+0xa0>
      16:	gotox r0
		0000000000000080:  R_BPF_64_64	.BPF.JT.0.1
      17:	goto +0x2 <foo+0xa0>
      18:	w0 = 0x5
      19:	goto +0x9 <foo+0xe8>
      20:	w0 = 0xb
      21:	goto +0x7 <foo+0xe8>
      22:	w0 = 0x17
      23:	goto +0x5 <foo+0xe8>
      24:	w0 = 0x7
      25:	goto +0x3 <foo+0xe8>
      26:	w0 = 0x25
      27:	goto +0x1 <foo+0xe8>
      28:	w0 = 0x4d
      29:	exit

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-09  8:38           ` Eduard Zingerman
@ 2025-07-10  5:11             ` Eduard Zingerman
  2025-07-10  6:10               ` Anton Protopopov
  0 siblings, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-07-10  5:11 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Wed, 2025-07-09 at 01:38 -0700, Eduard Zingerman wrote:
> On Tue, 2025-07-08 at 22:58 -0700, Eduard Zingerman wrote:
> 
> [...]
> 
> > This seems to work:
> > https://github.com/eddyz87/llvm-project/tree/separate-jumptables-section.1

[...]

> I think this is a correct form, further changes should be LLVM
> internal.

Pushed yet another update. Jump table entries computation was off by 1.
Here is a comment from the commit:

--- 8< --------------------------------

Emit JX instruction anchor label:

       .reloc 0, FK_SecRel_8, BPF.JT.0.0
       gotox r1
  .LBPF.JX.0.0:                          <--- this

This label is used to compute jump table entries:

                 .--- basic block label
                 v
  .L0_0_set_7 = LBB0_7 - .LBPF.JX.0.0    <---- JX anchor label
  ...
  BPF.JT.0.0:                            <---- JT definition
       .long   .L0_0_set_7

The anchor needs to be placed after gotox to follow BPF
jump offset rules: dest_pc == jump_pc + off + 1.
For example:

  1: gotox r1 // suppose the r1 value corresponds to LBB0_7
     ...
  5: <insn>   // LBB0_7 physical address

In order to jump to 5 from 1, the offset read from the jump table has to
be 3, hence the anchor should be placed at 2.

-------------------------------- >8 ---

Please let me know if this works end-to-end.
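
In other words, with the anchor placed right after the gotox, the table
entry is exactly the BPF branch offset. A minimal sketch of the
resulting target computation (gotox_dest_pc is an illustrative name,
not the kernel code):

  /* dest_pc == jump_pc + off + 1, with off read from the table */
  static int gotox_dest_pc(int gotox_pc, const int *table, unsigned int idx)
  {
          return gotox_pc + 1 + table[idx];
  }

E.g. a gotox at pc 1 with a table entry of 3 lands at 1 + 1 + 3 = 5,
matching the example above.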

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-10  5:11             ` Eduard Zingerman
@ 2025-07-10  6:10               ` Anton Protopopov
  2025-07-10  6:13                 ` Eduard Zingerman
  0 siblings, 1 reply; 63+ messages in thread
From: Anton Protopopov @ 2025-07-10  6:10 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On 25/07/09 10:11PM, Eduard Zingerman wrote:
> On Wed, 2025-07-09 at 01:38 -0700, Eduard Zingerman wrote:
> > On Tue, 2025-07-08 at 22:58 -0700, Eduard Zingerman wrote:
> > 
> > [...]
> > 
> > > This seems to work:
> > > https://github.com/eddyz87/llvm-project/tree/separate-jumptables-section.1
> 
> [...]
> 
> > I think this is a correct form, further changes should be LLVM
> > internal.
> 
> Pushed yet another update. Jump table entries computation was off by 1.
> Here is a comment from the commit:
> 
> --- 8< --------------------------------
> 
> Emit JX instruction anchor label:
> 
>        .reloc 0, FK_SecRel_8, BPF.JT.0.0
>        gotox r1
>   .LBPF.JX.0.0:                          <--- this
> 
> This label is used to compute jump table entries:
> 
>                  .--- basic block label
>                  v
>   .L0_0_set_7 = LBB0_7 - .LBPF.JX.0.0    <---- JX anchor label
>   ...
>   BPF.JT.0.0:                            <---- JT definition
>        .long   .L0_0_set_7
> 
> The anchor needs to be placed after gotox to follow BPF
> jump offset rules: dest_pc == jump_pc + off + 1.
> For example:
> 
>   1: gotox r1 // suppose the r1 value corresponds to LBB0_7
>      ...
>   5: <insn>   // LBB0_7 physical address
> 
> In order to jump to 5 from 1, the offset read from the jump table has to
> be 3, hence the anchor should be placed at 2.
> 
> -------------------------------- >8 ---
> 
> Please let me know if this works end-to-end.

Thanks! I will be testing this today with my patchset.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps
  2025-07-10  6:10               ` Anton Protopopov
@ 2025-07-10  6:13                 ` Eduard Zingerman
  0 siblings, 0 replies; 63+ messages in thread
From: Eduard Zingerman @ 2025-07-10  6:13 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Anton Protopopov, Daniel Borkmann, Quentin Monnet, Yonghong Song

On Thu, 2025-07-10 at 06:10 +0000, Anton Protopopov wrote:
> On 25/07/09 10:11PM, Eduard Zingerman wrote:
> > On Wed, 2025-07-09 at 01:38 -0700, Eduard Zingerman wrote:
> > > On Tue, 2025-07-08 at 22:58 -0700, Eduard Zingerman wrote:
> > > 
> > > [...]
> > > 
> > > > This seems to work:
> > > > https://github.com/eddyz87/llvm-project/tree/separate-jumptables-section.1
> > 
> > [...]
> > 
> > > I think this is a correct form, further changes should be LLVM
> > > internal.
> > 
> > Pushed yet another update. Jump table entries computation was off by 1.
> > Here is a comment from the commit:
> > 
> > --- 8< --------------------------------
> > 
> > Emit JX instruction anchor label:
> > 
> >        .reloc 0, FK_SecRel_8, BPF.JT.0.0
> >        gotox r1
> >   .LBPF.JX.0.0:                          <--- this
> > 
> > This label is used to compute jump table entries:
> > 
> >                  .--- basic block label
> >                  v
> >   .L0_0_set_7 = LBB0_7 - .LBPF.JX.0.0    <---- JX anchor label
> >   ...
> >   BPF.JT.0.0:                            <---- JT definition
> >        .long   .L0_0_set_7
> > 
> > The anchor needs to be placed after gotox to follow BPF
> > jump offset rules: dest_pc == jump_pc + off + 1.
> > For example:
> > 
> >   1: gotox r1 // suppose the r1 value corresponds to LBB0_7
> >      ...
> >   5: <insn>   // LBB0_7 physical address
> > 
> > In order to jump to 5 from 1, the offset read from the jump table has to
> > be 3, hence the anchor should be placed at 2.
> > 
> > -------------------------------- >8 ---
> > 
> > Please let me know if this works end-to-end.
> 
> Thanks! I will be testing this today with my patchset.

I just realized that /8 is also necessary for table values.
Will push an update in an hour.
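
(For context: the assembler emits the table entries as byte deltas,
label minus anchor, while the gotox offset is counted in instructions,
so wherever the scaling ends up living, the conversion is a division by
the BPF instruction size. jt_entry_to_insn_off below only illustrates
that step; it is not the actual LLVM or libbpf code.)

  #include <linux/bpf.h>

  /* byte delta from the .jumptables section -> instruction offset */
  static int jt_entry_to_insn_off(int byte_delta)
  {
          return byte_delta / (int)sizeof(struct bpf_insn);   /* /8 */
  }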

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2025-07-10  6:13 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-15  8:59 [RFC bpf-next 0/9] BPF indirect jumps Anton Protopopov
2025-06-15  8:59 ` [RFC bpf-next 1/9] bpf: save the start of functions in bpf_prog_aux Anton Protopopov
2025-06-15  8:59 ` [RFC bpf-next 2/9] bpf, x86: add new map type: instructions set Anton Protopopov
2025-06-18  0:57   ` Eduard Zingerman
2025-06-18  2:16     ` Alexei Starovoitov
2025-06-19 18:57       ` Anton Protopopov
2025-06-19 18:55     ` Anton Protopopov
2025-06-19 18:55       ` Eduard Zingerman
2025-06-15  8:59 ` [RFC bpf-next 3/9] selftests/bpf: add selftests for new insn_set map Anton Protopopov
2025-06-18 11:04   ` Eduard Zingerman
2025-06-18 15:16     ` Anton Protopopov
2025-06-15  8:59 ` [RFC bpf-next 4/9] bpf, x86: allow indirect jumps to r8...r15 Anton Protopopov
2025-06-17 19:41   ` Alexei Starovoitov
2025-06-18 14:28     ` Anton Protopopov
2025-06-15  8:59 ` [RFC bpf-next 5/9] bpf, x86: add support for indirect jumps Anton Protopopov
2025-06-18  3:06   ` Alexei Starovoitov
2025-06-19 19:57     ` Anton Protopopov
2025-06-19 19:58     ` Anton Protopopov
2025-06-18 11:03   ` Eduard Zingerman
2025-06-19 20:13     ` Anton Protopopov
2025-06-15  8:59 ` [RFC bpf-next 6/9] bpf: workaround llvm behaviour with " Anton Protopopov
2025-06-18 11:04   ` Eduard Zingerman
2025-06-18 13:59     ` Alexei Starovoitov
2025-06-15  8:59 ` [RFC bpf-next 7/9] bpf: disasm: add support for BPF_JMP|BPF_JA|BPF_X Anton Protopopov
2025-06-15  8:59 ` [RFC bpf-next 8/9] libbpf: support llvm-generated indirect jumps Anton Protopopov
2025-06-18  3:22   ` Alexei Starovoitov
2025-06-18 15:08     ` Anton Protopopov
2025-07-07 23:45       ` Eduard Zingerman
2025-07-07 23:49         ` Alexei Starovoitov
2025-07-08  0:01           ` Eduard Zingerman
2025-07-08  0:12             ` Alexei Starovoitov
2025-07-08  0:18               ` Eduard Zingerman
2025-07-08  0:49                 ` Alexei Starovoitov
2025-07-08  0:51                   ` Eduard Zingerman
2025-07-08 20:59     ` Eduard Zingerman
2025-07-08 21:25       ` Alexei Starovoitov
2025-07-08 21:29         ` Eduard Zingerman
2025-07-09  5:33       ` Anton Protopopov
2025-07-09  5:58         ` Eduard Zingerman
2025-07-09  8:38           ` Eduard Zingerman
2025-07-10  5:11             ` Eduard Zingerman
2025-07-10  6:10               ` Anton Protopopov
2025-07-10  6:13                 ` Eduard Zingerman
2025-06-18 19:49   ` Eduard Zingerman
2025-06-27  2:28     ` Eduard Zingerman
2025-06-27 10:18       ` Anton Protopopov
2025-07-03 18:21         ` Eduard Zingerman
2025-07-03 19:03           ` Anton Protopopov
2025-07-07 19:07           ` Eduard Zingerman
2025-07-07 19:34             ` Anton Protopopov
2025-07-07 21:44             ` Yonghong Song
2025-07-08  5:58               ` Yonghong Song
2025-07-08  8:30             ` Eduard Zingerman
2025-07-08 10:42               ` Eduard Zingerman
2025-06-15  8:59 ` [RFC bpf-next 9/9] selftests/bpf: add selftests for " Anton Protopopov
2025-06-18  3:24   ` Alexei Starovoitov
2025-06-18 14:49     ` Anton Protopopov
2025-06-18 16:01       ` Alexei Starovoitov
2025-06-18 16:36         ` Anton Protopopov
2025-06-18 16:43           ` Alexei Starovoitov
2025-06-18 20:25             ` Anton Protopopov
2025-06-18 21:59               ` Alexei Starovoitov
2025-06-19  5:05                 ` Anton Protopopov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).