[RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs

* [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs
@ 2026-02-09 15:59 Leon Hwang
  2026-02-09 15:59 ` [RFC PATCH bpf-next 1/4] " Leon Hwang
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC
  To: bpf; +Cc: ast, andrii, daniel, Leon Hwang

Introduce the following 64-bit bitops kfuncs for x86_64 and arm64:

* bpf_clz64(): Count leading zeros.
* bpf_ctz64(): Count trailing zeros.
* bpf_ffs64(): Find first set bit, 1-based index, returns 0 when input is 0.
* bpf_fls64(): Find last set bit, 1-based index.
* bpf_bitrev64(): Reverse bits.
* bpf_popcnt64(): Population count.
* bpf_rol64(): Rotate left.
* bpf_ror64(): Rotate right.

In particular, for a zero input:

* bpf_clz64(0) = 64
* bpf_ctz64(0) = 64
* bpf_ffs64(0) = 0
* bpf_fls64(0) = 0
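To make the zero-input rules concrete, here is a minimal userspace sketch of the same semantics. It assumes a GCC- or Clang-compatible compiler for __builtin_clzll()/__builtin_ctzll(). Both builtins are undefined for 0, so the sketch handles the zero case explicitly. The ref_* and check_bitops_identities() names are illustrative only and are not part of the series.

```c
#include <assert.h>
#include <stdint.h>

/* Leading zeros; 64 for x == 0 (the builtin itself is undefined at 0). */
static uint64_t ref_clz64(uint64_t x)
{
	return x ? (uint64_t)__builtin_clzll(x) : 64;
}

/* Trailing zeros; 64 for x == 0. */
static uint64_t ref_ctz64(uint64_t x)
{
	return x ? (uint64_t)__builtin_ctzll(x) : 64;
}

/* 1-based index of the lowest set bit, 0 for x == 0. */
static uint64_t ref_ffs64(uint64_t x)
{
	return x ? ref_ctz64(x) + 1 : 0;
}

/* 1-based index of the highest set bit, 0 for x == 0. */
static uint64_t ref_fls64(uint64_t x)
{
	return 64 - ref_clz64(x);
}

/* Identities tying the four operations together, for any input. */
static void check_bitops_identities(uint64_t x)
{
	assert(ref_clz64(x) + ref_fls64(x) == 64);
	if (x)
		assert(ref_ffs64(x) == ref_ctz64(x) + 1);
	else
		assert(ref_ffs64(x) == 0 && ref_ctz64(x) == 64);
}
```

The identities (clz + fls == 64, and ffs == ctz + 1 for a non-zero input) are also a cheap cross-check for JIT output.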

bpf_ffs64() was previously discussed in "bpf: Add generic kfunc bpf_ffs64()" [1].

Background

In the earlier bpf_ffs64() discussion, the main concern with exposing such
operations as generic kfuncs was ABI cost. A normal kfunc call follows the
BPF calling convention, which forces the compiler/JIT to treat R1-R5 as
call-clobbered, resulting in unnecessary spill/fill compared to a dedicated
instruction.

This RFC keeps the user-facing API as kfuncs, but avoids the ABI cost in the
fast path. The verifier rewrites supported bitops kfunc calls into a single
internal ALU64 encoding (BPF_BITOPS with an immediate selector), and JIT
backends emit native instructions directly. As a result, these kfuncs behave
like ISA operations once loaded, rather than real helper calls.
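The following is a rough model of the rewrite, for illustration. A kfunc call (BPF_JMP | BPF_CALL with src_reg BPF_PSEUDO_KFUNC_CALL) becomes a single BPF_ALU64 | BPF_BITOPS instruction whose imm selects the operation. The struct insn layout, used_alu_ops[] and rewrite_to_bitops() are hypothetical simplifications, not verifier code. The numeric constants come from the uapi headers and from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct bpf_insn (illustrative only). */
struct insn {
	uint8_t code;		/* class | op | source */
	uint8_t dst_reg : 4;
	uint8_t src_reg : 4;
	int16_t off;
	int32_t imm;
};

/* uapi constants */
#define BPF_JMP			0x05
#define BPF_ALU64		0x07
#define BPF_CALL		0x80
#define BPF_PSEUDO_KFUNC_CALL	2

/* Internal values proposed by this RFC (include/linux/filter.h hunk). */
#define BPF_BITOPS	0xe0
#define BPF_CLZ64	0x00
#define BPF_POPCNT64	0x05
#define BPF_ROR64	0x07

/* ALU operation codes already used by BPF_ADD .. BPF_END in uapi. */
static const uint8_t used_alu_ops[] = {
	0x00, 0x10, 0x20, 0x30, 0x40, 0x50, 0x60,
	0x70, 0x80, 0x90, 0xa0, 0xb0, 0xc0, 0xd0,
};

static bool alu_op_is_free(uint8_t op)
{
	for (unsigned int i = 0; i < sizeof(used_alu_ops); i++)
		if (used_alu_ops[i] == op)
			return false;
	return true;
}

/*
 * Hypothetical helper: replace a kfunc call in place with the single
 * internal ALU64 instruction. Operands stay implicit (R1/R2 in, R0 out),
 * so only the opcode and the imm selector carry information.
 */
static void rewrite_to_bitops(struct insn *insn, int32_t sel)
{
	assert(insn->code == (BPF_JMP | BPF_CALL));
	assert(insn->src_reg == BPF_PSEUDO_KFUNC_CALL);

	insn->code = BPF_ALU64 | BPF_BITOPS;	/* 0xe7 */
	insn->dst_reg = 0;
	insn->src_reg = 0;
	insn->off = 0;
	insn->imm = sel;
}
```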

To make this contract explicit, the kfuncs are marked with a new
KF_MUST_INLINE flag: program load fails with -EOPNOTSUPP if the active JIT
backend cannot inline a particular operation. This keeps the cost predictable
and avoids silent slow fallbacks. A weak hook, bpf_jit_inlines_bitops(),
allows each JIT backend to advertise support on a per-operation basis
(and potentially based on CPU features).
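The load-time contract amounts to a per-operation capability lookup. This is an illustrative model only. struct jit_caps, example_x86_caps and check_must_inline() are hypothetical names. A real backend answers through bpf_jit_inlines_bitops() rather than a static table.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Selector values follow the BPF_CLZ64..BPF_ROR64 imm encoding in this RFC. */
enum bitops_sel {
	SEL_CLZ64, SEL_CTZ64, SEL_FFS64, SEL_FLS64,
	SEL_BITREV64, SEL_POPCNT64, SEL_ROL64, SEL_ROR64,
	SEL_NR,
};

#define KF_MUST_INLINE (1u << 17)	/* value from the btf.h hunk */

/* Hypothetical per-backend table standing in for bpf_jit_inlines_bitops(). */
struct jit_caps {
	bool inlines[SEL_NR];
};

/* Example: an x86_64-like backend on a CPU with LZCNT, BMI1 and POPCNT. */
static const struct jit_caps example_x86_caps = {
	.inlines = {
		[SEL_CLZ64] = true, [SEL_CTZ64] = true,
		[SEL_FFS64] = true, [SEL_FLS64] = true,
		[SEL_POPCNT64] = true,
		[SEL_ROL64] = true, [SEL_ROR64] = true,
		/* no native bit reverse: SEL_BITREV64 stays false */
	},
};

/*
 * A KF_MUST_INLINE kfunc is accepted only if it maps to a known bitops
 * selector and the backend can inline it; otherwise the load fails.
 * sel < 0 models "not a bitops kfunc".
 */
static int check_must_inline(uint32_t kfunc_flags, int sel,
			     const struct jit_caps *caps)
{
	if (!(kfunc_flags & KF_MUST_INLINE))
		return 0;
	if (sel < 0 || sel >= SEL_NR || !caps->inlines[sel])
		return -EOPNOTSUPP;
	return 0;
}
```

Rejecting at load time, rather than falling back to a real call, is what keeps the cost of every accepted program predictable.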

Most operations are also tagged KF_FASTCALL to avoid clobbering unused
argument registers. bpf_rol64() and bpf_ror64() are the exception: on
x86_64, variable rotates take their count in CL, the low byte of RCX,
which backs BPF_REG_4.

Selftests output

On x86_64:

 #18/1    bitops/clz64:OK
 #18/2    bitops/ctz64:OK
 #18/3    bitops/ffs64:OK
 #18/4    bitops/fls64:OK
 #18/5    bitops/bitrev64:SKIP
 #18/6    bitops/popcnt64:OK
 #18/7    bitops/rol64:OK
 #18/8    bitops/ror64:OK
 #18      bitops:OK (SKIP: 1/8)
 Summary: 1/7 PASSED, 1 SKIPPED, 0 FAILED

On arm64:

 #18/1    bitops/clz64:OK
 #18/2    bitops/ctz64:OK
 #18/3    bitops/ffs64:OK
 #18/4    bitops/fls64:OK
 #18/5    bitops/bitrev64:OK
 #18/6    bitops/popcnt64:SKIP
 #18/7    bitops/rol64:OK
 #18/8    bitops/ror64:OK
 #18      bitops:OK (SKIP: 1/8)
 Summary: 1/7 PASSED, 1 SKIPPED, 0 FAILED

Open questions

1. Should these operations be exposed as a proper BPF ISA extension (new
   ALU64 ops) instead of a kfunc API plus verifier rewrite? This RFC takes
   the kfunc route to iterate without immediately committing to new uapi
   instruction semantics, while still ensuring instruction-like codegen.

2. For operations without a reasonable native implementation on some
   targets (e.g. bitrev64 on x86_64; popcnt64 on arm64 without touching
   SIMD registers), should we allow a true generic fallback by dropping
   KF_MUST_INLINE for those ops, or keep the "no-inline == reject" behavior
   for predictability?

Links:
[1] https://lore.kernel.org/bpf/20240131155607.51157-1-hffilwlqm@gmail.com/

Leon Hwang (4):
  bpf: Introduce 64bit bitops kfuncs
  bpf, x86: Add 64bit bitops kfuncs support for x86_64
  bpf, arm64: Add 64bit bitops kfuncs support
  selftests/bpf: Add tests for 64bit bitops kfuncs

 arch/arm64/net/bpf_jit_comp.c                 | 143 ++++++++++++++
 arch/x86/net/bpf_jit_comp.c                   | 153 ++++++++++++++
 include/linux/btf.h                           |   1 +
 include/linux/filter.h                        |  20 ++
 kernel/bpf/core.c                             |   6 +
 kernel/bpf/helpers.c                          |  50 +++++
 kernel/bpf/verifier.c                         |  65 ++++++
 .../testing/selftests/bpf/bpf_experimental.h  |   9 +
 .../testing/selftests/bpf/prog_tests/bitops.c | 186 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/bitops.c    |  69 +++++++
 10 files changed, 702 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bitops.c
 create mode 100644 tools/testing/selftests/bpf/progs/bitops.c

--
2.52.0

* [RFC PATCH bpf-next 1/4] bpf: Introduce 64bit bitops kfuncs
  2026-02-09 15:59 [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs Leon Hwang
@ 2026-02-09 15:59 ` Leon Hwang
  2026-02-11  3:05   ` Alexei Starovoitov
  2026-02-09 15:59 ` [RFC PATCH bpf-next 2/4] bpf, x86: Add 64bit bitops kfuncs support for x86_64 Leon Hwang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC
  To: bpf; +Cc: ast, andrii, daniel, Leon Hwang

Introduce the following 64bit bitops kfuncs:

* bpf_clz64(): Count leading zeros.
* bpf_ctz64(): Count trailing zeros.
* bpf_ffs64(): Find first set bit, 1-based index, returns 0 when input
  is 0.
* bpf_fls64(): Find last set bit, 1-based index.
* bpf_bitrev64(): Reverse bits.
* bpf_popcnt64(): Population count.
* bpf_rol64(): Rotate left.
* bpf_ror64(): Rotate right.

In particular, for a zero input:

* bpf_clz64(0) = 64
* bpf_ctz64(0) = 64
* bpf_ffs64(0) = 0
* bpf_fls64(0) = 0

These kfuncs are marked with a new KF_MUST_INLINE flag, which indicates
the kfunc must be inlined by the JIT backend. A weak function
bpf_jit_inlines_bitops() is introduced for JIT backends to advertise
support for individual bitops.

bpf_rol64() and bpf_ror64() do not have KF_FASTCALL because, on x86_64,
the rotate count must be placed in CL, which clobbers BPF_REG_4 (RCX).
The other kfuncs have KF_FASTCALL to avoid clobbering unused registers.

An internal BPF_ALU64 opcode BPF_BITOPS is introduced as the encoding
for these operations, with the immediate field selecting the specific
operation (BPF_CLZ64, BPF_CTZ64, etc.).

The verifier rejects the kfunc in check_kfunc_call() if the JIT backend
does not support it, and rewrites the call to a BPF_BITOPS instruction
in fixup_kfunc_call().
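For bpf_ffs64(), fixup_kfunc_call() in this patch does not emit a bare BPF_BITOPS instruction. Instead it guards the instruction so the JIT never sees a zero input. The C model below walks through that four-instruction sequence. It is a simulation for illustration only. jit_ctz_nonzero() is a hypothetical stand-in for the native instruction the JIT emits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the native instruction the JIT emits for
 * BPF_BITOPS with imm == BPF_FFS64 (TZCNT on x86_64, CTZ or RBIT+CLZ on
 * arm64). The guard below guarantees x != 0, so __builtin_ctzll() is safe.
 */
static uint64_t jit_ctz_nonzero(uint64_t x)
{
	assert(x != 0);
	return (uint64_t)__builtin_ctzll(x);
}

/*
 * Model of the sequence emitted by fixup_kfunc_call() for bpf_ffs64():
 *
 *   r0 = 0
 *   if r1 == 0 goto +2
 *   r0 = BITOPS(BPF_FFS64)   ; count trailing zeros of r1
 *   r0 += 1
 */
static uint64_t ffs64_expanded(uint64_t r1)
{
	uint64_t r0 = 0;

	if (r1 == 0)
		goto out;	/* jump over the next two insns */
	r0 = jit_ctz_nonzero(r1);
	r0 += 1;
out:
	return r0;
}
```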

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/btf.h    |  1 +
 include/linux/filter.h | 20 +++++++++++++
 kernel/bpf/core.c      |  6 ++++
 kernel/bpf/helpers.c   | 50 ++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c  | 65 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 142 insertions(+)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 48108471c5b1..8ac1dc59ca85 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -79,6 +79,7 @@
 #define KF_ARENA_ARG1   (1 << 14) /* kfunc takes an arena pointer as its first argument */
 #define KF_ARENA_ARG2   (1 << 15) /* kfunc takes an arena pointer as its second argument */
 #define KF_IMPLICIT_ARGS (1 << 16) /* kfunc has implicit arguments supplied by the verifier */
+#define KF_MUST_INLINE  (1 << 17) /* kfunc must be inlined by JIT backend */
 
 /*
  * Tag marking a kernel function as a kfunc. This is meant to minimize the
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 4e1cb4f91f49..ff6c0cf68dd3 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -514,6 +514,25 @@ static inline bool insn_is_cast_user(const struct bpf_insn *insn)
 		.off   = 0,					\
 		.imm   = 0 })
 
+/* bitops */
+#define BPF_BITOPS	0xe0	/* opcode for alu64 */
+#define BPF_CLZ64	0x00	/* imm for clz64 */
+#define BPF_CTZ64	0x01	/* imm for ctz64 */
+#define BPF_FFS64	0x02	/* imm for ffs64 */
+#define BPF_FLS64	0x03	/* imm for fls64 */
+#define BPF_BITREV64	0x04	/* imm for bitrev64 */
+#define BPF_POPCNT64	0x05	/* imm for popcnt64 */
+#define BPF_ROL64	0x06	/* imm for rol64 */
+#define BPF_ROR64	0x07	/* imm for ror64 */
+
+#define BPF_BITOPS_INSN(IMM)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU64 | BPF_BITOPS,		\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
 /* Internal classic blocks for direct assignment */
 
 #define __BPF_STMT(CODE, K)					\
@@ -1157,6 +1176,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
 void bpf_jit_compile(struct bpf_prog *prog);
 bool bpf_jit_needs_zext(void);
 bool bpf_jit_inlines_helper_call(s32 imm);
+bool bpf_jit_inlines_bitops(s32 imm);
 bool bpf_jit_supports_subprog_tailcalls(void);
 bool bpf_jit_supports_percpu_insn(void);
 bool bpf_jit_supports_kfunc_call(void);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index dc906dfdff94..cee90181d169 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3113,6 +3113,12 @@ bool __weak bpf_jit_inlines_helper_call(s32 imm)
 	return false;
 }
 
+/* Return TRUE if the JIT backend inlines the bitops insn. */
+bool __weak bpf_jit_inlines_bitops(s32 imm)
+{
+	return false;
+}
+
 /* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */
 bool __weak bpf_jit_supports_subprog_tailcalls(void)
 {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 7ac32798eb04..0a598c800f67 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -29,6 +29,8 @@
 #include <linux/task_work.h>
 #include <linux/irq_work.h>
 #include <linux/buildid.h>
+#include <linux/bitops.h>
+#include <linux/bitrev.h>
 
 #include "../../lib/kstrtox.h"
 
@@ -4501,6 +4503,46 @@ __bpf_kfunc int bpf_timer_cancel_async(struct bpf_timer *timer)
 	}
 }
 
+__bpf_kfunc u64 bpf_clz64(u64 x)
+{
+	return x ? 64 - fls64(x) : 64;
+}
+
+__bpf_kfunc u64 bpf_ctz64(u64 x)
+{
+	return x ? __ffs64(x) : 64;
+}
+
+__bpf_kfunc u64 bpf_ffs64(u64 x)
+{
+	return x ? __ffs64(x) + 1 : 0;
+}
+
+__bpf_kfunc u64 bpf_fls64(u64 x)
+{
+	return fls64(x);
+}
+
+__bpf_kfunc u64 bpf_popcnt64(u64 x)
+{
+	return hweight64(x);
+}
+
+__bpf_kfunc u64 bpf_bitrev64(u64 x)
+{
+	return ((u64)bitrev32(x & 0xFFFFFFFF) << 32) | bitrev32(x >> 32);
+}
+
+__bpf_kfunc u64 bpf_rol64(u64 x, u64 s)
+{
+	return rol64(x, s);
+}
+
+__bpf_kfunc u64 bpf_ror64(u64 x, u64 s)
+{
+	return ror64(x, s);
+}
+
 __bpf_kfunc_end_defs();
 
 static void bpf_task_work_cancel_scheduled(struct irq_work *irq_work)
@@ -4578,6 +4620,14 @@ BTF_ID_FLAGS(func, bpf_key_put, KF_RELEASE)
 BTF_ID_FLAGS(func, bpf_verify_pkcs7_signature, KF_SLEEPABLE)
 #endif
 #endif
+BTF_ID_FLAGS(func, bpf_clz64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_ctz64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_ffs64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_fls64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_popcnt64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_bitrev64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_rol64, KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_ror64, KF_MUST_INLINE)
 BTF_KFUNCS_END(generic_btf_ids)
 
 static const struct btf_kfunc_id_set generic_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index edf5342b982f..ed9a077ecf2e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12477,6 +12477,14 @@ enum special_kfunc_type {
 	KF_bpf_session_is_return,
 	KF_bpf_stream_vprintk,
 	KF_bpf_stream_print_stack,
+	KF_bpf_clz64,
+	KF_bpf_ctz64,
+	KF_bpf_ffs64,
+	KF_bpf_fls64,
+	KF_bpf_bitrev64,
+	KF_bpf_popcnt64,
+	KF_bpf_rol64,
+	KF_bpf_ror64,
 };
 
 BTF_ID_LIST(special_kfunc_list)
@@ -12557,6 +12565,14 @@ BTF_ID(func, bpf_arena_reserve_pages)
 BTF_ID(func, bpf_session_is_return)
 BTF_ID(func, bpf_stream_vprintk)
 BTF_ID(func, bpf_stream_print_stack)
+BTF_ID(func, bpf_clz64)
+BTF_ID(func, bpf_ctz64)
+BTF_ID(func, bpf_ffs64)
+BTF_ID(func, bpf_fls64)
+BTF_ID(func, bpf_bitrev64)
+BTF_ID(func, bpf_popcnt64)
+BTF_ID(func, bpf_rol64)
+BTF_ID(func, bpf_ror64)
 
 static bool is_task_work_add_kfunc(u32 func_id)
 {
@@ -12564,6 +12580,30 @@ static bool is_task_work_add_kfunc(u32 func_id)
 	       func_id == special_kfunc_list[KF_bpf_task_work_schedule_resume];
 }
 
+static bool get_bitops_insn_imm(u32 func_id, s32 *imm)
+{
+	if (func_id == special_kfunc_list[KF_bpf_clz64])
+		*imm = BPF_CLZ64;
+	else if (func_id == special_kfunc_list[KF_bpf_ctz64])
+		*imm = BPF_CTZ64;
+	else if (func_id == special_kfunc_list[KF_bpf_ffs64])
+		*imm = BPF_FFS64;
+	else if (func_id == special_kfunc_list[KF_bpf_fls64])
+		*imm = BPF_FLS64;
+	else if (func_id == special_kfunc_list[KF_bpf_bitrev64])
+		*imm = BPF_BITREV64;
+	else if (func_id == special_kfunc_list[KF_bpf_popcnt64])
+		*imm = BPF_POPCNT64;
+	else if (func_id == special_kfunc_list[KF_bpf_rol64])
+		*imm = BPF_ROL64;
+	else if (func_id == special_kfunc_list[KF_bpf_ror64])
+		*imm = BPF_ROR64;
+	else
+		return false;
+
+	return true;
+}
+
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
 	if (meta->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
@@ -14044,6 +14084,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	int err, insn_idx = *insn_idx_p;
 	const struct btf_param *args;
 	struct btf *desc_btf;
+	bool is_bitops_kfunc;
+	s32 insn_imm;
 
 	/* skip for now, but return error when we find this in fixup_kfunc_call */
 	if (!insn->imm)
@@ -14423,6 +14465,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	if (meta.func_id == special_kfunc_list[KF_bpf_session_cookie])
 		env->prog->call_session_cookie = true;
 
+	is_bitops_kfunc = get_bitops_insn_imm(meta.func_id, &insn_imm);
+	if (meta.kfunc_flags & KF_MUST_INLINE) {
+		bool inlined = is_bitops_kfunc && bpf_jit_inlines_bitops(insn_imm);
+
+		if (!inlined) {
+			verbose(env, "JIT does not support inlining the kfunc %s.\n", func_name);
+			return -EOPNOTSUPP;
+		}
+	}
+
 	return 0;
 }
 
@@ -23236,6 +23288,19 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		insn_buf[4] = BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1);
 		insn_buf[5] = BPF_ALU64_IMM(BPF_NEG, BPF_REG_0, 0);
 		*cnt = 6;
+	} else if (get_bitops_insn_imm(desc->func_id, &insn_buf[0].imm)) {
+		s32 imm = insn_buf[0].imm;
+
+		if (imm == BPF_FFS64) {
+			insn_buf[0] = BPF_MOV64_IMM(BPF_REG_0, 0);
+			insn_buf[1] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 2);
+			insn_buf[2] = BPF_BITOPS_INSN(imm);
+			insn_buf[3] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 1);
+			*cnt = 4;
+		} else {
+			insn_buf[0] = BPF_BITOPS_INSN(imm);
+			*cnt = 1;
+		}
 	}
 
 	if (env->insn_aux_data[insn_idx].arg_prog) {
-- 
2.52.0


* [RFC PATCH bpf-next 2/4] bpf, x86: Add 64bit bitops kfuncs support for x86_64
  2026-02-09 15:59 [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs Leon Hwang
  2026-02-09 15:59 ` [RFC PATCH bpf-next 1/4] " Leon Hwang
@ 2026-02-09 15:59 ` Leon Hwang
  2026-02-09 15:59 ` [RFC PATCH bpf-next 3/4] bpf, arm64: Add 64bit bitops kfuncs support Leon Hwang
  2026-02-09 15:59 ` [RFC PATCH bpf-next 4/4] selftests/bpf: Add tests for 64bit bitops kfuncs Leon Hwang
  3 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC
  To: bpf; +Cc: ast, andrii, daniel, Leon Hwang

Implement JIT inlining of the 64bit bitops kfuncs on x86_64.

bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.

bpf_clz64() and bpf_fls64() are supported when the CPU has
X86_FEATURE_ABM (LZCNT). bpf_ctz64() and bpf_ffs64() are supported when
the CPU has X86_FEATURE_BMI1 (TZCNT).

bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.

bpf_bitrev64() is not supported as x86_64 has no native bit-reverse
instruction.
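A software model of what the two count encodings compute shows why the CPUID bit matters. It models the hardware behaviour only and is not JIT code. Per the Intel SDM, on processors without LZCNT the F3 0F BD bytes execute as BSR, and without TZCNT (BMI1) F3 0F BC executes as BSF. For a zero source, BSR/BSF leave the destination unchanged on AMD and undefined on Intel; old_dst models the AMD behaviour. The exec_* function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of "F3 REX.W 0F BD /r": LZCNT if supported, otherwise BSR. */
static uint64_t exec_f3_0f_bd(uint64_t src, bool has_lzcnt, uint64_t old_dst)
{
	if (has_lzcnt)
		return src ? (uint64_t)__builtin_clzll(src) : 64;	/* LZCNT */
	if (!src)
		return old_dst;				/* BSR with ZF=1 */
	return 63 - (uint64_t)__builtin_clzll(src);	/* BSR: index of MSB */
}

/* Model of "F3 REX.W 0F BC /r": TZCNT if BMI1 is supported, otherwise BSF. */
static uint64_t exec_f3_0f_bc(uint64_t src, bool has_bmi1, uint64_t old_dst)
{
	if (has_bmi1)
		return src ? (uint64_t)__builtin_ctzll(src) : 64;	/* TZCNT */
	if (!src)
		return old_dst;				/* BSF with ZF=1 */
	return (uint64_t)__builtin_ctzll(src);		/* BSF: index of LSB */
}
```

LZCNT executed as BSR is wrong for every non-zero input. TZCNT executed as BSF differs only at zero, which bpf_ctz64() must handle but bpf_ffs64() never reaches, because the verifier filters out the zero input first.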

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 arch/x86/net/bpf_jit_comp.c | 153 ++++++++++++++++++++++++++++++++++++
 1 file changed, 153 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 070ba80e39d7..5d6215071cbd 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -19,6 +19,7 @@
 #include <asm/text-patching.h>
 #include <asm/unwind.h>
 #include <asm/cfi.h>
+#include <asm/cpufeatures.h>
 
 static bool all_callee_regs_used[4] = {true, true, true, true};
 
@@ -1604,6 +1605,134 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
 	*pprog = prog;
 }
 
+static int emit_bitops(u8 **pprog, u32 bitops)
+{
+	u8 *prog = *pprog;
+
+	/*
+	 * x86 Bit manipulation instruction set
+	 * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
+	 */
+
+	switch (bitops) {
+	case BPF_CLZ64:
+		/*
+		 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+		 *
+		 *   LZCNT - Count the Number of Leading Zero Bits
+		 *
+		 *     Opcode/Instruction
+		 *     F3 REX.W 0F BD /r
+		 *     LZCNT r64, r/m64
+		 *
+		 *     Op/En
+		 *     RM
+		 *
+		 *     64/32-bit Mode
+		 *     V/N.E.
+		 *
+		 *     CPUID Feature Flag
+		 *     LZCNT
+		 *
+		 *     Description
+		 *     Count the number of leading zero bits in r/m64, return
+		 *     result in r64.
+		 */
+		/* emit: x ? 64 - fls64(x) : 64 */
+		/* lzcnt rax, rdi */
+		EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
+		break;
+
+	case BPF_CTZ64:
+		/*
+		 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+		 *
+		 *   TZCNT - Count the Number of Trailing Zero Bits
+		 *
+		 *     Opcode/Instruction
+		 *     F3 REX.W 0F BC /r
+		 *     TZCNT r64, r/m64
+		 *
+		 *     Op/En
+		 *     RM
+		 *
+		 *     64/32-bit Mode
+		 *     V/N.E.
+		 *
+		 *     CPUID Feature Flag
+		 *     BMI1
+		 *
+		 *     Description
+		 *     Count the number of trailing zero bits in r/m64, return
+		 *     result in r64.
+		 */
+		/* emit: x ? __ffs64(x) : 64 */
+		/* tzcnt rax, rdi */
+		EMIT5(0xF3, 0x48, 0x0F, 0xBC, 0xC7);
+		break;
+
+	case BPF_FFS64:
+		/* emit: __ffs64(x), 'x == 0' was handled by verifier */
+		/* tzcnt rax, rdi */
+		EMIT5(0xF3, 0x48, 0x0F, 0xBC, 0xC7);
+		break;
+
+	case BPF_FLS64:
+		/* emit: fls64(x) */
+		/* lzcnt rax, rdi; neg rax; add rax, 64 */
+		EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
+		EMIT3(0x48, 0xF7, 0xD8);       /* neg rax */
+		EMIT4(0x48, 0x83, 0xC0, 0x40); /* add rax, 64 */
+		break;
+
+	case BPF_POPCNT64:
+		/*
+		 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+		 *
+		 *   POPCNT - Return the Count of Number of Bits Set to 1
+		 *
+		 *     Opcode/Instruction
+		 *     F3 REX.W 0F B8 /r
+		 *     POPCNT r64, r/m64
+		 *
+		 *     Op/En
+		 *     RM
+		 *
+		 *     64 Mode
+		 *     Valid
+		 *
+		 *     Compat/Leg Mode
+		 *     N.E.
+		 *
+		 *     Description
+		 *     POPCNT on r/m64
+		 */
+		/* popcnt rax, rdi */
+		EMIT5(0xF3, 0x48, 0x0F, 0xB8, 0xC7);
+		break;
+
+	case BPF_ROL64:
+		/* emit: rol64(x, s) */
+		EMIT3(0x48, 0x89, 0xF1); /* mov rcx, rsi */
+		EMIT3(0x48, 0x89, 0xF8); /* mov rax, rdi */
+		EMIT3(0x48, 0xD3, 0xC0); /* rol rax, cl */
+		break;
+
+	case BPF_ROR64:
+		/* emit: ror64(x, s) */
+		EMIT3(0x48, 0x89, 0xF1); /* mov rcx, rsi */
+		EMIT3(0x48, 0x89, 0xF8); /* mov rax, rdi */
+		EMIT3(0x48, 0xD3, 0xC8); /* ror rax, cl */
+		break;
+
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	*pprog = prog;
+	return 0;
+}
+
 #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
 
 #define __LOAD_TCC_PTR(off)			\
@@ -2113,6 +2242,12 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 			}
 			break;
 
+		case BPF_ALU64 | BPF_BITOPS:
+			err = emit_bitops(&prog, insn->imm);
+			if (err)
+				return err;
+			break;
+
 			/* speculation barrier */
 		case BPF_ST | BPF_NOSPEC:
 			EMIT_LFENCE();
@@ -4117,3 +4252,22 @@ bool bpf_jit_supports_fsession(void)
 {
 	return true;
 }
+
+bool bpf_jit_inlines_bitops(s32 imm)
+{
+	switch (imm) {
+	case BPF_CLZ64:
+	case BPF_FLS64:
+		return boot_cpu_has(X86_FEATURE_ABM);	/* LZCNT */
+	case BPF_CTZ64:
+	case BPF_FFS64:
+		return boot_cpu_has(X86_FEATURE_BMI1);	/* TZCNT */
+	case BPF_POPCNT64:
+		return boot_cpu_has(X86_FEATURE_POPCNT);
+	case BPF_ROL64:
+	case BPF_ROR64:
+		return true;
+	default:
+		return false;
+	}
+}
-- 
2.52.0


* [RFC PATCH bpf-next 3/4] bpf, arm64: Add 64bit bitops kfuncs support
  2026-02-09 15:59 [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs Leon Hwang
  2026-02-09 15:59 ` [RFC PATCH bpf-next 1/4] " Leon Hwang
  2026-02-09 15:59 ` [RFC PATCH bpf-next 2/4] bpf, x86: Add 64bit bitops kfuncs support for x86_64 Leon Hwang
@ 2026-02-09 15:59 ` Leon Hwang
  2026-02-09 15:59 ` [RFC PATCH bpf-next 4/4] selftests/bpf: Add tests for 64bit bitops kfuncs Leon Hwang
  3 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC
  To: bpf; +Cc: ast, andrii, daniel, Leon Hwang

Implement JIT inlining of the 64bit bitops kfuncs on arm64.

bpf_clz64(), bpf_ffs64(), bpf_fls64(), and bpf_bitrev64() are always
supported using mandatory ARMv8 CLZ/RBIT instructions. bpf_ctz64() is
implemented via RBIT + CLZ, or via the native CTZ instruction when
FEAT_CSSC is available. bpf_rol64() and bpf_ror64() are always
supported via RORV.

bpf_popcnt64() is not supported as the native population count
instruction requires NEON/SIMD registers, which should not be touched
from BPF programs.
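The arm64 sequences can be checked against a small software model of the instructions involved. The a64_* helpers below are hypothetical C models of RBIT, CLZ and RORV: they compute results rather than emit encodings. On arm64, CLZ is defined for 0 and returns 64, which is what makes RBIT + CLZ a correct bpf_ctz64() even at zero.

```c
#include <assert.h>
#include <stdint.h>

/* RBIT Xd, Xn: reverse the 64 bits of x. */
static uint64_t a64_rbit(uint64_t x)
{
	uint64_t r = 0;

	for (int i = 0; i < 64; i++) {
		r = (r << 1) | (x & 1);
		x >>= 1;
	}
	return r;
}

/* CLZ Xd, Xn: defined for 0 (returns 64), unlike __builtin_clzll(). */
static uint64_t a64_clz(uint64_t x)
{
	return x ? (uint64_t)__builtin_clzll(x) : 64;
}

/* RORV Xd, Xn, Xm: only the low 6 bits of the count are used. */
static uint64_t a64_rorv(uint64_t x, uint64_t s)
{
	s &= 63;
	return s ? (x >> s) | (x << (64 - s)) : x;
}

/* bpf_ctz64() without FEAT_CSSC: RBIT tmp, r1; CLZ r0, tmp */
static uint64_t ctz_via_rbit_clz(uint64_t x)
{
	return a64_clz(a64_rbit(x));
}

/* bpf_fls64(): CLZ tmp, r1; NEG tmp, tmp; ADD r0, tmp, #64 */
static uint64_t fls_via_clz(uint64_t x)
{
	uint64_t tmp = a64_clz(x);

	tmp = 0 - tmp;		/* NEG: wraps modulo 2^64 */
	return tmp + 64;	/* == 64 - clz(x) */
}

/* bpf_rol64(): NEG tmp, r2; RORV r0, r1, tmp, i.e. rol(x, s) == ror(x, -s) */
static uint64_t rol_via_rorv(uint64_t x, uint64_t s)
{
	return a64_rorv(x, 0 - s);
}
```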

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 arch/arm64/net/bpf_jit_comp.c | 143 ++++++++++++++++++++++++++++++++++
 1 file changed, 143 insertions(+)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 2dc5037694ba..b91896cef247 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1199,6 +1199,123 @@ static int add_exception_handler(const struct bpf_insn *insn,
 	return 0;
 }
 
+static inline u32 a64_clz64(u8 rd, u8 rn)
+{
+	/*
+	 * Arm Architecture Reference Manual for A-profile architecture
+	 * (Document number: ARM DDI 0487)
+	 *
+	 *   A64 Base Instruction Descriptions
+	 *   C6.2 Alphabetical list of A64 base instructions
+	 *
+	 *   C6.2.91 CLZ
+	 *
+	 *     Count leading zeros
+	 *
+	 *     This instruction counts the number of consecutive binary zero bits,
+	 *     starting from the most significant bit in the source register,
+	 *     and places the count in the destination register.
+	 */
+	/* CLZ Xd, Xn */
+	return 0xdac01000 | (rn << 5) | rd;
+}
+
+static inline u32 a64_ctz64(u8 rd, u8 rn)
+{
+	/*
+	 * Arm Architecture Reference Manual for A-profile architecture
+	 * (Document number: ARM DDI 0487)
+	 *
+	 *   A64 Base Instruction Descriptions
+	 *   C6.2 Alphabetical list of A64 base instructions
+	 *
+	 *   C6.2.144 CTZ
+	 *
+	 *     Count trailing zeros
+	 *
+	 *     This instruction counts the number of consecutive binary zero bits,
+	 *     starting from the least significant bit in the source register,
+	 *     and places the count in the destination register.
+	 *
+	 *     This instruction requires FEAT_CSSC.
+	 */
+	/* CTZ Xd, Xn */
+	return 0xdac01800 | (rn << 5) | rd;
+}
+
+static inline u32 a64_rbit64(u8 rd, u8 rn)
+{
+	/*
+	 * Arm Architecture Reference Manual for A-profile architecture
+	 * (Document number: ARM DDI 0487)
+	 *
+	 *   A64 Base Instruction Descriptions
+	 *   C6.2 Alphabetical list of A64 base instructions
+	 *
+	 *   C6.2.320 RBIT
+	 *
+	 *     Reverse bits
+	 *
+	 *     This instruction reverses the bit order in a register.
+	 */
+	/* RBIT Xd, Xn */
+	return 0xdac00000 | (rn << 5) | rd;
+}
+
+static inline bool supports_cssc(void)
+{
+	/*
+	 * Documentation/arch/arm64/cpu-feature-registers.rst
+	 *
+	 *   ID_AA64ISAR2_EL1 - Instruction set attribute register 2
+	 *
+	 *     CSSC
+	 */
+	return cpuid_feature_extract_unsigned_field(read_sanitised_ftr_reg(SYS_ID_AA64ISAR2_EL1),
+						    ID_AA64ISAR2_EL1_CSSC_SHIFT);
+}
+
+static int emit_bitops(struct jit_ctx *ctx, s32 imm)
+{
+	const u8 r0 = bpf2a64[BPF_REG_0];
+	const u8 r1 = bpf2a64[BPF_REG_1];
+	const u8 r2 = bpf2a64[BPF_REG_2];
+	const u8 tmp = bpf2a64[TMP_REG_1];
+
+	switch (imm) {
+	case BPF_CLZ64:
+		emit(a64_clz64(r0, r1), ctx);
+		break;
+	case BPF_CTZ64:
+	case BPF_FFS64:
+		if (supports_cssc()) {
+			emit(a64_ctz64(r0, r1), ctx);
+		} else {
+			emit(a64_rbit64(tmp, r1), ctx);
+			emit(a64_clz64(r0, tmp), ctx);
+		}
+		break;
+	case BPF_FLS64:
+		emit(a64_clz64(tmp, r1), ctx);
+		emit(A64_NEG(1, tmp, tmp), ctx);
+		emit(A64_ADD_I(1, r0, tmp, 64), ctx);
+		break;
+	case BPF_BITREV64:
+		emit(a64_rbit64(r0, r1), ctx);
+		break;
+	case BPF_ROL64:
+		emit(A64_NEG(1, tmp, r2), ctx);
+		emit(A64_DATA2(1, r0, r1, tmp, RORV), ctx);
+		break;
+	case BPF_ROR64:
+		emit(A64_DATA2(1, r0, r1, r2, RORV), ctx);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+	return 0;
+}
+
 /* JITs an eBPF instruction.
  * Returns:
  * 0  - successfully JITed an 8-byte eBPF instruction.
@@ -1451,6 +1568,11 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_ALU64 | BPF_ARSH | BPF_K:
 		emit(A64_ASR(is64, dst, dst, imm), ctx);
 		break;
+	case BPF_ALU64 | BPF_BITOPS:
+		ret = emit_bitops(ctx, imm);
+		if (ret)
+			return ret;
+		break;
 
 	/* JUMP reg */
 	case BPF_JMP | BPF_JA | BPF_X:
@@ -3207,3 +3329,24 @@ void bpf_jit_free(struct bpf_prog *prog)
 
 	bpf_prog_unlock_free(prog);
 }
+
+bool bpf_jit_inlines_bitops(s32 imm)
+{
+	switch (imm) {
+	case BPF_CLZ64:
+	case BPF_CTZ64:
+	case BPF_FFS64:
+	case BPF_FLS64:
+	case BPF_BITREV64:
+		/* RBIT/CLZ are mandatory in ARMv8; CTZ is used only with FEAT_CSSC */
+		return true;
+	case BPF_POPCNT64:
+		/* Avoid touching NEON/SIMD registers just to implement popcnt64 */
+		return false;
+	case BPF_ROL64:
+	case BPF_ROR64:
+		return true;
+	default:
+		return false;
+	}
+}
-- 
2.52.0


* [RFC PATCH bpf-next 4/4] selftests/bpf: Add tests for 64bit bitops kfuncs
  2026-02-09 15:59 [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs Leon Hwang
                   ` (2 preceding siblings ...)
  2026-02-09 15:59 ` [RFC PATCH bpf-next 3/4] bpf, arm64: Add 64bit bitops kfuncs support Leon Hwang
@ 2026-02-09 15:59 ` Leon Hwang
  3 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC
  To: bpf; +Cc: ast, andrii, daniel, Leon Hwang

Add selftests for bpf_clz64(), bpf_ctz64(), bpf_ffs64(), bpf_fls64(),
bpf_bitrev64(), bpf_popcnt64(), bpf_rol64(), and bpf_ror64().

Each subtest compares the kfunc result against a userspace reference
implementation across a set of test vectors. If the JIT does not support
inlining a given kfunc, the subtest is skipped (-EOPNOTSUPP at load
time).
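To complement the fixed vectors, the userspace references themselves can be sanity-checked with algebraic properties. This is a hypothetical addition, not part of the patch. check_properties() uses rol64/ror64/bitrev64 definitions equivalent to those in prog_tests/bitops.c, plus a single-argument popcnt64 built on __builtin_popcountll().

```c
#include <assert.h>
#include <stdint.h>

static uint64_t rol64(uint64_t x, uint64_t s)
{
	s &= 63;
	return (x << s) | (x >> ((-s) & 63));
}

static uint64_t ror64(uint64_t x, uint64_t s)
{
	s &= 63;
	return (x >> s) | (x << ((-s) & 63));
}

static uint64_t bitrev64(uint64_t x)
{
	uint64_t y = 0;

	for (int i = 0; i < 64; i++) {
		y = (y << 1) | (x & 1);
		x >>= 1;
	}
	return y;
}

static uint64_t popcnt64(uint64_t x)
{
	return (uint64_t)__builtin_popcountll(x);
}

/* Properties that must hold for any x and any shift count s. */
static void check_properties(uint64_t x, uint64_t s)
{
	assert(ror64(rol64(x, s), s) == x);		/* rotations invert */
	assert(rol64(x, s) == ror64(x, 64 - (s & 63)));	/* rol == ror by 64-s */
	assert(bitrev64(bitrev64(x)) == x);		/* reverse is an involution */
	assert(popcnt64(bitrev64(x)) == popcnt64(x));	/* reversal keeps set bits */
	assert(popcnt64(x) + popcnt64(~x) == 64);
}
```

Running check_properties() over the same cases[] table, including the s >= 64 entries, would catch a broken reference before it masks a kernel bug.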

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 .../testing/selftests/bpf/bpf_experimental.h  |   9 +
 .../testing/selftests/bpf/prog_tests/bitops.c | 186 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/bitops.c    |  69 +++++++
 3 files changed, 264 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bitops.c
 create mode 100644 tools/testing/selftests/bpf/progs/bitops.c

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 4b7210c318dd..3a7d126968b3 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -594,6 +594,15 @@ extern void bpf_iter_dmabuf_destroy(struct bpf_iter_dmabuf *it) __weak __ksym;
 extern int bpf_cgroup_read_xattr(struct cgroup *cgroup, const char *name__str,
 				 struct bpf_dynptr *value_p) __weak __ksym;
 
+extern __u64 bpf_clz64(__u64 x) __weak __ksym;
+extern __u64 bpf_ctz64(__u64 x) __weak __ksym;
+extern __u64 bpf_ffs64(__u64 x) __weak __ksym;
+extern __u64 bpf_fls64(__u64 x) __weak __ksym;
+extern __u64 bpf_bitrev64(__u64 x) __weak __ksym;
+extern __u64 bpf_popcnt64(__u64 x) __weak __ksym;
+extern __u64 bpf_rol64(__u64 x, __u64 s) __weak __ksym;
+extern __u64 bpf_ror64(__u64 x, __u64 s) __weak __ksym;
+
 #define PREEMPT_BITS	8
 #define SOFTIRQ_BITS	8
 #define HARDIRQ_BITS	4
diff --git a/tools/testing/selftests/bpf/prog_tests/bitops.c b/tools/testing/selftests/bpf/prog_tests/bitops.c
new file mode 100644
index 000000000000..59bf1c5b5102
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bitops.c
@@ -0,0 +1,186 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include "bitops.skel.h"
+
+struct bitops_case {
+	__u64 x;
+	__u64 s;
+	__u64 exp;
+};
+
+static struct bitops_case cases[] = {
+	{ 0x0ULL, 0, 0 },
+	{ 0x1ULL, 1, 0 },
+	{ 0x8000000000000000ULL, 63, 0 },
+	{ 0xffffffffffffffffULL, 64, 0 },
+	{ 0x0123456789abcdefULL, 65, 0 },
+	{ 0x0000000100000000ULL, 127, 0 },
+};
+
+static __u64 clz64(__u64 x, __u64 s)
+{
+	(void)s;
+	return x ? __builtin_clzll(x) : 64;
+}
+
+static __u64 ctz64(__u64 x, __u64 s)
+{
+	(void)s;
+	return x ? __builtin_ctzll(x) : 64;
+}
+
+static __u64 ffs64(__u64 x, __u64 s)
+{
+	(void)s;
+	return x ? (__u64)__builtin_ctzll(x) + 1 : 0;
+}
+
+static __u64 fls64(__u64 x, __u64 s)
+{
+	(void)s;
+	return x ? 64 - __builtin_clzll(x) : 0;
+}
+
+static __u64 popcnt64(__u64 x, __u64 s)
+{
+	(void)s;
+	return __builtin_popcountll(x);
+}
+
+static __u64 bitrev64(__u64 x, __u64 s)
+{
+	__u64 y = 0;
+	int i;
+
+	(void)s;
+
+	for (i = 0; i < 64; i++) {
+		y <<= 1;
+		y |= x & 1;
+		x >>= 1;
+	}
+	return y;
+}
+
+static __u64 rol64(__u64 x, __u64 s)
+{
+	s &= 63;
+	return (x << s) | (x >> ((-s) & 63));
+}
+
+static __u64 ror64(__u64 x, __u64 s)
+{
+	s &= 63;
+	return (x >> s) | (x << ((-s) & 63));
+}
+
+static void test_bitops_case(const char *prog_name)
+{
+	struct bpf_program *prog;
+	struct bitops *skel;
+	size_t i;
+	int err;
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+	skel = bitops__open();
+	if (!ASSERT_OK_PTR(skel, "bitops__open"))
+		return;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto cleanup;
+
+	bpf_program__set_autoload(prog, true);
+
+	err = bitops__load(skel);
+	if (err == -EOPNOTSUPP) {
+		test__skip();
+		goto cleanup;
+	}
+	if (!ASSERT_OK(err, "bitops__load"))
+		goto cleanup;
+
+	for (i = 0; i < ARRAY_SIZE(cases); i++) {
+		skel->bss->in_x = cases[i].x;
+		skel->bss->in_s = cases[i].s;
+		err = bpf_prog_test_run_opts(bpf_program__fd(prog), &topts);
+		if (!ASSERT_OK(err, "bpf_prog_test_run_opts"))
+			goto cleanup;
+
+		if (!ASSERT_OK(topts.retval, "retval"))
+			goto cleanup;
+
+		ASSERT_EQ(skel->bss->out, cases[i].exp, "out");
+	}
+
+cleanup:
+	bitops__destroy(skel);
+}
+
+#define RUN_BITOPS_CASE(_bitops, _prog)					\
+	do {								\
+		for (size_t i = 0; i < ARRAY_SIZE(cases); i++)		\
+			cases[i].exp = _bitops(cases[i].x, cases[i].s);	\
+		test_bitops_case(_prog);				\
+	} while (0)
+
+static void test_clz64(void)
+{
+	RUN_BITOPS_CASE(clz64, "bitops_clz64");
+}
+
+static void test_ctz64(void)
+{
+	RUN_BITOPS_CASE(ctz64, "bitops_ctz64");
+}
+
+static void test_ffs64(void)
+{
+	RUN_BITOPS_CASE(ffs64, "bitops_ffs64");
+}
+
+static void test_fls64(void)
+{
+	RUN_BITOPS_CASE(fls64, "bitops_fls64");
+}
+
+static void test_bitrev64(void)
+{
+	RUN_BITOPS_CASE(bitrev64, "bitops_bitrev");
+}
+
+static void test_popcnt64(void)
+{
+	RUN_BITOPS_CASE(popcnt64, "bitops_popcnt");
+}
+
+static void test_rol64(void)
+{
+	RUN_BITOPS_CASE(rol64, "bitops_rol64");
+}
+
+static void test_ror64(void)
+{
+	RUN_BITOPS_CASE(ror64, "bitops_ror64");
+}
+
+void test_bitops(void)
+{
+	if (test__start_subtest("clz64"))
+		test_clz64();
+	if (test__start_subtest("ctz64"))
+		test_ctz64();
+	if (test__start_subtest("ffs64"))
+		test_ffs64();
+	if (test__start_subtest("fls64"))
+		test_fls64();
+	if (test__start_subtest("bitrev64"))
+		test_bitrev64();
+	if (test__start_subtest("popcnt64"))
+		test_popcnt64();
+	if (test__start_subtest("rol64"))
+		test_rol64();
+	if (test__start_subtest("ror64"))
+		test_ror64();
+}
diff --git a/tools/testing/selftests/bpf/progs/bitops.c b/tools/testing/selftests/bpf/progs/bitops.c
new file mode 100644
index 000000000000..5d5b192bf3d9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bitops.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_experimental.h"
+
+__u64 in_x;
+__u64 in_s;
+
+__u64 out;
+
+SEC("?syscall")
+int bitops_clz64(void *ctx)
+{
+	out = bpf_clz64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_ctz64(void *ctx)
+{
+	out = bpf_ctz64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_ffs64(void *ctx)
+{
+	out = bpf_ffs64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_fls64(void *ctx)
+{
+	out = bpf_fls64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_bitrev(void *ctx)
+{
+	out = bpf_bitrev64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_popcnt(void *ctx)
+{
+	out = bpf_popcnt64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_rol64(void *ctx)
+{
+	out = bpf_rol64(in_x, in_s);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_ror64(void *ctx)
+{
+	out = bpf_ror64(in_x, in_s);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.52.0
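The expected values that test_bitops_case() checks can be modelled in plain user-space C. The sketch below is not the reference code from this patch, since those helpers are defined earlier in prog_tests/bitops.c and are not shown here. The ref_* names are hypothetical. GCC/Clang builtins stand in for the kernel's fls64(), __ffs64() and hweight64(). Each function takes the (x, s) pair that RUN_BITOPS_CASE passes and ignores s when the operation is unary. __builtin_clzll() and __builtin_ctzll() are undefined for a zero argument, so zero is handled explicitly to match the documented results.

```c
#include <stdint.h>

/* Hypothetical user-space reference models; `s` is ignored by unary ops. */
static uint64_t ref_clz64(uint64_t x, uint64_t s)
{
	(void)s;
	return x ? (uint64_t)__builtin_clzll(x) : 64;		/* clz(0) = 64 */
}

static uint64_t ref_ctz64(uint64_t x, uint64_t s)
{
	(void)s;
	return x ? (uint64_t)__builtin_ctzll(x) : 64;		/* ctz(0) = 64 */
}

static uint64_t ref_ffs64(uint64_t x, uint64_t s)
{
	(void)s;
	return x ? (uint64_t)__builtin_ctzll(x) + 1 : 0;	/* 1-based, ffs(0) = 0 */
}

static uint64_t ref_fls64(uint64_t x, uint64_t s)
{
	(void)s;
	return x ? 64 - (uint64_t)__builtin_clzll(x) : 0;	/* 1-based, fls(0) = 0 */
}

static uint64_t ref_popcnt64(uint64_t x, uint64_t s)
{
	(void)s;
	return (uint64_t)__builtin_popcountll(x);
}

/* Reverse 32 bits by swapping adjacent bits, pairs, nibbles, bytes, halves. */
static uint32_t ref_bitrev32(uint32_t v)
{
	v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);
	v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);
	v = ((v >> 4) & 0x0f0f0f0fu) | ((v & 0x0f0f0f0fu) << 4);
	v = ((v >> 8) & 0x00ff00ffu) | ((v & 0x00ff00ffu) << 8);
	return (v >> 16) | (v << 16);
}

/* Same composition as bpf_bitrev64(): the reversed low half becomes the high half. */
static uint64_t ref_bitrev64(uint64_t x, uint64_t s)
{
	(void)s;
	return ((uint64_t)ref_bitrev32((uint32_t)x) << 32) |
	       ref_bitrev32((uint32_t)(x >> 32));
}
```

For example, ref_ffs64(0x80, 0) is 8 and ref_ctz64(0x80, 0) is 7, while all four scan operations follow the zero-input rules listed in the commit message.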


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH bpf-next 1/4] bpf: Introduce 64bit bitops kfuncs
  2026-02-09 15:59 ` [RFC PATCH bpf-next 1/4] " Leon Hwang
@ 2026-02-11  3:05   ` Alexei Starovoitov
  2026-02-11  3:29     ` Leon Hwang
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2026-02-11  3:05 UTC (permalink / raw)
  To: Leon Hwang; +Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann

On Mon, Feb 9, 2026 at 7:59 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> Introduce the following 64bit bitops kfuncs:
>
> * bpf_clz64(): Count leading zeros.
> * bpf_ctz64(): Count trailing zeros.
> * bpf_ffs64(): Find first set bit, 1-based index, returns 0 when input
>   is 0.
> * bpf_fls64(): Find last set bit, 1-based index.
> * bpf_bitrev64(): Reverse bits.
> * bpf_popcnt64(): Population count.
> * bpf_rol64(): Rotate left.
> * bpf_ror64(): Rotate right.
>
> In particular,
>
> * bpf_clz64(0) = 64
> * bpf_ctz64(0) = 64
> * bpf_ffs64(0) = 0
> * bpf_fls64(0) = 0
>
> These kfuncs are marked with a new KF_MUST_INLINE flag, which indicates
> the kfunc must be inlined by the JIT backend. A weak function
> bpf_jit_inlines_bitops() is introduced for JIT backends to advertise
> support for individual bitops.
>
> The bpf_rol64() and bpf_ror64() kfuncs do not have KF_FASTCALL because
> the x86_64 JIT needs CL (the low byte of rcx, i.e. BPF_REG_4) for the
> rotate count. The other kfuncs have KF_FASTCALL to avoid clobbering
> unused registers.
>
> An internal BPF_ALU64 opcode BPF_BITOPS is introduced as the encoding
> for these operations, with the immediate field selecting the specific
> operation (BPF_CLZ64, BPF_CTZ64, etc.).
>
> The verifier rejects the kfunc in check_kfunc_call() if the JIT backend
> does not support it, and rewrites the call to a BPF_BITOPS instruction
> in fixup_kfunc_call().
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  include/linux/btf.h    |  1 +
>  include/linux/filter.h | 20 +++++++++++++
>  kernel/bpf/core.c      |  6 ++++
>  kernel/bpf/helpers.c   | 50 ++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c  | 65 ++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 142 insertions(+)
>
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 48108471c5b1..8ac1dc59ca85 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -79,6 +79,7 @@
>  #define KF_ARENA_ARG1   (1 << 14) /* kfunc takes an arena pointer as its first argument */
>  #define KF_ARENA_ARG2   (1 << 15) /* kfunc takes an arena pointer as its second argument */
>  #define KF_IMPLICIT_ARGS (1 << 16) /* kfunc has implicit arguments supplied by the verifier */
> +#define KF_MUST_INLINE  (1 << 17) /* kfunc must be inlined by JIT backend */

UX is not great.
Just keep kfuncs in C as fallback when JIT cannot inline them
and don't remove spill/fills that llvm leaves for fastcall.

>
>  /*
>   * Tag marking a kernel function as a kfunc. This is meant to minimize the
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 4e1cb4f91f49..ff6c0cf68dd3 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -514,6 +514,25 @@ static inline bool insn_is_cast_user(const struct bpf_insn *insn)
>                 .off   = 0,                                     \
>                 .imm   = 0 })
>
> +/* bitops */
> +#define BPF_BITOPS     0xe0    /* opcode for alu64 */
> +#define BPF_CLZ64      0x00    /* imm for clz64 */
> +#define BPF_CTZ64      0x01    /* imm for ctz64 */
> +#define BPF_FFS64      0x02    /* imm for ffs64 */
> +#define BPF_FLS64      0x03    /* imm for fls64 */
> +#define BPF_BITREV64   0x04    /* imm for bitrev64 */
> +#define BPF_POPCNT64   0x05    /* imm for popcnt64 */
> +#define BPF_ROL64      0x06    /* imm for rol64 */
> +#define BPF_ROR64      0x07    /* imm for ror64 */
> +
> +#define BPF_BITOPS_INSN(IMM)                                   \
> +       ((struct bpf_insn) {                                    \
> +               .code  = BPF_ALU64 | BPF_BITOPS,                \
> +               .dst_reg = 0,                                   \
> +               .src_reg = 0,                                   \
> +               .off   = 0,                                     \
> +               .imm   = IMM })
> +

why introduce pseudo instructions and this encoding?
Just let JIT identify kfunc calls by address.
bpf_jit_get_func_addr()
if (addr == bpf_clz64) ...

>  /* Internal classic blocks for direct assignment */
>
>  #define __BPF_STMT(CODE, K)                                    \
> @@ -1157,6 +1176,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
>  void bpf_jit_compile(struct bpf_prog *prog);
>  bool bpf_jit_needs_zext(void);
>  bool bpf_jit_inlines_helper_call(s32 imm);
> +bool bpf_jit_inlines_bitops(s32 imm);
>  bool bpf_jit_supports_subprog_tailcalls(void);
>  bool bpf_jit_supports_percpu_insn(void);
>  bool bpf_jit_supports_kfunc_call(void);
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index dc906dfdff94..cee90181d169 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -3113,6 +3113,12 @@ bool __weak bpf_jit_inlines_helper_call(s32 imm)
>         return false;
>  }
>
> +/* Return TRUE if the JIT backend inlines the bitops insn. */
> +bool __weak bpf_jit_inlines_bitops(s32 imm)
> +{
> +       return false;
> +}
> +
>  /* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */
>  bool __weak bpf_jit_supports_subprog_tailcalls(void)
>  {
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 7ac32798eb04..0a598c800f67 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -29,6 +29,8 @@
>  #include <linux/task_work.h>
>  #include <linux/irq_work.h>
>  #include <linux/buildid.h>
> +#include <linux/bitops.h>
> +#include <linux/bitrev.h>
>
>  #include "../../lib/kstrtox.h"
>
> @@ -4501,6 +4503,46 @@ __bpf_kfunc int bpf_timer_cancel_async(struct bpf_timer *timer)
>         }
>  }
>
> +__bpf_kfunc u64 bpf_clz64(u64 x)
> +{
> +       return x ? 64 - fls64(x) : 64;
> +}
> +
> +__bpf_kfunc u64 bpf_ctz64(u64 x)
> +{
> +       return x ? __ffs64(x) : 64;
> +}
> +
> +__bpf_kfunc u64 bpf_ffs64(u64 x)
> +{
> +       return x ? __ffs64(x) + 1 : 0;
> +}
> +
> +__bpf_kfunc u64 bpf_fls64(u64 x)
> +{
> +       return fls64(x);
> +}
> +
> +__bpf_kfunc u64 bpf_popcnt64(u64 x)
> +{
> +       return hweight64(x);
> +}
> +
> +__bpf_kfunc u64 bpf_bitrev64(u64 x)
> +{
> +       return ((u64)bitrev32(x & 0xFFFFFFFF) << 32) | bitrev32(x >> 32);
> +}
> +
> +__bpf_kfunc u64 bpf_rol64(u64 x, u64 s)
> +{
> +       return rol64(x, s);
> +}
> +
> +__bpf_kfunc u64 bpf_ror64(u64 x, u64 s)
> +{
> +       return ror64(x, s);
> +}
> +
>  __bpf_kfunc_end_defs();
>
>  static void bpf_task_work_cancel_scheduled(struct irq_work *irq_work)
> @@ -4578,6 +4620,14 @@ BTF_ID_FLAGS(func, bpf_key_put, KF_RELEASE)
>  BTF_ID_FLAGS(func, bpf_verify_pkcs7_signature, KF_SLEEPABLE)
>  #endif
>  #endif
> +BTF_ID_FLAGS(func, bpf_clz64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_ctz64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_ffs64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_fls64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_popcnt64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_bitrev64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_rol64, KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_ror64, KF_MUST_INLINE)

Mark all of them as fastcall and do push/pop in JIT when necessary.

pw-bot: cr

* Re: [RFC PATCH bpf-next 1/4] bpf: Introduce 64bit bitops kfuncs
  2026-02-11  3:05   ` Alexei Starovoitov
@ 2026-02-11  3:29     ` Leon Hwang
  0 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-11  3:29 UTC (permalink / raw)
  To: Alexei Starovoitov, Leon Hwang
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann



On 11/2/26 11:05, Alexei Starovoitov wrote:
> On Mon, Feb 9, 2026 at 7:59 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>

[...]

>> diff --git a/include/linux/btf.h b/include/linux/btf.h
>> index 48108471c5b1..8ac1dc59ca85 100644
>> --- a/include/linux/btf.h
>> +++ b/include/linux/btf.h
>> @@ -79,6 +79,7 @@
>>  #define KF_ARENA_ARG1   (1 << 14) /* kfunc takes an arena pointer as its first argument */
>>  #define KF_ARENA_ARG2   (1 << 15) /* kfunc takes an arena pointer as its second argument */
>>  #define KF_IMPLICIT_ARGS (1 << 16) /* kfunc has implicit arguments supplied by the verifier */
>> +#define KF_MUST_INLINE  (1 << 17) /* kfunc must be inlined by JIT backend */
> 
> UX is not great.
> Just keep kfuncs in C as fallback when JIT cannot inline them
> and don't remove spill/fills that llvm leaves for fastcall.
> 

Ack.

I’ll drop KF_MUST_INLINE in the next revision and keep the C kfunc
implementation as the fallback when the JIT cannot inline it.

>>
>>  /*
>>   * Tag marking a kernel function as a kfunc. This is meant to minimize the
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index 4e1cb4f91f49..ff6c0cf68dd3 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -514,6 +514,25 @@ static inline bool insn_is_cast_user(const struct bpf_insn *insn)
>>                 .off   = 0,                                     \
>>                 .imm   = 0 })
>>
>> +/* bitops */
>> +#define BPF_BITOPS     0xe0    /* opcode for alu64 */
>> +#define BPF_CLZ64      0x00    /* imm for clz64 */
>> +#define BPF_CTZ64      0x01    /* imm for ctz64 */
>> +#define BPF_FFS64      0x02    /* imm for ffs64 */
>> +#define BPF_FLS64      0x03    /* imm for fls64 */
>> +#define BPF_BITREV64   0x04    /* imm for bitrev64 */
>> +#define BPF_POPCNT64   0x05    /* imm for popcnt64 */
>> +#define BPF_ROL64      0x06    /* imm for rol64 */
>> +#define BPF_ROR64      0x07    /* imm for ror64 */
>> +
>> +#define BPF_BITOPS_INSN(IMM)                                   \
>> +       ((struct bpf_insn) {                                    \
>> +               .code  = BPF_ALU64 | BPF_BITOPS,                \
>> +               .dst_reg = 0,                                   \
>> +               .src_reg = 0,                                   \
>> +               .off   = 0,                                     \
>> +               .imm   = IMM })
>> +
> 
> why introduce pseudo instructions and this encoding?
> Just let JIT identify kfunc calls by address.
> bpf_jit_get_func_addr()
> if (addr == bpf_clz64) ...
> 

Thanks for pointing me to bpf_jit_get_func_addr().

I’ll drop the BPF_BITOPS encoding and BPF_BITOPS_INSN, and instead
let the JIT identify the bitops kfuncs by their resolved function
address via bpf_jit_get_func_addr().

That should keep things simpler and avoid introducing a new internal
opcode.
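The address-based approach can be illustrated with a small user-space simulation. This is a sketch under stated assumptions, not JIT code: the static bpf_clz64() and bpf_popcnt64() below are user-space stand-ins for the C kfuncs, and choose_emit(), run_emitted() and enum emit_kind are hypothetical names. In the kernel the call target would come from bpf_jit_get_func_addr(); here it is passed in directly. A recognised address is "inlined", and anything else stays an ordinary call, so the C implementation remains the fallback.

```c
#include <stdint.h>

/* User-space stand-ins for the C kfunc bodies (the fallback path). */
static uint64_t bpf_clz64(uint64_t x) { return x ? (uint64_t)__builtin_clzll(x) : 64; }
static uint64_t bpf_popcnt64(uint64_t x) { return (uint64_t)__builtin_popcountll(x); }

typedef uint64_t (*kfunc_t)(uint64_t);

/* Hypothetical: what this backend decides to emit for a call site. */
enum emit_kind { EMIT_CALL, EMIT_INLINE_CLZ64 };

/*
 * Pick the emission strategy from the resolved call target. Only addresses
 * the backend recognises are inlined; every other kfunc keeps a normal call.
 */
static enum emit_kind choose_emit(kfunc_t addr)
{
	if (addr == bpf_clz64)
		return EMIT_INLINE_CLZ64;
	return EMIT_CALL;
}

/* Simulate running whatever was emitted for one call. */
static uint64_t run_emitted(kfunc_t addr, uint64_t x)
{
	switch (choose_emit(addr)) {
	case EMIT_INLINE_CLZ64:
		/* Result a native count-leading-zeros instruction would give, e.g.
		 * x86 LZCNT or arm64 CLZ, both of which return 64 for a zero input. */
		return x ? (uint64_t)__builtin_clzll(x) : 64;
	case EMIT_CALL:
	default:
		return addr(x);		/* fall back to the C implementation */
	}
}
```

This keeps inlining purely a backend decision: a JIT that does not recognise an address emits the same call it would emit for any other kfunc, so neither a new flag nor a new opcode is required.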

>>  /* Internal classic blocks for direct assignment */
>>
>>  #define __BPF_STMT(CODE, K)                                    \

[...]

>> @@ -4578,6 +4620,14 @@ BTF_ID_FLAGS(func, bpf_key_put, KF_RELEASE)
>>  BTF_ID_FLAGS(func, bpf_verify_pkcs7_signature, KF_SLEEPABLE)
>>  #endif
>>  #endif
>> +BTF_ID_FLAGS(func, bpf_clz64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_ctz64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_ffs64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_fls64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_popcnt64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_bitrev64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_rol64, KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_ror64, KF_MUST_INLINE)
> 
> Mark all of them as fastcall and do push/pop in JIT when necessary.
> 

Good idea.

I’ll mark all bitops kfuncs as KF_FASTCALL and handle any required
save/restore in the JIT.

In particular, for bpf_rol64() and bpf_ror64() on x86_64, we do need
to use rcx (CL) for variable rotates, so pushing/popping rcx in the
JIT when needed makes sense.
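A minimal sketch of the rotate semantics, assuming bpf_rol64() and bpf_ror64() keep the kernel's rol64() and ror64() definitions used by the C fallback, in which the count is reduced modulo 64. This matches the hardware. x86_64 masks a 64-bit rotate count in CL to 6 bits, and arm64 RORV takes the count modulo 64. Because arm64 has no rotate-left instruction, rol64(x, s) can be expressed as ror64(x, (-s) & 63). The rot_left64(), rot_right64() and rot_left64_via_ror() names are hypothetical.

```c
#include <stdint.h>

/* Same shape as the kernel's rol64(): the count is taken modulo 64. */
static uint64_t rot_left64(uint64_t x, uint64_t s)
{
	return (x << (s & 63)) | (x >> ((-s) & 63));
}

/* Same shape as the kernel's ror64(). */
static uint64_t rot_right64(uint64_t x, uint64_t s)
{
	return (x >> (s & 63)) | (x << ((-s) & 63));
}

/* Rotate left expressed as a rotate right, for a backend with only RORV. */
static uint64_t rot_left64_via_ror(uint64_t x, uint64_t s)
{
	return rot_right64(x, (-s) & 63);
}
```

Counts of 64 or more therefore wrap instead of being undefined: rotating by 64 returns x unchanged, and rotating by 65 is the same as rotating by 1.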

Thanks,
Leon

Thread overview: 7+ messages
2026-02-09 15:59 [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 1/4] " Leon Hwang
2026-02-11  3:05   ` Alexei Starovoitov
2026-02-11  3:29     ` Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 2/4] bpf, x86: Add 64bit bitops kfuncs support for x86_64 Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 3/4] bpf, arm64: Add 64bit bitops kfuncs support Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 4/4] selftests/bpf: Add tests for 64bit bitops kfuncs Leon Hwang