public inbox for bpf@vger.kernel.org
* [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs
@ 2026-02-09 15:59 Leon Hwang
  2026-02-09 15:59 ` [RFC PATCH bpf-next 1/4] " Leon Hwang
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC (permalink / raw)
  To: bpf; +Cc: ast, andrii, daniel, Leon Hwang

Introduce the following 64-bit bitops kfuncs for x86_64 and arm64:

* bpf_clz64(): Count leading zeros.
* bpf_ctz64(): Count trailing zeros.
* bpf_ffs64(): Find first set bit, 1-based index, returns 0 when input is 0.
* bpf_fls64(): Find last set bit, 1-based index.
* bpf_bitrev64(): Reverse bits.
* bpf_popcnt64(): Population count.
* bpf_rol64(): Rotate left.
* bpf_ror64(): Rotate right.

In particular, for a zero input:

* bpf_clz64(0) = 64
* bpf_ctz64(0) = 64
* bpf_ffs64(0) = 0
* bpf_fls64(0) = 0

bpf_ffs64() was previously discussed in "bpf: Add generic kfunc bpf_ffs64()" [1].
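
For illustration, a minimal usage sketch (assuming the usual vmlinux.h,
bpf_helpers.h and bpf_experimental.h includes; the section name and the
checked values are only an example, mirroring the selftests in patch 4/4):

	__u64 out_ffs, out_popcnt, out_clz;

	SEC("syscall")
	int bitops_example(void *ctx)
	{
		__u64 mask = 0x00f0ULL;			/* bits 4..7 set */

		out_ffs = bpf_ffs64(mask);		/* 5: lowest set bit, 1-based */
		out_popcnt = bpf_popcnt64(mask);	/* 4 bits set */
		out_clz = bpf_clz64(0);			/* 64, per the zero-input rule above */
		return 0;
	}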

Background

In the earlier bpf_ffs64() discussion, the main concern with exposing such
operations as generic kfuncs was ABI cost. A normal kfunc call follows the
BPF calling convention, which forces the compiler/JIT to treat R1-R5 as
call-clobbered, resulting in unnecessary spill/fill compared to a dedicated
instruction.

This RFC keeps the user-facing API as kfuncs, but avoids the ABI cost in the
fast path. The verifier rewrites supported bitops kfunc calls into a single
internal ALU64 encoding (BPF_BITOPS with an immediate selector), and JIT
backends emit native instructions directly. As a result, these kfuncs behave
like ISA operations once loaded, rather than real helper calls.

To make this contract explicit, the kfuncs are marked with a new
KF_MUST_INLINE flag: program load fails with -EOPNOTSUPP if the active JIT
backend cannot inline a particular operation. This keeps the cost predictable
and avoids silent slow fallbacks. A weak hook, bpf_jit_inlines_bitops(),
allows each JIT backend to advertise support on a per-operation basis
(and potentially based on CPU features).

Most operations are also tagged KF_FASTCALL to avoid clobbering unused
argument registers. bpf_rol64() and bpf_ror64() are the exception on x86_64,
where variable rotates require CL (BPF_REG_4).

Selftests output

On x86_64:

 #18/1    bitops/clz64:OK
 #18/2    bitops/ctz64:OK
 #18/3    bitops/ffs64:OK
 #18/4    bitops/fls64:OK
 #18/5    bitops/bitrev64:SKIP
 #18/6    bitops/popcnt64:OK
 #18/7    bitops/rol64:OK
 #18/8    bitops/ror64:OK
 #18      bitops:OK (SKIP: 1/8)
 Summary: 1/7 PASSED, 1 SKIPPED, 0 FAILED

On arm64:

 #18/1    bitops/clz64:OK
 #18/2    bitops/ctz64:OK
 #18/3    bitops/ffs64:OK
 #18/4    bitops/fls64:OK
 #18/5    bitops/bitrev64:OK
 #18/6    bitops/popcnt64:SKIP
 #18/7    bitops/rol64:OK
 #18/8    bitops/ror64:OK
 #18      bitops:OK (SKIP: 1/8)
 Summary: 1/7 PASSED, 1 SKIPPED, 0 FAILED

Open questions

1. Should these operations be exposed as a proper BPF ISA extension (new
   ALU64 ops) instead of a kfunc API plus verifier rewrite? This RFC takes
   the kfunc route to iterate without immediately committing to new uapi
   instruction semantics, while still ensuring instruction-like codegen.

2. For operations without a reasonable native implementation on some
   targets (e.g. bitrev64 on x86_64; popcnt64 on arm64 without touching
   SIMD registers), should we allow a true generic fallback by dropping
   KF_MUST_INLINE for those ops, or keep the "no-inline == reject" behavior
   for predictability?

Links:
[1] https://lore.kernel.org/bpf/20240131155607.51157-1-hffilwlqm@gmail.com/

Leon Hwang (4):
  bpf: Introduce 64bit bitops kfuncs
  bpf, x86: Add 64bit bitops kfuncs support for x86_64
  bpf, arm64: Add 64bit bitops kfuncs support
  selftests/bpf: Add tests for 64bit bitops kfuncs

 arch/arm64/net/bpf_jit_comp.c                 | 143 ++++++++++++++
 arch/x86/net/bpf_jit_comp.c                   | 153 ++++++++++++++
 include/linux/btf.h                           |   1 +
 include/linux/filter.h                        |  20 ++
 kernel/bpf/core.c                             |   6 +
 kernel/bpf/helpers.c                          |  50 +++++
 kernel/bpf/verifier.c                         |  65 ++++++
 .../testing/selftests/bpf/bpf_experimental.h  |   9 +
 .../testing/selftests/bpf/prog_tests/bitops.c | 186 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/bitops.c    |  69 +++++++
 10 files changed, 702 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bitops.c
 create mode 100644 tools/testing/selftests/bpf/progs/bitops.c

--
2.52.0



* [RFC PATCH bpf-next 1/4] bpf: Introduce 64bit bitops kfuncs
  2026-02-09 15:59 [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs Leon Hwang
@ 2026-02-09 15:59 ` Leon Hwang
  2026-02-11  3:05   ` Alexei Starovoitov
  2026-02-09 15:59 ` [RFC PATCH bpf-next 2/4] bpf, x86: Add 64bit bitops kfuncs support for x86_64 Leon Hwang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC (permalink / raw)
  To: bpf; +Cc: ast, andrii, daniel, Leon Hwang

Introduce the following 64bit bitops kfuncs:

* bpf_clz64(): Count leading zeros.
* bpf_ctz64(): Count trailing zeros.
* bpf_ffs64(): Find first set bit, 1-based index, returns 0 when input
  is 0.
* bpf_fls64(): Find last set bit, 1-based index.
* bpf_bitrev64(): Reverse bits.
* bpf_popcnt64(): Population count.
* bpf_rol64(): Rotate left.
* bpf_ror64(): Rotate right.

In particular, for a zero input:

* bpf_clz64(0) = 64
* bpf_ctz64(0) = 64
* bpf_ffs64(0) = 0
* bpf_fls64(0) = 0

These kfuncs are marked with a new KF_MUST_INLINE flag, which indicates
the kfunc must be inlined by the JIT backend. A weak function
bpf_jit_inlines_bitops() is introduced for JIT backends to advertise
support for individual bitops.

The bpf_rol64() and bpf_ror64() kfuncs do not get KF_FASTCALL because
BPF_REG_4 ('cl' on x86_64) is needed for the variable rotate count. The
other kfuncs have KF_FASTCALL to avoid clobbering unused argument
registers.

An internal BPF_ALU64 opcode BPF_BITOPS is introduced as the encoding
for these operations, with the immediate field selecting the specific
operation (BPF_CLZ64, BPF_CTZ64, etc.).

The verifier rejects the kfunc in check_kfunc_call() if the JIT backend
does not support it, and rewrites the call to a BPF_BITOPS instruction
in fixup_kfunc_call().

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/btf.h    |  1 +
 include/linux/filter.h | 20 +++++++++++++
 kernel/bpf/core.c      |  6 ++++
 kernel/bpf/helpers.c   | 50 ++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c  | 65 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 142 insertions(+)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 48108471c5b1..8ac1dc59ca85 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -79,6 +79,7 @@
 #define KF_ARENA_ARG1   (1 << 14) /* kfunc takes an arena pointer as its first argument */
 #define KF_ARENA_ARG2   (1 << 15) /* kfunc takes an arena pointer as its second argument */
 #define KF_IMPLICIT_ARGS (1 << 16) /* kfunc has implicit arguments supplied by the verifier */
+#define KF_MUST_INLINE  (1 << 17) /* kfunc must be inlined by JIT backend */
 
 /*
  * Tag marking a kernel function as a kfunc. This is meant to minimize the
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 4e1cb4f91f49..ff6c0cf68dd3 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -514,6 +514,25 @@ static inline bool insn_is_cast_user(const struct bpf_insn *insn)
 		.off   = 0,					\
 		.imm   = 0 })
 
+/* bitops */
+#define BPF_BITOPS	0xe0	/* opcode for alu64 */
+#define BPF_CLZ64	0x00	/* imm for clz64 */
+#define BPF_CTZ64	0x01	/* imm for ctz64 */
+#define BPF_FFS64	0x02	/* imm for ffs64 */
+#define BPF_FLS64	0x03	/* imm for fls64 */
+#define BPF_BITREV64	0x04	/* imm for bitrev64 */
+#define BPF_POPCNT64	0x05	/* imm for popcnt64 */
+#define BPF_ROL64	0x06	/* imm for rol64 */
+#define BPF_ROR64	0x07	/* imm for ror64 */
+
+#define BPF_BITOPS_INSN(IMM)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU64 | BPF_BITOPS,		\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
 /* Internal classic blocks for direct assignment */
 
 #define __BPF_STMT(CODE, K)					\
@@ -1157,6 +1176,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
 void bpf_jit_compile(struct bpf_prog *prog);
 bool bpf_jit_needs_zext(void);
 bool bpf_jit_inlines_helper_call(s32 imm);
+bool bpf_jit_inlines_bitops(s32 imm);
 bool bpf_jit_supports_subprog_tailcalls(void);
 bool bpf_jit_supports_percpu_insn(void);
 bool bpf_jit_supports_kfunc_call(void);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index dc906dfdff94..cee90181d169 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3113,6 +3113,12 @@ bool __weak bpf_jit_inlines_helper_call(s32 imm)
 	return false;
 }
 
+/* Return TRUE if the JIT backend inlines the bitops insn. */
+bool __weak bpf_jit_inlines_bitops(s32 imm)
+{
+	return false;
+}
+
 /* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */
 bool __weak bpf_jit_supports_subprog_tailcalls(void)
 {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 7ac32798eb04..0a598c800f67 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -29,6 +29,8 @@
 #include <linux/task_work.h>
 #include <linux/irq_work.h>
 #include <linux/buildid.h>
+#include <linux/bitops.h>
+#include <linux/bitrev.h>
 
 #include "../../lib/kstrtox.h"
 
@@ -4501,6 +4503,46 @@ __bpf_kfunc int bpf_timer_cancel_async(struct bpf_timer *timer)
 	}
 }
 
+__bpf_kfunc u64 bpf_clz64(u64 x)
+{
+	return x ? 64 - fls64(x) : 64;
+}
+
+__bpf_kfunc u64 bpf_ctz64(u64 x)
+{
+	return x ? __ffs64(x) : 64;
+}
+
+__bpf_kfunc u64 bpf_ffs64(u64 x)
+{
+	return x ? __ffs64(x) + 1 : 0;
+}
+
+__bpf_kfunc u64 bpf_fls64(u64 x)
+{
+	return fls64(x);
+}
+
+__bpf_kfunc u64 bpf_popcnt64(u64 x)
+{
+	return hweight64(x);
+}
+
+__bpf_kfunc u64 bpf_bitrev64(u64 x)
+{
+	return ((u64)bitrev32(x & 0xFFFFFFFF) << 32) | bitrev32(x >> 32);
+}
+
+__bpf_kfunc u64 bpf_rol64(u64 x, u64 s)
+{
+	return rol64(x, s);
+}
+
+__bpf_kfunc u64 bpf_ror64(u64 x, u64 s)
+{
+	return ror64(x, s);
+}
+
 __bpf_kfunc_end_defs();
 
 static void bpf_task_work_cancel_scheduled(struct irq_work *irq_work)
@@ -4578,6 +4620,14 @@ BTF_ID_FLAGS(func, bpf_key_put, KF_RELEASE)
 BTF_ID_FLAGS(func, bpf_verify_pkcs7_signature, KF_SLEEPABLE)
 #endif
 #endif
+BTF_ID_FLAGS(func, bpf_clz64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_ctz64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_ffs64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_fls64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_popcnt64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_bitrev64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_rol64, KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_ror64, KF_MUST_INLINE)
 BTF_KFUNCS_END(generic_btf_ids)
 
 static const struct btf_kfunc_id_set generic_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index edf5342b982f..ed9a077ecf2e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12477,6 +12477,14 @@ enum special_kfunc_type {
 	KF_bpf_session_is_return,
 	KF_bpf_stream_vprintk,
 	KF_bpf_stream_print_stack,
+	KF_bpf_clz64,
+	KF_bpf_ctz64,
+	KF_bpf_ffs64,
+	KF_bpf_fls64,
+	KF_bpf_bitrev64,
+	KF_bpf_popcnt64,
+	KF_bpf_rol64,
+	KF_bpf_ror64,
 };
 
 BTF_ID_LIST(special_kfunc_list)
@@ -12557,6 +12565,14 @@ BTF_ID(func, bpf_arena_reserve_pages)
 BTF_ID(func, bpf_session_is_return)
 BTF_ID(func, bpf_stream_vprintk)
 BTF_ID(func, bpf_stream_print_stack)
+BTF_ID(func, bpf_clz64)
+BTF_ID(func, bpf_ctz64)
+BTF_ID(func, bpf_ffs64)
+BTF_ID(func, bpf_fls64)
+BTF_ID(func, bpf_bitrev64)
+BTF_ID(func, bpf_popcnt64)
+BTF_ID(func, bpf_rol64)
+BTF_ID(func, bpf_ror64)
 
 static bool is_task_work_add_kfunc(u32 func_id)
 {
@@ -12564,6 +12580,30 @@ static bool is_task_work_add_kfunc(u32 func_id)
 	       func_id == special_kfunc_list[KF_bpf_task_work_schedule_resume];
 }
 
+static bool get_bitops_insn_imm(u32 func_id, s32 *imm)
+{
+	if (func_id == special_kfunc_list[KF_bpf_clz64])
+		*imm = BPF_CLZ64;
+	else if (func_id == special_kfunc_list[KF_bpf_ctz64])
+		*imm = BPF_CTZ64;
+	else if (func_id == special_kfunc_list[KF_bpf_ffs64])
+		*imm = BPF_FFS64;
+	else if (func_id == special_kfunc_list[KF_bpf_fls64])
+		*imm = BPF_FLS64;
+	else if (func_id == special_kfunc_list[KF_bpf_bitrev64])
+		*imm = BPF_BITREV64;
+	else if (func_id == special_kfunc_list[KF_bpf_popcnt64])
+		*imm = BPF_POPCNT64;
+	else if (func_id == special_kfunc_list[KF_bpf_rol64])
+		*imm = BPF_ROL64;
+	else if (func_id == special_kfunc_list[KF_bpf_ror64])
+		*imm = BPF_ROR64;
+	else
+		return false;
+
+	return true;
+}
+
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
 	if (meta->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
@@ -14044,6 +14084,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	int err, insn_idx = *insn_idx_p;
 	const struct btf_param *args;
 	struct btf *desc_btf;
+	bool is_bitops_kfunc;
+	s32 insn_imm;
 
 	/* skip for now, but return error when we find this in fixup_kfunc_call */
 	if (!insn->imm)
@@ -14423,6 +14465,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	if (meta.func_id == special_kfunc_list[KF_bpf_session_cookie])
 		env->prog->call_session_cookie = true;
 
+	is_bitops_kfunc = get_bitops_insn_imm(meta.func_id, &insn_imm);
+	if ((meta.kfunc_flags & KF_MUST_INLINE)) {
+		bool inlined = is_bitops_kfunc && bpf_jit_inlines_bitops(insn_imm);
+
+		if (!inlined) {
+			verbose(env, "JIT does not support inlining the kfunc %s.\n", func_name);
+			return -EOPNOTSUPP;
+		}
+	}
+
 	return 0;
 }
 
@@ -23236,6 +23288,19 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		insn_buf[4] = BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1);
 		insn_buf[5] = BPF_ALU64_IMM(BPF_NEG, BPF_REG_0, 0);
 		*cnt = 6;
+	} else if (get_bitops_insn_imm(desc->func_id, &insn_buf[0].imm)) {
+		s32 imm = insn_buf[0].imm;
+
+		if (imm == BPF_FFS64) {
+			insn_buf[0] = BPF_MOV64_IMM(BPF_REG_0, 0);
+			insn_buf[1] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 2);
+			insn_buf[2] = BPF_BITOPS_INSN(imm);
+			insn_buf[3] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 1);
+			*cnt = 4;
+		} else {
+			insn_buf[0] = BPF_BITOPS_INSN(imm);
+			*cnt = 1;
+		}
 	}
 
 	if (env->insn_aux_data[insn_idx].arg_prog) {
-- 
2.52.0



* [RFC PATCH bpf-next 2/4] bpf, x86: Add 64bit bitops kfuncs support for x86_64
  2026-02-09 15:59 [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs Leon Hwang
  2026-02-09 15:59 ` [RFC PATCH bpf-next 1/4] " Leon Hwang
@ 2026-02-09 15:59 ` Leon Hwang
  2026-02-09 15:59 ` [RFC PATCH bpf-next 3/4] bpf, arm64: Add 64bit bitops kfuncs support Leon Hwang
  2026-02-09 15:59 ` [RFC PATCH bpf-next 4/4] selftests/bpf: Add tests for 64bit bitops kfuncs Leon Hwang
  3 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC (permalink / raw)
  To: bpf; +Cc: ast, andrii, daniel, Leon Hwang

Implement JIT inlining of the 64bit bitops kfuncs on x86_64.

bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.

bpf_clz64(), bpf_ctz64(), bpf_ffs64(), and bpf_fls64() are supported
when the CPU has X86_FEATURE_ABM (LZCNT/TZCNT).

bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.

bpf_bitrev64() is not supported as x86_64 has no native bit-reverse
instruction.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 arch/x86/net/bpf_jit_comp.c | 153 ++++++++++++++++++++++++++++++++++++
 1 file changed, 153 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 070ba80e39d7..5d6215071cbd 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -19,6 +19,7 @@
 #include <asm/text-patching.h>
 #include <asm/unwind.h>
 #include <asm/cfi.h>
+#include <asm/cpufeatures.h>
 
 static bool all_callee_regs_used[4] = {true, true, true, true};
 
@@ -1604,6 +1605,134 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
 	*pprog = prog;
 }
 
+static int emit_bitops(u8 **pprog, u32 bitops)
+{
+	u8 *prog = *pprog;
+
+	/*
+	 * x86 Bit manipulation instruction set
+	 * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
+	 */
+
+	switch (bitops) {
+	case BPF_CLZ64:
+		/*
+		 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+		 *
+		 *   LZCNT - Count the Number of Leading Zero Bits
+		 *
+		 *     Opcode/Instruction
+		 *     F3 REX.W 0F BD /r
+		 *     LZCNT r64, r/m64
+		 *
+		 *     Op/En
+		 *     RVM
+		 *
+		 *     64/32-bit Mode
+		 *     V/N.E.
+		 *
+		 *     CPUID Feature Flag
+		 *     LZCNT
+		 *
+		 *     Description
+		 *     Count the number of leading zero bits in r/m64, return
+		 *     result in r64.
+		 */
+		/* emit: x ? 64 - fls64(x) : 64 */
+		/* lzcnt rax, rdi */
+		EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
+		break;
+
+	case BPF_CTZ64:
+		/*
+		 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+		 *
+		 *   TZCNT - Count the Number of Trailing Zero Bits
+		 *
+		 *     Opcode/Instruction
+		 *     F3 REX.W 0F BC /r
+		 *     TZCNT r64, r/m64
+		 *
+		 *     Op/En
+		 *     RVM
+		 *
+		 *     64/32-bit Mode
+		 *     V/N.E.
+		 *
+		 *     CPUID Feature Flag
+		 *     BMI1
+		 *
+		 *     Description
+		 *     Count the number of trailing zero bits in r/m64, return
+		 *     result in r64.
+		 */
+		/* emit: x ? __ffs64(x) : 64 */
+		/* tzcnt rax, rdi */
+		EMIT5(0xF3, 0x48, 0x0F, 0xBC, 0xC7);
+		break;
+
+	case BPF_FFS64:
+		/* emit: __ffs64(x), 'x == 0' was handled by verifier */
+		/* tzcnt rax, rdi */
+		EMIT5(0xF3, 0x48, 0x0F, 0xBC, 0xC7);
+		break;
+
+	case BPF_FLS64:
+		/* emit: fls64(x) */
+		/* lzcnt rax, rdi; neg rax; add rax, 64 */
+		EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
+		EMIT3(0x48, 0xF7, 0xD8);       /* neg rax */
+		EMIT4(0x48, 0x83, 0xC0, 0x40); /* add rax, 64 */
+		break;
+
+	case BPF_POPCNT64:
+		/*
+		 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+		 *
+		 *   POPCNT - Return the Count of Number of Bits Set to 1
+		 *
+		 *     Opcode/Instruction
+		 *     F3 REX.W 0F B8 /r
+		 *     POPCNT r64, r/m64
+		 *
+		 *     Op/En
+		 *     RM
+		 *
+		 *     64 Mode
+		 *     Valid
+		 *
+		 *     Compat/Leg Mode
+		 *     N.E.
+		 *
+		 *     Description
+		 *     POPCNT on r/m64
+		 */
+		/* popcnt rax, rdi */
+		EMIT5(0xF3, 0x48, 0x0F, 0xB8, 0xC7);
+		break;
+
+	case BPF_ROL64:
+		/* emit: rol64(x, s) */
+		EMIT3(0x48, 0x89, 0xF1); /* mov rcx, rsi */
+		EMIT3(0x48, 0x89, 0xF8); /* mov rax, rdi */
+		EMIT3(0x48, 0xD3, 0xC0); /* rol rax, cl */
+		break;
+
+	case BPF_ROR64:
+		/* emit: ror64(x, s) */
+		EMIT3(0x48, 0x89, 0xF1); /* mov rcx, rsi */
+		EMIT3(0x48, 0x89, 0xF8); /* mov rax, rdi */
+		EMIT3(0x48, 0xD3, 0xC8); /* ror rax, cl */
+		break;
+
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	*pprog = prog;
+	return 0;
+}
+
 #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
 
 #define __LOAD_TCC_PTR(off)			\
@@ -2113,6 +2242,12 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 			}
 			break;
 
+		case BPF_ALU64 | BPF_BITOPS:
+			err = emit_bitops(&prog, insn->imm);
+			if (err)
+				return err;
+			break;
+
 			/* speculation barrier */
 		case BPF_ST | BPF_NOSPEC:
 			EMIT_LFENCE();
@@ -4117,3 +4252,21 @@ bool bpf_jit_supports_fsession(void)
 {
 	return true;
 }
+
+bool bpf_jit_inlines_bitops(s32 imm)
+{
+	switch (imm) {
+	case BPF_CLZ64:
+	case BPF_CTZ64:
+	case BPF_FFS64:
+	case BPF_FLS64:
+		return boot_cpu_has(X86_FEATURE_ABM);
+	case BPF_POPCNT64:
+		return boot_cpu_has(X86_FEATURE_POPCNT);
+	case BPF_ROL64:
+	case BPF_ROR64:
+		return true;
+	default:
+		return false;
+	}
+}
-- 
2.52.0



* [RFC PATCH bpf-next 3/4] bpf, arm64: Add 64bit bitops kfuncs support
  2026-02-09 15:59 [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs Leon Hwang
  2026-02-09 15:59 ` [RFC PATCH bpf-next 1/4] " Leon Hwang
  2026-02-09 15:59 ` [RFC PATCH bpf-next 2/4] bpf, x86: Add 64bit bitops kfuncs support for x86_64 Leon Hwang
@ 2026-02-09 15:59 ` Leon Hwang
  2026-02-09 15:59 ` [RFC PATCH bpf-next 4/4] selftests/bpf: Add tests for 64bit bitops kfuncs Leon Hwang
  3 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC (permalink / raw)
  To: bpf; +Cc: ast, andrii, daniel, Leon Hwang

Implement JIT inlining of the 64bit bitops kfuncs on arm64.

bpf_clz64(), bpf_ffs64(), bpf_fls64(), and bpf_bitrev64() are always
supported using mandatory ARMv8 CLZ/RBIT instructions. bpf_ctz64() is
implemented via RBIT + CLZ, or via the native CTZ instruction when
FEAT_CSSC is available. bpf_rol64() and bpf_ror64() are always
supported via RORV.

bpf_popcnt64() is not supported as the native population count
instruction requires NEON/SIMD registers, which should not be touched
from BPF programs.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 arch/arm64/net/bpf_jit_comp.c | 143 ++++++++++++++++++++++++++++++++++
 1 file changed, 143 insertions(+)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 2dc5037694ba..b91896cef247 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1199,6 +1199,123 @@ static int add_exception_handler(const struct bpf_insn *insn,
 	return 0;
 }
 
+static inline u32 a64_clz64(u8 rd, u8 rn)
+{
+	/*
+	 * Arm Architecture Reference Manual for A-profile architecture
+	 * (Document number: ARM DDI 0487)
+	 *
+	 *   A64 Base Instruction Descriptions
+	 *   C6.2 Alphabetical list of A64 base instructions
+	 *
+	 *   C6.2.91 CLZ
+	 *
+	 *     Count leading zeros
+	 *
+	 *     This instruction counts the number of consecutive binary zero bits,
+	 *     starting from the most significant bit in the source register,
+	 *     and places the count in the destination register.
+	 */
+	/* CLZ Xd, Xn */
+	return 0xdac01000 | (rn << 5) | rd;
+}
+
+static inline u32 a64_ctz64(u8 rd, u8 rn)
+{
+	/*
+	 * Arm Architecture Reference Manual for A-profile architecture
+	 * (Document number: ARM DDI 0487)
+	 *
+	 *   A64 Base Instruction Descriptions
+	 *   C6.2 Alphabetical list of A64 base instructions
+	 *
+	 *   C6.2.144 CTZ
+	 *
+	 *     Count trailing zeros
+	 *
+	 *     This instruction counts the number of consecutive binary zero bits,
+	 *     starting from the least significant bit in the source register,
+	 *     and places the count in the destination register.
+	 *
+	 *     This instruction requires FEAT_CSSC.
+	 */
+	/* CTZ Xd, Xn */
+	return 0xdac01800 | (rn << 5) | rd;
+}
+
+static inline u32 a64_rbit64(u8 rd, u8 rn)
+{
+	/*
+	 * Arm Architecture Reference Manual for A-profile architecture
+	 * (Document number: ARM DDI 0487)
+	 *
+	 *   A64 Base Instruction Descriptions
+	 *   C6.2 Alphabetical list of A64 base instructions
+	 *
+	 *   C6.2.320 RBIT
+	 *
+	 *     Reverse bits
+	 *
+	 *     This instruction reverses the bit order in a register.
+	 */
+	/* RBIT Xd, Xn */
+	return 0xdac00000 | (rn << 5) | rd;
+}
+
+static inline bool supports_cssc(void)
+{
+	/*
+	 * Documentation/arch/arm64/cpu-feature-registers.rst
+	 *
+	 *   ID_AA64ISAR2_EL1 - Instruction set attribute register 2
+	 *
+	 *     CSSC
+	 */
+	return cpuid_feature_extract_unsigned_field(read_sanitised_ftr_reg(SYS_ID_AA64ISAR2_EL1),
+						    ID_AA64ISAR2_EL1_CSSC_SHIFT);
+}
+
+static int emit_bitops(struct jit_ctx *ctx, s32 imm)
+{
+	const u8 r0 = bpf2a64[BPF_REG_0];
+	const u8 r1 = bpf2a64[BPF_REG_1];
+	const u8 r2 = bpf2a64[BPF_REG_2];
+	const u8 tmp = bpf2a64[TMP_REG_1];
+
+	switch (imm) {
+	case BPF_CLZ64:
+		emit(a64_clz64(r0, r1), ctx);
+		break;
+	case BPF_CTZ64:
+	case BPF_FFS64:
+		if (supports_cssc()) {
+			emit(a64_ctz64(r0, r1), ctx);
+		} else {
+			emit(a64_rbit64(tmp, r1), ctx);
+			emit(a64_clz64(r0, tmp), ctx);
+		}
+		break;
+	case BPF_FLS64:
+		emit(a64_clz64(tmp, r1), ctx);
+		emit(A64_NEG(1, tmp, tmp), ctx);
+		emit(A64_ADD_I(1, r0, tmp, 64), ctx);
+		break;
+	case BPF_BITREV64:
+		emit(a64_rbit64(r0, r1), ctx);
+		break;
+	case BPF_ROL64:
+		emit(A64_NEG(1, tmp, r2), ctx);
+		emit(A64_DATA2(1, r0, r1, tmp, RORV), ctx);
+		break;
+	case BPF_ROR64:
+		emit(A64_DATA2(1, r0, r1, r2, RORV), ctx);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+	return 0;
+}
+
 /* JITs an eBPF instruction.
  * Returns:
  * 0  - successfully JITed an 8-byte eBPF instruction.
@@ -1451,6 +1568,11 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_ALU64 | BPF_ARSH | BPF_K:
 		emit(A64_ASR(is64, dst, dst, imm), ctx);
 		break;
+	case BPF_ALU64 | BPF_BITOPS:
+		ret = emit_bitops(ctx, imm);
+		if (ret)
+			return ret;
+		break;
 
 	/* JUMP reg */
 	case BPF_JMP | BPF_JA | BPF_X:
@@ -3207,3 +3329,24 @@ void bpf_jit_free(struct bpf_prog *prog)
 
 	bpf_prog_unlock_free(prog);
 }
+
+bool bpf_jit_inlines_bitops(s32 imm)
+{
+	switch (imm) {
+	case BPF_CLZ64:
+	case BPF_CTZ64:
+	case BPF_FFS64:
+	case BPF_FLS64:
+	case BPF_BITREV64:
+		/* They use RBIT/CLZ/CTZ which are mandatory in ARM64 */
+		return true;
+	case BPF_POPCNT64:
+		/* We should not touch NEON/SIMD register to support popcnt64 */
+		return false;
+	case BPF_ROL64:
+	case BPF_ROR64:
+		return true;
+	default:
+		return false;
+	}
+}
-- 
2.52.0



* [RFC PATCH bpf-next 4/4] selftests/bpf: Add tests for 64bit bitops kfuncs
  2026-02-09 15:59 [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs Leon Hwang
                   ` (2 preceding siblings ...)
  2026-02-09 15:59 ` [RFC PATCH bpf-next 3/4] bpf, arm64: Add 64bit bitops kfuncs support Leon Hwang
@ 2026-02-09 15:59 ` Leon Hwang
  3 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC (permalink / raw)
  To: bpf; +Cc: ast, andrii, daniel, Leon Hwang

Add selftests for bpf_clz64(), bpf_ctz64(), bpf_ffs64(), bpf_fls64(),
bpf_bitrev64(), bpf_popcnt64(), bpf_rol64(), and bpf_ror64().

Each subtest compares the kfunc result against a userspace reference
implementation across a set of test vectors. If the JIT does not support
inlining a given kfunc, the subtest is skipped (-EOPNOTSUPP at load
time).
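
For reference, the new subtests can be run on their own with
"./test_progs -t bitops".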

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 .../testing/selftests/bpf/bpf_experimental.h  |   9 +
 .../testing/selftests/bpf/prog_tests/bitops.c | 186 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/bitops.c    |  69 +++++++
 3 files changed, 264 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bitops.c
 create mode 100644 tools/testing/selftests/bpf/progs/bitops.c

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 4b7210c318dd..3a7d126968b3 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -594,6 +594,15 @@ extern void bpf_iter_dmabuf_destroy(struct bpf_iter_dmabuf *it) __weak __ksym;
 extern int bpf_cgroup_read_xattr(struct cgroup *cgroup, const char *name__str,
 				 struct bpf_dynptr *value_p) __weak __ksym;
 
+extern __u64 bpf_clz64(__u64 x) __weak __ksym;
+extern __u64 bpf_ctz64(__u64 x) __weak __ksym;
+extern __u64 bpf_ffs64(__u64 x) __weak __ksym;
+extern __u64 bpf_fls64(__u64 x) __weak __ksym;
+extern __u64 bpf_bitrev64(__u64 x) __weak __ksym;
+extern __u64 bpf_popcnt64(__u64 x) __weak __ksym;
+extern __u64 bpf_rol64(__u64 x, __u64 s) __weak __ksym;
+extern __u64 bpf_ror64(__u64 x, __u64 s) __weak __ksym;
+
 #define PREEMPT_BITS	8
 #define SOFTIRQ_BITS	8
 #define HARDIRQ_BITS	4
diff --git a/tools/testing/selftests/bpf/prog_tests/bitops.c b/tools/testing/selftests/bpf/prog_tests/bitops.c
new file mode 100644
index 000000000000..59bf1c5b5102
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bitops.c
@@ -0,0 +1,186 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include "bitops.skel.h"
+
+struct bitops_case {
+	__u64 x;
+	__u64 s;
+	__u64 exp;
+};
+
+static struct bitops_case cases[] = {
+	{ 0x0ULL, 0, 0 },
+	{ 0x1ULL, 1, 0 },
+	{ 0x8000000000000000ULL, 63, 0 },
+	{ 0xffffffffffffffffULL, 64, 0 },
+	{ 0x0123456789abcdefULL, 65, 0 },
+	{ 0x0000000100000000ULL, 127, 0 },
+};
+
+static __u64 clz64(__u64 x, __u64 s)
+{
+	(void)s;
+	return x ? __builtin_clzll(x) : 64;
+}
+
+static __u64 ctz64(__u64 x, __u64 s)
+{
+	(void)s;
+	return x ? __builtin_ctzll(x) : 64;
+}
+
+static __u64 ffs64(__u64 x, __u64 s)
+{
+	(void)s;
+	return x ? (__u64)__builtin_ctzll(x) + 1 : 0;
+}
+
+static __u64 fls64(__u64 x, __u64 s)
+{
+	(void)s;
+	return x ? 64 - __builtin_clzll(x) : 0;
+}
+
+static __u64 popcnt64(__u64 x, __u64 s)
+{
+	(void)s;
+	return __builtin_popcountll(x);
+}
+
+static __u64 bitrev64(__u64 x, __u64 s)
+{
+	__u64 y = 0;
+	int i;
+
+	(void)s;
+
+	for (i = 0; i < 64; i++) {
+		y <<= 1;
+		y |= x & 1;
+		x >>= 1;
+	}
+	return y;
+}
+
+static __u64 rol64(__u64 x, __u64 s)
+{
+	s &= 63;
+	return (x << s) | (x >> ((-s) & 63));
+}
+
+static __u64 ror64(__u64 x, __u64 s)
+{
+	s &= 63;
+	return (x >> s) | (x << ((-s) & 63));
+}
+
+static void test_bitops_case(const char *prog_name)
+{
+	struct bpf_program *prog;
+	struct bitops *skel;
+	size_t i;
+	int err;
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+	skel = bitops__open();
+	if (!ASSERT_OK_PTR(skel, "bitops__open"))
+		return;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto cleanup;
+
+	bpf_program__set_autoload(prog, true);
+
+	err = bitops__load(skel);
+	if (err == -EOPNOTSUPP) {
+		test__skip();
+		goto cleanup;
+	}
+	if (!ASSERT_OK(err, "bitops__load"))
+		goto cleanup;
+
+	for (i = 0; i < ARRAY_SIZE(cases); i++) {
+		skel->bss->in_x = cases[i].x;
+		skel->bss->in_s = cases[i].s;
+		err = bpf_prog_test_run_opts(bpf_program__fd(prog), &topts);
+		if (!ASSERT_OK(err, "bpf_prog_test_run_opts"))
+			goto cleanup;
+
+		if (!ASSERT_OK(topts.retval, "retval"))
+			goto cleanup;
+
+		ASSERT_EQ(skel->bss->out, cases[i].exp, "out");
+	}
+
+cleanup:
+	bitops__destroy(skel);
+}
+
+#define RUN_BITOPS_CASE(_bitops, _prog)					\
+	do {								\
+		for (size_t i = 0; i < ARRAY_SIZE(cases); i++)		\
+			cases[i].exp = _bitops(cases[i].x, cases[i].s);	\
+		test_bitops_case(_prog);				\
+	} while (0)
+
+static void test_clz64(void)
+{
+	RUN_BITOPS_CASE(clz64, "bitops_clz64");
+}
+
+static void test_ctz64(void)
+{
+	RUN_BITOPS_CASE(ctz64, "bitops_ctz64");
+}
+
+static void test_ffs64(void)
+{
+	RUN_BITOPS_CASE(ffs64, "bitops_ffs64");
+}
+
+static void test_fls64(void)
+{
+	RUN_BITOPS_CASE(fls64, "bitops_fls64");
+}
+
+static void test_bitrev64(void)
+{
+	RUN_BITOPS_CASE(bitrev64, "bitops_bitrev");
+}
+
+static void test_popcnt64(void)
+{
+	RUN_BITOPS_CASE(popcnt64, "bitops_popcnt");
+}
+
+static void test_rol64(void)
+{
+	RUN_BITOPS_CASE(rol64, "bitops_rol64");
+}
+
+static void test_ror64(void)
+{
+	RUN_BITOPS_CASE(ror64, "bitops_ror64");
+}
+
+void test_bitops(void)
+{
+	if (test__start_subtest("clz64"))
+		test_clz64();
+	if (test__start_subtest("ctz64"))
+		test_ctz64();
+	if (test__start_subtest("ffs64"))
+		test_ffs64();
+	if (test__start_subtest("fls64"))
+		test_fls64();
+	if (test__start_subtest("bitrev64"))
+		test_bitrev64();
+	if (test__start_subtest("popcnt64"))
+		test_popcnt64();
+	if (test__start_subtest("rol64"))
+		test_rol64();
+	if (test__start_subtest("ror64"))
+		test_ror64();
+}
diff --git a/tools/testing/selftests/bpf/progs/bitops.c b/tools/testing/selftests/bpf/progs/bitops.c
new file mode 100644
index 000000000000..5d5b192bf3d9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bitops.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_experimental.h"
+
+__u64 in_x;
+__u64 in_s;
+
+__u64 out;
+
+SEC("?syscall")
+int bitops_clz64(void *ctx)
+{
+	out = bpf_clz64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_ctz64(void *ctx)
+{
+	out = bpf_ctz64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_ffs64(void *ctx)
+{
+	out = bpf_ffs64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_fls64(void *ctx)
+{
+	out = bpf_fls64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_bitrev(void *ctx)
+{
+	out = bpf_bitrev64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_popcnt(void *ctx)
+{
+	out = bpf_popcnt64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_rol64(void *ctx)
+{
+	out = bpf_rol64(in_x, in_s);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_ror64(void *ctx)
+{
+	out = bpf_ror64(in_x, in_s);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.52.0



* Re: [RFC PATCH bpf-next 1/4] bpf: Introduce 64bit bitops kfuncs
  2026-02-09 15:59 ` [RFC PATCH bpf-next 1/4] " Leon Hwang
@ 2026-02-11  3:05   ` Alexei Starovoitov
  2026-02-11  3:29     ` Leon Hwang
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2026-02-11  3:05 UTC (permalink / raw)
  To: Leon Hwang; +Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann

On Mon, Feb 9, 2026 at 7:59 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> Introduce the following 64bit bitops kfuncs:
>
> * bpf_clz64(): Count leading zeros.
> * bpf_ctz64(): Count trailing zeros.
> * bpf_ffs64(): Find first set bit, 1-based index, returns 0 when input
>   is 0.
> * bpf_fls64(): Find last set bit, 1-based index.
> * bpf_bitrev64(): Reverse bits.
> * bpf_popcnt64(): Population count.
> * bpf_rol64(): Rotate left.
> * bpf_ror64(): Rotate right.
>
> In particular, for a zero input:
>
> * bpf_clz64(0) = 64
> * bpf_ctz64(0) = 64
> * bpf_ffs64(0) = 0
> * bpf_fls64(0) = 0
>
> These kfuncs are marked with a new KF_MUST_INLINE flag, which indicates
> the kfunc must be inlined by the JIT backend. A weak function
> bpf_jit_inlines_bitops() is introduced for JIT backends to advertise
> support for individual bitops.
>
> The bpf_rol64() and bpf_ror64() kfuncs do not get KF_FASTCALL because
> BPF_REG_4 ('cl' on x86_64) is needed for the variable rotate count. The
> other kfuncs have KF_FASTCALL to avoid clobbering unused argument
> registers.
>
> An internal BPF_ALU64 opcode BPF_BITOPS is introduced as the encoding
> for these operations, with the immediate field selecting the specific
> operation (BPF_CLZ64, BPF_CTZ64, etc.).
>
> The verifier rejects the kfunc in check_kfunc_call() if the JIT backend
> does not support it, and rewrites the call to a BPF_BITOPS instruction
> in fixup_kfunc_call().
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  include/linux/btf.h    |  1 +
>  include/linux/filter.h | 20 +++++++++++++
>  kernel/bpf/core.c      |  6 ++++
>  kernel/bpf/helpers.c   | 50 ++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c  | 65 ++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 142 insertions(+)
>
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 48108471c5b1..8ac1dc59ca85 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -79,6 +79,7 @@
>  #define KF_ARENA_ARG1   (1 << 14) /* kfunc takes an arena pointer as its first argument */
>  #define KF_ARENA_ARG2   (1 << 15) /* kfunc takes an arena pointer as its second argument */
>  #define KF_IMPLICIT_ARGS (1 << 16) /* kfunc has implicit arguments supplied by the verifier */
> +#define KF_MUST_INLINE  (1 << 17) /* kfunc must be inlined by JIT backend */

UX is not great.
Just keep kfuncs in C as fallback when JIT cannot inline them
and don't remove spill/fills that llvm leaves for fastcall.

>
>  /*
>   * Tag marking a kernel function as a kfunc. This is meant to minimize the
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 4e1cb4f91f49..ff6c0cf68dd3 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -514,6 +514,25 @@ static inline bool insn_is_cast_user(const struct bpf_insn *insn)
>                 .off   = 0,                                     \
>                 .imm   = 0 })
>
> +/* bitops */
> +#define BPF_BITOPS     0xe0    /* opcode for alu64 */
> +#define BPF_CLZ64      0x00    /* imm for clz64 */
> +#define BPF_CTZ64      0x01    /* imm for ctz64 */
> +#define BPF_FFS64      0x02    /* imm for ffs64 */
> +#define BPF_FLS64      0x03    /* imm for fls64 */
> +#define BPF_BITREV64   0x04    /* imm for bitrev64 */
> +#define BPF_POPCNT64   0x05    /* imm for popcnt64 */
> +#define BPF_ROL64      0x06    /* imm for rol64 */
> +#define BPF_ROR64      0x07    /* imm for ror64 */
> +
> +#define BPF_BITOPS_INSN(IMM)                                   \
> +       ((struct bpf_insn) {                                    \
> +               .code  = BPF_ALU64 | BPF_BITOPS,                \
> +               .dst_reg = 0,                                   \
> +               .src_reg = 0,                                   \
> +               .off   = 0,                                     \
> +               .imm   = IMM })
> +

why introduce pseudo instructions and this encoding?
Just let JIT identify kfunc calls by address.
bpf_jit_get_func_addr()
if (addr == bpf_clz64) ...

>  /* Internal classic blocks for direct assignment */
>
>  #define __BPF_STMT(CODE, K)                                    \
> @@ -1157,6 +1176,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
>  void bpf_jit_compile(struct bpf_prog *prog);
>  bool bpf_jit_needs_zext(void);
>  bool bpf_jit_inlines_helper_call(s32 imm);
> +bool bpf_jit_inlines_bitops(s32 imm);
>  bool bpf_jit_supports_subprog_tailcalls(void);
>  bool bpf_jit_supports_percpu_insn(void);
>  bool bpf_jit_supports_kfunc_call(void);
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index dc906dfdff94..cee90181d169 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -3113,6 +3113,12 @@ bool __weak bpf_jit_inlines_helper_call(s32 imm)
>         return false;
>  }
>
> +/* Return TRUE if the JIT backend inlines the bitops insn. */
> +bool __weak bpf_jit_inlines_bitops(s32 imm)
> +{
> +       return false;
> +}
> +
>  /* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */
>  bool __weak bpf_jit_supports_subprog_tailcalls(void)
>  {
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 7ac32798eb04..0a598c800f67 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -29,6 +29,8 @@
>  #include <linux/task_work.h>
>  #include <linux/irq_work.h>
>  #include <linux/buildid.h>
> +#include <linux/bitops.h>
> +#include <linux/bitrev.h>
>
>  #include "../../lib/kstrtox.h"
>
> @@ -4501,6 +4503,46 @@ __bpf_kfunc int bpf_timer_cancel_async(struct bpf_timer *timer)
>         }
>  }
>
> +__bpf_kfunc u64 bpf_clz64(u64 x)
> +{
> +       return x ? 64 - fls64(x) : 64;
> +}
> +
> +__bpf_kfunc u64 bpf_ctz64(u64 x)
> +{
> +       return x ? __ffs64(x) : 64;
> +}
> +
> +__bpf_kfunc u64 bpf_ffs64(u64 x)
> +{
> +       return x ? __ffs64(x) + 1 : 0;
> +}
> +
> +__bpf_kfunc u64 bpf_fls64(u64 x)
> +{
> +       return fls64(x);
> +}
> +
> +__bpf_kfunc u64 bpf_popcnt64(u64 x)
> +{
> +       return hweight64(x);
> +}
> +
> +__bpf_kfunc u64 bpf_bitrev64(u64 x)
> +{
> +       return ((u64)bitrev32(x & 0xFFFFFFFF) << 32) | bitrev32(x >> 32);
> +}
> +
> +__bpf_kfunc u64 bpf_rol64(u64 x, u64 s)
> +{
> +       return rol64(x, s);
> +}
> +
> +__bpf_kfunc u64 bpf_ror64(u64 x, u64 s)
> +{
> +       return ror64(x, s);
> +}
> +
>  __bpf_kfunc_end_defs();
>
>  static void bpf_task_work_cancel_scheduled(struct irq_work *irq_work)
> @@ -4578,6 +4620,14 @@ BTF_ID_FLAGS(func, bpf_key_put, KF_RELEASE)
>  BTF_ID_FLAGS(func, bpf_verify_pkcs7_signature, KF_SLEEPABLE)
>  #endif
>  #endif
> +BTF_ID_FLAGS(func, bpf_clz64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_ctz64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_ffs64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_fls64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_popcnt64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_bitrev64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_rol64, KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_ror64, KF_MUST_INLINE)

Mark all of them as fastcall and do push/pop in JIT when necessary.

pw-bot: cr


* Re: [RFC PATCH bpf-next 1/4] bpf: Introduce 64bit bitops kfuncs
  2026-02-11  3:05   ` Alexei Starovoitov
@ 2026-02-11  3:29     ` Leon Hwang
  0 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-11  3:29 UTC (permalink / raw)
  To: Alexei Starovoitov, Leon Hwang
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann



On 11/2/26 11:05, Alexei Starovoitov wrote:
> On Mon, Feb 9, 2026 at 7:59 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>

[...]

>> diff --git a/include/linux/btf.h b/include/linux/btf.h
>> index 48108471c5b1..8ac1dc59ca85 100644
>> --- a/include/linux/btf.h
>> +++ b/include/linux/btf.h
>> @@ -79,6 +79,7 @@
>>  #define KF_ARENA_ARG1   (1 << 14) /* kfunc takes an arena pointer as its first argument */
>>  #define KF_ARENA_ARG2   (1 << 15) /* kfunc takes an arena pointer as its second argument */
>>  #define KF_IMPLICIT_ARGS (1 << 16) /* kfunc has implicit arguments supplied by the verifier */
>> +#define KF_MUST_INLINE  (1 << 17) /* kfunc must be inlined by JIT backend */
> 
> UX is not great.
> Just keep kfuncs in C as fallback when JIT cannot inline them
> and don't remove spill/fills that llvm leaves for fastcall.
> 

Ack.

I’ll drop KF_MUST_INLINE in the next revision and keep the C kfunc
implementation as the fallback when the JIT cannot inline it.

>>
>>  /*
>>   * Tag marking a kernel function as a kfunc. This is meant to minimize the
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index 4e1cb4f91f49..ff6c0cf68dd3 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -514,6 +514,25 @@ static inline bool insn_is_cast_user(const struct bpf_insn *insn)
>>                 .off   = 0,                                     \
>>                 .imm   = 0 })
>>
>> +/* bitops */
>> +#define BPF_BITOPS     0xe0    /* opcode for alu64 */
>> +#define BPF_CLZ64      0x00    /* imm for clz64 */
>> +#define BPF_CTZ64      0x01    /* imm for ctz64 */
>> +#define BPF_FFS64      0x02    /* imm for ffs64 */
>> +#define BPF_FLS64      0x03    /* imm for fls64 */
>> +#define BPF_BITREV64   0x04    /* imm for bitrev64 */
>> +#define BPF_POPCNT64   0x05    /* imm for popcnt64 */
>> +#define BPF_ROL64      0x06    /* imm for rol64 */
>> +#define BPF_ROR64      0x07    /* imm for ror64 */
>> +
>> +#define BPF_BITOPS_INSN(IMM)                                   \
>> +       ((struct bpf_insn) {                                    \
>> +               .code  = BPF_ALU64 | BPF_BITOPS,                \
>> +               .dst_reg = 0,                                   \
>> +               .src_reg = 0,                                   \
>> +               .off   = 0,                                     \
>> +               .imm   = IMM })
>> +
> 
> why introduce pseudo instructions and this encoding?
> Just let JIT identify kfunc calls by address.
> bpf_jit_get_func_addr()
> if (addr == bpf_clz64) ...
> 

Thanks for pointing me to bpf_jit_get_func_addr().

I’ll drop the BPF_BITOPS encoding and BPF_BITOPS_INSN, and instead
let the JIT identify the bitops kfuncs by their resolved function
address via bpf_jit_get_func_addr().

That should keep things simpler and avoid introducing a new internal
opcode.
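
A rough sketch of that direction (illustrative only; the exact placement in
the arch JIT's BPF_JMP | BPF_CALL handling and the visibility of the kfunc
symbols to the JIT are assumptions for the next revision):

	u64 func_addr;
	bool fixed;
	int err;

	/* resolve the call target of this BPF_JMP | BPF_CALL insn */
	err = bpf_jit_get_func_addr(bpf_prog, insn, extra_pass,
				    &func_addr, &fixed);
	if (err)
		return err;

	if (func_addr == (u64)&bpf_clz64) {
		/* emit the native clz sequence instead of a call */
	} else if (func_addr == (u64)&bpf_rol64) {
		/* emit the native rotate, or keep the call as fallback */
	}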

>>  /* Internal classic blocks for direct assignment */
>>
>>  #define __BPF_STMT(CODE, K)                                    \

[...]

>> @@ -4578,6 +4620,14 @@ BTF_ID_FLAGS(func, bpf_key_put, KF_RELEASE)
>>  BTF_ID_FLAGS(func, bpf_verify_pkcs7_signature, KF_SLEEPABLE)
>>  #endif
>>  #endif
>> +BTF_ID_FLAGS(func, bpf_clz64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_ctz64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_ffs64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_fls64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_popcnt64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_bitrev64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_rol64, KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_ror64, KF_MUST_INLINE)
> 
> Mark all of them as fastcall and do push/pop in JIT when necessary.
> 

Good idea.

I’ll mark all bitops kfuncs as KF_FASTCALL and handle any required
save/restore in the JIT.

In particular, for bpf_rol64() and bpf_ror64() on x86_64, we do need
to use rcx (CL) for variable rotates, so pushing/popping rcx in the
JIT when needed makes sense.
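
For instance, a minimal sketch of what the x86_64 variable-rotate path could
then look like (the mov/rol opcodes are the ones from patch 2/4; saving and
restoring rcx this way is just an assumption for the next revision):

	EMIT1(0x51);			/* push rcx */
	EMIT3(0x48, 0x89, 0xF1);	/* mov rcx, rsi */
	EMIT3(0x48, 0x89, 0xF8);	/* mov rax, rdi */
	EMIT3(0x48, 0xD3, 0xC0);	/* rol rax, cl (0xC8 for ror) */
	EMIT1(0x59);			/* pop rcx */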

Thanks,
Leon


