* [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs
@ 2026-02-09 15:59 Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 1/4] " Leon Hwang
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC (permalink / raw)
To: bpf; +Cc: ast, andrii, daniel, Leon Hwang
Introduce the following 64-bit bitops kfuncs for x86_64 and arm64:
* bpf_clz64(): Count leading zeros.
* bpf_ctz64(): Count trailing zeros.
* bpf_ffs64(): Find first set bit, 1-based index, returns 0 when input is 0.
* bpf_fls64(): Find last set bit, 1-based index.
* bpf_bitrev64(): Reverse bits.
* bpf_popcnt64(): Population count.
* bpf_rol64(): Rotate left.
* bpf_ror64(): Rotate right.
In particular, the zero-input behavior is:
* bpf_clz64(0) = 64
* bpf_ctz64(0) = 64
* bpf_ffs64(0) = 0
* bpf_fls64(0) = 0
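For reference, these conventions match the following C sketch using
compiler builtins (illustrative only; the in-kernel implementations use
fls64()/__ffs64()):

```c
#include <stdint.h>

/* Reference model of the semantics above, using GCC/Clang builtins.
 * This is an illustrative sketch, not the kernel implementation. */
static uint64_t ref_clz64(uint64_t x)
{
	return x ? (uint64_t)__builtin_clzll(x) : 64;
}

static uint64_t ref_ctz64(uint64_t x)
{
	return x ? (uint64_t)__builtin_ctzll(x) : 64;
}

static uint64_t ref_ffs64(uint64_t x)
{
	return x ? (uint64_t)__builtin_ctzll(x) + 1 : 0;
}

static uint64_t ref_fls64(uint64_t x)
{
	return x ? 64 - (uint64_t)__builtin_clzll(x) : 0;
}
```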
bpf_ffs64() was previously discussed in "bpf: Add generic kfunc bpf_ffs64()" [1].
Background
In the earlier bpf_ffs64() discussion, the main concern with exposing such
operations as generic kfuncs was ABI cost. A normal kfunc call follows the
BPF calling convention, which forces the compiler/JIT to treat R1-R5 as
call-clobbered, resulting in unnecessary spill/fill compared to a dedicated
instruction.
This RFC keeps the user-facing API as kfuncs, but avoids the ABI cost in the
fast path. The verifier rewrites supported bitops kfunc calls into a single
internal ALU64 encoding (BPF_BITOPS with an immediate selector), and JIT
backends emit native instructions directly. As a result, these kfuncs behave
like ISA operations once loaded, rather than real helper calls.
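Conceptually, the rewrite collapses the kfunc call into one ALU64
instruction. The sketch below mirrors the encoding proposed later in
this series (BPF_BITOPS = 0xe0, BPF_POPCNT64 = 0x05); a minimal local
struct stands in for the kernel's struct bpf_insn:

```c
#include <stdint.h>

/* Minimal local mirror of struct bpf_insn, only to illustrate the
 * shape of the rewritten instruction. BPF_BITOPS and the selector
 * values are the ones proposed by this series; BPF_ALU64 is 0x07. */
struct insn {
	uint8_t code;
	uint8_t dst_reg:4, src_reg:4;
	int16_t off;
	int32_t imm;
};

#define BPF_ALU64    0x07
#define BPF_BITOPS   0xe0 /* proposed opcode */
#define BPF_POPCNT64 0x05 /* proposed imm selector */

/* After fixup_kfunc_call(), `r0 = bpf_popcnt64(r1)` becomes a single
 * instruction: input in R1, result in R0, operation selected by imm. */
static struct insn popcnt_insn = {
	.code = BPF_ALU64 | BPF_BITOPS,
	.imm  = BPF_POPCNT64,
};
```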
To make this contract explicit, the kfuncs are marked with a new
KF_MUST_INLINE flag: program load fails with -EOPNOTSUPP if the active JIT
backend cannot inline a particular operation. This keeps the cost predictable
and avoids silent slow fallbacks. A weak hook, bpf_jit_inlines_bitops(),
allows each JIT backend to advertise support on a per-operation basis
(and potentially based on CPU features).
Most operations are also tagged KF_FASTCALL to avoid clobbering unused
argument registers. bpf_rol64() and bpf_ror64() are the exception on x86_64,
where variable rotates require CL (BPF_REG_4).
Selftests output
On x86_64:
#18/1 bitops/clz64:OK
#18/2 bitops/ctz64:OK
#18/3 bitops/ffs64:OK
#18/4 bitops/fls64:OK
#18/5 bitops/bitrev64:SKIP
#18/6 bitops/popcnt64:OK
#18/7 bitops/rol64:OK
#18/8 bitops/ror64:OK
#18 bitops:OK (SKIP: 1/8)
Summary: 1/7 PASSED, 1 SKIPPED, 0 FAILED
On arm64:
#18/1 bitops/clz64:OK
#18/2 bitops/ctz64:OK
#18/3 bitops/ffs64:OK
#18/4 bitops/fls64:OK
#18/5 bitops/bitrev64:OK
#18/6 bitops/popcnt64:SKIP
#18/7 bitops/rol64:OK
#18/8 bitops/ror64:OK
#18 bitops:OK (SKIP: 1/8)
Summary: 1/7 PASSED, 1 SKIPPED, 0 FAILED
Open questions
1. Should these operations be exposed as a proper BPF ISA extension (new
ALU64 ops) instead of a kfunc API plus verifier rewrite? This RFC takes
the kfunc route to iterate without immediately committing to new uapi
instruction semantics, while still ensuring instruction-like codegen.
2. For operations without a reasonable native implementation on some
targets (e.g. bitrev64 on x86_64; popcnt64 on arm64 without touching
SIMD registers), should we allow a true generic fallback by dropping
KF_MUST_INLINE for those ops, or keep the "no-inline == reject" behavior
for predictability?
Links:
[1] https://lore.kernel.org/bpf/20240131155607.51157-1-hffilwlqm@gmail.com/
Leon Hwang (4):
bpf: Introduce 64bit bitops kfuncs
bpf, x86: Add 64bit bitops kfuncs support for x86_64
bpf, arm64: Add 64bit bitops kfuncs support
selftests/bpf: Add tests for 64bit bitops kfuncs
arch/arm64/net/bpf_jit_comp.c | 143 ++++++++++++++
arch/x86/net/bpf_jit_comp.c | 153 ++++++++++++++
include/linux/btf.h | 1 +
include/linux/filter.h | 20 ++
kernel/bpf/core.c | 6 +
kernel/bpf/helpers.c | 50 +++++
kernel/bpf/verifier.c | 65 ++++++
.../testing/selftests/bpf/bpf_experimental.h | 9 +
.../testing/selftests/bpf/prog_tests/bitops.c | 186 ++++++++++++++++++
tools/testing/selftests/bpf/progs/bitops.c | 69 +++++++
10 files changed, 702 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/bitops.c
create mode 100644 tools/testing/selftests/bpf/progs/bitops.c
--
2.52.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* [RFC PATCH bpf-next 1/4] bpf: Introduce 64bit bitops kfuncs
2026-02-09 15:59 [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs Leon Hwang
@ 2026-02-09 15:59 ` Leon Hwang
2026-02-11 3:05 ` Alexei Starovoitov
2026-02-09 15:59 ` [RFC PATCH bpf-next 2/4] bpf, x86: Add 64bit bitops kfuncs support for x86_64 Leon Hwang
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC (permalink / raw)
To: bpf; +Cc: ast, andrii, daniel, Leon Hwang
Introduce the following 64bit bitops kfuncs:
* bpf_clz64(): Count leading zeros.
* bpf_ctz64(): Count trailing zeros.
* bpf_ffs64(): Find first set bit, 1-based index, returns 0 when input
is 0.
* bpf_fls64(): Find last set bit, 1-based index.
* bpf_bitrev64(): Reverse bits.
* bpf_popcnt64(): Population count.
* bpf_rol64(): Rotate left.
* bpf_ror64(): Rotate right.
In particular, the zero-input behavior is:
* bpf_clz64(0) = 64
* bpf_ctz64(0) = 64
* bpf_ffs64(0) = 0
* bpf_fls64(0) = 0
These kfuncs are marked with a new KF_MUST_INLINE flag, which indicates
the kfunc must be inlined by the JIT backend. A weak function
bpf_jit_inlines_bitops() is introduced for JIT backends to advertise
support for individual bitops.
bpf_rol64() and bpf_ror64() are not marked KF_FASTCALL because the
variable rotate on x86_64 clobbers BPF_REG_4 (the 'cl' register). The
other kfuncs are marked KF_FASTCALL to avoid clobbering unused
registers.
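The rotate semantics follow the usual modulo-64 definition, as in this
sketch (shift counts masked to 0..63, matching the kernel's
rol64()/ror64() for in-range counts):

```c
#include <stdint.h>

/* Sketch of the rotate semantics: shift counts are reduced mod 64,
 * and the (64 - s) & 63 form avoids an undefined 64-bit shift at s == 0. */
static uint64_t ref_rol64(uint64_t x, uint64_t s)
{
	s &= 63;
	return (x << s) | (x >> ((64 - s) & 63));
}

static uint64_t ref_ror64(uint64_t x, uint64_t s)
{
	s &= 63;
	return (x >> s) | (x << ((64 - s) & 63));
}
```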
An internal BPF_ALU64 opcode BPF_BITOPS is introduced as the encoding
for these operations, with the immediate field selecting the specific
operation (BPF_CLZ64, BPF_CTZ64, etc.).
The verifier rejects the kfunc in check_kfunc_call() if the JIT backend
does not support it, and rewrites the call to a BPF_BITOPS instruction
in fixup_kfunc_call().
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
include/linux/btf.h | 1 +
include/linux/filter.h | 20 +++++++++++++
kernel/bpf/core.c | 6 ++++
kernel/bpf/helpers.c | 50 ++++++++++++++++++++++++++++++++
kernel/bpf/verifier.c | 65 ++++++++++++++++++++++++++++++++++++++++++
5 files changed, 142 insertions(+)
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 48108471c5b1..8ac1dc59ca85 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -79,6 +79,7 @@
#define KF_ARENA_ARG1 (1 << 14) /* kfunc takes an arena pointer as its first argument */
#define KF_ARENA_ARG2 (1 << 15) /* kfunc takes an arena pointer as its second argument */
#define KF_IMPLICIT_ARGS (1 << 16) /* kfunc has implicit arguments supplied by the verifier */
+#define KF_MUST_INLINE (1 << 17) /* kfunc must be inlined by JIT backend */
/*
* Tag marking a kernel function as a kfunc. This is meant to minimize the
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 4e1cb4f91f49..ff6c0cf68dd3 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -514,6 +514,25 @@ static inline bool insn_is_cast_user(const struct bpf_insn *insn)
.off = 0, \
.imm = 0 })
+/* bitops */
+#define BPF_BITOPS 0xe0 /* opcode for alu64 */
+#define BPF_CLZ64 0x00 /* imm for clz64 */
+#define BPF_CTZ64 0x01 /* imm for ctz64 */
+#define BPF_FFS64 0x02 /* imm for ffs64 */
+#define BPF_FLS64 0x03 /* imm for fls64 */
+#define BPF_BITREV64 0x04 /* imm for bitrev64 */
+#define BPF_POPCNT64 0x05 /* imm for popcnt64 */
+#define BPF_ROL64 0x06 /* imm for rol64 */
+#define BPF_ROR64 0x07 /* imm for ror64 */
+
+#define BPF_BITOPS_INSN(IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_BITOPS, \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
/* Internal classic blocks for direct assignment */
#define __BPF_STMT(CODE, K) \
@@ -1157,6 +1176,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
void bpf_jit_compile(struct bpf_prog *prog);
bool bpf_jit_needs_zext(void);
bool bpf_jit_inlines_helper_call(s32 imm);
+bool bpf_jit_inlines_bitops(s32 imm);
bool bpf_jit_supports_subprog_tailcalls(void);
bool bpf_jit_supports_percpu_insn(void);
bool bpf_jit_supports_kfunc_call(void);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index dc906dfdff94..cee90181d169 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3113,6 +3113,12 @@ bool __weak bpf_jit_inlines_helper_call(s32 imm)
return false;
}
+/* Return TRUE if the JIT backend inlines the bitops insn. */
+bool __weak bpf_jit_inlines_bitops(s32 imm)
+{
+ return false;
+}
+
/* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */
bool __weak bpf_jit_supports_subprog_tailcalls(void)
{
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 7ac32798eb04..0a598c800f67 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -29,6 +29,8 @@
#include <linux/task_work.h>
#include <linux/irq_work.h>
#include <linux/buildid.h>
+#include <linux/bitops.h>
+#include <linux/bitrev.h>
#include "../../lib/kstrtox.h"
@@ -4501,6 +4503,46 @@ __bpf_kfunc int bpf_timer_cancel_async(struct bpf_timer *timer)
}
}
+__bpf_kfunc u64 bpf_clz64(u64 x)
+{
+ return x ? 64 - fls64(x) : 64;
+}
+
+__bpf_kfunc u64 bpf_ctz64(u64 x)
+{
+ return x ? __ffs64(x) : 64;
+}
+
+__bpf_kfunc u64 bpf_ffs64(u64 x)
+{
+ return x ? __ffs64(x) + 1 : 0;
+}
+
+__bpf_kfunc u64 bpf_fls64(u64 x)
+{
+ return fls64(x);
+}
+
+__bpf_kfunc u64 bpf_popcnt64(u64 x)
+{
+ return hweight64(x);
+}
+
+__bpf_kfunc u64 bpf_bitrev64(u64 x)
+{
+ return ((u64)bitrev32(x & 0xFFFFFFFF) << 32) | bitrev32(x >> 32);
+}
+
+__bpf_kfunc u64 bpf_rol64(u64 x, u64 s)
+{
+ return rol64(x, s);
+}
+
+__bpf_kfunc u64 bpf_ror64(u64 x, u64 s)
+{
+ return ror64(x, s);
+}
+
__bpf_kfunc_end_defs();
static void bpf_task_work_cancel_scheduled(struct irq_work *irq_work)
@@ -4578,6 +4620,14 @@ BTF_ID_FLAGS(func, bpf_key_put, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_verify_pkcs7_signature, KF_SLEEPABLE)
#endif
#endif
+BTF_ID_FLAGS(func, bpf_clz64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_ctz64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_ffs64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_fls64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_popcnt64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_bitrev64, KF_FASTCALL | KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_rol64, KF_MUST_INLINE)
+BTF_ID_FLAGS(func, bpf_ror64, KF_MUST_INLINE)
BTF_KFUNCS_END(generic_btf_ids)
static const struct btf_kfunc_id_set generic_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index edf5342b982f..ed9a077ecf2e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12477,6 +12477,14 @@ enum special_kfunc_type {
KF_bpf_session_is_return,
KF_bpf_stream_vprintk,
KF_bpf_stream_print_stack,
+ KF_bpf_clz64,
+ KF_bpf_ctz64,
+ KF_bpf_ffs64,
+ KF_bpf_fls64,
+ KF_bpf_bitrev64,
+ KF_bpf_popcnt64,
+ KF_bpf_rol64,
+ KF_bpf_ror64,
};
BTF_ID_LIST(special_kfunc_list)
@@ -12557,6 +12565,14 @@ BTF_ID(func, bpf_arena_reserve_pages)
BTF_ID(func, bpf_session_is_return)
BTF_ID(func, bpf_stream_vprintk)
BTF_ID(func, bpf_stream_print_stack)
+BTF_ID(func, bpf_clz64)
+BTF_ID(func, bpf_ctz64)
+BTF_ID(func, bpf_ffs64)
+BTF_ID(func, bpf_fls64)
+BTF_ID(func, bpf_bitrev64)
+BTF_ID(func, bpf_popcnt64)
+BTF_ID(func, bpf_rol64)
+BTF_ID(func, bpf_ror64)
static bool is_task_work_add_kfunc(u32 func_id)
{
@@ -12564,6 +12580,30 @@ static bool is_task_work_add_kfunc(u32 func_id)
func_id == special_kfunc_list[KF_bpf_task_work_schedule_resume];
}
+static bool get_bitops_insn_imm(u32 func_id, s32 *imm)
+{
+ if (func_id == special_kfunc_list[KF_bpf_clz64])
+ *imm = BPF_CLZ64;
+ else if (func_id == special_kfunc_list[KF_bpf_ctz64])
+ *imm = BPF_CTZ64;
+ else if (func_id == special_kfunc_list[KF_bpf_ffs64])
+ *imm = BPF_FFS64;
+ else if (func_id == special_kfunc_list[KF_bpf_fls64])
+ *imm = BPF_FLS64;
+ else if (func_id == special_kfunc_list[KF_bpf_bitrev64])
+ *imm = BPF_BITREV64;
+ else if (func_id == special_kfunc_list[KF_bpf_popcnt64])
+ *imm = BPF_POPCNT64;
+ else if (func_id == special_kfunc_list[KF_bpf_rol64])
+ *imm = BPF_ROL64;
+ else if (func_id == special_kfunc_list[KF_bpf_ror64])
+ *imm = BPF_ROR64;
+ else
+ return false;
+
+ return true;
+}
+
static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
{
if (meta->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
@@ -14044,6 +14084,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
int err, insn_idx = *insn_idx_p;
const struct btf_param *args;
struct btf *desc_btf;
+ bool is_bitops_kfunc;
+ s32 insn_imm;
/* skip for now, but return error when we find this in fixup_kfunc_call */
if (!insn->imm)
@@ -14423,6 +14465,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
if (meta.func_id == special_kfunc_list[KF_bpf_session_cookie])
env->prog->call_session_cookie = true;
+ is_bitops_kfunc = get_bitops_insn_imm(meta.func_id, &insn_imm);
+ if ((meta.kfunc_flags & KF_MUST_INLINE)) {
+ bool inlined = is_bitops_kfunc && bpf_jit_inlines_bitops(insn_imm);
+
+ if (!inlined) {
+ verbose(env, "JIT does not support inlining the kfunc %s.\n", func_name);
+ return -EOPNOTSUPP;
+ }
+ }
+
return 0;
}
@@ -23236,6 +23288,19 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
insn_buf[4] = BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1);
insn_buf[5] = BPF_ALU64_IMM(BPF_NEG, BPF_REG_0, 0);
*cnt = 6;
+ } else if (get_bitops_insn_imm(desc->func_id, &insn_buf[0].imm)) {
+ s32 imm = insn_buf[0].imm;
+
+ if (imm == BPF_FFS64) {
+ insn_buf[0] = BPF_MOV64_IMM(BPF_REG_0, 0);
+ insn_buf[1] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 2);
+ insn_buf[2] = BPF_BITOPS_INSN(imm);
+ insn_buf[3] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 1);
+ *cnt = 4;
+ } else {
+ insn_buf[0] = BPF_BITOPS_INSN(imm);
+ *cnt = 1;
+ }
}
if (env->insn_aux_data[insn_idx].arg_prog) {
--
2.52.0
* [RFC PATCH bpf-next 2/4] bpf, x86: Add 64bit bitops kfuncs support for x86_64
2026-02-09 15:59 [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 1/4] " Leon Hwang
@ 2026-02-09 15:59 ` Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 3/4] bpf, arm64: Add 64bit bitops kfuncs support Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 4/4] selftests/bpf: Add tests for 64bit bitops kfuncs Leon Hwang
3 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC (permalink / raw)
To: bpf; +Cc: ast, andrii, daniel, Leon Hwang
Implement JIT inlining of the 64bit bitops kfuncs on x86_64.
bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
bpf_clz64(), bpf_ctz64(), bpf_ffs64(), and bpf_fls64() are supported
when the CPU has X86_FEATURE_ABM (LZCNT/TZCNT).
bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
bpf_bitrev64() is not supported as x86_64 has no native bit-reverse
instruction.
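For comparison, a software bit reversal on x86_64 would need a
multi-step SWAR sequence along these lines (a sketch of what a generic
fallback would compute; it is not part of this series):

```c
#include <stdint.h>

/* Portable sketch of 64-bit bit reversal: swap adjacent bits, then
 * bit pairs, then nibbles (reversing bits within each byte), and
 * finally reverse the byte order. */
static uint64_t sw_bitrev64(uint64_t x)
{
	x = ((x & 0x5555555555555555ULL) << 1) | ((x >> 1) & 0x5555555555555555ULL);
	x = ((x & 0x3333333333333333ULL) << 2) | ((x >> 2) & 0x3333333333333333ULL);
	x = ((x & 0x0f0f0f0f0f0f0f0fULL) << 4) | ((x >> 4) & 0x0f0f0f0f0f0f0f0fULL);
	return __builtin_bswap64(x);
}
```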
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
arch/x86/net/bpf_jit_comp.c | 153 ++++++++++++++++++++++++++++++++++++
1 file changed, 153 insertions(+)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 070ba80e39d7..5d6215071cbd 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -19,6 +19,7 @@
#include <asm/text-patching.h>
#include <asm/unwind.h>
#include <asm/cfi.h>
+#include <asm/cpufeatures.h>
static bool all_callee_regs_used[4] = {true, true, true, true};
@@ -1604,6 +1605,134 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
*pprog = prog;
}
+static int emit_bitops(u8 **pprog, u32 bitops)
+{
+ u8 *prog = *pprog;
+
+ /*
+ * x86 Bit manipulation instruction set
+ * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
+ */
+
+ switch (bitops) {
+ case BPF_CLZ64:
+ /*
+ * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+ *
+ * LZCNT - Count the Number of Leading Zero Bits
+ *
+ * Opcode/Instruction
+ * F3 REX.W 0F BD /r
+ * LZCNT r64, r/m64
+ *
+ * Op/En
+ * RVM
+ *
+ * 64/32-bit Mode
+ * V/N.E.
+ *
+ * CPUID Feature Flag
+ * LZCNT
+ *
+ * Description
+ * Count the number of leading zero bits in r/m64, return
+ * result in r64.
+ */
+ /* emit: x ? 64 - fls64(x) : 64 */
+ /* lzcnt rax, rdi */
+ EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
+ break;
+
+ case BPF_CTZ64:
+ /*
+ * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+ *
+ * TZCNT - Count the Number of Trailing Zero Bits
+ *
+ * Opcode/Instruction
+ * F3 REX.W 0F BC /r
+ * TZCNT r64, r/m64
+ *
+ * Op/En
+ * RVM
+ *
+ * 64/32-bit Mode
+ * V/N.E.
+ *
+ * CPUID Feature Flag
+ * BMI1
+ *
+ * Description
+ * Count the number of trailing zero bits in r/m64, return
+ * result in r64.
+ */
+ /* emit: x ? __ffs64(x) : 64 */
+ /* tzcnt rax, rdi */
+ EMIT5(0xF3, 0x48, 0x0F, 0xBC, 0xC7);
+ break;
+
+ case BPF_FFS64:
+	/* emit: __ffs64(x); the x == 0 case is handled by the verifier */
+ /* tzcnt rax, rdi */
+ EMIT5(0xF3, 0x48, 0x0F, 0xBC, 0xC7);
+ break;
+
+ case BPF_FLS64:
+ /* emit: fls64(x) */
+ /* lzcnt rax, rdi; neg rax; add rax, 64 */
+ EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
+ EMIT3(0x48, 0xF7, 0xD8); /* neg rax */
+ EMIT4(0x48, 0x83, 0xC0, 0x40); /* add rax, 64 */
+ break;
+
+ case BPF_POPCNT64:
+ /*
+ * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+ *
+ * POPCNT - Return the Count of Number of Bits Set to 1
+ *
+ * Opcode/Instruction
+ * F3 REX.W 0F B8 /r
+ * POPCNT r64, r/m64
+ *
+ * Op/En
+ * RM
+ *
+ * 64 Mode
+ * Valid
+ *
+ * Compat/Leg Mode
+ * N.E.
+ *
+ * Description
+ * POPCNT on r/m64
+ */
+ /* popcnt rax, rdi */
+ EMIT5(0xF3, 0x48, 0x0F, 0xB8, 0xC7);
+ break;
+
+ case BPF_ROL64:
+ /* emit: rol64(x, s) */
+ EMIT3(0x48, 0x89, 0xF1); /* mov rcx, rsi */
+ EMIT3(0x48, 0x89, 0xF8); /* mov rax, rdi */
+ EMIT3(0x48, 0xD3, 0xC0); /* rol rax, cl */
+ break;
+
+ case BPF_ROR64:
+ /* emit: ror64(x, s) */
+ EMIT3(0x48, 0x89, 0xF1); /* mov rcx, rsi */
+ EMIT3(0x48, 0x89, 0xF8); /* mov rax, rdi */
+ EMIT3(0x48, 0xD3, 0xC8); /* ror rax, cl */
+ break;
+
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ *pprog = prog;
+ return 0;
+}
+
#define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
#define __LOAD_TCC_PTR(off) \
@@ -2113,6 +2242,12 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
}
break;
+ case BPF_ALU64 | BPF_BITOPS:
+ err = emit_bitops(&prog, insn->imm);
+ if (err)
+ return err;
+ break;
+
/* speculation barrier */
case BPF_ST | BPF_NOSPEC:
EMIT_LFENCE();
@@ -4117,3 +4252,21 @@ bool bpf_jit_supports_fsession(void)
{
return true;
}
+
+bool bpf_jit_inlines_bitops(s32 imm)
+{
+ switch (imm) {
+ case BPF_CLZ64:
+ case BPF_CTZ64:
+ case BPF_FFS64:
+ case BPF_FLS64:
+ return boot_cpu_has(X86_FEATURE_ABM);
+ case BPF_POPCNT64:
+ return boot_cpu_has(X86_FEATURE_POPCNT);
+ case BPF_ROL64:
+ case BPF_ROR64:
+ return true;
+ default:
+ return false;
+ }
+}
--
2.52.0
* [RFC PATCH bpf-next 3/4] bpf, arm64: Add 64bit bitops kfuncs support
2026-02-09 15:59 [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 1/4] " Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 2/4] bpf, x86: Add 64bit bitops kfuncs support for x86_64 Leon Hwang
@ 2026-02-09 15:59 ` Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 4/4] selftests/bpf: Add tests for 64bit bitops kfuncs Leon Hwang
3 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC (permalink / raw)
To: bpf; +Cc: ast, andrii, daniel, Leon Hwang
Implement JIT inlining of the 64bit bitops kfuncs on arm64.
bpf_clz64(), bpf_ffs64(), bpf_fls64(), and bpf_bitrev64() are always
supported using mandatory ARMv8 CLZ/RBIT instructions. bpf_ctz64() is
implemented via RBIT + CLZ, or via the native CTZ instruction when
FEAT_CSSC is available. bpf_rol64() and bpf_ror64() are always
supported via RORV.
bpf_popcnt64() is not supported as the native population count
instruction requires NEON/SIMD registers, which should not be touched
from BPF programs.
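The RBIT + CLZ fallback relies on the identity ctz(x) == clz(rbit(x)).
A small C model (with a loop standing in for the single RBIT
instruction) is:

```c
#include <stdint.h>

/* Model of the RBIT + CLZ fallback: ctz(x) == clz(bit-reverse(x)).
 * The loop models the RBIT instruction; on arm64 hardware, CLZ(0)
 * yields 64 directly, which the ternary models here. */
static uint64_t ctz_via_rbit(uint64_t x)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 64; i++)
		r |= ((x >> i) & 1) << (63 - i);

	return r ? (uint64_t)__builtin_clzll(r) : 64;
}
```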
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
arch/arm64/net/bpf_jit_comp.c | 143 ++++++++++++++++++++++++++++++++++
1 file changed, 143 insertions(+)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 2dc5037694ba..b91896cef247 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1199,6 +1199,123 @@ static int add_exception_handler(const struct bpf_insn *insn,
return 0;
}
+static inline u32 a64_clz64(u8 rd, u8 rn)
+{
+ /*
+ * Arm Architecture Reference Manual for A-profile architecture
+ * (Document number: ARM DDI 0487)
+ *
+ * A64 Base Instruction Descriptions
+ * C6.2 Alphabetical list of A64 base instructions
+ *
+ * C6.2.91 CLZ
+ *
+ * Count leading zeros
+ *
+ * This instruction counts the number of consecutive binary zero bits,
+ * starting from the most significant bit in the source register,
+ * and places the count in the destination register.
+ */
+ /* CLZ Xd, Xn */
+ return 0xdac01000 | (rn << 5) | rd;
+}
+
+static inline u32 a64_ctz64(u8 rd, u8 rn)
+{
+ /*
+ * Arm Architecture Reference Manual for A-profile architecture
+ * (Document number: ARM DDI 0487)
+ *
+ * A64 Base Instruction Descriptions
+ * C6.2 Alphabetical list of A64 base instructions
+ *
+ * C6.2.144 CTZ
+ *
+ * Count trailing zeros
+ *
+ * This instruction counts the number of consecutive binary zero bits,
+ * starting from the least significant bit in the source register,
+ * and places the count in the destination register.
+ *
+ * This instruction requires FEAT_CSSC.
+ */
+ /* CTZ Xd, Xn */
+ return 0xdac01800 | (rn << 5) | rd;
+}
+
+static inline u32 a64_rbit64(u8 rd, u8 rn)
+{
+ /*
+ * Arm Architecture Reference Manual for A-profile architecture
+ * (Document number: ARM DDI 0487)
+ *
+ * A64 Base Instruction Descriptions
+ * C6.2 Alphabetical list of A64 base instructions
+ *
+ * C6.2.320 RBIT
+ *
+ * Reverse bits
+ *
+ * This instruction reverses the bit order in a register.
+ */
+ /* RBIT Xd, Xn */
+ return 0xdac00000 | (rn << 5) | rd;
+}
+
+static inline bool supports_cssc(void)
+{
+ /*
+ * Documentation/arch/arm64/cpu-feature-registers.rst
+ *
+ * ID_AA64ISAR2_EL1 - Instruction set attribute register 2
+ *
+ * CSSC
+ */
+ return cpuid_feature_extract_unsigned_field(read_sanitised_ftr_reg(SYS_ID_AA64ISAR2_EL1),
+ ID_AA64ISAR2_EL1_CSSC_SHIFT);
+}
+
+static int emit_bitops(struct jit_ctx *ctx, s32 imm)
+{
+ const u8 r0 = bpf2a64[BPF_REG_0];
+ const u8 r1 = bpf2a64[BPF_REG_1];
+ const u8 r2 = bpf2a64[BPF_REG_2];
+ const u8 tmp = bpf2a64[TMP_REG_1];
+
+ switch (imm) {
+ case BPF_CLZ64:
+ emit(a64_clz64(r0, r1), ctx);
+ break;
+ case BPF_CTZ64:
+ case BPF_FFS64:
+ if (supports_cssc()) {
+ emit(a64_ctz64(r0, r1), ctx);
+ } else {
+ emit(a64_rbit64(tmp, r1), ctx);
+ emit(a64_clz64(r0, tmp), ctx);
+ }
+ break;
+ case BPF_FLS64:
+ emit(a64_clz64(tmp, r1), ctx);
+ emit(A64_NEG(1, tmp, tmp), ctx);
+ emit(A64_ADD_I(1, r0, tmp, 64), ctx);
+ break;
+ case BPF_BITREV64:
+ emit(a64_rbit64(r0, r1), ctx);
+ break;
+ case BPF_ROL64:
+ emit(A64_NEG(1, tmp, r2), ctx);
+ emit(A64_DATA2(1, r0, r1, tmp, RORV), ctx);
+ break;
+ case BPF_ROR64:
+ emit(A64_DATA2(1, r0, r1, r2, RORV), ctx);
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+ return 0;
+}
+
/* JITs an eBPF instruction.
* Returns:
* 0 - successfully JITed an 8-byte eBPF instruction.
@@ -1451,6 +1568,11 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
case BPF_ALU64 | BPF_ARSH | BPF_K:
emit(A64_ASR(is64, dst, dst, imm), ctx);
break;
+ case BPF_ALU64 | BPF_BITOPS:
+ ret = emit_bitops(ctx, imm);
+ if (ret)
+ return ret;
+ break;
/* JUMP reg */
case BPF_JMP | BPF_JA | BPF_X:
@@ -3207,3 +3329,24 @@ void bpf_jit_free(struct bpf_prog *prog)
bpf_prog_unlock_free(prog);
}
+
+bool bpf_jit_inlines_bitops(s32 imm)
+{
+ switch (imm) {
+ case BPF_CLZ64:
+ case BPF_CTZ64:
+ case BPF_FFS64:
+ case BPF_FLS64:
+ case BPF_BITREV64:
+ /* They use RBIT/CLZ/CTZ which are mandatory in ARM64 */
+ return true;
+ case BPF_POPCNT64:
+		/* Must not touch NEON/SIMD registers to implement popcnt64 */
+ return false;
+ case BPF_ROL64:
+ case BPF_ROR64:
+ return true;
+ default:
+ return false;
+ }
+}
--
2.52.0
* [RFC PATCH bpf-next 4/4] selftests/bpf: Add tests for 64bit bitops kfuncs
2026-02-09 15:59 [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs Leon Hwang
` (2 preceding siblings ...)
2026-02-09 15:59 ` [RFC PATCH bpf-next 3/4] bpf, arm64: Add 64bit bitops kfuncs support Leon Hwang
@ 2026-02-09 15:59 ` Leon Hwang
3 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-09 15:59 UTC (permalink / raw)
To: bpf; +Cc: ast, andrii, daniel, Leon Hwang
Add selftests for bpf_clz64(), bpf_ctz64(), bpf_ffs64(), bpf_fls64(),
bpf_bitrev64(), bpf_popcnt64(), bpf_rol64(), and bpf_ror64().
Each subtest compares the kfunc result against a userspace reference
implementation across a set of test vectors. If the JIT does not support
inlining a given kfunc, the subtest is skipped (-EOPNOTSUPP at load
time).
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
.../testing/selftests/bpf/bpf_experimental.h | 9 +
.../testing/selftests/bpf/prog_tests/bitops.c | 186 ++++++++++++++++++
tools/testing/selftests/bpf/progs/bitops.c | 69 +++++++
3 files changed, 264 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/bitops.c
create mode 100644 tools/testing/selftests/bpf/progs/bitops.c
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 4b7210c318dd..3a7d126968b3 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -594,6 +594,15 @@ extern void bpf_iter_dmabuf_destroy(struct bpf_iter_dmabuf *it) __weak __ksym;
extern int bpf_cgroup_read_xattr(struct cgroup *cgroup, const char *name__str,
struct bpf_dynptr *value_p) __weak __ksym;
+extern __u64 bpf_clz64(__u64 x) __weak __ksym;
+extern __u64 bpf_ctz64(__u64 x) __weak __ksym;
+extern __u64 bpf_ffs64(__u64 x) __weak __ksym;
+extern __u64 bpf_fls64(__u64 x) __weak __ksym;
+extern __u64 bpf_bitrev64(__u64 x) __weak __ksym;
+extern __u64 bpf_popcnt64(__u64 x) __weak __ksym;
+extern __u64 bpf_rol64(__u64 x, __u64 s) __weak __ksym;
+extern __u64 bpf_ror64(__u64 x, __u64 s) __weak __ksym;
+
#define PREEMPT_BITS 8
#define SOFTIRQ_BITS 8
#define HARDIRQ_BITS 4
diff --git a/tools/testing/selftests/bpf/prog_tests/bitops.c b/tools/testing/selftests/bpf/prog_tests/bitops.c
new file mode 100644
index 000000000000..59bf1c5b5102
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bitops.c
@@ -0,0 +1,186 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include "bitops.skel.h"
+
+struct bitops_case {
+ __u64 x;
+ __u64 s;
+ __u64 exp;
+};
+
+static struct bitops_case cases[] = {
+ { 0x0ULL, 0, 0 },
+ { 0x1ULL, 1, 0 },
+ { 0x8000000000000000ULL, 63, 0 },
+ { 0xffffffffffffffffULL, 64, 0 },
+ { 0x0123456789abcdefULL, 65, 0 },
+ { 0x0000000100000000ULL, 127, 0 },
+};
+
+static __u64 clz64(__u64 x, __u64 s)
+{
+ (void)s;
+ return x ? __builtin_clzll(x) : 64;
+}
+
+static __u64 ctz64(__u64 x, __u64 s)
+{
+ (void)s;
+ return x ? __builtin_ctzll(x) : 64;
+}
+
+static __u64 ffs64(__u64 x, __u64 s)
+{
+ (void)s;
+ return x ? (__u64)__builtin_ctzll(x) + 1 : 0;
+}
+
+static __u64 fls64(__u64 x, __u64 s)
+{
+ (void)s;
+ return x ? 64 - __builtin_clzll(x) : 0;
+}
+
+static __u64 popcnt64(__u64 x, __u64 s)
+{
+ (void)s;
+ return __builtin_popcountll(x);
+}
+
+static __u64 bitrev64(__u64 x, __u64 s)
+{
+ __u64 y = 0;
+ int i;
+
+ (void)s;
+
+ for (i = 0; i < 64; i++) {
+ y <<= 1;
+ y |= x & 1;
+ x >>= 1;
+ }
+ return y;
+}
+
+static __u64 rol64(__u64 x, __u64 s)
+{
+ s &= 63;
+ return (x << s) | (x >> ((-s) & 63));
+}
+
+static __u64 ror64(__u64 x, __u64 s)
+{
+ s &= 63;
+ return (x >> s) | (x << ((-s) & 63));
+}
+
+static void test_bitops_case(const char *prog_name)
+{
+ struct bpf_program *prog;
+ struct bitops *skel;
+ size_t i;
+ int err;
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+ skel = bitops__open();
+ if (!ASSERT_OK_PTR(skel, "bitops__open"))
+ return;
+
+ prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+ if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+ goto cleanup;
+
+ bpf_program__set_autoload(prog, true);
+
+ err = bitops__load(skel);
+ if (err == -EOPNOTSUPP) {
+ test__skip();
+ goto cleanup;
+ }
+ if (!ASSERT_OK(err, "bitops__load"))
+ goto cleanup;
+
+ for (i = 0; i < ARRAY_SIZE(cases); i++) {
+ skel->bss->in_x = cases[i].x;
+ skel->bss->in_s = cases[i].s;
+ err = bpf_prog_test_run_opts(bpf_program__fd(prog), &topts);
+ if (!ASSERT_OK(err, "bpf_prog_test_run_opts"))
+ goto cleanup;
+
+ if (!ASSERT_OK(topts.retval, "retval"))
+ goto cleanup;
+
+ ASSERT_EQ(skel->bss->out, cases[i].exp, "out");
+ }
+
+cleanup:
+ bitops__destroy(skel);
+}
+
+#define RUN_BITOPS_CASE(_bitops, _prog) \
+ do { \
+ for (size_t i = 0; i < ARRAY_SIZE(cases); i++) \
+ cases[i].exp = _bitops(cases[i].x, cases[i].s); \
+ test_bitops_case(_prog); \
+ } while (0)
+
+static void test_clz64(void)
+{
+ RUN_BITOPS_CASE(clz64, "bitops_clz64");
+}
+
+static void test_ctz64(void)
+{
+ RUN_BITOPS_CASE(ctz64, "bitops_ctz64");
+}
+
+static void test_ffs64(void)
+{
+ RUN_BITOPS_CASE(ffs64, "bitops_ffs64");
+}
+
+static void test_fls64(void)
+{
+ RUN_BITOPS_CASE(fls64, "bitops_fls64");
+}
+
+static void test_bitrev64(void)
+{
+ RUN_BITOPS_CASE(bitrev64, "bitops_bitrev");
+}
+
+static void test_popcnt64(void)
+{
+ RUN_BITOPS_CASE(popcnt64, "bitops_popcnt");
+}
+
+static void test_rol64(void)
+{
+ RUN_BITOPS_CASE(rol64, "bitops_rol64");
+}
+
+static void test_ror64(void)
+{
+ RUN_BITOPS_CASE(ror64, "bitops_ror64");
+}
+
+void test_bitops(void)
+{
+ if (test__start_subtest("clz64"))
+ test_clz64();
+ if (test__start_subtest("ctz64"))
+ test_ctz64();
+ if (test__start_subtest("ffs64"))
+ test_ffs64();
+ if (test__start_subtest("fls64"))
+ test_fls64();
+ if (test__start_subtest("bitrev64"))
+ test_bitrev64();
+ if (test__start_subtest("popcnt64"))
+ test_popcnt64();
+ if (test__start_subtest("rol64"))
+ test_rol64();
+ if (test__start_subtest("ror64"))
+ test_ror64();
+}
diff --git a/tools/testing/selftests/bpf/progs/bitops.c b/tools/testing/selftests/bpf/progs/bitops.c
new file mode 100644
index 000000000000..5d5b192bf3d9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bitops.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_experimental.h"
+
+__u64 in_x;
+__u64 in_s;
+
+__u64 out;
+
+SEC("?syscall")
+int bitops_clz64(void *ctx)
+{
+ out = bpf_clz64(in_x);
+ return 0;
+}
+
+SEC("?syscall")
+int bitops_ctz64(void *ctx)
+{
+ out = bpf_ctz64(in_x);
+ return 0;
+}
+
+SEC("?syscall")
+int bitops_ffs64(void *ctx)
+{
+ out = bpf_ffs64(in_x);
+ return 0;
+}
+
+SEC("?syscall")
+int bitops_fls64(void *ctx)
+{
+ out = bpf_fls64(in_x);
+ return 0;
+}
+
+SEC("?syscall")
+int bitops_bitrev(void *ctx)
+{
+ out = bpf_bitrev64(in_x);
+ return 0;
+}
+
+SEC("?syscall")
+int bitops_popcnt(void *ctx)
+{
+ out = bpf_popcnt64(in_x);
+ return 0;
+}
+
+SEC("?syscall")
+int bitops_rol64(void *ctx)
+{
+ out = bpf_rol64(in_x, in_s);
+ return 0;
+}
+
+SEC("?syscall")
+int bitops_ror64(void *ctx)
+{
+ out = bpf_ror64(in_x, in_s);
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.52.0
* Re: [RFC PATCH bpf-next 1/4] bpf: Introduce 64bit bitops kfuncs
2026-02-09 15:59 ` [RFC PATCH bpf-next 1/4] " Leon Hwang
@ 2026-02-11 3:05 ` Alexei Starovoitov
2026-02-11 3:29 ` Leon Hwang
0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2026-02-11 3:05 UTC (permalink / raw)
To: Leon Hwang; +Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
On Mon, Feb 9, 2026 at 7:59 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> Introduce the following 64bit bitops kfuncs:
>
> * bpf_clz64(): Count leading zeros.
> * bpf_ctz64(): Count trailing zeros.
> * bpf_ffs64(): Find first set bit, 1-based index, returns 0 when input
> is 0.
> * bpf_fls64(): Find last set bit, 1-based index.
> * bpf_bitrev64(): Reverse bits.
> * bpf_popcnt64(): Population count.
> * bpf_rol64(): Rotate left.
> * bpf_ror64(): Rotate right.
>
> In particular, for a zero input:
>
> * bpf_clz64(0) = 64
> * bpf_ctz64(0) = 64
> * bpf_ffs64(0) = 0
> * bpf_fls64(0) = 0
>
> These kfuncs are marked with a new KF_MUST_INLINE flag, which indicates
> the kfunc must be inlined by the JIT backend. A weak function
> bpf_jit_inlines_bitops() is introduced for JIT backends to advertise
> support for individual bitops.
>
> The bpf_rol64() and bpf_ror64() kfuncs do not have KF_FASTCALL because
> BPF_REG_4 (actually 'cl' on x86_64) is used for the rotate count. The
> other kfuncs have KF_FASTCALL to avoid clobbering unused registers.
>
> An internal BPF_ALU64 opcode BPF_BITOPS is introduced as the encoding
> for these operations, with the immediate field selecting the specific
> operation (BPF_CLZ64, BPF_CTZ64, etc.).
>
> The verifier rejects the kfunc in check_kfunc_call() if the JIT backend
> does not support it, and rewrites the call to a BPF_BITOPS instruction
> in fixup_kfunc_call().
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> include/linux/btf.h | 1 +
> include/linux/filter.h | 20 +++++++++++++
> kernel/bpf/core.c | 6 ++++
> kernel/bpf/helpers.c | 50 ++++++++++++++++++++++++++++++++
> kernel/bpf/verifier.c | 65 ++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 142 insertions(+)
>
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 48108471c5b1..8ac1dc59ca85 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -79,6 +79,7 @@
> #define KF_ARENA_ARG1 (1 << 14) /* kfunc takes an arena pointer as its first argument */
> #define KF_ARENA_ARG2 (1 << 15) /* kfunc takes an arena pointer as its second argument */
> #define KF_IMPLICIT_ARGS (1 << 16) /* kfunc has implicit arguments supplied by the verifier */
> +#define KF_MUST_INLINE (1 << 17) /* kfunc must be inlined by JIT backend */
UX is not great.
Just keep kfuncs in C as fallback when JIT cannot inline them
and don't remove spill/fills that llvm leaves for fastcall.
>
> /*
> * Tag marking a kernel function as a kfunc. This is meant to minimize the
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 4e1cb4f91f49..ff6c0cf68dd3 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -514,6 +514,25 @@ static inline bool insn_is_cast_user(const struct bpf_insn *insn)
> .off = 0, \
> .imm = 0 })
>
> +/* bitops */
> +#define BPF_BITOPS 0xe0 /* opcode for alu64 */
> +#define BPF_CLZ64 0x00 /* imm for clz64 */
> +#define BPF_CTZ64 0x01 /* imm for ctz64 */
> +#define BPF_FFS64 0x02 /* imm for ffs64 */
> +#define BPF_FLS64 0x03 /* imm for fls64 */
> +#define BPF_BITREV64 0x04 /* imm for bitrev64 */
> +#define BPF_POPCNT64 0x05 /* imm for popcnt64 */
> +#define BPF_ROL64 0x06 /* imm for rol64 */
> +#define BPF_ROR64 0x07 /* imm for ror64 */
> +
> +#define BPF_BITOPS_INSN(IMM) \
> + ((struct bpf_insn) { \
> + .code = BPF_ALU64 | BPF_BITOPS, \
> + .dst_reg = 0, \
> + .src_reg = 0, \
> + .off = 0, \
> + .imm = IMM })
> +
why introduce pseudo instructions and this encoding?
Just let JIT identify kfunc calls by address.
bpf_jit_get_func_addr()
if (addr == bpf_clz64) ...
> /* Internal classic blocks for direct assignment */
>
> #define __BPF_STMT(CODE, K) \
> @@ -1157,6 +1176,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
> void bpf_jit_compile(struct bpf_prog *prog);
> bool bpf_jit_needs_zext(void);
> bool bpf_jit_inlines_helper_call(s32 imm);
> +bool bpf_jit_inlines_bitops(s32 imm);
> bool bpf_jit_supports_subprog_tailcalls(void);
> bool bpf_jit_supports_percpu_insn(void);
> bool bpf_jit_supports_kfunc_call(void);
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index dc906dfdff94..cee90181d169 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -3113,6 +3113,12 @@ bool __weak bpf_jit_inlines_helper_call(s32 imm)
> return false;
> }
>
> +/* Return TRUE if the JIT backend inlines the bitops insn. */
> +bool __weak bpf_jit_inlines_bitops(s32 imm)
> +{
> + return false;
> +}
> +
> /* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */
> bool __weak bpf_jit_supports_subprog_tailcalls(void)
> {
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 7ac32798eb04..0a598c800f67 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -29,6 +29,8 @@
> #include <linux/task_work.h>
> #include <linux/irq_work.h>
> #include <linux/buildid.h>
> +#include <linux/bitops.h>
> +#include <linux/bitrev.h>
>
> #include "../../lib/kstrtox.h"
>
> @@ -4501,6 +4503,46 @@ __bpf_kfunc int bpf_timer_cancel_async(struct bpf_timer *timer)
> }
> }
>
> +__bpf_kfunc u64 bpf_clz64(u64 x)
> +{
> + return x ? 64 - fls64(x) : 64;
> +}
> +
> +__bpf_kfunc u64 bpf_ctz64(u64 x)
> +{
> + return x ? __ffs64(x) : 64;
> +}
> +
> +__bpf_kfunc u64 bpf_ffs64(u64 x)
> +{
> + return x ? __ffs64(x) + 1 : 0;
> +}
> +
> +__bpf_kfunc u64 bpf_fls64(u64 x)
> +{
> + return fls64(x);
> +}
> +
> +__bpf_kfunc u64 bpf_popcnt64(u64 x)
> +{
> + return hweight64(x);
> +}
> +
> +__bpf_kfunc u64 bpf_bitrev64(u64 x)
> +{
> + return ((u64)bitrev32(x & 0xFFFFFFFF) << 32) | bitrev32(x >> 32);
> +}
> +
> +__bpf_kfunc u64 bpf_rol64(u64 x, u64 s)
> +{
> + return rol64(x, s);
> +}
> +
> +__bpf_kfunc u64 bpf_ror64(u64 x, u64 s)
> +{
> + return ror64(x, s);
> +}
> +
> __bpf_kfunc_end_defs();
>
> static void bpf_task_work_cancel_scheduled(struct irq_work *irq_work)
> @@ -4578,6 +4620,14 @@ BTF_ID_FLAGS(func, bpf_key_put, KF_RELEASE)
> BTF_ID_FLAGS(func, bpf_verify_pkcs7_signature, KF_SLEEPABLE)
> #endif
> #endif
> +BTF_ID_FLAGS(func, bpf_clz64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_ctz64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_ffs64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_fls64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_popcnt64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_bitrev64, KF_FASTCALL | KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_rol64, KF_MUST_INLINE)
> +BTF_ID_FLAGS(func, bpf_ror64, KF_MUST_INLINE)
Mark all of them as fastcall and do push/pop in JIT when necessary.
pw-bot: cr
* Re: [RFC PATCH bpf-next 1/4] bpf: Introduce 64bit bitops kfuncs
2026-02-11 3:05 ` Alexei Starovoitov
@ 2026-02-11 3:29 ` Leon Hwang
0 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-02-11 3:29 UTC (permalink / raw)
To: Alexei Starovoitov, Leon Hwang
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
On 11/2/26 11:05, Alexei Starovoitov wrote:
> On Mon, Feb 9, 2026 at 7:59 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
[...]
>> diff --git a/include/linux/btf.h b/include/linux/btf.h
>> index 48108471c5b1..8ac1dc59ca85 100644
>> --- a/include/linux/btf.h
>> +++ b/include/linux/btf.h
>> @@ -79,6 +79,7 @@
>> #define KF_ARENA_ARG1 (1 << 14) /* kfunc takes an arena pointer as its first argument */
>> #define KF_ARENA_ARG2 (1 << 15) /* kfunc takes an arena pointer as its second argument */
>> #define KF_IMPLICIT_ARGS (1 << 16) /* kfunc has implicit arguments supplied by the verifier */
>> +#define KF_MUST_INLINE (1 << 17) /* kfunc must be inlined by JIT backend */
>
> UX is not great.
> Just keep kfuncs in C as fallback when JIT cannot inline them
> and don't remove spill/fills that llvm leaves for fastcall.
>
Ack.
I’ll drop KF_MUST_INLINE in the next revision and keep the C kfunc
implementation as the fallback when the JIT cannot inline it.
>>
>> /*
>> * Tag marking a kernel function as a kfunc. This is meant to minimize the
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index 4e1cb4f91f49..ff6c0cf68dd3 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -514,6 +514,25 @@ static inline bool insn_is_cast_user(const struct bpf_insn *insn)
>> .off = 0, \
>> .imm = 0 })
>>
>> +/* bitops */
>> +#define BPF_BITOPS 0xe0 /* opcode for alu64 */
>> +#define BPF_CLZ64 0x00 /* imm for clz64 */
>> +#define BPF_CTZ64 0x01 /* imm for ctz64 */
>> +#define BPF_FFS64 0x02 /* imm for ffs64 */
>> +#define BPF_FLS64 0x03 /* imm for fls64 */
>> +#define BPF_BITREV64 0x04 /* imm for bitrev64 */
>> +#define BPF_POPCNT64 0x05 /* imm for popcnt64 */
>> +#define BPF_ROL64 0x06 /* imm for rol64 */
>> +#define BPF_ROR64 0x07 /* imm for ror64 */
>> +
>> +#define BPF_BITOPS_INSN(IMM) \
>> + ((struct bpf_insn) { \
>> + .code = BPF_ALU64 | BPF_BITOPS, \
>> + .dst_reg = 0, \
>> + .src_reg = 0, \
>> + .off = 0, \
>> + .imm = IMM })
>> +
>
> why introduce pseudo instructions and this encoding?
> Just let JIT identify kfunc calls by address.
> bpf_jit_get_func_addr()
> if (addr == bpf_clz64) ...
>
Thanks for pointing me to bpf_jit_get_func_addr().
I’ll drop the BPF_BITOPS encoding and BPF_BITOPS_INSN, and instead
let the JIT identify the bitops kfuncs by their resolved function
address via bpf_jit_get_func_addr().
That should keep things simpler and avoid introducing a new internal
opcode.
>> /* Internal classic blocks for direct assignment */
>>
>> #define __BPF_STMT(CODE, K) \
[...]
>> @@ -4578,6 +4620,14 @@ BTF_ID_FLAGS(func, bpf_key_put, KF_RELEASE)
>> BTF_ID_FLAGS(func, bpf_verify_pkcs7_signature, KF_SLEEPABLE)
>> #endif
>> #endif
>> +BTF_ID_FLAGS(func, bpf_clz64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_ctz64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_ffs64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_fls64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_popcnt64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_bitrev64, KF_FASTCALL | KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_rol64, KF_MUST_INLINE)
>> +BTF_ID_FLAGS(func, bpf_ror64, KF_MUST_INLINE)
>
> Mark all of them as fastcall and do push/pop in JIT when necessary.
>
Good idea.
I’ll mark all bitops kfuncs as KF_FASTCALL and handle any required
save/restore in the JIT.
In particular, for bpf_rol64() and bpf_ror64() on x86_64, we do need
to use rcx (CL) for variable rotates, so pushing/popping rcx in the
JIT when needed makes sense.
Thanks,
Leon
Thread overview: 7+ messages
2026-02-09 15:59 [RFC PATCH bpf-next 0/4] bpf: Introduce 64bit bitops kfuncs Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 1/4] " Leon Hwang
2026-02-11 3:05 ` Alexei Starovoitov
2026-02-11 3:29 ` Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 2/4] bpf, x86: Add 64bit bitops kfuncs support for x86_64 Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 3/4] bpf, arm64: Add 64bit bitops kfuncs support Leon Hwang
2026-02-09 15:59 ` [RFC PATCH bpf-next 4/4] selftests/bpf: Add tests for 64bit bitops kfuncs Leon Hwang