* [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs
@ 2026-02-19 14:29 Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 1/6] " Leon Hwang
` (5 more replies)
0 siblings, 6 replies; 24+ messages in thread
From: Leon Hwang @ 2026-02-19 14:29 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst,
Viktor Malik, linux-arm-kernel, linux-kernel, netdev,
linux-kselftest, kernel-patches-bot
This series adds generic 64-bit bitops kfuncs and JIT inlining support
on x86_64 and arm64.
The new kfuncs are:
* bpf_clz64(): Count leading zeros.
* bpf_ctz64(): Count trailing zeros.
* bpf_ffs64(): Find first set bit, 1-based index, returns 0 when input is 0.
* bpf_fls64(): Find last set bit, 1-based index.
* bpf_bitrev64(): Reverse bits.
* bpf_popcnt64(): Population count.
* bpf_rol64(): Rotate left.
* bpf_ror64(): Rotate right.
Defined zero-input behavior:
* bpf_clz64(0) = 64
* bpf_ctz64(0) = 64
* bpf_ffs64(0) = 0
* bpf_fls64(0) = 0
bpf_ffs64() was previously discussed in
"bpf: Add generic kfunc bpf_ffs64()" [1].
The main concern in that discussion was ABI overhead: a regular kfunc call
follows the BPF calling convention and can introduce extra spill/fill compared
to dedicated instructions.
This series keeps the user-facing API as kfuncs while avoiding that overhead
on hot paths. When the JIT/backend and CPU support it, calls are inlined into
native instructions; otherwise they fall back to regular function calls.
Links:
[1] https://lore.kernel.org/bpf/20240131155607.51157-1-hffilwlqm@gmail.com/
Changes:
v1 -> v2:
* Drop RFC.
* Add __cpu_feature annotation for CPU-feature-gated tests.
* Add JIT disassembly tests for 64-bit bitops kfuncs
* Address comments from Alexei:
* Drop KF_MUST_INLINE.
* Drop internal BPF_ALU64 opcode BPF_BITOPS.
* Mark all of the kfuncs as fastcall and do push/pop in JIT when necessary.
* v1: https://lore.kernel.org/bpf/20260209155919.19015-1-leon.hwang@linux.dev/
Leon Hwang (6):
bpf: Introduce 64-bit bitops kfuncs
bpf, x86: Add 64-bit bitops kfuncs support for x86_64
bpf, arm64: Add 64-bit bitops kfuncs support
selftests/bpf: Add tests for 64-bit bitops kfuncs
selftests/bpf: Add __cpu_feature annotation for CPU-feature-gated
tests
selftests/bpf: Add JIT disassembly tests for 64-bit bitops kfuncs
arch/arm64/net/bpf_jit_comp.c | 123 ++++++++++++
arch/x86/net/bpf_jit_comp.c | 141 +++++++++++++
include/linux/filter.h | 10 +
kernel/bpf/core.c | 6 +
kernel/bpf/helpers.c | 50 +++++
kernel/bpf/verifier.c | 53 ++++-
.../testing/selftests/bpf/bpf_experimental.h | 9 +
.../testing/selftests/bpf/prog_tests/bitops.c | 188 ++++++++++++++++++
tools/testing/selftests/bpf/progs/bitops.c | 68 +++++++
.../testing/selftests/bpf/progs/bitops_jit.c | 153 ++++++++++++++
tools/testing/selftests/bpf/progs/bpf_misc.h | 7 +
tools/testing/selftests/bpf/test_loader.c | 150 ++++++++++++++
12 files changed, 957 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/bitops.c
create mode 100644 tools/testing/selftests/bpf/progs/bitops.c
create mode 100644 tools/testing/selftests/bpf/progs/bitops_jit.c
--
2.52.0
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH bpf-next v2 1/6] bpf: Introduce 64-bit bitops kfuncs
2026-02-19 14:29 [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs Leon Hwang
@ 2026-02-19 14:29 ` Leon Hwang
2026-02-19 17:50 ` Alexei Starovoitov
2026-02-21 9:58 ` Dan Carpenter
2026-02-19 14:29 ` [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64 Leon Hwang
` (4 subsequent siblings)
5 siblings, 2 replies; 24+ messages in thread
From: Leon Hwang @ 2026-02-19 14:29 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst,
Viktor Malik, linux-arm-kernel, linux-kernel, netdev,
linux-kselftest, kernel-patches-bot
Add the following generic 64-bit bitops kfuncs:
* bpf_clz64(): Count leading zeros.
* bpf_ctz64(): Count trailing zeros.
* bpf_ffs64(): Find first set bit, 1-based index, returns 0 when input
is 0.
* bpf_fls64(): Find last set bit, 1-based index.
* bpf_bitrev64(): Reverse bits.
* bpf_popcnt64(): Population count.
* bpf_rol64(): Rotate left.
* bpf_ror64(): Rotate right.
Defined zero-input behavior:
* bpf_clz64(0) = 64
* bpf_ctz64(0) = 64
* bpf_ffs64(0) = 0
* bpf_fls64(0) = 0
These kfuncs are inlined by JIT backends when the required CPU features are
available. Otherwise, they fall back to regular function calls.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
include/linux/filter.h | 10 ++++++++
kernel/bpf/core.c | 6 +++++
kernel/bpf/helpers.c | 50 +++++++++++++++++++++++++++++++++++++++
kernel/bpf/verifier.c | 53 +++++++++++++++++++++++++++++++++++++++++-
4 files changed, 118 insertions(+), 1 deletion(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 44d7ae95ddbc..b8a538bec5c6 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1157,6 +1157,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
void bpf_jit_compile(struct bpf_prog *prog);
bool bpf_jit_needs_zext(void);
bool bpf_jit_inlines_helper_call(s32 imm);
+bool bpf_jit_inlines_kfunc_call(void *func_addr);
bool bpf_jit_supports_subprog_tailcalls(void);
bool bpf_jit_supports_percpu_insn(void);
bool bpf_jit_supports_kfunc_call(void);
@@ -1837,4 +1838,13 @@ static inline void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset)
}
#endif /* CONFIG_NET */
+u64 bpf_clz64(u64 x);
+u64 bpf_ctz64(u64 x);
+u64 bpf_ffs64(u64 x);
+u64 bpf_fls64(u64 x);
+u64 bpf_popcnt64(u64 x);
+u64 bpf_bitrev64(u64 x);
+u64 bpf_rol64(u64 x, u64 s);
+u64 bpf_ror64(u64 x, u64 s);
+
#endif /* __LINUX_FILTER_H__ */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 5ab6bace7d0d..5f37309d83fc 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3114,6 +3114,12 @@ bool __weak bpf_jit_inlines_helper_call(s32 imm)
return false;
}
+/* Return TRUE if the JIT backend inlines the kfunc. */
+bool __weak bpf_jit_inlines_kfunc_call(void *func_addr)
+{
+ return false;
+}
+
/* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */
bool __weak bpf_jit_supports_subprog_tailcalls(void)
{
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 7ac32798eb04..6bf73c46af72 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -29,6 +29,8 @@
#include <linux/task_work.h>
#include <linux/irq_work.h>
#include <linux/buildid.h>
+#include <linux/bitops.h>
+#include <linux/bitrev.h>
#include "../../lib/kstrtox.h"
@@ -4501,6 +4503,46 @@ __bpf_kfunc int bpf_timer_cancel_async(struct bpf_timer *timer)
}
}
+__bpf_kfunc u64 bpf_clz64(u64 x)
+{
+ return x ? 64 - fls64(x) : 64;
+}
+
+__bpf_kfunc u64 bpf_ctz64(u64 x)
+{
+ return x ? __ffs64(x) : 64;
+}
+
+__bpf_kfunc u64 bpf_ffs64(u64 x)
+{
+ return x ? __ffs64(x) + 1 : 0;
+}
+
+__bpf_kfunc u64 bpf_fls64(u64 x)
+{
+ return fls64(x);
+}
+
+__bpf_kfunc u64 bpf_popcnt64(u64 x)
+{
+ return hweight64(x);
+}
+
+__bpf_kfunc u64 bpf_bitrev64(u64 x)
+{
+ return ((u64)bitrev32(x & 0xFFFFFFFF) << 32) | bitrev32(x >> 32);
+}
+
+__bpf_kfunc u64 bpf_rol64(u64 x, u64 s)
+{
+ return rol64(x, s);
+}
+
+__bpf_kfunc u64 bpf_ror64(u64 x, u64 s)
+{
+ return ror64(x, s);
+}
+
__bpf_kfunc_end_defs();
static void bpf_task_work_cancel_scheduled(struct irq_work *irq_work)
@@ -4578,6 +4620,14 @@ BTF_ID_FLAGS(func, bpf_key_put, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_verify_pkcs7_signature, KF_SLEEPABLE)
#endif
#endif
+BTF_ID_FLAGS(func, bpf_clz64, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_ctz64, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_ffs64, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_fls64, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_popcnt64, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_bitrev64, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_rol64, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_ror64, KF_FASTCALL)
BTF_KFUNCS_END(generic_btf_ids)
static const struct btf_kfunc_id_set generic_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0162f946032f..2cb29bc1b3c3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12461,6 +12461,14 @@ enum special_kfunc_type {
KF_bpf_session_is_return,
KF_bpf_stream_vprintk,
KF_bpf_stream_print_stack,
+ KF_bpf_clz64,
+ KF_bpf_ctz64,
+ KF_bpf_ffs64,
+ KF_bpf_fls64,
+ KF_bpf_bitrev64,
+ KF_bpf_popcnt64,
+ KF_bpf_rol64,
+ KF_bpf_ror64,
};
BTF_ID_LIST(special_kfunc_list)
@@ -12541,6 +12549,14 @@ BTF_ID(func, bpf_arena_reserve_pages)
BTF_ID(func, bpf_session_is_return)
BTF_ID(func, bpf_stream_vprintk)
BTF_ID(func, bpf_stream_print_stack)
+BTF_ID(func, bpf_clz64)
+BTF_ID(func, bpf_ctz64)
+BTF_ID(func, bpf_ffs64)
+BTF_ID(func, bpf_fls64)
+BTF_ID(func, bpf_bitrev64)
+BTF_ID(func, bpf_popcnt64)
+BTF_ID(func, bpf_rol64)
+BTF_ID(func, bpf_ror64)
static bool is_task_work_add_kfunc(u32 func_id)
{
@@ -18204,6 +18220,34 @@ static bool verifier_inlines_helper_call(struct bpf_verifier_env *env, s32 imm)
}
}
+static bool bpf_kfunc_is_fastcall(struct bpf_verifier_env *env, u32 func_id, u32 flags)
+{
+ if (!(flags & KF_FASTCALL))
+ return false;
+
+ if (!env->prog->jit_requested)
+ return true;
+
+ if (func_id == special_kfunc_list[KF_bpf_clz64])
+ return bpf_jit_inlines_kfunc_call(bpf_clz64);
+ if (func_id == special_kfunc_list[KF_bpf_ctz64])
+ return bpf_jit_inlines_kfunc_call(bpf_ctz64);
+ if (func_id == special_kfunc_list[KF_bpf_ffs64])
+ return bpf_jit_inlines_kfunc_call(bpf_ffs64);
+ if (func_id == special_kfunc_list[KF_bpf_fls64])
+ return bpf_jit_inlines_kfunc_call(bpf_fls64);
+ if (func_id == special_kfunc_list[KF_bpf_bitrev64])
+ return bpf_jit_inlines_kfunc_call(bpf_bitrev64);
+ if (func_id == special_kfunc_list[KF_bpf_popcnt64])
+ return bpf_jit_inlines_kfunc_call(bpf_popcnt64);
+ if (func_id == special_kfunc_list[KF_bpf_rol64])
+ return bpf_jit_inlines_kfunc_call(bpf_rol64);
+ if (func_id == special_kfunc_list[KF_bpf_ror64])
+ return bpf_jit_inlines_kfunc_call(bpf_ror64);
+
+ return true;
+}
+
struct call_summary {
u8 num_params;
bool is_void;
@@ -18246,7 +18290,7 @@ static bool get_call_summary(struct bpf_verifier_env *env, struct bpf_insn *call
/* error would be reported later */
return false;
cs->num_params = btf_type_vlen(meta.func_proto);
- cs->fastcall = meta.kfunc_flags & KF_FASTCALL;
+ cs->fastcall = bpf_kfunc_is_fastcall(env, meta.func_id, meta.kfunc_flags);
cs->is_void = btf_type_is_void(btf_type_by_id(meta.btf, meta.func_proto->type));
return true;
}
@@ -23186,6 +23230,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
insn_buf[4] = BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1);
insn_buf[5] = BPF_ALU64_IMM(BPF_NEG, BPF_REG_0, 0);
*cnt = 6;
+ } else if (desc->func_id == special_kfunc_list[KF_bpf_ffs64] &&
+ bpf_jit_inlines_kfunc_call(bpf_ffs64)) {
+ insn_buf[0] = BPF_MOV64_IMM(BPF_REG_0, 0);
+ insn_buf[1] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 2);
+ insn_buf[2] = *insn;
+ insn_buf[3] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 1);
+ *cnt = 4;
}
if (env->insn_aux_data[insn_idx].arg_prog) {
--
2.52.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
2026-02-19 14:29 [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 1/6] " Leon Hwang
@ 2026-02-19 14:29 ` Leon Hwang
2026-02-19 17:47 ` Alexei Starovoitov
` (2 more replies)
2026-02-19 14:29 ` [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support Leon Hwang
` (3 subsequent siblings)
5 siblings, 3 replies; 24+ messages in thread
From: Leon Hwang @ 2026-02-19 14:29 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst,
Viktor Malik, linux-arm-kernel, linux-kernel, netdev,
linux-kselftest, kernel-patches-bot
Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.
bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
bpf_ctz64() and bpf_ffs64() are supported when the CPU has
X86_FEATURE_BMI1 (TZCNT).
bpf_clz64() and bpf_fls64() are supported when the CPU has
X86_FEATURE_ABM (LZCNT).
bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
bpf_bitrev64() is not inlined as x86_64 has no native bit-reverse
instruction, so it falls back to a regular function call.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
1 file changed, 141 insertions(+)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 070ba80e39d7..193e1e2d7aa8 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -19,6 +19,7 @@
#include <asm/text-patching.h>
#include <asm/unwind.h>
#include <asm/cfi.h>
+#include <asm/cpufeatures.h>
static bool all_callee_regs_used[4] = {true, true, true, true};
@@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
*pprog = prog;
}
+static bool bpf_inlines_func_call(u8 **pprog, void *func)
+{
+ bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
+ bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
+ bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
+ bool inlined = true;
+ u8 *prog = *pprog;
+
+ /*
+ * x86 Bit manipulation instruction set
+ * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
+ */
+
+ if (func == bpf_clz64 && has_abm) {
+ /*
+ * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+ *
+ * LZCNT - Count the Number of Leading Zero Bits
+ *
+ * Opcode/Instruction
+ * F3 REX.W 0F BD /r
+ * LZCNT r64, r/m64
+ *
+ * Op/En
+ * RVM
+ *
+ * 64/32-bit Mode
+ * V/N.E.
+ *
+ * CPUID Feature Flag
+ * LZCNT
+ *
+ * Description
+ * Count the number of leading zero bits in r/m64, return
+ * result in r64.
+ */
+ /* emit: x ? 64 - fls64(x) : 64 */
+ /* lzcnt rax, rdi */
+ EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
+ } else if (func == bpf_ctz64 && has_bmi1) {
+ /*
+ * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+ *
+ * TZCNT - Count the Number of Trailing Zero Bits
+ *
+ * Opcode/Instruction
+ * F3 REX.W 0F BC /r
+ * TZCNT r64, r/m64
+ *
+ * Op/En
+ * RVM
+ *
+ * 64/32-bit Mode
+ * V/N.E.
+ *
+ * CPUID Feature Flag
+ * BMI1
+ *
+ * Description
+ * Count the number of trailing zero bits in r/m64, return
+ * result in r64.
+ */
+ /* emit: x ? __ffs64(x) : 64 */
+ /* tzcnt rax, rdi */
+ EMIT5(0xF3, 0x48, 0x0F, 0xBC, 0xC7);
+ } else if (func == bpf_ffs64 && has_bmi1) {
+ /* emit: __ffs64(x); x == 0 has been handled in verifier */
+ /* tzcnt rax, rdi */
+ EMIT5(0xF3, 0x48, 0x0F, 0xBC, 0xC7);
+ } else if (func == bpf_fls64 && has_abm) {
+ /* emit: fls64(x) */
+ /* lzcnt rax, rdi */
+ EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
+ EMIT3(0x48, 0xF7, 0xD8); /* neg rax */
+ EMIT4(0x48, 0x83, 0xC0, 0x40); /* add rax, 64 */
+ } else if (func == bpf_popcnt64 && has_popcnt) {
+ /*
+ * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+ *
+ * POPCNT - Return the Count of Number of Bits Set to 1
+ *
+ * Opcode/Instruction
+ * F3 REX.W 0F B8 /r
+ * POPCNT r64, r/m64
+ *
+ * Op/En
+ * RM
+ *
+ * 64 Mode
+ * Valid
+ *
+ * Compat/Leg Mode
+ * N.E.
+ *
+ * Description
+ * POPCNT on r/m64
+ */
+ /* popcnt rax, rdi */
+ EMIT5(0xF3, 0x48, 0x0F, 0xB8, 0xC7);
+ } else if (func == bpf_rol64) {
+ EMIT1(0x51); /* push rcx */
+ /* emit: rol64(x, s) */
+ EMIT3(0x48, 0x89, 0xF1); /* mov rcx, rsi */
+ EMIT3(0x48, 0x89, 0xF8); /* mov rax, rdi */
+ EMIT3(0x48, 0xD3, 0xC0); /* rol rax, cl */
+ EMIT1(0x59); /* pop rcx */
+ } else if (func == bpf_ror64) {
+ EMIT1(0x51); /* push rcx */
+ /* emit: ror64(x, s) */
+ EMIT3(0x48, 0x89, 0xF1); /* mov rcx, rsi */
+ EMIT3(0x48, 0x89, 0xF8); /* mov rax, rdi */
+ EMIT3(0x48, 0xD3, 0xC8); /* ror rax, cl */
+ EMIT1(0x59); /* pop rcx */
+ } else {
+ inlined = false;
+ }
+
+ *pprog = prog;
+ return inlined;
+}
+
#define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
#define __LOAD_TCC_PTR(off) \
@@ -2452,6 +2574,8 @@ st: if (is_imm8(insn->off))
u8 *ip = image + addrs[i - 1];
func = (u8 *) __bpf_call_base + imm32;
+ if (bpf_inlines_func_call(&prog, func))
+ break;
if (src_reg == BPF_PSEUDO_CALL && tail_call_reachable) {
LOAD_TAIL_CALL_CNT_PTR(stack_depth);
ip += 7;
@@ -4117,3 +4241,20 @@ bool bpf_jit_supports_fsession(void)
{
return true;
}
+
+bool bpf_jit_inlines_kfunc_call(void *func_addr)
+{
+ if (func_addr == bpf_ctz64 || func_addr == bpf_ffs64)
+ return boot_cpu_has(X86_FEATURE_BMI1);
+
+ if (func_addr == bpf_clz64 || func_addr == bpf_fls64)
+ return boot_cpu_has(X86_FEATURE_ABM);
+
+ if (func_addr == bpf_popcnt64)
+ return boot_cpu_has(X86_FEATURE_POPCNT);
+
+ if (func_addr == bpf_rol64 || func_addr == bpf_ror64)
+ return true;
+
+ return false;
+}
--
2.52.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support
2026-02-19 14:29 [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 1/6] " Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64 Leon Hwang
@ 2026-02-19 14:29 ` Leon Hwang
2026-02-19 15:10 ` Puranjay Mohan
` (2 more replies)
2026-02-19 14:29 ` [PATCH bpf-next v2 4/6] selftests/bpf: Add tests for 64-bit bitops kfuncs Leon Hwang
` (2 subsequent siblings)
5 siblings, 3 replies; 24+ messages in thread
From: Leon Hwang @ 2026-02-19 14:29 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst,
Viktor Malik, linux-arm-kernel, linux-kernel, netdev,
linux-kselftest, kernel-patches-bot
Implement JIT inlining of the 64-bit bitops kfuncs on arm64.
bpf_clz64(), bpf_ffs64(), bpf_fls64(), and bpf_bitrev64() are always
inlined using mandatory ARMv8 CLZ/RBIT instructions. bpf_ctz64() is
inlined via RBIT + CLZ, or via the native CTZ instruction when
FEAT_CSSC is available. bpf_rol64() and bpf_ror64() are always inlined
via RORV.
bpf_popcnt64() is not inlined as the native population count instruction
requires NEON/SIMD registers, which should not be touched from BPF
programs. It therefore falls back to a regular function call.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
arch/arm64/net/bpf_jit_comp.c | 123 ++++++++++++++++++++++++++++++++++
1 file changed, 123 insertions(+)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 7a530ea4f5ae..f03f732063d9 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1192,6 +1192,127 @@ static int add_exception_handler(const struct bpf_insn *insn,
return 0;
}
+static inline u32 a64_clz64(u8 rd, u8 rn)
+{
+ /*
+ * Arm Architecture Reference Manual for A-profile architecture
+ * (Document number: ARM DDI 0487)
+ *
+ * A64 Base Instruction Descriptions
+ * C6.2 Alphabetical list of A64 base instructions
+ *
+ * C6.2.91 CLZ
+ *
+ * Count leading zeros
+ *
+ * This instruction counts the number of consecutive binary zero bits,
+ * starting from the most significant bit in the source register,
+ * and places the count in the destination register.
+ */
+ /* CLZ Xd, Xn */
+ return 0xdac01000 | (rn << 5) | rd;
+}
+
+static inline u32 a64_ctz64(u8 rd, u8 rn)
+{
+ /*
+ * Arm Architecture Reference Manual for A-profile architecture
+ * (Document number: ARM DDI 0487)
+ *
+ * A64 Base Instruction Descriptions
+ * C6.2 Alphabetical list of A64 base instructions
+ *
+ * C6.2.144 CTZ
+ *
+ * Count trailing zeros
+ *
+ * This instruction counts the number of consecutive binary zero bits,
+ * starting from the least significant bit in the source register,
+ * and places the count in the destination register.
+ *
+ * This instruction requires FEAT_CSSC.
+ */
+ /* CTZ Xd, Xn */
+ return 0xdac01800 | (rn << 5) | rd;
+}
+
+static inline u32 a64_rbit64(u8 rd, u8 rn)
+{
+ /*
+ * Arm Architecture Reference Manual for A-profile architecture
+ * (Document number: ARM DDI 0487)
+ *
+ * A64 Base Instruction Descriptions
+ * C6.2 Alphabetical list of A64 base instructions
+ *
+ * C6.2.320 RBIT
+ *
+ * Reverse bits
+ *
+ * This instruction reverses the bit order in a register.
+ */
+ /* RBIT Xd, Xn */
+ return 0xdac00000 | (rn << 5) | rd;
+}
+
+static inline bool boot_cpu_supports_cssc(void)
+{
+ /*
+ * Documentation/arch/arm64/cpu-feature-registers.rst
+ *
+ * ID_AA64ISAR2_EL1 - Instruction set attribute register 2
+ *
+ * CSSC
+ */
+ return cpuid_feature_extract_unsigned_field(read_sanitised_ftr_reg(SYS_ID_AA64ISAR2_EL1),
+ ID_AA64ISAR2_EL1_CSSC_SHIFT);
+}
+
+static bool bpf_inlines_func_call(struct jit_ctx *ctx, void *func_addr)
+{
+ const u8 tmp = bpf2a64[TMP_REG_1];
+ const u8 r0 = bpf2a64[BPF_REG_0];
+ const u8 r1 = bpf2a64[BPF_REG_1];
+ const u8 r2 = bpf2a64[BPF_REG_2];
+ bool inlined = true;
+
+ if (func_addr == bpf_clz64) {
+ emit(a64_clz64(r0, r1), ctx);
+ } else if (func_addr == bpf_ctz64 || func_addr == bpf_ffs64) {
+ if (boot_cpu_supports_cssc()) {
+ emit(a64_ctz64(r0, r1), ctx);
+ } else {
+ emit(a64_rbit64(tmp, r1), ctx);
+ emit(a64_clz64(r0, tmp), ctx);
+ }
+ } else if (func_addr == bpf_fls64) {
+ emit(a64_clz64(tmp, r1), ctx);
+ emit(A64_NEG(1, tmp, tmp), ctx);
+ emit(A64_ADD_I(1, r0, tmp, 64), ctx);
+ } else if (func_addr == bpf_bitrev64) {
+ emit(a64_rbit64(r0, r1), ctx);
+ } else if (func_addr == bpf_rol64) {
+ emit(A64_NEG(1, tmp, r2), ctx);
+ emit(A64_DATA2(1, r0, r1, tmp, RORV), ctx);
+ } else if (func_addr == bpf_ror64) {
+ emit(A64_DATA2(1, r0, r1, r2, RORV), ctx);
+ } else {
+ inlined = false;
+ }
+
+ return inlined;
+}
+
+bool bpf_jit_inlines_kfunc_call(void *func_addr)
+{
+ if (func_addr == bpf_clz64 || func_addr == bpf_ctz64 ||
+ func_addr == bpf_ffs64 || func_addr == bpf_fls64 ||
+ func_addr == bpf_rol64 || func_addr == bpf_ror64 ||
+ func_addr == bpf_bitrev64)
+ return true;
+ return false;
+}
+
/* JITs an eBPF instruction.
* Returns:
* 0 - successfully JITed an 8-byte eBPF instruction.
@@ -1598,6 +1719,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
&func_addr, &func_addr_fixed);
if (ret < 0)
return ret;
+ if (bpf_inlines_func_call(ctx, (void *) func_addr))
+ break;
emit_call(func_addr, ctx);
/*
* Call to arch_bpf_timed_may_goto() is emitted by the
--
2.52.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH bpf-next v2 4/6] selftests/bpf: Add tests for 64-bit bitops kfuncs
2026-02-19 14:29 [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs Leon Hwang
` (2 preceding siblings ...)
2026-02-19 14:29 ` [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support Leon Hwang
@ 2026-02-19 14:29 ` Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 5/6] selftests/bpf: Add __cpu_feature annotation for CPU-feature-gated tests Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 6/6] selftests/bpf: Add JIT disassembly tests for 64-bit bitops kfuncs Leon Hwang
5 siblings, 0 replies; 24+ messages in thread
From: Leon Hwang @ 2026-02-19 14:29 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst,
Viktor Malik, linux-arm-kernel, linux-kernel, netdev,
linux-kselftest, kernel-patches-bot
Add selftests for bpf_clz64(), bpf_ctz64(), bpf_ffs64(), bpf_fls64(),
bpf_bitrev64(), bpf_popcnt64(), bpf_rol64(), and bpf_ror64().
Each subtest compares kfunc results against a userspace reference
implementation over a set of test vectors.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
.../testing/selftests/bpf/bpf_experimental.h | 9 +
.../testing/selftests/bpf/prog_tests/bitops.c | 182 ++++++++++++++++++
tools/testing/selftests/bpf/progs/bitops.c | 68 +++++++
3 files changed, 259 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/bitops.c
create mode 100644 tools/testing/selftests/bpf/progs/bitops.c
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 9df77e59d4f5..02a985ef71cc 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -594,6 +594,15 @@ extern void bpf_iter_dmabuf_destroy(struct bpf_iter_dmabuf *it) __weak __ksym;
extern int bpf_cgroup_read_xattr(struct cgroup *cgroup, const char *name__str,
struct bpf_dynptr *value_p) __weak __ksym;
+extern __u64 bpf_clz64(__u64 x) __weak __ksym;
+extern __u64 bpf_ctz64(__u64 x) __weak __ksym;
+extern __u64 bpf_ffs64(__u64 x) __weak __ksym;
+extern __u64 bpf_fls64(__u64 x) __weak __ksym;
+extern __u64 bpf_bitrev64(__u64 x) __weak __ksym;
+extern __u64 bpf_popcnt64(__u64 x) __weak __ksym;
+extern __u64 bpf_rol64(__u64 x, __u64 s) __weak __ksym;
+extern __u64 bpf_ror64(__u64 x, __u64 s) __weak __ksym;
+
#define PREEMPT_BITS 8
#define SOFTIRQ_BITS 8
#define HARDIRQ_BITS 4
diff --git a/tools/testing/selftests/bpf/prog_tests/bitops.c b/tools/testing/selftests/bpf/prog_tests/bitops.c
new file mode 100644
index 000000000000..9acc3cb1908c
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bitops.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include "bitops.skel.h"
+
+struct bitops_case {
+ __u64 x;
+ __u64 s;
+ __u64 exp;
+};
+
+static struct bitops_case cases[] = {
+ { 0x0ULL, 0, 0 },
+ { 0x1ULL, 1, 0 },
+ { 0x8000000000000000ULL, 63, 0 },
+ { 0xffffffffffffffffULL, 64, 0 },
+ { 0x0123456789abcdefULL, 65, 0 },
+ { 0x0000000100000000ULL, 127, 0 },
+};
+
+static __u64 clz64(__u64 x, __u64 s)
+{
+ (void)s;
+ return x ? __builtin_clzll(x) : 64;
+}
+
+static __u64 ctz64(__u64 x, __u64 s)
+{
+ (void)s;
+ return x ? __builtin_ctzll(x) : 64;
+}
+
+static __u64 ffs64(__u64 x, __u64 s)
+{
+ (void)s;
+ return x ? (__u64)__builtin_ctzll(x) + 1 : 0;
+}
+
+static __u64 fls64(__u64 x, __u64 s)
+{
+ (void)s;
+ return x ? 64 - __builtin_clzll(x) : 0;
+}
+
+static __u64 popcnt64(__u64 x, __u64 s)
+{
+ (void)s;
+ return __builtin_popcountll(x);
+}
+
+static __u64 bitrev64(__u64 x, __u64 s)
+{
+ __u64 y = 0;
+ int i;
+
+ (void)s;
+
+ for (i = 0; i < 64; i++) {
+ y <<= 1;
+ y |= x & 1;
+ x >>= 1;
+ }
+ return y;
+}
+
+static __u64 rol64(__u64 x, __u64 s)
+{
+ s &= 63;
+ return (x << s) | (x >> ((-s) & 63));
+}
+
+static __u64 ror64(__u64 x, __u64 s)
+{
+ s &= 63;
+ return (x >> s) | (x << ((-s) & 63));
+}
+
+static void test_bitops_case(const char *prog_name)
+{
+ struct bpf_program *prog;
+ struct bitops *skel;
+ size_t i;
+ int err;
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+ skel = bitops__open();
+ if (!ASSERT_OK_PTR(skel, "bitops__open"))
+ return;
+
+ prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+ if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+ goto cleanup;
+
+ bpf_program__set_autoload(prog, true);
+
+ err = bitops__load(skel);
+ if (!ASSERT_OK(err, "bitops__load"))
+ goto cleanup;
+
+ for (i = 0; i < ARRAY_SIZE(cases); i++) {
+ skel->bss->in_x = cases[i].x;
+ skel->bss->in_s = cases[i].s;
+ err = bpf_prog_test_run_opts(bpf_program__fd(prog), &topts);
+ if (!ASSERT_OK(err, "bpf_prog_test_run_opts"))
+ goto cleanup;
+
+ if (!ASSERT_OK(topts.retval, "retval"))
+ goto cleanup;
+
+ ASSERT_EQ(skel->bss->out, cases[i].exp, "out");
+ }
+
+cleanup:
+ bitops__destroy(skel);
+}
+
+#define RUN_BITOPS_CASE(_bitops, _prog) \
+ do { \
+ for (size_t i = 0; i < ARRAY_SIZE(cases); i++) \
+ cases[i].exp = _bitops(cases[i].x, cases[i].s); \
+ test_bitops_case(_prog); \
+ } while (0)
+
+static void test_clz64(void)
+{
+ RUN_BITOPS_CASE(clz64, "bitops_clz64");
+}
+
+static void test_ctz64(void)
+{
+ RUN_BITOPS_CASE(ctz64, "bitops_ctz64");
+}
+
+static void test_ffs64(void)
+{
+ RUN_BITOPS_CASE(ffs64, "bitops_ffs64");
+}
+
+static void test_fls64(void)
+{
+ RUN_BITOPS_CASE(fls64, "bitops_fls64");
+}
+
+static void test_bitrev64(void)
+{
+ RUN_BITOPS_CASE(bitrev64, "bitops_bitrev");
+}
+
+static void test_popcnt64(void)
+{
+ RUN_BITOPS_CASE(popcnt64, "bitops_popcnt");
+}
+
+static void test_rol64(void)
+{
+ RUN_BITOPS_CASE(rol64, "bitops_rol64");
+}
+
+static void test_ror64(void)
+{
+ RUN_BITOPS_CASE(ror64, "bitops_ror64");
+}
+
+void test_bitops(void)
+{
+ if (test__start_subtest("clz64"))
+ test_clz64();
+ if (test__start_subtest("ctz64"))
+ test_ctz64();
+ if (test__start_subtest("ffs64"))
+ test_ffs64();
+ if (test__start_subtest("fls64"))
+ test_fls64();
+ if (test__start_subtest("bitrev64"))
+ test_bitrev64();
+ if (test__start_subtest("popcnt64"))
+ test_popcnt64();
+ if (test__start_subtest("rol64"))
+ test_rol64();
+ if (test__start_subtest("ror64"))
+ test_ror64();
+}
diff --git a/tools/testing/selftests/bpf/progs/bitops.c b/tools/testing/selftests/bpf/progs/bitops.c
new file mode 100644
index 000000000000..deac09bc8683
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bitops.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_experimental.h"
+
+__u64 in_x;
+__u64 in_s;
+
+__u64 out;
+
+SEC("?syscall")
+int bitops_clz64(void *ctx)
+{
+ out = bpf_clz64(in_x);
+ return 0;
+}
+
+SEC("?syscall")
+int bitops_ctz64(void *ctx)
+{
+ out = bpf_ctz64(in_x);
+ return 0;
+}
+
+SEC("?syscall")
+int bitops_ffs64(void *ctx)
+{
+ out = bpf_ffs64(in_x);
+ return 0;
+}
+
+SEC("?syscall")
+int bitops_fls64(void *ctx)
+{
+ out = bpf_fls64(in_x);
+ return 0;
+}
+
+SEC("?syscall")
+int bitops_bitrev(void *ctx)
+{
+ out = bpf_bitrev64(in_x);
+ return 0;
+}
+
+SEC("?syscall")
+int bitops_popcnt(void *ctx)
+{
+ out = bpf_popcnt64(in_x);
+ return 0;
+}
+
+SEC("?syscall")
+int bitops_rol64(void *ctx)
+{
+ out = bpf_rol64(in_x, in_s);
+ return 0;
+}
+
+SEC("?syscall")
+int bitops_ror64(void *ctx)
+{
+ out = bpf_ror64(in_x, in_s);
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.52.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH bpf-next v2 5/6] selftests/bpf: Add __cpu_feature annotation for CPU-feature-gated tests
2026-02-19 14:29 [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs Leon Hwang
` (3 preceding siblings ...)
2026-02-19 14:29 ` [PATCH bpf-next v2 4/6] selftests/bpf: Add tests for 64-bit bitops kfuncs Leon Hwang
@ 2026-02-19 14:29 ` Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 6/6] selftests/bpf: Add JIT disassembly tests for 64-bit bitops kfuncs Leon Hwang
5 siblings, 0 replies; 24+ messages in thread
From: Leon Hwang @ 2026-02-19 14:29 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst,
Viktor Malik, linux-arm-kernel, linux-kernel, netdev,
linux-kselftest, kernel-patches-bot
Add a new __cpu_feature("...") test annotation and parse it in
selftests/bpf test_loader.
Behavior:
- Annotation value is matched against CPU feature tokens from
/proc/cpuinfo (case-insensitive).
- Multiple __cpu_feature annotations can be specified for one test; all
required features must be present.
- If any required feature is missing, the test is skipped.
Limitation:
- __cpu_feature is evaluated per test function and is not scoped per
__arch_* block. A single test that combines multiple architectures
cannot express different per-arch feature requirements.
This lets JIT/disassembly-sensitive tests declare explicit CPU feature
requirements and avoid false failures on unsupported systems.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
tools/testing/selftests/bpf/progs/bpf_misc.h | 7 +
tools/testing/selftests/bpf/test_loader.c | 150 +++++++++++++++++++
2 files changed, 157 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
index c9bfbe1bafc1..75e66373a64d 100644
--- a/tools/testing/selftests/bpf/progs/bpf_misc.h
+++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
@@ -126,6 +126,12 @@
* Several __arch_* annotations could be specified at once.
* When test case is not run on current arch it is marked as skipped.
* __caps_unpriv Specify the capabilities that should be set when running the test.
+ * __cpu_feature Specify required CPU feature for test execution.
+ * Multiple __cpu_feature annotations could be specified.
+ * Value must match a CPU feature token exposed by
+ * /proc/cpuinfo (case-insensitive).
+ * Can't be used together with multiple __arch_* tags.
+ * If any required feature is not present, test case is skipped.
*
* __linear_size Specify the size of the linear area of non-linear skbs, or
* 0 for linear skbs.
@@ -156,6 +162,7 @@
#define __arch_riscv64 __arch("RISCV64")
#define __arch_s390x __arch("s390x")
#define __caps_unpriv(caps) __attribute__((btf_decl_tag("comment:test_caps_unpriv=" EXPAND_QUOTE(caps))))
+#define __cpu_feature(feat) __attribute__((btf_decl_tag("comment:test_cpu_feature=" feat)))
#define __load_if_JITed() __attribute__((btf_decl_tag("comment:load_mode=jited")))
#define __load_if_no_JITed() __attribute__((btf_decl_tag("comment:load_mode=no_jited")))
#define __stderr(msg) __attribute__((btf_decl_tag("comment:test_expect_stderr=" XSTR(__COUNTER__) "=" msg)))
diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
index 338c035c3688..3729d1572589 100644
--- a/tools/testing/selftests/bpf/test_loader.c
+++ b/tools/testing/selftests/bpf/test_loader.c
@@ -4,6 +4,7 @@
#include <stdlib.h>
#include <test_progs.h>
#include <bpf/btf.h>
+#include <ctype.h>
#include "autoconf_helper.h"
#include "disasm_helpers.h"
@@ -44,6 +45,7 @@
#define TEST_TAG_EXPECT_STDOUT_PFX "comment:test_expect_stdout="
#define TEST_TAG_EXPECT_STDOUT_PFX_UNPRIV "comment:test_expect_stdout_unpriv="
#define TEST_TAG_LINEAR_SIZE "comment:test_linear_size="
+#define TEST_TAG_CPU_FEATURE_PFX "comment:test_cpu_feature="
/* Warning: duplicated in bpf_misc.h */
#define POINTER_VALUE 0xbadcafe
@@ -67,6 +69,11 @@ enum load_mode {
NO_JITED = 1 << 1,
};
+struct cpu_feature_set {
+ char **names;
+ size_t cnt;
+};
+
struct test_subspec {
char *name;
bool expect_failure;
@@ -93,6 +100,7 @@ struct test_spec {
int linear_sz;
bool auxiliary;
bool valid;
+ struct cpu_feature_set cpu_features;
};
static int tester_init(struct test_loader *tester)
@@ -145,6 +153,16 @@ static void free_test_spec(struct test_spec *spec)
free(spec->unpriv.name);
spec->priv.name = NULL;
spec->unpriv.name = NULL;
+
+ if (spec->cpu_features.names) {
+ size_t i;
+
+ for (i = 0; i < spec->cpu_features.cnt; i++)
+ free(spec->cpu_features.names[i]);
+ free(spec->cpu_features.names);
+ spec->cpu_features.names = NULL;
+ spec->cpu_features.cnt = 0;
+ }
}
/* Compiles regular expression matching pattern.
@@ -394,6 +412,122 @@ static int get_current_arch(void)
return ARCH_UNKNOWN;
}
+static int cpu_feature_set_add(struct cpu_feature_set *set, const char *name)
+{
+ char **tmp, *norm;
+ size_t i, len;
+
+ if (!name || !name[0]) {
+ PRINT_FAIL("bad cpu feature spec: empty string");
+ return -EINVAL;
+ }
+
+ len = strlen(name);
+ norm = malloc(len + 1);
+ if (!norm)
+ return -ENOMEM;
+
+ for (i = 0; i < len; i++) {
+ if (isspace(name[i])) {
+ free(norm);
+ PRINT_FAIL("bad cpu feature spec: whitespace is not allowed in '%s'", name);
+ return -EINVAL;
+ }
+ norm[i] = tolower((unsigned char)name[i]);
+ }
+ norm[len] = '\0';
+
+ for (i = 0; i < set->cnt; i++) {
+ if (strcmp(set->names[i], norm) == 0) {
+ free(norm);
+ return 0;
+ }
+ }
+
+ tmp = realloc(set->names, (set->cnt + 1) * sizeof(*set->names));
+ if (!tmp) {
+ free(norm);
+ return -ENOMEM;
+ }
+ set->names = tmp;
+ set->names[set->cnt++] = norm;
+ return 0;
+}
+
+static bool cpu_feature_set_has(const struct cpu_feature_set *set, const char *name)
+{
+ size_t i;
+
+ for (i = 0; i < set->cnt; i++) {
+ if (strcmp(set->names[i], name) == 0)
+ return true;
+ }
+ return false;
+}
+
+static bool cpu_feature_set_includes(const struct cpu_feature_set *have,
+ const struct cpu_feature_set *need)
+{
+ size_t i;
+
+ for (i = 0; i < need->cnt; i++) {
+ if (!cpu_feature_set_has(have, need->names[i]))
+ return false;
+ }
+ return true;
+}
+
+static const struct cpu_feature_set *get_current_cpu_features(void)
+{
+ static struct cpu_feature_set set;
+ static bool initialized;
+ char *line = NULL;
+ size_t len = 0;
+ FILE *fp;
+ int err;
+
+ if (initialized)
+ return &set;
+
+ initialized = true;
+ fp = fopen("/proc/cpuinfo", "r");
+ if (!fp)
+ return &set;
+
+ while (getline(&line, &len, fp) != -1) {
+ char *p = line, *colon, *tok;
+
+ while (*p && isspace(*p))
+ p++;
+ if (!str_has_pfx(p, "flags") &&
+ !str_has_pfx(p, "Features") &&
+ !str_has_pfx(p, "features"))
+ continue;
+
+ colon = strchr(p, ':');
+ if (!colon)
+ continue;
+
+ for (tok = strtok(colon + 1, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
+ err = cpu_feature_set_add(&set, tok);
+ if (err) {
+ PRINT_FAIL("failed to parse cpu feature from '/proc/cpuinfo': '%s'",
+ tok);
+ break;
+ }
+ }
+ }
+
+ free(line);
+ fclose(fp);
+ return &set;
+}
+
+static int parse_cpu_feature(const char *name, struct cpu_feature_set *set)
+{
+ return cpu_feature_set_add(set, name);
+}
+
/* Uses btf_decl_tag attributes to describe the expected test
* behavior, see bpf_misc.h for detailed description of each attribute
* and attribute combinations.
@@ -650,9 +784,20 @@ static int parse_test_spec(struct test_loader *tester,
err = -EINVAL;
goto cleanup;
}
+ } else if (str_has_pfx(s, TEST_TAG_CPU_FEATURE_PFX)) {
+ val = s + sizeof(TEST_TAG_CPU_FEATURE_PFX) - 1;
+ err = parse_cpu_feature(val, &spec->cpu_features);
+ if (err)
+ goto cleanup;
}
}
+ if (spec->cpu_features.cnt && __builtin_popcount(arch_mask) != 1) {
+ PRINT_FAIL("__cpu_feature requires exactly one __arch_* tag");
+ err = -EINVAL;
+ goto cleanup;
+ }
+
spec->arch_mask = arch_mask ?: -1;
spec->load_mask = load_mask ?: (JITED | NO_JITED);
@@ -1161,6 +1306,11 @@ void run_subtest(struct test_loader *tester,
return;
}
+ if (!cpu_feature_set_includes(get_current_cpu_features(), &spec->cpu_features)) {
+ test__skip();
+ return;
+ }
+
if (unpriv) {
if (!can_execute_unpriv(tester, spec)) {
test__skip();
--
2.52.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH bpf-next v2 6/6] selftests/bpf: Add JIT disassembly tests for 64-bit bitops kfuncs
2026-02-19 14:29 [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs Leon Hwang
` (4 preceding siblings ...)
2026-02-19 14:29 ` [PATCH bpf-next v2 5/6] selftests/bpf: Add __cpu_feature annotation for CPU-feature-gated tests Leon Hwang
@ 2026-02-19 14:29 ` Leon Hwang
5 siblings, 0 replies; 24+ messages in thread
From: Leon Hwang @ 2026-02-19 14:29 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst,
Viktor Malik, linux-arm-kernel, linux-kernel, netdev,
linux-kselftest, kernel-patches-bot
Add bitops_jit selftests that verify JITed instruction sequences for
supported 64-bit bitops kfuncs on x86_64 and arm64, including
CPU-feature-gated coverage on x86 where required.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
.../testing/selftests/bpf/prog_tests/bitops.c | 6 +
.../testing/selftests/bpf/progs/bitops_jit.c | 153 ++++++++++++++++++
2 files changed, 159 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/bitops_jit.c
diff --git a/tools/testing/selftests/bpf/prog_tests/bitops.c b/tools/testing/selftests/bpf/prog_tests/bitops.c
index 9acc3cb1908c..2c203904880d 100644
--- a/tools/testing/selftests/bpf/prog_tests/bitops.c
+++ b/tools/testing/selftests/bpf/prog_tests/bitops.c
@@ -2,6 +2,7 @@
#include <test_progs.h>
#include "bitops.skel.h"
+#include "bitops_jit.skel.h"
struct bitops_case {
__u64 x;
@@ -180,3 +181,8 @@ void test_bitops(void)
if (test__start_subtest("ror64"))
test_ror64();
}
+
+void test_bitops_jit(void)
+{
+ RUN_TESTS(bitops_jit);
+}
diff --git a/tools/testing/selftests/bpf/progs/bitops_jit.c b/tools/testing/selftests/bpf/progs/bitops_jit.c
new file mode 100644
index 000000000000..9f414e56b1e8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bitops_jit.c
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_experimental.h"
+#include "bpf_misc.h"
+
+SEC("syscall")
+__description("bitops jit: clz64 uses lzcnt on x86 with abm")
+__success __retval(63)
+__arch_x86_64
+__cpu_feature("abm")
+__jited(" lzcnt{{.*}}")
+int bitops_jit_clz64_x86(void *ctx)
+{
+ return bpf_clz64(1);
+}
+
+SEC("syscall")
+__description("bitops jit: ctz64 uses tzcnt on x86 with bmi1")
+__success __retval(4)
+__arch_x86_64
+__cpu_feature("bmi1")
+__jited(" tzcnt{{.*}}")
+int bitops_jit_ctz64_x86(void *ctx)
+{
+ return bpf_ctz64(0x10);
+}
+
+SEC("syscall")
+__description("bitops jit: ffs64 uses tzcnt on x86 with bmi1")
+__success __retval(5)
+__arch_x86_64
+__cpu_feature("bmi1")
+__jited(" tzcnt{{.*}}")
+int bitops_jit_ffs64_x86(void *ctx)
+{
+ return bpf_ffs64(0x10);
+}
+
+SEC("syscall")
+__description("bitops jit: fls64 uses lzcnt on x86 with abm")
+__success __retval(5)
+__arch_x86_64
+__cpu_feature("abm")
+__jited(" lzcnt{{.*}}")
+int bitops_jit_fls64_x86(void *ctx)
+{
+ return bpf_fls64(0x10);
+}
+
+SEC("syscall")
+__description("bitops jit: popcnt64 uses popcnt on x86")
+__success __retval(3)
+__arch_x86_64
+__cpu_feature("popcnt")
+__jited(" popcnt{{.*}}")
+int bitops_jit_popcnt64_x86(void *ctx)
+{
+ return bpf_popcnt64(0x1011);
+}
+
+SEC("syscall")
+__description("bitops jit: rol64 uses rol on x86")
+__success __retval(6)
+__arch_x86_64
+__jited(" rol{{.*}}")
+int bitops_jit_rol64_x86(void *ctx)
+{
+ return bpf_rol64(3, 1);
+}
+
+SEC("syscall")
+__description("bitops jit: ror64 uses ror on x86")
+__success __retval(3)
+__arch_x86_64
+__jited(" ror{{.*}}")
+int bitops_jit_ror64_x86(void *ctx)
+{
+ return bpf_ror64(6, 1);
+}
+
+SEC("syscall")
+__description("bitops jit: clz64 uses clz on arm64")
+__success __retval(63)
+__arch_arm64
+__jited(" clz {{.*}}")
+int bitops_jit_clz64_arm64(void *ctx)
+{
+ return bpf_clz64(1);
+}
+
+SEC("syscall")
+__description("bitops jit: ctz64 uses ctz or rbit+clz on arm64")
+__success __retval(4)
+__arch_arm64
+__jited(" {{(ctz|rbit)}} {{.*}}")
+int bitops_jit_ctz64_arm64(void *ctx)
+{
+ return bpf_ctz64(0x10);
+}
+
+SEC("syscall")
+__description("bitops jit: ffs64 uses ctz or rbit+clz on arm64")
+__success __retval(5)
+__arch_arm64
+__jited(" {{(ctz|rbit)}} {{.*}}")
+int bitops_jit_ffs64_arm64(void *ctx)
+{
+ return bpf_ffs64(0x10);
+}
+
+SEC("syscall")
+__description("bitops jit: fls64 uses clz on arm64")
+__success __retval(5)
+__arch_arm64
+__jited(" clz {{.*}}")
+int bitops_jit_fls64_arm64(void *ctx)
+{
+ return bpf_fls64(0x10);
+}
+
+SEC("syscall")
+__description("bitops jit: bitrev64 uses rbit on arm64")
+__success __retval(1)
+__arch_arm64
+__jited(" rbit {{.*}}")
+int bitops_jit_bitrev64_arm64(void *ctx)
+{
+ return bpf_bitrev64(0x8000000000000000ULL);
+}
+
+SEC("syscall")
+__description("bitops jit: rol64 uses rorv on arm64")
+__success __retval(6)
+__arch_arm64
+__jited(" ror {{.*}}")
+int bitops_jit_rol64_arm64(void *ctx)
+{
+ return bpf_rol64(3, 1);
+}
+
+SEC("syscall")
+__description("bitops jit: ror64 uses rorv on arm64")
+__success __retval(3)
+__arch_arm64
+__jited(" ror {{.*}}")
+int bitops_jit_ror64_arm64(void *ctx)
+{
+ return bpf_ror64(6, 1);
+}
+
+char _license[] SEC("license") = "GPL";
--
2.52.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support
2026-02-19 14:29 ` [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support Leon Hwang
@ 2026-02-19 15:10 ` Puranjay Mohan
2026-02-19 15:20 ` Puranjay Mohan
2026-02-19 15:25 ` Puranjay Mohan
2 siblings, 0 replies; 24+ messages in thread
From: Puranjay Mohan @ 2026-02-19 15:10 UTC (permalink / raw)
To: Leon Hwang, bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Xu Kuohai, Catalin Marinas, Will Deacon, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin,
Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst, Viktor Malik,
linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
kernel-patches-bot
> Implement JIT inlining of the 64-bit bitops kfuncs on arm64.
>
> bpf_clz64(), bpf_ffs64(), bpf_fls64(), and bpf_bitrev64() are always
> inlined using mandatory ARMv8 CLZ/RBIT instructions. bpf_ctz64() is
> inlined via RBIT + CLZ, or via the native CTZ instruction when
> FEAT_CSSC is available. bpf_rol64() and bpf_ror64() are always inlined
> via RORV.
>
> bpf_popcnt64() is not inlined as the native population count instruction
> requires NEON/SIMD registers, which should not be touched from BPF
> programs. It therefore falls back to a regular function call.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> arch/arm64/net/bpf_jit_comp.c | 123 ++++++++++++++++++++++++++++++++++
> 1 file changed, 123 insertions(+)
>
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 7a530ea4f5ae..f03f732063d9 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -1192,6 +1192,127 @@ static int add_exception_handler(const struct bpf_insn *insn,
> return 0;
> }
>
> +static inline u32 a64_clz64(u8 rd, u8 rn)
> +{
> + /*
> + * Arm Architecture Reference Manual for A-profile architecture
> + * (Document number: ARM DDI 0487)
> + *
> + * A64 Base Instruction Descriptions
> + * C6.2 Alphabetical list of A64 base instructions
> + *
> + * C6.2.91 CLZ
> + *
> + * Count leading zeros
> + *
> + * This instruction counts the number of consecutive binary zero bits,
> + * starting from the most significant bit in the source register,
> + * and places the count in the destination register.
> + */
> + /* CLZ Xd, Xn */
> + return 0xdac01000 | (rn << 5) | rd;
> +}
> +
> +static inline u32 a64_ctz64(u8 rd, u8 rn)
> +{
> + /*
> + * Arm Architecture Reference Manual for A-profile architecture
> + * (Document number: ARM DDI 0487)
> + *
> + * A64 Base Instruction Descriptions
> + * C6.2 Alphabetical list of A64 base instructions
> + *
> + * C6.2.144 CTZ
> + *
> + * Count trailing zeros
> + *
> + * This instruction counts the number of consecutive binary zero bits,
> + * starting from the least significant bit in the source register,
> + * and places the count in the destination register.
> + *
> + * This instruction requires FEAT_CSSC.
> + */
> + /* CTZ Xd, Xn */
> + return 0xdac01800 | (rn << 5) | rd;
> +}
> +
> +static inline u32 a64_rbit64(u8 rd, u8 rn)
> +{
> + /*
> + * Arm Architecture Reference Manual for A-profile architecture
> + * (Document number: ARM DDI 0487)
> + *
> + * A64 Base Instruction Descriptions
> + * C6.2 Alphabetical list of A64 base instructions
> + *
> + * C6.2.320 RBIT
> + *
> + * Reverse bits
> + *
> + * This instruction reverses the bit order in a register.
> + */
> + /* RBIT Xd, Xn */
> + return 0xdac00000 | (rn << 5) | rd;
> +}
Instead of hardcoding the instructions with the above functions, do it the
proper way something like the following patch (not compile tested):
-- >8 --
diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 18c7811774d3..b2696af0b817 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -221,6 +221,9 @@ enum aarch64_insn_data1_type {
AARCH64_INSN_DATA1_REVERSE_16,
AARCH64_INSN_DATA1_REVERSE_32,
AARCH64_INSN_DATA1_REVERSE_64,
+ AARCH64_INSN_DATA1_RBIT,
+ AARCH64_INSN_DATA1_CLZ,
+ AARCH64_INSN_DATA1_CTZ,
};
enum aarch64_insn_data2_type {
@@ -389,6 +392,9 @@ __AARCH64_INSN_FUNCS(rorv, 0x7FE0FC00, 0x1AC02C00)
__AARCH64_INSN_FUNCS(rev16, 0x7FFFFC00, 0x5AC00400)
__AARCH64_INSN_FUNCS(rev32, 0x7FFFFC00, 0x5AC00800)
__AARCH64_INSN_FUNCS(rev64, 0x7FFFFC00, 0x5AC00C00)
+__AARCH64_INSN_FUNCS(rbit, 0x7FFFFC00, 0x5AC00000)
+__AARCH64_INSN_FUNCS(clz, 0x7FFFFC00, 0x5AC01000)
+__AARCH64_INSN_FUNCS(ctz, 0x7FFFFC00, 0x5AC01800)
__AARCH64_INSN_FUNCS(and, 0x7F200000, 0x0A000000)
__AARCH64_INSN_FUNCS(bic, 0x7F200000, 0x0A200000)
__AARCH64_INSN_FUNCS(orr, 0x7F200000, 0x2A000000)
diff --git a/arch/arm64/lib/insn.c b/arch/arm64/lib/insn.c
index 4e298baddc2e..2229ab596cda 100644
--- a/arch/arm64/lib/insn.c
+++ b/arch/arm64/lib/insn.c
@@ -1008,6 +1008,15 @@ u32 aarch64_insn_gen_data1(enum aarch64_insn_register dst,
}
insn = aarch64_insn_get_rev64_value();
break;
+ case AARCH64_INSN_DATA1_CLZ:
+ insn = aarch64_insn_get_clz_value();
+ break;
+ case AARCH64_INSN_DATA1_RBIT:
+ insn = aarch64_insn_get_rbit_value();
+ break;
+ case AARCH64_INSN_DATA1_CTZ:
+ insn = aarch64_insn_get_ctz_value();
+ break;
default:
pr_err("%s: unknown data1 encoding %d\n", __func__, type);
return AARCH64_BREAK_FAULT;
diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index bbea4f36f9f2..af806c39dadb 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -248,6 +248,12 @@
#define A64_REV16(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_16)
#define A64_REV32(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_32)
#define A64_REV64(Rd, Rn) A64_DATA1(1, Rd, Rn, REVERSE_64)
+/* Rd = RBIT(Rn) */
+#define A64_RBIT(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, RBIT)
+/* Rd = CLZ(Rn) */
+#define A64_CLZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CLZ)
+/* Rd = CTZ(Rn) */
+#define A64_CTZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CTZ)
/* Data-processing (2 source) */
/* Rd = Rn OP Rm */
-- 8< --
Thanks,
Puranjay
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support
2026-02-19 14:29 ` [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support Leon Hwang
2026-02-19 15:10 ` Puranjay Mohan
@ 2026-02-19 15:20 ` Puranjay Mohan
2026-02-19 15:25 ` Puranjay Mohan
2 siblings, 0 replies; 24+ messages in thread
From: Puranjay Mohan @ 2026-02-19 15:20 UTC (permalink / raw)
To: Leon Hwang, bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Xu Kuohai, Catalin Marinas, Will Deacon, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin,
Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst, Viktor Malik,
linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
kernel-patches-bot
Leon Hwang <leon.hwang@linux.dev> writes:
> Implement JIT inlining of the 64-bit bitops kfuncs on arm64.
>
> bpf_clz64(), bpf_ffs64(), bpf_fls64(), and bpf_bitrev64() are always
> inlined using mandatory ARMv8 CLZ/RBIT instructions. bpf_ctz64() is
> inlined via RBIT + CLZ, or via the native CTZ instruction when
> FEAT_CSSC is available. bpf_rol64() and bpf_ror64() are always inlined
> via RORV.
>
> bpf_popcnt64() is not inlined as the native population count instruction
> requires NEON/SIMD registers, which should not be touched from BPF
> programs. It therefore falls back to a regular function call.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> arch/arm64/net/bpf_jit_comp.c | 123 ++++++++++++++++++++++++++++++++++
> 1 file changed, 123 insertions(+)
>
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 7a530ea4f5ae..f03f732063d9 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -1192,6 +1192,127 @@ static int add_exception_handler(const struct bpf_insn *insn,
> return 0;
> }
>
> +static inline u32 a64_clz64(u8 rd, u8 rn)
> +{
> + /*
> + * Arm Architecture Reference Manual for A-profile architecture
> + * (Document number: ARM DDI 0487)
> + *
> + * A64 Base Instruction Descriptions
> + * C6.2 Alphabetical list of A64 base instructions
> + *
> + * C6.2.91 CLZ
> + *
> + * Count leading zeros
> + *
> + * This instruction counts the number of consecutive binary zero bits,
> + * starting from the most significant bit in the source register,
> + * and places the count in the destination register.
> + */
> + /* CLZ Xd, Xn */
> + return 0xdac01000 | (rn << 5) | rd;
> +}
> +
> +static inline u32 a64_ctz64(u8 rd, u8 rn)
> +{
> + /*
> + * Arm Architecture Reference Manual for A-profile architecture
> + * (Document number: ARM DDI 0487)
> + *
> + * A64 Base Instruction Descriptions
> + * C6.2 Alphabetical list of A64 base instructions
> + *
> + * C6.2.144 CTZ
> + *
> + * Count trailing zeros
> + *
> + * This instruction counts the number of consecutive binary zero bits,
> + * starting from the least significant bit in the source register,
> + * and places the count in the destination register.
> + *
> + * This instruction requires FEAT_CSSC.
> + */
> + /* CTZ Xd, Xn */
> + return 0xdac01800 | (rn << 5) | rd;
> +}
> +
> +static inline u32 a64_rbit64(u8 rd, u8 rn)
> +{
> + /*
> + * Arm Architecture Reference Manual for A-profile architecture
> + * (Document number: ARM DDI 0487)
> + *
> + * A64 Base Instruction Descriptions
> + * C6.2 Alphabetical list of A64 base instructions
> + *
> + * C6.2.320 RBIT
> + *
> + * Reverse bits
> + *
> + * This instruction reverses the bit order in a register.
> + */
> + /* RBIT Xd, Xn */
> + return 0xdac00000 | (rn << 5) | rd;
> +}
I don't think adding the above three functions is the best to JIT these
intructions, do it like the other data1 and data2 instructions and add
them to the generic framework like the following patch(untested) does:
-- >8 --
diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 18c7811774d3..b2696af0b817 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -221,6 +221,9 @@ enum aarch64_insn_data1_type {
AARCH64_INSN_DATA1_REVERSE_16,
AARCH64_INSN_DATA1_REVERSE_32,
AARCH64_INSN_DATA1_REVERSE_64,
+ AARCH64_INSN_DATA1_RBIT,
+ AARCH64_INSN_DATA1_CLZ,
+ AARCH64_INSN_DATA1_CTZ,
};
enum aarch64_insn_data2_type {
@@ -389,6 +392,9 @@ __AARCH64_INSN_FUNCS(rorv, 0x7FE0FC00, 0x1AC02C00)
__AARCH64_INSN_FUNCS(rev16, 0x7FFFFC00, 0x5AC00400)
__AARCH64_INSN_FUNCS(rev32, 0x7FFFFC00, 0x5AC00800)
__AARCH64_INSN_FUNCS(rev64, 0x7FFFFC00, 0x5AC00C00)
+__AARCH64_INSN_FUNCS(rbit, 0x7FFFFC00, 0x5AC00000)
+__AARCH64_INSN_FUNCS(clz, 0x7FFFFC00, 0x5AC01000)
+__AARCH64_INSN_FUNCS(ctz, 0x7FFFFC00, 0x5AC01800)
__AARCH64_INSN_FUNCS(and, 0x7F200000, 0x0A000000)
__AARCH64_INSN_FUNCS(bic, 0x7F200000, 0x0A200000)
__AARCH64_INSN_FUNCS(orr, 0x7F200000, 0x2A000000)
diff --git a/arch/arm64/lib/insn.c b/arch/arm64/lib/insn.c
index 4e298baddc2e..2229ab596cda 100644
--- a/arch/arm64/lib/insn.c
+++ b/arch/arm64/lib/insn.c
@@ -1008,6 +1008,15 @@ u32 aarch64_insn_gen_data1(enum aarch64_insn_register dst,
}
insn = aarch64_insn_get_rev64_value();
break;
+ case AARCH64_INSN_DATA1_CLZ:
+ insn = aarch64_insn_get_clz_value();
+ break;
+ case AARCH64_INSN_DATA1_RBIT:
+ insn = aarch64_insn_get_rbit_value();
+ break;
+ case AARCH64_INSN_DATA1_CTZ:
+ insn = aarch64_insn_get_ctz_value();
+ break;
default:
pr_err("%s: unknown data1 encoding %d\n", __func__, type);
return AARCH64_BREAK_FAULT;
diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index bbea4f36f9f2..af806c39dadb 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -248,6 +248,12 @@
#define A64_REV16(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_16)
#define A64_REV32(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_32)
#define A64_REV64(Rd, Rn) A64_DATA1(1, Rd, Rn, REVERSE_64)
+/* Rd = RBIT(Rn) */
+#define A64_RBIT(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, RBIT)
+/* Rd = CLZ(Rn) */
+#define A64_CLZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CLZ)
+/* Rd = CTZ(Rn) */
+#define A64_CTZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CTZ)
/* Data-processing (2 source) */
/* Rd = Rn OP Rm */
-- 8< --
Thanks,
Puranjay
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support
2026-02-19 14:29 ` [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support Leon Hwang
2026-02-19 15:10 ` Puranjay Mohan
2026-02-19 15:20 ` Puranjay Mohan
@ 2026-02-19 15:25 ` Puranjay Mohan
2026-02-19 15:36 ` Leon Hwang
2 siblings, 1 reply; 24+ messages in thread
From: Puranjay Mohan @ 2026-02-19 15:25 UTC (permalink / raw)
To: Leon Hwang, bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Xu Kuohai, Catalin Marinas, Will Deacon, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin,
Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst, Viktor Malik,
linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
kernel-patches-bot
Leon Hwang <leon.hwang@linux.dev> writes:
> Implement JIT inlining of the 64-bit bitops kfuncs on arm64.
>
> bpf_clz64(), bpf_ffs64(), bpf_fls64(), and bpf_bitrev64() are always
> inlined using mandatory ARMv8 CLZ/RBIT instructions. bpf_ctz64() is
> inlined via RBIT + CLZ, or via the native CTZ instruction when
> FEAT_CSSC is available. bpf_rol64() and bpf_ror64() are always inlined
> via RORV.
>
> bpf_popcnt64() is not inlined as the native population count instruction
> requires NEON/SIMD registers, which should not be touched from BPF
> programs. It therefore falls back to a regular function call.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> arch/arm64/net/bpf_jit_comp.c | 123 ++++++++++++++++++++++++++++++++++
> 1 file changed, 123 insertions(+)
>
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 7a530ea4f5ae..f03f732063d9 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -1192,6 +1192,127 @@ static int add_exception_handler(const struct bpf_insn *insn,
> return 0;
> }
>
> +static inline u32 a64_clz64(u8 rd, u8 rn)
> +{
> + /*
> + * Arm Architecture Reference Manual for A-profile architecture
> + * (Document number: ARM DDI 0487)
> + *
> + * A64 Base Instruction Descriptions
> + * C6.2 Alphabetical list of A64 base instructions
> + *
> + * C6.2.91 CLZ
> + *
> + * Count leading zeros
> + *
> + * This instruction counts the number of consecutive binary zero bits,
> + * starting from the most significant bit in the source register,
> + * and places the count in the destination register.
> + */
> + /* CLZ Xd, Xn */
> + return 0xdac01000 | (rn << 5) | rd;
> +}
> +
> +static inline u32 a64_ctz64(u8 rd, u8 rn)
> +{
> + /*
> + * Arm Architecture Reference Manual for A-profile architecture
> + * (Document number: ARM DDI 0487)
> + *
> + * A64 Base Instruction Descriptions
> + * C6.2 Alphabetical list of A64 base instructions
> + *
> + * C6.2.144 CTZ
> + *
> + * Count trailing zeros
> + *
> + * This instruction counts the number of consecutive binary zero bits,
> + * starting from the least significant bit in the source register,
> + * and places the count in the destination register.
> + *
> + * This instruction requires FEAT_CSSC.
> + */
> + /* CTZ Xd, Xn */
> + return 0xdac01800 | (rn << 5) | rd;
> +}
> +
> +static inline u32 a64_rbit64(u8 rd, u8 rn)
> +{
> + /*
> + * Arm Architecture Reference Manual for A-profile architecture
> + * (Document number: ARM DDI 0487)
> + *
> + * A64 Base Instruction Descriptions
> + * C6.2 Alphabetical list of A64 base instructions
> + *
> + * C6.2.320 RBIT
> + *
> + * Reverse bits
> + *
> + * This instruction reverses the bit order in a register.
> + */
> + /* RBIT Xd, Xn */
> + return 0xdac00000 | (rn << 5) | rd;
> +}
I don't think adding the above three functions is the best to JIT these
intructions, do it like the other data1 and data2 instructions and add
them to the generic framework like the following patch(untested) does:
-- >8 --
diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 18c7811774d3..b2696af0b817 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -221,6 +221,9 @@ enum aarch64_insn_data1_type {
AARCH64_INSN_DATA1_REVERSE_16,
AARCH64_INSN_DATA1_REVERSE_32,
AARCH64_INSN_DATA1_REVERSE_64,
+ AARCH64_INSN_DATA1_RBIT,
+ AARCH64_INSN_DATA1_CLZ,
+ AARCH64_INSN_DATA1_CTZ,
};
enum aarch64_insn_data2_type {
@@ -389,6 +392,9 @@ __AARCH64_INSN_FUNCS(rorv, 0x7FE0FC00, 0x1AC02C00)
__AARCH64_INSN_FUNCS(rev16, 0x7FFFFC00, 0x5AC00400)
__AARCH64_INSN_FUNCS(rev32, 0x7FFFFC00, 0x5AC00800)
__AARCH64_INSN_FUNCS(rev64, 0x7FFFFC00, 0x5AC00C00)
+__AARCH64_INSN_FUNCS(rbit, 0x7FFFFC00, 0x5AC00000)
+__AARCH64_INSN_FUNCS(clz, 0x7FFFFC00, 0x5AC01000)
+__AARCH64_INSN_FUNCS(ctz, 0x7FFFFC00, 0x5AC01800)
__AARCH64_INSN_FUNCS(and, 0x7F200000, 0x0A000000)
__AARCH64_INSN_FUNCS(bic, 0x7F200000, 0x0A200000)
__AARCH64_INSN_FUNCS(orr, 0x7F200000, 0x2A000000)
diff --git a/arch/arm64/lib/insn.c b/arch/arm64/lib/insn.c
index 4e298baddc2e..2229ab596cda 100644
--- a/arch/arm64/lib/insn.c
+++ b/arch/arm64/lib/insn.c
@@ -1008,6 +1008,15 @@ u32 aarch64_insn_gen_data1(enum aarch64_insn_register dst,
}
insn = aarch64_insn_get_rev64_value();
break;
+ case AARCH64_INSN_DATA1_CLZ:
+ insn = aarch64_insn_get_clz_value();
+ break;
+ case AARCH64_INSN_DATA1_RBIT:
+ insn = aarch64_insn_get_rbit_value();
+ break;
+ case AARCH64_INSN_DATA1_CTZ:
+ insn = aarch64_insn_get_ctz_value();
+ break;
default:
pr_err("%s: unknown data1 encoding %d\n", __func__, type);
return AARCH64_BREAK_FAULT;
diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index bbea4f36f9f2..af806c39dadb 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -248,6 +248,12 @@
#define A64_REV16(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_16)
#define A64_REV32(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_32)
#define A64_REV64(Rd, Rn) A64_DATA1(1, Rd, Rn, REVERSE_64)
+/* Rd = RBIT(Rn) */
+#define A64_RBIT(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, RBIT)
+/* Rd = CLZ(Rn) */
+#define A64_CLZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CLZ)
+/* Rd = CTZ(Rn) */
+#define A64_CTZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CTZ)
/* Data-processing (2 source) */
/* Rd = Rn OP Rm */
-- 8< --
Thanks,
Puranjay
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support
2026-02-19 15:25 ` Puranjay Mohan
@ 2026-02-19 15:36 ` Leon Hwang
0 siblings, 0 replies; 24+ messages in thread
From: Leon Hwang @ 2026-02-19 15:36 UTC (permalink / raw)
To: Puranjay Mohan, bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Xu Kuohai, Catalin Marinas, Will Deacon, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin,
Shuah Khan, Peilin Ye, Luis Gerhorst, Viktor Malik,
linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
kernel-patches-bot
On 2026/2/19 23:25, Puranjay Mohan wrote:
> Leon Hwang <leon.hwang@linux.dev> writes:
>
>> Implement JIT inlining of the 64-bit bitops kfuncs on arm64.
>>
>> bpf_clz64(), bpf_ffs64(), bpf_fls64(), and bpf_bitrev64() are always
>> inlined using mandatory ARMv8 CLZ/RBIT instructions. bpf_ctz64() is
>> inlined via RBIT + CLZ, or via the native CTZ instruction when
>> FEAT_CSSC is available. bpf_rol64() and bpf_ror64() are always inlined
>> via RORV.
>>
>> bpf_popcnt64() is not inlined as the native population count instruction
>> requires NEON/SIMD registers, which should not be touched from BPF
>> programs. It therefore falls back to a regular function call.
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>> arch/arm64/net/bpf_jit_comp.c | 123 ++++++++++++++++++++++++++++++++++
>> 1 file changed, 123 insertions(+)
>>
>> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
>> index 7a530ea4f5ae..f03f732063d9 100644
>> --- a/arch/arm64/net/bpf_jit_comp.c
>> +++ b/arch/arm64/net/bpf_jit_comp.c
>> @@ -1192,6 +1192,127 @@ static int add_exception_handler(const struct bpf_insn *insn,
>> return 0;
>> }
>>
>> +static inline u32 a64_clz64(u8 rd, u8 rn)
>> +{
>> + /*
>> + * Arm Architecture Reference Manual for A-profile architecture
>> + * (Document number: ARM DDI 0487)
>> + *
>> + * A64 Base Instruction Descriptions
>> + * C6.2 Alphabetical list of A64 base instructions
>> + *
>> + * C6.2.91 CLZ
>> + *
>> + * Count leading zeros
>> + *
>> + * This instruction counts the number of consecutive binary zero bits,
>> + * starting from the most significant bit in the source register,
>> + * and places the count in the destination register.
>> + */
>> + /* CLZ Xd, Xn */
>> + return 0xdac01000 | (rn << 5) | rd;
>> +}
>> +
>> +static inline u32 a64_ctz64(u8 rd, u8 rn)
>> +{
>> + /*
>> + * Arm Architecture Reference Manual for A-profile architecture
>> + * (Document number: ARM DDI 0487)
>> + *
>> + * A64 Base Instruction Descriptions
>> + * C6.2 Alphabetical list of A64 base instructions
>> + *
>> + * C6.2.144 CTZ
>> + *
>> + * Count trailing zeros
>> + *
>> + * This instruction counts the number of consecutive binary zero bits,
>> + * starting from the least significant bit in the source register,
>> + * and places the count in the destination register.
>> + *
>> + * This instruction requires FEAT_CSSC.
>> + */
>> + /* CTZ Xd, Xn */
>> + return 0xdac01800 | (rn << 5) | rd;
>> +}
>> +
>> +static inline u32 a64_rbit64(u8 rd, u8 rn)
>> +{
>> + /*
>> + * Arm Architecture Reference Manual for A-profile architecture
>> + * (Document number: ARM DDI 0487)
>> + *
>> + * A64 Base Instruction Descriptions
>> + * C6.2 Alphabetical list of A64 base instructions
>> + *
>> + * C6.2.320 RBIT
>> + *
>> + * Reverse bits
>> + *
>> + * This instruction reverses the bit order in a register.
>> + */
>> + /* RBIT Xd, Xn */
>> + return 0xdac00000 | (rn << 5) | rd;
>> +}
>
> I don't think adding the above three functions is the best to JIT these
> intructions, do it like the other data1 and data2 instructions and add
> them to the generic framework like the following patch(untested) does:
> > -- >8 --
>
> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> index 18c7811774d3..b2696af0b817 100644
> --- a/arch/arm64/include/asm/insn.h
> +++ b/arch/arm64/include/asm/insn.h
> @@ -221,6 +221,9 @@ enum aarch64_insn_data1_type {
> AARCH64_INSN_DATA1_REVERSE_16,
> AARCH64_INSN_DATA1_REVERSE_32,
> AARCH64_INSN_DATA1_REVERSE_64,
> + AARCH64_INSN_DATA1_RBIT,
> + AARCH64_INSN_DATA1_CLZ,
> + AARCH64_INSN_DATA1_CTZ,
> };
>
> enum aarch64_insn_data2_type {
> @@ -389,6 +392,9 @@ __AARCH64_INSN_FUNCS(rorv, 0x7FE0FC00, 0x1AC02C00)
> __AARCH64_INSN_FUNCS(rev16, 0x7FFFFC00, 0x5AC00400)
> __AARCH64_INSN_FUNCS(rev32, 0x7FFFFC00, 0x5AC00800)
> __AARCH64_INSN_FUNCS(rev64, 0x7FFFFC00, 0x5AC00C00)
> +__AARCH64_INSN_FUNCS(rbit, 0x7FFFFC00, 0x5AC00000)
> +__AARCH64_INSN_FUNCS(clz, 0x7FFFFC00, 0x5AC01000)
> +__AARCH64_INSN_FUNCS(ctz, 0x7FFFFC00, 0x5AC01800)
> __AARCH64_INSN_FUNCS(and, 0x7F200000, 0x0A000000)
> __AARCH64_INSN_FUNCS(bic, 0x7F200000, 0x0A200000)
> __AARCH64_INSN_FUNCS(orr, 0x7F200000, 0x2A000000)
> diff --git a/arch/arm64/lib/insn.c b/arch/arm64/lib/insn.c
> index 4e298baddc2e..2229ab596cda 100644
> --- a/arch/arm64/lib/insn.c
> +++ b/arch/arm64/lib/insn.c
> @@ -1008,6 +1008,15 @@ u32 aarch64_insn_gen_data1(enum aarch64_insn_register dst,
> }
> insn = aarch64_insn_get_rev64_value();
> break;
> + case AARCH64_INSN_DATA1_CLZ:
> + insn = aarch64_insn_get_clz_value();
> + break;
> + case AARCH64_INSN_DATA1_RBIT:
> + insn = aarch64_insn_get_rbit_value();
> + break;
> + case AARCH64_INSN_DATA1_CTZ:
> + insn = aarch64_insn_get_ctz_value();
> + break;
> default:
> pr_err("%s: unknown data1 encoding %d\n", __func__, type);
> return AARCH64_BREAK_FAULT;
> diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
> index bbea4f36f9f2..af806c39dadb 100644
> --- a/arch/arm64/net/bpf_jit.h
> +++ b/arch/arm64/net/bpf_jit.h
> @@ -248,6 +248,12 @@
> #define A64_REV16(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_16)
> #define A64_REV32(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_32)
> #define A64_REV64(Rd, Rn) A64_DATA1(1, Rd, Rn, REVERSE_64)
> +/* Rd = RBIT(Rn) */
> +#define A64_RBIT(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, RBIT)
> +/* Rd = CLZ(Rn) */
> +#define A64_CLZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CLZ)
> +/* Rd = CTZ(Rn) */
> +#define A64_CTZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CTZ)
>
> /* Data-processing (2 source) */
> /* Rd = Rn OP Rm */
>
> -- 8< --
>
> Thanks,
> Puranjay
Ack.
I'll do it in the next revision.
Thanks,
Leon
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
2026-02-19 14:29 ` [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64 Leon Hwang
@ 2026-02-19 17:47 ` Alexei Starovoitov
2026-02-20 15:54 ` Leon Hwang
2026-02-19 22:05 ` kernel test robot
2026-02-20 11:59 ` kernel test robot
2 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2026-02-19 17:47 UTC (permalink / raw)
To: Leon Hwang
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye, Luis Gerhorst,
Viktor Malik, linux-arm-kernel, LKML, Network Development,
open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot
On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.
>
> bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
>
> bpf_ctz64() and bpf_ffs64() are supported when the CPU has
> X86_FEATURE_BMI1 (TZCNT).
>
> bpf_clz64() and bpf_fls64() are supported when the CPU has
> X86_FEATURE_ABM (LZCNT).
>
> bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
>
> bpf_bitrev64() is not inlined as x86_64 has no native bit-reverse
> instruction, so it falls back to a regular function call.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 141 insertions(+)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 070ba80e39d7..193e1e2d7aa8 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -19,6 +19,7 @@
> #include <asm/text-patching.h>
> #include <asm/unwind.h>
> #include <asm/cfi.h>
> +#include <asm/cpufeatures.h>
>
> static bool all_callee_regs_used[4] = {true, true, true, true};
>
> @@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
> *pprog = prog;
> }
>
> +static bool bpf_inlines_func_call(u8 **pprog, void *func)
> +{
> + bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
> + bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
> + bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
> + bool inlined = true;
> + u8 *prog = *pprog;
> +
> + /*
> + * x86 Bit manipulation instruction set
> + * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
> + */
> +
> + if (func == bpf_clz64 && has_abm) {
> + /*
> + * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
> + *
> + * LZCNT - Count the Number of Leading Zero Bits
> + *
> + * Opcode/Instruction
> + * F3 REX.W 0F BD /r
> + * LZCNT r64, r/m64
> + *
> + * Op/En
> + * RVM
> + *
> + * 64/32-bit Mode
> + * V/N.E.
> + *
> + * CPUID Feature Flag
> + * LZCNT
> + *
> + * Description
> + * Count the number of leading zero bits in r/m64, return
> + * result in r64.
> + */
> + /* emit: x ? 64 - fls64(x) : 64 */
> + /* lzcnt rax, rdi */
> + EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
Instead of emitting binary in x86 and arm JITs,
let's use in kernel disasm to check that all these kfuncs
conform to kf_fastcall (don't use unnecessary registers,
don't have calls to other functions) and then copy the binary
from code and skip the last 'ret' insn.
This way we can inline all kinds of kfuncs.
pw-bot: cr
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 1/6] bpf: Introduce 64-bit bitops kfuncs
2026-02-19 14:29 ` [PATCH bpf-next v2 1/6] " Leon Hwang
@ 2026-02-19 17:50 ` Alexei Starovoitov
2026-02-20 15:34 ` Leon Hwang
2026-02-21 9:58 ` Dan Carpenter
1 sibling, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2026-02-19 17:50 UTC (permalink / raw)
To: Leon Hwang
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye, Luis Gerhorst,
Viktor Malik, linux-arm-kernel, LKML, Network Development,
open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot
On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
> +static bool bpf_kfunc_is_fastcall(struct bpf_verifier_env *env, u32 func_id, u32 flags)
> +{
> + if (!(flags & KF_FASTCALL))
> + return false;
> +
> + if (!env->prog->jit_requested)
> + return true;
> +
> + if (func_id == special_kfunc_list[KF_bpf_clz64])
> + return bpf_jit_inlines_kfunc_call(bpf_clz64);
> + if (func_id == special_kfunc_list[KF_bpf_ctz64])
> + return bpf_jit_inlines_kfunc_call(bpf_ctz64);
> + if (func_id == special_kfunc_list[KF_bpf_ffs64])
> + return bpf_jit_inlines_kfunc_call(bpf_ffs64);
> + if (func_id == special_kfunc_list[KF_bpf_fls64])
> + return bpf_jit_inlines_kfunc_call(bpf_fls64);
> + if (func_id == special_kfunc_list[KF_bpf_bitrev64])
> + return bpf_jit_inlines_kfunc_call(bpf_bitrev64);
> + if (func_id == special_kfunc_list[KF_bpf_popcnt64])
> + return bpf_jit_inlines_kfunc_call(bpf_popcnt64);
> + if (func_id == special_kfunc_list[KF_bpf_rol64])
> + return bpf_jit_inlines_kfunc_call(bpf_rol64);
> + if (func_id == special_kfunc_list[KF_bpf_ror64])
> + return bpf_jit_inlines_kfunc_call(bpf_ror64);
This is too ugly. Find a way to do it differently.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
2026-02-19 14:29 ` [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64 Leon Hwang
2026-02-19 17:47 ` Alexei Starovoitov
@ 2026-02-19 22:05 ` kernel test robot
2026-02-20 14:12 ` Leon Hwang
2026-02-20 11:59 ` kernel test robot
2 siblings, 1 reply; 24+ messages in thread
From: kernel test robot @ 2026-02-19 22:05 UTC (permalink / raw)
To: Leon Hwang, bpf
Cc: oe-kbuild-all, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Puranjay Mohan, Xu Kuohai, Catalin Marinas,
Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H . Peter Anvin, Shuah Khan, Leon Hwang,
Peilin Ye, Luis Gerhorst, Viktor Malik, linux-arm-kernel,
linux-kernel
Hi Leon,
kernel test robot noticed the following build errors:
[auto build test ERROR on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Leon-Hwang/bpf-Introduce-64-bit-bitops-kfuncs/20260219-223550
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20260219142933.13904-3-leon.hwang%40linux.dev
patch subject: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
config: x86_64-randconfig-073-20260220 (https://download.01.org/0day-ci/archive/20260220/202602200536.JWzGHAc6-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260220/202602200536.JWzGHAc6-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602200536.JWzGHAc6-lkp@intel.com/
All errors (new ones prefixed by >>):
ld: arch/x86/net/bpf_jit_comp.o: in function `bpf_inlines_func_call':
>> arch/x86/net/bpf_jit_comp.c:1621:(.text+0xe70b): undefined reference to `bpf_clz64'
>> ld: arch/x86/net/bpf_jit_comp.c:1647:(.text+0xe718): undefined reference to `bpf_ctz64'
>> ld: arch/x86/net/bpf_jit_comp.c:1673:(.text+0xe725): undefined reference to `bpf_ffs64'
>> ld: arch/x86/net/bpf_jit_comp.c:1677:(.text+0xe732): undefined reference to `bpf_fls64'
>> ld: arch/x86/net/bpf_jit_comp.c:1683:(.text+0xe743): undefined reference to `bpf_popcnt64'
>> ld: arch/x86/net/bpf_jit_comp.c:1707:(.text+0xe758): undefined reference to `bpf_rol64'
>> ld: arch/x86/net/bpf_jit_comp.c:1714:(.text+0xe765): undefined reference to `bpf_ror64'
ld: arch/x86/net/bpf_jit_comp.c:1647:(.text+0x10e85): undefined reference to `bpf_ctz64'
ld: arch/x86/net/bpf_jit_comp.c:1673:(.text+0x10e92): undefined reference to `bpf_ffs64'
ld: arch/x86/net/bpf_jit_comp.o: in function `bpf_jit_inlines_kfunc_call':
>> arch/x86/net/bpf_jit_comp.c:4247:(.text+0x177c8): undefined reference to `bpf_ffs64'
ld: arch/x86/net/bpf_jit_comp.c:4247:(.text+0x177d1): undefined reference to `bpf_ctz64'
ld: arch/x86/net/bpf_jit_comp.c:4250:(.text+0x177da): undefined reference to `bpf_fls64'
>> ld: arch/x86/net/bpf_jit_comp.c:4250:(.text+0x177e3): undefined reference to `bpf_clz64'
ld: arch/x86/net/bpf_jit_comp.c:4253:(.text+0x177ec): undefined reference to `bpf_popcnt64'
ld: arch/x86/net/bpf_jit_comp.c:4256:(.text+0x177f5): undefined reference to `bpf_ror64'
ld: arch/x86/net/bpf_jit_comp.c:4256:(.text+0x177ff): undefined reference to `bpf_rol64'
vim +1621 arch/x86/net/bpf_jit_comp.c
1607
1608 static bool bpf_inlines_func_call(u8 **pprog, void *func)
1609 {
1610 bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
1611 bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
1612 bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
1613 bool inlined = true;
1614 u8 *prog = *pprog;
1615
1616 /*
1617 * x86 Bit manipulation instruction set
1618 * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
1619 */
1620
> 1621 if (func == bpf_clz64 && has_abm) {
1622 /*
1623 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
1624 *
1625 * LZCNT - Count the Number of Leading Zero Bits
1626 *
1627 * Opcode/Instruction
1628 * F3 REX.W 0F BD /r
1629 * LZCNT r64, r/m64
1630 *
1631 * Op/En
1632 * RVM
1633 *
1634 * 64/32-bit Mode
1635 * V/N.E.
1636 *
1637 * CPUID Feature Flag
1638 * LZCNT
1639 *
1640 * Description
1641 * Count the number of leading zero bits in r/m64, return
1642 * result in r64.
1643 */
1644 /* emit: x ? 64 - fls64(x) : 64 */
1645 /* lzcnt rax, rdi */
1646 EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
> 1647 } else if (func == bpf_ctz64 && has_bmi1) {
1648 /*
1649 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
1650 *
1651 * TZCNT - Count the Number of Trailing Zero Bits
1652 *
1653 * Opcode/Instruction
1654 * F3 REX.W 0F BC /r
1655 * TZCNT r64, r/m64
1656 *
1657 * Op/En
1658 * RVM
1659 *
1660 * 64/32-bit Mode
1661 * V/N.E.
1662 *
1663 * CPUID Feature Flag
1664 * BMI1
1665 *
1666 * Description
1667 * Count the number of trailing zero bits in r/m64, return
1668 * result in r64.
1669 */
1670 /* emit: x ? __ffs64(x) : 64 */
1671 /* tzcnt rax, rdi */
1672 EMIT5(0xF3, 0x48, 0x0F, 0xBC, 0xC7);
> 1673 } else if (func == bpf_ffs64 && has_bmi1) {
1674 /* emit: __ffs64(x); x == 0 has been handled in verifier */
1675 /* tzcnt rax, rdi */
1676 EMIT5(0xF3, 0x48, 0x0F, 0xBC, 0xC7);
> 1677 } else if (func == bpf_fls64 && has_abm) {
1678 /* emit: fls64(x) */
1679 /* lzcnt rax, rdi */
1680 EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
1681 EMIT3(0x48, 0xF7, 0xD8); /* neg rax */
1682 EMIT4(0x48, 0x83, 0xC0, 0x40); /* add rax, 64 */
> 1683 } else if (func == bpf_popcnt64 && has_popcnt) {
1684 /*
1685 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
1686 *
1687 * POPCNT - Return the Count of Number of Bits Set to 1
1688 *
1689 * Opcode/Instruction
1690 * F3 REX.W 0F B8 /r
1691 * POPCNT r64, r/m64
1692 *
1693 * Op/En
1694 * RM
1695 *
1696 * 64 Mode
1697 * Valid
1698 *
1699 * Compat/Leg Mode
1700 * N.E.
1701 *
1702 * Description
1703 * POPCNT on r/m64
1704 */
1705 /* popcnt rax, rdi */
1706 EMIT5(0xF3, 0x48, 0x0F, 0xB8, 0xC7);
> 1707 } else if (func == bpf_rol64) {
1708 EMIT1(0x51); /* push rcx */
1709 /* emit: rol64(x, s) */
1710 EMIT3(0x48, 0x89, 0xF1); /* mov rcx, rsi */
1711 EMIT3(0x48, 0x89, 0xF8); /* mov rax, rdi */
1712 EMIT3(0x48, 0xD3, 0xC0); /* rol rax, cl */
1713 EMIT1(0x59); /* pop rcx */
> 1714 } else if (func == bpf_ror64) {
1715 EMIT1(0x51); /* push rcx */
1716 /* emit: ror64(x, s) */
1717 EMIT3(0x48, 0x89, 0xF1); /* mov rcx, rsi */
1718 EMIT3(0x48, 0x89, 0xF8); /* mov rax, rdi */
1719 EMIT3(0x48, 0xD3, 0xC8); /* ror rax, cl */
1720 EMIT1(0x59); /* pop rcx */
1721 } else {
1722 inlined = false;
1723 }
1724
1725 *pprog = prog;
1726 return inlined;
1727 }
1728
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
2026-02-19 14:29 ` [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64 Leon Hwang
2026-02-19 17:47 ` Alexei Starovoitov
2026-02-19 22:05 ` kernel test robot
@ 2026-02-20 11:59 ` kernel test robot
2 siblings, 0 replies; 24+ messages in thread
From: kernel test robot @ 2026-02-20 11:59 UTC (permalink / raw)
To: Leon Hwang, bpf
Cc: llvm, oe-kbuild-all, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Puranjay Mohan, Xu Kuohai, Catalin Marinas,
Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H . Peter Anvin, Shuah Khan, Leon Hwang,
Peilin Ye, Luis Gerhorst, Viktor Malik, linux-arm-kernel,
linux-kernel
Hi Leon,
kernel test robot noticed the following build errors:
[auto build test ERROR on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Leon-Hwang/bpf-Introduce-64-bit-bitops-kfuncs/20260219-223550
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20260219142933.13904-3-leon.hwang%40linux.dev
patch subject: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
config: x86_64-randconfig-012-20260220 (https://download.01.org/0day-ci/archive/20260220/202602201931.LBZGbpvs-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
rustc: rustc 1.88.0 (6b00bc388 2025-06-23)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260220/202602201931.LBZGbpvs-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602201931.LBZGbpvs-lkp@intel.com/
All errors (new ones prefixed by >>):
>> ld.lld: error: undefined symbol: bpf_clz64
>>> referenced by bpf_jit_comp.c:1621 (arch/x86/net/bpf_jit_comp.c:1621)
>>> arch/x86/net/bpf_jit_comp.o:(do_jit) in archive vmlinux.a
>>> referenced by bpf_jit_comp.c:4250 (arch/x86/net/bpf_jit_comp.c:4250)
>>> arch/x86/net/bpf_jit_comp.o:(bpf_jit_inlines_kfunc_call) in archive vmlinux.a
--
>> ld.lld: error: undefined symbol: bpf_ctz64
>>> referenced by bpf_jit_comp.c:1647 (arch/x86/net/bpf_jit_comp.c:1647)
>>> arch/x86/net/bpf_jit_comp.o:(do_jit) in archive vmlinux.a
>>> referenced by bpf_jit_comp.c:4247 (arch/x86/net/bpf_jit_comp.c:4247)
>>> arch/x86/net/bpf_jit_comp.o:(bpf_jit_inlines_kfunc_call) in archive vmlinux.a
--
>> ld.lld: error: undefined symbol: bpf_ffs64
>>> referenced by bpf_jit_comp.c:1673 (arch/x86/net/bpf_jit_comp.c:1673)
>>> arch/x86/net/bpf_jit_comp.o:(do_jit) in archive vmlinux.a
>>> referenced by bpf_jit_comp.c:4247 (arch/x86/net/bpf_jit_comp.c:4247)
>>> arch/x86/net/bpf_jit_comp.o:(bpf_jit_inlines_kfunc_call) in archive vmlinux.a
--
>> ld.lld: error: undefined symbol: bpf_fls64
>>> referenced by bpf_jit_comp.c:1677 (arch/x86/net/bpf_jit_comp.c:1677)
>>> arch/x86/net/bpf_jit_comp.o:(do_jit) in archive vmlinux.a
>>> referenced by bpf_jit_comp.c:4250 (arch/x86/net/bpf_jit_comp.c:4250)
>>> arch/x86/net/bpf_jit_comp.o:(bpf_jit_inlines_kfunc_call) in archive vmlinux.a
--
>> ld.lld: error: undefined symbol: bpf_popcnt64
>>> referenced by bpf_jit_comp.c:1683 (arch/x86/net/bpf_jit_comp.c:1683)
>>> arch/x86/net/bpf_jit_comp.o:(do_jit) in archive vmlinux.a
>>> referenced by bpf_jit_comp.c:4253 (arch/x86/net/bpf_jit_comp.c:4253)
>>> arch/x86/net/bpf_jit_comp.o:(bpf_jit_inlines_kfunc_call) in archive vmlinux.a
--
>> ld.lld: error: undefined symbol: bpf_rol64
>>> referenced by bpf_jit_comp.c:1707 (arch/x86/net/bpf_jit_comp.c:1707)
>>> arch/x86/net/bpf_jit_comp.o:(do_jit) in archive vmlinux.a
>>> referenced by bpf_jit_comp.c:4256 (arch/x86/net/bpf_jit_comp.c:4256)
>>> arch/x86/net/bpf_jit_comp.o:(bpf_jit_inlines_kfunc_call) in archive vmlinux.a
--
>> ld.lld: error: undefined symbol: bpf_ror64
>>> referenced by bpf_jit_comp.c:1714 (arch/x86/net/bpf_jit_comp.c:1714)
>>> arch/x86/net/bpf_jit_comp.o:(do_jit) in archive vmlinux.a
>>> referenced by bpf_jit_comp.c:4256 (arch/x86/net/bpf_jit_comp.c:4256)
>>> arch/x86/net/bpf_jit_comp.o:(bpf_jit_inlines_kfunc_call) in archive vmlinux.a
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
2026-02-19 22:05 ` kernel test robot
@ 2026-02-20 14:12 ` Leon Hwang
0 siblings, 0 replies; 24+ messages in thread
From: Leon Hwang @ 2026-02-20 14:12 UTC (permalink / raw)
To: kernel test robot, bpf
Cc: oe-kbuild-all, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Puranjay Mohan, Xu Kuohai, Catalin Marinas,
Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H . Peter Anvin, Shuah Khan, Peilin Ye,
Luis Gerhorst, Viktor Malik, linux-arm-kernel, linux-kernel
On 2026/2/20 06:05, kernel test robot wrote:
> Hi Leon,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on bpf-next/master]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Leon-Hwang/bpf-Introduce-64-bit-bitops-kfuncs/20260219-223550
> base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> patch link: https://lore.kernel.org/r/20260219142933.13904-3-leon.hwang%40linux.dev
> patch subject: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
> config: x86_64-randconfig-073-20260220 (https://download.01.org/0day-ci/archive/20260220/202602200536.JWzGHAc6-lkp@intel.com/config)
Ack.
It was caused by the missing CONFIG_BPF_SYSCALL.
$ rg _BPF .config
118:CONFIG_BPF=y
120:CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y
125:# CONFIG_BPF_SYSCALL is not set
126:CONFIG_BPF_JIT=y
127:CONFIG_BPF_JIT_DEFAULT_ON=y
1339:CONFIG_LWTUNNEL_BPF=y
7449:CONFIG_IO_URING_BPF=y
I'll make those symbols relied on CONFIG_BPF_SYSCALL in the next revision.
Thanks,
Leon
> compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260220/202602200536.JWzGHAc6-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202602200536.JWzGHAc6-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> ld: arch/x86/net/bpf_jit_comp.o: in function `bpf_inlines_func_call':
>>> arch/x86/net/bpf_jit_comp.c:1621:(.text+0xe70b): undefined reference to `bpf_clz64'
>>> ld: arch/x86/net/bpf_jit_comp.c:1647:(.text+0xe718): undefined reference to `bpf_ctz64'
>>> ld: arch/x86/net/bpf_jit_comp.c:1673:(.text+0xe725): undefined reference to `bpf_ffs64'
>>> ld: arch/x86/net/bpf_jit_comp.c:1677:(.text+0xe732): undefined reference to `bpf_fls64'
>>> ld: arch/x86/net/bpf_jit_comp.c:1683:(.text+0xe743): undefined reference to `bpf_popcnt64'
>>> ld: arch/x86/net/bpf_jit_comp.c:1707:(.text+0xe758): undefined reference to `bpf_rol64'
>>> ld: arch/x86/net/bpf_jit_comp.c:1714:(.text+0xe765): undefined reference to `bpf_ror64'
> ld: arch/x86/net/bpf_jit_comp.c:1647:(.text+0x10e85): undefined reference to `bpf_ctz64'
> ld: arch/x86/net/bpf_jit_comp.c:1673:(.text+0x10e92): undefined reference to `bpf_ffs64'
> ld: arch/x86/net/bpf_jit_comp.o: in function `bpf_jit_inlines_kfunc_call':
>>> arch/x86/net/bpf_jit_comp.c:4247:(.text+0x177c8): undefined reference to `bpf_ffs64'
> ld: arch/x86/net/bpf_jit_comp.c:4247:(.text+0x177d1): undefined reference to `bpf_ctz64'
> ld: arch/x86/net/bpf_jit_comp.c:4250:(.text+0x177da): undefined reference to `bpf_fls64'
>>> ld: arch/x86/net/bpf_jit_comp.c:4250:(.text+0x177e3): undefined reference to `bpf_clz64'
> ld: arch/x86/net/bpf_jit_comp.c:4253:(.text+0x177ec): undefined reference to `bpf_popcnt64'
> ld: arch/x86/net/bpf_jit_comp.c:4256:(.text+0x177f5): undefined reference to `bpf_ror64'
> ld: arch/x86/net/bpf_jit_comp.c:4256:(.text+0x177ff): undefined reference to `bpf_rol64'
>
>
> vim +1621 arch/x86/net/bpf_jit_comp.c
>
> 1607
> 1608 static bool bpf_inlines_func_call(u8 **pprog, void *func)
> 1609 {
> 1610 bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
> 1611 bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
> 1612 bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
> 1613 bool inlined = true;
> 1614 u8 *prog = *pprog;
> 1615
> 1616 /*
> 1617 * x86 Bit manipulation instruction set
> 1618 * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
> 1619 */
> 1620
>> 1621 if (func == bpf_clz64 && has_abm) {
> 1622 /*
> 1623 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
> 1624 *
> 1625 * LZCNT - Count the Number of Leading Zero Bits
> 1626 *
> 1627 * Opcode/Instruction
> 1628 * F3 REX.W 0F BD /r
> 1629 * LZCNT r64, r/m64
> 1630 *
> 1631 * Op/En
> 1632 * RVM
> 1633 *
> 1634 * 64/32-bit Mode
> 1635 * V/N.E.
> 1636 *
> 1637 * CPUID Feature Flag
> 1638 * LZCNT
> 1639 *
> 1640 * Description
> 1641 * Count the number of leading zero bits in r/m64, return
> 1642 * result in r64.
> 1643 */
> 1644 /* emit: x ? 64 - fls64(x) : 64 */
> 1645 /* lzcnt rax, rdi */
> 1646 EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
>> 1647 } else if (func == bpf_ctz64 && has_bmi1) {
> 1648 /*
> 1649 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
> 1650 *
> 1651 * TZCNT - Count the Number of Trailing Zero Bits
> 1652 *
> 1653 * Opcode/Instruction
> 1654 * F3 REX.W 0F BC /r
> 1655 * TZCNT r64, r/m64
> 1656 *
> 1657 * Op/En
> 1658 * RVM
> 1659 *
> 1660 * 64/32-bit Mode
> 1661 * V/N.E.
> 1662 *
> 1663 * CPUID Feature Flag
> 1664 * BMI1
> 1665 *
> 1666 * Description
> 1667 * Count the number of trailing zero bits in r/m64, return
> 1668 * result in r64.
> 1669 */
> 1670 /* emit: x ? __ffs64(x) : 64 */
> 1671 /* tzcnt rax, rdi */
> 1672 EMIT5(0xF3, 0x48, 0x0F, 0xBC, 0xC7);
>> 1673 } else if (func == bpf_ffs64 && has_bmi1) {
> 1674 /* emit: __ffs64(x); x == 0 has been handled in verifier */
> 1675 /* tzcnt rax, rdi */
> 1676 EMIT5(0xF3, 0x48, 0x0F, 0xBC, 0xC7);
>> 1677 } else if (func == bpf_fls64 && has_abm) {
> 1678 /* emit: fls64(x) */
> 1679 /* lzcnt rax, rdi */
> 1680 EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
> 1681 EMIT3(0x48, 0xF7, 0xD8); /* neg rax */
> 1682 EMIT4(0x48, 0x83, 0xC0, 0x40); /* add rax, 64 */
>> 1683 } else if (func == bpf_popcnt64 && has_popcnt) {
> 1684 /*
> 1685 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
> 1686 *
> 1687 * POPCNT - Return the Count of Number of Bits Set to 1
> 1688 *
> 1689 * Opcode/Instruction
> 1690 * F3 REX.W 0F B8 /r
> 1691 * POPCNT r64, r/m64
> 1692 *
> 1693 * Op/En
> 1694 * RM
> 1695 *
> 1696 * 64 Mode
> 1697 * Valid
> 1698 *
> 1699 * Compat/Leg Mode
> 1700 * N.E.
> 1701 *
> 1702 * Description
> 1703 * POPCNT on r/m64
> 1704 */
> 1705 /* popcnt rax, rdi */
> 1706 EMIT5(0xF3, 0x48, 0x0F, 0xB8, 0xC7);
>> 1707 } else if (func == bpf_rol64) {
> 1708 EMIT1(0x51); /* push rcx */
> 1709 /* emit: rol64(x, s) */
> 1710 EMIT3(0x48, 0x89, 0xF1); /* mov rcx, rsi */
> 1711 EMIT3(0x48, 0x89, 0xF8); /* mov rax, rdi */
> 1712 EMIT3(0x48, 0xD3, 0xC0); /* rol rax, cl */
> 1713 EMIT1(0x59); /* pop rcx */
>> 1714 } else if (func == bpf_ror64) {
> 1715 EMIT1(0x51); /* push rcx */
> 1716 /* emit: ror64(x, s) */
> 1717 EMIT3(0x48, 0x89, 0xF1); /* mov rcx, rsi */
> 1718 EMIT3(0x48, 0x89, 0xF8); /* mov rax, rdi */
> 1719 EMIT3(0x48, 0xD3, 0xC8); /* ror rax, cl */
> 1720 EMIT1(0x59); /* pop rcx */
> 1721 } else {
> 1722 inlined = false;
> 1723 }
> 1724
> 1725 *pprog = prog;
> 1726 return inlined;
> 1727 }
> 1728
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 1/6] bpf: Introduce 64-bit bitops kfuncs
2026-02-19 17:50 ` Alexei Starovoitov
@ 2026-02-20 15:34 ` Leon Hwang
0 siblings, 0 replies; 24+ messages in thread
From: Leon Hwang @ 2026-02-20 15:34 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye, Luis Gerhorst,
Viktor Malik, linux-arm-kernel, LKML, Network Development,
open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot
On 2026/2/20 01:50, Alexei Starovoitov wrote:
> On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>> +static bool bpf_kfunc_is_fastcall(struct bpf_verifier_env *env, u32 func_id, u32 flags)
>> +{
>> + if (!(flags & KF_FASTCALL))
>> + return false;
>> +
>> + if (!env->prog->jit_requested)
>> + return true;
>> +
>> + if (func_id == special_kfunc_list[KF_bpf_clz64])
>> + return bpf_jit_inlines_kfunc_call(bpf_clz64);
>> + if (func_id == special_kfunc_list[KF_bpf_ctz64])
>> + return bpf_jit_inlines_kfunc_call(bpf_ctz64);
>> + if (func_id == special_kfunc_list[KF_bpf_ffs64])
>> + return bpf_jit_inlines_kfunc_call(bpf_ffs64);
>> + if (func_id == special_kfunc_list[KF_bpf_fls64])
>> + return bpf_jit_inlines_kfunc_call(bpf_fls64);
>> + if (func_id == special_kfunc_list[KF_bpf_bitrev64])
>> + return bpf_jit_inlines_kfunc_call(bpf_bitrev64);
>> + if (func_id == special_kfunc_list[KF_bpf_popcnt64])
>> + return bpf_jit_inlines_kfunc_call(bpf_popcnt64);
>> + if (func_id == special_kfunc_list[KF_bpf_rol64])
>> + return bpf_jit_inlines_kfunc_call(bpf_rol64);
>> + if (func_id == special_kfunc_list[KF_bpf_ror64])
>> + return bpf_jit_inlines_kfunc_call(bpf_ror64);
>
> This is too ugly. Find a way to do it differently.
Agreed.
I'd like to introduce a new flag KF_JIT_MAY_INLINE to indicate the kfunc
will be inlined by JIT backends if possible. As for those kfuncs w/
KF_FASTCALL w/o KF_JIT_MAY_INLINE, they are fastcall always.
Thanks,
Leon
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
2026-02-19 17:47 ` Alexei Starovoitov
@ 2026-02-20 15:54 ` Leon Hwang
2026-02-20 17:50 ` Alexei Starovoitov
0 siblings, 1 reply; 24+ messages in thread
From: Leon Hwang @ 2026-02-20 15:54 UTC (permalink / raw)
To: Alexei Starovoitov, Ilya Leoshkevich
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye, Luis Gerhorst,
Viktor Malik, linux-arm-kernel, LKML, Network Development,
open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot
On 2026/2/20 01:47, Alexei Starovoitov wrote:
> On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.
>>
>> bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
>>
>> bpf_ctz64() and bpf_ffs64() are supported when the CPU has
>> X86_FEATURE_BMI1 (TZCNT).
>>
>> bpf_clz64() and bpf_fls64() are supported when the CPU has
>> X86_FEATURE_ABM (LZCNT).
>>
>> bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
>>
>> bpf_bitrev64() is not inlined as x86_64 has no native bit-reverse
>> instruction, so it falls back to a regular function call.
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>> arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
>> 1 file changed, 141 insertions(+)
>>
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index 070ba80e39d7..193e1e2d7aa8 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -19,6 +19,7 @@
>> #include <asm/text-patching.h>
>> #include <asm/unwind.h>
>> #include <asm/cfi.h>
>> +#include <asm/cpufeatures.h>
>>
>> static bool all_callee_regs_used[4] = {true, true, true, true};
>>
>> @@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
>> *pprog = prog;
>> }
>>
>> +static bool bpf_inlines_func_call(u8 **pprog, void *func)
>> +{
>> + bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
>> + bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
>> + bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
>> + bool inlined = true;
>> + u8 *prog = *pprog;
>> +
>> + /*
>> + * x86 Bit manipulation instruction set
>> + * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
>> + */
>> +
>> + if (func == bpf_clz64 && has_abm) {
>> + /*
>> + * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
>> + *
>> + * LZCNT - Count the Number of Leading Zero Bits
>> + *
>> + * Opcode/Instruction
>> + * F3 REX.W 0F BD /r
>> + * LZCNT r64, r/m64
>> + *
>> + * Op/En
>> + * RVM
>> + *
>> + * 64/32-bit Mode
>> + * V/N.E.
>> + *
>> + * CPUID Feature Flag
>> + * LZCNT
>> + *
>> + * Description
>> + * Count the number of leading zero bits in r/m64, return
>> + * result in r64.
>> + */
>> + /* emit: x ? 64 - fls64(x) : 64 */
>> + /* lzcnt rax, rdi */
>> + EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
>
> Instead of emitting binary in x86 and arm JITs,
> let's use in kernel disasm to check that all these kfuncs
> conform to kf_fastcall (don't use unnecessary registers,
> don't have calls to other functions) and then copy the binary
> from code and skip the last 'ret' insn.
> This way we can inline all kinds of kfuncs.
>
Good idea.
Quick question on “in-kernel disasm”: do you mean adding a kernel
instruction decoder/disassembler to validate a whitelist of kfuncs at
load time?
I’m trying to understand the intended scope:
* Is the expectation that we add an in-kernel disassembler/validator for
a small set of supported instructions and patterns (no calls/jumps,
only arg/ret regs touched, etc.)?
* Or is there already infrastructure you had in mind that we can reuse?
Once I understand that piece, I can rework the series to inline by
copying validated machine code (minus the final ret), rather than
emitting raw opcodes in the JITs.
I also noticed you mentioned a similar direction in "bpf/s390: Implement
get_preempt_count()" [1], so I’ve added Ilya to the thread to discuss
this approach further.
[1]
https://lore.kernel.org/bpf/CAADnVQKSMCohZy_HZwzNpFfTSnVu7rfxgmHEDgT9s28XxcDS5g@mail.gmail.com/
Thanks,
Leon
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
2026-02-20 15:54 ` Leon Hwang
@ 2026-02-20 17:50 ` Alexei Starovoitov
2026-02-21 12:45 ` Leon Hwang
0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2026-02-20 17:50 UTC (permalink / raw)
To: Leon Hwang
Cc: Ilya Leoshkevich, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Puranjay Mohan, Xu Kuohai, Catalin Marinas,
Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye,
Luis Gerhorst, Viktor Malik, linux-arm-kernel, LKML,
Network Development, open list:KERNEL SELFTEST FRAMEWORK,
kernel-patches-bot
On Fri, Feb 20, 2026 at 7:54 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 2026/2/20 01:47, Alexei Starovoitov wrote:
> > On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >> Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.
> >>
> >> bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
> >>
> >> bpf_ctz64() and bpf_ffs64() are supported when the CPU has
> >> X86_FEATURE_BMI1 (TZCNT).
> >>
> >> bpf_clz64() and bpf_fls64() are supported when the CPU has
> >> X86_FEATURE_ABM (LZCNT).
> >>
> >> bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
> >>
> >> bpf_bitrev64() is not inlined as x86_64 has no native bit-reverse
> >> instruction, so it falls back to a regular function call.
> >>
> >> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> >> ---
> >> arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 141 insertions(+)
> >>
> >> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> >> index 070ba80e39d7..193e1e2d7aa8 100644
> >> --- a/arch/x86/net/bpf_jit_comp.c
> >> +++ b/arch/x86/net/bpf_jit_comp.c
> >> @@ -19,6 +19,7 @@
> >> #include <asm/text-patching.h>
> >> #include <asm/unwind.h>
> >> #include <asm/cfi.h>
> >> +#include <asm/cpufeatures.h>
> >>
> >> static bool all_callee_regs_used[4] = {true, true, true, true};
> >>
> >> @@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
> >> *pprog = prog;
> >> }
> >>
> >> +static bool bpf_inlines_func_call(u8 **pprog, void *func)
> >> +{
> >> + bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
> >> + bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
> >> + bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
> >> + bool inlined = true;
> >> + u8 *prog = *pprog;
> >> +
> >> + /*
> >> + * x86 Bit manipulation instruction set
> >> + * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
> >> + */
> >> +
> >> + if (func == bpf_clz64 && has_abm) {
> >> + /*
> >> + * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
> >> + *
> >> + * LZCNT - Count the Number of Leading Zero Bits
> >> + *
> >> + * Opcode/Instruction
> >> + * F3 REX.W 0F BD /r
> >> + * LZCNT r64, r/m64
> >> + *
> >> + * Op/En
> >> + * RVM
> >> + *
> >> + * 64/32-bit Mode
> >> + * V/N.E.
> >> + *
> >> + * CPUID Feature Flag
> >> + * LZCNT
> >> + *
> >> + * Description
> >> + * Count the number of leading zero bits in r/m64, return
> >> + * result in r64.
> >> + */
> >> + /* emit: x ? 64 - fls64(x) : 64 */
> >> + /* lzcnt rax, rdi */
> >> + EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
> >
> > Instead of emitting binary in x86 and arm JITs,
> > let's use in kernel disasm to check that all these kfuncs
> > conform to kf_fastcall (don't use unnecessary registers,
> > don't have calls to other functions) and then copy the binary
> > from code and skip the last 'ret' insn.
> > This way we can inline all kinds of kfuncs.
> >
>
> Good idea.
>
> Quick question on “in-kernel disasm”: do you mean adding a kernel
> instruction decoder/disassembler to validate a whitelist of kfuncs at
> load time?
>
> I’m trying to understand the intended scope:
>
> * Is the expectation that we add an in-kernel disassembler/validator for
> a small set of supported instructions and patterns (no calls/jumps,
> only arg/ret regs touched, etc.)?
> * Or is there already infrastructure you had in mind that we can reuse?
>
> Once I understand that piece, I can rework the series to inline by
> copying validated machine code (minus the final ret), rather than
> emitting raw opcodes in the JITs.
>
> I also noticed you mentioned a similar direction in "bpf/s390: Implement
> get_preempt_count()" [1], so I’ve added Ilya to the thread to discuss
> this approach further.
You really sound like LLM. Do your homework as a human.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 1/6] bpf: Introduce 64-bit bitops kfuncs
2026-02-19 14:29 ` [PATCH bpf-next v2 1/6] " Leon Hwang
2026-02-19 17:50 ` Alexei Starovoitov
@ 2026-02-21 9:58 ` Dan Carpenter
2026-02-21 12:50 ` Leon Hwang
1 sibling, 1 reply; 24+ messages in thread
From: Dan Carpenter @ 2026-02-21 9:58 UTC (permalink / raw)
To: oe-kbuild, Leon Hwang, bpf
Cc: lkp, oe-kbuild-all, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Puranjay Mohan, Xu Kuohai, Catalin Marinas,
Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H . Peter Anvin, Shuah Khan, Leon Hwang,
Peilin Ye, Luis Gerhorst, Viktor Malik, linux-arm-kernel,
linux-kernel
Hi Leon,
kernel test robot noticed the following build warnings:
url: https://github.com/intel-lab-lkp/linux/commits/Leon-Hwang/bpf-Introduce-64-bit-bitops-kfuncs/20260219-223550
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20260219142933.13904-2-leon.hwang%40linux.dev
patch subject: [PATCH bpf-next v2 1/6] bpf: Introduce 64-bit bitops kfuncs
config: i386-randconfig-141-20260220 (https://download.01.org/0day-ci/archive/20260221/202602210241.E7Q88vvq-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
smatch version: v0.5.0-8994-gd50c5a4c
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202602210241.E7Q88vvq-lkp@intel.com/
smatch warnings:
kernel/bpf/verifier.c:18245 bpf_kfunc_is_fastcall() error: buffer overflow 'special_kfunc_list' 64 <= 64
vim +/special_kfunc_list +18245 kernel/bpf/verifier.c
966e89879bbea4 Leon Hwang 2026-02-19 18223 static bool bpf_kfunc_is_fastcall(struct bpf_verifier_env *env, u32 func_id, u32 flags)
966e89879bbea4 Leon Hwang 2026-02-19 18224 {
966e89879bbea4 Leon Hwang 2026-02-19 18225 if (!(flags & KF_FASTCALL))
966e89879bbea4 Leon Hwang 2026-02-19 18226 return false;
966e89879bbea4 Leon Hwang 2026-02-19 18227
966e89879bbea4 Leon Hwang 2026-02-19 18228 if (!env->prog->jit_requested)
966e89879bbea4 Leon Hwang 2026-02-19 18229 return true;
966e89879bbea4 Leon Hwang 2026-02-19 18230
966e89879bbea4 Leon Hwang 2026-02-19 18231 if (func_id == special_kfunc_list[KF_bpf_clz64])
966e89879bbea4 Leon Hwang 2026-02-19 18232 return bpf_jit_inlines_kfunc_call(bpf_clz64);
966e89879bbea4 Leon Hwang 2026-02-19 18233 if (func_id == special_kfunc_list[KF_bpf_ctz64])
966e89879bbea4 Leon Hwang 2026-02-19 18234 return bpf_jit_inlines_kfunc_call(bpf_ctz64);
966e89879bbea4 Leon Hwang 2026-02-19 18235 if (func_id == special_kfunc_list[KF_bpf_ffs64])
966e89879bbea4 Leon Hwang 2026-02-19 18236 return bpf_jit_inlines_kfunc_call(bpf_ffs64);
966e89879bbea4 Leon Hwang 2026-02-19 18237 if (func_id == special_kfunc_list[KF_bpf_fls64])
966e89879bbea4 Leon Hwang 2026-02-19 18238 return bpf_jit_inlines_kfunc_call(bpf_fls64);
966e89879bbea4 Leon Hwang 2026-02-19 18239 if (func_id == special_kfunc_list[KF_bpf_bitrev64])
966e89879bbea4 Leon Hwang 2026-02-19 18240 return bpf_jit_inlines_kfunc_call(bpf_bitrev64);
966e89879bbea4 Leon Hwang 2026-02-19 18241 if (func_id == special_kfunc_list[KF_bpf_popcnt64])
966e89879bbea4 Leon Hwang 2026-02-19 18242 return bpf_jit_inlines_kfunc_call(bpf_popcnt64);
966e89879bbea4 Leon Hwang 2026-02-19 18243 if (func_id == special_kfunc_list[KF_bpf_rol64])
966e89879bbea4 Leon Hwang 2026-02-19 18244 return bpf_jit_inlines_kfunc_call(bpf_rol64);
966e89879bbea4 Leon Hwang 2026-02-19 @18245 if (func_id == special_kfunc_list[KF_bpf_ror64])
^^^^^^^^^^^^
special_kfunc_list[] has 64 elements and KF_bpf_ror64 is 64 so
this is out of bounds.
966e89879bbea4 Leon Hwang 2026-02-19 18246 return bpf_jit_inlines_kfunc_call(bpf_ror64);
966e89879bbea4 Leon Hwang 2026-02-19 18247
966e89879bbea4 Leon Hwang 2026-02-19 18248 return true;
966e89879bbea4 Leon Hwang 2026-02-19 18249 }
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
2026-02-20 17:50 ` Alexei Starovoitov
@ 2026-02-21 12:45 ` Leon Hwang
2026-02-21 16:51 ` Alexei Starovoitov
0 siblings, 1 reply; 24+ messages in thread
From: Leon Hwang @ 2026-02-21 12:45 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Ilya Leoshkevich, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Puranjay Mohan, Xu Kuohai, Catalin Marinas,
Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye,
Luis Gerhorst, Viktor Malik, linux-arm-kernel, LKML,
Network Development, open list:KERNEL SELFTEST FRAMEWORK,
kernel-patches-bot
On 2026/2/21 01:50, Alexei Starovoitov wrote:
> On Fri, Feb 20, 2026 at 7:54 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>>
>> On 2026/2/20 01:47, Alexei Starovoitov wrote:
>>> On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>>>
>>>> Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.
>>>>
>>>> bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
>>>>
>>>> bpf_ctz64() and bpf_ffs64() are supported when the CPU has
>>>> X86_FEATURE_BMI1 (TZCNT).
>>>>
>>>> bpf_clz64() and bpf_fls64() are supported when the CPU has
>>>> X86_FEATURE_ABM (LZCNT).
>>>>
>>>> bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
>>>>
>>>> bpf_bitrev64() is not inlined as x86_64 has no native bit-reverse
>>>> instruction, so it falls back to a regular function call.
>>>>
>>>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>>>> ---
>>>> arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 141 insertions(+)
>>>>
>>>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>>>> index 070ba80e39d7..193e1e2d7aa8 100644
>>>> --- a/arch/x86/net/bpf_jit_comp.c
>>>> +++ b/arch/x86/net/bpf_jit_comp.c
>>>> @@ -19,6 +19,7 @@
>>>> #include <asm/text-patching.h>
>>>> #include <asm/unwind.h>
>>>> #include <asm/cfi.h>
>>>> +#include <asm/cpufeatures.h>
>>>>
>>>> static bool all_callee_regs_used[4] = {true, true, true, true};
>>>>
>>>> @@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
>>>> *pprog = prog;
>>>> }
>>>>
>>>> +static bool bpf_inlines_func_call(u8 **pprog, void *func)
>>>> +{
>>>> + bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
>>>> + bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
>>>> + bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
>>>> + bool inlined = true;
>>>> + u8 *prog = *pprog;
>>>> +
>>>> + /*
>>>> + * x86 Bit manipulation instruction set
>>>> + * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
>>>> + */
>>>> +
>>>> + if (func == bpf_clz64 && has_abm) {
>>>> + /*
>>>> + * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
>>>> + *
>>>> + * LZCNT - Count the Number of Leading Zero Bits
>>>> + *
>>>> + * Opcode/Instruction
>>>> + * F3 REX.W 0F BD /r
>>>> + * LZCNT r64, r/m64
>>>> + *
>>>> + * Op/En
>>>> + * RVM
>>>> + *
>>>> + * 64/32-bit Mode
>>>> + * V/N.E.
>>>> + *
>>>> + * CPUID Feature Flag
>>>> + * LZCNT
>>>> + *
>>>> + * Description
>>>> + * Count the number of leading zero bits in r/m64, return
>>>> + * result in r64.
>>>> + */
>>>> + /* emit: x ? 64 - fls64(x) : 64 */
>>>> + /* lzcnt rax, rdi */
>>>> + EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
>>>
>>> Instead of emitting binary in x86 and arm JITs,
>>> let's use in kernel disasm to check that all these kfuncs
>>> conform to kf_fastcall (don't use unnecessary registers,
>>> don't have calls to other functions) and then copy the binary
>>> from code and skip the last 'ret' insn.
>>> This way we can inline all kinds of kfuncs.
>>>
>>
>> Good idea.
>>
>> Quick question on “in-kernel disasm”: do you mean adding a kernel
>> instruction decoder/disassembler to validate a whitelist of kfuncs at
>> load time?
>>
>> I’m trying to understand the intended scope:
>>
>> * Is the expectation that we add an in-kernel disassembler/validator for
>> a small set of supported instructions and patterns (no calls/jumps,
>> only arg/ret regs touched, etc.)?
>> * Or is there already infrastructure you had in mind that we can reuse?
>>
>> Once I understand that piece, I can rework the series to inline by
>> copying validated machine code (minus the final ret), rather than
>> emitting raw opcodes in the JITs.
>>
>> I also noticed you mentioned a similar direction in "bpf/s390: Implement
>> get_preempt_count()" [1], so I’ve added Ilya to the thread to discuss
>> this approach further.
>
> You really sound like LLM. Do your homework as a human.
Got it.
I polished my draft using ChatGPT, which would leave LLM smell in my reply.
Here's my original draft:
Good idea. But I concern about the "in kernel disasm". Do you mean we
will build a disassembler for whitelist kfuncs at starting?
I noticed you've mentioned the same direction in "bpf/s390: Implement
get_preempt_count()" [1]. So, I added Ilya here to discuss this direction.
[1]
https://lore.kernel.org/bpf/CAADnVQKSMCohZy_HZwzNpFfTSnVu7rfxgmHEDgT9s28XxcDS5g@mail.gmail.com/
Thanks,
Leon
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 1/6] bpf: Introduce 64-bit bitops kfuncs
2026-02-21 9:58 ` Dan Carpenter
@ 2026-02-21 12:50 ` Leon Hwang
0 siblings, 0 replies; 24+ messages in thread
From: Leon Hwang @ 2026-02-21 12:50 UTC (permalink / raw)
To: Dan Carpenter, oe-kbuild, bpf
Cc: lkp, oe-kbuild-all, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Puranjay Mohan, Xu Kuohai, Catalin Marinas,
Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H . Peter Anvin, Shuah Khan, Peilin Ye,
Luis Gerhorst, Viktor Malik, linux-arm-kernel, linux-kernel
On 2026/2/21 17:58, Dan Carpenter wrote:
> Hi Leon,
>
> kernel test robot noticed the following build warnings:
>
> url: https://github.com/intel-lab-lkp/linux/commits/Leon-Hwang/bpf-Introduce-64-bit-bitops-kfuncs/20260219-223550
> base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> patch link: https://lore.kernel.org/r/20260219142933.13904-2-leon.hwang%40linux.dev
> patch subject: [PATCH bpf-next v2 1/6] bpf: Introduce 64-bit bitops kfuncs
> config: i386-randconfig-141-20260220 (https://download.01.org/0day-ci/archive/20260221/202602210241.E7Q88vvq-lkp@intel.com/config)
> compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
> smatch version: v0.5.0-8994-gd50c5a4c
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> | Closes: https://lore.kernel.org/r/202602210241.E7Q88vvq-lkp@intel.com/
>
> smatch warnings:
> kernel/bpf/verifier.c:18245 bpf_kfunc_is_fastcall() error: buffer overflow 'special_kfunc_list' 64 <= 64
>
> vim +/special_kfunc_list +18245 kernel/bpf/verifier.c
>
> 966e89879bbea4 Leon Hwang 2026-02-19 18223 static bool bpf_kfunc_is_fastcall(struct bpf_verifier_env *env, u32 func_id, u32 flags)
> 966e89879bbea4 Leon Hwang 2026-02-19 18224 {
> 966e89879bbea4 Leon Hwang 2026-02-19 18225 if (!(flags & KF_FASTCALL))
> 966e89879bbea4 Leon Hwang 2026-02-19 18226 return false;
> 966e89879bbea4 Leon Hwang 2026-02-19 18227
> 966e89879bbea4 Leon Hwang 2026-02-19 18228 if (!env->prog->jit_requested)
> 966e89879bbea4 Leon Hwang 2026-02-19 18229 return true;
> 966e89879bbea4 Leon Hwang 2026-02-19 18230
> 966e89879bbea4 Leon Hwang 2026-02-19 18231 if (func_id == special_kfunc_list[KF_bpf_clz64])
> 966e89879bbea4 Leon Hwang 2026-02-19 18232 return bpf_jit_inlines_kfunc_call(bpf_clz64);
> 966e89879bbea4 Leon Hwang 2026-02-19 18233 if (func_id == special_kfunc_list[KF_bpf_ctz64])
> 966e89879bbea4 Leon Hwang 2026-02-19 18234 return bpf_jit_inlines_kfunc_call(bpf_ctz64);
> 966e89879bbea4 Leon Hwang 2026-02-19 18235 if (func_id == special_kfunc_list[KF_bpf_ffs64])
> 966e89879bbea4 Leon Hwang 2026-02-19 18236 return bpf_jit_inlines_kfunc_call(bpf_ffs64);
> 966e89879bbea4 Leon Hwang 2026-02-19 18237 if (func_id == special_kfunc_list[KF_bpf_fls64])
> 966e89879bbea4 Leon Hwang 2026-02-19 18238 return bpf_jit_inlines_kfunc_call(bpf_fls64);
> 966e89879bbea4 Leon Hwang 2026-02-19 18239 if (func_id == special_kfunc_list[KF_bpf_bitrev64])
> 966e89879bbea4 Leon Hwang 2026-02-19 18240 return bpf_jit_inlines_kfunc_call(bpf_bitrev64);
> 966e89879bbea4 Leon Hwang 2026-02-19 18241 if (func_id == special_kfunc_list[KF_bpf_popcnt64])
> 966e89879bbea4 Leon Hwang 2026-02-19 18242 return bpf_jit_inlines_kfunc_call(bpf_popcnt64);
> 966e89879bbea4 Leon Hwang 2026-02-19 18243 if (func_id == special_kfunc_list[KF_bpf_rol64])
> 966e89879bbea4 Leon Hwang 2026-02-19 18244 return bpf_jit_inlines_kfunc_call(bpf_rol64);
> 966e89879bbea4 Leon Hwang 2026-02-19 @18245 if (func_id == special_kfunc_list[KF_bpf_ror64])
> ^^^^^^^^^^^^
> special_kfunc_list[] has 64 elements and KF_bpf_ror64 is 64 so
> this is out of bounds.
>
Ack.
I'll try a new way using KF_JIT_MAY_INLINE flag in the next revision,
which will avoid adding these kfuncs to special_kfunc_list btw.
Thanks,
Leon
> 966e89879bbea4 Leon Hwang 2026-02-19 18246 return bpf_jit_inlines_kfunc_call(bpf_ror64);
> 966e89879bbea4 Leon Hwang 2026-02-19 18247
> 966e89879bbea4 Leon Hwang 2026-02-19 18248 return true;
> 966e89879bbea4 Leon Hwang 2026-02-19 18249 }
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
2026-02-21 12:45 ` Leon Hwang
@ 2026-02-21 16:51 ` Alexei Starovoitov
2026-02-23 16:35 ` Leon Hwang
0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2026-02-21 16:51 UTC (permalink / raw)
To: Leon Hwang
Cc: Ilya Leoshkevich, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Puranjay Mohan, Xu Kuohai, Catalin Marinas,
Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye,
Luis Gerhorst, Viktor Malik, linux-arm-kernel, LKML,
Network Development, open list:KERNEL SELFTEST FRAMEWORK,
kernel-patches-bot
On Sat, Feb 21, 2026 at 4:45 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 2026/2/21 01:50, Alexei Starovoitov wrote:
> > On Fri, Feb 20, 2026 at 7:54 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >>
> >>
> >> On 2026/2/20 01:47, Alexei Starovoitov wrote:
> >>> On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>>>
> >>>> Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.
> >>>>
> >>>> bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
> >>>>
> >>>> bpf_ctz64() and bpf_ffs64() are supported when the CPU has
> >>>> X86_FEATURE_BMI1 (TZCNT).
> >>>>
> >>>> bpf_clz64() and bpf_fls64() are supported when the CPU has
> >>>> X86_FEATURE_ABM (LZCNT).
> >>>>
> >>>> bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
> >>>>
> >>>> bpf_bitrev64() is not inlined as x86_64 has no native bit-reverse
> >>>> instruction, so it falls back to a regular function call.
> >>>>
> >>>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> >>>> ---
> >>>> arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
> >>>> 1 file changed, 141 insertions(+)
> >>>>
> >>>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> >>>> index 070ba80e39d7..193e1e2d7aa8 100644
> >>>> --- a/arch/x86/net/bpf_jit_comp.c
> >>>> +++ b/arch/x86/net/bpf_jit_comp.c
> >>>> @@ -19,6 +19,7 @@
> >>>> #include <asm/text-patching.h>
> >>>> #include <asm/unwind.h>
> >>>> #include <asm/cfi.h>
> >>>> +#include <asm/cpufeatures.h>
> >>>>
> >>>> static bool all_callee_regs_used[4] = {true, true, true, true};
> >>>>
> >>>> @@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
> >>>> *pprog = prog;
> >>>> }
> >>>>
> >>>> +static bool bpf_inlines_func_call(u8 **pprog, void *func)
> >>>> +{
> >>>> + bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
> >>>> + bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
> >>>> + bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
> >>>> + bool inlined = true;
> >>>> + u8 *prog = *pprog;
> >>>> +
> >>>> + /*
> >>>> + * x86 Bit manipulation instruction set
> >>>> + * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
> >>>> + */
> >>>> +
> >>>> + if (func == bpf_clz64 && has_abm) {
> >>>> + /*
> >>>> + * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
> >>>> + *
> >>>> + * LZCNT - Count the Number of Leading Zero Bits
> >>>> + *
> >>>> + * Opcode/Instruction
> >>>> + * F3 REX.W 0F BD /r
> >>>> + * LZCNT r64, r/m64
> >>>> + *
> >>>> + * Op/En
> >>>> + * RVM
> >>>> + *
> >>>> + * 64/32-bit Mode
> >>>> + * V/N.E.
> >>>> + *
> >>>> + * CPUID Feature Flag
> >>>> + * LZCNT
> >>>> + *
> >>>> + * Description
> >>>> + * Count the number of leading zero bits in r/m64, return
> >>>> + * result in r64.
> >>>> + */
> >>>> + /* emit: x ? 64 - fls64(x) : 64 */
> >>>> + /* lzcnt rax, rdi */
> >>>> + EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
> >>>
> >>> Instead of emitting binary in x86 and arm JITs,
> >>> let's use in kernel disasm to check that all these kfuncs
> >>> conform to kf_fastcall (don't use unnecessary registers,
> >>> don't have calls to other functions) and then copy the binary
> >>> from code and skip the last 'ret' insn.
> >>> This way we can inline all kinds of kfuncs.
> >>>
> >>
> >> Good idea.
> >>
> >> Quick question on “in-kernel disasm”: do you mean adding a kernel
> >> instruction decoder/disassembler to validate a whitelist of kfuncs at
> >> load time?
> >>
> >> I’m trying to understand the intended scope:
> >>
> >> * Is the expectation that we add an in-kernel disassembler/validator for
> >> a small set of supported instructions and patterns (no calls/jumps,
> >> only arg/ret regs touched, etc.)?
> >> * Or is there already infrastructure you had in mind that we can reuse?
> >>
> >> Once I understand that piece, I can rework the series to inline by
> >> copying validated machine code (minus the final ret), rather than
> >> emitting raw opcodes in the JITs.
> >>
> >> I also noticed you mentioned a similar direction in "bpf/s390: Implement
> >> get_preempt_count()" [1], so I’ve added Ilya to the thread to discuss
> >> this approach further.
> >
> > You really sound like LLM. Do your homework as a human.
>
> Got it.
>
> I polished my draft using ChatGPT, which would leave LLM smell in my reply.
... and for anyone reading it the smell is ohh too strong.
> Here's my original draft:
>
> Good idea. But I concern about the "in kernel disasm". Do you mean we
> will build a disassembler for whitelist kfuncs at starting?
>
> I noticed you've mentioned the same direction in "bpf/s390: Implement
> get_preempt_count()" [1]. So, I added Ilya here to discuss this direction.
Much better. Keep it human.
"in kernel disasm" already exists for some architectures
(at least x86 and arm64) since it's being used by kprobes.
The ask here is to figure out whether they're usable for such
insn analysis. x86 disasm is likely capable.
re:"whitelist kfunc"
I suspect an additional list is not necessary.
kf_fastcall is a good enough signal that such kfunc should
be inlinable.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
2026-02-21 16:51 ` Alexei Starovoitov
@ 2026-02-23 16:35 ` Leon Hwang
0 siblings, 0 replies; 24+ messages in thread
From: Leon Hwang @ 2026-02-23 16:35 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Ilya Leoshkevich, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Puranjay Mohan, Xu Kuohai, Catalin Marinas,
Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye,
Luis Gerhorst, Viktor Malik, linux-arm-kernel, LKML,
Network Development, open list:KERNEL SELFTEST FRAMEWORK,
kernel-patches-bot
On 2026/2/22 00:51, Alexei Starovoitov wrote:
> On Sat, Feb 21, 2026 at 4:45 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
[...]
>>
>> Good idea. But I concern about the "in kernel disasm". Do you mean we
>> will build a disassembler for whitelist kfuncs at starting?
>>
>> I noticed you've mentioned the same direction in "bpf/s390: Implement
>> get_preempt_count()" [1]. So, I added Ilya here to discuss this direction.
>
> Much better. Keep it human.
>
> "in kernel disasm" already exists for some architectures
> (at least x86 and arm64) since it's being used by kprobes.
> The ask here is to figure out whether they're usable for such
> insn analysis. x86 disasm is likely capable.
>
After looking into x86&arm insn decoder, they are able to do insn analysis.
> re:"whitelist kfunc"
> I suspect an additional list is not necessary.
> kf_fastcall is a good enough signal that such kfunc should
> be inlinable.
I thought it was to build a light-weight custom disassembler, which
would only support limited machine codes (whitelist kfunc).
Obviously, I was wrong.
We can reuse the in-kernel insn decoding ability to validate fastcall
function by checking the registers use.
I'll post RFC after finishing poc, on both x86_64 and arm64 of course.
Thanks,
Leon
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2026-02-23 16:35 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-19 14:29 [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 1/6] " Leon Hwang
2026-02-19 17:50 ` Alexei Starovoitov
2026-02-20 15:34 ` Leon Hwang
2026-02-21 9:58 ` Dan Carpenter
2026-02-21 12:50 ` Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64 Leon Hwang
2026-02-19 17:47 ` Alexei Starovoitov
2026-02-20 15:54 ` Leon Hwang
2026-02-20 17:50 ` Alexei Starovoitov
2026-02-21 12:45 ` Leon Hwang
2026-02-21 16:51 ` Alexei Starovoitov
2026-02-23 16:35 ` Leon Hwang
2026-02-19 22:05 ` kernel test robot
2026-02-20 14:12 ` Leon Hwang
2026-02-20 11:59 ` kernel test robot
2026-02-19 14:29 ` [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support Leon Hwang
2026-02-19 15:10 ` Puranjay Mohan
2026-02-19 15:20 ` Puranjay Mohan
2026-02-19 15:25 ` Puranjay Mohan
2026-02-19 15:36 ` Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 4/6] selftests/bpf: Add tests for 64-bit bitops kfuncs Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 5/6] selftests/bpf: Add __cpu_feature annotation for CPU-feature-gated tests Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 6/6] selftests/bpf: Add JIT disassembly tests for 64-bit bitops kfuncs Leon Hwang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox