public inbox for netdev@vger.kernel.org
* [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs
@ 2026-02-19 14:29 Leon Hwang
  2026-02-19 14:29 ` [PATCH bpf-next v2 1/6] " Leon Hwang
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Leon Hwang @ 2026-02-19 14:29 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst,
	Viktor Malik, linux-arm-kernel, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot

This series adds generic 64-bit bitops kfuncs and JIT inlining support
on x86_64 and arm64.

The new kfuncs are:

* bpf_clz64(): Count leading zeros.
* bpf_ctz64(): Count trailing zeros.
* bpf_ffs64(): Find first set bit (1-based index); returns 0 when input is 0.
* bpf_fls64(): Find last set bit (1-based index).
* bpf_bitrev64(): Reverse bits.
* bpf_popcnt64(): Population count.
* bpf_rol64(): Rotate left.
* bpf_ror64(): Rotate right.

Defined zero-input behavior:

* bpf_clz64(0) = 64
* bpf_ctz64(0) = 64
* bpf_ffs64(0) = 0
* bpf_fls64(0) = 0
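
The semantics above, including the defined zero-input cases, can be modeled
in portable C (an illustrative reference sketch, not the kernel
implementation):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reference model of the kfunc semantics.  The compiler builtins
 * __builtin_clzll()/__builtin_ctzll() are undefined for a zero input,
 * so the defined zero cases are handled explicitly.
 */
static uint64_t ref_clz64(uint64_t x)
{
	return x ? (uint64_t)__builtin_clzll(x) : 64;	/* bpf_clz64(0) = 64 */
}

static uint64_t ref_ctz64(uint64_t x)
{
	return x ? (uint64_t)__builtin_ctzll(x) : 64;	/* bpf_ctz64(0) = 64 */
}

static uint64_t ref_ffs64(uint64_t x)
{
	return x ? (uint64_t)__builtin_ctzll(x) + 1 : 0; /* 1-based; 0 for x == 0 */
}

static uint64_t ref_fls64(uint64_t x)
{
	return x ? 64 - (uint64_t)__builtin_clzll(x) : 0; /* 1-based; 0 for x == 0 */
}
```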

bpf_ffs64() was previously discussed in
"bpf: Add generic kfunc bpf_ffs64()" [1].

The main concern in that discussion was ABI overhead: a regular kfunc call
follows the BPF calling convention and can introduce extra spill/fill compared
to dedicated instructions.

This series keeps the user-facing API as kfuncs while avoiding that overhead
on hot paths. When the JIT/backend and CPU support it, calls are inlined into
native instructions; otherwise they fall back to regular function calls.

Links:
[1] https://lore.kernel.org/bpf/20240131155607.51157-1-hffilwlqm@gmail.com/

Changes:
v1 -> v2:
* Drop RFC.
* Add __cpu_feature annotation for CPU-feature-gated tests.
* Add JIT disassembly tests for 64-bit bitops kfuncs.
* Address comments from Alexei:
  * Drop KF_MUST_INLINE.
  * Drop internal BPF_ALU64 opcode BPF_BITOPS.
  * Mark all of the kfuncs as fastcall and do push/pop in JIT when necessary.
* v1: https://lore.kernel.org/bpf/20260209155919.19015-1-leon.hwang@linux.dev/

Leon Hwang (6):
  bpf: Introduce 64-bit bitops kfuncs
  bpf, x86: Add 64-bit bitops kfuncs support for x86_64
  bpf, arm64: Add 64-bit bitops kfuncs support
  selftests/bpf: Add tests for 64-bit bitops kfuncs
  selftests/bpf: Add __cpu_feature annotation for CPU-feature-gated
    tests
  selftests/bpf: Add JIT disassembly tests for 64-bit bitops kfuncs

 arch/arm64/net/bpf_jit_comp.c                 | 123 ++++++++++++
 arch/x86/net/bpf_jit_comp.c                   | 141 +++++++++++++
 include/linux/filter.h                        |  10 +
 kernel/bpf/core.c                             |   6 +
 kernel/bpf/helpers.c                          |  50 +++++
 kernel/bpf/verifier.c                         |  53 ++++-
 .../testing/selftests/bpf/bpf_experimental.h  |   9 +
 .../testing/selftests/bpf/prog_tests/bitops.c | 188 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/bitops.c    |  68 +++++++
 .../testing/selftests/bpf/progs/bitops_jit.c  | 153 ++++++++++++++
 tools/testing/selftests/bpf/progs/bpf_misc.h  |   7 +
 tools/testing/selftests/bpf/test_loader.c     | 150 ++++++++++++++
 12 files changed, 957 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bitops.c
 create mode 100644 tools/testing/selftests/bpf/progs/bitops.c
 create mode 100644 tools/testing/selftests/bpf/progs/bitops_jit.c

-- 
2.52.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v2 1/6] bpf: Introduce 64-bit bitops kfuncs
  2026-02-19 14:29 [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs Leon Hwang
@ 2026-02-19 14:29 ` Leon Hwang
  2026-02-19 17:50   ` Alexei Starovoitov
  2026-02-19 14:29 ` [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64 Leon Hwang
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Leon Hwang @ 2026-02-19 14:29 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst,
	Viktor Malik, linux-arm-kernel, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot

Add the following generic 64-bit bitops kfuncs:

* bpf_clz64(): Count leading zeros.
* bpf_ctz64(): Count trailing zeros.
* bpf_ffs64(): Find first set bit (1-based index); returns 0 when input
  is 0.
* bpf_fls64(): Find last set bit (1-based index).
* bpf_bitrev64(): Reverse bits.
* bpf_popcnt64(): Population count.
* bpf_rol64(): Rotate left.
* bpf_ror64(): Rotate right.

Defined zero-input behavior:

* bpf_clz64(0) = 64
* bpf_ctz64(0) = 64
* bpf_ffs64(0) = 0
* bpf_fls64(0) = 0

These kfuncs are inlined by JIT backends when the required CPU features are
available. Otherwise, they fall back to regular function calls.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/filter.h | 10 ++++++++
 kernel/bpf/core.c      |  6 +++++
 kernel/bpf/helpers.c   | 50 +++++++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c  | 53 +++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 44d7ae95ddbc..b8a538bec5c6 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1157,6 +1157,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
 void bpf_jit_compile(struct bpf_prog *prog);
 bool bpf_jit_needs_zext(void);
 bool bpf_jit_inlines_helper_call(s32 imm);
+bool bpf_jit_inlines_kfunc_call(void *func_addr);
 bool bpf_jit_supports_subprog_tailcalls(void);
 bool bpf_jit_supports_percpu_insn(void);
 bool bpf_jit_supports_kfunc_call(void);
@@ -1837,4 +1838,13 @@ static inline void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset)
 }
 #endif /* CONFIG_NET */
 
+u64 bpf_clz64(u64 x);
+u64 bpf_ctz64(u64 x);
+u64 bpf_ffs64(u64 x);
+u64 bpf_fls64(u64 x);
+u64 bpf_popcnt64(u64 x);
+u64 bpf_bitrev64(u64 x);
+u64 bpf_rol64(u64 x, u64 s);
+u64 bpf_ror64(u64 x, u64 s);
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 5ab6bace7d0d..5f37309d83fc 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3114,6 +3114,12 @@ bool __weak bpf_jit_inlines_helper_call(s32 imm)
 	return false;
 }
 
+/* Return TRUE if the JIT backend inlines the kfunc. */
+bool __weak bpf_jit_inlines_kfunc_call(void *func_addr)
+{
+	return false;
+}
+
 /* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */
 bool __weak bpf_jit_supports_subprog_tailcalls(void)
 {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 7ac32798eb04..6bf73c46af72 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -29,6 +29,8 @@
 #include <linux/task_work.h>
 #include <linux/irq_work.h>
 #include <linux/buildid.h>
+#include <linux/bitops.h>
+#include <linux/bitrev.h>
 
 #include "../../lib/kstrtox.h"
 
@@ -4501,6 +4503,46 @@ __bpf_kfunc int bpf_timer_cancel_async(struct bpf_timer *timer)
 	}
 }
 
+__bpf_kfunc u64 bpf_clz64(u64 x)
+{
+	return x ? 64 - fls64(x) : 64;
+}
+
+__bpf_kfunc u64 bpf_ctz64(u64 x)
+{
+	return x ? __ffs64(x) : 64;
+}
+
+__bpf_kfunc u64 bpf_ffs64(u64 x)
+{
+	return x ? __ffs64(x) + 1 : 0;
+}
+
+__bpf_kfunc u64 bpf_fls64(u64 x)
+{
+	return fls64(x);
+}
+
+__bpf_kfunc u64 bpf_popcnt64(u64 x)
+{
+	return hweight64(x);
+}
+
+__bpf_kfunc u64 bpf_bitrev64(u64 x)
+{
+	return ((u64)bitrev32(x & 0xFFFFFFFF) << 32) | bitrev32(x >> 32);
+}
+
+__bpf_kfunc u64 bpf_rol64(u64 x, u64 s)
+{
+	return rol64(x, s);
+}
+
+__bpf_kfunc u64 bpf_ror64(u64 x, u64 s)
+{
+	return ror64(x, s);
+}
+
 __bpf_kfunc_end_defs();
 
 static void bpf_task_work_cancel_scheduled(struct irq_work *irq_work)
@@ -4578,6 +4620,14 @@ BTF_ID_FLAGS(func, bpf_key_put, KF_RELEASE)
 BTF_ID_FLAGS(func, bpf_verify_pkcs7_signature, KF_SLEEPABLE)
 #endif
 #endif
+BTF_ID_FLAGS(func, bpf_clz64, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_ctz64, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_ffs64, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_fls64, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_popcnt64, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_bitrev64, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_rol64, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_ror64, KF_FASTCALL)
 BTF_KFUNCS_END(generic_btf_ids)
 
 static const struct btf_kfunc_id_set generic_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0162f946032f..2cb29bc1b3c3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12461,6 +12461,14 @@ enum special_kfunc_type {
 	KF_bpf_session_is_return,
 	KF_bpf_stream_vprintk,
 	KF_bpf_stream_print_stack,
+	KF_bpf_clz64,
+	KF_bpf_ctz64,
+	KF_bpf_ffs64,
+	KF_bpf_fls64,
+	KF_bpf_bitrev64,
+	KF_bpf_popcnt64,
+	KF_bpf_rol64,
+	KF_bpf_ror64,
 };
 
 BTF_ID_LIST(special_kfunc_list)
@@ -12541,6 +12549,14 @@ BTF_ID(func, bpf_arena_reserve_pages)
 BTF_ID(func, bpf_session_is_return)
 BTF_ID(func, bpf_stream_vprintk)
 BTF_ID(func, bpf_stream_print_stack)
+BTF_ID(func, bpf_clz64)
+BTF_ID(func, bpf_ctz64)
+BTF_ID(func, bpf_ffs64)
+BTF_ID(func, bpf_fls64)
+BTF_ID(func, bpf_bitrev64)
+BTF_ID(func, bpf_popcnt64)
+BTF_ID(func, bpf_rol64)
+BTF_ID(func, bpf_ror64)
 
 static bool is_task_work_add_kfunc(u32 func_id)
 {
@@ -18204,6 +18220,34 @@ static bool verifier_inlines_helper_call(struct bpf_verifier_env *env, s32 imm)
 	}
 }
 
+static bool bpf_kfunc_is_fastcall(struct bpf_verifier_env *env, u32 func_id, u32 flags)
+{
+	if (!(flags & KF_FASTCALL))
+		return false;
+
+	if (!env->prog->jit_requested)
+		return true;
+
+	if (func_id == special_kfunc_list[KF_bpf_clz64])
+		return bpf_jit_inlines_kfunc_call(bpf_clz64);
+	if (func_id == special_kfunc_list[KF_bpf_ctz64])
+		return bpf_jit_inlines_kfunc_call(bpf_ctz64);
+	if (func_id == special_kfunc_list[KF_bpf_ffs64])
+		return bpf_jit_inlines_kfunc_call(bpf_ffs64);
+	if (func_id == special_kfunc_list[KF_bpf_fls64])
+		return bpf_jit_inlines_kfunc_call(bpf_fls64);
+	if (func_id == special_kfunc_list[KF_bpf_bitrev64])
+		return bpf_jit_inlines_kfunc_call(bpf_bitrev64);
+	if (func_id == special_kfunc_list[KF_bpf_popcnt64])
+		return bpf_jit_inlines_kfunc_call(bpf_popcnt64);
+	if (func_id == special_kfunc_list[KF_bpf_rol64])
+		return bpf_jit_inlines_kfunc_call(bpf_rol64);
+	if (func_id == special_kfunc_list[KF_bpf_ror64])
+		return bpf_jit_inlines_kfunc_call(bpf_ror64);
+
+	return true;
+}
+
 struct call_summary {
 	u8 num_params;
 	bool is_void;
@@ -18246,7 +18290,7 @@ static bool get_call_summary(struct bpf_verifier_env *env, struct bpf_insn *call
 			/* error would be reported later */
 			return false;
 		cs->num_params = btf_type_vlen(meta.func_proto);
-		cs->fastcall = meta.kfunc_flags & KF_FASTCALL;
+		cs->fastcall = bpf_kfunc_is_fastcall(env, meta.func_id, meta.kfunc_flags);
 		cs->is_void = btf_type_is_void(btf_type_by_id(meta.btf, meta.func_proto->type));
 		return true;
 	}
@@ -23186,6 +23230,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		insn_buf[4] = BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1);
 		insn_buf[5] = BPF_ALU64_IMM(BPF_NEG, BPF_REG_0, 0);
 		*cnt = 6;
+	} else if (desc->func_id == special_kfunc_list[KF_bpf_ffs64] &&
+		   bpf_jit_inlines_kfunc_call(bpf_ffs64)) {
+		insn_buf[0] = BPF_MOV64_IMM(BPF_REG_0, 0);
+		insn_buf[1] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 2);
+		insn_buf[2] = *insn;
+		insn_buf[3] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 1);
+		*cnt = 4;
 	}
 
 	if (env->insn_aux_data[insn_idx].arg_prog) {
-- 
2.52.0



* [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
  2026-02-19 14:29 [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs Leon Hwang
  2026-02-19 14:29 ` [PATCH bpf-next v2 1/6] " Leon Hwang
@ 2026-02-19 14:29 ` Leon Hwang
  2026-02-19 17:47   ` Alexei Starovoitov
  2026-02-19 14:29 ` [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support Leon Hwang
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Leon Hwang @ 2026-02-19 14:29 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst,
	Viktor Malik, linux-arm-kernel, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot

Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.

bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.

bpf_ctz64() and bpf_ffs64() are supported when the CPU has
X86_FEATURE_BMI1 (TZCNT).

bpf_clz64() and bpf_fls64() are supported when the CPU has
X86_FEATURE_ABM (LZCNT).

bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.

bpf_bitrev64() is not inlined because x86_64 has no native bit-reverse
instruction; it falls back to a regular function call.
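
As a sanity model of the LZCNT-based lowering of bpf_fls64() (illustrative,
not the JIT-emitted code): unlike BSR, LZCNT is defined for a zero input and
returns the operand width (64), so fls64(x) == 64 - lzcnt(x) holds for every
input with no zero check, which is exactly what the emitted
LZCNT/NEG/ADD sequence computes.

```c
#include <assert.h>
#include <stdint.h>

/* lzcnt-style count: defined as 64 for input 0, unlike __builtin_clzll() */
static uint64_t lzcnt64(uint64_t x)
{
	return x ? (uint64_t)__builtin_clzll(x) : 64;
}

/* fls64 lowered as the JIT does: lzcnt; neg rax; add rax, 64 */
static uint64_t fls64_via_lzcnt(uint64_t x)
{
	return 64 - lzcnt64(x);
}
```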

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
 1 file changed, 141 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 070ba80e39d7..193e1e2d7aa8 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -19,6 +19,7 @@
 #include <asm/text-patching.h>
 #include <asm/unwind.h>
 #include <asm/cfi.h>
+#include <asm/cpufeatures.h>
 
 static bool all_callee_regs_used[4] = {true, true, true, true};
 
@@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
 	*pprog = prog;
 }
 
+static bool bpf_inlines_func_call(u8 **pprog, void *func)
+{
+	bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
+	bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
+	bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
+	bool inlined = true;
+	u8 *prog = *pprog;
+
+	/*
+	 * x86 Bit manipulation instruction set
+	 * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
+	 */
+
+	if (func == bpf_clz64 && has_abm) {
+		/*
+		 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+		 *
+		 *   LZCNT - Count the Number of Leading Zero Bits
+		 *
+		 *     Opcode/Instruction
+		 *     F3 REX.W 0F BD /r
+		 *     LZCNT r64, r/m64
+		 *
+		 *     Op/En
+		 *     RM
+		 *
+		 *     64/32-bit Mode
+		 *     V/N.E.
+		 *
+		 *     CPUID Feature Flag
+		 *     LZCNT
+		 *
+		 *     Description
+		 *     Count the number of leading zero bits in r/m64, return
+		 *     result in r64.
+		 */
+		/* emit: x ? 64 - fls64(x) : 64 */
+		/* lzcnt rax, rdi */
+		EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
+	} else if (func == bpf_ctz64 && has_bmi1) {
+		/*
+		 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+		 *
+		 *   TZCNT - Count the Number of Trailing Zero Bits
+		 *
+		 *     Opcode/Instruction
+		 *     F3 REX.W 0F BC /r
+		 *     TZCNT r64, r/m64
+		 *
+		 *     Op/En
+		 *     RM
+		 *
+		 *     64/32-bit Mode
+		 *     V/N.E.
+		 *
+		 *     CPUID Feature Flag
+		 *     BMI1
+		 *
+		 *     Description
+		 *     Count the number of trailing zero bits in r/m64, return
+		 *     result in r64.
+		 */
+		/* emit: x ? __ffs64(x) : 64 */
+		/* tzcnt rax, rdi */
+		EMIT5(0xF3, 0x48, 0x0F, 0xBC, 0xC7);
+	} else if (func == bpf_ffs64 && has_bmi1) {
+		/* emit: __ffs64(x); x == 0 has been handled in verifier */
+		/* tzcnt rax, rdi */
+		EMIT5(0xF3, 0x48, 0x0F, 0xBC, 0xC7);
+	} else if (func == bpf_fls64 && has_abm) {
+		/* emit: fls64(x) */
+		/* lzcnt rax, rdi */
+		EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
+		EMIT3(0x48, 0xF7, 0xD8);       /* neg rax */
+		EMIT4(0x48, 0x83, 0xC0, 0x40); /* add rax, 64 */
+	} else if (func == bpf_popcnt64 && has_popcnt) {
+		/*
+		 * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
+		 *
+		 *   POPCNT - Return the Count of Number of Bits Set to 1
+		 *
+		 *     Opcode/Instruction
+		 *     F3 REX.W 0F B8 /r
+		 *     POPCNT r64, r/m64
+		 *
+		 *     Op/En
+		 *     RM
+		 *
+		 *     64 Mode
+		 *     Valid
+		 *
+		 *     Compat/Leg Mode
+		 *     N.E.
+		 *
+		 *     Description
+		 *     POPCNT on r/m64
+		 */
+		/* popcnt rax, rdi */
+		EMIT5(0xF3, 0x48, 0x0F, 0xB8, 0xC7);
+	} else if (func == bpf_rol64) {
+		EMIT1(0x51);             /* push rcx */
+		/* emit: rol64(x, s) */
+		EMIT3(0x48, 0x89, 0xF1); /* mov rcx, rsi */
+		EMIT3(0x48, 0x89, 0xF8); /* mov rax, rdi */
+		EMIT3(0x48, 0xD3, 0xC0); /* rol rax, cl */
+		EMIT1(0x59);             /* pop rcx */
+	} else if (func == bpf_ror64) {
+		EMIT1(0x51);             /* push rcx */
+		/* emit: ror64(x, s) */
+		EMIT3(0x48, 0x89, 0xF1); /* mov rcx, rsi */
+		EMIT3(0x48, 0x89, 0xF8); /* mov rax, rdi */
+		EMIT3(0x48, 0xD3, 0xC8); /* ror rax, cl */
+		EMIT1(0x59);             /* pop rcx */
+	} else {
+		inlined = false;
+	}
+
+	*pprog = prog;
+	return inlined;
+}
+
 #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
 
 #define __LOAD_TCC_PTR(off)			\
@@ -2452,6 +2574,8 @@ st:			if (is_imm8(insn->off))
 			u8 *ip = image + addrs[i - 1];
 
 			func = (u8 *) __bpf_call_base + imm32;
+			if (bpf_inlines_func_call(&prog, func))
+				break;
 			if (src_reg == BPF_PSEUDO_CALL && tail_call_reachable) {
 				LOAD_TAIL_CALL_CNT_PTR(stack_depth);
 				ip += 7;
@@ -4117,3 +4241,20 @@ bool bpf_jit_supports_fsession(void)
 {
 	return true;
 }
+
+bool bpf_jit_inlines_kfunc_call(void *func_addr)
+{
+	if (func_addr == bpf_ctz64 || func_addr == bpf_ffs64)
+		return boot_cpu_has(X86_FEATURE_BMI1);
+
+	if (func_addr == bpf_clz64 || func_addr == bpf_fls64)
+		return boot_cpu_has(X86_FEATURE_ABM);
+
+	if (func_addr == bpf_popcnt64)
+		return boot_cpu_has(X86_FEATURE_POPCNT);
+
+	if (func_addr == bpf_rol64 || func_addr == bpf_ror64)
+		return true;
+
+	return false;
+}
-- 
2.52.0



* [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support
  2026-02-19 14:29 [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs Leon Hwang
  2026-02-19 14:29 ` [PATCH bpf-next v2 1/6] " Leon Hwang
  2026-02-19 14:29 ` [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64 Leon Hwang
@ 2026-02-19 14:29 ` Leon Hwang
  2026-02-19 15:10   ` Puranjay Mohan
                     ` (2 more replies)
  2026-02-19 14:29 ` [PATCH bpf-next v2 4/6] selftests/bpf: Add tests for 64-bit bitops kfuncs Leon Hwang
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 19+ messages in thread
From: Leon Hwang @ 2026-02-19 14:29 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst,
	Viktor Malik, linux-arm-kernel, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot

Implement JIT inlining of the 64-bit bitops kfuncs on arm64.

bpf_clz64(), bpf_ffs64(), bpf_fls64(), and bpf_bitrev64() are always
inlined using mandatory ARMv8 CLZ/RBIT instructions. bpf_ctz64() is
inlined via RBIT + CLZ, or via the native CTZ instruction when
FEAT_CSSC is available. bpf_rol64() and bpf_ror64() are always inlined
via RORV.

bpf_popcnt64() is not inlined because the native population count
instruction operates on NEON/SIMD registers, which should not be touched
from BPF programs. It therefore falls back to a regular function call.
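
The RBIT + CLZ fallback relies on the identity ctz(x) == clz(bitrev(x)),
which also yields 64 for x == 0, matching the defined zero-input behavior.
A portable sketch of that identity (illustrative, not the JIT code):

```c
#include <assert.h>
#include <stdint.h>

/* Full 64-bit bit reversal, as RBIT does */
static uint64_t bitrev64(uint64_t x)
{
	uint64_t y = 0;

	for (int i = 0; i < 64; i++) {
		y = (y << 1) | (x & 1);
		x >>= 1;
	}
	return y;
}

/* CLZ-style count: defined as 64 for input 0 */
static uint64_t clz64(uint64_t x)
{
	return x ? (uint64_t)__builtin_clzll(x) : 64;
}

/* ctz via RBIT + CLZ, as emitted when FEAT_CSSC is absent */
static uint64_t ctz64_via_rbit(uint64_t x)
{
	return clz64(bitrev64(x));
}
```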

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 arch/arm64/net/bpf_jit_comp.c | 123 ++++++++++++++++++++++++++++++++++
 1 file changed, 123 insertions(+)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 7a530ea4f5ae..f03f732063d9 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1192,6 +1192,127 @@ static int add_exception_handler(const struct bpf_insn *insn,
 	return 0;
 }
 
+static inline u32 a64_clz64(u8 rd, u8 rn)
+{
+	/*
+	 * Arm Architecture Reference Manual for A-profile architecture
+	 * (Document number: ARM DDI 0487)
+	 *
+	 *   A64 Base Instruction Descriptions
+	 *   C6.2 Alphabetical list of A64 base instructions
+	 *
+	 *   C6.2.91 CLZ
+	 *
+	 *     Count leading zeros
+	 *
+	 *     This instruction counts the number of consecutive binary zero bits,
+	 *     starting from the most significant bit in the source register,
+	 *     and places the count in the destination register.
+	 */
+	/* CLZ Xd, Xn */
+	return 0xdac01000 | (rn << 5) | rd;
+}
+
+static inline u32 a64_ctz64(u8 rd, u8 rn)
+{
+	/*
+	 * Arm Architecture Reference Manual for A-profile architecture
+	 * (Document number: ARM DDI 0487)
+	 *
+	 *   A64 Base Instruction Descriptions
+	 *   C6.2 Alphabetical list of A64 base instructions
+	 *
+	 *   C6.2.144 CTZ
+	 *
+	 *     Count trailing zeros
+	 *
+	 *     This instruction counts the number of consecutive binary zero bits,
+	 *     starting from the least significant bit in the source register,
+	 *     and places the count in the destination register.
+	 *
+	 *     This instruction requires FEAT_CSSC.
+	 */
+	/* CTZ Xd, Xn */
+	return 0xdac01800 | (rn << 5) | rd;
+}
+
+static inline u32 a64_rbit64(u8 rd, u8 rn)
+{
+	/*
+	 * Arm Architecture Reference Manual for A-profile architecture
+	 * (Document number: ARM DDI 0487)
+	 *
+	 *   A64 Base Instruction Descriptions
+	 *   C6.2 Alphabetical list of A64 base instructions
+	 *
+	 *   C6.2.320 RBIT
+	 *
+	 *     Reverse bits
+	 *
+	 *     This instruction reverses the bit order in a register.
+	 */
+	/* RBIT Xd, Xn */
+	return 0xdac00000 | (rn << 5) | rd;
+}
+
+static inline bool boot_cpu_supports_cssc(void)
+{
+	/*
+	 * Documentation/arch/arm64/cpu-feature-registers.rst
+	 *
+	 *   ID_AA64ISAR2_EL1 - Instruction set attribute register 2
+	 *
+	 *     CSSC
+	 */
+	return cpuid_feature_extract_unsigned_field(read_sanitised_ftr_reg(SYS_ID_AA64ISAR2_EL1),
+						    ID_AA64ISAR2_EL1_CSSC_SHIFT);
+}
+
+static bool bpf_inlines_func_call(struct jit_ctx *ctx, void *func_addr)
+{
+	const u8 tmp = bpf2a64[TMP_REG_1];
+	const u8 r0 = bpf2a64[BPF_REG_0];
+	const u8 r1 = bpf2a64[BPF_REG_1];
+	const u8 r2 = bpf2a64[BPF_REG_2];
+	bool inlined = true;
+
+	if (func_addr == bpf_clz64) {
+		emit(a64_clz64(r0, r1), ctx);
+	} else if (func_addr == bpf_ctz64 || func_addr == bpf_ffs64) {
+		if (boot_cpu_supports_cssc()) {
+			emit(a64_ctz64(r0, r1), ctx);
+		} else {
+			emit(a64_rbit64(tmp, r1), ctx);
+			emit(a64_clz64(r0, tmp), ctx);
+		}
+	} else if (func_addr == bpf_fls64) {
+		emit(a64_clz64(tmp, r1), ctx);
+		emit(A64_NEG(1, tmp, tmp), ctx);
+		emit(A64_ADD_I(1, r0, tmp, 64), ctx);
+	} else if (func_addr == bpf_bitrev64) {
+		emit(a64_rbit64(r0, r1), ctx);
+	} else if (func_addr == bpf_rol64) {
+		emit(A64_NEG(1, tmp, r2), ctx);
+		emit(A64_DATA2(1, r0, r1, tmp, RORV), ctx);
+	} else if (func_addr == bpf_ror64) {
+		emit(A64_DATA2(1, r0, r1, r2, RORV), ctx);
+	} else {
+		inlined = false;
+	}
+
+	return inlined;
+}
+
+bool bpf_jit_inlines_kfunc_call(void *func_addr)
+{
+	if (func_addr == bpf_clz64 || func_addr == bpf_ctz64 ||
+	    func_addr == bpf_ffs64 || func_addr == bpf_fls64 ||
+	    func_addr == bpf_rol64 || func_addr == bpf_ror64 ||
+	    func_addr == bpf_bitrev64)
+		return true;
+	return false;
+}
+
 /* JITs an eBPF instruction.
  * Returns:
  * 0  - successfully JITed an 8-byte eBPF instruction.
@@ -1598,6 +1719,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 					    &func_addr, &func_addr_fixed);
 		if (ret < 0)
 			return ret;
+		if (bpf_inlines_func_call(ctx, (void *) func_addr))
+			break;
 		emit_call(func_addr, ctx);
 		/*
 		 * Call to arch_bpf_timed_may_goto() is emitted by the
-- 
2.52.0



* [PATCH bpf-next v2 4/6] selftests/bpf: Add tests for 64-bit bitops kfuncs
  2026-02-19 14:29 [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs Leon Hwang
                   ` (2 preceding siblings ...)
  2026-02-19 14:29 ` [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support Leon Hwang
@ 2026-02-19 14:29 ` Leon Hwang
  2026-02-19 14:29 ` [PATCH bpf-next v2 5/6] selftests/bpf: Add __cpu_feature annotation for CPU-feature-gated tests Leon Hwang
  2026-02-19 14:29 ` [PATCH bpf-next v2 6/6] selftests/bpf: Add JIT disassembly tests for 64-bit bitops kfuncs Leon Hwang
  5 siblings, 0 replies; 19+ messages in thread
From: Leon Hwang @ 2026-02-19 14:29 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst,
	Viktor Malik, linux-arm-kernel, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot

Add selftests for bpf_clz64(), bpf_ctz64(), bpf_ffs64(), bpf_fls64(),
bpf_bitrev64(), bpf_popcnt64(), bpf_rol64(), and bpf_ror64().

Each subtest compares kfunc results against a userspace reference
implementation over a set of test vectors.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 .../testing/selftests/bpf/bpf_experimental.h  |   9 +
 .../testing/selftests/bpf/prog_tests/bitops.c | 182 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/bitops.c    |  68 +++++++
 3 files changed, 259 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bitops.c
 create mode 100644 tools/testing/selftests/bpf/progs/bitops.c

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 9df77e59d4f5..02a985ef71cc 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -594,6 +594,15 @@ extern void bpf_iter_dmabuf_destroy(struct bpf_iter_dmabuf *it) __weak __ksym;
 extern int bpf_cgroup_read_xattr(struct cgroup *cgroup, const char *name__str,
 				 struct bpf_dynptr *value_p) __weak __ksym;
 
+extern __u64 bpf_clz64(__u64 x) __weak __ksym;
+extern __u64 bpf_ctz64(__u64 x) __weak __ksym;
+extern __u64 bpf_ffs64(__u64 x) __weak __ksym;
+extern __u64 bpf_fls64(__u64 x) __weak __ksym;
+extern __u64 bpf_bitrev64(__u64 x) __weak __ksym;
+extern __u64 bpf_popcnt64(__u64 x) __weak __ksym;
+extern __u64 bpf_rol64(__u64 x, __u64 s) __weak __ksym;
+extern __u64 bpf_ror64(__u64 x, __u64 s) __weak __ksym;
+
 #define PREEMPT_BITS	8
 #define SOFTIRQ_BITS	8
 #define HARDIRQ_BITS	4
diff --git a/tools/testing/selftests/bpf/prog_tests/bitops.c b/tools/testing/selftests/bpf/prog_tests/bitops.c
new file mode 100644
index 000000000000..9acc3cb1908c
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bitops.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include "bitops.skel.h"
+
+struct bitops_case {
+	__u64 x;
+	__u64 s;
+	__u64 exp;
+};
+
+static struct bitops_case cases[] = {
+	{ 0x0ULL, 0, 0 },
+	{ 0x1ULL, 1, 0 },
+	{ 0x8000000000000000ULL, 63, 0 },
+	{ 0xffffffffffffffffULL, 64, 0 },
+	{ 0x0123456789abcdefULL, 65, 0 },
+	{ 0x0000000100000000ULL, 127, 0 },
+};
+
+static __u64 clz64(__u64 x, __u64 s)
+{
+	(void)s;
+	return x ? __builtin_clzll(x) : 64;
+}
+
+static __u64 ctz64(__u64 x, __u64 s)
+{
+	(void)s;
+	return x ? __builtin_ctzll(x) : 64;
+}
+
+static __u64 ffs64(__u64 x, __u64 s)
+{
+	(void)s;
+	return x ? (__u64)__builtin_ctzll(x) + 1 : 0;
+}
+
+static __u64 fls64(__u64 x, __u64 s)
+{
+	(void)s;
+	return x ? 64 - __builtin_clzll(x) : 0;
+}
+
+static __u64 popcnt64(__u64 x, __u64 s)
+{
+	(void)s;
+	return __builtin_popcountll(x);
+}
+
+static __u64 bitrev64(__u64 x, __u64 s)
+{
+	__u64 y = 0;
+	int i;
+
+	(void)s;
+
+	for (i = 0; i < 64; i++) {
+		y <<= 1;
+		y |= x & 1;
+		x >>= 1;
+	}
+	return y;
+}
+
+static __u64 rol64(__u64 x, __u64 s)
+{
+	s &= 63;
+	return (x << s) | (x >> ((-s) & 63));
+}
+
+static __u64 ror64(__u64 x, __u64 s)
+{
+	s &= 63;
+	return (x >> s) | (x << ((-s) & 63));
+}
+
+static void test_bitops_case(const char *prog_name)
+{
+	struct bpf_program *prog;
+	struct bitops *skel;
+	size_t i;
+	int err;
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+	skel = bitops__open();
+	if (!ASSERT_OK_PTR(skel, "bitops__open"))
+		return;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto cleanup;
+
+	bpf_program__set_autoload(prog, true);
+
+	err = bitops__load(skel);
+	if (!ASSERT_OK(err, "bitops__load"))
+		goto cleanup;
+
+	for (i = 0; i < ARRAY_SIZE(cases); i++) {
+		skel->bss->in_x = cases[i].x;
+		skel->bss->in_s = cases[i].s;
+		err = bpf_prog_test_run_opts(bpf_program__fd(prog), &topts);
+		if (!ASSERT_OK(err, "bpf_prog_test_run_opts"))
+			goto cleanup;
+
+		if (!ASSERT_OK(topts.retval, "retval"))
+			goto cleanup;
+
+		ASSERT_EQ(skel->bss->out, cases[i].exp, "out");
+	}
+
+cleanup:
+	bitops__destroy(skel);
+}
+
+#define RUN_BITOPS_CASE(_bitops, _prog)					\
+	do {								\
+		for (size_t i = 0; i < ARRAY_SIZE(cases); i++)		\
+			cases[i].exp = _bitops(cases[i].x, cases[i].s);	\
+		test_bitops_case(_prog);				\
+	} while (0)
+
+static void test_clz64(void)
+{
+	RUN_BITOPS_CASE(clz64, "bitops_clz64");
+}
+
+static void test_ctz64(void)
+{
+	RUN_BITOPS_CASE(ctz64, "bitops_ctz64");
+}
+
+static void test_ffs64(void)
+{
+	RUN_BITOPS_CASE(ffs64, "bitops_ffs64");
+}
+
+static void test_fls64(void)
+{
+	RUN_BITOPS_CASE(fls64, "bitops_fls64");
+}
+
+static void test_bitrev64(void)
+{
+	RUN_BITOPS_CASE(bitrev64, "bitops_bitrev");
+}
+
+static void test_popcnt64(void)
+{
+	RUN_BITOPS_CASE(popcnt64, "bitops_popcnt");
+}
+
+static void test_rol64(void)
+{
+	RUN_BITOPS_CASE(rol64, "bitops_rol64");
+}
+
+static void test_ror64(void)
+{
+	RUN_BITOPS_CASE(ror64, "bitops_ror64");
+}
+
+void test_bitops(void)
+{
+	if (test__start_subtest("clz64"))
+		test_clz64();
+	if (test__start_subtest("ctz64"))
+		test_ctz64();
+	if (test__start_subtest("ffs64"))
+		test_ffs64();
+	if (test__start_subtest("fls64"))
+		test_fls64();
+	if (test__start_subtest("bitrev64"))
+		test_bitrev64();
+	if (test__start_subtest("popcnt64"))
+		test_popcnt64();
+	if (test__start_subtest("rol64"))
+		test_rol64();
+	if (test__start_subtest("ror64"))
+		test_ror64();
+}
diff --git a/tools/testing/selftests/bpf/progs/bitops.c b/tools/testing/selftests/bpf/progs/bitops.c
new file mode 100644
index 000000000000..deac09bc8683
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bitops.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_experimental.h"
+
+__u64 in_x;
+__u64 in_s;
+
+__u64 out;
+
+SEC("?syscall")
+int bitops_clz64(void *ctx)
+{
+	out = bpf_clz64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_ctz64(void *ctx)
+{
+	out = bpf_ctz64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_ffs64(void *ctx)
+{
+	out = bpf_ffs64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_fls64(void *ctx)
+{
+	out = bpf_fls64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_bitrev(void *ctx)
+{
+	out = bpf_bitrev64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_popcnt(void *ctx)
+{
+	out = bpf_popcnt64(in_x);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_rol64(void *ctx)
+{
+	out = bpf_rol64(in_x, in_s);
+	return 0;
+}
+
+SEC("?syscall")
+int bitops_ror64(void *ctx)
+{
+	out = bpf_ror64(in_x, in_s);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v2 5/6] selftests/bpf: Add __cpu_feature annotation for CPU-feature-gated tests
  2026-02-19 14:29 [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs Leon Hwang
                   ` (3 preceding siblings ...)
  2026-02-19 14:29 ` [PATCH bpf-next v2 4/6] selftests/bpf: Add tests for 64-bit bitops kfuncs Leon Hwang
@ 2026-02-19 14:29 ` Leon Hwang
  2026-02-19 14:29 ` [PATCH bpf-next v2 6/6] selftests/bpf: Add JIT disassembly tests for 64-bit bitops kfuncs Leon Hwang
  5 siblings, 0 replies; 19+ messages in thread
From: Leon Hwang @ 2026-02-19 14:29 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst,
	Viktor Malik, linux-arm-kernel, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot

Add a new __cpu_feature("...") test annotation and parse it in the
selftests/bpf test_loader.

Behavior:
- Annotation value is matched against CPU feature tokens from
  /proc/cpuinfo (case-insensitive).
- Multiple __cpu_feature annotations can be specified for one test; all
  required features must be present.
- If any required feature is missing, the test is skipped.

Limitation:
- __cpu_feature is evaluated per test function, not per __arch_* block,
  and the loader requires exactly one __arch_* tag alongside it. A single
  test that combines multiple architectures therefore cannot express
  different per-arch feature requirements.

This lets JIT/disassembly-sensitive tests declare explicit CPU feature
requirements and avoid false failures on unsupported systems.
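For illustration, the matching semantics above can be sketched in plain C
(hypothetical helper names; the actual test_loader implementation differs):

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Normalize a feature token in place: lowercase it and reject empty or
 * whitespace-containing tokens, matching the case-insensitive comparison
 * against /proc/cpuinfo tokens described above. */
static bool normalize_feature(char *tok)
{
	size_t i, len = strlen(tok);

	if (!len)
		return false;
	for (i = 0; i < len; i++) {
		if (isspace((unsigned char)tok[i]))
			return false;
		tok[i] = (char)tolower((unsigned char)tok[i]);
	}
	return true;
}

/* All required tokens must appear in the available (already normalized)
 * set; any missing feature means the test is skipped. */
static bool features_present(const char **have, size_t have_cnt,
			     const char **need, size_t need_cnt)
{
	size_t i, j;

	for (i = 0; i < need_cnt; i++) {
		bool found = false;

		for (j = 0; j < have_cnt; j++) {
			if (!strcmp(need[i], have[j])) {
				found = true;
				break;
			}
		}
		if (!found)
			return false;
	}
	return true;
}
```

So a test tagged __cpu_feature("ABM") matches the "abm" token from
/proc/cpuinfo after normalization.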

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 tools/testing/selftests/bpf/progs/bpf_misc.h |   7 +
 tools/testing/selftests/bpf/test_loader.c    | 150 +++++++++++++++++++
 2 files changed, 157 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
index c9bfbe1bafc1..75e66373a64d 100644
--- a/tools/testing/selftests/bpf/progs/bpf_misc.h
+++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
@@ -126,6 +126,12 @@
  *                   Several __arch_* annotations could be specified at once.
  *                   When test case is not run on current arch it is marked as skipped.
  * __caps_unpriv     Specify the capabilities that should be set when running the test.
+ * __cpu_feature     Specify required CPU feature for test execution.
+ *                   Multiple __cpu_feature annotations could be specified.
+ *                   Value must match a CPU feature token exposed by
+ *                   /proc/cpuinfo (case-insensitive).
+ *                   Can't be used together with multiple __arch_* tags.
+ *                   If any required feature is not present, test case is skipped.
  *
  * __linear_size     Specify the size of the linear area of non-linear skbs, or
  *                   0 for linear skbs.
@@ -156,6 +162,7 @@
 #define __arch_riscv64		__arch("RISCV64")
 #define __arch_s390x		__arch("s390x")
 #define __caps_unpriv(caps)	__attribute__((btf_decl_tag("comment:test_caps_unpriv=" EXPAND_QUOTE(caps))))
+#define __cpu_feature(feat)	__attribute__((btf_decl_tag("comment:test_cpu_feature=" feat)))
 #define __load_if_JITed()	__attribute__((btf_decl_tag("comment:load_mode=jited")))
 #define __load_if_no_JITed()	__attribute__((btf_decl_tag("comment:load_mode=no_jited")))
 #define __stderr(msg)		__attribute__((btf_decl_tag("comment:test_expect_stderr=" XSTR(__COUNTER__) "=" msg)))
diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
index 338c035c3688..3729d1572589 100644
--- a/tools/testing/selftests/bpf/test_loader.c
+++ b/tools/testing/selftests/bpf/test_loader.c
@@ -4,6 +4,7 @@
 #include <stdlib.h>
 #include <test_progs.h>
 #include <bpf/btf.h>
+#include <ctype.h>
 
 #include "autoconf_helper.h"
 #include "disasm_helpers.h"
@@ -44,6 +45,7 @@
 #define TEST_TAG_EXPECT_STDOUT_PFX "comment:test_expect_stdout="
 #define TEST_TAG_EXPECT_STDOUT_PFX_UNPRIV "comment:test_expect_stdout_unpriv="
 #define TEST_TAG_LINEAR_SIZE "comment:test_linear_size="
+#define TEST_TAG_CPU_FEATURE_PFX "comment:test_cpu_feature="
 
 /* Warning: duplicated in bpf_misc.h */
 #define POINTER_VALUE	0xbadcafe
@@ -67,6 +69,11 @@ enum load_mode {
 	NO_JITED	= 1 << 1,
 };
 
+struct cpu_feature_set {
+	char **names;
+	size_t cnt;
+};
+
 struct test_subspec {
 	char *name;
 	bool expect_failure;
@@ -93,6 +100,7 @@ struct test_spec {
 	int linear_sz;
 	bool auxiliary;
 	bool valid;
+	struct cpu_feature_set cpu_features;
 };
 
 static int tester_init(struct test_loader *tester)
@@ -145,6 +153,16 @@ static void free_test_spec(struct test_spec *spec)
 	free(spec->unpriv.name);
 	spec->priv.name = NULL;
 	spec->unpriv.name = NULL;
+
+	if (spec->cpu_features.names) {
+		size_t i;
+
+		for (i = 0; i < spec->cpu_features.cnt; i++)
+			free(spec->cpu_features.names[i]);
+		free(spec->cpu_features.names);
+		spec->cpu_features.names = NULL;
+		spec->cpu_features.cnt = 0;
+	}
 }
 
 /* Compiles regular expression matching pattern.
@@ -394,6 +412,122 @@ static int get_current_arch(void)
 	return ARCH_UNKNOWN;
 }
 
+static int cpu_feature_set_add(struct cpu_feature_set *set, const char *name)
+{
+	char **tmp, *norm;
+	size_t i, len;
+
+	if (!name || !name[0]) {
+		PRINT_FAIL("bad cpu feature spec: empty string");
+		return -EINVAL;
+	}
+
+	len = strlen(name);
+	norm = malloc(len + 1);
+	if (!norm)
+		return -ENOMEM;
+
+	for (i = 0; i < len; i++) {
+		if (isspace(name[i])) {
+			free(norm);
+			PRINT_FAIL("bad cpu feature spec: whitespace is not allowed in '%s'", name);
+			return -EINVAL;
+		}
+		norm[i] = tolower((unsigned char)name[i]);
+	}
+	norm[len] = '\0';
+
+	for (i = 0; i < set->cnt; i++) {
+		if (strcmp(set->names[i], norm) == 0) {
+			free(norm);
+			return 0;
+		}
+	}
+
+	tmp = realloc(set->names, (set->cnt + 1) * sizeof(*set->names));
+	if (!tmp) {
+		free(norm);
+		return -ENOMEM;
+	}
+	set->names = tmp;
+	set->names[set->cnt++] = norm;
+	return 0;
+}
+
+static bool cpu_feature_set_has(const struct cpu_feature_set *set, const char *name)
+{
+	size_t i;
+
+	for (i = 0; i < set->cnt; i++) {
+		if (strcmp(set->names[i], name) == 0)
+			return true;
+	}
+	return false;
+}
+
+static bool cpu_feature_set_includes(const struct cpu_feature_set *have,
+				     const struct cpu_feature_set *need)
+{
+	size_t i;
+
+	for (i = 0; i < need->cnt; i++) {
+		if (!cpu_feature_set_has(have, need->names[i]))
+			return false;
+	}
+	return true;
+}
+
+static const struct cpu_feature_set *get_current_cpu_features(void)
+{
+	static struct cpu_feature_set set;
+	static bool initialized;
+	char *line = NULL;
+	size_t len = 0;
+	FILE *fp;
+	int err;
+
+	if (initialized)
+		return &set;
+
+	initialized = true;
+	fp = fopen("/proc/cpuinfo", "r");
+	if (!fp)
+		return &set;
+
+	while (getline(&line, &len, fp) != -1) {
+		char *p = line, *colon, *tok;
+
+		while (*p && isspace(*p))
+			p++;
+		if (!str_has_pfx(p, "flags") &&
+		    !str_has_pfx(p, "Features") &&
+		    !str_has_pfx(p, "features"))
+			continue;
+
+		colon = strchr(p, ':');
+		if (!colon)
+			continue;
+
+		for (tok = strtok(colon + 1, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
+			err = cpu_feature_set_add(&set, tok);
+			if (err) {
+				PRINT_FAIL("failed to parse cpu feature from '/proc/cpuinfo': '%s'",
+					   tok);
+				break;
+			}
+		}
+	}
+
+	free(line);
+	fclose(fp);
+	return &set;
+}
+
+static int parse_cpu_feature(const char *name, struct cpu_feature_set *set)
+{
+	return cpu_feature_set_add(set, name);
+}
+
 /* Uses btf_decl_tag attributes to describe the expected test
  * behavior, see bpf_misc.h for detailed description of each attribute
  * and attribute combinations.
@@ -650,9 +784,20 @@ static int parse_test_spec(struct test_loader *tester,
 				err = -EINVAL;
 				goto cleanup;
 			}
+		} else if (str_has_pfx(s, TEST_TAG_CPU_FEATURE_PFX)) {
+			val = s + sizeof(TEST_TAG_CPU_FEATURE_PFX) - 1;
+			err = parse_cpu_feature(val, &spec->cpu_features);
+			if (err)
+				goto cleanup;
 		}
 	}
 
+	if (spec->cpu_features.cnt && __builtin_popcount(arch_mask) != 1) {
+		PRINT_FAIL("__cpu_feature requires exactly one __arch_* tag");
+		err = -EINVAL;
+		goto cleanup;
+	}
+
 	spec->arch_mask = arch_mask ?: -1;
 	spec->load_mask = load_mask ?: (JITED | NO_JITED);
 
@@ -1161,6 +1306,11 @@ void run_subtest(struct test_loader *tester,
 		return;
 	}
 
+	if (!cpu_feature_set_includes(get_current_cpu_features(), &spec->cpu_features)) {
+		test__skip();
+		return;
+	}
+
 	if (unpriv) {
 		if (!can_execute_unpriv(tester, spec)) {
 			test__skip();
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v2 6/6] selftests/bpf: Add JIT disassembly tests for 64-bit bitops kfuncs
  2026-02-19 14:29 [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs Leon Hwang
                   ` (4 preceding siblings ...)
  2026-02-19 14:29 ` [PATCH bpf-next v2 5/6] selftests/bpf: Add __cpu_feature annotation for CPU-feature-gated tests Leon Hwang
@ 2026-02-19 14:29 ` Leon Hwang
  5 siblings, 0 replies; 19+ messages in thread
From: Leon Hwang @ 2026-02-19 14:29 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst,
	Viktor Malik, linux-arm-kernel, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot

Add bitops_jit selftests that verify JITed instruction sequences for
supported 64-bit bitops kfuncs on x86_64 and arm64, including
CPU-feature-gated coverage on x86 where required.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 .../testing/selftests/bpf/prog_tests/bitops.c |   6 +
 .../testing/selftests/bpf/progs/bitops_jit.c  | 153 ++++++++++++++++++
 2 files changed, 159 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bitops_jit.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bitops.c b/tools/testing/selftests/bpf/prog_tests/bitops.c
index 9acc3cb1908c..2c203904880d 100644
--- a/tools/testing/selftests/bpf/prog_tests/bitops.c
+++ b/tools/testing/selftests/bpf/prog_tests/bitops.c
@@ -2,6 +2,7 @@
 
 #include <test_progs.h>
 #include "bitops.skel.h"
+#include "bitops_jit.skel.h"
 
 struct bitops_case {
 	__u64 x;
@@ -180,3 +181,8 @@ void test_bitops(void)
 	if (test__start_subtest("ror64"))
 		test_ror64();
 }
+
+void test_bitops_jit(void)
+{
+	RUN_TESTS(bitops_jit);
+}
diff --git a/tools/testing/selftests/bpf/progs/bitops_jit.c b/tools/testing/selftests/bpf/progs/bitops_jit.c
new file mode 100644
index 000000000000..9f414e56b1e8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bitops_jit.c
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_experimental.h"
+#include "bpf_misc.h"
+
+SEC("syscall")
+__description("bitops jit: clz64 uses lzcnt on x86 with abm")
+__success __retval(63)
+__arch_x86_64
+__cpu_feature("abm")
+__jited("	lzcnt{{.*}}")
+int bitops_jit_clz64_x86(void *ctx)
+{
+	return bpf_clz64(1);
+}
+
+SEC("syscall")
+__description("bitops jit: ctz64 uses tzcnt on x86 with bmi1")
+__success __retval(4)
+__arch_x86_64
+__cpu_feature("bmi1")
+__jited("	tzcnt{{.*}}")
+int bitops_jit_ctz64_x86(void *ctx)
+{
+	return bpf_ctz64(0x10);
+}
+
+SEC("syscall")
+__description("bitops jit: ffs64 uses tzcnt on x86 with bmi1")
+__success __retval(5)
+__arch_x86_64
+__cpu_feature("bmi1")
+__jited("	tzcnt{{.*}}")
+int bitops_jit_ffs64_x86(void *ctx)
+{
+	return bpf_ffs64(0x10);
+}
+
+SEC("syscall")
+__description("bitops jit: fls64 uses lzcnt on x86 with abm")
+__success __retval(5)
+__arch_x86_64
+__cpu_feature("abm")
+__jited("	lzcnt{{.*}}")
+int bitops_jit_fls64_x86(void *ctx)
+{
+	return bpf_fls64(0x10);
+}
+
+SEC("syscall")
+__description("bitops jit: popcnt64 uses popcnt on x86")
+__success __retval(3)
+__arch_x86_64
+__cpu_feature("popcnt")
+__jited("	popcnt{{.*}}")
+int bitops_jit_popcnt64_x86(void *ctx)
+{
+	return bpf_popcnt64(0x1011);
+}
+
+SEC("syscall")
+__description("bitops jit: rol64 uses rol on x86")
+__success __retval(6)
+__arch_x86_64
+__jited("	rol{{.*}}")
+int bitops_jit_rol64_x86(void *ctx)
+{
+	return bpf_rol64(3, 1);
+}
+
+SEC("syscall")
+__description("bitops jit: ror64 uses ror on x86")
+__success __retval(3)
+__arch_x86_64
+__jited("	ror{{.*}}")
+int bitops_jit_ror64_x86(void *ctx)
+{
+	return bpf_ror64(6, 1);
+}
+
+SEC("syscall")
+__description("bitops jit: clz64 uses clz on arm64")
+__success __retval(63)
+__arch_arm64
+__jited("	clz	{{.*}}")
+int bitops_jit_clz64_arm64(void *ctx)
+{
+	return bpf_clz64(1);
+}
+
+SEC("syscall")
+__description("bitops jit: ctz64 uses ctz or rbit+clz on arm64")
+__success __retval(4)
+__arch_arm64
+__jited("	{{(ctz|rbit)}}	{{.*}}")
+int bitops_jit_ctz64_arm64(void *ctx)
+{
+	return bpf_ctz64(0x10);
+}
+
+SEC("syscall")
+__description("bitops jit: ffs64 uses ctz or rbit+clz on arm64")
+__success __retval(5)
+__arch_arm64
+__jited("	{{(ctz|rbit)}}	{{.*}}")
+int bitops_jit_ffs64_arm64(void *ctx)
+{
+	return bpf_ffs64(0x10);
+}
+
+SEC("syscall")
+__description("bitops jit: fls64 uses clz on arm64")
+__success __retval(5)
+__arch_arm64
+__jited("	clz	{{.*}}")
+int bitops_jit_fls64_arm64(void *ctx)
+{
+	return bpf_fls64(0x10);
+}
+
+SEC("syscall")
+__description("bitops jit: bitrev64 uses rbit on arm64")
+__success __retval(1)
+__arch_arm64
+__jited("	rbit	{{.*}}")
+int bitops_jit_bitrev64_arm64(void *ctx)
+{
+	return bpf_bitrev64(0x8000000000000000ULL);
+}
+
+SEC("syscall")
+__description("bitops jit: rol64 uses rorv on arm64")
+__success __retval(6)
+__arch_arm64
+__jited("	ror	{{.*}}")
+int bitops_jit_rol64_arm64(void *ctx)
+{
+	return bpf_rol64(3, 1);
+}
+
+SEC("syscall")
+__description("bitops jit: ror64 uses rorv on arm64")
+__success __retval(3)
+__arch_arm64
+__jited("	ror	{{.*}}")
+int bitops_jit_ror64_arm64(void *ctx)
+{
+	return bpf_ror64(6, 1);
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support
  2026-02-19 14:29 ` [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support Leon Hwang
@ 2026-02-19 15:10   ` Puranjay Mohan
  2026-02-19 15:20   ` Puranjay Mohan
  2026-02-19 15:25   ` Puranjay Mohan
  2 siblings, 0 replies; 19+ messages in thread
From: Puranjay Mohan @ 2026-02-19 15:10 UTC (permalink / raw)
  To: Leon Hwang, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Xu Kuohai, Catalin Marinas, Will Deacon, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin,
	Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst, Viktor Malik,
	linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	kernel-patches-bot


> Implement JIT inlining of the 64-bit bitops kfuncs on arm64.
>
> bpf_clz64(), bpf_ffs64(), bpf_fls64(), and bpf_bitrev64() are always
> inlined using mandatory ARMv8 CLZ/RBIT instructions. bpf_ctz64() is
> inlined via RBIT + CLZ, or via the native CTZ instruction when
> FEAT_CSSC is available. bpf_rol64() and bpf_ror64() are always inlined
> via RORV.
>
> bpf_popcnt64() is not inlined as the native population count instruction
> requires NEON/SIMD registers, which should not be touched from BPF
> programs. It therefore falls back to a regular function call.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  arch/arm64/net/bpf_jit_comp.c | 123 ++++++++++++++++++++++++++++++++++
>  1 file changed, 123 insertions(+)
>
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 7a530ea4f5ae..f03f732063d9 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -1192,6 +1192,127 @@ static int add_exception_handler(const struct bpf_insn *insn,
>  	return 0;
>  }
>  
> +static inline u32 a64_clz64(u8 rd, u8 rn)
> +{
> +	/*
> +	 * Arm Architecture Reference Manual for A-profile architecture
> +	 * (Document number: ARM DDI 0487)
> +	 *
> +	 *   A64 Base Instruction Descriptions
> +	 *   C6.2 Alphabetical list of A64 base instructions
> +	 *
> +	 *   C6.2.91 CLZ
> +	 *
> +	 *     Count leading zeros
> +	 *
> +	 *     This instruction counts the number of consecutive binary zero bits,
> +	 *     starting from the most significant bit in the source register,
> +	 *     and places the count in the destination register.
> +	 */
> +	/* CLZ Xd, Xn */
> +	return 0xdac01000 | (rn << 5) | rd;
> +}
> +
> +static inline u32 a64_ctz64(u8 rd, u8 rn)
> +{
> +	/*
> +	 * Arm Architecture Reference Manual for A-profile architecture
> +	 * (Document number: ARM DDI 0487)
> +	 *
> +	 *   A64 Base Instruction Descriptions
> +	 *   C6.2 Alphabetical list of A64 base instructions
> +	 *
> +	 *   C6.2.144 CTZ
> +	 *
> +	 *     Count trailing zeros
> +	 *
> +	 *     This instruction counts the number of consecutive binary zero bits,
> +	 *     starting from the least significant bit in the source register,
> +	 *     and places the count in the destination register.
> +	 *
> +	 *     This instruction requires FEAT_CSSC.
> +	 */
> +	/* CTZ Xd, Xn */
> +	return 0xdac01800 | (rn << 5) | rd;
> +}
> +
> +static inline u32 a64_rbit64(u8 rd, u8 rn)
> +{
> +	/*
> +	 * Arm Architecture Reference Manual for A-profile architecture
> +	 * (Document number: ARM DDI 0487)
> +	 *
> +	 *   A64 Base Instruction Descriptions
> +	 *   C6.2 Alphabetical list of A64 base instructions
> +	 *
> +	 *   C6.2.320 RBIT
> +	 *
> +	 *     Reverse bits
> +	 *
> +	 *     This instruction reverses the bit order in a register.
> +	 */
> +	/* RBIT Xd, Xn */
> +	return 0xdac00000 | (rn << 5) | rd;
> +}

Instead of hardcoding the instructions with the above functions, do it the
proper way, with something like the following patch (not compile tested):

-- >8 --

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 18c7811774d3..b2696af0b817 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -221,6 +221,9 @@ enum aarch64_insn_data1_type {
 	AARCH64_INSN_DATA1_REVERSE_16,
 	AARCH64_INSN_DATA1_REVERSE_32,
 	AARCH64_INSN_DATA1_REVERSE_64,
+	AARCH64_INSN_DATA1_RBIT,
+	AARCH64_INSN_DATA1_CLZ,
+	AARCH64_INSN_DATA1_CTZ,
 };

 enum aarch64_insn_data2_type {
@@ -389,6 +392,9 @@ __AARCH64_INSN_FUNCS(rorv,	0x7FE0FC00, 0x1AC02C00)
 __AARCH64_INSN_FUNCS(rev16,	0x7FFFFC00, 0x5AC00400)
 __AARCH64_INSN_FUNCS(rev32,	0x7FFFFC00, 0x5AC00800)
 __AARCH64_INSN_FUNCS(rev64,	0x7FFFFC00, 0x5AC00C00)
+__AARCH64_INSN_FUNCS(rbit,	0x7FFFFC00, 0x5AC00000)
+__AARCH64_INSN_FUNCS(clz,      0x7FFFFC00, 0x5AC01000)
+__AARCH64_INSN_FUNCS(ctz,      0x7FFFFC00, 0x5AC01800)
 __AARCH64_INSN_FUNCS(and,      0x7F200000, 0x0A000000)
 __AARCH64_INSN_FUNCS(bic,      0x7F200000, 0x0A200000)
 __AARCH64_INSN_FUNCS(orr,      0x7F200000, 0x2A000000)
diff --git a/arch/arm64/lib/insn.c b/arch/arm64/lib/insn.c
index 4e298baddc2e..2229ab596cda 100644
--- a/arch/arm64/lib/insn.c
+++ b/arch/arm64/lib/insn.c
@@ -1008,6 +1008,15 @@ u32 aarch64_insn_gen_data1(enum aarch64_insn_register dst,
 		}
 		insn = aarch64_insn_get_rev64_value();
 		break;
+	case AARCH64_INSN_DATA1_CLZ:
+		insn = aarch64_insn_get_clz_value();
+		break;
+	case AARCH64_INSN_DATA1_RBIT:
+		insn = aarch64_insn_get_rbit_value();
+		break;
+	case AARCH64_INSN_DATA1_CTZ:
+		insn = aarch64_insn_get_ctz_value();
+		break;
 	default:
 		pr_err("%s: unknown data1 encoding %d\n", __func__, type);
 		return AARCH64_BREAK_FAULT;
diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index bbea4f36f9f2..af806c39dadb 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -248,6 +248,12 @@
 #define A64_REV16(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_16)
 #define A64_REV32(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_32)
 #define A64_REV64(Rd, Rn)     A64_DATA1(1, Rd, Rn, REVERSE_64)
+/* Rd = RBIT(Rn) */
+#define A64_RBIT(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, RBIT)
+/* Rd = CLZ(Rn) */
+#define A64_CLZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CLZ)
+/* Rd = CTZ(Rn) */
+#define A64_CTZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CTZ)

 /* Data-processing (2 source) */
 /* Rd = Rn OP Rm */

-- 8< --
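As a quick sanity check (a standalone sketch, not kernel code), the 32-bit
base opcodes in the __AARCH64_INSN_FUNCS() table above are consistent with
the hardcoded 64-bit constants in the original series: setting the sf bit
(bit 31) on each base value yields those constants, with Rd in bits [4:0]
and Rn in bits [9:5]:

```c
#include <stdint.h>

#define SF_BIT (1u << 31)

/* 32-bit base opcodes from the __AARCH64_INSN_FUNCS() entries above. */
#define RBIT_BASE 0x5AC00000u
#define CLZ_BASE  0x5AC01000u
#define CTZ_BASE  0x5AC01800u

/* Encode a data-processing (1 source) instruction in its 64-bit form:
 * base opcode with the sf bit set, Rn in bits [9:5], Rd in bits [4:0]. */
static uint32_t encode_data1_64(uint32_t base, uint8_t rd, uint8_t rn)
{
	return base | SF_BIT | ((uint32_t)rn << 5) | rd;
}
```

For example, encode_data1_64(CLZ_BASE, rd, rn) reproduces the
0xdac01000 | (rn << 5) | rd encoding from the posted patch.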

Thanks,
Puranjay

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support
  2026-02-19 14:29 ` [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support Leon Hwang
  2026-02-19 15:10   ` Puranjay Mohan
@ 2026-02-19 15:20   ` Puranjay Mohan
  2026-02-19 15:25   ` Puranjay Mohan
  2 siblings, 0 replies; 19+ messages in thread
From: Puranjay Mohan @ 2026-02-19 15:20 UTC (permalink / raw)
  To: Leon Hwang, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Xu Kuohai, Catalin Marinas, Will Deacon, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin,
	Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst, Viktor Malik,
	linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	kernel-patches-bot

Leon Hwang <leon.hwang@linux.dev> writes:

> Implement JIT inlining of the 64-bit bitops kfuncs on arm64.
>
> bpf_clz64(), bpf_ffs64(), bpf_fls64(), and bpf_bitrev64() are always
> inlined using mandatory ARMv8 CLZ/RBIT instructions. bpf_ctz64() is
> inlined via RBIT + CLZ, or via the native CTZ instruction when
> FEAT_CSSC is available. bpf_rol64() and bpf_ror64() are always inlined
> via RORV.
>
> bpf_popcnt64() is not inlined as the native population count instruction
> requires NEON/SIMD registers, which should not be touched from BPF
> programs. It therefore falls back to a regular function call.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  arch/arm64/net/bpf_jit_comp.c | 123 ++++++++++++++++++++++++++++++++++
>  1 file changed, 123 insertions(+)
>
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 7a530ea4f5ae..f03f732063d9 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -1192,6 +1192,127 @@ static int add_exception_handler(const struct bpf_insn *insn,
>  	return 0;
>  }
>  
> +static inline u32 a64_clz64(u8 rd, u8 rn)
> +{
> +	/*
> +	 * Arm Architecture Reference Manual for A-profile architecture
> +	 * (Document number: ARM DDI 0487)
> +	 *
> +	 *   A64 Base Instruction Descriptions
> +	 *   C6.2 Alphabetical list of A64 base instructions
> +	 *
> +	 *   C6.2.91 CLZ
> +	 *
> +	 *     Count leading zeros
> +	 *
> +	 *     This instruction counts the number of consecutive binary zero bits,
> +	 *     starting from the most significant bit in the source register,
> +	 *     and places the count in the destination register.
> +	 */
> +	/* CLZ Xd, Xn */
> +	return 0xdac01000 | (rn << 5) | rd;
> +}
> +
> +static inline u32 a64_ctz64(u8 rd, u8 rn)
> +{
> +	/*
> +	 * Arm Architecture Reference Manual for A-profile architecture
> +	 * (Document number: ARM DDI 0487)
> +	 *
> +	 *   A64 Base Instruction Descriptions
> +	 *   C6.2 Alphabetical list of A64 base instructions
> +	 *
> +	 *   C6.2.144 CTZ
> +	 *
> +	 *     Count trailing zeros
> +	 *
> +	 *     This instruction counts the number of consecutive binary zero bits,
> +	 *     starting from the least significant bit in the source register,
> +	 *     and places the count in the destination register.
> +	 *
> +	 *     This instruction requires FEAT_CSSC.
> +	 */
> +	/* CTZ Xd, Xn */
> +	return 0xdac01800 | (rn << 5) | rd;
> +}
> +
> +static inline u32 a64_rbit64(u8 rd, u8 rn)
> +{
> +	/*
> +	 * Arm Architecture Reference Manual for A-profile architecture
> +	 * (Document number: ARM DDI 0487)
> +	 *
> +	 *   A64 Base Instruction Descriptions
> +	 *   C6.2 Alphabetical list of A64 base instructions
> +	 *
> +	 *   C6.2.320 RBIT
> +	 *
> +	 *     Reverse bits
> +	 *
> +	 *     This instruction reverses the bit order in a register.
> +	 */
> +	/* RBIT Xd, Xn */
> +	return 0xdac00000 | (rn << 5) | rd;
> +}

I don't think adding the above three functions is the best way to JIT these
instructions; do it like the other data1 and data2 instructions and add
them to the generic framework, as the following patch (untested) does:

-- >8 --

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 18c7811774d3..b2696af0b817 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -221,6 +221,9 @@ enum aarch64_insn_data1_type {
 	AARCH64_INSN_DATA1_REVERSE_16,
 	AARCH64_INSN_DATA1_REVERSE_32,
 	AARCH64_INSN_DATA1_REVERSE_64,
+	AARCH64_INSN_DATA1_RBIT,
+	AARCH64_INSN_DATA1_CLZ,
+	AARCH64_INSN_DATA1_CTZ,
 };

 enum aarch64_insn_data2_type {
@@ -389,6 +392,9 @@ __AARCH64_INSN_FUNCS(rorv,	0x7FE0FC00, 0x1AC02C00)
 __AARCH64_INSN_FUNCS(rev16,	0x7FFFFC00, 0x5AC00400)
 __AARCH64_INSN_FUNCS(rev32,	0x7FFFFC00, 0x5AC00800)
 __AARCH64_INSN_FUNCS(rev64,	0x7FFFFC00, 0x5AC00C00)
+__AARCH64_INSN_FUNCS(rbit,	0x7FFFFC00, 0x5AC00000)
+__AARCH64_INSN_FUNCS(clz,	0x7FFFFC00, 0x5AC01000)
+__AARCH64_INSN_FUNCS(ctz,	0x7FFFFC00, 0x5AC01800)
 __AARCH64_INSN_FUNCS(and,	0x7F200000, 0x0A000000)
 __AARCH64_INSN_FUNCS(bic,	0x7F200000, 0x0A200000)
 __AARCH64_INSN_FUNCS(orr,	0x7F200000, 0x2A000000)
diff --git a/arch/arm64/lib/insn.c b/arch/arm64/lib/insn.c
index 4e298baddc2e..2229ab596cda 100644
--- a/arch/arm64/lib/insn.c
+++ b/arch/arm64/lib/insn.c
@@ -1008,6 +1008,15 @@ u32 aarch64_insn_gen_data1(enum aarch64_insn_register dst,
 		}
 		insn = aarch64_insn_get_rev64_value();
 		break;
+	case AARCH64_INSN_DATA1_CLZ:
+		insn = aarch64_insn_get_clz_value();
+		break;
+	case AARCH64_INSN_DATA1_RBIT:
+		insn = aarch64_insn_get_rbit_value();
+		break;
+	case AARCH64_INSN_DATA1_CTZ:
+		insn = aarch64_insn_get_ctz_value();
+		break;
 	default:
 		pr_err("%s: unknown data1 encoding %d\n", __func__, type);
 		return AARCH64_BREAK_FAULT;
diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index bbea4f36f9f2..af806c39dadb 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -248,6 +248,12 @@
 #define A64_REV16(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_16)
 #define A64_REV32(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_32)
 #define A64_REV64(Rd, Rn)     A64_DATA1(1, Rd, Rn, REVERSE_64)
+/* Rd = RBIT(Rn) */
+#define A64_RBIT(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, RBIT)
+/* Rd = CLZ(Rn) */
+#define A64_CLZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CLZ)
+/* Rd = CTZ(Rn) */
+#define A64_CTZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CTZ)

 /* Data-processing (2 source) */
 /* Rd = Rn OP Rm */

-- 8< --

Thanks,
Puranjay

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support
  2026-02-19 14:29 ` [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support Leon Hwang
  2026-02-19 15:10   ` Puranjay Mohan
  2026-02-19 15:20   ` Puranjay Mohan
@ 2026-02-19 15:25   ` Puranjay Mohan
  2026-02-19 15:36     ` Leon Hwang
  2 siblings, 1 reply; 19+ messages in thread
From: Puranjay Mohan @ 2026-02-19 15:25 UTC (permalink / raw)
  To: Leon Hwang, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Xu Kuohai, Catalin Marinas, Will Deacon, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin,
	Shuah Khan, Leon Hwang, Peilin Ye, Luis Gerhorst, Viktor Malik,
	linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	kernel-patches-bot

Leon Hwang <leon.hwang@linux.dev> writes:

> Implement JIT inlining of the 64-bit bitops kfuncs on arm64.
>
> bpf_clz64(), bpf_ffs64(), bpf_fls64(), and bpf_bitrev64() are always
> inlined using mandatory ARMv8 CLZ/RBIT instructions. bpf_ctz64() is
> inlined via RBIT + CLZ, or via the native CTZ instruction when
> FEAT_CSSC is available. bpf_rol64() and bpf_ror64() are always inlined
> via RORV.
>
> bpf_popcnt64() is not inlined as the native population count instruction
> requires NEON/SIMD registers, which should not be touched from BPF
> programs. It therefore falls back to a regular function call.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  arch/arm64/net/bpf_jit_comp.c | 123 ++++++++++++++++++++++++++++++++++
>  1 file changed, 123 insertions(+)
>
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 7a530ea4f5ae..f03f732063d9 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -1192,6 +1192,127 @@ static int add_exception_handler(const struct bpf_insn *insn,
>  	return 0;
>  }
>  
> +static inline u32 a64_clz64(u8 rd, u8 rn)
> +{
> +	/*
> +	 * Arm Architecture Reference Manual for A-profile architecture
> +	 * (Document number: ARM DDI 0487)
> +	 *
> +	 *   A64 Base Instruction Descriptions
> +	 *   C6.2 Alphabetical list of A64 base instructions
> +	 *
> +	 *   C6.2.91 CLZ
> +	 *
> +	 *     Count leading zeros
> +	 *
> +	 *     This instruction counts the number of consecutive binary zero bits,
> +	 *     starting from the most significant bit in the source register,
> +	 *     and places the count in the destination register.
> +	 */
> +	/* CLZ Xd, Xn */
> +	return 0xdac01000 | (rn << 5) | rd;
> +}
> +
> +static inline u32 a64_ctz64(u8 rd, u8 rn)
> +{
> +	/*
> +	 * Arm Architecture Reference Manual for A-profile architecture
> +	 * (Document number: ARM DDI 0487)
> +	 *
> +	 *   A64 Base Instruction Descriptions
> +	 *   C6.2 Alphabetical list of A64 base instructions
> +	 *
> +	 *   C6.2.144 CTZ
> +	 *
> +	 *     Count trailing zeros
> +	 *
> +	 *     This instruction counts the number of consecutive binary zero bits,
> +	 *     starting from the least significant bit in the source register,
> +	 *     and places the count in the destination register.
> +	 *
> +	 *     This instruction requires FEAT_CSSC.
> +	 */
> +	/* CTZ Xd, Xn */
> +	return 0xdac01800 | (rn << 5) | rd;
> +}
> +
> +static inline u32 a64_rbit64(u8 rd, u8 rn)
> +{
> +	/*
> +	 * Arm Architecture Reference Manual for A-profile architecture
> +	 * (Document number: ARM DDI 0487)
> +	 *
> +	 *   A64 Base Instruction Descriptions
> +	 *   C6.2 Alphabetical list of A64 base instructions
> +	 *
> +	 *   C6.2.320 RBIT
> +	 *
> +	 *     Reverse bits
> +	 *
> +	 *     This instruction reverses the bit order in a register.
> +	 */
> +	/* RBIT Xd, Xn */
> +	return 0xdac00000 | (rn << 5) | rd;
> +}

I don't think adding the above three functions is the best way to JIT these
instructions; do it like the other data1 and data2 instructions and add
them to the generic framework, as the following patch (untested) does:

-- >8 --

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 18c7811774d3..b2696af0b817 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -221,6 +221,9 @@ enum aarch64_insn_data1_type {
 	AARCH64_INSN_DATA1_REVERSE_16,
 	AARCH64_INSN_DATA1_REVERSE_32,
 	AARCH64_INSN_DATA1_REVERSE_64,
+	AARCH64_INSN_DATA1_RBIT,
+	AARCH64_INSN_DATA1_CLZ,
+	AARCH64_INSN_DATA1_CTZ,
 };

 enum aarch64_insn_data2_type {
@@ -389,6 +392,9 @@ __AARCH64_INSN_FUNCS(rorv,	0x7FE0FC00, 0x1AC02C00)
 __AARCH64_INSN_FUNCS(rev16,	0x7FFFFC00, 0x5AC00400)
 __AARCH64_INSN_FUNCS(rev32,	0x7FFFFC00, 0x5AC00800)
 __AARCH64_INSN_FUNCS(rev64,	0x7FFFFC00, 0x5AC00C00)
+__AARCH64_INSN_FUNCS(rbit,	0x7FFFFC00, 0x5AC00000)
+__AARCH64_INSN_FUNCS(clz,	0x7FFFFC00, 0x5AC01000)
+__AARCH64_INSN_FUNCS(ctz,	0x7FFFFC00, 0x5AC01800)
 __AARCH64_INSN_FUNCS(and,	0x7F200000, 0x0A000000)
 __AARCH64_INSN_FUNCS(bic,	0x7F200000, 0x0A200000)
 __AARCH64_INSN_FUNCS(orr,	0x7F200000, 0x2A000000)
diff --git a/arch/arm64/lib/insn.c b/arch/arm64/lib/insn.c
index 4e298baddc2e..2229ab596cda 100644
--- a/arch/arm64/lib/insn.c
+++ b/arch/arm64/lib/insn.c
@@ -1008,6 +1008,15 @@ u32 aarch64_insn_gen_data1(enum aarch64_insn_register dst,
 		}
 		insn = aarch64_insn_get_rev64_value();
 		break;
+	case AARCH64_INSN_DATA1_CLZ:
+		insn = aarch64_insn_get_clz_value();
+		break;
+	case AARCH64_INSN_DATA1_RBIT:
+		insn = aarch64_insn_get_rbit_value();
+		break;
+	case AARCH64_INSN_DATA1_CTZ:
+		insn = aarch64_insn_get_ctz_value();
+		break;
 	default:
 		pr_err("%s: unknown data1 encoding %d\n", __func__, type);
 		return AARCH64_BREAK_FAULT;
diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index bbea4f36f9f2..af806c39dadb 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -248,6 +248,12 @@
 #define A64_REV16(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_16)
 #define A64_REV32(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_32)
 #define A64_REV64(Rd, Rn)     A64_DATA1(1, Rd, Rn, REVERSE_64)
+/* Rd = RBIT(Rn) */
+#define A64_RBIT(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, RBIT)
+/* Rd = CLZ(Rn) */
+#define A64_CLZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CLZ)
+/* Rd = CTZ(Rn) */
+#define A64_CTZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CTZ)

 /* Data-processing (2 source) */
 /* Rd = Rn OP Rm */

-- 8< --

Thanks,
Puranjay

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support
  2026-02-19 15:25   ` Puranjay Mohan
@ 2026-02-19 15:36     ` Leon Hwang
  0 siblings, 0 replies; 19+ messages in thread
From: Leon Hwang @ 2026-02-19 15:36 UTC (permalink / raw)
  To: Puranjay Mohan, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Xu Kuohai, Catalin Marinas, Will Deacon, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin,
	Shuah Khan, Peilin Ye, Luis Gerhorst, Viktor Malik,
	linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	kernel-patches-bot



On 2026/2/19 23:25, Puranjay Mohan wrote:
> Leon Hwang <leon.hwang@linux.dev> writes:
> 
>> Implement JIT inlining of the 64-bit bitops kfuncs on arm64.
>>
>> bpf_clz64(), bpf_ffs64(), bpf_fls64(), and bpf_bitrev64() are always
>> inlined using mandatory ARMv8 CLZ/RBIT instructions. bpf_ctz64() is
>> inlined via RBIT + CLZ, or via the native CTZ instruction when
>> FEAT_CSSC is available. bpf_rol64() and bpf_ror64() are always inlined
>> via RORV.
>>
>> bpf_popcnt64() is not inlined as the native population count instruction
>> requires NEON/SIMD registers, which should not be touched from BPF
>> programs. It therefore falls back to a regular function call.
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>>  arch/arm64/net/bpf_jit_comp.c | 123 ++++++++++++++++++++++++++++++++++
>>  1 file changed, 123 insertions(+)
>>
>> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
>> index 7a530ea4f5ae..f03f732063d9 100644
>> --- a/arch/arm64/net/bpf_jit_comp.c
>> +++ b/arch/arm64/net/bpf_jit_comp.c
>> @@ -1192,6 +1192,127 @@ static int add_exception_handler(const struct bpf_insn *insn,
>>  	return 0;
>>  }
>>  
>> +static inline u32 a64_clz64(u8 rd, u8 rn)
>> +{
>> +	/*
>> +	 * Arm Architecture Reference Manual for A-profile architecture
>> +	 * (Document number: ARM DDI 0487)
>> +	 *
>> +	 *   A64 Base Instruction Descriptions
>> +	 *   C6.2 Alphabetical list of A64 base instructions
>> +	 *
>> +	 *   C6.2.91 CLZ
>> +	 *
>> +	 *     Count leading zeros
>> +	 *
>> +	 *     This instruction counts the number of consecutive binary zero bits,
>> +	 *     starting from the most significant bit in the source register,
>> +	 *     and places the count in the destination register.
>> +	 */
>> +	/* CLZ Xd, Xn */
>> +	return 0xdac01000 | (rn << 5) | rd;
>> +}
>> +
>> +static inline u32 a64_ctz64(u8 rd, u8 rn)
>> +{
>> +	/*
>> +	 * Arm Architecture Reference Manual for A-profile architecture
>> +	 * (Document number: ARM DDI 0487)
>> +	 *
>> +	 *   A64 Base Instruction Descriptions
>> +	 *   C6.2 Alphabetical list of A64 base instructions
>> +	 *
>> +	 *   C6.2.144 CTZ
>> +	 *
>> +	 *     Count trailing zeros
>> +	 *
>> +	 *     This instruction counts the number of consecutive binary zero bits,
>> +	 *     starting from the least significant bit in the source register,
>> +	 *     and places the count in the destination register.
>> +	 *
>> +	 *     This instruction requires FEAT_CSSC.
>> +	 */
>> +	/* CTZ Xd, Xn */
>> +	return 0xdac01800 | (rn << 5) | rd;
>> +}
>> +
>> +static inline u32 a64_rbit64(u8 rd, u8 rn)
>> +{
>> +	/*
>> +	 * Arm Architecture Reference Manual for A-profile architecture
>> +	 * (Document number: ARM DDI 0487)
>> +	 *
>> +	 *   A64 Base Instruction Descriptions
>> +	 *   C6.2 Alphabetical list of A64 base instructions
>> +	 *
>> +	 *   C6.2.320 RBIT
>> +	 *
>> +	 *     Reverse bits
>> +	 *
>> +	 *     This instruction reverses the bit order in a register.
>> +	 */
>> +	/* RBIT Xd, Xn */
>> +	return 0xdac00000 | (rn << 5) | rd;
>> +}
> 
> I don't think adding the above three functions is the best way to JIT these
> instructions; do it like the other data1 and data2 instructions and add
> them to the generic framework, like the following patch (untested) does:
> 
> -- >8 --
> 
> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> index 18c7811774d3..b2696af0b817 100644
> --- a/arch/arm64/include/asm/insn.h
> +++ b/arch/arm64/include/asm/insn.h
> @@ -221,6 +221,9 @@ enum aarch64_insn_data1_type {
>  	AARCH64_INSN_DATA1_REVERSE_16,
>  	AARCH64_INSN_DATA1_REVERSE_32,
>  	AARCH64_INSN_DATA1_REVERSE_64,
> +	AARCH64_INSN_DATA1_RBIT,
> +	AARCH64_INSN_DATA1_CLZ,
> +	AARCH64_INSN_DATA1_CTZ,
>  };
> 
>  enum aarch64_insn_data2_type {
> @@ -389,6 +392,9 @@ __AARCH64_INSN_FUNCS(rorv,	0x7FE0FC00, 0x1AC02C00)
>  __AARCH64_INSN_FUNCS(rev16,	0x7FFFFC00, 0x5AC00400)
>  __AARCH64_INSN_FUNCS(rev32,	0x7FFFFC00, 0x5AC00800)
>  __AARCH64_INSN_FUNCS(rev64,	0x7FFFFC00, 0x5AC00C00)
> +__AARCH64_INSN_FUNCS(rbit,	0x7FFFFC00, 0x5AC00000)
> +__AARCH64_INSN_FUNCS(clz,	0x7FFFFC00, 0x5AC01000)
> +__AARCH64_INSN_FUNCS(ctz,	0x7FFFFC00, 0x5AC01800)
>  __AARCH64_INSN_FUNCS(and,	0x7F200000, 0x0A000000)
>  __AARCH64_INSN_FUNCS(bic,	0x7F200000, 0x0A200000)
>  __AARCH64_INSN_FUNCS(orr,	0x7F200000, 0x2A000000)
> diff --git a/arch/arm64/lib/insn.c b/arch/arm64/lib/insn.c
> index 4e298baddc2e..2229ab596cda 100644
> --- a/arch/arm64/lib/insn.c
> +++ b/arch/arm64/lib/insn.c
> @@ -1008,6 +1008,15 @@ u32 aarch64_insn_gen_data1(enum aarch64_insn_register dst,
>  		}
>  		insn = aarch64_insn_get_rev64_value();
>  		break;
> +	case AARCH64_INSN_DATA1_CLZ:
> +		insn = aarch64_insn_get_clz_value();
> +		break;
> +	case AARCH64_INSN_DATA1_RBIT:
> +		insn = aarch64_insn_get_rbit_value();
> +		break;
> +	case AARCH64_INSN_DATA1_CTZ:
> +		insn = aarch64_insn_get_ctz_value();
> +		break;
>  	default:
>  		pr_err("%s: unknown data1 encoding %d\n", __func__, type);
>  		return AARCH64_BREAK_FAULT;
> diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
> index bbea4f36f9f2..af806c39dadb 100644
> --- a/arch/arm64/net/bpf_jit.h
> +++ b/arch/arm64/net/bpf_jit.h
> @@ -248,6 +248,12 @@
>  #define A64_REV16(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_16)
>  #define A64_REV32(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, REVERSE_32)
>  #define A64_REV64(Rd, Rn)     A64_DATA1(1, Rd, Rn, REVERSE_64)
> +/* Rd = RBIT(Rn) */
> +#define A64_RBIT(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, RBIT)
> +/* Rd = CLZ(Rn) */
> +#define A64_CLZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CLZ)
> +/* Rd = CTZ(Rn) */
> +#define A64_CTZ(sf, Rd, Rn) A64_DATA1(sf, Rd, Rn, CTZ)
> 
>  /* Data-processing (2 source) */
>  /* Rd = Rn OP Rm */
> 
> -- 8< --
> 
> Thanks,
> Puranjay

Ack.

I'll do it in the next revision.

Thanks,
Leon


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
  2026-02-19 14:29 ` [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64 Leon Hwang
@ 2026-02-19 17:47   ` Alexei Starovoitov
  2026-02-20 15:54     ` Leon Hwang
  0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2026-02-19 17:47 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye, Luis Gerhorst,
	Viktor Malik, linux-arm-kernel, LKML, Network Development,
	open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot

On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.
>
> bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
>
> bpf_ctz64() and bpf_ffs64() are supported when the CPU has
> X86_FEATURE_BMI1 (TZCNT).
>
> bpf_clz64() and bpf_fls64() are supported when the CPU has
> X86_FEATURE_ABM (LZCNT).
>
> bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
>
> bpf_bitrev64() is not inlined as x86_64 has no native bit-reverse
> instruction, so it falls back to a regular function call.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 141 insertions(+)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 070ba80e39d7..193e1e2d7aa8 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -19,6 +19,7 @@
>  #include <asm/text-patching.h>
>  #include <asm/unwind.h>
>  #include <asm/cfi.h>
> +#include <asm/cpufeatures.h>
>
>  static bool all_callee_regs_used[4] = {true, true, true, true};
>
> @@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
>         *pprog = prog;
>  }
>
> +static bool bpf_inlines_func_call(u8 **pprog, void *func)
> +{
> +       bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
> +       bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
> +       bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
> +       bool inlined = true;
> +       u8 *prog = *pprog;
> +
> +       /*
> +        * x86 Bit manipulation instruction set
> +        * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
> +        */
> +
> +       if (func == bpf_clz64 && has_abm) {
> +               /*
> +                * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
> +                *
> +                *   LZCNT - Count the Number of Leading Zero Bits
> +                *
> +                *     Opcode/Instruction
> +                *     F3 REX.W 0F BD /r
> +                *     LZCNT r64, r/m64
> +                *
> +                *     Op/En
> +                *     RVM
> +                *
> +                *     64/32-bit Mode
> +                *     V/N.E.
> +                *
> +                *     CPUID Feature Flag
> +                *     LZCNT
> +                *
> +                *     Description
> +                *     Count the number of leading zero bits in r/m64, return
> +                *     result in r64.
> +                */
> +               /* emit: x ? 64 - fls64(x) : 64 */
> +               /* lzcnt rax, rdi */
> +               EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);

Instead of emitting binary in x86 and arm JITs,
let's use in kernel disasm to check that all these kfuncs
conform to kf_fastcall (don't use unnecessary registers,
don't have calls to other functions) and then copy the binary
from code and skip the last 'ret' insn.
This way we can inline all kinds of kfuncs.

pw-bot: cr

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 1/6] bpf: Introduce 64-bit bitops kfuncs
  2026-02-19 14:29 ` [PATCH bpf-next v2 1/6] " Leon Hwang
@ 2026-02-19 17:50   ` Alexei Starovoitov
  2026-02-20 15:34     ` Leon Hwang
  0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2026-02-19 17:50 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye, Luis Gerhorst,
	Viktor Malik, linux-arm-kernel, LKML, Network Development,
	open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot

On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
> +static bool bpf_kfunc_is_fastcall(struct bpf_verifier_env *env, u32 func_id, u32 flags)
> +{
> +       if (!(flags & KF_FASTCALL))
> +               return false;
> +
> +       if (!env->prog->jit_requested)
> +               return true;
> +
> +       if (func_id == special_kfunc_list[KF_bpf_clz64])
> +               return bpf_jit_inlines_kfunc_call(bpf_clz64);
> +       if (func_id == special_kfunc_list[KF_bpf_ctz64])
> +               return bpf_jit_inlines_kfunc_call(bpf_ctz64);
> +       if (func_id == special_kfunc_list[KF_bpf_ffs64])
> +               return bpf_jit_inlines_kfunc_call(bpf_ffs64);
> +       if (func_id == special_kfunc_list[KF_bpf_fls64])
> +               return bpf_jit_inlines_kfunc_call(bpf_fls64);
> +       if (func_id == special_kfunc_list[KF_bpf_bitrev64])
> +               return bpf_jit_inlines_kfunc_call(bpf_bitrev64);
> +       if (func_id == special_kfunc_list[KF_bpf_popcnt64])
> +               return bpf_jit_inlines_kfunc_call(bpf_popcnt64);
> +       if (func_id == special_kfunc_list[KF_bpf_rol64])
> +               return bpf_jit_inlines_kfunc_call(bpf_rol64);
> +       if (func_id == special_kfunc_list[KF_bpf_ror64])
> +               return bpf_jit_inlines_kfunc_call(bpf_ror64);

This is too ugly. Find a way to do it differently.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 1/6] bpf: Introduce 64-bit bitops kfuncs
  2026-02-19 17:50   ` Alexei Starovoitov
@ 2026-02-20 15:34     ` Leon Hwang
  0 siblings, 0 replies; 19+ messages in thread
From: Leon Hwang @ 2026-02-20 15:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye, Luis Gerhorst,
	Viktor Malik, linux-arm-kernel, LKML, Network Development,
	open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot



On 2026/2/20 01:50, Alexei Starovoitov wrote:
> On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>> +static bool bpf_kfunc_is_fastcall(struct bpf_verifier_env *env, u32 func_id, u32 flags)
>> +{
>> +       if (!(flags & KF_FASTCALL))
>> +               return false;
>> +
>> +       if (!env->prog->jit_requested)
>> +               return true;
>> +
>> +       if (func_id == special_kfunc_list[KF_bpf_clz64])
>> +               return bpf_jit_inlines_kfunc_call(bpf_clz64);
>> +       if (func_id == special_kfunc_list[KF_bpf_ctz64])
>> +               return bpf_jit_inlines_kfunc_call(bpf_ctz64);
>> +       if (func_id == special_kfunc_list[KF_bpf_ffs64])
>> +               return bpf_jit_inlines_kfunc_call(bpf_ffs64);
>> +       if (func_id == special_kfunc_list[KF_bpf_fls64])
>> +               return bpf_jit_inlines_kfunc_call(bpf_fls64);
>> +       if (func_id == special_kfunc_list[KF_bpf_bitrev64])
>> +               return bpf_jit_inlines_kfunc_call(bpf_bitrev64);
>> +       if (func_id == special_kfunc_list[KF_bpf_popcnt64])
>> +               return bpf_jit_inlines_kfunc_call(bpf_popcnt64);
>> +       if (func_id == special_kfunc_list[KF_bpf_rol64])
>> +               return bpf_jit_inlines_kfunc_call(bpf_rol64);
>> +       if (func_id == special_kfunc_list[KF_bpf_ror64])
>> +               return bpf_jit_inlines_kfunc_call(bpf_ror64);
> 
> This is too ugly. Find a way to do it differently.

Agreed.

I'd like to introduce a new flag, KF_JIT_MAY_INLINE, to indicate that a
kfunc may be inlined by the JIT backends when possible. Kfuncs with
KF_FASTCALL but without KF_JIT_MAY_INLINE would then always be fastcall.

Thanks,
Leon


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
  2026-02-19 17:47   ` Alexei Starovoitov
@ 2026-02-20 15:54     ` Leon Hwang
  2026-02-20 17:50       ` Alexei Starovoitov
  0 siblings, 1 reply; 19+ messages in thread
From: Leon Hwang @ 2026-02-20 15:54 UTC (permalink / raw)
  To: Alexei Starovoitov, Ilya Leoshkevich
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye, Luis Gerhorst,
	Viktor Malik, linux-arm-kernel, LKML, Network Development,
	open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot



On 2026/2/20 01:47, Alexei Starovoitov wrote:
> On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.
>>
>> bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
>>
>> bpf_ctz64() and bpf_ffs64() are supported when the CPU has
>> X86_FEATURE_BMI1 (TZCNT).
>>
>> bpf_clz64() and bpf_fls64() are supported when the CPU has
>> X86_FEATURE_ABM (LZCNT).
>>
>> bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
>>
>> bpf_bitrev64() is not inlined as x86_64 has no native bit-reverse
>> instruction, so it falls back to a regular function call.
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>>  arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
>>  1 file changed, 141 insertions(+)
>>
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index 070ba80e39d7..193e1e2d7aa8 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -19,6 +19,7 @@
>>  #include <asm/text-patching.h>
>>  #include <asm/unwind.h>
>>  #include <asm/cfi.h>
>> +#include <asm/cpufeatures.h>
>>
>>  static bool all_callee_regs_used[4] = {true, true, true, true};
>>
>> @@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
>>         *pprog = prog;
>>  }
>>
>> +static bool bpf_inlines_func_call(u8 **pprog, void *func)
>> +{
>> +       bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
>> +       bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
>> +       bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
>> +       bool inlined = true;
>> +       u8 *prog = *pprog;
>> +
>> +       /*
>> +        * x86 Bit manipulation instruction set
>> +        * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
>> +        */
>> +
>> +       if (func == bpf_clz64 && has_abm) {
>> +               /*
>> +                * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
>> +                *
>> +                *   LZCNT - Count the Number of Leading Zero Bits
>> +                *
>> +                *     Opcode/Instruction
>> +                *     F3 REX.W 0F BD /r
>> +                *     LZCNT r64, r/m64
>> +                *
>> +                *     Op/En
>> +                *     RVM
>> +                *
>> +                *     64/32-bit Mode
>> +                *     V/N.E.
>> +                *
>> +                *     CPUID Feature Flag
>> +                *     LZCNT
>> +                *
>> +                *     Description
>> +                *     Count the number of leading zero bits in r/m64, return
>> +                *     result in r64.
>> +                */
>> +               /* emit: x ? 64 - fls64(x) : 64 */
>> +               /* lzcnt rax, rdi */
>> +               EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
> 
> Instead of emitting binary in x86 and arm JITs,
> let's use in kernel disasm to check that all these kfuncs
> conform to kf_fastcall (don't use unnecessary registers,
> don't have calls to other functions) and then copy the binary
> from code and skip the last 'ret' insn.
> This way we can inline all kinds of kfuncs.
> 

Good idea.

Quick question on “in-kernel disasm”: do you mean adding a kernel
instruction decoder/disassembler to validate a whitelist of kfuncs at
load time?

I’m trying to understand the intended scope:

* Is the expectation that we add an in-kernel disassembler/validator for
  a small set of supported instructions and patterns (no calls/jumps,
  only arg/ret regs touched, etc.)?
* Or is there already infrastructure you had in mind that we can reuse?

Once I understand that piece, I can rework the series to inline by
copying validated machine code (minus the final ret), rather than
emitting raw opcodes in the JITs.

I also noticed you mentioned a similar direction in "bpf/s390: Implement
get_preempt_count()" [1], so I’ve added Ilya to the thread to discuss
this approach further.

[1]
https://lore.kernel.org/bpf/CAADnVQKSMCohZy_HZwzNpFfTSnVu7rfxgmHEDgT9s28XxcDS5g@mail.gmail.com/

Thanks,
Leon


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
  2026-02-20 15:54     ` Leon Hwang
@ 2026-02-20 17:50       ` Alexei Starovoitov
  2026-02-21 12:45         ` Leon Hwang
  0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2026-02-20 17:50 UTC (permalink / raw)
  To: Leon Hwang
  Cc: Ilya Leoshkevich, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Puranjay Mohan, Xu Kuohai, Catalin Marinas,
	Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye,
	Luis Gerhorst, Viktor Malik, linux-arm-kernel, LKML,
	Network Development, open list:KERNEL SELFTEST FRAMEWORK,
	kernel-patches-bot

On Fri, Feb 20, 2026 at 7:54 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 2026/2/20 01:47, Alexei Starovoitov wrote:
> > On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >> Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.
> >>
> >> bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
> >>
> >> bpf_ctz64() and bpf_ffs64() are supported when the CPU has
> >> X86_FEATURE_BMI1 (TZCNT).
> >>
> >> bpf_clz64() and bpf_fls64() are supported when the CPU has
> >> X86_FEATURE_ABM (LZCNT).
> >>
> >> bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
> >>
> >> bpf_bitrev64() is not inlined as x86_64 has no native bit-reverse
> >> instruction, so it falls back to a regular function call.
> >>
> >> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> >> ---
> >>  arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 141 insertions(+)
> >>
> >> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> >> index 070ba80e39d7..193e1e2d7aa8 100644
> >> --- a/arch/x86/net/bpf_jit_comp.c
> >> +++ b/arch/x86/net/bpf_jit_comp.c
> >> @@ -19,6 +19,7 @@
> >>  #include <asm/text-patching.h>
> >>  #include <asm/unwind.h>
> >>  #include <asm/cfi.h>
> >> +#include <asm/cpufeatures.h>
> >>
> >>  static bool all_callee_regs_used[4] = {true, true, true, true};
> >>
> >> @@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
> >>         *pprog = prog;
> >>  }
> >>
> >> +static bool bpf_inlines_func_call(u8 **pprog, void *func)
> >> +{
> >> +       bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
> >> +       bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
> >> +       bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
> >> +       bool inlined = true;
> >> +       u8 *prog = *pprog;
> >> +
> >> +       /*
> >> +        * x86 Bit manipulation instruction set
> >> +        * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
> >> +        */
> >> +
> >> +       if (func == bpf_clz64 && has_abm) {
> >> +               /*
> >> +                * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
> >> +                *
> >> +                *   LZCNT - Count the Number of Leading Zero Bits
> >> +                *
> >> +                *     Opcode/Instruction
> >> +                *     F3 REX.W 0F BD /r
> >> +                *     LZCNT r64, r/m64
> >> +                *
> >> +                *     Op/En
> >> +                *     RVM
> >> +                *
> >> +                *     64/32-bit Mode
> >> +                *     V/N.E.
> >> +                *
> >> +                *     CPUID Feature Flag
> >> +                *     LZCNT
> >> +                *
> >> +                *     Description
> >> +                *     Count the number of leading zero bits in r/m64, return
> >> +                *     result in r64.
> >> +                */
> >> +               /* emit: x ? 64 - fls64(x) : 64 */
> >> +               /* lzcnt rax, rdi */
> >> +               EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
> >
> > Instead of emitting binary in x86 and arm JITs,
> > let's use in kernel disasm to check that all these kfuncs
> > conform to kf_fastcall (don't use unnecessary registers,
> > don't have calls to other functions) and then copy the binary
> > from code and skip the last 'ret' insn.
> > This way we can inline all kinds of kfuncs.
> >
>
> Good idea.
>
> Quick question on “in-kernel disasm”: do you mean adding a kernel
> instruction decoder/disassembler to validate a whitelist of kfuncs at
> load time?
>
> I’m trying to understand the intended scope:
>
> * Is the expectation that we add an in-kernel disassembler/validator for
>   a small set of supported instructions and patterns (no calls/jumps,
>   only arg/ret regs touched, etc.)?
> * Or is there already infrastructure you had in mind that we can reuse?
>
> Once I understand that piece, I can rework the series to inline by
> copying validated machine code (minus the final ret), rather than
> emitting raw opcodes in the JITs.
>
> I also noticed you mentioned a similar direction in "bpf/s390: Implement
> get_preempt_count()" [1], so I’ve added Ilya to the thread to discuss
> this approach further.

You really sound like an LLM. Do your homework as a human.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
  2026-02-20 17:50       ` Alexei Starovoitov
@ 2026-02-21 12:45         ` Leon Hwang
  2026-02-21 16:51           ` Alexei Starovoitov
  0 siblings, 1 reply; 19+ messages in thread
From: Leon Hwang @ 2026-02-21 12:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Ilya Leoshkevich, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Puranjay Mohan, Xu Kuohai, Catalin Marinas,
	Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye,
	Luis Gerhorst, Viktor Malik, linux-arm-kernel, LKML,
	Network Development, open list:KERNEL SELFTEST FRAMEWORK,
	kernel-patches-bot



On 2026/2/21 01:50, Alexei Starovoitov wrote:
> On Fri, Feb 20, 2026 at 7:54 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>>
>> On 2026/2/20 01:47, Alexei Starovoitov wrote:
>>> On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>>>
>>>> Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.
>>>>
>>>> bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
>>>>
>>>> bpf_ctz64() and bpf_ffs64() are supported when the CPU has
>>>> X86_FEATURE_BMI1 (TZCNT).
>>>>
>>>> bpf_clz64() and bpf_fls64() are supported when the CPU has
>>>> X86_FEATURE_ABM (LZCNT).
>>>>
>>>> bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
>>>>
>>>> bpf_bitrev64() is not inlined as x86_64 has no native bit-reverse
>>>> instruction, so it falls back to a regular function call.
>>>>
>>>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>>>> ---
>>>>  arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 141 insertions(+)
>>>>
>>>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>>>> index 070ba80e39d7..193e1e2d7aa8 100644
>>>> --- a/arch/x86/net/bpf_jit_comp.c
>>>> +++ b/arch/x86/net/bpf_jit_comp.c
>>>> @@ -19,6 +19,7 @@
>>>>  #include <asm/text-patching.h>
>>>>  #include <asm/unwind.h>
>>>>  #include <asm/cfi.h>
>>>> +#include <asm/cpufeatures.h>
>>>>
>>>>  static bool all_callee_regs_used[4] = {true, true, true, true};
>>>>
>>>> @@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
>>>>         *pprog = prog;
>>>>  }
>>>>
>>>> +static bool bpf_inlines_func_call(u8 **pprog, void *func)
>>>> +{
>>>> +       bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
>>>> +       bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
>>>> +       bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
>>>> +       bool inlined = true;
>>>> +       u8 *prog = *pprog;
>>>> +
>>>> +       /*
>>>> +        * x86 Bit manipulation instruction set
>>>> +        * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
>>>> +        */
>>>> +
>>>> +       if (func == bpf_clz64 && has_abm) {
>>>> +               /*
>>>> +                * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
>>>> +                *
>>>> +                *   LZCNT - Count the Number of Leading Zero Bits
>>>> +                *
>>>> +                *     Opcode/Instruction
>>>> +                *     F3 REX.W 0F BD /r
>>>> +                *     LZCNT r64, r/m64
>>>> +                *
>>>> +                *     Op/En
>>>> +                *     RVM
>>>> +                *
>>>> +                *     64/32-bit Mode
>>>> +                *     V/N.E.
>>>> +                *
>>>> +                *     CPUID Feature Flag
>>>> +                *     LZCNT
>>>> +                *
>>>> +                *     Description
>>>> +                *     Count the number of leading zero bits in r/m64, return
>>>> +                *     result in r64.
>>>> +                */
>>>> +               /* emit: x ? 64 - fls64(x) : 64 */
>>>> +               /* lzcnt rax, rdi */
>>>> +               EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
>>>
>>> Instead of emitting binary in x86 and arm JITs,
>>> let's use in kernel disasm to check that all these kfuncs
>>> conform to kf_fastcall (don't use unnecessary registers,
>>> don't have calls to other functions) and then copy the binary
>>> from code and skip the last 'ret' insn.
>>> This way we can inline all kinds of kfuncs.
>>>
>>
>> Good idea.
>>
>> Quick question on “in-kernel disasm”: do you mean adding a kernel
>> instruction decoder/disassembler to validate a whitelist of kfuncs at
>> load time?
>>
>> I’m trying to understand the intended scope:
>>
>> * Is the expectation that we add an in-kernel disassembler/validator for
>>   a small set of supported instructions and patterns (no calls/jumps,
>>   only arg/ret regs touched, etc.)?
>> * Or is there already infrastructure you had in mind that we can reuse?
>>
>> Once I understand that piece, I can rework the series to inline by
>> copying validated machine code (minus the final ret), rather than
>> emitting raw opcodes in the JITs.
>>
>> I also noticed you mentioned a similar direction in "bpf/s390: Implement
>> get_preempt_count()" [1], so I’ve added Ilya to the thread to discuss
>> this approach further.
> 
> You really sound like LLM. Do your homework as a human.

Got it.

I polished my draft using ChatGPT, which would leave LLM smell in my reply.

Here's my original draft:

Good idea. But I concern about the "in kernel disasm". Do you mean we
will build a disassembler for whitelist kfuncs at starting?

I noticed you've mentioned the same direction in "bpf/s390: Implement
get_preempt_count()" [1]. So, I added Ilya here to discuss this direction.

[1]
https://lore.kernel.org/bpf/CAADnVQKSMCohZy_HZwzNpFfTSnVu7rfxgmHEDgT9s28XxcDS5g@mail.gmail.com/

Thanks,
Leon
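
The LZCNT mapping quoted above relies on the identity `clz64(x) == (x ? 64 - fls64(x) : 64)`, which is exactly what LZCNT computes (LZCNT of 0 yields the operand width, 64). The zero-input semantics defined in the cover letter can be modeled in a short userspace C sketch; the helper names mirror the kfuncs but are illustrative, not the kernel implementations:

```c
#include <stdint.h>

/* Userspace models of the kfunc semantics from the cover letter.
 * The GCC/Clang builtins are undefined for 0, so the zero cases are
 * guarded to match the documented behavior:
 * clz64(0)=64, ctz64(0)=64, ffs64(0)=0, fls64(0)=0. */
int clz64(uint64_t x) { return x ? __builtin_clzll(x) : 64; }
int ctz64(uint64_t x) { return x ? __builtin_ctzll(x) : 64; }
int ffs64(uint64_t x) { return x ? __builtin_ctzll(x) + 1 : 0; }
int fls64(uint64_t x) { return x ? 64 - __builtin_clzll(x) : 0; }
```

With these definitions, `clz64(x) == 64 - fls64(x)` holds for every nonzero x, and both degenerate cases land on 64, matching the `/* emit: x ? 64 - fls64(x) : 64 */` comment in the patch.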



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
  2026-02-21 12:45         ` Leon Hwang
@ 2026-02-21 16:51           ` Alexei Starovoitov
  2026-02-23 16:35             ` Leon Hwang
  0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2026-02-21 16:51 UTC (permalink / raw)
  To: Leon Hwang
  Cc: Ilya Leoshkevich, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Puranjay Mohan, Xu Kuohai, Catalin Marinas,
	Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye,
	Luis Gerhorst, Viktor Malik, linux-arm-kernel, LKML,
	Network Development, open list:KERNEL SELFTEST FRAMEWORK,
	kernel-patches-bot

On Sat, Feb 21, 2026 at 4:45 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 2026/2/21 01:50, Alexei Starovoitov wrote:
> > On Fri, Feb 20, 2026 at 7:54 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >>
> >>
> >> On 2026/2/20 01:47, Alexei Starovoitov wrote:
> >>> On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>>>
> >>>> Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.
> >>>>
> >>>> bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
> >>>>
> >>>> bpf_ctz64() and bpf_ffs64() are supported when the CPU has
> >>>> X86_FEATURE_BMI1 (TZCNT).
> >>>>
> >>>> bpf_clz64() and bpf_fls64() are supported when the CPU has
> >>>> X86_FEATURE_ABM (LZCNT).
> >>>>
> >>>> bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
> >>>>
> >>>> bpf_bitrev64() is not inlined as x86_64 has no native bit-reverse
> >>>> instruction, so it falls back to a regular function call.
> >>>>
> >>>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> >>>> ---
> >>>>  arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 141 insertions(+)
> >>>>
> >>>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> >>>> index 070ba80e39d7..193e1e2d7aa8 100644
> >>>> --- a/arch/x86/net/bpf_jit_comp.c
> >>>> +++ b/arch/x86/net/bpf_jit_comp.c
> >>>> @@ -19,6 +19,7 @@
> >>>>  #include <asm/text-patching.h>
> >>>>  #include <asm/unwind.h>
> >>>>  #include <asm/cfi.h>
> >>>> +#include <asm/cpufeatures.h>
> >>>>
> >>>>  static bool all_callee_regs_used[4] = {true, true, true, true};
> >>>>
> >>>> @@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
> >>>>         *pprog = prog;
> >>>>  }
> >>>>
> >>>> +static bool bpf_inlines_func_call(u8 **pprog, void *func)
> >>>> +{
> >>>> +       bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
> >>>> +       bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
> >>>> +       bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
> >>>> +       bool inlined = true;
> >>>> +       u8 *prog = *pprog;
> >>>> +
> >>>> +       /*
> >>>> +        * x86 Bit manipulation instruction set
> >>>> +        * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
> >>>> +        */
> >>>> +
> >>>> +       if (func == bpf_clz64 && has_abm) {
> >>>> +               /*
> >>>> +                * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
> >>>> +                *
> >>>> +                *   LZCNT - Count the Number of Leading Zero Bits
> >>>> +                *
> >>>> +                *     Opcode/Instruction
> >>>> +                *     F3 REX.W 0F BD /r
> >>>> +                *     LZCNT r64, r/m64
> >>>> +                *
> >>>> +                *     Op/En
> >>>> +                *     RVM
> >>>> +                *
> >>>> +                *     64/32-bit Mode
> >>>> +                *     V/N.E.
> >>>> +                *
> >>>> +                *     CPUID Feature Flag
> >>>> +                *     LZCNT
> >>>> +                *
> >>>> +                *     Description
> >>>> +                *     Count the number of leading zero bits in r/m64, return
> >>>> +                *     result in r64.
> >>>> +                */
> >>>> +               /* emit: x ? 64 - fls64(x) : 64 */
> >>>> +               /* lzcnt rax, rdi */
> >>>> +               EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
> >>>
> >>> Instead of emitting binary in x86 and arm JITs,
> >>> let's use in kernel disasm to check that all these kfuncs
> >>> conform to kf_fastcall (don't use unnecessary registers,
> >>> don't have calls to other functions) and then copy the binary
> >>> from code and skip the last 'ret' insn.
> >>> This way we can inline all kinds of kfuncs.
> >>>
> >>
> >> Good idea.
> >>
> >> Quick question on “in-kernel disasm”: do you mean adding a kernel
> >> instruction decoder/disassembler to validate a whitelist of kfuncs at
> >> load time?
> >>
> >> I’m trying to understand the intended scope:
> >>
> >> * Is the expectation that we add an in-kernel disassembler/validator for
> >>   a small set of supported instructions and patterns (no calls/jumps,
> >>   only arg/ret regs touched, etc.)?
> >> * Or is there already infrastructure you had in mind that we can reuse?
> >>
> >> Once I understand that piece, I can rework the series to inline by
> >> copying validated machine code (minus the final ret), rather than
> >> emitting raw opcodes in the JITs.
> >>
> >> I also noticed you mentioned a similar direction in "bpf/s390: Implement
> >> get_preempt_count()" [1], so I’ve added Ilya to the thread to discuss
> >> this approach further.
> >
> > You really sound like LLM. Do your homework as a human.
>
> Got it.
>
> I polished my draft using ChatGPT, which would leave LLM smell in my reply.

... and for anyone reading it the smell is ohh too strong.

> Here's my original draft:
>
> Good idea. But I concern about the "in kernel disasm". Do you mean we
> will build a disassembler for whitelist kfuncs at starting?
>
> I noticed you've mentioned the same direction in "bpf/s390: Implement
> get_preempt_count()" [1]. So, I added Ilya here to discuss this direction.

Much better. Keep it human.

"in kernel disasm" already exists for some architectures
(at least x86 and arm64) since it's being used by kprobes.
The ask here is to figure out whether they're usable for such
insn analysis. x86 disasm is likely capable.

re:"whitelist kfunc"
I suspect an additional list is not necessary.
kf_fastcall is a good enough signal that such kfunc should
be inlinable.



* Re: [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64
  2026-02-21 16:51           ` Alexei Starovoitov
@ 2026-02-23 16:35             ` Leon Hwang
  0 siblings, 0 replies; 19+ messages in thread
From: Leon Hwang @ 2026-02-23 16:35 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Ilya Leoshkevich, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Puranjay Mohan, Xu Kuohai, Catalin Marinas,
	Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, X86 ML, H . Peter Anvin, Shuah Khan, Peilin Ye,
	Luis Gerhorst, Viktor Malik, linux-arm-kernel, LKML,
	Network Development, open list:KERNEL SELFTEST FRAMEWORK,
	kernel-patches-bot



On 2026/2/22 00:51, Alexei Starovoitov wrote:
> On Sat, Feb 21, 2026 at 4:45 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>

[...]

>>
>> Good idea. But I concern about the "in kernel disasm". Do you mean we
>> will build a disassembler for whitelist kfuncs at starting?
>>
>> I noticed you've mentioned the same direction in "bpf/s390: Implement
>> get_preempt_count()" [1]. So, I added Ilya here to discuss this direction.
> 
> Much better. Keep it human.
> 
> "in kernel disasm" already exists for some architectures
> (at least x86 and arm64) since it's being used by kprobes.
> The ask here is to figure out whether they're usable for such
> insn analysis. x86 disasm is likely capable.
> 

After looking into the x86 and arm64 insn decoders, they are able to do this kind of insn analysis.

> re:"whitelist kfunc"
> I suspect an additional list is not necessary.
> kf_fastcall is a good enough signal that such kfunc should
> be inlinable.

I thought the idea was to build a lightweight custom disassembler that would
only support the limited machine code of a whitelist of kfuncs.

Obviously, I was wrong.

We can reuse the in-kernel insn decoding ability to validate a fastcall
function by checking which registers it uses.

I'll post an RFC after finishing the PoC, on both x86_64 and arm64 of course.

Thanks,
Leon
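
The register-usage check Leon describes can be sketched as a toy userspace model. The kernel's real x86 decoder lives in arch/x86/lib/insn.c (used by kprobes, as noted above); the struct and register numbering below are illustrative stand-ins for decoded-instruction data, not the kernel API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy record standing in for one decoded instruction; the real
 * decoder is arch/x86/lib/insn.c. Register numbers follow the
 * x86-64 convention: 0 = rax (return), 7 = rdi (first argument). */
struct toy_insn {
	bool is_call;       /* calls another function -> not inlinable */
	bool is_ret;        /* the final ret is dropped when copying   */
	uint16_t regs_used; /* bitmask of GP registers the insn touches */
};

/* A body conforms to the fastcall-style contract when it contains no
 * calls, touches only the allowed arg/return registers, and ends in a
 * single ret -- so its bytes could be copied into the BPF image with
 * the trailing ret skipped. */
static bool toy_is_inlinable(const struct toy_insn *insns, size_t n,
			     uint16_t allowed_regs)
{
	for (size_t i = 0; i < n; i++) {
		if (insns[i].is_call)
			return false;
		if (insns[i].is_ret)
			return i == n - 1; /* only a single, final ret */
		if (insns[i].regs_used & ~allowed_regs)
			return false;
	}
	return false; /* never saw the terminating ret */
}
```

For bpf_clz64() on x86_64, the body would be just `lzcnt rax, rdi; ret`, which touches only rax/rdi and trivially passes such a check.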



end of thread, other threads:[~2026-02-23 16:35 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-19 14:29 [PATCH bpf-next v2 0/6] bpf: Introduce 64-bit bitops kfuncs Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 1/6] " Leon Hwang
2026-02-19 17:50   ` Alexei Starovoitov
2026-02-20 15:34     ` Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 2/6] bpf, x86: Add 64-bit bitops kfuncs support for x86_64 Leon Hwang
2026-02-19 17:47   ` Alexei Starovoitov
2026-02-20 15:54     ` Leon Hwang
2026-02-20 17:50       ` Alexei Starovoitov
2026-02-21 12:45         ` Leon Hwang
2026-02-21 16:51           ` Alexei Starovoitov
2026-02-23 16:35             ` Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 3/6] bpf, arm64: Add 64-bit bitops kfuncs support Leon Hwang
2026-02-19 15:10   ` Puranjay Mohan
2026-02-19 15:20   ` Puranjay Mohan
2026-02-19 15:25   ` Puranjay Mohan
2026-02-19 15:36     ` Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 4/6] selftests/bpf: Add tests for 64-bit bitops kfuncs Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 5/6] selftests/bpf: Add __cpu_feature annotation for CPU-feature-gated tests Leon Hwang
2026-02-19 14:29 ` [PATCH bpf-next v2 6/6] selftests/bpf: Add JIT disassembly tests for 64-bit bitops kfuncs Leon Hwang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox