* [PATCH bpf-next v13 1/6] bpf: adjust BPF JIT dependency to BPF_SYSCALL
2026-04-18 13:15 [PATCH bpf-next v13 0/6] bpf: add cpu time counter kfuncs Puranjay Mohan
@ 2026-04-18 13:15 ` Puranjay Mohan
2026-04-18 13:46 ` sashiko-bot
2026-04-18 13:16 ` [PATCH bpf-next v13 2/6] bpf: add bpf_get_cpu_time_counter kfunc Puranjay Mohan
` (4 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Puranjay Mohan @ 2026-04-18 13:15 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Xu Kuohai,
Vadim Fedorenko, Catalin Marinas, Will Deacon, kernel-team,
Vadim Fedorenko
From: Vadim Fedorenko <vadfed@meta.com>
BPF JIT is moving towards optimizing kfuncs and it was long overdue to
switch the dependency. Let's do it now to simplify other patches in the
series.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/bpf/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index eb3de35734f0..b0f49be35675 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -41,7 +41,7 @@ config BPF_SYSCALL
config BPF_JIT
bool "Enable BPF Just In Time compiler"
- depends on BPF
+ depends on BPF_SYSCALL
depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
select EXECMEM
help
--
2.52.0
^ permalink raw reply related [flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v13 1/6] bpf: adjust BPF JIT dependency to BPF_SYSCALL
2026-04-18 13:15 ` [PATCH bpf-next v13 1/6] bpf: adjust BPF JIT dependency to BPF_SYSCALL Puranjay Mohan
@ 2026-04-18 13:46 ` sashiko-bot
0 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-04-18 13:46 UTC (permalink / raw)
To: Puranjay Mohan; +Cc: bpf
Sashiko AI review found 1 potential issue(s):
- [High] Making `CONFIG_BPF_JIT` depend on `CONFIG_BPF_SYSCALL` silently disables JIT compilation for seccomp and classic BPF (cBPF) socket filters on systems that intentionally disable the `bpf()` syscall.
--
commit b9f1b2db51ab2071f24b457fa58ed412ec5bc050
Author: Vadim Fedorenko <vadim.fedorenko@linux.dev>
bpf: adjust BPF JIT dependency to BPF_SYSCALL
This commit changes the Kconfig dependency for BPF_JIT from BPF to
BPF_SYSCALL. The intention is to simplify other patches in the series as
BPF JIT moves towards optimizing kfuncs.
> diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
> index eb3de35734f09..b0f49be35675f 100644
> --- a/kernel/bpf/Kconfig
> +++ b/kernel/bpf/Kconfig
> @@ -41,7 +41,7 @@ config BPF_SYSCALL
>
> config BPF_JIT
> bool "Enable BPF Just In Time compiler"
> - depends on BPF
> + depends on BPF_SYSCALL
Does this change cause a regression by silently disabling JIT compilation
for seccomp and classic BPF socket filters on systems that intentionally
disable the bpf() syscall?
The kernel's internal BPF engine (CONFIG_BPF) translates classic BPF to
eBPF for internal users like seccomp and SO_ATTACH_FILTER. The JIT
compiler (CONFIG_BPF_JIT) optimizes these internal programs independently
of the bpf() syscall.
Systems can configure CONFIG_BPF_JIT=y and CONFIG_BPF_SYSCALL=n to prevent
userspace from loading arbitrary eBPF programs via the syscall.
By tying BPF_JIT directly to BPF_SYSCALL, this change forces these
environments to either run filters in the interpreter or expose the bpf()
syscall.
Additionally, does this force architectures that only implement
HAVE_CBPF_JIT to enable BPF_SYSCALL, pulling in unused eBPF verifier and
map infrastructure just to use their classic JIT compiler?
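For reference, a minimal .config fragment illustrating the combination this review is concerned about (illustrative example, not taken from the patch):

```
# Before this patch: JIT classic BPF users (seccomp, SO_ATTACH_FILTER)
# without exposing the bpf() syscall to userspace.
CONFIG_BPF=y
# CONFIG_BPF_SYSCALL is not set
CONFIG_BPF_JIT=y

# After this patch, enabling the JIT requires:
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT=y
```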
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260418131614.1501848-1-puranjay@kernel.org?part=1
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH bpf-next v13 2/6] bpf: add bpf_get_cpu_time_counter kfunc
2026-04-18 13:15 [PATCH bpf-next v13 0/6] bpf: add cpu time counter kfuncs Puranjay Mohan
2026-04-18 13:15 ` [PATCH bpf-next v13 1/6] bpf: adjust BPF JIT dependency to BPF_SYSCALL Puranjay Mohan
@ 2026-04-18 13:16 ` Puranjay Mohan
2026-04-18 14:24 ` sashiko-bot
2026-04-18 13:16 ` [PATCH bpf-next v13 3/6] bpf: add bpf_cpu_time_counter_to_ns kfunc Puranjay Mohan
` (3 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Puranjay Mohan @ 2026-04-18 13:16 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Xu Kuohai,
Vadim Fedorenko, Catalin Marinas, Will Deacon, kernel-team,
Vadim Fedorenko, Yonghong Song
From: Vadim Fedorenko <vadfed@meta.com>
New kfunc to return an ARCH-specific time counter. The main reason to
implement this kfunc is to avoid the extra overhead of benchmark
measurements, which are usually done with a pair of bpf_ktime_get_ns()
calls at the beginning and at the end of the code block under benchmark.
When fully JITed, this function doesn't implement conversion to the
monotonic clock and saves some CPU cycles by reading the time counter
in a single-digit number of instructions. The delta values can be
translated into nanoseconds using the kfunc introduced in the next patch.
For x86_64, the BPF JIT converts this kfunc into an ordered rdtsc call.
Other architectures will get JIT implementations too where supported. The
fallback is to return the CLOCK_MONOTONIC_RAW value in ns.
The JITed version of the function uses the "LFENCE; RDTSC" variant because
it doesn't need the cookie value returned by "RDTSCP" and doesn't want to
clobber the RCX value. The LFENCE option provides the same ordering
guarantee as the RDTSCP variant.
The simplest use-case is added in the 5th patch, where we calculate the
time spent by the bpf_get_ns_current_pid_tgid() kfunc. A more complex
example is to use a session cookie to store the time counter value at
kprobe/uprobe entry using kprobe.session/uprobe.session, and to calculate
the difference at kretprobe/uretprobe.
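The basic pattern the series targets looks roughly like the sketch below (illustrative only, not part of the patch; "some_kernel_func" is a hypothetical attach point, the kfunc externs are assumed to come from the usual BPF headers, and bpf_cpu_time_counter_to_ns() is the conversion kfunc from the next patch):

```c
/* Measure a code block with the new kfuncs instead of a pair of
 * bpf_ktime_get_ns() calls. */
extern u64 bpf_get_cpu_time_counter(void) __ksym;
extern u64 bpf_cpu_time_counter_to_ns(u64 counter) __ksym;

SEC("fentry/some_kernel_func")
int BPF_PROG(bench)
{
	u64 start, delta_ns;

	start = bpf_get_cpu_time_counter();
	/* ... code block under benchmark ... */
	delta_ns = bpf_cpu_time_counter_to_ns(bpf_get_cpu_time_counter() - start);
	bpf_printk("block took %llu ns", delta_ns);
	return 0;
}
```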
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
arch/x86/net/bpf_jit_comp.c | 45 +++++++++++++++++++++++++++++++++++
arch/x86/net/bpf_jit_comp32.c | 1 +
include/linux/bpf.h | 3 +++
include/linux/filter.h | 1 +
kernel/bpf/core.c | 11 +++++++++
kernel/bpf/helpers.c | 11 +++++++++
kernel/bpf/verifier.c | 4 +++-
7 files changed, 75 insertions(+), 1 deletion(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index ea9e707e8abf..7cda5589107b 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -16,6 +16,7 @@
#include <asm/ftrace.h>
#include <asm/set_memory.h>
#include <asm/nospec-branch.h>
+#include <asm/timer.h>
#include <asm/text-patching.h>
#include <asm/unwind.h>
#include <asm/cfi.h>
@@ -2445,6 +2446,40 @@ st: if (is_imm8(insn->off))
/* call */
case BPF_JMP | BPF_CALL: {
+ u8 *ip = image + addrs[i - 1];
+
+ if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
+ imm32 == BPF_CALL_IMM(bpf_get_cpu_time_counter) &&
+ bpf_jit_inlines_kfunc_call(imm32)) {
+ /* The default implementation of this kfunc uses
+ * ktime_get_raw_ns() which effectively is implemented as
+ * `(u64)rdtsc_ordered() & S64_MAX`. For JIT we skip the
+ * masking part because we assume it's not needed in the BPF
+ * use case (two measurements close in time).
+ * Original code for rdtsc_ordered() uses sequence:
+ * 'rdtsc; nop; nop; nop' to patch it into
+ * 'lfence; rdtsc' or 'rdtscp' depending on CPU features.
+ * JIT uses 'lfence; rdtsc' variant because BPF program
+ * doesn't care about cookie provided by rdtscp in RCX.
+ * Save RDX because RDTSC will use EDX:EAX to return u64
+ */
+ emit_mov_reg(&prog, true, AUX_REG, BPF_REG_3);
+ if (cpu_feature_enabled(X86_FEATURE_LFENCE_RDTSC))
+ EMIT_LFENCE();
+ EMIT2(0x0F, 0x31);
+
+ /* shl RDX, 32 */
+ maybe_emit_1mod(&prog, BPF_REG_3, true);
+ EMIT3(0xC1, add_1reg(0xE0, BPF_REG_3), 32);
+ /* or RAX, RDX */
+ maybe_emit_mod(&prog, BPF_REG_0, BPF_REG_3, true);
+ EMIT2(0x09, add_2reg(0xC0, BPF_REG_0, BPF_REG_3));
+ /* restore RDX from R11 */
+ emit_mov_reg(&prog, true, BPF_REG_3, AUX_REG);
+
+ break;
+ }
+
func = (u8 *) __bpf_call_base + imm32;
if (src_reg == BPF_PSEUDO_CALL && tail_call_reachable) {
LOAD_TAIL_CALL_CNT_PTR(stack_depth);
@@ -4081,3 +4116,13 @@ bool bpf_jit_supports_fsession(void)
{
return true;
}
+
+/* x86-64 JIT can inline kfunc */
+bool bpf_jit_inlines_kfunc_call(s32 imm)
+{
+ if (imm == BPF_CALL_IMM(bpf_get_cpu_time_counter) &&
+ cpu_feature_enabled(X86_FEATURE_TSC) &&
+ using_native_sched_clock() && sched_clock_stable())
+ return true;
+ return false;
+}
diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index 852baf2e4db4..ca208378c979 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -16,6 +16,7 @@
#include <asm/set_memory.h>
#include <asm/nospec-branch.h>
#include <asm/asm-prototypes.h>
+#include <asm/timer.h>
#include <linux/bpf.h>
/*
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b4b703c90ca9..74abf2b639fd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3744,6 +3744,9 @@ void bpf_user_rnd_init_once(void);
u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
u64 bpf_get_raw_cpu_id(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
+/* Inlined kfuncs */
+u64 bpf_get_cpu_time_counter(void);
+
#if defined(CONFIG_NET)
bool bpf_sock_common_is_valid_access(int off, int size,
enum bpf_access_type type,
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1ec6d5ba64cc..4924c0bab1a0 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1159,6 +1159,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
void bpf_jit_compile(struct bpf_prog *prog);
bool bpf_jit_needs_zext(void);
bool bpf_jit_inlines_helper_call(s32 imm);
+bool bpf_jit_inlines_kfunc_call(s32 imm);
bool bpf_jit_supports_subprog_tailcalls(void);
bool bpf_jit_supports_percpu_insn(void);
bool bpf_jit_supports_kfunc_call(void);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 8b018ff48875..e06e750ee16f 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3201,6 +3201,17 @@ bool __weak bpf_jit_inlines_helper_call(s32 imm)
return false;
}
+/* Return true if the JIT inlines the call to the kfunc corresponding to
+ * the imm.
+ *
+ * The verifier will not patch the insn->imm for the call to the helper if
+ * this returns true.
+ */
+bool __weak bpf_jit_inlines_kfunc_call(s32 imm)
+{
+ return false;
+}
+
/* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */
bool __weak bpf_jit_supports_subprog_tailcalls(void)
{
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 2bb60200c266..dfe280440120 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -4664,6 +4664,16 @@ __bpf_kfunc int bpf_timer_cancel_async(struct bpf_timer *timer)
}
}
+__bpf_kfunc u64 bpf_get_cpu_time_counter(void)
+{
+ /* CLOCK_MONOTONIC_RAW is the closest analogue to what is implemented
+ * in JIT. The access time is the same as for CLOCK_MONOTONIC, but the
+ * slope of 'raw' is not affected by NTP adjustments, and with stable
+ * TSC it can provide less jitter in short term measurements.
+ */
+ return ktime_get_raw_fast_ns();
+}
+
__bpf_kfunc_end_defs();
static void bpf_task_work_cancel_scheduled(struct irq_work *irq_work)
@@ -4859,6 +4869,7 @@ BTF_ID_FLAGS(func, bpf_task_work_schedule_resume, KF_IMPLICIT_ARGS)
BTF_ID_FLAGS(func, bpf_dynptr_from_file)
BTF_ID_FLAGS(func, bpf_dynptr_file_discard)
BTF_ID_FLAGS(func, bpf_timer_cancel_async)
+BTF_ID_FLAGS(func, bpf_get_cpu_time_counter)
BTF_KFUNCS_END(common_btf_ids)
static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 69d75515ed3f..b8d26e1bff48 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17035,7 +17035,9 @@ bool bpf_get_call_summary(struct bpf_verifier_env *env, struct bpf_insn *call,
/* error would be reported later */
return false;
cs->num_params = btf_type_vlen(meta.func_proto);
- cs->fastcall = meta.kfunc_flags & KF_FASTCALL;
+ cs->fastcall = (meta.kfunc_flags & KF_FASTCALL) ||
+ (meta.btf == btf_vmlinux &&
+ bpf_jit_inlines_kfunc_call(call->imm));
cs->is_void = btf_type_is_void(btf_type_by_id(meta.btf, meta.func_proto->type));
return true;
}
--
2.52.0
^ permalink raw reply related [flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v13 2/6] bpf: add bpf_get_cpu_time_counter kfunc
2026-04-18 13:16 ` [PATCH bpf-next v13 2/6] bpf: add bpf_get_cpu_time_counter kfunc Puranjay Mohan
@ 2026-04-18 14:24 ` sashiko-bot
0 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-04-18 14:24 UTC (permalink / raw)
To: Puranjay Mohan; +Cc: bpf
Sashiko AI review found 3 potential issue(s):
- [High] Type confusion in `bpf_jit_inlines_kfunc_call` breaks fastcall optimization and hides a TOCTOU register corruption vulnerability.
- [High] The `bpf_get_cpu_time_counter()` kfunc returns inconsistent physical units (CPU cycles vs. nanoseconds) depending on JIT execution.
- [Medium] Missing memory barrier for `RDTSC` allows speculative out-of-order execution on certain CPUs.
--
commit e2ed941037816a39f712a14ccb3383436319583d
Author: Vadim Fedorenko <vadim.fedorenko@linux.dev>
bpf: add bpf_get_cpu_time_counter kfunc
This commit adds a bpf_get_cpu_time_counter kfunc to return an
ARCH-specific timecounter, bypassing the usual ktime_get_ns() overhead.
It implements an inlined x86_64 JIT version using RDTSC, and falls back
to ktime_get_raw_fast_ns() in C.
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index ea9e707e8abff..7cda5589107bf 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
[ ... ]
> @@ -2445,6 +2446,40 @@ st: if (is_imm8(insn->off))
>
> /* call */
> case BPF_JMP | BPF_CALL: {
> + u8 *ip = image + addrs[i - 1];
> +
> + if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
> + imm32 == BPF_CALL_IMM(bpf_get_cpu_time_counter) &&
> + bpf_jit_inlines_kfunc_call(imm32)) {
> + /* The default implementation of this kfunc uses
> + * ktime_get_raw_ns() which effectively is implemented as
> + * `(u64)rdtsc_ordered() & S64_MAX`. For JIT We skip
Does this JIT implementation return inconsistent time units compared to the C
fallback?
The inline RDTSC emits hardware CPU cycles, but the fallback
bpf_get_cpu_time_counter() calls ktime_get_raw_fast_ns(), which returns
nanoseconds. If a BPF program calculates a delta using the fallback path and
later converts it, won't the nanosecond value be multiplied again, resulting
in incorrect metrics?
[ ... ]
> + emit_mov_reg(&prog, true, AUX_REG, BPF_REG_3);
> + if (cpu_feature_enabled(X86_FEATURE_LFENCE_RDTSC))
> + EMIT_LFENCE();
> + EMIT2(0x0F, 0x31);
The kernel's standard rdtsc_ordered() implementation ensures serialization by
checking both X86_FEATURE_LFENCE_RDTSC and X86_FEATURE_MFENCE_RDTSC.
Does this code need to check for MFENCE as well? On older AMD CPUs that rely
on MFENCE, omitting it could result in out-of-order execution across the
measured code block.
[ ... ]
> @@ -4081,3 +4116,13 @@ bool bpf_jit_supports_fsession(void)
> {
> return true;
> }
> +
> +/* x86-64 JIT can inline kfunc */
> +bool bpf_jit_inlines_kfunc_call(s32 imm)
> +{
> + if (imm == BPF_CALL_IMM(bpf_get_cpu_time_counter) &&
> + cpu_feature_enabled(X86_FEATURE_TSC) &&
> + using_native_sched_clock() && sched_clock_stable())
> + return true;
> + return false;
> +}
Is it safe to evaluate sched_clock_stable() dynamically for fastcall
optimizations?
If sched_clock_stable() returns true during verification, the verifier
could strip the caller-saved register spills. If the clock later transitions
to false before JIT compilation, the JIT will emit a standard C call
instead of inlining, which could silently clobber BPF registers R1-R5.
[ ... ]
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 69d75515ed3f7..b8d26e1bff481 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
[ ... ]
> @@ -17035,7 +17035,9 @@ bool bpf_get_call_summary(struct bpf_verifier_env *env, struct bpf_insn *call,
> /* error would be reported later */
> return false;
> cs->num_params = btf_type_vlen(meta.func_proto);
> - cs->fastcall = meta.kfunc_flags & KF_FASTCALL;
> + cs->fastcall = (meta.kfunc_flags & KF_FASTCALL) ||
> + (meta.btf == btf_vmlinux &&
> + bpf_jit_inlines_kfunc_call(call->imm));
> cs->is_void = btf_type_is_void(btf_type_by_id(meta.btf, meta.func_proto->type));
> return true;
> }
In the verifier, call->imm holds the kfunc's BTF ID, but the x86 JIT
implementation of bpf_jit_inlines_kfunc_call() compares it against
BPF_CALL_IMM(bpf_get_cpu_time_counter), which is a relative memory offset.
Since a BTF ID will never match this memory offset, won't this always
return false during verification, defeating the fastcall optimization?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260418131614.1501848-1-puranjay@kernel.org?part=2
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH bpf-next v13 3/6] bpf: add bpf_cpu_time_counter_to_ns kfunc
2026-04-18 13:15 [PATCH bpf-next v13 0/6] bpf: add cpu time counter kfuncs Puranjay Mohan
2026-04-18 13:15 ` [PATCH bpf-next v13 1/6] bpf: adjust BPF JIT dependency to BPF_SYSCALL Puranjay Mohan
2026-04-18 13:16 ` [PATCH bpf-next v13 2/6] bpf: add bpf_get_cpu_time_counter kfunc Puranjay Mohan
@ 2026-04-18 13:16 ` Puranjay Mohan
2026-04-18 14:03 ` bot+bpf-ci
2026-04-18 14:54 ` sashiko-bot
2026-04-18 13:16 ` [PATCH bpf-next v13 4/6] selftests/bpf: add selftest to check bpf_get_cpu_time_counter jit Puranjay Mohan
` (2 subsequent siblings)
5 siblings, 2 replies; 19+ messages in thread
From: Puranjay Mohan @ 2026-04-18 13:16 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Xu Kuohai,
Vadim Fedorenko, Catalin Marinas, Will Deacon, kernel-team,
Vadim Fedorenko
From: Vadim Fedorenko <vadfed@meta.com>
The new kfunc should be used to convert deltas of values returned by
bpf_get_cpu_time_counter() into nanoseconds. It is not designed to do a
full conversion of time counter values to CLOCK_MONOTONIC_RAW nanoseconds
and cannot guarantee monotonicity of 2 independent values; rather, it
converts the difference of 2 close enough values of the CPU timestamp
counter into nanoseconds.
This function is JITted into just a few instructions, adds as little
overhead as possible, and is well suited to benchmark use-cases.
When the kfunc is not JITted it returns the value provided as its
argument, because the kfunc in the previous patch will already return
values in nanoseconds; the call can then be optimized away by the
verifier.
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
arch/x86/net/bpf_jit_comp.c | 29 ++++++++++++++++++++++++++++-
arch/x86/net/bpf_jit_comp32.c | 1 +
include/linux/bpf.h | 1 +
kernel/bpf/helpers.c | 6 ++++++
kernel/bpf/verifier.c | 10 +++++++---
5 files changed, 43 insertions(+), 4 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 7cda5589107b..a8956eb867ef 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -10,6 +10,7 @@
#include <linux/if_vlan.h>
#include <linux/bitfield.h>
#include <linux/bpf.h>
+#include <linux/clocksource.h>
#include <linux/memory.h>
#include <linux/sort.h>
#include <asm/extable.h>
@@ -2480,6 +2481,31 @@ st: if (is_imm8(insn->off))
break;
}
+ if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
+ imm32 == BPF_CALL_IMM(bpf_cpu_time_counter_to_ns) &&
+ bpf_jit_inlines_kfunc_call(imm32)) {
+ struct cyc2ns_data data;
+ u32 mult, shift;
+
+ /* stable TSC runs with fixed frequency and
+ * transformation coefficients are also fixed
+ */
+ cyc2ns_read_begin(&data);
+ mult = data.cyc2ns_mul;
+ shift = data.cyc2ns_shift;
+ cyc2ns_read_end();
+ /* imul RAX, RDI, mult */
+ maybe_emit_mod(&prog, BPF_REG_1, BPF_REG_0, true);
+ EMIT2_off32(0x69, add_2reg(0xC0, BPF_REG_1, BPF_REG_0),
+ mult);
+
+ /* shr RAX, shift (which is less than 64) */
+ maybe_emit_1mod(&prog, BPF_REG_0, true);
+ EMIT3(0xC1, add_1reg(0xE8, BPF_REG_0), shift);
+
+ break;
+ }
+
func = (u8 *) __bpf_call_base + imm32;
if (src_reg == BPF_PSEUDO_CALL && tail_call_reachable) {
LOAD_TAIL_CALL_CNT_PTR(stack_depth);
@@ -4120,7 +4146,8 @@ bool bpf_jit_supports_fsession(void)
/* x86-64 JIT can inline kfunc */
bool bpf_jit_inlines_kfunc_call(s32 imm)
{
- if (imm == BPF_CALL_IMM(bpf_get_cpu_time_counter) &&
+ if ((imm == BPF_CALL_IMM(bpf_get_cpu_time_counter) ||
+ imm == BPF_CALL_IMM(bpf_cpu_time_counter_to_ns)) &&
cpu_feature_enabled(X86_FEATURE_TSC) &&
using_native_sched_clock() && sched_clock_stable())
return true;
diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index ca208378c979..da61bc5585aa 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -12,6 +12,7 @@
#include <linux/netdevice.h>
#include <linux/filter.h>
#include <linux/if_vlan.h>
+#include <linux/clocksource.h>
#include <asm/cacheflush.h>
#include <asm/set_memory.h>
#include <asm/nospec-branch.h>
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 74abf2b639fd..d523168b8998 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3746,6 +3746,7 @@ u64 bpf_get_raw_cpu_id(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
/* Inlined kfuncs */
u64 bpf_get_cpu_time_counter(void);
+u64 bpf_cpu_time_counter_to_ns(u64 counter);
#if defined(CONFIG_NET)
bool bpf_sock_common_is_valid_access(int off, int size,
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index dfe280440120..bc7f5ccac761 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -4674,6 +4674,11 @@ __bpf_kfunc u64 bpf_get_cpu_time_counter(void)
return ktime_get_raw_fast_ns();
}
+__bpf_kfunc u64 bpf_cpu_time_counter_to_ns(u64 counter)
+{
+ return counter;
+}
+
__bpf_kfunc_end_defs();
static void bpf_task_work_cancel_scheduled(struct irq_work *irq_work)
@@ -4870,6 +4875,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_from_file)
BTF_ID_FLAGS(func, bpf_dynptr_file_discard)
BTF_ID_FLAGS(func, bpf_timer_cancel_async)
BTF_ID_FLAGS(func, bpf_get_cpu_time_counter)
+BTF_ID_FLAGS(func, bpf_cpu_time_counter_to_ns, KF_FASTCALL)
BTF_KFUNCS_END(common_btf_ids)
static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b8d26e1bff48..5341dc6d29ca 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11178,8 +11178,8 @@ enum special_kfunc_type {
KF_bpf_session_is_return,
KF_bpf_stream_vprintk,
KF_bpf_stream_print_stack,
+ KF_bpf_cpu_time_counter_to_ns,
};
-
BTF_ID_LIST(special_kfunc_list)
BTF_ID(func, bpf_obj_new_impl)
BTF_ID(func, bpf_obj_new)
@@ -11266,6 +11266,7 @@ BTF_ID(func, bpf_arena_reserve_pages)
BTF_ID(func, bpf_session_is_return)
BTF_ID(func, bpf_stream_vprintk)
BTF_ID(func, bpf_stream_print_stack)
+BTF_ID(func, bpf_cpu_time_counter_to_ns)
static bool is_bpf_obj_new_kfunc(u32 func_id)
{
@@ -18629,7 +18630,6 @@ static void sanitize_dead_code(struct bpf_verifier_env *env)
}
-
static void free_states(struct bpf_verifier_env *env)
{
struct bpf_verifier_state_list *sl;
@@ -19791,6 +19791,9 @@ int bpf_fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
if (!bpf_jit_supports_far_kfunc_call())
insn->imm = BPF_CALL_IMM(desc->addr);
+ /* if JIT will inline kfunc verifier shouldn't change the code */
+ if (bpf_jit_inlines_kfunc_call(insn->imm))
+ return 0;
if (is_bpf_obj_new_kfunc(desc->func_id) || is_bpf_percpu_obj_new_kfunc(desc->func_id)) {
struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
@@ -19851,7 +19854,8 @@ int bpf_fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
__fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
node_offset_reg, insn, insn_buf, cnt);
} else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
- desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
+ desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast] ||
+ desc->func_id == special_kfunc_list[KF_bpf_cpu_time_counter_to_ns]) {
insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1);
*cnt = 1;
} else if (desc->func_id == special_kfunc_list[KF_bpf_session_is_return] &&
--
2.52.0
^ permalink raw reply related [flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v13 3/6] bpf: add bpf_cpu_time_counter_to_ns kfunc
2026-04-18 13:16 ` [PATCH bpf-next v13 3/6] bpf: add bpf_cpu_time_counter_to_ns kfunc Puranjay Mohan
@ 2026-04-18 14:03 ` bot+bpf-ci
2026-04-18 14:54 ` sashiko-bot
1 sibling, 0 replies; 19+ messages in thread
From: bot+bpf-ci @ 2026-04-18 14:03 UTC (permalink / raw)
To: puranjay, bpf
Cc: puranjay, ast, andrii, daniel, martin.lau, eddyz87, memxor,
mykyta.yatsenko5, xukuohai, vadim.fedorenko, catalin.marinas,
will, kernel-team, vadfed, ast, andrii, daniel, martin.lau,
eddyz87, yonghong.song, clm, ihor.solodrai
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 7cda5589107b..a8956eb867ef 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -10,6 +10,7 @@
> #include <linux/if_vlan.h>
> #include <linux/bitfield.h>
> #include <linux/bpf.h>
> +#include <linux/clocksource.h>
> #include <linux/memory.h>
> #include <linux/sort.h>
> #include <asm/extable.h>
[ ... ]
> diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
> index ca208378c979..da61bc5585aa 100644
> --- a/arch/x86/net/bpf_jit_comp32.c
> +++ b/arch/x86/net/bpf_jit_comp32.c
> @@ -12,6 +12,7 @@
> #include <linux/netdevice.h>
> #include <linux/filter.h>
> #include <linux/if_vlan.h>
> +#include <linux/clocksource.h>
> #include <asm/cacheflush.h>
> #include <asm/set_memory.h>
> #include <asm/nospec-branch.h>
This isn't a bug, but the include appears to be unused in
bpf_jit_comp32.c. During review of v11 on the mailing list, Alexei
Starovoitov noted that the 32-bit JIT implementation was questionable and
should be dropped entirely:
https://lore.kernel.org/bpf/CAADnVQLYT5SV+tS2ycLteBMYOc12C=X7iHZ=RjhyVzuY=6=8Uw@mail.gmail.com/
The 32-bit JIT implementation was indeed removed, but this header include
remained. Would it make sense to remove the unused include from
bpf_jit_comp32.c?
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24605755621
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next v13 3/6] bpf: add bpf_cpu_time_counter_to_ns kfunc
2026-04-18 13:16 ` [PATCH bpf-next v13 3/6] bpf: add bpf_cpu_time_counter_to_ns kfunc Puranjay Mohan
2026-04-18 14:03 ` bot+bpf-ci
@ 2026-04-18 14:54 ` sashiko-bot
1 sibling, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-04-18 14:54 UTC (permalink / raw)
To: Puranjay Mohan; +Cc: bpf
Sashiko AI review found 3 potential issue(s):
- [Critical] JIT-emitted IMUL instruction sign-extends the 32-bit cyc2ns_mul multiplier, corrupting the nanosecond conversion.
- [High] 64-bit truncation in JITted bpf_cpu_time_counter_to_ns causes severe time overflow for intervals larger than ~4.3 seconds.
- [High] API unit inconsistency and TOCTOU race on sched_clock_stable() cause silent unit corruption.
--
commit d48481a8aafecd7a1cb43ba6a7f2840fa2d497e3
Author: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Subject: bpf: add bpf_cpu_time_counter_to_ns kfunc
Adds a bpf_cpu_time_counter_to_ns() kfunc to convert CPU cycle deltas
into nanoseconds. The kfunc is JIT inlined on x86 for low overhead,
leveraging cyc2ns multiplication and shift values.
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
[ ... ]
> @@ -2480,6 +2481,31 @@ st: if (is_imm8(insn->off))
> break;
> }
>
> + if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
> + imm32 == BPF_CALL_IMM(bpf_cpu_time_counter_to_ns) &&
> + bpf_jit_inlines_kfunc_call(imm32)) {
> + struct cyc2ns_data data;
> + u32 mult, shift;
> +
> + /* stable TSC runs with fixed frequency and
> + * transformation coefficients are also fixed
> + */
> + cyc2ns_read_begin(&data);
> + mult = data.cyc2ns_mul;
> + shift = data.cyc2ns_shift;
> + cyc2ns_read_end();
> + /* imul RAX, RDI, mult */
> + maybe_emit_mod(&prog, BPF_REG_1, BPF_REG_0, true);
> + EMIT2_off32(0x69, add_2reg(0xC0, BPF_REG_1, BPF_REG_0),
> + mult);
Can this IMUL instruction corrupt the multiplication? The IMUL r64,
r/m64, imm32 instruction (opcode 0x69) sign-extends the 32-bit
immediate to 64 bits. Since data.cyc2ns_mul is heavily scaled to
retain precision, it is frequently >= 0x80000000.
When cyc2ns_mul >= 0x80000000, it would be sign-extended to a negative
64-bit value, drastically altering the product when multiplied with the
unsigned cycle count. Would it be safer to zero-extend the multiplier
by loading it into a scratch register first?
Will this 64-bit multiplication overflow for longer time intervals? The
IMUL r64, r/m64, imm32 instruction computes a 64x64-bit multiplication
but only stores the lower 64 bits of the product in the destination
register.
Because cyc2ns_mul is scaled up for precision, the product could overflow
the 64-bit register when the cycle count exceeds roughly 2^33 (around
4.3 seconds on a typical 2 GHz CPU). Could this be avoided by using the
MUL r/m64 instruction to compute the full 128-bit product in RDX:RAX
followed by SHRD?
> +
> + /* shr RAX, shift (which is less than 64) */
> + maybe_emit_1mod(&prog, BPF_REG_0, true);
> + EMIT3(0xC1, add_1reg(0xE8, BPF_REG_0), shift);
> +
> + break;
> + }
[ ... ]
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -4674,6 +4674,11 @@ __bpf_kfunc u64 bpf_get_cpu_time_counter(void)
> return ktime_get_raw_fast_ns();
> }
>
> +__bpf_kfunc u64 bpf_cpu_time_counter_to_ns(u64 counter)
> +{
> + return counter;
> +}
Does this introduce an API unit inconsistency? bpf_get_cpu_time_counter()
returns raw cycles when JIT inlining is active on x86, but falls back to
ktime_get_raw_fast_ns() (nanoseconds) when JIT is disabled or on other
architectures. BPF programs exporting this value might output different
units depending on the environment.
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -19791,6 +19791,9 @@ int bpf_fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>
> if (!bpf_jit_supports_far_kfunc_call())
> insn->imm = BPF_CALL_IMM(desc->addr);
> + /* if JIT will inline kfunc verifier shouldn't change the code */
> + if (bpf_jit_inlines_kfunc_call(insn->imm))
> + return 0;
[ ... ]
> @@ -19851,7 +19854,8 @@ int bpf_fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> __fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
> node_offset_reg, insn, insn_buf, cnt);
> } else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
> - desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
> + desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast] ||
> + desc->func_id == special_kfunc_list[KF_bpf_cpu_time_counter_to_ns]) {
> insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1);
> *cnt = 1;
Is there a race condition here involving sched_clock_stable()?
bpf_jit_inlines_kfunc_call() reads sched_clock_stable() which can change
dynamically.
If sched_clock_stable() returns false during verification,
bpf_fixup_kfunc_call() will replace bpf_cpu_time_counter_to_ns with a MOV
instruction. If it transitions to true before the JIT phase,
bpf_get_cpu_time_counter() will be inlined to emit RDTSC. In that
scenario, the program would receive raw cycles from the get kfunc but
use a no-op for the conversion, silently treating raw cycles as
nanoseconds.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260418131614.1501848-1-puranjay@kernel.org?part=3
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH bpf-next v13 4/6] selftests/bpf: add selftest to check bpf_get_cpu_time_counter jit
2026-04-18 13:15 [PATCH bpf-next v13 0/6] bpf: add cpu time counter kfuncs Puranjay Mohan
` (2 preceding siblings ...)
2026-04-18 13:16 ` [PATCH bpf-next v13 3/6] bpf: add bpf_cpu_time_counter_to_ns kfunc Puranjay Mohan
@ 2026-04-18 13:16 ` Puranjay Mohan
2026-04-18 15:08 ` sashiko-bot
2026-04-18 13:16 ` [PATCH bpf-next v13 5/6] selftests/bpf: add usage example for cpu time counter kfuncs Puranjay Mohan
2026-04-18 13:16 ` [PATCH bpf-next v13 6/6] bpf, arm64: Add JIT support " Puranjay Mohan
5 siblings, 1 reply; 19+ messages in thread
From: Puranjay Mohan @ 2026-04-18 13:16 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Xu Kuohai,
Vadim Fedorenko, Catalin Marinas, Will Deacon, kernel-team,
Vadim Fedorenko
From: Vadim Fedorenko <vadfed@meta.com>
bpf_get_cpu_time_counter() is replaced with the rdtsc instruction on x86_64.
Add tests to check that the JIT works as expected. When JIT inlining is not
supported, bpf_cpu_time_counter_to_ns() can be inlined by the verifier.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
.../selftests/bpf/prog_tests/verifier.c | 2 +
.../selftests/bpf/progs/verifier_cpu_cycles.c | 120 ++++++++++++++++++
2 files changed, 122 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/verifier_cpu_cycles.c
diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
index a96b25ebff23..775fe69b0948 100644
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c
+++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
@@ -116,6 +116,7 @@
#include "verifier_bits_iter.skel.h"
#include "verifier_lsm.skel.h"
#include "verifier_jit_inline.skel.h"
+#include "verifier_cpu_cycles.skel.h"
#include "irq.skel.h"
#include "verifier_ctx_ptr_param.skel.h"
@@ -265,6 +266,7 @@ void test_irq(void) { RUN(irq); }
void test_verifier_mtu(void) { RUN(verifier_mtu); }
void test_verifier_jit_inline(void) { RUN(verifier_jit_inline); }
void test_verifier_ctx_ptr_param(void) { RUN(verifier_ctx_ptr_param); }
+void test_verifier_cpu_cycles(void) { RUN(verifier_cpu_cycles); }
static int init_test_val_map(struct bpf_object *obj, char *map_name)
{
diff --git a/tools/testing/selftests/bpf/progs/verifier_cpu_cycles.c b/tools/testing/selftests/bpf/progs/verifier_cpu_cycles.c
new file mode 100644
index 000000000000..26c02010ccf1
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_cpu_cycles.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Inc. */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+
+extern u64 bpf_cpu_time_counter_to_ns(u64 cycles) __weak __ksym;
+extern u64 bpf_get_cpu_time_counter(void) __weak __ksym;
+
+SEC("syscall")
+__arch_x86_64
+__xlated("0: call kernel-function")
+__naked int bpf_rdtsc(void)
+{
+ asm volatile(
+ "call %[bpf_get_cpu_time_counter];"
+ "exit"
+ :
+ : __imm(bpf_get_cpu_time_counter)
+ : __clobber_all
+ );
+}
+
+SEC("syscall")
+__arch_x86_64
+/* program entry for bpf_rdtsc_jit_x86_64(), regular function prologue */
+__jited(" endbr64")
+__jited(" nopl (%rax,%rax)")
+__jited(" nopl (%rax)")
+__jited(" pushq %rbp")
+__jited(" movq %rsp, %rbp")
+__jited(" endbr64")
+/* save RDX in R11 as it will be overwritten */
+__jited(" movq %rdx, %r11")
+/* lfence may not be executed depending on cpu features */
+__jited(" {{(lfence|)}}")
+__jited(" rdtsc")
+/* combine EDX:EAX into RAX */
+__jited(" shlq ${{(32|0x20)}}, %rdx")
+__jited(" orq %rdx, %rax")
+/* restore RDX from R11 */
+__jited(" movq %r11, %rdx")
+__jited(" leave")
+__naked int bpf_rdtsc_jit_x86_64(void)
+{
+ asm volatile(
+ "call %[bpf_get_cpu_time_counter];"
+ "exit"
+ :
+ : __imm(bpf_get_cpu_time_counter)
+ : __clobber_all
+ );
+}
+
+SEC("syscall")
+__arch_arm64
+__xlated("0: r1 = 42")
+__xlated("1: r0 = r1")
+__naked int bpf_cyc2ns_arm(void)
+{
+ asm volatile(
+ "r1=0x2a;"
+ "call %[bpf_cpu_time_counter_to_ns];"
+ "exit"
+ :
+ : __imm(bpf_cpu_time_counter_to_ns)
+ : __clobber_all
+ );
+}
+
+SEC("syscall")
+__arch_x86_64
+__xlated("0: r1 = 42")
+__xlated("1: call kernel-function")
+__naked int bpf_cyc2ns(void)
+{
+ asm volatile(
+ "r1=0x2a;"
+ "call %[bpf_cpu_time_counter_to_ns];"
+ "exit"
+ :
+ : __imm(bpf_cpu_time_counter_to_ns)
+ : __clobber_all
+ );
+}
+
+SEC("syscall")
+__arch_x86_64
+/* program entry for bpf_rdtsc_jit_x86_64(), regular function prologue */
+__jited(" endbr64")
+__jited(" nopl (%rax,%rax)")
+__jited(" nopl (%rax)")
+__jited(" pushq %rbp")
+__jited(" movq %rsp, %rbp")
+__jited(" endbr64")
+/* save RDX in R11 as it will be overwritten */
+__jited(" movabsq $0x2a2a2a2a2a, %rdi")
+__jited(" imulq ${{.*}}, %rdi, %rax")
+__jited(" shrq ${{.*}}, %rax")
+__jited(" leave")
+__naked int bpf_cyc2ns_jit_x86(void)
+{
+ asm volatile(
+ "r1=0x2a2a2a2a2a ll;"
+ "call %[bpf_cpu_time_counter_to_ns];"
+ "exit"
+ :
+ : __imm(bpf_cpu_time_counter_to_ns)
+ : __clobber_all
+ );
+}
+
+void rdtsc(void)
+{
+ bpf_get_cpu_time_counter();
+ bpf_cpu_time_counter_to_ns(42);
+}
+
+char _license[] SEC("license") = "GPL";
--
2.52.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH bpf-next v13 4/6] selftests/bpf: add selftest to check bpf_get_cpu_time_counter jit
2026-04-18 13:16 ` [PATCH bpf-next v13 4/6] selftests/bpf: add selftest to check bpf_get_cpu_time_counter jit Puranjay Mohan
@ 2026-04-18 15:08 ` sashiko-bot
0 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-04-18 15:08 UTC (permalink / raw)
To: Puranjay Mohan; +Cc: bpf
Sashiko AI review found 3 potential issue(s):
- [High] The selftests rigidly enforce a broken x86-64 JIT implementation for `bpf_cpu_time_counter_to_ns()`.
- [Medium] Fragile tests dependent on hardware features will cause CI flakiness.
- [Low] Mismatched and inaccurate comments due to copy-pasting.
--
commit 01407c2c12560c811a29d35f221844615da2962d
Author: Vadim Fedorenko <vadim.fedorenko@linux.dev>
selftests/bpf: add selftest to check bpf_get_cpu_time_counter jit
This commit adds selftests for the JIT implementations of
bpf_get_cpu_time_counter() and bpf_cpu_time_counter_to_ns() on x86_64 and
arm64. It verifies the emitted assembly using __jited and __xlated macros.
> diff --git a/tools/testing/selftests/bpf/progs/verifier_cpu_cycles.c b/tools/testing/selftests/bpf/progs/verifier_cpu_cycles.c
> new file mode 100644
> index 0000000000000..26c02010ccf1f
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/verifier_cpu_cycles.c
[ ... ]
> +SEC("syscall")
> +__arch_x86_64
> +/* program entry for bpf_rdtsc_jit_x86_64(), regular function prologue */
> +__jited(" endbr64")
> +__jited(" nopl (%rax,%rax)")
> +__jited(" nopl (%rax)")
> +__jited(" pushq %rbp")
> +__jited(" movq %rsp, %rbp")
> +__jited(" endbr64")
> +/* save RDX in R11 as it will be overwritten */
> +__jited(" movq %rdx, %r11")
> +/* lfence may not be executed depending on cpu features */
> +__jited(" {{(lfence|)}}")
> +__jited(" rdtsc")
Will this test deterministically fail in environments where
sched_clock_stable() is false or the CPU lacks the TSC feature?
The test strictly asserts that rdtsc is emitted. However, the underlying
JIT falls back to a standard function call if those hardware conditions are
not met, which might cause test failures in some CI and virtualization
environments.
[ ... ]
> +SEC("syscall")
> +__arch_x86_64
> +/* program entry for bpf_rdtsc_jit_x86_64(), regular function prologue */
This isn't a bug, but does this comment reference the wrong function name
due to a copy-paste error?
> +__jited(" endbr64")
> +__jited(" nopl (%rax,%rax)")
> +__jited(" nopl (%rax)")
> +__jited(" pushq %rbp")
> +__jited(" movq %rsp, %rbp")
> +__jited(" endbr64")
> +/* save RDX in R11 as it will be overwritten */
> +__jited(" movabsq $0x2a2a2a2a2a, %rdi")
This isn't a bug, but the comment mentions saving RDX in R11, while the
following instruction does not touch either of those registers.
> +__jited(" imulq ${{.*}}, %rdi, %rax")
> +__jited(" shrq ${{.*}}, %rax")
Does enforcing this specific JIT sequence codify broken math for large cycle
counts?
The x86 imulq instruction computes a 64-bit product, truncating the upper
bits. Since cycle counts are 64-bit and multipliers can be large, their
product often requires 128 bits. The truncation would result in silent
wraparound.
Additionally, using a 32-bit immediate with imulq sign-extends the value to
64 bits. If the multiplier has its most significant bit set, it will be
treated as a large negative number, corrupting the result.
Finally, the emitted sequence does not add the offset. On arm64, the test
expects a no-op since the counter returns absolute nanoseconds, but on
x86-64 without the offset, absolute counter conversions will return
incorrect values. Does this test assert a cross-architecture inconsistency?
> +__jited(" leave")
> +__naked int bpf_cyc2ns_jit_x86(void)
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260418131614.1501848-1-puranjay@kernel.org?part=4
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH bpf-next v13 5/6] selftests/bpf: add usage example for cpu time counter kfuncs
2026-04-18 13:15 [PATCH bpf-next v13 0/6] bpf: add cpu time counter kfuncs Puranjay Mohan
` (3 preceding siblings ...)
2026-04-18 13:16 ` [PATCH bpf-next v13 4/6] selftests/bpf: add selftest to check bpf_get_cpu_time_counter jit Puranjay Mohan
@ 2026-04-18 13:16 ` Puranjay Mohan
2026-04-18 15:17 ` sashiko-bot
2026-04-18 13:16 ` [PATCH bpf-next v13 6/6] bpf, arm64: Add JIT support " Puranjay Mohan
5 siblings, 1 reply; 19+ messages in thread
From: Puranjay Mohan @ 2026-04-18 13:16 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Xu Kuohai,
Vadim Fedorenko, Catalin Marinas, Will Deacon, kernel-team,
Vadim Fedorenko
From: Vadim Fedorenko <vadfed@meta.com>
The selftest provides an example of how to measure the latency of a bpf
kfunc/helper call using the time stamp counter and how to convert the
measured value into nanoseconds.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
.../bpf/prog_tests/test_cpu_cycles.c | 35 +++++++++++++++++++
.../selftests/bpf/progs/test_cpu_cycles.c | 25 +++++++++++++
2 files changed, 60 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_cpu_cycles.c
create mode 100644 tools/testing/selftests/bpf/progs/test_cpu_cycles.c
diff --git a/tools/testing/selftests/bpf/prog_tests/test_cpu_cycles.c b/tools/testing/selftests/bpf/prog_tests/test_cpu_cycles.c
new file mode 100644
index 000000000000..067307f0c4c2
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_cpu_cycles.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Inc. */
+
+#include <test_progs.h>
+#include "test_cpu_cycles.skel.h"
+
+static void cpu_cycles(void)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, opts);
+ struct test_cpu_cycles *skel;
+ int err, pfd;
+
+ skel = test_cpu_cycles__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "test_cpu_cycles open and load"))
+ return;
+
+ pfd = bpf_program__fd(skel->progs.bpf_cpu_cycles);
+ if (!ASSERT_GT(pfd, 0, "test_cpu_cycles fd"))
+ goto fail;
+
+ err = bpf_prog_test_run_opts(pfd, &opts);
+ if (!ASSERT_OK(err, "test_cpu_cycles test run"))
+ goto fail;
+
+ ASSERT_NEQ(skel->bss->cycles, 0, "test_cpu_cycles 0 cycles");
+ ASSERT_NEQ(skel->bss->ns, 0, "test_cpu_cycles 0 ns");
+fail:
+ test_cpu_cycles__destroy(skel);
+}
+
+void test_cpu_cycles(void)
+{
+ if (test__start_subtest("cpu_cycles"))
+ cpu_cycles();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_cpu_cycles.c b/tools/testing/selftests/bpf/progs/test_cpu_cycles.c
new file mode 100644
index 000000000000..3c428f3be831
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_cpu_cycles.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Inc. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+extern u64 bpf_cpu_time_counter_to_ns(u64 cycles) __weak __ksym;
+extern u64 bpf_get_cpu_time_counter(void) __weak __ksym;
+
+__u64 cycles, ns;
+
+SEC("syscall")
+int bpf_cpu_cycles(void)
+{
+ struct bpf_pidns_info pidns;
+ __u64 start;
+
+ start = bpf_get_cpu_time_counter();
+ bpf_get_ns_current_pid_tgid(0, 0, &pidns, sizeof(struct bpf_pidns_info));
+ cycles = bpf_get_cpu_time_counter() - start;
+ ns = bpf_cpu_time_counter_to_ns(cycles);
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.52.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH bpf-next v13 5/6] selftests/bpf: add usage example for cpu time counter kfuncs
2026-04-18 13:16 ` [PATCH bpf-next v13 5/6] selftests/bpf: add usage example for cpu time counter kfuncs Puranjay Mohan
@ 2026-04-18 15:17 ` sashiko-bot
0 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-04-18 15:17 UTC (permalink / raw)
To: Puranjay Mohan; +Cc: bpf
Sashiko AI review found 1 potential issue(s):
- [Medium] Incorrect and redundant file descriptor validation in BPF skeleton test.
--
commit 0a7806f6db918db501677b21fa0e15ae508d1ada
Author: Vadim Fedorenko <vadim.fedorenko@linux.dev>
selftests/bpf: add usage example for cpu time counter kfuncs
This commit adds a selftest example demonstrating how to measure the
latency of BPF kfuncs or helpers using the time stamp counter and how
to convert the measured values into nanoseconds.
> diff --git a/tools/testing/selftests/bpf/prog_tests/test_cpu_cycles.c b/tools/testing/selftests/bpf/prog_tests/test_cpu_cycles.c
> new file mode 100644
> index 0000000000000..067307f0c4c29
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/test_cpu_cycles.c
[ ... ]
> +static void cpu_cycles(void)
> +{
> + LIBBPF_OPTS(bpf_test_run_opts, opts);
> + struct test_cpu_cycles *skel;
> + int err, pfd;
> +
> + skel = test_cpu_cycles__open_and_load();
> + if (!ASSERT_OK_PTR(skel, "test_cpu_cycles open and load"))
> + return;
> +
> + pfd = bpf_program__fd(skel->progs.bpf_cpu_cycles);
> + if (!ASSERT_GT(pfd, 0, "test_cpu_cycles fd"))
> + goto fail;
Is it necessary to validate the file descriptor here? The BPF skeleton API
guarantees that after a successful open_and_load(), all programs and maps
are created with valid file descriptors, making this validation redundant.
Additionally, if a file descriptor check were needed, shouldn't it use
ASSERT_GE rather than ASSERT_GT? 0 is technically a valid file descriptor,
so if pfd evaluated to 0, this test would falsely fail.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260418131614.1501848-1-puranjay@kernel.org?part=5
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH bpf-next v13 6/6] bpf, arm64: Add JIT support for cpu time counter kfuncs
2026-04-18 13:15 [PATCH bpf-next v13 0/6] bpf: add cpu time counter kfuncs Puranjay Mohan
` (4 preceding siblings ...)
2026-04-18 13:16 ` [PATCH bpf-next v13 5/6] selftests/bpf: add usage example for cpu time counter kfuncs Puranjay Mohan
@ 2026-04-18 13:16 ` Puranjay Mohan
2026-04-18 14:03 ` bot+bpf-ci
` (3 more replies)
5 siblings, 4 replies; 19+ messages in thread
From: Puranjay Mohan @ 2026-04-18 13:16 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Xu Kuohai,
Vadim Fedorenko, Catalin Marinas, Will Deacon, kernel-team
Add ARM64 JIT inlining for bpf_get_cpu_time_counter() and
bpf_cpu_time_counter_to_ns() kfuncs.
bpf_get_cpu_time_counter() is JIT-inlined as:
ISB // serialize instruction stream
MRS Xn, CNTVCT_EL0 // read architected timer counter
The ISB before the MRS is required for ordering, matching the kernel's
arch_timer_read_cntvct_el0() implementation.
On newer CPUs it will be JITed to:
MRS Xn, CNTVCTSS_EL0 // self-synchronized (ISB not needed)
bpf_cpu_time_counter_to_ns() is JIT-inlined using mult/shift constants
computed at JIT time from the architected timer frequency (CNTFRQ_EL0):
MOV Xtmp, #mult // load conversion multiplier
MUL Xn, Xarg, Xtmp // delta_ticks * mult
LSR Xn, Xn, #shift // >> shift = nanoseconds
On systems with a 1GHz counter (e.g., Neoverse-V2), mult=1 and shift=0,
so the conversion collapses to a single MOV (identity).
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
arch/arm64/include/asm/insn.h | 2 +
arch/arm64/net/bpf_jit.h | 4 ++
arch/arm64/net/bpf_jit_comp.c | 54 +++++++++++++++++++
.../selftests/bpf/progs/verifier_cpu_cycles.c | 50 ++++++++++++++++-
4 files changed, 109 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index f463a654a2bb..bb235a39cef0 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -139,6 +139,8 @@ enum aarch64_insn_system_register {
AARCH64_INSN_SYSREG_TPIDR_EL1 = 0x4684,
AARCH64_INSN_SYSREG_TPIDR_EL2 = 0x6682,
AARCH64_INSN_SYSREG_SP_EL0 = 0x4208,
+ AARCH64_INSN_SYSREG_CNTVCT_EL0 = 0x5F02,
+ AARCH64_INSN_SYSREG_CNTVCTSS_EL0 = 0x5F06,
};
enum aarch64_insn_variant {
diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index d13de4222cfb..a525387439fe 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -326,6 +326,10 @@
aarch64_insn_gen_mrs(Rt, AARCH64_INSN_SYSREG_TPIDR_EL2)
#define A64_MRS_SP_EL0(Rt) \
aarch64_insn_gen_mrs(Rt, AARCH64_INSN_SYSREG_SP_EL0)
+#define A64_MRS_CNTVCT_EL0(Rt) \
+ aarch64_insn_gen_mrs(Rt, AARCH64_INSN_SYSREG_CNTVCT_EL0)
+#define A64_MRS_CNTVCTSS_EL0(Rt) \
+ aarch64_insn_gen_mrs(Rt, AARCH64_INSN_SYSREG_CNTVCTSS_EL0)
/* Barriers */
#define A64_SB aarch64_insn_get_sb_value()
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 0816c40fc7af..7da7507ab431 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -10,6 +10,7 @@
#include <linux/arm-smccc.h>
#include <linux/bitfield.h>
#include <linux/bpf.h>
+#include <linux/clocksource.h>
#include <linux/cfi.h>
#include <linux/filter.h>
#include <linux/memory.h>
@@ -19,6 +20,7 @@
#include <asm/asm-extable.h>
#include <asm/byteorder.h>
#include <asm/cpufeature.h>
+#include <asm/arch_timer.h>
#include <asm/debug-monitors.h>
#include <asm/insn.h>
#include <asm/text-patching.h>
@@ -1571,10 +1573,54 @@ static int build_insn(const struct bpf_verifier_env *env, const struct bpf_insn
case BPF_JMP | BPF_CALL:
{
const u8 r0 = bpf2a64[BPF_REG_0];
+ const u8 r1 = bpf2a64[BPF_REG_1];
+ const s32 imm = insn->imm;
bool func_addr_fixed;
u64 func_addr;
u32 cpu_offset;
+ /* Inline kfunc bpf_get_cpu_time_counter() */
+ if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
+ imm == BPF_CALL_IMM(bpf_get_cpu_time_counter) &&
+ bpf_jit_inlines_kfunc_call(imm)) {
+ /*
+ * With ECV (ARMv8.6+), CNTVCTSS_EL0 is self-
+ * synchronizing — no ISB needed. Without ECV,
+ * an ISB is required before reading CNTVCT_EL0
+ * to prevent speculative/out-of-order reads.
+ *
+ * Matches arch_timer_read_cntvct_el0().
+ */
+ if (cpus_have_cap(ARM64_HAS_ECV)) {
+ emit(A64_MRS_CNTVCTSS_EL0(r0), ctx);
+ } else {
+ emit(A64_ISB, ctx);
+ emit(A64_MRS_CNTVCT_EL0(r0), ctx);
+ }
+ break;
+ }
+
+ /* Inline kfunc bpf_cpu_time_counter_to_ns() */
+ if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
+ imm == BPF_CALL_IMM(bpf_cpu_time_counter_to_ns) &&
+ bpf_jit_inlines_kfunc_call(imm)) {
+ u32 freq = arch_timer_get_cntfrq();
+
+ if (freq == NSEC_PER_SEC) {
+ /* 1 GHz counter: 1 tick = 1 ns, identity */
+ emit(A64_MOV(1, r0, r1), ctx);
+ } else {
+ u32 mult, shift;
+
+ clocks_calc_mult_shift(&mult, &shift, freq, NSEC_PER_SEC, 3600);
+ emit_a64_mov_i(1, tmp, mult, ctx);
+ emit(A64_MUL(1, r0, r1, tmp), ctx);
+ if (shift)
+ emit(A64_LSR(1, r0, r0, shift), ctx);
+ }
+ break;
+ }
+
/* Implement helper call to bpf_get_smp_processor_id() inline */
if (insn->src_reg == 0 && insn->imm == BPF_FUNC_get_smp_processor_id) {
cpu_offset = offsetof(struct thread_info, cpu);
@@ -3127,6 +3173,14 @@ bool bpf_jit_inlines_helper_call(s32 imm)
}
}
+bool bpf_jit_inlines_kfunc_call(s32 imm)
+{
+ if (imm == BPF_CALL_IMM(bpf_get_cpu_time_counter) ||
+ imm == BPF_CALL_IMM(bpf_cpu_time_counter_to_ns))
+ return true;
+ return false;
+}
+
void bpf_jit_free(struct bpf_prog *prog)
{
if (prog->jited) {
diff --git a/tools/testing/selftests/bpf/progs/verifier_cpu_cycles.c b/tools/testing/selftests/bpf/progs/verifier_cpu_cycles.c
index 26c02010ccf1..ab1b20e28084 100644
--- a/tools/testing/selftests/bpf/progs/verifier_cpu_cycles.c
+++ b/tools/testing/selftests/bpf/progs/verifier_cpu_cycles.c
@@ -56,7 +56,7 @@ __naked int bpf_rdtsc_jit_x86_64(void)
SEC("syscall")
__arch_arm64
__xlated("0: r1 = 42")
-__xlated("1: r0 = r1")
+__xlated("1: call kernel-function")
__naked int bpf_cyc2ns_arm(void)
{
asm volatile(
@@ -111,6 +111,54 @@ __naked int bpf_cyc2ns_jit_x86(void)
);
}
+SEC("syscall")
+__arch_arm64
+__xlated("0: call kernel-function")
+__naked int bpf_cntvct(void)
+{
+ asm volatile(
+ "call %[bpf_get_cpu_time_counter];"
+ "exit"
+ :
+ : __imm(bpf_get_cpu_time_counter)
+ : __clobber_all
+ );
+}
+
+SEC("syscall")
+__arch_arm64
+/*
+ * With ECV: mrs x7, CNTVCTSS_EL0
+ * Without ECV: isb; mrs x7, CNTVCT_EL0
+ */
+__jited(" mrs x7, CNTVCT{{(SS_EL0|_EL0)}}")
+__naked int bpf_cntvct_jit_arm64(void)
+{
+ asm volatile(
+ "call %[bpf_get_cpu_time_counter];"
+ "exit"
+ :
+ : __imm(bpf_get_cpu_time_counter)
+ : __clobber_all
+ );
+}
+
+SEC("syscall")
+__arch_arm64
+/* bpf_cpu_time_counter_to_ns: mov (1GHz identity) or mul+lsr */
+__jited(" {{(mov x7, x0|mul x7, x0, x10)}}")
+__naked int bpf_cyc2ns_jit_arm64(void)
+{
+ asm volatile(
+ "r1=0x2a;"
+ "call %[bpf_cpu_time_counter_to_ns];"
+ "exit"
+ :
+ : __imm(bpf_cpu_time_counter_to_ns)
+ : __clobber_all
+ );
+}
+
void rdtsc(void)
{
bpf_get_cpu_time_counter();
--
2.52.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH bpf-next v13 6/6] bpf, arm64: Add JIT support for cpu time counter kfuncs
2026-04-18 13:16 ` [PATCH bpf-next v13 6/6] bpf, arm64: Add JIT support " Puranjay Mohan
@ 2026-04-18 14:03 ` bot+bpf-ci
2026-04-18 16:06 ` sashiko-bot
` (2 subsequent siblings)
3 siblings, 0 replies; 19+ messages in thread
From: bot+bpf-ci @ 2026-04-18 14:03 UTC (permalink / raw)
To: puranjay, bpf
Cc: puranjay, ast, andrii, daniel, martin.lau, eddyz87, memxor,
mykyta.yatsenko5, xukuohai, vadim.fedorenko, catalin.marinas,
will, kernel-team, ast, andrii, daniel, martin.lau, eddyz87,
yonghong.song, clm, ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 3695 bytes --]
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 0816c40fc7af..7da7507ab431 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
[ ... ]
> @@ -1571,10 +1573,54 @@ static int build_insn(const struct bpf_verifier_env *env, const struct bpf_insn
> case BPF_JMP | BPF_CALL:
> {
> const u8 r0 = bpf2a64[BPF_REG_0];
> + const u8 r1 = bpf2a64[BPF_REG_1];
> + const s32 imm = insn->imm;
> bool func_addr_fixed;
> u64 func_addr;
> u32 cpu_offset;
>
> + /* Inline kfunc bpf_get_cpu_time_counter() */
> + if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
> + imm == BPF_CALL_IMM(bpf_get_cpu_time_counter) &&
> + bpf_jit_inlines_kfunc_call(imm)) {
> + /*
> + * With ECV (ARMv8.6+), CNTVCTSS_EL0 is self-
> + * synchronizing — no ISB needed. Without ECV,
> + * an ISB is required before reading CNTVCT_EL0
> + * to prevent speculative/out-of-order reads.
> + *
> + * Matches arch_timer_read_cntvct_el0().
> + */
> + if (cpus_have_cap(ARM64_HAS_ECV)) {
> + emit(A64_MRS_CNTVCTSS_EL0(r0), ctx);
> + } else {
> + emit(A64_ISB, ctx);
> + emit(A64_MRS_CNTVCT_EL0(r0), ctx);
> + }
> + break;
> + }
> +
> + /* Inline kfunc bpf_cpu_time_counter_to_ns() */
> + if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
> + imm == BPF_CALL_IMM(bpf_cpu_time_counter_to_ns) &&
> + bpf_jit_inlines_kfunc_call(imm)) {
> + u32 freq = arch_timer_get_cntfrq();
> +
> + if (freq == NSEC_PER_SEC) {
> + /* 1 GHz counter: 1 tick = 1 ns, identity */
> + emit(A64_MOV(1, r0, r1), ctx);
> + } else {
> + u32 mult, shift;
> +
> + clocks_calc_mult_shift(&mult, &shift, freq, NSEC_PER_SEC, 3600);
> + emit_a64_mov_i(1, tmp, mult, ctx);
^^^^
Can mult be sign-extended here? Looking at the declaration of mult
as u32, when it has bit 31 set (mult >= 0x80000000), the implicit
conversion to s32 in emit_a64_mov_i (which takes 'const s32 val')
makes the value negative.
In emit_a64_mov_i:
u16 hi = val >> 16; /* arithmetic shift when val is s32 negative */
u16 lo = val & 0xffff;
if (hi & 0x8000) {
...
emit(A64_MOVN(is64, reg, (u16)~hi, 16), ctx); /* is64=1 => 64-bit MOVN */
...
}
For is64=1, MOVN sets the full 64-bit register to ~(imm << shift), so
the upper 32 bits become 0xFFFFFFFF. For example, mult = 0xFA000000
would yield tmp = 0xFFFFFFFF_FA000000 instead of 0x00000000_FA000000.
The subsequent 64-bit MUL would then multiply by this sign-extended
value.
This can happen when clocks_calc_mult_shift produces mult >= 2^31. The
algorithm in kernel/time/clocksource.c keeps sftacc at 32 whenever
maxsec*from < 2^32, i.e., whenever arch_timer_get_cntfrq() returns a
value less than about 1,193,046 Hz. In that regime mult can occupy the
full u32 range, including values with bit 31 set (e.g., for freq == 1
MHz, mult = 0xFA000000).
While drivers/clocksource/arm_arch_timer.c only warns for
arch_timer_rate < 1000000 and does not reject such rates, a platform
configured with CNTFRQ_EL0 below about 1.2 MHz would execute this code
path.
Other uses of emit_a64_mov_i(1, ...) in this file pass s32
offsets/imms where sign-extension is desired. For the u32 mult value
zero-extension is required.
> + emit(A64_MUL(1, r0, r1, tmp), ctx);
> + if (shift)
> + emit(A64_LSR(1, r0, r0, shift), ctx);
> + }
> + break;
> + }
> +
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24605755621
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH bpf-next v13 6/6] bpf, arm64: Add JIT support for cpu time counter kfuncs
2026-04-18 13:16 ` [PATCH bpf-next v13 6/6] bpf, arm64: Add JIT support " Puranjay Mohan
2026-04-18 14:03 ` bot+bpf-ci
@ 2026-04-18 16:06 ` sashiko-bot
2026-04-20 4:03 ` Xu Kuohai
2026-04-20 10:16 ` Will Deacon
3 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-04-18 16:06 UTC (permalink / raw)
To: Puranjay Mohan; +Cc: bpf
Sashiko AI review found 4 potential issue(s):
- [Critical] Division by zero kernel panic during JIT compilation due to direct read of cntfrq_el0.
- [High] Implicit sign-extension of the 32-bit multiplier corrupts nanosecond conversions.
- [Medium] Unconditional JIT inlining of timer read bypasses hardware errata workarounds.
- [Medium] Silent 64-bit math overflow for duration measurements exceeding 1 hour.
--
commit f282b55d9d2af467b0a51264b002bf14ff3648fc
Author: Puranjay Mohan <puranjay@kernel.org>
bpf, arm64: Add JIT support for cpu time counter kfuncs
This commit adds ARM64 JIT inlining for the bpf_get_cpu_time_counter() and
bpf_cpu_time_counter_to_ns() kfuncs. It emits instructions to read the
architected timer directly and perform the nanosecond conversion.
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 0816c40fc7af9..7da7507ab4315 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -1571,10 +1573,54 @@ static int build_insn(const struct bpf_verifier_env *env, const struct bpf_insn
[ ... ]
> + /* Inline kfunc bpf_get_cpu_time_counter() */
> + if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
> + imm == BPF_CALL_IMM(bpf_get_cpu_time_counter) &&
> + bpf_jit_inlines_kfunc_call(imm)) {
> + /*
> + * With ECV (ARMv8.6+), CNTVCTSS_EL0 is self-
> + * synchronizing — no ISB needed. Without ECV,
> + * an ISB is required before reading CNTVCT_EL0
> + * to prevent speculative/out-of-order reads.
> + *
> + * Matches arch_timer_read_cntvct_el0().
> + */
> + if (cpus_have_cap(ARM64_HAS_ECV)) {
> + emit(A64_MRS_CNTVCTSS_EL0(r0), ctx);
> + } else {
> + emit(A64_ISB, ctx);
> + emit(A64_MRS_CNTVCT_EL0(r0), ctx);
> + }
> + break;
> + }
Does this unconditional JIT inlining bypass the hardware errata workarounds?
The generic C implementation of bpf_get_cpu_time_counter() uses
ktime_get_raw_fast_ns(), which safely respects the out-of-line workaround
framework (CONFIG_ARM_ARCH_TIMER_OOL_WORKAROUND). By blindly emitting the
MRS CNTVCT_EL0 (or CNTVCTSS_EL0) instructions, could BPF programs receive
unstable and corrupted timestamps on affected hardware?
> + /* Inline kfunc bpf_cpu_time_counter_to_ns() */
> + if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
> + imm == BPF_CALL_IMM(bpf_cpu_time_counter_to_ns) &&
> + bpf_jit_inlines_kfunc_call(imm)) {
> + u32 freq = arch_timer_get_cntfrq();
Could this cause a division-by-zero kernel panic during JIT compilation?
On platforms with broken firmware, the cntfrq_el0 register is often left
uninitialized (0), and the kernel relies on a device tree fallback.
If arch_timer_get_cntfrq() returns 0, will clocks_calc_mult_shift() below
execute a do_div(tmp, freq) and crash the system during BPF program load?
> + if (freq == NSEC_PER_SEC) {
> + /* 1 GHz counter: 1 tick = 1 ns, identity */
> + emit(A64_MOV(1, r0, r1), ctx);
> + } else {
> + u32 mult, shift;
> +
> + clocks_calc_mult_shift(&mult, &shift, freq, NSEC_PER_SEC, 3600);
> + emit_a64_mov_i(1, tmp, mult, ctx);
Does emit_a64_mov_i() implicitly sign-extend the 32-bit multiplier?
The emit_a64_mov_i() function takes a signed s32 value as its argument.
For systems with slower timer frequencies, clocks_calc_mult_shift() can
compute a large unsigned 32-bit multiplier whose most significant bit is
set. Will this cause the unsigned multiplier to be implicitly cast to a
negative number and sign-extended to 64 bits, corrupting the subsequent
multiplication?
> + emit(A64_MUL(1, r0, r1, tmp), ctx);
Will this simple 64-bit multiplication silently overflow for duration
measurements exceeding one hour?
Since clocks_calc_mult_shift() is called with maxsec = 3600 (1 hour), the
multiplier is calculated such that delta_ticks * mult fits within 64 bits
up to that duration. For longer durations, could this simple 64-bit
multiplication wrap around, returning falsely small or corrupted nanosecond
durations, unlike the 128-bit math used in the generic kernel timekeeping?
> + if (shift)
> + emit(A64_LSR(1, r0, r0, shift), ctx);
> + }
> + break;
> + }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260418131614.1501848-1-puranjay@kernel.org?part=6
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH bpf-next v13 6/6] bpf, arm64: Add JIT support for cpu time counter kfuncs
2026-04-18 13:16 ` [PATCH bpf-next v13 6/6] bpf, arm64: Add JIT support " Puranjay Mohan
2026-04-18 14:03 ` bot+bpf-ci
2026-04-18 16:06 ` sashiko-bot
@ 2026-04-20 4:03 ` Xu Kuohai
2026-04-20 9:45 ` Puranjay Mohan
2026-04-20 10:16 ` Will Deacon
3 siblings, 1 reply; 19+ messages in thread
From: Xu Kuohai @ 2026-04-20 4:03 UTC (permalink / raw)
To: Puranjay Mohan, bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, Vadim Fedorenko, Catalin Marinas, Will Deacon,
kernel-team
On 4/18/2026 9:16 PM, Puranjay Mohan wrote:
> Add ARM64 JIT inlining for bpf_get_cpu_time_counter() and
> bpf_cpu_time_counter_to_ns() kfuncs.
>
> bpf_get_cpu_time_counter() is JIT-inlined as:
>
> ISB // serialize instruction stream
> MRS Xn, CNTVCT_EL0 // read architected timer counter
>
> The ISB before the MRS is required for ordering, matching the kernel's
> arch_timer_read_cntvct_el0() implementation.
>
> On newer CPUs it will be JITed to:
>
> MRS Xn, CNTVCTSS_EL0 // self-synchronized (ISB not needed)
>
> bpf_cpu_time_counter_to_ns() is JIT-inlined using mult/shift constants
> computed at JIT time from the architected timer frequency (CNTFRQ_EL0):
>
> MOV Xtmp, #mult // load conversion multiplier
> MUL Xn, Xarg, Xtmp // delta_ticks * mult
> LSR Xn, Xn, #shift // >> shift = nanoseconds
>
> On systems with a 1GHz counter (e.g., Neoverse-V2), mult=1 and shift=0,
> so the conversion collapses to a single MOV (identity).
>
ISB triggers a pipeline flush and instruction refetch. Since it is a really
heavy operation, I think we should collect performance data to verify whether
the JITed instructions are actually faster than the original bpf_ktime_get_ns().
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next v13 6/6] bpf, arm64: Add JIT support for cpu time counter kfuncs
2026-04-20 4:03 ` Xu Kuohai
@ 2026-04-20 9:45 ` Puranjay Mohan
0 siblings, 0 replies; 19+ messages in thread
From: Puranjay Mohan @ 2026-04-20 9:45 UTC (permalink / raw)
To: Xu Kuohai
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, Vadim Fedorenko, Catalin Marinas, Will Deacon,
kernel-team
On Mon, Apr 20, 2026 at 5:03 AM Xu Kuohai <xukuohai@huaweicloud.com> wrote:
>
> On 4/18/2026 9:16 PM, Puranjay Mohan wrote:
> > Add ARM64 JIT inlining for bpf_get_cpu_time_counter() and
> > bpf_cpu_time_counter_to_ns() kfuncs.
> >
> > bpf_get_cpu_time_counter() is JIT-inlined as:
> >
> > ISB // serialize instruction stream
> > MRS Xn, CNTVCT_EL0 // read architected timer counter
> >
> > The ISB before the MRS is required for ordering, matching the kernel's
> > arch_timer_read_cntvct_el0() implementation.
> >
> > On newer CPUs it will be JITed to:
> >
> > MRS Xn, CNTVCTSS_EL0 // self-synchronized (ISB not needed)
> >
> > bpf_cpu_time_counter_to_ns() is JIT-inlined using mult/shift constants
> > computed at JIT time from the architected timer frequency (CNTFRQ_EL0):
> >
> > MOV Xtmp, #mult // load conversion multiplier
> > MUL Xn, Xarg, Xtmp // delta_ticks * mult
> > LSR Xn, Xn, #shift // >> shift = nanoseconds
> >
> > On systems with a 1GHz counter (e.g., Neoverse-V2), mult=1 and shift=0,
> > so the conversion collapses to a single MOV (identity).
> >
>
> ISB triggers pipeline flush and instruction refetch. Since it is a really heavy
> operation, I think We should collect performance data to verify whether the
> jited instructions are actually faster than the original bpf_ktime_get_ns().
Okay, I will get the data, but shouldn't we still keep this, since future
CPUs will have CNTVCTSS_EL0 and will not need the ISB?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next v13 6/6] bpf, arm64: Add JIT support for cpu time counter kfuncs
2026-04-18 13:16 ` [PATCH bpf-next v13 6/6] bpf, arm64: Add JIT support " Puranjay Mohan
` (2 preceding siblings ...)
2026-04-20 4:03 ` Xu Kuohai
@ 2026-04-20 10:16 ` Will Deacon
2026-04-20 10:44 ` Marc Zyngier
3 siblings, 1 reply; 19+ messages in thread
From: Will Deacon @ 2026-04-20 10:16 UTC (permalink / raw)
To: Puranjay Mohan
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, Xu Kuohai, Vadim Fedorenko, Catalin Marinas,
kernel-team, maz
[+ Marc]
On Sat, Apr 18, 2026 at 06:16:04AM -0700, Puranjay Mohan wrote:
> Add ARM64 JIT inlining for bpf_get_cpu_time_counter() and
> bpf_cpu_time_counter_to_ns() kfuncs.
>
> bpf_get_cpu_time_counter() is JIT-inlined as:
>
> ISB // serialize instruction stream
> MRS Xn, CNTVCT_EL0 // read architected timer counter
>
> The ISB before the MRS is required for ordering, matching the kernel's
> arch_timer_read_cntvct_el0() implementation.
Careful here: this will _not_ order counter accesses against normal
memory barriers (e.g. smp_mb(), acquire/release). If you need that, then
you need something like arch_counter_enforce_ordering(), which we do
have in __arch_counter_get_cntvct().
Furthermore, using the virtual counter may expose you to situations
where a guest value for CNTVOFF is installed (e.g. during the VCPU run
loop). Given all the contexts in which BPF can run, this worries me a
little as you might end up seeing a non-monotonic view of time between
BPF programs.
> On newer CPUs it will be JITed to:
>
> MRS Xn, CNTVCTSS_EL0 // self-synchronized (ISB not needed)
>
> bpf_cpu_time_counter_to_ns() is JIT-inlined using mult/shift constants
> computed at JIT time from the architected timer frequency (CNTFRQ_EL0):
>
> MOV Xtmp, #mult // load conversion multiplier
> MUL Xn, Xarg, Xtmp // delta_ticks * mult
> LSR Xn, Xn, #shift // >> shift = nanoseconds
>
> On systems with a 1GHz counter (e.g., Neoverse-V2), mult=1 and shift=0,
> so the conversion collapses to a single MOV (identity).
Do you have any performance numbers to show that this is worthwhile
compared to calling a helper wrapping __arch_counter_get_cntvct()?
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
> arch/arm64/include/asm/insn.h | 2 +
> arch/arm64/net/bpf_jit.h | 4 ++
> arch/arm64/net/bpf_jit_comp.c | 54 +++++++++++++++++++
> .../selftests/bpf/progs/verifier_cpu_cycles.c | 50 ++++++++++++++++-
> 4 files changed, 109 insertions(+), 1 deletion(-)
Aren't you conveniently ignoring CPU errata here? I suspect this needs
to be predicated on the absence of those, in a similar way to how the
vDSO deals with the 'vdso_clock_mode'.
Will
^ permalink raw reply [flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v13 6/6] bpf, arm64: Add JIT support for cpu time counter kfuncs
2026-04-20 10:16 ` Will Deacon
@ 2026-04-20 10:44 ` Marc Zyngier
0 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2026-04-20 10:44 UTC (permalink / raw)
To: Puranjay Mohan, Will Deacon
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, Xu Kuohai, Vadim Fedorenko, Catalin Marinas,
kernel-team
On Mon, 20 Apr 2026 11:16:34 +0100,
Will Deacon <will@kernel.org> wrote:
>
> [+ Marc]
>
> On Sat, Apr 18, 2026 at 06:16:04AM -0700, Puranjay Mohan wrote:
> > Add ARM64 JIT inlining for bpf_get_cpu_time_counter() and
> > bpf_cpu_time_counter_to_ns() kfuncs.
> >
> > bpf_get_cpu_time_counter() is JIT-inlined as:
> >
> > ISB // serialize instruction stream
> > MRS Xn, CNTVCT_EL0 // read architected timer counter
> >
> > The ISB before the MRS is required for ordering, matching the kernel's
> > arch_timer_read_cntvct_el0() implementation.
>
> Careful here: this will _not_ order counter accesses against normal
> memory barriers (e.g. smp_mb(), acquire/release). If you need that, then
> you need something like arch_counter_enforce_ordering(), which we do
> have in __arch_counter_get_cntvct().
>
> Furthermore, using the virtual counter may expose you to situations
> where a guest value for CNTVOFF is installed (e.g. during the VCPU run
> loop). Given all the contexts in which BPF can run, this worries me a
> little as you might end up seeing a non-monotonic view of time between
> BPF programs.
Yup, and this will catch you out in subtle ways depending on whether
the host runs at EL2 or not.
The kernel (mostly) avoids this pitfall by picking the physical
counter for sched_clock() when running at EL2, so that it isn't
directly affected by CNTVOFF_EL2.
I suspect that you'll need something similar if you insist on a
globally monotonic view of the counter.
>
> > On newer CPUs it will be JITed to:
> >
> > MRS Xn, CNTVCTSS_EL0 // self-synchronized (ISB not needed)
> >
> > bpf_cpu_time_counter_to_ns() is JIT-inlined using mult/shift constants
> > computed at JIT time from the architected timer frequency (CNTFRQ_EL0):
And what happens when CNTFRQ_EL0 is not populated, as is the case
on a lot of stupidly integrated HW? The kernel has a fallback, but you
don't seem to make use of it here.
> >
> > MOV Xtmp, #mult // load conversion multiplier
> > MUL Xn, Xarg, Xtmp // delta_ticks * mult
> > LSR Xn, Xn, #shift // >> shift = nanoseconds
> >
> > On systems with a 1GHz counter (e.g., Neoverse-V2), mult=1 and shift=0,
> > so the conversion collapses to a single MOV (identity).
>
> Do you have any performance numbers to show that this is worthwhile
> compared to calling a helper wrapping __arch_counter_get_cntvct().
>
> > Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> > ---
> > arch/arm64/include/asm/insn.h | 2 +
> > arch/arm64/net/bpf_jit.h | 4 ++
> > arch/arm64/net/bpf_jit_comp.c | 54 +++++++++++++++++++
> > .../selftests/bpf/progs/verifier_cpu_cycles.c | 50 ++++++++++++++++-
> > 4 files changed, 109 insertions(+), 1 deletion(-)
>
> Aren't you conveniently ignoring CPU errata here? I suspect this needs
> to be predicated on the absence of those, in a similar way to how the
> vDSO deals with the 'vsdo_clockmode'.
And I'm afraid this still is the common case...
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 19+ messages in thread