From mboxrd@z Thu Jan 1 00:00:00 1970
From: sashiko-bot@kernel.org
Subject: Re: [PATCH bpf-next 2/7] selftests/bpf: Add BPF batch-timing library
Reply-To: sashiko@lists.linux.dev
To: "Puranjay Mohan"
Cc: bpf@vger.kernel.org
In-Reply-To: <20260427232313.1582588-3-puranjay@kernel.org>
References: <20260427232313.1582588-3-puranjay@kernel.org>
Content-Type: text/plain; charset=utf-8
Date: Tue, 28 Apr 2026 00:12:53 +0000
Message-Id: <20260428001253.E5EDCC19425@smtp.kernel.org>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org

Sashiko AI review found 4 potential issue(s):

- [Medium] Integer division truncation in `bpf_bench_calibrate()` breaks
  auto-calibration for very fast BPF programs.
- [Medium] Zero warmup time (`-w 0`) causes the benchmark to silently
  collect no samples.
- [Medium] BPF `can_loop` (`bpf_may_goto`) can prematurely terminate the
  timed loop, silently invalidating the userspace `ns/op` calculation.
- [Low] The BPF header includes a non-existent file (`bpf_may_goto.h`).

--
commit ecbc1ddd2ab895fa706970f55f150bb87ba2790e
Author: Puranjay Mohan

    selftests/bpf: Add BPF batch-timing library

    This commit adds a reusable timing library for BPF benchmarks that need
    to measure BPF program execution time. It provides a BPF macro to record
    elapsed time and a userspace library to auto-calibrate batch iterations
    and compute statistics.

> diff --git a/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c b/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c
> new file mode 100644
> index 000000000000..75a39da69655
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c

[ ...
]

> +static int collect_samples(struct bpf_bench_timing *t,
> +			   double *out, int max_out)
> +{
> +	unsigned int nr_cpus = bpf_num_possible_cpus();
> +	__u32 timed_iters = t->batch_iters;
> +	int total = 0;

[ ... ]

> +		for (__u32 i = 0; i < count && total < max_out; i++) {
> +			__u64 sample = t->samples[cpu][i];
> +
> +			if (sample == 0)
> +				continue;
> +			out[total++] = (double)sample / timed_iters;

If the loop in BENCH_BPF_LOOP is terminated early by the can_loop condition
(e.g., due to instruction limits or yielding), the actual number of
completed iterations will be less than timed_iters. Since the BPF program
only records the elapsed time, will dividing by timed_iters produce
artificially low ns/op calculations? Should the BPF program explicitly
record and export the actual number of completed iterations for this
calculation?

> +		}
> +	}
> +
> +	qsort(out, total, sizeof(double), cmp_double);
> +	return total;
> +}

[ ... ]

> +void bpf_bench_timing_measure(struct bpf_bench_timing *t, struct bench_res *res)
> +{
> +	unsigned int nr_cpus;
> +	__u32 total_samples;
> +	int i;
> +
> +	t->warmup_ticks++;
> +
> +	if (t->warmup_ticks < env.warmup_sec)
> +		return;
> +
> +	if (t->warmup_ticks == env.warmup_sec) {
> +		*t->timing_enabled = 1;
> +		return;
> +	}

If the benchmark is run with zero warmup (env.warmup_sec == 0),
t->warmup_ticks becomes 1 on the first tick. Does this mean t->warmup_ticks
== env.warmup_sec will never be true, bypassing the assignment and
preventing timing_enabled from ever being set to 1?

[ ... ]
> +void bpf_bench_calibrate(struct bpf_bench_timing *t, bpf_bench_run_fn run_fn, void *run_ctx)
> +{
> +	__u64 elapsed, per_op_ns;
> +	__u64 time_n, time_2n;
> +	double ratio;
> +
> +	elapsed = measure_elapsed(t, run_fn, run_ctx, CALIBRATE_SEED_BATCH, CALIBRATE_RUNS);
> +	if (elapsed == 0) {
> +		fprintf(stderr, "calibration: no timing samples, using default\n");
> +		t->batch_iters = 10000;
> +		*t->batch_iters_bss = t->batch_iters;
> +		reset_timing(t);
> +		return;
> +	}
> +
> +	per_op_ns = elapsed / CALIBRATE_SEED_BATCH;

If the BPF operation executes in less than 1 ns (e.g., an empty loop or a
BPF nop), elapsed could be less than 100 since CALIBRATE_SEED_BATCH is 100.
Will the integer division truncate per_op_ns to 0 here, causing
compute_batch_iters() to incorrectly fall back to CALIBRATE_MIN_BATCH
instead of scaling up the iterations?

> +	t->batch_iters = compute_batch_iters(per_op_ns);

[ ... ]

> diff --git a/tools/testing/selftests/bpf/progs/bench_bpf_timing.bpf.h b/tools/testing/selftests/bpf/progs/bench_bpf_timing.bpf.h
> new file mode 100644
> index 000000000000..6a1ad75f1fd7
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/bench_bpf_timing.bpf.h
> @@ -0,0 +1,69 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
> +
> +#ifndef __BENCH_BPF_TIMING_BPF_H__
> +#define __BENCH_BPF_TIMING_BPF_H__
> +
> +#include
> +#include
> +#include
> +#include

Does bpf_may_goto.h exist in the kernel source tree? I noticed that
can_loop is typically defined in bpf_experimental.h. Will including this
cause a "No such file or directory" build error?

> +
> +#ifndef BENCH_NR_SAMPLES

--
Sashiko AI review · https://sashiko.dev/#/patchset/20260427232313.1582588-1-puranjay@kernel.org?part=2