[PATCH bpf-next 0/4] selftests/bpf: XDP LB benchmark fixes

* [PATCH bpf-next 0/4] selftests/bpf: XDP LB benchmark fixes
@ 2026-05-19 16:36 Puranjay Mohan
  2026-05-19 16:36 ` [PATCH bpf-next 1/4] selftests/bpf: Fix cold_lru producing zero batch_hash in XDP LB benchmark Puranjay Mohan
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-05-19 16:36 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
	Fei Chen, Taruna Agrawal, Nikhil Dixit Limaye, Nikita V. Shirokov,
	kernel-team

Three bug fixes and one improvement for the XDP LB and batch-timing
benchmarks.

The cold_lru validation failed intermittently because batch_hash
evaluated to zero whenever batch_gen matched the CPU id. Similarly,
pre-populated UDP LRU entries had atime=0, so they expired immediately
on any CPU that calibration didn't warm. Both are fixed in patches 1-2.

Patch 3 lowers CALIBRATE_MAX_BATCH to stay under BPF_MAX_LOOPS.

Patch 4 adds IQR outlier filtering to the timing stats to stabilize
scenarios with high stddev.

Puranjay Mohan (4):
  selftests/bpf: Fix cold_lru producing zero batch_hash in XDP LB
    benchmark
  selftests/bpf: Fix expired UDP LRU entries in XDP LB benchmark
  selftests/bpf: Cap batch-timing calibration at BPF may_goto loop limit
  selftests/bpf: Filter timing outliers with IQR in batch-timing library

 .../selftests/bpf/benchs/bench_bpf_timing.c   | 28 ++++++++++++++++++-
 .../selftests/bpf/benchs/bench_xdp_lb.c       | 11 ++++++++
 .../selftests/bpf/progs/xdp_lb_bench.c        |  2 +-
 3 files changed, 39 insertions(+), 2 deletions(-)

-- 
2.53.0-Meta


* [PATCH bpf-next 1/4] selftests/bpf: Fix cold_lru producing zero batch_hash in XDP LB benchmark
  2026-05-19 16:36 [PATCH bpf-next 0/4] selftests/bpf: XDP LB benchmark fixes Puranjay Mohan
@ 2026-05-19 16:36 ` Puranjay Mohan
  2026-05-19 16:51   ` sashiko-bot
  2026-05-19 17:14   ` bot+bpf-ci
  2026-05-19 16:36 ` [PATCH bpf-next 2/4] selftests/bpf: Fix expired UDP LRU entries " Puranjay Mohan
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-05-19 16:36 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
	Fei Chen, Taruna Agrawal, Nikhil Dixit Limaye, Nikita V. Shirokov,
	kernel-team

The cold_lru mechanism generates a per-batch hash to XOR into the source
address, ensuring each batch tests a fresh (uncached) flow:

    batch_hash = (batch_gen ^ cpu_id) * KNUTH_HASH_MULT;
    *saddr ^= batch_hash;

When batch_gen equals the CPU ID, the XOR is zero and batch_hash becomes
zero.  The source address is left unchanged, so every iteration hits the
warm LRU entry instead of testing cold lookups.

During validation, batch_gen is 2 (one increment from seeding, one from
the validation run itself).  If the BPF program happens to execute on
CPU 2, batch_hash is zero and all LRU lookups hit, producing:

    [udp-v4-lru-miss] COUNTER FAIL: LRU misses=0, expected 1
    Validation FAILED - aborting benchmark

This fails roughly 50% of the time depending on scheduler placement.

Fix by forcing the multiplier input to be non-zero with | 1.  Since
KNUTH_HASH_MULT (2654435761) is odd, the product of two odd numbers is
always odd, so batch_hash can never be zero.
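
The parity argument above can be checked in isolation. The following is
a minimal userspace sketch, not the benchmark code: the helper names
batch_hash_old(), batch_hash_or1() and check_batch_hash() are
hypothetical, and the CPU id is passed as a plain argument instead of
coming from bpf_get_smp_processor_id().

```c
#include <assert.h>
#include <stdint.h>

#define KNUTH_HASH_MULT 2654435761u	/* odd, as stated in the commit message */

/* Original expression: zero whenever batch_gen == cpu. */
static uint32_t batch_hash_old(uint32_t batch_gen, uint32_t cpu)
{
	return (batch_gen ^ cpu) * KNUTH_HASH_MULT;
}

/* Patched expression: the | 1 makes the multiplicand odd. */
static uint32_t batch_hash_or1(uint32_t batch_gen, uint32_t cpu)
{
	return ((batch_gen ^ cpu) | 1u) * KNUTH_HASH_MULT;
}

/* Hypothetical self-check for the validation case (batch_gen == 2). */
static void check_batch_hash(void)
{
	uint32_t cpu;

	/* The failing case: validation run lands on CPU 2. */
	assert(batch_hash_old(2, 2) == 0);

	/* Odd * odd stays odd modulo 2^32, so the result is never zero. */
	for (cpu = 0; cpu < 1024; cpu++)
		assert(batch_hash_or1(2, cpu) & 1u);
}
```

Calling check_batch_hash() from a test main() exercises both
expressions. The sketch only demonstrates the non-zero property; it
says nothing about how well the resulting hashes are distributed.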

Fixes: 4b4f2229104c ("selftests/bpf: Add XDP load-balancer BPF program")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 tools/testing/selftests/bpf/progs/xdp_lb_bench.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/progs/xdp_lb_bench.c b/tools/testing/selftests/bpf/progs/xdp_lb_bench.c
index b9fd848c035d..f40e36aa183e 100644
--- a/tools/testing/selftests/bpf/progs/xdp_lb_bench.c
+++ b/tools/testing/selftests/bpf/progs/xdp_lb_bench.c
@@ -618,7 +618,7 @@ int xdp_lb_bench(struct xdp_md *xdp)
 		__u32 *saddr = data + saddr_off;
 
 		batch_gen++;
-		batch_hash = (batch_gen ^ bpf_get_smp_processor_id()) * KNUTH_HASH_MULT;
+		batch_hash = ((batch_gen ^ bpf_get_smp_processor_id()) | 1) * KNUTH_HASH_MULT;
 		if ((void *)(saddr + 1) <= data_end)
 			*saddr ^= batch_hash;
 	}
-- 
2.53.0-Meta


* Re: [PATCH bpf-next 1/4] selftests/bpf: Fix cold_lru producing zero batch_hash in XDP LB benchmark
  2026-05-19 16:36 ` [PATCH bpf-next 1/4] selftests/bpf: Fix cold_lru producing zero batch_hash in XDP LB benchmark Puranjay Mohan
@ 2026-05-19 16:51   ` sashiko-bot
  2026-05-19 17:14   ` bot+bpf-ci
  1 sibling, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-19 16:51 UTC (permalink / raw)
  To: Puranjay Mohan; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The fix for zero `batch_hash` introduces a regression in the `cold_lru` benchmark by destroying the lowest bit of entropy, causing 50% of consecutive batches to generate identical hashes.
--

commit d60966744103a5bacba48698e7e96189a24c69c1
Author: Puranjay Mohan <puranjay@kernel.org>

selftests/bpf: Fix cold_lru producing zero batch_hash in XDP LB benchmark

This commit addresses an issue in the cold_lru benchmark where a batch_hash
of zero causes the benchmark to incorrectly hit warm LRU entries. It attempts
to fix this by bitwise-ORing the input to the hash multiplier with 1.

> diff --git a/tools/testing/selftests/bpf/progs/xdp_lb_bench.c b/tools/testing/selftests/bpf/progs/xdp_lb_bench.c
> --- a/tools/testing/selftests/bpf/progs/xdp_lb_bench.c
> +++ b/tools/testing/selftests/bpf/progs/xdp_lb_bench.c
> @@ -618,7 +618,7 @@ int xdp_lb_bench(struct xdp_md *xdp)
>  		__u32 *saddr = data + saddr_off;
>  
>  		batch_gen++;
> -		batch_hash = (batch_gen ^ bpf_get_smp_processor_id()) * KNUTH_HASH_MULT;
> +		batch_hash = ((batch_gen ^ bpf_get_smp_processor_id()) | 1) * KNUTH_HASH_MULT;

[Severity: Medium]
Does this code introduce a regression where using | 1 inadvertently destroys
the lowest bit of entropy from the batch generator?

Since batch_gen increments sequentially by 1 per batch execution, forcing
the least significant bit to 1 maps pairs of adjacent values to the same
number. For example, if the CPU ID evaluates to 0, both batch_gen = 2 and
batch_gen = 3 will yield 3 before the multiplication.

Could this cause 50% of consecutive batches to generate the exact same
batch_hash as the previous batch, causing those iterations to hit the warm
LRU cache entries that were just populated and defeating the purpose of
the cold_lru benchmark?
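
The collision described above can be reproduced outside the kernel.
This is a rough sketch under stated assumptions: batch_hash_or1(),
adjacent_collisions() and check_collisions() are hypothetical helper
names, the CPU id is held constant across batches, and batch_gen is
assumed to advance by exactly one per batch.

```c
#include <assert.h>
#include <stdint.h>

#define KNUTH_HASH_MULT 2654435761u

/* Expression from the patch, with the CPU id passed in explicitly. */
static uint32_t batch_hash_or1(uint32_t batch_gen, uint32_t cpu)
{
	return ((batch_gen ^ cpu) | 1u) * KNUTH_HASH_MULT;
}

/*
 * Count the batch_gen values g in [first, first + n) whose hash equals
 * the hash of g + 1 on the same CPU.
 */
static unsigned int adjacent_collisions(uint32_t cpu, uint32_t first,
					unsigned int n)
{
	unsigned int i, hits = 0;

	for (i = 0; i < n; i++)
		if (batch_hash_or1(first + i, cpu) ==
		    batch_hash_or1(first + i + 1, cpu))
			hits++;
	return hits;
}

/* Hypothetical self-check using the pair quoted in the review. */
static void check_collisions(void)
{
	/* batch_gen = 2 and batch_gen = 3 on CPU 0 map to the same hash. */
	assert(batch_hash_or1(2, 0) == batch_hash_or1(3, 0));

	/* Every even batch_gen shares its hash with its successor. */
	assert(adjacent_collisions(0, 2, 1000) == 500);
}
```

Because multiplying by an odd constant is a bijection modulo 2^32, two
hashes are equal exactly when the pre-multiplication values are equal,
so the count does not depend on which CPU id is used.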

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260519163632.2220753-1-puranjay@kernel.org?part=1

* Re: [PATCH bpf-next 1/4] selftests/bpf: Fix cold_lru producing zero batch_hash in XDP LB benchmark
  2026-05-19 16:36 ` [PATCH bpf-next 1/4] selftests/bpf: Fix cold_lru producing zero batch_hash in XDP LB benchmark Puranjay Mohan
  2026-05-19 16:51   ` sashiko-bot
@ 2026-05-19 17:14   ` bot+bpf-ci
  2026-05-19 19:25     ` Puranjay Mohan
  1 sibling, 1 reply; 11+ messages in thread
From: bot+bpf-ci @ 2026-05-19 17:14 UTC (permalink / raw)
  To: puranjay, bpf
  Cc: puranjay, puranjay12, ast, andrii, daniel, martin.lau, eddyz87,
	memxor, mykyta.yatsenko5, feichen, taragrawal, ndixit, tehnerd,
	kernel-team, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 1766 bytes --]

> diff --git a/tools/testing/selftests/bpf/progs/xdp_lb_bench.c b/tools/testing/selftests/bpf/progs/xdp_lb_bench.c
> index b9fd848c035d..f40e36aa183e 100644
> --- a/tools/testing/selftests/bpf/progs/xdp_lb_bench.c
> +++ b/tools/testing/selftests/bpf/progs/xdp_lb_bench.c
> @@ -618,7 +618,7 @@ int xdp_lb_bench(struct xdp_md *xdp)
>  		__u32 *saddr = data + saddr_off;
>
>  		batch_gen++;
> -		batch_hash = (batch_gen ^ bpf_get_smp_processor_id()) * KNUTH_HASH_MULT;
> +		batch_hash = ((batch_gen ^ bpf_get_smp_processor_id()) | 1) * KNUTH_HASH_MULT;
                                                                           ^^^^

Does forcing the LSB to 1 with '| 1' inadvertently destroy the lowest bit
of entropy from the batch generator?

Since batch_gen increments sequentially by 1 per batch execution, forcing
the least significant bit to 1 maps pairs of adjacent values to the same
number. For example, if the CPU ID evaluates to 0, both batch_gen = 2 and
batch_gen = 3 will yield 3 before the multiplication:

  batch_gen = 2, CPU = 0:  (2 ^ 0) | 1 = 2 | 1 = 3
  batch_gen = 3, CPU = 0:  (3 ^ 0) | 1 = 3 | 1 = 3

Could this cause 50% of consecutive batches to generate the exact same
batch_hash as the previous batch, causing those iterations to hit the warm
LRU cache entries that were just populated and defeating the purpose of
the cold_lru benchmark?

Reference: https://lore.kernel.org/bpf/20260519165140.12529C2BCB3@smtp.kernel.org/

>  		if ((void *)(saddr + 1) <= data_end)
>  			*saddr ^= batch_hash;
>  	}


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26112027147

* Re: [PATCH bpf-next 1/4] selftests/bpf: Fix cold_lru producing zero batch_hash in XDP LB benchmark
  2026-05-19 17:14   ` bot+bpf-ci
@ 2026-05-19 19:25     ` Puranjay Mohan
  0 siblings, 0 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-05-19 19:25 UTC (permalink / raw)
  To: bot+bpf-ci
  Cc: bpf, ast, andrii, daniel, martin.lau, eddyz87, memxor,
	mykyta.yatsenko5, feichen, taragrawal, ndixit, tehnerd,
	kernel-team, yonghong.song, clm, ihor.solodrai

On Tue, May 19, 2026 at 6:14 PM <bot+bpf-ci@kernel.org> wrote:
>
> > diff --git a/tools/testing/selftests/bpf/progs/xdp_lb_bench.c b/tools/testing/selftests/bpf/progs/xdp_lb_bench.c
> > index b9fd848c035d..f40e36aa183e 100644
> > --- a/tools/testing/selftests/bpf/progs/xdp_lb_bench.c
> > +++ b/tools/testing/selftests/bpf/progs/xdp_lb_bench.c
> > @@ -618,7 +618,7 @@ int xdp_lb_bench(struct xdp_md *xdp)
> >               __u32 *saddr = data + saddr_off;
> >
> >               batch_gen++;
> > -             batch_hash = (batch_gen ^ bpf_get_smp_processor_id()) * KNUTH_HASH_MULT;
> > +             batch_hash = ((batch_gen ^ bpf_get_smp_processor_id()) | 1) * KNUTH_HASH_MULT;
>                                                                            ^^^^
>
> Does forcing the LSB to 1 with '| 1' inadvertently destroy the lowest bit
> of entropy from the batch generator?
>
> Since batch_gen increments sequentially by 1 per batch execution, forcing
> the least significant bit to 1 maps pairs of adjacent values to the same
> number. For example, if the CPU ID evaluates to 0, both batch_gen = 2 and
> batch_gen = 3 will yield 3 before the multiplication:
>
>   batch_gen = 2, CPU = 0:  (2 ^ 0) | 1 = 2 | 1 = 3
>   batch_gen = 3, CPU = 0:  (3 ^ 0) | 1 = 3 | 1 = 3
>
> Could this cause 50% of consecutive batches to generate the exact same
> batch_hash as the previous batch, causing those iterations to hit the warm
> LRU cache entries that were just populated and defeating the purpose of
> the cold_lru benchmark?

Yeah, this analysis is correct. I need to fix it another way.

* [PATCH bpf-next 2/4] selftests/bpf: Fix expired UDP LRU entries in XDP LB benchmark
  2026-05-19 16:36 [PATCH bpf-next 0/4] selftests/bpf: XDP LB benchmark fixes Puranjay Mohan
  2026-05-19 16:36 ` [PATCH bpf-next 1/4] selftests/bpf: Fix cold_lru producing zero batch_hash in XDP LB benchmark Puranjay Mohan
@ 2026-05-19 16:36 ` Puranjay Mohan
  2026-05-19 16:36 ` [PATCH bpf-next 3/4] selftests/bpf: Cap batch-timing calibration at BPF may_goto loop limit Puranjay Mohan
  2026-05-19 16:36 ` [PATCH bpf-next 4/4] selftests/bpf: Filter timing outliers with IQR in batch-timing library Puranjay Mohan
  3 siblings, 0 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-05-19 16:36 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
	Fei Chen, Taruna Agrawal, Nikhil Dixit Limaye, Nikita V. Shirokov,
	kernel-team

populate_lru() inserts LRU entries with atime zero-initialized:

    struct real_pos_lru lru = { .pos = real_idx };

The BPF program's connection_table_lookup() enforces a 30-second timeout
for UDP flows:

    if (cur_time - dst_lru->atime > LRU_UDP_TIMEOUT)
        return NULL;

Since cur_time (system uptime) is always much larger than 30 seconds,
every pre-populated UDP entry is treated as expired on first access.
The lookup returns NULL, counting as an LRU miss.

This is masked by calibration: the BPF program runs during calibration
on one CPU, misses the expired entry, and re-inserts it with a fresh
atime.  But if the scheduler moves the thread to a different CPU for
validation, that CPU still has the atime=0 entry from populate_lru(),
causing intermittent validation failures for UDP LRU-hit scenarios:

    [udp-v4-lru-hit] COUNTER FAIL: LRU misses=1, expected 0

Fix by initializing atime with the current CLOCK_MONOTONIC time for UDP
flows, matching the clock source used by bpf_ktime_get_ns() in the BPF
program.
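
The expiry arithmetic can be illustrated with a small standalone
sketch. It assumes that LRU_UDP_TIMEOUT is expressed in nanoseconds
(30 seconds, per the description above) and that timestamps are
unsigned 64-bit values; udp_entry_expired(), check_udp_expiry() and
NS_PER_SEC are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC	1000000000ULL
/* Assumption: the 30-second timeout is encoded in nanoseconds. */
#define LRU_UDP_TIMEOUT	(30 * NS_PER_SEC)

/* Same comparison as the quoted connection_table_lookup() check. */
static int udp_entry_expired(uint64_t cur_time, uint64_t atime)
{
	return cur_time - atime > LRU_UDP_TIMEOUT;
}

/* Hypothetical self-check with an assumed uptime of one hour. */
static void check_udp_expiry(void)
{
	uint64_t now = 3600 * NS_PER_SEC;

	/* Zero-initialized atime: the entry is 3600 s old, so it expires. */
	assert(udp_entry_expired(now, 0));

	/* atime stamped one second ago: the entry is still live. */
	assert(!udp_entry_expired(now, now - NS_PER_SEC));
}
```

Any uptime above 30 seconds gives the same result for atime = 0, which
is why the pre-populated entries only survive on a CPU where an earlier
run has already refreshed them.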

Fixes: a4b5ba8187cb ("selftests/bpf: Add XDP load-balancer benchmark driver")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 tools/testing/selftests/bpf/benchs/bench_xdp_lb.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/testing/selftests/bpf/benchs/bench_xdp_lb.c b/tools/testing/selftests/bpf/benchs/bench_xdp_lb.c
index 0b6709a2b03c..8e25bccbde92 100644
--- a/tools/testing/selftests/bpf/benchs/bench_xdp_lb.c
+++ b/tools/testing/selftests/bpf/benchs/bench_xdp_lb.c
@@ -563,12 +563,23 @@ static void create_per_cpu_lru_maps(struct xdp_lb_bench *skel)
 	nr_inner_maps = nr_cpus;
 }
 
+static __u64 ktime_get_ns(void)
+{
+	struct timespec ts;
+
+	clock_gettime(CLOCK_MONOTONIC, &ts);
+	return (__u64)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
+}
+
 static void populate_lru(const struct test_scenario *sc, __u32 real_idx)
 {
 	struct real_pos_lru lru = { .pos = real_idx };
 	struct flow_key fk;
 	int i, err;
 
+	if (sc->ip_proto == IPPROTO_UDP)
+		lru.atime = ktime_get_ns();
+
 	build_flow_key(&fk, sc);
 
 	/* Insert into every per-CPU inner LRU so the entry is found
-- 
2.53.0-Meta


* [PATCH bpf-next 3/4] selftests/bpf: Cap batch-timing calibration at BPF may_goto loop limit
  2026-05-19 16:36 [PATCH bpf-next 0/4] selftests/bpf: XDP LB benchmark fixes Puranjay Mohan
  2026-05-19 16:36 ` [PATCH bpf-next 1/4] selftests/bpf: Fix cold_lru producing zero batch_hash in XDP LB benchmark Puranjay Mohan
  2026-05-19 16:36 ` [PATCH bpf-next 2/4] selftests/bpf: Fix expired UDP LRU entries " Puranjay Mohan
@ 2026-05-19 16:36 ` Puranjay Mohan
  2026-05-19 17:19   ` sashiko-bot
  2026-05-19 19:10   ` Alexei Starovoitov
  2026-05-19 16:36 ` [PATCH bpf-next 4/4] selftests/bpf: Filter timing outliers with IQR in batch-timing library Puranjay Mohan
  3 siblings, 2 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-05-19 16:36 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
	Fei Chen, Taruna Agrawal, Nikhil Dixit Limaye, Nikita V. Shirokov,
	kernel-team

BENCH_BPF_LOOP uses the may_goto instruction (via can_loop) to bound
its iteration loop.  The kernel enforces a maximum of BPF_MAX_LOOPS
(8,388,608) iterations per may_goto instruction.

CALIBRATE_MAX_BATCH was set to 10,000,000, exceeding this limit.  If
calibration computed a batch_iters above 8M, the BPF loop would exit
early when can_loop becomes false.  The timing sample would cover fewer
iterations than batch_iters, causing the per-op calculation to
underestimate the true cost.  Validation would also fail since map
counters would not reach the expected batch_iters + 1.

No current benchmark triggers this (the fastest, bpf-nop at ~1.8 ns/op,
calibrates to ~5.5M), but a future benchmark faster than ~1.19 ns/op
(10 ms / 8,388,608) would.

Lower the cap to 8,000,000 to stay within the may_goto limit.

Fixes: 08158c111d7d ("selftests/bpf: Add BPF batch-timing library")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 tools/testing/selftests/bpf/benchs/bench_bpf_timing.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c b/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c
index 75a39da69655..081e2e860cb4 100644
--- a/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c
+++ b/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c
@@ -166,7 +166,7 @@ void bpf_bench_timing_report(struct bpf_bench_timing *t, const char *name, const
 
 #define CALIBRATE_SEED_BATCH	100
 #define CALIBRATE_MIN_BATCH	100
-#define CALIBRATE_MAX_BATCH	10000000
+#define CALIBRATE_MAX_BATCH	8000000
 #define CALIBRATE_TARGET_MS	10
 #define CALIBRATE_RUNS		5
 #define PROPORTIONALITY_TOL	0.05	/* 5% */
-- 
2.53.0-Meta


* Re: [PATCH bpf-next 3/4] selftests/bpf: Cap batch-timing calibration at BPF may_goto loop limit
  2026-05-19 16:36 ` [PATCH bpf-next 3/4] selftests/bpf: Cap batch-timing calibration at BPF may_goto loop limit Puranjay Mohan
@ 2026-05-19 17:19   ` sashiko-bot
  2026-05-19 19:10   ` Alexei Starovoitov
  1 sibling, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-19 17:19 UTC (permalink / raw)
  To: Puranjay Mohan; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The new CALIBRATE_MAX_BATCH cap of 8,000,000 is mathematically incorrect and fails the patch's stated goal of keeping the benchmark within the BPF may_goto limit.
--

commit d9779455687fa035b58a0d4ba0d495256b32ed13
Author: Puranjay Mohan <puranjay@kernel.org>

selftests/bpf: Cap batch-timing calibration at BPF may_goto loop limit

This commit lowers CALIBRATE_MAX_BATCH from 10000000 to 8000000. This
is intended to keep the benchmark iterations within the BPF_MAX_LOOPS
limit of 8388608, avoiding early loop exits that cause underestimated
costs.

> diff --git a/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c b/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c
> index 75a39da696552..081e2e860cb49 100644
> --- a/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c
> +++ b/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c
> @@ -166,7 +166,7 @@ void bpf_bench_timing_report(struct bpf_bench_timing *t, const char *name, const
>  
>  #define CALIBRATE_SEED_BATCH	100
>  #define CALIBRATE_MIN_BATCH	100
> -#define CALIBRATE_MAX_BATCH	10000000
> +#define CALIBRATE_MAX_BATCH	8000000

[Severity: Medium]
Will this new cap of 8000000 still cause the benchmark to hit the BPF
may_goto limit during calibration?

Looking at bpf_bench_calibrate(), the proportionality check explicitly
requests a measurement of t->batch_iters * 2:

	time_n = measure_elapsed(t, run_fn, run_ctx, t->batch_iters, CALIBRATE_RUNS);
	time_2n = measure_elapsed(t, run_fn, run_ctx, t->batch_iters * 2, CALIBRATE_RUNS);

With a cap of 8000000, t->batch_iters * 2 can reach up to 16000000
iterations. Since this exceeds the BPF_MAX_LOOPS limit of 8388608,
won't the BPF program's can_loop condition return false early during the
time_2n measurement?

This would cause time_2n to record the time for only ~8.38M iterations
instead of 16M. When compared to time_n (8M iterations), the ratio would
be ~1.05 instead of the expected 2.0. This completely violates the 5%
PROPORTIONALITY_TOL and triggers a bogus proportionality check warning
for fast benchmarks.

To safely avoid the limit across all phases of calibration, shouldn't the
cap be no more than BPF_MAX_LOOPS / 2 (e.g., 4000000)?

>  #define CALIBRATE_TARGET_MS	10
>  #define CALIBRATE_RUNS		5
>  #define PROPORTIONALITY_TOL	0.05	/* 5% */

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260519163632.2220753-1-puranjay@kernel.org?part=3

* Re: [PATCH bpf-next 3/4] selftests/bpf: Cap batch-timing calibration at BPF may_goto loop limit
  2026-05-19 16:36 ` [PATCH bpf-next 3/4] selftests/bpf: Cap batch-timing calibration at BPF may_goto loop limit Puranjay Mohan
  2026-05-19 17:19   ` sashiko-bot
@ 2026-05-19 19:10   ` Alexei Starovoitov
  2026-05-19 19:23     ` Puranjay Mohan
  1 sibling, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2026-05-19 19:10 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: bpf, Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Fei Chen,
	Taruna Agrawal, Nikhil Dixit Limaye, Nikita V. Shirokov,
	Kernel Team

On Tue, May 19, 2026 at 9:36 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>
> BENCH_BPF_LOOP uses the may_goto instruction (via can_loop) to bound
> its iteration loop.  The kernel enforces a maximum of BPF_MAX_LOOPS
> (8,388,608) iterations per may_goto instruction.

claude is incorrect.
may_goto is timer based. It aborts at 1/4 of a second.
Old BPF_MAX_LOOPS limit applies for bpf_loop and bpf_for() iters only.
Is this a theoretical issue?

* Re: [PATCH bpf-next 3/4] selftests/bpf: Cap batch-timing calibration at BPF may_goto loop limit
  2026-05-19 19:10   ` Alexei Starovoitov
@ 2026-05-19 19:23     ` Puranjay Mohan
  0 siblings, 0 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-05-19 19:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Mykyta Yatsenko, Fei Chen, Taruna Agrawal, Nikhil Dixit Limaye,
	Nikita V. Shirokov, Kernel Team

On Tue, May 19, 2026 at 8:11 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, May 19, 2026 at 9:36 AM Puranjay Mohan <puranjay@kernel.org> wrote:
> >
> > BENCH_BPF_LOOP uses the may_goto instruction (via can_loop) to bound
> > its iteration loop.  The kernel enforces a maximum of BPF_MAX_LOOPS
> > (8,388,608) iterations per may_goto instruction.
>
> claude is incorrect.
> may_goto is timer based. It aborts at 1/4 of a second.
> Old BPF_MAX_LOOPS limit applies for bpf_loop and bpf_for() iters only.
> Is this a theoretical issue?

Yes, this is theoretical. The benchmark will never cross this limit
because it runs the loop for at most 20 ms.
So, I will drop this patch in the next version.

* [PATCH bpf-next 4/4] selftests/bpf: Filter timing outliers with IQR in batch-timing library
  2026-05-19 16:36 [PATCH bpf-next 0/4] selftests/bpf: XDP LB benchmark fixes Puranjay Mohan
                   ` (2 preceding siblings ...)
  2026-05-19 16:36 ` [PATCH bpf-next 3/4] selftests/bpf: Cap batch-timing calibration at BPF may_goto loop limit Puranjay Mohan
@ 2026-05-19 16:36 ` Puranjay Mohan
  3 siblings, 0 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-05-19 16:36 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
	Fei Chen, Taruna Agrawal, Nikhil Dixit Limaye, Nikita V. Shirokov,
	kernel-team

System noise (timer interrupts, scheduling) can produce outlier batch
samples that inflate the reported stddev.  For example, tcp-v4-syn
showed stddev 37.86 ns without filtering vs 0.16 ns with filtering on
the same run, because a handful of interrupt-hit batches dominated the
variance.

Apply IQR-based outlier filtering (1.5 * IQR fences) before computing
statistics.  Samples outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are
discarded.  This removes system noise while preserving genuine
operational variance: scenarios with inherently wide distributions
(e.g., LRU-miss with eviction pressure) have large IQR, so their
fences are wide and the filter has minimal effect.
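
The effect of the fences on tight versus wide distributions can be
sketched as follows. This is an illustration only: it picks quartiles
by index (sorted[n / 4] and sorted[3 * n / 4]) and skips inputs with
fewer than 8 samples, as the patch does, but the helper names
iqr_fences_of(), count_inliers() and check_iqr() and the two sample
arrays are made up for the example.

```c
#include <assert.h>

struct iqr_fences {
	double lo;
	double hi;
};

/* Made-up sorted samples: one interrupt-like spike in a tight run. */
static const double tight_samples[8] = { 10, 10, 10, 11, 11, 11, 12, 50 };
/* Made-up sorted samples: an inherently wide distribution. */
static const double wide_samples[8] = { 1, 5, 10, 20, 30, 40, 50, 60 };

/* Index-based quartiles and 1.5 * IQR fences. */
static struct iqr_fences iqr_fences_of(const double *sorted, int n)
{
	double q1 = sorted[n / 4];
	double q3 = sorted[3 * n / 4];
	double iqr = q3 - q1;
	struct iqr_fences f = { q1 - 1.5 * iqr, q3 + 1.5 * iqr };

	return f;
}

/* Number of samples that fall inside [lo, hi]; no filtering below 8. */
static int count_inliers(const double *sorted, int n)
{
	struct iqr_fences f;
	int i, kept = 0;

	if (n < 8)
		return n;

	f = iqr_fences_of(sorted, n);
	for (i = 0; i < n; i++)
		if (sorted[i] >= f.lo && sorted[i] <= f.hi)
			kept++;
	return kept;
}

/* Hypothetical self-check on the two sample sets. */
static void check_iqr(void)
{
	/* Q1 = 10, Q3 = 12, fences [7, 15]: the 50 is dropped. */
	assert(count_inliers(tight_samples, 8) == 7);

	/* Q1 = 10, Q3 = 50, fences [-50, 110]: nothing is dropped. */
	assert(count_inliers(wide_samples, 8) == 8);
}
```

Unlike the function in the patch, this sketch does not modify the
array; it only counts what would survive, which keeps the example
side-effect free.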

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 .../selftests/bpf/benchs/bench_bpf_timing.c   | 26 +++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c b/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c
index 081e2e860cb4..1c3a74807c1c 100644
--- a/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c
+++ b/tools/testing/selftests/bpf/benchs/bench_bpf_timing.c
@@ -65,6 +65,31 @@ static int collect_samples(struct bpf_bench_timing *t,
 	return total;
 }
 
+static int filter_outliers_iqr(double *sorted, int n)
+{
+	double q1, q3, iqr, lo, hi;
+	int start = 0, end = n;
+
+	if (n < 8)
+		return n;
+
+	q1 = sorted[n / 4];
+	q3 = sorted[3 * n / 4];
+	iqr = q3 - q1;
+	lo = q1 - 1.5 * iqr;
+	hi = q3 + 1.5 * iqr;
+
+	while (start < end && sorted[start] < lo)
+		start++;
+	while (end > start && sorted[end - 1] > hi)
+		end--;
+
+	if (start > 0)
+		memmove(sorted, sorted + start, (end - start) * sizeof(double));
+
+	return end - start;
+}
+
 static void compute_stats(const double *sorted, int n,
 			  struct timing_stats *s)
 {
@@ -150,6 +175,7 @@ void bpf_bench_timing_report(struct bpf_bench_timing *t, const char *name, const
 		return;
 	}
 
+	total = filter_outliers_iqr(all, total);
 	compute_stats(all, total, &s);
 
 	if (t->machine_readable) {
-- 
2.53.0-Meta

