From: Puranjay Mohan <puranjay@kernel.org>
To: bpf@vger.kernel.org
Cc: Puranjay Mohan <puranjay@kernel.org>,
Puranjay Mohan <puranjay12@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <martin.lau@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>,
Fei Chen <feichen@meta.com>, Taruna Agrawal <taragrawal@meta.com>,
Nikhil Dixit Limaye <ndixit@meta.com>,
"Nikita V. Shirokov" <tehnerd@tehnerd.com>,
kernel-team@meta.com
Subject: [PATCH bpf-next 0/7] selftests/bpf: Add XDP load-balancer benchmark
Date: Mon, 27 Apr 2026 16:22:57 -0700
Message-ID: <20260427232313.1582588-1-puranjay@kernel.org>
Changelog:
RFC: https://lore.kernel.org/all/20260420111726.2118636-1-puranjay@kernel.org/
Changes in v1:
- Replace bpf_get_cpu_time_counter() with bpf_ktime_get_ns()
- Replace bpf_repeat() with plain for loop and may_goto
- Refactor collect_measurements() to reuse bench_force_done()
- Remove histogram, verbose calibration output, and per-scenario status prints
- Trim run script table to p50/stddev/p99
- Set env.quiet when --machine-readable is passed
- Add || true to run script benchmark invocation for set -e safety
- Add bpf-nop benchmark as timing overhead baseline (patch 3)
- Use named struct for LRU inner map to fix build on older toolchains
This series adds an XDP load-balancer benchmark (based on Katran) to the BPF
selftest bench framework.
Motivation
----------
Existing BPF bench tests measure individual operations (map lookups,
kprobes, ring buffers) in isolation. Production BPF programs combine
parsing, map lookups, branching, and packet rewriting in a single call
chain. The performance characteristics of such programs depend on the
interaction of these operations -- register pressure, spills, inlining
decisions, branch layout -- which isolated micro-benchmarks do not
capture.
This benchmark implements a simplified L4 load-balancer modeled after
katran [1]. The BPF program reproduces katran's core datapath:

  L3/L4 parsing -> VIP hash lookup -> per-CPU LRU connection table
  with consistent-hash fallback -> real server selection -> per-VIP
  and per-real stats -> IPIP/IP6IP6 encapsulation

The BPF code exercises hash maps, array-of-maps (per-CPU LRU),
percpu arrays, jhash, bpf_xdp_adjust_head(), bpf_ktime_get_ns(),
and bpf_get_smp_processor_id() in a single pipeline.
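
The lookup order in the datapath above (connection table first,
consistent hash on a miss) can be sketched in plain userspace C. This is
an illustration only: conn_table, ch_ring, pick_real() and the
direct-mapped table are hypothetical stand-ins, not the BPF maps or the
jhash-based hashing used by the series.

```c
#include <stdint.h>

#define CONN_SLOTS   1024        /* hypothetical connection-table size */
#define CH_RING_SIZE 8           /* hypothetical consistent-hash ring size */
#define NO_REAL      UINT32_MAX  /* marks an empty connection-table slot */

struct conn_slot {
	uint32_t flow_hash;  /* hash of the flow that owns this slot */
	uint32_t real_idx;   /* chosen real server, or NO_REAL when empty */
};

static struct conn_slot conn_table[CONN_SLOTS];

/* Ring position -> real server index (three hypothetical reals). */
static uint32_t ch_ring[CH_RING_SIZE] = { 0, 1, 2, 0, 1, 2, 0, 1 };

static void conn_table_init(void)
{
	for (int i = 0; i < CONN_SLOTS; i++)
		conn_table[i].real_idx = NO_REAL;
}

/*
 * Return the real server for a flow. *hit is set to 1 when the answer
 * came from the connection table and to 0 when the consistent-hash
 * ring was used (in which case the choice is remembered).
 */
static uint32_t pick_real(uint32_t flow_hash, int *hit)
{
	struct conn_slot *slot = &conn_table[flow_hash % CONN_SLOTS];

	if (slot->real_idx != NO_REAL && slot->flow_hash == flow_hash) {
		*hit = 1;
		return slot->real_idx;
	}

	*hit = 0;
	slot->flow_hash = flow_hash;
	slot->real_idx = ch_ring[flow_hash % CH_RING_SIZE];
	return slot->real_idx;
}
```

Calling conn_table_init() once and then pick_real() twice with the same
hash takes the fallback path on the first call and the table path on the
second, which mirrors the difference between the "ch" and "lru-hit"
scenarios.
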
This is intended as the first in a series of BPF workload benchmarks
covering other use cases (sched_ext, etc.).
Design
------
A userspace loop calling bpf_prog_test_run_opts(repeat=1) would
measure syscall overhead, not BPF program cost -- the ~4 ns early-exit
paths would be buried under kernel entry/exit. Using repeat=N is
also unsuitable: the kernel re-runs the same packet without resetting
state between iterations, so the second iteration of an encap scenario
would process an already-encapsulated packet.
Instead, timing is measured inside the BPF program using
bpf_ktime_get_ns(). BENCH_BPF_LOOP() brackets N iterations with
timestamp reads using a plain for loop with may_goto, runs a
caller-supplied reset block between iterations to undo side effects
(e.g. strip encapsulation), and records the elapsed time per batch.
One extra untimed iteration runs afterward for output validation.
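
The shape of such a batch loop can be sketched in userspace C. This is
not the BENCH_BPF_LOOP() macro itself: timed_batch(), run_once(),
reset_state() and fake_clock_ns() are hypothetical names, the clock is
passed in as a function pointer where the series reads
bpf_ktime_get_ns(), and placing the reset inside the timed window is an
assumption.

```c
#include <stdint.h>

typedef uint64_t (*clock_fn)(void);

struct pkt_state {
	int encapsulated;  /* side effect left behind by one run */
	unsigned runs;     /* how many times the workload executed */
};

/* Stand-in for the program under test: leaves a side effect behind. */
static void run_once(struct pkt_state *p)
{
	p->encapsulated = 1;
	p->runs++;
}

/* Stand-in for the caller-supplied reset block. */
static void reset_state(struct pkt_state *p)
{
	p->encapsulated = 0;
}

/* Deterministic demo clock: advances by 500 ns on every read. */
static uint64_t fake_clock_ns(void)
{
	static uint64_t t;

	t += 500;
	return t;
}

/* Time iters runs as one batch, then run once more untimed. */
static uint64_t timed_batch(clock_fn now, struct pkt_state *p, uint32_t iters)
{
	uint64_t start = now();

	for (uint32_t i = 0; i < iters; i++) {
		run_once(p);
		reset_state(p);  /* undo side effects before the next run */
	}

	uint64_t elapsed = now() - start;

	run_once(p);  /* untimed: leaves output in place for validation */
	return elapsed;
}
```

Dividing the returned value by iters gives the per-operation cost for
that batch. Two clock reads per batch, rather than two per iteration,
keep the timestamp overhead amortized over the whole batch.
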
Auto-calibration picks a batch size targeting ~10 ms per invocation.
A proportionality sanity check verifies that 2N iterations take ~2x
as long as N.
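
A minimal sketch of the arithmetic behind these two steps, assuming a
short probe batch is measured first. calibrate_iters() and
is_proportional() are hypothetical names, and the tolerance parameter is
an assumption; the cover letter only states the ~10 ms target and the
~2x expectation.

```c
#include <stdint.h>

#define TARGET_BATCH_NS 10000000ull  /* ~10 ms per invocation */

/*
 * Scale a probe measurement (probe_iters iterations took probe_ns)
 * to a batch size expected to take about TARGET_BATCH_NS.
 */
static uint64_t calibrate_iters(uint64_t probe_iters, uint64_t probe_ns)
{
	if (probe_iters == 0)
		return 1;
	if (probe_ns == 0)
		probe_ns = 1;  /* avoid dividing by zero on a coarse clock */

	uint64_t iters = probe_iters * TARGET_BATCH_NS / probe_ns;

	return iters ? iters : 1;
}

/* Return 1 when t_2n is within tol_pct percent of twice t_n. */
static int is_proportional(uint64_t t_n, uint64_t t_2n, unsigned tol_pct)
{
	uint64_t expect = 2 * t_n;
	uint64_t diff = t_2n > expect ? t_2n - expect : expect - t_2n;

	return diff * 100 <= expect * tol_pct;
}
```

For example, a probe of 1000 iterations that took 74000 ns (74 ns/op)
scales to 135135 iterations per batch. A failed proportionality check
suggests that a fixed cost, rather than the per-iteration work, is
dominating the measurement.
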
24 scenarios cover the code-path matrix:
- Protocol: TCP, UDP
- Address family: IPv4, IPv6, cross-AF (IPv4-in-IPv6)
- LRU state: hit, miss (16M flow space), diverse (4K flows), cold
- Consistent-hash: direct (LRU bypass)
- TCP flags: SYN (skip LRU, force CH), RST (skip LRU insert)
- Early exits: unknown VIP, non-IP, ICMP, fragments, IP options
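
The IPv4 early-exit cases can be illustrated with a small userspace
classifier over a raw Ethernet frame. The verdicts are inferred from the
scenario names (pass-non-ip, pass-v4-icmp, drop-v4-frag,
drop-v4-options); the order of the checks, the handling of truncated
frames, and the names classify_v4() and enum verdict are assumptions,
and IPv6 is left out of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

enum verdict { V_PASS, V_DROP, V_CONTINUE };

#define ETH_HLEN_BYTES 14
#define IPV4_MIN_HLEN  20

static enum verdict classify_v4(const uint8_t *frame, size_t len)
{
	if (len < ETH_HLEN_BYTES)
		return V_DROP;  /* assumption: truncated frames are dropped */

	uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);

	if (ethertype == 0x86DD)
		return V_CONTINUE;  /* IPv6: not covered by this sketch */
	if (ethertype != 0x0800)
		return V_PASS;      /* non-IP traffic is passed up the stack */

	if (len < ETH_HLEN_BYTES + IPV4_MIN_HLEN)
		return V_DROP;

	const uint8_t *ip = frame + ETH_HLEN_BYTES;

	if ((ip[0] & 0x0f) != 5)
		return V_DROP;  /* IHL > 5 words: header carries IP options */

	uint16_t frag = (uint16_t)((ip[6] << 8) | ip[7]);

	if (frag & 0x3fff)
		return V_DROP;  /* MF flag set or non-zero fragment offset */

	if (ip[9] == 1)
		return V_PASS;  /* ICMP */

	return V_CONTINUE;      /* TCP/UDP go on to the VIP lookup */
}
```

All of these paths return before any map lookup, which is consistent
with the early-exit scenarios being the cheapest rows in the results.
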
Each scenario validates correctness before benchmarking by comparing
the output packet byte-for-byte against a pre-built expected packet
and checking BPF map counters.
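
A byte-for-byte comparison of this kind can be sketched as below.
first_mismatch() is a hypothetical helper, not a function from the
series; reporting the offset of the first difference is a design choice
made here so that a failure points at the header field that went wrong.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compare an output packet against the expected one.
 * Returns -1 when both are identical, otherwise the offset of the
 * first differing byte. When one packet is a prefix of the other,
 * the length of the shorter packet is returned.
 */
static long first_mismatch(const uint8_t *got, size_t got_len,
			   const uint8_t *want, size_t want_len)
{
	size_t n = got_len < want_len ? got_len : want_len;

	for (size_t i = 0; i < n; i++) {
		if (got[i] != want[i])
			return (long)i;
	}

	return got_len == want_len ? -1 : (long)n;
}
```
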
Sample single-scenario output:

  $ sudo ./bench xdp-lb --scenario tcp-v4-lru-hit
  Setting up benchmark 'xdp-lb'...
  Benchmark 'xdp-lb' started.
  tcp-v4-lru-hit: median 74.51 ns/op, stddev 0.11, p99 74.81 (202 samples)

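
The median and p99 figures are order statistics over the per-batch
samples. One way to compute them is sketched below; the nearest-rank
definition and the name percentile() are assumptions, since the cover
letter does not say which percentile definition the timing library uses.

```c
#include <stddef.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a;
	double y = *(const double *)b;

	return (x > y) - (x < y);
}

/*
 * Nearest-rank percentile of n samples, pct in 1..100.
 * Sorts the samples in place. Returns 0.0 for an empty set.
 */
static double percentile(double *samples, size_t n, unsigned pct)
{
	if (n == 0)
		return 0.0;

	qsort(samples, n, sizeof(*samples), cmp_double);

	size_t rank = (n * pct + 99) / 100;  /* ceil(n * pct / 100) */

	if (rank == 0)
		rank = 1;
	return samples[rank - 1];
}
```

With 202 samples, percentile(s, 202, 50) picks the 101st smallest value
and percentile(s, 202, 99) picks the 200th.
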
Sample run script output (all values in ns/op):

  $ ./benchs/run_bench_xdp_lb.sh
  XDP load-balancer benchmark
  ===========================
  +----------------------------------+----------+---------+----------+
  | Single-flow baseline             |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | tcp-v4-lru-hit                   |    74.30 |    0.08 |    74.48 |
  | tcp-v4-ch                        |   101.73 |    0.11 |   102.01 |
  | tcp-v6-lru-hit                   |    76.77 |    0.14 |    77.04 |
  | tcp-v6-ch                        |   121.40 |    0.10 |   121.65 |
  | udp-v4-lru-hit                   |   107.42 |    0.22 |   107.90 |
  | udp-v6-lru-hit                   |   110.21 |    0.12 |   110.45 |
  | tcp-v4v6-lru-hit                 |    74.82 |    0.35 |    75.43 |
  +----------------------------------+----------+---------+----------+
  | Diverse flows (4K src addrs)     |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | tcp-v4-lru-diverse               |    86.63 |    0.37 |    89.04 |
  | tcp-v4-ch-diverse                |   104.09 |    0.19 |   105.67 |
  | tcp-v6-lru-diverse               |    89.34 |    0.42 |    90.70 |
  | tcp-v6-ch-diverse                |   122.20 |    0.21 |   123.78 |
  | udp-v4-lru-diverse               |   119.37 |    0.58 |   123.10 |
  +----------------------------------+----------+---------+----------+
  | TCP flags                        |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | tcp-v4-syn                       |   165.52 |   15.68 |   198.34 |
  | tcp-v4-rst-miss                  |   161.34 |    2.69 |   172.64 |
  +----------------------------------+----------+---------+----------+
  | LRU stress                       |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | tcp-v4-lru-miss                  |   440.39 |   35.75 |   550.62 |
  | udp-v4-lru-miss                  |   571.88 |   57.38 |   680.61 |
  | tcp-v4-lru-warmup                |   317.75 |    9.55 |   356.20 |
  +----------------------------------+----------+---------+----------+
  | Early exits                      |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | pass-v4-no-vip                   |    18.26 |    0.13 |    18.66 |
  | pass-v6-no-vip                   |    19.08 |    0.01 |    19.10 |
  | pass-v4-icmp                     |     6.81 |    0.02 |     6.86 |
  | pass-non-ip                      |     5.71 |    0.03 |     5.76 |
  | drop-v4-frag                     |     6.09 |    0.01 |     6.10 |
  | drop-v4-options                  |     5.88 |    0.00 |     5.89 |
  | drop-v6-frag                     |     6.00 |    0.03 |     6.04 |
  +----------------------------------+----------+---------+----------+

Patches
-------
Patch 1 adds bench_force_done() to the bench framework so benchmarks
can signal early completion when enough samples have been collected.
Patch 2 adds the shared BPF batch-timing library (BPF-side timing
arrays, BENCH_BPF_LOOP macro, userspace statistics and calibration).
Patch 3 adds a bpf-nop benchmark as a timing overhead baseline and
usage example for the timing library.
Patch 4 adds the common header shared between the BPF program and
userspace (flow_key, vip_definition, real_definition, encap helpers).
Patch 5 adds the XDP load-balancer BPF program.
Patch 6 adds the userspace benchmark driver with 24 scenarios,
packet construction, validation, and bench framework integration.
Patch 7 adds the run script for running all scenarios.
[1] https://github.com/facebookincubator/katran
Puranjay Mohan (7):
selftests/bpf: Add bench_force_done() for early benchmark completion
selftests/bpf: Add BPF batch-timing library
selftests/bpf: Add bpf-nop benchmark for timing overhead baseline
selftests/bpf: Add XDP load-balancer common definitions
selftests/bpf: Add XDP load-balancer BPF program
selftests/bpf: Add XDP load-balancer benchmark driver
selftests/bpf: Add XDP load-balancer benchmark run script
tools/testing/selftests/bpf/Makefile | 6 +
tools/testing/selftests/bpf/bench.c | 20 +-
tools/testing/selftests/bpf/bench.h | 1 +
.../testing/selftests/bpf/bench_bpf_timing.h | 50 +
.../selftests/bpf/benchs/bench_bpf_nop.c | 84 ++
.../selftests/bpf/benchs/bench_bpf_timing.c | 272 ++++
.../selftests/bpf/benchs/bench_xdp_lb.c | 1113 +++++++++++++++++
.../selftests/bpf/benchs/run_bench_xdp_lb.sh | 79 ++
.../bpf/progs/bench_bpf_timing.bpf.h | 69 +
.../selftests/bpf/progs/bpf_nop_bench.c | 14 +
.../selftests/bpf/progs/xdp_lb_bench.c | 647 ++++++++++
.../selftests/bpf/xdp_lb_bench_common.h | 112 ++
12 files changed, 2462 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/bpf/bench_bpf_timing.h
create mode 100644 tools/testing/selftests/bpf/benchs/bench_bpf_nop.c
create mode 100644 tools/testing/selftests/bpf/benchs/bench_bpf_timing.c
create mode 100644 tools/testing/selftests/bpf/benchs/bench_xdp_lb.c
create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_xdp_lb.sh
create mode 100644 tools/testing/selftests/bpf/progs/bench_bpf_timing.bpf.h
create mode 100644 tools/testing/selftests/bpf/progs/bpf_nop_bench.c
create mode 100644 tools/testing/selftests/bpf/progs/xdp_lb_bench.c
create mode 100644 tools/testing/selftests/bpf/xdp_lb_bench_common.h
--
2.52.0