public inbox for bpf@vger.kernel.org
* [RFC PATCH bpf-next 0/3] Optimize kprobe.session attachment for exact match
@ 2026-02-23 21:51 Andrey Grodzovsky
From: Andrey Grodzovsky @ 2026-02-23 21:51 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, jolsa, rostedt, linux-trace-kernel,
	linux-open-source

When libbpf attaches kprobe.session programs with exact function names
(the common case: SEC("kprobe.session/vfs_read")), the current code path
has two independent performance bottlenecks:

1. Userspace (libbpf): attach_kprobe_session() always parses
   /proc/kallsyms to resolve function names, even when the name is exact
   (no wildcards).

2. Kernel (ftrace): ftrace_lookup_symbols() does a full O(N) linear scan
   over all kernel symbols (~200K in the worst case) via
   kallsyms_on_each_symbol(), decompressing every symbol name, even when
   resolving a single symbol (cnt == 1).

This series optimizes both layers:

Patch 1 adds a dual-path optimization to libbpf's attach_kprobe_session().
When the section name contains no wildcards (* or ?), it passes the
function name via opts.syms[] directly to the kernel, completely skipping
the /proc/kallsyms parse.  When wildcards are present, it falls back to
the existing pattern matching path.  Error codes are normalized (ESRCH →
ENOENT) so both paths present identical errors for "symbol not found".
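
As an illustration only (not the code from patch 1), the dispatch
described above can be modeled in plain C.  The helper names
has_wildcard(), choose_path() and normalize_err() and the enum are
hypothetical and do not exist in libbpf; the sketch assumes the
"symbol not found" case surfaces as -ESRCH on one path and must be
reported as -ENOENT.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for the two attach paths described in the text. */
enum attach_path {
	PATH_EXACT_SYMS,	/* pass the name via opts.syms[], no kallsyms parse */
	PATH_PATTERN,		/* existing wildcard matching over /proc/kallsyms */
};

/* Hypothetical helper: true if the name contains a glob wildcard (* or ?). */
static bool has_wildcard(const char *name)
{
	return strpbrk(name, "*?") != NULL;
}

/* Pick the attach path from the function part of the section name. */
static enum attach_path choose_path(const char *func_name)
{
	return has_wildcard(func_name) ? PATH_PATTERN : PATH_EXACT_SYMS;
}

/*
 * Hypothetical helper: report "symbol not found" as -ENOENT on both paths
 * by mapping -ESRCH to -ENOENT.  Other error codes pass through unchanged.
 */
static int normalize_err(int err)
{
	return err == -ESRCH ? -ENOENT : err;
}
```

With this model, choose_path("vfs_read") selects the exact path, while
choose_path("vfs_*") and choose_path("vfs_rea?") select the pattern
path.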

Patch 2 adds a cnt == 1 fast path inside ftrace_lookup_symbols().  For a
single symbol, it uses kallsyms_lookup_name() which performs an O(log N)
binary search via the sorted kallsyms index, needing only ~17 symbol
decompressions instead of ~200K.  If the binary lookup fails (duplicate
symbol names where the first match is not ftrace-instrumented, or module
symbols), it falls through to the existing linear scan.
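
The ~17 figure follows from log2(200,000) ≈ 17.6.  The fast path with
fallback can be sketched as a userspace model; this is not the kernel
code.  struct sym, its "traceable" flag (a stand-in for "is
ftrace-instrumented") and all function names below are assumptions
made for illustration, and the table is assumed to be sorted by name.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of one symbol table entry. */
struct sym {
	const char *name;
	unsigned long addr;
	int traceable;		/* stand-in for "ftrace-instrumented" */
};

/* bsearch() comparator: key is a name string, elem is a struct sym. */
static int sym_cmp(const void *key, const void *elem)
{
	return strcmp((const char *)key, ((const struct sym *)elem)->name);
}

/* O(log N) lookup; with duplicate names it returns whichever entry it hits. */
static const struct sym *lookup_fast(const struct sym *tab, size_t n,
				     const char *name)
{
	return bsearch(name, tab, n, sizeof(*tab), sym_cmp);
}

/* O(N) lookup: first traceable entry with a matching name. */
static const struct sym *lookup_linear(const struct sym *tab, size_t n,
				       const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].traceable && strcmp(tab[i].name, name) == 0)
			return &tab[i];
	return NULL;
}

/* Single-symbol resolve: try the fast path, fall back to the scan. */
static unsigned long lookup_one(const struct sym *tab, size_t n,
				const char *name)
{
	const struct sym *s = lookup_fast(tab, n, name);

	if (!s || !s->traceable)
		s = lookup_linear(tab, n, name);
	return s ? s->addr : 0;	/* 0 means "not found" in this model */
}
```

In this model the fallback matters when a name is duplicated and the
binary search lands on a non-traceable entry: the linear scan still
finds the traceable one, so the result matches the scan-only behavior.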

The optimization is placed inside ftrace_lookup_symbols() rather than in
its callers because:
  - It benefits all callers (bpf_kprobe_multi_link_attach,
    register_fprobe_syms) without duplicating logic.
  - The cnt == 1 binary search with fallback is purely an internal
    optimization; ftrace_lookup_symbols()'s contract is unchanged.

For batch lookups (cnt > 1), the existing single-pass O(N) linear scan
is retained.  Empirical profiling with perf and bpftrace on both QEMU
and real hardware showed that the linear scan beats per-symbol
binary search for batch resolution at every measured scale (500, 10K,
41K symbols).
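
For comparison, the single-pass batch approach can be modeled in
userspace as below.  This is a sketch, not the kernel implementation:
resolve_batch() and its parameters are hypothetical, the requested
names are assumed to be sorted and unique, and addresses are assumed
to be non-zero so that 0 can mean "unresolved".

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Comparator over pointers to C strings (key and element alike). */
static int name_cmp(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Walk all n_all symbols once; binary-search each one into the sorted
 * `want` array.  addrs[i] receives the address for want[i], or stays 0
 * if unresolved.  Returns the number of requested names resolved.
 */
static size_t resolve_batch(const char *const *all_names,
			    const unsigned long *all_addrs, size_t n_all,
			    const char *const *want, unsigned long *addrs,
			    size_t n_want)
{
	size_t found = 0;

	memset(addrs, 0, n_want * sizeof(*addrs));
	for (size_t i = 0; i < n_all && found < n_want; i++) {
		const char *const *hit = bsearch(&all_names[i], want, n_want,
						 sizeof(*want), name_cmp);
		size_t idx;

		if (!hit)
			continue;
		idx = (size_t)(hit - want);
		if (addrs[idx])
			continue;	/* keep the first match for duplicates */
		addrs[idx] = all_addrs[i];
		found++;
	}
	return found;
}
```

The pass visits each symbol at most once regardless of how many names
are requested, and stops early once every requested name is resolved.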

Patch 3 adds selftests covering the optimization: test_session_syms
validates that exact function name attachment works correctly through
the fast path, and test_session_errors verifies that both the wildcard
(slow) and exact (fast) paths return identical -ENOENT errors for
non-existent functions.

Example (50 kprobe.session programs, each attaching to one exact
function name via a separate BPF_LINK_CREATE syscall, 50 distinct
functions):

  Configuration                                  Attach Time
  -----------------------------------------------+-----------
  Before (unpatched libbpf + kernel)              7,488 ms
  Patched libbpf only                               858 ms
  Both patches (libbpf + ftrace)                      52 ms
  Traditional kprobe pairs (100 progs, reference)    132 ms

Combined improvement: 144x faster.  kprobe.session is now 2.5x faster
than the equivalent traditional kprobe entry+return pair.

Background: ftrace_lookup_symbols() was added by "ftrace: Add
ftrace_lookup_symbols function" to batch-resolve thousands of
wildcard-matched symbols in a single linear pass.  At the time,
kallsyms_lookup_name() was also a linear scan, so the batch approach
was strictly better.  "kallsyms: Improve the performance of
kallsyms_lookup_name()" later added a sorted index making
kallsyms_lookup_name() O(log N), but ftrace_lookup_symbols() was
never updated to take advantage of this for the single-symbol case.

Andrey Grodzovsky (3):
  libbpf: Optimize kprobe.session attachment for exact function names
  ftrace: Use kallsyms binary search for single-symbol lookup
  selftests/bpf: add tests for kprobe.session optimization

 kernel/trace/ftrace.c                         | 28 +++++++
 tools/lib/bpf/libbpf.c                        | 32 ++++++--
 .../bpf/prog_tests/kprobe_multi_test.c        | 76 +++++++++++++++++++
 .../bpf/progs/kprobe_multi_session_errors.c   | 27 +++++++
 .../bpf/progs/kprobe_multi_session_syms.c     | 45 +++++++++++
 5 files changed, 203 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/kprobe_multi_session_errors.c
 create mode 100644 tools/testing/selftests/bpf/progs/kprobe_multi_session_syms.c

-- 
2.34.1

