* [RFC PATCH bpf-next 1/3] libbpf: Optimize kprobe.session attachment for exact function names
2026-02-23 21:51 [RFC PATCH bpf-next 0/3] Optimize kprobe.session attachment for exact match Andrey Grodzovsky
@ 2026-02-23 21:51 ` Andrey Grodzovsky
2026-02-24 13:10 ` Jiri Olsa
2026-02-23 21:51 ` [RFC PATCH bpf-next 2/3] ftrace: Use kallsyms binary search for single-symbol lookup Andrey Grodzovsky
2026-02-23 21:51 ` [RFC PATCH bpf-next 3/3] selftests/bpf: add tests for kprobe.session optimization Andrey Grodzovsky
2 siblings, 1 reply; 12+ messages in thread
From: Andrey Grodzovsky @ 2026-02-23 21:51 UTC (permalink / raw)
To: bpf
Cc: ast, daniel, andrii, jolsa, rostedt, linux-trace-kernel,
linux-open-source
Implement dual-path optimization in attach_kprobe_session():
- Fast path: Use syms[] array for exact function names
(no kallsyms parsing)
- Slow path: Use pattern matching with kallsyms only for
wildcards
This avoids expensive kallsyms file parsing (~150ms) when function names
are specified exactly, improving attachment time roughly 50x (to ~3-5ms).
Error code normalization: The fast path returns ESRCH from kernel's
ftrace_lookup_symbols(), while slow path returns ENOENT from userspace
kallsyms parsing. Convert ESRCH to ENOENT in fast path to maintain API
consistency - both paths now return identical error codes for "symbol
not found".
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
---
tools/lib/bpf/libbpf.c | 32 +++++++++++++++++++++++++++-----
1 file changed, 27 insertions(+), 5 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 0be7017800fe..87a71eab4308 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -12192,7 +12192,7 @@ static int attach_kprobe_session(const struct bpf_program *prog, long cookie,
{
LIBBPF_OPTS(bpf_kprobe_multi_opts, opts, .session = true);
const char *spec;
- char *pattern;
+ char *func_name;
int n;
*link = NULL;
@@ -12202,14 +12202,36 @@ static int attach_kprobe_session(const struct bpf_program *prog, long cookie,
return 0;
spec = prog->sec_name + sizeof("kprobe.session/") - 1;
- n = sscanf(spec, "%m[a-zA-Z0-9_.*?]", &pattern);
+ n = sscanf(spec, "%m[a-zA-Z0-9_.*?]", &func_name);
if (n < 1) {
- pr_warn("kprobe session pattern is invalid: %s\n", spec);
+ pr_warn("kprobe session function name is invalid: %s\n", spec);
return -EINVAL;
}
- *link = bpf_program__attach_kprobe_multi_opts(prog, pattern, &opts);
- free(pattern);
+ /* Check if pattern contains wildcards */
+ if (strpbrk(func_name, "*?")) {
+ /* Wildcard pattern - use pattern matching path with kallsyms parsing */
+ *link = bpf_program__attach_kprobe_multi_opts(prog, func_name, &opts);
+ } else {
+ /* Exact function name - use syms array path (fast, no kallsyms parsing) */
+ const char *syms[1];
+
+ syms[0] = func_name;
+ opts.syms = syms;
+ opts.cnt = 1;
+ *link = bpf_program__attach_kprobe_multi_opts(prog, NULL, &opts);
+ if (!*link && errno == ESRCH) {
+ /*
+ * Normalize error code for API consistency: fast path returns ESRCH
+ * from kernel's ftrace_lookup_symbols(), while slow path returns ENOENT
+ * from userspace kallsyms parsing. Convert ESRCH to ENOENT so both paths
+ * return the same error for "symbol not found".
+ */
+ errno = ENOENT;
+ }
+ }
+
+ free(func_name);
return *link ? 0 : -errno;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread

* Re: [RFC PATCH bpf-next 1/3] libbpf: Optimize kprobe.session attachment for exact function names
2026-02-23 21:51 ` [RFC PATCH bpf-next 1/3] libbpf: Optimize kprobe.session attachment for exact function names Andrey Grodzovsky
@ 2026-02-24 13:10 ` Jiri Olsa
0 siblings, 0 replies; 12+ messages in thread
From: Jiri Olsa @ 2026-02-24 13:10 UTC (permalink / raw)
To: Andrey Grodzovsky
Cc: bpf, ast, daniel, andrii, rostedt, linux-trace-kernel,
linux-open-source
On Mon, Feb 23, 2026 at 04:51:11PM -0500, Andrey Grodzovsky wrote:
> Implement dual-path optimization in attach_kprobe_session():
> - Fast path: Use syms[] array for exact function names
> (no kallsyms parsing)
> - Slow path: Use pattern matching with kallsyms only for
> wildcards
>
> This avoids expensive kallsyms file parsing (~150ms) when function names
> are specified exactly, improving attachment time roughly 50x (to ~3-5ms).
>
> Error code normalization: The fast path returns ESRCH from kernel's
> ftrace_lookup_symbols(), while slow path returns ENOENT from userspace
> kallsyms parsing. Convert ESRCH to ENOENT in fast path to maintain API
> consistency - both paths now return identical error codes for "symbol
> not found".
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
> ---
> tools/lib/bpf/libbpf.c | 32 +++++++++++++++++++++++++++-----
> 1 file changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 0be7017800fe..87a71eab4308 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -12192,7 +12192,7 @@ static int attach_kprobe_session(const struct bpf_program *prog, long cookie,
> {
> LIBBPF_OPTS(bpf_kprobe_multi_opts, opts, .session = true);
> const char *spec;
> - char *pattern;
> + char *func_name;
> int n;
>
> *link = NULL;
> @@ -12202,14 +12202,36 @@ static int attach_kprobe_session(const struct bpf_program *prog, long cookie,
> return 0;
>
> spec = prog->sec_name + sizeof("kprobe.session/") - 1;
> - n = sscanf(spec, "%m[a-zA-Z0-9_.*?]", &pattern);
> + n = sscanf(spec, "%m[a-zA-Z0-9_.*?]", &func_name);
> if (n < 1) {
> - pr_warn("kprobe session pattern is invalid: %s\n", spec);
> + pr_warn("kprobe session function name is invalid: %s\n", spec);
> return -EINVAL;
> }
>
> - *link = bpf_program__attach_kprobe_multi_opts(prog, pattern, &opts);
> - free(pattern);
> + /* Check if pattern contains wildcards */
> + if (strpbrk(func_name, "*?")) {
> + /* Wildcard pattern - use pattern matching path with kallsyms parsing */
> + *link = bpf_program__attach_kprobe_multi_opts(prog, func_name, &opts);
> + } else {
> + /* Exact function name - use syms array path (fast, no kallsyms parsing) */
> + const char *syms[1];
> +
> + syms[0] = func_name;
> + opts.syms = syms;
> + opts.cnt = 1;
> + *link = bpf_program__attach_kprobe_multi_opts(prog, NULL, &opts);
hi,
good idea, could we do this directly in bpf_program__attach_kprobe_multi_opts ?
seems like it's not directly related to session
jirka
> + if (!*link && errno == ESRCH) {
> + /*
> + * Normalize error code for API consistency: fast path returns ESRCH
> + * from kernel's ftrace_lookup_symbols(), while slow path returns ENOENT
> + * from userspace kallsyms parsing. Convert ESRCH to ENOENT so both paths
> + * return the same error for "symbol not found".
> + */
> + errno = ENOENT;
> + }
> + }
> +
> + free(func_name);
> return *link ? 0 : -errno;
> }
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* [RFC PATCH bpf-next 2/3] ftrace: Use kallsyms binary search for single-symbol lookup
2026-02-23 21:51 [RFC PATCH bpf-next 0/3] Optimize kprobe.session attachment for exact match Andrey Grodzovsky
2026-02-23 21:51 ` [RFC PATCH bpf-next 1/3] libbpf: Optimize kprobe.session attachment for exact function names Andrey Grodzovsky
@ 2026-02-23 21:51 ` Andrey Grodzovsky
2026-02-24 13:12 ` Jiri Olsa
2026-02-25 11:47 ` Steven Rostedt
2026-02-23 21:51 ` [RFC PATCH bpf-next 3/3] selftests/bpf: add tests for kprobe.session optimization Andrey Grodzovsky
2 siblings, 2 replies; 12+ messages in thread
From: Andrey Grodzovsky @ 2026-02-23 21:51 UTC (permalink / raw)
To: bpf
Cc: ast, daniel, andrii, jolsa, rostedt, linux-trace-kernel,
linux-open-source
When ftrace_lookup_symbols() is called with a single symbol (cnt == 1),
use kallsyms_lookup_name() for O(log N) binary search instead of the
full linear scan via kallsyms_on_each_symbol().
ftrace_lookup_symbols() was designed for batch resolution of many
symbols in a single pass. For large cnt this is efficient: a single
O(N) walk over all symbols with O(log cnt) binary search into the
sorted input array. But for cnt == 1 it still decompresses all ~200K
kernel symbols only to match one.
kallsyms_lookup_name() uses the sorted kallsyms index and needs only
~17 decompressions for a single lookup.
This is the common path for kprobe.session with exact function names,
where libbpf sends one symbol per BPF_LINK_CREATE syscall.
If binary lookup fails (duplicate symbol names where the first match
is not ftrace-instrumented, or module symbols), the function falls
through to the existing linear scan path.
Before (cnt=1, 50 kprobe.session programs):
Attach: 858 ms (kallsyms_expand_symbol 25% of CPU)
After:
Attach: 52 ms (16x faster)
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
---
kernel/trace/ftrace.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 827fb9a0bf0d..bfd7670669c2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -9263,6 +9263,19 @@ static int kallsyms_callback(void *data, const char *name, unsigned long addr)
* @addrs array, which needs to be big enough to store at least @cnt
* addresses.
*
+ * For a single symbol (cnt == 1), uses kallsyms_lookup_name() which
+ * performs an O(log N) binary search via the sorted kallsyms index.
+ * This avoids the full O(N) linear scan over all kernel symbols that
+ * the multi-symbol path requires.
+ *
+ * For multiple symbols, uses a single-pass linear scan via
+ * kallsyms_on_each_symbol() with binary search into the sorted input
+ * array. While individual lookups are O(log N), doing K lookups
+ * totals O(K * log N) which loses to a single sequential O(N) pass
+ * at scale due to cache-friendly memory access patterns of the linear
+ * walk. Empirical testing shows the linear scan is faster for batch
+ * lookups even well below 10K symbols.
+ *
* Returns: 0 if all provided symbols are found, -ESRCH otherwise.
*/
int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *addrs)
@@ -9270,6 +9283,21 @@ int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *a
struct kallsyms_data args;
int found_all;
+ /* Fast path: single symbol uses O(log N) binary search */
+ if (cnt == 1) {
+ addrs[0] = kallsyms_lookup_name(sorted_syms[0]);
+ if (addrs[0])
+ addrs[0] = ftrace_location(addrs[0]);
+ if (addrs[0])
+ return 0;
+ /*
+ * Binary lookup can fail for duplicate symbol names
+ * where the first match is not ftrace-instrumented,
+ * or for module symbols. Retry with linear scan.
+ */
+ }
+
+ /* Batch path: single-pass O(N) linear scan */
memset(addrs, 0, sizeof(*addrs) * cnt);
args.addrs = addrs;
args.syms = sorted_syms;
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread

* Re: [RFC PATCH bpf-next 2/3] ftrace: Use kallsyms binary search for single-symbol lookup
2026-02-23 21:51 ` [RFC PATCH bpf-next 2/3] ftrace: Use kallsyms binary search for single-symbol lookup Andrey Grodzovsky
@ 2026-02-24 13:12 ` Jiri Olsa
2026-02-25 11:47 ` Steven Rostedt
1 sibling, 0 replies; 12+ messages in thread
From: Jiri Olsa @ 2026-02-24 13:12 UTC (permalink / raw)
To: Andrey Grodzovsky
Cc: bpf, ast, daniel, andrii, rostedt, linux-trace-kernel,
linux-open-source
On Mon, Feb 23, 2026 at 04:51:12PM -0500, Andrey Grodzovsky wrote:
> When ftrace_lookup_symbols() is called with a single symbol (cnt == 1),
> use kallsyms_lookup_name() for O(log N) binary search instead of the
> full linear scan via kallsyms_on_each_symbol().
>
> ftrace_lookup_symbols() was designed for batch resolution of many
> symbols in a single pass. For large cnt this is efficient: a single
> O(N) walk over all symbols with O(log cnt) binary search into the
> sorted input array. But for cnt == 1 it still decompresses all ~200K
> kernel symbols only to match one.
>
> kallsyms_lookup_name() uses the sorted kallsyms index and needs only
> ~17 decompressions for a single lookup.
>
> This is the common path for kprobe.session with exact function names,
> where libbpf sends one symbol per BPF_LINK_CREATE syscall.
>
> If binary lookup fails (duplicate symbol names where the first match
> is not ftrace-instrumented, or module symbols), the function falls
> through to the existing linear scan path.
>
> Before (cnt=1, 50 kprobe.session programs):
> Attach: 858 ms (kallsyms_expand_symbol 25% of CPU)
>
> After:
> Attach: 52 ms (16x faster)
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
> ---
> kernel/trace/ftrace.c | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 827fb9a0bf0d..bfd7670669c2 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -9263,6 +9263,19 @@ static int kallsyms_callback(void *data, const char *name, unsigned long addr)
> * @addrs array, which needs to be big enough to store at least @cnt
> * addresses.
> *
> + * For a single symbol (cnt == 1), uses kallsyms_lookup_name() which
> + * performs an O(log N) binary search via the sorted kallsyms index.
> + * This avoids the full O(N) linear scan over all kernel symbols that
> + * the multi-symbol path requires.
> + *
> + * For multiple symbols, uses a single-pass linear scan via
> + * kallsyms_on_each_symbol() with binary search into the sorted input
> + * array. While individual lookups are O(log N), doing K lookups
> + * totals O(K * log N) which loses to a single sequential O(N) pass
> + * at scale due to cache-friendly memory access patterns of the linear
> + * walk. Empirical testing shows the linear scan is faster for batch
> + * lookups even well below 10K symbols.
> + *
> * Returns: 0 if all provided symbols are found, -ESRCH otherwise.
> */
> int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *addrs)
> @@ -9270,6 +9283,21 @@ int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *a
> struct kallsyms_data args;
> int found_all;
>
> + /* Fast path: single symbol uses O(log N) binary search */
> + if (cnt == 1) {
> + addrs[0] = kallsyms_lookup_name(sorted_syms[0]);
> + if (addrs[0])
> + addrs[0] = ftrace_location(addrs[0]);
the kallsyms_callback callback code does not take the address
from ftrace_location, just checks it exists .. I think it is
done later in the fprobe layer .. let's keep it the same
jirka
> + if (addrs[0])
> + return 0;
> + /*
> + * Binary lookup can fail for duplicate symbol names
> + * where the first match is not ftrace-instrumented,
> + * or for module symbols. Retry with linear scan.
> + */
> + }
> +
> + /* Batch path: single-pass O(N) linear scan */
> memset(addrs, 0, sizeof(*addrs) * cnt);
> args.addrs = addrs;
> args.syms = sorted_syms;
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 12+ messages in thread

* Re: [RFC PATCH bpf-next 2/3] ftrace: Use kallsyms binary search for single-symbol lookup
2026-02-23 21:51 ` [RFC PATCH bpf-next 2/3] ftrace: Use kallsyms binary search for single-symbol lookup Andrey Grodzovsky
2026-02-24 13:12 ` Jiri Olsa
@ 2026-02-25 11:47 ` Steven Rostedt
2026-02-25 15:25 ` [External] " Andrey Grodzovsky
1 sibling, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2026-02-25 11:47 UTC (permalink / raw)
To: Andrey Grodzovsky
Cc: bpf, ast, daniel, andrii, jolsa, linux-trace-kernel,
linux-open-source
On Mon, 23 Feb 2026 16:51:12 -0500
Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> wrote:
> When ftrace_lookup_symbols() is called with a single symbol (cnt == 1),
> use kallsyms_lookup_name() for O(log N) binary search instead of the
> full linear scan via kallsyms_on_each_symbol().
So this patch looks like it should go through the tracing tree, not bpf.
>
> ftrace_lookup_symbols() was designed for batch resolution of many
> symbols in a single pass. For large cnt this is efficient: a single
> O(N) walk over all symbols with O(log cnt) binary search into the
> sorted input array. But for cnt == 1 it still decompresses all ~200K
> kernel symbols only to match one.
>
> kallsyms_lookup_name() uses the sorted kallsyms index and needs only
> ~17 decompressions for a single lookup.
>
> This is the common path for kprobe.session with exact function names,
> where libbpf sends one symbol per BPF_LINK_CREATE syscall.
>
> If binary lookup fails (duplicate symbol names where the first match
> is not ftrace-instrumented, or module symbols), the function falls
> through to the existing linear scan path.
>
> Before (cnt=1, 50 kprobe.session programs):
> Attach: 858 ms (kallsyms_expand_symbol 25% of CPU)
>
> After:
> Attach: 52 ms (16x faster)
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
> ---
> kernel/trace/ftrace.c | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 827fb9a0bf0d..bfd7670669c2 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -9263,6 +9263,19 @@ static int kallsyms_callback(void *data, const char *name, unsigned long addr)
> * @addrs array, which needs to be big enough to store at least @cnt
> * addresses.
> *
> + * For a single symbol (cnt == 1), uses kallsyms_lookup_name() which
> + * performs an O(log N) binary search via the sorted kallsyms index.
> + * This avoids the full O(N) linear scan over all kernel symbols that
> + * the multi-symbol path requires.
> + *
> + * For multiple symbols, uses a single-pass linear scan via
> + * kallsyms_on_each_symbol() with binary search into the sorted input
> + * array.
The above is fine.
> While individual lookups are O(log N), doing K lookups
> + * totals O(K * log N) which loses to a single sequential O(N) pass
> + * at scale due to cache-friendly memory access patterns of the linear
> + * walk. Empirical testing shows the linear scan is faster for batch
> + * lookups even well below 10K symbols.
The above is unneeded for a comment in the code and just belongs in the
change log.
-- Steve
> + *
> * Returns: 0 if all provided symbols are found, -ESRCH otherwise.
> */
> int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *addrs)
> @@ -9270,6 +9283,21 @@ int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *a
> struct kallsyms_data args;
> int found_all;
>
> + /* Fast path: single symbol uses O(log N) binary search */
> + if (cnt == 1) {
> + addrs[0] = kallsyms_lookup_name(sorted_syms[0]);
> + if (addrs[0])
> + addrs[0] = ftrace_location(addrs[0]);
> + if (addrs[0])
> + return 0;
> + /*
> + * Binary lookup can fail for duplicate symbol names
> + * where the first match is not ftrace-instrumented,
> + * or for module symbols. Retry with linear scan.
> + */
> + }
> +
> + /* Batch path: single-pass O(N) linear scan */
> memset(addrs, 0, sizeof(*addrs) * cnt);
> args.addrs = addrs;
> args.syms = sorted_syms;
^ permalink raw reply [flat|nested] 12+ messages in thread

* Re: [External] Re: [RFC PATCH bpf-next 2/3] ftrace: Use kallsyms binary search for single-symbol lookup
2026-02-25 11:47 ` Steven Rostedt
@ 2026-02-25 15:25 ` Andrey Grodzovsky
2026-02-25 23:32 ` Steven Rostedt
0 siblings, 1 reply; 12+ messages in thread
From: Andrey Grodzovsky @ 2026-02-25 15:25 UTC (permalink / raw)
To: Steven Rostedt
Cc: bpf, ast, daniel, andrii, jolsa, linux-trace-kernel,
linux-open-source
On Wed, Feb 25, 2026 at 6:47 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 23 Feb 2026 16:51:12 -0500
> Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> wrote:
>
> > When ftrace_lookup_symbols() is called with a single symbol (cnt == 1),
> > use kallsyms_lookup_name() for O(log N) binary search instead of the
> > full linear scan via kallsyms_on_each_symbol().
>
> So this patch looks like it should go through the tracing tree, not bpf.
Hey Steve, are there any extra steps required on my side to make this go
through your tree?
Andrey
>
> >
> > ftrace_lookup_symbols() was designed for batch resolution of many
> > symbols in a single pass. For large cnt this is efficient: a single
> > O(N) walk over all symbols with O(log cnt) binary search into the
> > sorted input array. But for cnt == 1 it still decompresses all ~200K
> > kernel symbols only to match one.
> >
> > kallsyms_lookup_name() uses the sorted kallsyms index and needs only
> > ~17 decompressions for a single lookup.
> >
> > This is the common path for kprobe.session with exact function names,
> > where libbpf sends one symbol per BPF_LINK_CREATE syscall.
> >
> > If binary lookup fails (duplicate symbol names where the first match
> > is not ftrace-instrumented, or module symbols), the function falls
> > through to the existing linear scan path.
> >
> > Before (cnt=1, 50 kprobe.session programs):
> > Attach: 858 ms (kallsyms_expand_symbol 25% of CPU)
> >
> > After:
> > Attach: 52 ms (16x faster)
> >
> > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
> > ---
> > kernel/trace/ftrace.c | 28 ++++++++++++++++++++++++++++
> > 1 file changed, 28 insertions(+)
> >
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index 827fb9a0bf0d..bfd7670669c2 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -9263,6 +9263,19 @@ static int kallsyms_callback(void *data, const char *name, unsigned long addr)
> > * @addrs array, which needs to be big enough to store at least @cnt
> > * addresses.
> > *
> > + * For a single symbol (cnt == 1), uses kallsyms_lookup_name() which
> > + * performs an O(log N) binary search via the sorted kallsyms index.
> > + * This avoids the full O(N) linear scan over all kernel symbols that
> > + * the multi-symbol path requires.
> > + *
> > + * For multiple symbols, uses a single-pass linear scan via
> > + * kallsyms_on_each_symbol() with binary search into the sorted input
> > + * array.
>
> The above is fine.
>
> > While individual lookups are O(log N), doing K lookups
> > + * totals O(K * log N) which loses to a single sequential O(N) pass
> > + * at scale due to cache-friendly memory access patterns of the linear
> > + * walk. Empirical testing shows the linear scan is faster for batch
> > + * lookups even well below 10K symbols.
>
> The above is unneeded for a comment in the code and just belongs in the
> change log.
>
> -- Steve
>
> > + *
> > * Returns: 0 if all provided symbols are found, -ESRCH otherwise.
> > */
> > int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *addrs)
> > @@ -9270,6 +9283,21 @@ int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *a
> > struct kallsyms_data args;
> > int found_all;
> >
> > + /* Fast path: single symbol uses O(log N) binary search */
> > + if (cnt == 1) {
> > + addrs[0] = kallsyms_lookup_name(sorted_syms[0]);
> > + if (addrs[0])
> > + addrs[0] = ftrace_location(addrs[0]);
> > + if (addrs[0])
> > + return 0;
> > + /*
> > + * Binary lookup can fail for duplicate symbol names
> > + * where the first match is not ftrace-instrumented,
> > + * or for module symbols. Retry with linear scan.
> > + */
> > + }
> > +
> > + /* Batch path: single-pass O(N) linear scan */
> > memset(addrs, 0, sizeof(*addrs) * cnt);
> > args.addrs = addrs;
> > args.syms = sorted_syms;
>
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [External] Re: [RFC PATCH bpf-next 2/3] ftrace: Use kallsyms binary search for single-symbol lookup
2026-02-25 15:25 ` [External] " Andrey Grodzovsky
@ 2026-02-25 23:32 ` Steven Rostedt
2026-02-26 1:22 ` Andrey Grodzovsky
0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2026-02-25 23:32 UTC (permalink / raw)
To: Andrey Grodzovsky
Cc: bpf, ast, daniel, andrii, jolsa, linux-trace-kernel,
linux-open-source
On Wed, 25 Feb 2026 10:25:04 -0500
Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> wrote:
> Hey Steve, are there any extra steps required on my side to make this go
> through your tree?
I see you Cc'd linux-trace-kernel which places it into the tracing
patchwork. If there's no dependency on any other patch, I can add it to
my 7.1 queue. That is, after I get some more time to review it a little
deeper.
-- Steve
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [External] Re: [RFC PATCH bpf-next 2/3] ftrace: Use kallsyms binary search for single-symbol lookup
2026-02-25 23:32 ` Steven Rostedt
@ 2026-02-26 1:22 ` Andrey Grodzovsky
2026-03-24 21:03 ` Steven Rostedt
0 siblings, 1 reply; 12+ messages in thread
From: Andrey Grodzovsky @ 2026-02-26 1:22 UTC (permalink / raw)
To: Steven Rostedt
Cc: bpf, ast, daniel, andrii, jolsa, linux-trace-kernel,
linux-open-source
On Wed, Feb 25, 2026 at 6:32 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 25 Feb 2026 10:25:04 -0500
> Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> wrote:
>
> > Hey Steve, are there any extra steps required on my side to make this go
> > through your tree?
>
> I see you Cc'd linux-trace-kernel which places it into the tracing
> patchwork. If there's no dependency on any other patch, I can add it to
> my 7.1 queue. That is, after I get some more time to review it a little
> deeper.
>
> -- Steve
There are no dependencies, I performed the original optimization in libbpf
(patch 1) in hopes this will make session kprobes faster than legacy
kprobe/kretprobe pairs and when it didn't, I dug deeper and came up
with this second optimization here (patch 2).
Let me know of any issues once you have time to review, in the meantime
I will roll V2 with all the fixes per Jiri's comments including this patch.
Andrey.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [External] Re: [RFC PATCH bpf-next 2/3] ftrace: Use kallsyms binary search for single-symbol lookup
2026-02-26 1:22 ` Andrey Grodzovsky
@ 2026-03-24 21:03 ` Steven Rostedt
0 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2026-03-24 21:03 UTC (permalink / raw)
To: Andrey Grodzovsky
Cc: bpf, ast, daniel, andrii, jolsa, linux-trace-kernel,
linux-open-source
On Wed, 25 Feb 2026 20:22:50 -0500
Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> wrote:
> On Wed, Feb 25, 2026 at 6:32 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Wed, 25 Feb 2026 10:25:04 -0500
> > Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> wrote:
> >
> > > Hey Steve, are there any extra steps required on my side to make this go
> > > through your tree?
> >
> > I see you Cc'd linux-trace-kernel which places it into the tracing
> > patchwork. If there's no dependency on any other patch, I can add it to
> > my 7.1 queue. That is, after I get some more time to review it a little
> > deeper.
> >
> > -- Steve
>
> There are no dependencies, I performed the original optimization in libbpf
> (patch 1) in hopes this will make session kprobes faster then legacy
> kprobe/kretprobe pairs and when it didn't, I dug deeper and came up
> with this second optimization here (patch 2).
>
> Let me know of any issues once you have time to review, in the meantime
> I will roll V2 with all the fixes per Jiri's comments including this patch.
Sorry for the late reply:
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-- Steve
^ permalink raw reply [flat|nested] 12+ messages in thread
* [RFC PATCH bpf-next 3/3] selftests/bpf: add tests for kprobe.session optimization
2026-02-23 21:51 [RFC PATCH bpf-next 0/3] Optimize kprobe.session attachment for exact match Andrey Grodzovsky
2026-02-23 21:51 ` [RFC PATCH bpf-next 1/3] libbpf: Optimize kprobe.session attachment for exact function names Andrey Grodzovsky
2026-02-23 21:51 ` [RFC PATCH bpf-next 2/3] ftrace: Use kallsyms binary search for single-symbol lookup Andrey Grodzovsky
@ 2026-02-23 21:51 ` Andrey Grodzovsky
2026-02-24 13:12 ` Jiri Olsa
2 siblings, 1 reply; 12+ messages in thread
From: Andrey Grodzovsky @ 2026-02-23 21:51 UTC (permalink / raw)
To: bpf
Cc: ast, daniel, andrii, jolsa, rostedt, linux-trace-kernel,
linux-open-source
Add two new subtests to kprobe_multi_test to validate the
kprobe.session exact function name optimization:
test_session_syms: Attaches a kprobe.session program to an exact
function name (bpf_fentry_test1) to verify the fast syms[] path
works correctly. Validates that both entry and return probes fire.
test_session_errors: Verifies error code consistency between the
wildcard pattern path (slow, parses kallsyms) and the exact function
name path (fast, uses syms[] array). Both paths must return -ENOENT
for non-existent functions, protecting against future changes that
could break API consistency.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
---
.../bpf/prog_tests/kprobe_multi_test.c | 76 +++++++++++++++++++
.../bpf/progs/kprobe_multi_session_errors.c | 27 +++++++
.../bpf/progs/kprobe_multi_session_syms.c | 45 +++++++++++
3 files changed, 148 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/kprobe_multi_session_errors.c
create mode 100644 tools/testing/selftests/bpf/progs/kprobe_multi_session_syms.c
diff --git a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
index 9caef222e528..62f7959858a9 100644
--- a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
@@ -8,6 +8,8 @@
#include "kprobe_multi_override.skel.h"
#include "kprobe_multi_session.skel.h"
#include "kprobe_multi_session_cookie.skel.h"
+#include "kprobe_multi_session_syms.skel.h"
+#include "kprobe_multi_session_errors.skel.h"
#include "kprobe_multi_verifier.skel.h"
#include "kprobe_write_ctx.skel.h"
#include "bpf/libbpf_internal.h"
@@ -400,6 +402,76 @@ static void test_session_cookie_skel_api(void)
kprobe_multi_session_cookie__destroy(skel);
}
+static void test_session_syms_skel_api(void)
+{
+ struct kprobe_multi_session_syms *skel = NULL;
+
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+ int err, prog_fd;
+
+ skel = kprobe_multi_session_syms__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "kprobe_multi_session_syms__open_and_load"))
+ return;
+
+ skel->bss->pid = getpid();
+
+ err = kprobe_multi_session_syms__attach(skel);
+ if (!ASSERT_OK(err, "kprobe_multi_session_syms__attach"))
+ goto cleanup;
+
+ prog_fd = bpf_program__fd(skel->progs.trigger);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "test_run");
+ ASSERT_EQ(topts.retval, 0, "test_run");
+
+ /* Test 1: Both entry and return should fire */
+ ASSERT_EQ(skel->bss->test1_count, 2, "test1_count");
+ ASSERT_TRUE(skel->bss->test1_return, "test1_return");
+
+cleanup:
+ kprobe_multi_session_syms__destroy(skel);
+}
+
+static void test_session_errors(void)
+{
+ struct kprobe_multi_session_errors *skel = NULL;
+ struct bpf_link *link_wildcard = NULL;
+ struct bpf_link *link_exact = NULL;
+ int err_wildcard, err_exact;
+
+ skel = kprobe_multi_session_errors__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "kprobe_multi_session_errors__open_and_load"))
+ return;
+
+ /*
+ * Test error code consistency: both wildcard (slow path) and exact name
+ * (fast path) should return the same error code (ENOENT) for non-existent
+ * functions. This protects against future kernel changes that might alter
+ * error return values.
+ */
+
+ /* Try to attach with non-existent wildcard pattern (slow path) */
+ link_wildcard = bpf_program__attach(skel->progs.test_nonexistent_wildcard);
+ err_wildcard = -errno;
+ ASSERT_ERR_PTR(link_wildcard, "attach_nonexistent_wildcard");
+ ASSERT_EQ(err_wildcard, -ENOENT, "wildcard_error_enoent");
+
+ /* Try to attach with non-existent exact name (fast path) */
+ link_exact = bpf_program__attach(skel->progs.test_nonexistent_exact);
+ err_exact = -errno;
+ ASSERT_ERR_PTR(link_exact, "attach_nonexistent_exact");
+ ASSERT_EQ(err_exact, -ENOENT, "exact_error_enoent");
+
+ /*
+ * Verify both paths return identical error codes - this is critical for
+ * API consistency and prevents user code from breaking when switching
+ * between wildcard patterns and exact function names.
+ */
+ ASSERT_EQ(err_wildcard, err_exact, "error_consistency");
+
+ kprobe_multi_session_errors__destroy(skel);
+}
+
static void test_unique_match(void)
{
LIBBPF_OPTS(bpf_kprobe_multi_opts, opts);
@@ -645,6 +717,10 @@ void test_kprobe_multi_test(void)
test_session_skel_api();
if (test__start_subtest("session_cookie"))
test_session_cookie_skel_api();
+ if (test__start_subtest("session_syms"))
+ test_session_syms_skel_api();
+ if (test__start_subtest("session_errors"))
+ test_session_errors();
if (test__start_subtest("unique_match"))
test_unique_match();
if (test__start_subtest("attach_write_ctx"))
diff --git a/tools/testing/selftests/bpf/progs/kprobe_multi_session_errors.c b/tools/testing/selftests/bpf/progs/kprobe_multi_session_errors.c
new file mode 100644
index 000000000000..749d43b35bc2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/kprobe_multi_session_errors.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Test error code consistency between fast and slow paths for non-existent functions */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+/*
+ * Test 1: Non-existent wildcard pattern (slow path)
+ * This uses pattern matching with kallsyms parsing and should fail with ENOENT
+ */
+SEC("kprobe.session/__impossible_test_func_xyz_wildcard_*")
+int test_nonexistent_wildcard(struct pt_regs *ctx)
+{
+ return 0;
+}
+
+/*
+ * Test 2: Non-existent exact function name (fast path)
+ * This uses syms[] array and should fail with ENOENT (normalized from ESRCH)
+ */
+SEC("kprobe.session/__impossible_test_func_xyz_exact_123")
+int test_nonexistent_exact(struct pt_regs *ctx)
+{
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/kprobe_multi_session_syms.c b/tools/testing/selftests/bpf/progs/kprobe_multi_session_syms.c
new file mode 100644
index 000000000000..6a4bd57af1fc
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/kprobe_multi_session_syms.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Test kprobe.session with exact function names to verify syms[] optimization */
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <stdbool.h>
+
+char _license[] SEC("license") = "GPL";
+
+int pid = 0;
+
+/* Results for each function: incremented on entry and return */
+__u64 test1_count = 0;
+
+/* Track entry vs return */
+bool test1_return = false;
+
+/*
+ * No tests in here, just to trigger 'bpf_fentry_test*'
+ * through tracing test_run
+ */
+SEC("fentry/bpf_modify_return_test")
+int BPF_PROG(trigger)
+{
+ return 0;
+}
+
+/*
+ * Test 1: Exact function name (no wildcards) - uses fast syms[] path
+ * This should attach via opts.syms array, bypassing kallsyms parsing
+ */
+SEC("kprobe.session/bpf_fentry_test1")
+int test_kprobe_syms_1(struct pt_regs *ctx)
+{
+ if (bpf_get_current_pid_tgid() >> 32 != pid)
+ return 0;
+
+ test1_count++;
+
+ /* Check if this is return probe */
+ if (bpf_session_is_return(ctx))
+ test1_return = true;
+
+ return 0; /* Always execute return probe */
+}
--
2.34.1
* Re: [RFC PATCH bpf-next 3/3] selftests/bpf: add tests for kprobe.session optimization
2026-02-23 21:51 ` [RFC PATCH bpf-next 3/3] selftests/bpf: add tests for kprobe.session optimization Andrey Grodzovsky
@ 2026-02-24 13:12 ` Jiri Olsa
0 siblings, 0 replies; 12+ messages in thread
From: Jiri Olsa @ 2026-02-24 13:12 UTC (permalink / raw)
To: Andrey Grodzovsky
Cc: bpf, ast, daniel, andrii, rostedt, linux-trace-kernel,
linux-open-source
On Mon, Feb 23, 2026 at 04:51:13PM -0500, Andrey Grodzovsky wrote:
> Add two new subtests to kprobe_multi_test to validate the
> kprobe.session exact function name optimization:
SNIP
> +static void test_session_errors(void)
> +{
> + struct kprobe_multi_session_errors *skel = NULL;
> + struct bpf_link *link_wildcard = NULL;
> + struct bpf_link *link_exact = NULL;
> + int err_wildcard, err_exact;
> +
> + skel = kprobe_multi_session_errors__open_and_load();
> + if (!ASSERT_OK_PTR(skel, "kprobe_multi_session_errors__open_and_load"))
> + return;
> +
> + /*
> + * Test error code consistency: both wildcard (slow path) and exact name
> + * (fast path) should return the same error code (ENOENT) for non-existent
> + * functions. This protects against future kernel changes that might alter
> + * error return values.
> + */
> +
> + /* Try to attach with non-existent wildcard pattern (slow path) */
> + link_wildcard = bpf_program__attach(skel->progs.test_nonexistent_wildcard);
> + err_wildcard = -errno;
> + ASSERT_ERR_PTR(link_wildcard, "attach_nonexistent_wildcard");
> + ASSERT_EQ(err_wildcard, -ENOENT, "wildcard_error_enoent");
> +
> + /* Try to attach with non-existent exact name (fast path) */
> + link_exact = bpf_program__attach(skel->progs.test_nonexistent_exact);
> + err_exact = -errno;
> + ASSERT_ERR_PTR(link_exact, "attach_nonexistent_exact");
> + ASSERT_EQ(err_exact, -ENOENT, "exact_error_enoent");
> +
> + /*
> + * Verify both paths return identical error codes - this is critical for
> + * API consistency and prevents user code from breaking when switching
> + * between wildcard patterns and exact function names.
> + */
> + ASSERT_EQ(err_wildcard, err_exact, "error_consistency");
> +
> + kprobe_multi_session_errors__destroy(skel);
there's already subtest for attach failures (test_attach_api_fails),
so maybe let's put this over there?
SNIP
> diff --git a/tools/testing/selftests/bpf/progs/kprobe_multi_session_syms.c b/tools/testing/selftests/bpf/progs/kprobe_multi_session_syms.c
> new file mode 100644
> index 000000000000..6a4bd57af1fc
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/kprobe_multi_session_syms.c
> @@ -0,0 +1,45 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Test kprobe.session with exact function names to verify syms[] optimization */
> +#include <vmlinux.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +#include <stdbool.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +int pid = 0;
> +
> +/* Results for each function: incremented on entry and return */
> +__u64 test1_count = 0;
> +
> +/* Track entry vs return */
> +bool test1_return = false;
> +
> +/*
> + * No tests in here, just to trigger 'bpf_fentry_test*'
> + * through tracing test_run
> + */
> +SEC("fentry/bpf_modify_return_test")
> +int BPF_PROG(trigger)
> +{
> + return 0;
> +}
> +
> +/*
> + * Test 1: Exact function name (no wildcards) - uses fast syms[] path
> + * This should attach via opts.syms array, bypassing kallsyms parsing
> + */
> +SEC("kprobe.session/bpf_fentry_test1")
> +int test_kprobe_syms_1(struct pt_regs *ctx)
perhaps we could execute this as part of test_session_skel_api test?
seems like we could put this directly to progs/kprobe_multi_session.c and
call session_check(ctx) and change test_results validation accordingly
thanks,
jirka
> +{
> + if (bpf_get_current_pid_tgid() >> 32 != pid)
> + return 0;
> +
> + test1_count++;
> +
> + /* Check if this is return probe */
> + if (bpf_session_is_return(ctx))
> + test1_return = true;
> +
> + return 0; /* Always execute return probe */
> +}
> --
> 2.34.1
>