BPF List
* [PATCH RFC bpf-next v2 0/6] bpf: better error reporting when verifier hits 1M instructions limit
@ 2026-05-26 19:37 Eduard Zingerman
From: Eduard Zingerman @ 2026-05-26 19:37 UTC (permalink / raw)
  To: bpf, ast, andrii
  Cc: daniel, martin.lau, kernel-team, yonghong.song, eddyz87, memxor

When the BPF verifier exceeds the 1M instruction budget, the current
error output shows a random execution trace that happens to be active
at the moment, which is not very helpful for debugging.

This series improves the error report using a profiler-inspired
approach: collect and count "callchain" stack traces that the verifier
visits during program validation, and report the top 3 hottest traces
when the budget is exhausted. To minimize the performance and memory
impact of such profiling, only collect samples when the verifier visits
loop headers, iterator next, may_goto, and callback-calling instructions.
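
The counting scheme described above can be illustrated with a small
userspace sketch. It is not code from the series: the names (struct
profile, profile_sample(), profile_top()), the fixed-size table, and
the linear lookup are all assumptions made for illustration, and the
cover letter does not say which data structure the patches use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH  8   /* hypothetical cap on simulated call depth */
#define MAX_CHAINS 64  /* hypothetical table capacity */
#define TOP_N      3   /* the report prints the 3 hottest chains */

/* One simulated callchain: instruction indices, outermost frame first. */
struct callchain {
	int insn_idx[MAX_DEPTH];
	int depth;
	unsigned long hits;
};

struct profile {
	struct callchain chains[MAX_CHAINS];
	int cnt;
};

/* Count one visit of a callchain; returns -1 if the table is full. */
static int profile_sample(struct profile *p, const int *insn_idx, int depth)
{
	struct callchain *cc;
	int i;

	assert(depth > 0 && depth <= MAX_DEPTH);
	for (i = 0; i < p->cnt; i++) {
		cc = &p->chains[i];
		if (cc->depth == depth &&
		    memcmp(cc->insn_idx, insn_idx,
			   depth * sizeof(*insn_idx)) == 0) {
			cc->hits++;
			return 0;
		}
	}
	if (p->cnt == MAX_CHAINS)
		return -1;
	cc = &p->chains[p->cnt++];
	memset(cc, 0, sizeof(*cc));
	memcpy(cc->insn_idx, insn_idx, depth * sizeof(*insn_idx));
	cc->depth = depth;
	cc->hits = 1;
	return 0;
}

/* Fill top[] with up to TOP_N chains, hottest first; returns the count. */
static int profile_top(const struct profile *p,
		       const struct callchain *top[TOP_N])
{
	int i, j, n = 0;

	for (i = 0; i < p->cnt; i++) {
		const struct callchain *cc = &p->chains[i];

		/* Shift colder entries down; the coldest falls off the end. */
		for (j = n; j > 0 && top[j - 1]->hits < cc->hits; j--)
			if (j < TOP_N)
				top[j] = top[j - 1];
		if (j < TOP_N) {
			top[j] = cc;
			if (n < TOP_N)
				n++;
		}
	}
	return n;
}
```

In this sketch profile_sample() would be called only at the sampling
points listed above, and profile_top() once, when the budget runs out.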

For callchains ending at iterator next, may_goto, or callback-calling
instructions, identify which registers or stack slots most frequently
differ between cached and current states.
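
A rough sketch of the "most varying" idea, under heavy simplification:
register state is reduced to one integer per register, stack slots are
ignored, and struct reg_snapshot, diff_sample() and most_varying() are
hypothetical names, not taken from the series. The real verifier state
comparison is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

#define NR_REGS 11 /* BPF registers R0..R10 */

/* Hypothetical, heavily simplified stand-in for per-frame register state. */
struct reg_snapshot {
	long long val[NR_REGS];
};

/* Per-register count of how often cached and current states disagreed. */
struct diff_stats {
	unsigned int diffs[NR_REGS];
};

/* Compare one cached/current pair and bump the counters of differing regs. */
static void diff_sample(struct diff_stats *st,
			const struct reg_snapshot *cached,
			const struct reg_snapshot *cur)
{
	int i;

	assert(st && cached && cur);
	for (i = 0; i < NR_REGS; i++)
		if (cached->val[i] != cur->val[i])
			st->diffs[i]++;
}

/* Index of the register that differed most often, or -1 if none did. */
static int most_varying(const struct diff_stats *st)
{
	int i, best = -1;

	assert(st);
	for (i = 0; i < NR_REGS; i++) {
		if (st->diffs[i] == 0)
			continue;
		if (best < 0 || st->diffs[i] > st->diffs[best])
			best = i;
	}
	return best;
}
```

A register that holds a loop counter would differ on almost every
comparison and so end up as the reported one, which is the kind of
hint the "Most varying: R7 (frame 0)" line in the report is meant to
give.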

Here is an example of the report for scx lavd_dispatch, with the
verifier limited to 200K instructions to trigger the error:

  lavd_dispatch():
    ; void BPF_STRUCT_OPS(lavd_dispatch, s32 cpu, struct task_struct *prev) @ main.bpf.c:889
    ... disassembly ...

  consume_task():
    ; bool consume_task(u64 cpu_dsq_id, u64 cpdom_dsq_id) @ balance.bpf.c:410
    ... disassembly ...

  #1 most visited simulated stacktrace (visited 1807 times):
    lavd_dispatch/124 (.../scx/scheds/rust/scx_lavd/src/bpf/main.bpf.c:1107)
    consume_task/2715 (.../scx/scheds/rust/scx_lavd/src/bpf/balance.bpf.c:316)

  #2 most visited simulated stacktrace (visited 1682 times):
    lavd_dispatch/124 (.../scx/scheds/rust/scx_lavd/src/bpf/main.bpf.c:1107)
    consume_task/2994 (.../scx/scheds/rust/scx_lavd/src/bpf/balance.bpf.c:386)

  #3 most visited simulated stacktrace (visited 8 times):
    lavd_dispatch/255 (.../scx/scheds/rust/scx_lavd/src/bpf/main.bpf.c:1022)
      Most varying: R7 (frame 0)

  BPF program is too large. Processed 200001 insn

Changelog:
v1 -> v2 (bots):
  - Use kvfree() in bpf_compute_loops().
  - Adjust fwd_edges_no_loop test case to avoid dead code elimination
    converting 'if' to 'goto'.
  - Use GFP_KERNEL_ACCOUNT for callchain entry allocation in
    update_callchain_profile().
  - Zero-initialize 'cc' in update_callchain_profile() to avoid
    copying uninitialized stack memory to the heap.
  - Use %td instead of %ld for ptrdiff_t format specifier in
    print_callchain_entry() and disasm_subprog().
  - Size printed_subs bitmap as BPF_MAX_SUBPROGS + 2 to account for
    fake and exception subprograms.
  - Fix bpf_sample_state_diffs() inner loop to iterate from head
    instead of pos_i, avoiding container_of() on the dummy list head.
  - Add DIFF_OTHER to distinguish states that differ because of idmap
    or other inconsistencies.
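
As a side note on the %td item above: the difference of two pointers
has type ptrdiff_t, whose width is not necessarily that of long, so
%td is the matching length modifier. A minimal userspace sketch, with
struct insn and format_pos() being hypothetical names rather than
functions from the series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an instruction; only its size matters here. */
struct insn {
	int code;
};

/* Format "name/offset", where offset is cur's index relative to base. */
static int format_pos(char *buf, size_t size, const char *name,
		      const struct insn *base, const struct insn *cur)
{
	ptrdiff_t off;

	assert(buf && name && base && cur >= base);
	off = cur - base; /* pointer difference has type ptrdiff_t */
	return snprintf(buf, size, "%s/%td", name, off);
}
```

The output shape mirrors the "consume_task/2715" entries in the sample
report, although the actual printing code is not shown in this cover
letter.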

v1: https://lore.kernel.org/bpf/20260526-better-1m-reporting-v1-0-51e4f2c59780@gmail.com/T/
---
Eduard Zingerman (6):
      bpf: move live registers and scc printout to a standalone function
      bpf: compute loops hierarchy
      selftests/bpf: test cases for loop hierarchy computation
      bpf: report hot simulated callchains when 1M instructions limit is met
      bpf: report register diff summary for hot callchains
      selftests/bpf: test budget exhaustion profiling report

 include/linux/bpf_verifier.h                       |  39 ++++
 kernel/bpf/Makefile                                |   2 +-
 kernel/bpf/fixups.c                                |   5 +
 kernel/bpf/liveness.c                              |  22 +-
 kernel/bpf/loops.c                                 | 184 ++++++++++++++++
 kernel/bpf/states.c                                | 180 ++++++++++++++--
 kernel/bpf/verifier.c                              | 233 +++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/verifier.c  |   4 +
 .../selftests/bpf/progs/verifier_budget_report.c   | 175 ++++++++++++++++
 .../selftests/bpf/progs/verifier_live_stack.c      |   2 +-
 .../selftests/bpf/progs/verifier_loop_hierarchy.c  | 233 +++++++++++++++++++++
 11 files changed, 1034 insertions(+), 45 deletions(-)
---
base-commit: 8496d9020ff37a33c2a7b2fc84350fd03ffbde78
change-id: 20260525-better-1m-reporting-1d795a21cf72

Thread overview: 13+ messages
2026-05-26 19:37 [PATCH RFC bpf-next v2 0/6] bpf: better error reporting when verifier hits 1M instructions limit Eduard Zingerman
2026-05-26 19:37 ` [PATCH RFC bpf-next v2 1/6] bpf: move live registers and scc printout to a standalone function Eduard Zingerman
2026-05-26 19:37 ` [PATCH RFC bpf-next v2 2/6] bpf: compute loops hierarchy Eduard Zingerman
2026-05-26 20:26   ` sashiko-bot
2026-05-26 20:33     ` Eduard Zingerman
2026-05-26 19:37 ` [PATCH RFC bpf-next v2 3/6] selftests/bpf: test cases for loop hierarchy computation Eduard Zingerman
2026-05-26 19:37 ` [PATCH RFC bpf-next v2 4/6] bpf: report hot simulated callchains when 1M instructions limit is met Eduard Zingerman
2026-05-26 19:37 ` [PATCH RFC bpf-next v2 5/6] bpf: report register diff summary for hot callchains Eduard Zingerman
2026-05-26 20:17   ` bot+bpf-ci
2026-05-26 20:35     ` Eduard Zingerman
2026-05-26 21:31   ` sashiko-bot
2026-05-26 23:24     ` Eduard Zingerman
2026-05-26 19:37 ` [PATCH RFC bpf-next v2 6/6] selftests/bpf: test budget exhaustion profiling report Eduard Zingerman