[PATCH v2] selftests/bpf: avoid jump seq limit in verif_scale

Linux Kernel Selftest development

* [PATCH v2] selftests/bpf: avoid jump seq limit in verif_scale_pyperf600
@ 2026-03-06 12:00 Sun Jian
  2026-03-06 16:14 ` Paul Chaignon
  0 siblings, 1 reply; 7+ messages in thread
From: Sun Jian @ 2026-03-06 12:00 UTC (permalink / raw)
  To: bpf, linux-kselftest
  Cc: andrii, eddyz87, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah, Sun Jian

pyperf600 is a verifier scale test. With newer LLVM, calling __on_event()
twice can push the generated program over the verifier jump sequence
complexity limit (8192), failing with:

  The sequence of 8193 jumps is too complex.

Let pyperf600 provide its own on_event() that calls __on_event() once, and
guard the shared wrapper accordingly. Other pyperf600 variants are
unaffected.

Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
---

v2:
  - Drop runtime -E2BIG skip; instead tweak pyperf600 program source to avoid
    hitting the verifier jump sequence complexity limit.
  - verif_scale_pyperf600 now passes; other pyperf600 variants unchanged.

 tools/testing/selftests/bpf/progs/pyperf.h    | 4 +++-
 tools/testing/selftests/bpf/progs/pyperf600.c | 7 +++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/progs/pyperf.h b/tools/testing/selftests/bpf/progs/pyperf.h
index 86484f07e1d1..c02c49c52c77 100644
--- a/tools/testing/selftests/bpf/progs/pyperf.h
+++ b/tools/testing/selftests/bpf/progs/pyperf.h
@@ -343,8 +343,9 @@ int __on_event(struct bpf_raw_tracepoint_args *ctx)
 	return 0;
 }
 
+#ifndef PYPERF_CUSTOM_ON_EVENT
 SEC("raw_tracepoint/kfree_skb")
-int on_event(struct bpf_raw_tracepoint_args* ctx)
+int on_event(struct bpf_raw_tracepoint_args *ctx)
 {
 	int ret = 0;
 	ret |= __on_event(ctx);
@@ -354,5 +355,6 @@ int on_event(struct bpf_raw_tracepoint_args* ctx)
 	ret |= __on_event(ctx);
 	return ret;
 }
+#endif
 
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/pyperf600.c b/tools/testing/selftests/bpf/progs/pyperf600.c
index ce1aa5189cc4..31e8a422d804 100644
--- a/tools/testing/selftests/bpf/progs/pyperf600.c
+++ b/tools/testing/selftests/bpf/progs/pyperf600.c
@@ -9,4 +9,11 @@
  * the loop will still execute 600 times.
  */
 #define UNROLL_COUNT 150
+#define PYPERF_CUSTOM_ON_EVENT
 #include "pyperf.h"
+
+SEC("raw_tracepoint/kfree_skb")
+int on_event(struct bpf_raw_tracepoint_args *ctx)
+{
+	return __on_event(ctx);
+}

base-commit: 5ee8dbf54602dc340d6235b1d6aa17c0f283f48c
-- 
2.43.0



* Re: [PATCH v2] selftests/bpf: avoid jump seq limit in verif_scale_pyperf600
  2026-03-06 12:00 [PATCH v2] selftests/bpf: avoid jump seq limit in verif_scale_pyperf600 Sun Jian
@ 2026-03-06 16:14 ` Paul Chaignon
  2026-03-06 16:23   ` Alexei Starovoitov
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Chaignon @ 2026-03-06 16:14 UTC (permalink / raw)
  To: Sun Jian
  Cc: bpf, linux-kselftest, andrii, eddyz87, ast, daniel, martin.lau,
	song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
	shuah

On Fri, Mar 06, 2026 at 08:00:24PM +0800, Sun Jian wrote:
> pyperf600 is a verifier scale test. With newer LLVM, calling __on_event()
> twice can push the generated program over the verifier jump sequence
> complexity limit (8192), failing with:
> 
>   The sequence of 8193 jumps is too complex.
> 
> Let pyperf600 provide its own on_event() that calls __on_event() once, and
> guard the shared wrapper accordingly. Other pyperf600 variants are
> unaffected.
> 
> Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
> ---
> 
> v2:
>   - Drop runtime -E2BIG skip; instead tweak pyperf600 program source to avoid
>     hitting the verifier jump sequence complexity limit.
>   - verif_scale_pyperf600 now passes; other pyperf600 variants unchanged.
> 
>  tools/testing/selftests/bpf/progs/pyperf.h    | 4 +++-
>  tools/testing/selftests/bpf/progs/pyperf600.c | 7 +++++++
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/bpf/progs/pyperf.h b/tools/testing/selftests/bpf/progs/pyperf.h
> index 86484f07e1d1..c02c49c52c77 100644
> --- a/tools/testing/selftests/bpf/progs/pyperf.h
> +++ b/tools/testing/selftests/bpf/progs/pyperf.h
> @@ -343,8 +343,9 @@ int __on_event(struct bpf_raw_tracepoint_args *ctx)
>  	return 0;
>  }
>  
> +#ifndef PYPERF_CUSTOM_ON_EVENT
>  SEC("raw_tracepoint/kfree_skb")
> -int on_event(struct bpf_raw_tracepoint_args* ctx)
> +int on_event(struct bpf_raw_tracepoint_args *ctx)
>  {
>  	int ret = 0;
>  	ret |= __on_event(ctx);
> @@ -354,5 +355,6 @@ int on_event(struct bpf_raw_tracepoint_args* ctx)
>  	ret |= __on_event(ctx);
>  	return ret;
>  }
> +#endif
>  
>  char _license[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/progs/pyperf600.c b/tools/testing/selftests/bpf/progs/pyperf600.c
> index ce1aa5189cc4..31e8a422d804 100644
> --- a/tools/testing/selftests/bpf/progs/pyperf600.c
> +++ b/tools/testing/selftests/bpf/progs/pyperf600.c
> @@ -9,4 +9,11 @@
>   * the loop will still execute 600 times.
>   */
>  #define UNROLL_COUNT 150
> +#define PYPERF_CUSTOM_ON_EVENT
>  #include "pyperf.h"
> +
> +SEC("raw_tracepoint/kfree_skb")
> +int on_event(struct bpf_raw_tracepoint_args *ctx)
> +{
> +	return __on_event(ctx);
> +}

I'm not sure I like this approach. It feels like the 600 in pyperf600
wouldn't really be comparable to the 600 or the 180 in other test cases
since they wouldn't correspond to the same program (yours is five times
shorter). A low-effort alternative would be to tweak STACK_MAX_LEN and
UNROLL_COUNT, but I only managed to make that work by reducing
STACK_MAX_LEN to 180 and then I guess there isn't much difference with
pyperf180 :(

Ideally we would understand what changes in the bytecode with LLVM 19+
and mitigate that by adapting __on_event.

> 
> base-commit: 5ee8dbf54602dc340d6235b1d6aa17c0f283f48c
> -- 
> 2.43.0
> 
> 


* Re: [PATCH v2] selftests/bpf: avoid jump seq limit in verif_scale_pyperf600
  2026-03-06 16:14 ` Paul Chaignon
@ 2026-03-06 16:23   ` Alexei Starovoitov
  2026-03-08  5:55     ` sun jian
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2026-03-06 16:23 UTC (permalink / raw)
  To: Paul Chaignon
  Cc: Sun Jian, bpf, open list:KERNEL SELFTEST FRAMEWORK,
	Andrii Nakryiko, Eduard, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan

On Fri, Mar 6, 2026 at 8:15 AM Paul Chaignon <paul.chaignon@gmail.com> wrote:
>
> On Fri, Mar 06, 2026 at 08:00:24PM +0800, Sun Jian wrote:
> > pyperf600 is a verifier scale test. With newer LLVM, calling __on_event()
> > twice can push the generated program over the verifier jump sequence
> > complexity limit (8192), failing with:
> >
> >   The sequence of 8193 jumps is too complex.
> >
> > Let pyperf600 provide its own on_event() that calls __on_event() once, and
> > guard the shared wrapper accordingly. Other pyperf600 variants are
> > unaffected.
> >
> > Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
> > ---
> >
> > v2:
> >   - Drop runtime -E2BIG skip; instead tweak pyperf600 program source to avoid
> >     hitting the verifier jump sequence complexity limit.
> >   - verif_scale_pyperf600 now passes; other pyperf600 variants unchanged.
> >
> >  tools/testing/selftests/bpf/progs/pyperf.h    | 4 +++-
> >  tools/testing/selftests/bpf/progs/pyperf600.c | 7 +++++++
> >  2 files changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/bpf/progs/pyperf.h b/tools/testing/selftests/bpf/progs/pyperf.h
> > index 86484f07e1d1..c02c49c52c77 100644
> > --- a/tools/testing/selftests/bpf/progs/pyperf.h
> > +++ b/tools/testing/selftests/bpf/progs/pyperf.h
> > @@ -343,8 +343,9 @@ int __on_event(struct bpf_raw_tracepoint_args *ctx)
> >       return 0;
> >  }
> >
> > +#ifndef PYPERF_CUSTOM_ON_EVENT
> >  SEC("raw_tracepoint/kfree_skb")
> > -int on_event(struct bpf_raw_tracepoint_args* ctx)
> > +int on_event(struct bpf_raw_tracepoint_args *ctx)
> >  {
> >       int ret = 0;
> >       ret |= __on_event(ctx);
> > @@ -354,5 +355,6 @@ int on_event(struct bpf_raw_tracepoint_args* ctx)
> >       ret |= __on_event(ctx);
> >       return ret;
> >  }
> > +#endif
> >
> >  char _license[] SEC("license") = "GPL";
> > diff --git a/tools/testing/selftests/bpf/progs/pyperf600.c b/tools/testing/selftests/bpf/progs/pyperf600.c
> > index ce1aa5189cc4..31e8a422d804 100644
> > --- a/tools/testing/selftests/bpf/progs/pyperf600.c
> > +++ b/tools/testing/selftests/bpf/progs/pyperf600.c
> > @@ -9,4 +9,11 @@
> >   * the loop will still execute 600 times.
> >   */
> >  #define UNROLL_COUNT 150
> > +#define PYPERF_CUSTOM_ON_EVENT
> >  #include "pyperf.h"
> > +
> > +SEC("raw_tracepoint/kfree_skb")
> > +int on_event(struct bpf_raw_tracepoint_args *ctx)
> > +{
> > +     return __on_event(ctx);
> > +}
>
> I'm not sure I like this approach. It feels like the 600 in pyperf600
> wouldn't really be comparable to the 600 or the 180 in other test cases
> > since they wouldn't correspond to the same program (yours is five times
> shorter). A low-effort alternative would be to tweak STACK_MAX_LEN and
> UNROLL_COUNT, but I only managed to make that work by reducing
> STACK_MAX_LEN to 180 and then I guess there isn't much difference with
> pyperf180 :(

+1

Sun Jian,
I asked to do a _minimal_ tweak to pyperf600.
What you did is a drastic change. Pls don't hack tests
just to make them pass. The tests have to be meaningful
and test coverage shouldn't degrade.

pw-bot: cr


* Re: [PATCH v2] selftests/bpf: avoid jump seq limit in verif_scale_pyperf600
  2026-03-06 16:23   ` Alexei Starovoitov
@ 2026-03-08  5:55     ` sun jian
  2026-03-08  8:12       ` Eduard Zingerman
  0 siblings, 1 reply; 7+ messages in thread
From: sun jian @ 2026-03-08  5:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Paul Chaignon, bpf, open list:KERNEL SELFTEST FRAMEWORK,
	Andrii Nakryiko, Eduard, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan

On Sat, Mar 7, 2026 at 12:23 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Mar 6, 2026 at 8:15 AM Paul Chaignon <paul.chaignon@gmail.com> wrote:
> Sun Jian,
> I asked to do a _minimal_ tweak to pyperf600.
> What you did is a drastic change. Pls don't hack tests
> just to make them pass. The tests have to be meaningful
> and test coverage shouldn't degrade.
>

Hi Alexei, Paul,

I spent some more time looking into this.

Comparing unmodified pyperf600 bytecode between clang-18 and clang-20, I
see fewer instructions with clang-20 and nearly the same number of
branches:

clang-18: 90134 lines of disassembly, 6090 gotos
clang-20: 78369 lines of disassembly, 6085 gotos

So this does not look like a simple program-size increase. What seems to
change is the branch layout in the unrolled loop body, which seems to
make the verifier DFS go deeper before pruning.

One useful data point is that a single __on_event() copy does load
successfully (that was my v2), while with 2 or more copies it
consistently fails at exactly 8193 jumps. In other words, the verifier
hits the jump-sequence limit before reaching the second copy.

I also tried a range of source-level mitigations, but so far I couldn't
find one that preserves the test intent and keeps pyperf600 comparable
to the other variants:

- UNROLL_COUNT tuning: 99 does not compile; 100-120 compile but still
fail at 8193; 121-145 fail to compile; 146-150 compile but still fail
at 8193
- early break/goto on !frame_ptr: insufficient for pyperf600, and also
hurts pyperf600_nounroll by adding branch points to the 600-iteration loop
- wrapping 5x __on_event() in a non-unrolled loop: verifier still unrolls it
- making get_frame_data() __noinline: still fails
- moving the unwind loop into a __noinline subprog: still fails
- SUBPROGS / __on_event as __noinline: still fails; codegen changes,
but the verifier still hits 8193

Paul also mentioned trying STACK_MAX_LEN/UNROLL_COUNT and only getting it
to work with STACK_MAX_LEN reduced to 180, which would make it too close
to pyperf180.

The only source change I found that passes is reducing __on_event() to a
single copy, but that clearly weakens the test as pointed out.

At this point, I don't have a source-level fix that preserves the test
intent.

Regards,
Sun Jian


* Re: [PATCH v2] selftests/bpf: avoid jump seq limit in verif_scale_pyperf600
  2026-03-08  5:55     ` sun jian
@ 2026-03-08  8:12       ` Eduard Zingerman
  2026-03-08 10:01         ` sun jian
  2026-03-08 16:08         ` Alexei Starovoitov
  0 siblings, 2 replies; 7+ messages in thread
From: Eduard Zingerman @ 2026-03-08  8:12 UTC (permalink / raw)
  To: sun jian, Alexei Starovoitov
  Cc: Paul Chaignon, bpf, open list:KERNEL SELFTEST FRAMEWORK,
	Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan

[-- Attachment #1: Type: text/plain, Size: 3707 bytes --]

On Sun, 2026-03-08 at 13:55 +0800, sun jian wrote:
> On Sat, Mar 7, 2026 at 12:23 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > 
> > On Fri, Mar 6, 2026 at 8:15 AM Paul Chaignon <paul.chaignon@gmail.com> wrote:
> > Sun Jian,
> > I asked to do a _minimal_ tweak to pyperf600.
> > What you did is a drastic change. Pls don't hack tests
> > just to make them pass. The tests have to be meaningful
> > and test coverage shouldn't degrade.
> > 
> 
> Hi Alexei, Paul,
> 
> I spent some more time looking into this.
> 
> Comparing unmodified pyperf600 bytecode between clang-18 and clang-20, I
> see fewer instructions with clang-20 and nearly the same number of
> branches:
> 
> clang-18: 90134 lines of disassembly, 6090 gotos
> clang-20: 78369 lines of disassembly, 6085 gotos
> 
> So this does not look like a simple program-size increase. What seems to
> change is the branch layout in the unrolled loop body, which seems to
> make the verifier DFS go deeper before pruning.
> 
> One useful data point is that a single __on_event() copy does load
> successfully (that was my v2), while with 2 or more copies it
> consistently fails at exactly 8193 jumps. In other words, the verifier
> hits the jump-sequence limit before reaching the second copy.
> 
> I also tried a range of source-level mitigations, but so far I couldn't
> find one that preserves the test intent and keeps pyperf600 comparable
> to the other variants:
> 
> - UNROLL_COUNT tuning: 99 does not compile; 100-120 compile but still
> fail at 8193; 121-145 fail to compile; 146-150 compile but still fail
> at 8193
> - early break/goto on !frame_ptr: insufficient for pyperf600, and also
> hurts pyperf600_nounroll by adding branch points to the 600-iteration loop
> - wrapping 5x __on_event() in a non-unrolled loop: verifier still unrolls it
> - making get_frame_data() __noinline: still fails
> - moving the unwind loop into a __noinline subprog: still fails
> - SUBPROGS / __on_event as __noinline: still fails; codegen changes,
> but the verifier still hits 8193
> 
> Paul also mentioned trying STACK_MAX_LEN/UNROLL_COUNT and only getting it
> to work with STACK_MAX_LEN reduced to 180, which would make it too close
> to pyperf180.
> 
> The only source change I found that passes is reducing __on_event() to a
> single copy, but that clearly weakens the test as pointed out.
> 
> At this point, I don't have a source-level fix that preserves the test
> intent.

Hi Sun,

I have an old investigation for the pyperf600 failure reason from March 2024.
Attaching it to the email. The discussion happened off-list.
The source-level "mitigation" I found back then still stands:

  --- a/tools/testing/selftests/bpf/progs/pyperf.h
  +++ b/tools/testing/selftests/bpf/progs/pyperf.h
  @@ -97,8 +97,15 @@ static __always_inline bool get_frame_data(void *frame_ptr, PidData *pidData,
                              frame_ptr + pidData->offsets.PyFrameObject_code);
   
          // read data from PyCodeObject
  +#if __BPF_CPU_VERSION__ < 4
          if (!frame->f_code)
                  return false;
  +#else
  +        asm volatile goto("if %[f_code] != 0 goto %l[has_f_code];"
  +                             :: [f_code]"r"(frame->f_code) :: has_f_code);
  +        return false;
  +has_f_code:
  +#endif

(One needs cpuv4 because jump instructions that exceed the 16-bit
 offset range are only possible with cpuv4.)

The decision back then was that the "mitigation" is too brittle to
apply and we should leave the test as-is, hoping that verifier would
get smarter some day and be able to load the program.

Best regards,
Eduard

[-- Attachment #2: old-pyperf600-investigation.md --]
[-- Type: text/markdown, Size: 7734 bytes --]

# What happened

The pyperf600 test fails to verify when compiled by recent clang revisions.
The last known good revision is [0], the first known bad revision is [1].
Revision [1] comes from the pull request [2].

Verifier error when using revision [1]:

    ...
    ; if (frame->co_name) @ pyperf.h:118
    25460: (79) r3 = *(u64 *)(r10 -32)    ; R3_w=scalar() R10=fp0 fp-32=mmmmmmmm
    25461: (15) if r3 == 0x0 goto pc+7
    The sequence of 8193 jumps is too complex.
    verification time 822174 usec
    stack depth 360

All testing below was done using revisions [0] and [1].

# pyperf600 structure

The relevant parts of the test look as follows:

    static __always_inline bool get_frame_data(...)
    {
        ...
        if (!frame->f_code)
            return false;
        ...
        if (frame->co_filename) { ... }
        if (frame->co_name) { ... }
        return true;
    }

    int __on_event(...)
    {
        ...
        #pragma clang loop unroll(UNROLL_COUNT)      //  UNROLL_COUNT == 150
        for (int i = 0; i < STACK_MAX_LEN; ++i)      // STACK_MAX_LEN == 600
            if (frame_ptr && get_frame_data(...)) {
                if (!symbol_id) { ... }
                if (*symbol_id == new_symbol_id) { ... }
                ...
            }
        ...
    }

    SEC("raw_tracepoint/kfree_skb")
    int on_event(struct bpf_raw_tracepoint_args* ctx)
    {
        ...
        __on_event(...);
        __on_event(...);
        __on_event(...);
        __on_event(...);
        __on_event(...);
        ...
    }

The call to get_frame_data() is inlined.
The main takeaways are:
- BPF program consists of five calls to __on_event();
- __on_event() has a big loop inside;
- loop body has 5 conditionals
  (when counted with conditionals in get_frame_data()).
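
To make the relationship between UNROLL_COUNT and STACK_MAX_LEN concrete, here is a minimal host-side sketch of what partial unrolling by 150 means for a 600-iteration loop. It is not the BPF program: `body()` and `run_partially_unrolled()` are hypothetical names, and the inner `for` merely stands in for the 150 body copies the compiler would emit.

```c
#define STACK_MAX_LEN 600 /* loop trip count, as in pyperf600 */
#define UNROLL_COUNT  150 /* body copies per pass, as in pyperf600 */

/* Hypothetical stand-in for one loop body; it only counts executions. */
static void body(int *executed)
{
	(*executed)++;
}

/*
 * Model of a loop partially unrolled by UNROLL_COUNT: each pass of the
 * outer loop corresponds to one run through UNROLL_COUNT emitted copies
 * of the body. Returns the number of body executions and stores the
 * number of outer passes in *outer_passes.
 */
int run_partially_unrolled(int *outer_passes)
{
	int executed = 0;

	*outer_passes = 0;
	for (int i = 0; i < STACK_MAX_LEN; i += UNROLL_COUNT) {
		for (int copy = 0; copy < UNROLL_COUNT; copy++)
			body(&executed);
		(*outer_passes)++;
	}
	return executed;
}
```

Under this model the body still runs 600 times, in 4 passes over 150 copies each, which is why the emitted code is smaller than a full unroll while the verifier still has to walk every iteration.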

# LLVM change description

The relevant part of [1] is:

    --- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
    +++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
    @@ -1282,7 +1295,7 @@ tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,
       }

       // Do not attempt partial/runtime unrolling in FullLoopUnrolling
    -  if (OnlyFullUnroll && !(UP.Count >= MaxTripCount)) {
    +  if (OnlyFullUnroll && (UP.Count < TripCount || UP.Count < MaxTripCount)) {
         LLVM_DEBUG(
             dbgs() << "Not attempting partial/runtime unroll in FullLoopUnroll.\n");
         return LoopUnrollResult::Unmodified;

- `UP.Count` is a preferred number of iterations to be unrolled,
  it is 150 for pyperf600;
- `TripCount` is the predicted number of loop iterations,
  it is 600 for pyperf600.

The hunk above does exactly what the comment says:
it prevents partial unrolling of the main pyperf600 loop during the full
unrolling pass.

There is also a partial unrolling pass done later in the pipeline.
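
The effect of the changed condition can be checked with a small host-side sketch. It copies the two predicates from the hunk above into plain C functions (`skip_before()` and `skip_after()` are hypothetical names). One input is an assumption not stated in this write-up: `MaxTripCount` is taken to be 0 for pyperf600, on the premise that LLVM only computes it when the exact trip count is unknown.

```c
#include <stdbool.h>

/* Condition before [1]: true means "do not unroll in this pass". */
bool skip_before(bool only_full_unroll, unsigned count,
		 unsigned max_trip_count)
{
	return only_full_unroll && !(count >= max_trip_count);
}

/* Condition after [1]: TripCount is now consulted as well. */
bool skip_after(bool only_full_unroll, unsigned count,
		unsigned trip_count, unsigned max_trip_count)
{
	return only_full_unroll &&
	       (count < trip_count || count < max_trip_count);
}
```

With `count` = 150, `trip_count` = 600 and the assumed `max_trip_count` = 0, `skip_before()` returns false (the full unrolling pass handles the loop) while `skip_after()` returns true (the loop is left for the later partial unrolling pass).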

# LLVM change impact on pyperf600

Prior to [1] the loop in pyperf600 was unrolled by the full unrolling pass,
after [1] it is unrolled by the partial unrolling pass.
This change causes a subtle rearrangement of basic blocks inside the
loop which turns out to be important for the verifier.

The rearrangement occurs inside the inlined body of get_frame_data():

    static __always_inline bool get_frame_data(...)
    {
        ...
        if (!frame->f_code)
            return false;
        ...
    }

Translation before [1]:                 Translation after [1]

; if (!frame->f_code)                   ; if (!frame->f_code)
  r3 = *(u64 *)(r10 - 0x30)               r3 = *(u64 *)(r10 - 0x30)
  if r3 != 0x0 goto +0x2 <LBB0_19>        if r3 == 0x0 goto +0x4b <LBB0_39>

Before [1] the fall-through path is to `return false`,
after [1] the fall-through path is to the rest of get_frame_data() body.

The `if (!frame->f_code)` is the first conditional in the loop body
and it guards all other conditionals in the body
(when `!frame->f_code` is true, the rest of the conditionals are skipped).

Before [1] the verifier would process pyperf600 in the following order:

- __on_event()
  - process loop 600 times:
    - `if (!frame->f_code) return false`:
      - fall-through is to `return false`;
      - push one jump to jump history
      - assume fall-through branch and skip the rest of the loop body;
- __on_event(): same thing, push 600 jumps to jump history;
- __on_event(): same thing, push 600 jumps to jump history;
- __on_event(): same thing, push 600 jumps to jump history;
- __on_event():
  this is the last call to __on_event,
  all branches within it are verified before proceeding
  with branches pushed for previous calls.

When the loop inside the last call to __on_event() is verified, a
checkpoint at its start becomes viable.
Branches pushed when previous calls to __on_event() were processed
would eventually hit this checkpoint and the whole process would
converge.
Thus, at its peak the jump history length would be ~600*5 == 3000.

However, after [1] the fall-through path for the `if (!frame->f_code)`
leads to the other conditionals in the loop body,
pushing up to 5 conditionals to jump history for each iteration.
Hence, the peak jump history length would be something like ~600*5*5 == 15000,
which is outside the verifier's current limits.
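
The two estimates can be reproduced with a back-of-the-envelope sketch. It is only the arithmetic from this section, not a model of the verifier: `peak_jump_history()` and `fits_jump_limit()` are hypothetical helpers, and the 8192 limit is inferred from the "sequence of 8193 jumps" error message.

```c
#include <stdbool.h>

/* Inferred from the error: a sequence of 8193 jumps is one too many. */
#define JMP_SEQ_LIMIT 8192u

/*
 * Rough peak jump history length: every call to __on_event() walks all
 * loop iterations, and each iteration pushes a fixed number of jumps.
 */
unsigned peak_jump_history(unsigned calls, unsigned iterations,
			   unsigned jumps_per_iteration)
{
	return calls * iterations * jumps_per_iteration;
}

/* True if the estimated peak stays within the assumed limit. */
bool fits_jump_limit(unsigned peak)
{
	return peak <= JMP_SEQ_LIMIT;
}
```

With 5 calls and 600 iterations, one jump per iteration gives 3000, which fits, while five jumps per iteration gives 15000, which does not. The real numbers depend on pruning and on how many conditionals are actually taken, so treat these as upper-bound estimates.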

# Possible fix #1: change pyperf600 basic blocks layout

The diff below is sufficient to make the test verify again after [1]
(when tested for cpuv4; cpuv3 generates jumps that overflow the 16-bit offset):

    --- a/tools/testing/selftests/bpf/progs/pyperf.h
    +++ b/tools/testing/selftests/bpf/progs/pyperf.h
    @@ -97,8 +97,10 @@ static __always_inline bool get_frame_data(void *frame_ptr, PidData *pidData,
                                frame_ptr + pidData->offsets.PyFrameObject_code);

            // read data from PyCodeObject
    -       if (!frame->f_code)
    -               return false;
    +       asm volatile goto("if %[f_code] != 0 goto %l[has_f_code]"
    +                         :: [f_code]"r"(frame->f_code) :: has_f_code);
    +       return false;
    +has_f_code:
            bpf_probe_read_user(&frame->co_filename,
                                sizeof(frame->co_filename),
                                frame->f_code + pidData->offsets.PyCodeObject_filename);

Effectively this forces the verifier to first explore the `return false`
branch of the first conditional in the loop body,
the same way it was done before [1].

(The likely/unlikely macros relying on __builtin_expect() don't give
 the desired code layout for some reason).

# Possible fix #2: change pyperf600 limits

The diff below reduces the loop size sufficiently to fit inside jump
history (again, works for cpuv4, but not for cpuv3):

    --- a/tools/testing/selftests/bpf/progs/pyperf600.c
    +++ b/tools/testing/selftests/bpf/progs/pyperf600.c
    @@ -1,6 +1,6 @@
     // SPDX-License-Identifier: GPL-2.0
     // Copyright (c) 2019 Facebook
    -#define STACK_MAX_LEN 600
    +#define STACK_MAX_LEN 230
     /* Full unroll of 600 iterations will have total
      * program size close to 298k insns and this may
      * cause BPF_JMP insn out of 16-bit integer range.

# Possible fix #3: verifier changes

Another option would be to forgo the current verifier conditional
exploration rules:
when inside a loop, don't explore the fall-through branch first,
instead predict which branch would push fewer conditionals
onto jump history and explore that first.

Need more time to assess whether this is a feasible option in terms of added complexity.
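
As a toy illustration of the idea only (this is not verifier code, and how the per-branch estimate would be obtained is exactly the open question), the selection rule could be sketched as follows. The `enum branch` type and both functions are hypothetical.

```c
/* Which successor of a conditional jump to explore first. */
enum branch { FALL_THROUGH, JUMP_TARGET };

/* Current rule as described in this write-up: fall-through goes first. */
enum branch pick_current(void)
{
	return FALL_THROUGH;
}

/*
 * Proposed rule: explore first the successor estimated to push fewer
 * conditionals onto jump history; keep fall-through first on a tie.
 */
enum branch pick_proposed(unsigned conds_fall_through,
			  unsigned conds_jump_target)
{
	return conds_jump_target < conds_fall_through ? JUMP_TARGET
						       : FALL_THROUGH;
}
```

For the post-[1] layout of `if (!frame->f_code)`, the fall-through path leads to roughly 4 further conditionals and the jump target to none, so `pick_proposed(4, 0)` selects the jump target, which matches the exploration order seen before [1].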

# Links

[0] Last good revision:
    c3291253c3b5 ("Revert "[scudo] [MTE] resize stack depot for allocation ring buffer" (#80777)")
[1] First broken revision:
    99ddd77ed9e1 ("[LoopUnroll] Introduce PragmaUnrollFullMaxIterations as a hard cap on how many iterations we try to unroll (#78648)")
[2] https://github.com/llvm/llvm-project/pull/78648


* Re: [PATCH v2] selftests/bpf: avoid jump seq limit in verif_scale_pyperf600
  2026-03-08  8:12       ` Eduard Zingerman
@ 2026-03-08 10:01         ` sun jian
  2026-03-08 16:08         ` Alexei Starovoitov
  1 sibling, 0 replies; 7+ messages in thread
From: sun jian @ 2026-03-08 10:01 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Paul Chaignon, bpf,
	open list:KERNEL SELFTEST FRAMEWORK, Andrii Nakryiko,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan

On Sun, Mar 8, 2026 at 4:12 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> I have an old investigation for the pyperf600 failure reason from March 2024.
> Attaching it to the email. The discussion happened off-list.
> The source-level "mitigation" I found back then still stands:
>
> The decision back then was that the "mitigation" is too brittle to
> apply and we should leave the test as-is, hoping that verifier would
> get smarter some day and be able to load the program.
Hi Eduard,
Thanks, this is helpful.

The write-up made the reason much clearer to me.
Let's leave the test as-is.

Regards.
Sun Jian


* Re: [PATCH v2] selftests/bpf: avoid jump seq limit in verif_scale_pyperf600
  2026-03-08  8:12       ` Eduard Zingerman
  2026-03-08 10:01         ` sun jian
@ 2026-03-08 16:08         ` Alexei Starovoitov
  1 sibling, 0 replies; 7+ messages in thread
From: Alexei Starovoitov @ 2026-03-08 16:08 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: sun jian, Paul Chaignon, bpf, open list:KERNEL SELFTEST FRAMEWORK,
	Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan

On Sun, Mar 8, 2026 at 12:12 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Sun, 2026-03-08 at 13:55 +0800, sun jian wrote:
> > On Sat, Mar 7, 2026 at 12:23 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Mar 6, 2026 at 8:15 AM Paul Chaignon <paul.chaignon@gmail.com> wrote:
> > > Sun Jian,
> > > I asked to do a _minimal_ tweak to pyperf600.
> > > What you did is a drastic change. Pls don't hack tests
> > > just to make them pass. The tests have to be meaningful
> > > and test coverage shouldn't degrade.
> > >
> >
> > Hi Alexei, Paul,
> >
> > I spent some more time looking into this.
> >
> > Comparing unmodified pyperf600 bytecode between clang-18 and clang-20, I
> > see fewer instructions with clang-20 and nearly the same number of
> > branches:
> >
> > clang-18: 90134 lines of disassembly, 6090 gotos
> > clang-20: 78369 lines of disassembly, 6085 gotos
> >
> > So this does not look like a simple program-size increase. What seems to
> > change is the branch layout in the unrolled loop body, which seems to
> > make the verifier DFS go deeper before pruning.
> >
> > One useful data point is that a single __on_event() copy does load
> > successfully (that was my v2), while with 2 or more copies it
> > consistently fails at exactly 8193 jumps. In other words, the verifier
> > hits the jump-sequence limit before reaching the second copy.
> >
> > I also tried a range of source-level mitigations, but so far I couldn't
> > find one that preserves the test intent and keeps pyperf600 comparable
> > to the other variants:
> >
> > - UNROLL_COUNT tuning: 99 does not compile; 100-120 compile but still
> > fail at 8193; 121-145 fail to compile; 146-150 compile but still fail
> > at 8193
> > - early break/goto on !frame_ptr: insufficient for pyperf600, and also
> > hurts pyperf600_nounroll by adding branch points to the 600-iteration loop
> > - wrapping 5x __on_event() in a non-unrolled loop: verifier still unrolls it
> > - making get_frame_data() __noinline: still fails
> > - moving the unwind loop into a __noinline subprog: still fails
> > - SUBPROGS / __on_event as __noinline: still fails; codegen changes,
> > but the verifier still hits 8193
> >
> > Paul also mentioned trying STACK_MAX_LEN/UNROLL_COUNT and only getting it
> > to work with STACK_MAX_LEN reduced to 180, which would make it too close
> > to pyperf180.
> >
> > The only source change I found that passes is reducing __on_event() to a
> > single copy, but that clearly weakens the test as pointed out.
> >
> > At this point, I don't have a source-level fix that preserves the test
> > intent.
>
> Hi Sun,
>
> I have an old investigation for the pyperf600 failure reason from March 2024.
> Attaching it to the email. The discussion happened off-list.
> The source-level "mitigation" I found back then still stands:
>
>   --- a/tools/testing/selftests/bpf/progs/pyperf.h
>   +++ b/tools/testing/selftests/bpf/progs/pyperf.h
>   @@ -97,8 +97,15 @@ static __always_inline bool get_frame_data(void *frame_ptr, PidData *pidData,
>                               frame_ptr + pidData->offsets.PyFrameObject_code);
>
>           // read data from PyCodeObject
>   +#if __BPF_CPU_VERSION__ < 4
>           if (!frame->f_code)
>                   return false;
>   +#else
>   +        asm volatile goto("if %[f_code] != 0 goto %l[has_f_code];"
>   +                             :: [f_code]"r"(frame->f_code) :: has_f_code);
>   +        return false;
>   +has_f_code:
>   +#endif
>
> (One needs cpuv4 because of the jump instructions exceeding 16-bit
>  offset ranges are only possible with cpuv4).
>
> The decision back then was that the "mitigation" is too brittle to
> apply and we should leave the test as-is, hoping that verifier would
> get smarter some day and be able to load the program.

Back then the hope was that it will be fixed imminently,
but 2 years later it still fails. So please send your workaround.

I prefer to have 'test_progs' passing all tests without denylist.

