Netdev List
* [RFC PATCH 6.1.y 0/2] bpf: backport scalar not-equal tracking fixes
From: Zhenzhong Wu @ 2026-06-01 18:03 UTC
  To: bpf
  Cc: netdev, linux-kernel, ast, daniel, john.fastabend, andrii,
	martin.lau, song, yonghong.song, kpsingh, sdf, haoluo, jolsa,
	menglong8.dong, eddyz87, shung-hsi.yu, tamird

Hi BPF maintainers,

This RFC backports two BPF verifier scalar range-tracking fixes to 6.1.y.
The series is intended to fix a verifier state-pruning issue where an
impossible scalar path can be kept while the real success path is pruned.

This is a verifier scalar range-tracking issue, not a helper-specific
issue. The visible failure is that the verifier prunes the real success
continuation, which must be explored, and keeps only an impossible one.
In the reproducer, the traced function returns 15 at runtime, but the
verifier keeps the path where r7 is treated as 0, hard-wires the opposite
branch, and the program reports the error branch.

The minimized reproducer uses fexit/bpf_get_func_ret() only because it
provides a compact way to create the interesting register flow: one scalar
in r0 for the helper status, and another scalar loaded from the stack for
the traced function return value. The issue is not specific to
bpf_get_func_ret() itself.

Because bpf_get_func_ret() was added in v5.17, this particular reproducer
applies directly to 6.1.y but not to 5.15.y. I have not built a
5.15.y-compatible reproducer.

The relevant verifier-log bytecode from the reproducer is below. The later
instructions only store r7 into a map so user space can observe which
branch the verifier kept.

  15: (85) call bpf_get_func_ret#184    ; R0_w=scalar() fp-8_w=mmmmmmmm
  16: (79) r7 = *(u64 *)(r10 -8)        ; R7_w=scalar() R10=fp0
  17: (15) if r0 == 0x0 goto pc+1       ; R0_w=scalar()
  18: (bf) r7 = r0                      ; R0=scalar(id=1) R7=scalar(id=1)
  19: (55) if r0 != 0x0 goto pc+6       ; R0=0
  20: (67) r7 <<= 32                    ; R7_w=0
  21: (77) r7 >>= 32                    ; R7_w=0
  22: (b7) r1 = 1                       ; R1_w=1
  23: (55) if r7 != 0xf goto pc+1

The failure mechanism is:

  1. The program checks "if r0 == 0". The jump target is the success path;
     the fallthrough is the failure path, which should imply r0 != 0.

  2. On v6.1.91, the verifier does not record that r0 != 0 fact for the
     fallthrough path. The following "r7 = r0" then gives r0 and r7 the
     same scalar id while both are still treated as possibly zero.

  3. At the later "if r0 != 0" check, the verifier still thinks r0 may be
     zero, so it explores the fallthrough path of that JNE. That path means
     r0 == 0, and because r7 shares the same scalar id, r7 is narrowed to
     zero as well. This is an impossible path: it came from the earlier
     failure path that should have implied r0 != 0.

  4. That impossible continuation reaches the return-value comparison with
     r7 == 0 and can make the verifier keep only the wrong branch. When the
     real success path is analyzed later, state pruning considers it safe
     against the earlier cached verifier state, so the real continuation is
     not explored.
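
As a rough illustration of steps 1-3, here is a minimal toy model in plain C.
It is not verifier code: struct toy_scalar and the toy_* functions are
hypothetical names, and only unsigned 64-bit bounds are modeled (no tnum,
signed, or 32-bit bounds). The track_ne flag switches between the v6.1.91
behavior described above and the behavior with the not-equal fact recorded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a verifier scalar: unsigned 64-bit bounds only. */
struct toy_scalar {
	uint64_t umin;
	uint64_t umax;
};

/*
 * Fallthrough of "if reg == val goto ...": the register is known != val.
 * An interval can only express that when val sits on one of its edges.
 * track_ne == false models the v6.1.91 behavior (fact not recorded).
 */
static struct toy_scalar toy_jeq_fallthrough(struct toy_scalar reg,
					     uint64_t val, bool track_ne)
{
	assert(reg.umin <= reg.umax);
	/* A single-value range is left alone so it never becomes inverted. */
	if (!track_ne || reg.umin == reg.umax)
		return reg;
	if (reg.umin == val)
		reg.umin++;
	else if (reg.umax == val)
		reg.umax--;
	return reg;
}

/*
 * Fallthrough of "if reg != val goto ..." means reg == val. That path is
 * only worth exploring if val lies inside the tracked bounds.
 */
static bool toy_jne_fallthrough_possible(struct toy_scalar reg, uint64_t val)
{
	return reg.umin <= val && val <= reg.umax;
}

/* Walk insns 15-19 of the log for r0 and report whether the r0 == 0
 * continuation after insn 19 is still considered reachable.
 */
static bool toy_impossible_path_explored(bool track_ne)
{
	struct toy_scalar r0 = { 0, UINT64_MAX };   /* insn 15: R0=scalar() */

	r0 = toy_jeq_fallthrough(r0, 0, track_ne);  /* insn 17, fallthrough */
	/* insn 18: r7 = r0 shares the id, so r7 carries the same bounds. */
	return toy_jne_fallthrough_possible(r0, 0); /* insn 19, fallthrough */
}
```

In this model, toy_impossible_path_explored(false) reports the r0 == 0
continuation as reachable, while toy_impossible_path_explored(true) does not,
because umin was raised to 1 at insn 17.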

The relevant pruning point is that regsafe()/states_equal() accepted the
real success-path state against an earlier cached state where r0 was an
imprecise scalar and r7 constraints were loose enough to cover the current
r7.
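
The pruning decision can be pictured with the following simplified sketch.
It is an assumption-laden toy, not the 6.1.y regsafe() code: struct toy_reg
and the toy_* helpers are hypothetical, and the real check also looks at
tnum, signed and 32-bit bounds, and scalar ids. It only shows why a cached
state with imprecise or loose scalars can cover a more specific current
state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy register state: unsigned bounds plus a precision mark. */
struct toy_reg {
	uint64_t umin;
	uint64_t umax;
	bool precise;	/* were exact bounds needed to prove the old path safe? */
};

/* True when every value allowed by cur is also allowed by old. */
static bool toy_range_within(const struct toy_reg *old,
			     const struct toy_reg *cur)
{
	return old->umin <= cur->umin && cur->umax <= old->umax;
}

/*
 * Toy scalar comparison used for pruning: an imprecise cached scalar
 * accepts any current scalar; a precise one must contain the current range.
 */
static bool toy_scalar_safe(const struct toy_reg *old,
			    const struct toy_reg *cur)
{
	assert(cur->umin <= cur->umax);
	if (!old->precise)
		return true;
	return toy_range_within(old, cur);
}
```

With a cached r7 of { 0, UINT64_MAX } and a current r7 of { 15, 15 }, the toy
check accepts the current state whether or not the cached register is marked
precise, which is the shape of the pruning described above.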

After confirming the mechanism, I ran git bisect with this minimized C
reproducer as the test case. The bisect used the affected 6.7.y behavior
and the fixed v6.8 behavior as its endpoints, which places the fix in the
v6.7..v6.8 window:

  https://gist.github.com/swananan/165cca6008f6c81870a28aa7a445d5ea

The bisect identified the upstream fix as:

  d028f87517d6775dccff4ddbca2740826f9e53f1
  bpf: make the verifier tracks the "not equal" for regs

For 6.1.y, applying d028f87517d6 alone is not sufficient. The older
verifier code also needs the range-preservation semantics from:

  9e314f5d8682e1fe6ac214fb34580a238b6fd3c4
  bpf: drop knowledge-losing __reg_combine_{32,64}_into_{64,32} logic

Without that semantic prerequisite, the old range-combining logic can still
discard the refined bounds after the verifier learns them.

The 6.1.y adaptation is split as follows:

  - patch 1 carries the 6.1.y-relevant part of 9e314f5d8682 by removing the
    knowledge-losing __reg_combine_{32,64}_into_{64,32} paths and using
    reg_bounds_sync() after conditional refinement;
  - patch 2 carries d028f87517d6 in the older reg_set_min_max() layout. In
    newer kernels, reg_set_min_max() refines the fallthrough branch through
    rev_opcode(opcode), so the fallthrough branch of BPF_JEQ is handled by
    the BPF_JNE refinement. In 6.1.y that split does not exist, so the same
    not-equal fact is expressed directly on BPF_JEQ's false_reg and
    BPF_JNE's true_reg.
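
To make the layout difference in patch 2 concrete, here is a small sketch
under stated assumptions. The toy_* names and the two-opcode enum are
hypothetical and the bounds are unsigned 64-bit only; this is not the code in
either kernel version. It shows that refining the false branch through a
reversed opcode and writing the not-equal fact directly on JEQ's false
register and JNE's true register yield the same bounds.

```c
#include <assert.h>
#include <stdint.h>

enum toy_op { TOY_JEQ, TOY_JNE };

struct toy_bounds {
	uint64_t umin;
	uint64_t umax;
};

/* reg is known to differ from val: trim val off an edge of the range. */
static void toy_refine_ne(struct toy_bounds *reg, uint64_t val)
{
	assert(reg->umin <= reg->umax);
	if (reg->umin == reg->umax)
		return;		/* single value: leave the range intact */
	if (reg->umin == val)
		reg->umin++;
	else if (reg->umax == val)
		reg->umax--;
}

/* reg is known to equal val (branch feasibility is assumed checked). */
static void toy_refine_eq(struct toy_bounds *reg, uint64_t val)
{
	reg->umin = val;
	reg->umax = val;
}

static enum toy_op toy_rev_opcode(enum toy_op op)
{
	return op == TOY_JEQ ? TOY_JNE : TOY_JEQ;
}

/* Refine one register for one branch outcome. */
static void toy_refine_one(struct toy_bounds *reg, uint64_t val, enum toy_op op)
{
	if (op == TOY_JEQ)
		toy_refine_eq(reg, val);
	else
		toy_refine_ne(reg, val);
}

/* Newer-style layout: the false branch reuses the reversed opcode. */
static void toy_set_min_max_split(struct toy_bounds *true_reg,
				  struct toy_bounds *false_reg,
				  uint64_t val, enum toy_op op)
{
	toy_refine_one(true_reg, val, op);
	toy_refine_one(false_reg, val, toy_rev_opcode(op));
}

/* Older-style layout: one switch names both registers directly. */
static void toy_set_min_max_direct(struct toy_bounds *true_reg,
				   struct toy_bounds *false_reg,
				   uint64_t val, enum toy_op op)
{
	switch (op) {
	case TOY_JEQ:
		toy_refine_eq(true_reg, val);
		toy_refine_ne(false_reg, val);	/* not-equal on false_reg */
		break;
	case TOY_JNE:
		toy_refine_ne(true_reg, val);	/* not-equal on true_reg */
		toy_refine_eq(false_reg, val);
		break;
	}
}
```

For "if r0 == 0" on an unknown scalar, both toy functions leave the true
register at [0, 0] and the false register at [1, UINT64_MAX].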

Observed results with that reproducer:

  v6.1.91:               REPRO: BAD  (ran=1 error=1)
  v6.7.12:               REPRO: BAD  (ran=1 error=1)
  v6.8:                  REPRO: GOOD (ran=1 error=0)
  v6.1.91 + this series: REPRO: GOOD (ran=1 error=0)

Because this touches shared verifier scalar range logic, I am sending it as
an RFC and would appreciate BPF maintainer guidance on whether this 6.1.y
semantic backport should be carried and whether the split in this series is
reasonable. The same issue should also be relevant to 6.6.y, which still
has the older verifier logic and predates the v6.8 fix, but this RFC only
includes the 6.1.y backport.

Zhenzhong Wu (2):
  bpf: drop knowledge-losing __reg_combine_{32,64}_into_{64,32} logic
  bpf: make the verifier tracks the "not equal" for regs

 kernel/bpf/verifier.c | 92 +++++++++++++++++++------------------------
 1 file changed, 40 insertions(+), 52 deletions(-)

base-commit: 228da13e907e2b46b7222cfc35290fbfad920bef
-- 
2.43.0

Thread overview: 7+ messages
2026-06-01 18:03 [RFC PATCH 6.1.y 0/2] bpf: backport scalar not-equal tracking fixes Zhenzhong Wu
2026-06-01 18:03 ` [RFC PATCH 6.1.y 1/2] bpf: drop knowledge-losing __reg_combine_{32,64}_into_{64,32} logic Zhenzhong Wu
2026-06-01 18:04 ` [RFC PATCH 6.1.y 2/2] bpf: make the verifier tracks the "not equal" for regs Zhenzhong Wu
2026-06-02  5:47 ` [RFC PATCH 6.1.y 0/2] bpf: backport scalar not-equal tracking fixes Shung-Hsi Yu
2026-06-02  6:42   ` Shung-Hsi Yu
2026-06-02  9:17     ` Shung-Hsi Yu
2026-06-02 17:25       ` Zhenzhong Wu
