Re: [PATCH bpf-next] selftests/bpf: Fix reg_bounds to match new tnum-based refinement

From: Paul Chaignon <paul.chaignon@gmail.com>
To: Eduard Zingerman <eddyz87@gmail.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Subject: Re: [PATCH bpf-next] selftests/bpf: Fix reg_bounds to match new tnum-based refinement
Date: Wed, 8 Apr 2026 22:48:23 +0200
Message-ID: <ada_F2WbRcnOYXWb@mail.gmail.com>
In-Reply-To: <ada9UuSQi2SE2IfB@mail.gmail.com>

On Wed, Apr 08, 2026 at 10:40:50PM +0200, Paul Chaignon wrote:
> Commit efc11a667878 ("bpf: Improve bounds when tnum has a single
> possible value") improved the bounds refinement to detect when the tnum
> and u64 range overlap in a single value (and the bounds can thus be set
> to that value).
> 
> Eduard then noticed that it broke the slow-mode reg_bounds selftests
> because they have no equivalent logic and are therefore unable to
> refine the bounds as much as the verifier. The following test case
> illustrates this.
> 
>   ACTUAL   TRUE1:  scalar(u64=0xffffffff00000000,u32=0,s64=0xffffffff00000000,s32=0)
>   EXPECTED TRUE1:  scalar(u64=[0xfffffffe00000001; 0xffffffff00000000],u32=0,s64=[0xfffffffe00000001; 0xffffffff00000000],s32=0)
>   [...]
>   #323/1007 reg_bounds_gen_consts_s64_s32/(s64)[0xfffffffe00000001; 0xffffffff00000000] (s32)<op> S64_MIN:FAIL
> 
> with the verifier logs:
> 
>   [...]
>   19: w0 = w6                 ; R0=scalar(smin=0,smax=umax=0xffffffff,
>                                           var_off=(0x0; 0xffffffff))
>                                 R6=scalar(smin=0xfffffffe00000001,smax=0xffffffff00000000,
>                                           umin=0xfffffffe00000001,umax=0xffffffff00000000,
>                                           var_off=(0xfffffffe00000000; 0x1ffffffff))
>   20: w0 = w7                 ; R0=0 R7=0x8000000000000000
>   21: if w6 == w7 goto pc+3
>   [...]
>   from 21 to 25: [...]
>   25: w0 = w6                 ; R0=0 R6=0xffffffff00000000
>                               ;         ^
>                               ;         unexpected refined value
>   26: w0 = w7                 ; R0=0 R7=0x8000000000000000
>   27: exit
> 
> When w6 == w7 is true, the verifier can deduce that R6's tnum is
> equal to (0xfffffffe00000000; 0x100000000) and then use that information
> to refine the bounds: the tnum only overlaps with the u64 range at
> 0xffffffff00000000. The reg_bounds selftest doesn't know about tnums
> and therefore fails to perform the same refinement.
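
To make the tnum argument concrete, here is a minimal standalone sketch
(not verifier code) that counts how many members of a tnum fall inside a
u64 range. struct tnum mirrors the kernel's value/mask pair, but
tnum_count_in_range() is a hypothetical helper made up for this
illustration; it walks the subsets of the mask and stops as soon as it
has seen two matches.

```c
#include <assert.h>
#include <stdint.h>

/* Tristate number: bits set in mask are unknown, the others equal value. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

/*
 * Hypothetical helper: count the members of t inside [umin; umax],
 * stopping at 2. *only holds the match when the result is 1.
 * Illustration only; the walk can take up to 2^(unknown bits) steps.
 */
static int tnum_count_in_range(struct tnum t, uint64_t umin, uint64_t umax,
			       uint64_t *only)
{
	uint64_t sub = 0;
	int count = 0;

	assert(umin <= umax && !(t.value & t.mask));
	do {
		uint64_t v = t.value | sub;

		if (v >= umin && v <= umax) {
			*only = v;
			if (++count == 2)
				break;
		}
		/* Next subset of the mask bits, in increasing order. */
		sub = (sub - t.mask) & t.mask;
	} while (sub);

	return count;
}
```

With R6's tnum after the branch, (0xfffffffe00000000; 0x100000000), and
the range [0xfffffffe00000001; 0xffffffff00000000], this returns 1 and
stores 0xffffffff00000000, which is the refined value in the log above.
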
> 
> This issue happens when the tnum carries information that cannot be
> represented in the ranges, as otherwise the selftest could reach the
> same refined value using just the ranges. The tnum thus needs to
> represent non-contiguous values (e.g., R6's tnum above, after the
> condition). The only way this can happen in the reg_bounds selftest is
> at the boundary between the 32-bit and 64-bit ranges. We therefore only
> need to handle that case.
> 
> This patch fixes the selftest refinement logic by checking if the u32
> and u64 ranges overlap in a single value. If so, the ranges can be set
> to that value. We need to handle two cases: either they overlap in
> umin64...
> 
>   u64 values
>   matching u32 range:     xxx        xxx        xxx        xxx
>                       |--------------------------------------|
>   u64 range:          0                xxxxx                 UMAX64
> 
> or in umax64:
> 
>   u64 values
>   matching u32 range:     xxx        xxx        xxx        xxx
>                       |--------------------------------------|
>   u64 range:          0          xxxxx                       UMAX64
> 
> To detect the first case, we decrease umax64 to the maximum value that
> matches the u32 range. If that happens to be umin64, then umin64 is the
> only overlap. We proceed similarly for the second case, increasing
> umin64 to the minimum value that matches the u32 range.
> 
> Note this is similar to how the verifier handles the general case using
> tnum, but we don't need to care about a single-value overlap in the
> middle of the range. That case is not possible when comparing two
> ranges.
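
The two cases described above can be sketched in standalone C as
follows. This is not the code from the patch: all three function names
are hypothetical, the u32 range is assumed not to wrap (umin32 <=
umax32), and a single-value overlap in the middle of the u64 range is
deliberately not detected, in line with the note above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOW32 0xffffffffULL

/* Largest u64 <= x whose lower 32 bits are in [lo32; hi32], if any. */
static bool max_match_below(uint64_t x, uint32_t lo32, uint32_t hi32,
			    uint64_t *out)
{
	uint64_t hi = x & ~LOW32;
	uint32_t lo = (uint32_t)x;

	if (lo >= lo32) {
		*out = hi | (lo <= hi32 ? lo : hi32);
		return true;
	}
	if (!hi)	/* cannot borrow from the upper half */
		return false;
	*out = (hi - (1ULL << 32)) | hi32;
	return true;
}

/* Smallest u64 >= x whose lower 32 bits are in [lo32; hi32], if any. */
static bool min_match_above(uint64_t x, uint32_t lo32, uint32_t hi32,
			    uint64_t *out)
{
	uint64_t hi = x & ~LOW32;
	uint32_t lo = (uint32_t)x;

	if (lo <= hi32) {
		*out = hi | (lo >= lo32 ? lo : lo32);
		return true;
	}
	if (hi == ~LOW32)	/* cannot carry into the upper half */
		return false;
	*out = (hi + (1ULL << 32)) | lo32;
	return true;
}

/* True if the ranges overlap only in umin64 or only in umax64. */
static bool single_overlap(uint64_t umin64, uint64_t umax64,
			   uint32_t umin32, uint32_t umax32, uint64_t *val)
{
	uint64_t m;

	assert(umin64 <= umax64 && umin32 <= umax32);
	/* Case 1: the tightened umax64 lands on umin64. */
	if (max_match_below(umax64, umin32, umax32, &m) && m == umin64) {
		*val = umin64;
		return true;
	}
	/* Case 2: the tightened umin64 lands on umax64. */
	if (min_match_above(umin64, umin32, umax32, &m) && m == umax64) {
		*val = umax64;
		return true;
	}
	return false;
}
```

For the failing test case, u64=[0xfffffffe00000001; 0xffffffff00000000]
with u32=0, the second case triggers and the value is
0xffffffff00000000.
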
> 
> This patch also adds two test cases reproducing this bug as part of the
> normal test runs (without SLOW_TESTS=1).
> 
> Fixes: efc11a667878 ("bpf: Improve bounds when tnum has a single possible value")
> Reported-by: Eduard Zingerman <eddyz87@gmail.com>
> Closes: https://lore.kernel.org/bpf/4e6dd64a162b3cab3635706ae6abfdd0be4db5db.camel@gmail.com/
> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
> ---

Hi Eduard,

This patch fixes the test case you reported and a couple of variants:

reg_bounds_gen_consts_s64_u32/(s64)[0xfffffffe00000001; 0xffffffff00000000] (u32)<op> S64_MIN
reg_bounds_gen_consts_s64_s32/(s64)[0xfffffffe00000001; 0xffffffff00000000] (s32)<op> S64_MIN
reg_bounds_gen_consts_s64_u32/(s64)[0xfffffffe00000000; 0xfffffffffffffffe] (u32)<op> 0xffffffffffffffff
reg_bounds_gen_consts_s64_s32/(s64)[0xfffffffe00000000; 0xfffffffffffffffe] (s32)<op> 0xffffffffffffffff

but we're not out of the woods yet. While running reg_bounds* tests, I
noticed a few other unrelated failures.

---

reg_bounds_gen_consts_s64_u32/(s64)[0xfffffffe00000002; 0xffffffff00000000] (u32)<op> S64_MIN+1

This one hits an invariant violation on an impossible branch and the
bounds are set to an incorrect value that doesn't match what the test
expects.

  19: w0 = w6                ; R0=scalar(smin=0,smax=umax=0xffffffff,
                                         var_off=(0x0; 0xffffffff))
                               R6=scalar(smin=0xfffffffe00000002,smax=0xffffffff00000000,
                                         umin=0xfffffffe00000002,umax=0xffffffff00000000,
                                         var_off=(0xfffffffe00000000; 0x1ffffffff))
  20: w0 = w7                ; R0=1 R7=0x8000000000000001
  21: if w6 == w7 goto pc+3  ; [...]
  [...]

  from 21 to 25: R0=1 R1=0x8000000000000001 R2=0x8000000000000001 R6=0xffffffff00000001 R7=0x8000000000000001 R10=fp0
  [...]

  ACTUAL   TRUE1:  scalar(u64=0xffffffff00000001,u32=1,s64=0xffffffff00000001,s32=0x1)
  EXPECTED TRUE1:  scalar(u64=[0xfffffffe00000002; 0xffffffff00000000],u32=1,s64=[0xfffffffe00000002; 0xffffffff00000000],s32=0x1)

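W7 is equal to 1 and, given R6's ranges, cannot be equal to W6: the
nearest u64 values with the lower 32 bits set to 1 are
0xfffffffe00000001 (below umin) and 0xffffffff00000001 (above umax).
The condition is therefore always false, and the verifier incorrectly
refines R6's value to 0xffffffff00000001 on the true branch.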
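
The impossibility of that branch can be checked with a few lines of C.
This is only a sketch for reasoning about the test case, not something
from the verifier or the selftest; range_has_low32() is a hypothetical
name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if some value in [umin; umax] has its lower 32 bits equal to w. */
static bool range_has_low32(uint64_t umin, uint64_t umax, uint32_t w)
{
	/* Candidate sharing umin's upper 32 bits. */
	uint64_t cand = (umin & ~0xffffffffULL) | w;

	assert(umin <= umax);
	if (cand < umin) {
		/* Move to the next 4GiB block, unless that overflows. */
		if (cand > UINT64_MAX - (1ULL << 32))
			return false;
		cand += 1ULL << 32;
	}
	return cand <= umax;
}
```

For R6's range [0xfffffffe00000002; 0xffffffff00000000], it returns
false for w = 1 (so w6 == w7 can never hold here) and true for w = 0.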

This is a new type of invariant violation (i.e., involving the tnum)
that is not detected by range_bounds_violation(). I expect it will
be handled by Hari's follow-up patchset. I've shared the program with
Hari so it can maybe be used as a selftest. I'm guessing we're fine
waiting for that fix as it's not failing in CI; if not, we could do a
quick fix in the verifier.

---

reg_bounds_gen_consts_s64_u32/(s64)[0xffffffff00000002; 0] (u32)<op> S64_MIN+1

This one fails because reg_bounds' branch detection logic doesn't match
the kernel's.

  ACTUAL   FALSE1: scalar(u64=[0; U64_MAX],u32=[0; 4294967295],s64=[0xffffffff00000002; 0],s32=[S32_MIN; S32_MAX])
  EXPECTED FALSE1: scalar(u64=[0; U64_MAX],u32=[0; 4294967295],s64=[0xffffffff00000002; 0],s32=[S32_MIN; S32_MAX])
  ACTUAL   FALSE2: scalar(u64=0x8000000000000001,u32=1,s64=S64_MIN+1,s32=0x1)
  EXPECTED FALSE2: scalar(u64=0x8000000000000001,u32=1,s64=S64_MIN+1,s32=0x1)
  ACTUAL   TRUE1:  <not found>
  EXPECTED TRUE1:  scalar(u64=[0xffffffff00000002; 0x7fffffffffffffff],u32=[2147483648; 1],s64=[0xffffffff00000002; 0xffffffff00000001],s32=0x1)
  ACTUAL   TRUE2:  <not found>
  EXPECTED TRUE2:  scalar(u64=0x8000000000000001,u32=1,s64=S64_MIN+1,s32=0x1)

It has been failing with that error since b254c6d816e5 ("bpf: Simulate
branches to prune based on range violations"), but was already failing
before with a different error (unexpected range). The root cause seems to
be that the test runs into an invariant violation, here as well.

We'll probably need to update reg_bounds's branch prediction logic to
match what the kernel is now doing. I can look into this next.

[...]